Copy and re-arrange data from one column into another

Hi
I have a table with a column of data all like this - Extract clusters: FCL (2 files) | CG_MLD_11
I'd like to copy a part of this data into another column but in this structure - CG_MLD_11_FCL
Also in another table I have data in a column like this - Perm t-test equal [100ms,250ms 100, 101]: Avg: 11_right | bl | _HLG_MLD (20) vs. Avg: 11_right | bl | _CG (15) | FCL
I similarly would like to copy some of this data into another column in the same format as above - CG_MLD_11_FCL
The content varies but the structure is the same e.g. CG might be MOD, or FCL might be CPL, or 11 might be 21, etc. but all in the same format as above.
Can anyone advise how I do this?

11 Comments

Show us an example (attach a working small file would be best) don't just try to tell us.
One thing is that you can't copy just "some of this data into another colum" -- an array has to be regular and therefore will have to have something for the entire column, not just a piece of it.
I attach a sample copy of the columns from the two tables. In this sample I have data in the first and second columns for 50 rows. I am looking to get part of this data from the first column into the third column in the format described above (CG_MLD_11_FCL). And the same from the second column into the fourth column. Some of the first two variables (CG and MLD) have subgroups and will have an additional tag - CG_LOW or MLD_HGH.
It looks like you either you are looking for string manipulation, e.g., literally change
"Extract clusters: FCL (2 files) | CG_MLD_11 "
to
"CG_MLD_11_FCL"
Or you actually want to parse the data in those strings e.g.
"CG_MLD_11"
to
["CG","MLD","11"]
and organize them into named columns (table).
Which is it? or is it something else?
The first one. I want "CG_MLD_11_FCL" in one column.
All the data that I want in these two columns are some mix of the data below. Is there some way to match these variables with a column in the table and place these matches in a new column in this format and without the other data- "CG_MLD_11_FCL"?
a = ["CG", "CG_LOW", "CG_HGH", "MLD", "MLD_LOW", "MLD_HGH", "MOD", "MOD_LOW", "MOD_HGH"];
b = ["11", "12", "21", "22"];
c = ["FCL", "FCR", "CPL", "CPR", "POL", "POR"];
It's still not at all clear to me what you really want, but doe:
Tsample.Var3=extractAfter(Tsample.Var1,'|');
yielding
>> Tsample(1:10,[1 3])
ans =
10×2 table
Var1 Var3
______________________________________________________ _____________________
{'Extract clusters: CPL (2 files) | CG_LOW_CG_HGH_11'} {' CG_LOW_CG_HGH_11'}
{'Extract clusters: CPL (2 files) | CG_LOW_CG_HGH_12'} {' CG_LOW_CG_HGH_12'}
{'Extract clusters: CPL (2 files) | CG_LOW_CG_HGH_22'} {' CG_LOW_CG_HGH_22'}
{'Extract clusters: CPL (2 files) | CG_MLD_11' } {' CG_MLD_11' }
{'Extract clusters: CPL (2 files) | CG_MLD_12' } {' CG_MLD_12' }
{'Extract clusters: CPL (2 files) | CG_MLD_21' } {' CG_MLD_21' }
{'Extract clusters: CPL (2 files) | CG_MLD_22' } {' CG_MLD_22' }
{'Extract clusters: CPL (2 files) | CG_MLD_HGH_11' } {' CG_MLD_HGH_11' }
{'Extract clusters: CPL (2 files) | CG_MLD_HGH_12' } {' CG_MLD_HGH_12' }
{'Extract clusters: CPL (2 files) | CG_MLD_HGH_21' } {' CG_MLD_HGH_21' }
>>
hit the mark?
Or, something more on the order of
>> Tnew=array2table(strtrim(split(tmp.Var1,'|')))
Tnew =
8×2 table
Var1 Var2
___________________________________ _____________________
{'Extract clusters: CPL (2 files)'} {'CG_LOW_CG_HGH_11'}
{'Extract clusters: CPL (2 files)'} {'CG_LOW_CG_HGH_12'}
{'Extract clusters: CPL (2 files)'} {'CG_LOW_CG_HGH_22'}
{'Extract clusters: CPL (2 files)'} {'CG_MLD_11' }
{'Extract clusters: CPL (2 files)'} {'CG_MLD_12' }
{'Extract clusters: CPL (2 files)'} {'CG_MLD_21' }
{'Extract clusters: CPL (2 files)'} {'CG_MLD_22' }
{'Extract clusters: CPL (2 files)'} {'CG_MLD_HGH_11' }
>>
?
Thank you. This is a table of statistical results. What I'm looking for is to create a column with a list of names of the tests in the structure of "CG_MLD_11_FCL", where CG and MLD are two groups that were compared, 11 is the outcome, and FCL is the region of interest (this was a neuro-imaging study). If you look at the a, b and c variables I outlined above you will see all variations of these three factors. In the above examples you gave me, you extracted the groups that were compared and the outcomes, but not the region of interest (CPL). I'm looking to do this for the two columns in the example attached. These two columns are from two separate tables.
Well, that's about the most convoluted mess of stuff in one place I've seen for a while... :)
Unfortunately, a way to make things (perhaps) somewhat similar doesn't seem viable since
>> table(strtrim(split(Tsample.Var2,'|')))
Error using split (line 99)
Element 9 of the text contains 5 delimiters while the previous elements have 7. All elements must contain the same number of delimiters.
>>
If they had produced a regular field where all rows in the table in that second variable had at least the same number of fields (even if some were empty), then one could pull the field belonging to the ROI column from it by position.
Failing that, pretty much left to using ismember to find the various pieces of the components and then use that index into the population table (your a,b,c above) to see which of those it is that particular search located.
Undoubtedly a regular expressions whizard could do great things here, but that isn't me, sorry...
I worked it out using a combination of insert (/) and split functions. Not the most elegant solution but appropriate for a convoluted mess :)
dpb, I don't know if this is what you had in mind but:
TSplit = array2table(strtrim(split(Tsample.Var1,'|')))
parts = cellfun(@(s)split(s,"_"),TSplit.Var2,'UniformOutput',false)
gives the broken up pieces in a cell array of cell arrays.
I would think for purposes of this problem that's still not helpful because not only is the demarcator symbol "_", but also some of the experiment identifiers contain the same character within them, e.g., CH_MLD. I guess if it is important to actually split the string into the parts, you'd have to use a matching strategy as you identified, and I guess regexp could work although as you mentioned it would be pretty challenging to do it in one go.

Sign in to comment.

 Accepted Answer

Is this what you are looking for?
load("~/Downloads/Sample_table_MatLab.mat")
a = ["CG", "CG_LOW", "CG_HGH", "MLD", "MLD_LOW", "MLD_HGH", "MOD", "MOD_LOW", "MOD_HGH"];
b = ["11", "12", "21", "22"];
c = ["FCL", "FCR", "CPL", "CPR", "POL", "POR"];
TSplit = array2table(strtrim(split(Tsample.Var1,'|')))
for i = 1:height(TSplit)
matches = regexp(TSplit.Var1{i},cellstr(c),"match");
TSplit.Var3(i) = matches{~cellfun('isempty',matches)};
end
TSplit.Var4 = string(TSplit.Var2)+"_"+string(TSplit.Var3)

2 Comments

Yes that was it! Thank you.
Ok, glad that was it. But the more general problem if things to the right of "|" had not already been in order is circumvented here, which only extracts one of the substrings specified in "c" from the left side of "|" and joins it to the right side with "_".

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!