Create tables based on partial match in variable names in other table

Hi Matlab-gurus,
I have a question on creating new tables based on partial matches in variable names from an original table. I'm trying with the strcmp-function and and eval-function. I know this is not optimal but it's my best shot for now. See below for my attached code. matchVals are the partial matches in the variable names that I consider when creating the new tables. The matchvals are on position 5 to 8 (see strcat).
Thanks in advance.
names5 = Table_A.Properties.VariableNames;%1
names6 = Table_B.Properties.VariableNames;%1
matchVals = {'1234', '5678','9101','1123'}; %part of the variable names that match => four new tables
numTables = numel(matchVals);
%%
tableNames_ = cell(numTables,1);
for k = 1:numel(matchVals)
idl5 = cellfun(@(x) strcmp(x(5),matchVals{k})... %assessing position 5 to 10 for correct variable names
&&strcmp(x(6),matchVals{k})...
&&strcmp(x(7),matchVals{k})...
&&strcmp(x(8),matchVals{k}),names5);
idl6 = cellfun(@(x) strcmp(x(5),matchVals{k})...%assessing position 5 to 10 for correct variable names
&&strcmp(x(6),matchVals{k})...
&&strcmp(x(7),matchVals{k})...
&&strcmp(x(8),matchVals{k}),names6);
%
eval(['Summary',matchVals{k},' = [Table_A(:,idl5) Table_B(:,idl6)]']);
tableNames_EC2217{k} = ['Summary',matchVals{k}]; %trying to create new table
end

 Accepted Answer

It's not clear why you want to split a single table into 4 tables, but here's one way:
T = readtable('m.xlsx');
head(T)
ans = 8×8 table
absd1234xs absd5678xs aood9101xs aood1123xs adad1234xs aiid5678xs adsd9101xs addd1123xs __________ __________ __________ __________ __________ __________ __________ __________ 1 2 3 4 5 6 8 1 3 2 2 312 1 2 1 5 4 2 1 2 1 1 2 6 5.6667 2 0 104 -1.6667 -2 -2.3333 9 7.1667 2 -1 103 -3.6667 -4.5 -5.3333 11.5 8.6667 2 -2 102 -5.6667 -7 -8.3333 14 10.167 2 -3 101 -7.6667 -9.5 -11.333 16.5 11.667 2 -4 100 -9.6667 -12 -14.333 19
matchVals = {'1234', '5678','9101','1123'};
n_match = numel(matchVals);
T_new = cell(1,n_match);
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
T_new{k} = T(:,idx);
end
T_new % cell array of tables
T_new = 1×4 cell array
{10000×2 table} {10000×2 table} {10000×2 table} {10000×2 table}
head(T_new{1})
ans = 8×2 table
absd1234xs adad1234xs __________ __________ 1 5 3 1 4 1 5.6667 -1.6667 7.1667 -3.6667 8.6667 -5.6667 10.167 -7.6667 11.667 -9.6667
head(T_new{2})
ans = 8×2 table
absd5678xs aiid5678xs __________ __________ 2 6 2 2 2 1 2 -2 2 -4.5 2 -7 2 -9.5 2 -12
It would be better to index into the existing table:
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
% do what you need to do with T(:,idx)
end

11 Comments

Thank you so much for the answer, exactly what i was looking for.
Kind regards
I have a small thought. The output now is a 1x4 cell, is it possible to get 4 matrices as the output from the for loop above? I tried with cell2mat but I get an error each time...
More specifically, I tried with:
T_new{k} = cell2mat(T_new{1,1:k})
at the end of the for loop you presented above. In addition, is it possible to connect the matchVals with the "new" table names?
Thank you
is it possible to get 4 matrices as the output from the for loop above?
Can you do this, creating dynamically named variables? Yes.
Should you do this? The general consensus is no. See that Answers post for an explanation and alternatives.
I know it is not optimal. However, I have a similar table with a significant number of columns (>500) in which I want to compare two rows simultanously which gives me more than 250 combinations in total. Furthermore, by this comparison I can visualize the data set and thereafter perform necessary mathematical operations...
"is it possible to get 4 matrices as the output ... [and] connect the matchVals with the "new" table names?"
Maybe use a struct array:
T = readtable('m.xlsx');
matchVals = {'1234', '5678','9101','1123'};
n_match = numel(matchVals);
S = struct('name',matchVals,'data',cell(1,n_match));
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
S(k).data = table2array(T(:,idx));
end
S
S = 1×4 struct array with fields:
name data
S(1)
ans = struct with fields:
name: '1234' data: [10000×2 double]
S(1).data
ans = 10000×2
1.0000 5.0000 3.0000 1.0000 4.0000 1.0000 5.6667 -1.6667 7.1667 -3.6667 8.6667 -5.6667 10.1667 -7.6667 11.6667 -9.6667 13.1667 -11.6667 14.6667 -13.6667
Thank you for the reply. It is very close to what I want to get. Is it possible to get 4 separate structs/tables as direct output from the loop itself? The structs/tables should be based on matchVals = {'1234', '5678','9101','1123'} and each contain their data values according to your comment above?
The reason for my question is that I have a huge file with hundreds of columns that I will need to process several times. Hence, in terms of time-effectiveness, I believe it is best to do it within the loop.
Thank u again
"Is it possible to get 4 separate structs/tables as direct output from the loop itself?"
Yes, my original answer showed how to get 4 separate tables - all contained in a cell array - and my most recent comment showed how to get 4 separate structs - all contained in a struct array.
If you are under the impression that a few indexing operations, e.g., having to do S(1) instead of S1, is going to have an impact on the time it takes the code to run, don't worry about that, it won't.
In fact, it may be that the most efficient approach is what was stated at the end of my original answer, which is to avoid splitting apart the table at all, and operate on it directly (thus avoiding making a copy of the whole thing):
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
% do what you need to do with T(:,idx),
% which is operating on T directly, within the loop
end
I see. I think I explained my dilemma insufficiently. Your first example creates 4 separate tables perfectly as you say (in a cell array). And your second example in a struct array.
I understand that computational effort won't be significantly affected by indexing operations.
I think I'm doing the process a bit more difficult than it should be. I'm not so common working with cells since I'm quite a beginner. But probably it is easier to do the mathematical operation just within the cell according to your latest suggestion. I guess it is not any more difficult than normal table operation in a loop (what i saw on mathworks at least), as well as plotting. However, I tried to add the values within the same "matchvals" and could not make it to work. Hence, my effort to divide the cell within the loop.
A last question, in order to add the values within two columns in the table "T" that corresond to the same matchvals- is it something according to the code below. This should correspond to four new columns from the original eigh(since I'm adding the values of the sum from two columns). I get the error "uncrecognized table vairable name "properties")
Thank you
T.sum=T.Properties(:,idx)+EC2217_T.Properties(:,idx)
T.Properties is a TableProperties object, essentially a scalar struct, so it doesn't make sense to try to index its columns with T.Properties(:,idx).
T = readtable('m.xlsx');
T.Properties
ans =
TableProperties with properties: Description: '' UserData: [] DimensionNames: {'Row' 'Variables'} VariableNames: {'absd1234xs' 'absd5678xs' 'aood9101xs' 'aood1123xs' 'adad1234xs' 'aiid5678xs' 'adsd9101xs' 'addd1123xs'} VariableDescriptions: {} VariableUnits: {} VariableContinuity: [] RowNames: {} CustomProperties: No custom properties are set. Use addprop and rmprop to modify CustomProperties.
You can sum all the columns of T with variable names matching each element of matchVals, by doing sum(T{:,idx},2) (here I'm storing the sums in the struct array and also in new columns of T - two different options):
matchVals = {'1234', '5678','9101','1123'};
n_match = numel(matchVals);
S = struct('name',matchVals,'data',cell(1,n_match));
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
% sum all columns of T matching matchVals{k},
% store in data field of S(k):
S(k).data = sum(T{:,idx},2);
% --- or ---
% sum all columns of T matching matchVals{k},
% store in a new column of T called "sum_1234", etc.:
T.(['sum_' matchVals{k}]) = sum(T{:,idx},2);
end
head(T) % now T has new columns, sum_1234, etc.
ans = 8×12 table
absd1234xs absd5678xs aood9101xs aood1123xs adad1234xs aiid5678xs adsd9101xs addd1123xs sum_1234 sum_5678 sum_9101 sum_1123 __________ __________ __________ __________ __________ __________ __________ __________ ________ ________ ________ ________ 1 2 3 4 5 6 8 1 6 8 11 5 3 2 2 312 1 2 1 5 4 4 3 317 4 2 1 2 1 1 2 6 5 3 3 8 5.6667 2 0 104 -1.6667 -2 -2.3333 9 4 0 -2.3333 113 7.1667 2 -1 103 -3.6667 -4.5 -5.3333 11.5 3.5 -2.5 -6.3333 114.5 8.6667 2 -2 102 -5.6667 -7 -8.3333 14 3 -5 -10.333 116 10.167 2 -3 101 -7.6667 -9.5 -11.333 16.5 2.5 -7.5 -14.333 117.5 11.667 2 -4 100 -9.6667 -12 -14.333 19 2 -10 -18.333 119
S(1).data % now each S(k).data is a single column
ans = 10000×1
6.0000 4.0000 5.0000 4.0000 3.5000 3.0000 2.5000 2.0000 1.5000 1.0000
thank you. This is preferrable compared to working with tons of tables. I even managed to include the matching variables as headings in the corresponding plots. Not the easiest to do include legends for the columns in each plot haha, but I will manage

Sign in to comment.

More Answers (1)

Convert your cellstrs to strings
string(t.Properties.VariableNames)
Then you can use any of the easy string matching functions like matches startsWith or any of the patterns.

2 Comments

I changed the cellstr to strings but cannot manage to identify the patters. Do you maybe know how i could do this within my loop attached above?
My four table outputs are of type: 10 000×0 empty table (i.e. the rows are empty which they should not be)
I attached the file I'm working with. I want to create four new columns based on the coloring of the variables (and the matches are based on the strings in bold)

Sign in to comment.

Categories

Products

Release

R2022a

Asked:

on 6 May 2022

Commented:

on 15 May 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!