Concatenate multiple tables with different variable names vertically to a big table

Hi,
I have 20 tables in the matlab workplace with the the same dimensions (1000x8). However, the variable names are not the same. How can i concatenate the 20 tables vertically? I do not need the variable names in the big table, I'm just interest in the values within the 20 tables...Please note that the first column is datetime in each table. This, among other aspects, creates difficulties for me. The datetimes are not necessary either.
Helpful for any advice.
Kind regards

7 Comments

Convert the tables to cell arrays then concatenate, then convert back to a table.
Or edit the Variables names to be same and then vertically concatenate.
Converting a table to a cell array is generally not a good idea. Each element of the table (not each variable in the table) becomes a separate cell. Memory issues aside (with 1000x8, maybe not a problem, but with 1000000x8, definitely), you lose the homogeneity of each variable in the table, and run the risk of ending up with some cells containing, e.g., numbers and others containing who knows what. And the operations you can perform on a cell array are limited. I realize that you've suggested converting back, but table2cell is a risky tool to use, and your second approach is the way to go.
@Peter Perkins - Yes, I agree that the operations that can be performed on a cell array are limited.
But doesn't MATLAB store tables as a cell arrays internally? (I remember the lengthy thread where Walter demonstrated this)
"but table2cell is a risky tool to use"
Could you elaborate on this?
I interpreted "convert table to cell array" as "call table2cell". I may have misunderstood.
Consider this:
X = rand(1000000,10);
T = array2table(X);
C = table2cell(T);
whos
Name Size Bytes Class Attributes
C 1000000x10 1120000000 cell
T 1000000x10 80002889 table
X 1000000x10 80000000 double
X stores 10M doubles as one contiguous array.
T stores 10 vectors, each 1M doubles. There are 11 arrays (the eleventh is the cell array that Walter referred to, but it's not visible and you shouldn't count on it).
C stores 10M arrays, each a scalar double. It's the memory of 10M array "headers" (something that's not visible, but that whos, for a cell array, accounts for). So when I said "table2cell is risky", I meant you may use up A LOT of memory you did not mean to.
"I interpreted "convert table to cell array" as "call table2cell". I may have misunderstood."
No, you were right to interpret it that way. That's what I meant.
Ok, yes, I kinda forgot that table2cell() converts to cell where each element of the table is stored as a scalar. And the overhead is quite hefty.
I have one more question - Why was table2cell() made to behave this way? Why not convert each column of table as a cell element? I feel that this is something that should have been discussed when the function was being written.
One way around this is to call TABLE2STRUCT with ToScalar=true (thus keeping the table columns/variables together), then STRUCT2CELL. I guess due to copy-on-write the conversion won't even copy any data in memory.
X = rand(1000000,10);
T = array2table(X);
C = struct2cell(table2struct(T,'ToScalar',true));
whos
Name Size Bytes Class Attributes C 10x1 80001040 cell T 1000000x10 80002889 table X 1000000x10 80000000 double
Given that tables apparently store the column/variable data in a cell array it would be nice to have a way to access that cell array efficiently, e.g. to have ToRowVector option for TABLE2CELL (much like TABLE2STRUCT has the ToScalar option). And also the corresponding FromRowVector for CELL2TABLE.
If you find yourself needing functions like these, you may be doing something unnecessary. In this case, the much more straight-forward solution is to just set the variables names all the same.

Sign in to comment.

 Accepted Answer

You can put all those tables into a cell array, then make all the tables in the cell array have the same set of variable names, and finally vertically concatenate the tables in the cell array into a single table.
Here's an example with two tables that have 5 variables each with different names.
% create two tables with 5 variables each:
Nvar = 5;
T1 = array2table(magic(Nvar))
T1 = 5×5 table
Var1 Var2 Var3 Var4 Var5 ____ ____ ____ ____ ____ 17 24 1 8 15 23 5 7 14 16 4 6 13 20 22 10 12 19 21 3 11 18 25 2 9
T2 = array2table(eye(Nvar),'VariableNames',"OtherVar"+(1:Nvar))
T2 = 5×5 table
OtherVar1 OtherVar2 OtherVar3 OtherVar4 OtherVar5 _________ _________ _________ _________ _________ 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
% put the tables in a cell array C:
C = {T1,T2};
% pick some new variable names (I'm using "1" to "5"):
new_names = string(1:Nvar);
% rename the variables in each table in C:
C = cellfun(@(t)renamevars(t,t.Properties.VariableNames,new_names),C,'uni',0);
% vertically concatenate the contents of C into one table:
T = vertcat(C{:})
T = 10×5 table
1 2 3 4 5 __ __ __ __ __ 17 24 1 8 15 23 5 7 14 16 4 6 13 20 22 10 12 19 21 3 11 18 25 2 9 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1

3 Comments

Thank you for the help! Works fine! A question though. Is it possible to do this automatically? Since I have ten tables I need to process each table separately. This is my code when I read the various excel files
% List all Excel files in the folder
excelFiles = dir(fullfile(folder_path, '*.xlsx'));
% Loop through each Excel file
for i = 1:length(excelFiles)
% Construct the full file path
file_path = fullfile(folder_path, excelFiles(i).name);
% Read the data from the Excel file and create a table
table_name = genvarname(excelFiles(i).name(1:end-5)); % Extracting the name without the extension
current_table = readtable(file_path);
% Remove the first four rows from the table
current_table(1:4, :) = [];
% Add a column with the file name to the table
current_table.FileName = repmat({table_name}, size(current_table, 1), 1);
% Assign the modified table to a variable named after the file name
assignin('base', table_name, current_table);
% Store the table in the cell array
data_tables{i} = current_table;
end

Sign in to comment.

More Answers (1)

Are you trying to combine rows of those tables based on the date and time information stored in the first variable? If so I would try using table2timetable to turn those tables into timetables then use synchronize to combine the timetables based on their row times.

Categories

Products

Release

R2020b

Asked:

on 15 Nov 2023

Commented:

on 20 Nov 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!