- Flatten the table to 2D with the header meta data as actual data so the column headings would simply be Major, Class, Gender, Nationality or
- You could try the MATLAB map container which uses key value indexing...but would still be the above keys.
How to set variable names using Readtable with hierarchical categories (Excel file with merged cells / multiple header rows)
12 views (last 30 days)
Show older comments
I have an Excel file with a series of header rows containing a hierarchy of descriptors. I would like to use this existing hierarchy to access the data. In the image below, I have data for various engineering majors (electical, mechincal, ...) that are split by year of study (freshman, sophomore, ...), then by gender, and finally my domestic versus international. This heirachy repeats for each major and there are many rows of data corresponding to academic years (not pictured in the table below) going back a long way. Is there a way to use readtable or a similar function to load that data in such a way that I can retrieve a specific number. For example, can I import the data in such a way that I could retrieve it with something like the following pseudo-code
table = readtable('filename', <other options>)
table(major=electrical, gender=female, <additional options>)
% or even
table(row1='electrical', row2='female', <additional options>)
Alternatively, can I manipulate the way column names are created? WIthout any tweaking I get names like
'dometic', 'domestic_1', 'domestic_2', ...
What I would prefer is to get column names such as
'electrical-freshman-male-domestic'
'electrical-freshman-female-domestic'
'electrical-freshman-unspecified-domestic'
'electrical-sophomore-male-domestic'
...
and likewise for other combinations so that I drill down to the appropriate column of data.

3 Comments
Stephen23
on 2 May 2025
That is a very unfortunate data design. Processing that data (in MS Excel, MATLAB, and most other tools) would likely be much simpler if the data was flattened into e.g. five columns (exactly as dpb suggested):
Accepted Answer
Stephen23
on 4 May 2025
Edited: Stephen23
on 5 May 2025
In lieu of a sample data file I created my own (attached) which looks like this (not all columns shown):

If you must keep that hierarchy in the imported data then consider using nested structures:
fnm = 'hierHead.xlsx';
mat = readmatrix(fnm, 'Range',5);
hdr = readcell(fnm, 'Range','1:4', 'MergedCellColumnRule','duplicate');
hdr(:,2:end) = regexprep(hdr(:,2:end),'\W+','');
out = struct();
for k = 2:size(hdr,2)
out = setfield(out,hdr{:,k},mat(:,k));
end
All of the (non-year) data is stored in nested structures within OUT, for example:
out.ElectricalEngineering.Junior.Female.Domestic
But flattening the data and storing it in one table would likely make it much easier to analyze and process:
Year = mat(:,1);
cvn = {'Department','Class','Gender','Residency'};
out = num2cell(mat(:,2:end),1);
fnh = @(Data) table(Year,Data);
out = cellfun(fnh,out,'uni',0);
for ii = 1:numel(out)
for jj = 1:numel(cvn)
out{ii}{:,cvn{jj}} = hdr(jj,ii+1);
end
end
out = vertcat(out{:})
Which would allow you to use the very efficient inbuilt routines for processing groups in tables:
https://www.mathworks.com/help/matlab/matlab_prog/grouped-calculations-in-tables-and-timetables.html
etc.
1 Comment
dpb
on 4 May 2025
Edited: dpb
on 6 May 2025
Just came back to illustrate with @Cris LaPierre's example dataset and you had already done, @Stephen23...
I'd only add that for efficiency, then
out=convertvars(out,@iscellstr,'categorical');
More Answers (1)
Cris LaPierre
on 2 May 2025
There may be a better way, but I would do this in 2 steps. First, read the headers and create variable names, then read the data and assign variable names.
Here's an example using a dummy data set I created. I used the underscore instead of the dash between names to simplify accessing the data (simplest with valid MATLAB variable names).
% create variable names that combine all headers
T = readcell("hierHead.xlsx","MergedCellColumnRule","duplicate","Range","1:4");
H = string(T);
H = erase(H,[" ","Engineering"]);
varNms = join(H,"_",1);
varNms(ismissing(varNms)) = H(1,ismissing(varNms))
% Read the data
D = readtable("hierHead.xlsx","NumHeaderLines",4,"ReadVariableNames",false);
D.Properties.VariableNames = varNms
D.Electrical_Freshman_Female_International
0 Comments
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!