How to set variable names using Readtable with hierarchical categories (Excel file with merged cells / multiple header rows)

12 views (last 30 days)
I have an Excel file with a series of header rows containing a hierarchy of descriptors. I would like to use this existing hierarchy to access the data. In the image below, I have data for various engineering majors (electical, mechincal, ...) that are split by year of study (freshman, sophomore, ...), then by gender, and finally my domestic versus international. This heirachy repeats for each major and there are many rows of data corresponding to academic years (not pictured in the table below) going back a long way. Is there a way to use readtable or a similar function to load that data in such a way that I can retrieve a specific number. For example, can I import the data in such a way that I could retrieve it with something like the following pseudo-code
table = readtable('filename', <other options>)
table(major=electrical, gender=female, <additional options>)
% or even
table(row1='electrical', row2='female', <additional options>)
Alternatively, can I manipulate the way column names are created? WIthout any tweaking I get names like
'dometic', 'domestic_1', 'domestic_2', ...
What I would prefer is to get column names such as
'electrical-freshman-male-domestic'
'electrical-freshman-female-domestic'
'electrical-freshman-unspecified-domestic'
'electrical-sophomore-male-domestic'
...
and likewise for other combinations so that I drill down to the appropriate column of data.
  3 Comments
Stephen23
Stephen23 on 2 May 2025
That is a very unfortunate data design. Processing that data (in MS Excel, MATLAB, and most other tools) would likely be much simpler if the data was flattened into e.g. five columns (exactly as dpb suggested):
dpb
dpb on 3 May 2025
Edited: dpb on 3 May 2025
Indeed. Typing in those unique variable names invidually is going to get very old, very fast methinks...and not at all amenable to generalizing code; lots of copy 'n paste.

Sign in to comment.

Accepted Answer

Stephen23
Stephen23 on 4 May 2025
Edited: Stephen23 on 5 May 2025
In lieu of a sample data file I created my own (attached) which looks like this (not all columns shown):
If you must keep that hierarchy in the imported data then consider using nested structures:
fnm = 'hierHead.xlsx';
mat = readmatrix(fnm, 'Range',5);
hdr = readcell(fnm, 'Range','1:4', 'MergedCellColumnRule','duplicate');
hdr(:,2:end) = regexprep(hdr(:,2:end),'\W+','');
out = struct();
for k = 2:size(hdr,2)
out = setfield(out,hdr{:,k},mat(:,k));
end
All of the (non-year) data is stored in nested structures within OUT, for example:
out.ElectricalEngineering.Junior.Female.Domestic
ans = 9×1
151 152 153 154 155 156 157 158 159
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
But flattening the data and storing it in one table would likely make it much easier to analyze and process:
Year = mat(:,1);
cvn = {'Department','Class','Gender','Residency'};
out = num2cell(mat(:,2:end),1);
fnh = @(Data) table(Year,Data);
out = cellfun(fnh,out,'uni',0);
for ii = 1:numel(out)
for jj = 1:numel(cvn)
out{ii}{:,cvn{jj}} = hdr(jj,ii+1);
end
end
out = vertcat(out{:})
out = 432x6 table
Year Data Department Class Gender Residency ____ ____ _________________________ ____________ ________ _________________ 1990 11 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1991 12 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1992 13 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1993 14 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1994 15 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1995 16 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1996 17 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1997 18 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1998 19 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'Domestic' } 1990 21 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1991 22 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1992 23 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1993 24 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1994 25 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1995 26 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'} 1996 27 {'ElectricalEngineering'} {'Freshman'} {'Male'} {'International'}
Which would allow you to use the very efficient inbuilt routines for processing groups in tables:
etc.
  1 Comment
dpb
dpb on 4 May 2025
Edited: dpb on 6 May 2025
Just came back to illustrate with @Cris LaPierre's example dataset and you had already done, @Stephen23...
I'd only add that for efficiency, then
out=convertvars(out,@iscellstr,'categorical');

Sign in to comment.

More Answers (1)

Cris LaPierre
Cris LaPierre on 2 May 2025
There may be a better way, but I would do this in 2 steps. First, read the headers and create variable names, then read the data and assign variable names.
Here's an example using a dummy data set I created. I used the underscore instead of the dash between names to simplify accessing the data (simplest with valid MATLAB variable names).
% create variable names that combine all headers
T = readcell("hierHead.xlsx","MergedCellColumnRule","duplicate","Range","1:4");
H = string(T);
H = erase(H,[" ","Engineering"]);
varNms = join(H,"_",1);
varNms(ismissing(varNms)) = H(1,ismissing(varNms))
varNms = 1x49 string array
Columns 1 through 7 "Year" "Electrical_Fresh..." "Electrical_Fresh..." "Electrical_Fresh..." "Electrical_Fresh..." "Electrical_Fresh..." "Electrical_Fresh..." Columns 8 through 14 "Electrical_Sopho..." "Electrical_Sopho..." "Electrical_Sopho..." "Electrical_Sopho..." "Electrical_Sopho..." "Electrical_Sopho..." "Electrical_Junio..." Columns 15 through 21 "Electrical_Junio..." "Electrical_Junio..." "Electrical_Junio..." "Electrical_Junio..." "Electrical_Junio..." "Electrical_Senio..." "Electrical_Senio..." Columns 22 through 28 "Electrical_Senio..." "Electrical_Senio..." "Electrical_Senio..." "Electrical_Senio..." "Mechanical_Fresh..." "Mechanical_Fresh..." "Mechanical_Fresh..." Columns 29 through 35 "Mechanical_Fresh..." "Mechanical_Fresh..." "Mechanical_Fresh..." "Mechanical_Sopho..." "Mechanical_Sopho..." "Mechanical_Sopho..." "Mechanical_Sopho..." Columns 36 through 42 "Mechanical_Sopho..." "Mechanical_Sopho..." "Mechanical_Junio..." "Mechanical_Junio..." "Mechanical_Junio..." "Mechanical_Junio..." "Mechanical_Junio..." Columns 43 through 49 "Mechanical_Junio..." "Mechanical_Senio..." "Mechanical_Senio..." "Mechanical_Senio..." "Mechanical_Senio..." "Mechanical_Senio..." "Mechanical_Senio..."
% Read the data
D = readtable("hierHead.xlsx","NumHeaderLines",4,"ReadVariableNames",false);
D.Properties.VariableNames = varNms
D = 19x49 table
Year Electrical_Freshman_Male_Domestic Electrical_Freshman_Male_International Electrical_Freshman_Female_Domestic Electrical_Freshman_Female_International Electrical_Freshman_Unspecified_Domestic Electrical_Freshman_Unspecified_International Electrical_Sophomore_Male_Domestic Electrical_Sophomore_Male_International Electrical_Sophomore_Female_Domestic Electrical_Sophomore_Female_International Electrical_Sophomore_Unspecified_Domestic Electrical_Sophomore_Unspecified_International Electrical_Junior_Male_Domestic Electrical_Junior_Male_International Electrical_Junior_Female_Domestic Electrical_Junior_Female_International Electrical_Junior_Unspecified_Domestic Electrical_Junior_Unspecified_International Electrical_Senior_Male_Domestic Electrical_Senior_Male_International Electrical_Senior_Female_Domestic Electrical_Senior_Female_International Electrical_Senior_Unspecified_Domestic Electrical_Senior_Unspecified_International Mechanical_Freshman_Male_Domestic Mechanical_Freshman_Male_International Mechanical_Freshman_Female_Domestic Mechanical_Freshman_Female_International Mechanical_Freshman_Unspecified_Domestic Mechanical_Freshman_Unspecified_International Mechanical_Sophomore_Male_Domestic Mechanical_Sophomore_Male_International Mechanical_Sophomore_Female_Domestic Mechanical_Sophomore_Female_International Mechanical_Sophomore_Unspecified_Domestic Mechanical_Sophomore_Unspecified_International Mechanical_Junior_Male_Domestic Mechanical_Junior_Male_International Mechanical_Junior_Female_Domestic Mechanical_Junior_Female_International Mechanical_Junior_Unspecified_Domestic Mechanical_Junior_Unspecified_International Mechanical_Senior_Male_Domestic Mechanical_Senior_Male_International Mechanical_Senior_Female_Domestic Mechanical_Senior_Female_International Mechanical_Senior_Unspecified_Domestic Mechanical_Senior_Unspecified_International ____ _________________________________ ______________________________________ ___________________________________ ________________________________________ ________________________________________ _____________________________________________ __________________________________ _______________________________________ ____________________________________ _________________________________________ _________________________________________ ______________________________________________ _______________________________ ____________________________________ _________________________________ ______________________________________ ______________________________________ ___________________________________________ _______________________________ ____________________________________ _________________________________ ______________________________________ ______________________________________ ___________________________________________ _________________________________ ______________________________________ ___________________________________ ________________________________________ ________________________________________ _____________________________________________ __________________________________ _______________________________________ ____________________________________ _________________________________________ _________________________________________ ______________________________________________ _______________________________ ____________________________________ _________________________________ ______________________________________ ______________________________________ ___________________________________________ _______________________________ ____________________________________ _________________________________ ______________________________________ ______________________________________ ___________________________________________ 1990 0.82512 0.78267 0.50915 0.39753 0.26743 0.17232 0.15144 0.64288 0.68656 0.23464 0.039588 0.21205 0.93701 0.90131 0.31666 0.97826 0.80373 0.89721 0.7915 0.38041 0.58364 0.46538 0.10598 0.43168 0.41302 0.84744 0.78618 0.35872 0.79838 0.17831 0.79529 0.29643 0.61139 0.95578 0.072898 0.18101 0.016447 0.67959 0.19957 0.89132 0.86831 0.92052 0.53564 0.71479 0.36787 0.97786 0.29973 0.99646 1991 0.36499 0.50844 0.099762 0.94665 0.59379 0.50436 0.3379 0.63741 0.66871 0.19614 0.8787 0.92028 0.46331 0.68441 0.91427 0.45594 0.066546 0.2119 0.70773 0.83506 0.49742 0.21702 0.97537 0.36453 0.045695 0.99532 0.71756 0.90853 0.29609 0.34201 0.12176 0.58305 0.55403 0.023444 0.15205 0.42393 0.98401 0.75749 0.4069 0.60612 0.70031 0.72628 0.20663 0.37473 0.21994 0.82508 0.60704 0.80611 1992 0.66878 0.54892 0.87242 0.2767 0.41184 0.35512 0.5291 0.041398 0.3042 0.79475 0.83048 0.064178 0.37132 0.34124 0.63594 0.2537 0.21315 0.61917 0.044526 0.012079 0.82558 0.6292 0.22512 0.44535 0.16527 0.8149 0.27949 0.44477 0.49864 0.50414 0.36724 0.24696 0.97724 0.65522 0.74158 0.22879 0.4215 0.30945 0.95284 0.15506 0.0075198 0.18381 0.51964 0.746 0.95002 0.10786 0.78777 0.22038 1993 0.15787 0.60315 0.12515 0.23183 0.43836 0.46658 0.097691 0.55479 0.26144 0.18716 0.81963 0.45767 0.98965 0.80697 0.197 0.16901 0.8064 0.8385 0.55436 0.84738 0.15679 0.073237 0.30943 0.39855 0.30856 0.90091 0.33364 0.759 0.1725 0.14003 0.24004 0.0078137 0.4522 0.86567 0.1455 0.55247 0.96247 0.34221 0.26277 0.62835 0.3361 0.76952 0.72094 0.44897 0.78224 0.10892 0.28969 0.036133 1994 0.48205 0.2447 0.94966 0.88506 0.20088 0.32552 0.44332 0.80385 0.86339 0.72825 0.033694 0.42427 0.37457 0.84234 0.11828 0.61242 0.82171 0.81283 0.11541 0.073686 0.42315 0.98193 0.095135 0.095991 0.24811 0.24644 0.91394 0.71961 0.46448 0.53789 0.29168 0.28198 0.93329 0.65567 0.32048 0.63332 0.70588 0.71266 0.88985 0.91285 0.48824 0.41772 0.44093 0.9503 0.40887 0.16851 0.64332 0.61363 1995 0.55244 0.52187 0.23351 0.96077 0.52883 0.44635 0.2387 0.4808 0.81223 0.75437 0.32965 0.45263 0.89608 0.1871 0.36762 0.98423 0.97321 0.27165 0.53556 0.86962 0.1236 0.70078 0.62588 0.67483 0.045114 0.11142 0.87904 0.82902 0.61459 0.032371 0.05777 0.35375 0.032176 0.86429 0.43846 0.44867 0.54241 0.40367 0.15312 0.40803 0.91071 0.24825 0.8769 0.8001 0.84253 0.80514 0.46434 0.17695 1996 0.87238 0.094371 0.57317 0.42577 0.89433 0.068117 0.75801 0.72346 0.079957 0.66714 0.64628 0.91385 0.41877 0.27139 0.23085 0.21092 0.8759 0.49093 0.66588 0.34272 0.66619 0.33652 0.75108 0.87092 0.64349 0.057792 0.61721 0.91952 0.33395 0.72714 0.43055 0.41956 0.21084 0.21823 0.51615 0.41659 0.17926 0.022884 0.56003 0.30691 0.37919 0.069978 0.025002 0.29509 0.81851 0.028387 0.54359 0.29817 1997 0.70486 0.83104 0.5644 0.88504 0.26145 0.13781 0.18327 0.99033 0.081003 0.55415 0.38559 0.019403 0.61598 0.94858 0.066764 0.76926 0.8002 0.20367 0.57328 0.81929 0.39293 0.812 0.5527 0.17528 0.72545 0.58072 0.36541 0.82325 0.32379 0.41548 0.16604 0.37801 0.10824 0.29444 0.744 0.31844 0.41784 0.034818 0.69086 0.22978 0.33338 0.80133 0.73285 0.89409 0.24927 0.51505 0.18502 0.39573 1998 0.3034 0.67519 0.33307 0.038338 0.89854 0.91596 0.80999 0.49586 0.3408 0.69358 0.18906 0.11525 0.67571 0.24928 0.85617 0.9261 0.62678 0.68851 0.033672 0.078174 0.43894 0.21662 0.64097 0.8517 0.30689 0.1159 0.33774 0.99746 0.050749 0.15695 0.018406 0.53235 0.46456 0.95329 0.46721 0.55981 0.42493 0.64136 0.97849 0.46878 0.68805 0.80671 0.7693 0.2766 0.36889 0.30159 0.88479 0.14429 1999 0.51885 0.7822 0.8718 0.83821 0.34093 0.49248 0.81068 0.27452 0.19388 0.082377 0.21388 0.60686 0.54174 0.039243 0.13695 0.68766 0.12071 0.61572 0.42903 0.96018 0.58776 0.88351 0.41622 0.50952 0.59446 0.34684 0.904 0.16056 0.53255 0.5967 0.93404 0.26255 0.48301 0.30569 0.24822 0.14738 0.43934 0.43511 0.1868 0.81086 0.88385 0.58415 0.38419 0.68113 0.34531 0.27472 0.16601 0.50847 2000 0.84217 0.15489 0.94264 0.89512 0.8079 0.5356 0.74946 0.5554 0.36335 0.5186 0.39424 0.77572 0.89961 0.94907 0.44989 0.91345 0.58184 0.016635 0.48453 0.6914 0.11794 0.26826 0.22158 0.82138 0.29474 0.2142 0.48889 0.44215 0.80556 0.33823 0.57333 0.13206 0.6206 0.13225 0.4453 0.51617 0.67215 0.5524 0.94336 0.6889 0.93239 0.75244 0.071508 0.31044 0.55256 0.23457 0.77615 0.57046 2001 0.14795 0.7309 0.45623 0.80185 0.87418 0.61292 0.88616 0.36865 0.19615 0.7757 0.52935 0.010001 0.25692 0.72877 0.32129 0.98995 0.97829 0.50388 0.14527 0.12935 0.32459 0.93193 0.72906 0.27755 0.75065 0.37964 0.97785 0.017558 0.34185 0.17775 0.99651 0.95002 0.3966 0.53596 0.017218 0.30847 0.090967 0.058832 0.35689 0.16327 0.5656 0.089457 0.93066 0.56682 0.27226 0.78406 0.39772 0.40753 2002 0.9749 0.85883 0.15255 0.66029 0.87383 0.17436 0.29177 0.10799 0.17232 0.62182 0.2119 0.13272 0.78882 0.96222 0.21 0.2278 0.51882 0.59303 0.70576 0.96476 0.5872 0.57689 0.51545 0.33597 0.7558 0.83687 0.63966 0.98211 0.61378 0.026278 0.15287 0.79612 0.19472 0.40253 0.88561 0.18327 0.61061 0.87097 0.85266 0.088194 0.24101 0.43888 0.61058 0.6309 0.25078 0.05485 0.80739 0.80857 2003 0.9088 0.91561 0.091528 0.89033 0.11745 0.76991 0.34158 0.31264 0.12642 0.42046 0.33401 0.10597 0.64757 0.22378 0.067714 0.96688 0.50892 0.89611 0.20859 0.79804 0.91815 0.022013 0.088396 0.56556 0.96917 0.53804 0.56276 0.44936 0.46878 0.17235 0.81471 0.13418 0.29375 0.64198 0.24389 0.79869 0.61115 0.60982 0.51437 0.38435 0.2661 0.8388 0.42999 0.96353 0.36878 0.47319 0.73804 0.90176 2004 0.52399 0.99063 0.95775 0.77195 0.14655 0.48942 0.999 0.75323 0.38634 0.90244 0.15092 0.59502 0.084557 0.5844 0.3482 0.12226 0.039125 0.80138 0.80376 0.058658 0.048616 0.54448 0.32696 0.10214 0.44193 0.42905 0.96886 0.83527 0.49083 0.89146 0.48968 0.21296 0.062806 0.67799 0.081267 0.93008 0.73602 0.49318 0.78987 0.45848 0.2402 0.13975 0.28827 0.88778 0.46759 0.20584 0.37363 0.7283 2005 0.74634 0.32467 0.93116 0.83892 0.93106 0.10606 0.77576 0.71107 0.19385 0.067241 0.29217 0.68057 0.014981 0.84311 0.55559 0.81001 0.93351 0.64989 0.55738 0.088996 0.6587 0.18422 0.32915 0.89101 0.074852 0.54196 0.17692 0.79831 0.26438 0.88586 0.76092 0.51649 0.59238 0.97776 0.90431 0.88055 0.30202 0.74795 0.72841 0.7026 0.57012 0.50334 0.87912 0.1649 0.14082 0.92406 0.09466 0.47128
Now you can access the data the way you wanted:
D.Electrical_Freshman_Female_International
ans = 19×1
0.3975 0.9467 0.2767 0.2318 0.8851 0.9608 0.4258 0.8850 0.0383 0.8382
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

Products


Release

R2024b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!