How to read Excel files with unknown number of header rows?

Below is my code to read an Excel file using readtable:
T1 = readtable ('test.xlsx','PreserveVariableNames',true);
Headers = T1.Properties.VariableNames;
A = T1{:,1};
Here is the problem. My Excel file could have unknow number of header rows (from 1 to 20). It seems that (a) the Headers are always the first row, and (b) the A values always start from the first all numerical row.
What I need is the Row # of the A values. If I know there is only one header row, I know the first element of A starts from Row # 2. With unknown number of header rows, how do I derive that Row # info of A?
Thanks!

6 Comments

detectImportOptions can often figure it out. Not always though.
Many thanks! Would you please explain how exactly?
In the attached example, the below gives me an S1 of "A2", instead of the correct answer of "A3". The 2nd row is my unit row. It is very common for people to put units in the 2nd row.
opt = detectImportOptions('example.xlsx');
S1 = opt.DataRange;
If you use
opt = detectImportOptions('example.xlsx', 'DataRange', 3, 'VariableNamesRange', 1, 'DataRange', 3, 'VariableUnitsRange', 2);
then it gets all the parts right -- but for reasons not clear to me, it gets two extra rows of missing data at the end.
Many thanks for the reply.
The point is that I need to know the DataRange value in advance as this info needs to be decided automatically for batch processing.
Is there anything that is consistent between the files? Same number of variables with the same headers?
If there is a variable number of header lines, then is the variable names always going to be the first row, and the variable units always going to be the second row? Or is the variable names always the first row but the variable units is always the row before the data? Or are there a variable number of headers all followed by names and then units and then data?
Thanks for the reply, Walter.
The first row is always the header row. The 2nd row can either be the data or the unit row.

Sign in to comment.

 Accepted Answer

I notice you are using R2019b. Starting with R2019a, you can use readcell . Then you would ask isnumeric(T{2,1}) to determine whether the second row was header (units) or numeric. Then cell2table() using the first row of the cell as the VariableNames, and either 2:end or 3:end indexing for the content depending where the numerics start; if row 2 was not numeric then use the content to set the VariableUnits property.

1 Comment

Many thanks for the recommended solution, Walter!
That will work most of the time, except that T{2,1} may not always be numeric even when they are part of the data. Sometimes, it can be a string. My data does containt some strings such as the Station ID, Cruise_name, Expedition code, etc.
Maybe the only way is to identify a column that is always numerica first?

Sign in to comment.

More Answers (1)

The spreadsheetDataStore function can help with this alongwith detectImportOptions
You can also use the Import Tool in the MATLAB Toolstrip to select the rows required.
Hope this Helps!

Products

Release

R2019b

Tags

Asked:

on 11 Feb 2020

Commented:

on 7 Oct 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!