Using textscan to read header with mixed formats

I have a *.csv file I want to import (see attachment). I read part of testFile.csv as follows:
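fileID = fopen(file);   % file holds the path to testFile.csv
% Skip the 22 header lines, then read four comma-separated numeric columns
X = textscan(fileID,'%f %f %f %f','Delimiter',',','HeaderLines',22);
fclose(fileID);
X = cell2mat(X);        % combine the four column cells into an N-by-4 matrix
DATA.waveAxis = X(:,1)';
DATA.absorbanceSpectrum = X(:,2)';
DATA.backgroundReference = X(:,3)';
DATA.sampleSignal = X(:,4)';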
That works well, but I haven't retrieved all the information from testFile.csv yet.
That is, I would also like to add the first 20 rows of testFile.csv to my structure "DATA". For instance, I want to add header information to DATA such that it looks like "DATA.method = Column1" (first header line) or "DATA.serialNumber = 5490232" (10th header line).
However, the first 20 header lines have different formats, so I find it very difficult to write a piece of neat and speedy code to do the job. Therefore, any help is greatly appreciated!
  1 Comment
dpb on 15 Dec 2021
It wouldn't be too bad to parse this specific file and create a structure with dynamic field names. Doing it generically could be a pain, given what appear to be superfluous fields in the header data, unless there's some known key for which record has how many fields.
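As a rough illustration of the dynamic-field-name idea, here is a minimal sketch. The record text 'Serial Number,5490232,,' is an assumption (only the value 5490232 and the desired field name come from the question), and lowercasing before matlab.lang.makeValidName is just one way to arrive at a name like serialNumber.

```matlab
% Hypothetical header record; the label text in the real file may differ.
rec = 'Serial Number,5490232,,';

% Split on commas and keep empty fields so positions stay meaningful.
parts = strtrim(strsplit(rec, ',', 'CollapseDelimiters', false));

% 'Serial Number' -> 'serial number' -> 'serialNumber'
key = matlab.lang.makeValidName(lower(parts{1}));

% Store numbers as doubles and anything else as text.
val = str2double(parts{2});
if isnan(val)
    val = parts{2};
end

DATA = struct();
DATA.(key) = val;    % dynamic field name
```

The same pattern would turn a text record such as the Method line into DATA.method, since non-numeric values fall through to the text branch.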
For example, the header starts off with a single piece of data in the second field for Method, Date-Time, Version, Temp, ... until you get to 'Shift Vector Coefficients', which appears to have an array of three doubles in the second data field. Those values are separated by the same delimiter used in the other records, so that record has five delimiters instead of only three. Handling that will simply require a lookup of what to do for a given record.
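One possible way to handle a multi-valued record like that is to convert every field after the name and discard the empty ones. This is only a sketch: the three coefficient values below are made up, and the record layout (name, three numbers, two trailing empty fields) is inferred from the five-delimiter description above.

```matlab
% Hypothetical record: a name, three numbers, then two empty trailing fields.
rec = 'Shift Vector Coefficients,0.1,-0.02,0.003,,';

parts = strsplit(rec, ',', 'CollapseDelimiters', false);

% str2double accepts a cell array; empty fields convert to NaN.
coeffs = str2double(parts(2:end));
coeffs = coeffs(~isnan(coeffs));   % keep only the fields that held numbers
```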
Then, to add confusion, "Section 1,," has only two delimiters and no leading name field. Are there other sections in a real file, maybe trailing after the first? There doesn't seem to be any indicator by which to determine how long a section of data might be...
You'll just have to write logic to treat the records by type. If it is a fixed header that always has the same header information, then it's tedious to do once but fairly straightforward. If you have to be able to recognize any number of header records that could have any name string, that'll be tougher to deal with.
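For the fixed-header case, the record-by-type logic could look something like the sketch below. It writes a small stand-in file so it runs on its own; the four records, the arrayKeys lookup list, and the struct name hdr are all assumptions for illustration, not the contents of the attached file. For the real file, nHeader would be the number of header lines to parse and the fopen call would point at testFile.csv.

```matlab
% Write a tiny stand-in header so the sketch is self-contained.
% These records are hypothetical; only their shapes follow the thread.
tmp = [tempname '.csv'];
fid = fopen(tmp, 'w');
fprintf(fid, '%s\n', 'Method,Column1,,', 'Serial Number,5490232,,', ...
    'Shift Vector Coefficients,0.1,-0.02,0.003,,', 'Section 1,,');
fclose(fid);

nHeader   = 4;                            % header lines to parse
arrayKeys = {'shiftVectorCoefficients'};  % records known to hold several numbers

hdr = struct();
fid = fopen(tmp, 'r');
for k = 1:nHeader
    rec = fgetl(fid);
    if ~ischar(rec), break; end           % fgetl returns -1 at end of file
    parts = strtrim(strsplit(rec, ',', 'CollapseDelimiters', false));
    if isempty(parts{1}), continue; end   % skip lines with an empty first field
    key  = matlab.lang.makeValidName(lower(parts{1}));
    nums = str2double(parts(2:end));      % NaN wherever a field is not numeric
    if ismember(key, arrayKeys)
        hdr.(key) = nums(~isnan(nums));   % keep every numeric field
    elseif numel(parts) > 1 && ~isnan(nums(1))
        hdr.(key) = nums(1);              % single numeric value
    elseif numel(parts) > 1
        hdr.(key) = parts{2};             % keep text as-is (may be empty)
    end
end
fclose(fid);
delete(tmp);
```

With these stand-in records, hdr ends up with the fields method, serialNumber, shiftVectorCoefficients, and section1 (the last one empty). Record types that need other handling, such as dates, would each get their own branch or lookup entry.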
