What's the best way to read an ASCII/txt file containing a few lines of text and then the data I want to load into a matrix?

I will have files converted to something like this:
Pad 1:1 /This is what the converter writes out /Not sure if it will have the same number of comments every time /information /more text
0.0 14.666
0.2 134.567
0.3 1567.435
... ...
and so forth. I want to read only the numerical data into a matrix to work with later. I'd prefer a clean solution rather than a hack job, something that will consistently read many files. Thanks!
- Mark

 Accepted Answer

If you always have the same number of header lines, use TEXTSCAN and set the parameter 'HeaderLines' to a relevant value, e.g. 2 if you have two lines of header in each file.
In the example that you provided, it seems that you have a line of text, an empty line, and then numbers, so you should be able to work with something like:
fid = fopen( 'myFile.txt', 'r' ) ;
data = textscan( fid, '%f%f', 'HeaderLines', 2 ) ;
fclose( fid ) ;
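Note that TEXTSCAN returns a cell array (one cell per format specifier), not a matrix. A minimal sketch of the extra step needed to get a numeric matrix is given below. The temporary file and its contents are made up for illustration, and the variable names C and fname are my own, not from the original post.

```matlab
% Write a small sample file (hypothetical contents) so the sketch is self-contained.
fname = [tempname, '.txt'] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'Pad 1:1 /some header text\n\n' ) ;   % One text line, one empty line.
fprintf( fid, '%.1f %.3f\n', [0.0 0.2 0.3; 14.666 134.567 1567.435] ) ;
fclose( fid ) ;

% Read it back, skipping the two header lines.
fid = fopen( fname, 'r' ) ;
C = textscan( fid, '%f%f', 'HeaderLines', 2 ) ;     % 1x2 cell, one column vector per cell.
fclose( fid ) ;

data = [C{:}] ;    % Concatenate the two columns into an N-by-2 numeric matrix.
delete( fname ) ;  % Remove the sample file.
```

Passing 'CollectOutput', true to TEXTSCAN is another way to get there; the matrix is then in C{1}.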

6 Comments

Awesome, didn't know about the 'Headerlines' parameter with textscan. Thank you
If I wasn't sure how many 'Headerlines' I would have in my text file and wanted to automatically count them instead of hard coding a number, what would you suggest? In my example above, whenever there was a '/' it was a new line, but a different text file would have a different amount. What do you think is the best way to "count" the 'Headerlines'? Thank you
There are several options; I guess that one of the classic approaches is something like
data = zeros( 1e6, 2 ) ; % Prealloc (see note 1).
rowId = 0 ;
fid = fopen( 'myFile.txt', 'r' ) ;
while ~feof( fid )
    tline = fgetl( fid ) ; % Named tline to avoid shadowing the built-in LINE.
    num = sscanf( tline, '%f %f' ) ; % Empty if the line does not start with a number.
    if ~isempty( num )
        rowId = rowId + 1 ;
        data(rowId,:) = num.' ;
    end
end
fclose( fid ) ;
data = data(1:rowId,:) ; % Truncate to filled portion.
Note 1: preallocate more rows (a million here) than you have in the file. This is not mandatory, but it prevents data from being reallocated each time a valid row is read, which is more efficient. If you don't know whether a million is enough but don't want to preallocate more, you can implement a mechanism that adds another million rows each time rowId exceeds the size of the preallocated array. You would have to update the inner IF statement as follows:
if ~isempty( num )
    rowId = rowId + 1 ;
    if rowId > size( data, 1 )
        data = [data; zeros( 1e6, 2 )] ; % Grow by another million rows.
    end
    data(rowId,:) = num.' ;
end
If this is not efficient enough, you can read the file line by line for as long as SSCANF returns an empty array and the end of the file has not been reached, and then read the rest of the file in one shot. I did not give this solution first because it is a bit more difficult to understand. You would have to implement something like (not tested):
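fid = fopen( 'myFile.txt', 'r' ) ;
data = [] ;
while ~feof( fid ) && isempty( data )
    tline = fgetl( fid ) ; % Named tline to avoid shadowing the built-in LINE.
    data = sscanf( tline, '%f %f' ).' ; % Non-empty once the first data line is reached.
end
if ~feof( fid )
    % Read the remaining rows in one shot and append them to the first one.
    data = [data; fscanf( fid, '%f %f', [2, Inf] ).'] ;
end
fclose( fid ) ;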
Thank you for the help. I ended up checking for the comment delimiter and using fgetl until it hit the data. From there textscan just read the rest of the text file and only put in numbers. Thanks again
You're welcome. Just be careful not to lose the first line of data: if it was read by FGETL, it won't be read by TEXTSCAN, because FGETL has already moved the file pointer past that line. This is why I have the concatenation in my last solution.
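One way to avoid losing that first data line, sketched below under a few assumptions: remember the file position before each FGETL call, and once a data line is found, move the pointer back to the start of that line with FSEEK before calling TEXTSCAN. Here a "data line" is taken to be any line from which SSCANF can read two numbers; this is my assumption, not the '/' delimiter check mentioned above. The sample file and the names fname, pos and C are made up for illustration.

```matlab
% Write a sample file with an unknown number of header lines (hypothetical contents).
fname = [tempname, '.txt'] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'Pad 1:1\n/written by the converter\n/information\n/more text\n' ) ;
fprintf( fid, '%.1f %.3f\n', [0.0 0.2 0.3; 14.666 134.567 1567.435] ) ;
fclose( fid ) ;

fid = fopen( fname, 'r' ) ;
pos = ftell( fid ) ;                 % Position of the line about to be read.
tline = fgetl( fid ) ;
% Skip lines until one yields two numbers (or FGETL returns -1 at end of file).
while ischar( tline ) && numel( sscanf( tline, '%f %f' ) ) < 2
    pos = ftell( fid ) ;             % Start of the next line.
    tline = fgetl( fid ) ;
end
fseek( fid, pos, 'bof' ) ;           % Back to the start of the first data line.
C = textscan( fid, '%f%f' ) ;        % Reads all data rows, including the first.
fclose( fid ) ;

data = [C{:}] ;                      % N-by-2 numeric matrix.
delete( fname ) ;                    % Remove the sample file.
```

With this approach there is no need to concatenate the first row by hand, since TEXTSCAN sees the whole numeric block.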


Asked:

on 8 Oct 2013

Commented:

on 8 Oct 2013
