How can I read from a text file in the following format?

I have a text file formatted as follows. I want to read this file and separate it into three blocks, each of which should be a [4*4] matrix. I don't know how to do that with "textscan".
# TT50/Data/BaseballPitch/v_BaseballPitch_g01_c01
5 0.25 0.228125 0.0654206
5 0.133333 0.0375 0.0747664
5 0.208333 0.55625 0.0747664
5 0.495833 0.221875 0.0747664
#TT50/Data/BaseballPitch/v_BaseballPitch_g01_c02
5 0.591667 0.134375 0.0860215
5 0.320833 0.125 0.0967742
5 0.458333 0.24375 0.0967742
5 0.520833 0.140625 0.0967742
# TT50/Data/BaseballPitch/v_BaseballPitch_g01_c03
5 0.625 0.821875 0.0873786
5 0.6125 0.765625 0.203883
5 0.575 0.78125 0.262136
5 0.6 0.778125 0.271845
I would appreciate it if you could help me with this problem.
Best,

 Accepted Answer

If the pattern above holds true (i.e. line of text, 4x4 numeric matrix, line of text, 4x4 numeric matrix, etc.), then you can do something like the following (which is also described under importing text files):
fid = fopen('text.txt');
% if file descriptor valid
if fid>0
    % continue until end-of-file reached
    while ~feof(fid)
        % read/get the line of text (and ignore it)
        header = fgetl(fid);
        % read the 4x4 matrix of floating point data, transposing the result of the
        % fscanf (since it populates A in column order)
        A = fscanf(fid,'%f\n',[4,4])';
        % warning!! the above overwrites the previous entry so you will have to store differently
    end
    % close the file
    fclose(fid);
end
The one gotcha with the above is that A is overwritten on each iteration, so you will want to store all those matrices separately.
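One way to avoid the overwriting is to collect each matrix in a cell array. The following is only a minimal sketch under the same assumption as above (a header line followed by a 4x4 block of numbers). The names `headers` and `matrices`, the temporary file, and the sample labels are made up for illustration. Blank lines are skipped so that the loop does not depend on whether fscanf leaves a trailing newline behind.

```matlab
% Write a small sample file (hypothetical labels) so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# blockA\n5 0.25 0.228125 0.0654206\n5 0.133333 0.0375 0.0747664\n');
fprintf(fid, '5 0.208333 0.55625 0.0747664\n5 0.495833 0.221875 0.0747664\n');
fprintf(fid, '#blockB\n5 0.591667 0.134375 0.0860215\n5 0.320833 0.125 0.0967742\n');
fprintf(fid, '5 0.458333 0.24375 0.0967742\n5 0.520833 0.140625 0.0967742\n');
fclose(fid);

headers  = {};   % one header line per block
matrices = {};   % one 4x4 matrix per block
fid = fopen(fname, 'r');
while true
    header = fgetl(fid);
    if ~ischar(header), break; end               % fgetl returns -1 at end of file
    if isempty(strtrim(header)), continue; end   % skip blank leftovers
    % read 16 numbers in column order, then transpose to match the file layout
    A = fscanf(fid, '%f', [4, 4])';
    headers{end+1}  = header;
    matrices{end+1} = A;                         % keep every block instead of overwriting
end
fclose(fid);
delete(fname);
```

After the loop, `matrices{k}` holds the k-th block and `headers{k}` the line that preceded it.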

4 Comments

First of all, thank you (so much) for your answer, Geoff. There are two things that I'm concerned about:
1) This is just an example of the data set I am dealing with, which means each block is not necessarily a [4*4] matrix. The blocks have different numbers of rows (which I don't know in advance).
2) My original problem is a very big data set with some blocks (whose number I don't know either), but I know that each block starts with a text line as in the example above. What I am trying to do is find those blocks, randomly select some data (rows) from each block, and store them separately. I cannot open the file in MATLAB due to its large size.
Best,
Hi Niloofar - if you know something about the text lines (i.e. some sort of common phrase or tag, like Cedric asks below), you could read in the file, line by line and ignore those lines that have that tag:
COMMON_TAG = 'Data';
% fid is assumed to be a valid file identifier returned by fopen
% continue until end-of-file reached
while ~feof(fid)
    % read/get the current line of text
    currentline = fgetl(fid);
    if ~isempty(strfind(currentline,COMMON_TAG))
        % since current line has the tag, then move to the next line
        currentline = fgetl(fid);
    end
    % fgetl returns -1 (not text) if the tag was on the last line of the file
    if ~ischar(currentline)
        break;
    end
    % since pattern is line of text then numbers, then can assume
    % that the current line is a set of numbers separated by spaces
    % so convert to a numeric array
    numericData = str2num(currentline);
    % do something with this array of numeric data which is
    % just a single row with a variable number of columns
end
The above would allow you to read in the numeric data one row at a time, so you can manipulate each row or save a randomly chosen one for further processing.
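If the label of each block also needs to be kept, the same line-by-line idea can be extended. This is just a sketch, assuming every block starts with a line beginning with "#" and that the file starts with such a line. The names `labels` and `data`, the temporary file, and its contents are hypothetical.

```matlab
% Build a small sample file (made-up labels and values) so the sketch runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# blockA\n1 2 3\n4 5 6\n#blockB\n7 8 9\n');
fclose(fid);

labels = {};   % one label per block
data   = {};   % one numeric matrix per block (number of rows may differ)
fid = fopen(fname, 'r');
while true
    currentline = fgetl(fid);
    if ~ischar(currentline), break; end       % fgetl returns -1 at end of file
    currentline = strtrim(currentline);
    if isempty(currentline), continue; end    % skip blank lines
    if currentline(1) == '#'
        % header line: start a new block and remember its label
        labels{end+1} = strtrim(currentline(2:end));
        data{end+1}   = [];
    else
        % numeric line: append it as a new row of the current block
        % (assumes a header line has already been seen)
        data{end} = [data{end}; sscanf(currentline, '%f').'];
    end
end
fclose(fid);
delete(fname);
```

Growing `data{end}` row by row is simple but can be slow for very large blocks, so for big files it may be better to keep only the rows that are actually needed.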
Thank you, Geoff, for your answer. This one works as well, but since I prefer to keep track of the reference for each block, I would have to modify it before using it.
Thank you so much for your help!
Niloo,


More Answers (1)

If block sizes can vary, I would build a solution around:
content = fileread( 'myFile.txt' ) ;
nCols = 4 ;
blocks = regexp( content, '#\s?(\S+)([^#]*)', 'tokens' ) ;
nBlocks = length( blocks ) ;
labels = cell( nBlocks, 1 ) ;
data = cell( nBlocks, 1 ) ;
for k = 1 : nBlocks
    labels{k} = blocks{k}{1} ;
    data{k} = reshape( sscanf(blocks{k}{2}, '%f'), nCols, [] ).' ;
end
Run this on your data file and inspect the cell arrays labels and data. The whole thing could be made more concise with CELLFUN, but it wouldn't be as clear and maybe not as efficient.
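For reference, a CELLFUN variant could look roughly like the sketch below. It runs the same pattern on a short in-memory string instead of a file. The string contents and the labels `a/b_c01` and `a/b_c02` are made up for illustration.

```matlab
% Hypothetical content: two blocks, one with and one without a space after '#'.
content = sprintf(['# a/b_c01\n5 0.25 0.228125 0.0654206\n' ...
                   '5 0.133333 0.0375 0.0747664\n' ...
                   '#a/b_c02\n5 0.591667 0.134375 0.0860215\n']);
nCols = 4;

% Each match yields two tokens: the label and the text up to the next '#'.
blocks = regexp(content, '#\s?(\S+)([^#]*)', 'tokens');

% Same work as the FOR loop, written with CELLFUN.
labels = cellfun(@(b) b{1}, blocks, 'UniformOutput', false);
data   = cellfun(@(b) reshape(sscanf(b{2}, '%f'), nCols, []).', blocks, ...
                 'UniformOutput', false);
```

Here `data{1}` is a 2x4 matrix and `data{2}` a 1x4 row, which shows that blocks with different numbers of rows are handled as long as the number of columns is fixed.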
Let me know if you have any questions.

8 Comments

Thank you so much Cedric,
It worked, but I am dealing with a very big data set, so I am afraid that I won't be able to store all the data in blocks. I just want to randomly select some data (rows) from the blocks and store the selected data.
Best, Niloo,
How big is big?
If you want to extract specific blocks, I can tell you how to modify the call to REGEXP for targeting relevant blocks based on labels/headers.
My smallest data set has 130550 rows and 172 columns!!! Actually, I want to have some data (100 rows) randomly from "ALL" blocks. (It might help if you know each block starts with "#").
Thanks,
Ok, do all blocks have the same number of columns, or can this number vary? Also, do you need to keep track of the reference (label/header) for each block?
On my laptop, for example, it takes 5 s to process a file with 140,000 rows and 180 columns. It's not that much overall. In any case, you can store a random selection of rows in data{k}.
Yes, all blocks have the same number of columns which is 172. I also need to keep track of the reference for each block.
If you have the statistics toolbox, you can use RANDSAMPLE. Also, you'll need a little more code if some blocks can have fewer than 100 rows.
content = fileread( 'myFile.txt' ) ;
nCols = 172 ;
nSamples = 100 ;
blocks = regexp( content, '#\s?(\S+)([^#]*)', 'tokens' ) ;
nBlocks = length( blocks ) ;
labels = cell( nBlocks, 1 ) ;
data = cell( nBlocks, 1 ) ;
for k = 1 : nBlocks
    fprintf( 'Block %d/%d..\n', k, nBlocks ) ; % Can be removed.
    labels{k} = blocks{k}{1} ;
    temp = reshape( sscanf(blocks{k}{2}, '%f'), nCols, [] ).' ;
    rowIds = randsample( size(temp, 1), nSamples ) ;
    data{k} = temp(rowIds,:) ;
end
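The call to RANDSAMPLE above samples without replacement, so it errors when a block has fewer than nSamples rows. A possible way to cover that case, and to avoid the toolbox dependency, is RANDPERM with a capped sample size. This is only a sketch: `pickRows` is a hypothetical helper name and the 10x3 matrix stands in for one block.

```matlab
rng(0);  % fixed seed, only so that repeated runs give the same selection

% Hypothetical helper: return up to n distinct, randomly chosen rows of M.
pickRows = @(M, n) M(randperm(size(M, 1), min(n, size(M, 1))), :);

temp = reshape(1:30, 3, []).';     % stand-in for one block: 10 rows, 3 columns

few     = pickRows(temp, 4);       % 4 distinct rows
allRows = pickRows(temp, 100);     % only 10 rows exist, so all 10 come back, shuffled
```

Inside the loop this would replace the two lines that compute rowIds and data{k}, e.g. data{k} = pickRows(temp, nSamples).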
It worked. Thank you so much. You helped me a lot. I had been working on it for one week without finding efficient code. I tried yours on my biggest data set, and it took only 1 minute to process the file. You made my day!
Thanks again,
Niloo,
My pleasure! On my machine, the call to SSCANF takes the most time. If you profile the code and realize that on your biggest dataset the call to REGEXP takes most of the run time, we could work on optimizing the pattern.

