Load parts of VERY LARGE text file content and create a smaller matrix
I have a very large file. Sample format is attached. There is a header with comment marks $$.
The rest of the data begins at Start 1, Pos #, followed by two columns of data running up to Start 2, Pos #, etc. Note that the columns after Pos # is NOT fixed.
The length of the two columns after Start #, Pos # ranges from 100 to around 500,000 rows.
The Scan # ranges from 1 to around 4000.
I want to read in, sequentially, the two columns after each Start 1, Pos # up to just before Start 2, Pos #, and then move on to Start 2, Pos #, etc.
I have tried textscan with block size but this is not working well.
It is not possible to load all data directly into Matlab.
Any directions will be greatly appreciated.
6 Comments
dpb
on 11 Mar 2015
"Note that the columns after Pos # is NOT fixed."
What does this mean?
John Tetteh
on 11 Mar 2015
John Tetteh
on 11 Mar 2015
dpb
on 11 Mar 2015
That's of no help; this is a two-way street. If you want help, clarify the problem PRECISELY.
That's an image, not the actual file. It looks fixed-column to me; unless you can say something different, what's the problem with it that it's mentioned specifically?
Do you want/need those values as well?
John Tetteh
on 11 Mar 2015
dpb
on 11 Mar 2015
OK, that's a big step forward... I've gotta run and finish up the evening chores now, but I'll try to take a look at it later on this evening. My first hunch is that a textscan call can be made to work OK, since you can process by grouping, but I'll have to 'spearmint to test the hypothesis. The basic idea is that once you get to the beginning of the first section, you do an unterminated read of the floating-point data; textscan will convert until it errors on the next section. Then you trap the error and get the next character line to reset the file pointer to a clean record, and repeat. "Rinse and repeat" until feof.
As I say, one generally has to test these things on a given file to work out the nitty-gritty, but the above generally works as a tactic.
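A minimal sketch of that read-until-it-stops tactic, for illustration only. The sample file it writes is made up, and the layout (a '$$' header, one text line such as 'Start 1, Pos 10' opening each section, then two whitespace-separated numeric columns) is an assumption, since the attached sample is not reproduced here. The names fname, nRows and colSum are hypothetical. Note that, per the textscan documentation, a failed conversion makes textscan stop and return what it has read so far rather than throw, so no error trapping appears below; fgetl then consumes the text line it stopped on.

```matlab
% Made-up stand-in for the real file; the layout is an assumption.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '$$ header comment\n');
fprintf(fid, 'Start 1, Pos 10\n1.0 2.0\n3.0 4.0\n5.0 6.0\n');
fprintf(fid, 'Start 2, Pos 20\n7.0 8.0\n9.0 10.0\n');
fclose(fid);

fid = fopen(fname, 'r');
nRows  = [];             % rows found in each section
colSum = zeros(0, 2);    % one reduced row per section (the "smaller matrix")
while ~feof(fid)
    txt = fgetl(fid);    % consume one text line (header or section line)
    if ~ischar(txt), break, end
    % Unterminated read: textscan keeps converting number pairs and stops
    % at the first field it cannot convert (the next text line).
    blk  = textscan(fid, '%f %f', 'CollectOutput', true);
    data = blk{1};       % N-by-2 block for this section only
    if isempty(data), continue, end   % no numbers followed this text line
    nRows(end+1, 1)  = size(data, 1);
    colSum(end+1, :) = sum(data, 1);  % placeholder reduction; swap in your own
end
fclose(fid);
delete(fname);
```

Only one section is held in memory at a time, which is the point when the whole file will not fit. The column sums are just a placeholder for whatever per-section reduction is actually wanted, and exactly where textscan leaves the file pointer is worth checking on the real file.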
Accepted Answer
More Answers (1)
Robert Cumming
on 11 Mar 2015
Use fopen to open the file then parse it line by line saving what you need and ignoring the rest. Remember to close the file with fclose as well.
4 Comments
John Tetteh
on 11 Mar 2015
dpb
on 11 Mar 2015
"... used fopen and textscan and strcmp to locate the the string 'Scan' indexes in a block for the whole data..."
So you were able to read the entire file into memory? Your earlier posting said you weren't able to do so. If you can, that simplifies things a bunch.
Show your actual code and again, "clarify, clarify, clarify!" We only know what you tell us; we can't see your workstation from here, nor do we know what you have or haven't done, however clear those results are to you.
Robert Cumming
on 11 Mar 2015
Use fgetl to read each individual line.
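A rough sketch of that line-by-line approach with fopen, fgetl and fclose. The sample file is made up, and the assumptions are that section lines begin with the text 'Start' and that data lines hold two whitespace-separated numbers; marker, rowCount and fname are hypothetical names.

```matlab
% Made-up stand-in for the real file; the layout is an assumption.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '$$ header comment\n');
fprintf(fid, 'Start 1, Pos 10\n1.0 2.0\n3.0 4.0\n');
fprintf(fid, 'Start 2, Pos 20\n5.0 6.0\n');
fclose(fid);

marker = 'Start';                 % assumed text that opens each section
fid = fopen(fname, 'r');
rowCount = zeros(0, 1);           % data rows seen in each section
tline = fgetl(fid);
while ischar(tline)               % fgetl returns -1 at end of file
    if strncmp(tline, marker, numel(marker))
        rowCount(end+1, 1) = 0;   % a new section begins
    elseif ~isempty(rowCount)     % ignore everything before the first section
        vals = sscanf(tline, '%f');          % empty for non-numeric lines
        if numel(vals) == 2
            rowCount(end) = rowCount(end) + 1;   % keep or reduce vals here
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

This never holds more than one line in memory, at the cost of being slower than a block read on sections with hundreds of thousands of rows.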
John Tetteh
on 13 Mar 2015