Elegant way to extract data from text files with an arbitrary format?
Hi Guys,
I need to process a large number of text files to extract numerical data. The data is fairly complex, as the files have an arbitrary format and contain several different blocks of data. To illustrate:
Boys Names:
Tom Dick Harry...
Animals:
Cat Dog Squirrel Triceratops Shark...
Rectangle Properties:
x0 y0 width height angle
0 1 4 2 30
-1 2 5 1.5 0.5
7 1 4 5 22
3 9 7.5 6 0
Some more data...
The challenge is that the data I need to access is somewhere in the middle of each file. I never know where the block (Rectangle Properties in this case) will show up. There could, for example, be a large number of records under the Names or Animals sections, which means I need to locate the Rectangle section of the file. To complicate things further - I don't know how many rectangles I need to read in.
The header "Rectangle Properties:" only appears once in each file. The sub-header line "x0 y0...." occurs in several places (e.g. under different shapes).
My current approach is:
- Scan through the file (using fgetl) until I get to the "Rectangle Properties:" header.
- Skip a line (I don't need the sub-header)
- Read 5 items of numerical data (sscanf) from each of the subsequent lines until I reach a blank line
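The three steps above can be sketched roughly as follows. This is only a minimal illustration, not my exact code: the function name readRectangles is made up for the example, and the header text is the one from the sample file above.

```matlab
function data = readRectangles(filename)
% Sketch of the fgetl/sscanf approach described above.
% readRectangles is a hypothetical name; the header text is taken
% from the sample file and may differ in real data.
header = 'Rectangle Properties:';
data = zeros(0, 5);

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
cleanup = onCleanup(@() fclose(fid));    % close the file even on error

% Step 1: scan forward until the unique block header is found.
tline = fgetl(fid);
while ischar(tline) && ~strcmp(strtrim(tline), header)
    tline = fgetl(fid);
end
if ~ischar(tline)
    return;                              % header not found: return empty
end

% Step 2: skip the sub-header line.
fgetl(fid);

% Step 3: read 5 numbers per line until a blank line, a line that
% does not hold 5 numbers, or the end of the file.
tline = fgetl(fid);
while ischar(tline) && ~isempty(strtrim(tline))
    [vals, count] = sscanf(tline, '%f', 5);
    if count < 5
        break;
    end
    data(end+1, :) = vals.';             %#ok<AGROW>
    tline = fgetl(fid);
end
end
```

Growing data one row at a time is slow for very large blocks, but it avoids having to know the number of rectangles in advance.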
This works fine, but I'm wondering if there's a more elegant approach, perhaps using regular expressions or some other technique?
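To make the regular-expression idea concrete, here is a rough sketch of what such a version might look like. It assumes the whole file fits in memory, that the block has exactly one sub-header line, and that every data row holds exactly 5 plain decimal numbers (no exponent notation). The function name rectanglesFromText is hypothetical.

```matlab
function data = rectanglesFromText(txt)
% Sketch: pull the rectangle block out of the full file text with one
% regular expression. rectanglesFromText is a hypothetical name.
num = '[-+]?\d*\.?\d+';                               % one decimal number
row = ['[ \t]*' num '(?:[ \t]+' num '){4}[ \t]*\r?\n?'];  % 5 numbers on a line

% Header line, one sub-header line, then as many numeric rows as follow.
expr = ['Rectangle Properties:[^\n]*\n[^\n]*\n((?:' row ')+)'];

tok = regexp(txt, expr, 'tokens', 'once');
if isempty(tok)
    data = zeros(0, 5);                  % header or rows not found
    return;
end

% sscanf fills column by column, so read as 5-by-N and transpose.
data = sscanf(tok{1}, '%f', [5 Inf]).';
end
```

It could be called as data = rectanglesFromText(fileread(filename)). The match stops at the first line that is not a row of 5 numbers, so the number of rectangles does not need to be known beforehand.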
The data files I'm processing are quite large and I need to extract several different blocks of data (e.g. Rectangles, Triangles, Circles). Each block has a unique header but may have one or more sub-header lines which are not unique. The number of data items in each block varies, and there is no way to know how many items there are when I begin processing the data. This makes it difficult to produce a "one size fits all" function and the code gets pretty messy.
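As an illustration of the kind of "one size fits all" function I have in mind, here is a minimal sketch. It assumes that sub-header lines are never purely numeric, that all rows within a block have the same number of columns, and that a block ends at the first non-numeric or blank line. The name readBlock and its interface are hypothetical.

```matlab
function data = readBlock(txt, header)
% Sketch: return the numeric rows that follow a uniquely named header.
% readBlock is a hypothetical helper; txt is the full file text.
lines = strtrim(regexp(txt, '\r?\n', 'split'));
data  = [];

k = find(strcmp(lines, header), 1);      % locate the unique block header
if isempty(k)
    return;
end

% Flag lines made up only of whitespace-separated decimal numbers.
numericLine = '^[-+]?\d*\.?\d+(\s+[-+]?\d*\.?\d+)*$';
isNum = ~cellfun('isempty', regexp(lines, numericLine, 'match', 'once'));

% Skip any number of sub-header lines, but stop at a blank line so an
% empty block does not run into the next one.
r = k + 1;
while r <= numel(lines) && ~isNum(r) && ~isempty(lines{r})
    r = r + 1;
end

% Collect consecutive numeric rows; the column count is taken from the data.
rows = {};
while r <= numel(lines) && isNum(r)
    rows{end+1} = sscanf(lines{r}, '%f').';   %#ok<AGROW>
    r = r + 1;
end
if ~isempty(rows)
    data = vertcat(rows{:});
end
end
```

With the file text loaded once via fileread, each block could then be fetched by its header, for example rect = readBlock(txt, 'Rectangle Properties:'), and similarly for whatever headers the triangle and circle blocks actually use.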
Any advice would be appreciated!
B
1 Comment
Walter Roberson
on 1 Dec 2015
For the blocks that you need, is the order of blocks fixed?
Is the first line of the file always the same?