Reading complicated mixed text/numbers file

I would like to read a vtu file containing the solution of a problem in Matlab.
In particular, I'd like to get the size of the data I want to read, which is given at the beginning of my file by the attribute "NumberOfPoints" in this excerpt:
<VTKFile type="UnstructuredGrid" version="0.1" >
<UnstructuredGrid>
<Piece NumberOfPoints="5101" NumberOfCells="10000">
<Points>
Also, the data that I'd like to import in Matlab are preceded by
<DataArray type="Float64" Name="u" format="ascii">1.0000000000000000e+00 2.0000000000000000e+00
At the moment I can read them only if I put my data on a new line in the file, i.e.
<Piece NumberOfPoints>
5101
NumberOfCells="10000">
and
<DataArray type="Float64" Name="u" format="ascii">
1.0000000000000000e+00 2.0000000000000000e+00
using this code
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim(str);
    switch (str)
        case '<Piece NumberOfPoints>'
            n = fscanf( file, '%d', 1 )
        case '<DataArray type="Float64" Name="u" format="ascii">'
            val = fscanf( file, '%f', [1, n] )';
    end
end
fclose( file );
How can I get the values without modifying my files by hand? I have many very large files, and this procedure takes a long time.
Thank you,
Elisa

Accepted Answer

Cam Salzberger on 30 Jul 2015
Hi Elisa,
I understand that you are trying to read the data points from a VTU file with a specific format without having to modify the file by hand. I am assuming that all of the data points are contained on the line starting with the "DataArray" tag. There are many different ways of parsing the file, so I'll give you a couple of approaches.
The first approach is very similar to your code, but avoids using "switch" to check for the line of interest. Switch-case constructs only work for exact matches, whereas you want to know whether a particular substring appears anywhere in the line. The "strfind" function, among others, looks for the specified substring within the given string. You could also use the "strncmp" function if you would prefer that.
Also, since the data you are interested in is on the same line as the substring that specifies it as the line of interest, you cannot use "fscanf" to parse that line. If you always know that "NumberOfPoints" will be the first attribute in the "Piece" tag, you can use the "strsplit" function to extract the number of data points you want. You can use similar methods to extract the data points from the "DataArray" line.
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim( str );
    % strfind returns an empty array when the substring is absent
    if ~isempty( strfind( str, '<Piece NumberOfPoints=' ) )
        strPieces = strsplit( str, '"' ); % Split at the double-quote marks
        n = str2double( strPieces{2} ); % Convert to number
    elseif ~isempty( strfind( str, '<DataArray type=' ) )
        strPieces = strsplit( str, '>' ); % Split at the end of the tag
        val = sscanf( strPieces{end}, '%f', [1 n] ); % Read in data
    end
end
fclose( file );
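To see what the two splits return, here is a quick check you could run at the command line on the sample lines from your question. It is only a sketch, and it assumes the closing "</DataArray>" tag is not on the same line as the data (otherwise the last piece after splitting at '>' would be empty).

```matlab
% Sample lines copied from the question
pieceLine = '<Piece NumberOfPoints="5101" NumberOfCells="10000">';
dataLine  = '<DataArray type="Float64" Name="u" format="ascii">1.0000000000000000e+00 2.0000000000000000e+00';

% Splitting at double quotes puts the first attribute value in cell 2
strPieces = strsplit( pieceLine, '"' );
n = str2double( strPieces{2} );

% Splitting at '>' leaves the numeric text in the last cell
strPieces = strsplit( dataLine, '>' );
val = sscanf( strPieces{end}, '%f' )';   % transpose to get a row vector
```

Here "n" comes out as 5101 and "val" holds the two numbers shown on the sample line.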
One of the issues with this approach, however, is that it is not very robust to small variations in the file format. For example, a second space between "Piece" and "NumberOfPoints" would be enough to ensure that this code never finds the value for "n". A much more robust approach is to use regular expressions. These can be tricky to work with, but they allow for more flexibility in the file format.
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    % The token of interest must have one or more digits, and only digits
    strTokens = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens' );
    if ~isempty( strTokens )
        n = str2double( strTokens{1}{1} );
    else
        % The token may have space, tab, any digit, decimal point, the 'e'
        % character, plus, or minus since all can be used to write numbers
        % in exponential notation
        strTokens = ...
            regexp( str, '<DataArray.*?>([ \t\d\.e\+\-]+)', 'tokens' );
        if ~isempty( strTokens )
            val = sscanf( strTokens{1}{1}, '%f', [1 n] );
        end
    end
end
fclose( file );
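If the nested indexing "strTokens{1}{1}" looks odd, this small check may help. It runs the same "NumberOfPoints" expression on the sample line from your question, on a variant with extra spaces that I made up for illustration, and on a line that does not match.

```matlab
expr = 'NumberOfPoints="(\d+)"';

% One match with one token: a 1x1 cell holding a 1x1 cell holding '5101'
pieceLine = '<Piece NumberOfPoints="5101" NumberOfCells="10000">';
strTokens = regexp( pieceLine, expr, 'tokens' );
n = str2double( strTokens{1}{1} );

% Extra spaces before the attribute (made-up variant) do not matter
spacedLine = '<Piece    NumberOfPoints="5101" NumberOfCells="10000">';
spacedTokens = regexp( spacedLine, expr, 'tokens' );
nSpaced = str2double( spacedTokens{1}{1} );

% A line without the attribute gives an empty cell array
noMatch = regexp( '<Points>', expr, 'tokens' );
found = ~isempty( noMatch );
```

The outer cell has one entry per match and the inner cell one entry per parenthesized token, which is why the code tests "isempty" before indexing.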
You may wish to add some error checking to ensure that the code found the value of n, before trying to use it to extract the data points.
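As a rough sketch of what that checking might look like, the script below also restricts the match to one named array, which could be useful if the file holds several "DataArray" elements. The sample file contents (with NumberOfPoints="2"), the variable "arrayName", and the error messages are my own assumptions for illustration. It also assumes a scalar array whose values all sit on the same line as the tag.

```matlab
% Write a tiny sample file so the sketch runs on its own (made-up contents)
fileName = [ tempname, '.vtu' ];
file = fopen( fileName, 'wt' );
fprintf( file, '%s\n', '<Piece NumberOfPoints="2" NumberOfCells="1">' );
fprintf( file, '%s\n', '<DataArray type="Float64" Name="u" format="ascii">1.0000000000000000e+00 2.0000000000000000e+00' );
fclose( file );

arrayName = 'u';   % name of the DataArray to read (assumed)
n = [];
val = [];
% Match only the DataArray with the requested name; capture the text up to
% the next '<' (or the end of the line)
dataExpr = [ '<DataArray[^>]*Name="', ...
    regexptranslate( 'escape', arrayName ), '"[^>]*>([^<]+)' ];

file = fopen( fileName, 'rt' );
if file == -1
    error( 'Could not open %s', fileName );
end
while ~feof( file )
    str = fgetl( file );
    if ~ischar( str ), break; end   % fgetl returns -1 at end of file
    tok = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens', 'once' );
    if ~isempty( tok )
        n = str2double( tok{1} );
        continue
    end
    tok = regexp( str, dataExpr, 'tokens', 'once' );
    if ~isempty( tok ) && ~isempty( n )
        val = sscanf( tok{1}, '%f', [1 n] );
        break   % stop once the requested array has been read
    end
end
fclose( file );
delete( fileName );   % remove the temporary sample file

% Report problems instead of silently returning partial data
if isempty( n )
    error( 'NumberOfPoints was not found in %s', fileName );
elseif numel( val ) ~= n
    error( 'Expected %d values for "%s" but read %d', n, arrayName, numel( val ) );
end
```

For your real files you would drop the part that writes the sample file and set "fileName" and "arrayName" yourself.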
I hope that this helps with the file parsing.
-Cam
  1 Comment
Elisa Schenone on 3 Aug 2015
Thank you very much, this helps a lot! Now it's finally working. I had to use the second approach since I have several expressions
<DataArray ...
with different variables in my file. If I used the first approach with the whole expression
<DataArray type="Float64" Name="basis"
in order to read the variable named "basis", it did not work, maybe because of the double-quote marks?
Anyway, the second approach works very well.
Thank you,
Elisa
