How to read a text file that has blanks in some columns?

I have a text file in the following format :
Algorithm SNR Q
A 20 30
25 32
30 35
B 20 32
25 34
30 36
I want to read the SNR and Q data for the two algorithms so that I can plot a graph of SNR vs. Q.
The problem is that when I read this text file with MATLAB functions such as textscan, it is not read correctly because of the blank entries in the 'Algorithm' column. For example, I tried to open the file, read it, and display its contents:
fid = fopen('Test.txt');
C = textscan(fid,'%s %d %f');
fclose(fid);
celldisp(C);
The output of the above code is :
C{1}{1} = Algorithm
C{2} = []
C{3} = []
If I remove the 'Algorithm' column from the text file and then read it, everything works fine, but I cannot read the file correctly when that string column is present. Can anyone please help me read such data correctly from a text file? Thanks in advance for your help.

 Accepted Answer

There are quite a few approaches. Here is one example:
fh = fopen('Swati.txt', 'r') ;
fgetl(fh) ; % Skip header.
algs = {} ;
data = {} ;
while ~feof(fh)
    line = fgetl(fh) ;
    if all(line(1:12) == ' ') % No algo. name.
        content = textscan(line, '%f %f') ; % Read only SNR and Q.
        data{end} = [data{end}; [content{:}]] ; % Append to current data.
    else
        content = textscan(line, '%s %f %f') ; % Read name, SNR, and Q.
        algs = [algs; content(1)] ; % Append to names.
        data = [data; {[content{2:3}]}] ; % Create new data entry.
    end
end
fclose(fh) ;
Running this leads to:
>> algs
algs =
{1x1 cell}
{1x1 cell}
>> algs{1}
ans =
'A'
>> data
data =
[3x2 double]
[3x2 double]
>> data{1}
ans =
20 30
25 32
30 35
Please note that it is not the best possible approach/practice, but it is simple enough.
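The check line(1:12) == ' ' assumes that rows without an algorithm name start with at least 12 blanks. As a rough sketch of one alternative, the loop can instead count the whitespace-separated tokens on each line: three tokens start a new algorithm, two tokens continue the current one. The sample file written at the top, its column spacing, and the names tline and tokens are assumptions made for illustration only; they are not taken from the original file.

```matlab
% Write a small sample file (hypothetical layout, for illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Algorithm   SNR  Q\n');
fprintf(fid, 'A           20   30\n');
fprintf(fid, '            25   32\n');
fprintf(fid, '            30   35\n');
fprintf(fid, 'B           20   32\n');
fprintf(fid, '            25   34\n');
fprintf(fid, '            30   36\n');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                               % skip the header line
algs = {};                                % one algorithm name per block
data = {};                                % one N-by-2 [SNR Q] matrix per block
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end         % fgetl returns -1 at end of file
    tline = strtrim(tline);
    if isempty(tline), continue; end      % ignore blank lines
    tokens = strsplit(tline);             % split on runs of whitespace
    if numel(tokens) == 3                 % name, SNR, Q: start a new block
        algs{end+1, 1} = tokens{1};
        data{end+1, 1} = str2double(tokens(2:3));
    elseif numel(tokens) == 2 && ~isempty(data)
        % SNR and Q only: append a row to the current block
        data{end} = [data{end}; str2double(tokens)];
    end
end
fclose(fid);
delete(fname);                            % remove the temporary sample file
```

Unlike the loop above, this variant stores the names as plain character vectors, so algs{1} is 'A' rather than a nested cell. It relies on strsplit, which is not available in older MATLAB releases.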

4 Comments

Hello, thank you so much for writing such a detailed answer with comments. I really appreciate it. I tried your implementation and it works exactly as I wanted. Thanks a lot.
Can you please let me know whether it is possible to separate the columns of data{1} into two different arrays/vectors? For example:
>> data{1}
ans =
20 30
25 32
30 35
For the above output, can I get
x = [20, 25, 30]
y = [30, 32, 35]
so that I can plot the graphs and do the required calculations on the individual columns?
This approach is very direct: read the line, check whether it contains the initial string, and use the string if it exists. This is exactly how I would do it without a computer, too. +1
I assume that [data; {[content{2:3}]}] can be simplified to [data; content(2:3)], or:
data(end+1:end+2) = content(2:3);
I assume in the first case you need "data{end+1}" instead of "data{end}=...".
@Jan: Actually no, the end is correct. I know that it looks weird ;-), but what we do in the first clause is to append a row to the array that is already stored in the last cell of cell array data.
@Swati: Defining x=data{1}(:,1) and y=data{1}(:,2) works. Actually data{1} is the content of cell #1, and it is a regular numeric array. If you don't like the condensed notation, you can use an intermediary variable, e.g.
xy = data{1} ;
x = xy(:,1) ;
y = xy(:,2) ;
@Cedric: Now I see.
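Putting the column extraction together with the plot asked about in the question, a minimal sketch might look like the following. The algs and data variables are filled in by hand here with the values from the question so that the snippet runs on its own; algs is assumed to hold plain character vectors, and the names snrVals and qVals are illustrative.

```matlab
% Data in the same shape as produced by the reading loop (values from the question).
algs = {'A'; 'B'};
data = {[20 30; 25 32; 30 35]; [20 32; 25 34; 30 36]};

figure;
hold on;
for k = 1:numel(data)
    snrVals = data{k}(:, 1);       % first column: SNR
    qVals   = data{k}(:, 2);       % second column: Q
    plot(snrVals, qVals, '-o');    % one curve per algorithm
end
hold off;
xlabel('SNR');
ylabel('Q');
legend(algs, 'Location', 'best');
```

In older MATLAB releases, hold all may be needed instead of hold on to get a different color for each curve.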

More Answers (1)

Comments
  • If you are in control of the program, which produces this text file, you could write the algorithm-value 'A' or 'B' on every line.
  • The file as is can be read with a loop:
while not( feof(fid) )
    str = fgetl( fid )
    parse str
end
In response to comment
My fault; I was too terse.
  • parse str was intended as a hint (pseudo-code), which you should replace with real code. "As is" refers to the file you show in the original post. Cedric provided working code that follows this approach.
  • "'A' or 'B' on every line": My point is that one should write files that are easy to read. (There is a trade-off between easy to read for human and easy to read by a program.)
Here is an example of reading your modified file
>> cac = cssm()
cac =
{5x1 cell} [5x1 double] [5x1 double]
>> cac{1}
ans =
'Algo-A'
'Algo-A'
'Algo-B'
'Algo-B'
'Algo-B'
>> cac{2}
ans =
3
4
5
6
7
>> cac{3}
ans =
16.1200
20.1200
25.1200
31.1200
38.1200
>>
where
function cac = cssm()
    fid = fopen( 'cssm.txt' );
    cac = textscan( fid, '%s%f%f', 'Headerlines', 1 );
    fclose( fid );
end
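Once every row carries its algorithm label, the three columns returned by textscan can be grouped per algorithm, for example with unique. The sketch below is one way to do that, not the only one. It writes a temporary sample file with the same five rows as the output above so that it runs on its own; the names snrByAlgo and qByAlgo are illustrative.

```matlab
% Write a temporary sample file with an algorithm label on every row.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Algo SNR Q\n');
fprintf(fid, 'Algo-A 3 16.12\n');
fprintf(fid, 'Algo-A 4 20.12\n');
fprintf(fid, 'Algo-B 5 25.12\n');
fprintf(fid, 'Algo-B 6 31.12\n');
fprintf(fid, 'Algo-B 7 38.12\n');
fclose(fid);

% Read all three columns in one call, skipping the header line.
fid = fopen(fname, 'r');
cac = textscan(fid, '%s%f%f', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% names: distinct labels (sorted); idx: group number of each row.
[names, ~, idx] = unique(cac{1});
snrByAlgo = cell(numel(names), 1);
qByAlgo   = cell(numel(names), 1);
for k = 1:numel(names)
    rows = (idx == k);              % logical mask for algorithm k
    snrByAlgo{k} = cac{2}(rows);
    qByAlgo{k}   = cac{3}(rows);
end
```

After this, snrByAlgo{k} and qByAlgo{k} hold the SNR and Q vectors for the algorithm named names{k}.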

2 Comments

Hello, Thanks a lot for answering my query. I really appreciate that. I tried to modify my text file and used the method suggested by you as follows:
Modified text file is : (Is this what you meant?)
Algo SNR Q
Algo-A 3 16.12
Algo-A 4 20.12
Algo-B 5 25.12
Algo-B 6 31.12
Algo-B 7 38.12
Then I run the following code:
fid = fopen('Test1.txt', 'r') ;
while not( feof(fid) )
    str = fgetl( fid );
    parse str
end
fclose(fid) ;
The error I get is: Undefined function 'parse' for input arguments of type 'char'.
I am sorry if I didn't implement your suggestion properly; I am new to working with files. Please tell me where I went wrong in implementing your method.
See the addition to my answer.
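For reference, the error arises because parse str is a placeholder, not a MATLAB function. As a hedged sketch, the line-by-line loop could replace that placeholder with a call to textscan on the string that was just read. The temporary sample file and the variable names parts, names, snrVals, and qVals are assumptions for illustration.

```matlab
% Write a temporary sample file matching the modified layout.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Algo SNR Q\n');
fprintf(fid, 'Algo-A 3 16.12\n');
fprintf(fid, 'Algo-A 4 20.12\n');
fprintf(fid, 'Algo-B 5 25.12\n');
fprintf(fid, 'Algo-B 6 31.12\n');
fprintf(fid, 'Algo-B 7 38.12\n');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                                  % skip the header line
names = {};
snrVals = [];
qVals = [];
while true
    str = fgetl(fid);
    if ~ischar(str), break; end              % fgetl returns -1 at end of file
    if isempty(strtrim(str)), continue; end  % ignore blank lines
    parts = textscan(str, '%s %f %f');       % the real "parse str" step
    names{end+1, 1}   = parts{1}{1};         % algorithm label
    snrVals(end+1, 1) = parts{2};            % SNR value
    qVals(end+1, 1)   = parts{3};            % Q value
end
fclose(fid);
delete(fname);
```

For a file this regular, the single textscan call on the file identifier shown in the answer is shorter; the loop is mainly useful when individual lines need special handling.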

Asked:

on 10 Jul 2013