Read mixed numbers in Matlab

Dear Sir/Madam,
I have a data file containing text and mixed number like this: (see the file attached data.txt.)
AA, BB, 28 21/64, 28 45/64,
AA, BB, 1/64, 11/64,
the mixed number have format:
integer space numerator/denominator
I would like to read the data file in matlab as
AA, BB, 28.328125, 28.703125,
AA, BB, 0.015625, 0.171875,
i. e. read mixed numbers and convert them into decimal numbers.
What Matlab command to use? I would greatly appreciate it if you left your code and running output.
I am using MATLAB R2014a.
Thank you

 Accepted Answer

Stephen23
Stephen23 on 23 Dec 2018
Edited: Stephen23 on 23 Dec 2018
opt = {'CollectOutput',true,'Delimiter',','};
fmt = repmat('%s',1,4);
[fid,msg] = fopen('data.txt','rt');
assert(fid>=3,msg);
C = textscan(fid,fmt,opt{:})
fclose(fid);
C = C{1}; % raw character data in a cell array.
% Convert fractions to numeric:
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)]);
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
M = cellfun(baz,C(:,3:4))
Giving:
M =
28.328125 28.703125
0.031250 0.046875
and the string data is in C. If you want the numeric data reinserted back into C then use num2cell:
>> C(:,3:4) = num2cell(M)
C =
'AA' 'CC' 28.328125 28.703125
'AA' 'CC' 0.031250 0.046875

14 Comments

Thank you very much Stephen,
When I run your code I get as output:
M =
28.3281 28.7031
0.0313 0.0469
C =
'AA' 'CC' [28.3281] [28.7031]
'AA' 'CC' [ 0.0313] [ 0.0469]
M is perfect, however C has Square Brackets, see above
[ 28.3281 ]
How to remove the Square Brackets
[ ]
By the way, I am using MATLAB R2014a
Thank you again
Stephen23
Stephen23 on 23 Dec 2018
Edited: Stephen23 on 23 Dec 2018
@John Smith: the square brackets are just an artifact of how the numeric data is displayed, it does not mean that the numeric data somehow magically has square brackets around it. How data is displayed is not the same thing as what data is actually stored in memory: in this case, C is a cell array and so numeric scalars in any cell are helpfully displayed with square brackets around them to indicate that they are numeric values (and not strings or char arrays or something else).
Try it yourself:
D = {1}
There are NO square brackets in the data, it is just how it is displayed. You do NOT need to remove them because there is NOTHING THERE to remove. Exactly the same thing applies to the character vectors, which have single quotes to indicate that they are character vectors:
'this is a char vector'
But those single quotes are also just an artifact of how the character vector is displayed, and they cannot be removed because they are NOT actually in the character vector itself.
Learn about cell arrays:
Thank you very much Stephen,
I just modified your code to my real data, I am attaching the data file as ZN_CALL.txt .
For the data file ZN_CALL.txt, I need skip first two rows(lines), Then I need to extract
4th column, 8th column and 9th column, all other columns are not important.
The following is modified code:
opt = {'CollectOutput',true,'Delimiter',','}
fmt = repmat('%s',1,10) %% 'FORMAT'
% %s is used to print output that formated as string. %s represents character vector(containing letters)
%% repmat Replicate and tile an array. 1, 4 mean 1 row 4 column, %s%s%s%s
[fid,msg] = fopen('ZN_CALL.txt','rt') %%% r Open file for reading (default). To open in text mode, add "t" to the permission string, for example 'rt' and 'wt+'.
assert(fid>=3,msg); %%% make sure fid >=3
C = textscan(fid,fmt,opt{:},'headerlines',2)
disp(C)
fclose(fid);
C = C{1} % show raw character data in a cell array.
% Convert fractions to numeric:
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)])
baz = @(s) foo(sscanf(s,'%d%*[ /]'))
M = cellfun(baz,C(:,4,8:9))
disp(M)
%%% and the string data is in C. If you want the numeric data reinserted back into C then use num2cell:
C(:,3:4) = num2cell(M)
C(2,2)
C = C(2,3) % show raw character data in a cell array.
I receive the folloing error:
foo =
@(v)sum([v(1:end-2),v(end-1)/v(end)])
baz =
@(s)foo(sscanf(s,'%d%*[ /]'))
Index exceeds matrix dimensions.
Error in StephenCobeldick_ZN_data (line 22)
M = cellfun(baz,C(:,4,8:9))
Thank you for your time and knowledge.
Stephen23
Stephen23 on 24 Dec 2018
Edited: Stephen23 on 24 Dec 2018
You will need to use indexing to select only the relevant cells of the imported data:
opt = {'CollectOutput',true,'Delimiter',',','HeaderLines',2};
fmt = repmat('%s',1,10);
[fid,msg] = fopen('ZN_CALL.txt','rt');
assert(fid>=3,msg);
C = textscan(fid,fmt,opt{:});
fclose(fid);
C = C{1};
% eighth & ninth columns:
tmp = C(:,8:9);
out = str2double(tmp);
idx = cellfun('length',tmp)>2;
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)]);
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
out(idx) = cellfun(baz,tmp(idx));
% fourth column:
out = [str2double(C(:,4)),out];
This gives the three columns that you requested in one numeric array out:
>> out
out =
92.50000 28.32812 28.70312
93.00000 28.04688 28.42188
93.50000 27.32812 27.70312
94.00000 26.82812 27.20312
94.50000 26.54688 26.92188
95.00000 26.04688 26.42188
95.50000 25.32812 25.70312
96.00000 25.04688 25.42188
96.50000 24.32812 24.70312
97.00000 24.04688 24.42188
97.50000 23.54688 23.92188
98.00000 23.04688 23.42188
98.50000 22.54688 22.92188
99.00000 22.04688 22.42188
99.50000 21.32812 21.70312
100.00000 21.04688 21.42188
100.50000 20.54688 20.92188
101.00000 20.04688 20.42188
101.50000 19.54688 19.92188
102.00000 19.04688 19.42188
102.50000 18.32812 18.70312
103.00000 18.04688 18.42188
103.50000 17.54688 17.92188
104.00000 17.04688 17.42188
104.50000 16.54688 16.92188
105.00000 15.82812 16.20312
105.50000 15.32812 15.70312
106.00000 14.82812 15.20312
106.50000 14.32812 14.70312
107.00000 13.82812 14.20312
107.50000 13.54688 13.92188
108.00000 12.82812 13.20312
108.50000 12.54688 12.70312
109.00000 11.82812 12.20312
109.50000 11.32812 11.70312
110.00000 11.04688 11.42188
110.50000 10.54688 10.92188
111.00000 10.04688 10.42188
111.50000 9.54688 9.92188
112.00000 9.04688 9.42188
112.50000 8.54688 8.92188
113.00000 8.04688 8.42188
113.50000 7.54688 7.92188
114.00000 7.04688 7.42188
114.50000 6.54688 6.92188
115.00000 6.04688 6.42188
115.50000 5.54688 5.92188
116.00000 5.04688 5.21875
116.50000 4.54688 4.71875
117.00000 4.06250 4.23438
117.50000 3.56250 3.73438
118.00000 3.06250 3.25000
118.50000 2.64062 2.68750
119.00000 2.17188 2.21875
119.50000 1.71875 1.76562
120.00000 1.29688 1.34375
120.50000 0.95312 0.98438
121.00000 0.67188 0.70312
121.50000 0.46875 0.48438
122.00000 0.31250 0.34375
122.50000 0.21875 0.25000
123.00000 0.17188 0.18750
123.50000 0.12500 0.15625
124.00000 0.10938 0.12500
124.50000 0.07812 0.10938
125.00000 0.06250 0.09375
125.50000 0.06250 0.07812
126.00000 0.04688 0.06250
126.50000 0.03125 0.06250
127.00000 0.03125 0.04688
127.50000 0.01562 0.04688
128.00000 0.01562 0.03125
128.50000 0.01562 0.03125
129.00000 NaN 0.03125
129.50000 NaN 0.03125
130.00000 NaN 0.01562
130.50000 NaN 0.01562
131.00000 NaN NaN
131.50000 NaN NaN
132.00000 NaN NaN
132.50000 NaN NaN
133.00000 NaN NaN
133.50000 NaN NaN
134.00000 NaN NaN
134.50000 NaN 0.00000
135.00000 NaN 0.00000
135.50000 NaN 0.00000
136.00000 NaN 0.00000
136.50000 NaN 0.00000
137.00000 NaN 0.00000
137.50000 NaN 0.00000
138.00000 NaN 0.00000
138.50000 NaN 0.00000
139.00000 NaN 0.00000
139.50000 NaN 0.00000
140.00000 NaN 0.00000
140.50000 NaN 0.00000
141.00000 NaN 0.00000
141.50000 NaN 0.00000
142.00000 NaN 0.00000
142.50000 NaN 0.00000
143.00000 NaN 0.00000
143.50000 NaN 0.00000
144.00000 NaN 0.00000
144.50000 NaN 0.00000
145.00000 NaN 0.00000
145.50000 NaN 0.00000
146.00000 NaN 0.00000
Note that blank cells are converted to NaN.
PS: Note that your indexing is incorrect:
C(:,4,8:9)
^ All of first dimension (rows)
^ 4th of second dimension (columns)
^^^ 8th & 9th of third dimension (pages)
To get the 4th, 8th and 9th columns then you would need to create a vector of those indices:
C(:,[4,8:9])
^ All of first dimension (rows)
^^^^^^^ 4th, 8th & 9th of second dimension (columns)
Dear Stephen,
Thank you very much for the code.
Could you please make your code also apply to decimal numbers (as well as mixed number). The reason is that some data has mixed numbers, some data use decimal numbers. I would like have one code fits all.
I am sending you the data ZN_CALL_NEW.txt, note the last row has decimal numbers on 8th, 9th columns.
when I just run I get:
Attempted to access v(0); index must be a positive integer or logical.
Error in @(v)sum([v(1:end-2),v(end-1)/v(end)])
Error in @(s)foo(sscanf(s,'%d%*[ /]'))
Error in StephenCobeldick_ZN_data_3 (line 14)
out(idx) = cellfun(baz,tmp(idx));
Thank you
Stephen23
Stephen23 on 24 Dec 2018
Edited: Stephen23 on 24 Dec 2018
Just change the index condition to this (the rest of the code stays the same):
idx = cellfun('length',tmp)>2 & isnan(out);
With your example data file it gives this:
>> out
out =
92.50000 NaN NaN
93.00000 NaN NaN
93.50000 NaN NaN
94.00000 NaN NaN
94.50000 NaN NaN
95.00000 NaN NaN
... lots of lines here
143.50000 NaN 0.00000
144.00000 NaN 0.00000
144.50000 NaN 0.00000
145.00000 NaN 0.00000
145.50000 NaN 0.00000
146.00000 18.27000 25.32000
Dear Stephen,
Thank you very much,
with this code ,I got error
opt = {'CollectOutput',true,'Delimiter',',','HeaderLines',2};
fmt = repmat('%s',1,10);
[fid,msg] = fopen('ZN_CALL_NEW.txt','rt');
assert(fid>=3,msg);
C = textscan(fid,fmt,opt{:});
fclose(fid);
C = C{1};
% eighth & ninth columns:
tmp = C(:,8:9);
out = str2double(tmp);
%%%%%%idx = cellfun('length',tmp)>2;
idx = cellfun('length',tmp)>2 && isnan(out);
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)]);
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
out(idx) = cellfun(baz,tmp(idx));
% fourth column:
out = [str2double(C(:,4)),out]
Operands to the || and && operators must be convertible to logical scalar values.
Error in StephenCobeldick_ZN_data_4 (line 12)
idx = cellfun('length',tmp)>2 && isnan(out);
Stephen23
Stephen23 on 24 Dec 2018
Edited: Stephen23 on 24 Dec 2018
idx = cellfun('length',tmp)>2 & isnan(out);
% ^ only one &!
Dear Stephen,
Your code is perfect,
Could you please explain your code , especially
idx = cellfun('length',tmp)>2 & isnan(out);
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)]);
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
out(idx) = cellfun(baz,tmp(idx));
what do you do with
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
what is %*[ /]' ?
what do you mean with @(v) and @(s) ?
thank you
Stephen23
Stephen23 on 24 Dec 2018
Edited: Stephen23 on 30 Dec 2018
idx = cellfun('length',tmp)>2 & isnan(out);
Checks if the length of each character vector in the cell array tmp is greater than two AND where the elements of out are NaN (i.e. the elements of tmp do not encode scalar numbers). So this will only be TRUE for values that are non-empty and that are not scalar numeric. It will therefore be FALSE where the cell is empty (or has length <=2) OR it has already been converted to a scalar numeric by str2double.
foo = @(v) sum([v(1:end-2),v(end-1)/v(end)]);
Accepts a numeric vector v of two or three elements, and converts this into a scalar numeric. The final two elements of the vector form a fraction, which is then summed with any preceding elements of the vector.
baz = @(s) foo(sscanf(s,'%d%*[ /]'));
Converts a string/character vector representing numbers into a numeric vector, e.g.
>> sscanf('7 2/3','%d%*[ /]')
ans =
7
2
3
>> sscanf('5/6','%d%*[ /]')
ans =
5
6
and then applies the function foo to this numeric vector.
out(idx) = cellfun(baz,tmp(idx));
Applies the function baz to cells of the cell array tmp that were selected by the index idx. Thus selects the character vectors that could not be converted by str2double and are not blank, converts to numeric vector, calculates the fraction, sums, and assigns the output numeric values into numeric matrix out.
"what do you do with baz = @(s) foo(sscanf(s,'%d%*[ /]')) "
Define a function that uses sscanf to convert a character vector into a numeric vector, and then applies foo to that numeric vector.
"what is %*[ /]' ?"
Tells sscanf to ignore space characters and forward slash characters.
"what do you mean with @(v) and @(s) ?"
Define anonymous functions:
John Smith
John Smith on 24 Dec 2018
Edited: John Smith on 24 Dec 2018
Dear Stephen,
Thank you very much for your explanations.
I try these
D1={ }
cellfun('length',D1)
D2={23.5}
cellfun('length',D2)
D3={5/64}
cellfun('length',D3)
D4={28 1/64}
cellfun('length',D4)
I get from MATLAB
D1 =
{}
ans =
[]
D2 =
[23.5000]
ans =
1
D3 =
[0.0781]
ans =
1
D4 =
[28] [0.0156]
ans =
1 1
Are the answer from MATLAB correct? no answer is great than 2 ?
Stephen23
Stephen23 on 25 Dec 2018
Edited: Stephen23 on 26 Dec 2018
"Are the answer from MATLAB correct?"
Yes.
>> D1 = {}; % an empty cell array, so there are no arrays inside of it.
>> cellfun('length',D1) % no arrays -> no lengths -> empty output.
ans = []
>> D2 = {23.5}; % a 1x1 cell array containing a scalar numeric.
>> cellfun('length',D2) % a scalar numeric has length 1.
ans = 1
>> D3 = {5/64}; % a 1x1 cell array containing a scalar numeric.
>> cellfun('length',D3) % a scalar numeric has length 1.
ans = 1
>> D4 = {28,1/64}; % a 1x2 cell array, each cell contains a scalar numeric.
>> cellfun('length',D4) % two scalar numerics, each has length 1.
ans =
1 1
"no answer is great than 2 ?"
Correct, none of those cell arrays contain arrays with length greater than two. In fact all of the lengths are one because you defined all of the arrays in every cell to be scalar. Scalars have length one:
>> length(23.5)
ans = 1
A cell array is just a container for holding other arrays. This is useful because sometimes we need to hold together other arrays of different sizes, types, or dimensions. In your examples, you have cell arrays of different sizes (with zero, one, or two cells), and you placed a scalar numeric in every cell. The length of a scalar is one, so cellfun('length',...) carefully measures the length of the array in each cell and returns that answer for every cell of the cell arrays you defined.
Dear Stephen,
I would like to ask you the following question:
I have a data file like this
tmp =
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438
I would like re-group (or reduce) it with following conditions:
For any row, if column 1 AND column 3 of this row is identical with any column 1 AND column 3 of any other row. Then reduce to one new row with new value of column 2, this new value of column 2 is the sum of original values of column 2. Column 1 is kept the same, Column 4 is not important.
So for above data, I expect to have the answer:
119 5 6832 0.20313 % 5+4-4=5
122 -48 6853 0.29688
125 -8 6853 0.4375 % -12+4=-8
121 12 6914 0.5625
119 48 6914 0.17188
Thank you very much
@John Smith: that is quite a different topic. Please ask a new question about that.

Sign in to comment.

More Answers (2)

Here's some code to read the data in as a table, then update the mixed number values in each row to the decimal...
T = readtable("C:\Users\MATLAB\Desktop\data.txt",'Delimiter',',','ReadVariableNames',false)
numRows = size(T,1);
for i = 1:numRows
for j=3:4
tmpstr = T{i,j};
tmpstr=tmpstr{:};
tmpstr=strrep(tmpstr, '/',' ');
mixedVal = sscanf(tmpstr,'%d');
if size(mixedVal,1) == 2
decVal = mixedVal(1)/mixedVal(2);
else
decVal = mixedVal(1) + mixedVal(2)/mixedVal(3);
end
T(i,j) = {num2str(decVal)};
end
end
disp(T)

1 Comment

Thank you Brian,
The code give me those errors: (I am using Matlab R2014a)

Sign in to comment.

Guillaume
Guillaume on 22 Dec 2018
Edited: Guillaume on 24 Dec 2018
I would recommend that you modify whatever is creating these files so that it creates files that don't have such an unusual format.
As it is, the following should work but is not particularly good code:
data = readtable('data.txt', 'Delimiter', ',', 'ReadVariableNames', false);
data = [data(:, [1 2]), ... %leave text variables unchanged
varfun(@(var) cellfun(@(var) str2num(strrep(var, ' ', '+')), var), data, 'InputVariables', [3 4])]; %convert 'numeric' variables
It simply replaces the space by + and use num2str to parse the expression which is simple but dangerous.
edited to add missing input to cellfun and wrong function use

3 Comments

Thank you Guillaume,
your code give those errors: (I am using Matlab R2014a)
Error using cellfun
Not enough input arguments
Sorry,forgot the input to cellfun. Fixed now.
In the future, please copy/paste the error as text rather than a screenshot.
Note: the advantage of my answer over the other proposed ones, is that it will parse correctly
28 21/64
21/64
28
John wrote in a comment now deleted: "Could you please make your code also apply to decimal numbers (as well as mixed number). The reason is that some data has mixed numbers, some data use decimal numbers. I would like have one code fits all. "
As I pointed out, my answer already does that. In a much simpler way. Also note that the numeric columns are stored as vectors which is more memory efficient and easier to use than a cell array of scalars.

Sign in to comment.

Categories

Asked:

on 22 Dec 2018

Commented:

on 30 Dec 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!