Import txt file and pick the values after the selected key words using Regular Expression

Hello everybody,
I have a large, non-homogeneous text file. I need to find certain key words in the text and then pick the values next to them. Here is one part of the text file:
sfhafjlhakjfhahfaoh(some text before)
LAW NUMBER
8907 0 1 0
-1876.98 11440 1
8 2 2 2 7 8
LAW TYPE:
152 0 1
7 8
163 154 155 156
Geomaterial_2 - solid PHASE
ELASTIC CONSTITUTIVE LAW FOR SOLID ELEMENTS
AT CONSTANT TEMPERATURE
USE OF EFFECTIVE STRESSES. ISOL = 1
NUMBER OF SUBINTERVALS.... NINTV= 1
YOUNG'S MODULUS .......... = 0.200000E+08
POISSON'S RATIO .......... = 0.300000
SPECIFIC MASS AS A MATERIAL LAW,
RHO ...................... = 2670.00
LAW NUMBER
288 0 1 0
-13.45 110 1
8 2 2 2 9 8
LAW TYPE:
171 0 1
5 6
173 174 175 179
Geomaterial_2 - liquid PHASE
WATER-AIR SEEPAGE- VAPOR -THERMAL COUPLED
CONSTITUTIVE LAW FOR SOLID ELEMENTS
ISOTROPIC CASE IANI = 0
FORMULATION INDEX FOR krw IKW = 0
FORMULATION INDEX FOR kra IKA = 0
I need the first value after each key word, and the values on the third line after the key words 'LAW NUMBER' and 'LAW TYPE'. So in this case two vectors will be created, Lawnumber=[8907 288] and Lawtype=[152 171], plus two matrices built from the third lines: [8 2 2 2 7 8; 8 2 2 2 9 8] for LAW NUMBER and [163 154 155 156; 173 174 175 179] for LAW TYPE.
Mr. Oleg Komarov suggested that I use regexp. His code is simple and powerful; here is the link: http://www.mathworks.com/matlabcentral/answers/13585-find-the-key-word-in-the-text-file-then-pick-the-value-next-to-it
Here is the code:
% Import the whole file at once
fid = fopen('test.txt','r');
text = textscan(fid,'%s','Delimiter','','endofline','');
text = text{1}{1};
fid = fclose(fid);
% Parse with regexp
tk = regexp(text,'LAW NUMBER[\s\.=]+(\d+)|LAW TYPE[:\s]+(\d+)','tokens');
% tk = regexp(text,'LAW TYPE\s+(\d+ ){2}(?:[^\n]+\n){2}(\d+ )+','tokens'); Optional code
% textscan([tk{1}{:}],'%f') Optional code
% Convert to double
tk = reshape(str2double([tk{:}]),2,[])
It works very well for getting the first value after the key words, but the optional code does not work well, and so far I have not managed to get the third line. Could someone improve it and help me out?
Thank you very much.
Gringoire
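For reference, the third line after each key word can also be captured with a single regexp call by skipping whole lines with `[^\n]*\n`. The following is only a sketch under stated assumptions: the sample text is copied from the excerpt above, the file is assumed to use `\n` line endings, and the helper names `pat`, `firstOf` and `thirdOf` are hypothetical, not part of the original code.

```matlab
% Sample text taken from the excerpt in the question (two blocks only).
txt = sprintf('%s\n', ...
    'LAW NUMBER', '8907 0 1 0', '-1876.98 11440 1', '8 2 2 2 7 8', ...
    'LAW TYPE:',  '152 0 1',    '7 8',              '163 154 155 156', ...
    'LAW NUMBER', '288 0 1 0',  '-13.45 110 1',     '8 2 2 2 9 8', ...
    'LAW TYPE:',  '171 0 1',    '5 6',              '173 174 175 179');

% Hypothetical helper: build the pattern for one key word.
% Token 1 = first integer on the line after the key word,
% token 2 = the whole third line after the key word.
pat = @(key) [key '[^\n]*\n\s*(\d+)[^\n]*\n[^\n]*\n([^\n]*)'];

tkNum = regexp(txt, pat('LAW NUMBER'), 'tokens');
tkTyp = regexp(txt, pat('LAW TYPE'),   'tokens');

% Hypothetical helpers: convert the tokens to numbers.
firstOf = @(tk) cellfun(@(t) str2double(t{1}), tk);
thirdOf = @(tk) cell2mat(cellfun(@(t) sscanf(t{2}, '%f').', tk(:), ...
                                 'UniformOutput', false));

Lawnumber    = firstOf(tkNum);   % [8907 288]
Lawtype      = firstOf(tkTyp);   % [152 171]
LawnumberMat = thirdOf(tkNum);   % [8 2 2 2 7 8; 8 2 2 2 9 8]
LawtypeMat   = thirdOf(tkTyp);   % [163 154 155 156; 173 174 175 179]
```

`cell2mat` only works here because every third line under a given key word has the same number of values; if that does not hold in the full file, the rows would have to stay in a cell array.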

Accepted Answer

Oleg Komarov on 13 Aug 2011
This time a slightly different solution:
fid = fopen('test.txt','r');
text = textscan(fid,'%s','Delimiter','');
text = text{1};
fid = fclose(fid);
% Parse LAW NUMBER
idx = find(~cellfun('isempty',strfind(text,'LAW NUMBER'))) + 1;
LW = cellfun(@(x) textscan(x,'%f%*[^\n]'),text(idx),'un',0);
LW = cell2mat(cat(1,LW{:}));
LWm = cellfun(@(x) textscan(x,'%f'),text(idx+2),'un',0);
LWm = cell2mat([LWm{:}]).';
% Parse LAW TYPE
idx = find(~cellfun('isempty',strfind(text,'LAW TYPE'))) + 1;
LT = cellfun(@(x) textscan(x,'%f%*[^\n]'),text(idx),'un',0);
LT = cell2mat(cat(1,LT{:}));
LTm = cellfun(@(x) textscan(x,'%f'),text(idx+2),'un',0);
LTm = cell2mat([LTm{:}]).';
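The line-index idea above (find the key-word lines, then read the lines at offsets +1 and +3) can be tried without a file. This is a minimal sketch, assuming the lines are already in a cell array as `textscan` would return them; the variable names `lines`, `LawNumber` and `LawNumberMat` are illustrative only, and `sscanf` is swapped in for `textscan` to keep the demo short.

```matlab
% Sample lines copied from the question, one cell per line.
lines = { ...
    'LAW NUMBER'
    '8907 0 1 0'
    '-1876.98 11440 1'
    '8 2 2 2 7 8'
    'LAW TYPE:'
    '152 0 1'
    '7 8'
    '163 154 155 156'
    'LAW NUMBER'
    '288 0 1 0'
    '-13.45 110 1'
    '8 2 2 2 9 8'
    'LAW TYPE:'
    '171 0 1'
    '5 6'
    '173 174 175 179'};

% Index of the line right after each 'LAW NUMBER' line.
idx = find(~cellfun('isempty', strfind(lines, 'LAW NUMBER'))) + 1;

% First number on that line.
LawNumber = cellfun(@(s) sscanf(s, '%f', 1), lines(idx));      % [8907; 288]

% All numbers on the third line after the key word (idx + 2).
rows = cellfun(@(s) sscanf(s, '%f').', lines(idx + 2), 'UniformOutput', false);
LawNumberMat = vertcat(rows{:});             % [8 2 2 2 7 8; 8 2 2 2 9 8]
```

Replacing 'LAW NUMBER' with 'LAW TYPE' gives the second vector and matrix in the same way. The sketch assumes every key-word line is followed by at least three more lines; otherwise `idx + 2` would index past the end of the cell array.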

More Answers (1)

Fangjun Jiang on 13 Aug 2011
For a file like this, I prefer using fgetl().
NumbCount=0;
TypeCount=0;
fid=fopen('test.txt');
fline=fgetl(fid);
while ~feof(fid)
    if strfind(fline,'LAW NUMBER')
        NumbCount=NumbCount+1;
        fline=fgetl(fid);
        Temp=sscanf(fline,'%d');
        LawNumber(NumbCount)=Temp(1);
        fline=fgetl(fid);
        fline=fgetl(fid);
        Temp=sscanf(fline,'%d');
        LawNumberMatrix(NumbCount,1:6)=Temp(1:6);
    elseif strfind(fline,'LAW TYPE')
        TypeCount=TypeCount+1;
        fline=fgetl(fid);
        Temp=sscanf(fline,'%d');
        LawType(TypeCount)=Temp(1);
        fline=fgetl(fid);
        fline=fgetl(fid);
        Temp=sscanf(fline,'%d');
        LawTypeMatrix(TypeCount,1:4)=Temp(1:4);
    end
    fline=fgetl(fid);
end
fclose(fid);
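One detail of the loop above: because a line is read before the loop and again at the bottom of it, the line read last may never be tested when the loop condition is `~feof(fid)`. That is harmless for this file, where key words are followed by more lines. A variant that tests the return value of `fgetl` instead is sketched below; it is only an illustration, the temporary file created with `tempname` exists just to make the demo self-contained, and only the LAW NUMBER branch is shown.

```matlab
% Write a tiny sample file (demo only; the real data would be test.txt).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'some text before', 'LAW NUMBER', '8907 0 1 0', ...
        '-1876.98 11440 1', '8 2 2 2 7 8');
fclose(fid);

fid = fopen(fname, 'r');
LawNumber = [];
LawNumberRows = {};
fline = fgetl(fid);
while ischar(fline)              % fgetl returns -1 (not char) at end of file
    if ~isempty(strfind(fline, 'LAW NUMBER'))
        first = fgetl(fid);      % line 1 after the key word
        fgetl(fid);              % line 2 after the key word, skipped
        third = fgetl(fid);      % line 3 after the key word
        LawNumber(end+1) = sscanf(first, '%f', 1);
        LawNumberRows{end+1, 1} = sscanf(third, '%f').';
    end
    fline = fgetl(fid);
end
fclose(fid);
delete(fname);                   % remove the demo file
```

Storing the third lines in a cell array avoids hard-coding the number of columns (6 and 4 in the code above); `vertcat(LawNumberRows{:})` turns them into a matrix when all rows have equal length. The sketch assumes each key word is followed by at least three lines in the file.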
  2 Comments
gringoire on 14 Aug 2011
Thank you very much. Your code works well. In fact, I had used fgetl before too, but I did not know about the feof function. It seems very useful. :P Have a good day.
gringoire on 15 Aug 2011
Hello, this is a comment on today's question. I don't know what happened, but your answer is missing from the site.
In fact, I am trying to use the code you gave me yesterday. The mistake was mine: I copied the wrong file. F = cell2mat(textscan(fid,priformat1)) should be F = cell2mat(textscan(st,priformat1)).
I want to loop over each line to get the values. But as Mr. Oleg said, %d cannot recognize a fixed-width file...
Thanks for your help indeed.
