MATLAB Answers

Pull out strings and their values from a text file.


Accepted Answer

Guillaume on 11 Jun 2019
Hi Sriram, sorry I was away last week.
Parsing the first part of each message (date, level, source) is trivial. It's the part after that which is difficult, due to the variations in format. I don't fully understand the algorithm you've written, and I don't think you can use ':' indiscriminately as a delimiter. For example, on line 2 it's part of https://www....
Here is how I would start the parsing:
filecontent = string(fileread('File.txt')); %read whole file as STRING (for easier text comparison later)
messages = regexp(filecontent, '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)', 'names', 'lineanchors'); %parse all lines according to common format
dates = num2cell(datetime([messages.date], 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZZZZZ', 'TimeZone', 'UTC')); %decode date
[messages.date] = dates{:}; %and put back into structure
%parsing of kernel messages
iskernel = [messages.source] == "kernel";
parsedkernel = regexp([messages(iskernel).content], '\[\s*(?<cputime>[^\]]+)]\s+(?<message>.*)', 'names'); %parse kernel messages. Not sure of the rule
parsedkernel = [parsedkernel{:}]; %convert into structure array
cputime = num2cell(str2double([parsedkernel.cputime])); %convert cputime to numeric
[parsedkernel.cputime] = cputime{:}; %and put back into structure
parsedkernel = num2cell(parsedkernel); %convert to cell array to put back into messages structure
[messages(iskernel).content] = parsedkernel{:};
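To illustrate the point about ':' as a delimiter, here is a small sketch on a made-up log line. The line itself is an assumption, invented only to fit the common-format expression above; the real file may well differ. Splitting on every colon cuts through both the timestamp and the URL, whereas the named-token expression keeps the four fields intact.

```matlab
% Hypothetical log line, invented for illustration only
sample = "2019-05-20T10:15:30.123456+00:00 INFO updater: fetching https://www.example.com/index";

% Naive approach: split on every colon.
% The timestamp and the URL both contain colons, so they get cut apart.
pieces = split(sample, ":");

% Common-format expression from the answer above
pattern = '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)';
parts = regexp(sample, pattern, 'names', 'lineanchors');

% The timestamp decodes with the same format string as above
when = datetime(parts.date, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZZZZZ', 'TimeZone', 'UTC');
```

Note that (?<source>[^:]+) only works because the date and level fields are matched first with [^ ]+, so the colons inside the timestamp never reach the source token.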

  6 Comments

Guillaume on 14 Jun 2019
So which version?
Same code to work with char arrays instead of strings:
filecontent = fileread('File.txt'); %read whole file as a char vector
messages = regexp(filecontent, '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)', 'names', 'lineanchors'); %parse all lines according to common format
dates = num2cell(datetime({messages.date}, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZZZZZ', 'TimeZone', 'UTC')); %decode date
[messages.date] = dates{:}; %and put back into structure
%parsing of kernel messages
iskernel = strcmp({messages.source}, 'kernel');
parsedkernel = regexp({messages(iskernel).content}, '\[\s*(?<cputime>[^\]]+)]\s+(?<message>.*)', 'names'); %parse kernel messages. Not sure of the rule
parsedkernel = [parsedkernel{:}]; %convert into structure array
cputime = num2cell(str2double({parsedkernel.cputime})); %convert cputime to numeric
[parsedkernel.cputime] = cputime{:};
parsedkernel = num2cell(parsedkernel); %convert to cell array to put back into messages structure
[messages(iskernel).content] = parsedkernel{:};
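One thing to watch with the code above: if a kernel line does not follow the '[cputime] message' layout, regexp returns an empty struct for it, [parsedkernel{:}] silently drops it, and the final assignment then errors because the number of values no longer matches the number of kernel messages. Below is a minimal sketch of one way to guard against that, assuming it is acceptable to leave such lines as plain text. The two sample contents are made up.

```matlab
% Hypothetical kernel contents; the second one lacks the [cputime] prefix
contents = {'[   12.345678] usb 1-1: new device', 'message without timestamp'};

pattern = '\[\s*(?<cputime>[^\]]+)]\s+(?<message>.*)';
parsed = regexp(contents, pattern, 'names');   % one struct (possibly empty) per cell

matched = ~cellfun(@isempty, parsed);          % which lines followed the pattern
for k = find(matched)
    parsed{k}.cputime = str2double(parsed{k}.cputime);  % convert cputime to numeric
end
parsed(~matched) = contents(~matched);         % keep unmatched lines as plain text
```

Because parsed keeps one cell per input line, it can be assigned back with [messages(iskernel).content] = parsed{:}; without a count mismatch.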
Guillaume on 14 Jun 2019
Sriram's comment mistakenly posted as an answer (please use comments!):
Thanks a lot. It works.
Guillaume on 14 Jun 2019
Then consider changing your accepted answer, particularly after all the hard work that has gone into getting you there.


More Answers (1)

Dimitar Georgiev on 26 May 2019
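data = readcell('filename.xlsx','Range','......'); %named "data" rather than "cell" to avoid shadowing the built-in cell function
stringname = '......';
variable = strcmp(stringname,data); %logical array, true where an element of data equals stringname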

  12 Comments

Matlab on 30 May 2019
Thanks! Yes, I agree with your algorithm style. Can you please give me a sample code write-up?
Guillaume on 1 Jun 2019
As I wrote:
So, I'm afraid, the task is back on you. You first need to define rules (there are going to be several, due to the complex formatting of the lines) for how to split a line into its various components. Only once you've done that can we think about writing the code to do it.
