MATLAB Answers

How to parse text data

Matlab on 17 Jul 2019
Commented: Matlab on 2 Aug 2019
Hi
I have data in the format below. I need a mechanism to parse it and produce the expected output.
Input data format:
07/16 12:55:22.012 INFO | test_runner_utils:0812| Began logging to /tmp/test_that_results_hatch_deL3lZ
07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file
Expected Output format:
Define level of message extraction based on the marker sign ==> |
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>| extract full text in a variable, option to grab variable if associated with value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|extract full text in a variable, option to grab variable if associated with value
-Step 5: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>|
-Step 6: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<string>|extract full text in a variable, option to grab variable if associated with value
Input data format:
07/16 12:55:27.620 DEBUG| utils:0287| [stdout] CHROMEOS_RELEASE_BOARD=hatch
07/16 13:28:58.330 INFO | mode_switcher:0673| -[FAFT]-[ start wait_for_client ]---
Expected Output format:
-Step 1: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 2: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> extract full text in a variable, option to grab variable if associated with value
-Step 3: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]>
-Step 4: Extract Timestamp in mm/dd HH:MM:sec.millisec <string>|<string>:%1.3f|<[string]> [string] extract full text in a variable, option to grab variable if associated with value
Input data format:
2019-07-16 12:55:30 > string
2019-07-16 12:55:30 powerbtn: released
Expected Output format:
Note the marker >
-Step 1: Extract Timestamp in YYYY-MM-DD HH:MM:sec > <string>
-Step 2: Extract Timestamp in YYYY-MM-DD HH:MM:sec <full string>
Input data format:
2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]
Expected Output format:
-Step 1: Extract Timestamp in YYYY-MM-DD HH:MM:sec > [<%1.3f string, extract full text in a variable, option to grab variable if associated with value>]
Thanks a lot
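For the second log format, splitting on the > marker could look roughly like the sketch below. It assumes each log line is already available as a character vector; the pattern, the field names (stamp, marker, text) and the variable names are illustrative choices, not part of the original data description.

```matlab
% Sample lines copied from the question.
lines = {'2019-07-16 12:55:30 > string';
         '2019-07-16 12:55:30 powerbtn: released';
         '2019-07-16 12:55:31 > [12074.734997 HC 0x121 err 1]'};

% Timestamp, optional '>' marker, then the rest of the line.
pattern = '^(?<stamp>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (?<marker>[>]?) ?(?<text>.*)$';
parts = regexp(lines, pattern, 'names', 'once');   % cell array of structs
parts = [parts{:}];                                % 1x3 struct array

stamps    = datetime({parts.stamp}, 'InputFormat', 'yyyy-MM-dd HH:mm:ss');
hasMarker = ~cellfun(@isempty, {parts.marker});    % true where '>' was present

% Optionally grab the leading number of a bracketed message.
numTok = regexp(parts(3).text, '^\[(\d+\.\d+)', 'tokens', 'once');
value  = str2double(numTok);
```

Lines without the marker simply end up with an empty marker field, so both variants can go through the same pattern.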

  5 Comments

Guillaume on 19 Jul 2019
I've not looked at this question in detail. Does the file format differ much from the one in your previous question?
If not, it should be fairly trivial to adapt the parser I wrote, which would be a lot less effort than starting again from scratch.
Matlab on 19 Jul 2019
In the previous one the timestamp was in UTC format; in this one it is in mm/dd format.
Can you please help me?
Thanks
Matlab on 23 Jul 2019
Any feedback?


Accepted Answer

Guillaume on 23 Jul 2019
Edited: Guillaume on 23 Jul 2019
Are you still on a very old version (please fill in the release field next to the question)? If you are on a modern version, the file can easily be read with:
VariableNames = {'Date', 'Level', 'delim1', 'PID', 'delim2', 'Message'};
VariableWidths = [19, 5, 1, 23, 2, 5000];
VariableTypes = {'datetime', 'char', 'char', 'char', 'char', 'char'};
opts = fixedWidthImportOptions('VariableNames', VariableNames, 'VariableWidths', VariableWidths, 'VariableTypes', VariableTypes, 'SelectedVariableNames', [1, 2, 4, 6]);
opts = setvaropts(opts, 'Date', 'InputFormat', 'MM/dd HH:mm:ss.SSS');
content = readtable('test_that.txt', opts);
This results in a table with the four selected variables Date, Level, PID and Message.
If you are on a version of MATLAB that doesn't have tables, use textscan with fixed-width fields:
fid = fopen('test_that.txt', 'rt');
content = textscan(fid, '%18c%*c%5c%*c%23c%*2c%s', 'Delimiter', '', 'Whitespace', '');
fclose(fid);
content = [cellstr(content{1}), cellstr(content{2}), cellstr(content{3}), content{4}]
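With the textscan route the first column is still text. A minimal sketch of the follow-up conversion, assuming the cell layout produced above; the two sample rows are copied from the question, the names timestamps, level and pid are only illustrative, and the InputFormat assumes the month/day reading of the log's date field:

```matlab
% Two sample rows in the layout produced by the textscan call above
% (values copied from the question's log excerpt).
content = {'07/16 12:55:22.012', 'INFO ', ' test_runner_utils:0812', 'Began logging to /tmp/test_that_results_hatch_deL3lZ';
           '07/16 13:28:58.330', 'INFO ', ' mode_switcher:0673',     '-[FAFT]-[ start wait_for_client ]---'};

% The log carries no year, so datetime has to supply one itself.
timestamps = datetime(content(:, 1), 'InputFormat', 'MM/dd HH:mm:ss.SSS');

% Trim the padding left over from the fixed-width fields.
level = strtrim(content(:, 2));
pid   = strtrim(content(:, 3));
```

Note the 24-hour HH specifier: the second sample row has an hour of 13, which a 12-hour specifier would not accept.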

  23 Comments

Guillaume on 1 Aug 2019
Yes, you can put text on a figure. How do you determine the position of said text? You need at least an x (time?) and a y (????).
Maybe you should draw what you want (on paper, in Excel, whatever) because I still don't understand.
On the other hand, we've completely deviated from your original question, which I believe has been answered, so maybe you should start a new question dedicated to your plotting. Others may be more likely to contribute that way.
Matlab on 2 Aug 2019
Yes, you can put text on a figure. How do you determine the position of said text. You need at least an x (time?) and a y (????).
Sure.
x time = content.Date
y time = content.Level
x time = content.Date
y time = content.PID
x time = content.Date
y time = content.Message % Maybe a trimmed one for viewing purposes
Thanks a ton. A highly useful tip, and timely help too.
Matlab on 2 Aug 2019
Perhaps you can suggest how to plot the text information as text annotations at the time data points.


More Answers (2)

Bob Nbob on 18 Jul 2019
I need next steps
◾Convert Datacontent into cell's - like timestamp , message data-1,message data-2
◾Put cell in proper format
◾Create Matlab variables
◾Display Matlab variable for good analysis
1) regexp automatically outputs all results in a cell, each containing a string.
2) You can convert strings to date time formats using datetime. To do this 'quickly' I suggest using a loop through your regexp results, or by using cellfun (which is really still a loop).
3) What exactly do you mean by this? I personally do not know of a way to dynamically create variables within Matlab, and I think you would be better served to keep the information in a cell array, or to make a table out of it. It is certainly possible to create new variables in a table from a captured string from regexp.
4) Displaying Matlab variables is simply a matter of not suppressing them, or if specifically wanting to display them then you can use fprintf with no target so it defaults to the command window.
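As a rough illustration of points 1 to 3, the sketch below runs regexp on one sample line from the question and collects the tokens in a table instead of separate variables. The pattern and the column names (Timestamp, Level, Module, LineNo, Message) are assumptions made for this example, not something taken from the original file description.

```matlab
% One sample line from the question.
logLine = '07/16 12:55:27.477 INFO | test_runner_utils:0259| autoserv| Processing control file';

% Illustrative pattern: timestamp, level, module:number, then the message.
pattern = '^(\S+ \S+)\s+(\w+)\s*\|\s*(\w+):(\d+)\|\s*(.*)$';
tok = regexp(logLine, pattern, 'tokens', 'once');   % 1x5 cell of char vectors

% Keep the pieces together in a table rather than in separate variables.
T = cell2table(tok, 'VariableNames', {'Timestamp', 'Level', 'Module', 'LineNo', 'Message'});
T.Timestamp = datetime(T.Timestamp, 'InputFormat', 'MM/dd HH:mm:ss.SSS');

disp(T)
```

Everything after the second | is kept as one message here, so any further | inside the message text stays part of it.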

  5 Comments

Bob Nbob on 18 Jul 2019
You're getting the structure class error because 'names' outputs the results as a structure, rather than a cell, as I was expecting. Personally, I prefer 'tokens' or 'match' as my output flag for regexp.
Cellfun will not work with any input that is not a cell, hence the error.
I would suggest something like the following:
fileData = regexp(filecontent, '^(?<Time_MDY>[^ ]+) (?<Time_HMSsss>[^ ]+) (?<first>[^|]\w+)|\s+(?<last>[^|\r\n]+)|(?<last>[^|\r\n]+),\s+(?<first>[^|]|\w+)', 'tokens', 'lineanchors');
dates = datetime([fileData{1}{1},' ',fileData{1}{2}], 'InputFormat', 'MM/dd HH:mm:ss.SSS');
I only used a single line to test this, so if you have multiple rows of inputs and outputs from regexp then you may need to investigate using a loop.
Bob Nbob on 19 Jul 2019
Are you only looking to capture the timestamp? It seems like the issue is more in the initial regexp processing than in the date time conversion.
If you are only looking to capture the timestamp I would suggest doing a regexp call like this:
filedata = regexp(filecontent, '(\d\d.\d\d\s\d\d.\d\d.\d\d.\d\d\d)\D+\d\d\d\d\D+\n', 'tokens');
dates = datetime([filedata{:}], 'InputFormat', 'MM/dd HH:mm:ss.SSS');
If you are looking to capture more than the timestamps then please explain more. I know you outline some more in your OP, but I'm not entirely sure what you're referring to.
Matlab on 19 Jul 2019
I am looking not only for the timestamp but also for the associated data. I described at the beginning what my requirement/algorithm looks like.
My text source file contains data as shown below:
07/18 11:27:02.968 DEBUG| autoserv:0729| autoserv is running in drone lab_chrome-debug.
07/18 11:27:02.968 DEBUG| autoserv:0730| autoserv command was: /build/hatch/usr/local/build/autotest/server/autoserv -p -r /tmp/test_that_results_hatch_iGVg61/results-1-firmware_UpdateKernelSubkeyVersion -m 10.223.131.106 --no_console_prefix -u autotest_system -l ad_hoc_build/ad_hoc_suite/firmware_UpdateKernelSubkeyVersion -s --no_use_packaging /tmp/tmphhqbvd --args servo_host=localhost servo_port=9999
07/18 11:27:02.968 INFO | pidfile:0016| Logged pid 23629 to /tmp/test_that_results_hatch_iGVg61/results-1-firmware_UpdateKernelSubkeyVersion/.autoserv_execute
07/18 11:27:02.969 DEBUG| host_info:0263| Committing HostInfo to store InMemoryHostInfoStore[HostInfo[Labels: [], Attributes: {}]]
07/18 11:27:02.969 DEBUG| host_info:0267| HostInfo updated to: HostInfo[Labels: [], Attributes: {}]
07/18 11:27:02.970 DEBUG| base_job:0357| Persistent state global_properties.tag now set to ''
07/18 11:27:02.972 DEBUG| base_job:0357| Persistent state global_properties.fast now set to False
07/18 11:28:16.561 DEBUG| servo:0666| Setting power_state to 'rec'
07/18 11:28:23.419 WARNI| test:0606| The test failed with the following exception
Traceback (most recent call last):
File "/build/hatch/usr/local/build/autotest/client/common_lib/test.py", line 567, in _exec
_cherry_pick_call(self.initialize, *args, **dargs)
File "/build/hatch/usr/local/build/autotest/client/common_lib/test.py", line 715, in _cherry_pick_call
return func(*p_args, **p_dargs)
File "/build/hatch/usr/local/build/autotest/server/site_tests/firmware_UpdateKernelSubkeyVersion/firmware_UpdateKernelSubkeyVersion.py", line 60, in initialize
self.switcher.setup_mode('dev' if dev_mode else 'normal')
File "/build/hatch/usr/local/build/autotest/server/cros/faft/utils/mode_switcher.py", line 427, in setup_mode
self.reboot_to_mode(mode)
File "/build/hatch/usr/local/build/autotest/server/cros/faft/utils/mode_switcher.py", line 474, in reboot_to_mode
self._enable_dev_mode_and_reboot()
File "/build/hatch/usr/local/build/autotest/server/cros/faft/utils/mode_switcher.py", line 717, in _enable_dev_mode_and_reboot
self._enable_rec_mode_and_reboot(usb_state='host')
File "/build/hatch/usr/local/build/autotest/server/cros/faft/utils/mode_switcher.py", line 590, in _enable_rec_mode_and_reboot
psc.power_on(psc.REC_ON)
File "/build/hatch/usr/local/build/autotest/server/cros/servo/servo.py", line 134, in power_on
self._servo.set_nocheck('power_state', rec_mode)
File "/build/hatch/usr/local/build/autotest/server/cros/servo/servo.py", line 672, in set_nocheck
raise error.TestFail(err_msg)
TestFail: Setting 'power_state' to 'rec' :: Timeout waiting for response.
07/18 11:28:23.420 DEBUG| test:0611| Running cleanup for test.
07/18 11:44:07.043 DEBUG| ssh_host:0310| Running (ssh) 'true' from '_install|wait_up|is_up|ssh_ping|run|run_very_slowly'
I have to parse the data as follows:
Example
fileData.timestamp = 07/18 11:28:23.420
fileData.timestamp.Msglib = DEBUG
fileData.timestamp.MsgSublib = test
fileData.timestamp.MsgSublib.idx = 0611
fileData.timestamp.MsgSublib.FullContent = Running cleanup for test.
If an error is seen, skip those lines of the input text file and continue parsing the information.
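One possible way to get this, sketched under some assumptions: a line-anchored regexp with named tokens only matches lines that start with a timestamp, so traceback lines are skipped without extra code. The nested layout written above is not possible as such, because a field that holds a value cannot also hold sub-fields, so this sketch uses a flat struct array instead. The pattern and the flat field names are illustrative, and the excerpt stands in for the text of the real file.

```matlab
% Small excerpt of the log above, joined with newlines
% (in practice filecontent would hold the text of the whole file).
filecontent = strjoin({ ...
    '07/18 11:28:23.419 WARNI| test:0606| The test failed with the following exception', ...
    'Traceback (most recent call last):', ...
    'TestFail: Setting ''power_state'' to ''rec'' :: Timeout waiting for response.', ...
    '07/18 11:28:23.420 DEBUG| test:0611| Running cleanup for test.'}, newline);

% Only lines beginning with a mm/dd timestamp match; others are ignored.
pattern = ['^(?<timestamp>\d\d/\d\d \d\d:\d\d:\d\d\.\d{3}) (?<Msglib>\w+)[ \t]*\|[ \t]*', ...
           '(?<MsgSublib>\w+):(?<idx>\d+)\|[ \t]*(?<FullContent>[^\r\n]*)'];
entries = regexp(filecontent, pattern, 'names', 'lineanchors');

% One struct per matching line, e.g. entries(2).MsgSublib is 'test'.
fprintf('%s | %s | %s:%s | %s\n', entries(2).timestamp, entries(2).Msglib, ...
        entries(2).MsgSublib, entries(2).idx, entries(2).FullContent);
```

If a table is preferred, struct2table(entries) should convert the result, and the timestamp field can then be passed to datetime with the 'MM/dd HH:mm:ss.SSS' input format.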



Matlab on 18 Jul 2019
Adding the input file

  0 Comments



Release

R2019a
