How to textscan dates and data delimited with spaces?

I have data like this:
2010 12 31 23 50 198 8.2 999 99.0 9999
2011 01 01 00 00 199 8.1 199 8.5 2347
2011 01 01 00 10 198 8.4 999 99.0 9999
How would I parse the lines into dates and data?
Ideally I would expect something like the below to parse it into dates and data:
textscan('2010 12 31 23 50 198 8.2 999 99.0 9999','%16{yyyy MM dd HH mm}D %f %f %f %f')
But the spaces in the dates seem to be treated as delimiters and only the first field is passed to the date parser.
Is it possible to do this without preprocessing?
I found I could parse the space-formatted date by disabling the space delimiters, but then I couldn't parse later fields on the line unless I added delimiters:
>> textscan('2010 12 31 23 50,198,8.2,999,99.0,9999','%{yyyy MM dd HH mm}D %f %f %f %f','Delimiter',',')
ans =
1×5 cell array
{2×1 datetime} {[198]} {[8.2000]} {[999]} {[99]}
>>

3 Comments

I can preprocess a file like:
awk '{print $1"-"$2"-"$3"T"$4":"$5,$6,$7,$8,$9,$10,$11}' file.txt > file_pp.txt
and then parse it like:
>> textscan('2010-12-31T23:50 198 8.2 999 99.0 9999','%{yyyy-MM-dd''T''HH:mm}D %f %f %f %f')
ans =
1×5 cell array
{2×1 datetime} {[198]} {[8.2000]} {[999]} {[99]}
>>
but I'd prefer not to require temporary files.
Is your original data a matrix, cell array, character array, or string array?
The original is a gzipped text file with a header from NOAA at https://www.ndbc.noaa.gov/station_page.php?station=chlv2
% head ~/Downloads/chlv2c2011.txt
#YY MM DD hh mm WDIR WSPD GDR GST GTIME
#yr mo dy hr mn degT m/s degT m/s hhmm
2010 12 31 23 10 202 7.5 999 99.0 9999
2010 12 31 23 20 204 7.5 999 99.0 9999
2010 12 31 23 30 204 7.5 999 99.0 9999

Sign in to comment.

 Accepted Answer

Assuming your data is numeric, I would do this.
data = [2010 12 31 23 50 198 8.2 999 99.0 9999;
2011 01 01 00 00 199 8.1 199 8.5 2347;
2011 01 01 00 10 198 8.4 999 99.0 9999];
Dates = datetime(data(:,1),data(:,2),data(:,3),data(:,4),data(:,5),0);
final=table(Dates,data(:,6:end));
final = splitvars(final,2,'NewVariableNames',{'A','B','C','D','E'})
final = 3×6 table
Dates A B C D E ____________________ ___ ___ ___ ___ ____ 31-Dec-2010 23:50:00 198 8.2 999 99 9999 01-Jan-2011 00:00:00 199 8.1 199 8.5 2347 01-Jan-2011 00:10:00 198 8.4 999 99 9999

1 Comment

This is also probably faster for converting the dates that using the %D format.

Sign in to comment.

More Answers (1)

Another option, for an input string:
temp = split('2010 12 31 23 50,198,8.2,999,99.0,9999',',');
Date = datetime(temp{1},'InputFormat','yyyy MM dd HH mm');
data = temp(2:end);
(But readmatrix() to import the data in numeric format is probably a better idea)

Categories

Products

Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!