MATLAB Answers


How to read a csv file containing asterisks

Asked by b
on 1 May 2014
Latest activity Commented on by dpb
on 3 May 2014
Accepted Answer by dpb
Hello,
The NOAA csv data file has the following format (showing only the first 5 rows):
USAF WBAN YR--MODAHRMN DIR SPD GUS CLG SKC L M H VSB WW WW WW W TEMP DEWP SLP ALT STP MAX MIN PCP01 PCP06 PCP24 PCPXX SD
037683 ***** 201001010020 010 10 *** 34 BKN * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **
037683 ***** 201001010050 010 9 *** *** SCT * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **
037683 ***** 201001010120 020 9 *** 722 SCT * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **
037683 ***** 201001010150 360 9 *** 30 OVC * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **
This one has 28 columns (but the number can vary). The columns are mixed: for example, the column 'CLG' has values [34, ***, 722, 30, ...]. How can I extract a particular column and still be able to skip the rows containing asterisk data? An entire row can be asterisks, which would mean that no data is available for that particular day. It would also help to know the indices of the skipped rows.
Thanks.


1 Answer

Answer by dpb
on 1 May 2014
 Accepted Answer

Actually, it's not too bad...
>> fid=fopen('noaa.csv','r');
>> l=fgetl(fid); % the header
>> toks=tokens(l);ntok=length(toks); % how many columns?
>> c=textscan(fid,repmat('%s',1,ntok),'collectoutput',true) % read that many as strings
c =
{4x28 cell}
>> fid=fclose(fid);
Now begin to do something w/ the data...
>> [~,iclg]=intersect(toks,'CLG','rows') % find a particular variable location
iclg =
7
>> clg=str2double(c{1}(:,iclg)) % convert to numeric
clg =
34
NaN
722
30
>> find(isnan(clg))
will return the locations of the missing values, for however you wish to deal with them.
This is one place where the cell array helps, for sure...
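Putting the steps above together, including the missing-row indices the question asked for, could look like the following. This is a minimal sketch, not a tested solution for every NOAA file: it assumes the file is whitespace-delimited as in the sample, it writes the sample rows to a temporary file only so that it can run on its own, and it uses strsplit (R2013a or later) in place of the tokens helper. The names sample, fname, names, raw, isStar and missingRows are illustrative.

```matlab
% Write the sample rows from the question to a temporary file (demo only).
sample = { ...
  'USAF WBAN YR--MODAHRMN DIR SPD GUS CLG SKC L M H VSB WW WW WW W TEMP DEWP SLP ALT STP MAX MIN PCP01 PCP06 PCP24 PCPXX SD'; ...
  '037683 ***** 201001010020 010 10 *** 34 BKN * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **'; ...
  '037683 ***** 201001010050 010 9 *** *** SCT * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **'; ...
  '037683 ***** 201001010120 020 9 *** 722 SCT * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **'; ...
  '037683 ***** 201001010150 360 9 *** 30 OVC * * * 7.0 ** ** ** * 36 30 ****** 29.56 ****** *** *** ***** ***** ***** ***** **'};
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n',sample{:});
fclose(fid);

% Read the header, count the columns, then read every field as a string.
fid = fopen(fname,'r');
hdr   = fgetl(fid);
names = strsplit(strtrim(hdr));          % 1-by-ncol cell array of column names
ncol  = numel(names);
c = textscan(fid,repmat('%s',1,ncol),'CollectOutput',true);
fclose(fid);
delete(fname);
raw = c{1};                              % nrows-by-ncol cell array of strings

% Pick one column by name and convert it; asterisk fields become NaN.
iclg = find(strcmp(names,'CLG'));
clg  = str2double(raw(:,iclg));
missingRows = find(isnan(clg));          % data rows to skip for this column
clgValid    = clg(~isnan(clg));          % the values that are present

% Mask of fields made up only of asterisks; unlike str2double this also
% works for text columns such as SKC.
isStar = cellfun(@(s) all(s=='*'), raw);
```

For the four sample rows, clg is [34; NaN; 722; 30] and missingRows is 2. The row indices count data rows, so add 1 to get the line number in the file including the header.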

  5 Comments

Oh, tokens has been one of my utility functions for so long I forget it's not a TMW-supplied one...it's simple enough use of strtok...
function tok = tokens(s,d)
% TOKENS  Simple string parser; returns the tokens in input string s
%
% TOKENS(S) returns the tokens in the string S delimited
% by "white space". Any leading white space characters are ignored.
%
% TOKENS(S,D) returns tokens delimited by one of the
% characters in D. Any leading delimiter characters are ignored.
%
% The tokens are returned as the rows of a blank-padded char matrix.

% Get initial token and set up for rest
if nargin==1
  [tok,r] = strtok(s);
  while ~isempty(r)
    [t,r] = strtok(r);
    tok = strvcat(tok,t);   % strvcat skips empty strings
  end
else
  [tok,r] = strtok(s,d);
  while ~isempty(r)
    [t,r] = strtok(r,d);
    tok = strvcat(tok,t);
  end
end
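On releases that have strsplit (R2013a and later), a cell-array version of the same idea is a short alternative. This is only a sketch under that assumption, and the variable names are illustrative. It returns a row cell array of strings rather than the blank-padded char matrix that tokens builds with strvcat, so comparisons on the result would use strcmp rather than the 'rows' option.

```matlab
% Whitespace-delimited tokens, as tokens(s) would return them.
s = '  USAF WBAN YR--MODAHRMN DIR SPD GUS CLG';
tokWs = strsplit(strtrim(s));            % strtrim drops the leading blanks

% Tokens delimited by any one of the characters in d, as tokens(s,d) would.
s2 = ',a,b;c';
d  = ',;';
tokD = strsplit(s2, num2cell(d));        % num2cell turns ',;' into {',',';'}
tokD = tokD(~cellfun('isempty',tokD));   % drop the empty token left by a leading delimiter
```

Here tokWs has the seven names as separate cells and tokD is {'a','b','c'}.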
I don't think strcmpi does what you need, either, unless I'm missing something about the way it works. The repeated columns can be found with ismember instead of intersect:
>> find(ismember(toks,'WW','rows'))
ans =
13
14
15
>>
Oh, it comes to me: to make strncmp work, you have to cast toks to a cell string. Then it functions the same as ismember does above:
>> find(strncmp('WW',cellstr(toks),length('WW')))
ans =
13
14
15
>>
I never much liked the need for the length argument with strncmp, so I prefer the 'rows' argument w/ the various logical functions instead. Personal preference there, only...
Thank you Cedric and dpb, but is there a reason why 'rows' conflicts with 'intersect':
>> The 'rows' input is not supported for cell array inputs.
> In cell.intersect>cellintersectR2012a at 246
In cell.intersect at 137
In noaa at 26
Need to see what you actually did...the example above was R2012b here.
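The message itself says that intersect rejects 'rows' when its inputs are cell arrays, so one possible cause (an assumption, since the failing code was not posted) is that toks was a cell array of strings rather than the char matrix returned by tokens. In that case exact matching with strcmp does the same job without 'rows'. A small sketch, with names as an illustrative variable holding the first 16 header fields:

```matlab
% Column names held as a cell array of strings instead of a char matrix.
names = {'USAF','WBAN','YR--MODAHRMN','DIR','SPD','GUS','CLG','SKC', ...
         'L','M','H','VSB','WW','WW','WW','W'};

iclg = find(strcmp(names,'CLG'));        % exact match, no 'rows' needed
iww  = find(strcmp(names,'WW'));         % repeated names give several indices

% A padded char matrix (what tokens returns) converts to a cell array with
% cellstr, which also strips the trailing blanks.
toksChar = char(names);
fromChar = cellstr(toksChar);            % column cell array
iww2 = find(strcmp(fromChar,'WW'));
```

With these names, iclg is 7 and iww is [13 14 15]; iww2 holds the same indices as a column. Because strcmp compares whole strings, the single 'W' column is not picked up when searching for 'WW'.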
