Textscan to read CSV File (Mixed Format)

Question

0 votes

I am trying to read multiple large CSV files (200,000 X 200 each) into MATLAB 2013a 64bit which contains numbers and strings. The order of numbers and strings is not fixed between files. I am using "Textscan" function. If I use the "%f" format then "textscan" stops execution as soon as it encounters the string so instead I ended up using "%s". But, using "str2double" to convert to numbers takes a long time. "str2double" also converts any non-numeric entry in the data to NaNs and I like using it but the overhead is too much.

I am wondering if anyone else has found a work around for this. I would ideally prefer using "textscan" since the code queries user to specify the line at which column names are in the CSV file and also some lines are commented in the CSV file and textscan is ideal for this file type. But any ideas are appreciated.

Thanks

1 Comment
Show -1 older comments Hide -1 older comments

per isakson on 26 Apr 2015

Did you ever try to read these files with importdata?

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Image Analyst on 12 Dec 2014

0 votes

Apparently you need to inspect the line of text first to determine if it's a comment or not. So use fgetl() to read a string from the file. Then call strtrim() to get rid of any leading whitespace. Then check if the first character in non-numeric, a % symbol, or whatever else indecates a comment in your file. Then use sscanf() or textscan() to extract the numbers into a numerical array.

1 Comment
Show -1 older comments Hide -1 older comments

abhay kaul on 12 Dec 2014

Edited: abhay kaul on 12 Dec 2014

Just a clarification on my question. The issue is not to identify if the line that has a comment or not. Textscan does that for me. The data I have has some columns that are numeric whereas some columns have all strings. But the order of columns which are numeric or string is not fixed among multiple files. I should have provided that information in my original questions. Thanks.

Sign in to comment.

Answer 2

Star Strider on 12 Dec 2014

Open in MATLAB Online

0 votes

Please post one of your files or a representative sample of one that demonstrates the problem.

As an interim possibility, see if this approach works:

fidi = fopen('filename.csv');
d1 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);
fseek(fidi,0,0);
d2 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);

You may have to put that in a loop and subscript ‘d1’ and ‘d2’ and others appropriately. (This is archive code that worked for the file it was intended to read.) You will have to adapt it to your file structure, including an initial 'HeaderLines' option if necessary.

The code reads starting with the initial string and the numerical data following until it gets to the next string (here assumed to have the same structure) where it stops. The fseek function automatically re-starts textscan reading the file at the correct position at the second textscan call. It then reads that string and the subsequent data until it gets to the end.

2 Comments
Show None Hide None

abhay kaul on 12 Dec 2014

Here is an example file.

File 1

# File 1

# This line should be ignored

ParaA, ParaB, ParaC, ParaD

ab, 1,2,3

bc, 4,5,6

%%%%%%%%%%%%%%%%%

File 2

# File 2

ParaM, ParaN, ParaO, ParaP, ParaQ

# this line should be ignored

1,2,ab,4,12:40:50

3,4,cd,5,12:40:51

Please see that I am able to read all these file types properly with Textscan if I specify data type to be string. But the overhead of str2double is too high. So, I was wondering if there is some way to read all as %f and have MATLAB store non-numeric columns as NaN.

Thanks

Star Strider on 12 Dec 2014

Open in MATLAB Online

You would likely need to read them in with different format strings for each file. The first one might be:

'%s%f%f%f'

the second:

'%f%f%s%f%s'

reading the time in as a string, since it probably is. I see no way around that otherwise. You could parse the inputs to store only the numeric data and ignore the strings after you read each file.

Sign in to comment.

Textscan to read CSV File (Mixed Format)

1 Comment
Show -1 older comments Hide -1 older comments

Answers (2)

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None

Categories

Tags

Community Treasure Hunt

Textscan to read CSV File (Mixed Format)

1 Comment Show -1 older comments Hide -1 older comments

Answers (2)

1 Comment Show -1 older comments Hide -1 older comments

2 Comments Show None Hide None

Categories

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None