How can I read a file with two different sets of columns and multiple delimiters?

If I call importdata without specifying a delimiter, the whole file comes in as a single column of data. If I pass the comma and tab delimiters together, the first part of the file reads as about 15 columns by 1000 rows and the second part as 200 columns by 3000 rows.
So I think the best way to code this is in two steps: first extract the single column using the tab delimiter and save that data as another delimited file, then redo the whole thing using a comma delimiter.
Is this the right way to do it?
  3 Comments
Harshith Nutulapati on 16 Sep 2015
Edited: Harshith Nutulapati on 16 Sep 2015
I also tried various other functions: textscan, dlmread/dlmwrite, fscanf/fread, csvread, etc. I don't need the first part of the data; I only need the comma-delimited part. I actually got the data as a single row using importdata without specifying any delimiters, and deleted the first part of the data using cellfun. But I have not yet figured out how to process the rest of the data.


Answers (2)

dpb on 16 Sep 2015
Edited: dpb on 17 Sep 2015
Not clear what the two rows at the beginning of the green section are--are they also header rows or actually data? Also, is the last column the column number indicator for the given row or a line number or somesuch that is actually the 200th column value? IOW, are the lines actually the same length or not?
Assuming there really are 200 columns in the second section and the first two lines are headers, then
data=csvread('yourfile.csv',1002,0);
should work just fine. NB: the row offset in csvread is zero-based, so an offset of 1002 skips the first 1002 lines (the 1000 lines of the first section plus the two header lines assumed above) and starts reading at line 1003. If either assumption above is wrong, clarify the actual situation in detail.
Of course, textscan or any number of alternatives is possible as well...
fid=fopen('yourfile.csv');   % textscan needs an open file identifier
data=cell2mat(textscan(fid,repmat('%f',1,200), 'delimiter',',', ...
'headerlines',1002, ...
'collectoutput',1));
fclose(fid);
is the equivalent.
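If the number of lines before the numeric block is not known in advance, it can be counted first instead of hard-coding 1002. The following is only a sketch under assumptions not stated in the thread: the function name readCommaBlock is made up for illustration, every wanted data row starts with a number followed by a comma, no earlier line does, and rows have no trailing comma.

```matlab
function [data, nSkip] = readCommaBlock(fname)
% READCOMMABLOCK  Hypothetical helper: read only the comma-delimited
% numeric block of a mixed file. Assumes data rows start with
% "<number>," and that no line before the block does.

% Pass 1: count the lines that precede the first numeric comma row.
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);
numRow = '^\s*[-+]?\d*\.?\d+([eE][-+]?\d+)?\s*,';  % number, then a comma
nSkip  = 0;
tline  = fgetl(fid);
while ischar(tline) && isempty(regexp(tline, numRow, 'once'))
    nSkip = nSkip + 1;
    tline = fgetl(fid);          % fgetl returns -1 (not char) at end of file
end
fclose(fid);
assert(ischar(tline), 'No comma-delimited numeric row found in %s', fname);

% Column count taken from the first data row (200 in the question).
nCols = numel(strfind(tline, ',')) + 1;

% Pass 2: read the numeric block, skipping everything before it.
fid  = fopen(fname, 'r');
data = cell2mat(textscan(fid, repmat('%f', 1, nCols), ...
                         'delimiter', ',', ...
                         'headerlines', nSkip, ...
                         'collectoutput', 1));
fclose(fid);
end
```

Usage would be data = readCommaBlock('yourfile.csv');. Text header lines at the top of the second section are counted as skipped lines too, because they do not begin with a number followed by a comma.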

Kirby Fears on 16 Sep 2015
Try using delimread, a File Exchange function. Download it and add its folder to your MATLAB path using addpath().
You can specify what rows you want to read and what the delimiter is.
out=delimread('harshith.txt',',','num',[4 7],[]);
disp(out.num);
I used the sample file below with the code above and it worked fine.
a b c d e f g h i j k
a b c d e f g
a b c d e f g h i j k
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
If your data is not purely numeric, delimread can also return numeric-only, text-only, or mixed output.
  1 Comment
dpb on 16 Sep 2015
Edited: dpb on 16 Sep 2015
No apparent need for a File Exchange routine; the built-in dlmread (or, since the file is comma-delimited, the even simpler csvread) handles this.
I copied your sample file and it also works just fine; of course as noted above, the line offset is zero-based counting so the syntax for the above file is
data=csvread('filename',3,0);
or
data=dlmread('filename',',',3,0);
At one time there were "issues" with csvread and friends on files containing nonnumeric data even if it was to be skipped over by the row count; this seems to have been alleviated with recent versions. I checked with your file and R12 handled it fine; R11 read the numeric values but returned a column vector instead of the 2D array. R11 is the earliest version I still have installed.
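To check the zero-based offset on your own installation, the sample file above can be recreated and read back. This is a minimal sketch: the temporary file name comes from tempname and is not part of the thread, and whether the text lines are skipped cleanly may depend on the MATLAB version, as noted above.

```matlab
% Recreate the 7-line sample file (3 text lines + 4 numeric rows).
fname = [tempname '.txt'];          % temporary file, name is arbitrary
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not create %s', fname);
fprintf(fid, 'a b c d e f g h i j k\n');
fprintf(fid, 'a b c d e f g\n');
fprintf(fid, 'a b c d e f g h i j k\n');
for k = 1:4
    fprintf(fid, '%d,', 1:19);      % writes 1,2,...,19, on the current line
    fprintf(fid, '%d\n', 20);       % last value and end of line
end
fclose(fid);

% Row offset 3 is zero-based, so reading starts at the 4th line.
data = dlmread(fname, ',', 3, 0);
disp(size(data));                   % rows and columns actually read

delete(fname);                      % clean up the temporary file
```

If the three text lines are skipped as described, data should hold the four numeric rows with 20 columns each.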
