TEXTSCAN is my personal nemesis: Quotes strings with commas

Dear all,
I thought I acquired some Matlab knowledge over the years. But textscan really makes me feel dumb. I can not figure out the syntax to read a quoted text file.
Let's say I have a text file with the following format:
"L1: quoted text, with commas","111.111",123\n
"L2: quoted text, with commas","222.222",234\n
...
(the field width are the same just for the example; consider them variable in the real world).
Now my understanding of textscan's formatSpec would be:
format = '%q,"%f",%f';
delim = '\n';
res = textscan(fileId,format,'Delimiter',delim,'ReturnOnError',false);
This is because I am assuming that %q considers everything between two quotes as part of the same string, as the help states
"If the string begins with a double quotation mark ("), omit the leading quotation mark
and its accompanying closing mark, which is the second instance of a lone double
quotation mark"
However this fails with the ominous error message:
Error using textscan
Mismatch between file and format string.
Trouble reading 'Literal' field from file (row number 1, field number 2) ==>
"L2: quoted text, with commas","222.222",234\n
So the commas in the quoted string somehow screw-up the parsing, since using the regexp-like
format = '"%[^"]","%f",%f';
instead of %q works perfectly fine.
So my questions are:
  1. What am I doing wrong?
  2. Why is there the %q option if it does not treat everything in between as string (as this would than simply be a shorthand for "%s")?
  3. Am I using the delimiter option correctly?
Any help appreciated!
Greetings, David

2 Comments

But without a nemesis, life might get boring. :)
You mean every superhero needs a supervillain?

Sign in to comment.

 Accepted Answer

fmt = '%q"%f"%f';
fid = fopen('temp5.txt','rt');
C = textscan(fid,fmt,'Delimiter',',');
fclose(fid);
Gives this:
>> C{:}
ans =
'L1: quoted text, with commas'
'L2: quoted text, with commas'
ans =
111.1110
222.2220
ans =
123
234
And your questions:
  1. you are over-specifying the delimiter, and also confusing the delimiter with the end of line character |

6 Comments

Seems to be a synch mismatch... Now saw the full text. Thx for the answer.
Just to be sure. So the proper way is to concatenated the field formats and then specifying the delimiter separately (always ignoring EOLs)?
@David J. Mack: I hope that it helps. It takes a while to get comfortable with MATLAB details like this: lots of time, practice, and reading the documentation :)
Dear Stephen, normally the help is a very good start, however textscan makes it rather difficult.
Nevertheless, my example file seemed to over-simplify the real world. Some of my files look like this:
'"11","12","2016-11-20T00:55:25.400036Z"\n'
'"21","22","2016-11-20T00:55:25.473201Z"\n'
with the last column containing datetime. Following your suggestion, using textscan as intended with
fId = fopen('test.txt');
fmt = '"%f""%f"%{yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z''}D';
res = textscan(fId,fmt, 'Delimiter',',', 'ReturnOnError',0);
fails again with the ominous error (or returns only the first numeric field). This is exactly why I started to used the over-specified work-around! Actually using
fmt = '"%f","%f",%{yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z''}D';
instead works. And even worse, removing the first column from the file and the delimter-free format again works... gnarr.
Any suggestions (file attached)?
@David J. Mack: I use an old version of MATLAB that does not have datetime, so I cannot try that format string. I can suggest that you try defining the delimiter as both , and ", and select the option MultipleDelimsAsOne, something like this:
fmt = '%f%f%{yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z''}D';
opt = {'Delimiter',{'"',','}, 'MultipleDelimsAsOne',true};
fid = fopen(...);
C = textscan(fid,fmt,opt{:});
fclose(fid);
This worked for me (%s instead of datetime):
fmt = '%f%f%s';
opt = {'Delimiter',{'"',','}, 'MultipleDelimsAsOne',true};
fid = fopen('test.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
>> C{:}
ans =
11
21
ans =
12
22
ans =
'2016-11-20T00:55:25.400036Z'
'2016-11-20T00:55:25.473201Z'
Thanks for the help again (example unfortunately also does not work with %q instead of the datetime format). But this is exactly why I do not like textscan (too say the least). Using the multi-delimiter option will collapse over empty fields, which is not what I want. I'll keep your advice of not over-specifying and merge it with my approach (added an answer, just in case someone encouters a similar problem).
@David J. Mack: the task is complicated by having numeric fields surrounded by double quotes, which is rather erroneous because double quotes always indicate strings. Two solution for this are:
  1. Change the file writing so that it correctly does not put double quotes around numeric values.
  2. Import the numeric values are strings using %q, and quickly convert them using str2double once inside MATLAB.
In both of these cases you could go back to defining the delimiter as just the comma, which would avoid collapsing fields together.

Sign in to comment.

More Answers (2)

If you have access to R2016b, try using detectImportOptions with readtable.
opts = detectImportOptions(yourfile);
T = readtable(yourfile,opts)
I think this will do what you want without a lot of fuss.

1 Comment

Dear Jeremy,
just installed R2016b and will have a look. Judging by the name of the function, I implemented that already (as readtable uses textscan if I remeber it correctly). But its good to know!

Sign in to comment.

Thanks Stephen for staying with me (you where right in the first place; reading the doc carefully helps)!
I finally found my kryptonite! Don't think REGEX when using textscan!
The solution is not using quotes directly in the format, as this is not supported by textscan. Instead create a matching group for each quote. For the example above, the correct format is (using %q instead of the datetime expression for simplicity).
fmt = '%*["]%f%*["]%*["]%f%*["]%q';
instead of
fmt = '"%f""%f"%q';
So just in case, somebody encounters the same problem.

1 Comment

Any suggestions for this question:
https://au.mathworks.com/matlabcentral/answers/402792-yet-another-textscan-question

Sign in to comment.

Asked:

on 13 Feb 2017

Commented:

on 27 May 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!