How do you read a text file into a cell string?

Jan on 26 Oct 2015
Answered: David Szwer on 16 Nov 2021
Importing a text file into a cell string is a common task. Unfortunately, there is no dedicated Matlab command for it. It can be done with more powerful general-purpose commands, but these commands have been subject to changes in the past.
Currently:
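fid = fopen(FileName, 'r');
if fid == -1
  error('Cannot open file for reading: %s', FileName);
end
% One %s field per line: split only at \n and keep leading/trailing blanks
DataC = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '');
Data = DataC{1};   % textscan wraps the N x 1 cell string in a 1 x 1 cell
fclose(fid);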
Huh. In R2009b this was much leaner:
Data = textread(FileName, '%s', 'delimiter', '\n', 'whitespace', '');
Nicer. But unfortunately textread has been removed.
A straightforward solution is not trivial either: read the text as a string using fread() and split it at the line breaks into a cell string. In historical Matlab versions (e.g. R6.5) the splitting worked with
CStr = dataread('string', Str, '%s', 'delimiter', Sep, 'whitespace', '');
(This is the same engine as textread, and it has been removed as well.) In newer versions regexp can split the string:
CStr = regexp(Str, '\n', 'split');
But special care must be taken for DOS line breaks char([13, 10]) and old Mac line breaks char(13) (the latter are rare today, but possible).
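A minimal sketch of one way to handle all three conventions in a single regexp call, assuming the whole file is already in the char vector Str (here built with sprintf for illustration). The alternation lists \r\n before \r, so a DOS pair is consumed as one break instead of producing an extra empty cell.

```matlab
% Illustrative input mixing DOS (CR LF), old Mac (CR) and Unix (LF) breaks
Str = sprintf('first\r\nsecond\rthird\nfourth');

% Split at any line break; \r\n must come before the lone \r in the alternation
CStr = regexp(Str, '\r\n|\r|\n', 'split');
% CStr is the 1 x 4 cell {'first', 'second', 'third', 'fourth'}
```

If the text ends with a line break, this split also yields a trailing empty cell, which may need to be removed.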
I'm using a longer function based on fread that converts DOS and Mac line breaks to the modern char(10), plus a C-Mex function that splits the string into a cell string. This works from R6.5 until today and is not affected by the various impending compatibility breaks. But it is too complicated for such a standard task.
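The fread-plus-normalization idea can be sketched without the C-Mex part by letting regexp do the split. This is not the poster's actual function; it is a hypothetical round trip that writes a temporary file with mixed line breaks and reads it back. Unlike readcell, it keeps blank lines as empty cells.

```matlab
% Hypothetical test file with DOS, old Mac and Unix line breaks
FileName = [tempname, '.txt'];
fid = fopen(FileName, 'w');            % 'w' (not 'wt'): no newline translation
fwrite(fid, sprintf('alpha\r\nbeta\r\rgamma\n'));
fclose(fid);

fid = fopen(FileName, 'r');
if fid == -1
  error('Cannot open file for reading: %s', FileName);
end
Str = fread(fid, [1, Inf], '*char');   % whole file as a 1 x N char row
fclose(fid);
delete(FileName);

% Normalize DOS (CR LF) first, then old Mac (CR), so every break becomes LF
Str = strrep(Str, char([13, 10]), char(10));
Str = strrep(Str, char(13), char(10));

% Drop one final LF so a trailing line break does not add an empty cell
if ~isempty(Str) && Str(end) == char(10)
  Str(end) = [];
end
CStr = regexp(Str, '\n', 'split');
% CStr is {'alpha', 'beta', '', 'gamma'}: the blank line is kept
```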
How do you import text files to a cell string? Do you (have to) care about (backward) compatibility?

Answers (1)

David Szwer on 16 Nov 2021
The readcell() function might do the job. It was introduced in R2019a.
Data = readcell(FileName, Delimiter = "")
It automatically accepts the line breaks you want ({'\n','\r','\r\n'}, where \r is char(13) and \n is char(10)), but the delimiter must be set to nothing (an empty string) to prevent it from returning a 2D cell array. There are two possible downsides to this function. First, it does not return blank lines as empty cells; they are ignored completely. There are options to deal with consecutive delimiters, but not (as far as I can tell) with consecutive newlines. Second, the modern read*() functions (readtable, readmatrix, readcell) do heavy auto-detection of the file format. This might make them slow, and there is also a chance of Matlab making a wrong assumption about your file (for example, converting lines that look like numbers into numeric values). You can fix this by specifying various options, but that brings you back to something very complicated.
I would also note that
Data = textread(FileName, '%s', 'delimiter', '\n', 'whitespace', '');
still works for me (R2021b). Sure, the textread function is "Not recommended starting in R2012b", but apparently "There are no plans to remove textread."
