"readcell()" command does not read my entire file
I am trying to read a CSV file that is 43,000 KB (about 43 MB) in size with readcell.
The file contains a mix of numbers and strings, and when I read a smaller CSV file of the same type, readcell reads it with no problem.
When I read the bigger file, it reads only part of it.
How can I solve this issue?
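One way to quantify "only part of it" is to compare the number of lines in the file with the number of rows readcell returns. The sketch below is a minimal, self-contained check: it writes a small hypothetical mixed-type CSV (`fname`, with made-up content) in place of the real 43 MB file, and it assumes the file has no header lines.

```matlab
% Hypothetical stand-in for the real file: 1000 rows of number,text,number
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:1000
    fprintf(fid, '%d,label%d,%.3f\n', k, k, k/7);
end
fclose(fid);

% Count the lines with low-level I/O (fgetl returns -1 at end of file)
fid = fopen(fname, 'r');
nLines = 0;
while ischar(fgetl(fid))
    nLines = nLines + 1;
end
fclose(fid);

% Read the same file with readcell and compare the row counts
C = readcell(fname);
fprintf('Lines in file: %d, rows returned by readcell: %d\n', nLines, size(C, 1));
delete(fname);
```

Pointing `fname` at the real file instead (and dropping the file-writing part) would show how many rows are missing, and `C(end,:)` would show the last record readcell actually returned.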
11 Comments
dpb
on 26 Jun 2025
"Regarding the Java heap space and RAM allocation, I don't know how to do any of that."
Click the "Preferences" icon in the "Environment" section of the toolstrip and explore; there are all kinds of tweaks you can make there.
The Java heap memory setting is under "General", while the array size limit is under "Workspace".
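The Java heap preference takes effect only after MATLAB is restarted, so it can be useful to check what limit the running session actually has. A minimal sketch that asks the JVM directly from the Command Window, assuming MATLAB was started with Java enabled (the normal desktop case); the variable names are illustrative:

```matlab
% Ask the running JVM for its heap limits (values are in bytes)
if usejava('jvm')
    rt = java.lang.Runtime.getRuntime();
    maxHeapMB  = double(rt.maxMemory()) / 2^20;
    usedHeapMB = (double(rt.totalMemory()) - double(rt.freeMemory())) / 2^20;
    fprintf('Java heap: max %.0f MB, currently used %.0f MB\n', maxHeapMB, usedHeapMB);
else
    maxHeapMB = NaN;   % MATLAB is running without Java (e.g. started with -nojvm)
    disp('Java is not running in this session.');
end
```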
" because I have switched to using "fopen" instead."
Of course, fopen by itself doesn't do anything except return a file handle; it takes other explicit code to actually read the file content. It would be interesting to see the full code used... I was going to suggest reverting to lower-level I/O as an alternative, but without knowing the file format that wasn't really much of an option.
It would be a very interesting exercise to find out whether MATLAB is indeed failing to read a file with readcell that it can otherwise read and store in memory; that would be very significant fodder for MathWorks in enhancing performance and finding/fixing wasteful memory use.
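For reference, a minimal sketch of the kind of "other explicit code" that goes with fopen: it reads a CSV line by line with fgetl, splits each record on commas with strsplit, and converts fields that parse as numbers with str2double. The sample file, its contents, and the variable names are hypothetical, and it assumes a simple format with no quoted fields containing embedded commas.

```matlab
% Hypothetical mixed-type sample file standing in for the real CSV
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1,alpha,1.50\n2,beta,2.25\n3,gamma,3.00\n');
fclose(fid);

% fopen only returns a file identifier; fgetl does the actual reading
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);
while ischar(tline)
    fields = strsplit(tline, ',');          % split one record on commas
    nums = str2double(fields);              % NaN where a field is not numeric
    isNum = ~isnan(nums);
    fields(isNum) = num2cell(nums(isNum));  % store numeric fields as doubles
    rows(end+1, 1:numel(fields)) = fields;  %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Growing `rows` one record at a time is slow for a file of tens of MB, so preallocating or reading in blocks would be the next refinement. The result is still a cell array with one cell per field, so it carries the same per-cell memory overhead as readcell's output.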
dpb
on 27 Jun 2025
Edited: dpb
on 27 Jun 2025
"Although 43MB doesn't seem terribly big other than when you add in the overhead of cell arrays."
What are the dimensions of the CSV file: how many variables, and of what type per field? Are the string data fields of varying length or some known size (or at least a known maximum)? How many rows would be typical?
The overhead of a cell array can be demonstrated for simple cases to get an estimate of how much memory should be required...
d=ones; md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 8, Cell: 112, Overhead: 104 bytes
d=ones(1,2); md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 16, Cell: 120, Overhead: 104 bytes
d=ones(2); md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 32, Cell: 136, Overhead: 104 bytes
d=ones; d=[d d]; md=whos('d');
c=num2cell(d); mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 16, Cell: 224, Overhead: 208 bytes
From this one can deduce that the cell array overhead is 104 bytes per cell element over and above the base data storage. The same can be shown for character arrays, with 2 bytes/element instead of 8, of course.
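The character-array case can be checked the same way. This sketch simply repeats the whos measurement with a 4-character vector (the name `s` is arbitrary); exact byte counts may differ between MATLAB releases, so it prints them rather than assuming them.

```matlab
% Same whos-based measurement, but for a character vector
s = 'abcd'; ms = whos('s');
c = {s}; mc = whos('c');
fprintf('Char: %d, Cell: %d, Overhead: %d bytes\n', ms.bytes, mc.bytes, mc.bytes - ms.bytes)
```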
Consequently, even with today's typical memory sizes, an extra 104 bytes per cell (N*104 bytes for N cells) could begin to add up with very long and wide files...
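As a rough illustration of how that adds up, the arithmetic below uses made-up dimensions, since the actual row and column counts of the file were never given. It assumes every field is a scalar double (8 bytes) stored in its own cell with the 104-byte overhead measured above.

```matlab
% Hypothetical dimensions: the real file's size in rows/columns is unknown
nRows = 500000;
nCols = 10;
overheadPerCell = 104;   % bytes per cell, from the whos measurements above
payloadPerCell  = 8;     % bytes, assuming each field is a scalar double
totalBytes = nRows * nCols * (overheadPerCell + payloadPerCell);
fprintf('Estimated cell array size: %.1f MB\n', totalBytes / 2^20);
```

With these assumed numbers that is 5,000,000 cells * 112 bytes = 560,000,000 bytes (about 534 MB), versus 40,000,000 bytes for the same values held in a plain double matrix.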
But to bring the same data into MATLAB as one variable array would require the same overhead to put the disparate types into a cell array, so the internal footprint would be the same. @Tevel didn't tell/show us what alternate form was used with fopen; but if textscan can succeed while readcell fails, then there's a major flaw in readcell, since textscan must also return a cell array if the data types are mixed.
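For comparison, a minimal textscan sketch on a small hypothetical file with a number, text, number layout (the format spec `'%f %s %f'` is an assumption about the layout, not the original poster's format). It shows the shape textscan returns for mixed data: a 1-by-3 cell array with one cell per column, where the numeric columns are double vectors and the text column is a cell array of character vectors. The same file is also read with readcell so the two footprints can be compared.

```matlab
% Hypothetical sample file with a number,text,number layout
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1,alpha,1.50\n2,beta,2.25\n3,gamma,3.00\n');
fclose(fid);

% textscan needs an explicit format spec; one cell is returned per column
fid = fopen(fname, 'r');
data = textscan(fid, '%f %s %f', 'Delimiter', ',');
fclose(fid);

C = readcell(fname);   % same data, one cell per field
delete(fname);

whos data C            % compare the two memory footprints
```

Running the same comparison on a representative slice of the real file would show directly how the two layouts differ in memory for that data.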
Answers (0)
Categories: Spreadsheets