How to read and process data from a text file in chunks, without saving the data, in MATLAB
Hi everybody. I have a .csv file of 126000 rows with a single column, and I want to read it in chunks of 128 rows, processing each chunk as it is read. The problem is that the .csv file may grow larger than that, so how can I process it without keeping all the data in memory, just to reduce memory consumption?
Accepted Answer
Edric Ellis
on 2 Feb 2017
In recent versions of MATLAB (R2014b or later), you can use a datastore, which might be slightly simpler than @Jan's approach. Here's how you might do that:
ds = datastore('myfile.csv');
ds.ReadSize = 128;
while hasdata(ds)
    data = read(ds); % Note: returns the next chunk as a MATLAB 'table' object
    % ... process this chunk of up to 128 rows here ...
end
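If the file has no header row, datastore gives the single column the default name Var1, so each chunk can be turned into a plain numeric vector before processing. A minimal sketch (adjust the variable name if your file does have a header):
ds = datastore('myfile.csv');
ds.ReadSize = 128;
while hasdata(ds)
    chunk = read(ds);       % table with up to 128 rows
    values = chunk.Var1;    % numeric column vector for this chunk ('Var1' is the default name when the CSV has no header)
    % ... process 'values' here ...
end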
From R2016b, you can also use a tall array to process this data. A tall array behaves like an ordinary MATLAB array, but the data stays on disk and is read in chunks only when necessary. For example, you might do:
ds = datastore('myfile.csv');
tallTable = tall(ds);
% Perform calculations on the tall table
minVal = min(tallTable.Var1);
maxVal = max(tallTable.Var1);
% Force evaluation - this is when the data is actually read from disk
[minVal, maxVal] = gather(minVal, maxVal);
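The same pattern works for other reductions; for example, a mean of the single column can be set up lazily and evaluated in one pass over the file. A small sketch, again assuming the default column name Var1:
ds = datastore('myfile.csv');
tallTable = tall(ds);
% Lazily define the reduction, then read the file when gathering
meanVal = gather(mean(tallTable.Var1));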
3 Comments
Stephane
on 10 Jun 2020
I've been limited by the number of lines read despite setting a high ds.ReadSize of 1e6. The maximum number of lines read per call is around 300,000 on my system with MATLAB R2020a.
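Note that read returns at most one chunk per call and may return fewer rows than ReadSize, so a single call will not necessarily cover the whole file. A sketch of one way to collect every row regardless of chunk size (variable names here are illustrative):
ds = datastore('myfile.csv');
ds.ReadSize = 1e6;
chunks = {};
while hasdata(ds)
    chunks{end+1} = read(ds); %#ok<AGROW> % each call returns the next chunk, possibly shorter than ReadSize
end
allData = vertcat(chunks{:});   % one table containing every row
Alternatively, readall(ds) returns the entire datastore as a single table, provided the result fits in memory.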
More Answers (1)
Jan
on 30 Jan 2017
Edited: Jan
on 30 Jan 2017
fid = fopen(FileName);
if fid == -1
    error('Cannot open file: %s', FileName);
end
chunkLen = 128;
while ~feof(fid)
    data = fscanf(fid, '%g', [1, chunkLen]);
    % ... Process the data here ...
    % ... The last chunk might be shorter than chunkLen ...
end
fclose(fid);
Importing 126000 numbers occupies only 984 kB of RAM. Are you sure that this will be a problem?
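For reference, that figure is simply 126000 doubles at 8 bytes each (1,008,000 bytes, about 984 kB), which you can check with whos:
x = zeros(126000, 1);   % same size as the 126000-row single column
whos x                  % reports Bytes: 1008000, roughly 984 kB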