How to read and process data from a text file in chunks, without storing it all, in MATLAB

Hi everybody. I have a .csv file of 126000 rows with a single column, and I want to read it in chunks of 128 rows, processing each chunk as I go. The problem is that the .csv file may grow larger than that, so how can I process it without loading the whole file, to reduce memory consumption?

Accepted Answer

Edric Ellis
Edric Ellis on 2 Feb 2017
In recent versions of MATLAB (R2014b or later), you can use a datastore, which may be slightly simpler than @Jan's approach. Here's how you might do that:
ds = datastore('myfile.csv');
ds.ReadSize = 128;
while hasdata(ds)
    data = read(ds);   % Note: returns the next chunk as a MATLAB 'table' object
end
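If you want to process each chunk numerically inside the loop, here is a minimal sketch (assuming the file's single column contains numeric values; the running sum is just an example of per-chunk processing):
ds = datastore('myfile.csv');
ds.ReadSize = 128;
runningSum = 0;
while hasdata(ds)
    data = read(ds);                  % table with up to 128 rows
    x = table2array(data);            % convert the chunk to a numeric vector
    runningSum = runningSum + sum(x); % example processing: accumulate a sum
end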
In R2016b, you could use a tall array to process this data. A tall array behaves like an ordinary MATLAB array, but the data remains on disk, and is read in chunks only when necessary. For example, you might do:
ds = datastore('myfile.csv');
tallTable = tall(ds);
% Perform calculations on the tall table (the column is named Var1 here)
minVal = min(tallTable.Var1);
maxVal = max(tallTable.Var1);
% Force evaluation - this actually reads the data from disk, in chunks
[minVal, maxVal] = gather(minVal, maxVal);
  3 Comments
Jan
Jan on 6 Feb 2017
Edited: Jan on 8 Feb 2017
@madhuri: Then please unaccept my answer and accept this one. Thanks.
[EDITED, I've accepted this question. 08.02.2017 01:16 UTC]
Stephane
Stephane on 10 Jun 2020
I've been limited by the number of lines despite setting a high ds.ReadSize of 1e6.
The maximum number of lines read is around 300,000 on my system with MATLAB R2020a.


More Answers (1)

Jan
Jan on 30 Jan 2017
Edited: Jan on 30 Jan 2017
fid = fopen(FileName);
if fid == -1
    error('Cannot open file: %s', FileName);
end
chunkLen = 128;
while ~feof(fid)
    data = fscanf(fid, '%g', [1, chunkLen]);
    % Process the data here ...
    % The last chunk might be shorter than chunkLen.
end
fclose(fid);
Importing 126000 numbers occupies only 984 kB of RAM. Are you sure that this will be a problem?
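That figure follows from each value being stored as an 8-byte double:
126000 * 8 / 1024   % = 984.375, i.e. about 984 kB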
  1 Comment
madhuri G
madhuri G on 2 Feb 2017
Thanks for the reply, Jan Simon. The problem here is not reading blocks: if I have some 30 files to read at each iteration, keeping track of all the file IDs across the upcoming iterations may take the processor a long time. So I thought of using a datastore for reading the files and writing the results to another file.
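If the concern is many files, note that a single datastore can be pointed at a whole set of files, so you never manage individual file IDs yourself. A rough sketch (folder name hypothetical):
ds = datastore(fullfile('C:\data', '*.csv'));  % one datastore over every matching file
ds.ReadSize = 128;
while hasdata(ds)
    data = read(ds);   % next chunk, which may span file boundaries
    % ... process the chunk and write results to another file here ...
end
In newer releases, writetable with the 'WriteMode','append' option can be used inside the loop to append each processed chunk to an output file.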

