I found that readall and read+hasdata seem to do exactly the same thing. Since read+hasdata needs a loop body, is it less efficient? Should read+hasdata be avoided in all cases? Why does MATLAB also provide the hasdata function? In what scenario is it more meaningful to use read+hasdata?

hasdata:

ds = datastore('mapredout.mat');
while hasdata(ds)
    T = read(ds);
end

readall:

ds = datastore('mapredout.mat');
readall(ds)

I think these comments are helpful: comment1 comment2

What is the difference between readall and read+hasdata?

fa wu on 21 Jul 2023

Thanks for your answer.

"`readall` is suitable for smaller datasets that can fit into memory, while `read` with `hasdata` is more appropriate for larger datasets": when `read` with `hasdata` is used in a loop, does it read all of the data at once? Does the program check whether there is enough memory?

"Processing data in a streaming fashion:" could you provide an example?

"Parallel processing:" does this mean using a parfor loop? Is it faster than readall? Can readall not use parallel processing?

Mrutyunjaya Hiremath on 21 Jul 2023

Dear Fa Wu,

Please accept the answer if you are OK with the justification. That helps me a lot. And please close the thread.

This means creating new threads for new questions. Others will explain those with examples, and searching for existing answers is also good. :)

Walter Roberson on 21 Jul 2023

When you use a loop of read() and hasdata(), the amount of data saved depends on how much you deliberately save in the loop. MATLAB will not read all the data first and then return it in chunks: it reads only part of the data at a time (the exact amount depends on how you configured the datastore and on the datastore size).

When you use readall(), MATLAB reads all of the data and returns it all.

The default implementation for readall() just loops over hasdata() and read(), growing the output array as it goes. However, individual datastore types are permitted to override the method if there is a more efficient implementation for their kind of datastore; for example for some kinds of datastores it is possible to pre-allocate the output array instead of growing it in a loop.
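As a rough illustration of the generic behaviour described above, here is a minimal sketch of a hand-written equivalent of readall(). It is not the actual MathWorks implementation. The data is a hypothetical stand-in: an in-memory arrayDatastore (available from R2020b) with a ReadSize of 3, chosen only so that several read() calls are needed.

```matlab
% Hypothetical stand-in data: a small in-memory datastore read 3 rows at a time.
ds = arrayDatastore((1:10)', 'ReadSize', 3, 'OutputType', 'same');

% Hand-written equivalent of a generic readall: start from the beginning,
% then append each chunk to a growing output array.
reset(ds);
out = [];
while hasdata(ds)
    chunk = read(ds);      % returns at most ReadSize rows per call
    out = [out; chunk];    %#ok<AGROW> the output grows on every iteration
end

% The built-in call should return the same data in a single step.
reset(ds);
allData = readall(ds);
disp(isequal(out, allData));
```

A datastore type that knows its total size in advance could allocate `out` once instead of growing it, which is the kind of override mentioned above.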

fa wu on 22 Jul 2023

thanks for your comment.

"for example for some kinds of datastores it is possible to pre-allocate the output array instead of growing it in a loop." So it seems that readall is better than read+hasdata in most cases, because it automatically selects the appropriate algorithm for each situation. Right?

Walter Roberson on 22 Jul 2023

It depends: do you need all of the data in memory at the same time? If so, then use readall(), which will never be worse than looping read() and hasdata(), and might potentially be better (for some kinds of datastores).

If you do not need all of the data in memory at the same time, then consider looping.

For example, if you were doing your own training for some kind of custom learning algorithm, then you might need to read in all of the training data first, do statistical analysis over the complete set of data, and then start working with it. However, if you had already done your training and your goal was to use your trained network to classify a set of images, then you do not need to have all of the images to classify in memory at the same time: you can read them one by one, classify the one, record the result, and release the memory that was used to store that particular part of the data.
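The "read one piece, process it, release it" pattern can be sketched as follows. This is only an illustration under assumptions: the data is a hypothetical in-memory arrayDatastore (R2020b or later), and a running mean stands in for whatever per-chunk work (such as classifying an image) you actually need. Only the small running totals survive between iterations; each chunk is overwritten by the next read().

```matlab
% Hypothetical stand-in data, read 4 rows at a time.
ds = arrayDatastore((1:10)', 'ReadSize', 4, 'OutputType', 'same');

total = 0;   % running sum of all values seen so far
count = 0;   % running number of values seen so far

while hasdata(ds)
    chunk = read(ds);                 % only this chunk is held in memory
    total = total + sum(chunk);       % fold the chunk into the summary
    count = count + numel(chunk);
end

meanValue = total / count;            % mean computed without storing all data
disp(meanValue);
```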

fa wu on 22 Jul 2023


Thank you very much for your comment

"If you do not need all of the data in memory at the same time, then consider looping." Once this loop starts running, it keeps reading until all of the data in mapredout.mat has been read. This seems to conflict with "If you do not need all of the data". I'm not sure.

ds = datastore('mapredout.mat');
while hasdata(ds)
    T = read(ds);
end

Walter Roberson on 23 Jul 2023

Yes, that loop will read all of the data, but it only stores one element of the datasets at a time, into T. At the end of the loop only the final element of the datasets is available. If you had used readall() instead, then all of the data would be available after the call.

fa wu on 23 Jul 2023


I'm a bit confused about “At the end of the loop only the final element of the datasets is available. ”

while hasdata(str)
    T = read(str);
end
T

T =
  15×4 table

The whole table T is printed, not only the final element of the datastore.

Walter Roberson on 23 Jul 2023


Your file name mapredout.mat hints that the .mat file might be the output of a mapreduce() call. If so, then it is a Key-Value Datastore: https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.keyvaluedatastore.html. The documentation for Key-Value datastores describes the default as follows:

ReadSize — Maximum number of key-value pairs to read

1 (default) | positive integer

Maximum number of key-value pairs to read in a call to the read or preview functions, specified as a positive integer.

So any one read() call on the datastore is not going to read all of the data.

The particular datastore you are using might have been configured for a larger ReadSize, but the ReadSize cannot be set to be infinite -- in general when you read() from a datastore, even one configured with only a single .mat file, the read() might not read in all of the data if the datastore is large enough. In contrast, readall() will always read all of the data, provided that it does not run out of memory.

For testing purposes, I suggest you experiment with

while hasdata(str)
  T = read(str)
end
T

and see whether the read() is being called more than once, and if so whether the T at the end has all of the data that was read in. Depending on the kind of datastore and how big it is, sometimes a single read() is enough to read in all of the data; other datastores might need to read the data in chunks when you read(), and other datastores might only read one file at a time if the datastore has multiple files.
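A self-contained variant of that experiment is sketched below, assuming a hypothetical in-memory arrayDatastore (R2020b or later) in place of mapredout.mat, which is not available here. It counts the read() calls and compares the size of the last T with what readall() returns; on your own datastore the number of calls and the chunk sizes will depend on its type and ReadSize.

```matlab
% Hypothetical stand-in for the real datastore: 10 rows, read 4 at a time.
ds = arrayDatastore((1:10)', 'ReadSize', 4, 'OutputType', 'same');

numReads = 0;
while hasdata(ds)
    T = read(ds);                     % T is overwritten on every iteration
    numReads = numReads + 1;
    fprintf('read #%d returned %d rows\n', numReads, size(T, 1));
end

% Compare what is left in T with the complete data set.
reset(ds);
A = readall(ds);
fprintf('last T has %d rows, readall returned %d rows\n', size(T, 1), size(A, 1));
```

If the loop reports a single read(), then T and the readall() result hold the same data, which would explain seeing the whole 15×4 table after the loop.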

fa wu on 24 Jul 2023

Thank you for your guidance and help. I'll do an experiment
