Choose How to Manage Data in Parallel Computing
To perform parallel computations, you need to manage data access and transfer between your MATLAB® client and the parallel workers. Use this page to decide how to transfer data between the client and workers. You can manage data such as files, MATLAB variables, and handle-type resources.
Determine Your Data Management Approach
The best techniques for managing data depend on your parallel application. Use the following tables to identify your goals and find the appropriate data management functions and their key features. In some cases, more than one type of object or function might meet your requirements, and you can choose based on your workflow.
Transfer Data from Client to Workers
Use this table to identify some goals for transferring data from the client to workers and discover recommended workflows.
| Goal | Recommended Workflow |
|---|---|
| Use variables in your MATLAB workspace in an interactive parallel pool. | The parfor, parfeval, and spmd functions automatically transfer the required variables from the client workspace to the workers. |
| Transfer variables in your MATLAB workspace to workers on a cluster in a batch workflow. | Pass variables as inputs into the batch function. |
| Give workers access to large data stored on your desktop. | Create a datastore to access the data, then partition it across the workers, for example as a tall or distributed array. |
| Access large amounts of data or large files stored in the cloud and process it in an onsite or cloud cluster. | Use a datastore to access data in cloud storage, such as Amazon S3 or Azure Blob Storage. |
| Give workers access to files stored on the client computer. | For workers in a parallel pool: attach the files to the pool with the addAttachedFiles function. For workers running batch jobs: attach the files to the job using the AttachedFiles property. |
| Access custom MATLAB functions or libraries that are stored on the cluster. | Specify paths to the libraries or functions using the AdditionalPaths property. |
| Allow workers in a parallel pool to access non-copyable resources such as database connections or file handles. | Use a parallel.pool.Constant object (see the example after this table). |
| Send a message to a worker in an interactive pool running a function. | Create a parallel.pool.PollableDataQueue object, transfer it to the worker, and poll for messages on the worker. Before R2025a: Create a parallel.pool.PollableDataQueue object on the worker, send the queue to the client, and then send messages from the client with the send function. |
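For example, the following sketch shows two of these workflows: passing a workspace variable into a batch job, and sharing a non-copyable file handle across parfor iterations with a parallel.pool.Constant object. The names myFunction and log.txt are placeholders.

```matlab
% Minimal sketch: pass a workspace variable into a batch job as an input.
% myFunction is a placeholder for your own function.
c = parcluster;
x = rand(1000,1);
job = batch(c, @myFunction, 1, {x});   % x is copied to the worker

% Share a non-copyable resource (here, a file handle) across parfor
% iterations with parallel.pool.Constant. The file name is hypothetical.
fileC = parallel.pool.Constant(@() fopen("log.txt","a"), @fclose);
parfor i = 1:10
    fprintf(fileC.Value, "Iteration %d\n", i);
end
```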
Transfer Data Between Workers
Use this table to identify some goals for transferring data between workers and discover recommended workflows.
| Goal | Recommended Workflow |
|---|---|
| Transfer data directly between workers. | In an interactive parallel pool, run the communication inside an spmd block. Use the spmdSend, spmdReceive, and spmdSendReceive functions to pass data between workers (see the example after this table). |
| Offload results from a worker so that another worker can process them. | Store the data in the ValueStore object, where other workers can retrieve and process it. |
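The following is a minimal sketch of direct worker-to-worker transfer inside an spmd block. It assumes a parallel pool with at least two workers, for example one started with parpool(2).

```matlab
% Pass data directly between workers in an spmd block.
spmd
    if spmdIndex == 1
        spmdSend(magic(3), 2);      % send a matrix to worker 2
    elseif spmdIndex == 2
        data = spmdReceive(1);      % receive from worker 1
        disp(data);
    end
end
```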
Transfer Data from Workers to Client
Use this table to identify some goals for transferring data from a worker to a client and discover recommended workflows.
| Goal | Recommended Workflow |
|---|---|
| Retrieve the results of a parfeval or parfevalOnAll computation. | Apply the fetchOutputs function to the Future object. |
| Retrieve large results at the client. | Store the data in the ValueStore object and retrieve it at the client when you need it. Use the FileStore object for large file-based results. |
| Fetch the results from a parallel job. | Apply the fetchOutputs (Jobs) function to the job object. |
| Load the workspace variables from a batch job that runs a script or expression. | Apply the load function to the job object (see the example after this table). |
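For example, this sketch contrasts fetchOutputs (Jobs) and load for retrieving batch job results. Here, myFunction and myScript are placeholders for your own code, and myScript is assumed to create a variable named resultVariable.

```matlab
c = parcluster;

% Batch job that runs a function: use fetchOutputs (Jobs).
fjob = batch(c, @myFunction, 1, {});   % myFunction takes no inputs here
wait(fjob);
out = fetchOutputs(fjob);              % cell array of task outputs

% Batch job that runs a script: use load to access its workspace variables.
sjob = batch(c, "myScript");
wait(sjob);
load(sjob, "resultVariable");          % assumes myScript creates resultVariable
```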
Transfer Data from Workers to Client During Execution
Use this table to identify some goals for transferring data from a worker during execution and discover recommended workflows.
| Goal | Recommended Workflow |
|---|---|
| Inspect results from a function running in an interactive parallel pool. | Use a parallel.pool.PollableDataQueue object: workers send data to the queue, and the client polls the queue for results. |
| Update a plot, progress bar, or other user interface with data from a function running in an interactive parallel pool. | Send the data to the client with a parallel.pool.DataQueue object and process it with the afterEach function (see the example after this table). For very large computations with thousands of calls to the send function, group the data into fewer, larger messages to reduce communication overhead. |
| Collect data asynchronously to update a plot, progress bar, or other user interface with data from a job running on a cluster. | Use the ValueStore object of the job and assign a callback to its KeyUpdatedFcn property to process values as workers add or update them. |
| Retrieve intermediate results of a computation while it runs. | Store the data in the ValueStore object and retrieve it at the client when you need it. |
| Access files that workers generate during a computation. | Store the files in the FileStore object and retrieve them at the client when you need them. |
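For example, this minimal sketch uses a DataQueue to report parfor progress at the client; the pause call stands in for real work.

```matlab
% Report progress from a parfor-loop with a DataQueue. afterEach runs the
% callback on the client each time a worker calls send.
q = parallel.pool.DataQueue;
afterEach(q, @(i) fprintf("Finished iteration %d\n", i));
parfor i = 1:100
    pause(0.05);     % stand-in for real work
    send(q, i);
end
```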
Compare Data Management Functions and Objects
Some parallel computing objects and functions that manage data have similar features. This section provides comparisons of the functions and objects that have similar features for managing data.
DataQueue vs. ValueStore
DataQueue and ValueStore are two objects in Parallel Computing Toolbox™ that you can use to transfer data between the client and workers. The DataQueue object passes data from workers to the client in first-in, first-out (FIFO) order, while the ValueStore object stores data that multiple workers, as well as the client, can access and update. You can use both objects for asynchronous data transfer to the client. However, DataQueue is only supported on interactive parallel pools.
The choice between DataQueue and ValueStore
depends on the data access pattern you require in your parallel application. If
you have many independent tasks that workers can execute in any order, and you
want to pass data to the client in a streaming fashion, then use a
DataQueue object. However, if you want to store values, share them with multiple workers, and access or update them at any time, then use ValueStore instead.
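The following minimal sketch shows both patterns on an interactive pool; the key names and squared values are arbitrary placeholders.

```matlab
% Streaming access: DataQueue passes values to the client in FIFO order.
q = parallel.pool.DataQueue;
afterEach(q, @disp);                 % runs at the client for each message
parfor i = 1:5
    send(q, i^2);
end

% Shared access: ValueStore holds keyed values that the client and all
% workers can read or update at any time.
pool = gcp;
parfor i = 1:5
    vs = getCurrentValueStore;       % worker-side handle to the pool store
    vs("result" + i) = i^2;          % keyed write
end
store = pool.ValueStore;
disp(store("result3"))               % client reads any key at any time
```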
fetchOutputs (parfeval) vs. ValueStore
Use the fetchOutputs function to retrieve the output
arguments of a Future object, which the software returns when
you run a parfeval or parfevalOnAll
computation. fetchOutputs blocks the client until the
computation is complete, then sends the results of the
parfeval or parfevalOnAll
computation to the client. In contrast, you can use ValueStore
to store and retrieve values from any parallel computation and also retrieve
intermediate results as they are produced without blocking the program.
Additionally, the ValueStore object is not held in system
memory, so you can store large results in the ValueStore.
However, be careful when storing large amounts of data to avoid filling up the
disk space on the cluster.
If you only need to retrieve the output of a parfeval or
parfevalOnAll computation, then
fetchOutputs is the simpler option. However, if you
want to store and access the results of multiple independent parallel
computations, then use ValueStore. In cases where you have
multiple parfeval computations generating large amounts of
data, using the pool ValueStore object can help avoid memory
issues on the client. You can temporarily save the results in the
ValueStore and retrieve them when you need them.
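For example, this sketch contrasts the two retrieval styles. It assumes a hypothetical helper function storeChunk.m, shown in the comments, saved on the MATLAB path.

```matlab
% fetchOutputs: blocks until the Future completes, then returns outputs.
f = parfeval(@() max(abs(eig(rand(500)))), 1);
result = fetchOutputs(f);

% ValueStore: retrieve results without blocking the client. Assumes
% storeChunk.m is on the path:
%   function storeChunk(key)
%       vs = getCurrentValueStore;
%       vs(key) = sum(rand(1000), "all");   % stand-in for a large result
%   end
pool = gcp;
parfeval(@storeChunk, 0, "chunk1");
store = pool.ValueStore;
while ~isKey(store, "chunk1")
    pause(0.1);                      % client remains free to do other work
end
chunk = store("chunk1");
```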
load and fetchOutputs (Jobs) vs. ValueStore
load, fetchOutputs (Jobs), and
ValueStore provide different ways of transferring data from
jobs back to the client.
load retrieves the variables related to a job you create
when you use the batch function to run a script or an
expression. This includes any input arguments you provide and temporary
variables the workers create during the computation. load
does not retrieve the variables from batch jobs that run a
function, and you cannot retrieve results while the job is running.
fetchOutputs (Jobs) retrieves the output arguments
contained in the tasks of a finished job you create using the
batch, createJob, or createCommunicatingJob functions. If the
job is still running when you call fetchOutputs (Jobs), the
function returns an error.
When you create a job on a cluster, the software automatically creates a
ValueStore object for the job, and you can use it to store
data generated during job execution. Unlike the load and
fetchOutputs functions, the ValueStore
object does not automatically store data. Instead, you must manually add data as
key-value pairs to the ValueStore object. Workers can store
data in the ValueStore object that the MATLAB client can retrieve during the job execution. Additionally, the
ValueStore object is not held in system memory, so you can
store large results in the store.
To retrieve the results of a job after the job has finished, use the
load or fetchOutputs (Jobs)
function. To access the results or track the progress of a job while it is still
running, or to store potentially high memory results, use the
ValueStore object.
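For example, this sketch monitors a batch job through its ValueStore while the job runs; longTask is a placeholder for a function that periodically writes a "progress" key to the store.

```matlab
c = parcluster;
job = batch(c, @longTask, 0, {});   % longTask writes a "progress" key

store = job.ValueStore;             % the same store the workers write to
while ~strcmp(job.State, "finished")
    if isKey(store, "progress")
        fprintf("Progress: %g%%\n", store("progress"));
    end
    pause(5);
end

% Inside longTask, a worker updates the store like this:
%   vs = getCurrentValueStore;
%   vs("progress") = pct;
```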
AdditionalPaths vs. AttachedFiles vs. AutoAttachedFiles
AdditionalPaths, AttachedFiles, and
AutoAttachedFiles are all parallel job properties that
you can use to specify additional files and directories that are required to run
parallel code on workers.
AdditionalPaths is a property you can use to add cluster
file locations to the MATLAB path on all workers running your job. This is useful if the
cluster storage contains large data files, functions, or libraries that the
workers require but that are not on the MATLAB path by default.
The AttachedFiles property allows you to specify files or
directories that are required by the workers but are not stored on the cluster
storage. These files are copied to a temporary directory on each worker before
the parallel code runs. The files can be scripts, functions, or data files, and
must be located within the directory structure of the client.
Use the AutoAttachedFiles property to allow files needed
by the workers to be automatically attached to the job. When you submit a job or
task, MATLAB performs dependency analysis on all the task functions, or on the
batch job script or function. Then it automatically adds the files required to
the job or task object so that they are transferred to the workers. Set the
AutoAttachedFiles property to false only if you
know that you do not need the software to identify the files for you, for
example, when the files your job uses are already present on the cluster,
perhaps inside one of the AdditionalPaths locations.
Use AdditionalPaths when you have functions and libraries
stored on the cluster that are required on all workers. Use
AttachedFiles when you have small files that are
required to run your code. To let MATLAB automatically determine if a job requires additional files to run,
set the AutoAttachedFiles property to
true.
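For example, this sketch sets all three properties on a batch job; the paths and file names are placeholders for your own locations.

```matlab
c = parcluster;
job = batch(c, "myScript", ...
    "AdditionalPaths", "/cluster/shared/mylibs", ... % already on the cluster
    "AttachedFiles", {"helper.m", "params.mat"}, ... % copied from the client
    "AutoAttachedFiles", true);                      % dependency analysis on
wait(job);
```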
See Also
ValueStore | FileStore | parallel.pool.Constant | parallel.pool.PollableDataQueue | spmdSend | spmdReceive | spmdSendReceive | spmdBarrier | fetchOutputs (parfeval) | fetchOutputs (Jobs) | load | parallel.pool.DataQueue