Quick Start Parallel Computing in MATLAB

You can use parallel computing to carry out many calculations simultaneously. Split large problems into smaller ones, which you can process at the same time.

With parallel computing, you can:

  • Save time by distributing tasks and executing them simultaneously

  • Solve big data problems by partitioning data

  • Take advantage of your desktop computer resources and scale up to clusters and cloud computing

Diagram that shows four MATLAB workers in a loop.

This table lists some essential parallel computing terms and their definitions.

Term | Definition
Thread | Smallest set of instructions that a CPU can schedule and execute independently. A GPU, multiprocessor, or multicore computer can perform multithreading, or executing multiple threads simultaneously.
Process | Execution of an instance of a computer program by one or many threads. Each process has its own blocks of memory.
Node | Standalone computer containing one or more CPUs or GPUs. Nodes can be networked to form a cluster or supercomputer.
Cluster | Collection of interconnected computers that work together as a unified system to provide high-performance computing power for processing complex and data-intensive tasks.
Scalability | Increase in parallel speedup with the addition of more resources.

Prerequisites

To run the examples on this page, you must have a Parallel Computing Toolbox™ license. To determine whether you have Parallel Computing Toolbox installed, and whether your machine can create a default parallel pool, enter this code in the MATLAB® Command Window.

if canUseParallelPool
    disp("Parallel Computing Toolbox is installed")
else
    disp("Parallel Computing Toolbox is not installed")
end

Alternatively, to see which MathWorks® products you have installed, in the Command Window, enter ver.

Accelerate MATLAB Code

Before you parallelize your code, you can use techniques such as vectorization and preallocation to improve the sequential performance of your MATLAB code. Sequential acceleration and parallelization can often work together to give cumulative performance improvements.

Vectorization

MATLAB is optimized for operations involving matrices and vectors. The process of revising loop-based, scalar-oriented code to use MATLAB matrix and vector operations is called vectorization. Using vectorized code instead of loop-based operations often improves your code performance.

These code snippets compare the amount of time the software needs to calculate the square root of 1,000,000 values with loop-based code against vectorized code.

Without Vectorization

tic
for k = 1:1000000
   x(k) = sqrt(k);
end
toc
Elapsed time is 0.112298 seconds.

With Vectorization

tic
k = 1:1000000;
x = sqrt(k);
toc
Elapsed time is 0.006783 seconds.

Preallocation

In some cases, while- and for-loops that incrementally increase the size of an array each time through the loop can adversely affect performance and memory use. You can preallocate the maximum amount of space required for an array instead of continuously resizing arrays when you run loop-based code.

These code snippets compare the amount of time the software needs to fill a vector x when you gradually increase the size of x in a for-loop against when you preallocate a 1-by-1,000,000 block of memory for x.

Without Preallocation

tic
x = 0;
for k = 2:1000000
   x(k) = x(k-1) + 5;
end
toc
Elapsed time is 0.103415 seconds.

With Preallocation

tic
x = zeros(1,1000000);
for k = 2:1000000
   x(k) = x(k-1) + 5;
end
toc
Elapsed time is 0.018758 seconds.

This table shows the appropriate preallocation function for the type of array you want to initialize.

Array Type to Initialize | Preallocation Function
Numeric | zeros
String | strings
Cell | cell
Table | table
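
As a minimal sketch of the functions in this table, the following code preallocates one array of each type. The length n and the table variable names Value and Label are hypothetical values chosen for illustration.

```matlab
n = 1000;                        % hypothetical number of elements

numericData = zeros(1,n);        % 1-by-n double array filled with zeros
textData    = strings(1,n);      % 1-by-n string array of empty strings ("")
cellData    = cell(1,n);         % 1-by-n cell array of empty cells

% Preallocate a table by giving its size and the type of each variable.
% The variable names here are placeholders.
tableData = table('Size',[n 2], ...
    'VariableTypes',["double","string"], ...
    'VariableNames',["Value","Label"]);
```

After preallocation, you can assign into these arrays by index inside a loop without MATLAB resizing them on each iteration.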

Run MATLAB on Multicore and Multiprocessor Nodes

MATLAB supports two ways to parallelize your code on multicore and multiprocessor nodes.

Implicit Parallelization with Built-in Multithreading

Some MATLAB functions implicitly use multithreading to parallelize their execution. These functions automatically execute on multiple computational threads in a single MATLAB session, which means they run faster on multicore-enabled machines. Some examples are linear algebra and numerical functions such as fft, mldivide, eig, svd, and sort. Therefore, if you use these functions on a machine with many cores, you can observe an increase in performance.
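
One way to observe the effect of built-in multithreading is to time a multithreaded function with the default thread count and again with a single thread. This is a rough sketch, not a rigorous benchmark: the matrix size is a hypothetical choice, and the timings depend on your hardware and on whatever else the machine is running.

```matlab
A = rand(1500);                      % hypothetical test matrix

nThreads = maxNumCompThreads;        % current maximum number of computational threads
tic; svd(A); tMulti = toc;           % svd can use up to nThreads threads

oldN = maxNumCompThreads(1);         % limit MATLAB to one thread; returns the previous setting
tic; svd(A); tSingle = toc;
maxNumCompThreads(oldN);             % restore the previous setting

fprintf("%d threads: %.3f s, 1 thread: %.3f s\n",nThreads,tMulti,tSingle);
```

Restoring the previous thread count at the end matters, because the setting otherwise persists for the rest of the MATLAB session.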

Diagram comparing the time it takes a MATLAB client using multithreading to accelerate task execution to a MATLAB client that does not use parallelization.

Explicit Parallelization with MATLAB Workers

MATLAB and Parallel Computing Toolbox software use MATLAB workers to explicitly parallelize your code. MATLAB workers are MATLAB computational engines that run in the background without a graphical desktop. The MATLAB session you interact with, also called the MATLAB client, instructs the workers with parallel language functions. You use Parallel Computing Toolbox functions to automatically divide tasks and assign them to these workers to execute the computations in parallel.

Diagram comparing the time it takes parallel computing workers to accelerate task execution to a MATLAB client with no parallelization.

Set Up Environment for Explicit Parallelization

If you have Parallel Computing Toolbox installed on your machine, you can start an interactive parallel pool of workers to take advantage of the cores in your multicore computer.

A parallel pool (parpool) is a group of MATLAB workers on which you can interactively run code.

You can create a parallel pool of workers using parpool or functions with automatic parallel support. By default, parallel language functions such as parfor, parfeval, and spmd automatically create a parallel pool when you need one. When the workers start, your MATLAB session connects to them. For example, this code automatically starts a parallel pool and runs the statement in the parfor-loop in parallel on six workers.

parfor i = 1:100
    c(i) = max(eig(rand(1000)));
end
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 6 workers.
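
If you prefer to control when the pool starts rather than rely on automatic creation, you can check for an existing pool and start one only when needed. This sketch assumes that Parallel Computing Toolbox is installed and that the local profile is named Processes, as in the output above.

```matlab
pool = gcp("nocreate");              % returns an empty array if no pool is running
if isempty(pool)
    pool = parpool("Processes");     % start a pool using the local Processes profile
end
fprintf("Pool has %d workers.\n",pool.NumWorkers);
```

Starting the pool ahead of time keeps the startup cost out of any code you time afterward.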

You can also use the parallel status indicator in the lower left corner of MATLAB desktop to start a parallel pool manually. Click the indicator icon, and then select Start Parallel Pool.

The parallel status indicator, including a menu showing options for starting a parallel pool and inspecting your parallel preferences.

To stop a parallel pool while it is starting, press Ctrl+C or Ctrl+Break. On Apple macOS operating systems, you also can use command+ (the command key and the plus key).

Starting a parallel pool often takes a long time, which can impact performance for code that takes only a few seconds to execute. For longer running code, the overhead becomes less significant.

Your default parallel environment determines the parallel pool cluster. The default parallel environment of your local machine is called Processes. This environment starts a parallel pool of process workers. You can see the selection of available cluster profiles in the Parallel menu on the MATLAB Home tab.

Selection of the Processes local cluster profile from the MATLAB menu.

Note

For the default Processes profile, the default number of process workers is one per physical CPU core using a single computational thread. This restriction ensures that each worker has exclusive access to a floating-point unit, and generally optimizes performance of computational code. If your code is not computationally intensive, for example, code that is input/output (I/O) intensive, then consider using up to two workers per physical core. Running too many workers on too few resources can impact the performance and stability of your machine.
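
To see how many workers the local profile allows and to start a pool of a specific size, you can query the cluster object. This is a sketch that assumes Parallel Computing Toolbox is installed; the requested pool size of two workers is a hypothetical value.

```matlab
c = parcluster("Processes");                 % cluster object for the local Processes profile
fprintf("Maximum workers for this profile: %d\n",c.NumWorkers);

nRequested = min(2,c.NumWorkers);            % hypothetical pool size for this sketch
delete(gcp("nocreate"));                     % close any existing pool first
pool = parpool(c,nRequested);                % start a pool of the requested size
```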

This table summarizes the different ways you can create interactive parallel pools.

Parallel Environment | Worker Type | Location | Number of Available Cores or Threads
Processes | Process | Local machine | Up to 512 cores
Threads | Thread | Local machine | Up to 512 threads
backgroundPool | Thread | Local machine | Without a Parallel Computing Toolbox license: 1 thread. With a Parallel Computing Toolbox license: Up to the number of threads that the maxNumCompThreads function returns
Cluster | Process | Onsite or cloud cluster | Up to the maximum number of workers the cluster can start

Parallel Computing Toolbox also supports running a parallel pool of workers that are backed by computing threads instead of process workers. This parallel environment is called Threads. Thread workers have reduced memory usage, faster scheduling, and lower data transfer costs. However, thread workers support only a subset of the MATLAB functions that are available to process workers.
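
As a minimal sketch of the Threads environment, the following code starts a thread-based pool and runs a small parfor-loop on it. It assumes that Parallel Computing Toolbox is installed; the loop body is a placeholder computation chosen because it uses only basic functions.

```matlab
delete(gcp("nocreate"));         % only one interactive pool can be open at a time
pool = parpool("Threads");       % start a pool of thread workers

s = zeros(1,8);
parfor k = 1:8
    s(k) = sum(1:k);             % simple independent computation
end
disp(s)

delete(pool);                    % shut down the thread pool
```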

MATLAB also supports an additional local parallel environment called backgroundPool. The backgroundPool environment is backed by thread workers and supports running code in the background while you run other code in your session at the same time. You can use one thread worker in the backgroundPool environment when you do not have a Parallel Computing Toolbox license. If you have a Parallel Computing Toolbox license, the maximum number of thread workers in your backgroundPool is the value that the maxNumCompThreads function returns.
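
The following sketch runs a function in the background with parfeval and backgroundPool while the client continues with other work. The function and inputs are placeholders; any function supported on thread workers can take their place.

```matlab
% Schedule sum(1:100) to run in the background and request one output.
f = parfeval(backgroundPool,@sum,1,1:100);

% The client is free to run other code while the task executes.
localResult = numel(1:10);

total = fetchOutputs(f);         % wait for the background task and get its result
disp(total)
```

Because one background thread worker is available without a Parallel Computing Toolbox license, this pattern should also run in a MATLAB installation that lacks the toolbox.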

If you have access to onsite or cloud clusters, you can discover other clusters running on your network or on Cloud Center by clicking Parallel > Discover Clusters and following the prompts. Parallel pools on clusters are backed by process workers and support the full parallel language.

When you have an interactive parallel pool of workers, you can use parallel language functions to split large problems into smaller tasks that workers can execute in parallel. To accelerate your MATLAB code, use interactive parallel features such as parfor.

Run Explicit Parallelization with parfor-loop

This example shows how to convert a for-loop into a parfor-loop and calculate the scalability of the parfor-loop with the number of workers.

You can convert for-loops to run in parallel by using a parfor-loop. Often, you can simply replace for with parfor. However, you might need to adjust your code further to run it in parallel.

Mechanics of parfor-loops

When you run a parfor-loop, MATLAB executes the statements in the loop body in parallel. Each execution of the parfor-loop body is an iteration. The MATLAB client issues the parfor command and coordinates with the workers to execute the loop iterations in parallel on the workers in a parallel pool. A parfor-loop can provide significantly better performance than its analogous for-loop because several workers compute iterations simultaneously.

When you run a parfor-loop, the MATLAB client divides the loop iterations into subranges and assigns them to the workers. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If the number of iterations is greater than the number of workers, some workers perform more than one loop iteration. In this case, a worker receives multiple iterations at once to reduce communication time. The client also performs a static analysis of the parfor-loop code to determine which data to transfer to each worker and which data to transfer back to the client. The client sends the necessary data to the workers, which execute most of the computation. The workers then send the results back to the client, which assembles those results. MATLAB workers evaluate iterations in no particular order and independently of each other. Because each iteration is independent, the iterations need not be synchronized, and often are not.

A parfor-loop must satisfy these basic requirements.

  • Loop iterations are independent. When you convert your for-loop into a parfor-loop, you must ensure that the loop iterations are independent. If your parfor code has dependence between the loop iterations, the Code Analyzer in the MATLAB Editor detects the dependence. Executing the parfor-loop generates an error.

  • Loop iterations do not execute in order. Because parfor-loop iterations have no guaranteed order, you must ensure that your code that uses a parfor-loop does not rely on the iterations running in any particular order.
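
The requirements above can be sketched with two small loops. A loop such as x(k) = x(k-1) + 5 cannot run as a parfor-loop because each iteration reads the result of the previous one. The first loop below removes that dependence by computing each element directly from the loop index. The second loop accumulates a sum, which parfor accepts as a reduction because the order of the additions does not affect the result. The variable names and loop length are hypothetical.

```matlab
n = 100;

% Independent iterations: each y(k) depends only on k, so the
% iterations can run in any order. y is a sliced output variable.
y = zeros(1,n);
parfor k = 1:n
    y(k) = 5*(k-1);
end

% Reduction: total accumulates across iterations. Addition is
% associative, so parfor treats total as a reduction variable.
total = 0;
parfor k = 1:n
    total = total + k^2;
end
```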

Convert for-loops to parfor-loops

Convert a for-loop into a parfor-loop by replacing for with parfor. The code calculates the largest singular value of each of 5000 random 200-by-200 matrices. Execute the parfor-loop on six workers, and compare the execution times of the two loops.

When you use parfor and you have Parallel Computing Toolbox software installed, MATLAB automatically starts a parallel pool of workers. The parallel pool can take a long time to start. This example shows a second run with the pool already started. You can observe that the parfor code executed on six workers runs much faster than the for-loop code.

tic
y = zeros(5000,1);
for n = 1:5000
    y(n) = max(svd(randn(200)));
end
toc
Elapsed time is 21.837346 seconds.

tic
y = zeros(5000,1);
parfor n = 1:5000
    y(n) = max(svd(randn(200)));
end
toc
Elapsed time is 3.908282 seconds.

If the speed-up is less than you expect, you can calculate the scalability of your parfor-loop code.

Calculate Scalability

You can calculate the scalability of converting this for-loop into a parfor-loop. Use the scalability to determine whether your parfor-loop code scales well with the number of workers, and whether a limit exists.

Use a for-loop to iterate through different numbers of workers to run the parfor-loop. To specify the maximum number of workers, use the second input argument of parfor. You can modify the values in the numWorkers array to match your available resources.

numIterations = 5000;
numWorkers = [1 2 3 4 5 6];
t = zeros(size(numWorkers));
for w = 1:numel(numWorkers)
    tic;
    y = zeros(numIterations,1);
    parfor (n = 1:numIterations,numWorkers(w))
        y(n) = max(svd(randn(200)));
    end
    t(w) = toc;
end

Calculate the speedup by computing the ratio between the computation time on a single worker and the computation time for each number of workers. To calculate the efficiency of parallelizing the tasks, divide the calculated speedup by the ideal speedup, which equals the number of workers, and express the result as a percentage.

speedup = t(1)./t;
efficiency = (speedup./numWorkers).*100;
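
As a worked instance of these two formulas, the following sketch uses the two timings reported earlier on this page. It treats the for-loop time as the single-worker baseline, which is an approximation: in the scalability code, t(1) is the time of a parfor-loop on one worker and also includes some parallel overhead.

```matlab
tSerial   = 21.837346;   % for-loop time reported earlier, in seconds
tParallel = 3.908282;    % parfor time on six workers reported earlier, in seconds
nWorkers  = 6;

speedup    = tSerial/tParallel;          % about 5.59
efficiency = speedup/nWorkers*100;       % about 93 percent
fprintf("Speedup: %.2f, efficiency: %.1f%%\n",speedup,efficiency);
```

An efficiency below 100 percent means that part of each worker's time goes to overhead rather than to computation.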

To visualize how the computations scale up with the number of workers, plot the speedup and efficiency against the number of workers with the comparePlot function defined at the end of the example.

comparePlot(numWorkers,speedup,efficiency);

The speedup increases as the number of workers increases. Adding more workers shows a reduction in computation time, but the scaling is not perfect because the efficiency decreases as the number of workers increases. This is due to the overhead associated with parallelization. Parallel overhead includes the time the software needs for communication, coordination, and data transfer from the client to the workers and back.

parfor-loops that do not have many iterations or computationally demanding tasks generally do not scale well with an increasing number of workers because the time the software needs for data transfer is significant compared with the time the software needs for computation.

After you finish your computation, you can delete the current parallel pool. Get the current parallel pool with the gcp function.

delete(gcp)
Parallel pool using the 'Processes' profile is shutting down.

Helper Functions

This function plots the speedup and efficiency of the parfor-loop against the number of workers.

function comparePlot(numWorkers,speedup,efficiency)
% Plot speedup on the left y-axis and efficiency on the right y-axis
% against the number of workers.
yyaxis left
plot(numWorkers,speedup,'-*')
grid on
title('Speedup and Efficiency with Number of Workers')
xlabel('Number of Workers')
xticks(numWorkers)
ylabel('Speedup')
yyaxis right
plot(numWorkers,efficiency,'--o')
ylabel('Efficiency (%)')
legend('Speedup','Efficiency')
end

Discover Other Parallel Language Functions

You can perform these tasks by using Parallel Computing Toolbox with other parallel language functions.

  • Perform asynchronous processing with parfeval.

  • Speed up your calculation on the supported GPUs of your computer by using gpuArray.

  • Scale up your computation using big data processing tools, such as distributed and tall, with parallel pools.

  • Offload your calculation to computer clusters or cloud computing facilities using batch.

  • Run Simulink® models in parallel with parsim (Simulink) and batchsim (Simulink).

  • Offload your calculation to a cluster onsite or in the cloud using MATLAB Parallel Server™ software. For more information, see Clusters and Clouds.
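
As one illustration of the functions in this list, the following sketch moves data to a GPU with gpuArray when a supported GPU is available and otherwise falls back to the CPU. The input data and the computation are placeholders, and the GPU branch assumes that Parallel Computing Toolbox is installed.

```matlab
x = rand(1000,1);                % hypothetical input data in CPU memory
if canUseGPU
    x = gpuArray(x);             % copy the data to GPU memory
end

y = sum(x.^2);                   % runs on the GPU when x is a gpuArray
result = gather(y);              % bring the result back to CPU memory
disp(result)
```

Many MATLAB functions accept gpuArray inputs directly, so code like the middle line often needs no changes to run on a GPU.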

Several MathWorks products now offer built-in support for parallel computing products without requiring extra coding. For the current list of these products and their parallel functionality, see Parallel Computing Support in MATLAB and Simulink Products.

For more information about the parallel language functions and their applications, see Choose a Parallel Computing Solution and Parallel Language Decision Tables.
