Automatic Parallelization of for-Loops in the Generated Code
MATLAB® Coder™ automatically parallelizes for-loops in generated C/C++ code by default, using the Open Multiprocessing (OpenMP) library. Automatic parallelization supports explicit and implicit for-loops, as well as for-loops that perform reduction operations. For more information, see Reduction Operations Supported for Automatic Parallelization of for-loops.
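As a rough illustration of what parallelizing a reduction involves, the following hand-written C sketch sums an array using an OpenMP reduction clause. It is not MATLAB Coder output, and the function name sum_array is hypothetical.

```c
/* Hand-written sketch, not MATLAB Coder output.
   sum_array is a hypothetical helper that adds up n doubles. */
double sum_array(const double *x, int n)
{
    double total = 0.0;
    int i;
    /* Each thread accumulates a private partial sum; OpenMP combines
       the partial sums into total when the loop ends. Without OpenMP
       support the pragma is skipped and the loop runs serially. */
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

Without the reduction clause, several threads would update total at the same time, which is why a reduction loop needs special handling before it can run in parallel.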
To generate parallel C/C++ code, your compiler must support the OpenMP library. The Enable automatic parallelization option supports all build types (MEX, DLL, LIB, and EXE) of the coder.config function.
MATLAB Coder uses internal heuristics to determine whether a for-loop should be parallelized.
Parallelization of Explicit and Implicit for-loops
Automatic parallelization supports both explicit and implicit for-loops. You can specify the Maximum number of CPU threads to run parallel for-loops in the generated C/C++ code.
For more information, see Specify Maximum Number of Threads to Run Parallel for-Loops in the Generated Code.
Parallelization of explicit for-loops
Explicit for-loops are for-loops that are present in your MATLAB code. Elementwise operations on an array benefit from automatic parallelization. The following example shows a MATLAB function with an explicit for-loop and the C code generated using automatic parallelization. To generate code, save the MATLAB function as explicitLoop.m in the current working directory and run the codegen command.
MATLAB Code

% MATLAB code
function out = explicitLoop(a, b)
out = zeros(size(a));
for i = 1:numel(a)
    if a(i) > 1000
        out(i) = a(i) - b(i);
    else
        out(i) = a(i) + b(i);
    end
end
end

% C code generation command
>> codegen explicitLoop -args {1:10000, 1:10000} -config:lib -report

Generated C Code

See the listing under Generated Code in Code Generation Report and Code Insights.
Parallelization of implicit for-loops
Implicit for-loops are loops that are not written in the MATLAB code but arise when MATLAB operations are translated to for-loops in the generated C/C++ code. The following example shows a MATLAB function with an implicit for-loop and the C code generated using automatic parallelization. To generate code, save the MATLAB code as implicitLoop.m in the current working directory and run the codegen command.
MATLAB Code

% MATLAB code
function [y] = implicitLoop(in)
a = ones(10000,1) + in;
y = [a a];
end

% C code generation command
>> codegen implicitLoop -args {100} -config:lib -report

Generated C Code

[Generated C code listing not shown]
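The C code that MATLAB Coder generates for implicitLoop is not reproduced on this page. As a hand-written sketch only, the two MATLAB operations (the scalar expansion ones(10000,1) + in and the concatenation [a a]) might correspond to a loop along these lines. The function name implicit_loop_sketch, the fused single loop, and the column-major layout of y are assumptions made for illustration.

```c
/* Hand-written sketch, not MATLAB Coder output. Mirrors
     a = ones(10000,1) + in;
     y = [a a];
   y is assumed to hold a 10000-by-2 array in column-major order. */
#define N 10000

void implicit_loop_sketch(double in, double y[2 * N])
{
    int i;
    /* No for-loop appears in the MATLAB source, yet the array
       operations become a loop whose iterations are independent
       and can therefore be split across threads. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < N; i++) {
        double a = 1.0 + in; /* one element of ones(10000,1) + in */
        y[i] = a;            /* first column of [a a] */
        y[i + N] = a;        /* second column of [a a] */
    }
}
```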
Loop Versioning
In the above examples, the loop bounds are compile-time constants. When the loop bounds are not known at compile time, the code generator generates both serial and parallel versions of the for-loop. Depending on the number of loop iterations at run time, the more efficient version of the loop is executed.
The following example shows a MATLAB function loopVersion and the generated C code containing both serial and parallel versions of the for-loop.
MATLAB Code

% MATLAB code
function y = loopVersion(A, n)
y = zeros(size(A));
for i = 1:n
    y(i) = sin(A(i));
end
end

% C code generation command
>> codegen loopVersion -args {1:10000, 10000} -config:lib -report

Generated C Code

[Generated C code listing not shown]
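The idea behind loop versioning can be sketched by hand as follows. This is not MATLAB Coder output: the function name loop_version_sketch, the PARALLEL_THRESHOLD cutoff, and the squaring operation (used in place of sin to keep the sketch free of math-library dependencies) are all assumptions made for illustration.

```c
/* Hand-written sketch of loop versioning, not MATLAB Coder output. */

/* Hypothetical cutoff; the code generator's actual heuristics are internal. */
#define PARALLEL_THRESHOLD 1000

void loop_version_sketch(const double *A, double *y, int n)
{
    int i;
    if (n < PARALLEL_THRESHOLD) {
        /* Serial version: avoids thread start-up cost for short loops. */
        for (i = 0; i < n; i++) {
            y[i] = A[i] * A[i];
        }
    } else {
        /* Parallel version: iterations are independent, so they can be
           split across threads. Without OpenMP support the pragma is
           skipped and this loop also runs serially. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (i = 0; i < n; i++) {
            y[i] = A[i] * A[i];
        }
    }
}
```

Both branches compute the same result. Only the iteration count, which is known at run time, decides which copy of the loop executes.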
Code Generation Report and Code Insights
To view the generated C/C++ code for the above MATLAB function explicitLoop, open the code generation report. In the Code pane of the report, the line numbers highlighted in green next to the for-loop show the part of the code that is parallelized.
Generated Code
In the generated code, the OpenMP pragma statement before the for-loop indicates the parallelization of the for-loop.
void explicitLoop(const double a[10000], const double b[10000],
                  double out[10000])
{
  double d;
  int i;
  if (!isInitialized_explicitLoop) {
    explicitLoop_initialize();
  }
#pragma omp parallel for num_threads(omp_get_max_threads()) private(d)
  for (i = 0; i < 10000; i++) {
    d = a[i];
    if (d > 1000.0) {
      out[i] = d - b[i];
    } else {
      out[i] = d + b[i];
    }
  }
}
Code Insights
In the Code Insights tab, under Automatic Parallelization, you can see detailed information about the for-loops that are not parallelized or versioned in the generated code.
For example, regenerate code for the explicitLoop function defined earlier by specifying a smaller size for the input arguments.
>> codegen explicitLoop -args {1:100, 1:100} -config:lib -report
In this case, the for-loop is not parallelized because parallel execution offers no performance benefit. To view such code insights, open the code generation report and click Code Insights > Automatic Parallelization.
Control Parallelization of for-loops
You can disable automatic parallelization of for-loops if the loop performs better in serial execution.
Disable parallelization of all for-loops
You cannot disable the parallelization of parfor-loops or of loops that are preceded by coder.loop.parallelize("loopID").
To disable automatic parallelization of all for-loops:
- In the MATLAB Coder app, in the More Settings > Speed pane, clear the Enable automatic parallelization setting.
- In the MATLAB Command Window, set the code configuration option EnableAutoParallelization to false.
Disable parallelization of specific for-loops
To prevent parallelization of a specific for-loop, place coder.loop.parallelize("never") immediately before the loop in the MATLAB code. This overrides the EnableAutoParallelization setting.
For example, the code generator does not parallelize this loop:
coder.loop.parallelize("never");
for i = 1:n
    y(i) = y(i)*sin(i);
end
Enable parallelization of specific for-loops
To parallelize specific for-loops, place coder.loop.parallelize("loopID") immediately before the for-loop in the MATLAB code. This overrides the EnableAutoParallelization setting.
For example, this for-loop is always parallelized in the generated code.
coder.loop.parallelize("i");
for i = 1:100
    out1(i) = out1(i)*i;
end
For more information, see coder.loop.parallelize.
Usage Notes and Limitations
- In the case of nested for-loops, MATLAB Coder parallelizes the outermost for-loop and vectorizes the innermost for-loop.
- for-loops that contain parfor-loops are not parallelized.
- Automatic parallelization does not support for-loops whose bodies contain either persistent variables or calls to functions that access persistent variables.
- Automatic parallelization does not support for-loops in your code that contain calls to external functions.
- while-loops are not parallelized.
- for-loops are not automatically parallelized for hardware targets with a single core or when NumberOfCpuThreads is set to 1.
- If OpenMP is not supported on the target hardware or if the coder.CodeConfig object property EnableOpenMP is set to false, then no for-loop is parallelized.
- If a single-level for-loop can be both vectorized and parallelized, then it is vectorized.
See Also
parfor | coder.loop.parallelize | coder.config | coder.MexCodeConfig | coder.CodeConfig | coder.EmbeddedCodeConfig