coder.gpuConfig
Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
Description
The coder.GpuCodeConfig
or
coder.gpuConfig
object contains the configuration parameters that
codegen
uses for generating CUDA® MEX, a static library, a dynamically linked library, or an executable
program with GPU Coder™. Pass the object to the codegen
function by using the
-config
option.
Creation
Syntax
Description
cfg = coder.gpuConfig(build_type)
creates a code generation configuration object for the specified build type,
which can be CUDA MEX, a static library, a dynamically linked library, or an
executable program. If the Embedded Coder® product is installed, it creates a
coder.EmbeddedCodeConfig
object for static library, dynamic
library, or executable build types.
cfg = coder.gpuConfig(build_type,'ecoder',false)
creates a code generation configuration object to generate CUDA
'lib', 'dll', or 'exe'
output even if the Embedded Coder product is installed.
cfg = coder.gpuConfig(build_type,'ecoder',true)
creates a
coder.EmbeddedCodeConfig
configuration object even if the Embedded Coder product is not installed. However, code generation using a
coder.EmbeddedCodeConfig
object requires an Embedded Coder license.
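The creation syntaxes above can be compared side by side. This is a minimal sketch: the variable names are illustrative, and the class of cfgLib depends on whether Embedded Coder is installed on your system.

```matlab
% CUDA MEX configuration object
cfgMex = coder.gpuConfig('mex');

% Static library configuration object. If Embedded Coder is installed,
% this call returns a coder.EmbeddedCodeConfig object.
cfgLib = coder.gpuConfig('lib');

% Static library configuration object without Embedded Coder settings,
% even when Embedded Coder is installed
cfgLibNoEcoder = coder.gpuConfig('lib','ecoder',false);

% Check which class each call returned
disp(class(cfgLib))
disp(class(cfgLibNoEcoder))
```

The GPU-specific parameters described under Properties are reached through the GpuConfig property of each object, for example cfgMex.GpuConfig.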
Input Arguments
build_type
— Output to build from generated CUDA code
'MEX'
| 'LIB'
| 'DLL'
| 'EXE'
Output to build from generated CUDA code, specified as one of the values in this table.
Value | Description |
---|---|
'MEX' | CUDA MEX |
'LIB' | Static library |
'DLL' | Dynamically linked library |
'EXE' | Executable program |
Properties
coder.GpuCodeConfig
contains only GPU-specific configuration parameters
of the code configuration object. To see the properties of the code configuration
object, see coder.CodeConfig
and coder.EmbeddedCodeConfig
.
Enabled
— Control GPU code generation
true
(default) | false
Control generation of CUDA (*.cu) files by using one of the values in this table.
Value | Description |
---|---|
true | This value is the default value. Enables CUDA code generation. |
false | Disables CUDA code generation. |
Example: cfg.GpuConfig.Enabled = true
MallocMode
— GPU memory allocation
'discrete'
(default) | 'unified'
Memory allocation (malloc
) mode to be used in the
generated CUDA code, specified as one of the values in this
table.
Value | Description |
---|---|
'discrete' | This value is the default value. The generated code uses the discrete memory allocation mode. |
'unified' | The generated code uses the unified memory allocation (cudaMallocManaged) mode. For NVIDIA® embedded targets only. See unified memory allocation mode on host being removed. |
For more information, see Discrete and Managed Modes.
Example: cfg.GpuConfig.MallocMode =
'discrete'
KernelNamePrefix
— Custom kernel name prefixes
'' (default) | character vector
Specify a custom name prefix for all the
kernels in the generated code. For example, using the value
'CUDA_'
creates kernels with names
CUDA_kernel1
, CUDA_kernel2
, and so
on. If no name is provided, GPU Coder prepends the kernel name with the name of the entry-point
function. Kernel names can contain uppercase letters, lowercase letters,
digits 0–9, and the underscore character _. GPU Coder removes unsupported characters from the kernel names and
appends alpha
to prefixes that do not begin with an
alphabetic letter.
Example: cfg.GpuConfig.KernelNamePrefix =
'myKernel'
EnableCUBLAS
— Use cuBLAS
library
true
(default) | false
Replacement of math function calls with NVIDIA
cuBLAS
library calls, specified as one of the values in
this table.
Value | Description |
---|---|
true | This value is the default value. Allows GPU Coder to replace corresponding math function calls with calls to the cuBLAS library. |
false | Disables the use of the cuBLAS library. |
For more information, see Kernels from Library Calls.
Example: cfg.GpuConfig.EnableCUBLAS =
true
EnableCUSOLVER
— Use cuSOLVER
library
true
(default) | false
Replacement of math function calls with NVIDIA
cuSOLVER
library calls, specified as one of the values in
this table.
Value | Description |
---|---|
true | This value is the default value. Allows GPU Coder to replace corresponding math function calls with calls to the cuSOLVER library. |
false | Disables the use of the cuSOLVER library. |
For more information, see Kernels from Library Calls.
Example: cfg.GpuConfig.EnableCUSOLVER =
true
EnableCUFFT
— Use cuFFT
library
true
(default) | false
Replacement of fft
function calls with NVIDIA
cuFFT
library calls, specified as one of the values in
this table.
Value | Description |
---|---|
true | This value is the default value. Allows GPU Coder to replace appropriate fft function calls with calls to the cuFFT library. |
false | Disables the use of the cuFFT library. |
For more information, see Kernels from Library Calls.
Example: cfg.GpuConfig.EnableCUFFT = true
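The three library options are independent of one another. As a sketch, assuming you want generated code that makes no cuBLAS, cuSOLVER, or cuFFT calls (for example, to compare against the library-backed build), you might disable all three on one configuration object:

```matlab
cfg = coder.gpuConfig('mex');

% Turn off replacement of math and fft calls with NVIDIA library calls
cfg.GpuConfig.EnableCUBLAS   = false;
cfg.GpuConfig.EnableCUSOLVER = false;
cfg.GpuConfig.EnableCUFFT    = false;
```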
Benchmarking
— Add benchmarking to the generated code
false
(default) | true
Control addition of benchmarking code to the generated CUDA code by using one of the values in this table.
Value | Description |
---|---|
false | This value is the default value. The generated CUDA code does not contain benchmarking functionality. |
true | Generates CUDA code with benchmarking functionality. This option uses CUDA APIs to collect timing data. |
After execution, the generated benchmarking code creates the
gpuTimingData
comma separated values (CSV) file in
the current working folder. The CSV file contains timing data for kernel,
memory, and other events. The table describes the format of the CSV
file.
Event Type | Format |
---|---|
CUDA kernels | |
CUDA memory copy | |
Miscellaneous | |
Example: cfg.GpuConfig.Benchmarking =
true
SafeBuild
— Error checking in the generated code
false
(default) | true
Add error-checking functionality to the generated CUDA code by using one of the values in this table.
Value | Description |
---|---|
false | This value is the default value. The generated CUDA code does not contain error-checking functionality. |
true | Generates code with error-checking for CUDA API and kernel calls. |
Example: cfg.GpuConfig.SafeBuild = true
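When diagnosing generated code, the benchmarking and error-checking options can be switched on together. This is a minimal sketch. Combining the two in one build is an illustrative choice rather than a documented recommendation, and the added instrumentation presumably carries some run-time cost, so both options stay at their default of false here for normal builds.

```matlab
cfg = coder.gpuConfig('mex');

% Write timing data for kernel, memory, and other events to the
% gpuTimingData CSV file after execution
cfg.GpuConfig.Benchmarking = true;

% Add error checking for CUDA API and kernel calls
cfg.GpuConfig.SafeBuild = true;
```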
ComputeCapability
— Minimum compute capability for code generation
'Auto'
(default) | '3.2'
| '3.5'
| '3.7'
| '5.0'
| '5.2'
| '5.3'
| '6.0'
| '6.1'
| '6.2'
| '7.0'
| '7.2'
| '7.5'
| '8.0'
| '8.6'
| '8.7'
| '8.9'
| '9.0'
ComputeCapability
specifies the minimum compute
capability of an NVIDIA GPU device for which CUDA code is generated. CUDA compute capability is a numerical representation of the
capabilities and features provided by a GPU architecture for executing CUDA
code. The compute capability version is denoted by a major and minor version
number and determines the available hardware features, instruction sets,
memory capabilities, and other GPU-specific functionalities that can be
utilized by CUDA programs. It also affects the compatibility and performance
of CUDA code on different GPUs.
For example, a GPU with compute capability 7.0 has more features and capabilities than a GPU with compute capability 3.2. Newer compute capabilities generally introduce enhancements, improved performance, and additional features, allowing you to take advantage of the latest GPU architecture advancements. Certain CUDA features might have specific compute capability requirements. To see the CUDA compute capability requirements for code generation, consult the following table.
Target | Compute Capability |
---|---|
CUDA MEX | |
Source code, static or dynamic library, and executables | 3.2 or higher. |
Deep learning applications in 8-bit integer precision | 6.1, 6.3 or higher. |
Deep learning applications in half-precision (16-bit floating point) | 5.3, 6.0, 6.2 or higher. |
If you specify a custom compute capability by using the CustomComputeCapability property, GPU Coder ignores the ComputeCapability setting.
When ComputeCapability
is set to
'Auto'
, the software uses the compute capability of
the GPU device that you select for GPU code generation. If no GPU device is
available or if the software is unable to detect a GPU device, the code
generator uses a compute capability value of 5.0.
Example: cfg.GpuConfig.ComputeCapability =
'6.1'
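Because 'Auto' falls back to 5.0 when no GPU device is detected, you might prefer to pin the value when you generate code on one machine for deployment on another. A sketch, assuming a target that meets the 6.1 requirement listed in the table for 8-bit integer deep learning applications:

```matlab
cfg = coder.gpuConfig('lib');

% Pin the minimum compute capability instead of relying on 'Auto'
% detection of the GPU in the development computer
cfg.GpuConfig.ComputeCapability = '6.1';
```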
CustomComputeCapability
— Control GPU code generation
''
(default) | character vector
Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.
For example, to specify a virtual architecture type, use
-arch=compute_50
. To specify a real architecture instead, use
-arch=sm_50
. For more information, see the
Options for Steering GPU Code Generation topic in
the CUDA Toolkit documentation.
Example: cfg.GpuConfig.CustomComputeCapability =
'-arch=compute_50'
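The real-architecture form mentioned above is set the same way. A minimal sketch, using the sm_50 value from the text; choose the architecture that matches your own target:

```matlab
cfg = coder.gpuConfig('lib');

% Compile for a real GPU architecture. When this property is set,
% GPU Coder ignores the ComputeCapability property.
cfg.GpuConfig.CustomComputeCapability = '-arch=sm_50';
```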
CompilerFlags
— Additional flags to the GPU compiler
''
(default) | character vector
Pass additional flags to the GPU compiler. For example,
--fmad=false
instructs the nvcc
compiler to disable contraction of floating-point multiply and add to a
single Floating-Point Multiply-Add (FMAD) instruction.
For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA Toolkit documentation.
Example: cfg.GpuConfig.CompilerFlags =
'--fmad=false'
StackLimitPerThread
— Stack limit per GPU thread
1024
(default) | integer
Specify the maximum stack limit per GPU thread as an integer value.
Example: cfg.GpuConfig.StackLimitPerThread =
1024
MallocThreshold
— Malloc threshold
200
(default) | integer
Specify the size above which the private variables are allocated on the heap instead of the stack, as an integer value.
Example: cfg.GpuConfig.MallocThreshold =
256
MaximumBlocksPerKernel
— Maximum number of blocks created during a kernel launch
0
(default) | integer
Specify the maximum number of blocks created during a kernel launch.
Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading and unloading of blocks.
If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.
When you specify the maximum number of blocks for each kernel, the code
generator creates 1-D kernels. To force the code generator to create 2-D or
3-D kernels, use the coder.gpu.kernel
pragma. The
coder.gpu.kernel
pragma takes precedence over the
maximum number of blocks for each kernel.
Example: cfg.GpuConfig.MaximumBlocksPerKernel =
1024
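As a sketch of how the block limit might be applied, the following caps each kernel launch at 256 blocks. The value 256 is an arbitrary illustration, not a recommended setting.

```matlab
cfg = coder.gpuConfig('mex');

% Cap each kernel launch at 256 blocks (illustrative value). Loops with
% more iterations than this limit are mapped to 1-D kernels with striding.
cfg.GpuConfig.MaximumBlocksPerKernel = 256;
```

Loops marked with the coder.gpu.kernel pragma are not affected by this limit, because the pragma takes precedence.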
EnableMemoryManager
— Use GPU memory manager
true
(default) | false
Use the GPU memory manager for efficient memory allocation and management and for improved run-time performance.
Value | Description |
---|---|
true | The GPU memory manager creates a collection of large GPU memory pools and manages allocation and deallocation of chunks of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA memory APIs, improving run-time performance. You can use the GPU memory manager for MEX and standalone CUDA code generation. This value is the default value. |
false | Disables the use of the GPU memory manager for memory allocation and management. |
Example: cfg.GpuConfig.EnableMemoryManager =
true
SelectCudaDevice
— CUDA device selection
-1
(default) | deviceID
In a multi-GPU environment such as NVIDIA Drive platforms, specify the CUDA device to target.
Example: cfg.GpuConfig.SelectCudaDevice =
<DeviceID>
Examples
Generate CUDA MEX
Generate CUDA MEX function from a MATLAB function that is suitable for GPU code generation. Also, enable a code generation report.
Write a MATLAB function VecAdd
that performs vector
addition of inputs A
and B
.
function [C] = VecAdd(A,B) %#codegen
    C = coder.nullcopy(zeros(size(A)));
    coder.gpu.kernelfun();
    C = A + B;
end
To generate a MEX function, create a code generation configuration object.
cfg = coder.gpuConfig('mex');
Enable the code generation report.
cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;
Generate a MEX function in the current folder specifying the configuration
object using the -config
option.
% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd
Limitations
- GPU Coder sets the PassStructByReference property of the coder.CodeConfig and coder.EmbeddedCodeConfig code configuration objects to true.
- GPU Coder sets the EnableSignedLeftShifts and EnableSignedRightShifts properties of the coder.EmbeddedCodeConfig code configuration object to true.
- For standalone targets such as a static library, dynamically linked library, or executable program in the Windows® environment, the generated makefiles do not set the /MT or /MD compiler flags. These flags indicate to the Visual Studio compiler to use the multithread library. By default, Visual Studio uses /MT during compilation. To pass other compiler-specific flags, use the CompilerFlags option. For example, cfg.GpuConfig.CompilerFlags = '-Xcompiler /MD';
- The nvcc compiler has limitations on input file suffixes. For example, if an object file name contains version numbers, compilation might fail. In such cases, create symbolic links or pass '-Xlinker' to CompilerFlags.
Version History
Introduced in R2017b
R2024a: GPU memory manager is enabled by default
In previous releases, the default value of the
EnableMemoryManager
property was false
.
Now, the default value has changed to true
. Therefore, when you
generate CUDA code, the GPU memory manager is enabled by default.
Because of this change, once you generate a CUDA MEX with the default configuration setting, you cannot run this MEX on
a different GPU. If you want to run the generated MEX on a different GPU, set the
EnableMemoryManager
property to false
before you generate code.
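For instance, to build a MEX function that is intended to run on a different GPU, the memory manager can be disabled before code generation. This sketch reuses the VecAdd entry-point function and input sizes from the Examples section; substitute your own function and arguments.

```matlab
cfg = coder.gpuConfig('mex');

% Disable the GPU memory manager so that the generated MEX function
% can be run on a different GPU
cfg.GpuConfig.EnableMemoryManager = false;

codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd
```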
R2024a: Configuration parameters related to GPU memory manager are removed
In previous releases, the GPU memory manager provided code configuration parameters to manage the allocation and deallocation of memory blocks in the GPU memory pools. These properties have now been removed.
The removed properties are:
BlockAlignment
FreeMode
MinPoolSize
MaxPoolSize
R2024a: Change to default compute capability value in code configuration
The default value of the ComputeCapability
property is now
'Auto'
instead of '3.5'
. When compute
capability is set to 'Auto'
, the code generator detects and uses
the compute capability of the GPU device that you have selected for GPU code
generation. If no GPU device is available or if the code generator is unable to
detect a GPU device, the code generator uses a compute capability value of
'5.0'
.
For Simulink®
Coder™, the default compute capability value is now '5.0'
instead of '3.5'
. To change this default value, modify the
Compute capability parameter on the Code Generation > GPU Code pane in the Configuration Parameters dialog box. For more information,
see Compute capability (Simulink Coder).
R2021a: unified
memory allocation mode on host being removed
In a future release, the unified memory allocation
(cudaMallocManaged
) mode will be removed when targeting
NVIDIA GPU devices on the host development computer. You can continue to use
unified memory allocation mode when targeting NVIDIA embedded platforms.
When generating CUDA code for the host from MATLAB, set the MallocMode
property of the
coder.gpuConfig
code configuration object to
'discrete'
.
See Also
Functions
codegen
|coder.gpu.kernel
|gpucoder.stencilKernel
|gpucoder.matrixMatrixKernel
|coder.gpu.constantMemory
|gpucoder.reduce
|gpucoder.sort
|coder.gpu.nokernel