
coder.gpuConfig

Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

Description

The coder.GpuCodeConfig or coder.gpuConfig object contains the configuration parameters that codegen uses for generating CUDA® MEX, a static library, a dynamically linked library, or an executable program with GPU Coder™. Pass the object to the codegen function by using the -config option.

Creation

Description


cfg = coder.gpuConfig(build_type) creates a code generation configuration object for the specified build type, which can be CUDA MEX, a static library, a dynamically linked library, or an executable program. If the Embedded Coder® product is installed, it creates a coder.EmbeddedCodeConfig object for static library, dynamic library, or executable build types.

cfg = coder.gpuConfig(build_type,'ecoder',false) creates a code generation configuration object to generate CUDA 'lib', 'dll', or 'exe' output even if the Embedded Coder product is installed.

cfg = coder.gpuConfig(build_type,'ecoder',true) creates a coder.EmbeddedCodeConfig configuration object even if the Embedded Coder product is not installed. However, code generation using a coder.EmbeddedCodeConfig object requires an Embedded Coder license.
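For instance, the calling forms above can be used as follows (a minimal sketch; the lowercase build-type strings correspond to the table below):

```matlab
% MEX build: returns a configuration object with a GpuConfig property
cfg = coder.gpuConfig('mex');

% Static library build: returns a coder.EmbeddedCodeConfig object when
% Embedded Coder is installed
cfg = coder.gpuConfig('lib');

% Dynamic library build without Embedded Coder features, even if the
% Embedded Coder product is installed
cfg = coder.gpuConfig('dll', 'ecoder', false);
```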

Input Arguments


Output to build from generated CUDA code, specified as one of the values in this table.

Value        Description
'MEX'        CUDA MEX
'LIB'        Static library
'DLL'        Dynamically linked library
'EXE'        Executable program

Properties


coder.GpuConfig contains only the GPU-specific configuration parameters of the code configuration object. To see the other properties of the code configuration object, see coder.CodeConfig and coder.EmbeddedCodeConfig.

Control generation of CUDA (*.cu) files by using one of the values in this table.

Value        Description

true (default)

Enables CUDA code generation.

false

Disables CUDA code generation.

Example: cfg.GpuConfig.Enabled = true

Memory allocation (malloc) mode to be used in the generated CUDA code, specified as one of the values in this table.

Value        Description

'discrete' (default)

The generated code uses the cudaMalloc API for transferring data between the CPU and the GPU. From the programmer's point of view, the discrete mode has a traditional memory architecture with separate CPU and GPU global memory address spaces.

'unified'

The generated code uses the cudaMallocManaged API, which uses a shared (unified) CPU and GPU global memory address space.

For NVIDIA® embedded targets only. See unified memory allocation mode on host being removed.

For more information, see Discrete and Managed Modes.

Example: cfg.GpuConfig.MallocMode = 'discrete'

Specify a custom name prefix for all the kernels in the generated code. For example, using the value 'CUDA_' creates kernels with names CUDA_kernel1, CUDA_kernel2, and so on. If you do not provide a prefix, GPU Coder prepends the kernel name with the name of the entry-point function. Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character (_). GPU Coder removes unsupported characters from kernel names and prepends alpha to prefixes that do not begin with an alphabetic letter.

Example: cfg.GpuConfig.KernelNamePrefix = 'myKernel'

Replacement of math function calls with NVIDIA cuBLAS library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace corresponding math function calls with calls to the cuBLAS library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB® functions and attempts to map them to the GPU.

false

Disables the use of the cuBLAS library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUBLAS = true

Replacement of math function calls with NVIDIA cuSOLVER library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace corresponding math function calls with calls to the cuSOLVER library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB functions and attempts to map them to the GPU.

false

Disables the use of the cuSOLVER library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUSOLVER = true

Replacement of fft function calls with NVIDIA cuFFT library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace appropriate fft calls with calls to the cuFFT library.

false

Disables the use of the cuFFT library in the generated code. With this option, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB fft code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUFFT = true

Control addition of benchmarking code to the generated CUDA code by using one of the values in this table.

Value        Description

false (default)

The generated CUDA code does not contain benchmarking functionality.

true

Generates CUDA code with benchmarking functionality. This option uses CUDA APIs such as cudaEvent to time kernel executions, memory copies, and other events.

After execution, the generated benchmarking code creates the gpuTimingData comma-separated values (CSV) file in the current working folder. The CSV file contains timing data for kernel, memory, and other events. This table describes the format of the CSV file.

Event Type        Format

CUDA kernels

<name_N>,<block dimension>,<grid dimension>,<execution time in ms>,<name of parent>

N is the Nth execution of the kernel. <block dimension> represents the total block dimension. For example, if the block dimension is dim3(32,32,32), the <block dimension> value is 32768.

CUDA memory copy

<name_N>,<memory copy size>,<execution time in ms>,<IO flag>,<name of parent>

N is the Nth execution of the memory copy.

Miscellaneous

<name_N>,<execution time in ms>,<name of parent>

N is the Nth execution of the operation.

Example: cfg.GpuConfig.Benchmarking = true
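A typical benchmarking workflow might look like the following sketch; myFcn is a hypothetical entry-point function, and the CSV file name follows the description above:

```matlab
% Generate a MEX function with benchmarking instrumentation
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.Benchmarking = true;
codegen -config cfg -args {zeros(256,256)} myFcn

% Run the generated MEX function; the benchmarking code writes timing
% data to the current working folder
myFcn_mex(zeros(256,256));

% Read the timing data back into MATLAB (assuming the gpuTimingData
% CSV file described above)
timing = readtable('gpuTimingData.csv');
```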

Add error-checking functionality to the generated CUDA code by using one of the values in this table.

Value        Description

false (default)

The generated CUDA code does not contain error-checking functionality.

true

Generates code with error checking for CUDA API and kernel calls.

Example: cfg.GpuConfig.SafeBuild = true

Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware. Applications use it at run time to determine which hardware features and instructions are available on the present GPU. If you specify a custom compute capability, GPU Coder ignores this setting.

To see the CUDA compute capability requirements for code generation, consult the following table.

Target        Compute Capability

CUDA MEX

See GPU Computing Requirements.

Source code, static or dynamic library, and executables

3.2 or higher.

Deep learning applications in 8-bit integer precision

6.1, 6.3, or higher.

Deep learning applications in half-precision (16-bit floating point)

5.3, 6.0, 6.2, or higher.

Example: cfg.GpuConfig.ComputeCapability = '6.1'

Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.

For example, to specify a virtual architecture, use '-arch=compute_50'. To specify a real architecture, use '-arch=sm_50'. For more information, see the Options for Steering GPU Code Generation topic in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CustomComputeCapability = '-arch=compute_50'

Pass additional flags to the GPU compiler. For example, --fmad=false instructs the nvcc compiler to disable contraction of floating-point multiply and add to a single Floating-Point Multiply-Add (FMAD) instruction.

For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CompilerFlags = '--fmad=false'

Specify the maximum stack limit per GPU thread as an integer value.

Example: cfg.GpuConfig.StackLimitPerThread = 1024

Specify the size above which the private variables are allocated on the heap instead of the stack, as an integer value.

Example: cfg.GpuConfig.MallocThreshold = 256

Specify the maximum number of blocks created during a kernel launch.

Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading, and unloading of blocks.

If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.

When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.

Example: cfg.GpuConfig.MaximumBlocksPerKernel = 1024
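For illustration, here is a sketch of forcing a 2-D kernel with the coder.gpu.kernel pragma, which takes precedence over the MaximumBlocksPerKernel setting for the loop it annotates (the function and dimensions are hypothetical):

```matlab
function C = scaleMatrix(A) %#codegen
% Request a 2-D grid of 16x16 blocks with 32x32 threads per block for
% the loop nest below, overriding the MaximumBlocksPerKernel limit
C = coder.nullcopy(A);
coder.gpu.kernel([16 16 1], [32 32 1]);
for i = 1:512
    for j = 1:512
        C(i,j) = 2 * A(i,j);
    end
end
end
```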

Enable the GPU memory manager for efficient memory allocation, management, and improved run-time performance.

Value        Description

true

The GPU memory manager creates a collection of large GPU memory pools and manages allocation and deallocation of chunks of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA memory APIs, improving run-time performance. You can use the GPU memory manager for MEX and standalone CUDA code generation.

false (default)

Disables the use of the GPU memory manager for memory allocation and management.

Example: cfg.GpuConfig.EnableMemoryManager = true

Specify the alignment of memory blocks used by the GPU memory manager. The block sizes (in bytes) in the pool are a multiple of the specified value. The value of BlockAlignment must be a power of 2.

Example: cfg.GpuConfig.BlockAlignment = 1024

Specify when the memory manager frees the GPU device memory by using one of the values in this table.

Value        Description

'Never' (default)

Free memory when the memory manager is destroyed.

'AtTerminate'

Free empty GPU pools when the terminate function is called in the generated code. For MEX targets, memory is freed after every call to the generated MEX function. For other targets, memory is freed when the terminate function is called.

'AfterAllocate'

Free empty pools after each call to the CUDA memory allocation API.

Example: cfg.GpuConfig.FreeMode = 'AtTerminate'

Specify the minimum pool size, in megabytes (MB), of the GPU memory manager. The value of MinPoolSize must be a power of 2.

The memory manager computes the size levels using the MinPoolSize and MaxPoolSize parameters by interpolating between the two values in increasing powers of 2. For example, if the MinPoolSize is 4 and the MaxPoolSize is 1024, the size levels are {4, 8, 16, 32, 64, 128, 256, 512, 1024}.

Example: cfg.GpuConfig.MinPoolSize = 32

Specify the maximum pool size, in megabytes (MB), of the GPU memory manager. The value of MaxPoolSize must be a power of 2.

The memory manager computes the size levels using the MinPoolSize and MaxPoolSize parameters by interpolating between the two values in increasing powers of 2. For example, if the MinPoolSize is 4 and the MaxPoolSize is 1024, the size levels are {4, 8, 16, 32, 64, 128, 256, 512, 1024}.

Example: cfg.GpuConfig.MaxPoolSize = 4096
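Taken together, the memory manager properties might be configured as in this sketch (the values are illustrative only):

```matlab
cfg = coder.gpuConfig('lib');
cfg.GpuConfig.EnableMemoryManager = true;   % enable pooled allocation
cfg.GpuConfig.BlockAlignment = 1024;        % block sizes: multiples of 1024 bytes
cfg.GpuConfig.MinPoolSize = 4;              % smallest pool size level, in MB
cfg.GpuConfig.MaxPoolSize = 1024;           % largest pool size level, in MB
cfg.GpuConfig.FreeMode = 'AtTerminate';     % free empty pools at terminate
% With these values, the pool size levels are {4, 8, 16, 32, 64, 128,
% 256, 512, 1024} MB, as described above.
```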

In a multi-GPU environment, such as NVIDIA Drive platforms, specify the CUDA device to target.

Example: cfg.GpuConfig.SelectCudaDevice = <DeviceID>

Note

SelectCudaDevice can be used with gpuArray only if gpuDevice and SelectCudaDevice point to the same GPU. If gpuDevice points to a different GPU, a CUDA_ERROR_INVALID_VALUE runtime error is thrown.
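For example, to target a specific device in a multi-GPU system (a sketch; the device ID is hypothetical, and when gpuArray inputs are used it must match the device that gpuDevice selects):

```matlab
cfg = coder.gpuConfig('exe');
cfg.GpuConfig.SelectCudaDevice = 1;  % hypothetical CUDA device ID
```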

Examples


Generate CUDA MEX function from a MATLAB function that is suitable for GPU code generation. Also, enable a code generation report.

Write a MATLAB function, VecAdd, that performs vector addition on inputs A and B.

function [C] = VecAdd(A,B) %#codegen
    C = coder.nullcopy(zeros(size(A)));
    coder.gpu.kernelfun();
    C = A + B;
end

To generate a MEX function, create a code generation configuration object.

cfg = coder.gpuConfig('mex');

Enable the cuBLAS library and the code generation report.

cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;

Generate a MEX function in the current folder, specifying the configuration object by using the -config option.

% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd

Limitations

  • GPU Coder sets the PassStructByReference property of the coder.CodeConfig and coder.EmbeddedCodeConfig code configuration objects to true.

  • GPU Coder sets the EnableSignedLeftShifts and the EnableSignedRightShifts properties of the coder.EmbeddedCodeConfig code configuration object to true.

  • For standalone targets such as a static library, dynamically linked library, or executable program in the Windows® environment, the generated makefiles do not set the /MT or /MD compiler flags. These flags tell the Visual Studio compiler to use the multithreaded run-time library. By default, Visual Studio uses /MT during compilation. To pass other compiler-specific flags, use the CompilerFlags option. For example,

    cfg.GpuConfig.CompilerFlags = '-Xcompiler /MD';
    

  • The nvcc compiler has limitations on input file suffixes. For example, if an object file name contains version numbers, compilation may fail. In such cases, create symbolic links or pass '-Xlinker' options through the CompilerFlags property.

Version History

Introduced in R2017b
