
coder.gpuConfig

Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

Description

The coder.GpuCodeConfig or coder.gpuConfig object contains the configuration parameters that codegen uses for generating CUDA® MEX, a static library, a dynamically linked library, or an executable program with GPU Coder™. Pass the object to the codegen function by using the -config option.

Creation

Description


cfg = coder.gpuConfig(build_type) creates a code generation configuration object for the specified build type, which can be CUDA MEX, a static library, a dynamically linked library, or an executable program. If the Embedded Coder® product is installed, it creates a coder.EmbeddedCodeConfig object for static library, dynamic library, or executable build types.

cfg = coder.gpuConfig(build_type,'ecoder',false) creates a code generation configuration object to generate CUDA 'lib', 'dll', or 'exe' output even if the Embedded Coder product is installed.

cfg = coder.gpuConfig(build_type,'ecoder',true) creates a coder.EmbeddedCodeConfig configuration object even if the Embedded Coder product is not installed. However, code generation using a coder.EmbeddedCodeConfig object requires an Embedded Coder license.
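For instance, the calling forms above can be used as follows (a minimal sketch; the lowercase build-type strings correspond to the table below):

```matlab
% MEX build: returns a configuration object with a GpuConfig property
cfg = coder.gpuConfig('mex');

% Static library build: returns a coder.EmbeddedCodeConfig object when
% Embedded Coder is installed
cfg = coder.gpuConfig('lib');

% Dynamic library build without Embedded Coder features, even if the
% Embedded Coder product is installed
cfg = coder.gpuConfig('dll', 'ecoder', false);
```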

Input Arguments


Output to build from generated CUDA code, specified as one of the values in this table.

Value        Description
'MEX'        CUDA MEX
'LIB'        Static library
'DLL'        Dynamically linked library
'EXE'        Executable program

Properties


coder.GpuConfig contains only the GPU-specific configuration parameters of the code configuration object. To see the other properties of the code configuration object, see coder.CodeConfig and coder.EmbeddedCodeConfig.

Control generation of CUDA (*.cu) files by using one of the values in this table.

Value        Description

true (default)

Enables CUDA code generation.

false

Disables CUDA code generation.

Example: cfg.GpuConfig.Enabled = true

Memory allocation (malloc) mode to be used in the generated CUDA code, specified as one of the values in this table.

Value        Description

'discrete' (default)

The generated code uses the cudaMalloc API for transferring data between the CPU and the GPU. From the programmer's point of view, the discrete mode has a traditional memory architecture with separate CPU and GPU global memory address spaces.

'unified'

The generated code uses the cudaMallocManaged API, which uses a shared (unified) CPU and GPU global memory address space.

For NVIDIA® embedded targets only. See unified memory allocation mode on host being removed.

For more information, see Discrete and Managed Modes.

Example: cfg.GpuConfig.MallocMode = 'discrete'

Specify a custom name prefix for all the kernels in the generated code. For example, using the value 'CUDA_' creates kernels with names CUDA_kernel1, CUDA_kernel2, and so on. If you do not provide a prefix, GPU Coder prepends the kernel name with the name of the entry-point function. Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character (_). GPU Coder removes unsupported characters from kernel names and prepends alpha to prefixes that do not begin with an alphabetic letter.

Example: cfg.GpuConfig.KernelNamePrefix = 'myKernel'

Replacement of math function calls with NVIDIA cuBLAS library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace corresponding math function calls with calls to the cuBLAS library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB® functions and attempts to map them to the GPU.

false

Disables the use of the cuBLAS library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUBLAS = true

Replacement of math function calls with NVIDIA cuSOLVER library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace corresponding math function calls with calls to the cuSOLVER library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB functions and attempts to map them to the GPU.

false

Disables the use of the cuSOLVER library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUSOLVER = true

Replacement of fft function calls with NVIDIA cuFFT library calls, specified as one of the values in this table.

Value        Description

true (default)

Allows GPU Coder to replace appropriate fft calls with calls to the cuFFT library.

false

Disables the use of the cuFFT library in the generated code. With this option, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB fft code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUFFT = true

Control addition of benchmarking code to the generated CUDA code by using one of the values in this table.

Value        Description

false (default)

The generated CUDA code does not contain benchmarking functionality.

true

Generates CUDA code with benchmarking functionality. This option uses CUDA APIs such as cudaEvent to time kernel executions, memory copies, and other events.

After execution, the generated benchmarking code creates the gpuTimingData comma-separated values (CSV) file in the current working folder. The CSV file contains timing data for kernel, memory, and other events. This table describes the format of the CSV file.

Event Type        Format

CUDA kernels

<name_N>,<block dimension>,<grid dimension>,<execution time in ms>,<name of parent>

N is the Nth execution of the kernel. <block dimension> represents the total block dimension. For example, if the block dimension is dim3(32,32,32), the <block dimension> value is 32768.

CUDA memory copy

<name_N>,<memory copy size>,<execution time in ms>,<IO flag>,<name of parent>

N is the Nth execution of the memory copy.

Miscellaneous

<name_N>,<execution time in ms>,<name of parent>

N is the Nth execution of the operation.

Example: cfg.GpuConfig.Benchmarking = true
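A typical benchmarking workflow might look like the following sketch; myFcn is a hypothetical entry-point function, and the CSV file name follows the description above:

```matlab
% Generate a MEX function with benchmarking instrumentation
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.Benchmarking = true;
codegen -config cfg -args {zeros(256,256)} myFcn

% Run the generated MEX function; the benchmarking code writes timing
% data to the current working folder
myFcn_mex(zeros(256,256));

% Read the timing data back into MATLAB (assuming the gpuTimingData
% CSV file described above)
timing = readtable('gpuTimingData.csv');
```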

Add error-checking functionality to the generated CUDA code by using one of the values in this table.

Value        Description

false (default)

The generated CUDA code does not contain error-checking functionality.

true

Generates code with error checking for CUDA API and kernel calls.

Example: cfg.GpuConfig.SafeBuild = true

Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware. Applications use it at run time to determine which hardware features and instructions are available on the present GPU. If you specify a custom compute capability, GPU Coder ignores this setting.

To see the CUDA compute capability requirements for code generation, consult the following table.

Target        Compute Capability

CUDA MEX

See GPU Computing Requirements.

Source code, static or dynamic library, and executables

3.2 or higher.

Deep learning applications in 8-bit integer precision

6.1, 6.3, or higher.

Deep learning applications in half-precision (16-bit floating point)

5.3, 6.0, 6.2, or higher.

Example: cfg.GpuConfig.ComputeCapability = '6.1'

Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.

For example, to specify a virtual architecture, use '-arch=compute_50'. To specify a real architecture, use '-arch=sm_50'. For more information, see the Options for Steering GPU Code Generation topic in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CustomComputeCapability = '-arch=compute_50'

Pass additional flags to the GPU compiler. For example, --fmad=false instructs the nvcc compiler to disable contraction of floating-point multiply and add to a single Floating-Point Multiply-Add (FMAD) instruction.

For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CompilerFlags = '--fmad=false'

Specify the maximum stack limit per GPU thread as an integer value.

Example: cfg.GpuConfig.StackLimitPerThread = 1024

Specify the size above which the private variables are allocated on the heap instead of the stack, as an integer value.

Example: cfg.GpuConfig.MallocThreshold = 256

Specify the maximum number of blocks created during a kernel launch.

Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading, and unloading of blocks.

If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.

When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.

Example: cfg.GpuConfig.MaximumBlocksPerKernel = 1024
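For illustration, here is a sketch of forcing a 2-D kernel with the coder.gpu.kernel pragma, which takes precedence over the MaximumBlocksPerKernel setting for the loop it annotates (the function and dimensions are hypothetical):

```matlab
function C = scaleMatrix(A) %#codegen
% Request a 2-D grid of 16x16 blocks with 32x32 threads per block for
% the loop nest below, overriding the MaximumBlocksPerKernel limit
C = coder.nullcopy(A);
coder.gpu.kernel([16 16 1], [32 32 1]);
for i = 1:512
    for j = 1:512
        C(i,j) = 2 * A(i,j);
    end
end
end
```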

Enable the GPU memory manager for efficient memory allocation, management, and improved run-time performance.

Value        Description

true

The GPU memory manager creates a collection of large GPU memory pools and manages allocation and deallocation of chunks of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA memory APIs, improving run-time performance. You can use the GPU memory manager for MEX and standalone CUDA code generation.

false (default)

Disables the use of the GPU memory manager for memory allocation and management.

Example: cfg.GpuConfig.EnableMemoryManager = true

Specify the alignment of memory blocks used by the GPU memory manager. The block sizes (in bytes) in the pool are a multiple of the specified value. The value of BlockAlignment must be a power of 2.

Example: cfg.GpuConfig.BlockAlignment = 1024

Specify when the memory manager frees the GPU device memory by using one of the values in this table.

Value        Description

'Never' (default)

Free memory when the memory manager is destroyed.

'AtTerminate'

Free empty GPU pools when the terminate function is called in the generated code. For MEX targets, memory is freed after every call to the generated MEX function. For other targets, memory is freed when the terminate function is called.

'AfterAllocate'

Free empty pools after each call to the CUDA memory allocation API.

Example: cfg.GpuConfig.FreeMode = 'AtTerminate'

Specify the minimum pool size, in megabytes (MB), of the GPU memory manager. The value of MinPoolSize must be a power of 2.

The memory manager computes the size levels using the MinPoolSize and MaxPoolSize parameters by interpolating between the two values in increasing powers of 2. For example, if the MinPoolSize is 4 and the MaxPoolSize is 1024, the size levels are {4, 8, 16, 32, 64, 128, 256, 512, 1024}.

Example: cfg.GpuConfig.MinPoolSize = 32

Specify the maximum pool size, in megabytes (MB), of the GPU memory manager. The value of MaxPoolSize must be a power of 2.

The memory manager computes the size levels using the MinPoolSize and MaxPoolSize parameters by interpolating between the two values in increasing powers of 2. For example, if the MinPoolSize is 4 and the MaxPoolSize is 1024, the size levels are {4, 8, 16, 32, 64, 128, 256, 512, 1024}.

Example: cfg.GpuConfig.MaxPoolSize = 4096
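Taken together, the memory manager properties might be configured as in this sketch (the values are illustrative only):

```matlab
cfg = coder.gpuConfig('lib');
cfg.GpuConfig.EnableMemoryManager = true;   % enable pooled allocation
cfg.GpuConfig.BlockAlignment = 1024;        % block sizes: multiples of 1024 bytes
cfg.GpuConfig.MinPoolSize = 4;              % smallest pool size level, in MB
cfg.GpuConfig.MaxPoolSize = 1024;           % largest pool size level, in MB
cfg.GpuConfig.FreeMode = 'AtTerminate';     % free empty pools at terminate
% With these values, the pool size levels are {4, 8, 16, 32, 64, 128,
% 256, 512, 1024} MB, as described above.
```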

In a multi-GPU environment, such as NVIDIA Drive platforms, specify the CUDA device to target.

Example: cfg.GpuConfig.SelectCudaDevice = <DeviceID>

Note

SelectCudaDevice can be used with gpuArray only if gpuDevice and SelectCudaDevice point to the same GPU. If gpuDevice points to a different GPU, a CUDA_ERROR_INVALID_VALUE runtime error is thrown.
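For example, to target a specific device in a multi-GPU system (a sketch; the device ID is hypothetical, and when gpuArray inputs are used it must match the device that gpuDevice selects):

```matlab
cfg = coder.gpuConfig('exe');
cfg.GpuConfig.SelectCudaDevice = 1;  % hypothetical CUDA device ID
```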

Examples


Generate CUDA MEX function from a MATLAB function that is suitable for GPU code generation. Also, enable a code generation report.

Write a MATLAB function, VecAdd, that performs vector addition on inputs A and B.

function [C] = VecAdd(A,B) %#codegen
    C = coder.nullcopy(zeros(size(A)));
    coder.gpu.kernelfun();
    C = A + B;
end

To generate a MEX function, create a code generation configuration object.

cfg = coder.gpuConfig('mex');

Enable the cuBLAS library and the code generation report.

cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;

Generate a MEX function in the current folder, specifying the configuration object by using the -config option.

% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd

Limitations

  • GPU Coder sets the PassStructByReference property of the coder.CodeConfig and coder.EmbeddedCodeConfig code configuration objects to true.

  • GPU Coder sets the EnableSignedLeftShifts and the EnableSignedRightShifts properties of the coder.EmbeddedCodeConfig code configuration object to true.

  • For standalone targets such as a static library, dynamically linked library, or executable program in the Windows® environment, the generated makefiles do not set the /MT or /MD compiler flags. These flags tell the Visual Studio compiler to use the multithreaded run-time library. By default, Visual Studio uses /MT during compilation. To pass other compiler-specific flags, use the CompilerFlags option. For example,

    cfg.GpuConfig.CompilerFlags = '-Xcompiler /MD';
    

  • The nvcc compiler has limitations on input file suffixes. For example, if an object file name contains version numbers, compilation may fail. In such cases, create symbolic links or pass '-Xlinker' options through the CompilerFlags property.

Version History

Introduced in R2017b
