
Kernel Creation from MATLAB Code

MATLAB code structures and patterns that create CUDA® GPU kernels

GPU Coder™ generates and executes optimized CUDA kernels for specific algorithm structures and patterns in your MATLAB® code. The generated code calls optimized NVIDIA® CUDA libraries, including cuFFT, cuSOLVER, cuBLAS, cuDNN, and TensorRT. You can integrate the generated code into your project as source code, static libraries, or dynamic libraries, and compile it for desktops, servers, and GPUs embedded on NVIDIA Jetson, DRIVE, and other platforms. GPU Coder also lets you incorporate handwritten CUDA code into your algorithms and into the generated code.
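
For example, the following is a minimal sketch of the typical workflow: annotate a MATLAB entry-point function with the coder.gpu.kernelfun pragma and generate CUDA MEX code with codegen. The function name, input size, and target shown here are illustrative assumptions, not values taken from this page.

% myScale.m -- illustrative entry-point function
function y = myScale(x) %#codegen
coder.gpu.kernelfun;      % map the computation in this function to CUDA kernels
y = 2*x + 1;              % element-wise work that GPU Coder can parallelize
end

% Generate a CUDA MEX function for single-precision 4096-by-4096 inputs.
cfg = coder.gpuConfig('mex');
codegen -config cfg myScale -args {ones(4096,4096,'single')}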

Apps


GPU Coder - Generate CUDA code from MATLAB code
GPU Environment Check - Verify and set up GPU code generation environment (see the command-line sketch after this list)
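
As a command-line sketch of what these apps automate, the calls below open the GPU Coder app and run a basic environment check; the no-argument form of coder.checkGpuInstall is assumed here to perform a basic check on the host computer.

gpucoder                  % open the GPU Coder app
coder.checkGpuInstall     % verify the CUDA toolchain and GPU on the host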

Functions


codegen - Generate C/C++ code from MATLAB code
gpucoder - Open GPU Coder app
coder.checkGpuInstall - Verify GPU code generation environment
coder.gpuConfig - Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
coder.gpu.kernel - Pragma that maps for-loops to GPU kernels (see the loop sketch after this list)
coder.gpu.kernelfun - Pragma that maps a function to GPU kernels
coder.gpu.nokernel - Pragma to disable kernel creation for loops
coder.ceval - Call C/C++ function from generated code
coder.gpu.iterations - Pragma that provides information to the code generator for making parallelization decisions on variable bound loops
coder.gpu.constantMemory - Pragma that maps a variable to the constant memory on the GPU
coder.gpu.persistentMemory - Pragma to allocate a variable as persistent memory on the GPU (Since R2020b)
cudaMemoryManager - Query memory usage by shared GPU memory manager for MEX functions (Since R2024a)
gpucoder.atomicAdd - Atomically add a specified value to a variable in global or shared memory (Since R2021b)
gpucoder.atomicAnd - Atomically perform bit-wise AND between a specified value and a variable in global or shared memory (Since R2021b)
gpucoder.atomicCAS - Atomically compare and swap the value of a variable in global or shared memory (Since R2021b)
gpucoder.atomicDec - Atomically decrement a variable in global or shared memory within a specified upper bound (Since R2021b)
gpucoder.atomicExch - Atomically exchange a variable in global or shared memory with the specified value (Since R2021b)
gpucoder.atomicInc - Atomically increment a variable in global or shared memory within a specified upper bound (Since R2021b)
gpucoder.atomicMax - Atomically find the maximum between a specified value and a variable in global or shared memory (Since R2021b)
gpucoder.atomicMin - Atomically find the minimum between a specified value and a variable in global or shared memory (Since R2021b)
gpucoder.atomicOr - Atomically perform bit-wise OR between a specified value and a variable in global or shared memory (Since R2021b)
gpucoder.atomicSub - Atomically subtract a specified value from a variable in global or shared memory (Since R2021b)
gpucoder.atomicXor - Atomically perform bit-wise XOR between a specified value and a variable in global or shared memory (Since R2021b)
half - Construct half-precision numeric object
stencilfun - Generate CUDA code for stencil functions (Since R2022b)
selectdata - Select slices of arrays and generate CUDA code (Since R2025a)
gpucoder.matrixMatrixKernel - Optimized GPU implementation of functions containing matrix-matrix operations
gpucoder.batchedMatrixMultiply - Optimized GPU implementation of batched matrix multiply operation
gpucoder.stridedMatrixMultiply - Optimized GPU implementation of strided and batched matrix multiply operation
gpucoder.batchedMatrixMultiplyAdd - Optimized GPU implementation of batched matrix multiply with add operation
gpucoder.stridedMatrixMultiplyAdd - Optimized GPU implementation of strided, batched matrix multiply with add operation
gpucoder.sort - Optimized GPU implementation of the MATLAB sort function
gpucoder.ctranspose - Optimized GPU implementation of the MATLAB ctranspose function
gpucoder.transpose - Optimized GPU implementation of the MATLAB transpose function
gpucoder.reduce - Optimized GPU implementation for reduction operations
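
As an illustration of how the kernel pragmas listed above are used, the sketch below maps a for-loop to a CUDA kernel with coder.gpu.kernel. The function name, loop body, and data sizes are illustrative assumptions, not taken from this page.

function y = addVectors(a, b) %#codegen
% Illustrative example: request a CUDA kernel for the loop that follows.
y = zeros(size(a), 'like', a);
coder.gpu.kernel;             % map the next for-loop to a GPU kernel
for i = 1:numel(a)
    y(i) = a(i) + b(i);       % each iteration can run as independent GPU work
end
end

A MEX target can then be generated with, for example, codegen -config coder.gpuConfig('mex') addVectors -args {ones(1,4096,'single'), ones(1,4096,'single')}.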

Code Configuration Settings


Generate GPU Code - Control GPU code generation
GPU device ID - CUDA device selection
Minimum compute capability - Minimum compute capability for code generation
Custom compute capability - Virtual GPU architecture
Malloc mode - GPU memory allocation
Malloc threshold - Threshold for GPU memory allocation
Stack limit - Stack limit per GPU thread
Maximum blocks per kernel - Maximum number of blocks created during a kernel launch
Benchmarking - Add benchmarking to the generated code
Safe build - Error checking in the generated code
Kernel name prefix - Custom kernel name prefixes
Compiler flags - Pass additional flags to GPU compiler
Enable cuBLAS - Replace math function calls with cuBLAS library calls
Enable cuSOLVER - Replace math function calls with cuSOLVER library calls
Enable cuFFT - Replace fft function calls with cuFFT library calls
Enable GPU memory manager - Use GPU memory manager (see the configuration sketch after this list)
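
The sketch below shows how these settings can be specified programmatically through the coder.gpuConfig object; the property names and values are assumptions based on the settings listed above, so check the coder.gpuConfig reference for the exact names and defaults.

% Illustrative configuration; property names are assumed to mirror the settings above.
cfg = coder.gpuConfig('lib');                 % generate a static library
cfg.GpuConfig.ComputeCapability = '6.1';      % minimum compute capability
cfg.GpuConfig.EnableCUBLAS = true;            % replace math calls with cuBLAS
cfg.GpuConfig.EnableCUFFT = true;             % replace fft calls with cuFFT
cfg.GpuConfig.MallocMode = 'discrete';        % GPU memory allocation mode
cfg.GpuConfig.KernelNamePrefix = 'myproj_';   % prefix for generated kernel names
codegen -config cfg addVectors -args {ones(1,4096,'single'), ones(1,4096,'single')}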

Objects


coder.gpuConfig - Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
coder.CodeConfig - Configuration parameters for C/C++ code generation from MATLAB code
coder.EmbeddedCodeConfig - Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
coder.gpuEnvConfig - Configuration object for checking the GPU code generation environment (see the sketch after this list)
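
For a more detailed environment check than the basic coder.checkGpuInstall call shown earlier, a coder.gpuEnvConfig object can be passed to the check. This is a minimal sketch; the BasicCodegen and Quiet properties are assumptions based on common usage of this object.

% Illustrative environment check on the host development computer.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;      % also run a basic code generation and execution test (assumed property)
envCfg.Quiet = 1;             % suppress detailed progress messages (assumed property)
coder.checkGpuInstall(envCfg);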

Topics

Featured Examples