Pass GPU Inputs to Entry-Point Functions
This example shows how to configure GPU Coder to pass GPU inputs to entry-point functions and produce GPU outputs. Doing so can improve the performance of the generated code when you integrate it with a system that produces and consumes data on the GPU. When you create the inputs on the GPU in the caller of the entry-point function and access them on the GPU inside the entry-point function, you avoid unnecessary memory copies between the CPU and the GPU. The same applies to outputs that stay on the GPU.
Third-Party Prerequisites
CUDA-enabled NVIDIA® GPU and compatible driver.
Verify GPU Environment
To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall function.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
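With Quiet enabled the check prints nothing, so a script may want to inspect the outcome itself. The following is a minimal sketch that captures the structure returned by coder.checkGpuInstall. The field name basiccodegen is an assumption about that structure, so the code guards the access with isfield.

```matlab
% Sketch: capture the result of the environment check instead of discarding it.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
results = coder.checkGpuInstall(envCfg);

% Assumption: the results structure has a logical field named 'basiccodegen'.
% The isfield guard keeps the script running if the field name differs.
if isfield(results, 'basiccodegen') && ~results.basiccodegen
    error('Basic GPU code generation check failed. Review the CUDA toolkit and compiler setup.');
end
```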
The sobelEdgeDetection Entry-Point Function
The sobelEdgeDetection entry-point function implements a Sobel edge detection algorithm that takes an image as input and produces an output image that shows the edges.
type sobelEdgeDetection.m
function outputImg = sobelEdgeDetection(inputImg)
%
% Copyright 2023 The MathWorks, Inc.

coder.gpu.kernelfun();
inputSize = size(inputImg);
outputSize = inputSize - 2;
outputImg = zeros(outputSize, 'like', inputImg);
inputImg = double(inputImg);
for colIdx = 1:outputSize(2)
    for rowIdx = 1:outputSize(1)
        hDiff = inputImg(rowIdx, colIdx) + 2 * inputImg(rowIdx, colIdx + 1) + inputImg(rowIdx, colIdx + 2) - ...
            inputImg(rowIdx + 2, colIdx) - 2 * inputImg(rowIdx + 2, colIdx + 1) - inputImg(rowIdx + 2, colIdx + 2);
        vDiff = inputImg(rowIdx, colIdx) + 2 * inputImg(rowIdx + 1, colIdx) + inputImg(rowIdx + 2, colIdx) - ...
            inputImg(rowIdx, colIdx + 2) - 2 * inputImg(rowIdx + 1, colIdx + 2) - inputImg(rowIdx + 2, colIdx + 2);
        diff = hDiff * hDiff + vDiff * vDiff;
        if diff > 3600
            outputImg(rowIdx, colIdx) = 255;
        else
            outputImg(rowIdx, colIdx) = 0;
        end
    end
end
end
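To make the arithmetic in the listing concrete, the following sketch applies the same two 3-by-3 differences and the same 3600 threshold to a small synthetic image. It is a vectorized restatement for illustration only, not the code that GPU Coder generates. The 5-by-5 test image and all variable names are hypothetical, and the script needs only base MATLAB.

```matlab
% Hypothetical 5-by-5 test image: a vertical step edge between columns 2 and 3.
img = uint8([zeros(5, 2), 100 * ones(5, 3)]);

I = double(img);
R = size(I, 1) - 2;   % output rows, as in outputSize = inputSize - 2
C = size(I, 2) - 2;   % output columns

% Top row of each 3-by-3 window minus its bottom row (hDiff in the listing).
hDiff = I(1:R, 1:C) + 2 * I(1:R, 2:C+1) + I(1:R, 3:C+2) ...
      - I(3:R+2, 1:C) - 2 * I(3:R+2, 2:C+1) - I(3:R+2, 3:C+2);

% Left column of each 3-by-3 window minus its right column (vDiff in the listing).
vDiff = I(1:R, 1:C) + 2 * I(2:R+1, 1:C) + I(3:R+2, 1:C) ...
      - I(1:R, 3:C+2) - 2 * I(2:R+1, 3:C+2) - I(3:R+2, 3:C+2);

% Same threshold on the squared gradient magnitude as the entry-point function.
edges = zeros(R, C, 'like', img);
edges(hDiff.^2 + vDiff.^2 > 3600) = 255;

disp(edges)
```

For this input, the two left output columns are 255 and the right output column is 0, because only the windows that straddle the step between columns 2 and 3 see a nonzero difference.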
Generate GPU Code and Run gpuPerformanceAnalyzer on CPU
Use coder.gpuConfig to create a GPU code configuration object and use the codegen command to generate a MEX function.
cfg = coder.gpuConfig('mex');
imRGB = imread('peppers.png');
imGray = rgb2gray(imRGB);
codegen -config cfg -args {imGray} sobelEdgeDetection
Code generation successful.
gpuPerformanceAnalyzer('sobelEdgeDetection', {imGray}, Config=cfg, OutFolder='sobleEdgeWithCPUIO');
### Starting GPU code generation
Code generation successful: View report
### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
By default, GPU Coder expects the inputs from the CPU and produces the output on the CPU. It copies the data from CPU to GPU before running computation on GPU and copies the results back to CPU.
The GPU Performance Analyzer report shows that the memory copies take most of the time.
Generate GPU Code and Run gpuPerformanceAnalyzer on GPU
The Sobel edge detection algorithm uses its input directly on the GPU to compute the edges and produces the final result on the GPU. If the caller supplies the inputs on the GPU and consumes the outputs there, the entry-point function does not require any memory copies between the CPU and the GPU.
Pass the inputs to the GPU by using the gpuArray function. When you pass inputs on the GPU, GPU Coder also produces the outputs on the GPU, provided that the output types are supported on the GPU.
imGrayGpu = gpuArray(imGray);
codegen -config cfg -args {imGrayGpu} sobelEdgeDetection
Code generation successful.
You can also use coder.typeof to represent inputs on the GPU.
inputImg = coder.typeof(imGray, 'Gpu', true);
codegen -config cfg -args {inputImg} sobelEdgeDetection
Code generation successful.
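As a quick check of the MEX function generated for GPU inputs, the following sketch calls it with a gpuArray and compares the result with a MATLAB execution of the same function. It assumes that the MEX function has the default name sobelEdgeDetection_mex, that sobelEdgeDetection.m is on the path, and that the uint8 output is a type GPU Coder returns on the GPU.

```matlab
% Assumes sobelEdgeDetection_mex was generated above with a GPU input type.
imGray = rgb2gray(imread('peppers.png'));
imGrayGpu = gpuArray(imGray);

edgesGpu = sobelEdgeDetection_mex(imGrayGpu);   % result is expected to stay on the GPU
edgesRef = sobelEdgeDetection(imGray);          % MATLAB execution with a CPU input, for reference

% gather copies the GPU result to the CPU only for the comparison.
fprintf('Output is a gpuArray: %d\n', isa(edgesGpu, 'gpuArray'));
fprintf('Matches MATLAB result: %d\n', isequal(gather(edgesGpu), edgesRef));
```

In an integrated system you would skip the gather call and hand edgesGpu directly to the next GPU stage, which is where the saved copies come from.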
Run gpuPerformanceAnalyzer with inputs and outputs on the GPU.
gpuPerformanceAnalyzer('sobelEdgeDetection', {imGrayGpu}, Config=cfg, OutFolder='sobleEdgeWithGPUIO');
### Starting GPU code generation
Code generation successful: View report
### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
With the inputs and outputs on the GPU, there are no GPU memory copies in the entry-point function.
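Outside the GPU Performance Analyzer, one way to see the effect of the removed copies is to time the two variants side by side. The following is a minimal sketch, not part of the original example. The output names sobelCpuIO_mex and sobelGpuIO_mex are hypothetical and are set with the codegen -o option so that the second build does not overwrite the first. The measured times depend on the GPU, the driver, and the image size.

```matlab
cfg = coder.gpuConfig('mex');
imGray = rgb2gray(imread('peppers.png'));
imGrayGpu = gpuArray(imGray);

% Hypothetical MEX names, chosen so that both variants exist at the same time.
codegen -config cfg -args {imGray} sobelEdgeDetection -o sobelCpuIO_mex
codegen -config cfg -args {imGrayGpu} sobelEdgeDetection -o sobelGpuIO_mex

% gputimeit waits for outstanding GPU work to finish before it stops the clock.
tCpuIO = gputimeit(@() sobelCpuIO_mex(imGray));
tGpuIO = gputimeit(@() sobelGpuIO_mex(imGrayGpu));

fprintf('CPU inputs and outputs: %.3f ms\n', 1e3 * tCpuIO);
fprintf('GPU inputs and outputs: %.3f ms\n', 1e3 * tGpuIO);
```

The comparison is meaningful only when the surrounding system already holds the data on the GPU. If the caller has to create the gpuArray from CPU data for each call, the copy moves into the caller instead of disappearing.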