GPU Coder™ supports code generation for series and directed acyclic graph (DAG)
convolutional neural networks (CNNs or ConvNets). You can generate code for any trained
convolutional neural network whose layers are supported for code generation. See Supported Layers.
You can train a convolutional neural network on a CPU, a GPU, or multiple GPUs by using
Deep Learning Toolbox™, or use one of the pretrained networks listed in the table, and then
generate CUDA® code.
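As a minimal sketch of that workflow, the following entry-point function loads a pretrained network and runs prediction. The function name `googlenet_predict` is hypothetical, GoogLeNet is picked only because it appears in the table, and the example assumes the matching pretrained-model support package is installed.

```matlab
function out = googlenet_predict(in) %#codegen
% Hypothetical entry-point function; save it as googlenet_predict.m.
% The network is loaded once and reused across calls.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('googlenet');
end
out = predict(net, in);
end
```

Code generation could then be configured roughly as follows, here targeting cuDNN with a MEX build. The 224-by-224-by-3 input size matches the GoogLeNet image input layer; adjust it for other networks.

```matlab
% Configure a GPU MEX build that uses the cuDNN library.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% Generate CUDA code for the hypothetical entry-point function.
codegen -config cfg googlenet_predict -args {ones(224,224,3,'single')} -report
```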

| Network Name | Description | cuDNN | TensorRT | ARM® Compute Library for Mali GPU |
|---|---|---|---|---|
| AlexNet | AlexNet convolutional neural network. For the pretrained AlexNet model, see The syntax | Yes | Yes | Yes |
| Caffe Network | Convolutional neural network models from Caffe. For importing a pretrained network from Caffe, see | Yes | Yes | Yes |
| Darknet-19 | Darknet-19 convolutional neural network. For more information, see The syntax | Yes | Yes | Yes |
| Darknet-53 | Darknet-53 convolutional neural network. For more information, see The syntax | Yes | Yes | Yes |
| DeepLab v3+ | DeepLab v3+ convolutional neural network. For more information, see | Yes | Yes | No |
| DenseNet-201 | DenseNet-201 convolutional neural network. For the pretrained DenseNet-201 model, see The syntax | Yes | Yes | Yes |
| GoogLeNet | GoogLeNet convolutional neural network. For the pretrained GoogLeNet model, see The syntax | Yes | Yes | Yes |
| Inception-v3 | Inception-v3 convolutional neural network. For the pretrained Inception-v3 model, see The syntax | Yes | Yes | Yes |
| Inception-ResNet-v2 | Inception-ResNet-v2 convolutional neural network. For the pretrained Inception-ResNet-v2 model, see | Yes | Yes | No |
| MobileNet-v2 | MobileNet-v2 convolutional neural network. For the pretrained MobileNet-v2 model, see The syntax | Yes | Yes | Yes |
| NASNet-Large | NASNet-Large convolutional neural network. For the pretrained NASNet-Large model, see | Yes | Yes | No |
| NASNet-Mobile | NASNet-Mobile convolutional neural network. For the pretrained NASNet-Mobile model, see | Yes | Yes | No |
| ResNet | ResNet-18, ResNet-50, and ResNet-101 convolutional neural networks. For the pretrained ResNet models, see The syntax | Yes | Yes | Yes |
| SegNet | Multi-class pixelwise segmentation network. For more information, see | Yes | Yes | No |
| SqueezeNet | Small deep neural network. For the pretrained SqueezeNet models, see The syntax | Yes | Yes | Yes |
| VGG-16 | VGG-16 convolutional neural network. For the pretrained VGG-16 model, see The syntax | Yes | Yes | Yes |
| VGG-19 | VGG-19 convolutional neural network. For the pretrained VGG-19 model, see The syntax | Yes | Yes | Yes |
| Xception | Xception convolutional neural network. For the pretrained Xception model, see The syntax | Yes | Yes | Yes |
| YOLO v2 | You only look once version 2 convolutional neural network based object detector. For more information, see | Yes | Yes | Yes |

The following layers are supported for code generation by GPU Coder for the target deep learning libraries specified in the table.

Once you install the support package GPU Coder Interface for Deep Learning Libraries, you can use `coder.getDeepLearningLayers` to see a list of the layers supported for a specific deep learning library. For example, `coder.getDeepLearningLayers('cudnn')` shows the list of layers supported for code generation by using the NVIDIA® cuDNN library.
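A short sketch of querying the supported layers from the MATLAB command line is shown here. The first call uses the form given in the text; the `'tensorrt'` identifier in the second call is an assumption made by analogy with the library names in the tables, so check it against your release.

```matlab
% Requires the GPU Coder Interface for Deep Learning Libraries support package.

% List the layers supported for code generation with the cuDNN library.
coder.getDeepLearningLayers('cudnn')

% Assumption: the TensorRT library is selected with the 'tensorrt' identifier.
coder.getDeepLearningLayers('tensorrt')
```

Comparing the two lists is a quick way to confirm, before training, that every layer in a planned network is available for the library you intend to target.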

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | An image input layer inputs 2-D images to a network and applies data normalization. Code generation does not support | Yes | Yes | Yes |
| | Deep Learning Toolbox | A sequence input layer inputs sequence data to a network. For code generation, only vector input types are supported. 2-D and 3-D image sequence input is not supported. Code generation does not support | Yes | Yes | No |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | A 2-D convolutional layer applies sliding convolutional filters to the input. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution. Code generation for the ARM Mali GPU is not supported for a 2-D grouped convolution layer that has the | Yes | Yes | Yes |
| | Deep Learning Toolbox | A transposed 2-D convolution layer upsamples feature maps. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A fully connected layer multiplies the input by a weight matrix and then adds a bias vector. | Yes | Yes | No |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | A sequence input layer inputs sequence data to a network. For code generation, only vector input types are supported. 2-D and 3-D image sequence input is not supported. Code generation does not support | Yes | Yes | No |
| | Deep Learning Toolbox | An LSTM layer learns long-term dependencies between time steps in time series and sequence data. For code generation, the For code generation, the | Yes | Yes | No |
| | Deep Learning Toolbox | A bidirectional LSTM (BiLSTM) layer learns bidirectional long-term dependencies between time steps of time series or sequence data. These dependencies can be useful when you want the network to learn from the complete time series at each time step. For code generation, the For code generation, the | Yes | Yes | No |
| | Deep Learning Toolbox | A flatten layer collapses the spatial dimensions of the input into the channel dimension. | Yes | No | No |
| | Text Analytics Toolbox™ | A word embedding layer maps word indices to vectors. | Yes | Yes | No |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A leaky ReLU layer performs a threshold operation, where any input value less than zero is multiplied by a fixed scalar. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A clipped ReLU layer performs a threshold operation, where any input value less than zero is set to zero and any value above the | Yes | Yes | Yes |
| | Deep Learning Toolbox | An ELU activation layer performs the identity operation on positive inputs and an exponential nonlinearity on negative inputs. | Yes | Yes | No |
| | Deep Learning Toolbox | A hyperbolic tangent (tanh) activation layer applies the tanh function on the layer inputs. | Yes | Yes | Yes |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | A batch normalization layer normalizes each input channel across a mini-batch. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A dropout layer randomly sets input elements to zero with a given probability. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A 2-D crop layer applies 2-D cropping to the input. | Yes | Yes | Yes |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | An average pooling layer performs down-sampling by dividing the input into rectangular pooling regions and computing the average value of each region. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A global average pooling layer performs down-sampling by computing the mean of the height and width dimensions of the input. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A max pooling layer performs down-sampling by dividing the input into rectangular pooling regions and computing the maximum of each region. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A global max pooling layer performs down-sampling by computing the maximum of the height and width dimensions of the input. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A max unpooling layer unpools the output of a max pooling layer. | Yes | Yes | No |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | An addition layer adds inputs from multiple neural network layers element-wise. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension). | Yes | Yes | Yes |
| | Deep Learning Toolbox | A concatenation layer takes inputs and concatenates them along a specified dimension. | Yes | Yes | No |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Computer Vision Toolbox™ | An anchor box layer stores anchor boxes for a feature map used in object detection networks. | Yes | Yes | Yes |
| | Computer Vision Toolbox | An SSD merge layer merges the outputs of feature maps for subsequent regression and classification loss computation. | Yes | Yes | No |
| | Computer Vision Toolbox | Create output layer for YOLO v2 object detection network. | Yes | Yes | Yes |
| | Computer Vision Toolbox | Create reorganization layer for YOLO v2 object detection network. | Yes | Yes | Yes |
| | Computer Vision Toolbox | Create transform layer for YOLO v2 object detection network. | Yes | Yes | Yes |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | A softmax layer applies a softmax function to the input. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A classification layer computes the cross entropy loss for multi-class classification problems with mutually exclusive classes. | Yes | Yes | Yes |
| | Deep Learning Toolbox | A regression layer computes the half-mean-squared-error loss for regression problems. | Yes | Yes | Yes |
| | Computer Vision Toolbox | A pixel classification layer provides a categorical label for each image pixel or voxel. | Yes | Yes | Yes |
| | Computer Vision Toolbox | A Dice pixel classification layer provides a categorical label for each image pixel or voxel using generalized Dice loss. | Yes | Yes | Yes |
| | Deep Learning Toolbox | All output layers including custom classification or regression output layers created by using For an example showing how to define a custom classification output layer and specify a loss function, see Define Custom Classification Output Layer (Deep Learning Toolbox). For an example showing how to define a custom regression output layer and specify a loss function, see Define Custom Regression Output Layer (Deep Learning Toolbox). | Yes | Yes | Yes |

| Layer Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| | Deep Learning Toolbox | Flatten activations into 1-D assuming C-style (row-major) order. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Global average pooling layer for spatial data. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Sigmoid activation layer. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Hyperbolic tangent activation layer. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Zero padding layer for 2-D input. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Layer that performs element-wise scaling of the input followed by an addition. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Flattens the spatial dimensions of the input tensor to the channel dimensions. | Yes | Yes | Yes |
| | Deep Learning Toolbox | Layer that implements ONNX identity operator. | Yes | Yes | Yes |

The following classes are supported for code generation by GPU Coder for the target deep learning libraries specified in the table.

| Name | Product | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
|---|---|---|---|---|---|
| `yolov2ObjectDetector` | Computer Vision Toolbox | Detect objects using YOLO v2 object detector. Only the `detect` method of the `yolov2ObjectDetector` is supported for code generation. The `roi` argument to the `detect` method must be a codegen constant (`coder.const()`) and a 1-by-4 vector. Only the `Threshold`, `SelectStrongest`, `MinSize`, `MaxSize`, and `MiniBatchSize` Name-Value pairs are supported. The height, width, channel, and batch size of the input image must be fixed size. The minimum batch size value passed to the `detect` method must be fixed size. The labels output is returned as a cell array of character vectors, such as {'car','bus'}. | Yes | Yes | Yes |
| `ssdObjectDetector` | Computer Vision Toolbox | Object to detect objects using the SSD-based detector. Only the `detect` method of the `ssdObjectDetector` is supported for code generation. The `roi` argument to the `detect` method must be a codegen constant (`coder.const()`) and a 1-by-4 vector. Only the `Threshold`, `SelectStrongest`, `MinSize`, `MaxSize`, and `MiniBatchSize` Name-Value pairs are supported. All Name-Value pairs must be compile-time constants. The channel and batch size of the input image must be fixed size. The `labels` output is returned as a categorical array. In the generated code, the input is rescaled to the size of the input layer of the network, but the bounding box that the `detect` method returns is in reference to the original input size. The bounding boxes might not numerically match the simulation results. | Yes | Yes | No |
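The constraints on `detect` can be illustrated with a small entry-point sketch. The function name `detectObjects` and the MAT-file name `myYolov2Detector.mat` are hypothetical; the file is assumed to contain a trained `yolov2ObjectDetector` object, and only a supported Name-Value pair with a constant value is passed.

```matlab
function [bboxes, scores, labels] = detectObjects(in) %#codegen
% Hypothetical entry-point function; save it as detectObjects.m.
% 'in' must be a fixed-size image (height, width, channels, batch size).
persistent detector;
if isempty(detector)
    % Hypothetical MAT-file holding a trained yolov2ObjectDetector.
    detector = coder.loadDeepLearningNetwork('myYolov2Detector.mat');
end
% Only the detect method is supported; Threshold is one of the supported
% Name-Value pairs and is given here as a compile-time constant.
[bboxes, scores, labels] = detect(detector, in, 'Threshold', 0.5);
end
```

For the YOLO v2 detector, `labels` comes back as a cell array of character vectors, so downstream code written against a categorical array in simulation may need a small adjustment.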

`codegen` | `coder.CodeConfig` | `coder.CuDNNConfig` | `coder.DeepLearningConfig` | `coder.EmbeddedCodeConfig` | `coder.getDeepLearningLayers` | `coder.gpuConfig` | `coder.gpuEnvConfig`

- Pretrained Deep Neural Networks (Deep Learning Toolbox)
- Get Started with Transfer Learning (Deep Learning Toolbox)
- Create Simple Deep Learning Network for Classification (Deep Learning Toolbox)
- Load Pretrained Networks for Code Generation
- Code Generation for Deep Learning Networks by Using cuDNN
- Code Generation for Deep Learning Networks by Using TensorRT
- Code Generation for Deep Learning Networks Targeting ARM Mali GPUs