Considerations for Supported Layers for Quantization

Layers that are supported for quantization have some limitations. When you encounter these limitations, first consider using the prepareNetwork function to automatically resolve some network architectures that cause errors in the quantization workflow. If you continue to encounter errors, refer to the limitations on this page for information about specific network architectures, layers, and execution environments. For a list of supported layers for each execution environment, see Supported Layers for Quantization.

Benefits of Network Preparation

The prepareNetwork function or the Prepare network for quantization option in the Deep Network Quantizer helps to automatically resolve some network architectures that cause issues in the quantization workflow. Key transformations that the prepareNetwork function performs include:

Conversion to dlnetwork object — Network preparation converts your DAGNetwork or SeriesNetwork object to a dlnetwork object. Your network must be a dlnetwork to quantize using the "MATLAB" execution environment.
Batch normalization fusion — If a batchNormalizationLayer layer follows a convolution1dLayer, convolution2dLayer, groupedConvolution2dLayer, or fullyConnectedLayer layer, the batch normalization layer is fused to the convolutional or fully connected layer to support quantization. For other network architectures, the prepareNetwork function replaces the batchNormalizationLayer layer with a convolutional or fully connected layer. Batch normalization layers that are not fused or replaced are not supported for quantization.
Dropout layer removal — The dropoutLayer layer is not supported for quantization for any execution environment. Network preparation removes this layer.
Multiplication layer restructuring — For improved performance, any one multiplicationLayer layer should have at most two inputs. For the "MATLAB" and "FPGA" execution environments, network preparation restructures the network to contain multiple multiplicationLayer blocks each with no more than two inputs.

Network Architecture and Properties

The network you quantize must meet specific criteria with regard to the overall architecture or properties.

Output layers must be graphically at the end of the network by having unconnected outputs. For more information, see the OutputNames property for the dlnetwork object.
A dlnetwork object must have at least one input layer.
A dlnetwork object must be initialized.
For the MATLAB execution environment, your network must be a dlnetwork object.
Your network must not contain a custom layer containing a learnable dlnetwork object. For more information on this type of nested layer, see Define Nested Deep Learning Layer Using Network Composition.

Conditionally Quantized Layers

When a layer does not have any learnable parameters, quantization of the layer means that the software uses fixed-point data to perform the computations of the layer. In the MATLAB execution environment, many layers that can be quantized but do not have any learnable parameters are quantized on the condition the input data to the layer is fixed-point.

These layers are conditionally quantized in the MATLAB execution environment:

Activation layers:
Combination layers:
- additionLayer
- multiplicationLayer
Pooling layers:
Sequence layers:
- flattenLayer

For layers with multiple inputs, like additionLayer or multiplicationLayer, all inputs must be fixed-point in order for the layer to perform the layer with fixed-point computations.

Execution Environment Specific Limitations

Some limitations only apply to specific execution environments.

MATLAB Execution Environment

To use the MATLAB execution environment for calibration, quantization, and validation, your network must be a dlnetwork object. Use the prepareNetwork function or the Network Preparation step in the Deep Network Quantizer app to convert your network to a dlnetwork.

This table outlines limitations in the MATLAB execution environment for specific layers in the quantization workflow. For additional limitations that may affect code generation, see List of Deep Learning Layer Blocks and Subsystems.

Layer Names	Limitations in MATLAB Execution Environment
`averagePooling1dLayer` and `averagePooling2dLayer`	The `PaddingValue` value must be `0`.
`batchNormalizationLayer`	If the `batchNormalizationLayer` layer follows a `convolution1dLayer`, `convolution2dLayer`, `groupedConvolution2dLayer`, or `fullyConnectedLayer` layer, fuse the `batchNormalizationLayer` layer to the convolutional or fully connected layer using the `prepareNetwork` function to support quantization. For other network architectures, the `prepareNetwork` function replaces the `batchNormalizationLayer` layer with a convolutional or fully connected layer. Batch normalization layers that are not fused or replaced are not supported for quantization.
`lstmLayer` and `lstmProjectedLayer`	The `StateActivationFunction` value must be `"tanh"` or `"softsign"`. The `GateActivationFunction` value must be `"sigmoid"` or `"hard-sigmoid"`. The `HasStateInputs` and `HasStateOutputs` values must be `0 (false)`.
Rescale-Symmetric 1D, Rescale-Symmetric 2D, Rescale-Zero-One 1D, and Rescale-Zero-One 2D	The `Output minimum` and `Output maximum` values must not be equal.
Zerocenter 1D, Zerocenter 2D, Zscore 1D, and Zscore 2D	The `Mean` value of the corresponding `imageInputLayer`, `featureInputLayer`, or `sequenceInputLayer` must be nonzero.

GPU Execution Environment

This table outlines limitations in the GPU execution environment for specific layers in the quantization workflow. For additional limitations that affect code generation, see Supported Networks, Layers, and Classes (GPU Coder).

Layer Names	Limitations in GPU Execution Environment
`additionLayer`	This layer is only quantized if it is in one of these architectures and the involved convolution layer only feeds into quantized layers: Input layer is `convolution2dLayer` or `groupedConvolution2dLayer`, and output layer is `reluLayer`. The same architecture as above but with a `batchNormalizationLayer` layer between the convolutional layer and the `additionLayer` layer.
`batchNormalizationLayer`	If the `batchNormalizationLayer` layer follows a `convolution2dLayer` or `groupedConvolution2dLayer` layer, then the `batchNormalizationLayer` layer automatically fuses to the convolutional or fully connected layer to support quantization. If the `batchNormalizationLayer` layer follows a `fullyConnectedLayer` layer, then fuse the `batchNormalizationLayer` layer to the fully connected layer using the `prepareNetwork` function to support quantization. For other network architectures, the `prepareNetwork` function replaces the `batchNormalizationLayer` layer with a convolutional or fully connected layer. Batch normalization layers that are not fused or replaced are not supported for quantization.
`maxPooling2dLayer`	The `NumOutputs` value must be 1.
`reluLayer`	This layer is only quantized if it is in one of these architectures and the involved convolution layer only feeds into quantized layers: Input layer is `convolution2dLayer` or `groupedConvolution2dLayer`. The same architecture as above but with a `batchNormalizationLayer` layer between the convolutional layer and the `reluLayer` layer. Any of the architectures listed in the limitations for `additionLayer`.

FPGA Execution Environment

This table outlines limitations in the FPGA execution environment for specific layers in the quantization workflow. For additional limitations that affect code generation, see Supported Networks, Boards, and Tools (Deep Learning HDL Toolbox).

Layer Names Limitations in FPGA Execution Environment

Layer Names	Limitations in FPGA Execution Environment
`batchNormalizationLayer`	If the `batchNormalizationLayer` layer follows a `convolution2dLayer` or `groupedConvolution2dLayer` layer, then the `batchNormalizationLayer` layer automatically fuses to the convolutional or fully connected layer to support quantization. If the `batchNormalizationLayer` follows a `convolution1dLayer` or `fullyConnectedLayer` layer, fuse the `batchNormalizationLayer` layer to the convolutional or fully connected layer using the `prepareNetwork` function to support quantization. For other network architectures, the `prepareNetwork` function replaces the `batchNormalizationLayer` layer with a convolutional or fully connected layer. Batch normalization layers that are not fused or replaced are not supported for quantization.
`convolution1dLayer`	Only supported in a `dlnetwork` object.

batchNormalizationLayer

If the batchNormalizationLayer layer follows a convolution2dLayer or groupedConvolution2dLayer layer, then the batchNormalizationLayer layer automatically fuses to the convolutional or fully connected layer to support quantization.

If the batchNormalizationLayer follows a convolution1dLayer or fullyConnectedLayer layer, fuse the batchNormalizationLayer layer to the convolutional or fully connected layer using the prepareNetwork function to support quantization.

For other network architectures, the prepareNetwork function replaces the batchNormalizationLayer layer with a convolutional or fully connected layer.

Batch normalization layers that are not fused or replaced are not supported for quantization.

convolution1dLayer

Only supported in a dlnetwork object.