Main Content

Custom Deep Learning Processor Generation to Meet Performance Requirements

This example shows how to create a custom processor configuration and estimate the performance of a pretrained series network. You can then modify parameters of the custom processor configuration and re-estimate the performance. Once you have achieved your performance requirements you can generate a custom bitstream by using the custom processor configuration.

Prerequisites

  • Deep Learning HDL Toolbox™Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox Model Quantization Library

  • MATLAB Coder Interface for Deep Learning Libraries

Load Pretrained Series Network

To load the pretrained series network LogoNet, enter:

snet = getLogoNetwork;

Create Custom Processor Configuration

To create a custom processor configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.

hPC = dlhdl.ProcessorConfig;
hPC.TargetFrequency = 220;
hPC
hPC = 
                    Processing Module "conv"
                            ModuleGeneration: 'on'
                          LRNBlockGeneration: 'on'
                            ConvThreadNumber: 16
                             InputMemorySize: [227 227 3]
                            OutputMemorySize: [227 227 3]
                            FeatureSizeLimit: 2048

                      Processing Module "fc"
                            ModuleGeneration: 'on'
                      SoftmaxBlockGeneration: 'off'
                              FCThreadNumber: 4
                             InputMemorySize: 25088
                            OutputMemorySize: 4096

                   Processing Module "adder"
                            ModuleGeneration: 'on'
                             InputMemorySize: 40
                            OutputMemorySize: 40

              Processor Top Level Properties
                              RunTimeControl: 'register'
                          InputDataInterface: 'External Memory'
                         OutputDataInterface: 'External Memory'
                           ProcessorDataType: 'single'

                     System Level Properties
                              TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                             TargetFrequency: 220
                               SynthesisTool: 'Xilinx Vivado'
                             ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
                     SynthesisToolChipFamily: 'Zynq UltraScale+'
                     SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
                    SynthesisToolPackageName: ''
                     SynthesisToolSpeedValue: ''

Estimate LogoNet Performance

To estimate the performance of the LogoNet series network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).

hPC.estimatePerformance(snet)
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   39926006                  0.18148                       1           39926006              5.5
    ____conv_1             6825671                  0.03103 
    ____maxpool_1          3755088                  0.01707 
    ____conv_2            10440701                  0.04746 
    ____maxpool_2          1447840                  0.00658 
    ____conv_3             9405685                  0.04275 
    ____maxpool_3          1765856                  0.00803 
    ____conv_4             1819636                  0.00827 
    ____maxpool_4            28098                  0.00013 
    ____fc_1               2651288                  0.01205 
    ____fc_2               1696632                  0.00771 
    ____fc_3                 89511                  0.00041 
 * The clock frequency of the DL processor is: 220MHz

The estimated frames per second is 5.5 Frames/s. To improve the network performance, modify the custom processor convolution module kernel data type, convolution processor thread number, fully connected module kernel data type, and fully connected module thread number. For more information about these processor parameters, see getModuleProperty and setModuleProperty.

Create Modified Custom Processor Configuration

To create a custom processor configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.

hPCNew = dlhdl.ProcessorConfig;
hPCNew.TargetFrequency = 300;
hPCNew.ProcessorDataType = 'int8';
hPCNew.setModuleProperty('conv', 'ConvThreadNumber', 64);
hPCNew.setModuleProperty('fc', 'FCThreadNumber',   16);
hPCNew
hPCNew = 
                    Processing Module "conv"
                            ModuleGeneration: 'on'
                          LRNBlockGeneration: 'on'
                            ConvThreadNumber: 64
                             InputMemorySize: [227 227 3]
                            OutputMemorySize: [227 227 3]
                            FeatureSizeLimit: 2048

                      Processing Module "fc"
                            ModuleGeneration: 'on'
                      SoftmaxBlockGeneration: 'off'
                              FCThreadNumber: 16
                             InputMemorySize: 25088
                            OutputMemorySize: 4096

                   Processing Module "adder"
                            ModuleGeneration: 'on'
                             InputMemorySize: 40
                            OutputMemorySize: 40

              Processor Top Level Properties
                              RunTimeControl: 'register'
                          InputDataInterface: 'External Memory'
                         OutputDataInterface: 'External Memory'
                           ProcessorDataType: 'int8'

                     System Level Properties
                              TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                             TargetFrequency: 300
                               SynthesisTool: 'Xilinx Vivado'
                             ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
                     SynthesisToolChipFamily: 'Zynq UltraScale+'
                     SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
                    SynthesisToolPackageName: ''
                     SynthesisToolSpeedValue: ''

Quantize LogoNet Series Network

To quantize the LogoNet network, enter:

dlquantObj = dlquantizer(snet,'ExecutionEnvironment','FPGA');
Image = imageDatastore('heineken.png','Labels','Heineken');
dlquantObj.calibrate(Image);

Estimate LogoNet Performance

To estimate the performance of the LogoNet series network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).

hPCNew.estimatePerformance(dlquantObj)
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   14306694                  0.04769                       1           14306694             21.0
    ____conv_1             3477191                  0.01159 
    ____maxpool_1          1876680                  0.00626 
    ____conv_2             2932291                  0.00977 
    ____maxpool_2           723536                  0.00241 
    ____conv_3             2611391                  0.00870 
    ____maxpool_3           882544                  0.00294 
    ____conv_4              641788                  0.00214 
    ____maxpool_4            14025                  0.00005 
    ____fc_1                665265                  0.00222 
    ____fc_2                425425                  0.00142 
    ____fc_3                 56558                  0.00019 
 * The clock frequency of the DL processor is: 300MHz

The estimated frames per second is 21.2 Frames/s.

Generate Custom Processor and Bitstream

Use the new custom processor configuration to build and generate a custom processor and bitstream. Use the custom bitstream to deploy the LogoNet network to your target FPGA board.

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2019.2\bin\vivado.bat');
% dlhdl.buildProcessor(hPCNew);

To learn how to use the generated bitstream file, see Generate Custom Bitstream.

The generated bitstream in this example is similar to the zcu102_int8 bitstream. To deploy the quantized LogoNet network using the zcu102_int8 bitstream, see Obtain Prediction Results for Quantized LogoNet Network.