estimatePerformance

Class: dlhdl.ProcessorConfig
Package: dlhdl

Retrieve layer-level latencies and performance by using the estimatePerformance method

Description

estimatePerformance(network) returns the layer-level latencies and network performance for the object specified by the network argument.

performance = estimatePerformance(network) returns a table containing the network object layer-level latencies and performance.

performance = estimatePerformance(network,Name,Value) returns a table containing the network object layer-level latencies and performance, with additional options specified by one or more name-value pair arguments.

Input Arguments

network: Network object for the performance estimate.

Example: estimatePerformance(snet)

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

FrameCount: Number of frames to consider when calculating the performance estimate, specified as a positive integer.

Example: 'FrameCount',10

Output Arguments

performance: Network object performance for the ProcessorConfig object, returned as a table.
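
As a brief sketch of the output syntax, the call below captures the returned table in a variable instead of only printing the report. It reuses the ResNet-18 network and the FrameCount option from the examples on this page, and it assumes that Deep Learning HDL Toolbox and the ResNet-18 support package are installed.

```matlab
net = resnet18;                  % pretrained network, as used in the examples
hPC = dlhdl.ProcessorConfig;     % processor configuration with default settings

% Capture the estimate in a variable; the report is still printed.
performance = hPC.estimatePerformance(net,'FrameCount',10);

% Show the class of the returned value.
disp(class(performance))
```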

Examples

  1. Create a file in your current working folder called getLogoNetwork.m. In the file, enter:

    function net = getLogoNetwork
        if ~isfile('LogoNet.mat')
            url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
            websave('LogoNet.mat',url);
        end
        data = load('LogoNet.mat');
        net  = data.convnet;
    end
  2. Load the network and create a dlhdl.ProcessorConfig object.

    snet = getLogoNetwork;
    hPC = dlhdl.ProcessorConfig;
  3. To retrieve the layer-level latencies and performance for the LogoNet network, call the estimatePerformance method.

    hPC.estimatePerformance(snet)
    ### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
    ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
    
    
                  Deep Learning Processor Estimator Performance Results
    
                       LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                             -------------             -------------              ---------        ---------       ---------
    Network                   39426458                  0.19713                       1           39426458              5.1
        ____conv_1             6822215                  0.03411 
        ____maxpool_1          3755088                  0.01878 
        ____conv_2            10440701                  0.05220 
        ____maxpool_2          1447840                  0.00724 
        ____conv_3             9362677                  0.04681 
        ____maxpool_3          1765856                  0.00883 
        ____conv_4             1377268                  0.00689 
        ____maxpool_4            28098                  0.00014 
        ____fc_1               2644886                  0.01322 
        ____fc_2               1692534                  0.00846 
        ____fc_3                 89295                  0.00045 
     * The clock frequency of the DL processor is: 200MHz
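
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below is not part of the toolbox: the variable names are illustrative, and the numbers are copied from the printout above. It sums the per-layer cycle counts, converts the total to seconds at the reported 200 MHz clock, and derives the frame rate for a single frame.

```matlab
% Per-layer latencies in cycles, copied from the LogoNet report.
layerCycles = [6822215 3755088 10440701 1447840 9362677 1765856 ...
               1377268 28098 2644886 1692534 89295];
clockHz = 200e6;                          % reported DL processor clock frequency

totalCycles    = sum(layerCycles);        % cycles needed for one frame
latencySeconds = totalCycles / clockHz;   % cycles divided by cycles per second
framesPerSec   = 1 / latencySeconds;      % single-frame throughput

fprintf('Total cycles: %d\n', totalCycles);
fprintf('Latency (s):  %.5f\n', latencySeconds);
fprintf('Frames/s:     %.1f\n', framesPerSec);
```

For this report, the layer latencies add up to the value in the Network row (39426458 cycles), which corresponds to about 0.19713 seconds and 5.1 frames per second.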

Estimate the performance of the ResNet-18 network for multiple frames by using the dlhdl.ProcessorConfig object.

Load the ResNet-18 network and save it to net.

net = resnet18;

Create a dlhdl.ProcessorConfig object and save it to hPC.

hPC = dlhdl.ProcessorConfig;

Retrieve layer-level latencies and performance in frames per second (FPS) for multiple frames by using the estimatePerformance method with FrameCount as an optional input argument.

hPC.estimatePerformance(net,'FrameCount',10);
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   21219873                  0.10610                      10          210125220              9.5
    ____conv1              2165372                  0.01083 
    ____pool1               646226                  0.00323 
    ____res2a_branch2a      966221                  0.00483 
    ____res2a_branch2b      966221                  0.00483 
    ____res2a               210750                  0.00105 
    ____res2b_branch2a      966221                  0.00483 
    ____res2b_branch2b      966221                  0.00483 
    ____res2b               210750                  0.00105 
    ____res3a_branch1       540749                  0.00270 
    ____res3a_branch2a      708564                  0.00354 
    ____res3a_branch2b      919117                  0.00460 
    ____res3a               105404                  0.00053 
    ____res3b_branch2a      919117                  0.00460 
    ____res3b_branch2b      919117                  0.00460 
    ____res3b               105404                  0.00053 
    ____res4a_branch1       509261                  0.00255 
    ____res4a_branch2a      509261                  0.00255 
    ____res4a_branch2b      905421                  0.00453 
    ____res4a                52724                  0.00026 
    ____res4b_branch2a      905421                  0.00453 
    ____res4b_branch2b      905421                  0.00453 
    ____res4b                52724                  0.00026 
    ____res5a_branch1       751693                  0.00376 
    ____res5a_branch2a      751693                  0.00376 
    ____res5a_branch2b     1415373                  0.00708 
    ____res5a                26368                  0.00013 
    ____res5b_branch2a     1415373                  0.00708 
    ____res5b_branch2b     1415373                  0.00708 
    ____res5b                26368                  0.00013 
    ____pool5                54594                  0.00027 
    ____fc1000              207351                  0.00104 
 * The clock frequency of the DL processor is: 200MHz
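
The multi-frame figures can be checked the same way. The sketch below uses illustrative variable names and values copied from the Network row of the report above. It is a reading of the printed numbers under the assumption that Frames/s is the frame count divided by the total latency in seconds, not a statement of how the estimator computes its result.

```matlab
clockHz            = 200e6;       % reported DL processor clock frequency
lastFrameCycles    = 21219873;    % LastFrameLatency(cycles), Network row
totalLatencyCycles = 210125220;   % Total Latency for all frames
frameCount         = 10;          % FramesNum, set through 'FrameCount'

totalSeconds      = totalLatencyCycles / clockHz;      % time for all frames
framesPerSec      = frameCount / totalSeconds;         % multi-frame throughput
avgCyclesPerFrame = totalLatencyCycles / frameCount;   % average cost per frame

fprintf('Total time (s):       %.4f\n', totalSeconds);
fprintf('Frames/s:             %.1f\n', framesPerSec);
fprintf('Average cycles/frame: %d\n', avgCyclesPerFrame);
fprintf('Last-frame cycles:    %d\n', lastFrameCycles);
```

Under that assumption the result rounds to the reported 9.5 frames per second. The average cost per frame (21012522 cycles) is slightly lower than the last-frame latency (21219873 cycles), so the reported rate matches the total-latency calculation rather than the reciprocal of the last-frame latency.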

Tips

To obtain the performance estimate for a dlquantizer object, set the ProcessorDataType property of the dlhdl.ProcessorConfig object to int8.
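
A minimal sketch of this tip follows. The function name estimateInt8Performance is hypothetical, and dlQuantObj is assumed to be a dlquantizer object that has already been created and prepared elsewhere. Only the ProcessorDataType setting and the estimatePerformance call come from this page.

```matlab
function performance = estimateInt8Performance(dlQuantObj)
% Hypothetical helper: estimate performance for a quantized network.
% dlQuantObj is assumed to be an existing dlquantizer object.

    hPC = dlhdl.ProcessorConfig;       % default processor configuration
    hPC.ProcessorDataType = 'int8';    % match the quantized data type

    % Same method as in the examples, applied to the dlquantizer object.
    performance = hPC.estimatePerformance(dlQuantObj);
end
```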

Introduced in R2021a