Code Generation for Sound Classification on ARM Cortex-M Targets using CMSIS-NN

This example shows how to use the CMSIS-NN library and a pretrained network to generate code for the melSpectrogram (Audio Toolbox) example. You classify white noise, brown noise, and pink noise by generating a processor-in-the-loop (PIL) MEX function, which runs the generated code on target hardware such as the STM32 Nucleo F767ZI board. The PIL interface in the MATLAB™ environment manages execution of the generated executable on the target hardware.

Sound classification plays a crucial role in applications such as speech recognition, audio surveillance, and environmental sound monitoring. Deep learning techniques have shown great potential for classifying sound accurately. The CMSIS-NN library provides neural network kernels optimized for Arm Cortex-M processors, which makes it suitable for running deep learning models on microcontrollers and embedded systems. It enables low-precision (int8) inference, which reduces memory use and computation compared with floating-point inference.
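To make the idea of low-precision (int8) inference concrete, the following minimal sketch quantizes a few floating-point values to int8 and maps them back. It assumes a symmetric scheme with a power-of-two scale, and the input values are made up for illustration. It is not necessarily the exact scheme that the generated code uses.

```matlab
% Sketch of symmetric int8 quantization with a power-of-two scale.
% The values below are hypothetical and serve only as an illustration.
x = single([-1.7 -0.4 0 0.26 0.9 1.3]);

% Pick the smallest power-of-two scale that maps max(abs(x)) into [-127, 127].
exponent = ceil(log2(max(abs(x)) / 127));
scale = 2^exponent;

% Quantize: divide by the scale; int8() rounds to nearest and saturates.
q = int8(x / scale);

% Dequantize to see how much precision is lost.
xHat = single(q) * scale;
maxErr = max(abs(x - xHat));

fprintf('scale = 2^%d, max abs error = %.4f\n', exponent, maxErr);
```

Each value now occupies one byte instead of four, and the rounding error is bounded by half the scale. That trade-off is the source of the memory and computation savings.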

To run this example, you need Cortex-M™ hardware, such as the STM32 Nucleo F767ZI board, and the CMSIS-NN library.

The workflow in this example has four steps: creating and calibrating a quantizer object, generating code from the quantizer object and the network, validating the results on test data, and measuring the performance gain.

Generate Calibration Result File

Load the pretrained network from its .mat file. Create a dlquantizer (Deep Learning Toolbox) object and specify the network. Note that code generation does not support quantized deep neural networks produced by the quantize (Deep Learning Toolbox) function.

load('soundClassificationNet.mat');
quantizedNet = dlquantizer(net, 'ExecutionEnvironment', 'CPU');

Load the training data .mat file, which contains the featuresTrain and labelsTrain variables. The featuresTrain variable contains the white noise, brown noise, and pink noise signals, and the labelsTrain variable contains their corresponding labels. Use the calibrate (Deep Learning Toolbox) function to exercise the network with a set of sample inputs and collect range information. Calibration does not retrain the network. The function collects the dynamic ranges of the weights and biases in the long short-term memory (LSTM) and fully connected layers of the network, and the dynamic ranges of the activations in all layers of the network.

load('soundClassificationTrainingData.mat');
featuresDatastore = arrayDatastore(featuresTrain,"IterationDimension",1,"OutputType","same");
labelsTrain = cellstr(labelsTrain);
labelsDatastore = arrayDatastore(labelsTrain,"IterationDimension",1,"OutputType","same");
data = combine(featuresDatastore,labelsDatastore);
quantizedNet.calibrate(data);
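Collecting a dynamic range amounts to tracking the smallest and largest value seen across the calibration inputs. The following sketch shows that idea on made-up signals. The signals variable and its contents are hypothetical, and the sketch does not reproduce what calibrate does internally.

```matlab
% Hypothetical calibration set: three signals with different amplitudes.
rng(0);                                   % make the random data reproducible
signals = {0.5*randn(1,100), 2*randn(1,100), 0.1*randn(1,100)};

% Track the running minimum and maximum across all samples.
minVal = inf;
maxVal = -inf;
for k = 1:numel(signals)
    minVal = min(minVal, min(signals{k}));
    maxVal = max(maxVal, max(signals{k}));
end

fprintf('Observed range: [%.3f, %.3f]\n', minVal, maxVal);
```

A range collected this way depends entirely on the inputs, so the calibration data should be representative of the signals the network sees at run time.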

Save the dlquantizer object as a .mat file to pass it to the codegen function.

save('soundClassificationQuantObj.mat', 'quantizedNet')

Configure and Generate PIL MEX Function

The net_predict.m entry-point function uses the coder.loadDeepLearningNetwork function to load a deep learning model and to construct and set up a recurrent neural network. Because net is a persistent variable, the network is loaded only on the first call. The entry-point function then predicts the responses by using the predict (Deep Learning Toolbox) function.

type net_predict.m
% Copyright 2021-23 The MathWorks, Inc.

function out = net_predict(netMatFile, in)

persistent net

if isempty(net)
    net = coder.loadDeepLearningNetwork(netMatFile);
end

out = net.predict(in);
end
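The persistent-variable pattern in net_predict can be tried in isolation. In this sketch, scaleInput is a hypothetical stand-in for the entry-point function and gain stands in for the loaded network. The point is only that the setup step runs on the first call and is skipped afterward.

```matlab
% Call the helper three times; the setup message should appear only once.
for k = 1:3
    y = scaleInput(k);
end
fprintf('Last output: %g\n', y);

function out = scaleInput(in)
% Hypothetical stand-in for net_predict: sets up a gain on the first call only.
persistent gain
if isempty(gain)
    disp('Setting up gain (runs once)');
    gain = 10;                 % stand-in for an expensive load operation
end
out = gain * in;
end
```

Clearing the function (for example, clear scaleInput) resets the persistent variable, so the next call performs the setup again.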

To generate a PIL MEX function, create a coder.config object for a static library and set the verification mode to 'PIL'. Set the target language to C. Limit the maximum stack usage to a value that fits the board, for example 512 bytes, because the default is much larger than the memory available on the hardware board. Enable code execution profiling so that you can measure execution time later in this example.

cfg = coder.config('lib', 'ecoder', true);
cfg.VerificationMode = 'PIL';
cfg.StackUsageMax = 512;
cfg.TargetLang = 'C';
cfg.CodeExecutionProfiling = true;

Create a deep learning configuration object coder.DeepLearningConfig for the CMSIS-NN library.

dlcfg = coder.DeepLearningConfig('cmsis-nn');

To generate code that performs low-precision (int8) inference, assign the saved dlquantizer object .mat file to the CalibrationResultFile property of the coder.DeepLearningConfig object. Then set the DeepLearningConfig property of the coder.config object cfg to the coder.DeepLearningConfig object dlcfg.

dlcfg.CalibrationResultFile = 'soundClassificationQuantObj.mat';
cfg.DeepLearningConfig = dlcfg;

Create a coder.hardware object for the STM32 Nucleo F767ZI board. Set the Hardware property of the coder.config object cfg to the coder.hardware object hw. In the following code, replace COM4 with the port to which you have connected the Cortex-M hardware. On the Cortex-M hardware board, set the CMSISNN_PATH environment variable to the location of the CMSIS-NN library build on the Cortex-M board. For more information on building the library and setting environment variables, see Prerequisites for Deep Learning with MATLAB Coder.

hardware = 'STM32 Nucleo F767ZI';
hw = coder.hardware(hardware);
hw.PILInterface = 'Serial';
% Replace COM4 with the actual port number
hw.PILCOMPort = 'COM4';
cfg.Hardware = hw;
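If you are unsure which port the board uses, one way to find candidates is to list the serial ports that are free on the host computer. This sketch uses the serialportlist function and assumes that the board appears as a serial device. It only reports ports and does not identify which one belongs to the board.

```matlab
% List serial ports that are currently available on the host computer.
ports = serialportlist("available");

if isempty(ports)
    disp('No free serial ports found. Check the USB connection to the board.');
else
    disp('Available serial ports:');
    disp(ports(:));
end
```

Comparing the list before and after plugging in the board is a simple way to see which entry to assign to hw.PILCOMPort.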

Load the test data MAT-file, which contains the featuresTest and labelsTest variables. The featuresTest variable contains new white noise, brown noise, and pink noise signals, and the labelsTest variable contains their corresponding labels.

load('soundClassificationTestingData.mat')
args = {coder.Constant('soundClassificationNet.mat'), featuresTest{1}};

Use the codegen command to generate a PIL MEX function.

codegen -config cfg net_predict -args args -report

Run Generated PIL MEX Function

Run the generated MEX function net_predict_pil on test data and classify white noise, brown noise and pink noise signals.

outputPil = cell(size(featuresTest)); 
classNames = {'white', 'brown', 'pink'}; 
predictedClasses = cell(size(featuresTest)); 

for i = 1:numel(featuresTest)
    outputPil{i} = net_predict_pil('soundClassificationNet.mat', featuresTest{i});
    [~, classIdx] = max(outputPil{i});
    predictedClasses{i} = classNames{classIdx};
end
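The loop above turns each output into a class name by taking the index of the largest score. The following self-contained sketch repeats that step on made-up scores and adds an accuracy calculation. The scores and trueLabels variables are hypothetical, and the sketch assumes that the order of classNames matches the order of the network outputs.

```matlab
classNames = {'white', 'brown', 'pink'};

% Hypothetical network outputs: one 3-by-1 score vector per test signal.
scores = {[0.8; 0.1; 0.1], [0.2; 0.7; 0.1], [0.3; 0.3; 0.4], [0.6; 0.3; 0.1]};
trueLabels = {'white', 'brown', 'pink', 'brown'};   % made-up ground truth

predicted = cell(size(scores));
for i = 1:numel(scores)
    [~, classIdx] = max(scores{i});       % index of the largest score
    predicted{i} = classNames{classIdx};
end

% Fraction of signals whose predicted label matches the true label.
accuracy = mean(strcmp(predicted, trueLabels));
fprintf('Accuracy: %.2f\n', accuracy);
```

Here the last signal is misclassified as white, so three of the four predictions are correct.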

Create a confusion matrix chart from the true labels in labelsTest and the predicted labels predictedClasses.

cm = confusionchart(labelsTest,predictedClasses);

Measure Performance Gain

Terminate the PIL execution to measure the execution time of CMSIS-NN code.

clear net_predict_pil

Generate an execution profile report to evaluate execution time.

executionProfileCMSISNN = getCoderExecutionProfile('net_predict');
report(executionProfileCMSISNN, ...
    'Units','Seconds', ...
    'ScaleFactor','1e-03', ...
    'NumericFormat','%0.4f')
executionTimeCMSISNN = mean([executionProfileCMSISNN.Sections.ExecutionTimeInSeconds]);
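The last line above relies on MATLAB expanding a field of a struct array into a comma-separated list, which the square brackets concatenate into one vector. This sketch shows the same expression on a hypothetical struct array. The field name matches the profile, but the timing values are invented.

```matlab
% Hypothetical profile sections, each with its own per-call timings.
sections = struct('ExecutionTimeInSeconds', {[0.010 0.012], [0.011 0.009 0.008]});

% Concatenate the timings of all sections into one row vector.
allTimes = [sections.ExecutionTimeInSeconds];

% Average over every recorded call in every section.
meanTime = mean(allTimes);
fprintf('Mean over %d samples: %.4f s\n', numel(allTimes), meanTime);
```

Because the mean pools all sections together, it may mix timings from different profiled functions if the profile contains more than one section.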

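To measure the performance gain from the CMSIS-NN library, generate plain C code as a baseline. Create a coder.DeepLearningConfig object with the target library set to 'none', and assign it to the DeepLearningConfig property of cfg.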

deepLearningCfg = coder.DeepLearningConfig('none');
cfg.DeepLearningConfig = deepLearningCfg;

Use the codegen command to generate a PIL MEX function for the plain C code.

codegen('net_predict.m', '-config', cfg, '-args', args, '-report');

Run the generated MEX function net_predict_pil on test data and terminate the PIL execution.

for i = 1:numel(featuresTest)
    outputPil{i} = net_predict_pil('soundClassificationNet.mat', featuresTest{i});
    [~, classIdx] = max(outputPil{i});
    predictedClasses{i} = classNames{classIdx};
end
clear net_predict_pil

Measure the execution time of the plain C code by generating an execution profile report.

executionProfilePlainC = getCoderExecutionProfile('net_predict');
report(executionProfilePlainC, ...
    'Units','Seconds', ...
    'ScaleFactor','1e-03', ...
    'NumericFormat','%0.4f')
executionTimePlainC = mean([executionProfilePlainC.Sections.ExecutionTimeInSeconds]);

Calculate the performance gain of CMSIS-NN over plain C.

CMSISNNPerformanceGainOverPlainC = executionTimePlainC ./ executionTimeCMSISNN
CMSISNNPerformanceGainOverPlainC = 1.3305
bar(["CMSIS-NN","plain C"],[executionTimeCMSISNN;executionTimePlainC])
ylabel('Execution Time (seconds)');
title('Performance Comparison of CMSIS-NN and plain C');
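A ratio such as the one reported above can also be read as a reduction in execution time. This short sketch converts the speedup into the fraction of plain C time that the CMSIS-NN code needs. It uses only the 1.3305 value shown above, and results on other hardware or data will differ.

```matlab
speedup = 1.3305;                         % performance gain reported above

% Fraction of the plain C execution time that CMSIS-NN needs, and the saving.
relativeTime = 1 / speedup;
timeSavedPct = 100 * (1 - relativeTime);

fprintf('CMSIS-NN takes %.1f%% of the plain C time (%.1f%% less).\n', ...
    100*relativeTime, timeSavedPct);
```

For this run, the CMSIS-NN code needs roughly three quarters of the plain C execution time.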