Generate and Deploy Optimized Code for Interpolated FIR Filter on Raspberry Pi Using ARM Cortex-A CMSIS CRL
This example shows how to generate and deploy optimized code for an interpolated finite impulse response (IFIR) filter on a Raspberry Pi® target using the ARM® Cortex®-A CMSIS code replacement library (CRL) in Simulink®. The example also uses the SIL/PIL Manager app (software-in-the-loop/processor-in-the-loop) to run simulations that collect execution-time metrics for the generated code and compare them with those of plain C code.
This example uses ARM Cortex-A CMSIS CRL to generate optimized code, but you can also use the Neon V7 instruction set to generate optimized code. For more information on this workflow, see Use Target Hardware Instruction Set Extensions to Generate SIMD Code from Simulink Blocks for ARM Cortex-A Processors (DSP System Toolbox).
Design IFIR Filter for Lowpass Response
The IFIR filter consists of FIR Decimation (DSP System Toolbox), Discrete FIR Filter, and FIR Interpolation (DSP System Toolbox) blocks. The FIR Decimation block converts the input signal to a lower sampling rate. The Discrete FIR Filter block filters the signal, and the FIR Interpolation block restores the filtered output to the original sampling rate of the input signal.
Open the model.
mdl = 'ifir_example';
open_system(mdl);
Set passband ripple to 0.005 dB, stopband attenuation to 80 dB, interpolation factor to 7, passband edge frequency to 0.1π rad/sample, and stopband edge frequency to 0.101π rad/sample.
APass = 0.005; % dB
AStop = 80; % dB
FStop = .101;
M = 7;
F = [.1 FStop];
Use the convertmagunits function to convert the passband ripple and stopband attenuation from dB to the linear scale.
A = [convertmagunits(APass,'db','linear','pass') convertmagunits(AStop,'db','linear','stop')];
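To make the conversion concrete, the following is a minimal sketch that computes the two linear deviations by hand. The closed-form expressions are the standard ripple and attenuation conversions; that convertmagunits uses exactly these expressions is an assumption here, and the names devPass and devStop are illustrative only.

```matlab
% Hand-computed dB-to-linear conversion (assumed formulas, see the text above).
APass = 0.005;   % passband ripple, dB (peak to peak)
AStop = 80;      % stopband attenuation, dB

% Passband: peak-to-peak ripple in dB -> linear deviation about unity gain
devPass = (10^(APass/20) - 1) / (10^(APass/20) + 1);

% Stopband: attenuation in dB -> linear deviation about zero gain
devStop = 10^(-AStop/20);

fprintf('devPass = %.4g, devStop = %.4g\n', devPass, devStop);
```

With these inputs, devStop is 10^-4 and devPass is roughly 2.9 x 10^-4. If the assumption holds, both should match the entries of A to within rounding.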
Use the ifir (DSP System Toolbox) function to get the coefficients of the FIR Decimation, Discrete FIR Filter, and FIR Interpolation blocks for the specified lowpass response parameters. The ifir function designs a periodic filter, h(z), which provides the coefficients for the Discrete FIR Filter block. It also designs an image-suppressor filter, g(z), which provides the coefficients for the FIR Decimation and FIR Interpolation blocks shown in this model.
[h,g] = ifir(M,'low',F,A);
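One way to sanity-check the design is to look at the response of the two filters in cascade. The sketch below is illustrative only: it assumes that h is returned at the full rate (that is, with the zeros of the periodic filter already in place), so that conv(h,g) gives the overall impulse response, and it assumes freqz from Signal Processing Toolbox is available. It says nothing about how the blocks in the model use the coefficients.

```matlab
% Design parameters from this example
APass = 0.005;          % dB
AStop = 80;             % dB
M = 7;
F = [.1 .101];          % band edges, normalized to pi rad/sample
A = [convertmagunits(APass,'db','linear','pass') ...
     convertmagunits(AStop,'db','linear','stop')];
[h,g] = ifir(M,'low',F,A);

% Assumption: h is the full-rate periodic filter, so the cascade is conv(h,g)
[H,w] = freqz(conv(h,g),1,8192);
magdB = 20*log10(abs(H));

% Measure the ripple and attenuation actually achieved by the cascade
pass = w <= F(1)*pi;
stop = w >= F(2)*pi;
fprintf('Passband peak-to-peak ripple: %.4f dB\n', ...
    max(magdB(pass)) - min(magdB(pass)));
fprintf('Minimum stopband attenuation: %.1f dB\n', -max(magdB(stop)));
```

If the assumption about h holds, the printed ripple should be close to or below APass and the attenuation close to or above AStop.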
The code that computes h(z) and g(z) is set in the PreLoadFcn callback of the model because the FIR Decimation, FIR Interpolation, and Discrete FIR Filter blocks use these coefficients as parameters. To open PreLoadFcn, follow these steps:
In the Simulink Toolstrip, on the Modeling tab, in the Design gallery, select Property Inspector.
Clear any model element selections and, on the Properties tab, in the Callbacks section, select PreLoadFcn.
To distinguish the performance metrics of the IFIR filter in the execution profile report, create an atomic subsystem that contains the FIR Decimation, Discrete FIR Filter, and FIR Interpolation blocks. To create the subsystem, select the blocks, right-click, and select Create Subsystem from Selection. To make the subsystem atomic, select the subsystem block and, on the Subsystem Block tab, click Atomic Subsystem.
Simulate the IFIR model by running these commands. Set the default simulation time to 100 seconds. View the noisy input signal and the interpolated FIR filter output in the Spectrum Analyzer.
set_param(mdl,'SimulationMode','normal');
sim(mdl);
Configure IFIR Simulink Model to Generate Optimized Code
You can configure the Simulink model either interactively, using the Model Configuration Parameters UI, or programmatically, through the MATLAB command line interface.
Configure using Model Configuration Parameters UI
On the Apps tab of the Simulink Toolstrip, click Embedded Coder. On the C Code tab that opens, click Settings.
In the left pane of the Configuration Parameters dialog box, select Hardware Implementation and, in the right pane, set the Hardware board parameter to Raspberry Pi.
In the left pane, select Code Generation and, in the right pane:
Set the System target file to ert.tlc.
Set the Build configuration to Faster Runs to prioritize execution speed.
In the left pane, under Code Generation, select Interface and, in the right pane, set the Code replacement libraries to ARM Cortex-A CMSIS.
To see which blocks trigger code replacement, set these options under Code Generation, on the Report pane:
Select Create code generation report.
Select Open report automatically.
Select Summarize which blocks triggered code replacements.
Configure using MATLAB Command Line Interface
Alternatively, you can set all the configuration parameters using set_param commands.
Set the Hardware board parameter to Raspberry Pi based on your hardware.
set_param('ifir_example','HardwareBoard','Raspberry Pi');
Select ert.tlc as the system target file to optimize the code for embedded real-time systems, and choose Faster Runs for the build configuration to prioritize execution speed.
set_param(mdl,'SystemTargetFile','ert.tlc');
set_param(mdl,'BuildConfiguration','Faster Runs');
Set the code replacement libraries to ARM Cortex-A CMSIS.
set_param(mdl,'CodeReplacementLibrary','ARM Cortex-A CMSIS');
Configure the code generation report to generate and open automatically, and show blocks that triggered code replacement.
set_param(mdl,'GenerateReport','On');
set_param(mdl,'LaunchReport','On');
set_param(mdl,'GenerateCodeReplacementReport','On');
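After applying the settings, it can be useful to read them back before building. The following is a small sketch that prints the current value of each parameter set in this section using get_param; it assumes the model ifir_example is loaded and that each of these parameters returns a character vector.

```matlab
% Read back the configuration parameters set in this section.
mdl = 'ifir_example';   % model name used throughout this example
params = {'HardwareBoard','SystemTargetFile','BuildConfiguration', ...
          'CodeReplacementLibrary','GenerateReport','LaunchReport', ...
          'GenerateCodeReplacementReport'};

for k = 1:numel(params)
    % get_param returns the current value stored in the model configuration
    fprintf('%-32s %s\n', params{k}, get_param(mdl, params{k}));
end
```

A value that differs from what you set usually means the command was applied to a different model or was overridden by a later setting.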
Simulate on Target Using SIL/PIL Manager
Use the SIL/PIL Manager app to simulate the code on the target and to get the execution time of the generated code.
Follow these steps to perform the PIL simulation:
On the Embedded Coder app tab, in the Verify section, click Verify Code > SIL/PIL Manager.
On the SIL/PIL tab, in the Mode section, select Automated Verification.
In the Prepare section, set SIL/PIL Mode to Processor-in-Loop (PIL).
In the Run Automated Verification section, click Run Verification.
On the SIL/PIL Manager tab, click Run SIL/PIL. After the artifacts build successfully, you can check the replacements in the code generation report. Alternatively, you can run the PIL simulation by executing these commands.
set_param(mdl,'SimulationMode','processor-in-the-loop (pil)');
sim(mdl);
You can view the code execution metrics by clicking either Code Profile Analyzer or Code execution profiling report. The ifir_example_step section corresponds to the IFIR subsystem. To compare the performance of the generated code, use the average execution time, in ns, of the ifir_example_step section.
Generate Code and Compare Performance
Use this interactive section to compare the performance of the generated code with that of plain C code. From the drop-down list, select either the ARM Cortex-A CMSIS code replacement library or the Neon V7 instruction set extension to optimize the generated code.
compare = "CRL";
comparewith = "None";
To get a better average, set the sample time of the Gaussian Noise block to 1 and the stop time to 10000, so that the function is called 10001 times.
set_param(mdl,'StopTime','10000');
set_param([mdl,'/Gaussian Noise'],'SampTime','1');
set_param(mdl,'FixedStep','1');
if compare == "CRL"
    set_param(mdl,'CodeReplacementLibrary','ARM Cortex-A CMSIS');
else
    set_param(mdl,'InstructionSetExtensions',compare);
    set_param(mdl,'OptimizeReductions','on');
    set_param(mdl,'OptimizationLevel','level2');
    set_param(mdl,'OptimizationPriority','Speed');
end
out = sim(mdl);
Get the total execution time of the generated code for comparison.
profileSectionIndex = 4;
tcompare = out.get('executionProfile').Sections(profileSectionIndex).TotalExecutionTimeInTicks;
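The section index above is hard-coded, so it is worth confirming that index 4 really is the section you want to compare. The sketch below lists the profiled sections with their indices. It assumes out is the simulation output returned by sim(mdl) above and that each section object exposes a Name property; the Name property is an assumption that you should verify in your release.

```matlab
% List the profiled sections so the index can be matched to a section name.
sections = out.get('executionProfile').Sections;

for k = 1:numel(sections)
    % Name is assumed to exist on each section object (see the text above)
    fprintf('%d: %s (%.0f ticks)\n', k, sections(k).Name, ...
        double(sections(k).TotalExecutionTimeInTicks));
end
```

If the section of interest appears at a different index, update profileSectionIndex before reading the execution times.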
Set the instruction set extensions and the code replacement library to None to generate plain C code.
set_param(mdl,'InstructionSetExtensions','None');
set_param(mdl,'CodeReplacementLibrary','None');
out = sim(mdl);
Get the execution time of the generated plain C code for comparison.
tcomparewith = out.get('executionProfile').Sections(profileSectionIndex).TotalExecutionTimeInTicks;
close_system(mdl,0);
Compare the performance.
performanceGain = single(tcomparewith) ./ single(tcompare)
performanceGain = single
2.1060
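The cast to single in the comparison matters if the tick counts are stored as integers, because MATLAB integer division rounds to the nearest integer. The following sketch illustrates this with hypothetical tick counts, chosen only to make the arithmetic easy to follow; that TotalExecutionTimeInTicks is integer-typed is an assumption, and the variable names are illustrative.

```matlab
% Hypothetical tick counts, chosen only to illustrate the arithmetic
tPlainC    = uint64(2106000);   % plain C run (hypothetical value)
tOptimized = uint64(1000000);   % optimized run (hypothetical value)

% Integer division in MATLAB rounds to the nearest integer
intRatio = tPlainC / tOptimized;

% Casting to floating point first keeps the fractional part
gain = single(tPlainC) ./ single(tOptimized);

fprintf('integer ratio = %d, floating-point gain = %.4f\n', intRatio, gain);
```

A gain greater than 1 means the optimized code used fewer ticks than the plain C code over the same number of calls.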
The ARM Cortex-A CMSIS CRL achieves a performance gain of about 2.2x compared to plain C code, while the Neon V7 instruction set extension shows a performance gain of about 1.2x compared to plain C code.
Note: This example compares the performance within a 32-bit Raspberry Pi environment. The performance numbers may vary in your Raspberry Pi environment.