
Generate and Deploy Optimized Code for Interpolated FIR Filter on Raspberry Pi Using ARM Cortex-A CMSIS CRL

Since R2024b

This example shows how to generate and deploy optimized code for an interpolated finite impulse response (IFIR) filter on a Raspberry Pi® target by using the ARM® Cortex®-A CMSIS code replacement library (CRL) in Simulink®. The example also uses the SIL/PIL (software-in-the-loop/processor-in-the-loop) Manager app to run simulations that collect execution-time metrics for the generated code, so that you can compare its execution time with that of plain C code.

This example uses the ARM Cortex-A CMSIS CRL to generate optimized code, but you can also use the Neon V7 instruction set extension to generate optimized code. For more information on this workflow, see Use Target Hardware Instruction Set Extensions to Generate SIMD Code from Simulink Blocks for ARM Cortex-A Processors (DSP System Toolbox).

Design IFIR Filter for Lowpass Response

The IFIR filter consists of the FIR Decimation (DSP System Toolbox), Discrete FIR Filter, and FIR Interpolation (DSP System Toolbox) blocks. The FIR Decimation block filters the input signal and reduces its sample rate. The Discrete FIR Filter block filters the signal at the lower rate, and the FIR Interpolation block restores the filtered output to the original sample rate of the input signal.

Open the model.

mdl = 'ifir_example';
open_system(mdl);

Set passband ripple to 0.005 dB, stopband attenuation to 80 dB, interpolation factor to 7, passband edge frequency to 0.1π rad/sample, and stopband edge frequency to 0.101π rad/sample.

APass = 0.005; % dB
AStop = 80; % dB
FStop = .101;
M = 7;
F = [.1 FStop];
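The transition band in this specification is only 0.001π rad/sample wide, which is what makes a single-stage FIR design expensive. As a rough sketch, Kaiser's order estimate N ≈ (A - 8)/(2.285·Δω), with A the attenuation in dB and Δω the transition width in rad/sample, compares a single-stage lowpass design with the IFIR prototype, whose transition band is M times wider. This uses the 80 dB stopband attenuation, which is the tighter of the two deviation requirements here. It is a textbook approximation, not the length that ifir actually returns, and it ignores the additional image-suppressor filter g(z).

```matlab
% Rough FIR order estimate with Kaiser's formula (an approximation only,
% not the exact filter lengths produced by ifir)
AStop = 80;               % stopband attenuation, dB (from the specification above)
M = 7;                    % interpolation factor
F = [.1 .101];            % passband and stopband edges, normalized by pi
dw = pi*(F(2) - F(1));    % transition width, rad/sample

kaiserOrder = @(A, w) ceil((A - 8)/(2.285*w));

NSingle = kaiserOrder(AStop, dw);     % single-stage lowpass FIR
NProto  = kaiserOrder(AStop, M*dw);   % prototype of h(z^M): transition widened M times
fprintf('Single-stage order ~ %d, IFIR prototype order ~ %d\n', NSingle, NProto);
```

The single-stage estimate is roughly M times larger than the prototype estimate, which is the saving that the IFIR structure trades against the cost of the extra image-suppressor filter.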

Use the convertmagunits function to convert the passband ripple and stopband attenuation from dB to linear scale.

A = [convertmagunits(APass,'db','linear','pass') convertmagunits(AStop,'db','linear','stop')];
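For a quick sanity check of A, the standard closed-form relations map a passband ripple of Ap dB to a linear deviation δp = (10^(Ap/20) - 1)/(10^(Ap/20) + 1), and a stopband attenuation of As dB to δs = 10^(-As/20). This sketch assumes convertmagunits uses these standard definitions for the 'pass' and 'stop' cases, and it needs only base MATLAB.

```matlab
APass = 0.005;   % passband ripple, dB
AStop = 80;      % stopband attenuation, dB

% Passband ripple in dB -> linear passband deviation
r = 10^(APass/20);
dPass = (r - 1)/(r + 1);

% Stopband attenuation in dB -> linear stopband deviation
dStop = 10^(-AStop/20);

ALinear = [dPass dStop];
disp(ALinear)
```

The stopband deviation (1e-4) is smaller than the passband deviation (about 2.9e-4), so the stopband requirement is the tighter of the two.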

Use the ifir (DSP System Toolbox) function to compute the coefficients of the FIR Decimation, Discrete FIR Filter, and FIR Interpolation blocks for the specified lowpass response. The ifir function designs a periodic filter, h(z), which provides the coefficients for the Discrete FIR Filter block. It also designs an image-suppressor filter, g(z), which provides the coefficients for the FIR Decimation and FIR Interpolation blocks in this model.

[h,g] = ifir(M,'low',F,A);
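A periodic filter has the form h(z^M): its impulse response is a prototype with M - 1 zeros inserted between taps, so its frequency response repeats every 2π/M rad/sample, and the image-suppressor filter g(z) removes the unwanted repeated passbands. The sketch below illustrates that periodicity with a hypothetical three-tap prototype (not the coefficients that ifir designs) and evaluates the frequency response directly from the DTFT so that it runs in base MATLAB.

```matlab
hProto = [0.25 0.5 0.25];   % hypothetical prototype filter, for illustration only
M = 7;                      % interpolation factor

% Build h(z^M): place the prototype taps M samples apart, zeros in between
hPeriodic = zeros(1, M*(numel(hProto) - 1) + 1);
hPeriodic(1:M:end) = hProto;

% Magnitude response on 1401 points in [0, pi]; 2*pi/M spans exactly 400 grid steps
w = linspace(0, pi, 1401);
n = 0:numel(hPeriodic) - 1;
H = abs(exp(-1j*w(:)*n) * hPeriodic(:));

% The response repeats with period 2*pi/M
periodicityError = max(abs(H(1:1001) - H(401:1401)));
fprintf('Max difference between H(w) and H(w + 2*pi/M): %g\n', periodicityError);
```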

The model's PreLoadFcn callback contains the code that computes h(z) and g(z), because the FIR Decimation, FIR Interpolation, and Discrete FIR Filter blocks use these coefficients as parameters. To view the PreLoadFcn callback, follow these steps:

  1. In the Simulink Toolstrip, on the Modeling tab, in the Design gallery, select Property Inspector.

  2. Clear any model element selections and, on the Properties tab, in the Callbacks section, select PreLoadFcn.

To distinguish the performance metrics of the IFIR filter in the execution profiling report, group the FIR Decimation, FIR Interpolation, and Discrete FIR Filter blocks into an atomic subsystem. To create the subsystem, select the blocks, right-click, and select Create Subsystem from Selection. To make the subsystem atomic, select the subsystem block and, on the Subsystem Block tab, click Atomic Subsystem.

Simulate the IFIR model by running these commands, with a simulation stop time of 100 seconds. The Spectrum Analyzer shows the noisy input signal and the interpolated FIR filter output.

set_param(mdl,'SimulationMode','normal');
set_param(mdl,'StopTime','100');
sim(mdl);

Configure IFIR Simulink Model to Generate Optimized Code

You can configure the Simulink model either interactively, using the Model Configuration Parameters UI, or programmatically, through the MATLAB command line interface.

Configure Using Model Configuration Parameters UI

On the Apps tab of the Simulink Toolstrip, click Embedded Coder. On the C Code tab that opens, click Settings.

In the left pane of the Configuration Parameters dialog box, select Hardware Implementation, and, in the right pane, set the Hardware board parameter to Raspberry Pi.

In the left pane, select Code Generation, and, in the right pane:

  • Set the System target file to ert.tlc.

  • Set the Build configuration to Faster Runs to prioritize execution speed.

In the left pane, under Code Generation, select Interface, and, in the right pane, set the Code replacement libraries to ARM Cortex-A CMSIS.

To see which blocks trigger code replacement, set these options in the Report pane under Code Generation:

  • Select Create code generation report.

  • Select Open report automatically.

  • Select Summarize which blocks triggered code replacements.

Configure Using MATLAB Command Line Interface

Alternatively, you can set all the configurations using set_param commands.

Set the Hardware Board parameter to Raspberry Pi.

set_param(mdl,'HardwareBoard','Raspberry Pi');

Select ert.tlc as the system target file to optimize the code for embedded real-time systems, and choose Faster Runs for the build configuration to prioritize execution speed.

set_param(mdl,'SystemTargetFile','ert.tlc');
set_param(mdl,'BuildConfiguration','Faster Runs');

Set the code replacement libraries to ARM Cortex-A CMSIS.

set_param(mdl,'CodeReplacementLibrary','ARM Cortex-A CMSIS');

Configure the code generation report to generate and open automatically, and show blocks that triggered code replacement.

set_param(mdl,'GenerateReport','on');
set_param(mdl,'LaunchReport','on');
set_param(mdl,'GenerateCodeReplacementReport','on');

Simulate on Target Using SIL/PIL Manager

Use the SIL/PIL Manager app to simulate the code on the target and to get the execution time of the generated code.

Follow these steps to perform a PIL simulation:

  1. On the Embedded Coder app tab, in the Verify section, click Verify Code > SIL/PIL Manager.

  2. On the SIL/PIL tab, in the Mode section, select Automated Verification.

  3. In the Prepare section, set SIL/PIL Mode to Processor-in-Loop (PIL).

  4. In the Run Automated Verification section, click Run Verification.

When the build artifacts are generated successfully, you can check the code replacements in the code generation report. Alternatively, run these commands to perform the PIL simulation.

set_param(mdl,'SimulationMode', 'processor-in-the-loop (pil)');
sim(mdl);

To view the code execution metrics, click either Code Profile Analyzer or Code execution profiling report. The ifir_example_step section corresponds to the IFIR subsystem. To compare the performance of the generated code, use the average execution time, in ns, of the ifir_example_step section.

Generate Code and Compare Performance

Use this interactive section to compare the performance of the generated code with that of plain C code. From the drop-down list, select either the ARM Cortex-A CMSIS code replacement library or the Neon V7 instruction set extension to optimize the generated code.

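compare = "CRL";        % "CRL" for the ARM Cortex-A CMSIS CRL, or the name of an instruction set extension (for example, Neon V7)
comparewith = "None";   % baseline: plain C code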

To get a more reliable average, set the sample time of the Gaussian Noise block to 1 and the stop time to 10000, so that the step function is called 10001 times (at t = 0, 1, ..., 10000). The fixed-step size is also set to 1.

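set_param(mdl,'StopTime','10000');                  % simulate 10000 s
set_param([mdl,'/Gaussian Noise'],'SampTime','1');  % one input sample per second
set_param(mdl,'FixedStep','1');                     % fixed-step size of 1 s
if compare == "CRL"
    % Optimize by using the ARM Cortex-A CMSIS code replacement library
    set_param(mdl,'CodeReplacementLibrary','ARM Cortex-A CMSIS');
else
    % Optimize by using the selected SIMD instruction set extension
    set_param(mdl,'InstructionSetExtensions',compare);
    set_param(mdl,'OptimizeReductions','on');
    set_param(mdl,'OptimizationLevel','level2');
    set_param(mdl,'OptimizationPriority','Speed');
end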

out = sim(mdl);
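With a fixed-step size of 1 s and a stop time of 10000 s, the step function runs at t = 0, 1, ..., 10000, that is, 10001 times. Dividing a profiled section's total execution time by that call count gives its average time per call. This is a minimal sketch of that arithmetic; avgTicksPerCall is a hypothetical helper name, not part of any toolbox API.

```matlab
stopTime = 10000;   % model StopTime, s
stepSize = 1;       % FixedStep and Gaussian Noise sample time, s

% Steps occur at t = 0, stepSize, ..., stopTime
nCalls = floor(stopTime/stepSize) + 1;

% Hypothetical helper: average ticks per call from a section's total tick count
avgTicksPerCall = @(totalTicks) double(totalTicks)/nCalls;
```

For example, after the next step you could call avgTicksPerCall(tcompare) to get the average ticks per call of the optimized code.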

Get the total execution time of the generated code for comparison.

profileSectionIndex = 4;   % position of the IFIR section in this model's execution profile
tcompare = out.get('executionProfile').Sections(profileSectionIndex).TotalExecutionTimeInTicks;
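The hard-coded index 4 depends on the order of sections in this model's execution profile. A sketch of a more robust lookup finds the section by name instead. It assumes that Sections returns an array of section objects with a Name property and that the IFIR code is profiled in a section named ifir_example_step; check both against your execution profiling report. findSectionIndex is a hypothetical helper name.

```matlab
% Hypothetical helper: index of the first section whose Name matches sectionName,
% or an empty value if no section matches (assumes each section has a Name property)
findSectionIndex = @(sections, sectionName) find(strcmp({sections.Name}, sectionName), 1);

% Example usage after a PIL run (the section name is an assumption; check your report):
% prof = out.get('executionProfile');
% profileSectionIndex = findSectionIndex(prof.Sections, 'ifir_example_step');
```

Because the helper only relies on a Name field or property, it works the same way on a struct array, which makes it easy to try without running a PIL simulation.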

Set the instruction set extensions and the code replacement library to None to generate plain C code.

set_param(mdl,'InstructionSetExtensions','None');
set_param(mdl,'CodeReplacementLibrary','None');

out = sim(mdl);

Get the execution time of the generated plain C code for comparison.

tcomparewith = out.get('executionProfile').Sections(profileSectionIndex).TotalExecutionTimeInTicks;
close_system(mdl,0);

Compare the performance.

performanceGain = single(tcomparewith) ./ single(tcompare)
performanceGain = single
    2.1060
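A speedup factor can also be read as a reduction in execution time: a gain G means that the optimized code takes 1/G of the plain C execution time, so the fraction of time saved is 1 - 1/G. This sketch applies that arithmetic to the gain displayed above; measuredGain is simply that displayed value copied into a new variable.

```matlab
measuredGain = 2.1060;              % gain displayed above (plain C ticks / optimized ticks)
timeFraction = 1/measuredGain;      % optimized execution time relative to plain C
timeReduction = 1 - timeFraction;   % fraction of execution time saved

fprintf('Optimized code runs in %.1f%% of the plain C time (%.1f%% less)\n', ...
    100*timeFraction, 100*timeReduction);
```

For this run, the optimized code needs a little under half of the plain C execution time.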

The ARM Cortex-A CMSIS CRL achieves a performance gain of about 2.1x compared to plain C code, while the Neon V7 instruction set extension achieves a gain of about 1.2x compared to plain C code.

Note: This example compares performance in a 32-bit Raspberry Pi environment. Performance numbers may vary in your Raspberry Pi environment.