
Run Sequence-to-Sequence Classification on Intel FPGA

This example shows how to create, compile, and deploy a long short-term memory (LSTM) network trained on accelerometer data from human movement by using the Deep Learning HDL Toolbox™ Support Package for Intel® FPGA and SoC. Use the deployed network to classify human activity based on sequence input data. Use MATLAB® to retrieve the prediction results from the target device.

This example uses the network trained in the Sequence-to-Sequence Classification Using Deep Learning example. That network was trained on sensor data obtained from a smartphone worn on the body, and it recognizes the activity of the wearer based on time series data that represents accelerometer readings in three different directions. The graphs below show the raw data for these accelerometer readings over time and the resulting classifications. The data set contains time series data for seven people. Each sequence has three features and varies in length. The data set contains six training observations and one test observation.

[Figure: Raw accelerometer readings over time and the resulting activity classifications]

Prerequisites

  • Intel Arria® 10 SoC development board

Load the Pretrained Network

To load the pretrained human body movement network, enter:

load SequenceToSequenceClassification

View the layers of the network by using the Deep Network Designer app.

deepNetworkDesigner(net)

Define FPGA Board Interface

Define the target FPGA board programming interface by creating a dlhdl.Target object. Specify that the interface is for an Intel board with an Ethernet interface. You can also use a JTAG interface.

To create the target object, enter:

hTarget = dlhdl.Target("Intel", "Interface", "Ethernet");

To use the JTAG interface, install Intel Quartus® Prime Standard Edition 22.1. To set the Intel Quartus Prime Standard Edition tool path, enter:

hdlsetuptoolpath('ToolName', 'Altera Quartus II', 'ToolPath', 'C:\altera\22.1\quartus\bin64');

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name. Ensure that the bitstream name matches the data type and FPGA board. In this example, the target FPGA board is the Intel Arria 10 SoC board, and the bitstream uses a single data type.

hW = dlhdl.Workflow('network',net,'Bitstream','arria10soc_lstm_single','Target',hTarget);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. Each time step of the input sequence is one input frame, and the test sequence contains more frames than the default limit allows, so set the InputFrameNumberLimit name-value argument to 55,000.
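The hardware log later in this example reports a sequence of length 53,888, so the frame limit must be at least that large. The following MATLAB sketch shows one possible way to choose the value. Rounding up to a multiple of 5,000 is an arbitrary assumption for illustration, not a toolbox rule, and the variable names are hypothetical.

```matlab
% Hypothetical heuristic for choosing InputFrameNumberLimit:
% round the sequence length up to a multiple of stepSize.
seqLength = 53888;   % time steps in the test sequence (from the hardware log)
stepSize  = 5000;    % rounding granularity (arbitrary assumption)

frameLimit = ceil(seqLength/stepSize)*stepSize   % frameLimit = 55000

% The compiled limit must cover every frame of the input sequence.
assert(frameLimit >= seqLength)
```

After you load the test data, size(XTest{1},2) gives the sequence length directly, because XTest{1} stores one feature per row and one time step per column.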

dn = compile(hW,'InputFrameNumberLimit',55000)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream arria10soc_lstm_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### The network includes the following layers:
     1   'sequenceinput'   Sequence Input    Sequence input with 3 dimensions  (SW Layer)
     2   'lstm'            LSTM              LSTM with 200 hidden units        (HW Layer)
     3   'fc'              Fully Connected   5 fully connected layer           (HW Layer)
     4   'softmax'         Softmax           softmax                           (SW Layer)
                                                                             
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'sequenceinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: lstm.wi ...
### Compiling layer group: lstm.wi ... complete.
### Compiling layer group: lstm.wo ...
### Compiling layer group: lstm.wo ... complete.
### Compiling layer group: lstm.wg ...
### Compiling layer group: lstm.wg ... complete.
### Compiling layer group: lstm.wf ...
### Compiling layer group: lstm.wf ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space
    _______________________    ______________    _______________

    "InputDataOffset"           "0x00000000"     "3.4 MB"       
    "OutputResultOffset"        "0x0035c000"     "3.4 MB"       
    "SchedulerDataOffset"       "0x006b8000"     "868.0 kB"     
    "SystemBufferOffset"        "0x00791000"     "20.0 kB"      
    "InstructionDataOffset"     "0x00796000"     "4.0 kB"       
    "FCWeightDataOffset"        "0x00797000"     "680.0 kB"     
    "EndOffset"                 "0x00841000"     "Total: 8.3 MB"

### Network compilation complete.


dn = 

  struct with fields:

             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]
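The allocated_space column in the memory buffer table follows from the offsets: each region ends where the next one begins, and EndOffset gives the total. The sketch below recomputes the sizes with hex2dec and diff. The results suggest that the log reports sizes in 1024-based units (for example, 868.0 kB is 888,832 bytes, and the first two regions are 3440 KiB each, which the log rounds to 3.4 MB). This reading of the units is an inference from the numbers, not documented behavior.

```matlab
% Offsets copied from the "Allocating external memory buffers" log.
regions = ["InputDataOffset"; "OutputResultOffset"; "SchedulerDataOffset"; ...
           "SystemBufferOffset"; "InstructionDataOffset"; "FCWeightDataOffset"];
offsets = hex2dec(["00000000"; "0035c000"; "006b8000"; "00791000"; ...
                   "00796000"; "00797000"; "00841000"]);   % last entry is EndOffset
offsets = offsets(:);            % ensure a column vector

bytes = diff(offsets);           % size of each region in bytes
sizes = table(regions, bytes, bytes/1024, ...
    'VariableNames', {'Region', 'Bytes', 'KiB'})

totalMiB = offsets(end)/2^20     % about 8.25, reported as "Total: 8.3 MB"
```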

Program Bitstream on FPGA and Download Network Weights

To deploy the network on the Intel Arria 10 SoC hardware, run the deploy method of the dlhdl.Workflow object. The deploy method uses the output of the compile method to program the FPGA board and download the network weights and biases. It displays progress messages and the time at which the weights finish loading.

deploy(hW)
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Programming FPGA device on Intel SoC hardware board at 172.21.89.235...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
WARNING: Uboot script u-boot.scr detected, this may override boot settings
# Copying Bitstream arria10soc_lstm_single.core.rbf to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/arria10soc_lstm_single.core.rbf
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'LIBIIO CNN system with 3 AXI4 Master'
### Rebooting Intel SoC at 172.21.89.235...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Resetting network state.
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 29-Aug-2024 11:33:21

Load Human Activity Test Data

Load the human activity test data and classify the activity at each time step. XTest contains a single sequence with three features, which correspond to the accelerometer readings in three different directions. YTest contains a sequence of categorical labels that correspond to the activity at each time step.

load HumanActivityTest
numFeatures = 3;
figure
plot(XTest{1}')
xlabel("Time Step")
legend("Feature " + (1:numFeatures))
title("Test Data")

Run the Prediction

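Classify the test data in software by using the network stored in the workflow object. Convert the sequence to a dlarray with channel ('C') and time ('T') dimensions, predict the activity scores, and convert the scores to labels.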

XTest{1} = dlarray(XTest{1}, 'CT');
YPred = predict(hW.Network, XTest{1});
YPred = scores2label(YPred, categories(YTest{1}));
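scores2label converts the scores returned by predict into categorical labels. As an illustration of the underlying idea (not the toolbox implementation), the sketch below picks the class with the highest score at each time step. It assumes a numClasses-by-numTimeSteps score matrix; the class names and scores are toy values, not data from this example.

```matlab
% Illustrative label decoding: pick the highest-scoring class per time step.
% Assumes scores is numClasses-by-numTimeSteps (rows = classes, columns = time
% steps). Class names and scores are toy values, not data from this example.
classNames = ["Sitting" "Standing" "Walking"];
scores = [0.7 0.1 0.2 0.1
          0.2 0.8 0.1 0.3
          0.1 0.1 0.7 0.6];

[~, idx] = max(scores, [], 1);                     % row index of the maximum in each column
labels = categorical(classNames(idx), classNames)  % Sitting Standing Walking Walking
```

For the dlarray that predict returns, extractdata gives the underlying numeric score matrix if you want to inspect the scores before decoding.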

Calculate the accuracy of the prediction.

acc = sum(YPred == YTest{1})./numel(YTest{1})
acc = 
0.9995

Compare the predictions with the test data by using a plot.

figure
plot(YPred,'.-')
hold on
plot(YTest{1})
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])

Run the predict method of the dlhdl.Workflow object to retrieve the hardware prediction results, and compare them with the software predictions in the preceding graph. Setting Profile to 'on' also displays the profiler performance results.

predictions = hW.predict(XTest{1}, Profile='on');
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 53888.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      18628                  0.00012                   53888         1029435175           7852.1
    memSeparator_2              78                  0.00000 
    memSeparator_0             245                  0.00000 
    lstm.wi                   3820                  0.00003 
    lstm.wo                   3841                  0.00003 
    lstm.wg                   3856                  0.00003 
    lstm.wf                   3938                  0.00003 
    lstm.sigmoid_1             267                  0.00000 
    lstm.sigmoid_3             267                  0.00000 
    lstm.tanh_1                247                  0.00000 
    lstm.sigmoid_2             251                  0.00000 
    lstm.multiplication_1       307                  0.00000 
    lstm.multiplication_2       271                  0.00000 
    lstm.c_add                 261                  0.00000 
    lstm.tanh_2                281                  0.00000 
    lstm.multiplication_3       211                  0.00000 
    fc                         443                  0.00000 
    memSeparator_1              44                  0.00000 
 * The clock frequency of the DL processor is: 150MHz
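The summary row of the profiler table can be reproduced from the cycle counts and the 150 MHz clock frequency: dividing cycles by the clock frequency gives seconds, and dividing the number of frames by the total time gives the throughput. The following sketch copies the values from the log above.

```matlab
% Recompute the profiler summary from the raw cycle counts in the log.
clockHz         = 150e6;        % DL processor clock frequency (150 MHz)
totalCycles     = 1029435175;   % Total Latency column
numFrames       = 53888;        % FramesNum column
lastFrameCycles = 18628;        % LastFrameLatency(cycles) column

totalSeconds     = totalCycles/clockHz        % about 6.863 s for the whole sequence
framesPerSecond  = numFrames/totalSeconds     % about 7852.1, as reported
lastFrameSeconds = lastFrameCycles/clockHz    % about 1.24e-4 s, shown as 0.00012
```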
save("hardwarepredictions.mat","predictions")
actions = scores2label(predictions, categories(YTest{1}));

Calculate the accuracy of the FPGA board prediction.

accFPGA = sum(actions == YTest{1})./numel(YTest{1})
accFPGA = 
0.9995

Plot the comparison between the FPGA board predictions and test data.

figure
plot(actions,'.-')
hold on
plot(YTest{1})
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])

The hardware predictions closely match the software predictions; both reach an accuracy of 0.9995 on the test sequence.
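The plots compare the two sets of predictions visually. To quantify how closely the hardware labels match the software labels, you can compare them element by element. The sketch below uses short toy label sequences in place of actions and YPred; the data is hypothetical.

```matlab
% Toy label sequences standing in for software (YPred) and hardware (actions)
% predictions; the values are hypothetical.
swLabels = categorical(["Sitting" "Sitting" "Walking" "Walking" "Running"]);
hwLabels = categorical(["Sitting" "Sitting" "Walking" "Running" "Running"]);

agreement  = mean(hwLabels == swLabels)   % fraction of matching time steps: 0.8
mismatchAt = find(hwLabels ~= swLabels)   % time steps where the labels differ: 4
```

In this example, the equivalent comparison would be mean(actions == YPred), because both label sequences use the categories of YTest{1}.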
