Packet-based systems are common in wireless communications. Data is received over the air and decoded as discrete packets on a compute device. For a given set of system requirements, it is difficult to design a system and implement it directly on an SoC, because hardware effects are hard to account for at design time and the process often involves long iterations of debugging and integration on hardware. In this example, you design a packet-based airplane tracking application based on the Automatic Dependent Surveillance-Broadcast (ADS-B) standard, partitioned between an FPGA and an embedded processor. Unlike traditional methods, you use SoC Blockset to simulate the application design, including its memory interface, before implementing it on hardware, which shortens development time. You then validate the design on hardware using code generated automatically from the model.
Supported Hardware Platforms:
Xilinx® Zynq® ZC706 evaluation kit + Analog Devices® FMCOMMS2/3/4 card.
ZedBoard™ + Analog Devices FMCOMMS2/3/4 card.
According to the ADS-B standard, a message packet contains a total of 120 bits: an 8-bit preamble followed by 112 bits of information about the aircraft, including its position and velocity. For an introduction to the Mode-S signaling scheme and ADS-B technology for tracking aircraft, refer to the 'Airplane Tracking Using MATLAB®' example in Communications Toolbox.
The task is to design a system that receives ADS-B messages off the air and decodes them, with the following performance requirements:
Latency: 0.5 seconds
Drop sample rate: < 1 in 105 messages
Throughput: 0.125 MBps (for a maximum capacity of 300 aircraft)
Design Parameters: Data is transferred from the FPGA to the processor across shared memory as frames of samples. Two key design parameters, Frame Size and Number of Buffers, affect the performance requirements above.
Frame Size: The number of samples in a frame. It determines the buffer size in the memory channel.
Number of Buffers: The number of frame buffers in the memory channel. The FPGA algorithm continuously writes data into memory as frame buffers, which the processor then reads to execute its identification algorithm task.
Select the design parameters to satisfy the system requirements as follows:
Design to Meet Latency Requirement: Latency is the time between when the data is received by the FPGA logic and when the data is ready to be processed by the processor. It comprises two parts: the latency through the FPGA logic and the latency until the processor is available to process the data.
Latency through the FPGA logic is the time required for data processing in the FPGA. This is typically on the order of a few clock cycles, with the clock running in the MHz range. The latency until the processor is available to process data is determined by the time it takes for samples to transfer from the FPGA to the processor through the FIFO and the memory frame buffers. If we size the FPGA FIFO equivalent to one frame buffer, then the maximum latency can be written as follows:
Because the time to gather a frame is directly proportional to Frame Size, the maximum latency of the data transfer is directly proportional to both Frame Size and Number of Buffers.
The time to gather a frame is constant for continuously streaming applications and is equal to Frame Size times the FPGA output sample time. However, for asynchronous packet-based systems, this time also depends on how frequently packets arrive. If you choose a Frame Size larger than the packet size, you may have to wait for an indeterminate time for all the packets required to fill a frame to arrive. If you choose a Frame Size smaller than the packet size, throughput is adversely affected. Therefore, for asynchronous packet-based systems, a Frame Size equal to the packet size is a reasonable choice. This allows each packet to be transferred to the processor as soon as the FPGA processing is complete, thereby reducing the latency.
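As a rough illustration of the latency reasoning above, the following MATLAB sketch estimates a worst-case latency. It assumes that a new frame may have to wait behind every memory frame buffer plus one FIFO's worth of data, so the bound takes the form (Number of Buffers + 1) times the time to gather a frame. That form and all variable names are assumptions made for illustration. The numbers (3 buffers, 300 aircraft, 6.2 messages per second per aircraft, 0.5 s requirement) are the ones quoted elsewhere in this example.

```matlab
% Hypothetical worst-case latency estimate (assumed form, not from the model).
numBuffers      = 3;      % frame buffers in the memory channel
fifoFrames      = 1;      % FPGA FIFO sized as one frame buffer
msgsPerAircraft = 6.2;    % average messages per second per aircraft
numAircraft     = 300;    % system capacity requirement

% With Frame Size equal to packet size, one frame is gathered per packet,
% so the time to gather a frame is the average packet inter-arrival time.
timePerFrame = 1/(numAircraft*msgsPerAircraft);      % seconds

% Assumed bound: all buffers and the FIFO are full ahead of the new frame.
maxLatency = (numBuffers + fifoFrames)*timePerFrame; % seconds

latencyRequirement = 0.5;                            % seconds
meetsLatency = maxLatency < latencyRequirement;

fprintf('Estimated maximum latency: %.2f ms\n', maxLatency*1e3);
fprintf('Meets latency requirement: %d\n', meetsLatency);
```

Under these assumptions the estimate is a few milliseconds, far below the 0.5 second requirement, which suggests that latency is not the limiting constraint for this design.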
For this example, the decoded packet length is 112 bits, packed into four 32-bit samples. So, the frame size is 4.
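The frame size calculation can be sketched in MATLAB as follows. The variable names are illustrative only. The 112-bit packet length and 32-bit sample width come from the text above.

```matlab
% Number of 32-bit samples needed to carry one decoded ADS-B packet.
packetBits = 112;   % decoded message length in bits
wordBits   = 32;    % width of one sample written to memory

frameSize   = ceil(packetBits/wordBits);     % samples per frame
frameBytes  = frameSize*wordBits/8;          % bytes per frame buffer
paddingBits = frameSize*wordBits - packetBits; % unused bits in the last sample

fprintf('Frame size: %d samples (%d bytes, %d padding bits)\n', ...
    frameSize, frameBytes, paddingBits);
```

The result is 4 samples, or 16 bytes per frame, with the last 32-bit sample only half used.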
Design to Meet Throughput Requirement: Throughput is the amount of data produced as output per unit of time. It is a function of the data processing in the FPGA and of the data transfer and processing by the processor. In the FPGA logic, data is processed at clock frequencies on the order of MHz, and an output is produced every few clock cycles. For data transfer and processing by the processor, throughput depends on Frame Size. The typical tradeoff is that a larger Frame Size gives higher throughput but increases latency. Conversely, a smaller Frame Size gives lower latency but decreases throughput.
Design to Meet Drop Samples Requirement: An application may tolerate occasional dropped data caused by variations in task execution durations. Frame buffers in a memory channel hold data when it cannot be processed immediately by the processor. Therefore, increasing the number of frame buffers reduces sample dropouts, but it adversely affects latency, as explained earlier.
Choose the Number of Buffers value such that you meet the drop samples requirement without violating the maximum latency requirement.
For this example, the mean task duration, as measured on the ZC706, is 114us. Each packet lasts 120us. Even if packets arrive back to back, they can be processed with a minimal number of frame buffers, since on average the task completes before the next packet arrives. So, set the number of frame buffers to the minimum possible, 3.
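The comparison of task duration and packet duration can be written as a short MATLAB check. This is a sketch with illustrative variable names. It only considers mean values, so it says nothing about individual task executions that run longer than average, which is what the frame buffers are meant to absorb.

```matlab
% Compare the mean processor task duration with the packet duration.
meanTaskDuration = 114e-6;   % seconds, measured on ZC706 (from the text)
packetDuration   = 120e-6;   % seconds, one ADS-B packet (from the text)

% Fraction of time the processor is busy if packets arrive back to back.
utilization = meanTaskDuration/packetDuration;

% Average slack per packet before the next one can arrive.
margin = packetDuration - meanTaskDuration;

keepsUpOnAverage = meanTaskDuration < packetDuration;

fprintf('Utilization: %.0f%%, margin: %.0f us, keeps up: %d\n', ...
    utilization*100, margin*1e6, keepsUpOnAverage);
```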
Create an SoC Model: Use the SDR Template for creating an SoC model for wireless communications applications.
The top model is depicted with bounding boxes that segment the model as follows:
External I/O: This part of the model contains the AD9361 RF Input and Output blocks, which are connected to each other through a simplified channel model. In addition, this region has LED blocks that connect to the FPGA logic.
FPGA: The FPGA section of the model contains the FPGA algorithms which are designed in a separate model and instantiated here using model reference.
Memory: This section models the memory channel between FPGA and processor. It simulates the latencies in the HW/SW connection.
Register Channel: This section models three FPGA registers that are configured by the processor.
Processor: This section contains the Task Manager, which is connected to the processor model. The Task Manager controls the scheduling of processor tasks. The processor algorithm and initialization tasks are modeled in a separate model and are instantiated here using model reference.
The FPGA model contains the ADS-B Transmitter Algorithm, which transmits test ADS-B packets at a variable rate, and the ADS-B Receiver Algorithm, which decodes received ADS-B messages.
The processor model contains the Processor Algorithm, which unpacks the received ADS-B packets into information bits and sends them through a UDP Send block to another system that reports the aircraft information. The processor algorithm task is denoted as dataTask in the Task Manager block and is specified as event-driven. The Task Manager schedules this task asynchronously by means of the buffer-ready event rdEvent in the memory channel.
The Initialize Function subsystem initializes the appropriate hardware configuration registers. The AD9361 blocks set the center frequency, gain mode, and baseband sample rate of the attached FMC RF board. The other blocks model three memory-mapped configuration registers of the ADS-B packet detector datapath: the selection of the input to the receiver algorithm, the transmit period of test packets from the FPGA, and the threshold value for the detection algorithm.
The model soc_ADSB_UDP_HostPrintout is a host UDP-based receive model that decodes ADS-B messages. Run this model in parallel to the ADSB simulation or deployment model to display the decoded ADS-B messages and also optionally map the aircraft location.
Run the model to visualize data transfer between the FPGA and the processor. The time between packet arrivals is a function of the number of aircraft. Given the system requirement of detecting 300 aircraft, there are on average 300*6.2 = 1860 messages per second (or a message every 1/1860 = 0.54 ms). You can set the number of aircraft using the variable NumAircraft, which in turn sets the period in the Initialize Function subsystem. The default setting is 300 to match the allowable system capacity.
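The relationship between the number of aircraft and the packet arrival period can be reproduced with a few lines of MATLAB. This is only a sketch of the arithmetic. How the model itself turns NumAircraft into the transmit period inside the Initialize Function subsystem is not shown here, and the names msgsPerAircraft, msgRate, and msgPeriod are hypothetical.

```matlab
% Average ADS-B message rate and inter-arrival period for a given capacity.
NumAircraft     = 300;   % same variable name as used by the model
msgsPerAircraft = 6.2;   % average messages per second per aircraft

msgRate   = NumAircraft*msgsPerAircraft;  % messages per second
msgPeriod = 1/msgRate;                    % seconds between messages

fprintf('%d aircraft: %.0f messages/s, one every %.2f ms\n', ...
    NumAircraft, msgRate, msgPeriod*1e3);
```

Changing NumAircraft in this sketch shows how quickly the arrival period shrinks as the airspace becomes more crowded.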
Open the Logic Analyzer window to see the waveforms, and notice that the memory transfers are taking place in buffers of 4 samples, or 16 bytes.
To view the external memory bandwidth usage, open the Mem Controller block, select the Performance tab, and click View performance plots. Select all the masters and click Create Plot. The plot shows a bandwidth of 0.125 MBps. Since 4 bytes of data are transferred every 32us, the expected bandwidth is 4/32e-6 = 0.125 MBps.
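The expected bandwidth figure can be checked with the following MATLAB sketch. The variable names are illustrative, and MBps is taken here to mean 1e6 bytes per second.

```matlab
% Expected memory bandwidth for 4-byte transfers every 32 microseconds.
bytesPerTransfer = 4;       % one 32-bit sample
transferPeriod   = 32e-6;   % seconds between transfers

bandwidthBytesPerSec = bytesPerTransfer/transferPeriod;
bandwidthMBps        = bandwidthBytesPerSec/1e6;

throughputRequirementMBps = 0.125;   % from the system requirements

fprintf('Expected bandwidth: %.3f MBps\n', bandwidthMBps);
```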
Using the Simulation Data Inspector, you can visualize the task execution schedule. The data task is driven by the event from FPGA notifying the processor that a packet has been decoded by the FPGA, written to external memory, and read by the DMA driver.
To see the decoded messages, run the companion UDP receive model. This model will display the aircraft tracking information on a GUI.
As discussed earlier, since the mean task duration of 114us is less than the packet duration of 120us, messages are, on average, not dropped during the transfer to the processor. You can confirm this by checking the number of dropped samples at the FIFO using the signal icFIFODroppedCount in the Simulation Data Inspector.
You can use the SoC model to explore the design space. Consider the worst-case scenario in which plane messages arrive densely and the processor carries a heavier computation load. You can modify the model settings and simulate to determine whether packets are dropped in this more aggressive scenario.
Set NumAircraft to 990 (a new message every 163us) to simulate back-to-back arrival of plane messages. Modify the task specification on the Task Manager block to simulate a heavier computation load on the processor. On the Simulation tab, choose the second distribution by setting the Percent value to 100% on the second row and 0% on the first row. This assigns a mean task duration of 163us, which results in some task executions taking longer than allowed. Set the simulation time to 0.1 s and simulate. For 990 planes, the message arrival rate is 990*6.2 = 6138 messages per second. The drop packet requirement is therefore 6138/105 = 58 messages per second, or 5.8 messages in 0.1 s. After the simulation, notice in the Logic Analyzer that this requirement is violated, as 18 messages have been dropped.
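The drop budget for this stress scenario can be worked out in MATLAB as follows. This is a sketch of the arithmetic in the paragraph above, using the requirement of fewer than 1 dropped message in 105 as stated in this example. The variable names other than NumAircraft are hypothetical, and the observed count of 18 is the value reported above, not something this code measures.

```matlab
% Allowed versus observed dropped messages for the 990-aircraft scenario.
NumAircraft     = 990;     % stress-test setting
msgsPerAircraft = 6.2;     % average messages per second per aircraft
dropRatio       = 1/105;   % requirement: fewer than 1 drop in 105 messages
simTime         = 0.1;     % seconds of simulated time

msgRate            = NumAircraft*msgsPerAircraft;  % messages per second
msgPeriod          = 1/msgRate;                    % seconds between messages
allowedDropsPerSec = msgRate*dropRatio;            % drop budget per second
allowedDrops       = allowedDropsPerSec*simTime;   % drop budget for the run

observedDrops  = 18;       % reported from the Logic Analyzer
requirementMet = observedDrops < allowedDrops;

fprintf('Rate: %.0f msg/s (every %.0f us)\n', msgRate, msgPeriod*1e6);
fprintf('Allowed drops in %.1f s: %.1f, observed: %d, met: %d\n', ...
    simTime, allowedDrops, observedDrops, requirementMet);
```

With 18 observed drops against a budget of roughly 5.8, the check fails, which matches the conclusion that the requirement is violated in this scenario.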
Following products are required for this section:
To implement the model on a supported SoC board use the SoC Builder tool. By default, the model will be implemented on Xilinx® Zynq® ZC706 evaluation kit as it is configured with that board. To open SoC Builder, select the 'System on Chip' tab in the Simulink toolstrip, and click the 'Configure, Build, & Deploy' button. Once SoC Builder opens, follow these steps:
Select 'Build Model' on 'Setup' screen. Click 'Next'.
Click 'View/Edit Memory Map' to view the memory map on 'Review Memory Map' screen. Click 'Next'.
Specify project folder on 'Select Project Folder' screen. Click 'Next'.
Select 'Build, load and run' on 'Select Build Action' screen. Click 'Next'.
Click 'Validate' to check the compatibility of model for implementation on 'Validate Model' screen. Click 'Next'.
Click 'Build' to begin building of the model on 'Build Model' screen. An external shell will open when FPGA synthesis begins. Click 'Next'.
Click 'Test Connection' on 'Connect Hardware' screen to test the connectivity of host computer with SoC board. Click 'Next' to go to 'Run Application' screen.
The FPGA synthesis may take more than 30 minutes to complete. To save time, you may want to use the provided pre-generated bitstream by following these steps:
Close the external shell to terminate synthesis.
Copy the pre-generated bitstream to your project folder by running the command below.
Click the 'Load and Run' button to load the pre-generated bitstream and run the model on the SoC board.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot,'toolbox','soc',...
    'supportpackages','xilinxsoc','xilinxsocexamples','bitstreams',...
    'soc_ADSB-zc706.bit'),'./soc_prj');
Implementation on ZedBoard: To implement the model on ZedBoard, you must first configure the model for ZedBoard and set the following example parameters. Open Model Configuration Parameters, navigate to the Hardware Implementation tab, and perform the following steps:
Select ZedBoard from the drop-down list under 'Hardware board' in both the top model and the processor model.
Navigate to Target hardware resources > FPGA design (top level) tab, enable Include MATLAB as AXI Master IP for host-based interaction and set IP core clock frequency (MHz) to 4 MHz.
Navigate to Target hardware resources > FPGA design (debug) tab and enable Include AXI Interconnect monitor.
Navigate to Device details and select Support long long in both the top model and the processor model.
Next, open SoC Builder and follow the steps described above for the Xilinx® Zynq® ZC706. Modify the copyfile command to use the ZedBoard bitstream 'soc_ADSB-zedboard.bit'.
To enable processor task profiling, open the configuration parameters, navigate to Hardware Implementation > Hardware Board settings > Task Profiling on processor, and select 'Show on SDI' and 'Save to file'. Set the simulation stop time to 10 seconds and run the model in external mode. After the run completes, open the Simulation Data Inspector (SDI), navigate to the latest run, and add the signal DataReadTask to the plot. Compare the task timing with the simulation results to confirm that the simulation model predicted how the application performs on hardware.
This example showed how to use SoC Blockset to design a packet-based ADS-B system that meets system requirements. By simulating the design with a memory channel as the interface between the FPGA and the processor, you checked at design time whether the throughput and dropped-packet requirements are met. You then implemented the design on an SoC device from the model and verified the results on hardware. Although ADS-B is not a computationally intensive standard, it is useful for demonstrating the design process for packet-based systems intended for implementation on an SoC device. You can follow the same design procedure for more computationally demanding variants of this application or for other packet-based applications.