Synthesis Benchmark of Common Native Floating Point Operators
This example shows how to access and generate synthesis benchmarks for common native floating-point operators with the Xilinx® Vivado® and Intel® Quartus® tools.
Access Generated Synthesis Results from a MAT File
Perform synthesis and timing analysis on common operators with the Xilinx Vivado and Intel Quartus tools. These operators include basic math operators such as addition and subtraction, as well as more complex operators such as log, sin, cos, and atan. In this example, you configure the Simulink® models for synthesis in native floating-point mode with the single-precision floating-point type, and you use different latency and mantissa multiplication strategies.
In this example, you use benchmarking data that has already been generated for the hdl_nfp_single_ops_benchmark model. The benchmarking data is in the hdlcoder_synthesis_benchmarkNFP MAT file. The MAT file contains the data for subsystems whose names start with op_. Each subsystem corresponds to a device under test (DUT).
Open the Simulink model.
addpath(fullfile(matlabroot, 'toolbox', 'hdlcoder', 'hdlcoder', 'hdlutils', 'hdlBenchMarking'))
open_system('hdl_nfp_single_ops_benchmark')
To generate the synthesis result from the MAT file:
1. Make a copy of the Simulink model in a directory for which you have write permission.
2. Run the runBenchmarkNFP function by running these commands.
addpath(fullfile(matlabroot, 'toolbox', 'hdlcoder', 'hdlcoder', 'hdlutils', 'hdlBenchMarking'))
Results = runBenchmarkNFP;
Generating synthesis benchmarks with the hdl_nfp_single_ops_benchmark model takes several days.
To view the synthesis results, load the MAT file and display the NFPSynthesisResults structure.
load('hdlcoder_synthesis_benchmarkNFP.mat')
NFPSynthesisResults
NFPSynthesisResults = 
  struct with fields:
     Vivado: [1×1 struct]
    Quartus: [1×1 struct]
The MAT file contains a structure named NFPSynthesisResults, which has two fields. The Vivado field contains the synthesis results from Xilinx Vivado. The Quartus field contains the synthesis results from Intel Quartus.
View the synthesis results for the Vivado tool.
NFPSynthesisResults.Vivado
ans = 
  struct with fields:
                          Auto: [1×1 struct]
                FullMultiplier: [1×1 struct]
    PartMultiplierPartAddShift: [1×1 struct]
      NoMultiplierFullAddShift: [1×1 struct]
NFPSynthesisResults.Vivado and NFPSynthesisResults.Quartus contain synthesis results for different mantissa multiply strategies:
Auto - Structure of synthesis results using the Auto mantissa multiply strategy
FullMultiplier - Structure of synthesis results using the Full Multiplier implementation
PartMultiplierPartAddShift - Structure of synthesis results using the Part Multiplier Part AddShift implementation
NoMultiplierFullAddShift - Structure of synthesis results using the No Multiplier Full AddShift implementation
For more information, see Mantissa Multiplier Strategy.
Each of these structures contains synthesis results for different latency strategies.
NFPSynthesisResults.Vivado.Auto
ans = 
  struct with fields:
       MaxLatency: [14×13 table]
       MinLatency: [14×13 table]
      ZeroLatency: [14×13 table]
    CustomLatency: [58×13 table]
MaxLatency - Table of synthesis results using the MAX latency strategy
MinLatency - Table of synthesis results using the MIN latency strategy
ZeroLatency - Table of synthesis results using the ZERO latency strategy
CustomLatency - Table of synthesis results using the latency values from zero to the maximum latency
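Because the results are nested by tool, mantissa multiply strategy, and latency strategy, dynamic field names are one convenient way to select a single table. The following is a minimal sketch, assuming the MAT file is on the MATLAB path as in the steps above. The variable names tool, multStrategy, and latencyStrategy are illustrative and are not part of the shipped example.

```matlab
% Sketch: select one results table by tool, mantissa multiply strategy,
% and latency strategy using dynamic field names.
tool            = 'Vivado';       % or 'Quartus'
multStrategy    = 'Auto';         % or 'FullMultiplier', 'PartMultiplierPartAddShift', ...
latencyStrategy = 'MaxLatency';   % or 'MinLatency', 'ZeroLatency', 'CustomLatency'

matFile = 'hdlcoder_synthesis_benchmarkNFP.mat';
if exist(matFile, 'file')
    % Load only the results structure and drill down to one table.
    S = load(matFile, 'NFPSynthesisResults');
    T = S.NFPSynthesisResults.(tool).(multStrategy).(latencyStrategy);
    disp(T.Properties.VariableNames)   % column names such as Fmax, Slices, LUTs
    disp(height(T))                    % number of rows in the selected table
else
    disp('MAT file not found on the MATLAB path.')
end
```

Changing any of the three selector variables picks a different table without rewriting the indexing expression.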
NFPSynthesisResults.Vivado.Auto.CustomLatency and NFPSynthesisResults.Quartus.Auto.CustomLatency contain the performance and resource utilization of the floating-point operators for latency values from zero to the maximum latency with the Auto mantissa multiply strategy. View the complete list of performance and resource utilization results for Xilinx Vivado:
NFPSynthesisResults.Vivado.Auto.CustomLatency
ans = 
  58×13 table
                                              Fmax    Slices  SliceRegs  LUTs  DSPs  RAMs  URAMs  Latency  DataPathDelay  Slack   LogicLevels  LogicDelay  RouteDelay
                                              ______  ______  _________  ____  ____  ____  _____  _______  _____________  ______  ___________  __________  __________
add_sys_vivado_Auto_latency0                  83.928  180     96         593   0     0     0      0        11.887         -9.915  35           3.042       8.845
add_sys_vivado_Auto_latency1                  167.64  149     127        498   0     0     0      0        5.952          -3.965  22           2.219       3.733
add_sys_vivado_Auto_latency2                  221.09  150     208        444   0     0     0      1        4.393          -2.523  13           1.512       2.881
add_sys_vivado_Auto_latency3                  267.52  171     281        508   0     0     0      2        3.741          -1.738  8            0.603       3.138
add_sys_vivado_Auto_latency4                  311.82  186     340        524   0     0     0      3        3.179          -1.207  7            0.97        2.209
add_sys_vivado_Auto_latency5                  306.75  162     387        444   0     0     0      4        3.23           -1.26   10           0.898       2.332
add_sys_vivado_Auto_latency6                  311.14  191     436        536   0     0     0      5        3.218          -1.214  9            0.95        2.267
add_sys_vivado_Auto_latency7                  314.07  201     502        542   0     0     0      6        3.18           -1.184  6            0.714       2.466
add_sys_vivado_Auto_latency8                  365.23  184     563        483   0     0     0      7        2.709          -0.738  7            0.931       1.778
add_sys_vivado_Auto_latency9                  406.67  203     562        541   0     0     0      8        2.43           -0.459  6            0.91        1.52
add_sys_vivado_Auto_latency10                 399.84  191     646        506   0     0     0      9        2.473          -0.501  8            1.014       1.459
add_sys_vivado_Auto_latency11                 457.25  205     668        521   0     0     0      10       2.191          -0.187  5            0.62        1.571
sub_sys_vivado_Auto_latency0                  86.252  166     96         573   0     0     0      0        11.566         -9.594  38           3.36        8.206
sub_sys_vivado_Auto_latency1                  164.74  152     127        499   0     0     0      0        6.084          -4.07   23           2.105       3.979
sub_sys_vivado_Auto_latency2                  202.63  134     208        444   0     0     0      1        4.569          -2.935  13           1.164       3.405
sub_sys_vivado_Auto_latency3                  262.47  171     281        504   0     0     0      2        3.781          -1.81   8            0.603       3.178
sub_sys_vivado_Auto_latency4                  320.72  187     341        527   0     0     0      3        3.034          -1.118  9            0.854       2.18
sub_sys_vivado_Auto_latency5                  287.52  175     387        445   0     0     0      4        3.449          -1.478  8            0.77        2.679
sub_sys_vivado_Auto_latency6                  322.27  201     438        541   0     0     0      5        3.104          -1.103  9            1.272       1.832
sub_sys_vivado_Auto_latency7                  304.79  196     510        544   0     0     0      6        3.253          -1.281  13           1.456       1.797
sub_sys_vivado_Auto_latency8                  391.7   195     562        483   0     0     0      7        2.524          -0.553  7            0.952       1.572
sub_sys_vivado_Auto_latency9                  417.01  208     562        543   0     0     0      8        2.37           -0.398  6            0.89        1.48
sub_sys_vivado_Auto_latency10                 351.49  194     647        506   0     0     0      9        2.817          -0.845  9            1.053       1.764
sub_sys_vivado_Auto_latency11                 491.88  203     668        522   0     0     0      10       1.935          -0.033  4            0.53        1.405
singletonumerictype_sys_vivado_Auto_latency0  218.87  104     64         345   0     0     0      0        4.54           -2.569  16           1.418       3.122
singletonumerictype_sys_vivado_Auto_latency1  775.8   62      66         222   0     0     0      0        1.259          0.711   3            0.445       0.814
singletonumerictype_sys_vivado_Auto_latency2  465.12  107     99         346   0     0     0      1        2.154          -0.15   4            0.431       1.723
singletonumerictype_sys_vivado_Auto_latency3  461.89  110     157        365   0     0     0      2        2.136          -0.165  3            0.352       1.784
singletonumerictype_sys_vivado_Auto_latency4  462.32  116     189        366   0     0     0      3        2.134          -0.163  4            0.431       1.703
singletonumerictype_sys_vivado_Auto_latency5  468.6   109     222        343   0     0     0      4        2.106          -0.134  5            0.438       1.668
singletonumerictype_sys_vivado_Auto_latency6  459.35  113     211        366   0     0     0      5        2.178          -0.177  5            0.438       1.74
mul_sys_vivado_Auto_latency0                  103.21  135     96         415   1     0     0      0        9.692          -7.689  23           4.765       4.927
mul_sys_vivado_Auto_latency1                  218.77  147     164        377   1     0     0      0        1.984          -2.571  4            0.431       1.553
mul_sys_vivado_Auto_latency2                  406.17  144     260        383   1     0     0      1        2.083          -0.462  3            0.414       1.669
mul_sys_vivado_Auto_latency3                  262.12  153     314        401   1     0     0      2        3.472          -1.815  17           1.814       1.658
mul_sys_vivado_Auto_latency4                  336.81  160     357        409   1     0     0      3        2.602          -0.969  14           1.286       1.316
mul_sys_vivado_Auto_latency5                  439.56  172     443        386   1     0     0      4        1.895          -0.275  3            0.388       1.507
mul_sys_vivado_Auto_latency6                  261.92  166     448        391   1     0     0      5        1.231          -1.818  1            0.266       0.965
mul_sys_vivado_Auto_latency7                  455.37  163     543        391   1     0     0      6        0.566          -0.196  0            0.223       0.343
mul_sys_vivado_Auto_latency8                  472.81  178     673        389   1     0     0      7        1.966          -0.115  11           1.095       0.871
round_sys_vivado_Auto_latency0                310.46  67      64         195   0     0     0      0        3.191          -1.221  9            1.031       2.16
round_sys_vivado_Auto_latency1                339.44  74      127        205   0     0     0      0        2.95           -0.946  9            1.056       1.894
round_sys_vivado_Auto_latency2                542.01  72      116        212   0     0     0      1        1.817          0.155   5            0.474       1.343
round_sys_vivado_Auto_latency3                350.02  73      160        180   0     0     0      2        2.828          -0.857  11           1.193       1.635
round_sys_vivado_Auto_latency4                394.63  92      246        219   0     0     0      3        2.505          -0.534  11           1.206       1.299
round_sys_vivado_Auto_latency5                534.19  93      311        214   0     0     0      4        1.876          0.128   4            0.531       1.345
ceil_sys_vivado_Auto_latency0                 315.66  51      64         165   0     0     0      0        3.14           -1.168  11           1.514       1.626
ceil_sys_vivado_Auto_latency1                 412.71  66      126        188   0     0     0      0        2.394          -0.423  7            0.96        1.434
ceil_sys_vivado_Auto_latency2                 544.66  69      115        189   0     0     0      1        1.808          0.164   4            0.395       1.413
ceil_sys_vivado_Auto_latency3                 414.42  81      159        162   0     0     0      2        2.384          -0.413  11           1.442       0.942
ceil_sys_vivado_Auto_latency4                 424.81  85      244        194   0     0     0      3        2.324          -0.354  11           1.203       1.121
ceil_sys_vivado_Auto_latency5                 514.14  84      310        203   0     0     0      4        1.916          0.055   4            0.497       1.419
floor_sys_vivado_Auto_latency0                312.89  53      64         165   0     0     0      0        3.168          -1.196  14           1.641       1.527
floor_sys_vivado_Auto_latency1                411.69  61      125        187   0     0     0      0        2.401          -0.429  7            0.924       1.477
floor_sys_vivado_Auto_latency2                515.46  64      115        189   0     0     0      1        1.912          0.06    4            0.496       1.416
floor_sys_vivado_Auto_latency3                398.57  73      158        162   0     0     0      2        2.48           -0.509  12           1.584       0.896
floor_sys_vivado_Auto_latency4                454.34  91      243        194   0     0     0      3        2.172          -0.201  11           1.22        0.952
floor_sys_vivado_Auto_latency5                542.89  96      308        203   0     0     0      4        1.814          0.158   4            0.395       1.419
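The Fmax and Slack columns in the table above appear to be related through the target clock period. The following is a minimal sketch that copies four rows from the table and checks the relation Fmax = 1000 / (period - Slack). It assumes that Fmax is in MHz, that Slack is in ns, and that the results were generated with the default 500 MHz target frequency (a 2 ns period). The variable names targetPeriod, FmaxFromSlack, and meetsTiming are illustrative.

```matlab
% A few rows copied from the CustomLatency table above.
names = {'add_sys_vivado_Auto_latency0'; ...
         'add_sys_vivado_Auto_latency11'; ...
         'singletonumerictype_sys_vivado_Auto_latency1'; ...
         'round_sys_vivado_Auto_latency2'};
Fmax  = [83.928; 457.25; 775.8; 542.01];   % assumed unit: MHz
Slack = [-9.915; -0.187; 0.711; 0.155];    % assumed unit: ns
T = table(Fmax, Slack, 'RowNames', names);

% Assumption: 500 MHz target frequency, that is, a 2 ns clock period.
targetPeriod = 1000 / 500;                            % ns
T.FmaxFromSlack = 1000 ./ (targetPeriod - T.Slack);   % MHz

% Rows with nonnegative slack meet the assumed 500 MHz target.
meetsTiming = T(T.Slack >= 0, :);

disp(T)
disp(meetsTiming.Properties.RowNames)
```

For these four rows, the recomputed values agree with the tabulated Fmax to within rounding, and only the two rows with positive slack reach the assumed target. The same filter applies unchanged to the full table loaded from the MAT file.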
Generate Synthesis Results
To generate synthesis results for other Simulink models or different settings, you can pass input arguments to the runBenchmarkNFP function. Specify name-value arguments, where name is the argument name and value is the corresponding value.
OpList - List of operators, specified as a cell array of character vectors. The default value is {'ADDSUB','CONVERT','COS','MINMAX','MUL','MULTADD','ROUNDING','POW'}.
TargetFrequency - Target frequency in MHz. The default value is 500 MHz.
HardwareTool - Synthesis tools used to generate the results, specified as a cell array of character vectors. The default value is {'Vivado','Quartus'}.
LatencyStrategy - Latency strategy, specified as a cell array of character vectors. The default value is {'Min','Max','Zero','Custom'}.
MantissaMultiplyStrategy - Mantissa multiply strategy, specified as a cell array of character vectors. The default value is {'Auto','FullMultiplier','PartMultiplierPartAddShift','NoMultiplierFullAddShift'}.
DataType - Data type used for the benchmarking, specified as a character vector. The default value is 'single'.
VivadoHardwareDetails or QuartusHardwareDetails - Hardware details, specified as a structure containing these five fields:
    Tool - Synthesis tool name, specified as a character vector.
    ChipFamily - Chip family, specified as a character vector.
    DeviceName - Device name, specified as a character vector.
    PackageName - Package name, specified as a character vector.
    SpeedValue - Hardware speed, specified as a character vector.
For example, set the hardware details for Xilinx Vivado and Altera Quartus II.
vivado_hardware_details.Tool = 'Xilinx Vivado';
vivado_hardware_details.ChipFamily = 'Virtex7';
vivado_hardware_details.DeviceName = 'xc7v2000t';
vivado_hardware_details.PackageName = 'fhg1761';
vivado_hardware_details.SpeedValue = '-2';

vivado_hardware_details = 
  struct with fields:
           Tool: 'Xilinx Vivado'
     ChipFamily: 'Virtex7'
     DeviceName: 'xc7v2000t'
    PackageName: 'fhg1761'
     SpeedValue: '-2'

quartus_hardware_details.Tool = 'Altera QUARTUS II';
quartus_hardware_details.ChipFamily = 'Stratix V';
quartus_hardware_details.DeviceName = '5SGSMD4E1H29C1';
quartus_hardware_details.PackageName = '';
quartus_hardware_details.SpeedValue = '';

quartus_hardware_details = 
  struct with fields:
           Tool: 'Altera QUARTUS II'
     ChipFamily: 'Stratix V'
     DeviceName: '5SGSMD4E1H29C1'
    PackageName: ''
     SpeedValue: ''
Call the runBenchmarkNFP function with the specified name-value arguments.
runNFPBenchmarkResults = runBenchmarkNFP('OpList', {'ADDSUB','SQRT','UMINUS'},...
    'TargetFrequency',500,...
    'VivadoHardwareDetails', vivado_hardware_details,...
    'QuartusHardwareDetails', quartus_hardware_details,...
    'HardwareTool',{'Vivado','Quartus'},...
    'LatencyStrategy',{'Min','Max','Zero','Custom'},...
    'MantissaMultiplyStrategy',{'Auto','FullMultiplier','PartMultiplierPartAddShift','NoMultiplierFullAddShift'},...
    'DataType', 'single');
Customize Synthesis Results
Generating synthesis benchmarks with the hdl_nfp_single_ops_benchmark model takes several days. To generate benchmarks for only a subset of the operators, specify the operators that you want to benchmark in the OpList argument. Operators such as ADDSUB, CONVERT, ROUNDING, and MINMAX generate separate blocks for different operations. For example, when you specify the MINMAX operator in the OpList argument, the function generates synthesis results for both the Min and Max operations.
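One way to shorten a run is to narrow several arguments at once. The following is a minimal sketch that collects a reduced set of name-value arguments in a cell array and passes them to runBenchmarkNFP. The argument names and values come from the argument list in the previous section. The variable names benchmarkArgs, runNow, and minmaxResults are illustrative, and the sketch assumes that any omitted arguments fall back to their documented defaults.

```matlab
% Sketch: restrict the benchmark to one operator group, one synthesis tool,
% one latency strategy, and one mantissa multiply strategy.
benchmarkArgs = {'OpList', {'MINMAX'}, ...
                 'HardwareTool', {'Vivado'}, ...
                 'LatencyStrategy', {'Min'}, ...
                 'MantissaMultiplyStrategy', {'Auto'}};

% Guard the call so that the script does not start synthesis by accident.
runNow = false;   % set to true only when the synthesis tool is installed and set up
if runNow
    addpath(fullfile(matlabroot, 'toolbox', 'hdlcoder', 'hdlcoder', 'hdlutils', 'hdlBenchMarking'))
    minmaxResults = runBenchmarkNFP(benchmarkArgs{:});
end
```

Keeping the arguments in a cell array makes it easy to reuse the same reduced configuration or to widen it one argument at a time.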
To generate benchmark synthesis results with your own model using the runBenchmarkNFP function, ensure that:
Each DUT or subsystem name starts with op_.
For custom latency benchmarking, each DUT name contains the characters required to get the list of operators that support custom latency. See the purgeDutList function in runBenchmarkNFP_Vivado and runBenchmarkNFP_Quartus.
To view the list of blocks that support custom latency, see Latency Values of Floating-Point Operators.
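The op_ naming requirement can be checked before you start a long run. The following is a minimal sketch that uses hypothetical subsystem names. In a real model, you could collect the block paths with find_system and read each name with get_param. The sketch checks only the op_ prefix, not the additional naming needed for custom latency benchmarking.

```matlab
% Hypothetical subsystem names, used here for illustration only. In a model,
% collect block paths with
%   find_system(modelName, 'SearchDepth', 1, 'BlockType', 'SubSystem')
% and read each name with get_param(blockPath, 'Name').
dutNames = {'op_add', 'op_mul', 'sqrt_dut'};

% Flag every name that does not start with the required op_ prefix.
hasPrefix = startsWith(dutNames, 'op_');
badNames  = dutNames(~hasPrefix);

if isempty(badNames)
    disp('All DUT names start with op_.')
else
    fprintf('Rename these subsystems so that they start with op_: %s\n', ...
        strjoin(badNames, ', '));
end
```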