Accelerate Link-Level Simulations with Parallel Processing
This example shows how to accelerate link-level simulations by using a cluster of workers from a parallel pool.
Introduction
Link-level simulations require a large number of frames to provide statistically valid results. Therefore, these simulations can take a long time to run. Parallel computing is a common technique to speed up these simulations. This example shows how to run link-level simulations by using MATLAB® workers from a parallel pool (requires Parallel Computing Toolbox™).
Parallel Computing Toolbox enables you to use the full processing power of multicore desktops by executing applications on workers (MATLAB computational engines) that run locally. Without changing the code, you can run the same applications on clusters or clouds.
For an example of how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.
You can parallelize the link-level simulation over a number of parallel workers. Each worker runs the same link simulation with different random processes to generate random bits and noise samples. Each worker simulates numSlotsPerWorker slots. Therefore, the total number of slots in this simulation is numSlotsPerWorker × numWorkers. Each worker runs all the required SNR points. The example combines the resulting throughput measurements from all workers to produce the overall throughput.
To show how to speed up link-level simulations by using parallel processing, this example uses a simplified link-level simulation modeling a 5G link with one antenna, one layer, AWGN channel, and no HARQ.
Set Simulation Parameters
Set the SNR points and the overall number of frames to simulate.
SNRdB = 5.7:0.1:6.2; % SNR in dB
numFrames = 12;      % Number of frames to simulate
Configure the carrier, PDSCH, and DL-SCH.
carrier = nrCarrierConfig;
pdsch = nrPDSCHConfig;
pdsch.Modulation = "16QAM";
pdsch.PRBSet = 0:carrier.NSizeGrid-1; % Full band allocation
[encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder();
Configure Parallel Pool
By default, this example enables parallel execution. Alternatively, you can disable parallel execution, for example, when debugging your code.
enableParallelism = true;
Create a parallel pool and get the number of workers if parallel execution is enabled.
if (enableParallelism)
    pool = gcp; % create parallel pool, requires Parallel Computing Toolbox
    numWorkers = pool.NumWorkers;
    maxNumWorkers = pool.NumWorkers;
else
    numWorkers = 1;    % No parallelism
    maxNumWorkers = 0; % Used to convert the parfor-loop into a for-loop
end

Starting parallel pool (parpool) using the 'Processes' profile ...
01-Jul-2024 12:17:03: Job Queued. Waiting for parallel pool job with ID 1 to start ...
01-Jul-2024 12:18:04: Job Queued. Waiting for parallel pool job with ID 1 to start ...
Connected to parallel pool with 12 workers.
Configure Random Number Generator
To reproduce the same set of random bits and noise samples in a parfor-loop each time the loop runs, you must control random generation by assigning a particular substream for each worker. First, create a constant random stream to avoid unnecessary copying of the random stream multiple times to each worker. Use a generator with substream support. Substreams provide mutually independent random streams to each worker. For information about random number streams on workers, see Control Random Number Streams on Workers (Parallel Computing Toolbox) and Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox).
randStr = RandStream('Threefry','Seed',0);
constantStream = parallel.pool.Constant(randStr);
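The substream behavior described above can be checked with a minimal sketch that uses only base MATLAB and no parallel pool. It assumes that assigning the Substream property positions the stream at the start of that substream, which is the behavior the reproducibility scheme in this example relies on. The variable names are illustrative and are not part of the example.

```matlab
% Sketch: substreams of a Threefry generator are repeatable and distinct
s = RandStream('Threefry','Seed',0);

s.Substream = 3;      % Select substream 3 (for example, worker index 3)
a = rand(s,1,5);      % Draw five values from substream 3

s.Substream = 4;      % Select a different substream (another worker index)
c = rand(s,1,5);      % Draw five values from substream 4

s.Substream = 3;      % Return to the start of substream 3
b = rand(s,1,5);      % Draw five values again

disp(isequal(a,b))    % Expected to display 1: same substream, same values
disp(isequal(a,c))    % Expected to display 0: different substreams differ
```

Because the substream index is tied to the parfor index, each worker index always sees the same random sequence, regardless of which physical worker runs the iteration.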
Simulate PDSCH Throughput
Calculate the number of slots per worker by taking into account the number of frames to simulate and the available number of workers. Use the ceil function to ensure that all workers simulate the same number of slots. This operation may result in the total number of frames simulated being slightly larger than the value specified in numFrames.
% Calculate the number of slots per worker
numSlotsPerWorker = ceil((numFrames*carrier.SlotsPerFrame)/numWorkers);
disp("Parallel execution: "+enableParallelism)
Parallel execution: true
Display the number of workers. This value depends on the workers available to you and the settings of your parallel pool. This example sets the number of workers to 1 if enableParallelism = false.
disp("Number of workers: "+numWorkers)
Number of workers: 12
disp("Number of slots per worker: "+numSlotsPerWorker)
Number of slots per worker: 10
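The effect of the ceil rounding on the total simulated length can be illustrated with a short calculation. This is a sketch under stated assumptions: it hard-codes 10 slots per frame, which is consistent with the 12 frames and 120 slots reported in this example, and the list of pool sizes is hypothetical.

```matlab
% Sketch: effect of the ceil rounding on the total number of simulated slots
numFrames = 12;               % Same value as in this example
slotsPerFrame = 10;           % Assumed: consistent with 12 frames = 120 slots
numWorkers = [1 6 7 12];      % Hypothetical pool sizes

requestedSlots = numFrames*slotsPerFrame;
slotsPerWorker = ceil(requestedSlots./numWorkers);  % Equal share, rounded up
totalSlots = slotsPerWorker.*numWorkers;            % Slots actually simulated
totalFrames = totalSlots/slotsPerFrame;             % Frames actually simulated

disp(table(numWorkers.',slotsPerWorker.',totalSlots.',totalFrames.', ...
    'VariableNames',["Workers" "SlotsPerWorker" "TotalSlots" "TotalFrames"]))
```

With 7 workers, each worker simulates 18 slots, so the run covers 126 slots (12.6 frames) rather than the 120 slots requested. For the other pool sizes in this sketch the division is exact and no extra slots are simulated.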
The simulation is based on a parallel loop that uses the workers from the parallel pool. By setting maxNumWorkers = 0, you can switch between parallel and serial execution when testing your code. This setting allows you to debug your code. You cannot set a breakpoint in the body of the parfor-loop, but you can set breakpoints within functions called from the body of the parfor-loop.
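The serial fallback can be sketched in isolation. The loop below uses the same parfor (index = range,maxWorkers) form as this example with the limit set to 0, so the iterations run in the client session. The loop length and the use of the built-in power function are illustrative assumptions; in the example, the loop body calls pdschLink instead.

```matlab
% Sketch: a parfor-loop with a worker limit of 0 runs serially in the client
maxNumWorkers = 0;                 % 0 disables parallel execution
numIterations = 4;                 % Hypothetical loop length
squares = zeros(1,numIterations);  % Sliced output variable

parfor (idx = 1:numIterations,maxNumWorkers)
    % Stand-in for a call to a user-defined function such as pdschLink.
    % Breakpoints set inside such a function are hit during serial runs.
    squares(idx) = power(idx,2);
end

disp(squares)                      % Displays 1 4 9 16
```

Keeping a single loop for both modes avoids maintaining separate for and parfor versions of the simulation.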
% Results storage
numSNRPoints = numel(SNRdB);
numSlotErrorsPerWorker = zeros(numWorkers,numSNRPoints);
simulatedBitsPerWorker = zeros(numWorkers,numSNRPoints);
numCorrectBitsPerWorker = zeros(numWorkers,numSNRPoints);

% Parallel processing, worker parfor-loop
parfor (workerIdx = 1:numWorkers,maxNumWorkers)
    % Set random streams to ensure repeatability
    % Use substreams in the generator so each worker uses mutually independent streams
    stream = constantStream.Value;      % Extract the stream from the Constant
    stream.Substream = workerIdx;       % Set substream value = parfor index
    RandStream.setGlobalStream(stream); % Set global stream per worker

    % Per worker processing: PDSCH link
    resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker);

    % Gather results
    numSlotErrorsPerWorker(workerIdx,:) = resultsPerWorker.NumSlotErrors;
    simulatedBitsPerWorker(workerIdx,:) = resultsPerWorker.NumBits;
    numCorrectBitsPerWorker(workerIdx,:) = resultsPerWorker.NumCorrectBits;
end % parfor

% Combine results from all workers
totalNumTrBlkErrors = sum(numSlotErrorsPerWorker,1);
totalSimulatedTrBlks = numSlotsPerWorker*numWorkers*ones(1,numSNRPoints);
totalSimulatedFrames = totalSimulatedTrBlks/carrier.SlotsPerFrame;
totalsimulatedBits = sum(simulatedBitsPerWorker,1);
totalCorrectBits = sum(numCorrectBitsPerWorker,1);

% Throughput results calculation
throughput = 100*(1-totalNumTrBlkErrors./totalSimulatedTrBlks);
% Divide by the simulated duration, which can exceed numFrames because of
% the ceil rounding of the number of slots per worker
throughputMbps = 1e-6*totalCorrectBits./(totalSimulatedFrames*10e-3);

ResultsTable = table(SNRdB.',totalsimulatedBits.',totalNumTrBlkErrors.',totalSimulatedTrBlks.',totalSimulatedFrames.',throughput.',throughputMbps.');
ResultsTable.Properties.VariableNames = ["SNR" "Simulated bits" "Tr Block errors" "Number of Tr Blocks" "Number of frames" "Throughput (%)" "Throughput (Mbps)"];
disp(ResultsTable)
    SNR    Simulated bits    Tr Block errors    Number of Tr Blocks    Number of frames    Throughput (%)    Throughput (Mbps)
    ___    ______________    _______________    ___________________    ________________    ______________    _________________

    5.7      1.8749e+06            120                  120                   12                    0                    0
    5.8      1.8749e+06            108                  120                   12                   10               1.5624
    5.9      1.8749e+06             67                  120                   12               44.167               6.9006
      6      1.8749e+06             31                  120                   12               74.167               11.588
    6.1      1.8749e+06              6                  120                   12                   95               14.843
    6.2      1.8749e+06              1                  120                   12               99.167               15.494
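The two throughput columns can be recomputed from the block error counts as a consistency check. This is a sketch under stated assumptions: the transport block size of 15624 bits is not reported directly and is inferred here from the 1.8749e+06 simulated bits spread over 120 transport blocks, and the variable names are illustrative.

```matlab
% Sketch: recompute the throughput columns from the results table
blockErrors = [120 108 67 31 6 1];  % "Tr Block errors" column
numBlocks = 120;                    % "Number of Tr Blocks" column
numFramesSim = 12;                  % "Number of frames" column
trBlkSize = 15624;                  % Assumed: inferred from 1.8749e6 bits / 120 blocks
frameDuration = 10e-3;              % Frame duration in seconds, as in the example code

% Percentage of transport blocks received without error
throughputPct = 100*(1 - blockErrors/numBlocks);

% Correctly received bits divided by the simulated time
correctBits = (numBlocks - blockErrors)*trBlkSize;
throughputMbps = 1e-6*correctBits/(numFramesSim*frameDuration);

disp([throughputPct.' throughputMbps.'])
```

Under these assumptions, the recomputed values agree with the table to the displayed precision, for example 10% and 1.5624 Mbps at 5.8 dB.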
Accelerate Simulation
You can reduce the simulation time by increasing the number of workers. You can use all the workers on your local machine or use multiple workers in a cluster. You do not need to set the number of workers in the example code. To configure the number of workers, use the Cluster Profile Manager in the Parallel menu on the MATLAB® Home tab. For more information on how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.
The table shows the results of running the example three times for 1000 frames with different worker configurations.
Configuration | 1 Worker on Desktop (No Parallelism) | 6 Workers on Desktop | 96 Workers in Cluster |
---|---|---|---|
Simulation Time | 3543 sec (~1 hr) | 983 sec (~16 min) | 108 sec (~1.8 min) |
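The measured times can be turned into speedup and parallel efficiency figures. The sketch below uses only the three times from the table. The efficiency values are indicative at best, because the desktop and cluster runs likely use different hardware and include pool startup and data transfer overheads that the table does not break out.

```matlab
% Sketch: speedup and parallel efficiency from the measured simulation times
numWorkers = [1 6 96];
simTime = [3543 983 108];          % Seconds, from the table

speedup = simTime(1)./simTime;     % Relative to the single-worker run
efficiency = speedup./numWorkers;  % Fraction of ideal linear speedup

disp(table(numWorkers.',simTime.',speedup.',efficiency.', ...
    'VariableNames',["Workers" "TimeSec" "Speedup" "Efficiency"]))
```

The speedup is roughly 3.6 with 6 workers and roughly 32.8 with 96 workers, so the gain is substantial but below linear in both cases.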
Local Functions
function resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker)
% Simplified PDSCH link simulation executed by all workers

    resultsPerWorker.NumSlotErrors = zeros(1,numel(SNRdB));
    resultsPerWorker.NumBits = zeros(1,numel(SNRdB));
    resultsPerWorker.NumCorrectBits = zeros(numel(SNRdB),1);

    ofdmInfo = nrOFDMInfo(carrier);

    % for all SNR points
    for snrIdx = 1:length(SNRdB)

        % Noise power calculation
        SNR = 10^(SNRdB(snrIdx)/10); % Linear noise gain
        % No need to normalize N0 by the number of receive antennas as
        % there is only one
        N0 = 1/sqrt(double(ofdmInfo.Nfft)*SNR);

        % Process all the slots per worker
        for nSlot = 0:numSlotsPerWorker-1

            % New slot number
            carrier.NSlot = nSlot;

            % Transmit and receive slot (AWGN channel)
            [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0);

            % Store results
            resultsPerWorker.NumSlotErrors(snrIdx) = resultsPerWorker.NumSlotErrors(snrIdx)+blkerr;
            resultsPerWorker.NumBits(snrIdx) = resultsPerWorker.NumBits(snrIdx)+trBlkSize;
            resultsPerWorker.NumCorrectBits(snrIdx) = resultsPerWorker.NumCorrectBits(snrIdx)+sum(~blkerr .* trBlkSize);

        end % for nSlot = 0:numSlotsPerWorker-1

    end % for all SNR points

end

function [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0)

    % Generate PDSCH indices info and indices for present slot
    [pdschIndices,pdschInfo] = nrPDSCHIndices(carrier,pdsch);

    % Calculate transport block sizes
    trBlkSize = nrTBS(pdsch.Modulation,pdsch.NumLayers,numel(pdsch.PRBSet),pdschInfo.NREPerPRB,encodeDLSCH.TargetCodeRate,0);

    % Get new transport blocks (single codeword) and flush decoder soft buffer
    trBlk = randi([0 1],trBlkSize,1);
    setTransportBlock(encodeDLSCH,trBlk);
    decodeDLSCH.TransportBlockLength = trBlkSize;
    resetSoftBuffer(decodeDLSCH,0);

    % DL-SCH encoding
    codedTrBlock = encodeDLSCH(pdsch.Modulation,pdsch.NumLayers,pdschInfo.G,0);

    % PDSCH encoding
    pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);

    % Create resource grid and map PDSCH
    pdschGrid = nrResourceGrid(carrier,1,"OutputDataType","single");
    pdschGrid(pdschIndices) = pdschSymbols;

    % OFDM modulation
    [txWaveform,waveformInfo] = nrOFDMModulate(carrier,pdschGrid);

    % AWGN channel
    noise = N0*randn(size(txWaveform),"like",txWaveform);
    rxWaveform = txWaveform + noise;

    % OFDM demodulation
    rxGrid = nrOFDMDemodulate(carrier,rxWaveform);

    % Extract PDSCH
    pdschRx = nrExtractResources(pdschIndices,rxGrid);

    % PDSCH decoding, assume noise variance is known
    noiseEst = (N0.^2*waveformInfo.Nfft);
    [dlschLLRs,~] = nrPDSCHDecode(carrier,pdsch,pdschRx,noiseEst);

    % DL-SCH decoding
    [~,blkerr] = decodeDLSCH(dlschLLRs,pdsch.Modulation,pdsch.NumLayers,0);

end

function [encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder()

    % Coding rate
    codeRate = 490/1024;

    % Create DL-SCH encoder object
    encodeDLSCH = nrDLSCH;
    encodeDLSCH.MultipleHARQProcesses = false;
    encodeDLSCH.TargetCodeRate = codeRate;

    % Create DL-SCH decoder object
    decodeDLSCH = nrDLSCHDecoder;
    decodeDLSCH.MultipleHARQProcesses = false;
    decodeDLSCH.TargetCodeRate = codeRate;
    decodeDLSCH.LDPCDecodingAlgorithm = "Normalized min-sum";
    decodeDLSCH.MaximumLDPCIterationCount = 20;

end
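The noise scaling shared by pdschLink and slotTxRxAWGN can be followed with a short derivation. The symbols are introduced here for illustration only: \gamma denotes the linear SNR computed in pdschLink and N_FFT denotes the FFT size returned by nrOFDMInfo. The derivation assumes that the FFT size reported by nrOFDMModulate matches the one from nrOFDMInfo, which holds when both are derived from the same carrier configuration.

```latex
\begin{aligned}
\gamma &= 10^{\mathrm{SNR_{dB}}/10}, \\
N_0 &= \frac{1}{\sqrt{N_{\mathrm{FFT}}\,\gamma}}, \\
\mathrm{noiseEst} &= N_0^{2}\,N_{\mathrm{FFT}}
  = \frac{N_{\mathrm{FFT}}}{N_{\mathrm{FFT}}\,\gamma}
  = \frac{1}{\gamma}.
\end{aligned}
```

Under that assumption, the noise variance passed to nrPDSCHDecode equals the reciprocal of the linear SNR and does not depend on the FFT size.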
See Also
parfor (Parallel Computing Toolbox)
Related Topics
- Scale Up from Desktop to Cluster (Parallel Computing Toolbox)
- Control Random Number Streams on Workers (Parallel Computing Toolbox)
- Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox)