# Lidar 3-D Object Detection Using PointPillars Deep Learning

This example shows how to train a PointPillars network for object detection in point clouds.

Lidar point cloud data can be acquired by a variety of lidar sensors, including Velodyne®, Pandar, and Ouster sensors. These sensors capture 3-D position information about objects in a scene, which is useful for many applications in autonomous driving and augmented reality. However, training robust detectors with point cloud data is challenging because of the sparsity of data per object, object occlusions, and sensor noise. Deep learning techniques have been shown to address many of these challenges by learning robust feature representations directly from point cloud data. One deep learning technique for 3-D object detection is PointPillars [1]. Using an architecture similar to PointNet, the PointPillars network extracts dense, robust features from sparse point clouds that are organized into vertical columns called pillars. It then uses a 2-D deep learning network with a modified SSD object detection network to jointly estimate 3-D bounding boxes, orientations, and class predictions.

This example uses the PandaSet [2] data set from Hesai and Scale. PandaSet contains 8240 unorganized lidar point cloud scans of various city scenes captured using a Pandar64 sensor. The data set provides 3-D bounding box labels for 18 different object classes, including car, truck, and pedestrian.

This example uses a subset of PandaSet that contains 2560 preprocessed organized point clouds. Each point cloud covers $360^{\circ}$ of view and is specified as a 64-by-1856 matrix. The point clouds are stored in PCD format, and their corresponding ground truth data is stored in the PandaSetLidarGroundTruth.mat file. The file contains 3-D bounding box information for three classes, which are car, truck, and pedestrian. The size of the data set is 5.2 GB.

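doTraining = false;

outputFolder = fullfile(tempdir,'Pandaset');

lidarURL = ['https://ssd.mathworks.com/supportfiles/lidar/data/' ...
    'Pandaset_LidarData.tar.gz'];

% Download and extract the data set using the download helper function
% defined at the end of this example.
helperDownloadPandasetData(outputFolder,lidarURL);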

Create a file datastore to load the PCD files from the specified path using the pcread (Computer Vision Toolbox) function.

path = fullfile(outputFolder,'Lidar');
lidarData = fileDatastore(path,'ReadFcn',@(x) pcread(x));

Load the 3-D bounding box labels of the car and truck objects.

gtPath = fullfile(outputFolder,'Cuboids','PandaSetLidarGroundTruth.mat');
data = load(gtPath,'lidarGtLabels');
Labels = timetable2table(data.lidarGtLabels);
boxLabels = Labels(:,2:3);

Display the full-view point cloud.

figure
ptCld = read(lidarData);
ax = pcshow(ptCld.Location);
set(ax,'XLim',[-50 50],'YLim',[-40 40]);
zoom(ax,2.5);
axis off;

reset(lidarData);

### Preprocess Data

The PandaSet data consists of full-view point clouds. For this example, crop the full-view point clouds to front-view point clouds using the standard parameters [1]. These parameters determine the size of the input passed to the network. Selecting a smaller point cloud range along the x-, y-, and z-axes helps detect objects that are closer to the origin and also decreases the overall training time of the network.

xMin = 0.0;     % Minimum value along X-axis.
yMin = -39.68;  % Minimum value along Y-axis.
zMin = -5.0;    % Minimum value along Z-axis.
xMax = 69.12;   % Maximum value along X-axis.
yMax = 39.68;   % Maximum value along Y-axis.
zMax = 5.0;     % Maximum value along Z-axis.
xStep = 0.16;   % Resolution along X-axis.
yStep = 0.16;   % Resolution along Y-axis.
dsFactor = 2.0; % Downsampling factor.

% Calculate the dimensions for the pseudo-image.
Xn = round(((xMax - xMin) / xStep));
Yn = round(((yMax - yMin) / yStep));

% Define point cloud parameters.
pointCloudRange = [xMin,xMax,yMin,yMax,zMin,zMax];
voxelSize = [xStep,yStep];
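As a rough check on these settings, the following sketch computes the pseudo-image grid size and maps a few points to pillar indices. The sample points and the names `pts`, `xIdx`, and `yIdx` are illustrative assumptions and are not part of the example's supporting files. The indexing scheme (floor of the offset divided by the step) is one common way to bin points and may differ from what the detector does internally.

```matlab
% Grid parameters copied from the example.
xMin = 0.0;     xMax = 69.12;   xStep = 0.16;
yMin = -39.68;  yMax = 39.68;   yStep = 0.16;

% Pseudo-image dimensions (number of pillars along each axis).
Xn = round((xMax - xMin) / xStep);
Yn = round((yMax - yMin) / yStep);

% Hypothetical points [x y z] in meters, for illustration only.
pts = [ 0.05 -39.60  0.0;
       10.00   0.05 -1.2;
       69.00  39.60  0.5];

% 1-based pillar index of each point along X and Y, clamped to the grid.
xIdx = min(floor((pts(:,1) - xMin) / xStep) + 1, Xn);
yIdx = min(floor((pts(:,2) - yMin) / yStep) + 1, Yn);

disp([xIdx yIdx])
fprintf('Grid: %d x %d = %d possible pillars\n',Xn,Yn,Xn*Yn);
```

With these settings the grid is 432-by-496, which gives 214272 possible pillar locations. Most of them are empty in a typical scan, which is why the detector defined later in the example keeps only a fixed number of prominent pillars.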

Use the cropFrontViewFromLidarData helper function, attached to this example as a supporting file, to:

• Crop the front view from the input full-view point cloud.

• Select the box labels that are inside the ROI specified by pointCloudRange.

[croppedPointCloudObj,processedLabels] = cropFrontViewFromLidarData(...
lidarData,boxLabels,pointCloudRange);
Processing data 100% complete
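The idea behind the cropping step can be sketched with a logical mask over point locations. This is a simplified illustration, not the cropFrontViewFromLidarData helper itself: the point and box values are made up, boxes are tested by their centers only, and the helper may apply different criteria.

```matlab
% Region of interest [xMin xMax yMin yMax zMin zMax], as in the example.
pointCloudRange = [0.0 69.12 -39.68 39.68 -5.0 5.0];

% Hypothetical point locations [x y z] in meters.
xyz = [ 12.0   3.0  -1.0;    % inside the ROI
        -5.0   2.0   0.0;    % behind the sensor (x < xMin)
        30.0 -45.0   0.5;    % outside the Y range
        60.0  10.0   7.0];   % above the Z range

% Hypothetical cuboid centers [xctr yctr zctr].
boxCenters = [ 20.0  5.0 -0.5;
              -10.0  0.0 -0.5];

% Logical mask: true for rows that lie inside the ROI on all three axes.
insideROI = @(p) p(:,1) >= pointCloudRange(1) & p(:,1) <= pointCloudRange(2) & ...
                 p(:,2) >= pointCloudRange(3) & p(:,2) <= pointCloudRange(4) & ...
                 p(:,3) >= pointCloudRange(5) & p(:,3) <= pointCloudRange(6);

croppedXYZ = xyz(insideROI(xyz),:);
keptBoxes  = boxCenters(insideROI(boxCenters),:);

disp(croppedXYZ)
disp(keptBoxes)
```

Only the first point and the first box center survive the mask, because every other row violates at least one axis limit.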

Display the cropped point cloud and the ground truth box labels using the helperDisplay3DBoxesOverlaidPointCloud helper function defined at the end of the example.

pc = croppedPointCloudObj{1,1};
gtLabelsCar = processedLabels.Car{1};
gtLabelsTruck = processedLabels.Truck{1};

helperDisplay3DBoxesOverlaidPointCloud(pc.Location,gtLabelsCar,...
'green',gtLabelsTruck,'magenta','Cropped Point Cloud');

reset(lidarData);

### Create Datastore Objects for Training

Split the data set into training and test sets. Select 70% of the data for training the network and the rest for evaluation.

rng(1);
shuffledIndices = randperm(size(processedLabels,1));
idx = floor(0.7 * length(shuffledIndices));

trainData = croppedPointCloudObj(shuffledIndices(1:idx),:);
testData = croppedPointCloudObj(shuffledIndices(idx+1:end),:);

trainLabels = processedLabels(shuffledIndices(1:idx),:);
testLabels = processedLabels(shuffledIndices(idx+1:end),:);

To make the data easy to access through datastores, save the training data as PCD files by using the saveptCldToPCD helper function, attached to this example as a supporting file. You can set writeFiles to false if your training data is already saved in a folder in a format supported by the pcread function.

writeFiles = true;
dataLocation = fullfile(outputFolder,'InputData');
[trainData,trainLabels] = saveptCldToPCD(trainData,trainLabels,...
dataLocation,writeFiles);
Processing data 100% complete

Create a file datastore using fileDatastore to load PCD files using the pcread (Computer Vision Toolbox) function.

lds = fileDatastore(dataLocation,'ReadFcn',@(x) pcread(x));

Create a box label datastore using boxLabelDatastore (Computer Vision Toolbox) for loading the 3-D bounding box labels.

bds = boxLabelDatastore(trainLabels);

Use the combine function to combine the point clouds and 3-D bounding box labels into a single datastore for training.

cds = combine(lds,bds);

### Data Augmentation

This example uses ground truth data augmentation and several other global data augmentation techniques to add more variety to the training data and corresponding boxes. For more information on typical data augmentation techniques used in 3-D object detection workflows with lidar data, see Data Augmentations for Lidar Object Detection Using Deep Learning (Lidar Toolbox).

Read and display a point cloud before augmentation using the helperDisplay3DBoxesOverlaidPointCloud helper function, defined at the end of the example.

augData = read(cds);
augptCld = augData{1,1};
augLabels = augData{1,2};
augClass = augData{1,3};

labelsCar = augLabels(augClass=='Car',:);
labelsTruck = augLabels(augClass=='Truck',:);

helperDisplay3DBoxesOverlaidPointCloud(augptCld.Location,labelsCar,'green',...
labelsTruck,'magenta','Before Data Augmentation');

reset(cds);

Use the sampleGroundTruthObjectsFromLidarData helper function, attached to this example as a supporting file, to extract all the ground truth bounding boxes from the training data.

classNames = {'Car','Truck'};
sampleLocation = fullfile(tempdir,'GTsamples');
[sampledGTData,indices] = sampleGroundTruthObjectsFromLidarData(cds,classNames,...
'MinPoints',20,'sampleLocation',sampleLocation);

Use the augmentGroundTruthObjectsToLidarData helper function, attached to this example as a supporting file, to randomly add a fixed number of car and truck class objects to every point cloud. Use the transform function to apply the ground truth and custom data augmentations to the training data.

numObjects = [10,10];
cdsAugmented = transform(cds,@(x) augmentGroundTruthObjectsToLidarData(x,...
sampledGTData,indices,classNames,numObjects));

In addition, apply the following data augmentations to every point cloud.

• Random flipping along the x-axis

• Random scaling by 5 percent

• Random rotation about the z-axis by an angle in [-pi/4, pi/4]

• Random translation by [0.2, 0.2, 0.1] meters along the x-, y-, and z-axes, respectively

cdsAugmented = transform(cdsAugmented,@(x) augmentData(x));
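The four global augmentations listed above can be sketched on a small matrix of points and one cuboid. This is not the augmentData helper, whose implementation is not shown here. The sketch assumes that cuboids use the format [xctr yctr zctr xlen ylen zlen xrot yrot zrot] with angles in degrees, that flipping means mirroring across the x-axis, and that the translation values are per-axis upper bounds. All input values are made up.

```matlab
rng(0);  % Fixed seed so that the sketch is repeatable.

% Hypothetical inputs: points [x y z] and one cuboid.
pts = [10 2 -1; 12 3 -1; 11 2.5 0];
box = [11 2.5 -0.5 4 2 1.5 0 0 30];

% 1. Random flip across the x-axis (negate y and the yaw angle).
if rand > 0.5
    pts(:,2) = -pts(:,2);
    box([2 9]) = -box([2 9]);
end

% 2. Random scaling by up to 5 percent.
s = 0.95 + 0.10*rand;
pts = pts * s;
box(1:6) = box(1:6) * s;

% 3. Random rotation about the z-axis by an angle in [-pi/4, pi/4].
theta = (rand - 0.5) * pi/2;
R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
pts = pts * R';
box(1:3) = box(1:3) * R';
box(9) = box(9) + rad2deg(theta);

% 4. Random translation of up to [0.2 0.2 0.1] meters per axis.
t = (2*rand(1,3) - 1) .* [0.2 0.2 0.1];
pts = pts + t;
box(1:3) = box(1:3) + t;

disp(pts)
disp(box)
```

Because the same transform is applied to the points and to the box, the box keeps its position relative to the points it encloses, up to the scale factor.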

Display an augmented point cloud along with ground truth augmented boxes using the helperDisplay3DBoxesOverlaidPointCloud helper function, defined at the end of the example.

augData = read(cdsAugmented);
augptCld = augData{1,1};
augLabels = augData{1,2};
augClass = augData{1,3};

labelsCar = augLabels(augClass=='Car',:);
labelsTruck = augLabels(augClass=='Truck',:);

helperDisplay3DBoxesOverlaidPointCloud(augptCld.Location,labelsCar,'green',...
labelsTruck,'magenta','After Data Augmentation');

reset(cdsAugmented);

### Create PointPillars Object Detector

Use the pointPillarsObjectDetector (Lidar Toolbox) function to create a PointPillars object detection network automatically. The PointPillars network uses a simplified version of the PointNet network that takes pillar features as input. For each pillar feature, the network applies a linear layer, followed by batch normalization and ReLU layers. Finally, the network applies a max-pooling operation over the channels to get high-level encoded features. These encoded features are scattered back to the original pillar locations to create a pseudo-image. The network then processes the pseudo-image with a 2-D convolutional backbone followed by various SSD detection heads to predict the 3-D bounding boxes along with their classes.
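The encoder steps described above can be sketched with plain matrices. This is a toy illustration with random weights and tiny sizes, not the network that pointPillarsObjectDetector builds. Batch normalization is left out, the pillar locations are made up, and the max is taken across the points of each pillar, which is how this sketch interprets the pooling step. The feature length of 9 follows [1], which also uses 64 output channels rather than the 8 used here.

```matlab
rng(0);

P = 4;             % Number of non-empty pillars (the example uses 12000).
N = 5;             % Points per pillar (the example uses 100).
D = 9;             % Decorated point feature length, as in [1].
C = 8;             % Output channels (kept small for this sketch).
Xn = 6;  Yn = 6;   % Tiny pseudo-image grid for illustration.

% Hypothetical pillar features and pillar grid locations.
features = randn(P,N,D);
pillarXY = [1 1; 2 5; 4 3; 6 6];    % [xIdx yIdx] of each pillar

% Linear layer with random weights, followed by ReLU.
W = randn(D,C);
hidden = reshape(features,P*N,D) * W;   % (P*N)-by-C
hidden = max(hidden,0);                 % ReLU
hidden = reshape(hidden,P,N,C);

% Max over the N points of each pillar gives one C-vector per pillar.
encoded = squeeze(max(hidden,[],2));    % P-by-C

% Scatter the encoded pillars back to their grid locations.
pseudoImage = zeros(Xn,Yn,C);
for k = 1:P
    pseudoImage(pillarXY(k,1),pillarXY(k,2),:) = encoded(k,:);
end

disp(size(pseudoImage))
```

The result is a dense Xn-by-Yn-by-C array that is zero everywhere except at the occupied pillar locations, which is the form a 2-D convolutional backbone expects.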

The PointPillars network present in the PointPillars detector is illustrated in the following diagram.

You can use Deep Network Designer to create the network shown in the diagram.

The pointPillarsObjectDetector function requires you to specify several inputs that parameterize the PointPillars network:

• Class names

• Anchor boxes

• Point cloud range

• Voxel size

• Number of prominent pillars

• Number of points per pillar

% Define number of prominent pillars.
P = 12000;

% Define number of points per pillar.
N = 100;

% Estimate anchor boxes from training data.
anchorBoxes = calculateAnchorsPointPillars(trainLabels);
classNames = trainLabels.Properties.VariableNames;

% Define the PointPillars detector.
detector = pointPillarsObjectDetector(pointCloudRange,classNames,anchorBoxes,...
'VoxelSize',voxelSize,'NumPillars',P,'NumPointsPerPillar',N);

If more control is required over the PointPillars network architecture, you can design the network manually. For more information, see Design a PointPillars Network.

### Train PointPillars Object Detector

Specify the network training parameters using trainingOptions. Set 'CheckpointPath' to a temporary location to enable saving of partially trained detectors during the training process. If training is interrupted, you can resume training from the saved checkpoint.

Train the detector using a CPU or GPU. Using a GPU requires Parallel Computing Toolbox™ and a CUDA® enabled NVIDIA® GPU. For more information, see GPU Support by Release (Parallel Computing Toolbox). To automatically detect if you have a GPU available, set executionEnvironment to "auto". If you do not have a GPU, or do not want to use one for training, set executionEnvironment to "cpu". To ensure the use of a GPU for training, set executionEnvironment to "gpu".

executionEnvironment = "auto";
if canUseParallelPool
dispatchInBackground = true;
else
dispatchInBackground = false;
end

options = trainingOptions('adam',...
    'Plots',"none",...
    'MaxEpochs',60,...
    'MiniBatchSize',3,...
    'LearnRateSchedule',"piecewise",...
    'InitialLearnRate',0.0002,...
    'LearnRateDropPeriod',15,...
    'LearnRateDropFactor',0.8,...
    'ExecutionEnvironment',executionEnvironment,...
    'DispatchInBackground',dispatchInBackground,...
    'BatchNormalizationStatistics','moving',...
    'ResetInputNormalization',false,...
    'CheckpointPath',tempdir);
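To see what the piecewise schedule implies for these settings, the following sketch computes the learning rate for each epoch. It assumes that the rate is multiplied by the drop factor each time a full drop period has passed, which matches the documented behavior of the piecewise schedule as far as this sketch is concerned. The variable names are illustrative.

```matlab
initialLearnRate = 0.0002;
dropPeriod = 15;
dropFactor = 0.8;
maxEpochs = 60;

% Learning rate used during each epoch under the assumed schedule.
epochs = 1:maxEpochs;
learnRate = initialLearnRate * dropFactor.^floor((epochs - 1)/dropPeriod);

% Learning rate at the first epoch of each 15-epoch stage.
disp(learnRate([1 16 31 46]))
```

Under this assumption the rate steps from 2e-4 to 1.6e-4, 1.28e-4, and finally 1.024e-4 for the last 15 epochs.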

Use the trainPointPillarsObjectDetector (Lidar Toolbox) function to train the PointPillars object detector if doTraining is true. Otherwise, load the pretrained detector.

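if doTraining
    [detector,info] = trainPointPillarsObjectDetector(cdsAugmented,detector,options);
else
    % Assumes that the pretrained detector MAT-file is on the MATLAB path.
    pretrainedDetector = load('pretrainedPointPillarsDetector.mat','detector');
    detector = pretrainedDetector.detector;
end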

### Generate Detections

Use the trained network to detect objects in the test data:

• Read the point cloud from the test data.

• Run the detector on the test point cloud to get the predicted bounding boxes and confidence scores.

• Display the point cloud with bounding boxes using the helperDisplay3DBoxesOverlaidPointCloud helper function, defined at the end of the example.

ptCloud = testData{45,1};
gtLabels = testLabels(45,:);

% Specify the confidence threshold to use only detections with
% confidence scores above this value.
confidenceThreshold = 0.5;
[box,score,labels] = detect(detector,ptCloud,'Threshold',confidenceThreshold);

boxlabelsCar = box(labels'=='Car',:);
boxlabelsTruck = box(labels'=='Truck',:);

% Display the predictions on the point cloud.
helperDisplay3DBoxesOverlaidPointCloud(ptCloud.Location,boxlabelsCar,'green',...
boxlabelsTruck,'magenta','Predicted Bounding Boxes');

### Evaluate Detector Using Test Set

Evaluate the trained object detector on a large set of point cloud data to measure the performance.

numInputs = 50;

% Generate rotated rectangles from the cuboid labels.
bds = boxLabelDatastore(testLabels(1:numInputs,:));
groundTruthData = transform(bds,@(x) createRotRect(x));

% Set the threshold values.
nmsPositiveIoUThreshold = 0.5;
confidenceThreshold = 0.25;

detectionResults = detect(detector,testData(1:numInputs,:),...
'Threshold',confidenceThreshold);

% Convert to rotated rectangles format for calculating metrics
for i = 1:height(detectionResults)
box = detectionResults.Boxes{i};
detectionResults.Boxes{i} = box(:,[1,2,4,5,7]);
end

metrics = evaluateDetectionAOS(detectionResults,groundTruthData,...
nmsPositiveIoUThreshold);
disp(metrics(:,1:2))
           AOS        AP
         _______    _______

Car      0.89735    0.89735
Truck    0.758      0.758
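To give a feel for the 0.5 overlap threshold, the following sketch computes intersection over union (IoU) for two rectangles in the ground plane. It is a simplification: the example evaluates rotated rectangles, while this sketch ignores the yaw angle and treats both rectangles as axis-aligned. The rectangle values are made up, and the assumption that a detection counts as a match when its IoU reaches the threshold is a common convention rather than a statement about how evaluateDetectionAOS works internally.

```matlab
% Axis-aligned rectangles [xctr yctr xlen ylen] (hypothetical values).
gtRect  = [10.0 5.0 4.0 2.0];
detRect = [10.5 5.0 4.0 2.0];

% Convert to corner form [xmin ymin xmax ymax].
toCorners = @(r) [r(1)-r(3)/2, r(2)-r(4)/2, r(1)+r(3)/2, r(2)+r(4)/2];
a = toCorners(gtRect);
b = toCorners(detRect);

% Overlap width and height (zero when the rectangles do not intersect).
w = max(0, min(a(3),b(3)) - max(a(1),b(1)));
h = max(0, min(a(4),b(4)) - max(a(2),b(2)));

intersection = w * h;
unionArea = gtRect(3)*gtRect(4) + detRect(3)*detRect(4) - intersection;
iou = intersection / unionArea;

% Compare against the overlap threshold used in the example.
isMatch = iou >= 0.5;
fprintf('IoU = %.4f, match = %d\n',iou,isMatch);
```

Here a detection shifted by 0.5 m along x still overlaps the ground truth with an IoU of 7/9, so it clears the 0.5 threshold.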

### Helper Functions

function helperDownloadPandasetData(outputFolder,lidarURL)
% Download the data set from the given URL to the output folder.

    lidarDataTarFile = fullfile(outputFolder,'Pandaset_LidarData.tar.gz');

    if ~exist(lidarDataTarFile,'file')
        mkdir(outputFolder);

        websave(lidarDataTarFile,lidarURL);
        untar(lidarDataTarFile,outputFolder);
    end

    % Extract the file.
    if (~exist(fullfile(outputFolder,'Lidar'),'dir'))...
            &&(~exist(fullfile(outputFolder,'Cuboids'),'dir'))
        untar(lidarDataTarFile,outputFolder);
    end

end

function helperDisplay3DBoxesOverlaidPointCloud(ptCld,labelsCar,carColor,...
    labelsTruck,truckColor,titleForFigure)
% Display the point cloud with different colored bounding boxes for different
% classes.
    figure;
    ax = pcshow(ptCld);
    showShape('cuboid',labelsCar,'Parent',ax,'Opacity',0.1,...
        'Color',carColor,'LineWidth',0.5);
    hold on;
    showShape('cuboid',labelsTruck,'Parent',ax,'Opacity',0.1,...
        'Color',truckColor,'LineWidth',0.5);
    title(titleForFigure);
    zoom(ax,1.5);
end

#### References

[1] Lang, Alex H., Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. "PointPillars: Fast Encoders for Object Detection From Point Clouds." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12689-12697. Long Beach, CA, USA: IEEE, 2019. https://doi.org/10.1109/CVPR.2019.01298.

[2] Hesai and Scale. PandaSet. https://scale.com/open-datasets/pandaset.