Main Content

Train Classification Network to Classify Object in 3-D Point Cloud

This example demontrates the approach outlined in [1] in which point cloud data is preprocessed into a voxelized encoding and then used directly with a simple 3-D convolutional neural network architecture to perform object classification. In more recent approaches such as [2], encodings of point cloud data can be more complicated and can be learned encodings that are trained end-to-end along with a network performing a classification/object detection/segmentation task. However, the general pattern of moving from irregular unordered points to a gridded structure that can be fed into convnets remains similar in all of these appraoches.

Import and Analyze Data

In this example, we work with the Sydney Urban Objects Dataset. In this example, we use folds 1-3 from the data as the training set and fold 4 as the validation set.

dataPath = downloadSydneyUrbanObjects(tempdir);
dsTrain = loadSydneyUrbanObjectsData(dataPath,[1 2 3]);
dsVal = loadSydneyUrbanObjectsData(dataPath,4);

Analyze the training set to understand the labels present in the data and the overall distribution of labels.

dsLabels = transform(dsTrain,@(data) data{2});
labels = readall(dsLabels);
figure
histogram(labels)

From the histogram, it is apparent that there is a class imbalance issue in the training data in which certain object classes like Car and Pedestrian are much more common than less frequent classes like Ute.

Data augmentation pipeline

To avoid overfitting and add robustness to a classifier, some amount of randomized data augmentation is generally a good idea when training a network. The functions randomAffine2d and pctransform make it easy to define randomized affine transformations on point cloud data. We additionally add some randomized per-point jitter to each point in every point cloud. The function augmentPointCloudData is included in the supporting functions section below.

dsTrain = transform(dsTrain,@augmentPointCloudData);

Verify that augmentation of point cloud data looks reasonable.

dataOut = preview(dsTrain);
figure
pcshow(dataOut{1});
title(dataOut{2});

We next add a simple voxelization transform to each input point cloud as discussed in the previous example, to transform our input point cloud into a pseudo-image that can be used with a convolutional neural network. Use a simple occupancy grid.

dsTrain = transform(dsTrain,@formOccupancyGrid);
dsVal = transform(dsVal,@formOccupancyGrid);

Examine a sample of the final voxelized volume that we will feed into the network to verify that voxelixation is working correctly.

data = preview(dsTrain);
figure
p = patch(isosurface(data{1},0.5));
p.FaceColor = 'red';
p.EdgeColor = 'none';
daspect([1 1 1])
view(45,45)
camlight; 
lighting phong
title(data{2});

Define network architecture

In this example, we use a simple 3-D classification architecture as described in [1].

layers = [image3dInputLayer([32 32 32],'Name','inputLayer','Normalization','none'),...
    convolution3dLayer(5,32,'Stride',2,'Name','Conv1'),...
    leakyReluLayer(0.1,'Name','leakyRelu1'),...
    convolution3dLayer(3,32,'Stride',1,'Name','Conv2'),...
    leakyReluLayer(0.1,'Name','leakyRulu2'),...
    maxPooling3dLayer(2,'Stride',2,'Name','maxPool'),...
    fullyConnectedLayer(128,'Name','fc1'),...
    reluLayer('Name','relu'),...
    dropoutLayer(0.5,'Name','dropout1'),...
    fullyConnectedLayer(14,'Name','fc2'),...
    softmaxLayer('Name','softmax'),...
    classificationLayer('Name','crossEntropyLoss')];

voxnet = layerGraph(layers);
figure
plot(voxnet);

Setup training options

Use stochastic gradient descent with momentum with a piecewise adjustment to the learning rate schedule. This example was run on a TitanX GPU, for GPUs with less memory, it may be necessary to reduce the batch size. Though 3D convnets have an advantage of conceptual simplicity, they have the drawback of large amounts of memory useage at training time.

miniBatchSize = 32;
dsLength = length(dsTrain.UnderlyingDatastore.Files);
iterationsPerEpoch = floor(dsLength/miniBatchSize);
dropPeriod = floor(8000/iterationsPerEpoch);

options = trainingOptions('sgdm','InitialLearnRate',0.01,'MiniBatchSize',miniBatchSize,...
    'LearnRateSchedule','Piecewise',...
    'LearnRateDropPeriod',dropPeriod,...
    'ValidationData',dsVal,'MaxEpochs',60,...
    'DispatchInBackground',false,...
    'Shuffle','never');

Train network

voxnet = trainNetwork(dsTrain,voxnet,options);
Training on single CPU.
|======================================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Validation  |  Mini-batch  |  Validation  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |   Accuracy   |     Loss     |     Loss     |      Rate       |
|======================================================================================================================|
|       1 |           1 |       00:00:12 |        0.00% |        3.23% |       2.6579 |       2.6466 |          0.0100 |
|       4 |          50 |       00:01:53 |       31.25% |       29.03% |       2.1520 |       2.3095 |          0.0100 |
|       8 |         100 |       00:03:33 |       28.12% |       36.77% |       2.2633 |       2.1510 |          0.0100 |
|      12 |         150 |       00:05:11 |       43.75% |       46.45% |       2.0506 |       1.9057 |          0.0100 |
|      16 |         200 |       00:06:49 |       37.50% |       52.26% |       1.8627 |       1.6161 |          0.0100 |
|      20 |         250 |       00:08:35 |       50.00% |       59.35% |       1.8573 |       1.4587 |          0.0100 |
|      24 |         300 |       00:10:14 |       34.38% |       58.06% |       1.8636 |       1.4360 |          0.0100 |
|      27 |         350 |       00:11:51 |       62.50% |       61.94% |       1.4174 |       1.3093 |          0.0100 |
|      31 |         400 |       00:13:31 |       65.62% |       64.52% |       1.1966 |       1.2727 |          0.0100 |
|      35 |         450 |       00:15:09 |       56.25% |       61.94% |       1.3562 |       1.2473 |          0.0100 |
|      39 |         500 |       00:16:49 |       62.50% |       66.45% |       1.2819 |       1.1354 |          0.0100 |
|      43 |         550 |       00:18:27 |       56.25% |       65.16% |       1.4563 |       1.1351 |          0.0100 |
|      47 |         600 |       00:20:05 |       56.25% |       66.45% |       1.3096 |       1.1142 |          0.0100 |
|      50 |         650 |       00:21:40 |       56.25% |       65.16% |       1.0104 |       1.1023 |          0.0100 |
|      54 |         700 |       00:23:21 |       75.00% |       70.32% |       0.9403 |       1.0848 |          0.0100 |
|      58 |         750 |       00:25:00 |       65.62% |       71.61% |       1.0909 |       1.1003 |          0.0100 |
|      60 |         780 |       00:25:59 |       65.62% |       72.26% |       0.9628 |       1.0406 |          0.0100 |
|======================================================================================================================|

Evaluate network

Following the structure of [1], this example only forms a training and validation set from Sydney Urban Objects. Evaluate the performance of the trained network using the validation, since it was not used to train the network.

valLabelSet = transform(dsVal,@(data) data{2});
valLabels = readall(valLabelSet);
outputLabels = classify(voxnet,dsVal);
accuracy = nnz(outputLabels == valLabels) / numel(outputLabels);
disp(accuracy)
    0.7226

View the confusion matrix to study the accuracy across the various label categories

confusionchart(valLabels,outputLabels) 

The label imbalance noted in the training set is an issue in the classification accuracy. The confusion chart illustrates higher precision and recall for pedestrian, the most common class, than for less common classes like van. Since the purpose of this example is to demonstate a basic classification network training approach with point cloud data, possible next steps that could be taken to improve classification performance such as resampling the training set or achieve better label balance or using a loss function more robust to label imbalance (e.g. weighted cross-entropy) will not be explored.

References

1) Voxnet: A 3d convolutional neural network for real-time object recognition, Daniel Maturana, Sebastian Scherer, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

2) PointPillars: Fast Encoders for Object Detection from Point Clouds, Alex H. Lang, Sourabh Vora, et al, CVPR 2019

3) Sydney Urban Objects Dataset, Alastair Quadros, James Underwood, Bertrand Douillard, Sydney Urban Objects

Supporting Functions

function datasetPath = downloadSydneyUrbanObjects(dataLoc)

if nargin == 0
    dataLoc = pwd();
end

dataLoc = string(dataLoc);

url = "http://www.acfr.usyd.edu.au/papers/data/";
name = "sydney-urban-objects-dataset.tar.gz";

if ~exist(fullfile(dataLoc,'sydney-urban-objects-dataset'),'dir')
    disp('Downloading Sydney Urban Objects Dataset...');
    untar(url+name,dataLoc);
end

datasetPath = dataLoc.append('sydney-urban-objects-dataset');

end

function ds = loadSydneyUrbanObjectsData(datapath,folds)
% loadSydneyUrbanObjectsData Datastore with point clouds and
% associated categorical labels for Sydney Urban Objects dataset.
%
% ds = loadSydneyUrbanObjectsData(datapath) constructs a datastore that
% represents point clouds and associated categories for the Sydney Urban
% Objects dataset. The input, datapath, is a string or char array which
% represents the path to the root directory of the Sydney Urban Objects
% Dataset.
%
% ds = loadSydneyUrbanObjectsData(___,folds) optionally allows
% specification of desired folds that you wish to be included in the
% output ds. For example, [1 2 4] specifies that you want the first,
% second, and fourth folds of the Dataset. Default: [1 2 3 4].

if nargin < 2
    folds = 1:4;
end

datapath = string(datapath);
path = fullfile(datapath,'objects',filesep);

% For now, include all folds in Datastore
foldNames{1} = importdata(fullfile(datapath,'folds','fold0.txt'));
foldNames{2} = importdata(fullfile(datapath,'folds','fold1.txt'));
foldNames{3} = importdata(fullfile(datapath,'folds','fold2.txt'));
foldNames{4} = importdata(fullfile(datapath,'folds','fold3.txt'));
names = foldNames(folds);
names = vertcat(names{:});

fullFilenames = append(path,names);
ds = fileDatastore(fullFilenames,'ReadFcn',@extractTrainingData,'FileExtensions','.bin');

% Shuffle
ds.Files = ds.Files(randperm(length(ds.Files)));

end

function dataOut = extractTrainingData(fname)

[pointData,intensity] = readbin(fname);

[~,name] = fileparts(fname);
name = string(name);
name = extractBefore(name,'.');
name = replace(name,'_',' ');

labelNames = ["4wd","building","bus","car","pedestrian","pillar",...
    "pole","traffic lights","traffic sign","tree","truck","trunk","ute","van"];

label = categorical(name,labelNames);

dataOut = {pointCloud(pointData,'Intensity',intensity),label};

end

function [pointData,intensity] = readbin(fname)
% readbin Read point and intensity data from Sydney Urban Object binary
% files.

% names = ['t','intensity','id',...
%          'x','y','z',...
%          'azimuth','range','pid']
% 
% formats = ['int64', 'uint8', 'uint8',...
%            'float32', 'float32', 'float32',...
%            'float32', 'float32', 'int32']

fid = fopen(fname, 'r');
c = onCleanup(@() fclose(fid));
    
fseek(fid,10,-1); % Move to the first X point location 10 bytes from beginning
X = fread(fid,inf,'single',30);
fseek(fid,14,-1);
Y = fread(fid,inf,'single',30);
fseek(fid,18,-1);
Z = fread(fid,inf,'single',30);

fseek(fid,8,-1);
intensity = fread(fid,inf,'uint8',33);

pointData = [X,Y,Z];

end

function dataOut = formOccupancyGrid(data)

grid = pcbin(data{1},[32 32 32]);
occupancyGrid = zeros(size(grid),'single');
for ii = 1:numel(grid)
    occupancyGrid(ii) = ~isempty(grid{ii});
end
label = data{2};
dataOut = {occupancyGrid,label};

end

function dataOut = augmentPointCloudData(data)

ptCloud = data{1};
label = data{2};

% Apply randomized rotation about Z axis.
tform = randomAffine3d('Rotation',@() deal([0 0 1],360*rand),'Scale',[0.98,1.02],'XReflection',true,'YReflection',true); % Randomized rotation about z axis
ptCloud = pctransform(ptCloud,tform);

% Apply jitter to each point in point cloud
amountOfJitter = 0.01;
numPoints = size(ptCloud.Location,1);
D = zeros(size(ptCloud.Location),'like',ptCloud.Location);
D(:,1) = diff(ptCloud.XLimits)*rand(numPoints,1);
D(:,2) = diff(ptCloud.YLimits)*rand(numPoints,1);
D(:,3) = diff(ptCloud.ZLimits)*rand(numPoints,1);
D = amountOfJitter.*D;
ptCloud = pctransform(ptCloud,D);

dataOut = {ptCloud,label};

end