Visual Tracking of Occluded and Unresolved Objects
This example shows how to resolve challenging tracking scenarios when objects are occluded or when they are in close proximity to each other. The example revisits the Motion-Based Multiple Object Tracking (Computer Vision Toolbox) example available in the Computer Vision Toolbox™. The problem of motion-based object tracking can be divided into two parts:
Detecting moving objects in each frame
Tracking the objects detected in each video frame over time
The example uses multi-object trackers available in the Sensor Fusion and Tracking Toolbox™ to elaborate on the tracking part, which includes the following stages:
Associating the detections corresponding to the same object over time
Managing the emergence and disappearance of objects in the scene
Filtering the noisy measurements made by the detector
Understand the Challenges in Video-Based Tracking
This section presents two major challenges of tracking moving objects in a video frame: Detecting the objects in the presence of occlusion and providing resolved detections when the objects are close to each other.
Video and Detector
Define a video reader and video player. This example is based on the atrium video, in which individuals are walking in an atrium with some plants that can potentially occlude the people.
filename = "atrium.mp4";
vidReader = VideoReader(filename);
vidPlayer = vision.DeployableVideoPlayer;
One way to detect moving objects when the camera is static is to analyze changes in the video frame, called foreground, relative to the static frame, considered background. The following code section creates the detector objects that separate foreground from background and connect areas of foreground into blobs. A blob detector is a simple, yet effective, detector because it does not require any prior knowledge about the moving objects.
minBlobArea = 400; % Minimum blob size, in pixels, to be considered as a detection
detectorObjects = setupDetectorObjects(minBlobArea);
Run the video and observe the detections, shown in magenta boxes, as they are created.
interestingFrameInds = [150,160,170,330,350,370,Inf];
interestingFrames = cell(1,numel(interestingFrameInds)-1);
ind = 0;
frameCount = 0;
numFrames = vidReader.NumFrames;
bboxes = cell(1,numFrames);
centroids = cell(1,numFrames);
while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1;  % Increment frame count

    % Detect blobs in the video frame
    [centroids{frameCount}, bboxes{frameCount}] = detectBlobs(detectorObjects, frame);

    % Annotate frame with blobs
    frame = insertShape(frame,"rectangle",bboxes{frameCount}, ...
        Color="magenta",LineWidth=4);

    % Add frame count in the top-left corner
    frame = insertText(frame,[0,0],"Frame: "+int2str(frameCount), ...
        BoxColor="black",TextColor="yellow",BoxOpacity=1);

    % Display video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        interestingFrames{ind} = frame;
    end
end
Occlusion and Missed Detections
The first challenge with vision-based tracking is occlusion. Occlusion happens when a moving object passes behind another object, whether moving or static. In the series of pictures below, follow the detection of the person on the left when he is about to go behind the plant (frame 150), when he is completely occluded by the plant (frame 160), and when he emerges on the other side of the plant (frame 170).
imshow(interestingFrames{1});
imshow(interestingFrames{2});
imshow(interestingFrames{3});
Unresolved Detections
A second common challenge in tracking arises when the detector cannot resolve two or more objects that are near each other. In this video, two individuals approach each other and then continue on their way. As long as they are far apart, the blob detector resolves two distinct blobs (frame 330). However, when the two individuals are too close to each other, the blob detector merges the two blobs into a single unresolved blob (frame 350). Only after the two people separate can the blob detector resolve them again and provide two separate detections (frame 370).
imshow(interestingFrames{4});
imshow(interestingFrames{5});
imshow(interestingFrames{6});
Use Multi-Object Trackers to Overcome Challenges
Multi-object trackers provide solutions that overcome the challenges described in the previous section.
Occlusion: To keep track of objects that are temporarily occluded, a multi-object tracker uses a track management algorithm. A track management algorithm is responsible for three things:
Start a new track when a new object appears in the frame, which is called track initialization.
Reduce the number of false tracks, which may be caused by false detections from the detector, using a confirmation logic. For example, the tracker may count how many detections have been associated with the track before considering it real, or confirmed.
Keep tracks that are temporarily occluded alive a while longer using a deletion logic. For example, the tracker may count how many consecutive frames the track has not been associated with any detection before deleting it, as illustrated in the sketch after this list.
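For illustration, the following sketch shows the M-out-of-N counting idea behind confirmation and deletion logic. The variable names and threshold values are hypothetical and are not part of the tracker code; the trackers in this example implement this logic internally through their ConfirmationThreshold and DeletionThreshold properties.

% Hypothetical M-out-of-N history logic: confirm a track after 2 hits in the
% last 2 updates and delete it after 3 consecutive misses.
recentHits = false(1,2);        % Sliding window of association results
missCount = 0;                  % Consecutive frames without an associated detection
isDetectedThisFrame = true;     % Assumed association result for the current frame
recentHits = [recentHits(2:end), isDetectedThisFrame];
if isDetectedThisFrame
    missCount = 0;
else
    missCount = missCount + 1;
end
isConfirmed = sum(recentHits) >= 2;    % 2-out-of-2 confirmation
shouldDelete = missCount >= 3;         % Delete after 3 missed frames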
Unresolved detections: The way the tracker handles unresolved detections depends on the association algorithm that it uses. If the tracker makes crisp association decisions, as a global nearest neighbor tracker does, it can associate the detection with only one track, and the other track is considered undetected. If the tracker uses an association algorithm that is probabilistic or allows for multiple hypotheses, both tracks may be maintained for a while longer.
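To see why a crisp assignment leaves one track undetected when a detection is merged, consider the following minimal sketch. The positions and the gating cost are hypothetical, and the matchpairs function is used here only to mimic the one-to-one assignment that a GNN tracker makes internally.

% Two tracks close together compete for a single merged detection (hypothetical values).
trackPositions = [100 200; 122 200];                      % Predicted track positions, in pixels
mergedDetection = [110 200];                              % One unresolved detection
cost = vecnorm(trackPositions - mergedDetection, 2, 2);   % Distance-based assignment cost
[assignment, unassignedTracks] = matchpairs(cost, 15);    % Crisp one-to-one assignment
% assignment pairs the detection with one track only; unassignedTracks lists the
% other track, which a GNN tracker treats as undetected in this frame.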
Convert Blob Detections to objectDetection Objects
All the trackers in the Sensor Fusion and Tracking Toolbox™ require an input in the objectDetection format. This section shows how to convert the blob detections provided by the blob detector into this format. Each blob detection consists of a centroid, which the tracker tracks, and a bounding box, which the tracker uses to draw the track. In objectDetection terms, the centroid is the Measurement, and the bounding box, which is used only for visualization, is stored in ObjectAttributes. The objectDetection also requires a Time, which in this case is the frame count. Since the Measurement is reported in pixels and the Time is reported in frames, the tracker tracks the centroid position in pixels and velocity in pixels per frame.
detectionHistory = cell(1,numFrames);
for frameCount = 1:numFrames
    thisFrameCentroids = centroids{frameCount};
    thisFrameBboxes = bboxes{frameCount};
    numMeasurementsInFrame = size(thisFrameCentroids,1);
    detectionsInFrame = cell(numMeasurementsInFrame,1);
    for detCount = 1:numMeasurementsInFrame
        detectionsInFrame{detCount} = objectDetection(...
            frameCount, ...                          % Use frame count as time
            thisFrameCentroids(detCount,:), ...      % Use centroid as measurement in pixels
            MeasurementNoise = diag([100 100]), ...  % Centroid measurement noise in pixels
            ObjectAttributes = struct(BoundingBox = thisFrameBboxes(detCount,:)) ... % Attach bounding box information
            );
    end
    detectionHistory{frameCount} = detectionsInFrame;
end
Define Multi-Object Tracker
To use a multi-object tracker, first define the tracker object. The following code section defines a global nearest neighbor (GNN) tracker, trackerGNN. The term GNN relates to how the tracker associates detections with tracks, in this case using the best association found by the Hungarian algorithm. The benefit of GNN is its simplicity, but, as a later section shows, different association algorithms can lead to better tracking.
Generally, trackerGNN can handle any number of sensors and any number of tracks. In this video, there are only a few people and one sensor. Therefore, define the tracker for one sensor and up to 10 tracks.
tracker = trackerGNN(MaxNumSensors=1,MaxNumTracks=10);
Next, define how to track the people in the video. The video has a high frame rate of 30 frames per second. Within the short periods of time between frames, the motion of each person is approximately constant velocity. Therefore, the simplest approach is to track the centroid of the bounding box with a constant velocity linear Kalman filter. The function initcvkf defines an initialization function for a constant velocity Kalman filter.
tracker.FilterInitializationFcn = @initcvkf;
Finally, a multi-object tracker needs to handle occlusion and the appearance and disappearance of people from the frame. The ConfirmationThreshold and DeletionThreshold properties control how quickly a track is confirmed after an object appears and how quickly it is deleted after the object disappears or becomes occluded. As seen in the previous section, there are very few false detections in the video. Therefore, ConfirmationThreshold can be as low as 2-out-of-2 or even 1-out-of-1. Setting DeletionThreshold requires more tuning based on the frame rate and the length of occlusion events. A value of 23-out-of-23 means that a track is deleted if it is not associated with any detection for 23 consecutive frames.
tracker.ConfirmationThreshold = [2 2]; % Quick to confirm
tracker.DeletionThreshold = [23 23];   % Slow to delete
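As a rough sanity check, assuming the nominal frame rate of about 30 frames per second for this video, you can relate the 23-frame deletion threshold to the longest occlusion that a track can survive.

% Relate the deletion threshold to an approximate occlusion duration
frameRate = vidReader.FrameRate;        % About 30 frames per second for the atrium video
maxOcclusionSeconds = 23/frameRate      % Longest gap, in seconds, that a track survives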
Run Multi-Object Tracker
The following code block runs the tracker using the detections gathered earlier. The tracker outputs, called tracks, are displayed using a yellow bounding box annotated over the video frame. When a track is not assigned to any detections in the current frame, it is marked as predicted in the annotation.
vidReader.CurrentTime = 0; % Reset the video reader
ind = 0;
frameCount = 0;
numFrames = vidReader.NumFrames;
if isempty(vidPlayer.Location)
    vidPlayer = vision.DeployableVideoPlayer;
end
while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1;  % Increment frame count

    % Update the tracker
    if isLocked(tracker) || ~isempty(detectionHistory{frameCount})
        tracks = tracker(detectionHistory{frameCount}, frameCount);
    else
        tracks = objectTrack.empty;
    end

    % Add track information to the frame
    frame = insertTracksToFrame(frame, tracks);

    % Add frame count in the top-left corner
    frame = insertText(frame,[0,0],"Frame: "+int2str(frameCount), ...
        BoxColor="black",TextColor="yellow",BoxOpacity=1);

    % Display video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        interestingFrames{ind} = frame;
    end
end
Observe the Results
This section reviews the same occlusion and unresolved detection situations shown in the first section. Observe how the tracker keeps predicting the positions of the individuals even when they are not detected, whether due to occlusion or an unresolved detection. Keeping the same track ID, indicated by the integer above the bounding box, shows that the tracker maintains each person as the same object. This continuity is important both from frame to frame and for counting the total number of people in the scene.
figure;imshow(interestingFrames{1});
figure;imshow(interestingFrames{2});
figure;imshow(interestingFrames{3});
figure;imshow(interestingFrames{4});
figure;imshow(interestingFrames{5});
figure;imshow(interestingFrames{6});
Explore Other Trackers and Track Management Settings
As mentioned above, GNN is just one type of association algorithm. Other association algorithms include joint probabilistic data association (JPDA) and multiple hypothesis tracking (MHT). These algorithms are better at handling ambiguity in the association of detections with tracks, such as the ambiguity caused by an unresolved detection. The Sensor Fusion and Tracking Toolbox provides trackers based on JPDA and MHT, trackerJPDA and trackerTOMHT. Both trackers follow the same input and output conventions as trackerGNN. Therefore, you can easily switch between the trackers and compare how well they work.
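For example, the following sketch constructs the MHT tracker with the same capacity settings and filter initialization function. Note that trackerTOMHT uses score-based confirmation and deletion logic, so the M-out-of-N thresholds used elsewhere in this example do not carry over, and the defaults are left unchanged here.

% A minimal sketch of swapping in the multiple hypothesis tracker
mhtTracker = trackerTOMHT(MaxNumSensors=1, MaxNumTracks=10, ...
    FilterInitializationFcn=@initcvkf);
% Because the trackers share input and output conventions, the same helper can run it:
% mhtFrames = runTracker(vidReader, mhtTracker, detectionHistory, interestingFrameInds);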
In this section, you can use the provided controls to set the confirmation and deletion thresholds. Then click on "Run Section" on the toolstrip to run the tracker with the new settings.
By default, the example shows how the JPDA tracker can use a lower DeletionThreshold setting, because it probabilistically associates the unresolved detection with both tracks, and thus both tracks remain assigned to some degree. Lowering the DeletionThreshold value allows for faster deletion when an object leaves the frame and its track should be deleted.
tracker = trackerJPDA(MaxNumSensors=1,MaxNumTracks=10,FilterInitializationFcn=@initcvkf);
tracker.ConfirmationThreshold = sort([2, 2]);  % How fast to confirm a track
tracker.DeletionThreshold = sort([11, 11]);    % How long to keep a track
frames = runTracker(vidReader,tracker,detectionHistory,interestingFrameInds);
figure;imshow(frames{1});
figure;imshow(frames{2});
figure;imshow(frames{3});
figure;imshow(frames{4});
figure;imshow(frames{5});
figure;imshow(frames{6});
Use a Different Filter
While a constant velocity Kalman filter is sufficient in this case, lower frame rates or more maneuvering objects sometimes require more sophisticated models and filters. This section shows how to use a different filter type, in this case a particle filter, trackingPF. A particle filter maintains the uncertainty about the track state as a collection of particles, which are predicted and corrected using nonlinear functions and resampled by the filter. The particles are visualized as small circles, so you can observe how the uncertainty grows when the track is not assigned to a detection and must be predicted.
release(tracker);
tracker.FilterInitializationFcn = @initcv2dpf;
frames = runTracker(vidReader, tracker, detectionHistory, interestingFrameInds);
figure;imshow(frames{1});
figure;imshow(frames{2});
figure;imshow(frames{3})
Summary
This example shows how to use multi-object trackers to track people in a video. The trackers use different association algorithms and allow you to maintain consistent tracking of individuals in the video. You can tune various parameters of each tracker, for example the confirmation and deletion thresholds, to improve the tracking results.
The example also shows how you can visualize the tracks to determine which tracker to use and how to tune it. You can also use track metrics, such as trackCLEARMetrics, as shown in the Implement Simple Online and Realtime Tracking example; computing these metrics requires ground truth.
This example does not show how to tune the trackers. Tracker tuning is explained in the Tuning a Multi-Object Tracker example.
Supporting Functions
Create Detector Objects
This function creates a foreground detector and a blob analysis object. These two objects are used to detect moving objects in the frame.
The foreground detector segments moving objects from the background. It outputs a binary mask, where the pixel value of 1 corresponds to the foreground and the value of 0 corresponds to the background.
Connected groups of foreground pixels are likely to correspond to moving objects. The blob analysis System object finds such groups (called blobs or connected components) and computes their characteristics, such as their areas, centroids, and the bounding boxes.
function detectorObjects = setupDetectorObjects(minBlobArea)
% Create System objects for foreground detection and blob analysis
detectorObjects.detector = vision.ForegroundDetector(NumGaussians = 3, ...
    NumTrainingFrames = 40, MinimumBackgroundRatio = 0.7);
detectorObjects.blobAnalyzer = vision.BlobAnalysis(BoundingBoxOutputPort = true, ...
    AreaOutputPort = true, CentroidOutputPort = true, MinimumBlobArea = minBlobArea);
end
Detect Blobs
Use the two detector objects to detect blobs in the frame.
function [centroids, bboxes] = detectBlobs(detectorObjects, frame)
% Detect foreground.
mask = detectorObjects.detector.step(frame);

% Apply morphological operations to remove noise and fill in holes.
mask = imopen(mask, strel("rectangle", [6, 6]));
mask = imclose(mask, strel("rectangle", [50, 50]));
mask = imfill(mask, "holes");

% Perform blob analysis to find connected components.
[~, centroids, bboxes] = detectorObjects.blobAnalyzer.step(mask);
end
Insert Tracks Information
This function adds bounding box annotations to represent the tracks in the frame.
function frame = insertTracksToFrame(frame, tracks)
numTracks = numel(tracks);
boxes = zeros(numTracks, 4);
ids = zeros(numTracks, 1, "int32");
predictedTrackInds = zeros(numTracks, 1);
for tr = 1:numTracks
    % Get bounding boxes.
    boxes(tr, :) = tracks(tr).ObjectAttributes.BoundingBox;
    boxes(tr, 1:2) = (tracks(tr).State(1:2:3))' - boxes(tr,3:4)/2;

    % Get IDs.
    ids(tr) = tracks(tr).TrackID;

    if tracks(tr).IsCoasted
        predictedTrackInds(tr) = tr;
    end
end
predictedTrackInds = predictedTrackInds(predictedTrackInds > 0);

% Create labels for objects that display the predicted rather
% than the actual location.
labels = cellstr(int2str(ids));
isPredicted = cell(size(labels));
isPredicted(predictedTrackInds) = {' predicted'};
labels = strcat(labels, isPredicted);

% Draw the objects on the frame.
frame = insertObjectAnnotation(frame, "rectangle", boxes, labels, ...
    TextBoxOpacity = 0.5);
end
Run the Tracker
This function reads the video frame, runs the tracker with the detections at each frame, and captures interesting frames.
function frames = runTracker(vidReader, tracker, detectionHistory, interestingFrameInds)
vidReader.CurrentTime = 0; % Reset the video reader
ind = 0;
frameCount = 0;
frames = cell(1,numel(interestingFrameInds)-1);
vidPlayer = vision.DeployableVideoPlayer;
isPF = isParticleFilterUsed(tracker,detectionHistory);
while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1;  % Increment frame count

    % Update the tracker
    if isLocked(tracker) || ~isempty(detectionHistory{frameCount})
        tracks = tracker(detectionHistory{frameCount}, frameCount);
    else
        tracks = objectTrack.empty;
    end

    % Add track information to the frame
    frame = insertTracksToFrame(frame, tracks);

    % Add particles to display
    if isPF
        for trackInd = 1:numel(tracks)
            % Get particles
            particles = getTrackFilterProperties(tracker, tracks(trackInd).TrackID, "Particles");
            positions = particles{1};
            positions = positions([1,3],:)';

            % Add particles on frame
            frame = insertMarker(frame, positions, "circle", Color = "yellow", Size = 1);
        end
    end

    % Add frame count in the top-left corner
    frame = insertText(frame, [0,0], "Frame: " + frameCount, ...
        BoxColor = "black", TextColor = "yellow", BoxOpacity = 1);

    % Display video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        frames{ind} = frame;
    end
end
end
isParticleFilterUsed
This function returns true if the tracker uses a particle filter.
function isPF = isParticleFilterUsed(tracker, detectionHistory)
isemptyCell = cellfun(@(d) isempty(d), detectionHistory);
ind = find(~isemptyCell, 1, "first");
filter = tracker.FilterInitializationFcn(detectionHistory{ind}{1});
isPF = isa(filter, "trackingPF");
end
cvmeas2d
This function returns the two-dimensional measurement of the filter state.
function meas = cvmeas2d(state, varargin)
% Measurement model for 2-D constant velocity
meas3d = cvmeas(state,varargin{:});
meas = meas3d(1:2,:);
end
initcv2dpf
This function initializes a 2-D constant velocity particle filter based on an unassigned detection.
function pf = initcv2dpf(detection)
%INITCV2DPF Filter initialization function for a 2-D constant velocity particle filter
%   PF = INITCV2DPF(DETECTION) initializes PF, a trackingPF filter, using
%   DETECTION, an objectDetection object. PF uses a 2-D constant velocity
%   measurement model.
%
%   The function follows similar steps as initcvpf, but uses the knowledge
%   that the measurement is the position in rectangular coordinates.

classToUse = class(detection.Measurement);

% Create process noise matrix
scaleAccel = ones(1, classToUse);
Q = eye(2, classToUse) * scaleAccel;

% Store measurement properties
n = numel(detection.Measurement);
if isscalar(detection.MeasurementNoise)
    measurementNoise = detection.MeasurementNoise * eye(n,n,classToUse);
else
    measurementNoise = cast(detection.MeasurementNoise,classToUse);
end

% Number of particles
numParticles = 1000;

%% Initialize the particle filter in the rectangular frame using state and state covariance
posMeas = detection.Measurement(:);
velMeas = zeros(n,1,classToUse);
posCov = cast(detection.MeasurementNoise,classToUse);
velCov = eye(n,n,classToUse);
H1d = cast([1 0], classToUse);
Hpos = blkdiag(H1d, H1d);                        % position = Hpos * state
Hvel = [zeros(2,1,classToUse),Hpos(:,1:end-1)];  % velocity = Hvel * state
state = Hpos' * posMeas(:) + Hvel' * velMeas(:);
stateCov = Hpos' * posCov * Hpos + Hvel' * velCov * Hvel;

% Measurement related properties are not set for invalid detection.
pf = trackingPF(@constvel,@cvmeas2d,state, NumParticles = numParticles, ...
    StateCovariance = stateCov, ProcessNoise = Q, ...
    MeasurementNoise = measurementNoise, HasAdditiveProcessNoise = false);
setMeasurementSizes(pf,n,n);
end