Ground Truth Images and Video

Interactively label images and videos using AI-assisted automation, create training data for AI models, and manage collaborative team labeling for large data sets

Computer Vision Toolbox™ provides a complete workflow for generating ground truth data from images and videos to train AI models for tasks such as object detection, semantic segmentation, instance segmentation, text recognition, and image or video classification. You can start by using the Image Labeler and Video Labeler apps to interactively annotate data with a wide range of label types. These include rectangles, polygons, polylines, scene labels, and pixel-level labels. To get started labeling a collection of images, see Get Started with the Image Labeler. To get started labeling a video or sequence of images, see Get Started with the Video Labeler.

The Image Labeler and Video Labeler apps support manual, AI-assisted, and automated annotation, so you can accelerate labeling using built-in AI models such as the Segment Anything Model (SAM) and Grounding DINO. For more information, see Get Started with AI-Assisted and Automated Labeling. You can also integrate custom automation algorithms to tailor the labeling process to your specific needs. For more details, see Create Custom Automation Algorithm for Labeling.

Once labeling is complete, you can export the annotated data and postprocess it to create training data sets for AI models. The toolbox supports workflows for organizing and managing labeled data, enabling seamless integration with training pipelines for classification, detection, and segmentation tasks.

For collaborative projects, the Image Labeler app includes features to manage team-based labeling, enabling you to distribute labeling tasks, review annotations, provide feedback, and track progress across multiple contributors. This makes it easier to scale labeling efforts and maintain consistency across large data sets. For more details, see Get Started with Team-Based Labeling.

Montage of two images: the image on the left shows rectangle and projected cuboid bounding boxes, and the image on the right shows semantic pixel labels and polygon ROI labels.

Highlighted Topics

Featured Examples

Automatically Label Ground Truth Using Segment Anything Model

Produce pixel labels for semantic segmentation using the Segment Anything Model (SAM) in the Image Labeler app. SAM is an automatic segmentation technique that lets you segment the object regions you want to label with just a few clicks, or segment the entire image automatically and then instantly create labels for selected regions. In this example, you interactively label pixels for semantic segmentation in two ways.

Since R2024b

Automatically Label Ground Truth Using Vision-Language Model

Automatically label ground truth images for object detection using the Grounding DINO vision-language model (VLM).

Since R2026a

Automate Ground Truth Polygon Labeling Using Grounded SAM Model

Combine Grounding DINO and the Segment Anything Model 2 (SAM 2) to automatically produce polygon labels using the Video Labeler app.

Since R2026a

Automate Ground Truth Labeling for Semantic Segmentation

Use a pretrained semantic segmentation algorithm to segment the sky and a road in an image.

Automate Ground Truth Labeling for Instance Segmentation

Create an automation algorithm to automatically label data for instance segmentation using a pretrained SOLOv2 network in the Video Labeler app.

Since R2026a

Automate Ground Truth Labeling for Object Detection

Create an automation algorithm to automatically label data for object detection using a pretrained object detector.

Automate Ground Truth Labeling for OCR

Automate the labeling of text for OCR training and evaluation.

Automate Labeling of Objects in Video Using RAFT Optical Flow

Use a pretrained RAFT optical flow estimation network to propagate a predefined object mask from one frame to the next in a video sequence.

Since R2024b

Export Ground Truth Object to Custom and COCO JSON Files

Export a ground truth object to a JavaScript Object Notation (JSON) file in a custom data format and to a JSON file in the COCO data format.

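For orientation, the sketch below shows roughly what a COCO-style object detection JSON file contains. It is not the toolbox's export code and does not use the groundTruth object. The `records` structure, the `to_coco` helper, and the file and class names are hypothetical stand-ins for rectangle labels that have already been read out of a labeling session. Only the top-level `images`, `annotations`, and `categories` keys and the `[x, y, width, height]` box layout follow the COCO detection format.

```python
import json


def to_coco(records, class_names):
    """Build a COCO-style detection dictionary from simple label records.

    records: list of dicts with keys "file", "width", "height", and "boxes",
             where "boxes" is a list of (class_name, [x, y, w, h]) pairs.
             This input layout is an assumption made for illustration.
    """
    # COCO category IDs are integers; here they start at 1 in list order.
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    for image_id, rec in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "file_name": rec["file"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for class_name, (x, y, w, h) in rec["boxes"]:
            annotations.append({
                "id": len(annotations) + 1,   # unique across the whole file
                "image_id": image_id,         # links the box to its image
                "category_id": name_to_id[class_name],
                "bbox": [x, y, w, h],         # top-left corner, width, height
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


# Hypothetical labels for two images.
records = [
    {"file": "frame_0001.png", "width": 640, "height": 480,
     "boxes": [("car", [10, 20, 50, 40]), ("sign", [200, 30, 20, 20])]},
    {"file": "frame_0002.png", "width": 640, "height": 480,
     "boxes": [("car", [15, 22, 50, 40])]},
]

coco = to_coco(records, ["car", "sign"])
coco_json = json.dumps(coco, indent=2)
```

The sketch copies box coordinates through unchanged. Any adjustment between the labeling tool's pixel-coordinate convention and the one expected by a downstream COCO consumer is left out and would need to be checked separately.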

Convert Image Labeler Polygons to Labeled Blocked Image for Semantic Segmentation

Convert polygon labels stored in a groundTruth object into a labeled blocked image for semantic segmentation workflows.
