Detect and Segment Objects

Detect objects; recognize text (OCR), barcodes, and fiducial markers; and perform semantic and instance segmentation using AI models

Computer Vision Toolbox™ supports end-to-end workflows for object detection, text detection and recognition (OCR), and segmentation using AI models. You can start by creating ground truth data through interactive and AI-assisted labeling of images and videos using the Image Labeler and Video Labeler apps. For object detection, Computer Vision Toolbox provides pretrained deep learning models such as YOLO, RTMDet, SSD, and Grounding DINO, which you can use directly or fine-tune for your application using transfer learning. You can also evaluate object detection performance metrics using the Object Detector Analyzer app. For more information on object detection, see Get Started with Object Detection Using Deep Learning.

For semantic segmentation, you can use pretrained deep learning models like U-Net, DeepLab v3+, BiSeNet v2, and 3-D U-Net. For more information on semantic segmentation, see Get Started with Semantic Segmentation Using Deep Learning. For instance segmentation, you can use pretrained deep learning models like SOLOv2 and Mask R-CNN. For more information on instance segmentation, see Get Started with Instance Segmentation Using Deep Learning.

For text detection, you can use the MSER feature detector or the CRAFT deep learning model, and then recognize the detected text using OCR. For more information, see Getting Started with OCR. Computer Vision Toolbox also provides a pretrained HRNet keypoint detector for human pose estimation, which you can fine-tune for custom keypoint detection on other objects. For more information, see Getting Started with HRNet.

Categories

  • Object Detection
    Label ground truth, detect objects using pretrained AI models like YOLO and Grounding DINO, and create custom detectors using transfer learning
  • Semantic Segmentation
    Label ground truth, perform semantic segmentation using pretrained AI models, and train custom networks like U-Net with transfer learning
  • Instance Segmentation
    Label ground truth and perform instance segmentation using pretrained AI models like SOLOv2, Mask R-CNN, and SAM, or train custom networks with transfer learning
  • Text, Barcode, and Fiducial Marker Detection and Recognition
    Detect and recognize text (OCR), barcodes, and fiducial markers using AI models
  • Keypoint Detection
    Estimate human pose in images using a pretrained HRNet keypoint detector, or train a custom object keypoint detector
  • Automated Visual Inspection
    Automate quality control tasks using anomaly detection and localization methods

Featured Examples