Detect and Segment Objects

Detect objects; recognize text (OCR), barcodes, and fiducial markers; and perform semantic and instance segmentation using AI models

Computer Vision Toolbox™ supports end-to-end workflows for object detection, text detection and recognition (OCR), and segmentation using AI models. You can start by creating ground truth data through interactive and AI-assisted labeling of images and videos using the Image Labeler and Video Labeler apps. For object detection, Computer Vision Toolbox provides pretrained deep learning models such as YOLO, RTMDet, SSD, and Grounding DINO, which you can use directly or fine-tune for your application using transfer learning. You can also evaluate object detection performance metrics using the Object Detector Analyzer app. For more information on object detection, see Get Started with Object Detection Using Deep Learning.
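
As an illustration of using a pretrained detector directly, the following is a minimal sketch rather than a definitive workflow. It assumes a YOLO v4 detector (one of several YOLO variants; the choice of yolov4ObjectDetector and the "tiny-yolov4-coco" model name are assumptions made for this example), that the corresponding YOLO v4 support package is installed, and that the sample image highway.png is on the MATLAB path.

```matlab
% Sketch: run a pretrained detector on one image.
% Assumption: the YOLO v4 support package is installed so that the
% pretrained "tiny-yolov4-coco" model can be loaded.
detector = yolov4ObjectDetector("tiny-yolov4-coco");

% Assumption: highway.png is a sample image available on the path.
I = imread("highway.png");

% Keep only detections with a confidence score of at least 0.5.
[bboxes, scores, labels] = detect(detector, I, Threshold=0.5);

% Draw the detections only if the detector returned any boxes.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I, "rectangle", bboxes, string(labels));
end
imshow(I)
```

Each row of bboxes is an [x y width height] box, with the matching confidence in scores and the matching class in labels. Raising Threshold trades missed detections for fewer false positives.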

For semantic segmentation, you can use pretrained deep learning models like U-Net, DeepLab v3+, BiSeNet v2, and 3-D U-Net. For more information on semantic segmentation, see Get Started with Semantic Segmentation Using Deep Learning. For instance segmentation, you can use pretrained deep learning models like SOLOv2 and Mask R-CNN. For more information on instance segmentation, see Get Started with Instance Segmentation Using Deep Learning.
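
For instance segmentation, a pretrained SOLOv2 network might be applied along the following lines. This is a hedged sketch: it assumes the SOLOv2 support package is installed, that "resnet50-coco" is an available pretrained model name, and that the sample image kobi.png is on the path.

```matlab
% Sketch: segment object instances with a pretrained SOLOv2 network.
% Assumption: the SOLOv2 support package is installed.
model = solov2("resnet50-coco");

% Assumption: kobi.png is a sample image available on the path.
I = imread("kobi.png");

% masks is an H-by-W-by-M logical stack, one page per detected instance.
[masks, labels, scores] = segmentObjects(model, I);

% Overlay the instance masks on the image for visual inspection.
overlaid = insertObjectMask(I, masks);
imshow(overlaid)
```

Unlike semantic segmentation, which assigns one class label per pixel, this returns a separate mask for each object, so two objects of the same class remain distinguishable.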

For text detection, you can use the MSER feature detector or the CRAFT deep learning model, and then recognize the detected text using OCR. For more information, see Getting Started with OCR. Computer Vision Toolbox also provides a pretrained HRNet keypoint detector for human pose estimation, which you can fine-tune for custom keypoint detection on other objects. For more information, see Getting Started with HRNet.
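
The detect-then-recognize text workflow can be sketched as follows. This is an illustrative example, not the only approach: it assumes the text detection support package that provides the CRAFT model is installed, and that the sample image handicapSign.jpg is on the path.

```matlab
% Sketch: detect text regions with CRAFT, then recognize them with OCR.
% Assumption: the support package providing the CRAFT model is installed.
% Assumption: handicapSign.jpg is a sample image available on the path.
I = imread("handicapSign.jpg");

% Each row of bboxes is an [x y width height] region likely to contain text.
bboxes = detectTextCRAFT(I);

% Recognize text inside each detected region; one ocrText result per region.
results = ocr(I, bboxes);

% Collect the recognized strings, trimming trailing whitespace and newlines.
recognized = strtrim(string({results.Text}));
disp(recognized)
```

Restricting OCR to detected regions usually helps on natural scenes, where running recognition over the whole image tends to pick up background clutter.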