
imsegsam

Perform automatic full image segmentation using Segment Anything Model (SAM)

Since R2024b

Description

Use the imsegsam function to automatically segment the entire image, or all of the objects inside an ROI, using the Segment Anything Model (SAM). SAM samples a regular grid of points on the image and returns a set of predicted masks for each point, which enables the model to produce multiple masks for each object and its subregions. You can customize various segmentation settings for your application, such as the ROI in which to segment objects, the size range of objects to segment, and the confidence score threshold with which to filter mask predictions.

Note

This functionality requires Deep Learning Toolbox™, Computer Vision Toolbox™, and the Image Processing Toolbox™ Model for Segment Anything Model. You can install the Image Processing Toolbox Model for Segment Anything Model from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

[masks,scores] = imsegsam(I) automatically segments all objects in an image, I, using the Segment Anything Model (SAM) and returns the masks masks and the prediction confidence scores scores for each segmented object.

[masks,scores] = imsegsam(I,Name=Value) specifies options using one or more name-value arguments. For example, PointGridSize=[64 64] specifies the number of grid points that the imsegsam function samples along the x- and y-directions of the input image as 64 each.

Examples

Load an image into the workspace.

I = imread("pears.png");

Automatically segment the full image using the Segment Anything Model (SAM).

[masks,scores] = imsegsam(I);
Loading SegmentAnythingModel.
Loading SegmentAnythingModel Complete.

Segmenting using Segment Anything Model.
---------------------------------------------
Processing crop 1/1. 
Processed 1024/1024 point prompts.

Display the masks output, which is a connected component structure.

masks
masks = struct with fields:
    Connectivity: 8
       ImageSize: [486 732]
      NumObjects: 86
    PixelIdxList: {1x86 cell}

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, with the smallest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay,[])

(Figure: segmentation masks overlaid on the image)

Load an image into the workspace.

I = imread("visionteam.jpg");

Segment Image Using SAM

Automatically segment the entire image using the Segment Anything Model (SAM). To reduce the number of segmented objects, specify the MinObjectArea name-value argument as 3000. Specify the ScoreThreshold name-value argument as 0.8, and the Verbose name-value argument as true.

[masks,scores] = imsegsam(I,MinObjectArea=3000,ScoreThreshold=0.8,Verbose=true);
Segmenting using Segment Anything Model.
---------------------------------------------
Processing crop 1/1. 
Processed 1024/1024 point prompts.

Display Masks in Order of Decreasing Mask Area

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image using the labeloverlay function. Because imsegsam sorts PixelIdxList in order of decreasing mask area, the labelmatrix function assigns the pixels of smaller masks last, so the smallest object masks appear on top.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay,[])

(Figure: masks overlaid with the smallest object masks on top)

Display Masks in Order of Increasing Mask Area

To instead display the object masks with the largest masks on top, first sort the masks contained in the PixelIdxList field of the masks structure in order of increasing mask area.

objectAreas = cellfun(@numel,masks.PixelIdxList);
[~,sortidx] = sort(objectAreas,"ascend");
masks.PixelIdxList = masks.PixelIdxList(sortidx);

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, now with the largest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay,[])

(Figure: masks overlaid with the largest object masks on top)

Load an image into the workspace.

I = imread("DogTrio.jpg");

Automatically segment the full image using the Segment Anything Model (SAM). To reduce the number of segmented objects, specify the MinObjectArea name-value argument as 5500. Specify the ScoreThreshold name-value argument as 0.65, and the Verbose name-value argument as false.

[masks,scores] = imsegsam(I,MinObjectArea=5500,ScoreThreshold=0.65,Verbose=false);

Convert masks, a connected component structure, to a stack of binary masks, maskStack.

% Preallocate the stack of binary masks.
maskStack = false(masks.ImageSize(1),masks.ImageSize(2),masks.NumObjects);
for idx = 1:masks.NumObjects
    % Convert the linear pixel indices of each object into a binary mask.
    mask = false(masks.ImageSize(1),masks.ImageSize(2));
    mask(masks.PixelIdxList{idx}) = true;
    % Reverse the stack order so that the smallest masks come first.
    maskStack(:,:,masks.NumObjects-idx+1) = mask;
end

Display the masks with white outlines overlaid on the image, with the smallest object masks on top, using the insertObjectMask (Computer Vision Toolbox) function.

overlayedImg = insertObjectMask(I,maskStack,MaskColor=lines(masks.NumObjects),LineColor="white");
imshow(overlayedImg)

(Figure: masks with white outlines overlaid on the image)

Input Arguments

Image to segment, specified as one of these values.

Grayscale image: 2-D matrix of size H-by-W.
RGB image: 3-D array of size H-by-W-by-3.

Tip

For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function.
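
For example, a minimal sketch of this rescaling, assuming a 16-bit image is already in the workspace as I16 (the variable name is illustrative):

% Rescale the pixel values to the range [0, 255], then convert to uint8
% before segmentation.
I = uint8(rescale(I16,0,255));
masks = imsegsam(I);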

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: imsegsam(I,PointGridSize=[64 64]) specifies the number of grid points that the imsegsam function samples along the x- and y-directions of the input image as 64 each.

Point grid size along the x- and y-directions of the image, specified as a 1-by-2 vector. The imsegsam function uses the grid points sampled along each direction as visual prompts for the SAM.

Increase the PointGridSize value for a more precise segmentation at the cost of additional processing time.

Tip

Use a higher value if your image contains small, densely packed objects relative to the image size. For example, if the PointGridSize value is [32 32] and your input image is 1024-by-1024 pixels in size, there are 32 pixels between each grid point. If the smallest object to segment is smaller than 32-by-32 pixels in size, increase the PointGridSize value to sample more grid points and ensure that imsegsam segments the smallest objects.
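
For example, a minimal sketch that samples a denser point grid for an image that contains small, densely packed objects (the grid size is illustrative):

I = imread("pears.png");
% Sample 64 grid points along each direction so that the grid spacing
% is finer than the smallest objects to segment.
masks = imsegsam(I,PointGridSize=[64 64]);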

ROI to segment, specified as an H-by-W logical matrix, where H and W are the height and width of the input image, respectively. The true (or 1) values of the mask define the ROI in which the function samples grid points and segments all objects. By default, the ROI is the full input image.

Specify a PointGridMask to segment all objects within an ROI instead of the full image, which can decrease processing time and improve object localization.

Tip

To create a rectangular mask to specify as the PointGridMask, you can use the createMask function.
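
For example, a minimal sketch that builds a rectangular ROI mask directly with logical indexing (the ROI coordinates are illustrative):

I = imread("pears.png");
% Create a logical mask that is true inside the rectangular ROI.
roiMask = false(size(I,1),size(I,2));
roiMask(100:300,200:450) = true;
% Sample grid points and segment objects only inside the ROI.
masks = imsegsam(I,PointGridMask=roiMask);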

Number of crop levels, specified as a positive integer. For each level n, the function splits the image into cropped, zoomed-in point grids of size 2^(n – 1)-by-2^(n – 1).

To improve the quality of smaller masks, increase the number of crop levels.
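
For example, a minimal sketch that adds a second crop level, so that the function also processes a 2-by-2 grid of zoomed-in crops:

I = imread("pears.png");
% Level 1 processes the full image; level 2 processes a
% 2^(2-1)-by-2^(2-1) = 2-by-2 grid of crops.
masks = imsegsam(I,NumCropLevels=2);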

Point batch size, specified as a positive integer. The batch size is the number of point prompts that the function batches and processes together. Increase the batch size to improve processing speed at the expense of higher memory usage.
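
For example, a minimal sketch that increases the point batch size (the value is illustrative):

I = imread("pears.png");
% Process 128 point prompts per batch, trading higher memory usage
% for faster processing.
masks = imsegsam(I,PointBatchSize=128);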

Point grid downscale factor at each crop level, specified as a positive integer. For a crop level n, the imsegsam function scales down the PointGridSize value by a factor of DF^(n – 1), where DF is the downscale factor. If you specify NumCropLevels as a value greater than 1, you can specify a higher PointGridDownscaleFactor value to decrease computation time.
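
For example, with PointGridSize=[32 32] and PointGridDownscaleFactor=2, the function samples a 32-by-32 point grid at crop level 1 and a 16-by-16 grid on each crop at level 2. A minimal sketch:

I = imread("pears.png");
% At crop level 2, the grid is downscaled by 2^(2-1) = 2,
% from [32 32] to [16 16].
masks = imsegsam(I,PointGridSize=[32 32],NumCropLevels=2,PointGridDownscaleFactor=2);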

Confidence score threshold, specified as a numeric scalar in the range [0, 1]. The imsegsam function filters out predictions with confidence scores less than the threshold value. Increase this value to reduce the number of false positives, at the possible expense of missing some true positives.

Overlap threshold, specified as a numeric scalar in the range [0, 1]. When the overlap proportion between two object segmentations is above this value, the function removes the overlapping segmentation with the lower confidence score. Decrease the threshold to reduce the number of overlapping segmentations. However, decreasing the threshold too much can eliminate segmentations with only minor overlap in the image.
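
For example, a minimal sketch that tightens both filters (the threshold values are illustrative):

I = imread("pears.png");
% Keep only confident predictions, and remove the lower-scoring mask of
% any pair of masks that overlaps by more than half.
masks = imsegsam(I,ScoreThreshold=0.8,OverlapThreshold=0.5);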

Minimum object area to segment, in pixels, specified as a nonnegative numeric scalar. The function discards object segmentations with fewer than the specified number of pixels, which can reduce computation time.

Maximum object area to segment, in pixels, specified as a positive numeric scalar. The function discards object segmentations with more than the specified number of pixels. To reduce computation time, set this value to the largest expected area of the objects being detected in the image.
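
For example, a minimal sketch that bounds the object area from both sides (the area values are illustrative):

I = imread("pears.png");
% Discard masks smaller than 500 pixels or larger than 50,000 pixels.
masks = imsegsam(I,MinObjectArea=500,MaxObjectArea=50000);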

Hardware resource on which to process images with the network, specified as one of the execution environment options in this table.

"auto": Use a GPU if available. Otherwise, use the CPU. The use of a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"gpu": Use the GPU. Using a GPU requires Parallel Computing Toolbox and a CUDA-enabled NVIDIA GPU. If Parallel Computing Toolbox or a suitable GPU is not available, then the function returns an error. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"cpu": Use the CPU.

Visible progress display, specified as a numeric or logical 1 (true) or 0 (false).

Output Arguments

Object masks, returned as a connected component structure with these fields.

Connectivity: Connectivity of the connected components (objects).
ImageSize: Size of the binary image.
NumObjects: Number of connected components (objects) in the binary image.
PixelIdxList: 1-by-NumObjects cell array, where each element is a vector containing the linear indices of the pixels in the corresponding object.

The PixelIdxList field stores the linear indices of the true pixels of each object mask in a 1-by-NumObjects cell array, sorted in order of decreasing mask area.

Tip

To visualize object masks, you can display the masks as a label matrix or a stack of binary masks.

Prediction scores for the segmentation, returned as an N-by-1 numeric vector, where N is the number of connected components detected in the input image.
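
For example, a minimal sketch that uses scores together with masks to extract the binary mask of the highest-scoring object:

[masks,scores] = imsegsam(imread("pears.png"));
% Find the prediction with the highest confidence score.
[~,best] = max(scores);
% Convert its linear pixel indices into a binary mask.
bw = false(masks.ImageSize);
bw(masks.PixelIdxList{best}) = true;
imshow(bw)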

Version History

Introduced in R2024b