
Get Started with Image Preprocessing and Augmentation for Deep Learning

Data preprocessing consists of a series of deterministic operations that normalize or enhance desired data features. For example, you can normalize data to a fixed range or rescale data to the size required by the network input layer. Preprocessing is used for training, validation, and inference.
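For instance, here is a minimal sketch of both operations, assuming a network with a 224-by-224 input layer (an illustrative value, not a requirement); peppers.png is a sample image shipped with MATLAB.

im = imread("peppers.png");      % sample image shipped with MATLAB
im = rescale(im2single(im));     % normalize intensities to the fixed range [0, 1]
im = imresize(im, [224 224]);    % rescale to the assumed network input size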

Preprocessing can occur at two stages in the deep learning workflow.

  • Commonly, preprocessing occurs as a separate step that you complete before feeding the data to the network. You load your original data, apply the preprocessing operations, and then save the result to disk. The advantage of this approach is that you incur the preprocessing overhead only once; the preprocessed images are then readily available as a starting point for all future trials of training a network.

  • If you load your data into a datastore, then you can also apply preprocessing during training by using the transform and combine functions. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). The transformed images are not stored in memory, so this approach avoids writing a second copy of the training data to disk. It is convenient when your preprocessing operations are not computationally expensive and do not noticeably slow down training, as in the sketch that follows this list.
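For example, this sketch applies a resizing operation each time the datastore is read, without writing anything to disk. The folder name and target size are illustrative assumptions.

imds = imageDatastore("trainingImages");                  % hypothetical image folder
dsTrain = transform(imds, @(im) imresize(im, [224 224]));

img = read(dsTrain);   % each read returns a resized image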

Data augmentation consists of randomized operations that are applied to the training data while the network is training. Augmentation increases the effective amount of training data and helps to make the network invariant to common distortions in the data. For example, you can add artificial noise to training data so that the network is invariant to noise.

To augment training data, start by loading your data into a datastore. Some built-in datastores apply a specific and limited set of augmentation operations for specific applications. You can also apply your own set of augmentation operations to data in the datastore by using the transform and combine functions, as in the sketch below. During training, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. For more information, see Preprocess Images for Deep Learning and Preprocess Volumes for Deep Learning.
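A minimal augmentation sketch, assuming a hypothetical image folder and a helper function augmentFcn defined here for illustration: because randomAffine2d draws new random parameters on every call, each read (and therefore each epoch) sees a slightly different version of every image.

imds = imageDatastore("trainingImages");   % hypothetical image folder
dsAug = transform(imds, @augmentFcn);

function out = augmentFcn(im)
    % Random rotation and horizontal reflection, redrawn on every read
    tform = randomAffine2d("Rotation", [-10 10], "XReflection", true);
    rout = affineOutputView(size(im), tform);
    out = imwarp(im, tform, "OutputView", rout);
end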

Preprocess and Augment Images

Common image preprocessing operations include noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and morphology.
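As a rough illustration of these categories, each line below applies one such operation with an Image Processing Toolbox function. The parameter values are assumptions; pout.tif and peppers.png are sample images shipped with the toolbox.

I = imread("pout.tif");                   % grayscale sample image
J1 = medfilt2(I);                         % noise removal (median filter)
J2 = imbilatfilt(I);                      % edge-preserving smoothing
J3 = histeq(I);                           % contrast enhancement
J4 = imopen(I, strel("disk", 3));         % morphology (opening)
lab = rgb2lab(imread("peppers.png"));     % color space conversion (RGB to L*a*b*)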

Augment image data to simulate variations in image acquisition. For example, the most common image augmentation operations are geometric transformations, such as rotation and translation, which simulate variations in the camera orientation with respect to the scene. Color jitter simulates variations in lighting conditions and scene color. Artificial noise simulates distortions caused by electrical fluctuations in the sensor and by analog-to-digital conversion errors. Blur simulates an out-of-focus lens or movement of the camera with respect to the scene.

You can process and augment image data using the operations in the following list, as well as any other functionality in the toolbox; a combined code sketch follows the list. For an example that shows how to create and apply these transformations, see Augment Images for Deep Learning Workflows.

  • Resize images: Resize images by a fixed scaling factor or to a target size. Sample output: the original image on the left and a resized image on the right.

  • Crop images: Crop an image to a target size from the center or from a random position. Sample output: an image cropped from the center on the left and an image cropped from a random position on the right.

  • Warp images: Apply random reflection, rotation, scale, shear, and translation to images. Sample output: from left to right, the original image and the resulting images after reflection, rotation, and scaling.

  • Jitter color: Randomly adjust the hue, saturation, brightness, or contrast of color images. Sample output: from left to right, the original image with random adjustments to hue, saturation, brightness, and contrast.

  • Simulate noise: Add random Gaussian, Poisson, salt and pepper, or multiplicative noise. Sample output: an image with randomly added salt and pepper noise on the left and an image with randomly added Gaussian noise on the right.

  • Simulate blur: Add Gaussian or directional motion blur. Sample output: an image with Gaussian blur on the left and an image with directional motion blur on the right.

  • Jitter intensity: Randomly adjust the brightness, contrast, or gamma correction of grayscale images and volumes. Sample output: from left to right, the original image with random adjustments to brightness, contrast, and gamma correction.
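The sketch below exercises several of these operations with Image Processing Toolbox functions. All parameter values are illustrative assumptions, and for intensity jitter, imadjust with a randomized gamma stands in as one possible implementation.

I = imread("peppers.png");

% Resize by a fixed scale factor
Iresize = imresize(I, 0.5);

% Crop to a target size from the center
win = centerCropWindow2d(size(I), [128 128]);
Icrop = imcrop(I, win);

% Warp with a random affine transformation (reflection, rotation, scale)
tform = randomAffine2d("XReflection", true, "Rotation", [-20 20], "Scale", [0.9 1.1]);
rout = affineOutputView(size(I), tform);
Iwarp = imwarp(I, tform, "OutputView", rout);

% Jitter color (hue, saturation, brightness, contrast)
Ijitter = jitterColorHSV(I, "Hue", 0.05, "Saturation", 0.2, ...
    "Brightness", 0.3, "Contrast", 0.4);

% Simulate noise
Inoise = imnoise(I, "salt & pepper", 0.02);

% Simulate blur: Gaussian blur, then directional motion blur
Igauss = imgaussfilt(I, 2);
Imotion = imfilter(I, fspecial("motion", 20, 45), "replicate");

% Jitter intensity of a grayscale image (randomized gamma, an assumption)
Igray = im2gray(I);
Igamma = imadjust(Igray, [], [], 0.5 + rand);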

Preprocess and Augment Pixel Label Images for Semantic Segmentation

Semantic segmentation data consists of images and corresponding pixel labels represented as categorical arrays. For more information, see Getting Started with Semantic Segmentation Using Deep Learning.
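For instance, a pixel label image is a categorical array with the same spatial size as the image it labels. This hypothetical two-class example shows the representation; the class names and label values are assumptions.

classNames = ["background" "flower"];     % hypothetical class names
L = uint8([0 0 1; 0 1 1; 1 1 1]);         % numeric label matrix
C = categorical(L, [0 1], classNames);    % categorical pixel label image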

You can use the Image Labeler and the Video Labeler apps to interactively label pixels and export the label data for training a neural network. If you have Automated Driving Toolbox™, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image for semantic segmentation, you must perform an identical transformation on the corresponding pixel label image. You can preprocess pixel label images using the operations in the following list and any other function that supports categorical input; a code sketch follows the list. For an example that shows how to create and apply these transformations, see Augment Pixel Labels for Semantic Segmentation.

  • Resize pixel labels: Resize pixel label images by a fixed scaling factor or to a target size. Sample output: the original pixel label image on the left and a resized pixel label image on the right.

  • Crop pixel labels: Crop a pixel label image to a target size from the center or from a random position. Sample output: a pixel label image cropped from the center on the left and a pixel label image cropped from a random position on the right.

  • Warp pixel labels: Apply random reflection, rotation, scale, shear, and translation to pixel label images. Sample output: from left to right, the original pixel label image and the resulting pixel label images after reflection, rotation, and scaling.
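A minimal sketch, assuming an image I and its categorical pixel label image C are already in the workspace: both are warped with the same transformation and the same output view, so the labels stay aligned with the pixels. imwarp accepts categorical input and uses nearest-neighbor interpolation for it, so no new class labels are introduced.

tform = randomAffine2d("Rotation", [-20 20], "XReflection", true);
rout = affineOutputView(size(I), tform);       % shared output limits

Iw = imwarp(I, tform, "OutputView", rout);     % warp the image
Cw = imwarp(C, tform, "OutputView", rout);     % identical warp for the labels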

Preprocess and Augment Bounding Boxes for Object Detection

Object detection data consists of an image and bounding boxes that describe the location and characteristics of objects in the image. For more information, see Get Started with Object Detection Using Deep Learning.

You can use the Image Labeler and the Video Labeler apps to interactively label ROIs and export the label data for training a neural network. If you have Automated Driving Toolbox, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image, you must perform an identical transformation on the corresponding bounding boxes. You can process bounding box data using the operations in the following list; a code sketch follows the list. For an example that shows how to create and apply these transformations, see Augment Bounding Boxes for Object Detection.

  • Resize bounding boxes: Resize bounding boxes by a fixed scaling factor or to a target size. Sample output: the original image with a bounding box on the left and the resized image with the correspondingly resized bounding box on the right.

  • Crop bounding boxes: Crop a bounding box to a target size from the center or from a random position. Sample output: the original image with a bounding box on the left and the cropped image with the correspondingly cropped bounding box on the right.

  • Warp bounding boxes: Apply reflection, rotation, scale, shear, and translation to bounding boxes. Sample output: from left to right, the original image with a bounding box and the resulting images with corresponding bounding boxes after reflection, rotation, and scaling.
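A minimal sketch, assuming an image I and an M-by-4 array bboxes of axis-aligned boxes in [x y width height] format are in the workspace; bboxresize, bboxcrop, and bboxwarp require Computer Vision Toolbox, and the parameter values are illustrative assumptions.

% Resize the image and the boxes by the same scale factor
scale = 0.5;
Ir = imresize(I, scale);
bboxesR = bboxresize(bboxes, scale);

% Crop to a centered window; keep only boxes that sufficiently overlap it
win = centerCropWindow2d(size(I), [256 256]);
Ic = imcrop(I, win);
[bboxesC, valid] = bboxcrop(bboxes, win, "OverlapThreshold", 0.5);

% Warp the image and the boxes with the same random affine transformation
tform = randomAffine2d("Rotation", [-15 15]);
rout = affineOutputView(size(I), tform);
Iw = imwarp(I, tform, "OutputView", rout);
bboxesW = bboxwarp(bboxes, tform, rout);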
