batchnorm

Normalize data across all observations for each channel independently

Syntax

Y = batchnorm(X,offset,scaleFactor)

[Y,popMu,popSigmaSq] = batchnorm(X,offset,scaleFactor)

[Y,updatedMu,updatedSigmaSq] = batchnorm(X,offset,scaleFactor,runningMu,runningSigmaSq)

Y = batchnorm(X,offset,scaleFactor,trainedMu,trainedSigmaSq)

[___] = batchnorm(___,'DataFormat',FMT)

[___] = batchnorm(___,Name,Value)

Description

The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of a convolutional neural network and reduce its sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.

After normalization, the operation scales the normalized data by a learnable scale factor γ and then shifts it by a learnable offset β.

The batchnorm function applies the batch normalization operation to dlarray data. Using dlarray objects makes working with high dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the "S", "T", "C", and "B" labels, respectively. For unspecified and other dimensions, use the "U" label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the DataFormat option.

Note

To apply batch normalization within a dlnetwork object, use batchNormalizationLayer.

Y = batchnorm(X,offset,scaleFactor) applies the batch normalization operation to the input data X using the population mean and variance of the input data and the specified offset and scale factor.

The function normalizes over the 'S' (spatial), 'T' (time), 'B' (batch), and 'U' (unspecified) dimensions of X for each channel in the 'C' (channel) dimension, independently.

For unformatted input data, use the 'DataFormat' option.

[Y,popMu,popSigmaSq] = batchnorm(X,offset,scaleFactor) applies the batch normalization operation and also returns the population mean and variance of the input data X.

[Y,updatedMu,updatedSigmaSq] = batchnorm(X,offset,scaleFactor,runningMu,runningSigmaSq) applies the batch normalization operation and also returns the updated moving mean and variance statistics. runningMu and runningSigmaSq are the mean and variance values after the previous training iteration, respectively.

Use this syntax to maintain running values for the mean and variance statistics during training. When you have finished training, use the final updated values of the mean and variance for the batch normalization operation during prediction and classification.
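The following is a minimal sketch of this pattern, not a complete training loop. The synthetic mini-batches, the loop variable names, and the initial running statistics (zeros for the mean, ones for the variance) are illustrative assumptions.

```matlab
numChannels = 3;
numIterations = 3;                       % hypothetical number of training iterations
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Assumed initial running statistics: zero mean, unit variance.
runningMu = zeros(numChannels,1);
runningSigmaSq = ones(numChannels,1);

for iteration = 1:numIterations
    % Synthetic mini-batch standing in for real training data.
    dlX = dlarray(rand(8,8,numChannels,16),'SSCB');

    % Normalize the mini-batch and update the running values.
    [dlY,runningMu,runningSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
        runningMu,runningSigmaSq);
end

% After the loop, runningMu and runningSigmaSq hold the values to pass
% to batchnorm at prediction time.
```

Feeding each iteration's outputs back in as the next iteration's inputs is what makes the statistics "running" values.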

Y = batchnorm(X,offset,scaleFactor,trainedMu,trainedSigmaSq) applies the batch normalization operation using the mean trainedMu and variance trainedSigmaSq.

Use this syntax during classification and prediction, where trainedMu and trainedSigmaSq are the final values of the mean and variance after you have finished training, respectively.
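As a hedged illustration of the prediction syntax, the sketch below uses placeholder statistics in place of values obtained from training. The numbers chosen for `trainedMu` and `trainedSigmaSq` and the variable `dlXNew` are assumptions for demonstration only.

```matlab
numChannels = 3;
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Placeholder statistics. In practice, use the final updatedMu and
% updatedSigmaSq outputs from training.
trainedMu = 0.5*ones(numChannels,1);
trainedSigmaSq = (1/12)*ones(numChannels,1);

% New data to normalize at prediction time.
dlXNew = dlarray(rand(8,8,numChannels,4),'SSCB');

% With five inputs and one output, the function normalizes with the
% supplied statistics rather than the statistics of dlXNew.
dlYPred = batchnorm(dlXNew,offset,scaleFactor,trainedMu,trainedSigmaSq);
```

Because the statistics are fixed, the result for one observation does not depend on which other observations are in the batch.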

[___] = batchnorm(___,'DataFormat',FMT) applies the batch normalization operation to unformatted input data with format specified by FMT using any of the input or output combinations in previous syntaxes. The output Y is an unformatted dlarray object with dimensions in the same order as X. For example, 'DataFormat','SSCB' specifies data for 2-D image input with the format 'SSCB' (spatial, spatial, channel, batch).
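A short sketch of the unformatted case, assuming random image-like data with three channels:

```matlab
numChannels = 3;
X = dlarray(rand(28,28,numChannels,16));      % unformatted dlarray, no labels
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Describe the layout of X through the DataFormat option instead.
Y = batchnorm(X,offset,scaleFactor,'DataFormat','SSCB');

% Y is unformatted, so dims returns an empty value.
fmt = dims(Y);
```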

[___] = batchnorm(___,Name,Value) specifies additional options using one or more name-value pair arguments. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation.

Examples

Apply Batch Normalization

Create a formatted dlarray object containing a batch of 128 28-by-28 images with 3 channels. Specify the format 'SSCB' (spatial, spatial, channel, batch).

miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
dlX = dlarray(X,'SSCB');

View the size and format of the input data.

size(dlX)

ans = 1×4

    28    28     3   128

dims(dlX)

ans = 
'SSCB'

Initialize the scale and offset for batch normalization. For the scale, specify a vector of ones. For the offset, specify a vector of zeros.

scaleFactor = ones(numChannels,1);
offset = zeros(numChannels,1);

Apply the batch normalization operation using the batchnorm function and return the mini-batch statistics.

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);

View the size and format of the output dlY.

size(dlY)

ans = 1×4

    28    28     3   128

dims(dlY)

ans = 
'SSCB'

View the mini-batch mean mu.

mu

View the mini-batch variance sigmaSq.

sigmaSq

sigmaSq = 3×1

    0.0831
    0.0832
    0.0835

Update Mean and Variance over Multiple Batches of Data

Use the batchnorm function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of 1.5 and 2.5, respectively, so the mean of the data set increases with each batch.

height = 10;
width = 10;
numChannels = 5;
observations = 20;

X1 = rand(height,width,numChannels,observations);
dlX1 = dlarray(X1,"SSCB");

X2 = 1.5*rand(height,width,numChannels,observations);
dlX2 = dlarray(X2,"SSCB");

X3 = 2.5*rand(height,width,numChannels,observations);
dlX3 = dlarray(X3,"SSCB");

Create the learnable parameters.

offset = zeros(numChannels,1);
scale = ones(numChannels,1);

Normalize the first batch of data dlX1 using batchnorm. Obtain the values of the mean and variance of this batch as outputs.

[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);

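Normalize the second batch of data dlX2. Use mu and sigmaSq as the running statistics to obtain updated values of the mean and variance that blend the statistics of dlX1 with those of dlX2. With the default decay values of 0.1, the update is a moving average weighted toward the earlier statistics, not an exact pooled mean and variance.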

[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);

Normalize the final batch of data dlX3. Update the data set statistics datasetMu and datasetSigmaSq to obtain the moving-average mean and variance after all three batches dlX1, dlX2, and dlX3.

[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);

Observe the change in the mean of each channel as each batch is normalized.

plot([mu datasetMu datasetMuFull]')
legend("Channel " + string(1:5),"Location","southeast")
xticks([1 2 3])
xlabel("Number of Batches")
xlim([0.9 3.1])
ylabel("Per-Channel Mean")
title("Data Set Mean")

Figure: Data Set Mean. Line plot of Per-Channel Mean against Number of Batches, with one line for each of Channel 1 through Channel 5.

Input Arguments

`X` — Input data
`dlarray` | numeric array

Input data, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

If X is an unformatted dlarray or a numeric array, then you must specify the format using the DataFormat option. If X is a numeric array, then either scaleFactor or offset must be a dlarray object.
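For instance, the following sketch passes a plain numeric array as X. To satisfy the requirement above, offset is wrapped in a dlarray; the sizes are arbitrary assumptions.

```matlab
numChannels = 3;
X = rand(28,28,numChannels,16);               % numeric array, not a dlarray
offset = dlarray(zeros(numChannels,1));       % at least one parameter is a dlarray
scaleFactor = ones(numChannels,1);

% X carries no labels, so the format comes from the DataFormat option.
Y = batchnorm(X,offset,scaleFactor,'DataFormat','SSCB');
outputClass = class(Y);
```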

X must have a "C" (channel) dimension.

`offset` — Offset
`dlarray` | numeric array

Offset β, specified as a formatted dlarray, an unformatted dlarray, or a numeric array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input X.

If offset is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).

`scaleFactor` — Scale factor
`dlarray` | numeric array

Scale factor γ, specified as a formatted dlarray, an unformatted dlarray, or a numeric array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input X.

If scaleFactor is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).

`runningMu` — Running value of mean statistic
numeric vector

Running value of mean statistic, specified as a numeric vector of the same length as the 'C' dimension of the input data.

To maintain a running value for the mean during training, provide runningMu as the updatedMu output of the previous training iteration.

Data Types: single | double

`runningSigmaSq` — Running value of variance statistic
numeric vector

Running value of variance statistic, specified as a numeric vector of the same length as the 'C' dimension of the input data.

To maintain a running value for the variance during training, provide runningSigmaSq as the updatedSigmaSq output of the previous training iteration.

Data Types: single | double

`trainedMu` — Final value of mean statistic after training
numeric vector

Final value of mean statistic after training, specified as a numeric vector of the same length as the 'C' dimension of the input data.

During classification and prediction, provide trainedMu as the updatedMu output of the final training iteration.

Data Types: single | double

`trainedSigmaSq` — Final value of variance statistic after training
numeric vector

Final value of variance statistic after training, specified as a numeric vector of the same length as the 'C' dimension of the input data.

During classification and prediction, provide trainedSigmaSq as the updatedSigmaSq output of the final training iteration.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'MeanDecay',0.3,'VarianceDecay',0.5 sets the decay rate for the moving average computations of the mean and variance of several batches of data to 0.3 and 0.5, respectively.

`DataFormat` — Description of data dimensions
character vector | string scalar

Description of the data dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

If the input data is not a formatted dlarray object, then you must specify the DataFormat option.

For more information, see Deep Learning Data Formats.

Data Types: char | string
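The "CBT" case described above can be sketched as follows. The sizes (12 channels, 8 observations, 20 time steps) are arbitrary assumptions.

```matlab
numChannels = 12;
numObservations = 8;
numTimeSteps = 20;

% Channels in dimension 1, observations in dimension 2, time steps in dimension 3.
dlX = dlarray(rand(numChannels,numObservations,numTimeSteps),'CBT');

offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Statistics are computed per channel, over the batch and time dimensions.
[dlY,mu] = batchnorm(dlX,offset,scaleFactor);
outputFormat = dims(dlY);
```

Because dlX is already formatted, no DataFormat option is needed here.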

`Epsilon` — Constant to add to mini-batch variances
`1e-5` (default) | positive scalar

Constant to add to the mini-batch variances, specified as a positive scalar.

The software adds this constant to the mini-batch variances before normalization to ensure numerical stability and avoid division by zero.

Before R2023a: Epsilon must be greater than or equal to 1e-5.
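To see why the constant matters, consider a channel whose values are all identical, so its variance is exactly zero. The sketch below works the normalization formula by hand in plain MATLAB rather than calling batchnorm; the data values are illustrative.

```matlab
x = [2 2 2 2];            % one channel with constant activations
mu = mean(x);
sigmaSq = var(x,1);       % variance normalized by the number of elements; 0 here
epsilon = 1e-5;           % default value of the Epsilon option

withoutEpsilon = (x - mu)./sqrt(sigmaSq);            % 0/0 gives NaN
withEpsilon = (x - mu)./sqrt(sigmaSq + epsilon);     % finite result: all zeros
```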

`MeanDecay` — Decay value for moving mean computation
`0.1` (default) | numeric scalar between `0` and `1`

Decay value for the moving mean computation, specified as a numeric scalar between 0 and 1.

The function updates the moving mean value using

$μ^{*} = λ_{μ} \hat{μ} + (1 - λ_{μ}) μ,$

where $μ^{*}$ denotes the updated mean updatedMu, $λ_{μ}$ denotes the mean decay value 'MeanDecay', $\hat{μ}$ denotes the mean of the input data, and $μ$ denotes the current value of the mean runningMu.

Data Types: single | double
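The sketch below compares the updated mean returned by batchnorm with the same update written out by hand. The running statistics are hypothetical starting values, and the two results should agree up to floating-point error if the function follows the formula above.

```matlab
numChannels = 5;
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);
dlX = dlarray(rand(10,10,numChannels,20),'SSCB');

% Mean of the current batch.
[~,batchMu] = batchnorm(dlX,offset,scaleFactor);

% Hypothetical running statistics from a previous iteration.
runningMu = zeros(numChannels,1);
runningSigmaSq = ones(numChannels,1);

meanDecay = 0.3;
[~,updatedMu] = batchnorm(dlX,offset,scaleFactor,runningMu,runningSigmaSq, ...
    'MeanDecay',meanDecay);

% Same update written out by hand.
expectedMu = meanDecay*batchMu + (1 - meanDecay)*runningMu;
```

A larger decay value gives more weight to the current batch, and a smaller value gives more weight to the history.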

`VarianceDecay` — Decay value for moving variance computation
`0.1` (default) | numeric scalar between `0` and `1`

Decay value for the moving variance computation, specified as a numeric scalar between 0 and 1.

The function updates the moving variance value using

${σ^{2}}^{*} = λ_{σ^{2}} \hat{σ^{2}} + (1 - λ_{σ^{2}}) σ^{2},$

where ${σ^{2}}^{*}$ denotes the updated variance updatedSigmaSq, $λ_{σ^{2}}$ denotes the variance decay value 'VarianceDecay', $\hat{σ^{2}}$ denotes the variance of the input data, and $σ^{2}$ denotes the current value of the variance runningSigmaSq.

Data Types: single | double
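As a worked instance of the update rule, take illustrative values (not from any real data set) of $λ_{σ^{2}} = 0.1$, a batch variance of $\hat{σ^{2}} = 0.09$, and a current running variance of $σ^{2} = 0.05$. The updated variance is then

${σ^{2}}^{*} = 0.1 \times 0.09 + 0.9 \times 0.05 = 0.009 + 0.045 = 0.054.$

With the default decay, a single batch moves the running variance only one tenth of the way toward the batch value.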

Output Arguments

`Y` — Normalized data
`dlarray`

Normalized data, returned as a dlarray with the same underlying data type as X.

If the input data X is a formatted dlarray, then Y has the same format as X. If the input data is not a formatted dlarray, then Y is an unformatted dlarray with the same dimension order as the input data.

The size of the output Y matches the size of the input X.

`popMu` — Per-channel mean
numeric column vector

Per-channel mean of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

`popSigmaSq` — Per-channel variance
numeric column vector

Per-channel variance of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

`updatedMu` — Updated mean statistic
numeric vector

Updated mean statistic, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data.

The function updates the moving mean value using

$μ^{*} = λ_{μ} \hat{μ} + (1 - λ_{μ}) μ,$

`updatedSigmaSq` — Updated variance statistic
numeric vector

Updated variance statistic, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data.

The function updates the moving variance value using

${σ^{2}}^{*} = λ_{σ^{2}} \hat{σ^{2}} + (1 - λ_{σ^{2}}) σ^{2},$

Algorithms

Batch Normalization

The batch normalization operation normalizes the elements $x_{i}$ of the input by first calculating the mean $μ_{B}$ and variance $σ_{B}^{2}$ over the spatial, time, and observation dimensions for each channel independently. Then, it calculates the normalized activations as

$\hat{x}_{i} = \frac{x_{i} - μ_{B}}{\sqrt{σ_{B}^{2} + ϵ}},$

where ϵ is a constant that improves numerical stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow batch normalization, the batch normalization operation further shifts and scales the activations using the transformation

$y_{i} = γ {\hat{x}}_{i} + β,$

where the offset β and scale factor γ are learnable parameters that are updated during network training.

To make predictions with the network after training, batch normalization requires a fixed mean and variance to normalize the data. This fixed mean and variance can be calculated from the training data after training, or approximated during training using running statistic computations.
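The two formulas above can be sketched in plain MATLAB without calling batchnorm. This is an illustration of the arithmetic, not the function's actual implementation. It assumes the variance is normalized by the number of elements, and the `gamma` and `beta` values are hypothetical.

```matlab
X = rand(4,4,3,8);                 % height-by-width-by-channel-by-observation
gamma = [1; 2; 0.5];               % hypothetical scale factor, one value per channel
beta = [0; 1; -1];                 % hypothetical offset, one value per channel
epsilon = 1e-5;

% Per-channel statistics over the spatial and observation dimensions.
muB = mean(X,[1 2 4]);             % 1-by-1-by-3
sigmaSqB = var(X,1,[1 2 4]);       % assumed: normalized by the number of elements

% Normalize, then scale and shift. Implicit expansion applies the
% per-channel values along the third dimension.
Xhat = (X - muB)./sqrt(sigmaSqB + epsilon);
Y = reshape(gamma,1,1,[]).*Xhat + reshape(beta,1,1,[]);
```

After this transformation, each channel of Y has a mean close to its `beta` value and a standard deviation close to its `gamma` value.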

Deep Learning Array Formats

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

To create formatted input data, create a dlarray object and specify the format using the second argument.

To provide additional layout information with unformatted data, specify the format using the DataFormat argument.

For more information, see Deep Learning Data Formats.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Code generation does not support the use of name-value arguments 'MeanDecay' and 'VarianceDecay'.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Refer to the usage notes and limitations in the C/C++ Code Generation section. The same limitations apply to GPU code generation.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The batchnorm function supports GPU array input with these usage notes and limitations:

When at least one of the following input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
- X
- offset
- scaleFactor

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
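A minimal sketch of GPU use, assuming single-precision data. The `canUseGPU` check lets the same code fall back to the CPU when no supported GPU is available.

```matlab
numChannels = 3;
X = rand(28,28,numChannels,16,'single');

% Move the data to the GPU only if one is available.
if canUseGPU
    X = gpuArray(X);
end

dlX = dlarray(X,'SSCB');
offset = zeros(numChannels,1,'single');
scaleFactor = ones(numChannels,1,'single');

% Runs on the GPU when the underlying data of dlX is a gpuArray.
dlY = batchnorm(dlX,offset,scaleFactor);
```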

Version History

Introduced in R2019b

R2023a: `Epsilon` supports values less than `1e-5`

The Epsilon option also supports positive values less than 1e-5.
