Normalize across all observations for each channel independently

The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as `relu`.

After normalization, the operation shifts the input by a learnable offset *β* and scales it by a learnable scale factor *γ*.

The `batchnorm` function applies the batch normalization operation to `dlarray` data. Using `dlarray` objects makes working with high dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the `'S'`, `'T'`, `'C'`, and `'B'` labels, respectively. For unspecified and other dimensions, use the `'U'` label. For `dlarray` object functions that operate over particular dimensions, you can specify the dimension labels by formatting the `dlarray` object directly, or by using the `'DataFormat'` option.

**Note**

To apply batch normalization within a `layerGraph` object or `Layer` array, use `batchNormalizationLayer`.

`dlY = batchnorm(dlX,offset,scaleFactor)` applies the batch normalization operation to the input data `dlX` and transforms it using the specified offset and scale factor.

The function normalizes over the `'S'` (spatial), `'T'` (time), `'B'` (batch), and `'U'` (unspecified) dimensions of `dlX` for each channel in the `'C'` (channel) dimension, independently.

For unformatted input data, use the `'DataFormat'` option.
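The basic syntax can be sketched as follows. The array sizes and the zero offset and unit scale factor are hypothetical values chosen only for illustration; the example assumes one offset and one scale factor per channel, supplied as column vectors.

```matlab
% Hypothetical sizes, chosen only for illustration.
miniBatchSize = 16;
inputSize = [28 28];
numChannels = 3;

% Random data laid out as height-by-width-by-channel-by-observation.
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);

% Label the dimensions: two spatial, one channel, one batch.
dlX = dlarray(X,'SSCB');

% One offset and one scale factor per channel (assumed shapes).
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Normalize over the 'S' and 'B' dimensions for each channel.
dlY = batchnorm(dlX,offset,scaleFactor);

% Inspect the size and the dimension labels of the result.
size(dlY)
dims(dlY)
```

Because the input is a formatted `dlarray`, no `'DataFormat'` option is needed here.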

`[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor)` also returns the population mean and variance of the input data `dlX`.
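A minimal sketch of the three-output syntax, again with hypothetical sizes. It assumes the returned statistics contain one value per channel, which follows from the per-channel normalization described above.

```matlab
numChannels = 3;   % hypothetical channel count

% Formatted input: 28-by-28 images, 3 channels, 16 observations.
dlX = dlarray(rand(28,28,numChannels,16),'SSCB');
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Request the population statistics alongside the normalized data.
[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor);

% View how many statistics were returned (expected: one per channel).
numel(popMu)
numel(popSigmaSq)
```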

`[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)` applies the batch normalization operation using the mean and variance `mu` and `sigmaSq`, respectively, and also returns updated moving mean and variance statistics.

Use this syntax to maintain running values for the mean and variance statistics during training. Use the final updated values of the mean and variance for prediction and classification.
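The running-statistics workflow might look like the sketch below. The loop length, the random mini-batches, and the initial values of zero mean and unit variance are assumptions made for illustration. The final call requests only `dlY` from the same five-input syntax to show the statistics being reused for prediction.

```matlab
numChannels = 3;   % hypothetical channel count
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Assumed starting point for the running statistics.
mu = zeros(numChannels,1);
sigmaSq = ones(numChannels,1);

% Stand-in for a training loop: each iteration sees a new mini-batch
% and feeds the previous running statistics back into batchnorm.
for iteration = 1:5
    dlX = dlarray(rand(28,28,numChannels,16),'SSCB');
    [dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq);
end

% After the loop, mu and sigmaSq hold the final running statistics.
% Reuse them to normalize new data at prediction time.
dlXNew = dlarray(rand(28,28,numChannels,4),'SSCB');
dlYPred = batchnorm(dlXNew,offset,scaleFactor,mu,sigmaSq);
```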

`[___] = batchnorm(___,'DataFormat',FMT)` applies the batch normalization operation to unformatted input data with format specified by `FMT` using any of the previous syntaxes. The output `dlY` is an unformatted `dlarray` object with dimensions in the same order as `dlX`. For example, `'DataFormat','SSCB'` specifies data for 2-D image input with format `'SSCB'` (spatial, spatial, channel, batch).
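As a sketch of the unformatted case, the example below creates a `dlarray` without dimension labels and supplies the layout through `'DataFormat'` instead. The sizes are hypothetical.

```matlab
numChannels = 3;   % hypothetical channel count

% Unformatted dlarray: no dimension labels are attached.
dlX = dlarray(rand(28,28,numChannels,16));
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Tell batchnorm how to interpret the dimensions of dlX.
dlY = batchnorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

% The output stays unformatted, so it carries no dimension labels.
isempty(dims(dlY))
```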

`[___] = batchnorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'MeanDecay',0.3` sets the decay rate of the moving average computation.
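A short sketch of passing a name-value pair, using the `'MeanDecay',0.3` example from the text. The sizes and the initial running statistics are hypothetical.

```matlab
numChannels = 3;   % hypothetical channel count
dlX = dlarray(rand(28,28,numChannels,16),'SSCB');
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Assumed initial running statistics.
mu = zeros(numChannels,1);
sigmaSq = ones(numChannels,1);

% Name-value pairs follow the positional input arguments.
[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
    mu,sigmaSq,'MeanDecay',0.3);
```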

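The batch normalization operation normalizes the elements *x_i* of the input by first calculating the mean *μ_B* and variance *σ_B²* over the spatial, time, and observation dimensions for each channel independently. Then, it calculates the normalized activations as

$${\widehat{x}}_{i}=\frac{{x}_{i}-{\mu}_{B}}{\sqrt{{\sigma}_{B}^{2}+\epsilon}},$$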

where *ϵ* is a constant that improves numerical
stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow batch normalization, the batch normalization operation further shifts and scales the activations using the transformation

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta ,$$

where the offset *β* and scale factor
*γ* are learnable parameters that are updated during network
training.
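The two equations above can be reproduced with plain array operations, which may help when checking results by hand. This sketch is not the `batchnorm` implementation. It assumes a numeric array in spatial-spatial-channel-batch layout, a population (divide-by-N) variance, and hypothetical values for *ϵ*, *γ*, and *β*.

```matlab
% Hypothetical data: 28-by-28 images, 3 channels, 16 observations.
X = rand(28,28,3,16);

epsilon = 1e-5;          % assumed stability constant
gamma = [1; 2; 0.5];     % hypothetical scale factor per channel
beta = [0; 0.1; -0.1];   % hypothetical offset per channel

% Normalize over every dimension except the channel dimension (3).
normDims = [1 2 4];

% Per-channel mean and population variance, each of size 1-by-1-by-3.
muB = mean(X,normDims);
sigmaSqB = var(X,1,normDims);

% Normalized activations: (x - mu) / sqrt(sigma^2 + epsilon).
Xhat = (X - muB) ./ sqrt(sigmaSqB + epsilon);

% Scale and shift, reshaping gamma and beta to align with channels.
Y = reshape(gamma,1,1,[]) .* Xhat + reshape(beta,1,1,[]);
```

After this step each channel of `Xhat` has approximately zero mean, and each channel of `Y` has a mean close to the corresponding entry of `beta`.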

To make predictions with the network after training, batch normalization requires a fixed mean and variance to normalize the data. This fixed mean and variance can be calculated from the training data after training, or approximated during training using running statistic computations.
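One common way to approximate the fixed statistics during training is an exponential moving average. The sketch below shows that idea for the mean only. The update rule, the decay value, and the sample numbers are assumptions for illustration; the text above does not state the exact rule that `batchnorm` applies.

```matlab
decay = 0.3;                 % hypothetical decay rate
mu = zeros(3,1);             % running mean before the update
batchMu = [0.5; 0.4; 0.6];   % hypothetical mean of the current mini-batch

% Blend the new batch mean into the running mean.
% A larger decay gives more weight to the most recent mini-batch.
updatedMu = decay .* batchMu + (1 - decay) .* mu;
```

The variance can be tracked the same way with its own decay value.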

`dlarray` | `dlconv` | `dlfeval` | `dlgradient` | `fullyconnect` | `groupnorm` | `layernorm` | `relu`