layernorm

Normalize across all channels for each observation independently

Syntax

``dlY = layernorm(dlX,offset,scaleFactor)``
``dlY = layernorm(dlX,offset,scaleFactor,'DataFormat',FMT)``
``[dlY] = layernorm(___,Name,Value)``

Description

The layer normalization operation normalizes the input data across all channels for each observation independently. To speed up training of recurrent and multi-layer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization after the learnable operations, such as LSTM and fully connect operations.

After normalization, the operation shifts the input by a learnable offset β and scales it by a learnable scale factor γ.

The `layernorm` function applies the layer normalization operation to `dlarray` data. Using `dlarray` objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the `'S'`, `'T'`, `'C'`, and `'B'` labels, respectively. For unspecified and other dimensions, use the `'U'` label. For `dlarray` object functions that operate over particular dimensions, you can specify the dimension labels by formatting the `dlarray` object directly, or by using the `'DataFormat'` option.
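
As a small illustration of dimension labels, the following sketch creates a formatted `dlarray` object for a hypothetical batch of 16 three-channel images of size 28-by-28 and queries its labels. The sizes and the variable names `fmt` and `channelDim` are assumptions chosen for the example.

```matlab
% Hypothetical data: 28-by-28 images, 3 channels, 16 observations.
X = rand(28,28,3,16);

% Label the dimensions as spatial, spatial, channel, batch.
dlX = dlarray(X,'SSCB');

% Query the dimension labels and locate the channel dimension.
fmt = dims(dlX);
channelDim = finddim(dlX,'C');
```

Here `fmt` holds the label string and `channelDim` holds the index of the dimension labeled `'C'`.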

Note

To apply layer normalization within a `layerGraph` object or `Layer` array, use `layerNormalizationLayer`.

`dlY = layernorm(dlX,offset,scaleFactor)` applies the layer normalization operation to the input data `dlX` and transforms it using the specified offset and scale factor. The function normalizes over the `'S'` (spatial), `'T'` (time), `'C'` (channel), and `'U'` (unspecified) dimensions of `dlX` for each observation in the `'B'` (batch) dimension, independently. For unformatted input data, use the `'DataFormat'` option.

`dlY = layernorm(dlX,offset,scaleFactor,'DataFormat',FMT)` applies the layer normalization operation to the unformatted `dlarray` object `dlX` with the format specified by `FMT`. The output `dlY` is an unformatted `dlarray` object with dimensions in the same order as `dlX`. For example, `'DataFormat','SSCB'` specifies data for 2-D image input with format `'SSCB'` (spatial, spatial, channel, batch). To specify the format of the scale and offset, use the `'ScaleFormat'` and `'OffsetFormat'` options, respectively.

`[dlY] = layernorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'Epsilon',1e-4` sets the epsilon value to `1e-4`.
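
The `'DataFormat'` and name-value syntaxes can be combined, as in this sketch for unformatted input. The image sizes are hypothetical; the option names and the `1e-4` value come from the descriptions above.

```matlab
% Hypothetical unformatted input: 28-by-28 images, 3 channels, 16 observations.
dlX = dlarray(rand(28,28,3,16));

% Channel-wise offset and scale factor, one element per channel.
offset = zeros(3,1);
scaleFactor = ones(3,1);

% Describe the dimension order with 'DataFormat' and set a larger epsilon.
dlY = layernorm(dlX,offset,scaleFactor,'DataFormat','SSCB','Epsilon',1e-4);

% The output is expected to be unformatted and the same size as the input.
outputSize = size(dlY);
outputFormat = dims(dlY);
```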

Examples

Create a formatted `dlarray` object containing a batch of 128 sequences of length 100 with 10 channels. Specify the format `'CBT'` (channel, batch, time).

```
numChannels = 10;
miniBatchSize = 128;
sequenceLength = 100;
X = rand(numChannels,miniBatchSize,sequenceLength);
dlX = dlarray(X,'CBT');
```

View the size and format of the input data.

`size(dlX)`
```ans = 1×3 10 128 100 ```
`dims(dlX)`
```ans = 'CBT' ```

For per-observation channel-wise layer normalization, initialize the offset and scale with a vector of zeros and ones, respectively.

```
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);
```

Apply the layer normalization operation using the `layernorm` function.

`dlY = layernorm(dlX,offset,scaleFactor);`

View the size and the format of the output `dlY`.

`size(dlY)`
```ans = 1×3 10 128 100 ```
`dims(dlY)`
```ans = 'CBT' ```
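
One way to check the result is to confirm that, with a zero offset and a unit scale factor, an observation of the output has approximately zero mean and unit variance. This sketch repeats the setup so that it runs on its own. The choice of observation and the names `obsMean` and `obsVariance` are arbitrary.

```matlab
numChannels = 10;
miniBatchSize = 128;
sequenceLength = 100;
dlX = dlarray(rand(numChannels,miniBatchSize,sequenceLength),'CBT');

offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);
dlY = layernorm(dlX,offset,scaleFactor);

% Extract the underlying numeric data and select the first observation
% (the second dimension is the batch dimension in 'CBT' data).
Y = extractdata(dlY);
obs = Y(:,1,:);

% Statistics over all channels and time steps of that observation.
obsMean = mean(obs(:));
obsVariance = var(obs(:),1);
```

The variance is slightly below 1 because epsilon is added to the variance before the division.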

To perform element-wise layer normalization, specify an offset and scale factor with the same size as the input data.

Create a formatted `dlarray` object containing a batch of 128 sequences of length 100 with 10 channels. Specify the format `'CBT'` (channel, batch, time).

```
numChannels = 10;
miniBatchSize = 128;
sequenceLength = 100;
X = rand(numChannels,miniBatchSize,sequenceLength);
dlX = dlarray(X,'CBT');
```

View the size and format of the input data.

`size(dlX)`
```ans = 1×3 10 128 100 ```
`dims(dlX)`
```ans = 'CBT' ```

For element-wise layer normalization, initialize the offset and scale with an array of zeros and ones, respectively.

```
offset = zeros(numChannels,sequenceLength);
scaleFactor = ones(numChannels,sequenceLength);
```

Apply the layer normalization operation using the `layernorm` function. Specify the offset and scale formats as `'CT'` (channel, time) using the `'OffsetFormat'` and `'ScaleFormat'` options, respectively.

`dlY = layernorm(dlX,offset,scaleFactor,'OffsetFormat','CT','ScaleFormat','CT');`

View the size and the format of the output `dlY`.

`size(dlY)`
```ans = 1×3 10 128 100 ```
`dims(dlY)`
```ans = 'CBT' ```
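
To make the effect of element-wise parameters visible, this sketch replaces the zero offset with a hypothetical offset that equals the time step index. Because the normalized activations of an observation have zero mean, the mean of each output observation should then equal the mean of the offset, which is 50.5 for time steps 1 through 100.

```matlab
numChannels = 10;
miniBatchSize = 128;
sequenceLength = 100;
dlX = dlarray(rand(numChannels,miniBatchSize,sequenceLength),'CBT');

% Hypothetical element-wise offset: the value at (channel, time) is the time index.
offset = repmat(1:sequenceLength,numChannels,1);
scaleFactor = ones(numChannels,sequenceLength);

dlY = layernorm(dlX,offset,scaleFactor,'OffsetFormat','CT','ScaleFormat','CT');

% Mean of the first observation over all channels and time steps.
Y = extractdata(dlY);
obs = Y(:,1,:);
obsMean = mean(obs(:));
```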

Input Arguments

Input data, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

If `dlX` is an unformatted `dlarray` or a numeric array, then you must specify the format using the `'DataFormat'` option. If `dlX` is a numeric array, then either `scaleFactor` or `offset` must be a `dlarray` object.

`dlX` must have a `'C'` (channel) dimension.
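
As a sketch of the numeric-input case described above, the following passes a plain numeric array as the data and makes the offset a `dlarray` object so that the requirement is met. The sizes are hypothetical.

```matlab
% Numeric (non-dlarray) input in channel-by-batch-by-time order.
X = rand(10,128,100);

% For numeric input data, offset or scaleFactor must be a dlarray object.
offset = dlarray(zeros(10,1));
scaleFactor = ones(10,1);

% The format of the numeric input must be given with 'DataFormat'.
dlY = layernorm(X,offset,scaleFactor,'DataFormat','CBT');
outputClass = class(dlY);
```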

Offset β, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

The size and format of the offset depend on the type of transformation.

Channel-wise transformation

Array with one nonsingleton dimension with size matching the size of the `'C'` (channel) dimension of the input `dlX`.

For channel-wise transformation, if `offset` is a formatted `dlarray` object, then the nonsingleton dimension must have label `'C'` (channel).

Element-wise transformation

Array with a `'C'` (channel) dimension of the same size as the `'C'` (channel) dimension of the input `dlX`, and with either no `'S'` (spatial), `'T'` (time), and `'U'` (unspecified) dimensions or the same number of them as the input `dlX`.

Each dimension must have size 1 or have sizes matching the corresponding dimensions in the input `dlX`. For any repeated dimensions, for example, multiple `'S'` (spatial) dimensions, the sizes must match the corresponding dimensions in `dlX` or must all be singleton.

The software automatically expands any singleton dimensions to match the size of a single observation in the input `dlX`.

For element-wise transformation, if `offset` is a numeric array or an unformatted `dlarray`, then you must specify the offset format using the `'OffsetFormat'` option.

Scale factor γ, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

The size and format of the scale factor depend on the type of transformation:

Channel-wise transformation

Array with one nonsingleton dimension with size matching the size of the `'C'` (channel) dimension of the input `dlX`.

For channel-wise transformation, if `scaleFactor` is a formatted `dlarray` object, then the nonsingleton dimension must have label `'C'` (channel).

Element-wise transformation

Array with a `'C'` (channel) dimension of the same size as the `'C'` (channel) dimension of the input `dlX`, and with either no `'S'` (spatial), `'T'` (time), and `'U'` (unspecified) dimensions or the same number of them as the input `dlX`.

Each dimension must have size 1 or have sizes matching the corresponding dimensions in the input `dlX`. For any repeated dimensions, for example, multiple `'S'` (spatial) dimensions, the sizes must match the corresponding dimensions in `dlX` or must all be singleton.

The software automatically expands any singleton dimensions to match the size of a single observation in the input `dlX`.

For element-wise transformation, if `scaleFactor` is a numeric array or an unformatted `dlarray`, then you must specify the scale format using the `'ScaleFormat'` option.
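
An alternative to the `'OffsetFormat'` and `'ScaleFormat'` options is to pass the offset and scale factor as formatted `dlarray` objects. The following is a minimal sketch with hypothetical sizes, assuming that formatted parameters carry their own dimension labels as the descriptions above suggest.

```matlab
% Formatted input: 10 channels, 128 observations, 100 time steps.
dlX = dlarray(rand(10,128,100),'CBT');

% Element-wise parameters labeled 'CT' directly, so no format options are needed.
offset = dlarray(zeros(10,100),'CT');
scaleFactor = dlarray(ones(10,100),'CT');

dlY = layernorm(dlX,offset,scaleFactor);
```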

Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Epsilon',1e-4` sets the variance offset value to `1e-4`.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of `'DataFormat'` and a character vector or string scalar `FMT` that provides a label for each dimension of the data.

When specifying the format of a `dlarray` object, each character provides a label for each dimension of the data and must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, time steps of sequences)

• `'U'` — Unspecified

You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the labels `'C'`, `'B'`, and `'T'` at most once.

You must specify `'DataFormat'` when the input data is not a formatted `dlarray`.

Example: `'DataFormat','SSCB'`

Data Types: `char` | `string`

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of `'Epsilon'` and a numeric scalar. The specified value must be greater than or equal to `1e-5`. The default value is `1e-5`.

Data Types: `single` | `double`
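
The role of the variance offset can be seen without any toolbox functions. In this plain MATLAB sketch, a constant input has zero variance, so normalizing without epsilon divides zero by zero, while adding epsilon gives a finite result. The variable names are hypothetical.

```matlab
% Constant input: every element equals the mean, so the variance is zero.
x = 5*ones(10,1);
mu = mean(x);
sigma2 = var(x,1);

% Without a variance offset, the normalization evaluates 0/0.
yWithoutEpsilon = (x - mu) ./ sqrt(sigma2);

% With the default variance offset, the result is finite.
epsilon = 1e-5;
yWithEpsilon = (x - mu) ./ sqrt(sigma2 + epsilon);
```

`yWithoutEpsilon` contains only `NaN` values, and `yWithEpsilon` contains only zeros.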

Dimension order of unformatted scale factor, specified as the comma-separated pair consisting of `'ScaleFormat'` and a character vector or string scalar.

When specifying the format of a `dlarray` object, each character provides a label for each dimension of the data and must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, time steps of sequences)

• `'U'` — Unspecified

For layer normalization, the scale factor must have a `'C'` (channel) dimension. You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the label `'T'` (time) at most once. The scale factor must not have a `'B'` (batch) dimension.

You must specify `'ScaleFormat'` for element-wise normalization when `scaleFactor` is a numeric array or an unformatted `dlarray`.

Example: `'ScaleFormat','SSC'`

Data Types: `char` | `string`

Dimension order of unformatted offset, specified as the comma-separated pair consisting of `'OffsetFormat'` and a character vector or string scalar.

When specifying the format of a `dlarray` object, each character provides a label for each dimension of the data and must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, time steps of sequences)

• `'U'` — Unspecified

For layer normalization, the offset must have a `'C'` (channel) dimension. You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the label `'T'` (time) at most once. The offset must not have a `'B'` (batch) dimension.

You must specify `'OffsetFormat'` for element-wise normalization when `offset` is a numeric array or an unformatted `dlarray`.

Example: `'OffsetFormat','SSC'`

Data Types: `char` | `string`

Output Arguments

Normalized data, returned as a `dlarray`. The output `dlY` has the same underlying data type as the input `dlX`.

If the input data `dlX` is a formatted `dlarray`, `dlY` has the same dimension labels as `dlX`. If the input data is not a formatted `dlarray`, `dlY` is an unformatted `dlarray` with the same dimension order as the input data.

Algorithms

The layer normalization operation normalizes the elements `$x_i$` of the input by first calculating the mean `$\mu_L$` and variance `$\sigma_L^2$` over the spatial, time, and channel dimensions for each observation independently. Then, it calculates the normalized activations as

`$\hat{x}_i = \frac{x_i - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}},$`

where ϵ is a constant that improves numerical stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow layer normalization, the layer normalization operation further shifts and scales the activations using the transformation

`$y_i = \gamma \hat{x}_i + \beta,$`

where the offset β and scale factor γ are learnable parameters that are updated during network training.
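
The two formulas can be sketched in plain MATLAB without `dlarray` objects. This is an illustration under stated assumptions, not the toolbox implementation: the data sizes are hypothetical, the array is taken to be in channel-by-batch-by-time order, and the variance is assumed to be the population variance (normalized by the number of elements).

```matlab
% Hypothetical data: 10 channels, 4 observations, 25 time steps.
X = rand(10,4,25);
epsilon = 1e-5;

% Channel-wise scale factor and offset, one element per channel.
scaleFactor = ones(10,1);
offset = zeros(10,1);

% Mean and variance over channel and time (dimensions 1 and 3),
% computed separately for each observation (dimension 2).
% Assumption: population variance, selected by the weight argument 1.
mu = mean(X,[1 3]);
sigma2 = var(X,1,[1 3]);

% Normalize, then scale and shift. Implicit expansion broadcasts the
% per-observation statistics and the per-channel parameters.
Xhat = (X - mu) ./ sqrt(sigma2 + epsilon);
Y = scaleFactor .* Xhat + offset;
```

With a unit scale factor and zero offset, each observation of `Y` has a mean of approximately 0 and a variance slightly below 1.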

Introduced in R2021a