indexcrossentropy
Syntax

loss = indexcrossentropy(Y,targets)
loss = indexcrossentropy(___,Name=Value)

Description
The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks.
Index cross-entropy loss, also known as sparse cross-entropy loss, is a more memory and computationally efficient alternative to the standard cross-entropy loss algorithm. It does not require binary or one-hot encoded targets. Instead, the function requires targets specified as integer class indices. Index cross-entropy loss is particularly well-suited to targets that span many classes, where one-hot encoded data presents unnecessary memory overhead.
loss = indexcrossentropy(Y,targets) calculates the categorical cross-entropy loss between the formatted predictions Y and the integer class indices targets for single-label classification tasks.
For unformatted input data, use the DataFormat argument.
loss = indexcrossentropy(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of the input arguments from previous syntaxes. For example, DataFormat="BC" specifies that the first and second dimensions of the input data correspond to the batch and channel dimensions, respectively.
Examples
Index Cross-Entropy Loss for Single-Label Classification
Create an array of prediction scores for seven observations over five classes.
numClasses = 5;
numObservations = 7;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)
Y =
  5(C) x 7(B) dlarray

    0.2205    0.1175    0.1140    0.1153    0.1963    0.2416    0.3104
    0.2415    0.1408    0.2571    0.1526    0.1056    0.2381    0.1582
    0.1109    0.1842    0.2537    0.2500    0.2381    0.1677    0.2021
    0.2434    0.2777    0.1583    0.2210    0.2592    0.2182    0.1605
    0.1837    0.2798    0.2169    0.2612    0.2008    0.1344    0.1688
Create an array of targets specified as class indices.
T = randi(numClasses,[1 numObservations])
T = 1×7
5 4 2 5 1 3 2
Compute the index cross-entropy loss between the predictions and the targets.
loss = indexcrossentropy(Y,T)
loss = 1x1 dlarray 1.5620
Weighted Index Cross-Entropy Loss
Create an array of prediction scores for seven observations over five classes.
numClasses = 5;
numObservations = 7;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)
Y =
  5(C) x 7(B) dlarray

    0.2205    0.1175    0.1140    0.1153    0.1963    0.2416    0.3104
    0.2415    0.1408    0.2571    0.1526    0.1056    0.2381    0.1582
    0.1109    0.1842    0.2537    0.2500    0.2381    0.1677    0.2021
    0.2434    0.2777    0.1583    0.2210    0.2592    0.2182    0.1605
    0.1837    0.2798    0.2169    0.2612    0.2008    0.1344    0.1688
Create an array of targets specified as class indices.
T = randi(numClasses,[1 numObservations])
T = 1×7
5 4 2 5 1 3 2
Compute the weighted cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of "UC" (unspecified, channel) using the WeightsFormat argument.
weights = rand(1,numClasses)
weights = 1×5
0.7655 0.7952 0.1869 0.4898 0.4456
loss = indexcrossentropy(Y,T,weights,WeightsFormat="UC")
loss = 1x1 dlarray 0.8725
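You can also carry the class-weight dimensions in a formatted dlarray object instead of using the WeightsFormat argument. This alternative call is a sketch rather than part of the original example; it assumes that a 1-by-numClasses vector labeled "UC" is interpreted the same way as the WeightsFormat="UC" call above.

weightsFormatted = dlarray(weights,"UC");      % singleton "U" dimension, 5-element "C" dimension
loss = indexcrossentropy(Y,T,weightsFormatted)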
Input Arguments
Y — Predictions
dlarray object | numeric array
Predictions, specified as a formatted or unformatted dlarray object, or a numeric array. When Y is not a formatted dlarray, you must specify the dimension format using the DataFormat argument.
If Y is a numeric array, targets must be a dlarray object.
targets — Target classification labels
dlarray object | numeric array
Target classification labels, specified as a formatted or unformatted dlarray object, or a numeric array.
Specify the targets as an array containing integer class indices with the same size and format as Y, excluding the channel dimension. Each element of targets must be a positive integer less than or equal to the size of the channel dimension of Y (the number of classes), or equal to the MaskIndex argument value.
If targets and Y are formatted dlarray objects, then the format of targets must be the same as the format of Y, excluding the "C" (channel) dimension. If targets is a formatted dlarray object and Y is not a formatted dlarray object, then the format of targets must be the same as the DataFormat argument value, excluding the "C" (channel) dimension.
If targets is an unformatted dlarray or a numeric array, then the function applies the format of Y or the value of DataFormat to targets.
Tip
Formatted dlarray objects automatically permute the dimensions of the underlying data to have the order "S" (spatial), "C" (channel), "B" (batch), "T" (time), then "U" (unspecified). To ensure that the dimensions of Y and targets are consistent, when Y is a formatted dlarray, also specify targets as a formatted dlarray.
weights — Weights
dlarray object | numeric array
Weights, specified as a dlarray object or a numeric array.
To specify class weights, specify a vector with a "C" (channel) dimension with size matching the "C" (channel) dimension of Y and a singleton "U" (unspecified) dimension. Specify the dimensions of the class weights by using a formatted dlarray object or by using the WeightsFormat argument.
To specify observation weights, specify a vector with a "B" (batch) dimension with size matching the "B" (batch) dimension of Y. Specify the "B" (batch) dimension of the observation weights by using a formatted dlarray object or by using the WeightsFormat argument.
To specify weights for each element of the input independently, specify the weights as an array of the same size as Y. In this case, if weights is not a formatted dlarray object, then the function uses the same format as Y. Alternatively, specify the weights format using the WeightsFormat argument.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: indexcrossentropy(Y,T,DataFormat="BC") specifies that the first and second dimensions of the input data correspond to the batch and channel dimensions, respectively.
MaskIndex — Masked value index
0 (default) | numeric scalar
Masked value index, specified as a numeric scalar.
The function excludes elements of the input data from loss computation when the target elements match the mask index.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
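For illustration, this minimal sketch (not from the original page) pads sequence targets with the default mask index 0 so that the padded time steps do not contribute to the loss.

numClasses = 5; numObservations = 3; numTimeSteps = 4;
Y = softmax(dlarray(rand(numClasses,numObservations,numTimeSteps),"CBT"));
T = randi(numClasses,[1 numObservations numTimeSteps]);   % integer class indices
T(1,2,3:end) = 0;                                         % mark padded time steps with the mask index
loss = indexcrossentropy(Y,T)                             % masked elements are excluded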
Reduction — Loss value array reduction mode
"sum" (default) | "none"
Loss value array reduction mode, specified as "sum" or "none".
If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.
If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.
NormalizationFactor — Divisor for normalizing reduced loss
"batch-size" (default) | "all-elements" | "target-included" | "none"
Divisor for normalizing the reduced loss, specified as one of these options:
- "batch-size" — Normalize the loss by dividing it by the number of observations in Y.
- "all-elements" — Normalize the loss by dividing it by the number of elements of Y.
- "target-included" — Normalize the loss by dividing the loss values by the product of the number of observations and the number of elements that are not excluded according to the MaskIndex argument.
- "none" — Do not normalize the loss.
If Reduction is "none", then this option has no effect.
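For example, this short sketch (not from the original page) compares two normalization factors on the same data.

Y = softmax(dlarray(rand(5,7),"CB"));
T = randi(5,[1 7]);
lossBatchSize = indexcrossentropy(Y,T)                                        % default: divide by the number of observations
lossAllElements = indexcrossentropy(Y,T,NormalizationFactor="all-elements")   % divide by the number of elements of Y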
DataFormat — Description of data dimensions
character vector | string scalar
Description of the data dimensions, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S" — Spatial
- "C" — Channel
- "B" — Batch
- "T" — Time
- "U" — Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.
If the input data is not a formatted dlarray object, then you must specify the DataFormat option.
For more information, see Deep Learning Data Formats.
Data Types: char | string
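For illustration, this minimal sketch (not from the original page) passes unformatted numeric predictions with DataFormat; because Y is a numeric array here, the targets are passed as a dlarray object.

Y = rand(5,7);
Y = Y./sum(Y,1);                  % normalize columns so each observation is a probability distribution
T = dlarray(randi(5,[1 7]));      % targets must be a dlarray object when Y is numeric
loss = indexcrossentropy(Y,T,DataFormat="CB")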
WeightsFormat — Description of dimensions of weights
character vector | string scalar
Description of the dimensions of the weights, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S" — Spatial
- "C" — Channel
- "B" — Batch
- "T" — Time
- "U" — Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.
If weights is a numeric vector and Y has two or more nonsingleton dimensions, then you must specify the WeightsFormat option.
If weights is not a vector, or weights and Y are both vectors, then the default value of WeightsFormat is the same as the format of Y.
For more information, see Deep Learning Data Formats.
Data Types: char | string
Output Arguments
loss — Index cross-entropy loss
unformatted dlarray object
Index cross-entropy loss, returned as an unformatted dlarray object with the same underlying data type as the input Y.
If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.
If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.
Algorithms
Index Cross-Entropy Loss
Index cross-entropy loss, also known as sparse cross-entropy loss, is a more memory and computationally efficient alternative to the standard cross-entropy loss algorithm. It does not require binary or one-hot encoded targets. Instead, the function requires targets specified as integer class indices. Index cross-entropy loss is particularly well-suited to targets that span many classes, where one-hot encoded data presents unnecessary memory overhead.
In particular, for each prediction in the input, the standard cross-entropy loss function requires targets specified as 1-by-K vectors, each containing only one nonzero element. To avoid the dense encoding of the zero and nonzero elements, the index cross-entropy function requires targets specified as scalars that represent the indices of the nonzero elements.
For single-label classification, the standard cross-entropy function uses the formula

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} T_{ni}\,\ln Y_{ni},$$

where T is an array of one-hot encoded targets, Y is an array of predictions, and N and K are the numbers of observations and classes, respectively.
For single-label classification, the index cross-entropy loss function uses the formula

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N} \ln Y_{T_n,\,n},$$

where T is an array of targets specified as class indices, so that T_n is the index of the correct class for observation n.
This table shows the index cross-entropy loss formulas for different tasks.
Task | Description | Loss
--- | --- | ---
Single-label classification | Index cross-entropy loss for mutually exclusive classes. This is useful when observations must have only a single label. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N} \ln Y_{T_n,\,n}$, where N is the number of observations.
Single-label classification with weighted classes | Index cross-entropy loss with class weights. This is useful for datasets with imbalanced classes. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N} w_{T_n} \ln Y_{T_n,\,n}$, where N is the number of observations, and $w_i$ denotes the weight for class i.
Sequence-to-sequence classification | Index cross-entropy loss with masked time steps. This is useful for ignoring loss values that correspond to padded data. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{s=1}^{S} \mathbf{1}\{T_{ns} \neq m\}\, \ln Y_{T_{ns},\,n,\,s}$, where $\mathbf{1}\{T_{ns} \neq m\}$ equals 1 when the target element is not masked and 0 otherwise, N, S, and K are the numbers of observations, time steps, and classes, respectively, and m denotes the mask index.
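As a cross-check on the single-label formula, this minimal sketch (not part of the original page) computes the loss directly and compares it with indexcrossentropy, assuming the default "sum" reduction and "batch-size" normalization. The two values should agree up to floating-point rounding.

K = 5; N = 7;
Y = rand(K,N); Y = Y./sum(Y,1);                       % column-wise probability distributions
T = randi(K,[1 N]);                                   % integer class indices
manualLoss = -sum(log(Y(sub2ind([K N],T,1:N))))/N     % -1/N times the sum of target-class log probabilities
builtinLoss = indexcrossentropy(dlarray(Y,"CB"),T)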
Deep Learning Array Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
"S"
— Spatial"C"
— Channel"B"
— Batch"T"
— Time"U"
— Unspecified
For example, consider an array containing a batch of sequences where the first, second,
and third dimensions correspond to channels, observations, and time steps, respectively. You
can specify that this array has the format "CBT"
(channel, batch,
time).
To create formatted input data, create a dlarray
object and specify the format using the second argument.
To provide additional layout information with unformatted data, specify the formats using the DataFormat
and WeightsFormat
arguments.
For more information, see Deep Learning Data Formats.
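For example, this brief sketch (not from the original page) creates formatted sequence data directly.

X = rand(3,16,10);            % 3 channels, 16 observations, 10 time steps
dlX = dlarray(X,"CBT")        % formatted dlarray with channel, batch, and time dimensions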
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The indexcrossentropy function supports GPU array input with these usage notes and limitations:
- When at least one of these input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU: Y, targets, weights, MaskIndex.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
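For illustration, this minimal sketch (not from the original page) runs the computation on the GPU; it requires Parallel Computing Toolbox and a supported GPU device.

Y = softmax(dlarray(gpuArray(rand(5,7)),"CB"));      % predictions with underlying gpuArray data
T = randi(5,[1 7]);
loss = indexcrossentropy(Y,T)                        % runs on the GPU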
Version History
Introduced in R2024b
See Also
dlarray | dlgradient | dlfeval | crossentropy | softmax | sigmoid | huber | l1loss | l2loss