
Deep Learning Metrics

Use metrics to assess the performance of your deep learning model during and after training.

To specify which metrics to compute during training, set the Metrics option of the trainingOptions function. This option applies only when you train a network using the trainnet function.

To plot the metrics during training, in the training options, specify Plots as "training-progress". If you specify the ValidationData training option, then the software also plots and records the metric values for the validation data. To output the metric values to the Command Window during training, in the training options, set Verbose to true.

You can also access the metrics after training using the TrainingHistory and ValidationHistory fields from the second output of the trainnet function.
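
For example, this minimal sketch shows how these options fit together. The solver and the variables XTrain, TTrain, XValidation, TValidation, and layers are placeholders for your own training setup.

options = trainingOptions("adam", ...
    Metrics=["accuracy","fscore"], ...            % metrics to record during training
    ValidationData={XValidation,TValidation}, ... % also record metrics on validation data
    Plots="training-progress", ...                % plot the metrics during training
    Verbose=true);                                % print metric values to the Command Window
[net,info] = trainnet(XTrain,TTrain,layers,"crossentropy",options);
trainingMetrics = info.TrainingHistory;     % recorded training metric values
validationMetrics = info.ValidationHistory; % recorded validation metric values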

To specify which metrics to use when you test a neural network, use the metrics argument of the testnet function.

You can specify metrics by their built-in names as string inputs to the trainingOptions or testnet functions. For example, use this command.

metricValues = testnet(net,data,["accuracy","fscore"]); 
If you require greater customization, then you can use metric objects and functions to specify additional options.

  • If the metric has an equivalent object, then you can create the metric object with additional properties and use the metric object as input to the trainingOptions and testnet functions.

  • If the metric has an equivalent function, then you can specify that function as a function handle input to the trainingOptions and testnet functions.

For example, use these commands.

customAccuracy = accuracyMetric(NumTopKClasses=5,AverageType="macro");
customCrossEntropy = @(Y,T)crossentropy(Y,T,Mask=customMask);
metricValues = testnet(net,data,{customAccuracy,"fscore",customCrossEntropy});

If there is no object or function for the metric that you need for your task, then you can create a custom metric using a function or class. For more information, see Custom Metrics.

Classification Metrics

This table compares metrics for classification tasks. The equations include these variables:

  • TP, FP, TN, FN — True positives, false positives, true negatives, and false negatives

  • Y_i — Predicted class probabilities for observation i

  • T_i — One-hot encoded target for observation i

  • n — Number of observations

  • N — Normalization factor

Deep Learning Classification Metrics

Accuracy
  • Description: Proportion of correct predictions to the total number of observations
  • Use case: Provides a general measure of performance, but it can be misleading for imbalanced data sets.
  • Range: 0 – 100 (perfect model: 100)
  • Equation: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$
  • Built-in name: "accuracy"
  • Equivalent object or function: AccuracyMetric (object)

Precision, also known as positive predictive value (PPV)
  • Description: Proportion of true positive predictions among all positive predictions
  • Use case: Focuses on minimizing false positives, making it useful in scenarios where false positives are costly, such as spam detection.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $\mathrm{Precision} = \frac{TP}{TP + FP}$
  • Built-in name: "precision"
  • Equivalent object or function: PrecisionMetric (object)

Recall, also known as true positive rate (TPR) or sensitivity
  • Description: Ability of the model to correctly identify all instances of a particular class
  • Use case: Focuses on minimizing false negatives, making it suitable for applications where false negatives are costly, such as medical diagnosis.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $\mathrm{Recall} = \frac{TP}{TP + FN}$
  • Built-in name: "recall"
  • Equivalent object or function: RecallMetric (object)

Fβ-score
  • Description: Harmonic mean of precision and recall
  • Use case: Balances precision and recall in a single metric.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}$
  • Built-in name: "fscore"
  • Equivalent object or function: FScoreMetric (object)

Area under curve (AUC)
  • Description: Ability of a model to distinguish between classes
  • Use case: Useful for comparing models and evaluating performance across different classification thresholds, but it can be difficult to interpret.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: A ROC curve shows the true positive rate (TPR) versus the false positive rate (FPR) for different thresholds of classification scores. The AUC corresponds to the integral of the curve (TPR values) with respect to the FPR values from zero to one.
  • Built-in name: "auc"
  • Equivalent object or function: AUCMetric (object)

Cross-entropy
  • Description: Difference between the true and predicted distributions of class labels for single-label classification tasks
  • Use case: Directly related to the output of a model, but it can be difficult to interpret. Suitable for tasks where each observation is assigned exclusively to one class label.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{CrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n} T_i \ln(Y_i)$
  • Built-in name: "crossentropy"
  • Equivalent object or function: crossentropy with NormalizationFactor set to "all-elements" (which is then multiplied by the number of channels) and ClassificationMode set to "single-label" (function)

Binary cross-entropy
  • Description: Difference between the true and predicted distributions of class labels for multilabel and binary classification tasks
  • Use case: Directly related to the output of a model, but it can be difficult to interpret. Suitable for binary classification tasks or tasks where each observation can be assigned to multiple class labels.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{BinaryCrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n}\left(T_i \ln(Y_i) + (1 - T_i)\ln(1 - Y_i)\right)$
  • Built-in name: "binary-crossentropy"
  • Equivalent object or function: crossentropy with NormalizationFactor set to "all-elements" and ClassificationMode set to "multilabel" (function)

Index cross-entropy
  • Description: Difference between the true and predicted distribution of class labels, specified as integer class indices, for single-label classification tasks
  • Use case: Directly related to the output of a model and can save memory when dealing with many classes, but it can be difficult to interpret. Suitable for tasks where each observation is exclusively assigned one class label.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{IndexCrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n} \tilde{T}_i \ln(\tilde{Y}_i)$, where $\tilde{T}_i$ and $\tilde{Y}_i$ are the one-hot encoded targets and predictions, respectively
  • Built-in name: "indexcrossentropy"
  • Equivalent object or function: indexcrossentropy with NormalizationFactor set to "target-included" (function)
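
For example, a rough sketch of evaluating several classification metrics at test time, combining a metric object, built-in names, and a function handle. Here, net and imdsTest are placeholder variables, and the property values are illustrative.

accMacro = accuracyMetric(AverageType="macro");                         % accuracy averaged across classes
customCE = @(Y,T) crossentropy(Y,T,ClassificationMode="single-label"); % single-label cross-entropy
metricValues = testnet(net,imdsTest,{accMacro,"precision","recall","fscore","auc",customCE});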

Regression Metrics

This table compares metrics for regression tasks. The equations include these variables:

  • Y_i — Predicted value of observation i

  • T_i — True value of observation i

  • n — Number of observations

  • N — Normalization factor

Deep Learning Regression Metrics

Root mean squared error (RMSE)
  • Description: Magnitude of the errors between the predicted and true values
  • Use case: A general measure of model performance, expressed in the same units as the data. It can be sensitive to outliers.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} |Y_i - T_i|^2}$
  • Built-in name: "rmse"
  • Equivalent object or function: RMSEMetric (object)

Mean absolute percentage error (MAPE)
  • Description: Percentage magnitude of the errors between the predicted and true values
  • Use case: Returns a percentage, making it an intuitive performance measure that is easy to compare across models, though it may perform poorly when target values are near zero.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{n} \left|\frac{T_i - Y_i}{T_i}\right|$
  • Built-in name: "mape"
  • Equivalent object or function: MAPEMetric (object)

R², also known as the coefficient of determination
  • Description: Measure of how well the predictions explain the variance in the true values
  • Use case: A unitless measure of performance that is easy to compare across different models and data sets.
  • Range: ≤ 1 (perfect model: 1)
  • Equation: $R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - T_i)^2}{\sum_{i=1}^{n}(T_i - \bar{T})^2}$, where $\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i$
  • Built-in name: "rsquared"
  • Equivalent object or function: RSquaredMetric (object)

Mean absolute error (MAE), also known as L1 loss
  • Description: Magnitude of the errors between the predicted and true values
  • Use case: Provides an understanding of the average error. It is robust to outliers and expressed in the same units as the data.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{n} |Y_i - T_i|$
  • Built-in name: "mae" / "mean-absolute-error" / "l1loss"
  • Equivalent object or function: l1loss with NormalizationFactor set to "all-elements" (function)

Mean squared error (MSE), also known as L2 loss
  • Description: Squared difference between the predicted and true values
  • Use case: A general measure of model performance that penalizes outliers more, making it suitable for applications where outliers are costly.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{n} (Y_i - T_i)^2$
  • Built-in name: "mse" / "mean-squared-error" / "l2loss"
  • Equivalent object or function: l2loss with NormalizationFactor set to "all-elements" (function)

Huber
  • Description: Combination of MSE and MAE
  • Use case: Balances sensitivity to outliers with robust error measurement, making it suitable for data sets with some outliers.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{Huber}_i = \begin{cases} \frac{1}{2}(Y_i - T_i)^2 & \text{if } |Y_i - T_i| \le 1 \\ |Y_i - T_i| - \frac{1}{2} & \text{otherwise} \end{cases}$, and $\mathrm{Huber} = \frac{1}{N}\sum_{i=1}^{n} \mathrm{Huber}_i$
  • Built-in name: "huber"
  • Equivalent object or function: huber with NormalizationFactor set to "all-elements" (function)
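
For example, a minimal sketch of tracking regression metrics during training and evaluating others after training. The solver and the variables XTrain, TTrain, layers, net, and dsTest are placeholders for your own setup.

options = trainingOptions("adam", ...
    Metrics=["rmse","mape","rsquared"], ...   % regression metrics to record during training
    Plots="training-progress");
net = trainnet(XTrain,TTrain,layers,"mse",options);
metricValues = testnet(net,dsTest,["rmse","mae","huber"]);   % evaluate metrics after training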

Custom Metrics

If Deep Learning Toolbox™ does not provide the metric that you need for your task, then in many cases you can create a custom metric using a function. After you define the metric function, you can specify the metric as the Metrics name-value argument in the trainingOptions function. For more information, see Define Custom Metric Function.
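
For illustration, one possible shape of such a function, following the (Y,T) predictions-and-targets signature used by the function handle metrics earlier on this page. The metric choice, the function name smape, and the solver are assumptions for this sketch, not part of the toolbox.

function err = smape(Y,T)
    % Symmetric mean absolute percentage error between predictions Y and targets T
    % (illustrative custom metric; save in its own file, for example smape.m).
    err = mean(2*abs(Y - T)./(abs(Y) + abs(T)),"all");
end

options = trainingOptions("adam",Metrics=@smape,Plots="training-progress");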

Early stopping and returning the best network are not supported for custom metric functions. If you require early stopping or returning the best network, then you must create a custom metric object instead. For more information, see Define Custom Deep Learning Metric Object.
