
Deep Learning Metrics

Use metrics to assess the performance of your deep learning model during and after training.

To specify which metrics to compute during training, set the Metrics option of the trainingOptions function. This option applies only when you train a network using the trainnet function.

To plot the metrics during training, in the training options, specify Plots as "training-progress". If you specify the ValidationData training option, then the software also plots and records the metric values for the validation data. To output the metric values to the Command Window during training, in the training options, set Verbose to true.

You can also access the metrics after training using the TrainingHistory and ValidationHistory fields from the second output of the trainnet function.
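
For example, this minimal sketch shows how these options fit together. The solver and the variables XTrain, TTrain, XValidation, TValidation, and layers are placeholders for your own training setup.

options = trainingOptions("adam", ...
    Metrics=["accuracy","fscore"], ...            % metrics to record during training
    ValidationData={XValidation,TValidation}, ... % also record metrics on validation data
    Plots="training-progress", ...                % plot the metrics during training
    Verbose=true);                                % print metric values to the Command Window
[net,info] = trainnet(XTrain,TTrain,layers,"crossentropy",options);
trainingMetrics = info.TrainingHistory;     % recorded training metric values
validationMetrics = info.ValidationHistory; % recorded validation metric values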

To specify which metrics to use when you test a neural network, use the metrics argument of the testnet function.

You can specify metrics by their built-in names as string inputs to the trainingOptions or testnet functions. For example, use this command.

metricValues = testnet(net,data,["accuracy","fscore"]); 
If you require greater customization, then you can use metric objects and functions to specify additional options.

  • If the metric has an equivalent object, then you can create the metric object with additional properties and use the metric object as input to the trainingOptions and testnet functions.

  • If the metric has an equivalent function, then you can specify that function as a function handle input to the trainingOptions and testnet functions.

For example, use these commands.

customAccuracy = accuracyMetric(NumTopKClasses=5,AverageType="macro");
customCrossEntropy = @(Y,T)crossentropy(Y,T,Mask=customMask);
metricValues = testnet(net,data,{customAccuracy,"fscore",customCrossEntropy});

If there is no object or function for the metric that you need for your task, then you can create a custom metric using a function or class. For more information, see Custom Metrics.

Classification Metrics

This table compares metrics for classification tasks. The equations include these variables:

  • TP, FP, TN, FN — True positives, false positives, true negatives, and false negatives

  • Y_i — Predicted class probabilities for observation i

  • T_i — One-hot encoded target for observation i

  • n — Number of observations

  • N — Normalization factor

Deep Learning Classification Metrics

Accuracy
  • Description: Proportion of correct predictions to the total number of observations
  • Use case: Provides a general measure of performance, but it can be misleading for imbalanced data sets.
  • Range: 0 – 100 (perfect model: 100)
  • Equation: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$
  • Built-in name: "accuracy"
  • Equivalent object or function: AccuracyMetric (object)

Precision, also known as positive predictive value (PPV)
  • Description: Proportion of true positive predictions among all positive predictions
  • Use case: Focuses on minimizing false positives, making it useful in scenarios where false positives are costly, such as spam detection.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $\mathrm{Precision} = \frac{TP}{TP + FP}$
  • Built-in name: "precision"
  • Equivalent object or function: PrecisionMetric (object)

Recall, also known as true positive rate (TPR) or sensitivity
  • Description: Ability of the model to correctly identify all instances of a particular class
  • Use case: Focuses on minimizing false negatives, making it suitable for applications where false negatives are costly, such as medical diagnosis.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $\mathrm{Recall} = \frac{TP}{TP + FN}$
  • Built-in name: "recall"
  • Equivalent object or function: RecallMetric (object)

Fβ-score
  • Description: Harmonic mean of precision and recall
  • Use case: Balances precision and recall in a single metric.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: $F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}$
  • Built-in name: "fscore"
  • Equivalent object or function: FScoreMetric (object)

Area under curve (AUC)
  • Description: Ability of a model to distinguish between classes
  • Use case: Useful for comparing models and evaluating performance across different classification thresholds, but it can be difficult to interpret.
  • Range: 0 – 1 (perfect model: 1)
  • Equation: A ROC curve shows the true positive rate (TPR) versus the false positive rate (FPR) for different thresholds of classification scores. The AUC corresponds to the integral of the curve (TPR values) with respect to the FPR values from zero to one.
  • Built-in name: "auc"
  • Equivalent object or function: AUCMetric (object)

Cross-entropy
  • Description: Difference between the true and predicted distributions of class labels for single-label classification tasks
  • Use case: Directly related to the output of a model, but it can be difficult to interpret. Suitable for tasks where each observation is assigned exclusively to one class label.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{CrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n} T_i \ln(Y_i)$
  • Built-in name: "crossentropy"
  • Equivalent object or function: crossentropy with NormalizationFactor set to "all-elements" (which is then multiplied by the number of channels) and ClassificationMode set to "single-label" (function)

Binary cross-entropy
  • Description: Difference between the true and predicted distributions of class labels for multilabel and binary classification tasks
  • Use case: Directly related to the output of a model, but it can be difficult to interpret. Suitable for binary classification tasks or tasks where each observation can be assigned to multiple class labels.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{BinaryCrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n}\left(T_i \ln(Y_i) + (1 - T_i)\ln(1 - Y_i)\right)$
  • Built-in name: "binary-crossentropy"
  • Equivalent object or function: crossentropy with NormalizationFactor set to "all-elements" and ClassificationMode set to "multilabel" (function)

Index cross-entropy
  • Description: Difference between the true and predicted distribution of class labels, specified as integer class indices, for single-label classification tasks
  • Use case: Directly related to the output of a model and can save memory when dealing with many classes, but it can be difficult to interpret. Suitable for tasks where each observation is exclusively assigned one class label.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{IndexCrossEntropy} = -\frac{1}{N}\sum_{i=1}^{n} \tilde{T}_i \ln(\tilde{Y}_i)$, where $\tilde{T}_i$ and $\tilde{Y}_i$ are the one-hot encoded targets and predictions, respectively
  • Built-in name: "indexcrossentropy"
  • Equivalent object or function: indexcrossentropy with NormalizationFactor set to "target-included" (function)
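
For example, a rough sketch of evaluating several classification metrics at test time, combining a metric object, built-in names, and a function handle. Here, net and imdsTest are placeholder variables, and the property values are illustrative.

accMacro = accuracyMetric(AverageType="macro");                         % accuracy averaged across classes
customCE = @(Y,T) crossentropy(Y,T,ClassificationMode="single-label"); % single-label cross-entropy
metricValues = testnet(net,imdsTest,{accMacro,"precision","recall","fscore","auc",customCE});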

Regression Metrics

This table compares metrics for regression tasks. The equations include these variables:

  • Y_i — Predicted value of observation i

  • T_i — True value of observation i

  • n — Number of observations

  • N — Normalization factor

Deep Learning Regression Metrics

Root mean squared error (RMSE)
  • Description: Magnitude of the errors between the predicted and true values
  • Use case: A general measure of model performance, expressed in the same units as the data. It can be sensitive to outliers.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} |Y_i - T_i|^2}$
  • Built-in name: "rmse"
  • Equivalent object or function: RMSEMetric (object)

Mean absolute percentage error (MAPE)
  • Description: Percentage magnitude of the errors between the predicted and true values
  • Use case: Returns a percentage, making it an intuitive performance measure that is easy to compare across models, though it may perform poorly when target values are near zero.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{n} \left|\frac{T_i - Y_i}{T_i}\right|$
  • Built-in name: "mape"
  • Equivalent object or function: MAPEMetric (object)

R², also known as the coefficient of determination
  • Description: Measure of how well the predictions explain the variance in the true values
  • Use case: A unitless measure of performance that is easy to compare across different models and data sets.
  • Range: ≤ 1 (perfect model: 1)
  • Equation: $R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - T_i)^2}{\sum_{i=1}^{n}(T_i - \bar{T})^2}$, where $\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i$
  • Built-in name: "rsquared"
  • Equivalent object or function: RSquaredMetric (object)

Mean absolute error (MAE), also known as L1 loss
  • Description: Magnitude of the errors between the predicted and true values
  • Use case: Provides an understanding of the average error. It is robust to outliers and expressed in the same units as the data.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{n} |Y_i - T_i|$
  • Built-in name: "mae" / "mean-absolute-error" / "l1loss"
  • Equivalent object or function: l1loss with NormalizationFactor set to "all-elements" (function)

Mean squared error (MSE), also known as L2 loss
  • Description: Squared difference between the predicted and true values
  • Use case: A general measure of model performance that penalizes outliers more, making it suitable for applications where outliers are costly.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{n} (Y_i - T_i)^2$
  • Built-in name: "mse" / "mean-squared-error" / "l2loss"
  • Equivalent object or function: l2loss with NormalizationFactor set to "all-elements" (function)

Huber
  • Description: Combination of MSE and MAE
  • Use case: Balances sensitivity to outliers with robust error measurement, making it suitable for data sets with some outliers.
  • Range: ≥ 0 (perfect model: 0)
  • Equation: $\mathrm{Huber}_i = \begin{cases} \frac{1}{2}(Y_i - T_i)^2 & \text{if } |Y_i - T_i| \le 1 \\ |Y_i - T_i| - \frac{1}{2} & \text{otherwise} \end{cases}$, and $\mathrm{Huber} = \frac{1}{N}\sum_{i=1}^{n} \mathrm{Huber}_i$
  • Built-in name: "huber"
  • Equivalent object or function: huber with NormalizationFactor set to "all-elements" (function)
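
For example, a minimal sketch of tracking regression metrics during training and evaluating others after training. The solver and the variables XTrain, TTrain, layers, net, and dsTest are placeholders for your own setup.

options = trainingOptions("adam", ...
    Metrics=["rmse","mape","rsquared"], ...   % regression metrics to record during training
    Plots="training-progress");
net = trainnet(XTrain,TTrain,layers,"mse",options);
metricValues = testnet(net,dsTest,["rmse","mae","huber"]);   % evaluate metrics after training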

Custom Metrics

If Deep Learning Toolbox™ does not provide the metric that you need for your task, then in many cases you can create a custom metric using a function. After you define the metric function, you can specify the metric as the Metrics name-value argument in the trainingOptions function. For more information, see Define Custom Metric Function.
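
For illustration, one possible shape of such a function, following the (Y,T) predictions-and-targets signature used by the function handle metrics earlier on this page. The metric choice, the function name smape, and the solver are assumptions for this sketch, not part of the toolbox.

function err = smape(Y,T)
    % Symmetric mean absolute percentage error between predictions Y and targets T
    % (illustrative custom metric; save in its own file, for example smape.m).
    err = mean(2*abs(Y - T)./(abs(Y) + abs(T)),"all");
end

options = trainingOptions("adam",Metrics=@smape,Plots="training-progress");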

Early stopping and returning the best network are not supported for custom metric functions. If you require early stopping or returning the best network, then you must create a custom metric object instead. For more information, see Define Custom Deep Learning Metric Object.
