risk.validation.kolmogorovSmirnov

Return Kolmogorov-Smirnov statistic

Since R2025a

    Description

    ksValue = risk.validation.kolmogorovSmirnov(Score,BinaryResponse) returns the Kolmogorov-Smirnov (KS) value, where Score contains numeric values that represent quantities such as rankings, predictions, probability of default (PD) estimates, or loss given default (LGD) estimates. For example, in credit scoring models, the values in Score can represent individual credit scores or other credit data. BinaryResponse specifies the target state of each value in Score.

    ksValue = risk.validation.kolmogorovSmirnov(Score,BinaryResponse,SortDirection=sortdir) specifies the sorting direction of the unique values in Score as "ascending" or "descending".

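    [ksValue,Output] = risk.validation.kolmogorovSmirnov(___) also returns a structure, Output, using any of the input argument combinations in the previous syntaxes. Output contains the KS statistic, the value in Score that attains this statistic, and a table of metrics with the columns Threshold, TruePositiveRate, and FalsePositiveRate.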

    Examples

    Compute the Kolmogorov-Smirnov (KS) statistic for credit scores by using the kolmogorovSmirnov function. In this example, you use the credit validation data set, which includes a table, ScorecardValidationData, that contains credit scores and their corresponding default status information.

    Load and display the credit validation data.

    load CreditValidationData.mat
    head(ScorecardValidationData)
        CreditScore      PD       Default
        ___________    _______    _______
    
          579.86       0.14182       0   
          563.65       0.17143       0   
          549.52       0.20106       0   
          546.25       0.20845       0   
          485.34       0.37991       0   
          482.07       0.39065       0   
          579.86       0.14182       1   
          451.73         0.494       0   
    

    Extract the variables CreditScore and Default from the table ScorecardValidationData. Use Default as the BinaryResponse input argument.

    Scores = ScorecardValidationData.CreditScore;
    BinaryResponse = ScorecardValidationData.Default;

    Compute the KS statistic by using the kolmogorovSmirnov function with the fully qualified namespace risk.validation. In credit models, lower scores typically correspond to higher risk individuals. Set the SortDirection name-value argument to "ascending" so that the function sorts the scores from lower to higher, that is, from higher risk individuals to lower risk individuals.

    [ksValue,Output] = risk.validation.kolmogorovSmirnov(Scores,BinaryResponse,SortDirection="ascending")
    ksValue = 
    0.1770
    
    Output = struct with fields:
        KolmogorovSmirnovStatistic: 0.1770
            KolmogorovSmirnovScore: 476.4030
                           Metrics: [107×3 table]
    
    

    The output structure, Output, contains the KS statistic and the value in Score that attains this statistic. Display the metrics Threshold, TruePositiveRate, and FalsePositiveRate contained in the table Output.Metrics.

    head(Output.Metrics)
        Threshold    TruePositiveRate    FalsePositiveRate
        _________    ________________    _________________
    
         408.99                 0                   0     
         408.99          0.071429            0.012821     
         410.12          0.079365            0.017094     
         430.66          0.087302            0.017094     
         435.52          0.087302            0.025641     
         436.65           0.10317            0.029915     
         439.33           0.11905            0.029915     
         440.45           0.13492            0.029915     
    

    Input Arguments

    Score values (Score), specified as a numeric vector that contains values indicating quantities such as rankings, predictions, PD estimates, or LGD estimates.

    Binary response (BinaryResponse), specified as a numeric or logical vector that contains values of 1 (true) or 0 (false). The binary response represents the target state for each value in Score. For example, you can use the binary response to represent a discretized LGD target, where ones indicate a high LGD value.

    Sorting direction (SortDirection) of the unique values in Score, specified as "descending" or "ascending". If Score contains credit scores, where low values commonly correspond to higher risk individuals, you can set the sorting direction to "ascending". This setting ensures that TruePositiveRate represents the proportion of defaulters. If Score contains PD values, where higher values correspond to higher risk, sorting the values in descending order is common practice.
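
    The sorting convention above can be illustrated in base MATLAB, without calling the toolbox function. The following minimal sketch copies the eight rows of ScorecardValidationData displayed in the example and checks that sorting credit scores in ascending order and sorting PD values in descending order both arrange the observations from higher risk to lower risk. The variable names scoresByRisk, scoresByPD, and sameOrdering are illustrative only. Note that the base sort function takes "ascend" and "descend", whereas SortDirection takes "ascending" and "descending".

```matlab
% First eight rows of ScorecardValidationData, copied from the example display.
CreditScore = [579.86; 563.65; 549.52; 546.25; 485.34; 482.07; 579.86; 451.73];
PD          = [0.14182; 0.17143; 0.20106; 0.20845; 0.37991; 0.39065; 0.14182; 0.494];

% Credit scores: a low score means higher risk, so sort in ascending order.
scoresByRisk = sort(CreditScore, "ascend");

% PD values: a high PD means higher risk, so sort in descending order.
[~, idx] = sort(PD, "descend");
scoresByPD = CreditScore(idx);

% Both orderings list the same observations from higher risk to lower risk.
sameOrdering = isequal(scoresByRisk, scoresByPD);
disp(sameOrdering)
```

    This sketch only demonstrates the ordering convention on a small excerpt. It does not reproduce how the function itself processes Score.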

    Output Arguments

    KS value (ksValue) for the values contained in Score, returned as a numeric scalar. You can use the KS value to quantify how well a model differentiates between lower risk and higher risk customers.
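
    To make the meaning of the KS value concrete, the following base MATLAB sketch computes it by hand on hypothetical toy data, using the standard definition of the KS statistic as the largest absolute gap between the cumulative true positive rate and the cumulative false positive rate. This is an assumption about what the function computes, not its actual implementation. The data and the names score, response, ksManual, and ksScore are invented for illustration, and the sketch assumes that all scores are distinct.

```matlab
% Hypothetical toy data: eight distinct credit scores and default flags
% (1 = default, 0 = no default).
score    = [410; 420; 430; 440; 450; 460; 470; 480];
response = [1; 1; 1; 0; 0; 1; 0; 0];

% Sort from higher risk to lower risk (ascending credit score).
[sortedScore, idx] = sort(score, "ascend");
sortedResponse = response(idx);

% Cumulative proportions of defaulters and nondefaulters at each threshold.
tpr = cumsum(sortedResponse == 1) / sum(response == 1);
fpr = cumsum(sortedResponse == 0) / sum(response == 0);

% KS statistic: largest gap between the two cumulative curves, and the
% score at which that gap occurs.
[ksManual, iMax] = max(abs(tpr - fpr));
ksScore = sortedScore(iMax);

fprintf("KS = %.2f at score %g\n", ksManual, ksScore)
```

    For this toy data, the largest gap is 0.75 and occurs at a score of 430, after three of the four defaulters and none of the nondefaulters have been counted.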

    Output metrics (Output), returned as a structure containing the following fields:

    • KolmogorovSmirnovStatistic — KS statistic, equal to ksValue.

    • KolmogorovSmirnovScore — Value in Score that attains the KS statistic.

    • Metrics — Table with columns:

      • Threshold — Unique score values sorted according to the value of SortDirection.

      • TruePositiveRate — True positive rate values corresponding to the unique scores in the Threshold column. For credit scoring models, this column represents the proportion of defaulters.

      • FalsePositiveRate — False positive rate values corresponding to the unique scores in the Threshold column. For credit scoring models, this column represents the proportion of nondefaulters.

      Metrics contains the data you need to make the KS curve, which plots TruePositiveRate and FalsePositiveRate as a function of Threshold.
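
    As a minimal sketch of how such a KS curve could be drawn, the following base MATLAB code rebuilds a table from the eight rows of Output.Metrics displayed in the example and plots both rates against the threshold. Because this is only an excerpt of the 107-row table, the gap it marks is the largest gap within the excerpt, not the KS statistic of the full data set. The variable names gap, maxGap, and iMax are illustrative only. With the full table, you can substitute Output.Metrics for Metrics.

```matlab
% First eight rows of Output.Metrics, copied from the example display.
Threshold         = [408.99; 408.99; 410.12; 430.66; 435.52; 436.65; 439.33; 440.45];
TruePositiveRate  = [0; 0.071429; 0.079365; 0.087302; 0.087302; 0.10317; 0.11905; 0.13492];
FalsePositiveRate = [0; 0.012821; 0.017094; 0.017094; 0.025641; 0.029915; 0.029915; 0.029915];
Metrics = table(Threshold, TruePositiveRate, FalsePositiveRate);

% Locate the largest vertical gap between the two curves in this excerpt.
gap = abs(Metrics.TruePositiveRate - Metrics.FalsePositiveRate);
[maxGap, iMax] = max(gap);

% Plot both cumulative rates against the threshold and mark the largest gap.
figure
plot(Metrics.Threshold, Metrics.TruePositiveRate, "-o")
hold on
plot(Metrics.Threshold, Metrics.FalsePositiveRate, "-s")
xline(Metrics.Threshold(iMax), "--")
hold off
xlabel("Threshold")
ylabel("Cumulative rate")
legend("TruePositiveRate", "FalsePositiveRate", "Largest gap", Location="northwest")
title("KS curve (first eight thresholds only)")
```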

    Version History

    Introduced in R2025a