risk.validation.hosmerLemeshowTest

Return Hosmer-Lemeshow test result

Since R2025a

Syntax

hHLTest = risk.validation.hosmerLemeshowTest(Probability,NumEvents,NumTrials)

hHLTest = risk.validation.hosmerLemeshowTest(Probability,NumEvents,NumTrials,ConfidenceLevel=confLevel)

[hHLTest,HLOutput] = risk.validation.hosmerLemeshowTest(___)

Description

hHLTest = risk.validation.hosmerLemeshowTest(Probability,NumEvents,NumTrials) returns the Hosmer-Lemeshow test result, hHLTest, for a given set of probabilities, events, and trials. The output is 1 if the test rejects the null hypothesis at the 95% confidence level, or 0 otherwise. Probability contains numeric values that represent quantities such as probability of default (PD) estimates.

hHLTest = risk.validation.hosmerLemeshowTest(Probability,NumEvents,NumTrials,ConfidenceLevel=confLevel) specifies the confidence level for the hypothesis test.

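[hHLTest,HLOutput] = risk.validation.hosmerLemeshowTest(___) also returns a structure, HLOutput, that contains summary metrics for the test. You can use this syntax with any of the input argument combinations in the previous syntaxes.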

Examples

Apply Hosmer-Lemeshow Test to PD Data

Use the hosmerLemeshowTest function to assess whether the Hosmer-Lemeshow test rejects the null hypothesis at a 0.95 confidence level for a probability of default (PD) data set. In this example, you use the credit validation data set, which includes a table, ScorecardValidationData, that contains PD values and their corresponding default status. Before you apply the test, it is common practice to:

Group the probabilities by deciles.
Compute the average probability of each group.
Compute the total number of defaults and loans of each group.

Load and display the credit validation data.

load CreditValidationData.mat
head(ScorecardValidationData)

    CreditScore      PD       Default
    ___________    _______    _______

      579.86       0.14182       0   
      563.65       0.17143       0   
      549.52       0.20106       0   
      546.25       0.20845       0   
      485.34       0.37991       0   
      482.07       0.39065       0   
      579.86       0.14182       1   
      451.73         0.494       0

Group Probabilities by Deciles

Prepare your PD data for the Hosmer-Lemeshow test by grouping the data by deciles, which results in ten groups. Extract the variable PD from the table ScorecardValidationData and group the probabilities by using the groupNumberByQuantile function with the fully qualified namespace risk.validation. Specify "deciles" as the quantile type.

Probability = ScorecardValidationData.PD;
QuantileType = "deciles";
PDDecileNumber = risk.validation.groupNumberByQuantile(Probability,QuantileType);

Determine Average Probabilities by Group

Next, calculate the average probability for each group by using the groupsummary function.

PDAvgByDecile = groupsummary(Probability,PDDecileNumber,"mean");

Compute Total Defaults and Loans by Group

Extract the variable Default from the table ScorecardValidationData and use this variable as the default indicator. Then use the groupsummary function to compute the sums of defaults and loans for each group.

DefaultIndicator = ScorecardValidationData.Default;
[NumDefaultsByDecile,~,NumLoansByDecile] = groupsummary(DefaultIndicator,PDDecileNumber,"sum");
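
To see what the three outputs of groupsummary hold, the following minimal sketch applies the same kind of call to a small made-up indicator vector. The variable names and data here are illustrative only and are not part of the credit validation data set.

```matlab
% Made-up default indicators (1 = default) and group numbers for six loans.
toyIndicator = [0; 1; 0; 1; 1; 0];
toyGroup     = [1; 1; 2; 2; 2; 3];

% First output: sum of the indicator in each group (number of defaults).
% Second output: the group identifiers.
% Third output: the number of observations in each group (number of loans).
[toyDefaults,toyGroups,toyLoans] = groupsummary(toyIndicator,toyGroup,"sum");
```

With this toy data, toyDefaults is [1;2;0] and toyLoans is [2;3;1]. The third output is the group count, which is why the example above uses it as the number of loans per decile.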

Apply Hosmer-Lemeshow Test

You can then apply the hosmerLemeshowTest function with the fully qualified namespace risk.validation to see whether the test rejects the null hypothesis. Use the average probabilities and the total number of defaults and loans as input arguments. Then, display the structure HLOutput, which contains summary information about the test. hosmerLemeshowTest outputs a single scalar test result that it computes by aggregating all of the input PD information, whereas risk.validation.binomialTest computes a test result for each individual PD value.

[hHLTest,HLOutput] = risk.validation.hosmerLemeshowTest(PDAvgByDecile,NumDefaultsByDecile,NumLoansByDecile)

hHLTest = logical
   0

HLOutput = struct with fields:
    RejectHosmerLemeshowTest: 0
     HosmerLemeshowStatistic: 8.9211
               CriticalValue: 15.5073
                      PValue: 0.3490
             ConfidenceLevel: 0.9500
            DegreesOfFreedom: 8

Input Arguments

`Probability` — Probability values
numeric vector with values in the range (0,1)

Probability values, specified as a numeric vector with values in the range (0,1). Probability contains values that indicate quantities such as PD estimates.

`NumEvents` — Number of events observed
numeric vector of nonnegative integers

Number of events observed, specified as a numeric vector of nonnegative integers. For PD models, NumEvents contains the number of defaults observed.

`NumTrials` — Number of trials
numeric vector of positive integers

Number of trials, specified as a numeric vector of positive integers. Each element of NumTrials must be greater than or equal to the corresponding element of NumEvents. For PD models, NumTrials contains the number of loans.

`confLevel` — Confidence level
`0.95` (default) | numeric scalar in the range (0,1)

Confidence level of the hypothesis test, specified as a numeric scalar in the range (0,1).
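
The constraints described for the input arguments can be checked before calling the function. The following is a minimal sketch, assuming small made-up grouped inputs for four groups. The data and the assert messages are illustrative and are not produced by the function itself.

```matlab
% Made-up grouped inputs for four groups (illustrative only).
Probability = [0.05; 0.10; 0.20; 0.40];
NumEvents   = [6; 9; 22; 38];
NumTrials   = [100; 100; 100; 100];

% Check the documented constraints before calling the test.
assert(all(Probability > 0 & Probability < 1),"Probability must lie in (0,1).")
assert(all(NumEvents >= 0 & NumEvents == floor(NumEvents)),"NumEvents must contain nonnegative integers.")
assert(all(NumTrials > 0 & NumTrials == floor(NumTrials)),"NumTrials must contain positive integers.")
assert(all(NumTrials >= NumEvents),"Each NumTrials element must be at least the matching NumEvents element.")

% Run the test at a 0.99 confidence level instead of the default 0.95.
[hHLTest,HLOutput] = risk.validation.hosmerLemeshowTest( ...
    Probability,NumEvents,NumTrials,ConfidenceLevel=0.99);
```

The ConfidenceLevel field of HLOutput should then report the confidence level that was passed in.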

Output Arguments

`hHLTest` — Hypothesis test result
`0` | `1`

Hypothesis test result, returned as 0 or 1.

A value of 1 rejects the null hypothesis at the specified confidence level.
A value of 0 fails to reject the null hypothesis at the specified confidence level.

`HLOutput` — Output metrics
structure

Output metrics, returned as a structure with the following fields:

RejectHosmerLemeshowTest — Logical scalar that indicates whether the null hypothesis was rejected. This field has the same value as hHLTest.
HosmerLemeshowStatistic — Numeric scalar with the value of the Hosmer-Lemeshow test statistic.
CriticalValue — Numeric scalar representing the critical value for the Hosmer-Lemeshow test.
PValue — p-value for the hypothesis test, returned as a numeric scalar in the range [0,1]. A small value indicates that the null hypothesis might not be valid.
ConfidenceLevel — Confidence level for the hypothesis test.
DegreesOfFreedom — Degrees of freedom used to compute the critical value and p-value. The value of DegreesOfFreedom is length(Probability)-2.
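
The relationship between these fields can be illustrated with base MATLAB functions. The following is a sketch, assuming the critical value and p-value come from a chi-square distribution with DegreesOfFreedom degrees of freedom, as the More About section describes. The statistic, confidence level, and degrees of freedom are copied from the example output above. The variable names are illustrative, and this is not necessarily how the function computes the fields internally.

```matlab
% Values copied from the example output above.
HLStatistic     = 8.9211;
ConfidenceLevel = 0.95;
dof             = 8;

% Chi-square quantile, obtained from the inverse of the regularized
% lower incomplete gamma function.
criticalValue = 2*gammaincinv(ConfidenceLevel,dof/2);

% Chi-square upper-tail probability of the observed statistic.
pValue = gammainc(HLStatistic/2,dof/2,'upper');

% The test rejects when the statistic exceeds the critical value.
reject = HLStatistic > criticalValue;
```

Under this assumption, criticalValue and pValue agree with the CriticalValue and PValue fields in the example output to the displayed precision, and reject is false.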

More About

Hosmer-Lemeshow Test

The test statistic for the Hosmer-Lemeshow test [1], [2] is given by:

$H L = \sum_{g = 1}^{G} \frac{{(N_{g} P_{g} - E_{g})}^{2}}{N_{g} P_{g} (1 - P_{g})}$

where G is the number of groups into which the sample is divided, N_g is the number of trials or observations in group g, P_g is the predicted probability of the event for group g, and E_g is the number of events observed in group g. For the Hosmer-Lemeshow test, the number of groups G is typically 10, and the groups are defined by the deciles of the probability values. The group probability P_g is typically the average probability of the observations in group g. The test statistic HL asymptotically follows a chi-square distribution with G-2 degrees of freedom, from which the critical value and p-value are obtained. When the test rejects the null hypothesis, the observed numbers of events, E_g, are not consistent with the expected numbers of events, N_g P_g, which suggests that the model producing the predicted probabilities needs to be revised.
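
As a worked illustration of the formula, the following minimal sketch evaluates the statistic directly for made-up grouped data with G = 4. The data and variable names are illustrative assumptions, chosen only to keep the arithmetic easy to follow.

```matlab
% Made-up grouped data for G = 4 groups (illustrative only).
P = [0.05; 0.10; 0.20; 0.40];   % group probabilities P_g
E = [6; 9; 22; 38];             % observed events E_g
N = [100; 100; 100; 100];       % trials N_g

% Expected number of events in each group, N_g*P_g.
expectedEvents = N .* P;

% Contribution of each group to the statistic.
groupTerms = (expectedEvents - E).^2 ./ (expectedEvents .* (1 - P));

HL  = sum(groupTerms);   % Hosmer-Lemeshow test statistic
dof = numel(P) - 2;      % G - 2 degrees of freedom
```

For this data, the four group terms are 1/4.75, 1/9, 4/16, and 4/24, so HL is approximately 0.738 with 2 degrees of freedom.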

References

[1] Hosmer, D. W., Jr., and S. Lemeshow. 1980. “Goodness-of-fit tests for the multiple logistic regression model”. Communications in Statistics—Theory and Methods 9:1043–1069.

[2] Basel Committee on Banking Supervision, “Studies on the Validation of Internal Rating Systems”, Working Paper 14, May, 2005. https://www.bis.org/publ/bcbs_wp14.htm.

Version History

Introduced in R2025a
