Comparison of Probability of Default Using Through-the-Cycle and Point-in-Time Models

This example shows how to work with consumer credit panel data to create through-the-cycle (TTC) and point-in-time (PIT) models and compare their respective probabilities of default (PD).

The PD of an obligor is a fundamental risk parameter in credit risk analysis. The PD of an obligor depends on customer-specific risk factors as well as macroeconomic risk factors. Because they incorporate macroeconomic conditions differently, TTC and PIT models produce different PD estimates.

A TTC credit risk measure primarily reflects the credit risk trend of a customer over the long term. Transient, short-term changes in credit risk that are likely to be reversed with the passage of time get smoothed out. The predominant features of TTC credit risk measures are their high degree of stability over the credit cycle and the smoothness of change over time.

A PIT credit risk measure utilizes all available and pertinent information as of a given date to estimate the PD of a customer over a given time horizon. The information set includes not just expectations about the credit risk trend of a customer over the long term but also geographic, macroeconomic, and macro-credit trends.

Previously, according to the Basel II rules, regulators called for the use of TTC PDs, losses given default (LGDs), and exposures at default (EADs). However, with the new IFRS 9 and proposed CECL accounting standards, regulators now require institutions to use PIT projections of PDs, LGDs, and EADs. By accounting for the current state of the credit cycle, PIT measures closely track the variations in default and loss rates over time.

Load Panel Data

The main data set in this example (data) contains the following variables:

  • ID — Loan identifier.

  • ScoreGroup — Credit score at the beginning of the loan, discretized into three groups: High Risk, Medium Risk, and Low Risk.

  • YOB — Years on books.

  • Default — Default indicator. This is the response variable.

  • Year — Calendar year.

The data also includes a small data set (dataMacro) with macroeconomic data for the corresponding calendar years:

  • Year — Calendar year.

  • GDP — Gross domestic product growth (year over year).

  • Market — Market return (year over year).

The variables YOB, Year, GDP, and Market are observed at the end of the corresponding calendar year. ScoreGroup is a discretization of the original credit score when the loan started. A value of 1 for Default means that the loan defaulted in the corresponding calendar year.

This example uses simulated data, but you can apply the same approach to real data sets.

Load the data and view the first 10 rows of the table. The panel data is stacked and the observations for the same ID are stored in contiguous rows, creating a tall, thin table. The panel is unbalanced because not all IDs have the same number of observations.

load RetailCreditPanelData.mat
disp(head(data,10));
    ID    ScoreGroup     YOB    Default    Year
    __    ___________    ___    _______    ____

    1     Low Risk        1        0       1997
    1     Low Risk        2        0       1998
    1     Low Risk        3        0       1999
    1     Low Risk        4        0       2000
    1     Low Risk        5        0       2001
    1     Low Risk        6        0       2002
    1     Low Risk        7        0       2003
    1     Low Risk        8        0       2004
    2     Medium Risk     1        0       1997
    2     Medium Risk     2        0       1998
nRows = height(data);
UniqueIDs = unique(data.ID);
nIDs = length(UniqueIDs);
fprintf('Total number of IDs: %d\n',nIDs)
Total number of IDs: 96820
fprintf('Total number of rows: %d\n',nRows)
Total number of rows: 646724
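
One way to confirm that a panel is unbalanced is to count the rows per ID. The following is a minimal sketch on a small hypothetical ID vector (toyID is an illustrative stand-in, not the example's data). It also divides the two totals reported above to get the average number of rows per loan, which comes to about 6.68.

```matlab
% Hypothetical stacked ID column: loan 1 has 4 rows, loan 2 has 2, loan 3 has 3.
toyID = [1 1 1 1 2 2 3 3 3]';

% Count the rows for each ID (IDs are positive integers here).
rowsPerID = accumarray(toyID, 1);

% A panel is balanced only if every ID has the same number of rows.
isBalanced = all(rowsPerID == rowsPerID(1));
fprintf('Rows per ID: %s, balanced: %d\n', mat2str(rowsPerID'), isBalanced)

% Average rows per loan from the totals reported above.
avgRowsPerID = 646724/96820;
fprintf('Average rows per ID: %.2f\n', avgRowsPerID)
```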

Default Rates by Year

To compute the observed default rate for each year, use the groupsummary function to take the mean of the Default variable, grouped by the Year variable. Plot the results on a scatter plot, which shows that the default rate decreases as the years advance.

DefaultPerYear = groupsummary(data,'Year','mean','Default');
NumYears = height(DefaultPerYear);
disp(DefaultPerYear)
    Year    GroupCount    mean_Default
    ____    __________    ____________

    1997      35214         0.018629  
    1998      66716         0.013355  
    1999      94639         0.012733  
    2000      92891         0.011379  
    2001      91140         0.010742  
    2002      89847         0.010295  
    2003      88449        0.0056417  
    2004      87828        0.0032905  
subplot(2,1,1)
scatter(DefaultPerYear.Year, DefaultPerYear.mean_Default*100,'*');
grid on
xlabel('Year')
ylabel('Default Rate (%)')
title('Default Rate per Year')
% Get IDs of the 1997, 1998, and 1999 cohorts
IDs1997 = data.ID(data.YOB==1&data.Year==1997);
IDs1998 = data.ID(data.YOB==1&data.Year==1998);
IDs1999 = data.ID(data.YOB==1&data.Year==1999);
% Get default rates for each cohort separately
ObsDefRate1997 = groupsummary(data(ismember(data.ID,IDs1997),:),...
    'YOB','mean','Default');

ObsDefRate1998 = groupsummary(data(ismember(data.ID,IDs1998),:),...
    'YOB','mean','Default');

ObsDefRate1999 = groupsummary(data(ismember(data.ID,IDs1999),:),...
    'YOB','mean','Default');
% Plot against the calendar year
Year = unique(data.Year);
subplot(2,1,2)
plot(Year,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(Year(2:end),ObsDefRate1998.mean_Default*100,'-*')
plot(Year(3:end),ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Calendar Year')
xlabel('Calendar Year')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on

The plot shows that the default rate decreases over time. Notice in the plot that loans starting in the years 1997, 1998, and 1999 form three cohorts. No loan in the panel data starts after 1999. This is depicted in more detail in the "Years on Books Versus Calendar Years" section of the example on Stress Testing of Consumer Credit Default Probabilities Using Panel Data. The decreasing trend in this plot is explained by the fact that there are only three cohorts in the data and that the pattern for each cohort is decreasing.
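
The cohort code above selects the IDs whose first row (YOB equal to 1) falls in a given year. Assuming that YOB increases by one per calendar year, as it does in the rows displayed earlier, the start year of every row can also be derived as Year - YOB + 1. The sketch below illustrates this on hypothetical toy vectors, not the example's data.

```matlab
% Hypothetical loan-year rows from three loans starting in 1997, 1998, and 1999.
toyYear    = [1997 1998 1999 2000 1998 1999 1999]';
toyYOB     = [1    2    3    4    1    2    1   ]';
toyDefault = [0    0    0    1    0    0    0   ]';

% Start year (cohort) of each row, assuming YOB advances one per calendar year.
toyCohort = toyYear - toyYOB + 1;

% Observed default rate per cohort: mean of the 0/1 indicator within each group.
[cohorts, ~, grp] = unique(toyCohort);
defRate = accumarray(grp, toyDefault, [], @mean);

% Columns: cohort start year, observed default rate.
disp([cohorts defRate])
```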

TTC Model Using ScoreGroup and Years on Books

TTC models are largely unaffected by economic conditions. The first TTC model in this example uses only ScoreGroup and YOB as predictors of the default rate.

Split the existing data into training and testing data sets, which are used for model creation and validation, respectively.

NumTraining = floor(0.6*nIDs);

rng('default');
TrainIDInd = randsample(nIDs,NumTraining);
TrainDataInd = ismember(data.ID,UniqueIDs(TrainIDInd));
TestDataInd = ~TrainDataInd;
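
Because the split above samples IDs rather than rows, all observations of a loan land on the same side of the split, so no loan's history is shared between training and testing. The following minimal sketch shows the same idea on hypothetical toy data. It uses randperm in place of randsample, an assumption made only to keep the sketch free of toolbox dependencies.

```matlab
% Hypothetical stacked panel: the ID of each loan-year row.
toyID = [1 1 1 2 2 3 3 3 3 4 5 5]';
toyUniqueIDs = unique(toyID);
nToyIDs = numel(toyUniqueIDs);

% Sample 60% of the IDs (not rows) for training.
nToyTrain = floor(0.6*nToyIDs);
perm = randperm(nToyIDs);
trainIDs = toyUniqueIDs(perm(1:nToyTrain));

% Row-level logical masks derived from the ID-level split.
trainRows = ismember(toyID, trainIDs);
testRows = ~trainRows;

% No ID should appear on both sides of the split.
sharedIDs = intersect(toyID(trainRows), toyID(testRows));
fprintf('IDs shared between training and testing: %d\n', numel(sharedIDs))
```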

Use the fitglm function to fit a logistic model.

TTCModel = fitglm(data(TrainDataInd,:),...
    'Default ~ 1 + ScoreGroup + YOB',...
    'Distribution','binomial');
disp(TTCModel)
Generalized linear regression model:
    logit(Default) ~ 1 + ScoreGroup + YOB
    Distribution = Binomial

Estimated Coefficients:
                              Estimate       SE        tStat       pValue   
                              ________    ________    _______    ___________

    (Intercept)                -3.2453    0.033768    -96.106              0
    ScoreGroup_Medium Risk     -0.7058    0.037103    -19.023     1.1014e-80
    ScoreGroup_Low Risk        -1.2893    0.045635    -28.253    1.3076e-175
    YOB                       -0.22693    0.008437    -26.897    2.3578e-159


388018 observations, 388014 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.83e+03, p-value = 0
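
The fitted coefficients map to a PD through the inverse logit, PD = 1/(1 + exp(-eta)), where eta is the linear predictor. As a hedged illustration, the sketch below plugs the rounded coefficients from the display above into that formula for loans in their first year on books. The variable names are illustrative, and because the displayed coefficients are rounded, the results (roughly 0.0085 and 0.0151) can differ from the output of predict in the last digits.

```matlab
% Rounded coefficients copied from the model display above.
% High Risk has no coefficient of its own, so it is the reference group.
b0      = -3.2453;   % (Intercept)
bMedium = -0.7058;   % ScoreGroup_Medium Risk
bLow    = -1.2893;   % ScoreGroup_Low Risk
bYOB    = -0.22693;  % YOB

% Inverse logit converts a linear predictor to a probability.
invLogit = @(eta) 1./(1 + exp(-eta));

% PDs for loans with YOB = 1 in the Low Risk and Medium Risk groups.
pdLow    = invLogit(b0 + bLow    + bYOB*1);
pdMedium = invLogit(b0 + bMedium + bYOB*1);

fprintf('PD (Low Risk, YOB = 1):    %.4f\n', pdLow)
fprintf('PD (Medium Risk, YOB = 1): %.4f\n', pdMedium)
```

These values can be compared with the TTCPD column displayed later in this example for the first rows of loans 1 and 2.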

Predict the PD for the training and testing data sets using predict.

data.TTCPD = zeros(height(data),1);

% Predict in-sample
data.TTCPD(TrainDataInd) = predict(TTCModel,data(TrainDataInd,:));
% Predict out-of-sample
data.TTCPD(TestDataInd) = predict(TTCModel,data(TestDataInd,:));

Visualize the in-sample fit and out-of-sample fit.

PredTTCPDTrainYear = groupsummary(data(TrainDataInd,:),'Year','mean',...
    {'Default','TTCPD'});
figure;
subplot(2,1,1)
scatter(PredTTCPDTrainYear.Year,PredTTCPDTrainYear.mean_Default*100,'*');
hold on
plot(PredTTCPDTrainYear.Year,PredTTCPDTrainYear.mean_TTCPD*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Training Data)')
grid on
PredTTCPDTestYear = groupsummary(data(TestDataInd,:),'Year','mean',...
    {'Default','TTCPD'});

subplot(2,1,2)
scatter(PredTTCPDTestYear.Year,PredTTCPDTestYear.mean_Default*100,'*');
hold on
plot(PredTTCPDTestYear.Year,PredTTCPDTestYear.mean_TTCPD*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Testing Data)')
grid on

PIT Model Using ScoreGroup, Years on Books, GDP, and Market Returns

PIT models vary with the economic cycle. The PIT model in this example uses ScoreGroup, YOB, GDP, and Market as predictors of the default rate. Use the fitglm function to fit a logistic model.

% Add the GDP and Market returns columns to the original data

data = join(data, dataMacro);
disp(head(data,10))
    ID    ScoreGroup     YOB    Default    Year      TTCPD       GDP     Market
    __    ___________    ___    _______    ____    _________    _____    ______

    1     Low Risk        1        0       1997    0.0084797     2.72      7.61
    1     Low Risk        2        0       1998    0.0067697     3.57     26.24
    1     Low Risk        3        0       1999    0.0054027     2.86      18.1
    1     Low Risk        4        0       2000    0.0043105     2.43      3.19
    1     Low Risk        5        0       2001    0.0034384     1.26    -10.51
    1     Low Risk        6        0       2002    0.0027422    -0.59    -22.95
    1     Low Risk        7        0       2003    0.0021867     0.63      2.78
    1     Low Risk        8        0       2004    0.0017435     1.85      9.48
    2     Medium Risk     1        0       1997     0.015097     2.72      7.61
    2     Medium Risk     2        0       1998     0.012069     3.57     26.24
PITModel = fitglm(data(TrainDataInd,:),...
   'Default ~ 1 + ScoreGroup + YOB + GDP + Market',...
   'Distribution','binomial');
disp(PITModel)
Generalized linear regression model:
    logit(Default) ~ 1 + ScoreGroup + YOB + GDP + Market
    Distribution = Binomial

Estimated Coefficients:
                               Estimate        SE         tStat       pValue   
                              __________    _________    _______    ___________

    (Intercept)                   -2.667      0.10146    -26.287    2.6919e-152
    ScoreGroup_Medium Risk      -0.70751     0.037108    -19.066     4.8223e-81
    ScoreGroup_Low Risk          -1.2895     0.045639    -28.253    1.2892e-175
    YOB                         -0.32082     0.013636    -23.528    2.0867e-122
    GDP                         -0.12295     0.039725     -3.095      0.0019681
    Market                    -0.0071812    0.0028298    -2.5377       0.011159


388018 observations, 388012 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.97e+03, p-value = 0
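
The join call above attaches GDP and Market to every loan-year row by matching on the shared key variable Year. As a rough illustration of that lookup without tables, the sketch below performs the same match with ismember on plain vectors. The macro values are the ones visible in the table display above, and toyYear is a hypothetical stand-in for data.Year.

```matlab
% Macro data by calendar year, copied from the joined table shown above.
macroYear   = (1997:2004)';
macroGDP    = [2.72 3.57 2.86 2.43 1.26 -0.59 0.63 1.85]';
macroMarket = [7.61 26.24 18.1 3.19 -10.51 -22.95 2.78 9.48]';

% Hypothetical Year column of a few stacked loan-year rows.
toyYear = [1997 1998 1999 1997 1998 2002]';

% Locate each row's year in the macro data, then look up its values.
[found, loc] = ismember(toyYear, macroYear);
assert(all(found), 'Every row needs a matching year in the macro data.')
toyGDP = macroGDP(loc);
toyMarket = macroMarket(loc);

% Columns: Year, GDP, Market.
disp([toyYear toyGDP toyMarket])
```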

Predict the PD for training and testing data sets using predict.

data.PITPD = zeros(height(data),1);

% Predict in-sample
data.PITPD(TrainDataInd) = predict(PITModel,data(TrainDataInd,:));
% Predict out-of-sample
data.PITPD(TestDataInd) = predict(PITModel,data(TestDataInd,:));

Visualize the in-sample fit and out-of-sample fit.

PredPITPDTrainYear = groupsummary(data(TrainDataInd,:),'Year','mean',...
    {'Default','PITPD'});

figure;
subplot(2,1,1)
scatter(PredPITPDTrainYear.Year,PredPITPDTrainYear.mean_Default*100,'*');
hold on
plot(PredPITPDTrainYear.Year,PredPITPDTrainYear.mean_PITPD*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Training Data)')
grid on
PredPITPDTestYear = groupsummary(data(TestDataInd,:),'Year','mean',...
    {'Default','PITPD'});

subplot(2,1,2)
scatter(PredPITPDTestYear.Year,PredPITPDTestYear.mean_Default*100,'*');
hold on
plot(PredPITPDTestYear.Year,PredPITPDTestYear.mean_PITPD*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Testing Data)')
grid on

In the PIT model, as expected, the predictions match the observed default rates more closely than in the TTC model. Although this example uses simulated data, the same qualitative improvement is expected when moving from TTC to PIT models for real-world data, where the overall error might be larger than in this example.

Calculate TTC PD Using the PIT Model

Another approach for calculating TTC PDs is to use the PIT model and then replace the GDP and Market returns with representative baseline values, such as their averages. In this approach, you use values summarized over an entire economic cycle (or an even longer period) so that only baseline economic conditions influence the model, and any variability in default rates is due to other risk factors. You can also enter forecasted baseline values for the economy that are different from the values observed for the most recent economic cycle. The choice of summary statistic matters: for example, using the median instead of the mean can reduce the error. The code in this section uses the median of GDP and the mean of Market.

You can also use this approach of calculating TTC PDs with the PIT model as a tool for scenario analysis. This is not possible with the first version of the TTC model, which has no macroeconomic predictors. An added advantage of this approach is that you can use a single model for both the TTC and PIT predictions, so you need to validate and maintain only one model.
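
As a rough illustration of scenario analysis with the PIT model, the sketch below evaluates the inverse logit for a Low Risk loan in its first year on books under two sets of macro inputs. It uses the rounded PIT coefficients displayed earlier, so results are approximate. The baseline values 1.85 and 3.2263 are the ones that appear in the table below, while the downturn case reuses the 2002 macro values purely as a hypothetical stress scenario. With the fitted model in the workspace, the same comparison could be made by editing the GDP and Market columns and calling predict.

```matlab
% Rounded PIT coefficients copied from the model display.
b0 = -2.667;  bLow = -1.2895;  bYOB = -0.32082;
bGDP = -0.12295;  bMarket = -0.0071812;

% Inverse logit, and the PD of a Low Risk loan with YOB = 1
% as a function of the macro inputs.
invLogit = @(eta) 1./(1 + exp(-eta));
pdLowYOB1 = @(gdp, market) ...
    invLogit(b0 + bLow + bYOB*1 + bGDP*gdp + bMarket*market);

pdBaseline = pdLowYOB1(1.85, 3.2263);   % baseline macro values of this example
pdDownturn = pdLowYOB1(-0.59, -22.95);  % hypothetical stress case (2002 values)

fprintf('Baseline PD: %.4f, downturn PD: %.4f\n', pdBaseline, pdDownturn)
```

Because both macro coefficients are negative, lower GDP growth and lower market returns raise the predicted PD in this sketch.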

% Replace GDP with its median and Market with its mean (baseline values)
data.GDP(:) = median(data.GDP);
data.Market(:) = mean(data.Market);
disp(head(data,10));
    ID    ScoreGroup     YOB    Default    Year      TTCPD      GDP     Market      PITPD  
    __    ___________    ___    _______    ____    _________    ____    ______    _________

    1     Low Risk        1        0       1997    0.0084797    1.85    3.2263    0.0093187
    1     Low Risk        2        0       1998    0.0067697    1.85    3.2263     0.005349
    1     Low Risk        3        0       1999    0.0054027    1.85    3.2263    0.0044938
    1     Low Risk        4        0       2000    0.0043105    1.85    3.2263    0.0038285
    1     Low Risk        5        0       2001    0.0034384    1.85    3.2263    0.0035402
    1     Low Risk        6        0       2002    0.0027422    1.85    3.2263    0.0035259
    1     Low Risk        7        0       2003    0.0021867    1.85    3.2263    0.0018336
    1     Low Risk        8        0       2004    0.0017435    1.85    3.2263    0.0010921
    2     Medium Risk     1        0       1997     0.015097    1.85    3.2263     0.016554
    2     Medium Risk     2        0       1998     0.012069    1.85    3.2263    0.0095319
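
Note that median and mean in the code above run over all loan-year rows, so each calendar year is weighted by its number of observations. The sketch below reproduces the two baseline values from the yearly group counts and macro values displayed earlier in this example. The variable names are illustrative, and repelem is used only as a convenient way to rebuild the row-level GDP column.

```matlab
% Rows per calendar year (GroupCount) and macro values for 1997 to 2004,
% copied from the displays earlier in this example.
rowsPerYear  = [35214 66716 94639 92891 91140 89847 88449 87828]';
gdpByYear    = [2.72 3.57 2.86 2.43 1.26 -0.59 0.63 1.85]';
marketByYear = [7.61 26.24 18.1 3.19 -10.51 -22.95 2.78 9.48]';

% Rebuild the row-level GDP column and take its median.
gdpRows = repelem(gdpByYear, rowsPerYear);
gdpBaseline = median(gdpRows);

% The row-level mean equals the count-weighted mean of the yearly values.
marketBaseline = sum(rowsPerYear.*marketByYear)/sum(rowsPerYear);

fprintf('GDP baseline: %.2f, Market baseline: %.4f\n', gdpBaseline, marketBaseline)
```

Taking the statistics over the eight yearly values alone would instead give a median GDP of 2.14 and a mean Market return of 4.2425, so the weighting is a modeling choice worth making deliberately.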

Predict the PD for training and testing data sets using predict.

data.TTCPD2 = zeros(height(data),1);

% Predict in-sample
data.TTCPD2(TrainDataInd) = predict(PITModel,data(TrainDataInd,:));
% Predict out-of-sample
data.TTCPD2(TestDataInd) = predict(PITModel,data(TestDataInd,:));

Visualize the in-sample fit and out-of-sample fit.

PredTTCPD2TrainYear = groupsummary(data(TrainDataInd,:),'Year','mean',...
    {'Default','TTCPD2'});

figure;
subplot(2,1,1)
scatter(PredTTCPD2TrainYear.Year,PredTTCPD2TrainYear.mean_Default*100,'*');
hold on
plot(PredTTCPD2TrainYear.Year,PredTTCPD2TrainYear.mean_TTCPD2*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Training Data)')
grid on
PredTTCPD2TestYear = groupsummary(data(TestDataInd,:),'Year','mean',...
    {'Default','TTCPD2'});

subplot(2,1,2)
scatter(PredTTCPD2TestYear.Year,PredTTCPD2TestYear.mean_Default*100,'*');
hold on
plot(PredTTCPD2TestYear.Year,PredTTCPD2TestYear.mean_TTCPD2*100);
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Testing Data)')
grid on

Compare the Models

Create a summary plot to compare the three models and their PDs.

figure
scatter(PredPITPDTestYear.Year,PredPITPDTestYear.mean_Default*100, '*')
hold on
plot(PredTTCPDTestYear.Year,PredTTCPDTestYear.mean_TTCPD*100, 'Marker','o')
plot(PredPITPDTestYear.Year,PredPITPDTestYear.mean_PITPD*100, 'Marker','square')
plot(PredTTCPD2TestYear.Year,PredTTCPD2TestYear.mean_TTCPD2*100, 'Marker','diamond')
hold off
xlabel('Year')
ylabel('Default Rate (%)')
legend('Observed','PD TTC','PD PIT','PD TTC 2')
title('PIT PDs vs. TTC PDs')
grid on

This plot illustrates that the PD PIT model has the best fit, the PD TTC model has the second best fit, and the PD TTC 2 model has the third best fit.

As a measure of quality, compare the root mean squared error and the maximum absolute error of the PIT and TTC model PDs against the observed default rates.

TTCRMSError = sqrt(mean((PredPITPDTestYear.mean_Default - PredTTCPDTestYear.mean_TTCPD).^2));
PITRMSError = sqrt(mean((PredPITPDTestYear.mean_Default - PredPITPDTestYear.mean_PITPD).^2));
TTC2RMSError = sqrt(mean((PredPITPDTestYear.mean_Default - PredTTCPD2TestYear.mean_TTCPD2).^2));
TTCMaxError = max(abs(PredPITPDTestYear.mean_Default - PredTTCPDTestYear.mean_TTCPD));
PITMaxError = max(abs(PredPITPDTestYear.mean_Default - PredPITPDTestYear.mean_PITPD));
TTC2MaxError = max(abs(PredPITPDTestYear.mean_Default - PredTTCPD2TestYear.mean_TTCPD2));

T = array2table([TTCRMSError, TTCMaxError; PITRMSError, PITMaxError; TTC2RMSError, TTC2MaxError]);
T.Properties.RowNames = {'TTC Model'; 'PIT Model'; 'TTC with PIT Model'};
T.Properties.VariableNames = {'Root Mean Squared Error', 'Maximum Error'};
disp(T);
                          Root Mean Squared Error    Maximum Error
                          _______________________    _____________

    TTC Model                     0.001964             0.0035249  
    PIT Model                   0.00078292             0.0017776  
    TTC with PIT Model           0.0036801             0.0066774  
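
To put the table in relative terms, the short sketch below divides each error by the corresponding error of the first TTC model, using the values displayed above. The variable names are illustrative.

```matlab
% Errors copied from the table above.
% Rows: TTC Model, PIT Model, TTC with PIT Model.
% Columns: root mean squared error, maximum error.
errs = [0.001964    0.0035249
        0.00078292  0.0017776
        0.0036801   0.0066774];

% Express each model's errors relative to the first TTC model.
relErrs = errs ./ errs(1,:);
disp(relErrs)
```

With these numbers, the RMSE of the PIT model is roughly 40% of that of the TTC model, while the TTC PDs derived from the PIT model have nearly twice the error of the first TTC model.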
