Python code and data link:
https://drive.google.com/drive/folders/1kC621QtmjG3C_2ok-I53fPYqDRf9r8RK?usp=sharing
A. Data Preparation
- Import & Clean Data: Read the factor data with ratings and defaults; handle missing values.
- Target Variable Setup: Calculate yearly default rates and define the target (default flag).
- Data Split: Train-test split for robust model validation.
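The preparation steps can be sketched in pandas; the synthetic frame and column names (`factor`, `rating`, `default_flag`) are assumptions standing in for the linked dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the factor file (column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year":         rng.integers(2015, 2021, size=200),
    "factor":       rng.normal(size=200),
    "rating":       rng.integers(1, 8, size=200),
    "default_flag": rng.integers(0, 2, size=200),
})
df.loc[rng.choice(200, 10, replace=False), "factor"] = np.nan

# Handle missing values (median imputation here; choose per factor).
df["factor"] = df["factor"].fillna(df["factor"].median())

# Yearly default rates, and the binary target for the model.
yearly_dr = df.groupby("year")["default_flag"].mean()

# Stratified split preserves the default rate in both samples.
X_train, X_test, y_train, y_test = train_test_split(
    df[["factor", "rating"]], df["default_flag"],
    test_size=0.3, stratify=df["default_flag"], random_state=42)
```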
B. Statistical Screening of Independent Variables
- Stability Check (PSI): Ensure each variable is stable over time via the Population Stability Index.
- Discriminatory Power (KS Statistic): Select variables that distinguish well between defaulters and non-defaulters.
- Predictive Power (IV & WoE): Retain only variables with high Information Value, computed from Weights of Evidence.
- Stationarity (ADF Test): Remove non-stationary series using the Augmented Dickey-Fuller test.
- Multicollinearity (VIF): Drop highly correlated variables using Variance Inflation Factors.
- Partial Correlation: Remove redundant or confounding variables.
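A minimal sketch of two of these screening statistics, PSI and IV/WoE, alongside the KS statistic from SciPy. The quantile binning and the PSI < 0.1 cutoff are common rules of thumb, not prescriptions from this pipeline:

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one variable."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def information_value(x, y, bins=5):
    """IV summed over quantile bins of x; y is the 0/1 default flag."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.digitize(x, edges[1:-1])          # bin index 0..bins-1
    iv = 0.0
    for b in range(bins):
        # Share of all bads / all goods falling in this bin (floored).
        bad = max((y[idx == b] == 1).sum() / max(y.sum(), 1), 1e-6)
        good = max((y[idx == b] == 0).sum() / max((1 - y).sum(), 1), 1e-6)
        iv += (good - bad) * np.log(good / bad)  # WoE-weighted term
    return iv

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
x = rng.normal(size=500) + 1.0 * y             # an informative factor
stable = psi(x[:250], x[250:]) < 0.1           # rule of thumb: PSI < 0.1
ks = ks_2samp(x[y == 1], x[y == 0]).statistic  # discriminatory power
```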
C. Logistic Regression & Model Construction
- Stepwise Logistic Regression: Build the core model by stepwise selection on p-values (< 0.01).
- PD Estimation: Generate scores and PD predictions, with monotonicity checks.
- Diagnostics: Autocorrelation (Durbin-Watson) and heteroskedasticity (Breusch-Pagan) tests ensure statistical robustness.
D. Model Testing
- Rating Assignment: Cluster PD outputs into rating buckets using K-means for interpretability.
- Validation Tests: Jeffreys test and the KS statistic compare predicted vs. actual default distributions.
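A sketch of K-means rating buckets and a per-bucket Jeffreys test; the PDs here are simulated, and Beta(d + 1/2, n − d + 1/2) is the standard Jeffreys-prior posterior for the bucket default rate:

```python
import numpy as np
from scipy.stats import beta
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
pd_hat = rng.beta(1, 20, size=500)     # stand-in PD predictions
defaults = rng.binomial(1, pd_hat)     # realized defaults

# K-means on the one-dimensional PDs gives interpretable rating buckets.
k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    pd_hat.reshape(-1, 1))

# Re-index buckets so rating 0 carries the lowest mean PD (monotone scale).
order = np.argsort([pd_hat[labels == c].mean() for c in range(k)])
rating = np.empty_like(labels)
for new, old in enumerate(order):
    rating[labels == old] = new

# Jeffreys test per bucket: posterior Beta(d + 1/2, n - d + 1/2) for the
# true default rate; the p-value is the posterior mass below the predicted PD.
for r in range(k):
    n = (rating == r).sum()
    d = defaults[rating == r].sum()
    p_pred = pd_hat[rating == r].mean()
    jeffreys_p = beta.cdf(p_pred, d + 0.5, n - d + 0.5)
```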
E. Final Model Validation
- Accuracy Checks: AUC-ROC, F1, recall, precision, and log loss across the train and test sets.
- Cross-Validation: K-fold CV for model generalization.
- Regularization Checks:
  - Lasso Regression – identifies non-contributing features.
  - Ridge Regression – tests coefficient stability.
- Model Comparison: Combine and review coefficients from the Logit, Lasso, and Ridge models.
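The validation battery could look like the following scikit-learn sketch; the penalty strengths `C` are illustrative, and a near-unpenalized fit stands in for the plain Logit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import (roc_auc_score, f1_score, recall_score,
                             precision_score, log_loss)

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 5))          # three of the five factors are noise
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] - 1))))

# Near-unpenalized fit (large C) stands in for the stepwise Logit model.
logit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# Accuracy checks on the fitted PDs.
p = logit.predict_proba(X)[:, 1]
auc, ll = roc_auc_score(y, p), log_loss(y, p)
f1, rec, prec = (f1_score(y, p > 0.5), recall_score(y, p > 0.5),
                 precision_score(y, p > 0.5))

# K-fold CV for generalization; coefficient table across the three fits.
cv_auc = cross_val_score(logit, X, y, cv=5, scoring="roc_auc")
coefs = np.vstack([m.coef_[0] for m in (logit, lasso, ridge)])
lasso_dropped = np.isclose(lasso.coef_[0], 0)   # features Lasso zeroes out
```

Side-by-side, the three coefficient rows show which factors survive the L1 penalty and how stable the remaining ones are under L2 shrinkage.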
This pipeline balances statistical rigor with regulatory expectations, providing a ready-to-explain model for auditors, regulators, and internal committees. It's a great base for both Basel and IFRS 9/CECL-aligned PD model builds.