Friday, July 3, 2026

“Python-Based End-to-End Integrated Credit Risk Modelling Framework Covering Development, Validation, Calibration & MRM Governance”

 https://drive.google.com/drive/folders/1pERExuTBQM7n7v915OTAkRSia_RxKMrC?usp=drive_link



The developed Python framework implements an end-to-end credit risk modelling pipeline covering the complete lifecycle from data preparation to model monitoring and governance. The design aligns with model risk management expectations and ensures reproducibility, transparency, and regulatory compliance.

1. Data Preparation & Basic Controls

The process begins with loading historical credit risk data and defining the development sample window. Default definition is derived from rating thresholds, and time-based filtering is applied.

Key steps include:

  • Date processing and feature alignment
  • Default flag creation based on rating cut-off logic
  • Aggregation of yearly default rates for trend analysis
  • Missing value profiling and summary statistics generation
  • Train-test split for development and validation stability checks

2. Variable Stability & Initial Screening

A structured variable filtering process is applied to ensure only stable and predictive factors are retained.

Techniques used:

  • Population Stability Index (PSI) to assess distribution shift between train and test
  • Kolmogorov-Smirnov (KS) test to evaluate discriminatory power
  • Filtering based on PSI thresholds for stability
  • Frequency distribution analysis across bins
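
As a rough illustration of the PSI check, here is a minimal sketch that bins on quantiles of the reference (train) sample and compares bin shares with a second sample. The function name `psi`, the ten-bin default, and the small floor `eps` for empty bins are my assumptions for the example, not necessarily what the shared code uses.

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a comparison sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points come from quantiles of the reference sample;
    # np.unique guards against repeated cut points when values are tied.
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1])
    n = cuts.size + 1
    # searchsorted assigns every value to a bin, including values
    # that fall outside the range seen in the reference sample.
    e_share = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n) / expected.size
    a_share = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n) / actual.size
    # Floor the shares so that empty bins do not produce log(0).
    e_share = np.clip(e_share, eps, None)
    a_share = np.clip(a_share, eps, None)
    return float(np.sum((a_share - e_share) * np.log(a_share / e_share)))
```

A variable would then be kept only if `psi(train[col], test[col])` falls below the chosen stability threshold.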

3. Feature Binning & Predictive Strength

Variables are transformed into binned formats for interpretability and risk segmentation.

Steps include:

  • Quantile-based binning of continuous variables
  • Weight of Evidence (WOE) transformation
  • Information Value (IV) calculation for predictive strength
  • Selection of variables based on IV thresholds (moderate predictive power range)
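
The binning, WOE, and IV steps can be sketched as below. This assumes the common convention WOE = ln(share of goods / share of bads) with target 1 = default; the function name `woe_iv`, the five-bin default, and the 0.5 count smoothing are illustrative choices of mine.

```python
import numpy as np

def woe_iv(x, y, n_bins=5, smooth=0.5):
    """Quantile-bin x, then return per-bin WOE and total IV for a binary target y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    # Interior quantile cut points define the bins.
    cuts = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1))[1:-1])
    idx = np.searchsorted(cuts, x, side="right")
    n = cuts.size + 1
    bad = np.bincount(idx, weights=y, minlength=n)        # defaults per bin
    good = np.bincount(idx, weights=1 - y, minlength=n)   # non-defaults per bin
    # Smooth the counts so a bin with no goods or no bads does not give log(0).
    bad_share = (bad + smooth) / (bad + smooth).sum()
    good_share = (good + smooth) / (good + smooth).sum()
    woe = np.log(good_share / bad_share)
    iv = float(np.sum((good_share - bad_share) * woe))
    return woe, iv
```

With this sign convention, bins with a higher default rate get a lower WOE, and the IV is always non-negative.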

4. Time Series & Statistical Validation

To ensure robustness and reduce spurious relationships:

  • Augmented Dickey-Fuller (ADF) test is applied for stationarity
  • Variables failing stationarity criteria are excluded from modelling

5. Multicollinearity & Dependency Checks

To avoid redundancy and unstable coefficients:

  • Variance Inflation Factor (VIF) is computed
  • Highly collinear variables are removed using predefined thresholds
  • Partial correlation analysis is performed to further eliminate interdependent predictors
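
One way to sketch the VIF screen: the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix, so a NumPy-only version is short. The helper names and the drop-one-at-a-time loop are my assumptions; the threshold of 10 is only a placeholder default.

```python
import numpy as np

def vif(X):
    """VIF per column: diagonal of the inverse correlation matrix, i.e. 1 / (1 - R_j^2)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all are below the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names
```

Dropping one variable per pass matters because removing a single collinear column usually lowers the VIFs of the columns that remain.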

6. Logistic Regression Model Development

A stepwise logistic regression approach is implemented with constraints:

  • Forward selection based on statistical significance (p-values)
  • Business rule enforcement using expected coefficient signs
  • Final model selection constrained by maximum parameter limit
  • Model fitted using statsmodels Logit framework

7. Model Performance Evaluation

Comprehensive performance metrics are computed for both development and validation datasets:

  • Accuracy, Precision, Recall, F1-score
  • ROC-AUC and Gini coefficient
  • KS statistic for separation power
  • Log-loss for probabilistic performance
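
For reference, AUC, Gini, and KS are closely linked: Gini = 2·AUC − 1, and KS is the largest gap between the score distributions of defaulters and non-defaulters. The sketch below computes all three with NumPy only; it compares every defaulter with every non-defaulter, so it is meant for illustration on moderate sample sizes rather than as the framework's implementation.

```python
import numpy as np

def auc_gini_ks(y_true, pd_hat):
    """Return (AUC, Gini, KS) for binary outcomes and predicted PDs."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(pd_hat, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    # AUC: share of (defaulter, non-defaulter) pairs ranked correctly, ties count half.
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    gini = 2.0 * auc - 1.0
    # KS: maximum distance between the two empirical score CDFs.
    grid = np.unique(s)
    cdf_pos = np.searchsorted(np.sort(pos), grid, side="right") / pos.size
    cdf_neg = np.searchsorted(np.sort(neg), grid, side="right") / neg.size
    ks = float(np.max(np.abs(cdf_pos - cdf_neg)))
    return float(auc), float(gini), ks
```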

8. Calibration & Distribution Tests

Model calibration and statistical consistency are validated using:

  • Hosmer–Lemeshow goodness-of-fit test
  • Brier score for probability accuracy
  • Jeffreys / binomial test for default consistency
  • KS test between predicted and actual distributions
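
Two of these checks are easy to sketch with the standard library alone: the Brier score and a one-sided exact binomial test of the assigned PD against observed defaults. The Jeffreys test is not shown here (it needs a Beta quantile), and the function names are illustrative assumptions.

```python
from math import exp, lgamma, log

def brier_score(y_true, pd_hat):
    """Mean squared difference between predicted PD and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, pd_hat)) / len(y_true)

def _log_binom_pmf(k, n, p):
    # Log of the binomial probability mass; log-space avoids overflow for large n.
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))

def binomial_pvalue(n_obligors, n_defaults, pd_assigned):
    """P(defaults >= observed) if the true PD equals pd_assigned (requires 0 < PD < 1)."""
    below = sum(exp(_log_binom_pmf(k, n_obligors, pd_assigned)) for k in range(n_defaults))
    return max(0.0, 1.0 - below)
```

A small p-value suggests that the assigned PD understates the observed default experience for that grade or segment.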

9. Rating System & Business Mapping

Model outputs are translated into interpretable risk grades:

  • K-Means clustering on predicted PD values
  • Mapping PD bands to rating grades (1–10 scale)
  • Default classification based on rating thresholds
  • Validation of monotonic relationship between rating and PD

10. Residual & Diagnostic Testing

Model assumptions are validated through:

  • Durbin-Watson test for autocorrelation
  • Breusch-Pagan test for heteroscedasticity
  • Residual trend analysis

11. Cross-Validation & Robustness Checks

To ensure generalization:

  • K-fold cross-validation using logistic regression
  • Stability of accuracy, precision, recall, and F1-score evaluated across folds

12. Monitoring Framework (Out-of-Time Testing)

A monitoring module is implemented for post-deployment tracking:

  • PSI computation on new (monitoring) data
  • Drift detection across key variables
  • Real-time performance tracking (AUC, KS, Gini, log-loss)
  • Rating consistency checks

13. Governance & MRM Alignment

A structured Model Risk Management layer is embedded:

  • Validation thresholds for key metrics (KS, Gini, PSI, calibration)
  • Escalation rules for drift and performance deterioration
  • Model lifecycle governance (development → validation → monitoring → recalibration)
  • Alignment with SR 11-7 principles

Overall Summary

This framework consolidates credit risk model development, validation, and monitoring into a single automated pipeline. It integrates statistical rigor with governance requirements, enabling end-to-end model lifecycle management with audit-ready outputs.

Sunday, June 28, 2026

IFRS 9 Governance & Model Risk Management Framework

 

https://docs.google.com/document/d/1jWVuYvUb7Eh3lbFAg0gUoXp7ur8EWb6w/edit?usp=drive_link&ouid=115594792889982302405&rtpof=true&sd=true

Technical overview of an IFRS 9 Model Risk Management Framework covering key governance and validation practices across the model lifecycle.

The document discusses:

  • Model governance, ownership, and lifecycle management
  • Performance monitoring using PSI, PD drift, ECL drift, and stage migration analysis
  • Validation metrics including Gini, KS, Calibration Error, Backtesting, and Stability Monitoring
  • Materiality assessment for distinguishing monitoring breaches from model impairment
  • Recalibration triggers, root cause analysis, and governance-driven escalation
  • Alignment with SR 11-7 principles for model governance, independent validation, documentation, and change management

The objective is to provide a structured governance approach that supports robust model monitoring, regulatory compliance, and informed model risk management decisions.

Friday, May 8, 2026

Logistic Regression vs XGBoost

 Python Code:
https://drive.google.com/drive/folders/1SVGLABBtkLU7kxZPArwnQK9JJ8bnYS8-?usp=sharing

PD Model Pipeline – Technical Summary

  • Built binary PD classification framework (Default = f(Rating threshold)).
  • Compared Logistic Regression vs XGBoost under identical preprocessing pipeline.

Data & Setup

  • Train/test split: 80/20
  • Target: binary default flag
  • Time feature retained for macro alignment
  • Leakage controls applied (rating/date removed)

Feature Pre-Filtering (Macro Risk Controls)

Applied sequential filtering:

  • PSI (<0.1) → removes unstable variables across train/test
  • KS (>0.1) → ensures discriminatory power
  • IV (0.02–2) → retains predictive but non-dominant features
  • ADF (p < 0.05) → ensures stationarity in macro series
  • VIF (<10) → removes multicollinearity

Final feature set = intersection of all filters.
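
The intersection step can be sketched in a few lines. The dictionary layout (filter name mapped to the variables that passed it) and the variable names in the usage note are hypothetical.

```python
def final_feature_set(filter_results):
    """Keep only the variables that pass every filter.

    filter_results maps a filter name (e.g. "psi", "ks") to the
    collection of variable names that passed that filter.
    """
    passed = [set(names) for names in filter_results.values()]
    if not passed:
        return []
    return sorted(set.intersection(*passed))
```

For example, `final_feature_set({"psi": ["gdp", "cpi"], "ks": ["gdp"]})` returns `["gdp"]`, because `cpi` failed the KS filter.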

Logistic Regression (Model Selection)

  • Exhaustive subset selection using combinations of n_vars
  • Statsmodels Logit estimation
  • Selection criterion:
    • Maximize: Pseudo R²
    • Minimize: average p-values
  • Output: best interpretable variable set

XGBoost Model

  • Gradient boosting classifier (fallback: sklearn GBC)
  • Feature selection via importance ranking
  • Top-N features retained
  • Non-linear interaction capture enabled automatically

Predictions

  • Logistic: logit → sigmoid transformation to PD
  • XGBoost: probability output directly

Evaluation Metrics

Computed for train & test:

  • AUC (ranking power)
  • KS (class separation)
  • Accuracy
  • Precision / Recall
  • F1-score
  • LogLoss (calibration)

Feature Interpretability

  • Logistic: coefficient sign + magnitude (regulatory usable)
  • XGBoost: feature importance ranking only

Comparison Logic

  • Model comparison based on:
    • Discriminatory power (AUC, KS)
    • Stability (train vs test gap)
    • Calibration (LogLoss)
  • Trade-off:
    • Logistic = interpretability + stability
    • XGBoost = predictive lift + non-linearity

Wednesday, March 18, 2026

Detecting Outliers Using Medcouple – A Simple, Robust Approach


Python Implementation along with Required Files:

https://drive.google.com/drive/folders/1i6ZN3noeTN9MCDk8fqbA1RndzV1L49dh?usp=drive_link


Risk modeling often means dealing with skewed distributions. Standard methods like Z-scores or basic IQR can fail, either missing real outliers or flagging valid extreme values.

To address this, I used a Medcouple-based method, which is a robust, skewness-aware outlier detection technique.

How It Works

  1. Compute Ratios – Transform raw variables into a ratio (e.g., X2 / X1).

  2. Center Around Median – Scale values relative to the median to preserve asymmetry.

  3. Estimate Spread Robustly – Use quartiles above and below the median to calculate IQR.

  4. Measure Skewness (Medcouple) – A robust statistic capturing asymmetry without being influenced by extremes.

  5. Adjust Outlier Bounds – Expand or shrink thresholds based on skewness for accurate detection.

  6. Identify Outliers – Flag observations outside the skewness-adjusted bounds.
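
The skewness-adjusted bounds can be sketched as follows. This uses the adjusted-boxplot fences of Hubert and Vandervieren, which may differ in detail from the variant in the shared workbook. The medcouple here is a naive O(n²) version that simply drops observations equal to the median, a simplification of the full definition.

```python
import numpy as np

def medcouple(x):
    """Naive medcouple: median of the kernel h over all pairs xi < median < xj."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    lower = x[x < med][:, None]   # values strictly below the median
    upper = x[x > med][None, :]   # values strictly above the median
    h = ((upper - med) - (med - lower)) / (upper - lower)
    return float(np.median(h))

def adjusted_bounds(x, k=1.5):
    """Skewness-adjusted outlier fences; equal to the usual Tukey fences when MC = 0."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple(x)
    if mc >= 0:   # right-skewed: widen the upper fence, tighten the lower one
        return q1 - k * np.exp(-4 * mc) * iqr, q3 + k * np.exp(3 * mc) * iqr
    return q1 - k * np.exp(-3 * mc) * iqr, q3 + k * np.exp(4 * mc) * iqr

def flag_outliers(x):
    """Boolean mask of observations outside the adjusted fences."""
    x = np.asarray(x, dtype=float)
    lo, hi = adjusted_bounds(x)
    return (x < lo) | (x > hi)
```

For the ratio-based workflow above, `x` would be the computed ratio (for example X2 / X1) rather than the raw variables.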


Benefits

  • Handles skewed and heavy-tailed distributions

  • Preserves meaningful extreme values

  • Improves data quality for modeling and analysis

Attached Files

To make this reproducible, I’m sharing:

  1. Excel replication – See the method step by step in Excel

  2. Python implementation – Fully automated outlier detection

  3. Input data used in Python – The dataset for replication

Saturday, September 27, 2025

PIT PD Modeling Using Systematic Factor Approach

 Python Code and Data: https://drive.google.com/drive/folders/1d7vkT9SeXlELPjRRKDU3qUezibQaUL-y?usp=sharing

Robust methodology to estimate Point-in-Time (PIT) Probability of Default (PD) for non-default obligors under IFRS9, combining obligor-level characteristics with macroeconomic indicators. The approach bridges regulatory compliance with practical portfolio forecasting.

Key Steps:

  1. TTC PD Calculation:

    • We start with a Through-the-Cycle (TTC) PD model at the obligor level, capturing borrower-specific risk factors such as financial ratios, credit history, and product attributes.

    • Macro variables (Macroeconomic Exposure Variables, MEVs) are averaged over a historical period to normalize for economic cycles, ensuring stability and compliance with regulatory TTC requirements.

  2. Systematic Factor Extraction (Credit Cycle Index):

    • To incorporate the impact of economic cycles on forward-looking PDs, we applied Principal Component Analysis (PCA) to a set of macroeconomic indicators.

    • The first principal component serves as a credit cycle index, representing the systematic risk factor that drives correlated changes in credit quality across obligors.

  3. Forecasted Credit Cycle:

    • Using macroeconomic forecasts for upcoming quarters, we projected the credit cycle index forward, maintaining the relationship with historical MEVs.

    • This allows us to translate macroeconomic expectations into a forward-looking credit environment.

  4. PIT PD Estimation via Vasicek Transformation:

    • The TTC PDs were adjusted to PIT PDs using a Vasicek-based single-factor model, incorporating a correlation coefficient (ρ) to reflect the sensitivity of obligors to the systematic credit cycle.

    • This transformation ensures that obligors’ forward-looking PDs respond dynamically to expected changes in the macroeconomic environment.

  5. Portfolio-Level Forecast:

    • The final output is a matrix of PIT PDs, with each obligor in rows and forecasted quarters in columns, allowing granular IFRS9 expected credit loss calculations while remaining aligned with Basel and EBA guidance.
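
As a rough sketch of step 4, the Vasicek single-factor adjustment can be written as PIT PD = Φ((Φ⁻¹(TTC PD) − √ρ · Z) / √(1 − ρ)), where Z is the standardized credit cycle index. The sign convention (positive Z means benign conditions and lower PDs), the function names, and the inputs in the usage note are assumptions for illustration.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def pit_pd(ttc_pd, z, rho):
    """Conditional (PIT) PD given the systematic factor z and asset correlation rho."""
    return _N.cdf((_N.inv_cdf(ttc_pd) - rho ** 0.5 * z) / (1.0 - rho) ** 0.5)

def pit_matrix(ttc_pds, z_path, rho):
    """Rows are obligors, columns are forecast quarters."""
    return [[pit_pd(p, z, rho) for z in z_path] for p in ttc_pds]
```

Calling `pit_matrix([0.01, 0.05], [-1.0, 0.0, 1.0], rho=0.2)` gives a 2 × 3 matrix in which each obligor's PD falls as the projected cycle index improves.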

Benefits of this Methodology:

  • Combines obligor-specific risk and macroeconomic trends for accurate PIT PD forecasting.

  • Compliant with IFRS9 and regulatory expectations for forward-looking credit risk modeling.

  • Avoids the need for future obligor-level forecasts, which are often unavailable.

  • Easily scalable to large portfolios for quarterly IFRS9 reporting.

Saturday, September 20, 2025

Estimating Obligor-Level PIT PDs in Low-Default Portfolios

 Excel Example:
https://docs.google.com/spreadsheets/d/1vx3K1YsKxPD3wIc229W2QcghbUIOSe55/edit?usp=sharing&ouid=115594792889982302405&rtpof=true&sd=true

In low-default portfolios, estimating Point-in-Time (PIT) probability of default (PD) at the obligor level is particularly challenging due to data scarcity. To address this, I implemented the Benjamin, Cathcart, and Ryan (BCR) approach and extended it with idiosyncratic adjustments for borrower-level differentiation.

Portfolio Level

  • Macro Driver: GDP YoY is standardized into a Z-score.

  • Systematic Link: Vasicek’s single-factor model connects portfolio Through-the-Cycle (TTC) PDs to the macroeconomic cycle.

  • Calibration: Goal Seek ensures unconditional PIT PDs are consistent with observed default frequencies.

Obligor Level

  • Start from TTC PDs and apply PIT adjustments consistently across obligors.

  • To differentiate obligors within the same quarter, I introduced idiosyncratic shifts (e.g., Debt-to-Equity ratio).

  • This framework can be extended using Principal Component Analysis (PCA) across multiple borrower-level factors (leverage, liquidity, profitability, etc.) to extract orthogonal risk drivers for richer differentiation.

  • These shifts or factors adjust each obligor’s threshold in the Vasicek model, producing distinct PIT PDs.

  • Finally, obligor PITs are rescaled so their average aligns with the calibrated portfolio PIT.
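
A minimal sketch of the obligor-level step, assuming the idiosyncratic shift is added to each obligor's default threshold Φ⁻¹(TTC PD) and the final rescaling is a simple multiplicative factor. Both choices, and all names below, are illustrative and may differ from the Excel example.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def obligor_pit_pds(ttc_pds, shifts, z, rho, portfolio_pit):
    """Shift each obligor's Vasicek threshold, then rescale to the portfolio PIT PD.

    shifts: one value per obligor, e.g. a standardized Debt-to-Equity ratio
    times a sensitivity; a positive shift raises that obligor's PD.
    """
    raw = [
        _N.cdf((_N.inv_cdf(p) + s - rho ** 0.5 * z) / (1.0 - rho) ** 0.5)
        for p, s in zip(ttc_pds, shifts)
    ]
    # Rescale so the average obligor PD matches the calibrated portfolio PIT PD.
    scale = portfolio_pit / (sum(raw) / len(raw))
    return [min(1.0, r * scale) for r in raw]
```

Two obligors with the same TTC PD then receive different PIT PDs in the same quarter whenever their shifts differ.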

 This approach ensures regulatory consistency at the portfolio level while producing economically intuitive obligor-level PDs — higher leverage or weaker fundamentals result in higher PIT PDs, while PCA allows multiple dimensions of risk to be captured systematically.

Thursday, August 14, 2025

Import Macro Data from MOSPI into Python:

 Step 1 — Capture the Download URL (One-Time Setup)

  1. Open the MOSPI page: https://esankhyiki.mospi.gov.in/

  2. Search for your dataset (CPI, WPI, IIP, etc.)

  3. Press F12 → Network tab and tick Preserve log

  4. Click the Download button on the page

  5. In the network log, find the .xlsx request (e.g., cpi_8.xlsx) and copy the Request URL

 Step 2 — Python Automation and Data Processing

# CPI Python code (can be copied directly)
# Requires: pandas, requests, and openpyxl (used by pandas to read .xlsx files)

import pandas as pd
import requests

url = "https://api.mospi.gov.in/api/download/CPI/cpi_8.xlsx"
output_file = "cpi_8.xlsx"

# Download the workbook and stop on any HTTP error
response = requests.get(url, timeout=60)
response.raise_for_status()
with open(output_file, "wb") as f:
    f.write(response.content)

Ind_CPI = pd.read_excel(output_file)

# Build a month-end date from year and month_code (a month_code of 0 is mapped to December)
Ind_CPI['month_end'] = pd.to_datetime(
    Ind_CPI['year'].astype(str) + '-' + Ind_CPI['month_code'].replace(0, 12).astype(str) + '-01'
) + pd.offsets.MonthEnd(0)

# Average the index per month and group, then pivot so each group becomes a column
Ind_CPI = (
    Ind_CPI
    .groupby(['month_end', 'group'], as_index=False)['index']
    .mean()
)
Ind_CPI = Ind_CPI.pivot(index='month_end', columns='group', values='index')
Ind_CPI = Ind_CPI.reset_index()

# Map each month to its quarter end and average the monthly values into quarterly values
Ind_CPI['Quarter_End'] = pd.to_datetime(Ind_CPI['month_end']) + pd.offsets.QuarterEnd(0)
Ind_CPI = Ind_CPI.drop(columns=['month_end'])
Ind_CPI_qtr = (
    Ind_CPI
    .groupby('Quarter_End', as_index=False)
    .mean(numeric_only=True)
)

# Change over x quarters; x = 4 gives year-on-year (assumed value, set as needed)
x = 4
Ind_CPI_qtr_pc = Ind_CPI_qtr.copy()
# pct_change returns a fraction; multiply by 100 for percent
Ind_CPI_qtr_pc.iloc[:, 1:] = Ind_CPI_qtr_pc.iloc[:, 1:].pct_change(periods=x)

# Drop the first x quarters, which have no base period, and keep plain dates
Ind_CPI_qtr_pc = Ind_CPI_qtr_pc.dropna().reset_index(drop=True)
Ind_CPI_qtr_pc['Quarter_End'] = Ind_CPI_qtr_pc['Quarter_End'].dt.date

Friday, August 1, 2025

PD Model Development in Python:

 Python code and data link:

https://drive.google.com/drive/folders/1kC621QtmjG3C_2ok-I53fPYqDRf9r8RK?usp=sharing

A. Data Preparation

  • Import & Clean Data: Read factor data with ratings and defaults; handle missing values.

  • Target Variable Setup: Calculate yearly default rates and define the target (Default flag).

  • Data Split: Train-test split for robust model validation.

 B. Statistical Screening of Independent Variables

  • Stability Check (PSI): Ensure variable stability over time.

  • Discriminatory Power (KS-Stat): Select variables that distinguish well between default and non-default.

  • Predictive Power (IV & WoE): Retain only those with high predictive value.

  • Stationarity (ADF Test): Remove non-stationary series.

  • Multicollinearity (VIF): Drop highly correlated variables.

  • Partial Correlation: Remove redundant/confounding variables.

 C. Logistic Regression & Model Construction

  • Stepwise Logistic Regression: Based on p-values (< 0.01), build the core model.

  • PD Estimation: Generate scores and PD predictions with monotonicity checks.

  • Diagnostics: Autocorrelation (Durbin-Watson) and Heteroskedasticity (Breusch-Pagan) tests ensure statistical robustness.

D. Model Testing

  • Rating Assignment: Cluster PD outputs into buckets using K-means for interpretability.

  • Validation Tests:

    • Jeffreys Test and KS-Stat – Compare predicted vs actual default distributions.

E. Final Model Validation

  • Accuracy Checks: AUC-ROC, F1, Recall, Precision, Log Loss across Train/Test sets.

  • Cross-Validation: K-Fold CV for model generalization.

  • Regularization Checks:

    • Lasso Regression – Identifies non-contributing features.

    • Ridge Regression – Tests coefficient stability.

  • Model Comparison: Combine and review coefficients from Logit, Lasso, and Ridge models.

This pipeline balances statistical rigor with regulatory expectations, providing a ready-to-explain model for auditors, regulators, and internal committees. It’s a great base for both Basel and IFRS9/CECL-aligned PD model builds.

Tuesday, July 22, 2025

BCR Approach with Python for Low-Default Portfolios

Access the full Python code and input data here: [https://drive.google.com/drive/folders/1T8clLUy9h42pn-WZ9Z89f3Bo98uQb1U4?usp=sharing] 

Key features of the Python implementation:

  • Inverse Vasicek calibration using root_scalar() to estimate PD
  • Time-series PD projection based on GDP path volatility
  • Goodness-of-fit validation using Binomial hypothesis testing
  • Bayesian posterior estimation of PD with credible intervals
  • Stress scenario simulation – evaluates PD under adverse GDP shocks
  • Sensitivity analysis – assesses how varying asset correlation (ρ) affects PD outcomes
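
The calibration idea can be sketched as follows: find the portfolio PD at which the Vasicek conditional PDs, applied along the standardized GDP path, reproduce the observed default count. The post's code uses root_scalar(); a plain bisection stands in here so the sketch needs no SciPy, and the function names and search bracket are my assumptions.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def conditional_pd(pd_ttc, z, rho):
    """Vasicek PD conditional on the standardized macro factor z (positive z = good times)."""
    return _N.cdf((_N.inv_cdf(pd_ttc) - rho ** 0.5 * z) / (1.0 - rho) ** 0.5)

def expected_defaults(pd_ttc, z_path, n_per_period, rho):
    """Sum of obligors times conditional PD over all periods."""
    return sum(n * conditional_pd(pd_ttc, z, rho) for z, n in zip(z_path, n_per_period))

def calibrate_pd(observed_defaults, z_path, n_per_period, rho, lo=1e-8, hi=0.5, tol=1e-12):
    """Bisection on the portfolio PD so that expected defaults equal the observed count."""
    def gap(p):
        return expected_defaults(p, z_path, n_per_period, rho) - observed_defaults
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("observed defaults are outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Bisection is safe here because expected defaults rise monotonically with the portfolio PD. Repeating the call over a grid of ρ values gives a simple version of the sensitivity analysis.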





Sunday, July 13, 2025

BCR Approach for Regulatory Reporting (PD in Low-Default Portfolios)

Attached: Excel workbook with full BCR implementation (macro-driven, quarterly defaults, confidence intervals, and model diagnostics)
https://docs.google.com/spreadsheets/d/1ltVYX4vhGeTllOQkEWKrED2DV33eb55M/edit?usp=sharing&ouid=115594792889982302405&rtpof=true&sd=true

Excel Implementation Details (see link above)

  • Historical GDP YoY data as a macro factor

  • Z-score standardization of macro series

  • Conditional PD computed using Vasicek model

  • Expected defaults calculated for each period

  • Goal Seek used to backsolve for portfolio PD

  • Bayesian posterior estimate and 95% confidence bound from Beta distribution




To calculate PD for low-default portfolios such as sovereigns, large corporates, or prime mortgages, the Benjamin, Cathcart, and Ryan (BCR) methodology (2006) is a robust statistical approach used in both regulatory modeling and internal risk management.

It leverages:

  • Vasicek (asymptotic single risk factor) model

  • Observed macroeconomic conditions (e.g., GDP YoY growth)

  • Asset correlation

  • Observed default count over a given period

Excel Implementation Steps (Included in Attached File)

Inputs:

  • #Obligors (e.g., 54,000)

  • Observed defaults (e.g., 150)

  • Correlation (e.g., 0.20)

  • Historical GDP YoY (%) path as macro risk driver




Thursday, May 22, 2025

VIF (Variance Inflation Factor) vs. GVIF (Generalized VIF):


Handling Multicollinearity with Factor Variables
When building regression models, checking multicollinearity is crucial.

For continuous predictors, VIF helps identify multicollinearity. But categorical variables with multiple levels (factors) need special treatment.

A factor with k levels is represented by d=k−1 dummy variables. These dummies are inherently correlated because they encode the same categorical feature.
In this case we use the Generalized Variance Inflation Factor (GVIF) — which measures multicollinearity jointly for all dummy variables representing a factor.

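When calculating GVIF for a factor variable, we treat all its dummy variables as one block and measure how strongly that block is related to the other predictors in the model.

GVIF_j = det(R11) · det(R22) / det(R)

Here R11 is the correlation matrix of the factor's dummy variables, R22 is the correlation matrix of the remaining predictors, and R is the correlation matrix of all predictors together. For a single continuous predictor (d = 1) this reduces to the ordinary VIF, 1 / (1 − R_j^2).

Adjusted GVIF = GVIF^(1/(2·d))

d = number of dummy variables (degrees of freedom for the factor).
This adjustment puts GVIF on a scale comparable to the square root of a standard VIF, so it can be compared across terms with different d.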
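
A minimal NumPy sketch of this calculation, using the determinant form of GVIF from Fox and Monette: det(R11) · det(R22) / det(R), with R11 the correlation matrix of the factor's dummies, R22 that of the other predictors, and R that of all predictors. The function name and the column-index interface are assumptions for illustration.

```python
import numpy as np

def gvif(X, factor_cols):
    """Return (GVIF, adjusted GVIF) for the dummy columns of one factor.

    X: 2-D array of predictors (dummies already created, no intercept column).
    factor_cols: column indices of the dummies that encode the factor.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    f = np.asarray(factor_cols)
    others = np.setdiff1d(np.arange(R.shape[0]), f)
    R11 = R[np.ix_(f, f)]             # correlations among the factor's dummies
    R22 = R[np.ix_(others, others)]   # correlations among the other predictors
    g = np.linalg.det(R11) * np.linalg.det(R22) / np.linalg.det(R)
    d = f.size                        # number of dummy variables
    return float(g), float(g ** (1.0 / (2 * d)))
```

For a single column the first value equals the ordinary VIF, which is a convenient sanity check before applying it to multi-level factors.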


Key reasons to consider GVIF:
  • Treating the factor’s dummies jointly prevents misleading interpretations of multicollinearity.
  • It reflects the true inflation of variance caused by correlation between the factor and other predictors.
  • It helps in making informed decisions about feature selection and model stability.

Wednesday, March 19, 2025

Incorporating IPCC Climate Projections into Probability of Default:

 For a detailed description of the theory and an Excel workbook example, refer to the attached link

https://drive.google.com/drive/folders/1vAcH8ge2KEFBPxxJt06cPgIr5XVKhyOr?usp=sharing
