Friday, May 8, 2026

Logistic Regression vs XGBoost

Python Code: https://drive.google.com/drive/folders/1SVGLABBtkLU7kxZPArwnQK9JJ8bnYS8-?usp=sharing

PD Model Pipeline – Technical Summary

  • Built a binary probability-of-default (PD) classification framework, with the default flag defined by a rating threshold (Default = f(Rating threshold)).
  • Compared Logistic Regression and XGBoost under an identical preprocessing pipeline.
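A minimal sketch of the target definition, assuming ratings are stored as ordinal integers where a larger value means worse credit quality. The cutoff `DEFAULT_THRESHOLD = 7` and the column names are hypothetical; the post does not state the actual threshold.

```python
import pandas as pd

# Hypothetical cutoff: ratings assumed ordinal integers, higher = worse credit quality
DEFAULT_THRESHOLD = 7

def make_default_flag(ratings: pd.Series, threshold: int = DEFAULT_THRESHOLD) -> pd.Series:
    """Return 1 where the rating is at or beyond the threshold, else 0."""
    return (ratings >= threshold).astype(int)

df = pd.DataFrame({"rating": [2, 5, 7, 9, 4]})
df["default"] = make_default_flag(df["rating"])
print(df["default"].tolist())  # [0, 0, 1, 1, 0]
```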

Data & Setup

  • Train/test split: 80/20
  • Target: binary default flag
  • Time feature retained for macro alignment
  • Leakage controls applied: rating (which defines the target) and the raw date column removed from the features
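One way the split and leakage controls might be implemented, assuming a pandas DataFrame with hypothetical columns `date`, `rating` and `default`, plus a derived numeric `time_idx` standing in for the retained time feature. Stratifying on the target is an added assumption that keeps the default rate similar in both samples.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def prepare_split(df, target="default", leak_cols=("rating", "date"), test_size=0.2, seed=42):
    """Drop leakage columns and the target from the features, then split 80/20."""
    drop = [target] + [c for c in leak_cols if c in df.columns]
    X, y = df.drop(columns=drop), df[target]
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)

# Hypothetical monthly panel; column names are illustrative only
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=n, freq="MS"),
    "rating": rng.integers(1, 10, size=n),
    "gdp_growth": rng.normal(size=n),
})
df["default"] = (df["rating"] >= 7).astype(int)
df["time_idx"] = np.arange(n)  # numeric time feature kept for macro alignment
X_train, X_test, y_train, y_test = prepare_split(df)
```

Dropping `rating` matters here because the target is a deterministic function of it; leaving it in would let either model recover the label exactly.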

Feature Pre-Filtering (Macro Risk Controls)

Applied sequential filtering:

  • PSI (<0.1) → removes unstable variables across train/test
  • KS (>0.1) → retains variables that separate defaulters from non-defaulters
  • IV (0.02–2) → retains predictive but non-dominant features
  • ADF (p < 0.05) → retains macro series for which a unit root is rejected (stationary)
  • VIF (<10) → removes multicollinearity
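The PSI, KS and IV screens can be sketched with NumPy and SciPy as below. Assumptions, since the post does not specify its binning: decile cut points taken from the training sample, a small floor on empty-bin shares, and KS/IV measured on the training sample only.

```python
import numpy as np
from scipy.stats import ks_2samp

EPS = 1e-6  # floor for empty-bin shares so the logs stay finite

def _quantile_bins(reference, values, bins=10):
    """Assign `values` to quantile bins whose cut points come from `reference`."""
    cuts = np.unique(np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1]))
    return np.searchsorted(cuts, values, side="right"), len(cuts) + 1

def psi(train, test, bins=10):
    """Population Stability Index of the test distribution against train."""
    e_idx, k = _quantile_bins(train, train, bins)
    a_idx, _ = _quantile_bins(train, test, bins)
    e = np.clip(np.bincount(e_idx, minlength=k) / len(train), EPS, None)
    a = np.clip(np.bincount(a_idx, minlength=k) / len(test), EPS, None)
    return float(np.sum((a - e) * np.log(a / e)))

def ks_stat(x, y):
    """Two-sample KS statistic between defaulters (y == 1) and non-defaulters."""
    return float(ks_2samp(x[y == 1], x[y == 0]).statistic)

def iv(x, y, bins=10):
    """Information Value of x for the binary target y."""
    idx, k = _quantile_bins(x, x, bins)
    bad = np.clip(np.bincount(idx[y == 1], minlength=k) / (y == 1).sum(), EPS, None)
    good = np.clip(np.bincount(idx[y == 0], minlength=k) / (y == 0).sum(), EPS, None)
    return float(np.sum((good - bad) * np.log(good / bad)))

def passes_univariate(x_train, x_test, y_train):
    """Thresholds from the post: PSI < 0.1, KS > 0.1, 0.02 <= IV <= 2."""
    return (psi(x_train, x_test) < 0.1
            and ks_stat(x_train, y_train) > 0.1
            and 0.02 <= iv(x_train, y_train) <= 2)
```

Inputs are expected as NumPy arrays (or aligned pandas Series) so that boolean indexing by the target works.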

Final feature set = intersection of all filters.

Logistic Regression (Model Selection)

  • Exhaustive subset selection over all combinations of n_vars candidate features
  • Each subset estimated with a statsmodels Logit
  • Selection criterion:
    • Maximize: Pseudo R²
    • Minimize: average p-values
  • Output: best interpretable variable set

XGBoost Model

  • Gradient boosting classifier via XGBoost (fallback: sklearn GradientBoostingClassifier)
  • Feature selection via importance ranking
  • Top-N features retained
  • Non-linear effects and feature interactions captured natively by the tree ensemble
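A sketch of importance-based selection with the sklearn fallback. Hyperparameters, `top_n` and the simulated data are illustrative assumptions. Both libraries expose `feature_importances_`, but their default importance definitions differ (XGBoost's is configurable via `importance_type`; scikit-learn's is impurity-based), so rankings are not directly comparable across the two.

```python
import numpy as np
import pandas as pd

try:
    from xgboost import XGBClassifier

    def make_model():
        return XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0)
except ImportError:  # fallback, as in the pipeline: sklearn's gradient boosting
    from sklearn.ensemble import GradientBoostingClassifier

    def make_model():
        return GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                          learning_rate=0.05, random_state=0)

def fit_top_n(X, y, top_n):
    """Rank features by importance from a full fit, then refit on the top_n."""
    full = make_model().fit(X, y)
    ranking = pd.Series(full.feature_importances_, index=X.columns).sort_values(ascending=False)
    keep = list(ranking.index[:top_n])
    return make_model().fit(X[keep], y), keep, ranking

# Simulated data: main effects plus an x1*x2 interaction; x3..x5 are noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(2000, 5)), columns=[f"x{i}" for i in range(1, 6)])
y = (X["x1"] + 0.8 * X["x2"] + X["x1"] * X["x2"] + 0.5 * rng.normal(size=2000) > 0).astype(int)
model, keep, ranking = fit_top_n(X, y, top_n=3)
pd_hat = model.predict_proba(X[keep])[:, 1]
```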

Predictions

  • Logistic: linear predictor (log-odds) → sigmoid transformation to PD
  • XGBoost: predicted default probability used directly as PD
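The logistic transformation written out explicitly; `intercept` and `coefs` below are placeholders rather than fitted values. With a fitted statsmodels Logit result, `result.predict(exog)` returns the same probabilities, and for XGBoost (or the sklearn fallback) the PD is the positive-class column `model.predict_proba(X)[:, 1]`.

```python
import numpy as np

def sigmoid(z):
    """Inverse logit: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_pd(X, intercept, coefs):
    """PD = sigmoid(intercept + X @ coefs) for each row of X."""
    return sigmoid(intercept + np.asarray(X, dtype=float) @ np.asarray(coefs, dtype=float))

# Placeholder coefficients, not fitted values
X = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 0.0]])
pd_hat = logistic_pd(X, intercept=-2.0, coefs=[1.0, 0.5])
print(pd_hat)  # third row has log-odds 0, so its PD is 0.5
```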

Evaluation Metrics

Computed for train & test:

  • AUC (ranking power)
  • KS (class separation)
  • Accuracy
  • Precision / Recall
  • F1-score
  • LogLoss (probabilistic accuracy, sensitive to calibration)
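A sketch of the metric set using scikit-learn, with KS taken as the maximum gap between the ROC curve's true- and false-positive rates (equivalently, the largest distance between the score CDFs of defaulters and non-defaulters). The 0.5 cutoff for accuracy, precision, recall and F1 is an assumption; the post does not state its classification threshold.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, log_loss, precision_score,
                             recall_score, roc_auc_score, roc_curve)

def evaluate(y_true, pd_hat, threshold=0.5):
    """Score PD predictions; class-based metrics use a hypothetical cutoff."""
    y_true = np.asarray(y_true)
    pd_hat = np.asarray(pd_hat, dtype=float)
    y_pred = (pd_hat >= threshold).astype(int)
    fpr, tpr, _ = roc_curve(y_true, pd_hat)
    return {
        "AUC": roc_auc_score(y_true, pd_hat),
        "KS": float(np.max(tpr - fpr)),
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Recall": recall_score(y_true, y_pred, zero_division=0),
        "F1": f1_score(y_true, y_pred, zero_division=0),
        "LogLoss": log_loss(y_true, pd_hat),
    }
```

Calling it once on the training predictions and once on the test predictions gives the two metric sets used for the stability comparison.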

Feature Interpretability

  • Logistic: coefficient sign and magnitude (usable for regulatory review)
  • XGBoost: feature importance ranking only

Comparison Logic

  • Model comparison based on:
    • Discriminatory power (AUC, KS)
    • Stability (train vs test gap)
    • Calibration (LogLoss)
  • Trade-off:
    • Logistic = interpretability + stability
    • XGBoost = predictive lift + non-linearity
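The comparison can be tabulated from per-model train and test metric dictionaries. This sketch assumes they use the keys `AUC`, `KS` and `LogLoss`; the `max_gap` stability tolerance of 0.05 is a hypothetical choice, not a value from the post.

```python
import pandas as pd

def compare_models(metrics, max_gap=0.05):
    """Summarise discrimination, stability and calibration per model.

    `metrics` maps model name -> {"train": {...}, "test": {...}}, where each
    inner dict holds at least "AUC", "KS" and "LogLoss".
    """
    rows = {}
    for name, m in metrics.items():
        gap = m["train"]["AUC"] - m["test"]["AUC"]
        rows[name] = {
            "test_AUC": m["test"]["AUC"],
            "test_KS": m["test"]["KS"],
            "AUC_gap": gap,
            "stable": gap <= max_gap,  # hypothetical tolerance
            "test_LogLoss": m["test"]["LogLoss"],
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

A model with higher test AUC but a large train/test gap lands on the "predictive lift" side of the trade-off, while a smaller gap supports the "stability" side.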

R3 chase - Pursuit
