Python implementation and required files:
https://drive.google.com/drive/folders/1i6ZN3noeTN9MCDk8fqbA1RndzV1L49dh?usp=drive_link
Risk modeling often means dealing with skewed distributions. Standard methods like Z-scores or the basic IQR rule treat both tails symmetrically, so on skewed data they can fail: they miss real outliers in the short tail or flag valid extreme values in the long tail.
To address this, I used a Medcouple-based method, which is a robust, skewness-aware outlier detection technique.
How It Works
- Compute Ratios – Transform the raw variables into a ratio (e.g., X2 / X1).
- Center Around Median – Express values relative to the median so the asymmetry between the two tails is preserved.
- Estimate Spread Robustly – Use the first and third quartiles (Q1 below the median, Q3 above it) to calculate the IQR = Q3 − Q1.
- Measure Skewness (Medcouple) – Compute the medcouple, a robust measure of asymmetry that extreme values cannot easily distort.
- Adjust Outlier Bounds – Widen the fence on the long-tail side and tighten it on the short-tail side according to the medcouple.
- Identify Outliers – Flag observations outside the skewness-adjusted bounds.
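The steps above can be sketched in plain Python. This is a minimal sketch, not the attached implementation: it assumes the adjusted-boxplot fences of Hubert and Vandervieren (2008) with the constants 1.5, -4 and 3 (mirrored for negative skew), quartiles interpolated like Excel's QUARTILE.INC, and a naive O(n²) medcouple. The function names `medcouple`, `adjusted_bounds` and `flag_outliers` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def medcouple(data: Sequence[float]) -> float:
    """Naive O(n^2) medcouple: median of the kernel h over pairs x_i <= med <= x_j."""
    xs = sorted(data)
    med = statistics.median(xs)
    upper = [x for x in xs if x > med]   # strictly above the median
    lower = [x for x in xs if x < med]   # strictly below the median
    k = sum(1 for x in xs if x == med)   # observations tied with the median
    kernels: List[float] = []
    # h(x_i, x_j) = ((x_j - med) - (med - x_i)) / (x_j - x_i)
    for xj in upper:
        for xi in lower:
            kernels.append(((xj - med) - (med - xi)) / (xj - xi))
    # A median-tied value paired with a point above gives +1; with a point below, -1.
    kernels.extend([1.0] * (k * len(upper)))
    kernels.extend([-1.0] * (k * len(lower)))
    # Pairs of two median-tied values: -1, 0 or +1 depending on their tie index.
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            s = i + j - 1
            kernels.append(-1.0 if s < k else (0.0 if s == k else 1.0))
    return statistics.median(kernels)


def adjusted_bounds(data: Sequence[float], c: float = 1.5) -> Tuple[float, float]:
    """Skewness-adjusted fences (assumed: Hubert & Vandervieren, 2008)."""
    # Quartiles with linear interpolation, matching Excel's QUARTILE.INC.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(data)
    if mc >= 0:  # right skew: stretch the upper fence, pull in the lower one
        return q1 - c * math.exp(-4 * mc) * iqr, q3 + c * math.exp(3 * mc) * iqr
    # left skew: mirror image of the rule above
    return q1 - c * math.exp(-3 * mc) * iqr, q3 + c * math.exp(4 * mc) * iqr


def flag_outliers(x1: Sequence[float], x2: Sequence[float], c: float = 1.5) -> List[bool]:
    """Flag rows whose ratio x2 / x1 lies outside the adjusted fences."""
    ratios = [b / a for a, b in zip(x1, x2)]  # assumes x1 contains no zeros
    # Centering on (or scaling by a positive) median moves data and fences
    # together, so the flags are the same whether or not that step is applied.
    lo, hi = adjusted_bounds(ratios, c)
    return [not (lo <= r <= hi) for r in ratios]
```

`flag_outliers(x1, x2)` returns one boolean per row, True where X2 / X1 falls outside the fences. The quadratic medcouple is fine for a few thousand rows; for larger samples, statsmodels ships a faster `statsmodels.stats.stattools.medcouple`.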
Benefits
- Handles skewed and heavy-tailed distributions
- Preserves meaningful extreme values
- Improves data quality for modeling and analysis
Attached Files
To make this reproducible, I’m sharing:
- Excel replication – See the method step by step in Excel
- Python implementation – Fully automated outlier detection
- Input data used in Python – The dataset for replication
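Huber M-estimation uses a loss function (1) that transitions from squared error to absolute error at a threshold δ:

$$\rho_\delta(r) = \begin{cases} \frac{1}{2} r^2 & \text{if } |r| \le \delta \\ \delta \left( |r| - \frac{1}{2}\delta \right) & \text{if } |r| > \delta \end{cases} \qquad (1)$$

δ is chosen based on the expected distribution of residuals, e.g. δ = m · Stdev of the errors. Each residual $r_i$ then gets a weight:
- If $|r_i| \le m \cdot \text{Stdev}$, the weight is $w_i = 1$.
- If $|r_i| > m \cdot \text{Stdev}$, the weight is $w_i = m \cdot \text{Stdev} / |r_i|$.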
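The loss (1) and the weight rule translate directly into code. A minimal sketch, assuming δ = m · Stdev with the sample standard deviation of the residuals and m = 1.345 as a default (a common choice that keeps about 95% efficiency when errors are normal); the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def huber_loss(r: float, delta: float) -> float:
    """Huber loss (1): squared error for small residuals, absolute error beyond delta."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def huber_weights(residuals: Sequence[float], m: float = 1.345) -> List[float]:
    """w_i = 1 if |r_i| <= delta, else delta / |r_i|, with delta = m * Stdev."""
    delta = m * statistics.stdev(residuals)  # sample standard deviation (n - 1)
    return [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
```

The weight is the derivative of the loss divided by the residual, so it stays at 1 in the quadratic zone and decays as 1/|r| in the linear zone. Because the standard deviation is itself inflated by outliers, a robust scale such as the median absolute deviation is often used in its place; the sketch keeps Stdev to follow the text.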
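Steps:
- Specify the model $Y_{\text{CCF}} = \beta_0 + \beta_1 X + \epsilon$.
- Fit it by Ordinary Least Squares (OLS): $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, where $X$ includes a column of ones for $\beta_0$.
- Compute the residuals $r_i = y_i - \hat{y}_i$.
- Use the Huber loss function (1) to calculate the weights $w_i$.
- Using the weights, refit the regression: $\hat{\beta} = (X^\top W X)^{-1} X^\top W Y$, where $W = \operatorname{diag}(w_1, \dots, w_n)$.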
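For a single regressor X, these steps can be sketched without a matrix library by writing out the 2×2 system $(X^\top W X)\hat{\beta} = X^\top W Y$. This sketch assumes the reweighting is repeated until the coefficients stop changing (iteratively reweighted least squares, the usual way Huber M-estimates are computed); `wls_line`, `huber_irls`, `m = 1.345`, `max_iter` and `tol` are hypothetical names and defaults.

```python
import statistics
from typing import List, Sequence, Tuple


def wls_line(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """beta = (X^T W X)^{-1} X^T W Y for X = [1, x], solved as a 2x2 system."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx  # determinant of X^T W X
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1


def huber_irls(x: Sequence[float], y: Sequence[float], m: float = 1.345,
               max_iter: int = 50, tol: float = 1e-8) -> Tuple[float, float, List[float]]:
    w = [1.0] * len(x)
    b0, b1 = wls_line(x, y, w)  # unit weights: this is the OLS fit
    for _ in range(max_iter):
        r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals
        delta = m * statistics.stdev(r)                       # threshold m * Stdev
        w = [1.0 if abs(ri) <= delta else delta / abs(ri) for ri in r]
        nb0, nb1 = wls_line(x, y, w)                          # weighted refit
        if abs(nb0 - b0) < tol and abs(nb1 - b1) < tol:
            return nb0, nb1, w
        b0, b1 = nb0, nb1
    return b0, b1, w
```

`b0, b1, w = huber_irls(x, y)` returns the coefficients and the final weights; rows with `w < 1` are the ones the Huber loss has down-weighted. With `max_iter=1` the loop performs exactly the single reweighted fit listed in the steps.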