
Thursday, May 22, 2025

VIF (Variance Inflation Factor) vs. GVIF (Generalized VIF):



Handling Multicollinearity with Factor Variables
When building regression models, checking multicollinearity is crucial.

For continuous predictors, the VIF identifies multicollinearity. But categorical variables with multiple levels (factors) need special treatment.

A factor with k levels is represented by d = k − 1 dummy variables. These dummies are inherently correlated because they encode the same categorical feature.
In this case we use the Generalized Variance Inflation Factor (GVIF), which measures multicollinearity jointly for all dummy variables representing a factor.

When calculating GVIF for a factor variable, we regress all of its dummy variables simultaneously on the other predictors in the model:

GVIFj = 1 / (1 − Rj²)^d

Adjusted GVIF = GVIF^(1/(2·d))

d = number of dummy variables (degrees of freedom for the factor).
This adjustment scales GVIF so it can be compared across factors with different numbers of levels; GVIF^(1/(2·d)) is on the scale of the square root of an ordinary VIF, so it should be compared against the square root of the usual VIF cutoff.


Key reasons to consider GVIF:
- Treating the factor’s dummies jointly prevents misleading interpretations of multicollinearity.
- It reflects the true inflation of variance caused by correlation between the factor and other predictors.
- It helps in making informed decisions about feature selection and model stability.
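
As an illustration, here is a minimal Python sketch that computes GVIF for a group of dummy columns using the determinant form of Fox & Monette (1992), which is the quantity reported by R's car::vif() for factors. The function name and the example columns are hypothetical.

import numpy as np
import pandas as pd

def gvif(X: pd.DataFrame, factor_cols: list) -> tuple:
    """GVIF for the group of dummy columns representing one factor.

    X           : design matrix without the intercept, dummies already coded
    factor_cols : names of the dummy columns that belong to the factor
    Returns (GVIF, GVIF^(1/(2*d))).
    """
    corr = np.corrcoef(X.values.astype(float), rowvar=False)   # correlation matrix of all predictors
    cols = list(X.columns)
    idx1 = [cols.index(c) for c in factor_cols]                # the factor's dummies
    idx2 = [i for i in range(len(cols)) if i not in idx1]      # the remaining predictors

    r11 = corr[np.ix_(idx1, idx1)]
    r22 = corr[np.ix_(idx2, idx2)]
    g = np.linalg.det(r11) * np.linalg.det(r22) / np.linalg.det(corr)
    d = len(factor_cols)
    return g, g ** (1.0 / (2 * d))

# Hypothetical usage: a numeric predictor plus a multi-level factor coded as dummies.
# X = pd.get_dummies(df[["income", "region"]], columns=["region"], drop_first=True, dtype=float)
# print(gvif(X, [c for c in X.columns if c.startswith("region_")]))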

Friday, June 14, 2024

Change Point Detection in Time Series

 

Change Point Detection Methods

Kernel Change Point Detection:

The kernel change point detection method detects changes in the distribution of the data, not just changes in the mean or variance.

A kernel is used to map the data into a high-dimensional feature space, where changes are easier to detect. This approach uses the Maximum Mean Discrepancy (MMD) to measure the difference between the distributions of segments of the time series.

Steps:

1- Data and kernel function: consider a univariate time series {x1, x2, …, xn}. We start by choosing a kernel function k(x, y) to measure the similarity between points.

2- Construction of the kernel matrix: the kernel matrix K is constructed, where each element Kij = k(xi, xj).

For the linear kernel this is Kij = xi·xj, i.e. K = x xᵀ for the vector of observations x.

 

3- Maximum Mean Discrepancy (MMD):

 

MMD measures how different two groups of data are by comparing the average pairwise similarity within each group with the average similarity between the groups; in other words, it compares two distributions to see whether they differ.

MMD is used to measure the difference between the distributions before and after a candidate change point t.

For each candidate change point t, split the series into a first segment A = {x1, …, xt} and a second segment B = {xt+1, …, xn} and compute

MMD²(t) = (1/t²) Σ_{i,j∈A} Kij + (1/(n−t)²) Σ_{i,j∈B} Kij − (2/(t(n−t))) Σ_{i∈A, j∈B} Kij

In the above equation:

-          The first term measures the similarity within the first segment.

-          The second term measures the similarity within the second segment.

-          The third term measures the similarity between the two segments.

 

4- To detect the change point, the MMD values are computed for all candidate change points t, and the one that maximizes the MMD is chosen:

t* = argmax over t of MMD²(t)
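
Below is a minimal Python sketch of steps 1-4 using the linear kernel (which mainly picks up mean shifts; a Gaussian/RBF kernel would detect more general distribution changes). The function name and the simulated series are made up for illustration; in practice a library such as ruptures provides full kernel change point detection.

import numpy as np

def linear_mmd_change_point(x):
    """Single change point via MMD with a linear kernel (minimal sketch).

    x : 1-D array of observations x_1, ..., x_n
    Returns the split index t (segments x[:t] and x[t:]) maximizing MMD^2.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    K = np.outer(x, x)                        # linear kernel matrix, K_ij = x_i * x_j
    best_t, best_mmd = None, -np.inf
    for t in range(2, n - 1):                 # keep at least two points per segment
        A, B = slice(0, t), slice(t, n)
        within_a = K[A, A].mean()             # average similarity within the first segment
        within_b = K[B, B].mean()             # average similarity within the second segment
        between = K[A, B].mean()              # average similarity between the segments
        mmd2 = within_a + within_b - 2 * between
        if mmd2 > best_mmd:
            best_t, best_mmd = t, mmd2
    return best_t, best_mmd

# Simulated example: a mean shift at index 60.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 40)])
print(linear_mmd_change_point(series))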


Excel Example :https://docs.google.com/spreadsheets/d/1IdC-ss1VjaL2QVQdABNwuIPfRphDtlZi/edit?usp=sharing&ouid=115594792889982302405&rtpof=true&sd=true

Saturday, November 4, 2023

PCA: Eigenvalues & Eigenvectors

 

Excel (macro-enabled workbook) and VBA code:

https://drive.google.com/drive/folders/18tIKLLg8MfJ2MYDjLPAWHspEbApVIpGz?usp=sharing


PCA: Eigenvalues & Eigenvectors:



Eigenvalues & Eigenvectors

Eigenvalue: a scalar associated with a given linear transformation of a vector space, having the property that there is some nonzero vector which, when multiplied by the scalar, equals the vector obtained by applying the transformation to that vector; in particular, a root of the characteristic equation of a matrix.

An eigenvector (or characteristic vector) of a linear transformation is a non-zero vector whose direction does not change when that linear transformation is applied to it.

Let A be an n×n matrix. The number x is an eigenvalue of A if there exists a non-zero vector v such that

Av = xv

In this case, the vector v is called an eigenvector of A corresponding to the eigenvalue x.

Rewrite the condition Av = xv as

(A − xI)v = 0                  (E.1)

where I is the n×n identity matrix.

For a non-zero vector v to satisfy this equation, A − xI must not be invertible; if it were invertible, then v = 0.

The characteristic polynomial is

p(x) = det(A − xI)                                                (E.2)

The roots of p(x) = 0 give us the eigenvalues x.

Substituting each x back into E.1 and solving (A − xI)v = 0 gives the corresponding eigenvectors.

-        Eigenvalues represent the variance of the data along the eigenvector directions, whereas the variance components of the covariance matrix represent the spread along the axes.

-        All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular, i.e. at right angles to each other.

-        The eigenvector of the covariance matrix with the highest eigenvalue is the first principal component of the data set.

 

-        The largest eigenvector of the covariance matrix always points into the direction of the largest variance of the data, and the magnitude of this vector equals the corresponding eigenvalue.

-        The second largest eigenvector is always orthogonal to the largest eigenvector, and points into the direction of the second largest spread of the data.

-        If the covariance matrix of our data is a diagonal matrix, such that the covariances are zero, then the variances must be equal to the eigenvalues.
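
A short Python sketch of PCA via the eigen-decomposition of the covariance matrix; the data below are random numbers generated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=500)

Xc = X - X.mean(axis=0)                   # center the data
cov = np.cov(Xc, rowvar=False)            # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric matrix -> real eigenvalues, orthogonal eigenvectors

order = np.argsort(eigvals)[::-1]         # sort by descending eigenvalue (variance explained)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                     # project the data onto the principal components
print("eigenvalues (variance along each PC):", eigvals)
print("variance of the projected scores:   ", scores.var(axis=0, ddof=1))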

Thursday, September 14, 2023

Ratio Outlier (Medcouple)

Refer to the following link for details:

https://drive.google.com/drive/folders/19IEg3V008AkBwT5UGrsXCL8N6It7BGc_?usp=sharing



Saturday, August 13, 2022

Bias & Variance Trade-off:

The error of a model can be decomposed into noise, bias, and variance.


There is a tradeoff between a model's ability to minimize bias and variance.

Overfitting: high variance; the model performs well on the training set but not on the test set. Low training error does not imply good expected performance.
Underfitting: high bias; the model performs well on neither the training data nor the test data.
Noise: the model is neither overfitting nor underfitting, and the high MSE is simply due to the amount of noise in the dataset.

Error due to Bias: Actual Value – average (Predicted Value).
A high bias model characteristic:
1.   High training error.
2.   Validation error is similar in magnitude to the training error.
Error due to Variance:
The variability of a model's prediction for a given data point. Imagine repeating the entire model-building process multiple times: the variance is how much the predictions for a given point vary between different realizations of the model.
A high variance model characteristic:
1.   Low training error
2.   Very high Validation error
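
A small Python simulation of this decomposition; the sine target, noise level, sample sizes, and polynomial degrees below are made-up choices for illustration. Many polynomials are fit on resampled training sets and the test error is split into bias² and variance (the noise term is σ²).

import numpy as np

rng = np.random.default_rng(1)
noise_sd = 0.3
x_test = np.linspace(0, 1, 50)

def true_f(x):
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_repeats=200, n_train=30):
    preds = []
    for _ in range(n_repeats):                       # repeat the entire model-building process
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        coef = np.polyfit(x, y, degree)
        preds.append(np.polyval(coef, x_test))
    preds = np.array(preds)
    bias2 = ((preds.mean(axis=0) - true_f(x_test)) ** 2).mean()
    var = preds.var(axis=0).mean()
    return bias2, var

for deg in (1, 3, 10):                               # underfit, reasonable fit, overfit
    b2, v = bias_variance(deg)
    print(f"degree {deg:2d}: bias^2={b2:.3f}  variance={v:.3f}  noise={noise_sd**2:.3f}")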

Sunday, July 3, 2022

Bayesian Inference


Update probability with new information (data).

Bayesian inference combines two distributions (the likelihood and the prior) into the posterior.

The posterior is used to find the “best” parameters in terms of maximizing the posterior probability.

Steps:

i. Prior: choose a PDF to model the prior distribution P(θ).

ii. Likelihood: choose a PDF for P(X|θ), i.e. how the data X would look given the parameter θ.

iii. Posterior: calculate the posterior distribution P(θ|X) and pick the θ that has the highest P(θ|X).

 

Calculate P(θ) & P(X|θ) for a specific θ and multiply them together. Pick the highest P(θ) * P(X|θ) among different θ’s.

The posterior becomes the new prior. Repeat step iii as you get more data.
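
A minimal Python sketch of these steps for a coin-flip probability θ over a grid of candidate values; the Beta(2, 2) prior and the 7-heads-in-10-flips data are made-up inputs for illustration.

import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)             # candidate parameter values
prior = stats.beta(2, 2).pdf(theta)                # step i:  prior P(theta)
heads, flips = 7, 10
likelihood = stats.binom.pmf(heads, flips, theta)  # step ii: likelihood P(X | theta)
posterior = prior * likelihood                     # step iii (unnormalized)
posterior /= posterior.sum()                       # normalize over the grid
theta_map = theta[np.argmax(posterior)]            # the theta with the highest P(theta | X)
print(f"MAP estimate of theta: {theta_map:.3f}")
# The posterior can now serve as the prior when more flips arrive.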



Wednesday, June 15, 2022

Beta Distribution

Beta distribution: treats the probability of success on any single trial as the random variable, and the number of trials n and the total number of successes in those n trials as constants.

For the Binomial Distribution the number of successes X is a random variable and the number of trials N and the probability of success p on any single trial are parameters (i.e. constants).
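
A short Python illustration of the contrast, using scipy and made-up numbers (10 trials, 7 successes):

from scipy import stats

# Binomial view: p is fixed, the number of successes X is random.
print(stats.binom.pmf(7, 10, 0.6))        # P(X = 7 | n = 10, p = 0.6)

# Beta view: the counts are fixed, the success probability p is random.
# With k successes and n - k failures, p ~ Beta(k + 1, n - k + 1) under a flat prior.
print(stats.beta(8, 4).pdf(0.6))          # density of p at 0.6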



Friday, March 4, 2022

Identifying Outliers in Ratios for Skewed Distributions

https://docs.google.com/spreadsheets/d/1gBjuMYN_pRLu2MfNU0WW9zzsq5EMIN6d/edit?usp=sharing&ouid=115594792889982302405&rtpof=true&sd=true




 

 


Sunday, December 13, 2020

Functions, Processes & Transforms

Characteristic function: the inverse Fourier transform of the characteristic function is the PDF.


Moment-generating function: summarizes measures of central tendency and dispersion. If the points represent probability density, then the 0th moment is the total probability (i.e. one), the 1st is the mean µ, the 2nd central moment is the variance σ², and the 3rd and 4th (standardized) moments correspond to skewness and kurtosis.


Laplace transform: converts integral and differential equations into algebraic equations, moving from the time domain to the frequency (s) domain. It expresses a function as a superposition of moments.
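
As a small illustration of the moment-generating function, here is a Python sketch (assuming sympy is available) that recovers the mean and variance of a Normal(µ, σ²) from its MGF M(t) = exp(µt + σ²t²/2) by differentiating at t = 0.

import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)       # MGF of Normal(mu, sigma^2)

first_moment = sp.diff(M, t, 1).subs(t, 0)     # E[X]   = mu
second_moment = sp.diff(M, t, 2).subs(t, 0)    # E[X^2] = mu^2 + sigma^2
variance = sp.simplify(second_moment - first_moment**2)
print(first_moment, variance)                  # prints: mu sigma**2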







