Principal Components Analysis
PCA is a way of identifying patterns in
data, and expressing the data in such a way as to highlight their similarities
and differences.
Step-1- Collect the data and standardize the variables.
Step-2- Calculate the correlation matrix of the standardized variables.
Step-3- Calculate the eigenvectors and eigenvalues of this matrix (for standardized variables the correlation and covariance matrices coincide). These provide us with information about the patterns in the data. Check that each eigenvector has unit length, and normalise it if it does not.
Step-4- Selecting variables-
Order the eigenvalues from highest to lowest.
The eigenvector with the highest eigenvalue is the principal component of the data set.
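A minimal sketch of Steps 1 to 4 in NumPy; the toy data and the names X_raw, corr, eig_vals and eig_vecs are illustrative, not from the original post:

import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 5))          # toy data: 100 observations, 5 variables

# Step-1/Step-2: standardize each variable and form the correlation matrix.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
corr = np.corrcoef(X_raw, rowvar=False)    # 5x5 correlation matrix

# Step-3: eigenvalues and eigenvectors. eigh suits symmetric matrices and
# returns eigenvectors that already have unit length.
eig_vals, eig_vecs = np.linalg.eigh(corr)
assert np.allclose(np.linalg.norm(eig_vecs, axis=0), 1.0)   # unit-length check

# Step-4: order eigenvalues (and matching eigenvectors) from highest to lowest.
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
print(eig_vals)   # the first entry belongs to the principal component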
Step-5- Compute the new variables (component scores) as Z = X*C, where X is the matrix of the shortlisted variables and C is the matrix of the corresponding eigenvectors.
Step-6- Regress the dependent variable Y on the new variable Z.
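Continuing the same sketch, Steps 5 and 6 might look as follows; y is a made-up dependent variable and k = 1 assumes that only the first component is retained:

k = 1                                   # number of components retained
C = eig_vecs[:, :k]                     # shortlisted eigenvectors (loadings)
Z = X @ C                               # Step-5: component scores, Z = X*C

y = rng.normal(size=100)                # placeholder dependent variable Y

# Step-6: regress Y on the retained component(s) by ordinary least squares.
design = np.column_stack([np.ones(len(Z)), Z])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)                             # intercept and coefficient on Z1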
Example:
Consider a set of 5 variables with the following eigenvalues of their correlation matrix:
[Table of eigenvalues]
We can see that the first component explains 95.57% of the variation.
Eigenvector-
The principal components of the explanatory variables are:
[Table of eigenvectors]
Since the first component accounts for 95.57% of the variance, Y is regressed on the single independent variable Z1 to obtain the coefficients.
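For reference, a percentage such as 95.57% is simply an eigenvalue's share of the sum of all eigenvalues. A small sketch with made-up eigenvalues (the actual values from the example above are not reproduced here):

import numpy as np

eig_vals = np.array([4.78, 0.12, 0.05, 0.03, 0.02])   # hypothetical eigenvalues
explained = eig_vals / eig_vals.sum() * 100            # percentage of variation
print(explained)   # the first entry is the first component's share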
Eigenvalue- a scalar associated with a given linear transformation of a vector space, having the property that there is some nonzero vector which, when multiplied by the scalar, equals the vector obtained by letting the transformation operate on that vector; especially, a root of the characteristic equation of a matrix.
An eigenvector (or characteristic vector) of a linear transformation is a non-zero vector whose direction does not change when that linear transformation is applied to it.
Let A be an n×n matrix. The number λ is an eigenvalue of A if there exists a non-zero vector v such that
Av = λv
In this case, the vector v is called an eigenvector of A corresponding to the eigenvalue λ.
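The definition can be checked numerically; the small symmetric matrix A below is an illustrative choice, not one taken from the post:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eig_vals, eig_vecs = np.linalg.eigh(A)     # eigenvalues 1 and 3

for lam, v in zip(eig_vals, eig_vecs.T):   # eigenvectors are the columns
    assert np.allclose(A @ v, lam * v)     # Av equals λv for each pair
print(eig_vals)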
Rewrite the condition Av = λv as
(A − λI)*v = 0 (E.1)
where I is the n×n identity matrix.
For a non-zero vector v to satisfy this equation, A − λI must not be invertible; if it were invertible, the only solution would be v = 0.
The characteristic polynomial:
p(λ) = det(A − λI) (E.2)
The roots of p(λ) = 0 give us the eigenvalues λ. Substituting each λ into E.1 gives us the corresponding eigenvectors.
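As a quick illustration (the matrix is chosen for the example, not taken from the post): let A = [[2, 1], [1, 2]]. Then
p(λ) = det(A − λI) = (2 − λ)^2 − 1 = λ^2 − 4λ + 3 = (λ − 1)(λ − 3)
so the eigenvalues are λ = 1 and λ = 3. Substituting λ = 3 into E.1 gives (A − 3I)*v = 0, whose non-zero solutions are the multiples of v = (1, 1); substituting λ = 1 gives the multiples of v = (1, −1).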
- Eigenvalues represent the variance of the data along the eigenvector directions, whereas the variance terms (diagonal entries) of the covariance matrix represent the spread along the original axes.
- All the eigenvectors of a symmetric matrix (such as a covariance matrix) that correspond to distinct eigenvalues are perpendicular, i.e. at right angles to each other.
- The eigenvector of the covariance matrix with the highest eigenvalue is the principal component of the data set.
Properties-
- The largest eigenvector of the covariance matrix always points in the direction of the largest variance of the data, and the corresponding eigenvalue equals the variance along that direction.
- The second largest eigenvector is always orthogonal to the largest eigenvector, and points in the direction of the second largest spread of the data.
- If the covariance matrix of our data is a diagonal matrix, so that the covariances are zero, then the variances must be equal to the eigenvalues.
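These properties are easy to verify numerically. A quick sketch with synthetic correlated data; all names and values here are illustrative:

import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data, spread mostly along one oblique direction.
data = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

cov = np.cov(data, rowvar=False)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# The eigenvectors are orthogonal to each other...
assert np.allclose(eig_vecs.T @ eig_vecs, np.eye(2))

# ...and each eigenvalue equals the variance of the data projected
# onto its eigenvector, so the largest eigenvalue marks the largest spread.
for lam, v in zip(eig_vals, eig_vecs.T):
    proj = data @ v
    print(lam, proj.var(ddof=1))   # the two numbers in each row agree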