Monday, August 28, 2017

Principal Component Analysis





1-  Principal Components Analysis

PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.
Step-1- Collect the data and subtract each variable's mean from its values.
Step-2- Calculate the covariance matrix (or the correlation matrix, if the variables are on very different scales).
Step-3- Calculate the eigenvectors and eigenvalues of the covariance matrix.
These provide us with information about the patterns in the data.
Check that each eigenvector has unit length.

Step-4- Select components:
Order the eigenvalues from highest to lowest.
The eigenvector with the highest eigenvalue is the first principal component of the data set.
Step-5- Derive the new variables: Z = X C, where X is the matrix of shortlisted (mean-centered) variables and C is the matrix whose columns are the selected eigenvectors.
Step-6- Regress the dependent variable Y on the new variables Z.
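
A minimal NumPy sketch of Steps 1-5, assuming the data arrive as an (observations x variables) array; the function and variable names below are illustrative, not from any particular library:

import numpy as np

def pca_components(X):
    # Step 1: subtract each variable's mean (X is observations x variables)
    Xc = X - X.mean(axis=0)
    # Steps 2-3: covariance matrix and its eigendecomposition
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh, since covariance is symmetric
    # eigh returns unit-length eigenvectors; verify as a sanity check
    assert np.allclose(np.linalg.norm(eigvecs, axis=0), 1.0)
    # Step 4: order eigenvalues (and their eigenvectors) from highest to lowest
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], Xc

def pca_scores(Xc, eigvecs, k):
    # Step 5: Z = X C, projecting centered data onto the top k eigenvectors
    return Xc @ eigvecs[:, :k]

Step 6 is then an ordinary least-squares regression of Y on the columns of Z.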

Example:
Consider a set of 5 variables and the eigenvalues of their correlation matrix. The first component explains 95.57% of the variation.

[Table: eigenvalues of the correlation matrix, and the corresponding eigenvectors giving the principal components of the explanatory variables]

Since the first component alone accounts for 95.57% of the variance, only the first principal component Z1 is retained.

Y is regressed on the independent variable Z1 to obtain the coefficients.
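
To make the workflow concrete, here is a sketch on fabricated data; the post's actual 5-variable table is not reproduced, so the correlated variables below are invented purely for illustration:

import numpy as np

# Fabricated data: 5 strongly correlated variables driven by one latent factor,
# chosen so that the first component dominates, as in the post's 95.57% case.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = base + 0.1 * rng.normal(size=(200, 5))
y = 2.0 * base.ravel() + rng.normal(scale=0.1, size=200)

# Eigenvalues of the correlation matrix, ordered from highest to lowest
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals / eigvals.sum())  # share of variance explained by each component

# Retain only the first component and regress Y on Z1
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize, matching correlation PCA
Z1 = Xs @ eigvecs[:, :1]
A = np.column_stack([np.ones(len(Z1)), Z1])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("intercept and slope on Z1:", coef)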




Eigenvalue- a scalar associated with a given linear transformation of a vector space, having the property that there is some nonzero vector which, when multiplied by the scalar, equals the vector obtained by letting the transformation operate on it; especially, a root of the characteristic equation of a matrix.

An eigenvector (or characteristic vector) of a linear transformation is a non-zero vector whose direction does not change when that linear transformation is applied to it.
Let A be an n×n matrix. The number λ is an eigenvalue of A if there exists a non-zero vector v such that
Av = λv
In this case, vector v is called an eigenvector of A corresponding to the eigenvalue λ.
Rewrite the condition Av = λv as
                                                                 (A − λI)v = 0                   (E.1)
where I is the n×n identity matrix.
For a non-zero vector v to satisfy this equation, A − λI must not be invertible; if it were invertible, then v = 0.
The characteristic polynomial is
                       p(λ) = det(A − λI)                                                 (E.2)
The roots of the equation p(λ) = 0 give us the eigenvalues λ.
Substituting each λ back into E.1 gives us the corresponding eigenvectors.
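
As a quick numerical check of these definitions (the 2×2 matrix below is chosen purely for illustration):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Eigenvalues are the roots of p(lambda) = det(A - lambda*I); here 5 and 2
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)

# Verify Av = lambda*v, i.e. (A - lambda*I)v = 0 (E.1), for each pair
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)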
-         Eigenvalues represent the variance of the data along the eigenvector directions, whereas the diagonal entries of the covariance matrix represent the spread along the original axes.
-         All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular, i.e. at right angles to each other.
-         The eigenvector of the covariance matrix with the highest eigenvalue is the first principal component of the data set.

Properties- (checked numerically in the sketch below)
-         The largest eigenvector of the covariance matrix always points in the direction of the largest variance of the data, and the corresponding eigenvalue equals the variance along that direction.
-         The second largest eigenvector is always orthogonal to the largest eigenvector, and points in the direction of the second largest spread of the data.
-         If the covariance matrix of our data is diagonal, so that all covariances are zero, then the variances on the diagonal are themselves the eigenvalues.
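
A small verification of these properties on synthetic 2-D data; the transformation matrix below is arbitrary, chosen only to produce a stretched, correlated point cloud:

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0],
                                              [1.5, 0.5]])

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Eigenvectors of a symmetric matrix are mutually orthogonal
assert np.allclose(eigvecs.T @ eigvecs, np.eye(2), atol=1e-10)

# The variance of the data along the largest eigenvector equals its eigenvalue
v_max = eigvecs[:, -1]
assert np.isclose(np.var(data @ v_max, ddof=1), eigvals[-1])

# For a diagonal covariance matrix, the diagonal variances are the eigenvalues
D = np.diag([2.0, 5.0])
assert np.allclose(np.sort(np.linalg.eigvalsh(D)), [2.0, 5.0])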
 
