1.1 Introduction
II. The original data set is reduced to one consisting of N data vectors on c
principal components (reduced dimensions)
III. Each data vector is a linear combination of the c principal component vectors
IV. Works for numeric data only
Figure 1: (a) Data with 2 variables, X1 and X2. (b) Same data with 2 new variables, Y1 and Y2.
From the k original variables x1, x2, ..., xk (as shown in Figure 1 (a) and (b)), produce k
new variables y1, y2, ..., yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
• the yk's (the principal components) are uncorrelated (orthogonal)
• y1 explains as much of the original variance in the data set as possible
• y2 explains as much of the remaining variance as possible
• and so on
and:
• {a11, a12, ..., a1k} is the 1st eigenvector of the correlation/covariance matrix and gives the
coefficients of the 1st principal component
• {a21, a22, ..., a2k} is the 2nd eigenvector of the correlation/covariance matrix and gives the
coefficients of the 2nd principal component
• ...
• {ak1, ak2, ..., akk} is the kth eigenvector of the correlation/covariance matrix and gives the
coefficients of the kth principal component
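As a concrete illustration (not part of the original notes), the short NumPy sketch below builds the coefficients a_ij as eigenvectors of the covariance matrix of some toy two-variable data and forms the new variables y as linear combinations of the x's. The data and variable names are assumptions made purely for illustration; the near-zero off-diagonal entries of the covariance of Y confirm that the new variables are uncorrelated.

```python
import numpy as np

# Toy data: n observations of k = 2 original variables x1, x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + rng.normal(scale=0.3, size=100)
X = np.column_stack([x1, x2])            # shape (n, k)

# Eigenvectors of the covariance matrix give the coefficients a_ij
cov = np.cov(X, rowvar=False)            # k x k covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are eigenvectors

# Sort by eigenvalue, highest first, so y1 explains the most variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each new variable y_i is a linear combination of the original x's:
# y_i = ai1*x1 + ai2*x2, with the i-th eigenvector as the coefficients
X_centered = X - X.mean(axis=0)
Y = X_centered @ eigvecs                 # columns are y1, y2

print(np.round(np.cov(Y, rowvar=False), 3))  # off-diagonals ~0: y's are uncorrelated
```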
1.5 Fundamentals
To get to PCA, we’re going to quickly define some basic statistical ideas – mean,
standard deviation, variance and covariance – so we can weave them together later.
Their equations are closely related.
Mean is simply the average value of all x’s in the set X, which is found by dividing
the sum of all data points by the number of data points, n.
Standard deviation is simply the square root of the average squared distance of the
data points from the mean. In its formula, the numerator contains the sum of the
squared differences between each data point and the mean, and the denominator is
the number of data points (minus one); the square root of this ratio is the standard
deviation.
Covariance (cov(X,Y)) is the joint variability between two random variables X and Y,
and covariance is always measured between two dimensions. If you calculate
the covariance between one dimension and itself, you get the variance.
For both variance and standard deviation, squaring the differences between data
points and the mean makes them positive, so that values above and below the mean
don’t cancel each other out.
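To make these definitions concrete, here is a small Python sketch (the numbers are toy values chosen only for illustration) that computes the mean, variance, standard deviation, and covariance by hand and checks them against NumPy's built-in functions.

```python
import numpy as np

# Toy measurements of two variables X and Y (illustrative values only)
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
n = len(x)

mean_x = x.sum() / n                              # mean: sum of points / number of points
var_x = ((x - mean_x) ** 2).sum() / (n - 1)       # variance: average squared distance to the mean
std_x = np.sqrt(var_x)                            # standard deviation: square root of the variance

mean_y = y.sum() / n
cov_xy = ((x - mean_x) * (y - mean_y)).sum() / (n - 1)   # covariance between X and Y

# Covariance of a variable with itself equals its variance
print(np.isclose(cov_xy, np.cov(x, y)[0, 1]))
print(np.isclose(var_x, np.cov(x, y)[0, 0]))
```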
Input to various regression techniques can be in the form of a correlation or covariance
matrix. A covariance matrix is a matrix whose element in the (i, j) position is
the covariance between the ith and jth elements of a random vector. A random
vector is a random variable with multiple dimensions.
A correlation matrix is a table showing the correlation coefficients between sets of
variables.
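As a rough illustration (assuming a small random data matrix, not part of the original notes), both matrices can be computed with NumPy as follows; element (i, j) of each is the covariance, respectively the correlation coefficient, between variables i and j.

```python
import numpy as np

# Toy data matrix: rows are observations, columns are variables (dimensions)
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))

cov_matrix = np.cov(data, rowvar=False)        # (i, j) = covariance of variables i and j
corr_matrix = np.corrcoef(data, rowvar=False)  # (i, j) = correlation coefficient of variables i and j

print(np.round(cov_matrix, 3))
print(np.round(corr_matrix, 3))   # diagonal is 1.0; entries lie in [-1, 1]
```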
1. Consider a data set
2. Subtract the mean from each of the data dimensions
3. Calculate the covariance matrix
4. Calculate the eigenvalues and eigenvectors of the covariance matrix
5. Reduce dimensionality and form a feature vector
   1. Order the eigenvectors by eigenvalue, highest to lowest. This gives
      the components in order of significance; the components of lesser
      significance can be ignored.
   2. Feature Vector = (eig1 eig2 eig3 … eign)
6. Derive the new data:
   FinalData = RowFeatureVector x RowZeroMeanData
   1. RowFeatureVector is the matrix with the eigenvectors in the rows, with
      the most significant eigenvector at the top
   2. RowZeroMeanData is the mean-adjusted data transposed, i.e. the data
      items are in each column, with each row holding a separate dimension
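A minimal sketch of these steps, assuming a toy two-dimensional data set and a hypothetical helper name pca_steps (neither is from the original notes), might look like this in Python:

```python
import numpy as np

def pca_steps(data, num_components):
    """Follow the steps above: center, covariance, eigen-decomposition,
    keep the top eigenvectors, then derive the new data."""
    # Step 2: subtract the mean from each dimension
    row_zero_mean_data = (data - data.mean(axis=0)).T      # dimensions in rows, items in columns

    # Step 3: covariance matrix of the dimensions
    cov = np.cov(row_zero_mean_data)

    # Step 4: eigenvalues and eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 5: order eigenvectors by eigenvalue (highest first) and keep the top ones
    order = np.argsort(eigvals)[::-1][:num_components]
    row_feature_vector = eigvecs[:, order].T                # eigenvectors in rows, most significant on top

    # Step 6: FinalData = RowFeatureVector x RowZeroMeanData
    final_data = row_feature_vector @ row_zero_mean_data
    return final_data, row_feature_vector

# Usage on toy data with 2 dimensions, keeping a single component
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=100)
final_data, _ = pca_steps(data, num_components=1)
print(final_data.shape)   # (1, 100): one row per kept component
```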
So, since the non-diagonal elements in this covariance matrix are positive, we
should expect that the x and y variables increase together.
Given our example set of data, and the fact that we have 2 eigenvectors, we have
two choices. We can either form a feature vector with both of the eigenvectors, or we
can leave out the smaller, less significant component and keep only a single column.
In the case of keeping both eigenvectors for the transformation, we get the data and
the plot found in Figure 3. This plot is basically the original data, rotated so that the
eigenvectors are the axes. This is understandable since we have lost no information
in this decomposition.
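A minimal check of this claim (on assumed toy data, not the example from the notes) is sketched below: keeping both eigenvectors is just a rotation, so transforming the data and then rotating back recovers the original data exactly.

```python
import numpy as np

# Toy two-dimensional data, assumed only for illustration
rng = np.random.default_rng(0)
data = rng.multivariate_normal([3, 3], [[1.0, 0.8], [0.8, 1.0]], size=100)

mean = data.mean(axis=0)
row_zero_mean_data = (data - mean).T                           # dimensions in rows
eigvals, eigvecs = np.linalg.eigh(np.cov(row_zero_mean_data))
row_feature_vector = eigvecs[:, np.argsort(eigvals)[::-1]].T   # keep both eigenvectors

final_data = row_feature_vector @ row_zero_mean_data           # data rotated onto the eigenvector axes
recovered = (row_feature_vector.T @ final_data).T + mean       # rotate back and add the mean

print(np.allclose(recovered, data))                            # True: no information was lost
```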
Basically, we have transformed our data so that it is expressed in terms of the patterns
between them, where the patterns are the lines that most closely describe the
relationships between the data. This is helpful because we have now classified each
data point as a combination of the contributions from each of those lines.
• For example, the first principal component can be computed using the
elements of the first eigenvector, as in the sketch below.
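A hedged illustration of that computation (toy data and names assumed, not taken from the notes): the score of each data point on the first principal component is a weighted sum of its mean-centered coordinates, with the elements of the first eigenvector as the weights.

```python
import numpy as np

# Toy two-dimensional data, assumed only for illustration
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=100)
centered = data - data.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
first_eigvec = eigvecs[:, np.argmax(eigvals)]   # elements a11, a12 of the first eigenvector

pc1 = centered @ first_eigvec                   # y1 = a11*x1 + a12*x2 for every data point
print(pc1[:5])
```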
Summary
• One benefit of PCA is that we can examine the variances associated with the
principal components.