
Chapter 3 Normalized Principal Components Analysis

This document discusses normalized principal components analysis (PCA). It describes how PCA can be used to: 1) Reduce the dimensionality of data by finding the best visualization planes using orthogonal projections. 2) Group together homogeneous individuals and identify outliers. 3) Analyze relationships between variables. The document outlines the key steps of PCA, including data standardization, identifying principal axes and components, analyzing variable and individual scatterplots, assessing inertia to determine the optimal number of principal axes, and interpreting the results.

Uploaded by sarra
Copyright © All Rights Reserved

Chapter 3: Normalized Principal Components Analysis

1. Initial data
The data form a table with n rows and p columns.
Columns are quantitative variables (turnover, rate, weight, ...).
Rows represent statistical individuals (basic units such as human beings, countries, years, ...).

2. Example of data
3. Objectives from a technical point of view
• Reduce the dimensionality of the data by finding the planes that give the best visualization, using orthogonal projections of the data.
• Group together homogeneous individuals and identify exceptional individuals (outliers).
• Analyze the relationships between the variables.

Case Studies
Study how consumers perceive a brand.
Study the evolution of the financial situation of a company over time.
Compare several brands on the market.

Data standardization: centering and reducing the data (each variable is given mean 0 and variance 1)
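A minimal sketch of this step, assuming uniform weights $1/n$ for the individuals, so the population standard deviation (`ddof=0`) is used; some texts divide by $n-1$ instead. Each entry becomes $z_i^j = (x_i^j - \bar{x}^j)/s_j$. The function name and the toy table are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column (variable) and divide it by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)   # column means
    std = X.std(axis=0)     # ddof=0: population std, consistent with weights 1/n
    return (X - mean) / std

# Hypothetical table: 5 individuals (rows) x 3 quantitative variables (columns)
X = np.array([[12.0, 3.1, 70.0],
              [15.0, 2.8, 82.0],
              [ 9.0, 3.5, 65.0],
              [20.0, 2.2, 90.0],
              [14.0, 3.0, 75.0]])
Z = standardize(X)
print(Z.mean(axis=0))  # approximately 0 for every column
print(Z.std(axis=0))   # 1 for every column
```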

4. Individuals scatter plot analysis


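Identification of the principal axes
Principal axes:
• The principal axes $\Delta_1, \Delta_2, \dots, \Delta_k$ are identified by computing the eigenvalues and eigenvectors of the correlation matrix $R = Z^{\top} D Z$, where $Z$ is the standardized data table and $D = \operatorname{diag}(p_1, \dots, p_n)$ is the weight matrix of the individuals (usually $p_i = 1/n$).
• Next, the eigenvalues are sorted in decreasing order, $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p$. We denote by $U$ the $(p \times p)$ matrix whose columns are the corresponding eigenvectors $u_j$.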
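The two steps above can be sketched with NumPy, assuming uniform weights $D = \frac{1}{n} I_n$ and hypothetical random data. `np.linalg.eigh` is used because $R$ is symmetric; it returns eigenvalues in ascending order, so they are reordered. The helper name `principal_axes` is illustrative.

```python
import numpy as np

def principal_axes(Z, weights=None):
    """Return eigenvalues (decreasing), eigenvector matrix U and R = Z^T D Z."""
    n = Z.shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    R = Z.T @ np.diag(w) @ Z              # correlation matrix if Z is standardized with these weights
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]     # sort in decreasing order
    return eigvals[order], eigvecs[:, order], R

# Hypothetical data: 50 individuals, 4 variables, two of them correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]
Z = (X - X.mean(axis=0)) / X.std(axis=0)

lam, U, R = principal_axes(Z)
print(lam)        # lambda_1 >= ... >= lambda_4
print(lam.sum())  # equals p = 4 (the trace of R)
```

The columns of `U` are orthonormal, so they define an orthogonal change of basis.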

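Principal components:
• The coordinates of a given individual $i$ on the new axes formed by the eigenvectors are given by the scalar products
$$C_i^{\alpha} = \langle z_i, u_{\alpha} \rangle = \sum_{j=1}^{p} z_i^{j}\, u_{\alpha}^{j}, \qquad \text{i.e. } C = ZU.$$
• Any pair of columns of the matrix $U$ defines a factor map.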
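A short sketch of the scalar products $C = ZU$, assuming uniform weights $1/n$ and hypothetical data. Because $U$ is orthogonal, the change of basis preserves each individual's distance to the origin.

```python
import numpy as np

# Hypothetical data: 30 individuals, 3 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] += X[:, 0]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Principal axes with uniform weights 1/n, eigenvalues in decreasing order
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]

# C[i, a] = <z_i, u_a>: coordinate of individual i on axis a
C = Z @ U

# The first factor map plots column 0 of C against column 1
plane_12 = C[:, :2]
print(plane_12[:5])
```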
Factor maps and projection quality
Absolute contribution: the absolute contribution of a given point $i$ to the inertia projected on the axis $\alpha$ is
$$ACTR(i, \alpha) = \frac{p_i \,(C_i^{\alpha})^2}{\lambda_{\alpha}}$$
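A sketch of the absolute contributions, assuming uniform weights $p_i = 1/n$ and hypothetical data. Since $\sum_i p_i (C_i^{\alpha})^2 = \lambda_{\alpha}$, the contributions to each axis add up to 1, which makes it easy to spot the individuals that drive an axis. The function name is illustrative.

```python
import numpy as np

def absolute_contributions(C, lam, weights):
    """ACTR(i, a) = p_i * (C_i^a)^2 / lambda_a, returned as an (n, p) array."""
    return weights[:, None] * C**2 / lam[None, :]

# Hypothetical data and PCA with uniform weights p_i = 1/n
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
X[:, 1] += X[:, 0]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
C = Z @ U

actr = absolute_contributions(C, lam, np.full(n, 1.0 / n))
print(actr.sum(axis=0))       # for each axis, the contributions add up to 1
print(np.argmax(actr[:, 0]))  # individual contributing most to axis 1
```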

Relative contribution or squared cosine:
The relative contribution of a given point $i$ to the axis $\alpha$ is
$$RCTR(i, \alpha) = \cos^2(z_i, \hat{z}_i^{\alpha}) = \frac{(C_i^{\alpha})^2}{\lVert z_i \rVert^2}$$
where $\hat{z}_i^{\alpha}$ is the orthogonal projection of $z_i$ onto the axis $\alpha$.
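A sketch of the squared cosines with hypothetical data and uniform weights. Because $U$ is orthogonal, $\lVert z_i \rVert^2 = \sum_{\alpha} (C_i^{\alpha})^2$, so the squared cosines of an individual add up to 1 over all $p$ axes. Summing them over the first two axes, by analogy with the communality of variables, gives the quality of representation of each individual on the first factor map. The function name is illustrative.

```python
import numpy as np

def squared_cosines(C, Z):
    """RCTR(i, a) = (C_i^a)^2 / ||z_i||^2 for every individual i and axis a."""
    return C**2 / np.sum(Z**2, axis=1, keepdims=True)

# Hypothetical data and PCA with uniform weights 1/n
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
X[:, 2] -= 0.5 * X[:, 1]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
C = Z @ U

cos2 = squared_cosines(C, Z)
quality_12 = cos2[:, 0] + cos2[:, 1]   # quality of each individual on the first factor map
print(quality_12.round(3))
```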

Remarks
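$C^1$ is the variable that best describes the dispersion of the data.
The best planar visualization of the data is given by the factor map formed by the first two axes, $C^1$ and $C^2$.
The variables $C^{\alpha}$ ($\alpha = 1, \dots, p$) are orthogonal (uncorrelated).
The variables $C^{\alpha}$ ($\alpha = 1, \dots, p$) are linear combinations of the variables $Z^j$, so each $C^{\alpha}$ is also centered. It is not reduced, however, since its variance equals the corresponding eigenvalue:
for all $\alpha \leq p$, $\operatorname{Var}(C^{\alpha}) = \lambda_{\alpha}$.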
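The remarks above can be checked numerically. This sketch assumes uniform weights $1/n$ and hypothetical data: the components have mean 0, their covariance matrix is $\operatorname{diag}(\lambda_1, \dots, \lambda_p)$, and no unit-norm direction yields a variance larger than $\lambda_1$.

```python
import numpy as np

# Hypothetical data: 60 individuals, 4 variables
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 4))
X[:, 3] += 0.7 * X[:, 0] - 0.5 * X[:, 2]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
C = Z @ U

means = C.mean(axis=0)  # the components are centered
cov = C.T @ C / n       # covariance matrix of the components (weights 1/n)

# A random unit-norm direction a: Var(Z a) never exceeds lambda_1
a = rng.normal(size=p)
a /= np.linalg.norm(a)
print(np.var(Z @ a), lam[0])
```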

5. Variable scatterplot analysis

The coordinates of the $j$-th variable point are
$$Z^j = \begin{pmatrix} z_1^j \\ z_2^j \\ \vdots \\ z_n^j \end{pmatrix}$$

The eigenvectors $(v_1, v_2, \dots)$ defining the principal axes of the second scatterplot (the scatterplot of the variables) are given by the transition formula:
$$v_{\alpha} = \frac{1}{\sqrt{\lambda_{\alpha}}}\, Z u_{\alpha}$$
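A numerical check of the transition formula, assuming uniform weights $D = \frac{1}{n} I_n$ and hypothetical data. Under that assumption each $v_{\alpha}$ is an eigenvector of $Z Z^{\top} D$ for the same eigenvalue $\lambda_{\alpha}$, normalized so that $v_{\alpha}^{\top} D v_{\alpha} = 1$; equivalently, $v_{\alpha}$ is the principal component $C^{\alpha}$ rescaled to unit variance.

```python
import numpy as np

# Hypothetical data: 25 individuals, 3 variables
rng = np.random.default_rng(5)
X = rng.normal(size=(25, 3))
X[:, 1] -= 0.6 * X[:, 2]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]

# Transition formula: v_a = Z u_a / sqrt(lambda_a), one column per axis
V = Z @ U / np.sqrt(lam)

W = Z @ Z.T / n                              # n x n matrix Z Z^T D with D = I / n
print(np.allclose(W @ V, V * lam))           # each v_a is an eigenvector for lambda_a
print(np.allclose(V.T @ V / n, np.eye(p)))   # columns are D-orthonormal
```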

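The new factor coordinates of each variable point $j$ on the axis $\alpha$ are given by
$$S_j^{\alpha} = \sqrt{\lambda_{\alpha}}\; u_{\alpha}^{j}$$
where $u_{\alpha}^{j}$ is the $j$-th component of $u_{\alpha}$.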
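A sketch computing these coordinates with hypothetical data and uniform weights. It also checks a property behind the name "correlation circle": $S_j^{\alpha}$ equals the correlation between the variable $Z^j$ and the principal component $C^{\alpha}$.

```python
import numpy as np

# Hypothetical data: 40 individuals, 4 variables
rng = np.random.default_rng(6)
X = rng.normal(size=(40, 4))
X[:, 2] += X[:, 1]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
C = Z @ U

# S[j, a] = sqrt(lambda_a) * u_a^j : coordinate of variable j on axis a
S = U * np.sqrt(lam)

# Correlation between each variable Z^j and each component C^a
corr = np.array([[np.corrcoef(Z[:, j], C[:, a])[0, 1] for a in range(p)]
                 for j in range(p)])
print(np.round(S[:, :2], 3))  # variable coordinates on the first factor map
```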

Projection quality of the variable points
The relative contribution (or squared cosine) of the projection of the variable point $Z^j$ on the axis $\alpha$ is the squared cosine of the angle formed by the vector $Z^j$ and its projection $\hat{z}^{j,\alpha}$ on $F_{\alpha}$:
$$CTR(j, \alpha) = \cos^2(Z^j, \hat{z}^{j,\alpha}) = (S_j^{\alpha})^2$$

The communality of variable $j$ on the first factor map is:
$$Com(j, (1,2)) = (S_j^{1})^2 + (S_j^{2})^2$$

The projections $\hat{z}^j$ of the variables $Z^j$ onto the principal maps lie inside the circle of center $O$ and radius 1, since the communality of a standardized variable never exceeds 1; this circle is called the correlation circle.
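A sketch of the squared cosines $CTR(j, \alpha)$, the communality on the first factor map and the distance of each projected variable to $O$, with hypothetical data and uniform weights. Over all $p$ axes the squared cosines of a variable add up to 1 (the diagonal of $R$), which is why every projection stays inside the correlation circle; a radius close to 1 signals a well-represented variable.

```python
import numpy as np

# Hypothetical data: 40 individuals, 4 variables
rng = np.random.default_rng(7)
X = rng.normal(size=(40, 4))
X[:, 3] += X[:, 0]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
S = U * np.sqrt(lam)            # variable coordinates on every axis

ctr = S**2                      # CTR(j, a) = (S_j^a)^2
com_12 = ctr[:, 0] + ctr[:, 1]  # communality on the first factor map
radius = np.sqrt(com_12)        # distance of each projected variable to O

for j in range(p):
    print(f"variable {j}: communality {com_12[j]:.3f}, radius {radius[j]:.3f}")
```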
6. Inertia and the choice of the number of principal axes
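The total inertia of the initial scatterplot of individuals is equal to
$$I = \sum_{i=1}^{p} \lambda_i = p,$$
because the sum of the eigenvalues equals the trace of $R$, whose $p$ diagonal entries are all equal to 1.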

The overall quality of the representation of the scatterplot on the principal subspace spanned by $(u_1, u_2, \dots, u_q)$ is measured by the proportion of inertia absorbed by this subspace:
$$\frac{\lambda_1 + \lambda_2 + \dots + \lambda_q}{p}$$
So the rate of inertia absorbed by the first factor map (or principal factor map) is:
$$\frac{\lambda_1 + \lambda_2}{p}$$
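A sketch of these inertia rates with hypothetical data and uniform weights. Only the eigenvalues are needed here, so `np.linalg.eigvalsh` is used; the helper name `subspace_rate` is illustrative.

```python
import numpy as np

def subspace_rate(lam, q):
    """Share of the total inertia absorbed by the subspace of the first q axes."""
    return lam[:q].sum() / lam.sum()

# Hypothetical data: 50 individuals, 5 variables
rng = np.random.default_rng(8)
X = rng.normal(size=(50, 5))
X[:, 4] += X[:, 0] + X[:, 1]
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam = np.linalg.eigvalsh(Z.T @ Z / n)[::-1]  # eigenvalues only, decreasing

total_inertia = lam.sum()                    # equals p in a normalized PCA
rate_plane_12 = (lam[0] + lam[1]) / p        # rate of the first factor map
print(total_inertia, rate_plane_12, subspace_rate(lam, 3))
```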
Number of axes to retain: inertia criterion
The criterion usually employed to measure PCA quality is the percentage of total inertia explained by the first $k$ chosen axes. It is defined by:
$$\mathrm{Rate}_k = \frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\sum_{i=1}^{p} \lambda_i} = \frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{p}$$

This rate measures the explanatory power of the first $k$ axes (or factors): it is the share of the total variance accounted for by these $k$ axes. However, its assessment must take into account the number of variables and the number of individuals. For example, an inertia rate of 10 % for a single axis can be a high value with 100 variables (where an axis carries 1 % of the inertia on average) and a low one with only 7 variables (about 14 % on average).
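A sketch of the cumulative rates $\mathrm{Rate}_k$ and of choosing the smallest $k$ that reaches a target. The eigenvalues are hypothetical (a normalized PCA with $p = 7$, so they sum to 7), and the 75 % target is an arbitrary illustration, since the text does not fix a threshold. The function names are illustrative.

```python
import numpy as np

def inertia_rates(lam):
    """Rate_k for k = 1..p: cumulative share of the total inertia of the first k axes."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    return np.cumsum(lam) / lam.sum()

def axes_for_threshold(lam, threshold):
    """Smallest k whose Rate_k reaches the threshold (0 < threshold <= 1)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    rates = inertia_rates(lam)
    # Small tolerance so that rounding in the cumulative sum cannot miss a threshold of 1
    return int(np.argmax(rates >= threshold - 1e-12)) + 1

# Hypothetical eigenvalues of a normalized PCA with p = 7 variables
lam = np.array([3.1, 1.6, 0.9, 0.6, 0.4, 0.25, 0.15])
print(inertia_rates(lam))             # Rate_1, ..., Rate_7
print(axes_for_threshold(lam, 0.75))  # k needed to explain at least 75 %
print(1 / len(lam))                   # average share of one axis, about 14 %
```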

Number of axes to retain: Kaiser criterion


It consists in keeping, in a normalized PCA, only the axes whose eigenvalue is greater than 1, i.e. greater than the average inertia per axis (the total inertia $p$ shared among $p$ axes).
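A sketch of this rule on hypothetical eigenvalues of a normalized PCA with $p = 7$ (they sum to 7, so their mean is 1). Keeping $\lambda_{\alpha} > 1$ is the same as keeping the axes whose share $\lambda_{\alpha}/p$ exceeds the average share $1/p$. The function name is illustrative.

```python
import numpy as np

def kaiser_axes(lam):
    """Number of axes whose eigenvalue exceeds 1, the average inertia in a normalized PCA."""
    lam = np.asarray(lam, dtype=float)
    return int(np.sum(lam > 1.0))

# Hypothetical eigenvalues of a normalized PCA with p = 7 variables
lam = np.array([3.1, 1.6, 0.9, 0.6, 0.4, 0.25, 0.15])
print(kaiser_axes(lam))  # number of axes with eigenvalue > 1
print(lam.mean())        # average eigenvalue: p / p = 1
```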

7. Interpretation of variables factor maps


1. Variables to keep: we keep only the variables that are well represented on the factor map (i.e. variables close to the correlation circle).

2. Variable-axis: variables strongly correlated with a factor contribute to the definition of this axis.

3. Variable-variable: for well-represented variables, close projections indicate a strong positive correlation between them, and diametrically opposite projections indicate a negative correlation.

4. Nearly orthogonal directions indicate a weak linear correlation.
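These reading rules rest on a simple identity: over all $p$ axes, $r_{jk} = \sum_{\alpha} S_j^{\alpha} S_k^{\alpha}$. Restricted to the first factor map the sum is only an approximation, which is why it should be trusted only for variables close to the circle. The sketch below uses hypothetical data in which variable 1 is built to be positively and variable 2 negatively correlated with variable 0.

```python
import numpy as np

# Hypothetical data: 100 individuals, 4 variables
rng = np.random.default_rng(9)
X = rng.normal(size=(100, 4))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=100)   # positively correlated with variable 0
X[:, 2] = -X[:, 0] + 0.3 * rng.normal(size=100)  # negatively correlated with variable 0
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = Z.T @ Z / n
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]
S = U * np.sqrt(lam)               # variable coordinates on every axis

R_full = S @ S.T                   # exact reconstruction of R from all axes
R_plane = S[:, :2] @ S[:, :2].T    # approximation from the first factor map only

print(np.round(R, 2))
print(np.round(R_plane, 2))
```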


Rotation
To ease interpretation, it may be convenient to rotate the axes once the number of factors has been determined. The rotation (the varimax method, …) brings the solution closer to a simple structure:
• A component is strongly correlated with some variables and weakly correlated with the others.
• A variable is correlated with a single component.
After such a rotation, the information represented by the factor plane as a whole remains the same, but the share of it carried by each axis changes.
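A sketch of a varimax rotation applied to the variable coordinates $S$ of the retained axes. It uses the widely used SVD-based iteration for Kaiser's varimax criterion; the function name, `max_iter` and `tol` are illustrative assumptions, not part of the course. The data are hypothetical: three variables driven by one latent factor and two by another. The checks illustrate the last remark: the rotation is orthogonal, so each variable's communality on the retained plane is unchanged, while the variance carried by each individual axis is redistributed.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (p x k) loading matrix (SVD-based iteration)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target of the varimax criterion for the current rotation
        target = rotated**3 - (gamma / p) * rotated @ np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                 # closest orthogonal matrix
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation

# Hypothetical data: variables 0-2 follow factor f1, variables 3-4 follow factor f2
rng = np.random.default_rng(10)
f1, f2 = rng.normal(size=(2, 200))
X = np.column_stack([f1 + 0.4 * rng.normal(size=200) for _ in range(3)]
                    + [f2 + 0.4 * rng.normal(size=200) for _ in range(2)])
n, p = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0)
lam, U = np.linalg.eigh(Z.T @ Z / n)
lam, U = lam[::-1], U[:, ::-1]
S = (U * np.sqrt(lam))[:, :2]    # coordinates on the 2 retained axes

S_rot, T = varimax(S)
print(np.round(S, 2))
print(np.round(S_rot, 2))
```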

8. Interpretation of the individual factors map
