Chapter 3 Normalized Principal Components Analysis
1. Initial data
Columns are quantitative variables (turnover, rate, weight ...)
Rows represent statistical individuals (basic units such as human beings, countries, years…)
2. Example of data
3. Objectives from a technical point of view.
Reduce the number of dimensions of the data by finding the planes that give the best visualization, using orthogonal projections of the data.
Group together homogeneous individuals and identify exceptional individuals.
Analyze the relationships between the variables.
Case Studies
Study the perception of a brand by the consumer.
Study the evolution of the financial situation of a company over time.
Compare several brands on the market.
Principal components:
The coordinates of a given individual i on the new axes formed by the eigenvectors $u_\alpha$ are given by the scalar products:
$C_i^\alpha = \sum_{j=1}^{p} Z_i^j \, u_\alpha^j$
Any pair of columns of the matrix U forms a factor map.
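The computation of the principal components can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes uniform weights 1/n for the individuals, takes the eigenvectors from the correlation matrix of the standardized data, and uses hypothetical random data. The function name `normalized_pca` is introduced here for illustration only.

```python
import numpy as np

def normalized_pca(X):
    """Normalized PCA of an (n x p) data array, assuming uniform weights 1/n."""
    X = np.asarray(X, dtype=float)
    # Standardize each column: zero mean, unit (population) variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n = Z.shape[0]
    # With standardized columns, Z'Z / n is the correlation matrix.
    R = Z.T @ Z / n
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, U = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]
    # Row i of C holds the scalar products of individual i with each eigenvector.
    C = Z @ U
    return Z, eigvals, U, C

# Hypothetical data: 50 individuals, 4 variables, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]

Z, eigvals, U, C = normalized_pca(X)
first_plane = C[:, :2]  # coordinates of the individuals on the first factor map
```

Plotting the two columns of `first_plane` against each other gives the first factor map of the individuals.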
Factors map and projection quality
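Absolute contribution: the absolute contribution of a given point i to the projected inertia on the axis α is:
$ACTR(i, \alpha) = \dfrac{p_i \, (C_i^\alpha)^2}{\lambda_\alpha}$
where $p_i$ is the weight of individual i and $\lambda_\alpha$ is the eigenvalue associated with the axis α.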
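As a rough illustration of the absolute contribution, the sketch below computes it for every individual and every axis at once. It assumes uniform weights $p_i = 1/n$ unless other weights are passed, and the small data set is hypothetical. The helper name `absolute_contributions` is not from the course material.

```python
import numpy as np

def absolute_contributions(C, eigvals, weights=None):
    """ACTR[i, a] = p_i * C[i, a]**2 / eigvals[a] (uniform weights by default)."""
    C = np.asarray(C, dtype=float)
    eigvals = np.asarray(eigvals, dtype=float)
    n = C.shape[0]
    p_i = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    return p_i[:, None] * C**2 / eigvals[None, :]

# Hypothetical data: 5 individuals, 2 variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardized data
eigvals, U = np.linalg.eigh(Z.T @ Z / len(Z))     # eigen-decomposition of the correlation matrix
C = Z @ U                                          # principal components
actr = absolute_contributions(C, eigvals)
```

On each axis the contributions sum to 1, so an individual whose contribution is far above 1/n weighs heavily on the definition of that axis.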
Remarks
$C^1$ is the variable that best describes the dispersion of the data.
The best plane for visualizing the data is the factor map formed by the two axes $C^1$ and $C^2$.
The variables $C^\alpha$ ($\alpha = 1, \dots, p$) are orthogonal (uncorrelated).
The variables $C^\alpha$ ($\alpha = 1, \dots, p$) are linear combinations of the $Z^j$ variables, so each $C^\alpha$ is also centered.
For all $\alpha \le p$: $Var(C^\alpha) = \lambda_\alpha$
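The last three remarks can be checked with a short derivation. The sketch below assumes uniform weights 1/n, writes $R = \frac{1}{n} Z^{\top} Z$ for the correlation matrix, and takes the $u_\alpha$ to be orthonormal eigenvectors of $R$, with $C^\alpha = Z u_\alpha$. These notations are assumptions made for the illustration.

```latex
\begin{aligned}
\overline{C^\alpha} &= \tfrac{1}{n}\,\mathbf{1}^{\top} Z\, u_\alpha = 0
  && \text{(the columns of } Z \text{ are centered)} \\
\operatorname{Cov}(C^\alpha, C^\beta)
  &= \tfrac{1}{n}\, u_\alpha^{\top} Z^{\top} Z\, u_\beta
   = u_\alpha^{\top} R\, u_\beta
   = \lambda_\beta\, u_\alpha^{\top} u_\beta \\
  &= \begin{cases} \lambda_\alpha & \text{if } \alpha = \beta \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
```

Taking $\alpha = \beta$ gives the variance $\lambda_\alpha$, and $\alpha \neq \beta$ gives the absence of correlation between distinct components.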
The coordinates of the j-th variable point: $Z^j = (Z_1^j, Z_2^j, \dots, Z_n^j)^{\top}$
The eigenvectors $(v_1, v_2, \dots)$ defining the principal axes of the second scatterplot are given by the transition formula:
$v_\alpha = \dfrac{1}{\sqrt{\lambda_\alpha}} Z u_\alpha$
The new factor coordinates of each variable point j on the axis α are given by:
$S_j^\alpha = \sqrt{\lambda_\alpha} \, u_\alpha^j$
The projections $\hat{z}^j$ of the variables $Z^j$ onto the principal maps lie inside a circle of center O and radius 1: this circle is called the correlation circle.
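The coordinates of the variables and the correlation circle can be illustrated numerically. The sketch below assumes uniform weights 1/n and hypothetical random data. It builds the coordinates $\sqrt{\lambda_\alpha}\, u_\alpha^j$ and computes the squared distance of each variable from the origin on the first factor map. The helper name `variable_coordinates` is introduced here for illustration only.

```python
import numpy as np

def variable_coordinates(U, eigvals):
    """S[j, a] = sqrt(eigvals[a]) * U[j, a]: coordinate of variable j on axis a."""
    return np.asarray(U) * np.sqrt(np.asarray(eigvals))[None, :]

# Hypothetical data: 30 individuals, 3 variables, two of them correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] += X[:, 0]

Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardized data
eigvals, U = np.linalg.eigh(Z.T @ Z / len(Z))     # ascending eigenvalues
eigvals, U = eigvals[::-1], U[:, ::-1]            # reorder to descending
C = Z @ U                                          # principal components

S = variable_coordinates(U, eigvals)
# Squared distance to the origin of each variable on the first factor map.
radius_sq = (S[:, :2] ** 2).sum(axis=1)
```

Each entry of `S` is also the correlation between a variable and a principal component, which is why the projected points cannot leave the unit circle. A variable close to the circle is well represented on the map.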
6. Inertia and the choice of the number of principal axes
The total inertia of the initial scatterplot of individuals is equal to:
$I = \sum_{i=1}^{p} \lambda_i = p$
The overall quality of the representation of the scatterplot on the principal subspace spanned by $(u_1, u_2, \dots, u_q)$ is measured by the proportion of the inertia absorbed by this subspace. It is equal to:
$\dfrac{\lambda_1 + \dots + \lambda_q}{p}$
So, the rate of inertia absorbed by the first factor map (or principal factor map) is:
$\dfrac{\lambda_1 + \lambda_2}{p}$
Number of axes to retain: inertia criterion
The criterion usually employed to measure PCA quality is the percentage of total inertia explained by the first k chosen axes. It is defined by:
$Rate_k = \dfrac{\lambda_1 + \dots + \lambda_k}{p} = \dfrac{\lambda_1 + \dots + \lambda_k}{\sum_{i=1}^{p} \lambda_i}$
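A small numerical sketch of the inertia criterion follows. The eigenvalues are hypothetical (chosen to sum to p = 4, as they must in a normalized PCA), and the function name `inertia_rates` is introduced here for illustration only.

```python
import numpy as np

def inertia_rates(eigvals):
    """Return the per-axis and cumulative shares of total inertia."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    individual = eigvals / eigvals.sum()   # lambda_k / sum of eigenvalues
    cumulative = np.cumsum(individual)     # Rate_k for k = 1, ..., p
    return individual, cumulative

# Hypothetical eigenvalues of a normalized PCA with p = 4 variables.
eigvals = [2.4, 1.1, 0.4, 0.1]
individual, cumulative = inertia_rates(eigvals)

for k, (share, rate) in enumerate(zip(individual, cumulative), start=1):
    print(f"axis {k}: {share:.1%} of inertia, Rate_{k} = {rate:.1%}")
```

With these values the first factor map absorbs (2.4 + 1.1) / 4 = 87.5% of the total inertia, which would usually be considered enough to retain two axes.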
This rate measures the explanatory power of the first k axes (or factors): it is the share of the total variance accounted for by these k axes. However, it must be interpreted in light of the number of variables and the number of individuals. For example, an inertia rate of 10% for a single axis can be high when there are 100 variables and low when there are only 7: since the eigenvalues sum to p, an average axis absorbs 1/p of the inertia, that is 1% with 100 variables and about 14% with 7.
2 Variable-axis: variables strongly correlated with a factor contribute to the definition of that axis.
3 Variable-variable: the proximity of the projections of two variables indicates a strong positive correlation between them. Diametrically opposite projections indicate a negative correlation between them.