
GENERAL ARTICLE

Principal Component Analysis: Most Favourite Tool in Chemometrics

Keshav Kumar

Principal component analysis (PCA) is the most commonly used chemometric technique. It is an unsupervised pattern recognition technique. PCA has found applications in chemistry, biology, medicine and economics. The present work attempts to explain how PCA works and how its results can be interpreted.

Keshav Kumar did his PhD from the Department of Chemistry, Indian Institute of Technology-Madras, India, under the guidance of Professor A K Mishra. Currently he is working as a postdoc at the Institute for Wine Analysis and Beverage Research, Hochschule Geisenheim University, Germany. His research mainly focuses on chemometrics and its applications in various fields.

1. Introduction

Chemometrics is a discipline that combines mathematics, statistics, and logic to design or select optimal measurement procedures and experiments. It allows the extraction of maximum relevant chemical information by analysing chemical data and helps in understanding chemical systems [1]. In recent years, chemometrics has emerged as an important part of analytical chemistry. Chemometric techniques have enabled the efficient analysis of large volumes of data obtained from various instruments (single or hyphenated). The analysis of such large data sets is otherwise a time-consuming process and might end up with no meaningful interpretation or conclusions.
Among the various chemometric techniques, principal component analysis (PCA) [1, 2] is considered the 'most favourite'. PCA has found applications in various fields. For example, Singh et al. have successfully used PCA for stellar spectral classification [3]. Kumar et al. have applied PCA for (i) classifying aqueous herbal drugs [4] and (ii) diagnosis and therapeutic prognosis of oral submucous fibrosis [5]. Kowalski et al. have used PCA for the classification of archaeological artefacts [6]. Kowalkowski has applied PCA for river water classification [7], while Ragot and co-workers have used PCA for air quality monitoring [8].

Keywords: Chemometrics, principal component analysis, classification, pattern recognition, chromatography.


It can be realised that PCA is capable of providing a fast and effective way of analysing data sets from various disciplines, viz., physics, biology, chemistry, archaeology, etc. PCA essentially reduces the dimensions of the data set while retaining most of the variation [1, 2]. Data compression by PCA involves finding a new space, spanned by a smaller number of dimensions, onto which the original data set is projected. The dimensions of the new space are orthogonal to each other, simplifying the data sets for further analysis. Theoretical and various technical aspects of PCA are discussed below.

2. Theory

2.1 Geometrical Representation of PCA

In order to understand PCA geometrically, let us consider a two-dimensional data set of size I × J, where I is the number of samples and J is the number of variables. In the present case, for convenience, we have set the number of variables (i.e., J) to two – J1 and J2. As shown in Figure 1, these samples can be presented in a two-dimensional space spanned by J1 and J2. The two axes J1 and J2 are orthogonal to each other. The data set acquired for the samples has a considerable amount of variation along both the J1 and J2 axes. In other words, both dimensions are significantly important for having the complete information about the sample set.

An anti-clockwise rotation of the J1 and J2 axes by an angle θ (= 45° in the present case) generates another pair of orthogonal axes, T1 and T2. Mathematically, this can be shown using (1):
$$\begin{pmatrix} T_1 \\ T_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} J_1 \\ J_2 \end{pmatrix} \qquad (1)$$

The new variables (or dimensions) T1 and T2 are linear combinations of the J1 and J2 variables, with sine and cosine terms as coefficients:

T1 = J1 cos θ + J2 sin θ   (2)


T2 = −J1 sin θ + J2 cos θ   (3)

The projection of the data set onto the space spanned by the new variables T1 and T2 is shown in Figure 2. The data set has most of its variation along the T1 axis and is practically invariant along the T2 axis.
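As a small numerical illustration of this rotation (a sketch in Python/NumPy with made-up data, not taken from the article):

```python
import numpy as np

theta = np.deg2rad(45)
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # rotation matrix of equation (1)

# toy samples that vary mainly along the J1 = J2 direction (made-up data)
J = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.05], [4.0, 3.95]])
T = J @ R.T          # coordinates (T1, T2) of each sample in the rotated space

print(T.var(axis=0))  # nearly all variance lies along T1; T2 is almost constant
```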
Figure 1. Representation of a data set in the space spanned by J1 and J2. The data has significant variation along both the axes.

Figure 2. (a) Rotation of the axes J1 and J2 by 45° to generate another pair of orthogonal axes, T1 and T2. (b) Representation of the data set in the new space spanned by the T1 and T2 axes. The data set has variation along T1 and no variation along T2. (c) Reduction of dimensions: T2 is unimportant and hence can be removed, and T1 can be taken as the approximation of the data spanned in the two-dimensional space spanned by J1 and J2.


In principle, the variation along the T1 axis can be taken as a good approximation of the original two-dimensional data set, and one can easily ignore the T2 axis. Thus, by projecting the data set onto a suitable space, it is possible to reduce the dimensions of the data set while retaining all the information.
2.2 Commonly Used Terminologies in PCA

Before proceeding further, it is necessary that we briefly describe some commonly used terminologies.

(i) Principal Components: The set of new variables (i.e., T1 and T2) obtained from the linear combinations of the old variables (J1 and J2) are called principal components. The variable that explains the maximum variation is called the first principal component. The second principal component explains the second highest variation from the unexplained variance of the data set, and so on. In the above example, T1 is the first principal component and explains all the variation of the data set, and T2 is the second principal component that explains the remaining variance of the data set.

(ii) Loading Vectors: They essentially form the basis for projecting the original data set to obtain the principal components. In the above example, [cos θ  sin θ]ᵀ and [−sin θ  cos θ]ᵀ, the transposes of the first and second rows of the matrix given in (1), represent the first and second loading vectors corresponding to the first (T1) and second (T2) principal components, respectively. The loading vectors, in a more generic sense, are known as the eigenvectors of the data set.

(iii) Score and Loading Values: The numerical values associated with the principal components of each sample are called the score values. The numerical values associated with the elements of the loading vectors are called the loading values. The score values explain how the samples are related to each other, and the loading values explain how the variables are related to each other.


2.3 Fitting PCA Model

A PCA model can be fitted using the eigenvalue decomposition method, as summarized below.

(1) In the first step, the data set is mean centred: X = X − mean(X).

(2) In the second step, the covariance matrix of the mean-centred X is calculated: Cov(X) = XᵀX / (I − 1).

(3) In the next step, the covariance matrix is diagonalized to obtain Λ (the eigenvalues) and P (the matrix containing the eigenvectors as its columns): Cov(X)P = PΛ.

(4) In this step, the diagonal elements of the matrix Λ are arranged in decreasing order (i.e., Λ1 > Λ2 > Λ3 > ... > Λk), and the corresponding rearrangement is made in the loading matrix P. For example, if the positions of Λ1 and Λ3 are interchanged in the matrix Λ, then the first and third columns of P are interchanged.

(5) The score matrix T can be calculated by projecting the data set X onto the space spanned by the eigenvectors in P: T = XP. The score matrix T and the loading matrix P are orthogonal and orthonormal, respectively: TᵀT is a diagonal matrix, and PᵀP is an identity matrix.

(6) The approximation of the data set X by the PCA model can be represented as X = TPᵀ + E, where E is the residual matrix of dimension I × J, T is the score matrix of dimension I × K, and P is the loading matrix of dimension J × K. Here K is the number of significant factors of the PCA model that explain the majority of the variation in the data set, and it is always ≤ min(I, J).

(7) The score values of any new sample can be calculated by projecting the new data X_new on P: T_new = X_new P.

We can also use autoscaled data in step 1 to perform the PCA analysis. Autoscaling is very useful when the variables are on different scales or of different magnitudes. The data set is autoscaled

by subtracting the mean from each column, followed by division by the standard deviation:

X = [X − mean(X)] / standard deviation(X).

Steps 2–7 can then be performed on the autoscaled X to create the PCA model. It is to be noted that the square matrix obtained in step 2 is then defined as the correlation matrix.
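The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration of the eigenvalue decomposition route (the article itself used MATLAB); the function and variable names are our own:

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model by eigenvalue decomposition of the covariance matrix.

    X : (I, J) data matrix (rows = samples, columns = variables).
    Returns the scores T (I x K), loadings P (J x K) and all eigenvalues.
    """
    Xc = X - X.mean(axis=0)                 # step 1: mean centring
    cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)   # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # step 3: diagonalization (eigh suits symmetric matrices)
    order = np.argsort(eigvals)[::-1]       # step 4: sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]           # loading matrix (columns = eigenvectors)
    T = Xc @ P                              # step 5: score matrix T = XP
    return T, P, eigvals

# steps 6 and 7: reconstruction and projection of a new sample
# X_hat = T @ P.T + X.mean(axis=0);  t_new = (x_new - X.mean(axis=0)) @ P
```

For autoscaled data, one would simply divide each centred column by its standard deviation before the covariance step.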
2.4 Finding Optimum Number of Factors for PCA Model
One of the significant advantages of PCA is that it is essentially sequential in nature. In other words, a K-factor PCA model is always a subset of a (K + 1)-factor PCA model. The choice of the number of factors can only affect the extent to which one can retrieve different pieces of orthogonal information. Thus, a PCA model can be created with any number of factors, and each model would provide a true piece of the information available in the data set. In order to ensure that we capture the complete information of the data set without overfitting the model, a general rule of thumb is to first select k factors from the available K factors that capture at least 80% of the variance of the data set:

$$\text{Amount of variance} = \frac{\sum_{i=1}^{k} \Lambda_i}{\sum_{i=1}^{K} \Lambda_i} > 80\%, \qquad k \leq K.$$

One has to keep adding factors as long as the amount of variance captured by the model increases by more than 3–4% with each additional factor.
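As a sketch of this criterion (assuming the eigenvalues arrive sorted in decreasing order, as in the fit_pca example above; the 80% and 3% limits are simply the thresholds quoted in the text):

```python
import numpy as np

def choose_n_components(eigvals, threshold=0.80, min_gain=0.03):
    """Smallest k whose cumulative explained variance exceeds `threshold`,
    then keep adding factors while each extra factor still contributes more
    than `min_gain` (about 3-4%) of the total variance."""
    ratios = eigvals / eigvals.sum()                      # variance fraction per PC
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1   # first k reaching the threshold
    while k < len(ratios) and ratios[k] > min_gain:
        k += 1
    return k, cumulative
```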

2.5 Some Statistical Parameters Involved in PCA

Lack of Fit Parameter (Q): It is a measure of the difference between the actual data set and the approximation made by the PCA model. The lack of fit parameter (Q) of the PCA model can be calculated by taking the outer product of the residual matrix E:

Q = EEᵀ = X(I − PPᵀ)Xᵀ.

In an ideal case, the diagonal elements of Q (a matrix of dimension I × I) should be zero.


Hotelling's T² Statistic: This parameter measures the variation of each sample within the PCA model. It indicates the spread of the samples from the origin of the model. It can be calculated as

T² = T Λ⁻¹ Tᵀ.

Leverage: It measures the influence of a sample on the PCA model. A sample with high leverage reinforces the PCA model and may cause the rotation of the principal components. The leverage of a sample can be calculated using the formula

Leverage = T (TᵀT)⁻¹ Tᵀ.
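A minimal sketch of these per-sample diagnostics, reusing the scores and loadings from the fit_pca example (our own function name; only the diagonals of the Q, T² and leverage matrices are kept, since those are the per-sample values):

```python
import numpy as np

def pca_diagnostics(X, T, P, eigvals):
    """Per-sample residual (Q), Hotelling's T^2 and leverage for a PCA model.

    X       : mean-centred data (I x J)
    T, P    : scores (I x K) and loadings (J x K)
    eigvals : the K retained eigenvalues (variances of the score columns)
    """
    E = X - T @ P.T                         # residual matrix E
    Q = np.sum(E**2, axis=1)                # diagonal of E E^T: lack of fit per sample
    T2 = np.sum(T**2 / eigvals, axis=1)     # diagonal of T Lambda^-1 T^T
    H = T @ np.linalg.inv(T.T @ T) @ T.T    # hat matrix T (T^T T)^-1 T^T
    leverage = np.diag(H)                   # leverage of each sample
    return Q, T2, leverage
```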

2.6 Detection of Outliers

Principal component analysis can be used to find the outliers in a data set, provided we study the leverage and residual values of the samples. A sample that is not well described by the model will have an unusually high residual, and a sample that has a high influence on the model will have an unusually high leverage [9]. In an ideal case, all the samples of a data set should have low leverage and low residuals. The samples having high residuals are classified as outliers and need to be analysed carefully. The samples having high leverage may have either high or low residuals; the former are called bad leverage samples, and the latter are called good leverage samples. The bad leverage samples need to be analysed very carefully because they tend to bias the model significantly. The good leverage samples also have to be analysed carefully as they indicate unusual variation of the variables, even though the data of those samples are well fitted by the PCA model.
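This decision logic can be written down compactly. The cut-off rule below (mean plus three standard deviations) is purely an illustrative assumption; the article does not prescribe specific limits:

```python
import numpy as np

def flag_samples(Q, leverage, q_cut=None, lev_cut=None):
    """Classify samples from their residual (Q) and leverage values."""
    q_cut = q_cut if q_cut is not None else Q.mean() + 3 * Q.std()              # assumed cut-off
    lev_cut = lev_cut if lev_cut is not None else leverage.mean() + 3 * leverage.std()
    high_q, high_lev = Q > q_cut, leverage > lev_cut
    labels = np.full(Q.shape, "regular", dtype=object)
    labels[high_q & ~high_lev] = "outlier (high residual)"
    labels[high_q & high_lev] = "bad leverage"
    labels[~high_q & high_lev] = "good leverage"
    return labels
```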

3. Performing PCA: An Example

3.1 Data Used

The chromatographic data set reported in the literature [10, 11] has been used to carry out the present work. The chromatographic data set consists of 120 oil samples. Of these, 68 belong to the class of olive oils and 52 belong to the class of non-olive vegetable oils and oil blends (i.e., vegetable and olive oil blends).


Figure 3. The chromatograms of 120 oil samples. Of these, 68 belong to the class of olive oils and the remaining belong to the class of non-olive oils and blended oils.

3.2 Software Used

All the analysis and data plotting was carried out on the MATLAB 2014 platform. However, other platforms, such as R, Python, etc., can also be used for the analysis.
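As a rough illustration of the workflow described in the next subsection, here is a Python/NumPy sketch (the article itself used MATLAB; the file name and the use of NumPy/Matplotlib are our own assumptions, and fit_pca is the function sketched in Section 2.3):

```python
import numpy as np
import matplotlib.pyplot as plt

# X: samples x retention-time points chromatographic matrix
# (hypothetical file name; load from wherever the data are stored)
X = np.loadtxt("chromatograms.csv", delimiter=",")

X = X / X.sum(axis=1, keepdims=True)          # normalize each chromatogram to unit area
T, P, eigvals = fit_pca(X, n_components=2)    # fit_pca mean-centres internally (Section 2.3)

print("Variance explained by PC1, PC2:", eigvals[:2] / eigvals.sum())

plt.scatter(T[:, 0], T[:, 1])                 # PC1 versus PC2 score plot
plt.xlabel("PC1 score")
plt.ylabel("PC2 score")
plt.show()
```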

3.3 Results and Discussion

The chromatographic data sets acquired for the olive and non-olive oils are shown in Figure 3. It can be seen that, based on a visual analysis of the chromatographic profiles, it is difficult to differentiate olive oils from non-olive vegetable oils. Moreover, manual analysis of such a large volume of data is laborious and time consuming, and may not provide any meaningful interpretations.

Figure 4. Amount of variance captured by different principal components (PCs). The plot indicates that the first two PCs are sufficient to explain most of the variance (more than 85%) of the data set without overfitting the model.


However, PCA essentially simplifies and reduces the dimensions of the data set, and provides a fast and efficient way of analysing the complex chromatographic data of the selected oil samples. The chromatographic data sets are arranged in a matrix of dimensions 126 × 4001, where 126 is the number of samples and 4001 is the number of variables (i.e., retention time points). The chromatographic data sets are normalized to unit area and mean-centred prior to PCA. The optimum number of principal components required for fitting the PCA model is obtained from the variance captured by the different principal components plotted against the principal component numbers, as shown in Figure 4. It can be seen that with the addition of principal components, there is a substantial improvement in the cumulative percentage of variance captured by the PCA model. However, beyond 2 principal components, the addition of extra factors does not bring any substantial improvement in the cumulative variance captured by the PCA model. Thus, one can conclude that a PCA model with two principal components, which explains more than 80% of the variance, is optimum to capture all the important information buried in the data set. PC1 and PC2 individually explain 78% and 5% of the variance of the data set. The PC1 versus PC2 score plot is shown in Figure 5; the PCA model clearly separates the samples into two groups. It is found that all the samples belonging to the class of olive oils have negative PC1 score values and the samples belonging to the non-olive oil class have positive PC1 score values.

Figure 5. PC1 versus PC2 score plot classifying the olive oil samples from the non-olive oil and blended oil samples.


Figure 6. (a) The loading vector corresponding to PC1, which mainly contains negatively correlated major peaks, can be used to rationalize the classification of the olive oil, non-olive oil, and blended oil samples in the PC1 versus PC2 score plot. (b) The loading vector corresponding to PC2 mainly explains the minor peaks and can be used for the classification of the samples within the groups.

The blended oils in the score plot appear near the edges of the ellipse. The blended oils containing more olive oil have negative PC1 score values, whereas the blended oils containing less olive oil have positive PC1 score values. The loading vectors, which explain how the variables are related to each other, are shown in Figure 6.
The analysis of the loading vector plot can be really helpful in finding the set of variables that are most useful in characterizing the samples. The loading vector corresponding to PC1 mainly explains the variations of the major peaks that can be used to characterize the classes of olive oils and non-olive oils. The loading vector corresponding to PC2 mainly explains the variation of the minor peaks and can be used to differentiate the samples within the olive oil and non-olive oil groups. Based on the loading vector profiles, the appearance of the blended oil samples at the extreme edges of the PC2 axis can be attributed to the fact that blended oils contain different types of olive and non-olive oils.

Figure 7. The outlier diagnostic plot explaining the leverage and residual values in the two-component PCA model. Nine samples (16, 20, 40, 48, 51, 52, 72, 95 and 120) are found to have high leverage values. Of these, 5 samples (16, 20, 40, 51 and 95) are the blended oils. Five samples (41, 84, 85, 86 and 119) are found to have high residual values, indicating that their composition is very different from the others or that something went wrong at the sample preparation or data acquisition stages; they need further attention.

The outlier diagnostic plot, created by plotting the leverage versus the residual values, can be used to find the really unusual samples called the outliers. The outlier diagnostic plot is shown in Figure 7. All 5 blended samples (16, 20, 40, 51, and 95) are found to have unusually high leverage values, which correlates well with the fact that they contain constituents of both olive and non-olive oils. There are some samples with high residual values, indicating that these samples need careful analysis. These samples might have unusual compositions, or something might have gone wrong at the sample preparation or data acquisition stages. In summary, the obtained PCA model is found to be highly specific and sensitive in classifying the oil samples.
In most cases, PCA is well capable of classifying the samples. However, in some cases, owing to the complexity of the data sets, the output of PCA, such as the score matrix, needs to be further processed with other chemometric techniques, such as linear discriminant analysis (LDA) [12], soft independent modelling of class analogy (SIMCA) [13, 14], neural network analysis (NNA) [3, 14], etc., for achieving a meaningful interpretation of the data sets.


3.4 Conclusions

PCA is the most favourite tool in chemometrics. It reduces the dimensions of the data sets and simplifies the data for easy and meaningful interpretation. Using the chromatographic data set of olive and non-olive oil samples, it has been clearly shown that PCA can be used as an unsupervised pattern recognition technique. PCA successfully differentiated the olive oil samples from the non-olive oil samples. It has also been shown that PCA can be used for detecting outlier samples in a data set.

Suggested Reading

[1] D L Massart, B G M Vandeginste, L M C Buydens, S de Jong, P J Lewi and J Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics, Elsevier, New York, 1997.
[2] R Kramer, Chemometric Techniques for Quantitative Analysis, Marcel Dekker, New York, 1998.
[3] H P Singh, R K Gulati and Ranjan Gupta, Stellar Spectral Classification Using Principal Component Analysis and Artificial Neural Networks, Monthly Notices of the Royal Astronomical Society, Vol.295, pp.312–318, 1998.
[4] K Kumar, P Bairi, K Ghosh, K K Mishra and A K Mishra, Classification of Aqueous-based Ayurvedic Preparations Using Synchronous Fluorescence Spectroscopy and Chemometric Techniques, Current Science, Vol.107, No.3, pp.470–477, 2014.
[5] K Kumar, S Sivabalan, S Ganesan and A K Mishra, Discrimination of Oral Submucous Fibrosis (OSF) Affected Oral Tissues from Healthy Oral Tissues Using Multivariate Analysis of In-vivo Fluorescence Spectroscopic Data: A Simple and Fast Procedure for OSF Diagnosis, Analytical Methods, Vol.5, pp.3482–3489, 2013.
[6] B R Kowalski, T F Schatzki and F H Stross, Classification of Archaeological Artifacts by Applying Pattern Recognition to Trace Element Data, Analytical Chemistry, Vol.44, pp.2176–2180, 1972.
[7] T Kowalkowski, R Zbytniewski, J Szpejna and B Buszewski, Application of Chemometrics in River Water Classification, Water Research, Vol.40, pp.744–752, 2006.
[8] M F Harkat, G Mourot and J Ragot, An Improved PCA Scheme for Sensor FDI: Application to an Air Quality Monitoring Network, Journal of Process Control, Vol.16, pp.625–634, 2006.
[9] S Wold, K Esbensen and P Geladi, Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, Vol.2, pp.37–52, 1987.
[10] P de la Mata-Espinosa, J M Bosque-Sendra, R Bro and L Cuadros-Rodriguez, Olive Oil Quantification of Edible Vegetable Oil Blends Using Triacylglycerols Chromatographic Fingerprints and Chemometric Tools, Talanta, Vol.85, pp.177–182, 2011.
[11] http://www.models.life.ku.dk/oliveoil
[12] S Balakrishnama and A Ganapathiraju, Linear Discriminant Analysis – A Brief Tutorial, Institute for Signal and Information Processing, March 2, 1998. https://www.isip.piconepress.com/publications/reports/1998/isip/lda/lda_theory.pdf
[13] S Wold, Pattern Recognition by Means of Disjoint Principal Component Models, Pattern Recognition, Vol.8, pp.127–139, 1976.
[14] R Brereton, Chemometrics for Pattern Recognition, John Wiley & Sons, Ltd, U.K., 2009.

Address for Correspondence
Keshav Kumar
Institute for Wine Analysis and Beverage Research,
Hochschule Geisenheim University, Geisenheim 65366, Germany
Email: [email protected]
