
Statistical Analysis of Cricket Leagues Using Principal Component Analysis

This document discusses using principal component analysis to statistically analyze cricket leagues. Specifically, it analyzes batting and bowling statistics from the Pakistan Super League and Indian Premier League from 2016-2019. Principal component analysis is used to rank the top 10 batsmen and bowlers in each league based on their contributions. This study aims to reduce the dataset dimensions to identify the most important variables that influence player performance. It is the first analysis of its kind focused on Asian T20 cricket leagues.



Statistical Analysis of Cricket Leagues Using Principal Component Analysis

Article  in  Journal of Engineering and Applied Sciences · September 2021


DOI: 10.33897/fujeas.v2i1.451


All content following this page was uploaded by Sheharyar KHAN Ktk on 21 September 2021.



Statistical Analysis of Cricket Leagues
Using Principal Component Analysis
Sheharyar Khan 1, Muhammad Ishtiaq 2, Kainat Bibi 3, Rashid Amin∗4, Adeel Ahmed 5, Iqra Chaudhary 6
1, 2, 4 Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan
3 COMSATS University Islamabad, Wah Campus
5 Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan
6 Department of Computer Science, National University of Modern Languages, Islamabad, Pakistan

ABSTRACT--Statistics matter in every sport, and particularly in cricket, where players are ranked and selected for tournaments around the world on the basis of figures such as individual runs, wickets, and highest scores. This research applies Principal Component Analysis (PCA) to cricket facts and figures. The analysis examines the co-variation among different measurements of the batting and bowling abilities of players in the Pakistan Super League (PSL) T-20 (2016-2019) and the Indian Premier League (IPL) T-20 (2016-2019). PCA is a dimension reduction technique, widely used in applied multivariate data analysis, that reduces a dataset to a smaller number of variables. Here it is used to rank the batsmen and bowlers of the PSL and IPL based on their contributions to their clubs during these competitive seasons, and the results reveal the top ten ranked batsmen and bowlers of each league. We can presume that batting ability rules over bowling capacity. This exploration is the first report in Pakistan that features the highlights of the PSL and IPL.

Mathematics Subject Classification (2010) 62H25, 62H86, 62J10

Keywords PSL, IPL, PCA, Sports Analysis

____________________
∗Corresponding Author
Email addresses: [email protected], ishtiaqmohammed31@gmail.com, [email protected], *[email protected]
Received: DD.MM.YYYY; Accepted: DD.MM.YY

I. INTRODUCTION

Cricket is a bat-and-ball game played between two teams of eleven players each. At the center of the field is a pitch about 20 meters long with a wicket at each end, and each wicket carries bails. The batsmen score runs by striking the ball bowled by the bowler, and the bowlers try to take wickets by hitting the wicket with the ball. There are other ways to take a wicket: a batsman can be caught by any fielder after hitting the ball, stumped by the keeper standing behind the batsman, or dismissed hit wicket when the batsman strikes the wicket with the bat.


Similarly, batsmen score runs by hitting boundaries or by running between the wickets. When ten players are dismissed by the bowling side, the innings of the batting team ends, and the bowling team sends its batsmen in to bat. The second batting team then chases the target set by the first. Win and loss are decided by whether the chasing team reaches the target or is dismissed by the bowling team. The game is supervised by two umpires, aided by a third umpire, usually called the TV Umpire, and, in international matches, a match referee. They communicate with two off-field scorers who record the match's statistical information. The game is currently played in three formats.

• Test, with 5 days of play and no limit on overs.
• Twenty-Twenty, in which each team plays 20 overs.
• ODI (One Day International), in which each of the two teams plays 50 overs.

Over the last 8 to 10 years, Twenty20 international games have gained popularity, and audiences are very involved in these limited-overs games because they finish within a few hours. Twenty20 cricket was showcased in 2003 and involved matches between English and Welsh domestic sides [1]. The T20 format was financially effective, and spectators delighted in a shorter variant of the game. Readers who are not familiar with cricket are referred to [2]. These leagues attract many highly skilled players from all over the world. In our study, IPL and PSL players are from approximately 11 different countries, mostly from Asia and Europe. Based on this data, we quantify performance through PCA. This research aims to find the rankings of PSL and IPL bowlers and batsmen using PCA for the last four seasons (2016-2019). From this, we can identify the top batsmen and bowlers in the PSL and IPL competitions and evaluate batting capabilities against bowling capabilities in the two leagues.

II. LITERATURE REVIEW

In the literature, many authors have worked on these types of analysis, but mostly for single cricket seasons, and some use graphical approaches. A graphical comparison of bowling and batting performance is conducted by [3]. Furthermore, this investigation can be used to differentiate groups of players, for instance aggressive batsmen, bowling all-rounders, spinners, and other groups. But graphical approaches are not accurate, and there is a gap for other techniques of statistical analysis. For selection and comparison, [4] suggests a numerical approach to picking batsmen for the team. [5] presents a simulation approach for the batting order in one-day cricket. This work was extended by [6], who created a model that can simulate one-day cricket. For various components of cricket, for instance player selection for the team, some advanced analytical techniques, including integer programming and data envelopment analysis, have been used in the literature (see Sharp et al., 2011; Lemmer [7]). [8] applied factor analysis to World T20 data only. [9] is "A Study on Performance of Cricket Players using Factor Analysis Approach on IPL 9". [9] proposed a ranking of captains based on several parameters using Principal Component Analysis. [10] used factor analysis of IPL players only. [11] used a principal component-based computation for optimal lineup and batting order selection in one-day international cricket for Bangladesh. A review of the literature shows that researchers have carried out this work on the IPL and the World T20, which leaves a gap that this paper fills with an analysis of the Asian T20 leagues run by the national boards. For instance, [12] Valadkhani et al. (2008) used a factor analysis approach for international portfolio diversification. According to the authors' research, no one has analyzed Asian leagues like the PSL and IPL over the last four seasons, the PSL in particular. The limited literature on various aspects of cricket demonstrates a strong need to extend it. Statistical methods have been used to rank captains based on numerous factors. The weighted average approach was also used to rank captains based on the z-score of the team's performance, as well as the captain's performance as a batsman and bowler [13]. A multiple linear regression model has also been used to predict the match outcome while the match is in progress [14]. A comparative study of machine-learning methods was applied to predict cricket match outcomes using the opinions of crowds on social networks [15]. The rest of the paper is organized as follows. Section III describes the empirical statistical methodology employed in this paper. Section IV lists the steps involved in PCA. Section V describes the data and statistical analysis. Section VI introduces the IPL, Section VII introduces the PSL, and Section VIII gives concluding remarks.

III. EMPIRICAL STATISTICAL METHODOLOGY

PCA is a multivariate data analysis technique devised by Pearson [16] and [17]. This technique uses mathematical principles to convert correlated variables into a smaller number of variables named principal components. PCA extracts statistics from a dataset while keeping the important statistical information. It compresses the data and reduces its dimensions. PCA is significant because simply eliminating variables may cause information loss, whereas PCA performs an extraction that retains the important information, so that the dataset gives a clearer view without much loss of information. PCA is a technique for correlated data, and it reduces the number of variables. We aim to break up the set of correlated variables into a smaller number of uncorrelated variables, each a linear combination of the actual variables [18]. Principal Component Analysis is described in detail in Johnson and Wichern (2007) [19] and [20]. In this study, a brief introduction to PCA is presented. PCA is used to explain the covariance through a set of original variables and
a reduced set of variables. It is a technique used for dimensionality reduction. PCA is useful when the dataset has a number of useful variables with some reasonable redundancy among them. Here redundancy means that the variables are correlated with one another and, in some sense, measure the same attribute of player performance. PCA reduces these redundant variables to a smaller set that still reflects the original variables, and this reduced set is used to provide a summary measure of performance. Suppose we are given a random vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)$ consisting of $p$ random variables, having covariance matrix $\Sigma$ and eigenvalue-eigenvector pairs $(\mu_1, e_1), (\mu_2, e_2), \ldots, (\mu_p, e_p)$, where $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_p$. The $i$th principal component, say $L_i$, is defined as

$$L_i = e_i^{t}\alpha = e_{i1}\alpha_1 + e_{i2}\alpha_2 + \cdots + e_{ip}\alpha_p, \quad i = 1, 2, \ldots, p,$$

where $e_{ij}$ is the $j$th component of the eigenvector $e_i$. Each principal component is therefore a linear combination of the original variables. Furthermore, if $Z_i = b_i^{t}\alpha = b_{i1}\alpha_1 + b_{i2}\alpha_2 + \cdots + b_{ip}\alpha_p$ is any linear combination of these actual variables with a unit-length coefficient vector, then

$$\mathrm{Var}(L_1) = \mu_1 \ge \mathrm{Var}(Z_i), \qquad \mathrm{Cov}(L_i, L_j) = 0 \ \text{for } i \ne j.$$

These relations give us the hope that the principal components $L_i$ can aggregate the important signals contained in the actual variables $\alpha$ without redundancy and provide a summary measure for each principal component. Notice that the total variability $V_T$ satisfies

$$V_T = \mathrm{Var}(\alpha_1) + \mathrm{Var}(\alpha_2) + \cdots + \mathrm{Var}(\alpha_p) \tag{3.1}$$
$$V_T = \mathrm{Var}(L_1) + \mathrm{Var}(L_2) + \cdots + \mathrm{Var}(L_p) \tag{3.2}$$
$$V_T = \mu_1 + \mu_2 + \cdots + \mu_p \tag{3.3}$$

For the $j$th principal component, the percentage of total variance explained is $\frac{\mu_j}{\sum_{i=1}^{p}\mu_i} \times 100$. If a substantial percentage of the total variance is captured by the first few principal components, then this reduced set of new variables can be used in place of the actual variables without much loss of information. Before performing PCA, we have to make some adjustments to the variables. Without these adjustments, the principal components will not be accurate, because variables with extreme variance can distort the overall performance measure. Cricket data needs this fine-tuning; once it is done, the results are accurate and the goal is not affected [21].

IV. STEPS INVOLVED IN PRINCIPAL COMPONENT ANALYSIS

a. Standardization.
b. Covariance matrix computation.
c. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
d. Feature vector.
e. Recast the data along the principal component axes.

V. DATA & STATISTICAL ANALYSIS

To demonstrate the proposed Principal Component Analysis model, data is gathered from the ESPN website. For the batting analysis, only players who played at least four innings are included in our dataset, and for the bowling analysis, only players who have bowled at least four overs are included. After this rule is applied, the final dataset contains 157 batsmen and 135 bowlers for the IPL (2016-2019) and 110 batsmen and 62 bowlers for the PSL (2016-2019).

5.1. Batting Statistics

In this study, we consider eight important measures of batting statistics: total runs, highest score, batting average, balls faced, strike rate, number of 50's, number of 4's, and number of 6's. Descriptions of these parameters of batting performance are given in Table 1.

Table 1. Batting Statistics

SR No.  Batting Statistics    Description
1       Innings               Total number of innings a player played in all seasons, not less than 5
2       Total Runs (TR)       Sum of all runs in every match
3       Highest Score (HS)    The highest score by a batsman in one match during a tournament
4       Batting Average       The ratio R/M, where R denotes the number of runs scored and M denotes ...
5       Balls Faced (BF)      Total number of balls faced in all seasons of any league
6       Strike Rate           Strike rate is TR/BF
7       No. of 50's           Total number of 50's
8       No. of 4's            Total number of 4's
9       No. of 6's            Total number of 6's

5.2. Bowling Statistics

Five measures are taken in the bowling analysis: wickets, bowling average, bowling economy rate, bowling strike rate, and maiden overs. Sharp et al. (2011) and Lemmer (2011) used three bowling measures: the bowler's economy rate, bowling average, and bowling strike rate. Descriptions of these parameters of bowling performance are given in Table 2.
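
The five steps listed in Section IV can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Minitab procedure: the function name `pca_steps` and the synthetic input data are hypothetical, and the sample standard deviation (`ddof=1`) is an assumed choice for the standardization step.

```python
import numpy as np

def pca_steps(X):
    """Apply steps a-e to an (n_samples, n_variables) array X."""
    # a. Standardization: center each column and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # b. Covariance matrix of the standardized data
    #    (this equals the sample correlation matrix of X).
    C = np.cov(Z, rowvar=False)
    # c. Eigenvalues and eigenvectors of the symmetric matrix C.
    eigvals, eigvecs = np.linalg.eigh(C)
    # d. Feature vector: order the eigenvectors by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # e. Recast the data along the principal component axes.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Synthetic, correlated stand-in data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(50, 1)) for _ in range(4)])

eigvals, eigvecs, scores = pca_steps(X)
print(eigvals / eigvals.sum())  # share of total variance per component
```

Because the data are standardized first, the eigenvalues sum to the number of variables, which is why the eigenvalue columns of the result tables add up to 8 for batting and 5 for bowling.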
Table 2. Bowling Statistics

SR No.  Bowling Statistics    Description
1       Wickets               Total number of wickets in any league
2       Bowling Average       TR/W, where TR is the total runs conceded by a bowler and W is the total number of wickets taken by a bowler
3       Bowling Economy       TR/O, where TR is the total number of runs conceded by a bowler and O is the total number of overs bowled by a bowler
4       Bowler Strike Rate    TB/W, where TB is the total number of balls bowled by a bowler and W is the total number of wickets taken by a bowler
5       Maiden Overs          Number of overs in which no run is scored

VI. Introduction to IPL (Season 9-12)

The Indian Premier League (IPL) was introduced by the Board of Control for Cricket in India (BCCI). The league is played in April-May and has become a huge economic and entertainment forum. The IPL is the most-attended cricket league in the world, and 9-10 teams have participated in the tournament. Each team's owner chooses players at auction, mostly from within the country and a few from other countries. Every team chooses a maximum of four players from foreign countries.

6.1. Empirical Results of IPL

6.1.1. Results of Batting Performance.

To evaluate the batting performance of the players in the IPL, eight batting parameters are used: runs, highest individual score (HS), batting average (Ave), strike rate (SR), number of fours (4s), balls faced (BF), number of fifties (50), and number of sixes (6s). Figure 1 shows a matrix plot, which indicates noteworthy correlations among these measures. Runs and HS, Runs and BF, and Runs and 4s and 6s are strongly correlated. To examine batting performance, the PCA method, which is suited to handling correlated data, is adopted. The analysis uses these batting measures (Runs, HS, Ave, BF, SR, 50, 4s, and 6s) under the condition that only batsmen who played at least 4 matches in the IPL are considered; a total of 157 records remain after these adjustments to the dataset. An innings is counted only when the batsman actually bats. Because of the short format of the game, a selected player may not get to bat, and such an appearance is not counted as an innings. Similarly, a batsman who never batted is not included in our dataset. All the cricket variables are first standardized under the above rule; without this standardization, the results are not accurate. Then we apply principal component analysis. Table 3 shows the sample correlation matrix for the 157 batting vectors. To calculate the eigenvalue-eigenvector pairs of the sample correlation matrix, we use Minitab 19. Table 4 lists the eigenvalues; the first PC explains the most variation, 80.01 percent, against the first eigenvalue.

Fig. 1. Matrix plot of eight batting parameters

Table 3. Sample Correlation Matrix of Batting Statistics

VAR    RUNS   HS     AVE    BF     SR     50     4s     6s
Runs   1      0.835  0.895  0.993  0.368  0.94   0.976  0.889
HS     0.835  1      0.806  0.822  0.541  0.759  0.816  0.773
Ave    0.895  0.806  1      0.887  0.457  0.81   0.835  0.827
BF     0.993  0.822  0.887  1      0.321  0.93   0.977  0.845
SR     0.368  0.541  0.457  0.321  1      0.296  0.335  0.454
50     0.94   0.759  0.81   0.93   0.296  1      0.936  0.815
4s     0.976  0.816  0.835  0.977  0.335  0.936  1      0.802
6s     0.889  0.773  0.827  0.845  0.454  0.815  0.802  1
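
The eigenvalues and first-component loadings reported for IPL batting can be checked directly against the Table 3 correlation matrix. The sketch below is an illustration only: it uses NumPy rather than the Minitab 19 workflow named in the text, and the entries of `R` are the three-decimal values of Table 3, so small rounding differences from the published figures are to be expected.

```python
import numpy as np

# Table 3, rows and columns in the order Runs, HS, Ave, BF, SR, 50, 4s, 6s.
R = np.array([
    [1.000, 0.835, 0.895, 0.993, 0.368, 0.940, 0.976, 0.889],
    [0.835, 1.000, 0.806, 0.822, 0.541, 0.759, 0.816, 0.773],
    [0.895, 0.806, 1.000, 0.887, 0.457, 0.810, 0.835, 0.827],
    [0.993, 0.822, 0.887, 1.000, 0.321, 0.930, 0.977, 0.845],
    [0.368, 0.541, 0.457, 0.321, 1.000, 0.296, 0.335, 0.454],
    [0.940, 0.759, 0.810, 0.930, 0.296, 1.000, 0.936, 0.815],
    [0.976, 0.816, 0.835, 0.977, 0.335, 0.936, 1.000, 0.802],
    [0.889, 0.773, 0.827, 0.845, 0.454, 0.815, 0.802, 1.000],
])

# eigh returns eigenvalues in ascending order, so reverse both outputs.
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# The sign of an eigenvector is arbitrary; flip it so the loadings are positive.
pc1 = eigvecs[:, 0]
pc1 = pc1 if pc1.sum() > 0 else -pc1

print(np.round(eigvals, 4))  # compare with the eigenvalue column of Table 4
print(np.round(pc1, 3))      # compare with the PC1 column of Table 5
```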
The eigenvalue corresponding to this 80.01 percent in Table 4 is 6.4014, which is higher than 1; [22] recommends retaining the principal components whose corresponding eigenvalue is greater than 1. Table 5 shows the eigenvector loadings for all eight principal components (PC1-PC8). The eigenvalue-eigenvector pair for the first principal component is shown in Table 5. Figure 2 shows the scree plot for the batting statistics. The elbow of the scree plot is clearly at 2, which supports using the first principal component alone; it explains 80 percent of total variability. Hence, the result for batsmen is:

L1 = (0.39 ∗ Runs) + (0.352 ∗ HS) + (0.365 ∗ Ave) + (0.384 ∗ BF) + (0.189 ∗ SR) + (0.368 ∗ 50) + (0.378 ∗ 4s) + (0.358 ∗ 6s)

Table 4. Eigen Values and Corresponding Total Variability

Eigen Values    Total Variability
6.4014          80.0175
0.9039          11.29875
0.2316          2.895
0.2026          2.5325
0.1688          2.11
0.0758          0.9475
0.0145          0.18125
0.0014          0.0175

Table 5. Eigen Values and Eigen Vectors for Sample Correlation Matrix Batting Performance IPL

Variable  PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8
Runs      0.390  -0.138  -0.027  0.103   0.014   0.267   -0.230  -0.833
HS        0.352  0.197   0.544   -0.617  -0.373  -0.147  -0.020  -0.010
Ave       0.365  0.044   -0.227  -0.395  0.764   -0.211  0.167   0.021
BF        0.384  -0.189  0.089   0.089   0.122   0.363   -0.627  0.511
SR        0.189  0.906   0.020   0.363   0.074   0.039   -0.055  0.012
50        0.368  -0.219  0.058   0.445   -0.112  -0.771  -0.077  0.035
4s        0.378  -0.180  0.290   0.297   0.011   0.360   0.706   0.155
6s        0.358  0.050   -0.746  -0.162  -0.493  0.072   0.135   0.141

Fig. 2. Scree Plot For Batting Performance IPL

Batting Ranking of IPL

L1 is referred to as the first principal component; it is a weighted combination of all eight variables used. All of its coefficients are positive, so a higher value of L1 indicates better performance and a lower value of L1 indicates poor performance. This allows us to rank the players based on the first principal component. Table 6 lists the top 10 batsmen who played at least 4 matches in the IPL (2016-2019), ranked by the first principal component L1.

6.2. Result of Bowling Performance

Five bowling measures (wickets, bowler's economy rate, bowling average, bowling strike rate, and maiden overs) have been used to evaluate the bowling performance of the players in the IPL. The correlation structure of the five bowling variables is shown in Table 7. Figure 3 shows the performance variables in a matrix plot. Average and strike rate are very highly correlated, while other measures, such as maiden overs, show negative and weaker correlations. However, each of them characterizes a different type of bowler.
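
The derived bowling measures defined in Table 2 are simple ratios, and a short sketch makes the definitions concrete. The class name `BowlingFigures`, its field names, and the sample spell below are hypothetical; returning infinity for a bowler without wickets is an assumed convention, not something the paper specifies.

```python
import math
from dataclasses import dataclass

@dataclass
class BowlingFigures:
    runs_conceded: int   # TR in Table 2
    balls_bowled: int    # TB in Table 2
    wickets: int         # W in Table 2

    @property
    def overs(self) -> float:
        # O in Table 2, expressed as a decimal number of six-ball overs.
        return self.balls_bowled / 6

    @property
    def average(self) -> float:
        # Bowling average: runs conceded per wicket (TR/W).
        return self.runs_conceded / self.wickets if self.wickets else math.inf

    @property
    def economy(self) -> float:
        # Economy rate: runs conceded per over (TR/O).
        return self.runs_conceded / self.overs

    @property
    def strike_rate(self) -> float:
        # Strike rate: balls bowled per wicket (TB/W).
        return self.balls_bowled / self.wickets if self.wickets else math.inf

# Hypothetical figures: 120 runs conceded from 16 overs for 6 wickets.
spell = BowlingFigures(runs_conceded=120, balls_bowled=96, wickets=6)
print(spell.average, spell.economy, spell.strike_rate)  # 20.0 7.5 16.0
```

For all three ratios a smaller value is better for the bowler, which is consistent with the positive loadings these measures receive in the bowling component and with the rule that a lower bowling L1 indicates better performance.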
r
Table 6. Top ten Batsmen of IPL Ranked by m
a
First Principal Component, L1
R B M I N R H A B S 5 4 6 L 1 S 5 5 6 1 1 1 1 1 6 1 5 1
a a a n O u S v f R 0 s s 1 0 V 5 5 4 0 2 0 3 1 9 1
n t t n n e 6 2 9 8 4 0 8
k s c s s S 0 . 2 . 8.
m h a 2 9 8
a e m 0 3 6
n s s
o
1 V 5 5 7 2 1 4 1 1 1 2 8 1
n
4 4 2 1 8 6 4 7 0 0 6
K 7 3 . 0 2 4 9
o 5 3 1 . 8. PCA out of 135 bowlers are included from these four
h 0 3 seasons and only those are selected that bowl at least
li 9 0
four overs. Table 7 shows the sample correlation matrix
2 D 4 4 8 2 1 4 1 1 2 2 7 1
A 3 3 1 2 7 4 4 1 0 8 6 for the 135 bowling observations. In Table 7, ordered
8 6 . 9 6 8 7 Eigenvalues and their corresponding total variability.
W 1 0 3 . 2. While Table 8 shows the factors for all five principal
a 1 0 2 components (PC1 – PC5). The Eigenvalue Eigenvector
r 8 9
m pair for the first principal component is highlighted in
a Table 8.
3 S 6 6 9 1 9 3 1 1 1 2 4 1
D 3 3 9 7 7 5 2 6 2 2 5
h 9 . 5 8 7 5
a 8 0 2 . 3.
w 8 7 9
a 3 2
n
4 A 5 4 9 1 1 1 1 1 1 1 1 1
B 0 9 8 2 2 1 6 8 3 0 3
2 9 8 3 1 9 9 3
d 5 . 2 . 6.
e 7 2 6
V 3 1 8
il
h
e
r
s
5 S 6 6 8 1 8 1 1 1 1 1 4 1
K 1 1 6 4 6 2 3 3 7 4 2
6 1 6 1 2 9
R 9 . 9 . 0.
a 2 5 0
i 1 2 1
n
a
6 A 5 5 6 1 1 1 1 1 1 1 3 1
M 9 7 6 0 3 3 2 0 7 2 2
2 5 1 0 5 3 8 Fig.3. Matrix Plot for five Bowling parameters
R 5 . 0 5.
a 5 9
h 2 5
a
n
a Table 7. Sample Correlation Matrix for Bowling Performance
7 R 5 5 6 1 1 1 1 1 1 1 9 1 Variable Wkts Ave Econ SR Maiden
P 4 4 7 2 2 0 6 1 5 4 2
P 3 8 5 6 2 2 7
Over
a 6 7 . 0. Wkts 1 - - - 0.52
n 6 5 0.425 0.360 0.376
t 9 9 Ave - 1 0.268 0.982 -0.221
8 K 4 4 8 1 1 1 1 1 1 1 7 1 0.425
L 2 0 6 0 6 1 4 6 5 3 2
R 4 0 2 2 6 2 7 Econ -0.36 0.286 1 0.099 0.279
a 9 . 5 . 0. SR -0.36 0.982 0.099 1 -0.181
h 6 5 5 Maiden 0.52 - - - 1
u 9 7 9
l 9 Over 0.221 0.279 0.181
9 R 6 5 8 1 9 1 1 1 1 1 4 1
G 0 9 5 4 4 1 2 2 5 7 2
1 6 7 9 7 4
S 3 . 1 . 1.
h 5 2 4
a 7 0 4
Table 8. Eigen Values and Eigen Vectors for Sample Correlation Matrix

Variable     PC1     PC2     PC3     PC4     PC5
Wkts         -0.461  0.333   0.242   0.786   0.001
Ave          0.553   0.398   0.114   0.120   0.713
Econ         0.300   -0.466  0.814   0.123   -0.124
SR           0.519   0.492   -0.032  0.107   -0.690
Maiden Over  -0.349  0.522   0.514   -0.584  -0.005

The first principal component L1 for bowlers is defined as:

L1 = (Wkt ∗ (−0.461)) + (Ave ∗ 0.553) + (Eco ∗ 0.3) + (SR ∗ 0.519) + (Mnd ∗ (−0.349))

In Table 9, the first eigenvalue accounts for about 51 percent of total variability. Figure 4 shows the scree plot, in which the elbow at 2 indicates that the first principal component gives enough evidence for ranking bowlers. Its eigenvalue is 2.5597, the largest in Table 9; the second eigenvalue, 1.2541, also exceeds 1.

Table 9. Ordered Eigen Values and Corresponding Percentages of Total Variability

Eigen Values    Total Variability
2.5597          51.194
1.2541          25.082
0.7405          14.81
0.4423          8.846
0.0034          0.068

Fig.4. Scree Chart of IPL Bowling

Bowling Ranking of IPL

In bowling, a lower value of L1 indicates better performance and a higher value of L1 indicates poor performance; the bowling ranking is generated on this basis. Table 10 lists the top ten IPL bowlers (2016-2019) ranked by the first principal component L1.

Table 10. Top Ten Bowlers of IPL Ranked by First Principal Component, L1

Rank  Player          Mat  Inns  Overs  Maiden  Wkts  Ave     Econ   SR      L1
1     B Kumar         58   58    223.3  5       71    23.577  7.496  18.870  -9.394
2     JJ Bumrah       60   60    226.6  3       71    22.915  7.180  19.141  -9.013
3     YS Chahal       54   54    191.6  2       65    22.815  7.740  17.686  -6.544
4     Imran Tahir     39   39    147.6  1       55    20.472  7.628  16.101  -3.737
5     Rashid Khan     46   46    182    3       55    21.690  6.554  19.854  -2.135
6     UT Yadav        48   48    164.9  1       55    26.054  8.690  17.989  0.647
7     Sandeep Sharma  50   50    184.4  1       56    25.875  7.857  19.757  0.755
8     MJ McClenaghan  44   44    165.2  1       53    26.735  8.577  18.70   2.282
9     AJ Tye          26   26    99     0       39    21.076  8.303  15.230  4.072
10    K Rabada        18   18    68.2   0       31    17.935  8.152  13.2    4.9238

VII. INTRODUCTION TO PSL (I TO IV)

The Pakistan Super League (PSL) is a Twenty20 cricket league owned by the Pakistan Cricket Board. It is a commercial professional league similar to the IPL and BPL, which started in Sep-2015. The league initially had five teams; it now comprises six. Each team is owned by a franchise. The PSL is scheduled for February and March of every year on the world calendar. Each team plays the others twice. The first four teams on the points table qualify for the play-offs, and the winners of the play-offs play the final match. For the batting and bowling analysis, the same strategy is followed as discussed with Table 1 and Table 2. The only difference is that we have different players; the dataset comprises 110 batsmen and 62 bowlers. We have gathered data for seasons 1 to 4 from the ESPN website.

EMPIRICAL RESULTS OF PSL

7.1. Result of Batting Performance

As above, Principal Component Analysis is applied to the eight measures of the batting dataset. Figure 5 shows a matrix plot, which indicates noteworthy correlations among these measures, for example between Runs and HS. The sample correlation matrix is shown in Table 11, and the ordered eigenvalues with their corresponding percentage of total variability are shown in Table 12.

Fig.5. Matrix plots of eight batting parameters of PSL

Table 12 gives the ordered eigenvalues and the percentage of total variability attributed to each eigenvalue. It shows that the first PC explains the most variation, 78.4%, against the first eigenvalue. So, we take the first principal component to rank the batsmen.
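
Ranking by the first principal component amounts to computing a weighted sum of each player's statistics and sorting the result. The sketch below uses the PC1 loadings reported for PSL batting in Table 13; the player names and statistics are hypothetical, and standardizing the inputs before scoring is an assumption, since the text does not state unambiguously whether the published L1 values were computed from standardized or raw statistics.

```python
import numpy as np

VARIABLES = ["Runs", "HS", "Ave", "BF", "SR", "50", "4s", "6s"]
# PC1 loadings for PSL batting as reported in Table 13.
PC1 = np.array([0.395, 0.315, 0.351, 0.388, 0.263, 0.357, 0.384, 0.356])

def rank_by_first_pc(names, stats, loadings, higher_is_better=True):
    """Return (name, L1 score) pairs ordered from best to worst."""
    # Assumed step: standardize each column before applying the loadings.
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0, ddof=1)
    scores = z @ loadings
    order = np.argsort(-scores if higher_is_better else scores)
    return [(names[i], float(scores[i])) for i in order]

# Hypothetical players, one row per player in the order of VARIABLES.
names = ["Player A", "Player B", "Player C"]
stats = np.array([
    [900, 95, 34.0, 650, 138.0, 7, 90, 35],
    [450, 60, 25.0, 380, 118.0, 2, 40, 12],
    [700, 80, 30.0, 520, 134.0, 5, 70, 25],
], dtype=float)

for name, score in rank_by_first_pc(names, stats, PC1):
    print(f"{name}: {score:.3f}")
```

For bowling, where a lower L1 indicates better performance, the same function can be called with `higher_is_better=False` and the bowling loadings.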
Table 13 shows the eigenvector loadings for all eight principal components (PC1-PC8). The eigenvalue-eigenvector pair for the first principal component is shown in Table 13.

Table 11. Sample Correlation Matrix of Batting Performance PSL

Variable  RUNS   HS     AVE    BF     SR     50     4s     6s
RUNS      1      0.744  0.847  0.986  0.616  0.893  0.968  0.888
HS        0.744  1      0.638  0.719  0.325  0.713  0.741  0.675
AVE       0.847  0.638  1      0.857  0.688  0.687  0.783  0.725
BF        0.986  0.719  0.857  1      0.589  0.885  0.959  0.819
SR        0.616  0.325  0.688  0.589  1      0.402  0.533  0.627
50        0.893  0.713  0.687  0.885  0.402  1      0.927  0.735
4s        0.968  0.741  0.783  0.959  0.533  0.927  1      0.812
6s        0.888  0.675  0.725  0.819  0.627  0.735  0.812  1

Table 12. Eigen Values and Corresponding Total Variability

Eigen Values    Total Variability
6.2738          78.4127
0.8027          10.0325
0.3572          4.464442
0.2761          3.450819
0.1771          2.213473
0.0799          0.998625
0.0299          0.373703
0.0034          0.042495

Table 13. Eigen Values and Eigen Vectors for Sample Correlation Matrix Batting Performance PSL

Variable  PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8
Runs      0.395  -0.033  0.147   -0.061  -0.110  -0.279  -0.196  0.830
HS        0.315  -0.396  -0.817  0.054   0.264   -0.054  -0.043  -0.011
Ave       0.351  0.253   -0.131  0.649   -0.484  0.337   0.161   0.001
BF        0.388  -0.047  0.222   0.162   -0.086  -0.456  -0.547  -0.507
SR        0.263  0.797   -0.121  -0.086  0.521   0.008   -0.032  -0.015
50        0.357  -0.324  0.382   -0.011  0.381   0.669   -0.169  -0.016
4s        0.384  -0.160  0.261   0.008   0.163   -0.332  0.776   -0.141
6s        0.356  0.096   -0.128  -0.734  -0.484  0.201   0.050   -0.181

Fig.6. Scree Chart of PSL Batting
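
The percentage column of Table 12 follows from the eigenvalues through the ratio of each eigenvalue to their sum, and the retention rule cited from [22] keeps components whose eigenvalue exceeds 1. A minimal sketch using the Table 12 eigenvalues is given below; the variable names are illustrative, and because the published eigenvalues are rounded to four decimals the recomputed percentages can differ slightly from the printed ones.

```python
# Ordered eigenvalues from Table 12 (PSL batting).
eigenvalues = [6.2738, 0.8027, 0.3572, 0.2761, 0.1771, 0.0799, 0.0299, 0.0034]

# Percentage of total variability: 100 * mu_j / sum_i mu_i.
total = sum(eigenvalues)
percent = [100 * ev / total for ev in eigenvalues]

# Retention rule: keep components whose eigenvalue is greater than 1.
retained = [j + 1 for j, ev in enumerate(eigenvalues) if ev > 1.0]

for j, (ev, pct) in enumerate(zip(eigenvalues, percent), start=1):
    print(f"PC{j}: eigenvalue={ev:.4f}, share={pct:.2f}%")
print("Retained components:", retained)
```

Applied to the bowling eigenvalues of Table 9, the same rule would retain two components, since both 2.5597 and 1.2541 exceed 1.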
Figure 6 shows the scree plot of the ordered eigenvalues for the PSL batting statistics. The line bends at the second component, which supports retaining only the first principal component; it explains 78.4 percent of the total variability:

L1 = (0.395 ∗ Runs) + (0.315 ∗ HS) + (0.351 ∗ Ave) + (0.388 ∗ BF)
     + (0.263 ∗ SR) + (0.357 ∗ 50) + (0.384 ∗ 4s) + (0.356 ∗ 6s)

Batting Ranking of PSL

We refer to L1 as the first principal component. It is a weighted sum of the eight batting variables, with the PC1 coefficients as weights. A higher value of L1 indicates better performance and a lower value indicates poorer performance, so we rank the players by L1 in descending order. Table 14 lists the top 10 batsmen who played at least 4 matches in the PSL (2016-2019), ranked by the first principal component L1. The top ten PSL bowlers are listed in Table 15.

Table 14. Top Batsmen of PSL Ranked by First Principal Component, L1

Rank  Player           Mat  Inns  NO  Runs  HS   Ave      BF   SR      100  50  0  4s   6s  L1
1     Kamran Akmal     47   46    2   1286  107  28.7375  956  523.27  2    9   8  127  67  1136.14
2     SR Watson        37   37    4   1114  91   33.035   825  534.15  0    7   1  97   65  1003.75
3     Babar Azam       35   34    2   1043  77   27.62    904  407.73  0    9   4  108  18  955.012
4     Ahmed Shehzad    38   36    2   1016  99   31.7325  833  487.95  0    9   3  104  29  948.651
5     Shoaib Malik     39   36    8   849   65   31.63    717  472.83  0    4   2  47   33  800.706
6     Umar Akmal       32   30    5   833   93   37.565   604  518.63  0    7   3  66   42  785.062
7     Mohammad Hafeez  35   33    2   690   77   21.272   605  473.46  0    4   3  69   26  700.711
8     DR Smith         28   25    4   701   73   31.937   611  428.32  0    5   1  67   31  699.365
9     CS Delport       26   25    2   682   117  25.255   519  552.42  1    4   4  66   26  697.796
10    Sarfaraz Ahmed   43   34    10  720   56   30.125   579  496.03  0    3   2  58   14  696.048

7.2. Results of Bowling Performance

The methodology applied to the IPL was also applied here. Without going into detail, we present the final top ten bowlers of the PSL using five performance measures. The full analysis from which the bowling ranking is derived can be provided to reviewers on request; the resulting L1 ranking is shown in Table 15.
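The scoring step behind an L1 ranking can be sketched as follows, using the batting case because its PC1 coefficients are given explicitly in the text. The sketch multiplies each statistic by its coefficient, sums the products, and sorts players by the result. The three rows are taken from Table 14, while the names `PC1_WEIGHTS`, `players`, and `l1_score` are illustrative. The bowling coefficients are not reported in this section, so the bowling ranking is not reproduced.

```python
# PC1 coefficients from the L1 equation for PSL batting.
PC1_WEIGHTS = {
    "Runs": 0.395, "HS": 0.315, "Ave": 0.351, "BF": 0.388,
    "SR": 0.263, "50": 0.357, "4s": 0.384, "6s": 0.356,
}

# Three rows from Table 14 (values as tabulated there).
players = {
    "Babar Azam": {"Runs": 1043, "HS": 77, "Ave": 27.62, "BF": 904,
                   "SR": 407.73, "50": 9, "4s": 108, "6s": 18},
    "Kamran Akmal": {"Runs": 1286, "HS": 107, "Ave": 28.7375, "BF": 956,
                     "SR": 523.27, "50": 9, "4s": 127, "6s": 67},
    "SR Watson": {"Runs": 1114, "HS": 91, "Ave": 33.035, "BF": 825,
                  "SR": 534.15, "50": 7, "4s": 97, "6s": 65},
}

def l1_score(stats, weights=PC1_WEIGHTS):
    """Weighted sum of the batting statistics using the PC1 coefficients."""
    return sum(weights[name] * stats[name] for name in weights)

scores = {name: l1_score(stats) for name, stats in players.items()}

# Higher L1 means better batting performance, so sort in descending order.
ranking = sorted(scores, key=scores.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(scores[name], 2))
```

With these inputs the recomputed scores agree with the L1 column of Table 14 to within the rounding of the three-decimal coefficients.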
Table 15. Top Ten Bowlers of PSL Ranked by First Principal Component, L1

Rank  Player          Mat  Inns  Overs  Mdns  Runs  Wkts  Ave       Econ      SR        L1
1     Wahab Riaz      45   44    166.2  0     1130  65    17.38462  6.799037  15.34154  -7.26759
2     Faheem Ashraf   24   24    84.1   1     632   39    16.20513  7.514863  12.93846  2.303702
3     Mohammad Sami   36   36    126.2  1     875   42    20.83333  6.933439  18.02857  6.967331
4     Usman Shinwari  27   26    89.1   0     758   35    21.65714  8.507295  15.27429  9.677215
5     Mohammad Irfan  56   54    187.2  3     1459  51    28.60784  7.793803  22.02353  10.35648
6     Mohammad Nawaz  43   43    150.4  1     1044  43    24.27907  6.941489  20.98605  10.61421
7     Mohammad Amir   37   36    131.5  4     922   39    23.64103  7.011407  20.23077  10.95539
8     Shahid Afridi   37   36    123.4  5     814   35    23.25714  6.596434  21.15429  12.74607
9     Rahat Ali       18   18    65.7   0     498   25    19.92     7.579909  15.768    13.1071
10    Zafar Gohar     10   9     26.2   0     206   13    15.84615  7.862595  12.09231  13.67437

VIII. CONCLUSION AND FUTURE WORK

Determining a player's performance is an interesting problem in any sport. It is especially significant in competitive sports like cricket, where a player's performance in terms of runs and wickets depends on many factors. Franchises invest a lot of money in these leagues and expect a return. The performance of bowlers and batsmen affects the result of a match and, eventually, the profit of the franchise. Here, we have shown by analysis that Principal Component Analysis is useful for correlated, multivariate data. Using IPL and PSL data, we have quantified players' past individual contributions to their teams and reported the respective top ten rankings.

In the future, we plan to propose a cluster-based prediction for a team, for example to study the effect on a team's ranking if the team is moved from one region to another. Similar studies can also be used to rank players in other sports. The h-index and PageRank algorithms have also been extended to rank cricket teams. This statistical analysis can also be applied to football leagues using a separate principal component analysis.
