Statistical Analysis of Cricket Leagues Using Principal Component Analysis
net/publication/354721618
All content following this page was uploaded by Sheharyar KHAN Ktk on 21 September 2021.
0.0758 0.9475
0.0145 0.18125
0.0014 0.0175
Fig.4. Scree Chart of IPL Bowling

Table 10 (continued). Bowling Ranking of IPL
Rank  Player
5     [...] Khan
6     U T Yadav
7     Sandeep Sharma
8     M J McClenaghan
9     A J Tye
10    K Rabada
(The numeric columns of these rows are not legible in this copy.)

In bowling, a lower value of L1 indicates better performance and a higher value of L1 indicates poorer performance, and the bowling ranking is generated on that basis. Table 10 lists the top ten bowlers of the IPL (2016-2019) ranked by the first principal component L1.

VII. INTRODUCTION TO PSL (I TO IV)

The Pakistan Super League (PSL) is a Twenty20 cricket league owned by the Pakistan Cricket Board. It is a commercial professional league similar to the IPL and BPL, and it started in September 2015. The league initially had five teams and now comprises six, each owned by a franchise. On the world calendar, the PSL is scheduled for February and March of every year. Each team plays every other team twice. The top four teams in the points table qualify for the play-offs, and the winners of the play-offs contest the final. For the batting and bowling analysis, a strategy similar to that discussed in Table 1 and Table 2 is followed. The only difference is the set of players: the data set comprises 110 batsmen and 62 bowlers. Data for seasons 1 to 4 were gathered from the ESPN website.

EMPIRICAL RESULTS OF PSL

Fig.5. Matrix plots of eight batting parameters of PSL
7.1. Result of Batting Performance
As in the IPL analysis above, Principal Component Analysis is applied to the eight measures of the batting dataset. Figure 5 shows the matrix plot, which indicates noteworthy correlations among these measures, for example between Runs and HS. The eigenvalues and eigenvectors below are those of the sample correlation matrix shown in Table 11. The ordered eigenvalues and the percentage of total variability attributed to each are shown in Table 12.
The first PC explains the most variation, 78.4% of the total, corresponding to the first eigenvalue. So, we take the first principal component to rank the batsmen.
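As a rough check of the eigen-analysis described above, the sketch below decomposes the sample correlation matrix of Table 11 with NumPy and reports the share of total variability carried by each component. The matrix entries are copied from Table 11. The function name `pca_from_correlation` and the sign convention (each eigenvector is flipped so that its entries sum to a positive number) are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

# Order of the eight PSL batting measures used in Table 11.
LABELS = ["Runs", "HS", "Ave", "BF", "SR", "50", "4s", "6s"]

# Sample correlation matrix of PSL batting performance (values from Table 11).
R = np.array([
    [1.000, 0.744, 0.847, 0.986, 0.616, 0.893, 0.968, 0.888],
    [0.744, 1.000, 0.638, 0.719, 0.325, 0.713, 0.741, 0.675],
    [0.847, 0.638, 1.000, 0.857, 0.688, 0.687, 0.783, 0.725],
    [0.986, 0.719, 0.857, 1.000, 0.589, 0.885, 0.959, 0.819],
    [0.616, 0.325, 0.688, 0.589, 1.000, 0.402, 0.533, 0.627],
    [0.893, 0.713, 0.687, 0.885, 0.402, 1.000, 0.927, 0.735],
    [0.968, 0.741, 0.783, 0.959, 0.533, 0.927, 1.000, 0.812],
    [0.888, 0.675, 0.725, 0.819, 0.627, 0.735, 0.812, 1.000],
])


def pca_from_correlation(corr):
    """Return eigenvalues (descending), eigenvectors (columns), variance shares."""
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    values, vectors = np.linalg.eigh(corr)
    order = np.argsort(values)[::-1]
    values = values[order]
    vectors = vectors[:, order]
    # Eigenvector signs are arbitrary; flip each column so its entries sum > 0
    # (illustrative convention only).
    signs = np.sign(vectors.sum(axis=0))
    signs[signs == 0] = 1.0
    vectors = vectors * signs
    share = values / values.sum()
    return values, vectors, share


eigenvalues, eigenvectors, share = pca_from_correlation(R)

for k, (value, fraction) in enumerate(zip(eigenvalues, share), start=1):
    print(f"PC{k}: eigenvalue = {value:.4f}, share = {100 * fraction:.2f}%")

# Loadings of the first principal component, one per batting measure.
for label, loading in zip(LABELS, eigenvectors[:, 0]):
    print(f"{label}: {loading:.3f}")
```

With the Table 11 values, the first share should land close to the 78.4% reported in the text, and the first column of loadings should be close to the PC1 weights used for ranking.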
Table 13 shows the eigenvector loadings for all eight principal components (PC1 to PC8), including the eigenvector paired with the first eigenvalue.

Table 11. Sample Correlation Matrix of Batting Performance PSL
Variable  Runs    HS      Ave     BF      SR      50      4s      6s
Runs      1       0.744   0.847   0.986   0.616   0.893   0.968   0.888
HS        0.744   1       0.638   0.719   0.325   0.713   0.741   0.675
Ave       0.847   0.638   1       0.857   0.688   0.687   0.783   0.725
BF        0.986   0.719   0.857   1       0.589   0.885   0.959   0.819
SR        0.616   0.325   0.688   0.589   1       0.402   0.533   0.627
50        0.893   0.713   0.687   0.885   0.402   1       0.927   0.735
4s        0.968   0.741   0.783   0.959   0.533   0.927   1       0.812
6s        0.888   0.675   0.725   0.819   0.627   0.735   0.812   1

Table 13. Eigenvectors of the Sample Correlation Matrix of Batting Performance PSL
Variable  PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
Runs      0.395   -0.033  0.147   -0.061  -0.110  -0.279  -0.196  0.830
HS        0.315   -0.396  -0.817  0.054   0.264   -0.054  -0.043  -0.011
Ave       0.351   0.253   -0.131  0.649   -0.484  0.337   0.161   0.001
BF        0.388   -0.047  0.222   0.162   -0.086  -0.456  -0.547  -0.507
SR        0.263   0.797   -0.121  -0.086  0.521   0.008   -0.032  -0.015
50        0.357   -0.324  0.382   -0.011  0.381   0.669   -0.169  -0.016
4s        0.384   -0.160  0.261   0.008   0.163   -0.332  0.776   -0.141
6s        0.356   0.096   -0.128  -0.734  -0.484  0.201   0.050   -0.181

Fig.6. Scree Chart of PSL Batting
Figure 6 shows the scree plot of the ordered eigenvalues for the batting statistics, as discussed in the analysis above. The line bends at the second component, which supports using the first PC alone; it explains 78.4 percent of the total variability:

L1 = (0.395 ∗ Runs) + (0.315 ∗ HS) + (0.351 ∗ Ave) + (0.388 ∗ BF) + (0.263 ∗ SR) + (0.357 ∗ 50) + (0.384 ∗ 4s) + (0.356 ∗ 6s)

Batting Ranking of PSL

We refer to L1 as the first principal component. It is a weighted combination of the eight batting measures, with the PC1 loadings of Table 13 as weights. A higher value of L1 indicates better performance and a lower value of L1 indicates poorer performance, and we rank the players on that basis. Table 14 lists the top 10 batsmen who played at least 4 matches in the PSL (2016-2019), ranked by the first principal component L1. The top ten PSL bowlers are given in Table 15.

Table 14. Top Batsmen of PSL Ranked by First Principal Component, L1
Rank  Player           L1
1     Kamran Akmal     1136.14
2     S R Watson       1003.75
3     Babar Azam       955.012
4     Ahmed Shehzad    948.651
5     Shoaib Malik     800.706
6     Umar Akmal       785.062
7     Mohammad Hafeez  700.711
8     D R Smith        699.365
9     C S Delport      697.796
10    Sarfaraz Ahmed   696.048
(The remaining columns of this table, Mat, Inns, NO, Runs, HS, Ave, BF, SR, 100, 50, 4s and 6s, are not legible in this copy.)

7.2. Results of Bowling Performance

The methodology applied to the IPL was also applied here. Without going into detail, we present the final top ten bowlers of the PSL, obtained from five performance measures. The full analysis from which the bowling ranking is derived can be provided to reviewers on request; the resulting L1 values are shown in Table 15.
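The ranking step used for both batting and bowling (score each player with L1, then sort) can be sketched as follows. The weights are the PC1 batting loadings from the L1 equation in Section 7.1. The bowling loadings are not listed in this section, so bowling is represented only by the `higher_is_better=False` switch, which sorts in ascending order because a lower L1 is better for bowlers. The function names, player names and stat lines are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# PC1 loadings of the eight PSL batting measures, taken from the L1 equation.
BATTING_WEIGHTS: Dict[str, float] = {
    "Runs": 0.395, "HS": 0.315, "Ave": 0.351, "BF": 0.388,
    "SR": 0.263, "50": 0.357, "4s": 0.384, "6s": 0.356,
}


def l1_score(stats: Dict[str, float],
             weights: Dict[str, float] = BATTING_WEIGHTS) -> float:
    """Weighted sum of a player's measures using the PC1 loadings."""
    return sum(weight * stats[name] for name, weight in weights.items())


def rank_players(players: Dict[str, Dict[str, float]],
                 weights: Dict[str, float] = BATTING_WEIGHTS,
                 higher_is_better: bool = True) -> List[Tuple[str, float]]:
    """Return (player, L1) pairs sorted from best to worst."""
    scored = [(name, l1_score(stats, weights)) for name, stats in players.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=higher_is_better)


# Hypothetical stat lines, not taken from the paper's data set.
players = {
    "Player B": {"Runs": 300, "HS": 60, "Ave": 25.0, "BF": 260,
                 "SR": 115.0, "50": 1, "4s": 30, "6s": 10},
    "Player A": {"Runs": 500, "HS": 80, "Ave": 30.0, "BF": 400,
                 "SR": 125.0, "50": 3, "4s": 50, "6s": 20},
}

for position, (name, score) in enumerate(rank_players(players), start=1):
    print(position, name, round(score, 2))
```

For a bowling ranking, the same functions would be called with the bowling loadings as `weights` and `higher_is_better=False`.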
Table 15. Top Ten Bowlers of PSL Ranked by First Principal Component, L1
Rank  Player          Mat  Inns  Overs  Mdns  Runs  Wkts  Ave       Econ      SR        L1
1     Wahab Riaz      45   44    166.2  0     1130  65    17.38462  6.799037  15.34154  -7.26759
2     Faheem Ashraf   24   24    84.1   1     632   39    16.20513  7.514863  12.93846  2.303702
3     Mohammad Sami   36   36    126.2  1     875   42    20.83333  6.933439  18.02857  6.967331
4     Usman Shinwari  27   26    89.1   0     758   35    21.65714  8.507295  15.27429  9.677215
5     Mohammad Irfan  56   54    187.2  3     1459  51    28.60784  7.793803  22.02353  10.35648
6     Mohammad Nawaz  43   43    150.4  1     1044  43    24.27907  6.941489  20.98605  10.61421
7     Mohammad Amir   37   36    131.5  4     922   39    23.64103  7.011407  20.23077  10.95539
8     Shahid Afridi   37   36    123.4  5     814   35    23.25714  6.596434  21.15429  12.74607
9     Rahat Ali       18   18    65.7   0     498   25    19.92     7.579909  15.768    13.1071
10    Zafar Gohar     10   9     26.2   0     206   13    15.84615  7.862595  12.09231  13.67437

VIII. CONCLUSION AND FUTURE WORK

Determining a player's performance is an important task in any sport. It is especially significant in commercial sports such as cricket, where the outcome depends on players' runs and wickets, which in turn involve many factors. Franchises invest large sums in these leagues and expect returns. The performance of bowlers and batsmen affects the result of a match and, eventually, the profit of the franchise. Here, we have shown by analysis that Principal Component Analysis is useful for correlated, multivariate data. Using IPL and PSL data, we have summarized players' past individual contributions to their teams and produced the respective top ten rankings.

In the future, we plan to propose a cluster-based prediction for a team, and to study the effect on a team's ranking when the team is moved from one region to another. Similar studies can be used to rank players in other sports. The h-index and PageRank algorithms have also been extended to rank cricket teams. The same statistical analysis could be applied to football leagues using a different set of principal components.