
Cricket Statistical Analysis Report

STATISTICAL ANALYSIS AND CRICKET

Abdul Rehman Amiwala

Contents
INTRODUCTION
BABAR AZAM THE NUMBER 1
DESCRIPTIVE STATISTICS
Mean
Standard Deviation
Coefficient of variation (σ/µ)
Box and whisker plot
Limitations to this report
Conclusion
HYPOTHESIS TESTING
RIZWAN VS BABAR IN T20
BABAR VS MALAN
BABAR VS KOHLI ODI
ANOVA TEST
REGRESSION ANALYSIS
REGRESSION ANALYSIS OF WICKETS TAKEN IN THE CALENDAR YEAR IN RELATION WITH NUMBER OF MONTHS TO THEIR DEBUT AND NUMBER OF MATCHES PLAYED
MULTI REGRESSION

INTRODUCTION
Cricket is a sport played worldwide that was first played in the 16th century. Having
originated in south-east England, it became England's national sport in the 18th century
and has developed globally through the 19th and 20th centuries.
Cricket is a game of numbers, where victory and defeat can ultimately come down to the
irreducible margin of a single run. Yet, from time to time, numbers tell their own story,
write their own poetry and make their own painting about cricket. Some numbers in the
game are indeed the stuff of legend. The number 99.94 probably does not need any
introduction, particularly to cricket fans who were born before the turn of the millennium.
Hundred - a hundred times - is what Sachin Tendulkar fashioned during an international
career that spanned 24 years. Mt. 800 is the wicket peak atop which Muttiah Muralidaran
sits. 952 for 6 was what Jayasuriya and co. took off India on a road repurposed as a cricket
pitch in Colombo (after India had declared their own innings closed on a forgotten 500-odd).
And 400* is Test cricket's highest individual score, a record Brian Lara wrested
back from Matthew Hayden, as the latter had predicted he would.
But these numbers get more interesting when examined through statistical analysis. In
this report we will statistically analyze some interesting debates and topics in
modern-day cricket:

1: Is Babar Azam rightly the number one T20I and ODI batsman in the world?

2: Regression analysis of wickets taken in a calendar year in relation to number of
months since debut and number of matches played

BABAR AZAM THE NUMBER 1


1: Is Babar Azam rightly the number one T20I and ODI batsman in the world?

Every generation has witnessed and enjoyed the best batsman and player of its time.
For 21st-century Pakistan cricket fans, however, there was a void: having heard
about Hanif Mohammad and Imran Khan from their grandparents, and about Miandad and
Zaheer Abbas from their fathers, they were looking for a player of their own to look forward to. The likes
of Ahmed Shehzad and Umar Akmal came, but no one had the consistency or the vision
needed.
Then in 2016, from the outskirts of Lahore, a player named Babar Azam entered the big
stage. He had a mission to become the best, not just in Pakistan but in the world, and
with this goal in sight he started a journey worth remembering, breaking every record
in his way and scoring runs for fun with drives so aesthetically pleasing that the world watches
them in awe. The world started recognising and respecting him for that. He is there with
a goal to become the best, and recently he became the number 1 batsman in ODIs and
T20Is. While his fans celebrated, some critics questioned his rankings.
So here we use numbers and statistics to check whether the critics' claims are right or wrong.
We will be using:
1. Mean
2. Standard deviation
3. Coefficient of variation
4. Skewness
5. Hypothesis testing

Descriptive statistics
T20                 Babar Azam       Virat Kohli      Muhammad Rizwan  David Malan      Jos Buttler
Total               1135             1168             1301             989              1158
Mean                37.83333333      38.93333333      43.36666667      32.96666667      38.6
Standard Error      5.227777269      4.14240754       4.488900744      4.390345556
Median              40               34               39.5             24               28.5
Mode                51               9                0                1                2
Standard Deviation  30.49175506      30.49175506      30.49175506      30.49175506      30.49175506
Sample Variance     958.0747126      873.3057471      1015.067816      785.8264368      869.9724138
Kurtosis            0.1229428799     -1.28902471      -1.274221958     0.45596626       -1.098072261
CV                  0.6657295624     0.3211080426     0.2261962096     1.046767571      0.4040857497
Skewness            0.660754019      0.660754019      0.660754019      0.660754019      0.660754019
Range               122              94               104              102              101
Minimum             0                0                0                1                0
Maximum             122              94               104              103              101
Count               30               30               30               30               30

ODI STATS           B Azam           V Kohli          S Smith          K Williamson     J Root
Total               1803             1592             1247             1341             1213
Mean                60.1             53.06666667      41.56666667      44.7             40.43333333
Standard deviation  41.0091243       39.4216231       37.98337507      34.29501824      32.85567534
Variance            1681.748276      1554.064368      1442.736782      1176.148276      1079.495402
Mode                31               7                7                19               1
Median              49.5             59.5             21.5             35.5             37.5
CV                  0.6823481581     0.742869782      0.9137941075     0.7672263588     0.8125888379
50+                 15               17               12               10               9
100+                7                5                4                4                4
Min                 0                0                1                1                0
Max                 158              123              131              148              107
Lower quartile      25.5             16.5             10.5             20.5             8.5
Upper quartile      95.5             81               72               64.75            55.5
Skewness            0.5395059408     0.2267960639     0.7288265267     1.306784409      0.6099507066
Kurtosis            -0.6437253361    -1.198736169     -0.6477619246    1.664381662      -0.6661353833

Mean:
Looking at the mean, which in cricket is called a batsman's "average", Babar Azam
has the highest ODI mean score (60.1) of the five batsmen, with Joe Root's being the
lowest (40.43). Babar Azam has scored 1803 runs in his last 30 innings, which
gives a very impressive average of 60.1 and is one of the reasons why he is called a run machine.
In T20 cricket, on the other hand, M Rizwan has the highest average (43.37), while Babar Azam
(37.83) is close to Virat Kohli (38.93) and Jos Buttler (38.6).
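As a quick check of the figures in the ODI table, the mean can be recomputed from each player's total runs. This is a minimal sketch: it assumes every player's sample is exactly 30 innings, as the report states, and it divides by innings (a conventional batting average divides by dismissals instead). The variable names are illustrative only.

```python
# ODI totals over the last 30 innings, copied from the ODI table in this report.
totals = {
    "B Azam": 1803,
    "V Kohli": 1592,
    "S Smith": 1247,
    "K Williamson": 1341,
    "J Root": 1213,
}
INNINGS = 30  # assumption: every sample holds exactly 30 innings

# Mean score per innings = total runs / number of innings.
means = {player: total / INNINGS for player, total in totals.items()}

# Print the players from highest to lowest mean.
for player, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player:<14}{mean:6.2f}")
```

The recomputed values agree with the "mean" row of the ODI table.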

Standard Deviation
Standard deviation is another statistical tool used to compare the performance of
batsmen. It measures how far the scores are spread out from the mean
or expected value. If a batsman has a low standard deviation, his individual scores
do not deviate much from his mean score. In simpler words, it shows how reliably a player
lives up to his mean score. For a cricketer a lower standard deviation is
preferred, but there are limitations: in this case Joe Root has the lowest ODI standard deviation (32.86), yet he has fewer runs than Kohli and
his average is quite low as well. Babar Azam's standard deviation stands at 41.01, which is the
highest on the list; however, his average is high too, so he still manages a
respectable score on his worse days. In T20 cricket, on the other hand, the SD is very close for everyone,
as it is a very short format.
In conclusion, the standard deviation cannot act as a standalone measure of a player's
performance. It is only useful when the means are available as well.
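The standard deviation is the square root of the variance, so it can be recovered from the "Sample Variance" row of the T20 table. The sketch below does that. It is a cross-check, not the author's original calculation: the printed "Standard Deviation" row of the T20 table shows one identical value for all five players, whereas the square roots of the printed variances differ slightly from player to player.

```python
import math

# "Sample Variance" row of the T20 table in this report.
sample_variance = {
    "Babar Azam": 958.0747126,
    "Virat Kohli": 873.3057471,
    "Muhammad Rizwan": 1015.067816,
    "David Malan": 785.8264368,
    "Jos Buttler": 869.9724138,
}

# Standard deviation = square root of the variance.
std_dev = {player: math.sqrt(var) for player, var in sample_variance.items()}

for player, sd in std_dev.items():
    print(f"{player:<16}{sd:6.2f}")

# Gap between the most and least spread-out players.
spread = max(std_dev.values()) - min(std_dev.values())
print(f"largest minus smallest SD: {spread:.2f}")
```

The gap between the largest and smallest value is under 4 runs, which is in line with the remark that the T20 standard deviations are very close for everyone.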

Figure 1 bar chart

Coefficient of variation (σ/µ)


The coefficient of variation is used to measure the consistency of a batsman by dividing the
standard deviation by the player's mean. It compares the degree of variation of one data
series to another, even if the means are different. For the purpose of measuring consistency, it
is ideal to have a low CV. When the calculation is done for Babar Azam in ODIs, the result comes out
at 0.68, the lowest of the five, which supports the title of Mr. Consistency. Steve Smith's is
0.91, and the gap of about 0.23 between Steve Smith's and Babar Azam's underlines how consistent
Babar Azam has been. In T20s, on the other hand, the table gives Rizwan the lowest CV (0.23).
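The ratio σ/µ can be illustrated with the ODI figures. This is a small sketch using the mean and standard deviation rows of the ODI table; the dictionary layout and names are illustrative.

```python
# (mean, standard deviation) per player, from the ODI table in this report.
odi = {
    "B Azam": (60.1, 41.0091243),
    "V Kohli": (53.06666667, 39.4216231),
    "S Smith": (41.56666667, 37.98337507),
    "K Williamson": (44.7, 34.29501824),
    "J Root": (40.43333333, 32.85567534),
}

# Coefficient of variation = standard deviation / mean (lower = more consistent).
cv = {player: sd / mean for player, (mean, sd) in odi.items()}

# Print the players from most to least consistent.
for player, value in sorted(cv.items(), key=lambda kv: kv[1]):
    print(f"{player:<14}{value:.3f}")
```

Sorting by CV puts Babar Azam first and Steve Smith last, which matches the CV row of the ODI table.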
Box and whisker plot

Figure 2 box plot a

A box and whisker plot, also called a box plot, displays the five-number summary of a set
of data: the minimum, the first quartile, the median, the third quartile and the
maximum. It also shows whether there are any outliers.
In our case, the minimum value for every player is 0 or 1 (marked by a line on the plot), as it
is quite normal for a batsman to have at least one duck or near-duck in his last 30 innings.
The first quartile is the lower edge of the box: 25% of a player's scores in his last 30
innings fall below it. The upper quartile is the upper edge of the box: 25% of his scores
lie above it, i.e. it is the 75th percentile.
Williamson has a maximum value of 148 in an innings; however, this value is an outlier
because it lies more than 1.5 times the interquartile range above his upper quartile
(> Q3 + 1.5 IQR), making it an unusual score.
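The Q3 + 1.5 IQR rule can be applied to all five batsmen using the quartiles and maximum scores from the ODI table. The sketch below is only an illustration of the rule; the variable names are made up for this example.

```python
# (lower quartile, upper quartile, maximum score) from the ODI table in this report.
odi_quartiles = {
    "B Azam": (25.5, 95.5, 158),
    "V Kohli": (16.5, 81, 123),
    "S Smith": (10.5, 72, 131),
    "K Williamson": (20.5, 64.75, 148),
    "J Root": (8.5, 55.5, 107),
}

upper_fence = {}
outliers = []
for player, (q1, q3, max_score) in odi_quartiles.items():
    iqr = q3 - q1                         # interquartile range
    upper_fence[player] = q3 + 1.5 * iqr  # scores above this count as outliers
    if max_score > upper_fence[player]:
        outliers.append(player)

print(upper_fence["K Williamson"])  # 64.75 + 1.5 * 44.25
print(outliers)
```

With these inputs, Williamson's 148 is the only maximum score that exceeds its upper fence.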
If we look closely, the means of Babar Azam and Virat Kohli are fairly close to each other and
clearly above those of the other 3, as is quite evident in their overall performance.
Moreover, Joe Root's upper quartile (55.5) is below Babar Azam's mean (60.1) and barely above
Virat Kohli's (53.07), which further suggests that he has been struggling recently, while Babar and
Virat have been in good form.

Limitations to this report


All of these players played their last 30 matches against different opponents
and at different venues. Some of these pitches favoured bowlers while
others were batting friendly, e.g., Babar Azam had series against Zimbabwe and
South Africa, in Zimbabwe and Pakistan respectively, while Joe Root played
an away series in Australia, where the conditions are bowler friendly.

Secondly, the form factor really does exist in cricket: on some days a player is
at his peak, while on off days he can struggle to score. The data were taken from the
last 30 innings, so this analysis covers only current form, not overall
performance, e.g., Joe Root is in bad form currently, but we cannot
deny he is one of the best in the world. If we had taken data from
2017/18, he would have had a better average than he has now. So, this
analysis is limited to the current form of the batsmen.

There might be some long gaps between these 30 innings, which might
weaken this analysis, e.g., Kane Williamson played his last game in 2020 but
played his second-last game in 2019.

Conclusion
In the end, all these analyses need a conclusion. Babar Azam has scored the
most runs with the best average in ODIs among the five players compared,
and in T20Is his average is almost the same as Virat Kohli's. Virat Kohli is close behind
him in ODIs, with an average about 7 runs lower than Babar Azam's.
The other 3 ODI batsmen have struggled to score runs
in their last 30 innings, specifically Joe Root, who hasn't performed up to his usual
standards.
Secondly, Babar Azam and Virat Kohli also have the highest standard
deviations: their scores deviate most from their mean scores, but part of this spread
comes from scores well above their averages.
Moving on to the coefficients of variation to check consistency, we can
say Babar Azam and Virat Kohli have been the most consistent among the 5, as
they have the lowest CVs; the other 3 have not been as
consistent in their performances as Babar and Kohli.
Lastly, Babar Azam and Virat Kohli have the lowest skewness in ODIs (0.54 and 0.23), which
means their scores are the most evenly balanced between high and low
scores, while the other three are more positively skewed, with a
majority of low scores.
In T20s, by contrast, only Virat Kohli has a roughly symmetrical distribution, while the others
have had a majority of low scores, making their data positively skewed.
To conclude, we can say Babar Azam has been the best player in recent times,
followed closely by Virat Kohli, which might be the reason these two top
batsmen are highly ranked in both formats.
Hence, the answer to our question can be yes: Babar does deserve to be in
the fab 4, or perhaps it should be the fab 5 now.

Hypothesis Testing:
Rizwan vs Babar in T20

H0: μ1 = μ2
H1: μ1 ≠ μ2

t-Test: Two-Sample Assuming Equal Variances

                               Variable 1 (Babar)   Variable 2 (Rizwan)
Mean                           37.8333333           43.3666667
Variance                       958.074713           1015.06782
Observations                   30                   30
Pooled Variance                986.571264
Hypothesized Mean Difference   0
df                             58
t Stat                         -0.6822888
P(T<=t) one-tail               0.24888614
t Critical one-tail            1.67155276
P(T<=t) two-tail               0.49777229
t Critical two-tail            2.00171748

We do not reject the null hypothesis because the t statistic (-0.682) falls in the acceptance region
(|t| < 2.002), so there is no significant difference between Babar's mean and Rizwan's. This supports
the view that they are the best batting pair in recent times.
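The pooled variance and t statistic for Babar against Rizwan can be reproduced from the means, variances and sample sizes alone. The sketch below implements the standard equal-variance two-sample formula. The function name `pooled_t` is an illustrative choice, and the critical value is copied from the output rather than computed.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, pooled_var, df

# Babar (variable 1) vs Rizwan (variable 2), T20, 30 innings each.
t_stat, pooled_var, df = pooled_t(37.8333333, 958.074713, 30,
                                  43.3666667, 1015.06782, 30)

T_CRIT_TWO_TAIL = 2.00171748  # copied from the t-test output, df = 58, 5% level
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL

print(f"pooled variance = {pooled_var:.3f}, df = {df}, t = {t_stat:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The same function can be reused for the other pairings by swapping in their means and variances.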

Babar vs Malan

H0: μ1 ≤ μ2 (Babar's mean is not greater than Malan's)
H1: μ1 > μ2 (Babar's mean is greater than Malan's)

t-Test: Two-Sample Assuming Equal Variances

                               Variable 1 (Babar)   Variable 2 (Malan)
Mean                           37.8333333           32.9666667
Variance                       958.074713           785.826437
Observations                   30                   30
Pooled Variance                871.950575
Hypothesized Mean Difference   0
df                             58
t Stat                         0.63830948
P(T<=t) one-tail               0.26289233
t Critical one-tail            1.67155276
P(T<=t) two-tail               0.52578467
t Critical two-tail            2.00171748

We do not reject the null hypothesis because the t statistic (0.638) is below the one-tail
critical value (1.672) and so falls in the acceptance region of this test. Babar's sample mean
(37.83) is higher than Malan's (32.97), which is in line with Babar being ranked above David Malan
in T20, but the difference is not statistically significant at the 5% level.

Babar vs Kohli ODI

H0: μ1 = μ2
H1: μ1 ≠ μ2

t-Test: Two-Sample Assuming Equal Variances

                               Variable 1 (Babar)   Variable 2 (Kohli)
Mean                           60.1                 53.0666667
Variance                       1681.74828           1554.06437
Observations                   30                   30
Pooled Variance                1617.90632
Hypothesized Mean Difference   0
df                             58
t Stat                         0.67722057
P(T<=t) one-tail               0.2504792
t Critical one-tail            1.67155276
P(T<=t) two-tail               0.5009584
t Critical two-tail            2.00171748

We do not reject the null hypothesis, since the t statistic (0.677) is below the two-tail
critical value (2.002). There is no significant difference between Babar's ODI mean and
Kohli's, so on this evidence they are equally good players in this format.
Anova Test

ANOVA

H0: μ1 = μ2 = μ3 = μ4 = μ5
H1: At least one mean is different

SUMMARY
Groups                       Count   Sum    Average      Variance
Column 1 (Babar Azam)        30      1135   37.8333333   958.074713
Column 2 (Virat Kohli)       30      1168   38.9333333   873.305747
Column 3 (Muhammad Rizwan)   30      1301   43.3666667   1015.06782
Column 4 (David Malan)       30      989    32.9666667   785.826437
Column 5 (Jos Buttler)       30      1158   38.6         869.972414

Source of Variation   SS           df    MS           F            P-value      F crit
Between Groups        1644.49333   4     411.123333   0.45657571   0.76747346   2.43406514
Within Groups         130565.167   145   900.449425

Total                 132209.66    149

We fail to reject the null hypothesis that all means are equal: F (0.457) is below F crit (2.434)
and the p-value (0.767) is well above 0.05, so it cannot be
inferred that at least one mean is different from the others.
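The F statistic in the ANOVA table is the ratio of the two mean squares, and the decision follows from comparing it with F crit. The sketch below recomputes it from the sums of squares and degrees of freedom in the table. F crit is copied from the output, not computed, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table in this report.
ss_between, df_between = 1644.49333, 4    # 5 groups - 1
ss_within, df_within = 130565.167, 145    # 150 innings - 5 groups

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F statistic = variation between groups relative to variation within groups.
f_stat = ms_between / ms_within

F_CRIT = 2.43406514  # copied from the ANOVA output (5% level)
reject_h0 = f_stat > F_CRIT

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}, F = {f_stat:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Because F is far below F crit, the variation between the five batsmen is small compared with the innings-to-innings variation of each batsman.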

In conclusion, we can say that Babar's number 1 position in ODIs and T20Is is
justified, as his ODI mean is the highest and his scores do not deviate much relative to it.
Moreover, the tests conducted suggest that Babar's performances have
been at least as good as those of competitors like Kohli and Malan, which
supports his 1st position in both formats.
Regression Analysis

Regression analysis of wickets taken in the calendar year in relation with number of months to their debut and number of matches played

Every year bowlers compete against each other to bag the most wickets in a calendar year, which
helps their rankings and aids their teams.

In the second part of the report, we run a regression analysis of wickets taken in
a calendar year against two x variables:
1- experience in months from debut
2- number of matches played till 2019

Attached below are the stats for calendar year 2019:

PLAYER                EXPERIENCE IN MONTHS   MATCHES PLAYED   WICKETS IN
                      FROM DEBUT             TILL 2019        2019
MUHAMMAD SHAMI        72                     52               42
TRENT BOULT           76                     69               38
LOCKIE FERGUSON       24                     19               35
MUSTAFIZUR RAHMAN     42                     40               34
BHUVNESHWAR KUMAR     72                     95               33
PAT CUMMINS           85                     42               31
SHAHEEN SHAH AFRIDI   3                      6                27
SHELDON COTRELL       48                     6                31
CHRIS WOAKES          96                     80               29
KULDEEP YADAV         18                     33               32

Attached above is the scatter plot, with line of best fit, showing the relation between matches
played till 2019 by a player and wickets taken in the calendar year 2019.

The coefficient of x, which is also the gradient or slope of the fitted line, is
0.0385, which means that for each additional match played by the player,
the wickets taken in the calendar year increase by 0.0385 on average. On the other
hand, the y-intercept is 31.5, which means that when x, the number of matches
played till 2019, is 0, the player is predicted to take 31.5 (about 32) wickets in the
calendar year.
The relation between the independent and dependent variables is
positive in nature, as y increases when x increases;
however, the effect is very small.
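The slope and intercept quoted for this line of best fit can be reproduced from the 2019 data table with ordinary least squares. The sketch below is a minimal hand-rolled version; the helper name `least_squares` is illustrative, and the report's own figures presumably came from spreadsheet output.

```python
# Matches played till 2019 (x) and wickets in 2019 (y), in the order of the data table.
matches = [52, 69, 19, 40, 95, 42, 6, 6, 80, 33]
wickets = [42, 38, 35, 34, 33, 31, 27, 31, 29, 32]

def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares(matches, wickets)
print(f"wickets = {intercept:.2f} + {slope:.4f} * matches")
```

Running it gives a slope close to 0.0385 and an intercept close to 31.5, matching the values discussed for this fit.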

Attached above is the scatter plot, with line of best fit, showing the relation between a player's
experience in months from his debut till 2019 and wickets taken in the calendar year 2019.

The coefficient of x, which is also the gradient or slope of the fitted line, is
0.0368, which means that for each additional month of experience from debut,
the wickets taken in the calendar year increase by 0.0368 on average.
On the other hand, the y-intercept is 31.2, which means that when x, the
experience in months from debut till 2019, is 0, the player is predicted to take
31.2 (about 31) wickets in the calendar year.
The relation between the independent and dependent variables is
positive in nature, as y increases when x increases;
however, the effect is very small.
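One way to put a number on "positive but very small" is the Pearson correlation between each x variable and the wickets taken. The sketch below computes it for both predictors from the 2019 data table. This is an added illustration, not a figure reported in the original analysis, and `pearson_r` is an illustrative helper name.

```python
import math

# Columns of the 2019 data table, in row order.
experience = [72, 76, 24, 42, 72, 85, 3, 48, 96, 18]   # months from debut
matches = [52, 69, 19, 40, 95, 42, 6, 6, 80, 33]       # matches played till 2019
wickets = [42, 38, 35, 34, 33, 31, 27, 31, 29, 32]     # wickets in 2019

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r_experience = pearson_r(experience, wickets)
r_matches = pearson_r(matches, wickets)

# r squared is the share of the variance in wickets explained by one predictor alone.
print(f"experience: r = {r_experience:.3f}, r^2 = {r_experience ** 2:.3f}")
print(f"matches:    r = {r_matches:.3f}, r^2 = {r_matches ** 2:.3f}")
```

Both correlations are positive but weak, so each variable on its own explains well under 10% of the variation in wickets.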

Multi regression

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.28359441
R Square            0.08042579
Adjusted R Square   -0.1823097
Standard Error      4.74791586
Observations        10

ANOVA
             df   SS           MS
Regression   2    13.801065    6.9005325
Residual     7    157.798935   22.542705
Total        9    171.6

               Coefficients   Standard Error   t Stat
Intercept      31.1097969     3.12396284       9.95844013
X Variable 1   0.0201865      0.0767612        0.26297796
X Variable 2   0.0228101      0.07947454       0.28701145
As observed, the intercept is 31.11, which can be interpreted as follows: if the
experience and matches played by the bowler before 2019 were both 0, the average
expected wickets the bowler would take would be 31.11.

X Variable 1 (experience in months, the first x column of the data table) is 0.020, which can be
interpreted as follows: with an increase of 1 month of experience, holding matches played constant,
the average expected wickets taken in a calendar year would rise by 0.020.

X Variable 2 (matches played, the second x column) is 0.023, which can be interpreted as follows:
with an increase of 1 match played, holding experience constant, the average expected wickets
taken in a calendar year would rise by 0.023.
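The coefficients in the summary output can be reproduced by solving the normal equations for two predictors. The sketch below does this by hand on the 2019 data table. It assumes the x variables were entered in the table's column order (experience first, matches second); all helper names are illustrative.

```python
# Columns of the 2019 data table, in row order.
experience = [72, 76, 24, 42, 72, 85, 3, 48, 96, 18]   # months from debut
matches = [52, 69, 19, 40, 95, 42, 6, 6, 80, 33]       # matches played till 2019
wickets = [42, 38, 35, 34, 33, 31, 27, 31, 29, 32]     # wickets in 2019

def mean(values):
    return sum(values) / len(values)

def centered_sum(u, v):
    """Sum of products of deviations from the mean: sum((u - mean_u) * (v - mean_v))."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11 = centered_sum(experience, experience)
s22 = centered_sum(matches, matches)
s12 = centered_sum(experience, matches)
s1y = centered_sum(experience, wickets)
s2y = centered_sum(matches, wickets)

# Solve the 2x2 normal equations with Cramer's rule.
det = s11 * s22 - s12 ** 2
b_experience = (s1y * s22 - s2y * s12) / det
b_matches = (s2y * s11 - s1y * s12) / det
intercept = mean(wickets) - b_experience * mean(experience) - b_matches * mean(matches)

print(f"intercept    = {intercept:.4f}")
print(f"b_experience = {b_experience:.7f}")
print(f"b_matches    = {b_matches:.7f}")
```

Under that column-order assumption, the experience coefficient reproduces X Variable 1 and the matches coefficient reproduces X Variable 2.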

R² is 0.08, which means that 8% of the variability in
wickets taken in a calendar year is explained by experience in months and matches
played till that year.

The Significance F value (the p-value of the overall F test) is 0.74. Tested at the 5% significance level,
this is far above 0.05, so we cannot conclude that the dependent variable depends on the
independent variables.
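R² and the overall F test can both be derived from the ANOVA part of the summary output. The sketch below recomputes them. The F and p-value columns are not shown in the output above, so the figures it prints are derived here rather than copied, and the closed-form p-value used is valid only because the regression has exactly 2 degrees of freedom.

```python
# ANOVA rows of the regression summary output in this report.
ss_regression, df_regression = 13.801065, 2
ss_residual, df_residual = 157.798935, 7

ss_total = ss_regression + ss_residual

# R squared = share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Overall F statistic = regression mean square / residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# Upper-tail probability of an F(2, d2) variable: P(F > f) = (1 + 2 f / d2) ** (-d2 / 2).
# This closed form holds only when the numerator degrees of freedom equal 2.
assert df_regression == 2
p_value = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

print(f"R^2 = {r_squared:.4f}, F = {f_stat:.4f}, p = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The resulting p-value is roughly three quarters, close to the 0.74 quoted in the text and nowhere near the 5% threshold.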
