"Cricket Player Statistics Analysis": Visvesvaraya Technological University
Mini-Project Report on
by
Mrs. Roopa H M
Assistant Professor
Department of MCA
Estd : 2001
Department of Master of Computer Applications
RNS INSTITUTE OF TECHNOLOGY
Dr. Vishnuvardhan Road, Channasandra, Bengaluru – 560 098
2022
Cricket Player Statistics Analysis
CERTIFICATE
This is to certify that the Mini-Project work entitled “Cricket Player Statistics Analysis”
has been successfully carried out by Lakshmi B, bearing USN 1RN20MC026, a bonafide
student of RNS Institute of Technology, in partial fulfillment of the requirements for the award
of the degree of Master of Computer Applications of Visvesvaraya Technological
University, Belagavi, during the year 2021-22. It is certified that all corrections/suggestions
indicated for internal assessment have been incorporated in this report. The Mini-Project report
has been approved as it satisfies the academic requirements for the said degree.
_____________________ __________________
Mrs. Roopa H M Dr. N P Kavya
Project Coordinator Head of Department
Department of MCA Department of MCA
RNSIT, Bengaluru. RNSIT, Bengaluru.
External Viva
Name of Examiners Signature with Date
1.
2.
DECLARATION
Name: Lakshmi B
USN: 1RN20MC026
Signature of the candidate
ACKNOWLEDGEMENT
The successful completion of a Mini-Project depends on the co-operation and help of
many people other than those who directly execute the work. I take this opportunity to
acknowledge the valuable assistance and cooperation I received from many sources.
Our institution has played a paramount role in guiding us in the right direction. I would like to
profoundly thank the Management of RNS Institute of Technology for providing such a
healthy environment for the successful completion of this project work.
I express my sincere words of gratitude to our Chairman Sri Dr. R N Shetty, for creating
an academic environment to brighten our career.
I would also like to thank our beloved Principal, Dr. M K Venkatesha, for providing the
necessary facilities to carry out this work.
I am extremely grateful to our beloved HoD, Dr. N P Kavya, for guiding me in the right
direction with all her wisdom.
I would also like to express my heartfelt thanks to our Project Coordinator Mrs. Roopa H M,
Assistant Professor, Department of MCA for her constant guidance and devoted support.
Name: Lakshmi B
USN: 1RN20MC026
ABSTRACT
In this project, we analyse data from a cricketer’s career using a large dataset. We
analyse match and player statistics using Python data analysis tools.
The game is gaining a lot of attention across the world and is rapidly growing into one of
the biggest businesses and entertainment providers in the world. As the seasons go on,
the data in this domain grows rapidly, so it needs to be tracked for future analysis. It is
important to record each player’s data for every match on a daily basis.
Data analytics is one of the most important tasks in every area of today’s world, and
cricket is no exception. It is useful for analysing the career of an individual, his team or a
match, which also helps us make future assumptions and predictions. We achieve this
using the requirements specified below.
TABLE OF CONTENTS
Declaration
Acknowledgement
Abstract
Table of contents
List of Figures
CHAPTERS
1. INTRODUCTION
1.1. Project Overview
1.2. Data Collection
2. LITERATURE SURVEY
2.1. Library/Module Requirements
2.2. Hardware & Software Requirements
2.3. Tools/Language/Platform
3. DATA CLEANING AND WRANGLING MECHANISMS
4. DATA ANALYSIS AND VISUALIZATION
5. CONCLUSION
REFERENCES
LIST OF FIGURES
1. INTRODUCTION
1.1 Project Overview
In this project, we use Python to analyse the performance of Indian cricketer MS Dhoni
in his One Day International (ODI) and T-20 career.
Cricket, or the gentleman’s game, is a very old, widespread and uncomplicated pastime.
The sport originated in south-east England in the late 16th century, became the country’s
national sport in the 18th century, and developed globally in the 19th and 20th centuries.
The International Cricket Council (ICC) Cricket World Cup, a One Day International (ODI)
tournament with matches contested in a 50-over format, is the flagship event of the
international cricket calendar and takes place every four years. It is the biggest cricketing
tournament and one of the world’s most viewed sporting events. The Indian Premier League
(IPL), a Twenty20 league in India with matches contested in a 20-over format, is the most
watched cricket league in the world. Cricket has been played for centuries and yet remains
one of the most popular games in today’s world. It is also a game of uncertainty: the outcome
cannot be predicted until the last moment, even though the possible results are known to all.
Therefore, an appropriate probability model can be applied to predict the result.
cricket info. The data is available as an Excel file. Once you have the dataset, you will
need to load it into Python. Once the dataset has been read, we should look at its head and
tail to make sure it has been imported correctly. The head of the dataset should look like this:
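A minimal loading sketch follows. It assumes the Excel file is read with `pd.read_excel` (which needs an Excel engine such as openpyxl installed); the file name `dhoni_odi_stats.xlsx` is hypothetical. So that the sketch runs without the file, it builds a small stand-in DataFrame whose column names (`score`, `runs_scored`, `opposition`, `date`) and values are assumptions, not the real dataset, and then prints its head and tail.

```python
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    """Read the innings-by-innings Excel file (needs an engine such as openpyxl)."""
    return pd.read_excel(path)

# df = load_dataset("dhoni_odi_stats.xlsx")  # hypothetical file name

# Stand-in rows so this sketch runs without the file; column names and
# values are hypothetical, not taken from the real dataset.
df = pd.DataFrame({
    "score": ["34", "56*", "DNB", "8", "TDNB", "102*"],
    "runs_scored": [34, 56, None, 8, None, 102],
    "opposition": ["Team A", "Team B", "Team A", "Team C", "Team B", "Team A"],
    "date": ["2006-01-15", "2006-03-02", "2007-02-11",
             "2007-06-30", "2007-11-05", "2009-08-19"],
})

# Inspect both ends of the table to confirm the import looks right.
print(df.head())
print(df.tail())
```

Looking at both `head()` and `tail()` helps catch problems that only show up at the end of a file, such as trailing summary rows or blank lines.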
2. LITERATURE SURVEY
2.1 Library/Module Requirements
This project requires the following Python libraries: pandas, NumPy and Matplotlib.
Pandas: pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical,
real-world data analysis in Python. Additionally, it has the broader goal of becoming the most
powerful and flexible open source data analysis/manipulation tool available in any language.
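NumPy: NumPy is a Python library that provides a multidimensional array
object, various derived objects (such as masked arrays and matrices), and an
assortment of routines for fast operations on arrays, including mathematical,
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear
algebra, basic statistical operations, random simulation and much more. At the core of the
NumPy package is the ndarray object. This encapsulates n-dimensional arrays of
homogeneous data types, with many operations being performed in compiled code for
performance. There are several important differences between NumPy arrays and
standard Python sequences: NumPy arrays have a fixed size at creation, unlike Python lists,
which can grow dynamically; the elements of a NumPy array must all be of the same data
type; and NumPy arrays support advanced mathematical and other operations on large
amounts of data, which typically run more efficiently and with less code than with Python’s
built-in sequences.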
Matplotlib: Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in arrays. It provides
an object-oriented API that helps in embedding plots in applications, and it can be used in
Python and IPython shells, Jupyter notebooks and web application servers.
2.2 Hardware & Software Requirements
Hardware requirements:
Processor: i3 or higher
RAM: 4 GB or more
Software requirements:
Windows 7 or higher
2.3 Tools/Language/Platform
Python Language
Jupyter Platform
3. DATA CLEANING AND WRANGLING MECHANISMS
We begin by creating a column for the year in which each match was played. Make sure
that the date column in your DataFrame is in datetime format; if it is not, use
pd.to_datetime() to convert it.
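The year column can be derived with the `.dt` accessor once the dates are true datetimes. This is a sketch only: the column name `date` and the sample dates are assumptions, not values from the report's dataset.

```python
import pandas as pd

# Hypothetical sample; the match-date column is assumed to be named "date".
df = pd.DataFrame({"date": ["2006-01-15", "2007-06-30", "2009-08-19"]})

# Convert to datetime only if the column was read in as text.
if not pd.api.types.is_datetime64_any_dtype(df["date"]):
    df["date"] = pd.to_datetime(df["date"])

# Store the calendar year of each match in its own column.
df["year"] = df["date"].dt.year
print(df)
```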
We will also create a column indicating whether Dhoni was not out in that innings.
We will also drop all those matches from our records where Dhoni did not bat, and store
this information in a new DataFrame.
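Both steps can be sketched as follows, assuming (hypothetically) a text `score` column in which a trailing `*` marks a not-out innings and the strings `DNB`/`TDNB` mark matches where he did not bat. These encodings and the sample rows are assumptions; check them against the actual file before relying on this.

```python
import pandas as pd

# Hypothetical sample: a trailing "*" marks not out, "DNB"/"TDNB" mark
# matches in which he did not bat. Encodings are assumptions.
df = pd.DataFrame({
    "score": ["34", "56*", "DNB", "8", "TDNB", "102*"],
    "opposition": ["Team A", "Team B", "Team A", "Team C", "Team B", "Team A"],
})

# 1 if he was not out in that innings, otherwise 0.
df["not_out"] = df["score"].str.endswith("*").astype(int)

# Keep only the matches in which he actually batted, in a new DataFrame.
did_not_bat = df["score"].isin(["DNB", "TDNB"])
df_batted = df.loc[~did_not_bat].copy()
print(df_batted)
```

The `.copy()` makes `df_batted` an independent DataFrame, so later column assignments on it do not trigger pandas' chained-assignment warnings.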
Finally, we will fix the data types of all the columns present in our new DataFrame.
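One way to fix the types is sketched below. It assumes the numeric columns arrived as text, which is common with scraped data; the column names and values are hypothetical. Counts become integers, the date becomes a datetime, and the repeated opposition names become a categorical column.

```python
import pandas as pd

# Hypothetical batted innings whose numeric columns were read in as text.
df_batted = pd.DataFrame({
    "score": ["34", "56*", "8"],
    "runs_scored": ["34", "56", "8"],
    "balls_faced": ["40", "61", "15"],
    "not_out": ["0", "1", "0"],
    "opposition": ["Team A", "Team B", "Team C"],
    "date": ["2006-01-15", "2006-03-02", "2007-06-30"],
})

# Counts become integers.
for col in ["runs_scored", "balls_faced", "not_out"]:
    df_batted[col] = pd.to_numeric(df_batted[col]).astype(int)

# Dates become datetimes; repeated team names fit a categorical dtype.
df_batted["date"] = pd.to_datetime(df_batted["date"])
df_batted["opposition"] = df_batted["opposition"].astype("category")
print(df_batted.dtypes)
```

`pd.to_numeric` raises an error on values it cannot parse, which is useful here: it surfaces any leftover non-numeric entries instead of silently keeping them.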
4. DATA ANALYSIS AND VISUALIZATION
First, we look at how many matches he has played against different oppositions. You
can use the following piece of code for this purpose:
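A minimal sketch of this count, assuming a hypothetical `opposition` column and made-up rows: `value_counts()` tallies matches per team in descending order, and the same Series can be drawn as a bar chart with Matplotlib. The output file name is also an assumption; in Jupyter, `plt.show()` would display the chart instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample of all matches (including ones where he did not bat).
df = pd.DataFrame({
    "opposition": ["Team A", "Team B", "Team A", "Team C", "Team B", "Team A"],
})

# Number of matches against each opposition, most frequent first.
matches_per_team = df["opposition"].value_counts()
print(matches_per_team)

# Bar chart of the same counts.
ax = matches_per_team.plot(kind="bar", title="Matches played against each opposition")
ax.set_xlabel("Opposition")
ax.set_ylabel("Number of matches")
plt.tight_layout()
plt.savefig("matches_per_opposition.png")
```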
We can see that he has played the majority of his matches against Sri Lanka, Australia,
England, West Indies, South Africa, and Pakistan. Let us look at how many runs he has
scored against different oppositions. You can use the following code snippet to generate
the result:
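Summing runs per opposition is a `groupby` aggregation. The sketch below assumes hypothetical `opposition` and `runs_scored` columns on the batted-innings DataFrame and uses invented values.

```python
import pandas as pd

# Hypothetical batted innings; column names and values are illustrative only.
df_batted = pd.DataFrame({
    "opposition": ["Team A", "Team B", "Team A", "Team C", "Team A", "Team B"],
    "runs_scored": [34, 56, 8, 15, 102, 20],
})

# Total runs against each opposition, highest first.
runs_per_team = (
    df_batted.groupby("opposition")["runs_scored"]
    .sum()
    .sort_values(ascending=False)
)
print(runs_per_team)
```

Calling `runs_per_team.plot(kind="bar")` would chart the result in the same way as the match counts.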
We can see that Dhoni has scored the most runs against Sri Lanka, followed by Australia,
England, and Pakistan. He has also played a lot of matches against these teams, so this is
expected.
To get a clearer picture, let us look at his batting average against each team. The following
code calculates it:
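Batting average is conventionally total runs divided by the number of times the batter was dismissed, that is, innings minus not-out innings. This sketch applies that formula per opposition and for the whole career, assuming hypothetical `runs_scored` and `not_out` (1 for not out) columns and invented rows. An average is left as NaN when he was never dismissed by a team, since the ratio is undefined.

```python
import numpy as np
import pandas as pd

# Hypothetical batted innings; column names and values are illustrative only.
df_batted = pd.DataFrame({
    "opposition": ["Team A", "Team B", "Team A", "Team C", "Team A", "Team B"],
    "runs_scored": [34, 56, 8, 15, 102, 20],
    "not_out": [0, 1, 0, 1, 1, 0],  # 1 = not out in that innings
})

# Per-opposition totals: runs, innings batted and not-out innings.
summary = df_batted.groupby("opposition").agg(
    runs=("runs_scored", "sum"),
    innings=("runs_scored", "size"),
    not_outs=("not_out", "sum"),
)

# Average = runs / dismissals; NaN when he was never dismissed by that team.
dismissals = summary["innings"] - summary["not_outs"]
summary["average"] = summary["runs"] / dismissals.replace(0, np.nan)

# Career average over all innings, for comparison.
career_average = df_batted["runs_scored"].sum() / (len(df_batted) - df_batted["not_out"].sum())
print(summary.round(2))
print(f"Career average: {career_average:.2f}")
```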
As we can see, Dhoni has performed remarkably against tough teams like Australia, England,
and Sri Lanka. His average against these teams is either close to his career average, or
slightly higher. The only team against whom he has not performed well is South Africa.
Let us now look at his year-on-year statistics. We will start by looking at how many matches
he has played each year after his debut. The code for that will be:
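A sketch of the yearly match count, assuming a hypothetical `date` column and invented dates. `value_counts()` only lists years that appear in the data, so the result is reindexed over the full range of years to show years without a match as 0 rather than omitting them.

```python
import pandas as pd

# Hypothetical match dates (all matches, not only batted innings).
df = pd.DataFrame({"date": pd.to_datetime([
    "2006-01-15", "2006-03-02", "2007-02-11",
    "2007-06-30", "2007-11-05", "2009-08-19",
])})
df["year"] = df["date"].dt.year

matches_per_year = df["year"].value_counts().sort_index()

# Years with no matches would otherwise be missing, so fill them with 0.
all_years = range(matches_per_year.index.min(), matches_per_year.index.max() + 1)
matches_per_year = matches_per_year.reindex(all_years, fill_value=0)
print(matches_per_year)
```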
We can see that in 2012, 2014, and 2016, Dhoni played very few ODI matches for India.
Overall, after the 2005-2009 period, the number of matches he played per year reduced slightly.
We should also look at how many runs he has scored every year. The code for that will be:
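Yearly run totals follow the same `groupby` pattern, this time keyed on the year derived from the match date. Column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical batted innings with dates and runs.
df_batted = pd.DataFrame({
    "date": pd.to_datetime([
        "2006-01-15", "2006-03-02", "2007-06-30", "2007-11-05", "2009-08-19",
    ]),
    "runs_scored": [34, 56, 8, 15, 102],
})
df_batted["year"] = df_batted["date"].dt.year

# Total runs scored in each calendar year.
runs_per_year = df_batted.groupby("year")["runs_scored"].sum()
print(runs_per_year)
```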
It can be clearly seen that Dhoni scored the most runs in the year 2009, followed by 2007
and 2008. The number of runs started reducing post-2010 (because the number of matches
he played also reduced).
Finally, let’s look at his career batting average progression by innings. This is time-series
data and has been plotted on a line plot. The code for that will be:
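The career average after each innings is a running ratio: cumulative runs divided by cumulative dismissals. This sketch computes it with `cumsum()` and draws it as a line plot, assuming hypothetical `runs_scored` and `not_out` columns, invented values, and innings already sorted by date. The value stays NaN until the first dismissal, and the output file name is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical batted innings, assumed to be in chronological order
# (sort by the date column first if they are not).
df_batted = pd.DataFrame({
    "runs_scored": [34, 56, 8, 15, 102, 20],
    "not_out": [0, 1, 0, 1, 1, 0],  # 1 = not out in that innings
})

cum_runs = df_batted["runs_scored"].cumsum()
cum_outs = (1 - df_batted["not_out"]).cumsum()

# Career average after each innings; NaN until the first dismissal.
df_batted["career_average"] = cum_runs / cum_outs.replace(0, np.nan)
df_batted["innings"] = np.arange(1, len(df_batted) + 1)

ax = df_batted.plot(x="innings", y="career_average", kind="line", legend=False)
ax.set_xlabel("Innings")
ax.set_ylabel("Career batting average")
ax.set_title("Career batting average progression")
plt.tight_layout()
plt.savefig("career_average_progression.png")
```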
5. CONCLUSION
Here, we have studied the performance of cricket players in both IPL season 9 (2016) and
the ICC World Cup 2015, along the lines of Sharma (2013). The statistical technique of
factor analysis has been employed to explore the interrelationships among the various
dimensions of batting and bowling in 20- and 50-over cricket matches. It has been applied
through principal component analysis (PCA) to assess item validity and to group items into
meaningful clusters. It is observed that, in both the 20- and 50-over matches, five dimensions
have been grouped into factor 1 (batting), while three dimensions have been grouped into
factor 2 (bowling). The variance explained by factor 1 (batting) is much higher than the
variance explained by factor 2 (bowling). Thus, it concludes that the batting capability
REFERENCES:
https://www.analyticsvidhya.com/blog/2021/06/analyze-cricket-data-with-python-a-hands-on-guide/#h2
Bailey, M.J. & Clarke, S.R.: Market inefficiencies in player head to head betting on the 2003
cricket world cup. In Economics, Management and Optimization in Sport, S. Butenko, J. Gil-
Barr, G.D.I. & Kantor, B.S.: A criterion for comparing and selecting batsmen in limited overs