
"Cricket Player Statistics Analysis": Visvesvaraya Technological University

The document provides details about a mini-project report submitted by a student named Lakshmi B to analyze cricket player statistics. The report includes an introduction describing the growing popularity of cricket globally and importance of data analysis in the sport. It discusses collecting a bulk data set of matches and players' statistics to analyze careers, teams, and individual matches. The report is submitted in partial fulfillment of requirements for a Master of Computer Applications degree from Visvesvaraya Technological University.

Uploaded by

Sampreet Gowda

Cricket Player Statistics Analysis

VISVESVARAYA TECHNOLOGICAL UNIVERSITY


BELAGAVI-590 018

Mini-Project Report on

“Cricket Player Statistics Analysis”

Submitted in partial fulfillment of the requirements for the degree of


Master of Computer Applications
of Visvesvaraya Technological University, Belagavi

by

Student Name : Lakshmi B


USN : 1RN20MC026

Under the guidance of

Mrs. Roopa H M
Assistant Professor
Department of MCA

Estd : 2001
Department of Master of Computer Applications
RNS INSTITUTE OF TECHNOLOGY
Dr. Vishnuvardhan Road, Channasandra, Bengaluru – 560 098
2022


RNS INSTITUTE OF TECHNOLOGY


Dr. Vishnuvardhan Road, Channasandra, Bengaluru – 560 098

Department of Master of Computer Applications

Estd: 2001

CERTIFICATE

This is to certify that the Mini-Project work entitled “Cricket Player Statistics Analysis”
has been successfully carried out by Lakshmi B bearing USN 1RN20MC026, bonafide
student of RNS Institute of Technology, in partial fulfillment of the requirements for award
of degree of Master of Computer Applications of Visvesvaraya Technological
University, Belagavi, during the year 2021-22. It is certified that all corrections/suggestions
indicated for internal assessment have been incorporated in this report. The Mini-Project report
has been approved as it satisfies the academic requirements for the said degree.

_____________________ __________________
Mrs. Roopa H M Dr. N P Kavya
Project Coordinator Head of Department
Department of MCA Department of MCA
RNSIT, Bengaluru. RNSIT, Bengaluru.

External Viva
Name of Examiners Signature with Date

1.

2.


DECLARATION

I, Lakshmi B, student of 3rd MCA, RNS Institute of Technology, bearing USN
1RN20MC026, hereby declare that the project entitled “Cricket Player Statistics
Analysis” has been carried out by me under the supervision of Project Coordinator Mrs.
Roopa H M, Assistant Professor, Department of MCA, and submitted in partial fulfillment of
the requirements for the award of the Degree of Master of Computer Applications by the
Visvesvaraya Technological University during the academic year 2021-22. This report has
not been submitted to any other Organization/University for the award of any degree or
certificate.

Name: Lakshmi B
USN: 1RN20MC026
Signature of the candidate


ACKNOWLEDGEMENT

The successful completion of Mini-Project work depends on the co-operation and help of
many people, other than those who directly execute the work. I take this opportunity to
acknowledge the valuable assistance and cooperation received from many sources.

Our institution has played a paramount role in guiding us in the right direction. I would like to
profoundly thank the Management of RNS Institute of Technology for providing such a
healthy environment for the successful completion of this project work.

I express my sincere gratitude to our Chairman Sri Dr. R N Shetty, for creating
an academic environment to brighten our career.

I would also like to thank our beloved Principal, Dr. M K Venkatesha, for providing the
necessary facilities to carry out this work.

I am extremely grateful to our beloved HoD, Dr. N P Kavya, for guiding me in the
right direction with all her wisdom.

I would also like to express my heartfelt thanks to our Project Coordinator Mrs. Roopa H M,
Assistant Professor, Department of MCA, for her constant guidance and devoted support.

Name: Lakshmi B
USN: 1RN20MC026


ABSTRACT
In this project, we analyse the data of a cricketer’s career using a bulk
data set. We analyse matches and player statistics using Python data
analysis.
The game is gaining a lot of attention across the world and is growing
rapidly into one of the biggest business and entertainment providers in the
world. As the seasons go on, the data in the domain grows rapidly, and we need
to keep track of it for future analysis. It is important to record each match and
each player’s data on a daily basis.
Data analytics is among the most important tasks in all areas of today’s
world, and the same holds in this field. It is useful for analysing the career of an
individual, a team or a match, which also helps with future assumptions and
predictions. We achieve this with the requirements specified below.


TABLE OF CONTENTS

Declaration
Acknowledgement
Abstract
Table of contents
List of Figures

CHAPTERS

1. INTRODUCTION
1.1. Project Overview
1.2. Data Collection
2. LITERATURE SURVEY
2.1. Library/Module Requirements
2.2. Hardware & Software Requirements
2.3. Tools/ Languages/ Platform
3. DATA CLEANING AND WRANGLING MECHANISMS
4. DATA ANALYSIS AND VISUALIZATION
5. CONCLUSION
REFERENCES

LIST OF FIGURES

Fig. 1.1 Data Collection
Fig. 4.1 No of Matches against opposition
Fig. 4.2 Runs Scored against diff opposition
Fig. 4.3 Avg against major teams
Fig. 4.4 Matches played by year
Fig. 4.5 Runs scored by year
Fig. 4.6 Career avg progression by innings

1. INTRODUCTION
1.1 Project Overview
In this project, we look at one use case of Python: analysing the performance of
Indian cricketer MS Dhoni across his One Day International (ODI) and T20 career.
Cricket, or the gentleman’s game, is a very old, widespread and uncomplicated
pastime. The sport originated in south-east England in the late 16th century. It
became the country’s national sport in the 18th century and developed
globally in the 19th and 20th centuries. The International Cricket Council (ICC)
Cricket World Cup, a One-Day International (ODI) tournament, is the flagship event of
the international cricket calendar and takes place every four years with matches contested
in a 50-over format. It is the biggest cricketing tournament and one of the world’s most
viewed sporting events. The Indian Premier League (IPL), a league in India with matches
contested in a 20-over format, is the most watched cricket league in the world. Cricket is
centuries old and yet remains among the most popular games of today’s world. It is also a
game of uncertainty: the outcome cannot be predicted until the last moment of the game,
even though the possible results are known to all. An appropriate probability model
can therefore be applied to predict the result.

1.2 Data Collection


If you are familiar with the concept of web scraping, you can scrape the data from
ESPNcricinfo. If you are not familiar with web scraping, the data is also available as an
Excel file. Once you have the dataset with you, you will need to load it in Python. You can
use the piece of code below to load the dataset in Python:
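The loading code itself did not survive in this copy of the report. A minimal sketch of the step is given here, assuming the records sit in a single-sheet Excel file (or a CSV export of it). The function name `load_dataset` and the file name in the usage comment are hypothetical, not taken from the project.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load match records from an Excel or CSV file into a DataFrame."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        return pd.read_csv(path)
    # pd.read_excel needs an Excel engine (for example openpyxl) to be installed.
    return pd.read_excel(path)


# Hypothetical usage; replace the file name with your own file:
# df = load_dataset("dhoni_odi_record.xlsx")
# df.head()   # first five rows
# df.tail()   # last five rows
```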

Once the dataset has been read, we should look at the head and tail of the dataset to make
sure it is imported correctly. The head of the dataset should look like this:

Figure 1.1 : Data Collection


2. LITERATURE SURVEY
2.1 Library/Module Requirements
This project requires some Python libraries and modules, namely pandas, NumPy,
Matplotlib and Seaborn.
Pandas: pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical,
real-world data analysis in Python. Additionally, it has the broader goal of becoming the most
powerful and flexible open source data analysis and manipulation tool available in any language.
NumPy: NumPy is a Python library that provides a multidimensional array
object, various derived objects (such as masked arrays and matrices), and an
assortment of routines for fast operations on arrays, including mathematical,
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear
algebra, basic statistical operations, random simulation and much more. At the core of the
NumPy package is the ndarray object. This encapsulates n-dimensional arrays of
homogeneous data types, with many operations being performed in compiled code for
performance. NumPy arrays differ from the standard Python sequences in several important
ways; for example, they have a fixed size at creation and their elements share one data type.

Matplotlib: Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in arrays. It provides
an object-oriented API that helps in embedding plots in applications. It can be used in Python
and IPython shells, Jupyter notebooks and web application servers as well.

Seaborn: Seaborn is an open-source Python library built on top of Matplotlib. It is
used for data visualization and exploratory data analysis. Seaborn works easily with
DataFrames and the pandas library. The graphs created can also be customized easily.

2.2 Hardware and Software requirements

Hardware requirements:
• Processor: i3 or higher

• RAM: 4 GB or more

• Input devices: keyboard, mouse

• Hard disk: 500 GB or more

Software requirements:
• Windows 7 or higher

2.3 Tools/Language/Platform
• Python Language
• Jupyter Platform

3. DATA CLEANING MECHANISMS


Data cleansing is important for individuals because, over time, accumulated information
can become overwhelming. It can be difficult to find the most recent paperwork, and you may
have to wade through dozens of old files before you find the right one.
Disorganization can lead to stress, and even to lost documents. Data cleansing ensures that you
keep only the most recent files and important documents, so that you can find them with ease
when you need them. It also helps ensure that you do not keep significant amounts of personal
information on your computer, which can be a security risk.
The data for this project has been taken from a webpage, so it is not very clean. We will start by
removing the first 2 characters from the opposition string because they are not required.
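A minimal sketch of this step, assuming the raw values carry a two-character prefix such as "v " in front of the team name and that the column is called `opposition`. Both are assumptions, and the sample rows are toy values, not real records.

```python
import pandas as pd


def strip_opposition_prefix(df, column="opposition"):
    """Return a copy of df with the first two characters of each opposition name removed."""
    out = df.copy()
    out[column] = out[column].str[2:]
    return out


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"opposition": ["v Team A", "v Team B"]})
cleaned = strip_opposition_prefix(sample)
print(cleaned["opposition"].tolist())  # ['Team A', 'Team B']
```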

Next, we will create a column for the year in which the match was played. Please make sure
that the date column is present in the DateTime format in your DataFrame. If not, please
use pd.to_datetime() to convert it to DateTime format.
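The conversion and the new column could be sketched as follows. The column names `date` and `year` are assumptions, and the dates in the sample are arbitrary toy values.

```python
import pandas as pd


def add_year_column(df, date_column="date"):
    """Return a copy of df with a parsed date column and an integer 'year' column."""
    out = df.copy()
    out[date_column] = pd.to_datetime(out[date_column])
    out["year"] = out[date_column].dt.year
    return out


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"date": ["2010-01-15", "2011-06-30"]})
with_year = add_year_column(sample)
print(with_year["year"].tolist())  # [2010, 2011]
```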

We will also create a column indicating whether Dhoni was not out in that innings or not.
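One way to build this flag, assuming the score is stored as text and a not-out innings is marked with a trailing asterisk (for example "45*"). That marker and the column names `score` and `not_out` are assumptions.

```python
import pandas as pd


def add_not_out_flag(df, score_column="score"):
    """Return a copy of df with a 'not_out' column: 1 if the score ends with '*', else 0."""
    out = df.copy()
    scores = out[score_column].astype(str)
    out["not_out"] = scores.str.endswith("*").astype(int)
    return out


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"score": ["12", "45*", "DNB"]})
flagged = add_not_out_flag(sample)
print(flagged["not_out"].tolist())  # [0, 1, 0]
```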

We will also drop all those matches from our records where Dhoni did not bat, and store
this information in a new DataFrame.
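A sketch of the filtering step, assuming innings in which the player did not bat are marked in the score column with the strings "DNB" or "TDNB". These markers are an assumption about the raw data, so adjust them to whatever your file uses.

```python
import pandas as pd


def keep_batted_innings(df, score_column="score", markers=("DNB", "TDNB")):
    """Return a new DataFrame without the rows whose score is a did-not-bat marker."""
    did_not_bat = df[score_column].astype(str).isin(markers)
    return df.loc[~did_not_bat].reset_index(drop=True)


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"score": ["12", "DNB", "45*", "TDNB"]})
df_new = keep_batted_innings(sample)
print(df_new["score"].tolist())  # ['12', '45*']
```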

Finally, we will fix the data types of all the columns present in our new DataFrame.
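A sketch of the type fix, assuming the numeric columns were read as text. The column names `runs_scored` and `balls_faced` are hypothetical. Values that cannot be parsed (such as "-") become missing values instead of raising an error.

```python
import pandas as pd


def fix_numeric_types(df, columns=("runs_scored", "balls_faced")):
    """Return a copy of df with the given columns converted to numeric types."""
    out = df.copy()
    for col in columns:
        # errors="coerce" turns unparseable entries into NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"runs_scored": ["12", "45"], "balls_faced": ["20", "-"]})
typed = fix_numeric_types(sample)
print(typed.dtypes)
```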

4. DATA ANALYSIS AND VISUALIZATION



Firstly, we will look at how many matches he has played against different oppositions. You
can use the following piece of code for this purpose:
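The original snippet is missing from this copy. A sketch under the assumption that each row is one match and the cleaned team name is in a column called `opposition`; the helper names are hypothetical and the sample rows are toy values.

```python
import matplotlib.pyplot as plt
import pandas as pd


def matches_per_opposition(df, column="opposition"):
    """Count rows (matches) per opposition, most frequent first."""
    return df[column].value_counts()


def plot_counts(counts, title="Matches against each opposition"):
    """Draw the counts as a bar chart and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    counts.plot(kind="bar", ax=ax, title=title)
    ax.set_xlabel("Opposition")
    ax.set_ylabel("Matches")
    fig.tight_layout()
    return ax


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"opposition": ["Team A", "Team B", "Team A", "Team C", "Team A"]})
counts = matches_per_opposition(sample)
print(counts)
```

In a Jupyter notebook, calling `plot_counts(counts)` displays the chart; in a script, follow it with `plt.show()`.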

The output should look like this:

Figure 4.1: No of Matches against opposition

We can see that he has played the majority of his matches against Sri Lanka, Australia,
England, West Indies, South Africa, and Pakistan.

Let us look at how many runs he has scored against different oppositions. You can use the
following code snippet to generate the result:
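A sketch of the aggregation, assuming a numeric `runs_scored` column alongside `opposition` (both names are assumptions). The sample rows are toy values.

```python
import pandas as pd


def runs_per_opposition(df):
    """Total runs per opposition, highest total first."""
    totals = df.groupby("opposition")["runs_scored"].sum()
    return totals.sort_values(ascending=False)


# Toy rows, made up for illustration only.
sample = pd.DataFrame({
    "opposition": ["Team A", "Team B", "Team A"],
    "runs_scored": [10, 50, 30],
})
runs = runs_per_opposition(sample)
print(runs)
```

The resulting Series can be drawn as a bar chart with `runs.plot(kind="bar")`.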

The output will look like this:


Figure 4.2 : Runs scored against diff oppositions

We can see that Dhoni has scored the most runs against Sri Lanka, followed by Australia,
England, and Pakistan. He has also played a lot of matches against these teams, so it makes
sense.

To get a clearer picture, let us look at his batting average against each team. The following
piece of code will help us with getting the desired result:
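A batting average is runs scored divided by the number of dismissals, where dismissals are innings minus not-out innings. A sketch of that calculation per team, assuming the hypothetical columns `opposition`, `runs_scored` and a 0/1 `not_out` flag; the sample rows are toy values.

```python
import pandas as pd


def batting_average_by_opposition(df):
    """Runs divided by dismissals for each opposition (NaN if never dismissed)."""
    grouped = df.groupby("opposition")
    runs = grouped["runs_scored"].sum()
    dismissals = grouped["not_out"].count() - grouped["not_out"].sum()
    # Replace zero dismissals with NaN so the division does not produce infinity.
    return runs / dismissals.where(dismissals > 0)


# Toy rows, made up for illustration only.
sample = pd.DataFrame({
    "opposition": ["Team A", "Team A", "Team A", "Team B"],
    "runs_scored": [10, 30, 20, 50],
    "not_out": [0, 1, 0, 1],
})
averages = batting_average_by_opposition(sample)
print(averages)  # Team A: 60 runs / 2 dismissals = 30.0, Team B: NaN
```

To restrict the result to selected teams, index it with a list of names, for example `averages.loc[["Team A"]]`.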

For generating the plot, use the code snippet below:
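The plotting snippet is also missing here. A sketch that draws per-team averages as bars, with an optional dashed line for the career average so the two can be compared. The function name and the numbers in the sample are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_average_by_opposition(averages, career_average=None):
    """Bar chart of batting average per opposition; returns the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(list(averages.index), averages.to_numpy())
    if career_average is not None:
        ax.axhline(career_average, linestyle="--", color="gray", label="Career average")
        ax.legend()
    ax.set_xlabel("Opposition")
    ax.set_ylabel("Batting average")
    ax.set_title("Average against major teams")
    fig.tight_layout()
    return ax


# Toy values, made up for illustration only.
sample_averages = pd.Series({"Team A": 30.0, "Team B": 45.0})
ax = plot_average_by_opposition(sample_averages, career_average=40.0)
```

Call `plt.show()` afterwards when running outside a notebook.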

The output will look like this:


Figure 4.3 : Avg against major teams

As we can see, Dhoni has performed remarkably against tough teams like Australia, England,
and Sri Lanka. His average against these teams is either close to his career average, or
slightly higher. The only team against whom he has not performed well is South Africa.

Let us now look at his year-on-year statistics. We will start by looking at how many matches
he has played each year after his debut. The code for that will be:
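A sketch of the yearly count, assuming the `year` column created during cleaning (the name is an assumption). The sample rows are toy values.

```python
import pandas as pd


def matches_per_year(df, column="year"):
    """Count rows (matches) per year, in chronological order."""
    return df[column].value_counts().sort_index()


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"year": [2011, 2011, 2012, 2010]})
per_year = matches_per_year(sample)
print(per_year)
```

As with the earlier counts, `per_year.plot(kind="bar")` gives the bar chart.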

The plot will look like this:


Figure 4.4 : matches played by year

We can see that in 2012, 2014, and 2016, Dhoni played very few ODI matches for India.
Overall, after 2005-2009, the average number of matches he played reduced slightly.

We should also look at how many runs he has scored every year. The code for that will be:
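A sketch of the yearly totals, again assuming the hypothetical columns `year` and `runs_scored`. It also picks out the year with the highest total; the sample rows are toy values.

```python
import pandas as pd


def runs_per_year(df):
    """Total runs per year, in chronological order."""
    return df.groupby("year")["runs_scored"].sum()


# Toy rows, made up for illustration only.
sample = pd.DataFrame({
    "year": [2011, 2012, 2011],
    "runs_scored": [10, 50, 30],
})
yearly_runs = runs_per_year(sample)
best_year = yearly_runs.idxmax()  # year with the highest total
print(yearly_runs)
print(best_year)  # 2012
```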

The output should look like this:

Figure 4.5 : Runs scored by year


It can be clearly seen that Dhoni scored the most runs in the year 2009, followed by 2007
and 2008. The number of runs started reducing post-2010 (because the number of matches
played also started reducing).

Finally, let’s look at his career batting average progression by innings. This is time-series
data and has been plotted on a line plot. The code for that will be:
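The career average after each innings is the cumulative runs divided by the cumulative dismissals up to that innings. A sketch of that running calculation, assuming the rows are already in chronological order and carry the hypothetical columns `runs_scored` and `not_out`; the sample rows are toy values.

```python
import pandas as pd


def career_average_progression(df):
    """Return a copy of df with 'innings' and running 'career_average' columns."""
    out = df.reset_index(drop=True).copy()
    cumulative_runs = out["runs_scored"].cumsum()
    cumulative_dismissals = (1 - out["not_out"]).cumsum()
    # The average is undefined (NaN) until the first dismissal.
    out["career_average"] = cumulative_runs / cumulative_dismissals.where(
        cumulative_dismissals > 0
    )
    out["innings"] = out.index + 1
    return out


# Toy rows, made up for illustration only.
sample = pd.DataFrame({"runs_scored": [10, 30, 20], "not_out": [0, 1, 0]})
progression = career_average_progression(sample)
print(progression["career_average"].tolist())  # [10.0, 40.0, 30.0]
```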

The code snippet for the plot will be:
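A sketch of the line plot, taking the innings numbers and the running averages as plain sequences. The function name is hypothetical and the sample values are toy numbers.

```python
import matplotlib.pyplot as plt


def plot_average_progression(innings, averages):
    """Line plot of career batting average against innings number; returns the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(innings, averages)
    ax.set_xlabel("Innings")
    ax.set_ylabel("Career batting average")
    ax.set_title("Career average progression by innings")
    fig.tight_layout()
    return ax


# Toy values, made up for illustration only.
ax = plot_average_progression([1, 2, 3], [10.0, 40.0, 30.0])
```

Call `plt.show()` afterwards when running outside a notebook.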

The output plot will look like this:

Figure 4.6 : Career avg progression by innings


5. CONCLUSION

Here, we have studied the performance of cricket players in both IPL session 9 (2016) and
the ICC World Cup 2015, along the same lines as Sharma (2013). The statistical technique of
factor analysis has been employed to explore the interrelationship among the various
dimensions of batting and bowling in 20- and 50-over cricket matches. It has been applied
through principal component analysis (PCA) to examine item validity and to group items into
meaningful clusters. It is observed that, for both the 20- and 50-over matches, five
dimensions are grouped into factor 1 (batting) while three dimensions are grouped into
factor 2 (bowling). The variance explained by factor 1 (batting) is much higher than the
variance explained by factor 2 (bowling). Thus, we conclude that batting capability
dominates bowling capability, which reaffirms the work of Sharma.

REFERENCES:

• https://www.analyticsvidhya.com/blog/2021/06/analyze-cricket-data-with-python-a-hands-on-guide/#h2

• Bailey, M.J. and Clarke, S.R.: Market inefficiencies in player head to head betting on the 2003
cricket world cup. In Economics, Management and Optimization in Sport, S. Butenko, J. Gil-Lafuente
and P.M. Pardalos, editors, Springer-Verlag, Heidelberg, pp. 185-202 (2004).

• Barr, G.D.I. and Kantor, B.S.: A criterion for comparing and selecting batsmen in limited overs
cricket, Journal of the Operational Research Society, 55, pp. 1266-1274 (2004).
