A Technical Seminar Report

on

MACHINE LEARNING-BASED SELECTION OF OPTIMAL SPORTS TEAM
BASED ON THE PLAYERS PERFORMANCE

in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

Submitted by

MOHAMMED NADEEM ISRAR

19B81A3324

DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

CVR COLLEGE OF ENGINEERING


(An Autonomous institution, NAAC Accredited and Affiliated to JNTUH, Hyderabad)
Vastunagar, Mangalpalli (V), Ibrahimpatnam (M),
Rangareddy (D), Telangana- 501 510

DECEMBER 2022

CVR COLLEGE OF ENGINEERING
(An Autonomous institution, NAAC Accredited and Affiliated to JNTUH, Hyderabad)

DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

CERTIFICATE

This is to certify that the technical seminar report titled “Machine learning-based Selection
of Optimal sports Team based on the Players Performance” was submitted by MOHAMMED
NADEEM ISRAR, bearing H.T. No: 19B81A3324, to the CVR College of Engineering as part of
the academic requirements for the award of Bachelor of Technology in Computer Science and
Information Technology during the year 2022-2023.

Technical Seminar Coordinator          Head of the Department

Dr. R. Raja                            Dr. Lakshmi H N

Associate Professor                    Professor & HoD, ET

Department of CSIT

ACKNOWLEDGEMENT

I would like to express my heartfelt thanks to the Head of the Department, Professor
Dr. Lakshmi H N, for her meticulous care and cooperation throughout the Technical Seminar.

I thank Dr. R. Raja, Project Coordinator, and Dr. V. Deepika, Associate Professor and Technical
Seminar Coordinator, for their valuable guidance in the preparation of this seminar report.

I am also thankful and fortunate to have received constant encouragement, support, and
guidance from all the teaching staff of the CSIT Department, which helped me complete this
seminar successfully.

I have a deep sense of gratitude and offer heartfelt thanks to the management for providing
excellent lab facilities.

TABLE OF CONTENTS

Chapter No.   TITLE

1     ABSTRACT
2     INTRODUCTION
3     LITERATURE SURVEY
      3.1 Existing Works
4     OBJECTIVES
5     TOPIC DESCRIPTION
      5.1 Proposed Methodology
      5.2 Algorithms and Techniques
      5.3 Data Description
6     IMPLEMENTATION AND RESULTS
7     CONCLUSION
      REFERENCES

CHAPTER 1
ABSTRACT

This paper is about a model that can select the best playing 11 for the Indian cricket team. The
performance of each player depends on several factors, such as the pitch type, the opposition
team, and the ground. The dataset covers the One Day International (ODI) matches played by
team India over the past several years and was created using data from trusted sites like
espn.com. The method is distinctive in that it gives a 360-degree view of a player's skill set
across batting, bowling, and fielding. A vital part of this model is finding the best all-rounder.
Player performance is divided into several classes, and a random forest classifier is used to
predict the class for each player. The model gives 76% accuracy for batsmen, around 67%
accuracy for bowlers, and 95% for all-rounders. It also includes extra features, such as weather
and matches played, that have not been considered in any existing model. Using this model, the
best team can be selected to play in given conditions.

CHAPTER 2
INTRODUCTION

Cricket is the second most-watched sport on television. Its popularity is soaring in South
Asian countries like India, Pakistan, Bangladesh, and Sri Lanka. One of the major questions
now is which playing 11 should be selected. People of different ages and backgrounds are
zealous fans of cricket because the Indian team has done well in the recent past. However,
much confusion arises before a match about team combinations, i.e., which player to select or
drop for the next game, which batsman should play in which position, or which bowler should
be picked for the upcoming game. Machine learning is used to predict match results in many
different sports [2]. From here, the motivation emerged to evaluate the performance of each
player in a specific match and to select the best playing 11 accordingly using machine
learning [3], as this removes the bias towards any one particular player and benefits our
cricket team. What makes this model unique is that it takes into account whether a player is
an all-rounder. All-rounders, when compared on only one of their attributes, may not get a
place in the team.
As machine learning takes on larger and more complex tasks, providing a sufficient amount of
relevant data has become one of the most crucial parts of the field. For example, the data
records used for data mining are essential for implementing many features, as well as for
developing and running an algorithm that is appropriate for achieving the desired outcome.
The main objective of the proposed model is to find out the performance of the players in the
player pool based on their past records. The model intends to predict the runs a batsman would
score in the next match and the runs a bowler might concede in the next match. The technique
is to apply different supervised learning techniques and compare the accuracy of the
resulting models.
Cricket is now the most popular game in the South Asian subcontinent. People of all ages and
occupations are great fans of cricket because the Indian team has done well in the recent past.
However, much confusion arises before a match about team combination, e.g., which player to
pick or drop for the next match, which batsman should play in which position, or which bowler
should be picked for the next match. From here the necessity and motivation of the work
emerged: to evaluate the players by predicting their performance using machine learning
techniques, as this removes bias from the player selection process and helps our cricket team.

CHAPTER 3
LITERATURE SURVEY

3.1 Existing Works

3.1.1 Player’s Performance Prediction in ODI Cricket Using Machine Learning Algorithms
[1] Aminul Islam Anik et al. proposed using the balls faced, ground, pitch, opposition, and
batting position to predict the performance of a player using SVM. The model analyzes various
factors and their effect on the runs scored by batsmen. The primary factor is the number of
balls faced, but a drawback is that the number of balls a player will face cannot be known
before the match. The paper covers batsmen and bowlers well, but leaves out the analysis of
all-rounders.

3.1.2 CricAI: A Classification Based Tool to Predict the Outcome in ODI Cricket
[2] Amal Kaluarachchi used Bayesian classifiers to predict how factors like home game
advantage affect the outcome of a match. Following this idea, home or away is used here as
one of the parameters that affect a player's performance.

3.1.3 Identifying the Optimal Set of Attributes that Impose High Impact on the End
Results of a Cricket Match Using Machine Learning
[3] Pranavan Somaskandhan et al. analyzed the set of attributes that have a high impact on
the outcome of a game using machine learning. The highest accuracy was obtained with the
attribute combination of high individual wickets, number of bowled deliveries, number of
thirties, total wickets, wickets in the power play, runs in death overs, dots in middle overs,
and number of fours and singles in middle overs. This attribute combination gave an accuracy
of 81% using SVM.

3.1.4 An Analysis of Bangladesh One Day International Cricket Data: A Machine Learning Approach
[4] Md. Muhaimenur Rahman et al. analyzed Bangladesh One Day International cricket data. They
divided the study into three stages, i.e., at the start of the game, after one innings, and
after the fall of wickets. Using a decision tree, they obtained an accuracy of 63.63% at the
beginning of the game, 72.72% and 81.81% in the first and second innings, and 80% and 70% for
the fall-of-wicket analysis.

3.1.5 A DEA model for Selection of Indian Cricket team players
[5] Riju Chaudhari et al. used DEA (Data Envelopment Analysis) to measure the efficiency
of players. The paper uses records of players' performance in Test matches, since factors such
as match-fixing can affect T20 games. Because every player may have to bat in a Test series, a
bowler with a higher batting strike rate is given preference. This paper differs from the
others in that it does not apply machine learning techniques directly and instead uses a
distinct approach.

3.1.6 Bangladesh cricket squad prediction using statistical data and genetic algorithm
[6] Md. Jakir Hossain used a genetic algorithm on 30 players in the Bangladesh cricket team
to select the top players. The paper combines statistical analysis with a genetic algorithm to
choose top-tier players. Each candidate solution out of the 30C14 possible solutions is treated
as a chromosome. Player ratings are computed using a statistical method. The final fitness
value is calculated from factors such as the sum of the players' ratings; the number of bowlers,
batsmen, all-rounders, and wicket keepers; the number of spin and fast bowlers; and the number
of right- and left-handed players.

3.1.7 A survey on team selection in game of cricket using machine learning
[7] Vipul Punjabi and team used a naive Bayes classifier to predict the runs scored by batsmen
and the wickets taken by bowlers. The runs scored and wickets taken are classified into
different categories. The dataset is taken from records of IPL matches. The paper uses
remarkably few input features, which limits the potential of the model.

CHAPTER 4
OBJECTIVES

The objectives of the proposed system are as follows:
• To collect the data from espncricinfo.com.
• To find out the performance of the players in the player pool based on their past
records.
• To find out the best combination of players for the next match.

CHAPTER 5
TOPIC DESCRIPTION

5.1 Proposed Methodology

Data from past ODI matches is used to create the dataset, which is described in detail in
Section 5.3. The performance of batsmen is divided into various classes according to the runs
scored, and the same is done for bowlers according to the number of wickets taken.
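
The class boundaries are not listed in this paper, so the cut-offs in the sketch below are purely hypothetical, as are the names `RUN_BOUNDARIES` and `runs_to_class`. It only illustrates how a runs total could be mapped to a performance class; the same idea would apply to wickets.

```python
from bisect import bisect_right

# Hypothetical cut-offs (assumption): the paper does not state its class boundaries.
RUN_BOUNDARIES = [10, 30, 50, 100]

def runs_to_class(runs, boundaries=RUN_BOUNDARIES):
    """Map a runs total to a class label from 1 to len(boundaries) + 1."""
    # bisect_right counts how many boundaries are <= runs.
    return bisect_right(boundaries, runs) + 1

scores = [0, 9, 10, 45, 72, 130]
classes = [runs_to_class(r) for r in scores]
print(classes)  # [1, 1, 2, 3, 4, 5]
```

With these assumed boundaries, a score of exactly 10 falls into class 2, because each boundary is treated as the lower edge of the next class.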

Fig. 5.1: Proposed Model

The dataset was split into two parts: 80% was used for training the model and 20% was used to
evaluate the results (refer Fig. 5.1). Algorithms such as logistic regression, SVM, and random
forest were applied, and their results are discussed in detail in the following sections of
this paper.
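
As a rough illustration of the 80/20 split, the sketch below shuffles a list of records and holds out one fifth for evaluation. It uses only the standard library; with scikit-learn, `train_test_split(..., test_size=0.2)` serves the same purpose. The function name, the seed, and the stand-in data are assumptions, not taken from the paper.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 innings records (assumption)
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))  # 80 20
```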

5.2 Algorithms And Techniques

5.2.1 Logistic Regression


Logistic regression is usually used for binary classification tasks. In the case of multiclass
classification, the softmax function is used in place of the sigmoid function [8]. The
hypothesis function for logistic regression is given as $g(z) = \frac{1}{1 + e^{-z}}$.

Fig. 5.2.1.1 Graph of g(z) vs z

The plot of g(z) (refer Fig.5.2.1.1) tends towards 1 as z tends to infinity, and towards 0 as
z tends to negative infinity.
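
The limiting behaviour of g(z) can be checked numerically. The following is a minimal standard-library sketch of the sigmoid and of the softmax used in the multiclass case; the function names are illustrative choices, and the branch on the sign of z is only there to avoid floating-point overflow.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() in a safe range
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0))               # 0.5
print(sigmoid(50) > 0.999)      # True: approaches 1 for large z
print(sigmoid(-50) < 0.001)     # True: approaches 0 for very negative z
```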

Fig.5.2.1.2 Cost function vs h(x) when y=0

Fig.5.2.1.3 Cost function vs h(x) when y=0

The cost function can be written in one line as [8]:
$\mathrm{cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
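
To make the cost concrete, here is a small sketch for a single example with label y in {0, 1} and predicted probability h. The function name and the clipping constant `eps` are assumptions, added only to keep the logarithm finite when h is exactly 0 or 1.

```python
import math

def logistic_cost(h, y, eps=1e-12):
    """-y*log(h) - (1 - y)*log(1 - h) for one example."""
    h = min(max(h, eps), 1.0 - eps)  # clip to avoid log(0)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(logistic_cost(0.9, 1) < logistic_cost(0.1, 1))  # True
print(logistic_cost(0.1, 0) < logistic_cost(0.9, 0))  # True
```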

5.2.2 Support Vector Classifier
In a support vector classifier, a hyperplane is used to separate the different classes. Various
kernels can separate non-linear data by mapping it to higher dimensions [9]. Many
hyperplanes might classify the data successfully. One reasonable choice for the best hyperplane
is the one that gives the largest separation, or margin, between the two classes. The
hyperplane is therefore chosen so that its distance to the nearest point is maximized.
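
The margin criterion can be illustrated by comparing two candidate hyperplanes on toy data. This sketch is not how an SVM solver works internally; it only measures the distance from each hyperplane w.x + b = 0 to its nearest point and keeps the larger one. The points, the candidates "A" and "B", and the function name are all hypothetical.

```python
import math

def min_margin(w, b, points):
    """Smallest perpendicular distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in points)

# Toy data (assumption): the first two points belong to one class, the last two to the other.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]

# Both candidates separate the two classes; they differ only in the offset b.
candidates = {"A": ((1.0, 1.0), -6.0), "B": ((1.0, 1.0), -5.0)}

best = max(candidates, key=lambda name: min_margin(*candidates[name], points))
print(best)  # A
```

Hyperplane A sits midway between the two groups, so its nearest point is farther away than for B, which passes closer to the point (2, 2).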

Fig.5.2.2.1 SVM Hyperplane

The solid line (refer Fig.5.2.2.1) represents one of the possible hyperplanes.

5.2.3 Random Forest


Random forest is an ensemble learning method for classification, regression, and other tasks
that combines the results of several decision trees.
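
One simple way to combine the trees' results for classification is a majority vote, sketched below with made-up predictions. This is an illustration of the idea rather than the exact rule used in the paper; scikit-learn's `RandomForestClassifier`, for instance, averages the trees' class probabilities instead of counting hard votes.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical class predictions from five trees for one player.
votes = [3, 5, 3, 4, 3]
print(majority_vote(votes))  # 3
```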
Random forests can also be used to rank the importance of variables in a regression or
classification problem in a natural way. Most batsmen score very few runs, as shown by the
distribution plot of the dataset below (Fig 5.2.3.1).

Fig.5.2.3.1 Distplot of Runs

There is a strong relationship between strike rate and the performance of players. The higher
the result, the better the performance (refer Fig 5.2.3.2).

Fig.5.2.3.2 Joint plot of Result vs Strike Rate

5.3 Data Description

Data Collection: The dataset is built from reputable sites like espncricinfo.com. A CSV file
was created using data from previous matches played by the Indian cricket team. For other
conditions, the summary was used.

Fig. 5.3.1 A part of the Batsman dataset

The datasets: Fig. 5.3.1 is a small sample of the batsmen dataset. One-hot encoding is used to
process this data. The player names and their stats are hardcoded in the backend. After the
stats are entered in the API, the predicted performance of every player is recorded in a
dictionary using a for loop. The dictionary is then sorted in reverse order to give the names
of the top players. Eight opposition teams and 15 grounds are taken into consideration. Below
is a part of the dataset after preprocessing with the pandas library.
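
The two steps described above, one-hot encoding a categorical column and ranking players from a dictionary, can be sketched as follows. The sketch uses plain Python for clarity (with pandas, `pandas.get_dummies` performs the encoding); the opponent list, player names, and predicted classes are invented for illustration.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical opponent column from a few innings records.
opponents = ["Australia", "England", "Australia", "Pakistan"]
categories, encoded = one_hot(opponents)
print(categories)  # ['Australia', 'England', 'Pakistan']
print(encoded[1])  # [0, 1, 0]

# Hypothetical predicted performance classes, stored per player in a dictionary.
predicted = {"Player A": 4, "Player B": 5, "Player C": 2}
ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
top_two = [name for name, _ in ranked[:2]]
print(top_two)  # ['Player B', 'Player A']
```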

Fig.5.3.2 Part of Batsmen dataset

Feature Selection: Previously created models for selecting the optimal team contained
features like Opponents, Runs Scored, Strike Rate, and Overall Average. The following
attributes are considered here to measure a player’s performance.
Batting Attributes: Position, Matches Played, Runs, Strike Rate, Ground, Home/Away, 50s,
100s, Overall Average, Pitch, Opponent, and Weather.
Bowling Attributes: Matches Played, Wickets, Average, Economy, Strike Rate, Ground, Pitch,
Opponent, Weather, and Home/Away.
All-rounder Attributes: Matches Played, Wickets, Runs Scored, Strike Rate, Average, Ground,
Home/Away, Opponent, Weather, and Pitch.
Since the earlier models considered very few features, their accuracy was not good enough.
Hence, a model is developed with extra features such as matches played, pitch, and weather.
The earlier models also considered only batsmen and bowlers when creating a team of 11
players. But if all-rounders are compared directly with bowlers and batsmen, the all-rounders
will always get lower ratings. All-rounders are therefore treated separately while creating
the model to generate an official team.
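
A sketch of how ranking within each role keeps all-rounders from being compared directly against specialists is given below. The role quotas, player names, and predicted classes are hypothetical; the paper does not specify how many players of each role are picked.

```python
def pick_team(predictions, quota):
    """Pick the top-rated players per role, so roles are never ranked against each other."""
    team = []
    for role, count in quota.items():
        pool = [(name, cls) for name, (r, cls) in predictions.items() if r == role]
        pool.sort(key=lambda item: item[1], reverse=True)  # best class first
        team.extend(name for name, _ in pool[:count])
    return team

# Hypothetical (role, predicted class) per player.
predictions = {
    "Bat1": ("batsman", 5), "Bat2": ("batsman", 3), "Bat3": ("batsman", 4),
    "Bowl1": ("bowler", 2), "Bowl2": ("bowler", 4),
    "AR1": ("all-rounder", 5), "AR2": ("all-rounder", 3),
}
quota = {"batsman": 2, "all-rounder": 1, "bowler": 1}  # assumed quotas
team = pick_team(predictions, quota)
print(team)  # ['Bat1', 'Bat3', 'AR1', 'Bowl2']
```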

CHAPTER 6
IMPLEMENTATION AND RESULTS

For the implementation, a Flask API is used. A Jupyter notebook is used to train the model with
the algorithms available in the scikit-learn library in Python. Below is a screenshot of our
implementation and a sample result. The input parameters, which are the same for batsmen,
bowlers, and all-rounders, are taken into consideration.

Fig.6.1: Input display

The sample output on clicking the predict button is given below in Fig.6.2.
Using the scikit-learn library in Python, various algorithms were used to predict the results.
Techniques such as logistic regression, SVM classifier, decision tree, and random forest were
used to predict the classes, of which random forest gave the best results.

Fig.6.2 Output display for all-rounders

When the random forest classifier is used with a test size of 20%, the following results for the
batsmen dataset are obtained.

Fig.6.3 Random Forest Classification Report

The support vector classifier gives the classification report below.

Fig.6.4 Support Vector Classification Report

The logistic regression algorithm gave the worst results.

Fig.6.5 Logistic Regression Classification Report

The work therefore proceeded with the random forest algorithm.

CHAPTER 7
CONCLUSION

The proposed work addresses the issue of selecting the optimal team in cricket without prejudice
and gives equal importance to all-rounders. The method can be implemented successfully as a web
application, using Flask to run our project. The model provides 76% accuracy for batsmen (refer
Fig 7.1), around 67% accuracy for bowlers (Fig 7.3), and 95% for all-rounders (Fig 7.2). These
results were obtained on the 20% of the dataset held out for evaluation.

Fig.7.1: Classification report for batsmen


For batsmen (refer Fig 7.1), the model had the highest accuracy for class 5 and the lowest for class 3.

Fig 7.2: Classification report for all-rounders


For all-rounders (refer Fig 7.2), both the precision and recall were very high, leading to a good
f1-score and a high accuracy of around 96%.
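
For readers unfamiliar with the columns of a classification report, the sketch below computes accuracy and the per-class precision, recall, and f1-score from true and predicted labels. The labels are toy values chosen for illustration and are not the paper's results.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall and f1-score for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (assumption), not taken from the paper's datasets.
y_true = [1, 1, 2, 2, 3, 3, 3, 1]
y_pred = [1, 2, 2, 2, 3, 3, 1, 1]
print(accuracy(y_true, y_pred))          # 0.75
print(class_metrics(y_true, y_pred, 1))  # precision, recall and f1 are all 2/3
```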

Fig.7.3: Classification report for bowlers


This analysis could also be carried out during a game, when the number of dot balls, remaining
overs, number of wickets left, and strike rate are known, which could help the players decide the
position they should play, giving even better results. As these factors can change the game's
outcome within a split second, much further work could be done on such dynamic factors, leading
to a more useful model.

REFERENCES

[1] Aminul Anik, “Player’s Performance Prediction in ODI Cricket Using Machine Learning
Algorithms,” BRAC University, Dhaka, Bangladesh, 2018 4th International Conference on
Electrical Engineering and Information and Communication Technology.
[2] Amal Kaluarachchi, Aparna S. Varde, “CricAI: A Classification Based Tool to Predict the
Outcome in ODI Cricket,” thesis, Montclair State University, Montclair, NJ, USA, 2010 Fifth
International Conference on Information and Automation for Sustainability.
[3] Pranavan Somaskandhan, Gihan Wijesinghe, Leshan Bashitha Wijegunawardana,
Asitha Bandaranayake, and Sampath Deegalla, “Identifying the Optimal Set of Attributes
that Impose High Impact on the End Results of a Cricket Match Using Machine Learning,”
2017 IEEE International Conference on Industrial and Information Systems (ICIIS).
[4] Md. Muhaimenur Rahman, Md. Omar Faruque Shamim, Sabir Ismail, “An Analysis of
Bangladesh One Day International Cricket Data: A Machine Learning Approach,” Computer
Science & Engineering, Sylhet Engineering College, Sylhet, Bangladesh, 2018 International
Conference on Innovations in Science, Engineering and Technology (ICISET).
[5] Riju Chaudhari, Sahil Bhardwaj, Sakshi Lakra, “A DEA model for Selection of Indian
Cricket team players,” 2019 Amity International Conference on Artificial Intelligence.
[6] Md. Jakir Hossain, “Bangladesh cricket squad prediction using statistical data and genetic
algorithm,” 2018 4th International Conference on Electrical Engineering and Information and
Communication Technology.
[7] Vipul Punjabi, Rohit Chaudhari, Devendra Pal, Kunal Nhavi, Nikhil Shimpi, Harshal
Joshi, “A survey on team selection in game of cricket using machine learning,” International
Research Journal of Engineering and Technology, Vol. 6, Issue 11, Nov 2019.
[8] Park, Hyeoun-Ae, “An Introduction to Logistic Regression: From Basic Concepts to
Interpretation with Particular Attention to Nursing Domain,” J Korean Acad Nurs, Vol. 43, No. 2,
April 2013.
[9] C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM
Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1–27, Jan. 2011.
[10] D. C. Montgomery, E. A. Peck, and G. G. Vining, “Introduction to linear regression
analysis,” vol. 821. John Wiley & Sons, 2012.
[11] Raj, J.S., & Ananthi, J.V., “Recurrent Neural Networks and Nonlinear Prediction in Support
Vector Machine,” Journal of Soft Computing Paradigm (JSCP), 1(01), 33-40, 2019.
[12] H. H. Lemmer, “A measure for the batting performance of cricket players: research article,”
South African Journal for Research in Sport, Physical Education, and Recreation, vol. 26, no.
1, 2004.
