Quantifying and Analyzing The Performance of Cricket Player Using Machine Learning
ISSN No:-2456-2165
Abstract:- In cricket, automating learning, analysis, estimation, and prediction is important. Cricket is a sport in high demand, and no one knows who will win the game until the last over. Various factors, including individual and team performances and diverse environmental elements, need to be taken into consideration when planning a game strategy. We therefore decided to create a machine-learning model that analyzes the game using previous match data. For this purpose, we used data analysis and statistical tools to process the data and produce suggestions. The implemented models can help decision makers during cricket games assess a team's strengths against the opposition and against environmental factors. We used sklearn, its preprocessing module, and the label encoder, together with the random forest classifier algorithm, to illustrate the conditions and recommendations for problem solving. Match outcomes can also be predicted from past data: Support Vector Machine (SVM), Naive Bayes, and k-Nearest Neighbor (KNN) are used to classify the match winner, and Linear Regression and a decision tree are used to predict an innings score. The dataset contains extensive data on the previous performance of bowlers and batsmen in matches. Seven features have been identified that can be used for prediction. Based on those features, models are built and evaluated using certain parameters.

Keywords:- Random Forest Classifier, Support Vector Machine (SVM), Naive Bayes, k-Nearest Neighbor (KNN), NumPy, Data Mining, Analysis.

I. INTRODUCTION

    ... makes it a challenge to analyze the outcome of a cricket match accurately. Sports have gained much importance at both national and international levels. Cricket is one such game and is regarded as one of the most prominent sports in the world. The suggested analysis model uses SVM and KNN to meet the objective of the stated problem. The novelty of our work is to analyze runs for each ball, keeping the runs scored by the batsman on the previous ball as the observed data, and to verify whether our prediction fits the desired model.

    Predictive modeling using data science is increasingly important in the world of sports. Cricket is one of the best-known sports in India. Any team, on a given day, has a chance to win the game with its play. This makes it difficult to predict the result of a cricket game accurately. Sports have grown considerably in significance at the national and international levels, and cricket is recognized as one of the most popular sports worldwide. T20 is one of the cricket formats recognized by the International Cricket Council (ICC).

    SVM and KNN are used in the suggested prediction model to achieve the stated goal. Few studies have been done in this area of cricket match prediction. In our study, we found that the work done so far on assessing and forecasting match results is based on data mining. Our approach is new in that we predict the runs for each ball, using the runs the batter scored on the previous ball as the observed data, and then check whether our prediction fits the intended model.
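The workflow named in the abstract (label-encoding categorical match features with sklearn, then fitting a random forest and comparing it with SVM, Naive Bayes, and KNN) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the team names, venues, the 65% home-win rule, and the helper names `make_toy_matches`, `encode_matches`, and `compare_models` are all hypothetical, and the synthetic records stand in for the paper's dataset, which is not reproduced here.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

TEAMS = ["Team A", "Team B", "Team C", "Team D"]  # hypothetical team names
VENUES = ["Venue 1", "Venue 2", "Venue 3"]        # hypothetical venues


def make_toy_matches(n_matches=200, seed=0):
    """Generate synthetic match records (assumption: not real match data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_matches):
        home, away = rng.sample(TEAMS, 2)
        toss_winner = rng.choice([home, away])
        toss_decision = rng.choice(["bat", "field"])
        venue = rng.choice(VENUES)
        # Toy rule: the home side wins about 65% of the time.
        winner = home if rng.random() < 0.65 else away
        rows.append((home, away, toss_winner, toss_decision, venue, winner))
    return rows


def encode_matches(rows):
    """Label-encode every categorical column into integer codes."""
    home, away, toss_winner, toss_decision, venue, winner = (
        list(column) for column in zip(*rows)
    )
    team_encoder = LabelEncoder().fit(TEAMS)  # shared by all team columns
    X = np.column_stack([
        team_encoder.transform(home),
        team_encoder.transform(away),
        team_encoder.transform(toss_winner),
        LabelEncoder().fit_transform(toss_decision),
        LabelEncoder().fit_transform(venue),
    ])
    y = team_encoder.transform(winner)
    return X, y


def compare_models(X, y, seed=0):
    """Fit each classifier on a training split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "svm": SVC(),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


X, y = encode_matches(make_toy_matches())
accuracies = compare_models(X, y)
for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.2f}")
```

Note that label encoding assigns arbitrary integer codes, which distance-based models such as SVM and KNN treat as ordered numbers; one-hot encoding may be a better fit for those models. The accuracies printed here describe only the toy data and say nothing about the paper's results.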
B. SUPERVISED
    Supervised learning is the most popular paradigm for machine learning. It is the easiest to understand and the simplest to put into practice. The task is to learn a function that maps an input to an output from example input-output pairs. The function is inferred from labeled training data made up of a collection of training examples. Each example in supervised learning is a pair consisting of an input object (typically a vector) and a desired output value.
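As a small illustration of learning from labeled input-output pairs, the sketch below implements a k-nearest-neighbor classifier in plain Python. The feature vectors (batting average, strike rate), the labels, and the function name `knn_predict` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_inputs, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training examples, using Euclidean distance."""
    distances = sorted(
        (math.dist(x, query), label)
        for x, label in zip(train_inputs, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training pairs: (batting average, strike rate) -> form label.
train_inputs = [
    (45.0, 140.0), (50.0, 135.0), (48.0, 150.0),
    (12.0, 90.0), (15.0, 100.0), (10.0, 85.0),
]
train_labels = ["strong", "strong", "strong", "weak", "weak", "weak"]

print(knn_predict(train_inputs, train_labels, (40.0, 130.0)))
print(knn_predict(train_inputs, train_labels, (14.0, 95.0)))
```

Each training example is an (input vector, label) pair, and the learned "function" is simply a lookup of the closest stored examples. In practice the features would usually be scaled first so that no single feature dominates the distance.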
Fig. 3: KNN
Fig. 5: Random Forest Classifier

C. UNSUPERVISED:
    Unsupervised learning is a machine learning technique in which the model does not need to be supervised. Instead, the model is left to work on its own to discover structure in the data.

NumPy:
    NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental Python package for scientific computing. Its important features include the following:
• Tools for integrating C/C++ and Fortran code;
• Beyond its obvious scientific uses, NumPy also works well as an efficient multi-dimensional container for generic data.

Pandas:
    Pandas is an open-source library built on top of NumPy. It is a Python package that provides data structures and operations for working with numerical data and time series. It is fast and offers users high performance and productivity, giving the Python programming language high-performance, easy-to-use data structures and data analysis tools. Pandas is used in a wide range of academic and professional fields, including economics, statistics, and analytics.
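To make the roles of the two libraries concrete, the following sketch stores ball-by-ball runs in a NumPy array and summarizes deliveries per batsman with a pandas DataFrame. The numbers, the player labels `P1` and `P2`, and the column names are hypothetical and are not taken from the paper's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical runs for two overs: rows are overs, columns are balls.
runs = np.array([
    [0, 1, 4, 0, 6, 1],
    [2, 0, 0, 1, 4, 1],
])
runs_per_over = runs.sum(axis=1)  # one total per over
total_runs = int(runs.sum())      # total across both overs

# Hypothetical delivery log: one row per ball faced.
deliveries = pd.DataFrame({
    "batsman": ["P1", "P1", "P2", "P1", "P2", "P2"],
    "runs": [4, 1, 0, 6, 2, 1],
})

# Runs scored and balls faced per batsman, plus strike rate
# (runs per 100 balls).
summary = deliveries.groupby("batsman")["runs"].agg(["sum", "count"])
summary["strike_rate"] = 100 * summary["sum"] / summary["count"]

print(runs_per_over, total_runs)
print(summary)
```

The array handles fixed-shape numerical data efficiently, while the DataFrame adds labeled columns and group-wise aggregation on top of it.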
C. Utilizing Machine Learning to Predict Cricket Match Results
    This study identified the factors that affect the result of a match in the Indian Premier League. The home team, the away team, the toss winner, the toss decision, the venue, and the weights of the respective teams are the seven variables that have a substantial impact on the outcome of each IPL match. The points earned by each player are determined by a multivariate regression-based model that takes into account their past performances, which include:
• The number of wickets taken
• The number of dot balls given
• The number of fours hit
• The number of sixes hit
• The number of catches
• The number of stumpings

    The following modules for analysis, prediction, ranking, and visualization can be implemented:
• Overall team performance
• Batsman evaluation and ranking
• Bowler evaluation and ranking
• Match evaluation
• Team evaluation
• Head-to-head evaluation

VI. ADVANTAGES OF THE PROPOSED SYSTEM

    To ensure that predictions are accurate, take the following into account:
• Types of players
• The pitch's condition
• Injured players
• Comparison statistics
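The multivariate regression-based player points model summarized in the survey above (points as a function of wickets, dot balls, fours, sixes, catches, and stumpings) can be illustrated with an ordinary least-squares fit. This is a minimal sketch assuming a plain linear model: the source does not give the actual coefficients, so the values in `ASSUMED_WEIGHTS` are hypothetical and serve only to synthesize toy data.

```python
import numpy as np

FEATURES = ["wickets", "dot_balls", "fours", "sixes", "catches", "stumpings"]

# Hypothetical per-event point values, used only to generate toy targets.
ASSUMED_WEIGHTS = np.array([25.0, 1.0, 2.5, 3.5, 10.0, 15.0])

# Synthetic past-performance table: 30 players, one column per feature.
rng = np.random.default_rng(0)
stats = rng.integers(0, 10, size=(30, len(FEATURES))).astype(float)
points = stats @ ASSUMED_WEIGHTS  # noise-free toy target

# Prepend an intercept column and solve the least-squares problem.
design = np.column_stack([np.ones(len(stats)), stats])
coef, *_ = np.linalg.lstsq(design, points, rcond=None)
intercept, weights = coef[0], coef[1:]

# Predict points for a new (hypothetical) player's past performance.
new_player = np.array([3, 12, 4, 2, 1, 0], dtype=float)
predicted_points = intercept + new_player @ weights

for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:.2f}")
print(f"predicted points: {predicted_points:.1f}")
```

Because the toy targets are generated without noise, the fit recovers the assumed weights; with real match data the coefficients would be estimates and should be validated on held-out matches.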
VIII. FUTURE SCOPE

    This project can be improved in a number of ways in future work.
• The data set could include external variables such as player fatigue, player injury, winning streaks against particular teams, overall winning streaks, and the average runs scored by a team in prior matches against a particular team. We can try to forecast outcomes based on these variables and track how accurate the predictions turn out to be.
• The prediction could take into account the performance of the players in the team, such as the total number of runs scored by a player in the tournament, the player's form guide, and the number of man-of-the-match awards earned, in addition to high-level information about the matches, such as the winner of the toss, the toss decision, and the home team.
• The project currently has no web or mobile application or user interface. A web application could therefore be developed that receives the entire data set as input and outputs the predictions for each match as a PDF or text file.

IX. CONCLUSION

    As can be seen in the Results, we used the Random Forest Classifier algorithm to develop our prediction model and achieved a high accuracy rating. As a proof of concept, we also included two additional algorithms. The first is Multinomial Logistic Regression, which was one of the most frequently used algorithms in earlier prediction models. The second is AdaBoost, which has not previously been used in cricket match prediction models but is less prone to overfitting and is regarded as a good fit for this model. As can be seen from the data, AdaBoost is a good fit, but not the best, for our model that attempts to predict the outcome of IPL and T20 matches, whereas Multinomial Logistic Regression is completely unsuited. Every percentage point gained in the model's accuracy is therefore regarded as crucial. Additionally, because cricket is a game whose outcome depends on a number of variables, we intended to create a model that outperforms earlier models of this kind. As a result, we proceeded with the Random Forest Classifier prediction model. The Support Vector Machine (SVM), Naive Bayes, and k-Nearest Neighbor (KNN) algorithms are also applied to the input data to determine the best performance, and performance metrics are used to compare them. According to these metrics, SVM provides a higher accuracy score on test data than the other two algorithms.

REFERENCES

[1.] Marissa A. Harrison, “College students’ prevalence and perceptions of text messaging while driving”.
[2.] Rabindra Lamsal and Ayesha Choudhary, “Predicting Outcome of Indian Premier League (IPL) Matches Using Machine Learning”, arXiv:1809.09813v5 [stat.AP], 21 Sep 2020.
[3.] Pallavi Tekade, Kunal Markad, Aniket Amage, Bhagwat Natekar, “Cricket Match Outcome Prediction Using Machine Learning”, International Journal of Advance Scientific Research and Engineering Trends, Volume 5, Issue 7, July 2020, ISSN (Online) 2456-0774.
[4.] Ch Sai Abhishek, Ketaki V Patil, P Yuktha, Meghana K S, MV Sudhamani, “Predictive Analysis of IPL Match Winner using Machine Learning Techniques”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-9, Issue-2S, December 2019.
[5.] Shubhra Singh, Parmeet Kaur, “IPL Visualization and Prediction Using HBase”, Procedia Computer Science 122 (2017) 910–915.
BIOGRAPHIES:
Dr. Chaitanya Kishore Reddy. M is currently working as a Professor and Dean in the Department of Information Technology at NRI Institute of Technology, Pothavarappadu, Agiripalli, Krishna (dist.), India. He received his Ph.D. in Computer Science and Engineering and his M.Tech in Computer Science and Engineering from Jawaharlal Nehru Technological University, Kakinada. He has published 40 research papers in various national and international journals and international conferences. He is a member of ISTE, CSI, and IAENG. His research areas are Mobile Ad-hoc Networks, IoT, and Cloud Computing.

SK. Arshiya Mobeen is currently pursuing a B.Tech with a specialization in Information Technology at NRI Institute of Technology. She has done a summer internship on cricket match analysis.

P. Mounika Sridevi is currently pursuing a B.Tech with a specialization in Information Technology at NRI Institute of Technology. She has done a summer internship on cricket match analysis.

U. Nithin Kumar is currently pursuing a B.Tech with a specialization in Information Technology at NRI Institute of Technology. He has done a summer internship on cricket match analysis.