
Indian Premier League Player Selection Model Based On Indian Domestic League Performance

The document presents a model for selecting Indian Premier League (IPL) players based on their performance in domestic cricket leagues. It utilizes data mining and machine learning techniques to analyze player statistics from various domestic tournaments to predict their suitability for IPL selection. The study aims to improve decision-making in player selection by providing an explainable model that correlates domestic performance with IPL outcomes.



Conference Paper · January 2021


DOI: 10.1109/CCWC51732.2021.9376011


All content following this page was uploaded by Pritilata Saha on 17 July 2021.



Indian Premier League Player Selection Model
Based on Indian Domestic League Performance
Amlan Ghosh∗, Abhirup Sinha†, Pritam Mondal‡, Anusree Roy§ and Pritilata Saha¶
∗ Department of Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, India
Email: [email protected]
† Department of Computer Science, Paderborn University, Paderborn, Germany
Email: [email protected]
‡ Data & Analytics Department, IDFC First Bank, Mumbai, India
Email: [email protected]
§ Department of Information Technology, Indian Institute of Engineering Science and Technology Shibpur, Howrah, India
Email: [email protected]
¶ Department of Computer Science, Paderborn University, Paderborn, Germany
Email: [email protected]

Abstract: For squad selection in any sports event or tournament, it is necessary to analyse players’ performance at both the international and the domestic level. In cricket, the Indian Premier League (IPL) is one of the most prestigious tournaments in the world. Its teams comprise a mix of Indian and foreign players. Hence, it is of utmost importance that only the better-performing players get selected. In cricket, a player’s performance can be broadly divided into two major categories, i.e. batting performance and bowling performance. Our study proposes a unique approach to mark an Indian player as possibly suitable for the IPL, based on his performance in Indian domestic tournaments.

Index Terms: Sports Analytics, Indian Premier League, Indian Domestic Cricket, Data Mining, Imbalance Learning, Binary Classification

978-0-7381-4394-1/21/$31.00 ©2021 IEEE

I. INTRODUCTION

Cricket is a popular game played between two teams of 11 players each. The game has to be completed within a specific time-frame and is governed by a set of rules and regulations. During the game, the two teams bat and field alternately; each such turn is called an innings. The batting team aims to score as many runs as possible, while the fielding team tries to restrict the scoring. To win the game, a team has to either overtake the opponent’s total runs or restrict the opponent to fewer runs. There are primarily three different formats of cricket played worldwide:

(a) Test Cricket - This is the earliest format of the game. Usually, each match lasts for 4-5 days, and each team gets to play two batting and two fielding innings. In a day, 90 overs are bowled.
(b) One-Day Cricket - As evident from the name, one match lasts only for a day. Each team gets to play one batting and one fielding innings, and each innings can run for a maximum of 50 overs. Thus, one-day cricket matches take a lot less time than test matches.
(c) Twenty-20 Cricket - The twenty-20 or T20 format is the latest format, where a match lasts only for several hours, usually around three, and takes less time than a one-day match. Both teams play one batting and one fielding innings, but an innings lasts for only 20 overs, making the game highly attractive to spectators.

The governing body of world cricket, the International Cricket Council (ICC), has recognised the above three formats at the international and domestic level. In India, the Board of Control for Cricket in India (BCCI), the national governing body for cricket, organises several domestic and professional cricket competitions like “Ranji Trophy”, “Vijay Hazare Trophy”, “Deodhar Trophy”, “Syed Mushtaq Ali Trophy”, “Indian Premier League”, “Karnataka Premier League”, “Tamil Nadu Premier League” etc. in the above three formats only. Among these competitions, the “Indian Premier League” (IPL) has garnered worldwide coverage.

The IPL is a franchise-based professional T20 league in India, started in 2008. There are currently eight teams in the IPL, and a team is composed of Indian and overseas players. A playing eleven can include at most four overseas players, and each squad can have 18-25 players, of whom at most eight may be overseas players. The remaining Indian players are selected based on their performance at domestic and/or national levels. Usually, the Indian player cluster consists of
national players and high-performing domestic players. For the auction, the selectors must choose domestic players who have performed consistently well in Indian domestic leagues such as “Ranji Trophy”, “Vijay Hazare Trophy”, “Deodhar Trophy”, “Syed Mushtaq Ali Trophy” etc. As recent reports [1], [2] suggest that performance in the IPL can help a player get inducted into the Indian national cricket team, players must be chosen decisively from the pool of domestic players, so that they can benefit the national side in the long run. We have therefore created a model that can help in such decision-making tasks, and do so in an explainable manner.

II. LITERATURE SURVEY

A player’s performance plays a vital role in the selection of an efficient team for any game. Several studies have been carried out in the field of sports analytics, related to player performance analysis as well as analysis of team performance. These studies are mostly related to football, basketball etc. Some work is being done in cricket too, but research on T20 cricket is limited.

Chakraborty, Sen and Bagchi introduced a combinatorial auction scheme named “Multi-Round with Price level Feedback” (MRPF). In a combinatorial auction scheme, multiple items can be auctioned simultaneously instead of one item at a time. The MRPF scheme could provide an efficient auction system for player selection in the IPL, though in a few cases MRPF was not a very good option for determining the fees of individual players [3].

Prakash, Patvardhan and Lakshmi analyzed the IPL performance data up to 2015 and T20 performance data of players up to January 2016 to design a performance ranking scheme for batsmen and bowlers in the IPL. This performance ranking scheme was determined using a Random Forest-based recursive feature elimination algorithm. The feature elimination algorithm extracted relevant features like “IPL Consistency”, “T20 Consistency”, “IPL FastScorer”, “IPL ShortPerformance”, “IPL WicketTaker” etc., which could help to select the best 11 players for each team [4]. They also gave a new index-based approach named “Deep Performance Index” (DPI) and used the above-mentioned extracted features to design the DPI. The weighted average of indices like “HardHitter”, “Finisher”, “FastScorer”, “Consistent”, “Economy”, “WicketTaker”, “ShortPerformance” etc. was used to determine the DPIs of the players. According to the authors, DPI could capture performance-related data for players better than other ranking schemes for T20 cricket [5]. Prakash also designed and implemented two genetic algorithms, namely the Simple Genetic Algorithm (SGA) and the Memetic Genetic Algorithm (MGA), to search for and optimise the playing team. In his work, he created a multi-objective optimization model using these genetic algorithms. This model was used to select the best playing team [6].

Santra, Mitra, Sinha and Das gave an alternative approach to bowler ranking in the IPL by evaluating past profiles. They calculated a derived parameter named ‘LW Haul’, combining four and five wicket hauls, and applied linear and polynomial regression for ranking [7]. Building on that, Santra, Sinha, Saha and Das also calculated several derived metrics like ‘SR R’, ‘ACVT’ and ‘Boundary’ for batsmen ranking. ‘SR R’ combines the effect of ‘Strike Rate’ and ‘Runs Scored’. ‘ACVT’ considers the effect of the high score and the numbers of hundreds and fifties. They calculated the ‘Boundary’ parameter as the runs coming from fours and sixes. Regression techniques were applied for valuation and ranking of players [8]. Though their results mostly correlated with the original purple cap and orange cap rankings, they could not address the need for an alternative evaluation method.

Saikia, Bhattacharjee, and Lemmer used a Multilayer Perceptron (MLP) to analyze bowlers’ performance in the IPL. They used the performance data of bowlers in the IPL up to 2010 to train the model. They derived a measure called combined bowling rate (CBR) to quantify the performances of bowlers. They measured the CBR by combining three bowling statistics (bowling average, economy rate and bowling strike rate) using the harmonic mean. They used a categorical response variable, determined from the CBR, to predict the performances of bowlers. They predicted the performances of those bowlers who got selected for the next season of the IPL. This artificial neural network model could help a franchisee decide upon bowlers and their cost for its team [9].

Our aim in this work has been to construct a model that will help to determine an Indian player’s induction into the IPL, based on his domestic cricket performance. Our approach is unique in its choice of features and preprocessing, and no such work has previously been done to predict the IPL selection of Indian players based on domestic league performance. In the process, we have applied supervised machine learning to achieve our objective. We have collected scorecard data of all Indian players who played in major Indian domestic tournaments like “Ranji Trophy”, “Vijay Hazare Trophy”, “Syed Mushtaq Ali Trophy” etc. between 2010 and 2019, and the corresponding IPL squad data from 2011 to 2020. Players were grouped by their roles and were marked for selection accordingly. We trained our model on domestic season data from 2010 to 2018 and used 2019 domestic season data to predict 2020 IPL data. The final results are compared against the actual 2020 IPL squad data, thereby validating our model.

III. MATERIALS AND METHODS

A. Data Collection

The ESPNcricinfo website houses year-wise data on cricket matches happening around the world [10], and it also provides a REST API (Representational State Transfer Application Programming Interface) based framework for data collection. For our work, data collection was to be done in two phases: first, players’ domestic performance data was to be collected, and then the IPL squad information was to be collected.
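The way the two data sources fit together can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors’ code: the row layout, the function names `label_by_next_ipl` and `split_by_season`, and the player names are all hypothetical. It pairs each domestic season with the IPL squads of the following year to obtain a binary label, and then holds out the 2019 domestic season for testing, mirroring the 2010-2018 / 2019 split described above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row types: (player_name, domestic_season_year[, label]).
DomesticRow = Tuple[str, int]
LabelledRow = Tuple[str, int, int]


def label_by_next_ipl(
    domestic_rows: List[DomesticRow],
    ipl_squads: Dict[int, Set[str]],
) -> List[LabelledRow]:
    """Label a (player, season) row 1 if the player appears in the
    IPL squads of the following year, else 0."""
    labelled = []
    for player, season in domestic_rows:
        squad = ipl_squads.get(season + 1, set())
        labelled.append((player, season, int(player in squad)))
    return labelled


def split_by_season(
    rows: List[LabelledRow], test_season: int = 2019
) -> Tuple[List[LabelledRow], List[LabelledRow]]:
    """Train on seasons before test_season; hold out test_season itself."""
    train = [r for r in rows if r[1] < test_season]
    test = [r for r in rows if r[1] == test_season]
    return train, test


# Toy data with made-up player names, for illustration only.
domestic = [
    ("Player A", 2018),
    ("Player B", 2018),
    ("Player A", 2019),
    ("Player C", 2019),
]
squads = {2019: {"Player A"}, 2020: {"Player C"}}

labelled = label_by_next_ipl(domestic, squads)
train_rows, test_rows = split_by_season(labelled)
print(train_rows)
print(test_rows)
```

Splitting by season rather than at random keeps every test row strictly later in time than the training rows, which matches the way the model would be used before an upcoming auction.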
For domestic performance, we have considered match scorecards. We collected scorecards of matches played since 2010 in Indian domestic leagues of different formats, i.e.
• Test Format : “Ranji Trophy”
• ODI Format : “Vijay Hazare Trophy”, “Deodhar Trophy”
• T20 Format : “Syed Mushtaq Ali Trophy”, “Karnataka Premier League”, “Tamil Nadu Premier League”

Data was collected from the website and was obtained in JSON (JavaScript Object Notation) format. This data was parsed, processed and stored in tabular format for our work.

The IPL squad data was needed to determine which player was selected in the IPL season following a domestic season, and in which role. As we could not find any API-based implementation to collect this data from the ESPNcricinfo website, we had to implement an HTML (Hypertext Markup Language) based scraper to extract the information directly from the IPL squad pages and player profile pages. Relevant information since the 2011 IPL season was collected, processed and stored in tabular format for further use.

B. Preprocessing

1) Primary Features: The data for each player were associated with a “Player Type”, i.e. “Batsman”, “Bowler”, “All-Rounder” or “Wicket-Keeper”. We took batting statistics for each innings, i.e. “Batting Position”, “Ducks”, “Runs Scored”, “Balls Faced”, “Dots”, “Fours”, “Sixes”, “Fifties” and “Hundreds”. Those measures were aggregated to get the season-wise statistics for batting performance, i.e. total innings played, average batting position throughout the year, average runs scored, ducks, balls faced, dots, fours hit, sixes hit, fifties and hundreds scored. We also calculated the strike rate for each batsman, as it is a frequently used statistic. We derived a new variable, “batting boundary strike rate”, for the one-day and T20 formats, as the frequency of boundaries in a limited-over match can be considered a good measure of a batsman’s aggressiveness. As we calculated the average runs per innings, we dropped the “total innings count” feature to make the features independent of each other. We also dropped the features “average dots” and “ducks count”, as these were highly sparse.

For bowling statistics, we selected features for each innings like “overs bowled”, “maidens given”, “dots bowled”, “runs conceded”, “fours conceded”, “sixes conceded”, “wickets taken”, “wides given” and “no balls given”. These measures were also aggregated to get the annual performance metrics like total innings count, average overs bowled, maidens given, dots bowled, runs conceded, fours and sixes conceded, wides and no balls given. We also calculated some widely used bowling statistics such as economy rate (runs per over), bowling strike rate (balls bowled per wicket taken) and bowling average (runs conceded per wicket taken). We dropped the averages of “fours conceded” and “sixes conceded” due to high sparsity, and also the “number of innings played” to make the features independent.

2) Tournament Clustering: Various domestic leagues in India run throughout the year, and not every player can play in every league. Such a situation creates a large sparse matrix for training and evaluation purposes. Storing a large sparse matrix requires a lot of memory (space complexity), and performing any operation on it takes a lot of time (time complexity). A sparse matrix can also have a lot of features, which might make a predictive modelling task more challenging. We have therefore clustered the performance data of every player in various tournaments according to their formats, i.e. Test, One-Day and T20. These clusters are created as follows:
• Test Format - Contains performance data of players in the “Ranji Trophy” only.
• One-Day Format - Contains performance data of players in the “Vijay Hazare Trophy” and “Deodhar Trophy”.
• Twenty-20 Format - Contains performance data of players in the “Syed Mushtaq Ali Trophy”, “Tamil Nadu Premier League” and “Karnataka Premier League”.

Such feature-level clustering helped us to reduce sparsity without any data loss and to reduce space complexity and time complexity, as mentioned above. Previously we had 126 feature points, corresponding to batting and bowling statistics for each tournament. After such clustering, we reduced it to 74 effective feature points. Thus, we effectively scaled down our data to a relatively lower-dimensional space. It improved our learning model’s performance and made the model more explainable.

3) Derived Features: While building the model, we came up with some newly derived features that fit best to predict a player for selection in the IPL. We introduced several new features by merging some features from the one-day and test formats, thus forming non-T20 format features. For batting performance, we merged features such as the average number of fifties and hundreds scored and the average number of fours in both formats. For bowling performance, features like wides, no balls and runs conceded were merged to form new features.

4) Dataset Resampling: Our training data consists of 6943 data points. Among these, 5803 players, i.e. around 83.6%, were not selected for any IPL season, and 1140 players, i.e. around 16.4%, got a chance to play in the IPL. It can be seen that our dataset is highly skewed towards the players that do not get selected. Thus, any learning algorithm trained on such an imbalanced dataset will have a higher probability of predicting non-selection of a player in the IPL. This situation led us to balance our dataset.

Resampling and balancing an imbalanced dataset has been seen to improve prediction results on medical datasets like the dataset of cardiovascular patients obtained from Hull and Dundee clinical site data [11]. Mondal, Ghosh, Sinha and Goswami used an oversampling technique on the Yelp restaurant review dataset to determine whether the tone of a given review is “positive”, “negative” or “neutral” [12]. Pelayo and Dick used oversampling to improve upon existing methods of software defect prediction [13].
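The size of the imbalance described above can be checked with a short calculation. The sketch below is illustrative only and uses a hypothetical helper, `imbalance_summary`, which is not part of the paper; it takes the class counts quoted in the text (5803 not selected, 1140 selected) and reports the accuracy that a trivial always-predict-the-majority model would reach, together with the number of extra minority samples needed to bring both classes to the same size.

```python
from typing import Dict


def imbalance_summary(class_counts: Dict[str, int]) -> Dict[str, float]:
    """Summarise a table of per-class sample counts."""
    total = sum(class_counts.values())
    majority = max(class_counts.values())
    return {
        "total": total,
        # Accuracy of a trivial model that always predicts the largest class.
        "majority_baseline_accuracy": majority / total,
        # Extra samples needed so that every class matches the largest one.
        "synthetic_needed": sum(majority - n for n in class_counts.values()),
    }


# Class counts quoted in the text above.
summary = imbalance_summary({"not_selected": 5803, "selected": 1140})
print(summary["total"])                                  # 6943
print(round(summary["majority_baseline_accuracy"], 4))   # 0.8358
print(summary["synthetic_needed"])                       # 4663
```

A model that never predicts selection would already be right about 83.6% of the time on this data, which is why plain accuracy is a weak yardstick here and why balancing the classes is a sensible step before training.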
To balance our dataset, we also performed oversampling. Oversampling means generating new data points for the minority classes. We used the SMOTE (Synthetic Minority Over-sampling Technique) algorithm to balance the dataset, i.e. to generate data points for players who got selected in the IPL. In this technique, a data point of the minority class and one of its nearest minority-class neighbours are taken, and a new synthetic data point is generated as a convex combination of these two points [14], [15]. After applying SMOTE, our resampled dataset was balanced, and it consisted of 11606 data points with 5803 data points in each class.

C. Model Building

Ensemble methods are machine learning methods that produce a better predictive model by combining several base classification or regression models. Bagging and Boosting are the two types of ensemble methods commonly used for prediction purposes. Boosting is the ensemble technique that uses several weak classifiers to form a robust classifier.

Gradient Boosting is one of the widely used boosting algorithms for classification. Gradient Boosting creates the final model using a collection of weaker models. Each weak model on its own fits the data poorly, but combining such models into an ensemble leads to a much-improved result. The individual models are built sequentially: each new model is fitted to the errors left by the models built so far, so that instances that are currently mislabelled receive more attention. In this way, Gradient Boosting tries to learn from past mistakes and make predictions more accurately.

The loss function, i.e. the aggregate of the error between the actual and predicted results, is minimized using the gradient descent method. In each iteration, gradient descent uses the partial derivatives of the loss function to find the direction in which to move towards the model’s optimal parameters. After every training round, the weak learner is gradually improved to become a strong learner that predicts the output.

We used the XGBoost algorithm, an implementation of gradient boosting. XGBoost stands for “Extreme Gradient Boosting”, which is a highly efficient, flexible and portable gradient boosting algorithm [16], [17]. It is an optimized distributed gradient boosting library that provides parallel tree boosting and runs on major distributed environments.

IV. RESULTS

After training the model on the players’ statistics from the 2010 to 2018 domestic seasons and their status of playing in the IPL of the next calendar year, we obtained the optimal parameters to predict whether a player will be selected in the IPL based on domestic cricket performance. We tested the model on the dataset consisting of player statistics from the 2019 calendar year and the players’ selection status for IPL 2020. We obtained the accuracy of the prediction and other metrics like precision, recall and F1 Score from the classification report.

For reference, we labeled the players who were not selected in the IPL as “Class 0” and players who were selected in the IPL as “Class 1”. A confusion matrix is a matrix representing the number of correct and incorrect predictions for each class. The confusion matrix is presented in Table I.

| | Predicted Class 0 | Predicted Class 1 |
|---|---|---|
| Actual Class 0 | 747 | 193 |
| Actual Class 1 | 20 | 84 |

TABLE I: Confusion matrix of predicted selection of players

The overall accuracy of our prediction was 79.59%. The balanced precision, recall and F1 Score are presented in Table II.

| | Precision | Recall | F1 Score |
|---|---|---|---|
| Class 0 | 0.81 | 0.79 | 0.80 |
| Class 1 | 0.79 | 0.80 | 0.80 |

TABLE II: Precision, Recall & F1 Score of our model

V. DISCUSSION

A. Features

The type of a player was a primary feature used to judge a player’s performance. To evaluate players’ batting performance, we selected the features batting position, runs, balls faced, fours, sixes, fifties and centuries. For the evaluation of bowling performance, we selected the features overs, maidens, runs conceded, wickets, wides, no balls, economy rate, bowling average and bowling strike rate. We chose these features as they are the most used and most complete features to represent a player.

We collected the data on a per-match basis and aggregated it per year to get the annual performance of each player. We did not evaluate batting and bowling performances separately; combining both helped the model to predict a player’s selection more precisely. The model learned that a player with only good batting performance (Batsman) or only good bowling performance (Bowler) was as likely to get selected in the IPL as a player with average bowling and batting performances (All-Rounder). As T20 is very restricted in overs, part-time bowlers, tail-enders and all-rounders play a major role in T20 cricket. So fitting the model on the combined bowling and batting performance helped us capture both the batting and the bowling ability of a player.

We clustered the tournament performances based on their formats. This helped a lot in computation by reducing sparsity and dimension. The clustering also helped to extract the performance of players in different formats of cricket. A player may perform poorly or well in various tournaments, but clustering helped us capture the consistency of players in a specified format.

B. Derived Variables

Our model has several new limited-over performance parameters that helped the model capture the ability to perform well in the T20 format. By fitting the whole training dataset into a logistic regression model, the
algorithm classified each data point based on a probabilistic value generated by a regression technique. By calculating the correlation coefficient between the probability value and each continuous variable, we can see how each variable is related to the prediction. The correlation coefficient between two variables always lies between -1 and 1. A correlation coefficient close to 1 or -1 for a feature implies that a player’s selection probability depends strongly and linearly on that feature. Inspecting the correlation coefficients carefully, we saw that the correlation coefficients for some features were very close to each other.

Let $P$ be any variable, and let $X$ and $Y$ be two independent variables. The variances of $P$, $X$ and $Y$ are denoted by $V(P)$, $V(X)$ and $V(Y)$ respectively. Let the correlation coefficient between $P$ and $X$ and the correlation coefficient between $P$ and $Y$ both be $\rho$. Now let $\rho_{(P,X+Y)}$ be the correlation coefficient between $P$ and the new variable $X + Y$ obtained by adding the variables $X$ and $Y$. Then,

$$\left|\rho_{(P,X+Y)}\right| = |\rho| \, \frac{\sqrt{V(P)V(X)} + \sqrt{V(P)V(Y)}}{\sqrt{V(P)V(X) + V(P)V(Y)}} \tag{1}$$

Since $\sqrt{a} + \sqrt{b} \geq \sqrt{a + b}$ for non-negative $a$ and $b$, we can see that $\left|\rho_{(P,X+Y)}\right| \geq |\rho|$.

Thus, to increase the linear relationship between the variables and the selection probability, we merged some variables to derive new variables. While merging, we always considered the similarity of the features. The following features were merged into new features.

• ODI Fours and Test Fours were merged into Non-T20 Fours.
• ODI Fifties and Test Fifties were merged into Non-T20 Fifties.
• ODI Hundreds and Test Hundreds were merged into Non-T20 Hundreds.
• ODI Wides and Test Wides were merged into Non-T20 Wides.
• ODI No Balls and Test No Balls were merged into Non-T20 No Balls.
• ODI Runs Conceded and Test Runs Conceded were merged into Non-T20 Runs Conceded.

The correlation coefficients of the old variables along with the merged variables are shown in Table III. It is evident that the magnitude of the correlation has increased after merging the old variables.

| Feature | ODI | Test | Non-T20 |
|---|---|---|---|
| Fours | −0.53 | −0.52 | −0.62 |
| Fifties | −0.38 | −0.40 | −0.49 |
| Hundreds | −0.25 | −0.26 | −0.32 |
| Wides | −0.18 | −0.18 | −0.22 |
| No Balls | −0.14 | −0.14 | −0.17 |
| Runs Conceded | −0.40 | −0.45 | −0.51 |

TABLE III: Comparison among correlation values of original and derived features used in model building

C. Prediction

Some key metrics were used to assess the accuracy of the classification model. Precision, Recall and F1 Score are some of the main parameters used to evaluate a model’s accuracy.

Recall measures the share of the actual samples of a class that were predicted correctly, i.e. it is the accuracy calculated for each class, and is given as

$$\text{Recall} = \frac{\text{Total number of correct predictions}}{\text{Total number of actual samples}}$$

We achieved recall scores of 0.79 and 0.80 for classes 0 and 1 respectively, with an overall accuracy of nearly 0.80. We can see that our model correctly predicted 79% of the players who were not selected in the IPL and 80% of the players who were selected in the IPL.

Precision measures the correctness of the predictions made for each class. It is given as

$$\text{Precision} = \frac{\text{Total number of correct predictions}}{\text{Total number of predictions}}$$

Our testing data was highly imbalanced. Among the 1044 players who played domestic cricket in the 2019 calendar year, 940 did not get selected in the IPL, whereas only 104 Indian players got selected to play in the IPL. From the confusion matrix, we saw that the precision corresponding to Class 0 was much higher, whereas the precision corresponding to Class 1 was much lower, because of the imbalance present in the dataset. To tackle this problem, we calculated the precision for a balanced dataset. The balanced precision scores we obtained were 0.81 and 0.79 for class 0 and class 1 respectively. We can infer that whenever our model classifies a player as not selected for the IPL, the prediction is correct in 81% of cases. Similarly, when a player is predicted for selection in the IPL, the prediction is correct in 79% of cases.

F1 Score is calculated as a combination of precision and recall. It captures the importance of both precision and recall and is given by the following formula.

$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The balanced F1 Score was 0.80 for each of the two classes. Thus our model performed well in predicting the selection of players.

D. Limitations

There are various tournaments in India played throughout the year in every format. Not all players can play all these tournaments in all formats. In our model, we combined the data from all formats, setting the statistics to zero for a player who did not play tournaments in a particular format. The predictions could be improved by considering the combination of formats played by a player. There can be several such combinations of domestic tournaments for a player. A player can
play tournaments in only one of the Test, ODI or T20 formats. A player can play Test and ODI, Test and T20, or ODI and T20 tournaments. A player can also play all forms of tournaments in a domestic year. Thus, ensembling separate models for these combinations could give a better prediction of players’ selection.

We have not considered the international performance of players while selecting them. Some players might play fewer matches in domestic cricket leagues if they have an international tour. Taking international performance into consideration could have improved the model performance.

In India, not all players from the domestic cricket circuit register for the IPL auction. If, instead of IPL squad data, one could get the list of players who appeared in IPL auctions from 2011 till now, the model could have been trained on that data. In that scenario, the model’s predictions could have been improved.

Improving the dataset quality, i.e. having player selection results with less selector bias, should also improve the model’s prediction success.

VI. CONCLUSION

In every season of the IPL, some franchises retain their valuable players, and other players are bought by the franchises from the auction. Because players are selected for a particular franchise, there is an intuition that the selectors’ bias plays an important role in selecting players. A player might not get selected despite a very good domestic performance throughout the calendar year because of factors like age, or because he is recognised as a good test player but not as a T20 player. Similarly, a player with poor domestic performance can get selected because he is recognised as a promising talent.

Our prediction model here justifies the current selection process. Our overall accuracy of 80% suggests that the selectors’ choices are largely based on the performance of the players in the domestic season. Also, regional selector bias might play an important role in player selection, which restricts our model performance to only 80%.

In the IPL, each year, lots of players register for an auction to get selected. The selectors have to manually judge the performance based on the statistics available, and then they have to make a selection. Our model can give a good prediction for each player considering all of his performances in the domestic circuit, thus reducing the manual effort for the selectors.

Though the study has been carried out using Indian domestic performance data and IPL player data, the adopted methodologies can also be applied, if sufficient data is available, to other T20 leagues like “Big Bash League”, “Caribbean Premier League”, “Bangladesh Premier League”, “Pakistan Super League” etc. or other major T20 tournaments like “Champions League Twenty20”, “T20 World Cup” etc.

REFERENCES

[1] “IPL Form Will Matter for World Cup Selection: BCCI Goes for a Reverse Hit - NewsClick.” [Online]. Available: https://www.newsclick.in/ipl-form-will-matter-icc-world-cup-indian-cricket-team-selection-bcci-goes-reverse-hit
[2] “From Dhoni to Kohli, why skippers prefer their IPL boys in national team - Business Standard News.” [Online]. Available: https://www.business-standard.com/article/current-affairs/from-dhoni-to-kohli-why-skippers-prefer-their-ipl-boys-in-national-team-118022600218_1.html
[3] S. Chakraborty, A. K. Sen, and A. Bagchi, “Combinatorial auctions for player selection in the Indian Premier League (IPL),” Journal of Sports Economics, vol. 16, no. 1, pp. 86–107, 2012.
[4] C. D. Prakash, C. Patvardhan, and C. V. Lakshmi, “Team selection strategy in ipl 9 using random forests algorithm,” International Journal of Computer Applications, vol. 975, p. 8887, 2016.
[5] C. D. Prakash, C. Patvardhan, and S. Singh, “A new machine learning based deep performance index for ranking ipl t20 cricketers,” International Journal of Computer Applications, vol. 137, no. 10, pp. 42–49, 2016.
[6] C. D. Prakash, “A new team selection methodology using machine learning and memetic genetic algorithm for ipl-9,” Int. Jl. of Electronics, Electrical and Computational System IJEECS ISSN, 2016.
[7] A. Santra, A. Mitra, A. Sinha, and A. K. Das, “Prediction of Most Valuable Bowlers of Indian Premier League (IPL),” in Data Management, Analytics and Innovation, N. Sharma, A. Chakrabarti, V. E. Balas, and J. Martinovic, Eds. Springer Singapore, 2021, pp. 211–223.
[8] A. Santra, A. Sinha, P. Saha, and A. K. Das, “A novel regression based technique for batsman evaluation in the indian premier league,” in 2020 IEEE 1st International Conference for Convergence in Engineering (ICCE), 2020, pp. 379–384.
[9] H. Saikia, D. Bhattacharjee, and H. H. Lemmer, “Predicting the performance of bowlers in IPL: An application of artificial neural network,” International Journal of Performance Analysis in Sport, vol. 12, no. 1, pp. 75–89, 2012.
[10] “Check Live Cricket Scores, Match Schedules, News, Cricket Videos Online - ESPNcricinfo.com.” [Online]. Available: https://www.espncricinfo.com/
[11] M. M. Rahman and D. N. Davis, “Addressing the Class Imbalance Problem in Medical Datasets,” International Journal of Machine Learning and Computing, vol. 3, no. 2, pp. 224–228, Apr. 2013.
[12] P. Mondal, A. Ghosh, A. Sinha, and S. Goswami, “A Study of Interrelation Between Ratings and User Reviews in Light of Classification,” in Advances in Intelligent Systems and Computing, vol. 937, 2020, pp. 689–697.
[13] L. Pelayo and S. Dick, “Applying novel resampling strategies to software defect prediction,” in NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society, 2007, pp. 69–72.
[14] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, Jun. 2002.
[15] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning,” Journal of Machine Learning Research, vol. 18, no. 17, pp. 1–5, 2017.
[16] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[17] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794.
