Prediction and Analysis of Franchise Cricket
Abstract:- In our present world, sports produce a very large amount of statistical data. What makes cricket different from other sports is the number of variables involved, from the pitch and the playing conditions to the breeze and the length of the boundary line, among many others. These variables make every game distinctive, which may be why audiences have not grown bored of the game even though it is more than 100 years old. From the day analytics was first used in cricket to today, the game has evolved massively: analytics has changed the way players and coaches look at the game and has brought a new dimension to it. The IPL has been a carnival of cricket, showing the potential of cricket to the world and acting as a bridge that carries the game to a wider audience. In the present-day IPL, every available statistic is used because of the high competitiveness of the tournament. This project is a sincere effort to find hidden insights in the IPL by using the data of previous seasons.

I. INTRODUCTION

The Indian Premier League (IPL) is a professional Twenty20 cricket league in India that has captured the imagination of cricket fans worldwide. Founded in 2008, the league has since grown in popularity and has become a lucrative platform for both players and franchises. The IPL is not only a source of entertainment but also a goldmine for data analytics. Franchises are leveraging data analysis to make informed decisions on team selection, strategy, and performance analysis. Data analysis has become an integral part of the IPL, and it has helped franchises gain a competitive edge over their rivals. In this paper, we explore the various uses of data analysis for IPL franchises and how it has impacted the league. The main intention of this analysis is to help franchises build a strong team by providing players' individual statistics over the years in the IPL.

II. LITERATURE REVIEW

In this section, an overview of a few recent research activities is presented in which sports (cricket) data has been used for knowledge and analysis.

Haghighat M., Rastegari H. and Nourafza N. (ACSIJ Advances in Computer Science: an International Journal, 2013) have reviewed data mining techniques for result prediction in sports [1]. They have also evaluated the advantages and disadvantages of the reviewed data mining techniques. The techniques reviewed are classification techniques such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Bayesian methods, decision trees, fuzzy systems and logistic regression. The datasets used were the first 15 weeks of NFL 2003, Rugby League, the ACB Basketball League 2008-9 season, the NBA League 2005-6 to 2009-10 seasons, the last 15 years of Netherlands soccer, and many others. By evaluating the literature, they identified two major challenges: the need for further research to obtain better prediction accuracy, and the lack of general and comprehensive statistics datasets, which forces researchers to collect data from sports websites. They suggested a few ways to improve prediction accuracy: using data mining and machine learning techniques that have yielded good results in other fields, using hybrid algorithms, and including more valid features for prediction.

Several studies have been done on player compensation in various sports. For example, Estenson (1994) studied player compensation in baseball. Likewise, Dobson and Goddard (1998) and Kahn (1992) considered some of the issues in the field of football. There are also studies related to ice hockey (Jones and Walsh, 1988) and basketball (Berri, 1997). There are quite a few studies that deal with scheduling cricket matches. However, there has not been any significant research on player price analysis, which estimates how much money a player can be paid, other than the work by Siddhartha K Rastogi and Satish Y Deodhar. Hedonic price analysis is based on the hypothesis that a good or service can be treated as a collection of attributes that differentiate it from other goods or services.

Pabitra Kumar Dey, Gangotri Chakraborty, Purnendu Ruj, and Suvobrata Sarkar, in the paper "A Data Mining Approach on Cluster Analysis of IPL" [3], used MATLAB to produce a clustering algorithm based on fuzzy logic that classifies the batting statistics of the IPL into a number of clusters. They divide the data into four clusters, and the goal is to determine the grouping in a set of unlabelled data. The criterion for the classification is runs per ball. When the number of clusters is 4, the classification accuracy is 73.48%. They obtained the results for the test set, and the accuracy was measured for the particular machine learning model used to predict the winner of the match. The predictions are made for the different teams of the IPL by analysing every over, so that the winner can be predicted in almost any situation of the match. The accuracy for a selected number of attributes for each team, chosen using feature selection, was also measured. For every model generated, the highest accuracy for a team to win is predicted. Thus,
IJISRT25JAN058 www.ijisrt.com 1
Volume 10, Issue 1, January – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.5281/zenodo.14576672
the graph displays which team has the highest accuracy in each generated model.

Daniel Mago Vistro, Faizan Rasheed, and Leo Gertrude David (International Journal Of Scientific & Technology Research, 2019) have proposed a model to predict the winner of a cricket match using machine learning and data analytics [2]. The machine learning algorithms trained on the datasets are SVM, random forest, Naive Bayes, decision trees and logistic regression. They have also used an XGBoost classifier, a gradient boosting method that computes quickly using tree algorithms. The dataset is IPL data from 2008 to 2017. They use each model's confusion matrix to evaluate its performance. The best features were selected by visualizing attributes of the data against the target variable. The accuracies obtained by the models were: decision tree classifier 76.9%, random forest classifier 80.76%, and XGBoost classifier 94.23%. The prediction produced by their model required a lot of domain information and expertise for observations.

The paper "A MCDM Approach for Evaluating Bowlers Performance in IPL" [6] draws statistics on bowlers in all three forms and ranks their performance using AHP-TOPSIS and AHP-COPRAS. The criteria for ranking are: 1. played at least 3 matches, 2. bowled at least 8 overs, 3. took at least 1 wicket. The first three attributes are negative attributes, since the lower the value of the attribute, the higher the performance of the bowler. The latter three are positive attributes. A good performance in a single match does not establish a bowler's overall ranking in the series, which is why the criteria require that the bowler has played at least 3 matches, bowled at least 8 overs, and taken at least one wicket.

III. SOFTWARE REQUIREMENTS

Practically every field uses Python for a variety of tasks and activities. It supports a variety of programming paradigms, including structured (particularly procedural), functional, and object-oriented. The extensive standard library of the language has earned it the moniker "batteries-included" language. Python gives programmers flexibility and capabilities that improve their efficiency as well as the quality of their code. Python also has a vast collection of libraries that help with the heavy workload. Libraries for machine learning methods include NumPy, Pandas, Scikit-Learn, and NLTK.

IV. METHODOLOGY

The IPL data is sourced from the Kaggle website, which acts as the input source for the analysis. The data extraction process involves retrieving the raw data from the source and storing it in a dataset. The raw dataset is then cleaned to remove any inconsistencies, errors, or missing values. Once the data cleaning is complete, the dataset undergoes pre-processing, where it is transformed and normalized to prepare it for analysis. The processed data is then analyzed using various data mining techniques. The analysis includes visualizations such as charts, graphs, and tables to present the data in a user-friendly manner.

Dataset

The IPL datasets on Kaggle contain data on all IPL matches from 2008 to 2020. The dataset includes information on match details such as venue, date, and winner, as well as ball-by-ball data, player statistics, and team standings. The ball-by-ball data includes information on the deliveries, such as the type of delivery, runs scored, wickets taken, and the player involved. The player statistics include information on batting, bowling, and fielding performance. Cricsheet, on the other hand, provides a more extensive dataset that includes ball-by-ball data for all international cricket matches, including the IPL. The dataset contains information on over 700 T20 matches played in the IPL between 2008 and 2021. It includes detailed information on the deliveries, such as the type of delivery, runs scored, wickets taken, and the fielding events. The dataset also includes information on the players, such as their batting and bowling statistics.

V. RESULT AND EVALUATION

Numerous aspects play a crucial part in cricket. We considered some of the most important and impactful factors and analyzed them thoroughly.

A. Toss Decision vs Result:

The toss in cricket, often perceived as a simple coin flip, plays a significant role in determining the outcome of a match, particularly in the Indian Premier League (IPL). Environmental factors such as dew heavily influence toss decisions. In evening matches, dew creates challenging conditions for bowlers by reducing their grip on the ball, making it advantageous for teams to bowl first. Similarly, the pitch is another critical factor. Its behavior can vary based on location, weather, and even the duration of the match, adding complexity to the decision-making process.

Beyond environmental considerations, team composition and strategy also impact toss decisions. Teams must assess their strengths and weaknesses, as well as those of their opponents, to determine whether batting or bowling first will provide a strategic advantage. While the toss may appear inconsequential at a glance, its implications for team performance and match dynamics are profound. As such, IPL teams devote considerable effort to analyzing these variables, recognizing the pivotal role the toss plays in shaping the trajectory of the game. This study delves into the underlying factors that make the toss a critical element in IPL cricket.
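The toss-versus-result comparison described in subsection A can be sketched in a few lines. The following is a minimal illustration in plain Python, not the code used in this study; the field names (`toss_winner`, `toss_decision`, `winner`) are assumptions modelled on typical IPL match tables, and the four match records are made up purely to show the calculation.

```python
from collections import Counter

# Hypothetical match records (illustrative only, not real IPL results).
# Field names are assumptions modelled on typical IPL match tables.
matches = [
    {"toss_winner": "A", "toss_decision": "field", "winner": "A"},
    {"toss_winner": "B", "toss_decision": "bat", "winner": "A"},
    {"toss_winner": "A", "toss_decision": "field", "winner": "B"},
    {"toss_winner": "B", "toss_decision": "field", "winner": "B"},
]


def toss_win_rates(rows):
    """Return {toss decision: share of matches won by the toss winner}."""
    played = Counter()
    won = Counter()
    for row in rows:
        decision = row["toss_decision"]
        played[decision] += 1
        # Count the match only if the side that won the toss also won the game.
        if row["toss_winner"] == row["winner"]:
            won[decision] += 1
    return {decision: won[decision] / played[decision] for decision in played}


rates = toss_win_rates(matches)
for decision, rate in rates.items():
    print(f"toss winner chose to {decision}: won {rate:.0%} of matches")
```

Grouping by the toss decision rather than by the toss alone keeps the "bat first" and "field first" cases separate, which is the distinction the discussion of dew and pitch conditions relies on.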
Volume 10, Issue 1, January – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.5281/zenodo.14603646
B. Analysis of Phase-wise Run Rate for Team Batting First vs Team Batting Second (Year-wise) (Powerplay: First Two and Next Four Overs vs Middle Overs vs Death Overs):
Let’s break this analysis down year by year to see how teams’ batting strategies have changed over time. When batting first,
teams usually approach the innings in phases: the first two overs, the next four, the middle overs, and the death overs. The first
two overs are about understanding the conditions and adjusting their play. The next four focus on taking advantage of the
powerplay to score quickly. In the middle overs, teams work on building partnerships and keeping the scoreboard ticking. Finally,
the death overs are all about going big, with set batsmen aiming to rack up as many runs as possible. This phased approach helps
us see how strategies have evolved. The run rate, a key metric in the IPL, is a simple but powerful way to measure batting
performance. It’s calculated by dividing the total runs by the overs faced. It not only shows how quickly a team scores but also
gives franchises insights into their overall performance and areas to improve.
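The phase-wise run-rate calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the phase boundaries (overs 1-2, 3-6, 7-15, 16-20) are a common convention assumed here, the `(over, runs, is_legal)` tuple layout is hypothetical, and the sample deliveries are made up. Overs faced are counted from legal deliveries only, so wides and no-balls add runs without adding balls.

```python
def phase_of(over):
    """Map a 1-based over number to a phase (boundaries are an assumption)."""
    if over <= 2:
        return "first_two"
    if over <= 6:
        return "next_four"
    if over <= 15:
        return "middle"
    return "death"


def phase_run_rates(deliveries):
    """Compute run rate per phase for one innings.

    deliveries: iterable of (over, runs, is_legal) tuples (hypothetical layout).
    """
    runs = {}
    balls = {}
    for over, scored, is_legal in deliveries:
        phase = phase_of(over)
        runs[phase] = runs.get(phase, 0) + scored
        balls[phase] = balls.get(phase, 0) + (1 if is_legal else 0)
    # Run rate = runs per over = runs / (legal balls / 6).
    return {p: runs[p] * 6 / balls[p] for p in runs if balls[p] > 0}


# Made-up deliveries, for illustration only.
sample = (
    [(1, 1, True)] * 6      # over 1: six singles
    + [(2, 0, True)] * 6    # over 2: a maiden
    + [(3, 2, True)] * 6    # over 3: two runs off every ball
    + [(16, 4, True)] * 3   # over 16: three boundaries
    + [(16, 1, False)]      # over 16: one wide (run counted, ball not)
)
rates = phase_run_rates(sample)
print(rates)
```

Running the same function separately on first-innings and second-innings deliveries for each season gives the year-wise comparison between the team batting first and the team batting second.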
Run rate also plays an important role during the match, as it helps teams to set or chase a target. A team batting first needs to score at a higher rate to post a challenging total on the board. We analyzed and compared the run rates over the years to help each franchise understand its weaknesses and strengths in the powerplay, middle overs, and final overs.
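For the chasing side, the run rate needed at any point follows directly from the target. The sketch below uses the standard cricket definition of required run rate (runs still needed divided by overs remaining); it is an illustration added here rather than a formula taken from this study, and the function name and example numbers are hypothetical.

```python
def required_run_rate(target, current_score, balls_remaining):
    """Runs per over still needed by the chasing side (standard definition)."""
    if balls_remaining <= 0:
        raise ValueError("no balls remaining")
    runs_needed = max(target - current_score, 0)
    # Convert balls to overs: 6 legal balls per over.
    return runs_needed * 6 / balls_remaining


# Hypothetical chase: target 181, score 91 with 10 overs (60 balls) left.
rrr = required_run_rate(target=181, current_score=91, balls_remaining=60)
print(f"required run rate: {rrr:.2f}")
```

Comparing this figure with a team's historical run rate in the phase still to come indicates whether a chase is on course.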
Figure 6 shows the average run rate of Chennai Super Kings (CSK) over the years. The run rate is good overall, but there is a slight dip in the middle overs, which is the main phase where CSK struggles. They therefore need settled batsmen who can defend as well as hit at least one boundary per over in the middle overs. CSK is one of the most successful teams in the IPL, and its main batting strengths are the powerplay and the death overs, because it has a powerful opening pair and destructive death-overs batting. The middle overs are the most important phase in the IPL because most wickets fall in this phase; if the batting order collapses, the chances of winning come close to zero. Having reliable middle-order batsmen is therefore very important, and it can help the CSK franchise perform better in the tournament.
As Figure 7 shows, Sunrisers Hyderabad (SRH) is failing not only in the middle overs but also in the early death overs. SRH is a franchise that depends mostly on its top-order batsmen and has failed to develop strong middle-order and death-overs batsmen. To overcome this problem, rather than depending only on the top order, they need middle-order batsmen who can produce a decent innings and build a strong partnership with the death-overs batsmen, so that SRH can compete with the opponent while chasing a total. When we compare CSK and SRH, both teams performed almost the same in the powerplay, but the major difference occurs in the middle and death overs, where CSK scores more runs in the death overs than SRH and turns most matches in its favour. In this way we can compare one team with every other team and find the weak links among the franchises.
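A team-against-team comparison of this kind can be sketched as a phase-by-phase difference of run rates. The snippet below is a minimal illustration, not the code behind the figures; the team labels and run-rate values are placeholders chosen only to show the mechanics and are not results from this study.

```python
def phase_gaps(team, reference):
    """Run-rate difference per phase (team minus reference).

    A negative value means the team scores slower than the reference.
    """
    return {
        phase: round(team[phase] - reference[phase], 2)
        for phase in team
        if phase in reference
    }


def weakest_phase(gaps):
    """Phase in which the team trails the reference by the largest margin."""
    return min(gaps, key=gaps.get)


# Placeholder run rates for two hypothetical teams (not real IPL figures).
team_x = {"powerplay": 8.0, "middle": 7.0, "death": 9.0}
team_y = {"powerplay": 8.0, "middle": 7.5, "death": 10.5}

gaps = phase_gaps(team_x, team_y)
print(gaps, "-> weakest phase:", weakest_phase(gaps))
```

Repeating the comparison for every pair of franchises yields, for each team, the phase in which it most often falls behind its rivals.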
VI. CONCLUSION AND FUTURE SCOPE

Studying franchise cricket in the IPL has given us some fascinating insights into how teams perform, how strategies evolve, and how players shine on the big stage. By digging into large datasets with advanced data mining tools, we uncovered patterns and trends that aren’t immediately obvious. These insights help teams make smarter choices about player selection, match tactics, and overall strategies. They also offer fans and analysts a deeper look into the nuances of the game, making cricket even more exciting to follow.

REFERENCES

[9]. Kimber, Alan C., and Alan R. Hansford. “A Statistical Analysis of Batting in Cricket.” Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 156, no. 3, 1993, pp. 443–455. JSTOR, www.jstor.org/stable/2983068. Accessed 28 Feb. 2020.
[10]. Manage, Ananda & Butar Butar, Ferry. (2007). Statistical analysis in one-day cricket. Proc. Amer. Statist. Assoc., 2600-2605.