Journal 4 Final

The document summarizes two academic papers on statistical modeling and decision making in soccer. The first paper presents a Poisson betting model to predict soccer match outcomes and calculate optimal betting amounts. It generates lambda values for goal predictions and uses these to calculate probabilities of matches' results. The second paper models team decision making through Markov processes and evaluates shooting policies to estimate their effects on goals scored. It finds areas where shooting is superior to passing and implications for coaching and tactics.


STAT375: Statistical Aspects of Competition with Dr. McShane
Journal #4 - MITSSAC Judging
Due April 16th, 2021 at 10pm
Sean Dube

“A Poisson Betting Model with a Kelly Criterion Element”

Generating Lambda Values for a Poisson Distribution

Authors Kushal Shah, James Hyman, and Dominic Samangy present “A Poisson Betting Model with a Kelly
Criterion Element for European Soccer.” Their motivation is the varying legality of sports betting across
countries and the opportunity to create statistical models that predict outcomes. They create a model for
European soccer and measure its performance against betting markets to understand whether the model can be used
to generate profits (Kushal Shah 2021). To do this, they predicted the number of goals scored by each team in a
game and used these predictions to create a Poisson distribution that determined the probability of each event
(Win, Lose, Draw). They then used an optimization technique known as the Kelly Criterion to determine the optimal
amount of money to bet on each match of the 2018-2019 seasons across five major European soccer leagues
(Kushal Shah 2021). They developed a way of generating 𝜆 values for the Poisson distributions, where 𝜆 represents
the number of goals each team is expected to score in a game. They use four different models to create different
𝜆 values and tested a model with each 𝜆. The 𝜆 they generated is adjusted by a strength component that depends on
whether the team is home or away. This ensures that 𝜆 factors in 3 major characteristics: it accounts for
the opponent, it accounts for whether the team is playing home or away, and it updates as
the season progresses, becoming more accurate (Kushal Shah 2021). They use the 2018 Premier League matchup
Crystal Palace (H) vs West Brom (A) as an example, and the 𝜆 values created are based on expected goals, linear
regression, and Random Forest regression.

Generating Probabilities for Every Event

$$ P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!} $$
Above is the Poisson formula for the probability of observing x events when 𝜆 events are expected. For example,
if Shah, Hyman, and Samangy wanted to predict the probability of a team scoring 3 goals, x would take the value 3.
To calculate each outcome's total probability, they use the formula above to calculate the probability of every
possible score line from 0-0 to 5-5 (Kushal Shah 2021). Figure 1 is a “Score Line Matrix” that shows the
probability of each score line.
The authors say the total probability for every event is the sum of all the score lines that correspond to that
outcome. That is, the total probability of a Home win is the sum of the probabilities of the score lines in which
the Home team wins. The same methodology is applied to away wins and draws.
Figures 2 and 3 show the distributions of total probability for each model. The authors conclude that an
overwhelming majority of matches have a sum over 0.9. These are from games beginning in match week 5, and they
anticipate the distribution moving even closer to 1 as the season progresses. Hence, the Poisson distribution
with each generated 𝜆 value accurately accounts for each outcome in the game (Kushal Shah 2021).
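
The score line matrix and the outcome sums described above can be sketched in a few lines of Python. This is only an illustration: the 𝜆 values are hypothetical rather than taken from the paper, and the sketch assumes home and away goal counts are independent so that each score line probability is a product of two Poisson probabilities.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x goals when lam goals are expected."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def outcome_probabilities(lam_home, lam_away, max_goals=5):
    """Sum score line probabilities from 0-0 up to max_goals-max_goals."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Assumption: home and away goal counts are independent.
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical lambda values for illustration only.
home_win, draw, away_win = outcome_probabilities(1.6, 0.9)
total = home_win + draw + away_win
print(f"home={home_win:.3f} draw={draw:.3f} away={away_win:.3f} total={total:.3f}")
```

In this sketch the total stays slightly below 1 because score lines with more than five goals for either team are left out of the matrix.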

Model Calibration

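Shah, Hyman, and Samangy used a model calibration technique to determine which model generated the most
accurate 𝜆 value. They achieved this by computing a Brier Score for the predictions using the formula below,
where 𝑝𝑖 is the predicted probability for prediction i, 𝑜𝑖 is the observed outcome (1 if the event occurred,
0 otherwise), and 𝑛 is the number of predictions.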

$$ BS = \frac{1}{n} \sum_{i=1}^{n} (p_i - o_i)^2 $$

Figure 1: Score Line Matrix

Figure 2: Probability Sum of Goals and Expected Goals Model

Figure 3: Probability Sum of Linear Regression and Random Forest Model

Their goal is to measure whether outcomes predicted with a certain probability actually occur in that proportion.
The best performing model would have observed frequencies that match the predicted probability in each interval
(Kushal Shah 2021). Figure 4 is a calibration plot that showcases all 4 models and how well they do in predicting
the home team and the away team to win. The dashed red line is what perfect calibration would look like and
provides a reference for comparison. The model closest to the red line is the best model in terms of predictions.
For both home wins and away wins, the Random Forest is the best model (Kushal Shah 2021).
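
As a rough illustration of the Brier Score and of the comparison behind a calibration plot, the sketch below scores a handful of hypothetical home-win predictions. The probabilities, the outcomes, and the choice of five equal-width bins are assumptions made for the example and do not come from the paper.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def calibration_bins(predicted, observed, n_bins=5):
    """Return (mean predicted, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, o))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            rows.append((mean_p, freq, len(members)))
    return rows


# Hypothetical predictions and results (1 = home team won, 0 = it did not).
predicted_home_win = [0.72, 0.41, 0.55, 0.18]
home_won = [1, 0, 1, 0]

score = brier_score(predicted_home_win, home_won)
rows = calibration_bins(predicted_home_win, home_won)
print(f"Brier score: {score:.5f}")
for mean_p, freq, count in rows:
    print(f"predicted {mean_p:.2f} observed {freq:.2f} (n={count})")
```

A lower Brier Score is better, and a well calibrated model has observed frequencies close to the mean predicted probability in every bin.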

Figure 4: Calibration Models

Bankroll Management System (Kelly Criterion)

The authors describe a Bankroll Management System where the goal is to maximize profit while accounting for
the risk of a lost bet (Kushal Shah 2021). They base this system on the Kelly Criterion, which is given by the
formula below:
$$ B = \frac{(O \cdot P_w) - P_l}{O} $$
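In the standard Kelly formulation, 𝐵 is the fraction of the bankroll to bet, 𝑂 is the net odds received on the
bet, 𝑃𝑤 is the probability of winning, and 𝑃𝑙 is the probability of losing. The authors achieve their goal by
maximizing the expected logarithm of the potential ending bankroll after the bet is placed.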
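
A minimal sketch of the Kelly formula, assuming 𝑂 is the net payout per unit staked and that 𝑃𝑙 = 1 − 𝑃𝑤. The odds and win probability are hypothetical, and the function names are illustrative rather than the authors' implementation. The second function checks numerically that the Kelly fraction gives a higher expected log bankroll than nearby bet sizes.

```python
import math


def kelly_fraction(net_odds, p_win):
    """Fraction of the bankroll to bet: B = (O * P_w - P_l) / O."""
    p_lose = 1.0 - p_win  # assumption: the bet is either won or lost
    return (net_odds * p_win - p_lose) / net_odds


def expected_log_growth(fraction, net_odds, p_win):
    """Expected log of the ending bankroll (starting bankroll = 1)."""
    win_term = p_win * math.log(1.0 + fraction * net_odds)
    lose_term = (1.0 - p_win) * math.log(1.0 - fraction)
    return win_term + lose_term


# Hypothetical bet: a win pays 1.5 units of profit per unit staked, 50% chance to win.
net_odds = 1.5
p_win = 0.5

best = kelly_fraction(net_odds, p_win)
print(f"Kelly fraction: {best:.4f}")
for fraction in (best - 0.05, best, best + 0.05):
    growth = expected_log_growth(fraction, net_odds, p_win)
    print(f"bet {fraction:.4f} -> expected log growth {growth:.5f}")
```

A negative result from `kelly_fraction` means the bet has no positive expected log growth at those odds, so a bettor following this rule would skip it.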

Model Evaluation

Shah, Hyman, and Samangy emphasize that it is important to understand the KCO (Kelly Criterion Objective)
value associated with each bet. They explore the relationship between KCO value and Net Income (profit or loss
from a bet), which is displayed graphically in Figure 5. Overall, the authors find that as the KCO value
increases, the net income of the bet tends to increase (Kushal Shah 2021).

Figure 5: KCO Value vs Net Income for Each Bet

Leaving Goals on the Pitch: Evaluating Decision Making in Soccer

“Leaving Goals on the Pitch: Evaluating Decision Making in Soccer” is presented by Maaike Van Roy, Pieter
Robberechts, Wen-Chi Yang, Luc De Raedt, and Jesse Davis of KU Leuven. They attempt to reconcile the insights
from expected goals by using a nuanced approach to analyze decision making and policies around shooting
in soccer (Maaike Van Roy 2021). They specifically model the probability that a team will attempt to move the
ball to a specific location or shoot, based on the current location of the ball. They use techniques from
probabilistic verification to compare a team's chance of scoring if they shoot from a specific state with the
chance of scoring if they forgo the shot. They also explore the effects of modifying a team's shooting policy from
various locations on the total number of goals that a team would be expected to score over the course of a season
(Maaike Van Roy 2021).

Modeling Team Behavior

To model a team's behavior, the group uses Markov Decision Processes (MDPs). An MDP models how
an agent behaves in a dynamic environment and how the environment transitions between different states based
on which action the agent decides to perform (Maaike Van Roy 2021). First they define the process, which consists
of the state space, the action space, the transition function, the policy, and the reward function. They then
describe learning the MDP, which can be done by using the transitions observed in the data to estimate probabilities.
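
Estimating transition probabilities from observed transitions can be sketched as simple counting. The state names, actions, and observations below are hypothetical and are not the paper's state space. Each probability is the share of times a (state, action) pair led to a given next state.

```python
from collections import Counter, defaultdict


def estimate_transitions(observations):
    """Estimate P(next_state | state, action) from (state, action, next_state) triples."""
    counts = defaultdict(Counter)
    for state, action, next_state in observations:
        counts[(state, action)][next_state] += 1
    probabilities = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        probabilities[key] = {s: c / total for s, c in next_counts.items()}
    return probabilities


# Hypothetical observed transitions for illustration only.
observations = [
    ("edge_of_box", "shoot", "goal"),
    ("edge_of_box", "shoot", "no_goal"),
    ("edge_of_box", "shoot", "no_goal"),
    ("edge_of_box", "shoot", "no_goal"),
    ("midfield", "move", "edge_of_box"),
    ("midfield", "move", "lost_possession"),
]

transitions = estimate_transitions(observations)
for (state, action), outcomes in transitions.items():
    print(state, action, outcomes)
```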

Evaluating Decision Making with Counterfactual Reasoning

In evaluating a team's decision making, the authors consider multiple questions: What should players decide to
do, and what if a team decided to alter its policy? (Maaike Van Roy 2021). They find that evaluating a player's
decision making by weighing the potential benefits of different courses of action is non-trivial. They describe
how the AI field of probabilistic model checking can be a powerful tool for reasoning about the probability of
certain behaviors that arise in an MDP. When determining whether a team should alter its policy, the authors assume
that when the probability of performing an action increases in a state, the probabilities of all other actions
decrease in proportion to their original share (Maaike Van Roy 2021). They also evaluate the effectiveness of a
policy by estimating the number of goals a team is expected to score over a season under the given policy. They
formally state this estimation:

$$ E[\text{goals} \mid \pi] = \sum_{s \in \text{field states}} E[\text{shots in } s \mid \pi] \cdot P(s, \text{shoot}, s_{\text{goal}}) $$
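
The proportional policy adjustment and the expected goals sum can be sketched as follows. All numbers, state names, and action names are hypothetical. In the paper the expected number of shots per state comes from analyzing the MDP, whereas here it is simply supplied as an input.

```python
def adjust_policy(action_probs, action, new_prob):
    """Set one action's probability and rescale the others by their original share."""
    old_prob = action_probs[action]
    if not 0.0 <= new_prob <= 1.0:
        raise ValueError("new_prob must be between 0 and 1")
    if old_prob >= 1.0:
        raise ValueError("no other actions available to rescale")
    scale = (1.0 - new_prob) / (1.0 - old_prob)
    return {a: (new_prob if a == action else p * scale)
            for a, p in action_probs.items()}


def expected_goals(expected_shots, goal_prob):
    """Sum over states of expected shots in the state times P(goal | shot from it)."""
    return sum(expected_shots[s] * goal_prob[s] for s in expected_shots)


# Hypothetical policy in one long-distance state.
policy = {"shoot": 0.10, "move_left": 0.60, "move_right": 0.30}
adjusted = adjust_policy(policy, "shoot", 0.20)
print(adjusted)

# Hypothetical season totals and scoring probabilities per state.
expected_shots = {"long_range": 120.0, "box": 200.0}
goal_prob = {"long_range": 0.03, "box": 0.12}
season_goals = expected_goals(expected_shots, goal_prob)
print(f"expected goals: {season_goals:.1f}")
```

After the adjustment, the two movement actions keep their original 2:1 ratio and the probabilities still sum to 1.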

The authors applied this approach to Premier League soccer teams, focusing on shooting more or less from
distance. Figure 6 shows the effect on the number of expected goals a team would score when uniformly
increasing or decreasing the shooting frequency from the long-distance region (the region where players are more
likely to shoot from distance).

Figure 6: Goal Difference Between a uniform adjustment and the original policy

Conclusions

Overall, the paper proposes a framework for reasoning about decision making and the possible effects of decisions
in soccer by combining techniques from machine learning and artificial intelligence. The authors applied this
framework to explore the effects of shooting from long distance on two different levels (Maaike Van Roy 2021). They
discovered that each team has several specific areas where shooting tends to be a better decision than attempting
to move the ball to get a better chance later. They found several implications and applications for practitioners in
terms of coaching players, preparing for a specific opponent, or setting a team's tactics (Maaike Van Roy 2021).

Routine Inspection: A Playbook for Corner Kicks

Laurie Shaw of Harvard University and Sudarshan Gopaladesikan of SL Benfica team up to explore strategy on
corner kicks. Tracking data has helped them perform a detailed analysis of the synchronized runs made by the
attacking players. By identifying and classifying distinct corner routines, they find those that most frequently
create high-quality scoring opportunities. They say “Using statistical and machine learning techniques, we have
developed tools to classify the coordinated runs made by the attacking players during corner kicks, enabling us
to identify the distinct corner routines employed by teams in tracking data” (Shaw and Gopaladesikan 2021).
They mention tracking data can be used to identify and classify the defensive strategies used by teams to repel
corner kicks. Shaw and Gopaladesikan write “to study defensive strategy in more detail, we have developed a
supervised classification algorithm to identify the roles of individual defenders in corner kick situations” (Shaw
and Gopaladesikan 2021). Shaw and Gopaladesikan compiled 234 matches of tracking and event data from a
season of an elite European professional league. They use event data to provide an initial estimate for the time
at which corners occurred. They had a sample of 1273 corner kicks.

Classifying corner routines

They implement Gaussian mixture modeling to classify attacking player runs into tuples based on their start and
end locations, and a topic model to identify runs that frequently co-occur in corner routines.

Figure 7: Corner Routines

Figure 7 shows two examples of corner routines from their sample. They classify a player run by defining it in
terms of the initial and target location of each player; they don't consider the player's full trajectory. They
allocate players to distinct pairs of zones based on their initial and target locations (Shaw and Gopaladesikan
2021). Runs made by attacking players are coordinated and synchronized. Shaw and Gopaladesikan identify types of
runs that are frequently combined in the same routine.
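
To illustrate how a run can be reduced to a pair of zones, the sketch below assigns a start and a target location to the most likely component of an already fitted Gaussian mixture. It does not fit the mixture, and it simplifies each component to an equally weighted, circular Gaussian. The zone names, centers, spreads, and the coordinate convention (meters from the goal line, meters left or right of the goal center) are all hypothetical.

```python
import math

# Hypothetical mixture components: name -> ((x, y) center, standard deviation).
ZONES = {
    "near_post": ((4.0, -4.0), 2.0),
    "far_post": ((4.0, 4.0), 2.0),
    "penalty_spot": ((11.0, 0.0), 2.5),
    "edge_of_box": ((17.0, 0.0), 3.0),
}


def log_density(point, mean, std):
    """Log density of a circular 2D Gaussian, dropping the shared constant."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return -(dx * dx + dy * dy) / (2.0 * std * std) - 2.0 * math.log(std)


def most_likely_zone(point, zones=ZONES):
    """Name of the component with the highest density at this point."""
    return max(zones, key=lambda name: log_density(point, *zones[name]))


def classify_run(start, target):
    """Describe a run by its (start zone, target zone) pair, ignoring the path."""
    return most_likely_zone(start), most_likely_zone(target)


# Hypothetical run from the edge of the box toward the near post.
run = classify_run(start=(16.0, 1.0), target=(4.5, -3.0))
print(run)
```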

Figure 8: Corner Routines

Figure 8 shows a matrix of thirty features, or frequently co-occurring runs identified in their topic model. They
conclude that distinct corner routines in the dataset can be identified by grouping corners that exhibit similar
feature expressions using agglomerative hierarchical clustering (Shaw and Gopaladesikan 2021).

Practical Applications

Shaw and Gopaladesikan apply their method in various ways. Their methodology enables them to rapidly identify
the key features of the corner strategies used by teams, both offensively and defensively, over hundreds of
matches. They found that a popular strategy used by almost every team in their sample is the jellyfish: three or
four players start in a cluster outside the six-yard box before making gradually diverging runs towards the box
(Shaw and Gopaladesikan 2021). Figure 9 shows examples of popular corner routines employed by four teams in
their dataset.

Figure 9: Corner Routines

They also identify zonal marking in tracking data and analyze the effectiveness of different configurations of zonal
defenders. Figure 10 compares the shots and goals conceded by two different zonal systems.

Figure 10: Two-zone system and four-zone system

They analyzed 366 corner kicks by eight different teams with four zonal defenders and 600 corners by ten different
teams with two zonal defenders. Overall, they found that while the four-zone system conceded slightly more shots
on goal, those shots tended to be from a greater distance than the shots conceded by the two-zone system and
were therefore less threatening (Shaw and Gopaladesikan 2021).

Judgment

Judge Paper/Poster/Presentation #1

“A Poisson Betting Model with a Kelly Criterion Element” demonstrates how a Poisson distribution can be used to
help make bets in European soccer. The presentation is explained well, with good flow and a clear
objective. The authors do a great job of explaining their approach by defining equations and creating visuals to
help the audience understand their method. Their method seems valid, as they use several techniques to account for
conflicting factors, which ultimately strengthens the credibility of their research. The poster is informative,
with a structured format that follows the flow of their research. Graphs are explained well and tables are shown to
explain results. As with the posters of the other papers, there is a lot of small text that can be difficult to
read. I would suggest increasing the font size and making the poster descriptions more concise.

Judge Paper/Poster/Presentation #2

“Leaving Goals on the Pitch: Evaluating Decision Making in Soccer” is a very insightful paper exploring strategy in
professional soccer. The statistics used for its analysis are advanced, and the authors did a great job of explaining
their research. I couldn't clearly tell where they got the data from, so I would advise dedicating a paragraph
to the data source. The poster is informative and structured well, with colorful graphics to
display their findings. I noticed the title of their project is written in a smaller font size than the subtitle
above the graph and matrix. There is a lot of small text that is difficult to read, which can lead to readers not
understanding the objective of the poster. The presentation was easy to follow. The presenter spoke well and
carefully explained their research. I was impressed by how they applied the Markov Decision Process and explained
their reasoning for using this method. They create 375 states in the attacking half and use probability to explore
whether or not a player should shoot. My suggestions are to format the poster with larger text, include fewer
graphics, and try to tell a story in the introduction. Lastly, I did not notice any limitations in the conclusion.
Stating limitations is important to the credibility of a work and helps make the study reproducible.

Judge Paper/Poster/Presentation #3

“Routine Inspection: A Playbook for Corner Kicks” is the best of the three papers. Shaw and Gopaladesikan
effectively explore corner kick strategy using graphics that help visualize their analysis. The paper is well written,
with good flow and the necessary explanations. The poster is structured well with pictures and graphs, but I would
note there is a lot of small text that can be difficult to read on screen. The page is crowded with photos, which
can make the pictures seem unimportant. I do like how they incorporate a video to explain corner kick plays. This
is helpful for those who are not familiar with soccer. The presentation is well done and each speaker effectively
explains the research.

Coda

I enjoyed reading and summarizing these presentations because I got exposure to concepts of soccer analytics
that I had never considered before. Each paper was insightful, and I learned a lot about applying statistical
techniques to certain aspects of the game. My favorite paper was Routine Inspection: A Playbook for Corner Kicks
because of how tracking data was used to analyze corner kick plays. I have a big passion for soccer, and
tracking data has become revolutionary for analyzing soccer and creating video games like FIFA. If possible, I
would like to attempt a project like this where I analyze tracking data for our soccer team and try to determine
which set piece strategies will work well for us. After learning about Markov Chains in previous class activities,
I found the second paper, “Leaving Goals on the Pitch: Evaluating Decision Making in Soccer,” to be a cool
application of Markov Decision Processes. This group looked at whether or not teams should shoot more from long
distance. Long distance shooting is an ability I tend to employ in my game and a tactic that I use when playing
FIFA. I have certainly noticed that long distance shooting can lead to more expected goals. Of course, this all
depends on the quality of the player, and that is what is accounted for when using Markov Decision Processes.

References
Shah, K., Hyman, J., and Samangy, D. (2021), “A Poisson Betting Model with a Kelly Criterion Element for European
Soccer,” MIT Sloan Sports Analytics Conference.
Van Roy, M., Robberechts, P., Yang, W.-C., De Raedt, L., and Davis, J. (2021), “Leaving Goals on the Pitch:
Evaluating Decision Making in Soccer,” MIT Sloan Sports Analytics Conference.
Shaw, L., and Gopaladesikan, S. (2021), “Routine Inspection: A Playbook for Corner Kicks,” MIT Sloan Sports
Analytics Conference.
