Article info

Keywords: Sports forecasting; Football; Betting; Rating; Ranking

Abstract

The paper presents a model for forecasting the results of football matches, which takes into account the abilities of the players on each team. The advantage of this approach is that the dynamic nature of team strengths is incorporated into the model directly. We test our model against the bookmaker's predictions and in a Kelly-type betting strategy applied to the pre-match win/draw/loss market. The new model results in significant positive returns to betting.
© 2023 The Authors. Published by Elsevier B.V. on behalf of International Institute of Forecasters. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1016/j.ijforecast.2023.03.002
B. Holmes and I.G. McHale International Journal of Forecasting 40 (2024) 302–312
paper with a summary of our findings and thoughts for future work.

2. Recent literature

Since Maher (1982) and Dixon and Coles (1997), many published models for forecasting the scorelines (and/or results) have continued to use the same basic specification. Team attack and defence strengths are estimated, and a team's attack strength interacts with the opposing team's defence strength and vice versa. The beauty of this specification cannot be overstated. It represents the reality of football: the attackers on one team interact with the opposition's defending players. Boshnakov et al. (2016) follow the lead of Maher (1982) in their model specification and use a bivariate Weibull count distribution as the underlying probability distribution for the counts of goals.

The Elo rating system, which has been used to model football (see, for example, Hvattum and Arntzen (2010)), estimates team strengths based on previous results and includes a method for updating the team strengths as new results are recorded. The pi-ratings of Constantinou et al. (2012) and the GAP ratings of Wheatcroft (2020) follow similarly, in that team ratings are updated as new information is recorded.

Following the 'Soccer Prediction Challenge' (Dubitzky et al., 2019), a flurry of papers adopting machine learning techniques was published. Berrar et al. (2019) 'won' the competition with an ensemble of gradient-boosted trees. But perhaps most noteworthy is their conclusion that incorporating domain knowledge in forecasting models for football is a key driver of forecasting success. Hubacek et al. (2019) came a close second using a combination of rating models for teams, including pi-ratings, Elo and Google PageRank. Constantinou (2019) performed well using the pi-ratings, as did Tsokos et al. (2019), who used a Poisson model with scoring intensities allowed to vary with time according to an INLA process. In our round-up of machine learning models, we mention da Costa et al. (2021), who estimated the probability of both teams to score using machine learning classifiers but notably used team-level variables.

Despite the efforts of the machine learning community, the marginal gains in terms of predictive accuracy are limited. For example, the best-performing model in the 'Soccer Prediction Challenge' achieved an accuracy of 53.88%, whereas the worst (of the serious entries) had an accuracy of 50.49%. As Berrar et al. (2019) stated, domain knowledge is a key driver to success, and machine learning algorithms have the unattractive property of not representing reality. Like the models of Maher (1982) and Dixon and Coles (1997), our model has the attractive property of representing the reality of how football is played.

Despite the clear benefits, there have been few attempts to utilise a player-based model for forecasting football match results. Kharrat (2016) and Lasek (2019) use player ratings from the popular FIFA video game franchise in forecasting models. Arntzen and Hvattum (2021) utilised the difference in the simple average of the regularised plus-minus player ratings on the two teams as the basis of a forecasting model. Peeters (2018) does not use player ratings to forecast match results. Instead, they use a simple average of the crowd-sourced player transfer valuations from Transfermarkt.com for the two teams and find that their predictions outperform the team-based rating model they use. A major contribution of the model we propose here is that we do not use simple averages of player ratings on the two teams as the basis for generating the forecasts. Instead, we build a model to mimic how each player on one team interacts with each player on the opposition team.

3. Data

The data requirements of our model are non-trivial, which would be the case for any player-based model. Three required data sources cover the player ratings, match event, and odds data. Each data set was obtained for all seasons from 2013/14 to 2020/21. All processing of data and subsequent modelling was performed using the R programming language (R Core Team, 2022).

3.1. WhoScored player matchday ratings

Our modelling framework requires individual player ratings of the players on the pitch (the line-ups and identity of players on each team are announced a minimum of 30 minutes before a match and often known well in advance) as inputs into a model for the results of matches. We collected match performance ratings published by WhoScored.com. In total, we used 1,505,177 individual ratings attributed to 24,167 unique players. These ratings spanned 14/07/2013 to 31/05/2021.

3.2. Match event data

Match event data describes all of the actions (shots, passes, tackles, interceptions, etc.) within a match and is becoming more commonplace in football literature. We use such event data as part of a series of multinomial models to estimate the level of interaction between two opposing players (which will be introduced in Section 5). InStat provided the match event data.

For the multinomial models, we required all defensive actions: aerial duels, blocks, ground duels, interceptions, and all shots. In addition, we required the playing positions of players and the formations of both teams. We ensured that position or formation changes during a match were accounted for.

The final forecasting model uses matches within the top five European leagues from seasons 15/16 to 20/21 (this will be discussed further in Section 6.1). The multinomial models are based on actions within the top five leagues from 13/14 to 14/15 to ensure our predictions are fully out-of-sample. The final dataset included 764,712 defensive actions and 108,286 shots.

Match information such as the results, scoreline, identities of teams and players, and formations of the teams was also provided within the InStat data. Further, details of any changes in team formation occurring during a match, and the timing of the change, were recorded.
Fig. 1. Histogram of the raw WhoScored ratings achieved by all players. Players are separated by whether they started the match or not.
Table 1
Top ten WhoScored average ratings achieved by players over a two-year rolling window (RWS). Players must have made at least ten appearances and can only appear once in the table.
Player | Date | Team | League | RWS
Neymar | 09/04/2019 | PSG | France Ligue 1 | 8.821
Lionel Messi | 19/12/2018 | Barcelona | Spain LaLiga | 8.698
Hakim Ziyech | 12/02/2020 | Ajax | Netherlands Eredivisie | 8.429
Carlos Vela | 20/08/2020 | Los Angeles FC | USA Major League Soccer | 8.394
Cristiano Ronaldo | 02/09/2016 | Real Madrid | Spain LaLiga | 8.240
James Tavernier | 07/12/2020 | Rangers | Scotland Premiership | 8.174
Kylian Mbappé | 05/12/2020 | PSG | France Ligue 1 | 8.149
Robert Lewandowski | 23/05/2021 | Bayern | Germany Bundesliga | 8.098
Luuk de Jong | 19/09/2019 | Sevilla | Spain LaLiga | 8.073
Zlatan Ibrahimovic | 21/02/2017 | Man Utd | England Premier League | 8.026
3.3. Odds data

Finally, historical betting odds were used to test the predictive capabilities of our forecasting model. Home-win, draw, and away-win odds were obtained for Bet365 (a bookmaker) from football-data.co.uk.

4. Player ratings: League-adjusted WhoScored ratings

Before describing the modelling framework, we present the player ratings system we use as the forecasting model's input.

The basis of our player-ratings model (which will be described in more detail in Section 5 and Section 6) is the matchday ratings published by WhoScored.com. WhoScored publishes performance ratings for every player within a match based on their in-match actions. Although the methodology for calculating the ratings is not fully in the public domain, the general, top-level concept is described on the WhoScored website.[1] To summarise, a player in a match starts with a rating score of 6. As the game progresses, a player receives points for actions deemed to impact team performance positively and is penalised for actions that have been judged to have a negative impact on the team. Players can earn a maximum match performance score of 10. There have been 1084 instances where a player has received a 10. The famed "MSN" trio, Lionel Messi, Luis Suárez and Neymar, hold the records with 52, 20, and 22 perfect ratings, respectively. The unfortunate recipient of the lowest recorded rating is Oier Olazábal, who was the goalkeeper for Granada when they lost 9-1 to Real Madrid in 2015, resulting in a score of just 1.89. The resultant ratings are popular amongst fans and the media. Fig. 1 displays the histogram of ratings achieved by players. We separate players by whether they started in the match or came on as a substitute.

[1] See https://www.whoscored.com/Explanations

By taking an average of the WhoScored match performance ratings for a player, one obtains an idea of how good the player is and how they might be expected to perform in future matches. Let a player's raw WhoScored rating (RWS) be the average rating they achieved over two years. Table 1 shows the highest RWS ratings of individual players throughout the data.

Although some high-profile and widely accepted top players offer reassurance that the WhoScored ratings are meaningful, there are some unexpected names in Table 1 (namely: Carlos Vela, James Tavernier, and Luuk de Jong). This highlights three potential problems with the WhoScored ratings and taking a simple average of the match performance ratings. First, it appears that the methodology does not adjust the match performance ratings for the league's quality (the quality of the players within a league) such that performances in different leagues are not directly comparable. As such, an adjustment to the WhoScored player ratings is needed to account for the strength of the league and the players within that league.

Second, some players appear in a small number of games. Taking a simple average of their performance ratings to estimate how they can be expected to perform in the future is likely to result in volatile, unrealistically high or low average ratings. Third, it is likely that more recent performances by a player are more relevant to how that player is expected to play than performances further in the past.

These problems can be addressed in a regression model, which we use to generate 'adjusted WhoScored ratings' (AWS). The dependent variable equals the 'raw' WhoScored match performance rating. The covariates include dummy variables for the player and league and a home indicator to allow for home advantage. To be explicit, suppose we observe $y_1, \ldots, y_N$ WhoScored ratings. For observation $i$, let $p(i)$ denote the player who achieved that rating, let $l(i)$ denote the league it was achieved in and let $h(i)$ indicate whether the player was competing at their home ground. Then our model can be written as

$y_i = \alpha_0 + h(i)\,\alpha_1 + \beta_{p(i)} + \gamma_{l(i)} + e_i,$ (1)

where $e_i \sim N(0, \sigma^2)$ and $\beta$ and $\gamma$ are the estimated ratings of each player and league, respectively, whilst $\alpha_0$ is the intercept, and $\alpha_1$ represents a home advantage parameter.

To account for players with small numbers of games, we shrink the ratings towards the average rating by adding 'fake' games. In these fake games, we assume a player competed in a match within their current league. They receive a rating equal to the average within their current league, and we set the home indicator to 0.5 (which means these games are effectively played at a neutral venue). By increasing the weight of these pseudo-observations, we can adjust the level of shrinkage. Letting $\omega$ denote the weight, when $\omega = 0$, the model has no shrinkage. As $\omega$ increases, the level of shrinkage increases. This itself is a hyper-parameter that must be tuned.

To account for changing player ability and form, we weight the observations so that match performance ratings further in the past have a smaller effect on the coefficient estimates of the player dummies than more recent match performances. We apply an exponential weighting scheme to observations, as was used by Dixon and Coles (1997) and others since. We include only games played within $\psi$ years before the rating date, where the weight is $\exp(-\phi \cdot t / 3.5)$ and $t$ is the number of days the performance is from the calculation day.

To tune the hyper-parameters $\omega$, $\psi$ and $\phi$, we aimed to minimise the RMSE when predicting a player's future performance throughout the validation data. We found the values that minimised the RMSE were $\psi = 2.00$, $\phi = 0.0062$ and $\omega = 7.00$. The estimated value of $\phi$ is such that matches one year ago have a weight approximately one-half that of the most recent matches when estimating the player's adjusted WhoScored rating.

Table 2 shows the resulting top ten players according to the adjusted WhoScored ratings, which we denote by AWS. Due to the shrinkage, there is no need to set a threshold of ten games for the minimum number of matches a player must have played to appear in the table. The list of players is a who's who of the top footballers, offering strong reassurance that the new adjusted ratings are meaningful. The surprise inclusions in Table 1 have now disappeared from the top 10. Vela and Tavernier were competing in the MLS and Scottish Premiership at the time of their top ratings. De Jong transferred from the Eredivisie on 01/07/2019, shortly before his maximum RWS rating. Consequently, the high RWS ratings achieved by these three players consisted of good performances in relatively easier leagues. The AWS ratings account for the weaker leagues; hence, the adjusted ratings are lower.

Table 2
Top ten Adjusted WhoScored (AWS) ratings achieved by players. There is no minimum number of games required for a player to appear in the table.
Player | Date | Team | League | AWS
Lionel Messi | 19/05/2021 | Barcelona | Spain LaLiga | 1.799
Neymar | 20/02/2018 | PSG | France Ligue 1 | 1.603
Cristiano Ronaldo | 07/08/2015 | Real Madrid | Spain LaLiga | 1.397
Robert Lewandowski | 23/05/2021 | Bayern | Germany Bundesliga | 1.225
Kylian Mbappé | 01/11/2020 | PSG | France Ligue 1 | 1.166
Kevin De Bruyne | 20/01/2021 | Man City | England Premier League | 1.106
Eden Hazard | 10/05/2019 | Chelsea | England Premier League | 1.100
Zlatan Ibrahimovic | 20/08/2016 | Man Utd | England Premier League | 1.091
Hakim Ziyech | 11/03/2019 | Ajax | Netherlands Eredivisie | 1.075
Harry Kane | 21/01/2018 | Tottenham | England Premier League | 1.053

A potential problem with the AWS ratings presented in Table 2 is that forward players dominate it. Indeed, the majority of the top 50 players are forwards. Of course, it may be that the best players in the world are forwards; after all, they attract the highest wages and transfer fees. But it is also possible that the AWS ratings are biased towards forward players. One could use different rating systems in our modelling framework, but as demonstrated by the performance of the forecasting model (see later), the AWS ratings perform well.

As a sense check of the newly adjusted WhoScored ratings, we calculate the average rating of the eleven starting players for each team. Table 3 shows the results. As for the individual players, the identity of the teams making up this table raises confidence that the ratings are meaningful.

An interesting aside to the main topic here is the estimated league strengths. This is an important area of research in itself, as when clubs recruit players from leagues other than their own, it is essential to gauge whether a player will be able to play as well in the new league as they have done in the current league. Fig. 2 shows the estimated league adjustments to player match performance ratings (where the second tier of football in England, the English Championship, is the reference league). It is probably no surprise that the English Premier League is the most difficult league throughout the data. Each match performance rating is worth around 0.25 more than the same score in the English Championship. Another interesting finding is the rise of the
Fig. 2. Plot of the league strengths over time. Note that we plot the negative value of the actual estimate, given a more negative value implies the
league is harder. The reference league is the English Championship.
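The time weighting and shrinkage described in Section 4 can be illustrated with a short Python sketch. It uses the reported values φ = 0.0062, ψ = 2 and ω = 7. The function names, the 365-day year and the reduction of the regression to a single player's weighted mean are assumptions made for illustration only; the paper estimates all players and leagues jointly via Eq. (1).

```python
import math

PHI = 0.0062     # decay rate reported in the text
PSI_YEARS = 2.0  # look-back window in years reported in the text
OMEGA = 7.0      # weight of the 'fake' games reported in the text


def time_weight(days_ago, phi=PHI, psi_years=PSI_YEARS):
    """Weight exp(-phi * t / 3.5) for a rating observed t days before the rating date."""
    if days_ago < 0 or days_ago > psi_years * 365:  # 365-day year is an assumption
        return 0.0
    return math.exp(-phi * days_ago / 3.5)


def shrunk_rating(ratings, days_ago, league_mean, omega=OMEGA):
    """Simplified stand-in for the regression: a time-weighted mean of one
    player's ratings plus a pseudo-observation at the league mean with weight omega."""
    weights = [time_weight(t) for t in days_ago]
    numerator = sum(w * r for w, r in zip(weights, ratings)) + omega * league_mean
    denominator = sum(weights) + omega
    return numerator / denominator


# A match played one year ago carries roughly half the weight of one played today.
one_year_weight = time_weight(365)

# A single 9.0 rating today, shrunk towards a hypothetical league mean of 6.5.
single_game = shrunk_rating([9.0], [0], league_mean=6.5)
```

With these values `time_weight(365)` is about 0.52, in line with the statement that matches one year ago receive approximately one-half the weight of the most recent matches, and a player with one outstanding game is pulled most of the way back to the league mean.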
There are 25 unique outfield positions; thus, 24 logit models associated with the opponent's outfield position are estimated. Each model is estimated relative to a reference category (in our case, the central attacking midfield (CAM) position), and separate coefficients are estimated for each logit model.

For instance, the logit model, which estimates the probability the opponent's right-back (RB) attempts a defensive action, is

$\log\left(\frac{P(\mathrm{pos}_{d_j} = k)}{P(\mathrm{pos}_{d} = \mathrm{CAM})}\right) = \mathrm{int}^{RB} + \beta^{RB}_{\mathrm{pos}_{d_j}} + \beta^{RB}_{\mathrm{form}_{a_j}} + \beta^{RB}_{\mathrm{form}_{d_j}},$ (2)

where $k = 1, \ldots, 25$ is an index for the 25 playing positions. When generating predictions from this model, 25 non-zero probabilities are calculated. However, only ten outfield players can be on the pitch.[2] Consequently, we normalise these ten values to sum to one. The value $p_{i,j}$ ($j \neq \mathrm{GK}$) thus represents the probability that a defensive action against attacking player $i$ will be attempted by defending position $j$, which measures the overall level of interaction between the two positions.

[2] We note that the predicted probabilities of players not on the pitch are extremely small.

5.2. Estimating players' interaction with the opposing goalkeeper

Similarly, we use data on shot events to determine the level of interaction between outfield players and the opposing goalkeepers. This model is needed because goalkeepers do not tend to interact with opponent players in the duel-type events used as the basis for our first model above. This time, the dependent variable is the position of the player who has taken a shot. The position of the defending player is always the goalkeeper, so only the two teams' formations are used as independent variables. As in the previous model, we normalise the ten values associated with players actually on the pitch.

The value $p_{i,\mathrm{GK}}$ represents the probability that a given shot will be attempted by position $i$. This measures the level of interaction between $i$ and the opponent's goalkeeper.

5.3. Examples

Fig. 3 shows the results of these multinomial models for two example cases. The first plot shows how a left striker (LST), playing on a team in a 4-4-2 formation, interacts with each player on the opposition team, also playing a 4-4-2 formation. 22.8% of their interactions are with the right-centreback (RCB), whilst just 2.3% of their interactions are with the left-striker (LST) on the opposing team. We see a 21.1% chance that the left-striker will attempt a given shot, indicating their interaction with the goalkeeper.

The second plot shows that a left-winger (LW) in a 4-3-3 formation has 75.4% of his interactions with the opposing right-back (who is playing in a 4-4-2 formation) and 12.2% of his interactions against the opposition's right midfielder (RM). We note that the LST in a 4-4-2 will interact with the opponent's RST 2.2% of the time. For an LW in a 4-3-3, this increases to 3.1%, suggesting a winger will, on average, play more defensively than a striker, which accords with intuition. Further, there is a 20.8% chance that the left-winger will attempt a shot.

We propose the following metric to measure the difference in two teams' strengths in a match.

$\Delta = \sum_{i} \sum_{j} p_{ij}\,(\mathrm{AWS}_i - \mathrm{AWS}_j)$ (3)

where $\mathrm{AWS}_i$ is the AWS rating of the $i$th player on the home team, and $\mathrm{AWS}_j$ is the AWS rating of the $j$th player on the away team. $p_{ij}$ is the weight estimated from the multinomial models described above. The first summation provides a weighted difference between a player's rating and each of the opposition team's players' ratings. The second summation calculates the first sum for each of the players.

6. Forecasting models

6.1. Data

Having trained our multinomial positional models on the 13/14 and 14/15 data, we use the remaining data (15/16–20/21) for modelling results. This ensures the probabilities generated by the multinomial models are themselves out-of-sample.

As is common practice, we use the first 80% of the 15/16–20/21 data for training and the remaining 20% for testing. The order of the data is maintained so that no leakage occurs.

There are several parameters to tune: the hyper-parameters of the player ratings model, time-weightings in the team-based models we use for comparison (see Section 7.1), and optimal thresholds for betting strategies (see Section 7.2). We split the training set again to optimise these parameters and ensure results are fully out-of-sample, keeping the last 20% as a validation set. Fig. 4 displays these splits graphically.

Consequently, we present fully out-of-sample results in all the experiments reported herein. During the Covid pandemic, football matches were played behind closed doors. We removed these matches from our analysis as the home advantage is known to have been distorted during these games where no fans were present (see, for example, McCarrick et al. (2021)). This leaves us with a sample of 6824 matches between 12th August 2016 and 23rd May 2021. Of this sample, we use the final 20% as testing data.

6.2. Skellam model

Throughout the literature on forecasting in football, there has been considerable focus on estimating the scoring rates of teams. The pioneering idea of Maher (1982) was allowing teams to have separate attack and defence abilities. The scoring rate of the home team is estimated using the home team's attack strength and the away team's defence strength, and vice-versa for the away
Fig. 3. Examples of the player weights for two different scenarios. The defending team is coloured in red.
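As a minimal sketch of how the player weights feed the team-strength difference, the Python below renormalises predicted position probabilities over the positions actually on the pitch and then evaluates Δ = Σ_i Σ_j p_ij (AWS_i − AWS_j) as in Eq. (3). The function names, the dictionary layout and all numbers in the example are hypothetical and are not taken from the paper's data.

```python
def normalise(probs, on_pitch):
    """Keep only the positions on the pitch and rescale them to sum to one."""
    kept = {pos: p for pos, p in probs.items() if pos in on_pitch}
    total = sum(kept.values())
    return {pos: p / total for pos, p in kept.items()}


def delta(weights, aws_home, aws_away):
    """Eq. (3): sum over home player i and away player j of p_ij * (AWS_i - AWS_j)."""
    return sum(
        p * (aws_home[i] - aws_away[j])
        for (i, j), p in weights.items()
    )


# Hypothetical predicted probabilities for who defends against a home left striker.
raw = {"RCB": 0.30, "LCB": 0.10, "CAM": 0.10}
p_lst = normalise(raw, on_pitch={"RCB", "LCB"})  # CAM is not in the opponent's formation

# Hypothetical AWS ratings for one home player and two away players.
weights = {("LST", pos): p for pos, p in p_lst.items()}
team_delta = delta(weights, aws_home={"LST": 1.0}, aws_away={"RCB": 0.5, "LCB": 0.0})
```

In this toy case the striker's rating advantage over the weaker centre-back counts for less than the advantage over the stronger one, because three quarters of the interaction weight falls on the RCB.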
Fig. 4. Plot detailing how data was split during the three main stages of this work. Black indicates the portion of data used for training models,
whilst grey represents data used for testing.
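The nested, order-preserving splits described in Section 6.1 can be sketched in Python as follows. The helper name and the use of a plain list of date-ordered matches are assumptions; the paper's own processing was done in R and is not shown.

```python
def chronological_split(matches, first_frac=0.8):
    """Split a date-ordered sequence into an earlier and a later part without shuffling."""
    cut = int(len(matches) * first_frac)
    return matches[:cut], matches[cut:]


# Stand-in for matches sorted by date (hypothetical identifiers 0..99).
matches = list(range(100))

# First 80% for training (including validation), last 20% held out for testing.
train_val, test = chronological_split(matches)

# The training portion is split again, keeping its last 20% as a validation set.
train, validation = chronological_split(train_val)
```

Because every later block starts after the previous one ends, no information from validation or test matches can leak into the models fitted on earlier data.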
come are easily obtained by summing the relevant goal difference probabilities.

• Model skellamadj removes the player interaction weights, thus using $\Delta^{\mathrm{Start}}_{\mathrm{adj}}$ and $\Delta^{\mathrm{Sub}}_{\mathrm{adj}}$ as covariates.
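The step of summing goal-difference probabilities into match outcome probabilities can be sketched in Python. The sketch assumes the goal difference is the difference of two independent Poisson goal counts (the textbook construction of a Skellam variable) with hypothetical rates `lam_home` and `lam_away`; how the paper links these rates to its covariates is not shown in this section, so that part is left out.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def skellam_pmf(d, lam_home, lam_away, max_goals=40):
    """P(home goals - away goals = d), truncating the away count at max_goals."""
    return sum(
        poisson_pmf(a + d, lam_home) * poisson_pmf(a, lam_away)
        for a in range(max(0, -d), max_goals + 1)
    )


def outcome_probs(lam_home, lam_away, max_diff=25):
    """Home-win, draw and away-win probabilities from the goal-difference distribution."""
    home = sum(skellam_pmf(d, lam_home, lam_away) for d in range(1, max_diff + 1))
    draw = skellam_pmf(0, lam_home, lam_away)
    away = sum(skellam_pmf(d, lam_home, lam_away) for d in range(-max_diff, 0))
    return home, draw, away


# Hypothetical scoring rates for illustration only.
p_home, p_draw, p_away = outcome_probs(1.5, 1.1)
```

The truncation limits are generous for football scoring rates, so the three probabilities sum to one up to negligible numerical error.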
Table 6
Results for several betting strategies using the skellamfull model.
Strategy | t | N | Accuracy (%) | Stakes | Profits | ROI (%) | Sharpe
Kelly | 0.1866 | 556 | 24.10 | 65.36 | 7.81 | 11.96 | 1.07
Kelly | 0.0000 | 1457 | 29.44 | 105.42 | 6.03 | 5.72 | 1.02
Flat | 0.1760 | 199 | 37.19 | 199.00 | 9.04 | 4.54 | 0.45
Flat | 0.0000 | 568 | 45.25 | 568.00 | 16.93 | 2.98 | 0.59
Flat | | 1350 | 52.00 | 1350.00 | −32.56 | −2.41 | −0.87
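As a quick arithmetic check on Table 6, the Python below recomputes the ROI column under the assumption that ROI is total profit divided by total stakes, expressed as a percentage. That definition is not stated in this section, but it reproduces the reported figures to within rounding of the inputs.

```python
# (strategy, stakes, profit, reported ROI %) copied from Table 6.
rows = [
    ("Kelly, t = 0.1866", 65.36, 7.81, 11.96),
    ("Kelly, t = 0", 105.42, 6.03, 5.72),
    ("Flat, t = 0.1760", 199.00, 9.04, 4.54),
    ("Flat, t = 0", 568.00, 16.93, 2.98),
    ("Flat, no threshold", 1350.00, -32.56, -2.41),
]


def roi_percent(stakes, profit):
    """Return on investment as a percentage of the total amount staked."""
    return 100.0 * profit / stakes


# Largest gap between recomputed and reported ROI, in percentage points.
max_gap = max(abs(roi_percent(s, p) - reported) for _, s, p, reported in rows)
```

The largest discrepancy is on the order of 0.01 percentage points, which is what one would expect from stakes and profits that are themselves rounded to two decimals.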
Fig. 5. Plot displaying the ROI that would be achieved using the skellamfull model for betting under the modified Kelly strategy for different minimum
expected value thresholds.
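The role of a minimum expected value threshold can be illustrated with a plain Kelly stake in Python. This is only a sketch: the paper's 'modified' Kelly strategy is not specified in this section, so the code uses the standard Kelly fraction for a single bet at decimal odds, and the function names and example numbers are hypothetical.

```python
def expected_value(prob, decimal_odds):
    """Expected profit per unit staked when the model probability is prob."""
    return prob * decimal_odds - 1.0


def kelly_fraction(prob, decimal_odds, min_ev=0.0):
    """Standard Kelly fraction of bankroll, or zero if the bet's expected
    value does not exceed the threshold min_ev."""
    ev = expected_value(prob, decimal_odds)
    if ev <= min_ev:
        return 0.0
    return ev / (decimal_odds - 1.0)


# Hypothetical bet: model probability 0.50 at decimal odds of 2.20.
stake_no_threshold = kelly_fraction(0.50, 2.20)                    # about 8.3% of bankroll
stake_with_threshold = kelly_fraction(0.50, 2.20, min_ev=0.1866)   # filtered out
```

Raising `min_ev` removes marginal bets such as this one, which matches the qualitative pattern in Fig. 5 of fewer bets being placed as the threshold increases.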
number of bets being placed. For example, Koopman and Lit (2015) placed just 50 bets over two seasons.

We find that only the most basic strategy (flat staking with no value threshold) results in losses. Both Kelly strategies performed well, achieving very promising ROIs and Sharpe ratios. We highlight that these results have been obtained on a large number of bets. Whilst flat stakes with t = 0 and t = 0.1866 both achieve positive returns, the Sharpe ratio is less than 1, indicating more risk than reward.

Fig. 5 shows the relationship between the minimum expected value threshold and the ROI achieved by the skellamfull model under the modified Kelly staking strategy. Also shown is the number of bets placed along the chart's top. The number of bets decreases as the threshold increases, but the ROI increases to very high levels.

8. Closing remarks

In this paper, we have presented a new model for forecasting the results of football matches. The model is a 'player-based' model as opposed to the previously published 'team-based' models of Maher (1982) and Dixon and Coles (1997). We developed a novel rating framework which adjusts publicly available player matchday ratings to ensure comparability across leagues. Further, we introduced multinomial models to account for the level of interaction between two opposing players, knowing that different formations dictate how often a player will compete against a particular opponent.

Player-based models rely heavily on data but solve the major issue with team-based models. There is no need to worry about time-varying team strengths: the mechanism which causes the dynamics is modelled directly, that is, the changing line-ups of the teams and the changing short-term form of the players. Admittedly, the model is data-hungry, but databases of player ratings now exist, and access to these should become increasingly easy in the future.

We have demonstrated the goodness-of-fit of the model. Scoring rules suggest the model performs very well compared to bookmakers. Even when we perform the sternest test of all forecasting models, examining the returns to betting, the results are positive to the extent that we achieve positive returns to betting on the 1X2 market.

Our results have implications in economics studies of market efficiency and the practice of trading in football. For example, the player-based model may reduce, at least to some extent, the reliance of bookmakers on expert traders to adjust predictions from a team-based model in light of information about the actual line-up of players, say when a star player is injured. Currently, traders are typically required to adjust model probabilities subjectively. Our player-based model does this automatically.

Future work on this type of model is promising. One could, for example, model the interactions of players on the same team. Football fans often believe some players play well together and are greater together than the sum of their abilities. A model including some interaction between players on the same team would be able to identify whether this was true. Another area for potential improvement of the model is to use 'better' player ratings. Here we use WhoScored ratings, but these ratings may be weak. For example, there may be a bias towards forward players in the WhoScored ratings (given that the top ten are exclusively forwards). One could even use this model to rate the player ratings themselves. For example, one rating of players is their pass completion percentage. This could be used as the metric feeding the forecasting model (instead of the WhoScored rating), with the model's performance used to measure the usefulness of players' pass completion percentage as a predictor of future team performance. Many player-level metrics could be tested, compared, and rated in this framework for their usefulness.
Lastly, we note the model could be used to develop recruitment tools for football clubs and predict the potential impact a new player might have on a club's results.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Arntzen, H., & Hvattum, L. M. (2021). Predicting match outcomes in association football using team ratings and player ratings. Statistical Modelling, 21(5), 449–470.
Baker, R. D., & McHale, I. G. (2015). Time varying ratings in association football: the all-time greatest team is.. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(2), 481–492.
Berrar, D., Lopes, P., & Dubitzky, W. (2019). Incorporating domain knowledge in machine learning for soccer outcome prediction. Machine Learning, 108(1), 97–126.
Boshnakov, G., Kharrat, T., & McHale, I. (2016). A bivariate Weibull count model for association football scores. International Journal of Forecasting, 33(2), 458–466.
Constantinou, A. C. (2019). Dolores: a model that predicts football match outcomes from all over the world. Machine Learning, 108(1), 49–75.
Constantinou, A. C., Fenton, N. E., & Neil, M. (2012). pi-football: A Bayesian network model for forecasting association football match outcomes. Knowledge-Based Systems, 36, 322–339.
Crowder, M., Dixon, M., Ledford, A., & Robinson, M. (2002). Dynamic modelling and prediction of English football league matches for betting. Journal of the Royal Statistical Society: Series D (the Statistician), 51(2), 157–168.
da Costa, I. B., Marinho, L. B., & Pires, C. E. S. (2021). Forecasting football results and exploiting betting markets: The case of "both teams to score". International Journal of Forecasting.
Dixon, M. J., & Coles, S. G. (1997). Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society. Series C. Applied Statistics, 46(2), 265–280.
Dubitzky, W., Lopes, P., Davis, J., & Berrar, D. (2019). The Open International Soccer Database for machine learning. Machine Learning, 108(1), 9–28.
Hubacek, O., Sourek, G., & Zelezny, F. (2019). Learning to predict soccer results from relational data with gradient boosted trees. Machine Learning, 108(1), 29–47.
Hvattum, L. M., & Arntzen, H. (2010). Using Elo ratings for match result prediction in association football. International Journal of Forecasting, 26(3), 460–470, Sports Forecasting.
Johnstone, D. J., Jones, S., Jose, V. R. R., & Peat, M. (2013). Measures of the economic value of probabilities of bankruptcy. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(3), 635–653.
Kelly, J. L. (1956). A new interpretation of information rate. Bell System Technical Journal, 35(4), 917–926.
Kharrat, T. (2016). A journey across football modelling with application to algorithmic trading (Ph.D. thesis).
Koopman, S. J., & Lit, R. (2015). A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(1), 167–186.
Lasek, J. (2019). New data-driven rating systems for association football (Ph.D. thesis).
Maher, M. J. (1982). Modelling association football scores. Statistica Neerlandica, 36(3), 109–118.
McCarrick, D., Bilalic, M., Neave, N., & Wolfson, S. (2021). Home advantage during the COVID-19 pandemic: Analyses of European football leagues. Psychology of Sport and Exercise, 56, Article 102013.
Owen, A. (2011). Dynamic Bayesian forecasting models of football match outcomes with estimation of the evolution variance parameter. IMA Journal of Management Mathematics, 22, 99–113.
Peeters, T. (2018). Testing the wisdom of crowds in the field: Transfermarkt valuations and international soccer results. International Journal of Forecasting, 34(1), 17–29.
R Core Team (2022). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Tsokos, A., Narayanan, S., Kosmidis, I., Baio, G., Cucuringu, M., Whitaker, G., & Király, F. (2019). Modeling outcomes of soccer matches. Machine Learning, 108(1), 77–95.
Wheatcroft, E. (2020). A profitable model for predicting the over/under market in football. International Journal of Forecasting, 36(3), 916–932.
Wheatcroft, E. (2021). Evaluating probabilistic forecasts of football matches: the case against the ranked probability score. Journal of Quantitative Analysis in Sports, 17(4), 273–287.