Football Match Winner Prediction
International Journal of Computer Applications (0975 – 8887)
Volume 154 – No.3, November 2016
obtained after comparing all the models proposed in this paper [2].

Ben Ulmer and Matthew Fernandez predicted soccer match results in the English Premier League. They used several machine learning classifiers, namely a linear classifier trained with stochastic gradient descent, Naïve Bayes, a hidden Markov model, a Support Vector Machine (SVM) and Random forest. The accuracy of each model was calculated to find the best approach. They noted that the results of the first few matches could not be predicted due to the lack of data regarding the form of the team. Of all the methods compared, SVM showed the best result, with 40% - 52% accuracy [3].

Finally, the points from the 5 most recent matches are added to generate a collective form.

Two main aspects of a football game are attack and defense. Thus, comparing these two quotients for the two teams gives us an intuition about which team is better both attack-wise and defense-wise. The attacking quotient is computed using the following features: shots on target and the shots on target/goals ratio. These two features signify how good the team is in terms of attack. The defense quotient is computed using the features successful tackles and intercepted passes. These signify the strength of the defense.

After feature selection and computation, the next task is to select the classifier to be used. Initially we used Logistic regression to classify the data set; however, it classified only 2 classes and not the 3rd one.

4. EXPECTED OUTCOME

We collected data from various websites and data sources using different scraping tools. We generated a mathematical model to represent the data in the format required by the algorithms. The dataset was then divided in the ratio 80:20 (training:testing). We achieved 49.37% accuracy using the Logistic regression algorithm; the confusion matrix is given in Table 2.
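The 80:20 division of the dataset can be sketched as follows. This is a minimal illustration in Python, assuming a random split with a fixed seed; the paper does not state whether the split was random or chronological. The function name `split_80_20` and the toy match records are hypothetical.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80:20 ratio."""
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    cut = len(shuffled) * 8 // 10              # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (match_id, outcome); real rows would carry the
# form, attack and defense features described earlier.
matches = [(i, ("win", "loss", "draw")[i % 3]) for i in range(10)]
train, test = split_80_20(matches)
print(len(train), len(test))
```

A chronological split (training on earlier matches and testing on later ones) would be an equally plausible reading of the text; in that case the shuffle step would be replaced by a sort on match date.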
Table 2. Confusion Matrix of Logistic Regression

              Predicted Win   Predicted Loss   Predicted Draw
Actual Win         268              32               1
Actual Loss        135              57               0
Actual Draw        138              27               0

As we can see from the confusion matrix, Logistic regression classifies only 2 classes and just 1 instance of class 3. Hence, we used a different algorithm, Vote, which selects the best results of multiple algorithms. Here, we have used the Random forest and Naïve Bayes classification algorithms for voting. The accuracy achieved is 47.11% and below is the confusion matrix:

Table 3. Confusion Matrix of Vote Algorithm

              Predicted Win   Predicted Loss   Predicted Draw
Actual Win         235              52              14
Actual Loss        114              66              12
Actual Draw        112              44               9

Although this algorithm is not as accurate as the previous one, it still classifies the 3rd class, and hence there is a compromise between accuracy and classification of all classes.

5. CONCLUSION AND FUTURE SCOPE

Thus, it is seen that the draw case reduces the accuracy of predicting the remaining two classes. It is observed that by removing the draw instances, accuracy can be increased up to 65%. Logistic regression fails to classify the draw class. So, in order to achieve generality, the voting algorithm is preferred.

Availability of more features that can help in predicting the draw class would improve the accuracy. Also, algorithms suited to sparse data, such as decision trees and boosting algorithms, may increase the accuracy.

6. REFERENCES

[1] Douwe Buursma; Predicting sports events from past results, University of Twente, 2011.
[2] Nivard, W. & Mei, R. D. Soccer analytics: Predicting the of soccer matches. (Master thesis: UV University of Amsterdam), 2012.
[3] Ben Ulmer and Matthew Fernandez; Predicting Soccer Match results in the English Premier League, cs229, 2014.
[4] Data mining [Online]. Available: https://en.wikipedia.org/wiki/Data_mining
[5] Machine Learning [Online]. Available: https://en.wikipedia.org/wiki/Machine_learning
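As a worked check on Tables 2 and 3, overall accuracy is the sum of the diagonal of a confusion matrix divided by the total number of instances. The Python sketch below applies this to the reported counts. The helper names are hypothetical, and the draw-excluded figure simply drops the draw row and column from the reported matrices; that is an assumption made for illustration and not necessarily how the 65% figure in the conclusion was obtained.

```python
# Rows are actual classes, columns are predicted classes,
# both in the order win, loss, draw (counts copied from Tables 2 and 3).
LOGISTIC = [[268, 32, 1], [135, 57, 0], [138, 27, 0]]
VOTE = [[235, 52, 14], [114, 66, 12], [112, 44, 9]]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def without_draws(matrix):
    """Keep only the win/loss rows and columns (illustrative assumption)."""
    return [row[:2] for row in matrix[:2]]


def draw_recall(matrix):
    """Share of actual draws that were predicted as draws."""
    return matrix[2][2] / sum(matrix[2])


for name, m in (("Logistic regression", LOGISTIC), ("Vote", VOTE)):
    print(name,
          f"accuracy={accuracy(m):.2%}",
          f"win/loss only={accuracy(without_draws(m)):.2%}",
          f"draw recall={draw_recall(m):.2%}")
```

With these counts, the Vote matrix reproduces the reported 47.11%, and the Logistic regression matrix gives about 49.39%, close to the reported 49.37%. The draw recall makes the trade-off concrete: Logistic regression recovers none of the 165 actual draws, while Vote recovers 9 of them.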
IJCATM : www.ijcaonline.org