Neural Network Prediction of NFL Football Games
Joshua Kahn, ECE539, Fall 2003
Overview
Introduction
Work Performed
  Data Collection
  Preliminary Study
  Training and Prediction Set Creation
  Data Preprocessing
  Making Predictions
Results
Conclusion
Introduction
The National Football League (NFL) is a multi-billion dollar business
Many web sites claim to be able to predict the outcome of NFL games
Some of these sites are trustworthy, others are downright seedy
Which are actually correct?
Project Goal
Most prognostications are based on human opinion
Invariably, some degree of bias enters in
This project aims to create a completely objective, statistics based system for predicting the outcome of NFL games
The trouble lies in the intangible aspects of the game
It seems plausible to create a statistical system
Why a Neural Network?
Teams can win in a variety of ways
No linear mapping exists to determine the outcome
This problem essentially boils down to a pattern classification problem
Neural networks are very good at solving these problems
A neural network provides a non-linear mapping
Data Collection
Data was required to be available from a typical NFL box score
A large data set was required to represent the large number of ways to win
Collected from NFL.com
Used Excel's web query feature to acquire tabular data, such as box scores and team averages
Data Collection
Data was extracted from the box scores using a Perl script
Perl provides an Excel interface
Statistics could be selected from the box scores as desired
Perl also allowed additional data processing
Needed to determine which statistics to use
Preliminary Study
Data was analyzed using Matlab to look for dependencies and redundant data
Statistical analysis found no hyperplane that separates wins from losses
[Figure: 3-D plot with axes Turnover Differential, Time of Possession Differential, and Total Yardage Differential]
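The slides do not show how the dependency check was done in Matlab. As a minimal sketch, assuming it amounted to looking at pairwise correlations between candidate statistics, the same idea can be written in Python with NumPy. The sample values, the column names, and the 0.95 threshold are all made up for illustration.

```python
import numpy as np

# Hypothetical per-game differentials (rows = games); values are invented.
# Columns: total yardage, rushing yardage, time of possession (s), turnovers.
features = np.array([
    [120.0, 60.0, 300.0, -2.0],
    [-80.0, -30.0, -150.0, 1.0],
    [45.0, 10.0, 90.0, 0.0],
    [-150.0, -70.0, -400.0, 3.0],
    [30.0, 25.0, 60.0, -1.0],
])
names = ["total_yds", "rush_yds", "top_sec", "turnovers"]

# Pearson correlation between every pair of columns.
corr = np.corrcoef(features, rowvar=False)


def redundant_pairs(corr, names, threshold=0.95):
    """Return feature pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


candidates = redundant_pairs(corr, names)
```

A pair flagged here is only a candidate for removal; the choice of which statistics to keep still rests on the study described on the next slide.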
Preliminary Study Results
Determined the following statistics were most predictive:
Total yardage differential
Rushing yardage differential
Time of possession differential (in seconds)
Turnover differential
Home or away
Differential statistics provide insight into offensive and defensive performance
Scoring data was excluded as it would bias the network's output toward a single feature
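The five inputs can be illustrated with a short Python sketch. The `BoxScore` class and its field names are hypothetical, and two encodings are assumptions not stated in the slides: turnover differential is taken as the opponent's turnovers minus the team's own (so positive is favorable), and home or away is encoded as +1 or -1.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """Hypothetical container for one team's statistics in one game."""
    total_yards: int
    rushing_yards: int
    possession_seconds: int
    turnovers_committed: int
    is_home: bool


def feature_vector(team: BoxScore, opponent: BoxScore) -> list[float]:
    """Build the five inputs for `team` from both teams' box scores."""
    return [
        float(team.total_yards - opponent.total_yards),
        float(team.rushing_yards - opponent.rushing_yards),
        float(team.possession_seconds - opponent.possession_seconds),
        # Assumed sign convention: positive means the opponent gave the ball away more.
        float(opponent.turnovers_committed - team.turnovers_committed),
        # Assumed encoding: +1 for the home team, -1 for the visitor.
        1.0 if team.is_home else -1.0,
    ]


# Invented numbers, for illustration only.
home = BoxScore(350, 120, 1900, 1, True)
away = BoxScore(280, 90, 1700, 3, False)
home_inputs = feature_vector(home, away)
away_inputs = feature_vector(away, home)
```

Under these assumptions the two teams in a game always receive mirror-image vectors, which matches the slides' description of differentials capturing both offensive and defensive performance.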
Training and Prediction Sets
Training sets include the statistics for both teams for each game
Each training vector also includes the outcome of the game
Outcome marked for both teams: 1 = win, -1 = loss
Two prediction sets were created:
One based on team season averages
Other based on the average of the prior 3 weeks
Both sets were applied to determine effectiveness
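A minimal Python sketch of how such sets might be assembled, assuming each game contributes one labeled row per team and that the two prediction sets are plain column-wise means. The function names and the sample numbers are hypothetical.

```python
import numpy as np


def training_rows(home_features, away_features, home_won):
    """Return one (features + outcome) row per team for a single game."""
    label = 1.0 if home_won else -1.0  # 1 = win, -1 = loss
    return [list(home_features) + [label], list(away_features) + [-label]]


def season_average(weekly_stats):
    """Average each statistic over every week played so far."""
    return np.mean(np.asarray(weekly_stats, dtype=float), axis=0)


def recent_average(weekly_stats, weeks=3):
    """Average each statistic over the most recent `weeks` games only."""
    return np.mean(np.asarray(weekly_stats, dtype=float)[-weeks:], axis=0)


# Invented weekly statistics for one team: [total yards, rushing yards].
weekly = [[300, 100], [340, 120], [280, 90], [400, 150]]
season = season_average(weekly)   # mean over all four weeks
recent = recent_average(weekly)   # mean over the last three weeks
rows = training_rows([70.0, 1.0], [-70.0, -1.0], home_won=True)
```

The three-week average reacts faster to recent form, while the season average smooths out single-game swings; the deck compares both.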
Neural Network Selection
Back-propagation multi-layer perceptron provides a great deal of flexibility
Good pattern classifier
Supervised learning
Network parameters and structure were determined based on testing
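The slides do not give the final layer sizes or learning parameters. The sketch below is a generic one-hidden-layer perceptron trained by back-propagation in NumPy, not the author's network: the `TinyMLP` name, the hidden-layer size, the learning rate, the tanh activations, and the toy data are all assumptions chosen to keep the example small.

```python
import numpy as np


class TinyMLP:
    """One-hidden-layer perceptron with tanh units, trained by back-propagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        """Return outputs in (-1, 1); hidden activations are cached for training."""
        self.hidden = np.tanh(x @ self.w1 + self.b1)
        return np.tanh(self.hidden @ self.w2 + self.b2)

    def train_step(self, x, y, lr=0.1):
        """One full-batch gradient-descent step on mean squared error."""
        out = self.forward(x)
        err = out - y
        loss = float(np.mean(err ** 2))
        # Propagate the error backwards through both tanh layers.
        delta_out = (2.0 / len(x)) * err * (1.0 - out ** 2)
        delta_hidden = (delta_out @ self.w2.T) * (1.0 - self.hidden ** 2)
        self.w2 -= lr * self.hidden.T @ delta_out
        self.b2 -= lr * delta_out.sum(axis=0)
        self.w1 -= lr * x.T @ delta_hidden
        self.b1 -= lr * delta_hidden.sum(axis=0)
        return loss


# Toy data: two made-up inputs per row, target 1 = win, -1 = loss.
x = np.array([[1.0, 0.5], [0.8, -0.3], [-1.0, 0.2], [-0.7, -0.6]])
y = np.array([[1.0], [1.0], [-1.0], [-1.0]])
net = TinyMLP(n_inputs=2, n_hidden=3)
losses = [net.train_step(x, y) for _ in range(200)]
```

The tanh output keeps predictions in the same -1 to 1 range as the win and loss labels, so two teams' outputs can be compared directly.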
Data Preprocessing
Processed all data using singular value decomposition
Gives additional weight to the most pertinent features prior to network input
Makes training more effective
Performed using Matlab's svd function
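The slides do not say how the decomposition was applied to the inputs. One common reading, shown here as an assumption rather than the author's procedure, is to center the training matrix, take its SVD, and project every input onto the right singular vectors, so the leading columns carry the most variation. The helper names and the sample matrix are hypothetical, and `numpy.linalg.svd` stands in for Matlab's `svd`.

```python
import numpy as np


def fit_svd(train):
    """Return the column means, singular values, and right singular vectors."""
    mean = train.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, singular_values, vt


def apply_svd(data, mean, vt):
    """Center `data` with the training means and rotate it onto the singular directions."""
    return (data - mean) @ vt.T


# Invented training matrix: rows = games, columns = three scaled statistics.
train = np.array([
    [1.2, 0.6, -2.0],
    [-0.8, -0.3, 1.0],
    [0.4, 0.1, 0.0],
    [-1.5, -0.7, 3.0],
    [0.3, 0.25, -1.0],
])
mean, s, vt = fit_svd(train)
train_proj = apply_svd(train, mean, vt)

# Prediction inputs must reuse the training mean and directions.
new_game = np.array([[0.5, 0.2, -1.0]])
new_proj = apply_svd(new_game, mean, vt)
```

In this form each projected column has a length equal to its singular value, which is one way the most informative directions end up with more weight at the network input.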
Making Predictions
Trained network using training data
Applied prediction data three times
Used both the season-average and three-week-average sets to compare their effectiveness
Found the average of the three trials
Classified winner/loser of game
Winner had higher network output
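The decision rule on this slide can be sketched in a few lines of Python. The function name, the team labels, and the output values are hypothetical, and the tie-breaking choice (ties go to the second team) is an assumption, since the slides do not say how ties were handled.

```python
import numpy as np


def pick_winner(team_a, team_b, outputs_a, outputs_b):
    """Average each team's network outputs over the trials and pick the larger."""
    mean_a = float(np.mean(outputs_a))
    mean_b = float(np.mean(outputs_b))
    # Assumed tie rule: an exact tie falls through to team_b.
    return team_a if mean_a > mean_b else team_b


# Invented outputs from three trials for each side of one game.
winner = pick_winner("Team A", "Team B", [0.6, 0.4, 0.5], [-0.2, 0.1, -0.3])
```

Averaging over several trials reduces the influence of any single run's random weight initialization on the final pick.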
Results
Prediction Rate
Week      Season Average Data   Three Week Average Data
Week 14   75%                   62.5%
Week 15   75%                   37.5%
Neural network classification is 94% correct when actual (not predicted) statistics are used
NFL teams seem to be consistent over the long term
Results
Week 14
Green Bay def. Chicago
Philadelphia def. Dallas
Indianapolis def. Tennessee
San Diego def. Detroit
Tampa Bay def. New Orleans
Baltimore def. Cincinnati
Jacksonville def. Houston
Pittsburgh def. Oakland
Minnesota def. Seattle
New York Giants def. Washington
San Francisco def. Arizona
Denver def. Kansas City
New England def. Miami
Atlanta def. Carolina
Buffalo def. New York Jets
St. Louis def. Cleveland
Week 15
Indianapolis def. Atlanta
Kansas City def. Detroit
New England def. Jacksonville
New York Jets def. Pittsburgh
Cincinnati def. San Francisco
Denver def. Cleveland
Dallas def. Washington
New Orleans def. NY Giants
Tennessee def. Buffalo
Tampa Bay def. Houston
Minnesota def. Chicago
St. Louis def. Seattle
Oakland def. Baltimore
Carolina def. Arizona
Green Bay def. San Diego
Philadelphia def. Miami
Baseline Study
Prediction Rate
                 Week 14   Week 15
Neural Network   75%       75%
ESPN.com         57%       87%
Neural network was more accurate on average
Previous neural network predictors were accurate for 63% of games
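The "more accurate on average" claim follows from the two weekly rates in the baseline table. A short Python check, assuming a simple unweighted mean of the two weeks (the slides do not say whether the average was weighted by games played):

```python
# Weekly prediction rates in percent, taken from the baseline table.
rates = {
    "Neural Network": [75.0, 75.0],
    "ESPN.com": [57.0, 87.0],
}

# Unweighted mean across the two weeks (assumption: both weeks count equally).
averages = {name: sum(values) / len(values) for name, values in rates.items()}

best = max(averages, key=averages.get)
```

This gives 75% for the neural network against 72% for ESPN.com, a three-point margin over only two weeks of games.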
Conclusions
Game Misclassification Reasoning
Each of the eight misclassified games can be subjectively placed in one of three categories
Game                            Category
Philadelphia def. Dallas        Misclassification
San Diego def. Detroit          Too close to call
Atlanta def. Carolina           Upset
Minnesota def. Seattle          Too close to call
New England def. Jacksonville   Misclassification
New York Jets def. Pittsburgh   Too close to call
Cincinnati def. San Francisco   Too close to call
Oakland def. Baltimore          Upset
Conclusions
Prediction rate could be improved by adding the human element
Take immeasurables into consideration
Las Vegas betting lines
Subjective team rankings
Training set could be based on previous season data
Ways in which teams win presumably do not change over time
Shows that a statistics-based system can be developed to predict the outcome of NFL games
References
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
ESPN.com, http://www.espn.com [Retrieved Dec 2003].
Purucker, M.C. (1996). Neural Network Quarterbacking. IEEE Potentials, vol. 15:3, pp. 9-15.
NFL.com, http://www.nfl.com [Retrieved Dec 2003].
Questions???
Thank you