Sports Analytics in Practice With R 1st Edition PDF
Ted Kwartler
Maynard, MA
Robert Baker
Haymarket
VA
This edition first published 2022
© 2022 John Wiley and Sons, Ltd.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by law. Advice on how to obtain permission to reuse material from this title is
available at http://www.wiley.com/go/permissions.
The right of Ted Kwartler and Robert Baker to be identified as the authors of this work has been asserted
in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products
visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content
that appears in standard print versions of this book may not be available in other formats.
Contents
Preface
Author Biography
Foreword
1 Introduction to R
5 Logistic Regression: Explaining Basketball Wins and Losses with Coefficients
Index
Preface
Sports is one of the few places where the data and outcomes are well known. Unlike
medicine, which requires significant subject-matter expertise, or business, where the data
is proprietary in most cases, sports knowledge is relatively accessible, and the data and
outcomes are public. As a result, sports analytics serves as a great entry point for many
aspiring data scientists and analytics professionals. For the novice, this book demonstrates
the many facets and uses of techniques that are applicable well outside of sports. It
should have more than enough topics and examples to aid learning for general practice.
For the avid R programmer and sports fan, the book likely has some new functions and
techniques that may be less well known. These readers will delight in improving and
expanding the demonstrated methods once the core concepts are understood. Finally, for
those already in the sports analytics world, the techniques and individual chapter topics
can serve as a reference and starting point for their professional analysis. For instance,
many of the use cases in the chapters can be adapted to specific sports or updated with
more recent data.
This book has been a long journey in the making. Originally, the book’s scope was
centered on individualized chapters demonstrating analytical techniques within a
sports context. The goal is for the reader to inherit various tools that act as a foundation
for analysis, which can then be built upon with more complex analyses as the reader’s
technical acumen and sports interests grow. Each chapter is meant to be a standalone
reference as the reader explores and learns. This also frees the reader to focus on topics
of interest. For example, a reader who does not want to learn about natural language
processing could skip that chapter altogether to focus on another subject, such as
optimizing a fantasy football lineup. The book’s undertaking grew in complexity due to
a personal commitment to demonstrate concepts on diverse data sets, including
Paralympic athletes, women’s soccer and basketball, and popular sports that are less
US-centric, such as cricket, in addition to the more typically demonstrated analyses of
men’s football, baseball, and basketball. My goal is to make the subject accessible and
relevant to many in the analytics field, despite this effort slowing the book’s creation.
Keep in mind that a chapter’s concepts can be applied to many sports domains. For example,
the text analysis applied to cricket fan forum posts can easily be applied to men’s
basketball fan tweets or forum posts. Each chapter’s takeaway is meant to be a broadly useful
tool, not a brittle or narrowly focused analysis. Additionally, the book was delayed due
to the pandemic’s effect on the sports world. Admittedly, the shortened seasons, canceled
games, and other changes that created outlier statistics pale in comparison to the
pandemic’s hardship and human impact outside of sports. Despite these challenges,
the book’s end result was worth the delay. The final product covers many diverse concepts
and data sets, encouraging analytics professionals to enjoy the intersection of sports
and analysis.
The book’s supporting website is www.rstatsbook.com. The site contains data and scripts
along with any code revisions necessary as functions and packages change. The data is also
shared via a Git repository at www.github.com/kwartler/Practical_Sports_Analytics.
Author Biography
Ted Kwartler
Adjunct Professor, Harvard University
Analytics don’t work at all. It’s just some crap some people who were really smart
made up.
Charles Barkley, former NBA player
Just because you don’t understand something doesn’t mean it’s crap.
Ross Drucker, NBA Future Analytics Stats Program Analyst
My inspiration and guides. I wrote this book in your honor, though I don’t expect either of
you to follow in my footsteps into analysis. Your journey is your own; may you find a passion
and, if desirable, have the opportunity to write about it. No matter where your attention
and intellect lead you, I remain.
Foreword
Writing a book is no easy task, yet for some reason I decided to write a second! Overall, I
am grateful to the countless people who helped me learn, expand, and apply these methods.
Data science and analytics is as much a “team sport” as any, where collaboration,
communication, and effort often win the day.
First, I would like to acknowledge Jack W, whose intellect and athleticism left us far too
early. For anyone struggling with mental health, know that you are loved, you are valuable,
and people in your community are here for you. Your passing was a motivating
reminder of the short time we have to make contributions, along with the need for more
kindness toward those who may be suffering silently.
Next, I would like to acknowledge Anup B, one of the most brilliant and supportive leaders I
have worked for. Your passion for cricket also helped open my eyes to a noteworthy and
enjoyable sport. Losing you to the pandemic was a disturbing blow felt by many people who were
touched by your intelligence, humor, and positivity.
This entire book would not have been possible without the fine professors at the
University of Notre Dame who put me on my own professional journey. I fondly remember
building my first logistic regression predicting March Madness after learning these
techniques from Dr. Keating, the late Dr. Gilbride, and Dr. Devaraj.
Further, I would like to acknowledge my parents, Anatol and Trish, and my endearing
wife, Meghan. Your support and patience have been significant. Writing a book is no small
undertaking, with much of the logistical burden falling to each of you. Completing this
book is a shared victory.
Lastly, my sincerest gratitude to the wonderful team at Wiley, particularly Kimberly
Monroe-Hill. Your patience and flexibility with late submissions and delayed seasons stemming
from the unusual 2020 year in sports (among other, more important hardships) have
been greatly appreciated. I was ready to give up on the project, yet your e-mails demonstrated
a commitment from Wiley that I cherish.
1
Introduction to R
Objectives
R Libraries
`ggplot2`
`ggthemes`
`RCurl`
`tidyr`
R Functions
`+`
`plot`
`<-`
`round`
`class`
`as.factor`
`as.character`
`c`
`cbind`
`rbind`
`data.frame`
`as.matrix`
`as.data.frame`
`install.packages`
`library`
`getURL`
`read.csv`
`dim`
`names`
`head`
`tail`
`summary`
`table`
`qplot`
`pivot_longer`
`geom_tile`
`scale_fill_gradient`
`xlab`
`ggtitle`
`theme`
`theme_hc`
than to say R is an analysis language. This differs from other languages, such as Ruby,
which specialize in web development, or Python, which has extended its functionality to
building applications, not just analysis.
In this textbook, the R language is applied specifically to sports contexts. Of course, the
code in this book can be used to extend your understanding of sports analytics. It may give
you insights into a particular sport or an analytical aspect within the sport itself, such as
which statistics to focus on to win a basketball game. However, learning the code in this
book can also open up a world of analytical capabilities beyond sports. One of the
benefits of learning statistics, programming, and various analysis methods with sports data is
that the data is widely available and outcomes are known. This means that your analysis,
models, and visualizations can be applied, and you can review the outcomes as you expand
upon what is covered in this book. This differs from other programming and statistical
examples, which may resort to boring, synthetic data to illustrate an analytical result. Using
sports data is realistic and can be future oriented, making the learning more challenging yet
engaging. Modeling the survivors of the Titanic pales in comparison, since you cannot
change the historical outcome or save future cruise ship passengers. Thus, modeling which team
will win a match or which player is a good draft pick is a superior learning experience.
If you are new to programming, don’t be intimidated. R is a forgiving language in that
things like spacing and indentation are largely ignored. Further, the R community is well
supported, and a simple online search of any error message usually finds an answer quickly
on any number of sites.
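As a quick, hypothetical illustration of that forgiving syntax (the variable and function names below are invented for this sketch), the two assignments differ only in spacing yet produce identical vectors, and the exaggerated indentation inside the function body has no effect on the result. One caveat worth knowing: spacing does matter around the assignment arrow, because `x<-1` assigns a value while `x < -1` compares `x` with negative one.

```r
# Two assignments that differ only in spacing
compact <- c(1,2,3)
spaced   <-   c( 1 , 2 , 3 )

# Indentation inside a function body is purely cosmetic
total_points <- function(points) {
            sum(points)
}

identical(compact, spaced)   # TRUE: the extra spaces were ignored
total_points(spaced)         # 6
```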
To begin your R and sports analytics journey, please download the “base-R” distribution
for your operating system. The “Comprehensive R Archive Network,” CRAN, is the home
of the official R distribution as well as officially supported packages (more on that in a
bit). The site to download base-R is https://cran.r-project.org.
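Once base-R is installed, a simple way to confirm that it works is to open it and type a couple of commands at the prompt. This is a minimal sketch; the exact text printed will depend on the release and operating system you downloaded.

```r
# Print the version of base-R that is running
R.version.string

# Show which operating system this build of R targets
R.version$os
```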
Unfortunately, base-R, having started in the nineties, looks abysmal and lacks some
modern-day functionality. Thus, you will next need to download the R-Studio Integrated
Development Environment, or IDE. An IDE is software that consolidates many of the
aspects needed to code into one place. For example, you will need a place to write code, which
could be done in a simple Notepad-like program, a place to execute the code written, a
place to visualize plots output by the code, and so on. These individual components
are assembled into the IDE for ease of use and fast development. R and many
other languages have IDEs. In fact, R has multiple IDEs optimized for the type of analysis
you are performing, such as biostatistics, or for working with another language like Java. The
most popular and easily supported IDE for base-R is the R-Studio software. There are
server and desktop versions available. The code in this book should work with either a
cloud or a local installation, but installing base-R and R-Studio on a server is not covered.
Therefore, please download the R-Studio desktop IDE by navigating to
https://www.rstudio.com/products/rstudio.
Essentially, R-Studio sits on top of base-R. The IDE provides the modern GUI expected by
today’s computer users while also adding functionality, including version control,
terminal access, and, perhaps most importantly, an easy way to create and view
visualizations for export and saving to disk. Figure 1.1 illustrates the basic relationship
between base-R and R-Studio. As you can see, without base-R the IDE will not function,
because none of the computational functions exist in the IDE itself.
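One way to see this relationship for yourself is sketched below. It assumes that R-Studio marks the R sessions it launches by setting the `RSTUDIO` environment variable to `"1"`; current versions do this, but treat it as an assumption. The game scores are hypothetical. Whichever message prints, the arithmetic is carried out by base-R.

```r
# Assumption: R-Studio sets RSTUDIO="1" for the R sessions it starts
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (in_rstudio) {
  message("This R session was started by R-Studio")
} else {
  message("This R session is running in base-R without R-Studio")
}

# The calculation itself is performed by base-R in either case
season_avg <- mean(c(98, 104, 111))  # hypothetical game scores
season_avg
```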
Now that you have both base-R and R-Studio, let’s start to explore the programming
environment. Think of an R environment as a relatively generic piece of statistical software.
Once installed, it can programmatically perform the tasks found in many of the
popular spreadsheet programs, whether online or on a laptop. The advantage of R is its
extensibility mentioned earlier. R can be specialized from a generic set of statistical tools
into a more interesting and nuanced piece of software. This is done by downloading
specialized packages and then loading, in the console, the package for the task at hand.
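The download-once, load-every-session pattern can be wrapped in a small helper. `load_pkg()` below is a hypothetical convenience function written for this sketch, not part of base-R or any package; it relies only on the standard `requireNamespace()`, `install.packages()`, and `library()` functions. The CRAN mirror URL is an assumption, and any CRAN mirror would work.

```r
# Hypothetical helper: install a package from CRAN if it is missing, then load it
load_pkg <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # one-time download
  }
  # library() attaches the package so its functions can be called directly
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# 'tools' ships with base-R, so this call loads it without any download
load_pkg("tools")

# For a CRAN package from this chapter, the same call would first download it:
# load_pkg("ggthemes")
```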
Figure 1.2 shows the IDE itself without a “script” to be executed. For now, focus on the
“console” section in Figure 1.2. This is the lower left-hand side containing a “>” symbol.
This is the section where code will be executed and results are returned.
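For example, typing the lines below at the `>` prompt, one at a time, stores two hypothetical box-score numbers with the assignment operator `<-`, computes a field goal percentage with `round()`, and prints the result.

```r
# Hypothetical box-score totals for one game
fg_made     <- 42
fg_attempts <- 88

# Field goal percentage, rounded to three decimal places
fg_pct <- round(fg_made / fg_attempts, 3)

# Typing a variable's name prints its value in the console
fg_pct
# [1] 0.477
```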
The next step is to navigate to “File > New File > R Script” in the upper left of the IDE.
This will open another pane in the IDE. The script pane will be located in the upper left
section of the IDE and will shrink the console on the lower left-hand side. While the console
is where code is executed and computation enacted, the scripting section is where
you will write code that is then run within the console. Think of an R script as merely a
lightweight text file that can be saved and rerun in the console. A script is
nothing more than a set of instructions that have not been enacted yet. To save an R script,
navigate to “File > Save” and then simply follow the IDE dialog. The rest of the book provides
R scripts for you to execute, along with explanations. Figure 1.3 shows
the new script pane with some basic example code.
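To make the idea of a script as stored, not-yet-run instructions concrete, the sketch below writes a two-line script to a temporary file (standing in for “File > Save”) and then runs it with base-R’s `source()` function. The temporary file and the win totals are hypothetical.

```r
# Save a two-line script to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines(c("wins <- c(52, 47, 61)",
             "avg_wins <- mean(wins)"),
           script_path)

# Writing the file computed nothing; source() now runs each saved line
source(script_path)
avg_wins
```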
Of particular note in the script shown in Figure 1.3 are two comments and two code
examples. A comment begins with a `#`. This tells R to ignore everything after it on that
line. As you begin your learning journey programming in R, it is a best practice to add
comments to remind yourself of the nuances of the code to be executed. Thus, feel free to
make a copy of any scripts throughout the book, add comments, and save them for yourself.
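A script in the spirit of Figure 1.3 might look like the following. The figure’s exact code is not reproduced here, so the scores and variable names are hypothetical. R skips each full comment line, and the comment on the line computing the average shows that a `#` can also follow code on the same line.

```r
# Store three hypothetical game scores in a vector
scores <- c(101, 95, 110)

# Compute the average score
avg_score <- mean(scores)  # everything after the # symbol is ignored
avg_score
```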
Figure 1.3 The upper left R script with basic commands and comments.