Sports Analytics in Practice With R 1st Edition PDF

The book 'Sports Analytics in Practice with R' by Ted Kwartler and Robert Baker, published in 2022, serves as a comprehensive guide to applying R programming in sports analytics. It covers various topics including data visualization, player evaluation, and gambling optimization, making it suitable for both novices and experienced analysts. The book aims to make sports analytics accessible and relevant, while also providing tools and techniques that can be applied across different sports domains.
Sports Analytics in Practice with R

Ted Kwartler
Maynard, MA

Robert Baker
Haymarket, VA
This edition first published 2022
© 2022 John Wiley and Sons, Ltd.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by law. Advice on how to obtain permission to reuse material from this title is
available at http://www.wiley.com/go/permissions.

The right of Ted Kwartler and Robert Baker to be identified as the authors of this work has been asserted
in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products
visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content
that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty


The contents of this work are intended to further general scientific research, understanding, and
discussion only and are not intended and should not be relied upon as recommending or promoting
scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing
research, equipment modifications, changes in governmental regulations, and the constant flow of
information relating to the use of medicines, equipment, and devices, the reader is urged to review and
evaluate the information provided in the package insert or instructions for each medicine, equipment,
or device for, among other things, any changes in the instructions or indication of usage and for added
warnings and precautions. While the publisher and authors have used their best efforts in preparing
this work, they make no representations or warranties with respect to the accuracy or completeness
of the contents of this work and specifically disclaim all warranties, including without limitation any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives, written sales materials or promotional statements for this work. The
fact that an organization, website, or product is referred to in this work as a citation and/or potential
source of further information does not mean that the publisher and authors endorse the information or
services the organization, website, or product may provide or recommendations it may make. This work
is sold with the understanding that the publisher is not engaged in rendering professional services. The
advice and strategies contained herein may not be suitable for your situation. You should consult with a
specialist where appropriate. Further, readers should be aware that websites listed in this work may have
changed or disappeared between when this work was written and when it is read. Neither the publisher
nor authors shall be liable for any loss of profit or any other commercial damages, including but not
limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data


ISBN 9781119598077 (hardback)

Set in 9.5/12.5pt STIXTwoText by SPi Global, Pondicherry, India

10 9 8 7 6 5 4 3 2 1

Contents

Preface
Author Biography
Foreword

1 Introduction to R

2 Data Visualization: Best Practices

3 Geospatial Data: Understanding Changing Baseball Player Behavior

4 Evaluating Players for the Football Draft

5 Logistic Regression: Explaining Basketball Wins and Losses with Coefficients

6 Gauging Fan Sentiment in Cricket

7 Gambling Optimization

8 Exploratory Data Analysis: Searching Data for Opponent Insights

Index

Preface

Sports is one of the few places where the data and outcomes are well known. Unlike
medicine which requires significant subject-matter expertise or business where the data
is proprietary in most cases, sports knowledge is relatively accessible, and the data and
outcomes are public. As a result, sports analytics serves as a great entry point for many
aspiring data scientists and analytics professionals. For the novice, this book demon-
strates the many facets and uses of countless techniques applicable outside of sports. It
should have more than enough topics and examples to aid learning for general practice.
For the avid R programmer and sports fan, the book likely has some new functions and
techniques which may be less well known. These readers will delight in improving and
expanding the demonstrated methods once the core concepts are understood. Finally, for
those already in the sports analytics world the techniques and individual chapter topics
can serve as a reference and starting point in their professional analysis. For instance,
many of the use cases in the chapters can be adjusted to specific sports or updated with
more recent underlying data.
This book has been a long journey in the making. Originally the book’s scope was
centered on individualized chapters demonstrating analytical techniques within a
sports context. The goal is that a reader inherits various tools that act as a foundation for
analysis to build upon and add complexity with subsequent analyses as the reader’s
technical acumen and sports interests grow. Each chapter is meant to be a standalone
reference as the reader explores and learns. This also frees up the reader to focus on top-
ics of interest. For example, a reader may not want to learn about natural language
processing so could skip that chapter altogether to focus on another subject such as
optimizing a fantasy football lineup. The book’s undertaking grew in complexity due to
a personal commitment to demonstrate concepts on diverse data sets including
Paralympic athletes, female soccer and basketball, and less US-centric popular sports
including cricket in addition to the more typically demonstrated sports analyses of
men’s football, baseball, and basketball. My goal is to make the subject accessible and
relevant to many in the analytics field despite this effort slowing the book’s creation.
Keep in mind a chapter’s concepts can be applied to many sports domains. For example,
the text analysis applied to cricket fan forum posts can easily be applied to men’s basket-
ball fan tweets or forum posts. Each chapter’s takeaway is meant to be a broadly useful
tool, not a brittle or narrowly focused analysis. Additionally, the book was delayed due
to the pandemic’s effect on the sports-world. Admittedly the shortened seasons, canceled
games, and other changes that created outlier statistics pale in comparison to the
pandemic’s hardship and human impact outside of sports. Despite these challenges,
the book’s end result was worth the delay. The final product covers many diverse concepts
and data sets, encouraging analytics professionals to enjoy the intersection of sports
and analysis.
The book’s supporting website is www.rstatsbook.com. The site contains data and scripts
along with any code revisions necessary as functions and packages change. The data is
also shared via a Git repository at www.github.com/kwartler/Practical_Sports_Analytics.

Author Biography

Ted Kwartler
Adjunct Professor, Harvard University

Ted Kwartler is the VP, Trusted AI at DataRobot.


At DataRobot, Ted sets product strategy for explain-
able and ethical uses of data technology in the com-
pany’s application. Ted brings unique insights and
experience utilizing data, business acumen, and
ethics to his current and previous positions at
Liberty Mutual Insurance and Amazon. In addition
to having four DataCamp courses, he teaches grad-
uate courses at the Harvard Extension School and is
the author of Text Mining in Practice with R.

Analytics don’t work at all. It’s just some crap some people who were really smart
made up.
Charles Barkley, former NBA player

Just because you don’t understand something doesn’t mean it’s crap.
Ross Drucker, NBA Future Analytics Stats Program Analyst

My dear Nora & Brenna,

My inspiration and guides. I wrote this book in your honor though don’t expect either of
you to follow my footsteps into analysis. Your journey is your own, may you find a passion
and, if desirable, have the opportunity to write about it. No matter where your attention
and intellect lead you I remain.

Your loving father,


Ted

Foreword

Writing a book is no easy task yet for some reason I decided to write a second! Overall, I
am grateful to the countless people who helped me learn, expand, and apply these methods.
Data science and analytics is as much a “team sport” as any, where collaboration,
communication, and effort often win the day.
First I would like to acknowledge Jack W, whose intellect and athleticism left us far too
early. For anyone struggling with mental health, know that you are loved, you are valua-
ble, and people in your community are here for you. Your passing was a motivating
reminder of the short time we have to make contributions along with the need for more
kindness toward those that may be suffering silently.
Next, Anup B, one of the most brilliant supportive leaders I have worked for. Not to
mention your passion for cricket helped open my eyes to a noteworthy and enjoyable
sport. Losing you to the pandemic was a disturbing blow felt by many people who were
touched by your intelligence, humor, and positivity.
This entire book would not have been possible without the fine professors at the
University of Notre Dame that put me on my own professional journey. I fondly remem-
ber building my first logistic regression predicting March Madness after learning these
techniques from Dr. Keating, the late Dr. Gilbride, and Dr. Devaraj.
Further I would like to acknowledge my parents, Anatol and Trish, and my endearing
wife, Meghan. Your support and patience have been significant. Writing a book is no small
undertaking with much of the logistical burden falling to each of you. Completing this
book is a shared victory.
Lastly, my sincerest gratitude to the wonderful team at Wiley, particularly Kimberly
Monroe-Hill. Your patience and flexibility with late submissions and delayed seasons stemming
from the unusual 2020 year in sports (among other more important hardships) have
been greatly appreciated. I was ready to give up on the project, yet your e-mails demonstrated
a commitment from Wiley that I cherish.
1 Introduction to R

Objectives

● Learn about R as a programming language


● Define Integrated Development Environment
● Define objects
● Learn the assignment operator
● Define functions
● Executing a loop
● Learn logical operators
● Learn about R data types
● Learn about object classes
● Indexing data objects
● Extending R functionality with packages
● Writing a custom function
● Create a scatter plot with sports data
● Create a heatmap with sports data

R Libraries

`ggplot2`
`ggthemes`
`RCurl`
`tidyr`

R Functions

`+`
`plot`
`<-`
`round`

Sports Analytics in Practice with R, First Edition. Ted Kwartler.


© 2022 John Wiley & Sons Ltd. Published 2022 by John Wiley & Sons Ltd.

`class`
`as.factor`
`as.character`
`c`
`cbind`
`rbind`
`data.frame`
`as.matrix`
`as.data.frame`
`install.packages`
`library`
`getURL`
`read.csv`
`dim`
`names`
`head`
`tail`
`summary`
`table`
`qplot`
`pivot_longer`
`geom_tile`
`scale_fill_gradient`
`xlab`
`ggtitle`
`theme`
`theme_hc`
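
As a quick preview of how a few of the functions listed above fit together, the following is a minimal sketch rather than code from the chapter. The player labels and statistics are made up for illustration, and the object names (`box`, `ppm`) are assumptions.

```r
# Hypothetical box-score values, made up for illustration only
player  <- c("A", "B", "C")
points  <- c(24, 31, 18)
minutes <- c(34.5, 36.1, 28.0)

# Combine the three vectors into a data frame, one row per player
box <- data.frame(player, points, minutes)

class(box)           # the class of the object
dim(box)             # number of rows and columns
names(box)           # column names
head(box, 2)         # first two rows
summary(box$points)  # minimum, quartiles, mean, and maximum

# Points per minute, rounded to two decimals, stored as a new column
box$ppm <- round(box$points / box$minutes, 2)
```

Each of these functions is covered in more detail as the chapter proceeds; the sketch only shows that they operate on ordinary objects created with `<-`.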

The R Programming Language

R is an open-source, freely available programming language used throughout this book. R
is a powerful and longstanding programming language developed more than 20 years ago.
It is a derivative of the “S” programming language for statistics, which originated in the
mid-1970s at Bell Laboratories, then part of AT&T and later of Lucent Technologies. Unlike
general-purpose programming languages, R is optimized specifically for statistics, including
but not limited to simulation, machine learning, visualization, traditional statistical
modeling (such as linear regression), and statistical tests. Due to the open-source nature
of R, many developers, academics, and enthusiasts have contributed to its development for
their specific needs. As a result, the language is extensible, meaning it can be easily
adapted for various purposes. For example, through R Markdown, simple websites and
presentations can be created. In another use case, R can be used for traditional linear
modeling or machine learning and can draw upon various data types for analysis, including
audio files, digital images, text, numeric data, and various other data files and types.
Thus, it is widely used and nonspecialized other than to say R is an analysis language. This
differs from languages such as Ruby, which is used largely for web development, or Python,
whose functionality extends to building applications and not just analysis.
In this textbook, the R language is applied specifically to sports contexts. Of course, the
code in this book can be used to extend your understanding of sports analytics. It may give
you insights into a particular sport or an analytical aspect of the sport itself, such as which
statistics to focus on to win a basketball game. However, learning the code in this
book can also help open up a world of analytical capabilities beyond sports. One of the ben-
efits of learning statistics, programming, and various analysis methods with sports data is
that the data is widely available and outcomes are known. This means that your analysis,
models, and visualizations can be applied, and you can review the outcomes as you expand
upon what is covered in this book. This differs from other programming and statistical
examples which may resort to boring, synthetic data to illustrate an analytical result. Using
sports data is realistic and can be future oriented, making the learning more challenging yet
engaging. Modeling the survivors of the Titanic pales in comparison, since you cannot
change the historical outcome or save future cruise ship passengers. Thus, modeling which team
will win a match or which player is a good draft pick is a superior learning experience.
If you are new to programming, don’t be intimidated. R is a forgiving language in that
things like spacing and indentation are ignored. Further, R is well supported by its
community, and a simple online search of any error message usually finds an answer quickly
on any number of sites.
To begin your R and sports analytics journey, please download the “base-R” distribution
for your operating system. The “Comprehensive R Archive Network,” CRAN, is the home
of the official R distribution as well as officially supported packages (more on that in a
bit). The site to download base-R is https://cran.r-project.org.
Unfortunately, base-R, having started in the nineties, looks abysmal and lacks some
modern-day functionality. Thus, you will next need to download the R-Studio Integrated
Development Environment, or IDE. An IDE is software that consolidates many of the
aspects needed to code into one place. For example, you will need a place to write code,
which could be done in a simple notepad-like program, a place to execute the code written, a
place to visualize plots that were output from the code, and so on. These individual
components are assembled into the IDE for ease of use and fast development. R and many
other languages have IDEs. In fact, R has multiple IDEs optimized for the type of analysis
you are performing, such as biostatistics, or for working with another language like Java. The
most popular and easily supported IDE for base-R is the R-Studio software. There are
server and desktop versions available. The code executed in this book should work for
either version, but installation of base-R and R-Studio on a server is not covered.
Therefore, please download the R-Studio desktop IDE by navigating to
https://www.rstudio.com/products/rstudio.

The R-Studio IDE, or Integrated Development Environment, adds functionality and a
modern user interface to base-R. The IDE aggregates common functionality used for
software development and statistical analysis.

Figure 1.1 The relationship between base-R and R-Studio.

Essentially R-Studio sits on top of base-R. The IDE provides the modern GUI expected by
today’s computer users while also adding functionality including version control,
terminal access, and, perhaps most importantly, an easy way to create and view visualizations
for export and saving to disk. Figure 1.1 illustrates the basic relationship between
base-R and R-Studio. As you can see, without base-R the IDE will not function because
none of the computational functions exist in the IDE itself.
Now that you have both base-R and R-Studio, let’s start to explore the programming
environment. Think of an R environment as a relatively generic piece of statistical software.
Once downloaded, it can programmatically perform all the tasks found in many of the
popular spreadsheet programs, whether online or on a laptop. The advantage of R is its
extensibility, mentioned earlier. R can be specialized from a generic set of statistical tools
into a more interesting and nuanced piece of software. This is done by downloading
specialized packages and then loading the package needed for the task at hand in the console.
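
The download-then-load pattern for packages can be sketched as follows. This is a minimal illustration, not the book's own script: the package names come from the chapter's library list, the object names (`pkgs`, `installed`, `missing_pkgs`) are assumptions, and the installation line is left commented out because it needs an internet connection.

```r
# Packages named in this chapter's library list
pkgs <- c("ggplot2", "ggthemes", "RCurl", "tidyr")

# Check which packages are already installed on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Packages that still need a one-time download from CRAN
missing_pkgs <- pkgs[!installed]
# install.packages(missing_pkgs)  # uncomment to download (requires internet)

# Once installed, a package is loaded in each new session with library(),
# for example: library(ggplot2)
```

Installation happens once per machine, while `library` is called in every new session that needs the package.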
Figure 1.2 shows the IDE itself without a “script” to be executed. For now, focus on the
“console” section in Figure 1.2. This is the lower left-hand side containing a “>” symbol.
This is the section where code will be executed and results are returned.
The next step is to navigate to “File > New File > R Script” in the upper left of the IDE.
This will open another pane in the IDE. The script pane will be located in the upper left
section of the IDE and will shrink the console on the lower left-hand side. While the con-
sole is where code is executed and computation enacted, the scripting section is where
you will write code that is then run within the console. Think of an R script as merely a
lightweight text file that can be saved and repeated by running in the console. A script is
nothing more than a set of instructions that have not been enacted yet. To save an R script,
navigate to “File > Save” and then simply follow the IDE dialog. The rest of the book pro-
vides R scripts for you to execute along with explanations along the way. Figure 1.3 shows
the new script pane with some basic example code.
Of particular note in the script shown in Figure 1.3 are two comments and two code
examples. A comment begins with a `#`. This tells R to ignore everything after it on that line. As
you begin your learning journey programming in R, it is a best practice to add comments
to remind yourself of the nuances of the code to be executed. Thus, feel free to make a copy
of any scripts throughout the book, add comments, and save for yourself.
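
To illustrate how comments behave, here is a small sketch that is not taken from Figure 1.3. The `points` values are hypothetical, and the example assumes only base-R.

```r
# A full-line comment: R skips this line entirely
points <- c(24, 31, 18)  # a trailing comment: the code before the # still runs

# Average points per game for the three hypothetical games above
mean(points)
```

A comment can occupy a whole line or follow code on the same line; in both cases R reads only what comes before the `#`.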

Figure 1.2 The R-Studio IDE console.

Figure 1.3 The upper left R script with basic commands and comments.

The first code to be executed, beginning on a non-commented line, is a simple arithmetic
operation shown below.
2 + 2
Since this is in a script, it will not be run until you send it to the console. Further,
as you can guess, the operation `2 + 2` has a single result, `4`. An easy way to run the script
is to place your cursor on the line you want to execute and click the “run” icon on the
upper right-hand side of the script pane. When this is done the code is transferred to the
