Predictive Analytics for Everyone!
Building CART Models using R
Chantal Larose
Assistant Professor of Decision Science (Statistics)
School of Business, SUNY New Paltz
DASH Lab Workshop
March 8, 2017
Why Predictive Analytics?
Sports, healthcare, customer service – the world is full of data!
Fun to pull stories out of a mess of numbers
Examples:
A Logistic Regression Approach to Predicting Who Will Make the NBA Playoffs ¹
Data Mining Major League Baseball’s Pace of Play Problem ²
More sports applications at:
New England Symposium on Statistics in Sports
Saturday, September 23, 2017
¹ Ryan Elmore, Department of Business Information and Analytics, Daniels School of Business, University of Denver
² Aaron Crowley, Zhuolin He, and Rachael Hageman Blair, Department of Biostatistics, State University of New York at Buffalo
LaroseC@newpaltz.edu Predictive Analytics for Everyone! Building CART Models using R 1
Why R?
Open source, free to download
Active and helpful community
Appeals to non-programmers: User interfaces (e.g. RStudio) offer
point-and-click access to some tasks
Appeals to programmers: Customizable – program your own functions, etc.
Set-up for the Workshop
Open up RStudio on your laptop (Apps → Other → RStudio)
Go to the Workshop’s website:
hawksites.newpaltz.edu/dashlab/predictive-analytics-for-everyone/
Download the Churn data set (.csv file)
Download the Adult data set (.csv file)
Download the Do It Yourself! guide (.R file)
The analyses in this workshop are covered in more detail in Data Mining and
Predictive Analytics, Second Edition. Larose & Larose, Wiley, 2015.
Getting Acquainted with R
Open the Do It Yourself! R file.
Getting Acquainted with R
Input the data set:
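The screenshot for this step is not reproduced here. As a minimal sketch of reading a .csv file into R, assume the downloaded file was saved as churn.csv (a hypothetical name; use whatever name and folder you saved it under). So that the sketch runs on its own, it first writes three made-up rows to a temporary file; the workshop's own code is in the Do It Yourself! file and may differ.

```r
# Made-up rows standing in for the downloaded Churn file (not real data).
csv_path <- file.path(tempdir(), "churn.csv")  # hypothetical file name
writeLines(c(
  "Day Mins,CustServ Calls,VMail Plan,Churn",
  "210.5,2,no,False.",
  "95.0,5,no,True.",
  "301.2,0,yes,False."
), csv_path)

# read.csv() returns a data frame. Column names containing spaces are
# converted to syntactic names, e.g. "Day Mins" becomes Day.Mins.
churn <- read.csv(csv_path, stringsAsFactors = TRUE)
print(churn)
```

With the real file, only the `read.csv()` line is needed, pointed at the path of the downloaded data set.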
Getting Acquainted with R
Let’s look at some code:
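The code shown on this slide did not survive extraction. As a hedged illustration of the kind of first look one might take at a data set in R, the sketch below uses a tiny made-up data frame; the column names Day.Mins, CustServ.Calls, and Churn are assumptions modeled on the variables named in this workshop.

```r
# Tiny made-up data frame (not the workshop's data).
churn <- data.frame(
  Day.Mins       = c(210.5, 95.0, 301.2, 150.3),
  CustServ.Calls = c(2L, 5L, 0L, 1L),
  Churn          = factor(c("False.", "True.", "True.", "False."))
)

dim(churn)               # number of rows and columns
names(churn)             # column names
summary(churn$Day.Mins)  # five-number summary plus the mean

# How many customers churned, as counts and as proportions.
churn_counts <- table(churn$Churn)
churn_props  <- prop.table(churn_counts)
print(churn_counts)
print(churn_props)
```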
Getting Acquainted with R
How do we tell R to run the code?
1. Highlight the code and press the Run button
2. Put your cursor on the line you want to run and press Control+Enter
(no need to highlight code)
Activity 1: CART Models
We want to predict the value of one variable, using other variables
For our first example:
We want to predict the value of Churn,
i.e. whether or not a customer leaves our company.
We will predict Churn using variables such as:
Day Mins: How many minutes during the day a customer uses their phone ³
CustServ Calls: How many times a customer has called customer service
VMail Plan: Whether or not a customer has the voicemail plan
³ Data is from when day and evening charges were different
Activity 1: CART Models
Wait – What about regression?
The data may be too messy to meet the normality requirements, even
with transformations
Regression interpretations get very complex very fast (especially with
transformations)
Data set is too large!
With enough records, the F and t tests from regression will come back
significant even for effects too small to matter in practice
CART models generate easy-to-understand “decision rules” (IF this,
THEN that) that make intuitive sense
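The large-sample point above can be illustrated with a small simulation. This is a sketch on simulated data, not the workshop's data: the sample size, the slope of 0.01, and the seed are all arbitrary choices made for illustration.

```r
set.seed(1)               # arbitrary seed, for reproducibility
n <- 1e6                  # one million simulated records
x <- rnorm(n)
y <- 0.01 * x + rnorm(n)  # x explains almost nothing about y

fit <- summary(lm(y ~ x))
p_value   <- fit$coefficients["x", "Pr(>|t|)"]
r_squared <- fit$r.squared

# The t test flags x as significant, yet R-squared is close to zero.
cat("p-value:", p_value, " R-squared:", r_squared, "\n")
```

The predictor passes the significance test while accounting for a negligible share of the variation in y, which is why significance alone says little on very large data sets.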
Activity 1: CART Models - Setup
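The setup code on these slides was shown as screenshots and is not reproduced here. As a minimal sketch of fitting a classification tree in R, the example below uses the rpart package (distributed with standard R installations as a recommended package) on simulated data. Whether the workshop used rpart, and the column names and churn rule below, are assumptions; the actual code is in the Do It Yourself! file.

```r
library(rpart)

set.seed(42)  # arbitrary seed, for reproducibility
n <- 1000
toy <- data.frame(
  Day.Mins       = runif(n, 0, 350),
  CustServ.Calls = sample(0:6, n, replace = TRUE)
)
# Made-up rule so the tree has a pattern to find (not the real churn data).
toy$Churn <- factor(ifelse(toy$Day.Mins >= 264 | toy$CustServ.Calls >= 4,
                           "True.", "False."))

# method = "class" requests a classification tree for a categorical target.
cart_fit <- rpart(Churn ~ Day.Mins + CustServ.Calls,
                  data = toy, method = "class")
print(cart_fit)  # text listing of the splits and node counts

# Predicted class for each record, and the share predicted correctly.
pred     <- predict(cart_fit, newdata = toy, type = "class")
accuracy <- mean(pred == toy$Churn)
```

With the real data, `toy` would be replaced by the data frame read from the Churn .csv file and the formula would list the predictors of interest.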
Activity 1: CART Models
[Figure: CART Model to Predict Churn (full tree). Each node shows the predicted class (True. or False.) and # Correct / Total; the root node is False., 2850 / 3333. The tree splits on Day Mins, CustServ Calls, Int'l Plan, Eve Mins, VMail Plan, Intl Calls, and Intl Mins.]
Activity 1: CART Models
CART Model to Predict Churn (smaller tree). Each node: predicted class, # Correct / Total
False. 2850 / 3333
  Day Mins < 264: False. 2766 / 3122
    CustServ Calls < 3.5: False. 2642 / 2871
    CustServ Calls >= 3.5: True. 127 / 251
      Day Mins >= 160: False. 111 / 149
      Day Mins < 160: True. 89 / 102
  Day Mins >= 264: True. 127 / 211
    VMail Plan = yes: False. 47 / 53
    VMail Plan = no: True. 121 / 158
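Reading the leaf nodes of the smaller tree above as IF-THEN decision rules, the confidence of each rule can be taken as # Correct / Total in its leaf. The sketch below computes this from the counts shown in the figure; treating confidence as correct divided by total is an assumption about the intended definition, and the rule wording is a paraphrase of the splits.

```r
# Leaf nodes read off the smaller tree: rule, predicted class, counts.
leaves <- data.frame(
  rule = c("Day Mins < 264 AND CustServ Calls < 3.5",
           "CustServ Calls >= 3.5 AND 160 <= Day Mins < 264",
           "CustServ Calls >= 3.5 AND Day Mins < 160",
           "Day Mins >= 264 AND VMail Plan = yes",
           "Day Mins >= 264 AND VMail Plan = no"),
  predicted = c("False.", "False.", "True.", "False.", "True."),
  correct   = c(2642, 111, 89, 47, 121),
  total     = c(2871, 149, 102, 53, 158)
)

# Confidence of each rule: share of records in the leaf it gets right.
leaves$confidence <- round(leaves$correct / leaves$total, 3)
print(leaves[, c("rule", "predicted", "confidence")])
```

For example, the rule "IF Day Mins >= 264 AND VMail Plan = no THEN Churn = True" covers 158 customers and is correct for 121 of them.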
Activity 2: On Your Own!
After you complete the Churn example,
go to Line 75 to begin Example 2.
All the code you need is there.
Follow the directions and run the code.
Task:
After building the CART model,
use the model to find at least two decision rules.
State the confidence level of each one.

Predictive Analytics for Everyone! Building CART Models using R - Chantal D. Larose, Ph.D.
