Meetup_FGVA_Uplift @ Dataiku

Introduction to Uplift Modelling
An online gaming application

A few words about me
•  Senior Data Scientist at Dataiku
(worked on churn prediction, fraud detection, bot detection, recommender systems, graph analytics, smart cities, … )
•  Occasional Kaggle competitor
•  Mostly code in Python and SQL
•  Twitter @prrgutierrez

Plan
•  Introduction / Client situation
•  Uplift use case examples
•  Uplift modeling
•  Uplift evaluation & results

Client situation
•  Ankama: French online gaming company (RPG)
•  Users are leaving
•  Let’s build a churn prediction model!
•  Target: no return within 14 or 28 days
(14 days of absence -> 80% chance of not coming back
28 days of absence -> 90% chance of not coming back)
•  Features:
•  Connection features:
•  Time played in the last 1, 7, 15, 30, … days
•  Time since last connection
•  Connection frequency
•  Days of the week / hours of the day played
•  Equivalent features for payments and subscriptions
•  Age, sex, country
•  Number of accounts, whether the user is a bot, …
•  No in-game features (no data available)
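The churn target and a few of the connection features above can be sketched in plain Python. This is a minimal illustration, not the original pipeline: it assumes a hypothetical log format of one `(day, minutes_played)` tuple per session, and the names `churn_label` and `connection_features` are made up for the example.

```python
from datetime import date, timedelta

# Assumed (hypothetical) log format: one (day, minutes_played) tuple per session.


def churn_label(sessions, reference, horizon_days=14):
    """1 if the user has no session in the `horizon_days` after `reference`, else 0."""
    window_end = reference + timedelta(days=horizon_days)
    came_back = any(reference < day <= window_end for day, _ in sessions)
    return 0 if came_back else 1


def connection_features(sessions, reference, windows=(1, 7, 15, 30)):
    """Time since last connection and time played over several look-back windows."""
    past = [(day, minutes) for day, minutes in sessions if day <= reference]
    last_day = max((day for day, _ in past), default=None)
    features = {
        "days_since_last": (reference - last_day).days if last_day is not None else None
    }
    for w in windows:
        # Sessions strictly after (reference - w days), up to and including reference
        start = reference - timedelta(days=w)
        features[f"minutes_played_{w}d"] = sum(
            minutes for day, minutes in past if start < day <= reference
        )
    return features


sessions = [(date(2016, 3, 1), 60), (date(2016, 3, 10), 30), (date(2016, 3, 14), 45)]
print(churn_label(sessions, date(2016, 3, 15)))                   # 1: no session after the 15th
print(churn_label(sessions, date(2016, 3, 15), horizon_days=28))  # 1 as well
print(connection_features(sessions, date(2016, 3, 15)))
```

The 14-day and 28-day targets differ only in `horizon_days`; in a real setup, features would be computed strictly from data before the reference date, as done here.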

Client situation
•  Model results:
•  AUC 0.88
•  Very stable model
•  Marketing actions:
•  7 different actions based on customer segmentation
(offers, promotions, … )
•  A/B test
-> -5% churn for people contacted by email
•  Going further:
•  Feature engineering: guilds, close network, in-game actions, …
•  Study long-term churn, …
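Reading an A/B result like the one above comes down to comparing churn rates between the contacted and non-contacted groups. The sketch below uses a standard two-proportion z-test; the counts are made up, and it reports both the absolute and the relative difference because a figure such as "-5% churn" can be read either way.

```python
import math


def ab_churn_effect(churned_t, n_t, churned_c, n_c):
    """Compare churn in a contacted (treatment) group vs. a control group."""
    p_t, p_c = churned_t / n_t, churned_c / n_c
    pooled = (churned_t + churned_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    return {
        "abs_diff": p_t - p_c,          # change in churn rate, in rate points
        "rel_diff": (p_t - p_c) / p_c,  # change relative to the control churn rate
        "z": z,
        "p_value": math.erfc(abs(z) / math.sqrt(2)),  # two-sided
    }


# Made-up counts: 450/1000 contacted users churn vs. 500/1000 in control
print(ab_churn_effect(450, 1000, 500, 1000))
```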

Client situation
•  But wait!
•  Strong hypothesis: we target the people who are the most likely to churn

Client situation
•  But wait!
•  What is the gain per person for an action?
•  $c$: cost of the action
•  $v_i$: value of customer $i$
•  $X$: independent variables
•  $T$, $C$: “treated” population and “control” population
•  $Y = 1$ if the customer churns, $Y = 0$ otherwise
•  Value with action: $E_T(V_i) = v_i\,(1 - P_T(Y=1|X)) - c$
•  Value without action: $E_C(V_i) = v_i\,(1 - P_C(Y=1|X))$
•  Gain (if $v_i$ is independent of the treatment): $E(G_i) = v_i\,(P_C(Y=1|X) - P_T(Y=1|X)) - c$
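The gain formula translates directly into a function. The sketch below hard-codes made-up probabilities for three customers (in practice $P_C$ and $P_T$ would come from models) to show that ranking by churn risk and ranking by expected gain can disagree.

```python
def expected_gain(value, p_churn_control, p_churn_treated, cost):
    """E(G_i) = v_i * (P_C(Y=1|X) - P_T(Y=1|X)) - c, with Y = 1 meaning churn."""
    return value * (p_churn_control - p_churn_treated) - cost


# Made-up customers: name -> (v_i, P_C(Y=1|X), P_T(Y=1|X))
customers = {
    "already_lost": (100.0, 0.90, 0.88),
    "persuadable": (100.0, 0.50, 0.30),
    "spam_sensitive": (100.0, 0.20, 0.30),
}
COST = 5.0

by_churn_risk = sorted(customers, key=lambda k: customers[k][1], reverse=True)
by_gain = sorted(customers, key=lambda k: expected_gain(*customers[k], COST), reverse=True)
print(by_churn_risk)  # ['already_lost', 'persuadable', 'spam_sensitive']
print(by_gain)        # ['persuadable', 'already_lost', 'spam_sensitive']
```

The customer with the highest churn risk is not the one where the action pays off most, which is the motivation for modelling the difference directly.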

Client situation
•  But wait!
•  What is the gain per person for an action?
$E(G_i) = v_i\,(P_C(Y=1|X) - P_T(Y=1|X)) - c$
•  Objective: maximize this gain
•  Targeting highly probable churners -> minimizes $P_T(Y=1|X)$
But not the difference!
•  Intuitive examples:
•  $P_C(Y=1) < P_T(Y=1)$: the action is expected to make the situation worse. Spam?
•  $P_C(Y=1) \approx P_T(Y=1)$: the user does not care, or is already lost
Uplift = modelling $\Delta P$

Uplift
•  Model the effect of the action
•  4 groups of customers / patients:
•  1 Responded because of the action
(the people we want)
•  2 Responded, but would have responded anyway
(unnecessary costs)
•  3 Did not respond and the action had no impact
(unnecessary costs)
•  4 Did not respond because the action had a negative impact
(negative impact)
•  Incomplete knowledge: for each individual we observe the outcome either with or without the action, never both
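The four groups are defined by two potential outcomes per individual: the response with the action and the response without it. The sketch below uses simulated data with arbitrary response probabilities to show why knowledge is incomplete: the simulation knows both outcomes, while real data reveals only one per individual.

```python
import random
from collections import Counter


def uplift_group(responds_if_treated, responds_if_control):
    """Map an individual's two potential outcomes to one of the four groups."""
    if responds_if_treated and not responds_if_control:
        return 1  # responded because of the action
    if responds_if_treated and responds_if_control:
        return 2  # would have responded anyway
    if not responds_if_treated and not responds_if_control:
        return 3  # did not respond, action had no impact
    return 4      # did not respond because of the action (negative impact)


rng = random.Random(0)
# Simulation only: both potential outcomes are drawn for every individual.
population = [(rng.random() < 0.4, rng.random() < 0.3) for _ in range(1000)]
print(Counter(uplift_group(t, c) for t, c in population))

# Real data: each individual is either treated or not, so only one outcome is
# observed and the group of a single individual cannot be read off the data.
observed = []
for if_treated, if_control in population:
    treated = rng.random() < 0.5
    observed.append((treated, if_treated if treated else if_control))
```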

Uplift Examples
•  Healthcare :
•  A typical medical trial:
•  treatment group: gets the treatment
•  control group: gets placebo (or another treatment)
•  Do a statistical test to show that the treatment is better than placebo
•  With uplift modeling we can find out for whom the treatment works best
•  Personalized medicine
•  Example: what is the gain in survival probability?
-> classification/uplift problem

Uplift Examples
•  Churn :
•  E-gaming
•  Other example: Coyote
•  Retail:
•  Compare coupon campaigns

Uplift Examples
•  Mailing: Hillstrom challenge
•  2 campaigns:
•  one men’s email
•  one women’s email
•  Question: which people should be targeted / have the best response rate?

Uplift Examples
•  Common pattern
•  Experiment or A/B testing -> test (treatment) and control groups
•  Warning: the control group can easily be biased:
•  Targeting the most probable churners and using everyone else as control
•  Calling only the people who come to a shop
•  Limited experiment trial -> no bandit algorithm:
(once a medical experiment is done, you don’t continue the “exploration”)
-> feedback arrives in relatively large batches, at discrete points in time.
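A simple guard against the biased controls listed above is to draw the control group at random from the same eligible population as the treatment group. This is a minimal sketch; the function name, the seed and the 20% hold-out are illustrative assumptions.

```python
import random


def random_split(user_ids, control_fraction=0.5, seed=42):
    """Randomly split the eligible users into (treatment, control) sets.

    Randomizing within the eligible population keeps the control group
    comparable to the treatment group, unlike 'top churners vs. the rest'.
    """
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    n_control = round(len(ids) * control_fraction)
    return set(ids[n_control:]), set(ids[:n_control])


# e.g. 100 users flagged as eligible: contact 80, hold out 20 as control
treatment, control = random_split(range(100), control_fraction=0.2)
print(len(treatment), len(control))  # 80 20
```

The fixed seed makes the split reproducible, which helps when the experiment has to be analysed again later.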

Uplift modelling
•  Three main methods :
•  Two models approach
•  Class variable modification
•  Modification of existing machine learning models

Uplift modelling: Two-model approach
•  Build a model on the treatment group to get $P_T(Y|X)$
•  Build a model on the control group to get $P_C(Y|X)$
•  Set: $\Delta P = P_T(Y|X) - P_C(Y|X)$
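The two-model approach can be sketched without any ML library. In practice each model would be a standard classifier (logistic regression, gradient boosting, …); here a smoothed frequency table per discrete segment stands in for it so the example has no dependencies. Segment names, data and the 0.5 fallback are illustrative assumptions.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Toy stand-in for a classifier: smoothed P(Y=1 | segment) from (segment, y) rows."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [positives, total]
    for segment, y in rows:
        counts[segment][0] += y
        counts[segment][1] += 1
    # Laplace smoothing: (positives + 1) / (total + 2)
    return {s: (pos + 1) / (tot + 2) for s, (pos, tot) in counts.items()}


def two_model_uplift(treated_rows, control_rows):
    """Delta P(x) = P_T(Y=1|x) - P_C(Y=1|x), one model per group."""
    p_t = fit_rate_model(treated_rows)
    p_c = fit_rate_model(control_rows)
    # Segments unseen in one group fall back to 0.5 (an arbitrary prior)
    return {s: p_t.get(s, 0.5) - p_c.get(s, 0.5) for s in set(p_t) | set(p_c)}


treated = [("a", 1)] * 8 + [("a", 0)] * 2 + [("b", 1)] * 5 + [("b", 0)] * 5
control = [("a", 1)] * 2 + [("a", 0)] * 8 + [("b", 1)] * 5 + [("b", 0)] * 5
print(two_model_uplift(treated, control))  # uplift 0.5 for segment "a", 0.0 for "b"
```

Each model only sees its own group, which illustrates the drawback discussed next: neither model is trained on the difference itself.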

Uplift modelling: Two-model approach
•  Advantages:
•  Standard ML models can be used
•  In theory, two good estimators -> a good uplift model
•  Works well in practice
•  Generalizes easily to regression and multiple treatments
•  Drawbacks:
•  The difference of two estimators is probably not the best estimator of the difference
•  The two classifiers can ignore the weaker uplift signal (since it is not their target)
•  An algorithm focusing on estimating the difference should perform better

Uplift modelling: Class variable modification
•  Introduced in Jaskowski, Jaroszewicz 2012
•  Allows any classifier to be adapted to uplift modeling
•  Let $G \in \{T, C\}$ denote the group membership (treatment or control)
•  Let’s define the new target variable:
$Z = \begin{cases} 1 & \text{if } G = T \text{ and } Y = 1 \\ 1 & \text{if } G = C \text{ and } Y = 0 \\ 0 & \text{otherwise} \end{cases}$
•  This corresponds to flipping the target in the control dataset.

•  Summary:
•  Flip the class in the control dataset
•  Concatenate the treatment and control datasets
•  Build a classifier
•  Target the users with the highest predicted probability
•  Advantages:
•  Any classifier can be used
•  Directly predicts uplift (and not each class separately)
•  Single model on a larger dataset (instead of two small ones)
•  Drawbacks:
•  Complex decision surface -> the model can perform poorly
•  Interpretation: what does AUC mean in this case?
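The flip-and-concatenate recipe can be sketched with the same kind of frequency table per discrete segment standing in for "any classifier". The sketch relies on a property noted by Jaskowski and Jaroszewicz: when treatment and control have the same size, $P(Z=1|X) = \tfrac12 P_T(Y=1|X) + \tfrac12 (1 - P_C(Y=1|X))$, so $P_T(Y=1|X) - P_C(Y=1|X) = 2P(Z=1|X) - 1$. Function names and data are illustrative.

```python
from collections import defaultdict


def flip_target(rows):
    """(segment, group, y) -> (segment, z): keep y for 'T', flip it for 'C'."""
    return [
        (segment, int((group == "T" and y == 1) or (group == "C" and y == 0)))
        for segment, group, y in rows
    ]


def class_modification_uplift(rows):
    """Fit P(Z=1 | segment) on the concatenated data, return 2 * P(Z=1|x) - 1."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [z positives, total]
    for segment, z in flip_target(rows):
        counts[segment][0] += z
        counts[segment][1] += 1
    # Valid as an uplift estimate only if treatment and control have equal sizes
    return {s: 2 * pos / tot - 1 for s, (pos, tot) in counts.items()}


rows = (
    [("a", "T", 1)] * 8 + [("a", "T", 0)] * 2 + [("a", "C", 1)] * 2 + [("a", "C", 0)] * 8
    + [("b", "T", 1)] * 5 + [("b", "T", 0)] * 5 + [("b", "C", 1)] * 5 + [("b", "C", 0)] * 5
)
print(class_modification_uplift(rows))  # "a": 0.8 - 0.2 = 0.6, "b": 0.0
```

With unequal group sizes, one common workaround is to reweight or resample one group before fitting; ranking by $P(Z=1|X)$ alone is what the summary above uses for targeting.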

Uplift modeling: Other methods
•  Based on decision trees:
•  Rzepakowski, Jaroszewicz 2012
new decision tree split criterion based on information theory
•  Soltys, Rzepakowski, Jaroszewicz 2013
ensemble methods for uplift modeling
(out of today’s scope)

Evaluation
•  We used:
•  Two-model approach -> AUC? Not very informative.
•  One-model approach (class modification) -> does AUC mean anything?
•  How can we evaluate / compare them?
•  Cross-validation:
•  4 datasets: treatment/control x train/test
•  Problem:
•  We don’t have a clear 0/1 target.
•  We would need to know, for each customer:
•  Response to treatment
•  Response to control
-> not possible

Evaluation
•  Gain for a group of customers:
•  Gain for the 10% highest-scoring customers =
% of successes for top 10% treated customers − % of successes for top 10% control customers
•  Uplift curve?
•  Difference between two lift curves
•  Interpretation: net gain in success rate if a given percentage of the population is treated
•  Problem: no theoretical maximum
•  Problem 2: weird behaviour for 2 wizard models
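The gain for a group of customers can be computed by ranking each group by the model's score. This minimal sketch assumes the top fraction is taken within the treated and control groups separately, which is one reading of the definition above; the scores and outcomes are made up.

```python
def top_success_rate(scores, outcomes, fraction):
    """Success rate among the `fraction` highest-scoring customers of one group."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(y for _, y in ranked[:k]) / k


def top_fraction_gain(t_scores, t_outcomes, c_scores, c_outcomes, fraction=0.10):
    """% successes in top treated customers - % successes in top control customers."""
    return (top_success_rate(t_scores, t_outcomes, fraction)
            - top_success_rate(c_scores, c_outcomes, fraction))


# 20 treated and 20 control customers, scored 0..19 by the same uplift model
scores = list(range(20))
treated_outcomes = [0] * 18 + [1, 1]  # both top-2 treated customers succeed
control_outcomes = [0] * 18 + [0, 1]  # one of the top-2 control customers succeeds
print(top_fraction_gain(scores, treated_outcomes, scores, control_outcomes))  # 0.5
```

Sweeping `fraction` from 0 to 1 gives the points of the uplift curve described above.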

Evaluation: Qini
•  Qini measure:
•  Similar to Gini (area under the lift curve). Lift curve <-> Qini curve
•  Parametric curve defined by:
$f(t) = Y_T(t) - Y_C(t) \cdot N_T(t) / N_C(t)$
•  where, when taking the first $t$ observations:
•  $Y_T(t)$ is the total number of 1s seen in target observations
•  $Y_C(t)$ is the total number of 1s seen in control observations
•  $N_T(t)$ is the total number of target observations
•  $N_C(t)$ is the total number of control observations
•  Balanced setting ($N_T(t) = N_C(t)$):
$f(t) = Y_T(t) - Y_C(t)$
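The Qini curve can be computed in one pass over the observations sorted by decreasing score, keeping running counts of $Y_T$, $Y_C$, $N_T$ and $N_C$. A minimal sketch; the convention of using no correction term before any control observation has been seen is an assumption, and the tiny dataset is made up.

```python
def qini_curve(scores, treated, outcomes):
    """f(t) = Y_T(t) - Y_C(t) * N_T(t) / N_C(t) for t = 1..n (highest scores first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    y_t = y_c = n_t = n_c = 0
    curve = []
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # Convention (assumption): no control seen yet -> no correction term
        curve.append(y_t - (y_c * n_t / n_c if n_c else 0.0))
    return curve


print(qini_curve(scores=[0.9, 0.7, 0.4, 0.1],
                 treated=[True, False, True, False],
                 outcomes=[1, 0, 1, 1]))  # [1.0, 1.0, 2.0, 1.0]
```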

Evaluation: Qini
•  Personal intuition:
•  We can’t know everything:
•  treated users who convert, untreated users who don’t convert: what would have happened otherwise?
•  But we don’t want to see:
•  Treated users not converting
•  Untreated users converting (in our top list)
•  In the first $t$ observations, we want to minimize:
$N_T(t) - Y_T(t) + Y_C(t)$
•  Very similar to a lift curve that takes only negative examples into account.
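The quantity above counts the "unwanted" observations in the top of the ranking. A minimal sketch, using the same made-up toy data as a small worked example; the function name is hypothetical.

```python
def unwanted_in_top(scores, treated, outcomes):
    """For each t: N_T(t) - Y_T(t) + Y_C(t), i.e. treated non-converters
    plus untreated converters among the t highest-scored observations."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = y_t = y_c = 0
    counts = []
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            y_c += outcomes[i]
        counts.append(n_t - y_t + y_c)
    return counts


print(unwanted_in_top([0.9, 0.7, 0.4, 0.1],
                      [True, False, True, False],
                      [1, 0, 1, 1]))  # [0, 0, 0, 1]
```

A good ranking keeps this count low for small $t$, just as a good lift curve keeps negatives out of the top of the list.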

Evaluation: Qini
$f(t) = Y_T(t) - Y_C(t)$

Evaluation: Qini
•  Best model:
•  Rank all positives in the target group first and all positives in the control group last.
•  No theoretical best model:
•  depends on the possibility of a negative effect
•  Displayed for no negative effect
•  Random model:
•  Corresponds to the global effect of the treatment
•  Hillstrom dataset:
•  For women, models are comparable and useful
•  For men, there are no clear individuals to target

Evaluation: Qini
•  Back to our study:
•  Class modification performs best
•  Two-model approach performs poorly
•  A/B test problem:
•  Control dataset is way too small!
•  Class modification model very close to lift
•  Two-model approach slightly better than random
-> we would need to redo the A/B test.

Conclusion
•  Uplift:
•  Surprisingly little literature / few examples
•  The theory is rather easy to test:
•  Two models
•  Class modification
•  The intuition and evaluation are not easy to grasp
•  On the client side:
•  A good lead for selecting the best offer for each customer

A few references
•  Data:
•  Churn in gaming:
WOWAH dataset (blog post to come)
•  Uplift for healthcare:
Colon dataset
•  Uplift in mailing:
Hillstrom data challenge
•  Uplift in general:
Simulated data (blog post to come)

A few references
•  Application
•  Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz)
•  Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)

A few references
•  Modeling techniques:
•  Rzepakowski Jaroszewicz 2011 (decision trees)
•  Soltys Rzepakowski Jaroszewicz 2013 (ensemble for uplift)
•  Jaskowski Jaroszewicz 2012 (Class modification model)

A few references
•  Evaluation
•  Using Control Groups to Target on Predicted Lift (Radcliffe)
•  Testing a New Metric for Uplift Models (Mesalles Naranjo)

Thank you for your attention!
