How To Build An Attrition Analysis Model
Building an attrition analysis (also known as churn analysis) means finding the
relationships between customer attrition and the variables that affect it. The goal of
attrition analysis is to give the manager or researcher the ability to understand which
variables are the most important causes of attrition and how likely each customer is
to churn.
It may look easy to name the main reasons for attrition: customer satisfaction,
length of service, etc. Using those rules of thumb, the user can predict 15% of all
churners, but using a mathematical procedure such as the one in Analysis 6 can yield
more than 60% precision.
Analysis 6 uses four logistic regression methods to find the model that best
explains the main reasons for attrition. In this "how-to" paper, we will discuss a
simple yet powerful method for obtaining a good attrition model. We will also discuss
how to interpret the model, giving the manager or researcher the tools to build their
own model.
A model based on logistic regression analyzes each variable's weight and contribution
to the model's goal. The contribution of each variable is measured in percent, so the
manager or researcher can understand how much each variable weighs on the model's
target variable (in this example: attrition).
You can also use the Analysis 6 logistic regression procedure in a wide variety of
fields, such as employee attrition in HR, project failure analysis, engineering,
social research, finance, and any other research that aims to explain and predict the
occurrence of a binary event (such as "0"/"1" or "Churned"/"Not churned").
Preparing the data set:
Logistic regression can produce a model from a data set in which the target variable
has two values: "1" means that the event has occurred and "0" means that it has not
occurred.
The data set contains 20 customers: during the last year, 10 of them churned and 10
stayed with the company. The main goal is to score each customer with a personal risk
of churning (e.g. Joe John has a 95% risk of churning).
Another important outcome is the ability to understand what causes attrition and how
much each variable influences a customer's decision to leave the company.
Variables description:
"Children": Number of children that a customer has.
"Age": Age of customer.
"Education": Number of years of education.
"Calls": Number of calls the customer has made to the service center.
"Visits": Number of visits the customer has made to the local service center.
"Attrition": Whether the customer has churned ("1") or not ("0"). This is the model's
target variable, so it must have only two values: "1" or "0".
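Before running the wizard, the layout described above can be checked in a few lines. This is a minimal sketch, assuming each customer is a Python dict keyed by the variable names listed above; the function name `validate_attrition_rows` is hypothetical and not part of Analysis 6.

```python
# Column names taken from the variable description; the row format (list of dicts) is an assumption.
REQUIRED_COLUMNS = ("Children", "Age", "Education", "Calls", "Visits", "Attrition")


def validate_attrition_rows(rows):
    """Raise ValueError if a column is missing or Attrition is not strictly 0/1.

    Returns (number of churners, number of non-churners).
    """
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {i}: missing columns {missing}")
        # The target variable must be binary: "1" = churned, "0" = not churned.
        if row["Attrition"] not in (0, 1):
            raise ValueError(f"row {i}: Attrition must be 0 or 1, got {row['Attrition']!r}")
    churned = sum(row["Attrition"] for row in rows)
    return churned, len(rows) - churned
```

For the data set in this paper, a correctly prepared file would return (10, 10).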
Step No. 2: Selecting attrition model variables
The explained variable is the target variable (in this case: Attrition). It is the
variable whose response to changes in the explanatory variables (in this case: Age,
Calls, Children, Education, Visits) we want to understand.
To define our model, we move the desired variables from the Explanatory
Variables frame on the left side of the wizard window to the Selected Columns frame
on the right side.
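Analysis 6 fits the model inside the wizard, and this paper does not say which four logistic regression methods it uses. As a hedged illustration of what fitting means, the sketch below estimates an intercept and one coefficient per selected variable by plain maximum likelihood with batch gradient ascent. This is a swapped-in textbook technique, not the software's own procedure, and `fit_logistic` with its default learning rate and epoch count is hypothetical. Variables on very different scales (Age versus Children) are best standardized first, or fitted with a smaller learning rate.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps log-odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept + one coefficient per column by gradient ascent on the log-likelihood.

    X: list of rows (lists of floats, one per selected variable); y: list of 0/1 targets.
    Returns (intercept, [coefficients]) on the log-odds scale.
    """
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for row, target in zip(X, y):
            # Residual between the observed 0/1 outcome and the predicted probability.
            err = target - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row)))
            g0 += err
            for j in range(k):
                g[j] += err * row[j]
        # Step along the average gradient of the log-likelihood.
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b
```

Applying `math.exp` to a fitted coefficient gives a multiplicative per-unit value of the kind shown in the main attrition model window, assuming that window reports odds ratios.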
Step No. 3: Defining attrition model
The software displays the ROC curve and the Area Under the Curve (AUC) value.
We will not discuss the ROC or AUC methods in detail here but, generally speaking,
they measure how well the model distinguishes between the "1" and "0" events of
the target variable.
The closer the AUC is to "1", the better the model distinguishes between the binary
events ("0" or "1") of the target variable.
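One common way to compute the AUC, shown here as a sketch rather than Analysis 6's own routine, is the rank (Mann-Whitney) formulation: the AUC equals the fraction of (churner, non-churner) pairs in which the churner receives the higher predicted probability, with ties counted as half. `auc_from_scores` is a hypothetical helper name.

```python
def auc_from_scores(scores, labels):
    """AUC = P(random churner scores higher than random non-churner); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one '1' and one '0' in labels")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # churner correctly ranked above non-churner
            elif p == q:
                wins += 0.5   # tie: no distinguishing ability for this pair
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the model ranks churners above non-churners no better than chance; 1.0 means every churner outranks every non-churner.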
AUC value    Interpretation
0.5          No distinguishing ability
0.5-0.7      Not a very good model
0.7-0.9      Very good model
0.9-1.0      Excellent model
In our example, we have an AUC of 0.83 so we may proceed to view the rest of the
results.
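The AUC table can be turned into a simple lookup. This is a sketch: the table's ranges share their endpoints, so which band owns each boundary value, and grouping values below 0.5 with "No distinguishing ability", are assumptions. `auc_band` is a hypothetical name.

```python
def auc_band(auc_value):
    """Map an AUC value to the interpretation bands of the table (boundary handling assumed)."""
    if not 0.0 <= auc_value <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc_value <= 0.5:
        return "No distinguishing ability"
    if auc_value < 0.7:
        return "Not a very good model"
    if auc_value < 0.9:
        return "Very good model"
    return "Excellent model"
```

With the AUC of 0.83 from our example, `auc_band(0.83)` returns "Very good model".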
Interpreting the statistical parameters can be a complicated task that is beyond the
scope of this how-to paper, so we will look at other results that help us understand
the attrition phenomenon.
Here is the main attrition model window; each variable has its own value describing
its contribution to the attrition phenomenon.
For example:
Age has the value 0.9512, which means that for each additional year of age the odds
of churning decrease by 4.88%: (1 − 0.9512) × 100 = 4.88%.
Calls has the value 1.0458, which means that for each additional call to the call
center the odds of churning increase by 4.58%: (1.0458 − 1) × 100 = 4.58%.
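The arithmetic above generalizes. Assuming the window reports odds ratios (the exponential of each logistic coefficient), a change of k units in one variable, with all others held constant, multiplies the odds of churning by the odds ratio raised to the power k. The helper names below are hypothetical. Note that a percent change in odds is close to the percent change in probability only when the churn probability is small.

```python
import math


def pct_change_in_odds(odds_ratio, units=1):
    """Percent change in the odds of churning for a change of `units` in one variable."""
    return (odds_ratio ** units - 1.0) * 100.0


def coefficient_from_odds_ratio(odds_ratio):
    """Recover the log-odds coefficient B from an odds ratio exp(B)."""
    return math.log(odds_ratio)
```

For example, `pct_change_in_odds(0.9512)` gives about -4.88 and `pct_change_in_odds(1.0458)` about 4.58, while ten extra years of age give `pct_change_in_odds(0.9512, 10)`, roughly -39.4%, not 10 × -4.88 = -48.8%, because the effect compounds.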
The same window shows another important test: Classification.
Figure no.1 measures the model's performance in identifying the non-churners (80%
success). Figure no.2 measures its performance in identifying the churners (70%
success). Figure no.3 measures the overall model performance (75% success).
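The three classification figures are simple ratios of correctly classified cases. This sketch assumes a case is predicted as a churner when its probability is at least 0.5; Analysis 6 may use a different cut-off, and `classification_rates` is a hypothetical name.

```python
def classification_rates(probabilities, actual, threshold=0.5):
    """Percent correctly classified: (non-churners, churners, overall).

    A case is predicted as a churner ("1") when probability >= threshold (0.5 assumed).
    Both classes must be present in `actual`.
    """
    tn = fp = tp = fn = 0
    for p, y in zip(probabilities, actual):
        predicted = 1 if p >= threshold else 0
        if y == 1:
            tp += predicted        # churner correctly flagged
            fn += 1 - predicted    # churner missed
        else:
            fp += predicted        # non-churner wrongly flagged
            tn += 1 - predicted    # non-churner correctly cleared
    non_churners = 100.0 * tn / (tn + fp)
    churners = 100.0 * tp / (tp + fn)
    overall = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return non_churners, churners, overall
```

With 10 churners and 10 non-churners, correctly classifying 8 non-churners and 7 churners returns (80.0, 70.0, 75.0), which is consistent with the three figures above.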
After computing the logistic regression procedure, we can finally answer the question:
what affects attrition, and what weight does each variable have on "Attrition"?
1. "What-If" scenario: analyze a specific case, or a specific customer, to learn
about your customers' attrition.
Let us have a look at the following screenshot:
Frame no.1 is the variables calculator, which calculates the risk of attrition based
on the given variables: Age, Calls, Children, Education, Visits.
Frame no.2 shows the result of the calculation: the probability of attrition is
40% for a customer who is 57 years old, has called the call center 12 times, has
2 children, has 12 years of education and has visited the customer service center
twice.
2. "Sensitivity Table": analyze how your customer's attrition risk responds to
changes in the value of one of the explanatory variables in the attrition model.
Let us have a look at the following screenshot:
For a customer whose other variable values are held constant as shown in figure no.1,
and where the only change is the number of children, from no children at all to six
children (figure no.2), the probability of attrition increases from 14.9% with no
children to 90.78% with six children (figure no.3).
3. Deploy Model: Current Results
A user who wants to test the attrition model on current data can use the
Deploy Model: Current Results option, which computes the probability of
attrition for each customer. This lets the user find the cases where the computed
probability differs from the actual value.
As the picture above shows, the model computed a churn risk of 97% for the first
customer (PROBABILITY = 0.97) and the actual Attrition value was "1", so DID_HIT
is "1", which means the model succeeded in this case.
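The three tools described above (What-If, Sensitivity Table and Deploy Model) all reuse the same calculation: the logistic function applied to the intercept plus the weighted sum of the customer's variables. The sketch below assumes coefficients on the log-odds scale (an odds ratio such as 0.9512 converts to a coefficient with `math.log`) and a 0.5 cut-off for DID_HIT; the function names and the cut-off are hypothetical, not Analysis 6's documented behavior, and no coefficients from the paper's model are reproduced here.

```python
import math


def churn_probability(intercept, coefs, customer):
    """What-If: probability of attrition for one customer (dict of variable -> value)."""
    z = intercept + sum(coefs[name] * customer[name] for name in coefs)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sensitivity_table(intercept, coefs, base_customer, variable, values):
    """Sensitivity Table: vary one variable over `values`, holding the others constant."""
    rows = []
    for v in values:
        scenario = dict(base_customer, **{variable: v})
        rows.append((v, churn_probability(intercept, coefs, scenario)))
    return rows


def deploy(intercept, coefs, customers, threshold=0.5):
    """Deploy Model: add PROBABILITY and DID_HIT to each record that has an actual Attrition."""
    results = []
    for c in customers:
        p = churn_probability(intercept, coefs, c)
        predicted = 1 if p >= threshold else 0   # cut-off of 0.5 is an assumption
        results.append(dict(c, PROBABILITY=round(p, 2),
                            DID_HIT=int(predicted == c["Attrition"])))
    return results
```

A call such as `sensitivity_table(b0, coefs, customer, "Children", range(0, 7))` mirrors the children example above, with `b0`, `coefs` and `customer` taken from your own fitted model.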
Now that you have the answer, you can decide where to put your organizational
efforts to fight attrition.