6.2 Marketing Analysis Predicting Customer Churn in Python

This document discusses various data preparation techniques for predictive modeling, including encoding categorical variables, handling missing data, feature scaling, and feature selection. It explains how to encode binary and categorical features, handle missing data, detect and remove correlated features, scale features to put them on the same distribution, and engineer new features through domain knowledge to improve model performance. Feature engineering techniques like creating a total minutes feature or calculating the ratio of minutes to charges are presented. The key steps of data preparation for modeling are covered at a high level.

Uploaded by

murari

Data Preparation

Marketing Analytics: Predicting Customer Churn in Python

Mark Peterson
Senior Data Scientist, Alliance Data
Model assumptions
Some assumptions that models make:
That the features are normally distributed

That the features are on the same scale

Data types
Machine learning algorithms require numeric data types
Need to encode categorical variables as numeric

telco.dtypes

Account_Length      int64
Vmail_Message       int64
Day_Mins          float64
Eve_Mins          float64
Night_Mins        float64
Intl_Mins         float64
CustServ_Calls      int64
Churn              object
Intl_Plan          object
Vmail_Plan         object
Day_Calls           int64
Day_Charge        float64
Eve_Calls           int64
Eve_Charge        float64
Night_Calls         int64
Night_Charge      float64
Intl_Calls          int64
Intl_Charge       float64
State              object
Area_Code           int64
Phone              object
dtype: object
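
As a rough sketch of how the columns that still need encoding could be listed, the snippet below uses a small invented frame (the name toy and its values are assumptions, not the real telco data) and asks pandas for every non-numeric column.

```python
import pandas as pd

# Toy stand-in for the telco data; the rows are invented for illustration.
toy = pd.DataFrame({
    "Account_Length": [128, 107],
    "Day_Mins": [265.1, 161.6],
    "Intl_Plan": ["no", "yes"],
    "State": ["KS", "OH"],
})

# Columns that are not numeric still need to be encoded (or dropped)
# before they can be passed to a model.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```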

Encoding binary features
telco['Intl_Plan'].head()

0 no
1 no
2 no
3 yes
4 yes
Name: Intl_Plan, dtype: object

Encoding binary features
Option 1: .replace()

telco['Intl_Plan'] = telco['Intl_Plan'].replace({'no': 0, 'yes': 1})
telco['Intl_Plan'].head()

0    0
1    0
2    0
3    1
4    1
Name: Intl_Plan, dtype: int64

Option 2: LabelEncoder()

from sklearn.preprocessing import LabelEncoder

telco['Intl_Plan'] = LabelEncoder().fit_transform(telco['Intl_Plan'])
telco['Intl_Plan'].head()

0    0
1    0
2    0
3    1
4    1
Name: Intl_Plan, dtype: int64
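
The same idea can be applied to every yes/no column at once. The sketch below is one possible way to do that on an invented frame (toy, its values, and the assumption that Vmail_Plan and Churn also hold 'no'/'yes' are all illustrative). It uses .map() with an explicit dictionary as a swapped-in alternative to .replace(), because unmapped labels then surface as NaN instead of passing through silently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the telco data; the values are invented for illustration.
toy = pd.DataFrame({
    "Intl_Plan": ["no", "no", "no", "yes", "yes"],
    "Vmail_Plan": ["yes", "no", "no", "yes", "no"],
    "Churn": ["no", "no", "yes", "no", "yes"],
})

mapping = {"no": 0, "yes": 1}
binary_cols = ["Intl_Plan", "Vmail_Plan", "Churn"]

# Explicit mapping: any label outside {'no', 'yes'} would become NaN,
# which makes unexpected values easy to spot.
encoded = toy[binary_cols].apply(lambda col: col.map(mapping))

# LabelEncoder assigns integers in sorted label order,
# so 'no' -> 0 and 'yes' -> 1 for this column.
le_intl = LabelEncoder().fit_transform(toy["Intl_Plan"])

print(encoded)
```

Both approaches agree here because 'no' sorts before 'yes'. With other labels, LabelEncoder's alphabetical order may not match the intended 0/1 meaning, so an explicit dictionary is the safer choice when the direction matters.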

Encoding state
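Could assign a number to each state

telco['State'].head(4)

0    KS
1    OH
2    NJ
3    OH
Name: State, dtype: object

After assigning a number to each state:

0    0
1    1
2    2
3    1
Name: State, dtype: int64

Bad idea

Would make your model less effective: the numbers imply an ordering among states that does not exist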

One hot encoding
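
A minimal sketch of one hot encoding with pandas, assuming pd.get_dummies is an acceptable tool for the job (the slides here do not show which function the course uses). The frame toy is an invented stand-in holding the four State values shown earlier.

```python
import pandas as pd

# Toy stand-in with the four State values from the earlier slide.
toy = pd.DataFrame({"State": ["KS", "OH", "NJ", "OH"]})

# One indicator column per distinct state; dtype=int gives 0/1 instead of booleans.
state_dummies = pd.get_dummies(toy["State"], prefix="State", dtype=int)

# Replace the original column with its indicator columns.
toy_encoded = pd.concat([toy.drop(columns="State"), state_dummies], axis=1)
print(toy_encoded)
```

Each row has exactly one 1 across the State_ columns, so no state is treated as larger or smaller than another.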

Feature scaling
Features should be on the same scale

Rarely true of real-world data

Feature scaling
telco['Intl_Calls'].describe()

count    3333.000000
mean        4.479448
std         2.461214
min         0.000000
25%         3.000000
50%         4.000000
75%         6.000000
max        20.000000
Name: Intl_Calls, dtype: float64

telco['Night_Mins'].describe()

count    3333.000000
mean      200.872037
std        50.573847
min        23.200000
25%       167.000000
50%       201.200000
75%       235.300000
max       395.000000
Name: Night_Mins, dtype: float64

Standardization
Centers the distribution around the mean, so each scaled feature has mean 0

Calculates the number of standard deviations away from the mean each point is, so each scaled feature has standard deviation 1

from sklearn.preprocessing import StandardScaler

df = StandardScaler().fit_transform(df)
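
StandardScaler().fit_transform() returns a NumPy array, so the column names are lost. The sketch below, on an invented two-column frame (toy and its values are assumptions), shows one way to keep the result as a DataFrame and check that each column ends up with mean 0 and standard deviation 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in with two features on very different scales.
toy = pd.DataFrame({
    "Intl_Calls": [3, 4, 6, 2, 5],
    "Night_Mins": [167.0, 201.2, 235.3, 150.0, 250.5],
})

scaler = StandardScaler()

# fit_transform returns a NumPy array, so wrap it to keep column names and index.
scaled = pd.DataFrame(
    scaler.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)

# StandardScaler divides by the population standard deviation (ddof=0).
print(scaled.mean().round(6))
print(scaled.std(ddof=0).round(6))
```

When the data is later split into training and test sets, a common practice is to fit the scaler on the training portion only and reuse it with transform() on the rest.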

Let's practice!

Feature selection and engineering

Dropping unnecessary features
Unique identifiers
Phone numbers

Social security numbers

Account numbers

.drop() method

telco = telco.drop(['Soc_Sec', 'Tax_ID'], axis=1)

Dropping correlated features
When two features are highly correlated, one of them can be dropped

It provides little additional information to the model

telco.corr()
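
One way to turn a correlation matrix into a list of columns to drop is sketched below. Everything in it is an assumption for illustration: the frame toy, the 0.17 per-minute rate used to build Day_Charge, and the 0.95 cutoff. The approach keeps the first column of each highly correlated pair and drops the later one.

```python
import numpy as np
import pandas as pd

# Toy stand-in; Day_Charge is built as a fixed multiple of Day_Mins
# (the 0.17 rate is hypothetical), so the two are perfectly correlated.
toy = pd.DataFrame({
    "Day_Mins": [100.0, 150.0, 200.0, 250.0, 300.0],
    "Night_Mins": [210.0, 180.0, 250.0, 190.0, 220.0],
})
toy["Day_Charge"] = toy["Day_Mins"] * 0.17

corr = toy.corr().abs()

# Keep only the upper triangle (excluding the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.95  # hypothetical cutoff, not a value taken from the course
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

reduced = toy.drop(columns=to_drop)
print(to_drop)
```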

Feature engineering
Creating new features to help improve model performance

Should consult with business and subject matter experts

Examples of feature engineering
Total Minutes: Sum of Day_Mins, Eve_Mins, Night_Mins, Intl_Mins

Ratio between Minutes and Charge

telco['Day_Cost'] = telco['Day_Mins'] / telco['Day_Charge']
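
Both engineered features can be sketched on an invented frame as follows. The name Total_Mins, the sample rows, and the choice to treat a zero charge as missing are assumptions made for this illustration. Without that guard, a customer with no daytime usage would produce a 0/0 division.

```python
import numpy as np
import pandas as pd

# Toy stand-in; the second row has no daytime usage at all.
toy = pd.DataFrame({
    "Day_Mins": [265.1, 0.0, 243.4],
    "Eve_Mins": [197.4, 195.5, 121.2],
    "Night_Mins": [244.7, 254.4, 162.6],
    "Intl_Mins": [10.0, 13.7, 12.2],
    "Day_Charge": [45.07, 0.0, 41.38],
})

# Total minutes: row-wise sum of the four minute columns.
minute_cols = ["Day_Mins", "Eve_Mins", "Night_Mins", "Intl_Mins"]
toy["Total_Mins"] = toy[minute_cols].sum(axis=1)

# Ratio of minutes to charge; a zero charge is replaced by NaN first
# so the ratio is reported as missing rather than dividing by zero.
toy["Day_Cost"] = toy["Day_Mins"] / toy["Day_Charge"].replace(0, np.nan)

print(toy[["Total_Mins", "Day_Cost"]])
```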

Let's practice!