PC2. Bernal Leandro, Melissa

The document describes a program that performs customer segmentation and lifetime value analysis on transactional data. It includes the following key steps: 1. Import necessary libraries and read in transaction data. 2. Calculate recency, frequency, and monetary (revenue) scores by customer and cluster customers into groups for each metric. 3. Analyze how outliers influence cluster formation and determine the optimal number of clusters. 4. Provide descriptive statistics that show the VVC cluster with the most recent customers has the lowest average revenue, while the cluster with most frequent, long-term customers has the highest average revenue.

Uploaded by

Diego Vega

PROGRAM I: BUSINESS DATA ANALYST, VIRTUAL MODE

CLIENT PROFITABILITY ANALYTICS

Second Graded Assignment

Student: Melissa Lesly Bernal Leandro

Email: [email protected]
Date: April 12, 2021
ACTIVITY 1
Task: Show the procedure developed as a single program.
# Part A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def order_cluster(cluster_field_name, target_field_name, df, ascending):
    # Relabel the clusters 0..k-1 according to the mean of target_field_name
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name, ascending=ascending).reset_index(drop=True)
    df_new['index'] = df_new.index
    df_final = pd.merge(df, df_new[[cluster_field_name, 'index']], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name], axis=1)
    df_final = df_final.rename(columns={"index": cluster_field_name})
    return df_final
# Part B
tx_data = pd.read_csv('./data/customer_segmentation.csv', encoding='cp1252')
tx_data['InvoiceDate'] = pd.to_datetime(tx_data['InvoiceDate'])
tx_data['InvoiceYearMonth'] = tx_data['InvoiceDate'].map(lambda date: 100*date.year + date.month)
tx_uk = tx_data.query("Country=='United Kingdom'").reset_index(drop=True)
# Part C
tx_user = pd.DataFrame(tx_data['CustomerID'].unique())
tx_user.columns = ['CustomerID']
# Part D
tx_max_purchase = tx_uk.groupby('CustomerID').InvoiceDate.max().reset_index()
tx_max_purchase.columns = ['CustomerID', 'MaxPurchaseDate']
tx_max_purchase['Recency'] = (tx_max_purchase['MaxPurchaseDate'].max() - tx_max_purchase['MaxPurchaseDate']).dt.days
tx_user = pd.merge(tx_user, tx_max_purchase[['CustomerID', 'Recency']], on='CustomerID')
kmeans = KMeans(n_clusters=4)
tx_user['RecencyCluster'] = kmeans.fit_predict(tx_user[['Recency']])
tx_user = order_cluster('RecencyCluster', 'Recency', tx_user, False)
# Part E
tx_frequency = tx_uk.groupby('CustomerID').InvoiceDate.count().reset_index()
tx_frequency.columns = ['CustomerID', 'Frequency']
tx_user = pd.merge(tx_user, tx_frequency, on='CustomerID')
kmeans = KMeans(n_clusters=4)
tx_user['FrequencyCluster'] = kmeans.fit_predict(tx_user[['Frequency']])
tx_user = order_cluster('FrequencyCluster', 'Frequency', tx_user, True)
# Part F
tx_uk['Revenue'] = tx_uk['UnitPrice'] * tx_uk['Quantity']
tx_revenue = tx_uk.groupby('CustomerID').Revenue.sum().reset_index()
tx_user = pd.merge(tx_user, tx_revenue, on='CustomerID')
# tx_user = tx_user[tx_user['Revenue'] < tx_user['Revenue'].quantile(0.98)]
kmeans = KMeans(n_clusters=4)
tx_user['RevenueCluster'] = kmeans.fit_predict(tx_user[['Revenue']])
tx_user = order_cluster('RevenueCluster', 'Revenue', tx_user, True)
# Part G
kmeans = KMeans(n_clusters=4)
tx_user['VVCCluster'] = kmeans.fit_predict(tx_user[['RecencyCluster', 'FrequencyCluster', 'RevenueCluster']])
# Part H
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.array(tx_user['RecencyCluster'])
y = np.array(tx_user['FrequencyCluster'])
z = np.array(tx_user['RevenueCluster'])
ax.set_title("Customer Lifetime Value (VVC)")
ax.set_xlabel("RecencyCluster")
ax.set_ylabel("FrequencyCluster")
ax.set_zlabel("RevenueCluster")
ax.scatter(x, y, z, marker="s", c=tx_user["VVCCluster"], s=40, cmap="RdBu")
plt.show()
ACTIVITY 2
Task: Explain how the program works.
# Part A: The libraries the program needs are imported, and the helper order_cluster is defined

# Part B: The data is read and prepared (dates are parsed and the United Kingdom transactions are selected)

# Part C: A generic dataframe is created to hold the CustomerID and the new segmentation scores
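As a minimal sketch of what Part B does, the snippet below applies the same three steps to a few made-up rows. The rows are hypothetical and are not taken from customer_segmentation.csv; only the column names come from the program.

```python
import pandas as pd

# Hypothetical sample rows; the real program reads ./data/customer_segmentation.csv
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "InvoiceDate": ["2011-01-15 10:00", "2011-03-02 09:30", "2011-12-09 12:50"],
    "Country": ["United Kingdom", "France", "United Kingdom"],
})

# Parse the text dates, then build an integer key of the form YYYYMM
sample["InvoiceDate"] = pd.to_datetime(sample["InvoiceDate"])
sample["InvoiceYearMonth"] = sample["InvoiceDate"].map(lambda d: 100 * d.year + d.month)

# Keep only United Kingdom transactions, as the program does for tx_uk
sample_uk = sample.query("Country == 'United Kingdom'").reset_index(drop=True)
print(sample_uk[["CustomerID", "InvoiceYearMonth"]])
```

The integer key (for example 201112 for December 2011) sorts chronologically, which is presumably why the program stores it in that form.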

# Part D: Recency is calculated

# The most recent purchase date is found for each customer

# Each customer's last purchase date is compared with the latest purchase date in the data set, which gives the number of days since that customer last bought

# These values are added to tx_user

# Four groups are built for recency and added to tx_user

# The groups are ordered

Graphical result: (figure not included)
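The recency calculation can be illustrated on a toy example. The purchases below are invented for illustration; the logic mirrors the program's Part D (last purchase per customer, then days between that date and the latest date in the data).

```python
import pandas as pd

# Hypothetical purchases for three customers (not from the real data set)
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceDate": pd.to_datetime(
        ["2011-11-01", "2011-12-09", "2011-10-10", "2011-06-12"]
    ),
})

# Last purchase date per customer
last_purchase = purchases.groupby("CustomerID").InvoiceDate.max().reset_index()
last_purchase.columns = ["CustomerID", "MaxPurchaseDate"]

# Recency = days between the latest purchase in the data and each customer's last purchase
reference_date = last_purchase["MaxPurchaseDate"].max()
last_purchase["Recency"] = (reference_date - last_purchase["MaxPurchaseDate"]).dt.days
print(last_purchase)
```

Customer 1 bought on the reference date, so its recency is 0 days; larger values mean a longer period of inactivity.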

# Part E: Frequency is calculated

# The number of transactions is counted for each customer

# These values are added to tx_user

# Four groups are built for frequency and added to tx_user

# The clusters are ordered by frequency

Graphical result: (figure not included)
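K-means numbers its clusters arbitrarily, which is why the program reorders them afterwards. The sketch below shows the idea on invented frequencies with three clusters instead of four; the dictionary-based relabeling is a simplified stand-in for order_cluster with ascending=True, and random_state and n_init are set only to make the sketch reproducible (the original program does not set them).

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical transaction counts per customer (not from the real data)
freq = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Frequency": [1, 2, 1, 50, 55, 200],
})

# Cluster on the single Frequency column; label numbers are arbitrary at this point
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
freq["FrequencyCluster"] = kmeans.fit_predict(freq[["Frequency"]])

# Relabel so that cluster 0 has the lowest mean frequency and cluster 2 the highest
means = freq.groupby("FrequencyCluster")["Frequency"].mean().sort_values()
relabel = {old: new for new, old in enumerate(means.index)}
freq["FrequencyCluster"] = freq["FrequencyCluster"].map(relabel)
print(freq)
```

After relabeling, a higher cluster number always means a more frequent buyer, so the label can be read as an ordinal score.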

# Part F: Revenue is calculated

# The revenue is calculated for each customer
# These values are added to tx_user

# Four groups are built and added to tx_user

# The groups are ordered

Graphical result: (figure not included)
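The revenue step is a per-line product followed by a per-customer sum. A small sketch with invented invoice lines (the column names match the program; the values are hypothetical):

```python
import pandas as pd

# Hypothetical invoice lines for two customers
lines = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "UnitPrice": [2.5, 10.0, 4.0, 1.5],
    "Quantity": [4, 1, 3, 2],
})

# Revenue per line, then total revenue per customer
lines["Revenue"] = lines["UnitPrice"] * lines["Quantity"]
revenue = lines.groupby("CustomerID").Revenue.sum().reset_index()
print(revenue)
```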
ACTIVITY 3
Task: Analyze the influence of extreme values on the formation of the groups.
Recency: before and after (figures not included)

Frequency: before and after (figures not included)

Revenue: before and after (figures not included)

In all three cases, the extreme minimum and maximum values are smoothed, which gives a better representation of the results.
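The document does not state how the "after" results were produced. One plausible reading, assumed here, is the percentile filter that appears commented out in Part F of the program. The sketch below applies that filter to invented revenue values to show how a single extreme value distorts the summary before it is removed.

```python
import pandas as pd

# Hypothetical revenue values: 99 ordinary customers and one extreme one
revenue = pd.Series(list(range(1, 100)) + [100_000], name="Revenue")

# Same condition as the commented-out line in Part F:
# keep only values below the 98th percentile
cutoff = revenue.quantile(0.98)
trimmed = revenue[revenue < cutoff]

print("before:", revenue.mean(), revenue.max())
print("after: ", trimmed.mean(), trimmed.max())
```

Because k-means minimizes squared distances, an extreme value like this one tends to pull a centroid toward itself and can end up in a cluster of its own, leaving fewer clusters to describe the bulk of the customers.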
ACTIVITY 4
Task: Determine the appropriate number of VVC groups.
4 groups (figure not included)

20 groups (figure not included)

100 groups (figure not included)

As can be seen, as the number of groups grows, the plotted data increase in volume and lie closer together.
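The comparison above is visual. A common numerical complement, which the original assignment does not use, is to compare the k-means inertia (sum of squared distances to the nearest centroid) for several values of k and look for the point where the improvement levels off, often called the elbow method. The sketch below runs on randomly generated scores that merely stand in for RecencyCluster, FrequencyCluster and RevenueCluster; the data and the chosen values of k are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 3-D cluster scores (each 0..3), standing in for the three
# score columns that the program clusters in Part G
rng = np.random.default_rng(0)
scores = rng.integers(0, 4, size=(300, 3)).astype(float)

# Fit k-means for several k and record the inertia of each fit
inertia = {}
for k in (2, 4, 8, 16):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores)
    inertia[k] = model.inertia_

for k, value in inertia.items():
    print(k, round(value, 1))
```

Note that if each of the three scores takes only the values 0 to 3, there are at most 4 x 4 x 4 = 64 distinct points, so a very large number of groups cannot all be distinct on those inputs.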
• For RecencyCluster, the values are as follows:

VVCCluster  count  mean      std       min  25%  50%  75%  max
0           1046   0.543021  0.498384  0    0    1    1    1
1           1532   3         0         3    3    3    3    3
2           950    2         0         2    2    2    2    2
3           422    2.990521  0.097011  2    3    3    3    3

VVCCluster 1 has the highest recency value (longest time) and VVCCluster 0 has the lowest (most recent).

• For FrequencyCluster, the values are as follows:

VVCCluster  count  mean      std       min  25%  50%  75%  max
0           1046   0.007648  0.08716   0    0    0    0    1
1           1532   0         0         0    0    0    0    0
2           950    0.032632  0.177764  0    0    0    0    1
3           422    1.049763  0.308066  0    1    1    1    3

VVCCluster 3 has the highest frequency value and VVCCluster 1 has the lowest.

• For RevenueCluster, the values are as follows:

VVCCluster  count  mean      std       min  25%  50%  75%  max
0           1046   0.005736  0.075556  0    0    0    0    1
1           1532   0.039817  0.195593  0    0    0    0    1
2           950    0.009474  0.096922  0    0    0    0    1
3           422    0.516588  0.638034  0    0    0    1    3

VVCCluster 3 has the highest revenue value and VVCCluster 0 has the lowest.
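The three tables above have the shape of the output of pandas' groupby(...).describe(). Assuming that is how they were produced (the document does not show the call), the sketch below reproduces the layout on a handful of invented rows.

```python
import pandas as pd

# Hypothetical cluster assignments for six customers
tx_user = pd.DataFrame({
    "VVCCluster":     [0, 0, 1, 1, 1, 2],
    "RecencyCluster": [0, 1, 3, 3, 3, 2],
})

# One row per VVCCluster with count, mean, std, min, quartiles and max
summary = tx_user.groupby("VVCCluster")["RecencyCluster"].describe()
print(summary)
```

The mean column is an average of cluster scores on the 0 to 3 scale, so it should be read as an ordinal score rather than as days, counts, or currency.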

VVCCluster  Count  % of total
0           1046   26%
1           1532   39%
2           950    24%
3           422    11%
TOTAL       3950   100%

The group that has done business with the company most recently is VVCCluster 0 (26% of all customers); however, this group has the lowest revenue (mean RevenueCluster score of 0.0057 on the 0 to 3 scale). Likewise, the highest revenue corresponds to VVCCluster 3 (mean RevenueCluster score of 0.5166), which also has the highest frequency; this group represents 11% of all customers.
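The percentage column can be checked directly from the counts reported in the table:

```python
# Customer counts per VVCCluster, as reported in the table
counts = {0: 1046, 1: 1532, 2: 950, 3: 422}

total = sum(counts.values())
shares = {cluster: round(100 * n / total) for cluster, n in counts.items()}

print(total)
for cluster, pct in shares.items():
    print(cluster, f"{pct}%")
```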
