Rev Insurance Business Report

The document discusses using various machine learning models including decision trees, random forests, and artificial neural networks to analyze a customer dataset and segment customers. It first preprocesses the data, splits it into training and test sets, then builds models with each algorithm. Performance is evaluated on each model using various metrics including accuracy, confusion matrices, and ROC curves. The artificial neural network model is identified as the most optimized for solving the business problem based on its performance metrics. Recommendations are made to first normalize the data before applying models and that random forests can be useful for both classification and regression tasks.

Uploaded by

Pratigya pathak

BUSINESS REPORT

Problem 2: CART-RF-ANN

2.1 Data Ingestion: Read the dataset. Do the descriptive statistics and do null value condition check, write an inference on it.

import pandas as pd
from PIL import Image
import numpy as np
#from scipy.cluster.hierarchy import dendrogram, linkage,fcluster
import scipy.linalg as la
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
#from sklearn.cluster import KMeans
#from sklearn.metrics import silhouette_samples, silhouette_score
#from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
import matplotlib.pyplot as plt
import seaborn as sns
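The descriptive statistics and null value check asked for in 2.1 can be sketched as follows. The report does not show the data file or its loading code, so the small DataFrame `df` below is a hypothetical stand-in with made-up values; only the `describe()` and `isnull().sum()` calls are the point.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the insurance data; the real file is not shown in the report.
df = pd.DataFrame({
    "Age": [36, 48, np.nan, 29],
    "Duration": [7, 34, 3, 12],
    "Sales": [2.5, 20.0, 9.9, 26.0],
    "Channel": ["Online", "Online", "Offline", "Online"],
})

# Descriptive statistics for numeric and non-numeric columns alike.
summary = df.describe(include="all")
print(summary)

# Null value check: number of missing entries in each column.
null_counts = df.isnull().sum()
print(null_counts)
```

A column with a non-zero count in `null_counts` needs imputation or row removal before modelling.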

2.2 Data Split: Split the data into test and train, build classification model CART, Random Forest, Artificial Neural Network.
# A decision tree in Python can take only numerical / categorical columns. It cannot take string / object types.
# The preprocessing code loops through each column and checks whether the column type is object; if so, it
# converts that column into a categorical one, with each distinct value becoming a category code.
# Capture the target column ("default") into separate vectors for the training set and test set.
# Split the data into training and test sets for the independent attributes.
X_train, X_test, train_labels, test_labels = train_test_split(X, y, test_size=.30, random_state=1)
print(pd.DataFrame(dt_model.feature_importances_, columns=["Imp"], index=X_train.columns))
                   Imp
Age           0.175142
Agency_Code   0.195045
Type          0.003095
Commision     0.082596
Channel       0.007262
Duration      0.266131
Sales         0.211101
Product Name  0.039937
Destination   0.019691
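The conversion of string columns into category codes and the 70:30 split described in the comments of 2.2 can be sketched as below. The miniature DataFrame and the target name `Claimed` are assumptions made for illustration. The check uses `is_numeric_dtype` rather than a literal comparison with `object`, which is a slightly more general variant of the test the comments describe.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.model_selection import train_test_split

# Hypothetical miniature of the dataset; "Claimed" is an assumed target name.
df = pd.DataFrame({
    "Age": [36, 48, 31, 29, 52, 40, 33, 45, 27, 60],
    "Channel": ["Online", "Offline", "Online", "Online", "Offline",
                "Online", "Online", "Offline", "Online", "Online"],
    "Claimed": ["No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
})

# Replace every non-numeric column with integer category codes.
for col in df.columns:
    if not is_numeric_dtype(df[col]):
        df[col] = pd.Categorical(df[col]).codes

# Separate the independent attributes from the target vector.
X = df.drop("Claimed", axis=1)
y = df["Claimed"]

# 70:30 split with a fixed seed, as in the report.
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)
print(X_train.shape, X_test.shape)
```

`pd.Categorical` assigns codes in sorted order of the distinct values, so the mapping is reproducible from run to run.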
Random Forest
rfcl = RandomForestClassifier(n_estimators = 501,random_state=1)
rfcl = rfcl.fit(X_train, train_labels)
              precision    recall  f1-score   support
           0       0.80      0.91      0.86      1471
           1       0.70      0.48      0.57       629
    accuracy                           0.78      2100
   macro avg       0.75      0.70      0.71      2100
weighted avg       0.77      0.78      0.77      2100
For the random forest, precision is 80% for class 0 and 70% for class 1, the macro average precision is 75% and the weighted average precision is 77%. The 2,100 records in this report divide roughly 70:30 between class 0 (1,471) and class 1 (629).
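A report of this kind can be produced with `classification_report`. The sketch below uses synthetic data from `make_classification` as a stand-in for the insurance data, so its numbers will not match the table above; it only shows how the per-class rows and the macro and weighted averages are obtained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, roughly 70/30 imbalanced stand-in for the insurance data (an assumption).
X, y = make_classification(n_samples=600, n_features=9, weights=[0.7, 0.3], random_state=1)
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)

rfcl = RandomForestClassifier(n_estimators=501, random_state=1)
rfcl.fit(X_train, train_labels)

# Per-class precision, recall, f1-score and support on the training set.
train_pred = rfcl.predict(X_train)
print(classification_report(train_labels, train_pred))

# The same figures as a dictionary, convenient for further comparison.
report = classification_report(train_labels, train_pred, output_dict=True)
```

The macro average is the plain mean of the per-class values, while the weighted average weights each class by its support, which is why the two differ when the classes are imbalanced.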
ANN
              precision    recall  f1-score   support
           0       0.81      0.91      0.86      1471
           1       0.70      0.51      0.59       629
    accuracy                           0.79      2100
   macro avg       0.76      0.71      0.73      2100
weighted avg       0.78      0.79      0.78      2100
For the ANN, precision is 81% for class 0 and 70% for class 1, the macro average precision is 76% and the weighted average precision is 78%. The class distribution is again roughly 70:30. That is, of the 3,000 customers in the data set, the 2,100 used here (the 70% training portion) comprise 1,471 customers in class 0 and 629 in class 1.
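The report does not show how the ANN was built. A minimal sketch with `MLPClassifier`, assuming min-max scaled inputs and illustrative values for `hidden_layer_sizes` and `max_iter` (these are not the report's settings), could look like this on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the insurance data (an assumption).
X, y = make_classification(n_samples=600, n_features=9, weights=[0.7, 0.3], random_state=1)
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# hidden_layer_sizes and max_iter are illustrative choices, not the report's settings.
ann = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=1)
ann.fit(X_train_s, train_labels)

print("train accuracy:", ann.score(X_train_s, train_labels))
print("test accuracy:", ann.score(X_test_s, test_labels))
```

Fitting the scaler on the training portion alone keeps information from the test set out of the model.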
2.3 Performance Metrics: Check the performance of Predictions on Train and Test sets
using Accuracy (1.5 pts), Confusion Matrix (2 pts), Plot ROC curve and get ROC_AUC
score for each model (2 pts), Write inferences on each model (2 pts).
# Decision Tree
# AUC and ROC for the train data
reg_dt_model = DecisionTreeClassifier(criterion='gini', max_depth=7, min_samples_leaf=10, min_samples_split=30)
reg_dt_model.fit(X_train, train_labels)
# Raw string, so the backslashes in the Windows path are not read as escape sequences
insu_tree_regularized = open(r'C:\Users\Anu\Downloads\insu_tree_regularized.dot', 'w')
dot_data = tree.export_graphviz(reg_dt_model, out_file=insu_tree_regularized, feature_names=list(X_train), class_names=list(train_char_label))
insu_tree_regularized.close()
print(pd.DataFrame(dt_model.feature_importances_, columns=["Imp"], index=X_train.columns))
ytrain_predict = reg_dt_model.predict(X_train)
ytest_predict = reg_dt_model.predict(X_test)
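Accuracy and the confusion matrix on the train and test sets, as requested in 2.3, can be computed from predictions like the ones above. The sketch below reuses the tree settings from the report but fits on synthetic stand-in data, so the printed numbers are illustrative only. The `random_state` on the tree and the `results` dictionary are additions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the insurance data (an assumption).
X, y = make_classification(n_samples=600, n_features=9, weights=[0.7, 0.3], random_state=1)
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Same regularization settings as the report; random_state added for repeatability.
reg_dt_model = DecisionTreeClassifier(
    criterion="gini", max_depth=7, min_samples_leaf=10,
    min_samples_split=30, random_state=1,
)
reg_dt_model.fit(X_train, train_labels)

results = {}
for name, features, labels in [("train", X_train, train_labels),
                               ("test", X_test, test_labels)]:
    predicted = reg_dt_model.predict(features)
    # Rows of the confusion matrix are actual classes, columns are predicted classes.
    results[name] = (accuracy_score(labels, predicted),
                     confusion_matrix(labels, predicted))
    print(name, "accuracy:", results[name][0])
    print(results[name][1])
```

A large gap between train and test accuracy would point to overfitting, which the depth and leaf-size limits are meant to contain.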
# Random Forest
The same setup is needed for the random forest and the ANN: compute the AUC and plot the ROC curve on the training data and again on the test data, which helps us understand how well each model separates the two classes.
# Predict probabilities to get the ROC curve for the training and test data sets separately.
The AUC values for the decision tree, random forest and ANN are close to one another; for example, 82.4% is reported for the random forest method.
# AUC and ROC for the training data
AUC: 0.824
AUC: 0.864
AUC: 0.817
# AUC and ROC for the test data
AUC: 0.793
AUC: 0.793
AUC: 0.798
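The AUC values above come from predicted probabilities rather than predicted classes. A minimal sketch of that calculation, assuming a regularized decision tree and synthetic stand-in data (so the values will differ from those in the report), is:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the insurance data (an assumption).
X, y = make_classification(n_samples=600, n_features=9, weights=[0.7, 0.3], random_state=1)
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)

model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10,
                               min_samples_split=30, random_state=1)
model.fit(X_train, train_labels)

# Keep the probability of the positive class (column 1) only.
train_probs = model.predict_proba(X_train)[:, 1]
test_probs = model.predict_proba(X_test)[:, 1]

train_auc = roc_auc_score(train_labels, train_probs)
test_auc = roc_auc_score(test_labels, test_probs)
print("AUC (train): %.3f" % train_auc)
print("AUC (test): %.3f" % test_auc)

# Points of the ROC curve for the test set: false and true positive rates.
fpr, tpr, thresholds = roc_curve(test_labels, test_probs)
```

Passing `fpr` and `tpr` to `plt.plot` draws the ROC curve; the diagonal from (0, 0) to (1, 1) is the usual no-skill reference line.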
Train Data
              precision    recall  f1-score   support
           0       0.84      0.89      0.86      1471
           1       0.70      0.61      0.65       629
    accuracy                           0.81      2100
   macro avg       0.77      0.75      0.76      2100
weighted avg       0.80      0.81      0.80      2100

Test Data
              precision    recall  f1-score   support
           0       0.78      0.89      0.83       605
           1       0.68      0.50      0.58       295
    accuracy                           0.76       900
   macro avg       0.73      0.69      0.70       900
weighted avg       0.75      0.76      0.75       900

2.4 Final Model - Compare all models on the basis of the performance metrics in a
structured tabular manner (3 pts). Describe on which model is best/optimized (2 pts ).
Train Data
              precision    recall  f1-score   support
           0       0.84      0.89      0.86      1471
           1       0.70      0.61      0.65       629
    accuracy                           0.81      2100
   macro avg       0.77      0.75      0.76      2100
weighted avg       0.80      0.81      0.80      2100

Test Data
              precision    recall  f1-score   support
           0       0.78      0.89      0.83       605
           1       0.68      0.50      0.58       295
    accuracy                           0.76       900
   macro avg       0.73      0.69      0.70       900
weighted avg       0.75      0.76      0.75       900

The models perform well and can be used for data mining, but the ANN is the most optimized for solving the business problem.
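A structured comparison of the three models can be assembled by collecting the same metrics for each one into a single table. The sketch below is one possible layout, fitted on synthetic stand-in data with illustrative hyperparameters, so it does not reproduce the report's figures. The model labels and column names are choices made for this example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the insurance data (an assumption).
X, y = make_classification(n_samples=600, n_features=9, weights=[0.7, 0.3], random_state=1)
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Illustrative settings; the ANN is wrapped with a scaler so its inputs share one scale.
models = {
    "CART": DecisionTreeClassifier(max_depth=7, min_samples_leaf=10,
                                   min_samples_split=30, random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=101, random_state=1),
    "ANN": make_pipeline(MinMaxScaler(),
                         MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000,
                                       random_state=1)),
}

rows = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    rows[name] = {
        "Train accuracy": accuracy_score(train_labels, model.predict(X_train)),
        "Test accuracy": accuracy_score(test_labels, model.predict(X_test)),
        "Train AUC": roc_auc_score(train_labels, model.predict_proba(X_train)[:, 1]),
        "Test AUC": roc_auc_score(test_labels, model.predict_proba(X_test)[:, 1]),
    }

# One row per model, one column per metric.
comparison = pd.DataFrame.from_dict(rows, orient="index").round(3)
print(comparison)
```

Comparing the train and test columns side by side shows both how well each model fits and how much of that fit carries over to unseen data.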

2.5 Based on your analysis and working on the business problem, detail out appropriate 5
insights and recommendations to help the management solve the business objective.
We will be working on a wholesale customer segmentation problem. The data is hosted on the UCI Machine Learning repository. The aim of this problem is to segment the clients so that travel distributions can be provided based on their travel across diverse product categories, destinations, etc.
Our aim is to make clusters from this data that group similar clients together. We will use the ANN for this problem.
But before applying the ANN or the Random Forest method, we have to normalize the data so that the scale of each variable is the same. Why is this important? If the scale of the variables is not the same, a model such as the ANN might become biased towards the variables with a higher magnitude, like Claimed Commission. Tree-based models such as Random Forest are much less sensitive to scale, but normalizing first keeps the inputs consistent across models.
First normalize the data and bring all the variables to the same scale.
1. For classification problems, the Random Forest algorithm reduces the risk of overfitting compared with a single decision tree.
2. The same Random Forest algorithm can be used for both classification and regression tasks.