
First Seminar of Ei Ei Khin

This document presents a seminar on classifying bank marketing data using support vector machines (SVM). It aims to predict customer subscription to term deposits. The seminar outlines preprocessing steps like handling missing values and outliers. It explains SVM and how it finds the optimal hyperplane for classification. Performance is evaluated using a confusion matrix on a bank marketing dataset containing over 11,000 records and 17 attributes to classify customers as subscribing or not subscribing to deposits.

University of Computer Studies, Yangon (UCSY)

Classification of Bank Marketing Data Using Support Vector Machine

Presented by : Ma Ei Ei Khin
Roll No : 6CS-15 (BIS-9)
Batch : 24th Batch
Supervised by : Dr. Tin Tin Htar
Seminar : First Seminar
Date :
Outline
 Abstract
 Objectives
 Introduction
 Related Work
 Steps of the System
 Support Vector Machine (SVM)
 Dataset Description
 System Flow Diagram
 Performance Evaluation
 Conclusion
 References
Abstract
 The banking system plays an important role in the financial
sector all over the world.
 The banking industry requires more accurate predictive models
for its services and products.
 Bank workers can build such predictive models manually, but
this process takes a long time and many man-hours.
 For these reasons, machine learning techniques are useful for
predicting outcomes from large amounts of data.
 Classification is one of the most important techniques for
analyzing data and predicting the class of new data.
 This system will implement the classification of bank marketing
data using a support vector machine (SVM) to predict whether a
customer will subscribe to the term deposit or not.
Objectives
 To classify bank marketing data and predict subscription to the
term deposit using a Support Vector Machine
 To help banks identify the main factor that can increase
customers’ subscription to the term deposit
 To evaluate the classification performance on bank marketing
data by using a confusion matrix
Introduction
 A term deposit is a fixed-term investment in which money is
deposited into a bank account.
 Deposits are the main source of funding for banks, and many
banks offer different types of accounts to attract customers
willing to deposit their funds.
 A bank can increase the number of term deposit subscribers
through effective marketing.
 A bank marketing campaign can be carried out in various ways,
using telephone, social media, email, short message services,
blogging, and others.
 The purpose of a bank marketing campaign is to match the bank’s
products to the needs of the targeted customers.
Introduction (Cont’d)
 By targeting a specific group of customers, banks can achieve
their organizational objective of increasing the number of
subscriptions to term deposits.
 This system will predict the probability that a customer
subscribes to the term deposit, using bank marketing data with a
Support Vector Machine (SVM).
Related Work - 1
Title : Using SVM for Smart Direct Marketing (SDM): A case
of predicting bank customers interested in
the Term Deposits
Author : Karim Amzile, Rajaa Amzile (2021)
 In this paper, the authors proposed predicting the behavior of
bank customers toward term deposits using a Support Vector
Machine.
 The dataset was collected during a direct marketing campaign
and consisted of 1,572 rows in two categories: 671 rows
representing customers who were interested in the term deposit,
and the remainder representing customers who were not
interested.
 The model achieved an accuracy of 93% in predicting customer
behavior.
 The model is intended to minimize the cost of a marketing
campaign for banks.
Related Work - 2
Title : Predicting acceptance of the bank loan offers
by using support vector machines
Author : Mehmet Furkan Akça, Onur Sevli (2022)
 The purpose of this paper is to predict acceptance of bank
loan offers using the Support Vector Machine.
 The authors compared the predictions of four SVM kernels:
the linear, polynomial, RBF and sigmoid kernels.
 The best result was obtained with the polynomial kernel
(97.2% accuracy) and the lowest with the sigmoid kernel
(83.3% accuracy).
Related Work - 3
Title : A comparative study between support vector machine
and support vector data description in
bank telemarketing
Author : Han Gao, Pei Shan Fam, Heng Chin Low (2021)
 The authors compared the prediction performance of the
Support Vector Machine (SVM) and Support Vector Data
Description (SVDD).
 They used a bank telemarketing dataset collected from a
Portuguese banking institution to predict who would be likely
to buy the term deposit.
 The accuracy values are 0.9867 and 0.8565 for the SVM and SVDD
models, respectively.
 They concluded that the SVM model was more suitable than SVDD
for binary classification problems.
Steps of the System
 Start and load data from the Bank Marketing Dataset.
 Data preprocessing steps:
   Handle missing values and remove duplicate values
   Handle outliers
   Data transformation
 Split the data into training data and testing data (80% and
20%, respectively).
 Feature engineering on the training data using PCA.
 Classification using SVM.
 Building the model.
 Performance evaluation using the confusion matrix.
 End
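The 80% / 20% split in the steps above can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `train_test_split`, the fixed seed, and the use of integers as stand-ins for records are all assumptions.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Integers stand in for the 11,162 dataset records (assumption for the demo).
rows = list(range(11162))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains the last rows of the file, which matters if the file is sorted by class.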
Preprocessing Steps
 Handle Missing Values and Remove Duplicate Values
   Missing values and duplicate values can badly affect the
prediction results.
   Missing data can take many forms, such as a missing sequence,
an incomplete feature, missing files, incomplete information,
or a data entry error.
   So missing values and duplicate values need to be handled
before any further processing.
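A minimal sketch of this step, assuming each record is a Python dict and that a missing field shows up as `None` or an empty string. The function name and the sample rows are hypothetical, and dropping incomplete rows is only one possible policy (imputing a value is another).

```python
def drop_missing_and_duplicates(rows, missing_markers=(None, "")):
    """Keep the first copy of each row and drop rows with a missing field."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value in missing_markers for value in row.values()):
            continue  # at least one field is missing
        cleaned.append(row)
    return cleaned

# Made-up rows for illustration only.
rows = [
    {"age": 59, "job": "admin.", "balance": 2343},
    {"age": 59, "job": "admin.", "balance": 2343},     # duplicate
    {"age": 41, "job": "", "balance": 1270},           # missing job
    {"age": 55, "job": "services", "balance": None},   # missing balance
    {"age": 42, "job": "management", "balance": 0},    # zero is a valid value
]
cleaned = drop_missing_and_duplicates(rows)
print(len(cleaned))
```

If a dataset encodes missing categories with a placeholder word, that word can be added to `missing_markers`.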
Preprocessing Steps (Cont’d)
 Handle Outliers
   An outlier is a data point that is distant from other related
points.
   There are three basic ways to handle outliers.
     Remove all the outliers
     Replace outlier values with a suitable value
     Use the IQR: remove the data above and below the limits, or
replace them with the limit value.
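The IQR option can be sketched as below. The multiplier `k = 1.5` is the common convention and is an assumption here, as are the function names and the sample values; the slides do not say which limits the system uses.

```python
import statistics

def iqr_limits(values, k=1.5):
    """Return (low, high) limits at k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Drop every value outside the IQR limits."""
    low, high = iqr_limits(values, k)
    return [v for v in values if low <= v <= high]

def cap_outliers(values, k=1.5):
    """Replace every value outside the limits with the nearest limit."""
    low, high = iqr_limits(values, k)
    return [min(max(v, low), high) for v in values]

# Made-up balances with one obvious outlier.
balances = [10, 12, 13, 15, 16, 18, 20, 22, 500]
removed = remove_outliers(balances)
capped = cap_outliers(balances)
print(removed)
print(capped)
```

Removing shrinks the dataset, while capping keeps every row, so the choice depends on how many records can be spared.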

 Data Transformation
   Data transformation is the process of converting, cleansing,
and structuring data into a usable format that can be analyzed.
   Data transformation is used when data needs to be converted
to match the format of the destination system.
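For an SVM, the destination format is a numeric vector, so a typical transformation encodes categorical fields and rescales numeric ones. The sketch below shows three common choices; which of them the system applies is not stated in the slides, so the function names and the selection are assumptions.

```python
import statistics

def encode_yes_no(value):
    """Map a yes/no field to 1/0."""
    return 1 if value == "yes" else 0

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the matching category."""
    return [1 if value == c else 0 for c in categories]

def standardize(values):
    """Rescale values to zero mean and unit variance.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

marital = one_hot("married", ["single", "married", "divorced"])
housing = encode_yes_no("yes")
ages = standardize([20, 30, 40])
print(marital, housing, ages)
```

Standardizing matters for SVM because features with large ranges (such as balance) would otherwise dominate the distance to the hyperplane.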
Feature Engineering
 Principal Component Analysis (PCA)
   Principal Component Analysis (PCA) is used to reduce the
dimensionality of large data sets by transforming a large set
of variables into a smaller one that still contains most of the
information in the large set.
   In short, PCA reduces the number of variables in a data set
while preserving as much information as possible.
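To make the idea concrete, the sketch below finds the first principal component of a tiny two-variable data set and projects the rows onto it, reducing two variables to one. It uses power iteration on the covariance matrix, which is one simple way to obtain the leading component; the helper names and the data are assumptions, and a library implementation would normally be used in practice.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up data in which the second variable is roughly twice the first.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(rows)
pc1 = first_component(covariance(centered))
# Each row is now described by one number instead of two.
projected = [sum(a * b for a, b in zip(r, pc1)) for r in centered]
print(pc1, projected)
```

Because the two variables are almost perfectly correlated, one component keeps nearly all of the variance, which is the sense in which PCA "preserves information".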
Support Vector Machine (SVM)
 Support Vector Machine (SVM) is a supervised learning model,
with associated learning algorithms, that analyzes data for
classification and prediction.
 In SVM, each data item is represented as a point in
n-dimensional space.
 The objective of a support vector classifier is to define an
optimal hyperplane that separates the two classes.
 The optimal hyperplane is the one that separates the two
classes with the largest margin.
 SVM finds a special type of linear model called the maximum
margin hyperplane, which gives the maximum separation between
the decision classes.
Support Vector Machine (SVM) (Cont’d)
Fig-1: Linear SVM        Fig-2: Non-Linear SVM
Support Vector Machine (SVM) (Cont’d)
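 $w \cdot x + b = 0$ – for Decision boundary
 $w \cdot x + b = +1$ – for Class-1 boundary
 $w \cdot x + b = -1$ – for Class-2 boundary

1. Positive samples - $w \cdot x_i + b \ge +1$
2. Negative samples - $w \cdot x_i + b \le -1$

 Then, by introducing the variable $y_i$ as follows, we can
combine the two cases into a single condition.
$y_i = +1$ for positive samples and $y_i = -1$ for negative samples
$y_i (w \cdot x_i + b) \ge 1$

 Therefore, the support vectors are the points that satisfy
$y_i (w \cdot x_i + b) = 1$

 When the condition $y_i (w \cdot x_i + b) \ge 1$ is satisfied for
every sample, all the positive and negative data points lie on or
behind their boundary lines.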
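The condition on this slide can be checked numerically. The sketch below assumes the standard hard-margin form, in which every sample must satisfy y_i (w · x_i + b) ≥ 1 and support vectors satisfy it with equality. The weight vector, bias and points are hand-picked toy values, not results from the bank marketing data.

```python
def decision_value(w, b, x):
    """Return w . x + b for one point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margins(w, b, points, labels):
    """Return y_i * (w . x_i + b) for every sample."""
    return [y * decision_value(w, b, x) for x, y in zip(points, labels)]

def support_vector_indices(w, b, points, labels, tol=1e-9):
    """Indices of the samples that lie exactly on a class boundary."""
    return [i for i, m in enumerate(margins(w, b, points, labels))
            if abs(m - 1.0) <= tol]

# Hand-picked toy values (assumptions for illustration only).
w, b = [0.5, 0.5], -2.0
points = [[3.0, 3.0], [4.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
labels = [1, 1, -1, -1]

all_margins = margins(w, b, points, labels)
constraint_ok = all(m >= 1.0 for m in all_margins)
sv = support_vector_indices(w, b, points, labels)
print(all_margins, constraint_ok, sv)
```

Here the points (3, 3) and (1, 1) sit exactly on the two class boundaries, so they are the support vectors; the other two points lie further away and do not affect the hyperplane.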
Support Vector Machine (SVM) (Cont’d)
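 When applying SVM, we need the maximum width between the two
class boundaries.
Width = $(x_+ - x_-) \cdot \frac{w}{\|w\|}$
= $\frac{(1 - b) - (-1 - b)}{\|w\|}$, using $w \cdot x_+ = 1 - b$ for
positive examples and $w \cdot x_- = -1 - b$ for negative examples
on the boundaries
= $\frac{2}{\|w\|}$

 Maximizing the width means finding the hyperplane with the
maximum margin, which can be expressed as the following
optimization problem:
Minimize : $\frac{1}{2} \|w\|^2$
Subject to : $y_i (w \cdot x_i + b) \ge 1$, i = 1,2,…,n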
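As a rough illustration of how such a hyperplane can be found, the sketch below trains a linear SVM by sub-gradient descent on the soft-margin objective (λ/2)‖w‖² plus the mean hinge loss. This is a swapped-in simplification: the problem stated above is the hard-margin form, which is normally solved with a quadratic programming solver. The hyperparameters, function names and toy data are assumptions.

```python
import math

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=2000):
    """Sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(points), len(points[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # The point is inside the margin, so its hinge loss is active.
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def margin_width(w):
    """Width between the two class boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Linearly separable toy data (not taken from the bank dataset).
points = [[3.0, 3.0], [4.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(w, b, margin_width(w), predictions)
```

A smaller `lam` pushes the result closer to the hard-margin solution on separable data, at the cost of slower convergence.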
Dataset Description
 In this system, the bank marketing data are taken from
https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset.
 The dataset contains 11,162 rows, 17 attributes and two classes
(Yes = 5,289 rows and No = 5,873 rows) for predicting the deposit.
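Loading the data and counting the two classes could look like the sketch below. The three CSV rows are made up to mirror the 17-attribute layout and are not real records; the file name `bank.csv` mentioned in the comment is also an assumption.

```python
import csv
import io
from collections import Counter

# Three made-up rows in the dataset's column layout (illustrative values only).
SAMPLE_CSV = """age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,deposit
59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes
41,technician,single,tertiary,no,1270,yes,no,cellular,12,jun,389,2,-1,0,unknown,no
33,services,married,secondary,no,5,no,yes,cellular,20,aug,120,3,90,1,failure,no
"""

def load_rows(handle):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

# With the downloaded file this would be, for example:
#   with open("bank.csv", newline="") as f:   # hypothetical file name
#       rows = load_rows(f)
rows = load_rows(io.StringIO(SAMPLE_CSV))
class_counts = Counter(row["deposit"] for row in rows)
print(len(rows[0]), dict(class_counts))
```

Checking the class counts first is useful because a heavily imbalanced target would make accuracy alone a misleading measure.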
Dataset Description (Cont’d)

 Age = Age of the customer
 Job = Type of job
 Marital = Marital status
 Education = Level of education of the customer
 Default = Does the customer have credit in default?
 Balance = Bank balance
 Housing = Does the customer have a housing loan?
 Loan = Does the customer have a personal loan?
 Contact = The type of contact
 Day = Last contact day of the month
 Month = Last contact month
 Duration = Last contact duration in seconds
 Campaign = Number of contacts performed during this campaign
Dataset Description (Cont’d)

 Pdays = Number of days since the customer was last contacted in
a previous campaign
 Previous = Number of contacts performed before this campaign
 Poutcome = Outcome of the previous marketing campaign
 Deposit = Does the customer subscribe to the term deposit?
System Flow Diagram
[Figure: System Flow Diagram]
Performance Evaluation
 To evaluate the performance of the SVM classifier, accuracy,
precision, recall and F-measure are calculated from the
confusion matrix.
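The four measures follow directly from the confusion matrix counts, as the sketch below shows. Treating "yes" (the customer subscribes) as the positive class is an assumption, and the label lists are made up for the demonstration.

```python
def confusion_matrix(actual, predicted, positive="yes"):
    """Return (tp, fp, fn, tn) counts for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from the counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure

# Made-up labels for illustration only.
actual    = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "yes", "no",  "no", "no", "yes", "no", "yes"]
counts = confusion_matrix(actual, predicted)
scores = evaluate(*counts)
print(counts, scores)
```

Precision answers "of the customers predicted to subscribe, how many did", while recall answers "of the customers who subscribed, how many were found"; the F-measure is their harmonic mean.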
Conclusion
 Whether customers subscribe to term deposits depends on the
services offered by the bank.
 In this system, Support Vector Machine (SVM) is used to find
the main factor that influences customers’ decisions to
subscribe to a term deposit.
 SVM has highly competitive performance in real-world
applications.
 By using this system, bankers can estimate the rate of
customers’ subscription to the term deposit.
References
1. Karim Amzile, Rajaa Amzile : Using SVM for Smart Direct
Marketing (SDM): A case of predicting bank customers
interested in the Term Deposits (2021)
2. Mehmet Furkan Akça, Onur Sevli : Predicting acceptance of
the bank loan offers by using support vector machines (2022)
3. Han Gao, Pei Shan Fam, Heng Chin Low : A comparative
study between support vector machine and support vector data
description in bank telemarketing (2021)
4. Zakaria Jaadi : A step by step explanation of Principal
Component Analysis (PCA) (2022)
5. Nachev, T. Teodosiev : Using Support Vector Machines for
Direct Marketing Models (2015)
6. Jamiu Olalekan Oni : Exploratory Analysis of Bank Marketing
Campaign Using Machine Learning: Logistic Regression,
Support Vector Machine, and K-Nearest Neighbor (2020)
Thesis Time Schedule
Timeline : 2023 Jan, 2023 Feb, 2023 Apr, 2023 June
Milestones : First Seminar, Second Seminar, Third Seminar, Defense
Tasks : Paper Preparation, Thesis Book Preparation, Thesis Book
Submission

Thank You!!!
