University of Computer Studies, Yangon (UCSY)
Classification of Bank Marketing Data
Using Support Vector Machine
Presented by : Ma Ei Ei Khin
Roll No : 6CS-15 (BIS-9)
Batch : 24th Batch
Supervised by : Dr. Tin Tin Htar
Seminar : First Seminar
Date :
Outline
Abstract
Objectives
Introduction
Related Work
Steps of the System
Support Vector Machine (SVM)
Dataset Description
System Flow Diagram
Performance Evaluation
Conclusion
References
Abstract
The banking system plays an important role in the financial sector
all over the world.
The banking industry requires more accurate predictive modeling
systems for its services and products.
Bank workers can build such predictive models manually, but this
process takes a long time and many man-hours.
For these reasons, machine learning techniques are useful for
predicting outcomes from huge amounts of data.
Classification is one of the most important techniques for
analyzing existing data and predicting new data.
This system will implement the classification of bank marketing
data using a support vector machine (SVM) to predict whether or
not a customer will subscribe to the term deposit.
Objectives
To classify bank marketing data and predict subscription to the
term deposit using Support Vector Machine
To help banks identify the main factor that can increase
customers’ subscription to the term deposit
To evaluate the classification performance on bank marketing
data by using a confusion matrix
Introduction
A term deposit is a fixed-term investment in which money is
deposited into a bank account.
Deposits are a main source of funds for banks, and many banks
offer different types of accounts to attract customers willing to
deposit their funds.
A bank can increase the number of term deposit subscribers
through effective marketing.
A bank marketing campaign can be carried out in various ways,
using telephone, social media, email, short message services,
blogging, and others.
The purpose of a bank marketing campaign is to match the bank’s
products to the needs of the targeted customers.
Introduction (Cont’d)
By targeting a specific group of customers, banks can achieve
their organizational objective of increasing the number of
subscriptions to term deposits.
This system will predict whether customers will subscribe to the
term deposit, using bank marketing data with a Support Vector
Machine (SVM).
Related Work - 1
Title : Using SVM for Smart Direct Marketing (SDM): A case
of predicting bank customers interested in
the Term Deposits
Author : Karim Amzile, Rajaa Amzile (2021)
In this paper, the authors proposed predicting the behavior of
bank customers toward term deposits using a Support Vector
Machine.
The dataset was collected during a direct marketing campaign
and consisted of 1,572 rows in two categories: 671 rows
representing customers who were interested in the term deposit,
and the remainder representing customers who were not
interested.
The model achieved an accuracy of 93% when predicting customer
behavior.
The model is intended to minimize the cost and expense of a
marketing campaign for banks.
Related Work - 2
Title : Predicting acceptance of the bank loan offers
by using support vector machines
Author : Mehmet Furkan Akça, Onur Sevli (2022)
The purpose of this paper is to predict the acceptance of bank
loan offers using the Support Vector Machine.
The authors compared the prediction results of four SVM kernels:
the linear, polynomial, RBF, and sigmoid kernels.
The best result was obtained with the polynomial kernel (97.2%
accuracy) and the lowest with the sigmoid kernel (83.3%
accuracy).
Related Work - 3
Title : A comparative study between support vector machine
and support vector data description in
bank telemarketing
Author : Han Gao, Pei Shan Fam, Heng Chin Low (2021)
The authors compared the prediction performance of the Support
Vector Machine (SVM) and Support Vector Data Description (SVDD).
They used a bank telemarketing dataset collected from a
Portuguese banking institution to predict who would be likely to
buy the term deposit.
The accuracy values are 0.9867 and 0.8565 for the SVM and SVDD
models, respectively.
They concluded that the SVM model was more suitable than SVDD
for binary classification problems.
Steps of the System
Start and load data from the Bank Marketing Dataset.
Data Preprocessing Steps
Handle Missing Values and Remove Duplicate Values
Handle Outliers
Data Transformation
Split the data into training data (80%) and testing data (20%).
Apply feature engineering to the training data by using PCA.
Classify by using SVM.
Build a model.
Evaluate performance by using a confusion matrix.
End
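The 80% / 20% split in the steps above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name train_test_split, the fixed seed, and the use of row indices in place of real records are assumptions made for the example, not part of the system itself.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for the 11,162 dataset rows (row indices only, for illustration).
rows = list(range(11162))
train, test = train_test_split(rows)
print(len(train), len(test))  # prints: 8930 2232
```

Because 20% of 11,162 is not a whole number, the split is rounded to 2,232 testing rows and 8,930 training rows.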
Preprocessing Steps
Handle Missing Values and Remove Duplicate Values
Missing values and duplicate values can badly affect the
prediction results.
Missing data can take many forms, such as a missing sequence,
an incomplete feature, missing files, incomplete information,
or a data entry error.
Therefore, missing values and duplicate values must be handled
before any further processing.
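A minimal sketch of this step is given below, assuming each record is a dictionary and that an empty string or None marks a missing value. The function name clean_rows, the missing-value tokens, and the sample records are hypothetical; dropping incomplete rows is only one possible policy (filling in a substitute value is another).

```python
def clean_rows(rows, missing_tokens=("", None)):
    """Drop rows with any missing field, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value in missing_tokens for value in row.values()):
            continue  # skip rows that contain a missing field
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows already kept once
        seen.add(key)
        cleaned.append(row)
    return cleaned

# Made-up records for illustration only.
rows = [
    {"age": 35, "job": "admin.", "balance": 1200},
    {"age": 35, "job": "admin.", "balance": 1200},       # duplicate
    {"age": 41, "job": "", "balance": 300},              # missing job
    {"age": 29, "job": "technician", "balance": None},   # missing balance
    {"age": 52, "job": "services", "balance": 0},        # 0 is a valid value
]
cleaned = clean_rows(rows)
print(len(cleaned))  # prints: 2
```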
Preprocessing Steps (Cont’d)
Handle Outliers
An outlier is a data point that is distant from other related
points.
There are three basic ways to handle outliers.
Remove all the outliers
Replace outlier values with a suitable value
Use the interquartile range (IQR): remove the data above and
below the limits, or replace them with the limit value.
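The IQR approach can be sketched as follows. The limits are assumed here to be Q1 - 1.5 x IQR and Q3 + 1.5 x IQR, which is a common convention but is not stated in the slides; the function names and the sample balances are made up for illustration.

```python
import statistics

def iqr_limits(values, k=1.5):
    """Return (lower, upper) limits; k = 1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR limits."""
    low, high = iqr_limits(values, k)
    return [v for v in values if low <= v <= high]

def clip_outliers(values, k=1.5):
    """Replace values outside the limits with the nearest limit value."""
    low, high = iqr_limits(values, k)
    return [min(max(v, low), high) for v in values]

balances = [100, 120, 130, 150, 160, 180, 200, 5000]  # made-up values
print(iqr_limits(balances))       # prints: (41.25, 271.25)
print(remove_outliers(balances))  # 5000 is dropped
print(clip_outliers(balances))    # 5000 becomes 271.25
```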
Data Transformation
Data transformation is the process of converting, cleansing,
and structuring data into a usable format that can be analyzed
to support later processing.
Data transformation is used when data needs to be converted
to match the format of the destination system.
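The slides do not say which transformations are applied. As an assumption, the sketch below shows two that are commonly needed before SVM, which expects numeric inputs: one-hot encoding for a categorical attribute and min-max scaling for a numeric attribute. The function names and sample values are hypothetical.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]

# Made-up attribute values for illustration only.
marital = ["married", "single", "married", "divorced"]
age = [30, 45, 60, 30]

categories, encoded = one_hot(marital)
scaled_age = min_max_scale(age)
print(categories)  # prints: ['divorced', 'married', 'single']
print(encoded)     # prints: [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(scaled_age)  # prints: [0.0, 0.5, 1.0, 0.0]
```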
Feature Engineering
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is used to reduce the
dimensionality of large data sets by transforming a large set
of variables into a smaller one that still contains most of the
information in the large set.
In short, PCA reduces the number of variables in a data set
while preserving as much information as possible.
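To illustrate the idea, the sketch below reduces a small two-variable data set to one variable by projecting it onto its first principal component. It uses power iteration on the covariance matrix so that it needs only the standard library; this is an illustrative choice, and a real system would more likely rely on a numerical library. The function names and the sample data are assumptions.

```python
import math

def center(data):
    """Subtract the column mean from every column."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of cov, found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up two-variable sample; the two columns are strongly correlated.
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
centered = center(data)
pc1 = first_component(covariance(centered))
# Each row is now described by one number instead of two.
projected = [sum(x * p for x, p in zip(row, pc1)) for row in centered]
print([round(p, 4) for p in pc1])
```

In the system described here, the component directions would be computed from the training data only and then reused to project the testing data.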
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised learning model
with associated learning algorithms that analyze data for
classification and prediction.
In the SVM, each data item is represented as a point in
n-dimensional space.
The objective of a support vector classifier is to define an
optimal hyperplane that separates the two classes.
The optimal hyperplane is the one that separates the two classes
with the largest margin.
SVM is known as the algorithm that finds a special type of
linear model called the maximum margin hyperplane, which
gives the maximum separation between decision classes.
Support Vector Machine (SVM) (Cont’d)
Fig-1: Linear SVM
Fig-2: Non-Linear SVM
Support Vector Machine (SVM) (Cont’d)
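$w \cdot x + b = 0$ – for Decision boundary
$w \cdot x + b = 1$ – for Class-1 boundary
$w \cdot x + b = -1$ – for Class-2 boundary
Here $w$ is the normal vector of the hyperplane and $b$ is its offset.
1. Positive samples - $w \cdot x_i + b \geq 1$
2. Negative samples - $w \cdot x_i + b \leq -1$
Then, by introducing the variable $y_i$ as follows, we can combine
both cases into a single conditional statement.
$y_i = +1$ for positive samples and $y_i = -1$ for negative samples
$y_i (w \cdot x_i + b) - 1 \geq 0$
Therefore, the condition for the support vectors is
$y_i (w \cdot x_i + b) - 1 = 0$
When the inequality is satisfied, all the positive and negative data
points lie on or behind their boundary lines.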
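As a small numeric check, assume the usual hard-margin form in which every labeled point (x_i, y_i), with y_i equal to +1 or -1, must satisfy y_i (w . x_i + b) >= 1, and the support vectors are the points where equality holds. The hyperplane (w, b) and the four points below are made up for illustration.

```python
def margin_value(w, b, x, y):
    """Return y * (w . x + b) for one labeled point."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical separating hyperplane and toy points (labels are +1 / -1).
w, b = [1.0, 1.0], -3.0
points = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([1.0, 1.0], -1), ([0.0, 0.0], -1)]

values = [margin_value(w, b, x, y) for x, y in points]
all_satisfied = all(v >= 1 for v in values)
# Support vectors sit exactly on a class boundary: y * (w . x + b) = 1.
support_vectors = [x for (x, _), v in zip(points, values) if abs(v - 1) < 1e-9]

print(values)           # prints: [1.0, 3.0, 1.0, 3.0]
print(all_satisfied)    # prints: True
print(support_vectors)  # prints: [[2.0, 2.0], [1.0, 1.0]]
```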
Support Vector Machine (SVM) (Cont’d)
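When applying SVM, we want the maximum width between the
boundaries.
Width = $(x_+ - x_-) \cdot \frac{w}{\|w\|}$
= $\frac{(1 - b) - (-1 - b)}{\|w\|}$, using $w \cdot x_+ = 1 - b$ for
positive examples and $w \cdot x_- = -1 - b$ for negative examples
on the boundaries
= $\frac{2}{\|w\|}$
Maximizing the width means finding the hyperplane with the
maximum margin, which can be expressed as the optimization
problem:
Minimize : $\frac{1}{2} \|w\|^2$
Subject to : $y_i (w \cdot x_i + b) \geq 1$, i = 1,2,…,n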
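The constrained problem is normally solved with a quadratic-programming or dedicated SVM solver. As a rough illustration only, the sketch below swaps in a simpler technique: it minimizes the soft-margin objective (lam / 2) ||w||^2 plus the mean hinge loss by batch sub-gradient descent, which approximates the maximum-margin hyperplane on separable data. The function names, the hyperparameters lam, epochs, and lr, and the toy points are all assumptions.

```python
import math

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up, linearly separable toy data (labels are +1 / -1).
X = [[2.0, 2.0], [3.0, 3.0], [3.0, 2.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
width = 2 / math.sqrt(sum(wj * wj for wj in w))  # margin width 2 / ||w||
print([predict(w, b, x) for x in X], round(width, 2))
```

A smaller lam puts more weight on satisfying the margin constraints, which brings the result closer to the hard-margin solution.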
Dataset Description
In this system, the bank marketing data are extracted from
https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset.
The dataset contains 11,162 rows, 17 attributes, and two classes
(Yes = 5,289 rows and No = 5,873 rows) for predicting the deposit.
Dataset Description (Cont’d)
Age = Age of the customer
Job = Type of job
Marital = Marital status
Education = Level of education of the customer
Default = Does the customer have credit in default?
Balance = Bank balance
Housing = Does the customer have a housing loan?
Loan = Does the customer have a personal loan?
Contact = Type of contact communication
Day = Last contact day of the month
Month = Last contact month
Duration = Last contact duration in seconds
Campaign = Number of contacts performed during this campaign
Dataset Description (Cont’d)
Pdays = Number of days since the customer was last contacted in
a previous campaign
Previous = Number of contacts performed before this campaign
Poutcome = Outcome of the previous marketing campaign
Deposit = Has the customer subscribed to the term deposit?
System Flow Diagram
Performance Evaluation
To evaluate the performance of the SVM classifier, accuracy,
precision, recall, and F-measure are calculated from the
confusion matrix.
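These four measures can be derived from the confusion matrix counts as sketched below. The F-measure is assumed here to be the balanced F1 score, "yes" is assumed to be the positive class, and the label lists are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (TP, FP, FN, TN) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f1

# Made-up labels for illustration only.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

counts = confusion_counts(actual, predicted)
accuracy, precision, recall, f1 = scores(*counts)
print(counts)  # prints: (3, 1, 1, 3)
print(accuracy, precision, recall, f1)  # prints: 0.75 0.75 0.75 0.75
```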
Conclusion
Customers’ decisions to subscribe to term deposits depend on the
services offered by the bank.
In this system, Support Vector Machine (SVM) is used to find
the main factor that influences customers’ decisions to
subscribe to a term deposit in banks.
SVM has highly competitive performance in real-world
applications.
By using this system, bankers can estimate the rate of
customers’ subscription to the term deposit.
References
1. Karim Amzile, Rajaa Amzile : Using SVM for Smart Direct
Marketing (SDM): A case of predicting bank customers
interested in the Term Deposits (2021)
2. Mehmet Furkan Akça, Onur Sevli : Predicting acceptance of
the bank loan offers by using support vector machines (2022)
3. Han Gao, Pei Shan Fam, Heng Chin Low : A comparative
study between support vector machine and support vector data
description in bank telemarketing (2021)
4. Zakaria Jaadi : A step by step explanation of Principal
Component Analysis (PCA) (2022)
5. Nachev, T. Teodosiev : Using Support Vector Machines for
Direct Marketing Models (2015)
6. Jamiu Olalekan Oni : Exploratory Analysis of Bank Marketing
Campaign Using Machine Learning: Logistic Regression,
Support Vector Machine, and K-Nearest Neighbor (2020)
Thesis Time Schedule
[Timeline figure. Dates: 2023 Jan, 2023 Feb, 2023 Apr, 2023 June.
Milestones: First Seminar, Second Seminar, Third Seminar, Paper
Preparation, Thesis Book Preparation, Thesis Book Submission,
Defense]
Thank You!!!