Credit Card Fraud Detection Project Report

The document is a project report for a Bachelor of Technology degree in Information Technology, focusing on credit card fraud detection using machine learning techniques. It outlines the project's goals, methodology, and the significance of detecting fraudulent transactions to protect customers. The report includes various machine learning models such as KNN, SVM, and Logistic Regression, and emphasizes the importance of evaluating their effectiveness in fraud detection.

Smart Home Automation

A Project Report
Submitted in partial fulfilment of the requirement for the award of the
degree

OF

BACHELOR OF TECHNOLOGY
in
(Information Technology)

SUBMITTED BY

Siddhi Vinayak Singh (21359)


Rameshwar Pratap Singh (21349)

Under the Supervision of

Dr. Vineet Kumar Singh

(Department of Information Technology)

Institute of Engineering & Technology


Dr Rammanohar Lohia Avadh University Ayodhya, UP, INDIA
Session 2025
“Credit Card Fraud Detection
Using Machine Learning”
A Project Report
Submitted in partial fulfilment of the requirement for the
award of the degree

OF

BACHELOR OF TECHNOLOGY
in
(Information Technology)

SUBMITTED BY

Amit Kumar (20305)


Dushyant Chauhan (20319)
Mohd. Arif Khan (20325)

August 2024

Under the Supervision of

Dr. Vineet Kumar Singh


(Department of Information Technology)

Institute of Engineering & Technology


Dr Rammanohar Lohia Avadh University Ayodhya, UP, INDIA
Session 2024

DECLARATION

We declare that this written submission represents our work and ideas in our own
words, and where others' ideas or words have been included, we have adequately
cited and referenced the original sources. We also declare that we have adhered to
all principles of academic honesty and integrity and have not misrepresented,
fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the
University and can also evoke penal action from the sources which have not been
properly cited or from whom proper permission has not been taken when needed.
This project represents our own work, conducted under the guidance of
Dr. Vineet Kumar Singh (Assistant Professor, Department of Information
Technology).

Amit Kumar (20305)

Dushyant Chauhan (20319)

Mohd. Arif Khan (20325)



CERTIFICATE

This is to certify that this project entitled SMART HOME AUTOMATION, submitted
by Siddhi Vinayak Singh and Rameshwar Pratap Singh in partial fulfilment of the
requirement for the award of the Degree of Bachelor of Technology in Information
Technology of the Institute of Engineering and Technology, Dr Ram Manohar Lohia
Avadh University, Ayodhya, is a record of the students' own work carried out under
supervision and guidance. The project embodies the results of original work and
studies carried out by the students, and its contents do not form the basis for the
award of any other degree to the candidates or to anybody else.

Signature of Supervisor

Signature of Head of Department



ACKNOWLEDGEMENT

Before we present our work, we would like to gratefully acknowledge the contribution
of all those who helped in the work described in this project report. We would like
to thank all our team members for their hard work in building this project.

We gratefully acknowledge our HOD, Mr. Rajesh Kumar Singh, Head of the
Department of Information Technology, Institute of Engineering and Technology,
Ayodhya, for his unconditional support and encouragement to pursue our field of
interest, Information Technology.

He is the person who always encouraged us to develop an aptitude for endeavour.

We express our sincere and profound sense of gratitude to our respected supervisor,
Dr. Vineet Kumar Singh, Assistant Professor, Department of Information Technology,
Institute of Engineering and Technology, Ayodhya, for the expert guidance and
constant inspiration that paved the way for the successful completion of this
endeavour.

Approval Sheet

The project report entitled SMART HOME AUTOMATION by Siddhi Vinayak Singh
(21359) and Rameshwar Pratap Singh (22349) is approved for the degree of
Bachelor of Technology.

Signature Internal Examiner

Signature External Examiner

Supervisor

Head of Department

Date:

Place:

ABSTRACT

The purpose of this project is to detect fraudulent transactions made with credit
cards by the use of machine learning techniques, to stop fraudsters from making
unauthorized use of customers' accounts. Credit card fraud is growing rapidly
worldwide, which is the reason actions should be taken to stop fraudsters. Putting
a limit on those actions would have a positive impact on the customers, as their
money would be recovered and returned to their accounts, and they would not be
charged for items or services that they did not purchase, which is the main goal of
the project. Detection of the fraudulent transactions will be made using machine
learning techniques such as KNN, SVM and Logistic Regression; those models will
be applied to a credit card transaction dataset.

Keywords: Credit Card Fraud Detection, Fraud Detection, Fraudulent Transactions,
K-Nearest Neighbour, Support Vector Machine, Logistic Regression, Naïve Bayes.

TABLE OF CONTENTS

CONTENTS PAGE NO.

DECLARATION iii
CERTIFICATE iv
ACKNOWLEDGEMENT v
APPROVAL SHEET vi
ABSTRACT vii

CHAPTER 1 INTRODUCTION 1
1.1 INTRODUCTION 1
1.2 PROJECT GOALS 1
1.3 RESEARCH METHODOLOGY 2
1.3.1 CRISP-DM 2

CHAPTER 2 LITERATURE REVIEW 4
2.1 INTRODUCTION 4
2.2 LITERATURE REVIEW 5-10
2.3 LITERATURE REVIEW CONCLUSION 10-11

CHAPTER 3 PROJECT DESCRIPTION 12
3.1 INTRODUCTION 12
3.2 DATA SOURCE 12

CHAPTER 4 SYSTEM REQUIREMENTS AND SPECIFICATION 13
4.1 SYSTEM REQUIREMENT SPECIFICATION 13
4.2 HARDWARE SPECIFICATION 13
4.3 SOFTWARE SPECIFICATION 14
4.4 FUNCTIONAL REQUIREMENTS 14
4.5 NON-FUNCTIONAL REQUIREMENTS 14
4.6 PERFORMANCE REQUIREMENT 15

CHAPTER 5 DATA ANALYSIS 16
5.1 DATA PREPARATION 16
5.1.1 CORRELATION BETWEEN ATTRIBUTES "IMAGE FROM R" 17
5.1.2 ATTRIBUTE WITH THE MOST FRAUD 18
5.1.3 ATTRIBUTE WITH THE LEAST FRAUD 18-19
5.2 DATA PREPROCESSING 19
5.3 DATA MODELLING 19
5.3.1 KNN 19-21
5.3.2 NAÏVE BAYES 22
5.3.3 LOGISTIC REGRESSION 23
5.3.4 SUPPORT VECTOR MACHINE 24
5.4 EVALUATION AND DEPLOYMENT 24-26

CHAPTER 6 SYSTEM DESIGN 27
6.1 PROJECT MODULE 27-38

CHAPTER 7 IMPLEMENTATION 39
CODE 40-44

CHAPTER 8 TESTING 45-46

CHAPTER 9 ALGORITHM 47
9.1 RANDOM FOREST 47-49

CHAPTER 10 CONCLUSION 50

REFERENCES 51-52

CHAPTER: 01

INTRODUCTION

1.1 INTRODUCTION

With the increase in people using credit cards in their daily lives, credit card
companies should take special care with the security and safety of their customers.
According to Credit Card Statistics (2021), the number of people using credit cards
around the world was 2.8 billion in 2019; in addition, 70% of those users own at
least one card.

Reports of credit card fraud in the US rose by 44.7%, from 271,927 reports in 2019
to 393,207 reports in 2020. There are two kinds of credit card fraud: the first is
when an identity thief opens a credit card account under your name; reports of this
fraudulent behaviour increased 48% from 2019 to 2020. The second is when an
identity thief uses an existing account that you created, usually by stealing the
credit card information; reports of this type of fraud increased 9% from 2019 to
2020 (Daly, 2021). Those statistics caught my attention, as the numbers are
increasing drastically and rapidly over the years, which gave me the motive to try
to resolve the issue analytically by using different machine learning methods to
detect the fraudulent credit card transactions among numerous transactions.

1.2 PROJECT GOALS

The main aim of this project is the detection of fraudulent credit card
transactions, as it is important to identify the fraudulent transactions so that
customers are not charged for the purchase of products that they did not buy.
The detection of fraudulent credit card transactions will be performed with
multiple ML techniques, and a comparison will then be made between the outcomes
and results of each technique to find the best and most suited model for detecting
fraudulent credit card transactions; graphs and numbers will be provided as well.
In addition, previous literature and the different techniques used to distinguish
fraud within a dataset will be explored.

Research question: What is the most suited machine learning model in the
detection of fraudulent credit card transactions?

1.3 RESEARCH METHODOLOGY


1.3.1 CRISP-DM

I believe that taking the CRISP-DM route will ease obtaining efficient and
reliable results, as it takes the project through the whole journey: starting with
understanding the business and the data, preparing the data, then modelling it,
and finally evaluating the model to make sure it is performing well.

Phase 1: Business Understanding

As stated before, credit card fraud is increasing drastically every year; many
people face the problem of having their credit breached by fraudsters, which
impacts their daily lives, as paying with a credit card is similar to taking a
loan. If the problem is not solved, many people will accumulate loans that they
cannot pay back, which will make life hard for them, and they will not be able to
afford necessary products; in the long run, not being able to pay back the amount
might even lead to jail. Basically, the problem posed is the detection of
fraudulent credit card transactions made by fraudsters, to stop those breaches and
to ensure customers' security.

Business Objective: Identification of fraudulent transactions to prevent
deductions from affected customers' accounts.

Phase 2: Data Understanding

In the data understanding phase, it was critical to obtain a high-quality dataset,
as the model is based on it. The dataset was explored by taking a closer look into
it, which gave the knowledge needed to confirm its quality, in addition to reading
the description of the whole dataset and of each attribute. It is also important
to have a dataset that contains several mixed transaction types (fraudulent and
real), a class to clarify the type of each transaction and, finally, identifiers
to clarify the reason behind the classification of the transaction type. I made
sure to follow all of those points during the search for the most suited dataset.

Phase 3: Data Preparation

After choosing the most suited dataset, the preparation phase begins. Preparation
of the dataset includes selecting the wanted attributes or variables; cleaning it
by excluding null rows, deleting duplicated variables and treating outliers if
necessary; transforming data types to the wanted type; and, where needed, merging
two or more attributes. All those alterations lead to the wanted result, which is
to make the data ready to be modelled.

The dataset chosen for this project did not need to go through all of the
alterations mentioned above, as there were no missing or duplicated variables, and
no merging was needed either. However, some data types were changed in order to
create graphs, and the application Sublime Text was used to alter the data so that
it could be inserted into Weka for analysis.

Phase 4: Modelling

Four machine learning models were created in the modelling phase: KNN, SVM,
Logistic Regression and Naïve Bayes. A comparison of the results will be presented
later in the paper to determine which technique is most suited to detecting
fraudulent credit card transactions. The dataset is split in a ratio of 70:30,
where the training set is the 70% and the remaining 30% is the testing set. The
four models were created using Weka, and two of them, KNN and Naïve Bayes, were
also created in R. Visualizations will be provided from both tools.
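The 70:30 split described above can be sketched as follows. This is an illustrative Python example on a synthetic table of transactions; the row layout and the seed are assumptions for illustration, not taken from the report:

```python
import random

# Synthetic stand-in for the transaction dataset: each row is
# (transaction_id, is_fraud). The column layout is hypothetical.
rows = [(i, 1 if i % 50 == 0 else 0) for i in range(1000)]

random.seed(42)               # fixed seed so the split is reproducible
random.shuffle(rows)

cut = int(len(rows) * 0.70)   # 70% of the rows go to training
train, test = rows[:cut], rows[cut:]

print(len(train), len(test))  # 700 300
```

In practice a stratified split is often preferred for such imbalanced data, so that the rare fraud class appears in both partitions.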

Phase 5: Evaluation and Deployment

The final phase presents evaluations of the models and their efficiency: the
accuracies of the models are reported, along with any observations, in order to
find the best and most suited model for detecting fraudulent credit card
transactions.
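The accuracy figures reported in the evaluation phase come from comparing predicted labels against actual ones. A minimal sketch of that computation, with made-up label vectors (1 = fraud, 0 = not fraud):

```python
# Hypothetical predicted vs. actual labels, for illustration only.
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

# Entries of the 2x2 confusion matrix.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy = (tp + tn) / len(actual)
print(tp, tn, fp, fn, accuracy)  # 2 6 1 1 0.8
```

Note that on data as imbalanced as credit card transactions, accuracy alone can be misleading, which is why several of the reviewed papers also report sensitivity, precision and error rate.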

CHAPTER: 02

LITERATURE REVIEW

2.1 INTRODUCTION

It is essential for credit card companies to distinguish fraudulent credit card
transactions from non-fraudulent ones, so that their customers' accounts are not
affected and charged for products that the customers did not buy (Maniraj et al.,
2019). Many financial companies and institutions lose massive amounts of money
because of fraud, and fraudsters continuously seek different approaches to violate
the rules and commit illegal actions; therefore, fraud detection systems are
essential for all banks that issue credit cards, to decrease their losses
(Zareapoor et al., 2012). There are multiple methods used to detect fraudulent
behaviour, such as Neural Networks (NN), Decision Trees, K-Nearest Neighbour
algorithms and Support Vector Machines (SVM). Those ML methods can either be
applied independently or used collectively with the addition of ensemble or
meta-learning techniques to develop classifiers (Zareapoor et al., 2012).

2.2 LITERATURE REVIEW

Zareapoor and his research team used multiple techniques to determine the best
performing model for detecting fraudulent transactions, judged by the model's
accuracy, detection speed and cost. The models used were Neural Network, Bayesian
Network, SVM, KNN and more. The comparison table provided in the research paper
showed that the Bayesian Network was very fast at finding fraudulent transactions,
with high accuracy. The NN performed well too, as detection was fast, with medium
accuracy. KNN's speed was good with medium accuracy, and finally SVM scored one of
the lower marks, as its speed was low and its accuracy medium. As for cost, all
the models built were expensive (Zareapoor et al., 2012).

The model used by Alenzi and Aljehane to detect fraud in credit cards was Logistic
Regression; their model scored 97.2% accuracy, 97% sensitivity and a 2.8% error
rate. A comparison was performed between their model and two other classifiers,
the Voting Classifier and KNN. The VC scored 90% accuracy, 88% sensitivity and a
10% error rate; as for KNN with k = 1:10, the accuracy of the model was 93%, the
sensitivity 94% and the error rate 7% (Alenzi & Aljehane, 2020).

Maniraj's team built a model that can recognize whether any new transaction is
fraud or non-fraud; their goal was to reach 100% detection of fraudulent
transactions while trying to minimize the incorrectly classified fraud instances.
Their model performed well, as they were able to detect 99.7% of the fraudulent
transactions (Maniraj et al., 2019).

The classification approach used by Dheepa and Dhanapal was the behaviour-based
classification approach, using a Support Vector Machine, where the behavioural
patterns of the customers, such as the amount, date, time, place and frequency of
card usage, were analysed to distinguish credit card fraud. The accuracy achieved
by their approach was more than 80% (Dheepa & Dhanapal, 2012).

Malini and Pushpa proposed using KNN and outlier detection to identify credit card
fraud. After running their model over sampled data, the authors found that the
most suited method for detecting and determining target-instance anomalies is KNN,
which proved most suited to fraud detection under memory limitations. As for
outlier detection, the computation and memory required for credit card fraud
detection are much lower, and it works faster and better on large online datasets;
but their work and results showed that KNN was more accurate and efficient
(Malini & Pushpa, 2017).

Maes and his team proposed using Bayesian and Neural Networks for credit card
fraud detection. Their results showed that the Bayesian approach is 8% more
effective at detecting fraud than the ANN, meaning that in some cases the BBN
detects 8% more of the fraudulent transactions. In addition, the ANN's learning
time can go up to several hours, whereas the BBN takes only 20 minutes
(Maes et al., 2002).

Awoyemi's team compared three ML techniques for detecting credit card fraud: the
first is KNN, the second is Naïve Bayes and the third is Logistic Regression. They
sampled different distributions to view the various outcomes. The top accuracy for
the 10:90 distribution was Naïve Bayes with 97.5%, then KNN with 97.1%; Logistic
Regression performed poorly, with an accuracy of 36.4%. Another distribution that
was examined is 34:66, where KNN topped the chart with a slight increase in
accuracy to 97.9%, then Naïve Bayes with 97.6%; Logistic Regression performed
better in this distribution, as its accuracy rose to 54.8% (Awoyemi et al., 2017).

Jain's team used several ML techniques to distinguish credit card fraud, three of
them being SVM, ANN and KNN. To compare the outcome of each model, they calculated
the true positives (TP), false negatives (FN), false positives (FP) and true
negatives (TN) generated. The ANN scored 99.71% accuracy, 99.68% precision and a
0.12% false alarm rate; the SVM's accuracy was 94.65%, with 85.45% precision and a
5.2% false alarm rate; and finally, the accuracy of KNN was 97.15%, its precision
96.84% and its false alarm rate 2.88% (Jain et al., 2019).

Gupta's team worked on implementing an automated model that uses various ML
techniques to detect fraudulent instances related to users economically,
specializing in credit card transactions. According to Gupta and his team, out of
all the techniques they used, Naïve Bayes had an outstanding performance in
distinguishing fraudulent transactions, with an accuracy of 80.4% and an area
under the curve of 96.3% (Gupta et al., 2021).

Adepoju and his team used all of the ML methods that are used in this paper:
Logistic Regression, Support Vector Machine (SVM), Naïve Bayes and K-Nearest
Neighbour (KNN); those methods were applied to distorted credit card fraud data.
The accuracies scored were 99.07% for Logistic Regression, 95.98% for Naïve
Bayes, 96.91% for K-Nearest Neighbour and 97.53% for the last model, the Support
Vector Machine (Adepoju et al., 2019).

Safa and Ganga investigated how well Logistic Regression, K-Nearest Neighbour
(KNN) and Naïve Bayes work on an exceptionally distorted credit card dataset; they
implemented their work in Python, where the best method was selected through
evaluation. The accuracy results of their models were 83% for Naïve Bayes, 97.69%
for Logistic Regression and, in last place, 54.86% for K-Nearest Neighbour
(Safa & Ganga, 2019).

Varmedja's team used multiple machine learning algorithms in their paper, such as
Logistic Regression, Multilayer Perceptron, Random Forest and Naïve Bayes. As the
dataset was very unbalanced, Varmedja and his team used the SMOTE technique for
oversampling and feature selection, in addition to sectioning the data into a
training section and a testing section. The best scoring model in the experiment
was Random Forest with 99.96%; with little difference, the model in second place
was the Multilayer Perceptron with 99.93%, in third place Naïve Bayes with 99.23%
and in last place Logistic Regression with 97.46% (Varmedja et al., 2019).

Sailusha's team introduced a system to detect fraudulent credit card activities.
The algorithms used in their model are AdaBoost and Random Forest; Random Forest
scored an accuracy of 93.99%, while the accuracy of AdaBoost was 99.90%, showing
that it did better than Random Forest in terms of accuracy (Sailusha et al.).

The paper by Kiran and his team presents a Naïve Bayes (NB) improved K-Nearest
Neighbour (KNN) method for credit card fraud detection, NBKNN in short. The
outcome of the experiment illustrates the difference in performance of each
classifier on the same dataset: Naïve Bayes performed better than K-Nearest
Neighbour, as it scored an accuracy of 95% while KNN scored 90%
(Kiran et al., 2018).

Najadat and his team's approach to detecting fraudulent transactions is
BiLSTM-MaxPooling-BiGRU-MaxPooling, which is built upon bidirectional long
short-term memory (BiLSTM) and the bidirectional gated recurrent unit (BiGRU). In
addition, the group decided to use six ML classifiers: Voting, AdaBoost, Random
Forest, Decision Tree, Naïve Bayes and Logistic Regression. K-Nearest Neighbour
scored an accuracy of 99.13%, Logistic Regression scored 96.27%, Decision Tree
scored 96.40% and Naïve Bayes scored 96.98% (Najadat et al., 2020).

The paper by Saheed and his group focuses on credit card fraud detection using a
Genetic Algorithm (GA) as a feature selection technique. In feature selection the
data is split into two parts, first-priority features and second-priority
features, and the ML techniques the group used are Naïve Bayes (NB), Random Forest
(RF) and Support Vector Machine (SVM). Naïve Bayes scored 94.3%, SVM scored 96.3%
and Random Forest scored 96.40%, the highest accuracy (Saheed et al., 2020).

The work of Itoo and his group uses three different ML methods: the first is
Logistic Regression, the second is Naïve Bayes and the last is K-Nearest
Neighbours. Itoo and his group recorded the work and a comparative analysis; their
work was implemented in Python. Logistic Regression's accuracy is 91.2%, Naïve
Bayes' accuracy is 85.4% and K-Nearest Neighbour comes last with an accuracy of
66.9% (Itoo et al., 2020).

Tanouz's team proposed working with various ML-based classification algorithms,
like Naïve Bayes, Logistic Regression, Random Forest and Decision Tree, for
handling strongly imbalanced datasets; in addition, their research includes the
calculation of five measures: accuracy, precision, recall, the confusion matrix
and the ROC-AUC score. Both Logistic Regression and Naïve Bayes scored 95.16%,
Random Forest scored 96.77% and the last model, Decision Tree, scored 91.12%
(Tanouz et al., 2021).

Dighe and his team used KNN, Naïve Bayes, Logistic Regression, Neural Networks,
Multilayer Perceptron and Decision Tree in their work, then evaluated the results
in terms of numerous accuracy metrics. Out of all the models created, the best
performing one was KNN, which scored 99.13%; in second place was Naïve Bayes,
which scored 96.98%; the third best performing model scored 96.40%; and in last
place was Logistic Regression with 96.27% (Dighe et al., 2018).

The paper by Bhanusri and his team implemented multiple ML techniques on an
unbalanced dataset. The ML methods used are Logistic Regression, Naïve Bayes and
Random Forest, to explain the relation between fraud and credit cards. The
conclusion of their project presents the best classifier from training and testing
the supervised techniques in their work: the Logistic Regression model scored
99.8% accuracy, Random Forest scored 100% and Naïve Bayes scored 90.8%.

Sahin and Duman used four Support Vector Machine methods for detecting credit card
fraud: SVM with RBF, Polynomial, Sigmoid and Linear kernels. All models scored
99.87% on the training set and 83.02% on the testing part of the model
(Sahin & Duman, 2011).

2.3 Literature Review Conclusion

Throughout the search I found many models created by other researchers, which
shows that people have been trying to solve the credit card fraud problem. I found
that Najadat's team used an approach built upon bidirectional long short-term
memory for their model, while other researchers tried different data splitting
ratios to generate different accuracies. Sahin and Duman used different Support
Vector Machine methods, namely SVM with RBF, Polynomial, Sigmoid and Linear
kernels.

The lowest accuracies of the four models that will be studied in this research are
54.86% for KNN and 36.40% for Logistic Regression, both scored by Awoyemi and his
team; for Naïve Bayes the lowest accuracy, 80.4%, was scored by Gupta and his
team; and finally, the lowest SVM score was 94.65%, scored by Jain's team. To
determine the best of the four models that will be studied in this research, the
average of the best three accuracies of each model was calculated: the average
accuracy of KNN is 98.72%, the average for Logistic Regression is 98.11%, 98.85%
for Naïve Bayes and 96.16% for Support Vector Machine. So, by these averages, the
best performing credit card fraud detection model within the literature review is
the Naïve Bayes model.

CHAPTER: 03

Project Description

3.1 Introduction

In order to accomplish the objective and goal of the project, which is to find the
most suited model for detecting credit card fraud, several steps need to be taken.
Finding the most suited data and preparing/preprocessing it are the first and
second steps; after making sure that the data is ready, the modelling phase
starts, where four models are created: K-Nearest Neighbour (KNN), Naïve Bayes,
SVM and, lastly, Logistic Regression. In the KNN model two values of K were
chosen, K=3 and K=7. All models were created in both R and Weka except SVM, which
was created in Weka only; in addition, visualizations are taken from both
applications.

3.2 Data Source

The dataset was retrieved from an open-source website, [Link]. It contains data on
transactions that were made in 2013 by credit card users in Europe, over two days
only. The dataset consists of 31 attributes and 284,808 rows. Twenty-eight
attributes are numeric variables that, due to the confidentiality and privacy of
the customers, have been transformed using a PCA transformation; the three
remaining attributes are "Time", which contains the seconds elapsed between the
first transaction and each other transaction, "Amount", which is the amount of
each transaction, and the final attribute "Class", which contains binary values
where "1" is a case of fraudulent transaction and "0" is not a case of fraudulent
transaction.

Dataset Link: [Link]
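A minimal sketch of loading and inspecting a dataset with this schema using pandas. The tiny inline sample below is invented for illustration (the real file would be read from disk), and the file name `creditcard.csv` is an assumption, not taken from the report:

```python
import io
import pandas as pd

# Tiny inline sample mimicking the dataset's schema (Time, the
# PCA components V1..V28 abbreviated to two, Amount, Class);
# all values here are invented.
csv = io.StringIO(
    "Time,V1,V2,Amount,Class\n"
    "0,-1.36,0.07,149.62,0\n"
    "1,1.19,0.27,2.69,0\n"
    "406,-2.31,1.95,0.00,1\n"
)
df = pd.read_csv(csv)

# With the real file this would be: df = pd.read_csv("creditcard.csv")
print(df.shape)          # (3, 5)
print(int(df["Class"].sum()))  # number of fraud rows in the sample: 1
```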



CHAPTER: 04
SYSTEM REQUIREMENTS AND SPECIFICATION

4.1 System Requirement Specification


A System Requirements Specification (SRS) is a fundamental document that forms
the foundation of the software development process. The SRS document describes
all data, functional and behavioural requirements of the software under
production or development. An SRS is basically an organization's written
understanding of a customer's or potential client's system requirements and
dependencies at a particular point in time, usually prior to any actual design or
development work. It is a two-way insurance policy that assures that both the
client and the organization understand the other's requirements from that
perspective at a given point in time. The SRS also functions as a blueprint for
completing a project with as little cost growth as possible. The SRS is often
referred to as the "parent" document because all subsequent project management
documents, such as design specifications, statements of work, software
architecture specifications, testing and validation plans, and documentation
plans, are related to it. It is important to note that an SRS contains functional
and non-functional requirements only; it does not offer design suggestions,
possible solutions to technology or business issues, or any information other
than the development team's understanding of the customer's system requirements.

4.2 Hardware specification

➢ RAM: 4 GB or higher

➢ Processor: Intel i3 or above

➢ Hard Disk: 500 GB minimum



4.3 Software specification

➢ OS: Windows or Linux

➢ Python IDE: Python 2.7.x or above

➢ Jupyter Notebook

➢ Language: Python

4.4 Functional Requirements

A functional requirement defines a function of a software system and how the
system must behave when presented with specific inputs or conditions. These may
include calculations, data manipulation and processing, and other specific
functionality. In this system the following are the functional requirements:

• Collect the Datasets.

• Train the Model.

• Predict the results.

4.5 Non-Functional Requirements


• The system should be easy to maintain.

• The system should be compatible with different platforms.

• The system should be fast as customers always need speed.

• The system should be accessible to online users.

• The system should be easy to learn by both sophisticated and novice


users.

• The system should provide easy, navigable and user-friendly


interfaces.

• The system should produce reports in different forms, such as tables and
graphs, for easy visualization by management.
• The system should have a standard graphical user interface that allows for
online use.

4.6 Performance Requirement

Performance is measured in terms of the output provided by the application.
Requirement specification plays an important part in the analysis of a system:
only when the requirement specifications are properly given is it possible to
design a system that will fit into the required environment. It rests largely with
the users of the existing system to give the requirement specifications, because
they are the people who will finally use the system. The requirements have to be
known during the initial stages so that the system can be designed according to
them; it is very difficult to change the system once it has been designed, while
on the other hand a system that does not cater to the requirements of the user is
of no use.

CHAPTER: 05

DATA ANALYSIS

5.1 Data Preparation

The first figure below shows the structure of the dataset, where all attributes
are shown with their types, in addition to a glimpse of the variables within each
attribute. As shown at the end of the figure, the Class type is integer, which I
needed to change to a factor, identifying 0 as Not Fraud and 1 as Fraud, to ease
the process of creating the model and obtaining visualizations.

The second figure shows the distribution of the class: the red bar, which contains
284,315 variables, represents the non-fraudulent transactions, and the blue bar,
with 492 variables, represents the fraudulent transactions.
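The integer-to-factor conversion and the class count described above were done in R in the report; a rough pandas equivalent, on a hypothetical miniature of the Class column, would look like this:

```python
import pandas as pd

# Hypothetical miniature of the Class column; in the real data there
# are 284,315 zeros (Not Fraud) and 492 ones (Fraud).
df = pd.DataFrame({"Class": [0, 0, 1, 0, 1]})

# Map the integer class to readable labels, mirroring the
# integer-to-factor conversion done in R.
df["Class"] = df["Class"].map({0: "Not Fraud", 1: "Fraud"}).astype("category")

print(df["Class"].value_counts().to_dict())  # {'Not Fraud': 3, 'Fraud': 2}
```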

5.1.1 Correlation between attributes “Image from R”

The correlations between all of the attributes within the dataset are presented in
the figure below.

Figure 3 - Correlations
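The report produced its correlation figure in R; a hedged sketch of how such a correlation matrix can be computed with pandas, on three synthetic columns standing in for the dataset's attributes:

```python
import pandas as pd

# Three synthetic numeric attributes standing in for V1..V28/Amount.
df = pd.DataFrame({
    "V1": [1.0, 2.0, 3.0, 4.0],
    "V2": [2.0, 4.0, 6.0, 8.0],       # perfectly correlated with V1
    "Amount": [5.0, 3.0, 4.0, 1.0],
})

corr = df.corr()                      # Pearson correlation matrix
print(round(corr.loc["V1", "V2"], 2))  # 1.0
```

In the PCA-transformed dataset the V components are, by construction, largely uncorrelated with one another, which is what the figure illustrates.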

5.1.2 Attribute with the most fraud

Figure 4 below shows attribute 18, the attribute with the most credit card
fraudulent transactions; the blue line represents the variable 1, which is the
fraudulent transactions.

Figure 4 - Variable 18

5.1.3 Attribute with the least fraud

The figure below shows the variable that has the lowest number of fraudulent
transactions; as mentioned earlier, the blue line represents the fraudulent
instances within the data.

Figure - Variable 28

5.2 Data Preprocessing

As there are no NAs or duplicated records, the preparation of the dataset
was simple. The first alteration, made so the dataset could be opened in the
Weka program, was changing the type of the class attribute from Numeric to
Nominal and identifying the class as {1,0}, using the Sublime Text editor.
A similar alteration of the type was made in R to be able to create the
models and the visualizations.

5.3 Data modelling

After making sure that the data was ready to be modelled, the four models
were created using both Weka and R. The SVM model was created using Weka only;
KNN, Logistic Regression and Naïve Bayes were created using both R and Weka.

5.3.1 KNN

The K-Nearest Neighbor algorithm (KNN) is a supervised ML technique that
can be applied to both classification and regression problems (Mahesh, 2020).
To find the best KNN model, two values of K were used, K=3 and K=7; both are
presented with figures from both Weka and R.

• K=3

During the making of the KNN model, I decided to create two models, with
K=3 and K=7. Figure 5 shows the model created in R: it scored an accuracy
of 99.83% and managed to correctly identify 91,719 transactions while
missing 155. In the Weka program the model scored 99.94% accuracy and
misclassified 52 transactions.

As the accuracies differ, their average is 99.89%.

• K=7

There was a slight decrease in the accuracy of the model created in R (Figure
6), as it scored 99.82% when K is 7, and the model misclassified 166
fraudulent transactions as non-fraudulent. In Weka (Figure 7) the accuracy
is the same as for K=3, 99.94%, with 52 misclassified transactions; the only
difference is within the classifications. The average of the accuracies is 99.88%.
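The K=3 versus K=7 comparison above can be sketched with scikit-learn; the synthetic imbalanced data here is only an assumed stand-in for the real credit card dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic, heavily imbalanced data (≈95% legitimate) as a stand-in
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=1)

# Fit one KNN model per value of K and compare accuracies, as in the report
accs = {}
for k in (3, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accs[k] = accuracy_score(y_test, knn.predict(X_test))
    print(f"K={k}: accuracy={accs[k]:.4f}")
```

On the real dataset the two runs would reproduce the small K=3/K=7 gap discussed above.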

5.3.2 Naïve Bayes


Naïve Bayes is a classification algorithm that assumes the presence of a
certain feature within a class is unrelated to the presence of any other
feature; its main uses are clustering and classification, based on
conditional probability (Mahesh, 2020).

The second model created in R is Naïve Bayes. Figure 9 shows the
performance of the model: it scored an accuracy of 97.77% and misclassified
a total of 2,051 transactions, 33 fraudulent as non-fraudulent and 2,018
non-fraudulent as fraudulent. There is a slight difference in the accuracy
of the Naïve Bayes model created within Weka, as it is 97.73% with 1,938
misclassified instances.
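The per-model misclassification counts reported here can be reproduced in a sketch with scikit-learn's Gaussian Naïve Bayes; the synthetic data is an assumed stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Synthetic imbalanced data as a stand-in for the credit card dataset
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=1)

nb = GaussianNB().fit(X_train, y_train)
# The report counts misclassifications per model: the off-diagonal cells
tn, fp, fn, tp = confusion_matrix(y_test, nb.predict(X_test)).ravel()
print("misclassified:", fp + fn)
```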

5.3.3 Logistic Regression

A Logistic Regression model is a statistical model that evaluates the
connection between a qualitative dependent variable, either binary (binary
or binomial logistic regression) or with three or more values (multinomial
logistic regression), and one or more independent explanatory variables,
whether qualitative or quantitative (Domínguez-Almendros et al., 2011).

The last model created using both R and Weka is Logistic Regression. The
model managed to score an accuracy of 99.92% in R (figure 11) with 70
misclassified instances, while it scored 99.91% in Weka with 77 misclassified
instances, as presented in figure 10.
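A minimal binary logistic regression of the kind described above, sketched with scikit-learn on assumed synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic imbalanced data as a stand-in for the credit card dataset
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=1)

# Binary logistic regression: one {0,1} dependent variable,
# several explanatory variables
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, lr.predict(X_test))
print(f"accuracy: {acc:.4f}")
```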


5.3.4 Support Vector Machine


Support Vector Machine is a supervised ML technique with associated learning
algorithms that inspect data used for both classification and regression
analyses. It performs linear classification, and additionally non-linear
classification, by creating margins between the classes; the margins are
created in such a fashion that the space between the margin and the classes
is maximal, which minimizes the classification error (Mahesh, 2020).

Finally, the Support Vector Machine model, as shown in figure 12, managed to
score 99.94% accuracy and misclassified 51 instances.

5.4 Evaluation and Deployment

The last stage of the CRISP-DM model is the evaluation and deployment
stage. As presented in Table 2 below, all models are compared to each
other to find the best model for identifying fraudulent credit card
transactions.

Accuracy is the overall proportion of instances that are predicted
correctly. Results are represented by a confusion matrix, which shows
the True Positives (TP), True Negatives (TN), False Positives (FP) and
False Negatives (FN). A True Positive represents a fraudulent transaction
that was correctly classified by the model as fraudulent. A True Negative
represents a non-fraudulent transaction that was correctly predicted by
the model as not fraudulent. The third category is the False Positive,
which represents a non-fraudulent transaction that was misclassified as
fraudulent. Finally, a False Negative is a fraudulent transaction that
was misclassified as not fraudulent. Table 1 below shows the confusion
matrix.

Actual/Predicted Positive Negative

Positive TP FN

Negative FP TN

Table 1 – Confusion Matrix

The table above shows all the components needed to calculate the accuracy of
a model, which is given by the equation below.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
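The accuracy formula can be checked numerically; the confusion-matrix counts below are illustrative, not the report's actual results:

```python
# Illustrative confusion-matrix counts (not the report's results)
tp, tn, fp, fn = 450, 90_000, 30, 42

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy = {accuracy:.4f}")
```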

Table 2 – Table of Accuracies

Table 2 shows the accuracies of all the models created in the project. All
models performed well in detecting fraudulent transactions and managed to
score high accuracies. Out of all the models, the one that scored best is
Support Vector Machine, with an accuracy of 99.94%; the second best is
Logistic Regression; in third place is KNN, as both values of K scored
similar accuracies; and the model with the lowest accuracy is Naïve Bayes,
with a score of 97.76%.

CHAPTER: 06

SYSTEM DESIGN

6.1 Project Modules

The entire project is divided into 3 modules, as follows:

1. Data gathering and pre-processing
2. Training the model using the following machine learning algorithms:
   1. SVM
   2. Random Forest Classifier
   3. Decision Tree
3. Final prediction model integrated with the front end

Module 1: Data Gathering and Data Pre processing

• A proper dataset is searched for among the various available ones and
finalized.
• The dataset must be preprocessed to train the model.
• In the preprocessing phase, the dataset is cleaned: redundant values,
noisy data and null values are removed.
• The preprocessed data is provided as input to the next module.

Module 2: Training the model

• The preprocessed data is split into training and testing datasets in an
80:20 ratio to avoid the problems of over-fitting and under-fitting.
• A model is trained on the training dataset with each of the following
algorithms: SVM, Random Forest Classifier and Decision Tree.
• The trained models are evaluated on the testing data and the results are
visualized using bar graphs and scatter plots.
• The accuracy of each algorithm is calculated using different metrics
such as F1 score, precision and recall. The results are then displayed
using various data visualization tools for analysis purposes.

• The algorithm which provides the better accuracy rate compared to the
remaining algorithms is taken as the final prediction model.

Module 3: Final Prediction model integrated with front end

• The algorithm which provided the better accuracy rate is considered as
the final prediction model.

• The model thus made is integrated with the front end.

• A database is connected to the front end to store the information of the
users who are using it.

SYSTEM ARCHITECTURE

The main purpose of our project is to protect people from online credit
card fraud, which is why a credit card fraud detection system is necessary
to keep transactions safe and secure. With this system, fraudsters do not
have the chance to make multiple transactions on a stolen or counterfeit
card before the cardholder becomes aware of the fraudulent activity. The
trained model is used to identify whether a new transaction is fraudulent
or not. Our aim is to detect 100% of the fraudulent transactions while
minimizing incorrect fraud classifications.

Fig 5.1 System Architecture

Activity diagram

An activity diagram is an important diagram in UML for describing the
dynamic aspects of a system. An activity diagram is basically a flowchart
representing the flow from one activity to another; an activity can be
described as an operation of the system. The control flow is drawn from one
operation to another, and this flow can be sequential, branched, or
concurrent. Activity diagrams deal with all types of flow control using
different elements such as fork, join, etc. The basic purpose of an activity
diagram is to capture the dynamic behavior of the system and to show the
message flow from one activity to another, where an activity is a particular
operation of the system. Activity diagrams are not only used for visualizing
the dynamic nature of a system; they are also used to construct the
executable system by using forward and reverse engineering. The only thing
missing in an activity diagram is the message part.

Fig 5.2 Activity Diagram


Use case diagram

In UML, use-case diagrams model the behavior of a system and help to
capture the requirements of the system. Use-case diagrams describe the
high-level functions and scope of a system. These diagrams also identify
the interactions between the system and its actors. The use cases and
actors in use-case diagrams describe what the system does and how the
actors use it, but not how the system operates internally. Use-case
diagrams illustrate and define the context and requirements of either an
entire system or the important parts of the system. You can model a complex
system with a single use-case diagram, or create many use-case diagrams to
model the components of the system. You would typically develop use-case
diagrams in the early phases of a project and refer to them throughout the
development process.

Fig 5.3 Use case Diagram

Sequence Diagram

The sequence diagram represents the flow of messages in the system and is
also termed an event diagram. It helps in envisioning several dynamic
scenarios. It portrays the communication between any two lifelines as a
time-ordered sequence of events, such that these lifelines take part at
run time. In UML, a lifeline is represented by a vertical dashed line,
whereas a message flow is represented by a horizontal arrow between the
lifelines. The diagram incorporates iterations as well as branching.

Fig 5.4 Sequence Diagram

Data Flow Diagram

A Data Flow Diagram (DFD) is a traditional visual representation of the
information flows within a system. A neat and clear DFD can depict the right
amount of the system requirement graphically. It can be manual, automated,
or a combination of both. It shows how data enters and leaves the system,
what changes the information, and where data is stored. The objective of a
DFD is to show the scope and boundaries of a system as a whole. It may be
used as a communication tool between a system analyst and any person who
plays a part in the process, and it acts as a starting point for redesigning
a system. The DFD is also called a data flow graph or bubble chart.

Fig 5.5 Data Flow diagram

MODULES

• Data collection

• Data pre-processing

• Feature extraction

• Evaluation model

Data Collection

The data used in this project is a set of records collected from credit
card transactions. This step is concerned with selecting the subset of all
available data that you will be working with. ML problems start with data,
preferably lots of data (examples or observations), for which you already
know the target answer. Data for which you already know the target answer
is called labelled data.

Fig. 2: Importing Python packages for data exploration and preprocessing

Data pre-processing

Pre-processing consists of three important and common steps, as follows:

Formatting: the process of putting the data in a form suitable to work
with. The format of the data files should match the need; the most
recommended format is .csv.

Cleaning: data cleaning is a very important procedure on the path of data
science, as it constitutes the major part of the work. It includes removing
missing data, resolving complexities with naming categories, and so on. For
most data scientists, data cleaning makes up about 80% of the work.

Sampling: the technique of analysing subsets of a whole large dataset,
which can provide better results and help in understanding the behaviour
and pattern of the data in an integrated way.
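The sampling step can be sketched with pandas; the data frame, the 20% fraction, and stratifying on the `Class` column are illustrative assumptions:

```python
import pandas as pd

# Illustrative data: heavily imbalanced class column, like the fraud dataset
data = pd.DataFrame({"Class": [0] * 95 + [1] * 5, "Amount": range(100)})

# Stratified 20% sample: take the same fraction from each class, so the
# rare fraud cases are not lost when analysing a subset
sample = pd.concat([
    grp.sample(frac=0.2, random_state=1)
    for _, grp in data.groupby("Class")
])
print(sample["Class"].value_counts().to_dict())
```

Stratifying per class is one reasonable design choice here; a plain `data.sample(frac=0.2)` could easily miss all the fraudulent rows.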

Data Exploration

Fig. 3: Data exploration

Pre-processing with python commands

STEP 1:

Fig. 4: Pre-Processing

STEP 2:

Fig. 5: Preprocessing Step 2

STEP 3: Acquire training and testing datasets from the large dataset

Fig. 6: Training and testing data



Fig. 7: Process of training and testing data extraction

Data visualization

Data visualisation is the method of representing data in a graphical and
pictorial way; data scientists tell a story through the results they derive
from analysing and visualising the data. A widely used tool is Tableau,
which has many features for exploring data and producing useful results.

Feature extraction

Feature extraction is the process of studying the behavior and pattern of
the analysed data and deriving the features for further testing and
training. Finally, our models are trained using the classifier algorithms
on the labelled dataset gathered; the rest of the labelled data is used to
evaluate the models. Machine learning algorithms were used to classify the
pre-processed data, and the chosen classifier was Random Forest, an
algorithm that is very popular in classification tasks.

Evaluation model

Model evaluation is an essential part of the model development process. It
helps to find the best model to represent our data, and to estimate how well
the selected model will work in the future. Evaluating model performance
with the data used for training is not acceptable in data science, because
it can effortlessly generate over-optimistic and over-fitted models. To
avoid overfitting, evaluation methods such as hold-out and cross-validation
are used to assess model performance. The results are presented in
visualized form, as graphs of the classified data. Accuracy is defined as
the proportion of correct predictions on the test data; it can be calculated
easily by dividing the number of correct predictions by the total number of
predictions.
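The cross-validation idea mentioned above can be sketched with scikit-learn; the synthetic data and the choice of a decision tree are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data standing in for the real dataset
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

# 5-fold cross-validation: each fold is held out once for evaluation,
# so the score never comes from data the model was trained on
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=5)
print("mean accuracy:", round(scores.mean(), 3))
```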

CHAPTER: 07

IMPLEMENTATION

7.1 Algorithm

Step 1: Import the dataset.

Step 2: Convert the data into data-frame format.

Step 3: Perform random oversampling using the ROSE package.

Step 4: Decide the amount of data for training and testing.

Step 5: Use 80% of the data for training and the remaining data for testing.

Step 6: Feed the training dataset to the models.

Step 7: Choose an algorithm among the 3 different algorithms and create the
model.

Step 8: Make predictions on the test dataset for each algorithm.

Step 9: Calculate the accuracy of each algorithm.

Step 10: Compute the confusion matrix for each model.

Step 11: Compare the algorithms and find out the best one.
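ROSE (Step 3) is an R package; a hedged Python analogue that simply resamples minority-class rows with replacement until the classes balance, on an illustrative miniature data frame:

```python
import pandas as pd

# Illustrative imbalanced data frame: 8 legitimate rows, 2 fraudulent rows
data = pd.DataFrame({"Class": [0] * 8 + [1] * 2, "Amount": range(10)})

minority = data[data["Class"] == 1]
majority = data[data["Class"] == 0]

# Randomly resample the minority class with replacement up to majority size
oversampled = pd.concat([
    majority,
    minority.sample(n=len(majority), replace=True, random_state=1),
])
print(oversampled["Class"].value_counts().to_dict())
```

ROSE itself generates synthetic examples rather than exact duplicates; plain duplication is used here only to keep the sketch self-contained.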

CODE: Importing Libraries

import numpy as np
# to store and analyse data in dataframes
import pandas as pd
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# python modules for data normalization and splitting
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split

# python modules for creating, training and testing ml algorithms
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# python modules for creating, training and testing neural networks
import tensorflow as tf
from tensorflow.keras.models import load_model, Sequential
from tensorflow.keras.layers import Dropout, Dense

# evaluation
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, precision_score,
                             recall_score, f1_score, roc_auc_score)

Data Acquisition

data = pd.read_csv('creditcard.csv')  # dataset file name assumed
data

Data Analysis

data.shape
data.info()
data.describe()
sns.countplot(x='Class', data=data)
print("Fraud: ", data['Class'].value_counts() / data['Class'].count())
Fraud_class = pd.DataFrame({'Fraud': data['Class']})
Fraud_class.apply(pd.Series.value_counts).plot(kind='pie', subplots=True)
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
fraud.describe()  # summary of the fraudulent subset
plt.figure(figsize=(20, 20))
plt.title('Correlation Matrix', y=1.05, size=15)
sns.heatmap(data.astype(float).corr(), linewidths=0.1, vmax=1.0,
            square=True, linecolor='white', annot=True)

Data Normalization

rs = RobustScaler()
data['Amount'] = rs.fit_transform(data['Amount'].values.reshape(-1, 1))
data['Time'] = rs.fit_transform(data['Time'].values.reshape(-1, 1))
data

Considering input columns and the output column

X = data.drop(['Class'], axis=1)
Y = data['Class']

Data splitting

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=1)
X_train
X_test
Y_test

def evaluate(Y_test, Y_pred):
    print("Accuracy: ", accuracy_score(Y_test, Y_pred))
    print("Precision: ", precision_score(Y_test, Y_pred))
    print("Recall: ", recall_score(Y_test, Y_pred))
    print("F1-Score: ", f1_score(Y_test, Y_pred))
    print("AUC score: ", roc_auc_score(Y_test, Y_pred))
    print(classification_report(Y_test, Y_pred,
                                target_names=['Normal', 'Fraud']))
    conf_matrix = confusion_matrix(Y_test, Y_pred)
    plt.figure(figsize=(6, 6))
    sns.heatmap(conf_matrix, xticklabels=['Normal', 'Fraud'],
                yticklabels=['Normal', 'Fraud'], annot=True, fmt="d")
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

Creating algorithms, Training, Testing and Evaluating

# Creating Support Vector Classifier
svm = SVC()
# Training SVC
svm.fit(X_train, Y_train)
# Testing SVC
Y_pred_svm = svm.predict(X_test)
# Evaluating SVC (evaluate expects the true labels first)
evaluate(Y_test, Y_pred_svm)

# Random forest model creation
rfc = RandomForestClassifier()
# Training
rfc.fit(X_train, Y_train)
# Testing
Y_pred_rf = rfc.predict(X_test)
# Evaluation
evaluate(Y_test, Y_pred_rf)

# Decision tree model creation
dtc = DecisionTreeClassifier()
dtc.fit(X_train, Y_train)
# Predictions
Y_pred_dt_i = dtc.predict(X_test)
evaluate(Y_test, Y_pred_dt_i)

# Random forest with balanced class weights
rfb = RandomForestClassifier(class_weight='balanced')
rfb.fit(X_train, Y_train)
# Predictions
Y_pred_rf_b = rfb.predict(X_test)
evaluate(Y_test, Y_pred_rf_b)

CHAPTER: 08
TESTING
Testing is a process of executing a program with the intent of finding
errors. Testing presents an interesting anomaly for software engineering.
The goal of software testing is to convince the system developer and
customers that the software is good enough for operational use. Testing is
a process intended to build confidence in the software. It is a set of
activities that can be planned in advance and conducted systematically.
Software testing is often referred to as verification and validation.

8.1 Unit Testing

In this testing, we test each module individually and then integrate it
with the overall system. Unit testing focuses verification efforts on the
smallest unit of software design: the module. It is also known as module
testing. Each module of the system is tested separately. This testing is
carried out during the programming stage itself. In this testing step, each
module was found to be working satisfactorily with regard to its expected
output. There are also validation checks for some fields, which makes it
very easy to find and debug errors in the system.

8.2 Validation Testing


At the culmination of black box testing, the software is completely
assembled as a package, interfacing errors have been uncovered and
corrected, and a final series of software tests is performed. The output
displayed or generated by the system under consideration is tested by
asking the user about the required format. Here the output format
considered is the screen display. The output format on screen was found to
be correct, as the format was designed in the system design phase according
to user needs. For hard copy also, the output comes out as specified by the
user. Hence output testing did not result in any corrections to the system.

8.3 Functional Testing


Functional tests provide systematic demonstrations that the functions
tested are available as specified by the business and technical
requirements, system documentation, and user manuals. Functional testing is
centered on the following items:
Valid input: identified classes of valid input must be accepted.
Invalid input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on
requirements, key functions, or special test cases. Before functional
testing is complete, additional tests are identified and the effective
value of the current tests is determined.

8.4 Integration Testing


Data can be lost across an interface; one module can have an adverse effect
on another; and sub-functions, when combined, may not produce the desired
major function. Integration testing is the systematic testing for
uncovering errors within the interfaces. The testing was done with sample
data, and the developed system ran successfully on this sample data. The
need for integration testing is to find the overall system performance.

8.5 User acceptance testing

User acceptance testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system
meets the functional requirements. Some of my friends who tested this
module suggested that it was a really user-friendly application with good
processing speed.

CHAPTER: 09

ALGORITHM

9.1 Random Forest

Random forest is a supervised machine learning algorithm based on ensemble
learning. Ensemble learning is an approach where predictions are derived by
assembling or bagging different models, or the same model multiple times.
The random forest algorithm works in this way and uses multiple decision
trees, resulting in a forest of trees, hence the name "Random Forest". The
random forest algorithm can be used for both regression and classification
tasks.

9.1.1 Advantages of using random forest

• The random forest algorithm is not biased: it depends on multiple trees,
each trained separately on the data, so bias is reduced overall.
• It is a very stable algorithm. Even if a new data point is introduced
into the dataset, it does not affect the overall algorithm; it affects
only a single tree.
• It works well when one has both categorical and numerical features.
• The random forest algorithm also works well when the data possesses
missing values or has not been scaled properly.

Thus, using the random forest and decision tree algorithms, we have
extracted the accurate percentage of fraud detection from the given
dataset by studying its behavior. A confusion matrix is basically a
summary of prediction results, or a table used to describe the performance
of a classifier on a set of test data where the true values are known. It
provides a visualization of an algorithm's performance and allows easy
identification of classes, resulting in the computation of most performance
measures by giving insight not only into the errors being made by the
classification model but also into the types of errors being made. Training
and testing data are represented in a confusion matrix, which portrays:
• TP: True Positive denotes transactions subjected to fraud that were
accurately predicted as fraud.
• TN: True Negative denotes transactions not subjected to fraud that were
correctly predicted as not fraud.
• FP: False Positive denotes transactions predicted as fraud when there
was actually no fraud.
• FN: False Negative denotes transactions not predicted as fraud when
there actually was fraud.


Fig. 8 Confusion matrix for testing dataset



Fig. 9: Confusion matrix for testing dataset



Fig. 10: Accurate result extracted from the random forest classification and
regression model using decision tree.

CHAPTER: 10

CONCLUSION

10.1 Conclusion

In conclusion, the main objective of this project was to find the
best-suited model for credit card fraud detection among the machine
learning techniques chosen for the project. This objective was met by
building the four models and finding their accuracies: the best model in
terms of accuracy is Support Vector Machine, which scored 99.94% with only
51 misclassified instances. I believe that using the model will help
decrease the amount of credit card fraud and increase customer
satisfaction, as it will provide customers with a better experience in
addition to a feeling of security.

10.2 Recommendations

There are many ways to improve the model, such as applying it to different
datasets of various sizes or with different data types, changing the
data-splitting ratio, or viewing it from a different algorithmic
perspective. One example is merging telecom data to estimate people's
locations, giving better knowledge of the card owner's whereabouts while
his/her credit card is being used. This would ease detection: if the card
owner is in Dubai and a transaction on the card is made in Abu Dhabi, it
can easily be flagged as fraud.

REFERENCE

[1] [Link], [Link], [Link], "Web Service Mining and its Techniques in Web
Mining", IJAEGT, vol. 2, no. 1, pp. 385-389.

[2] F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud
Detection System", Journal of Engineering Science and Technology, vol. 6,
no. 3, pp. 311-322, 2019.

[3] G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, A. Riyaz, "A Machine
Learning Approach for Detection of Fraud based on SVM", International
Journal of Scientific Engineering and Technology, vol. 1, no. 3,
pp. 194-198, 2019, ISSN: 2277-1581.

[4] K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact
and detection techniques", International Journal of Computer Science and
Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2019, ISSN: 2277-5420.

[5] M. J. Islam, Q. M. J. Wu, M. Ahmadi, M. A. Sid-Ahmed, "Investigating
the Performance of Naive Bayes Classifiers and K-Nearest Neighbor
Classifiers", IEEE International Conference on Convergence Information
Technology, pp. 1541-1546, 2017.

[6] R. Wheeler, S. Aitken, "Multiple algorithms for fraud detection",
Knowledge-Based Systems, Elsevier, vol. 13, no. 2, pp. 93-99, 2018.

[7] S. Patil, H. Somavanshi, J. Gaikwad, A. Deshmane, R. Badgujar, "Credit
Card Fraud Detection Using Decision Tree Induction Algorithm",
International Journal of Computer

[8] S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud
detection using Bayesian and neural networks", Proceedings of the 1st
International NAISO Congress on Neuro Fuzzy Technologies, pp. 261-270,
2017.

[9] S. Bhattacharyya, S. Jha, K. Tharakunnel, J. [Link], "Data mining for
credit card fraud: A comparative study", Decision Support Systems, vol. 50,
no. 3, pp. 602-613, 2019.

[10] Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic
regression",
