
FRAUD DETECTION IN CREDIT CARD DATA USING MACHINE LEARNING

BASED SCHEME

A Main-Project Report submitted to


JNTUA, Ananthapuramu

In partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
In
(Information Technology)
By
CH.SAI JAYANTH(18KB1A1204)
C. UMA MAHESH(18KB1A1203)
P. SOMA SEKHAR REDDY(18KB1A1210)

Under the esteemed Guidance of


Mr. K. Penchalaiah
Assistant Professor, Department of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


N.B.K.R INSTITUTE OF SCIENCE & TECHNOLOGY
VIDYANAGAR – 524 413, NELLORE DIST, AP
(Autonomous)
MAY 2022

Website: www.nbkrist.org   Ph: 08624-228-247
Email: [email protected]   Fax: 08624-228-257

N.B.K.R. INSTITUTE OF SCIENCE & TECHNOLOGY


(Autonomous)
(Approved by AICTE: Accredited by NBA: Affiliated to JNTUA, Ananthapuramu)
An ISO 9001-2000 Certified Institution
Vidyanagar -524 413, Nellore District, Andhra Pradesh, India

BONAFIDE CERTIFICATE

This is to certify that the main project work entitled “FRAUD DETECTION IN CREDIT CARD DATA
USING MACHINE LEARNING BASED SCHEME” is a bonafide work done by CH. SAI
JAYANTH (18KB1A1204), C. UMA MAHESH (18KB1A1203), P. SOMA SEKHAR REDDY
(18KB1A1210) in the Department of Information Technology, N.B.K.R. Institute of Science &
Technology, Vidyanagar, and is submitted to JNTUA, Ananthapuramu in partial fulfillment of the requirements
for the award of the B.Tech degree in Information Technology. This work has been carried out under my supervision.

K. Penchalaiah Dr. A. Raja Sekhar Reddy


Assistant Professor                                                HOD
Department of CSE Department of CSE
N.B.K.R.I.S.T N.B.K.R.I.S.T

Submitted for the Viva-Voce Examination held on

Internal Examiner External Examiner

ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of a project would be incomplete
without mentioning the people who made it possible, whose constant guidance and encouragement crowned our efforts
with success.

We would like to express our profound sense of gratitude to our project guide Mr. K. Penchalaiah,
Assistant Professor, Department of Computer Science & Engineering, N.B.K.R.I.S.T (affiliated to
JNTUA, Ananthapuramu), Vidyanagar, for his masterful guidance and constant encouragement throughout
the project. We sincerely appreciate his suggestions and unmatched support, without which this work
would have been an unfulfilled dream.

We convey our special thanks to Dr. Y. VENKATARAMI REDDY, respected Chairman of
N.B.K.R. Institute of Science and Technology, for providing excellent infrastructure on our campus for the
completion of the project.

We convey our special thanks to Sri N. RAM KUMAR REDDY, respected Correspondent of
N.B.K.R. Institute of Science and Technology, for providing excellent infrastructure on our campus for the
completion of the project.

We are grateful to Dr. V. VIJAYA KUMAR REDDY, Director of N.B.K.R. Institute of Science
and Technology, for allowing us to avail all the facilities in the college.

We express our sincere gratitude to Dr. A. RAJA SEKHAR REDDY, Professor, Head of
Department, Computer Science & Engineering, for providing exceptional facilities for successful
completion of our project work.

We would like to convey our heartfelt thanks to the staff members, lab technicians, and our friends,
who extended their cooperation in making this project a successful one.

We would like to thank one and all who have helped us directly and indirectly to complete this
project successfully.

ABSTRACT

The development of communication technologies and e-commerce has made the credit card the most common
method of payment for both online and regular purchases, so security in this system is essential to
prevent fraudulent transactions. Fraudulent credit card transactions are increasing each year, and in this
direction researchers continue to explore novel techniques to detect and prevent such frauds. However, there is
always a need for techniques that detect these frauds precisely and efficiently. This report proposes a
scheme for detecting frauds in credit card data which uses a Neural Network (NN) based unsupervised learning
technique. The proposed method outperforms the existing approaches of Auto Encoder (AE), Logistic Regression
and K-Means clustering. The proposed NN based fraud detection method achieves 99.87% accuracy, whereas the
existing AE, Logistic Regression and K-Means methods give 92%, 97%, and 99.75% accuracy respectively.

TABLE OF CONTENTS
CHAPTER NO    CHAPTER NAME    PAGE NO

ACKNOWLEDGEMENT 3
ABSTRACT 4
1. INTRODUCTION 6
2. SURVEY OF THE LITERATURE 7 - 10
2.1 Introduction of Literature Survey
2.2 Existing system
2.3 Proposed system
2.4 Feasibility study
3. SYSTEM DESIGN 11 - 28
3.1 System Design
3.2 System Architecture
3.2.1 Description of Modules
3.3 Algorithms
3.4 UML Diagrams
3.5 System Requirements
3.5.1 Software requirements
3.5.2 Hardware requirements
4. IMPLEMENTATION 29 - 40
4.1 Test Plan
4.1.1 Test Procedure
4.1.2 Test Cases
4.2 Code
4.3 Input/Output
5. CONCLUSION AND FUTURE ENHANCEMENT 41
5.1 Conclusion
5.2 Future Enhancement
REFERENCES 42

CHAPTER-1
INTRODUCTION

Credit card fraud can be defined as the unauthorized use of a customer’s card data to make
purchases or to withdraw funds from the cardholder's account. The fraud starts when somebody
wrongfully acquires the number printed on the card or the essential records needed for the card to be
used. The owner of the card, the agent by whom the card is issued and even the guarantor of the card might not be
informed of the fraud until the account is used to make purchases. As shopping through internet-based
applications and paying bills online has come into practice, a physical card is no longer required to make
purchases.

Fraud detection in online shopping systems is a hot topic nowadays. Fraud investigators, banking systems,
and electronic payment systems such as PayPal must have efficient and sophisticated fraud detection systems to
prevent fraud activities that change rapidly. According to a CyberSource report from 2017, the fraud
loss by order channel, that is, the percentage of fraud loss, was 74 percent in merchants' web stores and 49 percent in
their mobile channels. Based on this information, the lesson is to detect anomalies across patterns of fraud
behavior that have changed relative to the past.

The rise of e-commerce business has resulted in steady growth in the usage of credit cards for online
transactions and purchases. With the rise in the usage of credit cards, the number of fraud cases has also
doubled. Credit card frauds are those committed with an intention to gain money in a deceptive manner
without the knowledge of the cardholder.

Motivation:
The main objective of this project is to use machine learning techniques for detecting frauds in credit card data,
using a Neural Network (NN) based unsupervised learning technique. Our proposed method outperforms the
existing approaches of Auto Encoder (AE), Logistic Regression and K-Means clustering.

• Our models can be used for detecting fraudulent transactions.

• They can be helpful to customers, protecting them from losing their money and information.

CHAPTER – 2

SURVEY OF LITERATURE

2.1 INTRODUCTION OF LITERATURE SURVEY

[1] L. Bhavya , V. Sasidhar Reddy , U. Anjali Mohan , S. Karishma, 2020, Credit Card Fraud
Detection using Classification, Unsupervised, Neural Networks Models, INTERNATIONAL
JOURNAL OF ENGINEERING RESEARCH & TECHNOLOGY (IJERT) Volume 09, Issue 04 (April
2020),

Nowadays online transactions have grown in large quantities. Among them, online credit card transactions
hold a huge share. Therefore, there is much need for credit card fraud detection applications in banks and
financial businesses. The purpose of credit card fraud may be to obtain goods without paying or to obtain
unauthorized funds from an account. With the demand for money, credit card fraud events have become common.
This results in a huge financial loss to the cardholder.

[2] Renjith, Shini. (2018). Detection of Fraudulent Sellers in Online Marketplaces using Support
Vector Machine Approach. International Journal of Engineering Trends and Technology. 57. 48-53.
10.14445/22315381/IJETT-V57P210.

The e-commerce share in global retail spend is showing a steady increase over the years, indicating an
evident shift of consumer attention from bricks and mortar to clicks in the retail sector. In recent years, online
marketplaces have become one of the key contributors to this growth. Fraudulent e-commerce buyers and
their transactions are being studied in detail, and multiple strategies to control and prevent them are discussed.
Another area of fraud in marketplaces is on the seller side and is called merchant fraud;
goods or services offered and sold at cheap rates but never shipped are a simple example of this type of fraud.
This paper attempts to suggest a framework to detect such fraudulent sellers with the help of machine
learning techniques.

[3] Saputra, Adi & Suharjito, Suharjito. (2019). Fraud Detection using Machine Learning in e-
Commerce. 10.14569/IJACSA.2019.0100943.

The growing volume of internet users is causing transactions on e-commerce to increase as well, and the
quantity of fraud in online transactions is increasing too. Fraud prevention in e-commerce should
be developed using machine learning; this work analyzes suitable machine learning algorithms, namely
Decision Tree, Naive Bayes, Random Forest, and Neural Network. Evaluation using a confusion matrix
shows that the neural network achieves the highest accuracy at 96 percent, random forest 95 percent,
Naïve Bayes 95 percent, and Decision Tree 91 percent. The Synthetic Minority Over-sampling
Technique (SMOTE) is able to increase the average F1-Score from 67.9 percent to 94.5 percent
and the average G-Mean from 73.5 percent to 84.6 percent.

[4] A. K. Rai and R. K. Dwivedi, "Fraud Detection in Credit Card Data using Unsupervised Machine
Learning Based Scheme," 2020 International Conference on Electronics and Sustainable
Communication Systems (ICESC), Coimbatore, India, 2020, pp. 421-426, doi:
10.1109/ICESC48915.2020.9155615.

The development of communication technologies and e-commerce has made the credit card the most
common method of payment for both online and regular purchases, so security in this system is essential
to prevent fraudulent transactions. Fraudulent credit card transactions are increasing each
year, and researchers keep trying novel techniques to detect and prevent such frauds.
However, there is always a need for techniques that detect these
frauds precisely and efficiently. This paper proposes a scheme for detecting frauds in credit card data which uses a Neural Network
(NN) based unsupervised learning technique. The proposed method outperforms the existing approaches of
Auto Encoder (AE), Local Outlier Factor (LOF), Isolation Forest (IF) and K-Means clustering. The proposed
NN based fraud detection method performs with 99.87% accuracy, whereas the existing methods AE, IF, LOF
and K-Means give 97%, 98%, 98% and 99.75% accuracy respectively.

2.2 EXISTING SYSTEM

In the existing system, models are built based on Auto Encoder (AE), Logistic Regression and K-Means
clustering to estimate fraudulent and non-fraudulent transactions. These techniques give low precision and
recall scores and also lack robustness because of higher computational time.

Disadvantages:
• Low accuracy.
• Time consuming.
• High complexities.

2.3 PROPOSED SYSTEM

We propose this system to investigate whether it is worthwhile to use machine learning techniques,
specifically Neural Networks, to detect whether a credit card transaction is fraudulent or not.

Advantages:

• High accuracy.
• Time Saving.
• Low complexities.
• High reliability.

• Used for preventing credit card frauds by banks.


• Financial institutions employ them heavily to prevent frauds.

2.4 FEASIBILITY STUDY

In this phase the feasibility of the project is analyzed and a business proposal is put forth with a very general plan
for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is
carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

 ECONOMICAL FEASIBILITY

 TECHNICAL FEASIBILITY

 SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the organization. The
amount of funds that the company can pour into the research and development of the system is limited, so the
expenditures must be justified. The developed system is well within the budget, and this was achieved
because most of the technologies used are freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system
developed must not place a high demand on the available technical resources, as this would lead to high demands
being placed on the client. The developed system must have modest requirements, as only minimal or no changes are
required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of
training the user to use the system efficiently. The user must not feel threatened by the system, but instead must
accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to
educate the user about the system and to make the user familiar with it. The user's confidence must be raised so
that they are also able to offer constructive criticism, which is welcomed, as they are the final user of the system.

CHAPTER-3

SYSTEM DESIGN

3.1 SYSTEM DESIGN :

Figure : Flow chart for fraud detection

3.2 SYSTEM ARCHITECTURE:

Fig: Architecture for fraud detection

3.2.1 DESCRIPTION ABOUT MODULES

Overview of modules:

System

User

1. System:

1.1 Store Dataset:

The System stores the dataset given by the user.

1.2 Model Training:

The system takes the data from the user and feeds it to the selected model.

1.3 Graphs Generation:

The system takes the dataset given by the user, selects the model and generates the graph corresponding to the
selected model.

2. User:

2.1 Load Dataset:

The user can load the dataset he/she wants to work on.

2.2 View Dataset:

The User can view the dataset.

2.3 Select model:

The user can apply the selected model to the dataset to obtain its accuracy.

2.4 Graphs:

User can evaluate the model performance using the graphs.

3.3 ALGORITHMS

K-Means Clustering:

There is an algorithm that tries to minimize the distance of the points in a cluster with their centroid – the k-
means clustering technique.

The main objective of the K-Means algorithm is to minimize the sum of distances between the points and their
respective cluster centroid.
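
In symbols, with k clusters C_1, ..., C_k and centroids μ_1, ..., μ_k, the quantity K-Means minimizes (the within-cluster sum of squares) is

J = Σ_{i=1..k} Σ_{x ∈ C_i} ||x − μ_i||²,

where μ_i is the centroid (mean) of cluster C_i.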

A cluster refers to a collection of data points aggregated together because of certain similarities.

You’ll define a target number k, which refers to the number of centroids you need in the dataset. A centroid is
the imaginary or real location representing the center of the cluster.

Every data point is allocated to one of the clusters by reducing the in-cluster sum of squares.

In other words, the K-Means algorithm identifies k centroids, and then allocates every data point to
the nearest cluster, while keeping the clusters as compact as possible.

STEPS:
1. Choose the number of clusters k.
2. Select k random points from the data as centroids.
3. Assign all the points to the closest cluster centroid.
4. Re-compute the centroids of newly formed clusters.
5. Repeat steps 3 and 4 until the centroids no longer change.
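
To make the steps concrete, the following is a minimal sketch of applying K-Means to the credit card dataset with scikit-learn; the file name creditcard.csv and the use of the Class column (1 = fraud, 0 = normal) are assumptions based on the code in Section 4.2.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Load the dataset (file name assumed) and separate features from the label
df = pd.read_csv('creditcard.csv')
X = df.drop(['Time', 'Class'], axis=1)
y = df['Class']

# Group the transactions into two clusters (intended: normal vs. fraudulent)
kmeans = KMeans(n_clusters=2, init='k-means++', random_state=10)
labels = kmeans.fit_predict(X)

# Compare the cluster labels with the true classes
print('Accuracy:', accuracy_score(y, labels))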

Logistic Regression:

Logistic Regression was used in the biological sciences in the early twentieth century and was later applied in many
social science applications. Logistic Regression is used when the dependent variable (target) is categorical.

For example,

To predict whether an email is spam (1) or not (0)

Whether the tumor is malignant (1) or not (0)

Consider a scenario where we need to classify whether a tumor is malignant or not. If we use linear regression for
this problem, there is a need for setting a threshold based on which classification can be done. Say the
actual class is malignant, the predicted continuous value is 0.4 and the threshold value is 0.5; the data point will be
classified as not malignant, which can lead to serious consequences in real time.

From this example, it can be inferred that linear regression is not suitable for classification problems. Linear
regression is unbounded, and this brings logistic regression into the picture: its output strictly ranges from 0 to 1.
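
Concretely, logistic regression passes a linear combination of the inputs through the logistic (sigmoid) function,

p(y = 1 | x) = σ(w·x + b) = 1 / (1 + e^−(w·x + b)),

which always lies between 0 and 1, so a threshold such as 0.5 can be used to assign a class label.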

Types of Logistic Regression

1. Binary Logistic Regression

The categorical response has only two possible outcomes. Example: Spam or Not

2. Multinomial Logistic Regression

Three or more categories without ordering. Example: Predicting which food is preferred more (Veg,
Non-Veg, Vegan)

3. Ordinal Logistic Regression

Three or more categories with ordering. Example: Movie rating from 1 to 5
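
As an illustration, a minimal sketch of binary logistic regression on the credit card dataset with scikit-learn (file name and column names assumed, following Section 4.2) could be:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

df = pd.read_csv('creditcard.csv')              # file name assumed
X = df.drop(['Time', 'Class'], axis=1)
y = df['Class']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

lr = LogisticRegression(solver='sag', max_iter=1000)
lr.fit(x_train, y_train)
pred = lr.predict(x_test)

print('Accuracy :', accuracy_score(y_test, pred))
print('Precision:', precision_score(y_test, pred))
print('Recall   :', recall_score(y_test, pred))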

Auto Encoders:

Autoencoders are a specific type of feedforward neural network where the input is the same as the output. They
compress the input into a lower-dimensional code and then reconstruct the output from this representation. The
code is a compact “summary” or “compression” of the input, also called the latent-space representation.

An autoencoder consists of 3 components: encoder, code and decoder. The encoder compresses the input and
produces the code, the decoder then reconstructs the input only using this code.

Autoencoders are mainly a dimensionality reduction (or compression) algorithm with a couple of important
properties:

Data-specific: Autoencoders are only able to meaningfully compress data similar to what they have been trained
on. Since they learn features specific to the given training data, they are different from a standard data
compression algorithm like gzip. So we can’t expect an autoencoder trained on handwritten digits to compress
landscape photos.

Lossy: The output of the autoencoder will not be exactly the same as the input, it will be a close but degraded
representation. If you want lossless compression they are not the way to go.

Unsupervised: To train an autoencoder we don’t need to do anything fancy, just throw the raw input data at it.
Autoencoders are considered an unsupervised learning technique since they don’t need explicit labels to train on.
To be more precise, they are self-supervised, because they generate their own labels from the training data.
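
To make this concrete, below is a minimal sketch of an autoencoder for fraud detection in the style of the code in Section 4.2: it is trained only on normal transactions, and a transaction whose reconstruction error exceeds a threshold is flagged as fraud (the file name and threshold value are assumptions).

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Input, Dense

df = pd.read_csv('creditcard.csv').drop(['Time'], axis=1)             # file name assumed
train, test = train_test_split(df, test_size=0.2, random_state=42)
x_train = train[train.Class == 0].drop(['Class'], axis=1).values      # normal transactions only
y_test = test['Class'].values
x_test = test.drop(['Class'], axis=1).values

n_features = x_train.shape[1]
inputs = Input(shape=(n_features,))
encoded = Dense(14, activation='tanh')(inputs)
encoded = Dense(7, activation='relu')(encoded)
decoded = Dense(7, activation='tanh')(encoded)
decoded = Dense(n_features, activation='relu')(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
autoencoder.fit(x_train, x_train, epochs=20, batch_size=32, validation_data=(x_test, x_test))

# Reconstruction error per transaction; a large error suggests fraud
mse = np.mean(np.power(x_test - autoencoder.predict(x_test), 2), axis=1)
threshold = 10                        # assumed cut-off, as in Section 4.2
y_pred = (mse > threshold).astype(int)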

Neural Networks:

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data
through a process that mimics the way the human brain operates. Neural networks are used for solving many
business problems such as sales forecasting, customer research, data validation, and risk management. A neural
network is a neurally inspired mathematical model containing a huge number of interconnected processing elements,
called neurons, which carry out all operations. Information is stored in the weighted links between
neurons.

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize
patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The
patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound,
text or time series, must be translated.

Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on
top of the data you store and manage. They help to group unlabeled data according to similarities among the
example inputs, and they classify data when they have a labeled dataset to train on. (Neural networks can also
extract features that are fed to other algorithms for clustering and classification; so you can think of deep neural
networks as components of larger machine-learning applications involving algorithms for reinforcement
learning, classification and regression.)
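
For illustration, a small feedforward classifier similar to the one used in Section 4.2 can be sketched with Keras as follows; the layer sizes and number of epochs follow that code, while the dataset file name is assumed.

import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout

df = pd.read_csv('creditcard.csv')                   # file name assumed
X = df.drop(['Time', 'Class'], axis=1).values
y = df['Class'].values
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Sequential([
    Dense(20, input_dim=x_train.shape[1], activation='relu'),
    Dense(24, activation='relu'),
    Dropout(0.5),
    Dense(20, activation='relu'),
    Dense(24, activation='relu'),
    Dense(1, activation='sigmoid'),                  # probability of fraud
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
print(model.evaluate(x_test, y_test))                # [loss, accuracy]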

3.4 DETAILED DESIGN OF THE PROJECT

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the
field of object-oriented software engineering. The standard is managed, and was created by, the Object
Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software.
In its current form UML comprises two major components: a meta-model and a notation. In the future,
some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing and
documenting the artifacts of a software system, as well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in the modeling of
large and complex systems.

The UML is a very important part of developing object-oriented software and the software development
process. The UML uses mostly graphical notations to express the design of software projects.

The Primary goals in the design of the UML are as follows:

1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.

2. Provide extendibility and specialization mechanisms to extend the core concepts.

3. Be independent of particular programming languages and development processes.

4. Provide a formal basis for understanding the modeling language.

5. Encourage the growth of the OO tools market.

6. Support higher-level development concepts such as collaborations, frameworks, patterns and components.

7. Integrate best practices.

USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and
created from a Use-case analysis. Its purpose is to present a graphical overview of the functionality provided by
a system in terms of actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are performed for which actor.
Roles of the actors in the system can be depicted.

The Use Case diagram for fraud detection is shown below.

Fig: Use case diagram for the model (actors: User and System; user-side use cases: Load Data, View Data, Model Selection, Graphs; system-side use cases: Store Data, Model Training, Generate Graphs)

CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure
diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or
methods), and the relationships among the classes. It explains which classes contain which information.

Fig: class diagram for model

SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence
diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

Fig: Sequence diagram for the model (messages between User and System: Load Data, Displays Data, Model Selection, Model Training, Graphs, Generating Graphs)

COLLABORATION DIAGRAM:

In a collaboration diagram the method call sequence is indicated by a numbering technique, as shown below.
The numbers indicate how the methods are called one after another. We have taken the same fraud detection
model to describe the collaboration diagram. The method calls are similar to those of a sequence diagram, but the
difference is that the sequence diagram does not describe the object organization, whereas the collaboration
diagram shows the object organization.

Fig: Collaboration diagram for the model (1: Load Data, 2: Displays Data, 3: Model Selection, 4: Model Training, 5: Graphs, 6: Generating Graphs)

DEPLOYMENT DIAGRAM

A deployment diagram represents the deployment view of a system. It is related to the component diagram,
because the components are deployed using the deployment diagrams. A deployment diagram consists of nodes.
Nodes are nothing but the physical hardware used to deploy the application.

Fig: Deployment diagram for the model (nodes: User, System)

ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with support for
choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to describe
the business and operational step-by-step workflows of components in a system. An activity diagram shows the
overall flow of control.

Fig: Activity diagram for the model (User: Load Data, View Data, Select Model, Graphs; System: Store Data, Model Training, Generate Graphs)

COMPONENT DIAGRAM:

A component diagram, also known as a UML component diagram, describes the organization and wiring of the
physical components in a system. Component diagrams are often drawn to help model implementation details
and double-check that every aspect of the system's required functions is covered by planned development.

Fig: Component diagram for the model (components: System, User)

3.5 SYSTEM REQUIREMENTS :

3.5.1 SOFTWARE REQUIREMENTS
Language used
Python is a widely used general-purpose, high-level programming language. It was created by
Guido van Rossum, first released in 1991, and is developed by the Python Software Foundation. It was mainly
designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer
lines of code.
Python is a programming language that lets you work quickly and integrate systems more efficiently.
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to
be highly readable. It uses English keywords frequently whereas other languages use punctuation, and it has
fewer syntactical constructions than other languages.

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.

• Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter
directly to write your programs.

• Python is Object-Oriented − Python supports Object-Oriented style or technique of programming


that encapsulates code within objects.

• Python is a Beginner's Language − Python is a great language for the beginner-level programmers
and supports the development of a wide range of applications from simple text processing to WWW
browsers to games.

The biggest strength of Python is its huge collection of standard and third-party libraries, which can be used for the following:
• Machine Learning
• GUI Applications (like Kivy, Tkinter, PyQt etc. )
• Web frameworks like Django (used by YouTube, Instagram, Dropbox)
• Image processing (like OpenCV, Pillow)
• Web scraping (like Scrapy, BeautifulSoup, Selenium)
Figure : Python Features

Python XML Parser

XML is a portable, open source language that allows programmers to develop applications that can be read by
other applications, regardless of operating system and/or developmental language.

What is XML? The Extensible Markup Language (XML) is a markup language much like HTML or SGML.

This is recommended by the World Wide Web Consortium and available as an open standard.

XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based
backbone.

XML Parser Architectures and APIs: The Python standard library provides a minimal but useful set of interfaces
to work with XML.

The two most basic and broadly used APIs to XML data are the SAX and DOM interfaces.

Simple API for XML (SAX): Here, you register callbacks for events of interest and then let the parser proceed
through the document.

This is useful when your documents are large or you have memory limitations; it parses the file as it reads it
from disk, and the entire file is never stored in memory.

Document Object Model (DOM) API: This is a World Wide Web Consortium recommendation wherein the
entire file is read into memory and stored in a hierarchical, tree-based form to represent all the features of an
XML document.

SAX obviously cannot process information as fast as DOM can when working with large files. On the other
hand, using DOM exclusively can really kill your resources, especially if used on a lot of small files.

SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs literally
complement each other, there is no reason why you cannot use them both for large projects.
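
As a small illustration of the DOM interface from the Python standard library (the XML snippet below is purely hypothetical):

from xml.dom import minidom

# Parse a small, hypothetical XML string into an in-memory tree (DOM)
doc = minidom.parseString('<transactions><txn id="1" amount="42.50"/></transactions>')
for txn in doc.getElementsByTagName('txn'):
    print(txn.getAttribute('id'), txn.getAttribute('amount'))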

Python Web Frameworks

A web framework is a code library that makes a developer's life easier when building reliable, scalable and
maintainable web applications.

Why are web frameworks useful?

Web frameworks encapsulate what developers have learned over the past twenty years while programming sites
and applications for the web. Frameworks make it easier to reuse code for common HTTP operations and to
structure projects so other developers with knowledge of the framework can quickly build and maintain the
application.

Common web framework functionality

Frameworks provide functionality in their code or through extensions to perform common operations required to
run web applications. These common operations include:

• URL routing

• HTML, XML, JSON, and other output format templating

• Database manipulation

• Security against Cross-site request forgery (CSRF) and other attacks

• Session storage and retrieval
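
The fraud detection application described in Section 4.2 is built on Flask; a minimal sketch of a Flask application with a single route looks like the following (the template name is illustrative):

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    # Render the landing page (the template is expected in the templates/ folder)
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)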

HTML:

HTML, or Hyper Text Markup Language, allows web users to create and structure sections, paragraphs,
and links using elements, tags, and attributes. However, it’s worth noting that HTML is not considered a

programming language as it can’t create dynamic functionality.


HTML has a lot of use cases, namely:

• Web development. Developers use HTML code to design how a browser displays web page

elements, such as text, hyperlinks, and media files.

• Internet navigation. Users can easily navigate and insert links between related pages and websites

as HTML is heavily used to embed hyperlinks.

• Web documentation. HTML makes it possible to organize and format documents, similarly to

Microsoft Word.

It’s also worth noting that HTML is now considered an official web standard. The World Wide Web

Consortium (W3C) maintains and develops HTML specifications, along with providing regular updates.

This section will go over the basics of HTML, including how it works, its pros and cons, and how it relates
to CSS and JavaScript.

CSS:

CSS stands for Cascading Style Sheets. It is a style sheet language which is used to describe the look and
formatting of a document written in markup language. It provides an additional feature to HTML. It is generally
used with HTML to change the style of web pages and user interfaces. It can also be used with any kind of XML
documents including plain XML, SVG and XUL.

CSS is used along with HTML and JavaScript in most websites to create user interfaces for web applications and
for many mobile applications. Before CSS, tags for font, color, background style, element
alignment, border and size had to be repeated on every web page, which was a very long process. For example, if
you are developing a large website where font and color information is added on every single page, it
becomes a long and expensive process. CSS was created to solve this problem. It was a W3C recommendation.

CSS is designed to make style sheets for the web. It is independent of HTML and can be used with any XML-
based markup language. Now let’s try to break the acronym:

• Cascading: Falling of Styles


• Style: Adding designs/Styling our HTML tags
• Sheets: Writing our style in different documents

Scikit - learn:

Scikit-learn (sklearn) is the most useful and robust library for machine learning in Python. It provides a selection
of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and
dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python,
is built upon NumPy, SciPy and Matplotlib.

Matplotlib:

Matplotlib is an amazing visualization library in Python for 2D plots of arrays. Matplotlib is a multi-platform
data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was
introduced by John Hunter in the year 2002.

One of the greatest benefits of visualization is that it allows us visual access to huge amounts of data in easily
digestible visuals. Matplotlib consists of several plots like line, bar, scatter, histogram etc.
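
As a small example, the accuracies reported in the abstract can be visualized as a bar chart with Matplotlib:

import matplotlib.pyplot as plt

models = ['Neural Network', 'Auto Encoder', 'Logistic Regression', 'K-Means']
accuracy = [99.87, 92, 97, 99.75]          # accuracies reported in the abstract

plt.bar(models, accuracy)
plt.ylabel('Accuracy (%)')
plt.title('Model comparison')
plt.show()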

Pandas:

Pandas is an open-source library that provides high-performance data manipulation in Python. The
name Pandas is derived from "Panel Data", an econometrics term for multidimensional structured
data. It is used for data analysis in Python and was developed by Wes McKinney in 2008.

Data analysis requires lots of processing, such as restructuring, cleaning or merging. Different
tools are available for fast data processing, such as NumPy, SciPy, Cython, and Pandas, but we prefer Pandas
because working with Pandas is fast, simple and more expressive than the other tools.
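
For instance, the class distribution of the credit card dataset used in this project can be inspected with a few lines of Pandas (the file name is assumed):

import pandas as pd

data = pd.read_csv('creditcard.csv')       # file name assumed
print(data.shape)                          # rows, columns
print(data['Class'].value_counts())        # 0 = normal, 1 = fraudulent
print(data.isnull().sum())                 # check for missing values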

Installing PyCharm:

1. To download PyCharm visit the website https://www.jetbrains.com/pycharm/download/ and Click the


"DOWNLOAD" link under the Community Section.
2. Once the download is complete, run the exe to install PyCharm. The setup wizard should start;
click “Next”.
3. On the next screen, Change the installation path if required. Click “Next”.
4. On the next screen, you can create a desktop shortcut if you want and click on “Next”.
5. Choose the start menu folder. Keep selected JetBrains and click on “Install”.
6. Wait for the installation to finish.
7. Once installation finished, you should receive a message screen that PyCharm is installed. If you want to
go ahead and run it, click the “Run PyCharm Community Edition” box first and click “Finish”.
8. After you click on "Finish," the Following screen will appear.

9. You need to install some packages to execute your project in a proper way.

10. Open the command prompt/ anaconda prompt or terminal as administrator.

11. The prompt will open with the specified path; type “pip install <package name>” for each package you want to
install (such as numpy, pandas, seaborn, scikit-learn, matplotlib).

Ex: pip install numpy
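
For this project, the packages imported by the code in Section 4.2 can be installed in one step (package versions are not specified in the original):

pip install numpy pandas seaborn scikit-learn matplotlib flask pygal keras tensorflow termcolor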

3.5.2 HARDWARE REQUIREMENTS
The hardware requirements may serve as the basis for a contract for the implementation of the system and
should therefore be a complete and consistent specification of the whole system. They are used by software
engineers as the starting point for the system design. They state what the system should do, not how it should be
implemented.
• Processor - I3/Intel Processor
• RAM - 4GB (min)
• Hard Disk - 128 GB
• Key Board - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - LCD

SOFTWARE REQUIREMENTS
The software requirements document is the specification of the system. It should include both a definition
and a specification of requirements. It is a statement of what the system should do rather than how it should do it.
The software requirements provide a basis for creating the software requirements specification. It is useful in
estimating cost, planning team activities, performing tasks, and tracking the team's
progress throughout the development activity.

• Operating System : Windows 7+


• Server side Script : Python 3.6+
• IDE : PyCharm
• Libraries Used : Pandas, Numpy, scikit-learn, Matplotlib

CHAPTER – 4
IMPLEMENTATION
4.1 TEST PLAN

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault
or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies,
assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an unacceptable manner.
There are various types of test; each test type addresses a specific testing requirement.
Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly,
and that program inputs produce valid outputs. All decision branches and internal code flow should be validated.
It is the testing of individual software units of the application; it is done after the completion of an individual
unit, before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at the component level and test a specific business process, application, and/or system
configuration. Unit tests ensure that each unique path of a business process performs accurately to the
documented specifications and contains clearly defined inputs and expected results.
Integration testing

Integration tests are designed to test integrated software components to determine if they actually run as one
program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration
tests demonstrate that although the components were individually satisfactory, as shown by successful unit
testing, the combination of components is correct and consistent. Integration testing is specifically aimed at
exposing the problems that arise from the combination of components.

Functional test

Functional tests provide systematic demonstrations that functions tested are available as specified by the
business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases.
In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes,
and successive processes must be considered for testing. Before functional testing is complete, additional tests
are identified and the effective value of current tests is determined.
4.1.1 TEST PROCEDURE

System testing ensures that the entire integrated software system meets requirements. It tests a configuration to
ensure known and predictable results. An example of system testing is the configuration oriented system
integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links
and integration points.
White Box Testing

White Box Testing is a testing in which in which the software tester has knowledge of the inner workings,
structure and language of the software, or at least its purpose. It is purpose. It is used to test areas that cannot be
reached from a black box level.
Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of
the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. It is testing in which the software under test is
treated as a black box: you cannot “see” into it. The test provides inputs and responds to outputs without
considering how the software works.

Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle,
although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.

• Pages must be activated from the identified link.

• The entry screen, messages and responses must not be delayed.

Features to be tested

• Verify that the entries are of the correct format

• No duplicate entries should be allowed

• All links should take the user to the correct page.

4.1.2 TEST CASES

Integration Testing

Software integration testing is the incremental integration testing of two or more integrated software components
on a single platform to produce failures caused by interface defects.

The task of the integration test is to check that components or software applications, e.g. components in a
software system or – one step up – software applications at the company level – interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant participation by the end user.
It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.
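
As an illustration, a couple of these checks can be automated with Flask's built-in test client; this is only a sketch and assumes the application code from Section 4.2 is saved as app.py with its templates in place.

from app import app          # assumes the Flask code in Section 4.2 is saved as app.py

def test_home_page_loads():
    # The landing page should be served without errors
    client = app.test_client()
    response = client.get('/')
    assert response.status_code == 200

def test_unknown_route_returns_404():
    # Requests to undefined pages should return HTTP 404
    client = app.test_client()
    response = client.get('/no-such-page')
    assert response.status_code == 404

These functions can be run with pytest.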

4.2 CODE:

import os

import numpy as np
import pandas as pd
import pygal
import seaborn as sns
from flask import Flask, render_template, request
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from termcolor import colored as cl   # 'cl' was undefined in the original listing; termcolor is assumed

# Quick look at the class distribution of the dataset
# (the original listing uses 'data' without loading it; the file name below is assumed)
data = pd.read_csv('creditcard.csv')
Total_transactions = len(data)
normal = len(data[data.Class == 0])
fraudulent = len(data[data.Class == 1])
fraud_percentage = round(fraudulent/normal*100, 2)
print(cl('Total number of Transactions are {}'.format(Total_transactions), attrs=['bold']))
print(cl('Number of Normal Transactions are {}'.format(normal), attrs=['bold']))
print(cl('Number of fraudulent Transactions are {}'.format(fraudulent), attrs=['bold']))
print(cl('Percentage of fraud Transactions is {}'.format(fraud_percentage), attrs=['bold']))

app = Flask(__name__)
app.config['upload_folder'] = r'uploads'
global df
global path


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/load', methods=["POST", "GET"])
def load_data():
    # Accept a CSV upload and store it in the uploads folder
    if request.method == "POST":
        files = request.files['file']
        filetype = os.path.splitext(files.filename)[1]
        if filetype == '.csv':
            path = os.path.join(app.config['upload_folder'], files.filename)
            files.save(path)
            return render_template('Load Data.html', msg='valid')
        else:
            return render_template('Load Data.html', msg='invalid')
    return render_template('Load Data.html')

@app.route('/preprocess')
def preprocess():
    # Read the uploaded file and check for missing values
    file = os.listdir(app.config['upload_folder'])
    path = os.path.join(app.config['upload_folder'], file[0])
    df = pd.read_csv(path)
    print(df.head())
    print(df.isnull().sum())
    return render_template('Pre-process Data.html', msg='success')


@app.route('/viewdata', methods=["POST", "GET"])
def view_data():
    # Show a small random sample of the uploaded dataset
    file = os.listdir(app.config['upload_folder'])
    path = os.path.join(app.config['upload_folder'], file[0])
    df = pd.read_csv(path)
    df1 = df.sample(frac=0.3)
    df1 = df1[:100]
    return render_template('view data.html', col_name=df1.columns, row_val=list(df1.values.tolist()))

@app.route('/model', methods=["POST", "GET"])
def model():
    # Metric variables are kept global so the /graph route can read them later
    global accuracy, recall, precision
    global accuracy1, recall1, precision1
    global accuracy2, recall2, precision2
    global accuracy3, recall3, precision3
    if request.method == "POST":
        model = int(request.form['selected'])
        file = os.listdir(app.config['upload_folder'])
        path = os.path.join(app.config['upload_folder'], file[0])
        df = pd.read_csv(path)
        df1 = df.sample(frac=0.3)
        X = df1.drop(['Time', 'Class'], axis=1)
        y = df1.Class
        x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
        if model == 1:
            # Logistic Regression
            lr = LogisticRegression(solver='sag')
            model1 = lr.fit(x_train, y_train)
            pred = model1.predict(x_test)
            accuracy = accuracy_score(y_test, pred)
            precision = precision_score(y_test, pred)
            recall = recall_score(y_test, pred)
            return render_template('model.html', msg='accuracy', score=accuracy,
                                   selected='LOGISTIC REGRESSION')
        elif model == 2:
            # Neural Network classifier built with Keras
            from keras.models import Sequential
            from keras.layers import Dense, Dropout
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
            # Convert to numpy arrays before feeding the data to Keras
            X_train = np.array(X_train)
            X_test = np.array(X_test)
            y_train = np.array(y_train)
            y_test = np.array(y_test)
            model = Sequential([
                Dense(units=20, input_dim=X_train.shape[1], activation='relu'),
                Dense(units=24, activation='relu'),
                Dropout(0.5),
                Dense(units=20, activation='relu'),
                Dense(units=24, activation='relu'),
                Dense(1, activation='sigmoid')
            ])
            model.summary()
            model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
            nb_epoch = 50
            batch_size = 32
            model.fit(X_train, y_train, epochs=nb_epoch, batch_size=batch_size)
            pred1 = model.predict(X_test)
            model.evaluate(X_test, y_test)
            accuracy1 = accuracy_score(y_test, pred1.round())
            precision1 = precision_score(y_test, pred1.round())
            recall1 = recall_score(y_test, pred1.round())
            return render_template('model.html', msg='accuracy', score=accuracy1,
                                   selected='NEURAL NETWORKS')

        elif model == 3:
            # Auto Encoder: trained only on normal transactions,
            # fraud is flagged by a large reconstruction error
            from keras.models import Model
            from keras.layers import Input, Dense
            from keras import regularizers
            df = df1.drop(['Time'], axis=1)
            X_train, X_test = train_test_split(df, test_size=0.2, random_state=42)
            X_train = X_train[X_train.Class == 0]
            X_train = X_train.drop(['Class'], axis=1)
            y_test = X_test['Class']
            X_test = X_test.drop(['Class'], axis=1)
            X_train = X_train.values
            X_test = X_test.values
            input_dim = X_train.shape[1]
            encoding_dim = 14
            input_layer = Input(shape=(input_dim,))
            encoder = Dense(encoding_dim, activation="tanh",
                            activity_regularizer=regularizers.l1(10e-5))(input_layer)
            encoder = Dense(int(encoding_dim / 2), activation="relu")(encoder)
            decoder = Dense(int(encoding_dim / 2), activation='tanh')(encoder)
            decoder = Dense(input_dim, activation='relu')(decoder)
            autoencoder = Model(inputs=input_layer, outputs=decoder)
            nb_epoch = 20
            batch_size = 32
            autoencoder.compile(optimizer='adam',
                                loss='mean_squared_error',
                                metrics=['accuracy'])
            history = autoencoder.fit(X_train, X_train,
                                      epochs=nb_epoch,
                                      batch_size=batch_size,
                                      validation_data=(X_test, X_test)).history
            predictions = autoencoder.predict(X_test)
            # Mean squared reconstruction error per transaction
            mse = np.mean(np.power(X_test - predictions, 2), axis=1)
            error_df = pd.DataFrame({'reconstruction_error': mse,
                                     'true_class': y_test})
            threshold = 10
            y_pred3 = [1 if e > threshold else 0 for e in error_df.reconstruction_error.values]
            conf_matrix = confusion_matrix(error_df.true_class, y_pred3)
            accuracy2 = accuracy_score(error_df.true_class, y_pred3)
            precision2 = precision_score(error_df.true_class, y_pred3)
            recall2 = recall_score(error_df.true_class, y_pred3)
            return render_template('model.html', msg='accuracy', score=accuracy2,
                                   selected='AUTO ENCODERS')

        elif model == 4:
            # K-Means clustering with two clusters (normal / fraudulent)
            kmeans = KMeans(n_clusters=2, init='k-means++')
            model1 = kmeans.fit(X)
            pre = model1.predict(X)
            accuracy3 = accuracy_score(y, pre)
            precision3 = precision_score(y, pre)
            recall3 = recall_score(y, pre)
            return render_template('model.html', msg='accuracy', score=accuracy3,
                                   selected='K-MEANS CLUSTERING')
    return render_template('model.html')

@app.route('/graph', methods=["POST", "GET"])
def graph():
    # Bar chart comparing recall, precision and accuracy of the four models
    line_chart = pygal.Bar()
    line_chart.x_labels = ['Logistic Regression', 'Neural Network', 'Auto Encoders', 'K-Means Clustering']
    line_chart.add('RECALL', [recall, recall1, recall2, recall3])
    line_chart.add('PRECISION', [precision, precision1, precision2, precision3])
    line_chart.add('ACCURACY', [accuracy, accuracy1, accuracy2, accuracy3])
    graph_data = line_chart.render()
    return render_template('graphs.html', graph_data=graph_data)


if __name__ == '__main__':
    app.run(debug=True)

4.3 INPUT/OUTPUT :

Home:

Data Loading:

Data Pre-Processing:

Data Viewing:

Model Selection:

Graphs:

CHAPTER - 5
CONCLUSION AND FUTURE ENHANCEMENTS

5.1 CONCLUSION:

In this application, we have successfully created machine learning models to detect whether a credit card
transaction is fraudulent or not. We observed that, out of Logistic Regression, K-Means Clustering, Neural Networks
and Auto Encoders, the Neural Network performs best, with good accuracy along with good precision and recall scores.

5.2 FUTURE ENHANCEMENT:

This system can be extended by applying imbalanced-data treatment techniques to further improve the precision
and recall scores of our machine learning models.

REFERENCES

[1] Taha, Altyeb & Malebary, Sharaf. (2020). An Intelligent Approach to Credit Card Fraud Detection Using an
Optimized Light Gradient Boosting Machine. IEEE Access. 8. 25579-25587.

[2] Assaghir, Zainab & Taher, Yehia & Haque, Rafiqul & Hacid, Mohand-Said & Zeineddine, Hassan. (2019).
An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection. IEEE
Access.

[3] L. Meneghetti, M. Terzi, S. Del Favero, G. A. Susto, C. Cobelli, “Data-Driven Anomaly Recognition for
Unsupervised Model-Free Fault Detection in Artificial Pancreas”, IEEE Transactions on Control Systems
Technology, (2018), pp. 1-15.

[4] F. Carcillo, Y.-A. Le Borgne and O. Caelen et al., “Combining unsupervised and supervised learning in
credit card fraud detection”, Information Sciences, Elsevier (2019), pp. 1-15.

[5] Ashphak, Mr. & Singh, Tejpal & Sinhal, Dr. Amit. (2012). A Survey of Fraud Detection Systems using
Hidden Markov Model for Credit Card Applications.

[6] Renjith, Shini. (2018). Detection of Fraudulent Sellers in Online Marketplaces using Support Vector
Machine Approach. International Journal of Engineering Trends and Technology. 57. 48-53.
10.14445/22315381/IJETT-V57P210.

[7] Saputra, Adi & Suharjito, Suharjito. (2019). Fraud Detection using Machine Learning in e-Commerce.
10.14569/IJACSA.2019.0100943.

[8] A. K. Rai and R. K. Dwivedi, "Fraud Detection in Credit Card Data using Unsupervised Machine Learning
Based Scheme," 2020 International Conference on Electronics and Sustainable Communication Systems
(ICESC), Coimbatore, India, 2020, pp. 421-426, doi: 10.1109/ICESC48915.2020.9155615.

[9] John O. Awoyemi, Adebayo O. Adetunmbi, Samuel A. Oluwadare et al., “Credit card fraud detection using
Machine Learning Techniques: A Comparative Analysis”, IEEE, 2017.

[10] Rajendra Kumar Dwivedi, Sonali Pandey, Rakesh Kumar “A study on Machine Learning Approaches for
Outlier Detection in Wireless Sensor Network” IEEE International Conference Confluence, (2018).

