Survey On Credit Card Fraud Detection Techniques IJERTV3IS031593 PDF
ISSN: 2278-0181
Vol. 3 Issue 3, March - 2014
Abstract - Due to rapid advancement in electronic commerce technology, the use of credit cards has dramatically increased. Since the credit card is the most popular mode of payment, the number of fraud cases associated with it is also rising. In this paper, a survey of the present techniques available for detecting credit card fraud is presented as a review paper. Fraud detection involves identifying fraud as quickly as possible once it has been committed. Fraud detection methods are continuously developed to defend against criminals who adapt their strategies. The transaction is classified as normal, abnormal or suspicious depending on this initial belief. Once a transaction is found to be suspicious, belief is further strengthened or weakened according to its similarity with fraudulent or genuine transaction history using Bayesian learning.

Keywords - Credit card, fraud detection, supervised

The details of a credit card should be kept private. To secure credit card privacy, the details should not be leaked. Different ways to steal credit card details are phishing websites, stolen/lost credit cards, counterfeit credit cards, theft of card details, intercepted cards etc. For security purposes, the above should be avoided. Credit card security is needed for the detection of valid and invalid transactions. Most fraudulent transactions result from stolen card numbers rather than the actual theft of the card. So, keep the credit card safe.

A fraud committed over the Internet, like online credit card fraud, is becoming more common because of its nature. In online fraud, the transaction is made remotely and only the card's details are needed. A manual signature, a PIN or a card imprint is not required at the time of purchase. In most cases the genuine cardholder is not aware that someone else has seen or stolen his/her card information. The simple way to detect this type of
Bankruptcy fraud, Theft fraud/counterfeit fraud, Application fraud, Behavioral fraud.

Credit Card Fraud: Credit card fraud is divided into two types:

Offline fraud: Offline fraud is committed by using a stolen physical card at any place.

On-line fraud: On-line fraud is committed over the internet, phone, online shopping or whenever the card holder is not present.

Telecommunication Fraud [2] - The use of telecommunication services to commit other forms of fraud. Consumers, businesses and communication service providers are the victims.

Computer Intrusion - Intrusion is defined as the act of entering without warrant or invitation; that means a potential possibility of an unauthorized attempt to access information or manipulate information purposefully. Intruders may come from any environment: an outsider (or hacker) or an insider who knows the layout of the system [3].

Bankruptcy Fraud - Bankruptcy fraud means using a credit card while being insolvent. Bankruptcy fraud is one of the most complicated types of fraud to predict [3].

Theft Fraud/Counterfeit Fraud [3] - In this section, the focus is on theft and counterfeit fraud, which are related to one another. Theft fraud refers to use of the card by a person who is not its owner. As soon as the owner gives some feedback and contacts the bank, the bank will take measures to check the thief as early as possible. Likewise, counterfeit fraud occurs when the credit card is used remotely, where only the credit card details are needed.

Application Fraud [3] - When any people apply for a credit card

Internal Fraud - The banking sector allows its employees to access customer data. This data is the same information needed to access customers' online banking accounts, so fraud can easily be committed by an employee. To prevent this, financial institutions should require a password or PIN for net banking, and the password or PIN should be stored in encrypted form [5].

2. INTRODUCTION TO TYPES OF SOLUTIONS FOR THE FRAUD
Fraud and identity theft should be taken as a challenge both personally and financially. Fraud and identity theft can cause a lot of frustration.

How to Deal with credit card fraud? Fraud is considered to be the unauthorized use of credit card accounts. Usually fraud is discovered when a credit card is lost or stolen, when unfamiliar charges are found on the billing statement, when calls or letters arrive about transactions that have not been made, or when the cardholder is contacted by the credit card company's fraud department with questions about a charge. If fraud is suspected on the account, one should contact the credit card company immediately. The credit card company will be able to help in verifying the fraud, remove the charges which have not been made by the card holder or any authorized person, close down the account to prevent more fraudulent transactions, issue a new account number and new card, and transfer old information to the new account. It's also a good idea to check the credit report to be sure there's nothing else that looks suspicious. In most cases, the involvement of law enforcement will be coordinated with the financial institution.

How to Deal with identity theft? Identity theft is a particular type of fraud in which a thief uses personal information to set up new accounts or get other benefits in the name of the cardholder. Though it's not as common as other types of fraud, it can be more challenging and cause more severe problems.

Some signs of identity theft are: not receiving bills or other mail, receiving a credit card that was not applied for, being denied credit for no apparent reason, getting calls or letters about transactions that the cardholder did not make, and being served court papers or arrest warrants for things in which the cardholder has no involvement. Never assume that such unexplained occurrences are just a mistake; always look into the details to find out for sure.

3. LITERATURE SURVEY
Fraud detection is a complex task and there is no system that correctly predicts every fraudulent transaction. The properties of a good fraud detection system are:

1. Should identify the frauds accurately.
2. Should detect the frauds quickly.
3. Should not classify a genuine transaction as fraud.

3.1 Unsupervised outlier detection technique
An unsupervised outlier detection technique does not make any assumption about the availability of labeled data. This method simply seeks those accounts, customers etc. whose behavior is "unusual" [7]. Unsupervised methods are useful in applications where there is no prior knowledge about the particular class of observations in a data set. An advantage of using unsupervised methods over supervised methods is that previously undiscovered types of fraud may be detected. Some of the techniques in use nowadays are as follows:

Peer Group Analysis [7] - Peer Group Analysis (PGA) is an unsupervised method for monitoring behavior over time in data mining [8]. The main task of the PGA method is to identify peer groups for all the target observations (objects). The tool detects individual objects that begin to behave in a different manner from objects to which they had previously been similar. Each object is selected as a target object and is compared with all other objects in the database, using either external comparison criteria or internal criteria that summarize earlier behavior patterns of each object. A peer group of objects most similar to the target object is chosen on the basis of these comparisons. The tool
is a part of the data mining process that involves cycling between the detection of objects that behave in anomalous ways and the detailed examination of those objects.

The PGA method is applied to credit card fraud detection by varying the length of the time window that is initially used to determine the peer group.

Break Point Analysis [7] - Break Point Analysis is another unsupervised outlier detection tool, developed for behavioral fraud detection. A break point is an observation or time at which anomalous behavior is detected. Break point analysis operates at the account level by comparing sequences of transactions so that a change in behavior for a particular account is detected. In break point analysis, a fixed-length moving window of transactions is maintained: as a transaction occurs it enters the window and the oldest transaction in the window is removed.

An advantage of using break point analysis is that 'balanced' data is not required, as transactions between different accounts are not compared, and anomalous sequences of events that may indicate fraudulent behavior can be identified.

K-Means Clustering technique [5] - K-Means clustering is a simple and efficient method to cluster data. Initially, the number of clusters K and the centroid values are obtained. Any random objects, or the first K objects, can serve as the initial centroids. This technique is a non-hierarchical method; initially it takes a number of objects equal to the final required number of clusters. Iterate until stable (= no object changes group):

1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

3.2 Supervised outlier detection technique
Supervised outlier detection techniques assume the availability of a data set that has been labeled for the normal as well as the outlier class. Supervised methods build models that can be used to differentiate between those accounts or transactions which are known to be fraudulent and those which are known to be legitimate. Classification techniques such as statistical discriminant analysis and neural networks can be used to discriminate between fraudulent and non-fraudulent transactions and to give transactions a suspicion score. Supervised methods are only trained to differentiate between legitimate transactions and previously known fraud [7].

The literature survey on methods for fraud detection shows multiple approaches, such as Gass Algorithm, Bayesian Networks, Hidden Markov Model (HMM), Genetic Algorithm (GA), a fusion approach using Dempster-Shafer Theory and Bayesian learning, Decision tree, Neural Network (NN) and Logistic Regression (LR).

Gass Algorithm [2] - The Gass algorithm is a combination of genetic algorithm and scatter search. The basic idea is that the chance of survival for the stronger members of a population is larger than that of the weaker members, and as the generations increase the average fitness of the population gets better. The less fit members of a generation are eliminated and the fittest members are selected as the parents for the next generation. This procedure is repeated until the best solution is found.

Bayesian Networks [2] - For fraud detection, two Bayesian networks describing the behavior of the user are constructed. The first Bayesian network is constructed to model behavior under the assumption that the user is fraudulent (F) and the second is constructed under the assumption that the user is legitimate (NF). The 'fraud net' is set up by using expert knowledge and the 'user net' is set up by using data from non-fraudulent users. During operation the user net is adapted to a specific user based on present data. By inserting evidence in the networks and propagating it through them, the probability of the observation under each of the two hypotheses is obtained. This shows to what degree the observed user behavior matches typical fraudulent or non-fraudulent behavior. Bayesian networks also allow the integration of expert knowledge, which is used for the initial set up of the models. On the other hand, the user model is retrained in an unsupervised way using data. Thus the Bayesian approach incorporates both expert knowledge and learning.
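
The comparison of the two hypotheses described for Bayesian networks can be illustrated with a minimal sketch. It is not the method of [2]: as a simplifying assumption it replaces full Bayesian networks with naive (independent-feature) probability tables, and the feature names, the expert-set values in `FRAUD_NET`, the prior `prior_fraud` and the sample history are all hypothetical. The 'user net' is estimated from a user's genuine transactions, and Bayes' rule combines the likelihoods under F and NF into a fraud probability.

```python
from collections import Counter

# Hypothetical discrete features and their possible values (assumption).
FEATURES = ("amount", "merchant")
VALUES = {
    "amount": ("low", "medium", "high"),
    "merchant": ("grocery", "electronics", "travel"),
}

# 'Fraud net': P(value | F), set by hand to stand in for expert knowledge.
FRAUD_NET = {
    "amount": {"low": 0.1, "medium": 0.3, "high": 0.6},
    "merchant": {"grocery": 0.1, "electronics": 0.6, "travel": 0.3},
}


def fit_user_net(history, alpha=1.0):
    """Estimate P(value | NF) per feature from genuine history (Laplace smoothing)."""
    net = {}
    for feature in FEATURES:
        counts = Counter(tx[feature] for tx in history)
        total = len(history) + alpha * len(VALUES[feature])
        net[feature] = {
            value: (counts[value] + alpha) / total for value in VALUES[feature]
        }
    return net


def likelihood(net, tx):
    """P(tx | hypothesis), treating features as independent (naive assumption)."""
    p = 1.0
    for feature in FEATURES:
        p *= net[feature][tx[feature]]
    return p


def fraud_posterior(tx, user_net, prior_fraud=0.01):
    """Apply Bayes' rule over the two hypotheses F and NF to get P(F | tx)."""
    p_f = likelihood(FRAUD_NET, tx) * prior_fraud
    p_nf = likelihood(user_net, tx) * (1.0 - prior_fraud)
    return p_f / (p_f + p_nf)


# Hypothetical genuine history: mostly small grocery purchases.
history = [{"amount": "low", "merchant": "grocery"}] * 18 + [
    {"amount": "medium", "merchant": "travel"}
] * 2
user_net = fit_user_net(history)

typical = fraud_posterior({"amount": "low", "merchant": "grocery"}, user_net)
unusual = fraud_posterior({"amount": "high", "merchant": "electronics"}, user_net)
print(f"typical: {typical:.4f}, unusual: {unusual:.4f}")
```

In this sketch a transaction that matches the user's history receives a fraud probability near zero, while one that the user net has rarely seen and the fraud net favors scores much higher. Retraining `fit_user_net` on newer genuine transactions corresponds to adapting the user net during operation.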
Genetic Algorithm [2] - Genetic algorithms, inspired by natural evolution, were first introduced by Holland (1975). Genetic algorithms are evolutionary algorithms which provide better solutions as time progresses. Fraud detection has usually been in the domain of E-commerce data mining [10]. GA is used in data mining mainly for variable selection [11] and is mostly coupled with other DM algorithms. Its combination with other techniques performs very well. GA is used in credit card fraud detection to reduce the number of wrongly classified transactions. It is also easy to implement in computer programming languages, which makes it strong in credit card fraud detection. This method has high performance but is quite expensive.

A Fusion Approach Using Dempster-Shafer Theory and Bayesian Learning [2] - This approach proposes a Fraud Detection System (FDS) using information fusion and Bayesian learning, in which evidence from both current and past behavior is combined and, depending on certain types of shopping behavior, an activity profile is established for every cardholder.

The advantages are high accuracy, high processing speed, reduced false alarms, improved detection rate and applicability in E-commerce. The only disadvantage of this approach is that it is highly expensive. The FDS consists of four components, namely a rule-based filter, a Dempster-Shafer adder, a transaction history database and a Bayesian learner. The transaction is classified as normal, abnormal or suspicious depending on this initial belief. Once a transaction is found to be suspicious, belief is strengthened or weakened by comparing it with fraudulent or genuine transaction history.

Decision Tree [2] - Decision trees are a statistical data mining technique that expresses independent attributes and a dependent attribute, logically ANDed, in a tree-shaped structure. The classification rules extracted from decision trees are IF-THEN expressions, and all the tests have to succeed for each rule to be generated. A decision tree usually separates a complex problem into many simple ones and resolves the sub-problems through repeated use [11]. Decision trees are predictive decision support tools which create a mapping from observations. Decision tree methods include C5.0, C&RT and CHAID. Applying data mining techniques including decision trees and SVMs to the credit card fraud detection problem is useful in reducing the bank's risk.

Neural Network [2] - Fraud detection methods based on neural networks are popular. An artificial neural network [12] consists of an interconnected group of artificial neurons. The principle of the neural network is motivated by the functions of the brain, especially pattern recognition and associative memory [13]. The neural network identifies similar patterns and predicts future values or events based upon the associative memory of the learned patterns. It is applied in classification and clustering. The advantage of neural networks over other techniques is that the model learns from the past and thus improves results as time passes. They can also extract rules and predict future activity based on the current situation.

The two phases of a neural network are training and recognition. Learning in a neural network is called training. The NN training methods are supervised and unsupervised. In supervised training, samples of both fraudulent and non-fraudulent records are taken to create models. Unsupervised training simply seeks those transactions which are most different from the normal ones; unsupervised techniques do not need previous knowledge of fraudulent and non-fraudulent transactions in the database. NNs are best for large transaction datasets.

Logistic Regression [2] - Two data mining approaches, support vector machines and random forests, are used together with the well-known logistic regression as part of an attempt to detect credit card fraud. Logistic regression is well understood, easy to use, and among the most commonly used techniques for data mining. Thus it provides a useful baseline for comparing the performance of newer methods.

Supervised learning methods for fraud detection face two challenges. They are:

1. The unbalanced class sizes of legitimate and fraudulent transactions, with legitimate transactions far outnumbering fraudulent ones.
2. The need to develop supervised models when potentially undetected fraud transactions lead to mislabeled cases in the data used for building the model.

For the purpose of the above problems, the fraudulent transactions are those specifically identified by the institutional auditors as those that caused an unlawful transfer of funds from the bank sponsoring the credit cards. These transactions were observed to be fraudulent ex post. The study is based on real-life data of transactions from an international credit card operation.

4. ANALYSIS OF EXISTING TECHNIQUES
Srivastava et al. [1] implemented a model to show the sequence of the credit card transaction process and present experimental results which show the effectiveness of the system and demonstrate the usefulness of learning the spending profile of cardholders. Comparative studies reveal that the accuracy of the system is close to 80 percent over a wide variation in the input data. Accuracy represents the fraction of the total number of transactions (both genuine and fraudulent) that have been detected correctly. The system is also scalable for handling large volumes of transactions.

Suman and Nutan [2] presented a survey of current techniques used in credit card fraud detection and telecommunication fraud. The paper provides a comprehensive review of different techniques to detect fraud. The types of fraud covered include credit card fraud, telecommunication fraud, computer intrusion, bankruptcy fraud, theft fraud/counterfeit fraud, application fraud and behavioral fraud. Gass algorithm, Bayesian networks, hidden Markov model, genetic algorithm, a fusion approach using Dempster-Shafer theory and Bayesian learning, decision tree, neural network and logistic regression techniques are explained for detecting credit card fraud. One aim of the paper is to identify the user model that best identifies fraud cases.

Delamaire et al. [3] identified the different types of credit card fraud, such as bankruptcy fraud, counterfeit fraud, theft fraud, application fraud and behavioral fraud, and reviewed alternative techniques that include pair-wise matching, decision trees, clustering techniques, neural networks and genetic algorithms. They also state the problems that have been faced by banks and credit card companies. The next step in this research program is to focus on the implementation of a 'suspicious' scorecard on a real data set and its evaluation. The main tasks should be to build scoring models to predict fraudulent behavior, taking into account the fields of behavior that should be related to the different types of credit card fraud identified in this paper, and to evaluate the
associated ethical implications. The plan is to take one of the European countries, probably Germany, and then to extend the research to other EU countries.

Phua et al. [4] proposed an innovative fraud detection method, built upon existing fraud detection research and Minority Report, to deal with the data mining problem of skewed data distributions. For the experiment, Angoss Knowledge Seeker software is used. In this paper, success rates X outperformed all the averaged success rates W by at least 10% on evaluation sets. When applied on the score set, bagged success rates Z performed marginally better than the averaged success rates Y. The future work is to make one classifier more appropriate than another.

Esakkiraj and Chidambaram [5] designed a predictive model of the sequence of operations in online transactions using a hidden Markov model (HMM), which decides whether the user acts as a normal user or a fraudulent user. In the trained system, a new transaction is evaluated with the transition and observation probabilities. Depending upon the observation probability, the system finds the acceptance probability and decides whether the transaction should be declined or not. Normally, existing fraud detection systems for online banking detect the fraudulent transaction after completion of the transaction. This causes economic loss and makes the bank appear insecure. The model predicts fraud during the transaction and prevents the money transfer. As future work, some effective classification algorithms could be used instead of clustering to improve the prediction.

Sahin and Duman [6] used seven classification methods based on decision tree algorithms and SVMs to build fraud detection models for the improvement of financial transaction systems. This work demonstrates the advantages of applying data mining techniques including decision trees and SVMs to the credit card fraud detection problem with a real data set. In this study, the performance of classifier models built by using the well-known decision tree methods C5.0, C&RT and CHAID and a number of different SVM methods (SVM with polynomial, sigmoid, linear and RBF kernel functions) is compared. When the performances of the models are compared with respect to accuracy, it is seen that as the amount of training data increases, the overfitting behavior becomes less remarkable and the performance of the SVM-based models becomes comparable to that of the decision-tree-based models. But the number of frauds caught by the SVM models is less than that of the decision tree models, especially the C&RT model. Though the C5.0 model is the champion over the other models with respect to accuracy for each sample, the C&RT model catches the largest number of frauds. So the C&RT and C5.0 models are chosen as the final methods to build the prediction model. As future work, other data mining algorithms such as different versions of Artificial Neural Networks (ANN) and logistic regression will be used to build new classification models on the same real-world dataset, and the performance of the new models will be compared with the performance of the models given in this paper.

Bolton and Hand [7] explained the two categories: behavioral fraud and application fraud. Their paper is concerned with detecting behavioral fraud through the analysis of longitudinal data. Two methods for unsupervised fraud detection in credit cards are discussed and applied to some real data sets: peer group analysis, which is a new tool for monitoring behavior over time in data mining situations, followed by break point analysis. The paper describes an implementation of PGA to detect changes in credit card account spending behavior and illustrates its propensity to detect outliers through a simulation study. An example of credit card spending in 858 accounts over a 52-week period, with the total spending recorded per week, is shown, and PGA can detect that the spending for these weeks is unusual amongst accounts that have similar spending trends.

Ferdousi and Maeda [8] presented the problem of finding outliers in time series financial data using Peer Group Analysis (PGA), which is an unsupervised technique for fraud detection. It can be observed that PGA can detect those brokers who suddenly start selling stock in a different way from other brokers to whom they were previously similar. The experiment is conducted with the PGA tool in an unsupervised setting over stock market data sets with continuous values over regular time intervals. The experimental results, shown through graphical plots, indicate that peer group analysis can be useful in detecting observations that deviate from their peers. The t-statistic is also applied to find the deviations effectively. The future work is to integrate some other effective methods with PGA and also apply this strategy to other applications, such as banking fraud detection.

Mishra et al. [9] presented the necessary theory to detect fraud in credit card transaction processing using a Hidden Markov Model and show how the model is used for the detection of fraud. If an incoming credit card transaction is not accepted by the HMM with sufficiently high probability, it is considered to be fraudulent. At the same time, they try to ensure that genuine transactions are not rejected. Different ranges of transaction amount have been used as the observation symbols, whereas the types of items have been considered to be the states of the HMM. A method for finding the spending profile of cardholders, as well as the application of this knowledge in deciding the observation symbols, is also suggested. It is also explained how the HMM can detect whether an incoming transaction is fraudulent or not and, if it is found to be fraudulent, how the user is notified instantly regarding the fraud. In the proposed model more than 85% of transactions are genuine and false alarms are very low, at about 8% of the total number of transactions. Comparative studies reveal that the accuracy of the system is close to 82% over a wide range of input data.

RamaKalyani and UmaDevi [10] proposed a credit card fraud detection system using a genetic algorithm. The aim is to develop a method of generating test data and to detect fraudulent transactions by using the genetic algorithm. This algorithm is an optimization technique and evolutionary search based on the principles of genetics and natural selection, a heuristic used to solve computational problems of high complexity. The algorithm is applied to a bank credit card fraud detection system so that the probability of fraudulent transactions can be predicted by the banks soon after credit card transactions.

Chang et al. [11] proposed a new learning methodology towards developing a novel intrusion detection system (IDS) using back propagation neural networks (BPN) with sample-query and attribute-query. In this paper, a combination of data reduction and classification with a query-based learning methodology is used because it is less time consuming. The experiment showed that the training time of the proposed method is 1447 seconds, whereas the training of BPN takes over 21746 seconds. The future work is to extend the concept of BPN to develop more learning methods for more real-world applications.

Patidar and Sharma [11] used a neural network along with a genetic algorithm to detect fraudulent transactions. For the
learning purpose of the artificial neural network, the supervised feed-forward back propagation algorithm is used. In this paper, BPN is used for training, and a genetic algorithm is then used to choose those parameters (weight, network type, number of layers, number of nodes etc.) that play an important role in making the neural network perform as accurately as possible. Using this combined Genetic Algorithm and Neural Network (GANN), detection of credit card fraud is attempted successfully. The future work is to design a system that may control credit card fraud before any real transaction is made.

Subashini and Chitra [13] built classifier models, i.e. C5.0 and CART, from five classification methods: decision tree, SVMs using the SMO algorithm with polynomial kernel functions, logistic regression and Bayes Net, for detecting fraud in the banking sector using a credit card fraud data set. The legitimate user is denoted by Good and the fraudulent user is denoted by Bad. C5.0 using J48, SVM using SMO and Bayes Net give a success rate of 72.4%, whereas the Bad to Good classification is higher in SVM using SMO; this matters because classifying a Bad customer as Good is worse than classifying a Good customer as Bad. The logistic regression method provides a success rate of 73.1% and CART gets the highest success rate, 74.1%. Therefore, judged on success rate, CART outperforms the other models, whereas considering the Bad to Good classification, J48 shows better performance. Hence, while classifying customers, different classification methods should be used to make the correct decision about a customer.

Phua et al. [14] categorizes, compares and explores almost all published technical and review articles in automated fraud detection. The paper defines the professional fraudster, types and subtypes of fraud, the technical nature of data, performance

5. CONCLUSION AND FUTURE WORK
Credit card fraud has become more and more rampant in recent years. Fraud detection methods are continuously developed to defend against criminals who adapt their strategies. Identifying fraud as quickly as possible once it has been committed is now becoming easier and faster through fraud detection techniques. With the techniques studied here, credit card fraud can be detected quickly and the crime can be stopped.

The future work is to design an improved technique which will be much better than the available techniques.

6. ACKNOWLEDGMENT
We would like to take this opportunity to thank Dr. Ashok K. Chauhan, Founder President, Amity University, for providing the necessary support and infrastructure to carry out the research work. We would like to thank Mr. Aditya Shastri, Vice Chancellor, Banasthali University, who has rendered continuous help and support to us.

7. REFERENCES
[1] A. Srivastava, A. Kundu, S. Sural, and A. K. Majumdar, "Credit card fraud detection using hidden Markov model", IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, January-March 2008.
[2] Suman and Nutan, "Review paper on credit card fraud detection", International Journal of Computer Trends and Technology (IJCTT), volume 4, issue 7, July 2013.
[3] L. Delamaire, H. Abdou and J. Pointon, "Credit card fraud and detection techniques: a review", Banks and Bank Systems, volume 4, issue 2, 2009.
[4] Phua, D. Alahakoon and V. Lee, "Minority report in fraud detection: classification of skewed data," ACM SIGKDD