0% found this document useful (0 votes)

35 views36 pages

A Meta-Learning Framework For Detecting Financial Fraud

A Meta-Learning Framework for Detecting Financial Fraud

Uploaded by

bhadraashok03

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

35 views36 pages

A Meta-Learning Framework For Detecting Financial Fraud

A Meta-Learning Framework for Detecting Financial Fraud

Uploaded by

bhadraashok03

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 36

MetaFraud: A Meta-Learning Framework for Detecting Financial Fraud

Author(s): Ahmed Abbasi, Conan Albrecht, Anthony Vance and James Hansen
Source: MIS Quarterly , December 2012, Vol. 36, No. 4 (December 2012), pp. 1293-1327
Published by: Management Information Systems Research Center, University of
Minnesota

Stable URL: https://www.jstor.org/stable/41703508

REFERENCES
Linked references are available on JSTOR for this article:
https://www.jstor.org/stable/41703508?seq=1&cid=pdf-
reference#references_tab_contents
You may need to log in to JSTOR to access the linked references.

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Management Information Systems Research Center, University of Minnesota is collaborating

with JSTOR to digitize, preserve and extend access to MIS Quarterly

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Quarterly -
MetaFraud: A Meta-Learning Framework for
Detecting Financial Fraud1
Ahmed Abbasi
Mclntire School of Commerce, University of Virginia,
Charlottesville, VA 22908 U.S.A. {[email protected]}

Conan Albrecht, Anthony Vance, and James Hansen

Information Systems Department, Marriott School of Management, Brigham Young University,
Provo, UT 84606 U.S.A. {[email protected]} {[email protected]} {[email protected]}

Financial fraud can have serious ramifications for the long-term sustainability of an organization , as well as
adverse effects on its employees and investors , and on the economy as a whole. Several of the largest
bankruptcies in U.S. history involved firms that engaged in major fraud. Accordingly , there has been
considerable emphasis on the development of automated approaches for detecting financial fraud. However ,
most methods have yielded performance results that are less than ideal. In consequence , financial fraud
detection continues as an important challenge for business intelligence technologies.

In light of the need for more robust identification methods , we use a design science approach to develop
MetaFraud, a novel meta-learning framework for enhanced financial fraud detection. To evaluate the pro-
posed framework, a series of experiments are conducted on a test bed encompassing thousands of legitimate
and fraudulent firms. The results reveal that each component of the framework significantly contributes to its
overall effectiveness. Additional experiments demonstrate the effectiveness of the meta-learning framework
over state-of-the-art financial fraud detection methods. Moreover, the MetaFraud framework generates
confidence scores associated with each prediction that can facilitate unprecedented financial fraud detection
performance and serve as a useful decision-making aid. The results have important implications for several
stakeholder groups, including compliance officers, investors, audit firms, and regulators.

Keywords: Fraud detection, financial statement fraud, feature construction, meta-learning, business
intelligence, design science

Introduction several high-profile frauds have been uncovered. These

frauds have had a significant negative impact on the economy
Financial fraud has important implications for investors,and stock markets worldwide, and have resulted in a general
regulators, auditors, and the general public. In recent years,
loss of confidence in the integrity of businesses (Albrecht,
Albrecht, and Albrecht 2008; Carson 2003). For instance,
major frauds at Enron, WorldCom, and several other firms
^sinchun Chen, Roger Chiang, and Veda Storey were the accepting senior
were the principal catalyst of a 78 percent drop in the
editors for this paper. Gediminas Adomavicius served as the associate
editor. NASDAQ Index between 2000 and 2002. Despite the
resulting changes to basic accounting and internal control pro-
cedures, the problem has not abated, as the number of known
The appendix for this paper is located in the "Online Supplements" section
of the MIS Quarterly's website (http://www.misq.org).

MIS Quarterly Vol. 36 No. 4, pp. 1293-1327 /December 2012 1293

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

Company Assets (Billions) When Filed Fraud Involved?

1. Lehman Brothers, Inc. $691.0 September 2008 Yes
2. Washington Mutual, Inc. $327.9 September 2008 Not yet determined
3. WorldCom, Inc. $103.9 July 2002 Yes
4. General Motors Corp. $91 .0 June 2009 Not yet determined
5. CIT Group, Inc. $80.4 November 2009 Not yet determined
6 Enron Corp. $65.5 December 2001 Yes
7. Conseco, Inc. $61.4 December 2002 Yes
8. Chrysler, LLC $39.3 April 2009 Not yet determined
9. Thornburg Mortgage, Inc. $36.5 May 2009 Not yet determined
10. Pacific Gas & Electric Co. $36.2 April 2001 No

Source: Bankruptcydata.com (2010)

frauds in the United States has increased considerably

financial in the investors, audit firms,
fraud detection capabilities:
past 10 years (Dechow et al. 201 1). According
and government toregulators
the Asso-(Cecchini et al. 2010; Lin et al.
ciation of Certified Fraud Examiners 2003).(2010),
Investors -the typical
a term we use broadly to include indi-
organization loses an estimated 5 percent
viduals, of its total
investment firms,annual
rating agencies, and others - often
revenue to fraud. Moreover, many of havethe
littlefraud schemes
inside information on potential fraud risk. Fraud
detection
responsible for these losses span several tools could
years, allow investors to make better informed
underscoring
decisions (Albrecht,
the ineffectiveness of existing fraud prevention Albrecht, and Dunn 2001). The second
mechanisms.
group, audit firms, could benefit by using a fraud risk assess-
Financial fraud has also contributed ment
to the bankruptcy
during of period, as well as during
the client acceptance
routine
major organizations. Table 1 shows that 4 audits
of the (Albrecht, Albrecht, and Albrecht 2008;
10 largest
bankruptcies in U.S. history were Cecchini
associated
et al. 20 1with major
0). Finally, enhanced fraud detection tools
could
financial frauds (in bold). Three other help a third
major group, government regulators. Since
bankruptcies,
Refco, Global Crossing, and Adelphia (the 13th,
regulators are 15th,
typically and 21st
limited in terms of their available time
and resources,
largest bankruptcies), were also associated effective
with fraud detection tools could allow
financial
frauds. There has not been sufficient time to know whether them to better prioritize and focus their investigatory efforts
some of the other recent bankruptcies involved fraudulent (Albrecht, Albrecht, and Dunn 2001; Dechow et al. 201 1).
activities.

Despite these needs, existing methods for financial fraud

Moreover, the problem appears to have global reach. For detection have been unable to provide adequate fraud detec-
example, major incidents in Australia (Harris Scarfe and tion capabilities, with most studies on U.S. firms attaining
HIH), Italy (Parmalat), France (Vivendi), the Netherlands detection rates of less than 70 percent (Cecchini et al. 2010;
(Royal Ahold), Korea, (SKGlobal), China (YGX), Japan Green and Choi 1 997; Lin et al. 2003). Moreover, many prior
(Livedoor Co.), and India (Satyam) indicate that financial studies utilized internal (i.e., non-public) data, which is more
fraud is occurring not just in the United States, but throughout costly and time-consuming to acquire, and is generally
the world (Albrecht, Albrecht, and Albrecht 2008). The unavailable to many of the aforementioned stakeholders
Association of Certified Fraud Examiners estimates that every (Cecchini et al. 2010). For instance, even audit firms and
year, worldwide financial fraud losses exceed a trillion regulators do not have access to internal data until an audit/
dollars. This estimate is based on hundreds of documented investigation is already taking place (Albrecht, Albrecht, and
financial fraud cases every year in Europe, Asia, Africa,
Albrecht 2008). A need remains for techniques capable of
South America, and Oceania, coupled with the reality that
providing enhanced detection of financial fraud using publicly
many frauds (and resulting losses) go undetected. available information. While financial fraud can never be
confirmed without a full investigation, advances in fraud
In addition to the large-scale economic implications, three
detection methods may provide red flags that warn stake-
stakeholders in particular stand to benefit from enhanced holders of the likelihood of fraud (Bay et al. 2006).

1 294 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Recent developments in business intelligence (BI) tech- enhance fraud detection capabilities over existing techniques.
nologies have elevated the potential for discovering patterns Further, we provide (3) a confidence-level measure that iden-
associated with complex problem domains (Watson and tified a large subset of fraud cases at over 90 percent legiti-
Wixom 2007), such as fraud (Bolton and Hand 2002). mate and fraud recall, making fraud detection using public
Broadly, BI technologies facilitate historical, current, and information practicable for various stakeholders.
predictive views of business operations (Shmueli et al. 2007),
and may suggest innovative and robust methods for predicting The remainder of this paper is organized as follows. The next
the occurrence of fraud (Anderson-Lehman et al. 2004). In section reviews previous efforts to identify financial fraud,
fact, fraud detection is recognized as an important application and shows the need for more robust methods. The subsequent
area for predictive BI technologies (Brachman et al. 1996; section introduces the meta-learning kernel theory, and
Michalewicz et al. 2007). Since BI tools facilitate an im- describes its usefulness in the context of financial fraud. It
proved understanding of organizations' internal and external also introduces six hypotheses by which the design artifact
environments (Chung et al. 2005), enhanced financial fraud was evaluated. The design of the MetaFraud framework is
detection methods could greatly benefit the aforementioned then outlined, and details of the financial measures derived
stakeholder groups: investors, audit firms, and regulators. from publicly available information used to detect fraud are
The research objective of this study is to develop a business presented. The fifth section describes the experiments used
intelligence framework for detecting financial fraud using to evaluate the MetaFraud framework, and reports the results
publicly available information with demonstratively better of the hypothesis testing. A discussion of the experimental
performance than that achieved by previous efforts. results and their implications follows. Finally, we offer our
conclusions.

To achieve this objective, we adopted the design science para-

digm to guide the development of the IT artifact, the
MetaFraud framework (Hevner et al. 2004). In doing so, we
selected meta-learning as the kernel theory to guide the design Literature Review ^ i
of the IT artifact. Meta-learning is a specialized form of
machine learning that is able to learn about the learning While prior financial fraud det
process itself to increase the quality of results obtained sionally utilized internal data (e.g
(Brázdil et al. 2008). This ability was especially useful in our ships, personal and behavioral chara
context because of the complexities and nuances associated overrides), most recent studies reco
with fraud detection (Abbasi et al. 20 1 0; Virdhagriswaran and problematic for most stakehold
Dakin 2006). The MetaFraud framework encompasses four sources are not readily available t
components that advocate the use of a rich set of measures regulators (Albrecht, Albrecht, an
derived from publicly available financial statements, coupled et al. 2010; Dechow et al. 201 1). A
with robust classification mechanisms. We rigorously eval- discussion of prior financial fr
uated our framework in a series of experiments that demon- studies that have used publicly ava
strate the utility of each individual component of the frame- mation. These studies have genera
work, as well as the framework's overall effectiveness in from public financial statements i
comparison to existing state-of-the-art financial fraud detec- tical and/or machine learning-bas
tion methods. A summary of related fraud detect
data is presented in Table 2, which
The research contribution of this paper is the MetaFraud year of publication, a brief descript
framework, which demonstrates that BI techniques (Shmueli method used, the data set, and th
et al. 2010; Vercellis 2009) based on meta-learning can be fication accuracy and fraud detect
integrated via a design science artifact to detect financial see that the time line of prior stu
fraud with substantially higher performance than that obtained from Persons (1995) to Dikmen an
by previous research. Important aspects of the architecture which underscores the continuing
and process of MetaFraud include ( 1 ) robust feature construc- fraud detection to both academics
tion, which includes organizational and industry contextual
information and the use of quarterly and annual data, and (2) a With respect to the feature sets
method of fusing stacked generalization, semi-supervised varyingly employed data taken from
learning, and adaptive/active learning. We demonstrate in the Most studies used 8 to 10 financi
paper how these two facets of the framework substantially 1999a; Green and Choi 1997; Ki

MIS Quarterly Vol. 36 No. 4/

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

Annual Statement-

Study based Feature Set Classification Method(s) Data Set Results

Persons (1995) 10 financial measures Logistic regression 200 firm-years; Overall: 71.5%;
from previous year 100 fraud, Fraud: 64.0%
100 non-fraud

Green and Choi 5 financial and 3 Neural net 95 firm-years; Overall: 71.7%;
(1997) accounting measures 46 fraud, Fraud: 68.4%
49 non-fraud

Fanning and Cogger 26 financial and 36 Discriminant analysis, 204 firm-years; Overall: 63.0%;
(1998) accounting measures Logistic regression, Neural 102 fraud, Fraud: 66.0%
net 102 non-fraud

Summersand 6 financial measures Logistic regression 102 firm-years; Overall: 59.8%;

Sweeney (1998) 51 fraud, Fraud: 67.8%
51 non-fraud

Beneish (1999a) 8 financial measures Probit regression 2,406 firm-years; Overall: 89.5%;
74 fraud, Fraud: 54.2%
2,332 non-fraud
Spathis (2002)a 10 financial measures Logistic regression 76 firm-years; Overall: 84.2%;
38 fraud, Fraud: 84.2%
38 non-fraud

Spathis et al. (2002)a 10 financial measures Logistic regression, 76 firm-years; Overall: 75.4%;
UTADIS 38 fraud, Fraud: 64.3%
38 non-fraud

Lin et al. (2003) 6 financial and 2 Logistic regression, 200 firm-years; Overall: 76.0%;
accounting measures Neural net 40 fraud, Fraud: 35.0%
160 non-fraud

Kaminskietal. 21 financial measures Discriminant analysis 158 firm-years; Overall: 53.8%;

(2004) 79 fraud, Fraud: 21.7%
79 non-fraud

Kirkos et al. (2007)a 10 financial measures Bayesian net, 76 firm-years; Overall: 90.3%;
ID3 decision tree, Neural 38 fraud, Fraud: 91.7%
net 38 non-fraud

Gaganis (2009)a 7 financial measures Discriminant analysis, 398 firm-years; Overall: 87.2%;
Logistic regression 199 fraud, Fraud: 87.8%
Nearest neighbor, Neural 199 non-fraud

Cecchini et al. (2010) 23 financial variables SVM using custom 3,319 firm-years; Overall: 90.4%
used to generate ratios financial kernel 132 fraud, Fraud: 80.0%
3,187 non-fraud
Dikmen and 10 financial measures Three-phase cutting plane 126 firm-years; Overall: 67.0%;
Kûçûkkocaoglu algorithm 17 fraud, Fraud: 81.3%
(2010)b
Dechow et al. (201 1) 7 financial measures Logistic regression 79,651 firm-years; Overall: 63.7%;
293 fraud Fraud: 68.6%
79,358 non-fraud

aData taken from Greek firms. bData taken from Turkish firms.

1296 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

1995; Spathis 2002; Summers and Sweeney 1998). While fraud classification methods. The MetaFraud framework and
most prior studies using larger feature sets did not attain good its components are discussed in detail in the following
results (e.g., Fanning and Cogger 1997; Kaminski et al. 2004), sections.
Cecchini et al. (2010) had greater success using 23 seed
financial variables to automatically generate a large set of
financial ratios.
Using Meta-Learning as a Kernel Theory
The most commonly used classification methods were logistic for Financial Fraud Detection
regression, neural networks, and discriminant analysis, while
decision trees, Bayesian networks, and support vector mach- As mentioned in the previous section, and evidenced by
ines (SVM) have been applied in more recent studies Table 2, prior studies using data from U.S. firms have gener-
(Cecchini et al. 2010; Gaganis 2009; Kirkos et al. 2007). The ally attained inadequate results, with fraud detection rates
number of fraud firms in the data sets ranged from 38 firms to typically less than 70 percent. These results have caused
293 firms. Most studies used a pair-wise approach in which some to suggest that data based on financial statements is
the number of non-fraud firms was matched with the number incapable of accurately identifying financial fraud (at least in
of fraud firms (Fanning and Cogger 1998; Gaganis 2009; the context of U.S. firms). In one of the more recent studies
Kirkos et al. 2007; Persons 1995; Spathis 2002; Summers and on U.S. firms, Kaminski et al. (2004, p. 17) attained results
Sweeney 1998). However, a few studies had significantly only slightly better than chance, causing the authors to state,
larger sets of non-fraud firms (Beneish 1999a; Cecchini et al. "These results provide empirical evidence of the limited
2010; Dechow et al. 201 1). ability of financial ratios to detect fraudulent financial
reporting." A more recent study conducted by researchers at
In terms of results, the best performance values were achieved Pricewaterhouse Coopers attained fraud detection rates of 64
by Cecchini et al. (2010), Gaganis (2009), Kirkos et al. percent or lower (Bay et al. 2006). The limited performance
(2007), and Spathis (2002). Only these four studies had over- of these and other previous studies suggests that the financial
all accuracies and fraud detection rates of more than 80 measures and classification methods employed were insuffi-
percent. The latter three were all conducted using a datacient.
set Prior studies have generally relied on 8 to 10 financial
composed of Greek firms (mostly in the manufacturing
ratios, coupled with classifiers such as logistic regression or
neural networks. In light of these deficiencies, more robust
sector), which were governed by Greek legislation and Athens
Stock Exchange regulations regarding what is classifiedapproaches
as for detecting financial fraud are needed (Cecchini
et al. 2010).
unusual behavior by a firm. Given the differences in auditing
and reporting standards between Greece and the United
Design
States, as well as the larger international community (Barth et science is a robust paradigm that provides concrete
al. 2008), it is unclear how well those methods generalize/
prescriptions for the development of IT artifacts, including
translate to other settings. Cecchini et al. used an SVMconstructs, models, methods, and instantiations (March and
Smith 1995). In the design science paradigm, "Methods
classifier that incorporated a custom financial kernel. The
define processes. They provide guidance on how to solve
financial kernel was a graph kernel that used input financial
variables to implicitly derive numerous financial ratios. Their
problems, that is, how to search the solution space" (Hevner
approach attained a fraud detection rate of 80 percent on a et
25 al. 2004, p. 79). Several prior studies have utilized a design
fraud-firm test set. science approach to develop BI technologies encompassing
methods and instantiations (Abbasi and Chen 2008a; Chung
With respect to the remaining studies, Beneish (1999a) et
at-al. 2005). Accordingly, we were motivated to develop a
tained an overall accuracy of 89.5 percent, but this was
framework for enhanced financial fraud detection (i.e., a
primarily due to good performance on non-fraud firms, method).
whereas the fraud detection rate was 54.2 percent. With the
exception of Cecchini et al. no other prior study on U.S. firms
When creating IT artifacts in the absence of sufficient design
has attained a fraud detection rate of more than 70 percent.guidelines, many studies have emphasized the need for design
theories to help govern the development process (Abbasi and
Our analysis of these studies motivated several refinements
Chen 2008a; Markus et al. 2002; Storey et al. 2008; Walls et
al. 1992). We used meta-learning as a kernel theory to guide
that are incorporated in our meta-learning framework, namely
(1) the inclusion of organizational and industry-level context
the development of the proposed financial fraud detection
information, (2) the utilization of quarterly and annual
framework (Brázdil et al. 2008). In the remainder of this
statement-based data, and (3) the adoption of more robust
section, we present an overview of meta-learning and discuss

MIS Quarterly Vol. 36 No. 4/December 2012 1297

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./A Meta-Learning Framework for Detecting Financial Fraud

how meta-learning concepts can be utilized to address the bias specifies the representation of the space of hypotheses;
aforementioned research gaps, resulting in enhanced financial it is governed by the quantity and type of attributes
fraud detection capabilities. Testable research hypotheses are incorporated (i.e., the feature space). Procedural bias pertains
also presented. A framework based on meta-learning for to the manner in which classifiers impose constraints on the
financial fraud detection is then described and evaluated. ordering of the inductive hypotheses.

An effective meta-learning strategy dynamically identifies

Meta-Learnlng appropriate levels of declarative and procedural bias for a
given classification task (Vilalta and Drissi 2002). Declara-
Meta-learning is a specialized form of machine learning that tive bias can be manipulated by altering the feature space (i.e.,
uses the expertise acquired through machine learning or data via expansion or contraction), while procedural bias can be
mining processes to increase the quality of results obtained in improved by selecting a suitable predictive model or
future applications (Brázdil et al. 2008). While machine combination of models (Giraud-Carrier et al. 2004; Vilalta
learning provides a multitude of algorithms to complete a task and Drissi 2002). In the remainder of this section, we
without offering guidance about which particular algorithms describe how declarative bias can be improved by supple-
to use in a given context, in contrast, meta-learning provides menting existing measures based on annual statements with
a way to learn about the learning process itself to obtain context-based features and quarterly data. We also discuss
knowledge about which underlying features and algorithms how procedural bias can be enhanced by using stacked
can be most efficiently applied (Brázdil et al. 2008). We posit generalization and adaptive learning.
that a meta-learning approach is especially appropriate for
financial fraud detection because of the complex, dynamic,
and adversarial nature of this problem domain (Virdhagris- Improving Declarative Bias by Incorporating
waran and Dakin 2006). Context-Based Features and Quarterly Data
Meta-learning was developed in the late 1980s and earlyPrior financial fraud detection studies have typically utilized
1990s by a number of researchers who sought to integrate feature sets composed of fewer than 30 financial measures
several machine learning strategies to enhance overall accu-
(with most using fewer than 10). Moreover, prior studies
racy (e.g., Wolpert 1992). The term meta-learning was have relied on measures derived exclusively from annual
coined by Chan and Stolfo (1993), who proposed a method
statements. Consequently, the feature sets incorporated
for combining the outputs of multiple machine-learning tech-
lacked representational richness and were simply not large
niques in a self-adaptive way to improve accuracy. The
enough to generate appropriate hypothesis spaces for the
method has since evolved into several active streams of
classification methods utilized. Meta-learning approaches for
research in a variety of application domains (Brázdil et al.
expanding the feature space in order to improve declarative
2008; Vilalta and Drissi 2002). Collectively, meta-learning
bias include feature construction (i.e., the use of seed features
provides an array of prescriptions for improving machine
to generate additional features) as well as the extraction of
learning capabilities pertaining to a particular problem task.
new and existing features from additional (parallel) data sets
(Brázdil et al. 2008; Vilalta et al. 2004). Both of these
In machine learning, learning bias refers to any preference for
approaches are discussed below.
choosing one hypothesis that explains the data over other
equally acceptable hypotheses (Mitchell 1997). Meta-
learning can be defined as the ability to learn from experience
when different biases are appropriate for a particular problem
Industry-Level and Organizational
Context-Based Features
(Brázdil et al. 2008; Rendell et al. 1987). Meta-learning,
therefore, differs from conventional or base-learning tech-
Financial fraud detection relies on identification of financial
niques in that it enriches the model hypothesis space, which
increases the likelihood of finding good models. Yet, the irregularities. Prior studies have used measures derived from
firms' financial statements ( Kaminski et al. 2004; Lin et al.
space itself is of fixed form: no dynamic selection of bias
takes place. A critical need in the BI area is to devise2003; Summers and Sweeney 1998). While there is strong
theoretical evidence justifying the use of financial measures
methods of bias selection that can adapt to changes that may
occur in the problem domain (Vilalta and Drissi 2000). (Beneish 1999a; Dikmen and Kûçiikkocaoglu 2010), the
manner in which they have been utilized merely provides a
At another level, two key facets of meta-learning are declara-non-holistic snapshot, without appropriate consideration of
tive and procedural bias (Brázdil et al. 2008). Declarative the context surrounding these measures. Two important types

1298 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./A Meta-Learning Framework for Detecting Financial Fraud

of contextual information generally omitted by previous H la: Combining yearly financial measures with organi-
studies are organizational and industry-level contexts. zational context features will outperform the use of
yearly financial measures alone.
Organizational context information can be derived by com-
paring a firm's financial performance relative to its perfor- Hl b : Combining yearly financial measures with industry-
mance in prior periods. Auditors commonly compare firms' level context features will outperform the use of
financial measures across consecutive time periods in order to yearly financial measures alone.
identify potential irregularities (Ameen and Strawser 1994;
Green and Choi 1997). Further, prior financial fraud detec- HI c: Combining yearly financial measures with industry-
level and organizational context features will
tion studies suggest that utilizing measures from the preceding
outperform the use of yearly financial measures
year can provide useful information (Cecchini et al. 2010;
alone.
Fanning and Cogger 1998; Persons 1995; Virdhagriswaran
and Dakin 2006). Consideration of data across multiple time
Hid : Combining quarterly financial measures with
periods can reveal organizational trends and anomalies that
organizational context features will outperform the
are often more insightful than information derived from
use of quarterly financial measures alone.
single-period snapshots (Chen and Du 2009; Coderre 1999;
Green and Calderon 1995; Kinney 1987). Hie: Combining quarterly financial measures with
industry-level context features will outperform the
Industry-level context information can be derived by com- use of quarterly financial measures alone.
paring a firm's financial performance relative to the perfor-
mance of its industry peers. Prior studies have found that H If Combining quarterly financial measures with
certain industries have financial statement irregularity patterns industry-level and organizational context features
that are unique and distinctly different from other industries will outperform the use of quarterly financial
(Maletta and Wright 1996). Beasley et al. (2000) analyzed measures alone.

financial fraud cases arising in three industries over a two-

decade period. They found that technology companies and
financial-service firms differed considerably in terms of their
Quarterly Statement-Based Features
values for accounting measures as well as the categories of
Most prior studies of financial fraud used only information
fraud that were pervasive in the two industries. These
based on annual statements (Yue et al. 2007). However,
findings suggest that industry-level context information could
using both quarterly and annual data can provide a more
be highly useful for automated financial fraud detection.
robust feature space for financial fraud detection (Albrecht,
Albrecht, and Albrecht 2004). On the practice side, Putra
Despite the potential of industry-level context information to
provided an extended argument for including an examination
aid in fraud detection, thus far this information has not been
of both quarterly and annual data in assessments of financial
utilized in fraud detection research. Fanning and Cogger statement fraud (Accounting Financial and Tax 2009). This
(1998) noted that industry-level context information repre- has been affirmed by several recent research studies. Dechow
sented an important and unexplored future research direction: et al. (2011) proposed an F-score designed to detect ac-
"Certain variables may become useful classifiers when ex- counting fraud from misrepresentations made on quarterly or
amined in a sample stratified by industry" (p. 37). Similarly, annual financial statements. Dull and Tegarden (2004) in-
Spathis (2002, p. 2 1 8) claimed that "Industry standing prob- cluded quarterly financial data in their development of a
ably would provide additional valuable information." In this control-chart approach to monitoring key financial processes.
study we extend and exploit relevant contextual information. Chai et al. (2006) used quarterly financial statements as the
basis for developing rankings of the degree to which firms
The use of financial measures without appropriate contextual match rules associated with financial fraud.

information results in parsimonious input feature spaces.

Accordingly, we posit that feature sets utilizing yearly or Albrecht, Albrecht, and Albrecht (2004) argued that annual
quarterly financial measures in conjunction with organiza- numbers are often too aggregated to reveal significant differ-

tional and industry-level context information will result in ences, and that concealment methods differ dramatically
between the end of the first three quarters and the end of the
improved financial fraud detection performance in terms of
year-end quarter. In their analysis, one large company that
overall accuracy and class-level f-measure, precision, and
recall. manipulated its financial statements used unsupported topside

MIS Quarterly Vol. 36 No. 4/December 2012 1299

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

entries - entries that remove the discrepancy between actual H2a : Combining yearly and quarterly statement-based
operating results and published financial reports - as the features will outperform the use of only yearly
major way to commit fraud at the end of the first three quar- features in terms of fraud detection performance.
ters, but harder-to-detect, sophisticated revenue and expense
frauds at the end of the year. Another company used topside H2b : Combining yearly and quarterly statement-based
entries at the end of the first three quarters but shifted losses features will outperform the use of only quarterly
and debt to unconsolidated, related entities at the end of the features in terms of fraud detection performance.
year.

As a specific example, consider the cash flow earnings differ- Improving Procedural Bias Using Stacked
ence ratio for the Enron fraud (Figure 1). Figure la shows Generalization and Adaptive Learning
this ratio for Enron and its industry model, over a two-year
period, using only annual numbers. While Enron's values are Prior financial fraud detection studies have used several dif-
slightly lower than the industry model's, the graph exhibits no ferent classification methods, with logistic regression, neural
recognizable pattern. Figure lb shows this ratio for Enron networks, and discriminant analysis being the most common.
and its industry model on a quarterly basis. Enron's figures However, no single classifier has emerged as a state-of-the-art
are primarily positive for the first three quarters, and then technique for detecting financial fraud (Fanning and Cogger
sharply negative in the fourth quarter. Throughout the first 1998; Gaganis 2009; Kirkos et al. 2007). Therefore, the need
three quarters, Enron's management was using various ac- remains for enhanced classification approaches capable of
counting manipulations to make their income statement look improving procedural bias. Meta-learning strategies for
better (Albrecht, Albrecht, and Albrecht 2004; Kuhn and enhancing procedural bias include stacked generalization and
Sutton 2006). At the end of the year, Enron's management adaptive learning (Brázdil et al. 2008). Stacked generaliza-
"corrected" the discrepancy by shifting those losses to off- tion involves the use of a top-level learner capable of
balance sheet, nonconsolidated special purpose entities (now effectively combining information from multiple base learners
called variable interest entities). This difference in manipula- (Vilalta and Drissi 2002; Wolpert 1992). Adaptive learning
tion methods between quarters is not apparent when analyzing entails constant relearning and adaptation (i.e., dynamic bias
annual data, but it shows up in the cash flow earnings differ- selection) to changes in the problem environment, including
ence ratio when used with quarterly data. concept drift (Brázdil et al. 2008).

The above evidence supports the argument that incorporating

different levels of information granularity is important in Stacked Generalization
detecting financial fraud. Accordingly, we believe that using
context-based features taken from both annual and quarterly Classifiers are often diverse with respect to their categori-
statements will improve financial fraud detection performance zation performance, patterns, and biases (Kuncheva and
over using features from either category alone. Whitaker 2003; Lynam and Cormack 2006). In addition,

1 300 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

classifiers frequently provide complementary information that H 3 a: When yearly context-based features are used ' stack
could be useful if exploited in unison (Tsoumakas et al. classifiers will outperform individual classifiers in
2005). Prior financial fraud detection studies utilizing terms of fraud detection performance.
multiple classification methods have also observed some
levels of noncorrelation between the classifiers' predictions, H3b: When quarterly context-based features are used,
even when overall accuracies were equivalent. For instance, stack classifiers will outperform individual
Fanning and Cogger (1998) attained the best overall accuracy classifiers in terms of fraud detection performance.
using a neural network, yet the fraud detection rates were con-
siderably higher (i.e., 12 percent) when using discriminant
analysis. While comparing logistic regression and neural Adaptive Learning
network classifiers, Lin et al. (2003) noted that the two classi-
fiers achieved somewhat comparable overall accuracies; While fraud lies behind many of the largest bankruptcies in
however, logistic regression had 1 1 percent better perfor- history, there are considerable differences between the types
mance on non-fraudulent firms, while the neural network of frauds committed and the specific obfuscation tactics
obtained 30 percent higher fraud detection rates. Similarly, employed by previous firms. For instance, the $104 billion
Gaganis (2009) observed equally good overall results using a WorldCom fraud utilized a fairly straightforward expense
UT ADIS scoring method and a neural network; however, the
capitalization scheme (Zekany et al. 2004). In contrast, the
respective false positive and false negative rates for the two
$65 billion Enron fraud was highly complex and quite unique;
methods were exact transpositions of one another. These
the use of special-purpose entities as well as various other
findings suggest that methods capable of conjunctively lever-
tactics made detection very difficult (Kuhn and Sutton 2006).
aging the strengths of divergent classifiers could yield
These examples illustrate how financial fraud cases can be
improved financial fraud detection performance.
strikingly different in terms of their complexity and nature.
Effective financial fraud detection requires methods capable
Stacked generalization (also referred to as stacking ) provides
of discovering fraud across a wide variety of industries,
a mechanism for harnessing the collective discriminatory
reporting styles, and fraud types over time.
power of an ensemble of heterogeneous classification
methods (Wolpert 1992). Stacking involves the use of a top-
Fraud detection is a complex, dynamic, and evolving problem
level classification model capable of learning from the
(Abbasi et al. 2010; Bolton and Hand 2002). Given the
predictions (and classification biases) of base-level models in
adversarial nature of fraud detection, the classification mech-
order to achieve greater classification power (Brázdil et al.
2008; Hansen and Nelson 2002; Ting and Witten 1997; anisms used need constant revision (Abbasi et al. 2010).
Wolpert 1992). As Sigletos et al. (2005, p. 1751) noted, "The Adaptive learning methods have the benefit of being able to
success of stacking arises from its ability to exploit the relearn, in either a supervised or semi-supervised capacity, as
diversity in the predictions of base-level classifiers and thus new examples become available (Brázdil et al. 2008; Fawcett
predicting with higher accuracy at the meta-level." This and Provost 1997). The ability to adaptively learn is espe-
ability to learn from underlying classifiers makes stacking cially useful in the context of financial fraud because the risk
more effective than individual classifier-based approaches or environment surrounding fraud is expected to change, making
alternate fusion strategies that typically combine base-level it more difficult to detect (Deloitte 2010). Virdhagriswaran
classifications using a simple scoring or voting scheme and Dakin (2006, p. 947) noted that adaptive learning could
(Abbasi and Chen 2009; Dzeroski et al. 2004; Hu and greatly improve fraud detection capabilities by "identifying
Tsoukalas 2003; Lynam and Cormack 2006; Sigletos et al. compensatory behavior" by fraudsters "trying to camouflage
2005). Consequently, stacking has been effectively utilized their activities." Similarly, Fawcett and Provost (1997, p. 5)
in related studies on insurance and credit card fraud detection, observed that "it is important that a fraud detection system
outperforming the use of individual classifiers (Chan et al. adapt easily to new conditions. It should be able to notice
1999; Phua et al. 2004). new patterns of fraud." An adaptive, learning-based classifier
that is aware of its changing environment and able to con-
Given the performance diversity associated with fraud detec- stantly retrain itself accordingly, should outperform its static
tion classifiers employed in prior research, the use of stacked counterpart.
generalization is expected to be highly beneficial, facilitating
enhanced financial fraud detection capabilities over those H4: The use of an adaptive learning mechanism, capable of
achieved by individual classifiers. We predict this perfor- relearning as new information becomes available, will
mance gain will be actualized irrespective of the specific outperform its static counterpart in terms of fraud
context-based feature set utilized.
detection performance.

MIS Quarterly Vol. 36 No. 4/December 2012 1301

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Collective Impact Attributable to Improving presented in the previous section, namely (1) the use of
Declarative and Procedural Bias organizational and industry contextual information, (2) the
use of quarterly and annual data, and the use of more robust
Based on the previous four hypotheses, we surmise that finan-
classification methods using (3) stacked generalization and
cial fraud detection methods incorporating meta-learning
(4) adaptive learning. In this section, we demonstrate how
principles pertaining to the improvement of declarative and
our meta-learning framework fulfills each of these require-
procedural bias are likely to provide enhanced discriminatory
ments to enhance financial fraud detection.
potential (Brázdil et al. 2008). Specifically, we expect that
the use of industry and organizational context information
The MetaFraud framework utilizes a rich feature set,
derived from both yearly and quarterly statements for declara-
numerous classification methods at the base and stack level,
tive bias improvement (H1-H2), coupled with and stacked
an adaptive learning algorithm. Each component of the
generalization and adaptive learning for procedural bias (shown in Figure 2) is intended to enhance
framework
improvement (H3-H4), will facilitate improvementsfinancial
in overall
fraud detection capabilities. Beginning with a set of
financial fraud detection capabilities. Accordingly,
yearly andwequarterly seed ratios, industry-level and organi-
hypothesized that, collectively, a meta-learning framework
zational context-based features are derived to create the yearly
that incorporates these principles will outperform existing
and quarterly feature sets (bottom of Figure 2). These feature
state-of-the art financial fraud detection methods.
sets are intended to improve declarative bias. The features are
used as inputs for the yearly and quarterly context-based
H5: A meta-learning framework that includes appropriate
classifiers. The classifications from these two categories of
provisions for improving declarative and procedural bias
classifiers are then used as input for a series of stack classi-
in concert will outperform existing methods in terms of
fiers. The adaptive, semi-supervised learning algorithm,
fraud detection performance.
shown at the top of Figure 2, uses the stack classifiers'
predictions to iteratively improve classification performance.
Prior studies have effectively used ensemble approaches in
The stack classifiers and adaptive learning algorithm are
concert with semi-supervised learning (Balean et al. 2005;
intended to improve procedural bias.
Ando and Zhang 2007; Zhou and Goldman 2004). For
instance, Zhou and Li (2005) markedly improved the per-
As a simple example, think of Stack Classifierl as an SVM,
formance of underlying classifiers on several test beds, in
which takes input from bottom (1) yearly context-based
various application domains, by using a three-classifier
classifiers and (2) quarterly context-based classifiers; such as
ensemble in a semi-supervised manner. It is, therefore, con-
SVM, J48, BayesNet, NaiveBayes etc. Stack_Classifier2
ceivable that such ensemble-based semi-supervised methods
might be a J48 classifier, which accepts inputs from the same
could also facilitate improved procedural bias for financial
bottom yearly and quarterly context-based classifiers: SVM,
fraud detection. However, given the reliance of such methods
J48,
on voting schemes across base classifiers (Balean et al.BayesNet,
2005; NaiveBayes etc. Output from the stack classi-
fiers is aggregated and input to the adaptive learner. The four
Zhou and Li 2005), we believe that ensemble semi-supervised
learning methods will underperform meta-learningcomponents
strategies of the framework are closely related to the
research
that harness the discriminatory potential of stacked hypotheses. Each of these components is explained
generali-
below.
zation and adaptive learning.

H6: A meta-learning framework that includes stacked gener-

Financial Fraud Detection Feature Sets
alization and adaptive learning will provide improved
procedural bias over existing ensemble-based semi-
supervised learning methods, resulting in enhanced
The framework uses two feature sets based on yearly and
financial fraud detection performance. quarterly information, respectively. The yearly feature set
uses 12 seed financial ratios derived from annual statements
to generate additional organizational and industry-level con-
text features. The quarterly feature set uses the quarterly
A Meta-Learning Framework for versions of the same 12 ratios, resulting in 48 seed measures,
Financial Fraud Detection to generate additional quarterly organizational and industry-
level context features. In the ensuing sections, we describe
the financial ratios utilized followed by a discussion of the
We propose a meta-learning framework for detecting financial
fraud to address the research gaps and related hypotheses
yearly and quarterly feature sets.

1302 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Financial Ratios 4. Days Sales in Receivables (DSIR): DSIR is the ratio of

day sales in receivables in period t to the corresponding
Our selection of the seed financial ratios was influenced by
measure in period t-1. Firms engaging in revenue fraud
prior financial fraud detection studies. These 12 ratios are
often add fictitious receivables, causing DSIR to increase
described below.
(Green and Choi 1997; Kaminski 2004; Lin 2003).

1. Asset Quality Index (AQI): AQI is the ratio of non- 5 . Depreciation Index (DEPI) : DEPI is the ratio of the rate
current assets other than property, plant, and equipment, of depreciation in period t-1 as compared to period t.
to total assets, for time period t relative to time period t-1. Fictitious assets accelerate the depreciation rate, resulting
An AQI greater than 1 indicates that the firm has in smaller values for DEPI (Beneish 1999a; Cecchini et
potentially increased its involvement in cost deferral, a
al. 2010; Dikman and Kûçiikkocaoglu 2010).
possible indicator of asset overstatement fraud (Beneish
1999a; Dikman and Kûçiikkocaoglu 2010).
6. Gross Margin Index (GMI): GMI is the ratio of the gross
margin in period t-1 to the gross margin in period t. A
2. Asset Turnover (AT): AT is the ratio of net sales to total
GMI greater than 1 suggests that gross margins have
assets. When revenue fraud is being committed, net sales
deteriorated, a condition rarely encountered when a firm
are often increased artificially and rapidly, resulting in a
is engaging in revenue fraud (Beneish 1999a; Lin 2003).
large ^rvalue (Cecchini et al. 2010; Kirkos et al. 2007;
Spathis 2002; Spathis et al. 2002).
7. Inventory Growth (IG): IG assesses whether inventory
3. Cash Flow Earnings Difference (CFED): CFED has grown in period t as compared to period t-1. IG is
assesses the impact of accruals on financial statements used to detect whether ending inventory is being over-
(Beneish 1999a; Dechow et al. 201 1). This ratio is often stated to decrease cost of goods sold and increase gross
positive when revenue fraud is occurring or when margin (Cecchini et al. 2010; Dikman, and Kûçiik-
employees are engaging in cash theft. kocaoglu 2010; Persons 1995).

MIS Quarterly Vol. 36 No. 4/December 2012 1303

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

8. Leverage (LEV): LEV is the ratio of total debt to declarative bias in situations where the hypothesized set
total assets in period t relative to period t-1. LEV is generated by a particular feature set needs to be expanded
used to detect whether firms are fictitiously (Brázdil et al. 2008). In prior BI studies, feature construction
including assets on the balance sheet without any was used to derive complex and intuitive financial ratios and
corresponding debt (Cecchini et al. 2010; Beneish metrics from (simpler) seed accounting variables (Piramuthu
1999a; Kirkos, et al 2007; Persons 1995; Spathis et al. 1998; Zhao et al. 2009). These new features were often
2002; Spathis et al. 2002). generated by combining multiple seed measures using arith-
metic operators such as multiplication and division (Langley
9. Operating Performance Margin (OPM): OPM is et al. 1986; Zhao et al. 2009).
calculated by dividing net income by net sales.
When fraudulent firms add fictitious sales revenues, We used subtraction and division operators to construct new
they often fall to the bottom line without additional features indicative of a firm's position relative to its own prior
costs, thus inflating the value of OPM (Cecchini et performance (organizational context) or its industry (industry-
al. 2010; Persons 1995; Spathis 2002; Spathis et al. level context). First, the organizational context features were
2002). constructed by computing the difference between (-) and the
ratio of (/) the firms' seed financial ratios (described in the
10. Receivables Growth (RG): RG is the amount of previous section) in the current time period relative to their
receivables in period t divided by the amount in values for the same ratios in the previous time period.
period t-1. Firms engaging in revenue fraud often
add fictitious revenues and receivables, thereby Second, to generate the industry-level context features, we
increasing RG (Cecchini et al. 2010; Dechow et al. developed industry-representative models designed to charac-
terize what is normal for each industry. Each firm's industry
201 1; Summers and Sweeney 1998).
affiliation was defined by its North American Industry
Classification System (NAICS) code. NAICS was used since
11. Sales Growth (SG): SG is equal to net sales in
it is now the primary way Compustat and other standards
period t divided by net sales in period t-1. In the
bodies reference industries. Two types of models were devel-
presence of revenue fraud, the value of SG generally
oped. Top-5 models were created by averaging the data from
increases (Beneish 1999a; Cecchini et al. 2010;
the five largest companies in each industry-year (in terms of
Gaganis 2009; Dikman and Ktiçíikkocaoglu 2010;
sales), and then generating the 12 seed financial ratios from
Persons 1995; Summers and Sweeney 1998).
these averaged values. Hence, each industry had a single
corresponding top-5 model. Closest-5 models were created
12. SGE Expense (SGEE): SGEE is calculated by
for each firm by averaging the data from the five companies
dividing the ratio of selling and general administra-
from the same industry-year that were most similar in terms
tive expenses to net sales in period t by the same
of sales. Hence, each firm had a corresponding closest-5
ratio in period t-1. When firms are engaging in
model. The intuition behind using these two types of models
revenue fraud, SGE expenses represent a smaller
was that the top-5 models represent the industry members
percentage of the artificially inflated net sales, there-
with the greatest market share (and therefore provide a single
by causing SGEE to decrease (Beneish 1999a;
reference model for all firms in the industry), while closest-5
Cecchini et al. 2010; Dikman and Ktiçíikkocaoglu
models represent the firms' industry peers. As revealed in the
2010).
evaluation section, both types of models had a positive impact
on fraud detection performance.

Yearly and Quarterly Context-Based Feature Sets For a given model, total assets were calculated as the average
of total assets of the five companies, while the accounts
The yearly and quarterly context-based feature sets used the receivable was the average accounts receivable of the same
aforementioned seed ratios to derive industry-level and companies. Multiple firms were used to construct each model
organizational context features. The context features were in order to smooth out any non-industry-related fluctuations
developed using feature construction: the process of applying attributable to individual firms (Albrecht, Albrecht, and Dunn
constructive operators to a set of existing features in order to
2001). On the other extreme, using too many firms produced
generate new features (Matheus and Rendell 1989). Feature models that were too aggregated. Therefore, in our prelimi-
construction facilitates the fusion of data and domain knowl- nary analysis, we explored the use of different numbers of
edge to construct features with enhanced discriminatory firms and found that using five provided the best balance.
potential (Dybowski et al. 2003). In meta-learning, feature The industry-level context features were then constructed by
construction is recommended as a mechanism for improving computing the difference between (-) and the ratio of (/) the

1 304 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./ A Meta-Learning Framework for Detecting Financial Fraud

Type Description Quantity

Yearly financial ratios R1, R2....R12 12
R1-T1 , R2-T2....R12-T12 12
Industry-level context: Top-5 Model

___

R1-C1 , R2-C2,...R12-C12 12
Industry-level context: Closest-5 Model

R1-P1 , R2-P2....R12-P12 12
Organizational context

Total 84

Type Description Quantity

Quarterly financial ratios R1Q1, R2Q1,...R12Q4 48
R1Q1-T1Q1 , R2Q1-T2Q1 ,...R12Q4-T12Q4 48
Industry-level context: Top-5 Model

Industry-level context: Closest-5 R1Q1-C1Q1, R2Q1-C2Q1....R12Q4-C12Q4 48

Model
Organizational context R1Q2-R1Q1, R1Q3-R1Q2,...R12Q4-R12Q3 48
R1 Q2/R1Q1 , R1 Q3/R1 Q2, . . . R1 2Q4/R1 2Q3 48
Total 336

firms' see
firms' ratios in a particular quarter against those from the pre-
and viouscloses
quarter (e.g., R1Q2/R1Q1 denotes the ratio of a firm's
Asset Quality Index in quarter 2 as compared to quarter 1).
This resulted in a quarterly
Table 3 feature set composed
sh of 336
attributes.
firm, we
as 48 indu
correspo
Yearly and Quarterly Context-Based Classifiers
industry
feature R
Quality In
The yearly and quarterly context-based feature sets were
additional
coupled with an array of supervised learning classification
using the
methods. Prior studies have mostly used logistic regression
resulted
and neural network classifiers (Fanning and Cogger 1998; i
Green and Choi 1997; Lin et al. 2003; Persons 1995; Spathis
Table 4
2002). However, additional sh
classification methods have also
of the 12 seed financial ratios was derived from all four
attained good results for financial fraud detection (e.g., Kirkos
quarterly statements, resulting in 48 core features (R1Q1-
et al. 2007), as well as related fraud detection problems (e.g.,
R12Q4). These were used to generate 96 top-5 model-based
Abbasi and Chen 2008b; Abbasi et al. 2010), including
industry-level context features (e.g., R1Q1-T1Q1 and
support vector machines, tree classifiers, and Bayesian
R1Q1/T1Q1), and 96 closest-5 model-based features (e.g.,
methods. Given the lack of consensus on best methods, as
R1Q1-C1Q1 and R1Q1/C1Q1). Furthermore, 96 organiza-described earlier, a large number of classifiers were used in
tional context features were constructed by comparingorder
the to improve overall fraud detection performance. More-

MIS Quarterly Vol. 36 No. 4/December 2012 1305

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./ A Meta-Learning Framework for Detecting Financial Fraud

over, the use of a large set of classifiers also provided a highly Adaptive Learning
useful confidence-level measure, which is described in the
evaluation section. The ability of adaptive learning approaches to dynamically
improve procedural bias are a distinct advantage of meta-
Accordingly, we incorporated several classifiers in addition learning (Brázdil et al. 2008), particularly for complex and
to logistic regression and neural networks. Three support evolving problems such as fraud detection (Fawcett and
vector machines (SVM) classifiers were utilized: linear, Provost 1997). We propose an adaptive semi-supervised
polynomial, and radial basis function (RBF) kernels (Vapnik learning (ASL) algorithm that uses the underlying generalized
1 999). Two Bayesian classifiers were used: Naïve Bayes and stacks. ASL is designed to exploit the information provided
Bayesian Networks (Bayes 1958). Various tree-based classi- by the stack classifiers in a dynamic manner; classifications
fiers were employed, including the J48 decision tree, Naïve are revised and improved as new information becomes
Bayes Tree (NBTree), ADTree, Random Forest, and REPTree available. When semi-supervised learning is used, a critical
(Breiman 2001; Freund and Mason 1999; Kohavi 1996; problem arises when misclassified instances are added to the
Quinlan 1986). Two rule-based classifiers were also training data (Tian et al. 2007). This is a major concern in the
included: nearest neighbor (NNge) and JRip (Cohen 1995; context of financial fraud detection, where models need to be
Martin 1995). These 14 classifiers were each run using the updated across years (i.e., semi-supervised active learning),
yearly and quarterly feature sets, resulting in 28 classifiers in
since classification models can incorporate incorrect rules and
total: 14 yearly context-based classifiers and 14 quarterly
assumptions, resulting in amplified error rates over time.
context-based classifiers.
ASL addresses this issue in two ways. First, the expansion
process is governed by the stack classifiers' predictions. Only
test instances that have strong prediction agreement across the
Stacked Generalization
top-level classifiers in the generalized stacks are added to the
training data. Second, during each iteration, the training data
In the third component of the framework, we utilized stacked
set is reset and all testing instances are reclassified in order to
generalization to improve procedural bias, where the classi-
provide error correction.
fications from the underlying individual classifiers were used
as input features for a top-level classifier (Brázdil et al. 2008;
A high-level description of ASL's steps is as follows:
Hansen and Nelson 2002; Hu and Tsoukalas 2003). All 14
classifiers described in the previous section were run as top- the bottom-level classifiers and run them on the
1. Train
level classifiers, resulting in 14 different stack arrangements.entire test bed.
Stacking can be highly effective when incorporating large
quantities of predictions from underlying classifiers as input
2. Train the top-level classifiers in the generalized stack,
features (Lynam and Cormack 2006). Accordingly, for each
using the training and testing data generated by the
stack, we utilized all 28 individual classifiers as inputs for the
bottom-level classifiers.
top-level classifier: 14 yearly context-based classifiers and 14
quarterly context-based classifiers.
3. Reset the training data to include only the original
training instances.
The testing data for the top-level classifiers was composed of
the individual (i.e., bottom-level) classifiers' classifications on
the testing instances. The training data for the top-level 4. Rank the test instances based on the top-level classifiers'
predictions.
classifiers was generated by running the bottom-level classi-
fiers using 10-fold cross-validation on the training instances
5. If the stopping rule has not been satisfied, add the d test
(Dzeroski and Zenko 2004; Ting and Witten 1997). In other
words, the training data was split into 10 segments. In each instances with the highest rank to the training data (with
class labels congruent with the top-level classifiers'
fold, a different segment was used for testing, while the
remaining 9 segments were used to train the bottom-levelpredictions) and increment d. Otherwise go to step 7.
classifiers. The bottom-level classifiers' classifications from
the test instances associated with these 10 secondary folds 6. If d is less than the number of instances in the test bed,
collectively constituted the top-level classifiers' training data. repeat steps 1-5, using the expanded training data for
This approach was necessary to ensure that feature values in steps 1 and 2.
the training and testing instances of the stack classifiers were
consistent and comparable (Abbasi and Chen 2009; Witten 7. Output the predictions from the top-level classifiers in
and Frank 2005). the generalized stacks.

1306 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et aUA Meta-Learning Framework for Detecting Financial Fraud

Given training examples T= [tl9 t2, . . *„], training class labels L = [1/j, /2, . . /J, and testing instances R = [r„ r2, . . rm]
Let c denote the number of classification algorithms utilized (in this study, c- 14)
Initialize variable d to track number of items from R to add to T, where pisa predefined constant and d = p
While d <m

Derive yearly classifiers' test prediction matrix Y= 'yx = Yearly X{T, L, R), ...,yc = Yearly C(T, L, R)] and training data cross-
validation prediction matrix W=[wx = Yearly X(T, L), ...,wc = Yearly C{T, L)]
Derive quarterly classifiers' test prediction matrix QW= [q¡ = Quarterly(T, L, R), qc = Quarterly C(T, L, Ä)] and training data
cross-validation prediction matrix V=[vl = Quarterly X{T, L ), vc = Quarterly C(T, L )]
Derive top-level stack classifiers' prediction matrix S = [s, = Stackx{[W, V], L , [Y, Q ]), . . sc = Stackc([W, V], L, [Y, Q])'
Reset training data to original set of instances T=[tl9t2, . . /„], training class labels L = [/l5 /2, . . /„]
c c

Compute
L 1=1 1=1 J
, Í Uf'p' = c
Compute test instance weights X=[x{, . . xm] where x = '
[0 , otherwise
m

if Yjxi-d
/= i

Compute descending order rank of values in X :

J= [/, = arg max(a0 =X)J2 = arg max(a, = - 1 J, + l:m)), ... ,ym = arg maxfa^,)]
Set the d test instances with the highest rank V = [v, =jl9 ...,vd =jd]

[ l, if pvi > 0
Determine the instances' class labels Z = [z,, . . zj where z, = i ^

Add selected test instances to training data T= [tx, t2, ...,tn, F], / = [/ l9l2, ln9 Z]
Increment selection quantity variable d= d+p
Else

Exit Loop
End If

Loop
Output S

During each iteration, the training set is used to train (and run) Figure 3 shows the detailed mathematical formulation of the
the bottom-level classifiers on the entire test bed. The top- ASL algorithm. In each iteration, the yearly and quarterly
level classifiers are then run using the training and testing context-based classifiers are run with the training data T and
instance feature values generated by the bottom-level classi- class labels Y. These yearly and quarterly classifiers are each
fiers. The testing instances are ranked based on the top-level run in two ways. First, they are trained on T and run on the
classifiers' predictions, where instances with greater predic- testing data R to generate the two m x c test data prediction
tion agreement across the classifiers are given a higher rank. matrices Y and Q. Next, they are run on T using 10-fold cross
The selected instances are added to the original training data, validation in order to generate the two n x c training data
where the number of instances added is proportional to the matrices ( W and V) for the generalized stacks' top-level
iteration number (i.e., an increasing number of test instances classifiers (as described earlier). The predictions from the
are added during each subsequent iteration). Test instances top-level classifiers are used to construct the stack prediction
are added with the predicted class label (as opposed to the matrix S. Once the stack predictions have been made, the
actual label), since we must assume that the actual labels of training set is reset to its original instances in order to allow
the test instances are unknown (Chapelle et al. 2006). The error correction in subsequent iterations in the event that an
instances added in one iteration are not carried over to the erroneous classification has been added to the training set.
Next, the top-level classifiers' predictions for each instance
next one. The steps are repeated until all testing instances are
added during an iteration or the stopping rule has beenare aggregated across classifiers (in P ), and only those
reached. instances with unanimous agreement (i.e., ones deemed

MIS Quarterly Vol. 36 No. 4/December 2012 1307

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et alt A Meta-Learning Framework for Detecting Financial Fraud

legitimate or fraudulent by all top-level classifiers) are given 1 995 and 20 1 0. Based on these AAERs, fraudulent instances
a weight of 1 in X. If the number of instances in X with a for the fiscal years ranging from 1 985 to 2008 were identified.
value of 1 is greater than or equal to the selection quantity The information gathered from the AAERs was verified with
variable d9 we add d of these test instances to our training set other public sources (e.g., business newspapers) to ensure that
T with class labels that correspond to the top-level classifiers' the instances identified represented bona fide financial state-
predictions (Z). We then increment d so that a larger number ment fraud cases. Consistent with prior research, firms com-
of instances will be added in the following iteration. If there mitting fraud over a two-year period were treated as two
are insufficient unanimous agreement instances in X , we do separate instances (Cecchini et al. 2010; Dikmen and
not continue since adding ones where the top-level classifiers Kûçûkkocaoglu 2010; Persons 1995). Thus, the 815 fraudu-
disagree increases the likelihood of inserting misclassified lent instances were associated with 307 distinct firms.
instances into the training set. Otherwise, the process is
repeated until all testing instances have been added to the The legitimate instances encompassed all firms from the same
training set (i.e., d > m).
industry-year as each of the fraud instances (Beneish 1999a;
Cecchini et al. 2010). After removing all non-fraud firm-year
instances in which amendments/restatements had been filed
Evaluation as well as ones with missing statements (Cecchini et al. 20 1 0),
8,191 legitimate instances resulted. As noted by prior studies,
although the legitimate instances included did not appear in
Consistent with Hevner et al. (2004), we rigorously evaluated
any AAERs
our design artifact. We conducted a series of experiments to or public sources, there is no way to guarantee
that
assess the effectiveness of our proposed financial fraud none of them have engaged in financial fraud (Bay et al.
2006; Dechow et al. 201 1; Kirkos et al. 2007).
detection framework; each assessed the utility of a different
facet of the framework. Experiment 1 evaluated the proposed
Consistent with prior work (Cecchini et al. 2010), the test bed
yearly and quarterly context-based feature sets in comparison
was
with a baseline features set composed of annual statement-split into training and testing data based on chronological
based financial ratios (HI). Experiment 2 assessed theorder (i.e., firm instance years). All instances prior to 2000
effec-
were used
tiveness of using stacked classifiers. We tested the efficacy for training, while data from 2000 onward was
of combining yearly and quarterly information over used for testing. The training data was composed of 3,862
using
firm-year instances (406 fraudulent and 3,456 legitimate),
either information level alone (H2) and also compared stacked
while the
classifiers against individual classifiers (H3). Experiment 3 testing data included 5 , 1 44 firm-year instances (409
evaluated the performance of adaptive learning versus fraudulent
a static and 4,735 legitimate). All 14 classifiers described
learning model (H4). Experiments 4 and 5 assessed in the
thesection "Yearly and Quarterly Context-Based Classi-
fications"
overall efficacy of the proposed meta-learning framework in were employed. For all experiments, the classifiers
were trained on the training data and evaluated on the 5,144-
comparison with state-of-the-art financial fraud detection
instance test set.
methods (H5) and existing ensemble semi-supervised learning
techniques (H6).
For financial fraud detection, the error costs associated with
We tested the hypotheses using a test bed derivedfalse
fromnegatives (failing to detect a fraud) and false positives
(considering a legitimate firm fraudulent) are asymmetric.
publicly available annual and quarterly financial statements.
Moreover,
The test bed encompassed 9,006 instances (815 fraudulent these costs also vary for different stakeholder
groups.of
and 8,191 legitimate), where each instance was composed For investors, prior research has noted that investing
the information for a given firm, for a particular year.inHence,
a fraudulent firm results in losses attributable to decreases
in
for each instance in the test bed, the 12 financial ratios stock value when the fraud is discovered, while failing to
invest in a legitimate firm comes with an opportunity cost
(described earlier in the section "Financial Fraud Detection
(Beneish 1999a). Analysis has revealed that the median drop
Feature Sets") were derived from the annual and quarterly
financial statements for that year. in stock value attributable to financial fraud is approximately
20 percent, while the average legitimate firm's stock appre-
ciates
The data collection approach undertaken was consistent withat a rate of 1 percent, resulting in an investor cost ratio
of 1:20 (Beneish 1999a, 1999b; Cox and Weirich 2002).
the approaches employed in previous studies (e.g., Cecchini
From the regulator perspective, failing to detect fraud can
et al. 2010; Dechow et al. 2011). The fraudulent instances
result
were identified by analyzing all of the SEC Accounting in significant financial losses (Albrecht, Albrecht, and
and
Auditing Enforcement Releases (AAERs) posted betweenDunn 2001; Dechow et al. 201 1). On the other hand, false
positives come with unnecessary audit costs. According to

1 308 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et a! J A Meta-Learning Framework for Detecting Financial Fraud

the Association of Certified Fraud Examiners (2010), the uated the effectiveness of the yearly and quarterly context-
median loss attributable to undetected financial statement based feature sets (described earlier and in Tables 3 and 4) in
fraud is $4.1 million (i.e., cost of false negatives), while the comparison with a baseline feature set composed of the 12
median audit cost (i.e., cost of false positives) is $443,000 ratios described earlier. For the baseline, these 12 ratios were
(Charles et al. 2010). For regulators, this results in an derived from the annual statements, as done in prior research
approximate cost ratio of 1 : 1 0. Accordingly, in this study we (Kaminski et al. 2004; Kirkos et al. 2007; Summers and
used cost ratios of 1:20 and 1:10 to reflect the respective Sweeney 1998). The three feature sets were run using all 14
situations encountered by investors and regulators. classifiers described in the previous section.

It is important to note that, consistent with prior work, we Table 5 shows the results for the baseline classifiers. Tables
only consider error costs (Beneish 1999a; Cecchini et al. 6 and 7 show the results for the yearly and quarterly context-
20 1 0). In the case of the regulator setting, the cost breakdown based classifiers (i.e., the 14 classifiers coupled with the 84
is as follows: and 336 yearly and quarterly context-based features, respec-
tively). Due to space limitations, we report only the overall
• True Negatives: Legitimate firms classified as legitimate AUC and legitimate/fraud recall (shaded columns) and
(no error cost) legitimate/fraud precision when using the 1:20 investor cost
• True Positives: Fraudulent firms classified as fraudulent setting (Cecchini et al. 2010). Results for the regulator cost
(no error cost since the audit was warranted) setting can be found in Appendix A.
• False Negatives: Fraudulent firms classified as legiti-
mate (fraud-related costs of $4.1 million) For all three feature sets, the best AUC results were attained
• False Positives: Legitimate firms classified as fraudulent using NBTree and Logit. These methods also provided the
(unnecessary audit costs of $443,000) best balance between legitimate/fraud recall rates for the
investor cost setting. In comparison with the baseline classi-
Due to space constraints, in the following three subsections fiers, the yearly and quarterly context-based classifiers had
we only reported performance results for the investor situa-higher overall AUC values, with an average improvement of
tion, using a cost ratio of 1 :20. Appendices A, B, and C con-over 10 percent. For the quarterly and yearly context-based
tain results for the regulator situation (i.e., using a cost setting classifiers, the most pronounced gains were attained in terms
of 1:10). However, in the final two subsections, when com-of fraud recall (17 percent and 23 percent higher on average,
paring MetaFraud against other methods, results for both respectively). The results for the various yearly and quarterly
stakeholder groups' cost settings are reported. The evaluationcontext-based classifiers were quite diverse: six of the quar-
metrics employed included legitimate and fraud recall for the terly classifiers had fraud recall rates over 80 percent while
two aforementioned cost settings (Abbasi et al. 2010; three others had legitimate recall values over 90 percent.
Cecchini et al. 2010). Furthermore, area under the curveClassifiers such as SVM-Linear and REPTree were able to
(AUC) was used in order to provide an overall effectivenessidentify over 85 percent of the fraud firms (but with false
measure for methods across cost settings. Receiver operating positive rates approaching 40 percent). Conversely, tree-
characteristic (ROC) curves were generated by varying the based classifiers such as J48, Random Forest, and NBTree
false negative cost between 1 and 100 in increments of 0.1, had false positive rates below 1 0 percent, but with fraud recall
while holding the false positive cost constant at 1 (e.g., 1:1, rates of only 25 to 50 percent. This diversity in classifier per-
1:1.1,1:1.2, etc.), resulting in 99 1 different cost settings. Forformance would prove to be highly useful when using stacked
each method, AUC was derived from these ROC curves. generalization (see the subsection "Evaluating Stacked
Moreover, all hypothesis testing results in the paper also Classifiers").
incorporated multiple cost settings (including the investor and
regulator settings).
Context-Based Classifiers Versus
Baseline Classifiers

Comparing Context-Based Classifiers

against Baseline Classifiers We conducted paired t-tests to compare the performance of
our yearly and quarterly context-based classifiers against the
Most prior studies have utilized 8 to 1 0 financial ratios devoid baseline. Consistent with HI, three different settings were
of organizational or industry-level context information (e.g.,evaluated at the yearly and quarterly level: organizational,
Beneish 1999a; Dikmen and Kiiçûkkocaoglu 2010; Lin et al. industry, and organizational plus industry (i.e., the context
2003; Persons 1995; Spathis 2002). Accordingly, we eval- classifiers from Tables 6 and 7). The t-tests were conducted

MIS Quarterly Vol. 36 No. 4/December 2012 1 309

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.
SVM-Lin 0.522 91.3 ~9.9 6.8" 34.0 "ÃDTree " 0.612 ~94T~ 55.0 10.7 ~575~
LogitReg 0.703 96.1 67.0 16.6 65.0 RandForest 0.695 95.2 67.5 14.8 59.2
~J48 0.664 952 685 14^9 587 NBTree 0.687 95.8 84.5 30.0 54.8
BayesNet 0.673 95.5 74.0 18.2 58.2 REPTree 0.606 94.6 65.5 13.3 54.0
NaiveBayes 0.551 9Z4 51^6 ÕÃ~ 49.6 JRip 0.557 94.1 80.0 17.5 39.6
SVM-RBF 0.452 89.8 64.7 ŠIT" 14.7 NNge 0.665 93.1 72.0 11.2 33.3
SVM-Poly 0.538 91.3 52.0 ŤT" 38.9 NeuralNet 0.591 91.9 56.0 7.8 42.8

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.

SVM-Lin 0.694 99^3 36^9 117 96Ü ÃDTree 0.773 977 58^5 14^9 84.1
LogitReg 0791 97^0 701 17.8 74.8 RandForest 0.785 96.4 71.3 17.2 69.2
J48 0.669 94.1 82.0 16.3 40.6 NBTree 0.814 96.6 73.6 18.6 69.7

BayesNet 0.752 96.5 73.4 18.3 68.7 REPTree 0.624 96.0 61.1 13.5 70.2
NaiveBayes 0.716 98.2 55.2 14.6 88.5 JRip 0.626 95.3 69.7 14.6 60.2
SVM-RBF 0.645 94.2 63.3 11.5 55.3 NNge 0.703 94.2 70.3 12.7 50.1
SVM-Poly 0.729 98.5 52.5 14.2 90.7 NeuralNet 0.619 95.9 59.2 13.1 70.9

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.

SVM-Lin 0.733 987 603 16^5 907 ADTree 0.741 98^0 53^9 MA 87.3
LogitReg 0.780 96^5 6Õ4 16.6 70.7 RandForest 0.689 93.6 93.4 25.0 25.7
J48 0.739 95.5 90.8 31.9 49.9 NBTree 0.724 93.7 92.3 24.1 28.4

BayesNet 0.645 95.6 67.4 14.5 64.1 REPTree 0.761 97.9 52.4 13.6 86.8
NaiveBayes 0.724 96.7 50.7 12.3 80.2 JRip 0.670 95.6 63.3 13.4 66.0
SVM-RBF 0.745 98.2 60.3 16.0 87.5 NNge 0.703 93.2 80.6 12.3 31.5
SVM-Poly 0.742 97.1 55.0 13.4 80.7 NeuralNet 0.652 94.4 65.6 12.1 55.0

on legitimate and fraud recall. We ran

Table
paired
8 shows
t-tests
the
using
t-test results. The three
eachof
cost settings of 1 :5 to 1 :50 (in increments significantly
5) across the outperformed
14 the baselin
fraud
classifiers. For each test, this resulted in 140 recall/precision
controlled pairs: and legit precision. Ho
10 cost settings multiplied by 14 classifiers.
formance This particular
gains for legitimate recall were not
alpha set the
range of costs was chosen since it incorporated to 0.05, with p-values of 0.21 1, 0.
two cost
This employed
settings used in this study as well as those was due to inthe imbalanced performance
prior
studies (e.g., Beneish 1999a; Cecchini etratios
al. 2010),
(highand there-
legit recall and very low fraud r
less, for
fore represented a reasonable range of costs the various
yearly stake-
organizational, industry, and
holder groups. Costs and classifiers were controlled
sets in the
all improved t-
legitimate recall (with the lat
test in order to isolate the impact of cant
the input
at alpha features (as three quarterly settin
= 0. 1). The
described in HI). outperformed the baseline classifiers in term

1 31 0 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./ A Meta-Learning Framework for Detecting Financial Fraud

Yearly Ratio Hypotheses Quarterly Ratio Hypotheses

Metric H1a: Org. H1b: Industry H1c: Context H1d: Org. H1e: Industry H1f: Context
Legit Precision < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001
Legit Recall 0.211 0.056 0.065 0.003

Fraud Precision < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001

Fraud Recall < 0.001 < 0.001 <0.001 0.014 0.016

and fraud recall. Evaluating

Overall, Stacked
21 out of 24 condition
ficant at alpha = 0.05, while 23 out of 24 were
We
alpha = 0.1. The results evaluated
support the
H la effectiv
through
classifications
sets composed of financial ratios as from the
well as un
orga
industry-level contextused as input
measures arefeatures
more adepfo
financial fraud than ones solely
types employing
of stacks fina
were utiliz
The top-level yearly stack
To further thecontext-based
assess
impact of the classifiers
context f a
quarterly
computed the information gain stack classifiers
(IG) for each f
based
univariate measure of theclassifiers,
amount of while
entro t
provided fiers
by a particular used all across
attribute 28 yearly and
classes
greater than or equal as
toinput
0, withfeatures.
higher For eac
scores
greater discriminatoryused all 14 Related
potential. classification
fraud d m
The
has used IG to measure stack
the classifiers
ability wer
of context
discriminate between proach
fraud and non-fraud
described inst
in the s
et al. 2010). We The the
computed training data for
IG scores for the
all f
yearly and running
quarterly the bottom-leve
context-based feature
validation
shows the top 1 5 features on
(based the
on training
their IG s
feature sets. For each tions
feature, the three
generated by colum
the b
feature rank, training
description, instances
and IG colle
value, respect
description columns, classifiers'
the letters training data.
R, T, C, Q, an
ratio, top-5 industry
model, closest-5 industry m
and previous Tables 10 and
respectively, year, while 1 1 show the num the
stack classifiers. Most stacks
the ratio or quarter number. had AUC valuesfor
Hence, above 0.8, the y
with the highest value attained
based features, the description R2 using quarterly
denotes stack with an the s
the subsection SVM-RBF top-level
"Financial classifier. For the R7
Ratios." investors'/cost P7
situa- repr
number by tion,
7 fraud recallnumber
divided
ratio rates tended to be considerably
7 from higher than the
R2 - T2 signifies the value
legitimate of
recall rates. This ratio
outcome, number
which is a result of the
subsection "Financial cost setting used (i.e., minus
Ratios") 1:20), is likely desirable
the sincevalue
an o
investor's costs associated
the firms' corresponding top-5 with failingindustry
to detect fraud are m
considerably higher than
ranked quarterly ratio-based the costs associated with
feature false posi-
R8Q4 / C
tives (Beneish 1999a).
the value of ratio 8 (from The yearly and quarterly stack
"Financial classi-
Ratios")
divided by the value fiers'
of results were noticeably
ratio 8 better
in than those achieved using
quarter 4 f
their individual context-based
corresponding closest-5 industry model. Based counterparts (see Table 6), with
is apparent an
of average
that gain of over
the best1 1 percent
most in terms of AUC. All were
features 14 i
yearly stacks had higher
and organizational context-based measures. Th fraud f-measures, while 13 also had
higher ratios
measures, coupled with legitimate f-measures,
1, suggesting
7, and that in addition to
8 (asset
inventory growth, improved
and overall accuracy, the yearly
leverage) stacks were far more
seemed to pro
discriminatory balanced inThe
potential. their performance
table across legitimate
provides and fraudulent insig
the context-based instances. For thesupplemented
measures investor cost setting, the most balancedthe f
resulting in enhancedresults were attained using a Random
financial fraud Forest classifier.
detectionThe

MIS Quarterly Vol. 36 No. 4/December 2012 1311

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./A Meta-Learning Framework for Detecting Financial Fraud

Yearly Context-Based Features Quarterly Context-Based Features

Rank Description IG Value Rank Description IG Value
1 R2 0.0276 1 R8Q4/C8Q4 0.0619

2 R7 / P7
3 R3 0.0249 3 R7Q3 - C7Q3 0.0503
4 R9 0.0234 4 R8Q3 - T8Q3 0.0499

5 R2 - T2

6 R8 - C8
7 R2-C2 0.0188 7 R1Q3-R1Q2 0.0482

8 R1/P1

9 R7 - T7

10 R8-P8

11 R7

12 R7/T7

13 R1-C1

14 R8 / T8
15 R8-T8 0.0147 15 R8Q4 / T8Q4 0.0427

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.

SVM-Lin
LogitReg
J48
BayesNet
NaiveBayes

SVM-RBF 0.853 97.9 72.0 20.2 82.4 NNge 0.827 967 ŤŤ0 208 69.7
SVM-Poly 0.797 97.5 71.7 19.3 78.5 NeuralNet 0.823 97Ü 73Ü 206 78.2

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.

SVM-Lin

LogitReg

J48
BayesNet

NaiveBayes 0.805 97.7 67.6 17.9 81.9 JRip 0.837 992 66Ü 19 J 93.9
SVM-RBF

SVM-Poly

1312 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.

SVM-Lin 0.904 98ÍÕ 8ÕÕ 261) 81.2 ADTree 0.896 99^2 76^0 25^0 92.4
LogitReg 0.875 97.8 84.9 30.9 78.2 RandForest 0.857 96.7 95.2 53.0 62.8
J48 0.839 97.2 77.6 22.3 74.3 NBTree 0.887 98.8 77.0 25.0 89.0

BayesNet 0.893 98.1 77.9 24.3 82.2 REPTree 0.888 98.6 75.3 23.5 88.0
NaiveBayes 0.851 97^4 906 3Õ8 71.6 JRip 0.870 98^8 802 273 88.8
SVM-RBF 0.894 98Ü 745 23^3 90.0 NNge 0.865 96.5 88.4 32.1 63.3
SVM-Poly 0.865 97.5 88.9 " 36.5 73.8 NeuralNet ~Õ865~ 98.7 " 80.3 27.8 88.0

Legit Fraud
Metrics Precision Recall Precision Recall

H2a: Combined versus Yearly < 0.001 < 0.001 < 0.001 < 0.001
H2b: Combined versus Quarterly < 0.001 0.001 <0.001 <0.001Ť
ŤOpposite to hypothesis

quarterly stack classifiers had considerablyenhanced performance

better fraud recall of the combined stacks w
table to their
rates than the individual quarterly context-based ability to leverage the complemen
classifiers
(see Table 7), outperforming them by nearly
mation 25provided
percent by
on the yearly and quarterly co
average. However, this improved fraud classifiers.
recall was coupled
with reduced legitimate recall for four of the classifiers.

When compared to the yearly stacks, the H2: quarterly

Combined Stack Classifiers Versus Yearly
stacks
and Quarterly
yielded higher fraud recall rates (12 percent higher onStack Classifiers
aver-
age); every quarter stack classifier identified at least 80
percent of the fraud firms, albeit withWe conducted
false paired t-tests
positive ratesto compare the performance of
generally above 30 percent. However, both the combined
the yearlystacks against
andthe yearly (H2a) and quarterly
quarterly stacks had comparable average (H2b)
AUC stackvalues.
classifiersThis
using the same setup as the previous
hypothesis test (HI).
was attributable to the yearly stacks' 6 percent higher average Table 13 shows the t-test results. The
performance on legitimate firms, which resulted in con-significantly outperformed the
combined stack classifiers
siderably lower false positive rates. yearly stack classifiers on legitimate and fraud precision/recall
(with all four p-values < 0.01). The combined stacks also
outperformed
Table 12 shows the results for the combined stack the quarterly stacks on legitimate precision/
classifiers,
recall and fraud
which used the yearly and quarterly context-based classifiers'precision. However, the quarter stacks had
classifications as input. The overall AUC values ranged from gain on fraud recall (denoted
significantly better performance
by a plus using
0.839 to 0.904, with the best results attained sign). Overall,
a SVM-seven out of eight tests were signi-
Linear classifier. All 14 combined stacks outperformed their H2a-b and suggest that com-
ficant; the t-test results support
biningan
yearly and quarterly stack counterparts, with yearly and quarterly
average AUC information can facilitate
gain of 3 to 5 percent. For the investorenhanced financial fraud
cost setting, detection capabilities.
while
some of the combined stacks had higher legitimate/fraud
To further assess
recall rates, they were generally more balanced than the effectiveness
the of combining yearly and
quarterly information,
yearly and quarterly stacks in terms of their classifications we performed an instance-level analy-
sis of the classifications
(e.g., SVM-Linear, Logistic Regression, J48, Bayesian made by all 14 combined stack classi-
fiers.suggests
Network, JRip, and Neural Network). This For each of the instances
that thein our test bed, we aggregated

MIS Quarterly Vol. 36 No. 4/December 2012 1313

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

that particular instance's 14 classification scores. Since the Figure 4 shows the analysis results. The first column shows
classifiers assigned a "-1" to instances classified as fraudulent |*|; the absolute value of the aggregated score x for an
and "1" to instances considered legitimate, the aggregated instance. Columns two, three, five, and six show legitimate/
scores varied from -14 to 14 (in intervals of 2). Hence, a score fraud precision and recall percentages attained on instances
of 12 meant that 13 classifiers considered the instance with that particular score. Columns four, seven, and eight
legitimate and 1 deemed it fraudulent. The absolute valuedepict
|jt| the percentage of the legit, fraud, and total instances in
the test bed covered by that score, respectively. The chart at
of an aggregated score x represents the degree of agreement
the bottom left shows plots of the precision rates (i.e.,
between stack classifiers. For each |jc|, class-level precision
columns 2 and 5) and recall rates (columns 3 and 6) for
and recall measures were computed as follows: All instances
instances with that score. The chart at the bottom right shows
with positive aggregated scores (i.e., x > 0) were considered
legitimate (i.e., scores of 2, 4, 6, 8, 10, 12, and 14) while
cumulative precision and recall rates if using that score as a
those with negative scores (i.e., x < 0) were considered threshold. The results can be interpreted as follows: 32.7
fraudulent. These score-based predictions were compared percent of all instances in the test set had a score of -14 or 14.
These instances accounted for 33.4 percent of all legitimate
against the actual class labels to generate a confusion matrix
test instances and 24 percent of all fraud test instances. Of
for each |x| . From each confusion matrix, legit/fraud precision
these instances, 96.5 percent of the legitimate instances were
and recall were computed, resulting in performance measures
correctly classified (i.e., x = 14) while the remaining 3.5 per-
for every level of classifier agreement (i.e., minimal agree-
cent were misclassified as fraudulent (i.e., x = -14).
ment of |jc| = 2 to maximal agreement |x| = 14). It is important
to note that the 5.8 percent of instances in the test bed that had
an aggregated score of 0 were excluded from the analysis The results reveal that these aggregated scores provide a nice
mechanism
(i.e., test instances where the classifiers were evenly split for assessing the confidence level of a particular
between predictions of legitimate and fraudulent). classification. When all 14 combined stacks agreed (i.e., the

1314 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

absolute value of the sum of their classification scores was various stages of the supply chain (e.g., manufacturing,
14), the legit and fraud recall rates were 96.5 and 90.8 per- wholesale, retail), with fraud recall of between 94 and 100
cent, respectively. Moreover, this accounted for 32 percent ofpercent on such firms. The yearly context and baseline classi-
the test instances. Looking at the chart on the left, as ex- fiers also performed best on these sectors, although with
pected, lower scores generally resulted in diminished recall lower detection rates. The enhanced performance on manu-
rates (one exception was fraud recall, which increased when facturing firms is consistent with previous work that has also
using a score of 12). Based on the chart on the right, the attained good fraud detection rates on such data ( Kirkos et al.
performance degradation was quite gradual for thresholds 2007; Spathis 2002). The combined stacks had relatively
greater than 4. For example, using a threshold of 6 or better lower fraud recall rates on firms in the information and
resulted in legitimate and fraud recall rates of over 90 percent finance/insurance sectors (73 percent and 77 percent),
on instances encompassing 79.2 percent of the test bed. although still higher than the yearly context (approximately
Using the aggregated scores as a confidence-level measure 60 percent) and baseline classifiers (less than 50 percent).
can be useful for prioritizing regulator or investor resources. Analysis of the test bed revealed that fraud firms in these two
Moreover, the stack scores can also be exploited in a semi- sectors were 25 percent more prevalent in the test set, as
supervised learning manner to further improve performance, compared to the training data. Since fraud is often linked to
as discussed in the next subsection. financial distress (Chen and Du 2009; Zhao et al. 2009), the
increased number of fraud firms from the information sector
We also analyzed the combined stack classifiers' fraud detec- appearing in the year 2000 onward could be related to the dot-
tion rates for different sectors by categorizing the 409 fraud com bubble burst. Similarly, fraud in the financial sector
firm-years in the test set based on their top-level NAICS continues to grow and play a pivotal role in the economy
classifications (also referred to as business sectors). While (Stempel 2009). Such temporal changes in fraud patterns
the 20 top-level classifications (e.g., manufacturing, whole- attributable to overall economic and industry-level conditions
sale, retail, construction, mining, etc.) are more general than further underscore the importance of adaptive learning
the 1,017 bottom-level industry classifications in the NAICS approaches (discussed in the next subsection).
hierarchy, which were used to build the industry-level models,
aggregating results at the top-level provided interesting in-
sights. Figure 5 shows the results for all top-level NAICS H3: Yearly and Quarterly Stack Classifiers
Versus Context-Based Classifiers
codes with at least 10 fraud firm-years in the test set. The
table on the left lists the results for the combined stack classi-
fiers, while the chart on the right shows the combined stacks' We conducted paired t-tests to compare the performance of
results relative to the yearly context and baseline classifiers. the yearly stacks against the yearly context-based classifiers
The combined stacks performed best on merchandise firms at (H3a) and the quarterly stacks against the quarterly context-

MIS Quarterly Vol. 36 No. 4/December 2012 1315

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./A Meta-Learning Framework for Detecting Financial Fraud

Legit Fraud
Metrics Precision Recall Precision Recall

H3a: Yearly Stack-Individual < 0.001 < 0.001 < 0.001 0.01 1
H3b: Quarterly Stack-Individual < 0.001 0.482 0.008 < 0.001

based classifiers (H3b) (see Table 14).

reliability when The
applied to yearly
a diverse set of stacks
complementary
significantly outperformed the yearly context-based
underlying classifiers
classifiers (Lynam and Cormack 2006; Sigletos et
al. 2005; Wolpert
on legitimate and fraud precision/recall, with 1992).all four p-values
less than 0.01. With respect to H3b, the quarterly stacks
significantly outperformed the quarterly context-based classi-
Evaluating
fiers on legitimate precision and fraud Adaptive Semi-Supervised
precision/recall. Learning
While
the quarterly stacks also had higher legitimate recall rates, the
gains were not significant. Overall, three
We ran of the
the ASL algorithm four
on top condi-
of the base and stack
tions were significant. The results classifiers,
support ourin hypothesis
as described the "Adaptive Learing" that
subsection.
stack classifiers can improve financial fraud
During each iteration, detection
we incremented dby 10 (i.e.,/? = 10).
capabilities over simply using individual classifiers, irrespec-
The test instances were evaluated in chronological order. In
tive of whether yearly or quarterly
other words, ASLfeatures are
was first run on the being
data from 2000. Once
employed. all of the instances from 2000 had been processed (i.e.,
assigned final predictions), ASL moved on to the firms from
Figure 6 depicts the mean and range of the yearly stack (left 200 1 . Once the algorithm had finished running, the prediction
chart) and yearly context (right chart) ROC curves. The solid values in the S matrix were used to compute the performance
black lines show the mean ROC curves (taken as the average of the 14 classifiers. Table 15 shows the results. All 14
across the 14 classifiers' curves), which depict the tradeoffs classifiers had AUC values above 0.866, with many classifiers
between true and false positives. Here, false positives indi- attaining values over 0.900. The most balanced results were
cate legitimate firms misclassified as being fraudulent. attained using an NBTree top-level classifier, which yielded
Curves situated closer to the top left corner signify better the highest AUC and attained legitimate and fraud recall rates
results, since they denote high ratios of true to false positives. above 88 percent for the investor cost setting. The results of
The shaded areas indicate the ROC curve range (minimum all 14 ASL classifiers were better than those achieved using
and maximum values) across the 14 yearly stack and context- their combined stack counterparts (i.e., static supervised
based classifiers. The two values on the bottom right corner learning). The biggest gains were achieved when the Baye-
of the charts indicate the mean AUC (i.e., the AUC for the sian Network classifier was used.
solid black lines) and the mean range (i.e., the area of the
shaded regions). Looking at the charts, we can see that the
yearly context-based classifiers had greater variability across H4: ASL Versus Combined Stack Classifiers
classifiers; the larger shaded region is indicative of classifiers
that varied considerably in terms of their true/false positive We conducted paired t-tests (n = 140) to compare the perfor-
rates (with some providing better legitimate recall, and others mance of the adaptive semi-supervised learning classifiers
specializing in fraud recall). In contrast, the stacks provided against the combined stack classifiers (H4). The adaptive
better performance and also exhibited less variation; their semi-supervised learning classifiers significantly outper-
curves were situated closer to one another (as evidenced by formed the combined stack classifiers on legitimate precision/
the smaller shaded region), while the results of the context- recall (both p-values < 0.001) and fraud precision/recall (p-
based classifiers are relatively sporadic. The AUC values values < 0.001 and 0.049, respectively). The t-test results
varied 6 percent across the yearly stacks and 19 percent support H4 and suggest that adaptive learning can further
across the yearly context-based classifiers. Although not enhance financial fraud detection performance over static
depicted here, a similar trend was observed with the quarterly supervised learning models.
stack and context classifiers. These results are consistent with
prior research, which has suggested that stacked generali- The two solid lines in Figure 7 show the legitimate and fraud
zation can improve performance, generalization ability, and recall performance of ASL in comparison with the combined

1 31 6 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai./ A Meta-Learning Framework for Detecting Financial Fraud

Legit Fraud Legit Fraud

Classifier AUC Prec. Ree. Prec. Ree. Classifier AUC Prec. Ree. Prec. Ree.
SVM-Lin 0.912 984 8&6 35^ 83Ü ADTree 0.904 992 83Ü 327 92.2
LogitReg 0.911 98.3 ~88J~ 38.5 ' 81.9 RandForest " 0.881 97.2 95.8~ 58.5 68.2 "
J48
BayesNet

NaiveBayes

SVM-RBF 0.906 99Ü) 86/7 36Ü 900 NNge 0.884 97^2 94~6 5 2A 68.7
SVM-Poly 0.895 97Í ÍãÕ Š70 T5A NeuralNet 0.876 9&9 84Ü 33^1 88.8

MIS Quarterly Vol. 36 No. 4/December 2012 1317

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

stack for each year of the test bed (2000-2008). ASL had cularly for fraud recall. Moreover, ASL outperformed the
higher legit/fraud recall rates across years. Margins seemed dynamic stack in legitimate recall for all 7 years and on fraud
to improve in later years. The improved performance on both recall for 6 of 7 years, while dynamic ASL dominated the
legitimate and fraud firms suggests that adaptive learning is dynamic stack with respect to legitimate/fraud recall.
important not only to better detect fraud firms, but also in Appendix F provides additional analysis which shows that
order to react to changes in non-fraud firms. The biggest ASL outperformed dynamic stacks for any window length
improvements in fraud recall were in the information and between 1 and 5 years. The results suggest that ASL is able
financial and insurance sectors (over 5 percent), the two to effectively leverage existing knowledge, including knowl-
sectors where the combined stacks underperformed. Given edge gained during the detection process, toward enhanced
the adversarial nature of fraud detection (Virdhagriswaran and subsequent detection of firms.
Dakin 2006), one may expect fraud detection rates for static
models to deteriorate over time, as evidenced by the To illustrate this point, we analyzed fraud firm-years correctly
decreasing performance for the combined stacks from 2004 detected by ASL that were not identified by the combined
onward. In contrast, ASL' s fraud performance holds steady stacks (we refer to these instances as y firm-years). Sensi-
across these years. tivity analysis was performed to see how fraud firms
previously added to the classification models (x firm-years)
However, this analysis assumes that no new training data is subsequently impacted ASL' s prediction scores for each of
made available during the test time period 2000-2008 (i.e., these y firm-years. This was done by failing to add each x
only data through 1999 was used for training). In order to firm-year to the model, one at a time, in order to assess their
evaluate the impact of having additional training data on the individual impact on the ASL prediction scores for the >> firm-
performance of ASL and the combined stacks, we used an years. One example generated as a result of the analysis is
expanding window approach. For a given test year a , all depicted in Figure 8. The four gray nodes represent^ firm-
instances up to and including year a -2 were used for training years: fraud instances correctly identified by ASL (that were
(e.g., test year 2002 was evaluated using training data through misclassified by the combined stacks). White nodes represent
2000). A two-year window was used since prior research has x firm-years: fraud instances from the test set that were added
noted that the median time needed to detect financial fraud is to the classification models by ASL. Node labels indicate
26.5 months (Beneish 1999b). The dotted lines in Figure 7 firms' names and top-level NAICS categories. The x-axis
show the results for these "dynamic" ASL and stack classi- displays the years associated with these firm-years (2000-
fiers. Based on the results, as expected, the dynamic stacks 2005). A directed link from a white node to a gray one can be
outperformed the static combined stacks. The gains in legiti- interpreted as that x firm-year influencing the prediction score
mate and fraud recall were most pronounced for 2005 onward. of the y firm-year. The nature and extent of the influence is
Dynamic ASL also improved performance over ASL, parti- indicated by the number along the link. For example, a "2"

1 31 8 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./ A Meta-Learning Framework for Detecting Financial Fraud

means that the absence of that particular* firm-year from the studies using public data all had fraud detection rates of less
classification model worsened the >> firm-year's ASL score by than 70 percent. The details regarding the three comparison
two. For example, the addition of MCSi in 2002 improved approaches are as follows.
the prediction score for Natural Health in 2005 by one.
Kirkos et al. used a set of 10 ratios/measures in combination
Figure 8 provides several important insights regarding the with three classifiers: ID3 decision tree, neural network, and
adaptive learning component of the MetaFraud framework. Bayesian network. We replicated their approach by including
It reveals that new fraud firm-year detections were not neces- the same 10 ratios/measures: debt to equity, sales to total
sarily attained simply by adding one or two additional fraud assets, sales minus gross margin, earnings before income tax,
cases. Rather, they were sometimes the result of a complex working capital, Altman's Z score, total debt to total assets,
series of modifications to the training models. In some cases, net profit to total assets, working capital to total assets, and
these correct predictions were the culmination of modifica- gross profit to total assets. We also tuned the three classi-
tions spanning several years of data. For example, the detec- fication algorithms as they did in their study (e.g., the number
tion of Interpublic Group and Natural Health in 2005 of hidden layers and nodes on the neural network).
leveraged 6-10 jc firm-years between 2000 and 2005. How-
ever, firms from the same year also played an important role, Gaganis used 7 financial ratios in combination with 10 classi-
as evidenced by the positive impact Diebold and Dana fication algorithms. We replicated his approach by including
Holding had on Natural Health in 2005. Interestingly, busi- the same seven ratios: receivables to sales, current assets to
ness sector affiliations also seemed to play a role in the adap- current liabilities, current assets to total assets, cash to total
tive learning process: several of the links in the figure are assets, profit before tax to total assets, inventories to total
between firms in the same sector. For example, three of the assets, and annual change in sales. We ran many of his classi-
firms that influenced Natural Health were also from the fication methods that had performed well, including neural
wholesale trade sector, while one or two of the firms thatnetwork, linear/polynomial/RBF SVMs, logistic regression,
impacted Charter Communications and Interpublic Group k-Nearest neighbor, and different types of discriminant
were from the information and professional services sectors,analysis. The parameters of all techniques were tuned as was
respectively. It is also important to note that not all x firm-done in the original study. We included the results for the
years' additions to the models had a positive impact. Forthree methods with the best performance: linear S VM, Neural
example, both Veritas Solutions and Cornerstone (from 2000)Net, and Logit.
worsened Bennett Environmental' s prediction score by one.
Thus, Figure 8 sheds light on how ASL was able to improveCecchini et al. used an initial set of 40 variables. After pre-
the detection of fraudulent firms. It is important to note thatprocessing, the remaining 23 variables were used as input in
in our analysis, consistent with prior work, we represent each their financial kernel. Following their guidelines, we began
fraud firm based on the year in which the fraud occurredwith the same 40 variables and removed 19 after prepro-
(Cecchini et al. 201 1). An interesting future direction wouldcessing, resulting in 21 input variables for the financial
be to also consider when the fraud was actually discovered,kernel.
and to use this information to retrospectively identify pre-
viously undetected frauds committed in earlier years. All comparison techniques were run using the same training
and testing data used in our prior experiments. ROC curves
were generated, and AUC for the ROC curves was computed
Evaluating MetaFraud in Comparison with (as with the previous experiments). For MetaFraud, we
Existing Fraud Detection Methods generated a final prediction for each test case by aggregating
the predictions of the 14 ASL classifiers to derive a single
We evaluated MetaFraud in comparison with three priorstack score for each instance (as described earlier). We used
the results of the ASL classifiers since these results are based
approaches that attained state-of-the-art results: Kirkos et al.
(2007), Gaganis (2009), and Cecchini et al. (2010). Each ofon the amalgamation of all four phases of MetaFraud and
these three approaches was run on our test bed, in comparison therefore signify the final output of the meta-learning frame-
with the overall results from the proposed meta-learning work.
framework. Kirkos et al. and Gaganis each attained good
results on Greek firms, with overall accuracies ranging fromTable 16 and Figure 9 show the experiment results. Table 16
73 to 90 percent and 76 to 87 percent, respectively. Cecchinidepicts the results for MetaFraud (MF) and the comparison
et al. attained excellent results on U.S. firms, with fraud methods using both the regulator (1:10) and investor (1:20)
detection rates as high as 80 percent. In comparison, priorcost settings. In addition to legitimate and fraud recall, we

MIS Quarterly Vol. 36 No. 4/December 2012 1319

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./ A Meta-Learning Framework for Detecting Financial Fraud

Regulators (Cost 1:10) Investors (Cost 1 :20)

Legit Fraud Error Legit

Setting AUC Prec. Ree. Prec. Ree. Cost Prec. Ree. Prec. Ree. Cost
MetaFraud (MF) "o.931 98.3 90.2 41.8 81.5 ~ $100.5 " 98.4 ~
Cecchini 0.818 97.8 79.8 25.4 79.5 $149.2 98.0 74.7 21 .9 82.2 $620.5

Kirkos - BayesNet 0.784 95.8~ 72.4 ~16.6 63.6 " $231.

Kirkos - ID3 "p.783 96.3~ 85.7 ~27.3 62.1 ~ $181.8 " 96.3 ~73X~ 17.9 67.2 $918.9
Kirkos - NeuralNet 0.630 96.0 68.3 15.5 67.2 $236.2 " 96.8 64.2 15.4 ~75.6 $861.7
Gaganis - NeuralNet 0.792 97.2 69.9 18.1 76.8 $198.4 97.5 65.5 16.7 80.4 $754.9
Gaganis - LogitReg 0.772 96.7 71.3 17^8 72A $208.1 972 6&3 TM 77'Õ $788.3
Gaganis - SVM-Lin 0.769 967 T'2 Í7Í 7Z1 $208.4 97^3 6&5 16^9 787 $775.9

also assessed the error cost per instance (i.e., firm-year) of

less per firm-year than comparison methods for the regulator
each method. For the regulator setting, each false positive setting; MF saved $307K and $225K per firm, respectively,
was assessed a cost of $443,000 (median cost of audit) whileover baselines of auditing all firms and auditing none. From
false negatives were attributed a cost of $4,100,000 (medianthe investor perspective, where low false negative rates are
even more critical, MF's error costs were at least $179K per
cost of financial statement fraud). For the investor setting, the
cost of false negatives was computed as a 20 percent drop infirm less than comparison methods. Across the test bed,
MetaFraud's total error costs were $250M better than the best
the median stock value across firms in our test bed (i.e., 20
percent of $120M, or $24M). The cost of false positives was
comparison method on the regulator setting and $900M better
the opportunity cost of failing to invest in a legitimate firm;on the investor setting. Further analysis of the error costs
this was computed as a 1 percent increase in the median stock
associated with the regulator and investor settings for various
value ($1.2M) (Beneish 1999a, 1999b; Cox and Weirichcomponents of MetaFraud and comparison methods can be
2002). The error cost numbers depicted are in thousands of
found in Appendices D and E.
dollars. Based on the table, the results for MF were con-
siderably better than those associated with all three com-Figure 9 shows the ROC curves for MF and the three com-
parison methods. MF had higher overall AUC and better
parison methods. MF's curve was situated closest to the top
legitimate and fraud recalls for both cost settings. Theseleft corner, indicating higher ratios of true/false positive rates
performance gains had a significant financial impact: thethan comparison methods. The results suggest that MF is
errors produced by MF cost approximately $49K to $136Kcapable of providing enhanced detection performance for

1320 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et a!./ A Meta-Learning Framework for Detecting Financial Fraud

various cost settings. With respect to the comparison Cecchini did garner slightly better fraud recall rates (relative
methods, the financial kernel outperformed Kirkos et al. and to MF), the legitimate recall values were 4 to 5 percent lower
Gaganis in terms of overall AUC and legit/fraud recall for for both cost settings. Consequently, MF had a better finan-
both cost settings, as evidenced by Table 16. While the best cial impact than MF-Cecchini. Overall, the results demon-
results for the financial kernel were somewhat lower than strate the efficacy of MetaFraud as a viable mechanism for
those attained in the Cecchini et al. study, the overall ac- detecting financial fraud.
curacy results for the Kirkos et al. and Gaganis methods were
significantly lower than those reported in their studies (i.e.,
1 5-20 percent). As previously alluded to, this is likely due to H5: MetaFraud Versus Comparison Methods
the application of these approaches on an entirely different set
of data: U.S. firms from various sectors as opposed to Greek We conducted paired t-tests to compare the performance of
manufacturing firms (Barth et al. 2008). MetaFraud against the seven comparison settings shown in
Table 16. We compared cost settings of 5 through 50 in
In order to further assess the effectiveness of the MetaFraud increments of 1 (n = 46). MetaFraud significantly outper-
framework, we ran MetaFraud using the ratios/measures formed the comparison methods on precision and recall (all p-

utilized by Kirkos et al., Gaganis, and Cecchini et al. We values < 0.001). We also ran t-tests to compare MF-Kirkos,
called these MF-Kirkos, MF-Gaganis, and MF-Cecchini. For MF-Gaganis, and MF-Cecchini against their respective
all three feature sets, we derived the industry and organiza- settings from Table 1 6. Once again, using MetaFraud signifi-
tional context measures from the quarterly and annual state- cantly improved performance, with all p-values less than
ments. For instance, the 7 Gaganis ratios were used to gener- 0.001 . The t-test results support H5 and suggest that the pro-

ate 49 annual and 196 quarterly attributes (see Tables 3 and posed meta-learning framework can enhance fraud detection
4 for details). Similarly, the 2 1 Cecchini et al. measures were performance over the results achieved by existing methods.
used to develop 147 annual and 588 quarterly features. We
then ran the combined stack and ASL modules and computed
a single performance score for all 991 cost settings (i.e., 1:1 Evaluating MetaFraud in Comparison with
to 1:100), as done in the previous experiment. Table 17 and Existing Semi-Supervised Learning Methods
Figure 10 show the experiment results.
In order to demonstrate the effectiveness of the procedural
Based on the table, the results for MF-Cecchini, MF-Gaganis, bias improvement mechanisms incorporated by MetaFraud
and MF-Kirkos were considerably better than the best com- over existing ensemble-based semi-supervised learning
parison results reported in Table 16 for both cost settings. methods, we compared MetaFraud against Tri-Training (Zhou
Figure 10 shows the ROC curves for MF-Cecchini, MF- and Li 2005). Tri-Training has outperformed existing semi-
Kirkos, and MF-Gaganis and the comparison techniques. MF supervised learning methods on several test beds, across
is also included to allow easier comparisons with the results various application domains. It uses an ensemble of three
from Table 1 6 and Figure 9. Using MetaFraud improved per- classifiers. In each iteration, the predictions of all test
formance for all three feature sets over the best techniques instances where two classifiers j and k agree are added to the
adopted in prior studies. MF-Cecchini outperformed MF- third classifier's training set (i.e., classifier i) with the pre-
Kirkos and MF-Gaganis. The lower performance of MF- dicted class labels, provided that the estimated error rates for
Kirkos and MF-Gaganis relative to MF-Cecchini was attribu- instances where j and k agree has improved since the previous
table to the fact that the ratios of these methods were less iteration. We ran Tri-Training on the base ratios as well as
effective on nonmanufacturing firms. Interestingly, the MF- those proposed by the three comparison studies. In order to
Cecchini ROC curve was very similar to the one generated isolate the impact of MetaFraud' s procedural bias improve-
using MetaFraud with the 1 2 ratios (i.e., MF). This is because ment methods, we ran Tri-Training using the context-based
the measures employed by Cecchini et al. (2010) include features for all four sets of ratios (as done with MetaFraud).
many of these baseline ratios. Their financial kernel impli- For Tri-Training, we evaluated various combinations of
citly developed 8 of the 12 ratios (see "Financial Ratios" for classifiers and found that the best results were generally
details): asset turnover (R2), depreciation index (R5), inven- attained when using Bayes Net, Logit, and J48 in conjunction.
tory growth (R7), leverage (R8), operating performance For these three classifiers, we then ran all 991 cost settings as
margin (R9), receivables growth (RIO), sales growth (RI 1), done in the H5 experiments. Consistent with the settings used
and SGE expense (R12). Moreover, they also used variations by MetaFraud's ASL algorithm, Tri-Training was run on test
of asset quality index (Rl) and gross margin index (R6). On instances in chronological order (i.e., all year 2000 instances
the flip side, while the larger input feature space for MF- were evaluated before moving on to the 2001 data).

MIS Quarterly Vol. 36 No. 4/December 2012 1321

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Regulators (Cost 1:10) Investors (Cost 1 :20)

Legit

Setting AUC Prec. Ree. Prec. Ree. Cost Prec. Ree. Prec. Ree. Cost
MF -Cecchini ~0922 98.2 86.0 33.6~ 81.7 ~ $116.7 ~983~ 84.8 32.1 ~83.4 $485.7
MF - Kirkos
MF - Gaganis

Regulators (Cost 1:10) Investors (Cost 1:20)

Legit Fraud Error Legit Fraud Error
Setting AUC Prec. Ree. Pree. Ree. Cost Prec. Ree. Prec. Ree. Cost
TT-Cecchini-BN """5.882 97.7 86.0 3Z0 7&Õ $135.0 97^9 8^2 26^8 79/7 $594.9
TT-Kirkos-Logit 0.769 97^3 709 ïâi 77'Õ $193.4 97^6 68 2 TãÕ 809 $715.5
TT-Gaganis-J48 ~Õ.8Q8 97.1 76.3 21.1 73^3 $183.4 977 66^8 Ì7i 814 $721.5
TT-Base-BN 0.845 97.5 84.2 29A 75~î $145.6 97ß 79A 24ß 775 $657.2

1322 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

Table 18 and Figure 11 show the experiment results, in- mation, whether at the yearly or quarterly level, can improve
cluding overall AUC, legit/fraud recall, and error cost per performance over just using annual-statement-based ratios
firm-year for both cost settings. Figure 1 1 shows the ROC devoid of context information. In experiment 2, combining
curves for MF-Cecchini, MF-Kirkos, and MF-Gaganis in yearly and quarterly information yielded the best results a
comparison with the three Tri-Training classifiers. Based on they provide complementary information (H2). Experiment
the results, it is evident that MetaFraud outperformed Tri- 2 also supported the notion that the ability of stack classifiers
Training, both in terms of best performance results (see the to exploit disparate underlying classifiers enables them to im-
top of Table 16 and Table 17, versus Table 18) and across prove classification performance (H3). Experiment 3 re
cost settings. While some settings of Tri-Training improved vealed that the proposed adaptive semi-supervised learning
performance over the original methods run with those fea- algorithm further improved performance by leveraging th
tures, the performance gains were small in comparison to the underlying classifications with the highest confidence level
improvements garnered by MF-Cecchini, MF-Gaganis, MF- (H4). Experiments 4 and 5 showed that, collectively, th
Kirkos, and MF.
proposed meta-learning framework was able to outperform
comparison state-of-the-art methods (H5 and H6).

H6: MetaFraud Versus Comparison Ensemble-

Each phase of the MetaFraud framework is intended t
Based Semi-Supervised Learning Methods
improve financial fraud detection performance while simul
taneously serving as an input refinement mechanism for th
Paired t-tests were used to evaluate the performance of
MetaFraud relative to the Tri-Training results presented in ensuing phase of the framework. The first two phases
Table 18. P-values for all four evaluation metrics were enhanced declarative bias, while the latter two improved
significant across the four feature sets (with all p-valuesprocedural
< bias. For instance, the 12 financial fraud ratios
(i.e.,
0.001). The results illustrate how MetaFraud' s utilization of the baseline features) were used to generate the yearly
an elaborate stacked generalization scheme comprised and of quarterly context-based feature sets (phase A). These
features were then used as inputs into the yearly and quarterly
several top-level classifiers in conjunction with the ASL
context-based classifiers (phase B). The classifications from
algorithm facilitated enhanced procedural bias improvement,
and consequently resulted in better fraud detection per-these base classifiers were used as input features in the
combined stack classifiers (phase C). The combined stack
formance, over existing ensemble-based semi-supervised
learning methods. classifiers' classifications were used to inform the adaptiv
semi-supervised learning algorithm (phase D). Collectively,
the MetaFraud framework improved overall accuracy by 27
percent on average as compared to the baseline classifiers,
Discussion
and by 7 to 20 percent over state-of-the-art methods, with
each phase contributing significantly as evidenced by the
In this study, we proposed a meta-learning framework for
hypotheses test results.
enhanced detection of financial fraud. Our research objective
for this study was to develop a BI framework that detected
fraud from publicly available financial information with
demonstratively better performance than that obtained by
Conclusions ^
existing methods. To achieve this objective, we developed a
design artifact - the MetaFraud framework - using principles
Consistent with design scienc
from meta-learning. We also incorporated ex post sensitivity
ments to rigorously test each
analysis as part of the iterative evaluation and refinement
framework, as well as to comp
process in artifact development. This is consistent with the
state-of-the-art methods. The
design science guideline, "design as a search process"
that the MetaFraud framework
(Hevner et al. 2004, p. 83). Our evaluation of the framework,
legitimate and fraud recall of
including the detailed results for H1-H6 as well as the
stakeholder
analyses presented in the appendices, shows the efficacy of cost settings (Tabl
each component of the MetaFraud framework and demon-that MetaFraud markedly impro
methods.
strates that the complete framework substantially improves Overall, the results c
performance over existing state-of-the-art methods. meta-learning methods for enh

Our
The results from experiment 1 (and HI) demonstrated thatresearch contribution is th
framework
incorporating industry-level and organizational context infor- for financial fraud

MIS Quarterly Vol. 36 No.

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al/ A Meta-Learning Framework for Detecting Financial Fraud

methods into a meta-learning artifact. The MetaFraud frame- References

work encompasses several important aspects that each con-
tribute to its overall effectiveness. The use of organizational Abbasi, A., and Chen, H. 2008a. "CyberGate: A Design Frame-
and industry contextual information, taken from quarterly and work and System for Text Analysis of Computer-Mediated
Communication," MS Quarterly (32:4), pp. 81 1-837.
annual data, provides an effective mechanism for expanding
Abbasi, A., and Chen, H. 2008b. "Writeprints: A Stylometric
the fraud detection feature space. Although stacking and
Approach to Identity-level Identification and Similarity Detection
semi-supervised learning have been used previously, we are
in Cyberspace," ACM Transactions on Information Systems
unaware of any prior work that has similarly integrated an
(26:2), no. 7.
extensive stacked generalization scheme, semi-supervised Abbasi, A., and Chen, H. 2009. "A Comparison of Fraud Cues and
learning, and adaptive/active learning. The unique combina- Classification Methods for Fake Escrow Website Detection,"
tion of these elements in our framework (i.e., through the ASL Information Technology and Management (10), pp. 83-101.
algorithm) constitutes an important contribution to the fraud Abbasi, A., Zhang, Z., Zimbra, D., Chen, H., and Nunamaker, Jr.,
detection and BI literature. Specifically, the results of J. F. 2010. "Detecting Fake Websites: The Contribution of
Hypothesis 6 show the effectiveness of this approach over Statistical Learning Theory," MIS Quarterly (34:3), pp. 435-461 .
existing methods. MetaFraud outperformed the Tri-Training Accounting Financial and Tax. 2009. "Using Ratios to Detect
adaptive-learning approach (Zhou and Li 2005), which has Fraud and Financial Misstatement," October 10 (http://
accounting-financial-tax.com/2009/10/using-ratios-to-detect-
been successfully applied to a dozen problem domains,
fraud-and-financial-misstatement; accessed July 8, 2010).
including credit card approval, credit screening, and web page
Albrecht, C. C., Albrecht, W. S., and Dunn, G. 2001. "Can
classification.
Auditors Detect Fraud: A Review of the Research Evidence,"
Journal of Forensic Accounting (2), pp. 1-12.
Another contribution to the domain of fraud detection is the
Albrecht, W. S., Albrecht, C. C., and Albrecht, C. O. 2004. "Fraud
confidence scores generated by MetaFraud (Figure 4). For and Corporate Executives: Agency, Stewardship, and Broken
instance, the proposed approach was able to attain over 90 Trust," Journal of Forensic Accounting (5:1), pp. 109-130.
percent legitimate and fraud recall on the 79.2 percent of test Albrecht, W. S., Albrecht, C. O., and Albrecht, C. C. 2008.
instances with the highest confidence scores. These confi- "Current Trends in Fraud and its Detection," Information
dence scores can provide a useful decision aid for various Security Journal: A Global Perspective (17), pp. 2-12.
stakeholder groups, analogous to the credit- worthiness ratings Ameen, E. C., and Strawser, J. R. 1994. "Investigating the Use of
that are currently available for commercial and government Analytical Procedures: An Update and Extension," Auditing: A
Journal of Theory and Practice (13:2), pp. 69-76.
entities. For instance, investors may wish to shy away from
Anderson-Lehman, R., Watson, H. J., Wixom, B. H., and Hoffer,
firms that are considered fraudulent with a high confidence
J. A. 2004. "Continental Airlines Flies High with Real-Time
level. Audit firms can leverage these recommendations in
Business Intelligence," MIS Quarterly Executive (3:4), pp.
order to better assess the risk associated with new potential 163-176.
clients. Regulators can use such scores to aid in the appro- Ando, R. K., and Zhang, T. 2007. "Two- View Feature Generation
priation and prioritization of investigatory resources. Model for Semi-Supervised Learning," in Proceedings of the 24th
International Conference on Machine Learning , Corvallis, OR,
The MetaFraud framework could also influence the design June 20-24.

and development of financial fraud detection systems that Association of Certified Fraud Examiners. 2010. "2010 Global
integrate predictive and analytical business intelligence tech- Fraud Study: Report to the Nations on Occupational Fraud and
nologies, thereby allowing analysts to draw their own Abuse," Association of Certified Fraud Examiners, Austin, TX.

conclusions (Bay et al. 2006; Coderre 1999; Virdhagriswaran Balean, M. F., Blum, A., and Yang, K. 2005. "Co-Training and
Expansion: Towards Bridging Theory and Practice," in Ad-
and Dakin 2006). By combining a rich feature set with a
vances in Neural Information Processing Systems 1 7, L. K. Saul,
robust classification mechanism that incorporates adaptive
Y. Weiss, and L. Bottou (eds.), Cambridge, MA: MIT Press, pp.
learning, meta-learning-based systems could provide fraud 89-96.
risk ratings, real-time alerts, analysis and visualization of
Bankruptcydata.com. "20 Largest Public Domain Company Bank-
fraud patterns (using various financial ratios in the feature rutcy Filings 1980-Present (http://www.bankruptcydata.com/
sets), and trend analyses of fraud detection patterns over time Research/Largest Overall All-Time; accessed July 8, 2010).
(utilizing the adaptive learning component). As business Barth, M., Landsman, W., and Lang, M. 2008. "International
intelligence technologies continue to become more pervasive Accounting Standards and Accounting Quality," Journal of
(Watson and Wixom 2007), such systems could represent a Accounting Research (46:3), pp. 467-498.
giant leap forward, allowing fraud detection tools to perform Bay, S., Kumaraswamy, K., Anderle, M. G., Kumar, R., and Steier,
at their best, when they are combined with human expertise. D. M. 2006. "Large Scale Detection of Irregularities in Ac-

1 324 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

counting Data," in Proceedings of the 6th IEEE International Cohen, W. W. 1995. "Fast Effective Rule Induction," in Pro-
Conference on Data Mining , Hong Kong, December 18-22, ceedings of the 12th International Conference on Machine
pp. 75-86. Learning , Tahoe City, CA, July 9-12, pp. 1 15-123.
Bayes, T. 1958. "Studies in the History of Probability and Cox, R. A. K., and Weirich, T. R. 2002. "The Stock Market
Statistics: XI. Thomas Bayes' Essay Towards Solving a Problem Reaction to Fraudulent Financial Reporting," Managerial
in the Doctrine of Chances," Biometrika (45), pp. 293-295. Auditing Journal (17:7), pp. 374-382.
Beasley, M. S., Carcello, C. V., Hermanson, D. R., and Lapides, Dechow, P., Ge, W., Larson, C., and Sloan, R. 201 1. "Predicting
P. D. 2000. "Fraudulent Financial Reporting: Consideration of Material Accounting Misstatements," Contemporary Accounting
Industry Traits and Corporate Governance Mechanisms," Research (28:1), pp. 1-16.
Accounting Horizons (14:4), pp. 441-454. Deloitte. 2010. "Deloitte Poll: Majority Expect More Financial
Beneish, M. D. 1999a. "The Detection of Earnings Manipulation," Statement Fraud Uncovered in 2010 2011," April 27
Financial Analysts Journal (55:5), pp. 24-36. (http://www.deloitte.com/view/en_US/us/Services/Financial-
Beneish, M. D. 1999b. "Incentives and Penalties Related to Advisory-Services/7ba0852e4de38210VgnVCM200000bb42f0
Earnings Overstatements that Violate GAAP," The Accounting 0aRCRD.htm; accessed July 8, 2010).
Review (74:4), pp. 425-457. Dikmen, B., and Kûçûkkocaoglu, G. 2010. "The Detection of
Bolton, R. J., and Hand, D. J. 2002. "Statistical Fraud Detection: Earnings Manipulation: The Three-Phase Cutting Plane Algo-
A Review," Statistical Science , (17:3), pp. 235-255. rithm Using Mathematical Programming," Journal of Fore-
Brachman, R. J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., casting (29:5), pp. 442-466.
Dull, R., and Tegarden,D. 2004. "Using Control Charts to Monitor
and Simoudis, E. 1996. "Mining Business Databases," Commu-
Financial Reporting of Public Companies," International Journal
nications of the ACM (39: 11), pp. 42-48.
Brázdil, P., Giraud-Carrier, C., Soares, C., and Vilalta, R. 2008. of Accounting Information Systems (5:2), pp. 109-127.
Dybowski, R., Laskey, K. B., Myers, J. W., and Parsons, S. 2003.
Metalearning : Applications to Data Mining , Berlin: Springer-
"Introduction to the Special Issue on the Fusion of Domain
Verlag.
Knowledge with Data for Decision Support "Journal of Machine
Breiman, L. 2001. "Random Forests," Machine Learning (45:1),
Learning Research (4), pp. 293-294.
pp. 5-32.
Dzeroski, S., and Zenko, B. 2004. "Is Combining Classifiers with
Carson, T. 2003. "Self-interest and Business Ethics: Some Lessons
Stacking Better than Selecting the Best One?" Machine Learning
of the Recent Corporate Scandals," Journal of Business Ethics
(54:3), pp. 255-273.
(43:4), pp. 389-394.
Fanning, K. M., and Cogger, K. O. 1998. "Neural Network Detec-
Cecchini, M., Aytug, H., Koehler, G., and Pathak, P. 2010.
tion of Management Fraud Using Published Financial Data,"
"Detecting Management Fraud in Public Companies,"
International Journal of Intelligent Systems in Accounting and
Management Science (56:7), pp. 1 146-1 160.
Finance Management (7), pp. 21-41.
Chai, W., Hoogs, B., and Verschueren, B. 2006. "Fuzzy Ranking
Fawcett, T., and Provost, F. 1997. "Adaptive Fraud Detection,"
of Financial Statements for Fraud Detection," in Proceedings of
Data Mining and Knowledge Discovery (1), pp. 291-3 16.
the IEEE International Conference on Fuzzy Systems ,
Freund, Y., and Mason, L. 1999. "The Alternating Decision Tree
Vancouver, BC, July 16-21, pp. 152-158.
Learning Algorithm," in Proceedings of the 16th International
Chan, P. K., Fan, W., Prodromidis, A. L., and Stolfo, S. J. 1999.
Conference on Machine Learning , Bled, Slovenia, June 27-30,
"Distributed Data Mining in Credit Card Fraud Detection," IEEE
pp. 124-133.
Intelligent Systems (14:6), pp. 67-7 4.
Gaganis, C. 2009. "Classification Techniques for the Identification
Chan, P., and Stolfo, S. 1993. "Toward Parallel and Distributed
of Falsified Financial Statements: A Comparative Analysis,"
Learning by Meta-Learning," in Proceedings of the Knowledge International Journal of Intelligent Systems in Accounting and
Discovery in Databases Workshop , pp. 227-240. Finance Management (16), pp. 207-229.
Chapelle, O., Schölkopf, B., and Zien, A. 2006. Semi-Supervised Giraud-Carrier, C., Vilalta, R., and Brázdil, P. 2004. "Introduction
Learning , Cambridge, MA: MIT Press. to the Special Issue on Meta-Learning," Machine Learning (54),
Charles, S. L., Glover, S. M., and Sharp, N. Y. 2010. "The Asso- pp. 187-193.
ciation Between Financial Reporting Risk and Audit Fees Before Green, B. P., and Calderon, T. G. 1995. "Analytical Procedures
and After the Historic Events Surrounding SOX," Auditing: A and the Auditor's Capacity to Detect Management Fraud,"
Journal of Practice and Theory (29:1), pp. 15-39. Accounting Enquiries: A Research Journal (5:2), pp. 1-48.
Chen, W., and Du, Y. 2009. "Using Neural Networks and Data Green, B. P., and Choi, J. H. 1997. "Assessing the Risk of
Mining Techniques for the Financial Distress Prediction Model," Management Fraud through Neural Network Technology,"
Expert Systems with Applications (36), pp. 4075-4086. Auditing (16:1), pp. 14-28.
Chung, W., Chen, H., and Nunamaker Jr., J. F. 2005. "A Visual Hansen, J. V., and Nelson, R. D. 2002. "Data Mining of Time
Framework for Knowledge Discovery on the Web: An Empirical Series Using Stacked Generalizes," Neurocomputing (43), pp.
Study of Business Intelligence Exploration," Journal of Manage- 173-184.

ment Information Systems (21:4), pp. 57-84. Hevner, A. R., March, S. T., Park, J., and Ram, S. 2004. "Design
Coderre, D. 1999. "Computer- Assisted Techniques for Fraud Science in Information Systems Research," MIS Quarterly
Detection," The CPA Journal (69:8), pp. 57-59. (28:1), pp. 75-105.

MIS Quarterly Vol. 36 No. 4/December 2012 1 325

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et al./A Meta-Learning Framework for Detecting Financial Fraud

Hu, M. Y., and Tsoukalas, C. 2003. "Explaining Consumer Choice Mitchell, T. M. 1997. Machine Learning, New York: McGraw-
through Neural Networks: The Stacked Generalization Ap- Hill.

proach," European Journal of Operational Research ( 1 46:3), pp. Persons, O. S. 1995. "Using Financial Statement Data to Identify
650-660. Factors Associated with Fraudulent Financial Reporting,"
Kaminski, K. A., Wetzel, T. S., and Guan, L. 2004. "Can Financial Journal of Applied Business Research (11 :3), pp. 38-46.
Ratios Detect Fraudulent Financial Reporting," ManagerialPhua, C., Alahakoon, D., and Lee, V. 2004. "Minority Report in
Auditing Journal (19:1), pp. 15-28. Fraud Detection: Classification of Skewed Data," ^ CM SIGKDD
Kinney, W. R., Jr. 1987. "Attention-Directing Analytical Review Explorations Newsletter (6:1), pp. 50-59.
Using Accounting Ratios: A Case Study," Auditing : A JournalPiramuthu, S., Ragavan, H., and Shaw, M. J. 1998. "Using Feature
of Practice and Theory , Spring, pp. 59-73. Construction to Improve the Performance of Neural Networks,"
Kirkos, E., Spathis, C., and Manolopoulos, Y. 2007. "Data MiningManagement Science (44:3), pp. 416-430.
Techniques for the Detection of Fraudulent Financial State-Quinlan, R. 1986. "Induction of Decision Trees," Machine
ments," Expert Systems with Applications (32), pp. 995-1003. Learning ( 1 : 1 ), pp. 8 1 - 1 06.

Kohavi, R. 1996. "Scaling Up the Accuracy of Naïve Bayes Rendell, L., Seshu, R., and Tcheng, D. 1987. "Layered Concept-
Classifiers: A Decision Tree Hybrid," in Proceedings of the 2nd Learning and Dynamically- Variable Bias Management," in Pro-
International Conference on Knowledge Discovery and Data ceedings of the 10th International Joint Conference on Artificial

Mining , Portland, OR, August 2-4, pp. 202-207. Intelligence , San Francisco: Morgan Kaufmann, pp. 308-314.
Kuhn, J. R., Jr., and Sutton, S. G. 2006. "Learning from Shmueli, G., Patel, N., and Bruce, P. 2010. Data Mining for
WorldCom: Implications for Fraud Detection and Continuous Business Intelligence (2nd ed.), Hoboken, NJ: Wiley & Sons.
Assurance," Journal of Emerging Technologies in AccountingSigletos, G., Paliouras, G., Spyropoulos, C. D., and Hatzopoulos, M.
2005 . "Combining Information Extraction Systems Using Voting
(3), pp. 61-80.
Kuncheva, L. I., and Whitaker, C. J. 2003. "Measures of Diversity
and Stacked Generalization," Journal of Machine Learning
Research (6), pp. 1751-1782.
in Classifier Ensembles and Their Relationship with the
Spathis, C. T. 2002. "Detecting False Financial Statements Using
Ensemble Accuracy," Machine Learning (51:2), pp. 181-207.
Published Data: Some Evidence from Greece," Managerial
Langley, P., Zytkow, J. M., Simon, H. A., and Bradshaw, G. L.
Auditing Journal (17:14), pp. 179-191.
1986. "The Search for Regularity: Four Aspects of Scientific
Spathis, C. T., Doumpos, M., and Zopounidis, C. 2002. "Detecting
Discovery," in Machine Learning: An Artificial Intelligence
Falsified Financial Statements: A Comparative Study Using
Approach , Vol. II, S. R. Michalski, G. J. Carbonell, and M. T.
Multicriteria Analysis and Multivariate Statistical Techniques,"
Mitchell (eds.), San Francisco: Morgan Kaufman, pp. 425-470.
The European Accounting Review (11 :3), pp. 509-535.
Lin, J. W., Hwang, M. I., and Becker, J. D. 2003. "A Fuzzy Neural
Stempel, J. 2009. "Fraud Seen Growing Faster in Financial Sec-
Network for Assessing the Risk of Fraudulent Financial
tor," Reuters , October 19 (http://www.reuters.com/article/2009/
Reporting," Managerial Auditing Journal (18:8), pp. 657-665.
1 0/1 9/businesspro-us-fraud-study-idUSTRE59I5592009 1019).
Lynam, T. R., and Cormack, G. V. 2006. "On-line Spam Filtering
Storey, V., Burton- Jones, A., Sugumaran, V., and Purao, S. 2008.
Fusion," in Proceedings of the 29th Annual International ACM
"CONQUER: A Methodology for Context-Aware Query Pro-
SIGIR Conference on Research and Development in Information
cessing on the World Wide Web," Information Systems Research
Retrieval , Seattle, WA, August 6-11, pp. 123-130.
(19:1), pp. 3-25.
Maletta, M., and Wright, A. 1996. "Audit Evidence Planning: An
Summers, S. L., and Sweeney, J. T. 1998. "Fraudulently Misstated
Examination of Industry Error Characteristics," Auditing : A
Financial Statements and Insider Trading: An Empirical
Journal of Practice and Theory (15), pp. 71-86.
Analysis," The Accounting Review (73:1), pp. 131-146.
March, S. T., and Smith, G. 1995. "Design and Natural Science
Tian, Y., Weiss, G. M., and Ma, Q. 2007. "A Semi-Supervised
Research on Information Technology," Decision Support Systems
Approach for Web Spam Detection Using Combinatorial Feature-
(15:4), pp. 251-266. Fusion," in Proceedings of the ECML Graph Labeling Work-
Markus, M. L., Majchrzak, A., and Gasser, L. 2002. "A Design shops ' Web Spam Challenge , Warsaw, Poland, September 17-21,
Theory for Systems That Support Emergent Knowledge Pro- pp. 16-23.
cesses," MIS Quarterly (26:3), pp. 179-212. Ting, K. M., and Witten, I. H. 1997. "Stacked Generalization:
Martin, B. 1995. "Instance-Based Learning: Nearest Neighbor When Does It Work?" in Proceedings of the 15th Joint Interna-
with Generalization," unpublished Master's Thesis, University of tional Conference on Artificial Intelligence , San Francisco:
Waikato, Computer Science Department, Hamilton, New Morgan Kaufmann., pp. 866-871.
Zealand.
Tsoumakas, G., Angelis, L., and Vlahavas, I. 2005. "Selective
Matheus, C. J., and Rendell, L. A. 1989. "Constructive Induction Fusion of Heterogeneous Classifiers," Intelligent Data Analysis
on Decision Trees," in Proceedings of the 1 1th International Joint (9), pp. 511-525.
Conference on Artificial Intelligence , San Mateo, CA: Morgan Vapnik, V. 1999. The Nature of Statistical Learning Theory ,
Kaufman, pp. 645-650. Berlin: Springer- Verlag.
Michalewicz, Z., Schmidt, M., Michalewicz, M., and Chiriac, C.
Vercellis, C. 2009. Business Intelligence: Data Mining and Opti-
2007. Adaptive Business Intelligence , New York: Springer. mization for Decision Making , Hoboken, NJ: Wiley.

1 326 MIS Quarterly Vol. 36 No. 4/December 2012

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms
Abbasi et ai/A Meta-Learning Framework for Detecting Financial Fraud

Vilalta, R., and Drissi, Y. 2002. "A Perspective View and Survey ACM Transactions on Information Systems. He is a member of the
of Meta-Learning, "Artificial Intelligence Review ( 1 8), pp. 77-95. AIS and IEEE.

Vilalta, R., Giraud-Carrier, C., Brázdil, P., and Soares, C. 2004.

"Using Meta-Learning to Support Data Mining," International Conan C. Albrecht is an associate professor in the Information
Journal of Computer Science and Applications (1 : 1), pp. 3 1-45. Systems Department at the Marriott School of Management,
Virdhagriswaran, S., and Dakin, G. 2006. "Camouflaged Fraud Brigham Young University. He received his Ph.D. in MIS from the
Detection in Domains with Complex Relationships," in Pro- University of Arizona and his Master's of Accountancy from
ceedings of the 12th ACM SIGKDD Conference on Knowledge Brigham Young University. His research is focused on algorithms
Discovery and Data Mining , New York: ACM, pp. 941-947. and data structures that support computer-based fraud detection.
Walls, J. G., Widmeyer, G. R., and El Sawy, O. A. 1992. "Building Conan has been involved in fraud detection systems at British
an Information System Design Theory for Vigilant EIS," Petroleum, the World Bank, and the State of North Carolina, and he
Information Systems Research (3:1), pp. 36-59. has trained auditors and fraud detectives for the Association of
Watson, H. J., and Wixom, B. H. 2007. "The Current State of Busi- Certified Fraud Examiners. His work is published in Computational
ness Intelligence," IEEE Computer (40:9), pp. 96-99. Intelligence , Decision Support Systems , Communications of the
Witten, I. H., and Frank, E. 2005. Data Mining : Practical ACM , Information & Management , and IEEE Transactions on
Machine Learning Tools and Techniques (2nd ed.), San Francisco: Systems, Man, and Cybernetics. Conan is the author of the Picalo
Morgan Kaufmann. open source fraud detection system, available at
Wolpert, D. H. 1992. "Stacked Generalization," Neural Networks http://www.picalo.org.
(6), pp. 241-259.
Yue, D., Wu, X., Wang, Y., and Li, Y. 2007. "A Review of Data Anthony Vance is as an assistant professor in the Information
Mining-Based Financial Fraud Detection Research," in Systems Department at the Marriott School of Management, Brig-
Proceedings of the International Conference on Wireless Com- ham Young University. He has earned Ph.D. degrees in Information
munications Networking and Mobile Computing , Shanghai, Systems from Georgia State University, USA; the University of
September 21-25, pp. 5519-5522. Paris-Dauphine, France; and the University of Oulu, Finland. He
Zekany, K., Braun, L., and Warder, Z. 2004. "Behind Closed received a Master's of Information Systems Management from
Doors at WorldCom: 2001," Issues in Accounting Education Brigham Young University, during which he was also enrolled in the
(19:10), pp. 101-117. IS Ph.D. preparation program. His previous experience includes
Zhao, H., Sinha, A. P., and Ge, W. 2009. "Effects of Feature working as a visiting research professor in the Information Systems
Construction on Classification Performance: An Empirical Study Security Research Center at the University of Oulu, where he
in Bank Failure Prediction," Expert Systems with Applications remains a research fellow. He also worked as an information

(36), pp. 2633-2644. security consultant and fraud analyst for Deloitte. His work is
Zhou, Y., and Goldman, S. 2004. "Democratic Co-learning," in published in MIS Quarterly , Journal of Management Information
Proceedings of the 16th IEEE International Conference on Tools Systems, European Journal of Information Systems , Journal of the
with Artificial Intelligence , pp. 594-202. American Society for Information Science and Technology , Infor-
Zhou, Z. and Li, M. 2005. "Tri-Training: Exploiting Unlabeled mation & Management , Journal of Organizational and End User
Data Using Three Classifiers," /2s£is Transactions on Knowledge Computing , and Communications of the AIS. His research interests
and Data Engineering ( 1 7: 1 1 ), pp. 1 529- 1 54 1 . are information security and trust in information systems.

James V. Hansen is the J. Owen Cherrington Professor of Infor-

About the Authors mation Systems in the Information Systems Department at the
Marriott School of Management, Brigham Young University. He
received his Ph.D. in Computer Science (Machine Learning) from
Ahmed Abbasi is an assistant professor of Information Technology
the University of Washington, Seattle. He is an associate editor for
in the Mclntire School of Commerce at the University of Virginia.
He received his Ph.D. in MIS from the University of Arizona andIEEE
an Intelligent Systems and a former associate editor for
M.B.A. and B.S. in Information Technology from Virginia Tech. Accounting Review. His research focus is on machine learning and
His research interests include fraud detection, online security, data
and mining, with related publications appearing in Management
text mining. Ahmed's projects on Internet fraud and social media Science , MIS Quarterly , ACM Computing Surveys , Computational
analytics have been funded by the National Science Foundation. Intelligence
His , IEEE Transactions on Neural Networks and Learning ,
research has appeared, among other outlets, in MIS Quarterly IEEE
, Transactions on Knowledge and Data Engineering , Commu-
nications
Journal of Management Information Systems , IEEE Transactions on of the ACM , Decision Support Systems , and Decision
Sciences , among others.
Knowledge and Data Engineering , IEEE Intelligent Systems , and

MIS Quarterly Vol. 36 No. 4/December 2012 1327

This content downloaded from

58.84.60.225 on Sat, 15 Mar 2025 03:54:08 UTC
All use subject to https://about.jstor.org/terms

Abbas I 2012
No ratings yet
Abbas I 2012
36 pages
Financial Statement Fraud Challenges and
No ratings yet
Financial Statement Fraud Challenges and
11 pages
OpenAI's Role in Financial Fraud Detection
No ratings yet
OpenAI's Role in Financial Fraud Detection
19 pages
22aa06 - Arun Kumar R
No ratings yet
22aa06 - Arun Kumar R
18 pages
Strategic Fraud Detection
No ratings yet
Strategic Fraud Detection
30 pages
Expert Systems With Applications
No ratings yet
Expert Systems With Applications
11 pages
Cecchini 2010
No ratings yet
Cecchini 2010
16 pages
Machine Learning for Fraud Detection
No ratings yet
Machine Learning for Fraud Detection
14 pages
2011 DSS Detecting Evolutionary Financial Statement Fraud PDF
No ratings yet
2011 DSS Detecting Evolutionary Financial Statement Fraud PDF
7 pages
Ashtiani 2022
No ratings yet
Ashtiani 2022
22 pages
Framework for Detecting Financial Fraud
No ratings yet
Framework for Detecting Financial Fraud
16 pages
The Role of AI and Machine Learning in Fraud Detection: Enhancing Risk Management in Corporate Finance
No ratings yet
The Role of AI and Machine Learning in Fraud Detection: Enhancing Risk Management in Corporate Finance
20 pages
AI in Financial Fraud Detection
No ratings yet
AI in Financial Fraud Detection
25 pages
Fame Main
No ratings yet
Fame Main
41 pages
AI Fraud Detection
No ratings yet
AI Fraud Detection
19 pages
Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037
No ratings yet
Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037
18 pages
Deep Learning For Detecting Financial Statement Fraud
No ratings yet
Deep Learning For Detecting Financial Statement Fraud
46 pages
A Review of Data Mining-Based Financial Fraud Detection Research
No ratings yet
A Review of Data Mining-Based Financial Fraud Detection Research
4 pages
AI-Powered Fraud Detection in Real-Time Financial Transactions
No ratings yet
AI-Powered Fraud Detection in Real-Time Financial Transactions
11 pages
An Analysis On Financial Statement Fraud Detection For Chinese Listed Companies Using Deep Learning
No ratings yet
An Analysis On Financial Statement Fraud Detection For Chinese Listed Companies Using Deep Learning
17 pages
Fraud Detection Based On Data Mining
No ratings yet
Fraud Detection Based On Data Mining
11 pages
Fraud Detection with Machine Learning
100% (1)
Fraud Detection with Machine Learning
25 pages
MetaFraud: Detecting Financial Fraud
No ratings yet
MetaFraud: Detecting Financial Fraud
37 pages
899-Article Text-2373-1-10-20240316
No ratings yet
899-Article Text-2373-1-10-20240316
11 pages
Case Study Front Page
No ratings yet
Case Study Front Page
11 pages
Paper 9
No ratings yet
Paper 9
13 pages
A Review A Review of Financial Accounting Fraud Detection Based On Data Mining Techniquesof Financial Accounting Fraud Detection Based On Data Mining Techniques
No ratings yet
A Review A Review of Financial Accounting Fraud Detection Based On Data Mining Techniquesof Financial Accounting Fraud Detection Based On Data Mining Techniques
11 pages
DP Fraud Detection BANKING
No ratings yet
DP Fraud Detection BANKING
9 pages
Fraud Detection Automation Through Data Analytics and Artificial Intelligence
No ratings yet
Fraud Detection Automation Through Data Analytics and Artificial Intelligence
18 pages
216 649 4 PB - 3
No ratings yet
216 649 4 PB - 3
9 pages
LabVIEW Lab Report: Arduino Integration
No ratings yet
LabVIEW Lab Report: Arduino Integration
7 pages
Mitsubishi Engine Di k3c k3d K3e k3f K4e k4f k4m Service Manual Hh0854-300
100% (1)
Mitsubishi Engine Di k3c k3d K3e k3f K4e k4f k4m Service Manual Hh0854-300
158 pages
Panduan Teknis MEMO 2022
No ratings yet
Panduan Teknis MEMO 2022
22 pages
Elective Ict Parcticals 2025 SHS 3 First Term
No ratings yet
Elective Ict Parcticals 2025 SHS 3 First Term
2 pages
Class 12 (IP) PT.1question Paper2024-25
No ratings yet
Class 12 (IP) PT.1question Paper2024-25
3 pages
SL Editor Quick Guide EN
No ratings yet
SL Editor Quick Guide EN
9 pages
Architecture by Yourself Participation - Literature-Review - Thvardouli PDF
No ratings yet
Architecture by Yourself Participation - Literature-Review - Thvardouli PDF
10 pages
Lab 1 Linear Lna Design 195
No ratings yet
Lab 1 Linear Lna Design 195
49 pages
Ceng240 Week 01 0.5.5
No ratings yet
Ceng240 Week 01 0.5.5
111 pages
C++ in 10 Hours PDF
No ratings yet
C++ in 10 Hours PDF
234 pages
Campus Connect Project Guide
No ratings yet
Campus Connect Project Guide
8 pages
Wireless Patient Monitoring System
67% (6)
Wireless Patient Monitoring System
26 pages
Day 1 - Java Seminar Notes
No ratings yet
Day 1 - Java Seminar Notes
32 pages
Screenshot 2023-12-20 at 5.57.31 PM
No ratings yet
Screenshot 2023-12-20 at 5.57.31 PM
3 pages
Module 3 - Agents Environments
No ratings yet
Module 3 - Agents Environments
7 pages
You Exec - Digital Transformation Part 2 Free
No ratings yet
You Exec - Digital Transformation Part 2 Free
12 pages
TLE ICT CSS - Q1 Notes 2 - Tools and Equipment
No ratings yet
TLE ICT CSS - Q1 Notes 2 - Tools and Equipment
2 pages
1710.P1-FD-FG-08-LST-0102 - 0 - IO List For FGS
No ratings yet
1710.P1-FD-FG-08-LST-0102 - 0 - IO List For FGS
4 pages
SOP Sample For Masters
No ratings yet
SOP Sample For Masters
3 pages
KRA Template V1.0
No ratings yet
KRA Template V1.0
6 pages
Full Stack Developer: Spring Boot & SOAP
No ratings yet
Full Stack Developer: Spring Boot & SOAP
6 pages
1ZBG000828 - en TXpert Hub Power Technical Brochure
No ratings yet
1ZBG000828 - en TXpert Hub Power Technical Brochure
14 pages
Fire Alarm: Harrington
No ratings yet
Fire Alarm: Harrington
4 pages
Sensing - Universal PID-Thermoregulator: Application Note Abstract
100% (2)
Sensing - Universal PID-Thermoregulator: Application Note Abstract
19 pages
Fuel Pump to EV Charging Station Project
No ratings yet
Fuel Pump to EV Charging Station Project
7 pages
PMS0233 Paw3335db Tzdu DS R1.11 101218.
No ratings yet
PMS0233 Paw3335db Tzdu DS R1.11 101218.
51 pages
Email Accounts and Details Summary
No ratings yet
Email Accounts and Details Summary
1 page
Easy English Computer Dialogue
No ratings yet
Easy English Computer Dialogue
1 page
Cost Function Loss Function
No ratings yet
Cost Function Loss Function
7 pages
JLR Approved J2534 VCI Update Guide
100% (3)
JLR Approved J2534 VCI Update Guide
28 pages