
THE ACCOUNTING REVIEW
Vol. 79, No. 4
2004
pp. 1075–1094

The Balanced Scorecard: The Effects of
Assurance and Process Accountability on
Managerial Judgment
Theresa Libby
Wilfrid Laurier University
Steven E. Salterio
Queen's University
Alan Webb
University of Waterloo
ABSTRACT: The balanced scorecard is one of the major developments in management
accounting in the past decade (Ittner and Larcker 2001). Lipe and Salterio (2000) find
that managers ignore one of the key scorecard features, the inclusion of measures that
are unique to the strategic objectives of a business unit, when making performance
evaluation judgments. This study identifies and tests two approaches to reducing this
"common measures bias." We examine vi/hether increasing effort via invoking process
accountability (i.e., requiring managers to justify to their superior their performance
evaluations) and/or improving the perceived quality of the balanced scorecard mea-
sures (i.e., via an independent third-party assurance report on the balanced scorecard)
increases managers' usage of unique performance measures in their evaluations. Re-
sults suggest that either the requirement to justify an evaluation to a superior or the
provision of an assurance report on the balanced scorecard increases the use of unique
measures in managerial performance evaluation judgments. Implications for theory and
practice are discussed.
Keywords: balanced scorecard; performance measures; performance evaluation; de-
biasing; assurance; justification; process accountability.

We thank Linda Thorne, Kate Bewley, Howard Teall, Claude Lanfranconi, Joan Conrad, Jason Lee, and Peter
Tiessen for help in obtaining participants. We thank Bryan Cloyd, Susan Krische, Bill Messier, and Natalia Kotch-
etova for their insightful comments on previous versions of the paper. In addition, we thank accounting research
workshop participants at the Richard Ivey School of Business, the University of Western Ontario, Concordia
University (Montreal), University of Toronto, Queen's University, Virginia Polytechnic & State University, Georgia
State University, and the University of Waterloo, the Accounting Brownbag seminar at the University of Illinois,
and the Social Psychology Brownbag seminars at the University of Waterloo and the University of Illinois. We
thank conference participants at the AAA Audit Section Midyear Meeting, the AAA MAS Section Annual Con-
ference, and the AAA Annual Meeting, especially our discussants Anne Farrell, Monte Swain, and Wendy Bailey.
We thank Jane Kennedy (associate editor) and two anonymous reviewers for many helpful suggestions. Finally,
we thank Guoping Liu and Rawan El-Khatib for excellent research assistance. We are grateful to the Canadian
Institute of Chartered Accountants' Academic Development Fund and the UW/SSHRC for project funding.
Editor's note: This paper was accepted by Jane Kennedy, Editor.
Submitted March 2003
Accepted March 2004


I. INTRODUCTION

The balanced scorecard (BSC) is a performance measurement tool used to translate
an organization's strategic goals into financial and nonfinancial objectives and per-
formance measures (Kaplan and Norton 2001). In diversified organizations, individ-
ual business units may face different competitive pressures, operate in different product
markets, and may therefore require different divisional strategies (Kaplan and Norton 1993).
Consequently, business units may develop customized scorecards to fit their unique situa-
tions within the context of the overall organizational strategy (Kaplan and Norton 2001).
Customized scorecards allow business unit managers to identify and set targets for the
business unit's own unique set of performance measures. Although business units within a
company may have several BSC measures in common, the unique measures represent what
individual business units must accomplish in order to succeed (Kaplan and Norton 1996a).
Kaplan and Norton (2001) provide several anecdotal examples of organizations that
have successfully implemented customized divisional scorecards. Lipe and Salterio (2000)
(hereafter LS 2000), however, find divisional performance evaluations reflect information
contained in the measures that are common across divisions, while failing to incorporate
the information contained in the unique performance measures. LS (2000) attribute their
results to the common measures bias. Such a bias can be problematic in this setting because
relevant measures that are ignored in post hoc performance evaluations are unlikely to be
attended to in ex ante decision making (Holmstrom and Milgrom 1991; McNamara and
Fisch 1964).
Based on results of prior analytical research on performance measurement, we argue
that managers should use both the common and unique BSC measures in performance
evaluation. This research demonstrates that non-zero weights should be attached to all
performance measures that: (1) are sensitive to a manager's actions; (2) can be measured
with some precision; (3) are not perfectly correlated with other measures in the system;
and (4) lead to outcomes desirable to the owners of the firm (Banker and Datar 1989;
Feltham and Xie 1994; Hemmer 1996; Datar et al. 2001). Although the literature offers no
prescriptions concerning the relative weighting of performance measures, the general con-
clusion is that zero weights are inappropriate (Ittner et al. 2003) unless the data quality is
sufficiently poor (Yim 2001).
Research in both performance evaluation and consumer behavior suggests more effort
is required to perform absolute versus relative evaluations of alternatives (e.g., Heneman
1986; Zhang and Markman 2001). Since participants in LS (2000) had only intrinsic in-
centives (i.e., a personal desire to do well on the task) when preparing the evaluations, they
may have defaulted to the less effortful approach of comparing the performance of the two
divisions on the common measures. Thus, the source of the bias may be motivational or
effort-related.
According to the organizational psychology research, performance evaluators tend to
place less weight on measures considered to be less reliable (e.g., Blum and Naylor 1968).
Survey results suggest that corporate executives have concerns over the quality of nontra-
ditional performance measures used in business unit scorecards (Lingle and Schiemann
1996; Ittner and Larcker 2001). It is therefore possible that managers ignore the unique
BSC measures because of concerns about their quality (e.g., reliability or relevance) (Ittner
and Larcker 2003; Yim 2001).


To address the possibility that the common measures bias is effort-related, we examine
the effects of imposing process accountability by requiring managers to justify their per-
formance evaluations to their superior.¹ To address the possibility that the bias relates to
data-quality concerns, we examine the effects of providing a third-party assurance report
over the BSC measures. In principle, organizations can adopt both of these bias reduction
approaches subject to firm-specific cost/benefit considerations.
This study contributes to the literature in at least three ways. First, we examine ways
of improving managers' BSC-based performance evaluation judgments by encouraging the
incorporation of both common and unique performance measures, which we argue is ap-
propriate in many managerial settings. Second, we investigate the effects of providing an
assurance report about business performance measures on managers' use of those measures
(Reck 2001). Maines et al. (2002, 359) comment, "Whether attestation services or other
forms of reliability enhancement could affect the quality of nonfinancial measure reporting
remains largely unexplored." The AICPA and the Canadian Institute of Chartered Account-
ants (CICA) have recently developed a combined training and software program to allow
CPAs and CAs the opportunity to develop additional expertise in nontraditional performance
measures.² Once CPA/CA credentials are established in this practice area, standards more
specific than the current broad assurance and attestation standards may be issued.³ Through
development of the form and content of the assurance report, in cooperation with experts
from the CICA, we provide insights about the impact of this potential assurance service.⁴
Third, we extend the study of debiasing to the information quality domain, adding to the
accounting research literature on effort-related and internal data-related (i.e., knowledge)
debiasing techniques (Kennedy 1993, 1995).
We find that the provision of an assurance report on the BSC and/or the requirement
for process accountability increases the use of unique measures in performance evaluations.
This shift to greater use of the unique measures is consistent with both the analytical
literature on the weighting of performance measures and BSC proponents' assertions con-
cerning the appropriate use of the scorecard information.
The remainder of this paper is organized as follows. First, we describe the common
measures bias, review literature concerning the source of this bias, and then hypothesize
about approaches to reduce the bias. Next, we describe the experiment and the results. We
conclude with a discussion of our results and study limitations.

II. LITERATURE REVIEW AND HYPOTHESIS DEVELOPMENT


Slovic and MacPhillamy (1974) perform a series of experiments in which participants
are given information about a student (e.g., need for achievement, English skills, and quan-
titative skills) and then must predict the student's first year university GPA. Each participant
receives one of the three items of information in common and one piece of information
that is unique about each student. Participants tend to base their judgments on the common

¹ Although the justification of performance evaluations to a subordinate is the norm (e.g., Edwards 1989), justi-
fication of performance evaluations to superiors and even peers also occurs (e.g., Ittner et al. 2003).
² See http://www.aicpa.org/assurance (accessed July 30, 2003) for material about CPA/CA Performance Views,
a product designed for CPA/CAs to help clients develop balanced scorecard-like performance measurement
systems.
³ Section 5025 of the CICA Handbook, SSAE No. 10 in the U.S., and International Standard on Assurance
Engagements (ISAE) No. 100 all outline assurance standards for all types of assurance engagements, broad-
ening previous standards that applied to assurance on financial statements only.
⁴ This approach is consistent with calls in accounting research to provide ex ante evidence about the benefits and
costs of potential new standards (e.g., Libby and Kinney 2000; Libby et al. 2002).


information and place substantially less weight on the unique information. They label this
effect the "common measures bias."
Slovic and MacPhillamy (1974) provide outcome feedback, improve information qual-
ity, and provide monetary incentives to induce effort on the task, all in an attempt to debias
their participants' judgments. Slovic and MacPhillamy (1974) find that these approaches do
not reduce the common measures bias. It is unclear whether incentives are ineffective
because the bias is not due to lack of effort, the incentives are insufficient to motivate the
requisite effort, or the task is not effort-sensitive. Similarly, participants may not appreciate
the relevance of the outcome feedback or improved information quality because they are
unaware they have underweighted the unique information in the first place.
From a theoretical perspective, performance evaluations exhibiting the common mea-
sures bias may be of poorer quality than evaluations based on all available BSC measures.
Specifically, Holmstrom's (1979) informativeness criterion indicates that any performance
measure providing information about a manager's actions is valuable as long as some
uncertainty unique to the divisional environment exists. The existence of unique measures
in a BSC implies some degree of distinctiveness in the uncertainty of each division's op-
erating environment, suggesting performance evaluation using only the common measures
is inappropriate.⁵
Subsequent analytical research shows that the relative weights placed on performance
measures depend on the sensitivity (i.e., the change in a measure's mean value in response
to a change in the manager's action), precision (i.e., the inverse of the variance in the
measure given the manager's action), congruency, and quality of the measures (Banker and
Datar 1989; Datar et al. 2001; Feltham and Xie 1994; Ittner et al. 2003; Yim 2001). Since
these values likely vary from company to company due to different measurement ap-
proaches, specific relative weights or even a range of specific weights cannot be prescribed.
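To make the sensitivity and precision conditions concrete, consider a stylized two-measure illustration in the spirit of Banker and Datar (1989); the notation here is ours, not drawn from the studies cited above. Suppose each measure is linear in the manager's action a, so that y_i = μ_i a + ε_i with independent noise ε_i ~ N(0, σ_i²). The optimal linear aggregation then weights each measure in proportion to the product of its sensitivity (μ_i) and its precision (1/σ_i²):

$$ \beta_i \propto \frac{\mu_i}{\sigma_i^2}, \qquad \text{so that} \qquad \frac{\beta_1}{\beta_2} = \frac{\mu_1/\sigma_1^2}{\mu_2/\sigma_2^2}. $$

In this sketch a measure warrants a zero weight only in the limiting cases where it is insensitive to the manager's action (μ_i = 0) or measured with no precision (σ_i² → ∞), which is the sense in which the literature concludes that zero weights are generally inappropriate.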
We gain insight into relative weighting of performance measures from psychology
research on improving judgment (e.g., Hastie and Dawes 2001). Results of this research
demonstrate that the quality of judgment improves when managers consider all, as opposed
to only a subset of, relevant factors in making the judgment. These results hold even when
there is a large deviation between the participant's specific factor weights and the "optimal"
weights determined by linear regression (Hastie and Dawes 2001, 62–64; Dawes 1979).
Hence, in the LS (2000) environment, the actual weights employed by managers using the
common and unique measures (versus unknown "optimal" weights) are not as important
as ensuring that all relevant measures are weighted in the performance evaluation judgment.
If the unique measures represent key drivers of performance, can be influenced by man-
agers, are measurable with some accuracy, are not perfectly correlated with other measures,
and are of sufficient quality, they should be used in performance evaluation judgments even
if the optimal weight for each measure is unknown (Ittner et al. 2003).⁶
From a practice perspective, we assume that performance evaluations encourage man-
agers to pursue strategically congruent actions based on at least some strategies that are

⁵ Alternatively, a focus on common measures may be appropriate if applying relative performance evaluation
practices (RPE) (Holmstrom 1982). Under RPE, strict reliance on the common measures is viewed as an effective
means of filtering out the common uncertainty facing the two divisional managers (Frederickson 1992; Dye
1992; Antle and Smith 1986). Given a divisional organization structure, it is unlikely that the only uncertainty
facing the various business units is common to all; hence, RPE will not be effective in this setting.
⁶ Kaplan and Norton (2001) propose a "balanced" weighting scheme across the four BSC quadrants, but they
are silent on the relative weight of measures within each quadrant. However, because Kaplan and Norton (1996b)
place great emphasis on the process employed to select measures that are relevant to divisional strategy, they
would probably not advocate ignoring these measures altogether.


unique to their division. If we further assume that the scorecard has been designed so that
all measures are strategically relevant (e.g., Kaplan and Norton 1996a), then it seems rea-
sonable that both the common and unique performance measures should be considered in
performance evaluation judgments. Therefore, we examine potential causes for the common
measures bias documented by LS (2000) and means of increasing the use of unique
measures.

Bias Due to Lack of Cognitive Effort


According to Slovic and MacPhillamy (1974), decision makers fail to use unique in-
formation because it is more difficult to evaluate than common information. Specifically,
common information is direct and unambiguous while unique information requires the
evaluation of trade-offs between different attributes (e.g., the relative importance of a higher
GMAT score versus a higher need for achievement score to GPA prediction). Since the
estimation and implementation of these trade-offs is cognitively more difficult, individuals
must exert additional cognitive effort to integrate the unique information.⁷ Similarly,
Heneman (1986) suggests that a comparative performance evaluation (i.e., one employee
is compared to another) is cognitively easier to perform than an absolute evaluation (i.e.,
each employee is evaluated independently) since the evaluator is only processing infor-
mation common to both employees.
A lack of motivation to attend to information that is relevant to the judgment, but
cognitively difficult to process may affect the quality of judgment (Kennedy 1995). In the
LS (2000) context, managers may have ignored the unique BSC measures because proc-
essing them requires greater cognitive effort. Markman and Medin (1995), however, suggest
that given sufficient cognitive effort, decision makers can establish commonality across
dimensions that, on the surface, are not common. Consistent with this claim, Kurtz et al.
(2001) find that participants who engage in a more effortful comparison process are better
able to detect the similarities and relations between two analogous scenarios. Further, Zhang
and Markman (2001) find that greater task involvement (i.e., motivation) leads to greater
use of unique information in preference formation.
Establishing process accountability by informing individuals they will have to justify
their decision process before making a final decision or judgment is one way to invoke
more effortful and complete processing of available information (Tetlock 1985; Simonson
and Staw 1992; Lerner and Tetlock 1999). Numerous studies provide evidence supporting
the effort-inducing effects of process accountability (e.g., Mero and Motowidlo 1995;
Kennedy 1995; Tan and Kao 1999).⁸ If the common measures bias is due to lack of effort,
then prior research suggests invoking process accountability will result in managers apply-
ing some nonzero weight to all relevant information, including the previously ignored

⁷ Even though participants in LS (2000) were provided performance targets for each unique and common measure,
their judgments still appear to have been affected by the common measures bias. This is not surprising since
participants would still need to determine the relative importance of each unique measure to the overall per-
formance evaluation judgment, a step that Slovic and MacPhillamy (1974) argue is unnecessary for common
measures.
⁸ Although process accountability can lead to increased use of irrelevant or nondiagnostic information (Tetlock
and Boettger 1989; Siegel-Jacobs and Yates 1996), in our setting all BSC performance measures are relevant to
the strategic objectives of a division as indicated by the LS (2000) pre-test results and the Krumwiede et al.
(2002) expert panel assessments.


unique information, in preparing their performance evaluations.⁹ Incorporating other rele-
vant factors into judgment, even if the weights assigned are not statistically "optimal," can
lead to improved judgment in a wide variety of settings (Hastie and Dawes 2001; Dawes
1979).
Since LS (2000) find that their participants ignore the unique measures, any nonzero
weight attached to those measures as the result of invoking process accountability should
increase the relative use of unique versus common information resulting in better perform-
ance evaluation judgments (Holmstrom 1979; Feltham and Xie 1994; Hemmer 1996). Thus,
we formulate the following hypothesis:

H1: Managers who are required to justify their performance evaluation judgments will
be more likely to use unique performance measures in their performance evaluation
judgments than managers who are not so required.

Bias Due to Concerns about Data Quality


Many studies in the performance appraisal literature (e.g., Arvey and Murphy 1998)
and in recent analytical management accounting research (Yim 2001) examine the impor-
tance of performance measure quality to the accuracy of performance evaluations. Survey
results suggest corporate executives find both financial and nonfinancial measures to be
important in evaluating performance, but they question the quality of the nonfinancial mea-
sures (Ittner and Larcker 2001; Lingle and Schiemann 1996). Reck (2001) reports evidence
consistent with the notion that any measures of effectiveness or efficiency other than tra-
ditional financial statement measures may be considered by managers to be unreliable.
Therefore, when managers assess performance, they may place less weight on the measures
perceived to be less reliable (Blum and Naylor 1968; Yim 2001; Ittner and Larcker 2001).
Therefore, the LS (2000) managers may fail to rely on the unique BSC measures in de-
veloping performance evaluation judgments due to perceptions that measures in common
across divisions are of higher quality than measures developed within just one division.
This conclusion is consistent with Kennedy's (1993) argument that individuals will be
generally less responsive to external data, defined as information or signals from the external
environment, if they perceive the data to be of poor quality.¹⁰ Kennedy (1993) suggests that
improving the decision-maker's perception of the quality of the data may lead to improved
judgment.
According to assurance theory, assurance reports are valuable in decision making be-
cause they enhance the reliability of data in financial statement reporting (Mautz and Sharaf
1961; Libby 1979; Blackwell et al. 1998). One way to improve the perceived quality of
performance measures included in the BSC might be to provide an assurance report over
them. This is consistent with Holmstrom (1979), who argues that an information signal has
value if it provides some information about the managers' true actions, even if this signal
is imperfect. Yim (2001) shows this result is subject to a "quality of the signal" constraint
and that managers should use only a sufficiently reliable signal in evaluations. Thus, in the

⁹ While it is possible that invoking process accountability will cause excessive weight to be placed on unique
measures (i.e., a unique measures bias), it is unlikely this would occur in the LS (2000) setting since participants
otherwise tend not to weight the unique factors at all.
¹⁰ Kennedy (1993) distinguishes between external data available in the environment and internal data representa-
tions of the decision maker (i.e., knowledge stored in memory). Due to difficulties in defining BSC knowledge
in our setting and lack of consensus on how to train participants to become knowledgeable (LS 2000), we
partially control for internal data representations using participants' prior work experience and academic
background.


BSC context, third-party assurance signaling the reliability and relevance of the performance
measures included in the BSC could increase the likelihood that managers will assign
weight to those measures when making performance evaluation judgments.
Analogous to arguments that an auditor cannot certify only the individual parts of the
financial statements (CICA HB 5510.26), an assurance report must cover the complete set
of performance measures, not just the unique measures (Reck 2001).¹¹ Even if an assurance
report increases the perceived quality of all measures, the increase in perceived quality for
the unique measures would move them from a zero weight to some positive weight and
consequently lead to improved judgment (Dawes 1979). This analysis leads to the following
hypothesis about the effect of assurance on divisional managers' use of unique measures
in performance evaluation:

H2: Managers receiving third-party assurance about the quality of the BSC performance
measures will be more likely to use unique performance measures in their per-
formance evaluation judgments than managers not receiving such assurance.

IV. METHOD
Experimental Task
In this study, we use the experimental case developed by LS (2000). The case requires
participants to assume the role of a senior executive at WCS Incorporated, a women's
apparel company. The case indicates WCS has recently implemented the BSC and describes
key features of the approach. The two WCS divisions featured are: RadWear, a retail di-
vision that focuses on clothing for urban teenagers; and WorkWear, a division selling busi-
ness uniforms, using a network of sales contacts with business clients. The case provides
information about each division, its management, and describes the strategic objectives of
RadWear and WorkWear. The materials also include a BSC for each division containing
performance measures stated to be relevant to the strategic objectives described in the case.
In their role as WCS executives, participants separately evaluate the performance of
RadWear's manager and WorkWear's manager.
We use the four-category, 16-measure balanced scorecards developed by LS (2000).
Each category on the divisional scorecard contains two measures common to both divisions
and two measures unique to each division. On every measure, each division outperforms
its target, although the percentage by which the target is exceeded differs by type of per-
formance measure (common versus unique). RadWear (WorkWear) exceeds its target
performance on common (unique) measures to a greater degree than WorkWear (RadWear).
However, the sum of excess performance (i.e., total percentage above target) across all
measures (common and unique) is about the same in each division. Table 1 presents
WorkWear's balanced scorecard.¹²
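As a concrete illustration of the arithmetic behind Table 1's "% Better Than Target" column (our sketch, not part of the original case materials), the percentage is (actual − target)/target, with the comparison reversed for measures where lower values indicate better performance, such as returns to suppliers or average markdowns:

    # Illustrative sketch (not from the original case materials): the
    # "% Better Than Target" arithmetic underlying Table 1.

    def pct_better_than_target(target, actual, lower_is_better=False):
        """Percentage by which actual performance beats its target."""
        if lower_is_better:
            return (target - actual) / target * 100
        return (actual - target) / target * 100

    print(round(pct_better_than_target(24, 25), 2))                      # 4.17: return on sales
    print(round(pct_better_than_target(400, 433), 2))                    # 8.25: revenues per sales visit
    print(round(pct_better_than_target(8, 7, lower_is_better=True), 2))  # 12.5: returns to suppliers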

Design and Procedures


LS (2000) manipulate, in a 2 × 2 design, all possible combinations of performance on
common and unique measures (e.g., common and unique measures are consistently better
for one division, common measures are better for one division and unique measures are

" In addition, practitioners consulted for this study argued that it would be very difficult for ihem to associate
themselves with only the unique measures on an organization's BSC. Hence, we are increasing the mundane
realism of our expcnniental seiting and biasing against finding an assurance report effect if assurance over all
measures increases equally Ihe weights assigned to ail measures.
'- For more details about the case see LS (2000, 288-290).


TABLE 1
WorkWear's Balanced Scorecard
Targets and Actuals for the Year Ended December 31, 2000

Measureᵃ                                   Target    Actual    % Better Than Target

Financial
1. Return on sales                         24%       25%       4.17
2. Revenues per sales visit*               $400      $433      8.25
3. Sales growth                            34%       36%       5.88
4. Catalog profits*                        6%        6.5%      8.33
Customer-Related
1. Captured customers*                     20%       22.7%     13.50
2. Repeat sales                            25%       27%       8.00
3. Referrals*                              50%       51.6%     3.20
4. Customer satisfaction rating            84%       86%       2.38
Internal Business Processes
1. Returns to suppliers                    8%        7%        12.50
2. Orders filled within one week*          85%       99%       16.47
3. Average markdowns                       20%       18.5%     7.50
4. Catalog orders filled with errors*      5%        4.2%      16.00
Learning and Growth
1. M.B.A. degrees*                         12%       13.6%     13.33
2. Hours of employee training/employee     12        13        8.33
3. Certification*                          20%       21.2%     6.00
4. Employee suggestions/employee           3.1       3.2       3.22

ᵃ Measures marked with an asterisk (set in italics in the original) are the unique measures for this division.

better for the other division, etc.). We decided not to replicate the entire LS (2000) design
to ensure efficient use of our subject pool and because the complete LS (2000) results have
been replicated in several subsequent studies (e.g., Banker et al. 2004; Dilla and Steinbart
2004). Instead, we examine the setting where performance on common measures favors
one division and performance on unique measures favors the other division. Only this
pattern of results allows for the possibility of the common measures bias because when
performance on both types of measures favors the same division, the conclusion of superior
performance for that division is unequivocal (LS 2000).
From the two LS (2000) versions of the case containing offsetting differences in di-
rection for common versus unique measures, we randomly select the case where the com-
mon measures favored RadWear and the unique measures favored WorkWear. Thus, if
evaluators of RadWear rely on the common measures, they will evaluate RadWear's man-
ager as the better performer (i.e., RadWear evaluation minus WorkWear evaluation will be
significantly greater than zero).
We use a 2 × 2 × 2 design with one within-subjects factor and two between-subjects
factors. The within-subjects factor, consistent with the design used by LS (2000), is the
requirement that participants evaluate one manager from each division.¹³ The first between-
subjects factor is the requirement (or not) that participants justify their evaluation of each

" We do not vary order of division presentation or manager evaluation as LS (2000) find that order effects have
no impact on the experimental variables of interest.


divisional manager's performance before providing a final performance evaluation.¹⁴ We
designed this manipulation to induce process accountability among the participants, causing
them to use additional effort in processing the BSC measures. The case materials for one-
half of the participants indicate that upon completion of their numerical performance eval-
uations they will be "asked to explain and justify in writing how you arrived at your
decision ... The President of WCS will review the initial evaluations of the managers'
performance, so it is important to be careful and precise in completing the evaluations."
The second between-subjects factor is the provision (or not) of a third-party assurance
report. The cases for one-half of the participants explain that WCS top management engaged
ABC Chartered Accountants to provide assurance on the relevance of the scorecard mea-
sures and the reliability of the reported results for each division. ABC's assurance report
immediately precedes the scorecard results in the experimental materials for RadWear and
WorkWear. We jointly developed the form and content of the report, shown in Table 2.
with an acknowledged Canadian expert on assurance reports (Section 5025 of the CICA
Handbook) and other members of the assurance services area of the CICA.

TABLE 2
Auditor's Report on the Relevance and Reliability of the Divisional Scorecard Performance
Measures and Results

Auditor's Report
To the Management of WCS Inc.:
We have audited WCS Inc.'s disclosure of its Balanced Scorecard Targets and Actuals for each
of its divisions, and the reliability and relevance of the financial and nonfinancial performance
measures presented for each division, for the year ended December 31, 2000. The determination of
the measures to include in each division's scorecard report, and the completeness and accuracy of
the reported results are the responsibility of WCS Inc.'s management. Our responsibility is to
express an opinion, based on our audit, of the conformity of the performance measures with the
relevance and reliability criteria of the International Performance Measures Reporting Initiative.
Our audit was performed in accordance with standards for assurance engagements established
by the Canadian Institute of Chartered Accountants. Those standards require that we plan and
perform our audit to obtain reasonable assurance as a basis for our opinion. Our audit included (1)
obtaining an understanding of the strategic objectives and goals of WCS Inc. and each of its
divisions, (2) assessing whether the selected performance measures relate to the chosen strategy for
each division, (3) assessing the procedures used to produce the reported results, (4) selectively
testing the reported results, and (5) performing such other procedures as we considered necessary in
the circumstances. We believe that our audit provides a reasonable basis for our opinion.
In our opinion, in all material respects, the financial and nonfinancial performance measures
included in the Balanced Scorecard Targets and Actuals for each of the divisions of WCS Inc. for the
year ended December 31, 2000, are relevant and reliable in accordance with the criteria of the
International Performance Measures Reporting Initiative.
ABC Chartered Accountants
March 31, 2001

'"* Alerting participants to the justitieation requiremcnl after performing their evaluations would lead to self-
justilication of the decisions already made (Lemer and Tetlock 19^9). Selt-JListification is ineflective as a de-
biasing approach since it does not address the need to increase processing of the unique pertbrmance measures
prior to making evaluation decisions. In addition, since we administer materials during class time to our partic-
ipants, it was not practical to have them accountable to a third party.


Control Variables
Kennedy (1993, 1995) notes that judgments may be biased if decision-makers' internal
data representations (knowledge stored in memory) are not suited to the task they are
performing. Managers develop internal data representations from work experience and/or
academic study. These representations may allow managers to process both the common
and unique BSC measures. Prior accounting research documents the performance improving
effects of work experience and knowledge on a wide variety of accounting judgments (e.g.,
Bonner and Lewis 1990; Bonner et al. 1992; Vera-Muñoz et al. 2001; Dearman and Shields
2001). Accordingly, we elicit our participants' type and length of work experience and their
academic background and current emphasis of study, recognizing that each may be a some-
what crude proxy for internal data representations. These measured variables are covariates
in our statistical analyses.
Our hypotheses predict the justification and assurance manipulations will increase par-
ticipants' ability to use the information provided by the unique measures, reducing the
degree to which the performance evaluations are comparative in nature (Hsee 1996). Equity
theory (Adams 1965), however, suggests another possible control variable—differential
managerial emphasis on fairness in performance evaluation. According to equity theory,
individuals will judge fairness by comparing the quantity and quality of one divisional
manager's inputs and outcomes relative to the inputs and outcomes of the other manager
(Adams 1965). Common measures are obvious candidates for use in dealing with such
fairness concerns as they are readily comparable across managers (Leventhal et al. 1980;
Folger and Cropanzano 1998). Hence, the degree to which participants believe that fairness
is important to their performance evaluations is employed as a covariate in our statistical
analyses. We assess participants' concerns about fairness by asking them to respond to this
statement: "To provide a fair performance evaluation for each manager, it was necessary
to compare the performance of RadWear to WorkWear." Responses are measured on an
11-point scale with endpoints of "Strongly disagree" (−5) and "Strongly Agree" (+5).

Dependent Variable
Participants evaluate the performance of both the RadWear and WorkWear managers
on a scale from 0 to 100 that had seven descriptive labels ranging from "Reassign" to
"Excellent" performance.¹⁵ Each of the descriptive labels is explained on the evaluation
form. We employ the differences in our participants' evaluations of the two divisions as
our dependent variable. Since RadWear (WorkWear) outperformed WorkWear (RadWear)
on common (unique) measures relative to target, we interpret a positive difference in eval-
uations as indicative of greater reliance on the common measures since the total percentage
above target across all measures is the same in each division. Increased use of the unique
measures in evaluating the divisional managers' performance should therefore reduce the
differences in managerial evaluations between divisions.

Participants
Two hundred twenty-seven M.B.A. students from four public universities participated
in the experiment."" Eaeh received $20 for taking part. Participants were about halfway

¹⁵ See LS (2000, 289) for details on the exact scale used in the present study.
¹⁶ Our sample size yielded 32–58 participants per experimental condition resulting in an ex post estimated statistical
power (i.e., the probability that our tests will yield statistically significant results) of approximately 0.70 (Neter
et al. 1985; Cohen 1988).


through their M.B.A. program and all were enrolled in a graduate-level management ac-
counting course in which they were exposed to the basic concepts of the BSC. Table 3
includes descriptive statistics about the experimental participants.
The levels of experience and knowledge possessed by participants in management ac-
counting research are problematic in situations where internal data representations may
affect experimental results.¹⁷ However, we believe that our participants are appropriate for
several reasons. First, participants are drawn from a subject pool similar to LS (2000),
allowing for a legitimate comparison of findings across studies. Second, there is no reason
to believe a priori that motivational concerns and/or data-quality perceptions would be
different for more experienced or knowledgeable subjects. Third, studies that examine sta-
bility of measures included in BSCs over time indicate that new measures enter and old
measures leave BSCs on a regular basis (Ittner et al. 2003; Malina and Selto 2001). Results
in the accounting expertise literature (e.g., Bonner and Walker 1994) suggest such change
would inhibit the development of internal data representations even for more experienced
BSC participants. Finally, participants with extensive BSC backgrounds could bring a va-
riety of firm-specific experiences in performance evaluation to the experimental setting
leading to an increase in experimental noise. Thus, while a priori we believe there is no
reason to believe that our participant group is inappropriate for purposes of testing our

TABLE 3
Descriptive Statistics of M.B.A. Students Participating in Experiment
(n = 227)

                                           Mean (Standard Deviation)
Age (years)                                29.2 (4.9)
Full-time work experience (years)          5.8 (4.3)
Gender: number (percentage) male           151 (66.5)

Work Experience                            Number of Participants    %
Accounting and Finance                     48                        21.1
Marketing                                  45                        19.8
General Management                         39                        17.2
Engineering                                27                        11.9
Other                                      58                        25.6
None                                       10                        4.4
Total                                      227                       100.0%

M.B.A. Functional Academic Emphasis        Number of Participants    %
Accounting and Finance                     100                       44.0
General Management                         58                        25.6
Marketing                                  49                        21.6
Other                                      20                        8.8
Total                                      227                       100.0%

¹⁷ LS (2000, 295–296) include an extensive discussion of who might be knowledgeable and/or expert in the use
of the BSC for performance evaluations given concerns that are commonly raised about experimental participants'
differential experience or knowledge.


hypotheses, we employ data on participants' background (work experience, academic back-
ground, current emphasis of study) to control for any cross-sectional differences.

V. RESULTS
Manipulation Checks
Manipulation checks indicate participants recognize that the two divisions employ dif-
ferent performance measures (p < 0.01) and sell to different markets (p < 0.01). Partici-
pants also believe different performance measures are appropriate for the divisions (p
< 0.01). Participants who receive an assurance report believe all the performance measures
are more relevant and reliable than those who do not receive such a report (p < 0.01).¹⁸
There are no differences across experimental treatments in ease of understanding, case
difficulty, and case realism (all p > 0.50). Finally, university affiliation and interactions of
affiliation with the manipulated variables do not affect the results.
The unadjusted cell means are presented in Table 4. The common measures bias in the
control condition, calculated as the unadjusted mean difference between the evaluations of
RadWear and WorkWear, is 6.35 (std. dev. = 9.84). This difference replicates the LS (2000)
common measures bias as the difference is both significantly greater than zero (p < 0.01)
and not significantly different from the bias of 7.12 documented by LS (2000) for this
condition (all p > 0.50 using both parametric and nonparametric statistics).¹⁹

Tests of H1 and H2
We use a 2 × 2 × 2 ANCOVA to test H1 and H2. Participants' type of prior work
experience (accounting/finance or other) and measured concerns about fairness are em-
ployed as covariates (see Table 5, Panel A for adjusted means).²⁰ As seen in Table 5, Panel
B, the main effects for the justification requirement and the assurance report are not sig-
nificant (respectively p > 0.37 and p > 0.34); however, the interaction of the justification
and assurance manipulations is significant (p < 0.05) as are both covariates (p < 0.05).
Panel C of Table 5 graphically depicts the interaction form and reveals that each treatment
either alone or in combination leads to greater use of unique measures in the performance
evaluations (i.e., a smaller difference between evaluations) than in the control condition.
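For readers who want the mechanics, the following is a minimal sketch (ours, not the authors' code) of this ANCOVA in Python with statsmodels; the file and column names are assumptions, with one row per participant:

    # Minimal sketch (not the authors' code) of the ANCOVA on evaluation
    # difference scores. Assumed columns, one row per participant:
    #   diff     - RadWear evaluation minus WorkWear evaluation
    #   assure   - 1 if an assurance report was provided, else 0
    #   justify  - 1 if written justification was required, else 0
    #   acct_fin - 1 if accounting/finance work experience, else 0
    #   fairness - agreement with the fairness statement (-5 to +5)
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.read_csv("evaluations.csv")  # hypothetical data file
    model = smf.ols("diff ~ C(assure) * C(justify) + acct_fin + fairness",
                    data=df).fit()
    print(anova_lm(model, typ=2))  # F-tests: main effects, interaction, covariates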
To investigate these findings further, we perform several comparisons of the differences
in adjusted cell means. First, we use one-tailed pairwise tests to compare performance
evaluation scores in the control condition to those in each of the three treatment conditions.
As reported in Table 5, Panel D, we find that the reduction in the difference scores is at
least marginally significant (all p ≤ 0.10) in the three comparisons indicating greater use
of the unique measures by participants receiving the experimental treatments. Second, we

'" Over 90 percent of ihe participanls answered a recall question correctly abt>ut whether they were required to
write a ju.stilicati(jn lor tlieir performance evaluations. Over SO percent correctly recalled receiving an assurance
reptin. We include all participants in our analysis because the inability to recall whether ihc assurance report
was present or justilication was required d()es nol affect the fact that they were manipulated. Eliminating par-
ticipants with incorrect recall results in a similar pattern nl" the adjusted means as those reported in the text.
''* In carrying oui our statistical analysis of the control condition, we found four outliers among the .'S6 participants.
These participant.s seem to be subject to a pronounced tiniqiu- mciisures bias as their average score wa.s - I3.2.'i
(i.e., WorkWear evaluation exceeded RadWear evaluation). Inclusion ol" these participants weakens, bul does not
change the inferences that arc drawn from our statistical tests.
-" Other control variables were also included in the analysis and found to be nonsignificant (i.e.. years of full-time
work experience, area of academic emphasis, familiarity witb Ihe BSC. and years of accounting and/or finance
experienee).


TABLE 4
Managers' Unadjusted Mean Performance Evaluations and Differences

Mean Evaluations (standard deviations) of the Performance of RadWear and WorkWear
Divisions' Managers by Experimental Condition and Tests of Mean Differences from Zeroᵃ,ᵇ

                                    Written Justification of Performance Evaluations
Assurance Report                    Not Required              Required

Not provided
  RadWear                           78.37 (7.90)              76.43 (9.70)
  WorkWear                          72.02 (9.26)              74.31 (10.65)
  Difference: RadWear − WorkWear    6.35                      2.12
  t-statistic and p-value
  (difference from "0" test)        t = 4.65, p < 0.01        t = 1.32, ns
  n                                 52                        58

Provided
  RadWear                           74.83 (10.75)             75.13 (9.26)
  WorkWear                          73.10 (10.17)             71.47 (11.19)
  Difference: RadWear − WorkWear    1.73                      3.66
  t-statistic and p-value
  (difference from "0" test)        t = 1.27, ns              t = 2.58, p < 0.05
  n                                 58                        55

ᵃ Performance evaluations are made using a 101-point scale, with 0 labeled "Reassign" and 100 labeled
"Excellent" with five other qualitative labels at approximately equal intervals. See LS (2000).
ᵇ In all experimental conditions performance on common measures favors RadWear, while performance on
unique measures favors WorkWear. Hence, a statistically significant positive difference from zero in the
RadWear − WorkWear performance evaluations calculation indicates a common measures bias.

use two-tailed pairwise tests to compare the adjusted means of the three treatment condi-
tions. We find no significant differences (all p > 0.10) between the three treatment con-
ditions (results not tabulated). Third, we compare the average of the cell mean differen-
ces for the three treatment conditions to the control condition mean difference. As seen in
Table 5, Panel D, the average of the three treatment condition means is significantly lower
than the control condition mean (p < 0.05). These results indicate that the requirement to
justify and/or the provision of an assurance report reduce the common measures bias.
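Continuing the sketch above, the adjusted cell means reported in Table 5, Panel A can be reproduced under the regression approach (Neter et al. 1985) by predicting from the fitted model at each treatment cell while holding the covariates at their sample means; the code below reuses the hypothetical model and df objects from the earlier sketch:

    # Sketch (ours) of the regression approach to adjusted cell means,
    # reusing the fitted `model` and DataFrame `df` from the ANCOVA sketch.
    import itertools
    import pandas as pd

    cells = pd.DataFrame(list(itertools.product([0, 1], [0, 1])),
                         columns=["assure", "justify"])
    cells["acct_fin"] = df["acct_fin"].mean()  # hold covariates at sample means
    cells["fairness"] = df["fairness"].mean()
    cells["adj_mean"] = model.predict(cells)   # four adjusted means (cf. Panel A)
    print(cells)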
We also examine the relationship between each covariate and the performance evalu-
ation differences. Performance evaluation difference scores for participants with accounting
and finance related work experience (adjusted mean difference = .43) are significantly lower
(p < 0.05) than for those participants with other types of work experience (adjusted mean
difference = 4.19).²¹ This result is consistent with suggestions that internal data represen-
tations matter in performance evaluations. However, the work experience variable does not

²¹ The magnitude of this difference did not differ statistically across the four conditions.


TABLE 5
Analysis of Managers' Adjusted Mean Performance Evaluation Differences

Panel A: Adjusted Means (standard errors) for Differences between Managers' Evaluations of
the Performance of RadWear and WorkWear Division Managersᵃ

Assurance Report          Written Justification of Performance Evaluations (JUSTIFY)
(ASSURE)                  Not Required                Required

None provided             6.27 (1.47)                 1.92 (1.40)
                          n = 52                      n = 58
                          μ1,1 (control)              μ1,2

Provided                  2.02 (1.40)                 3.63 (1.43)
                          n = 58                      n = 55
                          μ2,1                        μ2,2

Panel B: ANCOVA on the Differences in Managers' Performance Evaluations by Conditionᵇ,ᶜ

Factor                    df    Sum of Squares    F       p-value (two-tailed)
ASSURE                    1     89.30             0.79    0.37
JUSTIFY                   1     102.69            0.91    0.34
ASSURE × JUSTIFY          1     493.95            4.38    0.04
Work experience           1     474.90            4.21    0.04
Fairness                  1     551.97            4.89    0.03
Error                     217

Panel C: Graphical Depiction of Effects of Assurance Report and Justification Requirement
on Differences in Managers' Performance Evaluationsᵈ

[Figure omitted: line graph of the Panel A adjusted means, with "No Justification" and
"Justification" on the horizontal axis and separate lines for "No Assurance report" and
"Assurance report."]

Panel D: Simple Effects Analysis Comparing Performance Evaluation Differences among the
Various Conditions and the Resultant Contrast Tests and p-valuesᵉ

Mean Differences (Standard Errors) Tested (see Panel A)          p-value
μ1,1 − μ1,2 = 4.35 (2.03)                                        p ≤ 0.04
μ1,1 − μ2,1 = 4.25 (2.03)                                        p ≤ 0.04
μ1,1 − μ2,2 = 2.64 (2.06)                                        p ≤ 0.10
μ1,1 − (μ1,2 + μ2,1 + μ2,2)/3 = 3.74 (1.69)                      p ≤ 0.03

ᵃ The cell means are calculated using the regression approach (Neter et al. 1985) to estimate values for the
dependent variable (RadWear − WorkWear) adjusted for differences in participants' work experience (accounting
and finance versus all other) and their perceptions of fairness. Analysis shows the assumption of homogeneity
of regression slopes for the control variables is satisfied.
ᵇ Definition of variables included in the ANCOVA: (1) Independent variables: ASSURE = assurance report on
relevance and reliability of BSC measures and results (provided, none provided); JUSTIFY = written
justification of performance evaluations (required, not required). (2) Control variables: Work experience
= accounting/finance, other/none; Fairness = participants' agreement with the statement "to provide a fair
performance evaluation for each manager, it was necessary to compare the performance of RadWear to
WorkWear" (11-point scale). (3) Dependent variable: the difference between participants' evaluations of the two
divisions (RadWear − WorkWear) adjusted for the control variables.
ᶜ Untabulated ANCOVA tests show no higher order interactions are significant among the variables in Panel B.
ᵈ The values used to construct the graph are the adjusted means presented in Panel A.
ᵉ Probabilities are calculated using one-tailed tests.

interact with our treatment variables. We find the correlation between the fairness proxy
and the performance evaluation differences is positive and significant (r = .14, p < 0.05).
Consistent with equity theory, participants who believe it is fairer to compare the two
divisions in preparing their evaluations tend to rely more on the common measures.²²

Further Analysis of the Results in the Combined Treatment Condition


Analysis of the unadjusted means in Table 4 indicates that the unadjusted mean in the
joint treatment condition, unlike the assurance-only and justification-only conditions, is
significantly different from zero (p < 0.05), implying that a significant common measures
bias continues in this condition. Hence, in light of this unexpected result, we compare the
contents of the justification memos in the justification-only condition to those in the joint
treatment condition.²³ Participants in the justification-only condition indicate they employ
all performance measures in making their evaluations significantly more often (p = 0.05)
than participants in the joint treatment condition.
We also reanalyze responses to the manipulation check question about assurance (i.e.,
"The BSC measures are relevant and reliable"). In the assurance-only condition, the re-
ported mean for the BSC measures' relevance and reliability is significantly greater than in
the other three conditions, and significantly greater than zero, the scale's neutral mid-point
(all p < 0.05). In the other two treatment conditions and the control condition, the means
are not significantly different from zero or from each other (all p > 0.10).

²² Neither fairness concern scores nor the strength of their association with the performance evaluation scores
differs statistically across conditions. Therefore, we rule out the possibility that our justification manipulation
created greater concern for preparing fair evaluations that could be justified to a superior.
²³ Two independent coders who were blind as to the participant treatment condition and the hypotheses under
study coded the justification memos. The two coders met and reconciled by consensus any differences in their
codings.


Overall, this additional analysis indicates that the combined treatment leads to a dif-
ferent cognitive representation of the information presented relative to the assurance-only
and justification-only conditions. However, we did not gather data that would explain why
the cognitive representations differ. Accordingly, we leave further examination of the cause
of this unexpected result to future research.

Reliability Assurance and BSC Commitment


Participants in LS (2000, 292) consider all measures, both common and unique, to be
equally relevant to the evaluation of managerial performance. Krumwiede et al. (2002)
replicate that manipulation check with a panel of 24 academics and experienced perform-
ance evaluators and find similar results.²⁴ Hence, the assurance report likely provides as-
surance over measure reliability, not relevance and reliability.
To investigate this conclusion, we replicate our assurance-only condition and the control
condition using 54 M.B.A. students with backgrounds similar to those employed in the
main experiment. We remove all references to "relevance of performance measures" from
the assurance report and the case text; otherwise, the instrument is unchanged. Similar to
the main experiment's results, we find a decrease in the common measures bias (p < 0.08).'^
Further, participants indicate that all measures are significantly more reliable in the assur-
ance condition than in the control condition (p < 0.04).
Alternatively, the provision of an assurance report over the BSC may signal to partic-
ipants that management is more committed to the BSC in the assurance condition than in
the control condition. Hence, we ask the participants in this follow-up experiment their
perceptions of senior management's commitment to the BSC. We find no significant dif-
ference between the two conditions (p > 0.30).²⁶

VI. LIMITATIONS AND CONCLUSIONS


We replicate the common measures bias documented by LS (2000) and demonstrate
that invoking process accountability via the requirement for managers to justify their eval-
uations to superiors or providing an assurance report over the BSC increases managerial
use of unique measures. Both interventions can be implemented in managerial practice,
subject to organizational circumstances including cost-benefit trade-offs.
Our experimental design has limitations. First, our experimental participants were not
involved in the design of the units' scorecards. Participants may gain a better appreciation
for the measures if they are involved in their selection. However, it is impossible for all
managers in any reasonable-sized organization to be involved in the selection of BSC
measures. Second, although all our participants were familiar with Kaplan and Norton
(1992), they were likely novices in the use of the BSC. Like most managerial groups,

²⁴ Neither LS (2000) nor Krumwiede et al. (2002) examined whether all measures were considered reliable.
²⁵ The smaller sample size of the additional experiment reduces statistical power to about 0.45.
²⁶ After the 26 participants in the control condition completed their performance evaluations, we drew their attention to the pattern of results for the common and unique measures (i.e., common favoring one division and unique favoring the other). We then asked them if they wished to change their performance evaluations based on this additional information. Of the 24 subjects who answered the question, 17 indicated that they would make no change in their evaluations and seven said they would evaluate the managers equally. Hence, it does not appear that ignoring the unique measures is a simple judgment inconsistency caused by an unintended information processing error easily remedied by drawing participants' attention to the measures' pattern (Tan et al. 2002, 240).
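As an aside on footnote 25, the power figure can be approximated as follows. This sketch, in the spirit of Cohen (1988), uses an assumed standardized effect size and equal cell sizes, which need not match the actual inputs behind the reported 0.45:

    # Illustrative power calculation for a one-tailed two-sample t-test.
    # effect_size below is an assumed Cohen's d, not the experiment's value.
    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower().power(
        effect_size=0.45,      # assumed standardized effect size
        nobs1=27,              # assumed observations per condition (54 total)
        ratio=1.0,             # equal cell sizes assumed
        alpha=0.05,
        alternative="larger",
    )
    print(f"approximate power = {power:.2f}")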

Addressing the educational background issue, Dilla and Steinbart (2005) find that undergraduate accounting and MIS students with in-depth BSC classroom experience use mainly common performance measures, with some small, but significant, reliance on unique measures. Hence, consistent with the findings for our accounting/finance work experience control variable, there is some evidence that internal data representations affect managers' use of diverse BSC measures. However, it is probably unreasonable to suggest that all managers have in-depth accounting education and/or accounting/finance work experience before making BSC-based managerial performance evaluations.
Third, our experiment took place in a one-period environment. If our participants had multiple periods to gain familiarity with the BSC's measures, then results might have differed. In practice, however, divisions often change their BSC measures on a regular basis due to changes in strategy or the availability/development of new and "better" measures (Malina and Selto 2001), and measures often differ across divisions. Hence, managers may frequently be in the position of having to deal with new and unfamiliar measures due to new measure introduction or transfer/promotion to a new division.
Fourth, a manager in our setting following a strategy of weighting particular common measures as relatively more important than unique measures (i.e., one who does not ignore unique measures but consciously chooses to place little weight on them) could produce performance evaluations that mimic the common measures bias. A necessary condition for this individual-level explanation to generalize to the group of evaluating managers in our experiment is that each member of the group employs a similar weighting strategy.
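To make this alternative concrete, the following sketch (all weights and scores hypothetical) shows how a deliberate strategy that places little weight on unique measures yields the same evaluation pattern as ignoring them outright:

    # Hypothetical illustration: a conscious low-weight strategy on unique
    # measures mimics the common measures bias (ignoring unique measures).
    common_scores = {"A": 0.70, "B": 0.60}  # common measures favor division A
    unique_scores = {"A": 0.55, "B": 0.80}  # unique measures favor division B

    def evaluate(division: str, w_unique: float) -> float:
        """Weighted average of common and unique measure performance."""
        return (1.0 - w_unique) * common_scores[division] + w_unique * unique_scores[division]

    for w_unique in (0.0, 0.1, 0.5):
        gap = evaluate("A", w_unique) - evaluate("B", w_unique)
        print(f"weight on unique = {w_unique:.1f}: A minus B = {gap:+.3f}")
    # At weights 0.0 and 0.1 the gap favors A, just as under the bias;
    # only a substantially larger weight lets the unique measures reverse it.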
This study contributes to the literature by examining ways of increasing the information set managers use to evaluate divisional performance by incorporating both common and unique BSC performance measures. This research provides the first evidence that an assurance report over the BSC is useful in managerial decision making, which, until this study, was an open research question (Maines et al. 2002, 359). Results suggest that auditing
and assurance regulators, standard setters, and public accounting firms and their clients may
wish to continue to examine the nature and value of assurance reports in the area of
performance measurement. Our results also imply that senior management should require
divisional managers to justify their performance evaluations when employing BSCs con-
taining both common and unique performance measures. Either of these approaches in-
creases the likelihood that managers will use all relevant information contained in the BSC
in making performance evaluation judgments.

REFERENCES
Adams, J. S. 1965. Inequity in social exchange. In Advances in Experimental Social Psychology, Volume 2, edited by L. Berkowitz, 267-299. San Diego, CA: Academic Press.
Antle, R., and A. Smith. 1986. An empirical investigation of relative performance evaluation of corporate executives. Journal of Accounting Research 24 (1): 1-39.
Arvey, R. D., and K. R. Murphy. 1998. Performance evaluation in work settings. Annual Review of Psychology 49: 141-168.
Banker, R. D., and S. M. Datar. 1989. Sensitivity, precision and linear aggregation of signals for performance evaluation. Journal of Accounting Research 27 (1): 21-40.
Banker, R. D., H. Chang, and M. Pizzini. 2004. The balanced scorecard: Judgment effects of performance measures linked to strategy. The Accounting Review 79 (1): 1-23.


Blackwell, D. W., T. R. Noland, and D. B. Winters. 1998. The value of auditor assurance: Evidence from loan pricing. Journal of Accounting Research (Spring): 57-70.
Blum, M. L., and J. C. Naylor. 1968. Industrial Psychology: Its Theoretical and Social Foundations. New York, NY: Harper and Row.
Bonner, S. E., and B. L. Lewis. 1990. Determinants of auditor expertise. Journal of Accounting Research 28 (Supplement): 1-20.
Bonner, S. E., J. Davis, and B. R. Jackson. 1992. Expertise in corporate tax planning: The issue identification stage. Journal of Accounting Research 30 (Supplement): 1-28.
Bonner, S. E., and P. Walker. 1994. The effects of instruction and experience on the acquisition of auditing knowledge. The Accounting Review 69 (1): 157-178.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum Associates.
Datar, S. M., S. C. Kulp, and R. A. Lambert. 2001. Balancing performance measures. Journal of Accounting Research 39 (1): 75-93.
Dawes, R. 1979. The robust beauty of improper linear models in decision making. American Psychologist 34 (7): 571-582.
Dearman, D. T., and M. D. Shields. 2001. Cost knowledge and cost-based judgment performance. Journal of Management Accounting Research 13: 1-19.
Dilla, W., and P. Steinbart. 2005. Relative weighting of common and unique balanced scorecard measures by knowledgeable decision makers. Behavioral Research in Accounting (forthcoming).
Dye, R. A. 1992. Relative performance evaluation and project selection. Journal of Accounting Research 30 (1): 27-51.
Edwards, M. R. 1989. Making performance appraisals meaningful and fair. Business (Jul-Sep): 17-25.
Feltham, G. A., and J. Xie. 1994. Performance measure congruity and diversity in multi-task principal/agent relations. The Accounting Review 69 (3): 429-453.
Folger, R., and R. Cropanzano. 1998. Organizational Justice and Human Resource Management. Foundations for Social Science Series. Thousand Oaks, CA: Sage Publications.
Frederickson, J. R. 1992. Relative performance information: The effects of uncertainty and contract type on agent effort. The Accounting Review 67 (4): 647-669.
Hastie, R., and R. M. Dawes. 2001. Rational Choice in an Uncertain World. Thousand Oaks, CA: Sage Publications.
Hemmer, T. 1996. On the design and choice of "modern" management accounting measures. Journal of Management Accounting Research 8: 87-116.
Heneman, R. L. 1986. The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology 39 (4): 811-827.
Holmstrom, B. 1979. Moral hazard and observability. Bell Journal of Economics 10 (1): 74-91.
Holmstrom, B. 1982. Moral hazard in teams. Bell Journal of Economics 13 (2): 324-340.
Holmstrom, B., and P. Milgrom. 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, and Organization 7: 24-52.
Hsee, C. 1996. The evaluability hypothesis: An explanation for preference reversals between joint and separate evaluations of alternatives. Organizational Behavior and Human Decision Processes 67 (3): 247-257.
Ittner, C., and D. Larcker. 2001. Assessing empirical research in managerial accounting: A value-based management perspective. Journal of Accounting and Economics 32: 349-410.
Ittner, C., and D. Larcker. 2003. Coming up short on nonfinancial performance measurement. Harvard Business Review 81 (11): 88-95.
Ittner, C., D. Larcker, and M. Meyer. 2003. Subjectivity and the weighting of performance measures: Evidence from a balanced scorecard. The Accounting Review 78 (3): 725-758.
Kaplan, R., and D. Norton. 1992. The balanced scorecard: Measures that drive performance. Harvard Business Review (January-February): 71-79.
Kaplan, R., and D. Norton. 1993. Putting the balanced scorecard to work. Harvard Business Review (September-October): 134-147.


Kaplan, R., and D. Norton. 1996a. The Balanced Scorecard: Translating Strategy into Action. Boston, MA: Harvard Business School Press.
Kaplan, R., and D. Norton. 1996b. Linking the balanced scorecard to strategy. California Management Review 39 (1): 53-79.
Kaplan, R., and D. Norton. 2001. The Strategy-Focused Organization: How Balanced Scorecard Companies Thrive in the New Business Environment. Boston, MA: Harvard Business School Press.
Kennedy, J. 1993. Debiasing audit judgment with accountability: A framework and experimental results. Journal of Accounting Research 31 (2): 231-245.
Kennedy, J. 1995. Debiasing the curse of knowledge in audit judgment. The Accounting Review 70 (2): 249-273.
Krumwiede, K. R., M. R. Swain, and D. L. Eggett. 2002. The effects of feedback, prior experience, and division similarity on the utilization of unique performance measures in multi-division evaluations. Working paper, Brigham Young University.
Kurtz, K. J., C. Miao, and D. Gentner. 2001. Learning by analogical bootstrapping. Journal of the Learning Sciences 10 (4): 417-446.
Lerner, J. S., and P. E. Tetlock. 1999. Accounting for the effects of accountability. Psychological Bulletin 125 (2): 255-275.
Leventhal, G. S., J. Karuza Jr., and W. R. Fry. 1980. Beyond fairness: A theory of allocation preferences. In Justice and Social Interaction, edited by G. Mikula, 167-218. Bern, Switzerland: Hans Huber Publishers.
Libby, R. 1979. Bankers' and auditors' perceptions of the message communicated by the audit report. Journal of Accounting Research 17 (Spring): 99-122.
Libby, R., and W. R. Kinney. 2000. Earnings management, audit differences, and analysts' forecasts. The Accounting Review 75 (4): 393-404.
Libby, R., R. Bloomfield, and M. W. Nelson. 2002. Experimental research in financial accounting. Accounting, Organizations and Society 27 (8): 775-810.
Lingle, J. H., and W. A. Schiemann. 1996. From balanced scorecard to strategic gauges: Is measurement worth it? Management Review 85 (3): 56-61.
Lipe, M. G., and S. E. Salterio. 2000. The balanced scorecard: Judgmental effects of common and unique performance measures. The Accounting Review 75 (3): 283-298.
Maines, L., E. Bartov, P. M. Fairfield, D. E. Hirst, T. E. Iannaconi, R. Mallett, C. M. Schrand, D. J. Skinner, and L. Vincent. 2002. Recommendations on disclosure of nonfinancial performance measures. Accounting Horizons 16 (4): 353-362.
Malina, M. A., and F. H. Selto. 2001. Communicating and controlling strategy: An empirical study of the effectiveness of the balanced scorecard. Journal of Management Accounting Research 13: 47-90.
Markman, A. B., and D. L. Medin. 1995. Similarity and alignment in choice. Organizational Behavior and Human Decision Processes 63 (2): 117-130.
Mautz, R., and H. Sharaf. 1961. The Philosophy of Auditing. Madison, WI: American Accounting Association.
McNamara, H., and R. Fisch. 1964. Effect of high and low motivation on two aspects of attention. Perceptual and Motor Skills 19: 571-578.
Mero, N. P., and S. J. Motowidlo. 1995. Effects of rater accountability on the accuracy and the favorability of performance ratings. Journal of Applied Psychology 80 (4): 517-525.
Neter, J., W. Wasserman, and M. H. Kutner. 1985. Applied Linear Statistical Models. Homewood, IL: Irwin.
Reck, J. L. 2001. The usefulness of financial and nonfinancial performance information in resource allocation decisions. Journal of Accounting and Public Policy 20: 45-71.
Siegel-Jacobs, K., and J. F. Yates. 1996. Effects of procedural and outcome accountability on judgment quality. Organizational Behavior and Human Decision Processes 65 (1): 1-17.
Simonson, I., and B. Staw. 1992. De-escalation strategies: A comparison of techniques for reducing commitment to losing courses of action. Journal of Applied Psychology 77 (4): 419-427.


Slovic, P., and D. MacPhillamy. 1974. Dimensional commensurability and cue utilization in comparative judgment. Organizational Behavior and Human Performance 11: 172-194.
Tan, H-T., and A. Kao. 1999. Accountability effects on auditors' performance: The influence of knowledge, problem-solving ability, and task complexity. Journal of Accounting Research 37 (1): 209-223.
Tan, H-T., R. Libby, and J. Hunton. 2002. Analysts' reactions to earnings preannouncement strategies. Journal of Accounting Research 40 (1): 223-245.
Tetlock, P. E. 1985. Accountability: The neglected social context of judgment and choice. Research in Organizational Behavior 7: 297-332.
Tetlock, P. E., and R. Boettger. 1989. Accountability: A social magnifier of the dilution effect. Journal of Personality and Social Psychology 57 (3): 388-398.
Vera-Muñoz, S. C., W. R. Kinney Jr., and S. E. Bonner. 2001. The effects of domain experience and task presentation format on accountants' information relevance assurance. The Accounting Review 76 (3): 405-429.
Yim, A. T. 2001. Renegotiation and relative performance evaluation: Why an informative signal may be useless. Review of Accounting Studies 6: 77-108.
Zhang, S., and A. B. Markman. 2001. Processing product unique features: Alignability and involvement in preference construction. Journal of Consumer Psychology 11 (1): 13-27.
