Enhancing Managerial Judgment with BSC
We thank Linda Thorne, Kate Bewley, Howard Teall, Claude Lanfranconi, Joan Conrad, Jason Lee, and Peter
Tiessen for help in obtaining participants. We thank Bryan Cloyd, Susan Krische, Bill Messier, and Natalia
Kotchetova for their insightful comments on previous versions of the paper. In addition, we thank accounting research
workshop participants at the Richard Ivey School of Business, the University of Western Ontario, Concordia
University (Montreal), University of Toronto, Queen's University, Virginia Polytechnic & State University, Georgia
State University, and the University of Waterloo, the Accounting Brownbag seminar at the University of Illinois,
and the Social Psychology Brownbag seminars at the University of Waterloo and the University of Illinois. We
thank conference participants at the AAA Audit Section Midyear Meeting, the AAA MAS Section Annual Conference,
and the AAA Annual Meeting, especially our discussants Anne Farrell, Monte Swain, and Wendy Bailey.
We thank Jane Kennedy (associate editor) and two anonymous reviewers for many helpful suggestions. Finally,
we thank Guoping Liu and Rawan El-Khatib for excellent research assistance. We are grateful to the Canadian
Institute of Chartered Accountants' Academic Development Fund and the UW/SSHRC for project funding.
Editor's note: This paper was accepted by Jane Kennedy, Editor.
Submitted March 2003
Accepted March 2004
I. INTRODUCTION
The balanced scorecard (BSC) is a performance measurement tool used to translate
an organization's strategic goals into financial and nonfinancial objectives and per-
formance measures (Kaplan and Norton 2001). In diversified organizations, individ-
ual business units may face different competitive pressures, operate in different product
markets, and may therefore require different divisional strategies (Kaplan and Norton 1993).
Consequently, business units may develop customized scorecards to fit their unique situa-
tions within the context of the overall organizational strategy (Kaplan and Norton 2001).
Customized scorecards allow business unit managers to identify and set targets for the
business unit's own unique set of performance measures. Although business units within a
company may have several BSC measures in common, the unique measures represent what
individual business units must accomplish in order to succeed (Kaplan and Norton 1996a).
Kaplan and Norton (2001) provide several anecdotal examples of organizations that
have successfully implemented customized divisional scorecards. Lipe and Salterio (2000)
(hereafter LS 2000), however, find divisional performance evaluations reflect information
contained in the measures that are common across divisions, while failing to incorporate
the information contained in the unique performance measures. LS (2000) attribute their
results to the common measures bias. Such a bias can be problematic in this setting because
relevant measures that are ignored in post hoc performance evaluations are unlikely to be
attended to in ex ante decision making (Holmstrom and Milgrom 1991; McNamara and
Fisch 1964).
Based on results of prior analytical research on performance measurement, we argue
that managers should use both the common and unique BSC measures in performance
evaluation. This research demonstrates that non-zero weights should be attached to all
performance measures that: (1) are sensitive to a manager's actions; (2) can be measured
with some precision; (3) are not perfectly correlated with other measures in the system;
and (4) lead to outcomes desirable to the owners of the firm (Banker and Datar 1989;
Feltham and Xie 1994; Hemmer 1996; Datar et al. 2001). Although the literature offers no
prescriptions concerning the relative weighting of performance measures, the general con-
clusion is that zero weights are inappropriate (Ittner et al. 2003) unless the data quality is
sufficiently poor (Yim 2001).
Research in both performance evaluation and consumer behavior suggests more effort
is required to perform absolute versus relative evaluations of alternatives (e.g., Heneman
1986; Zhang and Markman 2001). Since participants in LS (2000) had only intrinsic in-
centives (i.e., a personal desire to do well on the task) when preparing the evaluations, they
may have defaulted to the less effortful approach of comparing the performance of the two
divisions on the common measures. Thus, the source of the bias may be motivational or
effort-related.
According to research in organizational psychology, performance evaluators tend to
place less weight on measures considered to be less reliable (e.g., Blum and Naylor 1968).
Survey results suggest that corporate executives have concerns over the quality of nontra-
ditional performance measures used in business unit scorecards (Lingle and Schiemann
1996; Ittner and Larcker 2001). It is therefore possible that managers ignore the unique
BSC measures because of concerns about their quality (e.g., reliability or relevance) (Ittner
and Larcker 2003; Yim 2001).
To address the possibility that the common measures bias is effort-related, we examine
the effects of imposing process accountability by requiring managers to justify their per-
formance evaluations to their superior.¹ To address the possibility that the bias relates to
data-quality concerns, we examine the effects of providing a third-party assurance report
over the BSC measures. In principle, organizations can adopt both of these bias reduction
approaches subject to firm-specific cost/benefit considerations.
This study contributes to the literature in at least three ways. First, we examine ways
of improving managers' BSC-based performance evaluation judgments by encouraging the
incorporation of both common and unique performance measures, which we argue is ap-
propriate in many managerial settings. Second, we investigate the effects of providing an
assurance report about business performance measures on managers' use of those measures
(Reck 2001). Maines et al. (2002, 359) comment, "Whether attestation services or other
forms of reliability enhancement could affect the quality of nonfinancial measure reporting
remains largely unexplored." The AICPA and the Canadian Institute of Chartered Account-
ants (CICA) have recently developed a combined training and software program to allow
CPAs and CAs the opportunity to develop additional expertise in nontraditional performance
measures.² Once CPA/CA credentials are established in this practice area, standards more
specific than the current broad assurance and attestation standards may be issued.³ Through
development of the form and content of the assurance report, in cooperation with experts
from the CICA, we provide insights about the impact of this potential assurance service.⁴
Third, we extend the study of debiasing to the information quality domain, adding to the
accounting research literature on effort-related and internal data-related (i.e., knowledge)
debiasing techniques (Kennedy 1993, 1995).
We find that the provision of an assurance report on the BSC and/or the requirement
for process accountability increases the use of unique measures in performance evaluations.
This shift to greater use of the unique measures is consistent with both the analytical
literature on the weighting of performance measures and BSC proponents' assertions con-
cerning the appropriate use of the scorecard information.
The remainder of this paper is organized as follows. First, we describe the common
measures bias, review literature concerning the source of this bias, and then hypothesize
about approaches to reduce the bias. Next, we describe the experiment and the results. We
conclude with a discussion of our results and study limitations.
¹ Although the justification of performance evaluations to a subordinate is the norm (e.g., Edwards 1989), justification
of performance evaluations to superiors and even peers also occurs (e.g., Ittner et al. 2003).
² See http://www.aicpa.org/assurance (accessed July 30, 2003) for material about CPA/CA Performance Views,
a product designed for CPA/CAs to help clients develop balanced scorecard-like performance measurement
systems.
³ Section 5025 of the CICA Handbook, SSAE No. 10 in the U.S., and the International Standard on Assurance
Engagements (ISAE) No. 10 all outline assurance standards for all types of assurance engagements, broadening
previous standards that applied to assurance on financial statements only.
⁴ This approach is consistent with calls in accounting research to provide ex ante evidence about the benefits and
costs of potential new standards (e.g., Libby and Kinney 2000; Libby et al. 2002).
information and place substantially less weight on the unique information. They label this
effect the "common measures bias."
Slovic and MacPhillamy (1974) provide outcome feedback, improve information qual-
ity, and provide monetary incentives to induce effort on the task, all in an attempt to debias
their participants' judgments. Slovic and MacPhillamy (1974) find that these approaches do
not reduce the common measures bias. It is unclear whether incentives are ineffective
because the bias is not due to lack of effort, the incentives are insufficient to motivate the
requisite effort, or the task is not effort-sensitive. Similarly, participants may not appreciate
the relevance of the outcome feedback or improved information quality because they are
unaware they have underweighted the unique information in the first place.
From a theoretical perspective, performance evaluations exhibiting the common mea-
sures bias may be of poorer quality than evaluations based on all available BSC measures.
Specifically, Holmstrom's (1979) informativeness criterion indicates that any performance
measure providing information about a manager's actions is valuable as long as some
uncertainty unique to the divisional environment exists. The existence of unique measures
in a BSC implies some degree of distinctiveness in the uncertainty of each division's op-
erating environment, suggesting that performance evaluation using only the common measures
is inappropriate.⁵
Subsequent analytical research shows that the relative weights placed on performance
measures depend on the sensitivity (i.e., the change in a measure's mean value in response
to a change in the manager's action), precision (i.e., the inverse of the variance in the
measure given the manager's action), congruency, and quality of the measures (Banker and
Datar 1989; Datar et al. 2001; Feltham and Xie 1994; Ittner et al. 2003; Yim 2001). Since
these values likely vary from company to company due to different measurement ap-
proaches, specific relative weights or even a range of specific weights cannot be prescribed.
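To make the first two of these constructs concrete, the Banker and Datar (1989) result for a linear aggregation of conditionally independent signals can be stated as follows (a stylized restatement in our own notation, added here for illustration):

\[
\frac{w_i}{w_j} \;=\; \frac{s_i/\sigma_i^2}{s_j/\sigma_j^2},
\]

where \(w_i\) is the weight on measure \(i\), \(s_i\) its sensitivity, and \(1/\sigma_i^2\) its precision. A measure's weight approaches zero only as its sensitivity vanishes or its variance grows without bound; nothing in the result pins down particular positive weights, which is why the analytical literature rules out zero weights without prescribing specific ones.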
We gain insight into relative weighting of performance measures from psychology
research on improving judgment (e.g., Hastie and Dawes 2001). Results of this research
demonstrate that the quality of judgment improves when managers consider all, as opposed
to only a subset of, relevant factors in making the judgment. These results hold even when
there is a large deviation between the participant's specific factor weights and the "optimal"
weights determined by linear regression (Hastie and Dawes 2001, 62-64; Dawes 1979).
Hence, in the LS (2000) environment, the actual weights employed by managers using the
common and unique measures (versus unknown "optimal" weights) are not as important
as ensuring that all relevant measures are weighted in the performance evaluation judgment.
If the unique measures represent key drivers of performance, can be influenced by man-
agers, are measurable with some accuracy, are not perfectly correlated with other measures,
and are of sufficient quality, they should be used in performance evaluation judgments even
if the optimal weight for each measure is unknown (Ittner et al. 2003).⁶
From a practice perspective, we assume that performance evaluations encourage managers
to pursue strategically congruent actions based on at least some strategies that are
unique to their division. If we further assume that the scorecard has been designed so that
all measures are strategically relevant (e.g., Kaplan and Norton 1996a), then it seems reasonable
that both the common and unique performance measures should be considered in
performance evaluation judgments. Therefore, we examine potential causes for the common
measures bias documented by LS (2000) and means of increasing the use of unique
measures.
⁵ Alternatively, a focus on common measures may be appropriate if applying relative performance evaluation
(RPE) practices (Holmstrom 1982). Under RPE, strict reliance on the common measures is viewed as an effective
means of filtering out the common uncertainty facing the two divisional managers (Frederickson 1992; Dye
1992; Antle and Smith 1986). Given a divisional organization structure, it is unlikely that the only uncertainty
facing the various business units is common to all; hence, RPE will not be effective in this setting.
⁶ Kaplan and Norton (2001) propose a "balanced" weighting scheme across the four BSC quadrants, but they
are silent on the relative weight of measures within each quadrant. However, because Kaplan and Norton (1996b)
place great emphasis on the process employed to select measures that are relevant to divisional strategy, they
would probably not advocate ignoring these measures altogether.
⁷ Even though participants in LS (2000) were provided performance targets for each unique and common measure,
their judgments still appear to have been affected by the common measures bias. This is not surprising since
participants would still need to determine the relative importance of each unique measure to the overall performance
evaluation judgment, a step that Slovic and MacPhillamy (1974) argue is unnecessary for common
measures.
⁸ Although process accountability can lead to increased use of irrelevant or nondiagnostic information (Tetlock
and Boettger 1989; Siegel-Jacobs and Yates 1996), in our setting all BSC performance measures are relevant to
the strategic objectives of a division as indicated by the LS (2000) pre-test results and the Krumwiede et al.
(2002) expert panel assessments.
HI: Managers who are required to justify their performance evaluation judgments will
be more likely to use unique performance measures in their performance evaluation
judgments than managers who are not so required.
⁹ While it is possible that invoking process accountability will cause excessive weight to be placed on unique
measures (i.e., a unique measures bias), it is unlikely this would occur in the LS (2000) setting since participants
otherwise tend not to weight the unique factors at all.
'" Kennedy (1993) distinguishes between external data available in the environment and internal data representa-
tions of the decision maker (i,e,. knowledge stored in memory). Due to difficulties in defining BSC knowledge
in our setting and lack of consensus on how to train participants to become knowledgeable (LS 2000). we
partially control for internal data representations using participant's prior work experience and academic
background.
BSC context, third-party assurance signaling the reliability and relevance of the performance
measures included in the BSC could increase the likelihood that managers will assign
weight to those measures when making performance evaluation judgments.
Analogous to arguments that an auditor cannot certify only the individual parts of the
financial statements (CICA HB 5510.26), an assurance report must cover the complete set
of performance measures, not just the unique measures (Reck 2001).¹¹ Even if an assurance
report increases the perceived quality of all measures, the increase in perceived quality for
the unique measures would move them from a zero weight to some positive weight and
consequently lead to improved judgment (Dawes 1979). This analysis leads to the following
hypothesis about the effect of assurance on divisional managers' use of unique measures
in performance evaluation:
H2: Managers receiving third-party assurance about the quality of the BSC performance
measures will be more likely to use unique performance measures in their per-
formance evaluation judgments than managers not receiving such assurance.
IV. METHOD
Experimental Task
In this study, we use the experimental case developed by LS (2000). The case requires
participants to assume the role of a senior executive at WCS Incorporated, a women's
apparel company. The case indicates WCS has recently implemented the BSC and describes
key features of the approach. The two WCS divisions featured are: RadWear, a retail di-
vision that focuses on clothing for urban teenagers; and WorkWear, a division selling busi-
ness uniforms, using a network of sales contacts with business clients. The case provides
information about each division, its management, and describes the strategic objectives of
RadWear and WorkWear. The materials also include a BSC for each division containing
performance measures stated to be relevant to the strategic objectives described in the case.
In their role as WCS executives, participants separately evaluate the performance of
RadWear's manager and WorkWear's manager.
We use the four-category, 16-measure balanced scorecards developed by LS (2000).
Each category on the divisional scorecard contains two measures common to both divisions
and two measures unique to each division. On every measure, each division outperforms
its target, although the percentage by which the target is exceeded differs by type of per-
formance measure (common versus unique). RadWear (WorkWear) exceeds its target
performance on common (unique) measures to a greater degree than WorkWear (RadWear).
However, the sum of excess performance (i.e., total percentage above target) across all
measures (common and unique) is about the same in each division. Table 1 presents
WorkWear's balanced scorecard.
" In addition, practitioners consulted for this study argued that it would be very difficult for ihem to associate
themselves with only the unique measures on an organization's BSC. Hence, we are increasing the mundane
realism of our expcnniental seiting and biasing against finding an assurance report effect if assurance over all
measures increases equally Ihe weights assigned to ail measures.
¹² For more details about the case see LS (2000, 288-290).
TABLE 1
WorkWear's Balanced Scorecard
Targets and Actuals for the Year Ended December 31, 2000
better for the other division, etc.). We decided not to replicate the entire LS (2000) design
to ensure efficient use of our subject pool and because the complete LS (2000) results have
been replicated in several subsequent studies (e.g., Banker et al. 2004; Dilla and Steinbart
2005). Instead, we examine the setting where performance on common measures favors
one division and performance on unique measures favors the other division. Only this
pattern of results allows for the possibility of the common measures bias because when
performance on both types of measures favors the same division, the conclusion of superior
performance for that division is unequivocal (LS 2000).
From the two LS (2000) versions of the case containing offsetting differences in di-
rection for common versus unique measures, we randomly select the case where the com-
mon measures favored RadWear and the unique measures favored WorkWear. Thus, if
evaluators of RadWear rely on the common measures, they will evaluate RadWear's man-
ager as the better performer (i.e., RadWear evaluation minus WorkWear evaluation will be
significantly greater than zero).
We use a 2 × 2 × 2 design with one within-subjects factor and two between-subjects
factors. The within-subjects factor, consistent with the design used by LS (2000), is the
requirement that participants evaluate one manager from each division.¹³ The first between-
subjects factor is the requirement (or not) that participants justify their evaluation of each
" We do not vary order of division presentation or manager evaluation as LS (2000) find that order effects have
no impact on the experimental variables of interest.
TABLE 2
Auditor's Report on the Relevance and Reliability of the Divisional Scorecard Performance
Measures and Results
Auditor's Report
To the Management of WCS Inc.:
We have audited WCS Inc.'s disclosure of its Balanced Scorecard Targets and Actuals for each
of its divisions, and the reliability and relevance of the financial and nonfinancial performance
measures presented for each division, for the year ended December 31, 2000. The determination of
the measures to include in each division's scorecard report, and the completeness and accuracy of
the reported results, are the responsibility of WCS Inc.'s management. Our responsibility is to
express an opinion, based on our audit, on the conformity of the performance measures with the
relevance and reliability criteria of the International Performance Measures Reporting Initiative.
Our audit was performed in accordance with standards for assurance engagements established
by the Canadian Institute of Chartered Accountants. Those standards require that we plan and
perform our audit to obtain reasonable assurance as a basis for our opinion. Our audit included (1)
obtaining an understanding of the strategic objectives and goals of WCS Inc. and each of its
divisions, (2) assessing whether the selected performance measures relate to the chosen strategy for
each division, (3) assessing the procedures used to produce the reported results, (4) selectively
testing the reported results, and (5) performing such other procedures as we considered necessary in
the circumstances. We believe that our audit provides a reasonable basis for our opinion.
In our opinion, in all material respects, the financial and nonfinancial performance measures
included in the Balanced Scorecard Targets and Actuals for each of the divisions of WCS Inc. for the
year ended December 31, 2000, are relevant and reliable in accordance with the criteria of the
International Performance Measures Reporting Initiative.
ABC Chartered Accountants
March 31, 2001
'"* Alerting participants to the justitieation requiremcnl after performing their evaluations would lead to self-
justilication of the decisions already made (Lemer and Tetlock 19^9). Selt-JListification is ineflective as a de-
biasing approach since it does not address the need to increase processing of the unique pertbrmance measures
prior to making evaluation decisions. In addition, since we administer materials during class time to our partic-
ipants, it was not practical to have them accountable to a third party.
Control Variables
Kennedy (1993, 1995) notes that judgments may be biased if decision-makers' internal
data representations (knowledge stored in memory) are not suited to the task they are
performing. Managers develop internal data representations from work experience and/or
academic study. These representations may allow managers to process both the common
and unique BSC measures. Prior accounting research documents the performance improving
effects of work experience and knowledge on a wide variety of accounting judgments (e.g.,
Bonner and Lewis 1990; Bonner et al. 1992; Vera-Muñoz et al. 2001; Dearman and Shields
2001). Accordingly, we elicit our participants' type and length of work experience and their
academic background and current emphasis of study, recognizing that each may be a some-
what crude proxy for internal data representations. These measured variables are covariates
in our statistical analyses.
Our hypotheses predict the justification and assurance manipulations will increase par-
ticipants' ability to use the information provided by the unique measures, reducing the
degree to which the performance evaluations are comparative in nature (Hsee 1996). Equity
theory (Adams 1965), however, suggests another possible control variable—differential
managerial emphasis on fairness in performance evaluation. According to equity theory,
individuals will judge fairness by comparing the quantity and quality of one divisional
manager's inputs and outcomes relative to the inputs and outcomes of the other manager
(Adams 1965). Common measures are obvious candidates for use in dealing with such
fairness concerns as they are readily comparable across managers (Leventhal et al. 1980;
Folger and Cropanzano 1998). Hence, the degree to which participants believe that fairness
is important to their performance evaluations is employed as a covariate in our statistical
analyses. We assess participants' concerns about fairness by asking them to respond to this
statement: "To provide a fair performance evaluation for each manager, it was necessary
to compare the performance of RadWear to WorkWear." Responses are measured on an
11-point scale with endpoints of "Strongly disagree" (-5) and "Strongly agree" (+5).
Dependent Variable
Participants evaluate the performance of both the RadWear and WorkWear managers
on a scale from 0 to 100 that had seven descriptive labels ranging from "Reassign" to
"Excellent" performance.''^ Each of the descriptive labels is explained on the evaluation
form. We employ the differences in our participants' evaluations of the two divisions as
our dependent variable. Since RadWear (WorkWear) outperformed WorkWear (RadWear)
on common (unique) measures relative to target, we interpret a positive difference in eval-
uations as indicative of greater reliance on the common measures since the total percentage
above target across all measures is the same in each division. Increased use of the unique
measures in evaluating the divisional managers' performance should therefore reduce the
differences in managerial evaluations between divisions.
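To illustrate this coding, the dependent variable and its interpretation reduce to a simple signed difference; the following is our own minimal sketch, with hypothetical function names and example values that are not taken from the instrument:

```python
def evaluation_difference(radwear: float, workwear: float) -> float:
    """Difference in a participant's 0-100 ratings of the two division managers.

    A positive value indicates reliance on the common measures (which favor
    RadWear); a value near zero indicates the unique measures (which favor
    WorkWear) were also weighted, since total above-target performance is
    the same for both divisions.
    """
    return radwear - workwear

# A participant rating RadWear 78 and WorkWear 71 produces +7,
# a pattern consistent with the common measures bias.
print(evaluation_difference(78, 71))
```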
Participants
Two hundred twenty-seven M.B.A. students from four public universities participated
in the experiment.¹⁶ Each received $20 for taking part. Participants were about halfway
through their M.B.A. program and all were enrolled in a graduate-level management accounting
course in which they were exposed to the basic concepts of the BSC. Table 3
includes descriptive statistics about the experimental participants.
¹⁵ See LS (2000, 289) for details on the exact scale used in the present study.
¹⁶ Our sample size yielded 32-58 participants per experimental condition resulting in an ex post estimated statistical
power (i.e., the probability that our tests will yield statistically significant results) of approximately 0.70 (Neter
et al. 1985; Cohen 1988).
The levels of experience and knowledge possessed by participants in management accounting
research are problematic in situations where internal data representations may
affect experimental results.¹⁷ However, we believe that our participants are appropriate for
several reasons. First, participants are drawn from a subject pool similar to LS (2000),
allowing for a legitimate comparison of findings across studies. Second, there is no reason
to believe a priori that motivational concerns and/or data-quality perceptions would be
different for more experienced or knowledgeable subjects. Third, studies that examine stability
of measures included in BSCs over time indicate that new measures enter and old
measures leave BSCs on a regular basis (Ittner et al. 2003; Malina and Selto 2001). Results
in the accounting expertise literature (e.g., Bonner and Walker 1994) suggest such change
would inhibit the development of internal data representations even for more experienced
BSC participants. Finally, participants with extensive BSC backgrounds could bring a variety
of firm-specific experiences in performance evaluation to the experimental setting
leading to an increase in experimental noise. Thus, while a priori we believe there is no
reason to believe that our participant group is inappropriate for purposes of testing our
TABLE 3
Descriptive Statistics of M.B.A. Students Participating in Experiment
(n = 227)

M.B.A. Functional Academic Emphasis    Number of Participants    %
Accounting and Finance                 100                       44.0
General Management                     58                        25.6
Marketing                              49                        21.6
Other                                  20                        8.8
Total                                  227                       100.0%
¹⁷ LS (2000, 295-296) include an extensive discussion of who might be knowledgeable and/or expert in the use
of the BSC for performance evaluations given concerns that are commonly raised about experimental participants'
differential experience or knowledge.
V. RESULTS
Manipulation Checks
Manipulation checks indicate participants recognize that the two divisions employ different
performance measures (p < 0.01) and sell to different markets (p < 0.01). Participants
also believe different performance measures are appropriate for the divisions (p
< 0.01). Participants who receive an assurance report believe all the performance measures
are more relevant and reliable than those who do not receive such a report (p < 0.01).¹⁸
There are no differences across experimental treatments in ease of understanding, case
difficulty, and case realism (all p > 0.50). Finally, university affiliation and interactions of
affiliation with the manipulated variables do not affect the results.
The unadjusted cell means are presented in Table 4. The common measures bias in the
control condition, calculated as the unadjusted mean difference between the evaluations of
RadWear and WorkWear, is 6.35 (std. dev. = 9.84). This difference replicates the LS (2000)
common measures bias as the difference is both significantly greater than zero (p < 0.01)
and not significantly different from the bias of 7.12 documented by LS (2000) for this
condition (all p > 0.50 using both parametric and nonparametric statistics).¹⁹
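For concreteness, these replication tests reduce to one-sample comparisons of the control-condition difference scores. The following is our own minimal sketch, not the authors' code; the data file and column names are hypothetical:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("bsc_experiment.csv")  # hypothetical file, one row per participant
control = df.query("assure == 0 and justify == 0")["diff_score"]

# Is the mean difference score greater than zero (a common measures bias)?
print(stats.ttest_1samp(control, 0.0, alternative="greater"))

# Does the bias differ from the 7.12 documented by LS (2000)?
print(stats.ttest_1samp(control, 7.12))   # parametric test
print(stats.wilcoxon(control - 7.12))     # nonparametric analogue
```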
Tests of H1 and H2
We use a 2 × 2 × 2 ANCOVA to test H1 and H2. Participants' type of prior work
experience (accounting/finance or other) and measured concerns about fairness are employed
as covariates (see Table 5, Panel A for adjusted means).²⁰ As seen in Table 5, Panel
B, the main effects for the justification requirement and the assurance report are not significant
(respectively p > 0.37 and p > 0.34); however, the interaction of the justification
and assurance manipulations is significant (p < 0.05) as are both covariates (p < 0.05).
Panel C of Table 5 graphically depicts the interaction form and reveals that each treatment
either alone or in combination leads to greater use of unique measures in the performance
evaluations (i.e., a smaller difference between evaluations) than in the control condition.
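The structure of this test can be sketched as follows; this is our illustration rather than the authors' code, and the data file and column names are hypothetical. Because the within-subjects factor is collapsed into the difference score, the model is estimated as a 2 × 2 between-subjects ANCOVA:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("bsc_experiment.csv")  # hypothetical file, one row per participant

# diff_score = RadWear rating - WorkWear rating; assure and justify are 0/1
# treatment indicators; acct_fin_exp (0/1) and fairness (11-point item) are
# the two covariates. Sum-to-zero coding makes the Type III tests meaningful.
model = smf.ols(
    "diff_score ~ C(assure, Sum) * C(justify, Sum) + acct_fin_exp + fairness",
    data=df,
).fit()

# Main effects, interaction, and covariates, analogous to Table 5, Panel B.
print(sm.stats.anova_lm(model, typ=3))

# Covariate-adjusted cell means, analogous to Table 5, Panel A: predictions
# for each treatment cell at the sample means of the covariates.
grid = pd.DataFrame([(a, j) for a in (0, 1) for j in (0, 1)],
                    columns=["assure", "justify"])
grid["acct_fin_exp"] = df["acct_fin_exp"].mean()
grid["fairness"] = df["fairness"].mean()
print(grid.assign(adj_mean=model.predict(grid)))
```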
To investigate these findings further, we perform several comparisons of the differences
in adjusted cell means. First, we use one-tailed pairwise tests to compare performance
evaluation scores in the control condition to those in each of the three treatment conditions.
As reported in Table 5, Panel D, we find that the reduction in the difference scores is at
least marginally significant (all p ≤ 0.10) in the three comparisons indicating greater use
of the unique measures by participants receiving the experimental treatments. Second, we
'" Over 90 percent of ihe participanls answered a recall question correctly abt>ut whether they were required to
write a ju.stilicati(jn lor tlieir performance evaluations. Over SO percent correctly recalled receiving an assurance
reptin. We include all participants in our analysis because the inability to recall whether ihc assurance report
was present or justilication was required d()es nol affect the fact that they were manipulated. Eliminating par-
ticipants with incorrect recall results in a similar pattern nl" the adjusted means as those reported in the text.
''* In carrying oui our statistical analysis of the control condition, we found four outliers among the .'S6 participants.
These participant.s seem to be subject to a pronounced tiniqiu- mciisures bias as their average score wa.s - I3.2.'i
(i.e., WorkWear evaluation exceeded RadWear evaluation). Inclusion ol" these participants weakens, bul does not
change the inferences that arc drawn from our statistical tests.
-" Other control variables were also included in the analysis and found to be nonsignificant (i.e.. years of full-time
work experience, area of academic emphasis, familiarity witb Ihe BSC. and years of accounting and/or finance
experienee).
TABLE 4
Managers' Unadjusted Mean Performance Evaluations and Differences
use two-tailed pairwise tests to compare the adjusted means of the three treatment conditions.
We find no significant differences (all p > 0.10) between the three treatment conditions
(results not tabulated). Third, we compare the average of the cell mean differences
for the three treatment conditions to the control condition mean difference. As seen in
Table 5, Panel D, the average of the three treatment condition means is significantly lower
than the control condition mean (p < 0.05). These results indicate that the requirement to
justify and/or the provision of an assurance report reduce the common measures bias.
We also examine the relationship between each covariate and the performance evaluation
differences. Performance evaluation difference scores for participants with accounting
and finance related work experience (adjusted mean difference = 0.43) are significantly lower
(p < 0.05) than for those participants with other types of work experience (adjusted mean
difference = 4.19).²¹ This result is consistent with suggestions that internal data representations
matter in performance evaluations. However, the work experience variable does not
²¹ The magnitude of this difference did not differ statistically across the four conditions.
TABLE 5
Analysis of Managers' Adjusted Mean Performance Evaluation Differences
Panel A: Adjusted Means (standard errors) for Differences between Managers' Evaluations of
the Performance of RadWear and WorkWear Division Managers^a

                                 Written Justification of Performance Evaluations (JUSTIFY)
Assurance Report (ASSURE)        Not Required                    Required
None provided                    6.27 (1.47)                     1.92 (1.40)
                                 n = 52                          n = 58
                                 μ1,1 (control)                  μ1,2
Provided                         2.02 (1.40)                     3.63 (1.43)
                                 n = 58                          n = 55
                                 μ2,1                            μ2,2
Panel C: Graph of Adjusted Mean Evaluation Differences by Condition (no assurance report vs.
assurance report, plotted against no justification vs. justification; values are the adjusted means
in Panel A)
Panel D: Simple Effects Analysis Comparing Performance Evaluation Differences among the
Various Conditions and the Resultant Contrast Tests and p-values^e

Mean Differences (Standard Errors) Tested (see Panel A)        p-value
μ1,1 − μ1,2 = 4.35 (2.03)                                      p ≤ 0.04
μ1,1 − μ2,1 = 4.25 (2.03)                                      p ≤ 0.04
μ1,1 − μ2,2 = 2.64 (2.06)                                      p ≤ 0.10
μ1,1 − (μ1,2 + μ2,1 + μ2,2)/3 = 3.74 (1.69)                    p ≤ 0.03
^a The cell means are calculated using the regression approach (Neter et al. 1985) to estimate values for the
dependent variable (RadWear−WorkWear) adjusted for differences in participants' work experience (accounting
and finance versus all other) and their perceptions of fairness. Analysis shows the assumption of homogeneity
of regression slopes for the control variables is satisfied.
^b Definition of variables included in the ANCOVA: (1) Independent variables: ASSURE = assurance report on
relevance and reliability of BSC measures and results (provided, none provided); JUSTIFY = written
justification of performance evaluations (required, not required); (2) Control variables: Work experience
= accounting/finance, other/none; Fairness = participants' agreement with statement "to provide a fair
performance evaluation for each manager, it was necessary to compare the performance of RadWear to
WorkWear" (11-point scale); (3) Dependent variable: the difference between participants' evaluations of the two
divisions (RadWear−WorkWear) adjusted for the control variables.
^c Untabulated ANCOVA tests show no higher order interactions are significant among the variables in Panel B.
^d The values used to construct the graph are the adjusted means presented in Panel A.
^e Probabilities are calculated using one-tailed tests.
interact with our treatment variables. We find the correlation between the fairness proxy
and the performance evaluation differences is positive and significant (r = 0.14, p < 0.05).
Consistent with equity theory, participants who believe it is fairer to compare the two
divisions in preparing their evaluations tend to rely more on the common measures.²²
²² Neither fairness concern scores nor the strength of their association with the performance evaluation scores
differs statistically across conditions. Therefore, we rule out the possibility that our justification manipulation
created greater concern for preparing fair evaluations that could be justified to a superior.
²³ Two independent coders who were blind as to the participant treatment condition and the hypotheses under
study coded the justification memos. The two coders met and reconciled by consensus any differences in their
codings.
Overall, this additional analysis indicates that the combined treatment leads to a dif-
ferent cognitive representation of the information presented relative to the assurance-only
and justification-only conditions. However, we did not gather data that would explain why
the cognitive representations differ. Accordingly, we leave further examination of the cause
of this unexpected result to future research.
²⁴ Neither LS (2000) nor Krumwiede et al. (2002) examined whether all measures were considered reliable.
" The smaller sample size of ihc additional experiment reduces slalisiical power lo about 0.45.
²⁶ After the 26 participants in the control condition completed their performance evaluations, we drew their attention
to the pattern of results for the common and unique measures (i.e., common favoring one division and unique
favoring the other). We then asked them if they wished to change their performance evaluations based on this
additional information. Of the 24 subjects who answered the question, 17 indicated that they would make no
change in their evaluations and seven said they would evaluate the managers equally. Hence, it does not appear
that ignoring the unique measures is a simple judgment inconsistency caused by an unintended information
processing error easily remedied by drawing participants' attention to the measures' pattern (Tan et al. 2002,
240).
REFERENCES
Adams, J. S. 1965. Inequity in social exchange. In Advances in Experimental Social Psychology,
Volume 2, edited by L. Berkowitz, 267-299. San Diego, CA: Academic Press.
Antle, R., and A. Smith. 1986. An empirical investigation of relative performance evaluation of
corporate executives. Journal of Accounting Research 24 (1): 1-39.
Arvey, R. D., and K. R. Murphy. 1998. Performance evaluation in work settings. Annual Review of
Psychology 49: 141-168.
Banker, R. D., and S. M. Datar. 1989. Sensitivity, precision and linear aggregation of signals for
performance evaluation. Journal of Accounting Research 27 (1): 21-40.
———, H. Chang, and M. Pizzini. 2004. The balanced scorecard: Judgment effects of performance
measures linked to strategy. The Accounting Review 79 (1): 1-23.
Blackwell, D. W., T. R. Noland, and D. B. Winters. 1998. The value of auditor assurance: Evidence
from loan pricing. Journal of Accounting Research (Spring): 57-70.
Blum, M. L., and J. C. Naylor. 1968. Industrial Psychology: Its Theoretical and Social Foundations.
New York, NY: Harper and Row.
Bonner, S. E., and B. L. Lewis. 1990. Determinants of auditor expertise. Journal of Accounting
Research 28 (Supplement): 1-20.
———, J. Davis, and B. R. Jackson. 1992. Expertise in corporate tax planning: The issue identification
stage. Journal of Accounting Research 30 (Supplement): 1-28.
———, and P. Walker. 1994. The effects of instruction and experience on the acquisition of auditing
knowledge. The Accounting Review 69 (1): 157-178.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum
Associates.
Datar, S. M., S. C. Kulp, and R. A. Lambert. 2001. Balancing performance measures. Journal of
Accounting Research 39 (1): 75-93.
Dawes, R. 1979. The robust beauty of improper linear models in decision making. American Psychologist
34 (7): 571-582.
Dearman, D. T., and M. D. Shields. 2001. Cost knowledge and cost-based judgment performance.
Journal of Management Accounting Research 13: 1-19.
Dilla, W., and P. Steinbart. 2005. Relative weighting of common and unique balanced scorecard
measures by knowledgeable decision makers. Behavioral Research in Accounting (forthcoming).
Dye, R. A. 1992. Relative performance evaluation and project selection. Journal of Accounting Research
30 (1): 27-51.
Edwards, M. R. 1989. Making performance appraisals meaningful and fair. Business (Jul-Sep): 17-25.
Feltham, G. A., and J. Xie. 1994. Performance measure congruity and diversity in multi-task principal/
agent relations. The Accounting Review 69 (3): 429-453.
Folger, R., and R. Cropanzano. 1998. Organizational Justice and Human Resource Management.
Foundations for Social Science Series. Thousand Oaks, CA: Sage Publications.
Frederickson, J. R. 1992. Relative performance information: The effects of uncertainty and contract
type on agent effort. The Accounting Review 67 (4): 647-669.
Hastie, R., and R. M. Dawes. 2001. Rational Choice in an Uncertain World. Thousand Oaks, CA:
Sage Publications.
Hemmer, T. 1996. On the design and choice of "modern" management accounting measures. Journal
of Management Accounting Research 8: 87-116.
Heneman, R. L. 1986. The relationship between supervisory ratings and results-oriented measures of
performance: A meta-analysis. Personnel Psychology 39 (4): 811-827.
Holmstrom, B. 1979. Moral hazard and observability. Bell Journal of Economics 10 (1): 74-91.
———. 1982. Moral hazard in teams. Bell Journal of Economics 13 (2): 324-340.
———, and P. Milgrom. 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership,
and job design. Journal of Law, Economics, and Organization 7: 24-52.
Hsee, C. 1996. The evaluability hypothesis: An explanation for preference reversals between joint
and separate evaluations of alternatives. Organizational Behavior and Human Decision Processes
67 (3): 247-257.
Ittner, C., and D. Larcker. 2001. Assessing empirical research in managerial accounting: A value
based management perspective. Journal of Accounting and Economics 32: 349-410.
———, and ———. 2003. Coming up short on nonfinancial performance measurement. Harvard
Business Review 81 (11): 88-95.
———, ———, and M. Meyer. 2003. Subjectivity and the weighting of performance measures:
Evidence from a balanced scorecard. The Accounting Review 78 (3): 725-758.
Kaplan, R., and D. Norton. 1992. The balanced scorecard—Measures that drive performance. Harvard
Business Review (January-February): 71-79.
———, and ———. 1993. Putting the balanced scorecard to work. Harvard Business Review (September-
October): 134-147.
———, and ———. 1996a. The Balanced Scorecard: Translating Strategy Into Action. Boston,
MA: Harvard Business School Press.
———, and ———. 1996b. Linking the balanced scorecard to strategy. California Management
Review 39 (1): 53-79.
———, and ———. 2001. The Strategy-Focused Organization: How Balanced Scorecard Companies
Thrive in the New Business Environment. Boston, MA: Harvard Business School Press.
Kennedy, J. 1993. Debiasing audit judgment with accountability: A framework and experimental
results. Journal of Accounting Research 31 (2): 231-245.
———. 1995. Debiasing the curse of knowledge in audit judgment. The Accounting Review 70 (2):
249-273.
Krumwiede, K. R., M. R. Swain, and D. L. Eggett. 2002. The effects of feedback, prior experience,
and division similarity on the utilization of unique performance measures in multi-division
evaluations. Working paper, Brigham Young University.
Kurtz, K. J., C. Miao, and D. Gentner. 2001. Learning by analogical bootstrapping. Journal of the
Learning Sciences 10 (4): 417-446.
Lerner, J. S., and P. E. Tetlock. 1999. Accounting for the effects of accountability. Psychological
Bulletin 125 (2): 255-275.
Leventhal, G. S., J. Karuza Jr., and W. R. Fry. 1980. Beyond fairness: A theory of allocation preferences.
In Justice and Social Interaction, edited by G. Mikula, 167-218. Bern, Switzerland:
Hans Huber Publishers.
Libby, R. 1979. Bankers' and auditors' perceptions of the message communicated by the audit report.
Journal of Accounting Research 17 (Spring): 99-122.
———, and W. R. Kinney. 2000. Earnings management, audit differences, and analysts' forecasts.
The Accounting Review 75 (4): 393-404.
———, R. Bloomfield, and M. W. Nelson. 2002. Experimental research in financial accounting.
Accounting, Organizations and Society 27 (8): 775-810.
Lingle, J. H., and W. A. Schiemann. 1996. From balanced scorecard to strategic gauges: Is measurement
worth it? Management Review 85 (3): 56-61.
Lipe, M. G., and S. E. Salterio. 2000. The balanced scorecard: Judgmental effects of common and
unique performance measures. The Accounting Review 75 (3): 283-298.
Maines, L., E. Bartov, P. M. Fairfield, D. E. Hirst, T. E. Iannaconi, R. Mallett, C. M. Schrand, D. J.
Skinner, and L. Vincent. 2002. Recommendations on disclosure of nonfinancial performance
measures. Accounting Horizons 16 (4): 353-362.
Malina, M. A., and F. H. Selto. 2001. Communicating and controlling strategy: An empirical study
of the effectiveness of the balanced scorecard. Journal of Management Accounting Research
13: 47-90.
Markman, A. B., and D. L. Medin. 1995. Similarity and alignment in choice. Organizational Behavior
and Human Decision Processes 63 (2): 117-130.
Mautz, R., and H. Sharaf. 1961. The Philosophy of Auditing. Madison, WI: American Accounting
Association.
McNamara, H., and R. Fisch. 1964. Effect of high and low motivation on two aspects of attention.
Perceptual and Motor Skills 19: 571-578.
Mero, N. P., and S. J. Motowidlo. 1995. Effects of rater accountability on the accuracy and the
favorability of performance ratings. Journal of Applied Psychology 80 (4): 517-525.
Neter, J., W. Wasserman, and M. H. Kutner. 1985. Applied Linear Statistical Models. Homewood, IL:
Irwin.
Reck, J. L. 2001. The usefulness of financial and nonfinancial performance information in resource
allocation decisions. Journal of Accounting and Public Policy 20: 45-71.
Siegel-Jacobs, K., and J. F. Yates. 1996. Effects of procedural and outcome accountability on judgment
quality. Organizational Behavior and Human Decision Processes 65 (1): 1-17.
Simonson, I., and B. Staw. 1992. De-escalation strategies: A comparison of techniques for reducing
commitment to losing courses of action. Journal of Applied Psychology 77 (4): 419-427.
Slovic, P., and D. MacPhillamy. 1974. Dimensional commensurability and cue utilization in comparative
judgment. Organizational Behavior and Human Performance 11: 172-194.
Tan, H-T., and A. Kao. 1999. Accountability effects on auditors' performance: The influence of knowledge,
problem-solving ability, and task complexity. Journal of Accounting Research 37 (1):
209-223.
———, R. Libby, and J. Hunton. 2002. Analysts' reactions to earnings preannouncement strategies.
Journal of Accounting Research 40 (1): 223-245.
Tetlock, P. E. 1985. Accountability: The neglected social context of judgment and choice. Research
in Organizational Behavior 7: 297-332.
———, and R. Boettger. 1989. Accountability: A social magnifier of the dilution effect. Journal of
Personality and Social Psychology 57 (3): 388-398.
Vera-Muñoz, S. C., W. R. Kinney Jr., and S. E. Bonner. 2001. The effects of domain experience and
task presentation format on accountants' information relevance assurance. The Accounting Review
76 (3): 405-429.
Yim, A. T. 2001. Renegotiation and relative performance evaluation: Why an informative signal may
be useless. Review of Accounting Studies 6: 77-108.
Zhang, S., and A. B. Markman. 2001. Processing product unique features: Alignability and involvement
in preference construction. Journal of Consumer Psychology 11 (1): 13-27.