CHOCKALINGAM VISWESVARAN
assessing training and placement needs. The systems maintenance category refers to the use of individual job performance assessments for human resources planning and reinforcement of authority structures in organizations. Finally, individual job performance data are also used for legal documentation purposes. Cascio (1991) groups these uses into three main categories: administrative, feedback, and research purposes. Administrative use refers to the use of individual job performance assessment for making administrative decisions such as pay allocation, promotions, and layoffs. Individual job performance assessment can also be used to provide feedback to individuals by identifying their strengths and weaknesses. Finally, it is required for research purposes, be it the validation of a selection technique or the evaluation of the efficacy of a training program.

That individual job performance assessments are used in a variety of ways has also been found in several studies. Administrative uses have long been known (Whisler & Harper, 1962), and DeVries, Morrison, Shullman and Gerlach (1986) report that surveys conducted in the 1970s in both the United States and the United Kingdom indicated the prevalence of individual job performance assessment for the purpose of making administrative decisions. In fact, these surveys suggested that more than 50% of the use of individual job performance assessment was for the purpose of making administrative decisions. DeVries et al. (1986) noted that the use of such assessments in Great Britain can be classified into three categories: (1) to improve current performance, (2) to set objectives, and (3) to identify training and development needs.

In this chapter, I review the research on individual job performance. There are four sections. The first deals with the different methods of assessment; following this, I summarize the studies conducted to explicate the content domain of individual job performance. Factor analytic studies as well as theoretical and rational analyses of what constitutes individual job performance are reviewed. In the third section, I review the criteria for assessing the quality of individual job performance assessments, along with a discussion of such studies. Finally, in the fourth section, I summarize some of the causal path models postulated to explain the determinants and components of individual job performance.

METHODS OF ASSESSMENT

Methods used to assess individual job performance can be broadly classified into (1) organizational records, and (2) subjective evaluations. Organizational records are considered to be more 'objective' in contrast to subjective evaluations, which depend on human judgment. Subjective evaluations can be either criterion-referenced (e.g., ratings) or norm-referenced (e.g., rankings). The distinction between organizational records and subjective evaluations has a long history. Burtt (1926) and Viteles (1932) grouped criterion measures into objective and subjective classes. Farmer (1933) grouped criteria into objective measures, judgments of performance (judgments based on objective performance), and judgments of ability (judgments based on traits). Smith (1976) distinguished between hard criteria (i.e., organizational records) and soft criteria (i.e., subjective evaluations).

Methods of assessment should be distinguished from types of criteria. Thorndike (1949) identifies three types of criteria: immediate, intermediate, and ultimate criteria. The ultimate criterion summarizes the total worth of the individual to the organization over the entire career span. The immediate criterion, on the other hand, is a measure of individual job performance at a particular point in time. Intermediate criteria summarize performance over a period of time. Note that both organizational records and subjective evaluations can be used to assess, say, an intermediate criterion. Similarly, Mace (1935) argued that measures of individual job performance can stress either capacity or will to perform. This distinction is a forerunner to the distinction between maximal and typical performance measures (e.g., DuBois, Sackett, Zedeck & Fogli, 1993; Sackett, Zedeck & Fogli, 1988). Maximal performance is what an individual can do if highly motivated, whereas typical performance is what an individual is likely to do in a typical day. The distinctions between ultimate, intermediate, and immediate criteria, or between maximal and typical performance, refer to types of criteria. Both organizational records and subjective evaluations (methods) can be used to assess them.

Organizational records can be further classified into direct measures of productivity and personnel data (Schmidt, 1980). Direct measures of productivity stress the number of units produced. Also included are measures of quality, such as the number of errors, scrap material produced, and so forth. Personnel data, on the other hand, do not directly measure productivity, but inferences of productivity can be derived from them. Lateness or tardiness, tenure, absences, accidents, promotion rates, and the filing of grievances can be considered indirect measures of productivity; there is an inferential leap involved in using these personnel data as measures of individual job performance. Organizational records, by focusing on observable, countable, discrete outcomes, may overcome the biasing influences of subjective evaluations but may be affected by criterion contamination and criterion deficiency. Contamination occurs in that outcomes could be due to factors beyond the control of the individuals; deficiency results as the outcomes assessed may not take into account important aspects of individual job performance. I will discuss the literature on the construct validity of organizational records after presenting the criteria for the job performance criterion in the third section of this chapter.
Subjective evaluations can be either ratings or rankings of performance. Ratings are criterion-referenced judgments where an individual is evaluated without reference to other individuals. The most common form of rating scale is the graphic rating scale (GRS), which typically involves presenting the rater with a set of dimensions or task categories with several levels of performance and requiring the rater to choose the level that best describes the person being rated. There are several formats of GRS. The different formats differ in the number of levels presented, the clarity in demarcating the different levels (e.g., asking the rater to circle a number vs. asking them to indicate a point on a line whose end points are described), and the clarity in identifying what behaviors constitute a particular level. Smith and Kendall (1963) designed Behaviorally Anchored Rating Scales (BARS) to explicitly tie the different levels to behavioral anchors. Steps involved in the construction of BARS include generating a list of behaviors depicting different performance levels of a particular dimension of performance, checking the agreement across raters (retranslation), and designing the layout of the scale. Variants of BARS are the Behavioral Observation Scale (BOS), where the rater merely notes whether a behavior was displayed by the ratee (Latham, Fay & Saari, 1980), and the Behavioral Evaluation Scale (BES), where the rater notes the likelihood of the ratee exhibiting a particular behavior (Bernardin, Alvares & Cranny, 1976).

Researchers have also addressed, by developing checklists, the reluctance of raters to judge the performance of others. The rater merely indicates whether a particular behavior has been exhibited, and either a simple sum or a weighted combination is then computed to assess performance. There are several types of these summated rating scales in existence. To address the problem that raters could intentionally distort their ratings, forced choice scales and mixed standard scales (MSS) have also been developed. In a forced choice assessment, raters are provided with two equally favorable statements, of which only one discriminates between good and poor performers. The idea is that a rater who wants to give lenient ratings may choose the favorable but nondiscriminating statement as descriptive of the ratee. The MSS comprises three statements for each dimension of performance rated, with the three statements depicting excellent, average, and poor performance, respectively, on that dimension. The rater rates the performance of each ratee as better than, equal to, or worse than the performance depicted in each statement. Scoring rules are developed, and the MSS can identify inconsistent or careless raters (Blanz & Ghiselli, 1972).

Several research studies have been conducted over the years to compare the quality of the different rating scales. Symonds (1924) investigated the optimal number of scale points and recommended seven categories as optimal. Other researchers (e.g., Bendig, 1954; Lissitz & Green, 1975) present conflicting conclusions. Schwab, Heneman and DeCotiis (1975) questioned the superiority of BARS over other formats. Finally, Landy and Farr (1980), in an influential article, concluded that rating formats and scales do not alter the performance assessments, and guided researchers away from the unprofitable controversies over which scale and rating format is superior, toward investigations of the cognitive processes underlying performance assessments.

In contrast to ratings, which are criterion-referenced assessments, rankings are norm-referenced assessments. The simplest form of ranking is to rank all ratees from best to worst. The ranking will depend on the set of ratees, and it is impossible to compare the rankings from two different sets of individuals; the worst in one set may be better than the best in the second set of ratees. A modified version, called alternate ranking, involves (1) picking the best and worst ratees in the set of ratees under consideration, (2) removing the two chosen ratees, (3) picking the next best and worst from the remaining ratees, and (4) repeating the process until all ratees are ranked. The advantage of the alternate ranking method is that it reduces the cognitive load on the raters. Yet another approach is to compare each ratee to every other ratee, a method of paired comparisons that becomes unwieldy as the number of ratees increases. Finally, forced distribution methods can be used, where a fixed percentage of ratees is placed in each level. Forced distribution methods can be useful to generate a desired distribution (mostly normal) of assessed scores.
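As an illustration of these ranking procedures, the sketch below implements alternate ranking; the function name and the use of numeric scores to stand in for a rater's best/worst judgments are assumptions made purely for illustration.

```python
def alternate_rank(ratees):
    """Alternate ranking: repeatedly pick the best and the worst of the
    remaining ratees until everyone is placed."""
    remaining = dict(ratees)  # name -> score standing in for rater judgment
    top, bottom = [], []
    while remaining:
        best = max(remaining, key=remaining.get)
        top.append(best)                  # next-best joins the top block
        del remaining[best]
        if remaining:
            worst = min(remaining, key=remaining.get)
            bottom.append(worst)          # next-worst joins the bottom block
            del remaining[worst]
    return top + bottom[::-1]             # full ranking, best to worst

print(alternate_rank({"A": 3, "B": 9, "C": 5, "D": 1, "E": 7}))
# ['B', 'E', 'C', 'A', 'D']

# A paired-comparison design, by contrast, requires n(n-1)/2 judgments:
# 10 ratees take 45 comparisons, and 50 ratees already take 1,225,
# which is why that method becomes unwieldy as n grows.
```

Note that the rater makes only two relatively easy judgments per pass, which is the sense in which the method reduces cognitive load relative to producing a complete ordering at once.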
With subjective evaluations (ratings or rankings), the question of who should rate arises. Typically, in traditional organizations, the supervisors of the employees provide the ratings. Recent years have seen an increase in the use of 360 degree feedback systems (Church & Bracken, 1997), where rating assessments can be made by the self (the ratee himself or herself), subordinates, peers, and customers or clients. I discuss the convergence among the different sources, as well as the convergence between subjective evaluations and organizational records, under the section on the construct validity of performance assessments.

EXPLICATING THE CONSTRUCT DOMAIN OF INDIVIDUAL JOB PERFORMANCE

Job performance is an abstract, latent construct. One cannot point to one single physical manifestation and define it as job performance; there are several manifestations of an abstract construct.
Explicating the construct domain of individual job performance involves specifying what is included when we talk of the concept (Wallace, 1965). Further, in keeping with the abstract nature of constructs, there are several manifestations of individual job performance, with the actual operational measure varying across contexts; explication of the construct involves identifying the dimensions that make up the construct. The dimensions generalize across contexts, whereas the exact measures differ. For example, interpersonal competence is a dimension of individual job performance that could be relevant in several contexts, but the actual behavior could vary depending on the context. One measure of interpersonal competence for a professor may be how polite the professor is in replying to reviewers. For a bank teller, a measure of interpersonal competence is how considerate they are of customer complaints or the extent to which they smile at customers.

To explicate a construct domain, it is optimal to start with a definition of the construct. In this chapter, I define individual job performance as evaluatable behaviors. Although I use the term behaviors, I would stress that the difference between behaviors and outcomes is not clear-cut in many instances. Some researchers (Campbell, 1990) insist on a clear demarcation between behaviors and outcomes, whereas others (Austin & Villanova, 1992; Bernardin & Pence, 1980) deemphasize this difference. The reason for emphasizing the difference between behaviors and outcomes is the alleged control an individual has over them. The argument is that the construct of individual job performance should not include what is beyond the individual's control. The distinguishing feature is whether the individual has control over what is assessed. If the individual does have such control, it is included under the individual job performance construct.

Consider the research productivity of a professor. Is the number of papers published a measure of individual job performance? Surely, several factors beyond the control of the professor affect the publishing of a paper. Is the number of papers written a measure of individual job performance? Again, surely we can think of several factors that could affect the number of papers written that are not under the control of the professor. Thus, for every measure or index of individual job performance, the degree of control the individual has is a matter of degree. As such, the distinction between behaviors and outcomes is also a question of degree and not some absolute distinction. Whether one defines performance and related constructs as behaviors or outcomes depends on the attributions one makes and the purpose of the evaluation.

How have researchers and practitioners defined the construct domain of individual job performance in their studies? Generally, they have applied some combination of the following three approaches.

First, researchers have reviewed job performance measures used in different contexts and attempted to synthesize what dimensions make up the construct. This rational method of synthesizing and theory building is, however, affected by the personal biases of the individual researchers.

Second, researchers have developed measures of hypothesized dimensions, collected data on these measures, and factor analyzed the data (e.g., Rush, 1953). This empirical approach is limited by the number and type of measures included in the data collection phase. Recently, Viswesvaran (1993) invoked the lexical hypothesis from the personality literature (Goldberg, 1995) to address this limitation. The lexical hypothesis states that practically significant individual differences in personality are encoded in the language used, and therefore a comprehensive description of personality can be obtained by collating all the adjectives found in the dictionary. Viswesvaran, Ones and Schmidt (1996) extended this principle to job performance assessment and argued that a comprehensive specification of the content domain of the job performance construct can be obtained by collating all the measures of job performance that have been used in the extant literature.

Third, researchers (e.g., Welbourne, Johnson & Erez, 1998) have invoked organizational theories to define what the content of the job performance construct should be. Welbourne et al. used role theory and identity theory to explicate the construct of job performance. Another example of invoking a theory of work organization to explicate the construct of job performance comes in the distinction made between task and contextual performance (Borman & Motowidlo, 1993). Distinguishing between task and contextual performance parallels the social and technical systems that are postulated to make up the organization. Of these three approaches, most of the extant literature employs either rational synthesis or factor analytic approaches. Therefore, I review these two sets of studies separately.

Rational Synthesis of Job Performance Dimensions

Toops (1944) made one of the earliest attempts to hypothesize what dimensions comprise the construct of job performance, arguing for a distinction between accuracy (quality, or lack of errors) and volume of output (quantity). Toops (1944) lists units of production, quality of work, tenure, and supervisory and leadership abilities as dimensions of individual job performance. Wherry (1957), on the other hand, listed six dimensions: output, quality, lost time, turnover, training time or promotability, and satisfaction. The last two decades have seen several rational analyses of the individual job performance construct based on the plethora of factor analytic studies that have been conducted over the years. In this section, I present three such frameworks.
Bernardin and Beatty (1984) define performance as the record of outcomes produced on a specified job function or activity during a specified time period. Although a person's job performance depends on some combination of ability, motivation, and situational constraints, it can be measured only in terms of some outcomes. Bernardin and Beatty (1984) then consider the issue of dimensions of job performance. Every job function could be assessed in terms of six dimensions (Kane, 1986): quality, quantity, timeliness, cost-effectiveness, need for supervision, and interpersonal impact. Some of these dimensions may not be relevant to all job activities. Bernardin and Russell (1998) emphasize the need to understand the interrelationships among the six dimensions of performance. For example, a work activity performed in sufficient quantity and quality but not in time may not be useful to the organization.

Campbell (1990) describes the latent structure of job performance in terms of eight dimensions. According to Campbell (1990) and Campbell, McCloy, Oppler and Sager (1993), the true score correlations between these eight dimensions are small, and hence any attempt to cumulate scores across the eight dimensions will be counterproductive for guiding research and interpreting results. The eight factors are: job-specific task proficiency, nonjob-specific task proficiency, written and oral communication, demonstrating effort, maintaining personal discipline, facilitating peer and team performance, supervision, and management or administration.

Job-specific task proficiency is defined as the degree to which the individual can perform the core substantive or technical tasks that are central to a job and which distinguish one job from another. Nonjob-specific task proficiency, on the other hand, refers to tasks that are not specific to a particular job but are expected of all members of the organization. Demonstrating effort captures the consistency or perseverance and the intensity with which individuals work to complete tasks, whereas maintaining personal discipline refers to the eschewal of negative behaviors (such as rule infractions) at work. Management or administration differs from supervision in that the former includes performance behaviors directed at managing the organization that are distinct from supervisory or leadership roles. Written and oral communication reflects the component of job performance that refers to the proficiency of an incumbent in communicating (in writing or orally) independent of the correctness of the subject matter. These eight dimensions are further elaborated in Campbell (1990) and Campbell et al. (1993). Five of the eight dimensions were found in a sample of military jobs (Campbell, McHenry & Wise, 1990). Further details about these dimensions may be found in Campbell (1990).

Murphy (1989) describes the construct of job performance as comprising four dimensions: downtime behaviors, task performance, interpersonal behaviors, and destructive behaviors. Task performance focuses on performing role-prescribed activities, whereas downtime behaviors refer to lateness, tardiness, and absences or, broadly, to the negative pole of time on task (i.e., effort exerted by an individual on the job). Interpersonal behaviors refer to helping others, teamwork ratings, and prosocial behaviors. Finally, destructive behaviors correspond to compliance with rules (or the lack of it), violence on the job, theft, and other behaviors counterproductive to the goals of the organization. The four dimensions are further elaborated in Murphy (1989).

Factor Analytic Studies

In a typical factor analytic study, individuals are assessed on multiple measures of job performance. Correlations are obtained between the measures of job performance, and factor analysis is used to identify the measures that cluster together. Based on the commonalities across the measures that cluster together, a dimension is defined. For example, when absence measures, lateness measures, and tenure cluster together, a dimension of withdrawal is hypothesized. I review below some representative studies; the actual number of studies is too numerous to even list, let alone describe, in a book chapter.

An important point needs to be stressed here. Factor analyses of importance ratings of task elements, frequency of tasks performed, and time spent on tasks done on the job are not reviewed. The dimensions identified in such studies do not capture dimensions of individual job performance (Schmidt, 1980; Viswesvaran, 1993). Consider a typical job analytic study that obtains importance ratings of task statements from raters. The correlations between these ratings (e.g., the correlation between task i and task j) are computed, and the resulting correlation matrix is factor analyzed. Tasks that cluster together are used to identify a dimension of job performance. But because all raters are rating the same stimulus (say task i), the true variance is zero (Schmidt & Hunter, 1989). Any observed variability across raters is the result of random errors, disagreements between raters, and differences between raters in leniency and other rater idiosyncrasies. Correlating the rating errors in pairs of variables (importance ratings of tasks i and j) and factor analyzing the resulting correlations cannot reveal individual differences dimensions of job performance (Schmidt, personal communication, June 25, 1993). Therefore, in this section I focus only on studies that obtained individual job performance data on different measures, correlated the measures, and factor analyzed the resulting correlation matrix to identify dimensions of performance.
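To make the typical analytic sequence concrete, here is a minimal sketch in Python; the six measure names, the two simulated latent dimensions, and the use of scikit-learn's FactorAnalysis are illustrative assumptions, not a reanalysis of any study reviewed here.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500  # hypothetical sample of 500 incumbents

# Simulate two latent dimensions (e.g., task proficiency and withdrawal)
task = rng.normal(size=n)
withdrawal = rng.normal(size=n)

# Six observed measures, each loading mainly on one latent dimension
measures = np.column_stack([
    task + 0.5 * rng.normal(size=n),        # units produced
    task + 0.5 * rng.normal(size=n),        # quality (error-free work)
    task + 0.5 * rng.normal(size=n),        # supervisory rating of output
    withdrawal + 0.5 * rng.normal(size=n),  # absences
    withdrawal + 0.5 * rng.normal(size=n),  # lateness incidents
    withdrawal + 0.5 * rng.normal(size=n),  # (reverse-scored) tenure
])

# Factor analyze and inspect which measures cluster together
fa = FactorAnalysis(n_components=2).fit(measures)
print(np.round(fa.components_.T, 2))  # rows = measures, columns = factors
```

In such output, the first three measures load on one factor and the last three on the other, which is the pattern that would lead a researcher to name a task performance dimension and a withdrawal dimension.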
Rush (1953) factor analyzed nine rating measures and three organizational records-based measures of job performance for 100 salespeople. He identified the following four factors: objective achievement, learning aptitude, general reputation, and proficiency of sales techniques. A sample size of 100 for analyzing a 12 × 12 matrix of correlations would probably be considered inadequate by present-day standards, but this was one of the first studies to employ factor analytic techniques to explicate the underlying dimensions and factor structure of the individual job performance construct.

Baier and Dugan (1957) obtained data on 346 sales agents on 15 objective variables and two subjective ratings. Factor analysis of the 17 × 17 intercorrelation matrix resulted in one general factor. Several different measures, such as percentage sales, units sold, tenure, and knowledge of products, loaded on this general factor. In contrast, Prien and Kult (1968) factor analyzed a set of 23 job performance measures and found evidence for seven distinct dimensions. Roach and Wherry (1970), using a large sample (N = 900) of salespersons, found evidence for a general factor. Seashore, Indik and Georgopoulos (1960), using a comparably large sample (N = 975), found no evidence for a general factor.

Ronan (1963) conducted a factor analysis of a set of 11 job performance measures. Four of the measures were objective records, including measures of accidents and disciplinary actions. The factor analysis indicated a four-factor solution. One of the four factors reflected the 'safe' work habits of the individual (e.g., index of injuries, time lost due to accidents); acceptance of authority and adjustment constituted two other factors. The fourth factor was uninterpretable (Ronan, 1963).

Gunderson and Ryman (1971) examined the factor structure of individual job performance in extremely isolated groups. The sample analyzed involved scientists spending their winter in Antarctica. Three factors were identified: task efficiency, emotional stability, and interpersonal relations. Klimoski and London (1974) obtained data from different sources (e.g., supervisors, peers) to avoid monomethod problems, and reported evidence for the presence of a general factor, a finding that is interesting when considered in the wake of arguments (cf. Borman, 1974) that raters at different levels construe the content domain of job performance differently.

Factor analytic studies in the last two decades (1980–99) have used much larger samples and refined techniques of factor analysis, and the use of confirmatory factor analysis has enabled researchers to combine rational synthesis and empirical partitioning of variance. For example, Borman, Motowidlo, Rose and Hansen (1985) developed a model of soldier effectiveness based on data collected during Project A, a multi-year effort undertaken by the United States Army to develop a comprehensive model of work effectiveness. As part of that landmark project, Borman et al. developed a model of the aspects of first-tour soldiers' job performance that are important for unit effectiveness. Borman et al. noted that, in addition to task performance, there were three performance dimensions: allegiance, teamwork, and determination, and that each of these three dimensions could be further subdivided. Thus, allegiance involved following orders, following regulations, respect for authority, military bearing, and commitment. Teamwork comprised cooperation, camaraderie, concern for unit morale, boosting unit morale, and leadership. Determination involved perseverance, endurance, conscientiousness, initiative, and discipline.

Hunt (1996) developed a model of generic work behavior applicable to entry-level jobs, especially in the service industry. Using performance data from over 18,000 employees, primarily from the retail sector, Hunt identified nine dimensions of job performance that do not depend on job-specific knowledge. The nine dimensions were: adherence to confrontational rules, industriousness, thoroughness, schedule flexibility, attendance, off-task behavior, unruliness, theft, and drug misuse. Adherence to confrontational rules reflected an employee's willingness to follow rules that might result in a confrontation between the employee and a customer (e.g., checking for shoplifting). Industriousness captured the constant effort and attention towards work while on the job. Thoroughness was related to the quality of work, whereas schedule flexibility reflected the employees' willingness to change their schedule to accommodate demands at work. Attendance captured the employee's presence at work when scheduled to work, and punctuality. Off-task behavior involved the use of company time to engage in nonjob activities. Unruliness referred to minor deviant tendencies as well as abrasive and inflammatory attitudes towards co-workers, supervisors, and the work itself. Finally, theft involved taking money or company property, or helping friends steal property, whereas drug misuse referred to inappropriate use of drugs and alcohol.

Another trend discernible in the last two decades is the focus on specific performance aspects other than task performance. Smith, Organ and Near (1983) popularized the concept of 'Organizational Citizenship Behavior' (OCB) in the job performance literature. OCB was defined as individual behavior that is discretionary, not directly or explicitly recognized by the formal reward system, and that in the aggregate promotes the effective functioning of the organization (Organ, 1988). Factor analytic studies have identified distinct sub-dimensions of OCB: altruism, courtesy, cheerleading, sportsmanship, civic virtue, and conscientiousness.
Over the years, several concepts related to and overlapping with OCB have been proposed. George and Brief (1992) introduced the concept of 'organizational spontaneity', defining organizational spontaneity as voluntarily performed extra-role behavior that contributes to organizational effectiveness. Five dimensions were postulated: helping co-workers, protecting the organization, making constructive suggestions, developing oneself, and spreading goodwill. Organizational spontaneity is distinguished from OCB partly on account of reward systems being designed to recognize organizational spontaneity.

Van Dyne, Cummings and Parks (1995) argued for the use of 'Extra-Role Behavior' (ERB). Based on role theory concepts developed by Katz (1964), ERB has been hypothesized to contribute to organizational effectiveness. Brief and Motowidlo (1986) introduced the related concept of Prosocial Organizational Behavior (POB), which has been defined as behavior performed with the intention of promoting the welfare of the individuals or groups to whom the behavior is directed. POB can be either role-prescribed or extra-role, and it can be negative towards organizations although positive towards individuals.

Finally, Borman (1991) as well as Borman and Motowidlo (1993) describe the construct of job performance as comprising task and contextual performance. Briefly, task performance focuses on performing role-prescribed activities, whereas contextual performance accounts for all other helping and productive behaviors (Borman, 1991; Borman & Motowidlo, 1993). The two dimensions are further elaborated in Borman and Motowidlo (1993). Motowidlo, Borman and Schmit (1997) developed a theory of individual differences in task and contextual performance. Some researchers (e.g., Van Scotter & Motowidlo, 1996) have argued that individual differences in personality variables are linked more strongly than individual differences in (cognitive) abilities to individual differences in contextual performance. Cognitive ability was hypothesized to be more predictive of task performance than of contextual performance. Although persuasive, this argument has received mixed empirical support. Conscientiousness, a personality variable, has been linked as strongly as cognitive ability to task performance in some studies (Alonso, 2000).

Behaviors that have negative value for organizational effectiveness have also been proposed as constituting distinct dimensions of job performance, and organizational misbehavior has become a topic of research interest. Clark and Hollinger (1983) discussed the antecedents of employee theft in organizations. Our work on integrity testing (Ones, Viswesvaran & Schmidt, 1993), as well as the work of Paul Sackett and colleagues (cf. Sackett & Wanek, 1996), has identified the different forms of counterproductive behaviors, such as property damage, substance abuse, and violence on the job. Withdrawal behaviors have long been studied by work psychologists in terms of lateness or tardiness, absenteeism, and turnover. Work psychologists and social psychologists have explored the antecedents and consequences of social loafing, shirking, or the propensity to withhold effort (Kidwell & Bennett, 1993).

A major concern in evaluating the different factor analytic studies in the job performance domain is the fact that the dimensions identified are a function of the measures included. To ensure a comprehensive specification of the content domain of the job performance construct, Viswesvaran (1993) invoked the lexical hypothesis, which was first introduced in the personality assessment literature (see also Viswesvaran et al., 1996). A central thesis of this lexical approach is that the entire domain of job performance can be captured by culling all job performance measures used in the extant literature. This parallels the lexical hypothesis used in the personality literature which, as first enunciated by Goldberg, holds that a comprehensive description of the personality of an individual can be obtained by examining the adjectives used in the lexicon (e.g., all English language words that could be culled from a dictionary).

Viswesvaran (1993) listed the job performance measures (486 of them) used in published articles over the years. Two raters working independently then derived 10 dimensions by grouping conceptually similar measures. The 10 dimensions were: overall job performance, job performance or productivity, effort, job knowledge, interpersonal competence, administrative competence, quality, communication competence, leadership, and compliance with rules. Overall job performance captured overall effectiveness or overall work reputation, or was the sum of all individual dimensions rated. Job performance or productivity included ratings of the quantity or volume of work produced. Ratings of effort were statements about the amount of work an individual expends in striving to do a good job. Interpersonal competence was an assessment of how well an individual gets along with others, whereas administrative competence was a ratings measure of the proficiency exhibited by the individual in handling the coordination of the different roles in an organization. Quality was an assessment of how well the job was done, and job knowledge was a measure of the expertise demonstrated by the individual. Communication competence reflected how well an individual communicated, regardless of the content. Leadership was a measure of the ability to successfully bring out extra performance from others, and compliance with or acceptance of authority assessed the perspective the individual has about rules and regulations. Illustrative examples as well as more elaborate explanations of these dimensions are provided in Viswesvaran et al. (1996).

Although the lexical approach is promising, it should be noted that there are two potential concerns here.
First, it can be argued that, just as the technical nuances of personality may not be reflected in the lexicon, some technical but important aspects of job performance may never have been measured in the literature and are thus not covered in the 10 dimensions identified. Second, it should be noted that generating 10 dimensions from a list of all job performance measures used in the extant literature involved the judgmental task of grouping conceptually similar measures.

Of these two concerns, the first is mitigated to the extent that the job performance measures found in the extant literature were identified by industrial-organizational psychologists and other professionals (in consultation with managers in organizations). As such, the list of measures can be construed as a comprehensive specification of the entire domain of the construct of job performance. The second concern, the judgmental basis on which the job performance measures were grouped into 10 conceptual dimensions, is mitigated to the extent that intercoder agreement is high (the intercoder agreement in grouping the conceptually similar measures into the 10 dimensions was reported to be in the 90%s; Viswesvaran, 1993).

A comprehensive specification of the job performance construct involves many measures, the intercorrelations among which are needed to conduct the factor analyses. Estimating the correlations among all variables with adequate sample sizes may not be feasible in a single study. Fortunately, meta-analysis can be used to cumulate the correlations across pairs of variables, and the meta-analytically constructed correlation matrix can be used in the factor analyses (cf. Viswesvaran & Ones, 1995). Conway (1999) developed a taxonomy of managerial behavior by meta-analytically cumulating data across 14 studies, and found a three-level hierarchy of managerial performance. Viswesvaran (1993) cumulated results from over 300 studies that reported correlations across the 10 dimensions. Both interrater and intrarater correlations, as well as nonratings-based measures, were analyzed. The 10 dimensions showed a positive manifold of correlations, suggesting the presence of a general factor across the different dimensions (Campbell, Gasser & Oswald, 1996).
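A bare-bones sketch of the core computation in such cumulation is given below; the study correlations and sample sizes are invented, and the simple sample-size weighting omits the artifact corrections (e.g., for unreliability) used in full psychometric meta-analysis.

```python
import numpy as np

def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean correlation across k studies: the
    building block for each cell of a meta-analytically constructed
    correlation matrix."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    return np.sum(ns * rs) / np.sum(ns)

# Invented example: five studies correlating ratings of effort
# with ratings of interpersonal competence
rs = [0.35, 0.42, 0.28, 0.50, 0.33]
ns = [120, 85, 240, 60, 150]
print(round(weighted_mean_r(rs, ns), 3))  # 0.343

# Each pairwise cell is cumulated this way; the assembled matrix can
# then be factor analyzed (cf. Viswesvaran & Ones, 1995).
```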
CRITERIA FOR ASSESSING THE QUALITY OF INDIVIDUAL JOB PERFORMANCE ASSESSMENTS

For over a century, researchers have grappled with the issues involved in the assessment of individual job performance (cf. Austin & Villanova, 1992, for a summary). It is no wonder that several researchers have advanced criteria for evaluating these assessments. Freyd (1926) argued that measures of individual job performance should be validated. While Freyd argued for the importance of establishing the construct validity of criteria, Farmer (1933) stressed the need for assessing the reliability of measures. Burtt (1926) provided a list of variables (e.g., opportunity bias) that could affect organizational records or objective performance. Brogden and Taylor (1950) discussed the different types of criterion bias, specifically differentiating between bias that is correlated with predictor variables and biases that are unrelated to predictors.

Bellows (1941) identified six criteria that he grouped into statistical, acceptability, and practical effects categories. Bechtoldt (1947) introduced three criteria: (1) reliability and discriminability, (2) pertinence and comprehensiveness, and (3) comparability. Reliability is the consistency of measurement (Nunnally, 1978), and a good measure of individual job performance should discriminate across individuals. Pertinence refers to job-relatedness, and comprehensiveness requires that all important aspects of job performance be included in the assessment. Comparability focuses on equivalence across the different conditions of assessment (e.g., time, place).

Thorndike (1949) proposed four criteria: (1) relevance, (2) reliability, (3) freedom from discrimination, and (4) practicality. Relevance is the construct validity of the measures, and can be construed as the correlation between the true scores and the construct (i.e., job performance). Given that this correlation can never be empirically estimated, relevance or construct validity is assessed by means of a nomological net of correlations with several related measures (see the section on construct validity in Chapter 2 by Aguinis et al., in this volume). Relevance is the lack of criterion contamination (the measure includes what it should not include) and criterion deficiency (the measure lacks what it should include). Note that Thorndike's use of the term 'discrimination' differs from Bechtoldt's (1947) use of the term. For Thorndike, discrimination refers to unfair distinctions made based on (demographic) group memberships. All measures designed to assess individual job performance should discriminate; the question is whether the discrimination is relevant to job performance or is unrelated to it.

Ronan and Prien (1966) argued that the reliability of assessments is the most important factor in evaluating the quality of individual job performance assessments. Guion (1976), on the other hand, stressed the importance of assessing the construct validity of the performance assessments. Smith (1976) identified relevance (construct validity), reliability, and practicality as criteria for evaluating job performance assessments. Blum and Naylor (1968) summarize the conclusions of many researchers on criteria. Across the different classifications, the common criteria can be stated as (1) discriminability across individuals, (2) practicality, (3) acceptability, (4) reliability, (5) comprehensiveness (lack of criterion deficiency), and (6) construct validity (relevance, job relatedness, pertinence, or freedom from bias such as contamination).
Of these six criteria, voluminous research has focused on issues of reliability and construct validity. Methods to assess job relatedness (pertinence) have been covered in other chapters (see Chapter 4 by Sanchez & Levine, this volume). Criteria such as discriminability and practicality pertain to administration issues and may depend on the context. For example, how well an individual counts can be a good measure of job performance for entry-level clerks in a grocery store but not for high-level accountants. Finally, there has been some limited research on user acceptability as a criterion. In light of this, I devote the rest of this section to the two issues of reliability and construct validity of individual job performance assessments, after briefly summarizing the research on user acceptability.

User Acceptability

Research in the past 20 years has focused on user acceptability of peer ratings of individual job performance. Researchers (e.g., King, Hunter & Schmidt, 1980) have noted that raters were unwilling to accept nontransparent rating instruments such as mixed standard scales and forced choice measures. Bobko and Colella (1994) summarize the research on how users make meaning and set acceptable performance standards. Dickinson (1993) reviews several factors that could affect user reactions. Folger, Konovsky and Cropanzano (1992) present a due process model based on notions of organizational justice to explain user reactions. User reactions were more favorable when adequate notice was given by the organization about the performance assessment process, a fair hearing was provided, and standards were consistently applied across individuals. Peer ratings were more accepted when peers were considered knowledgeable and had had the opportunity to observe the performance.

Earlier research by Borman (1974) had suggested that involvement in the development of rating scales produced more favorable user reactions. This is consistent with the idea that the ability to provide input into a decision process enhances perceptions of procedural justice. Notions of informational and interactional justice (see Chapter 8 in Volume 2 by Gilliland & Chan) also affect user reactions. Taylor, Tracey, Renard, Harrison and Carroll (1995) found that when rater–ratee pairs were randomly formed, with some raters trained in due process components, ratees assigned to the trained raters expressed more favorable reactions even though their performance evaluations were more negative compared to those of the ratees assigned to untrained raters. Several researchers (e.g., Villanova, 1992) have advanced a stakeholder model that explicitly takes into account the values which underlie performance assessments.

Reliability of Individual Job Performance Assessments

Reliability is defined as the consistency of measurement (Nunnally, 1978; Schmidt & Hunter, 1996). Mathematically, it can be defined as the ratio of true to observed variance; depending on what part of the observed variance is construed as true variance and what is construed as error variance, we have different reliability coefficients (Pedhazur & Schmelkin, 1991; Schmidt & Hunter, 1996). The three major types of reliability assessments that pertain to individual job performance are (1) internal consistency, (2) stability estimates, and (3) interrater reliability estimates. These reliability estimates can be computed either for overall job performance assessments or for each dimension assessed. Some of these estimates (e.g., interrater) are applicable to only some methods of assessment (e.g., subjective evaluations such as ratings), whereas other types of reliability estimates (e.g., stability) are applicable to all methods of assessment (subjective evaluations such as ratings as well as organizational records).

Consider a researcher interested in assessing the dimension of interpersonal competence in individual job performance. The researcher could develop a list of questions that relate to interpersonal competence and require knowledgeable raters to evaluate individuals on each of the questions. Either an unweighted or a weighted sum of the responses to all questions is taken as the measure of interpersonal competence. Now, in considering the observed variance across individuals, each question has a specific or unique variance as well as a variance shared with the other items. To estimate what proportion of the observed variance is common or shared across items, we employ measures of internal consistency. The most commonly used measure of internal consistency is Cronbach's alpha (Cronbach, 1951).
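In symbols (standard classical test theory notation, not notation taken from the sources reviewed here), reliability and coefficient alpha can be written as:

\[
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E},
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right),
\]

where \(\sigma^2_T\) is true-score variance, \(\sigma^2_E\) is error variance, \(\sigma^2_X\) is observed total-score variance, \(k\) is the number of items, and \(\sigma^2_{Y_i}\) is the variance of item \(i\). Different reliability coefficients follow from different choices of what counts as \(\sigma^2_E\).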
of procedural justice. Notions of informational and teeism is defined as the common or shared variance
interactional justice (see Chapter 8 in Volume 2 by across these different operationalizations, then an
Gilliland & Chan) also affect user reactions. Taylor, estimate of internal consistency of organizational
Tracey, Renard, Harrison and Carroll (1995) found records can be computed.
that when rater–ratee pairs were randomly formed Stability estimates can be obtained as the correla-
with some raters trained in due process compo- tion between measures obtained at times 1 and 2.
nents, ratees assigned to the trained raters expressed Here true performance is construed as what is com-
more favorable reactions even though their perfor- mon to both time periods. The greater the time
mance evaluations were more negative compared interval, the more likely that true performance will
to the ratees assigned to untrained raters. Several change. Coefficients of stability can be assessed for
researchers (e.g., Villanova, 1992) have advanced both organizational records as well as for subjective
a stakeholder model that explicitly takes into evaluations such as ratings. With ratings, the same
To estimate the extent to which two raters will agree in their ratings, interrater reliability is assessed as the correlation between the ratings provided by two raters of the same group of individuals. In practice, different pairs of raters are often used to rate different individuals; under such circumstances, the interrater correlation also takes into account rater leniency. Interrater reliability is less applicable to measures based on organizational records, unless the interest is in estimating how accurately the performance has been recorded (better designated as interobserver, intercoder, or interrecorder reliability). Interrater reliability can be assessed for overall job performance assessments as well as for specific dimensions of individual job performance.
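As a minimal sketch with invented ratings, the single-rater interrater reliability is simply the correlation between two raters' scores for the same ratees; the Spearman–Brown step for the reliability of a k-rater composite is standard classical test theory, added here for illustration rather than taken from the studies cited.

```python
import numpy as np

# Invented ratings of ten ratees by two raters (1-7 scale)
rater1 = np.array([5, 3, 6, 4, 7, 2, 5, 4, 6, 3])
rater2 = np.array([4, 3, 5, 5, 6, 2, 4, 4, 7, 2])

# Single-rater interrater reliability: correlation across ratees
r12 = np.corrcoef(rater1, rater2)[0, 1]
print(round(r12, 2))  # ~0.87 for these invented data

# Spearman-Brown: reliability of the average of k parallel raters
k = 2
r_kk = (k * r12) / (1 + (k - 1) * r12)
print(round(r_kk, 2))
```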
Interrater reliability can be assessed for different types of raters: supervisors, peers, subordinates, and clients/customers. One question that could be raised is whether there are ever two 'parallel' supervisors. That is, to estimate interrater agreement among supervisors, we need ratings of a group of individuals from at least two supervisors. In many organizations, we have only one 'true' supervisor, and a second individual (perhaps the supervisor's supervisor) is included to assess interrater reliability. It could be argued that these two sets of ratings are not parallel. Although this objection is conceptually sound, the evidence we review below for the interrater reliability of job performance assessments shows that this is not the case: the interrater reliability for peer ratings is lower than that for supervisor ratings (and presumably there are parallel peers).

The different types of reliability estimates for job performance assessments were explained above in terms of correlations. However, analysis of variance models can also be used (Hoyt & Kerns, 1999). In fact, generalizability theory (Cronbach, Gleser, Nanda & Rajaratnam, 1972) has been used as a framework to assess the variance due to different sources. Depending on how error variance is conceptualized, different generalizability coefficients can then be proposed. Some researchers (e.g., Murphy & DeShon, 2000) have mistakenly argued that generalizability theory alone estimates these different reliability coefficients. In reality, correlational methods and analysis of variance models based on classical measurement theory can be (and were) used to estimate the different reliability estimates (generalizability coefficients). There is not much difference across the different frameworks when they are properly estimated and interpreted (Schmidt, Viswesvaran & Ones, 2000).
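A minimal sketch of the analysis of variance route appears below, using invented ratings of six ratees by three raters; ICC(2,1) (in Shrout–Fleiss notation) is shown as one common coefficient of this family, not necessarily the coefficient used in any of the studies cited.

```python
import numpy as np

# Invented ratings: 6 ratees (rows) by 3 raters (columns)
x = np.array([[5, 4, 5],
              [3, 3, 2],
              [6, 5, 6],
              [4, 5, 4],
              [2, 2, 3],
              [5, 6, 6]], float)
n, k = x.shape

# Two-way ANOVA mean squares (ratees crossed with raters)
grand = x.mean()
ms_ratee = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ms_rater = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)
resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))

# ICC(2,1): single-rater reliability treating rater variance as error
icc21 = (ms_ratee - ms_err) / (
    ms_ratee + (k - 1) * ms_err + k * (ms_rater - ms_err) / n)
print(round(icc21, 2))
```

Changing which mean squares enter the denominator (e.g., ignoring rater variance) yields a different coefficient, which is exactly the sense in which different conceptualizations of error produce different generalizability coefficients.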
Several studies that have evaluated individual job performance report internal consistency estimates. Consistent with the predominance of cross-sectional over longitudinal studies in the literature, fewer studies have estimated stability coefficients. Further, more reliability estimates have been reported for subjective evaluations such as ratings than for measures based on organizational records. Rather than reviewing each study (which is impossible even in a book-length format), I will summarize the results of major studies and meta-analyses conducted on this topic.

Rothe (1978) conducted a series of studies to assess the stability of productivity measures for different samples of chocolate dippers, welders, and other types of workers. Hackett and Guion (1985) report the reliability of absenteeism measures. Accident measures at two different time periods have also been correlated.

Viswesvaran et al. (1996) conducted a comprehensive meta-analysis cumulating results across studies reporting reliability estimates for peer and supervisor ratings. Coefficient alphas, stability estimates, and interrater reliability estimates were averaged separately. Reliability was reported both for assessments of overall job performance and for nine dimensions of performance. For supervisory ratings of overall job performance, coefficient alpha was .86, the coefficient of stability was .81, and interrater reliability was .52. It appears that the largest source of error variance was rater-specific variance. This finding compares with the generalizability estimates obtained by Greguras and Robie (1998) as well as the meta-analysis of generalizability studies by Hoyt and Kerns (1999).

The reliability estimates for supervisory ratings of different dimensions of job performance are also summarized in Viswesvaran et al. (1996). The sample size weighted mean estimates (along with the total number of estimates averaged, k, and the total sample size across averaged estimates, N) are provided below. Interrater reliability estimates were .57 (k = 19, N = 2,015), .53 (k = 20, N = 2,171), .55 (k = 24, N = 2,714), .47 (k = 31, N = 3,006), and .53 (k = 20, N = 14,072) for ratings of productivity, leadership, effort, interpersonal competence, and job knowledge, respectively. Coefficient alphas for ratings of productivity, leadership, effort, and interpersonal competence were .82, .77, .79, and .77, respectively.

Viswesvaran et al. (1996) also report the sample size weighted average reliabilities for peer ratings. For ratings of overall job performance, interrater reliability was .42 and coefficient alpha was .85. Reliabilities for peer ratings of leadership, job knowledge, effort, interpersonal competence, administrative competence, and communication competence are also reported (see Viswesvaran et al., 1996). Average coefficient alphas for peer ratings of leadership, effort, and interpersonal competence were .61, .77, and .61, respectively.
Viswesvaran et al. (1996) focused on peer and supervisor ratings, whilst other studies, for example Mount (1984) as well as Mount, Judge, Scullen, Stysma and Hezzlett (1998), have explored the reliability of subordinate ratings. The interrater reliability of subordinate ratings has been found to vary between .31 and .36 for the various dimensions of performance. Scarce data exist for assessing the reliability of customer ratings of performance, and research in the new millennium should remedy this deficiency in the literature.

Further research should also explore the effects of contextual variables on reliability assessments. Churchill and Peter (1984) as well as Petersen (1994) investigated the moderating effects of 13 variables on the reliability estimates of different variables (including job performance). No strong moderator effects were found. Rothstein (1990) reported that the interrater reliability of supervisor ratings of job performance is moderated by the length of exposure the rater has had to the ratees. Similar effects, such as opportunity to observe, should be explored for their effects on reliability estimates. However, these moderating variables can also be construed as variables affecting the construct validity of ratings. It is erroneous to argue that, because several variables could potentially affect ratings, interrater reliability estimates do not assess reliability. Reliability is not validity, and validity is not reliability (Schmidt et al., 2000). I now turn to a discussion of the construct validity of individual job performance assessments.

Construct Validity of Individual Job Performance Assessments

The construct validity of a measure can be conceptualized as the correlation between the true scores from the measure and the underlying construct (i.e., individual job performance). This correlation can never be empirically estimated, and several lines of evidence are analyzed to assess construct validity. A major component of construct validity is assessing the convergent validity between different methods of assessing the same construct. Heneman (1986) meta-analytically cumulated the correlations between subjective evaluations of job performance provided by supervisors and organizational records-based measures of individual job performance. Heneman (1986) cumulated results across 23 studies (involving a total sample of 3,718) and found a corrected mean correlation of .27 between supervisory ratings and organizational records. Heneman used a reliability estimate of .60 for supervisory ratings and a test–retest stability estimate of .63 for output measures. Using a value of .52 for the reliability of supervisory ratings results in a correlation of .29.
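The re-estimate follows from the standard correction for attenuation of classical test theory; the formula itself is textbook psychometrics rather than something spelled out in these sources, so the following is a reconstruction of the arithmetic:

\[
\hat{\rho} = \frac{\bar{r}}{\sqrt{r_{xx}\, r_{yy}}},
\qquad
\hat{\rho}_{(.52)} = \hat{\rho}_{(.60)} \sqrt{\frac{.60}{.52}} \approx .27 \times 1.07 \approx .29,
\]

where \(\bar{r}\) is the mean observed correlation and \(r_{xx}\) and \(r_{yy}\) are the reliabilities of the two measures; substituting .52 for .60 simply rescales the corrected estimate by \(\sqrt{.60/.52}\). The same rescaling underlies the McEvoy and Cascio (1987) re-estimate reported below (from −.28 to −.30).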
In addition to investigating the convergence
between supervisory ratings of job performance
Construct Validity of Individual and organizational records of (1) productivity, and
Job Performance Assessments (2) personnel data such as turnover and absenteeism,
researchers have explored the overlap between
The construct validity of a measure can be concep- organizational records of productivity and person-
tualized as the correlation between the true scores nel data (e.g., absenteeism, promotions etc.). Bycio
from the measures and the underlying construct (i.e., (1992) reports a correlation of .24 between organi-
individual job performance). This correlation can zational records of performance indices and
never be empirically estimated, and several lines of absenteeism (23 samples, 5,204 individuals). The
evidence are analyzed to assess construct validity. A correlation was −.28 (11 samples, 1,649 individu-
major component of construct validity is to assess als) when time lost measures of absenteeism were
the convergent validity between different methods considered; with frequency-based measures of
of assessing the same construct. Heneman (1986) absenteeism (12 samples, 3,555 individuals) the
meta-analytically cumulated the correlation between meta-analyzed correlation was −.22.
The meta-analytic results summarized so far focused on supervisory ratings and on ratings of overall job performance. Viswesvaran (1993) reports correlations between organizational records of productivity and 10 dimensions of rated job performance. The convergent validities of ratings and records-based measures were analyzed for peers and supervisors. In general, the convergent validity was higher for supervisory ratings than for peer ratings. Organizational records seem to reflect the supervisory perspective more than the peer perspective.

The convergence among the different sources of ratings has been explored, and two reviews of this literature have been reported. Mabe and West (1982) presented the first review of this literature, which
was subsequently updated by Harris and Schaubroeck (1988). Harris and Schaubroeck (1988) found a correlation of .62 between peer and supervisory ratings of overall job performance (23 samples, 2,643 individuals). The correlations between self ratings and supervisor or peer ratings were much lower. Whilst Harris and Schaubroeck focused on overall ratings of job performance, Viswesvaran, Schmidt and Ones (2000) meta-analyzed the peer–supervisor correlations for overall performance as well as for eight dimensions of job performance. Viswesvaran et al. (2000) reported mean observed peer–supervisor correlations of .40, .48, .38, .34, .35, .36, .41, and .49 for ratings of productivity, effort, interpersonal competence, administrative competence, quality, job knowledge, leadership, and compliance with authority, respectively. Research suggests (e.g., Harris, Smith & Champagne, 1995) that ratings obtained for administrative and research purposes are comparable.

Most of the extant literature reported correlations between self ratings, peer ratings, supervisor ratings, and organizational records. Recent research has started exploring the convergence between other sources of ratings (e.g., subordinates, customers). Mount et al. (1998) report correlations between subordinate ratings and peer or supervisor ratings for overall performance as well as for three dimensions of performance. More research is needed before robust conclusions about convergent validity across these sources can be drawn.

In addition to investigating convergent validity across sources with correlations, researchers have used the multitrait–multimethod (MTMM) matrix (Campbell & Fiske, 1959) of correlations between different methods and performance dimensions to tease apart trait and method variance. Cote and Buckley (1987) as well as Schmitt and Stults (1986) elaborate this approach and summarize its application to performance assessment. Conway (1996) used an MTMM matrix and confirmatory factor analyses to support the construct validity of task and contextual performance measures. Mount et al. (1998), however, caution that previous applications of this approach had neglected within-source variability, and that once this source is taken into account substantive conclusions vary.
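To make the MTMM logic concrete, the toy sketch below (hypothetical correlations and source labels, not data from any of the studies cited) assembles a small matrix for three performance dimensions rated by two sources and contrasts the convergent (monotrait–heteromethod) values with the heterotrait–monomethod values that index shared method, or source, variance:

```python
import numpy as np

# Rows/columns are (method, trait) pairs: 2 methods x 3 traits.
# The correlations are hypothetical, chosen only to illustrate the logic.
labels = [(m, t) for m in ("self", "supervisor") for t in range(3)]
R = np.array([
    [1.00, 0.55, 0.50, 0.40, 0.25, 0.20],
    [0.55, 1.00, 0.52, 0.24, 0.42, 0.22],
    [0.50, 0.52, 1.00, 0.21, 0.23, 0.44],
    [0.40, 0.24, 0.21, 1.00, 0.48, 0.45],
    [0.25, 0.42, 0.23, 0.48, 1.00, 0.47],
    [0.20, 0.22, 0.44, 0.45, 0.47, 1.00],
])

convergent, monomethod = [], []
for i, (mi, ti) in enumerate(labels):
    for j, (mj, tj) in enumerate(labels):
        if j <= i:
            continue
        if mi != mj and ti == tj:      # same trait, different method
            convergent.append(R[i, j])
        elif mi == mj and ti != tj:    # different trait, same method
            monomethod.append(R[i, j])

print(f"mean convergent (monotrait-heteromethod):   {np.mean(convergent):.2f}")  # 0.42
print(f"mean heterotrait-monomethod:                {np.mean(monomethod):.2f}")  # ~0.49
```

When the heterotrait–monomethod correlations rival or exceed the convergent correlations, as in this toy matrix, method (source) variance is substantial; partitioning that variance properly is precisely where Mount et al. (1998) argue earlier applications went astray.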
Convergence across sources is one aspect of construct validity. Assessment of construct validity also involves assessing the potential for, and presence of, several sources of variance that are unrelated to the construct under investigation. From as early as the 1920s, researchers have been developing lists of factors that could affect the construct validity of job performance assessments. Burtt (1926) drew attention to the potential for criterion contamination and deficiency in organizational records, whilst Thorndike (1920) introduced the concept of halo error in ratings.

The last half of the twentieth century has seen an explosion of research on judgmental errors that could affect ratings. Lance, LaPointe and Stewart (1994) identified three definitions of halo error. Halo could be conceptualized as (1) a general evaluation that affects all dimensional ratings, (2) a salient dimension that affects ratings on other dimensions, or (3) insufficient discrimination among dimensions (Solomonson & Lance, 1997). Cooper (1981) discusses the different measures of halo as well as strategies designed to mitigate the effects of halo. Distributional problems such as leniency, central tendency, and stringency have been assessed. Judgmental errors such as the fundamental attribution error, representativeness and availability heuristics, and contrast effects in assessments have been studied. Wherry and Bartlett (1982) present a model incorporating many of the potential influences on ratings. Recent methodological advances, such as combining meta-analysis with structural equations modeling or with generalizability theory (Hoyt, 2000; Hoyt & Kerns, 1999), have enabled researchers to assess the effects of these judgmental processes on the construct validity of job performance assessments.
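Under the third definition, halo is commonly indexed by how little a rater differentiates among dimensions within a ratee. A minimal sketch of one such index follows (illustrative only; the function is invented here, and Cooper, 1981, reviews the measures actually proposed in the literature):

```python
import statistics

def halo_index(dimension_ratings):
    """Within-ratee standard deviation across dimension ratings.
    Lower values = less discrimination among dimensions = more halo,
    under the 'insufficient discrimination' definition of halo."""
    return statistics.stdev(dimension_ratings)

# One rater's ratings of two ratees on five dimensions (1-7 scale)
print(round(halo_index([6, 6, 6, 6, 5]), 2))  # 0.45: little discrimination
print(round(halo_index([2, 6, 4, 7, 3]), 2))  # 2.07: dimensions differentiated
```

Note, however, that uniformly similar ratings may reflect true halo rather than halo error (Solomonson & Lance, 1997), so low within-ratee variability is ambiguous in isolation.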
Finally, the construct validity of ratings has been explored by estimating the effects of demographic variables on assessments. Kraiger and Ford (1985) reported differences between racial groups of almost one half of a standard deviation unit. However, the Kraiger and Ford (1985) meta-analyses included laboratory-based experimental studies as well as field studies. More importantly, ratee ability was not controlled. Pulakos, White, Oppler and Borman (1989) found, in a large-sample study of job performance assessment in a military setting, that once ratee ability is controlled, the biasing effects of race are small. Similar findings were obtained with civilian samples of over 36,000 individuals across 174 jobs (Sackett & DuBois, 1991). The effects of the age and gender of ratees and raters have also been investigated (see Cascio, 1991, for a summary). The biasing effects of demographic variables have not been found to be substantial. The dynamic nature of criteria has also been investigated, and empirical evidence suggests that although mean levels of individual job performance changed over time, the rank ordering of individuals did not (Barrett, Caldwell & Alexander, 1989). Although potential exists for distortion, most well-constructed and well-administered performance assessment systems result in construct-valid data on individual job performance.

CAUSAL MODELS FOR JOB PERFORMANCE DIMENSIONS

In the last section of this chapter, I review models of work behavior that postulate how different individual
differences variables are linked to different aspects of performance. The search for explanation and understanding suggests a step beyond mere prediction (Schmidt & Kaplan, 1971). Hunter (1983) developed and tested a causal model in which cognitive ability was a direct causal antecedent to both job knowledge and job performance. Job knowledge was an antecedent to job performance. Both job knowledge and job performance contributed to supervisory ratings. These findings suggest that cognitive ability contributes to overall job performance through its effects on learning job knowledge and mastery of required skills. Borman, Hanson, Oppler, Pulakos and White (1993) extended the model to explain supervisory performance.
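Because Hunter's (1983) model is a simple path structure, its key implication — that ability reaches ratings largely indirectly, through job knowledge and performance — can be illustrated with a toy simulation (the standardized path weights below are invented for illustration; they are not Hunter's estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hunter (1983)-style causal structure with made-up path weights;
# residual variances are chosen so every variable has unit variance.
ability = rng.standard_normal(n)
job_knowledge = 0.5 * ability + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)
job_performance = (0.4 * job_knowledge + 0.3 * ability
                   + np.sqrt(0.63) * rng.standard_normal(n))
ratings = (0.4 * job_knowledge + 0.3 * job_performance
           + np.sqrt(0.618) * rng.standard_normal(n))

# No direct ability -> ratings path exists in this structure, yet the
# correlation is sizable: the effect is carried entirely by job
# knowledge and job performance, as Hunter argued.
print(round(np.corrcoef(ability, ratings)[0, 1], 2))  # ~0.35
```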
McCloy, Campbell and Cudeck (1994) argued that all individual differences variables affect performance in any dimension through their effects on procedural knowledge, declarative knowledge, or motivation. Barrick, Mount and Strauss (1993) tested and found support for a model in which conscientiousness predicted overall performance by affecting goal setting. Ones and Viswesvaran (1996) argued that conscientiousness has multiple pathways by which it affects overall performance. First, conscientious individuals are likely to spend more time on the task and less time daydreaming. This investment of time will result in greater acquisition of job knowledge, which in turn will result in greater productivity, which in turn will result in positive ratings. Further, conscientious individuals are likely to engage in organizational citizenship behaviors, which in turn might enhance productivity and ratings. Finally, conscientious individuals are expected to pay more attention to detail and to profit more via vicarious learning (Bandura, 1977), which would result in higher job knowledge and productivity.

Borman and Motowidlo (1993) postulated that ability will predict task performance more strongly than individual differences in personality. On the other hand, individual differences in personality were hypothesized to predict contextual performance better than ability. Motowidlo et al. (1997) developed a more nuanced model in which contextual performance was modeled as dependent on contextual habits, contextual skills, and contextual knowledge. Although habits and skills were predicated on personality, contextual knowledge was influenced both by personality and cognitive ability. Similarly, task performance is influenced by task habits, task skill, and task knowledge. Whereas task skill and task knowledge are influenced solely by cognitive ability, task habits are affected by both cognitive ability and personality variables. Thus, this more nuanced model implies that both ability and personality have a role in explaining task and contextual performance. The bottom line appears to be that each performance dimension is complexly determined, so that it is impossible to specify different individual differences variables as the sole cause or antecedent of a particular dimension of job performance. This is also to be expected given the positive correlations across the various dimensions.

CONCLUSIONS

Job performance is a central construct in our field. Voluminous research has been undertaken to (1) assess the factor structure of the construct, (2) refine the methods of assessment, (3) assess user reactions to, and the reliability and construct validity of, assessments of individual job performance, and (4) develop models of work behavior that delineate the antecedents of individual job performance. A century of research suggests that the factor structure of job performance can be summarized as a hierarchy with a general factor at the apex and group factors at the next level. The breadth and range of the group factors differ across authors.

Several methods of assessment have been proposed, evaluated, and used. Research on user reactions has invoked justice theory concepts. Interrater reliability, internal consistency estimates, and stability assessments have been examined for assessments of overall performance as well as for several dimensions of performance. Correlational, ANOVA, and generalizability models have been used in reliability estimation. The construct validity of individual job performance assessment has been assessed with emphasis on judgmental errors such as halo, group differences, and convergence between different methods of assessment. Finally, path models have been specified to link antecedents to the different job performance dimensions.

Impressive as the existing literature on assessments of individual job performance is, several trends in the workplace call for additional research. The changing nature of work (Howard, 1996) brings with it changes in the assessment of performance (Ilgen & Pulakos, 1999), and the use of electronic monitoring and other technological advances may change the nature of what we measure (Hedge & Borman, 1995). Assessments of the performance of expatriates will also gain in importance (see Chapter 20 by Sinangil & Ones in this volume). In short, a lively phase is ahead for researchers and practitioners.

REFERENCES

Alonso, A. (2000). The relationship between cognitive ability, the Big Five, task and contextual performance: A meta-analysis. Unpublished master's thesis, Florida International University, Miami, FL.
Arvey, R.D., & Murphy, K.R. (1998). Performance evaluation in work settings. Annual Review of Psychology, 49, 141–168.
Austin, J.T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77, 836–874.
Baier, D.E., & Dugan, R.D. (1957). Factors in sales success. Journal of Applied Psychology, 41, 37–40.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
Barber, A.E. (1998). Recruiting employees: Individual and organizational perspectives. Thousand Oaks, CA: Sage.
Barrett, G.V., Caldwell, M.S., & Alexander, R.A. (1989). The predictive stability of ability requirements for task performance: A critical reanalysis. Human Performance, 2, 167–181.
Barrick, M.R., Mount, M.K., & Strauss, J. (1993). Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. Journal of Applied Psychology, 78, 715–722.
Bechtoldt, H.P. (1947). Factorial investigation of the perceptual-speed factor. American Psychologist, 2, 304–305.
Bellows, R.M. (1941). Procedures for evaluating vocational criteria. Journal of Applied Psychology, 25, 499–513.
Bendig, A.W. (1954). Reliability and the number of rating-scale categories. Journal of Applied Psychology, 38, 38–40.
Bernardin, H.J., Alvares, K.M., & Cranny, C.J. (1976). A recomparison of behavioral expectation scales to summated scales. Journal of Applied Psychology, 61, 564–570.
Bernardin, H.J., & Beatty, R. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent-PWS.
Bernardin, H.J., & Pence, E.C. (1980). Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, 65, 60–66.
Bernardin, H.J., & Russell, J.E.A. (1998). Human resource management: An experiential approach (2nd ed.). Boston, MA: McGraw-Hill.
Blanz, F., & Ghiselli, E.E. (1972). The mixed standard scale: A new rating system. Personnel Psychology, 25, 185–199.
Blum, M.L., & Naylor, J.C. (1968). Industrial psychology: Its theoretical and social foundations. New York: Harper & Row.
Bobko, P., & Colella, A. (1994). Employee reactions to performance standards: A review and research propositions. Personnel Psychology, 47, 1–29.
Bommer, W.H., Johnson, J.L., Rich, G.A., Podsakoff, P.M., & MacKenzie, S.B. (1995). On the interchangeability of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology, 48, 587–605.
Borman, W.C. (1974). The rating of individuals in organizations: An alternate approach. Organizational Behavior and Human Performance, 12, 105–124.
Borman, W.C. (1991). Job behavior, performance, and effectiveness. In M.D. Dunnette & L.M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 271–326). Palo Alto, CA: Consulting Psychologists Press.
Borman, W.C., Hanson, M.A., Oppler, S.H., Pulakos, E.D., & White, L.A. (1993). Role of early supervisory experience in supervisor performance. Journal of Applied Psychology, 78, 443–449.
Borman, W.C., & Motowidlo, S.J. (1993). Expanding the criterion domain to include elements of contextual performance. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations (pp. 71–98). San Francisco, CA: Jossey-Bass.
Borman, W.C., Motowidlo, S.J., Rose, S.R., & Hansen, L.M. (1985). Development of a model of soldier effectiveness. Minneapolis, MN: Personnel Decisions Research Institute.
Borman, W.C., White, L.A., Pulakos, E.D., & Oppler, S.H. (1991). Models of supervisory job performance ratings. Journal of Applied Psychology, 76, 863–872.
Brief, A.P., & Motowidlo, S.J. (1986). Prosocial organizational behavior. Academy of Management Review, 11, 710–725.
Brogden, H., & Taylor, E.K. (1950). The dollar criterion: Applying the cost accounting concept to criterion construction. Personnel Psychology, 3, 133–154.
Burtt, H.E. (1926). Principles of employment psychology. Boston: Houghton-Mifflin.
Bycio, P. (1992). Job performance and absenteeism: A review and meta-analysis. Human Relations, 45, 193–220.
Campbell, J.P. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In M. Dunnette & L.M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 687–731). Palo Alto, CA: Consulting Psychologists Press.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by means of the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Campbell, J.P., Gasser, M.B., & Oswald, F.L. (1996). The substantive nature of job performance variability. In K.R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 258–299). San Francisco: Jossey-Bass.
Campbell, J.P., McCloy, R.A., Oppler, S.H., & Sager, C.E. (1993). A theory of performance. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco, CA: Jossey-Bass.
Campbell, J.P., McHenry, J.J., & Wise, L.L. (1990). Modeling job performance in a population of jobs. Personnel Psychology, 43, 313–333.
Cascio, W.F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Church, A.H., & Bracken, D.W. (1997). Advancing the state of the art of 360 degree feedback. Group and Organization Management, 22, 149–161.
Churchill, G.A., Jr., & Peter, J.P. (1984). Research design effects on the reliability of rating scales: A meta-analysis. Journal of Marketing Research, 21, 360–375.
Clark, J.P., & Hollinger, R.C. (1983). Theft by employees in work organizations: Executive summary. Washington, DC: National Institute of Justice.
Cleveland, J.N., Murphy, K.R., & Williams, R.E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130–135.
Conway, J.M. (1996). Analysis and design of multitrait-multirater performance appraisal studies. Journal of Management, 22, 139–162.
Conway, J.M. (1999). Distinguishing contextual performance from task performance for managerial jobs. Journal of Applied Psychology, 84, 3–13.
Cooper, W.H. (1981). Ubiquitous halo. Psychological Bulletin, 90, 218–244.
Cote, J.A., & Buckley, M.R. (1987). Estimating trait, method and error variance: Generalizing across seventy construct validation studies. Journal of Marketing Research, 24, 315–318.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
DeVries, D.L., Morrison, A.M., Shullman, S.L., & Gerlach, M.L. (1986). Performance appraisal on the line. Greensboro, NC: Center for Creative Leadership.
Dickinson, T.L. (1993). Attitudes about performance appraisal. In H. Schuler, J.L. Farr & M. Smith (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 141–162). Hillsdale, NJ: Erlbaum.
DuBois, C.L., Sackett, P.R., Zedeck, S., & Fogli, L. (1993). Further exploration of typical and maximum performance criteria: Definitional issues, prediction, and white-black differences. Journal of Applied Psychology, 78, 205–211.
Farmer, E. (1933). The reliability of the criteria used for assessing the value of vocational tests. British Journal of Psychology, 24, 109–119.
Folger, R., Konovsky, M.A., & Cropanzano, R. (1992). A due process metaphor for performance appraisal. In B.M. Staw & L.L. Cummings (Eds.), Research in organizational behavior (Vol. 14, pp. 129–177). Greenwich, CT: JAI Press.
Freyd, M. (1926). What is applied psychology? Psychological Review, 33, 308–314.
George, J.M., & Brief, A.P. (1992). Feeling good–doing good: A conceptual analysis of the mood at work-organizational spontaneity relationship. Psychological Bulletin, 112, 310–329.
Goldberg, L.R. (1995). What the hell took so long? Donald Fiske and the big-five factor structure. In P.E. Shrout & S.T. Fiske (Eds.), Advances in personality research, methods, and theory: A festschrift honoring Donald W. Fiske. New York, NY: Erlbaum.
Greguras, G.J., & Robie, C. (1998). A new look at within-source interrater reliability of 360-degree feedback ratings. Journal of Applied Psychology, 83, 960–968.
Guion, R.M. (1976). Recruiting, selection, and job placement. In M.D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 777–828). Chicago: Rand McNally.
Guion, R.M. (1998). Assessment, measurement, and prediction for personnel selection. Mahwah, NJ: Lawrence Erlbaum.
Gunderson, E.K.E., & Ryman, D.H. (1971). Convergent and discriminant validities of performance evaluations in extremely isolated groups. Personnel Psychology, 24, 715–724.
Hackett, R.D., & Guion, R.M. (1985). A re-evaluation of the absenteeism-job satisfaction relationship. Organizational Behavior and Human Decision Processes, 35, 340–381.
Harris, M.M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings. Personnel Psychology, 41, 43–62.
Harris, M.M., Smith, D.E., & Champagne, D. (1995). A field study of performance appraisal purpose: Research- versus administrative-based ratings. Personnel Psychology, 48, 151–160.
Hedge, J.W., & Borman, W.C. (1995). Changing conceptions and practices in performance appraisal. In A. Howard (Ed.), The changing nature of work (pp. 451–481). San Francisco: Jossey-Bass.
Heneman, R.L. (1986). The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology, 39, 811–826.
Howard, A. (Ed.) (1996). The changing nature of work. San Francisco: Jossey-Bass.
Hoyt, W.T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5, 64–86.
Hoyt, W.T., & Kerns, M.D. (1999). Magnitude and moderators of bias in observer ratings: A meta-analysis. Psychological Methods, 4, 403–424.
Hunt, S.T. (1996). Generic work behavior: An investigation into the dimensions of entry-level, hourly job performance. Personnel Psychology, 49, 51–83.
Hunter, J.E. (1983). Test validation for 12,000 jobs: An application of job classification and validity generalization to the General Aptitude Test Battery (USES Test Research Report No. 45). Washington, DC: United States Department of Labor.
Ilgen, D.R., & Pulakos, E.D. (1999). The changing nature of performance: Implications for staffing, motivation, and development. San Francisco: Jossey-Bass.
Kane, J.S. (1986). Performance distribution assessment. In R.A. Berk (Ed.), Performance assessment (pp. 237–273). Baltimore: Johns Hopkins University Press.
Katz, D. (1964). The motivational basis of organizational behavior. Behavioral Science, 9, 131–146.
Kidwell, R.E., & Bennett, N. (1993). Employee propensity to withhold effort: A conceptual model to intersect three avenues of research. Academy of Management Review, 18, 429–456.
King, L.M., Hunter, J.E., & Schmidt, F.L. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, 65, 507–516.
Klimoski, R., & London, M. (1974). Role of the rater in performance appraisal. Journal of Applied Psychology, 59, 445–451.
Kraiger, K., & Ford, J.K. (1985). A meta-analysis of ratee race effects in performance ratings. Journal of Applied Psychology, 70, 56–65.
Lance, C.E., LaPointe, J.A., & Stewart, A.M. (1994). A test of the context dependency of three causal models of halo rater error. Journal of Applied Psychology, 79, 332–340.
Landy, F.J., & Farr, J.L. (1980). Performance rating. Psychological Bulletin, 87, 72–107.
Latham, G.P., Fay, C., & Saari, L.M. (1980). BOS, BES, and baloney: Raising Kane with Bernardin. Personnel Psychology, 33, 815–821.
Lissitz, R., & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60, 1–10.
Mabe, P.A. III, & West, S.G. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67, 280–296.
Mace, C.A. (1935). Incentives: Some experimental studies (Report 72). London: Industrial Health Research Board.
McCloy, R.A., Campbell, J.P., & Cudeck, R. (1994). A confirmatory test of a model of performance determinants. Journal of Applied Psychology, 79, 493–505.
McEvoy, G.M., & Cascio, W.F. (1987). Do good or poor performers leave? A meta-analysis of the relationship between performance and turnover. Academy of Management Journal, 30, 744–762.
Motowidlo, S.J., Borman, W.C., & Schmit, M.J. (1997). A theory of individual differences in task and contextual performance. Human Performance, 10, 71–83.
Mount, M.K. (1984). Psychometric properties of subordinate ratings of managerial performance. Personnel Psychology, 37, 687–702.
Mount, M.K., Judge, T.A., Scullen, S.E., Sytsma, M.R., & Hezlett, S.A. (1998). Trait, rater, and level effects in 360-degree performance ratings. Personnel Psychology, 51, 557–576.
Murphy, K.R. (1989). Dimensions of job performance. In R. Dillon & J. Pellegrino (Eds.), Testing: Applied and theoretical perspectives (pp. 218–247). New York: Praeger.
Murphy, K.R., & Cleveland, J.N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Murphy, K.R., & DeShon, R. (2000). Inter-rater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Ones, D.S., & Viswesvaran, C. (1996). A general theory of conscientiousness at work: Theoretical underpinnings and empirical findings. In J.M. Collins (Chair), Personality predictors of job performance: Controversial issues. Symposium conducted at the eleventh annual meeting of the Society for Industrial and Organizational Psychology, San Diego, CA, April.
Ones, D.S., Viswesvaran, C., & Schmidt, F.L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
Organ, D.W. (1988). Organizational citizenship behavior. Lexington, MA: D.C. Heath.
Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Erlbaum.
Peterson, R.A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381–391.
Prien, E.P., & Kult, M. (1968). Analysis of performance criteria and comparison of a priori and empirically-derived keys for a forced-choice scoring. Personnel Psychology, 21, 505–513.
Pulakos, E.D., White, L.A., Oppler, S.H., & Borman, W.C. (1989). Examination of race and sex effects on performance ratings. Journal of Applied Psychology, 74, 770–780.
Roach, D.E., & Wherry, R.J. (1970). Performance dimensions of multi-line insurance agents. Personnel Psychology, 23, 239–250.
Ronan, W.W. (1963). A factor analysis of eleven job performance measures. Personnel Psychology, 16, 255–267.
Ronan, W.W., & Prien, E. (1966). Toward a criterion theory: A review and analysis of research and opinion. Greensboro, NC: Smith Richardson Foundation.
Rothe, H. (1978). Output rates among industrial employees. Journal of Applied Psychology, 63, 40–46.
Rothstein, H.R. (1990). Interrater reliability of job performance ratings: Growth to asymptote level with increasing opportunity to observe. Journal of Applied Psychology, 75, 322–327.
Rush, C.H. (1953). A factorial study of sales criteria. Personnel Psychology, 6, 9–24.
Sackett, P.R., & DuBois, C.L. (1991). Rater-ratee race effects on performance evaluations: Challenging meta-analytic conclusions. Journal of Applied Psychology, 76, 873–877.
Sackett, P.R., & Wanek, J.E. (1996). New developments in the use of measures of honesty, integrity, conscientiousness, dependability, trustworthiness and reliability for personnel selection. Personnel Psychology, 49, 787–830.
Sackett, P.R., Zedeck, S., & Fogli, L. (1988). Relations between measures of typical and maximum job performance. Journal of Applied Psychology, 73, 482–486.
Schmidt, F.L. (1980). The measurement of job performance. Unpublished manuscript.
Schmidt, F.L., & Hunter, J.E. (1989). Interrater reliability coefficients cannot be computed when only one stimulus is rated. Journal of Applied Psychology, 74, 368–370.
Schmidt, F.L., & Hunter, J.E. (1992). Causal modeling of processes determining job performance. Current Directions in Psychological Science, 1, 89–92.
Schmidt, F.L., & Hunter, J.E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.
Schmidt, F.L., & Kaplan, L.B. (1971). Composite versus multiple criteria: A review and resolution of the controversy. Personnel Psychology, 24, 419–434.
Schmidt, F.L., Viswesvaran, C., & Ones, D.S. (2000). Reliability is not validity and validity is not reliability. Personnel Psychology, 53, 901–912.
Schmitt, N., & Stults, D.M. (1986). Methodology review: Analysis of multitrait–multimethod matrices. Applied Psychological Measurement, 10, 1–22.
Schwab, D.T., Heneman, H.G. III, & DeCotiis, T. (1975). Behaviorally anchored rating scales: A review of the literature. Personnel Psychology, 28, 549–562.
Seashore, S.E., Indik, B.P., & Georgopoulos, B.S. (1960). Relationships among criteria of job performance. Journal of Applied Psychology, 44, 195–202.
Smith, C.A., Organ, D.W., & Near, J.P. (1983). Organizational citizenship behavior: Its nature and antecedents. Journal of Applied Psychology, 68, 655–663.
Smith, P.C. (1976). Behavior, results, and organizational effectiveness: The problem of criteria. In M.D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 745–775). Chicago: Rand McNally.
Smith, P.C., & Kendall, L.M. (1963). Retranslation of expectations. Journal of Applied Psychology, 47, 149–155.
Solomonson, A.L., & Lance, C.E. (1997). Examination of the relationship between true halo and halo error in performance ratings. Journal of Applied Psychology, 82, 665–674.
Symonds, P. (1924). On the loss of reliability in ratings due to coarseness of the scale. Journal of Experimental Psychology, 7, 456–461.
Taylor, M.S., Tracey, K.B., Renard, M.K., Harrison, J.K., & Carroll, S.J. (1995). Due process in performance appraisal: A quasi-experiment in procedural justice. Administrative Science Quarterly, 40, 495–523.
Thorndike, E.L. (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4, 25–29.
Thorndike, R.L. (1949). Personnel selection: Test and measurement techniques. New York: Wiley.
Toops, H.A. (1944). The criterion. Educational and Psychological Measurement, 4, 271–297.
Van Dyne, L., Cummings, L.L., & Parks, J.M. (1995). Extra-role behaviors: In pursuit of construct and definitional clarity (a bridge over muddied waters). In L.L. Cummings & B.M. Staw (Eds.), Research in organizational behavior (Vol. 17, pp. 215–285). Greenwich, CT: JAI Press.
Van Scotter, J.R., & Motowidlo, S.J. (1996). Interpersonal facilitation and job dedication as separate facets of contextual performance. Journal of Applied Psychology, 81, 525–531.
Villanova, P. (1992). A customer-based model for developing job performance criteria. Human Resource Management Review, 2, 103–114.
Viswesvaran, C. (1993). Modeling job performance: Is there a general factor? Unpublished doctoral dissertation, University of Iowa, Iowa City, IA.
Viswesvaran, C., & Ones, D.S. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865–885.
Viswesvaran, C., & Ones, D.S. (2000). Perspectives on models of job performance. International Journal of Selection and Assessment, 8, 216–226.
Viswesvaran, C., Ones, D.S., & Schmidt, F.L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.
Viswesvaran, C., Schmidt, F.L., & Ones, D.S. (1994). Examining the validity of supervisory ratings of job performance using linear composites. Paper presented in F.L. Schmidt (Chair), The construct of job performance. Symposium conducted at the ninth annual meeting of the Society for Industrial and Organizational Psychology, Nashville, Tennessee, April.
Viswesvaran, C., Schmidt, F.L., & Ones, D.S. (2000). The moderating influence of job performance dimensions on convergence of supervisory and peer ratings of job performance: Unconfounding construct-level convergence and rating difficulty. Unpublished manuscript.
Viteles, M.S. (1932). Industrial psychology. New York: Norton.
Wallace, S.R. (1965). Criteria for what? American Psychologist, 20, 411–417.
Welbourne, T.M., Johnson, D.E., & Erez, A. (1998). The role-based performance scale: Validity analysis of a theory-based measure. Academy of Management Journal, 41, 540–555.
Wherry, R.J. (1957). The past and future of criterion evaluation. Personnel Psychology, 10, 1–5.
Wherry, R.J., & Bartlett, C.J. (1982). The control of bias in ratings: A theory of rating. Personnel Psychology, 35, 521–551.
Whisler, T.L., & Harper, S.F. (Eds.) (1962). Performance appraisal: Research and practice. New York: Holt.