Psychologists: Lie Scales Validity
Abstract
An assumed feature of lie scales is that partialling out their influence achieves truer scores
on other scales, which should increase reliability when one of the measurements is taken in
a socially sensitive context. This was tested in several samples. Controlling for faking
responding increases the reliability of other scales, that lie scales capture aspects of
personality that are not limited to self-report situations, and measure common method
variance. Controlling for social desirability does to some degree have the desired result,
Key words: social desirability, lie scale, common method variance, response bias
ON THE VALIDITY OF LIE SCALES
Introduction
Within psychology, the use of self-reported data is very common, and for several
sub-areas, it is totally dominating. This state of the art is somewhat peculiar, as self-reports
have often been shown to be very unreliable (see the review in af Wåhlberg, 2009), and
when no other source of data is used, the risk of common method variance always exists
(Podsakoff & Organ, 1986; Podsakoff, Mackenzie, Lee & Podsakoff, 2003), meaning that
The main source of bias (or at least the most discussed and researched factor)
would seem to be socially desirable responding (e.g. Furnham, 1986; Ashley & Holtgraves,
Schill & Leung, 1977). This mechanism is usually thought to be individual differences in
socially acceptable. Whether this acceptability refers to the wider social norm within the
society, or that of some sub-group to which the respondent belongs, does not seem to be known,
To measure the socially desirable response set, lie scales were invented (e.g.
Crowne & Marlowe, 1960). These scales work according to the logic that only people who
are high on social deception would endorse very improbable and trivial statements such as
‘I have never stolen anything, not even a hairpin’. This basic logic has become the standard
for lie scales and a fair number of different scales exist, such as the Unlikely Virtues scale
(Ellingson, Smith & Sackett, 2001) and the Balanced Inventory of Desirable Responding
(Paulhus, 1984). The actual use for these lie scales besides pure research would seem to be
to correct the values of other scales (often personality), for example when hiring personnel
The use of lie scales would seem to imply that a sound suspicion of self-reported
data exists. However, when the issue of validity of lie scales is studied, it is found that
most researchers take this feature for granted (e.g. Ramanaiah, Schill & Leung, 1977;
Kline, Sulsky & Rever-Moriyama, 2000; Fisher & Katz, 2000; see the review by Uziel,
2010).
Some researchers, however, claim that social desirability is not an effect elicited by
the questions posed, but a personality trait, i.e. a tendency to behave in a certain way which
pervades all facets of life. This 'style' (response tendency) versus 'substance' (personality
trait) discussion has been going on for decades (McCrae & Costa, 1983; Uziel, 2010). The
evidence seems to favor the substance position (Uziel, 2010), but lie scales are, despite this,
It was pointed out by Ellingson, Sackett and Hough (1999) that a habit of accepting
lie scale corrected personality scores as valid seems to exist, without any evidence for this
assumption being presented. These authors suggested a test where the lie scale effect was
partialled out from dishonest responses and compared to honest responses, seeing if this
correction made the variables more similar (i.e. if a stronger correlation resulted). This was
not found in their study. However, the method used by Ellingson et al. was the standard
faking method (respondents were instructed to fake), meaning that the error might lie with
However, testing the validity of lie scales and/or the correction method, under
ecologically valid circumstances, is not easy, which might explain why this specific test
does not seem to have been carried out (see the review by Uziel, 2010).
induced social desirability was used to test whether correcting for lie scale values increased
scales.
Method
General
Data was gathered in three different projects where the effects of online driver
education for driving offenders were evaluated (af Wåhlberg, 2010a; 2011). In many police
districts in Britain, drivers who have committed an endorsable offence can choose to take,
and pay for, an educational course instead of paying a fine and possibly receiving penalty
points on their licenses. Usually, these courses are of the workshop model, but in 2008, the
The presently used data has previously been analysed in several other investigations
Data was available from three offender groups (young drivers, seatbelt and red light
offenders), as well as a random control sample. The young drivers (YDS) and the seatbelt
offenders (SS) were from Thames Valley, while the red light running scheme (RLS)
The drivers who had elected to take the course were directed to a homepage where
they were requested to respond to a questionnaire before they could take the first
educational module. For the YDS and RLS, the course takers were again directed to a
questionnaire when they had finished the course, and six months later they were invited by
e-mail to respond to a third wave. For the SS, there were only two waves, three months
apart, with an e-mail prompt for the second one. The control group was recruited by the
use of an e-mail scheme, where mail was sent to lists of (UK) addresses bought from a
marketing company. A sweepstake with GPS gadgets as prizes was used as incentive. The
responders to this request received another e-mail six months later, requesting them to
respond again.
Questionnaires
scales from well known inventories, as summarized in Table 1. The control group
The violations scale from the Manchester Driver Behaviour Questionnaire (DBQ-
V; for a review, see af Wåhlberg, Dorn & Kline, 2011) is intended to canvass intentional
dangerous driving acts, like speeding and overtaking under uncertain conditions, and is
widely used within traffic safety research. From the same inventory was included the lapse
scale (DBQ-L), which targets vehicle handling mistakes, like shifting into the wrong gear.
The (Brief) Driving Anger Scale (DAS; Deffenbacher, Oetting & Lynch, 1994;
Deffenbacher, Richards, Filetti & Lynch, 2005) is supposed to measure the frequency of
anger experienced due to common driving events, like being held up by someone blocking
the road. The Aggression scale of the Driver Behaviour Inventory (DBI-A; Gulian,
Glendon, Matthews, Davies & Debney, 1988; for a review, see af Wåhlberg, unpublished),
is somewhat similar to the DAS, but describes the emotions experienced more in terms of
stress and irritation. None of these scales has been validated against objective data.
Two scales included that were not driving-specific were the (Short) Sensation
Seeking Scale (SSS; Slater, 2003), and a Big Five Conscientiousness scale (Gosling,
Rentfrow & Swann, 2003). These are supposed to measure dimensions of personality.
Finally, two lie scales were used: the Driver Impression Management (DIM) scale
of the Driver Social Desirability Scale (Lajunen, Corry, Summala & Hartley, 1997) and the
Marlowe-Crowne scale (M-C; Hays, Hayashi & Stewart, 1989). The DIM scale has been
found to correlate negatively with self-reported collisions, and very weakly positively with
recorded crashes (af Wåhlberg, Dorn & Kline, 2010), as could be expected if it was valid
(and self-reports of collisions were influenced by social desirability). As for the Marlowe-
Crowne scale, no information regarding its criterion validity was found. Both scales are
intended to measure the impression management part of social desirability, i.e. conscious
faking.
Given that lie scales measure individual differences in response tendency, it could
be expected that controlling for faking in the self-reports (by the use of a lie scale) would
increase the correlations between scales in different waves for the YDS, RLS and SS
samples, but not the Control sample, due to the socially sensitive situation of the first wave
of the education samples. In other words, it was assumed that the first wave measurements
for YDS, RLS and SS were distorted by social desirability response bias, due to the
situation, and that the lie scales included could capture this distortion. For the control
sample, no such effect was expected, as there was no difference between waves regarding
their situation. That the offending drivers were indeed in a situation where they were prone
to socially desirable responding in the first wave had been established in the evaluation
Therefore, each scale from the first wave was correlated with itself measured in
later waves. Thereafter, these correlations were re-run, controlling for the lie scale values
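The procedure described above (correlating each scale with itself across waves, then re-running the correlation with a lie scale held constant) can be sketched as follows. This is a minimal illustration in Python, assuming a single control variable and the standard first-order partial correlation formula; the function names and the toy scores are hypothetical and are not taken from the study's data or analysis software.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x, y, z):
    """Partial correlation computed directly from raw scores."""
    return partial_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Hypothetical toy scores (illustration only, not study data):
# one scale measured in two waves, plus a wave 1 lie scale.
wave1 = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 2.2, 3.1]
wave2 = [2.4, 3.1, 2.0, 3.8, 3.2, 3.5, 2.0, 3.3]
lie1 = [4.2, 2.9, 4.5, 2.1, 3.3, 2.6, 4.0, 3.0]

print("test-retest r:", round(pearson(wave1, wave2), 3))
print("partial r (lie scale held constant):", round(partial_corr(wave1, wave2, lie1), 3))
```

With more than one control variable (for example, lie scales from both waves), the same formula could be applied recursively, or both measurements could be residualized on the controls by regression before being correlated.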
Results
First, mean values for each scale and the differences for these between waves were
calculated. It can be seen in Table 3 that strong differences were found for most of the
scales, especially for the offender groups, as reported previously (af Wåhlberg, 2010), with
responses growing worse (less socially acceptable) with time. This has been interpreted as
an effect of situationally induced social desirability, where the offending drivers were
strongly motivated to lie before the course (in wave 1), but less so after the course (waves
2 and 3), as indicated by the lie scales. Such a conclusion was possible to draw because
these changes were not limited to the driving-specific scales, but strong effects were also
noted for the personality scales (Sensation Seeking, Big Five Conscientiousness). As
personality usually does not change much over time, this is probably due to changes in a
Thereafter, the lie scales were correlated with the other scales within each sample
and wave (Table 4). Notable is that all these correlations were at least moderate in size,
even the driving lapses scale, which is about involuntary but non-dangerous behavior.
Table 5 shows the correlations between waves for all scales used, and the
partial correlations between waves, controlling for lie scales (from both waves). It can be
noted that in every single case, applying a social desirability control had the effect of
All in all, but excluding the lie scales, the average test-retest correlation was .599,
while the partial correlation was .540, a reduction of 18.7 percent in the amount of explained
variance (6.7 percentage points). Comparing the Control sample with the others, it was found
that the former had an average correlation of .674, partial r of .629, and thus a reduction in
explained variance of 12.7 percent. The education samples had values of .578 and .514, a
20.8 percent reduction. The effect was thus substantially larger in the samples where the
respondents were under social pressure, despite the fact that the control sample correlation
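The explained-variance figures in this paragraph follow from squaring the correlations. The arithmetic can be sketched in Python as below, using the rounded averages quoted in the text; the function name variance_reduction is hypothetical. Because the inputs are rounded to three decimals, the sub-sample percentages it prints may differ in the last digit from those reported.

```python
def variance_reduction(r_raw, r_partial):
    """Return (percentage points, percent) reduction in explained variance.

    Explained variance is taken as the squared correlation.
    """
    raw_var = r_raw ** 2
    partial_var = r_partial ** 2
    points = (raw_var - partial_var) * 100            # absolute drop, in percentage points
    percent = (raw_var - partial_var) / raw_var * 100  # drop relative to the raw value
    return points, percent


# Average raw and partial correlations as quoted in the text.
averages = [
    ("All samples", 0.599, 0.540),
    ("Control", 0.674, 0.629),
    ("Education samples", 0.578, 0.514),
]

for label, r_raw, r_partial in averages:
    points, percent = variance_reduction(r_raw, r_partial)
    print(f"{label}: {points:.1f} points, {percent:.1f} percent")
```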
If only one lie scale wave was controlled for, results were very similar to those
described for controls in each wave. In each case, the first wave accounted for at least half
the difference between raw and partial correlations, and in many cases almost all of it.
Also, the correlations between scales within and between waves were similarly reduced
Discussion
between-measurements correlations when corrected for lie scale scores. On the contrary,
all correlations shrank, some of them quite drastically. This effect was also present in the
control sample, where no situationally induced social desirability had been detected,
although not as strongly as in the educational samples. The results can therefore be
regarded as very trustworthy, and the explanation for the unexpected results must be
It should first be asked, what does a so-called lie scale measure? As no ecologically
valid test of such a scale seems to exist, this is a very relevant question. This could also be
phrased as ‘if such scales do not measure a tendency of socially desirable responding, what
do they measure?’.
In agreement with many previous authors (e.g. McCrae & Costa, 1983; Uziel, 2010),
it is concluded from the present data that what is measured by lie scales is mainly a
substantive trait, at least in terms of it being stable over time and situations. This would
seem to be a personality trait which is always present, in all kinds of social situations. Lie
Schmitt, Oswald, Kim, Gillespie, Ramsay & Yoo, 2003; Biderman, Nguyen, Mullins &
Luna, 2008; see the review by Uziel, 2010), especially behavior in socially sensitive
measuring the same variable) are predicted by self-reports, it would seem to be the rule that
the association is always strongest between the reports (Hessing, Elffers & Weigel, 1988;
Armitage & Conner, 2001; Elliott, Armitage & Baughan, 2007; af Wåhlberg, Dorn &
As several very (theoretically) different scales were used in the present study, and
these still tended to correlate with the lie scales, and each other, it would seem to be
evident that the lie scales do indeed capture some sort of response bias. This tendency was
also stable over time, despite the differences in situations between waves. But if the lie
scales are valid to some degree in that they capture common method variance, even if very
limited, the question remains: why does controlling for this influence not create stronger
The first answer to this question would seem to be that there are no conditions
when social desirability is not active, which Ellingson et al. apparently assumed when
suggesting the test run in the present study. Whenever a questionnaire is presented,
(individual differences in) social desirability seems to influence the responses. Therefore,
instead of adding error, social desirability adds systematic variance, which creates stability
between measurements. Similar arguments can be made for other response bias
If social desirability would be able to insert error variance in such a way as to distort test-
retest reliability values, this presumes that there is a fairly stable behavior (or attitude)
about which the respondents are able to report correctly. Given the present results, it can
instead be claimed that the correlations found between questionnaire waves are to some
degree due to a stable response tendency, and not to what is purportedly measured.
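The argument in the preceding paragraphs can be illustrated with a small simulation. The sketch below is a hypothetical model, not a re-analysis of the present data: each score is assumed to be the sum of a stable trait, a response bias and random noise, and the wave 1 lie scale is assumed to be a noisy measure of the bias. All variances, the sample size and the function names are arbitrary assumptions made for illustration. Under these assumptions, a bias present only in wave 1 behaves as error, so partialling it out raises the test-retest correlation, whereas a bias that is stable across waves adds shared variance, so partialling it out lowers the correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(stable_bias, n=20000, seed=1):
    """Return (raw, partial) test-retest correlations for simulated respondents.

    stable_bias=True:  the same response bias affects both waves.
    stable_bias=False: the bias affects wave 1 only (purely situational).
    """
    rng = random.Random(seed)
    wave1, wave2, lie1 = [], [], []
    for _ in range(n):
        trait = rng.gauss(0, 1)                   # what the scale purports to measure
        bias1 = rng.gauss(0, 1)                   # response bias in wave 1
        bias2 = bias1 if stable_bias else 0.0     # response bias in wave 2
        wave1.append(trait + bias1 + rng.gauss(0, 1))
        wave2.append(trait + bias2 + rng.gauss(0, 1))
        lie1.append(bias1 + rng.gauss(0, 0.5))    # wave 1 lie scale, a noisy bias measure
    raw = pearson(wave1, wave2)
    partial = partial_r(raw, pearson(wave1, lie1), pearson(wave2, lie1))
    return raw, partial


for label, stable in [("situational bias", False), ("stable bias", True)]:
    raw, partial = simulate(stable_bias=stable)
    print(f"{label}: raw r = {raw:.3f}, partial r = {partial:.3f}")
```

The second scenario corresponds to the pattern reported here, where every partial correlation was lower than its raw counterpart.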
Given the present results, it can be stated that lie scales are not very different from
many other questionnaire scales, although they are more extremely worded. As other
scales, they seem to pick up several different response biases, like scale use. This, of
course, should be no surprise. Why should lie scales be free from the types of biases that
have been noted for other scales? However, such a conclusion is usually not drawn by
researchers discussing what lie scales measure (e.g. McCrae & Costa, 1983; Smith &
Ellingson, 2002; Uziel, 2010), and this is where the present study departs from previous
One interesting feature of the lie scale for driving that was used can be pointed out: it
had been tested as a predictor of self-reported and recorded accidents, and while the former
were negatively correlated with the scale, the latter were unrelated, or possibly slightly
positively associated (af Wåhlberg, Dorn & Kline, 2010). In this sense, the scale was
therefore validated, and functioned as intended. However, again, this does not mean that it
Concluding that lie scales do not measure response style should not be interpreted
as evidence that social desirability bias does not exist in self-reports, only that lie scales are
not a good method for detection of this. Does this mean that correcting for social
desirability is wrong, as claimed by Uziel and others? It would seem to depend upon what
the goal is. If the goal is to cleanse a measurement of bias induced by the measurement
format, it could be used in this way, unless there is something in the social dimension
which is of interest. However, as stated, any kind of scale can probably be used for this
goal.
In the present study, there was a fair number of non-responders (as can be seen in
the difference between calculations in YDS and RLS). However, this is not a problem, as
the issue here is what happens when a questionnaire is used. Whether results would
have been different if non-responders had been included is beside the point, as this is how
test of response bias in the presently used data sets, which have the peculiar feature of
100% response rate to the first wave, indicate that differences between those who
responded to later waves and those who did not were small (paper in preparation).
One limitation of the present study was the on-line format used, which could have
such effects seem to exist (see the meta-analysis by Richman, Kiesler, Weisband & Drasgow,
1999). Similarly, using the web as an administration tool seems to have no effects (Hancock &
Flowers, 2001).
Yet another limitation concerns the populations sampled. Three out of four samples
were driving offenders. However, the bulk of these offences were rather ordinary
behaviors, like moderate speeding and not using a seatbelt, which do not make these
drivers very different from the majority. Furthermore, the same kind of effect was
Finally, it can be observed that even though many researchers seem to realize the
problematic nature of self-reports and common method variance (Holden, 2008; Chang,
van Witteloostuijn & Eden, 2010), some research areas are still dominated by self-report
studies (af Wåhlberg, 2009). This state of the art is not acceptable, and lie scales are not the
References
Armitage, C. J., & Conner, M. (2001). Efficacy of theory of planned behaviour: A meta-
Ashley, A., & Holtgraves, T. (2003). Repressors and memory: Effects of self-deception,
296.
Biderman, M. D., Nguyen, N. T., Mullins, B., & Luna, J. (2008). A method factor
Chang, S.-J., van Witteloostuijn, A., & Eden, L. (2010). From the Editors: Common
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of
Deffenbacher, J. L., Oetting, E. R., & Lynch, R. S. (1994). Development of a driving anger
Deffenbacher, J. L., Richards, T. L., Filetti, L. B., & Lynch, R. S. (2005). Angry drivers: A
Ellingson, J. E., Sackett, P. R., & Hough, L. M. (1999). Social desirability corrections in
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social
133.
Elliott, M. A., Armitage, C. J., & Baughan, C. J. (2007). Using the theory of planned
Fisher, R. J., & Katz, J. E. (2000). Social desirability bias and the validity of self-reported
Furnham, A. (1986). Response bias, social desirability and dissimulation. Personality and
Gosling, S. D., Rentfrow, P. J., & Swann, W. B. (2003). A very brief measure of the Big
Gulian, E., Glendon, A. I., Matthews, G., Davies, D. R., & Debney, L. M. (1988).
de Bruin (Eds.) Road User Behaviour: Theory and Research (pp. 342-347).
Hancock, D. R., & Flowers, C. (2001). Comparing social desirability responding on World
Hays, R. D., Hayashi, T., & Stewart, A. L. (1989). A five-item measure of socially
Hessing, D. J., Elffers, H., & Weigel, R. H. (1988). Exploring the limits of self-reports and
Kline, T. J., Sulsky, L. M., & Rever-Moriyama, S. D. (2000). Common method variance
Lajunen, T., Corry, A., Summala, H., & Hartley, L. (1997). Impression management and
McCrae, R. R., & Costa, P. T. (1983). Social desirability scales: More substance than style.
Podsakoff, P. M., Mackenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common
Ramanaiah, N. V., Schill, T., & Leung, L. S. (1977). A test of the hypothesis about the
Richman, W. L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study
Schmitt, N., Oswald, F. L., Kim, B. H., Gillespie, M. A., Ramsay, L. J., & Yoo, T.-Y.
of adolescent use of violent film, computer and website content. Journal of Community,
53, 105–121.
Smith, D. B., & Ellingson, J. E. (2002). Substance versus style: A new look at social
243-262.
www.psyk.uu.se/hemsidor/busdriver.
af Wåhlberg, A. E., Dorn, L., & Kline, T. (2010). The effect of social desirability on self
reported and recorded road traffic accidents. Transportation Research Part F, 13,
106-114.
af Wåhlberg, A. E., Dorn, L., & Kline, T. (2011). The Manchester Driver Behaviour
Table 1
scale), DBQ-L=Driver Behaviour Questionnaire (lapse scale), Big Five-C= Big Five
Table 2
Table 3
Table 4
Table 5
                                 First versus second wave            Second versus third wave
Sample   Scale                   N     Correlation  Partial          N     Correlation  Partial
                                                    correlation                         correlation
YDS      DAS                     8378  .493         .429             1189  .678         .591
YDS      SSS                     8378  .521         .478             1189  .715         .679
YDS      DBI-A                   8378  .461         .369             1189  .644         .542
YDS      DBQ-V                   8378  .464         .340             1189  .634         .507
YDS      DIM (1st/3rd waves)     -     -            -                1189  .568         -
Control  DAS                     234   .662         .649             -     -            -
Control  SSS                     234   .580         .545             -     -            -
Control  DBI-A                   234   .748         .724             -     -            -
Control  DBQ-V                   234   .694         .585             -     -            -
Control  DIM                     234   .733         -                -     -            -
RLS      Big Five-C              4036  .589         .506             342   .584         .548
RLS      DBQ-L                   4036  .694         .660             342   .588         .544
RLS      DIM                     4036  .767         -                342   .705         -
RLS      M-C                     4036  .705         -                342   .627         -
SS       Big Five-C              505   .383         .355             -     -            -
SS       SSS                     505   .604         .551             -     -            -
SS       DBQ-L                   505   .503         .472             -     -            -
SS       DIM                     505   .602         -                -     -            -
Table captions
Table 1: Overview of the samples and the characteristics of each data gathering.
Table 2: Descriptive data for the samples used, for those drivers who responded to two
waves of each questionnaire. Shown are the percentages of males and mean /standard
deviation of age.
Table 3: Descriptive statistics for the scales used, for all samples, for waves 1 and 2, for
those who responded to both. Five-step scales were used for all items, mean values and
standard deviations over items presented. Dependent t-tests and Cohen’s d calculated
between waves (wave 1 std as denominator). Also presented are the number of items in
each scale and the Cronbach alphas for the scales in each wave.
Table 4: The Pearson correlations between the lie scales and other scales, within each
wave.
Table 5: The Pearson correlations between the scales in waves 1 and 2, raw and with the
first wave DIM lie scale held constant. In the YDS, the DIM scale was distributed in the
first and third waves, in the other samples in all waves. In the RLS, both lie scales were
held constant.