Gigerenzer (2018) The Bias Bias in Behavioral Economics
ABSTRACT
Behavioral economics began with the intention of eliminating the
psychological blind spot in rational choice theory and ended up
portraying psychology as the study of irrationality. In its portrayal,
people have systematic cognitive biases that are not only as persis-
tent as visual illusions but also costly in real life—meaning that
governmental paternalism is called upon to steer people with the
help of “nudges.” These biases have since attained the status of
truisms. In contrast, I show that such a view of human nature is
tainted by a “bias bias,” the tendency to spot biases even when there
are none. This may occur by failing to notice when small sample
statistics differ from large sample statistics, mistaking people’s ran-
dom error for systematic error, or confusing intelligent inferences
with logical errors. Unknown to most economists, much of psycho-
logical research reveals a different portrayal, where people appear
to have largely fine-tuned intuitions about chance, frequency, and
framing. A systematic review of the literature shows little evidence
that the alleged biases are potentially costly in terms of less health,
wealth, or happiness. Getting rid of the bias bias is a precondition
for psychology to play a positive role in economics.
than in the theory. Subsequently, an attempt was made to explain the devia-
tions by adding free parameters to expected utility theories (Camerer et al.,
2003). In this vision, Homo economicus remained the unquestioned ideal, while
Homer Simpson was teasingly proposed as a more apt description of Homo
sapiens, given his bouts of ignorance, laziness, and brilliant incompetence
(Thaler and Sunstein, 2008).
The field would have progressed in an entirely different direction had it
followed Herbert Simon’s original vision of behavioral economics, in which
psychology comprised more than mental quirks and theory more than expected
utility maximization. Simon (1979, 1989) urged economists to move away from
as-if expected utility models and study how people actually make decisions
in realistic situations of uncertainty as opposed to under risk. I use the term
uncertainty for situations where the exhaustive and mutually exclusive set
of future states and their consequences is not known or knowable. In these
frequent real-world situations where the assumptions of rational choice theory
do not hold, individuals and institutions can nevertheless make smart decisions
using psychological tools such as fast-and-frugal heuristics (Gigerenzer and
Selten, 2001). Frank Knight, John Maynard Keynes, and Vernon Smith
similarly distinguished between situations where rational choice theory applies
and those where its assumptions are violated, and likewise saw the need for a
complementary framework for decision making under uncertainty.
Whereas Simon aimed at developing an alternative to neoclassical eco-
nomics, the line of behavioral economics shaped by the heuristics-and-biases
program largely limited itself to studying deviations from the neoclassical
paradigm, or what it took this paradigm to be. Experimenters aimed at
demonstrating “anomalies” and “biases” in human behavior. These biases have
attained the status of truisms and provided justification for new paternalistic
policies, popularly known as nudging, adopted by governments in the UK,
US, and elsewhere. The argument leading from cognitive biases to govern-
mental paternalism—in short, the irrationality argument—consists of three
assumptions and one conclusion:
But why would researchers exhibit a bias bias? One motive is an academic
agenda to question the reasonableness of people. Unlike in Simon’s vision, be-
havioral economics à la Kahneman and Thaler defined its agenda as uncovering
deviations from rational choice theory; without these deviations it would lose
its raison d’être. A second possible motive is a commercial agenda to discredit
the judgment of jurors, as the Exxon Valdez oil spill in Alaska illustrates. In
1994, an Alaskan federal jury awarded $5.3 billion to fishermen and others
whose livelihoods had been devastated by the spill. Exxon quietly funded a new
line of research using studies with mock juries that questioned jurors’ cognitive
abilities. Based on the results, Exxon argued in an appeal “that jurors are
generally incapable of performing the tasks the law assigns to them in punitive
damage cases” (Zarembo, 2003). This argument served Exxon well in court. A
third possible motive—or more likely, unintended consequence—is to promote
trust in abstract logic, algorithms, and predictive analytics and distrust in
human intuition and expertise. For instance, the COMPAS algorithm has been
used in U.S. courts to predict the probability of recidivism for over one million
defendants, influencing pretrial, parole, and sentencing decisions. Apparently,
the algorithm remained unquestioned for almost two decades until a study
disclosed that COMPAS predicted no better than ordinary people without
any experience with recidivism and had a racial bias to boot (Dressel and
Farid, 2018). In the book jacket copy of his biography of Kahneman and
Tversky, Lewis (2017) states that they “are more responsible than anybody for
the powerful trend to mistrust human intuition and defer to algorithms.”
Visual Illusions
By way of suggestion, articles and books introduce biases together with images
of visual illusions, implying that biases (often called “cognitive illusions”) are
equally stable and inevitable. If our cognitive system makes blunders as big as
those of our visual system, what can one expect from everyday and business
decisions? Yet this analogy is misleading in two respects.
First, visual illusions are not a sign of irrationality, but a byproduct of
an intelligent brain that makes “unconscious inferences”—a term coined by
Hermann von Helmholtz—from two-dimensional retinal images to a three-
dimensional world. In Kandel’s (2012) words, “In fact, if the brain relied solely
on the information it received from the eyes, vision would be impossible” (p.
203). Consider Roger Shepard’s fascinating two-tables illusion, which Thaler
and Sunstein (2008) present as “the key insight that behavioral economists
have borrowed from psychologists,” namely that our judgment is “biased, and
predictably so” (p. 20). Yet that is not the insight drawn by Shepard and
other psychologists who study visual illusions; they consider our visual system
a marvel of unconscious intelligence that manages to infer three-dimensional
objects (here, tables) from a two-dimensional retinal picture. Because any two-
dimensional projection is consistent with infinitely different three-dimensional
realities, it is remarkable how seldom we err. In Shepard’s (1990) words, “to
fool a visual system that has a full binocular and freely mobile view of a well-
illuminated scene is next to impossible” (p. 122). Thus, in psychology, the visual
system is seen more as a genius than a fool in making intelligent inferences, and
inferences, after all, are necessary for making sense of the images on the retina.
Second, the analogy with visual illusions suggests that people cannot learn,
specifically that education in statistical reasoning is of little efficacy (Bond,
2009). This is incorrect, as can be shown with one of the most important items
on Conlisk’s list: making errors in updating probabilities.
Consider the claim that people fail to make forecasts that are consistent with
Bayes’ rule: the human mind “is not Bayesian at all” (Kahneman and Tversky,
1972, p. 450; see also Kahneman, 2011, pp. 146–155). This claim is correct if
people have no statistical education and are given the relevant information
in conditional probabilities. But another picture emerges if people receive a
short training and learn to use proper representations. To illustrate, please
attempt to solve this problem without the help of a calculator.
Conditional probabilities:
A disease has a base rate of .1, and a test is performed that has a
hit rate of .9 (the conditional probability of a positive test given the
disease) and a false-positive rate of .1 (the conditional probability of a
positive test given no disease). What is the probability that a person
who tests positive actually has the disease?
There is reliable experimental evidence that most people cannot find the
Bayesian posterior probability, which would be (.1×.9)/[(.1×.9)+(.9×.1)] = .5.
However, when people can learn from experience instead, research reports
that people are approximate Bayesians (e.g., Edwards, 1968). On the basis
of this result, one can present the information as the tally of experienced
frequencies, known as natural frequencies (because experienced frequencies are
not normalized like conditional probabilities or relative frequencies):
Natural frequencies:
Think of 100 people. Ten are expected to have the disease, and of these,
9 test positive. Of the 90 people without the disease, 9 also test positive.
How many of those who test positive have the disease? The answer can now
be read off directly: 9 of the 18 people who test positive, that is, .5.
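The arithmetic behind the two representations can be made explicit in a few lines. The following minimal Python sketch (the reference population of 100 is an arbitrary choice for illustration) translates the conditional probabilities into natural frequencies and reads off the posterior:

```python
base_rate, hit_rate, false_alarm_rate = 0.1, 0.9, 0.1
population = 100                                     # arbitrary reference population

diseased = base_rate * population                    # 10 people have the disease
true_positives = hit_rate * diseased                 # 9 of them test positive
false_positives = false_alarm_rate * (population - diseased)   # 9 of the 90 healthy also test positive

# The posterior is the share of true positives among all positive tests,
# the same value that Bayes' rule delivers from the conditional probabilities.
print(true_positives / (true_positives + false_positives))      # 0.5
```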
The argument that biases are costly is essential for justifying governmental
paternalism. The irrationality argument provides a convenient rhetoric to
attribute problems caused by flawed incentives and system failures to flaws
inside people’s minds, detracting from political and industrial causes (e.g.,
Conley, 2013; Thaler, 2015; Thaler and Sunstein, 2008). Nicotine addiction
and obesity have been attributed to people’s myopia and probability-blindness,
not to the actions of the food and tobacco industry. Similarly, an article by
Deutsche Bank Research, “Homo economicus – or more like Homer Simpson?”,
attributed the financial crisis to a list of 17 cognitive biases rather than to the
reckless practices and excessive fragility of banks and the financial system
(Schneider, 2010).
The claim that biases incur substantial costs is often based on anecdotal
evidence. Surely there must be hard evidence for this important hypothesis?
Arkes et al. (2016) searched for such evidence through articles that demon-
strated biases. In over 100 studies on violations of transitivity, the search found
not a single one demonstrating that a person could become a money pump,
that is, be continually exploited due to intransitive choices. In more than 1,000
studies that identified preference reversals, arbitrage or financial feedback made
preference reversals and their costs largely disappear. Similarly, in hundreds of
studies on the Asian Disease Problem and other framing effects, little evidence
was found that “irrational” attention to framing would be costly. All in all,
little to no evidence was found that violations of these and other logical rules
are either associated or causally connected with less income, poorer health,
lower happiness, inaccurate beliefs, shorter lives, or any other measurable costs.
Lack of evidence for costs does not mean evidence for lack of costs; however,
this striking absence suggests that the ever-growing list of apparent fallacies
includes numerous “logical bogeymen,” as psychologist Lola Lopes once put it.
There are two ways to understand this negative result. One reason is the
bias bias, that is, many of the alleged biases are not biases in the first place.
The second reason was mentioned above: The biases are typically defined
as deviations of people’s judgments from a logical or statistical rule in some
laboratory setting, and therefore may have little to do with actual health,
wealth, or happiness.
The previous analysis showed that the three assumptions in the argument
leading from biases to governmental paternalism are based on questionable
evidence. Moreover, systematic departures from rational choice may either
increase or decrease policy makers’ rationale for paternalism, depending on
the specific assumptions made; specifically, the aggregate consequences of
There is no room here to cover all items on Conlisk’s list, not to speak of the
175 or so biases listed on Wikipedia. Instead I will focus on intuitions about
randomness, frequency, and framing, which play a role in several alleged biases.
Specifically, I will deal with three common sources of the bias bias:
1. Small sample statistics are mistaken for people’s biases. The statistics of
small samples can systematically differ from the population parameters.
If researchers ignore this difference, they will mistake correct judgments
about small samples for systematic biases.
2. People’s unsystematic (random) error is mistaken for systematic error.
3. People’s intelligent inferences are mistaken for logical errors.
Point 1 is relevant for the claim that people “mistake random data for
patterned data and vice versa,” Point 2 for the claim that
people “display overconfidence,” and Point 3 for the claim that people “use
irrelevant information” and “give answers that are highly sensitive to logically
irrelevant changes in questions.” In each case, the problem lies in the norm,
against which a judgment is evaluated as a bias. I hope that these analyses
encourage readers to look more critically at other biases.
A large body of psychological research has concluded that people have good
intuitions about chance. For instance, children are reported to have a fairly
solid understanding of random processes, including the law of large numbers
(Piaget and Inhelder, 1975) and the production of random sequences (Kareev,
1992), and adults are reported to be good intuitive statisticians (Farmer et al.,
2017; Peterson and Beach, 1967).
In contrast to these findings, one of the most fundamental cognitive biases
reported in the heuristics-and-biases program is that intuitions about chance
depart systematically from the laws of chance. Consider experiments on the
perception of randomness in which a fair coin is thrown many times, and
the outcomes of heads (H) and tails (T) are recorded. Two key experimental
findings are:
1. Law of small numbers: People think a string is more likely the closer
the number of heads and tails corresponds to the underlying equal
probabilities. For instance, the string HHHHHT is deemed more likely
than HHHHHH.
2. Irregularity: If the number of heads and tails is the same in two strings,
people think that the one with a more irregular pattern is more likely.
For instance, the string HHTHTH is deemed more likely than HTHTHT.
Both phenomena have been called systematic errors and are often confused
with the gambler’s fallacy. But that is not so.
The 16 equally likely sequences of four tosses, with ✓ marking a sequence that
contains HHH and + marking one that contains HHT:
HHHH ✓   HHHT ✓+   HHTH +    HHTT +
HTHH –   HTHT –    HTTH –    HTTT –
THHH ✓   THHT +    THTH –    THTT –
TTHH –   TTHT –    TTTH –    TTTT –
Figure 1: If the length of the string k is smaller than that of the observed sequence n, then
a pure string of heads (H) is less likely to be encountered than one with an alternation. This
is shown here for the case of k = 3 and n = 4. There are 16 possible sequences of four tosses
of a fair coin, each equally likely. In three of these, there is at least one string HHH (“check
mark”), while HHT occurs in four of these (“cross”). In this situation, people’s intuitions are
correct (Hahn and Warren, 2009).
because each of the two strings is said to have the same probability of occurring.
The alleged bias was explained by people’s belief in the “law of small numbers,”
which means that people expect the equal probability of H and T to hold in
small samples as well: HHT is more “representative” than HHH (Kahneman
and Tversky, 1972). The belief in the law of small numbers has been seen as a
key example of how people depart from perfect rationality, posing a radical
challenge to neoclassical economic theory (Rabin, 1998), and was modeled as
mistaking an urn with replacement for one with limited replacement (Rabin,
2002). Rabin and Vayanos (2010) concluded that people “view randomness
in coin flipping as corresponding to a switching rate of 60% instead of 50%”
(p. 735), which corresponds to misunderstanding statistical independence in
Conlisk’s list. Whatever the explanation, the alleged bias shows “stubbornness,”
namely that people show little insight and fail to overcome it: “For anyone
who would wish to view man as a reasonable intuitive statistician, such results
are discouraging” (Kahneman and Tversky, 1972, p. 445).
Consider now the general case. Let k be the length of the string of heads
and tails judged (which is three in the example above), and n be the total
sequence (number of tosses) observed (k ≤ n). If k = n = 3, there are eight
possible outcomes, all equally likely, and one of these contains HHH and one
HHT; thus, both are equally likely to be encountered. In this situation, the
intuition that HHT is more likely would be a fallacy. If k < n, however, that
same intuition is correct. Similarly, it can be shown that HHT is likely to be
encountered earlier than HHH: The expected waiting time for HHT is eight
tosses of a coin, compared with 14 tosses for HHH (Hahn and Warren, 2009).
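The counting argument can be checked by brute force. The sketch below (an illustration, not Hahn and Warren's own analysis) enumerates all head–tail sequences of length n and counts how many contain HHH versus HHT:

```python
from itertools import product

def sequences_containing(pattern, n):
    """Number of length-n coin-toss sequences that contain `pattern` at least once."""
    return sum(pattern in "".join(seq) for seq in product("HT", repeat=n))

for n in (3, 4):
    print(n, sequences_containing("HHH", n), sequences_containing("HHT", n))
# n = 3 (k = n): 1 vs. 1 -- both strings are equally likely to be encountered
# n = 4 (k < n): 3 vs. 4 -- HHT is more likely to be encountered, as in Figure 1
```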
Now we can specify the general principle under which people’s intuition is
ecologically rational: the condition k < n < ∞, that is, the string judged is
shorter than the sequence in which it is observed. Under this condition, a
string such as HHT is in fact more likely to be encountered than HHH.
Consider the following bet:
You flip a fair coin 20 times. If this sequence contains at least one
HHHH, I pay you $100. If it contains at least one HHHT, you pay
me $100. If it contains neither, nobody wins.
If HHHH and HHHT were equally likely to be encountered, then the two
players should break even. But in fact, the person who accepts the bet can
expect to lose in the long run. If you watch 20 flips, the probability that
you will see at least one HHHH is about 50%, but the chance of an HHHT
is around 75% (Hahn and Warren, 2009). For the same reason, a gambler who
watches the outcomes of the roulette wheel in a casino for half an hour can
more likely expect to see a string of three reds followed by a black than a string
of four reds. In this situation, believing in the law of small numbers pays.
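The two percentages can be verified exactly by enumerating all 2^20 equally likely sequences of 20 flips; the brute-force sketch below (about a million sequences, so it runs for a few seconds) does just that:

```python
from itertools import product

with_hhhh = with_hhht = 0
for seq in product("HT", repeat=20):        # all 2**20 = 1,048,576 equally likely sequences
    s = "".join(seq)
    with_hhhh += "HHHH" in s
    with_hhht += "HHHT" in s

total = 2 ** 20
print(f"P(at least one HHHH) = {with_hhhh / total:.2f}")    # about 0.50
print(f"P(at least one HHHT) = {with_hhht / total:.2f}")    # about 0.75
```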
The phenomenon that people expect more alternations than probability
theory predicts has sometimes been linked to the gambler’s fallacy. Yet that
assumption is mistaken, as can be deduced from the ecological rationality
condition. The gambler’s fallacy refers to the intuition that after witnessing a
string of, say, three heads, one expects that the next outcome will be more
likely tail than head. This would be a true fallacy because it corresponds to
the condition k = n. In other words, a total of four throws is considered, either
HHHH or HHHT, and there is no sample k with the property k < n.
Irregularity
The second alleged misconception about chance is that people believe that
irregular sequences are more likely. Consider the following two sequences:
HTHTHT
HTTHTH
Which sequence is more likely to be encountered? The numbers of heads
and tails are now identical, but alternations are regular in the first string and
irregular in the second. Psychological research documented that most people
judge the more irregular string as more likely. In Kahneman and Tversky’s
(1972) account, among all possible strings of length six, “we venture that only
HTTHTH appears really random” (p. 436).
Once again, this intuition was generally declared a fallacy, for the same
reason as before: All sequences are assumed to have equal probability. Yet
the same ecological analysis shows that if k < n < ∞, then the sequence
HTTHTH is actually more likely than the regular one. For instance, if John
throws a fair coin, the expected waiting time to get HTTHTH is 66 flips,
whereas it is 84 for HTHTHT (and 126 for HHHHHH; see Hahn and Warren,
2009). This can be verified in the same way as with the table above.
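The waiting times can also be checked by simulation. The sketch below estimates the mean number of flips until each string first appears; the estimates approximate the exact expectations of 8 (HHT), 14 (HHH), 66 (HTTHTH), 84 (HTHTHT), and 126 (HHHHHH):

```python
import random

def mean_waiting_time(pattern, trials=20_000, seed=1):
    """Average number of fair-coin flips until `pattern` first appears."""
    rng = random.Random(seed)
    total_flips = 0
    for _ in range(trials):
        window, flips = "", 0
        while not window.endswith(pattern):
            window = (window + rng.choice("HT"))[-len(pattern):]
            flips += 1
        total_flips += flips
    return total_flips / trials

for pattern in ("HHT", "HHH", "HTTHTH", "HTHTHT", "HHHHHH"):
    print(pattern, round(mean_waiting_time(pattern), 1))
```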
In sum, under the ecological condition, people’s belief that irregular
alternations are more likely to be encountered, or encountered sooner, reflects
an astonishingly fine-tuned sensitivity to the statistics of finite samples. The
belief is erroneous
only in cases where the condition does not hold.
In the various justifications for why people commit the gambler’s fallacy,
the condition k < n < ∞ appears to have been overlooked in the bias literature.
In the next section, we will see that the same oversight applies to the hot hand
fallacy, the mirror image of the gambler’s fallacy.
Most basketball fans can recall magical moments where a player is “on fire,”
“in the zone,” “in rhythm,” or “unconscious.” This temporarily elevated performance
is known as the “hot hand.” For players and coaches, the hot hand is a common
experience. Gilovich et al. (1985), however, considered this intuition a cognitive
illusion and named it the hot hand fallacy. Even the website of the National
Collegiate Athletic Association (NCAA) once warned of believing in magic:
“Streaks and ‘hot hands’ are simply illusions in sports. And, it is better to be
a scientist than be governed by irrational superstition.”
Coaches and players reacted with stubborn disbelief, which some behavioral
economists saw as an indication that the hot hand fallacy resembles a robust
visual illusion. Thaler and Sunstein (2008) asserted “that the cognitive illusion
is so powerful that most people (influenced by their Automatic System) are
unwilling even to consider the possibility that their strongly held beliefs
might be wrong” (p. 31). Like the gambler’s fallacy, the hot hand fallacy
was attributed to “representativeness” and formalized by Rabin and Vayanos
(2010). It has served as an explanation for various vices and behaviors in
financial markets, sports betting, and casino gambling.
Professional coaches and players presumably have no incentives to be wrong,
only to be right. Why then would this erroneous belief exist and persist? The
answer is the same as with intuitions about randomness.
Gilovich et al. (1985) took care to collect multiple sources of data and
made their strongest case against the hot hand by looking at free shots, where
the other team cannot strategically reduce a “hot” player’s number of hits
through physical intervention (Raab et al., 2012). Twenty-six shooters from
Cornell each did 100 free shots from a fixed distance with varying locations.
The authors argued that if there is no hot hand (the null hypothesis), then the
frequency of a hit after three hits should be the same as that of a miss after
three hits. And that is exactly what they reported for all but one player—this
exception could be expected to happen by chance. Thus, their conclusion was
that the hot hand does not exist.
Now let us take a closer look at the null hypothesis of chance. To keep it
short, I use Hahn and Warren’s (2009) analysis of the gambler’s fallacy as an
analogy, whereas the original analysis of the hot hand by Miller and Sanjurjo
(forthcoming) and Miller and Sanjurjo (2016) was in terms of selection bias.
If there is no hot hand, the frequency of a hit after three hits should be not
equal but actually smaller, and for the reason just discussed. For again we
are dealing with k = 4 (the length of the string: three hits plus the following
shot) and n = 100; that is, the ecological condition k < n < ∞ is in place.
Figure 1 can be used to illustrate the point by replacing the fair coin with a
player who has a probability of .50 of scoring a hit from the shooting distance,
and by replacing the string of four in the free-shooter experiment with a string
of three (two hits plus the following shot). In this situation, is a hit or a miss
more likely after two hits?
If a player makes n = 4 free shots, there are 16 sequences with equal
probabilities. In the absence of a hot hand, we observe a miss (T) after two
hits (H) in four of these sequences and a hit after two hits in only three
sequences. Thus, Gilovich et al.’s null hypothesis that the expected relative
frequency of hits and misses after a streak of hits should be the same is not
correct in a small sample. The correct null hypothesis is that after a streak
of HH, H has an expected proportion of about .42, not .50 (Miller and
Sanjurjo, forthcoming). Because HHH should be less often observed, finding a
relative frequency of .50 instead of .42 actually indicates a hot hand. In a
reanalysis of the original data, a substantial number of the shooters showed
a pattern consistent with the hot hand (Miller and Sanjurjo, forthcoming).
Across players, the hot hand boosted performance by 11 percentage points,
which is substantial and roughly equal to the difference in field goal per-
centage between the average and the very best three-point shooter in the
NBA.
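The null value of about .42 can be reproduced by enumeration. The sketch below (an illustration of the selection effect, not Miller and Sanjurjo's original derivation) lists all 16 equally likely sequences of four shots by a 50% shooter, computes within each sequence the proportion of hits among shots that immediately follow two hits, and averages over the sequences in which such shots occur:

```python
from itertools import product

proportions = []
for seq in product("HT", repeat=4):                  # the 16 equally likely sequences
    after_hh = [seq[i + 2] for i in range(2)         # shots 3 and 4 ...
                if seq[i] == seq[i + 1] == "H"]      # ... that directly follow two hits
    if after_hh:                                     # skip sequences with no HH streak
        proportions.append(after_hh.count("H") / len(after_hh))

print(sum(proportions) / len(proportions))           # 0.4167 (about .42), not .50
```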
Many have been puzzled about the relationship between the gambler’s
fallacy and the hot hand fallacy, given that they refer to contrary phenomena,
a negative recency bias and a positive recency bias (Rabin and Vayanos,
2010). Both have been attributed to the usual suspects, the representativeness
heuristic or the associative System 1. The present analysis shows a quite
different link: Both are a consequence of the same bias bias. Coaches and
players have good reason to maintain their belief in the hot hand.
[Figure 2 (image): log–log scatter plot of estimated versus actual number of
deaths per year, both axes running from 1 to 1,000,000. Labeled causes range
from smallpox vaccination, botulism, and tornado at the low end to motor
vehicle accidents, all cancer, all accidents, and all disease at the high end.]
Figure 2: Relationship between estimated and actual number of deaths per year for 41
causes of death in the United States (Lichtenstein et al., 1978). Each point is the mean
estimate (geometric mean) of 39 students; vertical bars show the variability (25th and 75th
percentile) around the mean estimates for botulism, diabetes, and all accidents. The curved
line is the best-fitting quadratic regression line. This pattern has been (mis)interpreted as
evidence of overestimation of low risks and underestimation of high risks. (Adapted from
Slovic et al., 1982).
get answers that reflect the actual relative frequencies of the events with great
fidelity” (p. 368).
Against this evidence, the claim that people show systematic biases in
evaluating the frequency of events is surprising. In a classic study, 39 college
students were asked to estimate the frequency of 41 causes of death in the
United States, such as botulism, tornado, and stroke (Lichtenstein et al., 1978;
Slovic et al., 1982). The result (Figure 2) was interpreted as showing two
systematic biases in people’s minds: “In general, rare causes of death were
overestimated and common causes of death were underestimated” (p. 467).
Why would people overestimate low risks and underestimate high risks?
Slovic et al. (1982) argued that “an important implication of the availabil-
ity heuristic is that discussion of a low-probability hazard may increase its
memorability and imaginability and hence its perceived riskiness” (p. 456).
Kahneman (2011, p. 324) referred to the associative System 1. The two biases
correspond to the weighting function in prospect theory and became widely
cited as the reason why the public lacks competence in political decisions:
People exaggerate the dangers of low-risk technology such as nuclear power
plants and underestimate the dangers of high-frequency risks.
Let us now have a closer look at how these biases were diagnosed. Slovic
et al. (1982) explain: “If the frequency judgments were accurate, they would
equal the statistical rates, with all data points falling on the identity line.”
(p. 466–467). The identity line is taken as the null hypothesis. Yet that
would only be correct if there were no noise in the data, which is not the
case, as the wide vertical bars in Figure 2 reveal. In the presence of noise,
that is, unsystematic error, the curve is not the identity line but a line that is
bent more horizontally, a phenomenon known as regression toward the mean.
It was discovered in the 1880s by the British polymath Sir Francis Galton,
who called it “reversion toward the mean.” Galton observed that the sons
of short fathers were on average taller than their fathers, and the sons of
tall fathers were on average shorter than their fathers. If we replace fathers
and sons with actual and perceived risks, respectively, the result resembles
the curved line in Figure 2. However, Galton also observed that the fathers
of short sons were on average taller than their sons, and the fathers of tall
sons were on average shorter than their sons. The first observation would
wrongly suggest that the actual variability of the sons is smaller than that
of their fathers, and the second observation the opposite. What diminishes in the
presence of noise is the variability of the estimates, both the estimates of the
height of the sons based on that of their fathers, and vice versa. Regression
toward the mean is a result of unsystematic, not systematic error (Stigler,
1999).
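Galton's twofold observation is easy to reproduce with purely unsystematic variation. In the sketch below, the numbers are assumptions chosen for illustration (equal means and spreads for fathers and sons, correlation .5); both "regressions" appear even though no systematic error is present:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mean, sd, r = 100_000, 175.0, 7.0, 0.5    # assumed values (heights in cm)

fathers = rng.normal(mean, sd, n)
# Sons: same mean and spread as fathers, correlation r, purely random deviations.
sons = mean + r * (fathers - mean) + rng.normal(0, sd * np.sqrt(1 - r**2), n)

tall_fathers = fathers > mean + sd
tall_sons = sons > mean + sd
print(fathers[tall_fathers].mean(), sons[tall_fathers].mean())   # sons of tall fathers are shorter on average
print(sons[tall_sons].mean(), fathers[tall_sons].mean())         # fathers of tall sons are shorter on average
```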
If the pattern in Figure 2 is due to unsystematic rather than systematic
error, plotting the data the other way round as Galton did will show this. In
a replication of the original study (Hertwig et al., 2005), when the estimated
frequencies were predicted from the actual frequencies, as originally done,
the same regression line that fitted the data best was replicated (Figure 3).
However, when the actual frequencies were predicted from the estimated
frequencies, the regression line showed a reverse pattern, suggesting the opposite
biases. For instance, consider low estimates between 10 and 100 per year on
the Y-axis. The corresponding actual frequency on the second regression
line is higher, between about 50 and 500. This discrepancy now makes it
look as though people underestimate, not overestimate low risks. Yet neither
is the case: The two regression lines are a consequence of unsystematic
noise.
Behavioral economists have repeatedly pointed out that people confuse
regression toward the mean with real effects. In this case, it has misled
researchers themselves. Milton Friedman (1992) suspected that “the regression
fallacy is the most common fallacy in the statistical analysis of economic data”
(p. 2131).
Thus, the reanalysis is consistent with the general result in psychological
research that people are on average fairly accurate in estimating frequencies,
whereas unsystematic error can be large.
[Figure 3 (image): log–log scatter plot of estimated versus actual number of
deaths per year for the replication data, both axes running from 1 to 1,000,000,
with individual data points and the two regression curves.]
Figure 3: Replication of the causes of death study (Lichtenstein et al., 1978, Figure 2) by
Hertwig et al. (2005) showing that what appears to be overestimation of low risks and
underestimation of high risks is in fact a statistical artifact known as regression toward the
mean. The replication used the same 41 causes (7 are not shown because their frequencies
were zero in 1996–2000), 45 participants, and no anchor. When the estimated frequencies
are predicted from the actual frequencies, the resulting curve (the best-fitting quadratic
regression line) is virtually identical with that in the original study (Figure 2). However,
when the actual frequencies are predicted from the estimated frequencies, the resulting curve
appears to show the opposite bias: Rare risks appear to be underestimated and high risks
overestimated. The figure also shows the large unsystematic error in the data (individual
data points). (Adapted with permission from Hertwig et al., 2005).
The classic study of Lichtenstein et al. illustrates the second cause of a bias
bias: when unsystematic error is mistaken for systematic error. One might
object that systematic biases in frequency estimation have been shown in the
widely cited letter-frequency study (Kahneman, 2011; Tversky and Kahneman,
1973). In this study, people were asked whether the letter K (and each of four
other consonants) is more likely to appear in the first or the third position
of a word. More people picked the first position, which was interpreted as
a systematic bias in frequency estimation and attributed post hoc to the
availability heuristic. After finding not a single replication of this study, we
repeated it with all consonants (not only the selected set of five, each of
which has the atypical property of being more frequent in the third position)
and actually measured availability in terms of its two major meanings, number
and speed, that is, by the frequency of words produced within a fixed time and
by time to the first word produced (Sedlmeier et al., 1998). Neither of the two
measures of availability was found to predict the actual frequency judgments.
In contrast, frequency judgments correlated highly with the actual frequencies,
albeit regressed toward the mean. Thus, a reanalysis of the letter-frequency
study provides no evidence of the two alleged systematic biases in frequency
estimates or of the predictive power of availability.
These studies exemplify two widespread methodological shortcomings in
the bias studies. First, the heuristics (availability, representativeness, affect)
are never specified; since the 1970s, when they were first proposed, they have
remained common-sense labels that lack formal models. Second, the heuristic
is used to “explain” a bias after the fact, which is almost always possible given
the vagueness of the label. The alternative is to use formal models of heuristics
that can be tested; to test these models in prediction, not by data fitting after
the fact; and to test multiple models competitively, as opposed to testing a
single model against chance (Berg and Gigerenzer, 2010; Gigerenzer et al.,
2011).
Overconfidence 1: Miscalibration
In the typical study, participants are asked many such questions, and then
the average percentage of correct answers is plotted for each confidence level.
A typical result is: When people said they were 100% confident, the average
proportion of correct answers was only 90%; when they said they were 90%
confident, the accuracy was 80%; and so on. This “miscalibration” was called
overconfidence (Lichtenstein et al., 1982). Yet when confidence was low, such
as 50%, accuracy was actually higher than confidence; this phenomenon was
called the hard–easy effect: Hard questions result in underconfidence, while
easy questions result in overconfidence (Gigerenzer et al., 1991).
The resulting miscalibration curves are similar to those in Figure 2, with
confidence on the X-axis (range 50% to 100%) and the average percentage of
correct answers on the Y-axis. And they are subject to the same argument as
stated before: In the presence of noise (quantitative confidence judgments are
noisy), the pattern that has been interpreted as overconfidence bias actually
results from unsystematic error because of regression toward the mean. This
result can be shown in the same way as in the previous section for overestimation
of small risks (similar to Figure 3). If one estimates the confidence judgments
from the proportion of correct responses (rather than vice versa, as customary),
then one should obtain the mirror result, which appears to be underconfidence.
Studies that reanalyzed empirical data sets in this twofold manner consistently
reported that regression toward the mean accounted for most or all of the
pattern that had been attributed to people’s overconfidence (Dawes and
Mulford, 1996; Erev et al., 1994; Pfeifer, 1994). For instance, when looking
at all questions that participants answered 100% correctly (instead of looking
at all questions where participants were 100% confident), it was found that
the average confidence was lower, such as 90%—which then looked like
underconfidence. The regression effect also explains the hard–easy effect, that
is, why “overconfidence” flips into “underconfidence” when confidence is low
(Juslin et al., 2000).
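The same reversal can be produced in a simulation in which confidence is an unbiased but noisy judgment; all numbers below are assumptions for illustration, not an actual data set. Conditioning on the highest confidence makes accuracy look too low ("overconfidence"), while conditioning on the highest accuracy makes confidence look too low ("underconfidence"):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_answers = 2_000, 100

p = rng.uniform(0.6, 0.9, n_items)                    # true probability of a correct answer per question
confidence = p + rng.normal(0, 0.05, n_items)         # unbiased but noisy confidence reports
accuracy = rng.binomial(n_answers, p) / n_answers     # observed proportion correct per question

hi_conf = confidence >= np.quantile(confidence, 0.9)  # questions with the highest stated confidence
hi_acc = accuracy >= np.quantile(accuracy, 0.9)       # questions with the highest observed accuracy

print(confidence[hi_conf].mean(), accuracy[hi_conf].mean())   # accuracy falls short of confidence
print(accuracy[hi_acc].mean(), confidence[hi_acc].mean())     # confidence falls short of accuracy
```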
In sum, studies on “miscalibration” have misinterpreted regression toward
the mean as people’s overconfidence, just as studies on overestimation of
low risks and underestimation of high risks have done. Nevertheless, one
might object that overconfidence has been documented elsewhere. Let us then
consider a second prominent definition of overconfidence.
Other phenomena that have been named overconfidence include the findings
that most people believe they are better than average and that people produce
too narrow confidence intervals. Neither of these is necessarily a bias; the
ecological conditions first need to be specified. Consider the finding that most
drivers think they drive better than average. If better driving is interpreted
as meaning fewer accidents, then most drivers’ beliefs are actually true. The
number of accidents per person has a skewed distribution, and an analysis of
U.S. accident statistics showed that some 80% of drivers have fewer accidents
than the average number of accidents (Mousavi and Gigerenzer, 2011). One
can avoid this problem by asking people about percentiles, but that still leaves
a second problem, namely that the typical study does not define what exactly
the target variable is. In the case of driving, people can pick whatever factor
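Returning to the accident example: a deliberately simple model shows how a skewed distribution makes "most drivers are better than average" literally true. Assume, purely for illustration (the cited analysis used actual U.S. accident statistics), that accidents per driver follow a Poisson distribution with a mean of 0.2, so that having fewer accidents than average means having none:

```python
import math

mean_accidents = 0.2     # hypothetical average number of accidents per driver
# Under a Poisson model, drivers with fewer accidents than the mean of 0.2
# are exactly those with zero accidents:
share_better_than_average = math.exp(-mean_accidents)
print(f"{share_better_than_average:.0%} of drivers have fewer accidents than average")   # about 82%
```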
2.5 Framing
Positive Frame: Five years after surgery, 90% of patients are alive.
Negative Frame: Five years after surgery, 10% of patients are dead.
Should the patient listen to how the doctor frames the answer? Behavioral
economists say no because both frames are logically equivalent (Kahneman,
2011). Nevertheless, people do listen. More are willing to agree to a medical
procedure if the doctor uses positive framing (90% alive) than if negative
framing is used (10% dead) (Moxey et al., 2003). Framing effects challenge
the assumption of stable preferences, leading to preference reversals. Thaler
and Sunstein (2008) who presented the above surgery problem, concluded that
“framing works because people tend to be somewhat mindless, passive decision
makers” (p. 40).
Framing effects have been presented as key justifications for the politics of
libertarian paternalism. In the words of Thaler and Sunstein (2008), a “policy
is ‘paternalistic’ if it tries to influence choices in a way that will make choosers
better off, as judged by themselves” (p. 5, italics in original). Paradoxically,
Consider the surgery problem again. For the patient, the question is not about
checking logical consistency but about making an informed decision. To that
end, the relevant question is: Is survival higher with or without surgery? The
survival rate without surgery is the reference point against which the surgery
option needs to be compared. Neither “90% survival” nor “10% mortality”
provides this information.
There are various reasons why information is missing and recommenda-
tions are not directly communicated. For instance, U.S. tort law encourages
malpractice suits, which fuels a culture of blame in which doctors fear making
explicit recommendations (Hoffman and Kanzaria, 2014). However, by framing
an option, doctors can convey information about the reference point and make
an implicit recommendation that intelligent listeners intuitively understand.
Experiments have shown, for instance, that when the reference point was lower,
that is, fewer patients survived without surgery, then 80–94% of speakers chose
the “survival” frame (McKenzie and Nelson, 2003). When, by contrast, the
reference point was higher, that is, more patients survived without surgery,
then the “mortality” frame was chosen more frequently. Thus, by selecting a
positive or negative frame, physicians can communicate their belief whether
surgery has a substantial benefit compared to no surgery and make an implicit
recommendation.
A chosen frame can implicitly communicate not only a recommendation
but also other information. Consider the prototype of framing:
The glass is half full.
The glass is half empty.
Both frames are the same logically, but not psychologically. In an ex-
periment by Sher and McKenzie (2006), a full and an empty glass were put
on a table, and the participant was asked to pour half of the water into the
other glass and then to place the “half empty glass” on the edge of the table.
Most people chose the glass that was previously full. Here, framing conveys
unspoken information, and a careful listener understands that half full and
half empty are not the same.
Perhaps the most famous example of a framing effect stems from the “Asian
disease problem” (Tversky and Kahneman, 1981). It has been presented as
evidence of risk aversion for gains and risk seeking for losses and of the value
function in prospect theory:
Imagine that the United States is preparing for an outbreak of an
unusual Asian disease, which is expected to kill 600 people. Two
alternative programs to combat the disease have been proposed.
Assume that the exact scientific estimates of the consequences of
the programs are as follows:
[Positive Frame]
If Program A is adopted, 200 people will be saved.
If Program B is adopted, there is a 1/3 probability that 600 people
will be saved and a 2/3 probability that no people will be saved.
[Negative Frame]
If Program A is adopted, 400 people will die.
If Program B is adopted, there is a 1/3 probability that nobody
will die and a 2/3 probability that 600 people will die.
The authors argued that the two problems are logically equivalent, and
that invariance requires that positive versus negative framing should not alter
the preference order. Nevertheless, when given the positive frame, most people
favored Program A, but favored Program B when given the negative frame.
This difference was interpreted as evidence that people are risk averse for
gains (positive frame) and risk seeking for losses (negative frame) (Tversky and
Kahneman, 1981). In this logical interpretation, the Asian disease problem
refutes the assumption of stable preferences and shows that people can be
easily manipulated.
Now recall the psychological analysis of the surgery problem: If people
notice that part of the information is left out, such as the effect of no surgery,
they tend to make inferences. In the Asian disease problem, the “risky” option
is always spelled out completely in both frames (e.g., 1/3 probability that
600 people will be saved and a 2/3 probability that no one is saved), whereas
the “certain” option is always incomplete. For instance, it communicates that
200 people will be saved but not that 400 will not be saved. This systematic
asymmetry matters neither from the logical norm of “description invariance”
nor for prospect theory, given that the framing in terms of loss and gains is
preserved. But it should matter if people make intelligent inferences. To test
these two competing explanations—logical error or intelligent inference—all
that needs to be done is to complete the missing options in both frames. Here
is the complete version for the positive frame:
If Program A is adopted, 200 people will be saved and 400 people will not
be saved.
If Program B is adopted, there is a 1/3 probability that 600 people will be
saved and a 2/3 probability that no people will be saved.
Is Framing Unavoidable?
The proposal that governments should nudge people by framing messages has
been defended by the argument that there is no escape from framing: Results
have to be framed either positively or negatively (Thaler and Sunstein, 2008).
Yet that is not necessarily so. As the examples above have shown, there is an
alternative to positive or negative framing, namely to specify the message in
its entirety, such as “Five years after surgery, 90% of patients are alive and
10% are dead,” or “200 people will be saved and 400 not.”
In sum, the principle of logical equivalence or “description invariance” is a
poor guide to understanding how human intelligence deals with an uncertain
world where not everything is stated explicitly. It misses the very nature of
intelligence, the ability to go beyond the information given (Bruner, 1973).
The three sources of the bias bias I identified—failing to notice when small
sample statistics differ from large sample statistics, mistaking people’s random
error for systematic error, and confusing intelligent inferences with logical
errors—generalize to other items in Conlisk’s list as well (e.g., Gigerenzer et al.,
2012; Gigerenzer et al., 1988; McKenzie and Chase, 2012). There also appears
to be a common denominator to these sources. If one accepts rational choice
theory as a universal norm, as in Kahneman and Thaler’s version of behavioral
economics, where all uncertainty can be reduced to risk, the statistics of
small samples or the art of framing play no normative role. The normative
consequences of uncertainty are easily overlooked, and people’s intuitions are
likely misperceived as being logical errors.
References
Berg, N. and D. Lien. 2005. “Does society benefit from investor overconfidence
in the ability of financial market experts?” Journal of Economic Behavior
and Organization. 58: 95–116.
Binmore, K. 2009. Rational decisions. Princeton, NJ: Princeton University
Press.
Bond, M. 2009. “Risk school”. Nature. 461(2): 1189–1192. doi: 10.1038/4611189a.
Brighton, H. and G. Gigerenzer. 2015. “The bias bias”. Journal of Business
Research. 68: 1772–1784. doi: 10.1016/j.jbusres.2015.01.061.
Bruner, J. S. 1973. Beyond the information given: Studies on the psychology
of knowing. Oxford, England: W. W. Norton.
Bundesregierung [Federal Government of Germany]. 2018. “Wirksam regieren” [Governing effectively].
url: https://www.bundesregierung.de/Webs/Breg/DE/Themen/Wirksam-
regieren/_node.html.
Camerer, C., S. Issacharoff, L. Loewenstein, T. O’Donoghue, and M. Rabin.
2003. “Regulation for conservatives: Behavioral economics and the case for
‘asymmetric paternalism’”. University of Pennsylvania Law Review. 151:
1211–1254.
Chater, N., J. B. Tenenbaum, and A. Yuille. 2006. “Probabilistic models of
cognition: Conceptual foundations”. TRENDS in Cognitive Sciences. 10:
335–344.
Conley, S. 2013. Against autonomy. Justifying coercive paternalism. New York:
Cambridge University Press.
Conlisk, J. 1996. “Why bounded rationality?” Journal of Economic Literature.
34: 669–700.
Dawes, R. M. and M. Mulford. 1996. “The false consensus effect and overconfi-
dence: Flaws in judgment or flaws in how we study judgment?” Organiza-
tional Behavior and Human Decision Processes. 65: 201–211.
DeBondt, W. F. and R. Thaler. 1995. “Financial decision-making in mar-
kets and firms: A behavioral perspective”. In: Handbook in Operations
Research and Management Science, Vol. 9, Finance. Ed. by R. A. Jarrow,
V. Maksimovic, and V. T. Ziemba. North Holland: Elsevier. 385–410.
Dressel, J. and H. Farid. 2018. “The accuracy, fairness, and limits of predicting
recidivism”. Science Advances. 4: eaao5580.
Edwards, W. 1968. “Conservatism in human information processing”. In: Formal
representation of human judgment. Ed. by B. Kleinmuntz. New York: Wiley.
17–52.
Erev, I., T. S. Wallsten, and D. V. Budescu. 1994. “Simultaneous over- and
underconfidence: The role of error in judgment processes”. Psychological
Review. 101: 519–527.
Farmer, G. D., P. A. Warren, and U. Hahn. 2017. “Who ‘believes’ in the
gambler’s fallacy and why?” Journal of Experimental Psychology: General.
146: 63–76.