

Running head: THE NAÏVE INTUITIVE STATISTICIAN

The Naïve Intuitive Statistician:

A Naïve Sampling Model of Intuitive Confidence Intervals

Peter Juslin and Anders Winman

Uppsala University, Uppsala, Sweden

Patrik Hansson

Umeå University, Umeå, Sweden



Abstract

The perspective of the naïve intuitive statistician is outlined and applied to explain overconfidence

when people produce intuitive confidence intervals and why this format leads to more

overconfidence than other formally equivalent formats. The naïve sampling model implies that

people accurately describe the sample information they have but are naïve in the sense that they

uncritically take sample properties as estimates of population properties. A review demonstrates that

the naïve sampling model accounts for the robust and important findings in previous research, as well

as provides novel predictions that are confirmed, including a way to minimize the overconfidence

with interval production. The authors discuss the NSM as a representative of models inspired by the

naïve intuitive statistician.



The Naïve Intuitive Statistician:

A Naïve Sampling Model of Intuitive Confidence Intervals

The study of judgment is often portrayed as a sequence of research programs, each guided by

a different and, at least on the face of it, inconsistent metaphor (Fiedler & Juslin, 2006b). In the

sixties the mind was likened to an intuitive statistician producing judgments that by and large are

responsive to the variables implied by statistics and probability theory (Peterson & Beach, 1967).

Less than a decade later the conclusion was that the mind operates according to principles other than

those prescribed by probability theory and statistics (Kahneman, Slovic, & Tversky, 1982). Because

of limited ability to process information, people are condemned to the use of heuristics that, although

useful as rules of thumb, produce serious and persistent cognitive biases (Gilovich, Griffin, &

Kahneman, 2002).

In this article we will draw on yet another developing metaphor: that of the naïve intuitive

statistician (Fiedler & Juslin, 2006b). Re-evoking the intuitive statistician, this metaphor emphasizes that, although the cognitive processes accurately describe the available information, the intuitive statistician is naïve with respect to the origins and estimator properties of the samples given. Two key

ingredients of this naivety are: a) that people take for granted that samples are representative of the

populations of interest, as when the frequency of violent death as channeled by the media affects the

judged risk of violent death (Lichtenstein, Slovic, Fischhoff, Layman, & Combs, 1978; Winman &

Juslin, 1993). b) People tend to assume that sample properties can be directly used to estimate the

corresponding population properties, as when the variance in a sample is correctly assessed but not

corrected by n/(n-1) to become an unbiased estimate of population variance (Kareev, Arnon, &

Horwitz-Zeliger, 2002).

Specifically, we will apply the metaphor of a naïve intuitive statistician to explain



overconfidence in the production of intuitive confidence intervals for unknown quantities, and for

understanding why this method produces much more overconfidence than other formally equivalent

methods (Juslin & Persson, 2002; Juslin, Wennerholm, & Olsson, 1999; Klayman, Soll, Gonzalez-

Vallejo, & Barlas, 1999; Soll & Klayman, 2004; Winman, Hansson, & Juslin, 2004). If, for

example, people produce intuitive confidence intervals around their best guess for the interest rate next year that should include the true value with probability .9, the proportion of intervals that include the true value tends to be far below .9, often closer to .4 or .5 (Block & Harper, 1991; Lichtenstein,

Fischhoff, & Phillips, 1982; Russo & Schoemaker, 1992). Because confidence intervals are often

made by experts (Clemen, 2001; Russo & Schoemaker, 1992), and the difference in expected

return given a probability .9 or .4 can be enormous, the phenomenon is not only theoretically puzzling

but of profound applied relevance.

Intriguingly, however, when participants are provided with the same intervals, and assess the

probability that the value falls within the interval, the overconfidence bias tends to diminish or even

disappear (Winman et al., 2004). This phenomenon is referred to as format dependence (Juslin et

al., 1999). The key idea explored in this article is the following: Even if the samples from the

environment are unbiased and the cognitive processes portray these samples accurately, if the sample

properties are uncritically taken as estimators of population properties, the implication is indeed

overconfidence with intuitive confidence intervals, yet relatively accurate judgment with probability

judgment. As demonstrated below, when people are confined to reliance on small samples these mechanisms have non-trivial effects.

In this article we first briefly discuss the framework provided by the naïve intuitive statistician,

with the intent of arguing that it complements the previous approaches to intuitive judgment in useful

ways. Thereafter we illustrate this framework by providing an in-depth application to the puzzling

phenomena of extreme overconfidence with interval production and the format-dependence effect.

After reviewing the relevant literature, the naïve sampling model (NSM) is introduced and we

illustrate how it explains the results, including how it can be used to reduce the overconfidence with

intuitive confidence intervals. Finally, we discuss the NSM as one representative of models inspired

by the naïve intuitive statistician.

The Naïve Intuitive Statistician

Many of the enlightenment philosophers involved in the early development of probability

theory appeared to take for granted that judgments of probability are informed by the frequencies

encountered in one's experience (Gigerenzer et al., 1989; Hacking, 1975). Early research comparing

judgment to normative principles from probability theory and statistics indeed suggested that the mind

operates in the manner of an intuitive statistician (Peterson & Beach, 1967). People were adept at

learning probabilities (proportions) and other distributional properties from trial-by-trial experience.

Although this early research already documented some discrepancies from probability theory, most famously that people are conservative updaters as compared to Bayes' theorem (Edwards, 1982), in

most studies they appeared responsive to the factors implied by normative models (e.g., sample

size). Based on people's remarkable ability to learn frequencies in controlled laboratory settings it

was also proposed that frequency is encoded and stored automatically (Hasher & Zacks, 1979;

Zacks & Hasher, 2002). The perspective of the intuitive statistician thus emphasizes our cognitive

ability to learn frequencies in well-defined and well-structured learning tasks and suggests that the

cognitive algorithms of the mind are in basic agreement with normative principles.

The heuristics and biases perspective emphasizes that, because of limited time, knowledge,

and computational ability, we rely on judgmental heuristics that provide useful guidance but also

produce characteristic biases (Gilovich et al., 2002; Kahneman et al., 1982). It is thus proposed that

the target attribute of probability is often substituted by heuristic variables, as when the probability

that an instance belongs to a category is assessed by its similarity to the category prototype, or the

probability of an event is assessed by the ease with which examples are brought to mind (Kahneman

& Frederick, 2002). For example, because violent deaths are more frequently reported in the media it is

easier to retrieve examples of violent deaths than more mundane causes of death and people

therefore tend to overestimate the risk of dying a violent death (Lichtenstein et al., 1978; Winman &

Juslin, 1993).

As with the intuitive statistician, the heuristics and biases perspective highlights performance,

but here the limitations of intuitive judgment are brought to the forefront. The heuristics and biases

program has been extremely influential, inspired much new research, and organized large amounts of

empirical data (Gilovich et al., 2002). Yet, there remains an aching tension between this view and the

extensive body of data supporting the view of the intuitive statistician (Peterson & Beach, 1967;

Sedlmeier & Betsch, 2002).

The three assumptions that define the naïve intuitive statistician (Fiedler & Juslin, 2006b)

integrate several aspects of these previous two research programs:

a) People have a remarkable ability to store frequencies in the form of natural and relative

frequencies and their judgments are accurate expressions of these frequencies. This assumption is

supported by an extensive body of data (Estes, 1976; Gigerenzer & Murray, 1987; Peterson &

Beach, 1967; Zacks & Hasher, 2002). The crucial working assumption is that the processes

operating on the information in general provide an accurate description of the samples and, as such, are not based on heuristic processing of the sample.¹

b) People are naïve with respect to the effects of external sampling biases in the information

from the environment and more sophisticated sampling constraints (Einhorn & Hogarth, 1978;

Fiedler, 2000). In essence, people tend spontaneously to assume that the samples they encounter

are representative of the relevant populations. People's confidence in their answers to general

knowledge items may in part derive from accurate assessment of sampling probabilities in the

environment, but they fail to correct for the selection strategies used when general knowledge items

are created (Gigerenzer, Hoffrage, & Kleinbölting, 1991; Juslin, 1994; Juslin, Winman, & Olsson,

2000). Confidence may accurately reflect experience but fail to correct for the effects of actions

taken on the basis of the confidence judgment that constrains the feedback received (Einhorn &

Hogarth, 1978; Elwin et al., in press). Likewise, many biases in social psychology may derive not

primarily from biased processing per se, but from inabilities to correct for the effects of sampling

strategies (Fiedler, 2000).

In fact, most of the traditional availability biases can be explained by accurate description of

biased samples, rather than by the reliance on heuristics that are inherently incompatible with

normative principles. Even if the available evidence is correctly assessed in terms of probability

(proportion), if the external media coverage (Lichtenstein et al., 1978) or the processes of search or

encoding in memory (Kahneman et al., 1982) yield biased samples the judgment becomes biased. In

these situations it is debatable whether the substitution of probability with a heuristic variable is where

the explanatory action lies.²

c) People are also naïve with respect to the more sophisticated properties of statistical

estimators, such as whether an estimator as such is biased or unbiased (Kareev et al., 2002; Winman

et al., 2004). People tend spontaneously to assume that the properties of samples can be used to

describe the populations. People, for example, accurately assess the variance in a sample but fail to

understand that sample variance needs to be corrected by n/(n-1) to be an unbiased estimate of

population variance (Kareev et al., 2002). They may also underestimate the probability of rare

events in part because small samples seldom include the rare events (Hertwig, Barron, Weber, & Erev, 2004, p. 537). Arguably, this naivety is an equally compelling account of the belief in the law of small numbers, the tendency to expect small samples to be representative of their populations, as the traditional explanation in terms of the representativeness heuristic (Tversky &

Kahneman, 1971). Although the biases discussed under b (biased input) and c (biased estimator) are

conceptually distinct the process is the same; the direct use of sample properties to estimate

population properties.

The naïve intuitive statistician highlights both the cognitive mechanisms and the organism-

environment relations that support the judgments. Biases are traced not primarily or exclusively to the

cognitive mechanisms that describe the samples experienced, but to biases in the input and the

naivety with which samples are used to describe the populations (Fiedler & Juslin, 2006a). On this

view, the cognitive algorithms are not inherently irrational, but knowledge of the more sophisticated properties of samples is properly regarded as a hard-earned and fairly recent cultural achievement

(Gigerenzer et al., 1989; Hacking, 1975).

The relationship between the research programs is summarized in Figure 1. The problem

involves three components, the environment, the sample of information available to the judge, and

the judgment that derives from this sample (Fiedler, 2000). These components, in turn, highlight two

interrelationships, the degree to which the sample is a veridical description of the environment and the

degree to which the judgment is a veridical description of the sample. When biased judgment occurs

in real environments or with task contents that involve general knowledge acquired outside of the

laboratory, both of these interrelations are unknowns in the equation. The heuristics and biases

program accounts for biases in terms of heuristic description of the samples (Figure 1B), but only

rarely is this attribution of the bias validated empirically, although there are exceptions (e.g., Schwarz

& Wänke, 2002).

Benefiting from the research inspired by the intuitive statistician (Figure 1A) in the sixties that,

in effect, ascertained one of the unknowns by demonstrating that judgments are fairly accurate

descriptions of experimentally controlled samples, the naïve intuitive statistician emphasizes deviations

between sample and population properties as the cause of biased judgments (Figure 1C). This view

seems consistent both with the results supporting the intuitive statistician and the many

demonstrations of judgment biases. In the following we illustrate how the naïve intuitive statistician

complements the other perspectives by addressing one of the more intriguing unresolved issues in

research on confidence.

Overconfidence with Interval Production

With the interval production or fractile format the participant produces an .xx confidence

interval around his or her best guess about a continuous quantity. As an example, the participants

may be asked to produce an interval within which they are .5 confident to find the population of

Thailand (Figure 2A). The fractiles in the subjective probability distribution define the upper and

lower boundaries for intervals; for example, the .25 and .75 fractiles in the distribution define a .5

confidence interval within which the person is .5 confident to find the population of Thailand. To be

realistic, xx% of the .xx intervals should include the correct values, but in general the intervals are

much too tight, a phenomenon commonly interpreted as overconfidence (Block & Harper, 1991;

Juslin & Persson, 2002; Juslin et al., 1999; Juslin, Winman, & Olsson, 2003; Klayman et al., 1999;

Lichtenstein et al., 1982; Russo & Schoemaker, 1992; Soll & Klayman, 2004; Winman et al.,

2004). The overconfidence bias is robust, but its exact magnitude differs depending on the target

quantity that is assessed. This is true, despite identical methods and procedures (Klayman et al.,

1999). Overconfidence also tends to be lower for more familiar quantities (Block & Harper, 1991;

Pitz, 1974).

Consider the two other ways illustrated in Figure 2A to elicit the same probability distribution.

With the half-range format the participant decides whether a statement is true or false and the

subjective probability that this choice is correct is assessed on a half-range scale from .5 (Guessing)

to 1.0 (Certain). With the full-range format a proposition is presented and the probability that the

statement is true is assessed on a full-range scale from 0 (Certainly false) to 1.0 (Certainly true).

Overconfidence occurs if the participants are too confident in their ability to identify the true state of

affairs. To be perfectly consistent, if you respond "Yes" with .9 confidence in the half-range task in

Figure 2A, in the full-range task you should assess a probability of .9 that the population of Thailand

exceeds 25 million. If you are asked to produce a .8 confidence interval within which the population

of Thailand falls you should provide a lower boundary of 25 million, because you are .9 confident

that the population of Thailand exceeds 25 million (i.e., the 90th and the 10th percentiles are the

upper and lower limits of a .8 interval). The items in Figure 2A are merely different ways to elicit the

same subjective probability distribution and they should produce the same conclusions.

Studies where these formats are applied to the same events emphasize format dependence,

namely that the realism of confidence in a knowledge domain varies profoundly depending on the assessment

format (Juslin & Persson, 2002; Juslin et al., 1999; Juslin et al., 2003; Klayman et al., 1999) (see

Figure 2B). The interpretation of the overconfidence observed with probability judgment has been

the subject of controversy. The traditional interpretation has been that the overconfidence derives

from genuine cognitive processing biases (Kahneman & Tversky, 1996). Other studies suggest that

when you control for biased task selection (Gigerenzer et al., 1991; Juslin, 1994) and other artifacts

often little or no overconfidence is observed with the half-range format (Juslin, Winman, & Olsson,

2000; Klayman et al., 1999). Regardless of the interpretation of overconfidence with probability

judgment it is clear that the overconfidence with interval production is larger, and often of an

enormous magnitude. This holds even if we control for the way items are sampled and regression

effects from random error in the judgment process (Juslin et al., 1999; Soll & Klayman, 2004).

Also in contrast to probability judgment, where experts are better calibrated than novices

(Keren, 1987) and sometimes do assess impressively realistic probability judgments (Lichtenstein et

al., 1982), there are few signs that experts produce more realistic confidence intervals than novices.

Russo and Schoemaker (1992), for example, investigated more than 2000 experts from a large

number of professions using questions tailored to the experts' fields of expertise (e.g., advertising,

management, petroleum industry) and 99% of them were overconfident. As for the known exceptions, where experts are not severely overconfident with interval production (Murphy &

Winkler, 1977; Tomassini, Solomon, Romney, & Krogstad, 1982), the extent to which these effects

are mediated by technical support and extensive experience of expressing similar distributions is

unclear. These factors may allow the judge to rely on pre-computed knowledge rather than

assessments computed on-line. We return to this issue in a later section, where we also review the

data from laboratory experiments demonstrating that overconfidence with interval production

appears relatively immune to substantive expertise as such (Hansson, Juslin, & Winman, 2006).

In sum, there are several robust findings that should be addressed by a model of production of

intuitive confidence intervals: 1) there is a strong overconfidence bias with interval production that is

not accounted for by biased sampling of items or random error in judgment. 2) The magnitude of

bias differs depending on the target variable in ways that cannot be accounted for by methodological

or procedural differences. 3) When the same probability distribution is assessed by other formally

equivalent elicitation methods involving predefined events overconfidence is diminished or

disappears. 4) The overconfidence with interval production is affected little or not at all by expertise.

We conclude that this pattern of results defines one of the more intriguing unresolved issues in

research on confidence. In the next section we illustrate how the metaphor of the naïve intuitive statistician

can enlighten our understanding of the phenomena and offer us ways to reduce overconfidence.

A Naïve Sampling Model (NSM)

In this section we first introduce the assumptions behind the NSM. Thereafter, we explain why

the NSM explains the pattern of results documented in the previous section and we evaluate the

predictions by the NSM against empirical data in the literature. In the final part we review more

recent research that has been directly motivated by the NSM.

Assumptions of the Naïve Sampling Model

Three main assumptions define the NSM:

1. Each judgment elicits retrieval of a small sample of similar observations from

long term memory that become active in short term memory.

The sample of similar observations creates an impression of the likely value for the unknown quantity

and of the variability expected among such quantities. In the General Discussion we return to a

discussion of how literally we need to interpret this sampling metaphor.

2. The sample size is constrained by short term memory capacity.

Assumption 2 implies that the sample size is restricted by architectural limitations of the mind. Similar

capacity limitations (Baddeley, 1998) have proven important to understand many other processes of

controlled thought (e.g., Johnson-Laird, 1983; Just & Carpenter, 1992; Newell & Simon, 1972)

and if overconfidence is a function of sample size, unaided experience (more observations), as such,

may not be sufficient to cure it.

3. People directly use sample properties to estimate population properties.



Assessment of probability in essence involves estimation of a proportion. Sample proportion is an

unbiased estimator of population proportion. For example, if you sample from an urn with a

proportion p of red balls, the long-run average sample proportion P of red balls equals p. If people's estimates of probabilities are based literally on, or are computationally equivalent to, sample proportions they should show little or no overconfidence (Juslin et al., 1997; Soll, 1996).³ In more

concrete terms, to assess the probability that Thailand has more than 25 million inhabitants you may,

for example, retrieve a sample of Asian countries with a known number of inhabitants and assess the

proportion of those that satisfy the event.

By contrast, interval production involves estimation of a dispersion of plausible or possible

values. The idea is that people take the central coverage of the sample distribution as a direct proxy

for the central coverage of the population distribution. If, for example, 90% of the values in the

sample fall in between two limits this is taken as evidence that 90% of the population values fall

within these limits. As detailed below, not only is sample dispersion a biased estimate of the

population dispersion, but a sample is in general dislocated relative to the population distribution,

further decreasing the proportion of population values covered by the central portion of the sample

distribution. For example, a 90% confidence interval for the population of Thailand may be formed

by retrieving a sample of Asian countries with a known number of inhabitants and reporting limits that

include 90% of the values in this sample.

To grasp the difference between estimating the proportion and the coverage of a distribution it

is instructive to consider that when estimating a proportion every observation is equally informative,

but when estimating the coverage of a distribution the observation of rare but extreme values of the

distribution becomes crucial. For example, in the extreme case of repeated sampling of only two

observations from a distribution on a continuous dimension, all samples will under-estimate the

100% coverage of the distribution, except one (i.e., the one actually containing the population

maximum and minimum values). The claim is that people use sample proportion and sample coverage

as if they are both unbiased estimators.
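The contrast is easy to verify with a minimal simulation. The Python sketch below (our illustration; the population, the event, and the sample size are arbitrary choices, not part of the model) draws repeated two-observation samples from a finite population: the long-run average sample proportion matches the population proportion, while the average share of the population actually covered by the sample range falls far short of the 100% it is naively taken to represent.

```python
import random

random.seed(1)
population = list(range(1, 1001))   # a finite population of magnitudes
n, trials = 2, 10_000

prop_sum, cover_sum = 0.0, 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    # Proportion: fraction of sampled values satisfying an event (unbiased);
    # the event here is "value <= 250" (population proportion .25).
    prop_sum += sum(v <= 250 for v in sample) / n
    # Coverage: share of the population inside the sample range, naively
    # reported as a "100% interval" (a downward-biased estimator).
    lo, hi = min(sample), max(sample)
    cover_sum += sum(lo <= v <= hi for v in population) / len(population)

print(f"mean sample proportion:        {prop_sum / trials:.3f}  (true: .250)")
print(f"mean coverage of sample range: {cover_sum / trials:.3f}  (claimed: 1.000)")
```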

Computational Steps of the Naïve Sampling Model

A simple implementation of these three assumptions is illustrated in Figure 3 (a Monte Carlo

simulation is presented in greater detail in Appendix A).

1. Retrieval of cues. One or several cues (or facts) relevant to estimate the target quantity are

retrieved from long term memory. Keeping with the example of estimating the population of Thailand,

you may retrieve that Thailand is situated in Asia. The cue(s), in turn, define a corresponding

objective environmental distribution (OED) of similar observations in the persons natural

environment (Brunswik, 1955). In regard to the cue located in Asia we have an OED defined by

the distribution of population figures of Asian countries.

2. Sampling the target values of similar objects. In long term memory a subset of the target

values in the OED is stored and these observations define the subjective environmental

distribution (SED). The target values of a sample of n observations from the SED are retrieved to

produce a sample distribution (SSD). In the spirit of the naïve intuitive statistician, we assume that

the SSD is a random sample from the SED. The SED may sometimes be a random sample of the

OED, as, for example, if the populations of world countries that you know about are a random

sample of all world country populations. In many circumstances, however, it may deviate

systematically from the OED because of biases in the external input, for example from media

coverage, or the strategies used to collect the information (Fiedler, 2000). In our example the person

may retrieve the populations of n Asian countries, which provides a sample of population figures for

countries similar to Thailand (i.e., in this example the similarity refers to the common property of

being an Asian country).

3. Naïve estimation. The properties of the SSD are directly taken as estimates of the

corresponding properties of the population distributions (i.e., the OEDs):

a. Probability judgment: Given a target event, the subjective probability judgment for the

event is the proportion of observations in the SSD that satisfy the event. If, for example,

the estimate concerns the population of Thailand and the event is having a population

larger than 25 million, the person may retrieve a sample of n known population figures of

Asian countries. If m out of these n observations have a population larger than 25 million

the probability judgment is m/n.

b. Interval production. The coverage of the SSD is used directly to estimate the coverage

of the OED. In our example, to produce a .5 probability interval for the population of

Thailand a person may retrieve a sample of Asian countries and report the 25th and 75th

fractiles within this sample as the estimated interval.

With both assessment formats a sample of observations is retrieved and directly expressed as

required by the format, either as a probability (proportion) for interval evaluation or as fractiles of a

distribution for interval production. There is no bias in the description of the sample, only naivety in

that the sample property is treated as an unbiased estimator of the population property. Although the

assumptions behind the model in Figure 3 appear innocent, in the following we demonstrate that they

predict several non-trivial phenomena.
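Before turning to the data, a minimal Python sketch of Steps 1 to 3 for the running example may be helpful. The country figures, the sample size, and the helper names are illustrative assumptions of ours; only the logic (retrieve a small sample of similar observations, then read a proportion or two fractiles directly off it) follows Figure 3.

```python
import random
import numpy as np

# Hypothetical SED: remembered populations (millions) of some Asian countries.
SED_ASIA = [128, 48, 23, 1280, 1050, 62, 80, 22, 5, 47, 16, 76]
N = 4  # small sample size, constrained by short term memory (Assumption 2)

def retrieve_ssd(sed, n=N):
    """Step 2: retrieve a small random sample (the SSD) from the SED."""
    return random.sample(sed, min(n, len(sed)))

def probability_judgment(ssd, event):
    """Step 3a: the judged probability is the sample proportion m/n."""
    return sum(event(v) for v in ssd) / len(ssd)

def interval_production(ssd, p):
    """Step 3b: report the central-p sample fractiles as the interval."""
    return tuple(np.percentile(ssd, [50 * (1 - p), 50 * (1 + p)]))

random.seed(2)
ssd = retrieve_ssd(SED_ASIA)
print(probability_judgment(ssd, lambda v: v > 25))  # P(population > 25 million)
print(interval_production(ssd, 0.5))                # a naive .5 interval
```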

Application to Previous Data

Why is there Overconfidence with Interval Production?

For infinite sample size the computations in Figure 3 for interval evaluation and interval

production produce identical results: at small sample sizes they do not! Naive use of the SSD to

assess the probability of an event (Step 3a) provides a relatively unbiased estimator of probability

(but see the qualification in Footnote 3); naive use of the SSD to produce intervals (Step 3b)

generates much too narrow intervals for small samples.

To see this, consider the schematic illustration of an OED in Figure 4A. In our example this

would correspond to the OED for populations of Asian countries, although this distribution is not

normal like the one in Figure 4A. The xxth fractile of the OED is the target value such that xx% of

the OED is equal to or lower than this value. The limits of the interval in Figure 4A are the 25th and

the 75th fractiles of the OED, thus defining an interval around the median within which 50% of the

OED falls. Again, in terms of our example, 50% of the Asian countries have a population between

3.5 and 48.8 million (i.e., based on the country populations listed in the United Nations database in

the year 2002).

For the moment, let us make the following simplifying assumptions about the knowledge state

of the person (these assumptions are later relaxed): the SED, the target values from the OED that are

known to the person, comprise a perfectly random and representative sub-sample of the OED. The

person forms his or her uncertain belief about the target quantity by retrieving a SSD of n similar

observations from the SED. Let us consider two ways of eliciting this uncertain belief. With full-

range interval evaluation the person is given interval limits and is asked to assess the probability

that the target quantity falls within the interval. With interval production the person is given a

probability and is asked to produce the limits of a central confidence interval around his or her best

guess for the quantity.

With interval evaluation there is one factor, sampling error, which contributes to the

probability that the quantity falls outside of the interval (to the error rate) and this contributor is

explicit in the sample. To the extent that the pre-defined interval does not include the entire OED,

sampling error implies that some observations in the SSD will fall within, and some outside, the interval when

observations are sampled from the SED. Knowing that the target quantity belongs to an OED

therefore suggests that the target quantity falls in the interval with a certain probability. For example,

knowing that a country is Asian suggests that its population is between 3.5 and 48.8 million with

probability .5. For random sampling with replacement sample proportion P is an unbiased estimate

of population proportion p (the expected value of P is p). As such, relying on the sample proportion

yields accurate judgment.

In regard to interval production with the NSM, there are three contributors to the error rate,

only one of which is explicitly manifested in the SSD: a) sampling error: The observations in the

SSD define a dispersion and, as with interval evaluation, this variability is explicit in the SSD and for

all but the 1.0 interval there will be manifest observations within the SSD that fall outside of the

produced interval. With correctly estimated fractiles, the quantity falls outside the interval with

probability 1-.xx for an .xx interval. In the example in Figure 4A, the proportion of values falling

inside the interval is .5 and the error rate is .5.

b) Error from underestimation of the population dispersion: Sample dispersion is a biased

estimator, under-estimating the population dispersion. If you sample from a population with

dispersion d the average sample dispersion D is lower than d and this is why the variance in a sample

needs to be multiplied by n/(n-1) to become an unbiased estimator of population variance. The

reason why the sample dispersion underestimates the population dispersion is that to appreciate the

variability in the population you need to observe also the extreme but unlikely values and the

probability that these are represented in a small sample is low. For this reason alone, assuming a

normal distribution, random sampling, and sample size 4, the .5 sample central interval from a SSD

will include at most 39% of the OED (i.e., the error rate is .61 rather than .5; see Figure 4B).

Several experiments in Kareev et al. (2002) verify that people fail to correct their estimates of the

population variance for this effect.

c) Error from misjudged location of the population distribution. The above under-

estimation of the dispersion occurs even if the central tendency of the sample coincides with the

central tendency of the population. In addition, at small sample size, the sample is likely to be

randomly dislocated relative to the population distribution by sampling error (i.e., as measured by the

standard error for the sample mean). For example, at sample size 4 the .5 sample coverage interval

includes 39% of the population values only if the sample is not dislocated by sampling error relative

to the population distribution. The Monte Carlo simulations presented below suggest that if we take

this sampling error into account, the sample coverage interval only includes 34% of the population

values (error rate 66%; Figure 4C). The NSM implies that because only the first variability is explicit

in the sample, only the first contributor to the error-rate is taken into account when the intervals are

produced. At small sample sizes this produces extreme overconfidence and format dependence.
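Numbers of this kind are easy to check with a short Monte Carlo under the stated assumptions (standard normal OED, random sampling, n = 4). The exact value depends on how the sample fractiles are interpolated; the sketch below uses numpy's default rule and should land in the vicinity of the figures cited above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 4, 50_000
coverage = 0.0
for _ in range(trials):
    ssd = rng.standard_normal(n)             # a small sample from the OED
    lo, hi = np.percentile(ssd, [25, 75])    # naive central .5 interval
    # True share of the standard normal OED inside the produced interval;
    # this reflects both the shrunken dispersion and the dislocated placement.
    coverage += norm.cdf(hi) - norm.cdf(lo)

print(f"mean OED coverage of naive .5 intervals: {coverage / trials:.2f}")
# Well below the intended .5, near the .34 (error rate 66%) reported in the
# text; the text's own interpolation procedure differs slightly from numpy's.
```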

A parametric statistical model. The NSM assumes the estimation of limits of a population

distribution, the OED, on the basis of n randomly sampled observations from this distribution. In

terms of statistical theory, these intervals correspond to coverage intervals, covering a specified

proportion of a population distribution (e.g., a 90% coverage interval covers 90% of the population

distribution). The probability that another observation from the OED falls within certain limits is

based on this coverage interval for the OED.

In a standard statistical model, assuming normally and independently distributed observations,

the width W of a coverage interval is (Poulsen, Holst, & Christensen, 1997),

W = 2 \sqrt{ \frac{n}{n-1} \, sd^{2} \left( 1 + \frac{1}{n} \right) } \; t_{n,p} ,   (1)

where sd is the standard deviation in the sample and t_{n,p} is the t-statistic at sample size n (df = n-1) covering the central proportion p of the t-distribution (e.g., at n = 4 the .95 interval is defined by t_{4,.95} = 3.18). At infinite sample size n, Eq. 1 converges on 2 sd z_p, where z_p is the z-score delimiting the central proportion p of the normal distribution (e.g., for p = .95, z_{.95} = 1.96; for large n, 95% of the normal distribution falls within ±1.96 sd). In other words, the size of the interval that covers the central .xx proportion of the population distribution (e.g., the OED) is estimated from the standard deviation in the sample, which is corrected for being a biased estimator (the n/(n-1) factor in the first right-hand component of Eq. 1) and for the random sampling error in the interval placement (the second right-hand component of Eq. 1).

Framed in this way, the NSM claims that the subjective coverage intervals w produced by human participants are better approximated by

w = 2 \, sd \, z_p ,   (2)

that is, as if the sample properties directly describe the population properties, regardless of sample size and without the appropriate small-sample corrections. Figure 5A, which illustrates the predicted relative interval sizes (w/W), shows that the NSM predicts consistently too tight intervals, and profoundly so for small sample sizes. Figure 5B illustrates how the predicted hit rates depend on the

sample size, given a SED that is representative of the OED. (In this article, hit-rate will refer to the

proportion of values that fall within the intervals.) The hit-rates are too low, and more so with smaller

sample size. Figure 5C, finally, shows how a bias in the SED affects the hit-rates (the mean of the

SED is assumed to be 0, .5 or 1 standard deviations higher than the OED, where both distributions

have unit standard deviation). In sum: the predicted overconfidence increases with smaller samples

and more SED-bias.
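The relative interval size w/W in Figure 5A follows directly from Eqs. 1 and 2 and can be reproduced in a few lines (scipy supplies the t and normal quantiles; the sample sizes and the .95 level below are arbitrary):

```python
from math import sqrt
from scipy.stats import norm, t

def width_ratio(n, p):
    """w/W: naive width (Eq. 2) over the corrected width (Eq. 1)."""
    z_p = norm.ppf((1 + p) / 2)            # z delimiting central proportion p
    t_np = t.ppf((1 + p) / 2, df=n - 1)    # t-statistic with df = n - 1
    return z_p / (sqrt(n / (n - 1) * (1 + 1 / n)) * t_np)

for n in (2, 4, 8, 16, 100):
    print(n, round(width_ratio(n, 0.95), 2))
# Ratios far below 1 at small n: the naive intervals are much too tight,
# converging on 1 only as the sample grows large.
```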

Although the parametric model in Figure 5 has the advantage of portraying the key-mechanism

of the NSM in terms of a statistical model that is familiar to many readers, it has drawbacks as a

psychological model. First, this parametric model assumes an infinite population. In many inference

problems (and in the ones of concern here) the population is finite and observations are drawn

without replacement. Second, the distribution is normal with infinite upper and lower limits for the

quantity, in contrast to applications where the distributions may vary and there are bounds on the

values that the quantity can take. Third, the assumption that people compute standard deviations,

specifically, may not appear plausible. In the following we therefore explore a model that is non-

parametric in the sense that it does not make assumptions about distributions,⁴ and which is applied

to a real-life distribution relevant to a judgment task in regard to which we have collected extensive

empirical data.

A non-parametric simulation model. A Monte Carlo simulation was applied to estimation of

world country population (Juslin et al., 1999; Juslin et al., 2003; Winman et al., 2004). The

populations of the 188 countries listed by the United Nations in the year 2002 defined the database

and continent served as the cue. Simulations were executed as detailed by Steps 1, 2 and 3 in Figure

3 with the following amendments (see Appendix A for details). Sample size was treated as a random

variable, n + e, where e is a normally distributed error with zero expectation and a standard

deviation of 1 that is rounded to an integer.

It was assumed that, rather than computing standard deviations and relying on the assumption

of a normal distribution, people report fractiles within the SSD. These fractiles are either

observations within the SSD, whenever the fractiles implied by the interval are explicitly represented

in the sample, or values generated by a standard procedure for interpolating fractiles from small

samples, when the requested fractiles fall in between values explicitly represented in the sample (e.g.,

for the 90th fractile at n=4). The procedure relies on plotting the empirical cumulative distribution in

the sample, which in a small sample increases by discrete steps, one for each observation, and on interpolating in between the steps.
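A minimal rendering of this non-parametric step is given below. The noisy sample size n + e and the fractile reporting follow the text; np.percentile stands in for the interpolation procedure (its rule may differ in detail), and the small skewed database is an invented stand-in for the 188-country UN data.

```python
import numpy as np

rng = np.random.default_rng(3)
# Invented, skewed OED of "population figures" in millions.
OED = np.array([0.3, 0.7, 1, 2, 3.5, 5, 6, 8, 13, 16, 21, 23,
                34, 48, 62, 76, 80, 128, 1050, 1280])
p, trials, hits = 0.75, 20_000, 0

for _ in range(trials):
    n = int(max(2, round(4 + rng.standard_normal())))  # sample size n + e
    idx = rng.choice(len(OED), size=n + 1, replace=False)
    target, ssd = OED[idx[0]], OED[idx[1:]]             # target is not in the SSD
    lo, hi = np.percentile(ssd, [50 * (1 - p), 50 * (1 + p)])
    hits += lo <= target <= hi

print(f"hit rate of naive .75 intervals: {hits / trials:.2f}")
# Typically well below .75, reproducing the overconfidence in Figure 6A.
```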

The predictions in Figure 6A, based on the assumption of no SED bias, reproduce the

overconfidence bias reported for interval production. The main difference from the predictions by the

parametric model in Figure 5B is that the hit-rate for the 1.0 interval is lower. This decrease is

explained by the non-parametric version capturing the end effects that arise near the limits of the

distribution, which produce small and asymmetric intervals. For illustration, Figure 6A also reports

typical empirical data produced by experimental participants (from Winman et al., 2004). For

example, with sample size 3 only half of the target values are predicted to be included by the 100%

intervals. When the SEDs are biased relative to the OEDs, as will often be the case, sample sizes larger than 3 also produce overconfidence of the magnitude observed in data. In view of estimates

of short term memory capacity we expect the sample size to fall between approximately 3 and 8 observations

(Broadbent, 1975; Cowan, 2001; Miller, 1956).⁵ The connection implied between overconfidence

and short term memory capacity is empirically validated by data from a study reviewed below

(Hansson et al., 2006).

It has been noted that in empirical data the hit-rate is sometimes quite similar for intervals of

varying subjective probability (Block & Harper, 1991; Yaniv & Foster, 1997). It is therefore

important to consider the alternative hypothesis that the participants pay no attention at all to the

probability and just produce an interval that appears informative, or that they are unable to express

their knowledge as a probability distribution. By contrast, the NSM assumes the use of a distribution

and naturally predicts that both the size of the intervals and the hit-rates (the rates of values falling within the intervals) should increase with subjective probability (Figure 6A).

We therefore reanalyzed the data in Winman et al.'s (2004) Experiment 1 where participants

made .5, .75, and 1.0 confidence intervals for the population of world countries in a blocked within-

subjects design. As illustrated in Figure 6A, in the analysis of the entire data set the hit-rates increase

significantly with subjective probability (see also Juslin et al., 1999). This could be a side effect of the

within-subjects comparison that highlights the different probabilities, thereby triggering the analytic

insight (see Kahneman & Frederick, 2002) that high-confidence intervals should be wider than low-

confidence intervals. However, half of the participants started by making .5 intervals in the first

block, the other half by making 1.0 intervals, so this comparison is strictly between-subjects and a

difference cannot be explained by a within-subjects design. In the first block the group with 1.0 confidence intervals had a substantially higher hit rate than those who made .5 intervals (.64 vs. .32, t(18) = 2.52, p = .01, one-tailed), with 66 percent larger interval sizes (t(18) = 2.52, p = .01, one-tailed).

Why is there a Format Dependence Effect?

What if the NSM is given the intervals produced in the simulation for Figure 6A and is now asked to assess the probability that the quantity falls within the interval? For example, if the initial

simulation for interval production suggested a 75% interval between X and Y for a country, the

simulation for interval evaluation assessed the probability that the population of the same country falls

in between X and Y. Figure 6B illustrates that now the predicted probability converges on the hit-rate

of countries falling within the interval, yielding close to zero overconfidence. (The overconfidence

score in Figure 6B is the probability for the interval minus the hit rate collapsed across the .5, .75,

and 1.0 intervals in Figure 6A.) If the 75% confidence intervals on average yield a hit-rate of .4, re-

entering the same intervals into the NSM for interval evaluation produces an average probability judgment close to .4.
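The cross-validation logic is easy to check in code: produce an interval from one small sample, then let an independent sample supply the probability judgment. In the sketch below (a toy lognormal population stands in for an OED; all numbers are illustrative) the mean judged probability tracks the actual hit rate rather than the nominal .75:

```python
import numpy as np

rng = np.random.default_rng(4)
pop = rng.lognormal(mean=2, sigma=1.3, size=200)   # stand-in for a skewed OED
trials, hits, judged = 20_000, 0, 0.0

for _ in range(trials):
    idx = rng.permutation(len(pop))
    target = pop[idx[0]]
    ssd_prod, ssd_eval = pop[idx[1:5]], pop[idx[5:9]]  # two independent SSDs
    lo, hi = np.percentile(ssd_prod, [12.5, 87.5])     # produce a .75 interval
    hits += lo <= target <= hi
    # Interval evaluation: proportion of a fresh sample inside the interval.
    judged += np.mean((ssd_eval >= lo) & (ssd_eval <= hi))

print(f"hit rate: {hits / trials:.2f}  mean probability judgment: {judged / trials:.2f}")
# The judged probability converges on the hit rate, not on .75, so
# overconfidence nearly vanishes with interval evaluation.
```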

Overconfidence is virtually abolished for the same events because proportion is not a biased

estimator (as sample coverage is) and the error from too small and displaced intervals is explicit in a

new random sample (compare with cross-validation in statistical modeling). Note, however, that the

simulation model predicts slight overconfidence also with interval evaluation, because by necessity

the target quantity (e.g., Thailand) is never in the SSD (a sample of Asian countries other than

Thailand), so strictly speaking the SSD is not a random sample from the OED (all Asian countries)

(see also Footnote 4). Below we review data demonstrating that people exhibit format dependence

not only when assessing independently defined events or intervals produced by others, but also when they assess

their own intervals.

Figure 6B also illustrates typical data on format dependence from Juslin et al. (2003), which

again fall numerically close to the predictions by the NSM. Figure 7 provides a more detailed

comparison with the data from Juslin et al., where the participants assessed the probability that

statements about the world populations are true (i.e., "Thailand has more than 10 million inhabitants.") or produced confidence intervals for the populations of the world countries. The full-

range probability judgment task was implemented in the NSM simulation by a random selection of a

country and a cut-off value (the 10th, 50th, or 90th percentile of the OED for all countries). The NSM

made probability assessments for these propositions according to Steps 1, 2, and 3a, assuming that

the SED is representative of the OED. The predictions by the NSM in Figure 7B illustrate that already by setting n = 4 ± 1 and assuming a SED that coincides with the OED the predictions by the

NSM qualitatively capture the data in Figure 7A nicely, with full-range proportions close to the line

for perfect realism, yet simultaneously extreme overconfidence for the confidence intervals. In sum:

while interval production is plagued both by overconfidence due to a biased estimator and biased

input, interval evaluation (or probability judgment, more generally) is only affected by bias in the

input, and to the extent that this bias is small, overconfidence can approximate zero.

As a different illustration of the effect of replacing coverage as an estimator with proportion

consider the results in Soll and Klayman (2004). After producing confidence intervals, Soll and

Klayman asked the participants for post ratings of the number of intervals that included the true

values. These post ratings were less overconfident than the interval productions. One crucial aspect

that is affected by changing the task from interval production into post rating of number of intervals

including the true value is that the estimator changes from a coverage to a proportion ("for how many of the 50 questions do you think that the correct answer will turn out to be within the interval you gave?", p. 304). The participants can now consider the proportion of intervals that include the

true value and, according to the NSM, this change of estimator variable explains the reduced

overconfidence.

Why is Overconfidence Different with Different Target Variables?

The other source of bias in Figure 1C, biased input samples (SEDs that deviate systematically

from the OEDs), also has an effect on overconfidence. Such deviations, which affect both interval

production and interval evaluation, may arise from, for example, a correlation between the target

magnitude and memory storage probability, as when we know more about larger countries (e.g.,

Goldstein & Gigerenzer, 2002) or when the extremes of a distribution, small and large magnitudes,

are more salient and likely to be encoded in the SED. Most deviations between SEDs and OEDs

contribute to overconfidence and for sample sizes between 4 and 8 and moderate discrepancies

between the SEDs and the OEDs we expect overconfidence of the magnitude observed in data.

Because, on average, the SEDs should be better calibrated to the OEDs for familiar quantities there

should also be less overconfidence for familiar than for unfamiliar quantities (Block & Harper, 1991;

Pitz, 1974).

The overconfidence OU observed in an interval production task can therefore be partitioned

into two separate components:

OU_i = o(n_i) + ou(SED_ik) .   (3)

OU_i is the observed over-/underconfidence bias for individual i and o(n_i) is the overconfidence bias

added by the naive interpretation of sample dispersion, the magnitude of which is determined by the

sample size n_i available to individual i. ou(SED_ik) is the over- or underconfidence added by a bias

between the SED and the OED, the magnitude of which is idiosyncratic to the individual i and the

target variable k. (ou signifies the possibility of both over- and underconfidence bias, in contrast to o

that signifies overconfidence.)

The NSM also highlights the conditions where the overconfidence in probability assessment

should be minimized. When the assessment involves probability judgments for pair comparisons

(e.g., "Which city has the larger population: a) London, or b) Paris?"), the effect of the bias between

the SED and the OED is, in effect, minimized because the judgment now involves a rank order

within distributions (compare with the elimination of bias in psychophysical discrimination by pair-

comparison, see Macmillan & Creelman, 1991). To the extent that the SED preserves the rank

order in the OED, probability assessment with pair-comparisons should be especially conducive to

zero overconfidence, as they appear to be (Juslin & Persson, 2002; Juslin et al., 1999; Juslin et al.,

2000). Moreover, because the factors that determine the overconfidence bias are different, target

variables that produce more overconfidence with interval production need not produce more

overconfidence with assessment for pair comparisons (Klayman et al., 1999). In sum: the deviations

between the SEDs and the OEDs are likely to be highly idiosyncratic, contributing to differences in

the overconfidence observed for different target variables. If the SED-bias is small or the task

involves pair comparisons, probability judgments converge on zero overconfidence.

Why is Overconfidence with Interval Production not Cured by Expertise?

In regard to the NSM it is important to distinguish between procedural expertise referring to

experience with expressing judgments as probability distributions and domain expertise implying

experience with the content domain of judgment. Procedural expertise, especially if judgments are

repeatedly performed in the same domain and with the same format, is likely to promote a shift from

the sort of on-line processing captured by the NSM to retrieval of pre-computed sufficient statistics

that describe the OED (e.g., retrieval of its mean and standard deviation). There are examples of

domain experts that make well calibrated probability judgments (Keren, 1987; Murphy & Brown,

1985), but few examples of domain experts producing well calibrated confidence intervals (Russo &

Schoemaker, 1992).

For the exceptions, like the meteorologist in Murphy and Winkler (1977), domain expertise

appears to be confounded with unknown degrees of technical support and extensive procedural

expertise, suggesting the possibility of relying in part on pre-computed knowledge rather than

intuitive on-line computation of intervals captured by the NSM. Importantly, however, this difference

between the formats is verified also by the laboratory experiments reviewed below where procedural

expertise is controlled for (Hansson et al., 2006).

The explanation by the NSM is straightforward. With probability judgment, over-

/underconfidence biases mainly derive from systematic deviations between SEDs and OEDs. Given

appropriate feedback from the environment, domain expertise should provide ample opportunity to

correct these deviations to attain realistic probability judgments. With interval production expertise

can likewise eliminate the overconfidence that derives from a biased SED, and thus diminish the overconfidence, but, because the sample size is architecturally constrained by short term memory

capacity, the overconfidence that derives from a naïve use of sample dispersion is not cured by more experience.⁶ These implications are consistent with the literature on probability judgment and interval

production by novices and experts (Russo & Schoemaker, 1992). Below we review data (Hansson

et al., 2006) that directly address the role of experience and short term memory for overconfidence

in judgment.

Summary

The NSM provides straightforward accounts of the overconfidence with interval production

and the format dependence effect. With interval production there are two contributors to the

overconfidence corresponding to the two origins of judgment bias in the general scheme for the naïve

intuitive statistician in Figure 1C. The first origin is that people use small-sample coverage as a direct

proxy for a population property. Only the sampling variability that is explicit in the sample is

considered and the error rate added by the underestimated population dispersion and misplaced

interval is ignored. This contributes a bias towards overconfidence regardless of what target variable

is estimated. The second origin is that whenever the input itself is biased the SEDs deviate

systematically from the OEDs. The SED-bias is idiosyncratic to the target variable and affects

probability judgment too.

We have demonstrated that the NSM is consistent with much of the data previously collected

in regard to overconfidence with interval production and format dependence. The true test of a

theory, however, lies in its ability to control the phenomena. In the following, we review research

directly motivated by the NSM illustrating how it allows us to control the overconfidence and the

format dependence effects, including a method to minimize the overconfidence in interval production

tasks. We also provide the first evidence in regard to the role of short term memory capacity for the

overconfidence with interval production.

Data Collection Motivated by the Naïve Sampling Model

Can we Control the Magnitude of Format Dependence?

Studies of format dependence have most often compared formats that involve the assessment

of one fractile of the subjective probability distribution ("What is the probability that the population of Thailand exceeds 25 million?") to production of intervals that involve two fractiles of the subjective

probability distribution (Juslin et al., 1999). Winman et al. (2004) therefore compared two formats

that both consist of an interval and two fractiles, viz. interval production and interval evaluation as

discussed above. To precisely equate the events, participants first produced intervals that defined

events, and later the probability of these events (intervals) was assessed. For example, a participant

may assess a .5 interval of 10 to 30 million inhabitants for the population of Thailand. On a later

occasion the same or another participant assesses the probability that the population of Thailand falls

between 10 and 30 million inhabitants. The NSM predicts that the overconfidence should be

substantially reduced when participants make probability assessments for the intervals.

In regard to format dependence an important aspect is whether the event in the probability

judgment is already correlated with the SSD used to assess it. The event is typically uncorrelated if it

is defined a priori, as when the assessment concerns unknown future events (Keren, 1987; Murphy

& Brown, 1985) or general knowledge questions that are randomly selected from natural

environments (Gigerenzer et al., 1991; Juslin, 1994; Juslin et al., 2000). When the event is

uncorrelated with the SSD used to assess the event and the SED approximates the OED, especially

when the task involves pair-comparisons, the NSM suggests a potential for fairly well calibrated

probability judgments (Figure 6B).

Another possibility is that the event is correlated with the SSD used to make the probability

assessment. One reason for such a correlation is SSD-overlap. If different people retrieve the same

or almost the same SSD because of a small and biased environmental input (an SED bias) a

correlation arises when the event is defined by another person's best guess. If you, for example,

believe that Thailand has a large population because all Asian countries you know have large

populations and I know the same Asian countries as you do, I will share your belief. Similar

correlations obtain when a person who selects general knowledge items for a confidence study can

second-guess the modal answer from data or their own intuitions, and over-represent surprising

items (Gigerenzer et al., 1991; Juslin, 1994; Juslin et al., 2000).

One extreme case is accordingly that of no correlation, which predicts the format dependence in

Figure 6B with close to zero overconfidence for probability judgment and large overconfidence with

interval production. The other extreme is that of perfect correlation, where exactly the same SSD is used to both produce and probability-assess an interval, which yields no format dependence⁷ and large

overconfidence with both formats. In between these extreme cases, the predicted format

dependence is of an intermediate magnitude.

In the between-subjects design in Winman et al. (2004) participants evaluated the probability

that the target quantity falls within the intervals produced by another participant. There was

significantly and substantially more overconfidence in the interval production condition than in the

interval evaluation condition (.34 vs. .15), although the events were matched on an item-by-item

basis. As predicted by the NSM, however, interval evaluation of an event that represents the best

guess by a peer yields more overconfidence than observed when the events are predefined and

independent of the judge's own knowledge state (see, e.g., Figures 6B & 7). The explanation is that

the SSDs retrieved by two peers are likely to overlap due to their experience with a similar small and

biased input from the environment.

The most remarkable demonstration is that participants are susceptible to format dependence

also for the intervals they produce themselves. In the within-subjects design, the participants first

produced confidence intervals for 40 world country populations. One week later they assessed the

probability that the populations fall within the intervals produced on the first occasion. Finally, they

again produced intervals for the same 40 country populations. The intervals produced on the first

occasion revealed extreme overconfidence (.34), which was significantly reduced to .26 when they,

one week later, made interval evaluations of those intervals. When they again produced intervals for

the same 40 country populations, they returned to their initial overconfidence (.34; see Winman et

al., 2004).

Format dependence is thus robust enough to obtain also when participants assess their own

intervals. In sum: the format dependence effect is profound when the event is uncorrelated with the

assessor's knowledge state (Figure 6B). As also predicted from consideration of the SSD-overlap in

the production and evaluation of an interval, the format dependence is smaller when a person

evaluates another person's best guess, and especially when evaluating his or her own best guess,

where the SSD-overlap is likely to be large.

Can we Cure (or Reduce) Overconfidence with Interval Production?

In Winman et al. (2004), Experiment 2, the NSM was used to design a method to produce

intuitive confidence intervals, but in a way that minimizes the overconfidence with interval production.

The ADaptive INterval Adjustment (ADINA) procedure proposes a sequence of intervals, each of

which changes in size in response to the probability assessment for the previous interval, with the

effect that the intervals home in on a target probability. For example, assume that the target interval

probability is .9 and the target value is the population of Thailand. The ADINA proposes an interval

to the participant, for example, between 20 and 40 million. If the participant assesses a probability

below .9, the next interval proposed by ADINA is wider. If the assessed probability is above .9,

ADINA proposes a smaller interval. This procedure continues until the assessed probability is .9.
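In outline, ADINA is a simple feedback loop. The sketch below is a minimal illustration of that loop; the multiplicative step rule, the stopping tolerance, and the toy normal belief distribution standing in for the participant are our assumptions, not the exact implementation of Winman et al. (2004).

```python
from math import erf, sqrt

def adina(center, assess, target_p=0.9, width=20.0, tol=0.02, max_steps=50):
    """Propose intervals around `center`, widening or shrinking them until the
    judged probability assess(lo, hi) is close to target_p."""
    lo = hi = center
    for _ in range(max_steps):
        lo, hi = center - width / 2, center + width / 2
        p = assess(lo, hi)                      # the probability judgment
        if abs(p - target_p) <= tol:            # close enough: stop adjusting
            break
        width *= 1.5 if p < target_p else 0.75  # widen if judged too improbable
    return lo, hi

# Toy "participant" whose judgments derive from a normal belief distribution
# (a stand-in for sampling from the SED); values in millions of inhabitants.
cdf = lambda x: 0.5 * (1.0 + erf((x - 30.0) / (15.0 * sqrt(2.0))))
judge = lambda lo, hi: cdf(hi) - cdf(lo)
print(adina(center=30.0, assess=judge))         # homes in on a ~.9 interval
```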

As with usual interval production the end-product is an interval, but the procedure requires

estimates of proportion rather than coverage. Experiment 2 had three conditions: a) a control

condition with ordinary interval production; b) an ADINA(O)-condition where ADINA proposes

intervals centered on the participant's Own point estimate. Because the interval (event) that is

assessed is itself presumably affected by the same sampling error as the SSD used for the probability

evaluation this is likely to involve high SSD-overlap. c) An ADINA(R)-condition where ADINA

locates intervals centered on a Random population value. Because the interval is randomly placed,

this implies a decoupling of the SSD-overlap8.

In the ADINA(O)-condition, the NSM predicts less overconfidence than in the control

condition because proportion rather than coverage is the estimator, but not zero overconfidence.

Roughly speaking, ADINA(O) controls for the underestimated dispersion of the population

distribution in Figure 4B, but not for the misplaced interval in Figure 4C, because the event and the

SSD are correlated. In the ADINA(R)-condition we expected a larger decrease in overconfidence

because the SSD-overlap is now removed. In effect, the ADINA(R)-condition transforms the task

into a standard full-range probability judgment task and the NSM therefore predicts close to zero

overconfidence (see Figure 6B).

Figure 8 shows the mean subjective probability and hit rate with 95 percent (real) confidence

intervals in the three conditions. Figure 8A confirms that overconfidence for the initial interval follows

the predicted order with extreme overconfidence for the control condition and close to zero

overconfidence for the ADINA(R)-condition. The hit-rate is similar in all three conditions, but

whereas the subjective probability assigned to the interval is too high in the control condition, it is

quite accurate with ADINA(R). However, these initial intervals are not yet intervals associated with

the pre-specified target probability.

Figure 8B, which presents the mean subjective probability and hit rate for the final intervals, confirms the

predicted order with extreme overconfidence for the control condition and close to zero

overconfidence for the ADINA(R)-condition. In this case the subjective probability is almost the

same9 (at the levels defined by the desired probabilities), but overconfidence is diminished by a



manipulation that, in effect, increases the too-low hit rates in the control condition so that they

approximate the stated probabilities. In sum: we can reduce overconfidence substantially both by

decreasing the subjective probability attached to the intervals and by increasing the hit rates

associated with the intervals. Both of the ADINA procedures produce confidence intervals with

significantly less overconfidence.

All of the studies reviewed thus far involve naturalistic general knowledge content, in regard to

which the participants have acquired the knowledge expressed in the estimates from their pre-

laboratory experience. This experimental paradigm obviously limits our ability to control the learning

history and the processes used by the participants. In the final sections, we therefore turn to data

collected in more controlled laboratory training experiments.

Do the Intervals Derive from Experience with Distributions?

The NSM implies that people possess subjective counterparts (the SEDs) of the ecological

distributions (the OEDs), and that the intuitive confidence intervals derive from retrieval operations

(sampling) on these representations. The aim of an experiment presented in detail in Appendix B was

to manipulate the OEDs in order to establish that a) people have the cognitive capacities to generate

SEDs that represent the OEDs, and b) that, when encountering an unknown object, the uncertainty

expressed by an interval is determined by the SED.

The participants were exposed to an OED that was either U-shaped or inversely U-shaped

(range = 1-1000). The U-shaped OED condition had a high frequency of values close to 1 and to

1000 and the inversely U-shaped OED condition consisted predominantly of target values close to

The task involved estimation of the revenue of fictitious business companies (see Appendix B). To

directly elicit the SEDs the participants also estimated the relative frequency of observations in ten

intervals evenly distributed over the OED range.



In the learning phase the participants guessed the revenue of 156 companies with outcome

feedback about the correct target value. In a test phase, they produced confidence intervals for 60

companies they had previously seen in training and for 60 new companies. As predicted if people are

able to represent the OEDs, the assessed SEDs were bimodal in the U-shaped condition and

unimodal in the inversely U-shaped condition. The standard deviation of the SED was 497 in the U-

shaped condition (OED = 574.3) vs. 177.6 (OED = 183.6) in the inversely U-shaped condition

(F(1, 28) = 410.9, p < .001; in both conditions the 95% real statistical confidence intervals for the

SEDs include the OED). As implied if the intuitive confidence intervals derive from the SEDs, the

interval size for new objects was larger (750) in the U-shaped condition than in the inversely U-

shaped condition (550) (F(1, 28) = 12.0, p = .002). There was also a strong positive correlation

between an individual participant's SED standard deviation and his or her average interval size (r(30)

= .54, p < .001).

Thus, when encountering unknown objects the participants appeared to rely on previously

experienced distributions in expressing intuitive confidence intervals. The study demonstrated that

people are sensitive to the OEDs with SEDs that largely mirror the OEDs, interval sizes are affected

by the OED manipulation in the way predicted by the NSM, and a direct measure of the SED verifies

that the SED is strongly linked to interval size.

Sample Size: Constrained by Experience or Short Term Memory?

In Hansson et al. (2006) we investigated the role of short term memory capacity (referred to

as n by STM), the total number of observations stored in long term memory (n by LTM), and the

bias between the SED and the OED in a laboratory learning task of the sort described in Appendix

B. Nominally the task involved estimates of the revenue of companies, but unbeknownst to the

participants the distributions were from a real environment (the world country populations), which

they had to learn from scratch in the laboratory.

A bold assumption of the NSM is that the sample size that can be applied to an estimation

task is constrained by the judge's short term memory capacity. The NSM therefore predicts that

overconfidence in interval production tasks should correlate negatively and strongly with n by STM

and, although experience might diminish it, overconfidence should be extreme also after extensive

experience (i.e., experience can eliminate ou(SED) but not o(n) in Eq. 3). In a probability judgment

task overconfidence should be rapidly reduced by experience because experience can eliminate

ou(SED), the main source of overconfidence, and overconfidence should not be as strongly

correlated with n by STM.

An alternative hypothesis is that the sample size instead is constrained by long term memory (n

by LTM). This is plausible to the extent that the representations derive from crystallized knowledge

that is accumulated and updated gradually from experience. In that case, extensive experience should

reduce overconfidence with both assessment formats, and overconfidence should be more strongly

correlated with n by LTM than n by STM.

Experiments 1 and 2 from Hansson et al. (2006) varied the extent of training in conditions with

immediate, complete, and accurate outcome feedback from very modest (68 trials), through intermediate

(272 trials), to extensive (544 trials). After 68 training trials with the same training stimuli,

overconfidence was .28 with interval production, but .07 with interval evaluation. The predictions by

the NSM assuming sample size 4 and no SED-bias are .28 and .02, respectively. The format

dependence effect was thus replicated, and the overconfidence with probability judgment approached

0 already after 68 training trials. Figures 9A and B summarize the results with interval production as a

function of training, after 68, 272, and 544 trials. As illustrated in Figure 9A, the proportions of

correctly recalled target values (and thus n by LTM) increased significantly with training, from less

than 10% of all 136 target values in each training block after 68 training trials, to more than 30%

after 544 training trials.

The marginal and statistically non-significant decrease in overconfidence bias in Figure 9B is

consistent with the NSM. The overconfidence contributed by SED-bias early in training is corrected

by further training as the SED converges on the OED. More crucially, as predicted there is persistent

extreme overconfidence with interval production even after 544 trials with feedback, a bias more than

four times larger than with interval evaluation after only 68 trials. This is remarkable because

conditions with immediate, unambiguous, and representative feedback have traditionally been

presumed to be optimal for attaining well calibrated judgments (Keren, 1991; Lichtenstein et al.,

1982). Consistent with the research on experts (Russo & Schoemaker, 1992) and as predicted by

the NSM, however, experience appears to have a minimal effect on the overconfidence with interval

production.

In addition to n by LTM, as estimated by the proportion of correctly recalled target values, in

Experiments 1, 2 and 3 in Hansson et al. (2006) participants were measured with a standard digit-

span test to estimate n by STM. Individual SED bias was estimated by the absolute deviation

between the mean OED and the mean correctly recalled target value. These data were entered into a

multiple linear regression model to predict overconfidence. As illustrated in Table 1, after 68 training

trials no independent variable was significantly related to overconfidence, but after 272 training trials

overconfidence was significantly related to n by STM (β = -.72) and SED-bias (β = .39). After 544

training trials there was a strong β-weight for n by STM (-.67), but now the relation with SED-bias

had disappeared (β = -.26). In none of the experiments is n by LTM significantly related to

overconfidence, and nominally the β-weight is often of the wrong sign (larger n by LTM suggests

more overconfidence).
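For concreteness, the sketch below shows the kind of standardized multiple regression that yields such β-weights. The data-generating model is purely hypothetical and chosen only to make the code self-contained; the real individual data are reported in Hansson et al. (2006).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 40                                       # hypothetical participants
stm = rng.normal(7.0, 1.5, m)                # digit-span scores (n by STM)
ltm = rng.uniform(0.05, 0.35, m)             # proportion recalled (n by LTM)
sed_bias = np.abs(rng.normal(0.0, 1.0, m))   # |mean SED - mean OED|
# Hypothetical generating model: overconfidence falls with STM capacity and
# rises with SED-bias (plus noise); LTM plays no role.
overconf = 0.45 - 0.03 * stm + 0.05 * sed_bias + rng.normal(0.0, 0.05, m)

z = lambda v: (v - v.mean()) / v.std()       # z-scoring yields beta-weights
X = np.column_stack([z(stm), z(ltm), z(sed_bias)])
betas, *_ = np.linalg.lstsq(X, z(overconf), rcond=None)
for name, b in zip(("n by STM", "n by LTM", "SED-bias"), betas):
    print(f"beta({name}) = {b:+.2f}")
```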

Our interpretation was that after 68 training trials the effects of short term memory capacity

and SED-bias were overshadowed by large idiosyncratic differences in the variability in the small

samples of target values encoded in long term memory. After 272 training trials the participants had

formed stable, although still somewhat idiosyncratic, SEDs, and now the correlations with short term

memory and SED-bias appear. After 544 trials the SEDs have converged on the OEDs, rendering

SED-bias a less useful predictor, and the model is now dominated by the correlation with short term

memory capacity. By contrast, the overconfidence with interval production was not related to the total

sample size in the participants' long term memory (n by LTM), neither when it was experimentally

manipulated (see Figure 9B), nor in the analysis of the individual differences in Table 1.

In Experiment 3, a larger participant sample was measured in fluid intelligence (RAPM; Raven,

1965) and episodic memory (a version of Wechsler's paired-associates). As illustrated in Table 1, in

the interval production condition n by STM significantly predicts overconfidence, but not in the

interval evaluation condition. The more specific purposes of Experiment 3 were to examine whether the

correlation between short term memory capacity and overconfidence obtains also when we control

for individual differences in intelligence and episodic memory capacity. We also wanted to test the

interaction predicted by the NSM: there is more overconfidence with interval production, but this

difference should be especially pronounced for people with low short term memory capacity. To test

this interaction, in each condition the participants were divided into low or high short term memory

capacity based on a median split.

The interaction in Figure 9C confirms the prediction by the NSM. In the interval evaluation

condition (probability judgment) there is no significant difference between the participants with low

and high short term memory capacity, but in the interval production condition there is significantly

more overconfidence bias for participants with low short term memory capacity. Entering RAPM

and episodic memory as covariates in the analysis has no effect on the significant interaction in Figure

9C. In sum: as predicted by the NSM, n by STM but not n by LTM predicts the overconfidence

observed with interval production, and this correlation is higher with interval production than with

probability judgment.

Interval Size and Prediction Error

Yaniv and Foster (1995; 1997) proposed that confidence intervals may not be expressions of

subjective probability distributions but formed by a trade off between two countervailing objectives:

to provide intervals that are as accurate as possible (which pulls towards wide intervals) and as

informative as possible (which pulls towards tight intervals). A realistic high probability confidence

interval is often too wide to be of any practical use and therefore the judges may implicitly, or by

habit, trade a lower accuracy, contributing to overconfidence, for the attainment of a tighter and

more informative interval.

Yaniv and Foster (1997) compared three ways to produce intervals: a) self-selected grain size

of one's estimate of an unknown quantity (e.g., whether one prefers to indicate the birth year of

Mozart by an interval in terms of centuries, decades, or years); b) 95 percent confidence intervals; c)

expected plus-or-minus error. They found that with all three methods the mean absolute error

between the midpoint of the interval and the correct target value was a constant fraction of the

interval size. Yaniv and Foster therefore concluded that "One might be better off interpreting

intuitive interval judgments as predictors of absolute error rather than ranges that include the truth

with some high probability" (p. 31).

This conclusion conflicts with the NSM, which presumes that confidence intervals express

knowledge of an underlying distribution (albeit accessed by a small sample). The finding that high-

confidence assessments are associated with wider intervals and higher hit rates (Figure 6A) already

demonstrates that the intervals express more than a plausible margin of error, or an accuracy-

informativeness trade off. Yet, it is worth asking how the NSM can address the constant normalized

error observed by Yaniv and Foster (1997). The three rightmost columns of Figure 10, which present

the data from Hansson et al. (2006) after 68, 272, or 544 training trials, with absolute error

plotted against interval size and best-fitting regression lines, illustrate this pattern. There is a positive

slope between absolute error and interval size, but there are also two separate clusters of absolute

errors at the low interval sizes.

Critically, as in the data reported by Yaniv and Foster (1997) the slope is virtually constant

regardless of the probability of the interval. By contrast, with real confidence intervals the slopes

naturally decrease with higher confidence levels and overconfidence is zero. The predictions by the

NSM (n = 4, no SED bias) in the leftmost column of Figure 10 reproduce both the exact bivariate

distribution and the relationship between absolute error and interval size. The explanation is that

unlike parametric coverage intervals (Eq. 1) or classical confidence intervals for a parameter (e.g.,

the population mean), the non-parametric NSM does not compute means and standard deviations

within the SSD, nor does it rely on distributional assumptions. The interval size is determined solely

by the upper and lower limits corresponding to two sample fractiles and the midpoint of the interval is

not the sample mean, but the midpoint between these two limits. The variability of the boundary

values directly affects the precision of the midpoint of the interval, in terms of absolute error

regardless of whether the interval covers 50%, 75% or 100% of the SSD.
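This explanation is easy to check numerically. The sketch below, a toy version under assumptions of our own (a lognormal proxy environment and n = 4), regresses absolute midpoint error on interval size for 50%, 75%, and 100% fractile intervals; the slopes should come out roughly similar across probability levels.

```python
import numpy as np

rng = np.random.default_rng(2)
env = rng.lognormal(2.0, 1.5, 10_000)        # proxy environment (assumption)
n, trials = 4, 20_000

for p in (0.50, 0.75, 1.00):
    q = [(1 - p) / 2 * 100, (1 + p) / 2 * 100]     # fractiles for coverage p
    errors, sizes = [], []
    for _ in range(trials):
        truth = rng.choice(env)
        ssd = rng.choice(env, size=n)
        lo, hi = np.percentile(ssd, q)
        errors.append(abs((lo + hi) / 2 - truth))  # midpoint of the two limits
        sizes.append(hi - lo)
    slope = np.polyfit(sizes, errors, 1)[0]        # |error| regressed on size
    print(f"p = {p:.2f}: slope of |error| on interval size = {slope:.2f}")
# A displaced boundary shifts the midpoint by about half the added width, so
# the slope is of similar magnitude at all three probability levels.
```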

The similarity between the predicted and observed patterns in Figure 10 therefore suggests

that the non-parametric version of the NSM better captures the cognitive processes than the

parametric version. Finally, both the interval size and the overconfidence bias increase with the

probability of the interval. In sum: Figure 10 demonstrates that the constant fraction between

absolute error and interval size observed empirically is consistent with the NSM, even though the

intervals express knowledge of probability distributions.

General Discussion

Jean Piaget (1896-1980) is the psychologist most renowned for the notion of egocentrism,

conceived of as the young child's inability to transcend his or her own local perspective of the

situation (Piaget & Inhelder, 1956). The arguments in this and several other publications (e.g.,

Fiedler, 2000; Fiedler & Juslin, 2006) propose that one important cause of judgment error lies in the

adult's continued struggle to transcend the locally available samples of information. In this article we

thus present two interrelated arguments: First, that the perspective of the naïve intuitive statistician

(Fiedler & Juslin, 2006b) complements previous approaches to intuitive judgment in useful ways.

Second, to illustrate the approach, we have provided its in-depth application to two perplexing

phenomena in research on confidence: the extreme overconfidence with interval production and the

format dependence effect.

The Naïve Intuitive Statistician

As suggested by "the intuitive statistician" (Peterson & Beach, 1967), the naïve intuitive

statistician entails that, by and large, people accurately describe the samples of information they encounter.

On this view, heuristics that substitute probability with intensional properties like similarity or

availability (Kahneman & Frederick, 2002) are often not the appropriate locus of explanation for

cognitive biases. Rather, they arise from sample properties that are inappropriate estimators, either

because of a biased input (Einhorn & Hogarth, 1978; Fiedler, 2000), or because the property is a

biased estimator (Kareev et al., 2002; Winman et al., 2004). This is not to deny that judgments are

affected by similarity and fluency, but only to propose that, more often than is perhaps appreciated, the

most important cause of the bias lies not in the processing of the sample. The explanatory burden is

thereby relocated from heuristic processing of the sample to the relationship between sample and

environment.

This program highlights at least three lines of research: first, to address judgment phenomena in

terms of accurate but naïve description of experience (see Fiedler, 2000; Fiedler & Juslin, 2006a).

As noted initially, people, for example, have biased conceptions of the frequencies of various causes of death.

The traditional account of this phenomenon is that they rely on the availability heuristic (Lichtenstein

et al., 1978). Detailed investigations (Hertwig, Pachur, & Kurzenhäuser, 2005), however, suggest

that it is not the ease of retrieval, an intensional property substituting for proportion, that explains

the bias, but accurate assessment of biased samples ("availability-by-recall," in the terms of Hertwig

et al., 2005). The naivety implied by the naïve intuitive statistician is, moreover, an equally compelling

account of the belief in the "law of small numbers" (the tendency to expect small samples to be

representative of their populations) as the explanation in terms of the representativeness heuristic

(Tversky & Kahneman, 1971). In many situations it may be unclear what the notion of

representativeness adds over and above the assumption that people take small sample properties as

direct proxies for population properties. In many of these instantiations, but certainly not all, the locus

of explanation seems better captured by accurate but naïve description of experience.

Although not cast in terms of the metaphor of the naïve intuitive statistician, several lines of

recent research illustrate its fertility by incorporating aspects of it. Denrell (2005), for example,

showed that biases in impression formation traditionally explained by cognitive and motivational

factors may arise from accurate description of samples that are biased as a side-effect of sampling

strategies. Because you are inclined to terminate the contact (the sampling) with people of whom you

get an initial negative impression, you are unlikely to correct an initially false negative impression. By

contrast, a false positive impression encourages additional sampling that serves to correct the false

impression. The net effect is a bias, even if the samples are correctly described. Hertwig et al. (2004)

demonstrated that if people accurately express the proportions in small samples they will under-

weight low-probability events when they make decisions from trial-by-trial experience. A recent

theory also emphasizing the sampling metaphor is decision-by-sampling theory (Stewart, Chater, &

Brown, 2006), which shows how ordinal comparisons and frequency counts based on retrieval of small

samples explain the value function and probability weighting assumed by prospect theory

(Kahneman & Tversky, 1979), such as the concave utility function and losses looming larger than

gains. A crucial part of the explanation is the relationship between small proximal samples and the

distributions of values and probabilities in real environments.

These recent approaches illustrate the fertility of more carefully analyzing the relationships

between the available proximal samples and distal environmental properties (Fiedler & Juslin,

2006a). A perhaps more far-reaching question concerns the extent to which differences of opinion in

societal and political controversies may sometimes derive from selective and different samples of

experience, together with an inability to appreciate this fact, rather than from differences in value

orientation or biased information processing.

Second, surprisingly little is known about how people represent knowledge of distributions. Is

this knowledge shaped by a priori assumptions about distributional shape (akin to a parametric

statistician) or driven by bottom-up experience? Analogously to the debate in categorization

learning as to whether people develop abstract representations, like prototypes and rules, or rely on

exemplars (Nosofsky & Johansen, 2000), one can ask: do people spontaneously compute summary

representations of distributions, perhaps of central tendency and dispersion, or are observations

stored in their raw form? What is the metric by which dispersion and skew are represented?

Although research on frequency learning (Sedlmeier & Betsch, 2002) and judgment (Gilovich et al.,

2002) provide useful hints, the cognitive machinery of the intuitive statistician needs to be better

understood.

Third, we need to determine the scope of the naivety or "meta-cognitive myopia" (Fiedler,

2000). On the one hand, large amounts of research suggest considerable naivety in a variety of

situations (e.g., Denrell, 2005; Einhorn & Hogarth, 1978; Fiedler, 2000; Fiedler & Juslin, 2006a;

Hertwig et al., 2004; Kareev et al., 2002). On the other hand, it also seems evident that we are not

completely encapsulated in our personal samples. Working as a professor does not produce the

conviction that most people are PhDs, even though this might be true of the sample of people that is

encountered in a working day. The empirical research that exists on people's ability to learn from

selective feedback suggests that often they are less affected than one might initially suspect (Elwin et

al., in press; Grosskopf, Erev, & Yechiam, 2001; Yechiam & Busemeyer, 2006, but see Fischer &

Budescu, 2005). What are the limits of our naivety, and by what cognitive processes do we

transcend our personal samples?

In sum: in order to counteract biased judgments we need to understand a) the relationship

between the proximal samples and the distal environmental properties; b) the way in which

knowledge of distributions is represented; and c) the limits of, and our ability to transcend, our

naivety. Although the naïve intuitive statistician holds some promise to enlighten our understanding of

judgment phenomena, it should, of course, also be emphasized that there are phenomena that are still

beyond the scope of the approach and which might be better addressed by the variable-substitution

approach of the heuristics and biases program, such as, for example, the conjunction fallacy (Tversky

& Kahneman, 1983).

The Naïve Sampling Model of Intuitive Confidence Intervals



The metaphor of the naïve intuitive statistician was applied to the extreme overconfidence with

interval production and the format dependence effect. The NSM thus assumes that people accurately

but naively describe small samples to make a judgment, and the overconfidence is determined by the

two factors that are emphasized by the naïve intuitive statistician (Figure 1C): the extent to which the

samples are representative of the environment and the extent to which the sample property described

is an unbiased estimator.

The NSM assumes that people take sample proportion as an estimator of probability.

Because proportion is an unbiased estimator, there is potential for good calibration, provided that the

SEDs approximate the OEDs, or that the task involves pair comparisons that minimize the impact of

SED-biases. With interval production people take sample coverage as an estimate of the probable

range of the unknown target quantity. Because, at small sample sizes, sample coverage is a biased

estimator, there is extreme overconfidence. In line with the constructive and malleable nature of

judgment in other domains (e.g., Kahneman & Tversky, 1979; Slovic, Griffin, & Tversky, 2002), the

implication is that people have the potential to assess both well calibrated and extremely overconfident

probability distributions for the same events.

We first verified that even the crude version of the NSM in Figure 3 is able to reproduce

most of the characteristic properties of previous data. In research motivated by the NSM (Winman

et al., 2004) it was demonstrated that the degree of sample overlap predicts the size of the format

dependence. We also devised a method whose end result is confidence intervals for

unknown quantities, as with traditional interval production, but where the participants rely on

proportion rather than coverage as the estimator variable, thereby minimizing the overconfidence in

the produced intervals. This result is notable not only because the overconfidence is extreme and

therefore of considerable applied concern, but also because cognitive biases are notoriously resistant

to debiasing (Fischhoff, 1982).

In learning experiments we corroborated that confidence intervals derive from distributions of

observed similar target values and that the overconfidence with interval production is minimally

affected by the sample size in long term memory, but strongly correlated with short term memory

capacity (Hansson et al., 2006). As predicted, in the same circumstances the overconfidence with

probability judgment is rapidly reduced already by modest training and the correlation with short

term memory capacity is low.

The NSM outlined in Figure 3 is an obvious simplification, ignoring many of the potential

contributors to overconfidence, such as the regression effects that may arise from stochastic

components other than sampling error (Erev et al., 1994), and the effects of attention and

memory captured by support theory (Tversky & Koehler, 1994) and of biased selection of task

items (Gigerenzer et al., 1991; Juslin, 1994). Although the simulation model of the NSM predicts

minor amounts of overconfidence (see Figure 6B & Footnote 4), some of the overconfidence

observed for probability judgment (e.g., with ADINA(R) in Figure 8) may derive from these and

other unknown origins of overconfidence. Despite its simplicity we conclude that it is plausible that

the NSM captures one important aspect of the processes that underlie overconfidence with interval

production and the format dependence effect.

Limitations of the NSM

One unsettling aspect is that there is no consensus on how people represent the variability in a

sample, except that it appears to be normalized against the sample mean and therefore is better

predicted by the coefficient of variation than by the standard deviation (Weber, Shafir, & Blais, 2004).

That sample coverage is biased, however, is firmly rooted in the relationship between sample and

population and surfaces robustly regardless of whether the NSM is framed as a parametric model

implying the computation of a sample standard deviation or as a simulation model that presumes the

direct use of sample fractiles.

To highlight its statistical logic the NSM was formulated in terms of frequency, something that

may suggest two limitations of the model: First, that it ignores that probability is often affected by

similarity (Tversky & Kahneman, 1983); second, that for many judgment tasks the applicability of a

sampling metaphor may be less obvious than for the country population tasks used here and, indeed,

often there may be no obvious reference classes from which to sample (Kahneman & Tversky,

1996). For example, it is not obvious what class of similar observations to consider when you

estimate the birth date of (the Swedish author and playwright) August Strindberg. We propose that

these concerns can be mitigated by implementing the NSM as an exemplar model where the

probability judgment is driven both by similarity and frequency (see, e.g., Dougherty, 2001;

Dougherty, Gettys, & Ogden, 1999; Juslin & Persson, 2002; Nilsson, Olsson, & Juslin, 2005; Sieck

& Yates, 2001).

The algorithm in Figure 3 is actually the special case of an exemplar model where similarity

takes only one of two discrete values: either an exemplar (e.g., a country) satisfies the cue and is

entered into the computations or it does not and is not processed. It would be straightforward to

relax these assumptions by allowing exemplars to enter the SSD as a continuous function of the

feature overlap (similarity) between probe and exemplars (see Juslin & Persson, 2002, for such a

relaxation in regard to probability judgment). An exemplar model does not require any pre-specified

reference class; only that similar exemplars are sampled into the SSD as a continuous function of

their similarity to the probe. If people naively report the properties of small proximal samples, the

inherent difference between proportion and coverage as statistical estimators nonetheless remains

relevant. Indeed, the statistical logic captured by the NSM is not premised on how the variability that

underlies the uncertainty is generated, and it holds also when the alternative possibilities are generated

by, for example, imagination or mental simulation (Kahneman & Tversky, 1982).
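A minimal sketch of such an exemplar implementation is given below. The feature coding, the exponential similarity function, and n = 4 are illustrative assumptions, not the specification of any of the cited models (e.g., PROBEX; Juslin & Persson, 2002).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stored exemplars: a cue (feature) vector and a target value.
features = rng.normal(0.0, 1.0, size=(200, 5))  # e.g., encoded country cues
values = rng.lognormal(2.0, 1.5, size=200)      # e.g., their populations

def retrieve_ssd(probe, n=4):
    """Sample n exemplar values, weighted by their similarity to the probe,
    so that no pre-specified reference class is required."""
    dist = np.linalg.norm(features - probe, axis=1)
    sim = np.exp(-dist)                          # exponential similarity decay
    idx = rng.choice(len(values), size=n, replace=False, p=sim / sim.sum())
    return values[idx]

probe = rng.normal(0.0, 1.0, 5)                  # cues of an unknown object
lo, hi = np.percentile(retrieve_ssd(probe), [10, 90])
print(f"interval from a similarity-weighted SSD: [{lo:.1f}, {hi:.1f}]")
```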

Another question concerns the interpretation of the NSM, which could be interpreted either as

a computational-level theory or as an algorithm-level theory (Marr, 1982). A computational-level

theory specifies what is computed, regardless of the algorithm that executes the computation. An

algorithm-level theory specifies the exact algorithm for the computation. The NSM has largely been

described as an (albeit crude) algorithm-level theory of the processes and representations involved in

probability judgment. An alternative interpretation is as a computational-level theory where the

crucial claim by the NSM is that what the mind computes, however it is computed, corresponds not

to the function implied by normative statistics but to that implied by a statistician that is naïve in the specified sense.

We conclude that, by and large, the data seem consistent with the computational-level theory implied by

the NSM and note that data are accumulating (Dougherty, 2001; Dougherty et al., 1999; Juslin &

Persson, 2002; Nilsson et al., 2005; Sieck & Yates, 2001) suggesting that in an exemplar

implementation the NSM may also have validity as an algorithm-level theory.

What appears to be the most immediate empirical challenge to the NSM is the observation of

response order effects (as we shall see, it provides an even larger challenge to prominent previous

accounts of overconfidence with interval production). In seven experimental conditions, Block and

Harper (1991) documented pervasive effects of asking the participants to provide point estimates

before the intervals, with wider intervals and decreased overconfidence. The effect did not generalize

when participants were provided with point estimates by peers, suggesting that the process of

generating the point estimate and not the existence of an anchor is crucial for the effect. Soll and

Klayman (2004) further demonstrated that if participants make separate assessments of the upper

and lower limits of the interval, immediately following each other, rather than producing the interval as

a single judgment, the intervals became even wider with further reduction in the overconfidence bias.

That overconfidence is reduced by the point estimate suggests that, in effect, it temporarily

enlarges the sample applied to estimation, analogous to a priming effect (Juslin et al., 1999). If

consecutive estimates produce residual activation in short term memory, the effective sample size

may temporarily exceed the sample size retrieved for a single judgment, enlarging interval size and hit

rate. For example, the NSM implies that extending a sample size of 4 observations by 2 additional

observations increases the hit rates for 80% intervals by more than 10 percentage units, roughly

approximating the effects observed in data.
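The figure of roughly ten percentage points is straightforward to check by Monte Carlo; in the sketch below, the lognormal proxy environment and the 10th/90th fractile intervals are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
env = rng.lognormal(2.0, 1.5, 10_000)        # proxy environment (assumption)
trials = 20_000

for n in (4, 6):                             # sample size without/with priming
    hits = 0
    for _ in range(trials):
        truth = rng.choice(env)
        ssd = rng.choice(env, size=n)
        lo, hi = np.percentile(ssd, [10, 90])
        hits += lo <= truth <= hi
    print(f"n = {n}: hit rate of 80% intervals = {hits / trials:.2f}")
# In this toy environment the hit rate rises by roughly ten percentage points
# from n = 4 to n = 6, in the ballpark of the observed order effects.
```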

The priming account predicts that the order effect should occur only if the short term memory

content activated by the point estimate is still active when the interval is produced. In Juslin et al.

(1999), where the point estimates were assessed immediately before each interval production, the

median normalized interval size with point estimates first was significantly higher than with point

estimates after interval production (2.37 vs. 1.78; Wilcoxon, Z = 3.40, p < .001). When point

estimates were blocked before or after the block of interval productions (Winman et al., 2004),

implying at least ten minutes between the point estimates and the interval productions, there was no

priming effect (median normalized interval sizes of 1.22 vs. 1.23; Kruskal-Wallis, χ2 = 0, p = 1).

Although this result is consistent with the priming account, these data were not collected with this

question in mind and we therefore have to await further research to test it and compare it with

alternative accounts.

Alternative Explanations and Models

The naivety of the NSM is similar to the "law of small numbers" (Tversky & Kahneman,

1971), people's inclination to expect also small samples to accurately portray the population

properties. The traditional account of the law of small numbers is the representativeness heuristic,

which is insensitive to notions of sample size. Although the NSM implies that people should behave

as if they believe in the law of small numbers, clearly the explanation is different and emphasizes

accurate description of small samples rather than heuristic processing that replaces probability with

similarity.

The NSM also has resemblance to probabilistic mental models (PMM) theory (Gigerenzer et

al., 1991). As with the NSM, PMM-theory emphasizes that people represent knowledge of the

distributions of objects in natural environments, although more emphasis is placed on pre-computed

abstractions in the form of cue validities (see Juslin & Persson, 2002). The explanatory focus of the

PMM-theory is different, however, and complementary. In regard to overconfidence, PMM-theory

emphasizes the relationship between the items presented to participants in the laboratory and the

natural environments, capturing how biased item selection may produce overconfidence (see also

Juslin et al., 2000). The NSM, as discussed in this article, assumes representative task selection and

concentrates on biased sample properties and external biases in the information stored in memory.

The NSM further incorporates limited aspects of the error models (Erev et al., 1994; Juslin et

al., 1997; Soll, 1996). The error models show how unbiased stochastic noise in the judgment

process can produce overconfidence in the calibration analysis. The NSM addresses the sampling

error involved in retrieving the SSD from the SED, but not the other origins of noise, such as

response error in the use of the probability scale (see Juslin et al., 1997). The previous studies that

have applied error models to intuitive confidence intervals (Juslin et al., 1999; Soll & Klayman,

2004) clearly suggest that the regression effect implied by random error, at various stages of the

process, is insufficient to explain the observed bias.

The most well-known account of overconfidence with interval production is the anchoring-

and-adjustment heuristic (Tversky & Kahneman, 1974). According to this account, people

produce intervals by anchoring on their best guess and then adjusting insufficiently from this anchor.

Although this account has face validity and often appears to be accepted as the explanation of

overconfidence with interval production, it has limitations. First, there are concerns as to whether the

heuristic truly explains as opposed to describes the phenomenon, although there have been attempts

to explicate its cognitive bases (Chapman & Johnson, 2002). Second, and more seriously, it is

directly refuted by the finding that emphasizing the anchor by asking for a point estimate prior to the

interval increases the interval size and yields less rather than more overconfidence. In one of the few

other attempts to directly test the anchoring-and-adjustment explanation Juslin et al. (1999) showed

that the contribution was minor and insufficient to account for the magnitude of the overconfidence

bias.

Soll and Klayman (2004) proposed that the overconfidence bias is explained by selective and

confirmatory search of memory, and the observation of wider intervals when the upper and lower

limits were assessed separately was interpreted in these terms. When people assess the upper limit

they selectively activate knowledge consistent with a high target value (e.g., reasons why Thailand

may have a large population); when assessing the lower limit they selectively activate knowledge

consistent with a low target value (reasons why Thailand may have a small population). As a result,

the intervals become wider. This account again appears inconsistent with intervals that become wider

when preceded by a point estimate (Block & Harper, 1991; Clemen, 2001; Juslin et al., 1999; Soll

& Klayman, 2004): first concentrating on (or committing to) a best guess should, if anything,

instigate more priming of information consistent with or confirming the point estimate and thus shrink

the intervals.

Nonetheless, the assumption of perfectly unbiased retrieval in the NSM (the SSD is assumed

to be a random sample of the SED) is a strong assumption, but, as emphasized in the introduction, it

is a working assumption that may have to be rejected if assumptions of biased memory retrieval

prove to yield significantly improved prediction or explanation of the data. At present, however, we

conclude that the NSM accounts for most of the data already without recourse to confirmatory

search of memory. It should also be noted that the mechanism emphasized by the NSM, the naïve

interpretation of unbiased or biased sample properties, is not rendered irrelevant by the addition of

an assumption of biased retrieval.

More generally, our ability to compare the NSM to these competitors is at present hampered

by the fact that both the anchoring and the confirmatory search explanations remain too vaguely

specified. To apply these accounts to the range of phenomena discussed in this article, for example,

the format dependence effect, and the differential effect of experience and short term memory with

interval production and probability assessment, would require their quantitative specification and

numerous arbitrary decisions. By contrast, the NSM in its "plain vanilla" version requires a minimum

of such auxiliary assumptions to account for these phenomena and offers a practical method to

minimize the overconfidence.

Conclusions

Viewed from a distance, the issue of whether cognitive biases arise from heuristic processing

of the evidence or from naïve but accurate description of available samples may seem like splitting

hairs: the result is a judgment bias just the same. Data are, however, accumulating that, in regard to

many biases, the relation between samples and populations is a more important factor than heuristic

processing (e.g., Denrell, 2005; Fiedler, 2000; Fiedler & Juslin, 2006a; Hertwig et al., 2004). The

results in this article suggest that the NSM captures at least one important aspect of the

overconfidence and format dependence in previous studies and illustrate that these insights may be

crucial for reducing the biases.



References

Baddeley, A. (1998). Recent developments in working memory. Current Opinion in

Neurobiology, 8, 234-238.

Block, R. A., & Harper, D. R. (1991). Overconfidence in estimation: Testing the anchoring and

adjustment hypothesis. Organizational Behavior and Human Decision Processes, 49,

188-207.

Broadbent, D. E. (1975). The magic number seven after fifteen years. In A. Kennedy & A. Wilkes

(Eds.), Studies in Long Term Memory (pp. 3-18). London: Wiley.

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology.

Psychological Review, 62, 193-217.

Chapman, G. B., & Johnson, E. J. (2002). Incorporating the irrelevant: Anchors in judgments of

belief and value. In T. Gilovich, D. W. Griffin, & D. Kahneman, (Eds.) Heuristics and

Biases: The Psychology of Intuitive Judgment (pp. 120-138). New York: Cambridge

University Press.

Clemen, R. T. (2001). Assessing 10-50-90s: A surprise. Decision Analysis Newsletter, 20, 2, 15.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental

storage capacity. Behavioral & Brain Sciences, 24, 87-114; discussion 114-185.

Denrell, J. (2005). Why most people disapprove of me: Experience sampling in impression

formation. Psychological Review, 112, 951-978.

Dougherty, M. R. (2001). Integration of the ecological and error models of overconfidence using a

multiple-trace memory model. Journal of Experimental Psychology: General, 130, 579-

599.

Dougherty, M. R., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A memory process

model for judgments of likelihood. Psychological Review, 106, 180-209.

Edwards, W. (1982). Conservatism in human information processing. In D. Kahneman, P. Slovic, &

A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 359-369).

New York: Cambridge University Press.

Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of

validity. Psychological Review, 85, 395-416.

Elwin, E., Juslin, P., Olsson, H., & Enkvist, T. (in press). Constructivist coding of selective

feedback. Psychological Science.

Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The

role of random error in judgment processes. Psychological Review, 101, 519-527.

Estes, W. K. (1976). The cognitive side of probability learning. Psychological Review, 83, 37-64.

Fischer, I., & Budescu, D. V. (2005). When do those who know more also know more about how

much they know? The development of confidence and performance in categorical decision

tasks. Organizational Behavior and Human Decision Processes, 98, 39-53.

Fischhoff, B. (1982). Debiasing. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under

uncertainty: Heuristics and biases (pp. 422-444). New York: Cambridge University

Press.

Fiedler, K. (2000). Beware of samples! A cognitive-ecological sampling approach to judgment

biases. Psychological Review, 107, 659-676.

Fiedler, K., & Juslin, P. (2006a). Information sampling and adaptive cognition. New York:

Cambridge University Press.

Fiedler, K., & Juslin, P. (2006b). Taking the interface between mind and environment seriously. In

K. Fiedler, & P.Juslin (Ed.), Information sampling and adaptive cognition. New York:

Cambridge University Press.

Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian

theory of confidence. Psychological Review, 98, 506-528.

Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of

chance. Cambridge: Cambridge University Press.

Gilovich, T., Griffin, D., & Kahneman, D. (2002). Heuristics and biases: The psychology of

intuitive judgment. Cambridge: Cambridge University Press.

Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition

heuristic. Psychological Review, 109, 75-90.

Grosskopf, B., Erev, I., & Yechiam, E. (2001). Foregone with the Wind. Retrieved April 7, 2006,

from http://www2.gsb.columbia.edu/faculty/ierev/foregone4.pdf

Hacking, I. (1975). The emergence of probability. London: Cambridge University Press.

Hansson, P., Juslin, P., & Winman, A. (2006). The role of short term memory and task

experience for overconfidence in judgment under uncertainty. Manuscript submitted for

publication, Department of Psychology, Umeå University, Sweden.

Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of

Experimental Psychology: General, 108, 356-388.

Hertwig, R., Pachur, T., & Kurzenhäuser, S. (2005). Judgments of risk frequencies: Tests of possible

cognitive mechanisms. Journal of Experimental Psychology: Learning, Memory and

Cognition, 31, 621-642.

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect

of rare events in risky choice. Psychological Science, 15, 534-539.



Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.

Juslin, P. (1994). The overconfidence phenomenon as a consequence of informal experimenter-

guided selection of almanac items. Organizational Behavior and Human Decision

Processes, 57, 226-246.

Juslin, P., Olsson, H., & Björkman, M. (1997). Brunswikian and Thurstonian origins of bias in

probability assessment: On the interpretation of stochastic components of judgment. Journal

of Behavioral Decision Making, 10, 189-209.

Juslin, P., & Persson, M. (2002). PROBabilities from EXemplars (PROBEX): A "lazy" algorithm for

probabilistic inference from generic knowledge. Cognitive Science, 26, 563-607.

Juslin, P., Wennerholm, P., & Olsson, H. (1999). Format dependence in subjective probability

calibration. Journal of Experimental Psychology: Learning, Memory, and Cognition,

25, 1038-1052.

Juslin, P., Winman, A., & Olsson, H. (2000). Naive empiricism and dogmatism in confidence

research: A critical examination of the hard-easy effect. Psychological Review, 107, 384-

396.

Juslin, P., Winman, A., & Olsson, H. (2003). Calibration, additivity, and source independence of

probability judgments in general knowledge and sensory discrimination tasks.

Organizational Behavior and Human Decision Processes, 92, 34-51.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences

in working memory. Psychological Review, 99, 122-149.

Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in

intuitive judgment. In T. Gilovich, D. W. Griffin, & D. Kahneman (Eds.), Heuristics and

biases: The psychology of intuitive judgment (pp. 49-81). New York: Cambridge

University Press.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.

Econometrica, 47, 263-291.

Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P. Slovic, & A.

Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 201-208). New

York: Cambridge University Press.

Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review,

103, 582-591.

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and

biases. Cambridge: Cambridge University Press.

Kareev, Y., Arnon, S., & Horwitz-Zeliger, R. (2002). On the misperception of variability. Journal

of Experimental Psychology: General, 131, 287-297.

Keren, G. (1987). Facing uncertainty in the game of bridge. Organizational Behavior and Human

Decision Processes, 39, 98-114.

Keren, G. (1991). Calibration and probability judgments. Conceptual and methodological issues.

Acta Psychologica, 77, 217-273.

Klayman, J., Soll, J. B., Gonzalez-Vallejo, C., & Barlas, S. (1999). Overconfidence: It depends on

how, what, and whom you ask. Organizational Behavior and Human Decision

Processes, 79, 216-247.

Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of subjective probabilities: The

state of the art up to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment

under uncertainty: Heuristics and biases (pp. 306-334). New York: Cambridge

University Press.

Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (1978). Judged frequency of

lethal events. Journal of Experimental Psychology: Human Learning and Memory, 4,

551-578.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York:

Cambridge University Press.

Marr, D. (1982). Vision. San Francisco: W. H. Freeman.

Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capacity for

processing information. Psychological Review, 63, 81-97.

Murphy, A. H., & Brown, B. G. (1985). A comparative evaluation of objective and subjective

weather forecasts in the United States. In G. Wright (Ed.), Behavioral decision making

(pp. 329-359). New York: Plenum Press.

Murphy, A. H., & Winkler, R. L. (1977). Reliability of subjective probability forecasts of

precipitation and temperature. Applied Statistics, 26, 41-47.

Newell, A., & Simon, H. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Nilsson, H., Olsson, H., & Juslin, P. (2005). The cognitive substrate of subjective probability.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 600-620.

Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of multiple-system

phenomena in perceptual categorization. Psychonomic Bulletin & Review, 7, 375-402.

Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68,

29-46.

Piaget, J. & Inhelder, B. (1956). The child's conception of space. London: Routledge and Kegan

Paul.

Pitz, G. F. (1974). Subjective probability distributions for imperfectly known quantities. In L. W.



Gregg (Ed.), Knowledge and cognition. (pp. 29-41). Potomac, MD: Erlbaum.

Poulsen, O. M., Holst, E., & Christensen, J. M. (1997). Calculation and application of coverage

intervals for biological reference values. Pure and Applied Chemistry, 69, 1601-1611.

Raven, J. C. (1965). Advanced Progressive Matrices, Set I and II. London: H. K. Lewis.

Russo, J. E., & Schoemaker, P. J. (1992). Managing overconfidence. Sloan Management Review,

33, 7-17.

Schwarz, N., & Wänke, M. (2002). Experiential and contextual heuristics in frequency judgment:

ease of recall and response scales. In P. Sedlmeier & T. Betsch (Eds.), Etc. Frequency

processing and cognition (pp. 89-108). Oxford, UK: Oxford University Press.

Sedlmeier, P., & Betsch, T. (2002). Etc. Frequency processing and cognition. Oxford, UK:

Oxford University Press.

Sieck, W. R., & Yates, J. F. (2001). Overconfidence effects in category learning: A comparison of

connectionist and exemplar memory models. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 27, 1003-1021.

Slovic, P., Griffin, D., & Tversky, A. (2002). Compatibility effects in judgment and choice. In T.

Gilovich, D. W. Griffin, & D. Kahneman, (Eds.) Heuristics and biases: The psychology of

intuitive judgment (pp. 217-229). New York: Cambridge University Press.

Soll, J. B. (1996). Determinants of overconfidence and miscalibration: The roles of random error and

ecological structure. Organizational Behavior and Human Decision Processes, 65, 117-

137.

Soll, J. B., & Klayman, J. (2004). Overconfidence in interval estimates. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 30, 299-314.

Stewart, N., Chater, N., & Brown, G. D. A. (2006). Decision by sampling. Cognitive Psychology,
The Nave Intuitive 59

53, 1-26.

Tomassini, L. A., Solomon, I., Romney, M. B., & Krogstad, J. L. (1982). Calibration of auditors' probabilistic judgments: Some empirical evidence. Organizational Behavior and Human Performance, 30, 391-406.

Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.

Tversky, A., & Koehler, D. J. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 547-567.

Weber, E. U., Shafir, S., & Blais, A.-R. (2004). Predicting risk sensitivity in humans and lower animals: Risk as variance or coefficient of variation. Psychological Review, 111, 430-445.

Winman, A., Hansson, P., & Juslin, P. (2004). Subjective probability intervals: How to reduce overconfidence by interval evaluation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1167-1175.

Winman, A., & Juslin, P. (1993). Calibration of sensory and cognitive judgments: Two different accounts. Scandinavian Journal of Psychology, 34, 135-148.

Yaniv, I., & Foster, D. P. (1995). Graininess of judgment under uncertainty: An accuracy-informativeness trade-off. Journal of Experimental Psychology: General, 124, 424-432.

Yaniv, I., & Foster, D. P. (1997). Precision and accuracy of judgmental estimation. Journal of Behavioral Decision Making, 10, 21-32.

Yechiam, E., & Busemeyer, J. R. (2006). The effect of foregone payoffs on underweighting small probability events. Journal of Behavioral Decision Making, 19, 1-16.

Zacks, R. T., & Hasher, L. (2002). Frequency processing: A twenty-five year perspective. In P. Sedlmeier & T. Betsch (Eds.), Etc. Frequency processing and cognition (pp. 21-36). Oxford, UK: Oxford University Press.

Appendix A:

Details of the Monte Carlo Simulations

Interval production. In each iteration a target country T with population y_T was randomly sampled among the 188 world nations. The continent of the country was used as the cue C. A sample of n exemplars X_i was drawn randomly without replacement from the reference class R_C of exemplars from the same continent as T (excluding T). For the 100%, 75%, and 50% intervals, the fractiles of the sample distributions were used to generate the appropriate interval for the value of y_T, as detailed by Steps 1, 2, and 3b in Figure 3. For finite and small samples, not all fractiles are explicitly represented in the sample as an observed value; in this case the missing fractiles have to be derived by interpolating between the observed values.

There exist several methods for interpolating fractiles from finite samples (see http://www.maths.murdoch.edu.au/units/c503/unitnotes/boxhisto/quartilesmore.html for a discussion of the two most common methods). The simulations rely on a standard procedure, commonly referred to as the EXCEL method, which is the only customarily used procedure directly applicable to the present simulations with small sample sizes. The fractiles are calculated as follows. Let n be the number of cases and let p be the percentile value divided by 100. Express (n - 1)p as (n - 1)p = j + g, where j is the integer part and g is the fractional part of (n - 1)p. The fractile value is x_{j+1} when g = 0 and x_{j+1} + g(x_{j+2} - x_{j+1}) when g > 0. The proportion of intervals that included the true value was recorded. Realism or calibration requires that the proportions are 1.0, .75, and .50, respectively, for the 100%, 75%, and 50% confidence intervals. The results are presented in Figure 6A.

Interval evaluation: independent samples. For the evaluation of independently defined intervals, the intervals produced in the first simulation were taken to define the events (the event being that the population of T falls within the interval). For each such interval (now pre-stated and fed to the algorithm rather than produced by it), the algorithm produces a probability judgment from a new and independent random sample according to Steps 1, 2, and 3a of the NSM. Because the event (interval) is defined independently of the specific sample retrieved to make the interval evaluation, and because the judgment in effect uses a sample proportion to estimate a population proportion, it yields an almost unbiased estimate (see Figure 6B).

Interval evaluation: dependent samples. The same basic procedure was used to simulate the case where the samples retrieved on the two occasions are dependent and overlap more than expected by chance. When retrieving the observations on the second occasion (interval evaluation), each observation was sampled from the observations in the sample from the first occasion (interval production) with resampling probability r. With probability 1 - r, each observation was instead sampled randomly among the observations within R_C that were not part of the first sample. All samplings were made without replacement. The simulations verified that the difference between the overconfidence with interval production and with interval evaluation diminishes as a function of the resampling probability (sample overlap).

Robustness. To further ascertain the robustness of the predictions in Figure 6A, we applied the algorithm to several idealized target variables for which the effects of distributional shape and of discrepancies between the SEDs and OEDs are more easily determined. Simulations based on uniform, positively skewed, negatively skewed, U-shaped, and inversely U-shaped Beta distributions indicate that, as long as the SED is representative of the OED, overconfidence is of similar magnitude across a variety of distributional shapes. The simulations also reveal that the difference between the overconfidence with interval production and with full-range probability assessment (the format dependence) is constant across different distribution shapes and deviations between the SEDs and OEDs.

The simulations moreover show that systematic deviations between the SEDs and OEDs do not always produce more overconfidence; the effect may actually be negative and pull towards reduced overconfidence or even underconfidence. In these simulations three different OEDs (a uniform, a U-shaped, and an inversely U-shaped distribution) are crossed factorially with the corresponding three distribution shapes of the SEDs. For example, a uniform OED and a U-shaped SED may correspond to a target variable where all values are equally likely in the environment but, because extreme values are more memorable, the SED over-represents low and high values. Generally speaking, the results showed that a U-shaped SED produces less overconfidence because most observations are either small or large, leading to wide intervals. Inversely U-shaped SEDs contain mostly intermediate target values and therefore tend to produce narrow intervals and strong overconfidence.

The simulations further assume that people have exact knowledge of a number of target values, for example, about the populations of some countries. To control for imperfect knowledge, the simulations in Figure 6A were repeated with the population figures in the database perturbed by a random error representing imperfect knowledge, with zero expectation and a standard deviation of either 10%, 20%, 30%, 40%, or 50% of the true population of the country. The predicted overconfidence was robust and virtually unaffected up to a standard error of about 40% of the population figure, after which it diminished slightly. The empirically obtained point estimates for country populations observed in Winman et al. (2004) suggest an average absolute error in the region of 8-9% of the population figures. Similar conclusions about the robustness of the predictions hold when we allow for the possibility that people confuse the cues (e.g., attribute a country to the wrong continent in some proportion of trials). The predictions are thus not dependent on the idealization of accurate knowledge but hold for much less accurate knowledge than that actually observed.
The Nave Intuitive 64

Appendix B:

Details of the Empirical Study

The aim of the experiment was to see whether participants are able to generate SEDs that represent the OEDs and whether intuitive confidence intervals are determined by the SED. We used two distinct distributions: a U-shaped and an inversely U-shaped one. The predictions that follow from the NSM for interval production based on these two distributions are straightforward: if the intervals are generated from a U-shaped SED, the average interval should be considerably wider than intervals generated from an inversely U-shaped SED. Another way to verify that SEDs mirror OEDs is to look at the distributions of the point estimates and their standard deviations: the standard deviation of a U-shaped SED should be numerically larger than that of an inversely U-shaped SED.

Method

Participants. Thirty undergraduate students (15 women and 15 men), 15 in each condition, with an average age of 25 years, participated in the experiment. The participants received 150 SEK (approximately 20 US dollars) for taking part.

Materials and apparatus. The experiment was carried out on a PC. The judgment task involved estimating the revenue of fictitious business companies. Sixty target values corresponding to the revenues of these companies (range = 1-1000), distributed either as U-shaped or as inversely U-shaped, defined the OED. The U-shaped condition accordingly had a high frequency of values close to 1 and to 1000, whereas the inversely U-shaped condition consisted predominantly of target values close to 500. The target values were randomly attached to names in a database containing 156 real company names, with a new and independent random assignment for each participant.



Design and procedure. The experiment had a between-subjects design and the participants were randomly assigned to one of the two conditions (U-shaped or inversely U-shaped). In the learning phase participants guessed the revenue of 60 companies, receiving outcome feedback about the correct target value. The learning phase ended when a criterion of 40% (24 of the 60 revenue figures) correctly reproduced had been reached. In a test phase, participants produced an initial point estimate followed by intuitive confidence intervals for the 60 companies they had previously seen in training and for 60 new ones. After the test phase each participant assessed how many of the 60 companies encountered during the training phase fell within each of ten pre-defined intervals (i.e., 0-100, 101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-800, 801-900, 901-1000). The dependent measures were the interval size, the standard deviation of the point estimates in the test phase, and the reproduced SED.



Author Note

Peter Juslin, Department of Psychology, Uppsala University; Anders Winman, Department of Psychology, Uppsala University; Patrik Hansson, Department of Psychology, Umeå University.

This research was supported by the Swedish Research Council and the Bank of Sweden Tercentenary Foundation. We are indebted to Nick Chater, Tommy Enquist, Josh Klayman, Ben Newell, Håkan Nilsson, Henrik Olsson, and Jack Soll for useful discussions and comments on the issues addressed in this article.

Correspondence concerning this paper should be addressed to: Peter Juslin, Department of

Psychology, Uppsala University, SE 751 42 Uppsala, Sweden, or to the e-mail address:

Peter.Juslin@psyk.uu.se

Footnotes

1. Working assumption is used to emphasize that we do not deny that the cognitive processes applied to the available information may at times be biased and violate normative principles. Traditionally, when judgment bias (defined in one way or another) is encountered, the most common explanatory scheme is to take the information input as given while, in comparison, often hastening to postulate cognitive algorithms that account for the deviations from normative responses. The strategy implied by the metaphor of the naïve intuitive statistician is to start the analysis the other way around: to assume, as a working assumption, that the cognitive algorithms are consistent with normative principles and to place the explanatory burden at the other end, on the information samples to which the cognitive algorithms are applied. In the end, of course, in many situations we may have to reject this working assumption.

2. One finds at least two different interpretations of availability in the literature (see Schwarz & Wänke, 2002). The stronger version, which implements the variable substitution approach advocated in the heuristics and biases literature (see Kahneman & Frederick, 2002), implies that assessments of frequency are substituted by assessments of how easy it is to recall instances. Another interpretation is that the judgments do derive from judgments of frequency, but the samples are themselves biased because of external biases in the information or the effects of biased recall. In the latter case, of course, the processes are better described by the metaphor of the naïve intuitive statistician, because the proximal samples are described in terms of relative frequency and no intensional heuristics are involved.

3. Note that this holds when plotting the mean sample proportion P as a function of the population proportion p. If the expected population proportion is instead plotted as a function of the observed sample proportion for a given sample size, there is a regression effect that may sometimes contribute to overconfidence bias (see Soll, 1996; Juslin et al., 1997).

4. The main differences between the parametric and the non-parametric model are the following. If the SED is a finite set, an additional source adds marginally to the error rate: by necessity, in many estimation problems the target quantity (e.g., Thailand) cannot itself be an observation in the retrieved SSD (a sample of Asian countries other than Thailand). Of course, if the target quantity can be retrieved there is no need to estimate it. That the SSD always excludes the target quantity lowers the hit rates somewhat. Further, the assumption that the quantity can take on infinitely low or high values increases the hit rates for high-confidence intervals as compared to a quantity with defined lower and upper bounds. By contrast, our simulations suggest that the exact way in which sample variability is represented (e.g., standard deviation, interquartile range) has minor effects: as long as the measure is applied in the same way regardless of sample size, the qualitative effect is overconfidence.

5. The literature on working memory and short-term memory is complex, with no consensus in regard to the use of these terms (see Cowan, 2001, for a discussion). In this article we use the term short-term memory for the capacity to passively maintain information retrieved from long-term memory for the processing implied by the NSM.

6. Does this mean that experts produce confidence intervals that are no better than those produced by novices? Clearly not: Although there are architectural constraints that determine overconfidence, in general experts know better predictive cues of relevance to the estimation task. Better cues allow the experts to access more homogeneous OEDs, where the target values of the observations are similar, allowing higher accuracy and tighter intervals.

7. This only holds as an approximation in the non-parametric implementation, because even if all fractiles can be generated from a small sample by interpolation, when the sample is retrieved again it still contains only a limited number of observations.

8. With ADINA(R) the independence strictly holds only for the first probability judgment; already at the second interval, its size is a function of the participant's beliefs. The results presented in Figure 8A are for the first interval, where the assumption of independence is best approximated and the experimental manipulation is most extreme.

9. The reason why the mean subjective probabilities are not identical in the three conditions in Figure 8B is missing data in the ADINA conditions: the ADINA procedure terminated the sequence if the desired probability had not been produced within 24 trials.

Table 1

Linear Multiple Regression Models with Overconfidence as the Dependent Variable and Short-Term Memory Capacity (n by STM), Long-Term Memory Content (n by LTM), and SED Bias as the Independent Variables, for Experiments 1, 2, and 3 in Hansson et al. (2006). Adapted from Hansson, P., Juslin, P., & Winman, A. (2006). The Role of Short Term Memory and Task Experience for Overconfidence in Judgment under Uncertainty. (Submitted for publication.)

                                          Model                 n by STM           n by LTM           SED bias
Trials of training                  R      N      p         β        p          β       p         β       p
Interval production, 68 trials     .21    20     .86       .04     .44a       -.24     .41      -.11    .35a
  (Experiment 1)
Interval production, 272 trials    .82*   15     .006     -.72*    .001a       .29     .13       .39*   .029a
  (Experiment 2)
Interval production, 544 trials    .68    15     .072     -.67*    .012a       .38     .21      -.26    .34
  (Experiment 2)
Interval production, 232 trials    .42*   60     .02      -.33*    .00a        .00     .96       .12    .17a
  (Experiment 3)
Interval evaluation, 232 trials    .16    61     .67      -.15     .25        -.04     .78       .00    .94
  (Experiment 3)

Note: R = multiple correlation; N = number of independent observations; p = p-value associated with the model or weight; β = beta weight for the predictor.

* Correlations statistically significant beyond alpha = .05.

a These p-values correspond to one-tailed tests of statistical significance.

Figure Captions

Figure 1. Schematic summary of three research programs on intuitive judgment. Research

guided by the perspective of the intuitive statistician (Panel A) has often found judgments that are in

approximate agreement with normative models, presumably because of the concentration on well-structured tasks in the laboratory. Research on heuristics and biases (Panel B) has emphasized biases in intuitive judgment, attributing them to heuristic processing of the evidence available to the judge. The naïve intuitive statistician (Panel C) emphasizes that the judgments are accurate descriptions of the available samples and that performance depends on whether the samples available to the judge are biased or unbiased, and on whether the estimator variable affords unbiased estimation directly from the sample properties.

Figure 2. Format dependence. Panel A: Application of the half-range, the full-range, and the interval production formats to elicit a participant's belief about the population of Thailand. Panel B: With the half-range format there is close to zero over/underconfidence, with the full-range format there is marginal overconfidence, and with the interval production format there is extreme overconfidence. Panel B is reproduced from Juslin, P., & Persson, M. (2002). PROBabilities from EXemplars (PROBEX): A "lazy" algorithm for probabilistic inference from generic knowledge. Cognitive Science, 26, 563-607.

Figure 3. The Naïve Sampling Model (NSM) of format dependence.

Figure 4. Panel A: Probability density function for the distribution of target values in the reference class defined by a cue (the OED). The values on the target dimension have been standardized to have mean 0 and standard deviation 1. The interval between the .75th and the .25th fractiles of the population distribution includes 50% of the population values. Panel B: Schematic illustration of the expectation for a sample of 4 exemplars with the sample mean at the population mean. The values on the target dimension are expressed in units standardized against the population variance. The interval between the .75th and the .25th fractiles of the sample distribution includes 39% of the population values. Panel C: Schematic illustration of the expectation for a sample of 4 exemplars with the sample mean displaced relative to the population mean. The values on the target dimension are expressed in units standardized against the population variance. The interval between the .75th and the .25th fractiles of the sample distribution on average includes 34% of the population values.

Figure 5. Predictions by an implementation of the NSM as a parametric statistical model. Panel A: The relative interval width, that is, the ratio between the interval width w predicted by the NSM and the interval width W required for calibration, as a function of sample size. Panel B: Hit rates as a function of sample size and the subjective probability of the interval, assuming that the SED is perfectly representative of the OED. Panel C: Hit rates as a function of the bias in the SED (the mean SED is 0, .5, or 1 standard deviation larger than the mean OED) and the subjective probability of the interval, assuming sample size 4.

Figure 6. Predictions by the NSM as a non-parametric simulation model applied to the database of world-country populations. Panel A: Proportion of correct target values included in the intervals for the three different probabilities and three different sample sizes (n). The identity line shows the proportions required for perfect calibration. The graph also illustrates typical empirical data from interval production for world country populations, adapted from Winman, A., Hansson, P., & Juslin, P. (2004). Subjective probability intervals: How to reduce overconfidence by interval evaluation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1167-1175. Panel B: The resulting predicted format dependence. On the y-axis is the overconfidence score, defined as the mean probability of the interval minus the proportion of values falling within the interval. The graph also illustrates typical empirical data in regard to judgments of world country populations, adapted from Juslin, P., Winman, A., & Olsson, H. (2003). Calibration, additivity, and source independence of probability judgments in general knowledge and sensory discrimination tasks. Organizational Behavior and Human Decision Processes, 92, 34-51. (This line for empirical data is difficult to discern in the graph because it overlaps with the prediction for n=3.)

Figure 7. Panel A: Empirical data for full-range probability assessment and interval production

in regard to world country populations from Juslin, P., Winman, A., & Olsson, H. (2003).

Calibration, additivity, and source independence of probability judgments in general knowledge and

sensory discrimination tasks. Organizational Behavior and Human Decision Processes, 92, 34-

51. Panel B: The corresponding predictions by the NSM.

Figure 8. Panel A: Mean confidence and proportion of intervals that include the population value, as computed from the response to the first a priori interval for each of the three assessment methods (with 95% CI, n=15). Panel B: Mean confidence and proportion of intervals that include the population value, as computed for the final, homed-in confidence intervals for each of the three assessment methods (with 95% CI, n=15). Overconfidence is the difference between the mean confidence and the proportion. Adapted from Winman, A., Hansson, P., & Juslin, P. (2004). Subjective probability intervals: How to reduce overconfidence by interval evaluation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1167-1175.

Figure 9. Panel A: Proportion of correctly recalled target values as a function of the number of training blocks in a laboratory learning task. The main effect of the number of training blocks (1/2 block is 68 training trials, 2 blocks is 272 training trials, and 4 blocks is 544 training trials) is sizeable and statistically significant. Panel B: Overconfidence in interval production as a function of the number of training blocks in the laboratory learning task. Overconfidence is the difference between the probability of the interval and the hit rate of target values included by the interval. The main effect of the number of training blocks is small and not statistically significant. The dotted line denoted NSM (n=4) represents the prediction by the NSM assuming sample size 4 and no SED bias. The lower dotted line denoted probability judgment refers to the observed overconfidence in the interval evaluation task after .5 training blocks. Panel C: Overconfidence and the statistically significant interaction effect between short-term memory capacity and assessment format. Adapted from Hansson, P., Juslin, P., & Winman, A. (2006). The Role of Short Term Memory and Task Experience for Overconfidence in Judgment under Uncertainty. (Submitted for publication.) Department of Psychology, Umeå University, Sweden.

Figure 10. First column (from the left): Predictions by the NSM (n=4) of the relationship between interval sizes, mean absolute error, and overconfidence, separately for .5 (Panel A), .80 (Panel B), and 1.0 (Panel C) intervals. Second column: The observed relationship between interval sizes, mean absolute error, and overconfidence in interval production with half a block of training (68 trials), for the same three interval probabilities. Third column: The observed relationship with two blocks of training (272 trials). Fourth column: The observed relationship with four blocks of training (544 trials). Adapted from Hansson, P., Juslin, P., & Winman, A. (2006). The Role of Short Term Memory and Task Experience for Overconfidence in Judgment under Uncertainty. (Submitted for publication.) Department of Psychology, Umeå University, Sweden.



Figure 1

Panel A. THE INTUITIVE STATISTICIAN: Environment -> Task sample -> Judgment. Veridicality is granted by training in well-structured laboratory tasks; the outcome is close to normative judgment.

Panel B. HEURISTICS AND BIASES: Environment -> Task sample -> Judgment. Biases are attributed to heuristic processing of the available sample; the outcome is biased judgment.

Panel C. THE NAIVE INTUITIVE STATISTICIAN: Environment -> Task sample -> Judgment. The key question is the appropriateness of using sample properties to estimate population properties: (a) sampling biases? (b) biased estimators? The outcome is normative or biased judgment.

Figure 2

Interval production format:
Give the smallest interval which you are .xx % certain to include the population of Thailand. Between _____ and _____ million inhabitants.

Half-range format:
Does the population of Thailand exceed 25 million? Yes/No. How confident are you that your answer is correct? 50% (Guessing), 60%, 70%, 80%, 90%, 100% (Certain).

Full-range format:
The population of Thailand exceeds 25 million. What is the probability that this proposition is correct? 0% (Certainly false), 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% (Certainly true).

[Panel B: Bar graph of over/underconfidence (y-axis from -0.1 to 0.4) for the half-range, full-range, and interval production response formats.]

Figure 3

Unknown quantity.
1. Retrieve cue: Retrieve a cue that covaries with the quantity; the cue defines an objective environmental distribution (OED).
2. Retrieve sample: Retrieve n observations from the corresponding subjective environmental distribution (SED).
3. Naïve estimation: Estimate population properties directly from the sample distribution (SSD) and translate them into the required response format.
3a. Interval evaluation: Use the proportion in the SSD to estimate the proportion in the population; this yields little or no overconfidence.
3b. Interval production: Use the coverage in the SSD to estimate the coverage in the population; this yields extreme overconfidence.

Figure 4

[Panel A: OED (or Large Sample SD); probability density over the standardized target dimension, with the central interval including 50% of the population values. Panel B: SD, Small Sample and Centered Interval; the corresponding interval includes 39% of the population values. Panel C: SD, Small Sample and Dislocated Interval; the corresponding interval includes 34% of the population values. X-axes: standardized OED variance (-3.50 to 3.50); y-axes: probability density (0.00 to 0.60).]

Figure 5

[Panel A: Interval Size; relative interval size (predicted/calibrated, 0.0-1.0) as a function of sample size n (0-22), for .5, .75, and .99 intervals. Panel B: Hit-rates (no SED bias); hit rate as a function of interval probability (.50, .75, .95) for calibrated intervals and n = 3, 4, 5. Panel C: Hit-rates (n=4, SED bias); hit rate as a function of interval probability for calibrated intervals and SED biases of 0, .5, and 1 sd.]

Figure 6

[Panel A: NSM (no SED bias); proportion of target values included in the intervals as a function of subjective probability (0.50, 0.75, 1.00), with curves for calibration, empirical data, and n = 3, 4, 5. Panel B: NSM (no SED bias); overconfidence (0.0-0.5) for interval production versus probability judgment, with curves for calibration, empirical data, and n = 3, 4, 5.]

Figure 7

[Panel A: Empirical data; proportion as a function of subjective probability (0.0-1.0) for probability judgment, interval production, and calibration. Panel B: NSM (n=4, no SED bias); the corresponding predictions.]

Figure 8

[Panel A: First Interval; proportion correct and subjective probability (0.0-1.0) for the Control, ADINA(O), and ADINA(R) conditions. Panel B: Final Interval; proportion correct and subjective probability for the same three conditions.]

Figure 9
[Panel A: Recalled Target Values; proportion recalled (0.0-0.5) as a function of training blocks (0-4). Panel B: Overconfidence for Estimates; overconfidence (0.0-0.5) as a function of training blocks, with reference lines for NSM (n=4) and probability judgment. Panel C: Format Dependence and STM Capacity; overconfidence for the probability versus interval assessment formats, for low and high STM capacity.]

Figure 10
