
Are Humans the Only Theorizers?

A Philosophical Examination of the Theory-Theory of Human Uniqueness

By
Hayley A. Clatterbuck

A dissertation submitted in partial fulfillment
of the requirements for the degree of

Doctor of Philosophy
(Philosophy)

at the
UNIVERSITY OF WISCONSIN-MADISON
2015

Date of final oral examination: 05/20/15


The dissertation is approved by the following members of the Final Oral Committee:
Elliott Sober, Professor, Philosophy
Lawrence Shapiro, Professor, Philosophy
Daniel M. Hausman, Professor, Philosophy
Farid Masrour, Assistant Professor, Philosophy
Robert Streiffer, Associate Professor, Philosophy and Bioethics

Acknowledgements

I am grateful to all of my colleagues and mentors in the philosophy community at the
University of Wisconsin-Madison and am particularly thankful to Elliott Sober for his
encouragement and guidance. I'd like to thank my family, friends, and Lake Mendota for
their steadfast companionship throughout this project. To fully express my gratitude to
Kyle would require a separate dissertation, so I will simply say, "Thanks, dear."

Table of Contents

Chapter 1 Darwin, Hume, and Morgan and the Vera Causae of Cognition
Chapter 2 The Theory-Theory of Human Uniqueness
Chapter 3 The Logical Problem and the Theoretician's Dilemmas
Chapter 4 Rejecting Hempel's Premises: Two Unsatisfactory Accounts
Chapter 5 The Intervening Variable and Triangulation Approaches
Chapter 6 Two Roles of Theories in Human Cognition (and How to Test for Them)
References

Chapter 1: Darwin, Hume, and Morgan and the Vera Causae of Cognition

1. Introduction
Cognitive ethology, the project of inferring the nature of non-human minds, faces
several unique obstacles. The first is that the mental states of others cannot be directly
observed. Therefore, comparative psychologists have argued that knowledge about them
must proceed via a "double induction" (Morgan 1894). First, we must induce the
relationships between the mental states that we can observe (our own) and their
corresponding observable effects in behavior. Then, from a degree of similarity between
animal behavior and our own, we induce a degree of similarity of their corresponding
psychological causes. The inferences we draw about the psychological causes of animal
behavior, then, depend both on our theory of the psychological causes of human
behavior and the principles by which we infer common mental causes from common
behavioral effects.
Another obstacle arises when cognitive ethology is placed within an evolutionary
context that acknowledges our close phylogenetic relationships with other animals.
Human cognitive traits have evolved from more rudimentary ancestral cognitive traits,
and given the relatively short period of time since our last common ancestor, we are
likely to share much of our cognitive architecture with our closest primate relatives. Such
considerations motivate a presumption of continuity in the psychological traits of humans
and other animals. On the other hand, despite our recent common ancestry, there appear
to be significant differences in human and non-human behavior; humans are the only
species to have sophisticated language, culture, tool use, and scientific reasoning, abilities
that have contributed to our incredible evolutionary success. These seeming
discontinuities between the behaviors of humans and animals motivate a conflicting
presumption of discontinuity in their mental causes as well.
Because of the unobservability of psychological states and prominent
considerations of both similarity and differences with animals, cognitive ethologists have
utilized general principles to guide our ethographic inferences. Here, I will examine two
historical principles that favor different presumptions in the debate over our mental
continuity with animals. The first, offered by Charles Darwin (with a precursor in David
Hume), states:
I can see only one way of testing our conclusions. This is to observe whether the
same principle by which one expression can, as it appears, be explained, is
applicable in other allied cases; and especially, whether the same general
principles can be applied with satisfactory results, both to man and the lower
animals (1872/2009, 25).
Using this principle, Darwin arrives at the conclusion that "there is no fundamental
difference between man and the higher animals in their mental faculties" (1879/2004,
86).
A second influential historical principle of comparative psychological inference,
which has commonly been viewed as a necessary corrective to the anthropomorphic
conclusions to which Darwin's principle leads, is stated in C. Lloyd Morgan's famous
Canon:
In no case may we interpret an action as the outcome of the exercise of a higher
psychical faculty, if it can be interpreted as the outcome of the exercise of one
which stands lower in the psychological scale (Morgan 1920, 53).
From his principle, Morgan infers a significant discontinuity in the mental faculties of
man and animals, at least with respect to the higher faculties of reason and abstraction.

Interestingly, both Darwin and Morgan believed that their principles were
consistent with, and indeed motivated by, natural selection and the thesis of recent
common ancestry. Why, then, were they committed to such different principles?
A common tendency among commentators is to interpret them both as
applications of Ockham's razor, focusing on two different senses of parsimony. Darwin's
principle can be interpreted as arising from a general principle of parsimony in which
explanations that unify various phenomena under a single type of cause are to be favored.
Additionally, there is a specifically evolutionary undergirding for his parsimony
principle; given that humans share a common ancestor with other animals, it will be more
evolutionarily parsimonious to posit that similar traits across the phylogenetic spectrum
arise from the same evolutionarily preserved causes (de Waal 1991, Sober 2012).
Morgan's Canon has been interpreted as embodying a different type of parsimony
principle in which explanations that posit simpler causes for a phenomenon are to be
favored, with it remaining to be seen how lower psychological faculties are simpler
causes (Sober 1998, Fitzpatrick 2008, Meketa 2014).
However, I will argue that looking for differing commitments to parsimony in the
justification of these principles is a red herring. In fact, both authors followed very
similar vera causa inductive strategies in arriving at their principles, in which parsimony
plays but a weak role. The vera causa principle, embodied in Newton's first rule of
reasoning, states that "We ought to admit no more causes of natural things, than such as
are both true and sufficient to explain their appearances" (Newton 1687/2003).
In the "double induction" of comparative psychology, one's theory of human
cognition specifies the true, known causes of behavior. I argue that the main source of
disagreement between Darwin and Morgan rests in their different views of human
psychology. In particular, they disagree about whether the causes identified by a Humean
empiricist theory of mind suffice to explain all of human behavior. Once we lay bare their
disagreements on that point, we see that Darwin's principle and Morgan's Canon are
quite inferentially inert on their own; their varying principles fall out of their similar
vera causa approaches and their particular views of the true causes of cognition.
To begin to make this case, in Section 2 I will first characterize Darwin's belief in
mental continuity with animals and argue that he was not driven to that conclusion by any
principles of evolutionary inference that he accepted. Next, in Section 3, I will consider
and dismiss another possible source of his principle, an argument from analogy from
Hume. In Sections 4 and 5, I will argue that Darwin employs a vera causa argument in his
inferences about animal minds and show how his particular philosophy of mind leads him
to his judgment of mental continuity.
I will then turn to a consideration of Morgan's vera causa argument to the
opposite conclusion. In Section 6, I will explicate the very specific senses of "higher" and
"lower" faculties for which Morgan intended his principle. In Sections 7 and 8, I will
show how his alternative, non-Humean theory of human psychology yields the result that
human and animal psychology is discontinuous. Finally, in Section 9, I will show that the
very same point of contention that separates Darwin and Morgan has re-emerged in a
central debate over mental continuity among current comparative psychologists.

2. Darwin's judgment of continuity
Darwin's theory of human and animal cognition is developed over his two great
works on the topic, The Descent of Man, and Selection in Relation to Sex and The
Expression of the Emotions in Man and Animals. In both, he expresses his strong
presumption of continuity of cognition in man and his closest animal relatives. Leaving a
more thorough analysis of his reasons for believing in continuity for later sections, we
can give a brief flavor of Darwin's ideas here.
In the third chapter of the Descent, titled "Comparison of the Mental Powers of
Man and the Lower Animals," Darwin's stated goal is to show that "there is no
fundamental difference between man and the higher mammals in their mental faculties"
(1879/2004, 86). His argument is not merely that human capacities could have
gradually evolved from precursors in animals. Instead, Darwin argues for the stronger
claim that all of the psychological causes of human behavior are present in other extant
species. He states that humans and the "higher animals"[1] share the same senses,
emotions, and "faculties of imitation, attention, deliberation, choice, memory,
imagination, the association of ideas, and reason, though in very different degrees" (ibid.,
100). Though Darwin admits that there are significant differences between humans and
animals, these differences lie on a continuous spectrum:
We must admit that there is a much wider interval in mental power between one
of the lowest fishes, as a lamprey or lancelet, and one of the higher apes, than
between an ape and man; yet this interval is filled up by numberless gradations
(ibid., 86).

[1] Darwin explicitly states that this category includes the Primate order (ibid., 100). However, his discussion
of cases shows that he extends this judgment of continuity to other mammals, like horses, elephants, and
especially dogs. He also argues for some continuity with non-mammals, such as fish and birds.

Darwin's analysis of mental faculties in the Descent contains two key claims. The first is
that there is continuous variation in the behavior of animals and humans. The second is
that this continuous variation in behavior can and should be explained via continuous
variation in the same underlying mental causes. These claims are elevated to the
above-quoted general principle from the Expression of Emotions, in which Darwin maintains that
human behavior ought to be explained via "the same general principles" as the behavior
of lower animals.
Darwin's inferential strategy here is surprising for numerous reasons. First, for
certain human behaviors, the presumption is strongly in favor of discontinuity with
animals. For example, humans are the only species that has significant and flexible tool
use, complex scientific reasoning, a sophisticated language, and systems of religion and
morality. At the very least, if there is a burden of proof here, it seems to be on the
advocates of strong continuity. Though Darwin discusses these examples at length in the
Descent and attempts to establish the existence of intermediary forms of each trait among
animals, he makes many surprising and prima facie implausible attributions of mental
states in the process[2].
Darwin's commitment to continuity in mental faculties is surprising for another
reason. Since Darwin maintained that mental traits obey the same principles of evolution
by natural selection, we should expect him to apply the same inferential rules in
reasoning about the evolution of mental traits as for biological traits more generally. By
Darwin's own lights, however, neither of his main claims about mental traits is true with
respect to biological traits more generally.

[2] For example, Darwin claims that "There can, I think, be no doubt that a dog feels shame and something
very like modesty when begging too often for food" (ibid., 42). Elsewhere in the Descent, he suggests that
dogs have something close to religious beliefs, spite, and other complex emotions.

First, though Darwin is strongly committed to gradualism, his form of gradualism
only predicts that trait values in an evolving lineage will be continuous with one another
through time. However, Darwin clearly states that his theory does not entail that when we
look at the present distribution of a trait in extant species that descended from a common
ancestor, we will see gradual variation. He writes,
On the theory of natural selection, we can clearly understand the full meaning of
that old canon in natural history, Natura non facit saltum [nature makes no
jumps]. This canon, if we look only to the present inhabitants of the world, is not
strictly correct, but if we include all those of past times, it must by my theory be
strictly true (1859/2003, 223).
In fact, Darwin's theory predicts that we should usually expect discontinuity in trait
values among extant species:
The competition will generally be most severe between those forms which are
most nearly related to each other in habits, constitution, and structure. Hence all
the intermediate forms between the earlier and later states, that is between the less
and more improved state of a species, as well as the original parent-species itself,
will generally tend to become extinct (ibid., 171).
Therefore, Darwin has provided a prima facie evolutionary justification for thinking that
the seeming discontinuities between animal and human behavior are genuine.
The inference from continuity of trait values to continuity in the underlying
causes of those traits is also suspect. Even if Darwin is right that all human behaviors
bear similarities to those of lower animals, this would still not license the inference that
the causes of those behaviors must be the same, for he allows that convergent evolution
may lead unrelated or distantly related species to exhibit similar traits[3]; in fact, Darwin
suggests that convergent evolution is common enough to present serious problems for
systematists (ibid., 360). Convergent evolution would not undermine the inference from
similar traits to similar mental causes if trait convergence were always due to convergent
evolution in the same underlying causes. However, Darwin himself suggests that species
undergoing convergent evolution will often take different paths to achieving superficially
similar traits[4]. For instance, though birds, bats, and bees have independently evolved the
ability to fly, the wing structures that cause them to have this ability are very different;
from similarity of traits, we should not infer that the underlying causes must be the same.

[3] In a passage that famously reflects the ontogeny of the theory of natural selection itself, Darwin writes
that "in nearly the same way as two men have sometimes independently hit on the very same invention, so
natural selection, working for the good of each being and taking advantage of analogous variations, has
sometimes modified in very nearly the same manner two parts in two organic beings, which owe but little
of their structure in common to inheritance from the same ancestor" (ibid., 216). The cases of convergent
evolution that Darwin mentions in the Origin include: the torpedo body shape of fish, whales, and dugongs;
luminescence in insects; enlarged tubers in turnips and rutabagas; electric organs in fish; and pollen masses
in orchids and milkweeds.
[4] "In all these cases of two very distinct species furnished with apparently the same anomalous organ, it
should be observed that, although the general appearance and function of the organ may be the same, yet
some fundamental difference can generally be detected" (ibid.).

Thus, Darwin's theory of animal minds in the Descent and Expression of
Emotions appears anomalous given Darwin's claims in the Origin that we should expect
traits to be discontinuous across extant species and that similar traits may (and often do)
result from different causes via convergent evolution. In what follows, I will examine
several possible explanations for why Darwin seems to obey different inferential
procedures with respect to mental evolution than evolution of traits more generally. First,
I want to consider a deflationary explanation of this anomaly, according to which Darwin
breaks his own rules of evolutionary inference in order to establish a dialectical point that
is not warranted by his own lights.
One way of motivating this response is suggested by Darwin's reluctance to
explain trait similarities by positing convergent evolution; though he does so when he
believes the evidence requires it, he treats convergent evolution as an explanation of last
resort and indeed a scandal for his theory[5]. Why might this be? Note that there are two
primary components of Darwin's long argument in the Origin; first, that extant species
share common ancestry, and second, that the process of evolution by natural selection
explains how and why species with common ancestry have diversified and adapted to
different environments[6]. Though Darwin gives several different types of arguments for
common ancestry, one such argument is predicated on similarities among extant species.
Perhaps Darwin's worry is that the strength of this inference from similarity in traits to
common ancestry is undermined if convergent evolution is rampant[7].
If Darwin's primary goal in the Descent is to establish the common ancestry of
humans and other animals, it would be dialectically useful to downplay the possibility of
convergent evolution while overstating the similarity of human and animal minds. This
interpretation gains credence when we consider that his theory's implications for human
evolution presented the most serious obstacle to its acceptance. For this reason, Darwin
hesitated to include an extensive discussion of human evolution in the Origin, and in
his preface to the Descent, he admits his prior determination not to publish on the
subject, as doing so "should thus only add to the prejudices against my views"
(1879/2004, 17).
Certainly, then, to bolster the application of his theory to human evolution,
Darwin would have felt a dialectical pressure to make the strongest possible case for the
continuity of humans and their closest relatives. Further, to the extent that his insistence
upon continuity elevates the capacities of animals, rather than degrading those of humans,
the notion that humans had evolved from non-human ancestors would be more palatable
to his readers.

[5] In his discussion of instinct, for instance, Darwin admits that "no doubt many instincts of very difficult
explanation could be opposed to the theory of natural selection," including "cases of instincts almost
identically the same in animals so remote in the scale of nature, that we cannot account for their similarity
by inheritance from a common parent, and must therefore believe that they have been acquired by
independent acts of natural selection" (ibid., 242).
[6] For a discussion of how these two components of Darwin's argument relate to one another, see Sober
(2011).
[7] Hennig expresses a similar concern that if convergence were the default explanation for similarity in
traits, "phylogenetic systematics would lose all the ground on which it stands" (Hennig 1966, 121).

While I think that there is some merit to this interpretation, I do not find it fully
satisfying. First, it is not clear that establishing common ancestry is Darwin's only, or
even primary, goal in the Descent. He is also keenly interested in explaining how and
why various human traits evolved and their variation among human races (ibid., 18).
Second, it seems most fruitful to adopt the principle of charity and presume that Darwin
had some good reason for believing in strong mental continuity. Even if dialectical
considerations motivated Darwin's search for a way to establish continuity between
human and animal minds, this does not imply that the reasons he ultimately offers are
unsound. It is to the possible reasons he had for his view that I will now turn.

3. Hume's Argument from Analogy


Another possible explanation for Darwin's inference that human and animal
cognition is continuous both in terms of their behaviors and the psychological causes of
those behaviors is that he was influenced by a principle of ethological inference that he
found in Hume. Hume's principle is found in both of his major works on human
psychology, Treatise of Human Nature (1739-1740) and Enquiry Concerning Human
Understanding (1748). It appears in its strongest form in the Treatise,
where he states:
When any hypothesis, therefore, is advanced to explain a mental operation,
which is common to men and beasts, we must apply the same hypothesis to
both (T 1.3.16.3; SBN 177)[8].
Though the influence of Hume on Darwin has received surprisingly little attention[9], the
hypothesis that Darwin's assumption of continuity is drawn from Hume's principle has
some initial plausibility. In Notebooks N and M, the notebooks that would serve as the
basis of the Descent and Expression of Emotions, Darwin cites both the Treatise and the
Enquiry and specifically references the chapter of the Essay containing Hume's principle
(N101). Additionally, the general principle of inference that Darwin states in the
Expression of Emotions[10] sounds strikingly similar to Hume's. To evaluate its purported
influence on Darwin, it will be worthwhile to first examine Hume's principle in more
depth.
Hume gives surprising weight to this principle, arguing that it is a necessary
touchstone that any theory of human psychology must meet to be considered minimally
plausible; he writes, "if we find upon trial, that the explication of these phenomena,
which we make use of in one species, will not apply to the rest, we may presume that that
explication, however specious, is in reality without foundation" (T 2.1.12.6; SBN 327,
italics mine). Hume offers a somewhat weaker interpretation of the principle in the
Enquiry, arguing that a unified account of human and animal cognition is merely more
likely to be true than one that posits separate causes of their behaviors: "Any theory, by
which we explain the operations of the understanding, or the origin and connexion of the
passions in man, will acquire additional authority, if we find that the same theory is
requisite to explain the same phenomena in all other animals" (EHU 9.82; italics mine)[11].

[8] References to Hume's Treatise and Enquiry are abbreviated by "T" and "EHU," respectively, followed by
book, part, section, paragraph, and page number in the Selby-Bigge and Nidditch edition.
[9] See Huntley (1972), Reichman (2000), and Richards (1987) for exceptions.
[10] "I can see only one way of testing our conclusions. This is to observe whether the same principle by
which one expression can, as it appears, be explained, is applicable in other allied cases; and especially,
whether the same general principles can be applied with satisfactory results, both to man and the lower
animals" (1872/2009, p. 25).

When Hume insists that "we must apply the same hypothesis" to explain animal
and human behavior, there are two senses of similarity he may intend. First, there is the
weak thesis that their mental states must be of the same ontological type. For example,
Hume may be arguing against views that give a Platonic account of human beliefs but a
thoroughly empiricist account of animal beliefs. In fact, Hume applies his principle in
ways that make it clear that he has an even stronger thesis in mind. When behaviors are
similar, he argues that the same type of mental states, with the same content, must be
responsible for both. For example, he argues that an animal's prideful behavior is caused
by a sense of its own beauty (T 2.1.12.4; SBN 326), that seeing grief in another causes
sympathy (T 2.2.12.6; SBN 398), and that "from the tone of voice the dog infers his master's
anger, and foresees his own punishment" (T 1.3.16.6; SBN 178), in just the same way as
in man.
Hume's support for this principle does not rest upon assumptions about the
common ancestry of humans and animals, but rather a general principle of causal
inference. He argues that the mental states of humans and animals must be similar
because they exhibit similar effects:
'Tis from the resemblance of the external actions of animals to those we
ourselves perform, that we judge their internal likewise to resemble ours; and the
same principle of reasoning, carryed one step farther, will make us conclude that
since our internal actions resemble each other, the causes, from which they are
deriv'd must also be resembling (T 1.3.16.3; SBN 176-177).

[11] It is interesting to note parallel claims made by De Waal (1991) in favor of mental continuity (discussed
in Sober 2012). Motivated by considerations of evolutionary parsimony, he argues, "By far the simplest
assumption regarding the social behavior of the chimpanzee, for example, is that if this species' behavior
resembles that of ourselves then the underlying psychological processes must be similar too" (298). Later,
he weakens this claim, arguing, "The most parsimonious assumption concerning nonhuman primates is that
if their behavior resembles human behavior the psychological and mental processes involved are probably
similar too" (316).

Hume states that his principle derives from the more general rule that from similar
effects, "all our principles of reason and probability carry us with an invincible force to
believe the existence of a like cause" (T 1.3.16.2; SBN 176).
On Hume's view, ethological inferences are arguments from analogy (Boyle
2003). The first premise of the argument is that bodily structures and/or behaviors are
functionally similar in man and animals. The second is that from similar effects, we must
posit similar causes. The result is the extremely strong conclusion that "where the
structure of the parts in brutes is the same as in men, and the operation of these parts also
the same, the causes of that operation cannot be different, and that whatever we discover
to be true of the one species, may be concluded without hesitation to be certain of the
other" (T 2.1.12.2; SBN 325).
Hume's principle is far too strong as a general principle of inference. First, brief
reflection will turn up obvious counter-examples to the claim that similar effects by
necessity have similar causes[12]. For example, Alison and Jamie may both give to charity,
though Alison's behavior was caused by altruistic desires and Jamie's by a desire for
praise. Further, it is not clear that all similar effects even suggest that similar causes are
responsible. For example, a buoy in the lake and an antsy child both bounce up and
down, but we are not tempted, let alone driven "with an invincible force," to posit the
existence of a similar cause of both events.

[12] This claim would be vacuously true if having similar effects is a property which suffices to make two
causes similar.

On the other hand, Hume's principle is surprisingly weak in the conclusions it
delivers about other minds. Hume's principle is a conditional; if two effects are similar,
then they must have similar causes. However, Hume does not argue at any length for the
similarity of human and animal behaviors, and though they obviously bear some
similarities to one another, the variation among them is just as obvious. Additionally,
Hume's principle does not give us a criterion for how similar two behaviors must be in
order for us to conclude that their causes are the same. Perhaps Hume's principle
recommends that from a degree of similarity in behavior or structure, we ought to infer a
proportional degree of similarity in their causes. However, this does not get us any closer
to solving the problem, unless we also have criteria for similarity of causes.
The conclusion of Hume's argument from analogy is that the causes of human
and animal behavior must be the same, so only unified psychological theories of both will
be considered adequate. Even so, this does not say anything about which psychological
theory or theories will actually meet this criterion. It is clear that Hume thinks that an
account which attributes high-level, rationalist capacities to animals is implausible, and
thus his empiricist account is the only one that will pass his test (T 1.3.16.8; SBN 178).
However, he does not argue for this claim, and it is not established by his principle alone.

4. Vera Causa Reasoning


Despite the facts that the two authors make suspiciously similar claims and that
Darwin encountered Hume's principle, I do not think that its influence can plausibly
explain Darwin's commitment to continuity. First, Hume's principle does not establish
that human and animal behaviors are in fact similar, so its influence on Darwin would not
explain why he thinks that they are continuous, despite the fact that his theory would
predict otherwise. Additionally, Darwin explicitly denies Hume's claim that similar
behaviors must by necessity have similar mental causes, as this passage indicates:
We may easily underrate the mental powers of the higher animals, and especially
of man, when we compare their actions founded on the memory of past events, on
foresight, reason, and imagination, with exactly similar actions instinctively
performed by the lower animals (1879/2004, 89; emphasis mine).
In general, the type of argument by analogy advocated by Hume is at odds with
Darwin's general philosophy of science. Many authors, most notably M.J.S. Hodge
(1977, 1989, 1992), have argued that Darwin's scientific method was largely driven by
the vera causa approach to scientific inference. This approach was prominently
forwarded by Newton and adopted by major figures in philosophy of science of Darwin's
day, including Herschel and Lyell, whom Darwin cited as major influences.
The vera causa principle, the first rule of scientific reasoning in Newton's
Principia, states that "We ought to admit no more causes of natural things, than such as
are both true and sufficient to explain their appearances" (Newton 1687/2003). This
principle was interpreted by Reid, Herschel, and other philosophers of the eighteenth and
nineteenth centuries to embody a two-step process for causal explanation. First, any
causes posited to explain some novel phenomenon must be "true causes" that are known
to operate in some well understood domain. Secondly, these known causes must be
shown to be sufficient to produce the observed effects in the new domain of interest[13].
Once a known cause has been shown to be sufficient to explain the new phenomenon, the
principle recommends that no additional causes be added to the explanation.

[13] To these requirements of existence and competence, Hodge adds a third requirement that the vera causa
must be shown to be responsible for the observed effects by ruling out alternative explanations (1977, 239).

Historians of Darwin have argued that we can make sense of his argumentative
strategy in the Origin through the lens of the vera causa view. Darwin opens the Origin
with a detailed examination of artificial selection, tracing the known causes of variation
and adaptedness in domestic breeds. This forms the basis of the second part of the book,
in which he argues that the very same causes would suffice to produce the variation and
adaptedness of naturally occurring species. Lastly, the third part of the book, covering
embryology and geographic distribution, is an argument that natural selection from a
common ancestor, rather than the separate creation of species, was in fact responsible for
producing the traits and distribution of species that we observe (Hodge 1992).
Importantly, the vera causa strategy employed by Darwin is not a simple
argument from analogy. Darwin took great pains to first uncover the causes that operate
in the process of artificial selection and trace how those causes operate on heritable
variation to produce diversity and adaptedness of domestic breeds. If his argument in the
Origin were a straightforward argument from analogous effects to analogous causes, then
it would suffice for him to argue that the effects of domestic breeding bear certain
resemblances to natural species, and therefore, that the same causes must be responsible
for both. This is not the strategy that Darwin pursues. Instead, he gives detailed
arguments to show that the causes that he has identified in artificial breeding (variation,
heritability of traits, and differential reproduction and survival) are also present in
natural populations and would suffice to produce the traits of natural species.
In an argument from analogy, the judgment of similarity between two effects is
dialectically prior to the identification of their like causes, while in a vera causa
argument, the analysis of like causes is dialectically prior to the analysis of like effects.
This ordering is exemplified by the two-step process of vera causa inference expounded
by Herschel, according to which scientific inquiry begins with the detailed analysis of the
causes of a known phenomenon, followed by the generalization of those causes to other
phenomena (Herschel 1987). According to Gildenhuys (2004), Herschel explicitly
distinguishes his view from the type of analogical process embodied in Hume's principle
in that the two analogous phenomena need not share any features in common other than
their shared causes: the "phenomena need not have any other features in common"
(Gildenhuys 2004, 597).
As we have seen, inferences using Hume's principle will be weakened to the
extent that (a) there are dissimilarities among the effects under comparison, or (b) similar
effects are likely to have dissimilar causes. Darwin's commitments in the Origin predict
that both (a) and (b) will hold for biological traits in general, which would
undermine Darwin's theory of mental continuity if he were arguing on the basis of
analogy. However, neither (a) nor (b) need inhibit a vera causa argument.
During the analysis stage of a Herschelian vera causa approach, the scientist's
work is to determine the causal bases of the observable features of some phenomenon,
and this process may reveal that some of these features are not indicative of underlying
causal structure. In an example from Herschel, the analysis of polarizing substances
shows that features such as color and weight are not causally relevant with respect to
polarization (Herschel 1987, 100; Gildenhuys 2004). Therefore, when generalizing the
causes of one substance's polarization to another, dissimilarity in color between the two
substances is no obstacle. In general, dissimilarities do not impede a vera causa argument
insofar as they are spurious with respect to the causes under consideration. Additionally,
the fact that dissimilar causes may produce similar effects does not significantly weaken
a vera causa argument, for it is incumbent upon the scientist to show that the known
causes are in fact operative in the new domain. Therefore, the presence of convergent
evolution and variation in the traits of extant organisms presents a more severe obstacle to
arguments from analogy than to vera causa arguments from human to animal minds. In
Section 5, I will argue that Darwin's inference about the nature of animal minds is a vera
causa argument in which the "true causes" of human psychology that Darwin identifies
are those contained in Hume's philosophy of mind. Darwin's commitment to continuity
falls naturally out of this theory of mind.

5. Humean Empiricism as a Vera Causa


5.1. Hume's theory of mind
To build the evidential case for a Humean influence on Darwin's philosophy of
mind, a brief exposition of Hume's theory will be helpful. In the Treatise and the
Enquiry, Hume presents his thoroughly empiricist account of human cognition. On his
view, the representational materials of cognition are of two general types. Impressions are
the vivid and raw perceptions that are delivered by the senses and emotions. From these
and only these are derived ideas, less vivid copies of the impressions, stored in memory
and used in reasoning and the imagination. Impressions and ideas may be either simple or
complex, depending on whether they allow for any separation in the mind. For example,
the idea of redness is simple, while the idea of a red cube is not, for one can separate the
perceived redness of the cube from its six-sidedness.
There are two primary faculties that operate over the perceptual raw materials of
impressions and their resulting ideas. The memory is the faculty of recalling past
impressions and ideas, and it obeys the order and form with which those perceptions
occurred in experience. The imagination, on the other hand, is not so limited and thus
allows for novel combinations of past impressions and ideas (T 1.1.4; SBN 9-10).
The faculties of memory and (to a lesser extent) imagination operate under
general principles of connection, "some associating quality by which one idea naturally
introduces another" (T 1.1.4.1; SBN 10). Impressions and their resultant ideas become
connected to one another in thought when they are experienced as resembling one
another, as contiguous, or as cause and effect (which is just a species of contiguity).
Thus, the idea of a billiard ball may naturally introduce the idea of a baseball in virtue
of their similar shapes, and the idea of a cue hitting a billiard ball may automatically elicit
the idea of the ball moving away from the cue in virtue of their being experienced
contiguously in the past. Though imagination can separate complex ideas into simple
ones and has some freedom to conjoin two ideas that were not previously experienced as
contiguous or similar, the trains of thought in imagination typically follow these general
patterns of association.
In summary, the raw materials of cognition are the impressions and ideas, all of
which are ultimately derived from experience. Through the faculties of memory and
imagination, certain perceptions naturally elicit other perceptions, in accordance with the
laws of resemblance, contiguity, and causation. On the Humean picture, these are
properly thought of as the basic laws of human psychology.
Before turning to evidence of Darwin's Humeanism, it will be helpful to explicate
Hume's view on the higher faculties of reason and abstraction, which will present the
strongest challenge for Darwin's commitment to mental continuity. Most acts of
reasoning from idea to idea proceed in accordance with the relations between their source
impressions, namely resemblance, contiguity, and causation. However, Hume admits that
it is possible for the imagination to compare with one another two ideas that were not
previously associated with one another, to see if they bear any relation. The imagination
may thus represent and consider abstract philosophical relations such as {being the
same shape as} which are divorced from particular sensory impressions.
Hume gives a deflationary empiricist account of these abstract ideas,[14] according
to which "all general ideas are nothing but particular ones, annexed to a certain term,
which gives them a more extensive signification, and makes them recall upon occasion
other individuals which are similar to them" (T 1.1.7.1; SBN 17). For Hume, abstract
ideas can be subsumed under the relations of similarity and contiguity; an abstract idea's
extension is a set of objects that have been experienced contiguously with the same label,
and in virtue of this similarity, one member of the label's extension elicits ideas of the
rest.[15]
Thus, even the highest forms of reasoning for Hume are constituted by the same
types of representation and capacities as lower forms of association. Reason in all its
forms "is nothing but a wonderful and unintelligible instinct in our souls, which carries us
along a certain train of ideas, and endows them with particular qualities, according to
their particular situations and relations" (T 1.3.16.9; SBN 179).

[14] Hume credits Berkeley for this account of abstract ideas.
[15] I will return to a more in-depth examination of the plausibility of this account in Section 7.

5.2. Darwin's Humeanism
While Darwin's empiricist leanings are evident throughout the Descent and the
Expression of Emotions, the direct influence of Hume's philosophy of mind is most
evident in Darwin's Notebooks M and N, which served as the basis of his two works on
cognitive ethology. In his notes, Darwin cites both the Treatise (M104, N101) and the
Enquiry (M155), noting that the latter is "well worth reading." Additionally, Hume and
other British empiricists would have been familiar to Darwin, having been popular within
his family, Erasmus Darwin most notably (Richards 1987).
In addition to the bibliographical evidence for Hume's influence, there is strong
textual evidence that Darwin thought of human cognition in Humean terms. Darwin
separately discusses each of the components of cognitive activity, and in each case, his
discussion has a distinctively Humean flavor. Darwin's analysis of the vera causae of
human cognition has two aspects. First, he argues that these causes suffice to explain all
human behaviors. Second, he takes pains to explain how variation among humans can be
explained by variation in the development of these Humean capacities.[16] This analysis
will thus set the stage for his argument that the true causes of human cognition suffice to
explain animal behavior and variation between humans and animals.

[16] It is clear that he takes this analysis of human variation to play an important role in explaining continuous
variation across species. Indeed, immediately after claiming that all animal cognition lies on a continuous
spectrum, Darwin adds: "Differences of this kind between the highest men of the highest races and the
lowest savages, are connected by the finest gradations. Therefore it is possible that they might pass and be
developed into each other" (1879/2004, 86).

First, Darwin has a sensationalist theory of mental content, in which every mental
representation traces back to a sensory impression. In addition to instinct, humans have
"knowledge acquired by [the] senses," where thinking consists of "sensation of images
before your eyes, or ears (language mere means of exciting association) or of memory
of such sensations, & memory is repetition of whatever takes place in brain when
sensation is perceived" (M61e, M62e). That Darwin's view was influenced by Hume's
theory in particular is evidenced by his citation of Hume in his later remark that "as some
impressions (Hume) become unconscious so may some ideas i.e. habits, which must
require idea to order muscles to do the actions" (M105).
Already, we can see how Darwin thought of the memory as a faculty which
merely recalls past impressions. Additionally, he sees memory as governed by laws of
association, though he gives a somewhat different account than does Hume. He writes,
"an intentionally [sic] recollection of anything is solely by association, & association is
probably a physical effect of brain, the thoughts being functions of the same part of brain,
or the tendency to habit of producing a train of thought" (M46). Elsewhere, remarking on
Mackintosh's discussion of association, Darwin suggests that a physical linkage in the
brain between two stored ideas may explain why one naturally suggests the other (N89,
Mackintosh 1837).[17] Nevertheless, he believes that memory is largely an instinctual
process that, even when conscious, obeys rules of association that are largely out of the
control of the thinking subject. With respect to imagination, recall that Hume
distinguished the automatic elicitation of related ideas in typical acts of imagination from
the conscious, labored comparison of previously unconnected ideas through the
philosophical imagination. Like Hume, Darwin emphasizes the former but admits that the
latter faculty may be possible.[18]

[17] For Hume, the laws of association are brute laws of human psychology for which no deeper explanation
needs to be given. Darwin, then, presages modern neuroscience in seeking physiological explanations for
psychological regularities.
[18] "The Imagination is one of the highest prerogatives of man. By this faculty he unites former images and
ideas, independently of the will, and thus creates brilliant and novel results ... The value of the products of
our imagination depends of course on the number, accuracy, and clearness of our impressions, on our
judgment and taste in selecting or rejecting the involuntary combinations, and to a certain extent on our
power of voluntarily combining them" (1879/2004, 95; emphasis mine).

Of greatest interest is Darwin's analysis of the faculties of reason and abstraction,
which for him are but species of memory and imagination, and how variation in these
faculties among humans is to be explained. He hints at his Humean conception of reason
in the Descent, stating that the mere association of ideas is intimately connected with
reason (ibid., 97). His notebooks, however, give a fuller picture of the development of
Darwin's thought on the matter.
Darwin states that "Reason in its simplest form probably is single comparison by
senses of any two objects they by vivid power of conception between one or two absent
things reason probably mere consequence of vividness & multiplicity of things
remembered & the associated pleasure &c accompanying such memory" (N21e). Darwin
then recommends adopting Abercrombie's definition of reason, according to which
higher forms of reasoning such as mathematical or philosophical reasoning are
essentially the same as these simple associations of ideas (Abercrombie 1838).
He applies this definition from Abercrombie to a case that appears to be a direct
precursor to a key example in the Descent. Consider a cart horse reasoning about the best
way to descend a steep hill. Presumably, the horse does so by remembering past
experiences of descending hills. Darwin suggests that if the horse were to develop from
this into a theory of friction & gravity, it would not be doing anything different in kind
but only rather more steps of the same process (M141). Therefore, if we wish to label
as reasoning the development of a theory of gravity, we ought to apply the same label
to the first step of the process, which is the mere association of ideas.
The effect of Hume on Darwin's view of reason is most clear in a passage in
Notebook N. After citing Hume's chapter on the reason of animals, Darwin states, "I
suspect the endless round of doubts & skepticisms with respect to the evolution of
reason might be solved by considering the origin of reason as gradually developed; see
Hume on Skeptical Philosophy" (N101).
Hume's primary concern in that chapter (Chapter IV:1 of the Treatise) is to show
that epistemic skepticism will not penetrate into our everyday, common judgments.
Hume notes that skepticism about one's epistemic faculties leads to an infinite regress
which threatens to destroy justification for any of our beliefs.[19] Hume responds that if
beliefs were the result of purely rational judgments, then skepticism would indeed intrude
on all of our common beliefs. However, our psychological capacity for reason is not
necessarily sensitive to higher-order judgments about the reliability or justification of our
beliefs. He writes:
Nature, by an absolute and uncontrollable necessity has determined us to judge
as well as to breathe and feel; nor can we any more forebear viewing certain
objects in a stronger and fuller light, upon account of their customary connection
with a present impression, than we can hinder ourselves from thinking as long as
we are awake ... all our reasonings concerning causes and effects are derived from
nothing but custom; and that belief is more properly an act of the sensitive, than
of the cogitative part of our natures (T 1.4.1.8; SBN 183).
Higher-order reasoning (reasoning about metaphysics or abstract principles) is built out
of the simplest forms of reasoning, and takes the simplest generalizations from custom as
its materials. Therefore, in Hume's passage Darwin finds support for his claim that even
the highest forms of reasoning are just a series of steps, not leaps, from brute association.

[19] Suppose that one believes that q is probable (though not certain) given p. Then, one considers the
probability that one's judgment that q given p will be reliable, considering that one's past judgments have
been fallible. Skepticism about one's ability to draw true inferences should lower one's credence that q
actually follows from p. Similarly, skepticism about one's ability to draw inferences should cause one to
doubt their judgments about how their past reliability bears on the inference from p to q, and so on.

In that chapter, Darwin may have also found a promising psychological
explanation for variation in reasoning capacities in humans. According to Hume,
increasing levels of attentional resources are required for reasoning as the relations
involved become more abstract. The force of higher-order reasoning is diminished in
comparison to the associations given by experience because "attention is on the stretch"
and the connective force between ideas is weakened (T 1.4.1.10; SBN 185).
Variation in attention is one way, then, to explain variation in reasoning
capability. Darwin seems to affirm this account, noting that "Hardly any faculty is more
important for the intellectual progress of man than Attention" (1879/2004, 94). Humans
certainly vary in their capacity to focus on particular objects or relations at the expense of
others, and only those individuals with excellent attention will be capable of higher
forms of reasoning.
Darwin also uses variation in memory, experience, and the power of association
(the disposition to form a connection between contiguous or similar impressions) to
explain differences in human reasoning. In a telling passage in Notebook N, Darwin
notes that there is a great difference between the judgments formed by adults and children
(N90). He argues that we cannot explain this difference by positing that the man has
reason and the child lacks it. On Darwin's view, reason itself is invariable. Rather, the
difference between the two is that the man has had much more experience of the world.
For experience to explain how the child gradually progresses to the reason of man, those
experiences must influence his judgments, and they can only do so if the child had a prior
capacity for reason.
Thus, I conclude that Darwin held a thoroughly Humean philosophy of mind,
according to which all cognition is built of the same basic representative constituents
(ideas and impressions) and faculties (memory, imagination, and attention), and is
governed by the same laws of association. Variation in behavior should not be explained
by positing that one of these capacities is present or absent; the difference between a
child and a genius like Newton is not due to some special faculty in the latter. Rather,
differences in reasoning ability are explained by variation in the other Humean faculties,
such as memory or attention.

5.3. The Existence of the vera causae in animals


Having established the vera causae of human cognition, Darwin next argues for
the existence of each of these causes in animals. First, Darwin takes it as manifestly true
that animals have many sensory capacities in common with man, and therefore, "as man
possesses the same senses as the lower animals, his fundamental intuitions must be the
same" (1879/2004, 87). Additionally, animals exhibit the ability to feel pleasure and pain,
and have the same emotional states, including higher emotional states like love, agony,
vengeance, sympathy and curiosity (ibid., 89).
As evidence of animals' remarkable capacity for memory, Darwin cites the case
of a dog that recognized him after an absence of five years, which he explains in very
Humean terms:[20]
I went near the stable where he lived, and shouted to him in my old manner; he
shewed no joy, but instantly followed me out walking, and obeyed me, exactly
as if I had parted with him only half an hour before. A train of old associations,
dormant during five years, had thus been instantaneously awakened in his mind
(ibid., 95).

[20] Darwin (ibid.) discusses a similar case of recognition in ants, showing just how evolutionarily deep he
thinks the capacity of memory is.

The case for imagination in animals is less obvious, and Darwin's arguments for it
strain credulity. Darwin argues that the vivid dreams of dogs, cats, horses, and birds
demonstrate the involuntary combination of ideas.[21] More implausibly, Darwin suggests
that dogs howl at night because "their imaginations are disturbed by vague outlines of the
surrounding objects, and conjure up before them fantastic images; if this be so, their
feelings must be called almost superstitious" (ibid., 96).
Darwin also believes that animals clearly have the power of attention, which is
demonstrated by a cat's single-minded focus whilst stalking prey. Darwin again
emphasizes the importance of attention in explaining cognitive variation, offering as
support a monkey trainer's opinion that the capacity for attention is the best predictor of a
monkey's performance (ibid., 95).
Of greatest interest here is Darwin's argument for the presence of reason in
animals. As we have seen, Darwin believed that all reasoning was of the same basic kind:
the association of ideas that trace back to sensory impressions. This commitment is
carried over to the Descent, where Darwin lays out his case for attributing to animals
reasoning that is different merely in degree from that of humans.

[21] Darwin puzzles over the relationship between dreaming and the imagination throughout Notebooks M
and N. He notes how difficult it is to intentionally construct "castles in the air" through imagination, yet the
fanciful creations of dreaming do not require such effort (M102e, M103, M34, M35).

In a key passage (one that likely has as its direct precursor his discussion of the cart
horse in Notebook M 151), Darwin relates the story of Houzeau, who, while crossing an
arid plain in Texas, noticed that his two thirsty dogs repeatedly ran to hollows in the
ground in search of water (ibid., 97). There were no signs of water in the hollows (no
smell of damp earth, vegetation, etc.), so the dogs' behavior must have been caused by a
prior belief that water can be found in low ground.
Darwin suggests that if we saw some man (in his example, a savage) perform the
same behavior, we would interpret it as resulting from an act of reason. Of course, critics
of strong continuity with animals would argue that the man's act of reasoning may be
much more sophisticated, perhaps proceeding via general principles (e.g., "Water is
always found in low ground") or even via scientific knowledge (e.g., "Gravity causes
liquids to settle in the lowest part of a container"). Darwin dispels this explanation by
giving his own Humean account of the savage's behavior:
The savage and the dog have often found water at a low level, and the
coincidence under such circumstances has become associated in their minds ...
and in both it seems to be equally an act of reason, whether or not any general
proposition on the subject is consciously placed before the mind ... The savage
would certainly neither know nor care by what law the desired movements were
effected; yet his act would be guided by a rude process of reasoning, as surely as
would a philosopher in his longest chain of deductions (ibid., 98).
There are three things worth noting about this passage. First, Darwin begins by
identifying the vera causa of the human's behavior: the belief that water is found in low
ground is based on the contiguity of past impressions of water and low ground, from
which an association between the two was formed. Second, he argues that the behaviors
of the dog and man are equally explicable on that hypothesis; in other words, the true
cause of the savage's behavior suffices to explain a similar behavior in the dog. Third,
though Darwin acknowledges that a cultivated man may reason via a general law,
and thus that there are some differences among him, the savage, and the dog, these are
differences in degree rather than kind.
In the same section of the Descent, Darwin gives a generalization of this
inferential strategy. To motivate it, he discusses an experiment in which a pike was
separated from other fish by a transparent pane of glass in the aquarium, such that when
the pike lunged to attack the other fish, he ran into the glass and was stunned. After
months of such efforts, the pike eventually learned not to attack the fish on the other side
of the aquarium, and this habit persisted even when the glass partition was removed.
Darwin then asks the reader to consider similar behaviors in monkeys and
savages. He notes that monkeys too will learn to avoid a painful stimulus, though they
typically require many fewer trials than did the pike. Likewise, he posits that "if a savage,
who had never seen a large plate-glass window, were to dash himself even once against
it, he would for a long time afterwards associate a shock with a window frame," though,
unlike the pike, the man "would probably reflect on the nature of the impediment, and be
cautious under analogous circumstances" (ibid., 97).
Darwin argues that though there is variation in the number of painful experiences
the pike, monkey, and man require to learn to avoid the glass and in the nature of their
beliefs about the glass, this variation should be explained by positing variation in the
same underlying cause of the behavior, rather than different causes altogether:
If we attribute this difference between the monkey and the pike solely to the
association of ideas being so much stronger and more persistent in the one than
the other ... can we maintain in the case of man that a similar difference implies
the possession of a fundamentally different mind? (ibid.).
The upshot is a general principle of cognitive ethological inference. If we explain
variation in behavior along some dimension between species A and B via variation in
some underlying psychological cause, P, and we observe that species C's behavior varies
from B's along the same dimension, then we also ought to explain the variation between
B and C via variation in cause P instead of introducing some wholly new cause to
account for the difference. This general principle is a mere formalization of the vera
causa strategy.[22] The association of ideas sufficiently explains the behaviors of the pike
and monkey and the degree of variation between them. Therefore, if the same cause is
present in humans and suffices to explain variation between the monkey and human, we
ought to posit that known cause and no others.

5.4. Continuity
From Darwin's Humean theory of human cognition, it is a short leap to his
conclusion that human and animal cognition is highly continuous. Darwin believed that
associations of ideas derived from sensory impressions underlie all human cognition and
that varying strengths of expression of the Humean causes of cognition suffice to explain
the variation in human behavior, from the behavior of infants to that of the uncultivated
man to the highest cognitive behavior of a Newton. The same causes exist in animal
cognition. Since the variation between animals and humans falls within the range of
variation among humans, the vera causa view suggests that animal behavior, and
variation thereof, can be explained via the same underlying causes as human behavior.

[22] In this passage, Darwin first claims that we would explain the pike's and monkey's behaviors via the
association of ideas, operating with varying strength. Then, he argues that if we account for variation
between the pike and monkey that way, we should extend that explanation to account for variation between
the human and monkey. While this subverts the order of the vera causa strategy I have attributed to Darwin,
it still depends heavily on the particular theory of human cognition that he adopts. If he hadn't already
established that all reasoning is based on the association of ideas via similarity or contiguity, then his
explanation of the pike's and monkey's behaviors in those terms would be unmotivated.

Darwin was quite aware that humans achieve superlative levels of cognitive
performance, so if any behaviors required the postulation of unique causes, they would be
human powers of reason and abstraction. By adopting a philosophy of mind that reduces
all human psychological capacities to Humean empiricist components which animals
share, Darwin dispenses with the prima facie obstacle of explaining how those
extraordinary capacities arose. It is not parsimony that drives Darwin to deny that the
causes of animal and human behavior may be of different kinds. Rather, in his Humean
picture of mind, there is no room for genuine mental discontinuities.
On the other hand, if one rejects the Humean picture of human psychology, it is
an open question whether there are any causes of human behavior that are not shared by
other animals. If there are human cognitive behaviors that Humean psychology cannot
sufficiently explain, then it may be necessary to admit new psychological causes into our
ontology. If these behaviors are uniquely human, then there will be a genuine
discontinuity in mental faculties across the animal kingdom. We will now turn to one
such argument from Morgan, which he uses to motivate his conflicting Canon of
cognitive ethological inference.

6. Morgan's Hierarchy of Psychical Faculties


In contrast with the principles of Darwin and Hume that instruct comparative
psychologists to offer unified explanations of animal and human cognition, Morgan's
principle has been interpreted as a recommendation that disunified explanations should
often be favored. Interestingly, Morgan does not disagree with the assumption that
cognitive ethographic inferences must start with an examination of human psychology.
He writes, "we are forced, as men, to gauge the psychical level of the animal in terms of
the only mind of which we have first-hand knowledge, namely the human mind. The
question, then, is how are we to apply the gauge?" (Morgan 1896, 55).
Morgan offers his Canon as a principle that should guide this induction from
human psychology to animal psychology. Though it is found in several forms throughout
his works, its most famous formulation comes from his Introduction to Comparative
Psychology:
In no case may we interpret an action as the outcome of the exercise of a higher
psychical faculty, if it can be interpreted as the outcome of the exercise of one
which stands lower in the psychological scale (ibid., 53).
One tempting way to read his Canon is as a principle of parsimony, in which simpler
explanations (those in terms of lower faculties) are to be preferred over explanations
that posit more complex causes. Though many commentators have adopted this
interpretation, this is not how Morgan himself viewed his Canon. Responding to
criticisms of his principle, Morgan suggests that it would actually be simpler to explain
animal behavior via the higher psychical faculties that humans exhibit.[23] Even so, he
writes, "surely the simplicity of an explanation is no necessary criterion of its truth"
(ibid., 54).
Instead, I will argue that the argument for the discontinuity of animal and human
minds which motivates Morgans Canon is a vera causa argument, similar in form to the
one Darwin utilizes but differing in the true causes of human behavior that it identifies.
To begin to motivate this claim, it will be helpful to consider an addendum to the Canon
from Morgans later work, Animal Intelligence:

23

In this passage, Morgan does not state why he thinks that explanations in terms of higher faculties are
simpler. Notably, he does not argue that they would be simpler in virtue of their unifying powers. I will
return to an explication of the sense in which Morgan thinks they are simpler.

33
    The canon of interpretation which I have elsewhere suggested is, that we should
    not interpret animal behavior as the outcome of higher mental processes, if it can
    be fairly explained as due to the operation of those which stand lower in the
    psychological scale of development. To this it may be added, lest the range of
    the principle be misunderstood, that the canon by no means excludes the
    interpretation of a particular act as the outcome of the higher mental processes, if
    we already have independent evidence of their occurrence in the agent (Morgan
    1920, 270-271).
In other words, Morgan's Canon is a prescription for what to do in the absence of
evidence for a particular psychological faculty, not a prescription for what would
constitute such evidence.
This clarification is important, for it rebuts the interpretation according to which
explanations in terms of lower faculties are always to be preferred. It also implies that
Morgan did not think that there could never be evidence of the existence of higher
faculties in animals. It may be read as a requirement that comparative psychologists must
independently establish the existence of a known mental faculty in an individual or
species before using that faculty to explain her behavior; this is but a demand that they
satisfy the second step of a vera causa inference.
Morgan's ultimate conclusion in his comparative psychological works is that
there is in fact no evidence of the highest psychical faculties in any non-human animals,
and thus, that there is a genuine discontinuity between animal and human minds. To see
how he arrives at this view and the role the Canon plays in his inference, we must first
get clear on Morgan's account of the known causes of cognition, his hierarchy of
psychical faculties.
Nearly all recent commentators on Morgan's Canon have focused solely on the
Canon and its applications in comparative psychology without considering the role that it
plays in Morgan's comprehensive theory of psychology. In a representative example,
Fitzpatrick (2008) states:
    One thing in particular that we need to be clear about is the distinction between
    higher and lower psychical faculties. Morgan himself was vague about what
    he meant by these terms and about the nature of the psychological scale.
    Subsequent theorists have done little to clarify these notions (226).
While Fitzpatrick is entirely correct about subsequent theorists, Morgan himself is quite
clear about his usage of those terms.
Throughout his major works, Morgan argues for a tripartite distinction of mental
faculties, in which the existence of faculties higher in the scale entails the existence of
lower ones. From lowest to highest, his ontology of mental faculties can be briefly
characterized as follows (Morgan 1896, Morgan 1920):
1. Instinct: innate and purely mechanical motor responses, where any consciousness
that accompanies these responses is epiphenomenal in producing the behavior
(1896, 206).
2. Intelligence: the training of an instinctual behavior via learned associations
between the (dis)pleasure accompanying performance of the behavior and
observed features of the environment, where consciousness affords data that gives
the animal selective control over instinctual behaviors (ibid.; 208, 212).
3. Reason: conceptual thought operating on explicit representations of the
associations that govern intelligent behavior.
We can illustrate this distinction through an example that Morgan discusses at length. In
his many experiments on chicks, Morgan observed that immediately after hatching,
chicks reflexively and indiscriminately pecked at objects, including food objects but also
spots on the ground and their own feet. These peckings are produced by the lowest
psychical faculty of instinct, as "there can be very little question that the motor response
is, as we are apt to say, purely mechanical" (ibid., 206).

Some of these peckings resulted in the chicks having pleasurable sensations, such
as the taste of an edible worm. Others resulted in displeasurable sensations; for example,
Morgan presented his chicks with hive bees, which produce a distasteful poison. After a
few instances of reflexive peckings at the hive bees, the chicks associated the sight of a
hive bee with a displeasurable experience and subsequently avoided pecking at them.
According to Morgan, "when an instinct is, as so often is the case, modified and adapted
to meet new circumstances, the modification and adaptation is no part of the instinct as
such, but is due to intelligent control" (ibid., 208). Thus, it is clear that the possession of
intelligence entails the possession of instinct, as instinctual behaviors provide the data to
consciousness from which associations can be formed.
It is worth noting what types of evidence would require an explanation in terms of
intelligent behavior and thus be admissible under the Canon. Morgan writes:
    We must not be too ready to put down to instinct all the habits of animals, even
    definite habits common to a species, without taking the trouble to ascertain by
    careful observation and experiment how far these habits, though based on an
    innate capacity for motor response, are rendered definite through imitation,
    parental teaching, or tradition (ibid., 210).
Instinctual behaviors are inflexible, and as long as they remain instinctual, are not
modified by conscious experience[24]. If we see a change in the targets of behaviors in an
individual over time, for instance a preferential targeting of pecking away from hive bees
and toward edible worms, then this is evidence that a learned association is mediating the
subject's behavior.

[24] These behaviors may become more refined due to the exercise of the relevant nerves and muscles.
However, there is no change in the targets of these behaviors.

Morgan gives a largely Humean account of the laws governing the intelligent
stage of psychical development. Like Hume, he believes that associations among
impressions are governed solely by their contiguity or similarity within experience and
that the capacity to form associations is innate, though the specific associations that are
formed depend entirely on sensory impressions (ibid., 89-90). Intelligent behavior
reaches only as far as the data given by experience, obeying the laws of contiguity and
resemblance. As a result, the attribution of intelligent behavior is only plausible if the
subject has the requisite experience to furnish it with the learned association. Thus, one of
the hallmarks of intelligence, for Morgan, is its dependence on trial and error learning.
He writes, "Association links cannot be formed vicariously; all the associations must be
established by the individual himself for his own individual guidance" (ibid., 90).
On the intelligence hypothesis, the chick's avoidance of the hive bee is the result
of the automatic elicitation of an associated distasteful impression. However, we could
also explain the chick's behavior by positing that it had formed the explicit belief that
"All hive bees are distasteful," and upon seeing a novel hive bee, it inferred that
"Therefore, this hive bee will be distasteful, and thus I will avoid it." In the first case, the
chick acts in accordance with a learned association, though the association itself ("All
hive bees are poisonous") need not be represented. In the second, the chick explicitly
represents the learned association between hive bees and poison, and its belief about the
association itself causes its subsequent behavior. This latter explanation attributes the
highest psychical faculty, reason, to the chick.
According to Darwin, both of these processes would be instances of reason, with
the latter reachable by a small series of steps from the former. However, according to
Morgan, this latter type of reasoning faculty is a wholly distinct faculty. He may have
Darwin in mind when he writes:

    To this process of practical inference through association in sense-experience,
    the term "reasoning" is applied by many writers ... There is, however, a narrower
    use of the word "reasoning" ... Reason, in this narrower sense, implies the power of
    perceiving, and conceiving the logical relation as such. This is the sense in which
    I use the words "reason" and "reasoning," reserving the word "intelligence" for
    the faculty in virtue of which inferences are suggested in the field of
    sense-experience. On this view, if we wish to determine whether an inference is rational
    or intelligent, we must inquire whether the logical relation is clearly perceived or
    not (ibid., 283).
We can characterize Morgan's hierarchy of faculties in terms of the
representations they involve. An instinctive behavior need not be caused by any
representation of features in the environment. For example, I do not need to be
consciously aware of the fact that the doctor is hitting my knee with a mallet in order for
my leg to jerk in response. Even if I am so aware, my perception merely accompanies my
reflexive action and does not cause it.
Intelligent behavior does require the subject to store and remember
representations of observable environmental features. For example, in order for past
experience of hive bees to cause a chick's behavior, the impression of a hive bee must elicit
the impression of a bad taste. This requires that the chick have stored representations of
each impression, which have become associated through experience.
Just as instinct provides the representations upon which intelligence operates,
intelligence provides the representational materials for reason. Intelligence consists of
associations between sensory impressions. A reasoning subject explicitly represents those
associations, and this representation of the relation that holds between two ideas can be
stored as its own idea and can become combined with other representations.
In a key point of departure from Hume's theory of reason, for Morgan, the ideas
of reason are not beholden to the laws of association via contiguity or resemblance:
    Sense-experience affords the raw material out of which all our higher conceptual
    thought is elaborated. But only in the light of reflection are the relations which are
    involved perceived and conceived as such ... when through reflection and the
    perception of relationships, we rise above the plane of sense-experience, when to
    associations by contiguity or by superficial resemblance, there are added
    associations through perceived similarity of relations (ibid., 284).
According to Morgan, this feature of reason, that it is not restricted to previously learned
associations, makes it responsible for the capacities of insight (anticipation of novel
events), descriptive communication, and speculative scientific reasoning. In the next
section, I will explain Morgan's arguments for the discontinuity of reason and
intelligence. First, though, I will defend the claim that Morgan intended his Canon to
apply only to these three faculties.
Morgan's Canon has been taken by comparative psychologists to license an
across-the-board preference for simpler explanations, in many different senses of
"simpler"[25]. However, Morgan's own application of his Canon is surprisingly rare and is
restricted to the three-part hierarchy that I have sketched above. A few examples will
suffice to illustrate how Morgan actually utilizes his principle.

[25] For example, simpler cognitive mechanisms might be ones which are less flexible, less cognitively
sophisticated, or more economical of resources. See Meketa (2014) for a critical discussion.

In his discussion of memory, Morgan presents the case of an elephant who was
given a sandwich of cayenne pepper by a malicious Captain Shipp. Six weeks later, Shipp
visited the elephant, who took the opportunity to fill his trunk with water and drench the
Captain. Morgan considers two explanations for the elephant's behavior. The lower
explanation is that the sight of the Captain automatically elicited a painful or angry
impression in the elephant, which caused him to attack the Captain. This type of memory,
which Morgan calls desultory memory, is the Humean capacity of memory to trigger
associated impressions in relevant circumstances.
The alternative explanation Morgan considers is that the elephant had
remembered the Captain's trick as taking place at a specific time and place, noting its
relation to other times and places, and that it was remembered not in virtue of its association
with other facts or experiences by contiguity or by mere resemblance, but "because it is
seen to fit into a particular place in a definitely organized scheme, to have meaning in
reference to a system of knowledge" (ibid., 119). This capacity of systemic memory
accompanies only the highest faculty of reason, according to Morgan, since it requires the
representation of temporal and spatial relations themselves.
Both explanations would suffice to explain why the elephant sprayed the Captain,
so Morgan applies his Canon here. He asks, "Can the phenomena of memory in animals,
so far as we can observe and infer, be explained in terms of reinstatement, or must we
infer a further process of time-localization? If they can be explained in terms of mere
reinstatement, we are bound by the canon not to assume any higher process." In the
case of the elephant:
    All that the facts warrant us in concluding is that the sight of the captain gave
    rise, by association, to a reinstatement of a previous occurrence. So long as the
    association had not faded from memory, it mattered not whether it was six weeks,
    six days, or six months ago; and there is no reason for supposing that the elephant
    in such a case localized the event in time, in even the most rudimentary way
    (ibid., 120-121).
In this case, Morgan considers two sufficient explanations, one requiring mere
intelligence and the other requiring the representation of time relations, and uses his
Canon to come down in favor of the former.
Another of the rare employments of Morgan's Canon is shown in his lengthy
discussion of his efforts to teach a dog to pass a stick through a narrow fence opening.
The stick would catch on the rails if the dog grasped it in its jaws at its center point and
tried to walk through the opening, but it would pass if the dog grasped it at the end knob
and rotated it. Despite many trials and Morgan's demonstrations of the correct solution to
the task, the dog only succeeded after it chanced across the right method in one of his
many trials, and even afterward, it continued to attempt to pass the stick by grabbing it at
its center.
Morgan concludes that though a dog could probably be taught the trick after many
trials and rewards, the dog showed no evidence of understanding the relationship between
the angle of the stick and its ability to pass through the opening. He writes:
    What the observations show, so far as the dogs observed are concerned, is that
    their way of dealing with the difficulty is the method of trial and error, which is
    the method of sense experience. In other words the facts observed can be
    completely explained on the hypothesis that there is sense-experience only. The
    perception of relations as such is not necessary to the performances, and is
    therefore by our canon of interpretation excluded (ibid., 259).
Again, the Canon is used to contrast an explanation in terms of Humean intelligence with
one in terms of the representation of relations[26].

[26] This latter case also illustrates the sense in which Morgan takes higher psychological explanations to be
simpler. A passer-by who witnessed a trained dog successfully grasping the stick by its end, but who had
not observed the lengthy trial-and-error process by which it was learned, would likely infer with delight
that the dog understood the mechanics of the problem and devised a creative solution. This explanation
posits that the dog's behavior was reached by a single step of insight or understanding. However, the true
explanation of the dog's behavior involves hundreds of steps of trial and frustrating error and a stroke of
luck in chancing on the right solution. Though false, the passer-by's explanation is simpler in terms of the
number of causal steps it posits to explain the performance of the trick.
Elsewhere, Morgan makes it clear that it is this sense of simplicity he has in mind when he admits
that explanations which attribute reason to animals will often be simpler. By analogy, Morgan suggests that
"the explanation of the genesis of the organic world by direct creative fiat, is far simpler than the
explanation of its genesis through the indirect method of evolution," and that it is simpler to explain the
Grand Canyon by positing one sudden rift than millions of years of erosion (1896, 54). In each case, the
simpler explanation is the one that makes the outcome the result of a single cause or single step, while the
more complex explanation requires the tracking of several different causes or many iterations of the same
cause.

Perhaps even more illuminating are those cases in which Morgan does not apply
his Canon, despite the fact that he is comparing simpler and more complex mental
faculties. In his discussion of association by resemblance, Morgan considers two
explanations for why chicks that had learned that hive bees were distasteful also avoided
pecking at similar-looking striped flies. The first is that the chicks mistook the flies for
bees, or rather that the impression of stripes alone was enough to trigger the idea of a bad
taste. The second is that the sight of the fly triggered, via resemblance, an idea of the bee,
which in turn elicited, via contiguity, the idea of a bad taste. The first explanation
attributes only two impressions to the chick and a single type of association, while the
second attributes three impressions and two different types of association. Thus, the
former explanation is simpler on two counts, and we might expect Morgan to apply his
Canon in favor of it.
However, Morgan does not do this. With respect to the second explanation, he
writes:
    We shall find it difficult to obtain evidence of a satisfactory kind of the existence
    of this mode of association in animals. Not that this necessarily shows that such
    associations by resemblance are absent from the mental processes of animals. It is
    quite possible, nay more, exceedingly probable, that they may frequently occur
    (ibid., 98).
There is a marked contrast between this case and those where Morgan employs his Canon
to deny that animals have higher mental states. Morgan does not apply his Canon here
because he is comparing two explanations that posit psychical faculties at the same level,
intelligence. Recall the addendum that Morgan made to his Canon, that "the canon by no
means excludes the interpretation of a particular act as the outcome of the higher mental
processes, if we already have independent evidence of their occurrence in the agent"
(Morgan 1920, 270-271). Since we already have evidence for the presence of intelligence
in animals, the Canon does not preclude us from attributing different sorts of intelligent
behavior to them, even when doing so is less parsimonious.

It is clear that Morgan does not take his Canon to license an across-the-board
preference for lower psychological explanations. His Canon is only meant to
preclude explanations in terms of reason when intelligence alone would suffice or in
terms of intelligence when instinct alone is sufficient. However, the justification for this
more restricted form of the Canon is still in question, and it is to that question that we
will now turn.

7. Morgan's Argument for Discontinuity in Reasoning


The best way to elucidate Morgan's argument for discontinuity is by contrast with
Darwin's argument for continuity. Recall that in the first step of his vera causa argument,
Darwin established that all human cognition is constituted by Humean associationism,
and we can explain all variation in human cognitive behavior through variation in the
exercise of memory, attention, and the power of association. From there, it was but a
short leap to show that the existence and variation of the same underlying causes could
explain the smaller amount of variation between humans and animals.
The structure of Morgan's argument for discontinuity also bears the hallmarks of
the traditional vera causa approach. Morgan first argues that Humean associationism
(Morgan's "intelligence") is responsible for a large class of human behaviors[27]. The vera
causa approach recommends that if a behavior performed by an animal is caused by
intelligence when performed by humans and we have evidence that animals possess
intelligence, then we ought to explain the animal's behavior by positing the same cause
and no other causes that (a) are not entailed by intelligence, and (b) are not necessary to
explain that behavior. In this light, Morgan's Canon is seen as a straightforward
embodiment of the vera causa view.

[27] "Sense-experience and association afford the basis of a great number of expectations of the greatest
practical value in the conduct of life ... A very large proportion of our practical inferences in daily life are
of this intelligent type" (Morgan 1896, 281-282).
So far, Morgan's approach is identical to Darwin's. The difference arises from
Morgan's argument that there is a small class of human behaviors that cannot be
explained by Humean associationism. Therefore, there is a real discontinuity in human
behavior, and to explain this discontinuity, we must posit a distinct psychological faculty
of reason. Most importantly, Morgan argues that none of the human behaviors that are
caused by reason are exhibited by animals. The faculty of reason, though a true cause of
human behavior, is not necessary to explain any animal behaviors, and therefore, it
should not be attributed to them. The discontinuity within human behavior thus marks a
phylogenetic discontinuity as well. In a striking contrast with Darwin, Morgan writes,
"the extraordinary difference between men, even the lowest, and animals, even the
highest, is due to the introduction of the new factor involved in the perception of relations
and conceptual thought" (1896, 293). The two key parts of Morgan's argument are that
Humean associationism cannot explain an important set of human behaviors and that
there is no evidence that animals exhibit any behaviors in that set. I will consider these in
turn.
Throughout my discussion of Morgan's distinction between intelligence and
reason, I have gestured to a distinction between acting in accordance with a relation and
explicitly representing the relation itself, but this distinction requires some unpacking in
order to make sense of Morgan's argument that the latter capacity is of a different kind
than the first. Hume himself distinguishes between two senses of relation. He writes:
    The word relation is commonly used in two senses considerably different from
    each other. Either for that quality, by which two ideas are connected together in
    the imagination, and the one naturally introduces the other ... or for that particular
    circumstance, in which, even upon the arbitrary union of two ideas in the fancy,
    we may think proper to compare them (T 1.1.5.1; SBN 13).
The first type, then, is marked by the natural elicitation of one idea by the other, governed
by the laws of association by resemblance, contiguity, and causation. Through the
imagination, he allows, we can also hold two ideas before the mind to see whether they
bear other sorts of relations to each other, such as similarity or degree. This latter
capacity requires that the relation itself be represented, such that it can freely be
combined with and used to evaluate previously unconnected ideas. Thus, Hume's
distinction between natural and philosophical relations seems to mirror that between
acting in accordance with some relation in the world and representing the relation itself.
Morgan's claim is that reasoning about philosophical relations requires a different
kind of mind (one that only humans possess) than merely obeying natural relations. To
begin to motivate his argument, we can first point out an ambiguity in how Hume thought
of natural relations (Inukai 2010). For example, suppose that the impression of thunder is
always experienced contiguously with the impression of lightning, such that the idea of
one naturally introduces the other. What is the quality by which the ideas of lightning and
thunder are connected together in the imagination?
One interpretation is that the phenomenon of lightning, out there in the world, is
actually contiguous with thunder, and it is in virtue of this real relation that they bear to
one another that their ideas become connected in the mind. Speaking loosely, the fact that
lightning and thunder are actually contiguous causes (this is one sense of "by which")
them to become associated in the mind. On this view, their contiguity is present in the
experience as a relationship among the objects of perception; that is, the relation is part of
the impression. Another interpretation of Hume's view is that contiguity is a relation that
holds between the ideas of lightning and thunder. The relation is not part of the
experience; the contiguity of lightning and thunder is constituted by (another sense of "by
which") the connection in the mind by which one idea naturally elicits the other. On this
reading, the relation of contiguity is not present in the impression but a fact about how
experiences are related in the mind.
By Hume's lights, one can have an idea of something only if there was a
corresponding impression of that thing, given in experience. Therefore, to have an idea of
a pure relation, that relation must have been present in the impressions of lightning and
thunder, ruling out the second interpretation. Inukai (2010) argues persuasively that
Hume did indeed view natural relations as consisting in the objects of experience, where
it is in virtue of these real relations that ideas become associated in the mind. For Hume,
relations are constituents of complex impressions and ideas, which the imagination can
isolate through the distinction of reason. His discussion of our idea of extension is
illustrative:
    The table before me is alone sufficient by its view to give me the idea of
    extension. This idea, then, is borrow'd from, and represents some impression,
    which this moment appears to the senses. But my senses convey to me only the
    impressions of colour'd points, dispos'd in a certain manner. If the eye is sensible
    of any thing farther, I desire it may be pointed out to me. But if it be impossible to
    shew any thing farther, we may conclude with certainty, that the idea of extension
    is nothing but a copy of these colour'd points, and of the manner of their
    appearance (T 1.2.3.4; SBN 34).
The shape of the table is present in the impression, though it is not a separate impression
from that of the colored points which make up the table. It is the manner of appearance
of the points, which we may interpret as the relation that the points bear to one another.
The shape of the table, then, is a first-order perceptual relation which has simple sensory
features (colored points) as its relata.
While Hume can give an account of how we come to represent first-order
relations, can he account for the truly abstract relations-of-relations that Morgan takes to
be constitutive of the highest forms of human reason? To examine this question, let us
consider how Hume could explain our ability to reason about the relation {being the same
shape}. Similarity of shape seems to be a truly abstract relation, since it can be applied to
perceptually dissimilar sets of objects. For example, we can deem that the relation which
holds between a basketball and a baseball is the same as the relation that holds between a
Rubik's cube and a die, but basketballs and baseballs are not perceptually similar to
Rubik's cubes and dice. Therefore, the relation {being the same shape} does not seem to
be reducible to any particular first-order perceptual relation[28]. If {being the same shape}
is not present in any impression, then how is it possible to have any idea of it?

[28] In Chapter 3, I will discuss a surprisingly similar issue in contemporary cognitive science regarding
animals' ability to match abstract relations.

According to Hume's deflationary account of abstract relations, we have no
general conception of {being the same shape}; we merely have ideas of instances of pairs
of objects that were of the same shape to which the general term "same shape" has been
associated. Therefore, when we see a basketball and a baseball together, this elicits the
associated term, "same shape," which in turn "makes them recall upon occasion other
individuals which are similar to them" (T 1.1.7.1; SBN 17). There is an ambiguity in
Hume's claim here that the term "same shape" makes us recall similar individuals. In
what way are pairs of cubes similar to pairs of globes?
One interpretation is that they are similar only in virtue of the fact that both pairs
have been associated with the same general term, "same shape." This radical nominalist
interpretation encounters two problems. The first is that it seems to simply beg the
question; in virtue of what have we decided to label both pairs of cubes and pairs of
globes with the term "same shape"[29]? The response that such a labeling was merely
arbitrary is wholly unsatisfactory since we seem to make fruitful and accurate inferences
on the basis of our grouping of the two under the same general category. The second
problem is that we can extend the relation {being the same shape} to pairs of items that
we have not previously encountered, let alone observed being associated with the term
"same shape." No doubt, a person who had just seen a corkscrew for the first time could
judge that it bore the relation of {being the same shape} to a pig's tail, even though that
person had never heard the label used in association with that novel pair of objects.

[29] Morgan makes a similar objection to the view that language suffices to introduce relations: "The relations
must be already there, implicit in sense-experience, before they can become explicit through the
instrumentality of a sound or sign by which they may be indicated" (ibid., 249).

A more plausible interpretation[30] is that pairs of globes and pairs of cubes are
similar in virtue of some other feature which causes them to be given similar labels.
However, a tension with Hume's philosophy of mind is immediately apparent here. In
order for an idea of a pair of globes to suggest the idea of a pair of cubes, they must be
associated via either resemblance or contiguity, and we can dismiss the latter mechanism
of association in this case. For Hume, association by resemblance occurs when two
impressions are perceptually similar. By hypothesis, globes do not resemble cubes, for if
they did, then our idea which unites them would not be an abstract idea. Further, it cannot
be that the imagination has invented some similarity between globes and cubes, for in
order to have the idea {being the same shape}, there must be some corresponding
impression that gave rise to that idea. Therefore, to save his views of abstract concepts
and philosophical relations, Hume must posit that {being the same shape} is somehow
present in our complex impression of a basketball and baseball, and the same relation
must be present in our impression of a pair of cubes.

[30] I am not claiming that this is a more plausible account of what Hume actually thought, but rather a more
plausible interpretation of how a Humean reasoner could have abstract concepts.
Is this something that Hume can plausibly accept? By analogy with his analysis of
our impression of the table, Hume can thus admit that when we see the basketball, we can
perceive the relation {being globe shaped} that its orange and black points bear to one
another. The white and red points which make up our impression of the baseball are also
related to one another by {being globe shaped}. Therefore, our complex impressions of
the basketball and baseball are similar to one another with respect to that feature, and an
association via resemblance may be formed. Similarly, there is an observable
resemblance, {being cube shaped}, between the Rubik's cube and the die.
However, the observed first-order relation {being globe shaped} is not the same as,
nor does it perceptually resemble, the relation of {being cube shaped}. Instead, the two
are related as instances of a second-order relation-of-relations, {being the same shape},
which takes first-order perceptual relations (i.e. {being globe shaped} and {being cube
shaped}) as its relata but cannot be reduced to either31. In order for Hume to give an
account of our abstract ideas which is (a) consistent with his own explanatory machinery
and (b) independently plausible, he must posit that the relation of {being the same shape}

31

Lacking this, it is still possible that a Humean reasoner could learn to associate {being globe shaped}
with {being cube shaped} if the two were contiguously experienced, but it would not be led to this
association by resemblance alone.

(not merely the relation of {being globe shaped}) is present in our impression of a pair
of globes32.
This implication of Hume's view is important, for it marks a serious division
between Darwin and Morgan, one that explains their differing judgments about continuity
in reasoning ability. Recall that Darwin's argument for the continuity of reasoning
capacities between humans and animals rested on his belief that the progression from
simple perceptual associations to general, abstract principles is constituted by a series of
steps, all of the same kind. Morgan argues that there is a step in this process (the
representation of the abstract relationship itself) that is of a wholly different kind from
those that precede it.
Morgan argues that in intelligent behavior, a subject may be sensitive to the
relations that hold among her impressions. For instance, one may see the colored points
of a table arranged in a certain manner and behave appropriately in light of this spatial
information, such as by giving sufficient berth when walking around it. However, this
is possible without the subject ever representing the relation of {being table shaped}
itself. He writes, "so long as we are dealing simply with impressions in naive sense
experience, the relationship need not yet have been perceived. The relations may have
been implicitly there, but they may not yet have become explicit" (ibid., 221).
Morgan acknowledges that it is a subtle distinction, and he provides a helpful
illustration. Imagine a child who is presented with three wafers in succession; first, a red
wafer, followed by another red wafer, followed by a blue wafer. The merely intelligent
(not yet reasoning) child may be sensitive to the fact that the first two wafers are similar.
Upon seeing one red wafer, the child subconsciously anticipates that the next wafer will
32

I am indebted to the careful analysis of Inukai (2010) here.

be red, and this expectation is not violated in the transition from red to red. The transition
from a red wafer to a blue wafer will be surprising, and he will be aware of
dissimilarity33.
For the child then, sensitivity to the relation of similarity is nothing more than an
expectation elicited by the disposition of ideas to naturally elicit resembling ideas, and
sensitivity to dissimilarity is just a violation of this expectation34. Genuine reason does
not arrive on the scene until the subject is capable of making the relation explicit. Contra
Hume, Morgan argues that the relation of similarity is not present in any of the child's
impressions, and therefore, a merely intelligent (Humean) subject could not have an idea
of the relation itself.
First, Morgan notes, the relation of {being the same color} is not present in the
impression of the first wafer; until the second wafer is presented, the relation is not
complete. However, when the second wafer is present, the impression is simply of a new
red wafer. The impression of the relation {being the same color} is not present in the
impressions of either wafer, but rather in the transition between the two. Therefore, the
relation is not present in any of our sense impressions. As Morgan concisely states his
argument:
Every relation involves two related terms; until the second term is given, the
relation is incomplete; but at the moment of passage from the first to the second,
the latter is not yet given. Hence it is impossible to sense a relation in its
completeness during the transition which is its psychological equivalent (ibid.,
225).

33

In a more recent discussion, Carruthers (2008, 61) gives a very similar deflationary account of behaviors
that have been interpreted as evidence of metacognition in animals.
34
Morgan writes, "The subconscious awareness of relations prior to the advent of perception has its origin
in the transitions of consciousness from one focal state to another; and further, that in the unreflective flow
of consciousness the transitions are entirely marginal" (ibid., 224).

To repeat, a merely intelligent subject is disposed to associate impressions or ideas that
perceptually resemble each other or are experienced contiguously. This disposition
creates expectations that guide behavior, expectations that give a sense of dissimilarity
when violated and similarity when confirmed. The relations of similarity or dissimilarity
themselves are never present in her impressions, however, so she cannot explicitly have
an idea of them. This argument, if sound, holds of any abstract relation whatsoever, so a
Humean subject cannot genuinely reason about abstract relations, such as {being the
same shape}.
The transition from intelligence to reason, for Morgan, requires the capacities of
reflection and retrospection. Relations are not present in ideas; they hold between our
ideas. To perceive a relation, then, one must be able to look back at one's previous ideas
and perceive the relations that hold among them. For Morgan, a representation of a
relation is inherently second-order; it takes as its relata other mental representations
(ideas). If the ideas it relates are of perceptual relations, deriving from impressions of
spatial or temporal relations, then what is explicitly represented is a second-order
relation-of-relations.
In contemporary terms, we may properly think of Morgan as advocating a
traditional associationist view of animal cognition, according to which animal behaviors
are not cognitively mediated by intervening representations but are naturally and
automatically elicited in accordance with learned contingencies. This view of animal
cognition has faced serious challenges over the past several decades, and the picture that
has emerged is that animals do seem to form representational categorizations of the

relations among observable stimuli35. For now, it suffices to note that Morgan's view
entails that animals will be incapable of representing relations in such a way that they can
be applied flexibly to perceptually distinct classes of objects. It is this feature of
representations of relations that does the crucial work for Morgan of establishing
significant discontinuities between humans and animals.
This capacity is also the key innovation that opens the door to a whole new world
of cognitive abilities, "a realm of well-nigh boundless extent" (ibid., 227). Once the
relation has been isolated, the subject can use it to compare any two ideas she likes. She
can swap out the particular sense impressions from which she extracted the relation to
reason counterfactually about novel cases. Though she may have learned the {is the same
shape} relation from observations of a baseball and a basketball, she can now hold the
relation fixed and swap the idea of a baseball for the idea of an eye to determine whether
the same relation holds. She can also use these relations as inputs to even higher-order
relations of relations. From these, she can develop general concepts that can be used to
explain, predict, and make inferences about the regularities she encounters. Thus, a
genuine discontinuity in reasoning results when the Humean materials of intelligence
have been accentuated with this new representational repertoire: "For beings who have
reached the conceptual stage then, association deals with a new order of ideas, those of
general import" (ibid., 269).

8. Morgan's Argument that Human and Animal Reasoning is Discontinuous


So far, Morgan's discussion has only purported to show that various human
cognitive behaviors are discontinuous. To turn this into an argument for discontinuity
35

I will return to this issue in Chapter 2 and subsequent chapters.

between humans and animals, he must argue that intelligence is a known cause of all and
only those behaviors that humans share with animals. Since intelligence thus suffices to
explain all animal behaviors, we should not attribute any other mental causes to them.
Morgan's arguments in defense of this claim back him into a difficult dialectical position,
for they lead him to repeatedly emphasize just how useless the capacity to reason about
relations usually is. Consider, for instance, the following claims in Morgan (1896,
emphasis added):
[As children], we have not learned the trick of making [relations] focal in
consciousness. Nor have we in our early days much need to do so, since marginal
awareness amply suffices for all the purposes of practical experience (229).
Of what practical service would it be for the fox-terrier pup to make the relations
focal in perception? I am unable to see that it would be of any practical service or
advantage to him as a fox-terrier; and being of no practical service or advantage
to him, I am unable to see what grounds we have for supposing that this faculty
has been developed in him or in his race (243).
Where would be the advantage of perceiving relations as such, since practical
sense-experience suffices for all the needs of existence of many highly organized
animals? (250).
Morgan at the same time insists that reasoning about relations would be of no use to the
animal while building his case for human cognitive exceptionalism around that very
ability. This raises two challenges to his project. If the vast preponderance of a reasoning
subject's behaviors will be identical with those she would perform if merely intelligent,
then it is hard to see, first, what could count as evidence that she explicitly represents
relations, and second, how that capacity could indeed be responsible for all of the
remarkable heights of human cognition.
These very same issues arise in contemporary revivals of Morgan's view and will
be discussed in much greater depth in subsequent chapters, so for now, a brief discussion

of Morgan's own suggestions on the matter will suffice. Morgan sometimes suggests that
the only function of representations of relations is to permit descriptive
intercommunication36 and that language is perhaps the only possible evidence of reason
(ibid., 243). If true, this result would be methodologically stultifying for comparative
researchers, for it would preclude the possibility that we could obtain evidence of reason
in non-linguistic animals.
Elsewhere, however, Morgan hints at more helpful ways of distinguishing
behaviors caused by reason from those caused by intelligence alone. Intelligence is
strongly beholden to past observations; an intelligent creature can only reason with those
ideas that it has gained through sensory experience, and the associations among them are
limited by their resemblance and contiguity as given in sensory experience. If an animal
extends a known association to a novel situation that is not perceptually similar to one it
has already observed, then according to Morgan, this would be strong evidence that it had
grasped the relation common to both. However, if we observe that the animal fails to do
so and instead requires trial-and-error learning to develop the necessary association in the
new case, we ought not attribute reason to it:
If, for example, a dog which has been accustomed to swim in a lake, and has
therefore no practical acquaintance with the effects of a stream, comes to a
running river, and, seeing sticks and straws floating down, at once and without
previous trial takes to the water further up stream than the point he desires to
reach, and does so every time he is brought down to that river, then the
probability of his reasoning in the matter would I think be strong. It is difficult to
see how, under these circumstances he could adopt the course he does without
some such thinking of the "therefore" as we should express as follows: "Sticks
floating in that water are carried down, therefore if I float in the water I too shall
be carried down" (ibid., 292).

36

Distinguish between descriptive and indicative communication.

According to Morgan, there is no compelling evidence that such spontaneous transfer is
ever exhibited by animals, while there is plenty of evidence of their failure to do so.
Therefore, intelligence suffices to explain all animal behavior, and in many cases, an
attribution of reason would fly in the face of our evidence.
Notably, Morgan's reasoning here does not utilize his Canon. From the known
effects of intelligence and reason in humans, he derives conditions under which we could
obtain evidence for reason in animals and finds it lacking. On a reasonable interpretation
of evidence, if spontaneous extension of a relation to a novel situation is evidence for
reasoning, then the failure to do so (absent some alternative explanation) should count as
evidence against it. Thus, Morgan is driven to his conclusion of discontinuity primarily
for evidential reasons. Interestingly, several prominent discussions of Morgan's Canon
have called for a replacement of the Canon with a more general commitment to
evidentialism; we should not interpret an action as the outcome of a cognitive process
unless we have evidence that favors that interpretation over alternatives (Sober 1998b,
Fitzpatrick 2008). If I am correct about Morgan's application of his own Canon, then he
would hardly see this call for evidentialism as an alternative to, rather than an embodiment
of, his view.

9. Setting the Scene for Current Comparative Psychology


What may seem like a purely historical debate about the nature of the
psychological causes of behavior is actually quite instructive for comparative
psychologists working today. Morgan's view that the capacity to represent relations
themselves (in particular, abstract relations that do not reduce to perceptual regularities
within sensory impressions) marks the discontinuity between human and animal minds
has had a significant revival in cognitive science of late. Indeed, debates over this theory,
which I call the theory-theory of human uniqueness, comprise some of the most active
controversies in the field today.
These debates have significant parallels with that between Darwin and Morgan,
not least of which is that various sides start with conflicting views about the verae causae
of human cognition, and these views have ramifications for their stances on mental
continuity with other animals. A century after Morgan's work, significant disagreements
remain regarding both the plausibility of this account and which tests, if any, would
reveal the capacity to reason about abstract relations.
In Chapter 2, I will present several modern revivals of Morgan's view, identifying
a common core that they share. In Chapter 3, I will show that the dialectical problem that
arose in Morgan's presentation of the view (i.e. in order to deny that animals are
theorizers, Morgan undermined the epistemic utility of theories) also plagues
contemporary accounts. Then, in Chapters 4 through 6, I will consider whether and how
this problem can be resolved.

Chapter 2: The Theory-Theory of Human Uniqueness

1. Introduction to the Theory-Theory of Human Uniqueness


In the previous chapter, I examined a debate between C. Lloyd Morgan, on the
one side, and Charles Darwin and David Hume, on the other, about whether a Humean
empiricist theory of mind could account for the abstract reasoning that seems to
distinguish us from other animals. In this debate, the claim of strong continuity with
animals rises or falls with the empiricist theory of the human mind.
This debate has returned with a vengeance in contemporary controversies
regarding the nature of human cognition and of our mental continuity with other animals.
Like Morgan, many researchers today argue that empiricism cannot explain the abstract
knowledge and reasoning that humans exhibit. However, a puzzle remains for how to
account for these abilities. In an influential review, Gopnik and Wellman (2012) describe
the problem thus:
The study of cognitive development suffers from a deep theoretical tension, one
with ancient philosophical roots. As adults, we seem to have coherent, abstract,
and highly structured representations of the world around us. These
representations allow us to make predictions about the world and to design
effective plans to change it. We also seem to learn those representations from the
fragmented, concrete, and particular evidence of our senses ... But how can the
concrete particulars of experience become the abstract structures of knowledge?
(1085).
Serious challenges have been raised for our ability to answer this final question, including
arguments which purport to show that experience cannot lead us to new abstract,
structured knowledge, most famously Fodor's (1975) argument for radical concept
nativism. In light of these challenges, some psychologists have argued for the empiricist
thesis that humans do not actually possess such abstract, structured knowledge (e.g.

Thelen & Smith 1994), while others have embraced the nativist thesis that this
knowledge exists but is innate (e.g. Margolis & Laurence 2013).

1.1 The theory-theory of human cognition


However, an increasingly influential research program has arisen in the last
twenty years that maintains that humans can develop genuinely abstract, structured
knowledge from their perceptual experience of the world, where this knowledge goes
beyond the information delivered by experience. It responds to Fodor's challenge by
pointing out that we do have one paradigm example, science, in which genuinely new
concepts were developed from observational evidence. Scientists have developed new
abstract, structured concepts that they use to make predictions about the world, concepts
like "electron," "gravity," or "natural selection." Perhaps, then, individual humans can
arrive at new abstract concepts and knowledge via the same processes by which new
scientific concepts and knowledge are formed.
Hence, the theory-theory of human cognition was born. This theory posits a strong
continuity between scientific reasoning and the reasoning of individual humans and thus
a strong analogy between the processes of concept formation and usage in human
cognition and in scientific practice37. To wit, two of the most prominent defenses of the
theory-theory of human cognition characterize the view as follows:

37

The theory-theory has adopted the philosopher of science Nancy Nersessian's "continuity thesis" that "the
problem-solving strategies scientists have invented and the representational practices they have developed
over the course of the history of science are very sophisticated and refined outgrowths of ordinary
reasoning and representational processes" (1992, 5).

The theory-theory holds that intuitive framework theories38 are similar to metaconceptually held scientific theories in many respects, including important aspects
of the mechanisms through which they are constructed (Carey 2009, 484).
The central idea of this theory is that the processes of cognitive development in
children are similar to, indeed perhaps even identical with, the processes of
cognitive development in scientists (Gopnik and Meltzoff 1997, 3).
According to the theory-theory, human knowledge and reasoning share various
conceptual, functional, and dynamical features with theories in scientific practice. Like
scientific theories, intuitive human theories involve "coherent, abstract, causal
representations of the world, [and] often, they include unobservable hidden theoretical
entities" (Gopnik & Wellman 2012, 1086). They have the same functions of making
predictions about unobserved states of the world, interpreting the sometimes ambiguous
evidence provided by the senses, and supporting counterfactual inferences. Lastly, they
both change in light of new evidence, sometimes drastically.
The theory-theory has grown into a tremendously fruitful research program over
the last two decades. It has been used to analyze conceptual growth from infancy through
adulthood, and to characterize human knowledge in social, physical, causal,
psychological, and numerical domains39. Recently, the theory-theory has incorporated
advances in philosophy of science and machine learning to characterize human theories
as probabilistic causal models, and the resulting cross-disciplinary approach comprises
one of the liveliest areas of research in cognitive science today (see Gopnik et al. 2004,
Gopnik & Wellman 2012 for summaries).

38

Where framework theories are "the theories that ground the deepest ontological commitments and the
most general explanatory principles in terms of which we understand the world" (Carey 2009, 22).
39
For the most comprehensive summaries of the theory-theory, see Carey (2009) and Gopnik and Meltzoff
(1997).

As with Morgan's account of the true causes of human psychology, the
theory-theory of human cognition is often accompanied by the complementary hypothesis
that this capacity to theorize is uniquely human.

1.2 The theory-theory of human uniqueness


After Darwin, and particularly after significant discoveries of rudimentary tool
use and culture in other animals in the latter half of the twentieth century, the
near-consensus view among comparative psychologists was that human and animal minds are
different in degree rather than kind (Pepperberg 2005). Certainly, differences exist in
many domains of cognition, but the view was that these differences were to be explained
either by adverting to variation in the same underlying mental causes (attention, memory,
etc.) or by positing piecemeal evolutionary changes in separate domains, e.g. that the
evolution of recursion in humans explains differences in language (Hauser et al. 2002),
the evolution of shared intentionality in humans explains differences in social cognition
(Tomasello and Rakoczy 2003), and so on.
However, researchers aligned with the theory-theory of human cognition have
challenged this view. They argue that there is a real and striking discontinuity between
humans and other animals in many domains of cognition. In an influential statement of
the view, Penn et al. (2008) argue:
Darwin was mistaken: The profound biological continuity between human and
nonhuman animals masks an equally profound functional discontinuity between
the human and nonhuman mind. Indeed, we will argue that the functional
discontinuity between human and nonhuman minds pervades nearly every domain
of cognition, from reasoning about spatial relations to deceiving conspecifics,
and runs much deeper than even the spectacular scaffolding provided by language
or culture alone can explain (110).

Further, theory-theorists argue that the very behaviors that separate humans from
other animals (sophisticated language, tool use, culture, religion, and scientific
reasoning) are all manifestations of and require the capacity to theorize. Therefore, they
hypothesize, the evolution of this capacity in the hominid lineage was the crucial
innovation that spurred radical changes throughout human cognition and was a necessary
step on the way to language, tool use, and so on. I will call this general class of views the
theory-theory of human uniqueness, which is committed to the claims that human concept
formation and knowledge proceed via the development and application of theories, akin
to those used in science, and that it is this feature of cognition that is unique to humans
and explains many of the unique behaviors which distinguish us from other animals.
This view would seem well-suited to address one of the central challenges of
comparative psychology that I discussed in Chapter 1. How do we explain the many
apparent behavioral discontinuities between humans and our closest primate relatives,
given the relatively short period of time since our last common ancestor? Postulating
separate evolutionary changes to explain each individual cognitive difference seems both
ad hoc40 and evolutionarily implausible. It requires that an enormous number of changes
occurred in the six million years since our last common ancestor and that these changes
are compatible with our very strong genetic similarity with chimpanzees. Thus, comparative psychologists
have argued that we should look for a more unifying theory of our differences with other
animals. In their comprehensive book on primate cognition, Tomasello and Call put the
challenge as follows:

40

Hence, the criticism of some naive Evolutionary Psychology approaches as providing "just so stories."

The basic puzzle is that, although 6-8 million years is a very long time
historically, it is a short time evolutionarily ... Given this evolutionary proximity,
any cognitive differences we observe must be based on a very delimited set of
biological adaptations. In attempting to characterize human cognition as a special
case of primate cognition, therefore, the challenge is to find a small difference
that made a big difference, a small change, or set of changes, that transformed
the process in fundamental ways (Tomasello & Call 1997, 401, italics added).
According to the hypothesis that I will be examining in the remainder of this dissertation,
the capacity to theorize was this "small difference that made a big difference."

1.3 Requirements on a theory of uniqueness


In order to be a plausible account of the cognitive differences between humans
and animals, the theory-theory of human uniqueness must offer satisfying answers to the
following questions:
(a) What is the unique epistemic function of theories such that a theory user is
capable of behaviors that she would not be capable of performing if she
lacked a theory, and how do theories satisfy this function?
(b) How could we know whether a subject is a theory user?
With respect to the first question, the theory-theory identifies the capacity to
theorize as the major evolutionary innovation in the human lineage; it is in virtue of
having theories that humans are capable of cognitive feats that are beyond the ken of our
animal ancestors, and the novel behaviors that result from using theories explain our
incredible evolutionary success. Thus, theories must fulfill some unique and important
function that would not be fulfilled by representations of observable events and properties
alone, and it is incumbent upon proponents of the theory-theory to specify what this
function is and how theories uniquely satisfy it.

Further, I am assuming here (with some caveats to follow) that this unique
function is distinctively epistemological; that is, though theories might contribute to our
abilities to, say, form normative judgments about other agents or form religious beliefs
that play some pragmatic role in human life, at least for most theories most of the
time, their adaptive role is to deliver predictions or explanations. At this point, I will not
argue for this assumption, though arguments for it will emerge in my explorations of the
epistemic roles that theories do play. However, I will note that the analogy between
human cognition and scientific theorizing upon which the theory-theory is predicated
motivates this (defeasible) assumption.
The second question constitutes a demand that the theory-theory proponent
provide conditions for the testability of her theory. She should specify what would count
as evidence for its various claims. First, and most obviously, in order to assess whether it
is true that humans are theorizers and animals are not, we need some methods for
determining which observable behaviors are evidence of an underlying theory. Second,
and more subtly, in order to support her claim that unique human behaviors, such as
language or tool use, are produced by theorizing, the theory-theorist owes an account of
how theorizing contributes to those behaviors.
These two questions are closely related. To answer either, she must explain how
theory-use manifests itself in behavior; that is, she must specify which behaviors we
should expect from a theory-user that we should not expect from a subject reasoning
about observable properties alone. Such an account would allow the theory-theorist to
begin to answer question (a); for example, she could evaluate whether the behaviors she
has identified are themselves particularly adaptive or are necessary for other adaptive

behaviors, such as social communication or tool use. Additionally, knowledge of which
behaviors are likely to be exhibited only by theory-users permits an answer to question
(b); if experiments and observations show that these behaviors are only exhibited by
humans, then the theory-theorist would have evidence for her claim that theory-usage
marks a discontinuity between animal and human cognition.
To answer these questions, then, it will be important for theory-theorists to
identify the roles that theories play, such that we can understand their ramifications for
behavior. Unfortunately, despite the centrality of theorizing on their view, few theory-theorists
have devoted significant attention to the unique epistemic functions of theories. Further,
as we will see, they sometimes suggest that this function is (at best) unknown, or (at
worst) non-existent, given some powerful arguments purporting to show the epistemic
disutility of theories. An analysis of this latter possibility will comprise the bulk of this
dissertation. The weaker possibility is expressed by Povinelli in one of the most
comprehensive defenses of the theory-theory of human uniqueness, where he writes:
But why would such a novel mental system have evolved in the first place? What
startling new behavior or behaviors became possible in the advent of this new
way of processing information about the world? My default (and most honest)
answer is I have no better idea than anyone else ... But another part of the story
is that as our lineage has evolved, both genetically and culturally, humans have
employed this domain-general ability to support all sorts of uniquely human
enterprises. And so today, as we survey the staggering diversity of purposes to
which human cultures have put this system to work, it's obvious that much has
accreted upon this causal-symbolic edifice (Povinelli 2012, 26).
My task in the rest of this dissertation is to examine if and how it is possible for
theoretical capacity to lead to novel human behaviors, behaviors which explain why
theorizing was the tremendous adaptive step which separates us from all other animals on
earth.

To test the plausibility of the theory-theory, I will draw on work in the history of
the philosophy of science regarding the function that theories play in science to see if it
can shed light on the function that theories play in human (and possibly animal) cognition.
To begin down this road, it will first be necessary to get a sense of what theories are in
order to have a working version of the theory-theory of human uniqueness on the table.

2. What are theoretical beliefs?


2.1. Overview
The hypothesis that the capacity to theorize was the key evolutionary innovation
separating human and animal cognition has been offered by many authors in the
comparative psychological literature, in many different guises. What most of these views
have in common is that they posit a type of representation of which humans alone are
capable[41]; typically, this is a representation of a particular type of relation. However,
they disagree about what type of representation this is and about the specific capacity
animals lack which prevents them from representing it. For example, theory-theorists
have identified the crucial representations distinguishing human from animal cognition
as:
1. Explicit representations of relations (Morgan 1896, Karmiloff-Smith 1995)
2. Beliefs about unobservable entities and properties (Vonk and Povinelli 2006)
3. Second-order relations of relations (Oden and Thompson 2000, Shettleworth
2010)
4. Metarepresentations (Suddendorf 1999, Sperber 2000)
5. Role governed relations akin to those of a Physical Symbol System (Penn et al.
2008)
6. Structure mapping (Gentner 2010)

[41] Some theory-theorists have identified other, non-representational changes that explain why humans can
theorize but animals cannot. For example, Buchsbaum et al. (2012) argue that the extended period of
immaturity in humans, paired with a propensity for pretend play, explains why humans develop into theory-users while animals do not.

Before we can consider objections to and defenses of the theory-theory of human
cognition, it will be important to have on the table a single statement of the view which
captures the essence of these various formulations. Below, I will give a brief
characterization of several statements of the theory-theory, and then I will identify a
common core that will suffice for further evaluation.

2.2. Representing relations


It will be helpful to begin with what I take to be the first modern statement of the
view from C. Lloyd Morgan. As I showed in Chapter 1, Morgan argues that only humans
have attained the highest ideational or rational level of cognitive development, which
is distinguished from lower levels by the capacity to explicitly represent relations
themselves. For Morgan, the inability to represent relations at all precludes animals from
representing more complex relations, such as relations-of-relations or those with relata
that have not been associated by contiguity or perceptual similarity[42]. Morgan argues that
while animals are merely sensitive to the relations that they act in accordance with,
humans are capable of reflecting on the relations that are given in experience,
reinterpreting and explicitly representing them as relations.
Morgan's view finds its closest contemporary counterpart in the Representational
Redescription Hypothesis from Karmiloff-Smith. She argues that humans and animals
share similar core systems of innate knowledge and associative mechanisms for learning
through experience. While the domain-specific and largely implicit knowledge delivered

[42] He writes, "only in the light of reflection are the relations which are involved perceived and conceived as
such when through reflection and the perception of relationships, we rise above the plane of sense-experience, when to associations by contiguity or by superficial resemblance, there are added associations
through perceived similarity of relations" (Morgan 1896, 284).

from these mechanisms often suffices for behavioral mastery within a tightly-constrained
domain, it cannot be extended to perceptually distinct domains or used to theorize[43].
According to Karmiloff-Smith, only humans have the capacity to explicitly reformulate
this implicit knowledge so that it can be used in novel domains:
My claim is that a specifically human way to gain knowledge is for the mind to
exploit internally the information that it has already stored (both innate and
acquired), by redescribing its representations or, more precisely, by iteratively re-representing in different representational formats what its internal representations
represent (Karmiloff-Smith 1995, 15).
Clearly, if the capacity to theorize requires the ability to explicitly represent theoretical
relations, then an organism that lacks the ability to explicitly represent relations at all will
be incapable of theorizing. This formulation of the theory-theory postulates a very deep
and wide divide between humans and other animals as it denies that animals have the
basic representational building blocks necessary for theorizing.
If this attempt at formulating the theory-theory were successful, then it would
indeed establish a strong discontinuity between human and animal cognition.
Historically, this view has found a comfortable home in traditional associationist
accounts depicting animals as unreflective, stimulus-response machines whose behaviors
are automatically elicited by environmental stimuli, unmediated by intervening concepts.
However, this radical version of the theory-theory is increasingly untenable in light of
findings that complicate the clean distinction between rational human cognition and
brutely associationist animal cognition. More specifically, this old view has been
challenged by experiments that have uncovered sophisticated learning behaviors in

[43] The type of distinction she has in mind is roughly similar to the distinction between "knowing how" and
"knowing that". A person may have mastered her golf swing through intensive practice but be incapable of
describing to someone else how to swing a club. To do this, she must transform her implicit "knowledge
how" into explicit knowledge of the mechanics, forces, and bodily positions involved in her golf swing.

animals and suggest that they form stable and persistent representations that they use in
categorization and prediction. Further, some of these seem to be genuine representations
of relations.
One influential finding was the discovery of backward blocking in animals. To
understand how these experiments upset the traditional associationist picture, first
consider a standard associationist model of Pavlovian contingency learning. Suppose an
unconditioned stimulus (US), e.g. food, that elicits an unconditioned response (R), e.g.
salivation, is paired with a conditioned stimulus (CS), e.g. a bell, such that after repeated
trials, CS elicits R even in the absence of US. According to traditional associationist
models, like the Rescorla-Wagner model, the strength of the association between a CS
and R can only be altered by trials on which the CS is present. It is this feature of
standard associationist models that makes them ripe for behaviorism. If any change in the
behavior or response rate of the subject is accompanied by the CS or US, then it is
possible to explain that change by referring to features of the present experimental set-up
alone without positing that some internal representation of those features had changed.
Even if one is not inclined toward behaviorism, one can posit that the CS or US elicits
some expectancy or implicit sensitivity on the part of the subject in these trials even if the
subject does not explicitly represent the relation between the US or CS and R.
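The restriction described above can be read directly off the Rescorla-Wagner update rule. The following is a sketch in the model's standard notation, none of which appears in the text itself: $V_i$ is the associative strength of cue $i$, $\alpha_i$ and $\beta$ are learning-rate parameters for the cue and the outcome, and $\lambda$ is the maximum strength that the outcome supports on that trial (zero when the outcome is absent).

```latex
\Delta V_i =
\begin{cases}
\alpha_i \, \beta \, \Bigl( \lambda - \sum_{j \,\in\, \text{present}} V_j \Bigr) & \text{if cue } i \text{ is present on the trial} \\[4pt]
0 & \text{if cue } i \text{ is absent on the trial}
\end{cases}
```

The second case is the feature at issue: on this rule, nothing that happens on a trial can change the strength of a cue that is not presented on that trial.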
These models have been challenged by observations of so-called backward
blocking in both human and non-human subjects (Shanks 1985, Miller and Matute 1996,
Urushihara and Miller 2010). In a backward blocking experiment, the subject first learns
a contingency between a compound CS (A+X+) consisting of two cues, A and X, and R.
Then, one of these cues, A, is presented in conjunction with R, with X absent (A+X-).
For example, a light and tone might be paired with a food reward in training, and then,
the light alone is presented in conjunction with the food.
Researchers have found that, in the right conditions, the association between
X and R is weakened after exposure to positive pairings of A and R. Standard
associationist models cannot account for this phenomenon, since the associative
relationship between an absent cue and a response changed. Psychologists have offered
interpretations of backward blocking of varying strength, ranging from associative
explanations that slightly amend the traditional models, to those that attribute to animals
beliefs about the causal power of the conditioned stimuli to produce R[44] (summarized in
Penn and Povinelli 2007b).
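To see why a standard associationist model cannot produce this result, it may help to run the two-phase design through a simplified Rescorla-Wagner learner. The sketch below is illustrative only: the function name, the single `learning_rate` parameter (standing in for the product of the model's salience parameters), the value 0.3, and the choice of 20 trials per phase are all assumptions made for the example and are not taken from the studies cited.

```python
def rescorla_wagner(trials, learning_rate=0.3, max_strength=1.0):
    """Return associative strengths after running the trials in order.

    Each trial is a pair (present_cues, outcome_occurred).
    learning_rate is an illustrative simplification that stands in for
    the product of the model's cue and outcome salience parameters.
    """
    strengths = {}
    for present_cues, outcome_occurred in trials:
        for cue in present_cues:
            strengths.setdefault(cue, 0.0)
        target = max_strength if outcome_occurred else 0.0
        # Prediction error is computed from the cues present on this trial.
        error = target - sum(strengths[cue] for cue in present_cues)
        # Only present cues are updated; absent cues keep their old value.
        for cue in present_cues:
            strengths[cue] += learning_rate * error
    return strengths


# Phase 1: the compound of A and X is paired with the outcome (A+X+).
phase1 = [(("A", "X"), True)] * 20
# Phase 2: A alone is paired with the outcome, with X absent.
phase2 = [(("A",), True)] * 20

after_phase1 = rescorla_wagner(phase1)
after_phase2 = rescorla_wagner(phase1 + phase2)

print(f"X after phase 1: {after_phase1['X']:.3f}")
print(f"X after phase 2: {after_phase2['X']:.3f}")
print(f"A after phase 2: {after_phase2['A']:.3f}")
```

Under these assumptions the strength of X is identical before and after the second phase, while the strength of A increases. The weakening of the response to X that is reported in backward blocking experiments is therefore exactly what a learner of this kind does not predict.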
For my purposes, it is sufficient to note that such experiments lend credence to the
hypothesis that subjects exhibiting backward blocking had separate representations of
each cue in the paired CS and their relations with the outcome, R. An observation of the
pairing of R with A alone is the relevant stimulus in this case, and plausibly, this is
information about a relation between A and R. On a natural interpretation of these results,
this information changes the subject's behavior with respect to a different relation, that
between X and R. Further, there are several reasons to think that these relations are both
represented by a subject undergoing backward blocking.
First, the relation between X and R is altered even in the absence of an
observation of X. Thus, because X is not present on those occasions in which behavior

[44] For instance, De Houwer and Beckers (2003, 346) suggest that animal subjects in backward blocking
experiments undergo a propositional, inferential process, obeying the deductive rule: "If Cue A on its own
causes the outcome to occur with a certain intensity and probability, and if Cue A and T together cause the
outcome to occur with the same intensity and probability, this implies that Cue T is not a cause of the
outcome." See De Houwer et al. (2005) and Cheng (1997) for further discussion of this causal reasoning
interpretation, and Blaisdell et al. (2006) for similar experiments purporting to show complex causal
reasoning in rats.

changes, the traditional behaviorist explanation is not available and the best explanation
seems to be that the subject had some persisting representation of the relation between X
and R. Further, the subject's sensitivities to the relation between A and R, on the one hand,
and to that between X and R, on the other, are independently manipulable. The picture that suggests itself
is that the subject had formed separate representations of the relations between each of
the stimuli and the reward and information about one of these relations was used to
update beliefs about the other.
In light of results like these, along with others, most contemporary
comparative researchers now admit that nonhuman primates' behavior is conceptually
mediated by representations of categories, including relational categories[45]. Most
contemporary proponents of the theory-theory of human uniqueness have followed suit
and have backed away from the extreme formulation of the view (Penn and Povinelli
2007b). Instead, they have argued that what distinguishes humans is the capacity
to represent a particular kind of relation, namely theoretical relations. For example, in
their presentation of the theory-theory of human cognitive development, Gopnik and
Meltzoff argue:
There is nothing intrinsically theoretical about [the kind of metarepresentational
development posited by Karmiloff-Smith]... From the point of view of the theory
theory, the developmental work is done not by the fact that you represent your
own representations but by the particular things that you represent about them
(Gopnik and Meltzoff 1997, 70).

[45] In an influential review, Tomasello and Call write: "In all, it is noteworthy that even learning theorists,
whose roots are mostly in behaviorism, have been led to posit cognitive processes to explain the behavior
of nonhuman primates. Primates and other animals do not just react to particular physical stimuli. They
remember things, they learn about categories of things, and they acquire general strategies to use across
situations" (1997, 135).

As the above list illustrates, there is no consensus among comparative psychologists
about which representations are the distinctively theoretical ones. This confusion is
understandable; indeed, the project of specifying exactly which relations are theoretical
has a long and fraught history within the philosophy of science.

2.3. Unobservability
Philosophers of science have most typically drawn the distinction between
theoretical and non-theoretical relations in terms of observability; that is, a theoretical
relation is one that contains an unobservable entity as at least one of its relata. This focus
is understandable given that philosophers of science have largely been interested in the
question of whether we are justified in our beliefs about unobservables (Psillos 1999).
This view was taken up by Vonk and Povinelli (2006) in their unobservability hypothesis:
One of the important ways in which humans differ from other species is that our
minds form and reason about concepts that refer to unobservable entities or
processes (see Povinelli, 2004). In short, we explore the possibility that, whereas
many species form concepts about observable things and use those concepts in
flexible and productive ways, humans alone think about such things as God,
ghosts, gravity, and other minds... Although many minds are adapted to represent
events in terms of their observable properties and are able to extrapolate certain
rules from these representations, these minds do not posit unobservable entities or
processes as mediating variables to explain or to predict observable events or
states (2006, 364-365).
While this characterization of theories has a long pedigree within the history of
philosophy of science, it may also inherit some of the problems that have been raised
against it there. Most prominently, philosophers of science have argued that there is no
clear distinction between observable and unobservable entities to be made (and therefore,
no distinction between observable and theoretical relations and no distinction between
representations of them either). The attack on this distinction has had two main fronts.
Both of these can be dealt with rather easily by the unobservability hypothesis within
comparative psychology, so I will only discuss them briefly before turning to what I take
to be a more damaging criticism of the view.
First, philosophers have noted that the observability of an entity is relative to the
capacities of the observing agent. These capacities vary among agents, most notably
through time, as technological changes have rendered previously unobservable entities
observable (Maxwell 1962). The worry, then, is that there either is no distinction to be
drawn at all, or if there is, the differences between observable and unobservable entities
are so diffuse and context-dependent, that it is impossible to create a principled and
absolute distinction between terms which refer only to unobservable entities and terms
which refer only to observable ones (Psillos 1999, 24). However, in comparative
psychological investigations, it will often be clear what the relevant context is; for
instance, the question of whether a chimp's belief is about an unobservable entity will
depend on whether the entity is unobservable with respect to the chimpanzee's perceptual
capacities[46]. Here, at least, the distinction can be drawn in the way suggested by van
Fraassen:
The human organism is, from the point of view of physics, a certain kind of
measuring apparatus. As such it has certain inherent limitations, which will be
described in detail in the final physics and biology. It is these limitations to which
the "able" in "observable" refers: our limitations, qua human beings (1980, 17).
A second problem raised for the observable-theoretical distinction is that
observations in science are theory-laden in various ways. Scientific instruments or
measuring devices deliver data that are treated as observable by scientists but which
depend on background theory for their interpretation and validity (Feyerabend 1962).

[46] I am not claiming that it is obvious what entities or properties are in fact observable from the point of
view of human and non-human subjects.

Users of competing theories may find different perceptual properties to be salient or,
more controversially, may have wholly different perceptual experiences of the same
properties (Kuhn 1962). The topic of cognitive penetration of perceptual experience has
developed into a lively and interesting area of inquiry, to which I cannot do justice here
(but see Stokes 2013 for a recent overview).
However, the theory-ladenness of observations does not seriously undermine the
observable-theoretical distinction, at least for the purposes to which theory-theorists want
to put it, for observations will be theory-laden only for subjects who possess theories[47].
While there may be no clear demarcation between a theory user's observable and
theoretical beliefs, there may still be a clear demarcation between theory-laden
observations and the non-theory-laden observations of subjects who lack theories, and this
distinction is all that is needed for the theory-theory's uses[48]. Put another way, the theory-theorist can grant that cognitive penetration of perception occurs, but it is still a live
question of which kind of concepts (theoretical or non-theoretical) are doing the
penetrating.
Thus, the traditional arguments which attempt to undercut the notion of an
unobservable/observable distinction are not troublesome for the unobservability
hypothesis. However, there is yet another type of argument from the philosophy of
science that does pose a serious problem for this formulation of the theory-theory of
human uniqueness. The problem is that some theories, both scientific and intuitive, do
not seem to posit unobservable entities, at least in any robust sense, or if they do, these

[47] On the other hand, it is possible that non-theoretical concepts (such as categorizations of observable
properties, like prototypes) could penetrate perception.
[48] See Gopnik and Meltzoff (1997, 44) for a discussion of theory-ladenness of observation from a theory-theory point of view.

entities do not seem to be doing the real epistemic work of theories. Thus, the
unobservable/observable distinction, even if tenable, does not seem to coincide with the
theoretical/non-theoretical distinction.
In the philosophy of science, this problem has been emphasized by advocates of
the Ramsey-sentence and structural realist approaches to theorizing. Though a full
examination of these views is not possible here, we can briefly consider some of the main
points that their proponents have used to deny that unobservability is the hallmark of the
theoretical.
Structural realism arose as an attempt to strike a middle ground between scientific
realism (roughly, the view that we are justified in believing that the unobservable
entities posited by our best scientific theories exist) and the antirealist position that
we are at best justified in believing in the mere empirical adequacy of these theories
(Worrall 1989). More specifically, structural realism attempts to give an account of
theories that can accommodate the strong arguments for and against traditional scientific
realism. The strongest argument in the realist's arsenal is the no miracles argument,
which states that the predictive success of our best scientific theories would be
miraculous if the unobservable entities that they posit did not exist. The antirealists
counter this argument with one that seems equally compelling, the pessimistic induction,
which argues that nearly all scientific theories, even those regarded as the best of
their day, have been found to be false. The unobservable entities that they posited
(phlogiston, the ether, and the like) were shown not to exist and were dispensed with in
the theories that would supplant them. Therefore, the argument goes, we should believe
that in the future, the unobservable entities posited by our best theories today will go the
way of phlogiston.
Structural realism attempts to capture the idea that progress and predictive success
in science are no miracle while acknowledging that the unobservable content of theories
changes, sometimes drastically. The basic idea is that the structural components of a
theory (the mathematical, causal, or functional relations that the unobservable and
observable components of the theory bear to one another) are often maintained through
periods of scientific change, though the realizers of these relations may change. Further,
it is in virtue of these structural components that theories are successful (and thus no
miracle).
One tool that structural realists have used to specify the structural content of
theories is the Ramsey sentence, which reproduces all of the statements of a theory but
replaces all theoretical terms with existentially quantified variables. Thus, a theory's
Ramsified equivalent maintains its structural content while omitting reference to
particular unobservable entities (Ramsey 1950, Worrall and Zahar 2001, Ketland 2004).
The structural realist idea is that the Ramsey sentence of a theory captures its essential
epistemic content. It is in virtue of the mathematical or causal relations it establishes
among unobservables and observables that theories are successful.
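The construction just described can be written schematically. As an illustrative sketch (the notation is an assumption introduced here, not drawn from the authors cited), suppose a theory is stated as a single sentence $T$ containing theoretical terms $\tau_1, \dots, \tau_n$ and observational terms $o_1, \dots, o_m$.

```latex
\text{Theory:} \quad T(\tau_1, \dots, \tau_n;\; o_1, \dots, o_m)
\qquad\qquad
\text{Ramsey sentence:} \quad \exists x_1 \cdots \exists x_n \; T(x_1, \dots, x_n;\; o_1, \dots, o_m)
```

The observational vocabulary and the relations asserted by $T$ are left untouched; only the names of the theoretical posits are replaced by bound variables.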
If structural realism successfully captures the appeal of the no miracles argument
while evading the force of the pessimistic induction, then this is a strong point in its
favor. It also offers a promising avenue for developing the theory-theory, as similar
arguments can be raised concerning human cognition. It is undeniable that humans
sometimes do have theories of the world that make explicit reference to unobservable
entities. It is also true that humans enjoy a fair amount of predictive success as a result of
their theories, and a proper account of these theories would show that this is no miracle.
However, the vast majority of unobservable entities that humans posit, particularly those
posited by children or people in prescientific societies, do not actually exist. Therefore,
the theory-theorist must explain how it is that human theories are successful in spite of
the fact that nearly all of them are literally false.
We can also find support for the view that theorizing does not essentially involve
unobservable entities by looking at the practice of science, and of human theorizing more
generally. First, there are some scientific and intuitive theories that do not
seem to posit unobservable entities at all. One well-known and controversial case is that
of mathematical theory (and scientific theories that make use of mathematical theory).
Philosophers of mathematics have long debated whether the use of numbers commits one
to the existence of unobservable, abstract entities. Regardless of how this debate turns
out, the popularity of non-realist views of numbers is prima facie evidence that human
reasoners (including some mathematicians) could possess a theory of numbers without
positing that unobservable entities exist.
Second, though many theories do explicitly make reference to yet-unobserved
causes to explain observable regularities, they may sometimes remain silent about the
nature of that unseen cause; in particular, they may remain silent about whether that
cause is unobservable in principle or just in practice so far. Worrall makes reference to
one such famous case, Newton's gravitational theory. Though gravity is putatively an
unobservable force acting on bodies, Newton himself was famously agnostic about the
nature of this force; Worrall suggests that "on the structural realist view what Newton
really discovered are the relationships between phenomena expressed in the mathematical
equations of his theory" (1989, 122).
Another interesting case, with roots in both the history of scientific biology and
human intuitive biological theorizing, is the theory of biological essentialism, according
to which members of biological kinds share some unobserved essence which causes their
other kind-typical traits. Though scientific biology since Darwin has largely abandoned
essentialism[49], humans show a strong (perhaps even universal) disposition towards
biological essentialism in childhood which typically persists through adulthood (Gelman
2003).
For educated adults in industrialized societies today, the usual candidate for this
essence is species-typical DNA; all tigers have tiger DNA and this causes them to have
stripes, be ferocious, and successfully breed only with other tigers. However, for children
(and some adults), the nature of the essence is unknown and may remain unknown
throughout the lifespan with no significant loss in predictive power (Carey 2009). The
essence serves as a placeholder (the essence of tigers is whatever causes tigers to have
stripes, be ferocious, and successfully breed only with other tigers), which may or may
not be fleshed out with beliefs about the nature, whether observable or unobservable, of

[49] The extent to which evolutionary biology has abandoned essentialism is controversial. Many biologists
now reject the idea that species have some fixed, unchanging essence (Sober 1980). The essentialist view
of species has been superseded by views according to which species are historical individuals, linked by
ancestry (Hull 1984) or on which species are defined by some dispositional property, such as the ability to
produce fertile offspring (Mayr 1970). However, some philosophers of biology still hold that species do
have essences (Griffiths 1999), and competing accounts, such as Boyd's (1991) Homeostatic Property Cluster
view, make room for unobserved internal mechanisms that contribute to species membership.
Complicating things further, regardless of one's views about which properties are constitutive of species
membership, there is a somewhat separable causal question about whether some unobserved property in
fact underlies all members of a species, for which species-typical DNA is a plausible candidate.

that essence[50]. Thus, some psychologists have (plausibly) concluded that while biological
essentialism does posit an unobserved entity, the ontological nature of the essence is not
epistemically crucial, and in fact, the essence of essentialist theories is in the causal
structure they posit as holding among internal causes and observable external properties
(Rehder 2007)[51]. Further, modern essentialist theories based on DNA (an observable)
show that it is not crucial that the essence actually be unobservable.
The dispensability of substantive beliefs about the nature of the unobserved
causes posited by theories has, I believe rightly, led some philosophers to deny that the
unobservable/observable distinction is the relevant one for distinguishing theoretical from
non-theoretical relations. Famously, Carnap (1963) went so far as to argue that all
theoretical variables could be interpreted as denoting mathematical functions rather than
any physical entities at all[52]. I have a more restrained view; though I think that many
scientific and intuitive theories do purport to postulate unobservable, material entities,
there are some that are not properly interpreted that way[53].

3. Statement of the view


The above considerations suggest that at least some theories purport to pick out
functional or structural regularities which obtain among observable properties, where

[50] This is similar to a Ramsey-sentence version of a theory of tigers, where the unobservable essence is
replaced by an existentially quantified variable.
[51] I will discuss this example in more depth in Chapter 6.
[52] He suggested that we can interpret the Ramsey-sentence of a theory as asserting merely that "the
observable events in the world are such that there are numbers, classes of such, etc. which are correlated
with the events in a prescribed way and which have among themselves certain relations" (Carnap 1963, p.
963).
[53] As van Fraassen puts it: "It is often not at all obvious whether a theoretical term refers to a concrete
entity or a mathematical entity. Perhaps one tenable interpretation of classical physics is that there are no
concrete entities which are forces, that 'there are forces such that...' can always be understood as a
mathematical statement asserting the existence of certain functions" (1980, 11).

these higher-order regularities are not reducible to and can be extended beyond the
particular observational-level relations from which they were derived. In this way, then, a
theory that does not posit unobservable entities may still be said to have excess content
(Hempel 1950); its content is not definable, translatable, or exhaustible by those
observable regularities (Psillos 1999, Hempel 1965, Braithwaite 1953). Accordingly,
theory-theorists have broadened their conception of theoretical relations to include
relations that do not have unobservable entities as relata. For instance, they have
characterized theories as having content that goes beyond spatio-temporal vocabulary
(Carey 2009, 195) or being phrased in a vocabulary that is different from the vocabulary
of the evidence that supports the theory (Gopnik and Meltzoff 1997, 35).

3.1. The Relational Reinterpretation Hypothesis


This expanded version of the theory-theory is given its fullest expression in the
Relational Reinterpretation Hypothesis of Penn, Povinelli, and colleagues (Penn et al.
2008, Penn and Povinelli 2009, Povinelli 2012). They argue that humans and animals are
capable of representing first-order perceptual relations. However, their claim is that
animals' representations of these relations are tightly constrained by the particular
perceptual properties of the instances from which they were learned.
Humans, on the other hand, are capable of reinterpreting these first-order relations
in terms of more abstract relations that they instantiate (relations with causal,
explanatory, or functional structure) such that these new representations are (a) not
reducible to any particular perceptual regularity and (b) can be extended to perceptually
distinct regularities that instantiate the same higher-order relation. It is precisely these
types of relations and concepts (those defined in terms of shared abstract structural
features) that theory-theorists argue are beyond the ken of other animals.
Hence, the Relational Reinterpretation Hypothesis states that "although human
and nonhuman animals share many similar cognitive mechanisms, our relational
reinterpretation hypothesis is that only human animals possess the representational
processes necessary for systematically reinterpreting first-order perceptual relations in
terms of higher-order, role-governed relational structures akin to those found in a
physical symbol system" (Penn et al. 2008).
For example, both humans and nonhuman primates can form beliefs about first-order perceptual relations such as:
1. Wind leads to coconuts dropping from branches.
2. A dominant chimp whose eyes are oriented toward food will head toward
the food.
3. The larger something is, the more effort it requires to lift.

Note that these relations are quite sophisticated; the first two are tertiary relations,
denoting relations that obtain between external objects and individuals, and the third
relates an increase in one perceptual dimension to an increase in another. However,
humans can reinterpret these first-order perceptual regularities to form beliefs about
abstract, non-perceptual relations such as:
1. Force exerted on branches causes coconuts to fall.
2. A dominant chimp who desires to eat the food and believes it is at location
x will head toward location x.
3. The effort an object requires to lift is a linear function of its weight, which
in turn is a linear function of its mass.

How do these latter regularities differ from the lower-order perceptual regularities? That
is, what unique predictive abilities do they confer on their users? I will investigate these
questions in much more depth in the coming chapters. Indeed, some authors (theory-theorists
included) have raised serious challenges against the claim that these
representations deliver any unique inductive capacities at all. Here, therefore, I will give
only a brief characterization of the type of considerations that theory-theorists have used to
support the claim that animals are not capable of representing these abstract relations.
With respect to the first relation, a non-theorizer can predict that wind will be
accompanied by falling coconuts. However, by representing wind as but one kind of
force, a theorizer can thereby infer that any such force, including one's own shaking of
the branch, will also cause coconuts to fall. The analogy from external causes to one's
own actions seems so natural to adult humans that it is hard to imagine that drawing it
requires a significant inductive leap [54]. However, as Tomasello and Call (1997, 389)
report, "We believe that most primatologists would be astounded to see the ape, just on
the basis of having seen the wind make fruit fall, proceed to shake a limb, or pull an
attached vine, to create the same movement of the limb," given how perceptually
disparate the force of the wind and the force of one's own exertions are.
With respect to the second relation, a non-theorizer may know that a dominant's
head orientation predicts whether it will pursue the food. However, a theorizer can also
make more specific predictions about the conditions under which a dominant chimp who
is oriented toward the food will fail to head toward the food; for instance, if the food has
been moved, unbeknownst to the dominant, a theorizer can predict that the dominant will
head not toward the food but rather toward where it believes the food to be. While six-year-old
human children can make such predictions, nonhuman primates routinely fail nonverbal
false belief tasks (Call and Tomasello 1999), leading many researchers to the conclusion
that chimpanzees and other nonhuman primates do not understand the psychological
states of others. That is, nonhuman primates can predict others' actions in many
situations based on past experience (and perhaps some specialized cognitive adaptations),
but they do not go beneath the surface to an understanding of the goals, perceptions,
knowledge and beliefs that guide others' actions (Kaminski et al. 2008, 225).

[54] In fact, human children take a surprisingly long time to learn to bridge the gap between external causes
and their own interventions and represent both as instances of the same phenomenon. See Bonowitz et al.
(2012) and Meuntener et al. (2012) for interesting discussions.

Lastly, with respect to the third relation, a non-theorizer can predict that larger
items will require more effort to lift and anticipate the correct force to exert on a novel
object. However, a theory user can go much further than this. A theory of weight
unites the various perceptual manifestations of weight via a single representation. The
same thing that causes large objects to be hard to lift causes dense objects to depress
pillows and make loud noises when dropped, and therefore, these relations will obey the
same functional properties as the relation between weight and effort to lift. Hence, observing that a
dumbbell is difficult to lift allows the theory user to predict that the dumbbell will
depress a pillow and make a loud noise when dropped, and that a second dumbbell that is
even more difficult to lift will depress the pillow more and make a louder noise. In a
series of tests, Povinelli (2012) supports the surprising claim that chimpanzees never
unite all of the perceptual manifestations of weight via a common representation; they
require separate experience of an object's effort to lift and of the noise it makes when dropped,
never spontaneously generalizing from one to the other. He concludes that chimpanzees'
most abstract reasoning about weight is limited to establishing and using causal relations
between particular perceptual properties of objects and their effects (15).
What all of these cases have in common is that while a first-order perceptual
relation suffices for predictions within a restricted perceptual domain, the inferences it
permits will be tightly constrained to that domain. However, by reinterpreting a known perceptual
relation as an instantiation of a more abstract relation, a theorizer seems able to extend its
knowledge about an observed regularity to make predictions about perceptually distinct
regularities that nevertheless instantiate the same abstract relation. In a seminal article,
Penn et al. (2008) put the view thus:
Although there is a profound similarity between human and nonhuman animals'
abilities to learn about and act on the perceptual relations between events,
properties, and objects in the world, only humans appear capable of reinterpreting
the higher-order relation between these perceptual relations in a structurally
systematic and inferentially productive fashion. In particular, only humans form
general categories based on structural rather than perceptual criteria, find
analogies between perceptually disparate relations, draw inferences based on the
hierarchical or logical relation between relations, cognize the abstract functional
role played by constituents in a relation as distinct from the constituents'
perceptual characteristics, or postulate relations involving unobservable causes
such as mental states and hypothetical physical forces (110).
As I have argued, their final example (a relation which relates observable phenomena to
unobservable causes) is but one type of theoretical relation within scientific practice.

3.2 Some unquestioned assumptions


Before moving on to consider challenges for the theory-theory of human
uniqueness, I want to state a few questions about the view that I will not attempt to
answer. First, as I have characterized the view, the capacity to theorize is identified with
the capacity to represent relations with causal, explanatory, or functional structure such
that these new representations (a) are not reducible to any particular perceptual regularity
and (b) can be extended to perceptually distinct regularities that instantiate the same
higher-order relation. This is a functional definition of theorizing which accords well
with the use of the term in the philosophy of science.

I have assumed that the hypothesized discontinuity in the capacity to theorize is a
representational difference; that is, theorizers are capable of a type of representation that
non-theorizers are not. This is an assumption of the theory-theory of human cognition in
general. If it turns out that unique human behaviors are not the result of any
representational difference, I take this to undermine the theory-theory's plausibility,
independently of any issues that I raise for the view here.
I will also not discuss the relation between these representational capacities and
language. Theory-theorists are split on this point. Some, such as Lupyan (2008) and
Gentner (2010), argue that our ability to encode relations linguistically, via external
symbols in particular, was necessary to represent the types of abstract, structured
relations posited by the theory-theory. On the other hand, Penn et al. argue that the ability
to represent these relations was a necessary precursor to language. I cannot attempt to
settle this dispute here, and I do not think it matters for the central claims of the
theory-theory whether language was adaptive in virtue of its enabling theorizing or vice versa.
More generally, I will not attempt to give an account of how such representations
are implemented in cognition or in the brain, for two reasons. First, while there is
interesting research being performed on this issue, there is no consensus, and I cannot do
the various proposals justice here [55]. Second, the problems that I raise for the
theory-theory will target the functional role of theories, as I have specified it, so the functional
characterization should suffice for my purposes. It is to these problems that I now turn.

[55] But see the hybrid connectionist-symbolic LISA model developed by Hummel and Holyoak (1997, 2003,
2005) and the dynamic binding account from Halford & Busby (2007) for examples.

Chapter 3: The Logical Problem and the Theoretician's Dilemmas

In the previous chapter, I argued that in order to be a plausible account of the
cognitive differences between humans and animals, the theory-theory of human cognition
must offer satisfying answers to the following questions:
(a) What is the unique epistemic function of theories such that a theory user is
capable of behaviors that she would not be capable of performing if she
lacked a theory, and how do theories satisfy this function?
(b) How could we know whether a subject is a theory user?
In this chapter, I will discuss two arguments that pose significant obstacles to answering
these two questions, raising doubts about the plausibility and testability of the
theory-theory of human uniqueness. The first is the "logical problem," which has been raised by
comparative psychologists as a challenge to answering question (b). It purports to show
that no experiments to date have proven, or could prove even in principle, that animals
possess theoretical concepts (Povinelli and Vonk 2004, Penn and Povinelli 2007a). The
second argument is Hempel's (1965) "theoretician's dilemma," which has been raised by
philosophers of science as a challenge to answering question (a). It purports to show that
theories do not play any unique epistemic function because they may be dispensed with
in favor of beliefs about observable regularities alone with no resulting loss in predictive
ability.
While the theoretician's dilemma has traditionally targeted the epistemic utility of
theories within scientific practice, it presents significant obstacles to the theory-theory of
human cognition as well. First, by the theory-theorist's own lights, human cognition is
continuous with scientific practice, so any argument that undercuts the scientific utility of
theories will present a prima facie challenge to their cognitive utility as well. Second, if
the argument proves that every prediction made by a theory user could be made by
reasoning about observable regularities alone, then it is implausible that theory use would
be the tremendously important adaptive innovation that the theory-theory of human
cognition claims it is.
Despite the different purposes that the two arguments have been put to, I will
argue that the logical problem and the theoretician's dilemma share deep similarities.
While the similarity of two arguments from such diverse fields is independently
interesting, elucidating their common structure also suggests new avenues for developing
and testing the theory-theory of human cognition. By examining how philosophers of
science have attempted to resolve the theoretician's dilemma, we may uncover new ways
of rebutting the logical problem and shed light on how and why theory use is adaptive.

1. The Logical Problem


1.1. The genesis of the argument
In this chapter, I will argue that there is a tension at the heart of the theory-theory
of human uniqueness. The tension arises from a conflict between two commitments of
theory-theorists: a prominent argument that they have used to deny that animals are
theorists, despite experiments that have been widely interpreted as demonstrating
theoretical ability in animals, and their view that theories play a unique and important
adaptive role in human cognition.
The argument in question has come to be known as the "logical problem"
(Povinelli and Vonk 2004). The logical problem is so called because it purports to show
that there is a deep problem with past attempts to discern whether animals utilize theories,
one that does not merely concern their particular methodological details but rather is a
systematic problem with the logical structure of such experiments themselves. The argument
originally arose as an attack on experimental investigations of theory of mind ability in
chimpanzees, but it has since been expanded to attack experiments regarding many other
types of theorizing as well. Given that the problem has been most prominent in the theory
of mind debate, I will begin by explaining how the argument developed there and then
show how it has been generalized to other domains of theorizing.
Following Lurz (2009), I will contrast two general hypotheses about
chimpanzees' beliefs about unobservable mental states. The first is the mind-reading
hypothesis (MRH) that chimpanzees have mental concepts, such as "see" and "belief," that
they apply to themselves and others (specifically, conspecifics) for the purpose of
predicting others' behaviors. This contrasts with the behavior-reading hypothesis
(BRH) that chimpanzees lack mental concepts altogether and anticipate the behaviors of
other animals on the basis of what they know or believe (from experience, inference, or
innately) about the contingencies existing between such behaviors and the observable
environment (Lurz 2009, 305-306).
The debate over whether the MRH or BRH is true has divided into two main
camps. One side has concluded that "there is solid evidence from several different
experimental paradigms that chimpanzees understand the goals and intentions of others,
as well as the perception and knowledge of others" (Call and Tomasello 2008, 187). The
other has concluded that "after decades of effort by some of our brightest human and
non-human minds, there is still little consensus on whether or not non-human animals
understand anything about unobservable mental states" (Penn and Povinelli 2007a,
731); elsewhere, members of this second camp have made the stronger claim that there is
no compelling evidence whatsoever that non-humans can reason about the mental states
of others.
According to advocates of the MRH, the breakthrough experiments that, for the
first time, provided solid evidence that chimpanzees are mind-readers involved a series of
food competition experiments from Hare et al. (2000) in which a subordinate had the
choice of taking or not taking a food item in the presence of a dominant chimp
(Tomasello et al. 2003a, 154). The subordinate and a dominant were placed in separate
rooms that opened to a common chamber. In the chamber, researchers placed two food
items, one in the open and visible to both chimps, and one behind a barrier, visible only
to the subordinate chimp. The chimpanzees could see each other. When the researchers
released both chimps into the main chamber, the subordinate chimp was given a head
start. Hare et al. (2000) found that the subordinate chimp approached the food item
behind the barrier more often than the food item placed out in the open; that is, the
subordinate approached the food item that the dominant could not see.
Several versions of this experiment were performed. In one, the opaque barrier was
replaced with a transparent barrier. In a second, the subordinate and dominant were
allowed to watch the food being hidden. In a third, the subordinate and dominant were
allowed to watch the food being hidden, but the dominant was replaced with a new
dominant that had not seen the hiding process. In each case, the subordinate was observed
to preferentially approach the food item that the present dominant had not seen.
Tomasello et al. conclude, "We therefore believe that these studies show what they
seem to show, namely, that chimpanzees actually know something about the content of
what others see and, at least in some situations, how this governs their behavior" (2003a,
155).
In response, skeptics about the MRH have denied that Hare et al.'s
experiments provide evidence that chimpanzees have a theory of mind or that any similar
experimental setup could possibly provide evidence for the MRH over the BRH. The
main argument against MRH theorists from Povinelli and his colleagues has come to be
known as the "logical problem." The argument has two parts.

1.2. The logical problem: Part 1


First, proponents of the logical problem argue that for any purported instance of
mind-reading reported by Hare et al., there is a purely behavior-reading hypothesis
(BRH) that can explain the subordinate chimpanzee's observed behaviors equally well.
To see how this is possible, it will first be helpful to consider an MRH model of the
subordinate's behavior when there was or was not an opaque barrier between the
dominant and one of the food items.

Experimental Set-up -> [O1] -> [M] -> [O2] -> Behavior

Figure 3.1 - MRH explanation of the subordinate's behavior in a food competition experiment from
Hare et al. (2000). The boxed variables denote beliefs attributed to the subordinate.

The MRH explains the subordinate's behavior by first specifying the features of the
experimental set-up. These are then posited to cause the subordinate to form a belief
about the relevant observable features of the experiment, O1; i.e. "There is no opaque
barrier between the dominant and the exposed food." Then, on the basis of this belief, the
subordinate infers that the dominant is in some mental state, M; i.e. "The dominant can
see the exposed food." On the basis of this mental state attribution, the subordinate infers
that the dominant will perform some observable behavior, O2; i.e. "The dominant will
wallop me if I go for the food." The subordinate's prediction that O2 will occur then causes it to
perform some behavior; i.e. not approaching the exposed food.
The logical problem proceeds by noting that it is possible to give an analogous
model that predicts the subordinate's behavior but does not contain an intervening mental
state attribution. To create this competing BRH model, we can attribute to the
subordinate the same beliefs about the observable properties of the experiment, O1 and
O2, but posit that O1 causes O2 directly without the intervening mental state variable; the
M variable is simply snipped away.

Experimental Set-up -> [O1] -> [O2] -> Behavior

Figure 3.2 - BRH explanation of the subordinate's behavior in a food competition experiment from
Hare et al. (2000). The boxed variables denote beliefs attributed to the subordinate.

Povinelli and Vonk (2004) describe the logical problem as follows:
The general difficulty is that the design of these tests necessarily presupposes that
the subjects notice, attend to, and/or represent, precisely those observable aspects
of the other agent that are being experimentally manipulated. Once this is
properly understood, however, it must be conceded that the subject's predictions
about the other agent's future behavior could be made either on the basis of a
single step from knowledge about the contingent relationships between the
relevant invariant features of the agent and the agent's subsequent behavior, or on
the basis of multiple steps from the invariant features, to the mental state, to the
predicted behavior (8-9).
Since every MRH explanation must posit that the subordinate forms beliefs about the
relevant observable properties of the experimental set-up, every instance of mind-reading
necessarily involves behavior-reading. Thus, Povinelli and Vonk argue that for any
instance of purported mind-reading behavior, there is an explanation of the chimpanzee's
behavior positing this more limited behavior-reading alone that will fit the experimental
data equally well. In this case, a chimp that has all of the same beliefs about the
observable features of the dominant and reasons that a dominant with those features will
not punish him will behave in precisely the same way as a mind-reading chimp, and
therefore, its behavior is equally likely on either hypothesis.
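
The structure of this first step can be illustrated with a toy sketch. The two functions below are hypothetical stand-ins for the MRH and BRH models of the subordinate; their names and the specific rules they encode are assumptions introduced only for illustration and are not drawn from Hare et al. or from Povinelli and Vonk. The point is simply that, once the M step is removed, the mapping from the observable set-up to behavior is unchanged.

```python
# Toy models of the subordinate's reasoning. All rule contents are
# illustrative assumptions, not claims about actual chimpanzee cognition.

def mrh_subordinate(opaque_barrier_hides_food: bool) -> str:
    """Mind-reading model: O1 -> M -> O2 -> behavior."""
    o1 = opaque_barrier_hides_food            # belief about an observable feature
    m_dominant_sees_food = not o1             # mental state attribution (M)
    o2_dominant_will_punish = m_dominant_sees_food  # predicted observable behavior
    return "avoid" if o2_dominant_will_punish else "approach"


def brh_subordinate(opaque_barrier_hides_food: bool) -> str:
    """Behavior-reading model: O1 -> O2 -> behavior, with M snipped away."""
    o1 = opaque_barrier_hides_food
    o2_dominant_will_punish = not o1          # direct contingency between observables
    return "avoid" if o2_dominant_will_punish else "approach"


# The two models agree in every condition of this (simplified) experiment.
conditions = [True, False]
same_everywhere = all(mrh_subordinate(c) == brh_subordinate(c) for c in conditions)
print(same_everywhere)  # True
```

Because the two functions return the same behavior for every input, observing that behavior cannot by itself discriminate between them, which is all that the first part of the logical problem claims.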
The conclusion that Povinelli et al. draw from this first step of the logical problem
is that we would expect a chimp to pursue the hidden food regardless of whether it is
reasoning about the unobservable mental states of the dominant or reasoning about
observable properties as sketched above, and therefore, that behavior cannot provide
evidence for the hypothesis that the chimp's behavior was caused by a theory over the
more minimal behavior-reading hypothesis.
However, the first part of the logical problem does not, by itself, justify the more
ambitious claim that chimps do not utilize theoretical beliefs. The most that it can show is
that such experiments are evidentially neutral between the BRH and MRH. If these two
hypotheses are initially on a par with one another, then the first part of the logical
problem does not provide any reason to favor the BRH over the MRH. Therefore, the
theory-theorist must provide some reason for thinking that these two hypotheses are not,
in fact, on a par.
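
One way to make the notion of evidential neutrality explicit is the odds form of Bayes' theorem. This is offered only as an illustrative sketch under a Bayesian reading of "evidence"; the symbol B, standing for the subordinate's observed behavior, is notation introduced here.

```latex
\[
\frac{\Pr(\mathrm{MRH}\mid B)}{\Pr(\mathrm{BRH}\mid B)}
  \;=\;
\frac{\Pr(B\mid \mathrm{MRH})}{\Pr(B\mid \mathrm{BRH})}
  \times
\frac{\Pr(\mathrm{MRH})}{\Pr(\mathrm{BRH})}
\]
```

If the behavior is equally likely on either hypothesis, the likelihood ratio on the right equals 1, so the posterior ratio simply equals the prior ratio. On this reading, the observation leaves the relative standing of the two hypotheses exactly where it was, and any preference for the BRH must come from the priors or from some further consideration.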

1.3. The logical problem: Part 2
Povinelli et al. argue that in the absence of evidence favoring the MRH over the
BRH, the BRH ought to be favored because it is more parsimonious due to its attributing
fewer beliefs to the subordinate. We can interpret their argument as referring to either
tokens or types of beliefs.
Notice that the BRH and MRH models sketched above each attribute the same
beliefs about observable properties to the chimp, namely O1 and O2, and the MRH alone
attributes an additional belief about the mental state of the dominant [56]. The BRH also
posits fewer types of beliefs to explain the subordinate's behavior. The BRH only
attributes one type of belief to the chimp, that about merely observable properties, while
the MRH also attributes a belief about an unobservable mental state. If the theory-theory
is correct that these are genuinely different kinds of mental representations, then the BRH
is more parsimonious with respect to types as well [57].
Because the BRH attributes fewer beliefs to the subordinate, Povinelli et al. argue
that a purely behavior-reading explanation of the data is in principle more parsimonious
than its mind-reading counterpart and is thus the unavoidable null hypothesis which
current experiments give us no ground to reject:
Thus, in order to produce experimental evidence for a [theory of mind] one must
first falsify the null hypothesis that the agents in question are simply using their
normal, first-person cognitive state variables ... One must, in other words, create
experimental protocols that provide compelling evidence for the cognitive (i.e.
causal) necessity of [a theory of mind] in addition to and distinct from the
cognitive work that could have been performed without such a function (Penn and
Povinelli 2007a, 734).

[56] We might also want to include in the accounting the subordinate's beliefs about the contingencies which
hold between different states in virtue of which the subordinate inferred one from the other; i.e. the belief
about the association between O1 and O2 or M and O2. Regardless, the BRH uses fewer token beliefs to
explain the subordinate's behavior.
[57] If we include beliefs about the contingencies among states in the accounting, the BRH includes beliefs
about the contingencies between observable states, while the MRH must also include beliefs about the
contingencies between observable and unobservable states.

This argument rests on several questionable assumptions about the relevance of
parsimony to this debate. I will return to these points in Section 2.
However, before considering whether these assumptions are justified, it is worth
pausing to consider an even stronger claim suggested by this passage from Penn and
Povinelli. They imply that Hare et al.'s experiments do not provide evidence for the MRH
because in this case, it is not clear that the mental state attribution plays any distinctive
causal role over and above the chimp's representations of observable states alone. Indeed,
Povinelli et al. argue that comparative researchers have never specified the unique causal
work that representations about mental states do above and beyond the work that can be
done by representations of the observable features of other agents' past and occurrent
behaviors (ibid., 731).
There are two things worth noting about these claims. First, as I have argued
above, the question about how to test for the presence of theoretical beliefs is closely
intertwined with the question about what unique role, if any, theoretical beliefs play.
Second, we can begin to see a tension emerging here within the theory-theory of human
uniqueness. Despite their view that theorizing was the important innovation in the human
lineage that underlies many of the other behaviors that distinguish us from animals, here
Penn and Povinelli admit that the unique causal work done by theories is mysterious.

1.4. The logical problem, generalized
Though the logical problem was initially raised in the context of chimpanzee
social cognition, it has been used to undermine experimental tests for theoretical
concepts in other domains and in other species [58]. In fact, doing so is often quite
easy. We can construct a simple recipe for cooking up a logical problem as follows. It
starts by characterizing a hypothetical explanation that attributes theoretical ability to a
subject.
1. For any purported act of theorizing, the subject must attend to observable
property, O1.
2. From O1, the subject infers that some theoretical property, T, obtains.
3. On the basis of T, the subject infers that some other observable property, O2,
obtains.
The logical problem is that there is always a possible belief about observables (namely,
a belief about the direct relationship between O1 and O2) that would have sufficed for
the subject's prediction of O2 on the basis of O1. Further, this explanation will be more
parsimonious in that it attributes fewer beliefs to the subject.
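
The recipe can be written out as a small sketch. The `Belief` class and the helper functions below are hypothetical names introduced for illustration; they simply encode an explanation as a chain of attributed beliefs, each tagged as observable or theoretical, and assume (as the recipe does) that a direct O1-to-O2 link is always available once the theoretical step is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    content: str
    kind: str  # "observable" or "theoretical" (labels assumed for this sketch)


def snip_theoretical(chain):
    """Build the rival explanation by dropping every theoretical belief."""
    return [b for b in chain if b.kind == "observable"]


def token_count(chain):
    """Number of individual beliefs attributed to the subject."""
    return len(chain)


def type_count(chain):
    """Number of distinct kinds of belief attributed to the subject."""
    return len({b.kind for b in chain})


# Steps 1-3 of the recipe: O1 -> T -> O2.
theory_chain = [
    Belief("O1", "observable"),
    Belief("T", "theoretical"),
    Belief("O2", "observable"),
]
observable_chain = snip_theoretical(theory_chain)

print(token_count(theory_chain), type_count(theory_chain))          # 3 2
print(token_count(observable_chain), type_count(observable_chain))  # 2 1
```

On this simple accounting, the snipped explanation attributes fewer beliefs whether one counts tokens (2 versus 3) or types (1 versus 2). The sketch leaves out beliefs about the contingencies between states, which could also be included in the count.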
We can see this recipe at work with respect to experiments designed to test for
animals' capacity to reason about the abstract relations of identity and difference. A
common experimental paradigm used to examine whether animals possess these concepts
is the Same/Different (S/D) task or the related Match-to-Sample (MTS) task.
In an S/D task, the subject is presented with a pair of objects and is rewarded for
behaving one way if the two objects are the same and another way if the objects are
different. For example, if the subject is presented with a pair of objects, AA, she is
rewarded for selecting a star shape, but if presented with BC, she is rewarded for
selecting a square. In an MTS task, the subject must select a novel pair of objects that
exhibit the same relation as the test stimulus. For instance, if presented with pair AA, the
subject is rewarded for selecting pair BB and not BC. If presented with DE, she is
rewarded for selecting FG, not FF.

[58] For similar types of arguments, see Strevens (2000) on beliefs about essences, Andrews (2012) on beliefs
about mental states, and Carruthers (2008) on metacognitive states.

In properly controlled MTS experiments, the correct choice is the pair that
bears the same relation (identity or difference) as the stimulus pair. The correct and
incorrect choices are designed to create a mismatch between this abstract relational
property and lower-order perceptual properties. For example, when shown a stimulus pair
{small yellow triangle, large yellow triangle}, the appropriate response is to pick {small
blue circle, large blue circle} rather than {small yellow triangle, large yellow circle},
even though the latter pair has more perceptual features in common with the test pair.
According to an influential review of these procedures, success at a properly controlled
MTS task is proof that a subject possesses the abstract concepts of similarity and
difference.
Thompson and Oden (2000) argue that a chimpanzee cannot pass the test
described above by reasoning about the overall perceptual similarity among all four
items. Instead, it must first assess that the stimulus pair instantiates the relationship of
sameness (of color and shape), then assess that the second pair of objects instantiates the
relationship of sameness (of color and shape). Then, crucially, it must judge that the
relations that hold within each pair of items are identical. They argue that since it is only
the relations that are similar across pairs of objects, not the objects themselves, a subject
that passes this test [59] has demonstrated that it possesses a concept of sameness that is
abstracted from any particular perceptual regularity, can be extended in a productive,
rule-like fashion to new cases, and can be used to compare relations themselves.
However, Penn et al. (2008) dispute these conclusions, arguing that the logical
problem bites here too:
Regardless of which nonhuman species are capable of passing S/D and RMTS
tasks, the more critical and largely overlooked point is this: Both of these
experimental protocols lack the power, even in principle, of demonstrating that a
subject cognizes sameness and difference as abstract, relational concepts which
are (1) independent of any particular source of stimulus control, and (2) available
to serve in a variety of further higher-order inferences in a systematic fashion
(112).
As in the case of chimpanzee mindreading, their argument here rests on the claim
that there is some observable regularity which would suffice to produce the subject's
behavior, and lacking any specification of the unique role played by the purported
theoretical belief (here, an abstract concept of sameness), success at the task does not
and could not provide us with evidence of its presence.
Consider again the experiment in which a subject is presented with test stimulus
AA {small yellow triangle, large yellow triangle} and offered a choice between BB
{small blue circle, large blue circle} and AC {small yellow triangle, large yellow circle}.
There are fewer perceptual feature differences across all four objects in the AA, AC
comparison than the AA, BB comparison since in the former, only one object from each
pair differs along only one dimension (shape), while in the latter, the objects from each
pair differ along two dimensions (color and shape). Therefore, one might expect that a
perceptually-grounded (non-theoretical) concept of sameness would categorize AA with
AC rather than BB. Hence, a subject who deems AA and BB to be more similar than
AA and AC must be using some other criterion of similarity, which Thompson and Oden
argue must be a similarity in the abstract relation the objects within each pair bear to one
another.

[59] Thompson and Oden concur with Premack (1983) that only language-trained apes are capable of passing
such tests.

However, Penn et al. argue that there is a perceptual regularity on which AA and
BB are more similar than AA and AC. The subject might make an analog estimate of the
perceptual variability within each pair and select the pair of objects that is most similar
with respect to this within-pair variability. The AA and BB pairs each have one
within-pair featural difference (size) while the AC pair has two (size and shape).

[Figure 3.3: table comparing the stimulus pair with the correct choice and with the incorrect choice
on three measures: relation (same/different) differences, overall feature differences across all
four objects, and feature differences within each pair.]

Figure 3.3 - Three ways of counting the number of differences between the stimulus and test pairs in
an MTS task.
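
The two perceptual ways of counting can be checked with a short sketch. The function names are hypothetical, and two assumptions are made for illustration: each object is described by exactly three features (size, color, shape), and the overall count compares the objects of the two pairs position by position, following the description in the text.

```python
# Each object is a tuple of perceptual features: (size, color, shape).
AA = [("small", "yellow", "triangle"), ("large", "yellow", "triangle")]  # stimulus pair
BB = [("small", "blue", "circle"), ("large", "blue", "circle")]          # correct choice
AC = [("small", "yellow", "triangle"), ("large", "yellow", "circle")]    # incorrect choice


def feature_differences(x, y):
    """Count the feature dimensions on which two objects differ."""
    return sum(a != b for a, b in zip(x, y))


def overall_differences(pair1, pair2):
    """Overall count: compare the two pairs object by object (assumed alignment)."""
    return sum(feature_differences(x, y) for x, y in zip(pair1, pair2))


def within_pair_variability(pair):
    """Number of feature differences between the two objects inside one pair."""
    return feature_differences(pair[0], pair[1])


def variability_mismatch(pair1, pair2):
    """How far apart two pairs are in their within-pair variability."""
    return abs(within_pair_variability(pair1) - within_pair_variability(pair2))


print(overall_differences(AA, BB), overall_differences(AA, AC))    # 4 1
print(variability_mismatch(AA, BB), variability_mismatch(AA, AC))  # 0 1
```

On the overall count, AC is the closer match to AA (1 difference versus 4), but on within-pair variability BB is the closer match (a mismatch of 0 versus 1). A subject tracking only the second, still perceptual, quantity would therefore choose BB.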

Hence, there is an available perceptual property (a higher-order perceptual property than
a simple count of overall feature differences) by which the AA pair is more similar to
the BB pair than to the AC pair. A subject could pass the test by forming a belief about this
higher-order perceptual feature of the data and learning that it will be rewarded for
picking another pair of objects that is perceptually similar [60]. Penn et al. conclude:
At best, the [MTS] task demonstrates that nonhuman animals can select the
choice display that has the same degree of between-item variability as the sample
display. But the task says nothing about nonhuman animals ability to evaluate the
non-perceptual relational similarity between those relations (ibid.).
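The variability-matching strategy that Penn et al. describe can be made concrete with a minimal sketch. The feature encodings, the `STIMULI` table, and the function names below are illustrative assumptions rather than anything drawn from the experiments; they are chosen only so that the AA and BB pairs each contain one within-pair difference and the AC pair contains two.

```python
# Hypothetical encodings, chosen only to mirror the counts in the text:
# the objects in AA and in BB differ in size alone, those in AC in size and shape.
STIMULI = {
    "AA": [{"shape": "circle", "size": "small"}, {"shape": "circle", "size": "large"}],
    "BB": [{"shape": "square", "size": "small"}, {"shape": "square", "size": "large"}],
    "AC": [{"shape": "circle", "size": "small"}, {"shape": "star", "size": "large"}],
}


def within_pair_variability(pair):
    """Count the features on which the two objects in a pair differ."""
    first, second = pair
    return sum(first[feature] != second[feature] for feature in first)


def choose(sample, options):
    """Pick the option whose within-pair variability is closest to the sample's."""
    target = within_pair_variability(STIMULI[sample])
    return min(
        options,
        key=lambda name: abs(within_pair_variability(STIMULI[name]) - target),
    )
```

On these assumed encodings, `choose("AA", ["BB", "AC"])` selects BB, and nothing in the procedure represents sameness or difference as an abstract relation.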
By analogy with the mindreading case, we may note that on the hypothesis of Oden and
Thompson, a chimp using the concept of sameness must first form a belief about the
perceptual similarity between items in the AA and BB pairs, then infer that each pair
instantiates the relation of sameness. This belief, paired with the belief that it will be
rewarded when it picks two pairs of objects that both share the relation of similarity,
causes it to choose correctly. However, a subject that merely had a belief about the
perceptual variability within the AA and BB pairs, plus the belief that it will be rewarded
for picking two pairs that have the same level of perceptual variability, will make the
very same choice. Therefore, it is possible to explain the subject's behavior equally well
without attributing to it a belief about the abstract relations of identity or difference, and
according to Povinelli et al., to do so would be simpler.

⁶⁰ Notice that this requires that the subject judge that two perceptual relations are similar to one another.
Recall the discussion in Chapter 1 about whether a relation such as {overall perceptual variability} could be
present in the subject's sensory impressions of the pairs.

2. Parsimony
The second part of the logical problem states that because the BRH is simpler
than the MRH, in virtue of attributing fewer beliefs (token and type) to the chimpanzee,
it is the null model that must be shown to be false through experimental evidence.
Notice that in order for the logical problem to buttress the claim, central to the theory-theory
of human uniqueness, that animals are not theorizers, it must be the case that the
BRH's advantage in parsimony gives us a reason to believe that the BRH is more likely
to be true. Thus, there are three assumptions about parsimony that Povinelli et al. are
making. The first is that parsimony matters; that is, if two hypotheses explain the data
equally well, then we ought to favor the simpler one. The second is that the BRH is in
fact more parsimonious. The third is that if the way that parsimony matters is by
specifying the null model, then we ought to believe the null model is true. However, all
three of these assumptions are seriously problematic.
With respect to the first assumption, I generally agree with the deflationary account
of simplicity (Sober 1990, Sober ms, Fitzpatrick 2009), which is characterized by
Fitzpatrick (2009) as follows:
The deflationary account departs from the standard view in that it denies that
simplicity should be seen as a general theoretical virtue and criterion for theory
choice in its own right. There is no adequate general justification for favoring
simple theories over less simple ones. However, the claim is not that we are never
justified in preferring simple theories to less simple ones. Rather the claim is
that in cases where we do seem to be justified in preferring theories that are
simpler in some particular respect, some other consideration is doing the real
epistemic work. Typically, what is doing the real work are various background
theoretical considerations, often specific to the scientific context at hand (269).
According to this view, then, if parsimony gives us a reason to favor the BRH, it is
because the extra entities postulated by the MRH are implausible for independent
reasons⁶¹. Viewed this way, parsimony is quite inert. Povinelli et al. need to provide
some reason to think that the beliefs postulated by the MRH do not exist. The situation
gets murkier once it is acknowledged that claims about parsimony are elliptical for some
further theoretical considerations, for there are many different senses in which the BRH
might be more parsimonious than the MRH, or vice versa, and it is unclear how to
compare them.

⁶¹ I view my discussion of two competing parsimony principles in Chapter 1 as part of this project. I argued
that these principles are actually manifestations of deeper disagreements about the nature of human abstract
reasoning.

In Chapter 1, I referred to competing presumptions of parsimony in the overall
debate over continuity with animals. Parsimony in the sense of unification seems to favor
the MRH; it recommends that we explain human and animal behaviors via the same
underlying mechanisms, and if we would explain a human's behavior in food competition
experiments by attributing a theory of mind, perhaps we should do so for animals as well.
This approach receives some support when paired with considerations of phylogenetic
parsimony. Given the small amount of time since our last common ancestor with
chimpanzees, the most phylogenetically parsimonious hypothesis will be one that posits
the fewest evolutionary changes in the hominid lineage. Thus, perhaps the most
phylogenetically parsimonious hypothesis is that our last common ancestor was capable
of mindreading and so too are modern chimps (de Waal 1991, Sober 2012, though see
Sober 2005 for a critical discussion).
The MRH is also more unifying in that it can explain the chimp's behavior in all
of Hare's experiments via a single mental state attribution. According to Call and
Tomasello, the BRH is less parsimonious because "behavioral or contextual rules might
be concocted to explain the results of any one of the seven studies in which the
chimpanzees react to or predict the behavior of others, but this requires many different ad
hoc behavioral and contextual rules for which there is absolutely no positive evidence"
(2008, 189).
While I cannot do justice to these various proposals here⁶², at the very least, they
show that there are many senses of parsimony that may be relevant to the primate
mindreading debate, and they do not give an unequivocal verdict in favor of the BRH.

⁶² See Fitzpatrick (2009) and Meketa (2014) for more comprehensive discussions.

What about the sense of parsimony that Povinelli et al. identify? Typically, in
statisticalist approaches, such as Neyman-Pearson testing, the hypothesis which posits
no effect is specified to be the null hypothesis, which is compared with hypotheses that
posit additional entities or causal relationships among variables of interest. The standards
for acceptance and rejection of the null and alternative embody a presumption toward the
simpler null hypothesis (Godfrey-Smith 1994).
While it is questionable whether the logic of Neyman-Pearson testing applies to
this case and whether the choice of a null model is merely conventional, for the sake of
argument, I will grant that there is an objective sense in which the BRH is simpler
which justifies calling it the null hypothesis in this case. The more serious problem with
the null model account of parsimony is that the Neyman-Pearson protocol specifies the
conditions under which we can accept or reject the null model, but serious objections
have been raised for interpreting these conditions as ones under which the null model is
likely to be true or false (Sober 2008, 65). Proponents of the logical problem who want to
use it to prove that chimpanzees are probably not mindreaders need some further reason
to believe that the BRH is true, not merely that it is innocent until proven guilty.
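The gap between a decision rule and a probability claim can be illustrated with a generic fixed-level test. The sketch below is not the analysis used in any of the experiments under discussion; the binomial setup, the chance level of 0.5, the 0.05 threshold, and the trial counts are all assumptions made for illustration.

```python
from math import comb


def upper_tail_probability(successes, trials, p_null=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def decide(successes, trials, alpha=0.05):
    """Fixed-level rule: reject the null only if the tail probability is at most alpha."""
    tail = upper_tail_probability(successes, trials)
    verdict = "reject null" if tail <= alpha else "do not reject null"
    return verdict, tail
```

With a hypothetical result of 7 successes in 10 trials, `decide(7, 10)` returns "do not reject null" with a tail probability of about 0.17. That verdict reports what the protocol licenses; it assigns no probability to the null hypothesis itself.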

3. A better argument from parsimony


I am skeptical that the argument from parsimony found in the second part of the
logical problem provides a good reason to deny that animals are theorizers. However, I
will now turn to another type of parsimony argument that is more promising to that end.
As I noted in Section 1, in their development of the logical problem, Penn and Povinelli
argue that comparative researchers have never specified "the unique causal work that
representations about mental states do above and beyond the work that can be done by
representations of the observable features of agents' past and occurrent behaviors" (Penn
and Povinelli 2007, 731).
Here, I want to suggest that if beliefs about the mental states of others do no such
unique causal work, then the MRH would postulate the existence of a belief that is
positively useless for the chimpanzee to have. Further, there are theoretical reasons to
believe that if mindreading beliefs are useless, then these are not beliefs that chimps are
likely to possess. Therefore, the BRH is to be favored over the MRH, not merely because
the MRH is less parsimonious, but because the extra entities it posits are implausible.
There are several ways that one could argue that if a type of belief is superfluous,
then it is not likely to have been causally involved in the production of behavior. The first
is that, in general, we do not expect evolution to lead to the development and
maintenance of traits when those traits do not serve (and have never served) any function
that contributes to survival and reproduction. This argument is particularly strong with
respect to theoretical beliefs, given several common assumptions⁶³.

⁶³ However, see Meketa (2014) for a critical discussion of the following assumptions.

First, the capacity to form non-theoretical beliefs (beliefs about the contingencies
among observable states alone) is widely believed to have evolved first, to exist on its own
in many species, and to be necessarily present in any purported theorizer (Karin-D'Arcy
2005). If theoretical capacity is an add-on to a deeper system of perceptual reasoning,
then it is mysterious why evolution would graft a useless cognitive capacity on top of one
that was sufficient, all by itself. Second, it is commonly assumed that extensive cognitive
architecture is required for theoretical reasoning and thus the metabolic resources
required to build and operate a theorizing cognitive system are very high (Whiten 1995).
It is unlikely that a system carrying such high costs and providing no benefits over
existing systems would evolve.
In addition to evolutionary considerations, psychological considerations also seem
to count against the MRH if mindreading beliefs are superfluous. As Povinelli and Vonk
(2004, 10) point out, a mindreader cannot skip the step of representing the observable
regularities necessary to invoke theoretical concepts. Therefore, theorizing involves an
extra mental operation, and predictions made via an extra, superfluous inferential step are
likely to be slower and more error prone. Thus, we might wonder why even a chimp who
is capable of mindreading would go to the extra work of reasoning about another's
mental states if simpler processes would suffice.
Therefore, there are strong prima facie reasons to doubt that theoretical capacity
would evolve or that psychological agents would expend the extra effort to engage in
such reasoning if it would serve no epistemic purpose in animal cognition while carrying
significant costs. This is a better parsimony argument than the one originally used by
Povinelli et al. In general, if theoretical beliefs do no unique causal work, then there are
evolutionary and psychological grounds for believing that the simpler theory is more
plausible.
Thus, so buttressed, the logical problem offers a strategy for advocates of the
theory-theory of human uniqueness to support one of their key claims, that animals are
not theorizers. At the same time, however, this strategy threatens to severely undermine
one of its other key claims, that theories play a unique and highly adaptive role in human
cognition. While not quite contradictory, it is at least dialectically perilous to argue
that theoretical beliefs would be useless for animals to have while at the same
time insisting that they have been of tremendous importance in human cognition.
Despite this tension, the argumentative strategy that I have sketched is not merely
an abstract possibility but has actually been adopted by proponents of the theory-theory.
As I showed in Chapter 1, it is clearly on display in the work of Morgan, an early
proponent of the view. He goes to great lengths to argue that we can explain even the
most sophisticated animal behaviors without attributing theoretical beliefs (for Morgan,
representations of relations) to them. Further, he argues, such attributions would be
implausible since such beliefs would be of no use to the animal. To wit, he writes
(emphases mine):
Of what practical service would it be for the fox-terrier pup to make the relations
focal in perception? I am unable to see that it would be of any practical service or
advantage to him as a fox-terrier; and being of no practical service or advantage
to him, I am unable to see what grounds we have for supposing that this faculty
has been developed in him or in his race (ibid., 243).
Where would be the advantage of perceiving relations as such, since practical
sense-experience suffices for all the needs of existence of many highly organized
animals? (ibid., 250).
On the other hand, Morgan argues that the ability to theorize is the key innovation of the
human lineage that opens the door to a whole new world of cognitive abilities, "a realm
of well-nigh boundless extent" (ibid., 227). In almost the same breath, Morgan repeatedly
emphasizes just how useless the capacity to reason about relations would be for animals
while building his case for human cognitive exceptionalism around that very ability. This
puts Morgan in a precarious position⁶⁴.

⁶⁴ As I mentioned in Chapter 1, Morgan occasionally gestures at a way of resolving this tension, suggesting
that theoretical capacity is only useful to language users, though he does not do much work to defend this.

All that is old is new again, for the same dialectical problem besets modern
proponents of the theory-theory of human cognition. So far, I have been treating this
argumentative strategy as an option for the theory-theorist who wants to use the logical
problem to deny that animals use theoretical beliefs (and not merely that we lack
evidence that they do). However, a more damning possibility is that this strategy is not
optional for a theory-theorist who wants to utilize the logical problem at all. The problem
that I will now examine is that the logical problem bears very strong structural
similarities to an argument from philosophy of science, Hempel's theoretician's dilemma,
that purports to show that theories are epistemically superfluous.

4. The Theoretician's Dilemma and the Logical Problem


The most influential statement of the theoretician's dilemma is given by Hempel
(1965). To understand his argument, it will be helpful to first characterize his view of
theories. Hempel conceived of theories as making use of general laws that "have the
function of establishing systematic connections among empirical facts in such a way that
with their help some empirical occurrences may be inferred, by way of explanation,
prediction, or postdiction, from other such occurrences" (177). On his view, theories have
a "layer cake" structure (Lange 2000), consisting of three levels.
At the bottom are statements in the observation language of particular matters of
fact. The middle layer consists of empirical generalizations among directly observable
objects, events, and properties (the O-level). The top level consists of theoretical laws
that explain the basic-level observable regularities via the introduction of unobservable
entities, events, and properties (the T-level). These laws entail the empirical
generalizations in the middle layer via interpretive sentences (or bridge principles) which
connect terms of the T-level to terms of the O-level⁶⁵. For Hempel, the ultimate function
of theories is to systematize observable regularities so as to deductively entail
observational predictions⁶⁶. Crucially, on his view, theories do not do this directly but
instead do so indirectly via the intermediate level of empirical generalizations.

[Figure 3.4 labels, top to bottom: T-language generalizations; interpretive sentences; O-language empirical regularities; inductive confirmation / explanation; O-language matters of fact.]


Figure 3.4 - Hempel's "layer cake" picture of theories.

⁶⁵ The nature of these interpretive sentences, i.e. whether they need to state necessary and
sufficient conditions for the applicability of the theoretical terms or something weaker, is a matter of debate
(Hempel 1955, 194).
⁶⁶ As we will see, responses to the theoretician's dilemma may reject either of these suppositions about the
nature and function of theories. Sellars will reject the claim that a theory's function is to generate new
empirical generalizations for the purposes of prediction, and Hempel himself will reject the claim that the
function of theories is to impose deductive relationships between observables.

Hempel provides two illustrations of this layer cake structure in scientific usage
(1965, 184-185). In the first, he describes how Newtonian theory would be used to
predict the motions of a rocket floating freely 100 miles above the moon. At the base
level, observations are made of the properties of the rocket and moon (O1, O2). These
are subsumed by the relevant empirical generalizations about the types of objects bearing
those observable properties. From these generalizations and the appropriate interpretive
sentences, inferences are made about the theoretical properties of the moon and the
rocket, e.g. their masses, positions, and velocities. For instance, if the rocket is observed
to have properties O1 and O2, then it is inferred that T1, the rocket has a mass of
100,000 kg, is true; {O1 & O2} → T1. Descriptions of the moon and rocket in the
T-language are used as input to the theoretical laws couched in terms of T-language alone,
here, Newton's law of gravitation and laws of motion. The theory then entails future
states of the moon and rocket, couched in T-terms (position and velocity), and these are
translated back into the O-language via interpretive sentences to generate predictions
about their future observable states O3.
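The three stages of this example (from observation to T-language, through the law, and back to observation) can be sketched in code. This is a minimal illustration under stated assumptions: the function names and the form of the bridge principles are hypothetical, the physical constants are standard approximate values, and only the rocket's mass of 100,000 kg and its altitude of 100 miles come from the text.

```python
G = 6.674e-11              # gravitational constant in N m^2 / kg^2 (approximate)
MOON_MASS_KG = 7.342e22    # approximate
MOON_RADIUS_M = 1.7374e6   # approximate mean radius
METERS_PER_MILE = 1609.344


def interpret_observations(o1, o2, altitude_miles):
    """O -> T: hypothetical bridge principles, including {O1 & O2} -> T1."""
    if not (o1 and o2):
        raise ValueError("no bridge principle covers these observations")
    return {
        "rocket_mass_kg": 100_000.0,  # T1 in the text
        "distance_m": MOON_RADIUS_M + altitude_miles * METERS_PER_MILE,
    }


def apply_theory(t_state):
    """T -> T: law of gravitation and second law of motion, stated in T-terms only."""
    force_n = G * MOON_MASS_KG * t_state["rocket_mass_kg"] / t_state["distance_m"] ** 2
    return {"acceleration_m_s2": force_n / t_state["rocket_mass_kg"]}


def interpret_prediction(t_prediction):
    """T -> O: hypothetical bridge principle yielding an observable prediction (O3)."""
    a = t_prediction["acceleration_m_s2"]
    return f"the rocket accelerates toward the moon at about {a:.2f} m/s^2"


def predict(o1, o2, altitude_miles):
    """Chain the three stages: observe, theorize, translate back."""
    return interpret_prediction(apply_theory(interpret_observations(o1, o2, altitude_miles)))
```

Only `apply_theory` mentions mass, distance, or force; the two `interpret_` functions stand in for the interpretive sentences that connect the T-level to the O-level.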
Hempel's second illustration of this layer-cake structure is the use of a
psychological theory to predict a subject's behavior. He supposes that in order to predict
a subject's future observable behaviors, the psychological theorist begins with
observations of the initial behavioral state of the subject and the observable
environmental stimuli. The theorist proceeds via bridge statements relating observable
states of a subject to hypothetical intervening variables such as drives, inhibitions, or
desires, from which future observable behaviors of the subject can be predicted (ibid.,
185). Thus, Hempel's own analysis offers a surprising parallel to the mindreading debate
in which the logical problem arose.
Given that, on Hempel's view, a theory's role is to generate predictions about
matters of fact, and given that it can only do so via intermediate empirical generalizations,
we can begin to motivate the theoretician's dilemma by asking what unique role is played
by theoretical laws in generating new predictions over and above the empirical
regularities themselves. For if theoretical claims are entailed by the observation claims
and bridge principles and themselves entail other observational claims, then the theory
does not seem to add any new information but rather serves as an additional deductive
link between observation statements.
Schematically, a theoretical inference for Hempel has the form:
1. O1.
2. O1 → T.
3. T → O2.
C: Therefore, O2.

Hempel points out that the conjunction of premises 2 and 3 of this theoretical inference,
[(O1 → T) and (T → O2)], entails:
[(O1 & T) → O2]
which, in turn, is true if and only if:
[T → (O1 → O2)]
The upshot is that in order for T to generate a prediction that O2 on the basis of input O1,
T must already imply the sentence "If O1, then O2" (ibid., 210).
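The two logical steps in this passage can be checked mechanically. The Lean 4 sketch below treats O1, T, and O2 as arbitrary propositions, which is a simplifying assumption, and the theorem names are illustrative. It shows that premises 2 and 3 jointly deliver the direct regularity O1 → O2, and that [(O1 & T) → O2] and [T → (O1 → O2)] stand or fall together.

```lean
-- Premises 2 and 3 chain into the purely observational regularity O1 → O2.
theorem chain (O1 T O2 : Prop) (h2 : O1 → T) (h3 : T → O2) : O1 → O2 :=
  fun h1 => h3 (h2 h1)

-- Exportation: (O1 ∧ T) → O2 holds exactly when T → (O1 → O2) does.
theorem exportation (O1 T O2 : Prop) :
    ((O1 ∧ T) → O2) ↔ (T → (O1 → O2)) :=
  ⟨fun h t o1 => h ⟨o1, t⟩, fun h p => h p.2 p.1⟩
```

The first theorem never mentions T in its conclusion, which is the formal counterpart of the worry that the theoretical link adds nothing to the prediction.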
However, a puzzle immediately arises, since this empirical regularity would
suffice, by itself, for predictive purposes. In other words, in order for T to fulfill its
function, T must already entail an empirical regularity that would serve that function all
on its own. Therefore, T seems superfluous. The basic problem is succinctly stated by
Hull: "If you have a secure equational linkage extending from the antecedent observable
conditions through to the consequent observable conditions, why, even though to do so
might not be positively pernicious, use several equations when one would do?" (Hull
1943, 284; quoted in Hempel 1965, 186).
From this point, Hempel constructs a dilemma for the epistemic necessity of
theories (ibid.):

1. If the terms and general principles of a theory serve their purpose, i.e. if they
establish definite connections among observable phenomena, then they can be
dispensed with since any chain of laws and interpretive statements establishing
such a connection should then be replaceable with a law which directly links
observational antecedents to observational consequences.
2. If the terms and general principles of a theory do not serve their purpose, they are
unnecessary.
3. Terms and general principles of a theory either serve their purpose or they don't.
C: Therefore, the terms and general principles of a theory are unnecessary.
I will leave an examination of Hempel's argument and his justification for its first
premise until the next section. For now, all that matters is that Hempel's argument shows
that given certain assumptions about the nature and purpose of theories, it is always
possible to dispense with a theory in favor of a set of merely observable generalizations
while maintaining the same consequences for observable outcomes. This argument has
strong parallels with the logical problem; both arguments turn on the crucial premise that
for any token observable prediction (or behavior stemming from that prediction) that may
be reached via reasoning from an intervening theoretical belief, there is a purely
observational regularity that would suffice for that prediction.
Put schematically, the logical problem states that for any purported act of
theorizing (here, mindreading), the subject must attend to some observable feature of the
competitor, O1. It infers from the presence of O1 that its competitor is in state T (here, a
mental state), and then reasons that if T is present, its competitor will produce a future
observable behavior, O2. In response, the subject produces some behavior B. The logical
problem states that the following two hypotheses would each suffice to explain B:
(MRH)
a. S observes that O1.
b. S believes that {O1 → T}.
c. S believes that {T → O2}.
d. From (a-c), S infers that O2.
e. S's belief that O2 causes S to B.

(BRH)
a. S observes that O1.
b. S believes that {O1 → O2}.
c. From (a-b), S infers that O2.
d. S's belief that O2 causes S to B.

Therefore, the mere observation that the subject performed B (i.e. predicted that O2 on
the basis of O1) does not tell us whether it was using a theory to do so. On its face, this is
an argument about our ability to know whether a subject is a theory user, not whether
theories play some special epistemic function.
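The observational equivalence of the two hypotheses can be made vivid with a toy sketch. The two functions below are hypothetical stand-ins for the MRH and BRH subjects, with each belief reduced to a Boolean; they are an illustration of the schema, not a model of chimpanzee cognition.

```python
def mrh_subject(o1_observed):
    """Mindreading route: O1 -> T -> O2 -> B."""
    if not o1_observed:
        return None
    t_attributed = True          # S believes {O1 -> T}: the competitor is in state T
    o2_expected = t_attributed   # S believes {T -> O2}
    return "B" if o2_expected else None


def brh_subject(o1_observed):
    """Behavior-reading route: O1 -> O2 -> B, with no intervening T."""
    if not o1_observed:
        return None
    o2_expected = True           # S believes {O1 -> O2}
    return "B" if o2_expected else None
```

For every input the two subjects return the same output, so an experimenter who records only inputs and outputs has no basis for telling them apart.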
To strengthen the analogy with the logical problem, we can cast the theoretician's
dilemma in more psychological terms. It states that for any act of theorizing, the theorist
must first note that some proposition couched in observational language, O1, is true. From
this, she infers via an interpretive sentence that some proposition couched in theoretical
language, T, is true. Then, she reasons via the laws of her theory and interpretive
sentences that some further observation-language proposition, O2, is true. Suppose that
this causes her to make a prediction, B, that O2. The theoretician's dilemma suggests that
the subject could have arrived at her prediction that O2 either via a theory (TH) or by
reasoning about empirical generalizations alone (EH):
(TH)
a. S observes that O1.
b. S believes that {O1 → T}.
c. S believes that {T → O2}.
d. From (a-c), S infers that O2.
e. S's belief that O2 causes S to B.

(EH)
a. S observes that O1.
b. S believes that {O1 → O2}.
c. From (a-b), S infers that O2.
d. S's belief that O2 causes S to B.

If the theoretician's dilemma is sound, then it is always possible to explain the
subject's prediction via (EH). Given these strong structural similarities between the
logical problem and the theoretician's dilemma, there is a worry that a theory-theorist
who wants to use the former argument is thereby committed to the latter. These
similarities also show that the emerging tension at the heart of the theory-theory of
human uniqueness is no accident. In order to deny that sophisticated animal behaviors are
the result of theorizing, theory-theorists diminish the unique causal role that theories play.
Hempel's dilemma buttresses the theory-theorist's argument here, for it establishes that
for any theoretical generalization, there will be an observational counterpart that would
suffice for predictive purposes. However, from here, it is but a short step to the claim that
theories are epistemically superfluous.
The pairing of the logical problem and the theoretician's dilemma presents a
strong prima facie challenge for the theory-theory's ability to answer either of the
questions that I posed at the beginning of this chapter. In fact, it seems to suggest that
both questions are ill-formed; there is no unique epistemic function of theories, and as a
result, there is no way to tell whether a subject was a theory-user from the predictions she
makes.
In order to resolve this tension at the heart of her view, the theory-theorist of
human uniqueness must show how it is possible to retain the insight of the logical
problem that past experiments have not provided evidence that animals are theorizers
while still maintaining that theories serve some unique epistemic function in human
cognition. This will be the task of the remaining chapters, but here, we can establish the
groundwork by elucidating some of the disanalogies between the logical problem and the
theoretician's dilemma which will open up space for theory-theorists to maintain the
former and deny the latter.

5. Disanalogies between the two arguments
5.1. In what sense are theories unnecessary?
In order to elucidate the differences between the two arguments, we can start by
noting a lacuna in Hempel's original formulation of the argument:
1. If the terms and general principles of a theory serve their purpose, i.e. if they
establish definite connections among observable phenomena, then they can be
dispensed with since any chain of laws and interpretive statements establishing
such a connection should then be replaceable with a law which directly links
observational antecedents to observational consequences.
2. If the terms and general principles of a theory do not serve their purpose, they are
unnecessary.
3. Terms and general principles of a theory either serve their purpose or they don't.
C: Therefore, the terms and general principles of a theory are unnecessary.
Notice that the first premise states that theories can be dispensed with in favor of their
observational-language counterparts. However, from this premise, Hempel draws a
conclusion about the necessity of theories. Thus, in its original formulation, the
theoretician's dilemma is not valid and must be amended to include an additional
premise:
1. If the terms and general principles of a theory serve their purpose, i.e. if they
establish definite connections among observable phenomena, then they can be
dispensed with since any chain of laws and interpretive statements establishing
such a connection should then be replaceable with a law which directly links
observational antecedents to observational consequences.
2. If the terms and general principles of a theory can be dispensed with, then they
are unnecessary.
3. If the terms and general principles of a theory do not serve their purpose, they are
unnecessary.
4. Terms and general principles of a theory either serve their purpose or they don't.
C: Therefore, the terms and general principles of a theory are unnecessary.
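The difference between the two formulations can be checked in Lean 4 under an assumed propositional reading: S for "the theory serves its purpose", D for "it can be dispensed with", and U for "it is unnecessary". These letters and the theorem names are labels introduced here for illustration. The first result shows that the amended premises yield the conclusion; the second shows that the original three premises do not, since all of them are satisfied when S and D hold and U fails.

```lean
-- Amended argument: premises 1-4 jointly entail the conclusion U.
theorem amended_dilemma (S D U : Prop)
    (p1 : S → D) (p2 : D → U) (p3 : ¬S → U) (p4 : S ∨ ¬S) : U :=
  p4.elim (fun hs => p2 (p1 hs)) p3

-- Original argument: without the bridge from D to U the form is invalid.
-- Countermodel: S := True, D := True, U := False.
theorem original_form_invalid :
    ¬ (∀ S D U : Prop, (S → D) → (¬S → U) → (S ∨ ¬S) → U) :=
  fun h => h True True False (fun _ => trivial) (fun hn => hn trivial) (Or.inl trivial)
```

The proof of the amended form uses `p2` exactly once, at the step that moves from dispensability to being unnecessary, which is the premise whose truth is examined next.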
Is the second premise of this amended argument true? This will depend on what
one means by "unnecessary". The following are a few ways that a theory might be
epistemically necessary for an agent, S, with respect to some proposition p that is
predicted by S's theory, T:
(a) Justificatory necessity: T is justificatorily necessary for S if and only if S is
justified in believing p only if she came to believe that p by use of T.
(b) Predictive necessity: T is predictively necessary for S if and only if S could not
have predicted that p if she had not used T.
(c) Psychological necessity: T is psychologically necessary for S if and only if, in
practice, S would not have predicted that p if she had not used T.
It is this last notion of necessity that is the primary concern of the theory-theory of human
uniqueness. If it turns out that theories are psychologically unnecessary (if actual human
agents can arrive at all of the predictions of a theory without actually having used that
theory), then this would significantly undermine the claim that theories play some unique
adaptive role in human cognition. On the other hand, if theories turn out to be
unnecessary in the other two senses, this may not undermine the theory-theory as a view
about the cognitive practices of actual agents. First, as a psychological rather than
normative epistemic theory, it is not concerned with whether agents are justified in their
predictions but rather whether they make the predictions at all. Second, it is not
concerned with whether it is possible, in principle, to make the same predictions without
use of a theory but rather whether agents of interest (humans and other animals), such as
they are, would actually do so.

5.2. Dispensability does not entail that theories are psychologically unnecessary
As I will discuss in more detail below, it is unclear which notion of necessity
Hempel had in mind when constructing his dilemma. However, it is clear that he did not
take it to show that theories are (always) psychologically unnecessary for creatures like
us. We can see why this is so by considering one of Hempel's examples of a theory at
work, as well as examining the justification he gives for his first premise.
Recall the example of an agent who uses Newton's theory to derive a prediction
about the motion of a rocket floating freely 100 miles above the moon. A theory user
computes the trajectory of the rocket by using observations to determine its mass,
position, and velocity. These values are then plugged into the laws of the theory to
determine the rocket's gravitational attraction to the moon and resulting trajectory.
Now, it is in principle possible to formulate a system of equations relating observable
properties of objects and their resulting movements that does not require the use of
Newton's laws⁶⁷.
For example, it may be possible to construct a look-up table into which one can
input observable properties of any two objects and the table outputs their movements. To
strengthen the case, suppose that this look-up table was not constructed via Newton's
theory but by exhaustive and comprehensive observations of the motion of satellites
around various bodies. What is important to note, here, is that this is not a look-up table
that we could reasonably expect any normal human agent to possess and use. To make all
of the predictions of Newton's theory, it would need to be infinitely long. Further, the list
would look ad hoc; the entries for rockets with the same mass but different surface
properties would look entirely different.
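The contrast between the law and its look-up-table counterpart can be sketched as follows. The table entries, function names, and chosen cases are hypothetical; the table is filled from the law here only to keep the sketch short, whereas the text imagines it compiled from recorded motions alone.

```python
G = 6.674e-11  # gravitational constant in N m^2 / kg^2 (approximate)


def law_prediction(central_mass_kg, distance_m):
    """Theory route: a single law covers every input."""
    return G * central_mass_kg / distance_m**2


# Stand-in for a table of recorded cases: (central mass in kg, distance in m).
OBSERVED_CASES = [(7.342e22, 1.9e6), (7.342e22, 2.5e6), (5.972e24, 7.0e6)]
LOOKUP_TABLE = {case: law_prediction(*case) for case in OBSERVED_CASES}


def table_prediction(central_mass_kg, distance_m):
    """Observation-only route: returns None for any case never recorded."""
    return LOOKUP_TABLE.get((central_mass_kg, distance_m))
```

The table agrees with the law on recorded cases and is silent on a novel one such as `table_prediction(7.342e22, 2.0e6)`. Matching the law everywhere would require an entry for every possible input, and a revised value of `G` would require recomputing each entry separately.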

⁶⁷ This has been done, to some extent. Van Fraassen (1980, 52) discusses a purely observational theory that
is empirically equivalent to Newton's law of gravitation but omits any reference to gravity; instead, this
force is just folded into the equations describing the motions of objects. In his discussion of this case, Van
Fraassen identifies an interesting function of theories. If we discovered that the force of gravity was slightly
different than we currently believe it to be, every part of the observational-level theory would have to be
independently revised, while only a single change would have to be made to Newton's theory which would
automatically generate the other necessary changes. Thus, theories can allow for easier revisions to one's
web of belief with new information.

Given that the observable-level counterpart of Newton's theory would be so
unwieldy, we might reasonably infer that a person who correctly predicts the movement
of a novel satellite did so by use of the theory. This inference would rest on what
we know about human reasoners, in particular. It is consistent to maintain that while it is
in principle possible for some agent to store and reason from the look-up table, it is not in
practice possible for the types of agents that we are. Further, given the types of
observations that would be necessary to construct a look-up table with the generality of
Newton's theory, this is not something we can ever expect to construct on the basis of
observations alone.
Hempel's justification for the first premise of the dilemma (if a theory establishes
definite connections among observable phenomena, then it can be replaced with a law
which directly links observational antecedents to observational consequences) lends
further credence to the view that theories may be psychologically necessary. The first
premise of the dilemma is established by a formal theorem from Craig (1956) which
shows that for any axiomatized theory T, stated in a language that can be partitioned into
observational and theoretical components, there is an axiomatized system, TB, which
"uses only the nontheoretical terms of T and yet is functionally equivalent with T in the
sense of effecting, among the sentences expressible in the nontheoretical vocabulary,
exactly the same deductive connections as T" (Hempel 1965, 213). Hempel notes that the
set of postulates of TB is always infinite. While there are some Ts that will also yield
simpler, finite axiomatizations in the observation language, in general, the resulting
system will be practically unmanageable or, worse, so complex that it is humanly
impossible to conceive of it all at once (ibid., 214). Thus, Hempel's theoretician's
dilemma, taken by itself, should not be interpreted as a direct attack on the psychological
necessity of theories; his view seems to be that, at least in some cases, a theory may be
unnecessary in some senses but be psychologically necessary for creatures like us.
Hempel's suppressed second premise is more plausible as a claim about
justificatory and predictive necessity. The basic idea there is that if an agent arrived at
belief p by way of a theory but there were other possible ways to arrive at p that are
epistemically respectable, then the use of a theory is necessary neither to predict that p
nor to be justified in believing that p. Craig's theorem seems to show that theories are not
necessary in either of these senses[68] without showing that they are psychologically
unnecessary.
By analogy, the first step of the logical problem claims that for any prediction
made on the basis of a theory, there is a possible relation among observables alone that
would have sufficed for that prediction, namely, a direct relation between the observable
input to the theory, O1, and its subsequent prediction, O2. Thus, the first step in the
logical problem shows that theories are predictively unnecessary. However, a theory's
being predictively unnecessary does not entail that it is psychologically unnecessary. By
analogy with the rocket example, it might be that the direct relation between O1 and O2 is
not known from observations alone or that whatever process is required to deduce O2
directly from O1 is too unwieldy for actual agents to use.

[68] Though see Field (1980) for a discussion of whether Craig axiomatizations are "epistemically
respectable" in the relevant sense to confer justification. Field's concern is whether the indispensability of
theoretical entities, particularly numbers, entails that we ought to be ontologically committed to the
existence of those entities. The discussion here is related but somewhat orthogonal; here, the question is
whether the dispensability of a theory entails that it is epistemically unnecessary, leaving open the
possibility that a theory could still be epistemically necessary without committing its users to the existence
of the entities it posits.

From this discussion, we can extract a general strategy for showing that a
particular theory's dispensability does not entail that it is psychologically unnecessary.
First, describe how a theory could be used to arrive at prediction p and concoct an
observational counterpart of the theory that would serve to replace it. Then, determine
whether this observational counterpart is something that an actual agent might plausibly
possess. If the counterpart is either too complex for the agent to store and reason with, or
if it is something that could not have plausibly been constructed from the agent's past
observations alone, then there are good grounds for believing that the theory was
psychologically necessary.
By themselves, then, neither the logical problem nor the theoretician's dilemma
entails that theories are unnecessary for actual agents like humans and animals to
arrive at certain predictions given certain observational histories and cognitive
limitations. As I have argued above, theory-theorists who want to use the logical problem
to deny that animals are theorists need to supplement that argument, and one way they
have done so is to argue that theories are psychologically unnecessary, at least with
respect to the predictions that animals have been observed to make. In fact, I will argue
that theory-theorists have been drawn to a stronger formulation of the theoretician's
dilemma according to which the dispensability of theories does entail that they are
psychologically unnecessary. In order to argue for this claim, I will distinguish between
two versions of the theoretician's dilemma.

6. Two versions of the dilemma
To begin to see this distinction, consider the relationship between generalizations
at the theoretical level and regularities at the observational level on Hempel's "layer
cake" view of theories. Hempel's discussion of the theoretician's dilemma is primarily
concerned with the downward arrow from T-level laws to O-level empirical regularities.
In order to make predictions at the O-level, the T-level must entail O-level empirical
regularities. These O-level empirical regularities then suffice for predictions about
particular O-level matters of fact, predictions which can then be made without recourse to
the higher levels of the theoretical apparatus. Hence, I will call this formulation of the
argument the downward theoretician's dilemma.

[Figure 3.5 - Hempel's layer cake depiction of theories. Levels, from top to bottom: T-language generalizations; O-language empirical regularities; O-language matters of fact. Connecting labels: "Interpretive sentences" and "Inductive confirmation / explanation".]

If this formulation is sound, what does it prove? It shows that the first step of the
logical problem will hold for any instance of theorizing; for any deductive O-level
prediction made via a theory, there is a possible O-level empirical regularity entailed by
that theory that would suffice for the prediction. However, this formulation does not
prove that T was unnecessary in the first place. It merely shows that once a theory has
done its (perhaps necessary) work, we can kick away the theoretical ladder and reason
about its observational consequences alone. In other words, one can hold that the amended
theoretician's dilemma is sound but only under an attenuated interpretation of
"unnecessary," where being unnecessary in this sense does not entail psychological (or
perhaps even predictive) non-necessity. In fact, I think that this is the correct interpretation of
the theoretician's dilemma.
However, it is possible to formulate a version of the theoretician's dilemma that
would suffice to show that theories are epistemically unnecessary in an even stronger
sense. To repeat, the downward version of the theoretician's dilemma states that at any
time, a theory can always be replaced by the empirical generalizations it entails. The
upward version adds that the empirical generalizations that replace the theory are ones
that the user already possesses such that the theory was not instrumental in delivering
those generalizations. In other words, theories are genuinely superfluous because a theory
user must already know the very same empirical regularities that would suffice for all
predictive purposes. The basic idea is that T-level generalizations must be constructed
from known generalizations in the middle level (the upward arrow from the O-level to the
T-level).
Sometimes, it seems that Hempel himself had this upward interpretation of his
dilemma in mind. In his discussions of the progress of science, Hempel often suggests
that the identification of O-level empirical regularities is epistemically and temporally
prior to the formulation of T-level regularities which would explain and entail them. To
wit, consider his characterization of his layer cake view of theories:

"[I]t will be helpful to refer to the familiar rough distinction between two levels
of scientific systematization: the level of empirical generalization, and the level
of theory formation. . . . [T]he former level . . . is characterized by the search for
laws (of universal or statistical form) which will establish connection among
the directly observable aspects of the matter under study. [In] the second level
. . . research is aimed at comprehensive laws, in terms of hypothetical entities,
which will account for the uniformities established on the first level" (1965, 178;
quoted in Lange 2000, 213).

The picture that emerges is that science begins with a set of O-level empirical regularities
which are constructed autonomously, without the input of a theory. The regularities that
are discovered to be robust and projectible then serve as candidates for theoretical
explanation. If this picture is correct, then the theory seems genuinely superfluous. As
Hempel states the problem:

"Why should science resort to the assumption of hypothetical entities when it is
interested in establishing predictive and explanatory connections among
observables? Would it not be sufficient for the purpose, and much less
extravagant at that, to search for a system of general laws mentioning only
observables, and thus expressed in terms of the observational vocabulary alone?"
(1965, p. 179).

Further, given what Hempel has said, scientists would not have to search at all!
There are a few ways to argue that a theory-user must already possess the relevant
empirical generalizations in order to use her theory. First, perhaps in order to know that
one's theory implies some empirical regularity in the downward direction, one must
already know that the regularity exists at the O-level.[69] After all, actual psychological
agents rarely know most of what their beliefs entail, and theories may be particularly
complicated. Therefore, one may need observational-level familiarity with the
regularities that a theory generates before the theory can be used. A related possibility is
that in order to learn a theory, agents must already have learned the relevant empirical
regularities that it entails.

[69] In other words, in order for a theory-user to know that T entails {O1 → O2} and thus to predict that O2 on
the basis of O1 at t2, she must already know that {O1 → O2} at t1.

Take, for example, the subordinate chimp's purported theory of mind. How does a
subordinate chimpanzee with a theoretical concept of "seeing" know that a dominant
chimp who is not separated by a barrier can see the food (O1 → M) and that a dominant
who can see the food will wallop him (M → O2)? It is plausible that these bridge
principles between seeing and its observable consequences have to be learned through
experience, and the most obvious candidate for such experience is an observed
contingency between taking food that is not behind a barrier from a dominant and being
walloped (O1 → O2).
In fact, proponents of the logical problem sometimes do suggest that they have
this stronger formulation of the problem, one akin to the upward theoretician's dilemma,
in mind, arguing that not only are there possible observable regularities that would
suffice for success at food competition or other experimental tasks, but these are observable
regularities that any chimp with a theory of mind must already have before she can wield
her theory to make the correct predictions. For example, with respect to gaze-following
experiments, Penn and Povinelli write:
"Any socially intelligent subject like a chimpanzee must possess a rich database of
[beliefs about observable cues] based on what he has learned about perceptually
similar situations in the past and the conditional dependencies that tend to hold
between these observable cues and other animals' subsequent behaviour. Thus, the
subject may have turned his head in the direction of the other chimp's head simply
because it learned from past experience (or was born with the propensity to learn)
that the given pattern of perceptual cues is a reliable indicator of something worth
looking at in the direction inferred by the other agent's eyes and head. There is no
need for the subject to reason in terms of an [mental state] variable, and positing
[a mental state] variable does no additional explanatory work in the given
situation" (Penn and Povinelli 2007, 734, emphasis added).

The lesson of gaze-following experiments generalizes:

"In almost all experimental procedures to date, purported [mental state] variables
appear to be causally superfluous re-descriptions of the other observable inputs
and representations that are logically required by the experimental design" (ibid.,
735).

A similar suggestion is made by Morgan in his presentation of the logical
problem.[70] Morgan gives the example of a dog who had purportedly demonstrated
mind-reading by deceiving his masters. The dog was trained to walk around a long table on his
hind legs to receive a reward. Before long, his masters noticed that the dog required far
less time to round the table than before and, upon inquiring, observed that he had started
dropping to all fours when out of sight, only returning to his hind legs when in view.
They hypothesized that the dog had reasoned as follows: "I will get a treat if my masters
believe I am on hind legs. When they can't see me, they will still believe that I am on my
hind legs even when I am on all fours. Therefore, since it is easier to walk on all fours, I
will do so when they can't see me."
Morgan responds that we can explain the dog's behavior by positing that in his
experiences performing the trick, the dog would from time to time drop to all fours. On
some of these occasions, he was not rewarded, but on others (namely, those occasions on
which he dropped down on the opposite side of the table), he was still rewarded. Thus,
the habit of doing the latter would be developed through mere association between its
own position relative to the table and reward, with no thought of the mental states of its
masters.

[70] To my knowledge, no commentators to date have noted that Morgan forwarded a nascent version of the
logical problem, very closely analogous to the one currently sweeping the comparative psychological
landscape. This is surprising, given that Morgan's Canon is often discussed.

Crucially, Morgan argues that if the dog were mind-reading, he would still have to
learn all of the same observable contingencies. He would have learned when his masters
could not see him precisely by observing when he would or would not be rewarded upon
completion of his trick. However, once he has learned these contingencies, reasoning
about their mental states would be of no advantage. Morgan writes, "if the action was a
lie, it involved not only the complexity I have hinted at, but in addition thereto the
thought, 'while I am on all fours they suppose I am on two legs,' and this is the factor
which seems to me unnecessary" (Morgan 1894, 371).
To summarize, there are several reasons why theory-theorists might endorse (and
have endorsed) the upward theoretician's dilemma. First, to employ her theory, a
theory-user must have a large store of information about how her theory relates to certain
observable facts. Second, there is a puzzle about how it is possible to learn[71] a theory,
along with its observational consequences, in the first place unless one has already
learned the requisite observable contingencies. In either case, it seems that knowledge
about empirical contingencies already needs to be in place before a theory can be used, in
which case it is unclear why the theory is necessary at all.
[71] For now, I am ignoring the possibility that the theory might be unlearned. One major strategy for
bypassing the difficulty of explaining how a theory could be learned from data is to posit that it is innate
(for a recent review, see Margolis and Laurence 2013). While this is one possible version of the
theory-theory, most theory-theorists believe that intuitive theories are learned from experiences which interact
with more basic (perhaps innate) representational resources (Carey 2009, Gopnik and Wellman 2012).

To answer the upward dilemma, then, the challenge is to specify how and when
the database of information one needs to learn and employ a theory falls short of the
epistemic reach of the theory itself. While the upward theoretician's dilemma may seem
merely a philosophical peculiarity, in fact, theory-theorists have acknowledged that a
central difficulty and obligation of the view is to give an account of how it is possible to
learn new theoretical representations that epistemically go beyond known observable
regularities. As one prominent theory-theorist puts it, the challenge is to answer the
question, "What learning processes can create representational resources with more
expressive power than, or qualitatively different from, their input?" (Carey 2009, 307).

7. Conclusion and Next Steps


I have argued that the structural similarities between the logical problem and the
theoretician's dilemma are a double-edged sword for theory-theorists of human
uniqueness. In order to support the claim that animals do not reason about theoretical
events or properties, its proponents must show that theories are not necessary for even the
most sophisticated behaviors of animals, and the logical problem has been a central tool
in their attempts to prove that point. However, emphasizing the sufficiency of purely
observational-level reasoning threatens to weaken the unique epistemic role of theories.
This is no accident; the same logic underlying the logical problem also underlies a strong
challenge to the utility of theories in general, the theoretician's dilemma.
After developing this central tension at the heart of the theory, I argued that a
closer look at the theoretician's dilemma suggests some avenues for resolving this
tension. In particular, the downward version of the theoretician's dilemma, which is
analogous to the first part of the logical problem, does not prove that theories are
psychologically unnecessary. On the other hand, the stronger, upward formulation of the
dilemma does present a serious challenge to the psychological utility of theories.
Therefore, in order to make the theory-theory tenable, it will be necessary to
respond to the upward formulation of the dilemma. There are several different strategies
for doing so. First, one can argue that the theoretician's dilemma rests on a false
presupposition about the nature or purpose of theories. Second, one can argue that the
upward theoretician's dilemma is unsound by showing why theories are psychologically
necessary, that is, by showing how theories allow their users to extend their knowledge of
observable regularities beyond those that they already possess. Lastly, the theory-theorist
must show how her response to the theoretician's dilemma is compatible with the logical
problem, when it is used as an attack on experiments which have purported to demonstrate
theoretical capacity in animals. These will be the goals of the remaining chapters.

Chapter 4: Rejecting Hempel's Premises: Two Unsatisfactory Accounts

In the previous chapter, I argued that there is a tension at the heart of the
theory-theory of human uniqueness. A central argument that theory-theorists have used to deny
that animals are theorizers, the logical problem, threatens to undermine the claim that
theorizing plays some unique and important epistemic role in human cognition, in part
because it is extremely similar to an argument which purports to show that theories are
superfluous, Hempel's theoretician's dilemma. Having developed the strongest
version of this tension, expressed by the upward formulation of the theoretician's
dilemma, I will attempt to resolve it in the rest of this dissertation.
As I noted in the previous chapter, there are two general ways to rebut the upward
theoretician's dilemma. The first is to show that the dilemma rests on some false
presupposition about the nature or function of theories and that the dilemma does not
impugn the correct account of theories. The second is to accept Hempel's assumptions
but to show that the conclusion that theories are superfluous does not follow from those
premises.
In this chapter, I will consider two responses of the first kind. Ultimately, I will
argue that neither provides an account of the psychological necessity of theorizing
adequate to the purposes of the theory-theory of human uniqueness. However, these two
attempts each contain kernels of truth that will help to set the stage for the more
promising accounts that I will consider in the next two chapters.

1. Shifting from a Deductive to an Inductive Account of Theories
Hempel's own solution to the dilemma is to reject one of his initial assumptions
about the nature of theories. He argues that the dilemma is only really a dilemma when
we assume that the function of theories is to impose deductive relationships (entailments)
among propositions stated in the observation language. Hempel argues that theories can
also serve to establish inductive connections among observable statements, and in this
role, they may not be replaceable by any of their observable-level implications.
Unfortunately, Hempel's discussion of this role of theories is quite brief and schematic,
and as I will argue, is vulnerable to revised, inductive versions of the dilemma.
Theories inductively systematize a set of observational regularities when they do
not make observation statements logically entail other observational statements but make
them probabilistically relevant to one another.[72] The basic sketch of one type of inductive
systematization is as follows. A theory can make observations inductively relevant to one
another without making them entail one another by positing the existence of an
unobserved common cause of them. For Hempel, this common cause suffices for its
observable effects but is not necessary for them,[73] so the presence of those effects raises
the probability that the common cause is present but does not entail it. Then, if this
common cause is hypothesized to have other observable effects (either directly or via its
relationships with other hypothesized causes), it will establish inductive connections
among its observable effects.

[72] For example, a medical theory might predict that Johnny will show measles symptoms (O2) given that his
sister showed measles symptoms (O1); the theory does not show that O1 entails O2 but rather that the
probability of O2 given O1 is, say, .92 (Hempel 1965, 176).

[73] Perhaps motivated by the Deductive-Nomological model of explanation and prediction that he defends
elsewhere (Hempel 1942, Hempel 1965, Hempel and Oppenheim 1948), Hempel retains a deductive
component of theories in his response to the dilemma. However, the requirement that postulated theoretical
causes entail their effects is unduly strong. In fact, the approach that I favor, discussed in the next chapter,
requires that causes do not entail their effects.

Hempel illustrates how theories play this inductive role with a hypothetical theory
of white phosphorus (Hempel 1965, 214). The theory contains two theoretical terms:

Px = x is white phosphorus
Ix = x has an ignition temperature of 30°C

and one law stated entirely in theoretical terms:

(x)(Px → Ix)

It also contains bridge principles stating the deductive observational consequences of the
theoretical terms. First, it contains the bridge principle that if something has an ignition
temperature of 30°C, it will burst into flames at 30°C (Fx):

(x)(Ix → Fx)

Second, it contains generalizations about the observable properties of white phosphorus:

(x)(Px → Gx)
(x)(Px → Sx)
(x)(Px → Tx)

where Gx = x has a garlic odor, Sx = x results in skin burns, and Tx = x is soluble in
turpentine.
Because these bridge principles give only necessary conditions for something's
being white phosphorus, they do not exhaust the observational-level content of the
T-terms or count as reductions of T-terms in the observational language. The inference from
observable properties to the common cause of white phosphorus is inductive rather than
deductive; observing that a material has these observable properties raises the probability
of its being white phosphorus but does not suffice for it. As a result, the postulation of P
makes these observable effects merely inductively relevant to each other; learning that a
substance, a, is garlic odored (Ga) raises the probability that it will result in skin burns
(Sa) and dissolve in turpentine (Ta), but it does not deductively entail that it will. The
inductive systematization proceeds as follows.
Suppose you have observed that substance a exhibits properties Ga, Sa, and Ta,
and you are therefore very confident that the material is white phosphorus. The other
postulates of the theory entail that any instance of white phosphorus also has an ignition
temperature of 30°C and therefore will burst into flames at that temperature, so the theory
enables you to predict that the substance is also very likely to burst into flames at 30°C.
According to your theory, your observations are evidence which raises the probability that
a will burst into flames at 30°C; Pr(Fa | Ga&Sa&Ta) > Pr(Fa).
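
A toy calculation shows how positing the common cause P yields this inequality. The sketch below is illustrative only: the prior for P and the base rates of G, S, T, and F among substances that are not white phosphorus are made-up assumptions (as is the assumption that those properties are independent when P is absent), and the function name is hypothetical. The only element taken from the theory is that P guarantees each observable property.

```python
# Illustrative sketch: every number below is a made-up assumption.
PRIOR_P = 0.01  # Pr(Pa): prior that the substance is white phosphorus

# Assumed chances of each observable property when the substance is NOT white
# phosphorus; assumed independent of one another in that case.
BASE = {"G": 0.05, "S": 0.05, "T": 0.10, "F": 0.02}


def pr_all(props):
    """Pr(all properties in `props` hold), marginalizing over P.

    If P holds, the toy theory's law and bridge principles make every
    property certain, so that branch contributes PRIOR_P * 1.
    """
    without_p = 1.0
    for name in props:
        without_p *= BASE[name]
    return PRIOR_P * 1.0 + (1.0 - PRIOR_P) * without_p


pr_f = pr_all(["F"])                                          # Pr(Fa)
pr_f_given_gst = pr_all(["F", "G", "S", "T"]) / pr_all(["G", "S", "T"])
print("Pr(Fa) =", pr_f, " Pr(Fa | Ga&Sa&Ta) =", pr_f_given_gst)
```

On these assumed numbers the observations raise the probability of Fa substantially without entailing it, which is the inductive (rather than deductive) connection at issue.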
Unlike the deductive case, the theory does not entail an observation-language
entailment that would suffice for that prediction, i.e. (x)(Gx&Sx&Tx → Fx). Craig's
Theorem, which plays a key role in the deductive statement of the dilemma, guarantees
that there is a possible O-language counterpart of T that will contain all of the deductive
connections among observables that T entails, but it does not guarantee that there is a
possible O-language counterpart of T that will contain all of the inductive connections
among observables that T entails. Therefore, there is no guarantee that the Craig
equivalent of the theory can replace it without a loss in predictive power.[74] Psillos (1999,
26) approvingly summarizes this solution to the dilemma as follows:

"A hypothesis H entails observational consequences O1, O2, ..., On. When these
obtain, although we cannot deductively infer H, we can inductively conclude that
H holds. Suppose, further, that H together with other theoretical hypotheses entail
an extra testable prediction On+1. This new prediction could not have been issued
by the observational consequences O1, O2, ..., On on their own. Instead, its
derivation rests essentially on accepting the inductively inferred theoretical
hypothesis H. So, H is indispensable in establishing this inductive connection
between O1, O2, ..., On and On+1 in that Craig(H) could not possibly establish
such connection."

[74] Hempel notes that this inductive connection will be contained in the Ramsey-sentence equivalent of the
theory. However, Hempel does not think that the Ramsey-sentence should be seen as an observable-level
replacement of the theory, arguing that "The Ramsey-sentence associated with an interpreted theory T
avoids reference to hypothetical entities only in letter, replacing Latin constants by Greek variables,
rather than in spirit. . . . Hence, Ramsey-sentences provide no satisfactory way of avoiding theoretical
concepts" (1965, 216).

Though this move is a step in the right direction, answering the theoretician's
dilemma is not so easy as moving from a deductive to an inductive conception of
theories. While this response evades the original theoretician's dilemma, it is possible to
construct new versions of the argument regarding the inductive consequences of T. The
downward inductive theoretician's dilemma states that the role of a theory T is to
establish inductive connections (that is, probabilistic dependencies) among observables.
In order for T to establish a probabilistic dependence[75] between O1 and O2, the argument
goes, the theory must entail a statement of the probabilistic dependence between O1 and
O2. This claim about the probabilistic relationship among the observables could then
replace the theory with no loss in predictive ability.
To illustrate, consider Hempel's toy theory of white phosphorus. In order to
perform the inductive function that Hempel identifies, the theory must establish that
Pr(Fx|Gx&Sx&Tx) > Pr(Fx).[76] For more precise predictions about the particular
probabilistic dependencies among them (say, that Pr(Fx|Gx&Sx&Tx) = .9), the
theory must entail this particular probabilistic relationship. However, once the theory has
yielded this statement, it is possible to replace the chain of theoretical statements with the
simple statement of the inductive relationship between observables alone. We can restate
the downward theoretician's dilemma with respect to this case as follows:

1. If the terms and general principles of the theory of white phosphorus serve their
purpose, i.e. if they establish inductive connections among observable features Gx,
Sx, Tx, and Fx, then they imply some set of statements which establish that
Pr(Fx|Gx&Sx&Tx) > Pr(Fx). If they do, they can be dispensed with since any chain
of laws and interpretive statements establishing such a connection should then be
replaceable with a statement, Pr(Fx|Gx&Sx&Tx) > Pr(Fx), which directly links
observational antecedents to observational consequences.
2. If the terms and general principles of the theory can be dispensed with, they are
unnecessary.
3. If the terms and general principles of the theory do not serve their purpose, i.e. if they
do not imply some statement or set of statements which entail that Pr(Fx|Gx&Sx&Tx)
> Pr(Fx), they are unnecessary.
4. The terms and general principles of the theory either serve their purpose or they
don't.
C: Therefore, the terms and general principles of the theory are unnecessary.

[75] Here, I am using "probabilistic dependence" in the general sense to refer to both positive and negative
correlations. Additionally, I intend "T establishes a probabilistic dependence" to cover both the weaker
claim that T establishes that there exists some dependence between O1 and O2 and the stronger claim that T
establishes that a particular dependence exists between the two.

[76] The theory entails that (x)(Px → Fx) and (x)(Px → Gx). However, this conjunction does not guarantee
that Pr(Fx|Gx) > Pr(Fx) is true. Even if Ga confirms Pa, it does not necessarily confirm the deductive
consequences of Pa (e.g. Fa). Such reasoning would be an instance of the Special Consequence Condition
(SCC), which is invalid. For example, the observation that a substance a is garlic-odored (Ga) confirms
both the hypothesis that it is white phosphorus (Pa) and that it is garlic (Ha). Suppose that (x)(Hx → ~Fx).
Then, if the SCC were true, Ga would confirm both Fa and ~Fa. Depending on how strongly Ga confirms
Pa and Ha and the prior probabilities of those hypotheses, Ga may either confirm or disconfirm Fa.

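
Premise 1 takes for granted that the theory fixes whether Pr(Fx|Gx&Sx&Tx) > Pr(Fx). The note on the Special Consequence Condition explains why the entailments from P alone do not settle this: whether the garlic odor confirms or disconfirms flammability depends on the priors of the rival hypotheses. The following sketch works through that garlic example with G as the only evidence. All numbers are invented for illustration, the function name is hypothetical, and the three hypotheses (white phosphorus, garlic, anything else) are assumed to be mutually exclusive and exhaustive.

```python
# Illustrative sketch of the SCC counterexample; all numbers are assumptions.
def update_on_garlic_odor(prior_p, prior_h, g_other=0.01, f_other=0.05):
    """Compare prior and posterior probabilities after observing Ga.

    P (white phosphorus) entails G and F; H (garlic) entails G and not-F.
    For any other substance, G and F occur independently with the assumed
    rates g_other and f_other.
    """
    prior_other = 1.0 - prior_p - prior_h
    pr_g = prior_p + prior_h + prior_other * g_other
    pr_f = prior_p + prior_other * f_other
    pr_f_and_g = prior_p + prior_other * g_other * f_other
    return {
        "P": prior_p, "P|G": prior_p / pr_g,
        "H": prior_h, "H|G": prior_h / pr_g,
        "F": pr_f, "F|G": pr_f_and_g / pr_g,
    }


# Case 1: white phosphorus is the more common source of a garlic odor.
phosphorus_common = update_on_garlic_odor(prior_p=0.10, prior_h=0.01)
# Case 2: garlic is far more common than white phosphorus.
garlic_common = update_on_garlic_odor(prior_p=0.001, prior_h=0.30)

for label, case in (("case 1", phosphorus_common), ("case 2", garlic_common)):
    print(label, "Ga confirms Fa:", case["F|G"] > case["F"])
```

In both cases Ga raises the probability of Pa and of Ha, yet it raises the probability of Fa only in the first case. On these assumptions, then, the direction of the dependence between G and F is not settled by the entailments alone.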
As with the original theoretician's dilemma, the second premise of this argument
is on shaky ground, for it is possible that the new statement of the probabilistic
dependencies among observables would not have been reached without the use of a
theory. However, it is also possible to construct a version of the upward theoretician's
dilemma for the inductive case that would show that theories are unnecessary in an even
stronger sense.
A sketch of this argument with respect to the theory of white phosphorous is as
follows. Suppose one has observed a correlation among substances that are garlic-odored,
soluble in turpentine, and cause skin burns, and one wants to predict whether substances
exhibiting these properties are also more likely to burst into flames at 30°C.

In order to employ the theory to make predictions, one must be able to identify
likely instances of white phosphorous. Thus, at the observable-level, you must already
possess empirical generalizations that pick out instances of the theoretical generalization.
For instance, you may have observed that the properties of being garlic-odored, soluble in
turpentine, and causing skin burns are correlated, i.e. Pr(Gx&Tx&Sx) >
Pr(Gx)Pr(Tx)Pr(Sx), and you infer that materials that have these properties are likely to
be of a stable chemical type, which you posit to be white phosphorous.
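
The correlation condition just stated can be checked mechanically against a tally of observations. The following Python sketch is only illustrative: the ten observations and the helper name `frequency` are hypothetical, and relative frequencies stand in for the probabilities.

```python
# Hypothetical observations of ten substances. Each tuple records whether the
# substance was garlic-odored (G), soluble in turpentine (T), and caused skin
# burns (S). These values are invented for illustration only.
samples = [(1, 1, 1)] * 4 + [(0, 0, 0)] * 5 + [(1, 0, 0)]


def frequency(event, data):
    """Relative frequency of the observations that satisfy `event`."""
    return sum(1 for row in data if event(row)) / len(data)


pr_g = frequency(lambda row: row[0] == 1, samples)
pr_t = frequency(lambda row: row[1] == 1, samples)
pr_s = frequency(lambda row: row[2] == 1, samples)
pr_joint = frequency(lambda row: row == (1, 1, 1), samples)

# The three properties count as correlated, in the sense used above, when the
# joint frequency exceeds the product of the marginal frequencies.
correlated = pr_joint > pr_g * pr_t * pr_s
print(correlated)  # True
```

On this made-up tally the joint frequency (0.4) far exceeds the product of the marginals (0.08), which is the kind of pattern that would prompt the positing of a stable chemical type.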
According to Hempel, your theory states that white phosphorous also has the
property of bursting into flames at 30°C, so it establishes a new inductive connection
between (Gx&Tx&Sx) and Fx. However, the upward theoretician's dilemma demands an
answer to how one could have learned the bridge principle that white phosphorous bursts
into flames at 30°C, unless one had already observed that materials that have the
characteristic observable markers of white phosphorous (G, T, and S) also tend to have F.
In other words, in order to learn the theoretical regularity, (x)(Px → Fx), that makes it the
case that Pr(Fa|Ga&Ta&Sa) > Pr(Fa), one must already have observed a correlation
among those properties. However, the argument goes, this prior observed regularity
suffices for all of the predictions of the theory, and therefore, the theory is unnecessary [77].
The good news for the theory-theory of human cognition is that this upward
inductive formulation of the theoretician's dilemma is not generally sound; it is not the
case that in order for a subject to possess, learn, and/or utilize a theory that generates for

[77] Positing a common cause does make additional predictions beyond the mere correlation between its
effects, though they are not predictions about matters of fact in Hempel's sense. For instance, it predicts
that an intervention on one effect variable that does not change the chemical composition of the substance
will not yield a change in the other effect variables. For instance, spraying Febreze on white phosphorous
to eliminate its garlic odor would probably not change the probability that it will cause skin burns. I will
return to a discussion of these higher-order predictions in Chapter 6.

the subject the prediction that observable states O1 and O2 will be correlated, she must
already have observational-level knowledge that O1 and O2 are correlated. The challenge,
in what follows, is to specify how theoretical-level generalizations can be learned and
utilized such that these generalizations make predictions that go beyond those that have
been previously given at the observational-level. In other words, Hempel still needs to
provide an account of how the postulation of a common cause can establish new
inductive connections that are not known on the basis of observation alone.
Thus, while I think that Hempel's purported solution to the dilemma is on the
right track (that the special epistemic import of theories lies in the inductive connections
they establish), merely gesturing at this function does not resolve the difficulties raised
by the theoretician's dilemma.

2. Theories as simplifying, AIC


2.1 Intervening variables simplify mental models
Another key assumption of the theoretician's dilemma is that the role of theories
is to generate new predictions. If the theoretician's dilemma is sound, then perhaps
theoretical terms can help to simplify the systematization of observational predictions,
but this is their only function (Earman 1978, 197). According to the next response to
the theoretician's dilemma that I will consider, even if theories merely simplify the
systematization of observational predictions, they may be epistemically necessary in virtue
of this simplifying role. Work in the area of model selection has shown that simpler
models are more predictively accurate; therefore, models containing theoretical terms
may be more predictively accurate than their observable counterparts.

For this argument to be plausible, it must be the case that theories do simplify
systematizations (here, the mental models that subjects use for prediction) of observables,
that simplified systematizations are more predictively accurate, that this is an important
epistemic role, and that this simplification is unique to theories. Ultimately, I do not think
that this account is adequate for the purposes of the theory-theory of human uniqueness
because it fails on the last two counts. However, an examination of why it fails will serve
to bring out a few key features of the relation between theories and observable
generalizations that will be crucial for the more positive suggestions I will make in the
following chapters.
But first, why should we think that theory use will lead to more accurate
predictions? Recent work in model selection theory has shown that intervening
theoretical variables can reduce the number of adjustable parameters in a model and that
models with fewer adjustable parameters, all else being (roughly) equal, are expected to
be more predictively accurate when applied to new data. The thought, developed by
Whiten (1995, 2013), Sober (2009), and Clatterbuck (forthcoming), is that this
simplifying function of theories may be the key to explaining the unique advantage of
theorizing.
The simplifying function of theories was influentially applied to the primate
mindreading case by Whiten (1995), who showed that models of behavior that contain
intervening variables corresponding to unobservable mental states can represent diverse
behavioral data using fewer parameters than corresponding behavior-reading models. On
Whiten's view, a mind-reading model is one in which variables denoting observable
behavioral stimuli are causally linked to a mental state variable, which in turn is causally

linked to variables denoting expected behavioral response. The mental state variable is
not directly observable, and is thus an intervening variable the value of which can be
affected by any or all of the input variables, and having changed, can itself affect each of
the outputs (ibid., 284).
Whiten offers the following example in which the introduction of an intervening
variable reduces the number of parameters in a hypothetical model of rat behavior:

Figure 4.1 Two hypothetical models of rat behavior from Whiten (1995). The bottom introduces an
intervening variable denoting an unobservable, thirst.

When multiple input variables are each causally linked to multiple output
variables, a non-intervening variable model (at top) requires causal arrows from each
input variable to each output variable. The intervening variable model (at bottom) needs
to use fewer arrows to depict the effects of the input variables on the output variables.

We can conceive of the above as possible mental models of behavior that a
cognitive agent may use to predict behavior, where each variable corresponds to the
agent's belief about the presence or absence of some feature and each arrow represents a
known contingency between two variables [78]. Whiten argues that models which contain
intervening variables will be more economical of representational resources because
the organism need not represent the links between each behavioral cue and each
response; instead, they can infer the likely states of any of the response variables from the
state of the intervening variable alone (ibid., 284).

Figure 4.2: A hypothetical mental model used by primate A to predict the behavior of primate B,
where A's beliefs about B's behavioral cues (left) are unified by its belief about B's mental state
(center) and that belief is used to predict B's possible behaviors (right) (ibid., 285).

However, Whiten notes that a puzzle arises from his suggestion that mind-reading models
are more economical of representational resources, for if mentalism conceived in this
way owes its existence to cognitive economy, it may appear a paradox if our current
working hypothesis, that it is refined only in particularly clever species like apes, is

[78] I am agnostic about whether this is an explicit belief of the subject or merely a disposition to infer one
from the other.

confirmed (ibid., 287). The best (and perhaps only [79]) candidates among non-humans for
mind-reading capabilities are the large-brained apes, suggesting that mindreading takes
great cognitive resources. Why is the simplicity of mind-reading models evolutionarily
relevant if it is more costly to represent those simpler models?
In his response to the paradox, Whiten also suggests that simpler mental models
might be more useful even if they are cognitively demanding:
    The capacity to recognize in the first place the complex pattern which is covered
    by an intervening variable may require considerable neural resources, of a level
    we see only in apes. It is once this recognition has taken place that application in
    behavioral analysis can become efficient on any one occasion, facilitating fast and
    sophisticated tactics to be deployed in, for example, what has been described as
    political maneuvering in chimpanzees (ibid., 287).
Note that this response explicitly accepts a crucial premise of the upward theoretician's
dilemma; Whiten states that a subject must first possess the relevant empirical regularity
before it can be encoded by an intervening variable, and this is what he hypothesizes to
be so cognitively demanding. Thus, the intervening variable only allows limited cognitive
agents to more easily reason about an observable regularity that they have already
learned, but it does not extend their predictive capacities beyond the empirical
regularities they already possessed.
I agree that this is an interesting and adaptive feature of simpler models. Whiten is
concerned with the evolution of complex mind-reading in an arms race within
competitive social environments, and fast deployment of a previously learned regularity
may be highly adaptive here. However, I doubt that this function is sufficiently important
to play the vast role that the theory-theory of human cognition attributes to theories.

[79] There has also been considerable work done concerning mind-reading in corvids. See Clayton et al.
(2006) for a review, and Bugnyar and Heinrich (2005) for an experiment testing the caching behavior of
ravens which is structurally very similar to Hare's experiments with chimps. See Hare and Tomasello
(2005) for a discussion of related work with domestic dogs.

According to that view, the use of theories unlocked novel behavioral capacities
(language, flexible tool use, etc.) far beyond those of other animals. It is hard to see how
the ability to do the same things as non-theorizers but faster would explain these novel
capacities.
However, Whiten's own suggested solution does not exhaust the possible
advantages of simpler mental models. We can begin to resolve his paradox by
distinguishing between parsimony in cognitive resources demanded by a model (which is
perhaps proportional to some measure of the variables it contains) and parsimony in the
number of adjustable parameters that the model contains. Sober (2009) places Whiten's
insight into a model selection framework in which the latter measure of parsimony is
epistemically relevant, in the stronger sense of influencing the accuracy of predictions
made from the model.

2.2 Intervening variables reduce the number of parameters


To illustrate how intervening variables can reduce the parameters in a model,
Sober describes the following MRH model of one version of Hare's experiment discussed
in Chapter 3, where S is the subordinate, D the dominant, and x the food item placed in
the middle of the room (Sober 2009, 252):
(MRH)  Pr(S takes x | S believes that D did not see x) = p
       Pr(S takes x | S believes that D saw x) = q

The observed frequencies with which the chimp takes the food in each case are used to
estimate the values of p and q, thus finding the best-fitting MRH model. A BRH model
can also be fitted to the data:

(BRH)  Pr(S takes x | S believes that an opaque barrier is between D and x) = a
       Pr(S takes x | S believes that there is not an opaque barrier between D and x) = b

So far, both models contain the same number of parameters. However, Sober
argues that when we move to models of multiple versions of Hare's experiments, the
number of parameters needed to fit the data under the BRH model explodes, while the
MRH still requires just two parameters. Consider the following models of the
subordinate's response across the trials in which a dominant was behind an opaque
barrier, trials in which the dominant was behind a transparent barrier, and trials in which
a dominant had his back turned to the subordinate and the food. According to Sober
(2009), the multiple experiments, taken together, can be modeled as follows:
(MRH)  Pr(S takes x | S believes that D did not see x) = p
       Pr(S takes x | S believes that D saw x) = q

(BRH)  Pr(S takes x | S believes that D's back is turned to S and x) = a
       Pr(S takes x | S believes that D is facing both S and x) = b
       Pr(S takes x | S believes that an opaque barrier is between D and x) = c
       Pr(S takes x | S believes that no opaque barrier is between D and x) = d
       Pr(S takes x | S believes that a transparent barrier is between D and x) = e
       Pr(S takes x | S believes that no transparent barrier is between D and x) = f

Povinelli et al. dispute that the MRH could be a simpler theory than the BRH. To
see this, we must change focus a bit. While Sober's discussion is given solely for models
that we construct to predict chimpanzee behavior, it is possible to apply the very same
logic to compare possible models that chimpanzees themselves form to predict the
behavior of others, as does Whiten [80]. Povinelli et al. argue that a mindreading chimp's
mental model could not be more parsimonious than a behavior-reading chimp's. One of
their arguments for this claim can be dispatched rather quickly. Povinelli and Vonk
argue:

[80] In what follows, I will consider these latter models exclusively.

    A hypothetical chimpanzee subject, endowed with a full-blown, human-like
    theory of mind, would still need the ability to detect every behavioral category
    that is relevant to a proper theory of mind inference... Thus, possession of a
    theory of mind does not somehow relieve the burden of representing the massive
    nuances of behavior of the statistical invariances that sort them into more and less
    related groups. In either event, these behavioral abstractions must be
    represented... Indeed, if our analysis is correct, there is no sense in which a
    system that makes inferences about behavioral concepts alone provides a less
    parsimonious account of behavior than a system that must make all of those same
    inferences plus generate inferences about mental states (Povinelli and Vonk
    2004, 9-10).
In effect, their argument is that the MRH model would require all of the parameters of the
BRH model, plus the parameters for seeing and not seeing. For example, consider a
BRH* and MRH* mental model that the subordinate chimp may use to predict the
dominant's behavior in Hare's experiments.
(BRH*)  Pr(D will wallop me | D's back is turned to S and x) = a
        Pr(D will not wallop me | D's back is turned to S and x) = b
        Pr(D will wallop me | D is facing both S and x) = c
        Pr(D will not wallop me | D is facing both S and x) = d
        Pr(D will wallop me | there is an opaque barrier between D and x) = e
        Pr(D will not wallop me | there is an opaque barrier between D and x) = f

(MRH*)  Pr(D will wallop me | D did not see x) = p
        Pr(D will not wallop me | D did see x) = q
        Pr(D did not see x | D's back is turned to S and x) = r
        Pr(D did see x | D's back is turned to S and x) = s
        Pr(D did not see x | D is facing both S and x) = t
        Pr(D did see x | D is facing both S and x) = u
        Pr(D did not see x | there is an opaque barrier between D and x) = v
        Pr(D did see x | there is an opaque barrier between D and x) = w

and so on for each of the other variations of the experiments.


However, it is not true that the introduction of an intervening variable always
requires extra (or the same number of) parameters; this will fail to be true when the
intervening variable links multiple inputs with multiple outputs [81]. For example, suppose
that a subject in Hare's experiments used its mental model to predict two behaviors of the

[81] More specifically, the number of parameters is reduced any time two inputs are linked to more than two
outputs or vice versa.

dominant (not just whether it would be walloped). In that case, the intervening variable
does serve to simplify the MRH model, as we can see by focusing on graphical
representations of the two models [82]:
[Figure: two graphs over the input variables +/- opaque barrier, +/- back turned, and +/- transparent barrier and the output variables +/- D wallops me and +/- D takes food. In the BRH* graph each input is linked directly to each output; in the MRH* graph the inputs are linked to the outputs through the intervening variable +/- D sees x.]

Figure 4.3 - BRH* and MRH* models that the subordinate chimp may use to predict the dominant's
behavior in three versions of Hare's experiments. The MRH* model contains fewer parameters (5)
than the BRH* (6).

Once we focus on chimps' mental models, we can see how their attribution of a mental
state to the dominant can serve to reduce the number of parameters in those models.
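
The counting behind this comparison can be sketched in a few lines of Python. This is a minimal illustration that assumes, as the graphical models do, one adjustable parameter per arrow; the function names are hypothetical and are not drawn from Whiten or Sober.

```python
def direct_params(n_inputs: int, n_outputs: int) -> int:
    """Arrows needed when every input is linked directly to every output."""
    return n_inputs * n_outputs


def intervening_params(n_inputs: int, n_outputs: int) -> int:
    """Arrows needed when every input and output is linked to one
    intervening variable."""
    return n_inputs + n_outputs


# Three behavioral cues and two predicted behaviors, as in Figure 4.3.
brh_star = direct_params(3, 2)
mrh_star = intervening_params(3, 2)
print(brh_star, mrh_star)  # 6 5
```

With a single predicted behavior, `intervening_params(3, 1)` exceeds `direct_params(3, 1)`, which is the situation Povinelli and Vonk's objection describes; the saving appears only once several inputs are linked to several outputs.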

2.3 Simpler models are more predictively accurate


The logical problem turned on the claim that there are BRH and MRH models that
fit the current experimental data equally well, and thus, the data does not arbitrate
between them (Povinelli and Vonk 2004). Here, I will grant that the best-fitting BRH and
MRH models will fit the data (roughly) equally well [83]. However, according to the model
selection framework used by Sober, fit to data is not the only epistemic virtue of models.
To compare the BRH and MRH models, Sober applies a model selection criterion
from Akaike. A model's Akaike Information Criterion (AIC) score is an unbiased

[82] For ease of illustration, I have consolidated the variables into binary variables; for instance, the "D sees
x" and "D does not see x" variables are consolidated into "+/- D sees x."
[83] For a more thorough discussion of some problems that arise in comparing the likelihoods of the two
models, see Clatterbuck (forthcoming).

estimate of its predictive accuracy [84]. AIC takes into consideration both a model's fit to
data and its number of adjustable parameters:

AIC score of model M = log[Pr(data | L(M))] - k


Thus, models that fit the data roughly equally well, measured by Pr[data | L(M)], can
differ in their AIC scores, with more complex models (measured by the number of
adjustable parameters, k) suffering lower expected predictive accuracy.
We can get an intuitive grasp on Akaike's result by considering one of the
challenges of modeling, which is to strike the right balance between underfitting and
overfitting the data (Bozdogan 1987). A good model should pick up on true regularities
in the data (the signal) that can be projected to new instances without picking up on
spurious regularities (the noise) that will not. Models with more parameters allow a closer
fit but they also run the risk of overfitting to noise in the data. Therefore, simpler models
that fit the data roughly as well will tend to be more predictively accurate when applied to
new observations. A model's AIC score balances these two desiderata (fit to data and
number of parameters) to give an estimate of predictive accuracy (Forster and Sober
1994, Forster 2000, Hitchcock and Sober 2004).
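
To make the trade-off concrete, the following Python sketch scores a two-parameter MRH-style model and a six-parameter BRH-style model on the same data, using the form of the score given above (log-likelihood of the best-fitting member minus k). It is an illustration under stated assumptions rather than a reconstruction of Sober's analysis: the counts are invented, each condition is treated as a set of independent trials, L(M) is read as the best-fitting member of the family, and the function names are hypothetical.

```python
import math


def max_log_likelihood(groups):
    """Log-likelihood of the best-fitting model with one rate per group.

    Each group is a list of (takes, trials) pairs that share one adjustable
    parameter; its best-fitting value is the pooled observed frequency.
    Assumes every pooled frequency lies strictly between 0 and 1.
    """
    total = 0.0
    for group in groups:
        takes = sum(t for t, _ in group)
        trials = sum(n for _, n in group)
        rate = takes / trials
        total += takes * math.log(rate) + (trials - takes) * math.log(1 - rate)
    return total


def aic_score(groups):
    """Score in the form used above: best log-likelihood minus k."""
    return max_log_likelihood(groups) - len(groups)


# Hypothetical (takes, trials) counts for six experimental conditions.
unseen = [(16, 20), (17, 20), (15, 20)]  # conditions in which D could not see x
seen = [(5, 20), (4, 20), (6, 20)]       # conditions in which D could see x

mrh = [unseen, seen]                # k = 2: one rate per attributed mental state
brh = [[c] for c in unseen + seen]  # k = 6: one rate per observable condition

print(max_log_likelihood(mrh), max_log_likelihood(brh))  # BRH fits slightly better
print(aic_score(mrh), aic_score(brh))                    # MRH scores higher
```

On these invented counts the six-parameter model fits the old data marginally better, as a more flexible model always can, but the penalty for its four extra parameters outweighs that gain.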
Sober argues that AIC makes sense of why an advantage in parsimony matters.
More parsimonious models are expected to be more predictively accurate, and therefore,
parsimony provides a (defeasible) reason to endorse mindreading models of chimpanzee
behavior. Elsewhere (Clatterbuck forthcoming), I have argued that Sober's argument, as
a claim about the models we construct of chimp behavior, is unduly instrumentalist and
does not give us reason to favor the MRH. However, when we shift our focus to chimp

[84] By predictive accuracy of a model M we mean how well on average M will do when it is fitted to old
data and the fitted model is then used to predict new data (Sober 2008, 84).

models of chimp behavior (that is, when we focus on the models that chimps have in
their heads to represent and predict the behavior of other chimps), Akaike's model
selection framework yields a hypothesis about the adaptive advantage of mindreading.
Just as introducing mental state variables to our models of chimp behavior
reduces the number of parameters we need to model their behavior, the introduction of
mental state variables to chimp models of chimp behavior will reduce the number of
parameters they need to model behavior. Hence, mindreading chimps would make
more accurate predictions than their behavior reading peers.
The argument for predictive accuracy as a unique adaptive function of theories
would go as follows. Intervening theoretical variables simplify mental models by
unifying behavioral data under fewer parameters while still fitting the data roughly as
well as models which contain separate associations between observables alone. An agent
who makes predictions on the basis of a large set of learned associations among
observable cues will overfit to spurious associations in the data. Therefore, the capacity
to reinterpret observational regularities in terms of intervening theoretical variables will
result in greater expected predictive accuracy when the agent applies its mental models to
make predictions in new circumstances. Because agents that make accurate predictions in
novel situations will, all else being equal, be more fit than agents who are less accurate,
the capacity to reason about intervening theoretical variables may be adaptive.
There is an interesting parallel between AIC evaluation and the evolution of
different types of mental models that is worth noting. AIC is used to compare families of
models, rather than particular fitted models within those families. For instance, we can

consider the following BRH and MRH models which have not yet been fitted to
behavioral data.

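[Figure: two graphs over the cue variables +/- C1, +/- C2, +/- C3 and the behavior variables +/- B1, +/- B2, with arrows labeled by adjustable parameters. In the BRH family the cues are linked directly to the behaviors; in the MRH family they are linked through the intervening variable +/- M.]

Figure 4.4 BRH and MRH families of models.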

The AIC score of the BRH or MRH family of models takes into account the
likelihood of the best-fitting model within each family and the number of adjustable
parameters specified by the model family. Thus, if the best-fitting BRH and MRH models
fit the data equally well, then it is the family of MRH models that is expected to be more
predictively accurate when applied to new data, not a particular fitted model from that
family. This feature of AIC scoring lends itself well to a biological interpretation.
According to the speculative hypothesis under consideration, the cognitive trait that will
evolve is not a particular, fitted model; in most cases, an agent will have to learn that
from the particular observable cues in its environment. Instead, the trait that will evolve is
a general type of cognitive model. Interestingly, model selection criteria demonstrate how
the propensity to form mental models from the MRH family may be adaptive even if
particular beliefs about the variables in the model have to be learned individually.

2.4 Not the exclusive domain of theories
While the generality of the AIC result shows how a general type of mental model
could be adaptive, that same generality undermines its relevance in establishing the
unique adaptive role of theoretical intervening variables in particular. Note that the MRH
model is more parsimonious in virtue of its containing an intervening variable, not in
virtue of the content of that variable.
What is crucial is that the variable serves as an informational bottleneck which
collapses information about disparate observable inputs into a single variable, which is
then linked to multiple observable outputs (Sober 1998a, 475). From the point of view of
the cognitive system, the intervening variable screens off information about its inputs [85].
While the AIC scoring result is quite general, relating parameter number to predictive
accuracy for many different types of model structures, the concept of informational
bottlenecks allows an intuitive grasp on how intervening variable models prevent
overfitting. They allow a system to forget some information about input-output pairs; this
is adaptive because some of this information will be noise.
The problem is that any intervening variable which imposes such a bottleneck
would serve to simplify the mental model, regardless of whether it corresponded to
anything theoretical. The question, then, is whether for any theoretical intervening
variable, there will always be a candidate non-theoretical intervening variable that will
serve the same syntactic role. Trivially, the answer is yes. For example, in Figure 4.4, we
could replace M with the binary variable +/-{C1 or C2 or C3} and achieve the same
simplification. More generally, we can construct an intervening variable that is a mere

[85] The intervening variable (I) might not screen off the input variables in the strict, logical sense. The
variable can serve as a bottleneck even if information is lost from the inputs, in which case it may be that
Pr(output_j | input_i) ≠ Pr(output_j | input_i & I).

function of its observable inputs; crucially, this variable is not theoretical, in that it is
reducible to and explicitly definable in terms of prior observational states.
This parallels the lesson of the theoretician's dilemma; for any model that
contains an intervening theoretical variable linking observational inputs to outputs, it is
possible in principle to replace that theoretical variable with one denoting the direct
relation between the observational states while achieving the same advantages of the
theory. However, it is possible to respond to this challenge in the same way as the
theoretician's dilemma; if the candidate observable variable is not one that a
psychological agent would or could represent, then the theoretical variable may be
psychologically necessary for achieving the goal of simplification.
In many cases, such an O-level intervening variable is psychologically available.
That is, in many cases, it is plausible that there is an observable regularity that unites the
various observable inputs of the model which actual agents can represent and which
would serve to impose a bottleneck on their mental models. Setting aside the mindreading
case for a moment, it will be helpful to look at a case in which there are good
psychological grounds for believing that these representations do exist in animals.

2.5 Observable bottlenecks: prototypes


Research on perceptual category discrimination in pigeons has tested their ability
to divide perceptual stimuli into categories for the purposes of prediction and induction.
For instance, Wasserman and colleagues presented pigeons with pictures of objects
belonging to various natural categories, such as chairs, cats, flowers, and cars. After
seeing a picture, the pigeons were trained to peck a particular key corresponding to the

correct category, i.e. to peck the upper right red key if the object was a chair, the lower
left green key if it was a cat (reviewed in Wasserman and Astley 1994; Shettleworth
2010, 192). Not only were pigeons capable of learning to match objects to their
respective categories after training with those objects, they were also able to correctly
categorize new objects, i.e. they would correctly choose the upper right red key when
presented with a chair they had never seen. This is evidence that the pigeons had formed
a representation of perceptual categories that was not merely a verbatim listing of
previously observed objects.
The question, then, is what kind of conceptual representations underlie the
pigeons' performance. According to one theory, the elemental approach, the pigeons had
memorized particular perceptual features of the objects that were necessary and sufficient
for category membership. The problem with this approach is that for most natural
categories, there is no such set of features and category membership is a more
probabilistic matter (Rosch 1988). Take, for instance, the category of trees. Most trees
are green, have leaves and branches, have dark trunks, and are found outside. However,
some trees (conifers, birches, and potted ficus) lack some of these features, while some
non-trees (celery stalks, electric poles, and sunflowers) possess some of these features.
Exemplars are another candidate for the representations underlying category
discrimination. An exemplar is a particular observed object that is taken to be indicative
of the category, and other objects are placed into the category based on perceived featural
distance from the exemplar. For example, a pigeon's representation of a previously

observed tree might be its representation of the category, and only objects that look
sufficiently similar to that memorized image will be categorized as trees [86].
More relevant to my interests here is the prototype theory of perceptual category
discrimination. According to this theory, a category is represented via a prototype, where
the features of the prototype are the central tendencies of previously observed features of
members of that category (which may include configurations of those features, like
spatial arrangement) and may change with new observations. For example, the pigeon's
tree prototype may not be identical to any tree it has seen but will be an amalgamation of
them all, having the features most commonly seen in trees (i.e. looking more like an oak
than a weeping willow or Japanese maple).
Crucially for my point here, prototypes are intervening variables that can impose
informational bottlenecks that reduce the number of parameters in a mental model,
thereby increasing expected predictive accuracy in precisely the same way as intervening
theoretical variables. The prototype averages over particular inputs and thus will tend to
only track robust regularities among category members, forgetting noise. Then, the
prototype representation screens off the particular observational features of an input
stimulus when the subject makes predictions about its other likely properties. For
example, consider a pigeon making an inference about the likely properties of a novel
bird, X:
    Once X is categorized into a particular prototype based category, feature inference
    is based entirely on the summary statistics encoded in the prototype itself. If the
    value of Flies is 0.95 for the prototypical bird (i.e., 95% of birds summarized in
    the prototype could fly), then the probability that this bird flies is 0.95 (Danks
    2007, 176).
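
The inference pattern in this passage can be sketched in Python. The sketch is a simplification under stated assumptions, not Danks's own formalism: features are binary, the prototype is the per-feature average of observed members, categorization uses a hypothetical similarity threshold, and all names and data are invented for illustration.

```python
def build_prototype(members):
    """Average each binary feature over the observed category members."""
    return {f: sum(m[f] for m in members) / len(members) for f in members[0]}


def similarity(item, prototype, cues):
    """One minus the mean absolute difference on the diagnostic cues."""
    return 1 - sum(abs(item[c] - prototype[c]) for c in cues) / len(cues)


def infer(item, prototype, cues, target, threshold=0.5):
    """Return the prototype's summary value for `target` if the item is
    similar enough to be categorized under the prototype, else None."""
    if similarity(item, prototype, cues) >= threshold:
        return prototype[target]
    return None


# Hypothetical training set: 20 observed birds, 19 of which could fly.
observed = [{"wings": 1, "beak": 1, "flies": 1}] * 19
observed.append({"wings": 1, "beak": 1, "flies": 0})
bird = build_prototype(observed)

novel = {"wings": 1, "beak": 1}  # whether X flies has not been observed
print(infer(novel, bird, ["wings", "beak"], "flies"))  # 0.95
```

Once the novel item passes the similarity test, its particular cue values play no further role: the prediction is read off the prototype alone, which is the sense in which the prototype acts as a bottleneck.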

[86] Additionally, category membership may be graded, not binary, such that something could be classified as
more of a tree than something else.

To see how prototypes can simplify mental models, consider the following mental
models that may be used to infer the properties of trees. The input variables designate
observed trees, along with their values for certain features that are diagnostic of trees.
The output variables are other observable features of trees that may be
ecologically relevant to the pigeon. I will compare three types of models: the first
contains no intervening categorization, the second unites the inputs under a prototype
categorization, and the third unites the inputs under a theoretical categorization of trees.

[Figure 4.5 diagram: three input nodes, BIRCH (-dark trunk, +branches, +leaves, +green), OAK (+dark trunk, +branches, +leaves, +green), and PINE (+dark trunk, +branches, -leaves, +green), each linked directly to two output nodes, "+/- supports weight of nest" and "+/- shelter in rain", by arrows labeled a-f.]

Figure 4.5 A model with no intervening concept of tree. It contains 6 adjustable parameters, a-f,
which represent observed frequencies with which observed trees have had features of interest.


[Figure 4.6 diagram: the same three input nodes (BIRCH, OAK, PINE, with the feature values given in Figure 4.5) are linked by arrows g-i to a single intervening node, "+/- tree prototype", which is linked by arrows j and k to the two output nodes, "+/- supports weight of nest" and "+/- shelter in rain".]

Figure 4.6 A model containing an intervening prototype concept of tree. It contains 5 adjustable
parameters, where the g-i parameters are similarity measures from objects to the prototype, and the j
and k parameters represent the observed frequency with which category members have had features
of interest.

[Figure 4.7 diagram: the same three input nodes (BIRCH, OAK, PINE) are linked by arrows l-n to a single intervening node, "+/- is a tree", which is linked by arrows o and p to the two output nodes, "+/- supports weight of nest" and "+/- shelter in rain".]

Figure 4.7 A model containing an intervening theoretical concept of trees with 5 adjustable
parameters, where the l-n parameters represent the probability that a given observed object is a tree,
and the o and p parameters represent the probability that a category member will have features of
interest.

Suppose that these models all fit the observational data (roughly) equally well. By
comparing the model in Figure 4.5 to those in Figures 4.6 and 4.7, we can see that the
introduction of an intervening concept reduces the number of parameters in the model,
here from six to five.
A comparison of the models in Figures 4.6 and 4.7 shows that it is not the content of
the intervening variable that is relevant for AIC scoring. Suppose that the intervening
variable in Figure 4.7 denotes some theoretical concept of tree-hood; for instance, perhaps
it posits some underlying essence that trees have which causes them to have their
characteristic properties (Gelman 2003, Rehder 2007). This will impose an informational
bottleneck on the model. The properties of an object (whether it has a dark trunk, etc.)
serve as evidence that a given object is a tree, but once categorized as such, those
properties are screened off by its category membership with respect to other properties
(whether it will support a nest, etc.). However, a prototype representing the central
tendency of past observations will serve to screen off perceptual inputs just as well.
The upshot is that while theoretical beliefs do serve to simplify mental models,
there are beliefs corresponding to perceptual regularities alone that can and do serve the
same purpose in animal cognition. If a model containing an intervening theoretical
variable and a model containing an intervening variable denoting some purely
observational category each fit the data (roughly) equally well, then they will both reduce
adjustable parameters in a model in the same way.
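
A small worked example may make the scoring concrete. It uses the standard form of the Akaike Information Criterion, AIC = 2k - 2 ln(L), where k is the number of adjustable parameters and L is the maximized likelihood. The log-likelihood value below is a made-up placeholder, held equal across the three models to mirror the supposition that they fit the data equally well; only the parameter counts come from Figures 4.5 through 4.7.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion; lower scores indicate better expected predictive accuracy."""
    return 2 * k - 2 * log_likelihood


# Hypothetical maximized log-likelihood, assumed identical for all three models.
log_l = -40.0

scores = {
    "no intervening concept (Figure 4.5)": aic(log_l, k=6),
    "prototype concept (Figure 4.6)": aic(log_l, k=5),
    "theoretical concept (Figure 4.7)": aic(log_l, k=5),
}

for name, score in scores.items():
    print(f"{name}: AIC = {score}")
```

On these assumptions the two models with an intervening variable receive the same score, and both beat the model without one. Nothing in the calculation is sensitive to whether the intervening variable is a prototype or a theoretical posit.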
Model selection criteria shed interesting new light on when and why placing
idiosyncratic perceptual inputs into categories for the sake of making predictions is
adaptive. Reasoning about trees as a category, whether via a prototypical tree or a theory
of trees, is expected to deliver more accurate predictions than reasoning about individual
trees. This result is not obvious. You might expect that to predict whether a given object
will support a nest, a subject would do best to take into account as many features of the
object as possible. However, this will often result in overfitting (as Sober points out) and
would require the subject to represent an enormous number of parameters, a large
cognitive burden (as Whiten points out).

2.6 Observable categorizations: MRH and BRH


The lesson that I draw from the above considerations is that there may be
psychologically available non-theoretical representations that can serve the same
syntactic role as theories. If such representations are always available, this would
undermine the claim that simplifying mental models is the unique function of theories.
However, it is doubtful that a non-trivial observable intervening variable will always be
available.
Returning to the mindreading case, Povinelli et al. argue that there is in fact an
observable intervening variable that a chimp could use to simplify its representation of
Hare's experiments. In each of those experiments, the dominant walloped the subordinate
for taking all and only those food items for which, at some point in the experiment, there
was an uninterrupted line in space between the dominant's eyes and the food item; this
possibility has been dubbed the "line of sight" (Lurz 2009) or "evil eye" hypothesis
(Kaminsky, et al. 2008). Thus, the subordinate could have united the observable cues in
each experiment under an intervening variable, "D has a direct line-of-sight to the food,"
where this denotes a directly observable property. If this is the case, then it is possible to
construct a BRH model of the chimp's behavior using the same number of parameters as
the MRH*:
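[Figure 4.8 diagram: three input nodes, "+/- opaque barrier", "+/- back turned", and "+/- transparent barrier", feed a single intervening node, "+/- line of sight", which in turn feeds two output nodes, "+/- D wallops me" and "+/- D takes food".]

Figure 4.8 A BRH model of three versions of Hare's experiment containing an intervening variable.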

In the case of Hare's experiments, there is a plausible observable-level
contingency the subordinate could use to predict the dominant's behavior using as few
parameters as the mindreading model. However, it is still an open question as to how
often this will be the case. I will take up this question again in the next chapter.
For now, I conclude that the function of theories identified here, and for which I
have argued elsewhere (Clatterbuck forthcoming), does not suffice for the theory-theory's
purposes. The suggestion is that theories reduce the number of parameters
needed to mentally model some domain, and that this increases the expected predictive
accuracy of the model when applied to new data. In order for this to be a plausible
account of the unique adaptive role of theories, this role should be one that theories play,
that only theories play, that can explain how theorizing enables novel behaviors that
observable beliefs cannot, and that suggests ways of testing for the presence of theoretical
beliefs.
While the first of these desiderata is on firm mathematical ground, the account
falters on the other three. First, this is not a role that theories alone can play; indeed, this

function is one that many other types of observable-level representations can and do play
in animal cognition. Second, it is difficult to see how theories, in virtue of merely
reducing the number of parameters in a model, could produce novel types of cognition or
prediction, rather than old types faster or more accurately.
These two problems also raise difficulties for using this function to design new
tests for theoretical capacities in animals. First, if theories only make a quantitative
difference with respect to accuracy or speed of processing, it will be more difficult to test
for their presence. The problem is amplified when we consider that we do not have a
good control; we cannot know independently when an animal is using a theory and when
it is not, measure their relative accuracy or speed, and then use this to evaluate other
cases87.
Second, the function played by theoretical variables in the model selection
framework is purely syntactic (it is not in virtue of the content of theoretical beliefs that
they increase predictive accuracy), and thus, I have argued, observable beliefs often
exist that can play the very same syntactic role. This raises an additional problem for a
related account of how to test for the presence of intervening theoretical beliefs, to which
I will now turn.

87

It might be possible to use human subjects to estimate how much more quickly and accurately
predictions are made when theorizing.

Chapter 5: The Intervening Variable and Triangulation Approaches

1. Testing for Intervening Variables


In the previous chapter, I presented two responses to the theoretician's dilemma,
each of which rejects an assumption on which it is predicated. The first, from Hempel,
states that theories are indispensable in their role of establishing inductive connections
among observable statements. The second, from Whiten and Sober, states that theories
are indispensable in their role of simplifying the causal models of theory users. I
concluded that both accounts fail to redeem the psychological necessity of theories, and
for largely the same reason, which is that there are possible observable-level intervening
variables that can play those very same roles.
These two responses to the theoretician's dilemma have inspired a recent
approach to testing for the presence of theories, which has come to be known as the
"intervening variable approach" (Heyes in press). It characterizes the conceptual schemes
that humans and animals use to make predictions about the world as inductive causal
models, and it uses tools developed in the field of causal modeling to analyze and
understand cognition. While this approach has launched a burgeoning research program
in cognitive science, particularly among theory-theorists (Gopnik and Wellman 2012,
Gopnik et al. 2004), here I will be focusing on a more specific proposal from Whiten
(1995, 2013) and Sober (1998a, ms) for how to use principles of causal modeling to test
for the presence of theoretical intervening variables.

Both sides of the mindreading controversy have accepted the notion that non-theoretical representations may also serve as intervening variables in animal cognition.
Andrews (2005) characterizes this non-behaviorist picture as follows:
There are two general classes of explanation for predictive behaviors generally: a
purely behavioristic cue-based approach, and a knowledge-based approach in
which apes construct and use categories of behavior. The only difference between
these two approaches is with regard to the ability of the animal to use intervening
variables, the abstract concepts that are used to organize behavior. As Call points
out, and as [Povinelli and Vonk] confirm, there is ample evidence from studies on
chimpanzee concepts (such as same/different, stimulus equivalence, and
transitivity tasks) that chimpanzees use categories to determine the responses they
should make. The real issue at stake is whether some of those categories are
[mindreading] ones (522-523).
The intervening variable approach tests for a syntactic property of models
containing theoretical variables, namely, that there will be probabilistic dependencies
among the observable-level outputs of the models which persist even once we have
conditioned on the observable-level inputs of the models. Though I will ultimately reject
this methodology as inadequate in most cases, it is an improvement over the types of
experiments attacked by the logical problem. Thus, in Sections 2-5, I will show how the
intervening variable approach promises to overcome the logical problem and sketch how
it can be best used (that is, the best case scenario for the approach).
Then, in the remainder of the chapter, I will argue that the intervening variable
approach suffers from the same problem that beset the accounts discussed in the previous
chapter. The problem is that there are often observable-level intervening variables that
can play the same syntactic role as theoretical intervening variables; therefore, when
these alternative possibilities exist, testing merely for this role will not distinguish
between models with theoretical variables and those without.

Finally, I will suggest an experimental protocol that draws on some of the insights
of the intervening variable approach but which offers a way of distinguishing between
different types of intervening variables. The solution, I will argue, is to design
experiments in which the semantic content of theoretical intervening variables differs from
that of their observable analogs, such that the two models predict different probabilistic
dependencies among observables.

2. The logical problem


The logical problem, as an attack on our ability to experimentally uncover the
presence of intervening theoretical beliefs, centers around a comparison between two
different types of causal chain explanations. Consider the BRH and MRH explanations
for why the subordinate chimp in the experiments of Hare et al. predicts that it will not be
punished for taking the food item that is behind an opaque barrier. The BRH attributes to
the subordinate beliefs about the observable cues, O1, of the experiment (an opaque
barrier between it and the dominant) and the association between those cues and resultant
behavior, O2, of the dominant; this last belief causes it to take the food (B). The MRH
attributes beliefs about O1, the association between O1 and the presence of a mental state
of the dominant, M, and the association between M and O2; this last belief causes it to
predict O2 and produce B.
Schematically, these can be represented as causal chains, where the → represents a
causal relation between two mental states of the subordinate. In the discussions of
mindreading I have been considering, this causal arrow denotes an inference which
proceeds via a belief (implicit or explicit) about the probabilistic relationships between

the two states those beliefs represent. In the models I will now consider, a parameter
above each arrow denotes the probabilistic relationship between two variables; e.g., the
parameter on the arrow from O1 to O2 denotes the value that the subject assigns (explicitly or implicitly) to Pr(O2|O1):

[Figure 5.7 diagram: Experimental Set-up → O1 → O2 → Behavior]

Figure 5.7 BRH model of one version of Hare's food competition experiment.

[Figure 5.8 diagram: Experimental Set-up → O1 → M → O2 → Behavior]

Figure 5.8 MRH model of one version of Hare's food competition experiment.

The first part of the logical problem states that it is always possible to construct a BRH
model that will fit the data as well as the MRH model. These causal representations of the
two hypotheses show why this is the case. Both models predict that the subject will
perform behavior B given the experimental set-up. Further, it is easy to set the parameters
in the two models such that they make exactly the same predictions, for example, by
setting the BRH parameter equal to one of the two MRH parameters and setting the other MRH parameter to 1 (see note 88).
It may be objected that such models are not realistic, perhaps because there will
rarely, if ever, be a deterministic relationship between an agent's beliefs. However, this
point does not go very far in rebutting the logical problem. There are other ways to get
the models to make the same predictions about the probability that the subordinate will
perform B in that experimental set-up, namely, any situation in which Pr( | ) = . Also,
note that these parameters link variables denoting the chimpanzee's mental states. These

88

This type of BRH model constitutes the existence proof for the first part of the logical problem in Penn
and Povinelli (2007, 734). Similarly, Sober (2009) suggests that in order to construct the best fitting BRH
model, we can set = 1 and = .

are unobservable within a "black box" (Sober 1998a) and therefore, we cannot easily
test which parameter assignments are correct.
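
The first part of the logical problem can be sketched numerically. The function names and parameter values below are hypothetical, and I treat M as a binary variable for simplicity; the sketch only shows that whatever values the MRH chain assigns, a single BRH parameter can be chosen to yield the same prediction, without any deterministic link being required.

```python
def brh_prediction(p_o2_given_o1: float) -> float:
    """BRH chain O1 -> O2: a single parameter links the observed cue to the predicted behavior."""
    return p_o2_given_o1


def mrh_prediction(p_m_given_o1: float, p_o2_given_m: float, p_o2_given_not_m: float) -> float:
    """MRH chain O1 -> M -> O2: marginalize over the intervening mental-state variable M."""
    return p_m_given_o1 * p_o2_given_m + (1 - p_m_given_o1) * p_o2_given_not_m


# Hypothetical, non-deterministic MRH parameter values.
mrh = mrh_prediction(p_m_given_o1=0.9, p_o2_given_m=0.8, p_o2_given_not_m=0.1)

# A BRH model can always be tuned to match: set its one parameter to the MRH marginal.
brh = brh_prediction(p_o2_given_o1=mrh)

print(f"MRH predicts Pr(O2 | O1) = {mrh:.2f}")
print(f"BRH predicts Pr(O2 | O1) = {brh:.2f}")
```

Since both chains terminate in the same single output, data about that output alone cannot favor one parameterization over the other.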
From the fact that these BRH and MRH models can be made to fit the data
equally well, Povinelli et al. do not conclude that it is impossible, always and
everywhere, to experimentally distinguish between the BRH and MRH but rather that
experiments with this particular causal chain structure cannot distinguish between them
(Penn and Povinelli 2007, Povinelli and Vonk 2004). This restraint is warranted because,
as demonstrated by Sober (1998a, ms), the two hypotheses do make different predictions
within experimental contexts with different causal structures.

3. The intervening variable approach


In Chapter 4, I discussed one effect of introducing an intervening variable into a
model; it can reduce the number of parameters. The intervening variable approach uses
the same general causal modeling framework and draws on another effect of introducing
an intervening variable, which is that it establishes new probabilistic dependencies among
the outputs of the model.
To see this, consider three models of the causal structure responsible for
producing two effects, E1 and E2 (Sober 1998a). The first postulates separate causes of
each (SC). The second and third postulate at least one common cause of E1 and E2,
including cause C. They differ in whether they do or do not contain an intervening
variable, I, as a more proximate common cause of E1 and E2 (IV and NIV, respectively):

[Figure 5.9 diagrams: (SC) C1 → E1 and C2 → E2; (NIV) C → E1 and C → E2; (IV) C → I, with I → E1 and I → E2.]

Figure 5.9 Three causal models with different causal structures, terminating in effects E1 and E2.

These models all contain causal chains that terminate in the same effects, E1 and E2.
What, then, could be the predictive difference between the three models?
We can exploit certain assumptions about the relations between causation and
correlation to make predictions about causal graphs with different structures. These
assumptions are encoded in the Causal Markov Condition, central to contemporary causal
modeling approaches, which states that any variable in a causal graph is conditionally
independent of all other variables which are not its descendants, conditional on its direct
causes. Informally, if there is a causal arrow from A to B (and no other arrows into B),
then conditional on the state of A, the state of B is independent of all other variables in
the graph except for any variable C for which there is a directed causal path from B to
C89.
What implications does the CMC have for the three causal models sketched
above? In the separate cause model (SC), we should expect E1 and E2 to be
89

I am glossing over many of the technical details of the CMC which vary somewhat relative to the formal
frameworks in which it is used. See Hausman and Woodward (1999), Spirtes et al. (2000), and Pearl
(2000).

probabilistically independent; Pr(E2|E1) = Pr(E2). They are independent both
unconditionally and conditional on the states of the C1 and C2 variables90.
In the common cause model without an intervening variable (NIV), C is a
common cause of E1 and E2 (which meets the extra conditions stated above). Therefore
(a) E1 and E2 will be unconditionally dependent, such that Pr(E2|E1) > Pr(E2), and (b) E1
and E2 will be independent conditional on the state of C. C will screen off E1 from E2
(and vice versa), such that Pr(E2 | E1 & C) = Pr(E2 | C).
In the common cause model with an intervening variable (IV), because E1 and E2
have a common cause, they will be unconditionally dependent, such that Pr(E2|E1) >
Pr(E2). However, note that variable I in model (IV) is a more proximate common cause of
E1 and E2 than is C. If the parameters in the model are not deterministic, then the CMC
entails that E1 and E2 will be probabilistically dependent conditional on C because they
share a common cause (I) that has not been conditioned upon.
Hence, even if each of the models predicts that E1 and E2 will probably occur if
the relevant causes do, there are higher-order predictive differences between the models;
it is not merely the occurrence of E1 and E2 that is relevant but rather the correlation
between E1 and E2. The predictive differences between the NIV and IV models are
particularly relevant when we consider cases in which the intervening variable, I, cannot
be observed (I is in a black box). Crucially, the predictive differences between NIV and
IV allow us to infer whether there is an intermediate unobserved cause between C and E1
and E2 by seeing whether the probabilistic dependence between E1 and E2 disappears
when we condition on the state of C.
90

One point of clarification is needed. E1 and E2 will only be unconditionally uncorrelated if C1 and C2 are
independent of each other. This assumption is represented by the absence of an arrow between C1 and C2 or
any common cause of the two, plus the assumption of faithfulness.


Model                           Unconditionally   Independent
                                Independent?      Conditional on C?
Separate Cause (SC)             Yes               Yes
No Intervening Variable (NIV)   No                Yes
Intervening Variable (IV)       No                No

Figure 5.10 - Predictive differences between the SC, NIV, and IV models.
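
The pattern in Figure 5.10 can be checked by brute-force enumeration. The sketch below is illustrative only: it assumes binary variables, a prior of 0.5 on each root cause, and the same hypothetical link strength (0.2 and 0.8) on every arrow. For the SC model, "conditional on C" is read as conditional on both C1 and C2. The reported quantity is Pr(E2 | E1) minus Pr(E2), which is zero exactly when E1 and E2 are independent.

```python
from itertools import product


def bern(p, value):
    """Probability that a binary variable with Pr(1) = p takes `value`."""
    return p if value == 1 else 1 - p


# Hypothetical, non-deterministic parameters: Pr(effect = 1 | cause = c).
LINK = {0: 0.2, 1: 0.8}
PRIOR = 0.5  # Pr(root cause = 1)


def joint_sc():
    """SC: C1 -> E1 and C2 -> E2, with C1 and C2 independent. Keys are ((c1, c2), e1, e2)."""
    dist = {}
    for c1, c2, e1, e2 in product((0, 1), repeat=4):
        dist[((c1, c2), e1, e2)] = (bern(PRIOR, c1) * bern(PRIOR, c2)
                                    * bern(LINK[c1], e1) * bern(LINK[c2], e2))
    return dist


def joint_niv():
    """NIV: C -> E1 and C -> E2. Keys are (c, e1, e2)."""
    dist = {}
    for c, e1, e2 in product((0, 1), repeat=3):
        dist[(c, e1, e2)] = bern(PRIOR, c) * bern(LINK[c], e1) * bern(LINK[c], e2)
    return dist


def joint_iv():
    """IV: C -> I, I -> E1, I -> E2. The unobserved I is summed out; keys are (c, e1, e2)."""
    dist = {}
    for c, i, e1, e2 in product((0, 1), repeat=4):
        p = bern(PRIOR, c) * bern(LINK[c], i) * bern(LINK[i], e1) * bern(LINK[i], e2)
        dist[(c, e1, e2)] = dist.get((c, e1, e2), 0.0) + p
    return dist


def prob(dist, pred):
    """Total probability of the outcomes satisfying `pred`."""
    return sum(p for key, p in dist.items() if pred(*key))


def dependence(dist, c_value=None):
    """Pr(E2=1 | E1=1) - Pr(E2=1), optionally with everything conditional on C = c_value."""
    given = (lambda c: True) if c_value is None else (lambda c: c == c_value)
    p_c = prob(dist, lambda c, e1, e2: given(c))
    p_e1 = prob(dist, lambda c, e1, e2: given(c) and e1 == 1)
    p_e2 = prob(dist, lambda c, e1, e2: given(c) and e2 == 1)
    p_both = prob(dist, lambda c, e1, e2: given(c) and e1 == 1 and e2 == 1)
    return p_both / p_e1 - p_e2 / p_c


results = {
    "SC": (dependence(joint_sc()), dependence(joint_sc(), c_value=(1, 1))),
    "NIV": (dependence(joint_niv()), dependence(joint_niv(), c_value=1)),
    "IV": (dependence(joint_iv()), dependence(joint_iv(), c_value=1)),
}

for model, (uncond, cond) in results.items():
    print(f"{model}: unconditional {uncond:+.3f}, conditional on C {cond:+.3f}")
```

With these parameter values, only the IV model leaves a residual dependence between E1 and E2 after conditioning on C, which is the signature the intervening variable approach looks for. If the links were deterministic, that residual dependence would vanish, which is why the indeterminism proviso matters.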

Following Sober (1998a, ms), we can apply this model selection framework to the
mindreading context. Crucially, we cannot compare a single causal chain BRH against a
causal chain MRH but must compare BRH and MRH models of multiple behaviors.
Consider competing models of the subordinate's behavior in the "opaque barrier" and "back
turned" experiments from Hare et al. (2000).
The models below depict two different causal structures leading from the
experimental set-ups (that is, the properties of the experiments that can be observed by
the experimenters, ES1 and ES2) to the behaviors of the subordinate chimpanzee (that
is, whether it took the food in a trial, B1 and B2). The shaded boxes refer to beliefs of
the chimpanzee. In the BRH, it reasons by separate learned contingencies between
observable states of the experiments, here the presence of an opaque barrier (O1) and
the dominant's back being turned (O3), and resultant behaviors of the dominant in each
case (O2 and O4). The MRH posits that the chimpanzee represented both of these
experiments as cases in which the dominant could see the food (call this M), where this
belief causes it to predict the resultant behavior of the dominant.


[Figure 5.11 diagrams: (BRH) ES1 → O1 → O2 → B1 and, separately, ES2 → O3 → O4 → B2; (MRH) ES1 → O1 → M and ES2 → O3 → M, with M → O2 → B1 and M → O4 → B2.]

Figure 5.11 A separate cause BRH model and a common cause intervening variable model of the
chimp's behavior in two experiments from Hare et al. (2000).

If we assign positive but indeterministic parameters to each of the causal arrows,
both models predict that the subordinate will take the hidden food in the opaque barrier
case (B1) and in the back turned case (B2). However, unlike the causal chain models,
there is a predictive difference between these two models.
The BRH model predicts that the chimp's behaviors in each task will be
uncorrelated conditional on the state of the experimental set-ups; the probability that it
performs both B1 and B2 will be equal to the product of the probability of its performing
B1 and that of its performing B2. In other words, a chimp who performs B1 is no more
likely to perform B2 (and vice versa) once we take into account whether ES1 and ES2
obtain. On the other hand, the MRH model predicts that the chimp's behaviors will be
correlated even when we have conditioned on the states of the experimental set-ups.

4. The intervening variable approach at work
The intervening variable approach can thus be used to experimentally determine
whether subjects represent an intervening variable uniting various experiments. If
subjects' responses are correlated in the two experiments, such that a subject that
succeeds at one task is more likely to succeed at the other, then this is evidence that
there was such an intervening variable. As we will see, however, such a correlation does
not entail that the intervening variable was theoretical rather than observable or even that
there was an intervening variable at all. Before I consider these challenges to
the intervening variable view, it will be helpful to look at a case in which it does seem
promising.

4.1. The intervening variable approach applied to food competition experiments


In response to the attacks on Hare's food competition experiments from Povinelli
et al., particularly their claim that a single observable generalization could explain the
chimps' behavior (the "line of sight" hypothesis), other experiments have been designed
for which it is argued that there is no single observable regularity uniting the various
experimental conditions. For instance, consider a series of food competition experiments
performed by Melis et al. (2006). Chimpanzee subjects were given a choice of two
tunnels, each containing a food reward. A human experimenter sat on the other side of
the tunnels, within view of the chimp. If the chimp chose to reach into a tunnel that the
experimenter could perceive her reaching into, the experimenter would snatch the food
away before the chimp could grab it.

In one version of the experiment, the chimp had a choice between opaque and
transparent reaching tunnels, which tested whether the chimp would take food her
competitor could not see her taking. Their results were consistent with those of Hare et
al. (2000). The second version of the experiment offered a choice between tunnels with
either noisy or silent trap doors, which tested whether the chimp would take food her
competitor could not hear her taking. Melis et al. found that chimps succeeded above
chance levels at both tasks, which confirmed the earlier results of Hare et al. and
demonstrated that their paradigm could be extended to a different sensory modality91.
Are these results evidence that the chimpanzees reasoned about the mental states
of their competitors in these tasks? Melis et al. suggest that they do, and further, that
success across multiple sensory modalities provides further evidence of this capacity,
beyond that provided by visual experiments alone:
We thus believe that the current findings demonstrate that chimpanzees have
some understanding of what others can see and, moreover, that they know how to
use this knowledge to conceal visual information from them. Additionally, the
current study provides evidence that this ability may extend to the auditory
domain, suggesting that the underlying mechanism is not tied to any one sensory
modality. Instead, it appears to involve a broader understanding of others'
perceptual states (Melis et al. 2006, 161).
They note the logical problem may be used as an attack on their results. It is possible that
chimps in the visual task had reasoned via a learned association between observable
features of the experiment and being rewarded for reaching through the opaque tunnel.
Similarly, the chimps might have had separate beliefs about the observable set-up of the
auditory tasks that sufficed for their success. Melis et al. pessimistically admit that this
debate is nearly impossible to solve empirically: All the results coming from

91

See Santos et al. (2006) for a similar experiment which probed whether rhesus monkeys know what
competitors can hear.

experiments with an ecologically valid design can always be interpreted by means of a
behavioral account (161). I disagree with this pessimistic conclusion. The resources for
empirically distinguishing between the BRH and MRH are at their disposal, but
unfortunately, the experimenters did not utilize them.
It is true that we can explain the subordinate's behavior by attributing to it
separate learned contingencies among observables. However, plausibly, a mindreading
chimp may represent both of these tasks as cases in which its competitor could perceive
her reaching for the food. This would be a theoretical representation, according to the
definition of theorizing that I defended in Chapter 2, because it unifies separate
observable regularities (here, regularities about "line of sight" and "line of hearing") into
a single, abstract representation that can be flexibly applied to both. We can test for
whether the chimpanzee represented both tasks as instances of this single theoretical
regularity not by observing whether the chimpanzees as a group succeeded above chance
levels at both tasks but by observing whether chimpanzees that succeeded at one task
were more likely to succeed at the other92. Consider the following BRH and MRH
models of the two tasks:

92

My discussion of the experiments from Melis et al. (2006) draws heavily on conversation with Elliott
Sober and on his analysis of the case in Sober (ms).


[Figure 5.12 diagrams: (BRH) ES1 → O1: E has line of sight to x → O2: E will grab x → B1 and, separately, ES2 → O3: E is near noisy approach to x → O4: E will grab x → B2; (MRH) ES1 → O1 and ES2 → O3 both feed M: E perceives me taking x, which feeds O2 → B1 and O4 → B2.]

Figure 5.12 A separate cause BRH and common cause MRH model of two versions of the food
competition experiment from Melis et al. (2006).

This particular BRH model predicts that the
chimps' performance on the two tasks will not be correlated conditional on the ES
variables, while the MRH model predicts that it will.
While Melis et al. do not perform the relevant statistical comparison, they do
provide data that allow us to see whether a subject who passed their visual task was more
likely to pass their auditory task. In addition to testing the chimps on both tasks in the
presence of a human competitor, Melis et al. also ran control versions of both tasks in
which there was no competitor. Below, I have reproduced each subject's individual
scores on these four experiments. The numbers denote the percentage of trials on which
the subject selected the opaque tunnel on the competition test described above (HVR) and
a control test with no competitor (HVC), or selected the quiet trap door tunnel in the
competitive (SLR) and control (SLC) tasks.


Name      Hidden-visible   Nonsocial       Silent-loud    Nonsocial
          reach (HVR)      control (HVC)   reach (SLR)    control (SLC)
Jahaga    50               50              67             50
Fifi      56               44              33             33
Truddy    44               50              67             58
Sandra    56               56              83             50
Frodo     59               50              83             50
Patrick   67               50              67             50
Brent     65               39              40             33
Mean*     57               48              63             46

Figure 5.13 - Percentage of trials on which subjects selected the correct tube in four conditions of the
experiment in Melis et al. (2006).

Note that when scores are aggregated across all participants, the mean scores on each
competitive task are above chance (57% and 63%) and the chimps chose the tube that a
competitor could not perceive at higher rates when a competitor was actually present.
However, this does not suffice to show that performance on the two tasks was
correlated. Though a much more rigorous analysis (of much more data) is needed to
make any definitive claims about whether this correlation exists93, a brief look at the data

93

In addition to a larger sample size and more powerful statistical tools, it would also be helpful to have the
data broken down by trial rather than the overall rate of success. For a helpful illustration of these
procedures with respect to experiments on weight cognition in chimpanzees, see Povinelli (2012).

here does not suggest that it does. To see this, I calculated the chimp-by-chimp Pearson
correlation coefficient, r94 for the following:
(a) Correlation between HVR and (HVR - HVC) = .891;
    Correlation between SLR and (SLR - SLC) = .910

Overall choice percentage on the competitive tasks (HVR, SLR) is one measure of
success. However, in order to rule out the possibility that success was due to a
preference for the opaque or quiet traps, regardless of their interaction with the
competitor, we might want to compare preferences for those tubes in competitive tasks
versus non-competitive control tasks. In each experiment, there was a strong positive
correlation between success at the competitive task and the amount of difference in
behavior between the control and competitive tasks. This suggests that chimps who chose
correctly in either task did not do so as a result of a generalized preference for that tube.

(b) Correlation between HVR and SLR = -.186
(c) Correlation between (HVR - HVC) and (SLR - SLC) = -.291

These two correlations are the key ones in testing whether chimps used a single
representation in both tasks (MRH) or separate learned contingencies (BRH). The first,
(b), is the chimp-by-chimp correlation in success at the visual and auditory tasks, a
measure of whether a chimp who succeeded at the visual task would also succeed at the
auditory task. The second, (c), is a measure of whether chimps who behaved differently
between the competitive and control conditions of the auditory task would do the same in
the visual task, and vice versa. There was a very weak negative correlation between these
two values.
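
The chimp-by-chimp comparisons in (a) through (c) can be sketched as follows. This is only an illustration: the per-chimp rates are invented placeholders, not the data from Melis et al. (2006), and the function and variable names are my own, chosen to mirror the HVR, HVC, SLR and SLC labels used above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: summed co-deviations from the means, divided by
    the square root of the product of the summed squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return numerator / denominator


# Hypothetical per-chimp choice rates (one entry per chimp); NOT the published data.
hvr = [0.50, 0.60, 0.70, 0.55, 0.65]  # visual task, competitor present
hvc = [0.45, 0.50, 0.50, 0.50, 0.48]  # visual task, control
slr = [0.70, 0.55, 0.60, 0.75, 0.50]  # auditory task, competitor present
slc = [0.50, 0.52, 0.47, 0.50, 0.49]  # auditory task, control

# Difference scores: how much each chimp's choices shift when a competitor is present.
hv_diff = [r - c for r, c in zip(hvr, hvc)]
sl_diff = [r - c for r, c in zip(slr, slc)]

r_a_visual = pearson_r(hvr, hv_diff)    # comparison (a), visual task
r_a_auditory = pearson_r(slr, sl_diff)  # comparison (a), auditory task
r_b = pearson_r(hvr, slr)               # comparison (b): success across tasks
r_c = pearson_r(hv_diff, sl_diff)       # comparison (c): responsiveness across tasks
```

With so few subjects, any such coefficient is at best suggestive, which is why the text treats these values as a first look rather than a test.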

[94] $\mathrm{Correl}(X, Y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

How should we interpret these results? While I do not want to draw any strong
conclusions based on the small sample size of this experiment and the weak statistical
method used, a few points are worth noting. First, despite the fact that, in the aggregate,
the subjects succeeded on both tasks, the data here do not suggest that individual chimps
who performed above the population mean on one task were more likely to do so on the
other task.[95] Similarly, if we take the difference in responses between the competitive and
control tasks to be a measure of how much the subject's strategy was responsive to the
competitor, how much she "got it", then "getting it" on one task was not positively
correlated with "getting it" on the other within this sample. I conclude that the
experiments as interpreted by Melis et al. do not adequately rebut the logical problem
since they did not test for whether behaviors were correlated across tasks. Furthermore,
when the probative comparisons are actually made, there is no evidence that
behaviors in the two tasks were caused by a single underlying mechanism.

[95] In fact, they were slightly less likely to perform above the population mean, though I do not suspect that
this negative correlation would be robust across samples.

4.2. What can we infer if behaviors are not correlated?


Let us suppose, for now, that the data here are robust and that the chimps'
performance on the two tasks is not correlated. Does this show that the chimpanzees were
not mindreaders? It does not, for while it does provide evidence that there was not a
single intervening variable uniting the two experiments (that is, it provides evidence that
the particular MRH in Figure 6 is false), it does not rule out that the chimpanzees were using
separate mindreading variables in the two tasks. For instance, it is possible that the
subjects reasoned about what the competitor could see in the visual experiments and
reasoned about what the competitor could hear in the auditory experiment, but they did not
categorize seeing and hearing via a common belief about what the competitor could
perceive. From the point of view of human folk psychology, this type of mindreading
theory would be peculiar, but it is not impossible.

4.3 What can we infer if behaviors are correlated?


On the other hand, suppose for the moment that the experiments had revealed a
strong correlation between performance at the visual and auditory tasks. Would this show
that the chimpanzees were mindreaders? While I agree that it would constitute evidence
that they were (and indeed, it would strongly favor the particular MRH in Figure 6 over
the BRH model there), it would not entail that they were, and the strength of the evidence
it provides depends on certain background assumptions. There are two alternative BRH
explanations for why their behaviors would be correlated, and depending on the
plausibility of these explanations, a correlation may not lead us to favor the MRH at all.
First, as I argued in Chapter 4, representations of observable regularities can also
serve as intervening variables in mental models and these will have all of the same
syntactic consequences as intervening theoretical variables. In this case, if the chimps
represented the auditory and visual tasks as instances of a single perceptual regularity that
they shared, this would also cause their behaviors to be correlated. Trivially, one such
regularity is "Either E has a line of sight to x or E is near a noisy approach to x." This
variable is implausible since this disjunctive category seems extremely gerrymandered
unless one believes that they are both cases of perceiving. That is, there is no reason to
suppose that a non-mindreader would spontaneously lump these two cases together under
a single representation.
More substantively, there might be some perceptual regularity that the transparent
and noisy tunnels share; for instance, perhaps their openings both had red markings and
the chimp reasoned via an association between having red markings and being unsafe to
grab food through. This mere possibility does not suffice for the skeptical conclusion that
an observed correlation in the two tasks should lead us to be neutral between the BRH
and MRH because the threat can be mitigated by good experimental design. Indeed,
Melis et al. worked to ensure that the correct tubes were not distinguished by any
common perceptual cue, and their control experiments without a competitor present also
make that interpretation implausible. Another technique is to design other control
experiments that share any contextual cues that could have induced a probabilistic
dependence between O1 and O2; if the putative causal influence does not manifest itself in
the control, then it is inferred to not have caused performance on the original test
conditions to be correlated.[96]

[96] In this case, they created a control condition in which the competitor grabbed the food item if the subject
reached through the noisy tunnel. However, in this case, the trap doors made a noise after the chimp had
reached for the food (the experimenters played a noise over a walkie talkie). The chimps did not learn to
avoid this tunnel. Melis et al. conclude that, "If chimpanzees were just associating contextual cues with
certain behavioral outcomes in the first two experiments, they should have been able to learn to use cues
provided in these control conditions" (160).

There is a second way that a BRH model may predict a correlation in performance
between the two tasks which does not posit that the chimps used a single variable to
represent both. Instead, consider a BRH model according to which the chimps reasoned
via separate observable regularities in each task but there is a causal link between
representations of these two regularities such that a chimpanzee who believes one
regularity is more likely to believe the other:

[Figure 5.14 diagram. Model label: BRH. Nodes: ES1; O1: E has line of sight to x; O2: E will grab x; B1;
ES2; O3: E is near noisy approach to x; O4: E will grab x; B2.]

Figure 5.14 A BRH model of the two experiments from Melis et al. (2006) in which the two
behavioral regularities are causally related.

We can take the two arrows to represent that there is a common cause of the two
represented contingencies. For example, it may be the case that a chimp who has learned
the association between O1 and O2 is more likely to have made observations from which
it could learn the association between O3 and O4; that is, a subject that has experience
with opaque barriers may be more likely to have had experience with noisy competitive
environments as well. Other types of common causes are possible. For instance, subjects
with greater intelligence or attention spans may be more likely to exceed mean
performance on both tasks. In any event, if there is a causal relationship between the two
observable regularities, the behaviors in both tasks will be probabilistically dependent
conditional on the experimental inputs.
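
The point that a common cause can make behaviors correlated even when the subject uses separate contingencies can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a model of any real experiment: the linear form, the `shared_weight` parameter and the Gaussian noise are all hypothetical choices of mine.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def simulate_brh(n_subjects, shared_weight, seed=0):
    """Simulate performance on two tasks under a BRH model.

    Each subject relies on a separate learned contingency in each task
    (independent noise terms), but a shared factor such as attention or
    prior experience (hypothetical) may feed into both.
    """
    rng = random.Random(seed)
    b1, b2 = [], []
    for _ in range(n_subjects):
        shared = rng.gauss(0, 1)  # the common cause
        b1.append(shared_weight * shared + rng.gauss(0, 1))
        b2.append(shared_weight * shared + rng.gauss(0, 1))
    return b1, b2


# With a common cause, B1 and B2 are correlated although no single
# intervening variable represents both tasks.
r_with_common_cause = pearson_r(*simulate_brh(5000, shared_weight=1.0))

# Without one, the two behaviors are approximately uncorrelated.
r_without_common_cause = pearson_r(*simulate_brh(5000, shared_weight=0.0))
```

In this toy setting a correlation between B1 and B2 cannot by itself discriminate between a shared representation and a shared upstream cause, which is the worry raised in the text.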
There is one final problem that I want to consider for the intervening variable
approach which would show that there is always a possible BRH explanation for why a
subject's responses are correlated, even when (a) the experimental set-ups are not
perceptually similar, thus ruling out stimulus generalization, and (b) the subject does not
utilize a theoretical intervening variable. It is well-known that animals are capable of
forming categories of perceptually disparate items that have common effects, e.g. things
that are good to eat (Shettleworth 2010). Notice that the auditory and visual tasks both
require the subject to predict the same effect (whether the competitor will grab the food
before the chimp can). Therefore, it is possible that a subject that has experience with
losing the food when it reaches through the transparent tunnels and then learns that the
noisy trap doors in auditory trials have the same outcome may group these two conditions
together via the purely observational intervening variable, "conditions which result in the
competitor grabbing the food",[97] even if the experimental set-ups are perceptually
disparate and there is no real theoretical understanding of why these various set-ups
should all yield the same effects.

5. Generalizing
I have argued that the lack of a correlation between the chimps' behaviors in the
two food competition tasks does not entail that they were not using a theory, nor would a
correlation entail that they were. For the reasons that I have suggested, some authors in
comparative psychology have concluded that the intervening variable conception is
weak as a basis for the formulation of empirically testable target and alternative
hypotheses (Heyes in press). While I have some sympathy for this assessment, it is
worth noting that many of the problems that emerge for the intervening variable view are
methodological rather than logical (in the sense that the logical problem was
supposed to be systematic rather than methodological). Indeed, putting the framework in
terms of causal models, as I have followed Sober (1998a, 2009, ms) in doing, allows us to
understand why some of the methodological rules of thumb that have been developed in
comparative psychology are useful.

[97] Notice that this is different from its learning two separate observable regularities. It groups them as
instances of the same observable regularity, defined by common effects.

The discussion of the experiments from Melis et al. (2006) suggests a more
general set of guidelines. In order for a correlation between two behaviors to provide
good evidence for the presence of theoretical variables, it must be the case that:
(a) There is a plausible theoretical intervening variable that would impose such a
correlation (that is, a theory that unites the observable inputs and would cause
both of the observed behaviors).
(b) There is no plausible non-theoretical intervening variable that would also impose
such a correlation.
(c) There is no plausible causal relationship between the two observable
contingencies that would impose such a correlation.
In most experimental cases, (a) is taken for granted. However, comparative psychologists
are acutely aware that steps must be taken to ensure that (c) holds and are increasingly aware
that similar steps must be taken to ensure that (b) holds.

5.1 Controlling for common causes


With respect to (c), the problem is that even if a subject uses separate observable
contingencies in two separate tasks, there may be reasons why a subject who knows one
of these contingencies is more likely to know the other, and this would cause their
behaviors to be correlated on the two tasks. I suggested three ways that this might occur:
first, differences in domain general cognitive processes (such as increased attention span
or memory) may cause performance to be correlated; second, prior training histories
might suggest that a subject who has learned one contingency would also have had
experience of the other; third, the training within the experiment itself might create
a causal relationship between the subject's representations of the two contingencies. I
will briefly consider strategies that have been developed to control for each of these
possibilities.
The first problem is perhaps the easiest to control for since it will often be
possible to independently test subjects' domain general cognitive capacities. For
example, the Continuous Performance Test (CPT)[98] can be used to independently
measure subjects' capacity for sustained attention throughout a task, and the Wisconsin
Card Sorting Task can be used to test executive control.[99]
A more indirect strategy for controlling for general cognitive common causes is
used by Gopnik and Meltzoff (1997). In their defense of the theory-theory of human
cognitive development, they argue that children go through significant periods of theory
change in several different domains. One of the key pieces of evidence they use for a
theory change (that is, a new representation in children's mental models) is a correlation
between performance on non-linguistic problem-solving tasks and changes in linguistic
utterances. For instance, children start spontaneously sorting objects into categories
around the same time (~18 months) that they start seeking names for everything (a
"naming explosion"). They take this to be evidence that the child has developed a new
theory of kinds. Similarly, at around 18 months, infants begin to pass
occluded object displacement tasks at the same time that they start to use the word "gone" to
refer to disappearances. Gopnik and Meltzoff argue that this is evidence that the infant
has learned a new theory of objects.

[98] "CPTs are generally characterized by rapid presentation of continuously changing stimuli with a
designated target stimulus or target pattern; duration of the task varies but is intended to be sufficient to
measure sustained attention" (Riccio, et al. 2001, 241).

[99] This paradigm tests subjects' ability to maintain focus on relevant features and then quickly switch focus
to other features when task demands have changed. While used extensively with humans, it has been
extended to other primates as well (Bont, et al. 2011).

However, perhaps we can chalk up all of these correlated changes to a domain
general increase in memory, attention, or processing power occurring around 18 months.
Gopnik and Meltzoff suggest a nifty way of controlling for this possibility. They argue
that if some increase in general intelligence is a common cause of all four changes, all
four should be correlated with one another. However, they found that changes within
each domain are correlated but changes across domains are not; that is, naming and
categorization are correlated with each other, "gone" and object tracking are correlated
with each other, but the development of the first cluster is independent of the
development of the second. In general then, the way to test whether a correlation in
performance on two tasks is due to general cognitive abilities (rather than an intervening
representation) is to test for a further correlation with other tasks that rely on the same
cognitive abilities but are unrelated to the intervening representation at hand.
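
The logic of this control can be made concrete with a toy simulation. It is a sketch only: the four measures are named after the tasks discussed above, but the scores, the linear model and the weights are hypothetical assumptions of mine, not a reconstruction of Gopnik and Meltzoff's data or analysis.

```python
import random
from itertools import combinations
from math import sqrt

MEASURES = ["naming", "sorting", "gone", "object_tracking"]
DOMAIN = {
    "naming": "kinds",
    "sorting": "kinds",
    "gone": "objects",
    "object_tracking": "objects",
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def simulate(n_children, general_weight, domain_weight, seed=0):
    """Hypothetical scores driven by a general factor and/or domain-specific factors."""
    rng = random.Random(seed)
    data = {m: [] for m in MEASURES}
    for _ in range(n_children):
        general = rng.gauss(0, 1)  # e.g. memory or attention
        by_domain = {"kinds": rng.gauss(0, 1), "objects": rng.gauss(0, 1)}
        for m in MEASURES:
            score = (general_weight * general
                     + domain_weight * by_domain[DOMAIN[m]]
                     + rng.gauss(0, 1))
            data[m].append(score)
    return data


def mean_correlations(data):
    """Return (mean within-domain r, mean cross-domain r) over all pairs of measures."""
    within, across = [], []
    for a, b in combinations(MEASURES, 2):
        r = pearson_r(data[a], data[b])
        (within if DOMAIN[a] == DOMAIN[b] else across).append(r)
    return sum(within) / len(within), sum(across) / len(across)


# General-factor model: all four measures end up correlated with one another.
general_within, general_across = mean_correlations(simulate(5000, 1.0, 0.0))

# Domain-specific model: correlations hold within clusters but not across them.
domain_within, domain_across = mean_correlations(simulate(5000, 0.0, 1.0))
```

Only the second pattern (within-domain correlation without cross-domain correlation) matches what Gopnik and Meltzoff report, which is why they take the general-intelligence explanation to be disfavored.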
A second type of common cause hypothesis states that behaviors will be
correlated if it is the case that a subject who has learned one of the observable regularities
is more likely to have learned the other. Often, this is pitched as a hypothesis about
training histories. For example, in response to the experiments performed by Hare et al.
(2000), Penn and Povinelli (among others) suggest that taking food around opaque
barriers and when dominants' backs are turned are part and parcel of life as a chimp, so
that any normal chimp would have experience of both (Penn and Povinelli 2007).
Controlling for this alternative hypothesis is extremely difficult because doing so would
require information about the subject's entire training history.[100] It also places
experimenters in a double bind; the greater the ecological validity of an experiment,[101]
the greater the probability that the subject will have the relevant experience with both
regularities.

[100] In the next section, I will demonstrate how an alternative procedure controls for this alternative
hypothesis.

Lastly, a correlation may result if the experiment itself causes separate learned
regularities to become causally related. The typical way to control for this possibility is to
insist upon immediate transfer; that is, only a correlation between performance on the
final trials of the first task and the very first trials on the second is evidence for a prior,
theoretical intervening variable (Sober 1998a). One problem with this requirement is that
it will exclude theorizers that need to become familiar with new task requirements, and it
will also ignore interesting learning dynamics on later trials (see Povinelli 2012 for a
discussion).

5.2. Controlling for Observable-level regularities


While controlling for common causes may be difficult and its results contentious,
there are good experimental protocols in place for doing so. I will now turn to a problem
that is even more difficult and contentious, that of controlling for the possibility of
competing observable-level intervening representations. Suppose we have ruled out a
common cause which would impose a correlation among the behaviors of a subject using
separate observable contingencies in separate tasks. Then, a correlation in performance
on the two tasks is evidence that the subject represented the two tasks via either a theoretical
intervening variable or a non-theoretical intervening variable. A natural suggestion for
ruling out the latter is to design tests with no observable features in common, united only
by a theoretical regularity. To see the limitations of this approach, it will be helpful to
consider the fate of one such proposal.

[101] Ecological validity is defined as "the degree to which an experiment is able to simulate a relevant aspect
of subjects' individual ontogenies and/or their species' evolutionary histories" (Hare 2001, 270).

The goggles test was first proposed by Heyes (1998), and comparative
psychologists widely (at least for a time) agreed that only a creature with a theory of
mind could pass such a test. While variations on this test have been proposed and
implemented for use in human children (Meltzoff 2007) and corvids (Heyes in press), for
continuity, I will discuss the version that was proposed for use with chimpanzees, using
visors, by Penn and Povinelli (2007a).
In training sessions, the chimpanzee subject is given experience with a red and a
blue visor. The blue visor can be seen through, and the red visor cannot. The chimp thus
learns what it can see when it is wearing each visor. Then, in the test phase, the chimp is
given a choice between begging for food from human experimenters wearing either the
red or the blue visor. Previous studies with chimpanzees have shown that they can learn
to beg preferentially from a human experimenter who can see food rather than from one who
cannot (Povinelli et al. 1990). These previous experiments have not been considered
conclusive tests of mindreading ability for the same reason as Hare's experiments; the
chimp could have passed this test by reasoning about known correlations between
observable properties of the experimenter (e.g. whether she has a line of sight to the food)
and success in begging (Penn et al. 2008).
According to Penn and Povinelli, if a chimpanzee preferentially begs from the
experimenter wearing the blue visor, this would demonstrate that it has a theory of mind.
In order for the chimp to reason from its own first-hand experience with the visor to what
the experimenter is likely to do, it must represent its own experiences as cases in which it
could not see and then infer that the experimenter, likewise, could not see while wearing
the red visor. Because its own first-person experience with being unable to see is not
perceptually similar to its experience of someone else being unable to see, reasoning from
one to the other requires theoretical capacity. They argue:
[This process is] a paradigmatic example of encoding an ms (mental state)
variable about a first-person internal state (i.e. the general epistemic condition of
not-being-able-to-see) that results from a given manifest contingency (i.e. wearing
the red visor) and then using these representations to predict the behaviour of
another cognitive agent to a novel situation (i.e. responding to begging gestures).
We contend that without the ms variable, the subject could not immediately solve
the problem presented (Penn and Povinelli 2007, 738).
If they are correct that only an intervening mind-reading variable could serve to unite the
training and test conditions, then the visor experiment would be a paradigm case in which
the intervening variable view could genuinely distinguish between mind-reading and
behavior-reading hypotheses.
Many authors in the comparative psychological literature have indeed deemed this
to be the best case experiment to test for mindreading (Whiten 2013, Meltzoff and
Brooks 2008, Heyes 2014). Thus, given that this test was proposed nearly two decades
ago, we might expect that the whole controversy over ape mindreading would have been
put to rest. In fact, results of this test in nonhuman animals have never been published
(Heyes in press).
Unfortunately, the consensus over this protocol has fallen apart; even this pure
experiment has suffered the same challenges that beset the intervening variable approach
more generally. For example, Tomasello et al. (2003a) argue that even if a chimp fails
this test, it does not show that it was not mindreading, for this test is cooperative
rather than competitive and involves strange visors, so it is too ecologically invalid to be
probative. On the other hand, many authors have argued that success on the test would
not reveal that chimps were mindreaders because there are observable regularities that
would suffice for such performance. For example, Andrews (2005, 530) argues that in the
training phase, the chimp learned merely that the red visor is associated with not being
able to do things. Lurz (2009) argues that his line of sight hypothesis can account for
successful performance equally well because the chimp may categorize red goggle
experiences as cases of opacity which it has learned to associate with unsuccessful
begging. Such deflationary alternative hypotheses will likely continue to multiply for any
such proposed experiment.[102]
The lesson of this case is not that the intervening variable approach is logically
hopeless. It is that its use is often dialectically hopeless. In order to decisively prove that a
correlation in behaviors is due to an intervening theoretical representation, one must rule
out every alternative way that there might be a correlation. Because these correlations can
come from anywhere (past training history, too much experience within the experiment
itself, any perceptual regularity between the two contexts, etc.), doing so is nearly
impossible. I am not endorsing the view that these alternative hypotheses should all be
taken seriously, and indeed, many of them should not. I agree with Heyes's (in press)
complaint about the state of the science:
These proposals would need the support of cognitive science in order to become,
not just in principle possibilities, but alternative hypotheses; to provide the means
and the motivation to devise an experiment (for example, using screens with
different properties) that could distinguish the transparency/opacity hypothesis
from the seeing/not seeing hypothesis. Surely, outside the curious world of
mindreading, this is how science works: incrementally, by dealing with each
theoretically and empirically motivated problem as it comes.

[102] The field of mindreading research has come to resemble some episodes in philosophy in which replies
and counter-replies snowball into their own subfield, e.g. the Gettier cases of increasing complexity
subfield of epistemology.

The upshot of this discussion is that while tests for screening-off intervening
variables are an improvement over the experimental approach of Hare et al. and can in
principle reveal the presence of theoretical variables, the difficulty in controlling for
possible causal interactions among observable beliefs is the chief obstacle to the
intervening variable methodology.
This motivates a different approach. Instead of controlling for all possible causal
interactions among beliefs about observables that could induce correlations, the approach
I favor accepts and exploits the fact that any intervening variable will induce correlations
among responses conditional on experimental set-ups. The task, then, is to create
situations in which theoretical and observable categorizations predict different types of
correlations.

6. A triangulation method for theoretical variables


The intervening variable protocol that I have been discussing takes the only
relevant observation to be whether two behaviors are correlated. I have argued that many
of the problems which have arisen for the intervening variable view can be diagnosed as
a problem of controlling for other types of causes that could impose such a correlation.
Here, I will argue for expanding the protocol to cover cases in which theoretical and
non-theoretical hypotheses both predict a correlation in behaviors but for which they predict
different correlations. However, the lesson of the logical problem and theoretician's
dilemma is that, given the tight logical relationship between theoretical variables and
their observable counterparts, teasing them apart is not easy.

6.1 Overview of the triangulation framework
The framework that I am proposing is an extension of the triangulation or
transposition method to the particular case of theoretical and non-theoretical beliefs
(Campbell 1954, Heyes in press). This method was designed to tackle the particular
problem that I have raised in previous chapters. In brief, the problem is this.
Psychologists frequently attempt to experimentally distinguish between two intervening
variables that might have caused a subject's behavior, where these two variables are
logically related, often denoting relations at different levels of abstraction. In virtue of
their logical relationship, the relations denoted by these variables will have overlapping
extensions. Therefore, experiments which test for responses falling under the extension of
both candidate variables will not distinguish between them.
This is the basic point of the logical problem and theoretician's dilemma. The
concept of theories which unites various formulations of the theory-theory of human
uniqueness is that humans alone are capable of reinterpreting observable regularities in
terms of higher-order relations (relations with causal, explanatory, or functional
structure) that (a) are not reducible to observable regularities and (b) can be extended to
perceptually distinct regularities. It follows directly that theories and their observable
counterparts will be logically related. Hempel's argument also turned on this claim; in
order for theories to make predictions at the observable level, they must entail observable
regularities.
The key, then, for experimentally distinguishing between these logically related
relations is to understand the ways that their extensions come apart. First, while
instantiating the lower-level observable relation may suffice for instantiating the higher-level
theoretical relation, it may not be necessary for it. That is, there may be other
observable relations which also suffice for the theoretical relation, in which case the
extension of the theoretical relation is wider than its observable-level counterpart. Less
obviously, the converse may hold. A higher-order theoretical relation may have a
narrower extension than its observable-level counterpart. I will consider these two
possibilities in the next chapter after setting out the general experimental protocol I have
in mind.
Applied to the case of theories, the triangulation method accepts that for any
purported behavior caused by a theoretical variable, there will be some observable-level
variable that also would have sufficed for that behavior. The triangulation protocol
proceeds by training a subject on a novel set of experiences in the training phase. The
training trials are instances of both candidate representations at higher and lower levels of
abstraction. At this point, it is unknown which of the two variables is being represented
by the subject and reinforced by the training trials. In the test phase, the stimulus is
altered slightly from that of the training trials. The goal is to change the stimulus enough
to probe for the content of the intervening variable while remaining similar enough so the
subject continues to use the same variable in the test as in the training phases.
Crucially, in the test phase, the relation between the new stimulus and the old
response is an instantiation of one candidate relation but not the other. For instance,
suppose it falls within the higher-order relation but not the lower. Thus, if a subject were
using the higher-order relation during the training trials, we should predict that they will
extend this relation to the test trials and continue to perform the same behavior. If the
subject were using the lower-order relation during the training trials, we should predict
that they will not extend this relation to the test trials and will not perform the same
behavior.
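
The predictive structure of a triangulation experiment can be written out as a small decision table. This is a minimal sketch: the `Stimulus` class, the hypothesis labels "OH" and "TH", and the example trials are illustrative names of my own and stand in for whatever observable and theoretical relations a given experiment targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    name: str
    fits_observable: bool   # instance of the lower-level observable relation?
    fits_theoretical: bool  # instance of the higher-order theoretical relation?


def repeats_trained_response(hypothesis: str, stimulus: Stimulus) -> bool:
    """Predict whether the subject extends the trained behavior to this stimulus."""
    if hypothesis == "OH":  # subject represents the observable relation
        return stimulus.fits_observable
    if hypothesis == "TH":  # subject represents the theoretical relation
        return stimulus.fits_theoretical
    raise ValueError(f"unknown hypothesis: {hypothesis}")


def is_diagnostic(stimulus: Stimulus) -> bool:
    """A trial discriminates between the hypotheses only if their predictions differ."""
    return (repeats_trained_response("OH", stimulus)
            != repeats_trained_response("TH", stimulus))


# Training trials instantiate both candidate relations, so they cannot discriminate.
training = Stimulus("training trial", fits_observable=True, fits_theoretical=True)

# Test trials are built so that only one of the two relations still holds.
test_theory_only = Stimulus("test trial A", fits_observable=False, fits_theoretical=True)
test_observable_only = Stimulus("test trial B", fits_observable=True, fits_theoretical=False)

for trial in (training, test_theory_only, test_observable_only):
    print(trial.name,
          "| OH repeats:", repeats_trained_response("OH", trial),
          "| TH repeats:", repeats_trained_response("TH", trial),
          "| diagnostic:", is_diagnostic(trial))
```

The two test trials correspond to the two ways in which the extensions of the relations can come apart: the theoretical relation being wider than its observable counterpart, or narrower.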
The triangulation method is clearest when the subject has two choices in the test
trials and the lower- and higher-order relations favor opposite choices. This protocol can
also be used to distinguish between an intervening variable that is probabilistically
relevant to the response in the test condition and one that is not, thus covering cases such
as the experiments from Melis et al. (2006) in which only a theoretical intervening
variable unites both the test and training cases.
Schematically, we can represent a triangulation experiment via the causal models
below. The top graph represents an Observable Hypothesis, in which the subject
represents the training trials as instances of an observable category, denoted by the
intervening variable, OG.[103] The bottom graph represents a Theoretical Hypothesis, in
which the training trials are represented as instances of a theoretical category, denoted by
the intervening variable, T. The top and bottom rows of each graph denote the training
and test phases, respectively. Both of these models predict that the subject's behaviors in
the training and test trials will be correlated with one another since there is a common
cause (here, a reinforced intervening variable). Because the training trials are instances of
the relations established by both OG and T, the training-phase parameters of the two models
will both be positive. The triangulation method will deliver the clearest results when the
test-phase parameters of the two models have opposite
signs. In that case, OH will predict a positive correlation between B1 and B2 and TH will
predict a negative correlation, or vice versa.

[103] I'm using OG to denote an observable generality since it is this type of variable that is usually pitched
against theoretical variables in the cases I'm interested in.

[Figure 5.15 diagram. Two causal graphs, OH (top) and TH (bottom), each with a Training row (O1, O2, B1)
and a Test row (O3, O4, B2); the intervening variable is OG in OH and T in TH.]

Figure 5.15 Models containing observable (OH) and theoretical (TH) intervening variables.

This protocol has been fruitfully used to distinguish between intervening variables
at various levels of abstraction, but notably, many of these are not cases in which theories
have been tested against non-theories. I will first illustrate a few instances of the
triangulation protocol at work to distinguish between different observable regularities
before turning to a discussion of its applicability to theories, more specifically.

6.2 Wickens's experiment


One of the first presentations of the triangulation approach is found in Campbell
(1954) in the context of a controversy about how to specify what is learned by subjects in
a conditioning experiment. Campbell notes that we must infer the content of an
intervening variable used by a subject in a learning experiment on the basis of its
observable behaviors, and that a problem arises in our ability to do so given that, for a
given set of behaviors, there are multiple descriptions of what is learned that are
compatible with the data. To wit:

Let us take, for a hypothetical example, a dog that, under reinforcement of
electrical shock, has learned to lift its paw at the sound of a buzzer. The response
can be defined in at least the following ways: (a) The dog has learned to contract
certain muscles at the sound of a buzzer. (b) The dog has learned to change the
position of its foot relative to the position of its other feet and its total body at the
sound of a buzzer. (c) The dog has learned to remove its body from contact with a
specific object. (d) The dog has learned to remove its body from a given location,
defined in terms of the frame of reference of the room or in terms of latitude and
longitude. These, others, and certain combinations are all possible as
interpretations of what is learned (ibid., 169).
As a result of this ambiguity about what the dog had really learned, some psychologists
of Campbell's day argued that the question about what is learned is a "pseudo problem"
and a "theoretical blind alley" (Kendler 1952). Campbell argues that such pessimism is
unwarranted, for these various hypotheses are operationally distinguishable despite prior
experiments' failure to distinguish them. Campbell diagnoses the problem as follows:
These four and the possible others are experimentally confounded in this learning
experiment as described, and in all learning experiments in which the habit is
tested only in the situation in which it was originally acquired. Rather than
assume one interpretation and neglect the possibility of others, it behooves the
learning theorist who wishes to have his theory based upon firm empirical
grounding to set up experimental situations in which these possible interpretations
are disentangled and unconfounded (Campbell 1954, 169).
To illustrate the type of experimental protocol he has in mind, Campbell
approvingly cites experiments by Wickens (1938, 1939, 1943, 1948) that are closely
analogous to the hypothetical case of the dog's learned response to the shock. In
Wickens's experiments, human subjects placed their middle finger on an electrode, with
the palm down. Through the pairing of a tone with an electrical shock to the finger, a
conditioned movement of the finger in the presence of the tone was achieved. In these
trials, the subjects' behavior could be characterized either as an extensor movement of the
middle finger or as a movement of the finger away from the electrode, and the experiment,
thus far, does not tell us which of these the subject had learned (e.g. "Perform an extensor
movement when you hear the tone" or "Move your finger away from the electrode when
you hear the tone").
In his follow-up experiments, Wickens disentangled these two possibilities by
having subjects place their middle finger on the electrode, with the palm up. If the
subjects had learned to perform an extensor movement in the training phase, Wickens
predicted that they would move their finger toward the electrode in the test phase, while
if they had learned to move their fingers away, they would now perform a flexor
movement. Campbell reports that 90% of the subjects continued to move their fingers
away from the electrode in the test phase, indicating that they had not learned a specific
muscle movement but instead learned to change their fingers' position in a specified
direction104.
Notice that the original problem for interpreting Wickens's first experiment arose
because the two interpretations of the learned response, "move the finger up" and "move
the finger away from the electrode", were both true of the training trials. Moving the
finger up is one possible instantiation of the more abstract relation of moving the finger
away from the electrode (namely, when the palm is down), but it is not the only possible
instantiation. What Wickens did was alter the test conditions such that the more abstract
relation is instantiated by a different finger movement. We can model the two
experiments as follows, where the two intervening variables have the same probabilistic
relationship to the behavior in the Palm down condition (B1), in that the parameters
linking each variable to B1 are both positive, but have different relationships to the
behavior in the Palm up condition (B2), in that the parameter for "move finger up" is
positive and the parameter for "move finger away" is negative.

104

Sober (1998a) performs a careful analysis of Wickens's data to determine whether the subjects'
behaviors in the two tasks really were correlated with one another, so as to indicate an intervening variable.
Note that while this is the relevant question according to Sober's intervening variable approach, Campbell
assumes that we should attribute an intervening variable in any case, where the real question regards its
content.

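[Figure 5.16: two causal graphs. In each, the tone is presented in the Palm Down condition (ES1) and the Palm Up condition (ES2), with behaviors B1 and B2 (each labeled "move finger up"). The intervening variable is "Move finger up" in the first graph and "Move finger away" in the second; in the second graph, the link to B1 is marked + and the link to B2 is marked -.]

Figure 5.16 Two models of what is learned in Wickens's experiments.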

While Wickens's experimental design elegantly illustrates the triangulation method, it
might be objected that the intervening variables it addresses are quite dissimilar
from the theoretical beliefs I am interested in. The candidates for what is learned are
different conditioned responses, and it might be a stretch to even call these "beliefs".
Thus, it will be helpful to consider one more example of the triangulation method before
turning to its applications for theoretical beliefs in particular.

6.3 Rumbaugh's experiment


The next example I will discuss still falls short of a comparison between
theoretical and non-theoretical intervening representations, but it does compare two
observable representations at different levels of abstraction. In a long series of works,
Rumbaugh and colleagues developed tests to measure the extent to which prior learning
positively or negatively influenced facility at learning new tasks (Rumbaugh and Pate
1984, Rumbaugh 1995, Tomasello and Call 1997). This measure, called the "transfer
index", was used to compare general intelligence across more than one hundred primate
species.
In one such test, subjects were shown two cups per trial, one blue and one red,
one of which was baited with food out of sight of the subjects. If a subject correctly chose
the baited cup, it was given the reward, and if it chose incorrectly, it was shown that the
food was under the other cup and received nothing. In the training condition, the red cup
was always baited. Training ended when a subject began to make the correct choice 84%
of the time (more on this threshold percentage in a moment).
In the test condition, the general set-up remained the same, but now experimenters
began to consistently bait the blue cup. Rumbaugh and colleagues observed how many
trials subjects required before they began choosing the blue cup at the success threshold.
They argued as follows. If subjects required an equal number of trials to reach success in
picking the blue cup as they did for the red cup, then learning the first contingency had no
effect on learning the second. If subjects learned the reversal contingency more quickly
than the initial contingency, then learning the first contingency had a positive effect on
learning the second (positive transfer). If subjects took more trials to learn the second
contingency than the first, then learning the first contingency had a negative effect on
learning the second (negative transfer).
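The three-way comparison just described can be put as a small sketch. It follows only the description given here (a comparison of trials-to-criterion in the two phases) and should not be read as Rumbaugh's actual scoring procedure for the transfer index; the function name and the example trial counts are hypothetical.

```python
def classify_transfer(training_trials: int, reversal_trials: int) -> str:
    """Compare trials-to-criterion in the training and reversal phases.

    Both arguments are the number of trials a subject needed to reach the
    success threshold in that phase (hypothetical inputs for illustration).
    """
    if training_trials <= 0 or reversal_trials <= 0:
        raise ValueError("trial counts must be positive")
    if reversal_trials < training_trials:
        # The reversal was learned more quickly than the initial contingency.
        return "positive transfer"
    if reversal_trials > training_trials:
        # The reversal took longer than the initial contingency.
        return "negative transfer"
    # Equal counts: the first contingency had no measurable effect.
    return "no transfer"


if __name__ == "__main__":
    print(classify_transfer(40, 25))
    print(classify_transfer(40, 60))
    print(classify_transfer(40, 40))
```

The sketch treats the comparison as exact; any real analysis would presumably need a tolerance for when two trial counts are to be treated as "equal".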

Rumbaugh argued that positive transfer was evidence that a species' performance
was "cognitively mediated" rather than the result of mere conditioning. However, since
some mere conditioning is also mediated by some mental representation or other, I
want to put Rumbaugh's experiments to the somewhat different purpose of identifying
the content of the mediating mental representation.
Call the final trial of the training condition n. Consider two hypotheses of how
subjects had represented the contingency learned in the training phase of Rumbaugh's
experiments which caused them to select the red cup at n:
(Color) The subject observed the experimental set-up and colors of the two cups,
and categorized the cups by a variable denoting color, OC, which can take states
"red" (+OC) or "blue" (-OC). In the training trials, the subject learned that there
was a strong (perfect) correlation between a cup's being red (+OC) and its being
baited. This learned contingency caused the subject to select the red cup at trial n.
(Same Color) The subject observed the experimental set-up and colors of the two
cups, and categorized the cups by a variable denoting whether they were the same
color as the cup that was baited in the previous trial, OSC, which can take states
"same color" (+OSC) or "different color" (-OSC). In the training trials, the subject
learned that there was a strong (perfect) correlation between a cup's being the
same color as the cup baited on the previous trial (+OSC) and its being baited. This
learned contingency caused the subject to select the red cup at trial n.
There does not seem to be any reason to favor one or the other of these hypotheses as an
explanation for why a subject chose the red cup at trial n. Along with Rumbaugh, we can
assume that the probability that the subject will choose a cup that instantiates some
property (either being red or being the same color as the previous baited cup) will be
proportional to the strength of the association it has observed between that property and
reward. Since all of the training trials were instances of the red cup being baited and of
the cup that was the same color as the previous baited cup being baited, both learned
contingencies would be equally strong and both would promote choosing the red cup at n;
thus, Color and Same Color can explain the behavior equally well.
Note, however, that while a red cup at trial n is an instance of both +OC and +OSC,
there are possible cups that are instances of the latter but not the former, namely, blue
cups on trials for which the previous baited cup was also blue. By introducing these
instances in the test phase, we can tell which categorization the subject was using. The
n+1 trial will not be informative since both Color and Same Color predict that the subject
will continue to choose the red cup. However, they make different predictions starting
with the n+2 trial.
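To make the point about extensions concrete, the following minimal sketch enumerates every combination of a cup's color and the color baited on the previous trial, and reports the combinations on which the two categorizations disagree. The function names and the string encoding of colors are assumptions introduced for illustration only.

```python
from itertools import product

COLORS = ("red", "blue")


def in_color_extension(cup_color: str, previous_baited_color: str) -> bool:
    # +OC: the cup is red, whatever was baited on the previous trial.
    return cup_color == "red"


def in_same_color_extension(cup_color: str, previous_baited_color: str) -> bool:
    # +OSC: the cup has the same color as the cup baited on the previous trial.
    return cup_color == previous_baited_color


def discriminating_cases():
    """Return (cup_color, previous_baited_color) pairs on which the
    two categorizations give different verdicts."""
    return [
        (cup, previous)
        for cup, previous in product(COLORS, COLORS)
        if in_color_extension(cup, previous) != in_same_color_extension(cup, previous)
    ]


if __name__ == "__main__":
    print(discriminating_cases())
```

On this encoding, the two categorizations agree whenever the previously baited cup was red (as on every training trial and on trial n+1) and disagree only when it was blue, which is why the first informative trial is n+2.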
We can model the two hypotheses as follows. For both the n and n+2 trials, call
the red cup A and the blue cup B. The experimental set-up (ESn) includes the observable
properties of the present trial as well as a memory of previous trials. Following
Rumbaugh, we can assume that the parameters of each model are proportional to the
strength of the observed contingency between that category of cups and reward in the
training trials.

[Figure 5.17: two causal graphs, one for Color and one for Same Color. In each, the Training set-up (ESn) leads through the intervening variable (OC for Color, OSC for Same Color) to the prediction "Cup An will be baited" and then to B1 ("Choose red at n"); the Test set-up (ESn+2) leads through the same variable to the prediction "Cup An+2 will be baited" and then to B2 ("Choose red at n+2").]

Figure 5.17 Two models of the intervening representation used in Rumbaugh's experiments.

On trial n, subjects using Color and Same Color to categorize the cups will both predict
that the red cup, cup A, will be baited; that is, the parameters linking each categorization
to this prediction are both strong and positive. Thus, both models predict that the subject
will choose the red cup.
On trial n+2, subjects have observed a single disconfirming instance of each
candidate regularity (one trial in which the red cup was not baited and one trial in which
the cup that is the same color as the previous baited cup was not baited). However, these
learned contingencies will still be strong. Thus, a subject categorizing cups by
Color will still predict that the red cup will be baited at the n+2 trial; because the
corresponding parameter is positive, this raises the probability that the subject will select
the red cup at n+2. A subject categorizing cups by Same Color will predict that the cup
that is the same color as the baited cup on the n+1 trial will be baited. Because this is the
blue cup, it will predict that the red cup will not be baited at the n+2 trial; because the
corresponding parameter is negative, this lowers the probability that the subject will
select the red cup at n+2.

Color and Same Color both predict that the subject's responses across trials will
be correlated, but they predict different correlations. Color predicts a positive correlation;
a subject who chooses red on n will be more likely to choose red on n+2. Same Color
predicts a negative correlation; a subject who chooses red on n will be less likely to
choose red on n+2.
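The opposite correlations can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that subjects differ in the strength of their learned contingency (a number between 0 and 1), that the probability of choosing red is a linear function of that strength, and that a subject's two choices are independent once the strength is fixed. None of these modeling choices comes from Rumbaugh's data; they only show how a shared learned contingency induces a correlation whose sign depends on the categorization.

```python
def red_choice_probability(strength: float, sign: int) -> float:
    # Assumed linear choice rule: 0.5 is indifference; a positive sign pushes
    # the subject toward red, a negative sign pushes it away from red.
    return 0.5 + sign * 0.5 * strength


def choice_covariance(strengths, hypothesis: str) -> float:
    """Covariance across subjects between choosing red at n and at n+2.

    strengths: one learned-contingency strength per subject (assumed values).
    hypothesis: "color" or "same_color".
    """
    if hypothesis not in ("color", "same_color"):
        raise ValueError("hypothesis must be 'color' or 'same_color'")
    # At trial n both categorizations favor red; at n+2 only Color still does.
    test_sign = 1 if hypothesis == "color" else -1
    p_red_n = [red_choice_probability(s, 1) for s in strengths]
    p_red_n2 = [red_choice_probability(s, test_sign) for s in strengths]
    count = len(strengths)
    mean_n = sum(p_red_n) / count
    mean_n2 = sum(p_red_n2) / count
    # With choices independent given strength, E[B1 * B2] is the mean of p1 * p2.
    joint = sum(a * b for a, b in zip(p_red_n, p_red_n2)) / count
    return joint - mean_n * mean_n2


if __name__ == "__main__":
    population = [0.25, 0.5, 1.0]  # hypothetical strengths
    print(choice_covariance(population, "color"))
    print(choice_covariance(population, "same_color"))
```

Under these assumptions the covariance is positive for Color and negative for Same Color, matching the qualitative predictions stated above.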
Which of these hypotheses was borne out by the data? Rumbaugh observed that
prosimians and some monkey species continued to preferentially choose the red cup until
the overall frequency of baited blue cups exceeded the frequency of baited red cups
(negative transfer). Apes and some monkey species reversed immediately to picking the
blue cup (positive transfer). I interpret these results as indicating that the former group
was categorizing the cups by Color while the latter had categorized the cups by Same
Color105. While Rumbaugh suggests that the difference between the two groups is one of
general intelligence, the difference may instead be the ability to represent perceptual
properties at higher levels of abstraction106.
I have been interpreting the parameters in the above models as measures of the
strength of a subject's association between a categorizing property and reward. Further
support for this interpretation comes from Rumbaugh's follow-up experiments in which
subjects were trained to a 67% success rate instead of 84% (Rumbaugh 1995). My
interpretation predicts that subjects that are trained to an 84% success rate should have
stronger associations between their intervening categorizations and responses than those
with shorter training periods.

105

My interpretation is compatible with the (not implausible) hypothesis that the apes were using the lower-level
categorization, Color, in the training phase (that is, picking the red cup) but that they switched to the higher-level
categorization, Same Color, in the test phase. Regardless, this hypothesis must still posit that the apes categorized the
testing phase as instances of Same Color, though perhaps only after the Color categorization failed.

106

This is not so much a disagreement with Rumbaugh's interpretation but rather a more specific
hypothesis about the nature and content of the mediating concepts that distinguish negative and positive
transfer species. This may also correlate with a vague conception of "general intelligence".

For Color, this means that with a shorter training period, there will be a weaker
association between "red" and reward. Thus, subjects categorizing by Color should
reverse to blue more quickly when the training period is shorter. For Same Color, this
means that with a shorter training period, there will be a weaker association between
"being the same color as the baited cup on the previous trial" and reward. Thus, subjects
categorizing by this higher-order property should reverse to blue less quickly when the
training period is shorter.

                Color              Same Color
67% training    Faster reversal    Slower reversal
84% training    Slower reversal    Faster reversal

Figure 5.18 Predictive differences between the Color and Same Color hypotheses when training rate
is manipulated.

Indeed, Rumbaugh found that the negative transfer species reversed more quickly and
positive transfer species reversed less quickly with shorter training periods.

7. Upshots
These experiments exemplify a promising strategy for using the intervening
variable approach to test for both the presence and the content of intervening beliefs.
Importantly, it allows us to experimentally discriminate between intervening beliefs that
are logically related, thus providing a powerful way to evade the logical problem. We can
put the general strategy more informally as follows.
First, one provides subjects with experience that would suffice for the relevant
predictions in the training phase and observes that subjects perform a behavior that could
be explained by attributing learned contingencies between either a lower-order or more
abstract categorization and reward. The relevant question to ask is, "What states of affairs
instantiate the higher-order regularity but not the lower?" or vice versa. Then, one tests
for whether the learned contingency is extended to these cases that fall within the
extension of one candidate representation but not the other. The results will be clearest
when the two candidate representations make opposite predictions, and therefore, the two
hypotheses predict opposite correlations in the subjects' behaviors. Lastly, this
framework requires that the same intervening representation is used in both the training
and test phases of the experiment, and we can test for whether this occurred by
manipulating the strength of the contingency developed in the training phase.
I have argued that the triangulation framework is an improvement over the
intervening variable approach from Whiten and Sober, particularly with respect to those
cases in which there are candidate theoretical and non-theoretical intervening variables
that may impose correlations on subjects' behaviors. The triangulation method has the
potential to distinguish between these variables because it uses correlations in subjects'
behaviors not only to reveal the presence of intervening variables but also to distinguish
between variables with different contents. Note, however, that this first requires that we
specify what the contents of the hypothesized intervening variables are in order to design
cases in which their extensions differ.
One of the attractive features of the intervening variable approach is that it only
requires us to test for a syntactic feature of a model. If there is a correlation among
outcomes that does not disappear when the experimental set-up is conditioned upon, then
there (probably) exists an intervening variable. However, since the triangulation
approach's success turns on the possibility of the two variables having different
extensions, it requires that we delve into the semantic properties of the hypothesized
intervening variables in order to distinguish between them.
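The syntactic test mentioned above (a correlation among outcomes that survives conditioning on the experimental set-up) can be sketched as follows. The record format, the set-up labels, and the toy data are all hypothetical, and a real analysis would need a statistical test rather than a bare comparison with zero.

```python
from collections import defaultdict


def covariance_given_setup(records):
    """Map each experimental set-up to the covariance between B1 and B2
    among the records collected under that set-up.

    records: iterable of (setup_label, b1, b2), with b1 and b2 coded 0 or 1.
    """
    by_setup = defaultdict(list)
    for setup, b1, b2 in records:
        by_setup[setup].append((b1, b2))

    result = {}
    for setup, pairs in by_setup.items():
        count = len(pairs)
        mean_b1 = sum(b1 for b1, _ in pairs) / count
        mean_b2 = sum(b2 for _, b2 in pairs) / count
        result[setup] = sum(
            (b1 - mean_b1) * (b2 - mean_b2) for b1, b2 in pairs
        ) / count
    return result


if __name__ == "__main__":
    toy_records = [
        # Under "ES-a" the two behaviors always match: residual correlation.
        ("ES-a", 1, 1), ("ES-a", 1, 1), ("ES-a", 0, 0), ("ES-a", 0, 0),
        # Under "ES-b" every combination is equally frequent: no correlation.
        ("ES-b", 1, 1), ("ES-b", 1, 0), ("ES-b", 0, 1), ("ES-b", 0, 0),
    ]
    print(covariance_given_setup(toy_records))
```

A non-zero covariance within a single set-up is the kind of residual correlation that, on the intervening variable approach, suggests an intervening variable; it says nothing yet about that variable's content, which is the further question the triangulation approach addresses.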
This fact has an important consequence. The intervening variable approach
offered the promise of experimentally determining whether a subject was a theorizer
simpliciter. However, it did so only by conflating intervening variables with theoretical
intervening variables; that conflation allows one to conclude that a subject is a theorizer if
there is an intervening variable and a non-theorizer if there is not. As I have argued at
length, this conflation is an error. However, once we accept this, then it is no longer
possible to test a general theorizing hypothesis against a general non-theorizing
hypothesis. Instead, we can only test hypotheses about particular theoretical variables
and particular non-theoretical variables.
So far, I have considered examples of the triangulation approach that do not pit
theoretical variables against non-theoretical variables but rather observable variables at
different levels of abstraction. How can we use this method in the case of theories, more
specifically? In the next chapter, I will draw on two responses to the theoretician's
dilemma which specify two different ways that the extensions of theoretical and
observable variables may come apart, which in turn suggest two ways to use the
triangulation method to test for their presence.

Chapter 6: Two Roles of Theories in Human Cognition (and How to Test for Them)

1. Introduction
In previous chapters, I have argued that in order to be a plausible account of the
cognitive differences between humans and animals, the theory-theory of human
uniqueness must offer satisfying answers to the following questions:
(a) What is the unique epistemic function of theories such that a theory user is
capable of behaviors that she would not be capable of performing if she
lacked a theory, and how do theories satisfy this function?
(b) How could we know whether a subject is a theory user?
At the end of Chapter 5, I argued that the triangulation approach offers a promising way
to answer question (b), for it allows researchers to experimentally discriminate between
different intervening representations even when the candidate representations are
logically related to one another.
The triangulation method proceeds by acknowledging that candidate intervening
representations of logically related relations will have overlapping extensions. Thus,
testing for whether a subject behaves in accordance with observable regularities that
instantiate both will not distinguish between the two candidates. However, in some cases,
it is possible to find regularities that fall within the extension of one candidate relation
but not the other. If the subject extends her knowledge to these cases, then this constitutes
evidence for which relation she had represented.
In order to use this framework to distinguish between theoretical variables and
observable-level variables, then, it will be necessary to specify the ways that their
extensions can come apart. The theoretician's dilemma is motivated by the fact that
theories and their observable-level empirical counterparts will be logically related to one
another, and thus, they present a particularly challenging methodological problem.
Further, as I argued at the end of the previous chapter, in most cases, the triangulation
method requires that we specify the particular theoretical representation and particular
observable-level representation under consideration, for to understand the predictive
differences between the two, it is necessary to specify their content. Thus, it is impossible
to say unequivocally how the extensions of any and all theoretical and observable-level
variables will differ.
In this chapter, I will consider two types of responses to the theoretician's
dilemma in the philosophy of science which identify two distinct epistemic roles that
theories play. Importantly, these two accounts suggest two different ways that the
extensions of theoretical and observable regularities (call these T and OG, respectively107)
can differ and thus two ways to use the triangulation method to experimentally
discriminate between them.
The first response to the theoretician's dilemma argues for an expansive role for
theories. By re-representing an observable regularity in terms of some more abstract
theoretical relation, a theory user can extend her knowledge beyond the empirical
regularities she has learned at the observable level. Thus, in this role, T has a wider
extension than OG. The second response argues for a limiting role for theories.
Sometimes, an empirical generalization can be too general, including extraneous or
accidental contingencies in past observations, and a theory user may identify the more
specific relation in virtue of which the broader empirical generalization holds. Thus, in
this role, T has a narrower extension than OG.

107

As in the previous chapter, I will use OG to denote that the intervening representation may be of an
observable generality, rather than some particular observable matter of fact.

In this chapter, I will examine each of these proposed epistemic functions of
theories, showing how they attempt to rebut the upward theoretician's dilemma. Then,
for each, I will present an example of an intuitive theory in human cognition that
manifests the given function. Lastly, I will consider each function's ramifications for
testing for the presence of theories in humans and other animals.

2. An expansive role of theories


2.1 Introduction
According to the account of the expansive role of theories developed by Earman
(1978, 1992) and Putnam (1963), a theory allows its users to reinterpret some observed
regularity in terms of some more abstract relation, where this relation can then be
extended to predictions beyond the original observed regularity from which it was
learned.
In his development of this idea, Earman is sensitive to the distinction that I have
been drawing between the downward and upward versions of the dilemma. In his concise
(1978) explication of the dilemma, Earman argues that there is a special sense in which
theories cannot be dispensed with in favor of their observable counterparts (see also Psillos
1999, 26). His basic point is that all Craig's Theorem108 proves is that at some time t1, the
theoretical terms of a developed theory can be replaced; in other words, the downward
formulation of the theoretician's dilemma is sound. However, the same theory may
change, both with respect to its observational entailments and its structure, when new
information is learned. Of course, the subsequently altered theory at t2 can also be
dispensed with in favor of its observable-language counterparts at that time. Crucially for
Earman's point, the indispensable contribution of the theory may lie in the transition from
states t1 to t2; the theory can grow and develop with new observations while preserving
important continuity between those two states, a continuity that will not exist between its
observable-level counterparts at t1 and t2.

108

The Craig-counterpart of a theory serves as a syntactic replacement of theoretical terms, but it may not
retain certain structural elements of the theory. Earman also discusses another way of replacing theoretical
terms, the Ramsey sentence counterpart (which replaces all t-terms with existentially quantified
variables), that does retain this structure.

Consider an old theory $T$ and a new theory $T'$ which is introduced after new
observations but contains the same T-terms. First, though $T$ and $T'$ can each be
eliminated in favor of their observational counterparts $T_B$ and $T'_B$ at any given time, there
is no guarantee that the observational consequences of $(T_B \wedge T'_B)$ are the same as
$(T \wedge T')_B$, unless the T-terms of each are "explicitly definable" in O-terms (that is, their
content is exhausted by the actual observations that have been made to that point). In
other words, if the theory "goes beyond" its observable consequences, then when we
make new observations, we may learn something new about the observable consequences
of the theory.
Earman concludes that even if we could, at any single time, dispense with a
theory in favor of a description of its observational consequences:
When the temporal development of science is taken into account, one can never
be sure that the observational part Craig(T1) of the theory T1 captures the full
empirical import of T1 for unless there is explicit definability, the theoretical
terms of T1 have the potential over time of helping to generate new observational
predictions which cannot be generated with the help of Craig(T1) alone (Earman
1978, 199).

It is true that once $T$ has generated new observational predictions that result in the
formulation of a new $T'$ at some later time, the theory can be dispensed with in favor of
its Craig (or Ramsey) equivalent, which entails all of the empirical regularities of $T$.
However, if the theory can generate new observational predictions that its observable
counterpart would not, then there may be an important continuity between $T$ and $T'$
(they are the same theory, in an important sense) that does not hold between $T_B$ and $T'_B$.
That is to say, while the downward dilemma might be sound, the upward may not be.

2.2. Los Alamos example


How, then, is it possible for the theory to generate such new observational
predictions? Earman does not give a detailed analysis of how this is possible (that is,
how the upward formulation of the dilemma can be resolved), but he does offer a
suggestive example from Putnam (1963) which points toward an answer to the problem
(Earman 1992, 79-82). Putnam asks us to imagine that we are scientists at Los Alamos
preparing for the first test of the atomic bomb. We have a hypothesis, couched in
theoretical terms, about what will result when two subcritical masses of uranium-235 are
slammed together:
H: when these two subcritical masses of U235 are slammed together to form a
supercritical mass, there will be an atomic explosion.
Note that it is possible to formulate a counterpart of H in purely observational
terms:
H': when these two rocks are slammed together, there will be a big bang.
In fact, and fortunately for the scientists at Los Alamos, the atomic explosion which
resulted was not a surprise. Scientists predicted H and seemed to be justified in that
prediction. The question is whether the scientists could have made this prediction and
have been justified in it without the use of a theory (here, atomic theory, T); in other
words, would scientists have assigned a high probability to H and been rational in doing
so unless they had used T?
Earman and Putnam argue that they would not have been. Consider the
observational evidence, E, that scientists had prior to observing the first atomic
explosion. If E is couched in observational terms and is not interpreted via T, then it does
not seem that E supports belief in H; up until that point, there had been no observed
cases in which two rocks were slammed together resulting in a big bang and many
observed cases in which two rocks were slammed together without that result. Thus, if
scientists were justified in believing H on the basis of E, it must be because the theory
that they used to interpret those prior observations played some essential role. In this
case, the prediction that a big bang would occur is entailed by a theory T, which is
confirmed by past observations E, and which makes E inductively relevant to H.
Earman gives a Bayesian account of this process:
You were in possession of a theory T of the atomic nucleus that entails H.
Applying the principle of total probability to the total available observational
evidence E gives
Pr(H | E) = Pr(T | E) + Pr(H | ~T & E) x Pr(~T | E)
Thus, if your opinions conformed to the probability calculus, your confidence in
H should have been at least as great as your confidence in T. And E made you
somewhat confident of T (because, for example, T entails other experimental
regularities whose positive instances are recorded by E). Further, ~T includes
other theories that also entail H or make H highly probable, and E made you
somewhat confident of those theories. The upshot was that you were more
confident of H (Earman 1992, 80).
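The arithmetic behind Earman's point can be checked with a short sketch. It uses the law of total probability, Pr(H | E) = Pr(H | T & E) x Pr(T | E) + Pr(H | ~T & E) x Pr(~T | E), and sets Pr(H | T & E) to 1 because T entails H, which yields the form quoted above. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def prob_h_given_e(p_t_given_e: float, p_h_given_not_t_and_e: float,
                   p_h_given_t_and_e: float = 1.0) -> float:
    """Law of total probability for H, conditional on the evidence E.

    The default p_h_given_t_and_e = 1.0 encodes the assumption that T entails H.
    """
    for value in (p_t_given_e, p_h_given_not_t_and_e, p_h_given_t_and_e):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return (p_h_given_t_and_e * p_t_given_e
            + p_h_given_not_t_and_e * (1.0 - p_t_given_e))


if __name__ == "__main__":
    # Hypothetical credences: Pr(T | E) = 0.5 and Pr(H | ~T & E) = 0.25.
    print(prob_h_given_e(0.5, 0.25))
    # Even if H were impossible without T, Pr(H | E) would not fall below Pr(T | E).
    print(prob_h_given_e(0.5, 0.0))
```

Because the second term is never negative, confidence in H is bounded below by confidence in T, and any probability that the rival theories in ~T confer on H only adds to it.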

2.3 Are inductive connections enough?
However, a puzzle still lingers about this case. In Chapter 4, I discussed a similar
proposal from Hempel, according to which theories are indispensable because they
establish inductive connections among observable phenomena. His example was the
postulation of a chemical kind, white phosphorus, to establish inductive connections
among the various properties of white phosphorus, such as being garlic-odored, soluble
in turpentine, causing skin burns, and bursting into flames at 30°C. There, I argued that
Hempel failed to show that theories were necessary because an altered version of the
theoretician's dilemma can be constructed for the inductive case as well.
The downward inductive theoretician's dilemma states that the role of a theory T
is to establish inductive connections (that is, probabilistic dependencies) among
observables. In order for T to establish a probabilistic dependence between O1 and O2, the
argument goes, the theory must entail a statement of the probabilistic dependence
between O1 and O2 which could then replace the theory with no loss in predictive
function. I have argued that this downward formulation does not show that theories are
unnecessary109.
The upward inductive theoretician's dilemma states that in order to know that
one's theory entails a probabilistic relationship among observable states, one must have
already learned of this relationship from the observable level. However, the argument
goes, this prior observed regularity suffices for all of the predictions of the theory, and
therefore, the theory is unnecessary. While I believe that this upward formulation of the
dilemma is unsound, I argued that Hempel's own analysis failed to show why it was
unsound. The challenge that remains is to specify how theoretical-level generalizations
can be learned and utilized such that these generalizations make predictions that go
beyond those that have been previously given at the observational-level.

109

Thus, I think Earman is incorrect in claiming that T might be said to be essential to establishing
inductive connections among observables if there are observation sentences O1 and O2 such that
Pr(O2|T&O1) > Pr(O2|T), or more interestingly, if Pr(O2|T&O1) > Pr(O2|OT&O1), where OT is a sentence
logically equivalent to the set of observational consequences of T (1992, 81). If the downward dilemma is
right, then this latter condition will never be satisfied. However, this does not show that T was inessential;
possibly, OT would not be known without T.

In the Los Alamos case, by hypothesis, there was no prior observed positive
dependence between rocks of such a size being smashed together and atomic explosions.
In other words, if the scientists at Los Alamos were justified in making their prediction, it
is not the case that in order for a subject to possess, learn, and/or utilize a theory that
generates the prediction that observable states O1 and O2 will be correlated, she must
already have observational-level knowledge that O1 and O2 are correlated. The remaining
question, then, is how the atomic theory achieved this feat.
While an in-depth investigation of the particular history of how atomic theory was
used to develop the atomic bomb is not possible here, the basic picture is as follows.[110] At
Los Alamos, the scientists' evidence, E, included past observations that bombarding bits
of uranium-235 with particles (O1) resulted in those bits emitting energy (O2).
Observations were also made about how much bombardment was necessary to achieve
this result, at what distance, and so on.

[110] For a comprehensive history of the subject, see Rhodes (2012).

This phenomenon was (correctly) explained theoretically by positing that
bombarding the uranium with neutrons caused uranium atoms to absorb neutrons, which
in turn caused them to become unstable and to split (fission). The theory, among other
things, encoded a precise relation between the levels of bombardment (O1) necessary to
release energy and radiation in lumps of uranium of varying size, shape, density, distance
from the source of bombardment, and so on (O2). Scientists could then extrapolate this
regularity to lumps of uranium of sizes, shapes, etc. beyond those that had already been
observed.
Then, a theoretical explanation was also given for the energy and radiation
released at O2. It was hypothesized (correctly) that when a uranium atom split, it released
one to three neutrons. This last step permitted the crucial induction that the particles
released at O2 would obey the same principles as the initial particles used to start the
reaction, O1. Thus, given the right configuration of uranium, a chain reaction could occur,
with the neutrons emitted by a splitting uranium atom causing nearby uranium atoms to
split as well. They had not previously observed this chain reaction occurring because they
had only observed subcritical masses of uranium; that is, the material was not of the right
size, density or shape for the released neutrons to interact with nearby atoms before
dispersing from the material. However, by extrapolating the function that the theory
encoded as obtaining between past observations, O1 and O2, they were able to predict the
size, density, and shape of uranium necessary to create a sustained nuclear chain reaction.
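
The contrast between subcritical and supercritical masses can be pictured with a deliberately crude toy model. This is a sketch under strong assumptions, not a piece of reactor physics: the multiplication factor `k` is a hypothetical stand-in (not a quantity named in the text) for the average number of released neutrons that split another atom before escaping the lump, and it is held constant across generations.

```python
def neutron_generations(k, start=1.0, generations=10):
    """Return neutron counts per generation under a constant multiplication factor k.

    k is a hypothetical parameter: the average number of neutrons from one
    fission that go on to cause another fission before leaving the material.
    """
    counts = [start]
    for _ in range(generations):
        counts.append(counts[-1] * k)
    return counts

# Only cases like the first were observed; the second is an extrapolation
# of the same rule to a configuration that was never observed.
subcritical = neutron_generations(k=0.5)    # reaction dies out
supercritical = neutron_generations(k=2.0)  # reaction grows each generation

print(subcritical[-1], supercritical[-1])
```

The same rule covers both regimes, which is the sense in which observations of fizzling reactions can bear on a prediction about a sustained one.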
Thus, atomic theory played two roles in promoting the prediction that two lumps
of uranium-235, when smashed together, would result in an atomic explosion (H). First,
past observations of subcritical masses alone were not inductively relevant to the
prediction about what would happen at supercritical masses unless they were
reinterpreted as instantiating the abstract structural regularity that encompassed both.
Second, though the scientists' observations of the causes and effects of particle
bombardment were probably quite different, the theory represented each as instances of
the same phenomenon, thus establishing that the functional properties discovered of the
former would be true of the latter as well.

2.4 Generalizing the lesson


The Los Alamos example suggests a more general lesson regarding how theories
can establish genuinely new inductive connections. The key is that observed empirical
regularities are reinterpreted as instantiations of more abstract theoretical relations, where
these theoretical relations have extensions which go beyond the particular observable
instantiations from which they were learned. Thus, by extrapolating this more general
relation, a theory-user can make predictions regarding connections among states that she
has not yet observed.
The Los Alamos case also suggests two ways that the extension of a theoretical
relation can be wider than that of the empirical regularity from which it was learned. In a
case of within-domain extrapolation, an empirical regularity within a domain is
redescribed in terms of a theoretical relation which is then extrapolated to unobserved
cases of that same domain. For example, the function of, say, a monotonically decreasing
relationship between the density of observed lumps of uranium and the bombardment
level necessary to create a reaction in them, can be extrapolated to densities that are much
higher or lower than those observed.
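
Within-domain extrapolation can be sketched by contrasting a learner who stores only observed pairings with one who represents them as instances of a function. The numbers below are invented, and the inverse form `required = c / density` is an assumption chosen only because it is one simple monotonically decreasing relation; nothing here is meant as the actual physics.

```python
# Hypothetical observations: density -> bombardment level needed for a reaction.
observed = {1.0: 12.0, 2.0: 6.0, 3.0: 4.0}

def lookup_prediction(density):
    """Observation-level learner: answers only for densities already seen."""
    return observed.get(density)  # None when the case was never observed

def fit_inverse_law(data):
    """Theory-level learner: estimate c in required = c / density."""
    estimates = [density * level for density, level in data.items()]
    return sum(estimates) / len(estimates)

c = fit_inverse_law(observed)

def theory_prediction(density):
    """Extrapolate the fitted relation to any positive density."""
    return c / density

print(lookup_prediction(6.0), theory_prediction(6.0))
```

The two learners agree on every observed density and come apart only on unobserved ones, where the lookup learner has nothing to say.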
In a case of cross-domain extrapolation, empirical regularities in domains A and B
are redescribed in terms of the same theoretical relation, and it is induced that what is true
of A is also true of B.[111] For example, if the particles bombarding a lump of uranium are
seen as instances of the same theoretical regularity as the particles emitted, observed
properties of the former can be extrapolated to hold of the latter.

[111] This type of reasoning is central to Gentner's (2010) structure-mapping account.

The white phosphorus case, at least as developed by Hempel (1963), fails to
demonstrate how the introduction of an unobserved common cause permits the theory to
make predictions beyond those that are known from the observable level alone. I think it
is at least a plausible conjecture that such theoretical posits will only be epistemically
necessary when the posited variable has structure, such that it has entailments for
observable states beyond just the common effects from which it was inferred.[112]

[112] Although Hempel's discussion doesn't bring it out, in fact, the chemical causes that have been posited to
explain the observable properties of materials have more structure than almost any other scientific theory in
common usage, as even a brief examination of the periodic table shows. One can predict a novel element's
place in the periodic table by its observable properties, and the periodic table predicts further observable
properties of an element as a function of its position in the table (denoting the number of shells in the atom
and number of electrons in the outer shell).

2.5 Testing for this Function

The expansive role of theories under consideration draws on the fact that the
extension of a theoretical regularity can be wider than that of the observable regularity
from which the theory was learned, and we can exploit this fact to design
triangulation-style experiments. The basic framework is as follows. Consider two candidates for the
subject's intervening representation, a theory, T, and an observable generalization, OG. T
and OG will have overlapping extensions, so they could each explain predictions that fall
under both; this is the point that motivates the logical problem. We can probe for the
content of the intervening belief by testing for whether the subject will extend that
learned contingency to predictions that fall within the wider extension of T but not OG. If
she does, this is evidence that she had represented the contingency as holding between T
and the outcome of interest rather than OG (and vice versa).
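
The logic of choosing probe cases can be put in set-theoretic terms. The sketch below is a schematic illustration only: the item labels are placeholders, and the function `favored_hypothesis` is a hypothetical, deliberately crude scoring rule that reports which candidate the behavior favors, not which one the subject must have used.

```python
# Hypothetical test items, grouped by the candidate representation
# under which the learned contingency would apply to them.
extension_OG = {"trained_case_1", "trained_case_2"}
extension_T = {"trained_case_1", "trained_case_2", "novel_case_1", "novel_case_2"}

def probe_items(ext_T, ext_OG):
    """Items where T and OG come apart: inside T's extension, outside OG's."""
    return ext_T - ext_OG

def favored_hypothesis(generalized_to, ext_T, ext_OG):
    """Crude reading of which items the subject extended the contingency to."""
    probes = probe_items(ext_T, ext_OG)
    if not probes:
        return "unclear"      # the two candidates make the same predictions
    if probes <= generalized_to:
        return "T"            # extended to every diagnostic item
    if not (probes & generalized_to):
        return "OG"           # extended to none of them
    return "unclear"          # mixed behavior

print(probe_items(extension_T, extension_OG))
```

Items inside both extensions are uninformative by construction; only the items returned by `probe_items` can discriminate between the two candidates.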
However, extending predictions to cases that fall outside the extension of OG does
not entail that the subject had done so via T, for the logical problem and downward
theoretician's dilemma show that there would still have been some observable regularity
that would have sufficed for that prediction. Fortunately, the triangulation approach has
the resources to deal with this possibility.
The first, as discussed in Chapter 5, is to assume that the strength of a learned
contingency will be proportional to the number of confirming instances that the subject
observed in the test phase. Thus, we expect that a theory user will achieve success more
quickly on the test condition with more training, while a subject who needs to learn a new
observable regularity to succeed in the test condition is less likely to show improvement
with longer training (because by its lights, observations in the training phase are not
relevant to the novel prediction).
Second, the triangulation approach is iterative, and we can use the second,
limiting role of theories to complement the initial test. While an in-depth discussion of
this function will have to wait until Section 4, a brief characterization is possible here.
Suppose that we hypothesize that a subject used a separate observable regularity to
succeed in the test phase. Once we develop a specific hypothesis about the content of this
regularity, we can then iterate the triangulation process to determine which other
predictions the subject will extend it to. Sometimes, this candidate regularity will extend
to cases outside of the extension of the original candidate theoretical variable. Thus, if the
subject extends her learned regularity to those predictions as well, this is further evidence
that she was not using the theory.

3. An expansive role of theories: the case of numeral theory


3.1 Associative vs. numeral theory
While some expansive theories may instantiate just one of within- or cross-domain
extrapolation, theories that pair the two, such as the atomic theory in the Los
Alamos case, enable uniquely powerful predictive abilities, as the case study in this
section will illustrate.
Humans and number-trained apes can both learn to associate numerals with sets
of objects of particular sizes and use these associations to guide their behaviors. For
example, both a typical fifth grader and Ai, the most extensively number-trained chimp in
the world, can identify numerals from 1 to 9, place them in correct ascending numerical
order, and can retrieve the correct number of objects when prompted with a numeral
(Matsuzawa 1985). We might then wonder whether the human child and the chimpanzee
use the same representations of numbers in the production of these behaviors.
More specifically, suppose that you show a child and Ai the numeral "4" and ask
them to bring you a set of marbles corresponding to that number (Wynn's "Give A
Number" task).[113] The child and Ai both bring you four marbles. Suppose that both
subjects know numbers one through three and have made extensive observations of a
mapping between "4" and sets of four objects. Consider two hypotheses for the types of
representations that subjects used to perform this behavior. First, perhaps they
represented a learned relation between the observable properties of the numeral and
observable properties of the corresponding set ("4" → a set of four objects). Alternatively, we might
hypothesize that the subjects had a theory of numbers that goes beyond that mere
empirical regularity.

[113] This test, developed by Wynn (1990, 1992), has been used extensively with children.

As I have argued, in order to test this latter hypothesis, it is necessary to be more
specific about the particular theoretical representation of numbers that might be at work.
One influential characterization of human numeral theory is provided by Gelman and
Gallistel (1978) in their study of counting behavior in children. They argue that
conceptual competence in counting and other early numerical ability requires:
Stable order principle: numerals always occupy the same position in the numeral
list ("4" always follows "3" and precedes "5").
Cardinality: the last numeral symbol reached in a count corresponds to the
number of items in the set.
One-to-one correspondence: every numeral corresponds to a unique set size and
vice versa.
Indifference of object order: objects in a set may be counted in any order.
Abstraction: sets comprised of different kinds of objects can be counted.
In her thorough examination of the development of numeral competency in children,
Carey (2009) argues that the first three conditions together entail the following property
that can be used to characterize a theory of numbers:
Successor function: For any symbol in the numeral list that represents cardinal
value n, the next symbol on the list represents cardinal value n+1.
The theoretical hypothesis, then, is that the subject reasoned that "4" corresponds to a set
of four objects by reasoning about the relations between "4" and other numerals and between
sets of four objects and sets of other sizes. For example, the subject might have started with
zero marbles and added a single marble, counting "1," and so on until the correct cardinal
value was reached.
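
The two hypotheses can be rendered as two procedures for the "Give A Number" task. This is a minimal sketch: the function names are hypothetical, the associative learner is modeled as a bare lookup table, and the theory user is modeled as stepping through a memorized numeral list while adding one object per step.

```python
# Memorized count sequence (order only; no set sizes attached).
NUMERAL_LIST = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

# Associative learner: numeral -> set size, one entry per trained pairing.
associations = {"1": 1, "2": 2, "3": 3, "4": 4}

def give_by_association(numeral):
    """Return the set size paired with the numeral, or None if never trained."""
    return associations.get(numeral)

def give_by_successor(numeral):
    """Start from an empty set and add one object per step through the list."""
    marbles = 0
    for symbol in NUMERAL_LIST:
        marbles += 1              # successor step: cardinal value n -> n + 1
        if symbol == numeral:
            return marbles
    return None                   # numeral is not on the known list

print(give_by_association("4"), give_by_successor("4"))
```

Both procedures return four marbles for "4", so success on a trained numeral cannot by itself tell them apart; they differ only on numerals whose pairing was never trained.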
How could we tell whether the child or Ai the chimp had used a direct association
between the observable properties of numerals and set sizes or a more complicated
representation of numerals and set sizes as instantiations of the successor function?

3.2 Testing for Number Theory


Fortunately, in the case of number theory in children, careful and extensive
experiments utilizing the triangulation approach have been performed to tease apart
these two hypotheses. Briefly, the picture of human number development that has
emerged is as follows.[114]

[114] For a much more extensive discussion and bibliography, see Carey (2009).

By the age of three, many children can recite the numeral line in the correct order
up to 10. Despite this, their performance on the "Give A Number" task demonstrates a
surprising lack of conceptual competence with numbers. Children begin as "subset
knowers" who can associate particular set sizes with particular numerals (Carey 2009,
298). For example, between 24 and 30 months, children learn to map "1" to single
objects. They can pass Wynn's "Give 1" task, retrieving a single object if asked to give
"1." If asked to give a number other than "1," they will bring more than one object, but
their choice is at random; when asked to bring "2," they are equally likely to retrieve two,
three, or four objects.
A child who knows the numeral line and the successor function (that for any
symbol in the numeral list that represents cardinal value n, the next symbol on the list
represents cardinal value n+1) should be able to reason that since "1" corresponds to a
set of one object, she can create a set corresponding to "2" by adding exactly one object to the set
corresponding to "1." That children at this stage fail to do so suggests that they do not utilize the number
theory in performance of the "Give A Number" task. This stage persists for six months to
a year until children become "two knowers" who can discriminate "1" and "2" from each
other and from other numerals but perform at chance for all other numerals. Likewise,
they then become "three knowers" and remain so for some months. This is consistent
with the hypothesis that through extensive experience, children had learned new
associations for each numeral mapping through "3."
However, this pattern changes dramatically after children have become "three
knowers" (or occasionally, "four knowers"). To learn "5," children no longer need
extensive experience of the contingency between "5" and sets of five objects and
spontaneously match higher numerals to the appropriate set sizes. This new ability is
manifested in success at the "Give A Number" task as far as the child's knowledge of the
numeral line extends.
Hence, around the age of 3, there is evidence that children switch from
association-based reasoning to theory-based reasoning about numbers. Following Earman
and Putnam's Los Alamos example, we can make this clearer by examining a child's
prediction about a novel mapping, say, the mapping between "5" and five objects.
Imagine a child who has learned the numeral line until 10 and knows the mappings
between "1" and one object, and so on up until "4." Suppose that she has not observed
any pairings of the numeral "5" with sets of five objects.[115]

[115] If it seems implausible that a child would not have observed this relevant contingency, we can recast the
case in terms of a pairing for which it is plausible, such as the pairing of "19" and sets of nineteen objects.

Now, imagine that the child is asked to "Give Me 5" and correctly selects five
marbles. Call her prediction that "5" would correspond to a set of exactly five objects, H.
Is this a prediction that the child would make without a theory? Consider the
observational evidence, E, that she has prior to her first observed pairing of "5" and five
objects. She has extensive observations of the pairings of "1" to one, "2" to two, "3" to
three, and "4" to four. If E is couched in observational terms and is not interpreted in
terms of a number theory, then it does not seem that E supports the prediction that H. The
learned contingencies between the observable properties of prior numerals and
corresponding set sizes give the child no information about the set size that will
correspond to "5".[116]

[116] In the Arabic system, there is no perceptual dimension along which numerals are related to set sizes (as
there would be if, e.g., numerals with increasing numbers of lines corresponded to increasing set sizes).

Thus, if the child predicted H on the basis of E, it must be because the theory that
she used to interpret those prior observations of numerals "1" through "4" played some
essential predictive role. Indeed, if the child's prior observations are interpreted as
instantiations of the successor function, then they are inductively relevant to the
prediction that H. Observed pairings of numerals until "4" with their corresponding set
sizes instantiate and thus confirm that numbers obey the successor function. Then, this
function can be extrapolated to numerals (whose positions on the numeral line are known)
for which the agent has not yet observed the relevant empirical regularity that would
suffice for the same predictions as the theory. In this case, the child could extrapolate the
function, reasoning that if "4" in the numeral list represents cardinal value 4, the
next symbol on the list represents cardinal value 4 + 1.

A child's success at retrieving exactly five marbles in the "Give Me 5" task does
not entail that she had used a number theory; there will always be some observable
regularity that would have sufficed for the prediction that "5" maps to sets of five. By
hypothesis, this is not a regularity that the child has from the observable level alone.
However, in real cases, it will often be very difficult to determine whether this is the case.
Let us suppose, then, that we are not sure whether the child has observed a contingency
between "5" and five. I have sketched two strategies for using the triangulation method to
evaluate this hypothesis.
First, note that children take nearly as long (months in each case) to learn the
mapping between "3" and three as they did to learn the mappings between "2" and two
and "1" and one. However, "5" and higher numbers are learned rapidly. This severe
discontinuity in the amount of experience necessary to learn the right mappings would be
very surprising if the process by which each was learned were the same.
Another alternative hypothesis is that the child used an altogether different
observable regularity to correctly choose five marbles. For example, she might have
reasoned as follows: "If a numeral does not look like one that you already know matches
to n objects, then it corresponds to a set of some size other than n." Thus, she could have
categorized "5" as a numeral for which she does not know the mapping, predicted that the
correct choice would not be one, two, three, or four marbles, and, on this basis,
successfully selected five marbles.
We can exploit the second role that a theory of numbers plays (its limiting role)
to run an iteration of the triangulation experiment, pitting the number theory against this
new candidate observable generalization. Notice that a subject using the successor
function will predict that "5" maps to exactly five objects. However, the "pick any size
set other than one I already know the numeral for" regularity applies not just to five but
to any other number greater than four as well. Thus, we expect that a subject using this
regularity will extend "5" to sets of six or more marbles. Experimenters have used this
very protocol to determine that subset knowers interpret new numerals in precisely this
way ("5" means anything but 1-4) whereas children with a theory of numbers are exact
in their predictions.
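
The second iteration of the test can be sketched by listing the set sizes that each candidate permits for a novel numeral. This is an illustrative sketch with hypothetical names and an arbitrary range of sets on offer (one to eight marbles); it is not a description of the actual experimental protocol.

```python
KNOWN = {"1": 1, "2": 2, "3": 3, "4": 4}        # mappings the subject already has
NUMERAL_LIST = ["1", "2", "3", "4", "5", "6"]   # memorized numeral order
SET_SIZES_ON_OFFER = range(1, 9)                # hypothetical sets of 1 to 8 marbles

def successor_predictions(numeral):
    """Number theory: the numeral's position in the list fixes exactly one size."""
    return {NUMERAL_LIST.index(numeral) + 1}

def anything_else_predictions(numeral):
    """Rival regularity: any size not already paired with a known numeral.

    The answer does not depend on which novel numeral is asked about.
    """
    return {size for size in SET_SIZES_ON_OFFER if size not in KNOWN.values()}

def diagnostic_choices(numeral):
    """Choices that only the rival regularity permits."""
    return anything_else_predictions(numeral) - successor_predictions(numeral)

print(successor_predictions("5"), diagnostic_choices("5"))
```

A subject who sometimes hands over six, seven, or eight marbles for "5" is behaving as the rival regularity predicts; a subject who reliably hands over exactly five is behaving as the successor function predicts.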

3.3 Do apes have a theory of numbers?


I started this section by noting the fascinating number competence of Ai, a
chimpanzee who has learned the Arabic numerals through 9, can place randomly
presented numerals in ascending order, and can label sets of objects with the correct
numeral (Matsuzawa 1985, 2009). Ai's feats have since been replicated with other
number-trained chimps, including Ai's own son, Ayumu.
Certainly, the performance of these chimps is impressive. On some number tasks,
particularly memorizing sequences of numbers, they outperform even human adults.
However, using protocols similar to those discussed above, Matsuzawa has
discovered significant differences in the number cognition of chimps and human children,
consistent with a lack of theoretical ability in the former.
Unlike children, Ai never exhibited a leap from being a subset learner to a theory
user. While all normally developing children spontaneously generalize to new numerals
with no additional training after learning the correct mapping of "3" (or, for some
children, "4"), Ai required the same number of trials to learn every new numeral.
Additionally, when Ai knows a numeral n, she treats numeral n+1 as referring to any set
size that she has not yet mapped to a numeral; when Ai knew "5" and was asked to select
sets of 6 items, she chose randomly among sets larger than five (Matsuzawa 2009,
94).[117]

[117] Matsuzawa also observed other differences in his chimps' number behavior. For instance, they seem to
have trouble generating one-to-one matchings between different sets of objects, and the time that is
required for Ai to judge the numerosity of different sets of objects strongly suggests that she does not count
objects one by one (ibid.).

Matsuzawa concludes that the number concept in terms of symbolic
representation is uniquely human, and comparable levels of abstraction have yet to be
demonstrated in the rest of the animal kingdom - even in our closest evolutionary
neighbor, the chimpanzee (ibid., 97). Likewise, Carey speculates that Ai's failure is
suggestive of a deeper discontinuity, in that nonhuman animals lack a crucial tool for
combining previously distinct mental representations: They cannot use lexical identity as
a clue that previously unrelated representations actually capture the same aspects of
reality (2009, 332). According to the theory-theory of human uniqueness, this failure
with respect to lexical identity is due to an even deeper inability to match perceptually
distinct relations at all.

3.4 Numbers and Extrapolation

While the use of the successor function to make predictions about new set sizes is
clearly an instance of within-domain extrapolation, the child's development of a theory of
numbers also requires cross-domain extrapolation. Notice that unless the child has
already learned to recite the numeral line, she cannot predict what size set corresponds to
"5." To predict that H, the child must use information about the relative position of "5"
with respect to numerals she does know. This relational information about the numeral
line is used to predict a similar relation among set sizes.
In particular, the child must recognize that states of the numeral line instantiate
the same abstract relation as do states of set size, namely the successor function.
According to the definition I have been using, this is something that only a theorizer can
do. The reason is that a transition in states of the numeral line does not bear any
perceptual similarity to a transition in states of set size. These are only the same with
respect to some higher-order, abstract relation that they both instantiate, and recognizing
this common relation allows information from the first domain to be transmitted to the
second. This process is given central importance in Carey's bootstrapping account of
number learning, which she characterizes as follows:
Children learn the count sequence as a meaningless routine. They note the identity
of the words "one," "two," "three," and "four" which now have numerical
meaning, and the first words in the otherwise meaningless counting list... At this
point, the stage is set for the crucial induction. The child must notice an analogy
between next in the numeral list and next in the series of models related by adding
an individual... This analogy licenses the crucial induction: if "x" is followed by
"y" in the counting sequence, adding an individual to a set with cardinal value x
results in a set with cardinal value y (Carey 2009, 327).
Carey argues that similar cases of linguistic bootstrapping are crucial for learning
theories more generally. On her view, linguistic devices (like the numeral line) are pieces
of cognitive technology designed to have structure that matches the structure of the target
phenomenon in the world. If she is right, then matching lexical relations to relations in
the world is a particularly powerful way that theorizers can develop new, structurally rich
theoretical representations allowing them to extend their knowledge beyond previously
observed regularities. In other words, it is one way in which the upward theoretician's
dilemma can be unsound.

4. A limiting role of theories
4.1 Sellars's competing picture of theories
The second historical response to the theoretician's dilemma that I will examine is
offered by Wilfrid Sellars in a series of papers (1963a, 1963b, 1965, 1977). He is
concerned with the conclusion, drawn from the theoretician's dilemma, that theories are
in principle otiose, that a theory might be known on general grounds to be the sort of
thing which, in the very process of being perfected, generates a substitute which, in the
limiting case of perfection, would serve all the scientific purposes which the perfected
theory would serve (Sellars 1963b, 118).
As I have noted, one way to respond to Hempel's theoretician's dilemma is to
accept his assumptions about the nature and purpose of theories and to show that the
conclusion that theories are superfluous does not follow from those premises. To some
extent, this strategy is adopted by Putnam and Earman in their identification of the
expansive role of theories. Sellars adopts the alternative strategy and rejects one of the
assumptions of the argument.
Sellars begins his defense of the utility of theories by arguing that Hempel's
layer-cake model of theories is mistaken. Recall that on this model, theoretical
generalizations are general laws that have the function of establishing systematic
connections among empirical facts in such a way that with their help some empirical
occurrences may be inferred, by way of explanation, prediction, or postdiction, from
other such occurrences (Hempel 1965, 177). Thus, there are three layers to the
theoretical cake.

At the top, there are the general (deductive or probabilistic) laws couched in the
theoretical language. In the middle, there are empirical generalizations couched in the
observation language which specify the (deductive or probabilistic) connections among
types of observable states. These top two layers are connected by interpretive sentences that
connect terms of the theoretical language to terms of the observation language (with the
precise specification of these interpretive sentences left somewhat open). The bottommost
layer states particular matters of fact couched in the observation language, which are
predicted from the empirical generalizations of the middle layer.
Following Sellars, we can illustrate this theoretical structure via the relationship
between the kinetic theory of gases and Boyle's law, an empirical generalization which
states that (holding temperature fixed) the volume and pressure of a gas will be inversely
related.[118] Boyle's law can be used to make predictions about particular matters of fact,
for instance, how the pressure of a particular amount of gas held at 100° will change if its
volume is changed from 1 liter to 2 liters.

[118] It might be objected that temperature, pressure, volume, and fixed amount of a gas are themselves
theoretical concepts. To this, I offer two responses. First, while it is true that the modern concept of
temperature, for example, is shot through with theoretical content, it is possible to consider a more
rudimentary notion of temperature, akin to the one Boyle himself used, on which temperature is an
observational primitive (see Carey 2009 for a brief history of the concept of temperature). Second, if that
response is not satisfactory, one can merely operationalize these concepts in terms of instrument readings
or the like.

According to Hempel's picture, the laws of the kinetic theory (namely, those
describing the average kinetic energy of molecules making up a gas and the force that
they exert on their container) explain Boyle's law by entailing it. However, the
downward theoretician's dilemma bites here: once the theoretical laws have generated
Boyle's law, they can be dispensed with, for that observational-level empirical
generalization suffices for all predictions about matters of fact. Thus, the argument goes,
the kinetic theory of gases is ultimately superfluous.
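
The kind of prediction at issue can be computed from Boyle's law alone, with no mention of molecules. The sketch below assumes the law in its usual product form (initial pressure times initial volume equals final pressure times final volume, at fixed temperature and amount of gas); the starting pressure of 2.0 in arbitrary units is an invented value for illustration.

```python
def boyle_pressure(p_initial, v_initial, v_final):
    """Pressure after a volume change at fixed temperature and amount of gas.

    Uses Boyle's law in the form p_initial * v_initial = p_final * v_final.
    """
    if v_final <= 0:
        raise ValueError("volume must be positive")
    return p_initial * v_initial / v_final

# Hypothetical starting pressure of 2.0 (arbitrary units); volume goes from 1 L to 2 L.
p_final = boyle_pressure(p_initial=2.0, v_initial=1.0, v_final=2.0)

print(p_final)
```

Doubling the volume halves the pressure, and the calculation draws only on the observation-level generalization, which is what gives the dilemma its force in this case.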
Even more troubling, the upward theoretician's dilemma also seems to bite. As a
matter of fact, the relevant empirical generalization, sufficient for predicting how
pressure and volume will be related throughout a wide range of values, was discovered
and well-confirmed centuries before the theoretical laws that explain that empirical
regularity were discovered. Further, perhaps it was necessary for scientists to have first
established the robust empirical generalization embodied by Boyle's law before the
kinetic theory of gases could be developed. If this is true, then scientists were already in
possession of the very empirical generalization that would render their theoretical
generalizations otiose.
Sellars argues that if the purpose of the kinetic theory of gases were to generate
new observable regularities, then indeed, the theory would have played no indispensable
role in this particular case[119] (as the actual historical ordering demonstrates). However,
we do think that the development of the kinetic theory of gases was a worthwhile
scientific project that yielded some benefit, over and above Boyle's law. Thus, the theory
must serve some other purpose.

[119] This is not to say that Sellars believes that theories could never generate new observable regularities. His
argument is that Hempel's picture cannot make sense of cases, like this one, in which the theory by
hypothesis does not do so.

According to Sellars, Hempel's picture of theories, on which theoretical
generalizations explain (by entailing) empirical regularities which in turn explain
particular matters of empirical fact, is fundamentally misguided. Instead, Sellars argues:
Theories about observable things do not explain empirical laws in the manner
described, they explain empirical laws by explaining why observable things obey
to the extent that they do, these empirical laws; that is, they explain why
individual objects of various kinds and in various circumstances in the
observation framework behave in those ways in which it has been inductively
established that they do behave. Roughly, it is because a gas is, in some sense of
"is," a cloud of molecules which are behaving in certain theoretically defined
ways, that it obeys the empirical Boyle-Charles law (1963b, 121).
To illustrate, consider schematic pictures of Hempel's and Sellars's conceptions of
theories.
[Figure 6.19 - Hempel's layer-cake depiction of theories. Labels, from top to bottom: T-language generalizations; interpretive sentences; O-language empirical regularities; inductive confirmation / explanation; O-language matters of fact.]

[Figure 6.20 - Sellars's revised depiction of theories. Labels: O-language empirical regularities (top); inductive confirmation / explanation; T-language generalizations (labeled "Explain"); O-language matters of fact (bottom).]

On the bottom layer of Sellars's picture of the kinetic theory of gases, there are
observations of the pressure and volume of particular gases and how they changed in
relation to one another. On the top is Boyle's law, which gives a general statement of how
pressure and volume are related for gases in general. On his view, the purpose of the
kinetic theory of gases is to explain why it is that observed particular matters of fact
display the robust, stable, and projectible regularity described by Boyle's law. While it is
clear that Sellars's picture differs from Hempel's, it is less clear what ramifications this
should have for explicating the indispensable role of theories. What does relocating
theories as intermediaries between the level of observable matters of fact and empirical
regularities purchase?
From Sellars's writing in "The Language of Theories" and elsewhere, however,
we can cobble together a picture of how theories, thus situated, can fulfill functions that
empirical regularities cannot. Roughly, his arguments can be divided into two types. The
first purports to show that in order for a theory to play an important role in science, it is
not necessary for there to exist a corresponding O-level empirical regularity; there may
be a theoretical difference without an observational difference (a "too few empirical
regularities" argument). The second purports to show that O-level empirical regularities
are not sufficient to replace the contributions of the theory because there might be
observational differences that do not make a theoretical difference (a "too many empirical
regularities" argument).

4.2 Too few empirical regularities


To illustrate how there might be a theoretical difference without a corresponding
empirical difference, Sellars asks us to consider a case in which two otherwise
observationally indistinguishable bits of gold are discovered to dissolve at different rates
in aqua regia. Because they are observationally similar, there are no regularities in the
O-language of the form "If the gold has property O1, it will dissolve at rate O3,
but if it has property O2, it will dissolve at rate O4" that capture this difference.
However, a theoretical distinction can be made to explain why different bits of gold
dissolve differently; one can posit that this difference is caused by "two structures of
microentities each of which corresponds to gold as an observational construct, but such
that pure samples of one dissolve, under given conditions of pressure, temperature, etc.,
at a different rate from samples of the other" (ibid., 122).
However, it is unclear what exactly the theoretical distinction is contributing in
this case. For one, if there are no observable markers to distinguish gold of one type from
another, then the theory will not allow you to predict whether a particular bit of gold will
dissolve at one rate rather than another. For Sellars, the virtue of such a distinction is that
it would explain (or at least begin to explain) a phenomenon for which no O-level
explanation is available (the most that can be said at the O-level is that lumps of gold
have some probability of dissolving at either O3 or O4). However, this raises the
question of why such an explanation is scientifically valuable if it does not render any
new or better predictions.
An answer to this question, suggested by van Fraassen (1980), paves the way for a
more promising lesson about this role of theories. He first points out that the
theoretical distinction between two types of microstructures of gold does have predictive
consequences at the observable level:
Suppose the two substances are A and B, with dissolving rates x and x+y and that
every gold sample is a mixture of these substances. Then it follows that every
gold sample dissolves at a rate no lower than x and no higher than x+y; and that
between these two any value may be found, to within the limits of accuracy of
gold mixing. None of this is implied by the data that different samples of gold
have dissolved at various rates between x and x+y (van Fraassen 1980, 33).
Additionally, while denying that there is any requirement that a theoretical explanation be
given for any O-level difference whatsoever, van Fraassen suggests that there may be an
instrumental methodological reason for doing so, for "by imagining a certain sort of
micro-structure for gold and other metals, say, we might arrive at a theory governing
many observationally disparate substances, and this might then have implications for
new, wider empirical regularities when such substances interact" (ibid., 34).
Uniting these two insights, we can draw two lessons that will be important for the
discussion to follow. First, even if a theory does not permit one to make specific
predictions about particular matters of fact, the theory may make predictions about more
abstract, higher-order structural properties of a domain that will have O-level
consequences for that domain. The theory permits within-domain structural inference.
Here, the theory states that the observed variability in dissolving rates was not accidental,
has a certain range of values, and obeys certain functional properties (say, that the
dissolving rate of a mixture of gold is the average of the dissolving rates of its
constituents).
Second, even if we suppose that the theory does not tell us much about gold in
particular (even if it doesn't expand the range of empirical regularities we know about
bits of gold, taken singly or as a group), theorizing about gold can have ramifications
for our theorizing about other domains. The theory permits cross-domain structural
inference. For example, suppose that one has posited different microstructures to explain
why gold regularly dissolves at different rates. Then, by analogy, one may infer that other
types of metals, say copper, will have different microstructural realizers with different
rates of dissolving.[120]
[120] This analogical step is not required, of course, from the details that Sellars has provided about the case.
It may be required by other theoretical commitments one has, for example, a belief that all metal kinds have
similar extents of microstructural heterogeneity. Then, if one has posited that gold has a certain amount of
microstructural heterogeneity, all other metals will too.
Short of explaining why bits of copper have different rates of dissolving, one might make inferences about
copper by placing the relevant hypothesis under the umbrella of a wider overhypothesis (Goodman 1955),
such as "All chemical kinds have variable rates of dissolving," such that what has been observed to be true

What might this inductive leap purchase you? Well, for one, it may lead you to
refrain from inferring on the basis of a single observation of one bit of copper that
dissolves at rate x that future bits of copper must also dissolve at rate x. Further, it may
cause you to revise your standards for metal kindhood; for instance, if you observe
two bits of metal that are otherwise similar but dissolve at different rates, your knowledge
about gold may lead you to hesitate to classify these as instances of different metal
kinds. In both of these cases, by hypothesis, there is no available empirical regularity that
would serve the purposes of the theoretical explanation.
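
The within-domain structural inference drawn from van Fraassen's mixture example in this section can be made concrete with a small numerical sketch. It assumes, following the parenthetical suggestion made earlier, that a mixture dissolves at the weighted average of its constituents' rates. The function name `mixture_rate` and the values chosen for x and y are illustrative assumptions, not part of Sellars's or van Fraassen's discussion.

```python
def mixture_rate(fraction_a: float, rate_a: float, rate_b: float) -> float:
    """Dissolving rate of a gold sample made of substances A and B.

    Assumption (hypothetical): the rate of a mixture is the weighted
    average of the rates of its constituents.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return fraction_a * rate_a + (1.0 - fraction_a) * rate_b


# Hypothetical values: substance A dissolves at rate x, substance B at x + y.
x, y = 2.0, 0.5
rates = [mixture_rate(k / 10, x, x + y) for k in range(11)]

# Whatever the mixing proportion, the rate stays within [x, x + y].
assert all(x <= r <= x + y for r in rates)
```

On these assumptions the theory fixes the range and functional form of the variation, even though it does not say which rate a given unanalyzed sample will show.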

4.3 Too many empirical regularities


Scientific inference is messy. Observations can contain errors, observation reports
may contain unnecessary details, and even our best theories in the special sciences
contain ceteris paribus laws. The problem is that there are many empirical regularities
that are consistent with our experience, and while some of these will be projectible,
picking out real regularities in the world (the signal), the vast majority will not be (the
noise). One of the central tasks of science, then, is to distinguish the signal from the
noise; according to Sellars (1963a), theories are essential in this project. Theories do their
work by picking out a privileged subset of the empirical regularities consistent with our
experience and explaining why these are the projectible, scientifically valuable ones.
For example, Sellars supposes that by introducing a theoretical distinction
between different microphysical structures of gold, one has committed oneself to the
belief in a projectible irregularity; i.e., samples of gold will exhibit reliable variation in
dissolving rates as long as they reliably are mixtures of different microstructures. On the
other hand, our microphysical theory can explain why some observed irregularities will
not be projectible; e.g., samples of gold observed on a Tuesday more frequently have
dissolving rates ending in odd decimals. This may be the case even if the empirical
regularities are equally well confirmed at the observational level.

[120, continued] of the overhypothesis is inferred to be true of the more specific hypothesis. On my liberal definition, this
would count as theoretical if the overhypothesis extends to perceptually distinct underhypotheses. For a
discussion of overhypotheses in relation to the theory-theory, see Kemp et al. (2007).

Sellars defends this role of theories in relation to the common-sense theory which
posits material objects to explain regularities in phenomenal experience. Consider the
phenomenalist who, emboldened by the theoretician's dilemma, claims that established
regularities among our phenomenal experiences alone suffice for the purposes of
prediction about future phenomenal experiences. If the inductive generalizations provided
by the language of external objects can be captured entirely by their sense-content
counterparts, then object-language could be dispensed with.
For example, consider the generalization, couched in the language of external
objects, that "All lightning is followed by thunder."[121] By analogy with Hempel's layer-cake
view of scientific theories, what the phenomenalist seeks is a set of regularities,
couched in the language of sense contents alone, which will capture the same inductive
regularities the theory of external objects establishes. In this case, such a generalization
will be something like, "Whenever I have sense experiences of a certain kind (S1), such
as the impression when I am looking through my window[122] and have sense content of
yellow jagged lines, I then have sense experiences of a certain kind (S2), of loud
rumbling noises." The question, then, is whether a phenomenal regularity of this kind
can establish the prediction that "Anyone having sense experiences of kind S1 would
subsequently have sense experiences of kind S2."

[121] This example is taken from Lange (2010).

[122] My attempted characterization of the sense contents corresponding to lightning itself refers to objects,
which is precisely the thing that the phenomenalist is trying to excise from her vocabulary. Therefore,
one might think that it is impossible to characterize sense-contents in a language divorced from that of
objects. For a similar objection and a possible response, see Sellars (1963a, 81).

The empirical regularity, S1 → S2, contains as its antecedent a set of features
shared by my sense experiences when, and only when, I was in the act of perceiving
lightning. S1 will contain sense-data corresponding to lightning (jagged-shaped yellow
lines against a dark background) but plenty of other regularities besides: the sense-data
resulting from my window frame and its panes of glass, the particular wallpaper in my
field of vision, the distinctive smell of my home, and so on. These regularities are
"essentially autobiographical," such that "even granting that there are inductively
warranted generalizations which permit the definition of phenomenally conditional sense
contents, the latter will be logically tied to the peculiarities of my environment in such a
way that they cannot be transferred to other things in other places" (ibid., 81).
In order to separate the intersubjective wheat from the "essentially
autobiographical" chaff, the phenomenalist must identify a subset of the sense-data
contained in S1 that is really projectibly linked to the following sensation as of thunder,
S2. However, Sellars argues that this could only be done via a theory of objects. It is
because my sense-data of lightning is actually caused by lightning that someone else's
sense-data of lightning will also be followed by thunder. My sense-data of the wallpaper
is irrelevant because it is caused by an object in my idiosyncratic environment.
We can expand Sellars's point with respect to phenomenalism to construct a
response to the theoretician's dilemma more generally. On this view, the role of theories
is to pick out the reliable, projectible empirical regularities from the perhaps infinite
number of regularities that are consistent with our past experience. They do this by
explaining why observable particulars obey empirical regularities. Sometimes, this
explanation will show why the fact that they do obey a regularity is a mere accident and
is not expected to transfer to novel situations. In other cases, a theory can explain away a
seeming violation of a well-established empirical generalization, showing why this does
not impugn its general reliability (these are the ceteris paribus conditions found in the
laws of special sciences). In some cases, a theoretical explanation will show that there is
a real nomological reason why observed particulars have exhibited a regularity which
licenses us to extend that regularity either within that same domain or to other analogous
ones.
In his summary of Sellars's view, Lange aptly describes how these identified
functions of theories respond to the two versions of the theoretician's dilemma:
[Sellars's] point is that I cannot justly arrive at physically necessary lower-level
generalizations by inductively projecting regularities holding among my past
observations unless I am guided in my projections by an upper-level outlook. The
empirical counterparts of the kinds figuring in theoretical laws are rendered
salient by our observations if and only if we allow theoretical considerations to
play a role in our inductive reasoning. Hence, even if the theoretical layer adds
nothing to the capacity of the lower-level generalizations to make correct
empirical predictions, the theoretical layer is indispensable to our making those
predictions, since it is crucial to our justly arriving at those lower-level
generalizations (Lange 2010, 218).
The upward dilemma states that in order to use a theory to make a prediction, one
must already possess the very reliable empirical generalization that would suffice for
predictive purposes. Sellars's first argument ("too few empirical regularities") shows how
this can fail to be the case. In particular, theorizing in one domain can suggest empirical
regularities in analogous domains with which the theory-user does not have
observational-level experience.

However, suppose that we grant the claim that a successful theory user must
already possess the reliable empirical regularity that would suffice for predictive
purposes. Sellars's second argument ("too many empirical regularities") shows that even if
this is the case, she may not be able to identify which of the regularities she possesses is
the right one to use unless she uses a theory. At the observable level, there may be
nothing to distinguish the projectible from non-projectible empirical generalizations; only
the theoretical explanations of why matters of fact obey these generalizations can do this
work.

4.4 Testing for this function


We have already seen how this limiting function of theories can be used in the
triangulation paradigm to distinguish between theoretical and non-theoretical intervening
variables. In the case of numbers, one of the candidate observable regularities that could
have been used to succeed at the Give Me 5 task was "if x is a numeral other than one I
already know, then it corresponds to a set size that I have not already matched to a
numeral." This regularity has a wider extension than the theoretical representation since it
raises the probability that "5" corresponds to any number greater than four.
In general, the limiting role of theories under consideration draws on the fact that
the extension of a theoretical regularity can be narrower than that of an observable
regularity from which the theory was learned. T and OG will have overlapping extensions,
so they could each explain predictions that fall under both. However, we can probe for
the content of the intervening representation by testing for whether the subject will
extend that learned contingency to predictions that fall within the wider extension of OG
but outside of the extension of T. If it does, then this is evidence that the subject had
represented the contingency learned in the training phase as holding between OG and the
outcome of interest, rather than T. In particular, it is evidence that the subject had
overextended an observable regularity to predictions that would be unprojectible by the
lights of the theory.
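
The logic of this test can be sketched with sets standing in for extensions. The encoding below is an illustrative assumption: for the Give Me 5 case, the theoretical representation T is taken to license only a set of five, while the observable regularity OG licenses any set size greater than four (capped at eight here purely for convenience). The function names `probe_cases` and `interpret` are hypothetical.

```python
def probe_cases(ext_t: set, ext_og: set) -> set:
    """Cases inside OG's extension but outside T's: the informative test trials."""
    return ext_og - ext_t


def interpret(responses: set, ext_t: set, ext_og: set) -> str:
    """Read a subject's responses in light of the two candidate extensions."""
    probes = probe_cases(ext_t, ext_og)
    if not probes:
        return "no discriminating trials available"
    if responses & probes:
        # The subject extended the learned contingency beyond T.
        return "evidence for OG over T"
    return "no overextension observed"


# Hypothetical encoding of the set sizes each hypothesis allows for "5".
ext_t = {5}
ext_og = {5, 6, 7, 8}

print(interpret({5, 7}, ext_t, ext_og))  # subject sometimes hands over seven
print(interpret({5}, ext_t, ext_og))     # subject only ever hands over five
```

A verdict of this kind is evidential rather than conclusive, since the sketch considers only the two extensions it is given.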
In tests for the expansive role of theories, I argued that a subject's extension to
cases that fall outside of OG does not entail that they were using a theory, for they might
have exploited a separate empirical regularity that would suffice for those predictions.
The converse is the case here. A subject's overextending past the extension of T does not
entail that they were using OG rather than T, for the subject might have used T in the
training phase but switched to a different observable regularity that does extend to these
new cases. She also might have been using a different theory than the one originally
hypothesized. These possibilities can be controlled for via the same methods I have
sketched above.

4.5 Example of testing for the limiting role


We can illustrate how the triangulation approach can be used to probe for theories
in their limiting role by looking at experimental investigations into the mental processes
underlying tool use in animals. Despite the fact that many animals, including
chimpanzees, capuchins, and corvids, create and use tools in the wild, it is controversial
whether this is evidence that they are capable of sophisticated, theoretical causal
reasoning. Understanding of animal tool use is impeded by the common dilemma in
studies of animal cognition, that both high- and low-level explanations may account for
apparently impressive performances (Taylor et al. 2009, 251). For example, observation
of termite fishing in chimpanzees raises the questions:
When a primate uses a stick to displace a reward and thereby render it accessible,
does it foresee where the reward is going to go in space? Can it learn to anticipate
or represent where its action is going to displace the reward? Does success in
tasks that require the displacement of an object by means of another object imply
an appreciation of causality? Can success be achieved otherwise? (Limongelli et
al. 1995, 18).
To answer these questions, Visalberghi and colleagues devised an experimental
protocol, the trap tube test, to examine the mental representations by which primates
solved tool-use tasks (Visalberghi and Limongelli 1994, Limongelli et al. 1995,
Visalberghi and Limongelli 1996). In a trap tube experiment, subjects are presented with
a transparent tube, with a trap with a solid base in the middle (which can be seen by the
subject). A food reward is randomly placed on one side of the trap, and the subject is
given a stick that it can use to push the food out of the tube. If the subject pushes the food
over the trap, it will fall into it and become inaccessible. To succeed above chance
(>50%), a subject must learn to insert the stick on the side opposite the trap from the
food.
Limongelli et al. (1995) tested chimpanzees on the trap tube task, and two of their
chimps succeeded far above chance levels. Additionally, these two chimps were highly
confident tool users, solving most trials by the immediate insertion of the stick in the
correct side, which the researchers interpreted as evidence that the chimps were able to
represent beforehand the outcome of a successful trial (Limongelli et al. 1995, 21). These
results were replicated by Povinelli (2000).
Notice that the chimps could have represented the outcome either via the
empirical regularity, OG, "pushing from the side with the trap is associated with losing
the food," or the theoretical regularity, T, "gravity causes food pushed over the trap to fall
into and get stuck in the trap." How, then, can we tell which of OG or T was used? The
key is to find a way to manipulate the trap tube test so that test trials fall under the former
regularity but not the latter.
Visalberghi, Povinelli, and colleagues devised an ingenious strategy for doing so.
After presenting chimps with training trials intended to reinforce the use of an
intervening representation, they simply inverted the trap tube, so that now, the trap was at
the top of the tube and thus posed no threat. In this case, OG still promotes avoiding the
trap; OG raises the probability that the food will be lost in both the trap-down and
trap-up trials. However, T no longer promotes avoiding the trap; T raises the probability
that the food will be lost in the trap-down task but lowers the probability that it will be
lost in the trap-up task. In Sellarsian terms, T explains why OG obtained in the training
trials, and in doing so, it shows that the regularity will only be projectible for a subset of
trap tube cases, namely, only those in which the trap points down and has a solid bottom.
When this new test was given to chimps that had succeeded on the trap-down
condition, all of them persisted in avoiding the trap. This suggests that the chimps had
indeed been using OG to succeed at the initial trap-down task and had overextended the
observable regularity to the trap-up task.[123] Similar patterns have been found in
capuchins, orangutans, and corvids (see Penn et al. 2008 for a review). Thus, in contrast
with the performance of human children, who rapidly figure out both the trap-down and
trap-up conditions, this pattern of failure has been interpreted as evidence that nonhuman
animals do not understand unobservable causal properties such as gravity and support; nor do they
reason about the higher-order relation between causal relations in an analogical or
theory-like fashion (Penn et al. 2008, 119).

[123] The trap tube protocol has been criticized by several researchers, in part because many animals find it
more natural to pull with a tool rather than push. This has prompted the development of more ecologically
sound tests for tool use. See Taylor et al. (2009) for a summary of experiments performed with New
Caledonian crows.

It may be objected that a chimp who persists in avoiding the trap in the trap-up
condition might do so, not because he wasn't using a theory, but because he was using a
different causal theory than the gravity-based one that humans use. For instance, we may
posit that the chimp had a theory, T′, that a trap (however it is oriented) acts as a "food
sucker," causing the food to escape into the trap. This theory is false, of course, but
would explain his behavior equally well.
This alternative explanation underlines a point that I have already made with
respect to testing for theories: it will not be possible to experimentally test the general
hypothesis "Subject X used a theory" versus the general hypothesis "Subject X did not
use a theory." Instead, we can only test particular hypotheses about specific theories
against hypotheses about specific observable regularities. The triangulation approach also
offers a way to deal with this problem.
Suppose that we are still uncertain whether a subject avoided the trap due to the
learned observable regularity (OG), a theory of gravity (T), or the food sucker theory
(T′). We can tease these hypotheses apart by finding instances in which their extensions
differ. For instance, if the subject extends its learned regularity to cases in which a trap is
located on the side of a vertical tube, this is evidence in favor of OG and T′ and against T.
It is harder to tease apart the "food sucker" and "avoid the trap" hypotheses without
specifying the particular details of the food sucker theory. In the event that these two
regularities are genuinely co-extensive (every time the food sucker hypothesis predicts
the food will be trapped, the "avoid the trap" regularity does too), the triangulation
approach cannot distinguish between the two. This is not a particular scandal for the
approach, however, as it is an instance of the more general truism that if two theories
make identical predictions, no observation will evidentially distinguish between them.
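
The comparison of extensions can be laid out as a small table of predictions. The sketch below encodes, as an illustrative assumption, whether each hypothesis predicts that the subject will avoid the trap in the three conditions discussed in this section. The identifiers (`T_gravity`, `T_food_sucker`, and so on) are hypothetical labels, and the food sucker theory is treated as co-extensive with OG only because its details are left unspecified.

```python
CONDITIONS = ["trap_down", "trap_up", "side_trap_vertical_tube"]

# True means the hypothesis predicts that the subject avoids the trap.
PREDICTS_AVOIDANCE = {
    "OG": {"trap_down": True, "trap_up": True, "side_trap_vertical_tube": True},
    "T_gravity": {"trap_down": True, "trap_up": False, "side_trap_vertical_tube": False},
    "T_food_sucker": {"trap_down": True, "trap_up": True, "side_trap_vertical_tube": True},
}


def discriminating_conditions(h1: str, h2: str) -> list:
    """Conditions in which the two hypotheses make different predictions."""
    return [c for c in CONDITIONS
            if PREDICTS_AVOIDANCE[h1][c] != PREDICTS_AVOIDANCE[h2][c]]


def consistent_hypotheses(observed: dict) -> list:
    """Hypotheses whose predictions match every observed condition."""
    return [h for h, preds in PREDICTS_AVOIDANCE.items()
            if all(preds[c] == avoided for c, avoided in observed.items())]


# Persisting in avoidance on trap-up trials rules out only the gravity theory.
print(consistent_hypotheses({"trap_down": True, "trap_up": True}))
# No condition in this table separates OG from the food sucker theory.
print(discriminating_conditions("OG", "T_food_sucker"))
```

An empty list of discriminating conditions is the co-extensive case: within this table, no observation favors one of the two hypotheses over the other.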

5. The limiting role of theories: the case of essentialist theories


We can use Sellars's insights about the epistemic roles that theories play to shed
light on a common type of human intuitive theory. As I mentioned in Chapter 2,
cognitive scientists have discovered that humans tend to form essentialist theories,
particularly in the biological domain (Gelman 2003). The essentialist hypothesis states
that:
People seem to assume that categories of things in the world have a true,
underlying nature that imparts category identity... The underlying nature, or
category essence, is thought to be the causal mechanism that results in those
properties we can see. For example, the essence of tigers causes them to grow as
they do, to have stripes, large size, capacity to roar, and so forth (Gelman et al.
1994, 344).
Following Strevens (2000), we can represent an essentialist theory of tigerhood as a
(partially) causal model, where the property of being a tiger is definitionally linked to
having the essence of tigerhood which in turn causes the observable properties of tigers.
It is worth noting that actual users of the essentialist theory may not, and often do not,
have commitments about what entity or mechanism actually constitutes the essence,
though it usually has something to do with biological relatedness and internal structure
(Gelman and Wellman 1991).


Figure 6.21 - An essentialist theory of tigerhood (Strevens 2000, 151).

Strevens (2000) argues against the essentialist hypothesis by showing that we can
explain subjects' predictions in the biological realm via a more minimal hypothesis that
does not attribute belief in essences. His argument is strikingly similar to the
theoretician's dilemma. He notes that the essentialist hypothesis attributes two
representations, of essences and of the causal or statistical laws relating the essences to
observable properties of members of the kind, which he calls K-laws. According to
Strevens:
Everything the essentialist hypothesis explains, it explains in virtue of the child's
belief in K-laws. Thus, a more conservative hypothesis suggests itself: that the
child represents K-laws but does not have any beliefs concerning the
metaphysical foundation of these laws. In particular, the child does not represent
an essence as doing the causing (ibid., 163).
Consider a child who predicts that a given tiger will be fierce by way of his
essentialist theory of tigers. The child must first recognize that the animal belongs to the
category of tigers, from which he infers that the animal has the essence of a tiger, and
then he reasons via a K-law ("the essence of tigerhood causes tigers to be fierce") to the
prediction that this tiger will be fierce. The downward theoretician's dilemma states that a
theory of essences can be replaced with K-laws alone with no loss in predictive ability
about the observable properties of tigers. This claim would not impugn the psychological
necessity of the essentialist theory if it were the case that the child could only come to
know the K-laws by way of his theory.

However, the upward dilemma denies that this is so, for in order for the child's
essentialist theory to render predictions about the observable properties of tigers, he must
already possess the very K-laws that would suffice for prediction. That is, he must
already have observed that animals that belong to the category of tigers are likely to be
fierce. Otherwise, how would the child know that the essence of tigers causes them to be
fierce, rather than tame or inanimate or something else entirely? A child who merely uses
these connections between being a tiger and being fierce can make all of the same
predictions as a theory-user despite not representing the tiger as having an essence and
despite having no opinion about what it is that makes the causal laws true (ibid.).
Strevens's theoretician's dilemma for essentialist theories raises two distinct questions:
what role does a theory of biological kinds play, and is the representation of an essence
essential to the theory?
With respect to the first question, I agree that an essentialist theory of tigers does
not generate any new predictions about the observable properties that a given tiger will
have. Put it this way: if you know nothing about tigers, and I tell you that tigers have the
essence of tigerhood, would this get you any closer to predicting whether a tiger will be
fierce? And if you already had observed that tigers tend to be fierce, would telling you
that tigers have the essence of tigerhood add anything to your predictive ability? The
answer to both seems to be no.
However, perhaps the purpose of essentialist theories is not to generate new
predictions about particular matters of fact. They explain why biological kinds obey
generalizations to the extent that they do. Once you already know that, say, 98% of tigers
observed were fierce and 90% were orange, the theory explains why it is that observed
tigers obeyed those generalizations; it is in virtue of having a common essence that they have
consistent observable properties. What does this explanation purchase you?
First, the theory permits within-domain structural inference. If you believe that
the essence explains the observable properties and that causes have consistent effects,
then you will induce that the frequency with which future tigers are fierce will be similar
to the frequency you have already observed (ceteris paribus). If you had no such theory,
then it is at least open that the fact that 98% of observed tigers were fierce was a merely
accidental generalization (maybe they had tough childhoods).
More importantly, a theory of tiger essences may permit cross-domain structural
inference. Essences encode information about which observed regularities are projectible
within biological kinds and which are not. Suppose you know that the tiger essence
causes 98% of tigers to be fierce and 90% to be orange, and you want to use this
information to predict the likely properties of a novel kind of animal, a hippopotamus.
The theory will not tell you what properties a hippo will have. You have to observe
hippos before you know that.
Suppose you observe a single hippo that is fierce, purple, and has a scar on its
right ear. From one's observations of tigers and other species, one can learn higher-order
information about the variability of traits within a species. In fact, the essentialist
intuitive theories used by humans state that behavioral properties (like fierceness) are
highly consistent within species, overall appearance properties (like being purple) are less
so, and idiosyncratic appearance properties (like scars) are extremely inconsistent
(Gelman 2003). Thus, a theorizer could predict, from the observation of a single hippo,
that it is likely that future hippos will also be fierce, it is somewhat likely that they will be
purple, and it is not very likely they will have scars on their right ears.
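The inference just described can be given a rough quantitative gloss. The following is only a minimal sketch, assuming that the theorizer's higher-order knowledge about trait variability can be modeled as a Beta prior over the within-species frequency of each type of trait. The prior parameters and function names are hypothetical illustrations; they are not drawn from Gelman (2003) or from any data.

```python
# Illustrative sketch only. The (alpha, beta) values are hypothetical stand-ins
# for higher-order knowledge about how consistent each type of trait is within
# a species; they are assumptions, not empirical estimates.

def predictive_after_one_positive(alpha, beta):
    """Beta-Bernoulli posterior predictive probability that the next member of
    a kind has a trait, after observing one member that has it."""
    return (alpha + 1.0) / (alpha + beta + 1.0)

TRAIT_PRIORS = {
    # Behavioral trait: frequencies cluster near 0 or 1 within a species.
    "fierce": (0.1, 0.1),
    # Overall appearance: moderately consistent (uniform prior over frequency).
    "purple": (1.0, 1.0),
    # Idiosyncratic appearance: rare within any species.
    "scar_on_right_ear": (0.05, 5.0),
}

def project_from_single_exemplar(priors):
    """Project each trait of one observed exemplar onto future members."""
    return {trait: predictive_after_one_positive(a, b)
            for trait, (a, b) in priors.items()}

predictions = project_from_single_exemplar(TRAIT_PRIORS)
for trait, probability in predictions.items():
    print(f"{trait}: {probability:.2f}")
```

Under these assumed priors, the same single observation licenses a strong projection for fierceness, a weaker one for color, and almost none for the scar. The ordering comes entirely from the kind-level assumptions, not from anything observed about hippos.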
Sellars pointed out that there may be nothing at the observable level to distinguish
the projectible empirical regularities. In their response to Strevens, Ahn et al. (2001) level
a similar complaint that there may be nothing to distinguish the projectible (natural) K-laws from mere accidental (unnatural) generalizations about kinds:
There is nothing in the minimal hypothesis that warrants a distinction between
causal and non-causal laws. Real world regularities are neither necessary nor
sufficient for determining naturalness of causal laws because regularities, even
causal ones, are not necessarily indicative of a K-law. Dodo birds were naive,
slow, and tasty; thus, they were all killed by hunters. However, being killed by a
hunter does not seem like a K-law about Dodos (Ahn et al. 2001, 62).
Is the representation of an essence essential for these types of theoretical
inferences? I have argued that the function of a biological essentialist theory is to explain
which empirical regularities about biological kinds will be projectible and which ones
will not. In order to use structural information about the traits of tigers to make similar
inferences about hippos, it may be necessary to represent that in virtue of which hippos
have consistent behavioral properties as the same kind of thing as what tigers have. Must
this be an essence? Perhaps not, but it will be something more than just the idiosyncratic
K-laws found within each species' theory. That is, it must be a theoretical variable of
some sort, one that represents the regularities between tigerhood and the observable
properties of tigers as instantiations of the same regularity as those between hippohood
and the observable properties of hippos.
6. What about theory of mind?
In this chapter, I have argued for two general functions that theories can play in
cognition. In their expansive function, theories allow their users to extrapolate higher-order regularities beyond those observed regularities from which they were learned, both
within the same domain and across other domains. In their limiting function, theories
allow their users to identify which previously observed regularities will be projectible and
which will not, both within a domain and across domains.
The examples that I used to illustrate these functions came from the physical,
numerical, and biological realms. I have not yet directly addressed the question of the
function a theory of mind, in particular, plays in human cognition and how we can test
whether a subject possesses one. Though the logical problem has been extended as an
attack on theoretical capacities more generally and thus presents problems for the theory-theory as a whole, one would hope that a solution to the logical problem could resolve the
debate in which it first arose.
Unfortunately, this is not a task that I can fully undertake here, for I think that
there are several problems which uniquely beset the issue of theories of mind. In the
cases that I have discussed in this chapter, we have a good grasp on what the candidate
theories are and how they are used to make inferences; we know what our intuitive
theories of numbers, gravity, and biological essentialism are and what they entail.
However, the same cannot necessarily be said about a theory of mind. Significant
controversies still rage regarding the way that theory of mind is implemented in humans
(e.g. theory vs. simulationist accounts)[124], whether theory of mind is a single thing or a
cluster of related capacities[125], whether our mental attributions to ourselves and others are
even minimally accurate[126], and what predictive role, if any, a theory of mind plays[127].
Indeed, given the inaccuracy of many mental state attributions and the sufficiency of
complex behavior-reading for most of human interactions, some authors have suggested
that mindreading does not serve a predictive role at all but rather plays a primarily
normative function of justifying or condemning behavior[128].

[124] See Massimo Marraffa's entry on "Theory of Mind" in the Internet Encyclopedia of Philosophy for a
thorough discussion and bibliography.

Given these complexities, I am reticent about making any pronouncements about
how my account of theorizing applies to the case of theory of mind, so I will limit myself
to a few remarks. For simplicity, I will restrict the discussion to theories about what
others believe. One role of theory of mind that has been extensively studied regards its
user's ability to make predictions about false beliefs. In a typical false belief task, the
subject and a partner both see an object being hidden in one of two boxes, and the subject
sees that the partner has seen where the object is hidden. Then, the subject sees that the
object is moved to the other location and that the partner has not seen it being switched.
The subject is then asked to predict where the partner will look for the object. The most
common interpretation of false-belief tests is that a subject who reasons via the line-of-sight empirical regularity, "A partner who has seen the object will go for the object," will
predict that she will look for it where it actually is, rather than where she last saw it.

[125] See Andrews (2012) for a discussion.
[126] See Doris (2015), who claims that the reason attributions we make to explain the behavior of others and
ourselves are very often false.
[127] See Andrews (2012).
[128] Andrews (2009) argues that beliefs about norms precede beliefs about the mental causes of others'
behaviors, both in ontogeny and phylogeny. Once this normative system is in place, theory of mind
develops in order to explain individuals' norm-violating behaviors so that they can be justified, condemned,
or lead to an alteration of the norms.

A subject who has a theory of belief may predict that the partner will look in the
wrong location, for he represents a tertiary relation between the partner's unobservable
belief and a particular location; her belief encodes an object as existing in a certain place
and time and recommends going for the food in that spot. Interpreted this way, a theory
of belief serves the limiting role argued for by Sellars. The theorizer explains the line-of-sight regularity, why partners typically head toward objects they have seen, in terms of
the more specific hypothesis that partners go to the location where their belief encodes
the object as being located. Typically, this is the same place as where the food actually is,
but it is possible to present the partner with cases that fall under the broader line-of-sight
relation but not the narrower relation holding between their belief and the object.
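The contrast between the two predictors can be made concrete with a small sketch. It assumes a deliberately simplified encoding of the false belief task; the class, field, and function names are hypothetical and are introduced only for illustration, not as a model of any particular experiment.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    first_location: str     # where the partner saw the object being hidden
    final_location: str     # where the object actually is at test
    partner_saw_move: bool  # whether the partner witnessed the switch

def line_of_sight_prediction(trial):
    """Behavior-reading rule: a partner who has seen the object goes for the
    object, so predict search at the object's actual location."""
    return trial.final_location

def belief_based_prediction(trial):
    """Theory-of-belief rule: predict search at the location that the
    partner's belief encodes, which updates only if she saw the move."""
    if trial.partner_saw_move:
        return trial.final_location
    return trial.first_location

# Hypothetical trials: the object is moved from the left box to the right box.
true_belief = Trial("left", "right", partner_saw_move=True)
false_belief = Trial("left", "right", partner_saw_move=False)

for name, trial in [("true belief", true_belief), ("false belief", false_belief)]:
    print(name, line_of_sight_prediction(trial), belief_based_prediction(trial))
```

On the true belief trial the two rules agree, which is why ordinary cases cannot distinguish them. They come apart only on the false belief trial, where the broader line-of-sight relation holds but the narrower relation between belief and object location does not.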
Notably, chimpanzees have systematically failed at such false belief tests (Call &
Tomasello 1999). Kaminski et al. (2008) argue that other primates can reason about the
relation between a partner and an object when it corresponds to the observable line-of-sight regularity but not when this regularity is violated[129]. More generally, theories of
belief can play a limiting role by specifying which of the large range of possible
observable regularities that another might represent is the one that is indeed being used.
This entire dissertation may be seen as an instantiation of this project.
However, regarding the expansive role of theories, a more radical proposal
suggests itself: perhaps it is only fruitful to be a mindreader if the agent whose mind one
is reading is itself a theorist. In other words, the expansive role of a theory of mind is
parasitic on the expansive roles of the theoretical beliefs that another might possess.
While I cannot fully argue for this proposal here, a few initial points can be made in its
favor.

[129] The title of their article, "Chimpanzees Know What Others Know, but Not What They Believe,"
seems optimally designed to frustrate philosophers.

Consider a subject, S, who is a mindreader, and another agent, D, who is not a
theorist. S's task is to predict what D will do in some circumstance, so S begins by
forming a belief about an observable state, O1, that it believes will be relevant to D. Since
D is not a theorist, D reasons merely about an observed regularity between O1 and some
prediction, O2, on the basis of which D performs some behavior, B. Hence, if S knows
the very same observed regularity that governs D's behavior (S knows "if O1, then O2"
and "if O2, then D will B"), and as a non-theorizer D's reasoning really is limited to this
regularity, this would suffice to predict what D will do. Hence, there is no reason for S to
posit that D believes the empirical regularity that she acts in accordance with.
On the other hand, if D were a theorist, D's belief could entail predictions
extending beyond this empirical regularity. Therefore, there is no guarantee that a past
observed regularity will suffice to know what D will do and S may only be able to predict
what D will do by reasoning about the content of her belief. This argument parallels the
complaint I made against Hempel's example of white phosphorus; it seems that positing
a common cause only does any extra epistemic work if that common cause has
entailments for observable effects beyond those from which the common cause was
inferred. By hypothesis, the only mental common causes that have this property are
theoretical beliefs, so a theory of mind only plays an expansive role when used to predict
the actions of other theorizers.
This claim is speculative, and I do not want to defend it here. However, it
naturally fits with other views of the evolution of mindreading, including the arms race
theory (Miller, 1997) and Sterelny's (2003) theory of decoupled representations. Perhaps
this claim should not be surprising for another reason, since it is precisely the issue
involved in the logical problem[130]. If humans are indeed the only theorizers in other
domains, then there may be some further reason why humans are the only theorizers
about other minds.

[130] In this case, humans are mindreaders trying to predict the behavior of chimpanzees. If chimpanzees in
food competition experiments are not theorists, then knowing whether an observable relation obtains
among the subordinate, dominant, and food suffices to predict what the subordinate will do. That is, the
BRH (and indeed, the more minimal behaviorist hypothesis) will predict just as well as the MRH.

References
Ahn, W. K., Kalish, C., Gelman, S. A., Medin, D. L., Luhmann, C., Atran, S., Coley, J.
D., & Shafto, P. (2001). Why essences are essential in the psychology of
concepts. Cognition, 82(1), 59-69.
Abercrombie, J. 1838. Inquiries concerning the Intellectual Powers and the Investigation
of Truth, 8th ed. London: John Murray.
Andrews, K. (2005). Chimpanzee theory of mind: Looking in all the wrong places?. Mind
& Language, 20(5), 521-536.
---- (2009). Understanding norms without a theory of mind. Inquiry, 52(5), 433-448.
---- (2012). Do apes read minds?: Toward a new folk psychology. MIT Press.
Barrett, P. H., Gautrey, P. J., Herbert, S., Kohn, D., & Smith, S. (Eds.). (1988). Charles
Darwin's Notebooks, 1836-1844: Geology, Transmutation of Species, Metaphysical
Enquiries (1st ed.). Cambridge: Cambridge University Press.
Blaisdell, A. P., Sawa, K., Leising, K. J., & Waldmann, M. R. (2006). Causal reasoning
in rats. Science, 311(5763), 1020-1022.
Bonté, E., Flemming, T., & Fagot, J. (2011). Executive control of perceptual features and
abstract relations by baboons (Papio papio). Behavioural brain research, 222(1), 176-182.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The
general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Bonawitz, E. B., Ferranti, D., Saxe, R., Gopnik, A., Meltzoff, A. N., Woodward, J., &
Schulz, L. E. (2010). Just do it? Investigating the gap between prediction and action in
toddlers' causal inferences. Cognition, 115(1), 104-117.
Boyd, R. (1991). Realism, anti-foundationalism and the enthusiasm for natural kinds.
Philosophical studies, 61(1), 127-148.
Boyle, D. (2003). Hume on animal reason. Hume studies, 29(1), 3-28.
Braithwaite, R. B. (1953). Scientific explanation. Cambridge University Press.
Buchsbaum, D., Bridgers, S., Weisberg, D. S., & Gopnik, A. (2012). The power of
possibility: causal learning, counterfactual reasoning, and pretend play. Philosophical
Transactions of the Royal Society B: Biological Sciences, 367(1599), 2202-2212.
Bugnyar, T., & Heinrich, B. (2005). Ravens, Corvus corax, differentiate between
knowledgeable and ignorant competitors. Proceedings of the Royal Society B: Biological
Sciences, 272(1573), 1641-1646.
Call, J., & Tomasello, M. (1999). A nonverbal false belief task: The performance
of children and great apes. Child Development, 70(2), 381-395.
---- (2008). Does the chimpanzee have a theory of mind? 30 years later. Trends in
Cognitive Science, 12, 187-192.
Campbell, D. T. (1954). Operational delineation of what is learned via the transposition
experiment. Psychological Review, 61(3), 167-174.
Carruthers, P. (2008). Meta-cognition in animals: a skeptical look. Mind and Language,
23, 58-89.
Carey, S. (2009). The origin of concepts. Oxford University Press.
---- (2011). Précis of the origin of concepts. Behavioral and Brain Sciences, 34(03), 113-124.
Cheng, P. W. (1997). From covariation to causation: A causal power theory.
Psychological review, 104(2), 367.
Clatterbuck, H. (forthcoming). Chimpanzee mindreading and the value of parsimonious
mental models. Mind & Language.
Clayton, N. S., Dally, J. M., & Emery, N. J. (2007). Social cognition by food-caching
corvids. The western scrub-jay as a natural psychologist. Philosophical Transactions of
the Royal Society of London B: Biological Sciences, 362(1480), 507-522.
Craig, W. (1956). Replacement of auxiliary expressions. Philosophical Review, 65, 38-55.
Danks, D. (2007). Theory unification and graphical models in human categorization. In
A. Gopnik & L. Schulz (Eds.), Causal learning: psychology, philosophy, and
computation (p. 173-189). Oxford University Press.
Darwin, C. (1859/2003). On the origin of species by means of natural selection, or the
preservation of favoured races in the struggle for life (1st ed.). In Carroll, J. (Ed.), On the
origin of species (p. 77-398). Peterborough, ON: Broadview.
---- (1872/2009). The expression of the emotions in man and animals. Ekman, P. (Ed.).
Oxford University Press.
---- (1879/2004). The descent of man, and selection in relation to sex (2nd ed.). In Moore,
J. & Desmond, A. (Eds.), The Descent of Man. London: Penguin Books.
De Houwer, J., & Beckers, T. (2003). Secondary task difficulty modulates forward
blocking in human contingency learning. Quarterly Journal of Experimental Psychology
Section B, 56(4), 345-357.
De Houwer, J., Vandorpe, S., & Beckers, T. (2005). Evidence for the role of higher order
reasoning processes in cue competition and other learning phenomena. Learning &
Behavior, 33(2), 239-249.
De Waal, F. B. M. (1991). Complementary methods and convergent evidence in the
study of primate social cognition. Behaviour, 118, 297-320.
Doris, J. (2015). Talking to our selves. Oxford University Press.
Earman, J. (1978). Fairy tales vs an ongoing story: Ramsey's neglected argument for
scientific realism. Philosophical Studies, 33(2), 195-202.
---- (1992). Bayes or Bust? A critical examination of Bayesian confirmation theory.
Cambridge: MIT Press.
Feyerabend, P. K. (1962). Explanation, reduction, and empiricism. In H. Feigl & G.
Maxwell (Eds.), Scientific Explanation, Space, and Time (Minnesota Studies in the
Philosophy of Science, 3) (pp. 28-97). Minneapolis: University of Minnesota Press.
Field, H. (1980). Science without numbers. Princeton University Press.
Fitzpatrick, S. (2008). Doing away with Morgan's Canon. Mind & Language, 23(2), 224-246.
---- (2009). The primate mindreading controversy: a case study in simplicity and
methodology in animal psychology. In R. Lurz (Ed.), The Philosophy of Animal Minds
(pp. 258-277). Cambridge University Press.
Fodor, J.A. (1975). The language of thought. Harvard University Press.
Forster, M., & Sober, E. (1994). How to tell when simpler, more unified, or less ad hoc
theories will provide more accurate predictions. The British Journal for the Philosophy of
Science, 45(1), 1-35.
Gelman, R., & Gallistel, C. R. (1978). The child's understanding of number. Cambridge,
MA: Harvard University Press.
Gelman, S.A. (2003). The essential child: origins of essentialism in everyday thought.
Oxford University Press.
Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of
the non-obvious. Cognition, 38(3), 213-244.
Gentner, D. (2010). Bootstrapping the mind: Analogical processes and symbol
systems. Cognitive Science, 34(5), 752-775.
Gildenhuys, P. (2004). Darwin, Herschel, and the role of analogy in Darwin's origin.
Studies in History and Philosophy of Biological and Biomedical Sciences 35, 593-611.
Godfrey-Smith, P. (1994). Of nulls and norms. PSA Proceedings, 1994(1), 280-290.
Goodman, N. (1955). Fact, fiction, and forecast. Indianapolis, IN: Hackett.
Gopnik, A. & Meltzoff, A. (1997). Words, thoughts, and theories. Cambridge: MIT
Press.
Gopnik, A., & Wellman, H. M. (2012). Reconstructing constructivism: Causal models,
Bayesian learning mechanisms, and the theory theory. Psychological bulletin, 138(6),
1085-1108.
Griffiths, P (1999) Squaring the circle: natural kinds with historical essences. In Species:
New interdisciplinary essays, ed. R. A. Wilson. Cambridge: MIT Press. 209-28.
Halford, G.S. & Busby, J. (2007). Acquisition of structured knowledge without
instruction: The relational schema induction paradigm. Journal of Experimental
Psychology: Learning, Memory and Cognition, 33, 586-603.
Hare, B. (2001). Can competitive paradigms increase the validity of experiments on
primate social cognition? Animal Cognition, 4, 269-280.
Hare, B., Call, J., Agnetta, B., & Tomasello, M. (2000). Chimpanzees know what
conspecifics do and do not see. Animal Behaviour, 59, 771-785.
Hare, B. and Tomasello, M. (2005). Human-like social skills in dogs? Trends in
Cognitive Science, 9, 439-444.
Hauser, M.D., Chomsky, N. & Fitch, W.T. (2002). The faculty of language: What is it,
who has it and how did it evolve? Science 298(5598), 1569-1579.
Hausman, D. & Woodward, J. (1999). Independence, invariance, and the Causal Markov
Condition. British Journal for the Philosophy of Science, 50, 1-63.
Hempel, C.G. (1942). The function of general laws in history. Journal of Philosophy, 39,
35-48.
---- (1965). Aspects of scientific explanation, and other essays in the philosophy of
science. Free Press.
Hempel, C.G. & Oppenheim, P. (1948). Studies in the logic of confirmation. Philosophy
of Science, 15, 98-115.
Hennig, W. (1966): Phylogenetic Systematics. Urbana: University of Illinois Press.
Herschel, J. F. W. (1987). A preliminary discourse on the study of natural philosophy.
Chicago: University of Chicago Press. (First published 1830)
Hitchcock, C., & Sober, E. (2004). Prediction versus accommodation and the risk of
overfitting. The British journal for the philosophy of science, 55(1), 1-34.
Heyes, C. M. (1998). Theory of mind in nonhuman primates. Behavioral and Brain
Sciences, 21(01), 101-114.
---- (in press). Animal mindreading: what's the problem? Psychonomic Bulletin and
Review.
Hodge, M. J. S. (1977). The structure and strategy of Darwin's long argument. British
Journal for the History of Science, 10, 237-245.
---- (1989). Darwin's theory and Darwin's argument. In M. Ruse (Ed.), What the
philosophy of biology is, (pp. 163-182). Kluwer Academic.
---- (1992). Darwin's argument in the Origin. Philosophy of Science, 59(3), 461-464.
Hull, C. (1943). Principles of behavior. Appleton-Century-Crofts.
Hull, D.L. (1984). A matter of individuality. In Conceptual Issues in Evolutionary
Biology, ed. E. Sober. Cambridge: MIT Press, 623-645.
Hume, D. (1978). Treatise of human nature. L. A. Selby-Bigge & P. H. Nidditch (Eds.).
New York: Oxford University Press.
---- (2006). Enquiry Concerning Human Understanding. Ed. S. Butler. Echo Library.
Hummel, J.E. & Holyoak, K.J. (2001). A process model of human transitive inference. In
M. Gattis (Ed.), Spatial schemas in abstract thought (pp. 279-305). Cambridge: MIT
Press.
---- (2003). A symbolic-connectionist theory of relational inference and generalization.
Psychological Review, 110, 220-264.
---- (2005). Relational reasoning in a neurally plausible cognitive architecture. Current
Directions in Psychological Science, 14(3), 153-157.
Huntley, W. B. (1972). David Hume and Charles Darwin. Journal of the History of Ideas,
33(3), 457-470.
Inukai, Y. (2010). Hume on relations: are they real? Canadian Journal of Philosophy,
40(2), 185-209.
Kaminski, J., Call, J., & Tomasello, M. (2008). Chimpanzees know what others know,
but not what they believe. Cognition, 109, 224-234.
Karin-D'Arcy, M.R. (2005). The modern role of Morgan's canon in comparative
psychology. International Journal of Comparative Psychology, 18, 179-201.
Karmiloff-Smith, A. (1995). Beyond modularity: A developmental perspective on
cognitive science. MIT Press.
Kemp, C., Perfors, A, & Tenenbaum, J. B. (2007). Learning overhypotheses with
hierarchical Bayesian models. Developmental Science, 10(3), 307-321.
Kendler, H.H. (1952). What is learned? a theoretical blind alley. Psychological
Review, 59, 269-277.
Ketland, J. (2004). Empirical adequacy and ramsification. The British Journal for the
Philosophy of Science, 55, 409-424.
Kuhn, T. (1962). The structure of scientific revolutions. Chicago: University of Chicago
Press.
Lange, M. (2000). Salience, supervenience, and layer cakes in Sellars's scientific realism,
McDowell's moral realism, and the philosophy of mind. Philosophical Studies, 101(2),
213-251.
Limongelli, L., Boysen, S. T. & Visalberghi, E. (1995) Comprehension of cause-effect
relations in a tool-using task by chimpanzees (Pan troglodytes). Journal of Comparative
Psychology, 109, 1896.
Lupyan, G. (2008). Taking symbols for granted? Is the discontinuity between human and
nonhuman minds the product of external symbol systems? Behavioral and Brain
Sciences, 31(02), 140-141.
Lurz, R. (2009). If chimpanzees are mindreaders, could behavioral science tell? Toward a
solution to the logical problem. Philosophical Psychology, 22, 305-328.
Mackintosh, J. (1837). Principles of Pathology, and Practice of Physic (Vol. 1). Biddle.
Margolis, E., & Laurence, S. (2013). In defense of nativism. Philosophical
studies, 165(2), 693-718.
Matsuzawa, T. (1985). Use of numbers by a chimpanzee. Nature, 315(6014), 57-59.
---- (2009). Symbolic representation of number in chimpanzees. Current opinion in
neurobiology, 19(1), 92-98.
Maxwell, G. (1962). The ontological status of theoretical entities. In H. Feigl & G.
Maxwell (Eds.), Scientific Explanation, Space, and Time (Minnesota Studies in the
Philosophy of Science, 3) (pp. 3-15). Minneapolis: University of Minnesota Press.
Mayr, E. (1970). Populations, species, and evolution: an abridgment of animal species
and evolution. Harvard University Press.
Meketa, I. (2014). A critique of the principle of cognitive simplicity in comparative
cognition. Biology & Philosophy, 29(5), 731-745.
Melis, A. P., Call, J., & Tomasello, M. (2006). Chimpanzees (Pan troglodytes) conceal
visual and auditory information from others. Journal of Comparative Psychology, 120(2),
154.
Meltzoff, A. N. (2007). Like me: a foundation for social cognition. Developmental
science, 10(1), 126-134.
Meltzoff, A. N., & Brooks, R. (2008). Self-experience as a mechanism for learning about
others: A training study in social cognition. Developmental
Psychology, 44(5), 1257.
Miller, G. F. (1997). Protean primates: The evolution of adaptive unpredictability in
competition and courtship. In Whiten, A. & Byrne, R. W. (Eds.), Machiavellian
intelligence II: Extensions and evaluations (p. 312-340). Cambridge University Press.
Miller, R. R., & Matute, H. (1996). Biological significance in forward and backward
blocking: Resolution of a discrepancy between animal conditioning and human causal
judgment. Journal of Experimental Psychology: General, 125(4), 370.
Morgan, C.L. (1894). An introduction to comparative psychology. Walter Scott.
---- (1906). An introduction to comparative psychology (2nd ed.). Scribner.
---- (1920). Animal behaviour (2nd ed.). E. Arnold.
Muentener, P., Bonawitz, E., Horowitz, A., & Schulz, L. (2012). Mind the gap:
investigating toddlers' sensitivity to contact relations in predictive events. PloS one, 7(4),
e34061.
Nersessian, N. J. (1992). How do scientists think? Capturing the dynamics of conceptual
change in science. Cognitive models of science, 15, 3-44.
Newton, I. (1687/2003). Sir Isaac Newton's Mathematical Principles of Natural
Philosophy and His System of the World. Kessinger Publishing.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge University
Press.
Penn, D. C., Holyoak, K. J., & Povinelli, D. J. (2008). Darwin's mistake: Explaining the
discontinuity between human and nonhuman minds. Behavioral and Brain
Sciences, 31(02), 109-130.
Penn, D. C., & Povinelli, D. J. (2007a). On the lack of evidence that non-human animals
possess anything remotely resembling a theory of mind. Philosophical Transactions of
the Royal Society of London B: Biological Sciences, 362(1480), 731-744.
---- (2007b). Causal cognition in human and nonhuman animals: A comparative, critical
review. Annual Review of Psychology, 58, 97-118.
Pepperberg, I.M. (2005). Intelligence and rationality in parrots. In S. L. Hurley & M.
Nudds (Eds.), Rational Animals? (pp. 469-488). Oxford University Press.
Povinelli, D. J. (2012). World without weight: Perspectives on an alien mind. Oxford
University Press.
Povinelli, D. J., Nelson, K. E., & Boysen, S. T. (1990). Inferences about guessing and
knowing by chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 104(3),
203.
Povinelli, D. J., & Vonk, J. (2003). Chimpanzee minds: suspiciously human? Trends in
cognitive sciences, 7(4), 157-160.
---- (2004). We don't need a microscope to explore the chimpanzee's mind. Mind &
Language, 19(1), 1-28.
Premack, D. (1983). The codes of man and beasts. Behavioral and Brain Sciences, 6(01),
125-136.
Psillos, S. (1999). Scientific realism: how science tracks truth. Psychology Press.
Putnam, H. (1963). Degree of confirmation and inductive logic. In Putnam, H. (Ed.),
Mathematics, Matter, and Method. Cambridge University Press.
Ramsey, F.P. (1950). The foundations of mathematics and other logical essays.
Humanities Press.
Rehder, B. (2007). Essentialism as a generative theory of classification. In A. Gopnik &
L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation (pp. 190-207). Oxford University Press.
Reichmann, J. B. (2000). Evolution, animal 'rights', and the environment. CUA Press.
Rhodes, R. (2012). Making of the atomic bomb. Simon and Schuster.
Riccio, C. A., Reynolds, C. R., Lowe, P., & Moore, J. J. (2002). The continuous
performance test: a window on the neural substrates for attention?. Archives of clinical
neuropsychology, 17(3), 235-272.
Richards, R. J. (1987). Darwin and the emergence of evolutionary theories of mind and
behavior. University of Chicago Press.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. Lloyd (Eds.), Cognition
and categorization (p. 27-48). Lawrence Erlbaum.
Rumbaugh, D.M. (1995). Primate language and cognition: common ground. Social
Research, 62(3), 711-730.
Rumbaugh, D.M. and Pate, J.L. (1984). The evolution of cognition in primates: a
comparative perspective. In H. L. Roitblat, T. G. Bever, & H. S. Terrace (Eds.), Animal
cognition (p. 569-585). Lawrence Erlbaum Associates.
Santos, L. R., Nissen, A. G., & Ferrugia, J. A. (2006). Rhesus monkeys, Macaca mulatta,
know what others can and cannot hear. Animal Behaviour, 71(5), 1175-1181.
Sellars, W. (1963a). Phenomenalism. In Science, Perception and Reality (p. 60-95).
London: Routledge.
---- (1963b). The language of theories. In Ibid, (p. 106-126).
---- (1965). Scientific realism or irenic instrumentalism. In R. Cohen
& M. Wartofsky (Eds.), Boston Studies in the Philosophy of Science, v.2 (p. 171-204).
New York: Humanities Press.
---- (1977). Is scientific realism tenable? In PSA 1976 v.2, (p. 307-334). Lansing,
MI: Philosophy of Science Association.
Shanks, D.R. (1985). Forward and backward blocking in human contingency
judgments. Quarterly Journal of Experimental Psychology, 37B, 1-21.
Shettleworth, S. J. (2010). Cognition, evolution, and behavior (2nd ed.). Oxford
University Press.
Sober, E. (1980). Evolution, population thinking, and essentialism. Philosophy of
Science, 350-383.
---- (1990). Explanation in biology: let's razor Ockham's razor. Royal Institute of
Philosophy Supplement, 27, 73-93.
---- (1998a). Black box inference: When should intervening variables be postulated? The
British journal for the philosophy of science, 49(3), 469-498.
---- (1998b). Morgan's canon. In C. Allen and D. Cummins (Eds), The evolution of mind
(pp. 224-242). Oxford University Press.
---- (2005). Comparative psychology meets evolutionary biology: Morgan's canon and
cladistic parsimony. In Daston L. & Mitman, G. (Eds.), Thinking with animals: new
perspectives on anthropomorphism (p. 85-99). New York: Columbia University Press.
---- (2008). Evidence and evolution. Cambridge University Press.
---- (2009). Parsimony and models of animal minds. In R. Lurz (Ed.), The Philosophy of
Animal Minds (p. 237-257). Cambridge University Press.
---- (2011). Did Darwin Write the Origin Backwards?: Philosophical Essays on Darwin's
Theory. Prometheus Books.
---- (2012). Anthropomorphism, parsimony, and common ancestry. Mind & Language,
27(3), 229-238.
---- (ms). Ockham's razors: a user's manual.
Sperber, D. (2000). Metarepresentations in an evolutionary perspective. In D. Sperber
(Ed.), Metarepresentations (pp. 117-138). Oxford University Press.
Spirtes, P., Glymour, C. & Scheines, R. (2000). Causation, prediction, and search (2nd
ed.). MIT Press.
Stokes, D. (2013). Cognitive penetrability of perception. Philosophy Compass, 8(7), 646-663.
Strevens, M. (2000). The essentialist aspect of naive theories. Cognition, 74(2), 149-175.
Suddendorf, T. (1999). The rise of the metamind. In Corballis, M. C. & Lea, S. E. G.
(Eds.), The descent of mind (pp. 218-260). Oxford University Press.
Taylor, A. H., Hunt, G. R., Medina, F. S., & Gray, R. D. (2009). Do New Caledonian
crows solve physical problems through causal reasoning? Proceedings of the Royal
Society B, 276, 247-254.
Thelen, E., & Smith, L. B. (1994). A dynamic systems approach to the development of
cognition and action. MIT Press.
Thompson, R. K., & Oden, D. L. (2000). Categorical perception and conceptual
judgments by nonhuman primates: The paleological monkey and the analogical
ape. Cognitive Science, 24(3), 363-396.
Tomasello, M., & Call, J. (1997). Primate cognition. Oxford University Press.
Tomasello M., Call, J. & Hare, B. (2003a). Chimpanzees understand psychological
states- the question is which ones and to what extent? Trends in Cognitive Science, 7,
153-156.
---- (2003b). Chimpanzees versus humans: It's not that simple. Trends in Cognitive Science, 7, 239-240.
Tomasello, M. & Rakoczy, H. (2003). What makes human cognition unique? From
individual to shared to collective intentionality. Mind & Language, 18(2), 121-147.
Urushihara, K., & Miller, R. R. (2010). Backward blocking in first-order
conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 36(2),
281.
Van Fraassen, B.C. (1980). The scientific image. Oxford University Press.
Visalberghi, E. & Limongelli, L. (1994). Lack of comprehension of cause-effect
relations in tool-using capuchin monkeys (Cebus apella). Journal of Comparative
Psychology, 108, 15-22.
---- (1996). Action and understanding: Tool use revisited through the mind of capuchin
monkeys. In A. E. Russon, K. A. Bard & S. T. Parker (Eds.), Reaching into thought: The
minds of the great apes (pp. 57-79). Cambridge University Press.
Vonk, J., & Povinelli, D. J. (2006). Similarity and difference in the conceptual systems of
primates: The unobservability hypothesis. In T. Zentall & E. A. Wasserman
(Eds.), Comparative cognition: Experimental explorations of animal intelligence (pp.
363-387). Oxford University Press.

Wasserman, E.A. and Astley, S.L. (1994). A behavioral analysis of concepts: its
application to pigeons and children. The Psychology of Learning and Motivation, 31, 73-132.
Whiten, A. (1995). When does smart behavior-reading become mind-reading? In P.
Carruthers & P. Smith (Eds.), Theories of Theories of Mind (pp. 277-292). Cambridge
University Press.
---- (2013). Humans are not alone in computing how others see the world. Animal
Behaviour, 86(2), 213-221.
Wickens, D.D. (1938). The transference of conditioned excitation and conditioned
inhibition from one muscle group to the antagonistic muscle group. Journal of Experimental
Psychology, 22, 101-123.
---- (1939). The simultaneous transfer of conditioned excitation and conditioned
inhibition. Journal of Experimental Psychology, 24, 332-338.
---- (1943). Studies of response generalization in conditioning, I. Stimulus generalization
during response generalization. Journal of Experimental Psychology, 33, 221-227.
---- (1948). Stimulus identity as related to response specificity and response
generalization. Journal of Experimental Psychology, 38, 389-394.
Worrall, J. (1989). Structural realism: The best of both worlds? Dialectica, 43(1-2), 99-124.
Worrall, J. & Zahar, E. (2001). Ramsification and structural realism. In E. Zahar (Ed.),
Poincaré's Philosophy: From Conventionalism to Phenomenology (pp. 236-251). Open
Court.
Wynn, K. (1990). Children's understanding of counting. Cognition, 36, 155-193.
---- (1992). Children's acquisition of the number words and the counting system.
Cognitive Psychology, 24, 220-251.
