
Cognitive Systems as Dynamical Systems

Terence Horgan and John Tienson

1. Introduction

The body of mathematical concepts, techniques, and results known as "dynamical systems theory," which has long played a central role in physics, is beginning to figure importantly in cognitive science and in neuroscience. In cognitive science there is rapidly spreading activation of the theoretical approach known as connectionism, which treats cognitive systems as a species of neural networks. Neural networks in turn are most naturally viewed, conceptually and mathematically, as a species of physically realizable dynamical systems (and are so viewed by connectionists themselves). Whereas classical cognitive science treats cognitive systems as discrete formal systems, the mathematics of dynamical systems is, in essence, continuous: either literally so, when the systems evolve continuously through time and the natural mode of mathematical description is the differential equation; or by discrete approximation to the continuous case, when evolution is by discrete time steps and the corresponding natural mode of description is the "difference equation."

Concepts and mathematical techniques from dynamical systems theory also are now beginning to be employed by neuroscientists, especially those who are concerned to understand how neural processes subserve perceptual information processing. Researchers (e.g., Skarda and Freeman, 1987; Freeman, 1991) are finding evidence that information processing in the rabbit olfactory system, for instance, involves global patterns of neural activity rather than locally encoded information; and they are describing these patterns in terms of phase transitions, attractor basins, phase space contours, chaotic attractors, and other such notions from the conceptual tool bag of dynamical systems theory.

Numerous real life cognitive systems and subsystems, both in humans and in other animals, are vastly more complex than either natural subsystems like the olfactory perceptual system of rabbits, or the kinds of artificial systems and models that connectionists have so far produced. This gap almost certainly involves differences in kind, and not just in the complexity of the processes involved. In particular, it is very likely that many cognitive tasks require mental representations with rich structure, and processing of these representations that is suitably sensitive to that structure. The visual system, for instance, must have the capacity to isolate (enduring) objects in the visual field and to attribute complex properties to them -- e.g., approaching rapidly, barking (as a kind of motion). And the cognitive system as a whole must somewhere -- within the visual system or elsewhere -- have the capacity to recognize perceived objects both as specific individuals (e.g., as Fido) and as members of kinds (e.g., dog). And thinking about objects when they are absent from the perceptual field no doubt requires mental representations with far richer structure.

In this paper we will set forth a conception of cognitive systems as dynamical systems. Our particular focus will be on ways of conceptualizing and incorporating, within the dynamical systems framework, mental representations with complex syntactic structure, and cognitive processing that is sensitive to that structure. We will also address some important ways in which this conception of cognitive systems differs from the conception embodied in classical cognitive science. Much of our discussion will be applicable, mutatis mutandis, to other possible kinds of structured representations -- for instance, images, or models (construed as non-language-like).

Topoi 11: 27--43, 1992.
© 1992 Kluwer Academic Publishers. Printed in the Netherlands.

2. Settling into a new paradigm

We come to the topic of cognitive systems as dynamical systems by way of an interest in connectionism and its potential implications for the philosophy of mind and the philosophy of psychology (Horgan and Tienson, 1987, 1989, 1990, forthcoming a, forthcoming b). In our view, connectionist work does not yet embody any clear or univocal conception of cognition, over against the conception embodied in classical cognitive science. However, by reflecting on connectionism itself, and on certain recalcitrant problems in classical cognitive science (e.g., the frame problem) that largely contributed to the emergence and rapid popularity of connectionism, we have come to believe that the classical conception is very likely mistaken. We have also come to believe that an adequate alternative paradigm would need to incorporate certain key foundational assumptions -- some held in common with classicism, others incompatible with classicism.

The classical conception, as we understand it, includes three central assumptions about human and human-like cognitive systems: (1) that they employ mental representations with compositional syntax and semantics; (2) that processing of these representations is sensitive to their syntactic structure; and (3) that cognitive processing conforms to programmable rules, statable over mental representations themselves (as we shall call them, programmable representation level rules, or PRL rules). (It is consistent with classicism -- and with our own view -- that there be other kinds of representations in addition to syntactically structured ones.)

Claims (2) and (3) amount to the same thing from the classical point of view, but they are logically distinct, and the importance of distinguishing them will become clear momentarily. Claim (3) reflects classicism's assumption that cognitive processing is a matter of there being some (stored or hardwired) program, running in hardware or in "wetware," that manipulates mental representations on the basis of their formal/syntactic properties.

We shall have more to say below about what is, and what is not, implied by saying that mental representations have compositional syntax. To say that such representations have compositional semantics means, essentially, that the content of any syntactically complex representation is determined by the contents of, and the grammatical relations between, its syntactic constituents. (The contents of the constituents might themselves depend partially on the total representational context in which the given complex representation occurs; thus the content of the complex representation might be partially context dependent also.)

Models and theories in classical cognitive science typically exhibit various other features too. Some of these, although they do not quite have the same foundational status as assumptions (1)--(3), have become very deeply ingrained in the thinking of classicists. Especially worthy of mention, in this category, is the ubiquitous assumption that memories are literally stored, in the same form in which they occur in active processing, in a separate component of the system (called memory) -- a component appended to an executive part of the system (the "CPU") that does active processing and fetches items from memory as it needs them.

The alternative conception of cognition we advocate incorporates claims (1) and (2) of classicism, but repudiates claim (3). We maintain that cognition involves complex, language-like representations, plus processes sensitive to the syntactic structure of these representations; but we deny that these processes conform to PRL rules. We call this potential alternative paradigm representations without rules (for short, RWR). Certain additional features of cognitive systems would, we maintain, be natural byproducts of an RWR paradigm, if not quite essential or foundational ingredients. In particular, RWR systems probably would not contain memory modules in which representations are explicitly stored, to be fetched as needed for processing. Rather, when something is remembered a representation is re-created that had not existed anywhere in the system since the system last had a "thought" with this content.

Elsewhere (op. cit.) we have argued at length that human cognition very probably conforms to the RWR conception. We will not attempt to reproduce the arguments in detail here, but instead will offer only very brief sketches. We maintain that cognition needs mental representations with compositional syntax and semantics, and structure sensitive processing of these representations (a combination of features we call effective syntax), because a cognitive system without these features would not be able to adequately keep track of the many different objects in its environment, and the many different properties of these objects and relations between them, that are immediately relevant to the system's cognitive task demands. For a small and limited
system, it might be possible to get the effect of "tracking" by wiring in suitable connections among unstructured representations. But natural cognitive systems do not have an antecedently fixed range of properties and individuals they might encounter, and are capable of vastly many possible representations involving predication of properties and relations to individuals.¹ For such a system, keeping track of which relevant objects instantiate which relevant properties -- especially when each such object simultaneously instantiates several such properties and relations, and when many such properties and relations are simultaneously instantiated by several such individuals -- requires a system of representations in which identities of reference and identities of predication are recoverable from the representations themselves. And this requires representations with structural constituents that function syntactically as names, predicates, connectives, quantifiers, and the like. Any such system is a language, although -- as will figure prominently below -- the way its representations embody syntactic structure might not be much like the way syntax is embodied in the "well formed formulas" of formal logic or in the representations posited by classical cognitive science.

We maintain, however, that human cognition does not conform to PRL rules, and thus does not conform to the classical conception. This is due to the interplay of two features of cognition. First, human cognition is open ended: there is no limit to the things a human being could possibly represent. Second, virtually anything a cognitive system knows or remembers can be relevant to any given cognitive task. Although it is possible to state certain general principles concerning behavior and mental processes, such as the familiar generalizations attributed to "folk psychology," it is well known that there are exceptions to these generalizations. We have argued that it is not possible to eliminate these exceptions, so as to state general, exceptionless principles concerning human behavior or mental processes. This is because there is a potentially unlimited range of possible exceptions to any interesting generalization concerning human cognition -- exceptions within the realm of the mental (as opposed to those due to system breakdown or outside intervention). The point is not just that there are an unlimited number of possible exceptions -- an "override" rule permitting an unlimited number of instances could deal with that. The trouble, rather, is the potential relevance of anything to anything. Any represented item of information can generate an exception, in some circumstances. There is nothing that all (and only) exceptions have in common except being exceptions. In particular, there is nothing that could make being an exception subject to a PRL rule, because relevance is determined by content, not form. (PRL rules apply to representations on the basis of their form.)

It is an open question, at present, whether or not there can be connectionist cognitive systems with the features that characterize the RWR conception of cognition. But we maintain that human cognition very likely conforms to the RWR conception in any case (regardless of how things might turn out with connectionism), and hence that connectionism should strive to evolve into an RWR paradigm -- which at present means, most importantly, that connectionism should strive to develop systems that incorporate effective syntax. Our discussion here should be seen as situated within this wider context. We shall be suggesting (i) that there are rather natural looking ways to incorporate syntactically structured representations into a general theoretical account of cognition, given the mathematical/conceptual framework appropriate for connectionist systems; (ii) that there also are natural looking ways to construe structure sensitive processing, within this dynamical systems framework; and (iii) that under the dynamical systems construal of complex structure and structure sensitive processing, the contention that cognition does not conform to PRL rules also looks fairly natural and plausible. So, with respect to our own broader agenda, we see the points we shall make here as bolstering our own larger case for the RWR conception of cognition, and also as lending plausibility to the hypothesis that connectionism has the potential to evolve into an RWR paradigm.

Our discussion will be inspired, to some extent, by recent proposals in the connectionist literature for incorporating representations with complex structure, including language-like structure, into connectionist systems. As yet, however, rather little has been done by way of processing models that employ such representations. Furthermore, whereas syntactically structured representations in classical symbol processing systems have what Fodor and McLaughlin (1990) call classical constituents -- i.e., whenever a syntactically complex representation is tokened, its syntactic constituents must be tokened simultaneously -- the proposed connectionist representations lack this feature. One might well think (as do Fodor and McLaughlin, evidently) that token representations with putatively language-like
structure couldn't possibly have suitably structure sensitive effects unless their structural parts are tokened along with them. If they are not tokened, how can they have effects?

Part of our project here is to address this issue within the dynamical systems framework. We maintain that the structure sensitive processing of representations with non-classical constituents -- something that admittedly looks virtually impossible within the framework of classical cognitive science -- actually looks both natural and plausible within the dynamical systems framework.²

3. Dynamical systems and connectionism

3.i. Dynamical systems³

To treat a system as a dynamical system is to specify in a certain way its temporal evolution, both actual and hypothetical. The set of all possible states of the system -- so characterized -- is the system's abstract state space. Each possible state of the system is a point in its state space, and each magnitude or parameter is a separate dimension of this space. The dynamical system, as such, is essentially the full collection of temporal trajectories the system would follow through state space -- with a distinct trajectory emanating from each possible point in state space. A dynamical system can be identified with a set of state space trajectories in roughly the same sense in which a formal system can be identified with the set of its theorems. Just as there are many different axiomatizations of a formal system such as S5 or the classical propositional calculus, so likewise there might be many different mathematical ways of delineating a certain dynamical system or class of dynamical systems. (We follow common practice in using the term 'dynamical system' ambiguously for abstract mathematical systems and for physical systems -- such as planetary systems and certain networks -- whose behavior can be specified via some associated mathematical dynamical system.)

In classical mechanics, for instance, the magnitudes determining the state space of a given mechanical system are the instantaneous positions, masses, and velocities of the various bodies in the system. Temporal evolution of such a system, from one point in its state space to another, is determined by Newton's laws, which apply globally to the entire state of the system at an initial time.

An attractor in state space is a point, or a set of points, in state space toward which the system will evolve from any of various other points. These others are said to lie within the basin of the attractor -- the idea being that the attractor itself lies at the "bottom" of the basin. The boundary of an attractor basin separates those points in state space lying within the basin from those lying outside it. A point attractor is a single stable point in state space; when a system evolves to a point attractor, it will remain in that state (unless perturbed). A periodic attractor is an orbit in state space that repeats back on itself; when a system evolves to a periodic attractor, it will oscillate perpetually through the same sequence of states (unless perturbed). A chaotic (or strange) attractor is a nonrepeating orbit in state space. (Chaotic attractors are probably important in the brain's information processing. See Skarda and Freeman (1987), and Freeman (1990).)

3.ii. Connectionist systems⁴

A connectionist system is a network of simple neuron-like processors, called nodes or units. Each node has directed connections to several other nodes, so that it gets signals from some nodes and sends signals to some nodes, possibly including the ones from which it gets signals. In practice a given node may get input from just two or three other nodes, or from as many as two or three dozen. In principle, it could be thousands or millions. The input from each node is a simple signal, like an electrical current or synaptic transmission.

The total input to a node will determine its state of activation -- how turned on it is. When a node is on, it sends out signals to the nodes to which it has output connections. The strength of the output signal is a function of the degree of activation. The strength of connection between nodes is referred to as weight. The input to node b from node a is a function of the strength of a's output signal and the strength of the connection -- i.e., weight -- between them.

Some units in a connectionist system get stimuli from outside the system. These are input units. Some are output units. They are thought of as sending signals outside the system. Input units may also get stimuli from nodes within the system, and output nodes can send feedback to nodes of the system. The rest of the nodes, with no connections outside the system, are called hidden or internal nodes.

Input and output nodes may be assigned interpretations. A problem is posed by activating a pattern of input nodes. Typically there is a great deal of activity, with nodes going on and off and sending and receiving signals repeatedly until the system "settles" into a stable configuration that constitutes its solution to the problem posed. The interpretation or semantic content of the activated nodes is what the system currently represents, hence its "answer." Connectionist systems are capable of "learning" from "experience" by having their weights changed systematically in a way that depends upon how well the network has performed to date in generating solutions to problems posed to it as a training regimen.

The most striking difference between such networks and conventional computers is the lack of a central executive. In a conventional computer the behavior of the whole system is controlled at the central processing unit (CPU) by a stored program. In connectionist systems there is no central "executive" and no program to control the operation of the system. All connections are local, so that each node knows only what it gets from the nodes to which it is connected. And the only information that one node communicates to another is, "I'm turned on (so much)." There is no component of the system with more knowledge than that. So, in particular, nothing in the system knows what the system as a whole is doing. Furthermore, what goes on at one place in the system is independent of what goes on elsewhere in the system. The behavior of each node is determined only by its current state and its input. Nevertheless, connectionist systems can be interpreted to globally represent interesting content when in a particular state, and as having stored (in the weights) knowledge that is not presently actively represented.

3.iii. Connectionist systems as dynamical systems

As we remarked at the outset, connectionist networks are a species of dynamical systems. The magnitudes determining the state space of a given connectionist system are the instantaneous activation levels of each of the nodes in the network. Thus the state space of a network is frequently called its "activation space." The activation space of a network has as many dimensions as the network has nodes. The rules governing the system's temporal evolution apply locally, at each node of the system; this simultaneous local updating of all nodes determines the system's evolution through time.

A useful geometrical metaphor, often employed by connectionists, is the notion of a network's activation landscape. Consider a network consisting of two nodes; its associated activation space is two dimensional, and thus can be envisioned as a Cartesian plane. Now imagine that plane being topologically molded into a contoured, non-Euclidean, two dimensional surface. Imagine this "landscape" oriented horizontally, in three dimensional space, in such a way that for each point p in the network's two dimensional state space, the path along the landscape that a ball would follow if positioned at p and then allowed to roll freely is the temporal trajectory that the network itself would follow, through its state space, if it were to evolve (without perturbation) from p. For an n-unit connectionist network, the activation landscape is the n-dimensional analog of such a two dimensional, non-Euclidean, contoured surface: i.e., a topological molding of the n-dimensional state space such that, were this surface oriented "horizontally" in an (n + 1) dimensional space, then a ball would "roll along the landscape," from any initial point, in a way that corresponds to the way the system itself would evolve through its state space (barring perturbation) from that point. The topology of a given network's activation landscape will be jointly determined by two factors: (i) the structural features of the network, in particular the pattern of connections between nodes and the weights on these connections; and (ii) the rules by which the individual nodes update their own activations and output signals.

In connectionist models, cognitive processing is typically construed as the system's evolution along its activation landscape from one point in activation space to another -- where the initial point in the temporal trajectory, and also the final point, and perhaps certain intermediate points, have certain representational contents (whereof more presently). As we have already noted, in typical systems a problem is posed to the system by activating a set of nodes which are interpreted as having a certain content. From a dynamical systems perspective, this amounts to positioning the system at some point in its state or activation space. (Which specific point this is will depend also upon the activation level of nodes other than those involved in posing the problem.) The network eventually settles into a stable state which constitutes its "solution" to the problem -- another point in its activation space. From the dynamical systems perspective, the system has evolved to a point attractor, from an initial point -- the posed problem -- within the associated attractor basin.
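
The settling process can be illustrated with a toy example. The following sketch, under our own simplifying assumptions (a two-node network with fixed weights, synchronous updating, and a logistic activation function), positions the system at an initial point in its two dimensional activation space and iterates the local update rule until the state stops changing -- i.e., until the trajectory has reached a point attractor.

    import math

    def squash(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Illustrative two-node network: each node excites itself and inhibits the other.
    weights = [[3.0, -2.0],
               [-2.0, 3.0]]
    bias = [-0.5, -0.5]

    def update(state):
        """Synchronous local update of both nodes from the current total state."""
        return [squash(sum(w * a for w, a in zip(row, state)) + b)
                for row, b in zip(weights, bias)]

    state = [0.9, 0.3]          # the "posed problem": an initial point in activation space
    for step in range(100):
        new_state = update(state)
        if max(abs(n - s) for n, s in zip(new_state, state)) < 1e-6:
            break               # no further change: a stable state (point attractor)
        state = new_state

    print([round(a, 3) for a in state])   # the point into which the trajectory settles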

Learning too is typically construed, within connectionism, as temporal evolution of a connectionist network through a state space. When the issue is learning as opposed to processing, however, the weights on the network's connections are viewed as malleable rather than fixed; learning involves incremental changes in the weights. Hence the relevant states of the system are given by total specifications of its weights, at a given moment of time during learning. From the dynamical systems perspective, learning is thus a matter of the network's temporal evolution through its weight space (with the weight of each inter-node connection being a dimension of this space). However, a connectionist network as it evolves through weight space is not by itself a full-fledged dynamical system, because weight changes are effected not by the network itself, but by system-external application of a weight change algorithm.

Henceforth in this paper, we will frame our discussion in terms of connectionist systems, viewed as dynamical systems. Two interrelated points should be kept in mind, however. First, the remarks we will make about cognitive systems as dynamical systems need not necessarily be tied to connectionist systems per se; rather, cognitive systems like humans could be best understood as dynamical systems quite different in nature from current connectionist networks; and in principle, our subsequent discussion could carry over, mutatis mutandis, to such alternative systems. But secondly, it is probably a mistake anyway to treat the notion of a connectionist system as determined by the sorts of networks currently being explored under the rubric of connectionism. In a broader sense, a "connectionist system" might better be construed this way: a dynamical system, physically realized by a network architecture. In this broader sense, there are numerous possible connectionist systems (including human brains, perhaps) that differ substantially from the kinds of networks currently being explored by connectionists.

3.iv. Cognitive dynamical systems: a creation myth

Those who construct connectionist models typically understand them mentalistically. Just as in classical modeling, content is assigned to certain states, essentially by the modeler's fiat. Connectionist models are described in terms of representations, and cognitive processing is construed as the system's evolving from one representational state to another -- not surprisingly, since the goal is to construct models that solve cognitive problems. Within connectionism, representations are conceived as being realized by certain activation patterns of nodes in a network.⁵ And within the conceptual/mathematical framework of dynamical systems, a specific activation pattern is a vector (i.e., an ordered n-tuple); each dimension of the vector is a specific activation value of some specific node of the network, and thus the vector itself is a set of coordinates in activation space. Hence, (token) connectionist representations are vectors. Typically the dimensionality of a representation vector will be considerably smaller than the dimensionality of the entire space (i.e., the representation vector will specify activation values for only a few nodes in the entire network), so that many different points in activation space will have the coordinates a representation vector specifies.

Insofar as connectionist networks are regarded as dynamical systems, it is useful and natural to construe representations as being realized, ultimately, by total activation states of a network (i.e., by points in the network's activation space). Thus, if a particular point in activation space is (partially) characterized by an activation vector which itself realizes a given representation, then this point itself thereby also realizes that representation. For even minimally complex systems, the total activation states that occur in processing will realize several distinct representations, simultaneously; i.e., a single point in activation space might well be (partially) characterized by several distinct representation vectors.

How might one design a connectionist network that would be an effective cognitive system, with cognitive capacities comparable to those of humans? It is useful to consider this question of cognitive design from an idealized "God's eye view" perspective. How might God conceive the project of designing such a network? God's overall task can be thought of as involving a number of interrelated subtasks that need to be simultaneously satisfied, as part of the total design project:

(1) Choosing a specific network. (This involves deciding on the number of nodes, the distribution of connections among the nodes, the rules to be followed by the nodes in updating their activations and their output signals, and the weights on the connections.)

(2) Working out a general scheme for realization of representations as activation patterns in this network,
and thereby for the realization of total representational states by points in the network's activation space.

(3) Setting the weights on the connections in such a way that evolution through activation space from any initial point with representational content will be suitably content-sensitive, under the representation scheme mentioned in (2).

(4) Building into the network the capacity to alter its own weights in ways that constitute suitably content-appropriate learning, relative to the representational scheme mentioned in (2).⁶

In principle, the resulting system might turn out to be some classical system, one that employs syntactically structured representations, and in which processing conforms to PRL rules. (Perhaps, too, it would have separate memory banks in which representations are to be stored in the same form in which they are used; and other, related, aspects of classical AI-type systems, such as classical syntax as characterized in Section 2.) If so, then the cognitive engineering task will have been handled, in effect, by finding a classical system together with a way to implement such a classical system with a connectionist network.

However, nothing in the cognitive engineering task, as just specified, presupposes that the sort of solution God might arrive at would be a network implementation of a classical system. And, if our own arguments against classicism are right, then no such system could exhibit either the requisite open-endedness of genuine human cognition, or its capacity to accommodate the potential relevance of anything to anything. (God, in her infinite wisdom, would of course realize this.) So if we are right, then the kind of dynamical system she hits upon, when she addresses the task of cognitive engineering in the above-described way, will not be just a neurally implemented classical system. This point will unfold as we proceed.

4. Structured representations and structure sensitive processing

As we said in Section 2, we maintain that any adequate general account of cognition will have to posit mental representations with compositional syntax and semantics, plus cognitive processing that is suitably sensitive to the syntactic structure of these representations -- in short, effective syntax. If we are right, then God would incorporate effective syntax into the design of a cognitive dynamical system. But how should one construe syntactic structure, and structure sensitive processing, within the conceptual/mathematical framework of dynamical systems? We will address these two questions in this section.

The two questions are deeply intertwined. For, just as states only count as representations for the system insofar as they function causally (in the system) in ways appropriate to their content, states only count as having representational structure, for the system, insofar as they function causally (in the system) in ways that involve suitably structure sensitive effects.

4.i. Structure sensitivity and nonclassical constituents: A dilemma

The language-like representations employed by AI-type cognitive models have what Fodor and McLaughlin (1990) call classical syntactic constituents: whenever a syntactically complex representation is tokened in the system, its constituents are simultaneously tokened. Paradigm cases of such constituents are spatial or temporal parts of the representation of which they are constituents. But Fodor and McLaughlin's definition is also meant to encompass other possibilities -- such as items in memory that are objects of "pointers" occurring in a tokened complex representation, or that are the values of a "fetch" operation (p. 186, Note 2). The idea is: classical constituents are items that are actually physically present in the system during processing.

Initially it seems plausible to suppose, as Fodor and McLaughlin evidently do, that classical constituency is a prerequisite for effective syntax. For, if the constituents are not physically present, how can the tokened representation have constituent sensitive effects?

Connectionist representations are realized as activation vectors. So if one were to try introducing language-like representations with classical constituents into connectionist systems, the natural approach would be to harness the relation between an activation vector and its various subvectors (as we shall call them) -- where an activation vector V is a subvector of a vector V' just in case (i) every node activation specified by V is also specified by V'; and (ii) some node activations specified by V' are not specified by V.

The vector/subvector relation was actually harnessed fairly commonly by the representation schemes employed in the first generation of connectionist models.
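
The subvector relation just defined is easy to state operationally. In the following sketch (our own illustration, not drawn from any particular model), an activation vector is represented as a mapping from node labels to activation values, and clauses (i) and (ii) of the definition are checked directly.

    def is_subvector(v, v_prime):
        """V is a subvector of V' iff (i) every node activation specified by V is also
        specified by V', and (ii) V' specifies some node activations that V does not."""
        contained = all(node in v_prime and v_prime[node] == value
                        for node, value in v.items())
        strictly_smaller = any(node not in v for node in v_prime)
        return contained and strictly_smaller

    # A hypothetical vector representing "John", and a larger total pattern containing it.
    john = {"n1": 0.9, "n2": 0.1}
    total_pattern = {"n1": 0.9, "n2": 0.1, "n3": 0.7, "n4": 0.2}

    print(is_subvector(john, total_pattern))   # True
    print(is_subvector(total_pattern, john))   # False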

For instance, in these models an individual item or a kind (e.g., a living room) was often represented as a vector of activated nodes, with each node separately representing some "microfeature" (e.g., an item of furniture); thus, the item was represented, in effect, as a collection of microfeatures.⁷ But the structure of these representations is no more than set theoretic membership, and this will not do for syntactic structure. Suppose, for instance, that one has an activation vector representing John, another representing Mary, and another representing the relation of loving. One does not obtain an adequate representation of the proposition that John loves Mary just by taking the total activation vector comprised of the three subvectors for John, loving, and Mary. For how, then, does one represent the proposition that Mary loves John?

In the current second generation of connectionist modeling and theorizing there are proposals for imparting interesting complex structure, including language-like structure, into connectionist representations (e.g., Hinton, 1990; Pollack, 1989, 1990; Smolensky, 1990). These proposals (one of which we will examine presently) do not rely on the vector/subvector relation, but instead resort to nonclassical constituency relations.

As yet, very little has been done by way of constructing connectionist models which actually employ the proposed nonclassical, putatively language-like, representations to perform interesting tasks. It remains to be seen how amenable they might be to sophisticated, structure sensitive, processing. Given this fact, and given the prima facie case that nonclassical structure cannot provide effective syntax, an apparent dilemma arises. On one hand, prospects look distinctly unpromising for connectionism to incorporate representations with classical syntax. On the other hand, prima facie the prospects look even bleaker that connectionist systems (or any systems) could make suitable use of representations with "syntax" that is not classical. The upshot, it would seem, is that connectionism very probably cannot deliver effective syntax.

We take this dilemma quite seriously, and we think connectionists should do so too. But contrary to initial appearances, there is a rather natural and plausible way that connectionism might be able to steer between the horns of the dilemma, a way that arises within the conceptual/mathematical framework of dynamical systems. We will take up this theme in Section 4.iv. First, however, we will lay some groundwork by summarizing, and drawing some morals from, an important recent proposal for building rich structure into connectionist representations, due to Jordan Pollack.⁸

4.ii. Recursive auto-associative representations

Pollack has developed a way to represent variable sized recursive data structures, such as trees or lists, as "fixed bandwidth" patterns in connectionist networks -- i.e., as activation vectors which all have the same number of elements regardless of the recursive complexity of the data structure being represented.

The approach requires that the complex structures to be represented vectorially have "fixed valence." For tree structures, this means that every tree and sub-tree must have the same number of "branches." For instance, the trees might all be binary (i.e., have valence 2) -- as does the tree ((A B) (C D)), shown in Figure 1.

Fig. 1. The binary tree ((A B) (C D)), with leaves A, B, C, and D.

The networks that develop and manipulate these vectorial representations are three-layer feed-forward networks called "recursive auto-associative memory" (or RAAM). (A feed forward network is a network that is organized into layers of nodes: an input layer, one or more hidden layers, and an output layer. Activation passes from one layer to the next, without any feedback loops or within-layer connections.) If the represented structures have valence v, and the representations are to be k-element activation vectors (i.e., are to have bandwidth k), then the input layer and the output layer will each consist of v distinct pools of k units each. The hidden layer will have k units, no matter how many distinct pools of units there are in the input and output layers (i.e., no matter how large v is); this will allow the vector representations created in the hidden layer by the system to go into the k-unit input pools, permitting recursion. For instance, if the structures to be represented are binary trees, then the structure of the network will be configured as in Figure 2.

Fig. 2. A binary RAAM: 2k input units (LEFT and RIGHT pools), k hidden units (WHOLE), and 2k output units (LEFT and RIGHT pools).

A RAAM network is "trained up" so that it will be able to perform two complementary tasks: compression and decompression. In the case of binary trees, for
instance, these two operations are to work as follows: Compression: whenever two k-width vectorial representations R1 and R2 are present in the left input units and the right input units respectively, then spreading activation will generate, in the hidden units, a k-width (i.e., "compressed") vectorial representation R3 of the binary tree whose left and right branches are presented by R1 and R2 respectively. Decompression: whenever such an R3 is present in the hidden units, then spreading activation will generate the corresponding R1 and R2 in the left and right output units respectively. For instance, if vectors respectively representing the binary trees (A B) and (C D) are present in the left and right input units, then these will be "compressed" into a vector in the hidden units that represents the tree ((A B) (C D)); and this compressed representation will then be "decompressed" back into the original vectorial representations of (A B) and (C D), which will be generated in the left and right output units respectively. Successful decompression indicates that all the information in the 2k input vector representation of (A B) and (C D) has been compressed into the k unit vector in the hidden layer. If this information were not retained in the hidden layer, it could not be extracted in decompression. However, no subvector of the compressed representation will represent either (A B) or (C D). That is, representation is non-classical.

Much of the interest of the RAAM technique is due to the fact that compression and decompression are performable recursively by the network, after it is trained. For instance, the (compressed) representation of the tree ((A B) (C D)) in the hidden units can be put into the left input, along with some representation (say, for the symbol E) in the right input; the network will then produce the appropriate compressed representation -- in this case, for the tree (((A B) (C D)) E) -- in the hidden units; and so forth, for trees of arbitrarily great depth. Decompression will again yield the (compressed) representation of ((A B) (C D)) in the left output pool. And decompression of that output will yield a representation of (A B) in the left output pool and a representation of (C D) in the right. Thus, the complete structure of a complex tree can be recovered -- rendered explicit -- from its compressed representation, by repeated application of the decompression operation.

A completely trained up network will be able to perform such compression and decompression for a large number of trees -- including, in general, similar trees that were not in its training set. One set of weights connecting input to hidden and hidden to output nodes makes possible all of these specific compression and decompression operations.

A trained network gives back as output what was given as input. The RAAM network does not, of course, do this prior to training. The training that brings about this identity of input and output has an interesting feature called the moving target strategy. The training regimen is divided into successive "epochs." In each epoch, the compressed representations used as inputs are the ones the system itself has developed in the preceding epoch. The representations change along with the weights from epoch to epoch, and the network gradually discovers appropriate representations. For example, if the symbols A, B, C, and D are each represented by k-element vectors, then the network could be trained to develop compressed k-element representations for the binary trees (A B), (C D), and ((A B) (C D)), in accordance with the following table (where t represents an epoch of training):

    Input pattern      Hidden pattern    Output pattern
    (A B)              R1(t)             (A'(t) B'(t))
    (C D)              R2(t)             (C'(t) D'(t))
    (R1(t) R2(t))      R3(t)             (R1'(t) R2'(t))

At each epoch t, the input vectors are those in the left column, and the output vectors are those in the right column.
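
The moving target strategy can be sketched concretely. The following is a minimal illustration of training a binary RAAM, written in Python with NumPy; the bandwidth, sigmoid units, learning rate, and plain gradient step are our own assumptions rather than details taken from Pollack's models. At each epoch the compressed codes used in the nonterminal training pair are the ones the encoder itself produced on the previous pass, so the weights and the representations co-evolve.

    import numpy as np

    rng = np.random.default_rng(0)
    k = 4                                   # bandwidth: every representation is a k-vector

    # Encoder (2k -> k) and decoder (k -> 2k) weights of the auto-associator.
    W_enc = rng.normal(0, 0.5, (k, 2 * k)); b_enc = np.zeros(k)
    W_dec = rng.normal(0, 0.5, (2 * k, k)); b_dec = np.zeros(2 * k)

    sig = lambda x: 1.0 / (1.0 + np.exp(-x))

    def compress(left, right):
        return sig(W_enc @ np.concatenate([left, right]) + b_enc)

    def train_step(left, right, rate=1.0):
        """One gradient step pushing decode(compress(left, right)) toward (left, right).
        A stand-in training rule, not Pollack's exact procedure."""
        global W_enc, b_enc, W_dec, b_dec
        x = np.concatenate([left, right])
        h = sig(W_enc @ x + b_enc)
        y = sig(W_dec @ h + b_dec)
        d_y = (y - x) * y * (1 - y)          # output-layer error signal
        d_h = (W_dec.T @ d_y) * h * (1 - h)  # hidden-layer error signal
        W_dec -= rate * np.outer(d_y, h); b_dec -= rate * d_y
        W_enc -= rate * np.outer(d_h, x); b_enc -= rate * d_h
        return float(np.sum((y - x) ** 2))

    # Fixed terminal patterns for the symbols A, B, C, D.
    A, B, C, D = (rng.uniform(0.1, 0.9, k) for _ in range(4))

    for epoch in range(5000):
        # Moving targets: the nonterminal codes come from the preceding epoch's encoder.
        R1, R2 = compress(A, B), compress(C, D)
        err = train_step(A, B) + train_step(C, D) + train_step(R1, R2)

    print("total reconstruction error:", round(err, 4))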

The learning algorithm slightly alters the weights, at each epoch, in a way that moves the network to a point in weight space where the total error in its outputs -- roughly, the total difference between the input patterns and the output patterns -- is lower than it was before. Thus the weights and the representations co-evolve, over successive epochs. With sufficient training, input and output patterns converge, for the various recursively structured items in the system's learning corpus; these will be the network's final representations of those items.

Pollack has developed a variety of models along these lines. His work has shown that such networks can be successfully trained, for recursive structures with considerable depth and complexity. (His models also display some capacity to generalize successfully to recursive structures not included in the training corpus, which shows that they are not merely "memorizing" the training items.)

Following common usage, we have talked of recursive auto-associative representations (RAA representations) as representations of recursive structures. But where the task domain is anything other than language, it would be more appropriate to think of them as representations with recursive structure, that is, as being sentences with syntactic structure. Taking for granted that RAA representations really represent, they encode propositionally structured information. They do this holistically. The representation consists of a pattern of activation of many nodes, but it is not possible to relate particular portions of the content of the representation to particular nodes. That is, in connectionist terminology, they are distributed representations. If the task domain is other than language, what the RAA representation represents is something on the order of the cat being on the mat, not something on the order of 'the cat is on the mat.' It does this without having a part that refers to the cat. But the fact that it is about the cat is recoverable (recursively) within the system from the representation itself. It is perhaps puffery to speak of the vectors in the present generation of RAAMs as representations. But RAA vectors that figured in processing that solved some class of cognitive problems, rather than just in composition and decomposition, would appropriately be called representations. Then it would be appropriate to speak of a network's system of RAA representations as the language of that network, and to take such a language as an alternative, nonclassical, model of the language of thought.⁹

One of Pollack's RAAM models involves RAA representations of propositions. Here it is especially natural and appropriate to construe the representations as having (nonclassical) syntactic constituents -- as does Pollack himself. Table I shows the 13 training items, expressed as English sentences. These were hand translated into the 13 ternary trees listed in Table II -- the "propositions." This ternary structure, he says, "is meant to capture the flavor of a recursive (Action Agent Object) case system" (p. 90).

TABLE I. Collection of sentences for propositional experiment

1   Pat loved Mary
2   John loved Pat
3   John saw a man on the hill with a telescope
4   Mary ate spaghetti with chopsticks
5   Mary ate spaghetti with meat
6   Pat ate meat
7   Pat knew John loved Mary
8   Pat thought John knew Mary loved John
9   Pat hoped John thought Mary ate spaghetti
10  John hit the man with a long telescope
11  Pat hoped the man with a telescope saw her
12  Pat hit the man who thought Mary loved John
13  The short man who thought he saw John saw Pat

TABLE II. Ternary trees for propositional experiment

1    (LOVED PAT MARY)
2    (LOVED JOHN PAT)
3    ((WITH SAW TELESCOPE) JOHN (ON MAN HILL))
4    ((WITH ATE CHOPSTICKS) MARY SPAGHETTI)
5    (ATE MARY (WITH SPAGHETTI MEAT))
6    (ATE PAT MEAT)
7    (KNEW PAT (LOVED JOHN MARY))
8    (THOUGHT PAT (KNEW JOHN (LOVED MARY JOHN)))
9    (HOPED PAT (THOUGHT JOHN (ATE MARY SPAGHETTI)))
10a  ((WITH HIT (MOD TELESCOPE LONG)) JOHN MAN)
10b  (HIT JOHN (WITH MAN (MOD TELESCOPE LONG)))
11   (HOPED PAT (SAW (WITH MAN TELESCOPE) PAT))
12   (HIT PAT (IS MAN (THOUGHT MAN (LOVED MARY JOHN))))
13   (SAW (IS (MOD MAN SHORT) (THOUGHT MAN (SAW MAN JOHN))) PAT)

A 3-1-3 RAAM (i.e., 3 pools of input units and 3 pools of output units) with 48 input units, 16 hidden units, and 48 output units learned to construct (16-element) vectorial representations of the propositions, and (as Pollack puts it) "to recursively encode and
decode these trees into their respective parts" (p. 90). As this wording indicates, Pollack regards these RAA representations as being (vectorially encoded) trees themselves -- even though their parts (their syntactic constituents) are not tokened (in a given layer of the network, at a given time) when they are. And prima facie this attitude seems appropriate, since the RAAM network actually subjects these vectorial "trees" to a palpable -- albeit simple and limited -- form of structure sensitive processing: it recursively constructs complex RAA representations from their constituents, and recursively deconstructs them back into their constituents. (In this experiment, as in the others, the model displayed some capacity to generate correct propositional representations beyond those in its training set.)

The sort of structure sensitive processing of language-like representations required by cognitive systems is, of course, a good deal richer than this. And for RAA representations to play an interesting role in such processing, it would have to be the case that they sometimes generate these richer structure sensitive effects directly, without the system's extracting all the syntactic constituents and then employing them in processing. (If not, they will be just a (nonclassical) way of storing representations that are otherwise classical.) Research in pursuit of this enticing possibility is just beginning.¹⁰

4.iii. Structure sensitivity in RAAM networks

We will now draw some morals, in dynamical systems terms, about the engineering features of Pollack's RAAM networks. Then (Section 4.iv) we will explain how these morals generalize, thereby suggesting an attractive route between the horns of the dilemma posed in Section 4.i.

Consider Pollack's network for representing propositions. The various vectorial representations generated by the network will have various representational contents. In some instances the content of an RAA representation will be a proposition, like the proposition that Pat loves Mary. In other instances the content will be some entity, other than a proposition, that figures as a semantic constituent of some proposition; for instance, the individuals Pat and Mary, and the relation of loving. Now, the suggestion at the close of section 4.ii was that the RAA representations whose contents bear semantic constituency relations to one another, themselves bear genuine (albeit nonclassical) syntactic constituency relations to one another; and that this nonclassical syntax is effective syntax. How is this suggestion to be justified, and just what does it amount to?

To come to grips with these questions, one needs to appreciate three crucial features in virtue of which the trained-up RAAM system performs successfully. (These can be viewed as features of its "emergent" engineering design, the design that is induced into the system via its training.) First, the system's processing, its construction and deconstruction of compressed RAA representations, is subserved by shared wiring, as opposed to dedicated wiring. The various different representations are all tokened, within each layer of the network, as distinct activation vectors over the same pool of nodes; and the same set of inter-layer connections mediates the passing of activation whenever the system is engaged in processing, irrespective of which specific representations are involved. With dedicated wiring, on the other hand, distinct representations are tokened (in each layer of the network) in distinct nodes or pools of nodes, with these separate nodes (or node pools) in each layer themselves being linked by their own special inter-layer connections. In dynamical systems terms, shared wiring means that the same weighted connections go into determining all content appropriate transitions. (Dedicated wiring, by contrast, would mean that each connection or set of connections mediates only one content appropriate potential transition.)

Second, even though processing does depend upon shared wiring, nevertheless the single post-training setting of the network's connection weights yields an activation landscape whose overall topology subserves recursive composition and decomposition processes quite systematically -- for all the items in the training set, and also for various representations to which the system spontaneously generalizes. I.e., a single position of the network in its "weight space" subserves a wide range of different content-appropriate trajectories along the activation landscape from one total representational state to another, and does so despite the fact that each of the weights plays a non-negligible role vis-à-vis many of these different trajectories.
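
The "shared wiring" point can be given a simple numerical gloss. In the sketch below (our own toy illustration, using a bare linear map rather than a trained RAAM), a single weight matrix -- one position in weight space -- is solved for so that it carries several different representation vectors to their respective content-appropriate successors, and every individual weight contributes to every one of those transitions.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6                                     # nodes per layer

    # Four distinct representation vectors and the states each should evolve to.
    sources = rng.uniform(0, 1, (4, n))
    targets = rng.uniform(0, 1, (4, n))

    # One shared weight matrix W, fitted by least squares so that source @ W ~= target
    # for every pair: the same wiring mediates all four transitions at once.
    W, *_ = np.linalg.lstsq(sources, targets, rcond=None)

    for s, t in zip(sources, targets):
        print(round(float(np.max(np.abs(s @ W - t))), 6))   # near zero for each pair

The point is only illustrative: a trained RAAM realizes its transitions nonlinearly, but the same moral holds -- a single setting of the weights carries many distinct representations to their content-appropriate successors.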
38 TERENCE HORGAN AND JOHN TIENSON

of processing in virtue of which they can function as (in part, perhaps, because she appreciates the soundness
representations. In dynamical systems terms, it is critical of our own tracking argument); and that it is to be
(given that the RAAM network does compression and nonclassical syntax (in part, perhaps, because she
decompression of propositional representations) how regards the vector/subvector relation as too impover-
each representation R is positioned, in activation space, ished a resource for efficiently incorporating effective
in relation to other representations (i) that are con- syntax). We suppose, in addition, that she resolves to
stituents of R, and (ii) that have R as one of their resort primarily to shared wiring in her engineering
constituents. Only because each representation is suit- plan, rather than dedicated wiring, because on the scale
ably positioned in activation space relative to these she has chosen to work (viz., that of human heads),
others -- i.e., is the right vector -- does each trajectory dedicated wiring would permit at best cognitive systems
along the landscape commencing with any one of these of far too limited capacity. If she can design a system
representations constitute an appropriate instance of that meets these conditions, then she will have found a
processing -- i.e., one that leads to suitable compression route between the horns of the dilemma posed in
(when activation passes from the input to the hidden Section 4.i.
layer) or decompression (when activation passes from Even though the system she is designing must be
the hidden to the output layer). Thus the importance of vastly more complex and sophisticated than a RAAM
the "moving target" strategy, which permits the RAAM network that does nothing but recursive compression
to find the appropriate representation. and decompression of vectorial representations, the
So the emergent engineering design of the system, morals we just extracted concerning RAAM's are
relying as it does on shared wiring rather than dedicated evidently generalizable. Here is how they generalize --
wiring, exploits two crucial and complementary factors: and hence how to steer between the Scylla of connec-
the topology of the activation landscape, and the relative tionism's lack of promise as a medium for classical
positions, on that landscape, of representations bearing syntax, and the Charybdis of the prima facie impossi-
constituency relations to one another. It is no accident bility of effective syntax that is nonclassical.
that these two factors work together this way in the Recall, from Section 3, the four tasks that must be
system's emergent design, since they have co-evolved simultaneously satisfied in producing an adequate de-
during the system's training. Thus it is a matter of sign of a cognitive system. Items (2) and (3) on that list
evolution which points in activation space realize the were to work out a general scheme for the realization of
various representations, and also a matter of evolution total representational states by points in the network's
which points follow which others in the trajectories activation space; and to set the weights on the connec-
along the activation landscape that constitute the sys- tions in such a way that evolution along the activation
tem's processing. landscape from any initial point with representational
Since these positional relations (in activation space) content will be suitably content sensitive (under the
among complex representations and constituent repre- given representation scheme). Since God seeks to satisfy
sentations do figure essentially in the system's engineer- these two constraints using shared wiring, it is crucial to
ing design, and since the design harnesses them in a way position the representation realizations on the land-
that specifically allows processing to be suitably sensi- scape just right. In particular, since she seeks to install
tive to their constituent representations, the positional processing trajectories that are suitably sensitive to the
relations thereby function as syntactic relations. They constituents of representations with complex content, it
function as very rudimentary effective syntax, even is especially important to position the complex repre-
though they do not constitute classical syntax. sentations just right vis-~t-vis the representations of their
4.iv. Structure sensitive processing without classical constituents

Now consider again God's project of cognitive engineering, as set forth in Section 3. God resolves, we suppose, that effective syntax is to be incorporated into the design plan, rather than dedicated wiring, because on the scale she has chosen to work (viz., that of human heads), dedicated wiring would permit at best cognitive systems of far too limited capacity. If she can design a system that meets these conditions, then she will have found a route between the horns of the dilemma posed in Section 4.i.

Even though the system she is designing must be vastly more complex and sophisticated than a RAAM network that does nothing but recursive compression and decompression of vectorial representations, the morals we just extracted concerning RAAM's are evidently generalizable. Here is how they generalize -- and hence how to steer between the Scylla of connectionism's lack of promise as a medium for classical syntax, and the Charybdis of the prima facie impossibility of effective syntax that is nonclassical.

Recall, from Section 3, the four tasks that must be simultaneously satisfied in producing an adequate design of a cognitive system. Items (2) and (3) on that list were to work out a general scheme for the realization of total representational states by points in the network's activation space; and to set the weights on the connections in such a way that evolution along the activation landscape from any initial point with representational content will be suitably content sensitive (under the given representation scheme). Since God seeks to satisfy these two constraints using shared wiring, it is crucial to position the representation realizations on the landscape just right. In particular, since she seeks to install processing trajectories that are suitably sensitive to the constituents of representations with complex content, it is especially important to position the complex representations just right vis-à-vis the representations of their constituents and other representations with the same constituents. The resulting engineering design will harness these positional relations on the landscape, between representations whose contents bear constituency relations to one another, in a way that specifically allows for processing that is suitably sensitive to these semantic relations. The positional relations in activation space, as thus exploited within God's engineering design, will
provide for composition and decomposition (as do RAAM's), and for many fancier processes that are also appropriate to content and structure. These relations will thereby function as effective syntax, even though they do not constitute classical syntax.

There you have it, in a nutshell: the way out of the dilemma might well be an engineering design that cleverly exploits both the shape of the activation landscape and the positioning of representations on it. We now hasten to add a number of remarks by way of elaboration and clarification. The following points should also help to make this picture a fairly plausible epistemic possibility, beyond merely an abstract conceptual possibility.

First, in the cognitive system God designs, the relevant sort of content sensitive processing must of course be much more complex and variegated than is the case with a simple RAAM system that just does compression and decompression. Accordingly, the ways that the engineering design exploits positional relations in activation space, among representations whose contents bear semantic constituency relations to one another, must of course be much more complex and variegated too.

Second, one should not think that shared weights undermine the potential for richly variegated modes of structure sensitive processing. Even though each processing trajectory on the activation landscape will generally be influenced by weights in the network that also influence numerous other processing trajectories, this certainly does not mean that the trajectories must be topologically similar to one another, so similar as to preclude the requisite variability in forms of structure sensitive processing. On the contrary, the activation landscape of a nonlinear dynamical system can manifest exquisitely complex differences in the local topological features that determine various specific trajectories -- i.e., so-called sensitive dependence on initial conditions.¹¹
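The appeal to sensitive dependence on initial conditions can be illustrated with the stock textbook example (ours, not the authors'): two trajectories of the logistic map started at nearly identical points. The single parameter r plays a role loosely analogous to shared weights, in that one and the same rule drives both trajectories, yet they soon come apart.

```python
# Sensitive dependence on initial conditions in a one-dimensional map (illustrative only).
r = 3.9                                  # a parameter value in the chaotic regime
x, y = 0.500000, 0.500001                # two nearly identical initial conditions
for step in range(1, 41):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if step % 10 == 0:
        print(step, round(x, 6), round(y, 6), round(abs(x - y), 6))
# The initial difference of 1e-6 grows until the two trajectories bear no resemblance.
```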
Third, it is important to realize that in a high-dimensional activation space, there are many different kinds of positional relations, among points in the space, that are potentially exploitable to function as relations of nonclassical syntactic constituency. For a network of dimension n, there is of course the total n-dimensional "distance" relation. But in addition, there are also various distance relations in various of the subspaces of this n-space; and also various kinds of positional relations involving various combinations of these different-dimensional distances (and perhaps also involving other aspects of relative position beyond distance relations). The range of distinct, potentially exploitable, positional relations is enormous.
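A small illustration (our own, with arbitrary dimensions and an arbitrarily chosen subspace) of how many distinct positional relations one and the same set of activation vectors supports: full-space distance, distance within a subspace, and angular similarity can each rank the same vectors differently, and each is in principle a separately exploitable relation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12                                               # dimensionality of the activation space (assumed)
u, v, w = (rng.uniform(0, 1, n) for _ in range(3))   # three activation vectors

full = lambda a, b: np.linalg.norm(a - b)                              # total n-dimensional distance
sub = lambda a, b, dims: np.linalg.norm(a[dims] - b[dims])             # distance within a subspace
cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angular similarity

dims = [0, 3, 7]   # an arbitrary three-dimensional subspace of the twelve-dimensional space
print("full distance:    ", round(full(u, v), 3), round(full(u, w), 3))
print("subspace distance:", round(sub(u, v, dims), 3), round(sub(u, w, dims), 3))
print("cosine similarity:", round(cosine(u, v), 3), round(cosine(u, w), 3))
# u may be closer to v than to w in the full space while the ordering reverses in some
# subspace; each such relation is a distinct candidate for carrying nonclassical syntax.
```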
Fourth, there is another perspective one can take toward the incorporation of nonclassical effective syntax into cognitive engineering -- a perspective that evidently constitutes a further gloss on the idea of exploiting positional relations, in activation space, among representation realizations. One can view structure sensitive processing as the exploitation, in the system's engineering design, of various different similarity relations among the activation vectors that realize representations. For instance, if several different activation vectors all have propositional contents that involve a particular individual, then these vectors ought to be relevantly similar in some (possibly fairly abstract) way by virtue of that common semantic constituent of their respective contents; and the system's cognitive engineering should exploit that similarity. From a dynamical systems perspective, the various potentially exploitable similarity relations, among vectors with common constituents, will be just the various positional relations that we have been talking about already.

Fifth, one should keep in mind that God's cognitive engineering design, as characterized in Section 3, really involves total representational states being realized as points in the activation landscape. Typically a single total representational state of a cognitive system will be comprised of several different member representations. What was said above about nonclassical constituency really is also applicable, mutatis mutandis, for total representational states vis-à-vis their member representations: this membership relation too need not be classical, but instead can involve suitable positioning, in activation space, of points that realize total representational states vis-à-vis points that realize their member representations.

Sixth, when a cognitive system's design exploits spatial position relations (i.e., similarity relations) among representations in a manner that constitutes nonclassical effective syntax, the result is a form of effective syntax that does not operate via (i) the syntactic constituents of a semantically complex representation being tokened in the system along with the representation itself, and then (ii) these tokened constituents figuring in the token causal sequence whereby the tokened representation generates its structure-sensitive effects. An important lesson of the story we have been
telling is that structure sensitive effects can be generated in other ways than this. In particular, they can be generated by tokened representations that are semantically complex but lack classical syntactic constituents, provided that the system's engineering design suitably exploits certain positional relations, in activation space, linking the tokened representation to various nontokened representations whose contents are the semantic constituents of the tokened one. (The engineering design of a radio receiver is relevantly analogous. All waves superimpose; hence waves of specific frequencies are not classical constituents of the ambient radio wave. But the effects in the receiver are sensitive only to the structure of non-tokened waves of the specific frequency to which the receiver has been tuned.) Whether structure sensitive effects, of the richness and variety found in natural cognitive systems, can be generated in this manner remains to be seen.
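The analogy can be given a concrete form; the following sketch is ours, and the sample rate and "station" frequencies are arbitrary. Two sinusoids are superimposed into a single waveform, and projecting the summed signal onto one frequency recovers that station's amplitude even though the station is not a separable token part of the waveform.

```python
import numpy as np

fs = 1000                                        # samples per second (arbitrary choice)
t = np.arange(0, 1, 1 / fs)
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# "Tune" to each frequency by projecting the superposed wave onto it via the FFT.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / fs)
amp_50 = 2 * np.abs(spectrum[np.argmin(np.abs(freqs - 50))]) / len(signal)
amp_120 = 2 * np.abs(spectrum[np.argmin(np.abs(freqs - 120))]) / len(signal)
print(round(amp_50, 3), round(amp_120, 3))       # approximately 1.0 and 0.5
```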
5. Representations without rules

We said in Section 2 that our discussion here would bolster our own larger case for the RWR conception of cognition, and would also lend plausibility to the hypothesis that connectionism can evolve into an RWR paradigm. We shall now explain why Sections 3 and 4 have this epistemic import.

One key ingredient of the RWR conception is the contention that cognitive systems with information processing capacities like those of humans must have mental representations with effective syntax. The import of the preceding discussion on this issue is obvious: contrary to certain initial appearances, there are potentially powerful ways to incorporate effective syntax into connectionist systems, and into similar kinds of dynamical systems. The natural way to do this is via nonclassical constituency.

The other essential feature of RWR is the claim that human-like cognitive processing does not conform to programmable, representation level rules (PRL rules). The general overall dialectical bearing of the preceding discussion of this issue is as follows. The discussion helps make manifest, in ways to be elaborated presently, the genuine conceptual and epistemic possibility of cognitive dynamical systems that do sophisticated, content appropriate, information processing (including processing involving effective syntax) but do not conform to PRL rules. The fact that this possibility makes perfectly good sense, and the fact that it is very much a live option epistemically, reinforce our independent arguments (briefly summarized in Section 2) against the PRL rule describability of human cognition; and these facts also add credibility to the suggestion that connectionism can evolve into an RWR paradigm. It is an important development just to see this position as being genuinely conceptually possible, and as epistemically open, since it's been largely invisible from the classicist perspective.

How, then, do the points we made in Sections 3 and 4 help to reveal the conceptual cogency and epistemic viability of cognitive processing that does not conform to PRL rules? In two principal, interrelated, ways. First, when one conceives of certain dynamical systems as information processing systems, and of information processing itself as evolution along a temporal landscape (e.g., along an activation landscape, for connectionist systems), then the possibility arises that, in some such systems, the overall topology of the landscape is so subtle and nuanced -- relative to the representational content of various points on the landscape -- that no set of PRL rules can describe the system's information processing capacities. For one thing, if the weights and activation levels in a network are real-valued, and its dynamics given by a set of differential equations, then, in general, the characteristic function of the network -- the function defined by its dynamical equations -- will not be a computable function. And if the characteristic function of the network is not computable, there is no reason to think that there will be a computable function characterizing its potential representational state transitions.
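For concreteness, here is one standard continuous-time formulation of connectionist dynamics; this is a textbook form supplied for illustration, not an equation the paper itself commits to.

```latex
% Continuous-time activation dynamics for unit i of an n-unit network, with
% real-valued weights w_{ij}, bias \theta_i, time constant \tau_i, and a
% sigmoid squashing function \sigma (an illustrative, standard formulation):
\[
  \tau_i \,\frac{dx_i}{dt} \;=\; -\,x_i \;+\; \sigma\!\Bigl(\sum_{j=1}^{n} w_{ij}\, x_j + \theta_i\Bigr),
  \qquad
  \sigma(z) \;=\; \frac{1}{1 + e^{-z}} .
\]
% For arbitrary real-valued w_{ij}, the flow map x(0) -> x(t) determined by such
% equations is in general not a computable function, and so there need be no
% computable function over the representational states that the vectors x realize.
```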
This assumes an unbounded set of representations, since any finite function is computable. If the set of possible total representational states is finite, it may be possible to characterize the system at least by way of a list which specifies, for each possible representational state, the representational state to which the system will evolve. But this list will be huge, far too large to itself constitute a set of PRL rules (cf. Note 1). And it may be that there is not the kind of order and systematicity within the list to allow it to be reduced to the kind of general, exceptionless principles required for PRL rules.

Furthermore, because of multiple realizability, there might not even be such a list. It might be the case that from one realization of a representation, R, the system will evolve to a realization of representation R1, while from another realization of R the system will evolve to a
realization of representation R2. It is important to understand that this holds even if the system is rule describable at the node level. Given multiple realizability, rule describability at a low level does not imply rule describability at a higher level. There is nothing wrong with this. It is not necessary for a cognitive system to go through the same mental changes every time it has a certain combination of mental states. What is necessary is that it goes into some sensibly related cognitive state most of the time.
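A toy sketch (entirely ours; the weights, the two "realizations", and the content assignment are invented for illustration) of how one deterministic node-level rule can carry two realizations of the same representation R to realizations of different representations:

```python
import numpy as np

# One fixed node-level rule: a single update step of a tiny two-unit network.
W = np.array([[0.9, -0.6],
              [0.4,  0.8]])
step = lambda x: np.tanh(W @ x)

# Suppose, hypothetically, that representation R is realized by any point in some
# small region of activation space; here are two distinct realizations of R.
x1 = np.array([0.50, 0.10])
x2 = np.array([0.40, 0.00])

# A hypothetical content assignment carving the space into regions realizing R1 and R2.
content = lambda y: "R1" if y[1] > 0.2 else "R2"

print(content(step(x1)), content(step(x2)))   # prints: R1 R2
# The node-level dynamics is perfectly rule governed, yet at the representation level
# R evolves to R1 on one occasion and to R2 on another.
```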
Systems that do conform to PRL rules may be just a special case of information processing dynamical systems. And those belonging to this special category may all be -- will all be, if our own arguments against the PRL rule describability of human cognition are correct -- distinctly lacking in certain capacities that humans manifest in abundance: for instance, the capacity to accommodate, flexibly and quickly, the potential relevance of virtually any piece of known or remembered information to virtually any cognitive task.

Second, the range of potential cognitive dynamical systems that do not conform to PRL rules looks even broader yet when one takes into account the ways that nonclassical syntax might be involved in the cognitive engineering of such systems. Engineering designs that resort solely to classical syntax, as a basis for processing that is suitably sensitive to semantic constituency relations among representations, may be just a restricted special case of the possible designs that incorporate effective syntax. In dynamical systems like connectionist networks, there is a tremendously rich range of potentially exploitable nonclassical positional relations in the system's state space; and this enhances substantially the possibility that some such systems can have temporal landscapes that subserve content appropriate processing by way of an overall topology that is far too intricate and finely molded, relative to the contents of the points on the landscape that realize mental representations, for the system's information processing to conform to PRL rules.

That there are no PRL rules statable over representations themselves does not mean that there won't be important, scientifically illuminating, generalizations at this level. There still could be, and likely would be, a range of soft laws: generalizations containing ceteris paribus clauses subject to indefinitely many same level exceptions (Horgan and Tienson, 1990). In dynamical systems terms, these would be an important global aspect of such systems, apart from their local fine structure. These soft laws might really hold not merely for a single network, but rather for a whole class of similarly related networks. (It is plausible that this is what human neural systems are, given the number of neurons and neural connections, and the limited space for genetic encoding of the wiring diagram.)

6. Concluding remarks

We have two final points to make about cognitive dynamical systems. The first concerns learning and memory. As we remarked in Section 2, we think it unlikely that memory, in RWR cognitive systems (including humans), involves storage of representations. Rather, memory and knowledge are likely to be a matter of the system's cognitive dispositions: dispositions to evolve from one total representational state to another in ways that are appropriate to the content of what the system knows, believes, and remembers. The operative dispositions will include, but certainly need not be restricted to, tendencies to recreate active representations of the known/believed/remembered information, when appropriate. And learning is likely to be a matter of the acquisition of additional such dispositions. The point we wish to stress here is this: in dynamical systems terms, these dispositions are to be subserved by the same two complementary aspects of the system's cognitive engineering that figured so importantly in our discussion of effective syntax: (i) the global topology of the activation landscape (as determined by shared wiring), and (ii) the relative positions of representations on the landscape. This approach to learning and memory, different as it is from classicism's reliance on mental filing cabinets, is actually rather natural looking within the dynamical systems framework.

The second point concerns a feature of vectorial mental representations. We have stressed the likelihood of, and the potential advantages of, nonclassical syntactic constituency relations for such representations. As RAAMs illustrate, nonclassical constituency relations require distributed representations, involving activation across many nodes, rather than just one. Cognitive dynamical systems are, we think, likely to employ mental representations that are highly distributed. Distributed representation not only makes more efficient use of representational resources than does local (single node) representation; it also tends to situate the various representations in richer relative position relations to
one another in activation space; and these relations, as already stressed, are potentially exploitable for purposes of cognitive engineering.

The morals of these two points, as applied to the cognitive systems found in nature, are that we should expect distributed representations to be the norm (and so should not be surprised that these are what Freeman and his colleagues (Skarda and Freeman, 1987; Freeman, 1991) have found in their studies of the rabbit olfactory system); and that we should not expect to find representations in the cognitive system (stored in memory) when they are not active in processing. Our main concern in this paper has been with structurally complex representations. We have argued elsewhere, and reiterated here, that natural cognizers (like humans and rabbits) must have systems of structured representations. Our main point has been that we should not expect this structure to be discernible in the representations themselves. The structure may well consist in their being composable and decomposable in certain systematic ways (such as is done by RAAMs) within the cognitive systems, and more generally in their relations, within the system, to other representations. In systems of this sort, relations among representations will be systematic, because they rely on shared wiring. And they will hold relative to the wiring and the weights within the cognizer -- that is, relative to what we have called the cognitive dynamical system's particular temporal landscape.

Notes

1 There are on the order of 10²⁰ English sentences of 20 words or less. For most of these there is a potential corresponding thought, and potential thoughts far outstrip sentences because of our ability to make relevant discriminations that we lack linguistic resources to describe. For comparison, there are 10¹¹ or so neurons in the human brain, and there have been around 10¹⁷ seconds in the history of the universe.

2 It may, of course, turn out that such processing is (mathematically cum physically) impossible. But this is something that cannot even be investigated until the conceptual possibility is clearly seen.

3 For a useful introduction to dynamical systems, with emphasis on visual presentation of key ideas, see Abraham and Shaw (1984). There are numerous recent books discussing dynamical systems, intended for a general audience and highlighting dramatic recent developments involving chaos and fractals; the best known is Gleick (1987).

4 For a compact introduction to connectionism, see Tienson (1987). For a more thorough introduction, see Bechtel and Abrahamsen (1991). The "bible" of connectionism is the two volume collection, Rumelhart and McClelland (1986).

5 We speak of representations being realized by activation patterns, rather than being identical to them, because typically a given representation will be multiply realizable in a given connectionist system. More below on multiple realization, and on the reasons why it is typical in connectionist systems.

6 Since current connectionist networks cannot alter their own weights, this condition in effect assumes that certain systems with such self-alteration capacity would still count as connectionist networks; cf. the final paragraph of Section 3.iii.

7 In this and the next several sections, we will talk about activation vectors as being representations rather than as realizing representations, for simplicity of exposition and because this is the more common connectionist way of talking. It is worth keeping in mind, though, that in general representations will be multiply realizable, at least in networks in which nodes can take on a range of activations rather than being (for instance) binary valued. The current activation vector is a tokening of the representation. The same representation can be tokened by (many) other activation vectors (much as the same sentence can be printed in many different fonts, and written by many different hands).

8 Another important such proposal is the tensor product representation scheme developed by Smolensky (1990). (The only extant connectionist model we know of employing this scheme is the implementation of a simple production system by Dolan and Smolensky (1989).) Smolensky (1987, 1991) argues that a connectionist paradigm employing tensor product representations can steer between the horns of a dilemma posed for connectionism by Fodor and Pylyshyn (1988). Fodor and McLaughlin (1990) reply to Smolensky by arguing, in effect, that tensor product representations cannot provide effective syntax because they lack classical constituents. In Horgan and Tienson (forthcoming a), we argue that Fodor and McLaughlin's critique of Smolensky is unpersuasive; the present paper, however, says substantially more about how effective syntax could be possible without classical constituency. Although it is more convenient and natural, for present purposes, to use Pollack's representation scheme for illustration and as a basis for addressing the dilemma posed in Section 4.i, we could also have done so using Smolensky's tensor product representations. Tensor products provide a very natural way of encoding syntactic constituency relations, such as that 'John' is the subject of the sentence, and 'Mary' the direct object. But tensor products alone make recursion awkward at best. Pollack's RAAM technique allows recursion to go smoothly, but does not clearly encode constituency relations. Since we wrote this, two systems have been proposed independently that aim at combining the virtues of tensor products and RAAMs (Berg, 1991; Plate, 1991).
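For readers unfamiliar with the scheme, here is a minimal sketch of tensor product binding and unbinding; it is our own illustration using random vectors, not Smolensky's or Dolan and Smolensky's actual models, and the roles, fillers, and dimensionality are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64                                   # dimensionality (arbitrary)

def unit():
    v = rng.normal(0.0, 1.0, d)
    return v / np.linalg.norm(v)

# Role and filler vectors (hypothetical stand-ins for 'subject', 'object', 'John', 'Mary').
subj, obj = unit(), unit()
john, mary = unit(), unit()

# Bind each filler to its role with an outer product, and superimpose the bindings.
T = np.outer(subj, john) + np.outer(obj, mary)

# Unbind: multiplying by a role vector approximately recovers its filler, because
# random high-dimensional role vectors are nearly orthogonal.
recovered = subj @ T
print(round(float(recovered @ john), 3),   # close to 1: the subject filler is John
      round(float(recovered @ mary), 3))   # close to 0: not Mary
```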
9 RAAM systems would give us representational schemes that work more like 'drank' in 'Jill drank water' or Spanish 'hablo' (= I speak) than like the familiar languages of formal logic and classical computer science. But as these examples (and classical Greek and Latin) show, such representational schemes are perfectly good as languages.

10 The RAAM model constructed by Chalmers (1990) is intriguing and suggestive. Chalmers first uses a standard RAAM network to construct RAAM representations of active sentences and
corresponding passives. Then he trains another network to convert the representations of active sentences into the corresponding representations of passives. This new network does active/passive conversions without any intermediate extracting of constituents. And it also displays some capacity to generalize correctly to items beyond the training set, thereby indicating that it has not simply learned the specific mapping of actives to passives that constituted its learning regimen.

11 Connectionist networks are nonlinear dynamical systems, as they provably must be in order to avoid well known limitations in their computational power similar to the limitations of perceptrons. As striking recent developments in dynamical systems theory have taught us, the topology of a nonlinear dynamical system's temporal landscape can be enormously complex. This complexity can involve, among other factors, the nature of the attractors (including chaotic ones); the nature of the basin boundaries (including fractal ones); and the relations among the attractor basins (including highly intricate intertwinings, some of them fractal).

References

Abraham, Ralph H. and Shaw, Christopher D.: 1984, Dynamics: The Geometry of Behavior, 4 Volumes, Aerial Press, Santa Cruz.
Bechtel, William and Abrahamsen, Adele: 1991, Connectionism and the Mind, Basil Blackwell, Oxford.
Berg, George: 1991, 'Combining the strengths of PDP and X-bar syntax', Technical Report 91--5, Department of Linguistics and Cognitive Science, State University of New York at Albany.
Chalmers, David J.: 1990, 'Syntactic transformations on distributed representations', Connection Science 2, 53--62.
Dolan, Charles P. and Smolensky, Paul: 1989, 'Implementing a connectionist production system using tensor products', in G. Hinton and T. Sejnowski (eds.), Proceedings of the Connectionist Models Summer School, Morgan Kaufmann, Los Altos.
Fodor, Jerry and Pylyshyn, Zenon: 1988, 'Connectionism and cognitive architecture: a critical analysis', Cognition 28, 3--71.
Fodor, Jerry and McLaughlin, Brian P.: 1990, 'Connectionism and the problem of systematicity: why Smolensky's solution doesn't work', Cognition 35, 183--204.
Freeman, Walter J.: 1991, 'The physiology of perception', Scientific American 264, 2, 78--85.
Gleick, James: 1987, Chaos: Making a New Science, Viking Penguin, New York.
Hinton, Geoffrey E.: 1990, 'Mapping part-whole hierarchies into connectionist networks', Artificial Intelligence 46, 47--75.
Horgan, Terence E. and Tienson, John L.: 1987, 'Settling into a new paradigm', Southern Journal of Philosophy 26, Supplement, 97--114.
Horgan, Terence E. and Tienson, John L.: 1989, 'Representations without rules', Philosophical Topics 17, 147--174.
Horgan, Terence E. and Tienson, John L.: 1990, 'Soft laws', Midwest Studies in Philosophy 15, 256--279.
Horgan, Terence E. and Tienson, John L.: forthcoming a, 'Structured representations in connectionist systems?', in S. Davis (ed.), Connectionism: Theory and Practice, Oxford.
Horgan, Terence E. and Tienson, John L.: forthcoming b, Connectionism and the Philosophy of Psychology: Representational Realism without Rules, MIT/Bradford, Cambridge.
Plate, Tony: 1991, 'Holographic reduced representations', Technical Report CRG-TR-91-1, Department of Computer Science, University of Toronto.
Pollack, Jordan B.: 1989, 'Implications of recursive distributed representations', in D. Touretzky (ed.), Advances in Neural Information Processing, Morgan Kaufmann, Los Altos, pp. 527--536.
Pollack, Jordan B.: 1990, 'Recursive distributed representations', Artificial Intelligence 46, 77--105.
Rumelhart, David E., McClelland, James L., and the PDP Research Group (eds.): 1986, Parallel Distributed Processing, 2 Volumes, Bradford/MIT, Cambridge.
Skarda, Christine A. and Freeman, Walter J.: 1987, 'How brains make chaos in order to make sense of the world', Behavioral and Brain Sciences 10, 161--195.
Smolensky, Paul: 1987, 'The constituent structure of connectionist mental states: a reply to Fodor and Pylyshyn', Southern Journal of Philosophy 26, Supplement, 137--163.
Smolensky, Paul: 1990, 'Tensor product variable binding and the representation of symbolic structures in connectionist systems', Artificial Intelligence 46, 159--216.
Smolensky, Paul: 1991, 'Connectionism, constituency, and the language of thought', in B. Loewer and G. Rey (eds.), Fodor and His Critics, Blackwell, Oxford.
Tienson, John L.: 1987, 'Introduction to connectionism', Southern Journal of Philosophy 26, Supplement, 1--16.

Memphis State University
