
Chapter 8

An Ordered Chaos: How Do Order Effects Arise in a Cognitive Model?

Fernand Gobet
Peter C. R. Lane

This chapter discusses how order effects arise within EPAM, an influential computational theory
of cognition developed by Feigenbaum and Simon. EPAM acquires knowledge by constructing a
discrimination network indexing chunks, closely integrating perception and memory. After a brief
description of the theory, including its learning mechanisms, we focus on three ways order effects
occur in EPAM: (a) as a function of attention; (b) as a function of the learning mechanisms; and
(c) as a function of the ordering of stimuli in the environment. We illustrate these three cases with
the paired-associate task in verbal learning and with an experiment using artificial material. In the
discussion, we address some of the implications of this work for education, including how to order
hierarchically arrangeable material, and the need to focus learners on important and relevant
features.

Chapter 2 by Cornuéjols presents some current research in machine learning, and Chapter 5 by Nerb et al. discusses several process models that have been used to simulate human behavior. In this chapter we consider the origins of order effects in a detailed theory of human cognition. This theory arose early in the development of the field now known as artificial intelligence (AI), and it is worth considering in brief the context from which it emerged. At the time (the 1950s and 1960s), AI was generally seen as an attempt to model the way in which the human mind represents and uses knowledge. Therefore, the early AI systems were, in many respects, cognitive models. Two broad ideas were explored in those models that learned: one set of models required explicit, supervised instructions; the other, operating more "unconsciously," extracted patterns in an unsupervised manner.

The models based on supervised learning relied to a large extent on simplified forms of human tutoring. A good example of such a model is Winston's W system (Winston, 1975, 1992), which learned conceptual maps for identifying complex and varied visual forms of abstract entities such as an arch. In this model, the quality of the conceptual map built by the system depended on that of the presented examples. Winston developed the idea of a "near miss" to define the optimal presentation sequence; the near miss would make some feature salient, which the learner would then use to induce some property of the concept being learned. For example, given an initial example of an arch, the second example might be a "nonarch," without a gap between the supporting pillars. This example fails to be an arch in only one property (the lack of a gap between the pillars), and so this is extracted by the learner and emphasized as an important property of an arch.

Unfortunately, such a level of supervision depends on the teacher possessing full knowledge of the domain and the learner having an optimal internal representation of the domain. But even more so, it relies on the internal matching process of the learner extracting precisely those features that the teacher intended to be

108 FUNDAMENTAL EXPLANATIONS OF ORDER

the most salient conceptual differences between the two examples. The requirement of absolute knowledge by the teacher of both the domain and the learner has meant that such fine-grained supervised instruction did not scale up well to more complex systems or environments. Other cognitive models from this period rely on a less-supervised instruction routine and allow the individual learner greater scope in building up an internal representation of the presented examples. The teacher supervises the learner's overall performance rather than the details of the knowledge gained. This approach to computer instruction is followed in the field now known as machine learning.

In this chapter we discuss one of these less-supervised theories of human cognition, one based directly on perception. The theory, known as EPAM (Elementary Perceiver and Memorizer), relies on an attentional mechanism and short-term memory to extract perceptual chunks from the observed data and then stores these chunks in a discrimination network for distinguishing between training examples.1 EPAM has been used to account for certain data on the effect of order in learning. This chapter presents a simplified version of EPAM's learning mechanisms and some experiments that examined the way in which the order of training examples affects the content and performance of the discrimination network learned. Thus, this chapter covers a single topic in relatively great detail, showing how order effects arise in an established cognitive model.

A COGNITIVE MODEL OF PERCEPTION AND MEMORY

The cognitive model we discuss in this chapter was initially proposed in Edward Feigenbaum's 1959 PhD dissertation. Feigenbaum introduced a theory of high-level perception and memory, which he called EPAM. The basic elements of EPAM include low-level perception, a short-term memory, attentional mechanisms, and a discrimination network for indexing items in long-term memory. Although EPAM includes all of these elements in a complete theory of cognition, most of the interest in the theory has focused on the discrimination network. Many of the amendments to EPAM address variations on the learning mechanisms possible in acquiring this network; the various improvements and amendments may be followed in Feigenbaum (1963), Feigenbaum and Simon (1962, 1984), Richman and Simon (1989), and Richman, Staszewski, and Simon (1995). A more recent outgrowth of this line of research is CHREST (Chunk Hierarchy and REtrieval STructures), for which the reader can consult Gobet (1993, 1996), Gobet and Lane (2004, 2005), Gobet and Simon (2000), and Gobet et al. (2001).

This section begins with an overview of EPAM and its basic assumptions, considering any implications for the way stimuli order might affect learning and processing. Then we describe in more detail how the discrimination network is used to represent and learn about information acquired by perception.

Overview

Figure 8.1 contains a schematic overview of the five basic areas of the EPAM architecture: processes for acquiring low-level perceptual information, short-term memory (STM) and attentional mechanisms, processes for extracting high-level features from the low-level information, its indexing in a discrimination network, and the link from this network into other items of long-term memory, such as productions or schemas. The STM mediates the amount of processing that can occur in any of these areas and guides the attentional mechanisms, which, in this case, are eye fixation points. As an implemented theory, EPAM has little to say about feature extraction: Implementations assume merely that it occurs and supplies suitable features for discrimination. Also, as a theory of perception, EPAM has had little to say about the contents of long-term memory, although Richman et al. (1995), Gobet (1996), and Gobet and Simon (2000) discuss ways in which traditional models of semantic memory can be implemented within the theory.

The central processing of the EPAM architecture therefore revolves around the acquisition of a discrimination network while the attentional mechanisms retrieve high-level perceptual features from the outside world. For the purposes of this chapter, we assume that the sole role of the discrimination network is to distinguish perceptually distinct phenomena from one another.

Feigenbaum was supervised by Herbert Simon, and thus EPAM has some features that arise from Si-

1. Although our emphasis is on its use as a theory of human cognition, EPAM can also be considered a set of algorithms for machine learning. The system is available at http://homepages.feis.herts.ac.uk/~comqpcl/chrest-shell.html

[Figure 8.1 here: boxes for Low-Level Perception, Feature Extraction, the Attention Mechanism, STM, the Discrimination Network, and Long-Term Memory with High-Level Concepts (e.g., schemas and productions, including strategies).]

figure 8.1. Overview of the EPAM architecture.

mon's view of human cognition as expressed in, for example, Simon (1981). One of these is that seriality plays an important role. The central processing is all assumed to operate in a serial fashion; that is, the eye fixates on an object, features are extracted and processed in the discrimination network, and then a further eye fixation is made, and so on. In addition, the STM operates as a queue, in which the first elements to enter are also the first to leave; the STM, as suggested by Miller (1956), has a limited capacity, which experiments with EPAM have shown to consist of between three and seven chunks. Further influences from Simon include the fact that learning is sufficient to generate satisfactory but not necessarily optimal performance, and processing is constrained by capacity and time limits. The limitations include various time parameters, which permit direct empirical tests. Examples include the length of time required for eye fixations and the processing of information in the discrimination network.

Hierarchical Representation of Objects

In EPAM, all information is stored as an ordered list of features. For example, the word "cat" may be stored as the ordered list of letters c, a, and t. These features are initially extracted from the image by using the attentional mechanisms and feature extraction module. However, these features may also be adjusted to suit what has been learned previously. For example, once the system has learned words such as "cat," the sentence "the cat sat on the mat" would be represented as an ordered list of words and not individual letters. We can see this more clearly after considering both the discrimination network built up by the learning mechanisms within the cognitive model and its role in guiding perception.

Searching and Creating the Discrimination Network

The information experienced by the cognitive model is stored within the discrimination network. This network consists of nodes, which contain the internal representations (known as images) of the experienced data. The nodes are interconnected by links, which contain tests by which a given item is sorted through the network. In this chapter we assume that tests are simply chunks from the discrimination network, that is, individual letters or words. Each test applies simply to the next letter (or set of letters) in the stimulus. In the broader theory of EPAM, more complex tests are possible in which, for example, color is tested.

Figure 8.2 contains an example network in which the black circles depict separate nodes, and the node images are contained in ellipses next to them. Note that an image need not contain the same information as the set of tests that reaches that node. For example, the node reached after the test "t" from the root node has "time" as its image, and not just "t." The reason

[Figure 8.2 here: a discrimination network in which letter tests such as t, d, c, h, r, a, and e lead to nodes with images such as "time," "cow," "dog," "tray," "this," "cart," "ch," "th," "the," and "cat."]

figure 8.2. Example of a discrimination network.

for this will become clear when we describe the process of familiarization.

An item is sorted through a network as follows: Starting from the root node of the network, the links from that node are examined to see whether any of the tests apply to the next letter (or letters) of the current item. If so, then that link is followed, and the node reached becomes the new current node; this procedure is repeated from this current node until no further links can be followed.

The learning procedure is illustrated in Figure 8.3 and proceeds as follows: Once an item has been sorted through the network, it is compared to the image in the node reached. If the item and image agree but there is more information in the item than in the image, then familiarization occurs, in which further information from the item is added to the image. If the item and image disagree in some feature, then discrimination occurs, in which a new node and a new link are added to the network. Estimates have been made from human data of the time each of these operations requires: for discrimination, about 8 seconds, and for familiarization, around 2 seconds (Feigenbaum & Simon, 1984).

One further important aspect is the way the model determines the test to be added to the new link during discrimination. At the point of discrimination, the model will have sorted the new item to a node, and it will mismatch some feature in the image there. In order to progress, the model must create a test from the information in the new item. It does this by sorting the mismatching features of the new item through the discrimination network. The node reached from these mismatching features is used as the test on the link. For example, "cat" was used as the test in Figure 8.3. Thus, tests may be formed from lists of features and not just single features; for example, in the textual domain, tests may be words and not discrete letters. We provide some pseudocode in the Appendix to this chapter for readers who wish to construct their own implementation of this discrimination network algorithm.

[Figure 8.3 here: two network fragments, before and after each learning operation.]

figure 8.3. Examples of learning. Only a fragment of the entire network is shown. Left: Presenting "the dog" leads to familiarization, by which information is added to the current node; note that the entire word "dog" can be added to the image as it appears elsewhere in the network. Right: Subsequently presenting "the cat" leads to discrimination, by which an extra link and node are added to the network; note that the link can use the whole word "cat" as a test as it appears elsewhere in the network.

EPAM as a Psychological Theory

As a psychological theory, EPAM has several strengths. It is a simple and parsimonious theory, which has few degrees of freedom (mainly subjects' strategies). It makes quantitative predictions and is able to simulate in detail a wealth of empirical phenomena in various domains, such as verbal learning (Feigenbaum & Simon, 1984), context effect in letter perception (Richman & Simon, 1989), concept formation (Gobet, Richman, Staszewski, & Simon, 1997), and expert behavior (Gobet & Simon, 2000; Richman, Gobet,
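The sorting and learning procedures just described can be rendered as a short program. The sketch below is our own minimal Python illustration, not EPAM itself: it uses single letters as tests (whereas EPAM can also use whole chunks as tests), treats the root node like any other node, and ignores timing. The names `Node`, `sort_item`, and `learn` are our inventions.

```python
class Node:
    """A node in the discrimination network: an image (a list of
    features) plus links, each labeled by a one-letter test."""
    def __init__(self):
        self.image = []
        self.links = {}

def sort_item(root, item):
    """Sort an item through the network: starting at the root, follow
    any link whose test matches the next feature, until none applies."""
    node, pos = root, 0
    while pos < len(item) and item[pos] in node.links:
        node = node.links[item[pos]]
        pos += 1
    return node, pos

def learn(root, item):
    """One learning step: familiarize if item and image agree,
    discriminate if they disagree in some feature."""
    node, pos = sort_item(root, item)
    item = list(item)
    overlap = min(len(node.image), len(item))
    if node.image[:overlap] == item[:overlap]:
        # Agreement: familiarization adds further information from the
        # item (here, one more feature per presentation) to the image.
        if len(item) > len(node.image):
            node.image = item[:len(node.image) + 1]
    elif pos < len(item):
        # Disagreement: discrimination adds a new link and node; for
        # simplicity the test is the item's next unconsumed feature.
        child = Node()
        child.image = item[:pos + 1]
        node.links[item[pos]] = child

# Repeated presentations grow a fragment like that of Figure 8.2.
root = Node()
for word in ["time", "cow", "cat", "cart"]:
    for _ in range(6):
        learn(root, word)
node, _ = sort_item(root, "cat")
print("".join(node.image))  # -> cat
```

With chunk-valued tests and the timing parameters added, this skeleton grows into the full discrimination step for which the Appendix gives the chapter's own pseudocode.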

Staszewski, & Simon, 1996; Richman, Staszewski, & Simon, 1995; Simon & Gilmartin, 1973). The most recent addition to the EPAM family, CHREST, has enabled detailed simulations in domains such as expert behavior (de Groot and Gobet, 1996; Gobet, 1993; Gobet, de Voogt, & Retschitzki, 2004; Gobet & Waters, 2003), the acquisition of syntactic structures (Freudenthal, Pine, & Gobet, 2006) and vocabulary (Jones, Gobet, & Pine, 2005), the learning and use of multiple diagrammatic representations (Lane, Cheng, & Gobet, 2000), concept formation (Lane & Gobet, 2005), and the integration of low-level perception with expectations (Lane, Sykes, & Gobet, 2003). The recursive structure of EPAM's network captures a key feature of human cognition, and its redundancy (small variations of the same information can be stored in various nodes) ensures that EPAM is not brittle or oversensitive to details. Finally, its learning mechanisms and the emphasis on serial processes make it an ideal candidate for studying effects of order in learning.

ORDER EFFECTS

There are three main sources of order effects in EPAM: (a) ordering of the attentional strategies used; (b) effects arising from routines that train the discrimination network; and (c) ordering of the stimuli in the environment (e.g., the curriculum). All three are similar in the sense that they change the form of the discrimination network by altering the order of appearance of examples and their features. Each of these three effects may be explored by using computer simulations and, to a certain extent, experiments with human subjects. We illustrate the first two effects with the paired-associate task in verbal learning, which was used in the initial EPAM publications to demonstrate its ability to emulate human performance. Two order effects were observed in these experiments: the serial position effect, in which a subject initially learns items in salient positions (the effect of an attentional strategy), and forgetting, in which a subject's previously correct knowledge is somehow lost (the effect of the learning routines). The third order effect is investigated in a simple set of experiments that explore the effects of stimuli order.

Paired-Associate Task in Verbal Learning

The role of order effects was indirectly explored in the first publications on EPAM (Feigenbaum & Simon, 1962). Feigenbaum and Simon were interested in simulating aspects of the paired-associate task in verbal learning. In this task, items are presented in pairs (stimulus-response), and subjects try to learn the appropriate response given a particular stimulus. For example, during the first presentation of the list, the experimenter may present pairs such as

[DAG—BIF]
[TOF—QET]
[DAT—TEK]

During a later presentation of the list, only the stimuli are presented, and the subjects have to propose a response. The correct pair is then displayed as feedback. For example:

[DAG—?]
[DAG—BIF]
[TOF—?]
[TOF—QET]

The list is presented as many times as required until some level of success is attained; for instance, the subjects may be required to make two consecutive correct passes. Two of the numerous anticipation learning phenomena observed in the human data and simulated by Feigenbaum and Simon are of interest here. We next discuss the serial position effect, which is a result of attentional strategies, and the phenomenon of forgetting and then remembering again, which is a result of EPAM's learning mechanisms. When modeling such experiments in EPAM, we assume that the discrimination network is learning to distinguish between the stimuli. The nodes are then linked to their appropriate responses, which are stored elsewhere in long-term memory.

Serial Position Effect

When subjects are presented with a list of items, such as the stimulus-response pairs above, they will pay attention to some items before others. Of interest here is that the initial item that subjects focus on is not selected arbitrarily. In the simplest case, they will notice any item that stands out, such as a brightly colored word, and remember it before noticing the other items. This effect, known as the Von Restorff effect (Hunt,

1995), is found with the other senses, too: Subjects pay attention to loud sounds first. In the absence of any particularly prominent feature, subjects will begin from "anchor points," in this case, the beginning and end of the list. This leads to the serial position effect, where the items close to the beginning and end of the list are learned better than the items in the middle.

Feigenbaum and Simon (1984) showed that two postulates of EPAM account for the serial position effect. First, it takes a constant amount of time to create a new node in memory (experiments have shown this requires roughly 8 seconds). This limitation means that subjects simply do not have enough time to memorize entire lists of novel items; thus they tend to learn more about what they notice most. The second reason to account for the serial position effect is the attentional strategy the subject uses to select which item to pay attention to first. The default strategy that subjects most often use is to begin from the anchor points, though this may be overridden by particularly salient items. In addition, subjects may consciously decide to look at the items in the middle of the list first, for example. Evidence for the role of attentional strategies on the ordering of learning has been discussed in Feigenbaum and Simon (1984) and Gregg and Simon (1967).

Forgetting and Remembering Again

Another, almost curious, phenomenon that has been observed in subjects is that learned information may sometimes be forgotten and then remembered later on. For instance, Feigenbaum and Simon (1984) discussed how subjects may correctly recall the response to a given stimulus on one trial, t, then incorrectly recall the response to the same stimulus on trial t+1, but then get the correct response on trial t+2. Because the discrimination network, which stores all of the information learned by the cognitive model, cannot delete or erase information, one might wonder whether the model can explain such phenomena.

However, Feigenbaum and Simon provide an explanation based on the learning mechanisms that create the discrimination network. The basic idea is that, as the discrimination network is modified, additional test links may be added to any node. For instance, consider Figure 8.4. At the beginning, the stimulus DAG is sorted to a node using a single test on its first letter (d); that node indexes the correct response, BIF (indicated by the dashed line). As further learning occurs within the network, additional links are added to this node. As the second panel of Figure 8.4 shows, this may result in DAG now being sorted to a deeper level than before, using tests on its two first letters (first d and then a). Thus, the model will be considered to have "forgotten" the correct response, BIF, whereas what has really happened is that the response has been bypassed. With additional learning, a new node may be created, using tests on all three letters of the stimulus (first d, then a, and finally g), and the correct response may be associated with this new node. We leave as an exercise to the reader to show how this mechanism can be used to explain oscillation, where the cycle of knowing-forgetting-knowing occurs several times.

Experimental Exploration of Order Effects

In this section we show how one may explore order effects arising from the environment by using a rela-

[Figure 8.4 here: three network fragments. First panel: <DAG–BIF> is learned correctly; a test on d reaches a node with image DAG, linked (dashed line) to the response BIF. Second panel: <DAT–TEK> is learned, so that <DAG–?> now bypasses the correct node via a further test on a, reaching the node with image DAT and response TEK. Third panel: <DAG–BIF> is relearned correctly at a new node reached by a final test on g.]

figure 8.4. Learning, forgetting (by-passing), and learning again.
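The knowing-forgetting-knowing cycle of Figure 8.4 can be traced with a hand-built network fragment. The dictionary encoding below (each node holds an image, its links, and any associated response) is our own illustrative representation, not EPAM's actual data structures.

```python
def sort_stimulus(root, stimulus):
    """Follow matching letter tests as deep into the network as possible."""
    node, pos = root, 0
    while pos < len(stimulus) and stimulus[pos] in node["links"]:
        node = node["links"][stimulus[pos]]
        pos += 1
    return node

# Trial t: <DAG-BIF> has been learned; a single test on D reaches a
# node whose image indexes the correct response, BIF.
net = {"links": {"D": {"image": "DAG", "links": {}, "response": "BIF"}}}
print(sort_stimulus(net, "DAG")["response"])  # -> BIF (correct recall)

# Trial t+1: learning <DAT-TEK> has added a deeper link testing A, so
# DAG now sorts past the old node; BIF appears "forgotten" (bypassed).
net["links"]["D"]["links"]["A"] = {"image": "DAT", "links": {},
                                   "response": "TEK"}
print(sort_stimulus(net, "DAG")["response"])  # -> TEK (incorrect recall)

# Trial t+2: further learning adds a node testing G, and BIF is
# associated with it; the response is "remembered" again.
net["links"]["D"]["links"]["A"]["links"]["G"] = {"image": "DAG",
                                                 "links": {},
                                                 "response": "BIF"}
print(sort_stimulus(net, "DAG")["response"])  # -> BIF (relearned)
```

The oscillation exercise in the text follows the same pattern: each round of discrimination on a neighboring stimulus can again push the sorted depth past the node holding the association.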



tively simple experimental setup. We are interested in how changes in the order in which the discrimination network receives data to process affect the network. To investigate this, we choose a simple target domain, train the model with different orders of items from that domain, and then investigate the changes in observed performance of the resulting networks.

The Data to Learn

The experiment here tests EPAM's ability to distinguish patterns consisting of ordered sequences of random digits. For simplicity, we assume that only the digits 0 to 4 may appear in each sequence; example patterns would then be (0), (3), (2 1), (4 1 2), and so forth. The discrimination network will identify each pattern using individual digits or series of digits (chunks) as tests, with patterns forming the images at the nodes in the network.

We present the discrimination network with a dataset consisting of a number of patterns. We investigate three kinds of order on the dataset: small first, large first, and random. "Small" and "large" refer to the length of the pattern. Therefore, "small first" means that the learner sees the patterns in the dataset in order of increasing size; "large first," in order of decreasing size; and "random," in arbitrary order. For example, the random sequence of patterns ((0) (4 1 2) (2 1) (0 4 2 1 1)) would be ordered small first as ((0) (2 1) (4 1 2) (0 4 2 1 1)) and large first as ((0 4 2 1 1) (4 1 2) (2 1) (0)).

Testing for Order Effects

To test for order effects, we present EPAM with a dataset of patterns in the three different orders. We then compare the learned networks first by their performance and second by contrasting their form and content. We illustrate this procedure by comparing the networks learned for datasets consisting of up to 10,000 patterns, each pattern of at most 10 digits, and each digit one of 0, 1, 2, 3, or 4. We first generate a database of 10,000 random patterns and then select subsequences of this database for training the network. We thus produce separately trained networks for sequences of 1,000 patterns, 2,000 patterns, and so on up to the full 10,000 patterns. For each length of sequence we produce three networks, one for each ordering of the patterns. After training, each network's performance is measured by testing it on the entire database, with learning turned off.

For each pattern, the discrimination network must first sort the pattern through its tests and assign it to an appropriate node. At that node only two things can occur: Either the pattern matches the image at the node, or it mismatches. If the pattern matches the image, then we say that the pattern was correctly sorted, though we distinguish those cases where the image is an exact match from those where it is only a subset. If the pattern mismatches the image, then the pattern was incorrectly sorted. Figure 8.5 shows graphs of the performance for the three orderings based on increasing numbers of patterns from the dataset; the top graph reports the proportion of nodes reached at which the item and the image in the node matched exactly, and the lower graph the proportion in which the item was a subset of the image in the node. It can be seen that the small-first ordering performs substantially worse than the other orders in terms of exact matching but is clearly superior in terms of subset matching; note that the range of the vertical axis differs.

The next point of interest is how well the learned networks from each order compare with one another. We use two measures. The first simply adds up the number of nodes in the network, and the second counts the average number of tests used to index each example in the test set. Because each test will require a certain amount of time to be applied, the latter test helps measure the resources required by the network when in use. Figure 8.6 contains graphs showing how these two quantities vary across the networks. The number of nodes in each network does not change much for the separate orders, but the small-first order requires a noticeably greater number of tests for sorting of test items.

Discussion

The results in the graphs indicate that the different orders do indeed produce different discrimination networks and consequently varying levels of performance. We have seen that the small-first order produces a deeper network, that is, one requiring more tests. This occurs because, on average, each pattern is longer than the preceding ones during training. Therefore the network will discriminate among the longer patterns at a greater depth. This also explains the performance differences. The small-first order works

[Figure 8.5 here: two line graphs plotting proportion correct against the number of patterns in the training set (1,000 to 10,000), with separate curves for the small-first, large-first, and random orders. Top panel: performance of the network on exact matching. Bottom panel: performance of the network on subset matching.]

figure 8.5. Performance of the network given three different orderings.

poorly for exact matching because there are more of the larger patterns, and, because these appear only at the end of the training set, there is not enough time for familiarization to learn them completely. In addition, the small patterns, because they do not appear later in the training, will not be properly familiarized.

EPAM AND INSTRUCTION: THE BROADER VIEW

We have seen that order effects arise due to different orderings of stimuli. There are a number of consequences of this. On the positive side, it should be possible to order stimuli such that many primitives

[Figure 8.6 here: two line graphs plotting, against the number of patterns in the training set (1,000 to 10,000), the number of nodes in the network (top panel) and the average number of tests per pattern (bottom panel), with separate curves for the small-first, large-first, and random orders.]

figure 8.6. Number of nodes and average depth of networks given three different orderings.

and useful chunks will be learned first and thus support the acquisition of more complex information. On the negative side, if stimuli are inappropriately ordered, then learning may become progressively harder and result in suboptimal performance. One of the clearest predictions that EPAM makes is that relearning a domain with a new organization is very difficult because the structure of the discrimination network must be completely altered. We briefly consider how such findings relate to education.

A common claim in education (e.g., Anderson, 1990; Gagné, 1973) is that knowledge is organized hierarchically and that teaching should move from simple chunks of knowledge to more advanced knowledge (see also Chapter 7 of this book, by Renkl and Atkinson, and Chapter 2 by Reigeluth). EPAM agrees

with the hierarchical organization of knowledge, but when extrapolating (perhaps boldly) from our simulations in a simple domain to real-life settings, one would conclude that the need for hierarchical teaching is at least open to discussion and almost surely will vary from one domain of instruction to another and as a function of the goals of instruction.

Attentional strategies and the ordering of instances in the environment are crucial for EPAM because, as a self-organizing system, EPAM develops both as a function of its current state and of the input of the environment. Therefore, any deficiencies in learning, such as attending to irrelevant features of the environment, will lead to a poor network and thus poor performance. Further, attempting to correct previous learning will require massive and costly restructuring of knowledge (Gobet & Wood, 1999; Gobet, 2005).

Note that EPAM's view of sequential effects on learning is not shared by all theories of learning. For example, Anderson (1987, p. 457) states that, for the ACT tutors, "nothing about problem sequence is special, except that it is important to minimize the number of new productions to be learned simultaneously." This difference seems to come from the way information is stored in both systems: as a hierarchical structure in EPAM and as a flat, modular organization of productions with ACT. As a consequence, the two theories have different implications for the design of effective instructional systems (Gobet & Wood, 1999).

CONCLUSIONS

We are now in a position to answer the questions raised in the title: How do order effects arise in cognition, and can cognition do anything about them? Order effects arise in our example cognitive model, EPAM, because its learning mechanisms are incremental. That is, on seeing an item of data, the learning mechanism will add to whatever is already present in long-term memory. How and what is added from a given item of data will depend upon what is already present in the discrimination network. Thus, order effects arise as a natural consequence of the incremental mechanisms within the cognitive model.

Of interest is whether these order effects may be eliminated by continued experience. In a number of simple situations we have seen that order effects manifest themselves quite strongly in the performance of the system and subjects. These effects have some interesting consequences for instruction, as was discussed in the previous section. However, the impact and correction of order effects in more complex domains are still open research topics.

PROJECTS AND OPEN PROBLEMS

To explore the topics this chapter raises, we have included two sets of possible problems and exercises.

Easy

1. Implement the discrimination network learning procedure described in this chapter.

2. Replicate the experiments described here (broadly defined), and verify that you obtain similar results.

3. Extend the experiments by varying the parameters. For example, increase the set of numbers that may appear in the patterns and the number of elements in each pattern. Do the learned networks contain more nodes or fewer? Is performance as good as before, or does it take longer to improve?

Hard

1. Write a parallel implementation of EPAM, and find out whether order effects are still present.

2. Gobet (1996) and Gobet and Simon (2000) suggest that chunks that recur often evolve into more complex data structures similar to schemata, which they called templates. Read their papers and discuss how the presence of such structures influences ordering effects.

3. Investigate other measures of the quality of the discrimination network. For example, information theory allows us to predict the optimum size of the network and sets of features for indexing any specified dataset. Are the networks learned by EPAM nearly optimum? If not, could better networks be learned with a better ordering of the training data?
AN ORDERED CHAOS 117
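Hard problem 3 invokes information theory. As a hedged illustration (the function name and toy data below are ours, not the chapter's), the Shannon entropy of the pattern distribution lower-bounds the average number of binary tests, and hence the average search depth, that any discrimination network needs in order to identify a pattern drawn from that distribution:

```python
import math
from collections import Counter

def entropy_bits(patterns):
    """Shannon entropy of the pattern distribution, in bits.

    This lower-bounds the average number of binary tests any
    discrimination network needs to identify a pattern drawn
    from this distribution."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally likely patterns need at least log2(4) = 2 tests on
# average; a skewed distribution needs fewer.
uniform = [(1, 2), (3, 4), (5, 6), (7, 8)]
skewed = [(1, 2)] * 7 + [(3, 4)]

print(entropy_bits(uniform))  # 2.0
print(entropy_bits(skewed))   # about 0.54
```

Comparing a learned network's average depth against this bound gives one concrete measure of how close to optimum a given training order has come.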

APPENDIX: PSEUDOCODE FOR DISCRIMINATION NETWORK

We provide a simplified version of the training routines for EPAM's discrimination network. Information is held in two separate data structures:

Node: Node-image stores a chunk for this node.
      Node-links stores the set of links from this node.

Link: Link-test holds a test.
      Link-node holds the node reached from this link.

Root-node holds the initial node for the discrimination network.

High-level functions, for calling EPAM:

Procedure Recognize-pattern (target)
  Find-node (target, root-node)

Procedure Learn (target)
  Let found-node be Find-node (target, root-node)
  If node-image (found-node) ⊆ target
  Then Familiarize (found-node, target)
  Else Discriminate (found-node, target)

Searching:

Procedure Find-node (target, current-node)
  While (unconsidered links in current-node)
    If target satisfies link-test
    Then Find-node (target, link-node)
  If no links left and no link satisfied,
  Return current-node

Learning:

Procedure Familiarize (found-node, target)
  Add to node-image (found-node) a feature from target

Procedure Discriminate (found-node, target)
  Let new-feature be Select-mismatch-feature (target, node-image (found-node))
  Create a new-node with an empty image.
  Create a new-link with link-test = new-feature and link-node = new-node.
  Add new-link to node-links (found-node)

Procedure Select-mismatch-feature (target, found-image)
  Let mismatch be the difference of target and found-image
  Let new-node be Find-node (mismatch)
  If a new-node is found with nonempty node-image
  Then return node-image (new-node)
  Else Learn (mismatch)

References

Anderson, J. R. (1987). Production systems, learning, and tutoring. In D. Klahr, P. Langley, & R. Neches (Eds.), Production system models of learning and development (pp. 437–458). Cambridge, MA: MIT Press.

Anderson, J. R. (1990). Cognitive psychology and its implications (3rd ed.). New York: Freeman.

de Groot, A. D., & Gobet, F. (1996). Perception and memory in chess: Heuristics of the professional eye. Assen, the Netherlands: Van Gorcum.

Feigenbaum, E. A. (1963). The simulation of verbal learning behavior. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 297–309). New York: McGraw-Hill.

Feigenbaum, E. A., & Simon, H. A. (1962). A theory of the serial position effect. British Journal of Psychology, 53, 307–320.

Feigenbaum, E. A., & Simon, H. A. (1984). EPAM-like models of recognition and learning. Cognitive Science, 8, 305–336.

Freudenthal, D., Pine, J. M., & Gobet, F. (2006). Modelling the development of children's use of optional infinitives in English and Dutch using MOSAIC. Cognitive Science, 30, 277–310.

Gagné, R. M. (1973). Learning and instructional sequence. Review of Research in Education, 1, 3–33.

Gobet, F. (1993). A computer model of chess memory. In Proceedings of the Fifteenth Annual Meeting of the Cognitive Science Society (pp. 463–468). Hillsdale, NJ: Erlbaum.
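The appendix pseudocode can be turned into a runnable sketch. What follows is our simplified reading, not the authors' code: patterns are plain sets of features, and Select-mismatch-feature's recursive recognition of the mismatch is replaced by picking the first feature of the target that the image lacks. A short trace at the end shows the chapter's central point, that the order of training items changes the learned network:

```python
class Node:
    """A discrimination-network node: an image (chunk) plus test links."""
    def __init__(self):
        self.image = set()   # node-image: features confirmed at this node
        self.links = {}      # link-test feature -> child Node

class Net:
    """Toy discrimination network following the appendix pseudocode."""
    def __init__(self):
        self.root = Node()   # root-node

    def find_node(self, target, node=None):
        """Procedure Find-node: follow any link whose test is in target."""
        node = node or self.root
        for feature, child in node.links.items():
            if feature in target:
                return self.find_node(target, child)
        return node

    def learn(self, target):
        """Procedure Learn: familiarize on a match, else discriminate."""
        target = set(target)
        node = self.find_node(target)
        if node.image <= target:              # image is a subset of target
            self._familiarize(node, target)
        else:
            self._discriminate(node, target)

    def _familiarize(self, node, target):
        """Procedure Familiarize: add one target feature to the image."""
        missing = sorted(target - node.image)
        if missing:
            node.image.add(missing[0])

    def _discriminate(self, node, target):
        """Procedure Discriminate, simplified: grow an empty child keyed
        on a feature of the target that the image lacks."""
        mismatch = sorted(target - node.image)
        if mismatch:
            node.links[mismatch[0]] = Node()

def shape(node):
    """Hashable summary of a (sub)network, for comparing two nets."""
    return (tuple(sorted(node.image)),
            tuple((f, shape(c)) for f, c in sorted(node.links.items())))

# Same three patterns, two presentation orders -> two different networks.
patterns = [{1, 2}, {1, 3}, {2, 3}]
a, b = Net(), Net()
for p in patterns:
    a.learn(p)
for p in reversed(patterns):
    b.learn(p)
print(shape(a.root) != shape(b.root))   # True: an order effect
```

Because each call to learn adds at most one feature or one link, a pattern must be presented repeatedly before it is fully encoded; this is exactly the incremental property to which the conclusions attribute the order effects.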
IN ORDER TO LEARN
How the Sequence of Topics Influences Learning

Edited by Frank E. Ritter, Josef Nerb, Erno Lehtinen, and Timothy M. O'Shea

Oxford University Press, 2007