An Introduction to Artificial Intelligence
Author: Staffan Persson
Source: Ekonomisk Tidskrift, Årg. 66, n:r 2 (June 1964), pp. 88-112
Published by: Wiley on behalf of The Scandinavian Journal of Economics
Stable URL: http://www.jstor.org/stable/3438581
AN INTRODUCTION TO ARTIFICIAL INTELLIGENCE*
By STAFFAN PERSSON1

What is artificial intelligence? Attempts to answer this question often give rise to semantic controversies. To avoid this let us intuitively define the area of research in artificial intelligence as the design of machines2 the behavior of which in a given situation would be labelled intelligent if observed in human activity. The nature of the problem of definition may be illustrated as follows: a person who scores high in an intelligence-test is usually considered intelligent, but is a machine which performs equally well on the same test also intelligent? As we know of no generally accepted way of resolving this question we have to leave the reader with his own conclusion.3 A question closely related to the discussion of intelligence in machines is the following: "can machines think?". Turing (ref. 30) made this question operational by defining it as an "imitation game" played by two people (A and B) and one machine (C). One of the participants, say A, is the interrogator, with the task of deciding whether B or C is the machine. A performs his job by evaluating the answers given by B and C to his questions; as all communication is transmitted mechanically, such clues as handwriting and voice are eliminated, leaving him with the actual contents of the answers as the basis for his decision.
* To justify the presentation of artificial intelligence in this journal we should exemplify its potential relevance for the field of business administration; as we however in this context want to avoid speculations, let us for the moment accept the fact that courses in artificial intelligence are now offered at some of the leading American Schools of Business Administration as a tentative justification.
1 The author is greatly indebted to Dr. Edward A. Feigenbaum of the University of California for valuable comments and suggestions.
2 Machine is here synonymous with a program for a general purpose computer or with a special purpose computer.
3 For a discussion of this kind of question see references 1, 27, and 30.


passed the "Turing test" if A can not identify it more frequently than
pure chance would predict. At present there are no machines which
can pass the Turing test for arbitrary questions, but given a realistic
limit for the area of exploration there actually exist machines proficient enough to pass a partial Turing test.
The general area of artificial intelligence can be divided into several branches. At a "micro-level" we find the specialized research on neural-net-analogies,4 an area concerned with models composed of several simple and often randomly connected components. The allegedly "brain-like" behavior of these models is obtained by application of simple learning-rules which, after a trial and error procedure, can cause the initially unorganized network of components to assume a logical configuration which actually is capable of recognizing patterns or solving simple problems. These models will not be discussed further here because their problemsolving-capacity is not yet sufficiently developed to be of interest for our purposes.
At the "macro-level" of artificial intelligence we combine the simple
basic components to complex but specialized problemsolving units.
This approach sacrifices some generality but permits design of
machines

with "interesting" capability using equipment available


to-day. We can at this level distinguish two lines of research, namely
Simulation of Human Cognitive Processes and Artificial Intelligence
Proper.
Simulation of Human Cognitive Processes is concerned with the exploration of human behavior at a level between the explicitly observable and the basic neural processes. The assumption behind this research is that it is possible to analyze and precisely explain the basic problemsolving and symbol-manipulation processes underlying human thought. As the computer is a general information-processing device of very high capacity it has been employed as a powerful tool for the construction and testing of models of human behavior. The task of the researchers is thus to formulate a precise theory in the form of a computer-program and then to test the theory by comparing the performance of the program (i.e. the predictions of the theory) with available data from psychological experiments.
4 For a discussion of this area see references nos. 16 and 21.


It should be stressed that it is not the solution given to specific problems but rather the process that generates the result which is of interest.
Artificial Intelligence Proper is concerned mainly with the results
generated by the problemsolving process and therefore admits any
efficient method to be used for reaching the goal. In spite of the freedom of choice among methods it has turned out that this research, too, depends heavily upon results obtained from the exploration of human problemsolving. This is not surprising because the designers
of the machines necessarily must find it difficult to detach themselves
from their own experience when looking for appropriate methods to
include in their models.
Having briefly presented our general area of discourse we now
proceed to survey some results and methods of artificial intelligence.
The next few sections of this article are devoted to brief descriptions of some major achievements of research (unfortunately the limited space forces us into simplifications and generalizations, and we therefore recommend the interested reader to consult the original reports for more accurate accounts). One section is devoted to a rather detailed description of SEP 1, a program for extrapolation of sequences of numbers and letters. SEP 1 has been given this preferential treatment not because it is especially important, but because it will provide several examples to illustrate the methods of artificial intelligence discussed in the last few sections of the paper.

Some Important Applications of Artificial Intelligence


1. Game-Playing Programs
It may seem rather wasteful to spend time and money upon the
design of game-playing programs for digital computers, but in addition to the value of pure entertainment the parlor-games provide nice
examples of decision-making in complex and changing environments.
The most successful game-playing program so far is Samuel's
checkers-playing program (ref. 22). This program does not explicitly
try to imitate the methods of the human checkers-player, but utilizes
interesting methods of learning which will be discussed in some more

detail in a later section. The performance of this program can be compared to that of a very good human player.
The game of chess has for a long time been considered the undisputed queen of games, a situation which of course has attracted the attention of people who want to prove that machines can be programmed to perform well even in the most sophisticated environments. In spite of large efforts, however, there does not yet exist a machine that can consistently beat even an average player. This
"failure" on the part of the programmers has however contributed to
an increased understanding of how a chess-playing program should
be designed. As the history of these programs is of fundamental interest for the understanding of the development of research in
artificial intelligence, a brief survey of the evolution of chess-playing
programs will be given.
The game of chess is from a theoretical point of view "uninteresting" because there exists an optimal strategy,5 which always guarantees at least a draw. The problem of identifying this strategy among the roughly 10^120 available is however unsolvable: an evaluation of as many as one million strategies per second would produce only about 10^16 evaluations per century, which clearly shows that no exhaustive technique can compute the optimal strategy.
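The arithmetic behind this claim is easy to verify; a rough order-of-magnitude check (in modern Python, with round figures not taken from the original) is:

```python
# Rough order-of-magnitude check of the exhaustive-search argument.
evaluations_per_second = 1_000_000
seconds_per_century = 60 * 60 * 24 * 365 * 100          # about 3.15e9

per_century = evaluations_per_second * seconds_per_century
print(f"evaluations per century: {per_century:.1e}")     # ~3.2e15, i.e. about 10**16

strategies = 10 ** 120
print(f"centuries needed: {strategies / per_century:.1e}")  # ~3e104 centuries
```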
Shannon (ref. 23) proposed an exhaustive evaluation to a limited depth (say 2 moves6), basing the choice of move upon the maximization of a numerical value assigned to every possible board-position. In 1956 a Los Alamos group (ref. 12) implemented a Shannon-type chess-playing program on the MANIAC 1 computer.7 Because of the enormous number of computations required for each move the evaluation-function had to be very simple. The performance of this program was very poor due to the very limited amount of consideration given to each alternative. Bernstein (ref. 2) in 1957 presented a chess-playing program for the IBM 704, which for each stage only evaluated up to seven plausible alternatives to a depth of two moves.8

5 For game-theoretical concepts see ref. 14.
6 This gives about 800,000 alternatives to explore for each move.
7 To reduce the time necessary for evaluation of each move a chess-board of only 6 x 6 was used; some kinds of moves were also eliminated.
8 This approach requires about 2500 alternatives to be evaluated for each move.

The reduced number of evaluations permitted a more complicated evaluation-function, but the performance of the program was still rather mediocre.
At this stage it may be of interest to see how the human chess-player can use his limited computational ability to play a far better game of chess than the machines. The key to his excellence is selective search: he evaluates at most 100 alternatives at each step, using previous experience, rules of thumb, etc. to guide his choice. Newell, Shaw, and Simon (ref. 19) have tried to capture the essential features of human chess-playing in their chess-playing program. As their ideas are as relevant for organization-theory as for artificial intelligence we want to give at least a rudimentary feeling for their method of reducing the search for alternatives. One basic idea behind their program is the well-known "aspiration-level" model of decision-making (ref. 15). A simplified step by step account of the decisions preceding one move runs as follows:
1. On the basis of the current stage of the game a set of goals (such as Center Control, Material Balance, King-Safety, etc.) is ordered in decreasing priority.
2. A move-generator proposes a few moves considered relevant for
the currently highest priority goal.
3. The set of proposed moves is evaluated in terms of the value for
each goal, giving for each move a list of values, which may be
numerical or just yes or no. The evaluation is performed to a
depth which is decided by an analysis-generator.
4. The list of values is now used to choose the most appropriate move. The first entry decides the choice unless several moves have equal value at this position, in which case the second entry is compared, etc. Up to now we have a case of simple maximization, but the proposed move must also satisfy the requirement that all entries in its list of values exceed the aspiration-level of the corresponding goal; if this is not the case the next move in priority is tested, etc. If no move is accepted the move-generator for the second-priority goal is initiated and the analysis starts again at point 2.
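As a concrete illustration of the selection step in point 4, the following minimal sketch picks the first acceptable move by lexicographic comparison of value lists against aspiration levels. The goal names, moves, values and thresholds are invented for illustration and are not taken from the Newell-Shaw-Simon program.

```python
# Sketch: choose a move whose value list meets every goal's aspiration level;
# among acceptable moves the first (highest-priority) value decides, then the
# second, and so on (lexicographic comparison).
def choose_move(proposals, aspiration_levels):
    """proposals: list of (move, [value per goal]), values ordered by goal priority."""
    acceptable = [(values, move)
                  for move, values in proposals
                  if all(v >= a for v, a in zip(values, aspiration_levels))]
    if not acceptable:
        return None   # signal: restart at point 2 with the next-priority goal's generator
    return max(acceptable)[1]

# Hypothetical goals: (center control, material balance, king safety).
proposals = [("e4", [3, 0, 1]), ("Nf3", [2, 0, 2]), ("a3", [0, 0, 1])]
print(choose_move(proposals, [1, 0, 1]))   # -> e4
```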

2. Problem-Solving Machines

The Logic Theory Machine (LT) by Newell, Shaw, and Simon (references 17 and 18) is the first ancestor of several artificial intelligence projects. Although LT was explicitly designed to prove a set of theorems from Whitehead and Russell's Principia Mathematica, the basic methods employed were of far more general applicability. LT utilized a set of five basic axioms and three rules of inference (methods) which could be applied in a recursive fashion. The basic heuristic9 of LT was the "working backwards" technique, which was employed to identify subgoals which could bridge the "gap" between the basic axioms and the theorem. A very abbreviated account of the procedure is given below:
1. Identify a set of subtheorems which can be transformed to the
theorem which we intend to prove in one step by applying one
of the methods to the axioms or previously proven and memorized
theorems. If one of our subtheorems is identical to an axiom or
a previously proven theorem our task is completed, otherwise we
go to point 2.
2. Reduce and organize the set of subtheorems generated in 1 by
a. Excluding subtheorems believed to be unprovable,
b. Testing for similarities in order to avoid double work,
c. Ranking the remaining subtheorems in increasing order of
difficulty.
3. We now have a list of subtheorems, all of which we know how to
transform to the initial theorem, therefore if we could find a bridge
between the axioms (and proven theorems) and any of our subtheorems we have completed our task. So we go to 1 again using
the highest priority subtheorem instead of the original theorem.
If unsuccessful LT then tries other subtheorems, until a solution is found or the memory-space is exhausted.
As we see, LT, if successful, generates a string of successive subtheorems each of which can be derived from its predecessor using one application of some method. The string of subtheorems therefore obviously constitutes a proof of the theorem.

9 Heuristic = any method that often (not necessarily always) can lead to a solution by reducing the necessary amount of search.
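The working-backwards idea can be conveyed by a toy sketch: starting from the goal, generate sub-theorems that yield it in one step, stop when a sub-theorem matches an axiom or a known theorem, and otherwise recurse on the most promising sub-theorem. The string representation and the single backward rule below are invented stand-ins; LT itself operated on propositional-calculus expressions with three rules of inference.

```python
# Toy backward search: find a chain goal -> ... -> axiom, where each step applies
# one "method" (here: a rule mapping a statement to statements implying it in one step).
def prove(goal, axioms, backwards_rules, depth=6, seen=None):
    seen = set() if seen is None else seen
    if goal in axioms:
        return [goal]
    if depth == 0 or goal in seen:
        return None
    seen.add(goal)
    # Sub-theorems that yield the goal in one step, "easiest" (shortest) first.
    for sub in sorted(backwards_rules(goal), key=len):
        chain = prove(sub, axioms, backwards_rules, depth - 1, seen)
        if chain is not None:
            return chain + [goal]      # chain of sub-theorems ending in the goal
    return None

# Hypothetical micro-domain over strings: "s" follows in one step from "ss".
axioms = {"pppp"}
rules = lambda s: [s + s]
print(prove("p", axioms, rules))       # -> ['pppp', 'pp', 'p']
```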
The Symbolic Automatic INTegrator (SAINT) by Slagle (ref. 26)
performs symbolic integration at the level of freshman calculus. The
basic organization of SAINT is largely inherited from LT. The tools
available are a list of standard-integrals, memorized results and a set
of algorithmic10 and heuristic procedures. In order to reduce search
a crude evaluation of the relative difficulty of different lines of attack
is made before a string of transformations is applied.
Another successful relative of LT is the Geometry Theorem Machine
by Gelernter (ref. 9). This machine will be discussed in the section
entitled "Particular Heuristics".
The designers of LT have utilized their experience from this program in their General Problem Solver (GPS) (ref. 20), a program the
structure of which is based upon the analysis of "thinking aloud"
protocols from experiments in human problemsolving. The name
General Problem Solver does not mean that all problems can be solved
by GPS but that the problemsolving methods employed are context-independent and general in the sense that problems which can be
cast into a certain general form can be solved. GPS will in a later
section be discussed in some detail as an example of means-ends
analysis.
3. Inductive Machines
The important area of induction has not yet produced many results,
but some achievements of interest have been reported. The work within
this area aims toward the design of machines which can actually build
and utilize models of their environment. Besides the ability to predict and answer questions these machines should have the capacity to ask "intelligent" questions when additional information is required for their work.

Lindsay's SAD SAM (ref. 13) is a machine which can answer questions concerning kinship-relations.

10 Algorithm = (in this context) a method which guarantees a solution, but which may be uneconomical to use because of the amount of computation-time required to reach the solution.

The information is given in the form of a set of sentences11 which describe family-relations. "Joey was playing with his brother Bobby" and "Bobby's sister Jane asked her mother Anna for help" are simple examples of input-sequences. SAD, which is a sentence-diagrammer, encodes the syntactic information given by any sentence into a map, the general format of which is context-independent. SAM is a semantic analyser which retrieves the information from SAD's map and arranges it into a family-tree. Given a question about family-relations explicitly or implicitly derivable from the input-sentences, SAD SAM can now give the answer by consulting the tree-model. In our example we can for instance ask for the name of Joey's mother and get the answer Anna.
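A minimal sketch of the kind of tree-model consultation involved: parsed kinship facts are stored, a simple inference (transitive siblings share a mother) is applied, and the question about Joey's mother is answered. The tuple representation and the inference rule are assumptions for illustration, not Lindsay's actual data structures.

```python
# Sketch: store parsed kinship facts, infer via (transitive) siblings, answer a query.
facts = [("Bobby", "brother", "Joey"),
         ("Jane", "sister", "Bobby"),
         ("Anna", "mother", "Jane")]

mother_of = {child: parent for parent, rel, child in facts if rel == "mother"}
siblings = {}
for a, rel, b in facts:
    if rel in ("brother", "sister"):
        siblings.setdefault(a, set()).add(b)
        siblings.setdefault(b, set()).add(a)

def mother(person):
    # Direct fact, or inferred: siblings (followed transitively) share the same mother.
    group, frontier = {person}, [person]
    while frontier:
        for sib in siblings.get(frontier.pop(), ()):
            if sib not in group:
                group.add(sib)
                frontier.append(sib)
    for member in group:
        if member in mother_of:
            return mother_of[member]
    return None

print(mother("Joey"))   # -> Anna (found via Bobby and Jane)
```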
A program by Simon and Kotovsky (ref. 25) extrapolates letter-sequences used in intelligence-tests. The conclusion of the program is reached in two steps: first an explicit model of the input-sequence is described in an appropriate language, then this model generates the continuation of the sequence. It may be noted that this model is intended to illustrate a theory about human concept formation. A very efficient but purely artificial intelligence program by Persson12 can, as a subset of its problem-environment, solve the same kind of intelligence-tests as the Simon and Kotovsky program.
An interesting example of a machine that can adapt its behavior in
a changing environment is Ernst's Mechanical Hand (ref. 5). During
its performance of such simple tasks as building towers from blocks
or putting blocks into a box, the machine builds a model of its physical
environment. This is accomplished by recording the position of a set
of potentiometers (one for each degree of freedom of movement)
every time the hand locates an object. The inductive properties come
to use when changes in the environment are arranged by the
experimenter (for instance by moving some previously located object). The
machine in this case is guided by a heuristic program which on the
basis of preceding events decides the most appropriate
way to update
the model.
It may be noted that all programs discussed in this section deal with environments for which we know how to build efficient models (such as family-trees); for more general cases we have to rely upon research into human information-storage and -retrieval for finding clues about efficient methods.

11 Formulated in Basic English.
12 Unpublished.
4. Question-Answering-Machines
One of the goals of artificial intelligence is the design of machines which can rapidly produce answers, in an easily understandable form, to questions presented in spoken or written natural language. Before we can actually implement such machines we have to know how to mechanically extract meaning from sentences. This problem, the difficulty of which is due to the fact that meaning can not be derived from separate words but is a function of the context, is discussed in the literature on Machine Translation of Languages, to which we refer for further information.
The previously discussed SAD SAM by Lindsay can be classified as
one attempt to build a question-answering-machine
for a rather narrow environment.
BASEBALL by Green et al. (ref. 10) does not build its own model
of the data-universe but is instead from the beginning given an efficiently stored handbook of baseball. Questions (formulated in a
somewhat restricted English) about baseball-statistics can be answered.
There are also some restrictions on the kinds of questions which
can be answered by BASEBALL, but the program permits rather complex inquiries as for instance "Did every team play at least once in
each park in each month?". The flexibility of the program has been
proven by presenting several differently formulated questions, to
which the machine gave the same answer.
SYNTHEX by Simmons (ref. 24) has a very wide data-universe, namely the Children's Golden Encyclopedia. As answers to questions
formulated in natural English the program presents quotations from
the above mentioned text.
5. Some Applications
Clarkson (ref. 3) has analyzed the work of a trust-investment
officer at a medium-sized bank and expressed his
findings in a
program which has turned out to be a very successful attempt to


simulate human behavior in a complex decision-making situation.


Our account of his model must by necessity be very brief, but we hope
that it will induce the reader to study some of the more detailed
descriptions of this important achievement.
Clarkson divides the trust-investment process into three major
phases.
i. Analysis and selection of a list of suitable stocks (the "A" list).
ii. Formulation of an investment-policy.
iii. Selection of a portfolio.
In stage i no attempt to simulate the actual behavior of the trust-officer is made; instead an approximation of his experience as gained
over the years is formulated. This approach is necessitated because
the analysis is made continuously over the years and thus can not be
simulated for any particular point of time. The "A" list is produced
by choosing those entries from a list of stocks considered as suitable
(the "B" list) which, according to expectations derived from information given about the general economy, the industries, and the companies, are expected to produce the best performance.
In stage ii the program utilizes information about the client (such
as economical position, profession, desired amount of growth, etc.)
to choose a suitable investment-policy. The policies, which are characterized by expected percentages of growth and income, are: "Growth
Account", "Growth and Income Account", "Income and Growth Account", and "Income Account".
For stage iii the following information is available: the "A" list
together with information about its members, information about the
client, and the chosen investment-policy. The stocks of the "A" list are
now ranked according to the relative performance in the dimension
of the main attribute of the investment-policy (i.e. for a
growth policy
the prospects are ranked according to growth potential). Some stocks
can be eliminated because of the tax-position of the client or other
legal reasons leaving a reduced set for further tests. Depending upon the investment-policy a set of tests is chosen and applied to the remaining list of stocks, leaving a further reduced list of stocks to consider in the final choice. From the accepted stocks an investment-

portfolio is composed by applying rules for diversification and size of participation in each stock.

Clarkson's model has been tested against the performance of the trust-investor. Given the same information as the officer, it has chosen almost exactly the same portfolio as he did. The different stages of the model
have also been tested separately and have shown a behavior very close
to that of the officer. Tonge (ref. 28) uses heuristic methods for
balancing assembly-lines, a complex combinatorial problem which
only in a few simple cases has been found accessible to mathematical
solutions. The problem involves the assignment of men and jobs to
work-stations and the object is to maximize the rate of assembly
given certain constraints specifying order between certain jobs etc.
Tonge uses methods inherited from industrial engineers, which by
using rules of thumb can make an efficient balancing without using
optimal procedures. Tonge has also shown how GPS can be arranged for balancing assembly-lines, which provides us with a further indication of the possibility of using general problemsolving methods for a wide variety of problems.

6. Simulation of Cognitive Processes


Most of the results up to now have at least to some extent been
based upon observations of human behavior. In this section we will
briefly discuss some programs explicitly designed as simulations of
cognitive processes. Although our presentation will be rather brief we
want to emphasize that this area of research is of fundamental importance for the future development of artificial intelligence.
The Elementary Perceiver and Memorizer (EPAM) by Feigenbaum
& Simon (ref. 6) is a precise formulation of a theory concerning the
basic processes that are involved in human cognition. EPAM has been
tested as an artificial subject in psychological experiments and has
given a very close approximation of the behavior of human subjects.
We will in a brief account of EPAM's learning mechanism discuss its
performance in association-learning tests. In these tests the subject
is presented with a stimulus to which it is supposed to respond with the
correct response.
A basic feature of EPAM's learning is that only a minimal amount of information is memorized. Stimuli together with clues for their responses are represented by the smallest amount of information which at the time of their learning can discriminate them from each other and from previously learned items; responses however are stored in full. Suppose that EPAM has learned two items, a stimulus S1 together with its response R1. At this stage EPAM's associative memory (the discrimination-net) consists of a tree with one node
and two branches with corresponding terminals. The node contains a
test Tl which can discriminate between S1 and R1 by checking for
some specific information. The terminals are access points for storage
locations containing the previously mentioned minimum amounts of
information for S1 and R1 respectively. Now suppose that EPAM is
required to learn a new stimulus-response pair S2-R2. This is achieved
by growing the discrimination-net until it can hold every item of
information (new or old) at a unique terminal. First S2 is tested by
T1 and directed to one of the terminals (let us assume to the one
which holds information about S1) where a discrepancy between the
stored information and S2 is detected. A new test T2 is placed at the
point of discrepancy. T2, like T1 before it, can direct stimuli to one
of two branches, the terminals of which contain information about
S1 and S2 respectively. R2 is sorted in the same manner and we end
up with a tree consisting of three nodes containing tests (T1, T2, and
T3) and four terminals containing information. Figure 1 shows two possible configurations for this tree.

Figure 1. [Two possible discrimination-net configurations, with tests T1, T2, and T3 at the nodes and terminals holding the stored information for S1, S2, R1, and R2.]


Given a stimulus, say S2, EPAM can retrieve the corresponding response (R2) by sorting S2 via test T1 and test T2 (Fig. 1 b) to its terminal.
The minimum clue for retrieving R2 has previously been stored at
this terminal and can thus be retrieved. The clue for R2 is thereafter
sorted via tests T1 and T3 down to the terminal containing the complete response R2 which is presented as the solution of the task.
As long as new information is presented the discrimination-net continues to grow. Remembering that associations (clues) formed at any
point in time are just adequate to retrieve associations at that time,
we see that at a later time when the discrimination-net has grown,
some of these associations may become inadequate thus making
certain items irretrievable. This means that forgetting can occur in
the model without any destruction of information. EPAM's forgetting,
which is not explicitly present in the hypothesis behind the model, has a very close connection to forgetting as experienced by people.
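A highly simplified sketch of a discrimination net of this kind: internal nodes test one character position, items sit at the leaves, and a new test is grown only at the point of discrepancy. For brevity the sketch stores items in full and omits the stimulus-response pairing and the minimal-clue storage described above; the three-letter nonsense syllables and the single-character tests are invented details.

```python
# Sketch of an EPAM-style discrimination net over fixed-length strings.
class Leaf:
    def __init__(self, item): self.item = item

class Node:
    def __init__(self, pos, branches): self.pos, self.branches = pos, branches

def insert(net, item):
    if net is None:
        return Leaf(item)
    if isinstance(net, Leaf):
        if net.item == item:
            return net
        # Grow a new test at the first position where the two items differ.
        pos = next(i for i, (a, b) in enumerate(zip(net.item, item)) if a != b)
        return Node(pos, {net.item[pos]: net, item[pos]: Leaf(item)})
    branch = item[net.pos]
    net.branches[branch] = insert(net.branches.get(branch), item)
    return net

def retrieve(net, item):
    while isinstance(net, Node):
        net = net.branches.get(item[net.pos])
    return net.item if net else None

net = None
for stimulus in ["DAX", "CEF", "DAK"]:   # growing the net sorts new items to leaves
    net = insert(net, stimulus)
print(retrieve(net, "DAK"))              # -> DAK
```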
Hunt uses discrimination methods of the same kind as used in
EPAM for his model of human concept formation (ref. 11). The
program works with experiments of the following kind: A subject
is shown a set of stimuli and is also told which stimuli are examples
of certain concepts. The task of the subject (and the model) is to
identify the concept.
Feldman in his binary choice experiment (ref. 8) uses a computer
program to formulate hypotheses about the next event to occur in a
sequence of binary events. The model not only predicts the event but
also gives the reason for the prediction. It must be observed that
the goal of Feldman's research is not to reach a high percentage of correct predictions but to reproduce closely the decisions
made by specific subjects.

SEP 1, a Sequence Extrapolator
This section will give a rather detailed description of the basic
organization of SEP 1, a computer program for extrapolation of sequences of numbers or letters, which has been explicitly designed to
illustrate applications of basic problemsolving heuristics. We
justify
this presentation by our intent to illustrate our survey of methods of

artificial intelligence by examples taken mainly from SEP 1 (but also


from some other programs).
The problem-environment of SEP 1 is:
1. The identification and extrapolation of sequences of numbers generated by members of the following class of equations (a small generative sketch of this class is given after this list):
(a·X^4 + b·X^3 + c·X^2 + d·X + e) · (f·X + g)^(h·(i·X + j))
where a, b, c, d, e, f, g, h, i, and j are integers, (f·X + g) and (i·X + j) are zero or positive, and h = -1 or +1.
Several exceptions from the general formula are accepted:
a. One or more of the entries in the given input-sequence may be erroneous or even left out.
b. Two or more sequences may be mixed, in which case the sub-sequences are identified and extrapolated separately.
c. The input-sequence may be "accidentally" scrambled, in which case it will be unscrambled, extrapolated and printed out in general form.
d. A strictly exponential sequence can be separated from any noise of bounded amplitude.
2. Extrapolation of certain kinds of sequences of letters.13
3. Recognition of previously encountered sequences using a minimum amount of clues.
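As a concrete illustration, the following sketch generates the first terms of one member of the class reconstructed above; the particular coefficient values are arbitrary and happen to give the worked example used later in the text.

```python
# Generate terms of P(X) * (f*X + g) ** (h * (i*X + j)); all coefficients are arbitrary.
def term(X, a=0, b=0, c=0, d=2, e=3, f=1, g=0, h=1, i=1, j=1):
    poly = a*X**4 + b*X**3 + c*X**2 + d*X + e
    return poly * (f*X + g) ** (h * (i*X + j))   # h = -1 puts this factor in the denominator

print([term(X) for X in range(1, 6)])
# -> [5, 56, 729, 11264, 203125]   (the (2X+3)*X**(X+1) example used later in the text)
```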
For extrapolation of number-sequences SEP 1 uses the following main processes:

Name   Problem type   Task
A      a              Extrapolate polynomials
B      b              Extrapolate exponentials
C      c              Separate mixed sequences
D      d              Recognition of sequences
The problem-type a, b, etc. is assigned to a sequence on the basis
of which process was used for its extrapolation (i.e. a sequence extrapolated by process A is said to be of type a etc.). It must be observed
that any sequence of type a can be extrapolated by processes B and C;
the process actually used depends upon the values of certain learning
parameters.
SEP 1 is equipped with the following basic goals:
1. Solve the problem.
2. Use as few inputs as possible.
3. Minimize the amount of time necessary for the solution.
13 Among others those used in Simon & Kotovsky (ref. 25).

Goal 1 causes no difficulty, but goals 2 and 3 are in a sense contradictory and therefore require some judgment about priority.
Heuristics for Reduction of Search in the Problem-Space
In our survey of artificial intelligence we have seen that one of the most difficult problems of artificial intelligence is to find methods to reduce the number of alternatives to consider and evaluate in decision-making situations. In some cases there exist efficient procedures which can guarantee a solution (these methods are here called algorithms); in other cases some heuristic method may be applicable. The nature of artificial intelligence makes the use of algorithms uninteresting unless they are parts of basically heuristic procedures. We can recognize two groups of heuristic methods, namely general heuristics and particular heuristics. The former group is applicable to a wide variety of problems; as examples we will discuss The Basic Learning Heuristic, The Means-Ends Analysis, and Planning. Particular heuristics are methods which take advantage of the structure of specific problem-environments to reduce the amount of search for a solution.


The Basic Learning Heuristic
For our purpose learning will be defined as the utilization of
previous experience for the improvement of present performance.
Learning may be implemented in two basically different ways, namely
generalization-learning and rote-learning. Generalization-learning utilizes experience from previously encountered situations under new
but similar circumstances, rote-learning on the other hand
only allows
specific information to be retrieved from the memory.
Examples of Generalization-Learning
SEP 1 does not need any learning at all to achieve its
goal 1. Goals 2
and 3 however are more efficiently satisfied if a crude form of
learning is utilized. Let us step by step follow a simple example.
Information about the problem.
Input-sequence: 1 2 3 4 5 8 7 16 9 ...
One error is accepted in the input-sequence.


The most recently solved problems have been of the following types:
aaaacaacaaadaabbbaccc
Decision 1: Which procedure to use?
It should be observed that whatever procedure is initially chosen,
the same final solution will be given; an incorrect choice at this stage will therefore only affect the computation time.
SEP 1 can use two different methods for this decision, one is to
study the pattern of the sequence of problem-types,14 the other to
check for the distribution of problem-types during the last few experiments.
Method A. Pattern-recognition:
By applying its letter sequence
extrapolation routine to the sequence of encountered problem-types
SEP 1 can identify a fairly large set of different patterns. A sequence
like a b d a a a b d a a a b d a a a would suggest that a problem of
type b is likely to occur next time. In our example no such pattern
can be identified, so method B must be used.
Method B. In this method the decision rule is to look back a certain
number of problems (the horizon), determine which type of problem
occurred most frequently, and to choose the procedure corresponding
to this type.
The length of the horizon determines the sensitivity and stability
of the decision-rule. SEP 1 chooses the horizon which would minimize
the number of wrong decisions for the set of problems already encountered, thus implicitly assuming that a similar distribution of problem-types will occur also in the future. The optimal horizon is related to the degree of randomness of the occurrence of different problem-types. A long horizon is required for purely random occurrences, but a horizon of length 1 is optimal when the different problem-types occur in groups within which all entries are equal, as for instance in the sequence: a a a a a a a a a c c c c c c c c c c c b b b b ... Assuming the present horizon to be 5 we will by this method choose procedure C.

14 This method was initially suggested by Dr. Edward A. Feigenbaum.
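A minimal sketch of Method B as just described: look back over a fixed horizon, find the most frequent problem-type, and choose the corresponding procedure. The horizon-optimization step is not shown, and the function and variable names are assumptions.

```python
from collections import Counter

def choose_procedure(history, horizon):
    """history: problem-types in order of occurrence, most recent last."""
    window = history[-horizon:]
    most_common_type = Counter(window).most_common(1)[0][0]
    return most_common_type.upper()          # procedure A handles type a, and so on

history = list("aaaacaacaaadaabbbaccc")      # the example sequence from the text
print(choose_procedure(history, horizon=5))  # last five are b a c c c -> 'C'
```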

Decision 2: How many inputs to ask for?


A choice of too few entries means that goal 2 but not goal 3 will
be satisfied. On the other hand too many entries will almost satisfy
goal 3 but will not achieve goal 2. To satisfy each goal the correct
number of entries (i.e. the minimum number of entries which defines
the sequence) must be chosen. The decision-rule used is very simple:
SEP 1 asks for the number of entries which in most cases has been
sufficient to solve problems of the type chosen by decision 1 without
requiring further additions. Let us assume that a sequence of 6 entries
is chosen.
Current input-list: 1 2 3 4 5 8
Use process C. Split the sequence into two sub-sequences.
The sub-sequences are immediately found to be:
1 3 5 7 9 11 ...
2 4 8 16 32 64 ...
And the result is printed out as:
Odd sequence is: (2X-1)
Even sequence is: 2^X
It must be noted that some trial and error is often necessary before a solution can be found; our example was chosen to give the solution in a minimum number of steps.
Samuel has designed a sophisticated learning-procedure for his
checkers-playing machine. Checkers (as chess) has a huge game-tree15
which forces any player of the game to utilize search-reducing methods
which include heuristics for selection of the most "promising" alternatives and for determination of the depth of exploration.
We know that any method which does not examine the alternatives
down to an end-position of the game requires a static evaluationmethod for determining the relative merits of different alternatives.
In Samuel's machine this evaluation is performed by a
polynomial,
which can be revised and improved for every move of the game thus
allowing a very fast rate of learning. Before any move is chosen a
set of different alternatives are examined for possible
consequences.
This examination may include several moves ahead but
usually ends
in a non-terminal position, the value of which is
computed by using
15 The game-tree of checkers contains some 10^40 different paths.

the evaluation polynomial. A minimax-procedure chooses the alternative with the highest attainable of the computed values; this value
is also assigned to the current board-position. A comparison is made
between the just computed "backed up" value and a previously computed value for the same board-position. If a difference is found the

coefficients of the evaluation-polynomial are modified in a direction which will reduce it. This learning method leads to stability in the
evaluation of board-positions; inclusion of a goal in the form of a
piece-advantage term also assures improvement in the performance
of the machine.
The 16 parameters of the evaluation-polynomial are selected by
the machine from a list of 38 candidates. For each move the parameter which gives the smallest contribution to the total value of the
polynomial is recorded; when this has happened 8 times for the same parameter it is replaced by one of the 22 parameters currently not used in the evaluation-polynomial. Experience with this generalization learning-procedure has shown that, after initially violent changes in the polynomial, its coefficients and parameters stabilize after completion of some 40 games.
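A schematic sketch of the learning step described above: positions are scored by a linear evaluation polynomial, a backed-up value is obtained by shallow minimax, and the coefficients are nudged to reduce the difference between the static and the backed-up value. The toy game, the feature functions and the step size are placeholders, not Samuel's actual formulation.

```python
# Schematic: linear evaluation, shallow minimax backup, coefficient adjustment.
def evaluate(position, weights, features):
    return sum(w * f(position) for w, f in zip(weights, features))

def backed_up(position, depth, weights, features, moves, maximizing=True):
    children = moves(position)
    if depth == 0 or not children:
        return evaluate(position, weights, features)
    values = [backed_up(c, depth - 1, weights, features, moves, not maximizing)
              for c in children]
    return max(values) if maximizing else min(values)

def learn(position, depth, weights, features, moves, step=0.01):
    # Nudge each coefficient so the static value tracks the backed-up value.
    error = (backed_up(position, depth, weights, features, moves)
             - evaluate(position, weights, features))
    return [w + step * error * f(position) for w, f in zip(weights, features)]

# Tiny fake game: positions are numbers, each non-terminal position has two successors.
features = [lambda p: p, lambda p: p % 3]              # placeholder board features
moves = lambda p: [p + 1, p - 2] if abs(p) < 4 else []
print(learn(1, 2, [0.5, 0.5], features, moves))        # -> [0.49, 0.49]
```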

Rote-Learning
Current experience can often be utilized at a later point in time by
memorization of pertinent information. SEP 1 simply memorizes all
previously encountered solutions, a feature which in many cases has
produced interesting results. LT memorizes solved sub-problems for
use in later search for proofs of theorems. Samuel in his checkers-player uses an advanced rote-learning technique where the machine,
in order to save memory-capacity, stores all board-positions in a
normalized form. His program also forgets positions which are seldom
encountered.
The utilization of rote-learning is mainly limited by the
availability
of memory-capacity, but another difficulty (encountered
by LT) may
also be mentioned. LT has the ability to store sub-problems
proven
during its work. The availability of these however sometimes decreases the selective power of the machine by increasing the number
of alternatives available at later stages.

The Means-Ends Analysis Heuristic


Means-ends analysis is a powerful heuristic which can be used in
situations where we want to transform an initial state to a final
state by employing a limited set of tools. As GPS gives a very precise
formulation of means-ends analysis we find it convenient to use a description of this program to convey the idea behind this method.
GPS consists of two basic parts, namely the Task-Environment
which contains all specific information about particular problems and
applicable rules and the Core which contains context-independent
methods for problem-solving. GPS can solve problems of the following
general form: "Transform the initial state X to the final state Y by
applying rules Ri belonging to a given set R."
The means-ends procedure is generated by recursive application of
the following goals:
1. Transform state A to state B.
2. Reduce the difference between states A and B.
3. Apply rule Ri to state A.
The complexity of the resulting chain of subgoals may be illustrated
by a brief description of the actions initiated by the different goals.
Goal 1. "Transform A to B" requires knowledge of how the two states
differ. If there is no difference we can exit with success, but if a
difference D exists it must be reduced so goal 2 is evoked.
Goal 2. "Reduce difference D" requires that a suitable operator (rule)
is found. A table look-up indicates which operators are most likely
to be successful for differences of the kind represented by D. An
operator Ri is chosen. Ri must now be applied to A, so goal 3 is evoked.
Goal 3. "Apply rule Ri to A" requires that the form of A conforms to the requirements of Ri. If Ri can be applied directly to A we get a new state A' and goal 1 is evoked to transform A' to B. Otherwise goal 2 must be evoked to transform A to a form acceptable by Ri.
We see how the different goals can call each other and (indirectly)
themselves in a recursive fashion, which in many cases may generate
very long chains of goals. Many of these chains terminate in "impossible" situations necessitating several repetitions of the procedure
in part or total. In order to detect unfruitful attempts at an
early
stage GPS contains some heuristic "indicators". One of these is the

previously mentioned table of recommended operators. Others are


criteria for refusing impossible or trivial problems. An interesting
heuristic is that no subgoal may be attempted if it is judged to be
more difficult than any goal at a higher level. This means that GPS
expects to partition its task into successively easier sub-tasks.
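A toy sketch of the three mutually recursive goal types, applied to a trivial string-editing task; the states, difference types and operator table below are invented and much simpler than GPS's actual task environments.

```python
# Toy means-ends analysis: transform string A into string B with two operators,
# guided by a table mapping difference types to operators.
def difference(a, b):
    if a == b:
        return None
    return "missing_suffix" if b.startswith(a) else "wrong_tail"

operators = {
    "missing_suffix": lambda a, b: a + b[len(a)],   # append the next needed character
    "wrong_tail":     lambda a, b: a[:-1],          # drop the offending last character
}

def transform(a, b, trace=None):
    trace = [a] if trace is None else trace
    d = difference(a, b)          # Goal 1: transform A into B
    if d is None:
        return trace              # no difference left: success
    op = operators[d]             # Goal 2: reduce the difference D (table look-up)
    a = op(a, b)                  # Goal 3: apply the chosen operator
    trace.append(a)
    return transform(a, b, trace)

print(transform("gsp", "gps"))    # -> ['gsp', 'gs', 'g', 'gp', 'gps']
```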
The Planning Heuristic
In many cases it is possible to reduce the number of paths between
an initial state A and a goal B by insertion of a number of sub-goals.
Let us assume that our problem requires a choice between M alternatives at each stage and that the expected number of stages is N,
thus giving a total of MN possible alternatives. The insertion of S-1
sequential and equidistant subgoals will reduce the number of possibilities to S .M-/ (for M= 10, N= 15, and S= 5 initially 1015 alternatives are reduced to 5.103). This means that any method of finding
such subgoals however crude it is may prove to be valuable. One such
method which in many cases can offer some help is planning, a
heuristic proposed by Newell, Shaw, and Simon for their GPS. The
idea behind this heuristic is to strip the original problem of all
detail, solve the thus simplified problem, and then use the result as
a plan for the solution of the original problem.
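A quick check of the reduction claimed above, using the figures given in the text:

```python
# Search-space reduction from inserting S-1 equidistant subgoals:
# M**N alternatives shrink to S * M**(N/S) (assuming S divides N).
M, N, S = 10, 15, 5
print(f"without subgoals: {M**N:.1e}")          # 1.0e+15
print(f"with subgoals:    {S * M**(N // S)}")   # 5000, i.e. 5 * 10**3
```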

Particular Heuristics
In several problem-environments some particular structure may be
utilized in order to reduce the number of possible alternatives.
Methods which are efficient in such special environments are here
called particular heuristics.
The basic ability of SEP 1 is to extrapolate polynomials. As the
previously described general form can generate sequences which are
not necessarily polynomial in structure SEP 1 must utilize methods
which break down the input-sequence to polynomial sub-sequences.
Let us study a particular example.
"Analyze the sequence 5 56 729 10240 203125 in which one of the
entries may be erroneous, and print out its general form."
Subgoal 1:16 Find the general expression of the exponent.

16 The procedures discussed can easily be generalized for more complicated general expressions.

We know that
this is a polynomial of at most the order 1. Our problem is now to


isolate the exponent. The only information available is that it can never
assume a value larger than the number of occurrences of the most frequent
prime-factor indicates (there are three exceptions from this rule).17 Let
us compute these numbers; they are:
1 3 6 11 6
This sequence is certainly not linear, so a procedure for making it so
is necessary. SEP 1 therefore generates a diagram (a very efficient
heuristic device) and plots the above sequence as follows:
Figure 2. [Diagram: the number of occurrences of the most frequent prime-factor plotted against the entry number, with a line fitted to the points.]

A line which satisfies the following conditions is now fitted to the


points:
1. It may not pass above more than 3 points.
2. It shall pass as many points as possible.
3. Its slope must be 0 or a positive integer.
In most cases several such lines exist but by using heuristics for choosing a candidate SEP 1 usually finds the correct one rather soon. In our
case the line passes the following points:
2 3 4 5 6
Giving us the line (X+1).
Subgoal 2: Assuming the above hypothesis as correct we now proceed to
identify the basis of the exponential part of the general expression. We
know that the exponent indicates the minimum number of occurrences
of the prime-factors of the basis. Therefore take the
product of all factors
which occur at least as many times as the exponent tells. We
get the
following result:
1 2 3 4 5
Giving us the line (X).
(The same diagrammatic technique as in subgoal 1 is used to produce

the linear sequence.)18


17 When the basis is 0 or 1 any exponent gives only one prime-factor; as we also allow for one error, there are 3 possible exceptions.
18 Note that the polynomial part may contribute with some factors, which usually disturb the linearity of the assumed basis.


Subgoal 3: Identify the polynomial part. The procedure is now completely


straight-forward. First compute the exponential part:
1 8 81 1024 15625 ... = X^(X+1).
Divide the input by the exponential part:
5 7 9 10 13
An algorithm gives the polynomial part (2X + 3).
(NB one error is allowed.)
Test: The general expression (2X+3)·X^(X+1) is now used to generate a sequence which is tested against the input; if at most one dissimilarity is found the general expression is printed out.
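A small sketch of the first step of the heuristic above: for each entry, count the largest number of times any single prime divides it, which bounds the exponent from above. The naive factorization routine is a stand-in for illustration.

```python
from collections import Counter

def max_prime_multiplicity(n):
    """Largest number of times any single prime divides n (upper bound on the exponent)."""
    counts, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            counts[p] += 1
            n //= p
        p += 1
    if n > 1:
        counts[n] += 1
    return max(counts.values(), default=0)

sequence = [5, 56, 729, 10240, 203125]
print([max_prime_multiplicity(x) for x in sequence])   # -> [1, 3, 6, 11, 6]
```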
Among particular heuristics semantic reformulation of the problem
is a method which in many cases can provide better insight than the
original formulation. As an example of such reformulation we will
briefly describe Gelernter's Geometry Proving Machine, which uses
a coded graph to eliminate infeasible solutions. The idea behind this semantic model is obvious: we know from experience how much easier
it is to prove theorems in geometry when we are allowed to look at
a graph (on paper or in the "minds-eye") than when we have to
work only with the formulation of the theorem in words. The geometry machine consists of three parts: a "Heuristic Computer" acts as an executive for a "Syntactic Computer" and for a "Diagram Computer". Problems given to the "Syntactic Computer" by the "Heuristic Computer" give rise to strings of proof-components generated in a rather straightforward way. These strings are tested for feasibility by the "Diagram Computer", which studies a suitably coded graph. Experience has shown that the use of the "Diagram Computer" efficiently reduces the exploration of unfruitful attempts. In addition to the diagram the machine contains several other heuristics for determination of the relative difficulty of problems, for determination of which subgoal to attack first, etc.

Artificial Intelligence: Past, Present, and Future

During the first few years of its existence research in artificial intelligence was mainly directed toward particularly interesting problems, resulting in the development of some powerful special-purpose programs.

This kind of work certainly proved the feasibility of problem-solving by machine, but did not increase very much our knowledge about how to solve problems in general.
Current research seems to be more concerned with the development
of general methods, hopefully leading toward a general theory of
artificial intelligence. If this research turns out to be successful we
may in the future be able to tie together a couple of standardized problemsolving processes and include the resulting "consultant" as a


new member of our organization. GPS must be credited as one of the forerunners of this approach; still more powerful methods, especially when it comes to inductive capability, are however necessary. It
must here be stressed that the most important area for learning
general problemsolving methods is the study of human cognitive
processes. When we know more about how people process their information the next step is to include our findings in computer-programs, a task which at that time may be simplified by the development
of new computer-languages and/or computer-hardware.
Future uses of artificial intelligence may easily be conjectured but
we feel that given the information presented in this paper the reader
himself is in a position to judge and guess. For further information
about the field of artificial intelligence we recommend the book "Computers and Thought" edited and commented by Feigenbaum & Feldman (ref. 7). This volume contains research-reports, review articles,
and a very comprehensive bibliography; ingredients which together
make this book the best possible introduction to the field.

References
1. Armer, P., "Attitudes toward Intelligent Machines" in reference 7.
2. Bernstein, A. et al., "A Chess-Playing Program for the IBM 704 Computer", Proceedings of the Western Joint Computer Conference, pp. 157-159, 1958.
3. Clarkson, G. P. E., "Portfolio Selection: A Simulation of Trust Investment", Prentice-Hall, Inc., Englewood Cliffs, N.J.; 1962.
4. Clarkson, G. P. E., "A Model of the Trust Investment Process" in reference 7.
5. Ernst, H. A., "MH-1.A Computer-operated Mechanical Hand" Ph.D. dissertation, MIT, presented at the Western Joint Computer Conference,
1962.

6. Feigenbaum, E. A., "The Simulation of Verbal Learning Behavior", Proceedings of the Western Joint Computer Conference, pp. 121-132, 1961. Reprinted in ref. 7.
7. Feigenbaum & Feldman eds. "Computers and Thought" McGraw-Hill
Book Company, Inc., New York; 1963.
8. Feldman, J., "Simulation of Behavior in the Binary Choice Experiment",
Proceedings of the Western Joint Computer Conference, pp. 133-144; 1961. Reprinted in ref. 7.
9. Gelernter, H., "Realization of a Geometry Theorem-Proving Machine",
Proc. International Conf. on Information Processing UNESCO House,
Paris; 1959. Reprinted in ref. 7.
10. Green, B. F., Wolf, A. K., Chomsky, C., and Laughery, K., "Baseball: An
Automatic Question Answerer", Proc. of the Western Joint Computer
Conference, pp. 60-68; 1961. Reprinted in ref. 7.
11. Hunt, E. B., "Concept Formation: An Information Processing Problem",
John Wiley & Sons, Inc., New York; 1962.
12. Kister, J., Stein, P., Ulam, S., Walden, W., and Wells, M., "Experiments
in Chess", Journal of the Association for Computing Machinery, April,
1957, pp. 174-177.
13. Lindsay, R. K., "A Program for Parsing Sentences and Making Inferences about Kinship Relations", in Symposium on Simulation
Models: Methodology and Applications to the Behavioral Sciences,
Eds. Hoggatt A. C. and Balderston F. E. South-Western Publishing Co.
Cincinnati, Ohio; 1963.
14. Luce, R. D., and Raiffa, H., "Games and Decisions; Introduction and
Critical Survey", John Wiley & Sons, Inc. New York; 1957.
15. March, J. G., and Simon, H. A., "Organizations", John Wiley & Sons, Inc.
New York; 1958.
16. McCulloch, W. S., and Pitts, W., "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics;
1943; pp. 115-137.
17. Newell, A., and Simon, H. A., "The Logic Theory Machine-a
Complex
Information Processing System", IRE Trans. on Information Theory,
vol. IT-2, pp. 61-79; September, 1956.
18. Newell, A., Shaw, J. C., and Simon, H. A., "Empirical Explorations of
the Logic Theory Machine: a Case Study in Heuristics", Proc. of the
Western Joint Computer Conference, 1957, pp. 218-230. Reprinted in
ref. 7.
19. Newell, A., Shaw, J. C., and Simon, H. A., "Chess-playing Programs and
the Problem of Complexity", IBM Journal of Research and
Development, vol. 2, No 4, 1958, pp. 320-335. Reprinted in ref. 7.
20. Newell, A., and Simon, H. A., "GPS a Program that Simulates Human
Thought", in Lernende Automaten, R. Oldenburg KG, Munich 1961.
Reprinted in ref. 7.
21. Rosenblatt, F., "The Perceptron: A Probabilistic Model for Information
Storage and Organization in the Brain", Psychological Review, November 1958, pp. 386-407.
22. Samuel, A. L., "Some Studies in Machine Learning
using the Game of
Checkers", IBM Journal of Research and Development, July, 1959,
pp. 211-229. Reprinted in ref. 7.

23. Shannon, C. E., "Programming a Digital Computer for Playing Chess",


Philosophical Magazine, March 1950, pp. 356-375.
24. Simmons, R. F., "Synthex: Toward Computer Synthesis of Human
Language Behavior", in Computer Applications in the Behavioral
Sciences", H. Borko ed., Prentice-Hall Inc., Englewood Cliffs, N.J.;
1962.
25. Simon, H. A., and Kotovsky, K., "Human Acquisition of Concepts for
Sequential Patterns", Psychological Review, 1963, no. 6, pp. 534-536.
26. Slagle, J., "A Heuristic Program that Solves Symbolic Integration
Problems in Freshman Calculus", in reference 7.
27. Taube, M., "Computers and Common Sense", Columbia University Press,
New York, 1961.
28. Tonge, F., "A Heuristic Program for Assembly Line Balancing", Prentice-Hall Inc.; Englewood Cliffs, N.J.; 1962.
29. Tonge, F., "Summary of a Heuristic Line Balancing Procedure", Management Science, 1960, 7, pp. 21-42. Reprinted in ref. 7.
30. Turing, A. M., "Computing Machinery and Intelligence", Mind, October
1950, pp. 433-460. Reprinted in ref. 7.
