An Introductory Lisp Parser

Donald Loritz
Georgetown University

ABSTRACT: Instructional parsers will figure prominently in future computer-assisted
language learning. Most contemporary computational linguistics research is being
conducted in Lisp. A knowledge of Lisp parsing should therefore help prepare serious
language teachers for future CALL. Despite Lisp's manifest suitability to natural
language processing, most introductions neglect this aspect of Lisp. This paper tries to
redress some of that neglect by introducing Lisp in the context of an introductory English
parser.

KEYWORDS: atoms, lexicon, Lisp, lists, parser, recursion stack, rewrite,
semantics, simparse, syntax, TLC Lisp, toplev (Lisp interpreter)

Introduction
Consider the average secondary or university foreign language student
who spends 5 hours per week in a foreign language class of 15. His share of the
instructor's time is only 20 minutes per week. Since such limited instruction is
available, grammar rules are often taught to students so that students can self-
instruct by applying the rules against their own output. One problem with this is
that too many students spend too much time learning and applying grammar
rules and too little time learning and using language. Another problem is that the
grammar rules themselves are often too simplistic or mechanical to be of real use.
Nevertheless, some students have learned by this process, so the checking
of output against grammar rules must be at least potentially effective. There is
today no reason why students should have to learn and apply mechanical
grammar rules. Computers can do that with instructional parsers.
In this context, parsers are computer programs used to either (a) verify an
experimental grammar against accepted language output or (b) verify
experimental language output against an accepted grammar. The former are
theoretical parsers such as are written by theoretical computational linguists. The
latter are instructional parsers. These are typically modeled on some theoretical

CALICO Journal, Volume 4 Number 4 51


parser whose grammar is not too simplistic and modified to minimize the costs,
both pecuniary and pedagogical, of grammatical mechanization. Fairly advanced
prototypes of such systems have been announced for ESL (Loritz, 1984), Chinese
(Loritz, et al., forthcoming), and German (Sanders & Sanders, 1987). Other
languages will follow, and it now appears instructional parsers will enter
widespread use within a decade.
A feature of the best theoretical parser grammars is that they are more
consistent than the most consistent theoretical grammar books ever written.
Verification by the parser has made this possible. Although competent human
language teachers might be more authoritative than the best theoretical parser
(and certainly more simply understood!), none are so underworked that they can
scorn so consistent a teacher's aide as an instructional parser could be.
Conscientious teachers will therefore want to know about the grammars their
mechanical aides are applying to their students' output.
In the first part of this paper, I will give a short, interactive grammar of
the Lisp computer language. In the second part I will present a basic phrase-
structure-grammar-based parser as a concrete example of how a parser is written
in Lisp. By itself, such a short introduction is unlikely to be sufficient for many
readers, but it should at least help make intelligible other introductory works
such as Bates (1978), Winograd (1983), and Winston & Horn (1981).

A Short Grammar of LISP


Lisp is among the most "natural" of computer languages. One reason for
this is that, in Lisp, a Saussurean arbitrary relation holds between signifier and
significand. We will see how it is possible to redefine virtually every classical
Lisp command word. Indeed, Lisp programming might best be understood as
the reshaping of Lisp into languages especially designed for their problem
domains. One consequence of this is that there are ultimately as many LISP
dialects as there are LISP programs. Linguists should rejoice in this linguistic
diversity, but it does slightly complicate matters for beginners. The version of
Lisp presented here is TLC Lisp for the IBM PC and compatibles. I have tried,
however, to write all sample code in a classical Lisp style which is maximally
dialect-independent. What follows will probably be most intelligible if, after a
first, quick reading, it is reread in the company of a Lisp interpreter. From the
MS-DOS prompt, the TLC Lisp interpreter is summoned by >lisp.

Dialogue with the Lisp Interpreter. The Lisp Interpreter is actually a
three-program-team with which you hold a turn-taking dialogue. The dialogue is
conducted in Lisp. You enter a Lisp command sentence and signal your end-of-
turn by pressing the <return> [or <enter>] key. The first member of the
interpreter team, the reader, transliterates your alphabetic Lisp into the
computer's native, numeric language. The second member of the team, the
evaluator, actually evaluates or "performs" your command. The third member,
the printer, transliterates the result of your command back into alphabetic Lisp,
displays the result on your computer screen, and indicates that it is again your
turn with the TLC-Lisp prompt, >>>. Dialogue I illustrates the preceding:

I.
>>> ;the TLC Lisp interpreter's prompt.
>>> (add 1 2) ;user command
3 ;interpreter response

>>> (string "m" 3) ;interpreter prompt, user's turn again

Lisp Morphology. In Lisp jargon, syllables are atoms. There are three
main types of atoms: strings, symbols, and numbers. Strings [e.g., "m" "m3"
"qwerty" "qwerty ASDFGH ZxcvbN"] are non-morphemic atoms; they have no
value, no "meaning." Symbols [e.g., m m3 qwerty dozen] are delimited by spaces
or parentheses. Symbols [sometimes called variables] have two main parts: a
label [or identifier or print-name] and a value [i.e., "meaning"]. These correspond
to the Saussurean notions of "signifier" and "significand," respectively. Only two
symbols are pre-defined in Lisp: t and nil [i.e., "something" and "nothing"]. All
other labels can be arbitrarily bound to any value. Symbols will be discussed
further under "Lisp semantics," below. The third type of atom, numbers, have
values [like symbols], but these are not arbitrary [e.g., 3 is always 3].
Atoms may be collected in lists. Lists are bounded by parentheses, e.g.,
(tom dick harry). Symbols frequently have values which are lists.

Lisp Syntax. Lisp is a VSO language [where V, S, and O are atoms or
lists]. Valid Lisp sentences are lists which begin with a verb, as in (add 1 2),
above. Lisp sentences are called "function calls" or "functional objects" in Lisp
jargon. Similarly, the verb is called a "function" or "applicable object." The
process of translating (add 1 2) is described as "applying the function add to its
arguments [1, 2]." Rather than "print a translation," the function call is said to
"return a value" and the process of translation itself is called "evaluation."

Very simple recursion. Lisp is recursive, so sentences may be embedded
to any depth. Thus, we can have the following dialogue with the interpreter:

II.
>>> (add 1 (sub 3 1))
3

>>>

In the process of translating [evaluating] the outer sentence, the
interpreter must first recursively translate the embedded sentence, (sub 3 1). It is
not uncommon for a Lisp program to be like a single sentence with several
thousand levels of embedding. Such deep recursion is powerful, but it can be
confusing! The key to understanding this recursive evaluation procedure is
understanding the Lisp recursion stack.

The recursion stack. A recursion stack is quite like the stack of papers that
accumulates on one's desk. Normally one works on the papers on the top of the
stack. Work on the underlying layers is suspended until the stack is unstacked
["unwound" in Lisp jargon]. Similarly, when the interpreter encounters ["reads"]
a left parenthesis, he opens a stack frame on the top of the recursion stack. When
he reads a right parenthesis, he evaluates the [completed] stack frame and
"unstacks" it, returning the result to the next lower stack frame. Figure 1
illustrates this process for (add 1 (sub 3 1)).
In Figure 1, we also see part of the interpreter represented on the stack. As
discussed above, the Lisp interpreter consists of a reader, an evaluator, and a
printer. The Lisp interpreter is often called "toplev," and it can be defined as:
III.
(de toplev ()
  (forever
    (print (eval (read)))
  )
)
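
The definition in III can be tried out in a modern dialect. The following is only a sketch in Common Lisp, not TLC-Lisp: defun stands in for de, + and - stand in for add and sub, and the name toplev-demo is hypothetical. So that the loop terminates, it reads its "turns" from a string rather than from the keyboard.

```lisp
;; Common Lisp sketch of toplev (an assumption: this is not TLC-Lisp code).
;; The forms are read from a string, so the loop stops at end of input
;; instead of running forever.
(defun toplev-demo (text)
  "Read, evaluate, and print each Lisp sentence in TEXT; return the values."
  (with-input-from-string (in text)
    (loop for form = (read in nil :eof)     ; the reader
          until (eq form :eof)
          collect (print (eval form)))))    ; the evaluator, then the printer

(toplev-demo "(+ 1 2) (+ 1 (- 3 1))")       ; prints 3 twice, returns (3 3)
```

Each pass through the loop is one turn of the dialogue: read a sentence, evaluate it, print the result.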

        2  (sub                      (sub 3 1)  -.
        1  (add 1                    (add 1     <-'
toplev:    (print (eval              (print (eval
           (forever                  (forever
read:      (add 1 (sub 3 1))<ret>    (add 1 (sub 3 1))<ret>
                  (a)                       (b)

        1  (add 1 2)
toplev:    (print  <-'               (print 3)  -.
           (forever                  (forever   <-'
read:      (add 1 (sub 3 1))<ret>    (add 1 (sub 3 1))<ret>
                  (c)                       (d)

Figure 1: Recursive evaluation uses a stack. When a left parenthesis is read (a), a
new stack frame is opened. When a right parenthesis is read (b), the function call
in the top stack frame is evaluated and the result is returned to the lower
(calling) stack frame (b, c). When the interpreter (toplev) itself is reached (d), the
process is ready to repeat.
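
The opening and unwinding of stack frames in Figure 1 can be made visible with a small tracing evaluator. This is a hedged sketch in Common Lisp, not part of Simparse or TLC-Lisp: traced-eval is a hypothetical helper that handles only numbers and nested calls of functions such as + and -, and it uses the host Lisp's own recursion as the stack.

```lisp
;; Sketch only: a toy evaluator that reports each stack frame it opens
;; and each value it returns while unwinding.
(defun traced-eval (form &optional (depth 1))
  "Evaluate FORM, a number or a nested call such as (+ 1 (- 3 1))."
  (if (atom form)
      form                                   ; an atom is returned as it is
      (progn
        (format t "~&open   frame ~d: ~s~%" depth form)
        ;; The arguments are evaluated first, each in a frame one level up.
        (let ((value (apply (car form)
                            (mapcar (lambda (arg) (traced-eval arg (1+ depth)))
                                    (cdr form)))))
          (format t "~&unwind frame ~d: returns ~s~%" depth value)
          value))))

(traced-eval '(+ 1 (- 3 1)))   ; returns 3
```

The printed trace opens frame 1 for the outer sentence, opens and unwinds frame 2 for the embedded sentence, and only then unwinds frame 1.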

The great value of recursive computation becomes apparent when an
error is encountered. For example, the dialogue,

IV.
>>> (add 1 (sub junk 1))

***error*** unbound variable: junk

>>>

prints its error message with the stack as in Figure 2a. By defining the error
function to include an error-handling interpreter, as outlined in Figure 2b, most
Lisp systems allow the programmer to inspect the stack "and see how he got into
this mess."

        4  (error                     (error (print arg)
                                        (forever (print (eval (read
        3  junk                       junk
        2  (sub                       (sub
        1  (add 1                     (add 1
toplev:    (print (eval               (print (eval
           (forever                   (forever
read:      (add 1 (sub junk 1))<ret>  (add 1 (sub junk 1))<ret>
                  (a)                        (b)

Figure 2: When an error occurs (a), the history of the computation leading up to
the error is preserved on the recursion stack. Adding an error-handling
interpreter to the error function (b) allows the user (or program!) to inspect that
history and take appropriate action.
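
Modern dialects expose the same idea through condition handlers. The sketch below is Common Lisp, not TLC-Lisp, and safe-eval is a hypothetical name. Note one difference from Figure 2b: handler-case unwinds the stack before its handler runs, so this version reports the offending symbol rather than letting the user browse the frames.

```lisp
;; Sketch: trap the "unbound variable" error of dialogue IV and report
;; which symbol caused it.
(defun safe-eval (form)
  "Evaluate FORM; on an unbound variable, return (:unbound name) instead."
  (handler-case (eval form)
    (unbound-variable (condition)
      (list :unbound (cell-error-name condition)))))

(safe-eval '(+ 1 (- 3 1)))      ; => 3
(safe-eval '(+ 1 (- junk 1)))   ; => (:UNBOUND JUNK)
```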

As we shall see, the ability of recursion to preserve the history of the
computation is invaluable in natural language processing.

Lisp Semantics: Noun definition. Only two nouns are attested in
Classical Lisp: t and nil. Additional nouns must be defined ["bound"] or
redefined by using the Lisp primitive function set as in V:

V.
>>> n1
***error*** unbound variable: n1
>>> (set 'n1 9)
9
>>> n1
9
>>> (set 'n2 10)
10
>>> n2
10

The single quote marks in V have a subtle, but important effect. It can be
most readily seen by comparing the Lisp sentences in VI and VII:

VI.
(string n1 n2) ==> "910"

VII.
(string 'n1 'n2) ==> "n1n2"

Henceforth we will take the tasks of the reader and printer largely for
granted and focus on the work of the evaluator, so the symbol "==>" can be read
as "evaluates to" or "returns." In VI, n1 and n2 are evaluated: their values are
concatenated into a string. VII returns the concatenated string "n1n2." The quote
blocks evaluation of n1 and n2. Just as quotation marks in printed text might
mean "don't translate this," so 'arg in Lisp means don't evaluate arg.
Out of respect for tradition, most Lisp programmers still code the definitions of V
as in VIII:

VIII.
(setq n1 9)
(setq n2 10)

where the q replaces the ' of V.
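
The difference between a quoted and an unquoted symbol can be checked in Common Lisp as well. This is a sketch under assumptions: Common Lisp has no string function that concatenates like TLC-Lisp's, so format is used instead, and the defvar lines are added only to declare the two symbols as global variables.

```lisp
;; Sketch in Common Lisp (not TLC-Lisp).
(defvar n1)                     ; declare the symbols as global variables
(defvar n2)
(set 'n1 9)                     ; set evaluates its first argument, hence the quote
(setq n2 10)                    ; setq quotes its first argument for us

(format nil "~a~a" n1 n2)       ; => "910"   values are concatenated
(format nil "~a~a" 'n1 'n2)     ; => "N1N2"  labels are concatenated
```

The upper-case result in the last line arises because the Common Lisp reader converts symbol names to upper case by default.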


Quoting allows Lisp to manipulate both the label and the value of its
Saussurean symbols. This gives Lisp great power, but it also lays the foundation
for a Tower of Babel. Consider, for example, dialogue IX:

IX.
>>> (setq dog 'chien)
chien
>>> (setq chien 'hund)
hund
>>> 'dog
dog
>>> dog
chien
>>> (eval 'dog)
chien
>>> (eval dog)
hund

The Lisp primitive function eval can be thought of as the opposite of quote.
Thus (eval 'dog) in IX would be equivalent to (eval (dont-eval dog)).
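
The chain of bindings in IX can be reproduced in Common Lisp. In this sketch (an assumption: defvar replaces the top-level setq so that the symbols are declared as global variables) the four turns of IX are collected into a single list.

```lisp
;; Sketch in Common Lisp: dog names chien, and chien names hund.
(defvar dog 'chien)
(defvar chien 'hund)

;; 'dog blocks evaluation, dog is evaluated once, (eval 'dog) undoes the
;; quote, and (eval dog) evaluates twice.
(list 'dog dog (eval 'dog) (eval dog))   ; => (DOG CHIEN CHIEN HUND)
```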

Fortunately, Lisp also gives us powerful dictionary systems to help keep
track of the threatening Babel. The most important and common of these is
illustrated in the putprop/getprop pair. (In some Lisp dialects these are put/get or
putp/getp.)

X.
>>> (putprop 'chien 'lang 'french)
french
>>> (putprop 'hund 'lang 'german)
german
>>> (getprop 'chien 'lang)
french
>>> (getprop 'hund 'lang)
german

XI.
>>> (setq dozen 12)
>>> (putprop 'dozen 'lang 'english)
english
>>> (putprop 'dozen 'pos 'n)
n
>>> (putprop 'dozen 'plural '-s)
>>> (plist 'dozen)
(lang english pos n plural -s pname "dozen")
>>> (pname 'dozen)
"dozen"
>>> dozen ; same as (eval 'dozen)
12

XI illustrates how multiple properties [semantic features] of a symbol are
maintained on a property list. Indeed, the print name [pname, label] of a symbol is
itself often stored on the symbol's property list [although in TLC Lisp it is not
actually returned by (plist arg)]. The size of a property list, like all Lisp lists, is
limited only by available memory.
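
Property lists survive in Common Lisp under different names. In the following sketch (Common Lisp, not TLC-Lisp) get plays the part of getprop, (setf (get ...)) the part of putprop, symbol-plist the part of plist, and symbol-name the part of pname.

```lisp
;; Sketch in Common Lisp of dialogues X and XI.
(setf (get 'chien 'lang) 'french)     ; like (putprop 'chien 'lang 'french)
(setf (get 'hund 'lang) 'german)
(setf (get 'dozen 'lang) 'english)
(setf (get 'dozen 'pos) 'n)

(get 'chien 'lang)                    ; => FRENCH
(get 'dozen 'pos)                     ; => N
(get 'dozen 'plural)                  ; => NIL, no such property has been put
(symbol-name 'dozen)                  ; => "DOZEN", the print name
(symbol-plist 'dozen)                 ; the whole property list of dozen
```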

Lisp Semantics: Lists. Lisp sentences can now be simply described as
unquoted lists:

XII.
>>> '(add 1 (add 2 3))
(add 1 (add 2 3))
>>> (add 1 (add 2 3))
6

Quoted lists are the characteristic Lisp data structure, and Lisp is, in fact,
an acronym for List Processor. But the acronym conveys the misleading
impression that Lisp lists are like grocery lists. In fact, Lisp "lists" are "trees." As
XIII shows, Lisp should be called "Treep"!

XIII.
>>> (setq s                                   ;(a)
      '((np (det an) (n engineer)) (vp (v makes)
      (np (n bombs))))
    )
((np (det an) (n engineer)) (vp (v makes)
(np (n bombs))))

>>> (pp s)                                    ;(b)
((np (det an) (n engineer)) (vp (v makes)
(np (n bombs)))
)

>>> (car s)                                   ;(c)
(np (det an) (n engineer))

>>> (setq s (cons 's s))                      ;(d)
(s (np (det an) (n engineer)) (vp (v makes)
(np (n bombs))))

>>> (car s)                                   ;(e)
s

>>> (pp (cdr s))                              ;(f)
((np (det an) (n engineer)) (vp (v makes)
(np (n bombs)))
)

>>> (cdr (cdr (cdr s)))                       ;(g)
nil
>>> (cdddr s)                                 ;(h)
nil

>>> (pp (first (rest s)))                     ;(i)
( np
  (det an)
  (n engineer)
)

In XIII(a) the quote prevents the interpreter from trying to translate (np
(det ... )) as some kind of verbal element. As a result, s is simply bound to an
unevaluated list [we should say "tree"] of words.
Some people used to say Lisp is actually an acronym for Lots of Irritating
Sentential Parentheses. Trying to read XIII(a), one can see why. But, by using a
prettyprinter function, XIII(b) and the remaining turns of XIII show that Lisp code
can be made quite readable.
(car listarg) returns the first element of listarg. In XIII(c) this happens to be
a list.
(cons arg listarg) "pushes" arg onto the "front" of listarg, constructing, in
the process, a new tree. If we then setq s to this new tree as in XIII(d), then s is
rebound, and (car s) returns s as in XIII(e).
(cdr listarg) returns a list of all elements of listarg but the first, as in XIII(f).
car and cdr can be recursively nested as in XIII(g) and abbreviated as in XIII(h).

Although (car listarg) often evaluates to an atom, it is useful to learn that
(cdr listarg) always returns a list. The apparent contradiction of XIII(g,h) can then
be rationalized by observing that Lisp normally treats nil and () [the empty list] as
synonymous.
The terms car, cdr, and cons are particularly sacred to the orthodox Lisp
religion, but certain reformed churches have adopted synonyms. Thus in Logo
the synonyms of car/cdr are first/but-first and in PROLOG they are head/tail. In
TLC-Lisp the synonyms first/rest are recognized as in XIII(i).
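
The tree operations of XIII carry over unchanged to Common Lisp, which also recognizes first and rest. The variable names *tree* and *s-tree* in this sketch are hypothetical.

```lisp
;; Sketch in Common Lisp of the car/cdr/cons turns of XIII.
(defvar *tree* '((np (det an) (n engineer)) (vp (v makes) (np (n bombs)))))

(car *tree*)                     ; => (NP (DET AN) (N ENGINEER))

(defvar *s-tree* (cons 's *tree*))   ; push the label s onto the front

(car *s-tree*)                   ; => S
(cdddr *s-tree*)                 ; => NIL, same as (cdr (cdr (cdr *s-tree*)))
(first (rest *s-tree*))          ; => (NP (DET AN) (N ENGINEER))
(eq (cdr *s-tree*) *tree*)       ; => T, cons built a new tree on top of the old
```

The last line shows that cons does not copy its second argument: the new tree shares the old one as its cdr.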

Defining Lisp verbs. As remarkable as it may seem, it is nearly possible
to define all the rest of Lisp in terms of the few primitives discussed thus far. We
can, for example, define new verbs as in XIV:

XIV.
>>> (de                  ;(a) "define"
      premier            ; function-name
      (liste)            ; parameter list
      (car liste)        ; function body
    )
premier                  ;(de ... ) returns this

>>> (setq l '(a b c))    ;(b)
(a b c)
>>> (first l)            ;(c)
a
>>> (premier l)          ;(d)
a

There is some dialectal variation in the syntax of Lisp verb definitions. The
MACLISP family of dialects includes TLC-Lisp, but members of this family are
likely to use defun where TLC-Lisp uses de. Lisps of the Interlisp family use
variants of the structure (define (function-name (lambda (parameter list) function-
body))). [(Lambda ... ) is an "internal" Lisp function].
The parameter list of XIV(a) consists of one variable which is "passed" a
value when the function is called. I.e., in processing XIV(d) the argument l is
evaluated. The value of l ['(a b c)] is then bound to liste.
Note that liste will continue to be bound to '(a b c) until the process
effectively reaches the final closing parenthesis of (premier ... ). Any computation
that is "stacked on top of" (premier '(a b c)) [i.e., any function called from within
(premier '(a b c))] will find liste bound to '(a b c).
(Premier ...) then successively evaluates the sentences in the function body,
returning as the value of premier the value of the last sentence evaluated in the
body. Premier is simple: the body is only one sentence long. Premier takes the car of
liste and returns this as its value. In the next section we will look at longer and
more complex functions.
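
For readers working in a MACLISP descendant, the definition of XIV might look as follows in Common Lisp, where defun replaces de. This is a sketch, and the variable name *letters* is hypothetical.

```lisp
;; Sketch in Common Lisp of the verb defined in XIV.
(defun premier (liste)
  "Return the first element of LISTE."
  (car liste))                  ; the body is one sentence; its value is returned

(defvar *letters* '(a b c))

(premier *letters*)             ; => A
(premier '())                   ; => NIL, the car of the empty list is nil
```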

Simparse: A Simple Context-Free Parser


Figure 3 gives the lexicon and the grammar for a basic, context-free parser.
It is written in Classical Lisp and should need little if any modification to work
with most Lisp interpreters. The reader is encouraged to experiment with both
the lexicon and the grammar.

Loading Simparse. Procedures for loading and editing Lisp systems vary
from Lisp dialect to dialect, from machine to machine, and from operating
system to operating system.
TLC-Lisp has a powerful built-in editor (itself written in Lisp) which
emulates WordStar. To use it, enter

>>> (edit "simparse")

The screen will change colors to let you know you are now talking to the Lisp
editor and not the Lisp interpreter. When you have finished typing the

; ***SIMPARSE***

(setq ptracef nil)              ;Initialize parse-trace-flag, if TLC-
                                ;Lisp, (setq ptracef t)

; ***THE LEXICON***

(setq S '((np vp)))             ;S --> NP + VP
(setq NP '( (det n)             ;NP --> DET + N
            (prop-n) )          ;NP --> PROP-N
)
(setq VP '( (v np)              ;VP --> V + NP
            (v) )               ;VP --> V
)

; ***THE PARSER***

(de IS-S (*s)
  (and ptracef                  ;(setq ptracef t) if TLC-Lisp
       (init)                   ;A utility function, see Fig. 4
  )
  (terpri)
  (and
    (rewrite 's)
    (null *s)
  )
)

(de REWRITE (node)              ;Rewrite is passed a node-atom, e.g.,
                                ; np, v, s, det
  ;uncomment next line (and add a matching closing parenthesis) if TLC-Lisp
  ;(stackdemo "REWRITE:" node
  (setq node (eval node))       ;Rewrite the node
  (cond
    ( (listp (car node))        ;As in ((det adj n) (det n) (n))
      (tryrules node *s))       ;If so, return this value
    ( (atom (car node))         ;As in (dog cat mouse)
      (tryword node))           ;If so, return this value
  )
)

(de TRYWORD (wordlist)          ;Look up a word in Lexicon
  ;uncomment next line (and add a matching closing parenthesis) if TLC-Lisp
  ;(stackdemo "TRYWORD" wordlist
  (cond
    ((member (car *s) wordlist) ;If found, move on to next word
      (setq *s (cdr *s))
      t))                       ;and always return t
)

(de TRYRULES (rule-list good*s) ;Try different ways of rewriting a
                                ;node, stack a record of good*s
  ;uncomment next line (and add a matching closing parenthesis) if TLC-Lisp
  ;(stackdemo "TRYRULES" rule-list
  (cond
    ((null rule-list) nil)      ;1. No rules worked, return nothing
    ((tryrule (car rule-list))  ;2. If (tryrule) => T, return t
      t)
    (t                          ;3. Otherwise always make sure *s is
                                ;   good
      (setq *s good*s)
      (tryrules
        (cdr rule-list) *s)     ;Try rest of rules
    )
  )
)

(de TRYRULE (rule)              ;Try a single node rewrite rule
  ;uncomment next line (and add a matching closing parenthesis) if TLC-Lisp
  ;(stackdemo "TRYRULE" rule
  (cond
    ((null rule) t)             ;1. All terms have been rewritten
    ((rewrite (car rule))       ;2. If nil return nil, rule fails
                                ;   recur on rest of rule
      (tryrule (cdr rule))
    )
  )
)

Figure 3: The Simparse lexicon, grammar, and parser functions.

; ***TLC-Lisp SIMPARSE UTILITIES***

(SETQ BLNKSTR (NEWSTRING 78 \ ))    ;Make a string of blanks

(de INIT ()
  (cls)
  (print3 "Press any key for each successive parse step:")
)

(df STACKDEMO (pl
               &aux (message (first pl))
                    (arg (eval (second pl)))
                    (body (cddr pl))
              )
  (and (equal message "TRYRULE")
       (null arg)
       (setq arg t))
  (cond
    (ptracef
      (prin3 message)
      (prin3 " ")
      (prin3 arg)
      (tab 40)
      (prin3 *s)
      (terpri)
      (console-in)
    )
  )
  (prog1 (eval (cons 'progn body))
         (and ptracef (unstack)))
)

(de UNSTACK ()
  (v-cursor 0 (sub1 (second (v-cursor))))
  (print3 blnkstr)
  (v-cursor 0 (sub1 (second (v-cursor))))
)

(:de :CLS ()
  (v-scroll-up 0 0 79 24 0 7)
  (v-cursor 0 0)
)

Figure 4: Simparse TLC-Lisp stack-display utilities.

Simparse code of Figure 3 into the editor, typing ctl-k ... j will evaluate the
entirety, defining the necessary functions and variables, and "loading" them into
the Lisp workspace. You are then ready to return to the interpreter. Do this by
typing esc. You can then return to the editor at any time by typing esc ... enter.
The listings in Figures 3 and 4 can also be made on any text editor, saved
in the file "simparse.lsp", and then entered into TLC Lisp as follows:

>>> (load "simparse.lsp")

The typo-free "simparse.lsp" code can be downloaded from CompuServe:
GO FLEFO (Foreign Language Education Forum) and DL6.
You will also find there "toplev.tlc." Loading this file will enable you to
use the up- and down-arrow keys to inspect the actual Lisp recursion stack when
debugging TLC-Lisp.

Surviving Lisp. (Exit) is the commonest way to leave Lisp. If, for some
reason, this doesn't work, the user can generate a "break" with ctl-g. If ctl-g
doesn't work, ctl-k will generate a "near-fatal break." Either should return you to
the interpreter, of whom you can then beg your leave.

Running Simparse. The TLC-Lisp utility functions in Figure 4 display
how the Lisp recursion stack changes during the parse of a sentence. They are
not essential to the parser, but they are instructive. Unfortunately, since they
must directly control the screen cursor, they will need system-specific
modification to work with anything other than TLC-Lisp running on an IBM PC
compatible.
Sentences are parsed by entering (is-s '(put your words here)). For example,

XV.

>>> (is-s '(the engineer made a bomb))

(IS-S (*s) ... ). In XV [cf. (de IS-S ... ) in Fig. 3] *s is bound to the input string
(the engineer made a bomb). In the course of a successful parse, *s will be re-bound
to (cdr *s) every time (car *s) [the "current" word] satisfies a rule in the grammar.

(and arg1 arg2 ... argn). And is a Lisp primitive function which
successively evaluates arg1, arg2, ..., argn. If arg1 and arg2 and argn all return
something [not nil], then the last something [the value of argn] is returned as the
value of and. But if any argument returns nothing [nil], the following arguments
are not evaluated, and the last value [nil] is returned as the value of the entire
function call. Thus, in parsing '(the engineer made a bomb) and evaluating the (and
... ) in the function body of IS-S,
1. (rewrite 's) will apply the grammar rules. We will want rewrite to return
something [t] in this case because the engineer made a bomb is grammatical.
2. (null *s) will return something because nothing will be left in *s [*s will
be nil, all the words having been "consumed" in the course of (rewrite 's)].
3. (and (rewrite 's) (null *s)) will return t because (null *s) will return t.
4. (is-s '(the engineer made a bomb)) will return t because (and (rewrite 's) (null
*s)) will return t.
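
The short-circuit behavior of and described above can be observed directly. In this Common Lisp sketch, demo-and and probe are hypothetical names: probe records that an argument was evaluated and then returns the value it was given.

```lisp
;; Sketch: which arguments of and are actually evaluated?
(defun demo-and ()
  "Return the value of a three-argument and, plus the arguments it evaluated."
  (let ((evaluated '()))
    (flet ((probe (name value)
             (push name evaluated)      ; remember that NAME was evaluated
             value))
      (let ((result (and (probe 'arg1 t)
                         (probe 'arg2 nil)    ; returns nothing, so and stops here
                         (probe 'arg3 t))))   ; never evaluated
        (list result (reverse evaluated))))))

(demo-and)      ; => (NIL (ARG1 ARG2))
(and 1 2 3)     ; => 3, the value of the last argument
```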

(REWRITE (node) ... ). From a programmer's point-of-view then, rewrite
must be defined in such a way that if all the rewrite rules of a node are
successfully applied, rewrite will return something. On the other hand, if a given
sub-node can't be rewritten, rewrite should return nil. A first draft of rewrite
might look like the following:

XVI.
(de REWRITE (node)              ;e.g., vp
  (setq node (eval node))       ;vp ==> ((v np) (v))
  (and
    (rewrite 'v)
    (rewrite 'np)
  )
)

There are several reasons why this draft wouldn't work. First of all we
notice that it only takes care of the rule vp --> (v np). It makes no provision for
getting to the second vp rule, vp --> (v). Secondly, we notice that the rules v -->
(made chased) and np --> ((det n)) are different. One expands to terminal symbols
[made, chased]. These are words in the lexicon and we won't be able to rewrite
them! These problems must be faced sooner or later. In the version of (rewrite ... )
given in Figure 3, we choose to confront them immediately. To handle the above
problems, REWRITE in Figure 3 makes use of the Lisp primitive conditional
function.

(cond ...). Cond is an important Lisp primitive. Its plan-of-evaluation
might be described as:

XVII.
(cond
  (                                         ;cond clause #1
    (if this is t)                          ;cond test #1
    (then do this)
    (and this)
    ...
    (whole cond returns this last value)
  )
                                            ;if cond test #1 is nil, then
  (                                         ;cond clause #2
    (if this is t)                          ;cond test #2
    (then do this)
    ...
    (whole cond returns this last value)
  )
                                            ;if cond test #2 is nil, then segue.
)

The cond in REWRITE first uses the Lisp primitive listp to check if the first
element of the node is a list [e.g., ((det n)), ((v np) (v))]. [p as in listp is often
appended to the names of Lisp test functions which literally return t or nil.] If
(car node) is a list, then the node rewrites to non-terminal symbols, i.e., phrase
structure rules. In this case, cond [and rewrite] will return the result of (tryrules
node *s). If (listp (car node)) returns nil the next conditional clause is tried.
If (atom (car node)) [atomp in some Lisp dialects] returns something [is t],
REWRITE reasons that the node is a list of words [which are atoms]. The cond
[and rewrite] will then return t or nil from (tryword node).
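
The two-way test that REWRITE performs can be isolated in a few lines. The following Common Lisp sketch uses a hypothetical function, node-kind, which only reports which cond clause would fire and does no parsing.

```lisp
;; Sketch: the same two cond tests as REWRITE, returning a label
;; instead of calling tryrules or tryword.
(defun node-kind (node)
  "Classify the value of a node as a list of rules or a list of words."
  (cond ((listp (car node)) 'rules)     ; e.g. ((v np) (v))
        ((atom (car node))  'words)))   ; e.g. (made chased)

(node-kind '((v np) (v)))     ; => RULES
(node-kind '(made chased))    ; => WORDS
```

Because cond tries its clauses in order and stops at the first test that succeeds, the second test is reached only when the first element is not a list.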

(TRYWORD (wordlist) ... ). TRYWORD is a simple one-clause conditional. If
the (car *s) is a member of wordlist [member is a Lisp primitive in most dialects],
something is returned. The *s is reset to the (cdr [of] *s). At this point the word
has been found in the lexicon, so we want the whole cond clause to return
something. However, after the last word, (cdr *s) would return nil, and (setq *s
(cdr *s)) would also return nil even though the word had been successfully
looked up! So we must add a final t to the cond clause to be returned in all cases,
just to cover this last-word case.

(de TRYRULES (RULE-LIST GOOD*S) ... ). TRYRULES is a classic
example of a recursive Lisp function. Such functions take the form of a
conditional function, and almost invariably begin with a cond clause of the form
((null arg) ... ). This is a "terminal condition." It will terminate recursion and
initiate "unstacking." In this case, when (tryrules nil ... ) is called, rule-list is bound
to nil. This means we have run out of rules; no rewrite rule for the current node
has worked; we must return nil.
If there are rules in the rule-list the second cond clause tries (car rule-list). If
the cond-clause "test" (tryrule (car rule-list)) succeeds, we return t.
If (tryrule (car rule-list)) fails, it is possible that *s became misbound in the
course of its failure. *s must therefore first be rebound to the "good" *s which was
stacked as a passed-in variable when (tryrules ... ) was first called. Then tryrules
can call itself recursively, trying (cdr rule-list).
If every rule has been tried and (cdr rule-list) is nil, then the first cond-
clause, ((null rule-list) nil) will cause the recursive (tryrules (cdr rule-list)) to fail
and unstack, as discussed above. With this, the calling last cond-clause of tryrules
[a stack level down] will also fail. (cond ... ) will return nil, and eventually the first
pre-recursive call on tryrules will unstack, returning nil.

(TRYRULE (rule) ... ). This is another classic recursive function. On the
first call by tryrules, rule will be bound to something like (det n). Assuming a
successful parse, on subsequent recursive calls rule will be set to (n) and then ()
[equivalent to nil]. Therefore in this case, if the test of the first cond-clause, (null
rule), succeeds, it means all of the sub-nodes have succeeded, and t should be
returned.
If all rules have not yet been applied, they are applied recursively by the
second cond-clause. (Rewrite (car rule)) returns t or nil as a test. If the test succeeds
(tryrule (cdr rule) ... ) then returns t or nil according to the rest of the rule, and this
value is returned until the first, pre-recursive call on (tryrule...) is unstacked.
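
The five functions discussed above can be assembled into a version that runs in Common Lisp. The following is a sketch under stated assumptions, not the Figure 3 listing itself: the rules are kept in one table, *rules*, and looked up with assoc rather than bound to each node symbol and retrieved with eval; the trace utilities are omitted; *s is written *s*; and the word lists are hypothetical, chosen only to cover the example sentences.

```lisp
;; Common Lisp sketch of Simparse. The table layout and the word lists
;; are assumptions made for this example.
(defparameter *rules*
  '((s      . ((np vp)))                 ; S  --> NP + VP
    (np     . ((det n) (prop-n)))        ; NP --> DET + N, NP --> PROP-N
    (vp     . ((v np) (v)))              ; VP --> V + NP,  VP --> V
    (det    . (the a an))                ; hypothetical word lists
    (n      . (engineer bomb dog cat))
    (prop-n . (fido))
    (v      . (made chased ate))))

(defvar *s* '() "The words not yet consumed by the parse.")

(defun tryword (wordlist)
  "Consume the current word if it is in WORDLIST."
  (cond ((member (car *s*) wordlist)
         (setq *s* (cdr *s*))
         t)))                            ; return t even after the last word

(defun rewrite (node)
  "Rewrite NODE by phrase structure rules or by lexical lookup."
  (let ((expansion (cdr (assoc node *rules*))))
    (cond ((listp (car expansion)) (tryrules expansion *s*))
          ((atom (car expansion)) (tryword expansion)))))

(defun tryrule (rule)
  "Succeed if every term of RULE can be rewritten in turn."
  (cond ((null rule) t)
        ((rewrite (car rule)) (tryrule (cdr rule)))))

(defun tryrules (rule-list good-s)
  "Try each rule in RULE-LIST, restoring *s* to GOOD-S after a failure."
  (cond ((null rule-list) nil)
        ((tryrule (car rule-list)) t)
        (t (setq *s* good-s)
           (tryrules (cdr rule-list) *s*))))

(defun is-s (words)
  "Return t if WORDS can be parsed as a sentence with nothing left over."
  (setq *s* words)
  (and (rewrite 's) (null *s*)))

(is-s '(the engineer made a bomb))   ; => T
(is-s '(the dog ate))                ; => T, after vp --> (v np) fails
(is-s '(engineer the made))          ; => NIL
```

Keeping the rules in a table avoids declaring s, np, vp and the rest as global variables, at the cost of departing from the symbol-value design the article describes.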

Limitations. Simparse can be extended to parse most right-branching
embeddings. For example, English relative clauses could be parsed by
incorporating rules like (det n s) and (det n relpro vp). Simparse cannot, however,
elegantly handle left-branching recursions. Chinese relative clauses, for example,
entail rules like np --> (s np). In conjunction with the rule s --> (np vp), such left-
branching rules would cause Simparse to go into infinite recursion. (Rewrite 's)
would first try to (rewrite 'np) which would first try to (rewrite 's), and so on
forever.
Simparse is also inefficient. To parse the sentence "the dog ate," it must
first try and then reject the rule vp --> (v np). If we reordered the vp rules with
(setq vp '((v) (v np))) "the dog ate the cat" would not parse because vp --> (v)
would succeed causing (tryrules '((v) (v np))) to unstack itself before ever
recurring with (tryrules '((v np)))!
Simparse also lacks mechanisms for the application of lexical and context-
sensitive rules. There is, for example, no mechanism in the parser for blocking "a
dogs" from being accepted by the rule np --> (det n), even if "dogs" were
somewhere marked "plural" in the lexicon. But the Simparse lexicon is obviously
its weakest subsystem. There is, for example, no mechanism for marking "dogs"
as the plural of "dog."

Conclusion
Because of Lisp's Protean flexibility, almost all modern parsers are being
developed in Lisp. These range from the early augmented transition network
(ATN) parsers of Woods (1972) to more recent lexical functional grammar (LFG)
parsers (Kaplan & Bresnan, 1982). Even Generalized Phrase Structure Grammar
(GPSG) parsers, commonly associated with Prolog, are today being mostly
developed in Lisp (Gazdar, 1984).
Despite the variety of these parsers, Simparse-like mechanisms underlie
them all, and they all can be regarded as attempts to overcome Simparse's
limitations. It is by no means certain what form the instructional parsers of the
future will finally take, but an understanding of Lisp and Simparse should help
language teachers understand the future when it arrives.

References

Bates, M. 1978. The theory and practice of augmented transition network
grammars. In L. Bolc, Natural Language Communication with Computers. Lecture
Notes on Computer Science, No. 63. Berlin: Springer Verlag.
Gazdar, G. 1984. Recent computer implementations of phrase structure
grammars. Computational Linguistics 10, (July - December): 212-214.
Kaplan, R. M. and Bresnan, J. 1982. Lexical-functional grammar: A formal system
for grammatical representation. In J. Bresnan (ed.), The Mental Representation
of Grammatical Relations. Cambridge, MA: MIT Press.
Loritz, D. J. 1984. Natural language processing on microcomputers. Paper
presented at the 18th Annual TESOL Convention, March 6-11, Houston, TX.
Loritz, D., Chen, S., Chou, F., and Yen, W. 1987. "Xue-Jiu: a prototype Chinese
instructional parser." Georgetown University. In preparation.
Sanders, A. and Sanders, R. 1987. The essay processor, a syntactic parser for
German: A research report. Paper presented at the Fourth Annual CALICO
convention.
TLC-Lisp. 1986. The Lisp Company, POB 487, Redwood Estates, CA 98044.
$99.50.

Winograd, T. 1983. Language as a cognitive process, vol. 1: Syntax. Reading, MA:
Addison-Wesley.
Winston, P. H. and Horn, B. K. P. 1981. Lisp. Reading, MA: Addison-Wesley.
Woods, W. A., Kaplan, R. M., and Nash-Webber, B. 1972. The lunar sciences
natural language information system: final report. BBN Report No. 2378.
Cambridge, MA: Bolt Beranek and Newman.

Author's Biodata
Donald Loritz is Assistant Professor of Computational Linguistics at
Georgetown University. He holds a B.A. from Harvard University, an Ed.D. in
Applied Psycholinguistics from Boston University, and has studied acoustic
phonetics at MIT. His current research includes augmented transition network
instructional parsers (with alpha-test systems in English and Chinese), digital
analysis of learner pronunciation, and parallel models of language learning and
language processing.

Author's Address
Donald Loritz
Department of Linguistics
Georgetown University
Washington, DC 20057
