Donald Loritz
Georgetown University
Introduction
Consider the average secondary or university foreign language student
who spends 5 hours per week in a foreign language class of 15. His share of the
instructor's time is only 20 minutes per week. Since such limited instruction is
available, grammar rules are often taught to students so that students can self-
instruct by applying the rules against their own output. One problem with this is
that too many students spend too much time learning and applying grammar
rules and too little time learning and using language. Another problem is that the
grammar rules themselves are often too simplistic or mechanical to be of real use.
Nevertheless, some students have learned by this process, so the checking
of output against grammar rules must be at least potentially effective. There is
today no reason why students should have to learn and apply mechanical
grammar rules. Computers can do that with instructional parsers.
In this context, parsers are computer programs used to either (a) verify an
experimental grammar against accepted language output or (b) verify
experimental language output against an accepted grammar. The former are
theoretical parsers such as are written by theoretical computational linguists. The
latter are instructional parsers. These are typically modeled on some theoretical
I.
>>> ;the TLC Lisp interpreter's prompt.
>>> (add 1 2) ;user command
3 ;interpreter response
Lisp Morphology. In Lisp jargon, syllables are atoms. There are three
main types of atoms: strings, symbols, and numbers. Strings [e.g., "m" "m3"
"qwerty" "qwerty ASDFGH ZxcvbN"] are non-morphemic atoms; they have no
value, no "meaning." Symbols [e.g., m m3 qwerty dozen] are delimited by spaces
or parentheses. Symbols [sometimes called variables] have two main parts: a
label [or identifier or print-name] and a value [i.e., "meaning"]. These correspond
to the Saussurean notions of "signifier" and "signified," respectively. Only two
symbols are pre-defined in Lisp: t and nil [i.e., "something" and "nothing"]. All
other labels can be arbitrarily bound to any value. Symbols will be discussed
further under "Lisp semantics," below. Atoms of the third type, numbers, have
values [like symbols], but these are not arbitrary [e.g., 3 is always 3].
Atoms may be collected in lists. Lists are bounded by parentheses, e.g.,
(tom dick harry). Symbols frequently have values which are lists.
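The three atom types and the list can be sketched outside Lisp as well. The following is a minimal Python sketch, not TLC-Lisp; the names Symbol, tokenize, read_atom, and read_expr are our own assumptions, and the rule that a token is a number when it parses as an integer is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A Lisp symbol, represented here only by its label (print-name)."""
    name: str


def tokenize(text):
    """Split text into parentheses, double-quoted strings, and bare tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            end = text.index('"', i + 1)        # closing quote of the string
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in "()":
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def read_atom(token):
    """Classify one token as a string, a number, or a symbol."""
    if token.startswith('"'):
        return token[1:-1]                      # string: the text between the quotes
    try:
        return int(token)                       # number
    except ValueError:
        return Symbol(token)                    # anything else is a symbol


def read_expr(tokens):
    """Consume one atom or one parenthesized list from the front of tokens."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read_expr(tokens))
        tokens.pop(0)                           # discard the closing parenthesis
        return items
    return read_atom(token)
```

Under these assumptions, read_expr(tokenize("(tom dick harry)")) yields a Python list of three Symbol objects, and nested parentheses yield nested lists.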
II.
>>> (add 1 (sub 3 1))
3
>>>
The recursion stack. A recursion stack is quite like the stack of papers that
accumulates on one's desk. Normally one works on the papers on the top of the
stack. Work on the underlying layers is suspended until the stack is unstacked
["unwound" in Lisp jargon]. Similarly, when the interpreter encounters ["reads"]
a left parenthesis, he opens a stack frame on the top of the recursion stack. When
he reads a right parenthesis, he evaluates the [completed] stack frame and
"unstacks" it, returning the result to the next lower stack frame. Figure 1
illustrates this process for (add 1 (sub 3 1)).
In Figure 1, we also see part of the interpreter represented on the stack. As
discussed above, the Lisp interpreter consists of a reader, an evaluator, and a
printer. The Lisp interpreter is often called "toplev," and it can be defined as:
III.
(de toplev ()
  (forever
    (print (eval (read)))
  )
)
Figure 1: Recursive evaluation uses a stack. When a left parenthesis is read (a), a
new stack frame is opened. When a right parenthesis is read (b), the function call
in the top stack frame is evaluated and the result is returned to the lower
(calling) stack frame (b, c). When the interpreter (toplev) itself is reached (d), the
process is ready to repeat.
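The process in Figure 1 can be imitated with an explicit stack. This is a minimal Python sketch, not the TLC-Lisp interpreter; the function name evaluate, the trace list, and the restriction to add and sub on integers are assumptions made for illustration.

```python
def evaluate(text):
    """Evaluate a fully parenthesized add/sub expression with an explicit stack.

    Returns the result and a trace of the stack depth after each parenthesis.
    """
    functions = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]                          # the bottom frame stands in for toplev
    trace = []
    for token in tokens:
        if token == "(":
            stack.append([])              # left parenthesis: open a new frame
            trace.append(len(stack))
        elif token == ")":
            name, *args = stack.pop()     # right parenthesis: evaluate the top frame
            result = functions[name](*args)
            stack[-1].append(result)      # return the result to the calling frame
            trace.append(len(stack))
        elif token in functions:
            stack[-1].append(token)       # function name, kept until the frame closes
        else:
            stack[-1].append(int(token))  # number
    return stack[0][-1], trace


result, depths = evaluate("(add 1 (sub 3 1))")
print(result, depths)
```

The depth trace rises by one at each left parenthesis and falls by one at each right parenthesis, ending at the single frame that represents the interpreter itself.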
IV.
>>> (add 1 (sub junk 1))
>>>
prints its error message with the stack as in Figure 2a. By defining the error
function to include an error-handling interpreter, as outlined in Figure 2b, most
Lisp systems allow the programmer to inspect the stack "and see how he got into
this mess."
Figure 2: When an error occurs (a), the history of the computation leading up to
the error is preserved on the recursion stack. Adding an error-handling
interpreter to the error function (b) allows the user (or program!) to inspect that
history and take appropriate action.
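The idea behind Figure 2 can be sketched as follows. This is a Python analog under our own assumptions, not the error function of any actual Lisp system: expressions are nested Python lists, symbols are Python strings, and the names LispError, evaluate, and evaluate_with_handler are hypothetical.

```python
class LispError(Exception):
    """An error that carries the pending calls at the moment of failure."""

    def __init__(self, message, stack):
        super().__init__(message)
        self.stack = stack


FUNCTIONS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def evaluate(expr, env, stack=()):
    """Evaluate expr, recording each open call so an error can report it."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):                     # a symbol: look up its value
        if expr not in env:
            raise LispError("unbound variable: " + expr, list(stack))
        return env[expr]
    stack = stack + (expr,)                       # open a frame for this call
    name, *args = expr
    values = [evaluate(arg, env, stack) for arg in args]
    return FUNCTIONS[name](*values)


def evaluate_with_handler(expr, env, handler):
    """Run expr; on error, hand the preserved history to the handler."""
    try:
        return evaluate(expr, env)
    except LispError as error:
        return handler(error)


expr = ["add", 1, ["sub", "junk", 1]]
history = evaluate_with_handler(expr, {}, lambda error: error.stack)
print(history)
```

Here the handler simply returns the preserved stack; a handler could equally well bind the missing symbol and retry, which is the sense in which a program can "take appropriate action."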
V.
>>> n1
***error*** unbound variable: n1
>>> (set 'n1 9)
9
>>> n1
9
>>> (set 'n2 10)
10
>>> n2
10
The single quote marks in V have a subtle but important effect. It can be
most readily seen by comparing the Lisp sentences in VI and VII:
VI.
(string n1 n2) ==> "910"
VII.
(string 'n1 'n2) ==> "n1n2"
Henceforth we will take the tasks of the reader and printer largely for
granted and focus on the work of the evaluator, so the symbol "==>" can be read
as "evaluates to" or "returns." In VI, n1 and n2 are evaluated: their values are
concatenated into a string. VII returns the concatenated string "n1n2." The quote
blocks evaluation of n1 and n2. Just as quotation marks in printed text might
mean "don't translate this," so 'arg in Lisp means don't evaluate arg.
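The difference between a quoted and an unquoted argument can be sketched in a few lines. This is a Python analog under stated assumptions, not TLC-Lisp: expressions are nested Python lists, a Python string stands for a symbol, and 'arg is written as ["quote", arg]. The evaluator supports only the three operations needed here.

```python
def evaluate(expr, env):
    """Evaluate a tiny Lisp-like expression given as nested Python lists."""
    if isinstance(expr, int):
        return expr                        # numbers evaluate to themselves
    if isinstance(expr, str):
        return env[expr]                   # a bare symbol yields its value
    head, *args = expr
    if head == "quote":
        return args[0]                     # 'arg: return arg without evaluating it
    values = [evaluate(arg, env) for arg in args]
    if head == "set":
        env[values[0]] = values[1]         # bind the symbol named by the first argument
        return values[1]
    if head == "string":
        return "".join(str(value) for value in values)
    raise ValueError("unknown function: " + str(head))


env = {}
evaluate(["set", ["quote", "n1"], 9], env)
evaluate(["set", ["quote", "n2"], 10], env)
print(evaluate(["string", "n1", "n2"], env))
print(evaluate(["string", ["quote", "n1"], ["quote", "n2"]], env))
```

Without the quote, set would try to evaluate n1 before it has any value, which is why the quote is needed in the first place.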
Out of respect for tradition, most Lisp programmers still code VI and VII as in
VIII:
VIII.
(setq n1 9)
(setq n2 10)
IX.
>>> (setq dog 'chien)
chien
>>> (setq chien 'hund)
hund
>>> 'dog
dog
>>> dog
chien
>>> (eval 'dog)
chien
>>> (eval dog)
hund
The Lisp primitive function eval can be thought of as the opposite of quote.
Thus (eval 'dog) in IX would be equivalent to (eval (dont-eval dog)).
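The four cases in IX differ only in how many times a label is exchanged for its value. A minimal Python sketch, assuming symbols are Python strings, quote is written ["quote", arg], and eval is written ["eval", arg]; the function name evaluate is our own:

```python
def evaluate(expr, env):
    """Evaluate a symbol, a quoted form, or an eval form."""
    if isinstance(expr, str):
        return env[expr]                             # one level of lookup
    head, arg = expr
    if head == "quote":
        return arg                                   # no lookup at all
    if head == "eval":
        return evaluate(evaluate(arg, env), env)     # evaluate the argument, then its result
    raise ValueError("unknown form: " + str(head))


env = {"dog": "chien", "chien": "hund"}
print(evaluate(["quote", "dog"], env))               # 'dog
print(evaluate("dog", env))                          # dog
print(evaluate(["eval", ["quote", "dog"]], env))     # (eval 'dog)
print(evaluate(["eval", "dog"], env))                # (eval dog)
```

The third case shows quote and eval cancelling each other, so that exactly one lookup takes place; the fourth performs two.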
X.
>>> (putprop 'chien 'lang 'french)
french
>>> (putprop 'hund 'lang 'german)
german
>>> (getprop 'chien 'lang)
french
>>> (getprop 'hund 'lang)
german
XI.
>>> (setq dozen 12)
12
>>> (putprop 'dozen 'lang 'english)
english
>>> (putprop 'dozen 'pos 'n)
n
>>> (putprop 'dozen 'plural '-s)
-s
>>> (plist 'dozen)
(lang english pos n plural -s pname "dozen")
>>> (pname 'dozen)
"dozen"
>>> dozen ; same as (eval 'dozen)
12
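Examples X and XI show a symbol carrying both a value and a list of named properties. One way to model this in Python (a sketch, not TLC-Lisp's actual storage scheme) is a pair of dictionaries. The class name SymbolTable is an assumption, as is returning None where Lisp would presumably return nil for a missing property.

```python
class SymbolTable:
    """Symbols with a value and a property list, modeled as two dictionaries."""

    def __init__(self):
        self.values = {}       # label -> value
        self.plists = {}       # label -> {property: value}

    def setq(self, name, value):
        self.values[name] = value
        return value

    def putprop(self, name, prop, value):
        self.plists.setdefault(name, {})[prop] = value
        return value           # like the transcripts above, answer with the value stored

    def getprop(self, name, prop):
        return self.plists.get(name, {}).get(prop)

    def plist(self, name):
        """Flatten the properties into (prop value prop value ...) order."""
        flat = []
        for prop, value in self.plists.get(name, {}).items():
            flat += [prop, value]
        return flat + ["pname", name]   # the print-name is listed last


table = SymbolTable()
table.setq("dozen", 12)
table.putprop("dozen", "lang", "english")
table.putprop("dozen", "pos", "n")
table.putprop("dozen", "plural", "-s")
print(table.plist("dozen"))
```

The value of the symbol and its properties are kept apart, so changing one leaves the other untouched.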
XII.
>>> '(add 1 (add 2 3))
(add 1 (add 2 3))
>>> (add 1 (add 2 3))
6
XIII.
>>> (setq s ;(a)
'((np (det an) (n engineer)) (vp (v makes)
(np (n bombs))))
)
((np (det an) (n engineer)) (vp (v makes)
(np (n bombs))))
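Because the value of s in XIII is just a nested list, a short recursive function can walk it. The following Python sketch mirrors the structure with Python lists and recovers the words in order; the function name words is our own assumption.

```python
# The value of s from XIII, written as nested Python lists.
s = [["np", ["det", "an"], ["n", "engineer"]],
     ["vp", ["v", "makes"], ["np", ["n", "bombs"]]]]


def words(node):
    """Collect the words under one labeled constituent, left to right."""
    label, *children = node
    result = []
    for child in children:
        if isinstance(child, list):
            result += words(child)     # a sub-constituent: descend into it
        else:
            result.append(child)       # a word
    return result


sentence = [word for constituent in s for word in words(constituent)]
print(sentence)
```

Each list begins with its category label, so the first element is skipped and only the remaining elements are treated as children.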
XIV.
>>> (de ;(a) "define"
premier ; function-name
(liste) ; parameter list
There is some dialectal variation in the syntax of Lisp verb definitions. The
MACLISP family of dialects includes TLC-Lisp, but members of this family are
likely to use defun where TLC-Lisp uses de. Lisps of the Interlisp family use
variants of the structure (define (function-name (lambda (parameter list) function-
body))). [(Lambda ... ) is an "internal" Lisp function].
The parameter list of XIV(a) consists of one variable which is "passed" a
value when the function is called. I.e., in processing XIV(d) the argument l is
evaluated. The value of l ['(a b c)] is then bound to liste.
Note that liste will continue to be bound to '(a b c) until the process
effectively reaches the final closing parenthesis of (premier ... ). Any computation
that is "stacked on top of" (premier '(a b c)) [i.e., any function called from within
(premier '(a b c))] will find liste bound to '(a b c).
(Premier ...) then successively evaluates the sentences in the function body,
returning as the value of premier the value of the last sentence evaluated in the
body. Premier is simple: the body is only one sentence long. Premier takes the car of
liste and returns this as its value. In the next section we will look at longer and
more complex functions.
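The binding behavior just described, in which liste stays bound for the duration of the call and is visible to any function called from within it, can be imitated with an explicit stack of bindings. This is a Python sketch under our own assumptions (Python itself scopes variables differently); the names bindings, lookup, call, and car_of_liste are hypothetical.

```python
bindings = []          # the stack of parameter bindings, innermost frame last


def lookup(name):
    """Find the most recent binding of name, searching from the top of the stack."""
    for frame in reversed(bindings):
        if name in frame:
            return frame[name]
    raise NameError("unbound variable: " + name)


def call(parameters, body, arguments):
    """Bind parameters to arguments, evaluate each sentence of the body in
    order, and return the value of the last one."""
    bindings.append(dict(zip(parameters, arguments)))
    try:
        value = None
        for sentence in body:
            value = sentence()
        return value
    finally:
        bindings.pop()                 # the binding ends when the call ends


def car_of_liste():
    # Called from inside premier, this still finds liste bound.
    return lookup("liste")[0]


def premier(l):
    return call(["liste"], [car_of_liste], [l])


print(premier(["a", "b", "c"]))
```

Once premier has returned, lookup("liste") raises an error again, just as n1 was unbound before it was set.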
Loading Simparse. Procedures for loading and editing Lisp systems vary
from Lisp dialect to dialect, from machine to machine, and from operating
system to operating system.
TLC-Lisp has a powerful built-in editor (itself written in Lisp) which
emulates WordStar. To use it, enter
The screen will change colors to let you know you are now talking to the Lisp
editor and not the Lisp interpreter. When you have finished typing the
; ***THE LEXICON***
; ***THE PARSER***
(de UNSTACK ()
  (v-cursor 0 (sub1 (second (v-cursor))))
  (print3 blnkstr)
  (v-cursor 0 (sub1 (second (v-cursor))))
)
(de CLS ()
  (v-scroll-up 0 0 79 24 0 7)
  (v-cursor 0 0)
)
Surviving Lisp. (Exit) is the commonest way to leave Lisp. If, for some
reason, this doesn't work, the user can generate a "break" with ctl-g. If ctl-g
doesn't work, ctl-k will generate a "near-fatal break." Either should return you to
the interpreter, of whom you can then beg your leave.
XV.
(IS-S (*s) ... ). In XV [cf. (de IS-S ... ) in Fig. 3] *s is bound to the input string
(the engineer made a bomb). In the course of a successful parse, *s will be re-bound
to (cdr *s) every time (car *s) [the "current" word] satisfies a rule in the grammar.
(and arg1 arg2 ... argn). And is a Lisp primitive function which
successively evaluates arg1, arg2, ..., argn. If arg1 and arg2 and argn all return
XVI.
(de REWRITE (node)          ;e.g., vp
  (setq node (eval node))   ;vp ==> ((v np) (v))
  (and
    (rewrite 'v)
    (rewrite 'np)
  )
)
There are several reasons why this draft wouldn't work. First of all we
notice that it only takes care of the rule vp --> (v np). It makes no provision for
getting to the second vp rule, vp --> (v). Secondly, we notice that the rules v -->
(made chased) and np --> ((det n)) are different. One expands to terminal symbols
[made, chased]. These are words in the lexicon and we won't be able to rewrite
them! These problems must be faced sooner or later. In the version of (rewrite ... )
given in Figure 3, we choose to confront them immediately. To handle the above
problems, REWRITE in Figure 3 makes use of the Lisp primitive conditional
function, cond.
XVII.
(cond
( ;cond clause #1
(if this is t) ;cond test #1
(then do this)
(and this)
...
(whole cond returns this last value)
)
;if cond test #1 is nil, then
( ;cond clause #2
(if this is t) ;cond test #2
(then do this)
...
(whole cond returns this last value)
)
;if cond test #2 is nil, then segue.
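The control flow of XVII can be sketched as an ordinary function. This is a Python analog, not the Lisp primitive: each clause is a tuple of zero-argument functions, the first being the test. That a clause with no body returns the value of its test, and that a cond with no successful test returns nothing, are assumptions based on common Lisp practice rather than on anything stated here about TLC-Lisp.

```python
def cond(*clauses):
    """Run the body of the first clause whose test succeeds.

    Each clause is (test, step, step, ...), all zero-argument callables.
    """
    for test, *body in clauses:
        test_value = test()
        if test_value:
            result = test_value        # assumed: a bare test returns its own value
            for step in body:
                result = step()        # the whole cond returns the last value
            return result
    return None                        # assumed: no test succeeded, so "nothing"


log = []
value = cond(
    (lambda: False, lambda: log.append("clause 1")),
    (lambda: True, lambda: log.append("clause 2"), lambda: "last value"),
    (lambda: True, lambda: log.append("clause 3")),
)
print(value, log)
```

Only the second clause runs: the first test fails, and once a test has succeeded the remaining clauses are never tried.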
The cond in REWRITE first uses the Lisp primitive listp to check if the first
element of the node is a list [e.g., ((det n)), ((v np) (v))]. [p as in listp is often
appended to the names of Lisp test functions which literally return t or nil.] If
(car node) is a list, then the node rewrites to non-terminal symbols, i.e., phrase
structure rules. In this case, cond [and rewrite] will return the result of (tryrules
node *s). If (listp (car node)) returns nil the next conditional clause is tried.
If (atom (car node)) [atomp in some Lisp dialects] returns something [is t],
REWRITE reasons that the node is a list of words [which are atoms]. The cond
[and rewrite] will then return t or nil from (tryword (wordlist)).
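The two-way decision made by REWRITE can be sketched as follows. This is a Python analog and not the Simparse code of Figure 3. The rules for vp, np, and v come from the discussion above, while the s rule and the det and n word lists are assumptions added so that the example sentence can be parsed. Unlike Simparse, which rebinds *s, this sketch passes the remaining words along as a return value.

```python
GRAMMAR = {
    "s":   [["np", "vp"]],            # assumed rule, not given in the text
    "np":  [["det", "n"]],
    "vp":  [["v", "np"], ["v"]],
    "det": ["the", "a"],              # assumed lexicon entries
    "n":   ["engineer", "bomb"],      # assumed lexicon entries
    "v":   ["made", "chased"],
}


def rewrite(node, words):
    """Match category node at the front of words.

    Returns the words left over on success, or None on failure.
    """
    expansion = GRAMMAR[node]
    if isinstance(expansion[0], list):     # like (listp (car node)): phrase structure rules
        return try_rules(expansion, words)
    return try_word(expansion, words)      # like (atom (car node)): a list of words


def try_rules(rules, words):
    """Try each rule in turn, restarting from the same words each time."""
    for rule in rules:
        remaining = words
        for category in rule:
            remaining = rewrite(category, remaining)
            if remaining is None:
                break                      # this rule failed: try the next one
        else:
            return remaining               # every category in the rule matched
    return None


def try_word(wordlist, words):
    """Succeed if the current word is in wordlist, consuming it."""
    if words and words[0] in wordlist:
        return words[1:]
    return None


def is_s(words):
    """A sentence is accepted when s matches and no words are left over."""
    return rewrite("s", words) == []


print(is_s(["the", "engineer", "made", "a", "bomb"]))
```

Because each rule starts again from the same words, a failure of vp --> (v np) leaves the input untouched for vp --> (v). The sketch accepts the first rule that succeeds and never reconsiders it, which is one of the simplifications a fuller parser would have to remove.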
Conclusion
Because of Lisp's Protean flexibility, almost all modern parsers are being
developed in Lisp. These range from the early augmented transition network
(ATN) parsers of Woods (1972) to more recent lexical functional grammar (LFG)
parsers (Kaplan & Bresnan, 1982). Even Generalized Phrase Structure Grammar
(GPSG) parsers, commonly associated with Prolog, are today being mostly
developed in Lisp (Gazdar, 1984).
Despite the variety of these parsers, Simparse-like mechanisms underlie
them all, and they all can be regarded as attempts to overcome Simparse's
limitations. It is by no means certain what form the instructional parsers of the
future will finally take, but an understanding of Lisp and Simparse should help
language teachers understand the future when it arrives.
References
Author's Biodata
Donald Loritz is Assistant Professor of Computational Linguistics at
Georgetown University. He holds a B.A. from Harvard University, an Ed.D. in
Applied Psycholinguistics from Boston University, and has studied acoustic
phonetics at MIT. His current research includes augmented transition network
instructional parsers (with alpha-test systems in English and Chinese), digital
analysis of learner pronunciation, and parallel models of language learning and
language processing.
Author's Address
Donald Loritz
Department of Linguistics
Georgetown University
Washington, DC 20057