Theory of Automata, Languages & Computation

Book · June 2010

Rajendra Kumar
Vidya College of Engineering


Appendix B
LR(k) and LL(k) Grammars

LR(k) grammars are a subclass of the context-free grammars. They play an important role in the study of programming languages and the design of compilers. The compiler of the programming language ALGOL was designed by implementing an LR(1) parser. The equivalence of deterministic pushdown automata to LR(1) grammars was first studied by Knuth in 1965. Knuth's work was later generalised by a sequence of papers dealing with subclasses of context-free grammars that have efficient parsing algorithms. In 1970, Graham showed that a number of other classes of grammars define exactly the context-free languages. SLR (simple LR), LALR (look-ahead LR), and canonical LR are subclasses of LR grammars.
The decidability of equivalence extends to the LL(k) grammars, which are a proper subset of the LR(k) grammars. In the design of programming languages and their compilers, it is essential to develop parsing techniques (i.e., techniques for obtaining the 'reverse derivation' of a given string in a context-free language). In other words, we need techniques to construct a derivation tree or parse tree for a given sentence 's' in a context-free language. For this purpose, these grammars are widely used.
You can study parsing in detail in compiler design. Here you are only required to know what parsing actually is. Parsing is the determination of the syntactical structure of a string, which is the first step in understanding the meaning of the sentence.

   

A grammar for which we can construct a parsing table is said to be an LR grammar. There are some context-free grammars which are not LR grammars, but these can generally be avoided for typical programming-language constructs. In general, for a grammar to be LR it is sufficient that a left-to-right shift-reduce parser be able to recognize handles when they appear on top of the stack, when the LR parser is implemented using a stack. (You will study handles and shift-reduce parsers later in this chapter.)
A source of information that an LR parser can use to make its shift-reduce decisions is the next 'k' input symbols. A grammar that can be parsed by an LR parser examining up to 'k' input symbols (k look-ahead symbols) on each move is called an LR(k) grammar.

There is a significant difference between LL and LR grammars. For a grammar to be LR(k), it must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side, with 'k' input symbols of look-ahead. The requirement for an LL(k) grammar is quite different: there we must be able to recognize the use of a production looking only at the first 'k' symbols of what its right side derives. Therefore, LR grammars can describe more languages than LL grammars.

Donald E. Knuth is the recipient of the 1974 Turing Award. In 1962, he began to prepare textbooks about programming techniques, and this work evolved into a projected seven-volume series entitled The Art of Computer Programming. Volumes 1-3 first appeared in 1968, 1969, and 1973. Approximately one million copies have already been printed, including translations into six languages.
He offered a reward of $2.56 to the first finder of any error in his books. He explained that $2.56, or 256 cents, is one hexadecimal dollar. In the beginning, Knuth used to pay this amount to the finders by check, but due to check fraud he stopped such rewards in October 2008. As a replacement, Knuth started his own "Bank of San Serriffe" to keep an account for everyone who found an error after 2006. He now sends a "Hexadecimal Certificate" instead of a check.

B.1 LR(k) GRAMMARS

In an LR(k) grammar, 'L' stands for left-to-right scanning of the input string, 'R' stands for producing a rightmost derivation, and k is the number of look-ahead symbols in the input string.
To find a derivation tree for a given sentence 's', we can start with 's' and replace a substring, say 's1', by a variable A if there is a production
A → s1.
We repeat the process until we get S (the start symbol). But this process is not so easy; at every stage there are many choices of the production to be used. If we make a wrong choice, we will not get S, and in this case we will have to backtrack and try some other alternative production. However, for certain subclasses of context-free grammars, it is possible to perform this process (i.e., getting the derivation in reverse order) for a given 's' in a deterministic way.
Let us consider some sentential form 'αβs' of a context-free grammar G, where α and β are in (VN ∪ Σ)* and s ∈ Σ*. Let us assume that we are interested in finding the production applied in the last step of the derivation of 'αβs'. If A → β is a production, it is likely that A → β is the production applied in the last step, but we cannot be sure that this is the case. If it is possible to state that A → β is the production applied in the last step by looking ahead k symbols (i.e., the k symbols to the right of β in 'αβs'), then grammar G is said to be an LR(k) grammar; the production A → β is called a handle production, and β is simply called a handle.
While dealing with LR(k) grammars, we use the notation ⇒ to represent rightmost derivation. For example, we write
α ⇒ β
if β is derived from α by using a rightmost derivation. Let us see an example. Consider a grammar G having productions S → AB,
A → aAb | Λ, B → Bb | b.
The language generated by this grammar is
L(G) = {a^m b^n | n > m ≥ 0}
Starting with production S → AB, we can generate the following sentential forms of G by using rightmost derivations:
(i) S ⇒ AB (from S → AB)
(ii) S ⇒ ABb^k (from S ⇒ AB, B → Bb)
(iii) S ⇒ a^m Ab^m b^k (from S ⇒ AB, A → aAb, B → Bb | b)
(iv) S ⇒ a^m b^(m+k) (from S ⇒ a^m Ab^m b^k, A → Λ)
where k ≥ 1. 'AB' appears as the right-hand side of S → AB, so 'AB' may be a handle in AB or in ABb^k. If we apply the handle in 'AB', we get S. If we apply the handle in 'ABb^k', we get Sb^k (i.e., Sb^k ⇒ ABb^k). But Sb^k is not a sentential form. Therefore, to decide whether 'AB' can be a handle or not, we scan the symbol to the right of 'AB'. If it is Λ (the null string), then 'AB' is treated as a handle; if the next symbol is a terminal, say b, then 'AB' cannot be a handle. So by looking ahead only one symbol we can decide whether 'AB' is a handle.
Let us consider the string a^3 b^4. While scanning from left to right, we see that the handle production A → Λ may be applied. The symbol Λ can be used as a handle only when it is taken between the rightmost 'a' and the leftmost 'b'. In this case we get
a^3 Ab^4 ⇒ a^3 b^4
and we are able to decide that A → Λ is a handle production only by looking ahead one symbol (to the right of Λ). If Λ is taken between two a's, we get
aAaab^4 ⇒ a^3 b^4
But aAaab^4 is not a sentential form. Similarly, we can see that the correct handle production can be determined by looking ahead one symbol for the various sentential forms.
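The language claim for this grammar can also be checked mechanically on short strings. The sketch below is an illustration of mine, not from the book: the helper name derives is invented, uppercase letters play the nonterminals, and Λ is encoded as the empty Python string.

```python
def derives(productions, start, max_len=6):
    """Exhaustively rewrite sentential forms (uppercase letters are
    nonterminals) and collect every terminal string of length <= max_len."""
    seen, work, words = {start}, [start], set()
    while work:
        form = work.pop()
        i = next((j for j, c in enumerate(form) if c.isupper()), None)
        if i is None:                       # no nonterminal left: a sentence
            words.add(form)
            continue
        for body in productions[form[i]]:
            t = form[:i] + body + form[i + 1:]
            # prune any form whose terminal content is already too long
            if sum(c.islower() for c in t) <= max_len and t not in seen:
                seen.add(t)
                work.append(t)
    return words

# grammar of the running example: S -> AB, A -> aAb | Λ, B -> Bb | b
G = {'S': ['AB'], 'A': ['aAb', ''], 'B': ['Bb', 'b']}
short_words = derives(G, 'S')
```

Every string produced agrees with {a^m b^n | n > m ≥ 0}, restricted to length six.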
Definition of LR(k) Grammar
A context-free grammar G = (VN, Σ, P, S) in which S ⇒^n S only for n = 0 is an LR(k) grammar (k ≥ 0) if whenever
(i) S ⇒ αAs ⇒ αβs, where α, β ∈ (VN ∪ Σ)* and s ∈ Σ*,
(ii) S ⇒ α1A1s1 ⇒ α1β1s1, where α1, β1 ∈ (VN ∪ Σ)* and s1 ∈ Σ*, and
(iii) the first |αβ| + k symbols of 'αβs' and 'α1β1s1' coincide,
then α = α1, A = A1 and β = β1.
The important points to be noted regarding LR(k) grammars are:
(a) If 'αβs' or 'α1β1s1' has fewer than |αβ| + k symbols, then we add some 'blank symbols' (say $) on the right and compare.
(b) If a sentential form 'αβs' is encountered, we can get a rightmost derivation of it in the following way: if A → β is a production, then we have to decide whether A → β is used in the last step of the rightmost derivation of 'αβs'. By looking k symbols beyond β in 'αβs', we can decide whether A → β is the required production in the last step. If 'α1β1s1' is another sentential form satisfying condition (iii) above (in the definition of an LR(k) grammar), then we can apply the production A1 → β1 in the last step of the rightmost derivation of 'α1β1s1'. But by definition we have A = A1, β = β1 and α = α1. Therefore, A → β is the only production we can apply, and we are able to decide this after looking k symbols beyond β. We repeat this process until we get S (the reverse derivation).
(c) If a grammar G is an LR(k) grammar, it is also an LR(k′) grammar for all k′ > k.

B.1.1  LR(0) Grammars


An LR(0) grammar is a restricted type of context-free grammar. This class of grammars is the first in the family collectively called LR grammars. LR(0) stands for "left-to-right scan of the input string producing a rightmost derivation by using 0 (zero) symbols of look-ahead in the input string."
The LR(0) grammars define exactly the DCFLs (deterministic context-free languages) having the prefix property (no string in the language is a proper prefix of another). Note that the prefix property is not a major restriction, since by introducing an end marker we can convert any DCFL into a DCFL with the prefix property. For example, if L is a DCFL, then L$ = {s$ | s ∈ L} is a DCFL with the prefix property. The LR(0) restriction is too severe to provide convenient and natural grammars for many programming languages, but the LR(0) conditions capture the flavour of their more useful generalizations, which have been successfully used in several parser-generating systems.

Example B.1  Show that the context-free grammar G described by the productions S → bB, B → Baa, B → a is an LR(0) grammar.

Solution  The sentential forms of grammar G are bB, bBa^(2n), and ba^(2n+1), so L(G) = {ba^(2n+1) | n ≥ 0}. As 'bB', 'Baa', and 'a' are the possible right-hand sides of productions, B → a can be the last production, and we can decide this without looking at any symbol to the right of 'a' (zero look-ahead symbols). Similarly, the last productions for 'bBa^(2n)' and 'bB' are B → Baa and S → bB, respectively. Therefore, the grammar G is an LR(0) grammar.

Example B.2  Show that the CFG G given by the productions S → AB, A → aAb, A → Λ, B → Bb, B → b is an LR(1) grammar, but not an LR(0) grammar.

Solution  The language generated by grammar G is
L(G) = {a^m b^n | n > m ≥ 0}.
By using rightmost derivations, some sentential forms of G are AB, a^m Ab^m b^k, and a^m b^(m+k), where k ≥ 1. If we apply the handle to 'AB' in ABb^k, we get Sb^k, which is not a sentential form, and we see that we cannot proceed without looking ahead. Therefore the handle production for a sentential form of G in the last step of a rightmost derivation cannot be determined without look-ahead, so the grammar G is an LR(1) grammar but not an LR(0) grammar.
Definition of LR(0) Grammar  A grammar G is said to be an LR(0) grammar if:
(i) its start symbol does not appear on the right side of any production, and
(ii) for every viable prefix β of G, whenever A → α. is a complete item valid for β, then no other complete item, nor any item with a terminal to the right of the dot (i.e., the period), is valid for β.

Note  The only items that could be valid simultaneously with A → α. are items with a nonterminal to the right of the dot, and this can occur only if α is Λ; otherwise another violation of the LR(0) condition would occur.

B.1.2  LR-Items
An item for a given context-free grammar is a production with a dot anywhere in the right side, including the beginning and the end. For example, in the case of the production S → AB, the LR-items are S → .AB, S → A.B, and S → AB.; but note that in the case of a null production A → Λ (or A → ε) there is only one LR-item, namely A → ., because the length of Λ is zero.
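This enumeration is entirely mechanical. As a small sketch (the function name lr_items and the (head, left-of-dot, right-of-dot) tuple representation are my choices, not the book's):

```python
def lr_items(head, body):
    """All LR-items of one production, as (head, left-of-dot, right-of-dot).
    A null production (empty body) yields the single item A -> . """
    return [(head, body[:i], body[i:]) for i in range(len(body) + 1)]

# S -> AB has three items; the null production A -> Λ has exactly one
items_S = lr_items('S', 'AB')
items_A = lr_items('A', '')
```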
Note  Several parsing methods, such as the LL and LR methods, detect an error as soon as possible. In general, they have the viable-prefix property, meaning they can detect that an error has occurred as soon as they see a prefix of the input that is not a prefix of any string in the language.
Let us consider a grammar G with productions S → Sc | A, A → ab. Then we can get the following derivation:
S ⇒ Sc ⇒ Ac ⇒ abc
We see that A → ab. is valid for the viable prefix ab, A → a.b is valid for the viable prefix a, and A → .ab is valid for the viable prefix Λ. All these productions with a dot are valid LR-items of the grammar given above.
Computing Sets of Valid Items
The definition of an LR(0) grammar and the method of accepting the language L(G) of an LR(0) grammar G by a deterministic pushdown automaton each depend on knowledge of the valid items for each viable prefix β. It turns out that for every CFG G, the set of viable prefixes is a regular set, and this regular set is accepted by an NDFA whose states are the valid items for grammar G.
An equivalent DFA can be constructed by applying the subset construction to this NDFA. The states of the constructed DFA are the sets of items valid for the viable prefix β read so far. The NDFA M recognizing the viable prefixes of a context-free grammar G = (VN, Σ, P, S) is defined as follows:
(i) δ(q0, Λ) = {S → .α | S → α is a production},
(ii) δ(A → α.Bγ, Λ) = {B → .β | B → β is a production},
(iii) δ(A → α.Xγ, X) = {A → αX.γ}.
Rule (ii) allows expansion of a variable B appearing immediately to the right of the dot. Rule (iii) allows moving the dot over any grammar symbol X if X is the next input symbol.
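Rules (i)-(iii) translate almost directly into code. The sketch below (my illustration; the names item_nfa and q0 are invented) builds the transition table of this NDFA for a grammar given as a dict of single-character symbols, with uppercase letters as nonterminals and '' playing the role of a Λ-move label:

```python
from collections import defaultdict

def item_nfa(productions, start):
    """Transitions of the NDFA of rules (i)-(iii).  States are items written
    (head, dot-position, body); q0 is a distinguished start state."""
    delta = defaultdict(set)
    q0 = 'q0'
    # rule (i): Λ-moves from q0 to every item S -> .α
    for body in productions[start]:
        delta[(q0, '')].add((start, 0, body))
    for head, bodies in productions.items():
        for body in bodies:
            for i, X in enumerate(body):
                item = (head, i, body)
                # rule (iii): move the dot over the grammar symbol X
                delta[(item, X)].add((head, i + 1, body))
                # rule (ii): expand a nonterminal B just right of the dot
                if X.isupper():
                    for b in productions.get(X, []):
                        delta[(item, '')].add((X, 0, b))
    return dict(delta), q0

# grammar of Example B.3 below, writing Z for S'
G = {'Z': ['Sc'], 'S': ['SA', 'A'], 'A': ['aSb', 'ab']}
delta, q0 = item_nfa(G, 'Z')
```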

Example B.3  Consider the grammar G given by the productions S′ → Sc, S → SA | A, A → aSb | ab. Construct an NDFA for this grammar.

Solution  First of all we construct the set of valid items; then we construct the NDFA by using rules (i)-(iii) given in this section. The required NDFA is given in Fig. B.1.

Fig. B.1  NDFA of example B.3

Theorem B.1  The NDFA has the property that δ(q0, β) contains A → α.γ if and only if A → α.γ is valid for β.

Proof
(i) Only if. We have to show that each item A → α.γ in δ(q0, β) is valid for β. We proceed by mathematical induction on the length of the shortest path labeled β from q0 to A → α.γ in the transition diagram of Fig. B.1. The basis (length 1) is straightforward: the only paths of length one from state q0 are labeled Λ and go to items of the form S → .α. Each of these items is valid for Λ because of the rightmost derivation S ⇒ α.
Let us assume the result is true for paths shorter than k, and that there is a path of length k labeled β from q0 to A → α.γ. There are two cases, depending on whether the last edge is labeled Λ or not.
Case (i)  The last edge is labeled X, for X ∈ VN ∪ Σ. The edge must come from a state A → α1.Xγ, where α = α1X. Then by the induction hypothesis, A → α1.Xγ is valid for β1, where β = β1X. Therefore, there is a rightmost derivation
S ⇒ ηAs ⇒ ηα1Xγs
where ηα1 = β1. This same derivation shows that A → α1X.γ is also valid for β.
Case (ii)  The last edge is labeled Λ. In this case α must be Λ, and A → α.γ is really A → .γ. The item in the previous state is of the form B → α′.Aγ′, and it is also valid for β. Thus there is a derivation
S ⇒ ηBs ⇒ ηα′Aγ′s
where β = ηα′. Suppose γ′ ⇒ x for some terminal string x. Then the derivation
S ⇒ ηBs ⇒ ηα′Aγ′s ⇒ ηα′Axs ⇒ ηα′γxs
can also be written as
S ⇒ ηα′Axs ⇒ ηα′γxs
Thus, A → .γ is valid for β, as β = ηα′.
(ii) If. Let us assume A → α.γ is valid for β. Then
S ⇒ β′As ⇒ β′αγs
where β′α = β. If we are able to show that δ(q0, β′) contains A → .αγ, then by rule (iii) we know that δ(q0, β) contains A → α.γ. We therefore prove, by induction on the length of the derivation, that δ(q0, β′) contains A → .αγ.
The basis step follows from rule (i). For the induction, consider the step of the derivation S ⇒ β′As in which the explicitly shown A has been introduced. We can write S ⇒ β′As as
S ⇒ β″Bx ⇒ β″β‴Aβ⁗x ⇒ β″β‴Ayx
where β″β‴ = β′ and yx = s. Then, applying the induction hypothesis to the derivation
S ⇒ β″Bx
we know that B → .β‴Aβ⁗ is in δ(q0, β″). By using rule (iii), B → β‴.Aβ⁗ is in δ(q0, β″β‴), and by using rule (ii), A → .αγ is in δ(q0, β″β‴). Since β″β‴ = β′, the induction hypothesis is proved.

Example B.4  Consider the NDFA whose transition diagram is given in Fig. B.1. Construct an equivalent DFA.

Solution  To find the DFA equivalent to the given NDFA, first of all we have to remove transitions to the dead state (the empty set of items); then we define the states in terms of item sets I0, I1, I2, and so on. A state consists of a single complete item if there is a dot at the rightmost position in the right side of a production; such states have no outgoing transitions. A state has a transition with label X if it contains an item with the dot just before X in the right side of a production. The constructed DFA is given in Fig. B.2.
Fig. B.2  DFA whose states are sets of valid items (I0-I8)
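The subset construction can be phrased directly in terms of item sets, via the usual closure/goto formulation. The sketch below is mine (the names closure, goto_ and canonical_collection are invented, single-character symbols with uppercase nonterminals assumed); for the grammar of Example B.3, with Z standing for S′, it yields exactly the nine item sets I0-I8 of Fig. B.2.

```python
def closure(items, prods):
    """Λ-closure of a set of LR(0) items (head, dot-position, body)."""
    out, stack = set(items), list(items)
    while stack:
        head, i, body = stack.pop()
        if i < len(body) and body[i].isupper():     # nonterminal after dot
            for b in prods[body[i]]:
                item = (body[i], 0, b)
                if item not in out:
                    out.add(item)
                    stack.append(item)
    return frozenset(out)

def goto_(I, X, prods):
    """Item set reached from I by moving the dot over grammar symbol X."""
    return closure({(h, i + 1, b) for h, i, b in I
                    if i < len(b) and b[i] == X}, prods)

def canonical_collection(prods, start):
    """All DFA states (sets of valid items) reachable from I0."""
    I0 = closure({(start, 0, b) for b in prods[start]}, prods)
    states, work = {I0}, [I0]
    while work:
        I = work.pop()
        for X in {b[i] for _, i, b in I if i < len(b)}:
            J = goto_(I, X, prods)
            if J and J not in states:
                states.add(J)
                work.append(J)
    return states

# grammar of Example B.3 (Z stands for S')
G = {'Z': ['Sc'], 'S': ['SA', 'A'], 'A': ['aSb', 'ab']}
```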

Theorem B.2  Every LR(k) grammar G is unambiguous.

Proof  We have to show that for any x ∈ Σ*, there exists one and only one (unique) rightmost derivation (equivalently, a unique leftmost derivation). Let us suppose we have two rightmost derivations of x, and let their last steps be
(i) S ⇒ αAs ⇒ αβs = x
(ii) S ⇒ α′A′s′ ⇒ α′β′s′ = x
As αβs = α′β′s′, from the definition it follows that α = α′, A = A′ and β = β′. As αβs = α′β′s′, we get s = s′ and so αAs = α′A′s′. Hence, the last steps of derivations (i) and (ii) are the same. Repeating the argument for the other sentential forms derived in (i) and (ii), we can show that (i) and (ii) are identical. Hence, the grammar G is unambiguous.

The research work of Knuth has been instrumental in establishing several subareas of computer science and software engineering: LR(k) parsing; attribute grammars; the Knuth-Bendix algorithm for axiomatic reasoning; empirical studies of user programs and profiles; and the analysis of algorithms. In general, his work has been directed towards the search for a proper balance between theory and practice.
He holds five patents and has published approximately 160 papers in addition to his 19 books. He holds honorary doctorates from Oxford University, the University of Paris, St. Petersburg University, and more than a dozen colleges and universities in America.

B.1.3  Properties of LR(k) Grammars


In this section we give some important properties of LR(k) grammars, which play an important role in parser generation and other applications. Every pushdown automaton accepts a context-free language, and for every context-free language L we can construct a pushdown automaton which accepts L. The following properties give the relation between these grammars and pushdown automata:
Property 1  If G is an LR(k) grammar, then there exists a deterministic pushdown automaton which accepts L(G).
Property 2  If P is a deterministic pushdown automaton, there exists an LR(1) grammar G such that L(G) = N(P), where N(P) is the set accepted by null store.
Property 3  If G is an LR(k) grammar, where k > 1, then there exists an equivalent grammar G2 which is LR(1).
Property 4  The class of deterministic languages is a proper subclass of the class of context-free languages. (The class of deterministic context-free languages can be denoted by LDCFL.)
Property 5  LDCFL is closed under complementation but not under union and intersection.
Property 6  A context-free language is generated by an LR(0) grammar if and only if it is accepted by a deterministic pushdown automaton and has the prefix property.
Property 7  There is an algorithm to decide whether a given context-free grammar is LR(k) for a given natural number k.

B.1.4  LR(0) Grammars and Deterministic PDAs


In this section we show that every LR(0) grammar generates a deterministic context-free language (DCFL), and every deterministic context-free language with the prefix property has an LR(0) grammar. Since every language with an LR(0) grammar will be shown to have the prefix property, we have an exact characterization of DCFLs: L is a deterministic CFL if and only if L$ has an LR(0) grammar.
Deterministic PDA from LR(0) Grammar
The method we use in the construction of a DPDA from an LR(0) grammar differs from the method we followed in the construction of an NPDA (nondeterministic PDA) from an arbitrary context-free grammar. In this construction we shall trace out a rightmost derivation in reverse, using a stack to hold a viable prefix of a right-sentential form, including all variables of the right-sentential form, and allowing the remainder of the form to appear on the input.
To simulate a rightmost derivation of an LR(0) grammar in reverse, not only do we keep a viable prefix on the stack, but for every symbol we also keep a state of the DFA recognizing viable prefixes. If the viable prefix X1X2…Xk is on the stack, then the complete stack contents will be
s0X1s1X2…Xksk
where si is δ(q0, X1X2…Xi). The top state sk provides the valid items for X1X2…Xk.
If sk contains A → α., then A → α. is valid for X1X2…Xk. Therefore, α is a suffix of X1X2…Xk, say α = Xi+1Xi+2…Xk (α may be Λ, in the case i = k). Moreover, there is some string 's' such that X1X2…Xks is a right-sentential form, and there is a derivation
S ⇒ X1X2…XiAs ⇒ X1X2…Xks.
Thus, to obtain the right-sentential form previous to X1X2…Xks in a rightmost derivation, we reduce to A by replacing Xi+1Xi+2…Xk on the top of the stack by A. That is, by a sequence of pop operations followed by a push operation that pushes A and the correct covering state onto the stack, the DPDA enters the sequence of IDs (instantaneous descriptions)
(q, s0X1s1X2…sk−1Xksk, s) ⊢* (q, s0X1s1X2…si−1Xisi As′, s)
where s′ = δ(si, A). Note that if the grammar is LR(0), then sk contains only A → α., unless α = Λ, in which case sk may also contain incomplete items. However, by the definition of LR(0), none of these items has a terminal to the right of the dot, nor is any of them complete. Thus, for any 'y' such that X1X2…Xky is a right-sentential form, X1X2…XiAy must be the previous right-sentential form; therefore the reduction of α to A is correct regardless of the current input.
Now consider the case when sk contains only incomplete items. Then the right-sentential form previous to X1X2…Xks could not be formed by reducing a suffix of X1X2…Xk to some variable; otherwise there would be a complete item valid for X1X2…Xk. There must be a handle ending to the right of Xk in X1X2…Xks, as X1X2…Xk is a viable prefix. Thus the only appropriate action for the DPDA is to shift the next input symbol onto the stack. That is,
(q, s0X1s1X2…sk−1Xksk, ay) ⊢ (q, s0X1s1X2…sk−1Xkskat, y)
where t = δ(sk, a). If t is not the empty set of items, X1X2…Xka is a viable prefix. If t is empty, we shall prove that there is no possible previous right-sentential form for X1X2…Xkay, so the original input is not in the grammar's language, and the DPDA terminates instead of making the move given above.
Theorem B.3  If L(G) is the language of an LR(0) grammar G, then L(G) = N(A) for some deterministic pushdown automaton A.

Proof  First we construct from grammar G a DFA D, with transition function δ, that recognizes the viable prefixes of G. The pushdown symbols of PDA A are the grammar symbols of G together with the states of DFA D; thus we construct PDA A from DFA D by adding a stack. The PDA has an initial state q, along with additional states used to perform reductions by sequences of moves.
Since the grammar is LR(0), reduction to the previous right-sentential form is the only possible move, so when PDA A starts with a string s ∈ L(G) on its input and Z0 on its stack, it constructs a rightmost derivation for the string 's' in reverse. The remaining thing required in this proof is to show that, when a shift operation is called for, there cannot be a handle among the grammar symbols X1X2…Xk found on the pushdown store at that time. If there were such a handle, then some DFA state on the stack, below the top, would contain a complete item. (Note that an item of the form A → α. is a complete item, i.e., α is followed by the dot and the dot is followed by no symbol from VN ∪ Σ.)
But if there were such a state, containing A → α., it would already have called for the reduction of α to the variable 'A'. If α ≠ Λ, the symbols of α and their covering states would already have been removed from the stack. If α = Λ, the reduction of Λ to 'A' would have taken place, causing A to be put on the pushdown store above X1X2…Xk; then there would always be a symbol above Xk on the pushdown store, so A → Λ at position k cannot be a handle of any right-sentential form X1X2…Xkβ, where β contains a nonterminal.
The final point, concerning acceptance, is this: if the top item on the pushdown store is {S → α.}, where S is the start symbol of the grammar, then A pops its pushdown store and accepts. In this case we have completed the reverse of a rightmost derivation of the input string. Note that 'S' does not appear on the right-hand side of any production, so there is no item of the form A → S.α for the viable prefix S. Therefore, there is no need to shift additional input symbols when 'S' alone appears on the pushdown store. In other words, L(G) always has the prefix property if the grammar G is LR(0). In this way we have proved that:
(i) the string 's' is in L(G), i.e., s ∈ L(G);
(ii) PDA A finds a rightmost derivation of 's' that reduces 's' to 'S', where S is the start symbol of grammar G;
(iii) thus we can say that N(A) = L(G).
LR(0) Grammar from Deterministic PDA
If the language L is N(M) for a deterministic PDA M, then L has an LR(0) grammar. Let M = (Q, Σ, δ, Γ, q0, Z0, {f}) be a DPDA. We define a grammar G = (VN, Σ, P, S) such that L(G) = N(M), where VN contains the start symbol S, the symbols [qXp] for q and p in Q and X ∈ Γ, and the symbols AqaY for q ∈ Q, a ∈ (Σ ∪ {Λ}) and Y ∈ Γ. The productions of grammar G are defined by the following rules:
R1: S → [q0Z0p] for all p ∈ Q.
R2: If δ(q, a, Y) = (p, Λ), then there is a production [qYp] → AqaY.
R3: If δ(q, a, Y) = (p1, X1X2…Xk) for k = 1, 2, 3, …, then for each sequence of states p2, p3, …, pk+1 there is a production
[qYpk+1] → AqaY[p1X1p2]…[pkXkpk+1].
R4: For all q, a and Y, there is a production AqaY → a.

B.2 LL(k) GRAMMARS

In this section we present LL(k) grammars, which are widely used in top-down parsing, applying certain techniques to a certain subclass of the context-free languages. A grammar G having the properties of looking ahead k symbols, left-to-right scanning of the input string in L(G), and leftmost derivation is called an LL(k) grammar. Similarly, a grammar having the properties of looking ahead one symbol in the input string, left-to-right scanning of the input string in L(G), and leftmost derivation is called an LL(1) grammar.
In other words, if a grammar is LL(k), looking ahead k symbols in the input is always enough to choose the next move of the PDA. Such a grammar allows the construction of a deterministic top-down parser, and there are systematic methods for determining whether a context-free grammar is LL(k) and for carrying out the construction. For an LL(1) context-free grammar, the algorithm that decides the next step in the derivation of a string by looking at the next input symbol can be included in a deterministic pushdown automaton. The method of recursive descent is another way to formulate essentially the same algorithm; the name refers to a collection of mutually recursive procedures corresponding to the variables in the grammar. Let us look at some important terms used with LL grammars:

Left Factoring  Left factoring is a grammar transformation that is useful for producing a grammar in a form suitable for predictive parsing. If there are productions A → αβ1 | αβ2 in a grammar G, then after left factoring the productions A → αβ1 | αβ2 become
A → αA′
A′ → β1 | β2
by introducing a new variable A′.
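As a sketch of this transformation (the function name left_factor and the 'Λ' marker for an empty alternative are my choices; symbols are single characters):

```python
from os.path import commonprefix   # longest common prefix of a list of strings

def left_factor(head, bodies):
    """Left-factor A -> αβ1 | αβ2 | ... into A -> αA', A' -> β1 | β2 | ...
    Returns a dict of productions; 'Λ' marks an empty alternative."""
    alpha = commonprefix(bodies)
    if not alpha or len(bodies) < 2:
        return {head: bodies}                  # nothing to factor
    fresh = head + "'"
    tails = [b[len(alpha):] or 'Λ' for b in bodies]
    return {head: [alpha + fresh], fresh: tails}
```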
Left Recursion  A grammar G is said to be left recursive if it has a nonterminal 'A' such that there is a derivation
A ⇒+ Aα
for some string α. Top-down parsing methods cannot handle left-recursive grammars; therefore the elimination of left recursion is a must. A left-recursive pair of productions A → Aα | β can be replaced by the non-left-recursive productions A → βA′, A′ → αA′ | Λ without changing the set of strings derivable from variable A.
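A sketch of this replacement (the name remove_left_recursion is mine; single-character symbols assumed, 'Λ' marking the null alternative):

```python
def remove_left_recursion(head, bodies):
    """Replace immediate left recursion A -> Aα | β by
    A -> βA', A' -> αA' | Λ."""
    fresh = head + "'"
    alphas = [b[len(head):] for b in bodies if b.startswith(head)]
    betas = [b for b in bodies if not b.startswith(head)]
    if not alphas:
        return {head: bodies}                  # no left recursion to remove
    return {head: [b + fresh for b in betas],
            fresh: [a + fresh for a in alphas] + ['Λ']}
```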

B.2.1  LL(1) Grammar


A grammar is said to be an LL(1) grammar if its parsing table has no multiply defined entries. The first 'L' in LL(1) stands for left-to-right scanning of the input, the second 'L' for producing a leftmost derivation, and the '1' for using one input symbol of look-ahead at each step to make parsing-action decisions.
LL(1) grammars have several distinctive properties. No ambiguous or left-recursive grammar can be LL(1). A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:
(i) for no terminal 'a' do both α and β derive strings beginning with 'a';
(ii) at most one of α and β can derive the null string; and
(iii) if β ⇒* Λ, then α does not derive any string beginning with a terminal in FOLLOW(A).

Note  FIRST and FOLLOW are two functions which help in the construction of a predictive parser and allow us to fill in the entries of a predictive parsing table for a grammar G.
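As an illustration, FIRST can be computed by the standard fixed-point iteration. The sketch below is mine, not the book's: symbols are single characters, '' stands for Λ, and T and U stand in for the S′ and A′ of the grammar discussed next.

```python
def first_sets(prods):
    """Iterated fixed point for FIRST.  Bodies are strings; uppercase
    letters are nonterminals; '' in a FIRST set stands for Λ."""
    first = {A: set() for A in prods}
    changed = True
    while changed:
        changed = False
        for A, bodies in prods.items():
            for body in bodies:
                nullable = True
                for X in body:
                    add = first[X] - {''} if X.isupper() else {X}
                    if not add <= first[A]:
                        first[A] |= add
                        changed = True
                    if not (X.isupper() and '' in first[X]):
                        nullable = False       # X cannot vanish: stop here
                        break
                if nullable and '' not in first[A]:
                    first[A].add('')
                    changed = True
    return first

# the LL(1) grammar below, with T for S' and U for A'
G = {'S': ['AT'], 'T': ['+AT', ''], 'A': ['BU'], 'U': ['*BU', ''], 'B': ['(S)', 'a']}
```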
A grammar G with the productions given below:
S → AS′
S′ → +AS′ | Λ
A → BA′
A′ → *BA′ | Λ
B → (S) | a
is an LL(1) grammar, because the parsing table for this grammar has no multiply defined entries. The parsing table for this grammar is given in Table B.1.

Table B.1  Predictive parsing table for grammar G

                           Input Symbols (Terminals)
Nonterminal   a            +            *             (            )          $
S             S → AS′                                 S → AS′
S′                         S′ → +AS′                               S′ → Λ     S′ → Λ
A             A → BA′                                 A → BA′
A′                         A′ → Λ      A′ → *BA′                   A′ → Λ     A′ → Λ
B             B → a                                   B → (S)
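A table like B.1 drives the standard predictive-parser loop. The sketch below is my illustration, not the book's: it hard-codes a transcription of Table B.1 (the empty list [] marks a Λ-entry), and the driver name ll1_parse is invented.

```python
# transcription of Table B.1; bodies are lists of grammar symbols
TABLE = {
    ('S', 'a'): ['A', "S'"], ('S', '('): ['A', "S'"],
    ("S'", '+'): ['+', 'A', "S'"], ("S'", ')'): [], ("S'", '$'): [],
    ('A', 'a'): ['B', "A'"], ('A', '('): ['B', "A'"],
    ("A'", '+'): [], ("A'", '*'): ['*', 'B', "A'"],
    ("A'", ')'): [], ("A'", '$'): [],
    ('B', 'a'): ['a'], ('B', '('): ['(', 'S', ')'],
}
NONTERMINALS = {'S', "S'", 'A', "A'", 'B'}

def ll1_parse(tokens):
    """Table-driven predictive parse; True iff the token list is accepted."""
    stack = ['$', 'S']
    inp = tokens + ['$']
    pos = 0
    while stack:
        top = stack.pop()
        look = inp[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                return False               # empty table slot: syntax error
            stack.extend(reversed(body))   # push the body, leftmost on top
        elif top == look:
            pos += 1                       # match a terminal (or '$')
        else:
            return False
    return pos == len(inp)
```

For instance, the driver accepts a + a * a and (a + a) * a but rejects a +.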

Here we are not concerned with how the table is constructed. Our aim is only to show that there are no multiple entries in Table B.1.
Similarly, we can say that the grammar given below (modelling the if-then-else statement) is not LL(1):
Statement → if Expression then Statement A | 0
A → else Expression | Λ
Expression → 1
because the predictive parsing table for this grammar has a multiply defined entry (the entry for A on 'else'). See Table B.2.

Table B.2  Predictive parsing table for the grammar Statement → if Expression then Statement A | 0, A → else Expression | Λ, Expression → 1

                                Input Symbols (Terminals)
Nonterminal   0               1                else                   if                                            then   $
Statement     Statement → 0                                          Statement → if Expression then Statement A
A                                              A → else Expression                                                        A → Λ
                                               A → Λ
Expression                    Expression → 1

  B.5 Consider the context –free grammar G having the productions S Æ E + S |E * S


|E, E Æ a, and w = a + a * a Œ L(G). Show that G is an LL(2) grammar.

Solution  When we construct a top-down parser for the string w = a + a * a, looking ahead one symbol is not enough. On seeing a we can apply the production E → a, but seeing a alone does not tell us whether it is followed by + or *, so we cannot choose among the S-productions. In this case it is necessary to look ahead two symbols.
When we start with S, we have three productions: S → E + S, S → E * S and S → E. The first two symbols of a + a * a are 'a +'. This forces us to apply S → E + S and not any other S-production. Therefore, S ⇒ E + S. We now apply E → a to get S ⇒ E + S ⇒ a + S. Now the remaining part of w is a * a. The first two symbols 'a *' tell us that we must apply the production S → E * S in the third step. So, S ⇒* a + S ⇒ a + E * S. The third symbol of w is a, so we can apply the production E → a, deriving S ⇒* a + E * S ⇒ a + a * S. The remaining part of the input string w is a, so we have to apply the productions S → E and E → a. Thus the leftmost derivation of a + a * a is


B.14  q  Theory of Automata, Languages and Computation

S ⇒ E + S
  ⇒ a + S
  ⇒ a + E * S
  ⇒ a + a * S
  ⇒ a + a * E
  ⇒ a + a * a
Hence, the grammar G is an LL(2) grammar.
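The two-symbol lookahead decision described above can be sketched as a tiny recursive routine; the name `parse_S` is hypothetical and the routine only decides membership, returning how far a successful parse of S reached.

```python
# Two-token lookahead for S → E + S | E * S | E, E → a.
def parse_S(tokens, i=0):
    """Parse one S in tokens[i:]; return the index just past it, or -1."""
    if i >= len(tokens) or tokens[i] != "a":   # E → a is the only E-production
        return -1
    # Look ahead TWO symbols: the one after 'a' decides the S-production.
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if nxt in ("+", "*"):        # choose S → E + S or S → E * S
        return parse_S(tokens, i + 2)
    return i + 1                 # choose S → E

tokens = list("a+a*a")
print(parse_S(tokens) == len(tokens))   # → True: a + a * a ∈ L(G)
```

A whole string belongs to L(G) exactly when the returned index equals its length.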

Example B.6  Show that the grammar S → 0A1, A → 0A1 | 0 is not an LR(0) grammar.

Solution  The language generated by the grammar is L = {0^(n+1) 1^n | n ≥ 1}. The production A → 0 is applied in the last step of the rightmost derivation, and its handle is the '0' that is immediately followed by '1'. So A → 0 is a handle production if and only if the symbol to the right of '0' is scanned and found to be '1'. Similarly, A → 0A1 is a handle production if and only if the symbol to the right of '0A1' is '1'. Also, S → 0A1 is a handle production if and only if the symbol to the right of '0A1' is Λ (the end of the string). Since one symbol of lookahead beyond the handle is needed in each case, the given grammar is LR(1), but not an LR(0) grammar.

B.3 PARSING OF LR AND LL GRAMMARS

Parsing is the method of determining whether a string of tokens can be derived by a grammar. Tokens are sequences of characters having a collective meaning. A parser can be constructed for any grammar. For any context-free grammar there is a parser that takes at most O(n³) time to parse a string of n tokens.
Most parsing methods fall into two classes: top-down and bottom-up methods. These terms refer to the order in which the nodes of the parse tree (derivation tree) are constructed. In top-down parsing, construction starts at the root and proceeds towards the leaf nodes (leaves), while in bottom-up parsing construction starts at the leaves and proceeds towards the root. The advantage of the top-down method is that efficient parsers can be constructed more easily. The advantage of the bottom-up method is that it can handle a larger class of grammars and translation schemes.
Recursive-descent parsing is a top-down approach to syntax analysis in which we execute a set of recursive procedures on the input. Predictive parsing is a special form of recursive-descent parsing, in which the lookahead symbol unambiguously determines the procedure to call; the sequence of procedure calls made in processing the input implicitly defines a parse tree for the input. A predictive parser consists of a separate procedure for every variable (nonterminal).
A general style of bottom-up syntax analysis is called shift–reduce parsing. Operator-precedence parsing is an easy-to-implement form of shift–reduce parsing. A much more general method of shift–reduce parsing is called LR parsing; LR parsing is used in a number of automatic parser generators.
Shift–reduce parsing is based on the bottom-up approach: it attempts to construct a parse tree for an input string beginning at the leaf nodes (the bottom) and proceeding towards the root (the top). This process can be viewed as reducing a string (the yield) 's' to the start symbol (the root) of the grammar: at each step the parser either shifts the next input symbol onto a stack or reduces a handle on top of the stack, hence the name shift–reduce parsing.
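As an illustration of the shift–reduce idea, consider the grammar S → aABe, A → Abc | b, B → d (used again in Exercise B.3 below) on the input 'abbcde'. The move sequence here is written out by hand for this one input; a real LR parser would select the same moves from its parse table.

```python
# Replay a hand-written shift/reduce move sequence on a stack.
moves = [
    ("shift", "a"), ("shift", "b"),
    ("reduce", "A -> b"),          # stack: a A
    ("shift", "b"), ("shift", "c"),
    ("reduce", "A -> A b c"),      # stack: a A
    ("shift", "d"),
    ("reduce", "B -> d"),          # stack: a A B
    ("shift", "e"),
    ("reduce", "S -> a A B e"),    # stack: S
]

stack = []
for action, arg in moves:
    if action == "shift":
        stack.append(arg)          # shift: push the next input symbol
    else:                          # reduce: pop the handle, push the head
        head, rhs = arg.split(" -> ")
        body = rhs.split()
        assert stack[len(stack) - len(body):] == body, "handle not on top"
        del stack[len(stack) - len(body):]
        stack.append(head)
print(stack)   # → ['S']
```

The input is accepted because the whole sentence has been reduced to the start symbol S.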

The most efficient top-down and bottom-up methods work only on subclasses of grammars, but several of these subclasses, such as the LL and LR grammars, are expressive enough to describe most syntactic constructs in programming languages.

B.1 Eliminate left-recursion from the following productions:
E → E + T | T, T → T * F | F, F → (E) | a.
B.2 Consider the grammar S → cAD, A → ab | a, D → d. Show the steps in top-down parsing for the strings (i) 'cabd' (ii) 'cad'.
B.3 Consider the grammar S → aABe, A → Abc | b, B → d. Show the steps of bottom-up parsing to reduce the sentence 'abbcde' to S.
B.4 Consider the grammar G given by the productions S → AB, A → aAb, A → Λ, B → Bb, B → b. Construct the derivation tree giving yield 'aabbbb'.
B.5 Show that the grammar S → SS | aSb | bSa | Λ is not an LL(k) grammar.

B.1 By eliminating left-recursion, the given grammar becomes
E → TE′
E′ → +TE′ | Λ
T → FT′
T′ → *FT′ | Λ
F → (E) | a
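The transformation applied here, A → Aα | β becoming A → βA′, A′ → αA′ | Λ, can be sketched as a small routine; the function name, the list encoding of productions, and the "EPS" marker for Λ are illustrative choices.

```python
# Remove immediate left recursion from the productions of one nonterminal:
#   A → Aα | β   becomes   A → βA', A' → αA' | EPS
def eliminate_left_recursion(head, alts):
    rec = [a[1:] for a in alts if a and a[0] == head]   # the α parts
    non = [a for a in alts if not a or a[0] != head]    # the β parts
    if not rec:                     # nothing to do: return unchanged
        return {head: alts}
    new = head + "'"
    return {
        head: [b + [new] for b in non],
        new:  [a + [new] for a in rec] + [["EPS"]],
    }

# E → E + T | T  becomes  E → T E', E' → + T E' | EPS, as in B.1 above.
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applying the same routine to T → T * F | F yields T → F T′, T′ → * F T′ | Λ, matching the solution above.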
B.2 (i) For 'cabd': starting from S, the parser expands S → cAD and matches 'c'; it then tries A → ab, which matches 'ab'; finally D → d matches 'd'. The derivation is S ⇒ cAD ⇒ cabD ⇒ cabd.
(ii) For 'cad': again S is expanded to cAD and 'c' is matched. The parser first tries A → ab; after matching 'a' it fails on 'b', backtracks, and applies A → a instead; then D → d succeeds. The derivation is S ⇒ cAD ⇒ caD ⇒ cad.
Fig. B.3  Derivation trees of Exercise B.2 (the stages of each tree correspond to the derivation steps above)

B.3 abbcde ⇒ aAbcde ⇒ aAde ⇒ aABe ⇒ S



B.4 To get the derivation tree for 'aabbbb', we scan the string from left to right. After scanning 'a', we look ahead. If the next symbol is 'a', we continue to scan; if the next symbol is 'b', we decide that A → Λ is the required handle production. Therefore the last step of the rightmost derivation of 'aabbbb' is aaAbbbb ⇒ aaΛbbbb. To get the last step of 'aaAbbbb', we scan the working string from left to right. Here, 'aAb' is a possible handle. We can decide that it is the right handle without looking ahead, and we get aAbbb ⇒ aaAbbbb. Once again using the handle 'aAb', we obtain Abb ⇒ aAbbb. To get the last step of the rightmost derivation of 'Abb', we scan 'Abb'. A possible handle production is B → b. We also note that this handle production can be applied to the first 'b' encountered but not to the last 'b'. So we get the derivation ABb ⇒ Abb. For 'ABb', a possible handle is 'Bb'. Hence we get the derivation AB ⇒ ABb. Finally, we get S ⇒ AB.
Thus, we have the following derivation steps:
aaAbbbb ⇒ aabbbb (by looking ahead one symbol)
aAbbb ⇒ aaAbbbb (without looking ahead)
Abb ⇒ aAbbb (without looking ahead)
ABb ⇒ Abb (without looking ahead)
AB ⇒ ABb (without looking ahead)
S ⇒ AB (by looking ahead one symbol)
The derivation tree for 'aabbbb' is given below:

            S
          /   \
         A     B
       / | \   | \
      a  A  b  B  b
       / | \   |
      a  A  b  b
         |
         Λ

Fig. B.4  Derivation tree of Exercise B.4

B.5 Let us consider the strings 'aabb' and 'aabbba'. For the string 'aabb' the derivation must start with S ⇒ aSb, while the derivation of 'aabbba' must start with S ⇒ SS. But if we see only the first four symbols, we cannot decide which production should be applied; the grammar is therefore not LL(4). Since similar examples can be constructed for arbitrarily long strings, the given grammar is not LL(k) for any value of k.

B.1 Eliminate left factoring from the grammar abstracting the 'dangling-else' problem:
S → iEtS | iEtSeS | a, E → b,
where i, t and e stand for 'if', 'then' and 'else', respectively; E and S stand for 'Expression' and 'Statement', respectively.
B.2 Show that a deterministic context-free language is never inherently ambiguous.
B.3 Show that the deterministic context-free languages are not closed under union, concatenation or Kleene closure.

B.4 Construct an LL grammar for the language L = (a*ba) ∪ (abbb*).
B.5 Construct an LL grammar for the language L = {a^n b^m c^(m+n) | m, n ≥ 0}.

B.1 Productions of the form A → αβ1 | αβ2 | … | αβn | γ, after eliminating left factoring, can be written as A → αA′ | γ, A′ → β1 | β2 | … | βn. Hence, the given grammar after elimination of left-factoring becomes
S → iEtSS′ | a, S′ → eS | Λ, E → b.
Thus we can expand S to iEtSS′ on input i, and wait until iEtS has been seen to decide whether to expand S′ to eS or to Λ.
B.3 Suppose L1 and L2 are two deterministic CFLs defined as {0^i 1^i 2^j | i, j ≥ 0} and {0^i 1^j 2^j | i, j ≥ 0} respectively. It can be shown that L1 ∪ L2 is a CFL but not a deterministic one; thus the DCFLs are not closed under union. For concatenation, consider L = L1 ∪ aL2. Then L is a deterministic CFL, because the presence or absence of the leading symbol 'a' tells us whether to look for a string in L2 or in L1. Note that a* is also a deterministic CFL. However, a*L is not deterministic, since after reading a prefix of a's the parser cannot tell whether the last 'a' is the marker. Therefore the deterministic CFLs are not closed under concatenation. Similarly, we can show that the DCFLs are not closed under Kleene closure.
B.4 Any string generated from L1 = (a*ba) begins with ba, aba, or (when it starts with aa) has aab or aaa as its first three symbols. If the first three symbols are abb, then the string is in L2 = (abbb*). In each case three symbols of lookahead suffice, so we can find an LL grammar for each language and combine them in the obvious fashion:
S → S1 | S2, S1 → aS1 | ba, S2 → abbB, B → bB | Λ
Note that this grammar is an LL(3) grammar.
B.5 S → aSc | A, A → bAc | Λ. As long as the currently scanned symbol is 'a', we apply the production S → aSc; if it is 'b', we apply S → A and then A → bAc; if it is 'c' (or the end of the string), we can only use A → Λ. The constructed grammar is LL(1).
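A minimal recursive-descent sketch for the productions S → aSc | A, A → bAc | Λ (any S → Λ alternative is redundant, since A already derives the null string) decides membership in L = {a^n b^m c^(m+n) | m, n ≥ 0}; the name `in_L` is illustrative.

```python
# One procedure per nonterminal, one symbol of lookahead, no backtracking.
def in_L(w):
    pos = 0
    def S():
        nonlocal pos
        if pos < len(w) and w[pos] == "a":    # S → aSc
            pos += 1
            if not S():
                return False
            if pos < len(w) and w[pos] == "c":
                pos += 1
                return True
            return False
        return A()                             # S → A
    def A():
        nonlocal pos
        if pos < len(w) and w[pos] == "b":    # A → bAc
            pos += 1
            if not A():
                return False
            if pos < len(w) and w[pos] == "c":
                pos += 1
                return True
            return False
        return True                            # A → Λ
    return S() and pos == len(w)

print(in_L("aabccc"))   # → True  (n = 2, m = 1)
print(in_L("abc"))      # → False (one 'c' short)
```

Each procedure inspects only the current symbol before choosing its production, which is exactly the LL(1) property claimed above.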

1. Which of the following statements is false?
(a) LR(k) grammars are used in the design of compilers.
(b) LR(k) grammars are subclasses of CFGs.
(c) LR(k) grammars play an important role in the study of programming languages.
(d) None of the above.
2. The LR(0) grammars:
(a) define exactly the deterministic CFLs having the prefix property.
(b) may be those grammars which do not allow the start symbol on the right side of any production.
(c) do not allow leftmost derivations.
(d) All of the above.
3. The context-free grammar given by the productions S → 1A, A → A00, A → 0 is:
(a) an LL(1) grammar (b) an LR(0) grammar (c) both (a) and (b) (d) none of these.
4. Which of the following statements is false?
(a) The language generated by an LR(k) grammar is accepted by a deterministic PDA.
(b) All LR(k) grammars are unambiguous.
(c) If G is an LR(k) grammar (k > 1), then there exists an equivalent grammar G′ which is LR(1).
(d) None of the above.

5. If a grammar is LL(1), then it can be compiled by:
(a) a recursive descent parser (b) any parser
(c) it cannot be compiled (d) None of these.
6. The grammar S → XY | aX, X → AX | a, Y → a is LR(k):
(a) for some k ≥ 1
(b) for some k ≥ 0
(c) It is not LR(k) for any k, because it is an ambiguous grammar.
(d) None of the above.
7. Which of the following languages can be generated by an LR(k) grammar?
(a) L = {a^m b^m c^n | m, n ≥ 1} ∪ {a^m b^n c^n | m, n ≥ 1} (b) L = {ab^(2n+1) | n ≥ 0}
(c) Both (a) and (b) (d) None of the above.
8. Which of the following statements is false?
(a) There is no algorithm to decide whether a given CFG is LR(k) for a given natural number k.
(b) The grammar S → 0A1, A → 0A1 | 0 is LL(1).
(c) The grammar S → 0A1, A → 0A0 | 0 is not LR(0).
(d) All of the above.
9. For the sentential form a^(n+1) b^n, the production A → a is applied in the last step only when 'a' is followed by 'b':
(a) then A → a is a handle production if and only if the symbol to the right of 'a' is scanned and found to be 'b'.
(b) A → aAb is a handle production if and only if the symbol to the right of 'aAb' is 'b'.
(c) A → aAb is a handle production if and only if the symbol to the right of 'aAb' is Λ.
(d) All of the above.
10. Which of the following grammars satisfies the LL(1) property?
(a) S → S1$, S1 → aaS1b | ab | bb (b) S → S1$, S1 → aA, A → aA | bA | Λ
(c) S → S1$, S → S1A | Λ, A → Aa | b (d) All of the above.
11. Which of the following grammars does not satisfy the LL(1) property?
(a) S → S1$, S1 → S1A | ab, A → ab | aAbb (b) S → S1$, S1 → aAB | bBA, A → bS1 | a, B → aS1 | b
(c) S → S1$, S1 → aaS1b | ab (d) None of the above.
12. Synthesized attributes can easily be simulated by:
(a) an LR grammar (b) an LL grammar (c) an ambiguous grammar (d) All of these.
13. Which of the following is a false statement?
(a) An LL(k) grammar must necessarily be a CFG.
(b) There are some LL(k) grammars which are not CFGs.
(c) An LL(k) grammar must necessarily be ambiguous.
(d) None of the above.
14. An LR(k) grammar:
(a) covers the LL(k) classes.
(b) can only examine a maximum of k input symbols.
(c) can be used to identify handles and the productions associated with a handle.
(d) All of the above.

15. The value of k in LR(k) cannot be:
(a) 0 (b) 1 (c) 2 (d) None of these.
16. The set of all viable prefixes of right sentential forms of a given grammar:
(a) can be used to control an LR(k) parser. (b) can be recognized by a finite state machine.
(c) (a) is true and (b) is false. (d) Both (a) and (b).
17. Which of the following statements is false?
(a) LR grammars can describe more languages than LL grammars.
(b) There are some left-recursive grammars which are LL(1).
(c) There are some CFGs that are not LL(1).
(d) Every LL(1) grammar is an LR(1) grammar.
18. The grammar given by the productions S → Aa | bAc | Bc | bBa, A → d, B → d is:
(a) an LR(1) grammar (b) an LR(0) grammar (c) an LR(2) grammar (d) Both (a) and (c)
19. Which of the following grammars is not LL(1)?
(a) S → AaAb | BbBa, A → Λ, B → Λ (b) E → TR, R → +TS | −TS | Λ, T → (E) | id | num
(c) Both (a) and (b) (d) None of these.
20. An LR(1) grammar given by S → Sb | a is modified to S → ASb | a, A → Λ. Then:
(a) the modified grammar is not LR(1). (b) the modified grammar is unambiguous.
(c) Both (a) and (b) (d) None of the above.
21. Which of the following is a complete item?
(a) A → ·αβγ (b) A → αβγ· (c) A → α·βγ (d) All of these.
22. Deterministic context-free languages are closed under:
(a) union and concatenation (b) concatenation and homomorphism
(c) Kleene closure (d) None of these.

1. (d)  2. (d)  3. (b)  4. (d)  5. (a)  6. (c)
7. (b)  8. (a)  9. (d)  10. (b)  11. (a)  12. (a)
13. (b)  14. (d)  15. (d)  16. (d)  17. (b)  18. (d)
19. (d)  20. (c)  21. (b)  22. (d)

1. Aho, A. V. and J. D. Ullman, The Theory of Parsing, Translation and Compiling, Prentice-Hall, Englewood Cliffs, NJ, 1972.
2. Aho, A. V., R. Sethi and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.
3. Knuth, D. E., "On the Translation of Languages from Left to Right", Information and Control, 8(6): 607-639, 1965.
4. Leung, H. and D. Wotschke, "On the size of parsers and LR(k)-grammars", Theoretical Computer Science, 242(1-2): 59-69, 2000.
5. Graham, S. L., M. A. Harrison and W. L. Ruzzo, "An improved context-free recognizer", ACM Transactions on Programming Languages and Systems, 2(3): 415-462, 1980.
