LR(k) grammars are a subclass of the context-free grammars. They play an important role
in the study of programming languages and the design of compilers. The compiler of the programming
language ALGOL was designed by implementing an LR(1) parser. The equivalence of deterministic push-
down automata to LR(1) grammars was first studied by Knuth in 1965. The later work of Knuth led to
a sequence of papers dealing with subclasses of context-free grammars having efficient parsing
algorithms. In 1970, Graham showed that a number of other classes of grammars define exactly the
context-free languages. SLR (simple LR), LALR (look-ahead LR), and canonical LR are subclasses of
LR grammars.
The decidability of equivalence extends to the LL(k) grammars, which are a proper
subset of the LR(k) grammars. In the design of programming languages and their compilers, it is
essential to develop parsing techniques (i.e., techniques for obtaining the 'reverse derivation'
of a given string in a context-free language). In other words, we need techniques to construct a
derivation tree or parse tree for a given sentence 's' in a context-free language. For this purpose,
these grammars are widely used.
You can study parsing in detail in compiler design. Here you are only required to know what parsing
actually is: parsing is the determination of the syntactic structure of a string, which is the first step
in understanding the meaning of a sentence.
A grammar for which we can construct a parsing table is said to be an LR grammar. There are some
context-free grammars which are not LR grammars, but these can generally be avoided for typical
programming-language constructs. In general, for a grammar to be LR it is sufficient that a left-to-
right shift-reduce parser be able to recognize handles when they appear on top of the stack,
when the LR parser is implemented using a stack. (You will study handles and shift-reduce parsers later
in this chapter.)
A source of information that an LR parser can use to make its shift-reduce decision is the next ‘k’ input
symbols. A grammar that can be parsed by an LR parser examining up to ‘k’ input symbols (k look ahead
symbols) on each move is called an LR(k) grammar.
B.2 q Theory of Automata, Languages and Computation
There is a significant difference between LL and LR grammars. For a grammar to be LR(k), it must be
possible to recognize the occurrence of the right side of a production, having seen all of what is derived from
that right side, with 'k' input symbols of look-ahead. This requirement is quite different for an LL(k) grammar,
where we must be able to recognize the use of a production by looking only at the first 'k' symbols of what
its right side derives. Therefore, LR grammars can describe more languages than LL grammars.
Donald E. Knuth is the recipient of the 1974 Turing Award. In 1962, he began to prepare textbooks about
programming techniques, and this work evolved into a projected seven-volume series entitled The Art
of Computer Programming. Volumes 1-3 first appeared in 1968, 1969, and 1973. Approximately one
million copies have already been printed, including translations into six languages.
He offered a reward of $2.56 to the first finder of any error in his books, explaining that
$2.56, or 256 cents, is one hexadecimal dollar. In the beginning, Knuth used to pay this
amount to the finders by check, but due to check fraud he stopped such rewards in October 2008. As a
replacement, Knuth started his own "Bank of San Serriffe" to keep an account for everyone who has found
an error since 2006. He now sends a "Hexadecimal Certificate" instead of a check.
In an LR(k) grammar, 'L' stands for left-to-right scanning of the input string, 'R' stands for producing a
rightmost derivation, and k is the number of look-ahead symbols in the input string.
To find a derivation tree for a given sentence 's', we can start with 's' and replace a substring, say 's1',
by a variable A if there is a production
A → s1.
We repeat the process until we get S (the start symbol). But this process is not so easy; at every stage
there are many choices of production to use. If we make a wrong choice, we will not
get S, and in that case we will have to backtrack and try some other alternative production. However,
for certain subclasses of context-free grammars, it is possible to perform this process, i.e., obtain the
derivation in reverse order for a given 's', in a deterministic way.
Let us consider some sentential form 'αβs' of a context-free grammar G, where α and β are in (VN
∪ Σ)* and s ∈ Σ*. Suppose we are interested in finding the production applied in the last step of
the derivation of 'αβs'. If A → β is a production, it is likely that A → β was applied in the
last step, but we cannot be sure that this is the case. If it is possible to decide that A → β is the production
applied in the last step by looking ahead k symbols (i.e., the k symbols to the right of β in
'αβs'), then the grammar G is said to be an LR(k) grammar; the production A → β is called a handle
production, and β is simply called a handle.
LR(k) and LL(k) Grammars q B.3
While dealing with LR(k) grammars, we use the notation ⇒ to represent a rightmost derivation. For example,
we write
α ⇒ β
if β is derived from α by a rightmost derivation. Let us see an example. Consider a grammar G
having the productions
S → AB, A → aAb | Λ, B → Bb | b.
The language generated by this grammar is
L(G) = {a^m b^n | n > m ≥ 0}
Starting with the production S → AB, we can generate the following sentential forms of G by using rightmost
derivations:
(i) S ⇒ AB (from S → AB)
(ii) S ⇒* ABb^k (from S ⇒ AB, B → Bb)
(iii) S ⇒* a^m Ab^m b^k (from S ⇒* ABb^k, A → aAb, B → Bb | b)
(iv) S ⇒* a^m b^(m+k) (from S ⇒* a^m Ab^m b^k, A → Λ)
where k ≥ 1. 'AB' appears as the right-hand side of S → AB, so 'AB' may be a handle in AB or in ABb^k. If
we apply the handle in 'AB', we get S ⇒ AB. If we apply the handle in 'ABb^k', we get
Sb^k ⇒ ABb^k
But Sb^k is not a sentential form. Therefore, to decide whether AB can be a handle or not, we scan the
symbol to the right of 'AB'. If it is Λ (the null string), then 'AB' is treated as a handle. If the next symbol
is any terminal, say b, then 'AB' cannot be a handle. So by looking ahead only one symbol we can
decide whether 'AB' is a handle.
Let us consider the string a^3 b^4. While scanning from left to right, we see that the handle production
A → Λ may be applied. The symbol Λ can be used as a handle only when it is taken between the rightmost
'a' and the leftmost 'b'. In this case we get
a^3 Ab^4 ⇒ a^3 b^4
and we are able to decide that A → Λ is a handle production only by looking ahead one symbol (to
the right of Λ). If Λ is taken between two a's, we get
aAaab^4 ⇒ a^3 b^4
But aAaab^4 is not a sentential form. Similarly, we can see that the correct handle production can be
determined by looking ahead one symbol for various sentential forms.
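The one-symbol look-ahead rule just described is easy to simulate. The sketch below is our own illustration (the function name is an assumption, not from the text): it inserts the variable A at the only position where the handle production A → Λ can apply, namely before the leftmost 'b'.

```python
# A sketch of the one-symbol look-ahead decision for the grammar
# S -> AB, A -> aAb | Λ, B -> Bb | b: the handle production A -> Λ
# applies exactly between the rightmost 'a' and the leftmost 'b'.

def place_empty_handle(s):
    """Return the sentential form obtained by applying A -> Λ in reverse."""
    for i, ch in enumerate(s):
        if ch == "b":                     # the look-ahead symbol is 'b'
            return s[:i] + "A" + s[i:]    # insert A before the leftmost 'b'
    return s + "A"                        # no 'b' at all: A goes at the end

print(place_empty_handle("aaabbbb"))   # 'aaaAbbbb', as in the text's a^3 b^4 example
```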
Definition of LR(k) Grammar
A context-free grammar G = (VN, Σ, P, S), in which S ⇒^n S only for n = 0, is an LR(k) grammar (k ≥ 0) if the following holds:
(i) S ⇒* αAs ⇒ αβs, by rightmost derivations, where α, β ∈ (VN ∪ Σ)* and s ∈ Σ*.
If A → β is a production, then we have to decide whether A → β is used in the last step of the
rightmost derivation of 'αβs'. By looking k symbols beyond β in 'αβs', we can decide whether
A → β is the required production in the last step. If 'α1β1s1' is another right-sentential form that
agrees with 'αβs' up to k symbols beyond β, with a production A1 → β1 applied in the
last step of its rightmost derivation, then by the definition of an LR(k) grammar we have A = A1,
β = β1 and α = α1. Therefore, A → β is the only production we can apply, and we are able to
decide this after looking k symbols beyond β. We repeat this process until we get S (this gives the reverse
derivation).
Note that if a grammar G is an LR(k) grammar, it is also an LR(k1) grammar for all k1 > k.
Consider the grammar G with productions S → bB, B → Baa | a. The sentential forms of grammar G are bB,
bBa^(2n), and ba^(2n+1), and the elements of L(G) are of the form ba^(2n+1), n ≥ 0. As 'bB', 'Baa' and 'a'
are the possible right-hand sides of productions, B → a can be the last production, and we can decide this
without looking at any symbol to the right of 'a' (zero look-ahead symbols). Similarly, the last productions
for 'bBa^(2n)' and 'bB' are B → Baa and S → bB, respectively. Therefore, the grammar G is an LR(0) grammar.
B.2 Show that the CFG G given by the productions S → AB, A → aAb, A → Λ, B → Bb,
B → b, is an LR(1) grammar, but not an LR(0) grammar.
B.1.2 LR-Items
An item for a given context-free grammar is a production with a dot anywhere in the right side, including
the beginning and end. For example, in case of the production S Æ AB, the LR-items are S Æ . AB,
S Æ A.B, S Æ AB., but note that in case of a null production A Æ Ÿ (or AÆ e) there is only one LR-item,
that is A Æ., because the length of Ÿ is zero.
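The enumeration of LR-items is mechanical. Below is a minimal sketch in our own notation (the helper names and the tuple representation of productions are assumptions, not from the text): every dot position in the right side yields one item, and a null production yields exactly one.

```python
# Enumerating the LR-items of a production by placing a dot at every
# position in its right side. A production is (head, body); the null
# production A -> Λ is modeled by an empty tuple body.

def lr_items(head, body):
    """Return all items of head -> body as (head, body, dot_position)."""
    if not body:                      # null production A -> Λ
        return [(head, body, 0)]      # the single item A -> .
    return [(head, body, i) for i in range(len(body) + 1)]

def show(item):
    head, body, dot = item
    return f"{head} -> " + " ".join(body[:dot]) + "." + " ".join(body[dot:])

# The three items of S -> AB from the text, and the one item of A -> Λ:
print([show(i) for i in lr_items("S", ("A", "B"))])
print([show(i) for i in lr_items("A", ())])
```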
Note Several parsing methods, such as the LL and LR methods, detect an error as soon as possible. In
general, they have the viable-prefix property, meaning they can detect that an error has occurred as soon
as they see a prefix of the input that is not a prefix of any string in the language.
Let us consider a grammar G with productions S → Sc | A, A → ab. Then we can get the following
derivation:
S ⇒ Sc ⇒ Ac ⇒ abc
We see that A → ab. is valid for the viable prefix ab, A → a.b is valid for the viable prefix a, and A → .ab
is valid for the viable prefix Λ. All these productions with a dot are valid LR-items of the grammar given
above.
Computing Sets of Valid Items
The definition of LR(0) grammar and the method of accepting the language L(G) of an LR(0) grammar G
by a deterministic pushdown automaton both depend on knowing the valid items for each viable
prefix β. It turns out that for every CFG G the set of viable prefixes is a regular set, and this regular set
is accepted by an NDFA whose states are the valid items for grammar G.
An equivalent DFA can be constructed by applying the subset construction to this NDFA. A state
of the constructed DFA is the set of valid items for the viable prefix β read so far. The NDFA M
recognizing the viable prefixes of a context-free grammar G = (VN, Σ, P, S) is defined as follows:
(i) δ(q0, Λ) = {S → .α | S → α is a production},
(ii) δ(A → α.Bγ, Λ) = {B → .β | B → β is a production},
(iii) δ(A → α.Xγ, X) = {A → αX.γ}.
Rule (ii) allows expansion of a variable B appearing immediately to the right of the dot. Rule (iii) allows
moving the dot over any grammar symbol X if X is the next input symbol.
First of all, we construct the set of valid items; then we construct the NDFA by using rules (i)-(iii)
given in this section. The required NDFA is given by Fig. B.1 (see below).
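Rules (i)-(iii) can also be sketched directly in code. The following illustration is our own (the grammar is the small S → Sc | A, A → ab example from earlier in this section; the representation of states as item tuples and the use of '' for a Λ-move are assumptions):

```python
# States of the item NDFA are either "q0" or an item (head, body, dot);
# the empty string '' plays the role of a Λ-label on edges.

GRAMMAR = {                      # toy grammar: S -> Sc | A, A -> ab
    "S": [("S", "c"), ("A",)],
    "A": [("a", "b")],
}
START = "S"

def delta(state, symbol):
    """Transition function following rules (i)-(iii) of the text."""
    if state == "q0":
        # rule (i): Λ-moves from q0 to S -> .α for every S-production
        return {(START, body, 0) for body in GRAMMAR[START]} if symbol == "" else set()
    head, body, dot = state
    if symbol == "":
        # rule (ii): expand a variable appearing immediately after the dot
        if dot < len(body) and body[dot] in GRAMMAR:
            B = body[dot]
            return {(B, b, 0) for b in GRAMMAR[B]}
        return set()
    if dot < len(body) and body[dot] == symbol:
        # rule (iii): move the dot over the next grammar symbol
        return {(head, body, dot + 1)}
    return set()

print(delta("q0", ""))                     # the items S -> .Sc and S -> .A
print(delta(("S", ("S", "c"), 1), "c"))    # the dot moves over c
```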
Theorem B.1 The NDFA has the property that δ(q0, β) contains A → α.γ if and only if A → α.γ is
valid for β.
Proof
(i) Only if. We have to show that each item A → α.γ in δ(q0, β) is valid for β. We proceed by mathematical
induction on the length of the shortest path labeled β from q0 to A → α.γ in the transition diagram
given by Fig. B.1. The basis (for length 1) is straightforward: the only paths of length one from state
q0 are labeled Λ and go to items of the form S → .α. Each of these items is valid for Λ because of the
rightmost derivation S ⇒* S.
Let us assume the result is true for paths shorter than k, and that there is a path of length k labeled β from q0
to A → α.γ. There are two cases, depending on whether the last edge is labeled Λ or not.
Case (i) The last edge is labeled X, for X ∈ VN ∪ Σ. The edge must come from a state A → α1.Xγ, where
α = α1X. Then, by the induction hypothesis, A → α1.Xγ is valid for β1, where β = β1X. Therefore, there
is a rightmost derivation
S ⇒* ηAs ⇒ ηα1Xγs
where ηα1 = β1. This same derivation shows that A → α1X.γ is also valid for β.
Case (ii) The last edge is labeled Λ. In this case α must be Λ, and A → α.γ is really A → .γ. The item
in the previous state is of the form B → α′.Aγ′, and it is valid for β. Thus there is a derivation
S ⇒* ηBs ⇒ ηα′Aγ′s
where β = ηα′. Suppose γ′ ⇒* x for some terminal string x. Then the derivation
S ⇒* ηBs ⇒ ηα′Aγ′s ⇒* ηα′Axs ⇒ ηα′γxs
can also be written as
S ⇒* ηα′Axs ⇒ ηα′γxs
Thus, A → .γ is valid for β, as β = ηα′.
(ii) If. Let us assume A → α.γ is valid for β. Then
S ⇒* β′As ⇒ β′αγs
where β′α = β. If we are able to show that δ(q0, β′) contains A → .αγ, then by rule (iii) we know that
δ(q0, β) contains A → α.γ. We therefore prove, by induction on the length of the derivation of β′αγs,
that δ(q0, β′) contains A → .αγ.
The basis step follows from rule (i). For the induction, we consider the step
S ⇒* β′As
in which the explicitly shown A has been introduced. Now we can write S ⇒* β′As as
S ⇒* β″Bx
⇒ β″β‴Aβ⁗x
⇒* β″β‴Ayx
where β″β‴ = β′ and yx = s. Then, by applying the induction hypothesis to the derivation
S ⇒* β″Bx
⇒ β″β‴Aβ⁗x
we know that B → .β‴Aβ⁗ is in δ(q0, β″). By using rule (iii), B → β‴.Aβ⁗ is in δ(q0, β″β‴),
and by using rule (ii), A → .αγ is in δ(q0, β″β‴). Since β″β‴ = β′, we have proved the induction
hypothesis.
B.4 Consider the transition diagram of the NDFA given by Fig. B.1. Construct the equivalent
DFA.
To find the DFA equivalent to the given NDFA, first of all we remove transitions to the
dead state (the empty set of items); then we define states in terms of item sets I0, I1, I2, and so on. A state
consists of a single complete item if there is a dot at the rightmost position in the right side of a production,
and such a state has no outgoing transitions. A state has an outgoing transition with label X if it contains a
production with the dot just before X in its right side. The constructed DFA is given by
Fig. B.2 (below).
Fig. B.2 DFA whose states are sets of valid items (item sets I0-I8)
The research work of Knuth has been instrumental in establishing several subareas of computer science
and software engineering: LR(k) parsing; attribute grammars; the Knuth-Bendix algorithm for axiomatic
reasoning; empirical studies of user programs and profiles; and the analysis of algorithms. In general, his
work has been directed towards the search for a proper balance between theory and practice.
He holds five patents and has published approximately 160 papers in addition to his 19 books. He holds
honorary doctorates from Oxford University, the University of Paris, St. Petersburg University, and
more than a dozen colleges and universities in America.
Property 2 If P is a deterministic pushdown automaton, there exists an LR(1) grammar G such that
L(G) = N(P), where N(P) is the set accepted by null store.
Property 3 If G is an LR(k) grammar, where k > 1, then there exists an equivalent grammar G2 which
is LR(1).
Property 4 The class of deterministic languages is a proper subclass of the class of context-free
languages. (The class of deterministic context-free languages is denoted by LDCFL.)
Property 5 LDCFL is closed under complementation but not under union and intersection.
Property 6 A context-free language is generated by an LR(0) grammar if and only if it is accepted by
a deterministic pushdown automaton and has the prefix property.
Property 7 There is an algorithm to decide whether a given context-free grammar is LR(k) for a given
natural number k.
S ⇒* X1X2X3…XiAs ⇒ X1X2X3…Xks.
Thus, to obtain the right-sentential form previous to X1X2X3…Xks in a rightmost derivation, we
reduce to A by replacing Xi+1Xi+2…Xk on the top of the stack by A. That is, by a sequence of pop
operations followed by a push operation that pushes A and the correct covering state onto the stack,
the DPDA will enter a sequence of IDs (instantaneous descriptions)
(q, s0X1s1X2…sk−1Xksk, s) ⊢* (q, s0X1s1X2…si−1Xisi As′, s)
where s′ = δ(si, A). Note that if the grammar is LR(0), then sk contains only A → α., unless α = Λ, in which
case sk may also contain incomplete items. However, by the definition of LR(0), none of these items has
a terminal to the right of the dot, nor is complete. Thus, for any 'y' such that X1X2X3…Xky is a right-
sentential form, X1X2X3…XiAy must be the previous right-sentential form; therefore the reduction of
α to A is correct regardless of the current input.
Now consider the case when sk contains only incomplete items. Then the right-sentential form previous to
X1X2X3…Xks could not be formed by reducing a suffix of X1X2X3…Xk to some variable; otherwise
there would be a complete item valid for X1X2X3…Xk. There must be a handle ending to the right of Xk
in X1X2X3…Xks, as X1X2X3…Xk is a viable prefix. Thus, the only appropriate action for the DPDA is
to shift the next input symbol onto the stack. That is,
(q, s0X1s1X2…sk−1Xksk, ay) ⊢* (q, s0X1s1X2…sk−1Xkskat, y)
where t = δ(sk, a). If t is not the empty set of items, X1X2X3…Xka is a viable prefix. If t is empty, we shall
prove that there is no possible previous right-sentential form for X1X2X3…Xkay, so the original input is
not in the grammar's language, and the DPDA terminates instead of making the move given above.
Theorem B.3 If L(G) is the language of an LR(0) grammar G, then there is a deterministic pushdown
automaton A such that N(A) = L(G).
Proof First we construct a DFA D from grammar G, with transition function δ, that recognizes the viable
prefixes of grammar G. The pushdown symbols of PDA A are the symbols of grammar G together with the
states of DFA D; thus we construct PDA A from DFA D by adding a stack. The PDA has a
state q, which is its initial state, along with additional states used to perform reductions by sequences
of moves.
If the grammar is LR(0), then reduction is the only possible way to obtain the previous right-sentential form;
when PDA A starts with a string s ∈ L(G) on its input and Z0 on its stack, it constructs a rightmost
derivation for the string 's' in reverse. The remaining thing required in this proof is to show that, when a shift
operation is called for, there cannot be a handle among the grammar symbols X1X2X3…Xk found on
the pushdown store at that time. If there were such a handle, then some DFA state on the stack, below
the top, would contain a complete item. (Note that an item of the form A → α. is a complete item, i.e., α is
followed by the dot and the dot is not followed by any symbol from (VN ∪ Σ).)
If there were such a state containing A → α., it would immediately have called for the reduction
of α to the variable 'A'. If α ≠ Λ, the state containing A → α. would have been removed from the stack. If
α = Λ, the reduction of Λ to 'A' would take place, causing A to be put on the pushdown store above
X1X2X3…Xk. In either case there would be a symbol above Xk on the pushdown store, contradicting the
assumption that Xk occupies the top position; hence A → Λ at position k cannot be the handle of any
right-sentential form X1X2X3…Xkβ, where β contains a nonterminal.
The final point concerns acceptance: if the top item on the pushdown store is S → α., where S is the start
symbol of the grammar, then A pops its pushdown store and accepts. In this case we have
completed the reverse of a rightmost derivation of the input string. Note that 'S' does not appear on the
right-hand side of any production, so there cannot be an item of the form A → S.α valid for the viable
prefix S. Therefore, there is no need to shift additional input symbols when 'S' alone appears on the
pushdown store. In other words, we can say that L(G) always has the prefix property if the grammar G is
LR(0). This way we have proved that:
(i) The string 's' is in L(G), i.e., s ∈ L(G).
(ii) PDA A finds a rightmost derivation of 's' that reduces 's' to 'S', where S is the start symbol of
grammar G.
(iii) Thus, N(A) = L(G).
LR(0) Grammar from Deterministic PDA
If the language L is N(M) for a deterministic PDA M, then L has an LR(0) grammar. Let M = (Q, Σ, δ, Γ,
q0, Z0, {f}) be a DPDA. We define a grammar G = (VN, Σ, P, S) such that L(G) = N(M), where VN contains
the start symbol S, the symbols [qXp] for q and p in Q and X ∈ Γ, and the symbols AqaY for q ∈ Q, a
∈ (Σ ∪ {Λ}) and Y ∈ Γ. The productions of grammar G are defined by the following rules:
R1: S → [q0Z0p] for all p ∈ Q.
R2: If δ(q, a, Y) = (p, Λ), then there is a production [qYp] → AqaY.
R3: If δ(q, a, Y) = (p1, X1X2X3…Xk) for some k ≥ 1, then for each sequence of states p2,
p3, p4, …, pk+1 there is a production
[qYpk+1] → AqaY[p1X1p2]…[pkXkpk+1].
R4: For all q, a and Y, there is a production of the form AqaY → a.
In this section we present LL(k) grammars, which are widely used in top-down parsing by applying
certain techniques to certain subclasses of context-free languages. A grammar G having the property of
looking ahead k symbols, left-to-right scanning of the input string in L(G), and leftmost derivation is
called an LL(k) grammar. Similarly, a grammar having the property of looking ahead one symbol
in the input string, left-to-right scanning of the input string in L(G), and leftmost derivation is called an
LL(1) grammar.
In other words, if a grammar is LL(k), looking ahead k symbols in the input is always enough to choose
the next move of the PDA. Such a grammar allows the construction of a deterministic top-down parser,
and there are systematic methods for determining whether a context-free grammar is LL(k) and for carrying
out the construction. For an LL(1) context-free grammar, the algorithm that decides the next step in the
derivation of a string by looking at the next input symbol can be included in a deterministic pushdown
automaton. The method of recursive descent is another way to formulate essentially the same algorithm;
the name refers to a collection of mutually recursive procedures corresponding to the variables in the
grammar. Let us look at some important terms used in LL grammars:
Left Factoring Left factoring is a grammar transformation that is useful for producing a grammar in a
form suitable for predictive parsing. If there are productions A → αβ1 | αβ2 in a grammar G, then after
left factoring the productions A → αβ1 | αβ2 become
A → αA′
A′ → β1 | β2
by introducing a new variable A′.
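The transformation above can be sketched as follows. This is a minimal illustration in our own notation (the tuple representation of production bodies and the helper name are assumptions, not the book's):

```python
# Left factoring: A -> a b1 | a b2 becomes A -> a A', A' -> b1 | b2.
# Each production body is a tuple of grammar symbols; 'Λ' is the empty string.

def left_factor(head, bodies):
    """Factor the longest common prefix out of the alternatives of `head`."""
    prefix = []
    for symbols in zip(*bodies):          # compare the bodies column by column
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:
        return {head: bodies}             # nothing to factor
    new = head + "'"
    tails = [body[len(prefix):] or ("Λ",) for body in bodies]
    return {head: [tuple(prefix) + (new,)], new: tails}

# A -> a b1 | a b2 from the text:
print(left_factor("A", [("a", "b1"), ("a", "b2")]))
# {'A': [('a', "A'")], "A'": [('b1',), ('b2',)]}
```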
Left Recursion A grammar G is said to be left-recursive if it has a nonterminal 'A' such that there is
a derivation
A ⇒+ Aα
for some string α. Top-down parsing methods cannot handle left-recursive grammars; therefore the
elimination of left recursion is a must. A left-recursive pair of productions A → Aα | β can be replaced by the
non-left-recursive productions A → βA′, A′ → αA′ | Λ without changing the set of strings derivable from
variable A.
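A sketch of this replacement for immediate left recursion, in the same tuple notation as above (the helper name is our own assumption):

```python
# Immediate left-recursion removal: A -> Aa | b becomes
# A -> b A', A' -> a A' | Λ ('Λ' stands for the empty string, as in the text).

def remove_left_recursion(head, bodies):
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the α-parts
    others    = [b for b in bodies if not b or b[0] != head]    # the β-parts
    if not recursive:
        return {head: bodies}          # nothing to do
    new = head + "'"
    return {
        head: [b + (new,) for b in others],
        new:  [a + (new,) for a in recursive] + [("Λ",)],
    }

print(remove_left_recursion("A", [("A", "a"), ("b",)]))
# {'A': [('b', "A'")], "A'": [('a', "A'"), ('Λ',)]}
```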
Note FIRST and FOLLOW are two functions which help in the construction of a predictive parser and
allow us to fill in the entries of a predictive parsing table for a grammar G.
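As an illustration, FIRST can be computed by iterating to a fixed point. This is a sketch in our own style, not the book's algorithm, applied to the LL(1) grammar S → AS′, S′ → +AS′ | Λ, A → BA′, A′ → *BA′ | Λ, B → (S) | a discussed in this section ('Λ' marks the empty string):

```python
# FIRST(X) = the set of terminals that can begin a string derived from X
# (plus Λ if X can derive the empty string). Computed by a fixed point.

def first_sets(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for X in body:
                    if X not in grammar:             # terminal (or the Λ marker)
                        if X not in first[A]:
                            first[A].add(X); changed = True
                        break
                    add = first[X] - {"Λ"}
                    if not add <= first[A]:
                        first[A] |= add; changed = True
                    if "Λ" not in first[X]:
                        break
                else:                                # every symbol can vanish
                    if "Λ" not in first[A]:
                        first[A].add("Λ"); changed = True
    return first

G = {
    "S":  [("A", "S'")],
    "S'": [("+", "A", "S'"), ("Λ",)],
    "A":  [("B", "A'")],
    "A'": [("*", "B", "A'"), ("Λ",)],
    "B":  [("(", "S"), ("a",)],
}
print(first_sets(G)["S"])    # FIRST(S) = {'(', 'a'}
```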
A grammar G with the productions given below:
S → AS′
S′ → +AS′ | Λ
A → BA′
A′ → *BA′ | Λ
B → (S) | a
is an LL(1) grammar, because the parsing table for this grammar has no multiply-defined entries. The
parsing table for this grammar is given by Table B.1.
Here we are not concerned with how the table is constructed. Our aim is only to show that there are
no multiple entries in Table B.1.
Similarly, we can say that the grammar given below (modeling the if-then-else statement) is not
LL(1):
Statement → if Expression then Statement A | 0, A → else Expression | Λ, Expression → 1
because the predictive parsing table for this grammar has multiple (more than one) entries. See Table
B.2 (below).
Table B.2 Predictive parsing table for the grammar Statement → if Expression then Statement A | 0, A → else Expression | Λ,
Expression → 1
When we construct a top-down parser for the string w = a + a * a, looking ahead
one symbol is not enough for the construction. For the string a + a * a, we can apply the production E → a
on seeing a. But if a is followed by + or *, we cannot yet tell which S-production to apply. So, in this case
it is necessary to look ahead two symbols.
When we start with S, we have three productions S → E + S, S → E * S and S → E. The first two
symbols in a + a * a are a +. This forces us to apply S → E + S and not any other S-production.
We now apply E → a to get S ⇒ E + S ⇒ a + S. Now the remaining part of w
is a * a. The first two symbols a * suggest that we must apply the production S → E * S in the third
step. So, S ⇒* a + S ⇒ a + E * S. The third symbol in w is a, so we can apply the production E → a,
deriving S ⇒* a + E * S ⇒ a + a * S. The remaining part of the input string w is a, so we have to apply
S → E and then E → a. The complete derivation is
S ⇒ E + S
⇒ a + S
⇒ a + E * S
⇒ a + a * S
⇒ a + a * E
⇒ a + a * a
Hence, the grammar G is an LL(2) grammar.
B.6 Show that the grammar S → 0A1, A → 0A1 | 0 is not an LR(0) grammar.
The language generated by the grammar is L = {0^(n+1) 1^n | n ≥ 1}. The production A → 0
is applied in the last step only when that '0' is followed by '1'. So A → 0 is a handle production if and
only if the symbol to the right of '0' is scanned and found to be '1'. Similarly, A → 0A1 is a handle
production if and only if the symbol to the right of '0A1' is '1'. Also, S → 0A1 is a handle production
if and only if the symbol to the right of '0A1' is Λ. Therefore, the given grammar is LR(1), but not an
LR(0) grammar.
Parsing is the method of determining whether a string of tokens can be derived from a grammar. Tokens
are sequences of characters having a collective meaning. A parser can be constructed for any grammar.
For any context-free grammar there is a parser that takes at most O(n^3) time to parse a string of n
tokens.
Most parsing methods are classified into two classes: top-down and bottom-up methods. These
terms refer to the order in which the nodes of the parse tree (derivation tree) are constructed. In top-
down parsing, construction starts at the root and proceeds towards the leaf nodes (leaves), while in
bottom-up parsing construction starts at the leaves and proceeds towards the root. The advantage of the
top-down method is that efficient parsers can be constructed more easily. The advantage of the bottom-up
parsing method is that it can handle a larger class of grammars and translation schemes.
Recursive-descent parsing is a top-down approach to syntax analysis in which we execute a set
of recursive procedures on the input. Predictive parsing is a special form of recursive-descent parsing,
in which the look-ahead symbol unambiguously determines the procedure called in processing the
input, and this sequence of calls implicitly defines a parse tree for the input. A predictive parser consists of
a separate procedure for every variable (nonterminal).
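As an illustration, a predictive recursive-descent recognizer for the LL(1) grammar S → AS′, S′ → +AS′ | Λ, A → BA′, A′ → *BA′ | Λ, B → (S) | a of this chapter can be sketched as follows (the procedure and helper names are our own; one procedure per nonterminal, with the single look-ahead symbol choosing the production):

```python
# A minimal recursive-descent recognizer; "$" marks the end of the input.

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else "$"
    def eat(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t}, got {peek()}")
        pos += 1
    def S():  A(); S1()
    def S1():                                      # S' -> +AS' | Λ
        if peek() == "+": eat("+"); A(); S1()
    def A():  B(); A1()
    def A1():                                      # A' -> *BA' | Λ
        if peek() == "*": eat("*"); B(); A1()
    def B():                                       # B -> (S) | a
        if peek() == "(": eat("("); S(); eat(")")
        else:             eat("a")
    S(); eat("$")                                  # all input must be consumed
    return True

print(parse(list("a+a*a")))   # True: the string is in L(G)
```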
A general style of bottom-up syntax analysis is called shift-reduce parsing. Operator-precedence
parsing is an easy-to-implement form of shift-reduce parsing. A much more general method of shift-
reduce parsing is called LR parsing. LR parsing is used in a number of automatic parser generators.
Shift-reduce parsing is based on the bottom-up approach: it attempts to construct a parse tree for an
input string beginning at the leaf nodes (the bottom) and proceeding towards the root (the top). This process
can be considered as reducing a string (the yield) 's' to the start symbol (the root) of the grammar; hence
this parsing is called shift-reduce parsing.
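A toy shift-reduce recognizer can be sketched as follows. This is our own illustration with an assumed grammar S → S+B | B, B → a | (S); it reduces greedily, which happens to be safe here because of the order in which productions are tried, whereas a real LR parser consults a parsing table to decide between shifting and reducing:

```python
PRODUCTIONS = [
    ("B", ("a",)),            # B -> a
    ("B", ("(", "S", ")")),   # B -> (S)
    ("S", ("S", "+", "B")),   # S -> S+B  (tried before S -> B on purpose)
    ("S", ("B",)),            # S -> B
]

def shift_reduce(tokens):
    stack, rest = [], list(tokens)
    while True:
        reduced = True
        while reduced:                     # reduce while a handle is on top
            reduced = False
            for head, body in PRODUCTIONS:
                n = len(body)
                if len(stack) >= n and tuple(stack[-n:]) == body:
                    stack[-n:] = [head]    # pop the handle, push its variable
                    reduced = True
                    break
        if not rest:
            return stack == ["S"]          # accept iff only S remains
        stack.append(rest.pop(0))          # shift the next input symbol

print(shift_reduce("a+a"))     # True
print(shift_reduce("(a+a)+a")) # True
```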
The most efficient top-down and bottom-up methods work only on subclasses of grammars, but
several of these subclasses, such as the LL and LR grammars, are expressive enough to describe most
syntactic constructs in programming languages.
Fig. B.3 Derivation trees of Exercise B.2
B.4 To get the derivation tree for 'aabbbb', we scan the string from left to right. After scanning 'a', we look
ahead. If the next symbol is 'a', we continue to scan; if the next symbol is 'b', we decide that A → Λ is the
required handle production, so the last step of the rightmost derivation of 'aabbbb' is aaAbbbb
⇒ aaΛbbbb. To get the last step for 'aaAbbbb', we scan the working string from left to right. Here, 'aAb' is a
possible handle, and we can decide that it is the right handle without looking ahead; we get aAbbb
⇒ aaAbbbb. Once again using the handle 'aAb', we obtain Abb ⇒ aAbbb. To get the last step of the
rightmost derivation of 'Abb', we scan 'Abb'. A possible handle production is B → b; note that this
handle production applies to the first 'b' encountered but not to the last 'b', so we get the step ABb
⇒ Abb. For 'ABb', a possible handle is 'Bb', giving AB ⇒ ABb. Finally, we get S ⇒ AB.
Thus, we have the following derivation steps:
aaAbbbb ⇒ aaΛbbbb = aabbbb (by looking ahead one symbol)
aAbbb ⇒ aaAbbbb (without looking ahead)
Abb ⇒ aAbbb (without looking ahead)
ABb ⇒ Abb (without looking ahead)
AB ⇒ ABb (without looking ahead)
S ⇒ AB (by looking ahead one symbol)
The derivation tree for 'aabbbb' is given in Fig. B.4.
Fig. B.4 Derivation tree of Exercise B.4
B.5 Let us consider the strings 'aabb' and 'aabbbaa'. For the string 'aabb', the derivation must start with S ⇒ aSb,
while deriving 'aabbbaa' must start with S ⇒ SS. But if we see only the first four symbols, we are not able to
decide which case applies. The grammar is therefore not LL(4). Since similar examples can be made
for arbitrarily long strings, the given grammar is not LL(k) for any value of k.
B.1 Eliminate left factoring from the grammar abstracting the 'dangling-else' problem
S → iEtS | iEtSeS | a, E → b,
where i, t and e stand for 'if', 'then', and 'else', respectively; E and S stand for 'Expression' and
'Statement', respectively.
B.2 Show that a deterministic context-free language is never inherently ambiguous.
B.3 Show that the deterministic context-free languages are not closed under union, concatenation or
Kleene closure.
B.1 Productions of the form A → αβ1 | αβ2 | … | αβn | γ, after eliminating left factoring, can be written as
A → αA′ | γ, A′ → β1 | β2 | … | βn. Hence, after elimination of left factoring, the given grammar becomes
S → iEtSS′ | a, S′ → eS | Λ, E → b.
Thus we can expand S to iEtSS′ on input i, and wait until iEtS has been seen to decide whether to expand
S′ to eS or to Λ.
B.3 Suppose L1 and L2 are two deterministic CFLs defined as {0^i 1^i 2^j | i, j ≥ 0} and {0^i 1^j 2^j | i, j ≥ 0},
respectively. Now we can show that L1 ∪ L2 is not deterministic, but is simply a CFL. Thus the DCFLs are
not closed under union. For concatenation, suppose L = a(L1 ∪ L2). Then L is a deterministic CFL, because
the presence or absence of the symbol 'a' tells us whether to look for a string in L1 or in L2. One can verify
that a* is a deterministic CFL; however, a*L is not deterministic. Therefore the deterministic CFLs are not
closed under concatenation. Similarly, we can show that DCFLs are not closed under Kleene closure.
B.4 Any sufficiently long string generated from L1 = a*ba has aaa, aab, or aba as its first three symbols. If the
first three symbols are abb, then the string is in L2 = abbb*. In each case, we can find an LL grammar, and
the two languages can be combined in an obvious fashion. For this we can have the following grammar:
S → S1 | S2, S1 → aS1 | ba, S2 → abbB, B → bB | Λ
Note that this grammar is an LL(3) grammar.
B.5 S → aSc | A | Λ, A → bAc | Λ. As long as the currently scanned symbol is 'a', we apply the production S → aSc;
if it is 'b', we apply the production S → A; if it is 'c', we can only use the production S → Λ. The constructed
grammar is LL(1).
1. Aho, A. V. and J. D. Ullman, The Theory of Parsing, Translation and Compiling, Prentice-Hall, Englewood
Cliffs, NJ, 1972.
2. Aho, A. V., R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley.
3. Knuth, D. E., "On the Translation of Languages from Left to Right", Information and Control, 1965.
4. Leung, H. and D. Wotschke, "On the size of parsers and LR(k)-grammars", Theoretical Computer Science,
242(1-2): 59-69, 2000.
5. Graham, S. L., M. A. Harrison, and W. L. Ruzzo, "An improved context-free recognizer",
ACM Transactions on Programming Languages and Systems, 2(3): 415-462, 1980.