
Top-Down Parsing and Intro to Bottom-Up Parsing

Lecture 7
Prof. Aiken, CS 143

Predictive Parsers

• Like recursive-descent but parser can "predict" which production to use
  – By looking at the next few tokens
  – No backtracking

• Predictive parsers accept LL(k) grammars
  – The first L means "left-to-right" scan of input
  – The second L means "leftmost derivation"
  – k means "predict based on k tokens of lookahead"
  – In practice, LL(1) is used

LL(1) vs. Recursive Descent

• In recursive-descent,
  – At each step, many choices of production to use
  – Backtracking used to undo bad choices

• In LL(1),
  – At each step, only one choice of production
  – That is:
    • When a non-terminal A is leftmost in a derivation
    • The next input symbol is t
    • There is a unique production A → α to use
      – Or no production to use (an error state)

• LL(1) is a recursive-descent variant without backtracking

Predictive Parsing and Left Factoring

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )

• Hard to predict because
  – For T, two productions start with int
  – For E, it is not clear how to predict

• We need to left-factor the grammar

Left-Factoring Example

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )

• Factor out common prefixes of productions
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

LL(1) Parsing Table Example

• Left-factored grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• The LL(1) parsing table
  (row = leftmost non-terminal, column = next input token,
   entry = rhs of the production to use; blank = error)

         int      *      +      (        )      $
    E    T X                    T X
    X                    + E             ε      ε
    T    int Y                  ( E )
    Y             * T    ε               ε      ε

LL(1) Parsing Table Example (Cont.)

• Consider the [E, int] entry
  – "When current non-terminal is E and next input is int, use production E → T X"
  – This production can generate an int in the first position

• Consider the [Y, +] entry
  – "When current non-terminal is Y and current token is +, get rid of Y"
  – Y can be followed by + only if Y → ε

LL(1) Parsing Tables. Errors

• Blank entries indicate error situations

• Consider the [E, *] entry
  – "There is no way to derive a string starting with * from non-terminal E"

Using Parsing Tables

• Method similar to recursive descent, except
  – For the leftmost non-terminal S
  – We look at the next input token a
  – And choose the production shown at [S, a]

• A stack records the frontier of the parse tree
  – Non-terminals that have yet to be expanded
  – Terminals that have yet to be matched against the input
  – Top of stack = leftmost pending terminal or non-terminal

• Reject on reaching error state
• Accept on end of input & empty stack

LL(1) Parsing Algorithm

initialize stack = <S $> and next
repeat
  case stack of
    <X, rest> : if T[X, *next] = Y1…Yn
                  then stack ← <Y1 … Yn rest>;
                  else error ();
    <t, rest> : if t == *next ++
                  then stack ← <rest>;
                  else error ();
until stack == < >

LL(1) Parsing Algorithm (notes)

• $ marks the bottom of the stack
• For a non-terminal X on top of the stack, look up the production T[X, *next], pop X, and push the production's rhs on the stack; the leftmost symbol of the rhs ends up on top of the stack
• For a terminal t on top of the stack, check that t matches the next input token

LL(1) Parsing Example

Stack           Input          Action
E $             int * int $    T X
T X $           int * int $    int Y
int Y X $       int * int $    terminal
Y X $           * int $        * T
* T X $         * int $        terminal
T X $           int $          int Y
int Y X $       int $          terminal
Y X $           $              ε
X $             $              ε
$               $              ACCEPT
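Below is a minimal Python sketch (not from the slides) of the table-driven algorithm above, run on the same input as this example. The dict/list representation, the function name ll1_parse, and the explicit "$" end marker are illustrative choices; the table entries mirror the LL(1) parsing table shown earlier.

TABLE = {   # LL(1) parsing table from the earlier slide; [] stands for ε
    ("E", "int"): ["T", "X"],    ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],      ("X", ")"): [],   ("X", "$"): [],
    ("T", "int"): ["int", "Y"],  ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],      ("Y", "+"): [],   ("Y", ")"): [],   ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def ll1_parse(tokens, start="E"):
    stack = [start, "$"]              # initialize stack = <S $>; index 0 is the top
    tokens = tokens + ["$"]           # $ marks the end of the input
    pos = 0
    while stack:                      # until stack == < >
        top = stack.pop(0)
        if top in NONTERMINALS:       # case <X, rest>: look up T[X, *next]
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:           # blank entry: error
                raise SyntaxError(f"no production for [{top}, {tokens[pos]}]")
            stack = rhs + stack       # pop X, push rhs; leftmost rhs symbol on top
        else:                         # case <t, rest>: terminal must match input
            if top != tokens[pos]:
                raise SyntaxError(f"expected {top!r}, got {tokens[pos]!r}")
            pos += 1                  # consume the token (*next++)
    return pos == len(tokens)         # accept on end of input & empty stack

print(ll1_parse("int * int".split()))   # True, following the trace above

A blank table entry surfaces here as a SyntaxError, which corresponds to the error entries discussed on the earlier slide.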

Constructing Parsing Tables: The Intuition

• Consider non-terminal A, production A → α, and token t

• T[A, t] = α in two cases:

• If α →* t β
  – α can derive a t in the first position
  – We say that t ∈ First(α)

• If A → α and α →* ε and S →* β A t δ
  – Useful if the stack has A, the input is t, and A cannot derive t
  – In this case the only option is to get rid of A (by deriving ε)
    • This can work only if t can follow A in at least one derivation
  – We say t ∈ Follow(A)

Computing First Sets

Definition
    First(X) = { t | X →* t α } ∪ { ε | X →* ε }

Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X)
   • if X → ε
   • if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n
3. First(α) ⊆ First(X) if X → A1 … An α
   • and ε ∈ First(Ai) for 1 ≤ i ≤ n
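As a rough illustration of the algorithm sketch above, here is a small Python fixed-point computation of First sets for the left-factored example grammar. The data structures and names (PRODUCTIONS, first_sets, the use of [] for an ε right-hand side) are my own, not from the slides.

EPS = "ε"
# E → T X   X → + E | ε   T → ( E ) | int Y   Y → * T | ε
PRODUCTIONS = [
    ("E", ["T", "X"]),
    ("X", ["+", "E"]), ("X", []),             # [] stands for ε
    ("T", ["(", "E", ")"]), ("T", ["int", "Y"]),
    ("Y", ["*", "T"]), ("Y", []),
]
NONTERMINALS = {"E", "X", "T", "Y"}

def first_sets(productions, nonterminals):
    first = {A: set() for A in nonterminals}

    def first_of(sym):                        # rule 1: First(t) = { t } for a terminal t
        return first[sym] if sym in nonterminals else {sym}

    changed = True
    while changed:                            # iterate rules 2 and 3 to a fixed point
        changed = False
        for A, rhs in productions:
            before = set(first[A])
            all_nullable = True
            for sym in rhs:                   # add First of the rhs, symbol by symbol
                first[A] |= first_of(sym) - {EPS}
                if EPS not in first_of(sym):
                    all_nullable = False
                    break
            if all_nullable:                  # rhs is ε or every symbol can derive ε
                first[A].add(EPS)
            changed |= first[A] != before
    return first

print(first_sets(PRODUCTIONS, NONTERMINALS))
# E, T → {int, (};  X → {+, ε};  Y → {*, ε} : the sets listed on the next slide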


First Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• First sets
    First( ( )   = { ( }        First( T ) = { int, ( }
    First( ) )   = { ) }        First( E ) = { int, ( }
    First( int ) = { int }      First( X ) = { +, ε }
    First( + )   = { + }        First( Y ) = { *, ε }
    First( * )   = { * }

Computing Follow Sets

• Definition:
    Follow(X) = { t | S →* β X t δ }

• Intuition
  – If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
    • if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)


Computing Follow Sets (Cont.)

Algorithm sketch:
1. $ ∈ Follow(S)
2. First(β) - {ε} ⊆ Follow(X)
   – For each production A → α X β
3. Follow(A) ⊆ Follow(X)
   – For each production A → α X β where ε ∈ First(β)

Follow Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• Follow sets
    Follow( + )   = { int, ( }      Follow( * ) = { int, ( }
    Follow( ( )   = { int, ( }      Follow( E ) = { ), $ }
    Follow( X )   = { $, ) }        Follow( T ) = { +, ), $ }
    Follow( ) )   = { +, ), $ }     Follow( Y ) = { +, ), $ }
    Follow( int ) = { *, +, ), $ }
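A similar Python fixed-point sketch for Follow sets, again with illustrative names. To keep it short it takes the First sets from the "First Sets. Example" slide as given, and it computes Follow only for non-terminals, which is all the table construction needs.

EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "X"]),
    ("X", ["+", "E"]), ("X", []),             # [] stands for ε
    ("T", ["(", "E", ")"]), ("T", ["int", "Y"]),
    ("Y", ["*", "T"]), ("Y", []),
]
NONTERMINALS = {"E", "X", "T", "Y"}
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS},
         "int": {"int"}, "(": {"("}, ")": {")"}, "+": {"+"}, "*": {"*"}}

def follow_sets(productions, nonterminals, first, start="E"):
    follow = {A: set() for A in nonterminals}
    follow[start].add("$")                    # rule 1: $ ∈ Follow(S)
    changed = True
    while changed:
        changed = False
        for A, rhs in productions:
            for i, X in enumerate(rhs):
                if X not in nonterminals:
                    continue
                before = set(follow[X])
                beta_nullable = True
                for sym in rhs[i + 1:]:       # rule 2: First(β) - {ε} ⊆ Follow(X)
                    follow[X] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        beta_nullable = False
                        break
                if beta_nullable:             # rule 3: Follow(A) ⊆ Follow(X) when β can derive ε
                    follow[X] |= follow[A]
                changed |= follow[X] != before
    return follow

print(follow_sets(PRODUCTIONS, NONTERMINALS, FIRST))
# E, X → {), $};  T, Y → {+, ), $} : matching the example above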


Constructing LL(1) Parsing Tables

• Construct a parsing table T for CFG G

• For each production A → α in G do:
  – For each terminal t ∈ First(α) do
    • T[A, t] = α
  – If ε ∈ First(α), for each t ∈ Follow(A) do
    • T[A, t] = α
  – If ε ∈ First(α) and $ ∈ Follow(A) do
    • T[A, $] = α

Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not LL(1)
  – If G is ambiguous
  – If G is left recursive
  – If G is not left-factored
  – And in other cases as well

• Most programming language CFGs are not LL(1)
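Below is a minimal Python sketch of the construction loop on the "Constructing LL(1) Parsing Tables" slide above, with the First and Follow sets from the earlier examples hard-coded to keep it short (names and representation are illustrative). A multiply defined entry is reported as a conflict, in line with the note that such a grammar is not LL(1).

EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "X"]),
    ("X", ["+", "E"]), ("X", []),             # [] stands for ε
    ("T", ["(", "E", ")"]), ("T", ["int", "Y"]),
    ("Y", ["*", "T"]), ("Y", []),
]
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"}, "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}

def first_of_rhs(alpha):
    """First(α) for a right-hand side α; a terminal's First set is itself."""
    result = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                           # α is ε or every symbol can derive ε
    return result

def build_table(productions):
    table = {}
    for A, alpha in productions:
        fa = first_of_rhs(alpha)
        # each t ∈ First(α); plus each t ∈ Follow(A) (including $) when ε ∈ First(α)
        tokens = (fa - {EPS}) | (FOLLOW[A] if EPS in fa else set())
        for t in tokens:
            if (A, t) in table:               # multiply defined entry ⇒ G is not LL(1)
                raise ValueError(f"conflict at [{A}, {t}]")
            table[(A, t)] = alpha
    return table

print(build_table(PRODUCTIONS))               # reproduces the parsing table shown earlier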


Bottom-Up Parsing

• Bottom-up parsing is more general than top-down parsing
  – And just as efficient
  – Builds on ideas in top-down parsing

• Bottom-up is the preferred method

• Concepts today, algorithms next time

An Introductory Example

• Bottom-up parsers don't need left-factored grammars

• Revert to the "natural" grammar for our example:
    E → T + E | T
    T → int * T | int | ( E )

• Consider the string: int * int + int

The Idea

Bottom-up parsing reduces a string to the start symbol by inverting productions:

    int * int + int      T → int
    int * T + int        T → int * T
    T + int              T → int
    T + T                E → T
    T + E                E → T + E
    E

Observation

• Read the productions used above in reverse (from bottom to top)
• This is a rightmost derivation!
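Writing the reduction sequence out bottom-to-top makes the Observation concrete; at every step it is the rightmost non-terminal that gets expanded, which is exactly what makes it a rightmost derivation:

    E ⇒ T + E ⇒ T + T ⇒ T + int ⇒ int * T + int ⇒ int * int + int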

Important Fact #1

Important Fact #1 about bottom-up parsing:

    A bottom-up parser traces a rightmost derivation in reverse

A Bottom-up Parse

[Figure: the parse tree for int * int + int assembled bottom-up by the
reduction sequence int * int + int ⇒ int * T + int ⇒ T + int ⇒ T + T ⇒ T + E ⇒ E]


A Bottom-up Parse in Detail (1)–(6)

[Figures: the same parse tree built up one reduction at a time, from the bare
input int * int + int through int * T + int, T + int, T + T, and T + E to E]


A Trivial Bottom-Up Parsing Algorithm

Let I = input string
repeat
  pick a non-empty substring β of I
    where X → β is a production
  if no such β, backtrack
  replace one β by X in I
until I = "S" (the start symbol) or all possibilities are exhausted

Questions

• Does this algorithm terminate?
• How fast is the algorithm?
• Does the algorithm handle all cases?
• How do we choose the substring to reduce at each step?
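Here is a minimal Python sketch (my own illustration, not from the slides) of the trivial algorithm above: try every production at every position and backtrack when nothing leads to the start symbol. On this input it happens to find the reduction sequence shown on "The Idea" slide, but nothing here controls the search, which is why the questions above matter.

GRAMMAR = [
    ("E", ["T", "+", "E"]), ("E", ["T"]),
    ("T", ["int", "*", "T"]), ("T", ["int"]), ("T", ["(", "E", ")"]),
]

def reduce_to_start(symbols, start="E"):
    """Return the list of sentential forms from `symbols` down to `start`, or None."""
    if symbols == [start]:
        return [symbols]
    for lhs, rhs in GRAMMAR:                      # pick a production X → β ...
        n = len(rhs)
        for i in range(len(symbols) - n + 1):     # ... and an occurrence of β in I
            if symbols[i:i + n] == rhs:
                reduced = symbols[:i] + [lhs] + symbols[i + n:]
                rest = reduce_to_start(reduced, start)
                if rest is not None:              # this choice can be finished
                    return [symbols] + rest
    return None                                   # no β works here: backtrack

for step in reduce_to_start("int * int + int".split()):
    print(" ".join(step))
# int * int + int, int * T + int, T + int, T + T, T + E, E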


Where Do Reductions Happen?

Important Fact #1 has an interesting consequence:
  – Let αβω be a step of a bottom-up parse
  – Assume the next reduction is by X → β
  – Then ω is a string of terminals

Why? Because αXω → αβω is a step in a rightmost derivation

Notation

• Idea: Split the string into two substrings
  – Right substring is as yet unexamined by parsing (a string of terminals)
  – Left substring has terminals and non-terminals

• The dividing point is marked by a |
  – The | is not part of the string

• Initially, all input is unexamined: |x1 x2 . . . xn


Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions:

    Shift

    Reduce

Shift

• Shift: Move | one place to the right
  – Shifts a terminal to the left string

    ABC|xyz → ABCx|yz


Reduce

• Apply an inverse production at the right end of the left string
  – If A → xy is a production, then

    Cbxy|ijk → CbA|ijk

The Example with Reductions Only

    int * int | + int    reduce T → int
    int * T | + int      reduce T → int * T
    T + int |            reduce T → int
    T + T |              reduce E → T
    T + E |              reduce E → T + E
    E |

The Example with Shift-Reduce Parsing

    |int * int + int     shift
    int | * int + int    shift
    int * | int + int    shift
    int * int | + int    reduce T → int
    int * T | + int      reduce T → int * T
    T | + int            shift
    T + | int            shift
    T + int |            reduce T → int
    T + T |              reduce E → T
    T + E |              reduce E → T + E
    E |

A Shift-Reduce Parse in Detail (1)–(11)

[Figures: the shift-reduce trace above replayed one action at a time, showing
the | marker moving through the input and the parse tree growing after each
reduce, ending with the complete tree rooted at E]

The Stack

• Left string can be implemented by a stack
  – Top of the stack is the |

• Shift pushes a terminal on the stack

• Reduce pops 0 or more symbols off of the stack (the production rhs) and pushes a non-terminal on the stack (the production lhs)

Conflicts

• In a given state, more than one action (shift or reduce) may lead to a valid parse

• If it is legal to shift or reduce, there is a shift-reduce conflict

• If it is legal to reduce by two different productions, there is a reduce-reduce conflict

• You will see such conflicts in your project!
  – More next time . . .
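Below is a minimal Python sketch (my own illustration) of the stack implementation described above: shift pushes the next input token, reduce pops a production's right-hand side and pushes its left-hand side. The sequence of actions is simply replayed from the shift-reduce example earlier; deciding when to shift or reduce, and resolving the conflicts just described, is the subject of the next lecture.

def shift(stack, inp):
    stack.append(inp.pop(0))                 # move | one place to the right

def reduce(stack, lhs, rhs):
    assert stack[len(stack) - len(rhs):] == rhs, "rhs must be on top of the stack"
    del stack[len(stack) - len(rhs):]        # pop the production's rhs ...
    stack.append(lhs)                        # ... and push its lhs

stack, inp = [], "int * int + int".split()
script = [                                   # actions from the shift-reduce example
    ("shift", None), ("shift", None), ("shift", None),
    ("reduce", ("T", ["int"])),
    ("reduce", ("T", ["int", "*", "T"])),
    ("shift", None), ("shift", None),
    ("reduce", ("T", ["int"])),
    ("reduce", ("E", ["T"])),
    ("reduce", ("E", ["T", "+", "E"])),
]
for action, arg in script:
    if action == "shift":
        shift(stack, inp)
    else:
        reduce(stack, *arg)
    print(" ".join(stack), "|", " ".join(inp))
# the printed configurations reproduce the shift-reduce trace above, ending with E |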

