
Top-Down Parsing

• A parser is top-down if it discovers a parse tree from top to bottom
  - A top-down parse corresponds to a preorder traversal of the parse tree
  - A leftmost derivation is applied at each derivation step
• Top-down parsers come in two forms
  - Predictive parsers: predict the production rule to be applied using lookahead tokens
  - Backtracking parsers: try different productions, backing up when a parse fails
• Predictive parsers are much faster than backtracking ones
  - Predictive parsers operate in linear time and will be our focus
  - Backtracking parsers can take exponential time in the worst case and will not be considered
• Two kinds of top-down parsing techniques will be studied
  - Recursive-descent parsing
  - LL parsing
Top-Down Parsing 1 Compiler Design Muhammed Mudawwar

Top-Down Parsing by Recursive-Descent


• We view a nonterminal A as the definition of a procedure A
  - Procedure A will match the token sequence generated by nonterminal A
• The RHS of a production of A specifies the code for procedure A
  - Terminals are matched against input tokens
  - Nonterminals produce calls to the corresponding procedures
• If multiple production rules exist for a nonterminal A
  - One of them is predicted based on a lookahead token
      The lookahead token is the next input token that should be matched
  - The predicted rule is the only one that applies
      Other rules will NOT be tried
      This is a predictive parsing technique, not a backtracking one
• A syntax error is detected when
  - The next token in the input sequence does NOT match the expected token

Example on Recursive-Descent Parsing


• Consider the following grammar for expressions in EBNF notation

    expr   → term { addop term }
    term   → factor { mulop factor }
    factor → ( expr ) | id | num

• Since three nonterminals exist, we need three parsing procedures
  - The curly brackets expressing repetition are translated into a while loop
  - The vertical bar expressing alternation is translated into a case statement

    procedure expr( )
    begin
      term( );
      while token = ADDOP do
        match(ADDOP);
        term( );
      end while;
    end expr;

    procedure term( )
    begin
      factor( );
      while token = MULOP do
        match(MULOP);
        factor( );
      end while;
    end term;

    procedure factor( )
    begin
      case token of
        '(': match('('); expr( ); match(')');
        ID:  match(ID);
        NUM: match(NUM);
        else syntax_error(token);
      end case;
    end factor;

Lookahead Token and Match Procedure


• The recursive-descent procedures use a token variable
• The token variable is the lookahead token
  - Keeps track of the next token in the input sequence
  - Is initialized to the first token before parsing begins
  - Is updated after every call to the match procedure
• The match procedure matches the lookahead token with its parameter
  - Is called to match an expected token on the RHS of a production
  - Match succeeds if expected token = lookahead token and fails otherwise
  - Match calls the scanner function to update the lookahead token

    procedure match ( ExpectedToken )
    begin
      if token = ExpectedToken then
        token := scan( );
      else
        syntax_error( token, ExpectedToken );
      end if;
    end match;
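The parsing procedures for expr, term, and factor, together with match, can be sketched in Python as a recognizer. This is a minimal sketch under stated assumptions: the regular-expression scanner, the `Parser` class, and the `accepts` helper are hypothetical names introduced for illustration; only the grammar and the token names ADDOP, MULOP, ID, and NUM come from the slides.

```python
import re

# Hypothetical scanner: one capture group per token category.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([-+])|([*/])|([()]))")

def tokenize(text):
    """Return the token kinds of text, followed by the end marker '$'."""
    text, pos, kinds = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        num, ident, addop, mulop, paren = m.groups()
        kinds.append("NUM" if num else "ID" if ident else
                     "ADDOP" if addop else "MULOP" if mulop else paren)
        pos = m.end()
    return kinds + ["$"]

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.token = self.tokens[0]          # lookahead, set before parsing begins

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected}, found {self.token}")
        self.pos += 1                         # plays the role of token := scan()
        self.token = self.tokens[self.pos]

    def expr(self):                           # expr -> term { addop term }
        self.term()
        while self.token == "ADDOP":
            self.match("ADDOP")
            self.term()

    def term(self):                           # term -> factor { mulop factor }
        self.factor()
        while self.token == "MULOP":
            self.match("MULOP")
            self.factor()

    def factor(self):                         # factor -> ( expr ) | id | num
        if self.token == "(":
            self.match("(")
            self.expr()
            self.match(")")
        elif self.token in ("ID", "NUM"):
            self.match(self.token)
        else:
            raise SyntaxError(f"unexpected token {self.token}")

def accepts(text):
    """True if text is a complete expression of the grammar."""
    try:
        parser = Parser(text)
        parser.expr()
        return parser.token == "$"
    except SyntaxError:
        return False
```

The end marker `$` is never matched, so the lookahead index cannot run past the end of the token list.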

Syntax Tree Construction for Expressions


• A recursive-descent parser can be used to construct a syntax tree

    SyntaxTree := expr( );    // Calling parser function for start symbol

• Parsing functions allocate and return pointers to syntax tree nodes
• Construction of a syntax tree for simple expressions is given below
  - new node allocates a tree node and returns a pointer to it

    function expr( ) : TreePtr
    begin
      left := term( );
      while token = ADDOP do
        op := ADDOP.op; match(ADDOP);
        right := term( );
        left := new node(op, left, right);
      end while;
      return left;
    end expr;

    function term( ) : TreePtr
    begin
      left := factor( );
      while token = MULOP do
        op := MULOP.op; match(MULOP);
        right := factor( );
        left := new node(op, left, right);
      end while;
      return left;
    end term;

Syntax Tree Construction cont'd


• For a factor, we have the following parsing function
  - symtable.lookup(ID.name) searches a symbol table for a given name
  - The lookup function returns a pointer to an identifier symbol in symtable
  - Identifiers are inserted into the symbol table when parsing a declaration
  - NUM.ptr is a pointer to a literal symbol in the literal table
  - Literal constants are inserted into the literal table when scanned

    function factor( ) : TreePtr
    begin
      case token of
        '(': match('('); ptr := expr( ); match(')');
        ID:  ptr := symtable.lookup(ID.name); match(ID);
        NUM: ptr := NUM.ptr; match(NUM);
        else syntax_error(token, "Expecting a number, an identifier, or (");
      end case;
      return ptr;
    end factor;
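The tree-building versions of expr, term, and factor can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: tree nodes are plain tuples `(op, left, right)`, leaves are `("ID", name)` or `("NUM", value)` instead of pointers into a symbol or literal table, and `tokenize`, `TreeBuilder`, and `parse` are hypothetical names.

```python
import re

def tokenize(text):
    """Return (kind, lexeme) pairs. Characters the pattern does not know are
    silently skipped, which is a simplification of this sketch."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text):
        if lexeme.isdigit():
            kind = "NUM"
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "ID"
        elif lexeme in "+-":
            kind = "ADDOP"
        elif lexeme in "*/":
            kind = "MULOP"
        else:
            kind = lexeme                     # "(" or ")"
        tokens.append((kind, lexeme))
    tokens.append(("$", ""))                  # end marker
    return tokens

class TreeBuilder:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def kind(self):
        return self.tokens[self.pos][0]       # kind of the lookahead token

    def match(self, expected):
        kind, lexeme = self.tokens[self.pos]
        if kind != expected:
            raise SyntaxError(f"expected {expected}, found {kind}")
        self.pos += 1
        return lexeme

    def expr(self):
        left = self.term()
        while self.kind == "ADDOP":
            op = self.match("ADDOP")
            right = self.term()
            left = (op, left, right)          # new node(op, left, right)
        return left

    def term(self):
        left = self.factor()
        while self.kind == "MULOP":
            op = self.match("MULOP")
            right = self.factor()
            left = (op, left, right)
        return left

    def factor(self):
        if self.kind == "(":
            self.match("(")
            ptr = self.expr()
            self.match(")")
            return ptr
        if self.kind == "ID":
            return ("ID", self.match("ID"))
        if self.kind == "NUM":
            return ("NUM", int(self.match("NUM")))
        raise SyntaxError("expecting a number, an identifier, or (")

def parse(text):
    builder = TreeBuilder(text)
    tree = builder.expr()
    builder.match("$")                        # the whole input must be consumed
    return tree
```

Because each loop folds the new operand into `left`, operators of equal precedence associate to the left, and calling `factor` from `term` gives the multiplicative operators higher precedence.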

Node Structure for Expression Trees


• A syntax tree node for expressions should have at least:
  - Node operator: +, -, *, /, etc. Different for each operator
      For symbol table entries, the node operator is ID
      For literal table entries, the node operator is NUM
      Other node operators can be added for statements and various types of literals
  - Left pointer: pointer to the left child expression tree
      Can point to a tree node, to a symbol node, or to a literal node
  - Right pointer: pointer to the right child expression tree
      Can point to a tree node, to a symbol node, or to a literal node
• The following fields are also important:
  - Line, Pos: keep track of the line and position of each tree node
  - Type: associates a type with each tree node
      Type information is used to check the type of expressions

Tracing the Construction of a Syntax Tree


• Although recursive-descent is a top-down parsing technique
  - The construction of the syntax tree for expressions is bottom-up
  - Tracing verifies the precedence and associativity of operators
• The tree construction of a - b + c * (b + d) is given below

    ptr1 := symtable.lookup(a)
    ptr2 := symtable.lookup(b)
    ptr3 := new node(- , ptr1 , ptr2)
    ptr4 := symtable.lookup(c)
    ptr2 := symtable.lookup(b)
    ptr5 := symtable.lookup(d)
    ptr6 := new node(+ , ptr2 , ptr5)
    ptr7 := new node(* , ptr4 , ptr6)
    ptr8 := new node(+ , ptr3 , ptr7)

[Figure: the resulting syntax tree. The root ptr8 is a + node with children ptr3 and ptr7. ptr3 is a - node over ptr1 (ID a) and ptr2 (ID b). ptr7 is a * node over ptr4 (ID c) and ptr6. ptr6 is a + node over ptr2 (ID b) and ptr5 (ID d).]

Syntax Tree Construction for if Statements


• An Extended BNF grammar for if statements with optional else:

    if-stmt → if expr then stmt [ else stmt ]

• A parsing function can eliminate the ambiguity of else
  - By matching an else token as soon as it is encountered
• The syntax tree of an if statement is constructed bottom-up

    function ifstmt( ) : TreePtr
    begin
      match(IF);
      exprptr := expr( );
      match(THEN);
      thenptr := stmt( );
      if token = ELSE then
        match(ELSE);
        elseptr := stmt( );
        elseptr := new node(ELSE, thenptr, elseptr);   // ELSE node
        ifptr := new node(IF, exprptr, elseptr);       // IF node points to ELSE node
      else
        ifptr := new node(IF, exprptr, thenptr);       // No ELSE node
      end if;
      return ifptr;
    end ifstmt;
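The effect of matching else as soon as it is encountered can be sketched in Python. The sketch is deliberately simplified and rests on assumptions not found in the slides: the input is a list of whitespace-separated words, a condition is a single word, any word other than `if` stands for a simple statement, and `parse_stmt` is a hypothetical name.

```python
def parse_stmt(words):
    """Parse one statement from a list of words; return (tree, remaining words)."""
    if not words:
        raise SyntaxError("expected a statement")
    if words[0] != "if":
        return words[0], words[1:]            # hypothetical simple statement
    if len(words) < 3 or words[2] != "then":
        raise SyntaxError("expected: if <expr> then <stmt>")
    cond = words[1]                            # expr is a single word in this sketch
    then_part, rest = parse_stmt(words[3:])
    if rest and rest[0] == "else":             # match else as soon as it is seen
        else_part, rest = parse_stmt(rest[1:])
        return ("IF", cond, ("ELSE", then_part, else_part)), rest
    return ("IF", cond, then_part), rest
```

For the input `if c1 then if c2 then s1 else s2`, the inner call sees the `else` first, so the else part attaches to the nearest unmatched `if`.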

LL Parsing
• Uses an explicit stack rather than recursive calls to perform a parse
• LL(k) parsing means that k tokens of lookahead are used
  - The first L means that the token sequence is read from left to right
  - The second L means a leftmost derivation is applied at each step
• An LL parser consists of
  - Parser stack that holds grammar symbols: nonterminals and tokens
  - Parsing table that specifies the parser action
  - Driver function that interacts with the parser stack, the parsing table, and the scanner

[Figure: the Scanner supplies the next token to the Parser Driver, which uses the Parsing Stack and the Parsing Table and produces the Output]

LL Parsing Actions
• The LL parsing actions are:
  - Match: to match the top of the parser stack with the next input token
  - Predict: to predict a production and apply it in a derivation step
  - Accept: to accept and terminate the parsing of a sequence of tokens
  - Error: to report an error message when matching or prediction fails
• Consider the following grammar:

    S → ( S ) S | ε

Parsing of ( ( ) ). The stack grows backward from right to left, so its top is the leftmost symbol.

    Parser Stack    Input     Parser Action
    S               (())$     Predict S → ( S ) S
    ( S ) S         (())$     Match (
    S ) S           ())$      Predict S → ( S ) S
    ( S ) S ) S     ())$      Match (
    S ) S ) S       ))$       Predict S → ε
    ) S ) S         ))$       Match )
    S ) S           )$        Predict S → ε
    ) S             )$        Match )
    S               $         Predict S → ε
    Empty           $         Accept

Grammar Analysis: Nonterminals that Derive ε

• Grammar analysis is necessary to
  - Determine whether a grammar can be used in LL parsing
  - Construct the LL parsing table that defines the actions of an LL parser
• A common analysis is to determine which nonterminals derive ε
  - Nonterminals that derive ε are called nullable
• To determine which nonterminals derive ε
  - We use an iterative marking algorithm
  - First, nonterminals that derive ε directly in one step are marked
  - Nonterminals that derive ε in two, three, … steps are found and marked
  - Continue until no more nonterminals can be marked as deriving ε
• Consider the following grammar

    A → B C D
    B → b C | ε
    C → c D | ε
    D → d | ε

  - A, B, C, and D are all nullable
  - B, C, and D derive ε directly
  - A derives ε indirectly: A ⇒ B C D ⇒ C D ⇒ D ⇒ ε
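The iterative marking algorithm can be sketched in Python. The grammar representation is an assumption of this sketch: a dictionary maps each nonterminal to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε. The name `nullable_nonterminals` is hypothetical.

```python
def nullable_nonterminals(grammar):
    """Return the set of nonterminals that derive the empty string."""
    nullable = set()
    changed = True
    while changed:                            # repeat until nothing new is marked
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # lhs is nullable if some RHS consists only of nullable symbols;
            # an empty RHS (epsilon) satisfies this trivially.
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable

# The grammar from the slide above.
GRAMMAR = {
    "A": [("B", "C", "D")],
    "B": [("b", "C"), ()],
    "C": [("c", "D"), ()],
    "D": [("d",), ()],
}
```

With this grammar, B, C, and D are marked in the first pass because of their ε alternatives, and A is marked once B, C, and D are all known to be nullable.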

Grammar Analysis: The First Set


• Suppose we have the following grammar:

    S → A a | B b
    A → D c | C A
    B → d A | e
    C → f C | b
    D → h | i

  - The right-hand sides of the productions of S do not begin with terminals
  - The parser has no immediate guidance on which production to apply to expand S
  - We may follow all possible derivations of S as shown below

    S ⇒ A a ⇒ D c a ⇒ h c a | i c a
    S ⇒ A a ⇒ C A a ⇒ f C A a | b A a
    S ⇒ B b ⇒ d A b | e b

• We predict S → A a when
  - First token is h, i, f, or b. First(Aa) = {h, i, f, b}
• We predict S → B b when
  - First token is d or e. First(Bb) = {d, e}
• Otherwise, we have an error

Grammar Analysis: Determining the First Set


• Formally, First(α) = {a ∈ T | α ⇒* a β}
  - First(α) is the set of all terminals that can begin a sentential form of α
• If α ⇒* ε then ε ∈ First(α)
• To calculate First(α) we apply the following rules

    First(ε)  = {ε}                            when α = ε
    First(aβ) = {a}                            when α = aβ
    First(A)  = First(α1) ∪ First(α2) ∪ …      when α = A and A → α1 | α2 | …
    First(Aβ) = First(A)                       when α = Aβ and A is NOT nullable
    First(Aβ) = (First(A) - {ε}) ∪ First(β)    when α = Aβ and A ⇒+ ε

• Consider the following grammar:

    S → A B C d
    A → e | f | ε
    B → g | h | ε
    C → p | q

    First(A) = {e, f, ε}
    First(B) = {g, h, ε}
    First(C) = {p, q}
    First(S) = First(ABCd)
             = (First(A) - {ε}) ∪ (First(B) - {ε}) ∪ First(Cd)
             = {e, f} ∪ {g, h} ∪ {p, q}
             = {e, f, g, h, p, q}
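The rules for First can be turned into a fixed-point computation, sketched here in Python. The grammar representation (a dictionary from nonterminal to a list of tuples, with the empty tuple for ε) and the names `first_of_string` and `first_sets` are assumptions of this sketch, not part of the slides.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """First set of a sequence of symbols, given First of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})     # First of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                     # sym cannot vanish: stop here
    result.add(EPS)                           # every symbol can vanish
    return result

def first_sets(grammar):
    """Grow the First sets until no rule adds anything new."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

# The grammar from the slide above.
GRAMMAR = {
    "S": [("A", "B", "C", "d")],
    "A": [("e",), ("f",), ()],
    "B": [("g",), ("h",), ()],
    "C": [("p",), ("q",)],
}
```

The terminal d never enters First(S) because C is not nullable, so the scan of A B C d stops at C.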

Grammar Analysis: The Follow Set


• Suppose we have the following grammar

    S → A c B
    A → a A
    A → ε
    B → b B S
    B → ε

  - We follow derivations of S as shown below

    S ⇒ A c B ⇒ a A c B
    S ⇒ A c B ⇒ c B ⇒ c
    S ⇒ A c B ⇒ c B ⇒ c b B S ⇒ c b B A c B ⇒ c b B a A c B
    S ⇒ A c B ⇒ c B ⇒ c b B S ⇒ c b B A c B ⇒ c b B c B

• We predict A → a A when
  - Next token is a because First(aA) = {a}
• We predict A → ε when
  - Next token is c because Follow(A) = {c}
• Similarly, we predict B → b B S when
  - Next token is b because First(b B S) = {b}
• We predict B → ε when
  - Next token is a, c, or $ (end-of-file token) because Follow(B) = {a, c, $}

Grammar Analysis: Determining the Follow Set


• Formally, Follow(A) = {a ∈ T | S ⇒+ α A a β}
  - Follow(A) is the set of terminals that may follow A in any sentential form
• If S ⇒+ α A then $ ∈ Follow(A)
  - If A is not followed by any terminal then it is followed by the end of file
  - The $ represents the end-of-file token
• We compute Follow(A) using the following rules:
  - If A is the start symbol then $ ∈ Follow(A)
  - Inspect the RHS of productions for all occurrences of A
    Let a typical production be B → α A β
      - If β is NOT nullable then add First(β) to Follow(A)
          Any token that can begin a sentential form of β can follow A
      - If β is ε or derives ε then add (First(β) - {ε}) ∪ Follow(B) to Follow(A)
          If β vanishes in a given derivation then what follows A is what follows B
          … B γ ⇒ … α A β γ ⇒* … α A γ: what follows A is what follows B, which is First(γ)

Examples on the First and Follow Sets


Example 1:

    S → A c B
    A → a A
    A → ε
    B → b B S
    B → ε

Nonterminals that derive ε are A and B

    First(S) = First(AcB) = (First(A) - {ε}) ∪ First(cB) = {a, c}
    First(A) = First(aA) ∪ First(ε) = {a, ε}
    First(B) = First(bBS) ∪ First(ε) = {b, ε}

    Follow(S) = {$} ∪ Follow(B)
    Follow(A) = {c}
    Follow(B) = Follow(S) ∪ First(S) = {$, a, c}
    Follow(S) = {$, a, c}

Example 2:

    E → T Q
    Q → + T Q | - T Q | ε
    T → F R
    R → * F R | / F R | ε
    F → ( E ) | id

Nonterminals that derive ε are Q and R

    First(E) = First(TQ) = First(T) = First(FR) = First(F) = {( , id}
    First(Q) = {+ , - , ε}
    First(R) = {* , / , ε}

    Follow(E) = {$ , )}
    Follow(Q) = Follow(E) = {$ , )}
    Follow(T) = (First(Q) - {ε}) ∪ Follow(E) ∪ Follow(Q) = {+, -, $, )}
    Follow(R) = Follow(T) = {+, -, $, )}
    Follow(F) = (First(R) - {ε}) ∪ Follow(T) ∪ Follow(R) = {*, /, +, -, $, )}
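The Follow rules can also be computed by iterating to a fixed point. The Python sketch below is self-contained, so it repeats a compact First computation. As before, the grammar representation (dictionary of tuples, empty tuple for ε) and all function names are assumptions of this sketch.

```python
EPS, END = "ε", "$"

def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})     # First of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}                     # the whole string can vanish

def first_sets(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)                    # rule for the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue              # only nonterminals have Follow sets
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:     # beta is empty or nullable
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

EXAMPLE_1 = {"S": [("A", "c", "B")], "A": [("a", "A"), ()], "B": [("b", "B", "S"), ()]}
EXAMPLE_2 = {
    "E": [("T", "Q")],
    "Q": [("+", "T", "Q"), ("-", "T", "Q"), ()],
    "T": [("F", "R")],
    "R": [("*", "F", "R"), ("/", "F", "R"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
```

The mutual dependency between Follow(S) and Follow(B) in Example 1 is the reason a single pass is not enough: the loop runs until both sets stop growing.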

Grammar Analysis: Determining the Predict Set


• The predict set of a production A → α is defined as follows:
  - If α is NOT nullable then Predict(A → α) = First(α)
  - If α is nullable then Predict(A → α) = (First(α) - {ε}) ∪ Follow(A)
  - This is the set of lookahead tokens that will cause the selection of A → α
• Example on determining the predict set:

    Predict(E → T Q)   = First(TQ) = First(T) = {( , id}
    Predict(Q → + T Q) = First(+TQ) = {+}
    Predict(Q → - T Q) = First(-TQ) = {-}
    Predict(Q → ε)     = Follow(Q) = {$ , )}
    Predict(T → F R)   = First(FR) = First(F) = {( , id}
    Predict(R → * F R) = First(*FR) = {*}
    Predict(R → / F R) = First(/FR) = {/}
    Predict(R → ε)     = Follow(R) = {+ , - , $ , )}
    Predict(F → ( E )) = {(}
    Predict(F → id)    = {id}
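The definition of the predict set translates directly into code. In the Python sketch below, the First and Follow sets are copied from the expression-grammar example rather than computed, and the names `FIRST`, `FOLLOW`, `first_of_string`, and `predict` are assumptions made for illustration.

```python
EPS = "ε"

# First and Follow sets copied from the expression-grammar example.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "Q": {"+", "-", EPS}, "R": {"*", "/", EPS}}
FOLLOW = {"E": {"$", ")"}, "Q": {"$", ")"},
          "T": {"+", "-", "$", ")"}, "R": {"+", "-", "$", ")"},
          "F": {"*", "/", "+", "-", "$", ")"}}

def first_of_string(symbols):
    """First set of a right-hand side; terminals are their own First set."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}

def predict(lhs, rhs):
    """Predict set of the production lhs -> rhs (rhs is a tuple, () for epsilon)."""
    first = first_of_string(rhs)
    if EPS in first:                          # rhs is nullable
        return (first - {EPS}) | FOLLOW[lhs]
    return first
```

For example, `predict("Q", ())` falls back to Follow(Q) because the empty right-hand side is nullable.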

LL(1) Grammars
• Not all context-free grammars are suitable for LL parsing
• CFGs suitable for LL(1) parsing are called LL(1) grammars
• A grammar is LL(1) if for productions with the same LHS A

    A → α1 | α2 | … | αn
    Predict(A → αi) ∩ Predict(A → αj) = ∅ for all i ≠ j

  - The predict sets of productions with the same LHS are pairwise disjoint
• The following grammar is LL(1)

    S → A c B
    A → a A
    A → ε
    B → b B S
    B → ε

    Predict(A → a A) = {a}
    Predict(A → ε)   = Follow(A) = {c}                            (disjoint)

    Predict(B → b B S) = {b}
    Predict(B → ε)     = Follow(B) = Follow(S) ∪ First(S)
                       = {$} ∪ Follow(B) ∪ {a, c} = {$, a, c}      (disjoint)
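The LL(1) condition is a pairwise disjointness test, which can be sketched in a few lines of Python. The input format (a dictionary from each LHS to the list of predict sets of its productions) and the name `is_ll1` are assumptions of this sketch. The second grammar is a hypothetical left-recursive example, Expr → Expr + id | id, whose two productions both have the predict set {id}.

```python
from itertools import combinations

def is_ll1(predict_sets):
    """True if, for every LHS, the predict sets of its productions are
    pairwise disjoint."""
    return all(a.isdisjoint(b)
               for sets in predict_sets.values()
               for a, b in combinations(sets, 2))

# Predict sets of the grammar on the slide above.
SLIDE_GRAMMAR = {
    "S": [{"a", "c"}],
    "A": [{"a"}, {"c"}],
    "B": [{"b"}, {"$", "a", "c"}],
}

# Hypothetical counterexample: Expr -> Expr + id | id
LEFT_RECURSIVE = {"Expr": [{"id"}, {"id"}]}
```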

Constructing the LL(1) Parsing Table


• The predict sets can be represented in an LL(1) parse table
  - The rows are indexed by the nonterminals
  - The columns are indexed by the tokens
• If A is a nonterminal and tok is the lookahead token then
  - Table[A][tok] indicates which production to predict
  - If no production can be used, Table[A][tok] gives an error value
• Table[A][tok] = A → α iff tok ∈ Predict(A → α)
• Example on constructing the LL(1) parsing table:

    1: S → A c B      Predict(1) = {a, c}
    2: A → a A        Predict(2) = {a}
    3: A → ε          Predict(3) = {c}
    4: B → b B S      Predict(4) = {b}
    5: B → ε          Predict(5) = {$, a, c}

         a    b    c    $
    S    1         1
    A    2         3
    B    5    4    5    5

  Empty slots indicate error conditions
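Filling the table from the predict sets can be sketched in Python. The table is represented here as a dictionary keyed by (nonterminal, token) pairs, with missing keys playing the role of empty slots. That representation and the name `build_table` are assumptions of this sketch.

```python
# Productions and predict sets copied from the example above.
PRODUCTIONS = {
    1: ("S", ("A", "c", "B")),
    2: ("A", ("a", "A")),
    3: ("A", ()),
    4: ("B", ("b", "B", "S")),
    5: ("B", ()),
}
PREDICT = {1: {"a", "c"}, 2: {"a"}, 3: {"c"}, 4: {"b"}, 5: {"$", "a", "c"}}

def build_table(productions, predict):
    """Table[(A, tok)] = production number iff tok is in Predict(production)."""
    table = {}
    for number, (lhs, _rhs) in productions.items():
        for tok in predict[number]:
            if (lhs, tok) in table:
                # Two productions of the same LHS share a lookahead token.
                raise ValueError(f"conflict at Table[{lhs}][{tok}]: not LL(1)")
            table[(lhs, tok)] = number
    return table
```

A conflict raised by `build_table` means two predict sets for the same LHS overlap, which is exactly the failure of the LL(1) condition.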

Constructing the LL(1) Parsing Table cont'd


• Here is a second example on constructing the parsing table

    1:  E → T Q        Predict(1)  = {( , id}
    2:  Q → + T Q      Predict(2)  = {+}
    3:  Q → - T Q      Predict(3)  = {-}
    4:  Q → ε          Predict(4)  = {$ , )}
    5:  T → F R        Predict(5)  = {( , id}
    6:  R → * F R      Predict(6)  = {*}
    7:  R → / F R      Predict(7)  = {/}
    8:  R → ε          Predict(8)  = {+, -, $, )}
    9:  F → ( E )      Predict(9)  = {(}
    10: F → id         Predict(10) = {id}

         +    -    *    /    (    )    id   $
    E                        1         1
    Q    2    3                   4         4
    T                        5         5
    R    8    8    6    7         8         8
    F                        9         10

• Because the above grammar is LL(1)
  - A unique production number is stored in a table entry
• Blank entries correspond to error conditions
  - In practice, special error numbers are used to indicate error situations

LL(1) Parser Driver Algorithm


• The LL(1) parser driver algorithm can be described as follows:

    Token := scan( )
    Stack.push(StartSymbol)
    while not Stack.empty( ) do
      X := Stack.pop( )
      if terminal(X) then
        if X = Token then
          Token := scan( )
        else
          process a syntax error at Token
        end if
      else (* X is a nonterminal *)
        Rule := Table[X][Token]
        if Rule = X → Y1 Y2 … Yn then
          for i from n downto 1 do
            Stack.push(Yi)
          end for
        else
          process a syntax error at Token
        end if
      end if
    end while
    if Token = $ then
      accept parsing
    else
      report a syntax error at Token
    end if

[Figure: the Scanner supplies the next token to the Parser Driver, which uses the Parsing Stack and the Parsing Table and produces the Output]
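The driver algorithm can be sketched in Python for the expression grammar. The productions and table entries are copied from the parsing table example. The representation as dictionaries, the function name `ll1_parse`, and the choice to return the list of predicted production numbers are assumptions made for illustration.

```python
PRODUCTIONS = {
    1: ("E", ["T", "Q"]), 2: ("Q", ["+", "T", "Q"]), 3: ("Q", ["-", "T", "Q"]),
    4: ("Q", []), 5: ("T", ["F", "R"]), 6: ("R", ["*", "F", "R"]),
    7: ("R", ["/", "F", "R"]), 8: ("R", []), 9: ("F", ["(", "E", ")"]),
    10: ("F", ["id"]),
}
TABLE = {
    ("E", "("): 1, ("E", "id"): 1,
    ("Q", "+"): 2, ("Q", "-"): 3, ("Q", ")"): 4, ("Q", "$"): 4,
    ("T", "("): 5, ("T", "id"): 5,
    ("R", "+"): 8, ("R", "-"): 8, ("R", "*"): 6, ("R", "/"): 7,
    ("R", ")"): 8, ("R", "$"): 8,
    ("F", "("): 9, ("F", "id"): 10,
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}

def ll1_parse(tokens, start="E"):
    """Return the production numbers predicted, in order; raise SyntaxError
    if the token sequence is rejected."""
    tokens = list(tokens) + ["$"]             # append the end-of-file token
    pos = 0
    stack = [start]                           # top of stack is the list's end
    predicted = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top not in NONTERMINALS:           # terminal: Match
            if top != tok:
                raise SyntaxError(f"expected {top}, found {tok}")
            pos += 1
        else:                                 # nonterminal: Predict
            rule = TABLE.get((top, tok))
            if rule is None:
                raise SyntaxError(f"no production for {top} on {tok}")
            predicted.append(rule)
            # Push the RHS in reverse so its first symbol ends up on top.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tok} after end of expression")
    return predicted
```

The returned list of production numbers, read in order, is the leftmost derivation of the input.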

Tracing an LL(1) Parser


Consider the parsing of id * (id + id) $ with the expression grammar (productions 1 to 10) and its LL(1) parsing table. The stack grows backwards from right to left, so its top is the leftmost symbol.

    Parser Stack      Input           Parser Action
    E                 id*(id+id)$     Predict E → T Q
    T Q               id*(id+id)$     Predict T → F R
    F R Q             id*(id+id)$     Predict F → id
    id R Q            id*(id+id)$     Match id
    R Q               *(id+id)$       Predict R → * F R
    * F R Q           *(id+id)$       Match *
    F R Q             (id+id)$        Predict F → ( E )
    ( E ) R Q         (id+id)$        Match (
    E ) R Q           id+id)$         Predict E → T Q
    T Q ) R Q         id+id)$         Predict T → F R
    F R Q ) R Q       id+id)$         Predict F → id
    id R Q ) R Q      id+id)$         Match id
    R Q ) R Q         +id)$           Predict R → ε
    Q ) R Q           +id)$           Predict Q → + T Q
    + T Q ) R Q       +id)$           Match +
    T Q ) R Q         id)$            Predict T → F R
    F R Q ) R Q       id)$            Predict F → id
    id R Q ) R Q      id)$            Match id
    R Q ) R Q         )$              Predict R → ε
    Q ) R Q           )$              Predict Q → ε
    ) R Q             )$              Match )
    R Q               $               Predict R → ε
    Q                 $               Predict Q → ε
    Empty             $               Accept

The Problem of Left Recursion


• Left-recursive grammars fail to be LL(1) or even LL(k)
  - A left-recursive production puts an LL parser into an infinite loop
  - If a left-recursive production is predicted then
      The nonterminal on the LHS is replaced with the RHS of the production
      The same nonterminal will appear again on top of the parser stack
      The same production is predicted again
      The iteration goes on forever
• Left recursion is commonly used to
  - Make an operation left associative
      Expr → Expr addop Term | Term
  - Specify a list of identifiers, statements, etc.
      StmtList → StmtList ; Statement | Statement
• We need to eliminate left recursion to make a grammar LL(1)

Eliminating Immediate Left Recursion


• The simplest case of left recursion is immediate left recursion
  - General form: A → A α | β
  - The above productions of A generate strings of the form β αⁿ, n ≥ 0
  - We introduce a new nonterminal and use right recursion as follows:

      A     → β Atail
      Atail → α Atail | ε

• In general, if many immediate left-recursive productions exist
  - General form: A → A α1 | A α2 | … | A αn | β1 | β2 | … | βm
  - We introduce a new nonterminal and use right recursion

      A     → β1 Atail | β2 Atail | … | βm Atail
      Atail → α1 Atail | α2 Atail | … | αn Atail | ε

• For example:

      Expr → Expr + Term | Expr - Term | Term

  becomes:

      Expr     → Term Exprtail
      Exprtail → + Term Exprtail | - Term Exprtail | ε
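The general transformation can be sketched as a small Python function. Right-hand sides are tuples of symbols with the empty tuple standing for ε. The function name and the convention of naming the new nonterminal by appending `tail` follow the slide's example but are otherwise assumptions of this sketch.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm using right recursion."""
    tail = nonterminal + "tail"
    # The alphas: what follows A in each left-recursive alternative.
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # The betas: the alternatives that do not start with A.
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(alternatives)}   # nothing to eliminate
    return {
        nonterminal: [beta + (tail,) for beta in betas],
        tail: [alpha + (tail,) for alpha in alphas] + [()],
    }

EXPR = [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)]
```

Applied to `EXPR`, the function reproduces the Expr and Exprtail productions shown above.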

Eliminating Indirect Left Recursion


• In some cases, left recursion may be indirect
  - For example: A → B α | … and B → A β | …
• We can do substitutions to make left recursion immediate
• Consider the following grammar:

      A → B a | A a | c
      B → B b | A b | d

• First, we remove the immediate left recursion of A

      A     → B a Atail | c Atail
      Atail → a Atail | ε

• Second, we eliminate the indirect left recursion of B → A b by substituting for A

      B → B b | B a Atail b | c Atail b | d

• Finally, we remove the immediate left recursion of B

      B     → c Atail b Btail | d Btail
      Btail → b Btail | a Atail b Btail | ε

Left Factoring of Common Prefixes


• Another problem for LL parsers is productions that share a common prefix
• An if statement may have two productions with a common prefix:

      IfStmt → if Expr then StmtList end if ;
      IfStmt → if Expr then StmtList else StmtList end if ;

• An LL(1) parser cannot predict which production to apply
• The solution is to left factor the common prefix
  - General form: A → α β1 | α β2 | … | α βn
  - Left factoring solution:

      A     → α Atail
      Atail → β1 | β2 | … | βn

• The left factoring of the two if statement productions:

      IfStmt     → if Expr then StmtList IfStmtTail
      IfStmtTail → else StmtList end if ; | end if ;
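Left factoring can be sketched in Python for the simple case in which all alternatives of a nonterminal share the prefix. This restriction, the tuple representation of right-hand sides, and the names `left_factor` and `IF_STMT` are assumptions of this sketch. A general left-factoring routine would also have to group alternatives that share a prefix with only some of the others.

```python
def left_factor(nonterminal, alternatives):
    """Factor out the longest prefix common to ALL alternatives."""
    prefix = []
    for symbols in zip(*alternatives):        # walk the alternatives in lockstep
        if len(set(symbols)) != 1:
            break                             # the alternatives diverge here
        prefix.append(symbols[0])
    if not prefix:
        return {nonterminal: list(alternatives)}
    tail = nonterminal + "Tail"
    n = len(prefix)
    return {
        nonterminal: [tuple(prefix) + (tail,)],
        tail: [rhs[n:] for rhs in alternatives],
    }

IF_STMT = [
    ("if", "Expr", "then", "StmtList", "end", "if", ";"),
    ("if", "Expr", "then", "StmtList", "else", "StmtList", "end", "if", ";"),
]
```

After factoring, the two IfStmtTail alternatives start with `end` and `else`, so one lookahead token is enough to choose between them.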
