Top Down Parsing

Top-Down Parsing
• A parser is top-down if it discovers a parse tree from top to bottom
  - A top-down parse corresponds to a preorder traversal of the parse tree
  - A leftmost derivation is applied at each derivation step
• Top-down parsers come in two forms
  - Predictive parsers: predict the production rule to be applied using lookahead tokens
  - Backtracking parsers: try different productions, backing up when a parse fails
• Predictive parsers are much faster than backtracking ones
  - Predictive parsers operate in linear time; they will be our focus
  - Backtracking parsers may take exponential time in the worst case; they will not be considered
• Two kinds of top-down parsing techniques will be studied
  - Recursive-descent parsing
  - LL parsing
Top-Down Parsing 1 Compiler Design Muhammed Mudawwar
Top-Down Parsing by Recursive-Descent

• We view a nonterminal A as the definition of a procedure A
  - Procedure A will match the token sequence generated by nonterminal A
• The RHS of a production of A specifies the code for procedure A
  - Terminals are matched against input tokens
  - Nonterminals produce calls to the corresponding procedures
• If multiple production rules exist for a nonterminal A
  - One of them is predicted based on a lookahead token
      The lookahead token is the next input token that should be matched
  - The predicted rule is the only one that applies
      Other rules will NOT be tried
      This is a predictive parsing technique, not a backtracking one
• A syntax error is detected when
  - The next token in the input sequence does NOT match the expected token
Example on Recursive-Descent Parsing

• Consider the following grammar for expressions in EBNF notation

    expr   → term { addop term }
    term   → factor { mulop factor }
    factor → ( expr ) | id | num

• Since three nonterminals exist, we need three parsing procedures
  - The curly brackets expressing repetition are translated into a while loop
  - The vertical bar expressing alternation is translated into a case statement

    procedure expr( )
    begin
      term( );
      while token = ADDOP do
        match(ADDOP);
        term( );
      end while;
    end expr;

    procedure term( )
    begin
      factor( );
      while token = MULOP do
        match(MULOP);
        factor( );
      end while;
    end term;

    procedure factor( )
    begin
      case token of
        '(':  match('('); expr( ); match(')');
        ID:   match(ID);
        NUM:  match(NUM);
        else  syntax_error(token);
      end case;
    end factor;
Lookahead Token and Match Procedure

• The recursive-descent procedures use a token variable
• The token variable is the lookahead token
  - Keeps track of the next token in the input sequence
  - Is initialized to the first token before parsing begins
  - Is updated after every successful call to the match procedure
• The match procedure matches the lookahead token with its parameter
  - Is called to match an expected token on the RHS of a production
  - Match succeeds if expected token = lookahead token and fails otherwise
  - Match calls the scanner function to update the lookahead token

    procedure match( ExpectedToken )
    begin
      if token = ExpectedToken then
        token := scan( );
      else
        syntax_error( token, ExpectedToken );
      end if;
    end match;
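The three parsing procedures and the match procedure can be sketched in Python as a recognizer that only answers yes or no. This is a minimal sketch, not the course's implementation: the token names LP, RP and EOF, the regular-expression scanner `scan_all`, and the helper `accepts` are assumptions made for illustration.

```python
import re

# Hypothetical token kinds; the slides name only ADDOP, MULOP, ID, NUM and parentheses.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<ADDOP>[+-])"
    r"|(?P<MULOP>[*/])|(?P<LP>\()|(?P<RP>\)))"
)

def scan_all(text):
    """Turn the input text into a list of token kinds ending with 'EOF'."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.lastgroup)      # name of the alternative that matched
        pos = m.end()
    tokens.append("EOF")
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = scan_all(text)
        self.index = 0
        self.token = self.tokens[0]     # lookahead, set before parsing begins

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected}, found {self.token}")
        self.index += 1                 # plays the role of token := scan()
        self.token = self.tokens[self.index]

    def expr(self):                     # expr -> term { addop term }
        self.term()
        while self.token == "ADDOP":
            self.match("ADDOP")
            self.term()

    def term(self):                     # term -> factor { mulop factor }
        self.factor()
        while self.token == "MULOP":
            self.match("MULOP")
            self.factor()

    def factor(self):                   # factor -> ( expr ) | id | num
        if self.token == "LP":
            self.match("LP")
            self.expr()
            self.match("RP")
        elif self.token in ("ID", "NUM"):
            self.match(self.token)
        else:
            raise SyntaxError(f"unexpected {self.token}")

def accepts(text):
    """True if the whole input is a valid expression."""
    try:
        parser = Parser(text)
        parser.expr()
    except SyntaxError:
        return False
    return parser.token == "EOF"
```

For example, `accepts("a - b + c * (b + d)")` succeeds, while `accepts("a + * b")` fails inside `factor` because `*` cannot begin a factor.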
Syntax Tree Construction for Expressions

• A recursive-descent parser can be used to construct a syntax tree

    SyntaxTree := expr( );    // Calling the parsing function for the start symbol

• Parsing functions allocate and return pointers to syntax tree nodes
• Construction of a syntax tree for simple expressions is given below
  - new node allocates a tree node and returns a pointer to it

    function expr( ) : TreePtr
    begin
      left := term( );
      while token = ADDOP do
        op := ADDOP.op; match(ADDOP);
        right := term( );
        left := new node(op, left, right);
      end while;
      return left;
    end expr;

    function term( ) : TreePtr
    begin
      left := factor( );
      while token = MULOP do
        op := MULOP.op; match(MULOP);
        right := factor( );
        left := new node(op, left, right);
      end while;
      return left;
    end term;
Syntax Tree Construction cont'd

• For a factor, we have the following parsing function
  - symtable.lookup(ID.name) searches a symbol table for a given name
  - The lookup function returns a pointer to an identifier symbol in symtable
  - Identifiers are inserted into the symbol table when parsing a declaration
  - NUM.ptr is a pointer to a literal symbol in the literal table
  - Literal constants are inserted into the literal table when scanned

    function factor( ) : TreePtr
    begin
      case token of
        '(':  match('('); ptr := expr( ); match(')');
        ID:   ptr := symtable.lookup(ID.name); match(ID);
        NUM:  ptr := NUM.ptr; match(NUM);
        else  syntax_error(token, "Expecting a number, an identifier, or (");
      end case;
      return ptr;
    end factor;
Node Structure for Expression Trees

• A syntax tree node for expressions should have at least:
  - Node operator: +, -, *, /, etc. Different for each operator
      For symbol table entries, the node operator is ID
      For literal table entries, the node operator is NUM
      Other node operators can be added for statements and various types of literals
  - Left pointer: pointer to the left child expression tree
      Can point to a tree node, to a symbol node, or to a literal node
  - Right pointer: pointer to the right child expression tree
      Can point to a tree node, to a symbol node, or to a literal node
• The following fields are also important:
  - Line, Pos: keep track of the line and position of each tree node
  - Type: associates a type with each tree node
      Type information is used to check the type of expressions

Tracing the Construction of a Syntax Tree

• Although recursive-descent is a top-down parsing technique
  - The construction of the syntax tree for expressions is bottom-up
  - Tracing verifies the precedence and associativity of operators
• The tree construction of a - b + c * (b + d) is given below

    ptr1 ← symtable.lookup(a)
    ptr2 ← symtable.lookup(b)
    ptr3 ← new node(- , ptr1 , ptr2)
    ptr4 ← symtable.lookup(c)
    ptr2 ← symtable.lookup(b)
    ptr5 ← symtable.lookup(d)
    ptr6 ← new node(+ , ptr2 , ptr5)
    ptr7 ← new node(* , ptr4 , ptr6)
    ptr8 ← new node(+ , ptr3 , ptr7)

  Resulting tree:

    ptr8: +
      ptr3: -
        ptr1: ID a
        ptr2: ID b
      ptr7: *
        ptr4: ID c
        ptr6: +
          ptr2: ID b
          ptr5: ID d
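The bottom-up construction traced above can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the slides' code: tree nodes are plain tuples `(op, left, right)`, leaves are `("ID", name)` or `("NUM", value)` instead of symbol-table and literal-table pointers, and `tokenize`, `TreeBuilder` and `parse` are hypothetical names. The toy scanner silently skips characters it does not recognize.

```python
import re

def tokenize(text):
    # Hypothetical scanner: numbers, identifiers, single-character operators.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text) + ["$"]

class TreeBuilder:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def token(self):                      # the lookahead token
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def expr(self):
        left = self.term()
        while self.token in ("+", "-"):
            op = self.token
            self.match(op)
            right = self.term()
            left = (op, left, right)      # new node(op, left, right)
        return left

    def term(self):
        left = self.factor()
        while self.token in ("*", "/"):
            op = self.token
            self.match(op)
            right = self.factor()
            left = (op, left, right)
        return left

    def factor(self):
        tok = self.token
        if tok == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
            return node
        if tok[0].isdigit():
            self.match(tok)
            return ("NUM", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            self.match(tok)
            return ("ID", tok)
        raise SyntaxError("expecting a number, an identifier, or (")

def parse(text):
    builder = TreeBuilder(text)
    tree = builder.expr()
    if builder.token != "$":
        raise SyntaxError(f"unexpected {builder.token!r}")
    return tree
```

Because each loop iteration wraps the tree built so far as the left child of the new node, operators of equal precedence associate to the left, and `term` being called from `expr` gives `*` and `/` higher precedence than `+` and `-`.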
Syntax Tree Construction for if Statements

• An Extended BNF grammar for if statements with optional else:

    if-stmt → if expr then stmt [ else stmt ]

• A parsing function can eliminate the ambiguity of else
  - By matching an else token as soon as it is encountered
• The syntax tree of an if statement is constructed bottom-up

    function ifstmt( ) : TreePtr
    begin
      match(IF);
      exprptr := expr( );
      match(THEN);
      thenptr := stmt( );
      if token = ELSE then
        match(ELSE);
        elseptr := stmt( );
        elseptr := new node(ELSE, thenptr, elseptr);   // ELSE node
        ifptr := new node(IF, exprptr, elseptr);       // IF node points to ELSE node
      else
        ifptr := new node(IF, exprptr, thenptr);       // No ELSE node
      end if;
      return ifptr;
    end ifstmt;
LL Parsing
• Uses an explicit stack rather than recursive calls to perform a parse
• LL(k) parsing means that k tokens of lookahead are used
  - The first L means that the token sequence is read from left to right
  - The second L means a leftmost derivation is applied at each step
• An LL parser consists of
  - A parser stack that holds grammar symbols: nonterminals and tokens
  - A parsing table that specifies the parser action
  - A driver function that interacts with the parser stack, parsing table, and scanner
[Figure: LL parser organization, with boxes labeled Scanner, Parser Driver, Parsing Stack, Parsing Table, and Output; the Scanner passes the next token to the Parser Driver.]
LL Parsing Actions
• The LL parsing actions are:
  - Match: to match the top of the parser stack with the next input token
  - Predict: to predict a production and apply it in a derivation step
  - Accept: to accept and terminate the parsing of a sequence of tokens
  - Error: to report an error message when matching or prediction fails
• Consider the following grammar:

    S → ( S ) S | ε

  Parsing of ( ( ) ):

    Parser Stack    Input     Parser Action
    S               (())$     Predict S → ( S ) S
    (S)S            (())$     Match (
    S)S             ())$      Predict S → ( S ) S
    (S)S)S          ())$      Match (
    S)S)S           ))$       Predict S → ε
    )S)S            ))$       Match )
    S)S             )$        Predict S → ε
    )S              )$        Match )
    S               $         Predict S → ε
    Empty           $         Accept

  The stack grows backward from right to left (the top of the stack is the leftmost symbol)
Grammar Analysis: Nonterminals that Derive ε

• Grammar analysis is necessary to
  - Determine whether a grammar can be used in LL parsing
  - Construct the LL parsing table that defines the actions of an LL parser
• A common analysis is to determine which nonterminals derive ε
  - Nonterminals that derive ε are called nullable
• To determine which nonterminals derive ε
  - We use an iterative marking algorithm
  - First, nonterminals that derive ε directly in one step are marked
  - Nonterminals that derive ε in two, three, ... steps are found and marked
  - Continue until no more nonterminals can be marked as deriving ε
• Consider the following grammar

    A → B C D
    B → b C | ε
    C → c D | ε
    D → d | ε

  - A, B, C, and D are all nullable
  - B, C, and D derive ε directly
  - A derives ε indirectly: A ⇒ B C D ⇒ C D ⇒ D ⇒ ε
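The iterative marking algorithm can be sketched in Python as a fixed-point loop. The grammar encoding is an assumption made for this sketch: each nonterminal maps to a list of right-hand sides, every right-hand side is a tuple of symbols, and the empty tuple stands for ε.

```python
# A -> B C D,  B -> b C | ε,  C -> c D | ε,  D -> d | ε
GRAMMAR = {
    "A": [("B", "C", "D")],
    "B": [("b", "C"), ()],
    "C": [("c", "D"), ()],
    "D": [("d",), ()],
}

def nullable_nonterminals(grammar):
    """Return the set of nonterminals that derive the empty string."""
    nullable = set()
    changed = True
    while changed:                       # repeat until nothing new is marked
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            # A RHS vanishes if every symbol in it is already marked nullable;
            # all() over an empty RHS is True, which covers A -> ε directly.
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable

print(sorted(nullable_nonterminals(GRAMMAR)))
```

Terminals are never added to the set, so any right-hand side containing a terminal can never be counted as vanishing.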

Grammar Analysis: The First Set

• Suppose we have the following grammar:

    S → A a | B b
    A → D c | C A
    B → d A | e
    C → f C | b
    D → h | i

  - The RHSs of the productions of S do not begin with terminals
  - The parser has no immediate guidance on which production to apply to expand S
  - We may follow all possible derivations of S as shown below

    S ⇒ A a ⇒ D c a ⇒ h c a
                    ⇒ i c a
            ⇒ C A a ⇒ f C A a
                    ⇒ b A a
    S ⇒ B b ⇒ d A b
            ⇒ e b

• We predict S → A a when
  - The first token is h, i, f, or b. First(Aa) = {h, i, f, b}
• We predict S → B b when
  - The first token is d or e. First(Bb) = {d, e}
• Otherwise, we have an error
Grammar Analysis: Determining the First Set

• Formally, First(α) = { a ∈ T | α ⇒* a β }
  - First(α) is the set of all terminals that can begin a sentential form of α
• If α ⇒* ε then ε ∈ First(α)
• To calculate First(α) we apply the following rules
  - First(ε) = {ε}                              when α = ε
  - First(aβ) = {a}                             when α = aβ
  - First(A) = First(α1) ∪ First(α2) ∪ …        when α = A and A → α1 | α2 | …
  - First(Aβ) = First(A)                        when α = Aβ and A is NOT nullable
  - First(Aβ) = (First(A) - {ε}) ∪ First(β)     when α = Aβ and A ⇒+ ε
• Consider the following grammar:

    S → A B C d
    A → e | f | ε
    B → g | h | ε
    C → p | q

    First(A) = {e, f, ε}
    First(B) = {g, h, ε}
    First(C) = {p, q}
    First(S) = First(ABCd)
             = (First(A) - {ε}) ∪ (First(B) - {ε}) ∪ First(Cd)
             = {e, f} ∪ {g, h} ∪ {p, q}
             = {e, f, g, h, p, q}
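The rules for First can be applied mechanically by iterating until the sets stop growing. The following Python sketch assumes a grammar encoded as a dictionary from nonterminal to a list of right-hand sides (lists of symbols, with the empty list for ε); the names `first_of_string` and `first_sets` are illustrative.

```python
EPS = "ε"

# S -> A B C d,  A -> e | f | ε,  B -> g | h | ε,  C -> p | q
GRAMMAR = {
    "S": [["A", "B", "C", "d"]],
    "A": [["e"], ["f"], []],
    "B": [["g"], ["h"], []],
    "C": [["p"], ["q"]],
}

def first_of_string(symbols, first, grammar):
    """First set of a symbol sequence, using the First sets computed so far."""
    result = set()
    for sym in symbols:
        if sym not in grammar:           # terminal: First(aβ) = {a}
            result.add(sym)
            return result
        result |= first[sym] - {EPS}     # First(A) - {ε}
        if EPS not in first[sym]:        # A is not (yet known to be) nullable
            return result
    result.add(EPS)                      # every symbol vanished, or α = ε
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

for nonterminal, tokens in first_sets(GRAMMAR).items():
    print(nonterminal, sorted(tokens))
```

Since sets only grow and there are finitely many terminals, the loop is guaranteed to terminate.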
Grammar Analysis: The Follow Set

• Suppose we have the following grammar

    S → A c B
    A → a A | ε
    B → b B S | ε

  - We follow derivations of S as shown below

    S ⇒ A c B ⇒ a A c B
    S ⇒ A c B ⇒ c B ⇒ c b B S ⇒ c b B A c B ⇒ c b B a A c B
                                            ⇒ c b B c B
                    ⇒ c

• We predict A → a A when
  - Next token is a because First(aA) = {a}
• We predict A → ε when
  - Next token is c because Follow(A) = {c}
• Similarly, we predict B → b B S when
  - Next token is b because First(b B S) = {b}
• We predict B → ε when
  - Next token is a, c, or $ (end-of-file token) because Follow(B) = {a, c, $}
Grammar Analysis: Determining the Follow Set

• Formally, Follow(A) = { a ∈ T | S ⇒+ … A a … }
  - Follow(A) is the set of terminals that may follow A in any sentential form
• If S ⇒+ α A then $ ∈ Follow(A)
  - If A is not followed by any terminal then it is followed by the end of file
  - The $ represents the end-of-file token
• We compute Follow(A) using the following rules:
  - If A is the start symbol then $ ∈ Follow(A)
  - Inspect the RHS of productions for all occurrences of A
      Let a typical production be B → α A β
  - If β is NOT nullable then add First(β) to Follow(A)
      Any token that can begin a sentential form of β can follow A
  - If β is ε or derives ε then add (First(β) - {ε}) ∪ Follow(B) to Follow(A)
      If β vanishes in a given derivation then what follows A is what follows B
      B ⇒ α A β ⇒ α A: what follows A is what follows B
Examples on the First and Follow Sets

Example 1:

    S → A c B
    A → a A | ε
    B → b B S | ε

  Nonterminals that derive ε are A and B

    First(S) = First(AcB) = (First(A) - {ε}) ∪ First(cB) = {a, c}
    First(A) = First(aA) ∪ First(ε) = {a, ε}
    First(B) = First(bBS) ∪ First(ε) = {b, ε}

    Follow(S) = {$} ∪ Follow(B) = {$, a, c}
    Follow(A) = {c}
    Follow(B) = Follow(S) ∪ First(S) = {$, a, c}

Example 2:

    E → T Q
    Q → + T Q | - T Q | ε
    T → F R
    R → * F R | / F R | ε
    F → ( E ) | id

  Nonterminals that derive ε are Q and R

    First(E) = First(TQ) = First(T) = First(FR) = First(F) = {( , id}
    First(Q) = {+ , - , ε}
    First(R) = {* , / , ε}

    Follow(E) = {$ , )}
    Follow(Q) = Follow(E) = {$ , )}
    Follow(T) = (First(Q) - {ε}) ∪ Follow(E) ∪ Follow(Q) = {+ , - , $ , )}
    Follow(R) = Follow(T) = {+ , - , $ , )}
    Follow(F) = (First(R) - {ε}) ∪ Follow(T) ∪ Follow(R) = {* , / , + , - , $ , )}
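The Follow rules can be checked against both examples with a short Python sketch. It assumes the same dictionary encoding of grammars used for illustration elsewhere (empty list for ε) and recomputes First internally so that the block is self-contained; `first_of`, `first_sets` and `follow_sets` are hypothetical helper names.

```python
EPS, EOF = "ε", "$"

GRAMMAR = {                       # Example 1
    "S": [["A", "c", "B"]],
    "A": [["a", "A"], []],
    "B": [["b", "B", "S"], []],
}

def first_of(symbols, first):
    """First set of a symbol sequence; a terminal's First set is itself."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                    # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:          # production lhs -> α A β
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue              # only nonterminals have Follow sets
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:     # β is ε or derives ε
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

for nonterminal, tokens in follow_sets(GRAMMAR, "S").items():
    print(nonterminal, sorted(tokens))
```

Running the same function on the expression grammar of Example 2, with `E` as the start symbol, gives the Follow sets listed for Q, T, R and F.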
Grammar Analysis: Determining the Predict Set

• The predict set of a production A → α is defined as follows:
  - If α is NOT nullable then Predict(A → α) = First(α)
  - If α is nullable then Predict(A → α) = (First(α) - {ε}) ∪ Follow(A)
  - This is the set of lookahead tokens that will cause the selection of A → α
• Example on determining the predict set:

    Production      Predict set
    E → T Q         First(TQ) = First(T) = {( , id}
    Q → + T Q       First(+TQ) = {+}
    Q → - T Q       First(-TQ) = {-}
    Q → ε           Follow(Q) = {$ , )}
    T → F R         First(FR) = First(F) = {( , id}
    R → * F R       First(*FR) = {*}
    R → / F R       First(/FR) = {/}
    R → ε           Follow(R) = {+ , - , $ , )}
    F → ( E )       {(}
    F → id          {id}
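The definition of Predict can be written as a few lines of Python. To keep this sketch short, the First and Follow sets of the expression grammar are entered by hand from the preceding examples rather than computed; `first_of` and `predict` are illustrative names, and the empty list stands for ε.

```python
EPS = "ε"

# First and Follow sets copied from the worked expression-grammar example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "Q": {"+", "-", EPS}, "R": {"*", "/", EPS},
}
FOLLOW = {
    "E": {"$", ")"}, "Q": {"$", ")"},
    "T": {"+", "-", "$", ")"}, "R": {"+", "-", "$", ")"},
    "F": {"*", "/", "+", "-", "$", ")"},
}
PRODUCTIONS = [
    ("E", ["T", "Q"]),
    ("Q", ["+", "T", "Q"]), ("Q", ["-", "T", "Q"]), ("Q", []),
    ("T", ["F", "R"]),
    ("R", ["*", "F", "R"]), ("R", ["/", "F", "R"]), ("R", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]

def first_of(symbols):
    """First set of a RHS; a terminal's First set is the terminal itself."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}            # the whole RHS can vanish

def predict(lhs, rhs):
    f = first_of(rhs)
    if EPS in f:                  # α is nullable
        return (f - {EPS}) | FOLLOW[lhs]
    return f                      # α is not nullable

for lhs, rhs in PRODUCTIONS:
    print(lhs, "->", " ".join(rhs) or EPS, sorted(predict(lhs, rhs)))
```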
LL(1) Grammars
• Not all context-free grammars are suitable for LL parsing
• CFGs suitable for LL(1) parsing are called LL(1) grammars
• A grammar is LL(1) if for productions with the same LHS A

    A → α1 | α2 | … | αn
    Predict(A → αi) ∩ Predict(A → αj) = ∅   for all i ≠ j

  - The predict sets of productions with the same LHS are pairwise disjoint
• The following grammar is LL(1)

    S → A c B
    A → a A | ε
    B → b B S | ε

    Predict(A → a A)   = {a}
    Predict(A → ε)     = Follow(A) = {c}                          (disjoint)
    Predict(B → b B S) = {b}
    Predict(B → ε)     = Follow(B) = Follow(S) ∪ First(S)
                       = {$} ∪ Follow(B) ∪ {a, c} = {$, a, c}     (disjoint)
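The pairwise-disjointness condition is easy to test once the predict sets are known. The sketch below takes the predict sets of the grammar above as given data; the function names `ll1_conflicts` and `is_ll1` are assumptions made for illustration.

```python
from itertools import combinations

# Predict sets from the example, grouped by left-hand side.
PREDICT = {
    "S": [{"a", "c"}],
    "A": [{"a"}, {"c"}],
    "B": [{"b"}, {"$", "a", "c"}],
}

def ll1_conflicts(predict):
    """List (lhs, i, j, overlap) for every pair of productions that clash."""
    conflicts = []
    for lhs, sets in predict.items():
        for (i, p), (j, q) in combinations(enumerate(sets), 2):
            overlap = p & q
            if overlap:
                conflicts.append((lhs, i, j, overlap))
    return conflicts

def is_ll1(predict):
    return not ll1_conflicts(predict)

print(is_ll1(PREDICT))
```

As a contrasting case, two productions of one nonterminal that both begin with the token `if` each have the predict set `{"if"}`, so `ll1_conflicts` reports a clash on `if` for that pair.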
Constructing the LL(1) Parsing Table

• The predict sets can be represented in an LL(1) parse table
  - The rows are indexed by the nonterminals
  - The columns are indexed by the tokens
• If A is a nonterminal and tok is the lookahead token then
  - Table[A][tok] indicates which production to predict
  - If no production can be used, Table[A][tok] gives an error value
• Table[A][tok] = A → α iff tok ∈ Predict(A → α)
• Example on constructing the LL(1) parsing table:

    1: S → A c B       Predict(1) = {a, c}
    2: A → a A         Predict(2) = {a}
    3: A → ε           Predict(3) = {c}
    4: B → b B S       Predict(4) = {b}
    5: B → ε           Predict(5) = {$, a, c}

          a    b    c    $
    S     1         1
    A     2         3
    B     5    4    5    5

  Empty slots indicate error conditions
Constructing the LL(1) Parsing Table cont'd

• Here is a second example on constructing the parsing table

     1: E → T Q        Predict(1)  = {( , id}
     2: Q → + T Q      Predict(2)  = {+}
     3: Q → - T Q      Predict(3)  = {-}
     4: Q → ε          Predict(4)  = {$ , )}
     5: T → F R        Predict(5)  = {( , id}
     6: R → * F R      Predict(6)  = {*}
     7: R → / F R      Predict(7)  = {/}
     8: R → ε          Predict(8)  = {+ , - , $ , )}
     9: F → ( E )      Predict(9)  = {(}
    10: F → id         Predict(10) = {id}

          +    -    *    /    (    )    id   $
    E                         1         1
    Q     2    3                   4         4
    T                         5         5
    R     8    8    6    7         8         8
    F                         9         10

• Because the above grammar is LL(1)
  - A unique production number is stored in each table entry
• Blank entries correspond to error conditions
  - In practice, special error numbers are used to indicate error situations
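Filling the table from the predict sets follows directly from the rule Table[A][tok] = A → α iff tok ∈ Predict(A → α). The Python sketch below uses the five-production grammar of the first example; storing the table as a dictionary keyed by (nonterminal, token) pairs, with missing keys as error entries, is a design choice of this sketch rather than something the slides prescribe.

```python
# Production number -> (left-hand side, predict set), from the first example.
PREDICT = {
    1: ("S", {"a", "c"}),
    2: ("A", {"a"}),
    3: ("A", {"c"}),
    4: ("B", {"b"}),
    5: ("B", {"$", "a", "c"}),
}

def build_table(predict):
    """Return {(nonterminal, token): production number}."""
    table = {}
    for number, (lhs, tokens) in predict.items():
        for tok in tokens:
            if (lhs, tok) in table:
                # Two productions claim the same slot: predict sets overlap.
                raise ValueError(f"conflict at Table[{lhs}][{tok}]: not LL(1)")
            table[(lhs, tok)] = number
    return table

TABLE = build_table(PREDICT)
for nonterminal in ("S", "A", "B"):
    row = [str(TABLE.get((nonterminal, tok), "")) for tok in ("a", "b", "c", "$")]
    print(nonterminal, row)
```

A lookup such as `TABLE.get(("S", "b"))` returns `None`, which plays the role of a blank (error) entry.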
LL(1) Parser Driver Algorithm

v The LL(1) parser driver algorithm can be described as follows:
    Token := scan( )
    Stack.push(StartSymbol)
    while not Stack.empty( ) do
      X := Stack.pop( )
      if terminal(X) then
        if X = Token then
          Token := scan( )
        else
          process a syntax error at Token
        end if
      else (* X is a nonterminal *)
        Rule := Table[X][Token]
        if Rule = X → Y1 Y2 … Yn then
          for i from n downto 1 do
            Stack.push(Yi)
          end for
        else
          process a syntax error at Token
        end if
      end if
    end while
    if Token = $ then
      accept parsing
    else
      report a syntax error at Token
    end if
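The driver algorithm translates almost line for line into Python. The sketch below hard-codes the expression grammar and its LL(1) table from the second table example; the function name `ll1_parse`, the list-of-strings token input, and returning the list of predicted production numbers are choices made for this illustration only.

```python
PRODUCTIONS = {
    1: ("E", ["T", "Q"]),
    2: ("Q", ["+", "T", "Q"]), 3: ("Q", ["-", "T", "Q"]), 4: ("Q", []),
    5: ("T", ["F", "R"]),
    6: ("R", ["*", "F", "R"]), 7: ("R", ["/", "F", "R"]), 8: ("R", []),
    9: ("F", ["(", "E", ")"]), 10: ("F", ["id"]),
}
TABLE = {
    "E": {"(": 1, "id": 1},
    "Q": {"+": 2, "-": 3, ")": 4, "$": 4},
    "T": {"(": 5, "id": 5},
    "R": {"+": 8, "-": 8, "*": 6, "/": 7, ")": 8, "$": 8},
    "F": {"(": 9, "id": 10},
}

def ll1_parse(tokens, start="E"):
    """Return the production numbers predicted, in order, or raise SyntaxError."""
    stream = iter(list(tokens) + ["$"])   # append the end-of-file token
    token = next(stream)                  # Token := scan()
    stack = [start]                       # top of stack is the end of the list
    applied = []
    while stack:
        x = stack.pop()
        if x not in TABLE:                # X is a terminal: Match
            if x != token:
                raise SyntaxError(f"expected {x!r}, found {token!r}")
            token = next(stream)
        else:                             # X is a nonterminal: Predict
            rule = TABLE[x].get(token)
            if rule is None:
                raise SyntaxError(f"unexpected {token!r} while expanding {x}")
            applied.append(rule)
            # Push Yn ... Y1 so that Y1 ends up on top of the stack.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
    if token != "$":
        raise SyntaxError(f"unexpected {token!r} after end of expression")
    return applied

print(ll1_parse(["id", "*", "(", "id", "+", "id", ")"]))
```

The returned sequence of production numbers is a leftmost derivation of the input, which is what the second L in LL refers to.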
Tracing an LL(1) Parser

Consider the parsing of id * (id + id) $
The trace uses the expression grammar (productions 1 to 10) and its LL(1) parsing table from the second table-construction example.

    Parser Stack    Input          Parser Action
    E               id*(id+id)$    Predict E → T Q
    TQ              id*(id+id)$    Predict T → F R
    FRQ             id*(id+id)$    Predict F → id
    idRQ            id*(id+id)$    Match id
    RQ              *(id+id)$      Predict R → * F R
    *FRQ            *(id+id)$      Match *
    FRQ             (id+id)$       Predict F → ( E )
    (E)RQ           (id+id)$       Match (
    E)RQ            id+id)$        Predict E → T Q
    TQ)RQ           id+id)$        Predict T → F R
    FRQ)RQ          id+id)$        Predict F → id
    idRQ)RQ         id+id)$        Match id
    RQ)RQ           +id)$          Predict R → ε
    Q)RQ            +id)$          Predict Q → + T Q
    +TQ)RQ          +id)$          Match +
    TQ)RQ           id)$           Predict T → F R
    FRQ)RQ          id)$           Predict F → id
    idRQ)RQ         id)$           Match id
    RQ)RQ           )$             Predict R → ε
    Q)RQ            )$             Predict Q → ε
    )RQ             )$             Match )
    RQ              $              Predict R → ε
    Q               $              Predict Q → ε
    Empty           $              Accept

The stack grows backward from right to left (the top of the stack is the leftmost symbol)
The Problem of Left Recursion

• Left recursive grammars fail to be LL(1) or even LL(k)
  - A left recursive production puts an LL parser into an infinite loop
  - If a left recursive production is predicted then
      The nonterminal on the LHS is replaced with the RHS of the production
      The same nonterminal will appear again on top of the parser stack
      The same production is predicted again
      The iteration goes on forever
• Left recursion is commonly used to
  - Make an operation left associative
      Expr → Expr addop Term | Term
  - Specify a list of identifiers, statements, etc.
      StmtList → StmtList ; Statement | Statement
• We need to eliminate left recursion to make a grammar LL(1)
Eliminating Immediate Left Recursion

• The simplest case of left recursion is immediate left recursion
  - General Form: A → A α | β
  - The above productions of A generate strings of the form β αⁿ, n ≥ 0
  - We introduce a new nonterminal and use right recursion as follows:
      A → β Atail
      Atail → α Atail | ε
• In general, if many immediate left recursive productions exist
  - General Form: A → A α1 | A α2 | … | A αn | β1 | β2 | … | βm
  - We introduce a new nonterminal and use right recursion
      A → β1 Atail | β2 Atail | … | βm Atail
      Atail → α1 Atail | α2 Atail | … | αn Atail | ε
• For example:
      Expr → Expr + Term | Expr - Term | Term
  becomes:
      Expr → Term Exprtail
      Exprtail → + Term Exprtail | - Term Exprtail | ε
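The general transformation can be sketched as a small Python function. Productions are represented as lists of symbols with the empty list for ε, and the new nonterminal is named by appending "tail" to the original name, mirroring the Atail convention; the function name is an assumption. The sketch does not handle the degenerate production A → A.

```python
def eliminate_immediate_left_recursion(lhs, alternatives):
    """Rewrite A -> A α1 | ... | A αn | β1 | ... | βm using right recursion."""
    tail = lhs + "tail"
    # The αi: what follows A in each left recursive alternative.
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == lhs]
    # The βj: alternatives that do not start with A.
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != lhs]
    if not alphas:
        return {lhs: alternatives}        # nothing to do
    return {
        lhs: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],   # [] stands for ε
    }

result = eliminate_immediate_left_recursion(
    "Expr",
    [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
)
for nonterminal, alternatives in result.items():
    print(nonterminal, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```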
Eliminating Indirect Left Recursion

• In some cases, left recursion may be indirect
  - For example: A → B α | … and B → A β | …
• We can do substitutions to make left recursion immediate
• Consider the following grammar:
      A → B a | A a | c
      B → B b | A b | d
• First, we remove the immediate left recursion of A
      A → B a Atail | c Atail
      Atail → a Atail | ε
• Second, we eliminate the indirect left recursion of B → A b by substituting for A
      B → B b | B a Atail b | c Atail b | d
• Finally, we remove the immediate left recursion of B
      B → c Atail b Btail | d Btail
      Btail → b Btail | a Atail b Btail | ε
Left Factoring of Common Prefixes

• Another problem for LL parsers is productions that share a common prefix
• An if statement may have two productions with a common prefix:
      IfStmt → if Expr then StmtList end if ;
      IfStmt → if Expr then StmtList else StmtList end if ;
• An LL(1) parser cannot predict which production to apply
• The solution is to use left factoring of the common prefix
  - General Form: A → α β1 | α β2 | … | α βn
  - Left factoring solution:
      A → α Atail
      Atail → β1 | β2 | … | βn
• The left factoring of the two if statement productions:
      IfStmt → if Expr then StmtList IfStmtTail
      IfStmtTail → else StmtList end if ; | end if ;
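Left factoring can be sketched in Python for the simple case shown above, where all alternatives of a nonterminal share one prefix. This is a simplification: a full left-factoring pass would group alternatives by prefix and repeat until no two alternatives start alike. The names `common_prefix` and `left_factor`, and the "Tail" suffix for the new nonterminal, are illustrative.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(lhs, alternatives):
    """Factor out the prefix α shared by ALL alternatives of lhs."""
    prefix = alternatives[0]
    for rhs in alternatives[1:]:
        prefix = common_prefix(prefix, rhs)
    if len(alternatives) < 2 or not prefix:
        return {lhs: alternatives}        # nothing to factor
    tail = lhs + "Tail"
    return {
        lhs: [prefix + [tail]],                               # A -> α Atail
        tail: [rhs[len(prefix):] for rhs in alternatives],    # Atail -> β1 | ... | βn
    }

IF_STMT = [
    ["if", "Expr", "then", "StmtList", "end", "if", ";"],
    ["if", "Expr", "then", "StmtList", "else", "StmtList", "end", "if", ";"],
]
factored = left_factor("IfStmt", IF_STMT)
for nonterminal, alternatives in factored.items():
    print(nonterminal, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

After factoring, the two alternatives of IfStmtTail begin with different tokens (`end` and `else`), so a single lookahead token is enough to choose between them.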

