* A parser is top-down if it discovers a parse tree top to bottom
  - A top-down parse corresponds to a preorder traversal of the parse tree
  - A leftmost derivation is applied at each derivation step
* Top-down parsers come in two forms
  - Predictive Parsers
      Predict the production rule to be applied using lookahead tokens
  - Backtracking Parsers
      Will try different productions, backing up when a parse fails
* Predictive parsers are much faster than backtracking ones
  - Predictive parsers operate in linear time and will be our focus
  - Backtracking parsers operate in exponential time and will not be considered
* Two kinds of top-down parsing techniques will be studied
  - Recursive-descent parsing
  - LL parsing
Top-Down Parsing 1 Compiler Design Muhammed Mudawwar
* The RHS of a production of A specifies the code for procedure A
  - Terminals are matched against input tokens
  - Nonterminals produce calls to corresponding procedures
* If multiple production rules exist for a nonterminal A
  - One of them is predicted based on a lookahead token
      The lookahead token is the next input token that should be matched
  - The predicted rule is the only one that applies
      Other rules will NOT be tried
      This is a predictive parsing technique, not a backtracking one
* A syntax error is detected when
  - Next token in the input sequence does NOT match the expected token
function expr( ) : TreePtr
begin
  left := term( );
  while token = ADDOP do
    op := ADDOP.op; match(ADDOP);
    right := term( );
    left := new node(op, left, right);
  end while;
  return left;
end expr;

function term( ) : TreePtr
begin
  left := factor( );
  while token = MULOP do
    op := MULOP.op; match(MULOP);
    right := factor( );
    left := new node(op, left, right);
  end while;
  return left;
end term;
* symtable.lookup(ID.name) searches a symbol table for a given name
  - The lookup function returns a pointer to an identifier symbol in symtable
  - Identifiers are inserted into the symbol table when parsing a declaration
  - NUM.ptr is a pointer to a literal symbol in the literal table
  - Literal constants are inserted into the literal table when scanned
function factor( ) : TreePtr
begin
  case token of
    '(': match('('); ptr := expr( ); match(')');
    ID:  ptr := symtable.lookup(ID.name); match(ID);
    NUM: ptr := NUM.ptr; match(NUM);
    else syntax_error(token, "Expecting a number, an identifier, or (");
  end case;
  return ptr;
end factor;
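The expr, term, and factor procedures can be approximated in Python as follows. This is a sketch under stated assumptions, not the course's implementation: the tokenize helper and the Parser class are hypothetical, tokens are (kind, value) pairs, tree nodes are plain tuples, and the symbol-table and literal-table lookups are replaced by ("ID", name) and ("NUM", value) leaves.

```python
import re

def tokenize(text):
    """Hypothetical scanner: returns (kind, value) pairs ending with EOF."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|\S", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("ID", lexeme))
        elif lexeme in "+-":
            tokens.append(("ADDOP", lexeme))
        elif lexeme in "*/":
            tokens.append(("MULOP", lexeme))
        elif lexeme in "()":
            tokens.append((lexeme, lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    tokens.append(("EOF", None))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def token(self):
        # The lookahead token: the next token that should be matched.
        return self.tokens[self.pos]

    def match(self, kind):
        if self.token[0] != kind:
            raise SyntaxError(f"expected {kind}, found {self.token[0]}")
        self.pos += 1

    def expr(self):
        left = self.term()
        while self.token[0] == "ADDOP":
            op = self.token[1]
            self.match("ADDOP")
            left = (op, left, self.term())   # left-associative node
        return left

    def term(self):
        left = self.factor()
        while self.token[0] == "MULOP":
            op = self.token[1]
            self.match("MULOP")
            left = (op, left, self.factor())
        return left

    def factor(self):
        kind, value = self.token
        if kind == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
        elif kind in ("ID", "NUM"):
            node = (kind, value)             # stands in for a table pointer
            self.match(kind)
        else:
            raise SyntaxError("expecting a number, an identifier, or (")
        return node

def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    parser.match("EOF")
    return tree
```

Calling parse("a - b + c * (b + d)") builds the tree bottom up, with the subtraction nested under the addition (left associativity) and the multiplication binding tighter than either.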
  - For symbol table entries, the node operator is ID
  - For literal table entries, the node operator is NUM
  - Other node operators can be added for statements and various types of literals
  - Left Pointer:
* The construction of the syntax tree for expressions is bottom up
  - Tracing verifies the precedence and associativity of operators
* The tree construction of a - b + c * (b + d) is given below
  - ptr1 = symtable.lookup(a)
  - ptr2 = symtable.lookup(b)
  - ptr3 = new node(-, ptr1, ptr2)
  - ptr4 = symtable.lookup(c)
  - ptr2 = symtable.lookup(b)
  - ptr5 = symtable.lookup(d)
  - ptr6 = new node(+, ptr2, ptr5)
  - ptr7 = new node(*, ptr4, ptr6)
  - ptr8 = new node(+, ptr3, ptr7)

  Resulting tree (ptr8 is the root):

  ptr8: +
    ptr3: -
      ptr1: ID a
      ptr2: ID b
    ptr7: *
      ptr4: ID c
      ptr6: +
        ptr2: ID b
        ptr5: ID d
LL Parsing
* Uses an explicit stack rather than recursive calls to perform a parse
* LL(k) parsing means that k tokens of lookahead are used
  - The first L means that the token sequence is read from left to right
  - The second L means a leftmost derivation is applied at each step
* An LL parser consists of
  - Parser stack that holds grammar symbols: nonterminals and tokens
  - Parsing table that specifies the parser action
  - Driver function that interacts with parser stack, parsing table and scanner
[Figure: organization of an LL parser. Components shown: Scanner (supplies the next token), Parser Driver, Parsing Stack, Parsing Table, and Output.]
LL Parsing Actions
* The LL parsing actions are:
  - Match: to match top of parser stack with next input token
  - Predict: to predict a production and apply it in a derivation step
  - Accept: to accept and terminate the parsing of a sequence of tokens
  - Error: to report an error message when matching or prediction fails
* Consider the following grammar: S → ( S ) S | ε
  Parser Stack   Input     Parser Action
  S              (())$     Predict S → ( S ) S
  (S)S           (())$     Match (
  S)S            ())$      Predict S → ( S ) S
  (S)S)S         ())$      Match (
  S)S)S          ))$       Predict S → ε
  )S)S           ))$       Match )
  S)S            )$        Predict S → ε
  )S             )$        Match )
  S              $         Predict S → ε
  Empty          $         Accept
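The Match, Predict, Accept, and Error actions for the grammar S → ( S ) S | ε can be sketched with an explicit stack. The function name ll_parse_parens is hypothetical, and the prediction is hand-coded here instead of read from a table: S → ( S ) S is predicted on "(", and S → ε on ")" or "$".

```python
def ll_parse_parens(text):
    """Return the list of LL actions taken while parsing text."""
    tokens = list(text) + ["$"]      # "$" marks the end of input
    stack = ["S"]                    # top of the stack is the list's end
    pos = 0
    actions = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top == "S":               # nonterminal on top: Predict
            if tok == "(":
                stack.extend(reversed(["(", "S", ")", "S"]))
                actions.append("Predict S -> ( S ) S")
            elif tok in (")", "$"):
                actions.append("Predict S -> epsilon")
            else:
                raise SyntaxError(f"unexpected token {tok!r}")   # Error
        elif top == tok:             # token on top: Match
            pos += 1
            actions.append(f"Match {tok}")
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")  # Error
    if tokens[pos] != "$":
        raise SyntaxError("input remains after the stack emptied")
    actions.append("Accept")
    return actions
```

For the input (()) the returned list follows the ten steps of the trace: two Predict/Match pairs for the opening parentheses, then alternating ε-predictions and matches of ")", and finally Accept.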
  - Determine whether a grammar can be used in LL parsing
  - Construct the LL parsing table that defines the actions of an LL parser
* We use an iterative marking algorithm
  - First, nonterminals that derive ε directly in one step are marked
  - Nonterminals that derive ε in two, three, … steps are found and marked
  - Continue until no more nonterminals can be marked as deriving ε
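A minimal sketch of the marking algorithm, assuming it marks the nonterminals that can derive the empty string. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with () for the empty right-hand side) and the example grammar are hypothetical.

```python
def nullable_nonterminals(grammar):
    """Mark nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:                       # repeat until nothing new is marked
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            for rhs in alternatives:
                # An empty RHS, or one made only of marked nonterminals,
                # lets lhs derive the empty string as well.
                if all(sym in nullable for sym in rhs):
                    nullable.add(lhs)
                    changed = True
                    break
    return nullable

# Hypothetical grammar: A is marked in one step, B and S in later steps.
EXAMPLE = {
    "S": [("A", "B")],
    "A": [("a",), ()],
    "B": [("A", "A"), ("b",)],
    "C": [("c",)],
}
```

Here nullable_nonterminals(EXAMPLE) marks A first (directly), then B (through A A), then S (through A B), and never marks C.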
* Productions of S do not begin with terminals
  - Parser has no immediate guidance which production to apply to expand S
  - We may follow all possible derivations of S as shown below

  S → A a | B b
  A → D c | C A
  B → d A | e
  C → f C | b
  D → h | i

  S ⇒ A a ⇒ D c a ⇒ h c a
                  ⇒ i c a
          ⇒ C A a ⇒ f C A a
                  ⇒ b A a
  S ⇒ B b ⇒ d A b
          ⇒ e b
* We predict S → A a when
  - First token is h, i, f, or b. First(Aa) = {h, i, f, b}
* We predict S → B b when
  - First token is d or e. First(Bb) = {d, e}
* Otherwise, we have an error
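The First sets used for these predictions can be computed by iterating to a fixed point. The sketch below is one common formulation, not necessarily the one used in the course; the grammar is encoded as read from the slide (S → Aa | Bb, A → Dc | CA, B → dA | e, C → fC | b, D → h | i), and the names first_sets, first_of_string, and EPSILON are hypothetical.

```python
EPSILON = ""   # stands for the empty string in a First set

GRAMMAR = {
    "S": [("A", "a"), ("B", "b")],
    "A": [("D", "c"), ("C", "A")],
    "B": [("d", "A"), ("e",)],
    "C": [("f", "C"), ("b",)],
    "D": [("h",), ("i",)],
}

def first_of_string(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in first:             # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:    # sym cannot vanish, stop here
            return result
    result.add(EPSILON)                  # every symbol can vanish
    return result

def first_sets(grammar):
    """Grow First(A) for every nonterminal A until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[lhs])
                first[lhs] |= first_of_string(rhs, first)
                changed |= len(first[lhs]) != before
    return first

FIRST = first_sets(GRAMMAR)
```

With these sets, first_of_string(("A", "a"), FIRST) gives {h, i, f, b} and first_of_string(("B", "b"), FIRST) gives {d, e}, matching the two prediction rules.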
  S → A c B
  A → a A | ε
  B → b B S | ε

* We predict A → a A when
  - Next token is a because First(aA) = {a}
* We predict A → ε when
  - Next token is c because Follow(A) = {c}
* We predict B → b B S when
  - Next token is b because First(b B S) = {b}
* We predict B → ε when
  - Next token is a, c, or $ (end-of-file token) because Follow(B) = {a, c, $}
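The Follow sets quoted above can be checked with a fixed-point computation. This is a sketch of the textbook algorithm, not necessarily the course's version; the grammar is S → A c B, A → a A | ε, B → b B S | ε, and the helper names and the use of "" for ε are assumptions.

```python
EPSILON, EOF = "", "$"

GRAMMAR = {
    "S": [("A", "c", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "B", "S"), ()],
}

def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        if sym not in first:                 # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:
            return result
    result.add(EPSILON)
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[lhs])
                first[lhs] |= first_of_string(rhs, first)
                changed |= len(first[lhs]) != before
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                   # end of input follows the start
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:   # only nonterminals have Follow
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPSILON}
                    if EPSILON in tail:      # the rest can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "S")
```

FOLLOW["A"] comes out as {c} and FOLLOW["B"] as {a, c, $}, the two sets used to predict the ε-productions.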
Example 2:
  E → T Q
  Q → + T Q | - T Q | ε
  T → F R
  R → * F R | / F R | ε
  F → ( E ) | id
LL(1) Grammars
* Not all context-free grammars are suitable for LL parsing
* CFGs suitable for LL(1) parsing are called LL(1) Grammars
* A grammar is LL(1) if for productions with the same LHS A
  - The Predict sets of those productions are pairwise disjoint
  - The rows are indexed by the nonterminals
  - The columns are indexed by the tokens
* If A is a nonterminal and tok is the lookahead token then
  - Table[A][tok] indicates which production to predict
  - If no production can be used Table[A][tok] gives an error value
* Table[A][tok] = A → α iff tok ∈ Predict(A → α)
* Example on constructing the LL(1) parsing table:

  1: S → A c B      Predict(1) = {a, c}
  2: A → a A        Predict(2) = {a}
  3: A → ε          Predict(3) = {c}
  4: B → b B S      Predict(4) = {b}
  5: B → ε          Predict(5) = {$, a, c}

        a    b    c    $
  S     1         1
  A     2         3
  B     5    4    5    5

  Empty slots indicate error conditions
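Filling the table from the Predict sets is mechanical, as the sketch below suggests. The names PRODUCTIONS, PREDICT, and build_ll1_table are hypothetical; the table is stored as a dict keyed by (nonterminal, token), and a missing key plays the role of an empty (error) slot. A second production landing in an occupied slot is reported as a conflict, meaning the grammar is not LL(1).

```python
PRODUCTIONS = {
    1: ("S", ("A", "c", "B")),
    2: ("A", ("a", "A")),
    3: ("A", ()),                 # A -> epsilon
    4: ("B", ("b", "B", "S")),
    5: ("B", ()),                 # B -> epsilon
}

PREDICT = {
    1: {"a", "c"},
    2: {"a"},
    3: {"c"},
    4: {"b"},
    5: {"$", "a", "c"},
}

def build_ll1_table(productions, predict):
    """Table[(A, tok)] = number of the production to predict."""
    table = {}
    for number, (lhs, _rhs) in productions.items():
        for tok in predict[number]:
            if (lhs, tok) in table:
                raise ValueError(
                    f"not LL(1): productions {table[(lhs, tok)]} and "
                    f"{number} both claim Table[{lhs}][{tok}]"
                )
            table[(lhs, tok)] = number
    return table

TABLE = build_ll1_table(PRODUCTIONS, PREDICT)
```

TABLE ends up with eight filled entries, for instance TABLE[("B", "$")] is 5, while ("S", "b") is absent and therefore an error slot.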
  LL(1) parsing table for Example 2 (entries are production numbers):

        +    -    *    /    (    )    id   $
  E                         1         1
  Q     2    3                   4         4
  T                         5         5
  R     8    8    6    7         8         8
  F                         9         10
* Because the above grammar is LL(1)
  - A unique production number is stored in a table entry
* Blank entries correspond to error conditions
  - In practice, special error numbers are used to indicate error situations
  Parser Stack      Input          Parser Action
  E                 id*(id+id)$    Predict E → T Q
  T Q               id*(id+id)$    Predict T → F R
  F R Q             id*(id+id)$    Predict F → id
  id R Q            id*(id+id)$    Match id
  R Q               *(id+id)$      Predict R → * F R
  * F R Q           *(id+id)$      Match *
  F R Q             (id+id)$       Predict F → ( E )
  ( E ) R Q         (id+id)$       Match (
  E ) R Q           id+id)$        Predict E → T Q
  T Q ) R Q         id+id)$        Predict T → F R
  F R Q ) R Q       id+id)$        Predict F → id
  id R Q ) R Q      id+id)$        Match id
  R Q ) R Q         +id)$          Predict R → ε
  Q ) R Q           +id)$          Predict Q → + T Q
  + T Q ) R Q       +id)$          Match +
  T Q ) R Q         id)$           Predict T → F R
  F R Q ) R Q       id)$           Predict F → id
  id R Q ) R Q      id)$           Match id
  R Q ) R Q         )$             Predict R → ε
  Q ) R Q           )$             Predict Q → ε
  ) R Q             )$             Match )
  R Q               $              Predict R → ε
  Q                 $              Predict Q → ε
  Empty             $              Accept
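A table-driven driver that reproduces this trace might look as follows. It is a sketch: PRODUCTIONS, TABLE, and ll1_parse are hypothetical names, the production numbering is the one used by the parsing table, and production 3 (Q → - T Q) assumes the operator that was lost from the slide text is a minus sign.

```python
PRODUCTIONS = {
    1: ("E", ["T", "Q"]),
    2: ("Q", ["+", "T", "Q"]),
    3: ("Q", ["-", "T", "Q"]),    # assumed: operator missing in the source
    4: ("Q", []),
    5: ("T", ["F", "R"]),
    6: ("R", ["*", "F", "R"]),
    7: ("R", ["/", "F", "R"]),
    8: ("R", []),
    9: ("F", ["(", "E", ")"]),
    10: ("F", ["id"]),
}

# TABLE[nonterminal][token] = production number; a missing key is an error.
TABLE = {
    "E": {"(": 1, "id": 1},
    "Q": {"+": 2, "-": 3, ")": 4, "$": 4},
    "T": {"(": 5, "id": 5},
    "R": {"+": 8, "-": 8, "*": 6, "/": 7, ")": 8, "$": 8},
    "F": {"(": 9, "id": 10},
}

def ll1_parse(tokens, start="E"):
    """Return the production numbers predicted while parsing tokens."""
    tokens = list(tokens) + ["$"]
    stack = [start]                       # top of the stack is the list's end
    pos = 0
    predicted = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top in TABLE:                  # nonterminal: Predict
            number = TABLE[top].get(tok)
            if number is None:
                raise SyntaxError(f"no production for {top} on {tok!r}")
            predicted.append(number)
            stack.extend(reversed(PRODUCTIONS[number][1]))
        elif top == tok:                  # token: Match
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
    if tokens[pos] != "$":
        raise SyntaxError("input remains after the stack emptied")
    return predicted                      # Accept
```

For the token sequence id * ( id + id ) the driver predicts productions 1, 5, 10, 6, 9, 1, 5, 10, 8, 2, 5, 10, 8, 4, 8, 4, which are the sixteen Predict steps of the trace in order.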
  Numbered productions referenced by the parsing table:

  1: E → T Q           6: R → * F R
  2: Q → + T Q         7: R → / F R
  3: Q → - T Q         8: R → ε
  4: Q → ε             9: F → ( E )
  5: T → F R          10: F → id
* A left recursive production puts an LL parser into an infinite loop
  - If a left recursive production is predicted then
      Nonterminal on LHS is replaced with RHS of production
      The same nonterminal will appear again on top of parser stack
      The same production is predicted again
      Iteration goes forever
* In general, if many immediate left recursive productions exist
  - General Form: A → A α1 | A α2 | … | A αn | β1 | β2 | … | βm
  - We introduce a new nonterminal and use right recursion
      A → β1 Atail | β2 Atail | … | βm Atail
      Atail → α1 Atail | α2 Atail | … | αn Atail | ε
* For example:
      Expr → Expr + Term | Expr - Term | Term
  becomes:
      Expr → Term Exprtail
      Exprtail → + Term Exprtail | - Term Exprtail | ε
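The transformation can be sketched as a small function. The name eliminate_immediate_left_recursion and the encoding (right-hand sides as tuples, () for ε, the new nonterminal named by appending "tail") are assumptions made for illustration; only immediate left recursion is handled, and the Expr example assumes the second operator is a minus sign.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... using a right-recursive Atail."""
    tail = nonterminal + "tail"
    # alpha parts: what follows the leading A in left-recursive alternatives
    recursive = [rhs[1:] for rhs in alternatives
                 if rhs and rhs[0] == nonterminal]
    # beta parts: alternatives that do not start with A
    others = [rhs for rhs in alternatives
              if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: list(alternatives)}   # nothing to do
    return {
        nonterminal: [beta + (tail,) for beta in others],
        tail: [alpha + (tail,) for alpha in recursive] + [()],
    }

EXPR = [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)]
RESULT = eliminate_immediate_left_recursion("Expr", EXPR)
```

RESULT maps Expr to the single alternative Term Exprtail, and Exprtail to + Term Exprtail, - Term Exprtail, and the empty alternative.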
* When productions of A share a common prefix, an LL(1) parser cannot predict which production to apply
* The solution is to use left factoring of the common prefix
  - General Form: A → α β1 | α β2 | … | α βn
  - Left Factoring solution:
      A → α Atail
      Atail → β1 | β2 | … | βn
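One pass of left factoring can be sketched as below. The function names, the tuple encoding of right-hand sides, and the if/then/else example grammar are all hypothetical. Alternatives are grouped by their first symbol, and each group of two or more is replaced by its longest common prefix followed by a new tail nonterminal. A single pass may leave common prefixes inside the tail alternatives, so the function would be reapplied to them if needed.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)

def left_factor(nonterminal, alternatives):
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[:1], []).append(rhs)   # group by first symbol
    result = {nonterminal: []}
    count = 0
    for key, group in groups.items():
        if len(group) == 1 or not key:
            result[nonterminal].extend(group)        # nothing to factor
            continue
        count += 1
        tail = f"{nonterminal}tail{count if count > 1 else ''}"
        prefix = common_prefix(group)
        result[nonterminal].append(prefix + (tail,))
        result[tail] = [rhs[len(prefix):] for rhs in group]
    return result

# Hypothetical example: the two "if" alternatives share a four-symbol prefix.
STMT = [
    ("if", "Cond", "then", "Stmt", "else", "Stmt"),
    ("if", "Cond", "then", "Stmt"),
    ("id",),
]
FACTORED = left_factor("Stmt", STMT)
```

FACTORED maps Stmt to if Cond then Stmt Stmttail and id, and Stmttail to else Stmt and the empty alternative, so one token of lookahead is enough again.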