Académique Documents
Professionnel Documents
Culture Documents
Where we are
T o t a l : = p r i c e + t a x ;
Lexical analyzer
Total
:=
price
tax
assignment id := id Expr + id
Parser
Grammars
How do you define a language? It is unlikely that you can enumerate all the sentences, programs, or documents
4/14/2014
More formally:
Words ={a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes} Rules:
1) 2) 3) 4) 5) 6) 7) SENTENCE SUBJECT VERB OBJECT SUBJECT ARTICLE ADJECTIVE NOUN OBEJCT ARTICLE ADJECTIVE NOUN ARTICLE a | the| EMPTY ADJECTIVE brown | friendly | good | EMPTY NOUN book| refrigerator | dog| car VERB sings | eats | likes
4/14/2014
Derivation of a sentence
Derivation of a sentence the brown dog likes a good car
SENTENCE SUBJECT VERB ARTICLE ADJECTIVE NOUN VERB the brown dog VERB the brown dog likes the brown dog likes OBJECT OBJECT OBJECT ARTICLE ADJECTIVE NOUN a good car
Rules:
1) 2) 3) 4) 5) 6) 7) SENTENCE SUBJECT VERB OBJECT SUBJECT ARTICLE ADJECTIVE NOUN OBEJCT ARTICLE ADJECTIVE NOUN ARTICLE a | the| EMPTY ADJECTIVE brown | friendly | good | EMPTY NOUN book| refrigerator | dog| car VERB sings | eats | likes
4/14/2014 6
SUBJECT
VERB
OBJECT
ARTICLE
ADJ
NOUN
ARTICLE
ADJ
NOUN
The
brown
dog
likes
good
car
4/14/2014
SENTENCE
SUBJECT
VERB
OBJECT
ARTICLE
ADJ
NOUN
ARTICLE
ADJ
NOUN
The
brown
dog
likes
good
car
4/14/2014
Types of parsers
Top down
Repeatedly rewrite the start symbol Find the left-most derivation of the input string Easy to implement
Bottom up
Start with the tokens and combine them to form interior nodes of the parse tree Find a right-most derivation of the input string Accept when the start symbol is reached
4/14/2014
4/14/2014
10
Recursive definition
Number of sentence can be generated:
ARTICLE 3* ADJ 4* NOUN 4* VERB 3* ARTICLE 3* ADJ 4* NOUN 4* sentences = 6912
How can we define an infinite language with a finite set of words and finite set of rules? Using recursive rules:
SUBJECT/OBJECT can have more than one adjectives:
1) SUBJECT ARTICLE ADJECTIVES NOUN 2) OBEJCT ARTICLE ADJECTIVES NOUN 3) ADJECTIVES ADJECTIVE | ADJECTIVES ADJETIVE
Example sentence:
the good brown dog likes a good friendly book
4/14/2014
11
Chomsky hierarchy
Noam Chomsky hierarchy is based on the form of production rules General form
1 2 3 n 1 2 3 m Where and are from terminals and non terminals, or empty.
4/14/2014
13
Chomsky hierarchy
Unrestricted Context-sensitive Context-free Regular
Chomsky level 1 (Context sensitive grammar) Chomsky level 2 (Context free grammar) Chomsky level 3 (Regular grammar/expression)
The more powerful the grammar, the more complex the program required to recognize the legal inputs.
4/14/2014 14
Grammar examples
Regular grammar
L=,w|w consists of arbitrary number of as and bsGrammar: Sa | b | a S| bS Example sentence: abb is a legal sentence in this language Derivation:
SaS abS abb
4/14/2014
15
4/14/2014
16
4/14/2014
17
Recognize blocks
{ statement1; { statement2; statement3; } }
4/14/2014
18
Grammars
Formal definition, Chomsky hierarchy Regular grammar Context free grammar
4/14/2014
19
Lexical analyzer
syntax analyzer
Semantic analyzer
generate code
improve code
object code
Synthesis
20
21
Consider Lpal ={w|w is a palindrome on symbols 0 and 1} Example: 1001, 11 are palindromes How to represent palindrome of any length? Basis: , 0 and 1 are palindromes;
1. P 2. P0 3. P1
0 and 1 are terminals P is a variable (or nonterminal, or syntactic category) P is also the start symbol. 1-5 are productions (or rules)
23
Example:
Gpal=({0,1}, {P}, A,P), where A ={P, P0, P1, P0P0, P1P1}
24
Chomsky level 1 (Context sensitive grammar) Chomsky level 2 (Context free grammar) Chomsky level 3 (Regular expression)
27
BNF format
lhs ::= rhs Quote terminals or non-terminals
<> to delimit non-terminals, bold or quotes for terminals, or as is
vertical bar for alternatives There are many different notations, here is an example
opt-stats ::= stats-list | EMPTY . stats-list ::= statement | statement ; stats-list .
28
EBNF
An extension of BNF, use regular- expression-like constructs on the righthand-side of the rules:
write [A] to say that A is optional
Write {A} or A* to denote 0 or more repetitions of A (ie. the Kleene closure of A).
Using EBNF has the advantage of simplifying the grammar by reducing the need to use recursive rules. Both BNF and EBNF are equivalent to CFG Example: BNF
block ::= ,' opt-stats - opt-stats ::= stats-list | EMPTY stats-list ::= statement | statement ; stats-list
EBNF
block ::= , *stats-list+ - stats-list ::= statement ( ; statement )*
29
1
0
Context-Sensitive
Unrestricted (or Free)
Linear-Bounded Automaton
Turing Machine
Closer to machine
More expressive
30
the | means OR So the rules can be simplified as: (P1-3) Exp exp + digit | exp - digit | digit (p4) digit0|1|2|3|4|5|6|7|8|9
31
Derivation
Example of derivation
We can derive the string 9 - 5 + 2 as follows:
(P1 : exp exp + digit) (P2 : exp exp digit) (P3 : exp digit) (P4 : digit 9) (P4 : digit 5) (P4 : digit 2)
exp exp + digit exp - digit + digit digit - digit + digit 9 - digit + digit 9 - 5 + digit 9-5+2 exp + 9 - 5 + 2 exp * 9 - 5 + 2
33
Parse tree
This derivation can also be represented via a Parse Tree.
(p1) expexp+digit (p2) expexp-digit (p3) expdigit (p4) digit0|1|2|3|4|5|6|7|8| 9
exp digit 9
digit
5
35
Tree terminologies
Trees are collections of nodes, with a parent-child relationship.
A node has at most one parent, drawn above the node; A node has zero or more children, drawn below the node.
There is one node, the root, that has no parent. This node appears at the top of the tree Nodes with no children are called leaves. Nodes that are not leaves are interior nodes.
36
37
38
39
Applications of CFG
Parser and parser generator; Markup languages.
40
Ambiguity of Grammar
What is ambiguous grammar; How to remove ambiguity;
41
Derivations of 9+5*2
expr expr + expr expr + expr * expr + 9+5*2 expr expr*expr expr+expr*expr + 9+5*2
Ambiguity of a grammar
Ambiguous grammar: produce more than one parse tree
expr expr + expr | exp * expr | digit
expr expr expr 9 (9 + + expr 5 5) * expr 2 expr 9 9 + + (5 expr 5 * * 2) expr expr expr 2
* 2
Based on the existence of two derivations, we can not deduce that the grammar is ambiguous; It is not the multiplicity of derivations that causes ambiguity; It is the existence of more than one parse tree. In this example, the two derivations will produce the same tree
E E D 3 + E D 4
44
A string has two parser trees iff it has two distinct leftmost derivations (or two rightmost derivations).
45
Remove ambiguity
Some theoretical results (bad news)
Is there an algorithm to remove the ambiguity in CFG?
the answer is no
In practice, there are well-known techniques to remove ambiguity Two causes of the ambiguity in the expr grammar
the precedence of operator is not respected. * should be grouped before +; a sequence of identical operator can be grouped either from left or from right. 3+4+5 can be grouped either as (3+4)+5 or 3+(4+5).
46
Remove ambiguity
Enforcing precedence by introducing several different variables, each represents those expressions that share a level of binding strength
factor: digit is a factor term: factor * factor*factor .... is a term expression: term+term+term ... is an expression
E
T
T
+ F D *
F
D
47
if
expr
then if stmt
if
expr if
then expr
stmt
else then
stmt stmt
48
49
stmt unmatched_stmt
if
expr
then
stmt