
3. Syntax Analysis

Syntax: the way in which tokens are put together to form expressions, statements, or blocks of statements.

The rules governing the formation of statements in a programming language.

Syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax.

Parsing: to break a sentence down into its component parts with an explanation of the form, function, and syntactical relationship of each part.

The syntax of a programming language is usually given by the grammar rules of a context-free grammar (CFG).

Role of a Parser

The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a CFG.

If it does, the parser creates the parse tree of that program.

Otherwise, the parser gives error messages.

A CFG gives a precise syntactic specification of a programming language.

A grammar can be directly converted into a parser by some tools (such as yacc).

Role of a Parser

[Figure: the lexical analyzer reads the source program one character at a time ("get next char" / "next char") and supplies tokens to the syntax analyzer on request ("get next token" / "next token"). The syntax analyzer outputs the parse tree. Both use the symbol table, which contains a record for each identifier. The lexical analyzer reports lexical errors and the syntax analyzer reports syntax errors.]

Parser

Parsers can be categorized into two groups:

Top-down parser

The parse tree is created top to bottom, starting from the root and working toward the leaves.

Bottom-up parser

The parse tree is created bottom to top, starting from the leaves and working toward the root.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

Efficient top-down and bottom-up parsers can be implemented by making use of context-free grammars:

LL for top-down parsing

LR for bottom-up parsing

Context-free grammar (CFG)

A context-free grammar is a specification for the syntactic structure of a programming language.

A context-free grammar is a 4-tuple:

G = (T, N, P, S) where

T is a finite set of terminals (a set of tokens)

N is a finite set of non-terminals (syntactic variables)

P is a finite set of productions of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)

S ∈ N is a designated start symbol (one of the non-terminal symbols)

Example: grammar for simple arithmetic expressions

expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

Terminal symbols: id + - * / ( )

Non-terminals: expression, term, factor

Start symbol: expression
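The 4-tuple definition can be written down directly as data. The sketch below is only an illustration: the names Grammar, is_well_formed and ARITH are hypothetical, and productions are assumed to be stored as (head, body) pairs. It encodes the arithmetic grammar above and checks the side conditions of the definition (T and N disjoint, S in N, every head in N, every body symbol in T or N).

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """G = (T, N, P, S); field names are illustrative, not from the slides."""
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    productions: tuple        # P, as (head, body) pairs; body is a tuple of symbols
    start: str                # S


def is_well_formed(g: Grammar) -> bool:
    """Check the side conditions of the 4-tuple definition."""
    symbols = g.terminals | g.nonterminals
    return (
        g.terminals.isdisjoint(g.nonterminals)
        and g.start in g.nonterminals
        and all(head in g.nonterminals for head, _ in g.productions)
        and all(sym in symbols for _, body in g.productions for sym in body)
    )


# The simple arithmetic expression grammar from the example.
ARITH = Grammar(
    terminals=frozenset({"id", "+", "-", "*", "/", "(", ")"}),
    nonterminals=frozenset({"expression", "term", "factor"}),
    productions=(
        ("expression", ("expression", "+", "term")),
        ("expression", ("expression", "-", "term")),
        ("expression", ("term",)),
        ("term", ("term", "*", "factor")),
        ("term", ("term", "/", "factor")),
        ("term", ("factor",)),
        ("factor", ("(", "expression", ")")),
        ("factor", ("id",)),
    ),
    start="expression",
)
```

An empty body tuple would stand for a production whose right-hand side is the empty string.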

Notational Conventions Used

Terminals:

Lowercase letters early in the alphabet, such as a, b, c.

Operator symbols such as +, *, and so on.

Punctuation symbols such as parentheses, comma, and so on.

The digits 0, 1, ..., 9.

Boldface strings such as id or if, each of which represents a single terminal symbol.

Non-terminals:

Uppercase letters early in the alphabet, such as A, B, C.

The letter S, which is usually the start symbol.

Lowercase, italic names such as expr or stmt.

Uppercase letters may be used to represent non-terminals for the constructs; for example, expr, term, and factor are represented by E, T, and F.

Notational Conventions Used

Grammar symbols

Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals.

Strings of terminals.

Lowercase letters late in the alphabet, mainly u, v, x, y ∈ T*

Strings of grammar symbols.

Lowercase Greek letters, α, β, γ ∈ (N ∪ T)*

A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written

A → α1 | α2 | ... | αk

α1, α2, ..., αk are the alternatives for A.

The head of the first production is the start symbol.

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id

Derivation

A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.

Example:

E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id

E ⇒ E + E means that E + E is derived from E

- we can replace E by E + E

- we have to have a production rule E → E + E in our grammar.

E ⇒ E + E ⇒ id + E ⇒ id + id. Such a sequence of replacements of non-terminal symbols is called a derivation of id + id from E.

In general, the one-step derivation is defined by

α A β ⇒ α γ β if there is a production rule A → γ in our grammar,

where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn is derived from α1, or α1 derives αn)

Derivation

If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.

Example: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Example: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

A top-down parser tries to find the left-most derivation of the given source program.

A bottom-up parser tries to find the right-most derivation of the given source program in reverse order.
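The two examples differ only in which non-terminal is replaced at each step. The following sketch makes that explicit for the single non-terminal E; the function names and the list-of-symbols representation of a sentential form are assumptions made for illustration.

```python
# Right-hand sides of E in the derivation example:
# E -> E+E | E-E | E*E | E/E | -E | (E) | id
BODIES_OF_E = {
    ("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"),
    ("-", "E"), ("(", "E", ")"), ("id",),
}


def derive_step(form, body, leftmost=True):
    """One derivation step: replace the left-most (or right-most) E by `body`."""
    if tuple(body) not in BODIES_OF_E:
        raise ValueError(f"E -> {' '.join(body)} is not a production")
    positions = [i for i, sym in enumerate(form) if sym == "E"]
    if not positions:
        raise ValueError("no non-terminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + list(body) + form[i + 1:]


def derive(bodies, leftmost=True):
    """Apply the given right-hand sides in order, starting from E."""
    form = ["E"]
    steps = ["".join(form)]
    for body in bodies:
        form = derive_step(form, body, leftmost)
        steps.append("".join(form))
    return steps


BODIES = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
LEFT = derive(BODIES, leftmost=True)
RIGHT = derive(BODIES, leftmost=False)
```

With the same list of right-hand sides, LEFT reproduces the left-most derivation of -(id+id) and RIGHT the right-most one; they differ only in the fifth sentential form.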

Parse tree

A parse tree is a graphical representation of a derivation.

It filters out the order in which productions are applied to replace non-terminals.

A parse tree corresponding to a derivation is a labeled tree in which:

the interior nodes are labeled by non-terminals,

the leaf nodes are labeled by terminals, and

the children of each internal node represent the replacement of the associated non-terminal in one step of the derivation.
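A parse tree of this kind can be held as nested (label, children) pairs. In the sketch below, which is an illustration with hypothetical helper names, leaves are nodes without children; tree_yield reads the leaves left to right, and productions_used lists the productions in pre-order, which is the order a left-most derivation applies them.

```python
def node(label, *children):
    """A tree node: interior nodes carry children, leaves carry none."""
    return (label, list(children))


def tree_yield(tree):
    """Leaf labels from left to right, i.e. the derived sentence."""
    label, children = tree
    if not children:
        return [label]
    out = []
    for child in children:
        out.extend(tree_yield(child))
    return out


def productions_used(tree):
    """Productions in pre-order, the order of a left-most derivation."""
    label, children = tree
    if not children:
        return []
    used = [(label, tuple(child[0] for child in children))]
    for child in children:
        used.extend(productions_used(child))
    return used


# Parse tree for -(id + id) under E -> E + E | E * E | ( E ) | - E | id
TREE = node(
    "E",
    node("-"),
    node(
        "E",
        node("("),
        node("E", node("E", node("id")), node("+"), node("E", node("id"))),
        node(")"),
    ),
)
```

The tree itself does not record whether the left or the right id was expanded first, which is the sense in which it filters out the order of replacement.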

Parse tree and Derivation

Grammar:

E → E + E | E * E | ( E ) | - E | id

Let's examine this derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)

[Figure: the parse tree grown step by step alongside the derivation, ending with the tree below.]

        E
       / \
      -   E
        / | \
       (  E  )
        / | \
       E  +  E
       |     |
       id    id

This is a top-down derivation because we start building the parse tree at the top.

Exercise

a) Using the grammar below, draw a parse tree for the following string:

S → E
E → id
  | ( E . E )
  | ( L )
  | ( )
L → L E
  | E

( ( id . id ) id ( id ) ( ( ) ) )

b) Give a rightmost derivation for the string given in (a).

Ambiguity

A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar.

Equivalently, it produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.

We should eliminate the ambiguity in the grammar during the design phase of the compiler. An unambiguous grammar should be written to eliminate the ambiguity.

Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.

Ambiguity: Example

Example: The arithmetic expression grammar

E → E + E | E * E | ( E ) | id

permits two distinct leftmost derivations for the sentence id + id * id:

(a) E ⇒ E + E
      ⇒ id + E
      ⇒ id + E * E
      ⇒ id + id * E
      ⇒ id + id * id

(b) E ⇒ E * E
      ⇒ E + E * E
      ⇒ id + E * E
      ⇒ id + id * E
      ⇒ id + id * id

Ambiguity: example

According to the grammar, both are correct.

A grammar that produces more than one parse tree for any input sentence is said to be an ambiguous grammar.
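For this particular grammar the number of parse trees of a sentence can be counted mechanically. The sketch below is a hand-written counter for E → E + E | E * E | ( E ) | id only, not a general ambiguity test; the function name and the token-list input format are assumptions.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # E -> ( E )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E: try every operator as the top-level one
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For example, count_parse_trees("id + id * id".split()) gives 2, one tree per leftmost derivation (a) and (b), while the parenthesized sentence ( id + id ) * id has exactly one.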

Elimination of ambiguity

Precedence/Associativity

These two derivations point out a problem with the grammar:

The grammar does not have a notion of precedence, or implied order of evaluation.

To add precedence:

Create a non-terminal for each level of precedence.

Isolate the corresponding part of the grammar.

Force the parser to recognize high-precedence subexpressions first.

For algebraic expressions:

Multiplication and division first (level one).

Subtraction and addition next (level two).

To add associativity:

Left-associative: the next-level (higher-precedence) non-terminal is placed at the end of a production.

Elimination of ambiguity

To disambiguate the grammar

E → E + E | E * E | ( E ) | id

we can use the precedence of operators as follows:

* higher precedence (left associative)
+ lower precedence (left associative)

We get the following unambiguous grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

Example sentence: id + id * id

Left Recursion

Consider the grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

A top-down parser might loop forever when parsing an expression using this grammar.

[Figure: the leftmost E is expanded again and again with E → E + T, giving the trees E, E + T, E + T + T, and so on.]

Elimination of Left recursion

A grammar is left recursive if it has a non-terminal A such that there is a derivation

A ⇒+ Aα for some string α.

Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.

To eliminate left recursion for a single production:

A → Aα | β could be replaced by the non-left-recursive productions

A → β A'

A' → α A' | ε
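The replacement rule can be applied mechanically to all the alternatives of one non-terminal. The sketch below handles immediate left recursion only; the function name, the tuple representation of bodies, and the use of an appended apostrophe for the new non-terminal are assumptions, and it presumes that every α is non-empty and that the primed name is not already in use.

```python
EPSILON = ()  # the empty string is represented by an empty body


def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A' and A' -> a A' | epsilon."""
    # alpha parts: bodies that start with the head itself, with the head removed
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # beta parts: all other bodies
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: list(bodies)}  # nothing to do
    new_head = head + "'"
    return {
        head: [beta + (new_head,) for beta in others],
        new_head: [alpha + (new_head,) for alpha in recursive] + [EPSILON],
    }
```

Called with head "E" and bodies [("E", "+", "T"), ("T",)], it returns E → T E' and E' → + T E' | ε.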

Elimination of Left recursion

This left-recursive grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

can be rewritten to eliminate the immediate left recursion:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Exercise: Parse id + id * id using the non-left-recursive grammar above, using a left-most derivation.

Top-Down and Bottom-Up Parsers

Top-down parsers:

Start constructing the parse tree at the top (root) of the tree and move down towards the leaves.

Easy to implement by hand, but work with restricted grammars.

Example: recursive descent parser

Bottom-up parsers:

Build the nodes on the bottom of the parse tree first.

Suitable for automatic parser generation; handle a larger class of grammars.

Examples: shift-reduce parsers (or LR(k) parsers)

Top-down (LL) parsing

Recursive Descent Parsing (RDP)

This method of top-down parsing can be considered as an attempt to find the left-most derivation for an input string. It may involve backtracking.

To construct the parse tree using RDP:

o We create a one-node tree consisting of S.

o Two pointers, one for the tree and one for the input, are used to indicate where the parsing process is. Initially, they are on S and on the first input symbol, respectively.

o Then we use the first S-production to expand the tree. The tree pointer is positioned on the left-most symbol of the newly created sub-tree.

o As long as the symbol pointed to by the tree pointer matches the symbol pointed to by the input pointer, both pointers are moved to the right.

o Whenever the tree pointer points to a non-terminal, we expand it using the first production of that non-terminal.

o Whenever the pointers point to different terminals, the production that was used is not correct, so another production should be used. We have to go back to the step just before we replaced the non-terminal and use another production.

o If we reach the end of the input and the tree pointer passes the last symbol of the tree, we have finished parsing successfully.

RDP

Example:

G:

S → cAd
A → ab | a

Draw the parse tree for the input string cad using the above method.
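The method can be tried out on this grammar with a small backtracking parser. The sketch below is one possible rendering of the idea, not the only one: productions are tried in the order listed, the generators stand in for the "go back and use another production" step, and all names (GRAMMAR, parse, expand, recursive_descent) are hypothetical. Terminals are assumed to be the symbols that do not appear as keys of GRAMMAR.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield (tree, next_pos) for every way `symbol` can match text at pos."""
    if symbol not in GRAMMAR:                 # terminal: must match the input
        if text.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:              # try the productions in order
        yield from expand(symbol, body, 0, text, pos, [])


def expand(head, body, index, text, pos, children):
    """Match body[index:] from pos, collecting the sub-trees built so far."""
    if index == len(body):
        yield (head, children), pos
        return
    for subtree, next_pos in parse(body[index], text, pos):
        # If a later symbol fails, the loop resumes here with the next
        # alternative for body[index]: this is the backtracking step.
        yield from expand(head, body, index + 1, text, next_pos,
                          children + [subtree])


def recursive_descent(text, start="S"):
    """Return a parse tree covering the whole input, or None."""
    for tree, end in parse(start, text, 0):
        if end == len(text):
            return tree
    return None
```

For the input cad, the parser first tries A → ab, fails on b, backtracks, and succeeds with A → a. A left-recursive grammar would make parse call itself forever at the same input position, which is why left recursion has to be removed first.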

Exercise:

Consider the following grammar:

S → A
A → A + A | B++
B → y

Draw the parse tree for the input y+++y++

Homework:

Convert the grammar into a non-left-recursive grammar and draw the parse tree using RDP.

Exercise

Using the grammar below, draw a parse tree for the following string using the RDP algorithm:

S → E
E → id
  | ( E . E )
  | ( L )
  | ( )
L → L E
  | E

( ( id . id ) id ( id ) ( ( ) ) )

Bottom-Up (LR) Parser

A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree.

The reduction steps trace a rightmost derivation in reverse.

Consider the grammar:

S → aABe
A → Abc | b
B → d

We want to parse the input string abbcde.

Bottom-Up Parser: Simulation

Productions: S → aABe, A → Abc, A → b, B → d

Step   INPUT            Reduction applied
1      a b b c d e $    (none yet)
2      a A b c d e $    A → b
3      a A d e $        A → Abc
4      a A B e $        B → d
5      S $              S → aABe

At step 2 the remaining b is not reduced with A → b in this example. A parser that did reduce it would get stuck and then backtrack!

OUTPUT: the parse tree is built from the bottom up: first A over b, then A over A b c, then B over d, and finally S over a A B e.

This parser is known as an LR parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.

Bottom-up parser (LR parsing)

S → aABe
A → Abc | b
B → d

Reduction sequence: abbcde, aAbcde, aAde, aABe, S

At each step, we have to find α such that α is a substring of the sentence, and replace α by A, where A → α.

Stack implementation of shift/reduce parsing

In LR parsing the two major problems are:

locate the substring that is to be reduced

locate the production to use

A shift/reduce parser operates:

By shifting zero or more input symbols onto the stack until the right side of the handle is on top of the stack.

The parser then replaces the handle by the non-terminal of the production.

This is repeated until the start symbol is on the stack and the input is empty, or until an error is detected.

Stack implementation of shift/reduce parsing

Four actions are possible:

shift: the next input symbol is shifted onto the top of the stack

reduce: the parser knows the right end of the handle is at the top of the stack. It should then decide what non-terminal should replace that substring

accept: the parser announces successful completion of parsing

error: the parser discovers a syntax error
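The four actions can be seen in a small simulation for the grammar S → aABe, A → Abc | b, B → d. This is only a sketch: a real LR parser consults a parsing table to choose between shift and reduce, whereas the hypothetical shift_reduce function below simply tries a reduction first and backtracks if that choice leads to a dead end. Grammar symbols are assumed to be single characters so that the stack can be kept as a string.

```python
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def shift_reduce(text, start="S"):
    """Return rows of (stack, remaining input, action), or None on error."""
    failed = set()  # (stack, position) pairs already known to be dead ends

    def search(stack, pos):
        # accept: only the start symbol is on the stack and the input is used up
        if stack == start and pos == len(text):
            return [(stack, "$", "accept")]
        if (stack, pos) in failed:
            return None
        # reduce: the body of a production is on top of the stack
        for head, body in PRODUCTIONS:
            if stack.endswith(body):
                rest = search(stack[: len(stack) - len(body)] + head, pos)
                if rest is not None:
                    row = (stack, text[pos:] + "$", f"reduce {head} -> {body}")
                    return [row] + rest
        # shift: move the next input symbol onto the stack
        if pos < len(text):
            rest = search(stack + text[pos], pos + 1)
            if rest is not None:
                return [(stack, text[pos:] + "$", "shift")] + rest
        # error: no action leads to acceptance from this configuration
        failed.add((stack, pos))
        return None

    return search("", 0)
```

For abbcde the returned rows contain the reductions A → b, A → Abc, B → d, S → aABe in that order, which is the rightmost derivation of abbcde read backwards. The reduction of the second b by A → b is attempted and abandoned internally, so it does not appear in the result.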

Syntax error handling

If a compiler had to process only correct programs, its design and implementation would be simplified greatly. However, a compiler is expected to assist the programmer in locating and tracking down errors that inevitably creep into programs, despite the programmer's best efforts.

What if spoken languages had the same requirements for syntactic accuracy as computer languages?

Syntax error handling

Common programming errors can occur at different levels:

Lexical errors include missing quotes around text and misspellings of keywords or operators, e.g., ebigin instead of begin.

Syntactic errors include misplaced or missing semicolons, extra braces { }, and case without switch.

Semantic errors include type mismatches between operators and operands, i.e., an operator applied to an incompatible operand.

Logical errors can be anything arising from incorrect reasoning, e.g., the assignment operator = used instead of the comparison operator ==.

Syntax error handling

The error handler should be written with the following goals in mind:

Errors should be reported clearly and accurately.

It should report the place of the error.

It should also report the type of the error.

The compiler should recover from common errors efficiently and detect other errors.

E.g., add missing semicolons.

It should not slow down the whole process significantly.

Add minimal overhead to the processing of correct programs.

Syntax error handling

There are four main strategies in error handling:

Panic mode error recovery: discards all tokens until a synchronization token (like ; and { or }) is found.

Phrase level recovery: the parser makes a local correction so that it can continue to parse the rest of the input. E.g., replace a comma by a semicolon, or delete or insert a semicolon.

Error productions: augment the grammar to capture the most common errors that programmers make.

Global correction: makes as few changes as possible in the program so that a globally least-cost corrected program is obtained.
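Panic mode is the simplest of the four to sketch. The helper below is hypothetical: it assumes the tokens are already in a list and that parsing resumes just after the synchronization token, which is one common choice rather than a rule stated in the slides.

```python
SYNC_TOKENS = {";", "{", "}"}


def panic_mode_skip(tokens, pos):
    """Discard tokens from `pos` until a synchronization token is found.

    Returns the index at which parsing resumes and the discarded tokens.
    """
    discarded = []
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        discarded.append(tokens[pos])
        pos += 1
    if pos < len(tokens):
        pos += 1  # step over the synchronization token itself (an assumption)
    return pos, discarded
```

If the parser detects an error at index 2 of x = + * 3 ; y = 1 ;, the call panic_mode_skip(tokens, 2) discards + * 3 and resumes at y, so the second statement is still checked.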

The Parser Generator: Yacc

Yacc stands for "yet another compiler-compiler".

Yacc: a tool for automatically generating a parser given a grammar written in a yacc specification (.y file).

The Yacc parser calls the lexical analyzer to collect tokens from the input stream.

Tokens are organized using grammar rules.

When a rule is recognized, its action is executed.

Note: lex tokenizes the input and yacc parses the tokens, taking the right actions, in context.

[Figure: Scanner, Parser, Lex and Yacc]

Yacc

There are four steps involved in creating a compiler in Yacc:

1. Specify the grammar:

   Write the grammar in a .y file (also specify here the actions that are to be taken, in C).

   Write a lexical analyzer to process input and pass tokens to the parser. This can be done using Lex.

   Write a function that starts parsing by calling yyparse().

   Write error handling routines (like yyerror()).

2. Generate a parser from Yacc by running Yacc over the grammar file.

3. Compile the code produced by Yacc as well as any other relevant source files.

4. Link the object files to the appropriate libraries for the executable parser.

Writing a Grammar in Yacc

Productions in Yacc are of the form:

Nonterminal : tokens/nonterminals   { action }
            | tokens/nonterminals   { action }
            ;

Tokens that are single characters can be used directly within productions, e.g. '+'

Named tokens must be declared first in the declaration part using

%token TokenName

Synthesized Attributes

Semantic actions may refer to values of the synthesized attributes of terminals and non-terminals in a production:

X : Y1 Y2 Y3 ... Yn   { action }

$$ refers to the value of the attribute of X

$i refers to the value of the attribute of Yi

For example:

factor : '(' expr ')'   { $$ = $2; }

[Figure: parse tree for factor → ( expr ), with expr.val = x at the child and factor.val = x at the parent, as set by $$ = $2.]

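Lex Yacc interaction

[Figure: Yacc translates calc.y into y.tab.c (which defines yyparse()) and y.tab.h. Lex translates calc.l into lex.yy.c (which defines yylex()). gcc compiles y.tab.c and lex.yy.c into a.out, which reads the input and produces the compiled output.]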

Lex Yacc interaction

If lex is to return tokens that yacc will process, they have to agree on what tokens there are. This is done as follows:

The yacc file will have token definitions such as %token INTEGER in the definitions section.

When the yacc file is translated with yacc -d, a header file y.tab.h is created that has definitions like #define INTEGER 258

This file can then be included in both the lex and yacc programs.

The lex file can then execute return INTEGER, and the yacc program can match on this token.

Example: Simple calculator: yacc file

%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
#define YYSTYPE int     /* int type for attributes and yylval */
%}
%token INTEGER
%%
/* grammar rules, each followed by its action */
program:
        program expr '\n'       { printf("%d\n", $2); }
        |
        ;
expr:
        INTEGER                 { $$ = $1; }        /* value of a token on the RHS, stored in yylval */
        | expr '+' expr         { $$ = $1 + $3; }   /* $$ is the value of the LHS (expr) */
        | expr '-' expr         { $$ = $1 - $3; }
        ;
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}
int main(void) {
    yyparse();          /* the lexical analyzer is invoked by the parser */
    return 0;
}

Example: Simple calculator: lex file

%{
/* The lex program matches numbers and operators and returns them. */
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"      /* generated by yacc, contains #define INTEGER 258 */
void yyerror(char *);
extern int yylval;      /* defined in y.tab.c */
%}
%%
[0-9]+      { yylval = atoi(yytext);    /* place the integer value on the stack */
              return INTEGER; }
[-+*/\n]    return *yytext;             /* operators will be returned */
[ \t]       ;                           /* skip white space */
.           yyerror("invalid character");
%%
int yywrap(void) {
    return 1;
}
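To see what the lex rules do without running lex, the same four patterns can be tried in a few lines of Python. This is only an analogy: the function name tokenize and the (kind, value) pairs are assumptions, and unlike the lex file, which calls yyerror and carries on, this sketch raises an exception on an invalid character.

```python
import re

# One alternative per lex rule, tried in the same order as in the lex file.
TOKEN_RE = re.compile(
    r"(?P<INTEGER>[0-9]+)|(?P<OP>[-+*/\n])|(?P<SKIP>[ \t])|(?P<BAD>.)"
)


def tokenize(text):
    """Yield (kind, value) pairs roughly as the lex rules would return tokens."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "INTEGER":
            yield "INTEGER", int(match.group())   # value plays the role of yylval
        elif kind == "OP":
            yield match.group(), None             # the character is its own token
        elif kind == "BAD":
            raise ValueError(f"invalid character {match.group()!r}")
        # SKIP: white space produces no token
```

For the input line 2+3 followed by a newline, the tokens are INTEGER with value 2, '+', INTEGER with value 3, and the newline character.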


Lex and Yacc: compile and run

[compiler@localhost yacc]$ vi calc.l
[compiler@localhost yacc]$ vi calc.y
[compiler@localhost yacc]$ yacc -d calc.y
yacc: 4 shift/reduce conflicts.
[compiler@localhost yacc]$ lex calc.l
[compiler@localhost yacc]$ ls
a.out  calc.l  calc.y  lex.yy.c  typescript  y.tab.c  y.tab.h
[compiler@localhost yacc]$ gcc y.tab.c lex.yy.c
[compiler@localhost yacc]$ ls
a.out  calc.l  calc.y  lex.yy.c  typescript  y.tab.c  y.tab.h
[compiler@localhost yacc]$ ./a.out
2+3
5
23+8+
Invalid charachter
syntax error

Example: Simple calculator: yacc file, option 2

%{
#include <stdlib.h>
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program :
        program expr '\n'       { printf("%d\n", $2); }
        |
        ;
expr    : expr '+' mulexpr      { $$ = $1 + $3; }
        | expr '-' mulexpr      { $$ = $1 - $3; }
        | mulexpr               { $$ = $1; }
        ;
mulexpr : mulexpr '*' term      { $$ = $1 * $3; }
        | mulexpr '/' term      { $$ = $1 / $3; }
        | term                  { $$ = $1; }
        ;
term    : '(' expr ')'          { $$ = $2; }
        | INTEGER               { $$ = $1; }
        ;
%%
void yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
}

int main(void)
{
    yyparse();
    return 0;
}
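The expr / mulexpr / term layering is what gives * and / a higher precedence than + and -. The sketch below is a hand-written recursive descent evaluator for the same three levels, written in Python rather than generated by Yacc; the left recursion of each rule is replaced by a loop, which keeps the operators left-associative. All names are assumptions, division truncates toward zero to imitate C integer division, and the simple tokenizer silently ignores any character it does not recognize.

```python
import re


def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    return re.findall(r"[0-9]+|[-+*/()]", text)


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():        # expr : expr '+' mulexpr | expr '-' mulexpr | mulexpr
        value = mulexpr()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = mulexpr()
            value = value + rhs if op == "+" else value - rhs
        return value

    def mulexpr():     # mulexpr : mulexpr '*' term | mulexpr '/' term | term
        value = term()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = term()
            value = value * rhs if op == "*" else int(value / rhs)
        return value

    def term():        # term : '(' expr ')' | INTEGER
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

evaluate("2+3*4") gives 14 because 3*4 is completed inside mulexpr before expr adds 2, and an incomplete input such as 23+8+ raises SyntaxError.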

Calculator 2: Example yacc file

%{
#include <stdio.h>
int yylex(void);
int yyerror(char *);
int sym[26];            /* sym holds the value of the associated variable */
%}
%token INTEGER VARIABLE
%left '+' '-'           /* associativity and precedence rules */
%left '*' '/'
%%
program :
        program statement '\n'
        |
        ;
statement :
        expression                      { printf("%d\n", $1); }
        | VARIABLE '=' expression       { sym[$1] = $3; }
        ;
expression :
        INTEGER                         { $$ = $1; }
        | VARIABLE                      { $$ = sym[$1]; }
        | expression '+' expression     { $$ = $1 + $3; }
        | expression '-' expression     { $$ = $1 - $3; }
        | expression '*' expression     { $$ = $1 * $3; }
        | expression '/' expression     { $$ = $1 / $3; }
        | '(' expression ')'            { $$ = $2; }
        ;
%%
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main()
{
    yyparse();
    return 0;
}

Sample session:

user: 3 * (4 + 5)
calc: 27
user: x = 3 * (4 + 5)
user: y = 5
user: x
calc: 27
user: y
calc: 5
user: x + 2*y
calc: 37

Calculator 2: Example lex file

%{
/* The lexical analyzer returns variables and integers. */
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int yyerror(char *);
extern int yylval;
%}
%%
[a-z]       {
                /* for variables, yylval specifies an index into the symbol table sym */
                yylval = *yytext - 'a';
                return VARIABLE;
            }
[0-9]+      {
                yylval = atoi(yytext);
                return INTEGER;
            }
[-+*/()=\n] return *yytext;
[ \t]       ;                           /* skip white space */
.           yyerror("Invalid character");
%%
int yywrap(void)
{
    return 1;
}

Conclusions

Yacc and Lex are very helpful for building the compiler front-end.

A lot of time is saved when compared to hand-implementation of the parser and scanner.

They both work as a mixture of rules and C code.

C code is generated and is merged with the rest of the compiler code.

Calculator program

Expand the calculator program so that the new calculator program is capable of processing:

user: 3 * (4 + 5)
user: x = 3 * (4 + 5)
user: y = 5
user: x + 2*y
user: 2^3/6
user: sin(1) + cos(PI)

tan

log

factorial