
Equivalence of CFGs and PDAs
A language is context free if and only if some pushdown automaton recognizes it

As usual with if and only if theorems,
there are two directions to prove
If a language is context free, then some pushdown
automaton recognizes it
If a pushdown automaton recognizes some
language, then that language is context free

Only If (CFG to PDA)
Let L = L(G) for some CFG G = (V, Σ, P, S)
Idea: have PDA A simulate leftmost
derivations in G, where a left-sentential
form (LSF) is represented by:
1. The sequence of input symbols that A has
consumed from its input, followed by
2. A's stack, with the top of the stack leftmost

Example: If (q, abcd, S) ⊢* (q, cd, ABC),
then the LSF represented is abABC


Moves of A
If a terminal a is on top of the stack,
then there had better be an a waiting on
the input. A consumes a from the
input and pops it from the stack
The LSF represented doesn't change!
If a variable B is on top of the stack,
then PDA A has a choice of replacing
B on the stack by the body of any
production with head B
Defining the PDA
Define PDA A as follows:
Q contains a single state, q
Σ contains the terminal symbols of the grammar
Γ contains all terminal and non-terminal symbols from the grammar
F is the empty set (A accepts by empty stack)
Start stack symbol is the start symbol of the grammar
δ is defined as follows:
For each production X → α in the grammar, create a move
(q, α) ∈ δ(q, ε, X)
For each terminal symbol a in the grammar, create a move
δ(q, a, a) = {(q, ε)}
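The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name cfg_to_pda, the dictionary encoding of δ, and the use of "" for ε are hypothetical choices, and each grammar symbol is assumed to be a single character. The grammar is the one from the example slide.

```python
from collections import defaultdict

EPS = ""  # assumption: the empty string stands for ε


def cfg_to_pda(productions, terminals):
    """Build δ for the single-state PDA; keys are (state, input, stack top)."""
    delta = defaultdict(set)
    # One ε-move per production X → α: replace X on the stack by α.
    for head, bodies in productions.items():
        for body in bodies:
            delta[("q", EPS, head)].add(("q", body))
    # One matching move per terminal a: consume a and pop it.
    for a in terminals:
        delta[("q", a, a)].add(("q", EPS))
    return dict(delta)


grammar = {"S": ["a", "aS", "bSS", "SSb", "SbS"]}
delta = cfg_to_pda(grammar, "ab")
```

The resulting dictionary has one entry for the variable S (five choices) and one entry per terminal, mirroring the two kinds of moves on the slide.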

Example
S → a | aS | bSS | SSb | SbS

PDA A = ({q}, {a,b}, {S,a,b}, δ, q, ∅, S), with final-state set ∅ and start stack symbol S
δ is defined as
δ(q, ε, S) = { (q, a), (q, aS), (q, bSS), (q, SSb), (q, SbS) }
δ(q, a, a) = {(q, ε)}
δ(q, b, b) = {(q, ε)}
Processing of baa
state  input  stack  move
q      baa    S      δ(q, ε, S) ∋ (q, bSS)    Generate bSS
q      baa    bSS    δ(q, b, b) = {(q, ε)}    Match b
q      aa     SS     δ(q, ε, S) ∋ (q, a)      Generate a
q      aa     aS     δ(q, a, a) = {(q, ε)}    Match a
q      a      S      δ(q, ε, S) ∋ (q, a)      Generate a
q      a      a      δ(q, a, a) = {(q, ε)}    Match a
q      ε      ε      accept
[Figure: derivation tree. S has children b, S, S; each child S has the single child a. The leaves b, a, a are matched against the input b a a.]
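The trace above can be reproduced with a small search over the PDA's configurations. This is a minimal sketch, not a general PDA simulator: the function name accepts is hypothetical, symbols are single characters, and the pruning step assumes the grammar has no ε-productions (true for the example grammar), so every stack symbol must still consume at least one input character.

```python
def accepts(productions, start, w):
    """Search the single-state PDA's moves; the stack is a string, top leftmost."""
    seen = set()
    todo = [(w, start)]  # (remaining input, stack)
    while todo:
        rest, stack = todo.pop()
        if (rest, stack) in seen:
            continue
        seen.add((rest, stack))
        if not stack:
            if not rest:
                return True  # empty stack and all input consumed
            continue
        # Pruning, valid only without ε-productions (assumption).
        if len(stack) > len(rest):
            continue
        top, below = stack[0], stack[1:]
        if top in productions:
            # Variable on top: replace it by any production body.
            for body in productions[top]:
                todo.append((rest, body + below))
        elif rest[0] == top:
            # Terminal on top: it must match the next input symbol.
            todo.append((rest[1:], below))
    return False


grammar = {"S": ["a", "aS", "bSS", "SSb", "SbS"]}
print(accepts(grammar, "S", "baa"))
```

The pruning is what keeps the search finite despite the left-recursive body SSb.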

Converting from PDA to CFG
A PDA consumes a character
A CFG generates a character
We want to relate these two
What happens when a PDA consumes
a character?
It may change state
It may change the stack
Converting from PDA to CFG (continued)
Suppose X is on the stack and a is read
What can happen to X?
It can be popped
It may be replaced by one or more other stack symbols
And so on
The stack grows and shrinks and grows and shrinks
Eventually, as more input is consumed, X must be popped (or we'll never reach an empty stack)
And the state may change many times
We must track all of this!
PDA to CFG
STRATEGY 1
Assume L = N(P), where P = (Q, Σ, Γ, δ, q_0, Z_0, F) and F is empty (accept by empty stack)
Key idea: units of PDA action have the net effect of
popping one symbol from the stack, consuming
some input, and making a state change.
The triple [qZp] is a CFG variable that generates exactly
those strings w such that P can read w from the input,
pop Z (net effect), and go from state q to state p.
More precisely, (q, w, Z) ⊢* (p, ε, ε)
As a consequence of the above, (q, wx, Zα) ⊢* (p, x, α) for any x and α.


It's a Zen thing
[qZp] is at once a triple involving
states and symbols of P, and yet
to the CFG we construct it is a
single, indivisible object.
(OK, I know that's not a Zen thing, but you get
the point)

Strategy
A popping rule, e.g., (p, ε) in δ(q, a, Z):
[qZp] → a
Pop Z, consume a
A rule that replaces one symbol and state by others, e.g., (p, Y) in δ(q, a, Z):
For all states r in Q: [qZr] → a[pYr]
Pop Z, consume a, move to state p, push Y
A rule that replaces one stack symbol by two, e.g., (p, XY) in δ(q, a, Z):
For all states r and s in Q: [qZs] → a[pXr][rYs]
Pop Z, consume a, move to state p with XY on the stack; popping X ends in some state r, and popping Y then ends in s
There may be some states r that cannot be reached from p while popping X. True, but this does not affect the grammar, since the resulting variables are useless and do not change the language generated by the grammar
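The three rule shapes above are instances of one pattern: for a move that pushes k symbols, guess the k states in which each pushed symbol finally disappears. A minimal sketch, assuming a hypothetical encoding in which δ maps (q, a, Z) to a set of (p, pushed) pairs, "" stands for ε, and pushed is a tuple of stack symbols with the new top first. The sample PDA is the one with states q1, q2, q3 from the example in this section.

```python
from itertools import product


def triple_grammar(states, delta, q0, z0):
    """Return productions (head, body) of the triple construction."""
    # Start productions: S → [q0 Z0 p] for every state p.
    rules = [("S", [(q0, z0, p)]) for p in states]
    for (q, a, Z), moves in delta.items():
        for p, pushed in moves:
            # Guess the state reached after each pushed symbol is popped.
            for rs in product(states, repeat=len(pushed)):
                chain = (p,) + rs
                body = [a] if a else []
                body += [(chain[i], Y, chain[i + 1]) for i, Y in enumerate(pushed)]
                rules.append(((q, Z, chain[-1]), body))
    return rules


states = ["q1", "q2", "q3"]
delta = {
    ("q1", "0", "Z0"): {("q1", ("X", "Z0"))},
    ("q2", "", "Z0"): {("q3", ())},
    ("q1", "0", "X"): {("q1", ("X", "X"))},
    ("q1", "1", "X"): {("q2", ())},
    ("q2", "1", "X"): {("q2", ())},
}
rules = triple_grammar(states, delta, "q1", "Z0")
```

Many of the generated productions mention variables that derive nothing; as noted above, they are useless and can be cleaned up afterwards.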
[Figure: the transition δ(q, a, Z) = {(p, Y)}, drawn as an arc from q to p labeled a, Z → Y. The move consumes a, pops Z, pushes Y, and moves to state p; Y is not yet processed.]
Since we don't know which state the PDA will be in after processing Y, define a production [qZr_n] → a[pYr_n] for each possible end state r_n
Example
PDA with transitions:
δ(q_1, 0, Z_0) = {(q_1, XZ_0)}
δ(q_2, ε, Z_0) = {(q_3, ε)}
δ(q_1, 0, X) = {(q_1, XX)}
δ(q_1, 1, X) = {(q_2, ε)}
δ(q_2, 1, X) = {(q_2, ε)}

S → [q_1 Z_0 q_3]
[q_1 Z_0 q_3] → 0 [q_1 X q_1][q_1 Z_0 q_3]
[q_1 Z_0 q_3] → 0 [q_1 X q_2][q_2 Z_0 q_3]
[q_1 Z_0 q_3] → 0 [q_1 X q_3][q_3 Z_0 q_3]
[q_1 X q_1] → 0 [q_1 X q_1][q_1 X q_1]
[q_1 X q_1] → 0 [q_1 X q_2][q_2 X q_1]
[q_1 X q_1] → 0 [q_1 X q_3][q_3 X q_1]
[q_1 X q_2] → 0 [q_1 X q_1][q_1 X q_2]
[q_1 X q_2] → 0 [q_1 X q_2][q_2 X q_2]
[q_1 X q_2] → 0 [q_1 X q_3][q_3 X q_2]
[q_1 X q_2] → 1
[q_2 X q_2] → 1
[q_2 Z_0 q_3] → ε

Derivation of 000111:
S ⇒ [q_1 Z_0 q_3]
⇒ 0 [q_1 X q_2][q_2 Z_0 q_3]
⇒ 00 [q_1 X q_2][q_2 X q_2][q_2 Z_0 q_3]
⇒ 000 [q_1 X q_2][q_2 X q_2][q_2 X q_2][q_2 Z_0 q_3]
⇒ 0001 [q_2 X q_2][q_2 X q_2][q_2 Z_0 q_3]
⇒ 00011 [q_2 X q_2][q_2 Z_0 q_3]
⇒ 000111 [q_2 Z_0 q_3]
⇒ 000111
Impossible configurations involving q_3 are dimmed on the original slide.
Example
PDA with transitions (here c is an input symbol, not ε):
δ(q_0, a, Z_0) = {(q_0, AZ_0)}
δ(q_0, b, Z_0) = {(q_0, BZ_0)}
δ(q_0, a, A) = {(q_0, AA)}
δ(q_0, b, A) = {(q_0, BA)}
δ(q_0, a, B) = {(q_0, AB)}
δ(q_0, b, B) = {(q_0, BB)}
δ(q_0, c, Z_0) = {(q_1, Z_0)}
δ(q_0, c, A) = {(q_1, A)}
δ(q_0, c, B) = {(q_1, B)}
δ(q_1, a, A) = {(q_1, ε)}
δ(q_1, b, B) = {(q_1, ε)}
δ(q_1, ε, Z_0) = {(q_1, ε)}

S → [q_0 Z_0 q]
[q_0 Z_0 q] → a[q_0 A p][p Z_0 q]
[q_0 Z_0 q] → b[q_0 B p][p Z_0 q]
[q_0 A q] → a[q_0 A p][p A q]
[q_0 A q] → b[q_0 B p][p A q]
[q_0 B q] → a[q_0 A p][p B q]
[q_0 B q] → b[q_0 B p][p B q]
[q_0 Z_0 q] → c[q_1 Z_0 q]
[q_0 A q] → c[q_1 A q]
[q_0 B q] → c[q_1 B q]
[q_1 A q_1] → a
[q_1 B q_1] → b
[q_1 Z_0 q_1] → ε
In the above, q and p can each be either q_0 or q_1

The Full Story
If we specify every non-terminal in terms of all possible states, the expansion contains all of the productions below:
S → [q_0 Z_0 q_0]
S → [q_0 Z_0 q_1]
[q_0 Z_0 q_0] → a[q_0 A q_0][q_0 Z_0 q_0]
[q_0 Z_0 q_0] → a[q_0 A q_1][q_1 Z_0 q_0]
[q_0 Z_0 q_1] → a[q_0 A q_0][q_0 Z_0 q_1]
[q_0 Z_0 q_1] → a[q_0 A q_1][q_1 Z_0 q_1]
[q_0 Z_0 q_0] → b[q_0 B q_0][q_0 Z_0 q_0]
[q_0 Z_0 q_0] → b[q_0 B q_1][q_1 Z_0 q_0]
[q_0 Z_0 q_1] → b[q_0 B q_0][q_0 Z_0 q_1]
[q_0 Z_0 q_1] → b[q_0 B q_1][q_1 Z_0 q_1]
[q_0 A q_0] → a[q_0 A q_0][q_0 A q_0]
[q_0 A q_0] → a[q_0 A q_1][q_1 A q_0]
[q_0 A q_1] → a[q_0 A q_0][q_0 A q_1]
[q_0 A q_1] → a[q_0 A q_1][q_1 A q_1]
[q_0 A q_0] → b[q_0 B q_0][q_0 A q_0]
[q_0 A q_0] → b[q_0 B q_1][q_1 A q_0]
[q_0 A q_1] → b[q_0 B q_0][q_0 A q_1]
[q_0 A q_1] → b[q_0 B q_1][q_1 A q_1]
[q_0 B q_0] → a[q_0 A q_0][q_0 B q_0]
[q_0 B q_0] → a[q_0 A q_1][q_1 B q_0]
[q_0 B q_1] → a[q_0 A q_0][q_0 B q_1]
[q_0 B q_1] → a[q_0 A q_1][q_1 B q_1]
[q_0 B q_0] → b[q_0 B q_0][q_0 B q_0]
[q_0 B q_0] → b[q_0 B q_1][q_1 B q_0]
[q_0 B q_1] → b[q_0 B q_0][q_0 B q_1]
[q_0 B q_1] → b[q_0 B q_1][q_1 B q_1]
[q_0 Z_0 q_0] → c[q_1 Z_0 q_0]
[q_0 Z_0 q_1] → c[q_1 Z_0 q_1]
[q_0 A q_0] → c[q_1 A q_0]
[q_0 A q_1] → c[q_1 A q_1]
[q_0 B q_0] → c[q_1 B q_0]
[q_0 B q_1] → c[q_1 B q_1]
[q_1 A q_1] → a
[q_1 B q_1] → b
[q_1 Z_0 q_1] → ε
Deriving bacab
Only some of the productions in the generated grammar will allow for a derivation; the rest are unnecessary:
S → [q_0 Z_0 q_1]
[q_0 Z_0 q_1] → a[q_0 A q_1][q_1 Z_0 q_1]
[q_0 Z_0 q_1] → b[q_0 B q_1][q_1 Z_0 q_1]
[q_0 A q_1] → a[q_0 A q_1][q_1 A q_1]
[q_0 A q_1] → b[q_0 B q_1][q_1 A q_1]
[q_0 B q_1] → a[q_0 A q_1][q_1 B q_1]
[q_0 B q_1] → b[q_0 B q_1][q_1 B q_1]
[q_0 Z_0 q_1] → c[q_1 Z_0 q_1]
[q_0 A q_1] → c[q_1 A q_1]
[q_0 B q_1] → c[q_1 B q_1]
[q_1 A q_1] → a
[q_1 B q_1] → b
[q_1 Z_0 q_1] → ε
PDA moves
(q_0, bacab, Z_0) ⊢ (q_0, acab, BZ_0)
⊢ (q_0, cab, ABZ_0)
⊢ (q_1, ab, ABZ_0)
⊢ (q_1, b, BZ_0)
⊢ (q_1, ε, Z_0)
⊢ (q_1, ε, ε)
Corresponding leftmost derivation
S ⇒ [q_0 Z_0 q_1]
⇒ b [q_0 B q_1][q_1 Z_0 q_1]
⇒ ba [q_0 A q_1][q_1 B q_1][q_1 Z_0 q_1]
⇒ bac [q_1 A q_1][q_1 B q_1][q_1 Z_0 q_1]
⇒ baca [q_1 B q_1][q_1 Z_0 q_1]
⇒ bacab [q_1 Z_0 q_1]
⇒ bacab
PDA to CFG
STRATEGY 2
Convert PDA P into CFG G
Modify P to be a normalized PDA N so that:
It has a single accept state, q_accept
Create ε-transitions from old accept states to this new accept state
It empties the stack before accepting
Push a special character $ on the stack in the start state (introducing a new start state in the process)
Introduce a new temporary state q_temp that replaces q_accept, which has transitions popping all characters from the stack (except $)
Introduce the transition from q_temp to q_accept labeled ε, $ → ε
Continued
Each transition either pushes a symbol onto the stack or pops one off the stack, but it does not do both at the same time
Replace a simultaneous pop/push move with a 2-transition rule that goes through a new state
E.g., a transition from q_i to q_j labeled a, b → c (read a from input, pop b from stack, push c)
Introduce a special state q_temp plus 2 transitions, one doing the pop and one doing the push:
q_i to q_temp labeled a, b → ε, then q_temp to q_j labeled ε, ε → c
Replace a transition that neither pops nor pushes with two transitions that push and then immediately pop some newly-created dummy stack symbol
E.g., a transition from q_i to q_j labeled a, ε → ε becomes
q_i to q_temp labeled a, ε → X, then q_temp to q_j labeled ε, X → ε
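The two replacement rules above can be sketched as a function on a single transition. This is a hypothetical encoding, not part of the slides: a transition is a tuple (src, read, pop, push, dst) with "" for ε, and the generated names q_temp0, D1, ... are placeholders for "some fresh state" and "some fresh dummy symbol".

```python
from itertools import count

_fresh = count()  # source of fresh indices for new states and dummy symbols


def normalize(transition):
    """Split one transition into moves that each either push or pop."""
    src, read, pop, push, dst = transition
    if bool(pop) != bool(push):
        return [transition]  # already a pure push or a pure pop
    mid = f"q_temp{next(_fresh)}"  # hypothetical fresh state name
    if pop and push:
        # Pop first (reading the input symbol), then push on an ε-move.
        return [(src, read, pop, "", mid), (mid, "", "", push, dst)]
    dummy = f"D{next(_fresh)}"  # hypothetical fresh dummy stack symbol
    # Neither pop nor push: push a dummy, then pop it immediately.
    return [(src, read, "", dummy, mid), (mid, "", dummy, "", dst)]


both = normalize(("q_i", "a", "b", "c", "q_j"))
neither = normalize(("q_i", "a", "", "", "q_j"))
```

Reading the input symbol on the first of the two moves is a design choice; reading it on the second would work equally well.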
Normalizing the PDA
EXAMPLE
[Figure: state diagram of the example PDA N, with transition labels ε, ε → $; b, ε → X; a, X → Y; ε, $ → ε; a, ε → ε]
L(N) = (ba + baa)*b + ε
Pure Push Pop
Make sure the stack is always active by replacing inactive stack moves
by a push followed by immediate pop of a dummy symbol
[Figure: before and after. The move a, ε → ε is replaced by a, ε → D followed by ε, D → ε; the other moves (ε, ε → $; b, ε → X; a, X → Y; ε, $ → ε) are unchanged.]
Pure Push Pop
Any move that replaces the top letter on the stack
should be changed into a pop followed by a push
[Figure: before and after. The move a, X → Y is replaced by a, X → ε followed by ε, ε → Y; the other moves are unchanged.]
Unique Accept State
Turn off original accept states and connect to a new
accept state
[Figure: before and after. Each original accept state is made non-accepting and connected to the new accept state by the pair of moves ε, ε → D and ε, D → ε.]
(Remember: each move must either push or pop from the stack)
Empty Stack
Make sure the stack empties its content by adding a
new dummy empty stack symbol and new start/accept
states
[Figure: the final normalized PDA. A new start state and a new accept state are added, together with ε-moves that pop any remaining X, Y and $ from the stack before accepting.]
PDA to CFG
INTUITIVE DESCRIPTION
Consider normalized PDA N = (Q, Σ, Γ, δ, q_init, q_accept)
Starts in q_init with an empty stack
Ends in q_accept with an empty stack
In general, we can define the language L_pq, for any two states p, q ∈ Q, as the language of all strings that take N from p with an empty stack to q with an empty stack
For each pair of states p and q, define a symbol A_pq in the CFG for the language L_pq
Language of N is L_{q_init q_accept}
Steps to process w ∈ L_pq
Two possibilities:
1. During the processing of w the stack becomes empty at some intermediate state r
This means a word of L_pq can be formed by concatenating a word of L_pr (which brought N from state p to state r with an empty stack) and a word of L_rq (that took N from r to q)
2. Stack is never empty in the middle of N's transit from p to q in processing w
The first transition (from, say, p to p_1) must have been a push, and the last transition (from, say, q_1 to q) must have been a pop, and the pop popped exactly the symbol pushed by the first transition from p to p_1
In other words, if the PDA read a from input as it moved from p to p_1, and read b as it moved from q_1 to q, then w = ayb, where y is an input that causes the PDA N to start from p_1 with an empty stack and end in q_1 with an empty stack, i.e., y ∈ L_{p_1 q_1}
Formally, if there is a push transition (pushing X onto the stack) from p to p_1 (reading a) and a pop transition from q_1 to q (popping X and reading b), then a word in L_pq can be constructed from the expression a L_{p_1 q_1} b
Note that either or both of a or b could be ε
The Construction
For every state p, introduce the rule A_pp → ε
Empty string can always be considered as getting you from p to p without doing anything to the stack, since nothing was read
CONCATENATION RULE: For the case where the stack empties in the middle of the transition from p to q, introduce, for all states p, q, r of N, the rule A_pq → A_pr A_rq
RECURSION RULE: Case where the stack is never empty: for any given states p, p_1, q_1, r of N, such that there is a push transition from p to p_1 and a pop transition from q_1 to r (that push and pop the same symbol), introduce an appropriate rule
Formally, for p, p_1, q_1, r of N with a transition from p to p_1 labeled a, ε → X (push X) and a transition from q_1 to r labeled b, X → ε (pop X), introduce the rule A_pr → a A_{p_1 q_1} b
Formal Definition
FROM SIPSER
P = (Q, Σ, Γ, δ, q_0, q_accept)
Non-terminals of G are {A_pq | p, q ∈ Q}
Rules:
For each p, q, r, s ∈ Q, t ∈ Γ, and a, b ∈ Σ ∪ {ε}, if δ(p, a, ε) contains (r, t) and δ(s, b, t) contains (q, ε), put the rule A_pq → a A_rs b in G
For each p, q, r ∈ Q, put the rule A_pq → A_pr A_rq in G.
For each p ∈ Q, put the rule A_pp → ε in G.
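The three rule schemas translate almost directly into code. The sketch below is an illustration under assumptions: transitions are hypothetical tuples (src, read, pop, push, dst) with "" for ε, every transition is a pure push or a pure pop (the normalized form), and a variable A_pq is represented by the pair (p, q). The sample PDA is the parentheses recognizer with states q, r, s used in the example that follows these slides.

```python
def sipser_grammar(states, transitions):
    """Return productions (head, body); the pair (p, q) stands for A_pq."""
    # A_pp → ε for every state p.
    rules = [((p, p), []) for p in states]
    # A_pq → A_pr A_rq for all p, r, q.
    rules += [((p, q), [(p, r), (r, q)])
              for p in states for r in states for q in states]
    pushes = [t for t in transitions if t[3]]
    pops = [t for t in transitions if t[2]]
    # A_pq → a A_rs b whenever a push of t from p to r pairs with a pop of t from s to q.
    for p, a, _, t, r in pushes:
        for s, b, u, _, q in pops:
            if t == u:
                body = ([a] if a else []) + [(r, s)] + ([b] if b else [])
                rules.append(((p, q), body))
    return rules


states = ["q", "r", "s"]
transitions = [
    ("q", "", "", "$", "r"),   # ε, ε → $
    ("r", "(", "", "X", "r"),  # (, ε → X
    ("r", ")", "X", "", "r"),  # ), X → ε
    ("r", "", "$", "", "s"),   # ε, $ → ε
]
rules = sipser_grammar(states, transitions)
```

Because the concatenation rule is generated for every triple of states, most of the output is useless; the later slides discuss restricting the variables to pairs that actually have an empty-stack-to-empty-stack path.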


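[Figure: CONCATENATION RULE, A_pq → A_pr A_rq. Plot of stack height against input string: the stack is empty at p, r and q; the part of the input from p to r is generated by A_pr, the part from r to q by A_rq, and the whole by A_pq.]
[Figure: RECURSION RULE, A_pq → a A_rs b. Plot of stack height against input string: a is read on the push from p to r, b is read on the pop from s to q; the part of the input between r and s is generated by A_rs, and the whole by A_pq.]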
PDA → CFG
If paths for strings that are accepted by the
PDA start and end with an empty stack, it is
possible to consider any such path, between
any two states and recursively generate all
such paths
This recursive relationship between paths will
give rise to the recursion at the heart of the
representative context free grammar
The Grammar
The rules for generating paths give a
grammar to generate all labels of such
paths
The grammar has non-terminals A_qr which will generate all strings x that are processed when passing from state q to state r

Q: Under this assumption, what should the
production body (right hand side) for
the start variable S be?
The Grammar Symbols
A: S = A_{q_init q_accept}, where q_init is the start state and q_accept is the final state
In addition to this start variable, the other variables are all A_qr for which there is a path going from q to r that starts and ends with an empty stack

Note that Sipser doesn't require the extra condition that there be a path from q to r which starts and ends with an empty stack; his method generates all possible combinations. However, those pairs q, r for which no such path exists will create useless variables A_qr, which end up cluttering the grammar and making the construction extremely ugly, even on the simplest PDAs. On the other hand, it is not obvious how one would determine a priori which of the pairs don't have such paths, which probably explains why Sipser didn't include this condition.


Grammar Rules
1. BASIS RULE: Add a production A_qq → ε for each state q in the PDA
2. CONCATENATION RULE: Add a production A_pr → A_pq A_qr for all p, q, r when A_pr, A_pq and A_qr are all in V.
3. RECURSION RULE: Add a production A_ps → a A_qr b for all p, s, q, r when
A_ps and A_qr are in V
Transitions (q, X) ∈ δ(p, a, ε), (s, ε) ∈ δ(r, b, X) for the same stack symbol X exist in the PDA
Example
PDA in the normalized form:
q to r labeled ε, ε → $
r to r labeled (, ε → X
r to r labeled ), X → ε
r to s labeled ε, $ → ε

Q: What is the accepted language?
A: CNP = correctly nested parentheses, including sets of pairs [e.g., ()(())]. The number of X's on the stack reflects how deep the current nesting is.



Q: What are the variables for the equivalent grammar? What is the start variable?
A: V = {A_qs, A_qq, A_rr, A_ss}, S = A_qs
We don't need A_rq, A_sq, A_sr because the paths go in the wrong direction
We don't need A_qr or A_rs because we can't add or remove $ while at r
I.e., there is no path between these states that both begins and ends with an empty stack



Productions from the Base Rule
Empty string can always be considered as getting you from p to p without doing anything to the stack, since nothing was read
A_qq → ε, A_rr → ε, A_ss → ε

Productions from the Concatenation Rule
If you can get from some state p to another state r starting and ending with the stack empty (regardless of stack activity along the way), and from r to q under the same conditions, then combine the paths to get a path from p to q.
A_qs → A_qq A_qs | A_qs A_ss
A_qq → A_qq A_qq
A_rr → A_rr A_rr
A_ss → A_ss A_ss
The state pairs (q, r) and (r, s) do not satisfy the condition that the stack is empty in both states
Productions from the Recursion Rule
For any given states p, p_1, q_1, q of N such that there is a push transition from p to p_1 and a pop transition from q_1 to q (that push and pop the same symbol), i.e., δ(p, a, ε) contains (p_1, X) and δ(q_1, b, X) contains (q, ε), put the rule A_pq → a A_{p_1 q_1} b
A_qs → ε A_rr ε = A_rr
A_rr → (A_rr)
δ(q, ε, ε) contains (r, $) and δ(r, ε, $) contains (s, ε)
δ(r, (, ε) contains (r, X) and δ(r, ), X) contains (r, ε)
Full Grammar
A_qs → A_rr | A_qq A_qs | A_qs A_ss
A_rr → ε | A_rr A_rr | (A_rr)
A_qq → ε | A_qq A_qq
A_ss → ε | A_ss A_ss

Simplifications
A_qq and A_ss derive only the empty string: their only productions are A_qq → ε | A_qq A_qq and A_ss → ε | A_ss A_ss, so they contribute nothing to any derived string.
We can therefore replace them by ε and remove the variables A_qq, A_ss
A_qs → A_rr | A_qq A_qs | A_qs A_ss
A_rr → ε | A_rr A_rr | (A_rr)
A_qq → ε | A_qq A_qq
A_ss → ε | A_ss A_ss
Becomes:
A_qs → A_rr | A_qs
A_rr → ε | A_rr A_rr | (A_rr)
Showing that the grammar
works
A_qs → A_rr | A_qs
A_rr → ε | A_rr A_rr | (A_rr)
Rename variables to get:
S → T | S
T → ε | TT | (T)
S isn't needed as its whole purpose is to get you to T

So the final (cleaned up) grammar is
T → ε | TT | (T)
Another Example
Consider the language L = {wcw^R | w ∈ {a,b}*}. A non-normalized PDA for this language is
s to s labeled a, ε → a
s to s labeled b, ε → b
s to f labeled c, ε → ε
f to f labeled a, a → ε
f to f labeled b, b → ε
Convert to Normalized Form
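[Figure: the normalized PDA. The moves a, ε → a and b, ε → b at s, and a, a → ε and b, b → ε at f, are unchanged. The move c, ε → ε from s to f becomes c, ε → D into a new state q followed by ε, D → ε into f. A new start state (also labeled s on the slide) pushes $ with ε, ε → $, and a new accept state a is reached from f with ε, $ → ε.]
1. Create new start and accepting states
2. All transitions either pop or push except c, ε → ε; change it to 2 transitions that push and pop a dummy symbol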
Generate Grammar
1. Add start symbol and a production A_qq → ε for each state q in the PDA
S → A_sa, A_ss → ε, A_ss → ε, A_qq → ε, A_ff → ε, A_aa → ε
(A_ss appears twice because the new start state and the original state are both labeled s)
Generate Grammar
2. Add a production A_pr → A_pq A_qr for all p, q, r when A_pr, A_pq and A_qr are all in V
A_sa → A_ss A_sa | A_sa A_aa

Generate Grammar
3. Add a production A_ps → a A_qr b for all p, s, q, r when A_ps and A_qr are in V and transitions (q, X) ∈ δ(p, a, ε), (s, ε) ∈ δ(r, b, X) for the same stack symbol X exist in the PDA
A_sa → ε A_sf ε = A_sf (the moves that push and pop $ read no input)
A_sf → c A_qq ε = c A_qq
A_sf → b A_sf b | a A_sf a

Final Grammar
S → A_sa
A_ss → ε
A_ss → ε
A_qq → ε
A_ff → ε
A_aa → ε
A_sa → A_ss A_sa
A_sa → A_sa A_aa
A_sa → A_sf
A_sf → c A_qq
A_sf → b A_sf b
A_sf → a A_sf a

More readable
S → T
R → ε
U → ε
V → ε
W → ε
X → ε
T → RT
T → TX
T → Z
Z → cV
Z → bZb
Z → aZa

Simplified
T → Z
Z → c
Z → bZb
Z → aZa
R, U, V, W, X contribute only ε so can be eliminated
T → RT and T → TX then become T → T, which is obviously unnecessary
S is superfluous because it only gets you to T
Deterministic PDAs
Intuitively: never a choice of move
δ(q, a, Z) has at most one member for any q, a, Z
(including a = ε).
If δ(q, ε, Z) is nonempty, then δ(q, a, Z) must be
empty for all input symbols a.
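Both conditions can be checked mechanically on a transition table. A minimal sketch, assuming a hypothetical encoding in which δ is a dictionary from (q, a, Z) to a set of moves and "" stands for ε; the function name is_deterministic is an illustrative choice.

```python
def is_deterministic(delta, input_symbols):
    """Check the two DPDA conditions on a transition table."""
    for (q, a, Z), moves in delta.items():
        # Condition 1: at most one move for any (q, a, Z), including a = ε.
        if len(moves) > 1:
            return False
        # Condition 2: an ε-move on (q, Z) excludes every input move on (q, Z).
        if a == "" and moves and any(delta.get((q, b, Z)) for b in input_symbols):
            return False
    return True


d1 = {("q", "a", "Z"): {("q", "")}}
d2 = {("q", "a", "Z"): {("q", "")}, ("q", "", "Z"): {("p", "Z")}}
d3 = {("q", "a", "Z"): {("q", ""), ("p", "Z")}}
print(is_deterministic(d1, "ab"), is_deterministic(d2, "ab"), is_deterministic(d3, "ab"))
```

Here d2 fails the second condition (an ε-move competes with the move on a) and d3 fails the first (two choices for the same triple).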

Why Care?
Parsers, as in YACC, are really DPDA's.
Thus, the question of what languages a DPDA can
accept is really the question of what programming
language syntax can be parsed conveniently.

Some Language
Relationships
Acceptance by empty stack is hard
for a DPDA
Once it accepts, it dies and cannot accept any
continuation.
Thus, N(P) has the prefix property: if w is in N(P),
then wx is NOT in N(P) for any x ≠ ε.
However, parsers do accept by
emptying their stack
Trick: they really process strings followed by a
unique endmarker (typically $) e.g., if they accept
w$, they consider w to be a correct program.

If L is a regular language, then L is
a DPDA language
A DPDA can simulate a DFA, without using its
stack (acceptance by final state).
If L is a DPDA language, then L is a
CFL that is not inherently
ambiguous
A DPDA yields an unambiguous grammar in
the standard construction
Interesting fact: The class of languages
accepted by NPDAs is larger than those
accepted by DPDAs!

[Figure: nested sets. Languages accepted by FA or NFA ⊂ languages accepted by deterministic PDA ⊂ languages accepted by nondeterministic PDA]
PDA more powerful than FA
Cleaning Up Grammars
We can "simplify" grammars to a great
extent, e.g.:
1. Get rid of useless symbols -- those that do not
participate in any derivation of a terminal string.
2. Get rid of ε-productions -- those of the form variable → ε.
But you lose the ability to generate ε as a string in the language.
3. Get rid of unit productions -- those of the form variable → variable.
Any CFG can be converted via these and other methods to Chomsky Normal Form: the only production forms are variable → two variables and variable → terminal.
Getting Rid of the Empty
String
Empty string is a nuisance with grammars
and languages in general
We will look at languages that do not contain ε
No loss of generality:
For language L, let G = (V,T,S,P) be a CFG that generates L - {ε}
Modify grammar by adding a new start variable S_0 and add productions S_0 → S | ε
This grammar generates L
Therefore any non-trivial conclusion we make for L - {ε} should transfer to L

Useless Symbols
In order for a symbol X to be useful, it
must:
1. Derive some terminal string (possibly X is a
terminal).
2. Be reachable from the start symbol; i.e., S ⇒* αXβ.
Note that X wouldn't really be useful if α or β included a symbol that didn't satisfy (1), so it is important that (1) be tested first, and symbols that don't derive terminal strings be eliminated before testing (2).
Finding Symbols That Don't
Derive Any Terminal String
Recursive construction:
Basis: A terminal surely derives a terminal
string.
Induction: If A is the head of a production whose body is X_1 X_2 … X_k, and each X_i is known to derive a terminal string, then surely A derives a terminal string.
Keep going until no more symbols that derive
terminal strings are discovered.
Example
S → AB | C
A → 0B | C
B → 1 | A0
C → AC | C1
Round 1: 0 and 1 are "in."
Round 2: B → 1 says B is in.
Round 3: A → 0B says A is in.
Round 4: S → AB says S is in.
Round 5: Nothing more can be added.

Thus, C can be eliminated, along with any production that mentions it, leaving S → AB; A → 0B; B → 1 | A0.
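The rounds above are a fixed-point computation. A minimal sketch, assuming single-character symbols and a hypothetical dictionary encoding of the grammar from this example; the function name generating is an illustrative choice.

```python
def generating(productions, terminals):
    """Return the set of symbols that derive some terminal string."""
    gen = set(terminals)  # basis: every terminal derives itself
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # Induction: a head is "in" once some body is entirely "in".
            if head not in gen and any(all(x in gen for x in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen


grammar = {"S": ["AB", "C"], "A": ["0B", "C"], "B": ["1", "A0"], "C": ["AC", "C1"]}
gen = generating(grammar, "01")
# Keep only productions whose head and body consist of generating symbols.
cleaned = {h: [b for b in bodies if all(x in gen for x in b)]
           for h, bodies in grammar.items() if h in gen}
```

The variable C never enters the set because both of its bodies mention C itself.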
Finding Symbols That Can't Be
Derived From the Start Symbol
Another recursive algorithm:
Basis: S is "in."
Induction: If variable A is in, then so is
every symbol in the production bodies
for A.
Keep going until no more symbols
derivable from S can be found.

Example
S → AB
A → 0B
B → 1 | A0
Round 1: S is in.
Round 2: A and B are in.
Round 3: 0 and 1 are in.
Round 4: Nothing can be added.
In this case, all symbols are derivable from S, so no
change to grammar.
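The reachability test is a plain graph search from the start symbol. A minimal sketch under the same assumptions as before (single-character symbols, a hypothetical dictionary encoding, and an illustrative function name), run on the grammar of this example.

```python
def reachable(productions, start):
    """Return every symbol that appears in some sentential form derived from start."""
    seen, todo = {start}, [start]  # basis: the start symbol is "in"
    while todo:
        head = todo.pop()
        # Induction: every symbol in a body of a reachable variable is reachable.
        for body in productions.get(head, []):
            for x in body:
                if x not in seen:
                    seen.add(x)
                    todo.append(x)
    return seen


grammar = {"S": ["AB"], "A": ["0B"], "B": ["1", "A0"]}
print(sorted(reachable(grammar, "S")))
```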

Book has an example where not only are there symbols
not derivable from S, but you must eliminate first the
symbols that don't derive terminal strings, or you get the
wrong grammar.
Eliminating c-Productions
A variable A is nullable if A ⇒* ε. Find them by a recursive algorithm:
Basis: If A → ε is a production, then A is nullable.
Induction: If A is the head of a production whose body consists of only nullable symbols, then A is nullable.
Once we have the nullable symbols, we can add additional productions and then throw away the productions of the form A → ε for any A.
If A → X_1 X_2 … X_k is a production, add all productions that can be formed by eliminating some or all of those X_i's that are nullable.
But, don't eliminate all k if they are all nullable.
Example
If A → BC is a production, and both B and C are nullable, add A → B | C
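Both steps, finding the nullable variables and expanding the bodies, can be sketched as follows. The encoding is an assumption: single-character symbols, "" for an ε body. To make the A → BC example concrete, the productions B → b | ε and C → c | ε are added here as hypothetical fillers; they are not on the slide.

```python
from itertools import product


def nullable(productions):
    """Return the set of variables A with A ⇒* ε."""
    nul = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # An empty body (ε) trivially consists of only nullable symbols.
            if head not in nul and any(all(x in nul for x in body) for body in bodies):
                nul.add(head)
                changed = True
    return nul


def remove_epsilon(productions):
    """Add the shortened bodies, then drop every ε body."""
    nul = nullable(productions)
    out = {}
    for head, bodies in productions.items():
        new = set()
        for body in bodies:
            # Each nullable symbol may be kept or dropped.
            options = [(x, "") if x in nul else (x,) for x in body]
            for choice in product(*options):
                if any(choice):  # never produce the empty body
                    new.add("".join(choice))
        out[head] = new
    return out


grammar = {"A": ["BC"], "B": ["b", ""], "C": ["c", ""]}
result = remove_epsilon(grammar)
```

For A → BC the expansion yields BC, B and C, but not the empty body, matching the rule about not eliminating all k symbols.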
Eliminating Unit Productions
1. Eliminate useless symbols and ε-productions.
2. Discover those pairs of variables (A, B) such that A ⇒* B.
Because there are no ε-productions, this derivation can only use unit productions.
Thus, we can find the pairs by computing reachability in a graph where nodes = variables, and arcs = unit productions.
3. Replace each combination where A ⇒* B ⇒ α and α is other than a single variable by A → α
I.e., "short circuit" sequences of unit productions, which must eventually be followed by some other kind of production.
4. Remove all unit productions.
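Steps 2 to 4 can be sketched as one pass per variable: search along unit productions, then copy every non-unit body found. The grammar below is a hypothetical example chosen for illustration, and the encoding (single-character symbols, bodies as strings) is an assumption.

```python
def remove_units(productions, variables):
    """Replace chains A ⇒* B ⇒ α of unit productions by A → α."""
    out = {}
    for a in variables:
        # Step 2: variables reachable from a using only unit productions.
        reach, todo = {a}, [a]
        while todo:
            for body in productions.get(todo.pop(), []):
                if body in variables and body not in reach:
                    reach.add(body)
                    todo.append(body)
        # Steps 3 and 4: keep every non-unit body of every reachable variable.
        out[a] = {body for b in reach
                  for body in productions.get(b, [])
                  if body not in variables}
    return out


# Hypothetical grammar: S → A | ab, A → B | a, B → b
grammar = {"S": ["A", "ab"], "A": ["B", "a"], "B": ["b"]}
result = remove_units(grammar, {"S", "A", "B"})
```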
Chomsky Normal Form
1. Get rid of useless symbols, ε-productions, and unit productions (already done).
2. Get rid of productions whose bodies are mixes of terminals and variables, or consist of more than one terminal.
3. Break up production bodies longer than 2.

Result
All productions are of the form A → BC or A → a
No Mixed Bodies
1. For each terminal a, introduce a new variable A_a, with one production A_a → a.
2. Replace a in any body where it is not the entire body by A_a.
Now, every body is either a single terminal or it consists only of variables.

Example
A → 0B1 becomes A_0 → 0; A_1 → 1; A → A_0 B A_1
Making Bodies Short
If we have a production like A → BCDE, we can introduce some new variables that allow the variables of the body to be introduced one at a time.
A body of length k requires k - 2 new variables.

Example
Introduce F and G; replace A → BCDE by A → BF; F → CG; G → DE.
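The last two Chomsky Normal Form steps (no mixed bodies, then short bodies) can be sketched together. This is an illustration under assumptions: productions are (head, body) pairs with the body a list of symbols, the grammar already has no ε-productions or unit productions, and the generated names A_0, A_1, N1, N2 are hypothetical stand-ins for the slide's A_a, F and G.

```python
def to_cnf_bodies(productions, terminals):
    """Apply the 'no mixed bodies' and 'short bodies' steps."""
    rules, fresh = [], 0
    # One new variable A_a per terminal a that occurs in a body of length > 1.
    mixed = {x for _, body in productions if len(body) > 1
             for x in body if x in terminals}
    for a in sorted(mixed):
        rules.append((f"A_{a}", [a]))
    for head, body in productions:
        if len(body) > 1:
            body = [f"A_{x}" if x in terminals else x for x in body]
        # Peel off one symbol at a time; a body of length k needs k - 2 new variables.
        while len(body) > 2:
            fresh += 1
            new = f"N{fresh}"
            rules.append((head, [body[0], new]))
            head, body = new, body[1:]
        rules.append((head, body))
    return rules


long_body = to_cnf_bodies([("A", ["B", "C", "D", "E"])], set())
mixed_body = to_cnf_bodies([("A", ["0", "B", "1"])], {"0", "1"})
```

With N1 and N2 in place of F and G, the first call reproduces the slide's A → BF; F → CG; G → DE.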
Summary Theorem
If L is any CFL, there is a grammar G that generates L - {ε}, for which each production is of the form A → BC or A → a, and there are no useless symbols.