
Solution Set 2

CS 475 Fall 2006


Problem 1 Let $\mathrm{INIT}(L) = \{x \mid xy \in L \text{ for some string } y\}$. Let $r$, $s$, $r_I$, $s_I$ be the regular expressions for languages $R$, $S$, $\mathrm{INIT}(R)$, and $\mathrm{INIT}(S)$, respectively. Using only these regular expressions and the operations $+$, concatenation, and $^*$, give the regular expressions for the following languages:
(a) $\mathrm{INIT}(R \cup S)$
(b) $\mathrm{INIT}(RS)$
(c) $\mathrm{INIT}(R^*)$
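Solution:
(a) Answer: $r_I + s_I$. The prefixes of strings in $R \cup S$ are exactly the prefixes of strings in $R$ together with the prefixes of strings in $S$.
(b) Answer: $r_I + r s_I$. A string $x$ is a prefix of a string $uv \in RS$ with $u \in R$ and $v \in S$ if and only if either $x$ is a prefix of $u$, or $x = ux'$ where $x'$ is a prefix of $v$. (This assumes $S \neq \emptyset$; if $S$ is empty, then $RS$ and $\mathrm{INIT}(RS)$ are empty as well.)
(c) Answer: $r^* r_I$. A string $x$ is a prefix of a string in $R^*$ if $x$ is a prefix of a string in $R$, or if $x$ consists of some word in $R$ followed by a prefix of a string in $R$, or if $x$ consists of a word in $R$ followed by another word in $R$ followed by some prefix of a string in $R$, and so on. (This assumes $R \neq \emptyset$; if $R$ is empty, then $\mathrm{INIT}(R^*) = \{\varepsilon\}$ while $r^* r_I$ denotes $\emptyset$.)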
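The three answers can be sanity-checked by brute force on small finite languages. The following is a minimal sketch, not part of the original solution: the sample languages `R` and `S` and the helper names `init`, `concat`, and `star_upto` are assumptions made for illustration, and part (c) is compared only on words up to a fixed length because $R^*$ is infinite.

```python
def init(lang):
    """INIT of a finite language: every prefix of every word."""
    return {w[:i] for w in lang for i in range(len(w) + 1)}

def concat(a, b):
    """Concatenation of two finite languages."""
    return {u + v for u in a for v in b}

def star_upto(lang, max_len):
    """All words of lang* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in lang
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result

# Hypothetical sample languages, chosen only for illustration.
R = {"ab", "c"}
S = {"ba", "a"}

part_a = init(R | S) == init(R) | init(S)
part_b = init(concat(R, S)) == init(R) | concat(R, init(S))

# R* is infinite, so part (c) is compared only on words of length <= N.
N = 6
longest = max(len(w) for w in R)
lhs = {x for x in init(star_upto(R, N + longest)) if len(x) <= N}
rhs = {x for x in concat(star_upto(R, N), init(R)) if len(x) <= N}
part_c = lhs == rhs

# Edge case: with an empty S the identity of part (b) no longer holds.
empty_s_ok = init(concat(R, set())) == init(R) | concat(R, init(set()))

print(part_a, part_b, part_c, empty_s_ok)
```

A check like this can only refute an identity on the sampled languages; it does not replace the argument given in the solution.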
Problem 2 A regular expression is in disjunctive normal form if it is of the form $(\alpha_1 + \alpha_2 + \cdots + \alpha_n)$ for some $n \ge 1$, where none of the $\alpha_i$'s contains an occurrence of $+$. Show that every regular language is represented by some regular expression in disjunctive normal form. Hint: Prove and use the fact that $\{a, b\}^* = \{a\}^*(\{b\}\{a\}^*)^*$.
Solution: We use structural induction (induction on the height of the expression tree) to prove this. The base case, when our regular expression is a single symbol of the alphabet, is trivial, since such an expression contains no $+$. Suppose we are given a regular expression $R$. We suppose that $R$ is fully parenthesized, i.e., every operator along with its operands is surrounded by a pair of parentheses; e.g., instead of $a^*(ba + d)^*$ we use $((a^*)(((ba) + d)^*))$.
Consider the highest level operator of $R$. If it is a $+$, i.e. $R = (R_1 + R_2)$, then inductively, $R_1$ and $R_2$ can be written in disjunctive normal forms $\alpha_1 + \cdots + \alpha_n$ and $\beta_1 + \cdots + \beta_m$ respectively, and therefore $R$ can be written as $\alpha_1 + \cdots + \alpha_n + \beta_1 + \cdots + \beta_m$.
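On the other hand, if $R = (R_1 R_2)$, then again, inductively we can write $R_1$ and $R_2$ as above, and then, since concatenation distributes over $+$,
\[ R = (\alpha_1 + \cdots + \alpha_n)(\beta_1 + \cdots + \beta_m) = \alpha_1\beta_1 + \alpha_1\beta_2 + \cdots + \alpha_n\beta_m. \]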
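The last case is when $R = (R_1^*)$. Inductively, write $R_1$ as $\alpha_1 + \cdots + \alpha_n$ with no $\alpha_i$ containing $+$. The formula given in the hint holds with arbitrary regular expressions in place of $a$ and $b$, so
\[ (\alpha_1 + \alpha_2 + \cdots + \alpha_n)^* = \alpha_1^*(\alpha_2\alpha_1^* + \cdots + \alpha_n\alpha_1^*)^*. \]
The starred sum on the right has only $n - 1$ summands, each free of $+$, so repeating this step $n - 1$ times removes every $+$. We therefore get a disjunctive normal form (with a single term) from $R$ in this case too.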
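The hint identity is easy to test mechanically on all short words. The sketch below is an illustration only: it uses Python's `re` module, where `|` plays the role of $+$, and the helper names `words` and `same_language` are made up for this example. The second check uses the expressions $ab$ and $c$ in place of the single symbols $a$ and $b$, which is the form the star case of the induction relies on.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every word over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def same_language(p, q, alphabet, max_len):
    """True if patterns p and q accept the same words up to max_len."""
    return all(
        bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
        for w in words(alphabet, max_len)
    )

# The hint itself: {a, b}* = {a}* ({b}{a}*)*
hint_ok = same_language(r"(a|b)*", r"a*(ba*)*", "ab", 8)

# The same identity with the expressions ab and c in place of a and b.
general_ok = same_language(r"(ab|c)*", r"(ab)*(c(ab)*)*", "abc", 7)

print(hint_ok, general_ok)
```

Agreement on all words up to a length bound is evidence, not a proof; the hint still has to be proved by showing each side is contained in the other.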
Problem 3 The use of $\cap$ with regular expressions does not allow one to describe new languages (see future lecture on closure properties). However, it does allow for more compact expressions. Show that the shortest regular expression for the language consisting of the one word $(\ldots((a_0^2 a_1)^2 a_2)^2 \ldots)^2$ over the alphabet $\{a_0, a_1, \ldots, a_n\}$ is $O(2^n)$ while there is an $O(n^2)$ expression using $\cap$ describing the same language. Thus, using $\cap$ can shorten expressions by an exponential amount.
Solution: For the $(n+1)$-symbol alphabet $\Sigma = \{a_0, \ldots, a_n\}$, let $L_n$ be the language defined as
\[ L_n = \{(\ldots((a_0^2 a_1)^2 a_2)^2 \ldots a_n)^2\}. \]
We first show that any regular expression for $L_n$ is exponentially large. Notice that constant exponentiation (like $a_0^2$) is not valid in standard regular expressions. Also notice that if a regular expression uses either of the $+$ or $^*$ operators, its language has to contain more than one word (apart from redundant uses such as $a + a$ or $\emptyset^*$, which only make the expression longer). Therefore, the only operation a shortest regular expression for $L_n$ can use is concatenation. In other words, the shortest regular expression we can find for $L_n$ is the single word of $L_n$ itself.
Define recursively the sequence of words $w_0, \ldots, w_n$ as
\[ w_i = \begin{cases} a_0 a_0 & \text{if } i = 0 \\ w_{i-1}\, a_i\, w_{i-1}\, a_i & \text{otherwise.} \end{cases} \]
It can be easily observed that $L_n = \{w_n\}$. Let us denote by $l_i$ the length of the word $w_i$. Observe that the above recurrence gives us the following for $l_i$:
\[ l_i = \begin{cases} 2 & \text{if } i = 0 \\ 2\,l_{i-1} + 2 & \text{otherwise.} \end{cases} \]
Solving this recurrence gives $l_n = 2^{n+2} - 2 = \Theta(2^n)$, which is the length of the shortest regular expression for $L_n$.
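The recurrence for $w_i$ can be unrolled directly to see the exponential growth. This is a small sketch for illustration: the function name `word` and the encoding of the symbol $a_i$ as the integer `i` are assumptions of the example, not part of the solution.

```python
def word(n):
    """Return w_n as a list of symbol indices (a_i is encoded as i)."""
    w = [0, 0]                 # w_0 = a_0 a_0
    for i in range(1, n + 1):
        w = (w + [i]) * 2      # w_i = w_{i-1} a_i w_{i-1} a_i
    return w

lengths = [len(word(n)) for n in range(8)]
closed_form = [2 ** (n + 2) - 2 for n in range(8)]
print(lengths)
print(lengths == closed_form)
```

The comparison with `closed_form` checks the solved recurrence $l_n = 2^{n+2} - 2$ for the first few values of $n$.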
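If we are allowed to use $\cap$, we can replace the formula above for $w_i$ with the following expression $v_i$:
\[ v_i = \begin{cases} a_0 a_0 & \text{if } i = 0 \\ (v_{i-1}\, a_i)^* \cap \left(\{a_0, \ldots, a_{i-1}\}^*\, a_i\, \{a_0, \ldots, a_{i-1}\}^*\, a_i\, \{a_0, \ldots, a_{i-1}\}^*\right) & \text{otherwise.} \end{cases} \]
Here $\{a_0, \ldots, a_{i-1}\}$ abbreviates $a_0 + \cdots + a_{i-1}$. The second operand describes the words over $\{a_0, \ldots, a_i\}$ that contain exactly two occurrences of $a_i$, so intersecting it with $(v_{i-1}\, a_i)^*$ keeps exactly the word $(w_{i-1}\, a_i)^2 = w_i$. The sets must stop at $a_{i-1}$: if they ran up to $a_i$, the second operand would only require at least two occurrences of $a_i$, and the intersection would also contain $(w_{i-1}\, a_i)^k$ for every $k > 2$.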
If we denote by $l'_i$ the length of the expression $v_i$, using the above recurrence we obtain the following for $l'_i$:
\[ l'_i = \begin{cases} 2 & \text{if } i = 0 \\ l'_{i-1} + O(i) & \text{otherwise.} \end{cases} \]
Therefore $l'_n = O(n^2)$. Compared to $l_n = \Theta(2^n)$ this is an exponential improvement.
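The effect of the intersection can be checked on small cases with ordinary regular expressions. The sketch below is illustrative and rests on stated assumptions: symbols $a_0, \ldots, a_9$ are encoded as the digits `0` to `9`, the second operand is taken over $\{a_0, \ldots, a_{i-1}\}$ so that it demands exactly two occurrences of $a_i$, and the size measure in `expression_size` (one unit per symbol occurrence, with each set $\{a_0, \ldots, a_{i-1}\}$ costing $i$ units) is just one reasonable way to count. All function names are made up for the example.

```python
import re

def word(n):
    """w_n as a string of digits (a_i is the digit i, so n <= 9)."""
    w = "00"
    for i in range(1, n + 1):
        w = (w + str(i)) * 2
    return w

def exactly_two(i):
    """Words over a_0..a_i with exactly two occurrences of a_i."""
    low = f"[0-{i - 1}]*"
    return f"{low}{i}{low}{i}{low}"

def at_least_two(i):
    """Variant whose sets run up to a_i: at least two occurrences."""
    any_symbol = f"[0-{i}]*"
    return f"{any_symbol}{i}{any_symbol}{i}{any_symbol}"

def survivors(i, pattern):
    """Which powers k of (w_{i-1} a_i) are matched by `pattern`."""
    base = word(i - 1) + str(i)
    return [k for k in range(5) if re.fullmatch(pattern, base * k)]

def expression_size(n):
    """Symbol count of v_n under the size measure described above."""
    size = 2                        # v_0 = a_0 a_0
    for i in range(1, n + 1):
        size += 1 + 3 * i + 2       # one a_i, three sets, two more a_i
    return size

exact = [survivors(i, exactly_two(i)) for i in range(1, 5)]
loose = [survivors(i, at_least_two(i)) for i in range(1, 5)]
sizes = [expression_size(n) for n in range(4)]
print(exact, loose, sizes)
```

Only the power $k = 2$ survives the exact pattern, which is what makes the intersection a one-word language, while the size grows by a term linear in $i$ at each level and so stays quadratic overall.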
Problem 4 (For 4 hour graduate students only) In this problem we will define a new class of expressions called star free regular expressions over an alphabet $\Sigma$, which are defined inductively as follows:
(a) $\emptyset$ is a star free expression and the language it denotes is $L(\emptyset) = \{\}$
(b) $\varepsilon$ is a star free expression and its language is $L(\varepsilon) = \{\varepsilon\}$
(c) For each $a \in \Sigma$, $a$ is a star free expression and it denotes $L(a) = \{a\}$
(d) If $r$ is a star free expression, then $\neg r$ is also a star free expression, where $L(\neg r) = \Sigma^* \setminus L(r)$
(e) If $r$ and $s$ are star free expressions, then $r + s$ is also a star free expression, with $L(r + s) = L(r) \cup L(s)$.
(f) And finally, if $r$ and $s$ are star free expressions, then $rs$ is a star free expression with $L(rs) = L(r)L(s)$.
So unlike regular expressions, star free expressions have complementation of a language; however, they do not have Kleene closure (hence, star free). It is nevertheless possible to define languages like $\Sigma^*$ using star free expressions: $\Sigma^* = L(\neg\emptyset)$.
A language $L$ will be called aperiodic if there is an integer $n > 0$ such that for all $x, y, z \in \Sigma^*$, $xy^n z \in L$ if and only if $xy^{n+1} z \in L$. Show that if $r$ is any star free expression then $L(r)$ is aperiodic. (In fact, the converse also holds: if $L$ is aperiodic then there is a star free expression $r$ such that $L = L(r)$. You might want to think of how you might prove the converse.)
Solution: We show that the language of every star free expression is aperiodic by induction on the complexity of the expression. The base cases, where the expression is any of $\emptyset$, $\varepsilon$, or $a$ for some $a \in \Sigma$, hold trivially; $n = 2$ works for each of them.
Now let $e$ be a star free expression and assume that every expression simpler than $e$ has an aperiodic language. We consider all the possibilities for the principal operator of $e$. Since $e$ is non-trivial, this operator has to be one of complementation, union, or concatenation.
Case 1: $e = r + s$ for star free expressions $r$ and $s$. By the induction hypothesis, $r$ and $s$ have constants $n_r$ and $n_s$ satisfying the condition given in the problem. A constant $n$ that works for a language can be replaced by any $m > n$ (apply the condition to $xy^{m-n}$, $y$, $z$). Since $L(e) = L(r) \cup L(s)$, the constant $n_e = \max\{n_r, n_s\}$ therefore serves as the desired $n$ for $e$.
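Case 2: $e = rs$ for star free expressions $r$ and $s$. Let $n_r$ and $n_s$ be as in the previous case and let $n_e = n_r + n_s$. We show that the condition holds for every $k > n_e$, so that $n_e + 1$ serves as the desired constant for $e$. Let $w = xy^k z \in L(e)$ where $k > n_e = n_r + n_s$, and let $w = w_r w_s$ where $w_r \in L(r)$ and $w_s \in L(s)$. It is easy to observe that either $w_r$ has at least $n_r$ repetitions of $y$ or $w_s$ has at least $n_s$ repetitions of $y$ (and of course maybe both). Take as example the first case (the second case is similar), i.e. $w_r = xy^i y_1$ and $w_s = y_2 y^j z$ where $y_1 y_2 = y$, $i + j = k - 1 \ge n_r + n_s$, and $i \ge n_r$. (If the split between $w_r$ and $w_s$ falls inside $x$ or $z$, all $k$ copies of $y$ lie in one factor and the same argument applies to that factor.) By the induction hypothesis, applied with $xy^{i - n_r}$ in place of $x$ and $y_1$ in place of $z$, $xy^i y_1 \in L(r)$ if and only if $xy^{i+1} y_1 \in L(r)$, and thus $xy^{k+1} z = xy^{i+1} y_1 y_2 y^j z \in L(rs)$. Conversely, if $xy^{k+1} z \in L(rs)$, the same splitting gives $i + j = k \ge n_r + n_s + 1$, so $i > n_r$ or $j > n_s$, and the induction hypothesis removes one copy of $y$ from the corresponding factor, giving $xy^k z \in L(rs)$.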
Case 3: $e = \neg r$ for some star free expression $r$. Notice that even though ordinary regular expressions do not have a complementation operator, the languages they denote are closed under complementation, because regular expressions are equivalent to DFAs. Thus, the language of every star free expression is regular, while the converse does not hold. Now let $D_r$ be a DFA with state set $Q$ that accepts $L(r)$, and let $n$ be the constant for $L(r)$ given by the induction hypothesis. For strings $x$ and $y$, let $q = \hat\delta(q_0, xy^n)$ and let $q' = \hat\delta(q_0, xy^{n+1})$, where $q_0$ is the start state of $D_r$. Denote by $F$ the set of accept states of $D_r$.
The statement "$xy^n z \in L(r)$ if and only if $xy^{n+1} z \in L(r)$" is equivalent to the following: for every string $z \in \Sigma^*$, $\hat\delta(q, z) \in F$ if and only if $\hat\delta(q', z) \in F$. Equivalently, $\hat\delta(q, z) \notin F$ if and only if $\hat\delta(q', z) \notin F$.
Now, not belonging to $F$ is the same as belonging to $Q \setminus F$. Thus $\hat\delta(q, z) \in Q \setminus F$ if and only if $\hat\delta(q', z) \in Q \setminus F$. If we switch the accepting/non-accepting attribute of every state of $D_r$, we get a DFA for the complement of $L(r)$. But our latter statement shows that in the complement DFA, $xy^n z$ is still accepted if and only if $xy^{n+1} z$ is accepted. Thus, $L(\neg r)$ is also aperiodic.
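A much simpler argument is the following: Suppose $xy^n z \in L(\neg r) = \overline{L(r)}$ but $xy^{n+1} z \notin \overline{L(r)}$. Then $xy^{n+1} z \in L(r)$. By the induction hypothesis, that means $xy^n z \in L(r)$, which is a contradiction. The other direction is symmetric, so the same constant $n$ works for $\neg r$.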
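The definition of aperiodicity can be explored by brute force on short words. The sketch below is only an illustration under stated assumptions: the helper names are made up, the star free example is the language of the expression $\neg\emptyset \, a \, b \, \neg\emptyset$ (words containing `ab`), and the non-aperiodic example is $(aa)^*$ over the one-letter alphabet. A bounded search can exhibit a violating triple $(x, y, z)$, but finding none is not a proof.

```python
from itertools import product

def words(alphabet, max_len):
    """Every word over `alphabet` of length at most `max_len`."""
    return ["".join(t) for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)]

def counterexample(member, n, alphabet, max_len):
    """Return some (x, y, z) on which x y^n z and x y^(n+1) z disagree
    about membership, or None if no short triple disagrees."""
    for x, y, z in product(words(alphabet, max_len), repeat=3):
        if member(x + y * n + z) != member(x + y * (n + 1) + z):
            return (x, y, z)
    return None

def contains_ab(w):
    """Membership in the star free language of words containing 'ab'."""
    return "ab" in w

def even_number_of_as(w):
    """Membership in (aa)*, which needs the star operator."""
    return set(w) <= {"a"} and len(w) % 2 == 0

# No violation is found for the star free language with n = 2 ...
star_free_result = counterexample(contains_ab, 2, "ab", 3)
# ... while (aa)* has a violating triple for every n that is tried.
periodic_results = [counterexample(even_number_of_as, n, "a", 2)
                    for n in range(1, 6)]
print(star_free_result, periodic_results)
```

For $(aa)^*$ the violation is the one expected from the definition: taking $y = a$ flips the parity of the length, so no constant $n$ can work.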