RE Theorems

CS2 Language processing note 4
CS2Ah 22.10.04

Regular expressions and Kleene's Theorem
This note presents regular expressions, another method for describing regular
languages. We show how to convert a regular expression to an equivalent NFA
with $\varepsilon$-transitions, and, conversely, how to convert a DFA to an equivalent regular expression. Taken together, these results yield Kleene's Theorem, which
states that a language is regular if, and only if, it can be described by a regular
expression.
Regular expressions
We have seen how sets of strings can be described by automata. A language
$L$ over an alphabet $\Sigma$ is described by constructing a DFA or NFA that accepts precisely the strings that are in $L$. This may be viewed as a dynamic description
of $L$ since the characterisation is via a notion of computation. In this lecture we
consider an alternative static method for describing languages, using a simple mechanism for describing how the strings in the language are built. The
mechanism is to define a language using a regular expression.
We begin by presenting the syntax for regular expressions. We then go on to
associate a language with each regular expression, thereby giving a meaning to
the expressions. This latter part is usually referred to as defining a semantics
for regular expressions.
Regular expressions over the alphabet $\Sigma$ are produced using the following
formation rules:

$\emptyset$ is a regular expression.
$\varepsilon$ is a regular expression.
Every symbol $a$ in $\Sigma$ is a regular expression.
If $\alpha$ and $\beta$ are regular expressions then so is $(\alpha + \beta)$.
If $\alpha$ and $\beta$ are regular expressions then so is $(\alpha\beta)$.
If $\alpha$ is a regular expression then so is $(\alpha^*)$.

To define the language $L(\alpha)$ described by a regular expression $\alpha$, we first define
the language associated to each of the basic expressions $\emptyset$, $\varepsilon$, $a$, and then describe
how to interpret the operations $+$, concatenation, and $^*$ (often referred to as
Kleene-star) used to build the whole class of regular expressions.

$L(\emptyset) = \emptyset$.
$L(\varepsilon) = \{\varepsilon\}$.
$L(a) = \{a\}$.

Note that the first two languages are subtly different:
$\{\varepsilon\}$ is the language containing the empty string as its only string; and $\emptyset$ is the empty language, which
contains no strings. Also, $\{a\}$ is the language containing the string $a$ as its only
string.

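Now let us suppose that $\alpha$ and $\beta$ are regular expressions and that we have
already defined the languages $L(\alpha)$ and $L(\beta)$. We define $L(\alpha + \beta)$,
$L(\alpha\beta)$, and $L(\alpha^*)$ as follows:

$L(\alpha + \beta) = L(\alpha) \cup L(\beta)$,
$L(\alpha\beta) = \{xy \mid x \in L(\alpha),\ y \in L(\beta)\}$,
$L(\alpha^*) = \{\varepsilon\} \cup \{x \mid x \in L(\alpha)\} \cup \{x_1 x_2 \mid x_1, x_2 \in L(\alpha)\} \cup \cdots$
$\phantom{L(\alpha^*)} = \{x_1 \cdots x_n \mid n \ge 0,\ x_1, \ldots, x_n \in L(\alpha)\}$
(the language consisting of all concatenations of finitely many strings from
$L(\alpha)$).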
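The semantic clauses can be tried out on small cases. The following is a minimal Python sketch, not part of the original note: the class names (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the function `language` are hypothetical names chosen for illustration. Since a language described with Kleene-star is usually infinite, the sketch only enumerates the strings up to an assumed length bound `max_len`.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression syntax trees (illustrative only)."""


@dataclass(frozen=True)
class Empty(Regex):   # the expression for the empty language
    pass


@dataclass(frozen=True)
class Eps(Regex):     # the expression for the empty string
    pass


@dataclass(frozen=True)
class Sym(Regex):     # a single alphabet symbol
    a: str


@dataclass(frozen=True)
class Alt(Regex):     # union, written + in the note
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):     # concatenation
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):    # Kleene-star
    inner: Regex


def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if max_len >= 1 else set()
    if isinstance(r, Alt):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Cat):
        xs, ys = language(r.left, max_len), language(r.right, max_len)
        return {x + y for x in xs for y in ys if len(x + y) <= max_len}
    if isinstance(r, Star):
        base = language(r.inner, max_len)
        result, frontier = {""}, {""}
        while frontier:  # keep appending strings until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(r)


# The expression (a(b+c))*, restricted to strings of length at most 4.
expr = Star(Cat(Sym("a"), Alt(Sym("b"), Sym("c"))))
print(sorted(language(expr, 4), key=lambda s: (len(s), s)))
# ['', 'ab', 'ac', 'abab', 'abac', 'acab', 'acac']
```

Each branch of `language` follows one clause of the definition; the length bound is the only departure from it.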
It is important to keep in mind that a regular expression does not describe a
single string, but a language, that is, a set of strings.
Example 4.1. Suppose that $\Sigma = \{a, b, c\}$. Then examples of regular expressions
over $\Sigma$ are
$\alpha_1 = (ab + abc)$,
$\alpha_2 = (a(b + c))^*$, and
$\alpha_3 = ((aaa)^* + (aaaaa)^*)$. These
expressions define the following languages:

$L(\alpha_1) = \{ab, abc\}$.
$L(\alpha_2)$ consists of all strings of even length having an a at all even positions
and either a b or a c at all odd positions (counting positions from 0), that is,
$L(\alpha_2) = \{a x_1 a x_2 a x_3 \cdots a x_n \mid n \ge 0,\ x_1, \ldots, x_n \in \{b, c\}\}$
$\phantom{L(\alpha_2)} = \{\varepsilon, ab, ac, abab, abac, acab, acac, ababab, \ldots\}$.
$L(\alpha_3) = \{a^n \mid n \ge 0,\ n \text{ divisible by 3 or 5}\}$.
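The three languages of Example 4.1 can be cross-checked mechanically. The sketch below is an illustration rather than part of the note: it reads the three expressions as (ab + abc), (a(b + c))* and ((aaa)* + (aaaaa)*), translates them into the pattern syntax of Python's standard `re` module (where union is written `|` instead of `+`), and lists the short strings each one matches. The helper names `strings_up_to` and `matched` are hypothetical.

```python
import re
from itertools import product

# Python's re syntax writes union as "|" where the note writes "+".
ALPHA1 = re.compile(r"ab|abc")
ALPHA2 = re.compile(r"(a(b|c))*")
ALPHA3 = re.compile(r"(aaa)*|(aaaaa)*")


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def matched(pattern, max_len, alphabet="abc"):
    """List the strings up to max_len that the whole pattern matches."""
    return [w for w in strings_up_to(alphabet, max_len) if pattern.fullmatch(w)]


print(matched(ALPHA1, 3))  # ['ab', 'abc']
print(matched(ALPHA2, 4))  # ['', 'ab', 'ac', 'abab', 'abac', 'acab', 'acac']
print([len(w) for w in matched(ALPHA3, 10, alphabet="a")])  # [0, 3, 5, 6, 9, 10]
```

`fullmatch` is used because a regular expression in the sense of this note describes whole strings, not substrings.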
From regular expressions to NFAs with $\varepsilon$-transitions
Proposition 4.2. For every regular expression $\alpha$ there exists an NFA $N$ with
$\varepsilon$-transitions such that $L(N) = L(\alpha)$.

Proof: The proof is by induction on the structure of the regular expression
$\alpha$. This means that we define the NFA $N$ following the rules that were used to
generate $\alpha$.

We first show how to construct NFAs for the basic regular expressions $\emptyset$, $\varepsilon$,
and $a$ for $a \in \Sigma$. Recall that $L(\emptyset) = \emptyset$, $L(\varepsilon) = \{\varepsilon\}$, and
$L(a) = \{a\}$. Figure 1 shows three
simple NFAs recognising these three languages.

Figure 1: NFAs for the regular expressions $\emptyset$, $\varepsilon$, and $a$

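Suppose next that $\alpha = (\alpha_1 + \alpha_2)$, where $\alpha_1$ and $\alpha_2$ are regular expressions. We
want to construct an NFA $N$ with $L(N) = L(\alpha)$. We assume that we already have
constructed NFAs $N_1$ and $N_2$ with $L(N_1) = L(\alpha_1)$ and $L(N_2) = L(\alpha_2)$. Let $q_1$ and $q_2$
be the starting states of $N_1$ and $N_2$, respectively. The desired NFA $N$ is obtained
by taking $N_1$ and $N_2$ together with a new starting state $q$ which is connected to
$q_1$ and $q_2$ by an $\varepsilon$-transition. The final states of $N$ are those of $N_1$ and those of $N_2$.
Figure 2 gives a schematic picture of $N$.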

Figure 2: An NFA for the regular expression $(\alpha_1 + \alpha_2)$

Suppose next that $\alpha = (\alpha_1 \alpha_2)$, where $\alpha_1$ and $\alpha_2$ are regular expressions. We
want to construct an NFA $N$ with $L(N) = L(\alpha)$. We assume that we already have
constructed NFAs $N_1$ and $N_2$ with $L(N_1) = L(\alpha_1)$ and $L(N_2) = L(\alpha_2)$. Let $q_1$ and $q_2$
be the starting states of $N_1$ and $N_2$, respectively, and let $F_1$ be the set of final
states in $N_1$. The desired NFA $N$ is obtained by taking $N_1$ and $N_2$ and connecting
all states in $F_1$ with $q_2$ by an $\varepsilon$-transition. The starting state of $N$ is $q_1$, and the
final states are all final states of $N_2$. Figure 3 gives a schematic picture of $N$.

Figure 3: An NFA for the regular expression $(\alpha_1 \alpha_2)$

Finally, suppose that $\alpha = (\alpha_1^*)$, where $\alpha_1$ is a regular expression. We want
to construct an NFA $N$ with $L(N) = L(\alpha)$. We assume that we already have
constructed an NFA $N_1$ with $L(N_1) = L(\alpha_1)$. Let $q_1$ be the starting state of $N_1$ and
let $F_1$ be the set of final states. The desired NFA $N$ is obtained from $N_1$ by adding
a new state $q$, connecting $q$ with $q_1$ by an $\varepsilon$-transition, and connecting all states
in $F_1$ with $q$ by an $\varepsilon$-transition. The start state of $N$ is $q$, and the set of final states
of $N$ is just $\{q\}$. Figure 4 gives a schematic picture of $N$.

Figure 4: An NFA for the regular expression $(\alpha_1^*)$

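The inductive construction in the proof of Proposition 4.2 translates almost line by line into a recursive function. The following Python sketch is an illustration under stated assumptions, not the note's own code: regular expressions are encoded as nested tuples such as `("alt", r1, r2)`, an NFA is a triple `(start, finals, delta)`, and `None` stands for the label of an $\varepsilon$-transition. The base cases are one plausible reading of Figure 1, which is not reproduced here. The names `build`, `closure` and `accepts` are hypothetical.

```python
from itertools import count

EPS = None  # label used for epsilon-transitions (an assumption of this sketch)


def build(regex, fresh=None, delta=None):
    """Return (start, finals, delta) for an NFA accepting L(regex).

    delta maps (state, label) to a set of successor states.
    """
    fresh = count() if fresh is None else fresh
    delta = {} if delta is None else delta

    def edge(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    kind = regex[0]
    if kind == "empty":                  # no final state, nothing is accepted
        return next(fresh), set(), delta
    if kind == "eps":                    # the start state is itself final
        q = next(fresh)
        return q, {q}, delta
    if kind == "sym":                    # a single transition labelled a
        p, q = next(fresh), next(fresh)
        edge(p, regex[1], q)
        return p, {q}, delta
    if kind == "alt":                    # new start state q with q -> q1, q -> q2
        q1, f1, _ = build(regex[1], fresh, delta)
        q2, f2, _ = build(regex[2], fresh, delta)
        q = next(fresh)
        edge(q, EPS, q1)
        edge(q, EPS, q2)
        return q, f1 | f2, delta
    if kind == "cat":                    # every state in F1 is connected to q2
        q1, f1, _ = build(regex[1], fresh, delta)
        q2, f2, _ = build(regex[2], fresh, delta)
        for f in f1:
            edge(f, EPS, q2)
        return q1, f2, delta
    if kind == "star":                   # new state q: start and only final state
        q1, f1, _ = build(regex[1], fresh, delta)
        q = next(fresh)
        edge(q, EPS, q1)
        for f in f1:
            edge(f, EPS, q)
        return q, {q}, delta
    raise ValueError(kind)


def closure(states, delta):
    """All states reachable from `states` using epsilon-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, EPS), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(nfa, word):
    """Simulate the NFA on `word`, tracking the set of reachable states."""
    start, finals, delta = nfa
    current = closure({start}, delta)
    for a in word:
        step = set()
        for p in current:
            step |= delta.get((p, a), set())
        current = closure(step, delta)
    return bool(current & finals)


# The expression (a(b+c))* encoded as nested tuples.
nfa = build(("star", ("cat", ("sym", "a"), ("alt", ("sym", "b"), ("sym", "c")))))
print(accepts(nfa, "acab"), accepts(nfa, "aab"))  # True False
```

The simulation in `accepts` is only there to exercise the construction; the proposition itself concerns `build`.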
From DFAs to regular expressions

Proposition 4.3. For every DFA $D$ there exists a regular expression $\alpha$
such that $L(D) = L(\alpha)$.

We do not give a proof here. Such a proof can be found in Introduction to
Automata Theory, Languages, and Computation (2nd Edition) by J. E. Hopcroft,
R. Motwani, and J. D. Ullman, Addison-Wesley, 2001. We just illustrate how to
construct a regular expression from an automaton with one example. The idea is
to eliminate the states of the automaton one by one. As we proceed, we replace
the labels on the transitions of the automaton, which are initially just letters
from the alphabet, by regular expressions. We end up with a simple automaton
with just two states; the transitions of this automaton will immediately give us
the desired regular expression. A slight problem with this construction is that
we cannot eliminate final states so easily, so in a first normalisation step we
replace our initial DFA by one with just one final state, at the price of introducing
some $\varepsilon$-transitions.
Example 4.4. Consider the DFA $D$ of Figure 5.

Figure 5: A DFA $D$

After the normalisation step, we
obtain the NFA with $\varepsilon$-transitions displayed in Figure 6, which obviously accepts
the same language as $D$. Now we start eliminating states and obtain the automata displayed in Figure 7. All these automata recognise the same language.
The language of the last automaton of Figure 7 can easily be seen to be $L(\alpha)$ for
the regular expression

$\alpha = ((a + ba)b + ((a + ba)a + bb)b^*a)^* \, ((a + ba) + b)$.
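State elimination can also be sketched in code. The Python fragment below is an illustration under stated assumptions, not the procedure exactly as drawn in the figures: it adds both a fresh start state and a fresh final state (the note's normalisation step only introduces a single final state), so that after eliminating every original state one label remains. Labels are strings in the syntax of Python's `re` module, with `|` for union and the empty string for $\varepsilon$; a missing label stands for $\emptyset$. The names `dfa_to_regex`, `union` and `star` are hypothetical.

```python
import re


def union(r, s):
    """Union of two labels; None stands for a missing transition."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"


def star(r):
    """Kleene-star of a loop label; a missing or empty loop contributes nothing."""
    return "" if r in (None, "") else f"(?:{r})*"


def dfa_to_regex(states, delta, start, finals):
    """Return an re-style pattern for the DFA, or None for the empty language.

    delta maps (state, symbol) to the successor state.
    """
    S, F = object(), object()          # fresh start and final states
    label = {}

    def add(p, q, r):
        label[(p, q)] = union(label.get((p, q)), r)

    for (p, a), q in delta.items():    # initial labels are single letters
        add(p, q, re.escape(a))
    add(S, start, "")                  # epsilon-transition into the old start
    for f in finals:
        add(f, F, "")                  # epsilon-transitions out of old finals

    for k in states:                   # eliminate the original states one by one
        loop = star(label.pop((k, k), None))
        ins = [(p, r) for (p, q), r in label.items() if q == k]
        outs = [(q, r) for (p, q), r in label.items() if p == k]
        for p, _ in ins:
            del label[(p, k)]
        for q, _ in outs:
            del label[(k, q)]
        for p, r_in in ins:            # reroute every path p -> k -> q
            for q, r_out in outs:
                add(p, q, r_in + loop + r_out)
    return label.get((S, F))


# A hypothetical two-state DFA accepting strings with an even number of a's.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
pattern = dfa_to_regex([0, 1], delta, 0, {0})
print(pattern)
print(bool(re.fullmatch(pattern, "abba")), bool(re.fullmatch(pattern, "ab")))
```

The resulting pattern depends on the order in which states are eliminated; different orders give different but equivalent expressions.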
Kleene's Theorem
Putting the results of the previous sections together, we obtain Kleene's Theorem,
first proved by S. C. Kleene, one of the founders of modern mathematical logic.

Figure 6: Normalisation

Theorem 4.5. A language $L$ is regular if and only if there exists a regular expression
$\alpha$ such that $L = L(\alpha)$.

Proof: Proposition 4.3 shows that for every regular language $L$ there exists a
regular expression $\alpha$ such that $L = L(\alpha)$, and Proposition 4.2 together with
Theorem 3.6 shows the converse.

Figure 7: State elimination (transition labels in the intermediate automata include a+ba, bb, (a+ba)b, (a+ba)a+bb, (a+ba)b+((a+ba)a+bb)b*a, and (a+ba)+b)

Exercises
1. Convert the following regular expressions to NFAs.
d d N

(b) d W>
M
ef
(c) d e d
(a)
Consider other examples too.

2. Convert the following DFA into a regular expression.
(Figure: a DFA with transitions labelled 0 and 1)
3. Convert the DFAs from lecture notes 1 and 2 into regular expressions.
Don Sannella
