
SSK5204

Chapter 3:
Regular Expressions and Languages
DR. NOR FAZLIDA MOHD SANI
DEPT. OF COMPUTER SCIENCE
FAC. OF COMPUTER SCIENCE AND INFORMATION
TECHNOLOGY, UPM.

Regular expressions

Introduction
Regular expressions (REs) use the regular operations to build up expressions that describe languages.

REs play an important role in computer science applications:

In applications involving text, users may want to search for strings that satisfy certain patterns.
REs provide a powerful method for describing such patterns.

Utilities such as AWK and GREP in UNIX, PERL, and text editors all provide mechanisms for describing patterns using REs.

String concatenation
s = 011

t = 101

st = 011101
ts = 101011
ss = 011011
sst = 011011101

In general, if s = a1a2...an and t = b1b2...bm, then

st = a1a2...anb1b2...bm

Operations on languages
The concatenation of languages L1 and L2 is

L1L2 = {st : s ∈ L1, t ∈ L2}

The n-th power of L is

L^n = {s1s2...sn : s1, s2, ..., sn ∈ L}

The union of L1 and L2 is

L1 ∪ L2 = {s : s ∈ L1 or s ∈ L2}
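For finite languages, these three operations can be tried out directly with Python sets. This is only a minimal sketch: the function names `concat`, `power`, and `union` are made up for illustration, and the empty Python string stands for the empty string ε (so L^0 = {ε}).

```python
from itertools import product

def concat(l1, l2):
    """L1L2 = {st : s in L1, t in L2}."""
    return {s + t for s, t in product(l1, l2)}

def power(l, n):
    """L^n: concatenations of n strings from L; L^0 is {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result

def union(l1, l2):
    """L1 ∪ L2 = {s : s in L1 or s in L2}."""
    return l1 | l2

L1 = {"0", "01"}
print(sorted(power(L1, 2)))  # ['00', '001', '010', '0101']
```

Infinite languages such as {ε, 1, 11, ...} cannot be stored this way; a length bound is one way around that.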

Example
L1 = {0, 01}

L2 = {ε, 1, 11, 111, ...}    (any number of 1s)

L1L2 = {0, 01, 011, 0111, ...} ∪ {01, 011, 0111, ...}
     = {0, 01, 011, 0111, ...}    (0 followed by any number of 1s)

L1^2 = {00, 001, 010, 0101}

L1 ∪ L2 = {0, 01, ε, 1, 11, 111, ...}

L2^2 = L2
L2^n = L2 (n ≥ 1)

Operations on languages
The star of L consists of all strings made up of zero or more chunks from L:

L* = L^0 ∪ L^1 ∪ L^2 ∪ ...

L* always contains ε, and it is infinite unless L = ∅ or L = {ε}

Example: L1 = {01, 0}, L2 = {ε, 1, 11, 111, ...}.

What are L1* and L2*?

Example
L1 = {0, 01}

L2 = {ε, 1, 11, 111, ...}    (any number of 1s)

L1^2 = {00, 001, 010, 0101}

L1* consists of ε together with all strings that start with 0 and do not contain consecutive 1s:

00100001 is in L1*
00110001 is not in L1*
10010001 is not in L1*

L2^2 = L2
L2^n = L2 (n ≥ 1)

L2* = L2^0 ∪ L2^1 ∪ L2^2 ∪ ...
    = {ε} ∪ L2^1 ∪ L2^2 ∪ ...
    = L2

L2* = L2
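Since L* is usually infinite, a program can only list it up to some length. The sketch below (the helper name `star_up_to` and the length bound are illustrative choices, not part of the slides) enumerates the strings of L1* of length at most 8 and checks the three membership claims for L1 = {0, 01}.

```python
def star_up_to(lang, max_len):
    """Strings of L* with length <= max_len (L* itself may be infinite)."""
    found = {""}          # zero chunks: the empty string
    frontier = {""}
    while frontier:
        # Append one more chunk to every newly found string.
        frontier = {s + t for s in frontier for t in lang
                    if len(s + t) <= max_len} - found
        found |= frontier
    return found

L1_star = star_up_to({"0", "01"}, 8)
print("00100001" in L1_star)  # True
print("00110001" in L1_star)  # False
print("10010001" in L1_star)  # False
```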

Constructing languages with operations


Let's say Σ = {0, 1}

We can construct languages by starting with simple ones, like {0} and {1}, and combining them:

{0}({0} ∪ {1})*    written 0(0+1)*
all strings that start with 0

({0}{1}*) ∪ ({1}{0}*)    written 01*+10*
0 followed by any number of 1s, or 1 followed by any number of 0s

Regular expressions
A regular expression over Σ is an expression formed using the following rules:

The symbol ∅ is a regular expression
The symbol ε is a regular expression
For every a ∈ Σ, the symbol a is a regular expression
If R and S are regular expressions, so are R+S, RS and R*.

A language is regular if it is represented by a regular expression

Examples
Σ = {0, 1}

01* = 0(1*) = {0, 01, 011, 0111, ...}
0 followed by any number of 1s

(01*)(01) = {001, 0101, 01101, 011101, ...}
0 followed by any number of 1s and then 01

Examples
0+1 = {0, 1}
strings of length 1

(0+1)* = {ε, 0, 1, 00, 01, 10, 11, ...}
any string

(0+1)*010
any string that ends in 010

(0+1)*01(0+1)*
any string that contains the pattern 01
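These examples can be explored with Python's `re` module. The sketch below assumes a simple translation from the notation used here to Python syntax ('+' becomes '|', ε becomes an empty group); the helper names `to_python_re` and `language` are made up for illustration.

```python
import re
from itertools import product

def to_python_re(expr):
    """Translate class notation to Python `re` syntax (assumed mapping):
    '+' means union ('|'), 'ε' is the empty string, spaces are ignored."""
    return expr.replace(" ", "").replace("+", "|").replace("ε", "(?:)")

def language(expr, max_len, alphabet="01"):
    """All strings up to max_len that the whole expression matches."""
    pattern = re.compile(to_python_re(expr))
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return [w for w in words if pattern.fullmatch(w)]

print(language("(0+1)*010", 4))  # ['010', '0010', '1010']
print(language("(01*)(01)", 5))  # ['001', '0101', '01101']
```

`fullmatch` is used because a regular expression here describes whole strings, not substrings.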

Examples
(0+1)(0+1)
strings of length 2

((0+1)(0+1))*
strings of even length

(0+1)(0+1)(0+1)
strings of length 3

((0+1)(0+1)(0+1))*
strings of length a multiple of 3

((0+1)(0+1))*+((0+1)(0+1)(0+1))*
all strings whose length is even or a multiple of 3
= strings of length 0, 2, 3, 4, 6, 8, 9, 10, 12, ...

Examples
(0+1)(0+1)
strings of length 2

(0+1)(0+1)(0+1)
strings of length 3

(0+1)(0+1)+(0+1)(0+1)(0+1)
strings of length 2 or 3

((0+1)(0+1)+(0+1)(0+1)(0+1))*
strings that can be broken in blocks,
where each block has length 2 or 3

Examples
((0+1)(0+1)+(0+1)(0+1)(0+1))*
strings that can be broken in blocks,
where each block has length 2 or 3

Examples: ε, 10, 011, 00110, 011010110 (but not 1)

this includes all strings except those of length 1

((0+1)(0+1)+(0+1)(0+1)(0+1))* = all strings except 0 and 1

Examples
(1+01+001)*(ε+0+00)

(1+01+001)*: there can be at most two 0s between consecutive 1s
(ε+0+00): ends in at most two 0s
there are never three consecutive 0s

Guess: (1+01+001)*(ε+0+00) = {x: x does not contain 000}

Examples: ε, 00, 0110010110, 0010010
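A guess like this one can be tested by brute force before attempting a proof. The sketch below writes the expression in Python `re` syntax ('|' for union, an empty alternative for ε) and compares it with the property "does not contain 000" on every binary string up to length 12; the bound 12 is an arbitrary choice, and passing the check is evidence, not a proof.

```python
import re
from itertools import product

# Python syntax for (1+01+001)*(ε+0+00)
PATTERN = re.compile(r"(1|01|001)*(|0|00)")

def all_strings(max_len, alphabet="01"):
    """Yield every string over the alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for p in product(alphabet, repeat=n):
            yield "".join(p)

# Strings on which the expression and the property disagree.
mismatches = [w for w in all_strings(12)
              if bool(PATTERN.fullmatch(w)) != ("000" not in w)]
print(mismatches)  # []
```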

Examples
Σ = {0, 1}

Write a regular expression for all strings with two consecutive 0s.

(anything) 00 (anything else)

(0+1)*00(0+1)*

Examples
Σ = {0, 1}

Write a regular expression for all strings that do not contain two consecutive 0s.

Example: 0110101101010 breaks into blocks ending in 1 (01, 1, 01, 01, 1, 01, 01) and a last block (0)

... at most one 0 in every block ending in 1: (1 + 01)
... and at most one 0 in the last block: (ε + 0)

(1 + 01)*(ε + 0)

Examples
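Σ = {0, 1}

Write a regular expression for all strings with an even number of 0s.

even number of zeros = (any number of 1s)(two zeros)*

two zeros = 01*01*

1*(01*01*)*

The leading 1* is needed: (1*01*01*)* on its own misses strings such as 11 that contain no 0s at all.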
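Candidate answers to an exercise like this can be compared against the intended property by brute force. The sketch below (the helper `disagreements` and the length bound 10 are illustrative assumptions) tests two candidates written in Python `re` syntax. The block-only expression (1*01*01*)* fails on nonempty strings of 1s, because every block must contain two 0s; putting 1* in front repairs it.

```python
import re
from itertools import product

def disagreements(pattern, predicate, max_len=10):
    """Binary strings up to max_len where pattern and predicate differ."""
    rx = re.compile(pattern)
    return [w for n in range(max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if bool(rx.fullmatch(w)) != predicate(w)]

def even_zeros(w):
    return w.count("0") % 2 == 0

print(disagreements(r"(1*01*01*)*", even_zeros)[:3])  # ['1', '11', '111']
print(disagreements(r"1*(01*01*)*", even_zeros))      # []
```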

Main theorem for regular languages


A language is regular if and only if it is the language of some DFA

[Figure: DFAs, NFAs, and regular expressions all describe the same class, the regular languages]

Road map
[Figure: regular expression → NFA → NFA without ε → DFA]

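Examples: regular expression → NFA

R1 = 0
[Figure: NFA with states q0 and q1 and an arrow q0 → q1 labeled 0]

R2 = 0 + 1
[Figure: NFA M2 with outer states q0 and q1 joined by ε-transitions to two branches, q2 → q3 and q4 → q5]

R3 = (0 + 1)*
[Figure: NFA with states q0 and q1 built around M2]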


General method
regular expr → NFA

[Figure: NFAs for the base cases ∅, ε, and a symbol a, and for concatenation RS, where M_R is followed by M_S between states q0 and q1]

General method continued


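regular expr → NFA

[Figure: NFA for union R+S, with ε-transitions connecting q0 and q1 to M_R and M_S; NFA for star R*, with ε-transitions connecting q0, M_R, and q1]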


NFAs, DFAs, and regular expressions

Three ways of doing it


L = {x ∈ Σ*: x ends in 01}

Σ = {0, 1}

[Figure: three descriptions of L. An NFA with states q0, q1, q2; a DFA with states qε, q0, q1, q00, q01, q10, q11; and the regular expression (0+1)*01]
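The three descriptions of "ends in 01" can be compared in code. The sketch below makes assumptions: the NFA layout is the usual three-state one (the figure's exact arrows are not recoverable from the text), the DFA is a smaller three-state machine rather than one that remembers the last two symbols, and all state and function names are made up.

```python
import re

# NFA (assumed layout): q0 loops on 0 and 1, q0 -0-> q1, q1 -1-> q2 (accept).
NFA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}

def nfa_accepts(w):
    states = {"q0"}
    for c in w:
        # Follow every possible arrow from every current state.
        states = set().union(*(NFA.get((q, c), set()) for q in states))
    return "q2" in states

# DFA (assumed): the state records how much of "01" has just been read.
DFA = {("s", "0"): "s0", ("s", "1"): "s",
       ("s0", "0"): "s0", ("s0", "1"): "s01",
       ("s01", "0"): "s0", ("s01", "1"): "s"}

def dfa_accepts(w):
    state = "s"
    for c in w:
        state = DFA[(state, c)]
    return state == "s01"

def regex_accepts(w):
    # Python syntax for (0+1)*01
    return re.fullmatch(r"(0|1)*01", w) is not None

print(nfa_accepts("1101"), dfa_accepts("1101"), regex_accepts("1101"))  # True True True
```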

They are all the same


[Figure: DFAs, NFAs, and regular expressions all describe the same class, the regular languages]


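Examples: regular expression → NFA

R1 = 0
[Figure: NFA with states q0 and q1]

R2 = 01
[Figure: NFA with states q0, q1, and q2]

R3 = 0 + 01
[Figure: NFA3 with outer states q0 and q6 joined by ε-transitions to two branches, one through q1 and q2 and one through q3, q4, and q5]

R4 = (0 + 01)*
[Figure: NFA with states q0 and q1 built around NFA3]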

Regular expressions → NFA

In general, how do we convert a regular expression to an NFA?

A regular expression over Σ is an expression formed using the following rules:

The symbol ∅ is a regular expression
The symbol ε is a regular expression
For every a ∈ Σ, the symbol a is a regular expression
If R and S are regular expressions, so are R+S, RS and R*.

General method
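regular expr → NFA

[Figure: NFAs for the base cases ∅, ε, and a ∈ Σ, and for concatenation RS, where NFA_R is followed by NFA_S between states q0 and q1]

General method continued

regular expr → NFA

[Figure: NFA for union R+S, with ε-transitions connecting q0 and q1 to NFA_R and NFA_S; NFA for star R*, with ε-transitions connecting q0, NFA_R, and q1]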
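The general method follows the rules that define regular expressions, one NFA construction per rule. Below is a minimal sketch of such a recursive construction. The tuple encoding of expressions, the use of `None` as the ε label, and the exact placement of ε-arrows are assumptions chosen for illustration; they may differ in detail from the figures.

```python
from itertools import count

_fresh = count()  # supplies new state numbers

def build(regex, delta):
    """Return (start, accept) of an ε-NFA for `regex`, adding arrows to `delta`.
    `regex` is a nested tuple: ("empty",), ("eps",), ("sym", a),
    ("cat", R, S), ("alt", R, S) or ("star", R)."""
    q0, q1 = next(_fresh), next(_fresh)

    def arrow(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    kind = regex[0]
    if kind == "eps":
        arrow(q0, None, q1)
    elif kind == "sym":
        arrow(q0, regex[1], q1)
    elif kind == "cat":                      # RS: run R, then S
        r0, r1 = build(regex[1], delta)
        s0, s1 = build(regex[2], delta)
        arrow(q0, None, r0)
        arrow(r1, None, s0)
        arrow(s1, None, q1)
    elif kind == "alt":                      # R+S: branch into R or S
        for sub in regex[1:]:
            m0, m1 = build(sub, delta)
            arrow(q0, None, m0)
            arrow(m1, None, q1)
    elif kind == "star":                     # R*: skip, or loop through R
        m0, m1 = build(regex[1], delta)
        arrow(q0, None, q1)
        arrow(q0, None, m0)
        arrow(m1, None, m0)
        arrow(m1, None, q1)
    # kind == "empty": no arrows at all, so q1 is never reached
    return q0, q1

def eps_closure(states, delta):
    stack, seen = list(states), set(states)
    while stack:
        for q in delta.get((stack.pop(), None), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(regex, w):
    delta = {}
    start, accept = build(regex, delta)
    current = eps_closure({start}, delta)
    for c in w:
        step = set().union(*(delta.get((q, c), set()) for q in current))
        current = eps_closure(step, delta)
    return accept in current

# R4 = (0 + 01)*
R4 = ("star", ("alt", ("sym", "0"), ("cat", ("sym", "0"), ("sym", "1"))))
print(accepts(R4, "0010"), accepts(R4, "011"))  # True False
```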


Road map
[Figure: regular expression → NFA → NFA without ε → DFA, extended with the return path DFA → GNFA → 2-state GNFA → regular expression]

Generalized NFAs
A generalized NFA (GNFA) is an NFA whose transitions are labeled by regular expressions, like

[Figure: GNFA with an arrow q0 → q1 labeled ε+10*, a loop on q1 labeled 0*1, an arrow q1 → q2 labeled 0*11, and an arrow q0 → q2 labeled 01]

Moreover:

It has exactly one accept state, different from its start state
No arrows come into the start state
No arrows go out of the accept state

Converting a DFA to a GNFA


[Figure: the DFA gets a new start state with an ε-transition into its old start state q0, and a new accept state qf reached by ε-transitions from its old accept states]

Conversion example
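[Figure: a two-state DFA (loop on q1 labeled 0, arrow q1 → q2 labeled 1, loop on q2 labeled 1, arrow q2 → q1 labeled 0) with a new start state q0 and a new accept state q3 attached by ε-transitions]

It has exactly one accept state, different from its start state

No arrows come into the start state

No arrows go out of the accept state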

GNFA state reduction


We will eliminate every state but the start and accept states

State elimination
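[Figure: GNFA with an arrow q0 → q1 labeled ε+10*, a loop on q1 labeled 0*1, an arrow q1 → q2 labeled 0*11, and an arrow q0 → q2 labeled 01]

Eliminating q1 leaves two arrows q0 → q2, labeled (ε+10*)(0*1)*0*11 and 01

Merging them gives one arrow q0 → q2 labeled (ε+10*)(0*1)*0*11 + 01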

State elimination general method


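To eliminate state qk, for every pair of states (qi, qj):

Replace the arrow qi → qk labeled R1, the loop on qk labeled R2, the arrow qk → qj labeled R3, and the direct arrow qi → qj labeled R4

by a single arrow qi → qj labeled R1R2*R3 + R4

Remember to do this even when qi = qj!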
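The elimination rule is mechanical enough to code. The sketch below stores a GNFA as a dictionary from state pairs to labels and applies R1R2*R3 + R4 for one state at a time. Labels are written in Python `re` syntax ('|' for union, the empty string for ε) so that the result can be tested; the function name, the state names, and the example GNFA (a machine for "ends in 1" with a new start state q0 and a new accept state q3) are assumptions for illustration.

```python
import re
from itertools import product

def eliminate(gnfa, qk):
    """Remove state qk from a GNFA given as {(qi, qj): regex}.
    Labels use Python `re` syntax: '|' for union, '' for ε."""
    loop = gnfa.get((qk, qk))
    star = f"({loop})*" if loop is not None else ""
    # Arrows that do not touch qk survive unchanged.
    result = {(i, j): r for (i, j), r in gnfa.items() if qk not in (i, j)}
    ins = [(i, r) for (i, j), r in gnfa.items() if j == qk and i != qk]
    outs = [(j, r) for (i, j), r in gnfa.items() if i == qk and j != qk]
    for (qi, r1), (qj, r3) in product(ins, outs):
        path = f"({r1}){star}({r3})"          # R1 R2* R3
        if (qi, qj) in result:                # + R4, if a direct arrow exists
            path = f"{path}|{result[(qi, qj)]}"
        result[(qi, qj)] = path
    return result

GNFA = {("q0", "q1"): "", ("q1", "q1"): "0", ("q1", "q2"): "1",
        ("q2", "q2"): "1", ("q2", "q1"): "0", ("q2", "q3"): ""}

two_state = eliminate(eliminate(GNFA, "q1"), "q2")
regex = two_state[("q0", "q3")]
print(re.fullmatch(regex, "0101") is not None)  # True
print(re.fullmatch(regex, "0110") is not None)  # False
```

The resulting label carries many redundant parentheses; simplifying it by hand is a separate step.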

Road map
[Figure: the road map, with the GNFA reduced to the two states q0 and q1]

A 2-state GNFA is the same as a regular expression R!

Conversion example
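[Figure: GNFA with start state q0, accept state q3, and the DFA states q1 and q2 with arrows labeled 0, 1, 1, 0]

Eliminate q1: arrow q0 → q2 labeled 0*1, loop on q2 labeled 00*1+1, arrow q2 → q3

Eliminate q2: arrow q0 → q3 labeled 0*1(00*1+1)*

Check: does 0*1(00*1+1)* describe the language of the original DFA?

[Figure: the original DFA on states q1 and q2]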

Check your answer!


[Figure: the original DFA on states q1 and q2]

0*1(00*1+1)* = 0*1(0*1)*

Always ends in 1

Does every string that ends in 1 have this form? For example, 011001000101: Yes!

So the language is all strings that end in 1: (0 + 1)*1

What you need to know


[Figure: DFAs, NFAs, and regular expressions all describe the regular languages]

Design

Analyze

Convert

Text search


The program grep


grep -E regexp file

Searches for the occurrence of patterns matching a regular expression

regexp      language              meaning
cat|12      {cat, 12}             union
[abc]       {a, b, c}             shorthand for a|b|c
[ab][12]    {a1, a2, b1, b2}      concatenation
(ab)*       {ε, ab, abab, ...}    star
[ab]?       {ε, a, b}             zero or one
(cat)+      {cat, catcat, ...}    one or more
[ab]{2}     {aa, ab, ba, bb}      {n} copies
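The languages in this table can be reproduced with Python's `re` module, which, as far as these particular operators go, uses the same syntax as `grep -E`. The helper `matched_set` and the length bounds are illustrative choices.

```python
import re
from itertools import product

def matched_set(pattern, alphabet, max_len):
    """Strings over `alphabet` up to max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    return {w for n in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=n))
            if rx.fullmatch(w)}

print(sorted(matched_set(r"[ab][12]", "ab12", 2)))  # ['a1', 'a2', 'b1', 'b2']
print(sorted(matched_set(r"[ab]?", "ab", 2)))       # ['', 'a', 'b']
print(sorted(matched_set(r"[ab]{2}", "ab", 3)))     # ['aa', 'ab', 'ba', 'bb']
print(sorted(matched_set(r"(ab)*", "ab", 4)))       # ['', 'ab', 'abab']
```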

Searching with grep


Words containing savor or savour

grep -E 'savou?r' words

outsavor, savor, savored, savorer, savorily, savoriness, savoringly,
savorless, savorous, savorsome, savory, savour, unsavored, unsavoredly,
unsavoredness, unsavorily, unsavoriness, unsavory

Words with 5 consecutive a or b

grep -E '[ab]{5}' words

grabbable

grep -E 'zo+zo+' words

zoozoo

More grep commands


.        any symbol
[a-z]    anything in a range
\<       beginning of a word (the beginning of the line when the file has one word per line)
$        end of line

[Crossword puzzle; the grid is not reproduced]
1 If a DFA is too hard, I do an ...
2 SSK5204 assignment makes me ...
3 If a DFA likes it, it is ...
4 $10000000 = $10?
5 I study 5204 hard because it will make me ...

grep -E '\<.ff.u..t' words

affluent

how do you look for...


Words that start in cat and have another cat

grep -E '\<cat.*cat' words

Words with at least ten vowels?

grep -E '([aeiouy].*){10}' words

Words without any vowels?

[^R] matches any single symbol that is not in R

grep -E '\<[^AEIOUYaeiouy]*$' words

Words with exactly ten vowels?

grep -E '\<[^AEIOUYaeiouy]*([aeiouy][^AEIOUYaeiouy]*){10}$' words
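The vowel-counting patterns can be cross-checked against a direct count. The sketch below uses Python's `re` module with `^` in place of `\<` (Python has no `\<`, and with one word per string the two play the same role). The sample words are hypothetical and not taken from the course's `words` file.

```python
import re

VOWELS = "aeiouy"
NO_VOWELS = re.compile(r"^[^AEIOUYaeiouy]*$")
AT_LEAST_TEN = re.compile(r"([aeiouy].*){10}")
EXACTLY_TEN = re.compile(r"^[^AEIOUYaeiouy]*([aeiouy][^AEIOUYaeiouy]*){10}$")

def count_vowels(word):
    """Number of lowercase vowels (y included, as in the patterns)."""
    return sum(c in VOWELS for c in word)

# Hypothetical lowercase sample words.
samples = ["nth", "cat", "a" * 10, "ba" * 10, "ba" * 11, "xz"]

for w in samples:
    n = count_vowels(w)
    assert bool(NO_VOWELS.search(w)) == (n == 0)
    assert bool(AT_LEAST_TEN.search(w)) == (n >= 10)
    assert bool(EXACTLY_TEN.search(w)) == (n == 10)
print("all samples agree")
```

`search` is used instead of `fullmatch` because grep looks for the pattern anywhere in the line.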

How grep (could) work


[Figure: regular expression → NFA → NFA without ε → DFA, and the DFA is run on the text file]

differences            in class                in grep
[ab]?, a+, (cat){3}    not allowed             allowed
input handling         matches whole string    looks for pattern
output                 accept/reject           finds pattern

Implementation of grep
How do you handle expressions like:

[ab]?       → ()|[ab]          zero or one       R? → ε|R
(cat)+      → (cat)(cat)*      one or more       R+ → RR*
a{3}        → aaa              {n} copies        R{n} → RR...R (n times)
[^aeiouy]                      not containing    any symbol other than a, e, i, o, u, y
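These rewritings can be expressed as small string transformations and then checked against the built-in operators. This is a sketch under assumptions: the function names are made up, patterns are plain strings in Python `re` syntax, and equivalence is only tested on strings up to a fixed length.

```python
import re
from itertools import product

def optional(r):      # R?   ->  ε|R
    return f"(|{r})"

def one_or_more(r):   # R+   ->  RR*
    return f"({r})({r})*"

def copies(r, n):     # R{n} ->  RR...R (n times)
    return f"({r})" * n

def same_language(p1, p2, alphabet, max_len):
    """Compare two patterns on every string up to max_len."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return all(bool(re.fullmatch(p1, w)) == bool(re.fullmatch(p2, w))
               for w in words)

print(same_language(optional("[ab]"), "[ab]?", "ab", 3))      # True
print(same_language(one_or_more("cat"), "(cat)+", "cat", 6))  # True
print(same_language(copies("a", 3), "a{3}", "a", 5))          # True
```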

Algebraic Laws

Algebraic Laws for RE

REs obey a number of laws that are similar to those of arithmetic.
But there are also some laws that apply to REs but not to arithmetic.

Associativity and Commutativity


Commutativity is the property of an operator that lets us switch the order of its operands and get the same result, e.g. x + y = y + x.
Associativity is the property of an operator that allows us to regroup the operands when the operator is applied twice, e.g. (x × y) × z = x × (y × z).
Three laws of these types hold for REs:

1. L + M = M + L. Commutative law for union
2. (L + M) + N = L + (M + N). Associative law for union
3. (LM)N = L(MN). Associative law for concatenation

Concatenation is not commutative: in general LM ≠ ML.

Identities and Annihilators


An identity for an operator is a value such that, when the operator is applied to the identity and some other value, the result is the other value; e.g.

0 is the identity for addition, 0 + x = x + 0 = x
1 is the identity for multiplication, 1 × x = x × 1 = x

An annihilator for an operator is a value such that, when the operator is applied to the annihilator and some other value, the result is the annihilator; e.g.

0 is the annihilator for multiplication, 0 × x = x × 0 = 0

Three laws for REs:

1. ∅ + L = L + ∅ = L. ∅ is the identity for union
2. εL = Lε = L. ε is the identity for concatenation
3. ∅L = L∅ = ∅. ∅ is the annihilator for concatenation

Distributive Laws
A distributive law involves two operators and asserts that one operator can be pushed down to be applied to each argument of the other operator individually.
These laws are:

1. L(M + N) = LM + LN, the left distributive law of concatenation over union
2. (M + N)L = ML + NL, the right distributive law of concatenation over union
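On finite languages, the commutative, associative, identity, annihilator, and distributive laws can be checked directly with Python sets. The three sample languages below are arbitrary choices, and a check on samples illustrates the laws rather than proving them.

```python
def cat(a, b):
    """Concatenation of two finite languages."""
    return {s + t for s in a for t in b}

L, M, N = {"0", "01"}, {"", "1"}, {"10", "0"}
EMPTY, EPS = set(), {""}   # the languages ∅ and {ε}

assert L | M == M | L                              # commutative law for union
assert (L | M) | N == L | (M | N)                  # associative law for union
assert cat(cat(L, M), N) == cat(L, cat(M, N))      # associative law for concatenation
assert EMPTY | L == L                              # ∅ is the identity for union
assert cat(EPS, L) == cat(L, EPS) == L             # ε is the identity for concatenation
assert cat(EMPTY, L) == cat(L, EMPTY) == EMPTY     # ∅ is the annihilator for concatenation
assert cat(L, M | N) == cat(L, M) | cat(L, N)      # left distributive law
assert cat(M | N, L) == cat(M, L) | cat(N, L)      # right distributive law

# Concatenation is not commutative in general:
print(cat(L, N) == cat(N, L))  # False
```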

Laws Involving Closures


There are a number of laws involving the closure operator and its UNIX-style variants + and ?.

1. (L*)* = L*
2. ∅* = ε
3. L+ = LL* = L*L
4. L* = L+ + ε
5. L? = ε + L
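The closure laws can be illustrated on a sample language by cutting every language off at a fixed length n, which is sound here because every piece of a string of length at most n also has length at most n. The helper names, the bound n = 6, and the sample language are assumptions for illustration.

```python
def cat(a, b, n):
    """Concatenation, keeping only strings of length <= n."""
    return {s + t for s in a for t in b if len(s + t) <= n}

def star(lang, n):
    """L* restricted to strings of length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, lang, n) - result
        result |= frontier
    return result

n = 6
L = {"0", "01"}
L_star = star(L, n)
L_plus = cat(L, L_star, n)                 # L+ = LL*

assert star(L_star, n) == L_star           # (L*)* = L*
assert star(set(), n) == {""}              # ∅* = ε
assert L_plus == cat(L_star, L, n)         # LL* = L*L
assert L_star == L_plus | {""}             # L* = L+ + ε
print(sorted(L_star, key=lambda w: (len(w), w))[:5])  # ['', '0', '00', '01', '000']
```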
