Automata Theory and Formal Languages Introduction

WELCOME TO SSK5204
AUTOMATA THEORY AND FORMAL LANGUAGE

DR. NOR FAZLIDA MOHD SANI
DEPT. OF COMPUTER SCIENCE,
& INFORMATION SECURITY RESEARCH GROUP
CHAPTER 1: INTRODUCTION
Why study automata?
Terminology and mathematical concepts
Formal proof
Concepts of automata theory
WHY STUDY AUTOMATA?

Automata theory is the study of abstract computing devices, or "machines".
In the 1930s, before there were computers, Alan Turing studied an abstract machine that had all the capabilities of today's computers.
Turing's goal was to describe precisely the boundary between what a computing machine could do and what it could not do.
In the 1940s and 1950s, simpler kinds of machines, which today we call finite automata, were studied by a number of researchers.
Finite automata were originally proposed to model brain function, but turned out to be extremely useful for a variety of other purposes.
CONT.
In the late 1950s, the linguist N. Chomsky began the study of formal grammars.
Although not strictly machines, these grammars have close relationships to abstract automata and serve today as the basis of some important software components, including parts of compilers.
In 1969, S. Cook extended Turing's study of what could and could not be computed.
Cook was able to separate those problems that can be solved efficiently by computer from those problems that can in principle be solved, but in practice take so much time that computers are useless for all but very small instances. The latter problems are called intractable, or NP-hard.
All these theoretical developments bear directly on what computer scientists do today:
Finite automata and certain kinds of formal grammars are used in the design and construction of important kinds of software.
Turing machines help us understand what we can expect from our software.
CONT.
Regular expressions are used in many systems.
E.g., UNIX-style patterns such as a.*b.
E.g., document type definitions (DTDs) describe XML tags with a regular-expression format like person (name, addr, child*).
Finite automata model protocols and electronic circuits.
Context-free grammars (CFGs) are used to describe the syntax of essentially every programming language.
Automata theory is used in model checking.
CFGs also play an important role in describing natural languages.
And DTDs, taken as a whole, are really CFGs.
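As a rough illustration in Python (our choice of language; the slides contain no code), the standard `re` module can test the UNIX-style pattern a.*b. The DTD content model is encoded here with hypothetical one-letter tag codes (n = name, a = addr, c = child), an assumption made only so it fits a character-level regular expression. The sketch uses `fullmatch`, which requires the whole string to match; grep-style tools instead report lines that merely contain a match.

```python
import re

# UNIX-style pattern a.*b: an "a", then any characters, then a "b"
pattern = re.compile(r"a.*b")
print(bool(pattern.fullmatch("axyzb")))  # True
print(bool(pattern.fullmatch("ab")))     # True
print(bool(pattern.fullmatch("ba")))     # False

# DTD-style content model person (name, addr, child*), encoded over
# hypothetical one-letter tag codes: n = name, a = addr, c = child
person = re.compile(r"nac*")
print(bool(person.fullmatch("na")))      # True  (no children)
print(bool(person.fullmatch("naccc")))   # True  (three children)
print(bool(person.fullmatch("nca")))     # False (wrong order)
```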

CONT.
When developing solutions to real problems, we often confront the limitations of what software can do.
Undecidable things: no program whatever can do them.
Intractable things: there are programs, but no fast programs.
We'll learn how to deal formally with discrete systems.
Proofs: you never really prove a program correct, but you need to be thinking of why a tricky technique really works.
We'll gain experience with abstract models and constructions.
Such layered models mirror layered software architectures.
AUTOMATA THEORY
Automata theory deals with the definitions and properties of mathematical models of computation.
These models play a role in several applied areas of computer science, such as:
Finite automata, used in text processing, compilers, and hardware design.
Context-free grammars, used in programming languages and artificial intelligence.
Automata theory is an excellent place to begin the study of the theory of computation.
It allows practice with formal definitions of computation as it introduces concepts relevant to other, nontheoretical areas of computer science.
TERMINOLOGY AND MATHEMATICAL CONCEPTS

- SETS
A set is a group of objects represented as a unit.
A set may contain any type of object, including numbers, symbols, and even other sets.
The objects in a set are called its elements or members.
The symbols $\in$ and $\notin$ denote set membership and nonmembership.
Thus, the set $\{7, 21, 57\}$ contains the elements 7, 21, and 57:
$7 \in \{7, 21, 57\}$ and $8 \notin \{7, 21, 57\}$.
A is a subset of B, written $A \subseteq B$, if every member of A also is a member of B.
A is a proper subset of B, written $A \subsetneq B$, if A is a subset of B and not equal to B.
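Python's built-in set type happens to mirror these definitions closely, which makes for a quick sanity check. This is a sketch assuming finite sets of numbers: `in` tests membership, `<=` tests subset, and `<` tests proper subset.

```python
# The example set {7, 21, 57} as an immutable Python set
S = frozenset({7, 21, 57})

print(7 in S)          # True:  7 is a member of S
print(8 not in S)      # True:  8 is not a member of S

A = frozenset({7, 21})
print(A <= S)          # True:  A is a subset of S
print(A < S)           # True:  A is a proper subset of S
print(S <= S, S < S)   # True False: S is a subset, but not a proper subset, of itself
```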
SETS CONT.
An infinite set contains infinitely many elements.
The set of natural numbers, $\mathcal{N}$, is written $\{1, 2, 3, \ldots\}$.
The set of integers, $\mathcal{Z}$, is written $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
The set with 0 members is called the empty set, written $\emptyset$.
To describe a set containing elements according to some rule, write $\{n \mid \text{rule about } n\}$.
$\{n \mid n = m^2 \text{ for some } m \in \mathcal{N}\}$ means the set of perfect squares.
For two sets A and B, the union of A and B, written $A \cup B$, is the set obtained by combining all the elements in A and B into a single set.
The intersection of A and B, written $A \cap B$, is the set of elements that are in both A and B.
The complement of A, written $\overline{A}$, is the set of all elements under consideration that are not in A.
[Figure: Venn diagram examples]
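The operations above can be sketched with Python sets. Since a program cannot hold an infinite set, this assumes a small hypothetical universe U = {1, ..., 20} standing in for "all elements under consideration", so the set of perfect squares and the complement are only computed within U.

```python
# Hypothetical finite universe standing in for "all elements under consideration"
U = set(range(1, 21))

# Set-builder notation {n | n = m^2 for some m in N}, restricted to U
squares = {n for n in U if any(n == m * m for m in range(1, n + 1))}

A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
complement_A = U - A     # complement of A relative to U

print(sorted(squares))       # [1, 4, 9, 16]
print(sorted(union))         # [1, 2, 3, 4, 5]
print(sorted(intersection))  # [3, 4]
print(len(complement_A))     # 16
```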
SEQUENCES AND TUPLES
A sequence of objects is a list of these objects in some order.
Example: the sequence 7, 21, 57 is written (7, 21, 57).
In a sequence, order and repetition do matter.
Sequences may be finite or infinite.
Finite sequences often are called tuples.
A sequence with k elements is a k-tuple. Thus (7, 21, 57) is a 3-tuple. A 2-tuple is also called a pair.
Sets and sequences may appear as elements of other sets and sequences.
The power set of A is the set of all subsets of A.
If A is the set $\{0, 1\}$, the power set of A is the set $\{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$.
The set of all pairs whose elements are 0s and 1s is $\{(0,0), (0,1), (1,0), (1,1)\}$.
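A short Python sketch of the power set and of the difference between sequences and sets. Inner sets are `frozenset`s because Python sets may only contain hashable (immutable) elements; the helper name `power_set` is ours, not a library function.

```python
from itertools import combinations

def power_set(a):
    """Return the power set of a finite set as a set of frozensets."""
    items = list(a)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

P = power_set({0, 1})
print(sorted(sorted(s) for s in P))  # [[], [0], [0, 1], [1]]

# Order and repetition matter for tuples, but not for sets
print((7, 21, 57) == (57, 21, 7))    # False
print({7, 21, 57} == {57, 21, 7})    # True
print(len((7, 7)), len({7, 7}))      # 2 1
```

A set with n elements has $2^n$ subsets, so the power set grows quickly; the sketch is only practical for small sets.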
SEQUENCES AND TUPLES CONT.
If A and B are two sets, the Cartesian product or cross product of A and B, written $A \times B$, is the set of all pairs wherein the first element is a member of A and the second element is a member of B.
We can also take the Cartesian product of k sets, $A_1, A_2, \ldots, A_k$, written $A_1 \times A_2 \times \cdots \times A_k$. It is the set consisting of all k-tuples $(a_1, a_2, \ldots, a_k)$ where $a_i \in A_i$.
Example: If $A = \{1, 2\}$ and $B = \{x, y, z\}$,
$A \times B = \{(1,x), (1,y), (1,z), (2,x), (2,y), (2,z)\}$.
Example: If A and B are as in the example above,
$A \times B \times A = \{(1,x,1), (1,x,2), (1,y,1), (1,y,2), (1,z,1), (1,z,2), (2,x,1), (2,x,2), (2,y,1), (2,y,2), (2,z,1), (2,z,2)\}$.
If we have the Cartesian product of a set with itself, we use the shorthand
$\underbrace{A \times A \times \cdots \times A}_{k} = A^k$.
Example: The set $\mathcal{N}^2$ equals $\mathcal{N} \times \mathcal{N}$. It consists of all pairs of natural numbers. We may also write it as $\{(i, j) \mid i, j \geq 1\}$.
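The Cartesian products in these examples can be reproduced with `itertools.product` from the Python standard library, which yields tuples in the same order as listed above. Only finite sets can be enumerated this way, so $\mathcal{N}^2$ itself is out of reach of this sketch.

```python
from itertools import product

A = [1, 2]
B = ["x", "y", "z"]

AxB = list(product(A, B))         # A × B
AxBxA = list(product(A, B, A))    # A × B × A
A3 = list(product(A, repeat=3))   # A^3 = A × A × A

print(AxB)         # [(1, 'x'), (1, 'y'), (1, 'z'), (2, 'x'), (2, 'y'), (2, 'z')]
print(len(AxBxA))  # 12
print(len(A3))     # 8
```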
FUNCTIONS AND RELATIONS
A function is an object that sets up an input-output relationship.
If f is a function whose output value is b when the input value is a, we write
f(a) = b.
A function is also called a mapping.
The set of possible inputs to the function is called its domain.
The outputs of a function come from a set called its range.
The notation for saying that f is a function with domain D and range R is
$f : D \rightarrow R$.
We can describe a specific function in several ways:
A procedure for computing an output from a specified input.
A table that lists all possible inputs and gives the output for each input.
FUNCTIONS AND RELATIONS CONT.
Example: Consider the function $f : \{0,1,2,3,4\} \rightarrow \{0,1,2,3,4\}$.

n    f(n)
0    1
1    2
2    3
3    4
4    0

This function adds 1 to its input and then outputs the result modulo 5.
A number modulo m is the remainder after division by m. For example, the minute hand on a clock face counts modulo 60. When we do modular arithmetic we define $\mathcal{Z}_m = \{0, 1, 2, \ldots, m-1\}$. With this notation, the aforementioned function f has the form $f : \mathcal{Z}_5 \rightarrow \mathcal{Z}_5$.
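Describing f by a procedure rather than a table might look like this in Python, where `%` gives the remainder after division (the function name simply follows the text). Printing one input per row reproduces the table.

```python
def f(n: int) -> int:
    """f : Z_5 -> Z_5, add 1 to the input and reduce modulo 5."""
    return (n + 1) % 5

# Print the table of f, one input per row
for n in range(5):
    print(n, f(n))
```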
Example: A two-dimensional table is used if the domain of the function is the Cartesian product of two sets. Consider the function $g : \mathcal{Z}_4 \times \mathcal{Z}_4 \rightarrow \mathcal{Z}_4$. The entry at the row labeled i and the column labeled j in the table is the value of g(i, j).

g    0  1  2  3
0    0  1  2  3
1    1  2  3  0
2    2  3  0  1
3    3  0  1  2

The function g is the addition function modulo 4.
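Similarly, a sketch that computes g as addition modulo 4 and prints the two-dimensional table row by row (row i, column j).

```python
def g(i: int, j: int) -> int:
    """g : Z_4 x Z_4 -> Z_4, addition modulo 4."""
    return (i + j) % 4

# Rebuild the two-dimensional table: row i, column j holds g(i, j)
print("g  " + "  ".join(str(j) for j in range(4)))
for i in range(4):
    print(f"{i}  " + "  ".join(str(g(i, j)) for j in range(4)))
```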
When the domain of a function f is $A_1 \times \cdots \times A_k$ for some sets $A_1, \ldots, A_k$, the input to f is a k-tuple $(a_1, a_2, \ldots, a_k)$ and we call the $a_i$ the arguments to f.
A function with k arguments is called a k-ary function, and k is the arity of the function.
If k is 1, f has a single argument and f is called a unary function.
If k is 2, f is a binary function.
A predicate or property is a function whose range is {TRUE, FALSE}.
A property whose domain is a set of k-tuples $A \times \cdots \times A$ is called a relation, a k-ary relation, or a k-ary relation on A.
A common case is a 2-ary relation, called a binary relation.
If R is a binary relation, the statement aRb means that aRb = TRUE.
Similarly, if R is a k-ary relation, the statement $R(a_1, \ldots, a_k)$ means that $R(a_1, \ldots, a_k)$ = TRUE.
Example: In a children's game called Scissors-Paper-Stone, the two players simultaneously select a member of the set {SCISSORS, PAPER, STONE} and indicate their selections with hand signals. If the two selections are the same, the game starts over. If the selections differ, one player wins, according to the relation beats.

beats       SCISSORS   PAPER   STONE
SCISSORS    FALSE      TRUE    FALSE
PAPER       FALSE      FALSE   TRUE
STONE       TRUE       FALSE   FALSE

From the table we can determine that SCISSORS beats PAPER is TRUE and that PAPER beats SCISSORS is FALSE.
Describing predicates with sets instead of functions is often more convenient. The predicate $P : D \rightarrow \{\text{TRUE}, \text{FALSE}\}$ may be written $(D, S)$, where $S = \{a \in D \mid P(a) = \text{TRUE}\}$, or simply S if the domain D is obvious from the context. Hence the relation beats may be written
{(SCISSORS, PAPER), (PAPER, STONE), (STONE, SCISSORS)}.
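To make the two descriptions of beats concrete, the sketch below stores the relation as a set of pairs and derives the predicate from it; the names `ITEMS`, `BEATS`, and `beats` are illustrative choices. The final assertion checks that the set form $S = \{a \in D \mid P(a) = \text{TRUE}\}$ recovers exactly the stored pairs.

```python
from itertools import product

ITEMS = ["SCISSORS", "PAPER", "STONE"]

# The relation `beats` written as a set S of pairs (the (D, S) form)
BEATS = {("SCISSORS", "PAPER"), ("PAPER", "STONE"), ("STONE", "SCISSORS")}

def beats(a: str, b: str) -> bool:
    """The same relation as a predicate on D = ITEMS x ITEMS."""
    return (a, b) in BEATS

# Recover the table: row a, column b holds beats(a, b)
for a in ITEMS:
    print(a, [beats(a, b) for b in ITEMS])

# The set form is exactly the pairs on which the predicate is TRUE
assert {p for p in product(ITEMS, repeat=2) if beats(*p)} == BEATS
```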

A special type of binary relation, called an equivalence relation, captures the notion of two objects being equal in some feature.
A binary relation R is an equivalence relation if R satisfies three conditions:
1. R is reflexive if for every x, xRx;
2. R is symmetric if for every x and y, xRy implies yRx; and
3. R is transitive if for every x, y, and z, xRy and yRz implies xRz.
Example: Define an equivalence relation on the natural numbers, written $\equiv_7$. For $i, j \in \mathcal{N}$, say that $i \equiv_7 j$ if $i - j$ is a multiple of 7. This is an equivalence relation because it satisfies the three conditions.
First, it is reflexive, as $i - i = 0$, which is a multiple of 7.
Second, it is symmetric, as $i - j$ is a multiple of 7 if $j - i$ is a multiple of 7.
Third, it is transitive, as whenever $i - j$ is a multiple of 7 and $j - k$ is a multiple of 7, then $i - k = (i - j) + (j - k)$ is the sum of two multiples of 7 and hence a multiple of 7, too.
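The three conditions can be checked mechanically on a finite domain. This Python sketch tests them for $\equiv_7$ on the sample {1, ..., 29}; a check over finitely many numbers illustrates the definition but does not replace the proof above, which covers all natural numbers. The helper `is_equivalence` is hypothetical.

```python
from itertools import product

def is_equivalence(domain, related) -> bool:
    """Check reflexivity, symmetry, and transitivity of a relation on a finite domain."""
    reflexive = all(related(x, x) for x in domain)
    symmetric = all(related(y, x) for x, y in product(domain, repeat=2) if related(x, y))
    transitive = all(
        related(x, z)
        for x, y, z in product(domain, repeat=3)
        if related(x, y) and related(y, z)
    )
    return reflexive and symmetric and transitive

def equiv_7(i: int, j: int) -> bool:
    """i ≡_7 j  iff  i - j is a multiple of 7 (Python's % is never negative here)."""
    return (i - j) % 7 == 0

domain = range(1, 30)   # a finite sample of the natural numbers
print(is_equivalence(domain, equiv_7))              # True
print(is_equivalence(domain, lambda i, j: i <= j))  # False: <= is not symmetric
```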
GRAPHS
An undirected graph, or simply graph, is a set of points with lines connecting some of the points.
The points are called nodes or vertices, and the lines are called edges, as shown in the following figure.
[Figure: two example graphs, (a) and (b); every node in (a) has degree 2 and every node in (b) has degree 3]
The number of edges at a particular node is the degree of that node.
No more than one edge is allowed between any two nodes.
GRAPHS CONT.
In a graph G that contains nodes i and j, the pair (i, j) represents the edge that connects i and j.
The order of i and j doesn't matter in an undirected graph, so the pairs (i, j) and (j, i) represent the same edge.
If V is the set of nodes of G and E is the set of edges, we say G = (V, E).
A graph can be described with a diagram or, more formally, by specifying V and E.
Example:
[Figure: diagrams of graph (a) and graph (b)]
The formal description for graph (a) is
({1,2,3,4,5}, {(1,2), (2,3), (3,4), (4,5), (5,1)}).
The formal description for graph (b) is
({1,2,3,4}, {(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)}).
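Working from the formal descriptions G = (V, E), the degree of each node can be computed by counting the edges that touch it. A minimal Python sketch, assuming each undirected edge is listed once; the function name `degrees` is ours.

```python
def degrees(V, E):
    """Degree of each node of an undirected graph G = (V, E), edges given as pairs."""
    deg = {v: 0 for v in V}
    for i, j in E:
        deg[i] += 1   # an undirected edge (i, j) touches both endpoints
        deg[j] += 1
    return deg

Ga = ({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)})
Gb = ({1, 2, 3, 4}, {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)})

print(degrees(*Ga))   # every node has degree 2
print(degrees(*Gb))   # every node has degree 3
```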
GRAPHS CONT.
Graphs frequently are used to represent data.
For convenience, we may label the nodes and/or edges of a graph, which is then called a labeled graph.
Graph G is a subgraph of graph H if the nodes of G are a subset of the nodes of H, and the edges of G are the edges of H on the corresponding nodes.
[Figure: a graph H and a subgraph G (shown darker)]
A path is a sequence of nodes connected by edges.
A simple path is a path that doesn't repeat any nodes.
A graph is connected if every two nodes have a path between them.
A path is a cycle if it starts and ends in the same node.
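Whether a graph is connected can be decided by searching from one node and checking that every node is reached. The sketch below uses breadth-first search, one standard technique among several; `is_connected` is an illustrative name, and treating a graph with no nodes as connected is an assumption of this sketch.

```python
from collections import deque

def is_connected(V, E) -> bool:
    """True if every two nodes of the undirected graph (V, E) have a path between them."""
    if not V:
        return True                  # convention assumed here for the empty graph
    adj = {v: set() for v in V}
    for i, j in E:
        adj[i].add(j)
        adj[j].add(i)
    start = next(iter(V))
    seen = {start}
    queue = deque([start])
    while queue:                     # breadth-first search from `start`
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == set(V)

print(is_connected({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}))  # True
print(is_connected({1, 2, 3, 4}, {(1, 2), (3, 4)}))                            # False
```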
GRAPHS CONT.
A simple cycle is one that contains at least three nodes and repeats only the first and last nodes.
A graph is a tree if it is connected and has no simple cycles.
A tree may contain a specially designated node called the root.
The nodes of degree 1 in a tree, other than the root, are called the leaves.
If it has arrows instead of lines, the graph is a directed graph.
The number of arrows pointing from a particular node is the outdegree of that node, and the number of arrows pointing to a particular node is its indegree.
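One way to test the tree definition in code is a union-find structure: an edge whose endpoints are already connected would close a cycle, and a single remaining component means the graph is connected. This Python sketch assumes each undirected edge appears once in E (so (i, j) and (j, i) are not both listed); the example tree T and the helper names are hypothetical.

```python
def is_tree(V, E) -> bool:
    """Connected and without simple cycles, checked with a union-find structure."""
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for i, j in E:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False        # i and j already connected: edge (i, j) closes a cycle
        parent[ri] = rj         # merge the two components
    return len({find(v) for v in V}) == 1   # one component means connected

def leaves(V, E, root):
    """Nodes of degree 1 other than the root."""
    deg = {v: 0 for v in V}
    for i, j in E:
        deg[i] += 1
        deg[j] += 1
    return {v for v in V if deg[v] == 1 and v != root}

T = ({1, 2, 3, 4, 5}, {(1, 2), (1, 3), (3, 4), (3, 5)})
print(is_tree(*T))          # True
print(leaves(*T, root=1))   # {2, 4, 5}
print(is_tree({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}))  # False: a cycle
```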
GRAPHS CONT.
In a directed graph, an edge from i to j is represented as a pair (i, j).
The formal description of a directed graph G is (V, E), where V is the set of nodes and E is the set of edges.
[Figure: a directed graph on the nodes 1 to 6]
The formal description for this graph is:
({1,2,3,4,5,6}, {(1,2), (1,5), (2,1), (2,4), (5,4), (5,6), (6,1), (6,3)}).
A path in which all the arrows point in the same direction as its steps is called a directed path.
A directed graph is strongly connected if a directed path connects every two nodes.
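For directed graphs, outdegree and indegree count arrows leaving and entering a node, and strong connectivity asks whether every node can reach every other node along directed paths. A Python sketch using the example graph above; the depth-first search and the function names are our illustrative choices.

```python
def out_in_degrees(V, E):
    """Outdegree and indegree of every node of a directed graph (V, E)."""
    out_deg = {v: 0 for v in V}
    in_deg = {v: 0 for v in V}
    for i, j in E:          # edge (i, j) is an arrow from i to j
        out_deg[i] += 1
        in_deg[j] += 1
    return out_deg, in_deg

def reachable(V, E, start):
    """All nodes reachable from `start` along directed paths."""
    succ = {v: [] for v in V}
    for i, j in E:
        succ[i].append(j)
    seen, stack = {start}, [start]
    while stack:            # depth-first search following the arrows
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def strongly_connected(V, E) -> bool:
    """True if a directed path connects every ordered pair of nodes."""
    return all(reachable(V, E, v) == set(V) for v in V)

V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (1, 5), (2, 1), (2, 4), (5, 4), (5, 6), (6, 1), (6, 3)}
out_deg, in_deg = out_in_degrees(V, E)
print(out_deg[1], in_deg[1])      # 2 2
print(strongly_connected(V, E))   # False: nodes 3 and 4 have no outgoing arrows
```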
STRINGS AND LANGUAGES

Strings and characters are fundamental building blocks in computer science.
An alphabet is any nonempty finite set.
The members of the alphabet are the symbols of the alphabet.
We generally use capital Greek letters $\Sigma$ and $\Gamma$ to designate alphabets and a typewriter font for symbols from an alphabet.
Examples of alphabets:
$\Sigma_1 = \{0, 1\}$;
$\Sigma_2 = \{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}$;
$\Gamma = \{0, 1, x, y, z\}$.
STRINGS AND LANGUAGES CONT.
A string over an alphabet is a finite sequence of symbols from that alphabet, usually written next to one another and not separated by commas.
If $\Sigma_1 = \{0, 1\}$, then 01001 is a string over $\Sigma_1$.
If w is a string over $\Sigma$, the length of w, written |w|, is the number of symbols that it contains.
The string of length zero is called the empty string and is written $\varepsilon$; it plays the role of 0 in a number system.
If w has length n, we can write $w = w_1 w_2 \cdots w_n$ where each $w_i \in \Sigma$.
The reverse of w, written $w^{\mathcal{R}}$, is the string obtained by writing w in the opposite order (i.e., $w_n w_{n-1} \cdots w_1$).
String z is a substring of w if z appears consecutively within w.
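The string definitions map directly onto Python's `str` type when each symbol is a single character, an assumption this sketch relies on: `len` gives |w|, slicing with step -1 gives the reverse, and `in` tests for a (contiguous) substring. The helper `is_string_over` is hypothetical.

```python
SIGMA_1 = {"0", "1"}   # the alphabet Σ1 = {0, 1}, one character per symbol

def is_string_over(w: str, alphabet: set) -> bool:
    """A string over an alphabet uses only symbols from that alphabet."""
    return all(symbol in alphabet for symbol in w)

w = "01001"
EMPTY = ""             # the empty string ε

print(is_string_over(w, SIGMA_1))   # True
print(len(w), len(EMPTY))           # 5 0
print(w[::-1])                      # 10010 (the reverse of w)
print("100" in w, "11" in w)        # True False (substring tests)
```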
STRINGS AND LANGUAGES CONT.
If we have a string x of length m and a string y of length n, the concatenation of x and y, written xy, is the string obtained by appending y to the end of x.
Superscript notation is used to concatenate a string with itself many times: $x^k$ means $\underbrace{xx \cdots x}_{k}$.
The lexicographic ordering of strings is the same as the familiar dictionary ordering, except that shorter strings precede longer strings.
Thus the lexicographic ordering of all strings over the alphabet {0,1} is
$(\varepsilon, 0, 1, 00, 01, 10, 11, 000, \ldots)$.
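Concatenation, the superscript notation, and the ordering of strings can be sketched in Python as follows. The helper `strings_in_lex_order` (a hypothetical name) generates strings length by length, in dictionary order within each length, which matches the ordering described above.

```python
from itertools import product

def strings_in_lex_order(alphabet, max_len):
    """All strings over `alphabet` up to `max_len`: shorter strings first,
    dictionary order among strings of equal length."""
    symbols = sorted(alphabet)
    result = []
    for n in range(max_len + 1):
        result.extend("".join(t) for t in product(symbols, repeat=n))
    return result

x, y = "ab", "c"
print(x + y)   # abc     (concatenation xy)
print(x * 3)   # ababab  (x^3)
print(strings_in_lex_order({"0", "1"}, 3)[:8])
# ['', '0', '1', '00', '01', '10', '11', '000']
```

For an existing finite list of strings, sorting with the key `lambda s: (len(s), s)` gives the same ordering.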
A language is a set of strings.
BOOLEAN LOGIC
Boolean logic is a mathematical system built around the two values TRUE and FALSE (the Boolean values), which are often represented by the values 1 and 0.
Boolean values can be manipulated with specially designed operations, called Boolean operations, such as:
Negation, or NOT, with symbol $\neg$: the negation of a Boolean value is the opposite value.
Conjunction, or AND, with symbol $\wedge$: the conjunction of two Boolean values is 1 if both of those values are 1.
Disjunction, or OR, with symbol $\vee$: the disjunction of two Boolean values is 1 if either of those values is 1.
$$\begin{array}{lll}
0 \wedge 0 = 0 & 0 \vee 0 = 0 & \neg 0 = 1 \\
0 \wedge 1 = 0 & 0 \vee 1 = 1 & \neg 1 = 0 \\
1 \wedge 0 = 0 & 1 \vee 0 = 1 & \\
1 \wedge 1 = 1 & 1 \vee 1 = 1 &
\end{array}$$
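BOOLEAN LOGIC CONT.
Other Boolean operations:
Exclusive or, or XOR, with symbol $\oplus$: is 1 if either but not both of its two operands are 1.
Equality, with symbol $\leftrightarrow$: is 1 if both of its operands have the same value.
Implication, with symbol $\rightarrow$: is 0 if its first operand is 1 and its second is 0; otherwise it is 1.
$$\begin{array}{lll}
0 \oplus 0 = 0 & 0 \leftrightarrow 0 = 1 & 0 \rightarrow 0 = 1 \\
0 \oplus 1 = 1 & 0 \leftrightarrow 1 = 0 & 0 \rightarrow 1 = 1 \\
1 \oplus 0 = 1 & 1 \leftrightarrow 0 = 0 & 1 \rightarrow 0 = 0 \\
1 \oplus 1 = 0 & 1 \leftrightarrow 1 = 1 & 1 \rightarrow 1 = 1
\end{array}$$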
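Truth tables like these are easy to regenerate. The Python sketch below defines the six operations on the values 0 and 1 (the function names are ours, since `not`, `and`, and `or` are Python keywords) and prints one row per pair of operands.

```python
from itertools import product

def neg(p: int) -> int:
    return 1 - p             # NOT: the opposite value

def conj(p: int, q: int) -> int:
    return p & q             # AND: 1 only if both are 1

def disj(p: int, q: int) -> int:
    return p | q             # OR: 1 if either is 1

def xor(p: int, q: int) -> int:
    return p ^ q             # XOR: 1 if exactly one is 1

def eq(p: int, q: int) -> int:
    return int(p == q)       # equality: 1 if both have the same value

def imp(p: int, q: int) -> int:
    return disj(neg(p), q)   # implication: 0 only when p = 1 and q = 0

print("p q | AND OR XOR EQ IMP")
for p, q in product((0, 1), repeat=2):
    print(p, q, "|", conj(p, q), disj(p, q), xor(p, q), eq(p, q), imp(p, q))
```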
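BOOLEAN LOGIC CONT.
The distributive law for AND and OR is similar to the distributive law for addition and multiplication, which states that $a \times (b + c) = (a \times b) + (a \times c)$. The Boolean version comes in two forms:
$P \wedge (Q \vee R)$ equals $(P \wedge Q) \vee (P \wedge R)$, and its dual
$P \vee (Q \wedge R)$ equals $(P \vee Q) \wedge (P \vee R)$.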
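Because each variable takes only two values, both distributive laws can be verified exhaustively over all eight assignments. A minimal Python check, using the built-in `and` and `or` on `bool` values:

```python
from itertools import product

# Check both distributive laws on all 8 assignments of P, Q, R
for P, Q, R in product((False, True), repeat=3):
    assert (P and (Q or R)) == ((P and Q) or (P and R))
    assert (P or (Q and R)) == ((P or Q) and (P or R))
print("both distributive laws hold for all 8 assignments")
```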
FORMAL PROOF
Proof is something that every computer scientist needs to understand.
A formal proof of the correctness of a program should go hand-in-hand with the writing of the program itself.
When a program uses recursion or iteration, we are unlikely to write the code correctly without careful reasoning; and when testing tells us the code is incorrect, we still need to get it right.
To make recursion or iteration correct, we need to set up an inductive hypothesis.
The process of understanding the workings of a correct program is essentially the same as the process of proving theorems by induction.
Automata theory covers two methodologies of formal proof:
1. Deductive (a sequence of justified steps), and
2. Inductive (recursive proofs of a parameterized statement that use the statement itself with lower values of the parameter).
DEDUCTIVE PROOFS
A deductive proof consists of a sequence of statements whose truth leads from some initial statement, called the hypothesis or the given statement(s), to a conclusion statement.
The hypothesis may be true or false; it typically consists of several independent statements connected by a logical AND.
A theorem is proved when we go from a hypothesis H to a conclusion C; the statement "if H then C" says that C is deduced from H.
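DEDUCTIVE PROOFS CONT.
Example:
Theorem 1.3: If $x \geq 4$, then $2^x \geq x^2$.
We can convince ourselves informally that Theorem 1.3 is true. The hypothesis H is "$x \geq 4$"; it has a parameter x, and thus is neither true nor false on its own.
Its truth depends on the value of the parameter x; e.g., H is true for x = 6 and false for x = 2.
The conclusion C is "$2^x \geq x^2$"; it also uses the parameter x and is true for certain values of x. C is false for x = 3, since $2^3 = 8$, which is not as large as $3^2 = 9$. On the other hand, C is true for x = 4, since $2^4 = 4^2 = 16$. For x = 5, the statement is also true, since $2^5 = 32$ and $5^2 = 25$.
Intuitively, each time x increases by 1, the left side $2^x$ doubles, while the right side $x^2$ grows by the ratio $((x+1)/x)^2$, which for $x \geq 4$ is at most $(5/4)^2 = 1.5625 < 2$. So once $2^x \geq x^2$ holds at x = 4, the left side stays ahead for every larger x.
We have completed an informal but accurate proof. (We shall return to the proof and make it more precise under inductive proofs.)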
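The informal argument can be spot-checked numerically. This Python sketch prints both sides for small x and then checks the inequality over a finite range; such a check supports the claim but is not a proof, since the theorem covers every $x \geq 4$.

```python
# Compare 2**x with x**2 for small x: the conclusion C fails at x = 3
# but holds from x = 4 onward (within the range printed).
for x in range(2, 9):
    print(x, 2 ** x, x ** 2, 2 ** x >= x ** 2)

# Spot-check Theorem 1.3 on a finite range (a check, not a proof)
assert all(2 ** x >= x ** 2 for x in range(4, 1000))
```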
REDUCTION TO DEFINITIONS
In many theorems, including many in automata theory, the terms used in the statement may have implications that are less obvious.
If you are not sure how to start a proof, convert all terms in the hypothesis to their definitions.
Example:
Theorem 1.5: Let S be a finite subset of some infinite set U. Let T be the complement of S with respect to U. Then T is infinite.
Restating the facts into definitions:

Original statement       | New statement
S is finite              | There is an integer n such that $\|S\| = n$
U is infinite            | For no integer p is $\|U\| = p$
T is complement of S     | $S \cup T = U$ and $S \cap T = \emptyset$
REDUCTION TO DEFINITIONS CONT.
We need a common proof technique called proof by contradiction, in which we assume that the conclusion is false. We then use that assumption, together with parts of the hypothesis, to prove the opposite of one of the given statements of the hypothesis.
The negation of the conclusion is "T is finite". Restate the assumption that T is finite as $\|T\| = m$ for some integer m.
One of the given statements is $S \cup T = U$ and $S \cap T = \emptyset$. The elements of U are exactly the elements of S and T, and no element is in both. Thus, there must be n + m elements of U. Since n + m is an integer, we have shown that $\|U\| = n + m$, from which it follows that U is finite. But the statement that U is finite contradicts the given statement that U is infinite.
By the principle of proof by contradiction, we may conclude that the theorem is true.
REDUCTION TO DEFINITIONS CONT.
Proofs do not have to be so wordy.
The theorem can be reproved in a few lines:
PROOF (of Theorem 1.5): We know that $S \cup T = U$ and S and T are disjoint, so $\|S\| + \|T\| = \|U\|$. Since S is finite, $\|S\| = n$ for some integer n, and since U is infinite, there is no integer p such that $\|U\| = p$. So assume that T is finite; that is, $\|T\| = m$ for some integer m. Then $\|U\| = \|S\| + \|T\| = n + m$, which contradicts the given statement that there is no integer p equal to $\|U\|$.
OTHER THEOREM FORMS
The if-then form of theorem is most common in typical areas of mathematics.
However, other kinds of statements are also proved as theorems.
Ways of Saying If-Then
Some other ways in which "if H then C" might appear:
H implies C; H only if C; C if H; whenever H holds, C follows (and other variant forms).
If-And-Only-If Statements
Statements of the form "A if and only if B" also appear as "A iff B", "A is equivalent to B", or "A exactly when B".
These statements are actually two if-then statements: "if A then B" and "if B then A".
We prove "A if and only if B" by proving two statements:
1. The if part: "if B then A", and
2. The only-if part: "if A then B", which is often stated in the equivalent form "A only if B".
ADDITIONAL FORMS OF PROOF
Proving Equivalences About Sets
In automata theory, we are frequently asked to prove a theorem which says that the sets constructed in two different ways are the same sets.
Often these sets are sets of character strings, and the sets are called languages.
If E and F are two expressions representing sets, the statement E = F means that the two sets represented are the same.
For example, the commutative law of union says that we can take the union of two sets R and S in either order: $R \cup S = S \cup R$.
Contrapositive
The contrapositive of the statement "if H then C" is "if not C then not H".
A statement and its contrapositive are either both true or both false, so we can prove either to prove the other.
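Since these proof forms are statements about truth values, they can be confirmed with a truth table. This Python sketch checks, for every combination of H and C, that "if H then C" agrees with its contrapositive and that "H if and only if C" is the conjunction of the two if-then statements; the helper `implies` is our own.

```python
from itertools import product

def implies(h: bool, c: bool) -> bool:
    """Truth value of 'if h then c'."""
    return (not h) or c

for H, C in product((False, True), repeat=2):
    # A statement and its contrapositive always agree
    assert implies(H, C) == implies(not C, not H)
    # 'H if and only if C' is the conjunction of both if-then statements
    assert (H == C) == (implies(H, C) and implies(C, H))
print("contrapositive and iff checks pass on all 4 assignments")
```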
ADDITIONAL FORMS OF PROOF CONT.
Counterexample
In practice, we are often faced with something that seems true (a strategy for implementing a program, for example) and need to decide whether or not this "theorem" is true.
To resolve the question, we may alternately try to prove the theorem, and if we cannot, try to prove that the statement is false, for instance by exhibiting a counterexample.
Proof by Contradiction
Another way to prove a statement of the form "if H then C" is to prove the statement "H and not C implies falsehood".
Start by assuming both the hypothesis H and the negation of the conclusion C.
Complete the proof by showing that something known to be false follows logically from H and not C. (See, for example, the proof of Theorem 1.5.)
INDUCTIVE PROOFS
Inductive proofs are a form of proof that is essential when dealing with recursively defined objects or concepts, such as trees and expressions of various sorts.
Induction on Integers
Given a statement S(n) about an integer n to prove, a common approach is to prove:
1. Basis: show S(i) for a particular integer i. Usually i = 0 or i = 1 (or maybe higher; i depends on S).
2. Induction step: assume $n \geq i$, where i is the basis integer, and show that if S(n) then S(n+1).
The Induction Principle: If we prove S(i) and we prove that for all $n \geq i$, S(n) implies S(n+1), then we may conclude S(n) for all $n \geq i$.
INDUCTIVE PROOFS CONT.
Example: Theorem 1.3 states that if $x \geq 4$, then $2^x \geq x^2$.
BASIS: If x = 4, then $2^x$ and $x^2$ are both 16. Thus $2^4 \geq 4^2$ holds.
INDUCTION: Suppose for some $x \geq 4$ that $2^x \geq x^2$. We need to prove the same statement with x + 1 in place of x, that is, $2^{x+1} \geq (x+1)^2$.
In this case, we can write $2^{x+1}$ as $2 \times 2^x$. Since S(x) tells us that $2^x \geq x^2$, we can conclude that $2^{x+1} = 2 \times 2^x \geq 2x^2$.
But we need to show that $2^{x+1} \geq (x+1)^2$. One way to prove this statement is to prove that $2x^2 \geq (x+1)^2$ and then use the transitivity of $\geq$ to show $2^{x+1} \geq 2x^2 \geq (x+1)^2$. In our proof that
$$2x^2 \geq (x+1)^2 \qquad (1.1)$$
we may use the assumption that $x \geq 4$. Begin by simplifying (1.1): expand $(x+1)^2 = x^2 + 2x + 1$ and subtract $x^2$ from both sides to get
$$x^2 \geq 2x + 1 \qquad (1.2)$$
INDUCTIVE PROOFS CONT.
Divide (1.2) by x (which is positive), to get:
$$x \geq 2 + \frac{1}{x} \qquad (1.3)$$
Since $x \geq 4$, we know $1/x \leq 1/4$. Thus, the left side of (1.3) is at least 4, and the right side is at most 2.25. We have thus proved the truth of (1.3). Therefore, Equations (1.2) and (1.1) are also true. Equation (1.1) in turn gives us $2x^2 \geq (x+1)^2$ for $x \geq 4$ and lets us prove statement S(x+1), which we recall was $2^{x+1} \geq (x+1)^2$.
CONCEPTS OF AUTOMATA THEORY

The concepts include the alphabet (a set of symbols), strings (a list of symbols from an alphabet), and language (a set of strings from the same alphabet).
Languages
If $\Sigma$ is an alphabet, and $L \subseteq \Sigma^*$ (where $\Sigma^*$ denotes the set of all strings over $\Sigma$), then L is a language over $\Sigma$.
A problem in automata theory is the question of deciding whether a given string is a member of some particular language.
The problem L is: given a string w in $\Sigma^*$, decide whether or not w is in L.
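Deciding a problem L means writing a procedure that answers yes or no for each string. As a hypothetical example not taken from the slides, take L to be the strings over {0,1} with equally many 0s and 1s; a Python sketch of its decision procedure:

```python
def in_L(w: str) -> bool:
    """Decide membership in a hypothetical example language
    L = { w over {0,1} : w has equally many 0s and 1s }."""
    if any(symbol not in "01" for symbol in w):
        raise ValueError("w is not a string over the alphabet {0,1}")
    return w.count("0") == w.count("1")

for w in ["", "01", "0011", "001"]:
    print(repr(w), in_L(w))
```

Here the empty string is in L, since it has zero 0s and zero 1s.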