Today we are going to discuss algorithms
for answering questions about regular
languages, or really about the
representations of those languages, such as
DFAs. The questions we can resolve
for DFAs include many we cannot
resolve for programs in general. Examples
include whether a given string is accepted
by a given DFA, that's the
membership problem, or whether a given
DFA accepts any string at all, the
emptiness problem. As part of our
discussion we are going to prove an
important theorem called the pumping
lemma, which lets us show certain
languages not to be regular. Automata
theory talks about many different classes
of languages, including the context-free,
recursive, and recursively enumerable
languages. We are going to meet each of these
classes, but for the moment we know only
one class, the regular languages. When we
investigate a class of languages there are
two important issues. The first is
decision properties. We shall see that
there are many questions we might like to
ask about a language or rather its
representation, such as whether it is
empty, finite, or infinite. It's good to
know there are algorithms to answer such
questions ... At least for the regular
languages. Unfortunately as we meet larger
classes of languages, we find that in
general the larger the class of languages
the less likely there is to be an
algorithm to answer questions about
languages in that class. The second
important issue is the closure properties
of the class. These involve applying
operations such as union to languages in
the class. We are going to defer the
discussion of closure properties to
another lecture, although I'll give you an
example on the next slide. Closure
properties are statements that when we
apply certain operations to languages in
the class, the result is also in the
class. For example we say that a class of
languages is closed under union, if given
two languages in the class, the union of
those languages is also in the class. So
if I have two regular languages, I can
represent them by regular expressions and
connect those expressions by a plus, with
the appropriate parentheses, to get a
regular expression for their union.
Similar constructions work for their
concatenation and closure (star).
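As a quick illustration of that closure argument, here is a minimal sketch in Python, my own illustration rather than anything from the lecture, that combines regular expressions kept as plain strings, using the plus notation for union:

```python
# Minimal sketch: regular expressions as strings, combined with parentheses
# so that precedence is preserved.  "+" is the union operator in the
# lecture's notation.

def union(r1, r2):
    return "(" + r1 + ")+(" + r2 + ")"

def concatenation(r1, r2):
    return "(" + r1 + ")(" + r2 + ")"

def star(r):
    return "(" + r + ")*"

# Hypothetical example expressions for two regular languages:
print(union("(0+1)*01", "1*"))   # a regular expression for their union
```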
Now let us address the main topic of this lecture:
decision properties of regular languages.
We've used both formal and informal ways
of describing languages. The formal ways
include automata and regular
expressions, each of which defines a
language by a precise mathematical
definition. But we've also described
languages informally, by prose statements and
set formers such as this. Or even more
informal statements like this. Okay.
However you can't answer questions about a
language unless you have a formal
description. For instance, we'll
talk shortly about testing
whether a regular language is infinite,
given one of its formal representations.
It looks like the last of the informal
descriptions describes an infinite set,
but does the word 'some' mean any number, or some
one particular number, like ten for example,
that I had in mind? Thus we are only going
to use formal descriptions of languages
when we talk about algorithms for deciding
things about those languages. Thus a
decision property for a class of languages
is an algorithm that takes a formal
representation of the language. The
algorithm answers some particular question
about the language such as whether or not
the language described by the
representation is empty. Here are a few
examples of why we might be interested in
decision properties of regular languages.
Both involve protocols represented by
DFAs. If we ask whether the language
of the DFA is finite, we
are, in effect, asking whether the
protocol it represents is guaranteed to
terminate. Or if we make the final states
of the DFA be error states, then
asking if its language is empty, is
tantamount to asking whether the protocol
can fail. And remember we couldn't answer
either of these questions about programs
in general, so we couldn't get the answers
to these questions about protocols by
looking at the code that implements them.
Another use for decision properties of
regular languages involves minimizing
their representations. For instance,
DFAs are a good representation for
certain kinds of digital circuits,
those that have memory. We usually want the
smallest circuit to accomplish a task and
a good first step is to find a DFA that
does what we want and has the smallest
number of states of any DFA for the same
language. It turns out that we can
determine whether two DFA's are
equivalent. That is whether they define
the same language. That lets us find the
minimum state DFA equivalent to any given
DFA. Again, we can do none of this for
programs in general; you can't tell whether
two programs do the same thing, and we
can't find the smallest program equivalent to a
given program, even though we know in
principle that one must exist. The
membership question for regular languages
is answered by an algorithm that takes a DFA
and a string and tells whether or not the
string is accepted by the DFA. The
algorithm is the obvious one. Simulate the
DFA on the input. Here is an example; it's
something we've seen several times
before. Each time I show it to you I use a
different style of presentation, but the
idea is always the same. Here is the DFA for
strings without consecutive ones,
something which I know we have seen
before, and the input string is zero one
zero one one. That obviously has
consecutive ones so it shouldn't be
accepted. Well, let's see what happens
when we simulate it. We read a zero, we
stay in A. When we read a one, we go to B.
We read a zero, we go back to A. Read a one,
we go to B; read another one, we go to C.
So when we simulate this DFA on the input, we
see that in the end the string gets the DFA
to state C, and it is not accepted.
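Here is a minimal sketch of that membership test, with the transition table written as a Python dictionary. The state names A, B, C follow the slide; treating A and B as the accepting states is my assumption from the narration.

```python
def accepts(delta, start, finals, w):
    """Simulate the DFA on input w, one symbol at a time."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# The "no consecutive 1s" DFA: C is the dead state reached after seeing 11.
delta = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}

print(accepts(delta, "A", {"A", "B"}, "01011"))  # False: 01011 has consecutive 1s
print(accepts(delta, "A", {"A", "B"}, "01010"))  # True
```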
Now you might wonder: what if the regular
language were represented by an NFA or a
regular expression, for example? Then you
first need to convert the representation
to a DFA and then simulate the DFA. It is
possible to convert from any of the
representations we know about to any of
the others using our circle of
conversions. Doing so might exponentiate
the size of the description, but there is
still an algorithm to do the conversion,
and that's all we need to show that there is,
for example, an algorithm to tell, given
a regular expression and a string, whether
the string is in the language of the
regular expression. Generally, proofs of
closure or decision properties start from
either a DFA or a regular expression, by the
way. The emptiness problem is: given a
representation for a regular language, does
its language contain any string at all?
Okay, we are going to assume the language is
represented by a DFA; obviously, if it is given by some
other representation, then convert it to a
DFA. Finding the reachable states requires a
breadth-first or depth-first search from
the start state. I'm not going to assume
you are familiar with these search
techniques, but it is fairly easy to think
of some way of searching a graph from a
single node by following all arcs out of
the node and marking those nodes you
visited. Then follow arcs out of the nodes
you visited and mark any other nodes you
visit. Keep doing that until you cannot
mark any other nodes. The marked states
are exactly those that are reached from the start
state on some input. If at least one final
state is marked, then the DFA accepts at
least one input. If no final state is
marked, then it is impossible for the DFA
to accept anything, and its language is empty.
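Here is a minimal sketch of that emptiness test, using the same dictionary representation of a DFA as in the membership sketch; the search below is the marking procedure just described.

```python
def is_empty(delta, start, finals):
    """True if the DFA accepts no string at all."""
    marked = {start}
    frontier = [start]
    while frontier:                       # keep marking until nothing new appears
        q = frontier.pop()
        for (p, a), r in delta.items():   # follow every arc out of a marked state
            if p == q and r not in marked:
                marked.add(r)
                frontier.append(r)
    return marked.isdisjoint(finals)      # empty iff no final state was marked
```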
Here is an example. Here is your start
state. We mark it, then mark these
guys, and maybe mark these guys as we go. But
if all your final states are out here and
never get marked, then obviously, no matter
how complex this automaton is, you can't
reach your final states, and you can't
accept anything. Now if you have an
automaton with three states it is pretty
easy to tell what is reachable and what
isn't. But if you have a million states
represented by some table, then it is
hardly easy to tell. But fortunately we
have a straightforward search algorithm
regardless of how large the automaton or
graph is. Now let us take up a more
difficult problem, but one that we can
still solve. We'd like to know whether or
not the language defined by a DFA is finite
or infinite. The first fact we're going to
prove is that if the language of the DFA
contains any string of length n or more,
where n is the number of states of the
DFA, then the language of the DFA contains an infinite
number of strings. Surely, if the DFA
doesn't accept any string of length equal
to or greater than N then it accepts only
a finite number of strings. However it
doesn't seem feasible to test membership
for all input strings of length N or
more since there are an infinite number of
them. Can we ever finish? If not then we
really don't have an algorithm for testing
infiniteness. But as we shall see it is
possible to limit the length of strings we
have to test to twice the number of states,
so we have a really large and ugly, but
finite task, and we really do have an
algorithm. So let's try to prove the point
we need, that if the DFA accepts any
string whose length is at least the number
of states N, then it accepts an infinite
number of strings. First observe that a
string of length N or more has at least
N plus one states along its path. For
example, a string of length two
has three states on its path; here is a
typical picture we might have, for the string ab.
It looks like that. Notice that, no matter how
long the string is, the number of input
symbols in the string is the number of
arcs, and there's always one more node than
there are arcs in the path. That's why
there will be N plus one states for a
string of length n. Now if there are only
N different states, and there are N plus
one states along the path, then two states
along the path must be the same. That's
called the pigeonhole principle, you might
remember. Here is a picture of the path
for string W, which we are breaking up as
X, Y, and Z. X is the prefix of W that
gets the DFA to the first state that
repeats on the path, which we call state
Q, that's right here of course. Then Y is
the part of W that gets the DFA back
to Q for the first time; that is, the end
of Y is at the second occurrence of Q. Notice
that therefore Y cannot be the empty
string, although X and Z might be. Finally,
Z is the rest of W, and we know it gets the DFA
to a final state because W is accepted by
the DFA. Notice that the path labeled Z may
have states that also appear earlier but
it doesn't matter, the important thing is
that we identify the first repeating state
"Q". The claim that X, followed by I
repetitions of Y, followed by Z, is also
accepted by the DFA for any integer I. To
see why, X takes us to state Q, so we
could for example go like this and then we
could skip Y altogether. Just follow Z and
we get to a final state. That tells us
that XZ is in the language. Or, we could
follow X to state Q, then we could go around
the loop many, many times, and then finally
go off following Z, and again get to the
final state. Or, after I repetitions of input Y,
for any I however large, we follow input
Z and we accept. This proves that X Y to
the I Z is accepted for any I. Remember
that Y cannot be empty, so all these
accepted strings are different. Thus the
DFA accepts an infinite number of strings,
one for each I.
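The proof is constructive, and a minimal sketch of the construction, my own illustration reusing the dictionary DFA representation from the earlier sketches, looks like this: run the DFA on W, find the first state that repeats on the path, and cut W at the two occurrences of that state.

```python
def pump_split(delta, start, w):
    """Split an accepted string w of length >= number of states into (x, y, z),
    where y labels the first cycle on the path and is therefore nonempty."""
    path = [start]
    for symbol in w:
        path.append(delta[(path[-1], symbol)])
    first_seen = {}
    for i, q in enumerate(path):
        if q in first_seen:                     # q is the first repeating state
            j = first_seen[q]
            return w[:j], w[j:i], w[i:]         # x, y (the cycle), z
        first_seen[q] = i
    return None                                 # only if |w| < number of states

# For any i >= 0, x + i * y + z is then accepted as well.
```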
Remember, we still do not have an algorithm, because we can't test
the infinite number of strings whose
lengths are equal to or greater than N. However, we
don't have to, because it is sufficient to
test strings of length between N and 2N
minus one, and there are a finite number of
such strings. When we prove this statement
we will at least have an algorithm,
although it is a rather time-consuming
one. Now, we picked Y to be the first cycle
on the path. So the length of XY cannot be
greater than N. That is, some state within
the first N plus one states on the path
surely repeats. We also know that the
length of Y is at least one, so the length of Y lies
between one and N. Now, if W is the
shortest accepted string of length at
least N, then we claim that W cannot be
as long as 2N. For suppose it were.
XZ, as we know, is another accepted
string, and the length of XZ is the length
of W minus the length of Y. Since the length
of Y is at most N and the length of W is at least 2N,
the length of XZ is at least N. That means that XZ is a string
shorter than W, yet at least N in length,
and it is also accepted. But we assumed
that there was no accepted string that was
shorter than W and also of length at
least N, which is a contradiction. As a result, given any really long
string W that's accepted we can keep
taking out pieces of length between one
and N, that's the Y in each of these
diagrams, and we just keep throwing them
away and eventually what's left of W will
be between N and 2N minus one. So the
algorithm to decide whether a regular
language is infinite is to construct a DFA
for it. And let the DFA have N states.
Test all the strings of length between N
and 2N minus one, and say infinite if any
of them are accepted. Otherwise, say
finite.
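A minimal sketch of that brute-force test, my own illustration again using the dictionary DFA representation:

```python
from itertools import product

def is_infinite_bruteforce(delta, start, finals, alphabet, n_states):
    """Infinite iff some string of length between N and 2N-1 is accepted."""
    for length in range(n_states, 2 * n_states):
        for w in product(alphabet, repeat=length):
            state = start
            for symbol in w:
                state = delta[(state, symbol)]
            if state in finals:
                return True
    return False
```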
This is a terrible algorithm. If
there are K input symbols and N states,
then the number of strings we have to
simulate is about K to the power 2N. That
is a lot of work, and there is a much more
efficient algorithm, one that takes time
proportional to the number of transitions,
that is, K times N, if implemented right. I
wanted to give you the argument about the
length of strings because it's important
when we take up the pumping lemma, a
technique for showing languages not to be
regular. We already discussed searching
forward from a node in a graph to find all
the nodes you can reach. So we can
eliminate the states that are not
reachable from the start state. We then
want to eliminate states that don't reach
a final state. This algorithm is the same,
except we start by marking the final states
and follow the arcs backwards. Now there
is an elegant algorithm for finding cycles
using depth first search that takes time
proportional to the number of edges or
transitions. I am going to trust that
you'll meet this algorithm in a course on
algorithms or data structures, if you
haven't already done so. However, here is
a simple way to test for a cycle in a
graph, it takes time proportional to the
number of nodes or states, times the
number of arcs or transitions. We are
going to do the same thing for each node
N. Starting at N, search forward until
either you can reach no more nodes or you
discover that you can reach N. That is,
here's node N; we explore forward from it, and
if we're lucky, at some point we reach a
node that has an arc back to N. If you can
reach N, then you have a cycle and you can
conclude the language of the DFA is
infinite. If not, try the same process
from another node. If you exhaust all the
nodes as starting points and you still
haven't found a cycle, then there are
none, and you conclude the language is
finite.
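Here is a minimal sketch of that node-by-node test, my own illustration: restrict attention to states that are reachable from the start state and that can reach a final state, then ask whether any of them can reach itself again.

```python
def is_infinite(delta, start, finals):
    # Forward and backward adjacency, built from the transition dictionary.
    fwd, bwd = {}, {}
    for (p, a), r in delta.items():
        fwd.setdefault(p, set()).add(r)
        bwd.setdefault(r, set()).add(p)

    def reach(sources, edges):
        seen, stack = set(sources), list(sources)
        while stack:
            q = stack.pop()
            for r in edges.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    useful = reach({start}, fwd) & reach(finals, bwd)
    # Infinite iff some useful state lies on a cycle, i.e. it can reach
    # itself again by following at least one transition.
    return any(q in reach(fwd.get(q, ()), fwd) for q in useful)
```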
This is a good time to introduce the pumping lemma
for regular languages, because we have essentially proved it
during our analysis of this finiteness
problem. Here is the statement of the
pumping lemma. For every regular language
L there is an integer N, which happens to
be the number of states of some DFA for
L, such that for every string W in L
whose length is at least N, we can break W
into W = XYZ, where Y is the label of the first
substring of W that goes from a state back to
the same state, as we saw in the
previous slides, such that three things
are true. First, the prefix XY is short;
it is of length at most N. We are sure of
that by making Y the label of the first
cycle we encounter. Second, Y is not the
empty string. We are sure of this because
Y connects two different occurrences of
the same state along the path of W. And
lastly, X Y to the I Z is in L for all
integers I. The statement is particularly
complex because it is of the form for-all,
there-exists, for-all, there-exists. But
here's how we use it. Think of a game
played between you and an adversary. You
pick the language L that you want to show
is not regular, and suppose the adversary
claims it is regular. Then the adversary
has to provide the there-exists parts,
while you play the for-all parts. You have
already picked L. Now the adversary has to
pick N. He can pick a number as large as
he wants, but once picked, it is
fixed and the game proceeds. Now you
get to pick the string W, subject only to
the constraint that it is at least as long
as N, the number the adversary picked.
Next the adversary has to break your W up
into X Y Z, subject to the constraints that
the length of XY is at most N and the
length of Y is at least one. You win the
game by picking an I such that your X Y to
the I Z is not in L. However, in a proof we
don't know what moves the adversary will
make, but to win we want to cover all
possible moves. That is, we know the adversary picked
N, but we don't know N's actual value, so
we must pick W in terms of N. Similarly,
we know W equals X Y Z, but we don't
know exactly where Y is, except that it is
not empty and it lies within the first N
positions of W. Thus, our argument that X
Y to the I Z is not in L must work for
any of these possible Y's. Now, let's see
an example. Let us pick this language as
L. It is the set of strings consisting of
some number of zeros followed by the same
number of ones and we have claimed before
that it is an example of a non regular
language. Now we're going to prove it. Now
the adversary picks N, we don't know what
N is, but we know it has some fixed value.
Now we get to choose W in terms of N. And
we pick, W equals zero to the N one to the
N, that is N zeros followed by N ones. But
then the adversary gets to break W into X,
Y, Z, and you don't know exactly how it is
broken. But we know enough about X, Y, Z
to show that there is some string, in
particular the one for the case I = 2, that the pumping
lemma says has to be in the language L,
but which obviously isn't, because it has more
0's than 1's. That is, we know that Y,
being within the first N positions of W,
can contain only zeros. So two Y's have more
zeros than one Y, and the number of ones, which
are all contained within Z, doesn't
change.
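As a sanity check on that argument, my own illustration and not part of the lecture, the little script below plays the adversary's role for one particular N and verifies that the choice I = 2 defeats every legal split of W = 0^N 1^N:

```python
def in_L(s):
    """Membership in L = { 0^k 1^k : k >= 0 }."""
    zeros = s.count("0")
    ones = s.count("1")
    return zeros == ones and s == "0" * zeros + "1" * ones

n = 7                                    # stand-in for the adversary's N
w = "0" * n + "1" * n
for xy_len in range(1, n + 1):           # |XY| <= N
    for y_len in range(1, xy_len + 1):   # |Y| >= 1
        x = w[:xy_len - y_len]
        y = w[xy_len - y_len:xy_len]     # lies within the first N positions, all 0s
        z = w[xy_len:]
        assert not in_L(x + 2 * y + z)   # pumping with I = 2 leaves the language
print("I = 2 defeats every split for N =", n)
```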
We next take up the question of testing
whether two regular languages are
the same. We are presumably given
representations of the two languages, L
and M. Whatever representation we are given,
we convert to DFAs; then we have to
combine those DFAs into a single DFA that,
in a sense, runs both DFAs in parallel. We
call it the product DFA. Suppose these two
DFAs have state sets Q and R. In the product
DFA the states are pairs, one state from Q,
the other from R. The start state of the
product DFA is the pair consisting of
the start state from each DFA. For the
transitions of the product DFA, suppose we
have a state that is the pair QR, and
suppose A is the input symbol for which
we want to figure out the transition. We
look at the transition function for the
first DFA, Say Delta L, and we see where Q
goes on input A. So here's Q and on input
A it goes to some state like that. Then we
look at the transition function for the
second DFA. Say, delta M, and we see where
R goes on input A. So here's R on input A
it goes somewhere here. Okay. Then in the
product DFA, the transition from the state
QR on input A is the state pair whose
first component is delta L of Q and A,
and whose second component is delta M of
R and A. That is, we
simulate the two transitions in parallel.
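A minimal sketch of the product construction, my own illustration, building only the pairs reachable from the pair of start states:

```python
def product_dfa(delta_L, start_L, delta_M, start_M, alphabet):
    """Return (delta, start, states) for the product DFA; states are pairs."""
    start = (start_L, start_M)
    delta, states, todo = {}, {start}, [start]
    while todo:
        q, r = todo.pop()
        for a in alphabet:
            nxt = (delta_L[(q, a)], delta_M[(r, a)])   # both DFAs move in parallel
            delta[((q, r), a)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return delta, start, states
```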
Here is a little example. Here are the two
given DFAs. We will call this the orange
DFA and that the purple DFA. And here is
the product DFA. For example let's figure
out the transition from AC on zero, so
here's state AC and I look in the orange
[inaudible] a on a zero goes to a itself
And in the purple automaton, C on a zero
goes to D. So the Combination AC goes to
AD, and you see that transition here. For
another example where does AD go on input
one? Well, the orange automaton says A
goes to B on one. The purple automaton
says D goes to C on one. So, on one AD
goes to BC. That's this transition there.
The algorithm for testing whether two
DFA's are equivalent, that is whether they
accept the same language, begins by
constructing the product DFA. Make the
final states of the product DFA be all
those pairs such that one is a final state
and the other isn't. If string W reaches
one of these final states in the product,
then W is accepted by one of the original
DFAs and not the other. Thus the two
languages are not the same. Only if the
product DFA, with this selection of final
states, has an empty language are the two
DFAs equivalent.
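Here is a minimal sketch of the equivalence test, my own illustration; it folds the product construction and the emptiness check into one search, returning False the moment it reaches a pair in which exactly one component is final.

```python
def equivalent(delta_L, start_L, finals_L, delta_M, start_M, finals_M, alphabet):
    start = (start_L, start_M)
    seen, todo = {start}, [start]
    while todo:
        q, r = todo.pop()
        if (q in finals_L) != (r in finals_M):   # a distinguishing pair is reachable
            return False
        for a in alphabet:
            nxt = (delta_L[(q, a)], delta_M[(r, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```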
Here's an example. AC is made a final state because in the original
automata, C is final and A is not.
Likewise, BD is final because B is final
but D is not. We now see that the two
original DFAs are not equivalent. It
happens that the final state BD is not
reachable from the start state, so there
are no strings that the orange
automaton accepts but
the purple one does not accept. However,
AC is also a final state, and it is
obviously reachable from the start state,
simply because it is the start state. That
is, the empty string distinguishes between
the orange and purple automata: the empty
string is accepted by the purple automaton
but not the orange one. A related question
to ask about regular languages is whether
one is contained in the other. The test
is, in a sense, one half of the
equivalence test we just saw. Start by
building the product automaton. But we
have to define the final states
differently. How do we do that? Note that
L is not contained in M if, and only if,
there is some string W that is in L but
not in M. Such a string would get the DFA
for L to a final state, but would not get
the DFA for M to a final state. So the
question of containment is the same as the
question of whether there is any string,
W, that gets the product automaton to a
state QR, where Q is final and R is not.
Here B is the only final state of the
first automaton and D is the only non
final state of the second automaton, so
only BD is final. Okay. As we observed
before, BD is not reachable from the start
state; it has arcs out but no arcs in.
Thus the language of the product automaton
is empty, and we conclude that the
language of the orange automaton is a
subset of the language of the purple
automaton.
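The containment test is the same sketch with the final-state condition changed: reaching a pair whose first component is final and whose second is not witnesses a string in L but not in M.

```python
def contained_in(delta_L, start_L, finals_L, delta_M, start_M, finals_M, alphabet):
    """True if the language of the first DFA is a subset of the second's."""
    start = (start_L, start_M)
    seen, todo = {start}, [start]
    while todo:
        q, r = todo.pop()
        if q in finals_L and r not in finals_M:  # a string in L but not in M
            return False
        for a in alphabet:
            nxt = (delta_L[(q, a)], delta_M[(r, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```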
Next, we're going to attack the problem of, given a DFA, finding
the equivalent DFA with the fewest states.
There's an obvious dumb algorithm. Just
consider all the DFAs with the same input
alphabet, but a smaller number of states.
There's a huge but finite number of such
DFAs, so in principle we can solve
this problem. This time we're not going to
dwell on the bad algorithm, but talk you
through the good algorithm immediately.
The key idea is to build a table of pairs
of states and figure out which pairs are
distinguishable in the sense that there is
some input string that leads one of the
pair to a final state and the other to a
non-final state. Otherwise, states are
indistinguishable and they can be merged
into a single state. Here is the
tennis-scoring automaton we saw way back in
the beginning of the course. Now, let's
look at the states forty-thirty, here, and
ad-in. Input S takes them both to an
accepting state, and
input O takes them both to deuce, here and
here. Thus, no further inputs could ever
distinguish forty-thirty from ad-in. Similarly,
thirty-forty and ad-out are
indistinguishable. Now we can deduce that
thirty-all, that's here, and deuce are
also indistinguishable. On input S they go
to forty-thirty and ad-in, respectively;
that is, thirty-all goes to forty-thirty
on S, and deuce goes to ad-in on S. But
we know that those two states are
indistinguishable, so we'll never be able
to distinguish thirty-all from deuce by a
sequence beginning with S. And further, on
input O, well, thirty-all goes to thirty-forty,
and deuce goes to ad-out, and we said we cannot
distinguish those two states. So
there is no string that can
distinguish thirty-all from deuce. We are now
going to talk about how we find the
distinguishable states. The basis is pairs
that are distinguishable by the empty
string. These are the pairs that have one
final and one non-final state. For the
inductive step, we can mark a pair Q, R if
these states go, on some input A, to
distinguishable states; that is, if delta of Q and A
and delta of R and A are a marked pair, then
Q and R are distinguishable. To see why,
say here is Q, and on A it goes to some state, I
don't know what it is, but it's certainly
delta of Q and A; and here is R, and on A it goes
to again some other state, I don't
know what it is, but it's delta of R and
A. Since these two states form a marked pair,
they are distinguishable by some string W;
let's say this one goes on W to a non-final
state and that one goes on W to a final
state. Then I claim AW distinguishes Q
from R, because obviously Q goes on AW to
a non-final state and R goes on the same
AW to a final state. After no more marks
are possible, the unmarked pairs are
equivalent and can be merged into one
state.
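A minimal sketch of this table-filling procedure, my own illustration: start with the (final, non-final) pairs and keep marking until nothing changes.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    """Return the set of marked (distinguishable) pairs, as frozensets."""
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}          # basis: the empty string
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:     # successors already distinguishable
                    marked.add(pair)
                    changed = True
                    break
    return marked   # any pair left unmarked is indistinguishable
```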
This point may be obvious, but we are going to need it in what follows. If
there is no string that distinguishes
states P and Q, and there is no string that
distinguishes Q from R, then how could
some string W distinguish P from R? That
would mean that some string W leads one of
P and R to a final state and the other to
a non-final state. Let's say R leads to a
final state, so here's W leading P to a
non-final state, and here's R, which on the
same W gets to a final state.
But then W also distinguishes Q from
either P or R. So here's Q, and W leads it
to some state. Okay, so let's say that W
leads Q to a final state. Then W
distinguishes Q from P, because here, P on
W goes to a non-final state and Q on W goes
to a final state. If Q goes to a non-final
state, then W doesn't distinguish P from
Q, but it does distinguish R from Q, because
Q would then go to a non-final state while
R goes to a final state. Incidentally,
distinguishability is not transitive. It is
quite possible that W distinguishes P from
Q and also Q from R, but does not
distinguish P from R. For example W could
lead both P and R to final state and Q to
a non-final state. We are now going to use
the table of indistinguishability to
merge states that are indistinguishable.
That gives us the minimum state DFA.
Although we must be careful to remove at
some stage, all the states that are not
reachable from the start state. Suppose we
have a set of indistinguishable states, say
Q1 through QK. We're going to replace them
all by a single state that behaves as they
all do, called the representative Q. It
can be one of the QI's or some new name we
create for this purpose. On any symbol A,
all the indistinguishable states, the
QI's, go to states that are also
indistinguishable from one another. For if
not, then we could use the distinction
between, say, the states delta of Q1 and A and
delta of Q2 and A to distinguish between Q1
and Q2. But we already know that Q1 and Q2
are indistinguishable so that can't
happen. Thus, make the transition for
state Q on input A be the representative
for the group of indistinguishable states
containing delta of Q1 and A, and so on.
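Putting the two steps together, here is a minimal sketch of the merge, my own illustration reusing distinguishable_pairs from the sketch above: each state is mapped to a representative of its indistinguishable group, and all transitions are rewired through the representatives.

```python
def minimize(states, alphabet, delta, start, finals):
    marked = distinguishable_pairs(states, alphabet, delta, finals)
    rep = {}
    for q in states:
        # The representative of q: the first state indistinguishable from q.
        rep[q] = next(p for p in states
                      if p == q or frozenset((p, q)) not in marked)
    new_delta = {(rep[q], a): rep[delta[(q, a)]]
                 for q in states for a in alphabet}
    # Note: unreachable states should still be removed in a separate step.
    return new_delta, rep[start], {rep[q] for q in finals}
```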
Let's work with the DFA that we
constructed from the NFA that represented
the moves on a chessboard. We are going
to make it easier to work with by renaming
the states to be single letters. For
example A is the set containing only one,
and G is the set containing one, three,
five, seven, and nine. Here is a little
trick for arranging pairs in a triangle so
that each pair appears exactly once.
Notice that the rows are the states in
order, except for the last state G, which
doesn't appear there. Then the columns are
labeled by the states in backwards order
except for the first state A. We begin the
table of indistinguishabilities by marking
each pair that consists of a final state
and a non-final state. Here, the final
states are F and G. So pairs that have one
of these and one of the other states A
through E are marked. Let's look at the
transitions on input R. Notice that the
column for R has only states B and D.
Since we've not yet distinguished B from D
there's no way input R can help
distinguish other pairs of states at this
point. However we have more luck with
input B. Some states go to final states on
input B, namely C, D, E, and G. And
others, A, B, and F, go to non-final
states, thus we can distinguish any of C,
D, E, or G from any of A, B, or F. Some of
these pairs are already distinguished, but
we get seven new pairs marked in red here.
At the next step we discovered two more
distinguishable pairs, CD and CE. For
example C and D lead to F and G
respectively, on input B. So we know that
whatever string W distinguishes F from G,
B followed by W will distinguish C from D.
Now we can mark the pair A, B. These
states transition on input R to B and D
respectively. And we already know that we
can distinguish B from D. Unfortunately, D
and E can never be marked because on both
inputs they go to the same state. Okay, we
are now done. We have distinguished every
pair of states except for D and E.
Moreover, we can never distinguish these
states since they both go to D on input R
and to G on input B. Thus, D and E form an
indistinguishable group of states, and we
can replace them by a representative. We
choose a new state H as the
representative, and all transitions from
other states to D or E are replaced by
transitions to H. The rows for D and E are
replaced by a row for H. As D and E
transition to D on input R, H transitions
to itself on R. The transition on input B
for H is to G, which is the same as it was
for both D and E. As we mentioned,
collapsing indistinguishable states to a
single state goes a long way to finding
the minimum state equivalent to a given
DFA. But there's one other issue that
indistinguishability doesn't address: the
possible existence of unreachable states
that are cluttering up the transition
diagram or table. But it is easy to find
such states and we can either eliminate
them from the original DFA or we can
eliminate them after merging
indistinguishable states. It doesn't
matter. Now we've done our best to combine
states that we know how to combine. But it
is in principle possible that there is
some other smaller DFA that we can't get
by combining states of our DFA. And
fortunately that can't happen as we shall
now see. Here's the proof that there is
nothing smaller than the DFA we get by
merging states and eliminating unreachable
states. Suppose A is our DFA and B is a
hypothetical equivalent with fewer states.
Imagine we combine the states of A and B
to form a larger DFA. It doesn't matter
what the start state of the combined
automaton is, but the final states are
those of A and B. We need to use
distinguishability in its contrapositive
form; that is, if W distinguishes delta of
Q and A from delta of P and A, then surely AW
distinguishes Q from P. So if Q and P are
indistinguishable, then so are their
successors on any input A. Here is an
informal illustration of the proof
technique. We start off with the fact that
the start states of automata A and B are
surely indistinguishable, because the
automata accept the same language. Now
suppose the start states go to some
states P and Q on input A. P and Q must
be indistinguishable, because if they were
distinguishable then we could distinguish
the start states, and we know we cannot do
that. Now suppose Q and P go on input B to
other states R and S. Then R and S must be
indistinguishable for the same reason.
Formally, we shall prove that every state Q
of A is indistinguishable from some state of
B. The proof is an induction on
the length of the shortest string that
gets you to Q from the start state. Notice
that because we eliminated unreachable
states, we know there is such a shortest
string. For the basis, consider the state of A that
is reachable from the start state by a
string of length zero, which of course is
the start state itself. We know that this
state of A is indistinguishable from the
start state of B because the languages are
the same. For the induction, assume the
inductive hypothesis for strings shorter
than W, and suppose W is the shortest string
getting A to state Q. Let W equal XA;
that is A is the last symbol of W and X is
all the rest of W. We can apply the
inductive hypothesis to X because it is
shorter than W. We know X gets A to some
state R that is indistinguishable from
some state P of B. But then A takes state
R on input A to state Q and we know B
takes state P on input A to some state,
say S. Then Q must be indistinguishable
from S using the argument that we saw two
slides ago. Okay. Now we use the
transitivity of indistinguishability to argue
that no two states of A are
indistinguishable from the same state of
B, for if they were, they would be
indistinguishable from each other. But A
cannot have indistinguishable states,
because we merged them all when constructing A.
Thus, B has at least as many states as A,
even though we started off assuming that
there was no relationship between the
automata A and B except that they each
accepted the same language. So, that
concludes the entire argument, which says
that by throwing away unreachable states
and then merging indistinguishable states,
you get an automaton that is as small as possible, that
is, one that has as few states as any other
automaton for the same language.
