Académique Documents
Professionnel Documents
Culture Documents
Matchers
Compiler Design Lexical Analysis
asist. dr. ing. Ciprian-Bogdan Chirila
chirila@cs.upt.ro
http://www.cs.upt.ro/~chirila
Outline
Important States of an NFA
Functions Computed from the Syntax Tree
Computing nullable, firstpos and lastpos
Computing followpos
Converting a Regular Expression Directly to
a DFA
Minimizing the Number of States of a DFA
State Minimization of a Lexical Analyzers
Trading Time for Space in DFA Simulation
Optimization of DFA-Based Pattern
Matchers
First algorithm
constructs a DFA directly from a regular expression
without constructing an intermediate NFA
with fewer states
used in Lex
Second algorithm
minimizes the number of states of any DFA
combines states having the same future behavior
has O(n*log(n)) efficiency
Third algorithm
produces more compact representations of
transitions tables then the standard two dimensional
ones
Important States of an NFA
it has non- out transitions
used when computing -closure(move(T,a))
the set of states reachable from T on input a
the set moves(s,a) is non-empty if state s is
important
cat nodes
are
represented
as circles
Representation Rules
syntax tree leaves are labeled by or by an
alphabet symbol
to each leaf which is not we attach a
unique integer
the position of the leaf
the position of its symbol
a symbol may have several positions
symbol a has positions 1 and 3 (on the next
slide!!!)
positions in the syntax tree correspond to
NFA important states
Thompson Constructed NFA for
(a|b)*abb#
State a b
A B A
B B D
D B E
E B A
State Minimization in
Lexical Analyzers
to group together
all states that recognize a particular token
all states that do not indicate any token
e.g. {0137,7} {247} {8,58} {7} {68} {}
{0137,7} do not indicate any token
{8,58} announce a*b+
{} - dead state
has transitions to itself on input a and b
is target state for states 8, 58, 68 on input a
State Minimization in
Lexical Analyzers
next, we split
0137 from 7
they go to different groups on input a
8 from 58
they go to different groups on input b
dead states can be dropped
if we treat missing transitions as signal to end
token recognition
Trading Time for Space in DFA
Simulation
transition function of a DFA
two dimensional table indexed by states and
characters
typical lexical analyzer has
hundreds of states
ASCII alphabet of 128 input characters
< 1 MB
compilers live in small devices too
1 MB could be too much
Alternate Representations
list of character-state pairs
ending by a default state
chosen for any input character not on the list
the most frequently occurring next state
thus, the table is reduced by a large factor
Bibliography
Alfred V. Aho, Monica S. Lam, Ravi Sethi,
Jeffrey D. Ullman Compilers, Principles,
Techniques and Tools, Second Edition,
2007