Académique Documents
Professionnel Documents
Culture Documents
1
EXAMPLE 1
As a rst exer
ise, we shall write the previous example
as a lex s
ript:
%%
a*b printf("Token 1 found\n");
+ printf("Token 2 found\n");
%%
main()
{ yylex();
}
2
Note the following:
The
ore of the lex s
ript lies between the two %%s.
This
onsists of a series of pattern a
tion pairs. a*b
is an example of a pattern. An a
tion is any valid
C statement.
Lex takes a lex s
ript and produ
es a le
alled
lex.yy.
. Among other things, this
ontains the
denition of a fun
tion yylex(), whi
h is the lexi
al
analyser.
The part after the se
ond %%
ontains the fun
tion
main
alling yylex.
Linking to the lex library is required be
ause lex.yy.
uses
ertain predened fun
tions like yywrap, yyless,
and yymore.
An invo
ation of yylex goes through the mat
h pat-
tern ! perform a
tion
y
le repeatedly till the en-
tire input string is over. Thus all tokens in the input
string are returned in a single invo
ation of yylex.
As we have seen, it is sometimes ne
essary to return
a single token for every invo
ation of yylex. To do
this:
%%
a*b {printf("Token 1 found\n"); return;}
+ {printf("Token 2 found\n"); return;}
%%
main()
3
{ yylex();yylex
}
4
STRUCTURE OF A LEX SPECIFICATION
A lex program has three parts:
. . . denition se
tion. . .
%%
. . . rules se
tion. . .
%%
. . . user dened fun
tions. . .
The denition se
tion may
ontain, among other
things, literal blo
ks and pattern names. Literal
blo
ks provide a way for initialising variables used
in the a
tions. While pattern names are shorthand
for
omplex patterns.
As stated earlier, a rule is a pattern { a
tion pair.
We shall examine the set of allowed patterns later.
We have also seen the fun
tion main() being dened
in the user dened se
tion. We shall see examples
of other fun
tions whi
h
an go in at this pla
e.
Why
annot these fun
tion be pla
ed in the literal
blo
k?
5
Structure of lex.yy.c
Global Definitions:
The C definitions are inserted
verbatim over here.
Other lex defined functions and data
structures also inserted here.
yylex();
main ()
yywrap()
6
EXAMPLE 2
We want to
ount the number of
hara
ters, words and
lines in a pie
e of text. The resulting lexi
al analyser
should be able to read its input from a le whose name
is to be supplied along with the
ommand to invoke the
lexi
al analyser.
%{
/* Initialisation of variables to be used in a
tions */
unsigned
harCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n℄+
eol \n
%%
%%
main(arg
, argv)
int arg
;
har ** argv;
{ if (arg
>1) {
FILE* file;
file = fopen(argv[1℄, "r");
if (!file){
fprintf(stderr, "
ould not open %s\n", argv[1℄);
7
exit(1);
}
yyin = file;
}
yylex();
printf("%d %d %d\n",
harCount, wordCount, lineCount);
return 0;
}
EXAMPLE 3
We want to extend the predvious example so that the
ounting
an be done over multiple les.
%{
unsigned
harCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n℄+
eol \n
%%
%%
int
urrentfile = 1;
har **filelist;
main(arg , argv)
9
int arg
;
har ** argv;
{filelist = argv;
if (arg
==1)
yylex();
else
{FILE * file;
file = fopen(filelist[
urrentfile℄,"r");
yyin = file;
yylex();
};
printf("%d %d %d \n",
harCount, wordCount, lineCount);
}
int yywrap()
{
FILE* file;
/* f
lose(yyin);
*/ if (filelist[++
urrentfile℄ != (
har*) 0)
{
file = fopen(filelist[
urrentfile℄, "r");
yyin = file;
return 0;
}
else return 1;
}
This example illustrates a non-trivial use of a user
dened fun
tion.
The fun
tion yywrap has a defualt denition in the
lex library. We want override this denition for our
10
purpose.
When yylex hits the end of the
urrent le, it
alls
yywrap. If yywrap returns 1, yylex exits. Otherwise
it assumes that there is more input to be read from
the le asso
iated with yyin.
The key idea is to redene yywrap so that when it
is
alled, yyin is bound to the next le supplied in
the
ommand line and 0 is returned. This is done
until there are no more les to be read, when a 1 is
returned.
11
PATTERNS
We have seen the denition se
tion and the se
tion
on-
taining the user dened fun
tions in a lex s
ript. We
shall now examine the rules se
tion. As stated earlier,
the rules se
tion
onsist of pattern { a
tion pairs. It is
not hard to see that the power of the generated lexi
al
analyser depends on the des
riptive power of patterns.
1. Pattern { x, x is a single
hara
ter not in the lex
meta-
hara
ter set.
Meaning { The
hara
ter x itself.
Example { a
2. Pattern { \x, x is a single
hara
ter in the lex meta-
hara
ter set.
Meaning { The
hara
ter x itself.
Example { \.
3. Pattern { "s", s is a string of
hara
ters.
Meaning { The string s literally.
Example { "[℄", the string [℄
4. Pattern { [s℄, s is a string of
hara
ters..
Meaning { Any single
hara
ter o
urring in s.
Example { [ab
℄, the
hara
ter a, b or
.
5. Pattern { [x-y℄, x and y are
hara
ters.
Meaning { A shorthand for a
hara
ter
lass. De-
notes any
hara
ter in the range x to y.
Example { [a-
℄, the
hara
ter a, b or
.
6. Pattern { [^s℄, s is a string of
hara
ters.
Meaning { Any
hara
ters ex
ept those in string s.
12
Example { [^ \t\n℄, All
hara
ters ex
ept for a
blank, a tab or a newline.
7. Pattern { .
Meaning { Any
hara
ter ex
ept for a newline
Example { See Example 3.
8. Pattern { p{m,n}, p is a pattern, m and nare integers
Meaning { m through n o
urren
es of the pattern
p.
Example { a{1,3}, The strings a, aa and aaa
9. Pattern { (p), p is a pattern
Meaning { The same as p. Parenthesis is used to
larify asso
iation
Example { (ab
)+, One or more o
urren
es of ab
10. Pattern { p1|p2, p1 and p2 are patterns
Meaning { The strings represented by p1 or p2
Example { (ab
)+|de, One or more o
urren
es of
ab
or the string de
11. Pattern { p1p2, p1 and p2 are patterns
Meaning { The strings represented by p1
on
ate-
nated with those represented by p2
Example { (ab
)+de, One or more o
urren
es of
ab
followed by de
12. Pattern { p?, p is a pattern
Meaning { Zero or one o
urren
es of the strings
represented by p
Example { (ab
)*, Zero or one p o
urren
es of ab
.
13
13. Pattern { p*, p is a pattern
Meaning { Zero or more o
urren
es of the strings
represented by p
Example { (ab
)*, Zero or more o
urren
es of ab
.
14. Pattern { p+, p is a pattern
Meaning { One or more o
urren
es of the strings
represented by p
Example { (ab
)*, One or more o
urren
es of ab
.
15. Pattern { {n}, n is a name dened in the denition
se
tion
Meaning { The strings dened by the pattern
or-
responding to n
Example { See Example 2.
14
EXAMPLES OF PATTERN USAGE
1. We want to represent real numbers through patterns.
Here are some examples of reals: .36, 45., 3.414, .21e-6.
The lex s
ript dening reals is:
DIGIT [0-9℄
%%
({DIGIT}+\.{DIGIT}*|{DIGIT}*\.{DIGIT}+)(e-?{DIGIT}+)?
%%
/*///***/
/*///*a*b//**/
/**/
/*/*
16
3. As a third example we shall
onsider the denition
of strings. A string is a sequen
e of
hara
ters en
losed
within " and ". A " within a string is represented as ".
%%
\"[^"℄*(\"\"[^"℄*)*\" {ECHO;printf("\n");}
%%
main ()
{
yylex();
}
The input
""
""""
"jghg""nbv"
""""""
"fgb"""
gives as output
""
""""
"jghg""nbv"
""""""
"fgb"""
17
MORE PATTERNS
Certain patterns are
ontext dependent in the sense
that are a
tive only if
ertain
onditions ar fullled.
1. Pattern { ^p, p is a pattern
Meaning { Any string represented by p, provided it
o
urs at the beginning of a line.
Example { Consider the lex spe
i
ation:
%%
^#define {printf("<");ECHO;printf(">\n");}
%%
18
2. Pattern { p$, p is a pattern
Meaning { Any string represented by p, provided it
o
urs at the end of a line
Example { end$, the string end provided it o
urs
at the end of a line.
3. Pattern { p1/p2, p1 and p2 are patterns
Meaning { Any string represented by p1, provided
it is followed by a string represented by p2
Example { Re
all that FORTRAN ignores all spa
es
ex
ept those in
omments and Hollerith strings.
Therefore it is usual in a Fortran
ompiler to re-
move all spa
es in a prepro
essor pass, and then
start lexi
al analysis. Then a DO statement will
appear as:
DO50k=1,20,1
19
Here is a lex s
ript that distinguishes between the
two:
letter [a-zA-Z℄
digit [0-9℄
num {digit}+
id {letter}({letter}|{digit})*
%%
DO/{num}{id}=({num}|{id}), {printf("<");
ECHO;printf(">\n");}
{id} {printf("<");
ECHO;printf(">\n");}
%%
main ()
{
yylex();
}
20
START STATES
Sometimes the
ontext
ondition
annot be
aptured
by the patterns given above. For example, the interpre-
tation of a token might depend on the tokens already
seen. In su
h a
ase one needs states
As an example, in C, the string <stdio.h> should be
interpreted as a le name, if it is pre
eded #in
lude.
This is done by introdu
ing a state, say INCLSTATE.
We then want to say that when the lexi
al analyser is in
the state INCLSTATE, interpret the string within the
angular bra
kets as a lename.
%s INCLSTATE
%%
^#in
lude {BEGIN INCLSTATE;}
<INCLSTATE>"<"[^>\n℄+">" {..pro
ess filename..}
<INCLSTATE>\n {BEGIN INITIAL;}
%%
21
%s INCLSTATE
%%
\<.*\> {printf("some token found\n");}
^#in
lude {BEGIN INCLSTATE;}
<INCLSTATE>"<"[^>\n℄+">" {printf("<"); ECHO;
printf(">\n");}
<INCLSTATE>\n {BEGIN INITIAL;}
%%
For the input:
#in
lude <stdio.h>
We get the output:
some token found
To re
tify the situation, we use ex
lusive states. When
the s
anner is in an ex
lusive state s, only the patterns
beginning with <s> are valid.
%x INCLSTATE
%%
\<.*\> {printf("some token found\n");}
^#in
lude {BEGIN INCLSTATE;}
<INCLSTATE>"<"[^>\n℄+">" {printf("<"); ECHO;
printf(">\n");}
<INCLSTATE>\n {BEGIN INITIAL;}
%%
So for the same input:
#in
lude <stdio.h>
We get the output:
<<stdio.h>>
22
PUTTING IT ALL TOGETHER
We now show a
omplete lexi
al analyser for a small
programming language:
%{
#define LT 1
#define LE 2
#define EQ 3
#define NE 4
#define GT 5
#define GE 6
#define IF 7
#define THEN 8
#define ELSE 9
#define ID 10
#define NUMBER 11
#define RELOP 12
#define LPAREN 13
#define RPAREN 14
%}
23
ws [ \t\n℄+
letter [A-Za-z℄
digit [0-9℄
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+\-℄?{digit}+)?
%%
%%
24
int hash(s)
/* given a string s returns the hash value */
har* s;
{
har* p;
unsigned h = 0, g;
for (p = s; *p != '\0'; p++) {
h = (h << 4) + (*p);
if (g = h&0xf0000000) {
h = h ^ (g >> 24);
h = h ^ g;
}
}
return h % SYMTABSIZE;
}
int install_id () {
int symtab_index;
symtab_index = hash(yytext);
str
py(symtab[symtab_index℄.idstring, yytext);
symtab[symtab_index℄.other_attributes = yyleng;
return(symtab_index);
}
%%
27