Dr.Aruna Malapati
BITS Pilani Asst Professor
Department of CSIS
Hyderabad Campus
Lexer / Scanner
Today’s Agenda
• Lexical Analysis
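[Figure: interaction between the syntax analyzer and the lexer/scanner on the input "int a;". The syntax analyzer asks "Go get me the next token". The scanner reads the input one character at a time (i, then i n, then i n t), reports "I found a token called keyword", and hands the keyword to the syntax analyzer. On the following "Next please" requests the scanner continues with the remaining input "a ;".]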
[Figure: Source code -> Scanner -> Tokens. How to write the specification: regular expressions. Pattern recognition tools: DFA & NFA.]
• Example of Tokens
Operators: = + - > ( { := == <>
Keywords: if while for int double
Numeric literals: 43 4.565 -3.6e10 0x13F3A
Character literals: 'a' '~' '\''
String literals: "4.565" "Fall 10" "\"\" = empty"
• Example of non-tokens
White space: space (' '), tab ('\t'), end-of-line ('\n')
Comments: /* this is not a token */
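A minimal sketch of how a scanner might discard the non-tokens listed above before looking for the next token. The function name skip_non_tokens is hypothetical, and treating an unterminated comment as "consume the rest of the input" is an assumption made only to keep the example short.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first character of p
   that is not part of white space or a C-style block comment. */
const char *skip_non_tokens(const char *p) {
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;                                    /* space, tab, end-of-line */
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");  /* find the comment close */
            if (end == NULL)
                return p + strlen(p);               /* assumption: unterminated comment eats the rest */
            p = end + 2;                            /* continue after the comment */
        } else {
            return p;                               /* first character of a real token (or end) */
        }
    }
}
```

Calling skip_non_tokens on a string that starts with white space and a comment returns a pointer to the first character after them.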
Introducing Basic Terminology
• What are the major terms in lexical analysis?
– TOKEN
• Set of strings defining an atomic element with a defined meaning
• Examples include <Identifier>, <number>, etc.
– PATTERN
• A rule describing a set of strings
• Recall File and OS Wildcards ([A-Z]*.*)
– LEXEME
• A sequence of characters that matches some pattern
• Identifiers: x, count, name, etc…
Source fragment (only the end of the example survives):
    else
        return b;
    }

Lexeme    Token
a         identifier
,         operator
int       keyword
b         identifier
)         operator
{         operator
if        keyword
..        ..
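The lexeme/token pairing above can be sketched as a small classifier. This is only an illustration: the names TokenKind and classify_lexeme, the keyword list, and the operator set are assumptions chosen to cover the lexemes shown in these slides, not a complete language definition.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_OPERATOR, TOK_UNKNOWN } TokenKind;

/* Hypothetical classifier: maps one lexeme to the token class it belongs to. */
TokenKind classify_lexeme(const char *lexeme) {
    /* Assumed keyword list, taken from the examples in the slides. */
    static const char *keywords[] = { "if", "else", "while", "for", "int", "double", "return" };
    size_t n = sizeof keywords / sizeof keywords[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return TOK_KEYWORD;

    /* Identifier: a letter or underscore followed by letters, digits, underscores. */
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') {
        for (const char *p = lexeme + 1; *p != '\0'; p++)
            if (!isalnum((unsigned char)*p) && *p != '_')
                return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }

    /* Single-character operators and punctuation, as in the table. */
    if (lexeme[0] != '\0' && lexeme[1] == '\0' && strchr(",(){};=+-<>", lexeme[0]) != NULL)
        return TOK_OPERATOR;

    return TOK_UNKNOWN;
}
```

Keywords are checked before identifiers because every keyword in this list also matches the identifier pattern.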
Example - 2
Lexical analysis
- Transform multi-character input stream to token stream
- Reduce length of program representation (remove spaces)
Token stream produced so far: for_key ID("var1")
Practical Issues:
• Translating RE into executable form
• Input buffering
• Interface to parser
• Tools
[Figure: the character stream (b e g i n, white space, f o r) enters the lexical analyzer, which emits tokens.]
What if both need to happen at the same time?
• Input buffering
– Read in characters one by one
– lb: lexeme beginning
– fp: forward pointer (keeps track of the portion of the input string scanned)
[Figure: the input "f o r v a r 1 = 1 0 v a r 1 < =" with lb and fp both at the first character; in the following frames fp advances across the input while lb stays at the start of the current lexeme.]
[Figure: the same input held first in a single buffer, then split across two buffers: Buffer1 holds "f o r v a r 1 =" and Buffer2 holds "1 0 v a r 1 < =".]
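The two-pointer idea can be sketched over a single in-memory buffer as follows. The Scanner struct and next_lexeme function are hypothetical names, and the lexeme rules (runs of letters, digits, and underscores; single characters; the two-character operator <=) are simplifying assumptions chosen to fit the example input.

```c
#include <ctype.h>
#include <stddef.h>

/* lb marks where the current lexeme begins; fp is the forward pointer. */
typedef struct {
    const char *lb;
    const char *fp;
} Scanner;

/* Advances fp over the next lexeme and returns its length (0 at end of input).
   After the call, the lexeme is the range [s->lb, s->fp). */
size_t next_lexeme(Scanner *s) {
    while (isspace((unsigned char)*s->fp))
        s->fp++;                      /* white space is not part of any lexeme */
    s->lb = s->fp;                    /* the lexeme begins here */
    if (*s->fp == '\0')
        return 0;

    if (isalnum((unsigned char)*s->fp) || *s->fp == '_') {
        /* Keywords, identifiers, and numbers: scan while the run continues. */
        while (isalnum((unsigned char)*s->fp) || *s->fp == '_')
            s->fp++;
    } else {
        s->fp++;                      /* single-character lexeme */
        if (s->lb[0] == '<' && *s->fp == '=')
            s->fp++;                  /* assumed two-character operator <= */
    }
    return (size_t)(s->fp - s->lb);
}
```

With both pointers set to the start of "for var1 <= 10", successive calls yield lexemes of length 3, 4, 2, and 2, then 0 at the end of the input.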
switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
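The pseudocode above can be turned into a small runnable sketch. Several things here are assumptions made for illustration: the input comes from an in-memory string rather than a file, the NUL character stands in for the eof sentinel (so the input itself must not contain NUL), each half holds only 8 characters to force reloads, and the names TwoBuffer, tb_init, and tb_next are hypothetical. Retracting the forward pointer and tracking the lexeme beginning are left out.

```c
#include <stddef.h>
#include <string.h>

#define HALF 8            /* characters per buffer half; small on purpose */
#define SENTINEL '\0'     /* assumption: NUL plays the role of eof */

typedef struct {
    const char *src;              /* unread input (hypothetical in-memory source) */
    char buf[2 * (HALF + 1)];     /* two halves, each followed by a sentinel slot */
    char *forward;                /* forward pointer */
} TwoBuffer;

/* Copy up to HALF characters into the chosen half and place a sentinel after them. */
static void reload(TwoBuffer *b, int half) {
    char *start = b->buf + half * (HALF + 1);
    size_t n = strlen(b->src);
    if (n > HALF)
        n = HALF;
    memcpy(start, b->src, n);
    start[n] = SENTINEL;          /* at the end slot if the half is full, earlier otherwise */
    b->src += n;
}

void tb_init(TwoBuffer *b, const char *src) {
    b->src = src;
    reload(b, 0);
    b->forward = b->buf;
}

/* Returns the next input character, or -1 once the input is exhausted. */
int tb_next(TwoBuffer *b) {
    char *end1 = b->buf + HALF;           /* sentinel slot closing the first half */
    char *end2 = b->buf + 2 * HALF + 1;   /* sentinel slot closing the second half */
    for (;;) {
        char *p = b->forward;
        char c = *b->forward++;
        if (c != SENTINEL)
            return (unsigned char)c;      /* the common case: one test per character */
        if (p == end1) {                  /* end of first buffer */
            reload(b, 1);
            b->forward = b->buf + HALF + 1;
        } else if (p == end2) {           /* end of second buffer */
            reload(b, 0);
            b->forward = b->buf;
        } else {                          /* sentinel inside a buffer: end of input */
            b->forward = p;               /* stay on the sentinel for later calls */
            return -1;
        }
    }
}
```

The point of the sentinel is that the ordinary path needs only one comparison per character; the buffer-end checks run only when the sentinel is actually seen.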
• Example:
• letter_ (letter_ | digit)*
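As a sketch of translating this regular expression into executable form, the function below returns the length of the longest prefix of its input that matches the pattern. The name match_identifier is hypothetical, and letter_ is assumed to mean an ASCII letter or underscore as classified by isalpha in the default C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching letter_ (letter_ | digit)*,
   or 0 if s does not start with a letter or underscore. */
size_t match_identifier(const char *s) {
    size_t i;

    /* Start state: the first character must be letter_. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;

    /* Accepting state: loop on letter_ or digit. */
    i = 1;
    while (isalnum((unsigned char)s[i]) || s[i] == '_')
        i++;
    return i;
}
```

The two comments correspond to the two states of the DFA for this pattern: a start state and a single accepting state with a self-loop.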
– The lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input.
– Delete successive characters from the remaining input until the analyzer can find a well-formed token.
– May confuse the parser
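The recovery strategy described above can be sketched as follows. The token set is an assumption (identifiers, numbers, and a handful of single-character operators), and the names can_start_token and panic_skip are hypothetical; a real scanner would test against its own patterns.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed token set: anything starting with a letter, digit, underscore,
   or one of a few operator characters can begin a token. */
static int can_start_token(char c) {
    return isalnum((unsigned char)c) || c == '_' ||
           (c != '\0' && strchr("=+-<>(){};,", c) != NULL);
}

/* Returns how many leading characters of s must be deleted before the
   scanner can resume: it stops at white space, at a character that can
   start a token, or at the end of the input. */
size_t panic_skip(const char *s) {
    size_t deleted = 0;
    while (s[deleted] != '\0' &&
           !isspace((unsigned char)s[deleted]) &&
           !can_start_token(s[deleted]))
        deleted++;
    return deleted;
}
```

Because the deleted characters never reach the parser, the token stream it sees may no longer correspond to what the programmer wrote, which is why this recovery may confuse the parser.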
[Figure: lex.yy.c -> C compiler -> a.out]
declarations
%%
translation rules (each of the form: Pattern { Action })
%%
auxiliary functions