Lect Slides

Introduction to Compilers
Writing Cross Compilers

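[Figure: cross-compilation in two steps.
Step 1: the Mac C compiler source code, written in Unix C, is compiled by the Unix C compiler, giving a Mac C compiler usable on Unix.
Step 2: the same Mac C compiler source code is compiled by the Mac C compiler usable on Unix, giving a Mac C compiler usable on Mac.]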
Writing Retargetable Compilers

Two methods:
Make a strict distinction between front end and back end, then use different back ends.
Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code.
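The second method can be illustrated with a very small interpreter for virtual machine code. This is only a sketch: the stack machine, its four opcodes and the name vm_run are made up for illustration and do not come from any real virtual machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set of an illustrative stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interprets `code` and returns the value on top of the stack at OP_HALT. */
int vm_run(const int *code)
{
    int stack[64];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    assert(code != NULL);
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = code[pc++];   /* the operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

A front end that emits this code would stay the same for every target; only vm_run (or a translator from these opcodes to native code) has to be rewritten per machine.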
Bootstrapping
The process of writing a compiler (or assembler) in the target programming language which it is intended to compile.
Applying this technique leads to a self-hosting compiler.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.
Formal Languages
Already studied
Roles of Scanner
Removal of comments
Case conversion
Removal of white spaces
Blanks, tabs, carriage returns and line feeds
Interpretation of compiler directives
#include, #ifdef, #ifndef and #define are directives to redirect the input of the compiler
May be done by a precompiler
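The first and third roles can be sketched as one helper that the scanner calls before it starts each token. The function name skip_insignificant is hypothetical, and C-style /* ... */ comments are an assumption made for the example; the slides do not fix a comment syntax here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the first character at or after p that is neither
   white space nor inside a C-style comment. An unterminated comment is
   skipped up to the end of the string. */
const char *skip_insignificant(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') {
            p++;                          /* blank, tab, CR or LF */
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                       /* step over the opening delimiter */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                   /* step over the closing delimiter */
        } else {
            return p;
        }
    }
}
```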
Token: An element of the lexical definition of the language.
Lexeme: A sequence of characters identified as a token.
Pattern: A set of strings is described by a rule called a pattern, associated with a token.
Regular Languages and Regular Expression

Studied in Theory of computation
Possible Implementations
Lexical Analyzer Generator (e.g. Lex)
+ safe, quick
- Must learn software, unable to handle unusual situations
Table-Driven Lexical Analyzer
+ general and adaptable method, same function can be used for all table-driven lexical analyzers
- Building the transition table can be tedious and error-prone
Hand-written
+ Can be optimized, can handle any unusual situation, easy to build for most languages
- Error-prone, not adaptable or maintainable
Design of a Lexical Analyzer

Steps
1- Construct a set of regular expressions (REs) that define the form of all valid tokens
2- Derive an NDFA from the REs
3- Derive a DFA from the NDFA
4- Translate to a state transition table
5- Implement the table
6- Implement the algorithm to interpret the table
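Steps 4 to 6 can be sketched for a single token, the identifier letter(letter|digit)*. The state names, the character classes and the function scan_id are assumptions chosen for this example; a full scanner would have one table covering all tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
enum { S_START, S_IN_ID, S_DONE, S_ERROR };

/* Step 5: the table. Rows are the two non-final states, columns are
   character classes. */
static const int next_state[2][N_CLASSES] = {
    /* S_START */ { S_IN_ID, S_ERROR, S_ERROR },
    /* S_IN_ID */ { S_IN_ID, S_IN_ID, S_DONE  },
};

static int char_class(int c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Step 6: the driver. Returns the length of the identifier at the start
   of s, or 0 if s does not start with an identifier. */
size_t scan_id(const char *s)
{
    int state = S_START;
    size_t i = 0;

    assert(s != NULL);
    while (state == S_START || state == S_IN_ID) {
        state = next_state[state][char_class((unsigned char)s[i])];
        if (state == S_IN_ID)
            i++;                /* the character belongs to the lexeme */
    }
    return state == S_DONE ? i : 0;
}
```

The driver loop does not mention letters or digits at all, which is why the same function can be reused with a different table.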
Specification of tokens
Regular expressions are an important notation for specifying patterns.
Rules to define regular expressions
Limitations of regular expressions

Cannot describe balanced or nested constructs.
Repeating strings cannot be described.
E.g. {wcw | w is a string of a's and b's}
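A recognizer for {wcw} has to compare the part after the c with the part before it, so it must remember all of w, whose length is unbounded; a finite automaton has no such memory. A minimal sketch in ordinary code, where the function name is_wcw is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if s has the form w c w, where w is a (possibly empty)
   string of a's and b's, and 0 otherwise. */
int is_wcw(const char *s)
{
    size_t n, half, i;

    assert(s != NULL);
    n = strlen(s);
    if (n % 2 == 0)
        return 0;               /* w c w always has odd length */
    half = n / 2;
    if (s[half] != 'c')
        return 0;
    for (i = 0; i < half; i++) {
        if (s[i] != 'a' && s[i] != 'b')
            return 0;
        if (s[i] != s[half + 1 + i])
            return 0;           /* the second copy must repeat the first */
    }
    return 1;
}
```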
Regular Expressions
∅ : {}
s : {s | s in s^}
a : {a}
r | s : {r | r in r^} or {s | s in s^}
s* : {s^n | s in s^ and n >= 0}
s+ : {s^n | s in s^ and n >= 1}
id -> letter (letter | digit)*
num -> digit+ (. digit+)? (E (+ | -)? digit+)?
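The num pattern above can be checked by hand-written code that follows the expression part by part. This is a sketch under the assumption that the longest matching prefix is wanted; match_num and digits are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the run of digits at the start of s (digit*). */
static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Returns the length of the longest prefix of s matching
   digit+ (. digit+)? (E (+|-)? digit+)?, or 0 if there is none. */
size_t match_num(const char *s)
{
    size_t len, k, j;

    assert(s != NULL);
    len = digits(s);                    /* digit+ */
    if (len == 0)
        return 0;
    if (s[len] == '.' && (k = digits(s + len + 1)) > 0)
        len += 1 + k;                   /* optional fraction */
    if (s[len] == 'E') {
        j = len + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;                        /* optional sign */
        k = digits(s + j);
        if (k > 0)
            len = j + k;                /* optional exponent */
    }
    return len;
}
```

An optional part is accepted only when it is complete, so for the input 1. the lexeme is just 1 and the dot is left for the next token.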
Recognition of tokens
Transition diagrams:
As an intermediate step in the construction of a lexical analyzer, we produce a stylized flowchart, called a transition diagram.
[Figure: Transition diagram for identifiers and keywords. From start state 9, a letter leads to state 10; state 10 loops on letter or digit; any other character leads to state 11, which performs return(gettoken(), install_id()).]
Implementing a transition diagram
A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams. Program size is proportional to the number of states and edges in the diagrams.
[Figure: Transition diagram for numbers. From start state 25, a digit leads to state 26; state 26 loops on digit; any other character leads to state 27.]
C code for Lexical Analyzer is :

token nexttoken()
{
    while (1) {
        switch (state) {
        case 0: c = nextchar();
            /* c is lookahead character */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
                /* advance beginning of lexeme */
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        ... /* cases 1-8 here */
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11: retract(1); install_id();
            return ( gettoken() );
        ... /* cases 12-24 here */
        case 25: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27: retract(1); install_num();
            return ( NUM );
        }
    }
}
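The fragment above leaves out its helpers and most of its cases. A cut-down version that compiles on its own, restricted to states 9-11 and 25-27, might look as follows. The token codes, lexer_init and the bodies of nextchar, retract and fail are assumptions made for this sketch, and keyword handling through install_id and gettoken is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_EOF, TOK_ID, TOK_NUM, TOK_ERROR };

static const char *input;                  /* text being scanned */
static size_t lexeme_beginning, forward;   /* buffer positions */
static int start;                          /* start state of the current diagram */

void lexer_init(const char *text)
{
    assert(text != NULL);
    input = text;
    lexeme_beginning = forward = 0;
}

static int nextchar(void) { return (unsigned char)input[forward++]; }
static void retract(size_t n) { forward -= n; }

/* Restart at the beginning of the lexeme with the next diagram. */
static int fail(void)
{
    forward = lexeme_beginning;
    start = (start == 9) ? 25 : -1;        /* -1: no diagram left */
    return start;
}

int nexttoken(void)
{
    int c, state;

    while (isspace((unsigned char)input[lexeme_beginning]))
        lexeme_beginning++;
    forward = lexeme_beginning;
    if (input[forward] == '\0')
        return TOK_EOF;
    state = start = 9;
    for (;;) {
        switch (state) {
        case 9:  c = nextchar(); state = isalpha(c) ? 10 : fail(); break;
        case 10: c = nextchar(); state = isalnum(c) ? 10 : 11;     break;
        case 11: retract(1); lexeme_beginning = forward; return TOK_ID;
        case 25: c = nextchar(); state = isdigit(c) ? 26 : fail(); break;
        case 26: c = nextchar(); state = isdigit(c) ? 26 : 27;     break;
        case 27: retract(1); lexeme_beginning = forward; return TOK_NUM;
        default: lexeme_beginning++;       /* skip the offending character */
                 return TOK_ERROR;
        }
    }
}
```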
gettoken()
Looks for the lexeme in the symbol table. If the lexeme is a keyword, the corresponding token is returned; otherwise the token id is returned.
install_id()
Has access to the buffer, where the identifier lexeme is located.
The symbol table is examined; if the lexeme is found and marked as a keyword, it returns 0.
If the lexeme is found and is a program variable, it returns a pointer to the symbol table entry.
If the lexeme is not found in the symbol table, it is installed as a variable and a pointer to the newly created entry is returned.
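The behaviour described for gettoken() and install_id() can be sketched with a small array as the symbol table. Several things here are assumptions for the example: the lexeme is passed as a parameter instead of being read from the buffer, the table is a fixed-size array searched linearly, and the keywords if, then and else are preloaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64
enum { TOK_ID = 256, TOK_IF, TOK_THEN, TOK_ELSE };

struct sym { char name[32]; int token; };

/* Keywords are entered in the table before scanning starts. */
static struct sym symtable[MAX_SYMS] = {
    { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE }
};
static int nsyms = 3;

/* Returns the index of lexeme in the table, or -1 if it is absent. */
static int lookup(const char *lexeme)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtable[i].name, lexeme) == 0)
            return i;
    return -1;
}

/* Returns 0 (NULL) for a keyword, otherwise a pointer to the entry for
   the variable, installing it first if it is not yet in the table. */
struct sym *install_id(const char *lexeme)
{
    int i;

    assert(lexeme != NULL && strlen(lexeme) < sizeof symtable[0].name);
    i = lookup(lexeme);
    if (i >= 0)
        return symtable[i].token == TOK_ID ? &symtable[i] : NULL;
    assert(nsyms < MAX_SYMS);
    strcpy(symtable[nsyms].name, lexeme);
    symtable[nsyms].token = TOK_ID;
    return &symtable[nsyms++];
}

/* Returns the keyword's token if lexeme is a keyword, otherwise TOK_ID. */
int gettoken(const char *lexeme)
{
    int i = lookup(lexeme);
    return i >= 0 ? symtable[i].token : TOK_ID;
}
```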
install_num()
Derive NDFA from REs

Could derive DFA from REs but:
Much easier to do NDFA, then derive DFA
No standard way of deriving DFAs from REs
Use Thompson's construction (Louden's)
[Figure: NDFA with edges labelled letter, letter and digit.]
Derive DFA from NDFA

Use subset construction (Louden's)
May be optimized
Easier to implement:
No ε edges
Deterministic (no backtracking)
[Figure: DFA with edges labelled letter, digit and [other].]
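The subset construction can be sketched with sets of NDFA states stored as bit masks. The five-state NDFA below is a hand-made automaton for letter(letter|digit)* chosen for this example, not the one from the figure, and all names are hypothetical.

```c
#include <assert.h>

enum { N_NFA = 5, N_SYM = 2, MAX_DFA = 32 };   /* 32 = 2^5 possible subsets */
enum { LETTER, DIGIT };

/* NDFA: 0 -letter-> 1, 1 -eps-> 2, 2 -letter/digit-> 3, 3 -eps-> 2,
   2 -eps-> 4 (accepting). Bit s of a mask stands for state s. */
static const unsigned eps[N_NFA] = { 0, 1u << 2, 1u << 4, 1u << 2, 0 };
static const unsigned delta[N_NFA][N_SYM] = {
    { 1u << 1, 0 }, { 0, 0 }, { 1u << 3, 1u << 3 }, { 0, 0 }, { 0, 0 }
};

/* All states reachable from `set` through eps edges only. */
static unsigned closure(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < N_NFA; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* All states reachable from `set` by one edge labelled a. */
static unsigned move_on(unsigned set, int a)
{
    unsigned out = 0;
    for (int s = 0; s < N_NFA; s++)
        if (set & (1u << s))
            out |= delta[s][a];
    return out;
}

unsigned dfa_set[MAX_DFA];       /* NDFA-state set behind each DFA state */
int dfa_next[MAX_DFA][N_SYM];    /* DFA transitions, -1 if none */

/* Fills dfa_set and dfa_next; returns the number of DFA states. */
int subset_construction(void)
{
    int n = 0;
    dfa_set[n++] = closure(1u << 0);
    for (int d = 0; d < n; d++) {
        for (int a = 0; a < N_SYM; a++) {
            unsigned t = closure(move_on(dfa_set[d], a));
            int j = 0;
            dfa_next[d][a] = -1;
            if (t == 0)
                continue;
            while (j < n && dfa_set[j] != t)
                j++;                     /* look for an existing DFA state */
            if (j == n) {
                assert(n < MAX_DFA);
                dfa_set[n++] = t;        /* new DFA state, processed later */
            }
            dfa_next[d][a] = j;
        }
    }
    return n;
}
```

The resulting table has no ε edges, and every (state, symbol) pair has at most one successor, which is what makes the DFA simpler to run.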
Implementation Concerns
Backtracking
Principle: A token is normally recognized only when the next character is read.
Problem: This character may be part of the next token.
Example: x<1. The token < is recognized only when 1 is read. In this case, we have to backtrack one character to continue token recognition.
Can include the occurrence of these cases in the state transition table.
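The x<1 case can be sketched as a small scanner for <, <= and <> that reads one character too many and then backs up. The token codes and the function scan_relop are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { T_LT, T_LE, T_NE, T_NONE };

/* Recognizes <, <= or <> at s[*pos] and advances *pos past the lexeme.
   Returns T_NONE and leaves *pos unchanged if s[*pos] is not '<'. */
int scan_relop(const char *s, size_t *pos)
{
    char c;

    assert(s != NULL && pos != NULL);
    if (s[*pos] != '<')
        return T_NONE;
    (*pos)++;
    c = s[(*pos)++];        /* read the character after '<' */
    if (c == '=')
        return T_LE;
    if (c == '>')
        return T_NE;
    (*pos)--;               /* backtrack: c belongs to the next token */
    return T_LT;
}
```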
Ambiguity
Problem: The lexemes of some tokens are subsets of other tokens.
Example: n-1. Is it <n><-><1> or <n><-1>?
Solutions:
Postpone the decision to the syntactic analyzer
Do not allow sign prefix to numbers in the lexical specification
Interact with the syntactic analyzer to find a solution. (Induces coupling)
Example
Alphabet: {:, *, =, (, ), <, >, {, }, [a..z], [0..9]}
Simple tokens: {(, ), {, }, :, <, >}
Composite tokens: {:=, >=, <=, <>, (*, *)}
Words:
id ::= letter(letter | digit)*
num ::= digit*
Ambiguity problems:
Character    Possible tokens
:            :, :=
>            >, >=
<            <, <=, <>
(            (, (*
*            *, *)
Backtracking:
Must back up a character when we read a character that is part of the next token.
Occurrences are coded in the table.
