
Introduction to Compilers


Writing Cross Compilers


[Figure: cross-compilation in two steps]
Step 1: Mac C compiler source code in Unix C -> Unix C compiler -> Mac C compiler usable on Unix
Step 2: Mac C compiler source code in Unix C -> Mac C compiler usable on Unix -> Mac C compiler usable on Mac

Writing Retargetable Compilers


Two methods:
Make a strict distinction between front end and back end, then use different back ends.
Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code.

Bootstrapping
Process of writing a compiler (or assembler) in the target programming language which it is intended to compile.
Applying this technique leads to a self-hosting compiler.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.

Formal Languages
Already studied

Roles of Scanner
Removal of comments
Case conversion
Removal of white spaces
  Blanks, tabs, carriage returns and line feeds
Interpretation of compiler directives
  #include, #ifdef, #ifndef and #define are directives to redirect the input of the compiler
  May be done by a precompiler

Token: An element of the lexical definition of the language.
Lexeme: A sequence of characters identified as a token.
Pattern: A set of strings is described by a rule called a pattern, associated with a token.

Regular Languages and Regular Expression


Studied in Theory of Computation

Possible Implementations
Lexical Analyzer Generator (e.g. Lex)
+ safe, quick
- Must learn software, unable to handle unusual situations

Table-Driven Lexical Analyzer
+ general and adaptable method, same function can be used for all table-driven lexical analyzers
- Building transition table can be tedious and error-prone

Possible Implementations
Hand-written
+ Can be optimized, can handle any unusual situation, easy to build for most languages
- Error-prone, not adaptable or maintainable

Design of a Lexical Analyzer


Steps
1- Construct a set of regular expressions (REs) that define the form of all valid tokens
2- Derive an NDFA from the REs
3- Derive a DFA from the NDFA
4- Translate to a state transition table
5- Implement the table
6- Implement the algorithm to interpret the table
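
Steps 4 to 6 can be illustrated with a minimal sketch for a single token, id -> letter(letter|digit)*. The state names, the character classes and the function match_id() are assumptions made for this illustration, not part of the original slides; a real scanner would have one table covering all tokens.

```c
#include <ctype.h>
#include <stddef.h>

/* Character classes: the columns of the transition table. */
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };

/* States: the rows of the table. S_ERR is a dead state. */
enum { S_START, S_IN_ID, S_ERR, N_STATES };

/* Steps 4 and 5: the DFA for id -> letter(letter|digit)* as a table. */
static const int next_state[N_STATES][N_CLASSES] = {
    /*              letter   digit    other */
    /* S_START */ { S_IN_ID, S_ERR,   S_ERR },
    /* S_IN_ID */ { S_IN_ID, S_IN_ID, S_ERR },
    /* S_ERR   */ { S_ERR,   S_ERR,   S_ERR },
};

static const int accepting[N_STATES] = { 0, 1, 0 };

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Step 6: a generic driver that interprets the table.
   Returns the length of the longest prefix of s accepted by the DFA,
   or 0 if no prefix is accepted. */
size_t match_id(const char *s)
{
    int state = S_START;
    size_t i, last_accept = 0;

    for (i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state == S_ERR)
            break;
        if (accepting[state])
            last_accept = i + 1;
    }
    return last_accept;
}
```

Only the table changes when the token definitions change; the driver loop stays the same, which is the adaptability advantage listed for table-driven analyzers.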

Specification of tokens
Regular expressions are an important notation for specifying patterns.
Rules to define regular expressions

Limitations of regular expressions


Cannot describe balanced or nested constructs.
Repeating strings cannot be described
E.g. {wcw | w is a string of a's and b's}
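
To see why balanced constructs fall outside regular expressions, consider what a recognizer for them needs. The sketch below (the function name balanced() is an assumption for this illustration) keeps a nesting counter that can grow without bound, whereas a finite automaton has only a fixed number of states and so cannot track arbitrarily deep nesting.

```c
/* Returns 1 if the parentheses in s are balanced, 0 otherwise.
   The counter `depth` is the extra memory a finite automaton lacks. */
int balanced(const char *s)
{
    long depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0)
                return 0;       /* closing parenthesis with no opener */
            depth--;
        }
    }
    return depth == 0;
}
```

This is why such checks are normally left to the syntactic analyzer rather than the scanner.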

Regular Expressions
∅ : {}
s : {s | s in s^}
a : {a}
r | s : {r | r in r^} or {s | s in s^}
s* : {s^n | s in s^ and n>=0}
s+ : {s^n | s in s^ and n>=1}
id -> letter(letter|digit)*
num -> digit+(.digit+)?(E(+|-)?digit+)?
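
The id and num definitions above can be tried out with the POSIX regular expression functions from <regex.h>. This is a sketch under assumptions: the helper full_match() and the two pattern strings are written for this illustration, letter is taken to mean [A-Za-z] and digit [0-9].

```c
#include <regex.h>
#include <stddef.h>

/* id  -> letter(letter|digit)*                     */
const char ID_PATTERN[]  = "^[A-Za-z][A-Za-z0-9]*$";
/* num -> digit+(.digit+)?(E(+|-)?digit+)?          */
const char NUM_PATTERN[] = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";

/* Returns 1 if the whole of text matches the POSIX extended regular
   expression pattern, 0 if it does not, -1 if the pattern is invalid. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, full_match(NUM_PATTERN, "3.14E-2") accepts, while "3." is rejected because the fractional part requires at least one digit.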

Recognition of tokens
Transition diagrams:
As an intermediate step in the construction of a lexical analyzer, we produce a stylized flowchart, called a transition diagram.

[Figure: Transition diagram for identifiers and keywords. start -> state 9; 9 -letter-> 10; 10 -letter or digit-> 10; 10 -other-> 11; state 11: return(gettoken(), install_id())]

Implementing a transition diagram
A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams. Program size is proportional to the number of states and edges in the diagrams.

[Figure: Transition diagram for numbers. start -> state 25; 25 -digit-> 26; 26 -digit-> 26; 26 -other-> 27]

C code for Lexical Analyzer is :


token nexttoken()
{
    while (1) {
        switch (state) {
        case 0: c = nextchar();
            /* c is lookahead character */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
                /* advance beginning of lexeme */
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        /* cases 1-8 here */
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11: retract(1); install_id();
            return ( gettoken() );
        ... /* cases 12-24 here */
        case 25: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27: retract(1); install_num();
            return ( NUM );
        }
    }
}

gettoken()
Looks for the lexeme in the symbol table. If the lexeme is a keyword, the corresponding token is returned; otherwise the token id is returned.

install_id()
Has access to the buffer where the identifier lexeme is located.
The symbol table is examined; if the lexeme is found marked as a keyword, it returns 0.
If the lexeme is found and is a program variable, it returns a pointer to the symbol table entry.
If not found in the symbol table, it is installed as a variable and a pointer to the newly created entry is returned.
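
The behaviour described for gettoken() and install_id() could be sketched as follows. Several things here are assumptions made for the illustration: the lexeme is passed as an argument instead of being read from the input buffer, the table is a fixed-size array searched linearly, and the token codes (ID, IF, THEN, ELSE) and the helper lookup() are hypothetical.

```c
#include <string.h>

enum { ID = 256, IF, THEN, ELSE };   /* hypothetical token codes */

#define MAX_SYMS 64
#define MAX_LEX  32

struct entry {
    char lexeme[MAX_LEX];
    int  token;                      /* ID for variables, else a keyword code */
};

/* Keywords are preloaded so that a lookup finds them marked as keywords. */
static struct entry symtable[MAX_SYMS] = {
    { "if", IF }, { "then", THEN }, { "else", ELSE },
};
static int n_syms = 3;

static struct entry *lookup(const char *lexeme)
{
    int i;
    for (i = 0; i < n_syms; i++)
        if (strcmp(symtable[i].lexeme, lexeme) == 0)
            return &symtable[i];
    return NULL;
}

/* Returns 0 (NULL) for a keyword, otherwise a pointer to the existing
   or newly created entry for the variable. */
struct entry *install_id(const char *lexeme)
{
    struct entry *e = lookup(lexeme);

    if (e != NULL)
        return e->token == ID ? e : NULL;
    if (n_syms == MAX_SYMS || strlen(lexeme) >= MAX_LEX)
        return NULL;                 /* sketch-level handling of overflow */
    e = &symtable[n_syms++];
    strcpy(e->lexeme, lexeme);
    e->token = ID;
    return e;
}

/* Returns the keyword token if the lexeme is a keyword, otherwise ID. */
int gettoken(const char *lexeme)
{
    struct entry *e = lookup(lexeme);
    return e != NULL ? e->token : ID;
}
```

Preloading the keywords keeps the transition diagram for identifiers free of keyword-specific states: every keyword is first scanned as an identifier and then classified by the table lookup.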

install_num()

Derive NDFA from REs


Could derive a DFA directly from the REs, but:
  Much easier to derive an NDFA first, then derive the DFA
  No standard way of deriving DFAs from REs
Use Thompson's construction (Louden's)

[Figure: NDFA with edges labeled letter, letter and digit]

Derive DFA from NDFA


Use subset construction (Louden's)
May be optimized
Easier to implement:
  No ε edges
  Deterministic (no backtracking)

[Figure: NDFA and resulting DFAs with edges labeled letter, digit and [other]]
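
The subset construction can be sketched in a few lines when sets of NDFA states are stored as bit masks. The NDFA below is a small hand-written one for letter(letter|digit)* with ε edges, chosen for this illustration; it is not necessarily the exact automaton Thompson's construction would produce, and all names (eps, nfa_move_tab, build_dfa, ...) are assumptions.

```c
#define N_NFA   5
#define N_SYM   2            /* 0 = letter, 1 = digit */
#define MAX_DFA 32           /* at most 2^5 subsets of NDFA states */

typedef unsigned set_t;      /* bit i set <=> NDFA state i is in the set */

/* Hand-written NDFA: 0 -letter-> 1, 1 -eps-> 2, 1 -eps-> 4,
   2 -letter-> 3, 2 -digit-> 3, 3 -eps-> 1. State 4 is accepting. */
static const set_t eps[N_NFA] = { 0, (1u << 2) | (1u << 4), 0, 1u << 1, 0 };
static const set_t nfa_move_tab[N_NFA][N_SYM] = {
    { 1u << 1, 0 }, { 0, 0 }, { 1u << 3, 1u << 3 }, { 0, 0 }, { 0, 0 },
};
#define NFA_ACCEPT (1u << 4)

/* Epsilon closure: keep adding eps successors until nothing changes. */
static set_t closure(set_t s)
{
    set_t prev;
    int i;
    do {
        prev = s;
        for (i = 0; i < N_NFA; i++)
            if (s & (1u << i)) s |= eps[i];
    } while (s != prev);
    return s;
}

/* States reachable from any state in s on symbol sym (no eps steps). */
static set_t nfa_move(set_t s, int sym)
{
    set_t out = 0;
    int i;
    for (i = 0; i < N_NFA; i++)
        if (s & (1u << i)) out |= nfa_move_tab[i][sym];
    return out;
}

set_t dfa_set[MAX_DFA];          /* NDFA subset represented by each DFA state */
int   dfa_next[MAX_DFA][N_SYM];  /* DFA transition table, -1 = no transition */
int   dfa_accept[MAX_DFA];
int   n_dfa;

static int find_or_add(set_t s)
{
    int i;
    for (i = 0; i < n_dfa; i++)
        if (dfa_set[i] == s) return i;
    dfa_set[n_dfa] = s;
    dfa_accept[n_dfa] = (s & NFA_ACCEPT) != 0;
    return n_dfa++;
}

void build_dfa(void)
{
    int d, sym;
    n_dfa = 0;
    find_or_add(closure(1u << 0));          /* DFA start state */
    for (d = 0; d < n_dfa; d++) {           /* n_dfa grows as subsets appear */
        for (sym = 0; sym < N_SYM; sym++) {
            set_t t = closure(nfa_move(dfa_set[d], sym));
            dfa_next[d][sym] = t ? find_or_add(t) : -1;
        }
    }
}
```

The resulting dfa_next table has no ε edges and exactly one successor per state and symbol, which is what makes the DFA easier to implement. The DFA built here is not minimized.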

Implementation Concerns
Backtracking
Principle: A token is normally recognized only when the next character is read.
Problem: Maybe this character is part of the next token.
Example: x<1. "<" is recognized only when "1" is read. In this case, we have to backtrack one character to continue token recognition.
Can include the occurrence of these cases in the state transition table.
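
The x<1 example can be traced with a small sketch that uses nextchar() and retract() in the style of the C code shown earlier. The function scan_less(), the token codes and the way the input position is stored are assumptions made for this illustration.

```c
enum { LT = 1, LE, NE };      /* hypothetical token codes for <, <=, <> */

static const char *input;     /* text being scanned */
static int forward;           /* index of the next unread character */

static int  nextchar(void)  { return (unsigned char)input[forward++]; }
static void retract(int n)  { forward -= n; }

/* Scans a relational operator beginning with '<' at index start of text.
   Returns its token code, or 0 if text[start] is not '<'.
   *end receives the index just past the recognized lexeme. */
int scan_less(const char *text, int start, int *end)
{
    int c, token;

    input = text;
    forward = start;
    if (nextchar() != '<') {
        retract(1);
        *end = forward;
        return 0;
    }
    c = nextchar();           /* lookahead decides between <, <= and <> */
    if (c == '=') {
        token = LE;
    } else if (c == '>') {
        token = NE;
    } else {
        retract(1);           /* c belongs to the next token: back up */
        token = LT;
    }
    *end = forward;
    return token;
}
```

For "x<1" starting at index 1, the "1" is read as lookahead, then given back by retract(1), so scanning of the next token resumes at the "1".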

Implementation Concerns
Ambiguity
Problem: Some tokens' lexemes are subsets of other tokens.
Example: n-1. Is it <n><-><1> or <n><-1>?
Solutions:
  Postpone the decision to the syntactic analyzer
  Do not allow sign prefix to numbers in the lexical specification
  Interact with the syntactic analyzer to find a solution. (Induces coupling)

Example
Alphabet:
{:, *, =, (, ), <, >, {, }, [a..z], [0..9]}

Simple tokens:
{(, ), {, }, :, <, >}

Composite tokens:
{:=, >=, <=, <>, (*, *)}

Words:
id ::= letter(letter | digit)*
num ::= digit*

Example
Ambiguity problems:

Character    Possible tokens
:            :, :=
>            >, >=
<            <, <=, <>
(            (, (*
*            *, *)

Backtracking:
Must back up a character when we read a character that is part of the next token.
Occurrences are coded in the table
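
The ambiguity table can be turned into a hand-written sketch that always takes the longest possible token, using one character of lookahead. The function token_length() is an assumption made for this illustration. It also assumes that a num needs at least one digit, although the definition num ::= digit* would allow an empty lexeme, and it reports a lone "=" as no token since "=" only appears inside composite tokens.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest token of the example language at
   the start of s, or 0 if no token starts there. Peeking at s[1]
   without consuming it plays the role of backing up one character. */
size_t token_length(const char *s)
{
    size_t n;

    switch (s[0]) {
    case ':': return s[1] == '=' ? 2 : 1;                    /* :  :=     */
    case '>': return s[1] == '=' ? 2 : 1;                    /* >  >=     */
    case '<': return (s[1] == '=' || s[1] == '>') ? 2 : 1;   /* <  <=  <> */
    case '(': return s[1] == '*' ? 2 : 1;                    /* (  (*     */
    case '*': return s[1] == ')' ? 2 : 1;                    /* *  *)     */
    case ')': case '{': case '}': return 1;
    default:  break;
    }
    if (islower((unsigned char)s[0])) {     /* id ::= letter(letter | digit)* */
        for (n = 1; islower((unsigned char)s[n]) ||
                    isdigit((unsigned char)s[n]); n++)
            ;
        return n;
    }
    if (isdigit((unsigned char)s[0])) {     /* num, assumed to be digit+ */
        for (n = 1; isdigit((unsigned char)s[n]); n++)
            ;
        return n;
    }
    return 0;
}
```

A caller would advance its input pointer by the returned length and call the function again for the next token.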
