
COMPILER DESIGN

G. Appasami, M.Sc., M.C.A., M.Phil., M.Tech., (Ph.D.)
Assistant Professor
Department of Computer Science and Engineering
Dr. Pauls Engineering College
Pauls Nagar, Villupuram
Tamilnadu, India

SARUMATHI PUBLICATIONS
Villupuram, Tamilnadu, India

First Edition: July 2015
Second Edition: April 2016

Published By
SARUMATHI PUBLICATIONS

All rights reserved. No part of this publication can be reproduced or stored in any form or
by means of photocopy, recording or otherwise without the prior written permission of the
author.

Price Rs. 101/-

Copies can be had from
SARUMATHI PUBLICATIONS
Villupuram, Tamilnadu, India.
Sarumathi.publications@gmail.com

Printed at
Meenam Offset
Pondicherry 605001, India

CS6660 COMPILER DESIGN

L T P C
3 0 0 3

UNIT I INTRODUCTION TO COMPILERS (5)
Translators - Compilation and Interpretation - Language processors - The Phases of Compiler -
Errors Encountered in Different Phases - The Grouping of Phases - Compiler Construction
Tools - Programming Language basics.

UNIT II LEXICAL ANALYSIS
Need and Role of Lexical Analyzer - Lexical Errors - Expressing Tokens by Regular
Expressions - Converting Regular Expression to DFA - Minimization of DFA - Language for
Specifying Lexical Analyzers - LEX - Design of Lexical Analyzer for a sample Language.

UNIT III SYNTAX ANALYSIS (10)
Need and Role of the Parser - Context Free Grammars - Top Down Parsing - General
Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift Reduce Parser -
LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser -
Error Handling and Recovery in Syntax Analyzer - YACC - Design of a syntax Analyzer for a
Sample Language.

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT (12)
Syntax directed Definitions - Construction of Syntax Tree - Bottom-up Evaluation of
S-Attribute Definitions - Design of predictive translator - Type Systems - Specification of a
simple type checker - Equivalence of Type Expressions - Type Conversions.
RUN-TIME ENVIRONMENT: Source Language Issues - Storage Organization - Storage
Allocation - Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage
Allocation in FORTRAN.

UNIT V CODE OPTIMIZATION AND CODE GENERATION
Principal Sources of Optimization - DAG - Optimization of Basic Blocks - Global Data Flow
Analysis - Efficient Data Flow Algorithms - Issues in Design of a Code Generator - A Simple
Code Generator Algorithm.

TOTAL: 45 PERIODS

TEXTBOOK:
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers:
   Principles, Techniques and Tools, 2nd Edition, Pearson Education, 2007.
REFERENCES:
1. Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A
   Dependence-based Approach, Morgan Kaufmann Publishers, 2002.
2. Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan
   Kaufmann Publishers - Elsevier Science, India, Indian Reprint 2003.
3. Keith D. Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann
   Publishers - Elsevier Science, 2004.
4. Charles N. Fischer, Richard J. LeBlanc, Crafting a Compiler with C, Pearson
   Education, 2008.

Acknowledgement

I am very much grateful to the management of Pauls Educational Trust, respected Principal
Dr. Y. R. M. Rao, M.E., Ph.D., cherished Dean Dr. E. Mariappane, M.E., Ph.D., and helpful
Head of the Department Mr. M. G. Lavakumar, M.E., (Ph.D.).
I thank my colleagues and friends for their cooperation and their support in my career
venture.
I thank my parents and family members for their valuable support in completing the book
successfully.
I express my special thanks to SARUMATHI PUBLICATIONS for their continued
cooperation in shaping the work.
Suggestions and comments to improve the text are very much solicited.

Mr. G. Appasami

TABLE OF CONTENTS

UNIT I INTRODUCTION TO COMPILERS
1.1 Translators
1.2 Compilation and Interpretation
1.3 Language processors
1.4 The Phases of Compiler
1.5 Errors Encountered in Different Phases
1.6 The Grouping of Phases
1.7 Compiler Construction Tools
1.8 Programming Language basics

UNIT II LEXICAL ANALYSIS
2.1 Need and Role of Lexical Analyzer
2.2 Lexical Errors
2.3 Expressing Tokens by Regular Expressions
2.4 Converting Regular Expression to DFA
2.5 Minimization of DFA
2.6 Language for Specifying Lexical Analyzers - LEX
2.7 Design of Lexical Analyzer for a sample Language

UNIT III SYNTAX ANALYSIS
3.1 Need and Role of the Parser
3.2 Context Free Grammars
3.3 Top Down Parsing - General Strategies
3.4 Recursive Descent Parser
3.5 Predictive Parser
3.6 LL(1) Parser
3.7 Shift Reduce Parser
3.8 LR Parser
3.9 LR(0) Item
3.10 Construction of SLR Parsing Table
3.11 Introduction to LALR Parser
3.12 Error Handling and Recovery in Syntax Analyzer
3.13 YACC
3.14 Design of a syntax Analyzer for a Sample Language

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
4.1 Syntax directed Definitions
4.2 Construction of Syntax Tree
4.3 Bottom-up Evaluation of S-Attribute Definitions
4.4 Design of predictive translator
4.5 Type Systems
4.6 Specification of a simple type checker
4.7 Equivalence of Type Expressions
4.8 Type Conversions
4.9 RUN-TIME ENVIRONMENT: Source Language Issues
4.10 Storage Organization
4.11 Storage Allocation
4.12 Parameter Passing
4.13 Symbol Tables
4.14 Dynamic Storage Allocation
4.15 Storage Allocation in FORTRAN

UNIT V CODE OPTIMIZATION AND CODE GENERATION
5.1 Principal Sources of Optimization
5.2 DAG
5.3 Optimization of Basic Blocks
5.4 Global Data Flow Analysis
5.5 Efficient Data Flow Algorithms
5.6 Issues in Design of a Code Generator
5.7 A Simple Code Generator Algorithm

CS6660 __

Compiler Design

Unit I

_____1.1

UNIT I INTRODUCTION TO COMPILERS
1.1 TRANSLATORS
A translator is a program that takes a program in one form (the input) and converts it into
another form (the output). The language of the input program is called the source language
and the language of the output program is called the target language.
The source language can be a low-level language such as assembly language, or a high-level
language such as C, C++, Java, FORTRAN, and so on.
The target language can be a low-level language (assembly language) or a machine
language (the set of instructions executed directly by a CPU).

Source language -> Translator -> Target language

Figure 1.1: Translator
Types of translators are:
(1). Compilers
(2). Interpreters
(3). Assemblers
1.2 COMPILATION AND INTERPRETATION
A compiler is a program that reads a program in one language and translates it into an
equivalent program in another language. The translation done by a compiler is called compilation.
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs supplied by the user. An interpreter executes the source program
statement by statement. The translation done by an interpreter is called interpretation.
1.3 LANGUAGE PROCESSORS
(i) Compiler
A compiler is a program that can read a program in one language (the source language) and
translate it into an equivalent program in another language (the target language). Compilation is
shown in Figure 1.2.
Source program (Input) -> Compiler -> Target program (Output)

Figure 1.2: A Compiler
An important role of the compiler is to report any errors in the source program that it detects
during the translation process.
If the target program is an executable machine-language program, it can then be called by
the user to process inputs and produce outputs.

Input -> Target Program -> Output

Figure 1.3: Running the target program


(ii) Interpreter
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs supplied by the user, as shown in Figure 1.4.
Source program, Input -> Interpreter -> Output

Figure 1.4: An interpreter
The machine-language target program produced by a compiler is usually much faster than
an interpreter at mapping inputs to outputs.
A compiler converts the whole source program to the target program before anything runs, but an
interpreter executes the source program statement by statement. For that reason an interpreter can
usually give better error diagnostics than a compiler.
(iii) Hybrid Compiler
A hybrid compiler is a combination of compilation and interpretation. Java language
processors combine compilation and interpretation as shown in Figure 1.5.
A Java source program is first compiled into an intermediate form called bytecodes. The
bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted
on another machine.
Source program -> Translator -> Intermediate program
Intermediate program, Input -> Virtual Machine -> Output

Figure 1.5: A hybrid compiler
In order to achieve faster processing of inputs to outputs, some Java compilers, called
just-in-time compilers, translate the bytecodes into machine language immediately before they run.
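The two-step scheme of a hybrid compiler can be tried out in Python, whose standard implementation also translates source text to bytecode and then runs that bytecode on a virtual machine. The sketch below is only an illustration of the idea, not a model of the Java tool chain; the expression and the input values are made up for the example.

```python
# A tiny "source program": one expression over two input names.
source = "initial + rate * 60"

# Translation step: the built-in compile() turns source text into a code
# object that holds bytecode (the intermediate program).
code = compile(source, "<example>", "eval")

# Execution step: eval() runs the bytecode on the Python virtual machine,
# taking the program's inputs from the supplied dictionary.
result = eval(code, {"initial": 10.0, "rate": 2.5})
print(result)                                   # 160.0

# The same intermediate program can be run again on different inputs.
print(eval(code, {"initial": 0, "rate": 1}))    # 60
```

The code object is produced once and executed twice, which mirrors the separation between the translator and the virtual machine in Figure 1.5.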
(iv) Language processing system
In addition to a compiler, several other programs may be required to create an executable
target program, as shown in Figure 1.6.
Preprocessor: The preprocessor collects the source program, which may be divided into modules and
stored in separate files. The preprocessor may also expand shorthands called macros into
source-language statements. E.g. #include <math.h>, #define PI 3.14
Compiler: The modified source program is then fed to a compiler. The compiler may produce an
assembly-language program as its output, because assembly language is easier to produce as output
and is easier to debug.
Assembler: The assembly language is then processed by a program called an assembler that
produces relocatable machine code as its output.


Linker: The linker resolves external memory addresses, where the code in one file may refer to a
location in another file. Large programs are often compiled in pieces, so the relocatable machine
code may have to be linked together with other relocatable object files and library files into the
code that actually runs on the machine.
Loader: The loader then puts together all of the executable object files into memory for execution.
It also performs relocation of the object code.

Figure 1.6: A language processing system
Note: Preprocessors, assemblers, linkers and loaders are collectively called cousins of the compiler.
1.4 THE PHASES OF COMPILER / STRUCTURE OF COMPILER
The process of compilation is carried out in two parts: analysis and synthesis. The
analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them.
It then uses this structure to create an intermediate representation of the source program.
The analysis part also collects information about the source program and stores it in a data structure
called a symbol table, which is passed along with the intermediate representation to the synthesis
part.
The analysis part is carried out in three phases: lexical analysis, syntax analysis and
semantic analysis. The analysis part is often called the front end of the compiler. The synthesis part
constructs the desired target program from the intermediate representation and the information in
the symbol table.
The synthesis part is carried out in three phases: intermediate code generation,
code optimization and code generation. The synthesis part is called the back end of the compiler.


Figure 1.7: Phases of a compiler
1.4.1 Lexical Analysis
The first phase of a compiler is called lexical analysis, scanning, or linear analysis. The
lexical analyzer reads the stream of characters making up the source program and groups the
characters into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
The first component, token-name, is an abstract symbol that is used during syntax analysis,
and the second component, attribute-value, points to an entry in the symbol table for this token.
Information from the symbol-table entry is needed for semantic analysis and code generation.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60          (1.1)


Figure 1.8: Translation of an assignment statement
The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens.
(1) position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract
    symbol standing for identifier and 1 points to the symbol-table entry for position.
(2) The assignment symbol = is a lexeme that is mapped into the token <=>.
(3) initial is a lexeme that is mapped into the token <id, 2>.
(4) + is a lexeme that is mapped into the token <+>.
(5) rate is a lexeme that is mapped into the token <id, 3>.
(6) * is a lexeme that is mapped into the token <*>.
(7) 60 is a lexeme that is mapped into the token <60>.
Blanks separating the lexemes would be discarded by the lexical analyzer. The sequence of
tokens produced by lexical analysis is as follows.

<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>          (1.2)
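The grouping of characters into lexemes and tokens can be sketched in a few lines of Python. This is a minimal illustration that handles only identifiers, unsigned integers and the operators =, + and *; the function name tokenize and the use of a plain list as the symbol table are assumptions made for the example.

```python
import re

# One pattern with a named group per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")

def tokenize(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    symbol_table = []            # entry k of the table is stored at index k - 1
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)      # first occurrence: new entry
            tokens.append(f"<id,{symbol_table.index(name) + 1}>")
        elif m.group("num"):
            tokens.append(f"<{m.group('num')}>")
        else:
            tokens.append(f"<{m.group('op')}>")
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(" ".join(tokens))   # <id,1> <=> <id,2> <+> <id,3> <*> <60>
print(table)              # ['position', 'initial', 'rate']
```

The attribute of each id token is the position of the name in the table, so the three identifiers receive the entries 1, 2 and 3 in order of first appearance.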


1.4.2 Syntax Analysis
The second phase of the compiler is syntax analysis, parsing, or hierarchical analysis.
The parser uses the first components of the tokens produced by the lexical analyzer to create
a tree-like intermediate representation that depicts the grammatical structure of the token stream.
The hierarchical tree structure generated in this phase is called a parse tree or syntax tree.
In a syntax tree, each interior node represents an operation and the children of the node
represent the arguments of the operation.

Figure 1.9: Syntax tree for position = initial + rate * 60
The tree has an interior node labeled * with <id, 3> as its left child and the integer 60 as its
right child. The node <id, 3> represents the identifier rate. Similarly, <id, 2> and <id, 1> appear
as leaves of the tree. The node labeled + indicates that the result of the multiplication is added
to the value of initial. The root of the tree, labeled =, indicates that we must store the result of
this addition into the location for the identifier position.
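The shape of this syntax tree can be reproduced with a short recursive-descent sketch in Python. It assumes a deliberately small grammar (an assignment whose right side uses only + and *, with * binding tighter) and represents each interior node as a tuple (operator, left, right); the function name and the token spellings id1, id2, id3 are choices made for the example.

```python
def parse_assignment(tokens):
    """Parse `id = expr` and return a syntax tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # factor -> identifier | number
        tok = peek()
        if tok is None or tok in ("=", "+", "*"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return advance()

    def term():                        # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term { + term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree

tree = parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"])
print(tree)   # ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is the structure described for Figure 1.9.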
1.4.3 Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check
the source program for semantic consistency with the language definition.
It checks the aspects of program correctness that syntax analysis cannot, such as whether
operands have compatible types. (Balanced parentheses, by contrast, are checked by the parser;
see the syntax errors in Section 1.5.)
It also gathers type information and saves it in either the syntax tree or the symbol table, for
subsequent use during intermediate code generation.
An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands.
For example, the compiler must report an error if a floating-point number is used to index an
array. The language specification may permit some type conversions, such as converting an integer
to a float for a floating-point addition; such conversions are called coercions.
In the running example, the operator * is applied to a floating-point number rate and an integer
60. The integer may be converted into a floating-point number by the operator inttofloat, which
appears explicitly in Figure 1.10.

Figure 1.10: Semantic tree for position = initial + rate * 60
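The coercion step can be sketched as a walk over the syntax tree that computes a type for every node and wraps an integer operand in inttofloat when its partner is a float. The sketch assumes only two types, int and float, and assumes that all three identifiers were declared as float; neither assumption comes from the text.

```python
def coerce(node, types):
    """Return (new_node, type), inserting inttofloat where int meets float."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"             # integer constant
        return node, types[node]           # identifier: use its declared type
    op, left, right = node
    left, lt = coerce(left, types)
    right, rt = coerce(right, types)
    if op == "=":
        # Only the right-hand side may be converted to fit the target.
        if lt == "float" and rt == "int":
            right = ("inttofloat", right)
        elif lt != rt:
            raise TypeError("cannot assign a float to an int variable")
        return (op, left, right), lt
    if lt != rt:                           # mixed int/float arithmetic
        if lt == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        lt = "float"
    return (op, left, right), lt

types = {"id1": "float", "id2": "float", "id3": "float"}   # assumed declarations
tree = ("=", "id1", ("+", "id2", ("*", "id3", "60")))
checked, result_type = coerce(tree, types)
print(checked)
# ('=', 'id1', ('+', 'id2', ('*', 'id3', ('inttofloat', '60'))))
```

Only the constant 60 is wrapped, because it is the only place where an int operand meets a float operand.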
1.4.4 Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate an
explicit low-level or machine-like intermediate representation.
The intermediate representation should have two important properties:
a. It should be easy to produce.
b. It should be easy to translate into the target machine language.


Three-address code is one such intermediate representation. It consists of a sequence
of assembly-like instructions with at most three operands per instruction. Each operand can act
like a register.
The output of the intermediate code generator in Figure 1.8 consists of the three-address code
sequence for position = initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
(1.3)
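One way to obtain sequence (1.3) is a post-order walk of the semantic tree that creates a fresh temporary for every interior node. The Python sketch below assumes the tree is given as nested tuples with an explicit inttofloat node; the function name gen_three_address is hypothetical.

```python
def gen_three_address(tree):
    """Emit three-address instructions for a tree of nested tuples."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        if node[0] == "inttofloat":          # unary conversion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node               # binary operator node
        l = walk(left)
        r = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    _, target, expr = tree                   # the root is ('=', target, expr)
    code.append(f"{target} = {walk(expr)}")
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
for line in gen_three_address(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```

Children are visited before their parent, so every operand already has a name by the time the instruction that uses it is emitted.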
1.4.5 Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate
code so that better target code will result. Usually better means faster.
Optimization has to improve the efficiency of the code so that the running time and memory
consumption of the target program can be reduced.
The optimizer can deduce that the conversion of 60 from integer to floating point can be
done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the
integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform
(1.3) into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1
(1.4)
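The two improvements described above can be sketched as a single pass over the instruction list. The sketch assumes each instruction is a tuple (dest, op, arg1, arg2) and uses the made-up operator name "copy" for a plain assignment. It also removes the final copy only by looking at adjacent instructions, so it is an illustration of the idea rather than a general optimizer.

```python
def optimize(code):
    """Fold inttofloat on constants, then drop a temporary that is only copied."""
    constants = {}                 # temporary -> float literal that replaces it
    result = []
    for dest, op, a, b in code:
        a = constants.get(a, a)    # substitute known constants into operands
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = f"{a}.0"     # conversion performed at compile time
            continue
        result.append((dest, op, a, b))
    # If the last instruction merely copies the temporary computed just before
    # it, let the earlier instruction write to the final destination directly.
    if len(result) >= 2 and result[-1][1] == "copy" and result[-1][2] == result[-2][0]:
        dest = result[-1][0]
        _, op, a, b = result[-2]
        result[-2:] = [(dest, op, a, b)]
    return result

code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(code):
    print(instruction)
# ('t2', '*', 'id3', '60.0')
# ('id1', '+', 'id2', 't2')
```

The result matches (1.4) except that the surviving temporary keeps its old name t2, whereas (1.4) renumbers it to t1.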
1.4.6 Code Generation
The code generator takes as input an intermediate representation of the source program and
maps it into the target language.
If the target language is machine code, then registers or memory locations are selected
for each of the variables used by the program.
The intermediate instructions are translated into sequences of machine instructions.
For example, using registers R1 and R2, the intermediate code in (1.4) might get translated
into the machine code
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
(1.5)
The first operand of each instruction specifies a destination. The F in each instruction tells
us that it deals with floating-point numbers.
The code in (1.5) loads the contents of address id3 into register R2, then multiplies it with the
floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant. The
third instruction moves id2 into register R1 and the fourth adds to it the value previously computed
in register R2. Finally, the value in register R1 is stored into the address of id1, so the code
correctly implements the assignment statement (1.1).
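A very naive translation from the optimized three-address code to instructions in the style of (1.5) might look as follows in Python. The sketch assumes that the first operand of every instruction lives in memory, that the second is either a temporary already held in a register or a constant, and that temporaries are exactly the names starting with "t". Registers are handed out in increasing order, so the register numbers differ from those chosen in (1.5).

```python
def generate(code):
    """Translate (dest, op, a, b) tuples into LDF/MULF/ADDF/STF instructions."""
    opcode = {"*": "MULF", "+": "ADDF"}
    reg_of = {}                    # temporary -> register that holds its value
    out = []
    next_reg = 1

    def operand(x):
        if x in reg_of:
            return reg_of[x]       # value is already in a register
        return f"#{x}"             # otherwise treat it as an immediate constant

    for dest, op, a, b in code:
        reg = f"R{next_reg}"
        next_reg += 1
        out.append(f"LDF {reg}, {a}")                       # load first operand
        out.append(f"{opcode[op]} {reg}, {reg}, {operand(b)}")
        if dest.startswith("t"):
            reg_of[dest] = reg                              # keep temporary in register
        else:
            out.append(f"STF {dest}, {reg}")                # store program variable
    return out

code = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]
for line in generate(code):
    print(line)
# LDF R1, id3
# MULF R1, R1, #60.0
# LDF R2, id2
# ADDF R2, R2, R1
# STF id1, R2
```

A real code generator would also decide which values to keep in registers and when registers can be reused; this sketch never frees a register.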


1.4.7 Symbol Table Management

The symbol table, which stores information about the entire source program, is used
by all phases of the compiler.
An essential function of a compiler is to record the variable names used in the
source program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its
type, and its scope.
In the case of procedure names, such things as the number and types of its
arguments, the method of passing each argument (for example, by value or by
reference), and the type returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name. The data structure should be designed to allow
the compiler to find the record for each name quickly and to store or retrieve data
from that record quickly.
A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary search tree
o Hash table

Among these, symbol tables are mostly implemented as hash tables, where
the source-code symbol itself is treated as the key for the hash function and the value
returned is the information about the symbol.
A symbol table may serve the following purposes, depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify whether a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
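A hash-table symbol table is easy to sketch in Python, because the built-in dict is itself a hash table keyed by the name. The class name, the method names and the attribute fields below are assumptions chosen for the example, not a prescribed interface.

```python
class SymbolTable:
    """One record per name, with fields for the attributes of the name."""

    def __init__(self):
        self._entries = {}         # dict = hash table keyed by the symbol itself

    def declare(self, name, type_, scope):
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = {"type": type_, "scope": scope}

    def lookup(self, name):
        """Return the record for name, or None if it was never declared."""
        return self._entries.get(name)

table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.declare(name, "float", "global")

print(table.lookup("rate"))     # {'type': 'float', 'scope': 'global'}
print(table.lookup("speed"))    # None, so 'speed' has not been declared
```

Both declare and lookup take expected constant time, which is the main reason hash tables are preferred over a linear list here.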

1.5 ERRORS ENCOUNTERED IN DIFFERENT PHASES

An important role of the compiler is to report any errors in the source program that
it detects during the entire translation process.
Each phase of the compiler can encounter errors. After an error is detected, it must be
dealt with so that the compilation process can proceed.
The syntax and semantic phases handle a large number of the errors found during compilation.

The error handler deals with all types of errors: lexical errors, syntax errors, semantic
errors and logical errors.
Lexical errors:
The lexical analyzer detects errors in the input characters.
The name of a keyword or identifier is typed incorrectly.
Example: switch is written as swich.
Syntax errors:
Syntax errors are detected by the syntax analyzer.
Errors such as a missing semicolon or unbalanced parentheses.
Example: ((a + b * (c - d)). In this statement, ) is missing after b.
Semantic errors:
Data type mismatch errors are handled by the semantic analyzer.
Assignment of a value of an incompatible data type.
Example: Assigning a string value to an integer.
Logical errors:
Unreachable code and infinite loops.
Misuse of operators. Code written after the end of the main() block.


1.6 THE GROUPING OF PHASES

The phases describe the logical organization of a compiler.
In an implementation, the activities of several phases may be grouped together into a pass that
reads an input file and writes an output file.
The front-end phases of lexical analysis, syntax analysis, semantic analysis, and
intermediate code generation might be grouped together into one pass.
Code optimization might be an optional pass.
There could then be a back-end pass consisting of code generation for a particular target machine.

Source program (input)
  -> Front end: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate Code Generator
     (source language dependent, machine independent)
  -> Intermediate code
  -> Back end: Code Optimizer (optional), Code Generator
     (machine dependent, source language independent)
  -> Target program (output)

Figure 1.11: The Grouping of Phases of compiler
Some compiler collections have been created around carefully designed intermediate
representations that allow the front end for a particular language to interface with the back end for a
certain target machine.
Advantages:
With these collections, we can produce compilers for different source languages for one
target machine by combining different front ends with the back end for that target machine.
Similarly, we can produce compilers for different target machines by combining a front
end with back ends for different target machines.


1.7 COMPILER CONSTRUCTION TOOLS
The compiler writer, like any software developer, can profitably use modern software
development environments containing tools such as language editors, debuggers, version managers,
profilers, test harnesses, and so on.
Writing a compiler is a tedious and time-consuming task; there are some specialized tools to
implement various phases of a compiler. These tools are called compiler construction tools.
Some commonly used compiler construction tools are given below:

Scanner generators                      [Lexical Analysis]
Parser generators                       [Syntax Analysis]
Syntax-directed translation engines     [Intermediate Code]
Data-flow analysis engines              [Code Optimization]
Code-generator generators               [Code Generation]
Compiler-construction toolkits          [For all phases]

1. Scanner generators that produce lexical analyzers from a regular-expression
   description of the tokens of a language. Unix has a scanner generator called
   LEX.
2. Parser generators that automatically produce syntax analyzers (parsers) from a
   grammatical description of a programming language. Unix has a tool called YACC
   which is a parser generator.
3. Syntax-directed translation engines that produce collections of routines for walking
   a parse tree and generating intermediate code.
4. Data-flow analysis engines that facilitate the gathering of information about how
   values are transmitted from one part of a program to each other part. Data-flow
   analysis is a key part of code optimization.
5. Code-generator generators that produce a code generator from a collection of rules
   for translating each operation of the intermediate language into the machine
   language for a target machine.
6. Compiler-construction toolkits that provide an integrated set of routines for
   constructing various phases of a compiler.
1.8 PROGRAMMING LANGUAGE BASICS
To design an efficient compiler we should know some language basics. Important concepts
from popular programming languages like C, C++, C#, and Java, which are used in most
languages, are listed below:
The Static/Dynamic Distinction
Environments and States
Static Scope and Block Structure
Explicit Access Control
Dynamic Scope
Parameter Passing Mechanisms
Aliasing


1.8.1 The Static/Dynamic Distinction
If a language uses a policy that allows the compiler to decide an issue, then we say that the
language uses a static policy or that the issue can be decided at compile time. On the
other hand, a policy that only allows a decision to be made when we execute the program is said to
be a dynamic policy or to require a decision at run time.
The scope of a declaration of x is the region of the program in which uses of x refer to this
declaration. A language uses static scope or lexical scope if it is possible to determine the scope of
a declaration by looking only at the program. Otherwise, the language uses dynamic scope. With
dynamic scope, as the program runs, the same use of x could refer to any of several different
declarations of x.
Example: Consider the use of the term "static" as it applies to data in a Java class declaration. In
Java, a variable is a name for a location in memory used to hold a data value. Here, "static" refers
not to the scope of the variable, but rather to the ability of the compiler to determine the location in
memory where the declared variable can be found. A declaration like
public static int x;
makes x a class variable and says that there is only one copy of x, no matter how many
objects of this class are created. Moreover, the compiler can determine a location in memory where
this integer x will be held. In contrast, had "static" been omitted from this declaration, then each
object of the class would have its own location where x would be held, and the compiler could not
determine all these places in advance of running the program.
1.8.2 Environments and States
Another important distinction is whether changes that occur as the program runs affect the
values of data elements or affect the interpretation of names for that data. For example, the
execution of an assignment such as x = y + 1 changes the value denoted by the name x. More
specifically, the assignment changes the value in whatever location is denoted by x.
The location denoted by x can also change at run time. If x is not a static (or "class") variable,
then every object of the class has its own location for an instance of variable x. In that case, the
assignment to x can change any of those "instance" variables, depending on the object to which a
method containing that assignment is applied.

names --(environment)--> locations (variables) --(state)--> values

The association of names with locations in memory (the store) and then with values can be
described by two mappings that change as the program runs:
1. The environment is a mapping from names to locations in the store. Since variables refer to
   locations ("l-values" in the terminology of C), we could alternatively define an environment
   as a mapping from names to variables.
2. The state is a mapping from locations in the store to their values. That is, the state maps
   l-values to their corresponding r-values, in the terminology of C.
Environments change according to the scope rules of a language.
Example: Consider the C program fragment below. Integer i is declared as a global variable, and is
also declared as a variable local to function f. When f is executing, the environment adjusts so that
name i refers to the location reserved for the i that is local to f, and any use of i, such as the
assignment i = 3 shown explicitly, refers to that location.


Typically, the local i is given a place on the run-time stack.

int i;              /* global i */
...
void f(...) {
    int i;          /* local i */
    ...
    i = 3;          /* use of local i */
    ...
}
...
x = i + 1;          /* use of global i */

Whenever a function g other than f is executing, uses of i cannot refer to the i that is local to
f. Uses of name i in g must be within the scope of some other declaration of i. An example is the
explicitly shown statement x = i + 1, which is inside some procedure whose definition is not shown.
The i in i + 1 presumably refers to the global i.
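The two mappings can be made concrete with two Python dictionaries: an environment from names to locations and a state from locations to values. The location numbers and the value 7 given to the global i are invented for the sketch; only the structure of the two mappings comes from the discussion above.

```python
state = {}                              # state: location -> value
global_env = {"i": 100, "x": 104}       # environment: name -> location

def assign(env, name, value):
    # The environment selects the location; the state holds the value there.
    state[env[name]] = value

def lookup(env, name):
    return state[env[name]]

assign(global_env, "i", 7)              # some value for the global i

# While f executes, its environment binds the name i to a new stack location.
f_env = {**global_env, "i": 200}
assign(f_env, "i", 3)                   # i = 3 inside f changes only the local i

# Outside f the global environment applies again: x = i + 1 uses the global i.
assign(global_env, "x", lookup(global_env, "i") + 1)

print(lookup(f_env, "i"), lookup(global_env, "i"), lookup(global_env, "x"))
# 3 7 8
```

The assignment inside f changes the state at location 200 only, so the global i at location 100 still holds 7 when x is computed.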
1.8.3 Static Scope and Block Structure
The scope rules for C are based on program structure; the scope of a declaration is
determined implicitly by where the declaration appears in the program. Later languages, such as
C++, Java, and C#, also provide explicit control over scopes through the use of keywords like
public, private, and protected.
A block is a grouping of declarations and statements. C uses braces { and } to delimit a
block; some other languages use begin and end instead.
Example: The C++ program in Figure 1.12 has four blocks, with several definitions of variables a
and b. As a memory aid, each declaration initializes its variable to the number of the block to which
it belongs.

Output
32
14
12
11
Figure 1.12: Blocks in a C++ program


Consider the declaration int a = 1 in block B1. Its scope is all of B1, except for those blocks
nested within B1 that have their own declaration of a. B2, nested immediately within B1, does not
have a declaration of a, but B3 does. B4 does not have a declaration of a, so block B3 is the only
place in the entire program that is outside the scope of the declaration of the name a that belongs to
B1. That is, this scope includes B4 and all of B2 except for the part of B2 that is within B3. The
scopes of all five declarations are summarized in Figure 1.13.

Figure 1.13: Scopes of declarations
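The nesting described above can be imitated in Python with collections.ChainMap, which looks a name up in the innermost mapping first and then in each enclosing one. The sketch assumes the five declarations are a and b in B1, b in B2, a in B3 and b in B4, each initialized to its block number, and that B3 and B4 are both nested directly inside B2.

```python
from collections import ChainMap

b1 = ChainMap({"a": 1, "b": 1})     # block B1 declares a and b
b2 = b1.new_child({"b": 2})         # B2, nested in B1, redeclares b
b3 = b2.new_child({"a": 3})         # B3, nested in B2, redeclares a
b4 = b2.new_child({"b": 4})         # B4, nested in B2, redeclares b

# Print a and b as seen from B3, B4, B2 and B1, in that order.
for scope in (b3, b4, b2, b1):
    print(scope["a"], scope["b"])
# 3 2
# 1 4
# 1 2
# 1 1
```

Inside B4 the name a still resolves to the declaration in B1, because neither B4 nor B2 declares a of its own.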
1.8.4 Explicit Access Control
Classes and structures introduce a new scope for their members. If p is an object of a class
with a field (member) x, then the use of x in p.x refers to field x in the class definition. The scope of
a member declaration x in a class C extends to any subclass C', except if C' has a local declaration
of the same name x.
Through the use of keywords like public, private, and protected, object-oriented
languages such as C++ or Java provide explicit control over access to member names in a
superclass. These keywords support encapsulation by restricting access.
Thus, private names are purposely given a scope that includes only the method declarations
and definitions associated with that class and any "friend" classes (the C++ term). Protected names
are accessible to subclasses. Public names are accessible from outside the class.
1.8.5 Dynamic Scope
Technically, any scoping policy is dynamic if it is based on factor(s) that can be known only
when the program executes. The term dynamic scope, however, usually refers to the following
policy: a use of a name x refers to the declaration of x in the most recently called procedure with
such a declaration.
Dynamic scoping of this type appears only in special situations.
We shall consider two examples of dynamic policies: macro expansion in the C
preprocessor and method resolution in object-oriented programming.
Example: In the C program below, identifier a is a macro that stands for the expression (x + 1). But
we cannot resolve x statically, that is, in terms of the program text.
#include <stdio.h>
#define a (x + 1)
int x = 2;
void b() { int x = 1; printf("%d\n", a); }
void c() { printf("%d\n", a); }
int main() { b(); c(); return 0; }
In fact, in order to interpret x, we must use the usual dynamic-scope rule. The function main
first calls function b. As b executes, it prints the value of the macro a. Since (x + 1) must be
substituted for a, we resolve this use of x to the declaration int x = 1 in function b. The reason is
that b has a declaration of x, so the (x + 1) in the printf in b refers to this x. Thus, the value
printed is 2.
After b finishes, and c is called, we again need to print the value of macro a. However, the
only x accessible to c is the global x. The printf statement in c thus refers to this declaration of x,
and the value 3 is printed.
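The rule "use the declaration in the most recently called procedure" can be simulated in Python with an explicit stack of frames. Python itself is statically scoped, so the stack, the resolve function and the frame dictionaries below are all part of the simulation and are assumptions of this sketch. With x = 1 in b the expression x + 1 evaluates to 2, and with the global x = 2 it evaluates to 3.

```python
call_stack = [{"x": 2}]              # bottom frame holds the global x

def resolve(name):
    """Dynamic scope: search the most recently called procedures first."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def a():
    return resolve("x") + 1          # plays the role of the macro a, (x + 1)

def b():
    call_stack.append({"x": 1})      # b declares its own x
    try:
        return a()
    finally:
        call_stack.pop()

def c():
    call_stack.append({})            # c declares no x of its own
    try:
        return a()
    finally:
        call_stack.pop()

print(b(), c())                      # 2 3
```

The same expression x + 1 yields different results depending on which procedure is active, which is exactly what makes the policy dynamic.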
1.8.6 Parameter Passing Mechanisms
All programming languages have a notion of a procedure, but they can differ in how these
procedures get their arguments. The actual parameters (the parameters used in the call of a
procedure) are associated with the formal parameters (those used in the procedure definition).
In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a
variable). The value is placed in the location belonging to the corresponding formal parameter of
the called procedure. This method is used in C and Java.
In call-by-reference, the address of the actual parameter is passed to the callee as the value
of the corresponding formal parameter. Uses of the formal parameter in the code of the callee are
implemented by following this pointer to the location indicated by the caller. Changes to the formal
parameter thus appear as changes to the actual parameter.
A third mechanism, call-by-name, was used in the early programming language Algol 60. It
requires that the callee execute as if the actual parameter were substituted literally for the formal
parameter in the code of the callee, as if the formal parameter were a macro standing for the actual
parameter.
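The observable difference between the three mechanisms can be imitated in Python. Python has only one passing mechanism of its own, so the sketch simulates call-by-reference with a one-element list standing for a location, and call-by-name with a parameterless function (a thunk) that is re-evaluated at every use. All names are invented for the example.

```python
def by_value(n):
    n = n + 1                  # changes only the callee's own copy
    return n

def by_reference(cell):
    cell[0] = cell[0] + 1      # cell stands for the address of the actual parameter

def by_name(thunk):
    return thunk() + thunk()   # the actual parameter is evaluated at each use

x = 5
by_value(x)
print(x)                       # 5: the caller's variable is unchanged

box = [5]                      # one-element list used as a stand-in for a location
by_reference(box)
print(box[0])                  # 6: the change is visible to the caller

calls = []
def actual():
    calls.append(1)            # record each evaluation of the actual parameter
    return 10

print(by_name(actual), len(calls))   # 20 2
```

Under call-by-name the actual parameter is evaluated twice because the callee uses the formal parameter twice; under call-by-value it would be evaluated once, before the call.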
1.8.7 Aliasing
There is an interesting consequence of call-by-reference parameter passing or its simulation,
as in Java, where references to objects are passed by value. It is possible that two formal
parameters can refer to the same location; such variables are said to be aliases of one another. As
a result, any two variables, which may appear to take their values from two distinct formal
parameters, can become aliases of each other.
Example: Suppose a is an array belonging to a procedure p, and p calls another procedure q(x, y)
with a call q(a, a). Suppose also that parameters are passed by value, but that array names are really
references to the location where the array is stored, as in C or similar languages. Now, x and y have
become aliases of each other. The important point is that if within q there is an assignment x[10] =
2, then the value of y[10] also becomes 2.
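The same effect can be observed directly in Python, where a list argument is a reference passed by value. The sketch below mirrors the call q(a, a); the array length of 20 is an arbitrary choice.

```python
def q(x, y):
    x[10] = 2            # an assignment through x ...
    return y[10]         # ... is visible through y when both name the same array

a = [0] * 20
print(q(a, a))           # 2: x and y are aliases of each other
print(q(a, [0] * 20))    # 0: two distinct arrays are not aliases
```

A compiler that wants to keep y[10] in a register across the assignment to x[10] must first establish that x and y cannot be aliases.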


UNIT II LEXICAL ANALYSIS
2.1 NEED AND ROLE OF LEXICAL ANALYZER
Lexical analysis is the first phase of a compiler. It reads the input characters of the source
program from left to right, one character at a time.
It generates a sequence of tokens, one for each lexeme. Each token is a logically cohesive unit
such as an identifier, keyword, operator or punctuation mark.
When it finds a lexeme that is an identifier, it needs to enter that lexeme into the symbol table;
it may also read from the symbol table. These interactions are suggested in Figure 2.1.

Figure 2.1: Interactions between the lexical analyzer and the parser
Since the lexical analyzer is the part of the compiler that reads the source text, it may
perform certain other tasks besides identification of lexemes. One such task is stripping out
comments and whitespace (blank, newline, tab). Another task is correlating error messages
generated by the compiler with the source program.
Needs / Roles / Functions of lexical analyzer
It produces a stream of tokens.
It eliminates comments and whitespace.
It keeps track of line numbers.
It reports the errors encountered while generating tokens.
It stores information about identifiers, keywords, constants and so on in the symbol table.
Lexical analyzers are divided into two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such
   as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis is the more complex portion, where the scanner produces the sequence of
   tokens as output.
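The scanning step in (a) can be sketched with two regular-expression substitutions in Python. The sketch assumes C-style /* ... */ comments and does not protect comment markers that appear inside string literals, which a real scanner would have to handle.

```python
import re

def scan(source):
    """Delete /* ... */ comments and squeeze runs of whitespace into one blank."""
    # Non-greedy match so that each comment ends at the first closing */.
    without_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(r"\s+", " ", without_comments).strip()

print(scan("x  =  i + 1;   /* use of global i */\n\ty = 2;"))
# x = i + 1; y = 2;
```

Each comment is replaced by a blank rather than by nothing, so that two lexemes separated only by a comment do not run together.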
Lexical Analysis versus Parsing / Issues in Lexical analysis
1. Simplicity of design is the most important consideration. The separation of lexical and
   syntactic analysis often allows us to simplify tasks; for example, whitespace and comments
   are removed by the lexical analyzer, so the parser never has to deal with them.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply
   specialized techniques that serve only the lexical task, not the job of parsing. In addition,
   specialized buffering techniques for reading input characters can speed up the compiler
   significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to
   the lexical analyzer.
Tokens, Patterns, and Lexemes
A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of single lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. Operators, special symbols and constants are also typical tokens.
A pattern is a description of the form that the lexemes of a token may take. In other words, a pattern is the set of rules that describe the token.
A lexeme is a sequence of characters in the source program that matches the pattern for a token.
Table 2.1: Tokens and Lexemes
TOKEN        INFORMAL DESCRIPTION (PATTERN)            SAMPLE LEXEMES
if           characters i, f                           if
else         characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           letter, followed by letters and digits    pi, score, D2, sum, id_1, AVG
number       any numeric constant                      35, 3.14159, 0, 6.02e23
literal      anything surrounded by double quotes      "Core", "Design", "Appasami"
In many programming languages, the following classes cover most or all of the tokens:
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token comparison mentioned in Table 2.1.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal strings.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma, and semicolon.
Attributes for Tokens
When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent compiler phases additional information about the particular lexeme that matched.
The lexical analyzer returns to the parser not only a token name, but also an attribute value that describes the lexeme represented by the token.
The token name influences parsing decisions, while the attribute value influences translation of tokens after the parse.
Information about an identifier, e.g., its lexeme, its type, and the location at which it is first found (in case an error message must be issued), is kept in the symbol table.
Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Example: The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
Note that in certain pairs, especially operators, punctuation, and keywords, there is no need for an attribute value. In this example, the token number has been given an integer-valued attribute.
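The token/attribute pairs above can be reproduced with a small scanner. The following is only a minimal sketch, assuming Python's re module as the matching engine; the names TOKEN_SPEC and tokenize are hypothetical, and an index into a Python list stands in for the pointer to a symbol-table entry.

```python
import re

# Token patterns, tried in order; '**' must precede '*' so the longer operator wins.
TOKEN_SPEC = [
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z][A-Za-z0-9]*"),
    ("exp_op",    r"\*\*"),
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("ws",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symtab); each token is a (name, attribute) pair."""
    symtab = []   # symbol table: list of lexemes, the index plays the role of a pointer
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # whitespace is stripped, no token produced
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)     # enter the lexeme into the symbol table
            tokens.append(("id", symtab.index(lexeme)))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, None))   # operators need no attribute value
    return tokens, symtab
```

Calling tokenize("E = M * C ** 2") yields seven pairs in the same order as the list above, with the identifiers E, M and C entered into the symbol table in that order.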


2.2 LEXICAL ERRORS
It is hard for a lexical analyzer to tell that there is a source-code error without the aid of other components.
Consider the C program statement fi ( a == f(x) ). The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser.
Suppose, however, that the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.

Transformations like these may be tried in an attempt to repair the input. The simplest such strategy is to see whether a prefix of the remaining input can be transformed into a valid lexeme by a single transformation.
In practice most lexical errors involve a single character. A more general correction strategy is to find the smallest number of transformations needed to convert the source program into one that consists only of valid lexemes.
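The four single-character transformations can be illustrated with a short sketch. It assumes a hypothetical set KEYWORDS of valid lexemes and a lowercase alphabet; a real lexical analyzer would test candidates against its token patterns instead of a fixed word list.

```python
import string

KEYWORDS = {"if", "else", "while"}   # hypothetical set of valid lexemes
ALPHABET = string.ascii_lowercase

def single_edits(word):
    """All strings reachable from word by one delete, insert, replace, or transpose."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | inserts | replaces | transposes

def repairs(word):
    """Valid lexemes that a single transformation can produce from word."""
    return sorted(single_edits(word) & KEYWORDS)
```

For instance, repairs("fi") finds if by transposing two adjacent characters, and repairs("whle") finds while by inserting a missing character.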
2.3 EXPRESSING TOKENS BY REGULAR EXPRESSIONS
Specification of Tokens
Regular expressions are an important notation for specifying lexeme patterns. Although they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.
Strings and Languages
An alphabet is any finite set of symbols. Examples of symbols are letters, digits, and punctuation. The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet.
A string (sentence or word) over an alphabet is a finite sequence of symbols drawn from that alphabet. The length of a string s, usually written |s|, is the number of occurrences of symbols in s. For example, banana is a string of length six. The empty string, denoted ε, is the string of length zero.
A language is any countable set of strings over some fixed alphabet. Abstract languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages under this definition.
Parts of Strings:
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban, banana, and ε are prefixes of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana, banana, and ε are suffixes of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For instance, banana, nan, and ε are substrings of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.


6. If x and y are strings, then the concatenation of x and y, denoted xy, is the string formed by appending y to x.
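The definitions of prefix, suffix, substring and subsequence translate directly into slicing operations. The sketch below is illustrative only; the function names are hypothetical, and the empty Python string "" plays the role of ε.

```python
def prefixes(s):
    """Remove zero or more symbols from the end of s."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """Remove zero or more symbols from the beginning of s."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    """Delete any prefix and any suffix from s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def is_subsequence(t, s):
    """True if t can be formed by deleting zero or more positions of s."""
    it = iter(s)
    return all(ch in it for ch in t)   # each ch must be found after the previous one
```

With s = "banana", prefixes(s) contains "ban", "banana" and "", substrings(s) contains "nan", and is_subsequence("baan", s) holds.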
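Operations on Languages
In lexical analysis, the most important operations on languages are union, concatenation, and closure, which are defined in Table 2.2.
Table 2.2: Definitions of operations on languages
OPERATION                   DEFINITION AND NOTATION
Union of L and M            L ∪ M = { s | s is in L or s is in M }
Concatenation of L and M    LM = { st | s is in L and t is in M }
Kleene closure of L         L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
Positive closure of L       L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...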

Example: Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. Other languages that can be constructed from languages L and D:
1. L ∪ D is the set of letters and digits; strictly speaking, the language with 62 strings of length one, each of which is either one letter or one digit.
2. LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.
3. L^4 is the set of all 4-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
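The counts in this example (62 and 520) can be checked by treating finite languages as Python sets of strings. This is a sketch under that assumption; since L* is infinite, the hypothetical helper closure_up_to only builds a finite approximation.

```python
import string

def union(L, M):
    return L | M

def concat(L, M):
    """LM = { st | s in L and t in M }."""
    return {s + t for s in L for t in M}

def power(L, n):
    """L^n: L concatenated with itself n times; L^0 = {empty string}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def closure_up_to(L, n):
    """Finite approximation of L*: L^0 | L^1 | ... | L^n."""
    result = set()
    for i in range(n + 1):
        result |= power(L, i)
    return result

L = set(string.ascii_letters)   # the 52 letters
D = set(string.digits)          # the 10 digits
```

Here len(union(L, D)) is 62 and len(concat(L, D)) is 520, matching items 1 and 2.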
Regular expression
A regular expression can be defined as a sequence of symbols and characters expressing a string or pattern to be searched for.
Regular expressions are a mathematical notation that describes the set of strings of a specific language.
The regular expression for identifiers is letter_ ( letter_ | digit )*. The vertical bar means union, the parentheses are used to group subexpressions, and the star means "zero or more occurrences of".
Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.
The rules that define the regular expressions over some alphabet Σ:
Basis rules:
1. ε is a regular expression, and L(ε) is {ε}.
2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string of length one.
Induction rules: Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.
1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r); i.e., additional pairs of parentheses may be placed around expressions without changing the language they denote.

Example: Let Σ = {a, b}.
Regular expression   Language                                  Meaning
a|b                  {a, b}                                    Single a or b
(a|b)(a|b)           {aa, ab, ba, bb}                          All strings of length two over the alphabet
a*                   {ε, a, aa, aaa, ...}                      All strings of zero or more a's
(a|b)*               {ε, a, b, aa, ab, ba, bb, aaa, ...}       All strings of zero or more instances of a or b
a|a*b                {a, b, ab, aab, aaab, ...}                The string a and all strings consisting of zero or more a's and ending in b

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s.
For instance, (a|b) = (b|a), (a|b)* = (a*b*)*, (b|a)* = (a|b)*, and (a|b)(b|a) = aa|ab|ba|bb.
Algebraic laws
Algebraic laws that hold for arbitrary regular expressions r, s, and t:
LAW                                 DESCRIPTION
r|s = s|r                           | is commutative
r|(s|t) = (r|s)|t                   | is associative
r(st) = (rs)t                       Concatenation is associative
r(s|t) = rs|rt; (s|t)r = sr|tr      Concatenation distributes over |
εr = rε = r                         ε is the identity for concatenation
r* = (r|ε)*                         ε is guaranteed in a closure
r** = r*                            * is idempotent
Extensions of Regular Expressions
A few notational extensions that were first incorporated into Unix utilities such as Lex are particularly useful in the specification of lexical analyzers.
1. One or more instances: The unary, postfix operator + represents the positive closure of a regular expression and its language. If r is a regular expression, then (r)+ denotes the language (L(r))+. Two useful algebraic laws are r* = r+ | ε and r+ = rr* = r*r.
2. Zero or one instance: The unary postfix operator ? means "zero or one occurrence." That is, r? is equivalent to r|ε, and L(r?) = L(r) ∪ {ε}.
3. Character classes: A regular expression a1|a2|...|an, where the ai's are each symbols of the alphabet, can be replaced by the shorthand [a1a2...an]. Thus, [abc] is shorthand for a|b|c, and [a-z] is shorthand for a|b|...|z.
Example: Regular definition for C identifiers
letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*
Example: Regular definition for unsigned numbers (integer or floating point)
digit → [0-9]
digits → digit+
number → digits (. digits)? ( E [+-]? digits )?
Note: The operators *, +, and ? have the same precedence and associativity.
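The two regular definitions can be tried out with Python's re module, whose syntax for *, +, ? and character classes follows the extensions described above. This is a sketch, not a Lex specification; the names ID, NUMBER, is_id and is_number are hypothetical.

```python
import re

# Each name mirrors one line of the regular definitions.
LETTER_ = r"[A-Za-z_]"
DIGIT = r"[0-9]"
ID = rf"{LETTER_}({LETTER_}|{DIGIT})*"
DIGITS = rf"{DIGIT}+"
NUMBER = rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?"

def is_id(s):
    """True if the whole string s matches the pattern for id."""
    return re.fullmatch(ID, s) is not None

def is_number(s):
    """True if the whole string s matches the pattern for number."""
    return re.fullmatch(NUMBER, s) is not None
```

Under these definitions id_1 is an identifier and 6.02E23 is a number, while 2D is not an identifier and 1. is not a number because the optional fraction requires at least one digit after the dot.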


2.4 CONVERTING REGULAR EXPRESSION TO DFA
To construct a DFA directly from a regular expression, we construct its syntax tree and then compute four functions: nullable, firstpos, lastpos, and followpos, defined as follows. Each definition refers to the syntax tree for a particular augmented regular expression (r)#.
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1 a2 ... an in L((r)#) such that for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.
We can compute nullable, firstpos, and lastpos by a straightforward recursion on the height of the tree. The basis and inductive rules for nullable and firstpos are summarized in the following table.
NODE n                  nullable(n)                        firstpos(n)
A leaf labeled ε        true                               ∅
A leaf with position i  false                              {i}
An or-node n = c1|c2    nullable(c1) or nullable(c2)       firstpos(c1) ∪ firstpos(c2)
A cat-node n = c1c2     nullable(c1) and nullable(c2)      if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
A star-node n = c1*     true                               firstpos(c1)
The rules for lastpos are essentially the same as for firstpos, but the roles of children c1 and c2 must be swapped in the rule for a cat-node.
There are only two rules for computing followpos.
1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).

Converting a Regular Expression Directly to a DFA
Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D, by the following procedure.

    initialize Dstates to contain only the unmarked state firstpos(n0),
        where n0 is the root of syntax tree T for (r)#;
    while (there is an unmarked state S in Dstates)
    {
        mark S;
        for (each input symbol a)
        {
            let U be the union of followpos(p) for all p in S that correspond to a;
            if (U is not in Dstates)
                add U as an unmarked state to Dstates;
            Dtran[S, a] = U;
        }
    }

The states of D are sets of positions in T. Initially, each state is "unmarked," and a state becomes "marked" just before we consider its out-transitions. The start state of D is firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position for the endmarker symbol #.
Example: Construct a DFA for the regular expression r = (a|b)*abb

Figure 2.2: Syntax tree for (a|b)*abb#

Figure 2.3: firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#


We must also apply rule 2 to the star-node. That rule tells us positions 1 and 2 are in both followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1, 2}. The complete sets followpos are summarized in the following table.
NODE n    followpos(n)
1         {1, 2, 3}
2         {1, 2, 3}
3         {4}
4         {5}
5         {6}
6         ∅

Figure 2.4: Directed graph for the function followpos
nullable is true only for the star-node, and we exhibited firstpos and lastpos in Figure 2.3.
The value of firstpos for the root of the tree is {1, 2, 3}, so this set is the start state of D. Call this set of states A. We must compute Dtran[A, a] and Dtran[A, b]. Among the positions of A, 1 and 3 correspond to a, while 2 corresponds to b. Thus, Dtran[A, a] = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}, and Dtran[A, b] = followpos(2) = {1, 2, 3}.
The latter is state A, and so does not have to be added to Dstates, but the former, B = {1, 2, 3, 4}, is new, so we add it to Dstates and proceed to compute its transitions. The complete DFA is shown in Figure 2.5.

Figure 2.5: DFA constructed for (a|b)*abb#
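The construction can be sketched in a few lines of code. The representation below is an assumption made for illustration: syntax-tree nodes are tuples ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2) and ("star", c1), and the names analyze and build_dfa are hypothetical. ε-leaves are omitted because the example does not need them.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) for node, filling followpos on the way."""
    kind = node[0]
    if kind == "leaf":                       # ("leaf", symbol, position)
        return False, {node[2]}, {node[2]}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                         # followpos rule 1 (cat-node)
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyze(node[1], followpos)
        for i in l:                          # followpos rule 2 (star-node)
            followpos[i] |= f
        return True, f, l
    raise ValueError(kind)

def build_dfa(tree, symbols, alphabet):
    """symbols maps each position to its input symbol; returns (start, Dtran, followpos)."""
    followpos = {p: set() for p in symbols}
    _, start, _ = analyze(tree, followpos)
    start = frozenset(start)
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        S = unmarked.pop()                   # mark S
        for a in alphabet:
            U = frozenset().union(*(followpos[p] for p in S if symbols[p] == a))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[S, a] = U
    return start, dtran, followpos

# Syntax tree for (a|b)*abb# with positions 1..6.
leaf = lambda s, p: ("leaf", s, p)
TREE = ("cat", ("cat", ("cat", ("cat",
        ("star", ("or", leaf("a", 1), leaf("b", 2))),
        leaf("a", 3)), leaf("b", 4)), leaf("b", 5)), leaf("#", 6))
SYMBOLS = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
```

Running build_dfa(TREE, SYMBOLS, "ab") reproduces the followpos table and the start state {1, 2, 3}; the accepting state is the one containing position 6. In general an empty set U can arise, which then acts as a dead state.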
Example: Construct an NFA for (a|b)*abb and convert it to a DFA by subset construction.

Figure 2.6: NFA for (a|b)*abb


Figure 2.7: NFA for (a|b)*abb

Figure 2.8: Result of applying the subset construction to Figure 2.6
2.5 MINIMIZATION OF DFA
There can be many DFAs that recognize the same language. For instance, the DFAs of Figures 2.5 and 2.8 both recognize the same language L((a|b)*abb).
We would generally prefer a DFA with as few states as possible, since each state requires entries in the table that describes the lexical analyzer.
Algorithm: Minimizing the number of states of a DFA.
INPUT: A DFA D with set of states S, input alphabet Σ, initial state s0, and set of accepting states F.
OUTPUT: A DFA D' accepting the same language as D and having as few states as possible.
METHOD:
1. Start with an initial partition Π with two groups, F and S - F, the accepting and nonaccepting states of D.
2. Apply the following procedure to construct a new partition Πnew.

    initially, let Πnew = Π;
    for (each group G of Π)
    {
        partition G into subgroups such that two states s and t are in the same subgroup
            if and only if for all input symbols a, states s and t have transitions on a
            to states in the same group of Π;
        /* at worst, a state will be in a subgroup by itself */
        replace G in Πnew by the set of all subgroups formed;
    }

3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with Πnew in place of Π.
4. Choose one state in each group of Πfinal as the representative for that group. The representatives will be the states of the minimum-state DFA D'. The other components of D' are constructed as follows:
(a) The start state of D' is the representative of the group containing the start state of D.
(b) The accepting states of D' are the representatives of those groups that contain an accepting state of D.
(c) Let s be the representative of some group G of Πfinal, and let the transition of D from s on input a be to state t. Let r be the representative of t's group H. Then in D', there is a transition from s to r on input a.
Example: Let us reconsider the DFA of Figure 2.8 for minimization.
STATE    a    b
A        B    C
B        B    D
C        B    C
D        B    E
(E)      B    C
The initial partition consists of the two groups {A, B, C, D} {E}, which are respectively the nonaccepting states and the accepting states.
To construct Πnew, the procedure considers both groups and inputs a and b. The group {E} cannot be split, because it has only one state, so {E} will remain intact in Πnew.
The other group {A, B, C, D} can be split, so we must consider the effect of each input symbol. On input a, each of these states goes to state B, so there is no way to distinguish these states using strings that begin with a. On input b, states A, B, and C go to members of group {A, B, C, D}, while state D goes to E, a member of another group.
Thus, in Πnew, group {A, B, C, D} is split into {A, B, C} {D}, and Πnew for this round is {A, B, C} {D} {E}.
In the next round, we can split {A, B, C} into {A, C} {B}, since A and C each go to a member of {A, B, C} on input b, while B goes to a member of another group, {D}. Thus, after the second round, Πnew = {A, C} {B} {D} {E}.
For the third round, we cannot split the one remaining group with more than one state, since A and C each go to the same state (and therefore to the same group) on each input. We conclude that Πfinal = {A, C} {B} {D} {E}.
Now, we shall construct the minimum-state DFA. It has four states, corresponding to the four groups of Πfinal, and let us pick A, B, D, and E as the representatives of these groups. The initial state is A, and the only accepting state is E.
Table: Transition table of minimum-state DFA
STATE    a    b
A        B    A
B        B    D
D        B    E
(E)      B    A
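The partition-refinement step can be sketched as follows. The code assumes the DFA is given as a dictionary dtran keyed by (state, symbol); the function name minimize and the table DTRAN are hypothetical names introduced for this illustration, with the transitions copied from the example above.

```python
def minimize(states, alphabet, dtran, accepting):
    """Return the final partition of states as a list of sets."""
    # Initial partition: nonaccepting states and accepting states.
    partition = [g for g in (set(states) - set(accepting), set(accepting)) if g]
    while True:
        def group_of(s):
            return next(i for i, g in enumerate(partition) if s in g)
        new_partition = []
        for G in partition:
            subgroups = {}
            for s in sorted(G):
                # States stay together only if every input leads to the same group.
                key = tuple(group_of(dtran[s, a]) for a in alphabet)
                subgroups.setdefault(key, set()).add(s)
            new_partition.extend(subgroups.values())
        if len(new_partition) == len(partition):   # no group was split
            return new_partition
        partition = new_partition

DTRAN = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
```

Because refinement only ever splits groups, an unchanged group count means the partition itself is unchanged. For this table, minimize("ABCDE", "ab", DTRAN, "E") ends with the groups {A, C}, {B}, {D} and {E}.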
2.6 LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS - LEX
There is a wide range of tools for constructing lexical analyzers based on regular expressions. Lex is a tool (computer program) that generates lexical analyzers.
Lex allows one to specify a lexical analyzer by writing regular expressions that describe the patterns for tokens. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler.
Use of Lex

The Lex compiler transforms the input patterns into a transition diagram and generates code.


An input file lex.l is written in the Lex language and describes the lexical analyzer to be generated. The Lex compiler transforms lex.l to a C program, in a file that is always named lex.yy.c.
The file lex.yy.c is compiled by the C compiler into a file a.out. The C compiler output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.
The attribute value, whether it be another numeric code, a pointer to the symbol table, or nothing, is placed in a global variable yylval, which is shared between the lexical analyzer and the parser.

Figure 2.9: Creating a lexical analyzer with Lex
Structure of Lex Programs
A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions

The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions.
The translation rules of a Lex program each have the form Pattern { Action }:
Pattern P1 { Action A1 }
Pattern P2 { Action A2 }
...
Pattern Pn { Action An }
Each pattern is a regular expression. The actions are fragments of code, typically written in the C language.
The third section holds whatever additional functions are used in the actions. Alternatively, these functions can be compiled separately and loaded with the lexical analyzer.
When called by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes the associated action Ai. Typically, Ai will return to the parser, but if it does not (e.g., because Pi describes whitespace or comments), then the lexical analyzer proceeds to find additional lexemes, until one of the corresponding actions causes a return to the parser. The lexical analyzer returns a single value, the token name, to the parser, but uses the shared, integer variable yylval to pass additional information about the lexeme found.


2.7 DESIGN OF LEXICAL ANALYZER FOR A SAMPLE LANGUAGE
A lexical-analyzer generator such as Lex is architected around an automaton simulator. The implementation of the Lex compiler can be based on either an NFA or a DFA.
2.7.1 The Structure of the Generated Analyzer
Figure 2.10 shows the architecture of a lexical analyzer generated by Lex. A Lex program is converted into a transition table and actions, which are used by a finite-automaton simulator.
The program that serves as the lexical analyzer includes a fixed program that simulates an automaton; the automaton is either deterministic or nondeterministic. The rest of the lexical analyzer consists of components that are created from the Lex program by Lex itself.

Figure 2.10: A Lex program is turned into a transition table and actions, which are used by a finite-automaton simulator
These components are:
1. A transition table for the automaton.
2. Those functions that are passed directly through Lex to the output.
3. The actions from the input program, which appear as fragments of code to be invoked at the appropriate time by the automaton simulator.
2.7.2 Pattern Matching Based on NFAs
To construct the automaton for several regular expressions, we need to combine all the NFAs into one by introducing a new start state with ε-transitions to each of the start states of the NFAs Ni for pattern pi, as shown in Figure 2.11.

Figure 2.11: An NFA constructed from a Lex program
Example: Consider the patterns


a       { action A1 for pattern p1 }
abb     { action A2 for pattern p2 }
a*b+    { action A3 for pattern p3 }

Figure 2.12: NFAs for a, abb, and a*b+

Figure 2.13: Combined NFA

Figure 2.14: Sequence of sets of states entered when processing input aaba

Figure 2.15: Transition graph for DFA handling the patterns a, abb, and a*b+
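The matching policy for these three patterns (choose the longest matching prefix, and break ties in favor of the pattern listed first) can be sketched without building the automaton. The code below is only an illustration of the policy: it assumes Python's re.fullmatch as the matcher and tries prefixes from longest to shortest, whereas a Lex-generated analyzer would simulate the combined NFA or DFA. The names RULES, next_lexeme and scan are hypothetical.

```python
import re

# (action, pattern) pairs in the order they appear in the Lex program.
RULES = [("A1", "a"), ("A2", "abb"), ("A3", "a*b+")]

def next_lexeme(text, pos):
    """Return (action, lexeme) for the longest prefix of text[pos:] matching a rule."""
    for end in range(len(text), pos, -1):        # longest candidate first
        for action, pattern in RULES:            # earlier rules win ties
            if re.fullmatch(pattern, text[pos:end]):
                return action, text[pos:end]
    return None

def scan(text):
    """Split text into lexemes, pairing each with the action that fires."""
    pos, out = 0, []
    while pos < len(text):
        hit = next_lexeme(text, pos)
        if hit is None:
            raise ValueError(f"no pattern matches at position {pos}")
        out.append(hit)
        pos += len(hit[1])
    return out
```

On input aaba the longest matching prefix is aab (pattern a*b+), after which a matches the remaining a. On input abb both abb and a*b+ match the whole input, and the earlier pattern abb is chosen.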


UNIT III SYNTAX ANALYSIS
3.1 NEED AND ROLE OF THE PARSER
The parser takes the tokens produced by lexical analysis and builds the syntax tree (parse tree). The syntax tree can be constructed easily from a context-free grammar.
The parser reports syntax errors in an intelligible fashion and recovers from commonly occurring errors to continue processing the remainder of the program.
[Figure: source program → Lexical Analyzer → (token / get next token) → Parser → parse tree → Rest of Front End → intermediate representation; the Symbol Table is shared by these phases]

Figure 3.1: Position of parser in compiler model
Role of the Parser:

The parser builds the parse tree.
The parser performs context-free syntax analysis.
The parser helps to construct intermediate code.
The parser produces appropriate error messages.
The parser attempts to correct a few errors.
Types of parsers for grammars:
Universal parsers
Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are too inefficient to use in production, so they are not commonly used in compilers.
Top-down parsers
Top-down methods build parse trees from the top (root) to the bottom (leaves).
Bottom-up parsers
Bottom-up methods start from the leaves and work their way up to the root.

3.2 CONTEXT FREE GRAMMARS
3.2.1 The Formal Definition of a Context-Free Grammar
A context-free grammar G is defined by the 4-tuple G = (V, T, P, S) where
1. V is a finite set of nonterminals (variables).
2. T is a finite set of terminals.
3. P is a finite set of production rules of the form A → α, where A is a nonterminal and α is a string of terminals and/or nonterminals. P is a relation from V to (V ∪ T)*.
4. S is the start symbol (a variable S ∈ V).


Example 3.1: The following grammar defines simple arithmetic expressions. In this grammar, the terminal symbols are id + - * / ( ). The nonterminal symbols are expression, term and factor, and expression is the start symbol.
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
3.2.2 Notational Conventions
The following notational conventions for grammars can be used.
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent nonterminals for the constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either nonterminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk. Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
Example 3.2: Using these conventions, the grammar of Example 3.1 can be rewritten concisely as
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id


3.2.3 Derivations
A derivation uses productions to generate a string (a sequence of terminals). Each step of a derivation replaces one nonterminal in the current string by the right-hand side of one of its productions.
Derivations are classified into two types based on the order in which nonterminals are replaced. They are:
1. Leftmost derivation
If the leftmost nonterminal is replaced at every step of the derivation, then it is called a leftmost derivation (LMD).
2. Rightmost derivation
If the rightmost nonterminal is replaced at every step of the derivation, then it is called a rightmost derivation (RMD).
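Example 3.3: LMD and RMD for the string -(id+id), using the productions E → -E, E → (E), E → E + E, and E → id.
LMD for -(id+id):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
RMD for -(id+id):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)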

Example 3.4: Consider the context-free grammar (CFG) G = ({S}, {a, b, c}, P, S) where P = {S → SbS | ScS | a}. Derive the string abaca by leftmost derivation and by rightmost derivation.
Leftmost derivation for abaca:
S ⇒ SbS
  ⇒ abS      (using rule S → a)
  ⇒ abScS    (using rule S → ScS)
  ⇒ abacS    (using rule S → a)
  ⇒ abaca    (using rule S → a)
Rightmost derivation for abaca:
S ⇒ ScS
  ⇒ Sca      (using rule S → a)
  ⇒ SbSca    (using rule S → SbS)
  ⇒ Sbaca    (using rule S → a)
  ⇒ abaca    (using rule S → a)

3.2.4 Parse Trees and Derivations
A parse tree is a graphical representation of a derivation. It is convenient for seeing how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.

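Example 3.5: Construction of the parse tree for -(id+id)
Derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Parse tree:
[Figure: sequence of parse trees, one for each step of the derivation, ending with the tree whose leaves read - ( id + id )]

Figure 3.2: Parse tree for -(id+id)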

3.2.5 Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
A grammar G is said to be ambiguous if it has more than one parse tree, either in LMD or in RMD, for at least one string.
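Example 3.6: The arithmetic expression grammar (3.3) permits two distinct leftmost derivations for the sentence id+id*id:
E ⇒ E + E                      E ⇒ E * E
  ⇒ id + E                       ⇒ E + E * E
  ⇒ id + E * E                   ⇒ id + E * E
  ⇒ id + id * E                  ⇒ id + id * E
  ⇒ id + id * id                 ⇒ id + id * id

[Figure: two parse trees, one with + at the root and id * id as its right subtree, the other with * at the root and id + id as its left subtree]

Figure 3.3: Two parse trees for id+id*id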
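Ambiguity can be checked mechanically for a small grammar by counting parse trees. The sketch below assumes only the productions E → E + E | E * E | id (parentheses and unary minus are left out for brevity), and the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of trees deriving tokens[i:j] from E."""
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):            # k is the position of the root operator
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For the token sequence id + id * id the count is 2, one tree for each choice of root operator, which confirms that this grammar is ambiguous.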


3.2.6 Verifying the Language Generated by a Grammar
A proof that a grammar G generates a language L has two parts: show that every string generated by G is in L, and conversely that every string in L can indeed be generated by G.
Example 3.7: Consider the grammar S → ( S ) S | ε. This simple grammar generates all strings of balanced parentheses. To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps n in a derivation.
BASIS: The basis is n = 1. The only string of terminals derivable from S in one step is the empty string, which surely is balanced.
INDUCTION: Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps. Such a derivation must be of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
The derivations of x and y from S take fewer than n steps, so by the inductive hypothesis x and y are balanced. Therefore, the string (x)y must be balanced. That is, it has an equal number of left and right parentheses, and every prefix has at least as many left parentheses as right.
Having thus shown that any string derivable from S is balanced, we must next show that every balanced string is derivable from S. To do so, use induction on the length of a string.
BASIS: If the string is of length 0, it must be ε, which is balanced and is derivable from S by the production S → ε.
INDUCTION: First, observe that every balanced string has even length. Assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Let (x) be the shortest nonempty prefix of w having an equal number of left and right parentheses. Then w can be written as w = (x)y where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus, we can find a derivation of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
proving that w = (x)y is also derivable from S.
3.2.7 Context-Free Grammars versus Regular Expressions
Every regular language is a context-free language, but not vice versa.
Example 3.8: The regular expression (a|b)*abb and the grammar
A → aA | bA | aB
B → bC
C → b
describe the same language, the set of strings of a's and b's ending in abb. So we can easily describe such languages either by finite automata or by a PDA.
On the other hand, the language L = {a^n b^n | n ≥ 1}, with an equal number of a's and b's, is a prototypical example of a language that can be described by a grammar but not by a regular expression. We can say that "finite automata cannot count," meaning that a finite automaton cannot accept a language like {a^n b^n | n ≥ 1} that would require it to keep count of the number of a's before it sees the b's. Languages of this kind (context-free languages) are accepted by a PDA, because a PDA uses a stack as its memory.


3.2.8 Left recursion
A context-free grammar is said to be left recursive if it has a nonterminal A with two productions of the following form:
A → Aα | β
where α and β are sequences of terminals and nonterminals that do not start with A.
Left recursion can cause a top-down parser to enter an infinite loop. This is a serious problem, so we have to avoid left recursion.
For example, expr → expr + term | term is left recursive.

Figure 3.4: Left-recursive and right-recursive ways of generating a string
ALGORITHM 3.1: Eliminating left recursion.
INPUT: Grammar G with no cycles or ε-productions.
OUTPUT: An equivalent grammar with no left recursion.
METHOD: Apply the following algorithm to G. Note that the resulting non-left-recursive grammar may have ε-productions.

    arrange the nonterminals in some order A1, A2, ..., An.
    for (each i from 1 to n) {
        for (each j from 1 to i-1) {
            replace each production of the form Ai → Aj γ by the
            productions Ai → δ1 γ | δ2 γ | ... | δk γ, where
            Aj → δ1 | δ2 | ... | δk are all current Aj-productions
        }
        eliminate the immediate left recursion among the Ai-productions
    }

Note: To eliminate immediate left recursion, simply replace the left-recursive production A → Aα | β by
A → βA'
A' → αA' | ε

Example 3.9: Consider the grammar for arithmetic expressions.
E → E + T | T
T → T * F | F
F → ( E ) | id


Eliminate the left-recursive productions for E and T by applying the left-recursion eliminating algorithm.
If A → Aα | β then
A → βA'
A' → αA' | ε
The production E → E + T | T is replaced by
E → TE'
E' → +TE' | ε
The production T → T * F | F is replaced by
T → FT'
T' → *FT' | ε
Therefore, finally we obtain
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id
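The rewriting rule A → Aα | β into A → βA', A' → αA' | ε is mechanical, so it can be sketched as a small function. The representation is an assumption made for illustration: each production body is a list of grammar symbols, the empty list [] stands for ε, and the new nonterminal is named by appending a prime. The function name is hypothetical.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite head -> head α | β as head -> β head', head' -> α head' | ε.

    bodies is a list of symbol lists; [] stands for the empty string ε.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]    # the α parts
    betas = [b for b in bodies if not b or b[0] != head]      # the β parts
    if not alphas:
        return {head: bodies}                                 # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }
```

Applied to E → E + T | T, the function returns E → T E' and E' → + T E' | ε, in agreement with the result above. It handles only immediate left recursion; the outer loops of Algorithm 3.1 are not included.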

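Example 3.10: Consider the following grammar and eliminate its left recursion.
S → Aa | b
A → Ac | Sd | ε
The nonterminal S is left recursive (S ⇒ Aa ⇒ Sda), but it is not immediately left recursive. To expose the left recursion, substitute the S-productions into A → Sd:
A → Ac | Aad | bd | ε
Eliminating the immediate left recursion, A → Ac | Aad | bd | ε is replaced by
A → bdA' | A'
A' → cA' | adA' | ε
Therefore, finally we obtain the grammar without left recursion:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε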
Example 3.11: Consider the grammar
A → ABd | Aa | a
B → Be | b
The grammar without left recursion is
A → aA'
A' → BdA' | aA' | ε
B → bB'
B' → eB' | ε

Example 3.12: Eliminate left recursion from the given grammar.
A → Ac | Aad | bd | bc
After removing left recursion, the grammar becomes
A → bdA' | bcA'
A' → cA' | adA' | ε


3.2.9 Left factoring
Left factoring is the process of factoring out the common prefixes of two or more production alternatives for the same nonterminal.

Algorithm 3.2: Left factoring a grammar.
INPUT: Grammar G.
OUTPUT: An equivalent left-factored grammar.
METHOD: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a nontrivial common prefix, replace all of the A-productions A → αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives that do not begin with α, by
A → αA' | γ
A' → β1 | β2 | ... | βn
Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.

Example 3.13: Left factor the given grammar.
S → T + S | T
After left factoring, the grammar becomes
S → TL
L → +S | ε

Example 3.14: Left factor the following grammar.
S → iEtS | iEtSeS | a
E → b
After left factoring, the grammar becomes
S → iEtSS' | a
S' → eS | ε
E → b
Uses:
Left factoring is used in the predictive top-down parsing technique.
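One round of the left-factoring transformation can be sketched as follows. As before, the representation is an assumption: bodies are lists of grammar symbols and [] stands for ε. The function names and the new_name parameter (the name chosen for the new nonterminal) are hypothetical.

```python
def common_prefix(x, y):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]

def left_factor_once(head, bodies, new_name):
    """Factor out the longest prefix shared by two or more alternatives of head."""
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            p = common_prefix(bodies[i], bodies[j])
            if len(p) > len(best):
                best = p
    if not best:
        return {head: bodies}                        # no nontrivial common prefix
    n = len(best)
    factored = [b[n:] for b in bodies if b[:n] == best]   # the β parts
    others = [b for b in bodies if b[:n] != best]         # the γ alternatives
    return {head: [best + [new_name]] + others, new_name: factored}
```

For S → iEtS | iEtSeS | a the common prefix is i E t S, so the result is S → i E t S S' | a and S' → ε | e S, the same grammar as in Example 3.14 up to the order of alternatives. The full algorithm repeats this step until no two alternatives share a prefix.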


3.3 TOP DOWN PARSING - GENERAL STRATEGIES
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first). Top-down parsing can also be viewed as finding a leftmost derivation for an input string.
Parsers are generally distinguished by whether they work top-down (start with the grammar's start symbol and construct the parse tree from the top) or bottom-up (start with the terminal symbols that form the leaves of the parse tree and build the tree from the bottom). Top-down parsers include recursive-descent and LL parsers, while the most common forms of bottom-up parsers are LR parsers.
[Figure: tree classifying the types of parser. Top-down parsers: backtracking, predictive parser, recursive descent, LL(1) parser. Bottom-up parsers: shift-reduce parser, LR parser (SLR parser, LALR parser, canonical LR parser).]

Figure 3.5: Types of parser
Example 3.15: The sequence of parse trees for the input id+id*id in a top-down parse (LMD), using the grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

Figure 3.6: Top-down parse for id+id*id


3.4 RECURSIVE DESCENT PARSER
These parsers use a procedure for each nonterminal. The procedure looks at its input and decides which production to apply for its nonterminal. Terminals in the body of the production are matched to the input at the appropriate time, while nonterminals in the body result in calls to their procedures. Backtracking, in the case when the wrong production was chosen, is a possibility.

    void A()
    {
        Choose an A-production, A → X1 X2 ... Xk;
        for (i = 1 to k)
        {
            if (Xi is a nonterminal)
                call procedure Xi();
            else if (Xi equals the current input symbol a)
                advance the input to the next symbol;
            else /* an error has occurred */;
        }
    }
Example 3.16: Consider the grammar
S → cAd
A → ab | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S, and the input pointer pointing to c, the first symbol of w. S has only one production, so we use it to expand S and obtain the tree of Figure 3.7(a). The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Now, we expand A using the first alternative A → ab to obtain the tree of Figure 3.7(b). We have a match for the second input symbol, a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried, but that might produce a match.
[Figure 3.7: Steps in a top-down parse. (a) S with children c, A, d; (b) A expanded to a b; (c) A expanded to a.]
The second alternative for A produces the tree of Figure 3.7(c). The leaf a matches the second symbol of w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
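The backtracking behaviour of Example 3.16 can be sketched in Python as follows. This is a minimal illustration rather than a production parser: the names `GRAMMAR`, `parse`, `parse_body` and `accepts` are hypothetical, generators are used to model "try the next alternative", and the sketch assumes the grammar is not left recursive (otherwise it would recurse forever).

```python
# Grammar of Example 3.16: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after deriving `symbol` from text[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the current input symbol.
        if pos < len(text) and text[pos] == symbol:
            yield pos + 1
        return
    # Nonterminal: try each alternative in order; a failed alternative
    # simply yields nothing, which is the backtracking step.
    for body in GRAMMAR[symbol]:
        yield from parse_body(body, 0, text, pos)


def parse_body(body, i, text, pos):
    """Match body[i:] against text[pos:], yielding the positions reached."""
    if i == len(body):
        yield pos
        return
    for nxt in parse(body[i], text, pos):
        yield from parse_body(body, i + 1, text, nxt)


def accepts(text, start="S"):
    """True if the whole of `text` can be derived from the start symbol."""
    return any(end == len(text) for end in parse(start, text, 0))
```

On the input cad, the alternative A → ab fails when b is compared with d, and the generator then falls through to A → a, mirroring Figures 3.7(b) and 3.7(c).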


3.5 PREDICTIVE PARSER (NON-RECURSIVE)
A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that

S ⇒* wα (by a leftmost derivation)

The table-driven parser in Figure 3.8 has an input buffer, a stack containing a sequence of grammar symbols, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by the endmarker $. We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar on top of $.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting entry M[X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and the current input symbol a.
[Figure 3.8: Model of a table-driven predictive parser. Components: input buffer, stack, predictive parsing program, parsing table M, and output.]
Algorithm 3.3: Table-driven predictive parsing.
INPUT: A string w and a parsing table M for grammar G.
OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The following procedure uses the predictive parsing table M to produce a predictive parse for the input.
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X ≠ $) { /* stack is not empty; a is the symbol pointed to by ip */
    if (X = a) pop the stack and advance ip;
    else if (X is a terminal) error();
    else if (M[X, a] is an error entry) error();
    else if (M[X, a] = X → Y1 Y2 … Yk) {
        output the production X → Y1 Y2 … Yk;
        pop the stack;
        push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}


Example 3.17: Consider the following grammar and the input id + id * id, parsed using the nonrecursive predictive parser.
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

[Figure 3.9: Moves made by a predictive parser on input id + id * id]
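The moves of the table-driven parser on id + id * id can be reproduced with a short Python sketch of Algorithm 3.3. The parsing table `M` below was filled in by hand for the grammar of Example 3.17 and is an assumption of this sketch, as are the names `predictive_parse`, `NONTERMINALS` and `EPS`; an empty tuple stands for an ε-body.

```python
EPS = ()  # an empty body stands for an epsilon production

# Hand-filled predictive parsing table M[X, a] for the grammar of Example 3.17.
M = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of stack is the end of the list
    ip = 0
    output = []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[ip]
        if X == a:                            # terminal on top matches the input
            stack.pop()
            ip += 1
        elif X not in NONTERMINALS:           # terminal mismatch
            raise SyntaxError(f"expected {X!r}, found {a!r}")
        elif (X, a) not in M:                 # error entry of the table
            raise SyntaxError(f"no entry M[{X}, {a}]")
        else:
            body = M[(X, a)]
            output.append((X, body))
            stack.pop()
            stack.extend(reversed(body))      # leftmost body symbol ends up on top
    if tokens[ip] != "$":
        raise SyntaxError("input remains after the stack was emptied")
    return output


derivation = predictive_parse(["id", "+", "id", "*", "id"])
```

The returned list contains the productions in the order in which they are output, i.e. the order of a leftmost derivation.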
3.6 LL(1) PARSER
A grammar such that it is possible to choose the correct production with which to expand a given nonterminal, looking only at the next input symbol, is called LL(1). These grammars allow us to construct a predictive parsing table that gives, for each nonterminal and each lookahead symbol, the correct choice of production. Error correction can be facilitated by placing error routines in some or all of the table entries that have no legitimate production.
LL(1) Grammars
Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.


Transition Diagrams for Predictive Parsers
Transition diagrams are useful for visualizing predictive parsers. To construct the transition diagram from a grammar, first eliminate left recursion and then left factor the grammar. Then, for each nonterminal A,
1. Create an initial and final (return) state.
2. For each production A → X1 X2 … Xk, create a path from the initial to the final state, with edges labeled X1, X2, …, Xk. If A → ε, the path is an edge labeled ε.
A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).
Predictive parsers can be constructed for LL(1) grammars since the proper production to apply for a nonterminal can be selected by looking only at the current input symbol. Flow-of-control constructs, with their distinguishing keywords, generally satisfy the LL(1) constraints. For instance,
stmt → if ( expr ) stmt else stmt | while ( expr ) stmt | { stmt_list }
For the above productions, the keywords if and while and the symbol { tell us which alternative is the only one that could possibly succeed.
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X → Y1 Y2 … Yk is a production for some k ≥ 1, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. (If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.)
3. If X → ε is a production, then add ε to FIRST(X).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol, and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
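Both sets of rules are fixed-point computations, which the following Python sketch tries to make explicit. The function names (`first_of_string`, `compute_first`, `compute_follow`) and the dictionary representation of the grammar are assumptions of this sketch; an empty list stands for an ε-body, and the grammar used is the left-factored one from Example 3.14.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each single symbol."""
    result = set()
    for X in symbols:
        result |= first[X] - {EPS}
        if EPS not in first[X]:
            return result
    result.add(EPS)          # every symbol can vanish (or the sequence is empty)
    return result


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}              # rule 1
    first.update({A: set() for A in grammar})
    changed = True
    while changed:                                    # repeat until nothing is added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:                       # rules 2 and 3
                new = first_of_string(body, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                            # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:              # only nonterminals have FOLLOW
                        continue
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}                # rule 2
                    if EPS in tail:                   # rule 3
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
FIRST = compute_first(GRAMMAR, {"i", "t", "e", "a", "b"})
FOLLOW = compute_follow(GRAMMAR, FIRST, "S")
```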
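Example 3.18: Construct the predictive parsing table for the grammar:
S → iEtSS' | a
S' → eS | ε
E → b
An LL(1) grammar is never left recursive, so we go straight to finding FIRST() and FOLLOW().

FIRST():   FIRST(S) = {i, a},    FIRST(S') = {e, ε},    FIRST(E) = {b}
FOLLOW():  FOLLOW(S) = {e, $},   FOLLOW(S') = {e, $},   FOLLOW(E) = {t}

Predictive parsing table (a dash marks an error entry):

NON-TERMINAL   INPUT SYMBOL
               a        b        e                  i              t    $
S              S → a    -        -                  S → iEtSS'     -    -
S'             -        -        S' → eS, S' → ε    -              -    S' → ε
E              -        E → b    -                  -              -    -

The entry M[S', e] contains both S' → eS and S' → ε. The grammar is ambiguous (the dangling-else problem), so strictly speaking it is not LL(1); the conflict is usually resolved by choosing S' → eS.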
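Filling the table is mechanical once FIRST and FOLLOW are known: a production A → α goes into M[A, a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), also for every symbol in FOLLOW(A). The sketch below applies this to Example 3.18. The FIRST set of each body and the FOLLOW sets are entered by hand, and the names `PRODUCTIONS`, `build_table` and `CONFLICTS` are hypothetical.

```python
EPS = "ε"

# (head, body, FIRST(body)) for the grammar of Example 3.18, entered by hand.
PRODUCTIONS = [
    ("S", "iEtSS'", {"i"}),
    ("S", "a", {"a"}),
    ("S'", "eS", {"e"}),
    ("S'", EPS, {EPS}),
    ("E", "b", {"b"}),
]
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def build_table(productions, follow):
    """Map (nonterminal, lookahead) to the list of bodies placed in that entry."""
    table = {}
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:                 # body can vanish: use FOLLOW(head) too
            lookaheads |= follow[head]
        for a in lookaheads:
            table.setdefault((head, a), []).append(body)
    return table


TABLE = build_table(PRODUCTIONS, FOLLOW)
# Entries holding more than one production show that the grammar is not LL(1).
CONFLICTS = {entry: bodies for entry, bodies in TABLE.items() if len(bodies) > 1}
```

For this grammar the only multiply defined entry is M[S', e], which is the dangling-else conflict.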

3.7 SHIFT REDUCE PARSER
Bottom-up parsers generally operate by choosing, on the basis of the next input symbol (lookahead symbol) and the contents of the stack, whether to shift the next input onto the stack, or to reduce some symbols at the top of the stack. A reduce step takes a production body at the top of the stack and replaces it by the head of the production.
Example 3.19: Consider the production rules for the shift-reduce parser on input id * id.
E → E + T | T
T → T * F | F
F → ( E ) | id

STACK          INPUT          ACTION
$              id1 * id2 $    shift
$ id1          * id2 $        reduce by F → id
$ F            * id2 $        reduce by T → F
$ T            * id2 $        shift
$ T *          id2 $          shift
$ T * id2      $              reduce by F → id
$ T * F        $              reduce by T → T * F
$ T            $              reduce by E → T
$ E            $              accept

The table shows the actions of a shift-reduce parser on input id * id. The stack holds grammar symbols, with the bottom-of-stack marker $ at the left; in the first line the stack holds only $. An LR parser performs the same moves but keeps states of the LR(0) automaton on the stack, each state standing for the grammar symbol shown here.


3.8 LR PARSER
A schematic of an LR parser is shown in Figure 3.10. It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program reads characters from an input buffer one at a time. Where a shift-reduce parser would shift a symbol, an LR parser shifts a state. Each state summarizes the information contained in the stack below it.
[Figure 3.10: Model of an LR parser. Components: input a1 … ai … an $, stack of states sm, sm-1, … (sm on top), LR parsing program, ACTION and GOTO tables, and output.]
The stack holds a sequence of states, s0 s1 … sm, where sm is on top. In the SLR method, the stack holds states from the LR(0) automaton; the canonical LR and LALR methods are similar.
Structure of the LR Parsing Table
The parsing table consists of two parts: a parsing-action function ACTION and a goto function GOTO.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input endmarker). The value of ACTION[i, a] can have one of four forms:
(a) Shift j, where j is a state. The action taken by the parser effectively shifts input a to the stack, but uses state j to represent a.
(b) Reduce A → β. The action of the parser effectively reduces β on the top of the stack to head A.
(c) Accept. The parser accepts the input and finishes parsing.
(d) Error. The parser discovers an error in its input and takes some corrective action.
2. We extend the GOTO function, defined on sets of items, to states: if GOTO[Ii, A] = Ij, then GOTO also maps a state i and a nonterminal A to state j.
Algorithm 3.4: LR parsing algorithm.
INPUT: An input string w and an LR parsing table with functions ACTION and GOTO for a grammar G.
OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
let a be the first symbol of w$;
while (1)
{ /* repeat forever */
    let s be the state on top of the stack;
    if (ACTION[s, a] = shift t)
    {
        push t onto the stack;
        let a be the next input symbol;
    }
    else if (ACTION[s, a] = reduce A → β)
    {
        pop |β| symbols off the stack;
        let state t now be on top of the stack;
        push GOTO[t, A] onto the stack;
        output the production A → β;
    }
    else if (ACTION[s, a] = accept) break; /* parsing is done */
    else call error-recovery routine;
}
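Example 3.20: Figure 3.11 shows the ACTION and GOTO functions of an LR parsing table for the expression grammar
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
In the table, si means shift and go to state i, rj means reduce by production number j, and a dash marks an error entry.

STATE   ACTION                                    GOTO
        id    +     *     (     )     $           E    T    F
0       s5    -     -     s4    -     -           1    2    3
1       -     s6    -     -     -     accept      -    -    -
2       -     r2    s7    -     r2    r2          -    -    -
3       -     r4    r4    -     r4    r4          -    -    -
4       s5    -     -     s4    -     -           8    2    3
5       -     r6    r6    -     r6    r6          -    -    -
6       s5    -     -     s4    -     -           -    9    3
7       s5    -     -     s4    -     -           -    -    10
8       -     s6    -     -     s11   -           -    -    -
9       -     r1    s7    -     r1    r1          -    -    -
10      -     r3    r3    -     r3    r3          -    -    -
11      -     r5    r5    -     r5    r5          -    -    -

Figure 3.11: Parsing table for expression grammar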
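The driver of Algorithm 3.4 is small enough to sketch in Python. The sketch below hard-codes the ACTION and GOTO entries of the standard SLR table for the expression grammar (productions numbered 1 to 6 as E → E + T, E → T, T → T * F, T → F, F → (E), F → id); the encoding of entries as strings such as "s5" or "r2" and the names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are assumptions made for illustration.

```python
# Production number -> (head, length of body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def lr_parse(tokens):
    """Return the list of production numbers used in reductions, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]                       # stack of states, initial state 0
    ip = 0
    reductions = []
    while True:
        s, a = stack[-1], tokens[ip]
        act = ACTION[s].get(a)
        if act is None:               # blank table entry: syntax error
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act == "acc":
            return reductions
        if act[0] == "s":             # shift: push the target state
            stack.append(int(act[1:]))
            ip += 1
        else:                         # reduce by production n
            n = int(act[1:])
            head, length = PRODUCTIONS[n]
            del stack[-length:]       # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(n)


trace = lr_parse(["id", "*", "id", "+", "id"])
```

Only the tables are grammar specific; the same `lr_parse` loop would serve SLR, LALR or canonical LR tables.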


3.9 LR(0) ITEM
An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse. States represent sets of "items". An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the body. Thus, production A → XYZ yields the four items
A → ·XYZ
A → X·YZ
A → XY·Z
A → XYZ·
The production A → ε generates only one item, A → ·.

An item indicates how much of a production we have seen at a given point in the parsing process.
For example, the item A → ·XYZ indicates that we hope to see a string derivable from XYZ next on the input.
Item A → X·YZ indicates that we have just seen on the input a string derivable from X and that we hope next to see a string derivable from YZ.
Item A → XYZ· indicates that we have seen the body XYZ and that it may be time to reduce XYZ to A.
One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing a deterministic finite automaton that is used to make parsing decisions. Such an automaton is called an LR(0) automaton.
To construct the canonical LR(0) collection for a grammar, we define an augmented grammar and two functions, CLOSURE and GOTO. If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and production S' → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs only when the parser is about to reduce by S' → S.

Closure of Item Sets
If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α·Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → ·γ to CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to CLOSURE(I).
Intuitively, A → α·Bβ in CLOSURE(I) indicates that, at some point in the parsing process, we think we might next see a substring derivable from Bβ as input. The substring derivable from Bβ will have a prefix derivable from B by applying one of the B-productions. We therefore add items for all the B-productions; that is, if B → γ is a production, we also include B → ·γ in CLOSURE(I).
A convenient way to implement the function closure is to keep a boolean array added, indexed by the nonterminals of G, such that added[B] is set to true if and when we add the item B → ·γ for each B-production B → γ.
We can divide all the sets of items of interest into two classes:
1. Kernel items: the initial item, S' → ·S, and all items whose dots are not at the left end.
2. Nonkernel items: all items with their dots at the left end, except for S' → ·S.


SetOfItems CLOSURE(I)
{
    J = I;
    repeat
        for (each item A → α·Bβ in J)
            for (each production B → γ of G)
                if (B → ·γ is not in J)
                    add B → ·γ to J;
    until no more items are added to J on one round;
    return J;
}
Figure 3.32: Computation of CLOSURE
Example 3.21: Consider the augmented expression grammar:
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
If I is the set of one item {[E' → ·E]}, then CLOSURE(I) contains the set of items I0 listed here:
E' → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·( E )
F → ·id
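CLOSURE, together with the GOTO function used to move the dot over a symbol, can be sketched in Python as below. In this illustration an item is a pair (production index, dot position), which is an assumed representation, and the names `GRAMMAR`, `closure`, `goto` and `canonical_collection` are hypothetical. The grammar is the augmented expression grammar of Example 3.21.

```python
GRAMMAR = [
    ("E'", ("E",)),            # 0: augmented start production
    ("E", ("E", "+", "T")),    # 1
    ("E", ("T",)),             # 2
    ("T", ("T", "*", "F")),    # 3
    ("T", ("F",)),             # 4
    ("F", ("(", "E", ")")),    # 5
    ("F", ("id",)),            # 6
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Close a set of LR(0) items, each item being (production index, dot position)."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        body = GRAMMAR[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # Dot stands before nonterminal B: add B -> .gamma for every B-production.
            for q, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, X):
    """Move the dot over X in every item that allows it, then take the closure."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == X}
    return closure(moved)


def canonical_collection():
    """Build the canonical collection of sets of LR(0) items."""
    symbols = sorted({s for _, body in GRAMMAR for s in body})
    collection = [closure({(0, 0)})]
    for I in collection:               # the list grows while it is being scanned
        for X in symbols:
            J = goto(I, X)
            if J and J not in collection:
                collection.append(J)
    return collection


I0 = closure({(0, 0)})
```

The order in which the sets are discovered depends on the order in which symbols are tried, so the numbering of states may differ from that used in a hand construction.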
3.10 CONSTRUCTION OF SLR PARSING TABLE
The SLR method for constructing parsing tables is a good starting point for studying LR parsing. The parsing table constructed by this method is called an SLR table, and an LR parser using an SLR parsing table is called an SLR parser. The other two methods augment the SLR method with lookahead information. The SLR method begins with LR(0) items and LR(0) automata.
Given a grammar G, we augment G to produce G', with a new start symbol S'. From G', we construct C, the canonical collection of sets of items for G', together with the GOTO function.
Algorithm 3.5: Constructing an SLR parsing table.
INPUT: An augmented grammar G'.
OUTPUT: The SLR parsing table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α·aβ] is in Ii, and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j". Here a must be a terminal.
(b) If [A → α·] is in Ii, then set ACTION[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may not be S'.
(c) If [S' → S·] is in Ii, then set ACTION[i, $] to "accept".
If any conflicting actions result from the above rules, we say the grammar is not SLR(1). The algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all nonterminals A using the rule: If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → ·S].
An LR parser using the SLR(1) table for G is called the SLR(1) parser for G, and a grammar having an SLR(1) parsing table is said to be SLR(1). We usually omit the "(1)" after the "SLR", since we shall not deal here with parsers having more than one symbol of lookahead.
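Example 3.22: Let us construct the SLR table for the augmented expression grammar, starting from the canonical collection of sets of LR(0) items for the following grammar.
E → E + T | T
T → T * F | F
F → ( E ) | id
Step 1: Augment the grammar with the E' production and number the productions
E' → E          (Accept, r0)
E → E + T       (r1)
E → T           (r2)
T → T * F       (r3)
T → F           (r4)
F → ( E )       (r5)
F → id          (r6)

Step 2: Closure of E'
The set of items I0:
E' → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·( E )
F → ·id
Step 3: GOTO operation of every symbol on I0 items, and on the resulting sets:
Goto(I0, E): I1
E' → E·
E → E·+ T

Goto(I0, T): I2
E → T·
T → T·* F

Goto(I0, F): I3
T → F·

Goto(I0, ( ): I4
F → (·E )
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·( E )
F → ·id

Goto(I0, id): I5
F → id·

Goto(I1, +): I6
E → E +·T
T → ·T * F
T → ·F
F → ·( E )
F → ·id

Goto(I2, *): I7
T → T *·F
F → ·( E )
F → ·id

Goto(I4, E): I8
F → ( E·)
E → E·+ T

Goto(I6, T): I9
E → E + T·
T → T·* F

Goto(I7, F): I10
T → T * F·

Goto(I8, ) ): I11
F → ( E )·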
Step 4: Construction of DFA

Figure 3.12: LR(0) automaton (DFA) with every state as final and I0 as initial.
Step 5: Construction of FOLLOW sets for nonterminals
FOLLOW(E') = {$} because E' is the start symbol.
FOLLOW(E):
E' → E, i.e., E ends the body of the start production, so add everything in FOLLOW(E'), namely $.
E → E + T, i.e., E is followed by +, so add +.
F → ( E ), i.e., E is followed by ), so add ).
FOLLOW(E) = {+, ), $}
FOLLOW(T):
E → T and E → E + T, i.e., T ends a body of E, so add everything in FOLLOW(E): +, ) and $.
T → T * F, i.e., T is followed by *, so add *.
FOLLOW(T) = {+, *, ), $}
FOLLOW(F):
T → F and T → T * F, i.e., F ends a body of T, so add everything in FOLLOW(T): +, *, ) and $.
FOLLOW(F) = {+, *, ), $}
Step 6: Construction
The SLR parsing table is constructed using Algorithm 3.5, Constructing an SLR parsing table.
Step 7: Table filling
First consider the set of items I0:
The item F → ·( E ) gives rise to the entry ACTION[0, ( ] = shift 4, since Goto(I0, ( ) = I4.
The item F → ·id gives rise to the entry ACTION[0, id] = shift 5, since Goto(I0, id) = I5.
Other items in I0 yield no actions.
Now consider I1: E' → E· and E → E·+ T
The first item yields ACTION[1, $] = accept, and the second yields ACTION[1, +] = shift 6.
Next consider I2: E → T· and T → T·* F
Since FOLLOW(E) = {$, +, )}, the first item makes
ACTION[2, $] = ACTION[2, +] = ACTION[2, )] = reduce E → T
The second item makes ACTION[2, *] = shift 7. And so on.
STATE   ACTION                                    GOTO
        id    +     *     (     )     $           E    T    F
0       s5    -     -     s4    -     -           1    2    3
1       -     s6    -     -     -     accept      -    -    -
2       -     r2    s7    -     r2    r2          -    -    -
3       -     r4    r4    -     r4    r4          -    -    -
4       s5    -     -     s4    -     -           8    2    3
5       -     r6    r6    -     r6    r6          -    -    -
6       s5    -     -     s4    -     -           -    9    3
7       s5    -     -     s4    -     -           -    -    10
8       -     s6    -     -     s11   -           -    -    -
9       -     r1    s7    -     r1    r1          -    -    -
10      -     r3    r3    -     r3    r3          -    -    -
11      -     r5    r5    -     r5    r5          -    -    -

Step 8: Input parsing
LINE   STACK        SYMBOLS      INPUT              ACTION
(1)    0            $            id * id + id $     shift
(2)    0 5          $ id         * id + id $        reduce by F → id
(3)    0 3          $ F          * id + id $        reduce by T → F
(4)    0 2          $ T          * id + id $        shift
(5)    0 2 7        $ T *        id + id $          shift
(6)    0 2 7 5      $ T * id     + id $             reduce by F → id
(7)    0 2 7 10     $ T * F      + id $             reduce by T → T * F
(8)    0 2          $ T          + id $             reduce by E → T
(9)    0 1          $ E          + id $             shift
(10)   0 1 6        $ E +        id $               shift
(11)   0 1 6 5      $ E + id     $                  reduce by F → id
(12)   0 1 6 3      $ E + F      $                  reduce by T → F
(13)   0 1 6 9      $ E + T      $                  reduce by E → E + T
(14)   0 1          $ E          $                  accept

At line (1), the stack holds the start state 0 of the automaton; the corresponding symbol is the bottom-of-stack marker $. The next input symbol is id and state 0 has a transition on id to state 5. We therefore shift. At line (2), state 5 (symbol id) has been pushed onto the stack. There is no transition from state 5 on input *, so we reduce. From item [F → id·] in state 5, the reduction is by production F → id.
With symbols, a reduction is implemented by popping the body of the production from the stack (on line (2), the body is id) and pushing the head of the production (in this case, F). With states, we pop state 5 for symbol id, which brings state 0 to the top, and look for a transition on F, the head of the production.
3.11 INTRODUCTION TO LALR PARSER
The LALR (lookahead-LR) technique is often used in practice, because the tables obtained by it are considerably smaller than the canonical LR tables.
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred states for a language like C. The canonical LR table would typically have several thousand states for the same size language. Thus, it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables.
Algorithm 3.6: An easy, but space-consuming LALR table construction.
INPUT: An augmented grammar G'.
OUTPUT: The LALR parsing table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.
3. Let C' = {J0, J1, …, Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji. If there is a parsing action conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).


4. The GOTO table is constructed as follows. If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ … ∪ Ik, then the cores of GOTO(I1, X), GOTO(I2, X), …, GOTO(Ik, X) are the same, since I1, I2, …, Ik all have the same core. Let K be the union of all sets of items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
Algorithm 3.7: Construction of the sets of LR(1) items.
INPUT: An augmented grammar G'.
OUTPUT: The sets of LR(1) items that are the set of items valid for one or more viable prefixes of G'.

METHOD: The procedures CLOSURE and GOTO and the main routine items for constructing the sets of items are given below.
SetOfItems CLOSURE(I)
{
    repeat
        for (each item [A → α·Bβ, a] in I)
            for (each production B → γ in G')
                for (each terminal b in FIRST(βa))
                    add [B → ·γ, b] to set I;
    until no more items are added to I;
    return I;
}
SetOfItems GOTO(I, X)
{
    initialize J to be the empty set;
    for (each item [A → α·Xβ, a] in I)
        add item [A → αX·β, a] to set J;
    return CLOSURE(J);
}
void items(G')
{
    initialize C to CLOSURE({[S' → ·S, $]});
    repeat
        for (each set of items I in C)
            for (each grammar symbol X)
                if (GOTO(I, X) is not empty and not in C)
                    add GOTO(I, X) to C;
    until no new sets of items are added to C;
}
Example 3.23: Consider the following augmented grammar.
S' → S
S → CC
C → cC | d
Construct the parsing table for the LALR(1) parser.

Construction of the sets of LR(1) items:
I0:
S' → ·S, $
S → ·CC, $
C → ·cC, c/d
C → ·d, c/d

I1: GOTO(I0, S)
S' → S·, $

I2: GOTO(I0, C)
S → C·C, $
C → ·cC, $
C → ·d, $

I3: GOTO(I0, c)
C → c·C, c/d
C → ·cC, c/d
C → ·d, c/d

I4: GOTO(I0, d)
C → d·, c/d

I5: GOTO(I2, C)
S → CC·, $

I6: GOTO(I2, c)
C → c·C, $
C → ·cC, $
C → ·d, $

I7: GOTO(I2, d)
C → d·, $

I8: GOTO(I3, C)
C → cC·, c/d

I9: GOTO(I6, C)
C → cC·, $

Figure 3.13: The GOTO graph for the above grammar
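The ten sets of LR(1) items of Example 3.23, and the effect of merging sets with equal cores, can be checked with the following Python sketch. It is a simplified illustration only: items are triples (production index, dot position, lookahead), the helper `first` assumes that the grammar has no ε-productions (true for this example, but not in general), and all function names are hypothetical.

```python
GRAMMAR = [
    ("S'", ("S",)),        # 0
    ("S", ("C", "C")),     # 1
    ("C", ("c", "C")),     # 2
    ("C", ("d",)),         # 3
]
NONTERMINALS = {"S'", "S", "C"}


def first(symbol, seen=frozenset()):
    """FIRST of one symbol; valid only because no production body is empty."""
    if symbol not in NONTERMINALS:
        return {symbol}
    out = set()
    for head, body in GRAMMAR:
        if head == symbol and body[0] != symbol and body[0] not in seen:
            out |= first(body[0], seen | {symbol})
    return out


def closure(items):
    result = set(items)
    work = list(items)
    while work:
        p, dot, a = work.pop()
        body = GRAMMAR[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta_a = body[dot + 1:] + (a,)        # the string "beta a"
            for q, (head, _) in enumerate(GRAMMAR):
                if head != body[dot]:
                    continue
                for b in first(beta_a[0]):        # no epsilon: first symbol suffices
                    if (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)


def goto(items, X):
    return closure({(p, dot + 1, a) for p, dot, a in items
                    if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == X})


def lr1_items():
    collection = [closure({(0, 0, "$")})]
    for I in collection:                          # list grows during the scan
        for X in ("S", "C", "c", "d"):
            J = goto(I, X)
            if J and J not in collection:
                collection.append(J)
    return collection


def lalr_merge(collection):
    """Union all sets that share a core (the items without their lookaheads)."""
    merged = {}
    for I in collection:
        core = frozenset((p, dot) for p, dot, _ in I)
        merged.setdefault(core, set()).update(I)
    return list(merged.values())


LR1 = lr1_items()
LALR = lalr_merge(LR1)
```

Merging by core is exactly step 2 of Algorithm 3.6; building the ACTION and GOTO tables from the merged sets is left out of the sketch.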

There are three pairs of sets of items that can be merged. I3 and I6 are replaced by their union:
I36: GOTO(I0, c)
C → c·C, c/d/$
C → ·cC, c/d/$
C → ·d, c/d/$

I4 and I7 are replaced by their union:
I47: GOTO(I0, d)
C → d·, c/d/$
I8 and I9 are replaced by their union:
I89: GOTO(I36, C)
C → cC·, c/d/$
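The LALR ACTION and GOTO functions for the condensed sets of items are shown in Table 3.4. The productions are numbered (1) S → CC, (2) C → cC, (3) C → d; a dash marks an error entry.

STATE   ACTION                   GOTO
        c      d      $          S    C
0       s36    s47    -          1    2
1       -      -      accept     -    -
2       s36    s47    -          -    5
36      s36    s47    -          -    89
47      r3     r3     r3         -    -
5       -      -      r1         -    -
89      r2     r2     r2         -    -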

Parsing the input string ccdd
Stack              Input buffer   Action table        Goto table     Parsing action
$0                 ccdd$          action[0,c]=s36                    Shift
$0c36              cdd$           action[36,c]=s36                   Shift
$0c36c36           dd$            action[36,d]=s47                   Shift
$0c36c36d47        d$             action[47,d]=r3     [36,C]=89      Reduce by C → d
$0c36c36C89        d$             action[89,d]=r2     [36,C]=89      Reduce by C → cC
$0c36C89           d$             action[89,d]=r2     [0,C]=2        Reduce by C → cC
$0C2               d$             action[2,d]=s47                    Shift
$0C2d47            $              action[47,$]=r3     [2,C]=5        Reduce by C → d
$0C2C5             $              action[5,$]=r1      [0,S]=1        Reduce by S → CC
$0S1               $              accept


3.12 ERROR HANDLING AND RECOVERY IN SYNTAX ANALYZER
Syntax Error Handling
If a compiler had to process only correct programs, its design and implementation would be simplified greatly. However, a compiler is expected to assist the programmer in locating and tracking down errors that inevitably creep into programs, despite the programmer's best efforts.
Few languages have been designed with error handling in mind, even though errors are so commonplace.
Most programming language specifications do not describe how a compiler should respond to errors; error handling is left to the compiler designer.
Planning the error handling right from the start can both simplify the structure of a compiler and improve its handling of errors.
Common programming errors can occur at many different levels.

Lexical errors include misspellings of identifiers, keywords, or operators, e.g., the use of an identifier elipsesize instead of ellipsesize, and missing quotes around text intended as a string.
Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.
Semantic errors include type mismatches between operators and operands. An example is a return statement in a Java method with result type void.
Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==.

Syntactic errors are detected very efficiently by syntax analyzers. Several parsing methods, such as the LL and LR methods, detect an error as soon as possible.
Another reason for emphasizing error recovery during parsing is that many errors appear syntactic, whatever their cause, and are exposed when parsing cannot continue.
A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.
The error handler in a parser has goals that are simple to state but challenging to realize:

Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.

A common strategy is to print the offending line with a pointer to the position at which an error is detected.
Error Recovery Strategies
Once an error is detected, the parser should recover from it. The simplest approach is for the parser to quit with an informative error message when it detects the first error.
Additional errors are often uncovered if the parser can restore itself to a state where processing of the input can continue.
1. Panic Mode Recovery
With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source program is clear and unambiguous. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantage of simplicity.
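Panic-mode recovery can be illustrated with a deliberately tiny Python sketch. The toy language (statements of the form `id = num ;`), the token spelling, and the names `SYNC_TOKENS`, `skip_to_sync` and `check_statements` are all assumptions made for the illustration; a real parser would hook the same skipping step into its own error routine.

```python
SYNC_TOKENS = {";", "}"}      # assumed set of synchronizing tokens


def skip_to_sync(tokens, pos):
    """Discard tokens from `pos` up to and including the next synchronizing token."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))


def check_statements(tokens):
    """Toy checker: every statement must be `id = num ;`.

    Returns the token positions at which errors were detected. After each
    error the checker resynchronizes and continues with the next statement.
    """
    expected = ["id", "=", "num", ";"]
    errors = []
    pos = 0
    while pos < len(tokens):
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
            else:
                errors.append(pos)                 # report the error position
                pos = skip_to_sync(tokens, pos)    # panic mode: skip ahead
                break
    return errors


program = ["id", "=", "num", ";",
           "id", "num", ";",             # missing "="
           "id", "=", "num", ";"]
errors = check_statements(program)
```

Because the checker resumes right after the semicolon, the well-formed third statement is still checked, which is the main benefit of recovering instead of stopping at the first error.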
2. Phrase Level Recovery
On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. The choice of the local correction is left to the compiler designer.
Phrase-level replacement has been used in several error-repairing compilers, as it can correct any input string. Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection. Replacements must also be chosen so that they cannot lead to infinite loops.
3. Error Productions
By anticipating common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs.
A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing. The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.
4. Global Correction
Ideally, a compiler should make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.
Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. Unfortunately, these methods are in general too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.
3.13 YACC
YACC is an acronym for "Yet Another Compiler Compiler". It was originally developed in the early 1970s by Stephen C. Johnson at AT&T Corporation and written in the B programming language, but soon rewritten in C. It appeared as part of Version 3 Unix, and a full description of Yacc was published in 1975.
Yacc is a computer program for the Unix operating system. It is an LALR parser generator: it generates a parser, the part of a compiler that tries to make syntactic sense of the source code, specifically an LALR parser, based on an analytic grammar written in a notation similar to BNF.
Yacc itself used to be available as the default parser generator on most Unix systems. The input to Yacc is a grammar with snippets of C code (called "actions") attached to its rules. Its output is a shift-reduce parser in C that executes the C snippets associated with each rule as soon as the rule is recognized. Typical actions involve the construction of parse trees.
A Yacc specification has three parts:
declarations
%%
translation rules
%%
supporting C routines

[Figure 3.14: Creating an input/output translator with Yacc. The Yacc specification translate.y is processed by the Yacc compiler to give y.tab.c; y.tab.c is processed by the C compiler to give a.out; a.out reads the input and produces the output.]
Yacc specification of a simple desk calculator
%{
#include <ctype.h>
#include <stdio.h>
%}
%token DIGIT
%%
line   : expr '\n'        { printf("%d\n", $1); }
       ;
expr   : expr '+' term    { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor  { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'     { $$ = $2; }
       | DIGIT
       ;
%%
yylex()
{
    int c;
    c = getchar();
    if (isdigit(c))
    {
        yylval = c - '0';   /* numeric value of the digit */
        return DIGIT;
    }
    return c;
}

3.14 DESIGN OF A SYNTAX ANALYZER FOR A SAMPLE LANGUAGE
[Figure 3.15: Design of Syntax Analyzer with Lex and Yacc. Source files (lexical rules and grammar rules) are fed to the program generators lex and yacc, which produce the generated output files lex.yy.c and y.tab.c. These are compiled with cc into the program a.out, which in execution reads the input and produces the parsed output.]
YACC (Yet Another Compiler Compiler).
Automatically generates a parser for a context-free grammar (an LALR parser).
Allows syntax-directed translation by writing grammar productions and semantic actions.

LALR(1) is more powerful than LL(1).
Works with lex. YACC calls yylex to get the next token.
YACC and lex must agree on the values for each token.

Like lex, YACC predates C++, so workarounds are needed for some constructs when using C++.
YACC file format:
declarations /* specify tokens and nonterminals */
%%
translation rules /* specify grammar here */
%%
supporting C routines
The command yacc yaccfile produces y.tab.c, which contains a routine yyparse().
yyparse() calls yylex() to get tokens.
yyparse() returns 0 if the program is grammatically correct, nonzero otherwise.
YACC automatically builds a parser for the grammar (an LALR parser).
There may be shift/reduce and reduce/reduce conflicts when the grammar is not LALR.
In this case, you will need to modify the grammar to make it LALR in order for yacc to work properly.
YACC tries to resolve conflicts automatically.
Default conflict resolution:
shift/reduce: choose shift
reduce/reduce: choose the production listed first in the grammar


Program to recognize a valid variable (identifier), which starts with a letter followed by any number of letters or digits.
LEX
%{
#include <stdlib.h>
#include "y.tab.h"
extern int yylval;
%}

%%
[0-9]+      { yylval = atoi(yytext); return DIGIT; }
[a-zA-Z]+   { return LETTER; }
[ \t]       ;
\n          return 0;
.           { return yytext[0]; }
%%
YACC
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token LETTER DIGIT
%%
variable : LETTER
         | LETTER rest
         ;
rest     : LETTER rest
         | DIGIT rest
         | LETTER
         | DIGIT
         ;
%%
main()
{
    yyparse();
    printf("The string is a valid variable\n");
}
int yyerror(char *s)
{
    printf("This is not a valid variable\n");
    exit(0);
}
OUTPUT
$ lex p4b.l
$ yacc -d p4b.y
$ cc lex.yy.c y.tab.c -ll
$ ./a.out
input34
The string is a valid variable
$ ./a.out
89file
This is not a valid variable

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
4.1 SYNTAX DIRECTED DEFINITIONS
A syntax-directed definition (SDD) is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol.
An attribute can represent a string, a number, a type, a memory location, or whatever. The value of an attribute at a parse-tree node is defined by a semantic rule associated with the production used at that node.
The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node in the parse tree; the value of an inherited attribute is computed from the values of attributes at the siblings and parent of that node.
Example 4.1: The syntax-directed definition in Figure 4.1 is for a desk calculator program. This definition associates an integer-valued synthesized attribute called val with each of the nonterminals E, T, and F. For each E-, T-, and F-production, the semantic rule computes the value of attribute val for the nonterminal on the left side from the values of val for the nonterminals on the right side.
Production        Semantic rule
L → E n           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → digit         F.val := digit.lexval

Example 4.2: An annotated parse tree for the input string 3 * 5 + 4 n.

Figure 4.1: Annotated parse tree for 3 * 5 + 4 n


4.2 CONSTRUCTION OF SYNTAX TREE
Syntax-directed definitions are very useful for the construction of syntax trees. Each node in a syntax tree represents a construct; the children of the node represent the meaningful components of the construct. A syntax-tree node representing an expression E1 + E2 has label + and two children representing the subexpressions E1 and E2.
The nodes of a syntax tree are implemented by objects with a suitable number of fields. Each object will have an op field that is the label of the node.
The objects will have additional fields as follows:

If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor function Leaf(op, val) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer to a new record for a leaf.
If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. A constructor function Node takes two or more arguments: Node(op, c1, c2, ..., ck) creates an object with first field op and k additional fields for the k children c1, ..., ck.

Example 4.3: The S-attributed definition in the following table constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All nonterminals have one synthesized attribute node, which represents a node of the syntax tree.
Every time the first production E → E1 + T is used, its rule creates a node with '+' for op and two children, E1.node and T.node, for the subexpressions. The second production has a similar rule.

S.No.   PRODUCTION      SEMANTIC RULES
(1)     E → E1 + T      E.node = new Node('+', E1.node, T.node)
(2)     E → E1 - T      E.node = new Node('-', E1.node, T.node)
(3)     E → T           E.node = T.node
(4)     T → ( E )       T.node = E.node
(5)     T → id          T.node = new Leaf(id, id.entry)
(6)     T → num         T.node = new Leaf(num, num.val)

Example 4.3: The L-attributed definition in Figure 4.2 performs the same translation as the S-attributed definition in Example 4.3. The attributes for the grammar symbols E, T, id, and num are as discussed in Example 4.3. The steps in constructing the syntax tree for a - 4 + c are:
1) p1 = new Leaf(id, entry-a);
2) p2 = new Leaf(num, 4);
3) p3 = new Node('-', p1, p2);
4) p4 = new Leaf(id, entry-c);
5) p5 = new Node('+', p3, p4);


Figure 4.2: Syntax tree for a - 4 + c
Example 4.4: In C, the type int[2][3] can be read as "array of 2 arrays of 3 integers." The
corresponding type expression array(2, array(3, integer)) is represented by the tree in Figure 4.3.
The operator array takes two parameters, a number and a type. If types are represented by
trees, then this operator returns a tree node labeled array with two children for a number and a type.

        array
       /     \
      2      array
            /     \
           3     integer

Figure 4.3: Type expression for int[2][3]
4.3 BOTTOM-UP EVALUATION OF S-ATTRIBUTED DEFINITIONS

An attribute grammar is a formal way to define attributes for the productions of a
formal grammar, associating these attributes with values. The evaluation occurs in the
nodes of the abstract syntax tree when the language is processed by some parser or
compiler.
The attributes are divided into two groups: synthesized attributes and inherited
attributes. The synthesized attributes are the result of the attribute evaluation rules,
and may also use the values of the inherited attributes. The inherited attributes are
passed down from parent nodes.
In some approaches, synthesized attributes are used to pass semantic information
up the parse tree, while inherited attributes help pass semantic information down
and across it.
Syntax-directed definitions with only synthesized attributes are called S-attributed definitions.
They are commonly used with LR parsers.

The implementation uses a stack to hold information about subtrees that
have been parsed.


A translator for an arbitrary syntax-directed definition can be difficult to build. However,
there are large classes of useful syntax-directed definitions for which it is easy to construct
translators.

Only synthesized attributes appear in the syntax-directed definition in the following table for
constructing the syntax tree for an expression.
S.No.   PRODUCTION      SEMANTIC RULES
(1)     E → E1 + T      E.node = new Node('+', E1.node, T.node)
(2)     E → E1 - T      E.node = new Node('-', E1.node, T.node)
(3)     E → T           E.node = T.node
(4)     T → ( E )       T.node = E.node
(5)     T → id          T.node = new Leaf(id, id.entry)
(6)     T → num         T.node = new Leaf(num, num.val)

This approach can be applied to construct syntax trees during bottom-up parsing. The translation of
expressions during top-down parsing often uses inherited attributes.
Synthesized Attributes on the Parser Stack
A translator for an S-attributed definition can often be implemented with the help of an LR
parser generator.
From an S-attributed definition, the parser generator can construct a translator that evaluates
attributes as it parses the input.
A bottom-up parser uses a stack to hold information about subtrees that have been parsed.
We can use extra fields in the parser stack to hold the values of synthesized attributes.

            State    Value
            X        X.x
            Y        Y.y
   top →    Z        Z.z

The above table shows an example of a parser stack with space for one attribute value. The
stack is implemented by a pair of arrays, state and value. Each state entry is a pointer (or index) into
an LR(1) parsing table. If the ith state symbol is A, then value[i] will hold the value of the attribute
associated with the parse tree node corresponding to this A.
The current top of the stack is indicated by the pointer top. We assume that synthesized
attributes are evaluated just before each reduction. Suppose the semantic rule A.a := f(X.x, Y.y, Z.z)
is associated with the production A → X Y Z. Before XYZ is reduced to A, the value of the attribute
Z.z is in value[top], that of Y.y in value[top-1], and that of X.x in value[top-2].
If a symbol has no attribute, then the corresponding entry in the value array is undefined.
After the reduction, top is decremented by 2, the state covering A is put in state[top] (i.e., where X
was), and the value of the synthesized attribute A.a is put in value[top].
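
A minimal sketch of this bookkeeping, assuming the value array is held in a Python list whose last element plays the role of value[top]. The function name reduce and the lambda rules are illustrative only; the state array and the parsing table are left out.

```python
def reduce(value, r, rule):
    """Replace the r topmost attribute values by rule(...) applied to them.

    value is the value stack; its last element corresponds to value[top].
    After the call the result sits where the leftmost symbol used to be.
    """
    start = len(value) - r
    args = value[start:]          # value[top-r+1] .. value[top]
    del value[start:]
    value.append(rule(*args))
    return value


# Reductions with attribute work while parsing 3 * 5 + 4
# (operator symbols occupy a stack slot too, their value is unused).
value = [3, "*", 5]
reduce(value, 3, lambda t, _op, f: t * f)    # T -> T1 * F
value += ["+", 4]
reduce(value, 3, lambda e, _op, t: e + t)    # E -> E1 + T
```

For a production with three right-side symbols the stack shrinks by two entries, which matches the statement that top is decremented by 2.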
Example 4.4: Consider the syntax-directed definition of the desk calculator in the following table.
Production        Semantic rule
L → E n           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → digit         F.val := digit.lexval

Figure 4.4: Annotated parse tree for 3 * 5 + 4 n
The implementation of a desk calculator with an LR parser is given in the following table.
Production        Semantic rule
L → E $           print(value[top])
E → E1 + T        value[ntop] := value[top-2] + value[top]
E → T             E.val := T.val
T → T1 * F        value[ntop] := value[top-2] * value[top]
T → F             T.val := F.val
F → ( E )         value[ntop] := value[top-1]
F → digit         F.val := digit.lexval
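The value of ntop is set to top - r + 1, where r is the number of symbols on the right side of the
production being reduced. After each code fragment is executed, top is set to ntop.
The synthesized attributes in the annotated parse tree can be evaluated by an LR parser during a
bottom-up parse of the input line 3 * 5 + 4.
The parse of the expression 3 * 5 + 4 $ with the stack is shown in the table.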
Input        State     Value     Production used
3*5+4 $      -         -
*5+4 $       3         3
*5+4 $       F         3         F → digit
*5+4 $       T         3         T → F
5+4 $        T *       3 *
+4 $         T * 5     3 * 5
+4 $         T * F     3 * 5     F → digit
+4 $         T         15        T → T * F
+4 $         E         15        E → T
4 $          E +       15 +
$            E + 4     15 + 4
$            E + F     15 + 4    F → digit
$            E + T     15 + 4    T → F
$            E         19        E → E + T
             E $       19
             L         19        L → E $

4.4 DESIGN OF PREDICTIVE TRANSLATOR
The following algorithm generalizes the construction of predictive parsers to implement a
translation scheme based on a grammar suitable for top-down parsing.
Algorithm 4.2: Construction of a predictive syntax-directed translator.
Input: A syntax-directed translation scheme with an underlying grammar suitable for
predictive parsing.
Output: Code for a syntax-directed translator.
Method: The technique is a modification of the predictive parser construction.
1. For each nonterminal A, construct a function that has a formal parameter for each inherited
attribute of A and that returns the values of the synthesized attributes of A.
2. The code for nonterminal A decides what production to use based on the current input
symbol.
3. The code associated with each production does the following. We consider the tokens,
nonterminals, and actions on the right side of the production from left to right.
(i). For token X with synthesized attribute x, save the value of x in the variable declared
for X.x. Then generate a call to match token X and advance the input.
(ii). For nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a function
call on the right side, where b1, b2, ..., bk are the variables for the inherited
attributes of B and c is the variable for the synthesized attribute of B.
(iii). For an action, copy the code into the parser, replacing each reference to an attribute
by the variable for that attribute.
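
The method can be sketched as a small recursive-descent translator in Python. The grammar used here is an assumption chosen for illustration, the usual left-recursion-free form E → T R, R → op T R | ε, T → id | num, in which R receives the tree built so far as its single inherited attribute. The tokenizer and the tuple representation of tree nodes are likewise hypothetical.

```python
import re


def tokenize(text):
    """Yield (kind, lexeme) pairs for numbers, identifiers and + / -."""
    pattern = r"\s*(?:(\d+)|([A-Za-z_]\w*)|([+-]))"
    for num, ident, op in re.findall(pattern, text):
        if num:
            yield ("num", int(num))
        elif ident:
            yield ("id", ident)
        else:
            yield ("op", op)


class Translator:
    def __init__(self, text):
        self.tokens = list(tokenize(text)) + [("eof", None)]
        self.pos = 0

    def match(self, kind):
        """Check the current token, advance the input, return its attribute."""
        tok_kind, lexeme = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, found {tok_kind}")
        self.pos += 1
        return lexeme

    def E(self):                      # E -> T R
        return self.R(self.T())       # T's tree is R's inherited attribute

    def R(self, inherited):           # R -> op T R | epsilon
        if self.tokens[self.pos][0] == "op":
            op = self.match("op")
            node = (op, inherited, self.T())
            return self.R(node)
        return inherited              # epsilon: synthesized = inherited

    def T(self):                      # T -> id | num
        kind = self.tokens[self.pos][0]
        if kind in ("id", "num"):
            return (kind, self.match(kind))
        raise SyntaxError(f"unexpected {kind}")
```

Each nonterminal becomes one function, the inherited attribute becomes a parameter, and the synthesized attribute becomes the return value, as in steps 1 to 3.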
Example 4.5: A grammar that is LL(1), and hence suitable for top-down parsing, can be
translated by predicting a suitable production rule at each step. Consider the translation scheme:
E → E1 + T    { E.nptr := mknode('+', E1.nptr, T.nptr) }
E → E1 - T    { E.nptr := mknode('-', E1.nptr, T.nptr) }
E → T         { E.nptr := T.nptr }
E → R         { E.nptr := R.nptr }
R → ε
T → id        { T.nptr := mkleaf(id, id.entry) }
T → num       { T.nptr := mkleaf(num, num.entry) }
Combine two of the E productions to make the translator smaller. The new productions use the token
op to represent + and -.


E → E1 op T   { E.nptr := mknode('op', E1.nptr, T.nptr) }
E → T         { E.nptr := T.nptr }
E → R         { E.nptr := R.nptr }
R → ε
T → id        { T.nptr := mkleaf(id, id.entry) }
T → num       { T.nptr := mkleaf(num, num.entry) }

4.5 TYPE SYSTEMS

The design of a type checker for a language is based on information about
the syntactic constructs in the language and on the rules for assigning types
to language constructs. Each expression has a type associated with it.
If both operands of the arithmetic operators of addition, subtraction, and
multiplication are of type integer, then the result is of type integer.
The result of the unary & operator is a pointer to the object referred to by the
operand. If the type of the operand is T, the type of the result is 'pointer
to T'.
Basic types are the atomic types with no internal structure as far as the
programmer is concerned.
The basic types are boolean, character, integer, and real.
Arrays, records, sets, pointers, and functions can be treated as constructed types.

Type Expressions
A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions. The sets of basic types and constructors depend on the
language to be checked.
The following are some type expressions:
1. A basic type is a type expression. Typical basic types for a language include boolean, char,
integer, float, and void (the absence of a value). type_error is a special basic type.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors include:
a) Arrays: If T is a type expression, then array(I, T) is a type expression denoting the
type of an array with elements of type T and index set I. I is often a range of
integers. For example: int a[25];
b) Products: If T1 and T2 are type expressions, then their Cartesian product T1 × T2 is
a type expression. We assume that × associates to the left and that it has higher
precedence than →. Products are introduced for completeness; they can be used to
represent a list or tuple of types (e.g., for function parameters).
c) Records: A record is a data structure with named fields. A type expression can be
formed by applying the record type constructor to the field names and their types.
d) Pointers: If T is a type expression, then pointer(T) is a type expression denoting the
type "pointer to an object of type T". For example: int a; int *p = &a;
e) Functions: Mathematically, a function maps elements of one set (the domain) to another
set (the range), written F : D → R. A type expression can be formed by using the type
constructor → for function types. We write s → t for "function from type s to type t".
4. Type expressions may contain variables whose values are themselves type expressions.


Example 4.6: The array type int[2][3] can be read as "array of 2 arrays of 3 integers each" and
written as a type expression array(2, array(3, integer)). This type is represented by the tree in
Figure 4.5. The operator array takes two parameters, a number and a type.

        array
       /     \
      2      array
            /     \
           3     integer

Figure 4.5: Type expression for int[2][3]
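
One way to hold such a tree in a program is sketched below. The class names Basic and Array and the helper show are assumptions made for this illustration; only the array constructor from the example is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str        # e.g. "integer", "char"


@dataclass(frozen=True)
class Array:
    size: int        # number of elements
    elem: object     # type expression of each element


def show(t):
    """Render a type expression in the array(n, t) notation."""
    if isinstance(t, Array):
        return f"array({t.size}, {show(t.elem)})"
    return t.name


int_2_3 = Array(2, Array(3, Basic("integer")))
```

Here show(int_2_3) gives back the written form array(2, array(3, integer)), and the nested objects correspond one to one to the nodes of the tree.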

Type Systems

A type system is a collection of rules for assigning type expressions to the various parts of a
program. A type checker implements a type system. The type systems are specified in a
syntax-directed manner.
Different type systems may be used by different compilers or processors of the same
language. For example, in Pascal, the type of an array includes the index set of the array, so
a function with an array argument can only be applied to arrays with that index set.

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.
Any check can be done dynamically, if the target code carries the type of an element along
with the value of that element.
A sound type system eliminates the need for dynamic checking for type errors because it
allows us to determine statically that these errors cannot occur when the target program
runs.
In a sound type system, type errors cannot occur when the target code runs.
A language is strongly typed if its compiler can guarantee that the programs it accepts will
execute without type errors. E.g., for integers: int array[255];

Error Recovery

Since type checking has the potential for catching errors in programs, it is important
for a type checker to do something reasonable when an error is discovered.
At the very least, the compiler must report the nature and location of the error.

It is desirable for the type checker to recover from errors, so it can check the rest of
the input.
Since error handling affects the type-checking rules, it has to be designed into the
type system right from the start; the rules must be prepared to cope with errors.
Error handling also requires coping with missing information.

4.6 SPECIFICATION OF A SIMPLE TYPE CHECKER
This section specifies a simple type checker for a simple language in which the type of each
identifier must be declared before the identifier is used. The type checker is a translation scheme
that synthesizes the type of each expression from the types of its subexpressions. The type checker
can handle arrays, pointers, statements, and functions.
The specification of a simple type checker includes the following:


A Simple Language
Type Checking of Expressions
Type Checking of Statements
Type Checking of Functions

A Simple Language
The following grammar generates programs, represented by the nonterminal P, consisting of a
sequence of declarations D followed by a single expression E.
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑T
A translation scheme for the above rules:
P → D ; E
D → D ; D
D → id : T                   { addtype(id.entry, T.type) }
T → char                     { T.type := char }
T → integer                  { T.type := integer }
T → ↑T1                      { T.type := pointer(T1.type) }
T → array [ num ] of T1      { T.type := array(1..num.val, T1.type) }

Type Checking of Expressions
The synthesized attribute type for E gives the type expression assigned by the type system
to the expression generated by E. The following semantic rules say that constants represented by
the tokens literal and num have type char and integer, respectively:
Rule              Semantic Rule
E → literal       { E.type := char }
E → num           { E.type := integer }
A function lookup(e) is used to fetch the type saved in the symbol table entry pointed to by
e. When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type:
E → id            { E.type := lookup(id.entry) }
The expression formed by applying the mod operator to two subexpressions of type integer
has type integer; otherwise, its type is type_error. The rule is
E → E1 mod E2     { E.type := if E1.type = integer and E2.type = integer then integer
                              else type_error }
In an array reference E1[E2], the index expression E2 must have type integer, in which case
the result is the element type t obtained from the type array(s, t) of E1; we make no use of the index
set s of the array.
E → E1 [ E2 ]     { E.type := if E2.type = integer and E1.type = array(s, t) then t
                              else type_error }

Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The
type of E ↑ is the type t of the object pointed to by the pointer E:
E → E1 ↑          { E.type := if E1.type = pointer(t) then t
                              else type_error }
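
The expression rules above can be collected into one recursive function. The sketch below is only an illustration: expressions are encoded as nested tuples, the symbol table is a plain dictionary standing in for lookup, and treating an undeclared identifier as type_error is an assumption not stated in the text.

```python
TYPE_ERROR = "type_error"


def check(expr, symtab):
    """Return the type of expr, a nested tuple tagged by its first item."""
    kind = expr[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return symtab.get(expr[1], TYPE_ERROR)      # lookup(id.entry)
    if kind == "mod":                                # E1 mod E2
        left, right = check(expr[1], symtab), check(expr[2], symtab)
        return "integer" if left == right == "integer" else TYPE_ERROR
    if kind == "index":                              # E1 [ E2 ]
        arr, idx = check(expr[1], symtab), check(expr[2], symtab)
        if idx == "integer" and isinstance(arr, tuple) and arr[0] == "array":
            return arr[2]                            # element type t
        return TYPE_ERROR
    if kind == "deref":                              # E1 followed by the pointer operator
        ptr = check(expr[1], symtab)
        if isinstance(ptr, tuple) and ptr[0] == "pointer":
            return ptr[1]
        return TYPE_ERROR
    return TYPE_ERROR


# Declarations a : array [10] of char and p : pointer to integer.
symtab = {"a": ("array", (1, 10), "char"), "p": ("pointer", "integer")}
```

The type of each expression is synthesized from the types of its subexpressions, so a single bottom-up pass over the tree is enough.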
Type Checking of Statements
Since language constructs like statements typically do not have values, the special basic
type void can be assigned to them. If an error is detected within a statement, then the type
type_error is assigned.
The assignment statement, conditional statement, and while statement are considered for
type checking. Sequences of statements are separated by semicolons.
S → id := E           { S.type := if id.type = E.type then void
                                  else type_error }
S → if E then S1      { S.type := if E.type = boolean then S1.type
                                  else type_error }
S → while E do S1     { S.type := if E.type = boolean then S1.type
                                  else type_error }
S → S1 ; S2           { S.type := if S1.type = void and S2.type = void then void
                                  else type_error }
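
A small sketch of these four rules, assuming the types of the subexpressions and substatements have already been computed and are passed in as strings. The function names are illustrative.

```python
VOID, TYPE_ERROR = "void", "type_error"


def assign(id_type, e_type):            # S -> id := E
    return VOID if id_type == e_type else TYPE_ERROR


def conditional(e_type, s1_type):       # S -> if E then S1, S -> while E do S1
    return s1_type if e_type == "boolean" else TYPE_ERROR


def sequence(s1_type, s2_type):         # S -> S1 ; S2
    return VOID if s1_type == s2_type == VOID else TYPE_ERROR
```

A type_error in any substatement propagates upward, because sequence returns void only when both parts are void.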

Type Checking of Functions
The application of a function to an argument can be captured by the production
E → E ( E )
in which an expression is the application of one expression to another. The rules for associating
type expressions with nonterminal T can be augmented by the following production and action to
permit function types in declarations.
T → T1 '→' T2         { T.type := T1.type → T2.type }
Quotes around the arrow used as a function constructor distinguish it from the arrow used as the
metasymbol in a production.
The rule for checking the type of a function application is
E → E1 ( E2 )         { E.type := if E2.type = s and E1.type = s → t then t
                                  else type_error }
4.7 EQUIVALENCE OF TYPE EXPRESSIONS

Type-checking rules have the form: if two type expressions are equal, then return a certain
type, else return type_error.
It is important to have a precise definition of when two type expressions are equivalent.

The key issue is whether a name in a type expression stands for itself or whether it is
an abbreviation for another type expression.
For efficiency, compilers use representations that allow type equivalence to be
determined quickly.

The notion of type equivalence implemented by a specific compiler can often be
explained using the concepts of structural and name equivalence.


In C, type names are introduced by typedef and struct declarations.
Structural Equivalence of Type Expressions
Since type expressions are built from basic types and constructors, a natural notion of
equivalence between two type expressions is structural equivalence; i.e., two expressions are either
the same basic type, or are formed by applying the same constructor to structurally equivalent
types. That is, two type expressions are structurally equivalent if and only if they are identical.
For example, the type expression integer is equivalent only to integer because they are the
same basic type.
Similarly, pointer(integer) is equivalent only to pointer(integer) because the two are
formed by applying the same constructor pointer to equivalent types.
The algorithm recursively compares the structure of type expressions without checking for
cycles, so it can be applied to a tree or a dag representation. It assumes that the only type
constructors are for arrays, products, pointers, and functions.
The constructed types array(n1, t1) and array(n2, t2) are equivalent iff n1 = n2 and t1 is equivalent to t2.
Algorithm sequiv(s, t)
    if (s and t are the same basic type) then
        return true
    else if (s = array(s1, s2) and t = array(t1, t2)) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else if (s = s1 × s2 and t = t1 × t2) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else if (s = pointer(s1) and t = pointer(t1)) then
        return (sequiv(s1, t1))
    else if (s = s1 → s2 and t = t1 → t2) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else return false
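
A runnable counterpart of sequiv is sketched below. The encoding is an assumption made for the example: basic types are strings and constructed types are tuples whose first element names the constructor. Array sizes are compared with ordinary equality, following the rule n1 = n2.

```python
def sequiv(s, t):
    """Structural equivalence of two type expressions.

    Constructed types look like ("array", n, elem), ("product", t1, t2),
    ("pointer", t) or ("function", domain, range).
    """
    if isinstance(s, str) or isinstance(t, str):
        return s == t                        # same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False                         # different constructors
    if s[0] == "array":
        return s[1] == t[1] and sequiv(s[2], t[2])
    return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))
```

Like the algorithm above, this version does not check for cycles, so it is suitable only for tree or dag representations.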
Example 4.7: The encoding of type expressions in this example is from a C compiler for fast
checking of type equivalence.
BASIC TYPE     ENCODING
boolean        0000
char           0001
integer        0010
real           0011

TYPE CONSTRUCTOR     ENCODING
pointer              01
array                10
freturns             11

TYPE EXPRESSION                       ENCODING
char                                  000000 0001
freturns(char)                        000011 0001
pointer(freturns(char))               000111 0001
array(pointer(freturns(char)))        100111 0001
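
The table can be reproduced with a few bit operations. The function below is a sketch that assumes a six-bit field for at most three constructors followed by a four-bit field for the basic type, which is what the four rows above suggest; the function name and the list-based input are illustrative.

```python
BASIC = {"boolean": 0b0000, "char": 0b0001, "integer": 0b0010, "real": 0b0011}
CONSTRUCTOR = {"pointer": 0b01, "array": 0b10, "freturns": 0b11}


def encode(constructors, basic):
    """Encode constructors (outermost first) applied to a basic type.

    Returns a 10-character bit string: 6 bits of constructors, then
    4 bits for the basic type.
    """
    if len(constructors) > 3:
        raise ValueError("only three constructors fit in six bits")
    bits = 0
    for name in constructors:
        bits = (bits << 2) | CONSTRUCTOR[name]
    return f"{bits:06b}{BASIC[basic]:04b}"
```

Two type expressions with different bit strings cannot be structurally equivalent, so a single integer comparison rules out most mismatches before any tree comparison is attempted.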

Names for Type Expressions
In some languages, types can be given names (data type names). For example, in the Pascal
program fragment
type link = ↑cell;
var  next : link;
     last : link;
     p    : ↑cell;
     q, r : ↑cell;
the identifier link is declared to be a name for the type ↑cell. Whether the variables next, last, p,
q, and r all have identical types depends on the implementation.
A type graph is constructed to check name equivalence.

Every time a type constructor or basic type is seen, a new node is created.
Every time a new type name is seen, a leaf is created.

Two type expressions are equivalent if they are represented by the same node in the type
graph.

Example 4.8: Consider the Pascal program fragment
type link = ↑cell;
     np   = ↑cell;
     nqr  = ↑cell;
var  next : link;
     last : link;
     p    : np;
     q    : nqr;
     r    : nqr;

The identifier link is declared to be a name for the type ↑cell. New type names np and nqr have
been introduced. Since next and last are declared with the same type name, they are treated as
having equivalent types. Similarly, q and r are treated as having equivalent types because the same
implicit type name is associated with them. However, p, q, and next do not have equivalent types,
since they all have types with different names.
next  last            p            q   r
    \  /              |             \ /
link = pointer     pointer        pointer
         \            |            /
                    cell

Figure 4.6: Association of variables and nodes in the type graph.

Note that type name cell has three parents, all labeled pointer. An equal sign appears between the
type name link and the node in the type graph to which it refers.


Example: Check for equivalence of type expressions for the following C code:
typedef struct
{
    int data[100];
    int count;
} Stack;
typedef struct
{
    int data[100];
    int count;
} Set;
Stack x, y;
Set r, s;
Name equivalence: This is the most straightforward notion: two types are equal if, and only if, they
have the same name. x and y would be of the same type and r and s would be of the same type, but the
type of x or y would not be equivalent to the type of r or s.
x = y;    valid
r = s;    valid
Structural equivalence: Two types are equal if, and only if, they have the same "structure".
x = r;    valid

Under structural equivalence the two types Stack and Set are equivalent; under name equivalence
they are not.


4.8 TYPE CONVERSIONS
Consider expressions like x + i, where x is of type float and i is of type integer. Since the
representation of integers and floating-point numbers is different within a computer and different
machine instructions are used for operations on integers and floats, the compiler may need to
convert one of the operands of + to ensure that both operands are of the same type when the
addition occurs.
Suppose that integers are converted to floats when necessary, using a unary operator (float).
For example, the integer 2 is converted to a float in the code for the expression 2 * 3.14:
t1 = (float) 2
t2 = t1 * 3.14

The attribute E.type has a value that is either integer or float.
The rule associated with E → E1 + E2 builds on the pseudocode
if (E1.type = integer and E2.type = integer) E.type = integer;
else if (E1.type = float and E2.type = integer) E.type = float;
else if (E1.type = integer and E2.type = float) E.type = float;
else if (E1.type = float and E2.type = float) E.type = float;

Type conversion rules vary from language to language. The rules for Java in Figure 4.7
distinguish between widening conversions, which are intended to preserve information, and
narrowing conversions, which can lose information.

(a) Widening conversions: byte → short → int → long → float → double, and char → int
(b) Narrowing conversions: double → float → long → int → char, short, byte

Figure 4.7: Conversions between primitive types in Java
Coercions
Conversion from one type to another is said to be implicit if it is done automatically by the
compiler. Implicit type conversions, also called coercions, are limited in many languages to
widening conversions. Conversion is said to be explicit if the programmer must write something to
cause the conversion. Explicit conversions are also called casts.
The semantic action for checking E → E1 + E2 uses two functions:
1. max(t1, t2) takes two types t1 and t2 and returns the maximum (or least upper bound) of the
two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the
hierarchy; e.g., if either type is an array or a pointer type.
2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a
value of type w. It returns a itself if t and w are the same type. Otherwise, it
generates an instruction to do the conversion and place the result in a temporary,
which is returned as the result.

Pseudocode for widen, assuming that the only types are integer and float:
Addr widen(Addr a, Type t, Type w)
{
    if (t = w) return a;
    else if (t = integer and w = float)
    {
        temp = new Temp();
        gen(temp '=' '(float)' a);
        return temp;
    }
    else error;
}
Introducing type conversions into expression evaluation:
E → E1 + E2    { E.type = max(E1.type, E2.type);
                 a1 = widen(E1.addr, E1.type, E.type);
                 a2 = widen(E2.addr, E2.type, E.type);
                 E.addr = new Temp();
                 gen(E.addr '=' a1 '+' a2); }
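
The two functions and the semantic action can be tried out with the following sketch. It assumes only the types integer and float, represents addresses as strings, and collects the generated three-address instructions in a list named code; max is renamed max_type to avoid shadowing the Python built-in. All of these names are illustrative.

```python
import itertools

WIDTH = {"integer": 0, "float": 1}    # position in the widening hierarchy
_temps = itertools.count(1)           # source of fresh temporary names
code = []                             # generated three-address instructions


def max_type(t1, t2):
    """Least upper bound of t1 and t2 in the widening hierarchy."""
    if t1 not in WIDTH or t2 not in WIDTH:
        raise TypeError("operand type is not in the widening hierarchy")
    return t1 if WIDTH[t1] >= WIDTH[t2] else t2


def widen(addr, t, w):
    """Return an address holding addr converted from type t to type w."""
    if t == w:
        return addr
    if t == "integer" and w == "float":
        temp = f"t{next(_temps)}"
        code.append(f"{temp} = (float) {addr}")
        return temp
    raise TypeError(f"cannot widen {t} to {w}")


def add(a1, t1, a2, t2):              # action for E -> E1 + E2
    result_type = max_type(t1, t2)
    left = widen(a1, t1, result_type)
    right = widen(a2, t2, result_type)
    temp = f"t{next(_temps)}"
    code.append(f"{temp} = {left} + {right}")
    return temp, result_type
```

Calling add("x", "float", "i", "integer") first emits a conversion of i and then the addition on two float operands.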
Example 4.9: Consider expressions formed by applying an arithmetic operator op to constants and
identifiers, as in the grammar below. Suppose there are two types, real and integer, with integers
converted to reals when necessary. Attribute type of nonterminal E can be either integer or real, and
the type-checking rules are shown below. Function lookup(e) returns the type saved in the symbol
table entry pointed to by e.
PRODUCTION          SEMANTIC RULE
E → num             E.type = integer
E → num . num       E.type = real
E → id              E.type = lookup(id.entry)
E → E1 op E2        E.type = if (E1.type = integer and E2.type = integer)
                                 then integer
                             else if (E1.type = integer and E2.type = real)
                                 then real
                             else if (E1.type = real and E2.type = integer)
                                 then real
                             else if (E1.type = real and E2.type = real)
                                 then real
                             else type_error


4.9 RUNTIME ENVIRONMENT: SOURCE LANGUAGE ISSUES
Run-Time Environment

The run-time environment establishes relationships between names and data objects.
The allocation and deallocation of data objects are managed by the run-time environment.
Each execution of a procedure is referred to as an activation of the procedure.

If the procedure is recursive, several of its activations may be alive at the same time. Each
call of a procedure leads to an activation that may manipulate data objects allocated for its
use.
The representation of a data object at run time is determined by its type.

Often, elementary data types, such as characters, integers, and reals, can be represented by
equivalent data objects in the target machine.
However, aggregates, such as arrays, strings, and structures, are usually represented by
collections of primitive objects.

Source Language Issues
1. Procedure
2. Activation Trees
3. Control Stack
4. The Scope of a Declaration
5. Bindings of Names

Procedure

A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name and the statement is the procedure body.
A procedure that returns a value is called a function.
A complete program will also be treated as a procedure.

When a procedure name appears within an executable statement, we say that the procedure
is called at that point.
The basic idea is that a procedure call executes the procedure body.

Some of the identifiers appearing in a procedure definition are special, and are called formal
parameters of the procedure.
Actual parameters may be passed to a called procedure.
Procedures can contain local and global variables.

Activation Trees

We make the following assumptions about the flow of control among procedures during the
execution of a program:
1. Control flows sequentially; that is, the execution of a program consists of a sequence
of steps, with control being at some specific point in the program at each step.
2. Each execution of a procedure starts at the beginning of the procedure body and
eventually returns control to the point immediately following the place where the
procedure was called. This means the flow of control between procedures can be
depicted using trees.

Each execution of a procedure body is referred to as an activation of the procedure. The
lifetime of an activation of a procedure p is the sequence of steps between the first and last
steps in the execution of the procedure body.
If a and b are procedure activations, then their lifetimes are either non-overlapping or
nested.


A procedure is recursive if a new activation can begin before an earlier activation of the
same procedure has ended.
The lifetime of the activation quicksort(1,9) is the sequence of steps executed between
printing enter quicksort(1,9) and printing leave quicksort(1,9).
The following are the rules to construct an activation tree:
1. Each node represents an activation of a procedure.
2. The root node represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a
to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before
the lifetime of b.
enter main()
    enter readarray()
    leave readarray()
    enter quicksort(1,9)
        enter partition(1,9)
        leave partition(1,9)
        enter quicksort(1,3)
            ...
        leave quicksort(1,3)
        enter quicksort(5,9)
            ...
        leave quicksort(5,9)
    leave quicksort(1,9)
leave main()

Figure 4.8: An activation tree corresponding to the output of activation of quicksort
Control Stack
The flow of control in a program corresponds to a depth-first traversal of the activation tree
that starts at the root, visits a node before its children, and recursively visits children at each node in
a left-to-right order.
We can use a stack, called a control stack, to keep track of live procedure activations; the
idea is to push the node for an activation onto the control stack as the activation begins and to pop the
node when the activation ends. Then the contents of the control stack are related to paths to the root
of the activation tree. When node n is at the top of the control stack, the stack contains the nodes
along the path from n to the root.
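
The push and pop discipline can be observed with a short simulation. The sketch assumes the activation tree is given as nested (name, children) pairs and records a snapshot of the control stack each time an activation begins; the tree below is the abbreviated one from the quicksort trace, with the elided inner calls left out.

```python
def run(tree, stack=None, log=None):
    """Depth-first traversal of an activation tree given as (name, children).

    Returns a list of control-stack snapshots, one per activation, taken
    at the moment the activation begins.
    """
    stack = [] if stack is None else stack
    log = [] if log is None else log
    name, children = tree
    stack.append(name)                 # activation begins: push
    log.append(list(stack))            # path from the root to this node
    for child in children:
        run(child, stack, log)
    stack.pop()                        # activation ends: pop
    return log


tree = ("main", [("readarray", []),
                 ("quicksort(1,9)", [("partition(1,9)", []),
                                     ("quicksort(1,3)", []),
                                     ("quicksort(5,9)", [])])])
```

Every snapshot is exactly the path from the root to the node just entered, which is the property stated above.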


Example 4.10: Figure 4.9 shows nodes from the activation tree of Figure 4.8 that have been
reached when control enters the activation represented by q(2,3). Activations with labels r, p(1,9),
p(1,3), and q(1,0) have executed to completion, so the figure contains dashed lines to their nodes.
The solid lines mark the path from q(2,3) to the root.

Figure 4.9: The control stack contains nodes along a path to the root.
The Scope of a Declaration

A declaration in a language is a syntactic construct that associates information with a name.
Declarations may be explicit, as in the Pascal fragment var i : integer; or they may be
implicit. For example, any variable name starting with I is assumed to denote an integer in a
Fortran program, unless otherwise declared.

The scope rules of a language determine which declaration of a name applies when the
name appears in the text of a program.
The portion of the program to which a declaration applies is called the scope of that
declaration. An occurrence of a name in a procedure is said to be local to the procedure if it
is in the scope of a declaration within the procedure; otherwise, the occurrence is said to be
nonlocal.

At compile time, the symbol table can be used to find the declaration that applies to an
occurrence of a name.
Special keywords such as static, global, volatile, and final are also used when declaring variables.

Bindings of Names
Even if each name is declared once in a program, the same name may denote different data
objects at run time. The informal term "data object" corresponds to a storage location that can hold
values.
In programming language semantics, the term environment refers to a function that maps a
name to a storage location, and the term state refers to a function that maps a storage location to the
value held there, as in Figure 4.10.

name  --environment-->  storage  --state-->  value

Figure 4.10: Two-stage mapping from names to values
Environments and states are different; an assignment changes the state, but not the
environment. For example, suppose that storage address 100, associated with variable pi, holds 0.
After the assignment pi := 3.14, the same storage address is associated with pi, but the value held
there is 3.14.
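
The pi example can be written out with two dictionaries, one per mapping. This is only an illustration; the helper names assign and lookup are assumptions.

```python
environment = {"pi": 100}      # name -> storage location
state = {100: 0}               # storage location -> value


def assign(name, value):
    """An assignment updates the state only; the binding is untouched."""
    state[environment[name]] = value


def lookup(name):
    """Two-stage mapping: name to location, then location to value."""
    return state[environment[name]]


assign("pi", 3.14)
```

After the assignment, environment is unchanged while state maps address 100 to 3.14.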


When an environment associates storage location s with a name x, we say that x is bound to
s; the association itself is referred to as a binding of x. The term storage "location" is to be taken
figuratively. If x is not of a basic type, the storage s for x may be a collection of memory words.
Static notion                  Dynamic counterpart
definition of a procedure      activations of the procedure
declaration of a name          bindings of the name
scope of a declaration         lifetime of a binding

4.10 STORAGE ORGANIZATION
The executing target program runs in its own logical address space in which each program
value has a location. The management and organization of this logical address space is shared
between the compiler, operating system, and target machine. The operating system maps the logical
addresses into physical addresses, which are usually spread throughout memory.
The run-time representation of an object program in the logical address space consists of
data and program areas as shown in the figure below. A compiler for a language like C++ on an
operating system like Linux might subdivide memory in this way.
The run-time storage is subdivided to hold code and data as follows:

The generated target code
Data objects
Control stack (which keeps track of information about procedure activations)

00    Code
      Static data
      Heap
      Free Memory
      Stack
FF

Figure 4.11: Typical subdivision of run-time memory into code and data areas
The size of the generated target code is fixed at compile time, so the compiler can place the
executable target code in a statically determined area Code, usually in the low end of memory.
The size of some program data objects, such as global constants, and data generated by the
compiler, such as information to support garbage collection, may be known at compile time, and these
data objects can be placed in another statically determined area called Static. One reason for
statically allocating as many data objects as possible is that the addresses of these objects can be
compiled into the target code. In early versions of Fortran, all data objects could be allocated
statically.
To maximize the utilization of space at run time, the other two areas, Stack and Heap, are at
the opposite ends of the remainder of the address space. These areas are dynamic; their size can
change as the program executes. These areas grow towards each other as needed. The stack is used
to store data structures called activation records that get generated during procedure calls.
Activation Records
Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record (sometimes called a frame) on the control stack. The
contents of activation records vary with the language being implemented.

Actual parameters
Returned values
Control link
Access link
Saved machine status
Local data
Temporaries

Figure 4.12: A general activation record
The following are the contents of an activation record:

1. Temporary values, such as those arising from the evaluation of expressions, in cases where
those temporaries cannot be held in registers.
2. Local data belonging to the procedure whose activation record this is.
3. A saved machine status, with information about the state of the machine just before the call
to the procedure. This information typically includes the return address and the contents of
registers that were used by the calling procedure and that must be restored when the return
occurs.
4. An "access link" may be needed to locate data needed by the called procedure but found
elsewhere, e.g., in another activation record.
5. A control link, pointing to the activation record of the caller.
6. Space for the return value of the called function, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register for efficiency.
7. The actual parameters used by the calling procedure. Commonly, these values are not
placed in the activation record but rather in registers.


4.11 STORAGE ALLOCATION
A different storage allocation strategy is used in each of the three data areas in
the organization. There are basically three strategies:
1. Static allocation lays out storage for all data objects at compile time.
2. Stack allocation manages the run-time storage as a stack.
3. Heap allocation allocates and deallocates storage as needed at run time from a data
area known as a heap.
1. Static Allocation

In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its names
are bound to the same storage locations.
The above property allows the values of local names to be retained across activations of a
procedure. That is, when control returns to a procedure, the values of the locals are the same
as they were when control left the last time.
From the type of a name, the compiler determines the amount of storage to set aside for that
name.
The address of this storage consists of an offset from an end of the activation record for the
procedure.
The compiler must eventually decide where the activation records go, relative to the target
code and to one another.

The following are the limitations of static memory allocation:
1. The size of a data object and constraints on its position in memory must be known at
compile time.
2. Recursive procedures are restricted, because all activations of a procedure use the
same bindings for local names.
3. Dynamic allocation is not allowed, so data structures cannot be created dynamically.
2. Stack Allocation
1. Stack allocation is based on the idea of a control stack.
2. A stack is a Last In First Out (LIFO) storage device where new storage is allocated
and deallocated at only one "end", called the top of the stack.
3. Storage is organized as a stack, and activation records are pushed and popped as
activations begin and end, respectively.
4. Storage for the locals in each call of a procedure is contained in the activation record
for that call. Thus locals are bound to fresh storage in each activation, because a new
activation record is pushed onto the stack when a call is made.
5. Furthermore, the values of locals are deleted when the activation ends; that is, the
values are lost because the storage for locals disappears when the activation record is
popped.
6. At run time, an activation record can be allocated and deallocated by incrementing
and decrementing the top of the stack, respectively.
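Point 6 can be sketched in C as a byte array with a single top offset. This is an illustrative model only: the names stack_area, push_record, pop_record and stack_top are assumptions, and alignment and the contents of a record are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_BYTES 1024

static unsigned char stack_area[STACK_BYTES];
static size_t top = 0;            /* offset of the first free byte */

/* Allocate an activation record of `size` bytes by incrementing top.
   Returns NULL if the stack would overflow. */
void *push_record(size_t size)
{
    if (size > STACK_BYTES - top)
        return NULL;
    void *record = &stack_area[top];
    top += size;
    return record;
}

/* Deallocate the most recent record by decrementing top. */
void pop_record(size_t size)
{
    assert(size <= top);
    top -= size;
}

size_t stack_top(void)
{
    return top;
}
```

After a pop, the next push reuses the same bytes, which is why the values of the popped activation's locals are lost.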

a. Calling sequence


The layout and allocation of data to memory locations in the run-time environment are key
issues in storage management. These issues are tricky because the same name in a program text can
refer to multiple locations at run time.
The two adjectives static and dynamic distinguish between compile time and run time, respectively.
We say that a storage allocation decision is static if it can be made by the compiler looking only at
the text of the program, not at what the program does when it executes.
Conversely, a decision is dynamic if it can be decided only while the program is running. Many
compilers use some combination of the following two strategies for dynamic storage allocation:
1. Stack storage. Names local to a procedure are allocated space on a stack. The stack supports
the normal call/return policy for procedures.
2. Heap storage. Data that may outlive the call to the procedure that created it is usually
allocated on a "heap" of reusable storage.

Figure 4.13: Division of tasks between caller and callee

The code for the callee can access its temporaries and local data using offsets from top_sp. The call
sequence is:
1. The caller evaluates the actuals.
2. The caller stores a return address and the old value of top_sp into the callee's activation
record. The caller then increments top_sp to the position shown in Figure 4.13. That is,
top_sp is moved past the caller's local data and temporaries and the callee's parameter and
status fields.
3. The callee saves register values and other status information.
4. The callee initializes its local data and begins execution.
A possible return sequence is:
1. The callee places a return value next to the activation record of the caller.


2. Using the information in the status field, the callee restores top_sp and other
registers and branches to a return address in the caller's code.
3. Although top_sp has been decremented, the caller can copy the returned
value into its own activation record and use it to evaluate an expression.
b. Variable-length data
1. Variable-length data are not stored in the activation record. Only a pointer to
the beginning of each data object appears in the activation record.
2. The relative addresses of these pointers are known at compile time.
c. Dangling references
1. A dangling reference occurs when there is a reference to storage that has
been deallocated.
2. It is a logical error to use dangling references, since the value of deallocated
storage is undefined according to the semantics of most languages.
3. Heap Allocation
1. The deallocation of activation records need not occur in a last-in first-out
fashion, so storage cannot be organized as a stack.
2. Heap allocation parcels out pieces of contiguous storage, as needed for
activation records or other objects. Pieces may be deallocated in any order,
so over time the heap will consist of alternate areas that are free and in use.
3. Heap allocation is an alternative to stack allocation.
4.12 PARAMETER PASSING
All programming languages have a notion of a procedure, but they can differ in how these
procedures get their arguments. The actual parameters (the parameters used in the call of a
procedure) are associated with the formal parameters (those used in the procedure definition).
Call-by-value
In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a
variable). The value is placed in the location belonging to the corresponding formal parameter of
the called procedure. This method is used in C and Java.

The actual parameters are evaluated and their r-values are passed to the called procedure.
Call-by-value can be implemented as follows:
o A formal parameter is treated just like a local name, so the storage for the formals is
in the activation record of the called procedure.
o The caller evaluates the actual parameters and places their r-values in the storage for
the formals.
Call-by-reference
In call-by-reference, the address of the actual parameter is passed to the callee as the value
of the corresponding formal parameter. Uses of the formal parameter in the code of the callee are
implemented by following this pointer to the location indicated by the caller. Changes to the formal
parameter thus appear as changes to the actual parameter.

When parameters are passed by reference (also known as call-by-address or call-by-location),
the caller passes to the called procedure a pointer to the storage address of each actual
parameter.
o If an actual parameter is a name or an expression having an l-value, then that
l-value itself is passed.
o However, if the actual parameter is an expression, like a + b or 2, that has no l-value, then
the expression is evaluated in a new location, and the address of that location is passed.
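The difference between the two mechanisms can be seen in a short C sketch. C itself passes everything by value, so call-by-reference is simulated here by passing a pointer explicitly; all function names are hypothetical.

```c
/* Call-by-value: the formal x is a copy held in the callee's own storage. */
void increment_by_value(int x)
{
    x = x + 1;                    /* changes only the callee's copy */
}

/* Call-by-reference, simulated with a pointer: the formal holds the
   address of the actual parameter. */
void increment_by_reference(int *x)
{
    *x = *x + 1;                  /* follows the pointer to the caller's variable */
}

/* Returns the caller's variable after one call of each kind. */
int demo_parameter_passing(void)
{
    int n = 10;
    increment_by_value(n);        /* n is still 10 */
    increment_by_reference(&n);   /* n becomes 11 */
    return n;
}
```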

Copy-restore

A hybrid between call-by-value and call-by-reference is copy-restore linkage (also known as
copy-in copy-out, or value-result).
Before control flows to the called procedure, the actual parameters are evaluated. The
r-values of the actuals are passed to the called procedure as in call-by-value.
When control returns, the current r-values of the formal parameters are copied back into the
l-values of the actuals.
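Copy-restore and call-by-reference can give different results when two actuals name the same variable. The sketch below writes the copy-in and copy-out steps by hand in C; the function names are hypothetical, and the order in which the formals are copied back is an assumption of this sketch.

```c
/* Call-by-reference version: both formals may alias the same actual. */
void bump_ref(int *x, int *y)
{
    *x = *x + 1;
    *y = *y + 10;
}

/* The callee's body, which sees only its private copies. */
static void bump_body(int *x_copy, int *y_copy)
{
    *x_copy = *x_copy + 1;
    *y_copy = *y_copy + 10;
}

/* Copy-restore linkage written out on the caller's side. */
void bump_copy_restore(int *actual_x, int *actual_y)
{
    int x = *actual_x;            /* copy in the r-values of the actuals */
    int y = *actual_y;
    bump_body(&x, &y);
    *actual_x = x;                /* copy out into the l-values */
    *actual_y = y;                /* copied last, so it wins if the actuals alias */
}
```

With the same variable passed twice and starting at 0, bump_ref leaves 11 in it, while bump_copy_restore leaves 10.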

Call-by-name
A mechanism called call-by-name was used in the early programming language Algol 60. It
requires that the callee execute as if the actual parameter were substituted literally for the formal
parameter in the code of the callee, as if the formal parameter were a macro standing for the actual
parameter.
Call-by-name is traditionally defined by the copy rule of Algol.
1. The procedure is treated as if it were a macro; that is, its body is substituted for the
call in the caller, with the actual parameters literally substituted for the formals.
Such a literal substitution is called macro expansion or inline expansion.
2. The local names of the called procedure are kept distinct from the names of the
calling procedure, each local of the called procedure being systematically renamed
into a distinct new name before the macro expansion is done.
3. The actual parameters are surrounded by parentheses if necessary to preserve their
integrity.
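The C preprocessor performs a literal substitution of this kind, so a macro can serve as a rough model of call-by-name. In the sketch below the actual is parenthesized as in rule 3 and is re-evaluated at every use of the formal. All names are hypothetical.

```c
static int evaluations = 0;

/* An argument expression with a visible side effect. */
static int next_value(void)
{
    evaluations = evaluations + 1;
    return evaluations;
}

/* Macro expansion substitutes the parenthesized actual for every use of x. */
#define TWICE_BY_NAME(x) ((x) + (x))

/* An ordinary C function receives its argument by value, evaluated once. */
static int twice_by_value(int x)
{
    return x + x;
}

/* Counts how often the actual is evaluated under each mechanism. */
int count_evaluations_by_name(void)
{
    evaluations = 0;
    (void)TWICE_BY_NAME(next_value());
    return evaluations;
}

int count_evaluations_by_value(void)
{
    evaluations = 0;
    (void)twice_by_value(next_value());
    return evaluations;
}
```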
4.13 SYMBOL TABLES
Symbol tables are data structures that are used by compilers to hold information about
source-program constructs. The information is collected incrementally by the analysis phases of a
compiler and used by the synthesis phases to generate the target code. Entries in the symbol table
contain information about an identifier such as its character string (or lexeme), its type, its position
in storage, and any other relevant information.

[Figure: the lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator,
code optimizer, and code generator, each connected to the symbol table.]

Figure 4.14: Interaction among the symbol table and the various phases of the compiler

The symbol table, which stores information about the entire source program, is used by
all phases of the compiler.


An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type, and its
scope.
In the case of procedure names, such things as the number and types of its arguments, the
method of passing each argument (for example, by value or by reference), and the type
returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields
for the attributes of the name. The data structure should be designed to allow the compiler to
find the record for each name quickly and to store or retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary search tree
o Hash table
Among these, symbol tables are mostly implemented as hash tables, where the
source code symbol itself is treated as a key for the hash function and the return value is the
information about the symbol.
A symbol table may serve the following purposes, depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).

Symbol Table Entries
A compiler uses a symbol table to keep track of scope and binding information about
names. The symbol table is searched every time a name is encountered in the source text. Changes
to the table occur if a new name or new information about an existing name is discovered. A linear
list is the simplest to implement, but its performance is poor. Hashing schemes provide better
performance.

The symbol table should be able to grow dynamically during compilation, even if its initial size is fixed.
Each entry in the symbol table is for the declaration of a name.
The format of entries need not be uniform.

Each entry can be implemented as a record consisting of a sequence of consecutive words of
memory.
To keep symbol table records uniform, it may be convenient for some of the information
about a name to be kept outside the table entry, with only a pointer to this information
stored in the record.
The following information about identifiers is stored in the symbol table.
o The name.
o The data type.
o The block level.
o Its scope (local, global).
o Pointer/address.
o Its offset from the base pointer.
o Whether it is a function name, a parameter, or a variable.

Characters in a Name

There is a distinction between the token id for an identifier or name, the lexeme consisting of
the character string forming the name, and the attributes of the name.


Strings of characters may be unwieldy to work with, so compilers often use some fixed-length
representation of the name rather than the lexeme.
The lexeme is needed when a symbol table entry is set up for the first time, and when we
look up a lexeme found in the input to determine whether it is a name that has already
appeared.
A common representation of a name is a pointer to a symbol table entry for it.

If there is a modest upper bound on the length of a name, then the characters in the name can be
stored in the symbol table entry, as in Figure 4.15.

Figure 4.15: Symbol table names in fixed-size space within a record
If there is no limit on the length of a name, or if the limit is rarely reached, the indirect scheme of
Figure 4.16 can be used.

Figure 4.16: Symbol table names in a separate array
Storage Allocation Information

Information about the storage locations that will be bound to names at run time is kept in the
symbol table.
Static and dynamic allocation can be done.
Storage is allocated for code, data, stack, and heap.
COMMON blocks in Fortran are loaded separately.
The compiler plans out the activation record for each procedure.

The List Data Structure for Symbol Tables

The simplest and easiest to implement data structure for a symbol table is a linear list of
records, as shown in Figure 4.17.


We use a single array, or equivalently several arrays, to store names and their associated
information.
If the symbol table contains n names, then to find the data about a name we search, on the
average, n/2 names, so the cost of an inquiry is also proportional to n.

[Figure: an array holding the records id1, info1, id2, info2, ..., idn, infon, followed by the
unused space marked "available".]

Figure 4.17: A linear list of records.
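A linear-list symbol table along these lines might look as follows in C. This is a sketch under simplifying assumptions: the attributes of a name are reduced to a single int, the capacity is fixed, and the names symbol, lookup and insert are hypothetical.

```c
#include <string.h>

#define MAX_SYMBOLS 100

struct symbol {
    const char *name;   /* lexeme */
    int info;           /* stand-in for the attributes of the name */
};

static struct symbol table[MAX_SYMBOLS];
static int available = 0;          /* index of the first unused record */

/* Linear search; returns the record index, or -1 if the name is absent. */
int lookup(const char *name)
{
    for (int k = 0; k < available; k++)
        if (strcmp(table[k].name, name) == 0)
            return k;
    return -1;
}

/* Adds a name if it is absent; returns its index, or -1 when the table is full. */
int insert(const char *name, int info)
{
    int k = lookup(name);
    if (k >= 0)
        return k;
    if (available == MAX_SYMBOLS)
        return -1;
    table[available].name = name;
    table[available].info = info;
    return available++;
}
```

Every lookup scans the records one by one, which is the source of the cost proportional to n noted above.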

Hash Tables for Symbol Tables

Variations of the searching technique known as hashing have been implemented in many
compilers.
Open hashing is the simplest variant of this searching technique.
Even this scheme gives us the capability of performing e inquiries on n names in time
proportional to n(n + e)/m, for any constant m of our choosing.
This method is generally more efficient than linear lists and is the method of choice for symbol
tables in most situations.
The basic hashing scheme is illustrated in Figure 4.18. There are two parts to the data
structure:
1. A hash table consisting of a fixed array of m pointers to table entries.
2. Table entries organized into m separate linked lists, called buckets (some buckets may be
empty). Each record in the symbol table appears on exactly one of these lists.
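The two-part structure can be sketched in C as an array of bucket heads plus linked records. The bucket count, the hash function, and the names st_lookup and st_insert are assumptions made for illustration, not a prescribed design.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 211            /* m; the value is an arbitrary choice */

struct entry {
    const char *name;
    int info;                      /* stand-in for the attributes of the name */
    struct entry *next;            /* next record in the same bucket */
};

static struct entry *buckets[NUM_BUCKETS];   /* part 1: m pointers */

/* A simple multiplicative string hash, reduced modulo m. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NUM_BUCKETS;
}

/* Searches only the one list selected by the hash value. */
struct entry *st_lookup(const char *name)
{
    for (struct entry *e = buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Returns the existing record for name, or links a new one into its bucket. */
struct entry *st_insert(const char *name, int info)
{
    struct entry *e = st_lookup(name);
    if (e != NULL)
        return e;
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    unsigned h = hash(name);
    e->name = name;
    e->info = info;
    e->next = buckets[h];          /* part 2: new records go at the front */
    buckets[h] = e;
    return e;
}
```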


Figure 4.18: A hash table of size 210.
Representing Scope Information
A simple approach is to maintain a separate symbol table for each scope. In effect, the
symbol table for a procedure or scope is the compile-time equivalent of an activation record.
A linked list is well suited to representing the scope information.

Figure 4.19: The most recent entry for a is near the front.
4.14 DYNAMIC STORAGE ALLOCATION
The techniques needed to implement dynamic storage allocation depend mainly on how
the storage is deallocated. If deallocation is implicit, then the run-time support package is responsible
for determining when a storage block is no longer needed. The compiler has less to do if
deallocation is done explicitly by the programmer.
Explicit Allocation of Fixed-Sized Blocks
The simplest form of dynamic allocation involves blocks of a fixed size. By linking the
blocks in a list, as in Figure 4.20, allocation and deallocation can be done quickly with little or no
storage overhead.

Figure 4.20: A deallocated block is added to the list of available blocks.


Suppose that blocks are to be drawn from a contiguous area of storage. Initialization of the
area is done by using a portion of each block for a link to the next block. A pointer available points
to the first block. Allocation consists of taking a block off the list, and deallocation consists of
putting the block back on the list.
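The scheme just described can be sketched in C with a union, so that the link occupies the same bytes the user's data will later occupy. The block size, the block count, and the function names are illustrative assumptions.

```c
#include <stddef.h>

#define BLOCK_COUNT 4

union block {
    union block *next;    /* link, meaningful only while the block is free */
    char payload[32];     /* user data, meaningful while the block is allocated */
};

static union block area[BLOCK_COUNT];     /* the contiguous area */
static union block *available = NULL;     /* first block on the free list */

/* Initialization: thread every block of the area onto the available list. */
void init_blocks(void)
{
    available = NULL;
    for (int k = BLOCK_COUNT - 1; k >= 0; k--) {
        area[k].next = available;
        available = &area[k];
    }
}

/* Allocation: take a block off the list (NULL if none is left). */
void *allocate_block(void)
{
    union block *b = available;
    if (b != NULL)
        available = b->next;
    return b;
}

/* Deallocation: put the block back on the list. */
void free_block(void *p)
{
    union block *b = p;
    b->next = available;
    available = b;
}
```

Both operations take constant time, and the only storage overhead is the link, which costs nothing while a block is in use.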
Explicit Allocation of Variable-Sized Blocks
When blocks are allocated and deallocated, storage can become fragmented; that is, the heap may
consist of alternate blocks that are free and in use, as in Figure 4.21.

Figure 4.21: Free and used blocks in a heap.
The situation shown in Figure 4.21 can occur if a program allocates five blocks and then
deallocates the second and fourth, for example. Fragmentation is of no consequence if blocks are of
fixed size, but if they are of variable size, a situation like Figure 4.21 is a problem, because we
could not allocate a block larger than any one of the free blocks, even though the space is available.
First fit, worst fit, and best fit are some methods for allocating variable-sized blocks.
Implicit Deallocation
Implicit deallocation requires cooperation between the user program and the run-time
package, because the latter needs to know when a storage block is no longer in use. This
cooperation is implemented by fixing the format of storage blocks; the format of a storage block is
as shown in Figure 4.22.

Figure 4.22: The format of a block.
Reference counts: We keep track of the number of blocks that point directly to the present
block. If this count ever drops to 0, then the block can be deallocated because it cannot be referred
to. In other words, the block has become garbage that can be collected. Maintaining reference
counts can be costly in time.
Marking techniques: An alternative approach is to suspend temporarily execution of the
user program and use the frozen pointers to determine which blocks are in use.
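The reference-count idea can be sketched in C as follows. This is a minimal model under stated assumptions: each block has at most one outgoing pointer, the pointer held by the program is counted together with pointers from other blocks, and the names new_block, release and count_live_blocks are hypothetical.

```c
#include <stdlib.h>

struct block {
    int ref_count;          /* number of pointers to this block */
    struct block *child;    /* one outgoing pointer, or NULL */
};

static int live_blocks = 0;

/* Creates a block that points to child (which may be NULL). */
struct block *new_block(struct block *child)
{
    struct block *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->ref_count = 1;                 /* the pointer returned to the caller */
    b->child = child;
    if (child != NULL)
        child->ref_count++;
    live_blocks++;
    return b;
}

/* Drops one pointer to b; blocks whose count reaches 0 are deallocated. */
void release(struct block *b)
{
    while (b != NULL && --b->ref_count == 0) {
        struct block *child = b->child;
        free(b);
        live_blocks--;
        b = child;                    /* the freed block no longer points to it */
    }
}

int count_live_blocks(void)
{
    return live_blocks;
}
```

The extra increment and decrement on every pointer operation is the time cost mentioned above. This sketch also cannot reclaim blocks that point to each other in a cycle.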
4.15 STORAGE ALLOCATION IN FORTRAN
Fortran was designed to permit static storage allocation. However, there are some
issues, such as the treatment of COMMON and EQUIVALENCE declarations, that are fairly
special to Fortran.


A Fortran compiler can create a number of data areas, i.e., blocks of storage in which the
values of objects can be stored.
There is one data area for each procedure and one data area for each named COMMON
block and for blank COMMON, if used.
The symbol table must record for each name the data area in which it belongs and its offset
in that data area, that is, its position relative to the beginning of the area.
The compiler must eventually decide where the data areas go relative to the executable code
and to one another, but this choice is arbitrary, since the data areas are independent.
Data in COMMON Areas
A record is created for each block, with the first and last names of the current procedure that
are declared to be in that COMMON block.
Consider the declaration: COMMON /BLOCK1/ NAME1, NAME2
The compiler must do the following:
1. In the table for COMMON block names, create a record for BLOCK1, if one does not
already exist.
2. In the symbol table entries for NAME1 and NAME2, set a pointer to the symbol table entry
for BLOCK1, indicating that these are in COMMON and members of BLOCK1.
3. a) If the record has just now been created for BLOCK1, set a pointer in that record to the
symbol table entry for NAME1, indicating the first name in this COMMON block. Then,
link the symbol table entry for NAME1 to that for NAME2, using a field of the symbol
table reserved for linking members of the same COMMON block. Finally, set a pointer in
the record for BLOCK1 to the symbol table entry for NAME2, indicating the last found
member of that block.
b) If, however, this is not the first declaration of BLOCK1, simply link NAME1 and
NAME2 to the end of the list of names for BLOCK1. The pointer to the end of the
list for BLOCK1, appearing in the record for BLOCK1, is updated accordingly.
After a procedure has been processed, we apply the equivalence algorithm. For any name, say XYZ,
that has been equivalenced, a bit in its symbol table entry is set, indicating that XYZ has been
equivalenced to something. A memory map for each COMMON block is then created by scanning
the list of names for that block.
EQUIVALENCE Statements
The first algorithms for processing equivalence statements appeared in assemblers rather
than compilers. Since these algorithms can be a bit complex, especially when interactions between
COMMON and EQUIVALENCE statements are considered, let us treat first a situation typical of
an assembly language, where the only EQUIVALENCE statements are of the form
EQUIVALENCE A, B + offset
where A and B are the names of locations. This statement makes A denote the location that is offset
memory units beyond the location for B.
A sequence of EQUIVALENCE statements groups names into equivalence sets whose
positions relative to one another are all defined by the EQUIVALENCE statements, for example:
EQUIVALENCE A, B + 100
EQUIVALENCE C, D - 40


UNIT V CODE OPTIMIZATION AND CODE GENERATION
5.1 PRINCIPAL SOURCES OF OPTIMIZATION
A compiler optimization must preserve the semantics of the original program.

Except in very special circumstances, once a programmer chooses and implements a
particular algorithm, the compiler cannot understand enough about the program to
replace it with a substantially different and more efficient algorithm.

A compiler knows only how to apply relatively low-level semantic transformations,
using general facts such as algebraic identities like i + 0 = i.
5.1.1 Causes of Redundancy

There are many redundant operations in a typical program. Sometimes the redundancy is
available at the source level.
For instance, a programmer may find it more direct and convenient to recalculate some
result, leaving it to the compiler to recognize that only one such calculation is necessary.
But more often, the redundancy is a side effect of having written the program in a high-level
language.

As a program is compiled, each high-level data-structure access (for example, an array
element reference) expands into a number of low-level pointer-arithmetic operations, such as
the computation of the location of the (i, j)th element of a matrix A.
Accesses to the same data structure often share many common low-level operations.
Programmers are not aware of these low-level operations and cannot eliminate the
redundancies themselves.
5.1.2 A Running Example: Quicksort
Consider a fragment of a sorting program called quicksort to illustrate several important
code-improving transformations. The C program for quicksort is given below.

    void quicksort(int m, int n)
    /* recursively sorts a[m] through a[n] */
    {
        int i, j;
        int v, x;
        if (n <= m) return;
        /* fragment begins here */
        i = m - 1; j = n; v = a[n];
        while (1) {
            do i = i + 1; while (a[i] < v);
            do j = j - 1; while (a[j] > v);
            if (i >= j) break;
            x = a[i]; a[i] = a[j]; a[j] = x; /* swap a[i], a[j] */
        }
        x = a[i]; a[i] = a[n]; a[n] = x; /* swap a[i], a[n] */
        /* fragment ends here */
        quicksort(m, j); quicksort(i + 1, n);
    }

Figure 5.1: C code for quicksort


Intermediate code for the marked fragment of the program in Figure 5.1 is shown in Figure
5.2. In this example we assume that integers occupy four bytes. The assignment x = a[i] is
translated into the two three-address statements t6 = 4 * i and x = a[t6], as shown in steps (14) and (15)
of Figure 5.2. Similarly, a[j] = x becomes t10 = 4 * j and a[t10] = x in steps (20) and (21).

Figure 5.2: Three-address code for the fragment in Figure 5.1

Figure 5.3: Flow graph for the quicksort fragment of Figure 5.1
Figure 5.3 is the flow graph for the program in Figure 5.2. Block B1 is the entry node. All
conditional and unconditional jumps to statements in Figure 5.2 have been replaced in Figure 5.3
by jumps to the block of which the statements are leaders. In Figure 5.3, there are three loops.
Blocks B2 and B3 are loops by themselves. Blocks B2, B3, B4, and B5 together form a loop, with B2
the only entry point.

5.1.3 Semantics-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing
the function it computes. Common-subexpression elimination, copy propagation, dead-code
elimination, and constant folding are common examples of such function-preserving (or
semantics-preserving) transformations.

Figure 5.4: Local common-subexpression elimination, (a) before and (b) after
Some of these duplicate calculations cannot be avoided by the programmer because they lie
below the level of detail accessible within the source language. For example, block B5 shown in
Figure 5.4(a) recalculates 4 * i and 4 * j, although none of these calculations were requested
explicitly by the programmer.
5.1.4 Global Common Subexpressions
An occurrence of an expression E is called a common subexpression if E was previously
computed and the values of the variables in E have not changed since the previous computation.
We avoid recomputing E if we can use its previously computed value; that is, the variable x to
which the previous computation of E was assigned has not changed in the interim.
The assignments to t7 and t10 in Figure 5.4(a) compute the common subexpressions 4 * i
and 4 * j, respectively. These steps have been eliminated in Figure 5.4(b), which uses t6 instead of
t7 and t8 instead of t10.
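The replacement of t7 by t6 and of t10 by t8 can be mimicked in C. The two functions below are a reconstruction of a swap of a[i] and a[j] in three-address style, written as an assumption from the description in the text rather than copied from Figure 5.4. The ELEM macro is a hypothetical helper that models indexing by byte offset for four-byte elements.

```c
#include <stdint.h>

/* a[t] in the three-address code addresses byte offset t; ELEM mimics
   that for 4-byte elements. */
#define ELEM(a, t) ((a)[(t) / 4])

/* Before: 4 * i and 4 * j are each computed twice. */
void swap_before(int32_t *a, int i, int j)
{
    int t6 = 4 * i;
    int32_t x = ELEM(a, t6);
    int t7 = 4 * i;               /* same value as t6 */
    int t8 = 4 * j;
    int32_t t9 = ELEM(a, t8);
    ELEM(a, t7) = t9;
    int t10 = 4 * j;              /* same value as t8 */
    ELEM(a, t10) = x;
}

/* After local common-subexpression elimination. */
void swap_after(int32_t *a, int i, int j)
{
    int t6 = 4 * i;
    int32_t x = ELEM(a, t6);
    int t8 = 4 * j;
    int32_t t9 = ELEM(a, t8);
    ELEM(a, t6) = t9;             /* t6 used instead of t7 */
    ELEM(a, t8) = x;              /* t8 used instead of t10 */
}
```

Both versions perform the same swap; the second simply performs two fewer multiplications.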
Figure 5.5 shows the result of eliminating both global and local common subexpressions
from blocks B5 and B6 in the flow graph of Figure 5.3. We first discuss the transformation of B5
and then mention some subtleties involving arrays.
After local common subexpressions are eliminated, B5 still evaluates 4 * i and 4 * j, as shown
in Figure 5.4(b). Both are common subexpressions; in particular, the three statements
    t8 = 4 * j
    t9 = a[t8]
    a[t8] = x
in B5 can be replaced by
    t9 = a[t4]
    a[t4] = x
using t4 computed in block B3. In Figure 5.5, observe that as control passes from the evaluation of
4 * j in B3 to B5, there is no change to j and no change to t4, so t4 can be used if 4 * j is needed.


Another common subexpression comes to light in B5 after t4 replaces t8. The new
expression a[t4] corresponds to the value of a[j] at the source level. Not only does j retain its value
as control leaves B3 and then enters B5, but a[j], a value computed into a temporary t5, does too,
because there are no assignments to elements of the array a in the interim. The statements
    t9 = a[t4]
    a[t6] = t9
in B5 therefore can be replaced by
    a[t6] = t5
Analogously, the value assigned to x in block B5 of Figure 5.4(b) is seen to be the same as
the value assigned to t3 in block B2. Block B5 in Figure 5.5 is the result of eliminating common
subexpressions corresponding to the values of the source-level expressions a[i] and a[j] from B5 in
Figure 5.4(b). A similar series of transformations has been done to B6 in Figure 5.5.
The expression a[t1] in blocks B1 and B6 of Figure 5.5 is not considered a common
subexpression, although t1 can be used in both places. After control leaves B1 and before it reaches
B6, it can go through B5, where there are assignments to a. Hence, a[t1] may not have the same
value on reaching B6 as it did on leaving B1, and it is not safe to treat a[t1] as a common
subexpression.

Figure 5.5: B5 and B6 after common-subexpression elimination
5.1.5 Copy Propagation
Block B5 in Figure 5.5 can be further improved by eliminating x, using two new
transformations. One concerns assignments of the form u = v, called copy statements, or copies for
short. Copies would have arisen much sooner, because the normal algorithm for eliminating
common subexpressions introduces them, as do several other algorithms.

Figure 5.6: Copies introduced during common-subexpression elimination, parts (a) and (b)
In order to eliminate the common subexpression from the statement c = d + e in Figure
5.6(a), we must use a new variable t to hold the value of d + e. The value of variable t, instead of
that of the expression d + e, is assigned to c in Figure 5.6(b). Since control may reach c = d + e
either after the assignment to a or after the assignment to b, it would be incorrect to replace c = d + e
by either c = a or by c = b.
The idea behind the copy-propagation transformation is to use v for u, wherever possible
after the copy statement u = v. For example, the assignment x = t3 in block B5 of Figure 5.5 is a
copy. Copy propagation applied to B5 yields the code in Figure 5.7. This change may not appear to
be an improvement, but it gives us the opportunity to eliminate the assignment to x.

Figure 5.7: Basic block B5 after copy propagation
5.1.6 Dead-Code Elimination
A variable is live at a point in a program if its value can be used subsequently; otherwise, it
is dead at that point. A related idea is dead (or useless) code: statements that compute values that
never get used. While the programmer is unlikely to introduce any dead code intentionally, it may
appear as the result of previous transformations.
Deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement into dead code.
For example, copy propagation followed by dead-code elimination removes the assignment to x
from the code in Figure 5.7.
The resulting code is a further improvement of block B5 in Figure 5.5.
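The interplay of the two transformations can be shown on a small generic C function. This is an illustrative sketch, not the code of block B5; all names are hypothetical.

```c
/* Original: y is a copy of t, and the result is computed from y. */
int scaled_sum_original(int a, int b)
{
    int t = a + b;
    int y = t;            /* copy statement y = t */
    return y * 2;
}

/* After copy propagation: the use of y is replaced by t, so y = t is dead. */
int scaled_sum_propagated(int a, int b)
{
    int t = a + b;
    int y = t;            /* y is never used for the result: dead code */
    (void)y;              /* silences the unused-variable warning */
    return t * 2;
}

/* After dead-code elimination: the assignment to y is removed. */
int scaled_sum_final(int a, int b)
{
    int t = a + b;
    return t * 2;
}
```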
5.1.7 Code Motion
Loops are a very important place for optimizations, especially the inner loops where
programs tend to spend the bulk of their time. The running time of a program may be improved if
we decrease the number of instructions in an inner loop, even if we increase the amount of code
outside that loop.


An important modification that decreases the amount of code in a loop is code motion. This
transformation takes an expression that yields the same result independent of the number of times a
loop is executed (a loop-invariant computation) and evaluates the expression before the loop.
Evaluation of limit - 2 is a loop-invariant computation in the following while statement:
    while (i <= limit - 2) /* statement does not change limit */
Code motion will result in the equivalent code
    t = limit - 2
    while (i <= t) /* statement does not change limit or t */
Now, the computation of limit - 2 is performed once, before we enter the loop. Previously, there
would be n + 1 calculations of limit - 2 if we iterated the body of the loop n times.
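A complete, compilable version of the same transformation is sketched below. The loop body, which merely counts iterations, is an assumption added so that the two versions can be compared.

```c
/* Before: limit - 2 is re-evaluated on every test of the loop condition. */
int count_before(int limit)
{
    int i = 0, iterations = 0;
    while (i <= limit - 2) {      /* body does not change limit */
        iterations++;
        i++;
    }
    return iterations;
}

/* After code motion: the loop-invariant value is computed once. */
int count_after(int limit)
{
    int i = 0, iterations = 0;
    int t = limit - 2;
    while (i <= t) {              /* body does not change limit or t */
        iterations++;
        i++;
    }
    return iterations;
}
```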
5.1.8 Induction Variables and Reduction in Strength
Another important optimization is to find induction variables in loops and optimize their
computation. A variable x is said to be an "induction variable" if there is a positive or negative
constant c such that each time x is assigned, its value increases by c. For instance, i and t2 are
induction variables in the loop containing B2 of Figure 5.5. Induction variables can be computed
with a single increment (addition or subtraction) per loop iteration. The transformation of replacing
an expensive operation, such as multiplication, by a cheaper one,
such as addition, is known as strength reduction. But induction variables not only allow us
sometimes to perform a strength reduction; often it is possible to eliminate all but one of a group of
induction variables whose values remain in lock step as we go around the loop.

Figure 5.8: Strength reduction applied to 4 * j in block B3
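The effect of strength reduction on a pair such as j and t4 = 4 * j can be sketched in C. The loop shape and the function names are assumptions chosen for illustration; only the relationship t4 = 4 * j comes from the text.

```c
/* Before: t4 = 4 * j is recomputed with a multiplication on every iteration. */
int last_offset_before(int j, int steps)
{
    int t4 = 4 * j;
    for (int k = 0; k < steps; k++) {
        j = j - 1;
        t4 = 4 * j;
    }
    return t4;
}

/* After: t4 is kept in lock step with j using a subtraction. */
int last_offset_after(int j, int steps)
{
    int t4 = 4 * j;               /* initialized once, before the loop */
    for (int k = 0; k < steps; k++) {
        j = j - 1;
        t4 = t4 - 4;              /* j decreased by 1, so 4 * j decreased by 4 */
    }
    return t4;
}
```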


When processing loops, it is useful to work "inside-out"; that is, we shall start with the
inner loops and proceed to progressively larger, surrounding loops. Thus, we shall see how this
optimization applies to our quicksort example by beginning with one of the innermost loops: B3 by
itself. Note that the values of j and t4 remain in lock step; every time the value of j decreases by 1,
the value of t4 decreases by 4, because 4 * j is assigned to t4. These variables, j and t4, thus form a
good example of a pair of induction variables.
When there are two or more induction variables in a loop, it may be possible to get rid of all
but one. For the inner loop of B3 in Figure 5.5, we cannot get rid of either j or t4 completely; t4 is
used in B3 and j is used in B4. However, we can illustrate reduction in strength and a part of the
process of induction-variable elimination. Eventually, j will be eliminated when the outer loop
consisting of blocks B2, B3, B4, and B5 is considered.

Figure 5.9: Flow graph after induction-variable elimination
After reduction in strength is applied to the inner loops around B2 and B3, the only use of i and j is
to determine the outcome of the test in block B4. We know that the values of i and t2 satisfy the
relationship t2 = 4 * i, while those of j and t4 satisfy the relationship t4 = 4 * j. Thus, the test t2 >=
t4 can substitute for i >= j. Once this replacement is made, i in block B2 and j in block B3 become
dead variables, and the assignments to them in these blocks become dead code that can be
eliminated. The resulting flow graph is shown in Figure 5.9.
Note:
1. Code motion, induction-variable elimination, and strength reduction are loop optimization
techniques.
2. Common-subexpression elimination, copy propagation, dead-code elimination, and constant
folding are function-preserving transformations.


5.2 DIRECTED ACYCLIC GRAPHS (DAG)
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands
and interior nodes corresponding to operators. The difference is that a node N in a DAG has more
than one parent if N represents a common subexpression; in a syntax tree, the tree for the common
subexpression would be replicated as many times as the subexpression appears in the original
expression. Thus, a DAG not only represents expressions more succinctly, it gives the compiler
important clues regarding the generation of efficient code to evaluate the expressions.
Example: The DAG for the expression a + a * (b - c) + (b - c) * d is built by a sequence of steps.
The leaf for a has two parents, because a appears twice in the expression. More
interestingly, the two occurrences of the common subexpression b - c are represented by one node,
the node labeled -. That node has two parents, representing its two uses in the subexpressions
a * (b - c) and (b - c) * d. Even though b and c appear twice in the complete expression, their nodes each
have one parent, since both uses are in the common subexpression b - c.

Figure 5.10: DAG for the expression a + a * (b - c) + (b - c) * d
Table 5.1: Syntax-directed definition to produce syntax trees or DAGs
S.No.  PRODUCTION     SEMANTIC RULES
1)     E -> E1 + T    E.node = new Node('+', E1.node, T.node)
2)     E -> E1 - T    E.node = new Node('-', E1.node, T.node)
3)     E -> T         E.node = T.node
4)     T -> ( E )     T.node = E.node
5)     T -> id        T.node = new Leaf(id, id.entry)
6)     T -> num       T.node = new Leaf(num, num.val)
The syntax-directed definition (SDD) of Table 5.1 can construct either syntax trees or
DAGs. It was used to construct syntax trees in Example 5.10, where functions Leaf and Node
created a fresh node each time they were called. It will construct a DAG if, before creating a new
node, these functions first check whether an identical node already exists. If a previously created
identical node exists, the existing node is returned. For instance, before constructing a new node,
Node(op, left, right), we check whether there is already a node with label op, and children left and
right, in that order. If so, Node returns the existing node; otherwise, it creates a new node.


1) p1 = Leaf(id, entry-a)
2) p2 = Leaf(id, entry-a) = p1
3) p3 = Leaf(id, entry-b)
4) p4 = Leaf(id, entry-c)
5) p5 = Node('-', p3, p4)
6) p6 = Node('*', p1, p5)
7) p7 = Node('+', p1, p6)
8) p8 = Leaf(id, entry-b) = p3
9) p9 = Leaf(id, entry-c) = p4
10) p10 = Node('-', p3, p4) = p5
11) p11 = Leaf(id, entry-d)
12) p12 = Node('*', p5, p11)
13) p13 = Node('+', p7, p12)

Assume that entry-a points to the symbol table entry for a, and similarly for the other identifiers.
When the call to Leaf(id, entry-a) is repeated at step 2, the node created by the previous call is
returned, so p2 = p1. Similarly, the nodes returned at steps 8 and 9 are the same as those returned at
steps 3 and 4 (i.e., p8 = p3 and p9 = p4). Hence the node returned at step 10 must be the same as that
returned at step 5; i.e., p10 = p5.
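The "check before create" behaviour of Leaf and Node can be sketched in C with a node table that is searched before any node is added. The array representation, the linear search, and the lower-case function names leaf and node are assumptions of this sketch; identifiers are reduced to single characters.

```c
#include <assert.h>

#define MAX_NODES 64

typedef struct {
    char op;            /* '+', '-', '*', or 'i' for an identifier leaf */
    int left, right;    /* indices of the children, or -1 for a leaf */
    char name;          /* identifier name for a leaf, 0 otherwise */
} DagNode;

static DagNode nodes[MAX_NODES];
static int node_count = 0;

/* Returns the index of an identical existing node, or creates a new one. */
static int find_or_add(char op, int left, int right, char name)
{
    for (int k = 0; k < node_count; k++)
        if (nodes[k].op == op && nodes[k].left == left &&
            nodes[k].right == right && nodes[k].name == name)
            return k;
    assert(node_count < MAX_NODES);
    nodes[node_count] = (DagNode){op, left, right, name};
    return node_count++;
}

int leaf(char name)                    { return find_or_add('i', -1, -1, name); }
int node(char op, int left, int right) { return find_or_add(op, left, right, 0); }
int dag_size(void)                     { return node_count; }
```

Replaying the thirteen steps above with these functions creates only nine distinct nodes, because steps 2, 8, 9 and 10 return existing ones.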
5.3 OPTIMIZATION OF BASIC BLOCKS
We can often obtain a substantial improvement in the running time of code merely by
performing local optimization within each basic block by itself.
5.3.1 The DAG Representation of Basic Blocks
Many important techniques for local optimization begin by transforming a basic block into a
DAG (directed acyclic graph). The idea extends naturally to the collection of expressions that are
created within one basic block. We construct a DAG for a basic block as follows:
1. There is a node in the DAG for each of the initial values of the variables appearing in the
basic block.
2. There is a node N associated with each statement s within the block. The children of N are
those nodes corresponding to statements that are the last definitions, prior to s, of the
operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is the list of variables
for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose variables are live on
exit from the block; that is, their values may be used later, in another block of the flow
graph. Calculation of these "live variables" is a matter for global flow analysis.
The DAG representation of a basic block lets us perform several code-improving transformations
on the code represented by the block.
a) We can eliminate local common subexpressions, that is, instructions that compute a value
that has already been computed.
b) We can eliminate dead code, that is, instructions that compute a value that is never used.
c) We can reorder statements that do not depend on one another; such reordering may reduce
the time a temporary value needs to be preserved in a register.
d) We can apply algebraic laws to reorder operands of three-address instructions, and
sometimes thereby simplify the computation.


Example: Construct a DAG from the basic block.
    1   t1 = 4 * i
        t2 = a[t1]
        t3 = 4 * i
        t4 = b[t3]
        t5 = t2 * t4
        t6 = prod + t5
        prod = t6
        t7 = i + 1
        i = t7
        if i <= 20 goto 1

[Figure: snapshots of the DAG after statement 1, statement 2, statement 3, statement 4,
statement 5, statements 6 and 7, and statements 8 and 9, followed by the final DAG. Labels
recoverable from the figure: the node for 4 * i carries t1 and t3; the two [] nodes carry t2 and t4;
their product carries t5; the node above prod and t5 carries t6 and prod; the node above i and 1
carries t7 and i; the root <= compares that node with 20.]

Figure 5.11: Step-by-step construction of DAG


5.3.2 Finding Local Common Subexpressions
Common subexpressions can be detected by noticing, as a new node M is about to be added, whether there is an existing node N with the same children, in the same order, and with the same operator. If so, N computes the same value as M and may be used in its place.
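The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a full DAG builder: the function name `build_dag`, the tuple encoding of statements, and the `name0` convention for initial values are assumptions made for this example.

```python
def build_dag(block):
    """block: list of (dest, op, left, right) three-address statements."""
    nodes = []     # nodes[i] is ("leaf", name, None) or (op, left_id, right_id)
    index = {}     # node signature -> node id, used to find an existing node
    current = {}   # variable -> id of the node holding its most recent value
    labels = {}    # node id -> variables for which it is the last definition

    def node_for(sig):
        # Reuse a node with the same operator and the same children, in order.
        if sig not in index:
            index[sig] = len(nodes)
            nodes.append(sig)
            labels[index[sig]] = []
        return index[sig]

    def operand(name):
        # A variable not yet defined in the block refers to its initial value.
        if name not in current:
            current[name] = node_for(("leaf", name + "0", None))
        return current[name]

    for dest, op, left, right in block:
        n = node_for((op, operand(left), operand(right)))
        old = current.get(dest)
        if old is not None and dest in labels[old]:
            labels[old].remove(dest)   # old node is no longer dest's last definition
        current[dest] = n
        labels[n].append(dest)
    return nodes, labels


# The block a = b + c; b = a - d; c = b + c; d = a - d
example = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
           ("c", "+", "b", "c"), ("d", "-", "a", "d")]
nodes, labels = build_dag(example)
```

For this block the fourth statement finds the node already built for the second one, so no new node is created and d is simply attached to the existing node.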
Example 5.10: A DAG for the block
    a = b + c
    b = a - d
    c = b + c
    d = a - d
is shown in Figure 5.11. When we construct the node for the third statement c = b + c, we know that the use of b in b + c refers to the node of Figure 5.11 labeled -, because that is the most recent definition of b. Thus, we do not confuse the values computed at statements one and three.

Figure 5.11: DAG for basic block
However, the node corresponding to the fourth statement d = a - d has the operator - and the nodes with attached variables a and d0 as children. Since the operator and the children are the same as those for the node corresponding to statement two, we do not create this node, but add d to the list of definitions for the node labeled -.
If b is not live on exit from the block, then we do not need to compute that variable, and can use d to receive the value represented by the node labeled -. The block then becomes
    a = b + c
    d = a - d
    c = d + c
However, if both b and d are live on exit, then a fourth statement must be used to copy the value from one to the other.
Example 5.11: When we look for common subexpressions, we really are looking for expressions that are guaranteed to compute the same value, no matter how that value is computed. Thus, the DAG method will miss the fact that the expression computed by the first and fourth statements in the sequence
    a = b + c
    b = b - d
    c = c + d
    e = b + c
is the same, namely b0 + c0. That is, even though b and c both change between the first and last statements, their sum remains the same, because b + c = (b - d) + (c + d). The DAG for this sequence is shown in Fig. 5.12, but does not exhibit any common subexpressions.

Figure 5.12: DAG for basic block
5.3.3 Dead Code Elimination
The operation on DAG's that corresponds to dead-code elimination can be implemented as follows. We delete from a DAG any root (node with no ancestors) that has no live variables attached. Repeated application of this transformation will remove all nodes from the DAG that correspond to dead code.
Example 5.12: If, in the DAG of Fig. 5.12, a and b are live but c and e are not, we can immediately remove the root labeled e. Then, the node labeled c becomes a root and can be removed. The roots labeled a and b remain, since they each have live variables attached.

Figure 5.13: DAG after Dead Code Elimination
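The repeated root removal can be sketched as follows. The dictionary encoding of the DAG and the node names (`n_a`, `n_b`, and so on) are hypothetical, chosen only to mirror the block a = b + c; b = b - d; c = c + d; e = b + c.

```python
def eliminate_dead(children, labels, live):
    """children: node -> list of child nodes; labels: node -> attached variables.
    Returns the surviving part of the children map."""
    children = {n: list(kids) for n, kids in children.items()}
    changed = True
    while changed:
        changed = False
        has_parent = {c for kids in children.values() for c in kids}
        for n in list(children):
            is_root = n not in has_parent
            is_interior = bool(children[n])          # leaves are initial values, not code
            is_live = bool(set(labels.get(n, [])) & live)
            if is_root and is_interior and not is_live:
                del children[n]                      # dead: nothing live depends on it
                changed = True
    return children


dag = {
    "b0": [], "c0": [], "d0": [],
    "n_a": ["b0", "c0"],     # a = b + c
    "n_b": ["b0", "d0"],     # b = b - d
    "n_c": ["c0", "d0"],     # c = c + d
    "n_e": ["n_b", "n_c"],   # e = b + c
}
attached = {"n_a": ["a"], "n_b": ["b"], "n_c": ["c"], "n_e": ["e"]}
remaining = eliminate_dead(dag, attached, live={"a", "b"})
```

The first pass removes the root for e; only then does the node for c become a root, so it is removed on the second pass.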
5.3.4 The Use of Algebraic Identities
Algebraic identities represent another important class of optimizations on basic blocks. For example, we may apply arithmetic identities, such as
    x + 0 = 0 + x = x        x - 0 = x
    x * 1 = 1 * x = x        x / 1 = x
to eliminate computations from a basic block.

Another class of algebraic optimizations includes local reduction in strength, that is, replacing a more expensive operator by a cheaper one, as in:
    x^2 = x * x
    2 * x = x + x
    x / 2 = x * 0.5

A third class of related optimizations is constant folding. Here we evaluate constant expressions at compile time and replace the constant expressions by their value. Thus the expression 2 * 3.14 would be replaced by 6.28. Many constant expressions arise in practice because of the frequent use of symbolic constants in programs.
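A small sketch of how these three classes of rewrites might be applied to a single three-address operation is given below. The function `simplify` and its operand encoding (numbers for constants, strings for variable names) are assumptions for illustration, and only a handful of identities are covered.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def simplify(op, left, right):
    """Return a constant, a single operand, or a (possibly cheaper) operation."""
    def is_const(v):
        return isinstance(v, (int, float))

    # Constant folding: both operands are known at compile time.
    if is_const(left) and is_const(right) and not (op == "/" and right == 0):
        return OPS[op](left, right)

    # Arithmetic identities.
    if op == "+" and right == 0:
        return left
    if op == "+" and left == 0:
        return right
    if op == "-" and right == 0:
        return left
    if op == "*" and right == 1:
        return left
    if op == "*" and left == 1:
        return right
    if op == "/" and right == 1:
        return left

    # Reduction in strength: 2 * x becomes x + x.
    if op == "*" and right == 2:
        return ("+", left, left)
    if op == "*" and left == 2:
        return ("+", right, right)

    return (op, left, right)
```

For instance, `simplify("+", "x", 0)` yields the operand `"x"` itself, while an operation that matches none of the rules is returned unchanged.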


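5.3.5 Representation of Array References
Consider, for instance, the sequence of three-address statements:
    x = a[i]
    a[j] = y
    z = a[i]
It may appear that this code can be "optimized" by replacing the third instruction z = a[i] by the simpler z = x. However, this replacement is not safe: j could equal i, in which case the middle statement changes the value of a[i].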
The proper way to represent array accesses in a DAG is as follows.
1. An assignment from an array, like x = a[i], is represented by creating a node with operator =[] and two children representing the initial value of the array, a0 in this case, and the index i. Variable x becomes a label of this new node.
2. An assignment to an array, like a[j] = y, is represented by a new node with operator []= and three children representing a0, j and y. There is no variable labeling this node. What is different is that the creation of this node kills all currently constructed nodes whose value depends on a0. A node that has been killed cannot receive any more labels; that is, it cannot become a common subexpression.

Example 5.11: The DAG for the basic block
    x = a[i]
    a[j] = y
    z = a[i]
is shown in Figure 5.12. The node N for x is created first, but when the node labeled []= is created, N is killed. Thus, when the node for z is created, it cannot be identified with N, and a new node with the same operands a0 and i0 must be created instead.
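[Figure 5.12: The DAG for a sequence of array assignments. The =[] node labeled x is marked as killed, the []= node has children a0, j0 and y0, and a separate =[] node labeled z is created.]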

Example 5.12: Sometimes, a node must be killed even though none of its children have an array like a0 in Example 5.11 as attached variable. Likewise, a node can kill if it has a descendant that is an array, even though none of its children are array nodes. For instance, consider the three-address code
    b = 12 + a
    x = b[i]
    b[j] = y


What is happening here is that, for efficiency reasons, b has been defined to be a position in an array a. For example, if the elements of a are four bytes long, then b represents the fourth element of a. If j and i represent the same value, then b[i] and b[j] represent the same location. Therefore it is important to have the third instruction, b[j] = y, kill the node with x as its attached variable.
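[Figure 5.13: A node that kills a use of an array need not have that array as a child. The =[] node is marked as killed by the []= node; the leaves shown include 12, i0, j0 and y0.]
However, as we see in Fig. 5.13, both the killed node and the node that does the killing have a0 as a grandchild, not as a child.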
5.3.6 Pointer Assignments and Procedure Calls
When we assign indirectly through a pointer, as in the assignments
    x = *p
    *q = y
we do not know what p or q point to. In effect, x = *p is a use of every variable whatsoever, and *q = y is a possible assignment to every variable. As a consequence, the operator =* must take all nodes that are currently associated with identifiers as arguments, which is relevant for dead-code elimination. More importantly, the *= operator kills all other nodes so far constructed in the DAG.
There are global pointer analyses one could perform that might limit the set of variables a pointer could reference at a given place in the code. Even local analysis could restrict the scope of a pointer. For instance, in the sequence
    p = &x
    *p = y
we know that x, and no other variable, is given the value of y, so we don't need to kill any node but the node to which x was attached.

Procedure calls behave much like assignments through pointers. In the absence of global data-flow information, we must assume that a procedure uses and changes any data to which it has access. Thus, if variable x is in the scope of a procedure P, a call to P both uses the node with attached variable x and kills that node.
5.3.7 Reassembling Basic Blocks from DAG's
After we perform whatever optimizations are possible while constructing the DAG or by manipulating the DAG once constructed, we may reconstitute the three-address code for the basic block from which we built the DAG. For each node that has one or more attached variables, we construct a three-address statement that computes the value of one of those variables. We prefer to compute the result into a variable that is live on exit from the block. However, if we do not have global live-variable information to work from, we need to assume that every variable of the program (but not temporaries that are generated by the compiler to process expressions) is live on exit from the block.
If the node has more than one live variable attached, then we have to introduce copy statements to give the correct value to each of those variables. Sometimes, global optimization can eliminate those copies, if we can arrange to use one of two variables in place of the other.
Example 8.15: Consider again Example 5.10. If b is not live on exit from the block, then the three statements
    a = b + c
    d = a - d
    c = d + c
suffice to reconstruct the basic block. The third instruction, c = d + c, must use d as an operand rather than b, because the optimized block never computes b.
If both b and d are live on exit, or if we are not sure whether or not they are live on exit, then we need to compute b as well as d. We can do so with the sequence
    a = b + c
    d = a - d
    b = d
    c = d + c
This basic block is still more efficient than the original. Although the number of instructions is the same, we have replaced a subtraction by a copy, which tends to be less expensive on most machines. Further, it may be that by doing a global analysis, we can eliminate the use of this computation of b outside the block by replacing it by uses of d. In that case, we can come back to this basic block and eliminate b = d later. Intuitively, we can eliminate this copy if wherever this value of b is used, d is still holding the same value. That situation may or may not be true, depending on how the program recomputes d.

5.4 GLOBAL DATA FLOW ANALYSIS
Global data-flow analysis collects information about the entire program and distributes this information to each block in the flow graph. Data-flow information can be collected by setting up and solving systems of equations that relate information at various points in a program. These equations are termed data-flow equations. A typical data-flow equation has the form

    out[S] = gen[S] ∪ (in[S] - kill[S])

where
    gen[S] = definitions within S that reach the end of S.
    in[S] = definitions that reach the entry of S.
    kill[S] = definitions that never reach the end of S due to redefinitions of variables in S.
    out[S] = definitions that reach the exit of S.
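With sets of definition names, the equation translates directly into set operations. The sketch below is illustrative only; the definition labels d1, d2, d3 are hypothetical.

```python
def out_set(gen, kill, in_set):
    """out[S] = gen[S] U (in[S] - kill[S]) for sets of definition labels."""
    return gen | (in_set - kill)


# Suppose d1 and d3 both define x, and d2 defines y.
# A statement S containing only d3 generates d3 and kills d1.
reaching_after = out_set(gen={"d3"}, kill={"d1"}, in_set={"d1", "d2"})
```

Here d2 passes through S untouched, d1 is removed because x is redefined, and d3 is added.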

Paths and points
A definition point is a point in a program at which a variable is defined.
A reference point is a point in a program at which a reference to a data item is made.
An evaluation point is a point in a program at which an expression is evaluated completely.

    x = 3          Definition point for the variable x
    y = x + 5      Reference point for the variable x
    z = x + y      Evaluation point for the variable z

The number of points in a basic block is calculated as follows:

There is a point between each pair of adjacent statements in the block.
There is a point before the first statement of the block.
There is a point after the last statement of the block.

Example 8.16: Find the number of points in the basic block
    a = b + c
    b = e + u
    c = 8 * b
Number of points between adjacent statements in the block = 2
Number of points before the first statement of the block = 1
Number of points after the last statement of the block = 1
Total number of points = 2 + 1 + 1 = 4 points
A path from p1 to pn is a sequence of points p1, p2, ..., pn such that for each i between 1 and n - 1, either
1. pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or
2. pi is the end of some block and pi+1 is the beginning of a successor block.
Reaching Definitions

Definition d of a variable x: A definition d of a variable x is a statement that assigns a value to x. Other kinds of statements that may define a value for x, such as procedure calls or assignments through pointers, are called ambiguous definitions.

Use of variable x: A use of variable x means that the value of x is referenced in the evaluation of an expression.
Reachability: A definition d of a variable x reaches a point p if there is a path from the point immediately following d to p, such that d is not "killed" along that path.
Killing a variable: A definition d of a variable x is killed when there is a redefinition of the variable x.
Live variable: A variable x is live at some point p if there is a path from p to the exit along which the value of x is used before it is redefined. Otherwise the variable is said to be dead at that point.


Example 8.17: Find the reachability of variable x.
[Figure 5.15: Reaching definitions. Flow graph with blocks B1 through B7.]
Variable x is defined in B1.
Variable x is used in B4.
The definition of x reaches B4 via B3 (it is not killed in B3).
The definition of x is killed at B6 (by redefinition).
Variable x is live in B1, B3, and B4.
Variable t can reach B6 from B3 via B4 or B5. On path 1 (B3-B4-B6), t is killed. On path 2 (B3-B5-B6), t is used.
Data-flow analysis of structured programs
Flow graphs for control-flow constructs such as if-else and do-while statements have a useful property: there is a single beginning point at which control enters and a single end point from which control leaves when execution of the statement is over.

[Figure 5.16: Structured control constructs. Flow graphs for S1 ; S2, for if E then S1 else S2, and for do S1 while E.]
Conservative Estimation of Data-Flow Information
Optimizations applied to the code must be safe; that is, the data-flow facts they rely on must definitely be true.
Two main reasons cause the results of the analysis to be conservative:
1. Control flow: The data-flow equations are generated on the assumption that all paths are executable, but in practice only one branch of an if-then-else is executed at run time.
2. Pointers and aliasing: The value of a pointer may not be known in advance.

We consider the definitions reaching the beginning and end of statements with the following syntax:
    S → id = E | S ; S | if E then S else S | do S while E
    E → id + id | id

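(a) S is the assignment d: a = b + c
    gen[S] = {d}
    kill[S] = Da - {d}
    out[S] = gen[S] ∪ (in[S] - kill[S])

(b) S is the sequence S1 ; S2
    gen[S] = gen[S2] ∪ (gen[S1] - kill[S2])
    kill[S] = kill[S2] ∪ (kill[S1] - gen[S2])
    in[S1] = in[S]
    in[S2] = out[S1]
    out[S] = out[S2]

(c) S is if E then S1 else S2
    gen[S] = gen[S1] ∪ gen[S2]
    kill[S] = kill[S1] ∩ kill[S2]
    in[S1] = in[S]
    in[S2] = in[S]
    out[S] = out[S1] ∪ out[S2]

(d) S is do S1 while E
    gen[S] = gen[S1]
    kill[S] = kill[S1]
    in[S1] = in[S] ∪ gen[S1]
    out[S] = out[S1]

Here Da denotes the set of all definitions of the variable a in the program.

Figure 5.17: Data-flow equations for reaching definitions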
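The gen and kill rules for the four statement forms lend themselves to a recursive, syntax-directed computation. The following sketch assumes a nested-tuple encoding of statements and a table `defs_of` mapping each variable to all of its definitions; both are hypothetical conventions for this example, and the assignment of d1 to d4 to particular variables is likewise assumed.

```python
def gen_kill(stmt, defs_of):
    """Return (gen, kill) for a structured statement.

    stmt is one of:
      ("assign", d, var)   definition d assigns to var
      ("seq", S1, S2)      S1 ; S2
      ("if", S1, S2)       if E then S1 else S2
      ("do", S1)           do S1 while E
    """
    kind = stmt[0]
    if kind == "assign":
        _, d, var = stmt
        return {d}, defs_of[var] - {d}
    if kind == "seq":
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g2 | (g1 - k2), k2 | (k1 - g2)
    if kind == "if":
        g1, k1 = gen_kill(stmt[1], defs_of)
        g2, k2 = gen_kill(stmt[2], defs_of)
        return g1 | g2, k1 & k2
    if kind == "do":
        return gen_kill(stmt[1], defs_of)
    raise ValueError("unknown statement kind: " + str(kind))


# Assumed definition table: which definitions assign to which variable.
defs_of = {"i": {"d1", "d4", "d7"}, "j": {"d2", "d5"}, "a": {"d3", "d6"}}

# d5: j = j - 1;  if e1 then d6: a = u2 else d7: i = u3
branch = ("if", ("assign", "d6", "a"), ("assign", "d7", "i"))
body = ("seq", ("assign", "d5", "j"), branch)
if_gen, if_kill = gen_kill(branch, defs_of)
body_gen, body_kill = gen_kill(body, defs_of)
```

Note how the if statement kills nothing: a definition is killed only when it is killed along both branches, which is the conservative choice.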
Representation of sets

The sets of definitions gen[S] and kill[S] can be represented by bit vectors.
Position i of the bit vector is set to 1 if the definition numbered i is present in the set. The number of a definition can be taken as the index of its statement.
The bit-vector representation allows set operations to be implemented efficiently.
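As a rough illustration, Python integers can stand in for bit vectors: union becomes bitwise OR, and set difference becomes AND with the complement. The helper names `to_bits` and `show`, and the choice of seven definitions with d1 as the leftmost bit, are assumptions made for this sketch.

```python
N = 7  # number of definitions, d1 .. d7


def to_bits(defs):
    """Encode a set of definition numbers as an integer; d1 is the leftmost bit."""
    vector = 0
    for d in defs:
        vector |= 1 << (N - d)
    return vector


def show(vector):
    """Render a bit vector as a string of N binary digits."""
    return format(vector, "0{}b".format(N))


def out_bits(gen, kill, in_bits):
    """out = gen U (in - kill), using bitwise OR, AND and NOT."""
    return gen | (in_bits & ~kill)


gen7, kill7 = to_bits({7}), to_bits({1, 4})
result = out_bits(gen7, kill7, to_bits({1, 2, 3}))
```

Each set operation then costs one machine-level bitwise instruction per word of the vector, regardless of how many definitions the set contains.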
Consider the code

    j = j - 1       /* d5 */
    if e1 then
        a = u2      /* d6 */
    else
        i = u3      /* d7 */

[Figure 5.17: Set representation and bit-vector representation for gen[] and kill[]. Entries recoverable from the figure: for the if statement, gen = {d6, d7} (0000011) and kill = {} (0000000); for d7, gen = {d7} (0000001) and kill = {d1, d4} (1001000).]


5.5 EFFICIENT DATA FLOW ALGORITHMS
The speed of data-flow analysis can be increased by the following two techniques:
1. Depth-first ordering in iterative algorithms.
2. Structure-based data-flow analysis.
The first is an application of depth-first ordering to reduce the number of passes that the iterative algorithm takes, and the second uses intervals or the T1 and T2 transformations to generalize the syntax-directed approach.
Depth-First Ordering in Iterative Algorithms

In reaching definitions, available expressions, or live variables, any event of significance at a node will be propagated to that node along an acyclic path.
Iterative algorithms can take advantage of this acyclic nature.

If a definition d is in in[B], then there is some acyclic path from the block containing d to B such that d is in the in's and out's all along that path.
If an expression x + y is not available at the entrance to block B, then there is some acyclic path that demonstrates that fact; either the path is from the initial node and includes no statement that kills or generates x + y, or the path is from a block that kills x + y and along the path there is no subsequent generation of x + y.

For live variables, if x is live on exit from block B, then there is an acyclic path from B to a use of x, along which there are no definitions of x.
If a use of x is reached from the end of block B along a path with a cycle, we can eliminate that cycle to find a shorter path along which the use of x is still reached from B.
Procedure
1. First visit the root node of the tree, e.g. node 1.
2. From each node, move to the first right-hand-side child that has not yet been visited.
3. After reaching the maximum depth, backtrack through the parent nodes to visit the nodes that were missed.

[Figure 5.18: Depth-first traversal for the given tree (nodes 1 to 10).]
The order in which the traversal moves along the edges of the above tree is:
1 → 3 → 4 → 6 → 7 → 8 → 10 → 8 → 9 → 8 → 7 → 6 → 4 → 5 → 4 → 3 → 1 → 2 → 1
Steps:

After node 4, either 5 or 6 could be visited next; we chose 6.
After visiting node 10, backtrack to 8 to visit 9.
The definition d from out[1] will reach in[3], out[3] will reach in[4], and so on.
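The traversal can be sketched as a recursive depth-first search that records both the order of first visits and the depth-first ordering (the reverse of the order in which nodes are finished). The successor table below encodes only the tree edges implied by the traversal sequence in the text, with successors listed in the order they are tried; it is an assumed reconstruction, not the complete flow graph.

```python
def depth_first_order(succ, root):
    """Return (first-visit order, depth-first ordering) starting at root."""
    visited = set()
    visit_order = []
    postorder = []

    def dfs(n):
        visited.add(n)
        visit_order.append(n)
        for s in succ.get(n, []):
            if s not in visited:
                dfs(s)
        postorder.append(n)      # n is finished once all successors are done

    dfs(root)
    return visit_order, postorder[::-1]


# Tree edges implied by the traversal 1, 3, 4, 6, 7, 8, 10, 9, 5, 2.
succ = {1: [3, 2], 3: [4], 4: [6, 5], 6: [7], 7: [8], 8: [10, 9]}
visit_order, dfo = depth_first_order(succ, 1)
```

Processing blocks in depth-first ordering lets a forward analysis such as reaching definitions propagate a fact along a whole acyclic path within a single pass.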


Structure-Based Data Flow Analysis
We can implement data-flow algorithms that visit nodes no more times than the interval depth of the flow graph. The ideas exposed here apply to syntax-directed data-flow algorithms for all sorts of structured control statements.
This algorithm handles blocks with multiple exits.
Gen_{R,B} denotes the definitions generated in region R that reach the end of basic block B.
Kill_{R,B} denotes the definitions killed in region R on the way to the end of basic block B.
The transfer function Trans_{R,B}(S) of a definition set S is the set of definitions that reach the end of block B by traveling along paths wholly within R.
The definitions reaching the end of block B fall into two classes.
1. Those that are generated within R and propagate to the end of B independent of S.
2. Those that are not generated in R, but that also are not killed along some path from the header of R to the end of B, and therefore are in Trans_{R,B}(S) if and only if they are in S.
Thus, we may write the transfer function in the form:
    Trans_{R,B}(S) = Gen_{R,B} ∪ (S - Kill_{R,B})
Case 1:
If the region consists of the single basic block B, then the transfer function of the region is the same as the transfer function of block B.
    Gen_{B,B} = Gen[B]
    Kill_{B,B} = Kill[B]

Case 2:
The region R is formed when R1 consumes R2. There are no edges from R2 to R1. The header of R is the header of R1. R2 does not affect the transfer function of R1.
    Gen_{R,B} = Gen_{R1,B}
    Kill_{R,B} = Kill_{R1,B}
for all B in R1.

Figure 5.19: Region building by T2
For B in R2, a definition can reach the end of B if any of the following conditions hold:
1. The definition is generated within R2.
2. The definition is generated within R1, reaches the end of some predecessor of the header of R2, and is not killed going from the header of R2 to B.
3. The definition is in the set S available at the header of R1, not killed going to some predecessor of the header of R2, and not killed going from the header of R2 to B.


5.6 ISSUES IN DESIGN OF A CODE GENERATOR
The most important criterion for a code generator is that it produce correct code. The following issues arise during the code generation phase.
1 Input to the Code Generator
2 The Target Program
3 Memory Management
4 Instruction Selection
5 Register Allocation
6 Evaluation Order
1 Input to the Code Generator
The input to the code generator is the intermediate representation (IR) of the source program produced by the front end, along with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.
The choices for the intermediate representation include the following:
Three-address representations such as quadruples, triples, and indirect triples.
Virtual machine representations such as bytecodes and stack-machine code.
Linear representations such as postfix notation.
Graphical representations such as syntax trees and DAG's.
The front end has scanned, parsed, and translated the source program into a relatively low-level IR, so that the values of the names appearing in the IR can be represented by quantities that the target machine can directly manipulate, such as integers and floating-point numbers.
We assume that all syntactic and static semantic errors have been detected, that the necessary type checking has taken place, and that type-conversion operators have been inserted wherever necessary. The code generator can therefore proceed on the assumption that its input is error free.
2 The Target Program
The output of the code generator is the target program, which runs on the target machine.
The instruction-set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high-quality machine code.
The most common target-machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack based.

A RISC machine typically has many registers, three-address instructions, simple addressing modes, and a relatively simple instruction-set architecture. In contrast, a CISC machine typically has few registers, two-address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects.

In a stack-based machine, operations are done by pushing operands onto a stack and then performing the operations on the operands at the top of the stack. To achieve high performance the top of the stack is kept in registers.
The JVM is a software interpreter for Java bytecodes, an intermediate language produced by Java compilers. The interpreter provides software compatibility across multiple platforms, a major factor in the success of Java. To overcome the high performance penalty of interpretation, just-in-time (JIT) Java compilers have been created.

The output of the code generator may be:

Absolute machine-language program: It can be placed in a fixed memory location and immediately executed.


Relocatable machine-language program: It allows subprograms (object modules) to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader. The compiler must provide explicit relocation information to the loader if automatic relocation is not possible.

Assembly-language program: The process of code generation is somewhat easier, but the assembly code must be converted into executable machine code with the help of an assembler.

3 Memory Management

Names in the source program are mapped to addresses of data objects in run-time memory by both the front end and the code generator.
Memory management uses the symbol table to get information about names.

The amount of memory required by the declared identifiers is calculated, and storage space is reserved in memory at run time.
Labels in three-address code are converted into equivalent memory addresses.
For instance, if a reference to goto j is encountered in three-address code, then the appropriate jump instruction can be generated by computing the memory address for label j.

Some instruction addresses can be calculated only at run time, that is, after the program has been loaded.

4 Instruction Selection

The code generator must map the IR program into a code sequence that can be executed by the target machine. The complexity of performing this mapping is determined by factors such as
The level of the intermediate representation (IR).
The nature of the instruction-set architecture.
The desired quality of the generated code.
If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement-by-statement code generation, however, often produces poor code that needs further optimization. If the IR reflects some of the low-level details of the underlying machine, then the code generator can use this information to generate more efficient code sequences.
The uniformity and completeness of the instruction set are important factors. The selection of instructions depends on the instruction set of the target machine. Instruction speeds and machine idioms are other important factors in the selection of instructions.
If we do not care about the efficiency of the target program, instruction selection is straightforward. For each type of three-address statement, we can design a code skeleton that defines the target code to be generated for that construct.
For example, every three-address statement of the form x = y + z, where x, y, and z are statically allocated, can be translated into the code sequence
    LD  R0, y        // R0 = y       (load y into register R0)
    ADD R0, R0, z    // R0 = R0 + z  (add z to R0)
    ST  x, R0        // x = R0       (store R0 into x)

This strategy often produces redundant loads and stores. For example, the sequence of three-address statements
    a = b + c
    d = a + e
would be translated into the following code:
    LD  R0, b        // R0 = b
    ADD R0, R0, c    // R0 = R0 + c
    ST  a, R0        // a = R0
    LD  R0, a        // R0 = a
    ADD R0, R0, e    // R0 = R0 + e
    ST  d, R0        // d = R0

Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used.
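The template-based translation and the redundant load it produces can be reproduced with a short sketch. The tuple encoding of instructions and the function names `naive_codegen` and `drop_redundant_loads` are hypothetical; the second function is a simple peephole pass that is only valid inside a single basic block with no intervening labels.

```python
def naive_codegen(stmts):
    """Translate each x = y op z with the fixed template LD / op / ST."""
    code = []
    for x, y, op, z in stmts:
        code.append(("LD", "R0", y))          # R0 = y
        code.append((op, "R0", "R0", z))      # R0 = R0 op z
        code.append(("ST", x, "R0"))          # x = R0
    return code


def drop_redundant_loads(code):
    """Remove a load that immediately follows a store of the same
    register to the same memory location."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        if (ins[0] == "LD" and prev is not None and prev[0] == "ST"
                and prev[1] == ins[2] and prev[2] == ins[1]):
            continue                          # register already holds the value
        out.append(ins)
    return out


# a = b + c ; d = a + e
stmts = [("a", "b", "ADD", "c"), ("d", "a", "ADD", "e")]
naive = naive_codegen(stmts)
improved = drop_redundant_loads(naive)
```

Removing the store to a as well would need liveness information, since it is redundant only if a is not used later.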
The quality of the generated code is usually determined by its speed and size. On most machines, a given IR program can be implemented by many different code sequences, with significant cost differences between the different implementations. A naive translation of the intermediate code may therefore lead to correct but unacceptably inefficient target code.
5 Register Allocation
A key problem in code generation is deciding what values to hold in what registers. Registers are the fastest computational unit on the target machine, but we usually do not have enough of them to hold all values. Values not held in registers need to reside in memory. Instructions involving register operands are invariably shorter and faster than those involving operands in memory, so efficient utilization of registers is particularly important.
The use of registers is often subdivided into two subproblems:
1. Register allocation: During register allocation, we select the set of variables that will reside in registers at each point in the program.
2. Register assignment: During register assignment, we pick the specific register in which each such variable will reside.
Finding an optimal assignment of registers to variables is difficult, even with single-register machines. Mathematically, the problem is NP-complete. The problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register-usage conventions be observed. Certain machines require register pairs for some operands and results.
Consider the three-address code sequence
    t = a + b
    t = t * c
    t = t / d
An efficient machine-code sequence using only one register, R0, is
    LD  R0, a
    ADD R0, b
    MUL R0, c
    DIV R0, d
    ST  t, R0

6 Evaluation Order
The evaluation order is an important factor in generating efficient target code. Some computation orders require fewer registers to hold intermediate results than others. However, picking a best order in the general case is a difficult NP-complete problem. We can avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.


5.7 A SIMPLE CODE GENERATOR ALGORITHM
A code generator generates target code for a sequence of three-address instructions. One of the primary issues during code generation is deciding how to use registers to best advantage. The best target code uses the minimum number of registers in execution.
There are four principal uses of registers:
The operands of an operation must be in registers in order to perform the operation.
Registers make good temporaries used only within a single basic block.
Registers are used to hold (global) values that are computed in one basic block and used in other blocks.
Registers are often used to help with run-time storage management.

The machine instructions are of the form
    LD reg, mem
    ST mem, reg
    OP reg, reg, reg
Register and Address Descriptors

For each available register, a register descriptor keeps track of the variable names whose current value is in that register. Initially all register descriptors are empty. As the code generation progresses, each register will hold the value of zero or more names.
For each program variable, an address descriptor keeps track of the location or locations where the current value of that variable can be found. The location might be a register, a memory address, a stack location, or some set of more than one of these. The information can be stored in the symbol-table entry for that variable name.

Function getReg

An essential part of the algorithm is a function getReg(I), which selects registers for each memory location associated with the three-address instruction I.
Function getReg has access to the register and address descriptors for all the variables of the basic block, and may also have access to certain useful data-flow information such as the variables that are live on exit from the block.
For a three-address instruction such as x = y + z, a possible improvement to the algorithm is to generate code for both x = y + z and x = z + y whenever + is a commutative operator, and pick the better code sequence.

Machine Instructions for Operations
For a three-address instruction with an operation (+, -, *, /, ...) such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (by the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
Machine Instructions for Copy Statements
Consider an important special case: a three-address copy statement of the form x = y.

We assume that getReg will always choose the same register for both x and y. If y is not already in that register Ry, then generate the machine instruction LD Ry, y.
If y was already in Ry, we do nothing.


Managing Register and Address Descriptors

As the code-generation algorithm issues load, store, and other machine instructions, it needs to update the register and address descriptors. The rules are as follows:
1. For the instruction LD R, x:
   a. Change the register descriptor for register R so it holds only x.
   b. Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation OP R, R, R such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z:
   a. Change the register descriptor for Rx so that it holds only x.
   b. Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is not now in the address descriptor for x.
   c. Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing descriptors as for all load statements (per rule 1):
   a. Add x to the register descriptor for Ry.
   b. Change the address descriptor for x so that its only location is Ry.
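The four update rules can be sketched as methods on a small bookkeeping class. The class `Descriptors` and its method names are hypothetical. The sketch follows the rules as stated, with one labeled addition: on a load it also removes the register from the address descriptors of the variables it previously held, which the rules above leave implicit. A memory location is represented by the variable's own name.

```python
class Descriptors:
    def __init__(self, in_memory, temporaries=()):
        self.reg = {}                                    # register -> set of variables
        self.addr = {v: {v} for v in in_memory}          # variable -> set of locations
        self.addr.update({t: set() for t in temporaries})

    def _forget(self, r, keep=None):
        # Remove register r from every address descriptor except keep's.
        for v, locations in self.addr.items():
            if v != keep:
                locations.discard(r)

    def load(self, r, x):            # rule 1: LD r, x
        self._forget(r)              # assumption: r no longer holds its old contents
        self.reg[r] = {x}
        self.addr[x].add(r)

    def store(self, x, r):           # rule 2: ST x, r
        self.addr[x].add(x)

    def op(self, rx, x):             # rule 3: OP rx, ry, rz computing x
        self.reg[rx] = {x}
        self.addr[x] = {rx}
        self._forget(rx, keep=x)

    def copy(self, x, ry):           # rule 4: x = y, with y already in ry
        self.reg.setdefault(ry, set()).add(x)
        self.addr[x] = {ry}


d = Descriptors(in_memory=["a", "b", "c", "d"], temporaries=["t", "u", "v"])
d.load("R1", "a")                    # LD  R1, a
d.load("R2", "b")                    # LD  R2, b
d.op("R2", "t")                      # SUB R2, R1, R2   (t = a - b)
```

After these three instructions R2 holds only t, and b is once again found only in memory because its register copy was overwritten.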
Example 5.16: Let us translate the basic block consisting of the three-address statements
    t = a - b
    u = a - c
    v = t + u
    a = d
    d = v + u
where t, u, and v are temporaries, local to the block, while a, b, c, and d are variables that are live on exit from the block. When a register's value is no longer needed, we reuse the register. A summary of all the machine-code instructions generated is in Figure.


SUMMARY
THE PRINCIPAL SOURCES OF OPTIMIZATION
    Semantics-Preserving Transformations (Functions): safeguard the original program meaning
    Global Common Subexpressions
    Copy Propagation
    Dead-Code Elimination
    Code Motion/Movement
    Induction Variables and Reduction in Strength
OPTIMIZATION OF BASIC BLOCKS
    The DAG Representation of Basic Blocks
    Finding Local Common Subexpressions
    Dead Code Elimination
    The Use of Algebraic Identities
    Representation of Array References
    Pointer Assignments and Procedure Calls
    Reassembling Basic Blocks from DAG's
GLOBAL DATA FLOW ANALYSIS
    Paths and points
    Reaching Definitions
    Data-flow analysis of structured programs
    Conservative Estimation of Data-Flow Information
    Representation of sets
EFFICIENT DATA FLOW ALGORITHMS
    Depth-First Ordering in Iterative Algorithms
    Structure-Based Data Flow Analysis
ISSUES IN THE DESIGN OF A CODE GENERATOR
    Input to the Code Generator
    The Target Program
    Memory Management
    Instruction Selection
    Register Allocation
    Evaluation Order
PEEPHOLE OPTIMIZATION
    Eliminating Redundant Loads and Stores
    Eliminating Unreachable Code
    Flow-of-Control Optimizations
    Algebraic Simplification and Reduction in Strength
    Use of Machine Idioms
DATA FLOW ANALYSIS
    The Data-Flow Abstraction
    The Data-Flow Analysis Schema
    Data-Flow Schemas on Basic Blocks
    Reaching Definitions
    Live-Variable Analysis
    Available Expressions
LOOP OPTIMIZATION
    Code Motion: while (i < max - 1) { sum = sum + a[i] } => n = max - 1; while (i < n) { sum = sum + a[i] }
    Induction Variables and Strength Reduction: only one induction variable in the loop, either i++ or j = j + 2; * replaced by +
    Loop-invariant method
    Loop unrolling
    Loop fusion
COMPILE-TIME EVALUATION
    Constant folding: Computation of constants is done at compile time, e.g. Clength = 2 * (22 / 7) * r.
    Constant propagation: The value of a variable is substituted and computed at compile time.
    E.g. pi = 3.14; r = 6; Area = pi * r * r; then Area is computed as 3.14 * 6 * 6.
    Variable propagation: One variable is replaced by another at compile time.
    E.g. x = pi; Area = x * r * r; then Area is computed as pi * r * r.

DR. PAULS ENGINEERING COLLEGE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Year & Semester  : III / VI
Subject Code     : CS6660
Subject Name     : COMPILER DESIGN
Degree & Branch  : B.E. C.S.E.

UNIT I INTRODUCTION TO COMPILERS

1. What is a compiler?
A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. The compiler reports to its user the presence of errors in the source program.
2. What are the two parts of a compilation? Explain briefly.
Analysis and synthesis are the two parts of compilation.
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
3. List the subparts or phases of the analysis part.
Analysis consists of three phases:
Linear Analysis.
Hierarchical Analysis.
Semantic Analysis.
4. Depict diagrammatically how a language is processed.
Skeletal source program → Preprocessor → Source program → Compiler → Target assembly program → Assembler → Relocatable machine code → Loader/link-editor (together with library and relocatable object files) → Absolute machine code
5. What is linear analysis?
Linear analysis is one in which the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning. It is also called lexical analysis or scanning.

6.Listthevariousphasesofacompiler.
Thefollowingarethevariousphasesofacompiler:
LexicalAnalyzer
SyntaxAnalyzer
SemanticAnalyzer
Intermediatecodegenerator
Codeoptimizer
Codegenerator

7.Whataretheclassificationsofacompiler?
Compilersareclassifiedas:
pass
pass
andgo
8.Whatisasymboltable?
Asymboltableisadatastructurecontainingarecordforeachidentifier,withfieldsforthe
attributesoftheidentifier.Thedatastructureallowsustofindtherecordforeachidentifier
quicklyandtostoreorretrievedatafromthatrecordquickly.
Wheneveranidentifierisdetectedbyalexicalanalyzer,itisenteredintothesymboltable.
Theattributesofanidentifiercannotbedeterminedbythelexicalanalyzer.
9. Mentionsomeofthecousinsofacompiler.
Cousinsofthecompilerare:
Preprocessors
Assemblers
LoadersandLinkEditors
10. Listthephasesthatconstitutethefrontendofacompiler.
Thefrontendconsistsofthosephasesorpartsofphasesthatdependprimarilyonthe
sourcelanguageandarelargelyindependentofthetargetmachine.Theseinclude
LexicalandSyntacticanalysis
Thecreationofsymboltable
Semanticanalysis
Generationofintermediatecode
Acertainamountofcodeoptimizationcanbedonebythefrontendaswell.Alsoincludeserror
handlingthatgoesalongwitheachofthesephases.
11. Mention the back end phases of a compiler.
The back end of a compiler includes those portions that depend on the target machine; generally these portions do not depend on the source language, just the intermediate language. These include:
Code optimization
Code generation, along with error handling and symbol table operations.
12. Define compiler-compiler.
Systems that help with the compiler-writing process are often referred to as compiler-compilers, compiler-generators or translator-writing systems.
They are largely oriented around a particular model of languages, and they are suitable for generating compilers for languages of a similar model.

13. List the various compiler construction tools.
The following is a list of some compiler construction tools, with the phase each one supports:
Scanner generators [Lexical Analysis]
Parser generators [Syntax Analysis]
Syntax-directed translation engines [Intermediate Code]
Data-flow analysis engines [Code Optimization]
Code-generator generators [Code Generation]
Compiler construction toolkits [For all phases]

14. List out language processors.
(i) Compiler
(ii) Interpreter
(iii) Hybrid compiler
(iv) Language processing system (preprocessors, assemblers, linkers and loaders)
15. List out some programming language basics.
To design an efficient compiler we should know some language basics. Important concepts that are used in most popular programming languages, such as C, C++, C# and Java, are listed below:
The static/dynamic distinction
Environments and states
Static scope and block structure
Explicit access control
Dynamic scope
Parameter passing mechanisms
Aliasing

UNIT II LEXICAL ANALYSIS
1. Write the needs/roles/functions of the lexical analyzer.

It produces a stream of tokens.
It eliminates comments and whitespace.
It keeps track of line numbers.
It reports the errors encountered while generating tokens.
It stores information about identifiers, keywords, constants and so on in the symbol table.

2. Differentiate tokens, patterns and lexemes.
Token: A sequence of characters that has a collective meaning.
Pattern: There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.
Lexeme: A sequence of characters in the source program that is matched by the pattern for a token.
2. List the operations on languages.
Union: L ∪ M = {s | s is in L or s is in M}
Concatenation: LM = {st | s is in L and t is in M}
Kleene closure: L* (zero or more concatenations of L)
Positive closure: L+ (one or more concatenations of L)
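The four operations can be tried out on small finite languages. The sketch below is only an illustration: the function names are hypothetical, and because L* and L+ are infinite in general, the closure is cut off at an assumed bound max_power.

```python
from itertools import product

def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s, t in product(L, M)}

def closure(L, max_power, positive=False):
    """Finite approximation of L* (or L+ when positive=True).

    Only concatenations of at most max_power strings of L are produced;
    the bound is an assumption made to keep the result finite.
    """
    result = set() if positive else {""}
    power = {""}                      # L^0
    for _ in range(max_power):
        power = concat(power, L)      # L^(k+1) = L^k L
        result |= power
    return result
```

For example, with L = {"a", "b"} and M = {"0", "1"}, concat(L, M) yields the four strings a0, a1, b0 and b1.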
3. Write a regular expression for an identifier.
An identifier is defined as a letter followed by zero or more letters or digits.
The regular expression for an identifier is given as letter (letter | digit)*
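The same pattern can be checked with Python's re module. This is a minimal sketch that assumes "letter" means an ASCII letter and "digit" means 0 to 9; the name is_identifier is hypothetical.

```python
import re

# letter (letter | digit)* written with character classes.
# Assumption: letters are A-Z and a-z, digits are 0-9.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(text):
    """Return True when the whole string matches letter (letter | digit)*."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under these assumptions is_identifier("count1") holds, while "1count" and the empty string are rejected.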
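4. Mention the various notational shorthands for representing regular expressions.
One or more instances (+)
Zero or one instance (?)
Character classes ([abc], where a, b and c are alphabet symbols, denotes the regular expression a | b | c.)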
5. What is the function of hierarchical analysis?
Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. It is also termed parsing.
6. What does semantic analysis do?
Semantic analysis is one in which certain checks are performed to ensure that the components of a program fit together meaningfully.
It mainly performs type checking.
7. List the various error recovery strategies for lexical analysis.

Possible error recovery actions are:

10. Define nullable(n), firstpos(n), lastpos(n) and followpos(p).
1. nullable(n) is true for a syntax tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1 a2 ... an in L((r)#) such that for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.

12. Write the algorithm for converting a regular expression directly to a DFA.
Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D, by the following procedure:
initialize Dstates to contain only the unmarked state firstpos(n0),
where n0 is the root of syntax tree T for (r)#;
while (there is an unmarked state S in Dstates)
{
    mark S;
    for (each input symbol a)
    {
        let U be the union of followpos(p) for all p in S that correspond to a;
        if (U is not in Dstates)
            add U as an unmarked state to Dstates;
        Dtran[S, a] = U;
    }
}
The states of D are sets of positions in T. Initially, each state is "unmarked," and a state becomes "marked" just before we consider its out-transitions. The start state of D is firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position for the endmarker symbol #.
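The method can be traced on the regular expression (a|b)*abb. The following is a rough sketch rather than a full implementation: the syntax tree for (a|b)*abb# is written out by hand instead of being parsed, and the helper names (leaf, cat, alt, star, analyse, build_dfa, accepts) are hypothetical.

```python
from collections import defaultdict

# Hand-built syntax tree for the augmented expression (a|b)*abb#.
# Leaves carry a symbol and a position; positions are numbered 1..6.
def leaf(sym, pos): return ("leaf", sym, pos)
def cat(left, right): return ("cat", left, right)
def alt(left, right): return ("or", left, right)
def star(child): return ("star", child)

TREE = cat(cat(cat(cat(star(alt(leaf("a", 1), leaf("b", 2))),
                       leaf("a", 3)), leaf("b", 4)), leaf("b", 5)),
           leaf("#", 6))

followpos = defaultdict(set)   # position -> set of positions
symbol_at = {}                 # position -> input symbol

def analyse(n):
    """Return (nullable, firstpos, lastpos) of node n and fill followpos."""
    kind = n[0]
    if kind == "leaf":
        symbol_at[n[2]] = n[1]
        return False, {n[2]}, {n[2]}
    if kind == "or":
        n1, f1, l1 = analyse(n[1])
        n2, f2, l2 = analyse(n[2])
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyse(n[1])
        n2, f2, l2 = analyse(n[2])
        for p in l1:                       # cat rule for followpos
            followpos[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    _, f, l = analyse(n[1])                # star node
    for p in l:                            # star rule for followpos
        followpos[p] |= f
    return True, f, l

def build_dfa(tree, alphabet):
    """Step 3 of the method: build Dstates and Dtran from followpos."""
    _, first, _ = analyse(tree)
    start = frozenset(first)
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        S = unmarked.pop()                 # mark S
        for a in alphabet:
            U = frozenset(q for p in S if symbol_at[p] == a
                          for q in followpos[p])
            if not U:
                continue                   # no transition on a (dead state omitted)
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[S, a] = U
    end = next(p for p, s in symbol_at.items() if s == "#")
    accepting = {S for S in dstates if end in S}
    return start, dtran, accepting

def accepts(text, start, dtran, accepting):
    """Run the DFA on text."""
    state = start
    for ch in text:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in accepting

START, DTRAN, ACCEPTING = build_dfa(TREE, "ab")
```

With this tree the start state is {1, 2, 3} and four states are produced in total; the only accepting state is the one containing position 6.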
13. Write the structure of Lex programs.

A Lex program has the following form:
DECLARATIONS
%%
TRANSLATION RULES
%%
AUXILIARY FUNCTIONS

14. Construct a DFA, and firstpos and lastpos for the nodes, for the regular expression r = (a|b)*abb

UNIT III SYNTAX ANALYSIS
1. Define parser.
A parser performs hierarchical analysis, in which the tokens are grouped hierarchically into nested collections with collective meaning.
This is also termed parsing.
2. Mention the basic issues in parsing.
There are two important issues in parsing.
3. Why are lexical and syntax analyzers separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.
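4. Define a context-free grammar.
A context-free grammar G is a collection of the following:
V, a set of nonterminals
T, a set of terminals
S, a start symbol
P, a set of production rules
G can be represented as G = (V, T, S, P).
Production rules are given in the following form:
Nonterminal → (V ∪ T)*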
5. Briefly explain the concept of derivation.
Derivation from S means generation of a string w from S. For constructing a derivation two things are important:
i) Choice of a nonterminal from several others.
ii) Choice of a rule from the production rules for the corresponding nonterminal.
Instead of choosing an arbitrary nonterminal, one can choose
i) either leftmost derivation: the leftmost nonterminal in a sentential form
ii) or rightmost derivation: the rightmost nonterminal in a sentential form
6. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of the language L(G),
i.e. there is more than one leftmost derivation, or more than one rightmost derivation, for the given sentence.
7. What is an operator precedence parser?
A grammar is said to be an operator precedence grammar if it possesses the following properties:
1. No production has ε on the right side.
2. There should not be any production rule possessing two adjacent nonterminals on the right-hand side.
8. List the properties of LR parsers.
1. LR parsers can be constructed to recognize most of the programming languages for which a context-free grammar can be written.
2. The class of grammars that can be parsed by an LR parser is a superset of the class of grammars that can be parsed using predictive parsers.
3. LR parsers work using a non-backtracking shift-reduce technique, yet they are efficient.
9. Mention the types of LR parser.
SLR parser: simple LR parser
LALR parser: lookahead LR parser
Canonical LR parser
10. What are the problems with top-down parsing?
The following are the problems associated with top-down parsing:
Backtracking
Left recursion
Left factoring
Ambiguity
11. Write the algorithm for FIRST and FOLLOW.
FIRST
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
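The rules above can be applied repeatedly until no set changes. The following sketch does this for the usual expression grammar; the grammar encoding (a dictionary from nonterminal to a list of right-hand-side tuples, with () for an ε-production) and all function names are assumptions made for the example.

```python
EPS = "ε"   # marker for the empty string
END = "$"   # input right endmarker

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols (FIRST rule 3 generalised)."""
    out = set()
    for X in symbols:
        out |= first[X] - {EPS}
        if EPS not in first[X]:
            return out
    out.add(EPS)            # every symbol can vanish (or the sequence is empty)
    return out

def compute_first_follow(grammar, start):
    nonterminals = set(grammar)
    terminals = {s for bodies in grammar.values()
                 for body in bodies for s in body} - nonterminals
    first = {t: {t} for t in terminals}              # FIRST rule 1
    first.update({A: set() for A in nonterminals})
    follow = {A: set() for A in nonterminals}
    follow[start].add(END)                           # FOLLOW rule 1
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)   # FIRST rules 2 and 3
                if not new.issubset(first[A]):
                    first[A] |= new
                    changed = True
                for i, B in enumerate(body):
                    if B not in nonterminals:
                        continue
                    beta = first_of_string(body[i + 1:], first)
                    add = beta - {EPS}               # FOLLOW rule 2
                    if EPS in beta:                  # FOLLOW rule 3
                        add |= follow[A]
                    if not add.issubset(follow[B]):
                        follow[B] |= add
                        changed = True
    return first, follow

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST, FOLLOW = compute_first_follow(GRAMMAR, "E")
```

For this grammar FIRST(E) comes out as {(, id} and FOLLOW(F) as {+, *, ), $}.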
12. List the advantages and disadvantages of operator precedence parsing.
Advantages
This type of parsing is simple to implement.
Disadvantages
1. An operator like minus has two different precedences (unary and binary). Hence it is hard to handle tokens like the minus sign.
2. This kind of parsing is applicable to only a small class of grammars.
13. What is the dangling else problem?
The dangling else problem is the ambiguity in the following grammar, in which an else can be matched with more than one preceding if:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
14. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler Compiler, which is basically a utility available from UNIX.
Basically YACC is an LALR parser generator.
It can report conflicts or ambiguities in the form of error messages.
15. What is meant by handle pruning?
A rightmost derivation in reverse can be obtained by handle pruning.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation

S = γ0 => γ1 => ... => γn-1 => γn = w


16. Define LR(0) items.
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. Thus, production A → XYZ yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
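The items of a production can be listed mechanically by sliding the dot across the right side. A small sketch follows; the function name lr0_items and the string format of an item are hypothetical.

```python
def lr0_items(head, body):
    """Return every LR(0) item of the production head -> body.

    body is a list of grammar symbols; an empty list stands for an
    ε-production, which yields the single item 'head -> .'.
    """
    items = []
    for dot in range(len(body) + 1):          # one item per dot position
        rhs = list(body[:dot]) + ["."] + list(body[dot:])
        items.append(f"{head} -> {' '.join(rhs)}")
    return items
```

A production with k symbols on the right side therefore has k + 1 items.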
17. What is meant by viable prefixes?
The set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.
18. Define handle.
A handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.
A handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is, if S =>* αAw => αβw, then A → β in the position following α is a handle of αβw.
19. What are kernel and nonkernel items?
Kernel items, which include the initial item, S' → .S, and all items whose dots are not at the left end.
Nonkernel items, which have their dots at the left end.
20. What is phrase-level error recovery?
Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack.


UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
1. What are the benefits of intermediate code generation?
A compiler for different machines can be created by attaching a different back end for each machine to the existing front end.
A compiler for different source languages can be created by providing different front ends for the corresponding source languages to the existing back end.
A machine-independent code optimizer can be applied to the intermediate code in order to optimize the code generation.
2. What are the various types of intermediate code representation?
There are mainly three types of intermediate code representations:
Syntax tree
Postfix
Three-address code

3. Define backpatching.
Backpatching is the activity of filling in unspecified label information using appropriate semantic actions during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1, p2) and backpatch(p, i).
4. Mention the functions that are used in backpatching.
mklist(i) creates a new list containing only i, where i is an index into the array of quadruples.
merge_list(p1, p2) concatenates the two lists pointed to by p1 and p2. It returns the pointer to the concatenated list.
backpatch(p, i) inserts i as the target label for each of the statements on the list pointed to by p.
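The three functions can be sketched over an array of quadruples. In the example below the quadruple layout [op, arg1, arg2, target], the helper emit and the operator names "iflt" and "goto" are assumptions; the demonstration translates a statement of the form "if a is less than b then x := 1" and fills in the jump targets once they become known.

```python
quads = []   # the array of quadruples; each entry is [op, arg1, arg2, target]

def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index (hypothetical helper)."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1

def mklist(i):
    """Create a new list containing only the quadruple index i."""
    return [i]

def merge_list(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2

def backpatch(p, i):
    """Insert i as the target label of every quadruple on list p."""
    for index in p:
        quads[index][3] = i

# Jumps are emitted with an unknown target and patched later.
truelist = mklist(emit("iflt", "a", "b"))    # jump taken when the test holds
falselist = mklist(emit("goto"))             # jump taken otherwise
backpatch(truelist, len(quads))              # true exit: start of the then-part
emit(":=", 1, None, "x")
backpatch(falselist, len(quads))             # false exit: first quad after the statement
```

After the two calls to backpatch, quadruple 0 jumps to index 2 and quadruple 1 jumps to index 3.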
5. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
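6. What are the various methods of implementing three-address statements?
The three-address statements can be implemented using the following methods:
Quadruples: a structure with at most four fields, such as operator (OP), arg1, arg2, result.
Triples: the use of temporary variables is avoided by referring to pointers in the symbol table.
Indirect triples: a listing of pointers to triples is used instead of using the statements themselves.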
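As an illustration, the three-address sequence for a or b and not c can be held as quadruples and converted to triples. This is only a sketch: the tuple layouts, the ("#", i) notation for a reference to triple i and the name to_triples are assumptions made for the example.

```python
# Quadruples: (op, arg1, arg2, result) for  a or b and not c
quadruples = [
    ("not", "c", None, "t1"),
    ("and", "b", "t1", "t2"),
    ("or",  "a", "t2", "t3"),
]

def to_triples(quads):
    """Drop the result field; a use of a temporary becomes a reference
    ("#", i) to the triple at index i that computes it."""
    index_of = {}                      # temporary name -> triple index
    triples = []
    for op, arg1, arg2, result in quads:
        arg1 = ("#", index_of[arg1]) if arg1 in index_of else arg1
        arg2 = ("#", index_of[arg2]) if arg2 in index_of else arg2
        triples.append((op, arg1, arg2))
        index_of[result] = len(triples) - 1
    return triples

triples = to_triples(quadruples)

# Indirect triples: a separate listing of pointers (indices) into the
# triple array gives the execution order of the statements.
statements = list(range(len(triples)))
```

Reordering the statements of an indirect-triple program only changes the statements listing, not the triples themselves.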
7. Give the syntax-directed definition for the if-else statement.
1. S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code
2. S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code || gen_code('goto', S.next) || gen_code(E.false ':') || S2.code


UNIT V CODE OPTIMIZATION AND CODE GENERATION
1. Mention the properties that a code generator should possess.
The code generator should produce correct and high-quality code. In other words, the code generated should make effective use of the resources of the target machine.
2. List the terminologies used in basic blocks.
Define and use: the three-address statement a := b + c is said to define a and to use b and c.
Live and dead: a name in the basic block is said to be live at a given point if its value is used after that point in the program. A name in the basic block is said to be dead at a given point if its value is never used after that point in the program.
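Whether a name is live or dead after each statement can be found with one backward scan of the block. A minimal sketch, assuming each statement is given as a pair (defined name, list of used names) and that the set of names live on exit from the block is known; the function name liveness is hypothetical.

```python
def liveness(block, live_out):
    """Return, for each statement, the set of names live just after it.

    block    : list of (target, operands) pairs, one per statement
    live_out : names assumed to be live on exit from the block
    """
    live = set(live_out)
    live_after = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):   # scan the block backwards
        target, operands = block[i]
        live_after[i] = set(live)
        live.discard(target)       # the statement defines target
        live |= set(operands)      # and uses its operands
    return live_after

# a := b + c ; d := a + b   with only d live on exit
block = [("a", ["b", "c"]), ("d", ["a", "b"])]
result = liveness(block, {"d"})
```

In this example a and b are live after the first statement, while c is dead there because it is never used again.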
3. What is a flow graph?
A flow graph is a directed graph in which the flow-of-control information is added to the basic blocks.
There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the given sequence. We can say that B1 is a predecessor of B2.
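4. What is a DAG? Mention its applications.
A directed acyclic graph (DAG) is a useful data structure for implementing transformations on basic blocks.
A DAG is used in:
Determining the common subexpressions.
Determining which names are used inside the block and computed outside the block.
Determining which statements of the block could have their computed value used outside the block.
Simplifying the list of quadruples by eliminating the common subexpressions and not performing assignments of the form x := y unless it is a must.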
5. Define peephole optimization.
Peephole optimization is a simple and effective technique for locally improving target code. This technique is applied to improve the performance of the target program by examining a short sequence of target instructions and replacing these instructions by a shorter or faster sequence.
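One transformation of this kind is the removal of a redundant store followed by a load. The sketch below assumes instructions are strings of the form "OP source, destination" without labels, and it implements only this single rule; the names parse and peephole are hypothetical.

```python
def parse(instruction):
    """Split 'OP a, b' into ('OP', ['a', 'b'])."""
    op, _, rest = instruction.strip().partition(" ")
    return op, [arg.strip() for arg in rest.split(",")]

def peephole(instructions):
    """Drop 'MOV y, x' when it immediately follows 'MOV x, y'.

    Assumption: no instruction carries a label, so the second move can
    never be reached by a jump and is always redundant.
    """
    out = []
    for ins in instructions:
        op, args = parse(ins)
        if out and op == "MOV":
            prev_op, prev_args = parse(out[-1])
            if prev_op == "MOV" and prev_args == args[::-1]:
                continue            # value is already in the destination
        out.append(ins)
    return out
```

For instance, in the sequence MOV R0, a followed by MOV a, R0 the second instruction is dropped, because R0 already holds the value of a.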
6. List the characteristics of peephole optimization.

7. How do you calculate the cost of an instruction?
The cost of an instruction can be computed as one plus the costs associated with the source and destination addressing modes, given by the added cost.
Instruction             Cost
MOV R0, R1              1
MOV R1, M               2
SUB 5(R0), *10(R1)      3
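The three costs above can be reproduced with a small calculation. The sketch assumes an added cost of 0 for a register operand (R or *R) and 1 for any operand that needs an address or constant stored with the instruction (M, c(R), *c(R), #c); this table is an assumption chosen to match the examples, and the function names are hypothetical.

```python
import re

def added_cost(operand):
    """Added cost of one operand's addressing mode (assumed table)."""
    operand = operand.strip()
    if re.fullmatch(r"\*?R\d+", operand):   # register R or indirect register *R
        return 0
    return 1                                # M, c(R), *c(R) or #c: one extra word

def instruction_cost(instruction):
    """Cost = 1 + added cost of the source and destination operands."""
    _, _, operands = instruction.strip().partition(" ")
    return 1 + sum(added_cost(o) for o in operands.split(",") if o.strip())
```

Under this table MOV R0, R1 costs 1, MOV R1, M costs 2 and SUB 5(R0), *10(R1) costs 3.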
8. What is a basic block?
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halt or possibility of branching except at the end.
E.g. t1 := a * 5
t2 := t1 + 7
t3 := t2 - 5
t4 := t1 + t3
t5 := t2 + b

9. Mention the issues to be considered while applying the techniques for code optimization.
The improvement in program efficiency must be achieved without changing the algorithm of the program.
10. What are the basic goals of code movement?
To reduce the size of the code, i.e. to improve the space complexity.
To reduce the frequency of execution of code, i.e. to improve the time complexity.
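11. What do you mean by machine-dependent and machine-independent optimization?
Machine-dependent optimization is based on the characteristics of the target machine, namely the instruction set and the addressing modes used for the instructions, to produce efficient target code.
Machine-independent optimization is based on the characteristics of the programming language, namely appropriate programming structures and the use of efficient arithmetic properties, in order to reduce the execution time.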
12. What are the different data flow properties?

13. List the different storage allocation strategies.
The strategies are:
Static allocation
Stack allocation
Heap allocation
14. What are the contents of an activation record?
The activation record is a block of memory used for managing the information needed by a single execution of a procedure. Various fields of the activation record are:
Local variables


15. What is dynamic scoping?
In dynamic scoping, a use of a nonlocal variable refers to the nonlocal data declared in the most recently called and still active procedure. Therefore, new bindings are set up for the local names of the called procedure each time it is called. In dynamic scoping, symbol tables can be required at run time.
16. Define symbol table.
A symbol table is a data structure used by the compiler to keep track of the semantics of variables. It stores scope and binding information about names.
17. What is code motion?
Code motion is an optimization technique in which the amount of code in a loop is decreased. This transformation is applicable to an expression that yields the same result independent of the number of times the loop is executed. Such an expression is placed before the loop.
18. What are the properties of an optimizing compiler?
The source code should be such that it produces the minimum amount of target code.
There should not be any unreachable code.
Dead code should be completely removed from the source language.
The optimizing compiler should apply the following code-improving transformations on the source language:
i) common subexpression elimination
ii) dead code elimination
iii) code movement
iv) strength reduction
20. Suggest a suitable approach for computing a hash function.
Using the hash function we should obtain the exact location of a name in the symbol table. The hash function should result in a uniform distribution of names in the symbol table.
The hash function should be such that there is a minimum number of collisions. A collision is a situation where the hash function yields the same location for storing different names.
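One possible approach is sketched below: a multiplicative hash over the characters of the name, with collisions resolved by chaining. The multiplier 31, the default table size 211 and the class name SymbolTable are assumptions chosen for illustration, not values prescribed by the text.

```python
def hash_name(name, table_size=211):
    """Map a name to a bucket index in the range 0 .. table_size - 1."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % table_size   # mixes in every character
    return h

class SymbolTable:
    """Hash table with chaining: names that collide share a bucket."""

    def __init__(self, size=211):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, name, **attributes):
        bucket = self.buckets[hash_name(name, self.size)]
        for entry in bucket:
            if entry["name"] == name:         # already present: update it
                entry.update(attributes)
                return entry
        entry = {"name": name, **attributes}
        bucket.append(entry)
        return entry

    def lookup(self, name):
        for entry in self.buckets[hash_name(name, self.size)]:
            if entry["name"] == name:
                return entry
        return None
```

A prime table size is a common choice because it tends to spread names more evenly across the buckets.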


REFERENCES:
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques and Tools, 2nd Edition, Pearson Education, 2007.
2. Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann Publishers, 2002.
3. Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, Elsevier Science, India, Indian Reprint 2003.
4. Keith D. Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann Publishers, Elsevier Science, 2004.
5. Charles N. Fischer, Richard J. LeBlanc, Crafting a Compiler with C, Pearson Education, 2008.
