COMPILER DESIGN
G. Appasami, M.Sc., M.C.A., M.Phil., M.Tech., (Ph.D.)
Assistant Professor
Department of Computer Science and Engineering
Dr. Pauls Engineering College
Pauls Nagar, Villupuram
Tamilnadu, India
SARUMATHI PUBLICATIONS
Villupuram, Tamilnadu, India
First Edition: July 2015
Second Edition: April 2016
Published By
SARUMATHI PUBLICATIONS
All rights reserved. No part of this publication can be reproduced or stored in any form or by means of photocopy, recording or otherwise without the prior written permission of the author.
Price Rs. 101/-
Copies can be had from
SARUMATHI PUBLICATIONS
Villupuram, Tamilnadu, India.
Sarumathi.publications@gmail.com
Printed at
Meenam Offset
Pondicherry 605001, India
CS6660 COMPILER DESIGN
L T P C
3 0 0 3
UNIT I INTRODUCTION TO COMPILERS 5
Translators - Compilation and Interpretation - Language processors - The Phases of Compiler - Errors Encountered in Different Phases - The Grouping of Phases - Compiler Construction Tools - Programming Language basics.
UNIT II LEXICAL ANALYSIS
Need and Role of Lexical Analyzer - Lexical Errors - Expressing Tokens by Regular Expressions - Converting Regular Expression to DFA - Minimization of DFA - Language for Specifying Lexical Analyzers - LEX - Design of Lexical Analyzer for a sample Language.
UNIT III SYNTAX ANALYSIS 10
Need and Role of the Parser - Context Free Grammars - Top Down Parsing - General Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift Reduce Parser - LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser - Error Handling and Recovery in Syntax Analyzer - YACC - Design of a syntax Analyzer for a Sample Language.
UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT 12
Syntax directed Definitions - Construction of Syntax Tree - Bottom-up Evaluation of S-Attribute Definitions - Design of predictive translator - Type Systems - Specification of a simple type checker - Equivalence of Type Expressions - Type Conversions.
RUN-TIME ENVIRONMENT: Source Language Issues - Storage Organization - Storage Allocation - Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage Allocation in FORTRAN.
UNIT V CODE OPTIMIZATION AND CODE GENERATION
Principal Sources of Optimization - DAG - Optimization of Basic Blocks - Global Data Flow Analysis - Efficient Data Flow Algorithms - Issues in Design of a Code Generator - A Simple Code Generator Algorithm.
TOTAL: 45 PERIODS
TEXTBOOK:
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", 2nd Edition, Pearson Education, 2007.
REFERENCES:
1. Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern Architectures: A Dependence-based Approach", Morgan Kaufmann Publishers, 2002.
2. Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint 2003.
3. Keith D. Cooper and Linda Torczon, "Engineering a Compiler", Morgan Kaufmann Publishers - Elsevier Science, 2004.
4. Charles N. Fischer, Richard J. LeBlanc, "Crafting a Compiler with C", Pearson Education, 2008.
Acknowledgement
Mr. G. Appasami
TABLE OF CONTENTS
UNIT I INTRODUCTION TO COMPILERS
1.1 Translators
1.2 Compilation and Interpretation
1.3 Language processors
1.4 The Phases of Compiler
1.5 Errors Encountered in Different Phases
1.6 The Grouping of Phases
1.7 Compiler Construction Tools
1.8 Programming Language basics
UNIT II LEXICAL ANALYSIS
2.1 Need and Role of Lexical Analyzer
2.2 Lexical Errors
2.3 Expressing Tokens by Regular Expressions
2.4 Converting Regular Expression to DFA
2.5 Minimization of DFA
2.6 Language for Specifying Lexical Analyzers - LEX
2.7 Design of Lexical Analyzer for a sample Language
UNIT III SYNTAX ANALYSIS
3.1 Need and Role of the Parser
3.2 Context Free Grammars
3.3 Top Down Parsing - General Strategies
3.4 Recursive Descent Parser
3.5 Predictive Parser
3.6 LL(1) Parser
3.7 Shift Reduce Parser
3.8 LR Parser
3.9 LR(0) Item
3.10 Construction of SLR Parsing Table
3.11 Introduction to LALR Parser
3.12 Error Handling and Recovery in Syntax Analyzer
3.13 YACC
3.14 Design of a syntax Analyzer for a Sample Language
UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
4.1 Syntax directed Definitions
4.2 Construction of Syntax Tree
4.3 Bottom-up Evaluation of S-Attribute Definitions
4.4 Design of predictive translator
4.5 Type Systems
4.6 Specification of a simple type checker
4.7 Equivalence of Type Expressions
4.8 Type Conversions
4.9 RUN-TIME ENVIRONMENT: Source Language Issues
4.10 Storage Organization
4.11 Storage Allocation
4.12 Parameter Passing
4.13 Symbol Tables
4.14 Dynamic Storage Allocation
4.15 Storage Allocation in FORTRAN
UNIT V CODE OPTIMIZATION AND CODE GENERATION
5.1 Principal Sources of Optimization
5.2 DAG
5.3 Optimization of Basic Blocks
5.4 Global Data Flow Analysis
5.5 Efficient Data Flow Algorithms
5.6 Issues in Design of a Code Generator
5.7 A Simple Code Generator Algorithm
CS6660 __
Compiler Design
Unit I
_____1.1
UNIT I INTRODUCTION TO COMPILERS
1.1 TRANSLATORS
A translator is a program that takes a program in one form (the input) and converts it into another form (the output). The language of the input program is called the source language, and the language of the output program is called the target language.
The source language can be a low-level language such as assembly language, or a high-level language such as C, C++, Java, FORTRAN, and so on.
The target language can be a low-level language (assembly language) or a machine language (the set of instructions executed directly by a CPU).
Source language -> Translator -> Target language
Figure 1.1: Translator
Types of translators are:
(1). Compilers
(2). Interpreters
(3). Assemblers
1.2 COMPILATION AND INTERPRETATION
A compiler is a program that reads a program in one language and translates it into an equivalent program in another language. The translation done by a compiler is called compilation.
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user. An interpreter executes the source program statement by statement. The translation done by an interpreter is called interpretation.
1.3 LANGUAGE PROCESSORS
(i) Compiler
A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language). Compilation is shown in Figure 1.2.
Source program (Input) -> Compiler -> Target program (Output)
Figure 1.2: A Compiler
An important role of the compiler is to report any errors in the source program that it detects during the translation process.
If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.
Input -> Target Program -> Output
Figure 1.3: Running the target program
(ii) Interpreter
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user, as shown in Figure 1.4.
Source Program, Input -> Interpreter -> Output
Figure 1.4: An interpreter
The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs.
A compiler translates the whole source program into the target program before execution, whereas an interpreter executes the source program statement by statement. For this reason, an interpreter can usually give better error diagnostics than a compiler.
(iii) Hybrid Compiler
A hybrid compiler combines compilation and interpretation. Java language processors combine compilation and interpretation, as shown in Figure 1.5.
A Java source program is first compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted on another machine.
Source program -> Translator -> Intermediate program
Intermediate program, Input -> Virtual Machine -> Output
Figure 1.5: A hybrid compiler
In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they run.
(iv) Language processing system
In addition to a compiler, several other programs may be required to create an executable target program, as shown in Figure 1.6.
Preprocessor: The preprocessor collects the source program, which may be divided into modules stored in separate files. The preprocessor may also expand shorthands, called macros, into source-language statements. E.g. #include <math.h>, #define PI 3.14
Compiler: The modified source program is then fed to a compiler. The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.
Assembler: The assembly language is then processed by a program called an assembler, which produces relocatable machine code as its output.
Linker: The linker resolves external memory addresses, where the code in one file may refer to a location in another file. Large programs are often compiled in pieces, so the relocatable machine code may have to be linked together with other relocatable object files and library files into the code that actually runs on the machine.
Loader: The loader then puts together all of the executable object files into memory for execution. It also performs relocation of the object code.
Figure 1.6: A language processing system
Note: Preprocessors, assemblers, linkers and loaders are collectively called cousins of the compiler.
1.4 THE PHASES OF COMPILER / STRUCTURE OF COMPILER
The process of compilation is carried out in two parts: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them.
It then uses this structure to create an intermediate representation of the source program. The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
The analysis part is carried out in three phases: lexical analysis, syntax analysis and semantic analysis. The analysis part is often called the front end of the compiler. The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
The synthesis part is carried out in three phases: intermediate code generation, code optimization and code generation. The synthesis part is called the back end of the compiler.
Figure 1.7: Phases of a compiler
1.4.1 Lexical Analysis
The first phase of a compiler is called lexical analysis, scanning, or linear analysis. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
The first component, token-name, is an abstract symbol that is used during syntax analysis, and the second component, attribute-value, points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60        (1.1)
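Figure 1.8: Translation of an assignment statement
The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens:
(1) position is a lexeme mapped into the token <id,1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position.
(2) = is a lexeme mapped into the token <=>.
(3) initial is a lexeme mapped into the token <id,2>, where 2 points to the symbol-table entry for initial.
(4) + is a lexeme mapped into the token <+>.
(5) rate is a lexeme mapped into the token <id,3>, where 3 points to the symbol-table entry for rate.
(6) * is a lexeme mapped into the token <*>.
(7) 60 is a lexeme mapped into the token <60>.
The resulting token sequence is
<id,1> <=> <id,2> <+> <id,3> <*> <60>        (1.2)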
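The grouping of characters into lexemes and tokens can be sketched in a few lines of Python. This is only an illustration for statement (1.1): the names TOKEN_SPEC, MASTER and tokenize, and the tiny set of patterns, are assumptions made for this sketch and not part of any particular compiler.

```python
import re

# Hypothetical token patterns, just enough for statement (1.1).
TOKEN_SPEC = [
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"[0-9]+"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a statement such as (1.1)."""
    symbol_table = {}   # lexeme -> entry number, in order of first appearance
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "id":
            # Reuse the existing entry, or create the next numbered one.
            entry = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(f"<id,{entry}>")
        else:
            tokens.append(f"<{lexeme}>")  # operators and constants, as in (1.2)
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print("".join(tokens))   # <id,1><=><id,2><+><id,3><*><60>
```

The symbol table here is a plain dictionary that numbers identifiers in order of first appearance, which is what makes position, initial and rate become entries 1, 2 and 3.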
1.4.2 Syntax Analysis
The second phase of the compiler is syntax analysis, parsing, or hierarchical analysis.
The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream.
The hierarchical tree structure generated in this phase is called a parse tree or syntax tree.
In a syntax tree, each interior node represents an operation and the children of the node represent the arguments of the operation.
Figure 1.9: Syntax tree for position = initial + rate * 60
The tree has an interior node labeled * with <id,3> as its left child and the integer 60 as its right child. The node <id,3> represents the identifier rate. Similarly, <id,2> and <id,1> represent the identifiers initial and position. The root of the tree, labeled =, indicates that we must store the result of the addition into the location for the identifier position.
1.4.3 Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
It checks properties that the grammar alone cannot express, for example that each identifier is used consistently with its declaration.
It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
For example, the compiler must report an error if a floating-point number is used to index an array. The language specification may permit some type conversions, such as converting an integer to a floating-point number for floating-point addition; these conversions are called coercions.
Suppose rate has been declared as a floating-point number. In (1.1), the operator * is then applied to a floating-point number rate and an integer 60. The integer may be converted into a floating-point number by the operator inttofloat, which appears explicitly in the figure.
Figure 1.10: Semantic tree for position = initial + rate * 60
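A coercion of this kind can be sketched as a small recursive type checker. The tuple-based tree, the function name check and the types dictionary are assumptions made for this illustration; a real semantic analyzer covers many more operators and types.

```python
def check(node, types):
    """Return (typed_node, type) for a tuple-based expression tree.

    Leaves are identifier names (str) or integer constants (int); interior
    nodes are (operator, left, right). `types` maps names to "int" or "float".
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, str):
        return node, types[node]
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type != right_type:
        # Mixed int/float operands: widen the int side with inttofloat.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        left_type = "float"
    return (op, left, right), left_type

# Right-hand side of (1.1), assuming initial and rate are declared float.
types = {"initial": "float", "rate": "float"}
tree = ("+", "initial", ("*", "rate", 60))
typed, result_type = check(tree, types)
print(typed)   # ('+', 'initial', ('*', 'rate', ('inttofloat', 60)))
```

Only the constant 60 is wrapped, because it is the only int operand that meets a float operand.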
1.4.4 Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation.
The intermediate representation should have two important properties:
a. It should be easy to produce.
b. It should be easy to translate into the target machine language.
Three-address code is one such intermediate representation. It consists of a sequence of assembly-like instructions with at most three operands per instruction. Each operand can act like a register.
The output of the intermediate code generator in Figure 1.8 consists of the three-address code sequence for position = initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3        (1.3)
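One way to obtain a sequence like (1.3) is a post-order walk of the tree that creates a fresh temporary for every interior node. The sketch below assumes the tuple-based tree shape and the function name gen_tac, both chosen only for illustration.

```python
import itertools

def gen_tac(tree, target):
    """Emit three-address code that stores the value of `tree` in `target`."""
    code = []
    temps = (f"t{n}" for n in itertools.count(1))   # t1, t2, t3, ...

    def walk(node):
        if not isinstance(node, tuple):             # leaf: name or constant
            return str(node)
        if node[0] == "inttofloat":                 # unary conversion node
            arg = walk(node[1])
            temp = next(temps)
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        left_place = walk(left)                     # operands first (post-order)
        right_place = walk(right)
        temp = next(temps)
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    result = walk(tree)
    code.append(f"{target} = {result}")
    return code

tree = ("+", "id2", ("*", "id3", ("inttofloat", 60)))
for line in gen_tac(tree, "id1"):
    print(line)
```

Each instruction has at most one operator on its right-hand side, and the temporaries are numbered in the order in which subexpressions are completed.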
1.4.5 Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster.
Optimization has to improve the efficiency of the code so that the running time and memory consumption of the target program can be reduced.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1        (1.4)
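The two improvements described here, folding the conversion at compile time and removing the copy through t3, can be sketched on quadruples of the form (dest, op, arg1, arg2). The quadruple layout, the "copy" opcode and the function name optimize are assumptions for this sketch; the copy removal is only safe under the stated assumption that the temporary is used exactly once.

```python
def optimize(code):
    """Two tiny passes over (dest, op, arg1, arg2) quadruples."""
    consts = {}   # temporaries known to hold a compile-time constant
    folded = []
    for dest, op, arg1, arg2 in code:
        arg1 = consts.get(arg1, arg1)               # substitute known constants
        arg2 = consts.get(arg2, arg2)
        if op == "inttofloat" and isinstance(arg1, int):
            consts[dest] = float(arg1)              # do the conversion now
            continue
        folded.append((dest, op, arg1, arg2))

    # Remove "x = t" when t was computed by the instruction just before it.
    # Assumes t is a temporary that is not used anywhere else.
    result = []
    for instruction in folded:
        dest, op, arg1, arg2 = instruction
        if op == "copy" and result and result[-1][0] == arg1:
            previous = result.pop()
            result.append((dest,) + previous[1:])
        else:
            result.append(instruction)
    return result

code = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(code):
    print(instruction)
```

The result has the same two-instruction shape as (1.4); this sketch does not renumber temporaries, so the product is held in t2 rather than t1.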
1.4.6 Code Generation
The code generator takes as input an intermediate representation of the source program and maps it into the target language.
If the target language is machine code, then registers or memory locations are selected for each of the variables used by the program.
The intermediate instructions are translated into sequences of machine instructions.
For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into the machine code
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1        (1.5)
The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers.
The code in (1.5) loads the contents of address id3 into register R2, then multiplies it with the floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2. Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement (1.1).
1.4.7 Symbol Table Management
The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type, and its scope.
In the case of procedure names, such things as the number and types of the arguments, the method of passing each argument (for example, by value or by reference), and the type returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary search tree
o Hash table
Of these, the hash table is the most common choice: the source-code symbol itself is treated as the key for the hash function, and the value returned is the information about the symbol.
A symbol table may serve the following purposes, depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify whether a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
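A hash-table symbol table with scope resolution can be sketched with Python dictionaries, which are hash tables keyed by the symbol name. The class name SymbolTable, its methods and the attribute names used below are assumptions made for this illustration.

```python
class SymbolTable:
    """Hash-table symbol table with nested scopes (a chain of dictionaries)."""

    def __init__(self, parent=None):
        self.entries = {}        # name -> record of attributes
        self.parent = parent     # enclosing scope, or None for the global scope

    def declare(self, name, **attributes):
        if name in self.entries:
            raise KeyError(f"{name!r} is already declared in this scope")
        self.entries[name] = attributes

    def lookup(self, name):
        """Search this scope, then the enclosing ones; None if undeclared."""
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None

global_scope = SymbolTable()
global_scope.declare("rate", type="float")
local_scope = SymbolTable(parent=global_scope)
local_scope.declare("i", type="int")

print(local_scope.lookup("rate"))     # found in the enclosing scope
print(local_scope.lookup("missing"))  # None: the name was never declared
```

Looking a name up walks outward through the enclosing scopes, so the same structure answers both "has this variable been declared?" and "which declaration does this use refer to?".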
1.5 ERRORS ENCOUNTERED IN DIFFERENT PHASES
An important role of the compiler is to report any errors in the source program that it detects during the entire translation process.
Each phase of the compiler can encounter errors. After an error is detected, it must be handled so that compilation can proceed.
The syntax and semantic analysis phases handle a large share of the errors found during compilation.
The error handler deals with all types of errors: lexical errors, syntax errors, semantic errors and logical errors.
Lexical errors:
The lexical analyzer detects errors in the input characters.
The name of a keyword or identifier is typed incorrectly.
Example: switch is written as swich.
Syntax errors:
Syntax errors are detected by the syntax analyzer.
Errors such as a missing semicolon or unbalanced parentheses.
Example: ((a+b*(c-d)). In this statement, ) is missing after b.
Semantic errors:
Data type mismatch errors are handled by the semantic analyzer.
Assignment of a value of an incompatible data type.
Example: Assigning a string value to an integer.
Logical errors:
Unreachable code and infinite loops.
Misuse of operators. Code written after the end of the main() block.
1.6 THE GROUPING OF PHASES
The phases describe the logical organization of a compiler.
In an implementation, the activities of several phases may be grouped together into a pass that reads an input file and writes an output file.
The front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass.
Code optimization might be an optional pass.
There could then be a back-end pass consisting of code generation for a particular target machine.
Figure 1.11: The Grouping of Phases of compiler
Some compiler collections have been created around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a certain target machine.
Advantages:
With these collections, we can produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine.
Similarly, we can produce compilers for different target machines by combining a front end with back ends for different target machines.
1.7 COMPILER CONSTRUCTION TOOLS
The compiler writer, like any software developer, can profitably use modern software development environments containing tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on.
Writing a compiler is a tedious and time-consuming task; there are specialized tools that help implement the various phases of a compiler. These tools are called compiler construction tools.
Some commonly used compiler construction tools are given below:
Scanner generators [Lexical Analysis]
Parser generators [Syntax Analysis]
Syntax-directed translation engines [Intermediate Code]
Data-flow analysis engines [Code Optimization]
Code-generator generators [Code Generation]
Compiler-construction toolkits [For all phases]
1.8 PROGRAMMING LANGUAGE BASICS
1.8.1 The Static/Dynamic Distinction
If a language uses a policy that allows the compiler to decide an issue, we say that the language uses a static policy, or that the issue can be decided at compile time. On the other hand, a policy that only allows a decision to be made when we execute the program is said to be a dynamic policy, or to require a decision at run time.
The scope of a declaration of x is the region of the program in which uses of x refer to this declaration. A language uses static scope or lexical scope if it is possible to determine the scope of a declaration by looking only at the program. Otherwise, the language uses dynamic scope. With dynamic scope, as the program runs, the same use of x could refer to any of several different declarations of x.
Example: Consider the use of the term "static" as it applies to data in a Java class declaration. In Java, a variable is a name for a location in memory used to hold a data value. Here, "static" refers not to the scope of the variable, but rather to the ability of the compiler to determine the location in memory where the declared variable can be found. A declaration like
public static int x;
makes x a class variable and says that there is only one copy of x, no matter how many objects of this class are created. Moreover, the compiler can determine a location in memory where this integer x will be held. In contrast, had "static" been omitted from this declaration, then each object of the class would have its own location where x would be held, and the compiler could not determine all these places in advance of running the program.
1.8.2 Environments and States
Another important distinction is whether changes that occur as the program runs affect the values of data elements or affect the interpretation of names for that data. For example, the execution of an assignment such as x = y + 1 changes the value denoted by the name x. More specifically, the assignment changes the value in whatever location is denoted by x.
The location denoted by x can itself change at run time. If x is not a static (or "class") variable, then every object of the class has its own location for an instance of variable x. In that case, the assignment to x can change any of those "instance" variables, depending on the object to which a method containing that assignment is applied.
names -> (environment) -> locations (variables) -> (state) -> values
The association of names with locations in memory (the store) and then with values can be described by two mappings that change as the program runs:
1. The environment is a mapping from names to locations in the store. Since variables refer to locations ("l-values" in the terminology of C), we could alternatively define an environment as a mapping from names to variables.
2. The state is a mapping from locations in the store to their values. That is, the state maps l-values to their corresponding r-values, in the terminology of C.
Environments change according to the scope rules of a language.
Example: Consider the C program fragment below. Integer i is declared as a global variable, and also declared as a variable local to function f. When f is executing, the environment adjusts so that name i refers to the location reserved for the i that is local to f, and any use of i, such as the assignment i = 3 shown explicitly, refers to that location.
Typically, the local i is given a place on the run-time stack.
int i;              /* global i */
...
void f(...) {
    int i;          /* local i */
    ...
    i = 3;          /* use of local i */
    ...
}
...
x = i + 1;          /* use of global i */
Whenever a function g other than f is executing, uses of i cannot refer to the i that is local to f. Uses of name i in g must be within the scope of some other declaration of i. An example is the explicitly shown statement x = i + 1, which is inside some procedure whose definition is not shown. The i in i + 1 presumably refers to the global i.
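The two mappings of this section, environment and state, can be modelled directly as two dictionaries. In the sketch below the location numbers 100, 104 and 200 are made-up addresses, and the helper names assign and value_of are assumptions for illustration only.

```python
store = {}   # the state: location -> value

def assign(env, name, value):
    """Carry out name = value: find the location of name, then update the state."""
    store[env[name]] = value

def value_of(env, name):
    return store[env[name]]

global_env = {"i": 100, "x": 104}     # environment: name -> location
f_env = {**global_env, "i": 200}      # inside f, the name i denotes a stack location

assign(global_env, "i", 7)            # some earlier assignment to the global i
assign(f_env, "i", 3)                 # i = 3 inside f touches only the local i
assign(global_env, "x", value_of(global_env, "i") + 1)   # x = i + 1 outside f

print(store)   # {100: 7, 200: 3, 104: 8}
```

Entering f changes the environment (the name i now maps to location 200), while each assignment changes only the state.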
1.8.3 Static Scope and Block Structure
The scope rules for C are based on program structure; the scope of a declaration is determined implicitly by where the declaration appears in the program. Later languages, such as C++, Java, and C#, also provide explicit control over scopes through the use of keywords like public, private, and protected.
A block is a grouping of declarations and statements. C uses braces { and } to delimit a block; some other languages use begin and end instead.
Example: The C++ program in Figure 1.12 has four blocks, with several definitions of variables a and b. As a memory aid, each declaration initializes its variable to the number of the block to which it belongs.
Output:
32
14
12
11
Figure 1.12: Blocks in a C++ program
Consider the declaration int a = 1 in block B1. Its scope is all of B1, except for those blocks nested within B1 that have their own declaration of a. B2, nested immediately within B1, does not have a declaration of a, but B3 does. B4 does not have a declaration of a, so block B3 is the only place in the entire program that is outside the scope of the declaration of the name a that belongs to B1. That is, this scope includes B4 and all of B2 except for the part of B2 that is within B3. The scopes of all five declarations are summarized in Figure 1.13.
Figure 1.13: Scopes of declarations
1.8.4 Explicit Access Control
Classes and structures introduce a new scope for their members. If p is an object of a class with a field (member) x, then the use of x in p.x refers to field x in the class definition. The scope of a member declaration x in a class C extends to any subclass C', except if C' has a local declaration of the same name x.
Through the use of keywords like public, private, and protected, object-oriented languages such as C++ or Java provide explicit control over access to member names in a superclass. These keywords support encapsulation by restricting access.
Thus, private names are purposely given a scope that includes only the method declarations and definitions associated with that class and any "friend" classes (the C++ term). Protected names are accessible to subclasses. Public names are accessible from outside the class.
1.8.5 Dynamic Scope
Technically, any scoping policy is dynamic if it is based on factor(s) that can be known only when the program executes. The term dynamic scope, however, usually refers to the following policy: a use of a name x refers to the declaration of x in the most recently called, not yet terminated, procedure with such a declaration.
Dynamic scoping of this type appears only in special situations.
We shall consider two examples of dynamic policies: macro expansion in the C preprocessor and method resolution in object-oriented programming.
Example: In the C program below, identifier a is a macro that stands for the expression (x + 1). But we cannot resolve x statically, that is, in terms of the program text.
#define a (x+1)
int x = 2;
void b() { int x = 1; printf("%d\n", a); }
void c() { printf("%d\n", a); }
void main() { b(); c(); }
In fact, in order to interpret x, we must use the usual dynamic-scope rule. The function main first calls function b. As b executes, it prints the value of the macro a. Since (x + 1) must be substituted for a, we resolve this use of x to the declaration int x = 1 in function b. The reason is that b has a declaration of x, so the (x + 1) in the printf in b refers to this x. Thus, the value printed is 2.
After b finishes, and c is called, we again need to print the value of macro a. However, the only x accessible to c is the global x, which is 2. The printf statement in c thus refers to this declaration of x, and the value 3 is printed.
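The dynamic-scope rule can be imitated with an explicit stack of frames that is searched from the most recent call outward. The sketch below mirrors the macro example with x = 2 globally and x = 1 inside b; the names call_stack and resolve are assumptions, and the function a stands in for the macro a, that is (x + 1).

```python
call_stack = [{"x": 2}]            # bottom frame: the global declaration x = 2

def resolve(name):
    """Dynamic-scope lookup: the most recently called frame that declares name."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def a():                           # plays the role of the macro a, i.e. (x + 1)
    return resolve("x") + 1

def b():
    call_stack.append({"x": 1})    # b declares its own x
    try:
        return a()
    finally:
        call_stack.pop()

def c():
    call_stack.append({})          # c declares no x of its own
    try:
        return a()
    finally:
        call_stack.pop()

print(b(), c())   # 2 3
```

With these declarations the use of x inside b resolves to the local x = 1, giving 2, while inside c it falls through to the global x = 2, giving 3.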
1.8.6 Parameter Passing Mechanisms
All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments. The actual parameters (the parameters used in the call of a procedure) are associated with the formal parameters (those used in the procedure definition).
In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a variable). The value is placed in the location belonging to the corresponding formal parameter of the called procedure. This method is used in C and Java.
In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter. Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the caller. Changes to the formal parameter thus appear as changes to the actual parameter.
A third mechanism, call-by-name, was used in the early programming language Algol 60. It requires that the callee execute as if the actual parameter were substituted literally for the formal parameter in the code of the callee, as if the formal parameter were a macro standing for the actual parameter.
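The difference between call-by-value and call-by-name shows up when evaluating the actual parameter has a side effect. Python itself passes values, so the sketch below only simulates call-by-name by passing a zero-argument function (a thunk) that is re-evaluated at every use; the names by_value, by_name and next_value are assumptions for this illustration.

```python
counter = [0]

def next_value():
    """An actual parameter with a side effect: each evaluation yields a new number."""
    counter[0] += 1
    return counter[0]

def by_value(x):
    return x + x          # x was evaluated once, before the call

def by_name(x):
    return x() + x()      # the thunk is evaluated again at each use

value_result = by_value(next_value())   # argument evaluated once: 1 + 1
name_result = by_name(next_value)       # argument evaluated twice: 2 + 3
print(value_result, name_result)        # 2 5
```

Under call-by-value the expression is evaluated a single time, whereas the simulated call-by-name behaves as if the text of the actual parameter had been substituted for each occurrence of the formal parameter.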
1.8.7 Aliasing
There is an interesting consequence of call-by-reference parameter passing or its simulation, as in Java, where references to objects are passed by value. It is possible that two formal parameters can refer to the same location; such variables are said to be aliases of one another. As a result, any two variables, which may appear to take their values from two distinct formal parameters, can become aliases of each other.
Example: Suppose a is an array belonging to a procedure p, and p calls another procedure q(x, y) with a call q(a, a). Suppose also that parameters are passed by value, but that array names are really references to the location where the array is stored, as in C or similar languages. Now, x and y have become aliases of each other. The important point is that if within q there is an assignment x[10] = 2, then the value of y[10] also becomes 2.
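The same effect can be observed in Python, where a list argument is a reference passed by value. The procedure names p and q follow the example above; the array length of 11 is an assumption chosen so that index 10 exists.

```python
def q(x, y):
    x[10] = 2        # assignment through x ...
    return y[10]     # ... is visible through y when both name the same array

def p():
    a = [0] * 11     # array belonging to p
    seen_through_y = q(a, a)     # x and y are aliases of each other
    return seen_through_y, a[10]

print(p())                       # (2, 2)
print(q([0] * 11, [0] * 11))     # 0: two distinct arrays, so no aliasing
```

Only the call q(a, a) makes the two formal parameters refer to one location; with two separate arrays the assignment to x[10] leaves y[10] unchanged.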
CS6660
Compiler Design
Unit II
2.1
UNIT II LEXICAL ANALYSIS
2.1 NEED AND ROLE OF LEXICAL ANALYZER
Lexical analysis is the first phase of a compiler. The lexical analyzer reads the input characters of the source program from left to right, one character at a time.
It groups the characters into lexemes and produces a sequence of tokens, one for each lexeme. Each token is a logically cohesive unit such as an identifier, keyword, operator or punctuation mark.
When it discovers a lexeme that is an identifier, it enters that lexeme into the symbol table; it may also read from the symbol table.
These interactions are suggested in Figure 2.1.
Figure 2.1: Interactions between the lexical analyzer and the parser
Since the lexical analyzer is the part of the compiler that reads the source text, it may perform certain other tasks besides identification of lexemes. One such task is stripping out comments and whitespace (blank, newline, tab). Another task is correlating error messages generated by the compiler with the source program.
Needs / Roles / Functions of lexical analyzer
It produces a stream of tokens.
It eliminates comments and whitespace.
It keeps track of line numbers.
It reports errors encountered while generating tokens.
It stores information about identifiers, keywords, constants and so on in the symbol table.
Lexical analyzers are sometimes divided into two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.
Lexical Analysis versus Parsing / Issues in Lexical analysis
1. Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, the parser does not have to deal with whitespace and comments, because they have already been removed by the lexical analyzer.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.
Tokens, Patterns, and Lexemes
A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. Operators, special symbols and constants are also typical tokens.
A pattern is a description of the form that the lexemes of a token may take; it is the set of rules that describes the token. A lexeme is a sequence of characters in the source program that matches the pattern for a token.
Table 2.1: Tokens and Lexemes
TOKEN | INFORMAL DESCRIPTION (PATTERN) | SAMPLE LEXEMES
if | characters i, f | if
else | characters e, l, s, e | else
comparison | < or > or <= or >= or == or != | <=, !=
id | letter, followed by letters and digits | pi, score, D2, sum, id_1, AVG
number | any numeric constant | 35, 3.14159, 0, 6.02e23
literal | anything surrounded by " " | "Core", "Design", "Appasami"
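The informal patterns of Table 2.1 can be written as regular expressions and tried against the sample lexemes. The expressions below are one possible reading of the informal descriptions, not a definitive specification: the id pattern also accepts an underscore so that the sample lexeme id_1 matches, and the names PATTERNS and classify are assumptions.

```python
import re

# Keywords come first so that "if" and "else" are not classified as id.
PATTERNS = {
    "if": r"if",
    "else": r"else",
    "comparison": r"<=|>=|==|!=|<|>",
    "id": r"[A-Za-z][A-Za-z0-9_]*",
    "number": r"\d+(\.\d+)?([eE][+-]?\d+)?",
    "literal": r'"[^"]*"',
}

def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for token_name, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token_name
    return None

for lexeme in ["if", "<=", "id_1", "6.02e23", '"Core"', "@"]:
    print(lexeme, "->", classify(lexeme))
```

The lookup relies on dictionaries preserving insertion order, which is how the keyword patterns take priority over the more general id pattern.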
In many programming languages, the following classes cover most or all of the tokens:
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token comparison mentioned in Table 2.1.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal strings.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma, and semicolon.
Attributes for Tokens
When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent compiler phases additional information about the particular lexeme that matched.
The lexical analyzer returns to the parser not only a token name, but also an attribute value that describes the lexeme represented by the token.
The token name influences parsing decisions, while the attribute value influences the translation of tokens after the parse.
Information about an identifier, e.g., its lexeme, its type, and the location at which it is first found (in case an error message about that identifier must be issued), is kept in the symbol table.
Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Example: The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
Note that in certain pairs, especially operators, punctuation, and keywords, there is no need for an attribute value. In this example, the token number has been given an integer-valued attribute.
2.2 LEXICAL ERRORS
It is hard for a lexical analyzer to tell that there is a source-code error without the aid of other components.
Consider the C program statement fi ( a == f(x) ). The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser.
Suppose, however, that the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
Other possible error recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
Transformations like these may be tried in an attempt to repair the input. The simplest such strategy is to see whether a prefix of the remaining input can be transformed into a valid lexeme by a single transformation.
In practice, most lexical errors involve a single character. A more general correction strategy is to find the smallest number of transformations needed to convert the source program into one that consists only of valid lexemes.
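Panic-mode recovery can be sketched in a few lines of Python. The token pattern below is a hypothetical, deliberately tiny one chosen for the example; the point is only the recovery loop, which deletes one character at a time until some token pattern matches again.

```python
import re

# Assumed token set for the sketch: identifiers, integers, "==" and a few
# single-character symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|[()=;+*]")


def scan(source):
    """Return (tokens, skipped) using panic-mode recovery on bad characters."""
    tokens, skipped, pos = [], [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            # Panic mode: delete the offending character and try again.
            skipped.append(source[pos])
            pos += 1
    return tokens, skipped
```

Note that fi in the statement from the text is scanned as an ordinary identifier; only characters that start no token at all are discarded.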
2.3 EXPRESSING TOKENS BY REGULAR EXPRESSIONS
Specification of Tokens
Regular expressions are an important notation for specifying lexeme patterns. Although they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.
Strings and Languages
An alphabet is any finite set of symbols. Examples of symbols are letters, digits, and punctuation. The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet.
A string (sentence or word) over an alphabet is a finite sequence of symbols drawn from that alphabet. The length of a string s, usually written |s|, is the number of occurrences of symbols in s. For example, banana is a string of length six. The empty string, denoted ε, is the string of length zero.
A language is any countable set of strings over some fixed alphabet. Abstract languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages under this definition.
Parts of Strings:
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban, banana, and ε are prefixes of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana, banana, and ε are suffixes of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For instance, banana, nan, and ε are substrings of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
6. If x and y are strings, then the concatenation of x and y, denoted xy, is the string formed by appending y to x.
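The definitions of prefix, suffix, substring, and subsequence translate directly into code. The following Python sketch uses helper names of our own choosing and represents the empty string ε as "".

```python
def prefixes(s):
    """All strings obtained by removing zero or more symbols from the end."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    """All strings obtained by removing zero or more symbols from the start."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """All strings obtained by deleting a prefix and a suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_subsequence(t, s):
    """True if t can be formed by deleting positions of s."""
    remaining = iter(s)
    # "ch in remaining" consumes the iterator up to the first match,
    # so the symbols of t must be found in order.
    return all(ch in remaining for ch in t)
```

For example, prefixes("banana") contains "ban" and "", and is_subsequence("baan", "banana") is true.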
Operations on Languages
In lexical analysis, the most important operations on languages are union, concatenation, and closure, which are defined in Table 2.2.
Table 2.2: Definitions of operations on languages
Example: Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. Other languages that can be constructed from languages L and D are:
1. L ∪ D is the set of letters and digits; strictly speaking, the language with 62 strings of length one, each of which is either one letter or one digit.
2. LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.
3. L^4 is the set of all 4-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
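The counts in this example can be checked mechanically. The sketch below models finite languages as Python sets of strings. The function names are ours, and since L* is infinite, the closure helper is an assumption-laden stand-in that only enumerates strings up to a given length.

```python
import string


def union(l, m):
    return l | m


def concat(l, m):
    """LM = { st | s in L and t in M }."""
    return {s + t for s in l for t in m}


def power(l, n):
    """L^n, with L^0 = {epsilon}."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result


def closure_upto(l, max_len):
    """The strings of L* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {s + t for s in frontier for t in l
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result


L = set(string.ascii_letters)   # 52 letters
D = set(string.digits)          # 10 digits
```

With these definitions, union(L, D) has 62 members and concat(L, D) has 520, as stated in items 1 and 2.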
Regular expression
A regular expression can be defined as a sequence of symbols and characters expressing a string or pattern to be searched.
Regular expressions are a mathematical notation that describes the set of strings of a specific language.
The regular expression for identifiers is letter_ ( letter_ | digit )*. The vertical bar means union, the parentheses are used to group subexpressions, and the star means "zero or more occurrences of".
Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.
The rules that define the regular expressions over some alphabet Σ are as follows.
Basis rules:
1. ε is a regular expression, and L(ε) is {ε}.
2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string of length one.
Induction rules: Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.
1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r); i.e., additional pairs of parentheses may be placed around expressions without changing the language they denote.
Regular expression   Language                              Meaning
a|b                  {a, b}                                Single a or b
(a|b)(a|b)           {aa, ab, ba, bb}                      All strings of length two over the alphabet {a, b}
a*                   {ε, a, aa, aaa, ...}                  All strings of zero or more a's
(a|b)*               {ε, a, b, aa, ab, ba, bb, aaa, ...}   All strings consisting of zero or more instances of a or b
a|a*b                {a, b, ab, aab, aaab, ...}            The string a and all strings consisting of zero or more a's and ending in b
A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s.
For instance, (a|b) = (b|a), (a|b)* = (a*b*)*, (b|a)* = (a|b)*, and (a|b)(b|a) = aa|ab|ba|bb.
Algebraic laws
Algebraic laws that hold for arbitrary regular expressions r, s, and t:
LAW                               DESCRIPTION
r|s = s|r                         | is commutative
r|(s|t) = (r|s)|t                 | is associative
r(st) = (rs)t                     Concatenation is associative
r(s|t) = rs|rt; (s|t)r = sr|tr    Concatenation distributes over |
εr = rε = r                       ε is the identity for concatenation
r* = (r|ε)*                       ε is guaranteed in a closure
r** = r*                          * is idempotent
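Equivalences such as those above can be sanity-checked by brute force. The sketch below is not a proof: it merely compares two expressions on every string over a small alphabet up to a bounded length, using Python's re module. Writing the expressions in Python syntax and the bound of six symbols are assumptions of the example.

```python
import re
from itertools import product


def same_language(r, s, alphabet="ab", max_len=6):
    """Compare two regular expressions on all strings up to max_len symbols."""
    pr, ps = re.compile(r), re.compile(s)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(pr.fullmatch(w)) != bool(ps.fullmatch(w)):
                return False      # w is in exactly one of the two languages
    return True
```

A result of False exhibits a real difference between the two languages, while True only means that no difference exists among the strings tried.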
Extensions of Regular Expressions
A few notational extensions, first incorporated into Unix utilities such as Lex, are particularly useful in the specification of lexical analyzers.
1. One or more instances: The unary, postfix operator + represents the positive closure of a regular expression and its language. If r is a regular expression, then (r)+ denotes the language (L(r))+. Two useful algebraic laws are r* = r+ | ε and r+ = rr* = r*r.
2. Zero or one instance: The unary postfix operator ? means "zero or one occurrence." That is, r? is equivalent to r|ε, and L(r?) = L(r) ∪ {ε}.
3. Character classes: A regular expression a1|a2|...|an, where the ai's are each symbols of the alphabet, can be replaced by the shorthand [a1a2...an]. Thus, [abc] is shorthand for a|b|c, and [a-z] is shorthand for a|b|...|z.
Example: Regular definition for C identifiers
letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*
Example: Regular definition for unsigned numbers
digit → [0-9]
digits → digit+
number → digits ( . digits )? ( E [+-]? digits )?
Note: The operators *, +, and ? have the same precedence and associativity.
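The two regular definitions can be tried out directly. The Python patterns below are our transcription of them into the syntax of the re module, where the literal dot must be escaped.

```python
import re

# id -> letter_ ( letter_ | digit )*
ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# number -> digits ( . digits )? ( E [+-]? digits )?
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")


def is_identifier(lexeme):
    return ID.fullmatch(lexeme) is not None


def is_number(lexeme):
    return NUMBER.fullmatch(lexeme) is not None
```

For instance, is_number("6.02E+23") holds, while is_number("6.") does not, because the fractional part requires at least one digit after the dot.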
2.4 CONVERTING REGULAR EXPRESSION TO DFA
To construct a DFA directly from a regular expression, we construct its syntax tree and then compute four functions: nullable, firstpos, lastpos, and followpos, defined as follows. Each definition refers to the syntax tree for a particular augmented regular expression (r)#.
1. nullable(n) is true for a syntax tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1 a2 ... an in L((r)#) such that for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.
We can compute nullable, firstpos, and lastpos by a straightforward recursion on the height of the tree. The basis and inductive rules for nullable and firstpos are summarized in the table.
The rules for lastpos are essentially the same as for firstpos, but the roles of children c1 and c2 must be swapped in the rule for a cat-node.
There are only two ways that a position of a regular expression can be made to follow another:
1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
Converting a Regular Expression Directly to a DFA
Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D, by the following procedure.
    initialize Dstates to contain only the unmarked state firstpos(n0),
        where n0 is the root of syntax tree T for (r)#;
    while ( there is an unmarked state S in Dstates )
    {
        mark S;
        for ( each input symbol a )
        {
            let U be the union of followpos(p) for all p in S that correspond to a;
            if ( U is not in Dstates )
                add U as an unmarked state to Dstates;
            Dtran[S, a] = U;
        }
    }
The states of D are sets of positions in T. Initially, each state is "unmarked," and a state becomes "marked" just before we consider its out-transitions. The start state of D is firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position for the endmarker symbol #.
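The whole method fits in a short Python sketch. The tuple encoding of syntax tree nodes, the function names, and the hand-built tree for (a|b)*abb# with positions 1 to 6 are assumptions of this example, and ε-leaves are not handled, to keep it small.

```python
# Nodes: ("leaf", symbol, position), ("or", l, r), ("cat", l, r), ("star", c)

def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill in followpos."""
    kind = node[0]
    if kind == "leaf":
        pos = node[2]
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "star":
        _, first, last = analyze(node[1], followpos)
        for i in last:                      # rule 2: star-node
            followpos[i] |= first
        return True, first, last
    n1, f1, l1 = analyze(node[1], followpos)
    n2, f2, l2 = analyze(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for i in l1:                            # rule 1: cat-node
        followpos[i] |= f2
    return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)


def build_dfa(root, symbol_at, alphabet):
    followpos = {}
    _, first, _ = analyze(root, followpos)
    start = frozenset(first)
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        S = unmarked.pop()                  # "mark" S
        for a in alphabet:
            U = frozenset(q for p in S if symbol_at[p] == a
                          for q in followpos[p])
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[S, a] = U
    return start, dtran, followpos


def leaf(symbol, pos):
    return ("leaf", symbol, pos)


# Syntax tree for (a|b)*abb# with positions 1..6.
tree = ("star", ("or", leaf("a", 1), leaf("b", 2)))
for sym, pos in [("a", 3), ("b", 4), ("b", 5), ("#", 6)]:
    tree = ("cat", tree, leaf(sym, pos))
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
start, dtran, followpos = build_dfa(tree, symbol_at, "ab")
```

Each DFA state is a frozenset of positions, so a state is accepting exactly when it contains position 6, the position of #.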
Example: Construct a DFA for the regular expression r = (a|b)*abb
Figure 2.2: Syntax tree for (a|b)*abb#
Figure 2.3: firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#
We must also apply rule 2 to the star-node. That rule tells us positions 1 and 2 are in both followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1, 2}. The complete sets followpos are summarized in the following table.
NODE n    followpos(n)
1         {1, 2, 3}
2         {1, 2, 3}
3         {4}
4         {5}
5         {6}
6         ∅
Figure 2.4: Directed graph for the function followpos
nullable is true only for the star-node, and we exhibited firstpos and lastpos in Figure 2.3. The value of firstpos for the root of the tree is {1, 2, 3}, so this set is the start state of D. Call this set of states A. We must compute Dtran[A, a] and Dtran[A, b]. Among the positions of A, 1 and 3 correspond to a, while 2 corresponds to b. Thus, Dtran[A, a] = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}, and Dtran[A, b] = followpos(2) = {1, 2, 3}.
The latter is state A, and so does not have to be added to Dstates, but the former, B = {1, 2, 3, 4}, is new, so we add it to Dstates and proceed to compute its transitions. The complete DFA is shown in Figure 2.5.
Figure 2.5: DFA constructed for (a|b)*abb#
Example: Construct an NFA for (a|b)*abb and convert it to a DFA by subset construction.
Figure 2.7: NFA for (a|b)*abb
Figure 2.8: Result of applying the subset construction to Figure 2.6
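The subset construction itself can be sketched as follows. Because the NFA figure is not reproduced here, the transition table below is an assumption: it is the usual eleven-state NFA (states 0 to 10, accepting state 10) that the standard ε-transition construction produces for (a|b)*abb, with ε written as the empty string.

```python
# Assumed NFA for (a|b)*abb; keys are (state, symbol), "" denotes epsilon.
NFA = {
    (0, ""): {1, 7}, (1, ""): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, ""): {6}, (5, ""): {6}, (6, ""): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(states):
    """All NFA states reachable from `states` on epsilon-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, ""), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, a):
    """NFA states reachable from `states` by one transition on symbol a."""
    return {t for s in states for t in NFA.get((s, a), ())}


def subset_construction(start, alphabet):
    start_state = eps_closure({start})
    dstates, unmarked, dtran = [start_state], [start_state], {}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    return start_state, dstates, dtran
```

For this NFA the construction produces five DFA states, exactly one of which contains the NFA accepting state 10.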
2.5 MINIMIZATION OF DFA
There can be many DFA's that recognize the same language. For instance, the DFAs of Figure 2.5 and Figure 2.8 both recognize the same language L((a|b)*abb).
We would generally prefer a DFA with as few states as possible, since each state requires entries in the table that describes the lexical analyzer.
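Algorithm: Minimizing the number of states of a DFA.
INPUT: A DFA D with set of states S, input alphabet Σ, initial state s0, and set of accepting states F.
OUTPUT: A DFA D' accepting the same language as D and having as few states as possible.
METHOD:
1. Start with an initial partition Π with two groups, F and S - F, the accepting and nonaccepting states of D.
2. Apply the following procedure to construct a new partition Πnew.
    initially, let Πnew = Π;
    for ( each group G of Π )
    {
        partition G into subgroups such that two states s and t are in the same subgroup if and only if for all input symbols a, states s and t have transitions on a to states in the same group of Π;
        /* at worst, a state will be in a subgroup by itself */
        replace G in Πnew by the set of all subgroups formed;
    }
3. If Πnew = Π, let Πfinal = Π and continue with step 4. Otherwise, repeat step 2 with Πnew in place of Π.
4. Choose one state in each group of Πfinal as the representative for that group. The representatives will be the states of the minimum-state DFA D'.
5. The other components of D' are constructed as follows:
(a) The start state of D' is the representative of the group containing the start state of D.
(b) The accepting states of D' are the representatives of those groups that contain an accepting state of D.
(c) Let s be the representative of some group G of Πfinal, and let the transition of D from s on input a be to state t. Let r be the representative of t's group H. Then in D', there is a transition from s to r on input a.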
Example: Let us reconsider the DFA of Figure 2.8 for minimization.
STATE    a    b
A        B    C
B        B    D
C        B    C
D        B    E
(E)      B    C
The initial partition consists of the two groups {A, B, C, D} {E}, which are respectively the nonaccepting states and the accepting states.
To construct Πnew, the procedure considers both groups and inputs a and b. The group {E} cannot be split, because it has only one state, so {E} will remain intact in Πnew.
The other group {A, B, C, D} can be split, so we must consider the effect of each input symbol. On input a, each of these states goes to state B, so there is no way to distinguish these states using strings that begin with a. On input b, states A, B, and C go to members of group {A, B, C, D}, while state D goes to E, a member of another group.
Thus, in Πnew, group {A, B, C, D} is split into {A, B, C} {D}, and Πnew for this round is {A, B, C} {D} {E}.
In the next round, we can split {A, B, C} into {A, C} {B}, since A and C each go to a member of {A, B, C} on input b, while B goes to a member of another group, {D}. Thus, after the second round, Πnew = {A, C} {B} {D} {E}.
For the third round, we cannot split the one remaining group with more than one state, since A and C each go to the same state (and therefore to the same group) on each input. We conclude that Πfinal = {A, C} {B} {D} {E}.
Now, we shall construct the minimum-state DFA. It has four states, corresponding to the four groups of Πfinal, and let us pick A, B, D, and E as the representatives of these groups. The initial state is A, and the only accepting state is E.
Table: Transition table of minimum-state DFA
STATE    a    b
A        B    A
B        B    D
D        B    E
(E)      B    A
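The partition-refinement rounds described above can be reproduced with a short Python sketch. The function name, the dictionary encoding of the transition table, and the choice of the alphabetically smallest state as each group's representative are assumptions of the example.

```python
def minimize(states, alphabet, dtran, start, accepting):
    """Return (states, transitions, start, accepting) of the minimized DFA."""
    accepting = set(accepting)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        group_index = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # States stay together only if, on every input symbol,
                # they move into the same group of the current partition.
                key = tuple(group_index[dtran[s, a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):   # no group was split
            break
        partition = refined
    rep = {s: min(g) for g in partition for s in g}
    min_tran = {(rep[s], a): rep[dtran[s, a]]
                for s in states for a in alphabet if rep[s] == s}
    return sorted(set(rep.values())), min_tran, rep[start], {rep[s] for s in accepting}


DTRAN = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
```

Because each round can only split groups, comparing the number of groups before and after a round is enough to detect that the partition has stopped changing.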
2.6 LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS - LEX
There is a wide range of tools for constructing lexical analyzers based on regular expressions. Lex is a tool (computer program) that generates lexical analyzers.
Lex allows one to specify a lexical analyzer by giving regular expressions that describe the patterns for tokens. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler.
Use of Lex
The Lex compiler transforms the input patterns into a transition diagram and generates code.
An input file lex.l is written in the Lex language and describes the lexical analyzer to be generated. The Lex compiler transforms lex.l to a C program, in a file that is always named lex.yy.c.
The file lex.yy.c is compiled by the C compiler into a file a.out. The C compiler output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.
The attribute value, whether it be another numeric code, a pointer to the symbol table, or nothing, is placed in a global variable yylval, which is shared between the lexical analyzer and the parser.
Figure 2.9: Creating a lexical analyzer with Lex
Structure of Lex Programs
A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions
The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions.
The translation rules of a Lex program each have the form Pattern { Action }:
Pattern P1 { Action A1 }
Pattern P2 { Action A2 }
...
Pattern Pn { Action An }
Each pattern is a regular expression. The actions are fragments of code, typically written in C.
The third section holds whatever additional functions are used in the actions. Alternatively, these functions can be compiled separately and loaded with the lexical analyzer.
When called by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes the associated action Ai. Typically, Ai will return to the parser, but if it does not (e.g., because Pi describes whitespace or comments), then the lexical analyzer proceeds to find additional lexemes, until one of the corresponding actions causes a return to the parser. The lexical analyzer returns a single value, the token name, to the parser, but uses the shared, integer variable yylval to pass additional information about the lexeme found.
2.7 DESIGN OF LEXICAL ANALYZER FOR A SAMPLE LANGUAGE
A lexical analyzer generator such as Lex is architected around an automaton simulator. The implementation of the Lex compiler can be based on either an NFA or a DFA.
2.7.1 The Structure of the Generated Analyzer
Figure 2.10 shows the architecture of a lexical analyzer generated by Lex. A Lex program is converted into a transition table and actions, which are used by a finite automaton simulator.
The program that serves as the lexical analyzer includes a fixed program that simulates an automaton; the automaton may be deterministic or nondeterministic. The rest of the lexical analyzer consists of components that are created from the Lex program by Lex itself.
Figure 2.10: A Lex program is turned into a transition table and actions, which are used by a finite automaton simulator
These components are:
1. A transition table for the automaton.
2. Those functions that are passed directly through Lex to the output.
3. The actions from the input program, which appear as fragments of code to be invoked at the appropriate time by the automaton simulator.
2.7.2 Pattern Matching Based on NFA's
To construct the automaton for several regular expressions, we combine all the NFA's into one by introducing a new start state with ε-transitions to each of the start states of the NFA's Ni for pattern pi, as shown in Figure 2.11.
Figure 2.11: An NFA constructed from a Lex program
Example: Consider the patterns
a       { action A1 for pattern p1 }
abb     { action A2 for pattern p2 }
a*b+    { action A3 for pattern p3 }
Figure 2.12: NFA's for a, abb, and a*b+
Figure 2.13: Combined NFA
Figure 2.14: Sequence of sets of states entered when processing input aaba
Figure 2.15: Transition graph for DFA handling the patterns a, abb, and a*b+
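The matching rule that a Lex-generated analyzer applies to these three patterns, namely take the longest matching prefix and, on a tie, the pattern listed first, can be imitated in Python. This sketch uses the re module rather than a combined NFA or DFA, which is an assumption of the example; for these simple patterns each regular expression's match is its longest one, which is not guaranteed for arbitrary patterns in a backtracking matcher.

```python
import re

# The three patterns of the example, in the order they are listed.
RULES = [
    ("p1", re.compile(r"a")),
    ("p2", re.compile(r"abb")),
    ("p3", re.compile(r"a*b+")),
]


def next_lexeme(text, pos):
    """Return (pattern name, end position) of the longest match at pos."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


def scan(text):
    pos, lexemes = 0, []
    while pos < len(text):
        found = next_lexeme(text, pos)
        if found is None:
            raise ValueError(f"no pattern matches at position {pos}")
        name, end = found
        lexemes.append((name, text[pos:end]))
        pos = end
    return lexemes
```

On input aaba the first lexeme is aab, matched by p3, followed by a, matched by p1. On input abb both p2 and p3 match all three characters, and p2 wins because it is listed first.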
UNIT III SYNTAX ANALYSIS
3.1 NEED AND ROLE OF THE PARSER
The parser takes the tokens produced by lexical analysis and builds the syntax tree (parse tree). The syntax tree can be easily constructed from a context-free grammar.
The parser reports syntax errors in an intelligible fashion and recovers from commonly occurring errors to continue processing the remainder of the program.
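[Figure: the lexical analyzer reads the source program and hands a token to the parser on each "get next token" request; the parser passes the parse tree to the rest of the front end, which produces the intermediate representation. The symbol table is shared by these components.]
Figure 3.1: Position of parser in compiler model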
Role of the Parser:
Parser builds the parse tree.
Parser performs context-free syntax analysis.
Parser helps to construct intermediate code.
Parser produces appropriate error messages.
Parser attempts to correct a few errors.
Types of parsers for grammars:
Universal parsers
Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are too inefficient to use in production compilers, so they are not commonly used.
Top-down parsers
Top-down methods build parse trees from the top (root) to the bottom (leaves).
Bottom-up parsers
Bottom-up methods start from the leaves and work their way up to the root.
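3.2 CONTEXT-FREE GRAMMARS
3.2.1 The Formal Definition of a Context-Free Grammar
A context-free grammar G is defined by the 4-tuple G = (V, T, P, S), where
1. V is a finite set of nonterminals (variables).
2. T is a finite set of terminals.
3. P is a finite set of productions of the form A → α, where A is a nonterminal in V and α is a string of symbols from V ∪ T.
4. S is the start symbol (a variable, S ∈ V).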
Example 3.1: The following grammar defines simple arithmetic expressions. In this grammar, the terminal symbols are id + - * / ( ). The nonterminal symbols are expression, term, and factor, and expression is the start symbol.
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
3.2.2 Notational Conventions
The following notational conventions for grammars can be used.
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent nonterminals for the constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either nonterminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk. Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
Example 3.2: Using these conventions, the grammar of Example 3.1 can be rewritten concisely as
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
3.2.3 Derivations
A derivation uses productions to generate a string (a sequence of terminals). A derivation is formed by repeatedly replacing a nonterminal by the right-hand side of one of its production rules.
Derivations are classified into two types based on the order in which nonterminals are replaced. They are:
1. Leftmost derivation
If the leftmost nonterminal is replaced by one of its productions at every step of the derivation, then it is called a leftmost derivation (LMD).
2. Rightmost derivation
If the rightmost nonterminal is replaced by one of its productions at every step of the derivation, then it is called a rightmost derivation (RMD).
Example 3.3: LMD and RMD for Example 3.2
LMD for (id+id): (E+E) ⇒ (id+E) ⇒ (id+id)
RMD for (id+id): (E+E) ⇒ (E+id) ⇒ (id+id)
Example 3.4: Consider the context-free grammar (CFG) G = ({S}, {a, b, c}, P, S) where P = {S → SbS | ScS | a}. Derive the string abaca by leftmost derivation and rightmost derivation.
Leftmost derivation for abaca
S ⇒ SbS        (using rule S → SbS)
  ⇒ abS        (using rule S → a)
  ⇒ abScS      (using rule S → ScS)
  ⇒ abacS      (using rule S → a)
  ⇒ abaca      (using rule S → a)
Rightmost derivation for abaca
S ⇒ ScS        (using rule S → ScS)
  ⇒ Sca        (using rule S → a)
  ⇒ SbSca      (using rule S → SbS)
  ⇒ Sbaca      (using rule S → a)
  ⇒ abaca      (using rule S → a)
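Both derivations of abaca can be replayed mechanically. In the Python sketch below, the function name and the encoding of a production as a (head, body) pair are assumptions. Each step rewrites the leftmost or the rightmost nonterminal of the current sentential form.

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Apply productions in order; return the list of sentential forms."""
    form = [start]
    forms = ["".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"cannot apply {head} -> {body} to {''.join(form)}")
        form[i:i + 1] = list(body)     # replace the nonterminal by the body
        forms.append("".join(form))
    return forms


LMD_STEPS = [("S", "SbS"), ("S", "a"), ("S", "ScS"), ("S", "a"), ("S", "a")]
RMD_STEPS = [("S", "ScS"), ("S", "a"), ("S", "SbS"), ("S", "a"), ("S", "a")]
```

Calling derive("S", LMD_STEPS, {"S"}) and derive("S", RMD_STEPS, {"S"}, leftmost=False) reproduces the two sequences of sentential forms in Example 3.4.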
3.2.4 Parse Trees and Derivations
A parse tree is a graphical representation of a derivation. It is convenient for seeing how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.
[Figure: the parse tree is built up step by step, following the sentential forms (E), (E+E), (id+E), and finally (id+id).]
Figure 3.2: Parse tree for (id+id)
3.2.5 Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
A grammar G is said to be ambiguous if it has more than one parse tree, either in LMD or in RMD, for at least one string.
Example 3.6: The arithmetic expression grammar (3.3) permits two distinct leftmost derivations for the sentence id+id*id:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
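[Figure: one parse tree has + at the root with id * id as its right subtree; the other has * at the root with id + id as its left subtree.]
Figure 3.3: Two parse trees for id+id*id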
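The number of parse trees of a sentence can be counted directly. The Python sketch below assumes a cut-down grammar E → E + E | E * E | id, that is, only the productions used in this example, and counts the trees by trying every operator as the root.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A result greater than one for some sentence shows that the grammar is ambiguous. For id + id * id the count is 2, matching the two trees of Figure 3.3.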
3.2.6 Verifying the Language Generated by a Grammar
A proof that a grammar G generates a language L has two parts: show that every string generated by G is in L, and conversely that every string in L can indeed be generated by G.
Example 3.7: Consider the grammar S → ( S ) S | ε. This simple grammar generates all strings of balanced parentheses. To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps n in a derivation.
BASIS: The basis is n = 1. The only string of terminals derivable from S in one step is the empty string, which surely is balanced.
INDUCTION: Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps. Such a derivation must be of the form
S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y
The derivations of x and y from S take fewer than n steps, so by the inductive hypothesis x and y are balanced. Therefore, the string (x)y must be balanced. That is, it has an equal number of left and right parentheses, and every prefix has at least as many left parentheses as right.
Having thus shown that any string derivable from S is balanced, we must next show that every balanced string is derivable from S. To do so, use induction on the length of a string.
BASIS: If the string is of length 0, it must be ε, which is balanced.
INDUCTION: First, observe that every balanced string has even length. Assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Let (x) be the shortest nonempty prefix of w having an equal number of left and right parentheses. Then w can be written as w = (x)y where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus, we can find a derivation of the form
S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y
proving that w = (x)y is also derivable from S.
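The two directions of the proof can be checked experimentally for short strings. This Python sketch, with function names of our own, generates every string derivable from S → ( S ) S | ε up to a given length by following the shape (x)y, and compares the result with a brute-force test of balance. It illustrates the claim for bounded lengths and is no substitute for the induction.

```python
from itertools import product


def is_balanced(w):
    """Equal numbers of ( and ), and no prefix with more ) than (."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def generated(max_len):
    """All strings of length <= max_len derivable from S -> (S)S | epsilon."""
    by_len = {0: {""}}                       # S -> epsilon
    for n in range(2, max_len + 1, 2):
        # S -> ( S ) S: a string of length n has the shape (x)y
        # with len(x) = k and len(y) = n - 2 - k.
        by_len[n] = {"(" + x + ")" + y
                     for k in range(0, n - 1, 2)
                     for x in by_len[k]
                     for y in by_len[n - 2 - k]}
    return set().union(*by_len.values())


def all_balanced(max_len):
    """Brute force: every balanced string over ( and ) up to max_len."""
    return {"".join(w)
            for n in range(max_len + 1)
            for w in product("()", repeat=n)
            if is_balanced("".join(w))}
```

For any bound tried, generated(n) and all_balanced(n) should be the same set, which is what the two halves of the proof assert.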
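3.2.7 Context-Free Grammars versus Regular Expressions
Every regular language is a context-free language, but not vice versa.
Example 3.8: The regular expression (a|b)*abb and the grammar
A → aA | bA | aB
B → bC
C → b
describe the same language, the set of strings of a's and b's ending in abb. Such languages can be described either by finite automata or by pushdown automata (PDA).
On the other hand, a language such as {a^n b^n | n ≥ 1} is context-free but not regular: it cannot be recognized by a finite automaton, but it can be recognized by a PDA, which uses a stack as its memory.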
3.2.8 Left recursion
A context-free grammar is said to be left recursive if it has a nonterminal A with two productions of the following form:
A → Aα | β
where α and β are sequences of terminals and nonterminals that do not start with A.
Left recursion can cause a top-down parser to enter an infinite loop, so we have to eliminate left recursion before top-down parsing.
For example, the production expr → expr + term | term is left recursive.
Figure 3.4: Left-recursive and right-recursive ways of generating a string
ALGORITHM 3.1 Eliminating left recursion.
INPUT: Grammar G with no cycles or ε-productions.
OUTPUT: An equivalent grammar with no left recursion.
METHOD: Apply the following algorithm to G. Note that the resulting non-left-recursive grammar may have ε-productions.
    arrange the nonterminals in some order A1, A2, ..., An.
    for ( each i from 1 to n ) {
        for ( each j from 1 to i-1 ) {
            replace each production of the form Ai → Aj γ by the
            productions Ai → δ1 γ | δ2 γ | ... | δk γ, where
            Aj → δ1 | δ2 | ... | δk are all current Aj-productions
        }
        eliminate the immediate left recursion among the Ai-productions
    }
Note: To eliminate immediate left recursion, replace the left-recursive production A → Aα | β by
A → βA'
A' → αA' | ε
Example 3.9: Consider the grammar for arithmetic expressions.
E → E + T | T
T → T * F | F
F → ( E ) | id
Eliminate the left-recursive productions for E and T by applying the left-recursion elimination algorithm.
If A → Aα | β, then replace it by
A → βA'
A' → αA' | ε
The production E → E + T | T is replaced by
E → T E'
E' → + T E' | ε
The production T → T * F | F is replaced by
T → F T'
T' → * F T' | ε
Therefore, finally we obtain
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Example 3.10: Consider the following grammar and eliminate left recursion.
S → A a | b
A → A c | S d | ε
S is left recursive (S ⇒ Aa ⇒ Sda), but not immediately so. To expose the left recursion, substitute the S-productions into A → S d:
A → A c | A a d | b d | ε
A → A c | A a d | b d | ε is replaced by
A → b d A' | A'
A' → c A' | a d A' | ε
Therefore, finally we obtain the grammar without left recursion,
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
Example 3.11: Consider the grammar
A → A B d | A a | a
B → B e | b
The grammar without left recursion is
A → a A'
A' → B d A' | a A' | ε
B → b B'
B' → e B' | ε
Example 3.12: Eliminate left recursion from the given grammar. A → Ac | Aad | bd | bc
After removing left recursion, the grammar becomes
A → bdA' | bcA'
A' → cA' | adA' | ε
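The rewriting rule A → Aα | β to A → βA', A' → αA' | ε is easy to automate. The Python sketch below handles only immediate left recursion for a single nonterminal; the function name and the representation of a production body as a list of symbols, with the empty list standing for ε, are assumptions of the example.

```python
def eliminate_immediate(head, alternatives):
    """Remove immediate left recursion from the productions of `head`.

    alternatives is a list of bodies, each a list of grammar symbols.
    Returns a dict mapping nonterminals to their new alternatives.
    """
    new_head = head + "'"
    # Bodies of the form head alpha: keep alpha.
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [head]]
    # Bodies beta that do not start with head.
    betas = [alt for alt in alternatives if alt[:1] != [head]]
    if not alphas:
        return {head: alternatives}          # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }
```

Applied to E → E + T | T this returns E → T E' and E' → + T E' | ε, as in Example 3.9.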
3.2.9 Left factoring
Left factoring is the process of factoring out the common prefixes of two or more production alternatives for the same nonterminal.
Algorithm 3.2: Left factoring a grammar.
INPUT: Grammar G.
OUTPUT: An equivalent left-factored grammar.
METHOD: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a nontrivial common prefix, replace all of the A-productions A → αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives that do not begin with α, by
A → αA' | γ
A' → β1 | β2 | ... | βn
Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.
Example 3.13: Left factor the given grammar. S → T + S | T
After left factoring, the grammar becomes
S → T L
L → + S | ε
Example 3.14: Left factor the following grammar. S → iEtS | iEtSeS | a; E → b
After left factoring, the grammar becomes
S → iEtSS' | a
S' → eS | ε
E → b
Uses:
Left factoring is used in the predictive top-down parsing technique.
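One round of the left-factoring transformation can be sketched in Python as follows. The function names are ours, production bodies are lists of symbols with the empty list standing for ε, and only a single application of the transformation is performed; the algorithm repeats it until no common prefix remains.

```python
def common_prefix(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(head, alternatives):
    """Factor out the longest prefix shared by two or more alternatives."""
    best = []
    for i, a in enumerate(alternatives):
        for b in alternatives[i + 1:]:
            prefix = common_prefix(a, b)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {head: alternatives}          # no nontrivial common prefix
    new_head = head + "'"
    n = len(best)
    # Alternatives alpha beta_i contribute beta_i to the new nonterminal.
    betas = [alt[n:] for alt in alternatives if alt[:n] == best]
    # Alternatives gamma that do not begin with alpha are kept unchanged.
    gammas = [alt for alt in alternatives if alt[:n] != best]
    return {head: [best + [new_head]] + gammas, new_head: betas}
```

For the grammar of Example 3.14 the common prefix is i E t S, and the result is S → i E t S S' | a with S' → ε | e S.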
3.3 TOP-DOWN PARSING - GENERAL STRATEGIES
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first). Top-down parsing can also be viewed as finding a leftmost derivation for an input string.
Parsers are generally distinguished by whether they work top-down (start with the grammar's start symbol and construct the parse tree from the top) or bottom-up (start with the terminal symbols that form the leaves of the parse tree and build the tree from the bottom). Top-down parsers include recursive-descent and LL parsers, while the most common forms of bottom-up parsers are LR parsers.
[Figure: classification of parsers into top-down parsers (backtracking, recursive descent, predictive, LL(1)) and bottom-up parsers (shift-reduce, LR, SLR, LALR, (C)LR).]
Figure 3.5: Types of parser
Example 3.15: The sequence of parse trees for the input id+id*id in a top-down parse (LMD).
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Figure 3.6: Top-down parse for id+id*id
3.4 RECURSIVE DESCENT PARSER
These parsers use a procedure for each nonterminal. The procedure looks at its input and decides which production to apply for its nonterminal. Terminals in the body of the production are matched to the input at the appropriate time, while nonterminals in the body result in calls to their procedures. Backtracking, in the case when the wrong production was chosen, is a possibility.
void A()
{
    Choose an A-production, A → X1 X2 ... Xk;
    for ( i = 1 to k )
    {
        if ( Xi is a nonterminal )
            call procedure Xi();
        else if ( Xi equals the current input symbol a )
            advance the input to the next symbol;
        else /* an error has occurred */;
    }
}
Example 3.16: Consider the grammar
S → c A d
A → a b | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S, and the input pointer pointing to c, the first symbol of w. S has only one production, so we use it to expand S and obtain the tree of Figure 3.7(a). The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Now, we expand A using the first alternative A → a b to obtain the tree of Figure 3.7(b). We have a match for the second input symbol, a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried, but that might produce a match.
Figure 3.7: Steps in a top-down parse
The second alternative for A produces the tree of Figure 3.7(c). The leaf a matches the second symbol of w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
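The backtracking behaviour traced in this example can be sketched in Python with generators, where asking a generator for its next value corresponds to going back and trying another alternative. The function names and the dictionary encoding of the grammar are assumptions of the sketch, and it would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position that can follow a match of symbol at pos."""
    if symbol not in GRAMMAR:                 # terminal symbol
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:              # try each alternative in order
        yield from parse_body(body, text, pos)


def parse_body(body, text, pos):
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        # If the rest of the body fails from `mid`, the loop resumes and
        # the next way of matching body[0] is tried (backtracking).
        yield from parse_body(body[1:], text, mid)


def accepts(text, start="S"):
    return any(end == len(text) for end in parse(start, text, 0))
```

On input cad the alternative A → a b is tried first and fails at d, after which A → a succeeds, just as in the trace above.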
3.5 PREDICTIVE PARSER (NONRECURSIVE)
A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that
S ⇒* wα (by a leftmost derivation)
The table-driven parser in Figure 3.8 has an input buffer, a stack containing a sequence of grammar symbols, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by the endmarker $. We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar on top of $.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting entry M[X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and the current input symbol a.
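[Figure: the predictive parsing program reads the input buffer, manipulates the stack (with $ at the bottom), consults the parsing table M, and writes to the output stream.]
Figure 3.8: Model of a table-driven predictive parser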
Algorithm 3.3: Table-driven predictive parsing.
INPUT: A string w and a parsing table M for grammar G.
OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The following procedure uses the predictive parsing table M to produce a predictive parse for the input.
    set ip to point to the first symbol of w;
    set X to the top stack symbol;
    while ( X ≠ $ ) { /* stack is not empty */
        let a be the input symbol pointed to by ip;
        if ( X is a ) pop the stack and advance ip;
        else if ( X is a terminal ) error();
        else if ( M[X, a] is an error entry ) error();
        else if ( M[X, a] = X → Y1 Y2 ... Yk ) {
            output the production X → Y1 Y2 ... Yk;
            pop the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
        }
        set X to the top stack symbol;
    }
Example 3.17: Consider the following grammar and the input id + id * id, parsed using the nonrecursive predictive parser.
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Figure 3.9: Moves made by a predictive parser on input id + id * id
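The moves of the parser on id + id * id can be regenerated with a short Python sketch of the table-driven algorithm. The parsing table below is an assumption in the sense that it is written out by hand here: it is the table obtained for this grammar from its FIRST and FOLLOW sets, with an empty tuple standing for an ε body. The function name and the data layout are ours.

```python
EPS = ()   # body of an epsilon-production

# M[X, a]: production body to use for nonterminal X on lookahead a.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    ip, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[ip]
        if X == a:                        # terminal on top matches the input
            stack.pop()
            ip += 1
        elif X not in NONTERMINALS:
            raise SyntaxError(f"expected {X!r} but found {a!r}")
        elif (X, a) not in TABLE:
            raise SyntaxError(f"no entry M[{X}, {a}]")
        else:
            body = TABLE[X, a]
            output.append((X, body))      # output the production X -> body
            stack.pop()
            stack.extend(reversed(body))  # push so that Y1 ends up on top
    if tokens[ip] != "$":
        raise SyntaxError("input remains after the stack emptied")
    return output
```

The returned list of productions, read in order, is the leftmost derivation that the parser announces as its output.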
3.6LL(1)PARSER
Agrammarsuchthatitispossibletochoosethecorrectproductionwithwhichtoexpanda
givennonterminal,lookingonlyatthenextinputsymbol,iscalledLL(1).Thesegrammarsallowus
toconstructapredictiveparsingtablethatgives,foreachnonterminalandeachlookaheadsymbol,
thecorrectchoiceofproduction.Errorcorrectioncanbefacilitatedbyplacingerrorroutinesin
someorallofthetableentriesthathavenolegitimateproduction.
LL(1)Grammars
Predictiveparsers(recursivedescentparsers)needingnobacktracking,canbeconstructed
foraclassofgrammarscalledLL(1).Thefirst"L"inLL(1)standsforscanningtheinputfromleft
toright,thesecond"L"forproducingaleftmostderivation,andthe"1"forusingoneinputsymbol
oflookaheadateachsteptomakeparsingactiondecisions.
CS6660
Compiler Design
Unit III
3.13
Transition Diagrams for Predictive Parsers
Transition diagrams are useful for visualizing predictive parsers. To construct the transition
diagram from a grammar, first eliminate left recursion and then left factor the grammar. Then, for
each nonterminal A,
1. Create an initial and final (return) state.
2. For each production A → X1 X2 ... Xk, create a path from the initial to the final state, with
   edges labeled X1, X2, ..., Xk. If A → ε, the path is an edge labeled ε.
A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the
following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
   Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in
   FOLLOW(A).
Predictive parsers can be constructed for LL(1) grammars since the proper production to
apply for a nonterminal can be selected by looking only at the current input symbol. Flow-of-control
constructs, with their distinguishing keywords, generally satisfy the LL(1) constraints. For
instance,
    stmt → if ( expr ) stmt else stmt | while ( expr ) stmt | { stmt_list }
For the above productions, the keywords if and while and the symbol { tell us which alternative is
the only one that could possibly succeed.
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) = {X}.
2.
FIRST():
FIRST(S) = {i, a},    FIRST(S') = {e, ε},    FIRST(E) = {b}
FOLLOW():
FOLLOW(S) = {e, $},   FOLLOW(S') = {e, $},   FOLLOW(E) = {t}

NON-                            INPUT SYMBOL
TERMINAL   a        b        e          i              t      $
S          S → a                        S → iEtSS'
S'                           S' → eS                          S' → ε
                             S' → ε
E                   E → b

The entry M[S', e] holds two productions, so this grammar is not LL(1).
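The FIRST and FOLLOW sets for this grammar can be checked mechanically. The sketch below is an illustration only: it uses the standard fixed-point formulation, and the names GRAMMAR, first_of_string, compute_first and compute_follow are assumptions made here, not taken from the text. For this grammar it yields FOLLOW(E) = {t}, since E occurs only in S → iEtSS', directly before t.

```python
EPS = "ε"
# S → iEtSS' | a,   S' → eS | ε,   E → b   (an empty list is an ε-body)
GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
START = "S"


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # every symbol (or none at all) can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:             # repeat until no set grows
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")     # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:        # sym can end the body
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first()
follow = compute_follow(first)
print(first)
print(follow)
```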
3.7 SHIFT-REDUCE PARSER
Bottom-up parsers generally operate by choosing, on the basis of the next input symbol
(lookahead symbol) and the contents of the stack, whether to shift the next input onto the stack, or
to reduce some symbols at the top of the stack. A reduce step takes a production body at the top of
the stack and replaces it by the head of the production.
Example 3.19: Consider the following production rules and the moves of a shift-reduce parser on
input id * id.
    E → E + T | T
    T → T * F | F
    F → ( E ) | id

STACK        INPUT        ACTION
$            id1 * id2 $  shift
$ id1        * id2 $      reduce by F → id
$ F          * id2 $      reduce by T → F
$ T          * id2 $      shift
$ T *        id2 $        shift
$ T * id2    $            reduce by F → id
$ T * F      $            reduce by T → T * F
$ T          $            reduce by E → T
$ E          $            accept

The table shows the actions of a shift-reduce parser on input id * id. When the same parse is
driven by the LR(0) automaton, a stack is used to hold states, and the grammar symbols
corresponding to the states on the stack are listed in a separate column, SYMBOLS. Initially the
stack holds the start state 0 of the automaton; the corresponding symbol is the bottom-of-stack
marker $.
3.8 LR PARSER
A schematic of an LR parser is shown in Figure 3.10. It consists of an input, an output, a
stack, a driver program, and a parsing table that has two parts (ACTION and GOTO). The driver
program is the same for all LR parsers; only the parsing table changes from one parser to another.
The parsing program reads characters from an input buffer one at a time. Where a shift-reduce
parser would shift a symbol, an LR parser shifts a state. Each state summarizes the information
contained in the stack below it.
Figure 3.10: Model of an LR parser (input buffer a1 ... ai ... an; stack of states sm, sm-1, ...;
LR parsing program producing the output; parsing table with the parts ACTION and GOTO)
The stack holds a sequence of states, s0 s1 ... sm, where sm is on top. In the SLR method, the stack
holds states from the LR(0) automaton; the canonical-LR and LALR methods are similar.
Structure of the LR Parsing Table
The parsing table consists of two parts: a parsing-action function ACTION and a goto function
GOTO.
    let a be the first symbol of w$;
    while (1) { /* repeat forever */
        let s be the state on top of the stack;
        if ( ACTION[s, a] = shift t ) {
            push t onto the stack;
            let a be the next input symbol;
        }
        else if ( ACTION[s, a] = reduce A → β ) {
            pop |β| symbols off the stack;
            let state t now be on top of the stack;
            push GOTO[t, A] onto the stack;
            output the production A → β;
        }
        else if ( ACTION[s, a] = accept ) break; /* parsing is done */
        else call error-recovery routine;
    }
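Example 3.20: Figure 3.11 shows the ACTION and GOTO functions of an LR parsing table
for the expression grammar
    (1) E → E + T
    (2) E → T
    (3) T → T * F
    (4) T → F
    (5) F → ( E )
    (6) F → id
In the table, si means shift and stack state i, rj means reduce by the production numbered j, acc
means accept, and a blank means error.

STATE |            ACTION              |    GOTO
      |  id    +    *    (    )    $   |  E   T   F
  0   |  s5              s4            |  1   2   3
  1   |        s6                  acc |
  2   |        r2   s7        r2   r2  |
  3   |        r4   r4        r4   r4  |
  4   |  s5              s4            |  8   2   3
  5   |        r6   r6        r6   r6  |
  6   |  s5              s4            |      9   3
  7   |  s5              s4            |          10
  8   |        s6             s11      |
  9   |        r1   s7        r1   r1  |
 10   |        r3   r3        r3   r3  |
 11   |        r5   r5        r5   r5  |
Figure 3.11: Parsing table for expression grammar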
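The driver program and the table can be combined into a small runnable sketch. This is an illustration under assumptions, not the text's implementation: the dictionaries ACTION and GOTO are a hand encoding of the standard SLR table for the expression grammar, productions are stored as (head, body length) pairs, and the function name lr_parse is chosen here.

```python
# Productions (1)-(6) of the expression grammar as (head, length of body).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

SHIFT_ROW = {"id": "s5", "(": "s4"}          # shared by states 0, 4, 6 and 7
ACTION = {
    0: SHIFT_ROW,
    1: {"+": "s6", "$": "acc"},
    2: {**dict.fromkeys("+)$", "r2"), "*": "s7"},
    3: dict.fromkeys("+*)$", "r4"),
    4: SHIFT_ROW,
    5: dict.fromkeys("+*)$", "r6"),
    6: SHIFT_ROW,
    7: SHIFT_ROW,
    8: {"+": "s6", ")": "s11"},
    9: {**dict.fromkeys("+)$", "r1"), "*": "s7"},
    10: dict.fromkeys("+*)$", "r3"),
    11: dict.fromkeys("+*)$", "r5"),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the numbers of the productions used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]                              # stack of states, start state 0
    pos = 0
    reductions = []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION[state].get(a)
        if act is None:                      # blank entry: error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act == "acc":
            return reductions
        if act[0] == "s":                    # shift: push the target state
            stack.append(int(act[1:]))
            pos += 1
        else:                                # reduce by production number
            number = int(act[1:])
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]  # pop |body| states
            stack.append(GOTO[stack[-1]][head])
            reductions.append(number)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Only states are kept on the stack; the grammar symbols are implied by the states, as the text notes.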
3.9 LR(0) ITEM
An LR parser makes shift-reduce decisions by maintaining states to keep track of where we
are in a parse. States represent sets of "items". An LR(0) item (item for short) of a grammar G is a
production of G with a dot at some position of the body. Thus, production A → XYZ yields the four
items
    A → ·XYZ
    A → X·YZ
    A → XY·Z
    A → XYZ·
The production A → ε generates only one item, A → ·.
An item indicates how much of a production we have seen at a given point in the parsing
process.
For example, the item A → ·XYZ indicates that we hope to see a string derivable from
XYZ next on the input.
Item A → X·YZ indicates that we have just seen on the input a string derivable from X and that
we hope next to see a string derivable from YZ.
Item A → XYZ· indicates that we have seen the body XYZ and that it may be time to
reduce XYZ to A.
One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the
basis for constructing a deterministic finite automaton that is used to make parsing
decisions. Such an automaton is called an LR(0) automaton.
To construct the canonical LR(0) collection for a grammar, we define an augmented
grammar and two functions, CLOSURE and GOTO. If G is a grammar with start symbol S,
then G', the augmented grammar for G, is G with a new start symbol S' and production
S' → S. The purpose of this new starting production is to indicate to the parser when it should
stop parsing and announce acceptance of the input. That is, acceptance occurs only when
the parser is about to reduce by S' → S.
Closure of Item Sets
If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from
I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α·Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → ·γ to
   CLOSURE(I), if it is not already there. Apply this rule until no more new items can be
   added to CLOSURE(I).
Intuitively, A → α·Bβ in CLOSURE(I) indicates that, at some point in the parsing process,
we think we might next see a substring derivable from Bβ as input. The substring derivable from
Bβ will have a prefix derivable from B by applying one of the B-productions. We therefore add
items for all the B-productions; that is, if B → γ is a production, we also include B → ·γ in
CLOSURE(I).
A convenient way to implement the function closure is to keep a boolean array added,
indexed by the nonterminals of G, such that added[B] is set to true if and when we add the item
B → ·γ for each B-production B → γ.
We can divide all the sets of items of interest into two classes. They are:
1. Kernel items: the initial item, S' → ·S, and all items whose dots are not at the left end.
2. Nonkernel items: all items with their dots at the left end, except for S' → ·S.
SetOfItems CLOSURE(I)
{
    J = I;
    repeat
        for ( each item A → α·Bβ in J )
            for ( each production B → γ of G )
                if ( B → ·γ is not in J )
                    add B → ·γ to J;
    until no more items are added to J on one round;
    return J;
}
Figure 3.32: Computation of CLOSURE
Example 4.21: Consider the augmented expression grammar:
    E' → E
    E  → E + T | T
    T  → T * F | F
    F  → ( E ) | id
If I is the set of one item {[E' → ·E]}, then CLOSURE(I) contains the set of items I0:
    E' → ·E
    E  → ·E + T
    E  → ·T
    T  → ·T * F
    T  → ·F
    F  → ·( E )
    F  → ·id
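The CLOSURE computation can be sketched in Python as follows. This is a minimal illustration, assuming an item is represented as a pair (production index, dot position); the names GRAMMAR, closure and show are chosen here and are not part of the text. A worklist is used in place of the repeat-until loop, which gives the same result.

```python
# Augmented expression grammar; production 0 is E' → E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """items: iterable of (production index, dot position) pairs."""
    result = set(items)
    worklist = list(result)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        # Dot in front of a nonterminal B: add B → ·γ for every B-production.
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def show(item):
    """Render an item with the dot at its position, e.g. E → E · + T."""
    head, body = GRAMMAR[item[0]]
    symbols = list(body)
    symbols.insert(item[1], "·")
    return f"{head} → {' '.join(symbols)}"


I0 = closure({(0, 0)})
for item in sorted(I0):
    print(show(item))
```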
3.10 CONSTRUCTION OF SLR PARSING TABLE
The SLR method for constructing parsing tables is a good starting point for studying LR
parsing. The parsing table constructed by this method is called an SLR table, and an LR parser
using an SLR parsing table is called an SLR parser. The other two methods augment the SLR
method with lookahead information. The SLR method begins with LR(0) items and LR(0) automata.
Given a grammar G, we augment G to produce G', with a new start symbol S'. From G', we
construct C, the canonical collection of sets of items for G', together with the GOTO function.
Algorithm 3.5: Constructing an SLR parsing table.
INPUT: An augmented grammar G'.
OUTPUT: The SLR parsing table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   (a) If [A → α·aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j". Here a
       must be a terminal.
Step 1: Number the productions of the expression grammar
    E → E + T    (r1)
    E → T        (r2)
    T → T * F    (r3)
    T → F        (r4)
    F → ( E )    (r5)
    F → id       (r6)
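Step 2: Closure of E'
The set of items I0:
    E' → ·E
    E  → ·E + T
    E  → ·T
    T  → ·T * F
    T  → ·F
    F  → ·( E )
    F  → ·id
Step 3: GOTO operation of every symbol on I0 items, and on each new set in turn:
Goto(I0, E): I1
    E' → E·
    E  → E· + T
Goto(I0, T): I2
    E → T·
    T → T· * F
Goto(I0, F): I3
    T → F·
Goto(I0, ( ): I4
    F → (·E )
    E → ·E + T
    E → ·T
    T → ·T * F
    T → ·F
    F → ·( E )
    F → ·id
Goto(I0, id): I5
    F → id·
Goto(I1, +): I6
    E → E +·T
    T → ·T * F
    T → ·F
    F → ·( E )
    F → ·id
Goto(I2, *): I7
    T → T *·F
    F → ·( E )
    F → ·id
Goto(I4, E): I8
    F → ( E·)
    E → E· + T
Goto(I6, T): I9
    E → E + T·
    T → T· * F
Goto(I7, F): I10
    T → T * F·
Goto(I8, ) ): I11
    F → ( E )·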
Step 4: Construction of DFA
Figure 3.12: LR(0) automaton (DFA) with every state as final and I0 as initial.
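Steps 2 to 4 can be reproduced with a short program. The sketch below is an assumption-laden illustration rather than the text's method verbatim: items are (production index, dot position) pairs, states are numbered in order of discovery (so the numbering need not coincide with I0 to I11), and the names closure, goto and canonical_collection are chosen here.

```python
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    result = set(items)
    worklist = list(result)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over symbol in every item where that is possible."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    symbols = sorted({s for _, body in GRAMMAR for s in body})
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                 # the list grows while we iterate
        for symbol in symbols:
            target = goto(state, symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states,", len(transitions), "transitions")
```

The transitions dictionary is the edge set of the LR(0) automaton: edges on terminals become shift entries and edges on nonterminals become GOTO entries.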
Step 5: Construction of FOLLOW sets for the nonterminals
FOLLOW(E') = {$} because E' is the start symbol.
FOLLOW(E):
    E' → E       E ends the body, so everything in FOLLOW(E') is in FOLLOW(E); add $
    E → E + T    + follows E, so add +
    F → ( E )    ) follows E, so add )
FOLLOW(E) = {+, ), $}
FOLLOW(T):
    E → T        T ends the body, so everything in FOLLOW(E) is in FOLLOW(T); add +, ), $
    E → E + T    T ends the body; nothing new
    T → T * F    * follows T, so add *
FOLLOW(T) = {+, *, ), $}
FOLLOW(F):
    T → F        F ends the body, so everything in FOLLOW(T) is in FOLLOW(F); add +, *, ), $
    T → T * F    F ends the body; nothing new
FOLLOW(F) = {+, *, ), $}
Step 6: Construction
The SLR parsing table is constructed using Algorithm 3.5 (Constructing an SLR parsing table).
Step 7: Table filling
First consider the set of items I0:
The item F → ·( E ) gives rise to the entry ACTION[0, (] = shift 4, since Goto(I0, ( ) = I4.
The item F → ·id gives rise to the entry ACTION[0, id] = shift 5, since Goto(I0, id) = I5.
Other items in I0 yield no actions.
Now consider I1: E' → E· and E → E· + T.
The first item yields ACTION[1, $] = accept, and the second yields ACTION[1, +] = shift 6.
Next consider I2: E → T· and T → T· * F.
Since FOLLOW(E) = {$, +, )}, the first item makes
    ACTION[2, $] = ACTION[2, +] = ACTION[2, )] = reduce E → T
The second item makes ACTION[2, *] = shift 7. And so on.
STATE |            ACTION              |    GOTO
      |  id    +    *    (    )    $   |  E   T   F
  0   |  s5              s4            |  1   2   3
  1   |        s6                  acc |
  2   |        r2   s7        r2   r2  |
  3   |        r4   r4        r4   r4  |
  4   |  s5              s4            |  8   2   3
  5   |        r6   r6        r6   r6  |
  6   |  s5              s4            |      9   3
  7   |  s5              s4            |          10
  8   |        s6             s11      |
  9   |        r1   s7        r1   r1  |
 10   |        r3   r3        r3   r3  |
 11   |        r5   r5        r5   r5  |
Step 8: Input parsing
LINE   STACK       SYMBOLS     INPUT           ACTION
(1)    0           $           id * id + id $  shift
(2)    0 5         $ id        * id + id $     reduce by F → id
(3)    0 3         $ F         * id + id $     reduce by T → F
(4)    0 2         $ T         * id + id $     shift
(5)    0 2 7       $ T *       id + id $       shift
(6)    0 2 7 5     $ T * id    + id $          reduce by F → id
(7)    0 2 7 10    $ T * F     + id $          reduce by T → T * F
(8)    0 2         $ T         + id $          reduce by E → T
(9)    0 1         $ E         + id $          shift
(10)   0 1 6       $ E +       id $            shift
(11)   0 1 6 5     $ E + id    $               reduce by F → id
(12)   0 1 6 3     $ E + F     $               reduce by T → F
(13)   0 1 6 9     $ E + T     $               reduce by E → E + T
(14)   0 1         $ E         $               accept
At line (1), the stack holds the start state 0 of the automaton; the corresponding symbol is
the bottom-of-stack marker $. The next input symbol is id and state 0 has a transition on id to state
5. We therefore shift. At line (2), state 5 (symbol id) has been pushed onto the stack. There is no
transition from state 5 on input *, so we reduce. From item [F → id·] in state 5, the reduction is by
production F → id.
With symbols, a reduction is implemented by popping the body of the production from the
stack (on line (2), the body is id) and pushing the head of the production (in this case, F). With
states, we pop state 5 for symbol id, which brings state 0 to the top, and look for a transition on F,
the head of the production.
3.11 INTRODUCTION TO LALR PARSER
The LALR (lookahead-LR) technique is often used in practice, because the tables
obtained by an LALR parser are considerably smaller than the canonical LR parser tables.
For a comparison of parser size, the SLR and LALR tables for a grammar always have the
same number of states, and this number is typically several hundred states for a language like C.
The canonical LR table would typically have several thousand states for the same-size language.
Thus, it is much easier and more economical to construct SLR and LALR tables than the
canonical LR tables.
Algorithm 3.6: An easy, but space-consuming LALR table construction.
INPUT: An augmented grammar G'.
OUTPUT: The LALR parsing table functions ACTION and GOTO for G'.
METHOD:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. For each core present among the set of LR(1) items, find all sets having that core, and
   replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i
   are constructed from Ji. If there is a parsing action conflict, the algorithm fails to produce a
   parser, and the grammar is said not to be LALR(1).
4. The GOTO table is constructed as follows. If J is the union of one or more sets of LR(1)
   items, that is, J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of GOTO(I1, X), GOTO(I2, X), ...,
   GOTO(Ik, X) are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of
   all sets of items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
Algorithm 3.7: Construction of the sets of LR(1) items.
INPUT: An augmented grammar G'.
OUTPUT: The sets of LR(1) items that are the set of items valid for one or more viable prefixes of G'.
METHOD: The procedures CLOSURE and GOTO and the main routine items for constructing the
sets of items are given below.
SetOfItems CLOSURE(I)
{
    repeat
        for ( each item [A → α·Bβ, a] in I )
            for ( each production B → γ in G' )
                for ( each terminal b in FIRST(βa) )
                    add [B → ·γ, b] to set I;
    until no more items are added to I;
    return I;
}
SetOfItems GOTO(I, X)
{
    initialize J to be the empty set;
    for ( each item [A → α·Xβ, a] in I )
        add item [A → αX·β, a] to set J;
    return CLOSURE(J);
}
void items(G')
{
    initialize C to CLOSURE({[S' → ·S, $]});
    repeat
        for ( each set of items I in C )
            for ( each grammar symbol X )
                if ( GOTO(I, X) is not empty and not in C )
                    add GOTO(I, X) to C;
    until no new sets of items are added to C;
}
Example 3.23: Consider the following augmented grammar.
    S' → S
    S  → C C
    C  → c C | d
Construct the parsing table for the LALR(1) parser.
Construction of the sets of LR(1) items:
I0:
    S' → ·S,   $
    S  → ·CC,  $
    C  → ·cC,  c/d
    C  → ·d,   c/d
I1: GOTO(I0, S)
    S' → S·,   $
I2: GOTO(I0, C)
    S  → C·C,  $
    C  → ·cC,  $
    C  → ·d,   $
I3: GOTO(I0, c)
    C  → c·C,  c/d
    C  → ·cC,  c/d
    C  → ·d,   c/d
I4: GOTO(I0, d)
    C  → d·,   c/d
I5: GOTO(I2, C)
    S  → CC·,  $
I6: GOTO(I2, c)
    C  → c·C,  $
    C  → ·cC,  $
    C  → ·d,   $
I7: GOTO(I2, d)
    C  → d·,   $
I8: GOTO(I3, C)
    C  → cC·,  c/d
I9: GOTO(I6, C)
    C  → cC·,  $
Figure 3.13: The GOTO graph for the above grammar
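The LR(1) sets and the effect of merging cores can be checked with a short sketch. It is an illustration under stated assumptions: an item is a triple (production index, dot position, lookahead), the FIRST sets are precomputed by hand, and the helper first_of relies on the fact that this particular grammar has no ε-productions. All names are chosen here.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # precomputed by hand


def first_of(beta, lookahead):
    """FIRST(βa); valid here because the grammar has no ε-productions."""
    if not beta:
        return {lookahead}
    x = beta[0]
    return FIRST[x] if x in NONTERMINALS else {x}


def closure(items):
    result = set(items)
    worklist = list(result)
    while worklist:
        p, d, a = worklist.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NONTERMINALS:
            for b in first_of(body[d + 1:], a):
                for i, (head, _) in enumerate(GRAMMAR):
                    if head == body[d] and (i, 0, b) not in result:
                        result.add((i, 0, b))
                        worklist.append((i, 0, b))
    return frozenset(result)


def goto(items, x):
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved) if moved else frozenset()


def lr1_items():
    states = [closure({(0, 0, "$")})]
    for state in states:                     # the list grows while we iterate
        for x in ("S", "C", "c", "d"):
            target = goto(state, x)
            if target and target not in states:
                states.append(target)
    return states


def core(state):
    """The set of first components: items with their lookaheads dropped."""
    return frozenset((p, d) for p, d, _ in state)


states = lr1_items()
cores = {core(s) for s in states}
print(len(states), "LR(1) sets,", len(cores), "distinct cores")
```

The number of distinct cores is the number of states of the LALR parser: each group of sets with a common core is replaced by its union.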
There are three pairs of sets of items that can be merged. I3 and I6 are replaced by their union:
I36: GOTO(I0, c)
    C → c·C,  c/d/$
    C → ·cC,  c/d/$
    C → ·d,   c/d/$
I4 and I7 are replaced by their union:
I47: GOTO(I0, d)
    C → d·,   c/d/$
I8 and I9 are replaced by their union:
I89: GOTO(I36, C)
    C → cC·,  c/d/$
The LALR ACTION and GOTO functions for the condensed sets of items are shown in Table 3.4,
where the productions are numbered (1) S → CC, (2) C → cC, (3) C → d.

STATE |       ACTION        |  GOTO
      |  c      d      $    |  S    C
  0   |  s36    s47         |  1    2
  1   |                acc  |
  2   |  s36    s47         |       5
  36  |  s36    s47         |       89
  47  |  r3     r3     r3   |
  5   |                r1   |
  89  |  r2     r2     r2   |
Parsing the input string ccdd
Stack               Input buffer   Action table        Goto table     Parsing action
$0                  ccdd$          action[0,c]=s36                    Shift
$0c36               cdd$           action[36,c]=s36                   Shift
$0c36c36            dd$            action[36,d]=s47                   Shift
$0c36c36d47         d$             action[47,d]=r3     [36,C]=89      Reduce by C → d
$0c36c36C89         d$             action[89,d]=r2     [36,C]=89      Reduce by C → cC
$0c36C89            d$             action[89,d]=r2     [0,C]=2        Reduce by C → cC
$0C2                d$             action[2,d]=s47                    Shift
$0C2d47             $              action[47,$]=r3     [2,C]=5        Reduce by C → d
$0C2C5              $              action[5,$]=r1      [0,S]=1        Reduce by S → CC
$0S1                $              accept
3.12 ERROR HANDLING AND RECOVERY IN SYNTAX ANALYZER
Syntax Error Handling
If a compiler had to process only correct programs, its design and implementation would be
simplified greatly. However, a compiler is expected to assist the programmer in locating and
tracking down errors that inevitably creep into programs, despite the programmer's best efforts.
Few languages have been designed with error handling in mind, even though errors are so
commonplace.
Most programming language specifications do not describe how a compiler should respond
to errors; error handling is left to the compiler designer.
Planning the error handling right from the start can both simplify the structure of a compiler
and improve its handling of errors.
Common programming errors can occur at many different levels.
Lexical errors include misspellings of identifiers, keywords, or operators, e.g., the use of
an identifier elipsesize instead of ellipsesize, and missing quotes around text intended as a
string.
Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or
"}". As another example, in C or Java, the appearance of a case statement without an
enclosing switch is a syntactic error.
Semantic errors include type mismatches between operators and operands. An example is
a return statement in a Java method with result type void.
Logical errors can be anything from incorrect reasoning on the part of the programmer to
the use in a C program of the assignment operator = instead of the comparison operator ==.
Syntactic errors are detected very efficiently by syntax analyzers. Several parsing methods,
such as the LL and LR methods, detect an error as soon as possible.
Another reason for emphasizing error recovery during parsing is that many errors appear
syntactic, whatever their cause, and are exposed when parsing cannot continue.
A few semantic errors, such as type mismatches, can also be detected efficiently; however,
accurate detection of semantic and logical errors at compile time is in general a difficult task.
The error handler in a parser has goals that are simple to state but challenging to realize:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
A common strategy is to print the offending line with a pointer to the position at which an
error is detected.
Error Recovery Strategies
Once an error is detected, the parser should recover from it. The simplest approach is for the
parser to quit with an informative error message when it detects the first error.
Additional errors are often uncovered if the parser can restore itself to a state where
processing of the input can continue.
1. Panic Mode Recovery
With this method, on discovering an error, the parser discards input symbols one at a time
until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually
delimiters, such as semicolon or }, whose role in the source program is clear and unambiguous.
While panic-mode correction often skips a considerable amount of input without checking it for
additional errors, it has the advantage of simplicity.
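The skipping step of panic-mode recovery is small enough to sketch directly. The following is an illustration only: the token list, the set SYNC_TOKENS and the function skip_to_sync are assumptions made here, and the sketch also assumes that the parser consumes the synchronizing delimiter itself and resumes at the token after it.

```python
SYNC_TOKENS = {";", "}"}       # assumed synchronizing set: statement delimiters


def skip_to_sync(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from pos up to and including the next synchronizing token.

    Returns the index at which parsing should resume (len(tokens) if no
    synchronizing token is left).
    """
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1               # discard one input symbol at a time
    return min(pos + 1, len(tokens))


# The statement  x = + * 3 ;  is malformed; suppose the error is detected at "*".
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
resume = skip_to_sync(tokens, 3)
print(tokens[resume:])
```

Everything between the error and the delimiter is skipped unchecked, which is the loss of information the text mentions, and parsing restarts cleanly at the next statement.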
2. Phrase Level Recovery
On discovering an error, a parser may perform local correction on the remaining input; that
is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or
insert a missing semicolon. The choice of the local correction is left to the compiler designer.
Phrase-level replacement has been used in several error-repairing compilers, as it can
correct any input string. Its major drawback is the difficulty it has in coping with situations in
which the actual error has occurred before the point of detection. Replacements must also be
chosen so that they cannot lead to infinite loops.
3. Error Productions
By anticipating common errors that might be encountered, we can augment the grammar for
the language at hand with productions that generate the erroneous constructs.
A parser constructed from a grammar augmented by these error productions detects the
anticipated errors when an error production is used during parsing. The parser can then generate
appropriate error diagnostics about the erroneous construct that has been recognized in the input.
4. Global Correction
Ideally, a compiler makes as few changes as possible in processing an incorrect input string.
There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost
correction.
Given an incorrect input string x and grammar G, these algorithms will find a parse tree for
a related string y, such that the number of insertions, deletions, and changes of tokens required to
transform x into y is as small as possible. Unfortunately, these methods are in general too costly to
implement in terms of time and space, so these techniques are currently only of theoretical interest.
3.13 YACC
YACC is an acronym for "Yet Another Compiler Compiler". It was originally developed in
the early 1970s by Stephen C. Johnson at AT&T Corporation and written in the B programming
language, but soon rewritten in C. It appeared as part of Version 3 Unix, and a full description of
Yacc was published in 1975.
Yacc is a computer program for the Unix operating system. It is an LALR parser generator:
from an analytic grammar written in a notation similar to BNF, it generates an LALR parser, the
part of a compiler that tries to make syntactic sense of the source code.
Yacc itself used to be available as the default parser generator on most Unix systems. The
input to Yacc is a grammar with snippets of C code (called "actions") attached to its rules. Its
output is a shift-reduce parser in C that executes the C snippets associated with each rule as soon as
the rule is recognized. Typical actions involve the construction of parse trees.
A Yacc specification has three parts:
    declarations
    %%
    translation rules
    %%
    supporting C routines
Figure 3.14: Creating an input/output translator with Yacc (Yacc specification translate.y →
Yacc compiler → y.tab.c; y.tab.c → C compiler → a.out; input → a.out → output)
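Yacc specification of a simple desk calculator
%{
#include <ctype.h>
#include <stdio.h>   /* for printf and getchar */
%}
%token DIGIT
%%
line   : expr '\n'         { printf("%d\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
/* Lexical analyzer: a single digit is the token DIGIT with its numeric
   value in yylval; any other character is returned as itself. */
yylex()
{
    int c;
    c = getchar();
    if (isdigit(c))
    {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}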
3.14 DESIGN OF A SYNTAX ANALYZER FOR A SAMPLE LANGUAGE
Figure 3.15: Design of Syntax Analyzer with Lex and Yacc (source files: lexical rules → Lex →
lex.yy.c, and grammar rules → yacc → y.tab.c; the generated files lex.yy.c and y.tab.c → cc →
a.out; input → a.out in execution → parsed output)
YACC (Yet Another Compiler Compiler).
Automatically generates a parser for a context-free grammar (LALR parser).
Allows syntax-directed translation by writing grammar productions and semantic actions.
LALR(1) is more powerful than LL(1).
Works with lex. YACC calls yylex to get the next token.
YACC and lex must agree on the values for each token.
Like lex, YACC predates C++, so some constructs need a workaround when using C++.
YACC file format:
    declarations        /* specify tokens, and nonterminals */
    %%
    translation rules   /* specify grammar here */
    %%
    supporting C routines
The command yacc yaccfile produces y.tab.c, which contains a routine yyparse().
yyparse() calls yylex() to get tokens.
yyparse() returns 0 if the program is grammatically correct, nonzero otherwise.
YACC automatically builds a parser for the grammar (LALR parser).
There may be shift/reduce and reduce/reduce conflicts when the grammar is not LALR.
In this case, you will need to modify the grammar to make it LALR in order for yacc to
work properly.
YACC tries to resolve conflicts automatically.
Default conflict resolution:
    shift/reduce  → shift
    reduce/reduce → the production that appears first in the grammar
Program to recognize a valid variable (identifier) which starts with a letter followed by any number of
letters or digits.
LEX
%{
#include <stdlib.h>   /* for atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+      { yylval = atoi(yytext); return DIGIT; }
[a-zA-Z]+   { return LETTER; }
[ \t]       ;
\n          return 0;
.           { return yytext[0]; }
%%
YACC
%{
#include <stdio.h>
#include <stdlib.h>   /* for exit */
int yylex(void);
int yyerror(char *s);
%}
%token LETTER DIGIT
%%
variable : LETTER
         | LETTER rest
         ;
rest     : LETTER rest
         | DIGIT rest
         | LETTER
         | DIGIT
         ;
%%
int main()
{
    yyparse();
    printf("The string is a valid variable\n");
    return 0;
}
int yyerror(char *s)
{
    printf("This is not a valid variable\n");
    exit(0);
}
OUTPUT
$ lex p4b.l
$ yacc -d p4b.y
$ cc lex.yy.c y.tab.c -ll
$ ./a.out
input34
The string is a valid variable
$ ./a.out
89file
This is not a valid variable
UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
4.1 SYNTAX DIRECTED DEFINITIONS
A syntax-directed definition (SDD) is a generalization of a context-free grammar in which
each grammar symbol has an associated set of attributes, partitioned into two subsets called the
synthesized and inherited attributes of that grammar symbol.
An attribute can represent a string, a number, a type, a memory location, or whatever. The
value of an attribute at a parse-tree node is defined by a semantic rule associated with the
production used at that node.
The value of a synthesized attribute at a node is computed from the values of attributes at
the children of that node in the parse tree; the value of an inherited attribute is computed from the
values of attributes at the siblings and parent of that node.
Example 4.1: The syntax-directed definition in the table below is for a desk calculator program.
This definition associates an integer-valued synthesized attribute called val with each of the
nonterminals E, T, and F. For each E, T, and F production, the semantic rule computes the value of
attribute val for the nonterminal on the left side from the values of val for the nonterminals on the
right side.
Production        Semantic rule
L → E n           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → digit         F.val := digit.lexval
Example 4.2: An annotated parse tree for the input string 3 * 5 + 4 n.
Figure 4.1: Annotated parse tree for 3 * 5 + 4 n
4.2 CONSTRUCTION OF SYNTAX TREE
Syntax-directed definitions are very useful for the construction of syntax trees. Each node in a
syntax tree represents a construct; the children of the node represent the meaningful components of
the construct. A syntax-tree node representing an expression E1 + E2 has label + and two children
representing the subexpressions E1 and E2.
The nodes of a syntax tree are implemented by objects with a suitable number of fields.
Each object will have an op field that is the label of the node.
The objects will have additional fields as follows:
If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor
function Leaf(op, val) creates a leaf object. Alternatively, if nodes are viewed as records,
then Leaf returns a pointer to a new record for a leaf.
If the node is an interior node, there are as many additional fields as the node has
children in the syntax tree. A constructor function Node takes two or more arguments:
Node(op, c1, c2, ..., ck) creates an object with first field op and k additional fields for
the k children c1, ..., ck.
Example 4.3: The S-attributed definition in the following table constructs syntax trees for a simple
expression grammar involving only the binary operators + and -. As usual, these operators are at
the same precedence level and are jointly left associative. All nonterminals have one synthesized
attribute node, which represents a node of the syntax tree.
Every time the first production E → E1 + T is used, its rule creates a node with '+' for op and
two children, E1.node and T.node, for the subexpressions. The second production has a similar rule.
S.No.   PRODUCTION       SEMANTIC RULES
(1)     E → E1 + T       E.node = new Node('+', E1.node, T.node)
(2)     E → E1 - T       E.node = new Node('-', E1.node, T.node)
(3)     E → T            E.node = T.node
(4)     T → ( E )        T.node = E.node
(5)     T → id           T.node = new Leaf(id, id.entry)
(6)     T → num          T.node = new Leaf(num, num.val)
Example 4.3: The L-attributed definition in Figure 4.2 performs the same translation as the
S-attributed definition in Example 4.3. The attributes for the grammar symbols E, T, id, and num
are as discussed in Example 4.3.
The syntax tree for a - 4 + c is constructed by the following sequence of steps:
1) p1 = new Leaf(id, entry-a);
2) p2 = new Leaf(num, 4);
3) p3 = new Node('-', p1, p2);
4) p4 = new Leaf(id, entry-c);
5) p5 = new Node('+', p3, p4);
Figure 4.2: Syntax tree for a - 4 + c
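The constructor functions Leaf and Node can be mimicked in a few lines. The sketch below is one possible rendering, not the text's implementation: Python dataclasses stand in for the records, the helper node collects the children into a tuple, and to_prefix is a helper added here only to display the tree. The strings "a" and "c" stand in for the symbol-table entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    op: str            # label of the leaf, e.g. "id" or "num"
    val: object        # lexical value or symbol-table entry


@dataclass(frozen=True)
class Node:
    op: str            # label of the interior node, e.g. "+"
    children: tuple    # one field per child


def node(op, *children):
    """Build an interior node with the given children."""
    return Node(op, tuple(children))


def to_prefix(tree):
    """Render the tree in prefix form for display."""
    if isinstance(tree, Leaf):
        return str(tree.val)
    parts = [tree.op] + [to_prefix(child) for child in tree.children]
    return "(" + " ".join(parts) + ")"


# The five construction steps for a - 4 + c, bottom-up.
p1 = Leaf("id", "a")
p2 = Leaf("num", 4)
p3 = node("-", p1, p2)
p4 = Leaf("id", "c")
p5 = node("+", p3, p4)
print(to_prefix(p5))
```

Because - and + are left associative at the same precedence level, the subtraction node p3 becomes the left child of the addition node p5.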
Example 4.4: In C, the type int[2][3] can be read as "array of 2 arrays of 3 integers". The
corresponding type expression array(2, array(3, integer)) is represented by the tree below.
The operator array takes two parameters, a number and a type. If types are represented by
trees, then this operator returns a tree node labeled array with two children for a number and a type.
    array
        2
        array
            3
            integer
Figure 4.3: Type expression for int[2][3]
4.3 BOTTOM-UP EVALUATION OF S-ATTRIBUTED DEFINITIONS
An attribute grammar is a formal way to define attributes for the productions of a
formal grammar, associating these attributes to values. The evaluation occurs in the
nodes of the abstract syntax tree, when the language is processed by some parser or
compiler.
The attributes are divided into two groups: synthesized attributes and inherited
attributes. The synthesized attributes are the result of the attribute evaluation rules,
and may also use the values of the inherited attributes. The inherited attributes are
passed down from parent nodes.
In some approaches, synthesized attributes are used to pass semantic information
up the parse tree, while inherited attributes help pass semantic information down
and across it.
Syntax-directed definitions with only synthesized attributes are called S-attributed
definitions. They are commonly used with LR parsers.
The implementation is done by using a stack to hold information about subtrees that
have been parsed.
A translator for an arbitrary syntax-directed definition can be difficult to build. However,
there are large classes of useful syntax-directed definitions for which it is easy to construct
translators.
Only synthesized attributes appear in the syntax-directed definition in the following table for
constructing the syntax tree for an expression.
S.No.   PRODUCTION       SEMANTIC RULES
(1)     E → E1 + T       E.node = new Node('+', E1.node, T.node)
(2)     E → E1 - T       E.node = new Node('-', E1.node, T.node)
(3)     E → T            E.node = T.node
(4)     T → ( E )        T.node = E.node
(5)     T → id           T.node = new Leaf(id, id.entry)
(6)     T → num          T.node = new Leaf(num, num.val)
This approach can be applied to construct syntax trees during bottom-up parsing. The translation of
expressions during top-down parsing often uses inherited attributes.
Synthesized Attributes on the Parser Stack
A translator for an S-attributed definition can often be implemented with the help of an LR
parser generator.
From an S-attributed definition, the parser generator can construct a translator that evaluates
attributes as it parses the input.
A bottom-up parser uses a stack to hold information about subtrees that have been parsed.
We can use extra fields in the parser stack to hold the values of synthesized attributes.
           State    Value
           X        X.x
           Y        Y.y
    top →  Z        Z.z
The above table shows an example of a parser stack with space for one attribute value. The
stack is implemented by a pair of arrays state and value. Each state entry is a pointer (or index) to
an LR(1) parsing table. If the ith state symbol is A, then value[i] will hold the value of the attribute
associated with the parse-tree node corresponding to this A.
The current top of the stack is indicated by the pointer top. We assume that synthesized
attributes are evaluated just before each reduction. Suppose the semantic rule A.a := f(X.x, Y.y, Z.z)
is associated with the production A → XYZ. Before XYZ is reduced to A, the value of the attribute
Z.z is in value[top], that of Y.y in value[top-1], and that of X.x in value[top-2].
If a symbol has no attribute, then the corresponding entry in the value array is undefined.
After the reduction, top is decremented by 2, the state covering A is put in state[top] (i.e., where X
was), and the value of the synthesized attribute A.a is put in value[top].
Example 4.4: Consider the syntax-directed definition of the desk calculator in the following table.
Production        Semantic rule
L → E n           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → digit         F.val := digit.lexval
Figure 4.4: Annotated parse tree for 3 * 5 + 4 n
Implementation of a desk calculator with an LR parser is given in the table.
Production        Semantic rule
L → E $           print(value[top])
E → E1 + T        value[ntop] := value[top-2] + value[top]
E → T             E.val := T.val
T → T1 * F        value[ntop] := value[top-2] * value[top]
T → F             T.val := F.val
F → ( E )         value[ntop] := value[top-1]
F → digit         F.val := digit.lexval
When a production with r symbols on the right side is reduced, the value of ntop is set to
top - r + 1. After each code fragment is executed, top is set to ntop.
The synthesized attributes in the annotated parse tree can be evaluated by an LR parser during a
bottom-up parse of the input line 3 * 5 + 4.
The parse of the expression 3 * 5 + 4 $ with the stack is shown in the table.
Input           State      Value       Production used
3 * 5 + 4 $
* 5 + 4 $       3          3
* 5 + 4 $       F          3           F → digit
* 5 + 4 $       T          3           T → F
5 + 4 $         T *        3 *
+ 4 $           T * 5      3 * 5
+ 4 $           T * F      3 * 5       F → digit
+ 4 $           T          15          T → T * F
+ 4 $           E          15          E → T
4 $             E +        15 +
$               E + 4      15 + 4
$               E + F      15 + 4      F → digit
$               E + T      15 + 4      T → F
$               E          19          E → E + T
                E $        19
                L          19          L → E $
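The parallel state and value stacks can be sketched as a small evaluator. This is an illustration under assumptions: it reuses a hand-encoded SLR table for the expression grammar with digit in place of id, it ends with the table's accept action rather than a reduction by L → E $, and each semantic rule is written as a function of the values popped for the production body. The names PRODUCTIONS, ACTION, GOTO and evaluate are chosen here.

```python
# (head, length of body, semantic rule applied to the popped values)
PRODUCTIONS = {
    1: ("E", 3, lambda v: v[0] + v[2]),    # E → E1 + T
    2: ("E", 1, lambda v: v[0]),           # E → T
    3: ("T", 3, lambda v: v[0] * v[2]),    # T → T1 * F
    4: ("T", 1, lambda v: v[0]),           # T → F
    5: ("F", 3, lambda v: v[1]),           # F → ( E )
    6: ("F", 1, lambda v: v[0]),           # F → digit
}
SHIFT_ROW = {"digit": "s5", "(": "s4"}
ACTION = {
    0: SHIFT_ROW, 4: SHIFT_ROW, 6: SHIFT_ROW, 7: SHIFT_ROW,
    1: {"+": "s6", "$": "acc"},
    2: {**dict.fromkeys("+)$", "r2"), "*": "s7"},
    3: dict.fromkeys("+*)$", "r4"),
    5: dict.fromkeys("+*)$", "r6"),
    8: {"+": "s6", ")": "s11"},
    9: {**dict.fromkeys("+)$", "r1"), "*": "s7"},
    10: dict.fromkeys("+*)$", "r3"),
    11: dict.fromkeys("+*)$", "r5"),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def evaluate(text):
    """Evaluate an expression over single digits, +, * and parentheses."""
    tokens = [("digit", int(ch)) if ch.isdigit() else (ch, None)
              for ch in text if not ch.isspace()]
    tokens.append(("$", None))
    state, value = [0], [None]             # parallel stacks
    pos = 0
    while True:
        kind, lexval = tokens[pos]
        act = ACTION[state[-1]].get(kind)
        if act is None:
            raise SyntaxError(f"unexpected {kind!r}")
        if act == "acc":
            return value[-1]
        if act[0] == "s":                  # shift: push state and lexical value
            state.append(int(act[1:]))
            value.append(lexval)
            pos += 1
        else:                              # reduce: compute the new attribute
            head, r, rule = PRODUCTIONS[int(act[1:])]
            result = rule(value[-r:])
            del state[-r:]
            del value[-r:]
            state.append(GOTO[state[-1]][head])
            value.append(result)


print(evaluate("3*5+4"))
```

Popping r entries and pushing one is the same bookkeeping as setting ntop to top - r + 1 and storing the result in value[ntop].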
4.4 DESIGN OF PREDICTIVE TRANSLATION
The following algorithm generalizes the construction of predictive parsers to implement a
translation scheme based on a grammar suitable for top-down parsing.
Algorithm 4.2: Construction of a predictive syntax-directed translator.
Input: A syntax-directed translation scheme with an underlying grammar suitable for
predictive parsing.
Output: Code for a syntax-directed translator.
Method: The technique is a modification of the predictive-parser construction.
1. For each nonterminal A, construct a function that has a formal parameter for each inherited
   attribute of A and that returns the values of the synthesized attributes of A.
2. The code for nonterminal A decides what production to use based on the current input
   symbol.
3. The code associated with each production does the following. We consider the tokens,
   nonterminals, and actions on the right side of the production from left to right.
   (i). For token X with synthesized attribute x, save the value of x in the variable declared
        for X.x. Then generate a call to match token X and advance the input.
   (ii). For nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a function
        call on the right side, where b1, b2, ..., bk are the variables for the inherited
        attributes of B and c is the variable for the synthesized attribute of B.
   (iii). For an action, copy the code into the parser, replacing each reference to an attribute
        by the variable for that attribute.
Example 4.5: A grammar that is LL(1) is suitable for top-down parsing, and a translator for it
can be generated by predicting a suitable production rule. Consider the translation scheme:
    E → E1 + T   { E.nptr := mknode('+', E1.nptr, T.nptr) }
    E → E1 - T   { E.nptr := mknode('-', E1.nptr, T.nptr) }
    E → T        { E.nptr := T.nptr }
    E → R        { E.nptr := R.nptr }
    R → ε
    T → id       { T.nptr := mkleaf(id, id.entry) }
    T → num      { T.nptr := mkleaf(num, num.entry) }
Combine two of the E-productions to make the translator smaller. The new productions use token
op to represent + and -.
    E → E1 op T  { E.nptr := mknode('op', E1.nptr, T.nptr) }
    E → T        { E.nptr := T.nptr }
    E → R        { E.nptr := R.nptr }
    R → ε
    T → id       { T.nptr := mkleaf(id, id.entry) }
    T → num      { T.nptr := mkleaf(num, num.entry) }
4.5 TYPE SYSTEMS
The design of a type checker for a language is based on information about
the syntactic constructs in the language. Each expression has a type associated with it,
and there are rules for assigning types to language constructs.
If both operands of the arithmetic operators of addition, subtraction and
multiplication are of type integer, then the result is of type integer.
The result of the unary & operator is a pointer to the object referred to by the
operand. If the type of the operand is '...', the type of the result is 'pointer
to ...'.
Basic types are the atomic types with no internal structure as far as the
programmer is concerned.
The basic types are boolean, character, integer, and real.
Arrays, records, sets, pointers and functions can also be treated as constructed types.
Type Expressions
A type expression is either a basic type or is formed by applying an operator called a type
constructor to a type expression. The sets of basic types and constructors depend on the language to
be checked.
The following are some of the type expressions:
1. A basic type is a type expression. Typical basic types for a language include boolean, char,
   integer, float, and void (the absence of a value). type_error is a special basic type.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors include:
   a) Arrays: If T is a type expression, then array(I, T) is a type expression denoting the
      type of an array with elements of type T and index set I. I is often a range of
      integers. Ex. int a[25];
   b) Products: If T1 and T2 are type expressions, then their Cartesian product T1 x T2 is
      a type expression. We assume that x associates to the left and that it has higher
      precedence than →. Products are introduced for completeness; they can be used to
      represent a list or tuple of types (e.g., for function parameters).
   c) Records: A record is a data structure with named fields. A type expression can be
      formed by applying the record type constructor to the field names and their types.
   d) Pointers: If T is a type expression, then pointer(T) is a type expression denoting the
      type "pointer to an object of type T". For example: int a; int *p = &a;
   e) Functions: Mathematically, a function maps elements of one set (domain) to another
      set (range), F : D → R. A type expression can be formed by using the type
      constructor → for function types. We write s → t for "function from type s to type
      t".
4. Type expressions may contain variables whose values are themselves type expressions.
Example 4.6: The array type int[2][3] can be read as "array of 2 arrays of 3 integers each" and written as the type expression array(2, array(3, integer)). This type is represented by the tree in Figure 4.5. The operator array takes two parameters, a number and a type.
        array
       /     \
      2     array
           /     \
          3     integer
Figure 4.5: Type expression for int[2][3]
Type Systems
A type system is a collection of rules for assigning type expressions to the various parts of a program. A type checker implements a type system. Type systems are specified in a syntax-directed manner.
Different type systems may be used by different compilers or processors of the same language. For example, in Pascal, the type of an array includes the index set of the array, so a function with an array argument can only be applied to arrays with that index set.
Static and Dynamic Checking of Types
Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic.
Any check can be done dynamically, if the target code carries the type of an element along with the value of that element.
A sound type system eliminates the need for dynamic checking for type errors, because it allows us to determine statically that these errors cannot occur when the target program runs.
In a sound type system, type errors cannot occur when the target code runs.
A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors. E.g., for integers: int array[255];
Error Recovery
Since type checking has the potential for catching errors in programs, it is important for a type checker to do something reasonable when an error is discovered.
At the very least, the compiler must report the nature and location of the error.
It is desirable for the type checker to recover from errors, so it can check the rest of the input.
Since error handling affects the type-checking rules, it has to be designed into the type system right from the start; the rules must be prepared to cope with errors.
Error handling also requires coping with missing information.
4.6 SPECIFICATION OF A SIMPLE TYPE CHECKER
This section specifies a simple type checker for a simple language in which the type of each identifier must be declared before the identifier is used. The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. The type checker can handle arrays, pointers, statements, and functions.
The specification of a simple type checker includes the following:
A Simple Language
Type Checking of Expressions
Type Checking of Statements
Type Checking of Functions
A Simple Language
The following grammar generates programs, represented by the nonterminal P, consisting of a sequence of declarations D followed by a single expression E.
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
A translation scheme for the above rules:
P → D ; E
D → D ; D
D → id : T                  { addtype(id.entry, T.type) }
T → char                    { T.type := char }
T → integer                 { T.type := integer }
T → ↑ T1                    { T.type := pointer(T1.type) }
T → array [ num ] of T1     { T.type := array(1..num.val, T1.type) }
Type Checking of Expressions
The synthesized attribute type for E gives the type expression assigned by the type system to the expression generated by E. The following semantic rules say that constants represented by the tokens literal and num have type char and integer, respectively:
Rule              Semantic Rule
E → literal       { E.type := char }
E → num           { E.type := integer }
A function lookup(e) is used to fetch the type saved in the symbol table entry pointed to by e. When an identifier appears in an expression, its declared type is fetched and assigned to the attribute type:
E → id            { E.type := lookup(id.entry) }
The expression formed by applying the mod operator to two subexpressions of type integer has type integer; otherwise, its type is type_error. The rule is
E → E1 mod E2     { E.type := if E1.type = integer and E2.type = integer then integer
                              else type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer, in which case the result is the element type t obtained from the type array(s, t) of E1; we make no use of the index set s of the array.
E → E1 [ E2 ]     { E.type := if E2.type = integer and E1.type = array(s, t) then t
                              else type_error }
Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The type of E↑ is the type t of the object pointed to by the pointer E:
E → E1 ↑          { E.type := if E1.type = pointer(t) then t
                              else type_error }
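The expression rules above can be collected into one recursive function. This is a minimal sketch, not the author's implementation: it assumes expressions arrive as nested tuples such as ("mod", e1, e2), that the symbol table is a plain dictionary, and that an undeclared identifier yields type_error. All of these are illustrative choices.

```python
TYPE_ERROR = "type_error"

def check(expr, symtab):
    """Synthesize the type of expr from the types of its subexpressions."""
    kind = expr[0]
    if kind == "literal":                       # E -> literal
        return "char"
    if kind == "num":                           # E -> num
        return "integer"
    if kind == "id":                            # E -> id, plays the role of lookup
        return symtab.get(expr[1], TYPE_ERROR)
    if kind == "mod":                           # E -> E1 mod E2
        t1, t2 = check(expr[1], symtab), check(expr[2], symtab)
        return "integer" if t1 == t2 == "integer" else TYPE_ERROR
    if kind == "index":                         # E -> E1 [ E2 ]
        t1, t2 = check(expr[1], symtab), check(expr[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                        # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                         # E -> E1 ^
        t1 = check(expr[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                        # t of pointer(t)
        return TYPE_ERROR
    return TYPE_ERROR

# Declarations a : array[10] of char; p : ^integer; i : integer
symtab = {
    "a": ("array", (1, 10), "char"),
    "p": ("pointer", "integer"),
    "i": "integer",
}
print(check(("index", ("id", "a"), ("id", "i")), symtab))
print(check(("mod", ("deref", ("id", "p")), ("num", 3)), symtab))
```

Returning type_error as an ordinary value, rather than raising an exception, lets checking continue through the rest of the expression, which matches the error-recovery goal stated in Section 4.5.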
Type Checking of Statements
Since language constructs like statements typically do not have values, the special basic type void can be assigned to them. If an error is detected within a statement, then the type type_error is assigned.
The assignment statement, conditional statement, and while statement are considered for type checking. Sequences of statements are separated by semicolons.
S → id := E          { S.type := if id.type = E.type then void
                                 else type_error }
S → if E then S1     { S.type := if E.type = boolean then S1.type
                                 else type_error }
S → while E do S1    { S.type := if E.type = boolean then S1.type
                                 else type_error }
S → S1 ; S2          { S.type := if S1.type = void and S2.type = void then void
                                 else type_error }
Type Checking of Functions
The application of a function to an argument can be captured by the production
E → E ( E )
in which an expression is the application of one expression to another. The rules for associating type expressions with nonterminal T can be augmented by the following production and action to permit function types in declarations.
T → T1 '→' T2        { T.type := T1.type → T2.type }
Quotes around the arrow used as a function constructor distinguish it from the arrow used as the metasymbol in a production.
The rule for checking the type of a function application is
E → E1 ( E2 )        { E.type := if E2.type = s and E1.type = s → t then t
                                 else type_error }
4.7 EQUIVALENCE OF TYPE EXPRESSIONS
The checking rules have the form "if two type expressions are equal then return a certain type else return type_error".
It is therefore important to have a precise definition of when two type expressions are equivalent.
The key issue is whether a name in a type expression stands for itself or whether it is an abbreviation for another type expression.
For efficiency, compilers use representations that allow type equivalence to be determined quickly.
The notion of type equivalence implemented by a specific compiler can often be explained using the concepts of structural and name equivalence.
In C, type names are introduced by typedef and struct declarations.
Structural Equivalence of Type Expressions
Since type expressions are built from basic types and constructors, a natural notion of equivalence between two type expressions is structural equivalence; i.e., two expressions are either the same basic type, or are formed by applying the same constructor to structurally equivalent types. That is, two type expressions are structurally equivalent if and only if they are identical.
For example, the type expression integer is equivalent only to integer because they are the same basic type.
Similarly, pointer(integer) is equivalent only to pointer(integer) because the two are formed by applying the same constructor pointer to equivalent types.
The algorithm recursively compares the structure of type expressions without checking for cycles, so it can be applied to a tree or a dag representation. It assumes that the only type constructors are for arrays, products, pointers, and functions.
The constructed types array(n1, t1) and array(n2, t2) are equivalent iff n1 = n2 and t1 = t2.
Algorithm sequiv(s, t)
    if (s and t are the same basic type) then
        return true
    else if (s = array(s1, s2) and t = array(t1, t2)) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else if (s = s1 × s2 and t = t1 × t2) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else if (s = pointer(s1) and t = pointer(t1)) then
        return (sequiv(s1, t1))
    else if (s = s1 → s2 and t = t1 → t2) then
        return (sequiv(s1, t1) and sequiv(s2, t2))
    else return false
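The algorithm translates almost line for line into a recursive function. The sketch below assumes a tuple encoding that is not from the text: basic types are strings, and constructed types are tuples tagged "array", "x" (product), "pointer" or "->" (function). Array index sets are compared with plain equality, following the rule that array(n1, t1) and array(n2, t2) are equivalent iff n1 = n2 and t1 = t2.

```python
def sequiv(s, t):
    """Structural equivalence of two type expressions (trees or dags only)."""
    # Basic types are strings; a basic type only matches the same basic type.
    if isinstance(s, str) or isinstance(t, str):
        return s == t
    if s[0] != t[0]:                      # different constructors
        return False
    if s[0] == "array":                   # array(index_set, element_type)
        return s[1] == t[1] and sequiv(s[2], t[2])
    if s[0] == "pointer":                 # pointer(target_type)
        return sequiv(s[1], t[1])
    if s[0] in ("x", "->"):               # product s1 x s2, function s1 -> s2
        return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
    return False

f1 = ("->", ("x", "integer", "char"), ("pointer", "integer"))
f2 = ("->", ("x", "integer", "char"), ("pointer", "integer"))
print(sequiv(f1, f2))
print(sequiv(("array", 10, "char"), ("array", 20, "char")))
```

As in the text, there is no cycle check, so this version would not terminate on a cyclic type graph.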
Example 4.7: The encoding of type expressions in this example is from a C compiler for fast checking of type equivalence.
BASIC TYPE                          ENCODING
boolean                             0000
char                                0001
integer                             0010
real                                0011

TYPE CONSTRUCTOR                    ENCODING
pointer                             01
array                               10
freturns                            11

TYPE EXPRESSION                     ENCODING
char                                0000000001
freturns(char)                      0000110001
pointer(freturns(char))             0001110001
array(pointer(freturns(char)))      1001110001
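The tables suggest that each encoding is ten bits: three two-bit constructor fields, with the outermost constructor leftmost and unused fields set to 00, followed by the four-bit basic type. The following sketch reproduces the table under that reading; the three-field limit and the nested-tuple input format are assumptions made for illustration.

```python
BASIC = {"boolean": "0000", "char": "0001", "integer": "0010", "real": "0011"}
CONSTRUCTOR = {"pointer": "01", "array": "10", "freturns": "11"}

def encode(texpr, slots=3):
    """Encode a type such as ("pointer", ("freturns", "char")) as a bit string."""
    codes = []
    while isinstance(texpr, tuple):       # peel constructors, outermost first
        name, texpr = texpr
        codes.append(CONSTRUCTOR[name])
    if len(codes) > slots:
        raise ValueError("too many nested constructors for this encoding")
    padding = "00" * (slots - len(codes)) # unused constructor fields
    return padding + "".join(codes) + BASIC[texpr]

print(encode(("array", ("pointer", ("freturns", "char")))))
```

With such an encoding, two type expressions are structurally equivalent exactly when their bit strings are equal, so the test reduces to a single comparison.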
Names for Type Expressions
In some languages, types can be given names (data type names). For example, consider the Pascal program fragment:
type link = ↑cell;
var  next : link;
     last : link;
     p    : ↑cell;
     q, r : ↑cell;
The identifier link is declared to be a name for the type ↑cell. The variables next, last, p, q, r do not necessarily have identical types; the answer depends on the implementation.
A type graph is constructed to check name equivalence.
Every time a type constructor or basic type is seen, a new node is created.
Every time a new type name is seen, a leaf is created.
Two type expressions are equivalent if they are represented by the same node in the type graph.
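Example 4.8: Consider the Pascal program fragment
type link = ↑cell;
     np   = ↑cell;
     nqr  = ↑cell;
var  next : link;
     last : link;
     p    : np;
     q    : nqr;
     r    : nqr;
[Figure: next and last refer to the node "link = pointer"; p refers to a second pointer node; q and r refer to a third pointer node; all three pointer nodes have the leaf cell as their child.]
Figure 4.6: Association of variables and nodes in the type graph.
Note that the type name cell has three parents, all labeled pointer. An equal sign appears between the type name link and the node in the type graph to which it refers.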
Example: Check for equivalence of type expressions for the following C code:
typedef struct
{
    int data[100];
    int count;
} Stack;
typedef struct
{
    int data[100];
    int count;
} Set;
Stack x, y;
Set r, s;
Name equivalence: This is the most straightforward notion: two types are equal if, and only if, they have the same name. x and y would be of the same type, and r and s would be of the same type, but the type of x or y would not be equivalent to the type of r or s.
x = y; valid
r = s; valid
Structural equivalence: Two types are equal if, and only if, they have the same "structure".
x = r; valid
Thus the two types Stack and Set are equivalent under structural equivalence, but not under name equivalence. (C itself treats two distinct struct types as different types, so a C compiler rejects x = r.)
4.8 TYPE CONVERSIONS
Consider expressions like x + i, where x is of type float and i is of type integer. Since the representation of integers and floating-point numbers is different within a computer, and different machine instructions are used for operations on integers and floats, the compiler may need to convert one of the operands of + to ensure that both operands are of the same type when the addition occurs.
Suppose that integers are converted to floats when necessary, using a unary operator (float). For example, the integer 2 is converted to a float in the code for the expression 2 * 3.14:
t1 = (float) 2
t2 = t1 * 3.14
The attribute E.type has a value that is either integer or float.
The rule associated with E → E1 + E2 builds on the pseudocode
if (E1.type = integer and E2.type = integer) E.type = integer;
else if (E1.type = float and E2.type = integer) E.type = float;
else if (E1.type = integer and E2.type = float) E.type = float;
else if (E1.type = float and E2.type = float) E.type = float;
Type conversion rules vary from language to language. The rules for Java in Figure 4.7 distinguish between widening conversions, which are intended to preserve information, and narrowing conversions, which can lose information.
(a) Widening conversions: byte → short → int → long → float → double, and char → int.
(b) Narrowing conversions: double → float → long → int → char, short, byte.
Figure 4.7: Conversions between primitive types in Java
Coercions
Conversion from one type to another is said to be implicit if it is done automatically by the compiler. Implicit type conversions, also called coercions, are limited in many languages to widening conversions. Conversion is said to be explicit if the programmer must write something to cause the conversion. Explicit conversions are also called casts.
The semantic action for checking E → E1 + E2 uses two functions:
1. max(t1, t2) takes two types t1 and t2 and returns the maximum (or least upper bound) of the two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy; e.g., if either type is an array or a pointer type.
2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a value of type w. It returns a itself if t and w are the same type. Otherwise, it generates an instruction to do the conversion and place the result in a temporary, which is returned as the result.
Pseudocode for widen, assuming that the only types are integer and float:
Addr widen(Addr a, Type t, Type w)
{
    if (t = w) return a;
    else if (t = integer and w = float)
    {
        temp = new Temp();
        gen(temp '=' '(float)' a);
        return temp;
    }
    else error;
}
Introducing type conversions into expression evaluation:
E → E1 + E2   { E.type = max(E1.type, E2.type);
                a1 = widen(E1.addr, E1.type, E.type);
                a2 = widen(E2.addr, E2.type, E.type);
                E.addr = new Temp();
                gen(E.addr '=' a1 '+' a2); }
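The two functions and the semantic action can be tried out with a small emitter that collects three-address instructions. This is a sketch only: the Emitter class, the name max_type (used so the built-in max is not shadowed) and the string form of the generated code are assumptions, and the hierarchy contains just integer and float as in the pseudocode.

```python
HIERARCHY = ["integer", "float"]          # narrower types first

class Emitter:
    """Collects generated instructions and hands out fresh temporaries."""
    def __init__(self):
        self.code = []
        self._count = 0

    def new_temp(self):
        self._count += 1
        return f"t{self._count}"

    def gen(self, instruction):
        self.code.append(instruction)

def max_type(t1, t2):
    """Least upper bound of t1 and t2 in the widening hierarchy."""
    if t1 not in HIERARCHY or t2 not in HIERARCHY:
        raise TypeError(f"no widening between {t1} and {t2}")
    return max(t1, t2, key=HIERARCHY.index)

def widen(em, a, t, w):
    """Return an address holding the value of a (of type t) as type w."""
    if t == w:
        return a
    if t == "integer" and w == "float":
        temp = em.new_temp()
        em.gen(f"{temp} = (float) {a}")
        return temp
    raise TypeError(f"cannot widen {t} to {w}")

def add(em, addr1, type1, addr2, type2):
    """Semantic action for E -> E1 + E2; returns (E.addr, E.type)."""
    etype = max_type(type1, type2)
    a1 = widen(em, addr1, type1, etype)
    a2 = widen(em, addr2, type2, etype)
    result = em.new_temp()
    em.gen(f"{result} = {a1} + {a2}")
    return result, etype

em = Emitter()
print(add(em, "i", "integer", "x", "float"))
print(em.code)
```

Extending the sketch to more types only requires adding them to HIERARCHY in widening order and generalizing the conversion emitted by widen.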
Example 4.9: Consider expressions formed by applying an arithmetic operator op to constants and identifiers, as in the grammar below. Suppose there are two types, real and integer, with integers converted to reals when necessary. Attribute type of nonterminal E can be either integer or real, and the type-checking rules are shown below; function lookup(e) returns the type saved in the symbol table entry pointed to by e.
PRODUCTION          SEMANTIC RULE
E → num             E.type = integer
E → num . num       E.type = real
E → id              E.type = lookup(id.entry)
E → E1 op E2        E.type = if (E1.type = integer and E2.type = integer)
                                 then integer
                             else if (E1.type = integer and E2.type = real)
                                 then real
                             else if (E1.type = real and E2.type = integer)
                                 then real
                             else if (E1.type = real and E2.type = real)
                                 then real
                             else type_error
4.9 RUNTIME ENVIRONMENT: SOURCE LANGUAGE ISSUES
Run-Time Environment
The run-time environment establishes relationships between names and data objects.
The allocation and deallocation of data objects are managed by the run-time environment.
Each execution of a procedure is referred to as an activation of the procedure.
If the procedure is recursive, several of its activations may be alive at the same time. Each call of a procedure leads to an activation that may manipulate data objects allocated for its use.
The representation of a data object at run time is determined by its type.
Often, elementary data types, such as characters, integers, and reals, can be represented by equivalent data objects in the target machine.
However, aggregates, such as arrays, strings, and structures, are usually represented by collections of primitive objects.
Source Language Issues
1. Procedure
2. Activation Trees
3. Control Stack
4. The Scope of a Declaration
5. Bindings of Names
Procedure
A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name and the statement is the procedure body.
A procedure that returns a value is called a function.
A complete program will also be treated as a procedure.
When a procedure name appears within an executable statement, we say that the procedure is called at that point.
The basic idea is that a procedure call executes the procedure body.
Some of the identifiers appearing in a procedure definition are special, and are called formal parameters of the procedure.
Actual parameters may be passed to a called procedure.
Procedures can use both local and global variables.
Activation Trees
We make the following assumptions about the flow of control among procedures during the execution of a program:
1. Control flows sequentially; that is, the execution of a program consists of a sequence of steps, with control being at some specific point in the program at each step.
2. Each execution of a procedure starts at the beginning of the procedure body and eventually returns control to the point immediately following the place where the procedure was called. This means the flow of control between procedures can be depicted using trees, because the lifetimes of two activations are either non-overlapping or nested.
A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended.
The lifetime of the activation quicksort(1,9) is the sequence of steps executed between printing enter quicksort(1,9) and printing leave quicksort(1,9).
The following are the rules to construct an activation tree:
1. Each node represents an activation of a procedure.
2. The root node represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b.
enter main()
    enter readarray()
    leave readarray()
    enter quicksort(1,9)
        enter partition(1,9)
        leave partition(1,9)
        enter quicksort(1,3)
        ...
        leave quicksort(1,3)
        enter quicksort(5,9)
        ...
        leave quicksort(5,9)
    leave quicksort(1,9)
leave main()
Figure 4.8: An activation tree corresponding to the output of activation of quicksort
Control Stack
The flow of control in a program corresponds to a depth-first traversal of the activation tree that starts at the root, visits a node before its children, and recursively visits children at each node in a left-to-right order.
We can use a stack, called a control stack, to keep track of live procedure activations; the idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends. Then the contents of the control stack are related to paths to the root of the activation tree. When node n is at the top of the control stack, the stack contains the nodes along the path from n to the root.
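The push and pop discipline can be simulated directly. The sketch below assumes the activation tree is given as nested (name, children) pairs. It uses only the activations that appear in the trace of Figure 4.8, and the subtrees elided there with "..." are left empty rather than guessed. The function name trace_stacks is hypothetical.

```python
def trace_stacks(tree):
    """Depth-first walk of an activation tree that records, for every
    activation, the contents of the control stack when it begins."""
    stack, snapshots = [], {}

    def activate(node):
        name, children = node
        stack.append(name)                # activation begins: push
        snapshots[name] = list(stack)     # path from the root down to this node
        for child in children:            # children are visited left to right
            activate(child)
        stack.pop()                       # activation ends: pop

    activate(tree)
    return snapshots

tree = ("main", [
    ("readarray", []),
    ("quicksort(1,9)", [
        ("partition(1,9)", []),
        ("quicksort(1,3)", []),           # elided subtree left empty
        ("quicksort(5,9)", []),           # elided subtree left empty
    ]),
])

snapshots = trace_stacks(tree)
print(snapshots["quicksort(1,3)"])
```

Each snapshot lists the live activations from the root to the node that has just been entered. Activations that have already completed, such as readarray and partition(1,9), no longer appear in later snapshots.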
Example 4.10: Figure 4.9 shows nodes from the activation tree of Figure 4.8 that have been reached when control enters the activation represented by q(2,3). Activations with labels r, p(1,9), p(1,3), and q(1,0) have executed to completion, so the figure contains dashed lines to their nodes. The solid lines mark the path from q(2,3) to the root.
Figure 4.9: The control stack contains nodes along a path to the root.
The Scope of a Declaration
A declaration in a language is a syntactic construct that associates information with a name.
Declarations may be explicit, as in the Pascal fragment var i : integer; or they may be implicit. For example, any variable name starting with I is assumed to denote an integer in a Fortran program, unless otherwise declared.
The scope rules of a language determine which declaration of a name applies when the name appears in the text of a program.
The portion of the program to which a declaration applies is called the scope of that declaration. An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope of a declaration within the procedure; otherwise, the occurrence is said to be nonlocal.
At compile time, the symbol table can be used to find the declaration that applies to an occurrence of a name.
Specifiers such as static, global, volatile, final and so on are also used when declaring variables.
Bindings of Names
Even if each name is declared once in a program, the same name may denote different data objects at run time. The informal term "data object" corresponds to a storage location that can hold values.
In programming language semantics, the term environment refers to a function that maps a name to a storage location, and the term state refers to a function that maps a storage location to the value held there, as in Figure 4.10.
name --(environment)--> storage --(state)--> value
Figure 4.10: Two-stage mapping from names to values
Environments and states are different; an assignment changes the state, but not the environment. For example, suppose that storage address 100, associated with variable pi, holds 0. After the assignment pi := 3.14, the same storage address is associated with pi, but the value held there is 3.14.
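The two-stage mapping can be modelled with two dictionaries. This is a toy sketch: the dictionaries and the helper names assign and value_of are hypothetical, and address 100 is taken from the example above.

```python
environment = {"pi": 100}    # name -> storage location
state = {100: 0}             # storage location -> value held there

def assign(name, value):
    """An assignment changes the state, not the environment."""
    state[environment[name]] = value

def value_of(name):
    """Follow both mappings: name -> location -> value."""
    return state[environment[name]]

assign("pi", 3.14)
print(environment["pi"], value_of("pi"))
```

After the assignment the environment still maps pi to address 100; only the entry for address 100 in the state has changed.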
When an environment associates storage location s with a name x, we say that x is bound to s; the association itself is referred to as a binding of x. The term storage "location" is to be taken figuratively. If x is not of a basic type, the storage s for x may be a collection of memory words.
Static notion                  Dynamic counterpart
definition of a procedure      activations of the procedure
declaration of a name          bindings of the name
scope of a declaration         lifetime of a binding
4.10 STORAGE ORGANIZATION
The executing target program runs in its own logical address space in which each program value has a location. The management and organization of this logical address space is shared between the compiler, operating system, and target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.
The run-time representation of an object program in the logical address space consists of data and program areas as shown in Figure 4.11. A compiler for a language like C++ on an operating system like Linux might subdivide memory in this way.
The run-time storage is subdivided to hold code and data as follows:
The generated target code
Data objects
Control stack (which keeps track of information about procedure activations)
00  +---------------+
    | Code          |
    +---------------+
    | Static data   |
    +---------------+
    | Heap          |
    +---------------+
    | Free Memory   |
    +---------------+
    | Stack         |
FF  +---------------+
Figure 4.11: Typical subdivision of run-time memory into code and data areas
The size of the generated target code is fixed at compile time, so the compiler can place the executable target code in a statically determined area Code, usually in the low end of memory.
The size of some program data objects, such as global constants, and data generated by the compiler, such as information to support garbage collection, may be known at compile time, and these data objects can be placed in another statically determined area called Static. One reason for statically allocating as many data objects as possible is that the addresses of these objects can be compiled into the target code. In early versions of Fortran, all data objects could be allocated statically.
To maximize the utilization of space at run time, the other two areas, Stack and Heap, are at the opposite ends of the remainder of the address space. These areas are dynamic; their size can change as the program executes. These areas grow towards each other as needed. The stack is used to store data structures called activation records that get generated during procedure calls.
Activation Records
Procedure calls and returns are usually managed by a run-time stack called the control stack. Each live activation has an activation record (sometimes called a frame) on the control stack. The contents of activation records vary with the language being implemented.
    +------------------------+
    | Actual parameters      |
    +------------------------+
    | Returned values        |
    +------------------------+
    | Control link           |
    +------------------------+
    | Access link            |
    +------------------------+
    | Saved machine status   |
    +------------------------+
    | Local data             |
    +------------------------+
    | Temporaries            |
    +------------------------+
Figure 4.12: A general activation record
The following are the contents of an activation record:
1. Temporary values, such as those arising from the evaluation of expressions, in cases where those temporaries cannot be held in registers.
2. Local data belonging to the procedure whose activation record this is.
3. A saved machine status, with information about the state of the machine just before the call to the procedure. This information typically includes the return address and the contents of registers that were used by the calling procedure and that must be restored when the return occurs.
4. An "access link" may be needed to locate data needed by the called procedure but found elsewhere, e.g., in another activation record.
5. A control link, pointing to the activation record of the caller.
6. Space for the return value of the called function, if any. Again, not all called procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.
7. The actual parameters used by the calling procedure. Commonly, these values are not placed in the activation record but rather in registers.
4.11 STORAGE ALLOCATION
There are basically three storage allocation strategies, one used in each of the three data areas of the storage organization.
1. Static allocation lays out storage for all data objects at compile time.
2. Stack allocation manages the run-time storage as a stack.
3. Heap allocation allocates and deallocates storage as needed at run time from a data area known as a heap.
1. Static Allocation
In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its names are bound to the same storage locations.
The above property allows the values of local names to be retained across activations of a procedure. That is, when control returns to a procedure, the values of the locals are the same as they were when control left the last time.
From the type of a name, the compiler determines the amount of storage to set aside for that name.
The address of this storage consists of an offset from an end of the activation record for the procedure.
The compiler must eventually decide where the activation records go, relative to the target code and to one another.
The following are the limitations of static memory allocation.
1. The size of a data object and constraints on its position in memory must be known at compile time.
2. Recursive procedures are restricted, because all activations of a procedure use the same bindings for local names.
3. Dynamic allocation is not allowed, so data structures cannot be created dynamically.
2. Stack Allocation
1. Stack allocation is based on the idea of a control stack.
2. A stack is a Last In First Out (LIFO) storage device where new storage is allocated and deallocated at only one "end", called the top of the stack.
3. Storage is organized as a stack, and activation records are pushed and popped as activations begin and end, respectively.
4. Storage for the locals in each call of a procedure is contained in the activation record for that call. Thus locals are bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made.
5. Furthermore, the values of locals are deleted when the activation ends; that is, the values are lost because the storage for locals disappears when the activation record is popped.
6. At run time, an activation record can be allocated and deallocated by incrementing and decrementing the top of the stack, respectively.
a. Calling sequence
The layout and allocation of data to memory locations in the run-time environment are key issues in storage management. These issues are tricky because the same name in a program text can refer to multiple locations at run time.
The two adjectives static and dynamic distinguish between compile time and run time, respectively. We say that a storage allocation decision is static if it can be made by the compiler looking only at the text of the program, not at what the program does when it executes.
Conversely, a decision is dynamic if it can be decided only while the program is running. Many compilers use some combination of the following two strategies for dynamic storage allocation:
1. Stack storage. Names local to a procedure are allocated space on a stack. The stack supports the normal call/return policy for procedures.
2. Heap storage. Data that may outlive the call to the procedure that created it is usually allocated on a "heap" of reusable storage.
Figure 4.13: Division of tasks between caller and callee
The code for the callee can access its temporaries and local data using offsets from top_sp. The call sequence is:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee's activation record. The caller then increments top_sp to the position shown in Figure 4.13. That is, top_sp is moved past the caller's local data and temporaries and the callee's parameter and status fields.
3. The callee saves register values and other status information.
4. The callee initializes its local data and begins execution.
A possible return sequence is:
1. The callee places a return value next to the activation record of the caller.
2. Using the information in the status field, the callee restores top_sp and other registers and branches to a return address in the caller's code.
3. Although top_sp has been decremented, the caller can copy the returned value into its own activation record and use it to evaluate an expression.
b. Variable-length data
1. Variable-length data are not stored in the activation record. Only a pointer to the beginning of each data object appears in the activation record.
2. The relative addresses of these pointers are known at compile time.
c. Dangling references
1. A dangling reference occurs when there is a reference to storage that has been deallocated.
2. It is a logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages.
3. Heap Allocation
1. The deallocation of activation records need not occur in a last-in first-out fashion, so storage cannot be organized as a stack.
2. Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects. Pieces may be deallocated in any order, so over time the heap will consist of alternating areas that are free and in use.
3. Heap allocation is an alternative to stack allocation.
4.12 PARAMETER PASSING
All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments. The actual parameters (the parameters used in the call of a procedure) are associated with the formal parameters (those used in the procedure definition).
Call-by-value
In call-by-value, the actual parameter is evaluated (if it is an expression) or copied (if it is a variable). The value is placed in the location belonging to the corresponding formal parameter of the called procedure. This method is used in C and Java.
The actual parameters are evaluated and their r-values are passed to the called procedure.
Call-by-value can be implemented as follows:
o A formal parameter is treated just like a local name, so the storage for the formals is in the activation record of the called procedure.
o The caller evaluates the actual parameters and places their r-values in the storage for the formals.
Call-by-reference
In call-by-reference, the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter. Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the caller. Changes to the formal parameter thus appear as changes to the actual parameter.
o If the actual parameter has an l-value, then that l-value itself is passed.
o However, if the actual parameter is an expression, like a+b or 2, that has no l-value, then the expression is evaluated in a new location, and the address of that location is passed.
Copy-restore
A hybrid between call-by-value and call-by-reference is copy-restore linkage (also known as copy-in copy-out, or value-result).
Before control flows to the called procedure, the actual parameters are evaluated. The r-values of the actuals are passed to the called procedure as in call-by-value.
When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals.
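The three mechanisms described so far give different results when the same variable is passed for two formals. The sketch below simulates this with a dictionary standing in for storage. The procedure body (x := x + 1; y := y + 1), the store layout and all function names are assumptions made for illustration.

```python
def body(read, write):
    """Body of a hypothetical procedure p(x, y): x := x + 1; y := y + 1."""
    write("x", read("x") + 1)
    write("y", read("y") + 1)

def call_by_value(store, actuals):
    # Formals live in the callee's own frame, initialized with r-values.
    frame = {formal: store[actual] for formal, actual in actuals.items()}
    body(frame.__getitem__, frame.__setitem__)

def call_by_reference(store, actuals):
    # Every use of a formal goes through the location of its actual.
    def read(formal):
        return store[actuals[formal]]
    def write(formal, value):
        store[actuals[formal]] = value
    body(read, write)

def call_by_copy_restore(store, actuals):
    # Copy in as for call-by-value, then copy the formals back on return.
    frame = {formal: store[actual] for formal, actual in actuals.items()}
    body(frame.__getitem__, frame.__setitem__)
    for formal, actual in actuals.items():
        store[actual] = frame[formal]

def run(mechanism):
    """Call p(a, a) with a = 10 and report the final value of a."""
    store = {"a": 10}
    mechanism(store, {"x": "a", "y": "a"})
    return store["a"]

for mechanism in (call_by_value, call_by_reference, call_by_copy_restore):
    print(mechanism.__name__, run(mechanism))
```

With call-by-value the caller's variable is untouched. With call-by-reference both increments hit the same location. With copy-restore each formal is incremented in its own copy, so only one increment survives the copy back.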
Call-by-name
A mechanism called call-by-name was used in the early programming language Algol 60. It requires that the callee execute as if the actual parameter were substituted literally for the formal parameter in the code of the callee, as if the formal parameter were a macro standing for the actual parameter.
Call-by-name is traditionally defined by the copy rule of Algol.
1. The procedure is treated as if it were a macro; that is, its body is substituted for the call in the caller, with the actual parameters literally substituted for the formals. Such a literal substitution is called macro expansion or inline expansion.
2. The local names of the called procedure are kept distinct from the names of the calling procedure, each local of the called procedure being systematically renamed into a distinct new name before the macro expansion is done.
3. The actual parameters are surrounded by parentheses if necessary to preserve their integrity.
4.13 SYMBOL TABLES
Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its character string (or lexeme), its type, its position in storage, and any other relevant information.
[Figure: the lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator, and code generator each exchange information with the symbol table.]
Figure 4.14: Interaction among the symbol table and various phases of the compiler
The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type, and its scope.
In the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary search tree
o Hash table
Of these, symbol tables are most often implemented as hash tables, where the source code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.
A symbol table may serve the following purposes, depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
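The purposes listed above can be illustrated with a hash-based table that chains one table per scope. This is a minimal sketch rather than the layout of any particular compiler: it relies on Python's built-in dict as the hash table, and the class name SymbolTable, its methods and the attribute names are all hypothetical.

```python
class SymbolTable:
    """One table per scope; a lookup walks outward through enclosing scopes."""

    def __init__(self, parent=None):
        self.entries = {}             # dict acts as the hash table
        self.parent = parent          # enclosing scope, or None for globals

    def declare(self, name, **attributes):
        """Record a declaration in the current scope."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared in this scope")
        self.entries[name] = attributes

    def lookup(self, name):
        """Return the attributes of the innermost declaration, or None."""
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None

globals_table = SymbolTable()
globals_table.declare("count", type="integer", offset=0)

locals_table = SymbolTable(parent=globals_table)
locals_table.declare("count", type="real", offset=8)   # shadows the global
locals_table.declare("i", type="integer", offset=16)

print(locals_table.lookup("count"))
print(globals_table.lookup("i"))
```

A lookup that returns None corresponds to the check for an undeclared variable, and the outward walk through parent tables performs scope resolution.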
Symbol Table Entries
A compiler uses a symbol table to keep track of scope and binding information about
names. The symbol table is searched every time a name is encountered in the source text. Changes
to the table occur if a new name or new information about an existing name is discovered. A linear
list is the simplest to implement, but its performance is poor. Hashing schemes provide better
performance.
The symbol table may need to grow dynamically during compilation, even if its initial size is fixed.
Each entry in the symbol table is for the declaration of a name.
The format of entries need not be uniform.
Each entry can be implemented as a record consisting of a sequence of consecutive words of
memory.
To keep symbol table records uniform, it may be convenient for some of the information
about a name to be kept outside the table entry, with only a pointer to this information
stored in the record.
The following information about identifiers is stored in the symbol table:
O The name.
O The data type.
O The block level.
O Its scope (local, global).
O Pointer/address.
O Its offset from the base pointer.
O Function name, parameter, and variable.
Characters in a Name
There is a distinction between the token id for an identifier or name, the lexeme consisting
of the character string forming the name, and the attributes of the name.
Strings of characters may be unwieldy to work with, so compilers often use some fixed-length
representation of the name rather than the lexeme.
The lexeme is needed when a symbol table entry is set up for the first time, and when we
look up a lexeme found in the input to determine whether it is a name that has already
appeared.
A common representation of a name is a pointer to a symbol table entry for it.
If there is a modest upper bound on the length of a name, then the characters in the name can be
stored in the symbol table entry, as in Figure 4.15.
Figure 4.15: Symbol table names in fixed-size space within a record
If there is no limit on the length of a name, or if the limit is rarely reached, the indirect scheme of
Figure 4.16 can be used.
Figure 4.16: Symbol table names in a separate array
Storage Allocation Information
Information about the storage locations that will be bound to names at run time is kept in the
symbol table.
Allocation can be static or dynamic.
Storage is allocated for code, data, stack, and heap.
COMMON blocks in Fortran are loaded separately.
The List Data Structure for Symbol Tables
The compiler plans out the activation record for each procedure.
The simplest and easiest to implement data structure for a symbol table is a linear list of
records, as shown in Figure 4.17.
We use a single array, or equivalently several arrays, to store names and their associated
information.
If the symbol table contains n names, then to find the data about a name we search, on the
average, n/2 names, so the cost of an inquiry is proportional to n.
id1  | info1
id2  | info2
...
idn  | infon
available ->
Figure 4.17: A linear list of records.
Hash Tables for Symbol Tables
Variations of the searching technique known as hashing have been implemented in many
compilers.
Open hashing is the simplest variant of this technique.
Even this scheme gives us the capability of performing e inquiries on n names in time
proportional to n(n+e)/m, for any constant m of our choosing.
This method is generally more efficient than linear lists and is the method of choice for symbol
tables in most situations.
The basic hashing scheme is illustrated in Figure 4.18. There are two parts to the data
structure:
1. A hash table consisting of a fixed array of m pointers to table entries.
2. Table entries organized into m separate linked lists, called buckets (some buckets may be
empty). Each record in the symbol table appears on exactly one of these lists.
Figure 4.18: A hash table of size 210.
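The two-part structure described above (an array of m list heads plus linked bucket lists) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the record fields, the bucket count 211, the hash function and the names symtab_lookup and symtab_insert are all hypothetical and do not come from the text.

```c
#include <stdlib.h>
#include <string.h>

#define M 211  /* number of buckets; an assumed value, any constant m works */

/* One symbol-table record; the fields are illustrative assumptions. */
typedef struct Entry {
    char *name;          /* the lexeme */
    int type;            /* hypothetical type code */
    int offset;          /* hypothetical storage offset */
    struct Entry *next;  /* next record in the same bucket */
} Entry;

typedef struct {
    Entry *bucket[M];    /* fixed array of m pointers to table entries */
} SymTab;

/* Simple multiplicative string hash (an assumption, not from the text). */
static unsigned symtab_hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % M;
}

/* Return the record for name, or NULL if it has not been entered. */
Entry *symtab_lookup(SymTab *t, const char *name) {
    for (Entry *e = t->bucket[symtab_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Enter name if absent; either way return its record (NULL if out of memory). */
Entry *symtab_insert(SymTab *t, const char *name, int type, int offset) {
    Entry *e = symtab_lookup(t, name);
    if (e != NULL) return e;
    e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) { free(e); return NULL; }
    strcpy(e->name, name);
    e->type = type;
    e->offset = offset;
    unsigned h = symtab_hash(name);
    e->next = t->bucket[h];   /* push on the front of the bucket list */
    t->bucket[h] = e;
    return e;
}
```

Because new records are pushed on the front of a bucket, the most recent entry for a name is found first, which fits the scope representation discussed next. A table must start with all bucket pointers set to NULL, for example by declaring it static.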
Representing Scope Information
A simple approach is to maintain a separate symbol table for each scope. In effect, the
symbol table for a procedure or scope is the compile-time equivalent of an activation record.
A linked list is well suited to representing the scope information.
Figure 4.19: The most recent entry for a is near the front.
4.14 DYNAMIC STORAGE ALLOCATION
The techniques needed to implement dynamic storage allocation depend mainly on how
storage is deallocated. If deallocation is implicit, then the run-time support package is responsible
for determining when a storage block is no longer needed. The compiler has less to do if
deallocation is done explicitly by the programmer.
Explicit Allocation of Fixed-Sized Blocks
The simplest form of dynamic allocation involves blocks of a fixed size. By linking the
blocks in a list, as in Figure 4.20, allocation and deallocation can be done quickly with little or no
storage overhead.
Figure 4.20: A deallocated block is added to the list of available blocks.
Suppose that blocks are to be drawn from a contiguous area of storage. Initialization of the
area is done by using a portion of each block for a link to the next block. A pointer available points
to the first block. Allocation consists of taking a block off the list, and deallocation consists of
putting the block back on the list.
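A minimal C sketch of this free-list scheme is given below. The block size, the number of blocks and the names pool_init, pool_alloc and pool_free are assumptions chosen for illustration; only the pointer name available follows the text.

```c
#include <stddef.h>

#define BLOCK_SIZE 32   /* assumed block size in bytes */
#define NBLOCKS    4    /* assumed number of blocks in the area */

/* A free block stores the link to the next free block in its own storage,
   so the list needs no extra space. */
typedef union Block {
    union Block *next;
    unsigned char payload[BLOCK_SIZE];
} Block;

static Block area[NBLOCKS];      /* the contiguous storage area */
static Block *available = NULL;  /* points to the first free block */

/* Initialization: thread every block of the area onto the free list. */
void pool_init(void) {
    for (int i = 0; i < NBLOCKS - 1; i++)
        area[i].next = &area[i + 1];
    area[NBLOCKS - 1].next = NULL;
    available = &area[0];
}

/* Allocation: take a block off the front of the list (NULL if none is left). */
void *pool_alloc(void) {
    Block *b = available;
    if (b != NULL) available = b->next;
    return b;
}

/* Deallocation: put the block back on the front of the list. */
void pool_free(void *p) {
    Block *b = p;
    b->next = available;
    available = b;
}
```

Both operations touch only the head of the list, which is why they take constant time regardless of how many blocks are in use.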
Explicit Allocation of Variable-Sized Blocks
When blocks are allocated and deallocated, storage can become fragmented; that is, the heap may
consist of alternate blocks that are free and in use, as in Figure 4.21.
Figure 4.21: Free and used blocks in a heap.
The situation shown in Figure 4.21 can occur if a program allocates five blocks and then
deallocates the second and fourth, for example. Fragmentation is of no consequence if blocks are of
fixed size, but if they are of variable size, a situation like Figure 4.21 is a problem, because we
could not allocate a block larger than any one of the free blocks, even though the space is available.
First fit, worst fit and best fit are some methods for allocating variable-sized blocks.
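The three placement policies named above differ only in which free block they select. The following C sketch shows the selection step alone, under the assumption that free blocks are described by a hypothetical FreeBlock array; splitting and coalescing of blocks are left out.

```c
#include <stddef.h>

/* A free block described by its start address and size (hypothetical layout). */
typedef struct { size_t start, size; } FreeBlock;

/* First fit: index of the first free block that is large enough, or -1. */
int first_fit(const FreeBlock *f, int n, size_t request) {
    for (int i = 0; i < n; i++)
        if (f[i].size >= request) return i;
    return -1;
}

/* Best fit: index of the smallest free block that is large enough, or -1. */
int best_fit(const FreeBlock *f, int n, size_t request) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (f[i].size >= request && (best < 0 || f[i].size < f[best].size))
            best = i;
    return best;
}

/* Worst fit: index of the largest free block that is large enough, or -1. */
int worst_fit(const FreeBlock *f, int n, size_t request) {
    int worst = -1;
    for (int i = 0; i < n; i++)
        if (f[i].size >= request && (worst < 0 || f[i].size > f[worst].size))
            worst = i;
    return worst;
}
```

With free blocks of sizes 20, 50 and 30, a request for 60 units fails under every policy even though 100 units are free in total, which is the fragmentation problem described above.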
Implicit Deallocation
Implicit deallocation requires cooperation between the user program and the run-time
package, because the latter needs to know when a storage block is no longer in use. This
cooperation is implemented by fixing the format of storage blocks; the format of a storage block is
as shown in Figure 4.22.
Figure 4.22: The format of a block.
Reference counts: We keep track of the number of blocks that point directly to the present
block. If this count ever drops to 0, then the block can be deallocated because it cannot be referred
to. In other words, the block has become garbage that can be collected. Maintaining reference
counts can be costly in time.
Marking techniques: An alternative approach is to temporarily suspend execution of the
user program and use the frozen pointers to determine which blocks are in use.
4.15 STORAGE ALLOCATION IN FORTRAN
FORTRAN was designed to permit static storage allocation. However, there are some
issues, such as the treatment of COMMON and EQUIVALENCE declarations, that are fairly
special to Fortran.
A Fortran compiler can create a number of data areas, i.e., blocks of storage in which the
values of objects can be stored.
There is one data area for each procedure and one data area for each named COMMON
block and for blank COMMON, if used.
The symbol table must record for each name the data area in which it belongs and its offset
in that data area, that is, its position relative to the beginning of the area.
The compiler must eventually decide where the data areas go relative to the executable code
and to one another, but this choice is arbitrary, since the data areas are independent.
DATA in COMMON Areas
A record is created for each block, holding the first and last names of the current procedure that
are declared to be in that COMMON block.
A declaration has the form: COMMON /BLOCK1/ NAME1, NAME2
The compiler must do the following:
1. In the table for COMMON block names, create a record for BLOCK1, if one does not
already exist.
2. In the symbol table entries for NAME1 and NAME2, set a pointer to the symbol table entry
for BLOCK1, indicating that these are in COMMON and members of BLOCK1.
3. a) If the record has just now been created for BLOCK1, set a pointer in that record to the
symbol table entry for NAME1, indicating the first name in this COMMON block. Then,
link the symbol table entry for NAME1 to that for NAME2, using a field of the symbol
table reserved for linking members of the same COMMON block. Finally, set a pointer in
the record for BLOCK1 to the symbol table entry for NAME2, indicating the last found
member of that block.
b) If, however, this is not the first declaration of BLOCK1, simply link NAME1 and
NAME2 to the end of the list of names for BLOCK1. The pointer to the end of the
list for BLOCK1, appearing in the record for BLOCK1, is then updated.
After a procedure has been processed, we call the equivalence algorithm. A bit in the symbol table
entry for a name XYZ is set to indicate that XYZ has been equivalenced to something. We then create a
memory map for each COMMON block by scanning the list of names for that block.
EQUIVALENCE Statements
The first algorithms for processing equivalence statements appeared in assemblers rather
than compilers. Since these algorithms can be a bit complex, especially when interactions between
COMMON and EQUIVALENCE statements are considered, let us treat first a situation typical of
an assembly language, where the only EQUIVALENCE statements are of the form
EQUIVALENCE A, B+offset
where A and B are the names of locations. This statement makes A denote the location that is offset
memory units beyond the location for B.
A sequence of EQUIVALENCE statements groups names into equivalence sets whose
positions relative to one another are all defined by the EQUIVALENCE statements, for example:
EQUIVALENCE A, B+100
EQUIVALENCE C, D-40
UNIT V CODE OPTIMIZATION AND CODE GENERATION
5.1 PRINCIPAL SOURCES OF OPTIMIZATION
A compiler optimization must preserve the semantics of the original program.
Except in very special circumstances, once a programmer chooses and implements a
particular algorithm, the compiler cannot understand enough about the program to
replace it with a substantially different and more efficient algorithm.
A compiler knows only how to apply relatively low-level semantic transformations,
using general facts such as algebraic identities like i + 0 = i.
5.1.1 Causes of Redundancy
There are many redundant operations in a typical program. Sometimes the redundancy is
available at the source level.
For instance, a programmer may find it more direct and convenient to recalculate some
result, leaving it to the compiler to recognize that only one such calculation is necessary.
But more often, the redundancy is a side effect of having written the program in a high-level
language.
As a program is compiled, each high-level data-structure access expands into a
number of low-level pointer arithmetic operations, such as the computation of the
location of the (i, j)th element of a matrix A.
Accesses to the same data structure often share many common low-level operations.
Programmers are not aware of these low-level operations and cannot eliminate the
redundancies themselves.
5.1.2 A Running Example: Quicksort
Consider a fragment of a sorting program called quicksort to illustrate several important
code-improving transformations. The C program for quicksort is given below.
void quicksort(int m, int n)
/* recursively sorts a[m] through a[n]; a is a global array */
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m - 1; j = n; v = a[n];
    while (1) {
        do i = i + 1; while (a[i] < v);
        do j = j - 1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x; /* swap a[i], a[j] */
    }
    x = a[i]; a[i] = a[n]; a[n] = x; /* swap a[i], a[n] */
    /* fragment ends here */
    quicksort(m, j); quicksort(i + 1, n);
}
Figure 5.1: C code for quicksort
Intermediate code for the marked fragment of the program in Figure 5.1 is shown in Figure
5.2. In this example we assume that integers occupy four bytes. The assignment x = a[i] is
translated into the two three-address statements t6 = 4*i and x = a[t6], as shown in steps (14) and (15)
of Figure 5.2. Similarly, a[j] = x becomes t10 = 4*j and a[t10] = x in steps (20) and (21).
Figure 5.2: Three-address code for the fragment in Figure 5.1
Figure 5.3: Flow graph for the quicksort fragment of Figure 5.1
Figure 5.3 is the flow graph for the program in Figure 5.2. Block B1 is the entry node. All
conditional and unconditional jumps to statements in Figure 5.2 have been replaced in Figure 5.3
by jumps to the block of which the statements are leaders. In Figure 5.3, there are three loops.
Blocks B2 and B3 are loops by themselves. Blocks B2, B3, B4, and B5 together form a loop, with B2
the only entry point.
5.1.3 Semantics-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing
the function it computes. Common-subexpression elimination, copy propagation, dead-code
elimination, and constant folding are common examples of such function-preserving (or semantics-
preserving) transformations.
Figure 5.4: Local common-subexpression elimination: (a) before, (b) after
Some of these duplicate calculations cannot be avoided by the programmer because they lie
below the level of detail accessible within the source language. For example, block B5 shown in
Figure 5.4(a) recalculates 4*i and 4*j, although none of these calculations were requested
explicitly by the programmer.
5.1.4 Global Common Subexpressions
An occurrence of an expression E is called a common subexpression if E was previously
computed and the values of the variables in E have not changed since the previous computation.
We avoid recomputing E if we can use its previously computed value; that is, the variable x to
which the previous computation of E was assigned has not changed in the interim.
The assignments to t7 and t10 in Figure 5.4(a) compute the common subexpressions 4*i
and 4*j, respectively. These steps have been eliminated in Figure 5.4(b), which uses t6 instead of
t7 and t8 instead of t10.
Figure 5.5 shows the result of eliminating both global and local common subexpressions
from blocks B5 and B6 in the flow graph of Figure 5.3. We first discuss the transformation of B5
and then mention some subtleties involving arrays.
After local common subexpressions are eliminated, B5 still evaluates 4*i and 4*j, as shown
in Figure 5.4(b). Both are common subexpressions; in particular, the three statements
t8 = 4*j
t9 = a[t8]
a[t8] = x
in B5 can be replaced by
t9 = a[t4]
a[t4] = x
using t4 computed in block B3. In Figure 5.5, observe that as control passes from the evaluation of
4*j in B3 to B5, there is no change to j and no change to t4, so t4 can be used if 4*j is needed.
Another common subexpression comes to light in B5 after t4 replaces t8. The new
expression a[t4] corresponds to the value of a[j] at the source level. Not only does j retain its value
as control leaves B3 and then enters B5, but a[j], a value computed into a temporary t5, does too,
because there are no assignments to elements of the array a in the interim. The statements
t9 = a[t4]
a[t6] = t9
in B5 therefore can be replaced by
a[t6] = t5
Analogously, the value assigned to x in block B5 of Figure 5.4(b) is seen to be the same as
the value assigned to t3 in block B2. Block B5 in Figure 5.5 is the result of eliminating common
subexpressions corresponding to the values of the source-level expressions a[i] and a[j] from B5 in
Figure 5.4(b). A similar series of transformations has been done to B6 in Figure 5.5.
The expression a[t1] in blocks B1 and B6 of Figure 5.5 is not considered a common
subexpression, although t1 can be used in both places. After control leaves B1 and before it reaches
B6, it can go through B5, where there are assignments to a. Hence, a[t1] may not have the same
value on reaching B6 as it did on leaving B1, and it is not safe to treat a[t1] as a common
subexpression.
Figure 5.5: B5 and B6 after common-subexpression elimination
5.1.5 Copy Propagation
Block B5 in Figure 5.5 can be further improved by eliminating x, using two new
transformations. One concerns assignments of the form u = v, called copy statements, or copies for
short. Copies would have arisen much sooner, because the normal algorithm for eliminating
common subexpressions introduces them, as do several other algorithms.
Figure 5.6: Copies introduced during common-subexpression elimination: (a) before, (b) after
In order to eliminate the common subexpression from the statement c = d+e in Figure
5.6(a), we must use a new variable t to hold the value of d+e. The value of variable t, instead of
that of the expression d+e, is assigned to c in Figure 5.6(b). Since control may reach c = d+e
either after the assignment to a or after the assignment to b, it would be incorrect to replace c = d+e
by either c = a or by c = b.
The idea behind the copy-propagation transformation is to use v for u, wherever possible
after the copy statement u = v. For example, the assignment x = t3 in block B5 of Figure 5.5 is a
copy. Copy propagation applied to B5 yields the code in Figure 5.7. This change may not appear to
be an improvement, but it gives us the opportunity to eliminate the assignment to x.
Figure 5.7: Basic block B5 after copy propagation
5.1.6 Dead-Code Elimination
A variable is live at a point in a program if its value can be used subsequently; otherwise, it
is dead at that point. A related idea is dead (or useless) code: statements that compute values that
never get used. While the programmer is unlikely to introduce any dead code intentionally, it may
appear as the result of previous transformations.
Deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement into dead code.
For example, copy propagation followed by dead-code elimination removes the assignment to x
and transforms the code in Figure 5.7 into
a[t2] = t5
a[t4] = t3
goto B2
This code is a further improvement of block B5 in Figure 5.5.
5.1.7 Code Motion
Loops are a very important place for optimizations, especially the inner loops where
programs tend to spend the bulk of their time. The running time of a program may be improved if
we decrease the number of instructions in an inner loop, even if we increase the amount of code
outside that loop.
An important modification that decreases the amount of code in a loop is code motion. This
transformation takes an expression that yields the same result independent of the number of times a
loop is executed (a loop-invariant computation) and evaluates the expression before the loop.
Evaluation of limit-2 is a loop-invariant computation in the following while statement:
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent code
t = limit-2
while (i <= t) /* statement does not change limit or t */
Now, the computation of limit-2 is performed once, before we enter the loop. Previously, there
would be n+1 calculations of limit-2 if we iterated the body of the loop n times.
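The effect of code motion can be checked on a small, self-contained C sketch. The loop body (incrementing i and counting iterations) and the function names count_before and count_after are assumptions added only to make the fragment runnable; an optimizing compiler would normally perform this transformation itself.

```c
/* Before code motion: limit - 2 is re-evaluated at every test of the loop. */
int count_before(int i, int limit) {
    int iterations = 0;
    while (i <= limit - 2) {   /* body does not change limit */
        i = i + 1;
        iterations++;
    }
    return iterations;
}

/* After code motion: the loop-invariant value is computed once into t. */
int count_after(int i, int limit) {
    int iterations = 0;
    int t = limit - 2;         /* hoisted out of the loop */
    while (i <= t) {           /* body changes neither limit nor t */
        i = i + 1;
        iterations++;
    }
    return iterations;
}
```

Both functions return the same result for every input; only the number of subtractions performed differs.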
5.1.8 Induction Variables and Reduction in Strength
Another important optimization is to find induction variables in loops and optimize their
computation. A variable x is said to be an "induction variable" if there is a positive or negative
constant c such that each time x is assigned, its value increases by c. For instance, i and t2 are
induction variables in the loop containing B2 of Figure 5.5. Induction variables can be computed
with a single increment (addition or subtraction) per loop iteration. The transformation of replacing
an expensive operation, such as multiplication, by a cheaper one,
such as addition, is known as strength reduction. But induction variables not only allow us
sometimes to perform a strength reduction; often it is possible to eliminate all but one of a group of
induction variables whose values remain in lock-step as we go around the loop.
Figure 5.8: Strength reduction applied to 4*j in block B3
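As a rough illustration of strength reduction on the pair j and t4, the following C sketch compares a loop that recomputes t4 = 4*j with one that keeps t4 in lock-step by subtracting 4. The function names and the steps parameter are hypothetical; they only stand in for repeated executions of block B3.

```c
/* Before: t4 is recomputed with a multiplication each time j changes. */
int t4_before(int j, int steps) {
    int t4 = 4 * j;
    for (int k = 0; k < steps; k++) {
        j = j - 1;
        t4 = 4 * j;
    }
    return t4;
}

/* After strength reduction: t4 is initialized once and then updated by
   a subtraction, so the relationship t4 == 4*j is maintained. */
int t4_after(int j, int steps) {
    int t4 = 4 * j;            /* computed once, before the loop */
    for (int k = 0; k < steps; k++) {
        j = j - 1;
        t4 = t4 - 4;           /* replaces t4 = 4 * j */
    }
    return t4;
}
```

The initialization of t4 before the loop is what allows the multiplication inside the loop to be dropped.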
When processing loops, it is useful to work "inside-out"; that is, we shall start with the
inner loops and proceed to progressively larger, surrounding loops. Thus, we shall see how this
optimization applies to our quicksort example by beginning with one of the innermost loops: B3 by
itself. Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1,
the value of t4 decreases by 4, because 4*j is assigned to t4. These variables, j and t4, thus form a
good example of a pair of induction variables.
When there are two or more induction variables in a loop, it may be possible to get rid of all
but one. For the inner loop of B3 in Figure 5.5, we cannot get rid of either j or t4 completely; t4 is
used in B3 and j is used in B4. However, we can illustrate reduction in strength and a part of the
process of induction-variable elimination. Eventually, j will be eliminated when the outer loop
consisting of blocks B2, B3, B4 and B5 is considered.
Figure 5.9: Flow graph after induction-variable elimination
After reduction in strength is applied to the inner loops around B2 and B3, the only use of i and j is
to determine the outcome of the test in block B4. We know that the values of i and t2 satisfy the
relationship t2 = 4*i, while those of j and t4 satisfy the relationship t4 = 4*j. Thus, the test t2 >=
t4 can substitute for i >= j. Once this replacement is made, i in block B2 and j in block B3 become
dead variables, and the assignments to them in these blocks become dead code that can be
eliminated. The resulting flow graph is shown in Figure 5.9.
Note:
1. Code motion, induction-variable elimination and strength reduction are loop optimization
techniques.
2. Common-subexpression elimination, copy propagation, dead-code elimination and constant
folding are function-preserving transformations.
5.2 DIRECTED ACYCLIC GRAPHS (DAG)
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands
and interior nodes corresponding to operators. The difference is that a node N in a DAG has more
than one parent if N represents a common subexpression; in a syntax tree, the tree for the common
subexpression would be replicated as many times as the subexpression appears in the original
expression. Thus, a DAG not only represents expressions more succinctly, it gives the compiler
important clues regarding the generation of efficient code to evaluate the expressions.
Example: Consider the DAG for the expression a + a * (b - c) + (b - c) * d, shown in Figure 5.10.
The leaf for a has two parents, because a appears twice in the expression. More
interestingly, the two occurrences of the common subexpression b - c are represented by one node,
the node labeled -. That node has two parents, representing its two uses in the subexpressions
a * (b - c) and (b - c) * d. Even though b and c appear twice in the complete expression, their nodes each
have one parent, since both uses are in the common subexpression b - c.
Figure 5.10: DAG for the expression a + a * (b - c) + (b - c) * d
Table 5.1: Syntax-directed definition to produce syntax trees or DAGs
S.No.   PRODUCTION      SEMANTIC RULES
1)      E -> E1 + T     E.node = new Node('+', E1.node, T.node)
2)      E -> E1 - T     E.node = new Node('-', E1.node, T.node)
3)      E -> T          E.node = T.node
4)      T -> ( E )      T.node = E.node
5)      T -> id         T.node = new Leaf(id, id.entry)
6)      T -> num        T.node = new Leaf(num, num.val)
The syntax-directed definition (SDD) of Table 5.1 can construct either syntax trees or
DAGs. It constructs syntax trees when the functions Leaf and Node
create a fresh node each time they are called. It will construct a DAG if, before creating a new
node, these functions first check whether an identical node already exists. If a previously created
identical node exists, the existing node is returned. For instance, before constructing a new node,
Node(op, left, right), we check whether there is already a node with label op, and children left and
right, in that order. If so, Node returns the existing node; otherwise, it creates a new node.
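The check-before-create behaviour of Leaf and Node can be sketched in C as follows. This is a simplified illustration, not the method of any particular compiler: identifiers are assumed to be single letters, nodes live in a small fixed pool, and an existing node is found by linear search where a real implementation would more likely use hashing. The helper names find_or_make and dag_size are hypothetical.

```c
#include <stddef.h>

#define MAX_NODES 64   /* assumed capacity for this sketch */

typedef struct DagNode {
    char op;                       /* operator, or 'i' for an identifier leaf */
    char id;                       /* identifier letter for leaves, 0 otherwise */
    struct DagNode *left, *right;  /* children; NULL for leaves */
} DagNode;

static DagNode pool[MAX_NODES];
static int used = 0;

/* Return an existing identical node if there is one; otherwise create it. */
static DagNode *find_or_make(char op, char id, DagNode *l, DagNode *r) {
    for (int i = 0; i < used; i++)
        if (pool[i].op == op && pool[i].id == id &&
            pool[i].left == l && pool[i].right == r)
            return &pool[i];
    if (used == MAX_NODES) return NULL;   /* pool exhausted in this sketch */
    pool[used] = (DagNode){ op, id, l, r };
    return &pool[used++];
}

DagNode *Leaf(char id) { return find_or_make('i', id, NULL, NULL); }

DagNode *Node(char op, DagNode *l, DagNode *r) {
    return find_or_make(op, 0, l, r);
}

/* Number of distinct nodes created so far. */
int dag_size(void) { return used; }
```

Building a + a * (b - c) + (b - c) * d with these functions creates nine distinct nodes, because the second requests for a, b, c and b - c return nodes that already exist.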
1) p1 = Leaf(id, entry-a)
2) p2 = Leaf(id, entry-a) = p1
3) p3 = Leaf(id, entry-b)
4) p4 = Leaf(id, entry-c)
5) p5 = Node('-', p3, p4)
6) p6 = Node('*', p1, p5)
7) p7 = Node('+', p1, p6)
8) p8 = Leaf(id, entry-b) = p3
9) p9 = Leaf(id, entry-c) = p4
10) p10 = Node('-', p3, p4) = p5
11) p11 = Leaf(id, entry-d)
12) p12 = Node('*', p5, p11)
13) p13 = Node('+', p7, p12)
Example: Construct the DAG for the following basic block.
(1) t1 = 4 * i
    t2 = a[t1]
    t3 = 4 * i
    t4 = b[t3]
    t5 = t2 * t4
    t6 = prod + t5
    t7 = i + 1
    i = t7
    if i <= 20 goto (1)
Figure 5.11: Step-by-step construction of the DAG (panels: Statement 1; Statement 2; Statement 3; Statement 4; Statement 5; Statements 6, 7; Statements 8, 9; Final DAG)
5.3.2 Finding Local Common Subexpressions
Common subexpressions can be detected by noticing, as a new node M is about to be added,
whether there is an existing node N with the same children, in the same order, and with the same
operator. If so, N computes the same value as M and may be used in its place.
Example 5.10: A DAG for the block
a = b + c
b = a - d
c = b + c
d = a - d
is shown in Figure 5.11. When we construct the node for the third statement c = b + c, we
know that the use of b in b + c refers to the node of Figure 5.11 labeled -, because that is the most
recent definition of b. Thus, we do not confuse the values computed at statements one and three.
Figure 5.11: DAG for basic block
However, the node corresponding to the fourth statement d = a - d has the operator - and the
nodes with attached variables a and d0 as children. Since the operator and the children are the same
as those for the node corresponding to statement two, we do not create this node, but add d to the
list of definitions for the node labeled -.
If b is not live on exit from the block, then we do not need to compute that variable, and can
use d to receive the value represented by the node labeled -. The block then becomes
a = b + c
d = a - d
c = d + c
However, if both b and d are live on exit, then a fourth statement must be used to copy the
value from one to the other.
Example 5.11: When we look for common subexpressions, we really are looking for expressions
that are guaranteed to compute the same value, no matter how that value is computed. Thus, the
DAG method will miss the fact that the expression computed by the first and fourth statements in
the sequence
a = b + c
b = b - d
c = c + d
e = b + c
is the same, namely b0 + c0. That is, even though b and c both change between the first and
last statements, their sum remains the same, because b + c = (b - d) + (c + d). The DAG for this
sequence is shown in Figure 5.12, but does not exhibit any common subexpressions.
Figure 5.12: DAG for basic block
5.3.3 Dead-Code Elimination
The operation on DAGs that corresponds to dead-code elimination can be implemented as
follows. We delete from a DAG any root (node with no ancestors) that has no live variables
attached. Repeated application of this transformation will remove all nodes from the DAG that
correspond to dead code.
Example 5.12: If, in Figure 5.12, a and b are live but c and e are not, we can immediately remove the
root labeled e. Then, the node labeled c becomes a root and can be removed. The roots labeled a
and b remain, since they each have live variables attached.
Figure 5.13: DAG after dead-code elimination
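5.3.4 The Use of Algebraic Identities
Algebraic identities represent another important class of optimizations on basic blocks. For
example, we may apply arithmetic identities, such as
x + 0 = 0 + x = x        x - 0 = x
x * 1 = 1 * x = x        x / 1 = x
to eliminate computations from a basic block.
Another class of algebraic optimizations includes local reduction in strength,
that is, replacing a more expensive operator by a cheaper one, as in:
EXPENSIVE        CHEAPER
x^2        =     x * x
2 * x      =     x + x
x / 2      =     x * 0.5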
5.3.5 Representation of Array References
Consider, for instance, the sequence of three-address statements:
x = a[i]
a[j] = y
z = a[i]
It might appear that this code can be "optimized" by replacing the third instruction z = a[i]
by the simpler z = x. However, this replacement is not safe: if j equals i, the second statement
changes the value of a[i].
The proper way to represent array accesses in a DAG is as follows.
1. An assignment from an array, like x = a[i], is represented by creating a node with operator
=[] and two children representing the initial value of the array, a0 in this case, and the index
i. Variable x becomes a label of this new node.
2. An assignment to an array, like a[j] = y, is represented by a new node with operator []= and
three children representing a0, j and y. There is no variable labeling this node. What is
different is that the creation of this node kills all currently constructed nodes whose value
depends on a0. A node that has been killed cannot receive any more labels; that is, it cannot
become a common subexpression.
Example 5.11: The DAG for the basic block
x = a[i]
a[j] = y
z = a[i]
is shown in Figure 5.12. The node N for x is created first, but when the node labeled []= is created, N is killed.
Thus, when the node for z is created, it cannot be identified with N, and a new node with the same
operands a0 and i0 must be created instead.
Figure 5.12: The DAG for a sequence of array assignments
Example 5.12: Sometimes, a node must be killed even though none of its children have an array
like a0 in Example 5.11 as attached variable. Likewise, a node can kill if it has a descendant that is
an array, even though none of its children are array nodes. For instance, consider the three-address
code
b = 12 + a
x = b[i]
b[j] = y
What is happening here is that, for efficiency reasons, b has been defined to be a position in
an array a. For example, if the elements of a are four bytes long, then b represents the fourth
element of a. If j and i represent the same value, then b[i] and b[j] represent the same location.
Therefore it is important to have the third instruction, b[j] = y, kill the node with x as its attached
variable.
Figure 5.13: A node that kills a use of an array need not have that array as a child
However, as we see in Figure 5.13, both the killed node and the node that does the killing have
a0 as a grandchild, not as a child.
5.3.6 Pointer Assignments and Procedure Calls
When we assign indirectly through a pointer, as in the assignments
x = *p
*q = y
we do not know what p or q point to. In effect, x = *p is a use of every variable whatsoever, and *q
= y is a possible assignment to every variable. As a consequence, the operator =* must take all
nodes that are currently associated
with identifiers as arguments, which is relevant for dead-code elimination. More importantly, the
*= operator kills all other nodes so far constructed in the DAG.
There are global pointer analyses one could perform that might limit the set of variables a pointer could reference at a
given place in the code. Even local analysis could restrict the scope of a pointer. For instance, in the sequence
p = &x
*p = y
we know that x, and no other variable, is given the value of y, so we don't need to kill any node but
the node to which x was attached.
Procedure calls behave much like assignments through pointers. In the absence of global data-flow
information, we must assume that a procedure uses and changes any data to which it has access.
Thus, if variable x is in the scope
of a procedure P, a call to P both uses the node with attached variable x and kills that node.
5.3.7 Reassembling Basic Blocks from DAGs
After we perform whatever optimizations are possible while constructing the DAG or by
manipulating the DAG once constructed, we may reconstitute the three-address code for the basic
block from which we built the DAG. For each node that has one or more attached variables, we
construct a three-address statement that computes the value of one of those variables. We prefer to
compute the result into a variable that is live on exit from the block. However, if we do not have
global live-variable information to work from, we need to assume that every variable of the
program (but not temporaries that are generated by the compiler to process expressions) is live on
exit from the block.
If the node has more than one live variable attached, then we have to introduce copy
statements to give the correct value to each of those variables. Sometimes, global optimization can
eliminate those copies, if we can arrange to use one of two variables in place of the other.
Example 8.15: Consider again the block of Example 5.10 (a = b + c; b = a - d; c = b + c; d = a - d).
If b is not live on exit from the block, then the three
statements
a = b + c
d = a - d
c = d + c
suffice to reconstruct the basic block. The third instruction, c = d + c, must use d as an operand
rather than b, because the optimized block never computes b.
If both b and d are live on exit, or if we are not sure whether or not they are live on exit,
then we need to compute b as well as d. We can do so with the sequence
a = b + c
d = a - d
b = d
c = d + c
This basic block is still more efficient than the original. Although the number of instructions is the
same, we have replaced a subtraction by a copy, which tends to be less expensive on most
machines. Further, it may be that by doing a global analysis, we can eliminate the use of this
computation of b outside the block by replacing it by uses of d. In that case, we can come back to
this basic block and eliminate b = d later. Intuitively, we can eliminate this copy if, wherever this
value of b is used, d is still holding the same value. That situation may or may not be true,
depending on how the program recomputes d.
5.4 GLOBAL DATA FLOW ANALYSIS
Global data-flow analysis collects information about the entire program and distributes
this information to each block in the flow graph. Data-flow information can be collected by setting
up and solving systems of equations that relate information at various
points in a program. These equations are termed data-flow equations. A typical data-flow equation has the form
out[S] = gen[S] ∪ (in[S] - kill[S])
where, for a statement or basic block S,
gen[S] = definitions within S that reach the end of S.
in[S] = definitions that reach the entry of S.
kill[S] = definitions that never reach the end of S because the variables they define are redefined in S.
out[S] = definitions that reach the exit of S.
Paths and points
A definition point is a point in a program at which a definition is carried out for a variable.
A reference point is a point in a program at which a reference to a data item is made.
An evaluation point is a point in a program at which an expression is evaluated completely.
x = 3          Definition point for variable x
y = x + 5      Reference point for variable x
z = x + y      Evaluation point for variable z
The number of points in a basic block is calculated as follows:
A point between each pair of adjacent statements in the block.
A point before the first statement of the block.
A point after the last statement of the block.
Example 8.16: Find the number of points in the basic block
a = b + c
b = e + u
c = 8 * b
Number of points between adjacent statements in the block = 2
Number of points before the first statement of the block = 1
Number of points after the last statement of the block = 1
Total number of points = 2 + 1 + 1 = 4 points
A path from p1 to pn is a sequence of points p1, p2, ..., pn such that for each i between 1 and
n-1, either
1. pi is the point immediately preceding a statement and pi+1 is the point immediately
following that statement in the same block, or
2. pi is the end of some block and pi+1 is the beginning of a successor block.
Reaching Definitions
Definition of a variable x: A definition d of a variable x is a statement that assigns a
value to x. Other kinds of statements that may define a value for x, such as a procedure call
or an assignment through a pointer, are called ambiguous definitions.
Use of a variable x: The use of variable x means that the value of x is referenced in an expression
evaluation.
Reachability: A definition d of a variable x reaches a point p if there is a path from the point
immediately following d to p, such that d is not "killed" along that path.
Killing a definition: A definition d of a variable x is killed when there is a redefinition of the
variable x.
Live variable: A variable x is live at some point p if there is a path from p to the exit, along
which the value of x is used before it is redefined. Otherwise the variable is said to be dead
at that point.
Example 8.17: Find the reachability of variable x.
Variable x is used in B4.
Variable x is reachable to B4 via B3 (it is not killed in B3).
Figure 5.15: Reaching definitions (flow graph with blocks B1 to B7)
Data-flow analysis of structured programs
Flow graphs for control-flow constructs such as if-else and do-while statements have a
useful property: there is a single beginning point at which control enters and a single end point that
control leaves from when execution of the statement is over.
Figure 5.16: Structured control constructs (panels: S1 ; S2; if E then S1 else S2; do S1 while E)
Conservative Estimation of Data-Flow Information
Optimizations applied to the code must be safe, i.e., the data-flow facts computed should
definitely be true.
Two main reasons cause the results of analysis to be conservative:
1. Control flow: The data-flow equations are generated on the assumption
that all paths are executable, but in practice only one path of an if-then-else
is taken on any given execution.
2. Pointers and aliasing: The value of a pointer may not be known in advance, at
compile time.
We consider the definitions reaching the beginning and end of statements with the following
syntax:
S -> id = E | S ; S | if E then S else S | do S while E
E -> id + id | id
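(a) Single assignment S: d: a = b + c, where $D_a$ is the set of all definitions of a in the program:
$\mathit{gen}[S] = \{d\}$
$\mathit{kill}[S] = D_a - \{d\}$
$\mathit{out}[S] = \mathit{gen}[S] \cup (\mathit{in}[S] - \mathit{kill}[S])$
(b) Sequence S: S1 ; S2
$\mathit{gen}[S] = \mathit{gen}[S_2] \cup (\mathit{gen}[S_1] - \mathit{kill}[S_2])$
$\mathit{kill}[S] = \mathit{kill}[S_2] \cup (\mathit{kill}[S_1] - \mathit{gen}[S_2])$
$\mathit{in}[S_1] = \mathit{in}[S]$
$\mathit{in}[S_2] = \mathit{out}[S_1]$
$\mathit{out}[S] = \mathit{out}[S_2]$
(c) Conditional S: if E then S1 else S2
$\mathit{gen}[S] = \mathit{gen}[S_1] \cup \mathit{gen}[S_2]$
$\mathit{kill}[S] = \mathit{kill}[S_1] \cap \mathit{kill}[S_2]$
$\mathit{in}[S_1] = \mathit{in}[S]$
$\mathit{in}[S_2] = \mathit{in}[S]$
$\mathit{out}[S] = \mathit{out}[S_1] \cup \mathit{out}[S_2]$
(d) Loop S: do S1 while E
$\mathit{gen}[S] = \mathit{gen}[S_1]$
$\mathit{kill}[S] = \mathit{kill}[S_1]$
$\mathit{in}[S_1] = \mathit{in}[S] \cup \mathit{gen}[S_1]$
$\mathit{out}[S] = \mathit{out}[S_1]$
Figure 5.17: Data-flow equations for reaching definitions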
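As a rough illustration, the gen and kill rules for the four statement forms can be evaluated bottom-up over a statement tree. The sketch below is in Python. The class names (Assign, Seq, If, DoWhile), the function gen_kill and the sample program, including the definitions d1 to d4 that precede d5 to d7, are assumptions made for this illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assign:          # d: var = ...
    d: str
    var: str

@dataclass(frozen=True)
class Seq:             # S1 ; S2
    s1: object
    s2: object

@dataclass(frozen=True)
class If:              # if E then S1 else S2
    s1: object
    s2: object

@dataclass(frozen=True)
class DoWhile:         # do S1 while E
    s1: object

def gen_kill(s, defs_of):
    """Return (gen, kill) for statement s.

    defs_of maps each variable to the set of all its definitions (D_a in the text).
    """
    if isinstance(s, Assign):
        return {s.d}, defs_of[s.var] - {s.d}
    if isinstance(s, Seq):
        g1, k1 = gen_kill(s.s1, defs_of)
        g2, k2 = gen_kill(s.s2, defs_of)
        return g2 | (g1 - k2), k2 | (k1 - g2)
    if isinstance(s, If):
        g1, k1 = gen_kill(s.s1, defs_of)
        g2, k2 = gen_kill(s.s2, defs_of)
        return g1 | g2, k1 & k2
    if isinstance(s, DoWhile):
        return gen_kill(s.s1, defs_of)
    raise TypeError(f"unknown statement: {s!r}")

def out_of(s, in_set, defs_of):
    """out[S] = gen[S] U (in[S] - kill[S])."""
    gen, kill = gen_kill(s, defs_of)
    return gen | (in_set - kill)

# Hypothetical program: d1: i=..; d2: j=..; d3: a=..;
# do { d4: i=..; d5: j=..; if e1 then d6: a=.. else d7: i=.. } while e2
defs_of = {"i": {"d1", "d4", "d7"}, "j": {"d2", "d5"}, "a": {"d3", "d6"}}
if_stmt = If(Assign("d6", "a"), Assign("d7", "i"))
loop = DoWhile(Seq(Assign("d4", "i"), Seq(Assign("d5", "j"), if_stmt)))

if_gen, if_kill = gen_kill(if_stmt, defs_of)
loop_gen, loop_kill = gen_kill(loop, defs_of)
```

For the conditional, the kill sets of the two branches have no definition in common, so the intersection rule leaves kill empty while gen collects both d6 and d7.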
Representation of sets
The sets of definitions gen[S] and kill[S] can be represented by bit vectors.
A bit vector has a 1 in position i if the definition numbered i is present in the set.
The definition number can be taken as the index of the statement.
The bit-vector representation allows set operations to be implemented efficiently.
Consider the code
j = j - 1      /* d5 */
if e1 then
    a = u2     /* d6 */
else
    i = u3     /* d7 */
For the if statement, gen = {d6, d7} = 0000011 and kill = {} = 0000000.
For the statement d7, gen = 0000001 and kill = 1001000.
Figure 5.17: Set representation and bit-vector representation for gen[] and kill[].
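A minimal sketch of the bit-vector idea in Python follows, using plain integers as bit vectors. The width of seven definitions matches the 7-bit vectors in the figure; the helper names bitvec, show and transfer are hypothetical.

```python
N = 7  # number of definitions d1..d7, matching the 7-bit vectors in the figure

def bitvec(defs):
    """Encode a set of definition numbers (1-based); d1 is the leftmost bit."""
    v = 0
    for i in defs:
        v |= 1 << (N - i)
    return v

def show(v):
    """Render a bit vector as an N-character string of 0s and 1s."""
    return format(v, f"0{N}b")

def transfer(gen, kill, in_):
    """out = gen U (in - kill), computed with bitwise operators."""
    mask = (1 << N) - 1
    return gen | (in_ & ~kill & mask)

gen_if = bitvec({6, 7})      # gen of the if statement
kill_if = bitvec(set())      # kill of the if statement
kill_d7 = bitvec({1, 4})     # assumes d1 and d4 are the other definitions of i

out_if = transfer(gen_if, kill_if, bitvec({1, 2, 3}))
```

Union becomes bitwise or, intersection becomes bitwise and, and set difference becomes and-not, so each data-flow equation costs a few machine operations per word of the vector.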
5.5 EFFICIENT DATA FLOW ALGORITHMS
The speed of data-flow analysis can be increased by the following two techniques:
1. Depth-First Ordering in Iterative Algorithms
2. Structure-based Data Flow Analysis
The first is an application of depth-first ordering to reduce the number of passes that the
iterative algorithm takes, and the second uses intervals or the T1 and T2 transformations to
generalize the syntax-directed approach.
Depth-First Ordering in Iterative Algorithms
In reaching definitions, available expressions, or live variables, any event of significance at a
node will be propagated to that node along an acyclic path.
Iterative algorithms can take advantage of this acyclic nature.
If a definition d is in in[B], then there is some acyclic path from the block containing d to B
such that d is in the in's and out's all along that path.
If an expression x + y is not available at the entrance to block B, then there is some acyclic
path that demonstrates that fact; either the path is from the initial node and includes no
statement that kills or generates x + y, or the path is from a block that kills x + y and along the
path there is no subsequent generation of x + y.
For live variables, if x is live on exit from block B, then there is an acyclic path from B to a
use of x, along which there are no definitions of x.
If a use of x is reached from the end of block B along a path with a cycle, we can eliminate
that cycle to find a shorter path along which the use of x is still reached from B.
Procedure
1. First visit the root node of the tree, e.g. node 1.
2. From the current node, move to a child that has not been visited yet, taking the right-hand
child first, e.g. from node 1 to node 3.
3. After reaching the maximum depth, go back through the parent nodes to visit the nodes that
were missed.
[Figure 5.18: Depth-first traversal for the given tree.]
The order of visiting the nodes in the above tree is:
1 → 3 → 4 → 6 → 7 → 8 → 10 → 8 → 9 → 8 → 7 → 6 → 4 → 5 → 4 → 3 → 1 → 2 → 1
Steps:
After node 4 there is a choice between 5 and 6; we chose 6.
After visiting node 10, backtrack to 8 to visit 9.
The definition d from out[1] will reach in[3], out[3] will reach in[4], and so on.
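The effect of visiting blocks in depth-first order can be sketched in Python as below. The four-block flow graph, its gen and kill sets and all function names are hypothetical; the iterative scheme itself is the usual one, with out[B] initialised to gen[B] and the blocks revisited until nothing changes.

```python
def depth_first_order(succ, entry):
    """Return the blocks in depth-first order (reverse postorder) from entry."""
    visited, postorder = set(), []

    def dfs(n):
        visited.add(n)
        for s in succ.get(n, []):
            if s not in visited:
                dfs(s)
        postorder.append(n)

    dfs(entry)
    return postorder[::-1]

def reaching_definitions(succ, entry, gen, kill):
    """Iterate in[B] = U out[P] over predecessors P, out[B] = gen[B] U (in[B] - kill[B])."""
    order = depth_first_order(succ, entry)
    pred = {n: [] for n in order}
    for n in order:
        for s in succ.get(n, []):
            pred[s].append(n)
    in_ = {n: set() for n in order}
    out = {n: set(gen[n]) for n in order}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for n in order:                      # visit blocks in depth-first order
            in_[n] = set().union(*(out[p] for p in pred[n]))
            new_out = gen[n] | (in_[n] - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return in_, out, passes

# Hypothetical flow graph with one back edge B3 -> B2.
succ = {"B1": ["B2"], "B2": ["B3"], "B3": ["B2", "B4"], "B4": []}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}, "B4": set()}
kill = {"B1": {"d3"}, "B2": set(), "B3": {"d1"}, "B4": set()}

order = depth_first_order(succ, "B1")
in_, out, passes = reaching_definitions(succ, "B1", gen, kill)
```

On this small graph the sets stabilise during the first pass and the second pass only confirms that nothing changes.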
Structure-based Data Flow Analysis
We can implement data-flow algorithms that visit nodes no more times than the interval
depth of the flow graph. The ideas presented here apply to syntax-directed data-flow algorithms for
all sorts of structured control statements.
This algorithm focuses on regions that have multiple exits.
$\mathit{gen}_{R,B}$ denotes the definitions generated in the region R of the basic block B.
$\mathit{kill}_{R,B}$ denotes the definitions killed in the region R of the basic block B.
The transfer function $\mathit{trans}_{R,B}(S)$ of a definition set S is the set of definitions that reach the end
of block B by traveling along paths wholly within R.
The definitions reaching the end of block B fall into two classes.
1. Those that are generated within R and propagate to the end of B independent of S.
2. Those that are not generated in R, but that also are not killed along some path from
the header of R to the end of B, and therefore are in $\mathit{trans}_{R,B}(S)$ if and only if they
are in S.
Thus, we may write trans in the form:
$\mathit{trans}_{R,B}(S) = \mathit{gen}_{R,B} \cup (S - \mathit{kill}_{R,B})$
Case 1:
If the transformation does not alter any definition in the basic block B, then the transfer
function of region R is the same as the transfer function of block B.
$\mathit{gen}_{B,B} = \mathit{gen}[B]$
$\mathit{kill}_{B,B} = \mathit{kill}[B]$
Case 2:
The region R is formed when R1 consumes R2. There are no edges from R2 to R1. The header of
R is the header of R1. R2 does not affect the transfer function of R1.
$\mathit{gen}_{R,B} = \mathit{gen}_{R_1,B}$
$\mathit{kill}_{R,B} = \mathit{kill}_{R_1,B}$
for all B in R1.
[Figure 5.19: Region building by T2.]
For B in R2, a definition can reach the end of B if any of the following conditions hold:
1. The definition is generated within R2.
2. The definition is generated within R1, reaches the end of some predecessor of the header of
R2, and is not killed going from the header of R2 to B.
3. The definition is in the set S available at the header of R1, not killed going to some
predecessor of the header of R2, and not killed going from the header of R2 to B.
5.6 ISSUES IN DESIGN OF A CODE GENERATOR
The most important criterion for a code generator is that it produce correct code. The
following issues arise during the code generation phase.
1 Input to the Code Generator
2 The Target Program
3 Memory Management
4 Instruction Selection
5 Register Allocation
6 Evaluation Order
1 Input to the Code Generator
The input to the code generator is the intermediate representation (IR) of the source
program produced by the front end, along with information in the symbol table that is used to
determine the run-time addresses of the data objects denoted by the names in the IR.
The choices for the intermediate representation include the following:
Three-address representations such as quadruples, triples, indirect triples;
Virtual machine representations such as bytecodes and stack-machine code;
Linear representations such as postfix notation;
Graphical representations such as syntax trees and DAG's.
The front end has scanned, parsed, and translated the source program into a relatively low-
level IR, so that the values of the names appearing in the IR can be represented by quantities that
the target machine can directly manipulate, such as integers and floating-point numbers.
We assume that all syntactic and static semantic errors have been detected, that the necessary
type checking has taken place, and that type-conversion operators have been inserted wherever
necessary. The code generator can therefore proceed on the assumption that its input is error free.
2 The Target Program
The output of the code generator is the target program, which will run on one of the following
kinds of machine.
The instruction-set architecture of the target machine has a significant impact on the
difficulty of constructing a good code generator that produces high-quality machine code.
The most common target-machine architectures are RISC (reduced instruction set
computer), CISC (complex instruction set computer), and stack based.
A RISC machine typically has many registers, three-address instructions, simple addressing
modes, and a relatively simple instruction-set architecture. In contrast, a CISC machine
typically has few registers, two-address instructions, a variety of addressing modes, several
register classes, variable-length instructions, and instructions with side effects.
In a stack-based machine, operations are done by pushing operands onto a stack and then
performing the operations on the operands at the top of the stack. To achieve high
performance, the top of the stack is kept in registers.
The JVM is a software interpreter for Java bytecodes, an intermediate language produced
by Java compilers. The interpreter provides software compatibility across multiple
platforms, a major factor in the success of Java. To overcome the high performance penalty
of interpretation, just-in-time (JIT) Java compilers have been created.
The output of the code generator may be:
Absolute machine-language program: It can be placed in a fixed memory location and
immediately executed.
Relocatable machine-language program: It allows subprograms (object modules) to be
compiled separately. A set of relocatable object modules can be linked together and loaded
for execution by a linking loader. The compiler must provide explicit relocation information
to the loader if automatic relocation is not possible.
3 Memory Management
Names in the source program are mapped to addresses of data objects in run-time memory
by both the front end and the code generator.
Memory management uses the symbol table to get information about names.
The amount of memory required by the declared identifiers is calculated, and storage space is
reserved in memory at run time.
Labels in three-address code are converted into equivalent memory addresses.
For instance, if a reference to goto j is encountered in three-address code, then the appropriate
jump instruction can be generated by computing the memory address for label j.
Some instruction addresses can be calculated only at run time, that is, after the program
has been loaded.
4 Instruction Selection
The code generator must map the IR program into a code sequence that can be executed by the
target machine. The complexity of performing this mapping is determined by factors such as
The level of the intermediate representation (IR).
The nature of the instruction-set architecture.
The desired quality of the generated code.
If the IR is high level, the code generator may translate each IR statement into a sequence of
machine instructions using code templates. Such statement-by-statement code generation, however,
often produces poor code that needs further optimization. If the IR reflects some of the low-level
details of the underlying machine, then the code generator can use this information to generate
more efficient code sequences.
The uniformity and completeness of the instruction set are important factors. The selection
of instructions depends on the instruction set of the target machine. Instruction speeds and machine
idioms are other important factors in the selection of instructions.
If we do not care about the efficiency of the target program, instruction selection is
straightforward. For each type of three-address statement, we can design a code skeleton that
defines the target code to be generated for that construct.
For example, every three-address statement of the form x = y + z, where x, y, and z are
statically allocated, can be translated into the code sequence
LD  R0, y        // R0 = y        (load y into register R0)
ADD R0, R0, z    // R0 = R0 + z   (add z to R0)
ST  x, R0        // x = R0        (store R0 into x)
This strategy often produces redundant loads and stores. For example, the sequence of
three-address statements
a = b + c
d = a + e
would be translated into the following code
LD  R0, b        // R0 = b
ADD R0, R0, c    // R0 = R0 + c
ST  a, R0        // a = R0
LD  R0, a        // R0 = a
ADD R0, R0, e    // R0 = R0 + e
ST  d, R0        // d = R0
Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the
third if a is not subsequently used.
The quality of the generated code is usually determined by its speed and size. On most
machines, a given IR program can be implemented by many different code sequences, with
significant cost differences between the different implementations. A naive translation of the
intermediate code may therefore lead to correct but unacceptably inefficient target code.
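The template-based translation and the redundant load it produces can be sketched in Python as follows. The tuple encoding of statements and instructions and the function names select and drop_redundant_loads are assumptions for illustration. The clean-up pass is only valid inside a basic block, where the load cannot be the target of a jump.

```python
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def select(stmts):
    """Translate (x, y, op, z) statements with the fixed LD / OP / ST skeleton."""
    code = []
    for x, y, op, z in stmts:
        code.append(("LD", "R0", y))            # R0 = y
        code.append((OPCODE[op], "R0", "R0", z))  # R0 = R0 op z
        code.append(("ST", x, "R0"))            # x = R0
    return code

def drop_redundant_loads(code):
    """Remove 'LD R, a' when it immediately follows 'ST a, R'."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        if (ins[0] == "LD" and prev is not None and prev[0] == "ST"
                and prev[1] == ins[2] and prev[2] == ins[1]):
            continue  # the register already holds the value just stored
        out.append(ins)
    return out

stmts = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
naive = select(stmts)
improved = drop_redundant_loads(naive)
```

The naive translation has six instructions; the clean-up removes the fourth one, the reload of a, and leaves the rest untouched.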
5 Register Allocation
A key problem in code generation is deciding what values to hold in what registers.
Registers are the fastest storage on the target machine, but we usually do not have
enough of them to hold all values. Values not held in registers need to reside in memory.
Instructions involving register operands are invariably shorter and faster than those involving
operands in memory, so efficient utilization of registers is particularly important.
The use of registers is often subdivided into two subproblems:
1. Register allocation: During register allocation, select the set of variables that
will reside in registers at each point in the program.
2. Register assignment: During register assignment, pick the specific register in which
each such variable will reside.
Finding an optimal assignment of registers to variables is difficult, even with single-register
machines. Mathematically, the problem is NP-complete. The problem is further complicated
because the hardware and/or the operating system of the target machine may require that certain
register-usage conventions be observed. Certain machines require register pairs for some operands
and results.
Consider the following three-address code sequence:
t = a + b
t = t * c
t = t / d
An efficient machine-code sequence that uses only one register, R0, is:
LD  R0, a
ADD R0, b
MUL R0, c
DIV R0, d
ST  t, R0
6 Evaluation Order
The evaluation order is an important factor in generating efficient target code. Some
computation orders require fewer registers to hold intermediate results than others. However,
picking the best order in the general case is a difficult NP-complete problem. We can avoid the
problem by generating code for the three-address statements in the order in which they have been
produced by the intermediate code generator.
5.7 A SIMPLE CODE GENERATOR ALGORITHM
A code generator generates target code for a sequence of three-address instructions. One of
the primary issues during code generation is deciding how to use registers to best advantage. The best
target code uses the minimum number of registers during execution.
There are four principal uses of registers:
The operands of an operation must be in registers in order to perform the operation.
Registers make good temporaries used only within a single basic block.
Registers are used to hold (global) values that are computed in one basic block and used in
other blocks.
Registers are often used to help with run-time storage management.
The machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg
Register and Address Descriptors
For each available register, a register descriptor keeps track of the variable names whose
current value is in that register. Initially all register descriptors are empty. As the code
generation progresses, each register will hold the value of zero or more names.
For each program variable, an address descriptor keeps track of the location or locations
where the current value of that variable can be found. The location might be a register, a
memory address, a stack location, or some set of more than one of these. The information
can be stored in the symbol-table entry for that variable name.
Function getReg
An essential part of the algorithm is a function getReg(I), which selects registers for each
memory location associated with the three-address instruction I.
Function getReg has access to the register and address descriptors for all the variables of the
basic block, and may also have access to certain useful data-flow information such as the
variables that are live on exit from the block.
In a three-address instruction such as x = y + z, a possible improvement to the algorithm is
to generate code for both x = y + z and x = z + y whenever + is a commutative operator, and
pick the better code sequence.
Machine Instructions for Operations
For a three-address instruction with an operation (+, -, *, /, ...) such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry,
y', where y' is one of the memory locations for y (by the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
Machine Instructions for Copy Statements
Consider an important special case: a three-address copy statement of the form x = y.
We assume that getReg will always choose the same register for both x and y. If y is not
already in that register Ry, then generate the machine instruction LD Ry, y.
If y was already in Ry, we do nothing.
Managing Register and Address Descriptors
As the code-generation algorithm issues load, store, and other machine instructions, it needs
to update the register and address descriptors. The rules are as follows:
1. For the instruction LD R, x
a. Change the register descriptor for register R so it holds only x.
b. Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory
location.
3. For an operation OP R, R, R such as ADD Rx, Ry, Rz implementing a three-address
instruction x = y + z
a. Change the register descriptor for Rx so that it holds only x.
b. Change the address descriptor for x so that its only location is Rx. Note that the
memory location for x is not now in the address descriptor for x.
c. Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if
needed, and after managing descriptors as for all load statements (per rule 1):
a. Add x to the register descriptor for Ry.
b. Change the address descriptor for x so that its only location is Ry.
Example 5.16: Let us translate the basic block consisting of the three-address statements
t = a - b
u = a - c
v = t + u
a = d
d = v + u
where t, u, and v are temporaries, local to the block, while a, b, c, and d are
variables that are live on exit from the block. When a register's value is no longer needed, we
reuse its register. A summary of all the machine-code instructions generated is given in the figure.
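The descriptor rules can be tried out with a small Python sketch. Everything below is an illustration under stated assumptions: three registers R1 to R3, a deliberately simplified get_reg that prefers a register reusable without a store and otherwise spills one, and a conservative needed test in place of real next-use information. The sample block reads Example 5.16 as t = a - b, u = a - c, v = t + u, a = d, d = v + u. The register choices of this sketch need not match those of the textbook getReg.

```python
REGS = ["R1", "R2", "R3"]  # assumed register file; the text does not fix its size
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(block, live_on_exit):
    """block holds (x, y, op, z) for x = y op z and (x, y, None, None) for x = y."""
    reg = {r: set() for r in REGS}   # register descriptors: register -> variables
    addr = {}                        # address descriptors: variable -> locations
    code = []

    def locs(v):
        # Every variable starts out in its own memory cell, named after itself.
        return addr.setdefault(v, {v})

    def needed(v, start):
        # Conservative: v is live on exit or read by some statement from start on.
        return v in live_on_exit or any(v in (s[1], s[3]) for s in block[start:])

    def get_reg(start, busy):
        # Simplified getReg: emptiest registers first; spill only as a last resort.
        candidates = sorted((r for r in REGS if r not in busy), key=lambda r: len(reg[r]))
        for allow_spill in (False, True):
            for r in candidates:
                at_risk = [v for v in sorted(reg[r]) if locs(v) == {r} and needed(v, start)]
                if at_risk and not allow_spill:
                    continue
                for v in at_risk:
                    code.append(("ST", v, r))   # rule 2: memory copy is current again
                    locs(v).add(v)
                for v in reg[r]:
                    locs(v).discard(r)
                reg[r] = set()
                return r
        raise RuntimeError("all registers are busy")

    def load(v, start, busy):
        for r in REGS:
            if v in reg[r]:
                return r                        # value is already in a register
        r = get_reg(start, busy)
        code.append(("LD", r, v))               # rule 1
        reg[r] = {v}
        locs(v).add(r)
        return r

    def assign(x, r):
        # x now lives only in r; every older copy of x is stale (rules 3 and 4).
        for q in REGS:
            reg[q].discard(x)
        reg[r].add(x)
        addr[x] = {r}

    for i, (x, y, op, z) in enumerate(block):
        ry = load(y, i, set())
        if op is None:                          # copy statement x = y
            assign(x, ry)
            continue
        rz = load(z, i, {ry})
        rx = next((r for r in REGS if reg[r] == {x}), None)
        if rx is None:
            rx = get_reg(i + 1, set())          # operands may be overwritten if dead
        code.append((OPCODE[op], rx, ry, rz))
        assign(x, rx)

    for v in sorted(live_on_exit):              # store live variables at block end
        if v not in locs(v):
            code.append(("ST", v, sorted(locs(v))[0]))
            locs(v).add(v)
    return code

block = [("t", "a", "-", "b"), ("u", "a", "-", "c"), ("v", "t", "+", "u"),
         ("a", "d", None, None), ("d", "v", "+", "u")]
code = generate(block, {"a", "b", "c", "d"})
```

On this block the sketch loads each of a, b, c and d once and ends with stores for a and d, the two live variables whose current values exist only in registers.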
SUMMARY
THE PRINCIPAL SOURCES OF OPTIMIZATION
Semantics-Preserving Transformations (Functions): safeguard the original program meaning
Global Common Subexpressions
Copy Propagation
Dead-Code Elimination
Code Motion/Movement
Induction Variables and Reduction in Strength
OPTIMIZATION OF BASIC BLOCKS
The DAG Representation of Basic Blocks
Finding Local Common Subexpressions
Dead Code Elimination
The Use of Algebraic Identities
Representation of Array References
Pointer Assignments and Procedure Calls
Reassembling Basic Blocks from DAG's
GLOBAL DATA FLOW ANALYSIS
Paths and points
Reaching Definitions
Data-flow analysis of structured programs
Conservative Estimation of Data-Flow Information
Representation of sets
EFFICIENT DATA FLOW ALGORITHMS
Depth-First Ordering in Iterative Algorithms
Structure-based Data Flow Analysis
ISSUES IN THE DESIGN OF A CODE GENERATOR
Input to the Code Generator
The Target Program
Memory Management
Instruction Selection
Register Allocation
Evaluation Order
PEEPHOLE OPTIMIZATION
Eliminating Redundant Loads and Stores
Eliminating Unreachable Code
Flow-of-Control Optimizations
Algebraic Simplification and Reduction in Strength
Use of Machine Idioms
DATA FLOW ANALYSIS
The Data-Flow Abstraction
The Data-Flow Analysis Schema
Data-Flow Schemas on Basic Blocks
Reaching Definitions
Live-Variable Analysis
Available Expressions
LOOP OPTIMIZATION
Code motion: while (i < max - 1) {sum = sum + a[i]} => n = max - 1; while (i < n) {sum = sum + a[i]}
Induction variables and strength reduction: only one induction variable in the loop, either i++ or j = j + 2; replace * by +
Loop-invariant method
Loop unrolling
Loop fusion
COMPILE-TIME EVALUATION
Constant folding: computation of constants is done at compile time, e.g. Clength = 2 * (22/7) * r.
Constant propagation: the value of a variable is substituted and computed at compile time.
E.g. pi = 3.14; r = 6; Area = pi * r * r; then Area is computed as 3.14 * 6 * 6.
Variable propagation: one variable is replaced by another at compile time.
E.g. x = pi; Area = x * r * r; then Area is computed as pi * r * r.
DR. PAULS ENGINEERING COLLEGE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Year & Semester : III / VI
Subject Code : CS6660
Subject Name : COMPILER DESIGN
Degree & Branch : B.E. C.S.E.
UNIT I INTRODUCTION TO COMPILERS
1. What is a compiler?
A compiler is a program that reads a program written in one language, the source language,
and translates it into an equivalent program in another language, the target language. The
compiler reports to its user the presence of errors in the source program.
2. What are the two parts of a compilation? Explain briefly.
Analysis and synthesis are the two parts of compilation.
The analysis part breaks up the source program into constituent pieces and creates
an intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
3. List the subparts or phases of the analysis part.
Analysis consists of three phases:
Linear analysis.
Hierarchical analysis.
Semantic analysis.
4. Depict diagrammatically how a language is processed.
Skeletal source program → Preprocessor → Source program → Compiler → Target assembly
program → Assembler → Relocatable machine code → Loader/link-editor (together with library
and relocatable object files) → Absolute machine code
5. What is linear analysis?
Linear analysis is one in which the stream of characters making up the source program is
read from left to right and grouped into tokens, which are sequences of characters having a
collective meaning. It is also called lexical analysis or scanning.
6. List the various phases of a compiler.
The following are the various phases of a compiler:
Lexical analyzer
Syntax analyzer
Semantic analyzer
Intermediate code generator
Code optimizer
Code generator
7. What are the classifications of a compiler?
Compilers are classified as:
Single-pass
Multi-pass
Load-and-go
8. What is a symbol table?
A symbol table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find the record for each identifier
quickly and to store or retrieve data from that record quickly.
Whenever an identifier is detected by the lexical analyzer, it is entered into the symbol table.
The attributes of an identifier cannot be determined by the lexical analyzer.
9. Mention some of the cousins of a compiler.
Cousins of the compiler are:
Preprocessors
Assemblers
Loaders and link-editors
10. List the phases that constitute the front end of a compiler.
The front end consists of those phases or parts of phases that depend primarily on the
source language and are largely independent of the target machine. These include
Lexical and syntactic analysis
The creation of the symbol table
Semantic analysis
Generation of intermediate code
A certain amount of code optimization can be done by the front end as well. It also includes the error
handling that goes along with each of these phases.
11. Mention the back-end phases of a compiler.
The back end of a compiler includes those portions that depend on the target machine;
generally these portions do not depend on the source language, just the intermediate language.
These include
Code optimization
Code generation, along with error handling and symbol-table operations.
12. Define compiler-compiler.
Systems that help with the compiler-writing process are often referred to as
compiler-compilers, compiler-generators or translator-writing systems.
Largely they are oriented around a particular model of languages, and they are suitable for
generating compilers of languages of a similar model.
13. List the various compiler construction tools.
The following is a list of some compiler construction tools:
Scanner generators [Lexical Analysis]
Parser generators [Syntax Analysis]
Syntax-directed translation engines [Intermediate Code]
Data-flow analysis engines [Code Optimization]
Code-generator generators [Code Generation]
Compiler-construction toolkits [For all phases]
14. List out language processors.
(i) Compiler
(ii) Interpreter
(iii) Hybrid compiler
(iv) Language-processing system (preprocessors, assemblers, linkers and loaders)
15. List out some programming language basics.
To design an efficient compiler we should know some language basics. Important
concepts from popular programming languages like C, C++, C#, and Java are listed below.
Some of the programming language basics that are used in most languages are:
The Static/Dynamic Distinction
Environments and States
Static Scope and Block Structure
Explicit Access Control
Dynamic Scope
Parameter Passing Mechanisms
Aliasing
UNIT II LEXICAL ANALYSIS
1. Write the needs/roles/functions of the lexical analyzer.
It produces a stream of tokens.
It eliminates comments and whitespace.
It keeps track of line numbers.
It reports the errors encountered while generating tokens.
It stores information about identifiers, keywords, constants and so on in the symbol table.
2. Differentiate tokens, patterns, lexeme.
Token: A sequence of characters that has a collective meaning.
Pattern: There is a set of strings in the input for which the same token is produced as
output. This set of strings is described by a rule called a pattern associated with
the token.
Lexeme: A sequence of characters in the source program that is matched by
the pattern for a token.
2. List the operations on languages.
Union: L ∪ M = {s | s is in L or s is in M}
Concatenation: LM = {st | s is in L and t is in M}
Kleene closure: L* (zero or more concatenations of L)
Positive closure: L+ (one or more concatenations of L)
3. Write a regular expression for an identifier.
An identifier is defined as a letter followed by zero or more letters or digits.
The regular expression for an identifier is given as letter (letter | digit)*
4. Mention the various notational shorthands for representing regular expressions.
regular
expressions a | b | c.)
5. What is the function of a hierarchical analysis?
Hierarchical analysis is one in which the tokens are grouped hierarchically into
nested collections with collective meaning. It is also termed parsing.
6. What does a semantic analysis do?
Semantic analysis is one in which certain checks are performed to ensure that the components of
a program fit together meaningfully.
It mainly performs type checking.
7. List the various error recovery strategies for a lexical analysis.
Possible error recovery actions are:
10. Define nullable(n), firstpos(n), lastpos(n) and followpos(p).
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented
by n has ε in its language. That is, the subexpression can be "made null" or the empty
string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first
symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last
symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that
there is some string x = a1 a2 ... an in L((r)#) such that for some i, there is a way to explain
the membership of x in L((r)#) by matching a_i to position p of the syntax tree and a_(i+1) to
position q.
12. Write the algorithm for converting a regular expression directly to a DFA.
Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D, by
the following procedure:
initialize Dstates to contain only the unmarked state firstpos(n0),
    where n0 is the root of syntax tree T for (r)#;
while (there is an unmarked state S in Dstates)
{
    mark S;
    for (each input symbol a)
    {
        let U be the union of followpos(p) for all p in S that correspond to a;
        if (U is not in Dstates)
            add U as an unmarked state to Dstates;
        Dtran[S, a] = U;
    }
}
The states of D are sets of positions in T. Initially, each state is "unmarked,"
and a state becomes "marked" just before we consider its out-transitions. The start state of D is
firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position
for the endmarker symbol #.
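The construction can be sketched in Python for the regular expression (a|b)*abb. The syntax tree is built by hand rather than parsed, and the class TreeBuilder and the functions build_dfa and accepts are hypothetical names chosen for this illustration. Each tree node is reduced to the triple (nullable, firstpos, lastpos), and followpos is filled in while cat-nodes and star-nodes are created.

```python
from collections import defaultdict

class TreeBuilder:
    """Builds (nullable, firstpos, lastpos) bottom-up and records followpos."""
    def __init__(self):
        self.symbol = {}                     # position -> input symbol
        self.followpos = defaultdict(set)

    def leaf(self, a):
        p = len(self.symbol) + 1             # positions are numbered from 1
        self.symbol[p] = a
        return (False, {p}, {p})

    def alt(self, l, r):
        return (l[0] or r[0], l[1] | r[1], l[2] | r[2])

    def cat(self, l, r):
        for p in l[2]:                       # lastpos(left) is followed by firstpos(right)
            self.followpos[p] |= r[1]
        first = l[1] | r[1] if l[0] else l[1]
        last = l[2] | r[2] if r[0] else r[2]
        return (l[0] and r[0], first, last)

    def star(self, n):
        for p in n[2]:                       # lastpos(n) is followed by firstpos(n)
            self.followpos[p] |= n[1]
        return (True, n[1], n[2])

def build_dfa(t, root, alphabet):
    start = frozenset(root[1])               # firstpos of the root
    dstates, dtran, unmarked = {start}, {}, [start]
    while unmarked:
        S = unmarked.pop()                   # mark S
        for a in alphabet:
            U = frozenset(q for p in S if t.symbol[p] == a for q in t.followpos[p])
            if not U:
                continue                     # the dead state is left implicit
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[S, a] = U
    end = next(p for p, a in t.symbol.items() if a == "#")
    accepting = {S for S in dstates if end in S}
    return start, dtran, accepting, dstates

def accepts(dfa, s):
    start, dtran, accepting, _ = dfa
    state = start
    for ch in s:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in accepting

t = TreeBuilder()
# Augmented expression (a|b)*abb# with positions 1..6 assigned left to right.
root = t.cat(t.cat(t.cat(t.cat(t.star(t.alt(t.leaf("a"), t.leaf("b"))),
                               t.leaf("a")), t.leaf("b")), t.leaf("b")), t.leaf("#"))
dfa = build_dfa(t, root, "ab")
```

The start state is {1, 2, 3} and the only accepting state is the one containing position 6, the position of the endmarker.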
13. Write the structure of Lex programs.
A Lex program has the following form:
DECLARATIONS
%%
TRANSLATION RULES
%%
AUXILIARY FUNCTIONS
14. Construct a DFA and firstpos and lastpos for nodes for the regular expression r = (a|b)*abb
UNIT III SYNTAX ANALYSIS
1. Define parser.
Hierarchical analysis is one in which the tokens are grouped hierarchically into
nested collections with collective meaning.
It is also termed parsing.
2. Mention the basic issues in parsing.
There are two important issues in parsing.
3. Why are lexical and syntax analyzers separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.
4. Define a context-free grammar.
A context-free grammar G is a collection of the following
G can be represented as G = (V, T, S, P)
Production rules are given in the following form
Non-terminal → (V ∪ T)*
5. Briefly explain the concept of derivation.
Derivation from S means generation of a string w from S. For constructing a derivation two
things are important.
i) Choice of non-terminal from several others.
ii) Choice of rule from the production rules for the corresponding non-terminal.
Instead of choosing an arbitrary non-terminal one can choose
i) either leftmost derivation: the leftmost non-terminal in a sentential form
ii) or rightmost derivation: the rightmost non-terminal in a sentential form
6. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more than one parse tree for some
sentence of the language L(G),
i.e. there is more than one leftmost derivation, or more than one rightmost derivation, for the
given sentence.
7. What is an operator precedence parser?
A grammar is said to be operator precedence if it possesses the following properties:
1. No production has ε on the right side.
2. There should not be any production rule possessing two adjacent non-terminals on the right-
hand side.
8. List the properties of LR parsers.
1. LR parsers can be constructed to recognize most of the
programming languages for which a context-free grammar can be
written.
2. The class of grammars that can be parsed by an LR parser is a
superset of the class of grammars that can be parsed using predictive
parsers.
3. LR parsers work using a non-backtracking shift-reduce technique, yet
they are efficient.
9. Mention the types of LR parser.
Simple LR parser
Lookahead LR parser
10. What are the problems with top-down parsing?
The following are the problems associated with top-down parsing:
11. Write the algorithm for FIRST and FOLLOW.
FIRST
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X)
if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains
ε, then everything in FOLLOW(A) is in FOLLOW(B).
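The FIRST and FOLLOW rules can be applied repeatedly until no set grows. The Python sketch below does this for the usual non-left-recursive expression grammar; the grammar encoding, the marker EPS and the function names are assumptions made for illustration.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols; the empty sequence gives {ε}."""
    result = set()
    for X in symbols:
        result |= first[X] - {EPS}
        if EPS not in first[X]:
            return result
    result.add(EPS)
    return result

def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}          # rule 1
    first.update({A: set() for A in grammar})
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)  # rules 2 and 3
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:         # only non-terminals have FOLLOW sets
                        continue
                    beta = first_of_string(body[i + 1:], first)
                    new = beta - {EPS}           # rule 2
                    if EPS in beta:
                        new |= follow[A]         # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

# E -> T E'   E' -> + T E' | ε   T -> F T'   T' -> * F T' | ε   F -> ( E ) | id
grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
terminals = {"+", "*", "(", ")", "id"}
first = compute_first(grammar, terminals)
follow = compute_follow(grammar, first, "E")
```

An empty production body is written as an empty list, so rule 2 of FIRST falls out of first_of_string without a special case.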
12. List the advantages and disadvantages of operator precedence parsing.
Advantages
This type of parsing is simple to implement.
Disadvantages
1. An operator like minus has two different precedences (unary and binary). Hence it is hard to
handle tokens like the minus sign.
2. This kind of parsing is applicable to only a small class of grammars.
13. What is the dangling-else problem?
The dangling-else problem is the ambiguity of the grammar shown below, in which an else can
be paired with more than one preceding then:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
14. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler Compiler, which is basically a utility
available from UNIX.
Basically YACC is an LALR parser generator.
It can report conflicts or ambiguities in the form of error messages.
15. What is meant by handle pruning?
A rightmost derivation in reverse can be obtained by handle pruning.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential
form of some as yet unknown rightmost derivation.
UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
1. What are the benefits of intermediate code generation?
A compiler for different machines can be created by attaching a different back
end for each machine to the existing front end.
A compiler for different source languages can be created by providing different
front ends for the corresponding source languages to an existing back end.
A machine-independent code optimizer can be applied to the intermediate code
in order to optimize the code generation.
2. What are the various types of intermediate code representation?
There are mainly three types of intermediate code representations.
3. Define backpatching.
Backpatching is the activity of filling in unspecified information of labels using
appropriate semantic actions during the code generation process. The functions used in the
semantic actions are mklist(i), merge_list(p1, p2) and backpatch(p, i).
4. Mention the functions that are used in backpatching.
mklist(i): i is an index into the array of quadruples.
merge_list(p1, p2): this function concatenates the two lists pointed to by p1 and p2.
It returns the pointer to the concatenated list.
5. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the
three-address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
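The sequence above follows from a post-order walk of the expression tree, assuming the usual precedence in which not binds tighter than and, and and binds tighter than or. A small Python sketch, with a hypothetical tuple encoding of the tree and a hypothetical function name three_address:

```python
import itertools

def three_address(expr):
    """expr is a name, ('not', e), or (op, e1, e2) with op in {'and', 'or'}."""
    code = []
    counter = itertools.count(1)

    def walk(e):
        if isinstance(e, str):
            return e                         # a plain name needs no temporary
        if e[0] == "not":
            arg = walk(e[1])
            t = f"t{next(counter)}"
            code.append(f"{t} := not {arg}")
            return t
        left, right = walk(e[1]), walk(e[2])
        t = f"t{next(counter)}"
        code.append(f"{t} := {left} {e[0]} {right}")
        return t

    walk(expr)
    return code

# a or (b and (not c))
tac = three_address(("or", "a", ("and", "b", ("not", "c"))))
```

Each interior node of the tree yields exactly one three-address statement and one new temporary.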
6. What are the various methods of implementing three-address statements?
The three-address statements can be implemented using the following methods.
operator (OP), arg1, arg2, result.
the symbol table.
...ing pointers are used instead of using statements.
7. Give the syntax-directed definition for the if-else statement.
1. S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code
2. S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code || gen_code('goto', S.next)
|| gen_code(E.false ':') || S2.code
UNIT V CODE OPTIMIZATION AND CODE GENERATION
1. Mention the properties that a code generator should possess.
In other words, the code generated should be such that it makes effective use
of the resources of the target machine.
2. List the terminologies used in basic blocks.
Define and use: the three-address statement a := b + c is said to define a and
to use b and c.
Live and dead: a name in the basic block is said to be live at a given
point if its value is used after that point in the program. A name in the
basic block is said to be dead at a given point if its value is never used after
that point in the program.
3. What is a flow graph?
A flow graph is a directed graph in which the flow-of-control information is added to the
basic blocks.
There is a directed edge from block B1 to block B2 if B2 immediately follows
B1 in the given sequence. We can say that B1 is a predecessor of B2.
4. What is a DAG? Mention its applications.
7. How do you calculate the cost of an instruction?
The cost of an instruction can be computed as one plus the cost associated with the
source and destination addressing modes, given by the added cost.
MOV R0, R1            cost 1
MOV R1, M             cost 2
SUB 5(R0), *10(R1)    cost 3
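The three costs above can be reproduced with a short Python sketch. The added-cost table it uses is an assumption taken from the classic textbook machine model: register and indirect-register operands add 0, while absolute, indexed, indirect-indexed and literal operands add 1 each. The function names are hypothetical.

```python
import re

def operand_cost(operand):
    """Added cost of one operand under the assumed addressing-mode table."""
    operand = operand.strip()
    if re.fullmatch(r"\*?R\d+", operand):   # register R or indirect register *R
        return 0
    return 1                                 # M, c(R), *c(R) or #c

def instruction_cost(instr):
    """Cost = 1 for the instruction word + added costs of its operands."""
    _, operands = instr.split(None, 1)
    return 1 + sum(operand_cost(o) for o in operands.split(","))

costs = [instruction_cost(i) for i in
         ("MOV R0, R1", "MOV R1, M", "SUB 5(R0), *10(R1)")]
```

Under this model the cost of an instruction equals its length in words, since each address or constant occupies one extra word.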
8. What is a basic block?
A basic block is a sequence of consecutive statements in which flow of control enters
at the beginning and leaves at the end without halt or possibility of branching.
E.g. t1 := a * 5
t2 := t1 + 7
t3 := t2 - 5
t4 := t1 + t3
t5 := t2 + b
13. List the different storage allocation strategies.
The strategies are:
Heap allocation
14. What are the contents of an activation record?
The activation record is a block of memory used for managing the information needed
by a single execution of a procedure. Various fields of the activation record are:
15. What is dynamic scoping?
In dynamic scoping, a use of a non-local variable refers to the non-local data declared in the most
recently called and still active procedure. Therefore new bindings are set up for
local names each time a procedure is called. In dynamic scoping, symbol tables can be required at run
time.
16. Define symbol table.
A symbol table is a data structure used by the compiler to keep track of the semantics of
the variables. It stores scope and binding information about names.
17. What is code motion?
Code motion is an optimization technique in which the amount of code in a loop is decreased.
This transformation is applicable to an expression that yields the same result independent
of the number of times the loop is executed. Such an expression is placed before the loop.
18. What are the properties of an optimizing compiler?
The source code should be such that it produces the minimum amount of target code.
There should not be any unreachable code.
Dead code should be completely removed from the source program.
The optimizing compiler should apply the following code-improving transformations
to the source program:
i) common subexpression elimination
ii) dead code elimination
iii) code movement
iv) strength reduction
20. Suggest a suitable approach for computing the hash function.
Using the hash function we should obtain the exact location of a name in the symbol table. The
hash function should result in a uniform distribution of names in the symbol table.
The hash function should be such that there is a minimum number of collisions. A collision is
a situation where the hash function yields the same location for storing different names.