
Compilers Project

Topic: Compiler for Flat Tiny C (FLTC)

Compilers
A compiler translates a program in a source language to an equivalent program in a target language.
Source Program -> Compiler -> Target Program

Typically
Source languages - C, C++, Ada, FORTRAN, etc.
Target languages - the instruction set of a microprocessor
An assembler translates assembly language programs into object code.

GCC

Structure of a Three Phase Compiler


Compiler: Source Program -> Front End -> IR -> Optimizer -> IR -> Back End -> Target ALP

An Optimizer
Analyzes the IR and rewrites (or transforms) it
Primary goal is to reduce the running time of the compiled code
May also improve space usage, power consumption, etc.
Must preserve the meaning of the code
Meaning is measured by the values of named variables

Structure of a Typical Compiler

program -> Lexical Analyzer -> token stream -> Syntax Analyzer -> syntax tree -> Semantic Analyzer -> syntax tree -> Intermediate Code Generator -> intermediate representation -> Machine-Independent Code Optimizer -> intermediate representation -> Code Generator -> target-machine code -> Machine-Dependent Code Optimizer -> target-machine code
Symbol Table: used by the phases above
Front End: Lexical Analyzer through Intermediate Code Generator
Back End: Instruction Selection, Register Allocation, Instruction Scheduling

The Front End


program -> Lexical Analyzer/Scanner -> token stream -> Syntax Analyzer/Parser -> syntax tree -> Semantic Analyzer -> syntax tree -> Intermediate Code Generator -> intermediate representation
Symbol Table: used by these phases

The Front End: Scanner and Parser


Source Code -> Scanner -> Tokens -> Parser -> IR, Errors
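
As a rough illustration of the scanner stage, the following C sketch turns a character string into tokens one at a time. The token kinds (`TOK_ID`, `TOK_NUM`, `TOK_OP`, `TOK_EOF`) and the functions `next_token` and `token_is` are hypothetical names chosen for this example; they are not the FLTC token set, and in practice a scanner like this would usually be generated by flex rather than written by hand.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
enum TokKind { TOK_ID, TOK_NUM, TOK_OP, TOK_EOF };

struct Token {
    enum TokKind kind;
    char text[32];               /* lexeme, truncated to fit */
};

/* Reads one token starting at *src and advances *src past it. */
struct Token next_token(const char **src)
{
    struct Token t = { TOK_EOF, "" };
    const char *p;
    size_t n = 0;

    assert(src != NULL && *src != NULL);
    p = *src;

    while (isspace((unsigned char)*p))          /* skip whitespace */
        p++;

    if (*p == '\0') {
        /* end of input: t already has kind TOK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_ID;                        /* identifier */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof t.text - 1)
                t.text[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUM;                       /* integer literal */
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof t.text - 1)
                t.text[n++] = *p;
            p++;
        }
    } else {
        t.kind = TOK_OP;                        /* any other single character */
        t.text[n++] = *p++;
    }

    t.text[n] = '\0';
    *src = p;
    return t;
}

/* Convenience check a parser might use on the current token. */
int token_is(const struct Token *t, enum TokKind kind, const char *text)
{
    return t->kind == kind && strcmp(t->text, text) == 0;
}
```

Calling `next_token` repeatedly on `"x1 = y + 42;"` yields an identifier, an operator, an identifier, an operator, a number, an operator, and finally `TOK_EOF`. The parser consumes exactly this kind of stream.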

Parser
Takes as input a stream of tokens
Checks whether the stream of tokens constitutes a syntactically valid program of the language
If the input program is syntactically correct, outputs an intermediate representation of the code (such as an AST)
If the input program has syntactic errors, outputs relevant diagnostic information

Context Free Grammars and Programming Languages

Expr -> Expr Binop Expr | - Expr | ! Expr | ( Expr )
Binop -> Arithop | Relop | Eqop | Condop
Arithop -> + | - | * | / | % | << | >>
Relop -> < | > | <= | >=
Eqop -> == | !=
Condop -> && | ||

CFGs and Programming Languages


Statement -> Location = Expr ;
    | MethodCall ;
    | if ( Expr ) Block
    | if ( Expr ) Block else Block
    | while ( Expr ) Block
    | continue ;
    | Block
Block -> { VarDeclList StatementList }
StatementList -> Statement | Statement StatementList

Context Free Grammars and Programming Languages


Key Idea: The syntax of most modern programming languages can be specified using context free grammars (by design!)
Programs have recursive structure
  A program is a collection of functions
  A function is a sequence of statements
  A statement can be any of if, while, for, assignment statements, etc.
  The body of a while loop is a sequence of statements
  An arithmetic expression is a sum/product of two arithmetic expressions
CFGs are a nice way of expressing programs with recursive structure

CFGs and Programming Languages


Advantages of using CFGs to specify the syntactic structure of languages
Clear and concise syntactic specification for languages
The language can be developed or evolved iteratively
  New constructs can be added to the language with relative ease.
Programming languages can be specified using a special subclass of CFGs for which efficient parsing techniques and automatic parser generators exist.
This special class of CFGs also allows ambiguities in the language to be detected automatically.
CFGs impose a structure on the program which facilitates easy translation to intermediate or target object code.

Grammar for FTC


Program -> class main { Field_Decl* Statement* }
Field_Decl -> Type { id | id [ int_literal ] }+, ;
Type -> int | boolean | char
Statement -> Labelled_Statement
    | Location = Expr ;
    | if Expr then goto label ;
    | goto label ;
    | Method_Call ;
Labelled_Statement -> label Statement
// Think of label as id:
// label is a token like id

Grammar for FTC


Location -> id | id [ Expr ]
Expr -> Literal | Location | Expr Binop Expr | - Expr | ! Expr | ( Expr )
Binop -> Arithop | Relop | Condop
Arithop -> + | - | * | /
Relop -> < | > | <= | >= | == | !=
Condop -> && | ||
Method_Call -> print ( Expr+, ) ; | read ( Location ) ;
Literal -> int_literal | string_literal | char_literal | bool_literal

Parsing Approaches
The Cocke-Younger-Kasami (CYK) algorithm can construct a parse tree for a given string and CFG in O(n^3) worst-case time.

Earley's algorithm
O(n^3) for general CFGs
O(n^2) for unambiguous grammars


We would like to have linear-time algorithms for parsing programs.
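
To make the cubic bound concrete, here is a small CYK recognizer sketch in C. It only answers yes or no rather than building a parse tree, and it assumes the grammar is already in Chomsky normal form, which CYK requires. The toy grammar for a^n b^n, the table name `derives`, and the function `cyk_accepts` are assumptions made for this example and do not come from the FLTC grammar.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { S, T, A, B, NUM_NT };     /* nonterminals of the toy grammar */
#define MAX_LEN 32

/* Binary rules X -> Y Z of a toy CNF grammar for a^n b^n (n >= 1):
 *   S -> A B | A T,   T -> S B
 * Terminal rules A -> 'a' and B -> 'b' are handled in the base case. */
static const int binary_rules[][3] = { {S, A, B}, {S, A, T}, {T, S, B} };
#define NUM_BINARY (sizeof binary_rules / sizeof binary_rules[0])

bool cyk_accepts(const char *w)
{
    /* derives[i][len][X] is true when X derives w[i .. i+len-1] */
    static bool derives[MAX_LEN][MAX_LEN + 1][NUM_NT];
    size_t n;

    assert(w != NULL);
    n = strlen(w);
    if (n == 0 || n > MAX_LEN)
        return false;
    memset(derives, 0, sizeof derives);

    /* Base case: substrings of length 1. */
    for (size_t i = 0; i < n; i++) {
        if (w[i] == 'a') derives[i][1][A] = true;
        if (w[i] == 'b') derives[i][1][B] = true;
    }

    /* Three nested loops over length, start and split point give the
     * O(n^3) behaviour (times the number of rules). */
    for (size_t len = 2; len <= n; len++)
        for (size_t i = 0; i + len <= n; i++)
            for (size_t split = 1; split < len; split++)
                for (size_t r = 0; r < NUM_BINARY; r++) {
                    int x = binary_rules[r][0];
                    int y = binary_rules[r][1];
                    int z = binary_rules[r][2];
                    if (derives[i][split][y] &&
                        derives[i + split][len - split][z])
                        derives[i][len][x] = true;
                }

    return derives[0][n][S];
}
```

With this grammar, `cyk_accepts("aabb")` is true and `cyk_accepts("aab")` is false. The work grows with the cube of the input length even for such a simple language, which is why compilers prefer restricted grammar classes that admit linear-time parsing.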

Yacc (Bison)

Structure of a Yacc Specification file

Yacc

flex can generate the yylex() function using the lexical specification.

Yacc

Abstract Syntax Trees (ASTs)


Compilers often use an abstract syntax tree instead of a parse tree
The AST summarizes grammatical structure, without including detail about the derivation
Example: x+2-y
This is much more concise
ASTs are one kind of intermediate representation (IR)
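
As a small sketch of what the AST for x+2-y might look like in memory, the C code below builds the tree by hand and prints it in prefix form. The `Node` layout and the helpers `mk_leaf`, `mk_op`, and `to_prefix` are hypothetical names for this illustration only. The tree shape assumes the usual left-associative reading, (x+2)-y.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: an operator with two children, or a leaf. */
struct Node {
    char op;                     /* '+', '-', ... or 0 for a leaf */
    const char *leaf;            /* identifier or literal text when op == 0 */
    struct Node *left, *right;
};

struct Node *mk_leaf(const char *text)
{
    struct Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->leaf = text;
    return n;
}

struct Node *mk_op(char op, struct Node *left, struct Node *right)
{
    struct Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Appends the tree in prefix form to buf, which must already hold a
 * NUL-terminated string (typically the empty string). */
void to_prefix(const struct Node *n, char *buf, size_t size)
{
    size_t used = strlen(buf);

    if (n->op == 0) {
        snprintf(buf + used, size - used, "%s", n->leaf);
        return;
    }
    snprintf(buf + used, size - used, "(%c ", n->op);
    to_prefix(n->left, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, " ");
    to_prefix(n->right, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, ")");
}
```

Building `mk_op('-', mk_op('+', mk_leaf("x"), mk_leaf("2")), mk_leaf("y"))` and passing it to `to_prefix` produces `(- (+ x 2) y)`: five nodes, with no trace of the grammar nonterminals that a parse tree would carry.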

Abstract Syntax Trees


while: expr subtree, statement subtree
if: expr subtree, statement subtree
if-else: expr subtree, if-statement subtree, else-statement subtree
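
One possible way to represent these three node shapes in C is a single statement node with a kind tag, sketched below. The names `StmtKind`, `Stmt`, `mk_stmt`, and `child_count` are assumptions for this example, and the expression type is left opaque.

```c
#include <assert.h>
#include <stdlib.h>

struct Expr;                     /* expression subtree, left opaque here */

enum StmtKind { STMT_WHILE, STMT_IF, STMT_IF_ELSE };

struct Stmt {
    enum StmtKind kind;
    struct Expr *cond;           /* expr subtree */
    struct Stmt *body;           /* loop body, or the "if" statement subtree */
    struct Stmt *else_body;      /* "else" statement subtree; NULL unless STMT_IF_ELSE */
};

struct Stmt *mk_stmt(enum StmtKind kind, struct Expr *cond,
                     struct Stmt *body, struct Stmt *else_body)
{
    struct Stmt *s = malloc(sizeof *s);
    assert(s != NULL);
    s->kind = kind;
    s->cond = cond;
    s->body = body;
    s->else_body = else_body;
    return s;
}

/* Number of child subtrees each node shape carries. */
int child_count(const struct Stmt *s)
{
    return s->kind == STMT_IF_ELSE ? 3 : 2;
}
```

A tagged struct keeps tree walks simple: a later pass can switch on `kind` and visit `cond`, `body`, and, for if-else only, `else_body`.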

AST Construction for Expression Grammar


Note: This yacc specification is not complete. I have highlighted only the important parts.

%{
struct astnode {
    enum Op op;
    struct astnode *left;
    struct astnode *right;
};
#define YYSTYPE struct astnode *
%}
%token NUMBER
%left '-' '+'
%left '*' '/'
%%
expr: expr '+' expr { $$ = getNewAstnode(); $$->op = plus; $$->left = $1; $$->right = $3; }
    | expr '-' expr { $$ = getNewAstnode(); $$->op = minus; $$->left = $1; $$->right = $3; }
    | expr '*' expr { $$ = getNewAstnode(); $$->op = mult; $$->left = $1; $$->right = $3; }
    | expr '/' expr { $$ = getNewAstnode(); $$->op = div; $$->left = $1; $$->right = $3; }
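
The grammar actions call getNewAstnode() without showing it. A minimal sketch of how that helper and a tree walk might look in plain C is given below. Everything beyond the names used in the actions is an assumption: the `number` enumerator, the `value` field, and the helpers `mkNumber`, `mkBinary`, and `eval` are invented for this illustration. The division enumerator is spelled `divide` here because `div` is already declared as a function by `<stdlib.h>`.

```c
#include <assert.h>
#include <stdlib.h>

enum Op { plus, minus, mult, divide, number };

struct astnode {
    enum Op op;
    int value;                   /* used only when op == number (assumption) */
    struct astnode *left;
    struct astnode *right;
};

/* Allocates a zero-initialised node for the grammar actions to fill in. */
struct astnode *getNewAstnode(void)
{
    struct astnode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

/* Hypothetical helper for a NUMBER leaf. */
struct astnode *mkNumber(int v)
{
    struct astnode *n = getNewAstnode();
    n->op = number;
    n->value = v;
    return n;
}

/* Mirrors what each binary-operator action does with $1 and $3. */
struct astnode *mkBinary(enum Op op, struct astnode *l, struct astnode *r)
{
    struct astnode *n = getNewAstnode();
    n->op = op;
    n->left = l;
    n->right = r;
    return n;
}

/* Post-order walk that evaluates the expression tree. */
int eval(const struct astnode *n)
{
    switch (n->op) {
    case number: return n->value;
    case plus:   return eval(n->left) + eval(n->right);
    case minus:  return eval(n->left) - eval(n->right);
    case mult:   return eval(n->left) * eval(n->right);
    case divide: return eval(n->left) / eval(n->right);
    }
    return 0;                    /* not reached for well-formed trees */
}
```

For the input 1 + 2 * 3, the precedence declarations make the parser build `mkBinary(plus, mkNumber(1), mkBinary(mult, mkNumber(2), mkNumber(3)))`, which `eval` reduces to 7.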


AST Construction
Note: The way statement lists are handled here is different from the code I have shown in the class.

Symbol Tables
