
Anna University – B.E. VI Sem CSE                          D. Jagadeesan, M.Tech., MISTE.,
CS1352 – Principles of Compiler Design                     Lect/CSE, APCE
Unit – I

Syllabus : Unit – I : INTRODUCTION TO COMPILING

Compilers – Analysis of the source program – Phases of a Compiler –
Cousins of the Compiler – Grouping of Phases – Compiler construction tools –
Lexical Analysis – Role of the Lexical Analyzer – Input Buffering – Specification of
Tokens.

Compiler :
        A compiler is a program that reads a program written in one language (the source
language, such as C or C++) and translates it into an equivalent program in another
language (the target language, such as machine language). As part of this translation,
the compiler reports to its user the presence of errors in the source program.

        Source Program                             Target Program
        (High Level Language)  --->  Compiler  --->  (Low Level Language)
                                        |
                                        v
                                  Error Messages

Classification of Compilers :

        1. Single-Pass Compiler
        2. Multi-Pass Compiler
        3. Load-and-Go Compiler
        4. Debugging or Optimizing Compiler

Software Tools :
Many software tools that manipulate source programs first perform some kind of
analysis. Some examples of such tools include:

 Structure Editors :
 A structure editor takes as input a sequence of commands to build a
source program.
 The structure editor not only performs the text-creation and
modification functions of an ordinary text editor, but it also analyzes
the program text, putting an appropriate hierarchical structure on the
source program.
 Example – matching constructs such as while … do and begin … end.
 Pretty printers :
 A pretty printer analyzes a program and prints it in such a way that the
structure of the program becomes clearly visible.
 Static Checkers :
 A static checker reads a program, analyzes it, and attempts to discover
potential bugs without running the program.


 Interpreters :
         Instead of producing a target program, an interpreter performs the
            operations implied by the source program directly; interpreters are
            common for high level languages such as BASIC.
         Interpreters are frequently used to execute command languages, since
            each operator executed in a command language is usually an
            invocation of a complex routine such as an editor or compiler.
         The analysis portion in each of the following examples is similar to
            that of a conventional compiler:
                 Text formatters
                 Silicon compilers
                 Query interpreters

Analysis of Source Program :


The analysis phase breaks up the source program into constituent pieces and
creates an intermediate representation of the source program. Analysis consists of three
phases:

 Linear analysis (Lexical analysis or Scanning) :

         The lexical analysis phase reads the characters in the source program
            from left to right and groups them into tokens, which are sequences of
            characters having a collective meaning.
         Example : position := initial + rate * 60
                 Identifiers – position, initial, rate
                 Assignment symbol – :=
                 Operators – +, *
                 Number – 60
                 Blanks – eliminated
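        One common way to picture the output of this phase is as a stream of
token/attribute pairs, where each identifier carries a pointer to its symbol-table entry
(the notation below is illustrative):

                <id, 1>  <assign>  <id, 2>  <plus>  <id, 3>  <times>  <num, 60>

        Here <id, k> stands for an identifier token whose attribute k is the symbol-table
        entry for position, initial or rate, and <num, 60> is a number token whose
        attribute is the value 60.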

 Hierarchical analysis (Syntax analysis or Parsing) :

         It involves grouping the tokens of the source program into grammatical
            phrases that are used by the compiler to synthesize output.
         Example : position := initial + rate * 60

                assignment statement
                  |- identifier: position
                  |- :=
                  '- expression
                       |- expression
                       |    '- identifier: initial
                       |- +
                       '- expression
                            |- expression
                            |    '- identifier: rate
                            |- *
                            '- expression
                                 '- number: 60
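        Later phases usually manipulate this tree as a linked data structure in memory.
The following C sketch shows one possible node representation and how the tree for the
example statement could be built (the type and field names are assumptions made for
illustration, not a prescribed interface):

        #include <stdlib.h>

        /* Kinds of nodes that appear in the tree above. */
        enum node_kind { N_ASSIGN, N_ADD, N_MUL, N_ID, N_NUM };

        struct node {
            enum node_kind kind;
            struct node *left, *right;   /* children for :=, +, *          */
            const char  *name;           /* identifier name for N_ID nodes */
            double       value;          /* numeric value for N_NUM nodes  */
        };

        /* Allocates and fills an interior node with two children. */
        static struct node *mk(enum node_kind kind, struct node *l, struct node *r)
        {
            struct node *n = malloc(sizeof *n);
            n->kind = kind; n->left = l; n->right = r;
            n->name = NULL; n->value = 0.0;
            return n;
        }

        /* Builds the tree for: position := initial + rate * 60,
           given already-built leaf nodes for the identifiers and the number. */
        static struct node *example_tree(struct node *position, struct node *initial,
                                         struct node *rate, struct node *sixty)
        {
            return mk(N_ASSIGN, position, mk(N_ADD, initial, mk(N_MUL, rate, sixty)));
        }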

 Semantic analysis :
         This phase checks the source program for semantic errors and
            gathers type information for the subsequent code-generation phase.
         An important component of semantic analysis is type checking.
         Example : int to real conversion.

                expression
                  |- expression
                  |    '- identifier: rate
                  |- *
                  '- expression
                       '- inttoreal
                            '- number: 60

Phases of a Compiler :
        A compiler operates in phases, each of which transforms the source program from
one representation to another.

 The compilation process has two parts (six phases). They are,

         Analysis Part (Three Phases)
                 Lexical Analysis
                 Syntax Analysis
                 Semantic Analysis
         Synthesis Part (Three Phases)
                 Intermediate Code Generation
                 Code Optimization
                 Code Generation
         Two other activities are
                 Symbol Table Management
                 Error Handler
Lexical Analysis :

         It is also called the scanner.
         The lexical analysis phase reads the characters in the source program
            and groups them into tokens, sequences of characters having a
            collective meaning, such as an identifier, a keyword, a punctuation
            symbol, an operator, or a multi-character operator like ++.
         The character sequence forming a token is called the lexeme for the
            token. Certain tokens are augmented by a lexical value.
         Example : position := initial + rate * 60  becomes  id1 := id2 + id3 * 60
         Blanks are eliminated.


                          Source Program
                                |
                         Lexical Analyzer
                                |
                         Syntax Analyzer
                                |
   Symbol Table          Semantic Analyzer          Error
   Management                   |                   Handler
                     Intermediate Code Generator
                                |
                          Code Optimizer
                                |
                          Code Generator
                                |
                          Target Program

Syntax analysis :

         It processes the string of descriptors (tokens) synthesized by the
            lexical analyzer to determine the syntactic structure of an input
            statement. This process is known as parsing.
         The output of the parsing step is a representation of the syntactic
            structure of a statement. A convenient representation is a syntax
            tree.
         Example : position := initial + rate * 60

                    :=
                   /  \
                id1    +
                      / \
                   id2   *
                        / \
                     id3   60
Semantic analysis :

         This phase checks the source program for semantic errors and
            gathers type information for the subsequent code-generation phase.
         An important component of semantic analysis is type checking.


 Example : int to real conversion.

                    :=
                   /  \
                id1    +
                      / \
                   id2   *
                        / \
                     id3   inttoreal
                              |
                              60

Intermediate Code Generation :

         The intermediate representation should be easy to produce.
         It should be easy to translate into the target program.
         Three-address code consists of a sequence of instructions, each of
            which has at most three operands.
         Example : id1 := id2 + id3 * 60
         The corresponding three-address code is
                temp1 := inttoreal (60)
                temp2 := id3 * temp1
                temp3 := id2 + temp2
                id1   := temp3
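        Inside the compiler, three-address instructions are commonly stored as quadruples:
an operator together with up to two argument fields and a result field. A minimal C
sketch of such a representation, holding the four instructions above, might look as
follows (the type and field names are illustrative assumptions):

        /* One three-address instruction: result := arg1 op arg2 */
        enum op { OP_ASSIGN, OP_ADD, OP_MUL, OP_INTTOREAL };

        struct quad {
            enum op     op;
            const char *arg1;    /* e.g. "id3"                               */
            const char *arg2;    /* e.g. "temp1" (NULL for unary operations) */
            const char *result;  /* e.g. "temp2"                             */
        };

        /* The example id1 := id2 + id3 * 60 as a quadruple sequence. */
        static const struct quad example[] = {
            { OP_INTTOREAL, "60",    NULL,    "temp1" },
            { OP_MUL,       "id3",   "temp1", "temp2" },
            { OP_ADD,       "id2",   "temp2", "temp3" },
            { OP_ASSIGN,    "temp3", NULL,    "id1"   },
        };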

Code Optimization :

         The code optimization phase attempts to improve the intermediate code,
            so that faster-running machine code will result.
         In this example the compiler can deduce that the conversion of 60 from
            integer to real can be done once at compile time, so the inttoreal
            operation is eliminated; temp3 is used only once, so it can be
            eliminated as well.
         Example : the three-address code after optimization is
                temp1 := id3 * 60.0
                id1   := id2 + temp1
Code Generation :

         The final phase of the compiler is the generation of target code,
            consisting of relocatable machine code or assembly code.
         Example : target assembly code using registers R1 and R2

                MOVF  id3, R2
                MULF  #60.0, R2
                MOVF  id2, R1
                ADDF  R2, R1
                MOVF  R1, id1

         The first operand of each instruction is the source and the second is the
            destination; the F in each mnemonic indicates a floating-point operation,
            and # marks an immediate constant.


Symbol Table Management :

         A symbol table is a data structure containing a record for each identifier,
            with fields for the attributes of the identifier.
         When an identifier in the source program is detected by the lexical
            analyzer, the identifier is entered into the symbol table. However, the
            attributes of an identifier cannot normally be determined during lexical
            analysis.
         The remaining phases enter information about identifiers into the
            symbol table and then use this information in various ways.
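        As a concrete illustration, a very small symbol table could be kept as an array of
records with a lookup routine that inserts an identifier on first use; production compilers
normally use a hash table, but the idea is the same (all names below are assumptions made
for the sketch):

        #include <string.h>

        #define MAX_SYMBOLS 256

        struct symbol {
            char name[64];     /* the lexeme, e.g. "position"          */
            int  type;         /* filled in later by semantic analysis */
        };

        static struct symbol table[MAX_SYMBOLS];
        static int nsymbols = 0;

        /* Returns the index of name, inserting it if it is not yet present. */
        static int lookup(const char *name)
        {
            for (int i = 0; i < nsymbols; i++)
                if (strcmp(table[i].name, name) == 0)
                    return i;
            strncpy(table[nsymbols].name, name, sizeof table[nsymbols].name - 1);
            table[nsymbols].type = 0;            /* type not known yet */
            return nsymbols++;
        }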

Error Handler :

         Each phase can encounter errors.
         The lexical analysis phase can detect errors where the characters
            remaining in the input do not form any token of the language.
         The syntax analysis phase can detect errors where the token
            stream violates the structure rules of the language.
         During semantic analysis, the compiler tries to detect constructs
            that have the right syntactic structure but no meaning to the
            operation involved.
         The intermediate code generator may detect an operator whose
            operands have incompatible types.
         The code optimizer, doing control-flow analysis, may detect that
            certain statements can never be reached.
         While entering information into the symbol table, the bookkeeping
            routine may discover an identifier that has been declared multiple
            times with contradictory attributes.


Write down the output of each phase for the expression  position := initial + rate * 60

        Source Program
                position := initial + rate * 60

        Lexical Analyzer
                id1 := id2 + id3 * 60

        Syntax Analyzer
                            :=
                           /  \
                        id1    +
                              / \
                           id2   *
                                / \
                             id3   60

        Semantic Analyzer                (Symbol Table Management and the Error
                            :=            Handler interact with every phase)
                           /  \
                        id1    +
                              / \
                           id2   *
                                / \
                             id3   inttoreal
                                      |
                                      60

        Intermediate Code Generator

                temp1 := inttoreal (60)
                temp2 := id3 * temp1
                temp3 := id2 + temp2
                id1   := temp3

        Code Optimizer

                temp1 := id3 * 60.0
                id1   := id2 + temp1

        Code Generator

        Target Program
                MOVF  id3, R2
                MULF  #60.0, R2
                MOVF  id2, R1
                ADDF  R2, R1
                MOVF  R1, id1

Cousins of the Compiler (Language Processing System) :

                Skeletal Source Program
                          |
                    Preprocessor
                          |
                   Source Program
                          |
                      Compiler
                          |
              Target Assembly Program
                          |
                      Assembler
                          |
             Relocatable Machine Code
                          |
                Loader / Link-editor   <---  Library, Relocatable Object Files
                          |
              Absolute Machine Code
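        In a typical GNU toolchain, for example, these stages can be run separately:
gcc -E runs only the preprocessor, gcc -S stops after producing assembly code, gcc -c
assembles the result into a relocatable object file (invoking the assembler as), and the
final gcc step invokes the linker to combine object files and libraries into an absolute
executable.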


 Preprocessors :
         A preprocessor produces the input to the compiler. It may perform the
            following functions.
                 Macro Processing :
                        A preprocessor may allow a user to define macros that are
                        shorthands for longer constructs.
                 File inclusion :
                        A preprocessor may include header files into the program
                        text. (A small example of both functions follows this list.)
                 Rational preprocessors :
                        These preprocessors augment older languages with more
                        modern flow-of-control and data-structuring facilities.
                 Language extensions :
                        These preprocessors attempt to add capabilities to the
                        language by what amounts to built-in macros.
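        As a small illustration of macro processing and file inclusion, consider the
following C fragment; the C preprocessor expands it before the compiler proper sees it
(the file and macro names are chosen only for the example):

        /* example.c : input to the preprocessor */
        #include <stdio.h>                       /* file inclusion: pastes in the header text   */
        #define AREA(r) (3.14159 * (r) * (r))    /* macro: shorthand for a longer construct     */

        int main(void)
        {
            printf("%f\n", AREA(2.0));           /* expands to (3.14159 * (2.0) * (2.0)) */
            return 0;
        }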

 Compiler :
         It converts the source program (HLL) into the target program (LLL).

 Assembler :
         It converts an assembly-language program (LLL) into machine code.

 Loader and Link Editors :


 Loader :
 The process of loading consists of taking relocatable machine
code, altering the relocatable addresses and placing the altered
instructions and data in memory at the proper locations.
 Link Editor :
 It allows us to make a single program from several files of
relocatable machine code.


Grouping of Phases :

         The phases are often collected into a front end and a back end.
                 The front end consists of the phases that depend primarily on the source
                    language and are largely independent of the target machine: lexical
                    analysis, syntax analysis, semantic analysis, creation of the symbol
                    table, generation of intermediate code and some code optimization.
                 The back end includes those portions of the compiler that depend on the
                    target machine: code optimization and code generation, along with the
                    necessary error handling and symbol-table operations.
         Several phases may be grouped together into a single pass, which reads an input
            file and writes an output file (for example, lexical analysis, syntax analysis,
            semantic analysis and intermediate code generation might be grouped into one
            pass).
         Reducing the number of passes speeds up compilation, but it may force the
            compiler to handle constructs that are used before they are defined; the
            technique of backpatching is used to fill in such missing information later.


COMPILER CONSTRUCTION TOOLS:


The compiler writer, like any programmer, can profitably use software tools such as
debuggers, version managers, profilers, and so on.

Compiler construction tools are


Parser generators
Scanner generators
Syntax-directed translation engines
Automatic code generators
Dataflow engines

 Parser Generators:
                These produce syntax analyzers, normally from input that is based on a
context-free grammar (CFG). In early compilers, syntax analysis consumed not only a large
fraction of the running time of a compiler, but also a large fraction of the intellectual
effort of writing a compiler.
                Eg: YACC

 Scanner Generators:
                These automatically generate lexical analyzers, normally from a
specification based on regular expressions. The basic organization of the resulting
lexical analyzer is in effect a finite automaton.

 Syntax-Directed Translation Engines:

                These produce routines that generate intermediate code in three-address
format, normally by walking the parse tree.

 Automatic Code Generator:


It takes a collection of rules that define the translation of each operation of
the intermediate language into the machine language for the target machine.
The input specification for these systems may contain:
1. A description of the lexical and syntactic structure of the source
language.
2. A description of what output is to be generated for each source
language construct.
3. A description of the target machine.

 Dataflow Engines:
                Much of the information needed to perform good code optimization
involves “dataflow analysis”, the gathering of information about how values are
transmitted from one part of a program to every other part.

These systems have often been referred to as,

                Compiler-compilers
                Compiler-generators
                Translator-writing systems


ROLE OF LEXICAL ANALYSER:

                                          tokens
        Source Program --->  Lexical Analyzer  ----------------->  Parser
                                               <-----------------
                                                 get next token
                                    \                        /
                                     \                      /
                                    Symbol Table Management

 Its main task is to read the input characters and produce as output a sequence
of tokens that the parser uses for syntax analysis.

 Receiving a “get next token” command from the parser, the lexical analyzer
reads input characters until it can identify the next token.

 Its secondary tasks are,

        1. One task is stripping out comments and white space (in the form of blank,
           tab and newline characters) from the source program.
        2. Another task is correlating error messages from the compiler with the
           source program.
 Two phases
1. Scanning
2. Lexical analysis
 The scanner is responsible for doing simple tasks, while the
lexical analyzer proper does the more complex operations.

FUNCTIONS:
        1. It produces the stream of tokens.
        2. It eliminates blanks and comments.
        3. It generates the symbol table, which stores information about identifiers and
           constants encountered in the input.
        4. It keeps track of line numbers.
        5. It reports errors encountered while generating the tokens.
        (A minimal hand-written sketch of a token-returning routine follows below.)
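        The following is a minimal, hand-written sketch in C of such a routine; it is only
an illustration of the idea (the token set and names are assumptions, not a prescribed
interface). It strips blanks, recognizes identifiers and unsigned integers for the running
example position := initial + rate * 60, and returns one token per call:

        #include <ctype.h>
        #include <stdio.h>

        enum token { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_TIMES, TOK_EOF, TOK_ERROR };

        static char lexeme[64];          /* text of the most recent ID/NUM token */

        /* Reads characters from stdin and returns the next token. */
        enum token get_next_token(void)
        {
            int c = getchar();

            while (c == ' ' || c == '\t' || c == '\n')   /* strip white space */
                c = getchar();

            if (c == EOF)  return TOK_EOF;
            if (c == '+')  return TOK_PLUS;
            if (c == '*')  return TOK_TIMES;

            if (c == ':') {                              /* := assignment symbol */
                if (getchar() == '=') return TOK_ASSIGN;
                return TOK_ERROR;                        /* a real scanner would push the character back */
            }

            if (isalpha(c)) {                            /* identifier: letter(letter|digit)* */
                int i = 0;
                while (isalnum(c)) {
                    if (i < 63) lexeme[i++] = (char)c;
                    c = getchar();
                }
                lexeme[i] = '\0';
                ungetc(c, stdin);                        /* put back the lookahead character */
                return TOK_ID;
            }

            if (isdigit(c)) {                            /* unsigned number */
                int i = 0;
                while (isdigit(c)) {
                    if (i < 63) lexeme[i++] = (char)c;
                    c = getchar();
                }
                lexeme[i] = '\0';
                ungetc(c, stdin);
                return TOK_NUM;
            }

            return TOK_ERROR;
        }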

ISSUES IN LEXICAL ANALYSIS:

There are several reasons for separating the analysis phase of compiling into
lexical analysis and parsing.
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.


TOKEN:

It is a sequence of characters that can be treated as a single logical entity.


Typical tokens are,
1. Identifiers
2. Keywords
3. Operators
4. Special symbols
5. Constants

PATTERN:

A set of strings in the input for which the same token is produced as
output. This set of strings is described by a rule called a pattern associated with
the token.

LEXEME:

It is a sequence of characters in the source program that is matched by the
pattern for a token.
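        For example (the token names are illustrative), in the statement
position := initial + rate * 60 used earlier:

        Token      Pattern (informal description)             Sample lexemes
        id         a letter followed by letters or digits     position, initial, rate
        num        any numeric constant                       60
        assign     the character sequence :=                  :=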

INPUT BUFFERING :

        During analysis, the scanner reads the input string from left to right one
character at a time to identify tokens. It uses two pointers for doing this:

        1. Begin pointer (bp) – keeps track of the first character of the current lexeme.

        2. Forward pointer (fp) – keeps track of the next character to be read.

        bp
        |
        f l o a t   a , b ;   a = a + 2 ;
        |
        fp

Steps in Scanning the Input:

        1. Initially, both the begin pointer and the forward pointer point to the first
           character of the lexeme.

        2. fp scans ahead in the buffer until a match for a token is found.

        3. Once the lexeme is found (when a space or a delimiter is reached), fp marks
           the right end of the lexeme.


        bp
        |
        f l o a t   a , b ;   a = a + 2 ;
                  |
                  fp
           (bp still points to 'f'; fp has reached the blank that ends the lexeme "float")

        4. After processing the lexeme, both pointers are set to point to the character
           immediately after the lexeme.

                    bp
                    |
        f l o a t   a , b ;   a = a + 2 ;
                    |
                    fp
           (bp and fp both point to the character 'a' that follows the blank)
        5. This procedure is repeated for the entire source program.

 Input strings are usually stored in a buffer.

Two Types:

        1. One-buffer scheme

        2. Two-buffer scheme

One Buffer Scheme:

         Only one buffer of size N is used.

         The first N characters of the input string are read into the buffer. When fp
            reaches the end of the buffer, the buffer is refilled with the next N
            characters.

Drawback:

         The problem with this implementation is that when the size of a lexeme is
            greater than N, this scheme fails to produce the token (the beginning of the
            lexeme is overwritten when the buffer is refilled).

Two Buffer Scheme:

        bp
        |
        f l o a t eof   a , b ; a = a + 2 eof
        |
        fp
        <-- first half (N characters) -->  <-- second half (N characters) -->

         Two buffers of N characters each are used.


 The first N characters are read into the first half of the buffer. If the buffer
    hasn't been filled completely (fewer than N characters were read), a special
    character eof is inserted to indicate the end of the input.

 When fp reaches the end of the first half, the second half is loaded with the
    next N characters of the source program.

 When fp is about to reach the end of the second half, the first half is loaded
    with the next N characters of the input.

Algorithm for advancing fp :

        if fp is at the end of the first half then begin
                load the second half;
                increment fp by 1;
        end
        else if fp is at the end of the second half then begin
                load the first half;
                set fp to the first character of the first half;
        end
        else
                increment fp by 1;

        With this scheme, every time fp is advanced we must check whether it has reached
the end of a buffer half; that is, each character read requires two tests.

        To reduce the number of comparisons, a special sentinel character (usually eof) is
placed at the end of each buffer half.

Algorithm for advancing fp using sentinels :

        increment fp by 1;
        if fp points to eof then begin
                if fp is at the end of the first half then begin
                        load the second half;
                        increment fp by 1;
                end
                else if fp is at the end of the second half then begin
                        load the first half;
                        set fp to the first character of the first half;
                end
                else    /* eof within a buffer half marks the end of the input */
                        terminate lexical analysis;
        end
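        A C sketch of the two-buffer scheme with sentinels is given below; it is only an
illustration, and the buffer size, file handle and function names are assumptions made
for the sketch:

        #include <stdio.h>

        #define N 4096                      /* size of each buffer half (assumed) */

        static char  buf[2 * N + 2];        /* two halves, each followed by a sentinel slot */
        static char *fp;                    /* forward pointer                              */
        static FILE *src;                   /* source file, opened by the caller            */

        /* Reads up to N characters into the given half and plants the eof sentinel
           right after the last character read. */
        static void load(char *half)
        {
            size_t n = fread(half, 1, N, src);
            half[n] = (char)EOF;
        }

        /* Call once before the first next_char(). */
        static void init_buffer(FILE *f)
        {
            src = f;
            load(buf);                      /* fill the first half */
            fp = buf;
        }

        /* Returns the next input character, reloading a buffer half whenever the
           sentinel at the end of a half is reached.  Assumes, as in the textbook
           scheme, that the eof character itself never occurs inside the input. */
        static int next_char(void)
        {
            for (;;) {
                if (*fp != (char)EOF)
                    return (unsigned char)*fp++;
                if (fp == buf + N) {                 /* sentinel ending the first half  */
                    load(buf + N + 1);               /* reload the second half          */
                    fp = buf + N + 1;
                } else if (fp == buf + 2 * N + 1) {  /* sentinel ending the second half */
                    load(buf);                       /* reload the first half           */
                    fp = buf;
                } else {
                    return EOF;                      /* eof within a half: end of input */
                }
            }
        }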


Refer to the following topics from Theory of Computation:

        1. Finite Automata
        2. DFA
        3. NFA
        4. Regular Expressions
        5. Converting a regular expression into an NFA
        6. Converting an NFA with ε-transitions into an NFA and a DFA
        7. Minimization of DFA

