ACROPOLIS INSTITUTE OF TECHNOLOGY AND RESEARCH
Name: Abhijeet Kumar Pandey
Subject: Compiler Design
Branch: Computer Science & Engg.
Year/Sem: 2010/VII

Q1. Explain why we should study compilers. What is a compiler, and what are
its various phases? Illustrate with a diagram and the example a = b*c.
Ans: Reasons for Studying Compilers:
• An essential programming tool
• Improves software productivity by hiding low-level details
• A tool for designing and evaluating computer architectures
• Inspired RISC, VLIW machines
• Machines’ performance measured on compiled code
• Techniques for developing other programming tools, for example
error-detection tools
• Little languages and program translations can be used to solve
other problems

Compiler: A compiler is a computer program (or set of programs)
that transforms source code written in a programming
language (the source language) into another computer language
(the target language, often having a binary form known as object
code).

Phases of Compiler:
Lexical Analyzer: The lexical analysis stage transforms a sequence of
characters to a sequence of lexical elements. These lexical entities
correspond principally to integers, floating point numbers, characters,
strings of characters and identifiers. The message Illegal
character might be generated by this analysis.
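
The lexical analysis stage described above can be sketched in a few lines. This is only an illustration: the token classes (FLOAT, INT, ID, OP), their patterns, and the helper names `tokenize` and `number_identifiers` are assumptions made for the example, not part of any particular compiler.

```python
import re

# Hypothetical token classes; names and patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/();]"),
    ("SKIP", r"\s+"),
    ("ILLEGAL", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn a sequence of characters into a list of (kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ILLEGAL":
            raise SyntaxError(f"Illegal character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


def number_identifiers(tokens):
    """Rewrite identifiers as id1, id2, ... in order of first appearance."""
    table, out = {}, []
    for kind, lexeme in tokens:
        if kind == "ID":
            table.setdefault(lexeme, f"id{len(table) + 1}")
            out.append(table[lexeme])
        else:
            out.append(lexeme)
    return out, table
```

For the input `a = b*c`, `number_identifiers(tokenize("a = b*c"))` yields the token stream id1 = id2 * id3 together with a small table mapping each name to its identifier.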

Syntax Analysis: The parsing stage constructs a syntax tree and
verifies that the sequence of lexical elements is correct with respect to
the grammar of the language. The message "Syntax error" indicates that
the phrase analyzed does not follow the grammar of the language.

Semantic Analysis: The semantic analysis stage traverses the syntax
tree, checking another aspect of program correctness. The analysis
consists principally of type inference, which, if successful, produces
the most general type of an expression or declaration. Type error
messages may occur during this phase. This stage also detects
whether any members of a sequence are not of type unit. Other
warnings may result, including pattern matching analysis (e.g., pattern
matching is not exhaustive, or part of a pattern match will never be used).

Code Generation: Generation and optimization of intermediate
code do not produce errors or warning messages.

The final step in the compilation process is the generation of a
program binary.

Given expression:
a = b*c

Lexical Analyzer

id1 = id2 * id3;

Syntax Analyzer

Semantic Analyzer

Intermediate Code Generator

temp1 = id3; temp2 = id2;
temp3 = temp2 * temp1; id1 = temp3;

Code Optimizer

temp1 = id2 * id3;
id1 = temp1;

Code Generator

MOV ID2, AX
MOV ID3, BX
MUL AX, BX
MOV AX, ID1

Q2. What is parsing? How many types of techniques are used in parsing?
Explain with a diagram.
Ans: Parsing is the process of analyzing a text, made of a sequence
of tokens (for example, words), to determine its grammatical structure
with respect to a given (more or less) formal grammar. Parsing can
also be used as a linguistic term, especially in reference to how
phrases are divided up in garden path sentences.
The basic connection between a sentence and the grammar it derives
from is the parse tree, which describes how the grammar was used to
produce the sentence.
There are two basic techniques for parsing.
The first method tries to imitate the original production process by
rederiving the sentence from the start symbol. This method is called
top-down, because the production tree is reconstructed from the top
downwards.

The second method tries to roll back the production process and to
reduce the sentence back to the start symbol. Quite naturally, this
technique is called bottom-up.

Top-down parsing:
Consider the following grammar for the language a^n b^n c^n:
S -> aSQ
S -> abc
bQc -> bbcc
cQ -> Qc
and suppose the (input) sentence is aabbcc. Starting from S, the sentence
is rederived as S => aSQ => aabcQ => aabQc => aabbcc.

Top-down parsing tends to identify the production rules (and thus to
characterize the parse tree) in prefix order.

Bottom-up parsing:
A bottom-up parser tries to go backwards. For a grammar with the
productions S → Ax and A → a and the input ax, it performs the following
reverse derivation sequence:
ax → Ax → S
Intuitively, a top-down parser tries to expand nonterminals into
right-hand sides, and a bottom-up parser tries to replace (reduce)
right-hand sides with nonterminals. The first action of the bottom-up
parser would be to replace a with A, yielding Ax. Then it would replace Ax
with S. Once it arrives at a sentential form consisting of exactly S, it has
reached the goal and stops, indicating success.
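
The reduction sequence above can be reproduced with a deliberately naive shift-reduce loop. The grammar S → Ax, A → a is inferred from the derivation in the text, and the strategy of reducing whenever the top of the stack matches a right-hand side is a simplification chosen for this sketch; a real LR parser decides between shifting and reducing with a parsing table.

```python
# Grammar assumed from the derivation in the text: S -> A x, A -> a.
RULES = [("S", ("A", "x")), ("A", ("a",))]


def shift_reduce(tokens, start="S"):
    """Naive bottom-up recognizer.

    Reduces whenever the top of the stack matches a right-hand side,
    otherwise shifts the next input symbol. Returns the sentential
    forms visited on the way from the input to the start symbol.
    """
    stack, rest = [], list(tokens)
    forms = ["".join(rest)]
    while True:
        for lhs, rhs in RULES:
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]          # reduce: replace rhs by lhs
                forms.append("".join(stack + rest))
                break
        else:
            if not rest:
                break                               # nothing left to shift or reduce
            stack.append(rest.pop(0))               # shift
    if stack != [start]:
        raise SyntaxError("input cannot be reduced to " + start)
    return forms
```

Calling `shift_reduce("ax")` returns the forms ax, Ax and S in that order, mirroring the reverse derivation.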

Q3. What do you understand by error recovery and error handling in LL and LR
parsing? Also explain their types.
Ans: Error Recovery in Predictive Parsing: An error is detected during
predictive parsing when the terminal on top of the stack does not match
the next input symbol or when nonterminal A is on top of the stack, a is
the next input symbol, and M[A, a] is error (i.e., the parsing-table entry is
empty).
Panic Mode
Panic-mode error recovery is based on the idea of skipping symbols on
the input until a token in a selected set of synchronizing tokens
appears. Its effectiveness depends on the choice of synchronizing
set. The sets should be chosen so that the parser recovers quickly
from errors that are likely to occur in practice.
Phrase-level Recovery
Phrase-level error recovery is implemented by filling in the blank
entries in the predictive parsing table with pointers to error
routines. These routines may change, insert, or delete symbols on the input
and issue appropriate error messages. They may also pop from the stack.
Alteration of stack symbols or the pushing of new symbols onto the
stack is questionable for several reasons. First, the steps carried out by
the parser might then not correspond to the derivation of any word in the
language at all. Second, we must ensure that there is no possibility of an
infinite loop. Checking that any recovery action eventually results in an
input symbol being consumed (or the stack being shortened if the end
of the input has been reached) is a good way to protect against such loops.
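
Panic-mode recovery in a table-driven predictive parser might look like the following sketch. The toy grammar (E → T E', E' → + T E' | ε, T → id), the choice of FOLLOW sets as synchronizing sets, and the function name `parse` are assumptions made for illustration. Every error action either consumes input or pops the stack, which is the safeguard against infinite loops mentioned above.

```python
# Toy grammar (an assumption for this sketch):
#   E  -> T E'
#   E' -> + T E' | epsilon
#   T  -> id
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["id"],
}
# Synchronizing sets: here simply FOLLOW of each nonterminal.
SYNC = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}
NONTERMINALS = {"E", "E'", "T"}


def parse(tokens):
    """Predictive parse with panic-mode recovery; returns the error messages."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = ["$", "E"], 0, []
    while stack:
        top, look = stack[-1], tokens[pos]
        if top not in NONTERMINALS:
            if top == look:
                pos += 1                      # terminal matches: consume it
            else:
                errors.append(f"missing {top!r} before {look!r}")
            stack.pop()                       # pop the terminal either way
        elif (top, look) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(top, look)]))
        else:
            # M[top, look] is empty: skip input up to a synchronizing token,
            # then give up on this nonterminal by popping it.
            errors.append(f"unexpected {look!r} while expanding {top}")
            while tokens[pos] not in SYNC[top]:
                pos += 1
            stack.pop()
    return errors
```

With this table, `parse(["id", "+", "id"])` reports no errors, while `parse(["id", "+", "+", "id"])` reports a single error at the second `+` and then continues parsing the rest of the input.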

Q4. Define Syntax Directed Definition. Construct a Syntax Directed
Definition to convert infix to postfix translation and also show the
annotated parse tree for the expression 9-5+2.

Ans: A syntax directed definition is a generalization of a context free
grammar in which each grammar symbol has an associated set of
attributes, partitioned into two subsets called the synthesized and
inherited attributes of that grammar symbol.

An attribute can represent anything we choose: a string, a number, a
type, a memory location, or whatever. The value of an attribute at a
parse tree node is defined by a semantic rule associated with a
production used at that node. The value of a synthesized attribute at a
node is computed from the values of attributes at the children of that
node in the parse tree; the value of an inherited attribute is computed
from the values of attributes at the siblings and parent of that node.

Semantic rules set up dependencies between attributes that will be
represented by a graph. From the dependency graph, we derive an
evaluation order for the semantic rules. Evaluation of the semantic
rules defines the values of the attributes at the nodes in the parse tree for
the input string.
A parse tree showing the values of attributes at each node is called an
annotated parse tree. The process of computing the attributes at the
nodes is called annotating or decorating the parse tree.

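Syntax Directed Definition for infix to postfix:

Productions        Semantic rules

E -> E1 + T        E.t := E1.t || T.t || '+'
E -> E1 - T        E.t := E1.t || T.t || '-'
E -> T             E.t := T.t
T -> 0             T.t := '0'
T -> 1             T.t := '1'
...                ...
T -> 9             T.t := '9'

For the expression 9-5+2, the values of E.t at the E nodes of the annotated
parse tree, from the lowest to the root, are 9, 95- and 95-2+.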
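
The semantic rules above can be traced in code. The sketch below is one possible way to evaluate the synthesized attribute t: it handles the left-recursive productions with a loop rather than with a parser generator, and the function name `postfix` and the `trace` list are assumptions made for the example.

```python
def postfix(expr):
    """Compute E.t for single digits separated by + or -.

    Follows E -> E1 + T | E1 - T | T and T -> digit. Returns the final
    attribute value and the value of E.t at each E node, bottom up.
    """
    tokens = [ch for ch in expr if not ch.isspace()]
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        t = tokens[pos]              # T -> digit    { T.t := digit }
        pos += 1
        return t

    e_t = term()                     # E -> T        { E.t := T.t }
    trace = [e_t]
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        pos += 1
        t_t = term()
        e_t = e_t + t_t + op         # E -> E1 op T  { E.t := E1.t || T.t || op }
        trace.append(e_t)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return e_t, trace
```

For 9-5+2 the call `postfix("9-5+2")` returns `("95-2+", ["9", "95-", "95-2+"])`; the trace lists the values that would annotate the E nodes of the parse tree.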

Q5. What are S-attributed and L-attributed definitions?

Ans: L-ATTRIBUTED DEFINITIONS: A syntax-directed definition is
L-attributed if each inherited attribute of Xj, for j between 1 and n, on
the right side of production A → X1 X2 … Xn, depends only on:

1. The attributes (both inherited and synthesized) of the
symbols X1, X2, …, Xj−1 (i.e., the symbols to the left of Xj in the
production), and
2. The inherited attributes of A.

The syntax-directed definition for declarations, with the productions
D → T L and L → L1, id, is an example of an L-attributed
definition, because the inherited attribute L.type depends on T.type,
and T is to the left of L in the production D → TL. Similarly, the
inherited attribute L1.type depends on the inherited attribute L.type,
and L is the parent of L1 in the production L → L1, id.
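
A small sketch may help to see how the inherited attribute flows. It assumes declarations of the form `int a, b, c` for the productions D → T L and L → L1, id | id; the function name `declare` and the dictionary used as a symbol table are illustrative choices, not part of the definition.

```python
def declare(decl):
    """Propagate T.type to every identifier in a declaration like 'int a, b, c'."""
    type_name, _, names = decl.strip().partition(" ")
    ids = [name.strip() for name in names.split(",")]
    if not type_name or not all(ids):
        raise SyntaxError(f"malformed declaration: {decl!r}")

    symtab = {}

    def L(id_list, inherited_type):
        # L -> L1 , id : L1.type := L.type (the child inherits from its parent)
        if len(id_list) > 1:
            L(id_list[:-1], inherited_type)
        # The identifier on the right receives the inherited type.
        symtab[id_list[-1]] = inherited_type

    # D -> T L : L.type := T.type (L inherits from its left sibling T)
    L(ids, type_name)
    return symtab
```

`declare("int a, b, c")` records the type `int` for a, b and c, each value arriving through the inherited attribute passed down the chain of L nodes.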

S-ATTRIBUTED DEFINITIONS: A syntax directed definition that uses
synthesized attributes exclusively is said to be an S-attributed
definition. A parse tree for an S-attributed definition can always be
annotated by evaluating the semantic rules for the attributes at each
node bottom up, from the leaves to the root.

Q6. Translate the expression into Quadruples, Triples and Indirect Triples.
-(a+b)*(c+d)-(a+b+c)
Ans:
Quadruples:

Operator   Arg1   Arg2   Result

+          a      b      T1
uminus     T1            T2
+          c      d      T3
+          T1     c      T4
*          T2     T3     T5
-          T5     T4     T6

Triples:

       Operator   Arg1   Arg2

(0)    +          a      b
(1)    uminus     (0)
(2)    +          c      d
(3)    +          (0)    c
(4)    *          (1)    (2)
(5)    -          (4)    (3)

Indirect Triples:

       Operator   Arg1   Arg2

(14)   +          a      b
(15)   uminus     (14)
(16)   +          c      d
(17)   +          (14)   c
(18)   *          (15)   (16)
(19)   -          (18)   (17)

Pointers to triples

Statement   Pointer

(0)         (14)
(1)         (15)
(2)         (16)
(3)         (17)
(4)         (18)
(5)         (19)
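
The relation between the three representations can be checked mechanically. The sketch below takes the quadruples for -(a+b)*(c+d)-(a+b+c) as data and derives the triples and indirect triples from them; the helper names `to_triples` and `to_indirect` and the starting address 14 are assumptions taken from the tables in this answer.

```python
# Quadruples for -(a+b)*(c+d)-(a+b+c): (operator, arg1, arg2, result).
QUADS = [
    ("+", "a", "b", "T1"),
    ("uminus", "T1", None, "T2"),
    ("+", "c", "d", "T3"),
    ("+", "T1", "c", "T4"),
    ("*", "T2", "T3", "T5"),
    ("-", "T5", "T4", "T6"),
]


def to_triples(quads, base=0):
    """Replace each temporary by the position of the triple that computes it."""
    position = {}                     # temporary name -> triple number
    triples = []
    for i, (op, arg1, arg2, result) in enumerate(quads):
        def ref(arg):
            return f"({position[arg]})" if arg in position else arg
        triples.append((op, ref(arg1), ref(arg2)))
        position[result] = base + i
    return triples


def to_indirect(quads, base=14):
    """Return (pointer list, triples); statement i points at triple base + i."""
    triples = to_triples(quads, base)
    pointers = [base + i for i in range(len(triples))]
    return pointers, triples
```

`to_triples(QUADS)` gives the six triples numbered (0) to (5), and `to_indirect(QUADS)` gives the same triples stored at (14) to (19) plus the list of pointers to them.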

Q7. What is an activation record, and what are its contents? Explain with a diagram.
Ans: Procedure calls and returns are usually managed by a run-time stack
called the control stack. Each live activation has an activation record
(sometimes called a frame) on the control stack, with the root of the
activation tree at the bottom, and the entire sequence of activation
records on the stack corresponding to the path in the activation tree to
the activation where control currently resides. The latter activation has
its record at the top of the stack. The contents of activation records
vary with the language being implemented. Here is a list of the kinds
of data that might appear in an activation record:

1. Temporary values, such as those arising from the evaluation of
expressions, in cases where those temporaries cannot be held in
registers.
2. Local data belonging to the procedure whose activation record this
is.
3. A saved machine status, with information about the state of the
machine just before the call to the procedure. This information
typically includes the return address (value of the program counter, to
which the called procedure must return) and the contents of registers
that were used by the calling procedure and that must be restored
when the return occurs.
4. An "access link" may be needed to locate data needed by the called
procedure but found elsewhere, e.g., in another activation record.
5. A control link, pointing to the activation record of the caller.
6. Space for the return value of the called function, if any. Again, not
all called procedures return a value, and if one does, we may prefer to
place that value in a register for efficiency.
7. The actual parameters used by the calling procedure. Commonly,
these values are not placed in the activation record but rather in
registers, when possible, for greater efficiency. However, we show a
space for them to be completely general.
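
The fields listed above can be pictured as a record type with a stack of such records. This is a simplified model rather than a real run-time layout: the class name `ActivationRecord`, the helper functions `call` and `ret`, and the use of dictionaries for saved status and local data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: dict                                  # item 7
    return_value: Any = None                             # item 6
    control_link: Optional["ActivationRecord"] = None    # item 5: the caller's record
    access_link: Optional["ActivationRecord"] = None     # item 4: non-local data
    saved_status: dict = field(default_factory=dict)     # item 3: return address, registers
    local_data: dict = field(default_factory=dict)       # item 2
    temporaries: list = field(default_factory=list)      # item 1


control_stack = []   # the root activation sits at the bottom, the current one on top


def call(procedure, params, return_address):
    """Push a record for a new activation; its control link is the caller."""
    caller = control_stack[-1] if control_stack else None
    record = ActivationRecord(
        procedure,
        dict(params),
        control_link=caller,
        saved_status={"return_address": return_address},
    )
    control_stack.append(record)
    return record


def ret(value=None):
    """Pop the current activation and store its return value in the record."""
    record = control_stack.pop()
    record.return_value = value
    return record
```

After `call("main", {}, None)` followed by `call("fact", {"n": 3}, 42)`, the record for `fact` is on top of the stack and its control link refers to the record for `main`.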

Q8. Explain Symbol Table Organization and its Data Structure.


Ans: In computer science, a symbol table is a data structure used by a
language translator such as a compiler or interpreter, where
each identifier in a program's source code is associated with
information relating to its declaration or appearance in the source,
such as its type, scope level and sometimes its location. An object
file will contain a symbol table of the identifiers it contains that are
externally visible. During the linking of different object files,
a linker will use these symbol tables to resolve any unresolved
references.
A symbol table may only exist during the translation process, or it may
be embedded in the output of that process for later exploitation, for
example, during an interactive debugging session, or as a resource for
formatting a diagnostic report during or after execution of a program.
Various data structures can be used to implement a symbol table.
List
The simplest and easiest data structure to implement for a symbol table
is a linear list of records. We use a single array, or a collection of several arrays,
to store names and their associated information. New names
are added to the end of the array. The end of the array is always marked by a pointer known as
‘space’.

Self Organizing List
To reduce the search time, we can add an additional field ‘linker’ to each
record or to each array index.
When a name is inserted, it is placed at ‘space’ and the linkers
to the other existing names are updated.

In the accompanying figure (not reproduced here), (a) represents the simple list and (b) represents the self-organizing
list, in which Id1 is related to Id2
and Id3 is related to Id1.

Hash table:
A hash table, or hash map, is a data structure that associates keys with
values. ‘Open hashing’ is the key technique
applied to the hash table. Open hashing has the property that there is no limit on the
number of entries that can be
made in the table. The hash table consists of an array ‘HASH’ and several buckets
attached to the array HASH according to the
hash function.
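
A minimal sketch of a symbol table built on open hashing follows. The class name `SymbolTable`, the default table size of 211, and the simple multiplicative hash function are assumptions chosen for the example; any hash function that spreads names over the buckets would serve.

```python
class SymbolTable:
    """Open hashing: an array of buckets, each bucket a list of entries."""

    def __init__(self, size=211):
        self.buckets = [[] for _ in range(size)]

    def _hash(self, name):
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def insert(self, name, **info):
        """Add a name, or update its information if it is already present."""
        bucket = self.buckets[self._hash(name)]
        for entry in bucket:
            if entry["name"] == name:
                entry.update(info)
                return entry
        entry = {"name": name, **info}
        bucket.append(entry)       # buckets grow, so the table never fills up
        return entry

    def lookup(self, name):
        """Return the entry for a name, or None if it was never inserted."""
        for entry in self.buckets[self._hash(name)]:
            if entry["name"] == name:
                return entry
        return None
```

Because colliding names are chained inside a bucket, the number of entries is not bounded by the size of the array; only the search time within a bucket grows.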

Search Tree:
Another approach to organizing the symbol table is to add two link fields,
left and right child, to each record and use these
fields to form a binary search tree. Every name is inserted as a descendant of the root node so that
the binary search tree property always holds,
i.e. Namei < Name and Name < Namej. These two conditions mean that
every name Namei smaller than Name must be
in the left subtree of Name, and every larger name Namej in its right subtree. Inserting a name
always follows the binary search tree
insert algorithm.
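
The binary search tree organization can be sketched as follows. The class name `Node` and the functions `insert`, `lookup` and `in_order` are illustrative; names are compared with ordinary string ordering, which is an assumption of this example.

```python
class Node:
    """A symbol-table record extended with left and right link fields."""

    def __init__(self, name, info=None):
        self.name, self.info = name, info
        self.left = self.right = None


def insert(root, name, info=None):
    """Standard binary search tree insertion; returns the (possibly new) root."""
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    elif name > root.name:
        root.right = insert(root.right, name, info)
    else:
        root.info = info           # name already present: update its information
    return root


def lookup(root, name):
    """Follow left or right links until the name is found or a link is empty."""
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root


def in_order(root):
    """List all names in sorted order (left subtree, node, right subtree)."""
    if root is None:
        return []
    return in_order(root.left) + [root.name] + in_order(root.right)
```

Inserting the names in the order they are met in the source keeps lookup cheap on average, although a sorted input sequence would degrade the tree into a linear list.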

Q9. Describe the Storage Allocation Strategies of Symbol Table.


The data structure for a particular implementation of a symbol table is
sketched in figure 9.1. A separate array ‘arr_lexemes’ holds the character string
forming an identifier. The string is terminated by an
end-of-string character, denoted by EOS, that may not appear in
identifiers. Each entry in the symbol-table array ‘arr_symbol_table’ is a
record consisting of two fields: “lexeme_pointer”, pointing to the
beginning of a lexeme, and “token”. Additional fields can hold attribute
values. In figure 9.1, the 0th entry is left empty,
because lookup returns 0 to indicate that there is no entry for a string.
The 1st through 7th entries are for ‘a’, ‘plus’, ‘b’,
‘and’, ‘c’, ‘minus’, and ‘d’, where the 2nd, 4th and 6th entries are
reserved keywords.
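
The layout described above can be modelled directly. The array names `arr_lexemes` and `arr_symbol_table` and the field names come from the text; the functions `insert`, `lexeme_at` and `lookup`, the use of the NUL character for EOS, and the token names "id" and "keyword" are assumptions made for this sketch.

```python
EOS = "\0"                      # end-of-string marker; may not appear in identifiers
arr_lexemes = []                # one flat array of characters
arr_symbol_table = [None]       # entry 0 is left empty on purpose


def insert(lexeme, token):
    """Append the lexeme to arr_lexemes and add a record pointing at it."""
    pointer = len(arr_lexemes)
    arr_lexemes.extend(lexeme + EOS)
    arr_symbol_table.append({"lexeme_pointer": pointer, "token": token})
    return len(arr_symbol_table) - 1


def lexeme_at(pointer):
    """Read characters from the pointer up to, but not including, EOS."""
    end = arr_lexemes.index(EOS, pointer)
    return "".join(arr_lexemes[pointer:end])


def lookup(lexeme):
    """Return the index of the entry for lexeme, or 0 if there is none."""
    for index in range(1, len(arr_symbol_table)):
        if lexeme_at(arr_symbol_table[index]["lexeme_pointer"]) == lexeme:
            return index
    return 0
```

Storing all lexemes in one character array avoids reserving a fixed amount of space per identifier; each record only needs a pointer to where its lexeme starts.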

Q10. What is a DAG? How are basic blocks represented through a DAG? What are
its advantages and applications?
Ans: In mathematics and computer science, a directed acyclic
graph (commonly abbreviated to DAG), is a directed graph with
no directed cycles. That is, it is formed by a collection
of vertices and directed edges, each edge connecting one vertex to

another, such that there is no way to start at some vertex v and follow
a sequence of edges that eventually loops back to v again.
The DAG Representation of Basic Blocks
Directed acyclic graphs (DAGs) give a picture of how the value
computed by each statement in the basic block is used in the subsequent
statements of the block.

Definition: A DAG for a basic block is a directed acyclic graph with the
following labels on its nodes:

- Leaves are labeled with either variable names or constants.
  - They are unique identifiers.
  - From the operators we determine whether the l-value or the r-value is needed.
  - They represent the initial values of names, written with subscript 0.
- Interior nodes are labeled by an operator symbol.
- Nodes are also (optionally) given a sequence of identifiers as labels.
  - An interior node represents a computed value.
  - The identifiers in the sequence are those that have that value.

Example of DAG Representation

t1:= 4*i
t2:= a[t1]
t3:= 4*i
t4:= b[t3]
t5:= t2 * t4
t6:= prod + t5
prod:= t6
t7:= i + 1
i:= t7
if i <= 20 goto 1
Three-address code for the basic block

DAG for the basic block (figure not reproduced). Each interior node is listed
with its attached identifiers, followed by its children:

+  (t6, prod)  : prod0, * (t5)
*  (t5)        : [] (t2), [] (t4)
[] (t2)        : a, * (t1, t3)
[] (t4)        : b, * (t1, t3)
*  (t1, t3)    : 4, i0
+  (t7, i)     : i0, 1
<=             : + (t7, i), 20   (branch target: statement 1)
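
The construction of such a DAG can be sketched with a small value-numbering routine. The statement encoding `(dest, op, arg1, arg2)`, the pseudo-operator "copy" for plain assignments, and the function name `build_dag` are assumptions made for this example; the sketch also ignores the complications that array assignments and pointers introduce.

```python
def build_dag(block):
    """Build a DAG for a basic block given as (dest, op, arg1, arg2) tuples.

    Returns (nodes, labels): nodes[k] is ("leaf", name) or (op, left, right)
    with child node numbers, and labels[k] lists the identifiers attached to k.
    """
    nodes, labels = [], []
    index = {}        # node key -> node number; detects common subexpressions
    current = {}      # identifier -> node that currently holds its value

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
            labels.append([])
        return index[key]

    def operand(name):
        if name not in current:
            current[name] = node_for(("leaf", name))   # initial value, e.g. i0
        return current[name]

    for dest, op, arg1, arg2 in block:
        if op == "copy":
            n = operand(arg1)
        else:
            n = node_for((op, operand(arg1), operand(arg2)))
        old = current.get(dest)
        if old is not None and dest in labels[old]:
            labels[old].remove(dest)                   # dest no longer names the old value
        labels[n].append(dest)
        current[dest] = n
    return nodes, labels


# The basic block from the example, without the final conditional jump.
BLOCK = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"),
    ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"),
    ("t6", "+", "prod", "t5"),
    ("prod", "copy", "t6", None),
    ("t7", "+", "i", "1"),
    ("i", "copy", "t7", None),
]
```

Running `build_dag(BLOCK)` creates a single node for 4*i carrying both t1 and t3, which is how the DAG exposes the common subexpression; prod shares a node with t6, and i shares a node with t7.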

While reverse engineering an executable, many tools refer to the symbol table
to check what addresses have been assigned to global variables and known
functions. If the symbol table has been stripped or cleaned out before
the program is converted into an executable, tools will find it hard to determine addresses
or to understand anything about the program.

Compiler Design (CS-701)
