Vous êtes sur la page 1sur 96

UNIT-I

INTRODUCTION TO SYSTEM
PROGRAMMING
By
A.C.Bhosale

1 SBPCOE ,TE E&TC SPOS 1/1/2018


Syllabus

 Introduction: Components of System Software,


Language Processing Activities, Fundamentals
of Language Processing.
 Assemblers: Elements of Assembly language
programming. Simple assembler scheme,
Structure of an assembler, Design of single and
two pass assembler.
 Macro Processors: Macro Definition and call,
Macro expansion, Nested Macro Calls,
Advanced Macro Facilities, Design of a two-
pass macro-processor.

2 SBPCOE ,TE E&TC SPOS 1/1/2018


Outline
 Components of System Software
 Language processor
 Language Processing Activities
 Fundamentals of Language Processing
 Assemblers
 Elements of Assembly language programming
 Simple assembler scheme
 Structure of an assembler
 Design of single and two pass assembler
 Macro Processors
 Macro Definition and call
 Macro expansion
 Nested Macro Calls
 Advanced Macro Facilities
 Design of a two-pass macro-processor
3 SBPCOE ,TE E&TC SPOS 1/1/2018
Components of System Software
 Translator
1. Assembler
2. Compiler
3. Interpreter
 System Manager
 Operating System
 Other Utilities
1. Linkers
2. Loader
3. DBMS
4. Debugger
5. Editor

4 SBPCOE ,TE E&TC SPOS 1/1/2018


Introducton of Language
Processor
 Language Processing activities arise due to the
differences between the manner in which a software
designer describes the ideas concerning the behavior
of a software and the manner in whic h these ideas
are implemented in a computer system.
 The designer expresses the ideas in terms related to
the application domain of the software. To implement
these ideas, their description has to be interpreted
in ter ms related to the execution domain.

5 SBPCOE ,TE E&TC SPOS 1/1/2018


Semantic Gap

Semantic Gap

Application Execution
Domain Domain

 Semantic Gap has many consequences


 Large development time
 Large development effort
 Poor quality of software
6 SBPCOE ,TE E&TC SPOS 1/1/2018
Specification and Execution Gaps

Specification Gap Execution Gap

Application PL Execution
Domain Domain Domain

 The software engineering steps aimed at the use of a


PL can be grouped into
 Specification, design and coding steps
 PL implementation steps
7 SBPCOE ,TE E&TC SPOS 1/1/2018
Specification and Execution Gaps
 Specification Gap
 It is the semantic gap between two specifications of the
same task.
 Execution Gap
 It is the gap between the semantics of programs (that
perform the same task) written in different
programming languages.

8 SBPCOE ,TE E&TC SPOS 1/1/2018


Language Processors
“A language processor is a software which bridges a
specification or execution gap”.
 The program form input to a language processor as
the source program and to its output as the target
program.
 The languages in whic h these programs are written
are called source language and target language,
respectively.

9 SBPCOE ,TE E&TC SPOS 1/1/2018


Types of Language Processors
 A language translator bridges an execution gap to
the mac hine language (or assembly language) of a
computer system. E.g. Assembler, Compiler.
 A detranslator bridges the same execution gap as
the language translator, but in the reverse direction.
 A preprocessor is a language processor which
bridges an execution gap but is not a language
translator.
 A language migrator bridges the specification gap
between two PLs.
10 SBPCOE ,TE E&TC SPOS 1/1/2018
Language Processors - Examples
Errors

C++ Program C++ C Program


preprocessor

Errors

C++ Program C++ Machine Language


translator Program

11 SBPCOE ,TE E&TC SPOS 1/1/2018


Interpreters
 An interpreter is a language processor whic h bridges an
execution gap without generating a mac hine language program.
 An interpreter is a language translator according to classification.

Interpreter Domain

Application PL Executio
Domain Domain n
Domain
12 SBPCOE ,TE E&TC SPOS 1/1/2018
Language Processing Activities
 Program Generation Activities
 Program Execution Activities

13 SBPCOE ,TE E&TC SPOS 1/1/2018


Program Generation
Errors

Program Program Generator Program in


Specification target PL

Specification Gap

Application Program Target PL Execution


Domain Generator Domain Domain
Domain
14 SBPCOE ,TE E&TC SPOS 1/1/2018
Program Execution
 Two popular models for program execution are
translation and interpretation.
 Program transla tion
Errors Data

m/c
Source Translator language Target
Program Program
program

15 1/1/2018
SBPCOE ,TE E&TC SPOS
 A program must be translated before it can be executed.
The translated program may be saved in a file. The saved program may be
executed repeatedly.

A program must be retranslated following modifications.

16 SBPCOE ,TE E&TC SPOS 1/1/2018


CPU Memory

PC Mac hine
PC Source
Language
Program
Program
+
Errors +
Data
Data

Interpretatio Program execution


n
17 SBPCOE ,TE E&TC SPOS 1/1/2018
Fundamentals of Language
Processing
Language Processing = Analysis of SP + Synthesis of TP
Collection of LP components engaged in analysis a source
program as the analysis phase and components engaged in
synthesizing a target program constitute the synthesis phase.

18 SBPCOE ,TE E&TC SPOS 1/1/2018


Analysis Phase
 The specification consists of three components:
 Lexical rules which govern the formation of valid lexical units in the
source
language
. Syntax rules which govern the formation of valid statements in the source
 language.
Semantic rules which associate meaning with valid statements of the language.

 Consider the following example:


percent_profit = (profit * 100) / cost_price;
Lexical units identifies =, * and / operators, 100 as constant, and the remaining
strings as identifiers.
Syntax analysis identifies the statement as an assignment statement with
percent_profit as the left hand side and (profit * 100) / cost_price as the
expression on the right hand side.
Semantic analysis determines the meaning of the statement to be the assignment
19 SBPCOE
of profit,TE E&TC
X 100 SPOS to percent_profit.
/ cost_price 1/1/2018
Synthesis Phase
 The synthesis phase is concerned with the construction of target
language statements which have the same meaning as a source
statement.
 It performs two main activities:
 Creation of data structures in the target program (memory allocation)
 Generation of target code (code generation)
 Example
MOVER AREG,
PROFIT MULT AREG,
100
DIV AREG, COST_PRICE
MOVEM AREG,
PERCENT_PROF PERCENT_P 1 ROFIT
I … 1
T DW 1
20 SBPCOE
PROFIT,TE E&TC SPOS DW 1/1/2018
COST_PRICE DW
Phases and Passes of LP

Language Processor

Analysis Synthesis Target


Source
Program Phase Phase Program

Errors Errors

 Analysis of source statements can not be immediately followed


by synthesis of equivalent target statements due to following
reasons:
 Forward References
 Issues concerning memory requirements and organization of a LP
21 SBPCOE ,TE E&TC SPOS 1/1/2018
Lexical Analysis (Scanning)
 It identifies the lexical units in a source statements. It then
classifies the units into different lexical classes, e.g. id‟
s, constants, reser ved id‟ s, etc. and enters them into
different tables.
 It builds a descriptor, called token, for eac h lexical unit. A
token contains two fields – class code and number in
class.
 class code identifies the class to whic h a lexical unit
belongs. number in class is the entry number of the
lexical unit in the relevant table.
 We depict a token as Code # no, e.g. Id # 10

22 SBPCOE ,TE E&TC SPOS 1/1/2018


Lexical Analysis (Scanning) -
Example
Symbol Type Length Address

i : integer; 1 i int
2 a real
a, b : real; 3 b real
a := b + i; 4 i* real
5 temp real
Note that int i first needed to be converted into real, that is why 4thentry is
added into the table.
Addition of entry 3 and 4, gives entry 5 (temp), which is value b + (i *).

The statement a := b+i; is represented as the string of tokens

Id#2 Op#5 Id#3 Op#3 Id#1 Op#10


23 SBPCOE ,TE E&TC SPOS 1/1/2018
Syntax Analysis (Parsing)
 It processes the string of tokens built by lexical analysis
to
determine the statement class, e.g. assignment statement, if
statement etc.
 It then builds an IC which represents the structure of a
statement. The IC is passed to semantic analysis to
determine the meaning of the statement.

real :=

a b a +

a, b : real b i
24 SBPCOE ,TE E&TC SPOS 1/1/2018
a := b + i

25 SBPCOE ,TE E&TC SPOS 1/1/2018


Semantic Analysis
 It identifies the sequence of actions necessary to implement the
meaning of a source statement.
 It determines the meaning of a sub tree in the IC, it adds information
to a table or adds an action to the sequence of actions. The analysis ends
when the tree has been completely processed.

:= := :=

a, real + a, real + a, real temp, real

b, real i, int b, real i*, real

26 SBPCOE ,TE E&TC SPOS 1/1/2018


Analysis Phase (Front end)
Source
Program

Lexica
l Scanning
Errors Tokens
Symbol Table
Syntax Parsing
Errors
Constants Table
Trees Other tables

Semantic Semantic
Errors Analysis

IC

27 SBPCOE ,TE E&TC SPOS 1/1/2018


IR
Synthesis Phase(Backend)
 It performs memory allocation and code generation.

 Memory Allocation
 The memory requirement of an identifier is computed from its type,
length and dimensionality and memory is allocated to it.
 The address of the memory area is entered in the symbol table.

Symbol Type Length Address


1 i int 2000
2 a real 2001
3 b Real 2002

28 SBPCOE ,TE E&TC SPOS 1/1/2018


Synthesis Phase (Back-end)
 Code Generation
It uses knowledge of the target architecture, viz. knowledge of

instructions and addressing modes in the target computer, to select
the
 appropriate instructions.
 The synthesis phase may decide to hold the values of i*
and temp in
 machine registers and may generate the assembly code.
 a := b + i;
CONV_R AREG, I
ADD_R AREG, B
MOVEM AREG, A

29 SBPCOE ,TE E&TC SPOS 1/1/2018


Synthesis Phase (Back end)
IR

IC Memory
Allocation
Symbol Table
Constants Table
Other tables
Code
Generation

Target
30 SBPCOE ,TE E&TC SPOS
Progra 1/1/2018
m
Fundamentals of Language Specifica tion

 PL Grammars
 The lexical and syntactic features of a programming
language are specified by its grammar.
 A language L can be considered to be a collection of
valid sentences.
 Each sentence can be looked upon as a sequence of
words, and eac h word as a sequence of letters or
graphic symbols acceptable in L.
 A language specified in this manner is known as a
formal language.

31 SBPCOE ,TE E&TC SPOS 1/1/2018


Alphabet
 The alphabet of L, denoted by the Greek symbol ∑
is the collection of symbols in its c haracter set.
 We use lower case letters a, b, c, etc. to denote
symbols in ∑
 A symbol in the alphabet is known as a
terminal
symbol (T) of L.
 The alphabet can be represented using mathematical
notation of a set, e.g.
∑ = { a, b, …, z, 0, 1, …, 9 }
32 SBPCOE ,TE E&TC SPOS 1/1/2018
where {, “,”, } are called meta
symbols.

33 SBPCOE ,TE E&TC SPOS 1/1/2018


String
A string is a finite sequence of symbols.
We represent strings by Greek symbols α, β, γ, etc.

Thus α= axy is a string over ∑


The length of a string is the number of symbols in it.

Absence of any symbol is also a string, null string ε.

Example

α= ab, β=axy

34 SBPCOE ,TE E&TC SPOS 1/1/2018


αβ = α.β = abaxy
[concatenation]

35 SBPCOE ,TE E&TC SPOS 1/1/2018


Nonterminal symbols
 A Nonterminal symbol (NT) is the name of a syntax
category of a language, e.g. noun, verb, etc.
 An NT is written as a single capital letter, or as a
name enclosed between <…>, e.g. A or <Noun>.
 It is a set of symbols not in ∑ that represents
intermediate states in the generation process.

36 SBPCOE ,TE E&TC SPOS 1/1/2018


Productions
 A production, also called a rewriting rule, is a rule
of the grammar.
 It has the form
A nonterminal symbol ::= String of Ts and NTs

L.H.S. R.H.S.
e.g. <article> ::= a | an | the
<Noun> ::= boy | apple
<Noun Phrase> ::= <article> <Noun>
37 SBPCOE ,TE E&TC SPOS 1/1/2018
Derivation, Reduction and Parse
Trees
 A grammar G is used for two purposes, to generate
valid strings of LGand to „recognize‟ valid strings of
LG.
 The derivation operation helps to generate valid
strings while the reduction operation helps to
recognize valid strings.
 A parse tree is used to depict the syntactic structure
of a valid string as it emerges during a sequence of
derivations or reductions.

38 SBPCOE ,TE E&TC SPOS 1/1/2018


Derivation
 Let production P1of grammar G be of the form
P1: A::= α
and let β be a string suc h that β = γAθ, then replacement of
A by α in string β constitutes a derivation according to
production P1.
 Example
<Sentence> ::= <Noun Phrase><Verb Phrase>
<Noun Phrase> ::= <Article> <Noun>
<Verb Phrase> ::= <Verb><Noun Phrase>
<Article> ::= a | an | the
<Noun> ::= boy | apple
<Verb> ::= ate
39 SBPCOE ,TE E&TC SPOS 1/1/2018
Derivation
 The following strings are sentential forms of LG.
<Noun Phrase> <Verb Phrase>
the boy <Verb Phrase> sentential
<Noun Phrase> ate <Noun Phrase> forms

the boy ate <Noun Phrase>

the boy ate an apple sentence

40 SBPCOE ,TE E&TC SPOS 1/1/2018


Reduction
Let production P1of grammar G be of the form
P1: A::= α
and let σ be a string such that σ = γAθ, then replacement of α by A
in string σ constitutes a reduction according to production P1.
Step String
0 the boy ate an apple
1 <Article> boy ate an apple
2 <Ar ticle> <Noun> ate an apple
3 <Ar ticle> <Noun> <Verb> an apple
4 <Ar ticle> <Noun> <Verb> <Ar ticle> apple
5 <Ar ticle> <Noun> <Verb> <Ar ticle> <Noun>
6 <Noun Phrase> <Verb> <Ar ticle> <Noun>
7 <Noun Phrase> <Verb> <Noun Phrase>
8 <Noun Phrase> <Verb Phrase>
41 SBPCOE ,TE E&TC SPOS 1/1/2018
9 <Sentence>
Parse
Trees
 A sequence of derivations or reductions reveals the syntactic
structure of a string with respect to G, in the form of a parse tree.
<Sentence> 9

<Noun Phrase> 6 <Verb Phrase> 8

<Ar ticle> 1 <Noun> 2 <Verb> 3 <Noun Phrase> 7

<Ar ticle> 4<Noun> 5

42 the
SBPCOE ,TE E&TC boy
SPOS ate an apple
1/1/2018
Classification of Grammars
 Type-0 grammar (Phrase Structure Grammar)
α ::= β, where both can be strings of Ts and NTs.
But it is not relevant to specification of Prog. Lang.
 Type-1 grammar (Context Sensitive Grammar)
α A β ::= α π β,
But it is not relevant to specification of Prog. Lang.
 Type-2 grammar (Context Free Grammar)
A ::= π, which can be applied independent of its context.
CFGs are ideally suited for PL specifications.
 Type-3 grammar (Linear or Regular Grammar)
A ::= t B | t OR A ::= B t | t
Nesting of constructs or matching of parentheses cannot be
specified using such productions.
43 SBPCOE ,TE E&TC SPOS 1/1/2018
Ambiguity in Grammatic Specification

 It implies the possibility of different interpretation of a


source string.
 Existence of ambiguity at the level of the syntactic structure
of a string would mean that more than one parse tree can
be built for the string. So string can have more than one
meaning associated with it.

Ambiguous Grammar a+b*c a+b*c


+ *
E  id | E + E | E * E + c
a *
Id  a | b | c
b c a b
Assume source
string is a + b * c
44 SBPCOE ,TE E&TC SPOS 1/1/2018
Eliminating Ambiguity – An
Example
Unambiguous Grammar a+b*c a+b*c
E  E +T|T id + id * id + id *
T  T * F| F id P + P * id P + P *
FF^P|P P P F+P* P F+F*
 id P T+F* P T+T*
id  a | b | c F E+T* F E+T
T E
E * T (?? Ambiguous) (Unambiguous
)
E E
E
T
E T T T
T
F F F F FT F
P P P P P P

45 SBPCOE ,TE E&TC SPOS id + id * id id + 1/1/2018


id * id
GTUExamples
Listout the unambiguous production rules
(grammar)
for arithmetic expression containing +, –, *, / and ^
(exponent).
E E+T|E–T|T
T T*F|T/F|F
F F^P|P
P  (E) | <id>

46 <id> – <id> * <id> ^ <id> + <id>1/1/2018


Derive stringSPOS
SBPCOE ,TE E&TC
E E+T E Parse Tree
 E–T+T +
E T
 T – T+T –
 F–T+T E T F
 P– T+T *
 <id> – T + T T T F
^
 <id> – T * F + T
F F F P P
 <id> – F * F + T
 <id> – P * F + T P P P
 <id> – <id> * F + T
id – id * id ^ id + id
 <id> – <id> * F ^ P + T
 <id> – <id> * P ^ P + T Abstract Syntax Tree
 <id> – <id> * <id> ^ P + T +
 <id> – <id> * <id> ^ <id> + T –
id
 <id> – <id> * <id> ^ <id> + F id *
 <id> – <id> * <id> ^ <id> + P
id ^
 <id> – <id> * <id> ^ <id> +
SBPCOE ,TE E&TC SPOS
47 1/1/2018
<id> id id
Another Example
 Consider the following grammar:
S aSbS|bSaS|ε

Derive the string abab. Draw corresponding parse


tree. Are these rules ambiguous ? Justify.

48 SBPCOE ,TE E&TC SPOS 1/1/2018


Assemblers

49 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler: Definition
• Translating source code written in assembly
language to object code.

50 SBPCOE ,TE E&TC SPOS 1/1/2018


Language Levels
High Level Language

Assembler Language

Machine Language

Micro - Firmware
programming

Hardware
51 SBPCOE ,TE E&TC SPOS 1/1/2018
Machine code

 Machine code:

 Set of commands directly executable via CPU


 Commands in numeric code
 Lowest semantic level

52 SBPCOE ,TE E&TC SPOS 1/1/2018


Machine code language
 Structure:
 Operation code OpCode OpAddress
 Defining executable operation
 Operand address
 Specification of operands
 Constants/register addresses/storage addresses

53 SBPCOE ,TE E&TC SPOS 1/1/2018


Elements of the Assembly
Language Programming
 An Assembly language is a
 machine dependent,
 low level Programming language specific to a certain
computer system.
Three features when compared with machine language
are
1. Mnemonic Operation Codes
2. Symbolic operands
3. Data declarations

54 SBPCOE ,TE E&TC SPOS 1/1/2018


Elements of the Assembly
Language Programming
Mnemonic operation codes: eliminates the need to memorize
numeric operation codes.

Symbolic operands: Symbolic names can be associated with


data or instructions. Symbolic names can be used as
operands in assembly statements (need not know details of
memory bindings).

Data declarations: Data can be declared in a variety of


notations, including the decimal notation (avoids conversion
of constants into their internal representation).
55 SBPCOE ,TE E&TC SPOS 1/1/2018
Assembly language-structure
<Label> <Mnemomic> <Operand> Comments
 Label
 symbolic labeling of an assembler address
(command address at Machine level)
 Mnemomic
 Symbolic description of an operation
 Operands
 Contains of variables or addresse if necessary
 Comments

56 SBPCOE ,TE E&TC SPOS 1/1/2018


Statement format
An Assembly language statement has following format:
[Label] <opcode> <operand spec>[,<operand spec>..]
If a label is specified in a statement, it is associated
as a symbolic name with the memory word
generated for the statement.

<operand spec> has the following syntax:

<symbolic name> [+<displacement>] [(<index register>)]

Eg. AREA, AREA+5, AREA(4), AREA+5(4)


57 SBPCOE ,TE E&TC SPOS 1/1/2018
Mnemonic Operation Codes
 Each statement has two operands, first operand is always a
register and second operand refers to a memory word using
a symbolic name and optional displacement.

58 SBPCOE ,TE E&TC SPOS 1/1/2018


Operation Codes
 MOVE instructions move a value between a memory word and
a register
 MOVER – First operand is target and second operand is source
 MOVEM – first operand is source, second is target

 All arithmetic is performed in a register (replaces the contents


of a register) and sets condition code.

 A Comparision instruction sets condition code analogous to


arithmetics, i.e. without affecting values of operands.

 condition code can be tested by a Branch on Condition (BC)


instruction and the format is:
BC <condition code spec> , <memory address>
59 SBPCOE ,TE E&TC SPOS 1/1/2018
Machine Instruction Format

 sign is not a part of the instruction


 Opcode: 2 digits, Register Operand: 1 digit, Memory
Operand: 3 digits
 Condition code specified in a BC statement is encoded
into the first operand using the codes 1- 6 for
specifications LT, LE, EQ, GT, GE and ANY respectively
 In a Machine Language Program, all addresses and
constants are shown in decimal as shown in the next
60 slide
SBPCOE ,TE E&TC SPOS 1/1/2018
Example: ALP and its equivalent Machine
Language Program

61 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembly Language Statements
 An assembly program contains three kinds of
statements:
1) Imperative Statements
2) Declaration Statements
3) Assembler Directives

Imperative Statements: They indicate an action to be


performed during the execution of an assembled program.
Each imperative statement is translated into one machine
instruction.

62 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembly Language Statements
 Declaration Statements: syntax is as follows:
[Label] DS <constant>
[Label] DC '<value>'
 The DS (declare storage) statement reserves memory and associates names with
them.
 Ex:
A DS 1 ; reserves a memory area of 1 word, associating the name A to it
G DS 200 ; reserves a block of 200 words and the name G is associated with the
first word of the block (G+6 etc. to access the other words)
 The DC (declare constant) statement constructs memory words containing
constants.
 Ex:
ONE DC '1’ ; associates name one with a memory word containing value 1

63 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembly Language Statements

Use of Constants
 The DC statement does not really implement constants
 it just initializes memory words to given values.
 The values are not protected by the assembler and can
be changed by moving a new value into the memory
word.
 In the above example, the value of ONE can be changed
by executing an instruction
MOVEM BREG, ONE

64 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembly Language Statements

Use of Constants
 An Assembly Program can use constants just like HLL, in
two ways – as immediate operands, and as literals.

• 1) Immediate operands can be used in an assembly


statement only if the architecture of the target machine
includes the necessary features.
 Ex: ADD AREG,5

 This is translated into an instruction from two operands – AREG and


the value '5' as an immediate operand

65 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembly Language Statements

Use of Constants
 2) A literal is an operand with the syntax = '<value>'.
• It differs from a constant because its location cannot be
specified in the assembly program.
• Its value does not change during the execution of the
program.
• It differs from an immediate operand because no
architectural provision is needed to support its use.

ADD AREG, =‘5’  ADD AREG, FIVE


FIVE DC ‘5’
Use of literals vs. Use of DC
66 SBPCOE ,TE E&TC SPOS 1/1/2018
Assembly Language Statements
Assembler Directive
 Assembler directives instruct the assembler to perform
certain actions during the assembly of a program.

 Some assembler directives are described in the following:


1) START <constant>
 This directive indicates that the first word of the target
program generated by the assembler should be placed in
the memory word having address <constant>.

2) END [<operand spec>]


 This directive indicates the end of the of the source
program. The optional <operand spec> indicates the
address of the instruction where the execution of the
program should begin.
67 SBPCOE ,TE E&TC SPOS 1/1/2018
Advantages of Assembly Language

• The primary advantages of assembly language


programming over machine language programming
are due to the use of symbolic operand specifications.
(in comparison to machine language program)

• Assembly language programming holds an edge over


HLL programming in situations where it is desirable to
use architectural features of a computer.
(in comparison to high level language program)

68 SBPCOE ,TE E&TC SPOS 1/1/2018


Fundamentals of LP
 Language processing = analysis of source
program + synthesis of target program
 Analysis of source program is specification of
the source program
 Lexical rules: formation of valid lexical
units(tokens) in the source language
 Syntax rules : formation of valid statements in
the source language
 Semantic rules: associate meaning with valid
statements of the language

69 SBPCOE ,TE E&TC SPOS 1/1/2018


Fundamentals of LP
 Synthesis of target program is construction of
target language statements
 Memory allocation : generation of data structures
in the target program
 Code generation

70 SBPCOE ,TE E&TC SPOS 1/1/2018


A simple Assembly Scheme
 There are two phases in specifying an
assembler:
1. Analysis Phase
2. Synthesis Phase(the fundamental
information requirements will arise in this
phase)

71 SBPCOE ,TE E&TC SPOS 1/1/2018


A simple Assembly Scheme
Design Specification of an assembler
There are four steps involved to design the
specification of an assembler:
 Identify information necessary to perform a task.
 Design a suitable data structure to record info.
 Determine processing necessary to obtain and
maintain the info.
 Determine processing necessary to perform the
task

72 SBPCOE ,TE E&TC SPOS 1/1/2018


Synthesis Phase: Example
Consider the following statement:
MOVER BREG, ONE
The following info is needed to synthesize machine
instruction for this stmt:
1. Address of the memory word with which name ONE is
associated [depends on the source program, hence
made available by the Analysis phase].

2. Machine operation code corresponding to MOVER


[does not depend on the source program but depends
on the assembly language, hence synthesis phase can
determine this information for itself]
Note: Based on above discussion, the two data structures
required during the synthesis phase are described
next

73 SBPCOE ,TE E&TC SPOS 1/1/2018


Data structures in synthesis phase
Symbol Table --built by the analysis phase
 The two primary fields are name and address of the symbol
used to specify a value.
Mnemonics Table --already present
- The two primary fields are mnemonic and opcode,
along with length.

Synthesis phase uses these tables to obtain


 The machine address with which a name is associated.
 The machine op code corresponding to a mnemonic.

 The tables have to be searched with the


 Symbol name and the mnemonic as keys

74 SBPCOE ,TE E&TC SPOS 1/1/2018


Analysis Phase
 Primary function of the Analysis phase is to
build the symbol table.
 It must determine the addresses with which the
symbolic names used in a program are associated
 It is possible to determine some addresses directly like
the address of first instruction in the program
(ie.,start)
 Other addresses must be inferred
 To determine the addresses of the symbolic names we
need to fix the addresses of all program elements
preceding it through Memory Allocation.
 To implement memory allocation a data structure
called location counter is introduced.

75 SBPCOE ,TE E&TC SPOS 1/1/2018


Analysis Phase – Implementing memory
allocation
 LC(location counter) :
 is always made to contain the address of the next memory
word in the target program.
 It is initialized to the constant specified at the START
statement.
 When a LABEL is encountered,
 it enters the LABEL and the contents of LC in a new entry
of the symbol table.
LABEL – e.g. N, AGAIN, SUM etc
 It then finds the number of memory words required by the
assembly statement and updates the LC contents
 To update the contents of the LC, analysis phase needs to
know lengths of the different instructions
 This information is available in the Mnemonics table and is
extended with a field called length
 We refer the processing involved in maintaining the LC as
76
LCSBPCOE
Processing
,TE E&TC SPOS 1/1/2018
Example
START 100
MOVER BREG, N LC = 100 (1 byte)
MULT BREG, N LC = 101 (1 byte)
STOP LC = 102 (1 byte)
N DS 5 LC = 103

Symbol Address

N 103

77 SBPCOE ,TE E&TC SPOS 1/1/2018


 Since there the instructions take different amount
of memory, it is also stored in the mnemonic table
in the “length” field

Mnemonic Opcode Length


MOVER 04 1
MULT 03 1

78 SBPCOE ,TE E&TC SPOS 1/1/2018


Data structures of an assembler
During analysis and
Synthesis phases Mnemon Opcod lengt
ic e h
ADD 01 1
SUB 02 1
Mnemonic Table

Source Analysis Synthesis Target


---------------------------------> Phase Program
Program Phase

Symb Addre
ol ss  Data Access
-- > Control
N 104
Access
AGAI 113
Symbol Table
N

79 SBPCOE ,TE E&TC SPOS 1/1/2018


Data structures
 Mnemonics table is a fixed table which is
merely accessed by the analysis and synthesis
phases
 Symbol table is constructed during analysis
and used during synthesis

80 SBPCOE ,TE E&TC SPOS 1/1/2018


Tasks Performed : Analysis Phase
 Isolate the labels, mnemonic, opcode and
operand fields of a statement.

 If a label is present, enter (symbol, <LC>) into


the symbol table.

 Check validity of the mnemonic opcode using


mnemonics table.

 Update value of LC.

81 SBPCOE ,TE E&TC SPOS 1/1/2018


Tasks Performed : Synthesis Phase
 Obtain machine opcode corresponding to the
mnemonic from the mnemonic table.

 obtain address of the memory operand from symbol


table.

 Synthesize a machine instruction or machine form


of a constant, depending on the instruction.

82 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler’s functions
 Convert mnemonic operation codes to their
machine language equivalents
 Convert symbolic operands to their equivalent
machine addresses
 Build the machine instructions in the proper
format
 Convert the data constants to internal machine
representations
 Write the object program and the assembly
listing

83 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler:Design
• The design of assembler can be of:
– Scanning (tokenizing)
– Parsing (validating the instructions)
– Creating the symbol table
– Resolving the forward references
– Converting into the machine language

84 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler Design
• Pass of a language processor – one complete
scan of the source program
• Assembler Design can be done in:
– Single pass
– Two pass
• Single Pass Assembler:
– Does everything in single pass
– Cannot resolve the forward referencing
• Two pass assembler:
– Does the work in two pass
85 SBPCOE ,TE E&TC SPOS 1/1/2018
– Resolves the forward references
Difficulties: Forward Reference
 Forward reference: reference to a label that is
defined later in the program.

Loc Label Operator Operand

1000 FIRST STL RETADR

1003 CLOOP JSUB RDREC


… … … … …
1012 J CLOOP
… … … … …
1033 RETADR RESW 1

86 SBPCOE ,TE E&TC SPOS 1/1/2018


Backpatching

 The problem of forward references is handled using


a process called backpatching
 Initially, the operand field of an instruction
containing a forward reference is left blank
 Ex: MOVER BREG, ONE can be only partially
synthesized since ONE is a forward reference
 The instruction opcode and address of BREG will be
assembled to reside in location 101
 To insert the second operand’s address later, an entry
is added as Table of Incomplete Instructions (TII)
 The entry TII is a pair (<instruction address>,
<symbol>) which is (101, ONE) here

87 SBPCOE ,TE E&TC SPOS 1/1/2018


Backpatching

 The problem of forward references is handled using a


process called backpatching
 When END statement is processed, the symbol table would
contain the addresses of all symbols defined in the source
program
 So TII would contain information of all forward references
 Now each entry in TII is processed to complete the
instruction
 Ex: the entry (101, ONE) would be processed by obtaining
the address of ONE from symbol table and inserting it in
the operand field of the instruction with assembled address
101.
 Alternatively, when definition of some symbol L is
encountered, all forward references to L can be processed

88 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler Design

• Symbol Table:
– This is created during pass 1
– All the labels of the instructions are symbols
– Table has entry for symbol name, address value.
• Forward reference:
– Symbols that are defined in the later part of the
program are called forward referencing.
89
– There
SBPCOE will SPOS
,TE E&TC not be any address value for such symbols
1/1/2018
in the symbol table in pass 1.
Assembler Design

• Assembler directives are pseudo instructions.


– They provide instructions to the assemblers itself.
– They are not translated into machine operation
codes.

90 SBPCOE ,TE E&TC SPOS 1/1/2018


Assembler Design

• First pass:
– Scan the code by separating the symbol, mnemonic
op code and operand fields
– Build the symbol table
– Perform LC processing
– Construct intermediate representation
• Second Pass:
– Solves forward references
– Converts the code to the machine code
91 SBPCOE ,TE E&TC SPOS 1/1/2018
Two Pass Assembler
 Read from input line
 LABEL, OPCODE, OPERAND

Source
program

Intermediate Object
Pass 1 Pass 2
file codes

OPTAB SYMTAB SYMTAB

92 SBPCOE ,TE E&TC SPOS 1/1/2018


Data Structures in Pass I
 OPTAB – a table of mnemonic op codes
 Contains mnemonic op code, class and mnemonic info
 Class field indicates whether the op code corresponds to
 an imperative statement (IS),
 a declaration statement (DL) or
 an assembler Directive (AD)
 For IS, mnemonic info field contains the pair ( machine
opcode, instruction length)
 Else, it contains the id of the routine to handle the
declaration or a directive statement
 The routine processes the operand field of the statement to
determine the amount of memory required and updates LC
and the SYMTAB entry of the symbol defined

93 SBPCOE ,TE E&TC SPOS 1/1/2018


Data Structures in Pass I
 SYMTAB - Symbol Table
 Contains address and length
 LOCCTR - Location Counter
 LITTAB – a table of literals used in the program
 Contains literal and address
 Literals are allocated addresses starting with the
current value in LC and LC is incremented,
appropriately

94 SBPCOE ,TE E&TC SPOS 1/1/2018


OPTAB (operation code table)
 Content
 Menmonic opcode, class and mnemonic info
 Characteristic
 static table
 Implementation
 array or hash table, easy for search

95 SBPCOE ,TE E&TC SPOS 1/1/2018


SYMTAB (symbol table)
 Content
 label name, value, flag, (type, COPY 1000
length) etc. FIRST 1000
CLOOP 1003
 Characteristic ENDFIL 1015
 dynamic table (insert, delete, EOF 1024
search) THREE 102D
ZERO 1030
 Implementation RETADR 1033
 hash table, non-random keys, LENGTH 1036
hashing function BUFFER 1039
RDREC 2039

96 SBPCOE ,TE E&TC SPOS 1/1/2018

Vous aimerez peut-être aussi