
System Programming (BTCS-405A)

Session: Jan-May, 2018

Assignment-1
_________________________________________________________________

CONTENTS:
1) Brief Study on System Programming Languages.

2) The Compiler: A Literature Review.

3) Future trends on Assembler.

Submitted By:
Somya Sharma

160280710

CSE B2

Submitted To:
Mr. Malkeet Singh

Associate Professor, CSE Deptt.


_________________________________________________________________

SHAHEED BHAGAT SINGH STATE TECHNICAL CAMPUS


Moga Road (NH-95), Ferozepur-152004
March-2018
_________________________________________________________________

1. Brief Study of System Programming Languages

Introduction to systems programming
1. Systems Programming: involves developing those programs that interface the
computer system (the hardware) with the programmer and the user. These
programs include compilers, interpreters, assemblers, I/O routines, schedulers,
etc.

2. Systems programs are different from application programs in many ways: (a) systems
programs must deal effectively with unpredictable events or "exceptions" (such as I/O
errors); (b) systems programs must co-ordinate the activities of various asynchronously
executing programs. Most systems programming is done in assembly language, but C,
C++, and C# (C Sharp) are also used.
3. Syntax of Programming Languages: (syntax, i.e., grammar) The syntax of a
programming language is the set of rules and writing conventions that allow the
formation of correct programs in a language. Syntax deals only with the
“representation”; it only controls the structure of a sentence and nothing more.
Syntax has nothing to do with the meaning or runtime behaviour of a program.
E.g., a program may be syntactically correct but not do anything useful. The
syntax of a language is built from “syntactic elements” or “syntactic units”.
Examples of syntactic units are: (a) Character set – e.g., English and mathematical
symbols, (b) Identifiers – names for variables, functions, etc., (c) Keywords, (d)
Noise words – optional words inserted in programs to improve program
readability, (e) Comments – for program documentation, (f) Spaces, (g)
Delimiters – e.g., to mark the beginning and end of a function in C, we use the
pair of curly braces { and }, while in Pascal we use BEGIN and END.
Requirements of syntax: We want a language with a syntax that is: (a) Easy to
read → hence easy to debug. (b) Easy to write → fewer bugs in program (c) Easy
to verify the correctness of a program (d) Easy to translate into another language
(e) Not ambiguous.
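For concreteness, the small C fragment below (purely illustrative) points at several of these syntactic units in one place:

    /* this comment is program documentation (e) */
    #include <stdio.h>

    int main(void)          /* 'int' and 'void' are keywords (c); 'main' is an identifier (b) */
    {                       /* '{' marks the beginning of the function body: a delimiter (g) */
        int total = 0;      /* 'total' is an identifier; '=', ';' and the digits come from the character set (a) */
        printf("%d\n", total);
        return 0;
    }                       /* '}' marks the end of the function body (g) */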
4. Semantics: Semantics pertains to the meaning of words. The semantics of a
language is a description of what the sentences mean. It is much more difficult to
express the semantics of a language than it is to express the syntax. E.g., the
sentence “They are flying airplanes” has more than one meaning. In order to
implement a programming language we must know what each sentence means
(declaration, expression, etc.). E.g., does the sentence produce an output, take any
inputs, change the value stored in a variable, or produce an error?
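As a minimal C illustration of the syntax/semantics distinction (the variable name is hypothetical, and the first statement is intentionally erroneous):

    #include <stdio.h>

    int main(void)
    {
        int count = "forty two";  /* syntactically well formed, but semantically wrong:    */
                                  /* a string is assigned to an int; most C compilers      */
                                  /* reject or warn about this during semantic analysis    */
        count = count;            /* syntactically and semantically legal, yet it does     */
                                  /* nothing useful: being well formed is not being useful */
        printf("%d\n", count);
        return 0;
    }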

5. Domain: It refers to the scope or sphere of an activity. Application Domain: The
scope of an application is its application domain. E.g., the application domain of
an inventory program is the warehouse and its associated tangibles (goods,
machinery, etc.), transactions (e.g., receiving goods, purchase orders, locating
goods, shipping goods, receiving payments), and people (e.g., workers, managers,
customers). All of the above are objects in the application domain. The application
domain is best described by a person in that domain, e.g., the warehouse manager in
the above example. Execution Domain (also called the solution domain): The
execution domain is the work of programmers, e.g., program code, documentation,
test results, files, computers, etc. The solution domain is partitioned into two levels:
abstract, high-level documents such as flow charts and diagrams, and low-level
artefacts such as data structures and function definitions.
6. Semantic Gap: The difference between the semantics of the application domain
(problems, ideas, and methods to solve these problems) and the execution domain is
called the semantic gap.
7. Consequences of the semantic gap: large development times (owing to the
interaction between designers in the application domain and programmers), large
development effort, and poor quality of software. The semantic gap is reduced by
programming languages (PLs). The use of a PL introduces a new domain called the
programming language domain (or PL domain), which lies between the application
domain (problems, ideas, and methods to solve these problems) and the execution
domain (machine code, devices, etc.) and bridges the gap between them.
Specification gap: the semantic gap between the application domain and the PL
domain; it can also be defined as the semantic gap between two specifications of the
same task. The specification gap is bridged by the software development team.
Execution gap: the gap between the semantics of programs written in different
programming languages. The execution gap is bridged by the translator or
interpreter. Advantages of introducing the PL domain: (a) large development times
are reduced, (b) better quality of software, (c) the language processor provides
diagnostic capabilities which detect errors.
8. Language Processor: a software system which bridges the specification or execution
gap.
9. Language Processing: any activity performed by a language processor. Diagnostic
capability is a feature of a language processor.
10. The input to a language processor is the source program; its output is the target
program. The target program is not produced if the language processor finds any
errors in the source program. Types of language processors: (a) Language translator:
bridges the execution gap to the machine language of a computer system; examples
are the compiler and the assembler. (b) De-translator: similar to a translator, but
works in the opposite direction. (c) Preprocessor: a language processor whose source
and target languages are both high-level languages, i.e., no translation to machine
language takes place.
11. Problem-oriented languages: In the case of problem-oriented languages, the PL
domain is very close to the application domain, so the specification gap is small.
Such PLs can be used only for specific applications, hence they are called
problem-oriented languages. They have a large execution gap, but that gap is bridged
by the translator or interpreter. Using these languages, we only have to specify
"what to do". Software development takes less time with problem-oriented languages,
but the resultant code may not be optimized. Examples: fourth-generation languages
(4GLs) such as SQL.
12. Procedure-oriented languages: These provide the general facilities and features
required in most applications and are independent of application domains. Hence
there is a large specification gap, which must be bridged by the application designer.
Using these languages, we have to specify both "what to do" and "how to do it"
(a short sketch contrasting the two styles follows). Examples: C, C++, FORTRAN, etc.
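To make the "what to do" versus "how to do" contrast of items 11 and 12 concrete, consider retrieving every item with a quantity below 10 from an inventory (the table, field and variable names here are purely illustrative). In a problem-oriented language such as SQL one only states the result wanted, e.g. SELECT name FROM stock WHERE quantity < 10; in a procedure-oriented language such as C the programmer also spells out how to obtain it:

    #include <stdio.h>

    /* illustrative inventory data for the hypothetical example above */
    struct item { const char *name; int quantity; };

    static const struct item stock[] = {
        { "bolts", 4 }, { "nuts", 25 }, { "washers", 7 }
    };

    int main(void)
    {
        /* procedure-oriented "how to do": loop, test and print each matching record ourselves */
        for (size_t i = 0; i < sizeof stock / sizeof stock[0]; i++)
            if (stock[i].quantity < 10)
                printf("%s\n", stock[i].name);
        return 0;
    }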
“how to do”. Examples. C, C++, FORTRAN, etc. 13. Compiler: A compiler is a
language translator. It translates a source code (programs in a high-level language)
into the target code (machine code, or object code). Source program Input
Compiler Target program Target program Output To do this translation, a
compiler steps through a number of phases. The simplest is a 2-phase compiler.
The first phase is called the front end and the second phase is called the back end.
14. Front End: The front end translates from the high-level language to a common
intermediate language. It is source-language dependent but machine independent.
The front end consists of the following phases: lexical analysis, syntax analysis,
creation of the symbol table, semantic analysis and generation of intermediate code,
together with error-handling routines for each of these phases. Back End: The back
end translates from this common intermediate language to machine code. It is
machine dependent, and includes code optimization, code generation, error handling
and symbol-table operations. Thus, a compiler bridges the execution gap.
15. Interpreter: An interpreter is a language processor. It also bridges the execution gap
but does not generate machine code; instead it directly executes a program written in
a high-level language. The essential difference between a compiler and an interpreter
is that a compiler generates machine code and is then no longer needed, whereas an
interpreter is required every time the program runs. Characteristics of an interpreter:
machine code is not stored; the source code is essential for repeated execution of
statements; each statement is analysed during its interpretation; it is useful for testing
and debugging, as the overhead of storing machine code is not incurred. Differences
between a compiler and an interpreter:
(1) A compiler scans the entire program first and then translates it into machine code;
an interpreter translates the program line by line.
(2) A compiler converts the entire program to machine code, and execution takes place
only after all syntax errors have been removed; with an interpreter, each time the
program is executed every line is checked for syntax errors and then converted to
equivalent machine code.
(3) With a compiler, execution time is less; with an interpreter, execution time is more.
(4) Compiled machine code can be saved and reused, so the source code and the
compiler are no longer needed; an interpreter's translation cannot be saved, so the
interpreter is always required.

2. The Compiler: A Literature Overview


The compiler is a programming-language processing program. The input is the source
code of the program to be compiled. The output is "object code", typically targeted at the
instruction set of a specific machine (e.g. the PDP-11 instruction set is very different from
the x86 instruction set, which in turn differs from the IBM 360/370 mainframe instruction
set). Often there is one more step to transform that object code into an executable file.
That additional step is generally called "linking". The linker lets you combine multiple
object modules from separate compilations, plus library modules from yet other
compilations, to form a combined executable that can draw on all the available routines.

Other output from a compiler may include a listing of the code, sometimes annotated
with additional information that makes it easy to see the compiler's understanding of
the nesting of loops and other structure of the program. Listings were far more
customary in the world of batch processing than they are today, but a subset of a full
listing, reporting the errors the compiler found in the source code, is still an
important output of the compiler run.
A compiler is a translator program that takes a program or module source in one language
and generates one or more equivalent modules in another. A compiler is characterised by
three languages: the source language (the one it parses), the target language (the one it
generates), and its implementation language (the one in which it itself is written). While
very typically a compiler will generate machine-instruction-level code that implements
the source code, this is certainly not the only possible relationship: for instance, it could
be the first step in translating a mass of code from one high-level language to another.
I've seen Pascal-to-C compilers before now.
Similarly, a compiler need not take its target machine code all the way to executable
binary images: UNIX compilers, for instance, typically generate human-readable
assembly-language source and hand it off to the system’s assembler, where all the
specialisations of the machine’s architecture will be known, rather than trying to have
every compiler duplicate them.
Compiler design principles provide an in-depth view of the translation and optimization
process. Compiler design covers the basic translation mechanism as well as error detection
and recovery. It includes lexical, syntax, and semantic analysis in the front end, and code
generation and optimization in the back end.
Computers are a balanced mix of software and hardware. Hardware is just an electronic
device, and its functions are controlled by compatible software. Hardware understands
instructions in the form of electronic charge, which is the counterpart of binary language
in software programming. Binary language has only two alphabets, 0 and 1. To instruct
the hardware, codes must be written in binary format, which is simply a series of 1s and
0s. It would be a difficult and cumbersome task for computer programmers to write such
codes directly, which is why we have compilers to generate them.
Language Processing System
We have learnt that any computer system is made of hardware and software. The
hardware understands a language which humans cannot easily understand, so we write
programs in a high-level language, which is easier for us to understand and remember.
These programs are then fed into a series of tools and OS components to get the desired
code that can be used by the machine. This is known as the Language Processing System.

The high-level language is converted into binary language in various phases. A compiler
is a program that converts a high-level language to assembly language. Similarly, an
assembler is a program that converts the assembly language to machine-level language.
Let us first understand how a program, using the C compiler, is executed on a host
machine (a small sketch tracing these steps follows the list).
1. The user writes a program in the C language (a high-level language).
2. The C compiler compiles the program and translates it to an assembly program
(low-level language).
3. An assembler then translates the assembly program into machine code (an object file).
4. A linker tool is used to link all the parts of the program together for execution
(executable machine code).
5. A loader loads all of them into memory and then the program is executed.
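As a concrete (and hedged) illustration of these steps, the comments in the small C program below show the usual GNU toolchain (gcc) commands for stopping after each stage; exact flags may vary between toolchains.

    /* hello.c - a minimal program used to observe the stages listed above.
     *
     * Assuming the GNU toolchain is installed, each stage can be run separately:
     *   gcc -E hello.c -o hello.i    preprocessor output (macros expanded, headers included)
     *   gcc -S hello.i -o hello.s    compiler proper: C translated to assembly
     *   gcc -c hello.s -o hello.o    assembler: assembly translated to object code
     *   gcc hello.o -o hello         linker: object file(s) plus libraries -> executable
     * The loader then places ./hello in memory when it is run.
     */
    #include <stdio.h>

    #define GREETING "Hello, language processing system!"   /* expanded by the preprocessor */

    int main(void)
    {
        printf("%s\n", GREETING);   /* the call to printf is resolved by the linker (libc) */
        return 0;
    }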
Before diving straight into the concepts of compilers, we should understand a few other
tools that work closely with compilers.
Preprocessor
A preprocessor, generally considered a part of the compiler, is a tool that produces input
for compilers. It deals with macro processing, augmentation, file inclusion, language
extension, etc.
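A small C sketch of what the preprocessor does (macro expansion and file inclusion) before the compiler proper ever sees the code; the macro names are illustrative:

    /* area.c - as written, before preprocessing */
    #include <stdio.h>                 /* file inclusion: the contents of stdio.h are pasted in */

    #define PI 3.14159                 /* object-like macro definition    */
    #define SQUARE(x) ((x) * (x))      /* function-like macro definition  */

    int main(void)
    {
        /* after preprocessing (e.g. gcc -E area.c) the next line becomes: */
        /*     printf("%f\n", 3.14159 * ((2.0) * (2.0)));                  */
        printf("%f\n", PI * SQUARE(2.0));
        return 0;
    }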
Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine
language. The difference lies in the way they read the source code or input. A compiler
reads the whole source code at once, creates tokens, checks semantics, generates
intermediate code, translates the whole program and may involve many passes. In
contrast, an interpreter reads a statement from the input, converts it to an intermediate
code, executes it, then takes the next statement in sequence. If an error occurs, an
interpreter stops execution and reports it, whereas a compiler reads the whole program
even if it encounters several errors.
Assembler
An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains a combination of machine instructions
as well as the data required to place these instructions in memory.
Linker
A linker is a computer program that links and merges various object files together in order
to make an executable file. All these files might have been compiled by separate
assemblers. The major tasks of a linker are to search for and locate referenced
modules/routines in a program and to determine the memory locations where these codes
will be loaded, so that the program instructions have absolute references.
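A minimal two-module C sketch of what the linker resolves (the file and function names are hypothetical); each file is compiled separately, and the reference to area() in main.o is fixed up at link time:

    /* util.c - compiled on its own with:  gcc -c util.c   -> util.o */
    double area(double r) { return 3.14159 * r * r; }

    /* main.c - compiled on its own with:  gcc -c main.c   -> main.o */
    #include <stdio.h>
    double area(double r);              /* declared here, but defined in util.o        */
    int main(void)
    {
        printf("%f\n", area(2.0));      /* the linker locates 'area' and patches this  */
        return 0;                       /* call with its final address when run as:    */
    }                                   /*     gcc main.o util.o -o prog               */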
Loader
The loader is a part of the operating system and is responsible for loading executable files
into memory and executing them. It calculates the size of a program (instructions and
data) and creates memory space for it. It initializes various registers to initiate execution.
Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for
platform (B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into
the source code of another programming language is called a source-to-source compiler.
The compilation process is a sequence of various phases. Each phase takes its input from
the previous stage, has its own representation of the source program, and feeds its output
to the next phase of the compiler. Let us understand the phases of a compiler.
Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code
as a stream of characters and converts it into meaningful lexemes. The lexical analyzer
represents these lexemes in the form of tokens as:
<token-name, attribute-value>
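For example (a standard textbook-style illustration), the C statement below might be broken into the token stream shown under it; the exact token names depend on the compiler:

    position = initial + rate * 60;

    <id, position> <assign_op> <id, initial> <add_op> <id, rate> <mul_op> <number, 60> <semicolon>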
Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token
arrangements are checked against the source-code grammar, i.e. the parser checks whether
the expression made by the tokens is syntactically correct.
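Continuing the same illustrative statement, one possible syntax tree produced by the parser (multiplication binds tighter than addition, assignment at the top) is:

            =
           / \
    position    +
               / \
        initial    *
                  / \
              rate    60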
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the
language. For example, it checks that values are assigned between compatible data types
and flags errors such as adding a string to an integer. The semantic analyzer also keeps
track of identifiers, their types and expressions, and whether identifiers are declared
before use. It produces an annotated syntax tree as output.
Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source code
for the target machine. It represents a program for some abstract machine and sits between
the high-level language and the machine language. This intermediate code should be
generated in such a way that it is easy to translate into the target machine code.
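A common intermediate form is three-address code; for the illustrative statement used in the earlier phases, it might look like the following (t1 and t2 are compiler-generated temporaries):

    t1 = rate * 60
    t2 = initial + t1
    position = t2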
Code Optimization
The next phase performs code optimization on the intermediate code. Optimization can be
thought of as removing unnecessary code lines and arranging the sequence of statements
so as to speed up program execution without wasting resources (CPU, memory).
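On the small example above, an optimizer might notice that the copy through t2 is unnecessary and shrink the intermediate code to:

    t1 = rate * 60
    position = initial + t1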
Code Generation
In this phase, the code generator takes the optimized representation of the intermediate
code and maps it to the target machine language. The code generator translates the
intermediate code into a sequence of (generally) relocatable machine code. This sequence
of machine instructions performs the same task as the intermediate code would.
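Sketched for a hypothetical two-register load/store machine (not a real instruction set), the optimized intermediate code above could be mapped to something like:

    LOAD  R1, rate        ; R1 <- rate
    MUL   R1, R1, #60     ; R1 <- rate * 60
    LOAD  R2, initial     ; R2 <- initial
    ADD   R2, R2, R1      ; R2 <- initial + rate * 60
    STORE position, R2    ; position <- R2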
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the
identifiers' names along with their types are stored here. The symbol table makes it easier
for the compiler to quickly search for an identifier's record and retrieve it. The symbol
table is also used for scope management.
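A much-simplified C sketch of what a symbol-table entry might hold (the field names are illustrative; real compilers store considerably more):

    struct symbol {
        char name[32];        /* the identifier's name                         */
        char type[16];        /* e.g. "int", "float", "function"               */
        int  scope_level;     /* used for scope management                     */
        int  offset;          /* location assigned during code generation      */
    };
    /* the table itself is typically a hash table keyed on 'name', so the      */
    /* compiler can quickly search for and retrieve an identifier's record     */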
Activation Trees

The execution of a procedure is called its activation. An activation record contains all the
necessary information required to call a procedure. Depending upon the source language
used, an activation record typically contains units such as: the return value, the actual
parameters, a control link (to the caller's activation record), an access link (to non-local
data), the saved machine status, local data, and temporaries.
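The same units can be pictured as a C struct (a simplified sketch only; the actual layout and sizes are fixed by the calling convention of the target machine):

    struct activation_record {
        int   return_value;        /* space for the value returned to the caller     */
        int   actual_params[4];    /* arguments passed by the caller (size varies)   */
        void *control_link;        /* pointer to the caller's activation record      */
        void *access_link;         /* pointer used to reach non-local data           */
        int   saved_status[4];     /* saved machine status (registers, return addr)  */
        int   local_data[8];       /* the procedure's local variables                */
        int   temporaries[4];      /* compiler-generated temporary values            */
    };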
Storage Allocation
The runtime environment manages runtime memory requirements for the following
entities (a short C sketch locating each follows the list):
1. Code: the text part of a program, which does not change at runtime. Its memory
requirements are known at compile time.
2. Procedures: their text part is static, but they are called in an unpredictable order. That
is why stack storage is used to manage procedure calls and activations.
3. Variables: variables are known only at runtime, unless they are global or constant.
The heap memory allocation scheme is used for managing allocation and de-allocation
of memory for variables at runtime.
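A hedged C sketch, with illustrative names, locating each of the three classes of entity:

    #include <stdio.h>
    #include <stdlib.h>

    int counter = 0;                    /* global/static data: size known at compile time */

    void record(int n)                  /* procedure: its frame lives on the stack        */
    {
        int local = n * 2;              /* stack storage, created anew per activation     */
        printf("%d\n", local);
    }

    int main(void)
    {
        int *values = malloc(10 * sizeof *values);   /* heap: size decided at run time    */
        if (values == NULL) return 1;
        values[0] = 42;
        record(values[0]);
        free(values);                                /* explicit de-allocation            */
        return 0;
    }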
3. Future Trends on Assembler:
An assembler primarily serves as the bridge between symbolically coded instructions
written in assembly language and the computer's processor, memory and other
computational components. An assembler works by converting the source code of an
assembly-language program into object code, or an object file, that constitutes a stream of
zeros and ones of machine code directly executable by the processor.

It is easier to describe a generic three-pass implementation rather than a processor-specific
two-pass implementation.

The first pass handles syntax: expand all macros, create a list of labels and identify which
segment each belongs to (on x86: code, data, stack, extra, or other). On some RISC
processors, extra instructions are also inserted or re-ordered to ensure correct execution
around conditional branches. The maximum size of each set of opcodes is also computed.

The second pass generates the actual opcodes and locks down the size of each instruction,
so that the exact size of everything is known and the actual address of every label
(including forward references) can be computed. A code address is also assigned to each
original source line for the debugging information.

In the third pass, the exact address of every symbol is patched in. Many times this
involves some sort of arithmetic (e.g. computing a relative jump).

Now you just have to write out the object data, symbols, and debugging information.
A one-pass assembler performs a single scan over the source code. If it encounters an
undefined label, it puts the label into the symbol table along with the address of the
reference, so that the reference can be patched later, when the label's value becomes
known.
A two-pass assembler, on the other hand, performs two sequential scans over the source
code, dividing the procedure into two steps:
Pass 1: build the symbol table of labels and their values.
Pass 2: generate the machine code.
A one-pass assembler tends to be faster, since a two-pass assembler must rescan the
source. A two-pass assembler takes longer to assemble, but has the benefit of allowing the
program to define a symbol anywhere in the code, even after its first use. (A toy C sketch
of the two-pass scheme follows.)
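The following toy C program sketches the two passes for an entirely made-up "instruction set" (it is not a real assembler): pass 1 records label addresses, and pass 2 emits "object code" in which label operands, including the forward reference to count, are resolved.

    /* two_pass_toy.c - a toy illustration of the two-pass scheme described above */
    #include <stdio.h>
    #include <string.h>

    #define MAXSYMS 32

    struct sym { char name[16]; int addr; };
    static struct sym symtab[MAXSYMS];
    static int nsyms = 0;

    /* hypothetical source: each line is { label, opcode, operand }, "-" = none */
    static const char *src[][3] = {
        { "-",     "LOAD", "count" },   /* forward reference: 'count' is defined later */
        { "loop",  "DEC",  "-"     },
        { "-",     "JNZ",  "loop"  },   /* backward reference                          */
        { "count", "DATA", "-"     },
    };
    static const int nlines = sizeof src / sizeof src[0];

    static int lookup(const char *name)
    {
        for (int i = 0; i < nsyms; i++)
            if (strcmp(symtab[i].name, name) == 0)
                return symtab[i].addr;
        return -1;                              /* not a known label */
    }

    int main(void)
    {
        /* Pass 1: walk the source, assign an address (here simply the line
         * counter) to every statement and record every label it defines.     */
        for (int lc = 0; lc < nlines; lc++) {
            if (strcmp(src[lc][0], "-") != 0 && nsyms < MAXSYMS) {
                strcpy(symtab[nsyms].name, src[lc][0]);
                symtab[nsyms].addr = lc;        /* one "word" per statement */
                nsyms++;
            }
        }

        /* Pass 2: emit the "object code"; operands that name a label are
         * replaced by the address recorded during pass 1.                    */
        for (int lc = 0; lc < nlines; lc++) {
            int addr = lookup(src[lc][2]);
            if (addr >= 0)
                printf("%02d: %-4s %d\n", lc, src[lc][1], addr);
            else
                printf("%02d: %-4s\n", lc, src[lc][1]);
        }
        return 0;
    }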

Assembly Programming Language:


Assembly language is a low-level programming language for a computer or other
programmable device that is specific to a particular computer architecture, in contrast to
most high-level programming languages, which are generally portable across multiple
systems. Assembly language is converted into executable machine code by a utility
program referred to as an assembler, such as NASM or MASM.
Assembly languages generally lack high-level conveniences such as variables and
functions, and they are not portable between various families of processors. They have
the same structures and set of commands as machine language, but allow a programmer
to use names instead of numbers. This language is still useful for programmers when
speed is necessary or when they need to carry out an operation that is not possible in
high-level languages.
High-level languages know almost nothing about the hardware and often have no way of
accessing it directly; that is where assembly comes to the rescue. The other use is
size/speed optimization of code. All high-level languages are compiled down to assembly
automatically. This gives you no control over the final result, over what the computer
actually does. And while compilers are good, they cannot know all the little tricks and
optimize code to perfection. If you want maximum performance, you need full control
over what the computer does in every little detail, which a high-level language does not
give you.

In the early days of microprocessors, all programs were written in assembly. No C
compilers existed for the minimal CPUs of the day, and memory was so expensive and
processors so slow that no one dreamed of sacrificing any form of efficiency for reduced
development costs. Assembly may be redundant at the application level, depending on
the quality of the compiler, the hardware bus bandwidth and the required real-time
responsiveness of the system.
There are also times when coding directly in assembler is the right choice. But for
reduced development timeframe (and hence cost) we generally find C to be a winner over
assembler.
A well-known approach is to implement first in a high-level language and then use
assembler only where there is no other option. Even so, you cannot complete a software
design without knowledge of the underlying assembly, because "real programmers know
assembler".
