Copyright © 2002 Tim Bower
The interface between a computer’s hardware and its software is its architecture. The architecture is described
by what the computer’s instructions do, and how they are specified. Understanding how it all works requires
knowledge of the structure of a computer and its assembly language.
What is a computer?
There are lots of machines in our world, but only some of those machines qualify as being a computer. What
features make a machine a computer?
The very first machines which bore the label of a computer were designed using electro-mechanical switches.
These switches were large. The computers designed from them were more like automated adding machines than
today’s computers. A program written for these early machines was entered into the computer by setting an
array of relays to be either an electrical short or open circuit. This was often accomplished with the aid of a
panel of plug-in contact points and cables. After setting the relays, the program could be executed. To execute
a new program, the cables needed to be moved to form a new network of relays.
With the invention of the vacuum tube in the 1940s, faster computers could be designed which could also
run more complicated programs. The real genesis of modern computers, however, came with the practice of
storing a program in memory. The possibility of storing much larger programs in memory became a reality with
the invention of ferrite core memory in the 1950s.
According to mathematician John von Neumann, for a machine to be a computer it must have the following:
3. A program counter.
Put another way, it must be programmable. A computer executes the following simple loop for each program.
pc = 0;
do {
    instruction = memory[pc++];
    decode( instruction );
    fetch( operands );
    execute;
    store( results );
} while( instruction != halt );
Note:
• Instructions are the verbs and operands are the objects of this process.
• In some architectures, such as the SPARC, the program counter is advanced by a set amount after each
instruction is read. In the Intel x86, however, the size of the instruction varies. So as the instruction is
read and decoded, the amount which the program counter should be advanced is also determined.
The important computer architecture components from von Neumann’s stored program control computer are:
CPU Central processing unit The engine of the computer that executes programs.
ALU Arithmetic logic unit This is the part of the CPU that executes individual instructions involving data
(operands).
[Figure: Computer Architecture Proposed by John von Neumann. The CPU contains the ALU, registers, the IR, and the PC, and is connected to memory holding data and instructions.]
Register A memory location in the CPU which holds a fixed amount of data. Registers of most current systems
hold 32 bits or 4 bytes of data.
PC Program counter, also called the instruction pointer, is a register which holds the memory address of the
next instruction to be executed.
IR Instruction register A register which holds the current instruction being executed.
Acc Accumulator A register designated to hold the result of an operation performed by the ALU.
Who is John von Neumann?
John Louis von Neumann was born 28 December 1903 in Budapest, Hungary, and died 8 February 1957 in Washington,
DC.
He was a brilliant mathematician, synthesizer, and promoter of the stored program concept, whose logical design of
the Institute for Advanced Study (IAS) computer became the prototype of most of its successors: the von Neumann
Architecture.
Von Neumann was a child prodigy, born into a banking family in Budapest, Hungary. When only six years old he could
divide eight-digit numbers in his head.
At a time of political unrest in central Europe, he was invited to visit Princeton University in 1930, and when the Institute
for Advanced Study was founded there in 1933, he was appointed to be one of the original six Professors of Mathematics,
a position which he retained for the remainder of his life. By the latter years of World War II von Neumann was playing
the part of an executive management consultant, serving on several national committees, applying his amazing ability to
rapidly see through problems to their solutions. Through this means he was also a conduit between groups of scientists
who were otherwise shielded from each other by the requirements of secrecy. He brought together the needs of the Los
Alamos National Laboratory (and the Manhattan Project) with the capabilities of the engineers at the Moore School of
Electrical Engineering who were building the ENIAC, and later built his own computer called the IAS machine. Several
“supercomputers” were built by National Laboratories as copies of his machine.
Following the war, von Neumann concentrated on the development of the IAS computer and its copies around the world.
His work with the Los Alamos group continued and he continued to develop the synergism between computer capabilities
and the needs for computational solutions to nuclear problems related to the hydrogen bomb.
His insights into the organization of machines led to the infrastructure which is now known as the “von Neumann Archi-
tecture”. However, von Neumann’s ideas were not along those lines originally; he recognized the need for parallelism in
computers but equally well recognized the problems of construction and hence settled for a sequential system of implemen-
tation. Through the report entitled “First Draft of a Report on the EDVAC” [1945], authored solely by von Neumann,
the basic elements of the stored program concept were introduced to the industry.
In the 1950s von Neumann was employed as a consultant to IBM to review proposed and ongoing advanced technology
projects. One day a week, von Neumann “held court” with IBM. On one of these occasions in 1954 he was confronted
with the FORTRAN concept. John Backus remembered von Neumann being unimpressed with the concept of high level
languages and compilers.
Donald Gillies, one of von Neumann’s students at Princeton, and later a faculty member at the University of Illinois,
recalled in the mid-1970s that the graduate students were being “used” to hand assemble programs into binary for their
early machine (probably the IAS machine). He took time out to build an assembler, but when von Neumann found out
about it he was very angry, saying (paraphrased), “It is a waste of a valuable scientific computing instrument to use it to
do clerical work.”
[Figure: Stack Machine Architecture. The CPU contains the ALU, a stack, the IR, and the PC, connected to memory holding data and instructions.]
[Figure: Accumulator Machine Architecture. The CPU contains the ALU, an accumulator (ACC), the IR, and the PC, connected to memory holding data and instructions.]
[Figure: Load/Store Machine Architecture. The CPU contains the ALU, a register file, the IR, and the PC, connected to memory holding data and instructions.]
Example Machine Instructions
Consider the statement y = y + 10;. In the notation used here, y’ ≡ &y is the address of y, and
[y’] ≡ *y’ = *&y = y is the contents stored at that address.
Machine Instructions
Machine instructions are classified into the following three categories:
1. data transfer operations (memory ⇔ register, register ⇔ register)
2. arithmetic logic operations (add, sub, and, or, xor, shift, etc)
3. program control operations (branch, call, interrupt)
How the operands are specified is called the addressing mode. We will discuss addressing modes more later.
The Computer’s Software
The program instructions are stored in memory in machine code or machine language format. An assembler is
the program used to translate symbolic programs (assembly language) into machine language programs.
machine language Low level computer instructions that are encoded into binary words.
assembly language The lowest level human readable programming language. All of the detailed instructions
for the computer are listed. Assembly programs are directly encoded into machine code. Assembly code
can be written by humans, but is more typically produced by a compiler.
high level language Humans typically write programs in a language which allows program logic to be ex-
pressed at a conceptual level, ignoring the implementation details which are required of assembly language
programs.
Years ago, hardware efficiency was extracted at the expense of the programmer’s time. If a fast program was
needed, then it was written in assembly language. Compilers were capable of translating programs from high
level languages, but they generated assembly language programs that were relatively inefficient as compared with
the same programs written by a programmer in assembly language. Programmers often found it necessary to
optimize the assembly language code created by a compiler to improve the performance and reduce the memory
requirements of the program.
This is no longer the case. Compilers have improved to the point that they can generate code comparable to,
or better than, the code most programmers can generate. Even if hand crafted optimizations could improve the
performance, there is little benefit derived from such a laborious activity. Many computers today execute so fast
and have enough memory that it is not necessary to optimize code at the assembly language level.
So, since it is increasingly rare for programmers to work at the assembly language level, why is it necessary
to learn assembly language? There are actually several reasons to study assembly language.
1. To understand or work on an operating system. Operating systems need to execute instructions which
can not be expressed in a high level language, so it is necessary that a portion of an operating system be
written in assembly language. Some instances when an operating system needs assembly language include:
initializing the hardware and data in the CPU at boot time, handling interrupts, low level interfaces with
hardware peripherals, and cases when a compiler’s protection features interfere with the needed operations.
3. Real time or embedded systems programming where there may be critical constraints for a program related
either to performance or available memory. In some cases with embedded systems, a compiler may not be
available.
4. To understand the internal working of a computer. Computer architecture can best be understood when
assembly language is used to supplement the study of computer architecture. Assembly language code does
not hide details about what the computer is doing.
required four or more parameters. As an example, the IBM System/370 has a single instruction that copies
a character string of arbitrary length from any location in memory to any other location in memory, while
translating characters according to a table stored in memory.
Computers which feature a large number of complex instructions are classified as complex instruction set
computers (CISC). Other examples of CISC computers include the Digital Equipment VAX and the Intel x86
line of processors. The DEC VAX has more than 200 instructions, dozens of distinct addressing modes and
instructions with as many as six operands.
The complexity of CISC was accommodated by the introduction of microprogramming, or microcode. Microcode
consists of low-level hardware instructions that implement the high-level instructions required by an architecture.
Microcode was placed in ROM or control-store RAM (which is more expensive, but faster, than the
ferrite-core memory used in many computers).
However, not all computer designers fell in line with the CISC philosophy. Seymour Cray, for one, believed that
complexity was bad, and continued to build the fastest computers in the world by using simple, register-oriented
instruction sets. Cray was a proponent of the Reduced Instruction Set Computer (RISC), which is the antidote
to CISC. The CDC 6600 and the Cray-1 supercomputer were the precursors of modern RISC architectures. In
1975, Cray made the following remarks about his computer design:
[Registers] made the instructions very simple. . . That is somewhat unique. Most machines have rather
elaborate instruction sets involving many more memory references in the instructions than the ma-
chines I have designed. Simplicity, I guess, is a way of saying it. I am all for simplicity. If it’s very
complicated, I cannot understand it.
Various technological changes in the 1980s made the architectural assumptions of the 1970s no longer valid.
• Faster (10 times or more) and cheaper semiconductor memory and integrated circuits began to replace
ferrite-core and transistor based discrete circuits.
• The invention of cache memories substantially improved the speed of non-microcoded programs.
• Compiler technology had progressed rapidly; optimizing compilers generated code that used only a small
subset of most instruction sets.
A new set of simplified design criteria emerged:
• Instructions should be simple unless there is a good reason for complexity. To be worthwhile, a new
instruction that increases cycle time by 10% must reduce the total number of cycles executed by at least
10%.
• Microcode is generally no faster than sequences of hardwired instructions. Moving software into microcode
does not make it better. It just makes it harder to modify.
• Fixed-format instructions and pipelined1 execution are more important than program size. As memory
becomes cheaper and faster, the space/time tradeoff is resolved in favor of time; reducing space no longer
decreases time.
• Compiler technology should simplify instructions, rather than generate more complex instructions. Instead
of adding a complicated microcoded instruction, optimizing compilers can generate sequences of simple,
fast instructions to do the job. Operands can be kept in registers to further increase speed.
What is RISC?
Assembly language programs occasionally use large sets of machine instructions, whereas high–level language
compilers generally do not. For example, SUN’s C compiler uses only about 30% of the available Motorola 68020
instructions. Studies show that approximately 80% of the computations in a typical program require only 20% of
a processor’s instruction set.
The designers of RISC machines strive for hardware simplicity, with close cooperation between machine
architecture and compiler design. In order to add a new instruction, computer architects must ask:
1 Pipelining refers to parallelizing the steps in the instruction execution loop. The next instruction is fetched and decoded
while the current instruction executes. We will discuss pipelining more when we study the Sun SPARC architecture.
• To what extent would the added instruction improve performance, and is it worth the cost of implementation?
• No matter how useful it is in an isolated instance, would it make all other instructions perform more slowly
by its mere presence?
The goal of RISC architecture is to maximize the effective speed of a design by performing infrequent functions
in software and by including in hardware only features that yield a net performance gain. Performance gains are
measured by conducting detailed studies of large high–level language programs.
RISC architectures eliminate complicated instructions that require microcode support.
RISC Architecture
The following characteristics are typical of RISC architectures. Although none of these are required for an
architecture to be called RISC, this list does describe most current RISC architectures, including the SPARC
design.
2. Hardwired control with little or no microcode: Microcode adds a level of complexity and raises the number
of cycles per instruction.
3. Load/Store, register-to-register design: All computational instructions involve registers. Memory accesses
are made with only load and store instructions.
4. Simple fixed-format instructions with few addressing modes: All instructions are the same length (typically
32 bits) and have just a few ways to address memory.
5. Pipelining: The instruction set design allows for the processing of several instructions at the same time.
6. High–performance memory: RISC machines have at least 32 general–purpose registers and large cache
memory.
7. Migration of functions to software: Only those features that measurably improve performance are imple-
mented in hardware. Software contains sequences of simple instructions for executing complex functions
rather than complex instructions themselves, which improves system efficiency.
8. More concurrency is visible to software: For example, branches take effect after execution of the following
instruction, permitting a fetch of the next instruction during execution of the current instruction.
The real keys to enhanced performance are single-cycle execution and keeping the cycle time as short as
possible. Many characteristics of RISC architectures, such as load/store and register-to-register design, facilitate
single-cycle execution. Simple fixed-format instructions, on the other hand, permit shorter cycles by reducing
decoding time.
[Figure: The CPU with its L1 cache connected to main memory by the bus.]
• Principle of locality. Spatial locality refers to the tendency of programs to access memory locations near
other recently accessed locations. Temporal locality refers to the tendency of programs to access the same data
several times in a short period of time.
• Smaller is faster.
These are the levels in a typical memory hierarchy. Moving farther away from the CPU, the memory in the
level becomes larger and slower.
When a memory lookup is required, the L1 cache is searched first. If the data is found, this is called a hit.
If the data is not in L1 cache, this is called a miss and the L2 cache is checked. If the data is not in the L2
cache, then the data is retrieved from main memory. When there is a miss at either the L1 or L2 cache, the data
retrieved from the next level is saved in the cache for future use. Cache hits make the program run much faster
than if all memory accesses must go to the main memory.
The connection between the CPU and main memory is called the front-side bus. A common design is for the
front-side bus to be divided into four channels. If the front-side bus speed is listed at 800 MHz, it is probably four
channels each running at 200 MHz. The connection between the CPU and the L2 cache is called the backside
bus.
[Figure: The memory hierarchy: Registers, L1 Cache, L2 Cache, Main Memory, Disk. Levels nearer the CPU use faster, more expensive memory; levels farther away provide a larger quantity of memory.]
Integer Variables
Unsigned variables that generally fall into the category of integers (char, short, int, long) are stored in straight
binary format, beginning with all zeros for zero up to all ones for the largest number that can be represented by
the data type.
The signed variables that generally fall into the category of integers (char, short, int, long) are stored in two’s
complement format. This ensures that the binary digits represent a continuous number line from the most negative
number to the largest positive number, with zero represented by all zero bits. The most significant bit is the
sign bit: it is one for negative numbers and zero for positive numbers.
    Decimal            32-bit        16-bit
-2,147,483,648       0x80000000
-2,147,483,647       0x80000001
     ...                ...
    -32,768          0xffff8000     0x8000
    -32,767          0xffff8001     0x8001
     ...                ...           ...
      -2             0xfffffffe     0xfffe
      -1             0xffffffff     0xffff
       0             0x00000000     0x0000
       1             0x00000001     0x0001
     ...                ...           ...
     32,767          0x00007fff     0x7fff
     ...                ...
 2,147,483,647       0x7fffffff
Any two binary numbers can thus be added together in a straightforward manner to get the correct answer.
If there is a carry bit beyond what the data type can represent, it is discarded.
1 0x0001
+(-1) + 0xffff
------ ---------
0 0x0000
To change the sign of any number, invert all the bits and add 1.
2 = 0x0002 = 000...010 ==> 111...101
+ 1
-----------
111...110 = 0xfffe = -2
[Figure: Memory maps of the byte 00010010 stored in a word at addresses X+4 through X+7. A Big Endian map labels addresses on the left (X+4, X+5, X+6, X+7); a Little Endian map labels them on the right (X+7, X+6, X+5, X+4).]
Byte Order
Not all computers store the bytes of a variable in the same order. The Intel x86 line of processors
stores the least significant byte at the lowest memory address (the right most position in a memory map) and the
most significant byte at the highest memory address. This scheme is called Little Endian.
Sun SPARC and most other UNIX platforms do the opposite. They store the most significant byte in the
lowest memory address. SPARC is thus considered a Big Endian machine. In a TCP/IP packet, the first
transmitted data is the most significant byte, thus the Internet is considered Big Endian.
The lowest memory address is considered the memory address for a variable. Hence we see a difference
between Little Endian and Big Endian when we draw memory maps. With Little Endian (Intel) we label the
location of an address on the right side of the map. With Big Endian (SPARC), labels are placed on the left side
of the map.
The term is used because of an analogy with the story Gulliver’s Travels, in which Jonathan Swift imagined
a never-ending fight between the kingdoms of the Big-Endians and the Little-Endians, whose only difference is
in where they crack open a hard-boiled egg.
11
IEEE FPS floating point formats:

Single Precision (32 bits): sign (1 bit), exponent (8 bits), mantissa (23 bits)
Double Precision (64 bits): sign (1 bit), exponent (11 bits), mantissa (52 bits)
3. The digits to the right of the binary point are then stored as the mantissa starting with the most significant
bits of the mantissa field. Because all numbers are normalized, there is no need to store the leading 1.
Note: Because the leading 1 is dropped, it is no longer proper to refer to the stored value as the mantissa.
In IEEE terms, this mantissa minus its leading digit is called the significand.
4. Add 127 to the exponent and convert the resulting sum to binary for the stored exponent value. For double
precision, add 1023 to the exponent. Be sure to include all 8 or 11 bits of the exponent.
5. The sign bit is a one for negative numbers and a zero for positive numbers.
6. Compilers often express FPS numbers in hexadecimal, so a quick conversion to hexadecimal might be
desired.
• 100 = 1100100 (binary) = 1.100100 x 2^6
  sign = 0, significand = 100100...,
  exponent = 6 + 127 = 133 = 10000101
management, and device control. It is not related to the user interface or utilities provided by the OS.
CPU when an interrupt is received. Thus, the reception of an interrupt is how user programs are suspended and
processing switched to the kernel.
Once the kernel gets control, it will want to save more registers from the user program, handle the hardware
event and check if work needs to be done related to internal operations such as memory or process management.
Then finally, the kernel will let a user program run again. In doing so, it will restore some registers and issue a
special instruction that causes the final registers to be restored and processing to switch back to the user program.
Since all the registers are restored, the user program never knows that it was interrupted.
There are three types of interrupts which the CPU recognizes.
Hardware Interrupt This is any type of hardware event, such as a key pressed on the keyboard, a hard disk
completing the reading or writing of data, or the reception of an ethernet packet. Many operating
systems program a clock to issue interrupts at regular intervals so that the kernel is guaranteed to get
control on a regular basis even if no hardware events occur and a user program never releases the CPU.
Software Interrupt When a user program needs to make a system call to the operating system, such as for
I/O or to request more memory, it may issue a special instruction called a software interrupt to cause
the CPU to switch processing to the kernel.
Trap A trap is issued by the CPU itself when it detects that something is wrong or needs special attention.
In most cases a trap is issued when a user program performs an illegal instruction such as a divide by
zero error or illegal memory reference. In the Sun SPARC, there are some traps which occur in normal
processing of a program.
Most of the kernel’s code is termed reentrant, meaning that it can safely be re-entered to handle additional
interrupts even while a previous interrupt is being processed. There are special assembly language instructions
to turn interrupts off or on. Interrupts are turned off in critical sections of the kernel where an interrupt could
cause memory corruption. While interrupts are turned off, pending interrupts are queued by the hardware and
delivered when interrupts are turned on again. A critical concern in operating system design is knowing when to
turn interrupts off and on. Interrupts should be left on except when absolutely necessary, so operating systems
use clever algorithms to make as much of the kernel reentrant as possible.
More will be discussed about operating systems as related to computer architecture and assembly language
later in the semester after more specifics of the processors and assembly language have been covered.