Copyright © 2002 Tim Bower
The interface between a computer’s hardware and its software is its architecture. The architecture is described
by what the computer’s instructions do, and how they are specified. Understanding how it all works requires
knowledge of the structure of a computer and its assembly language.
What is a computer?
There are lots of machines in our world, but only some of those machines qualify as being a computer. What
features make a machine a computer?
The very first machines which bore the label of a computer were designed using electro-mechanical switches.
These switches were large. The computers designed from them were more like automated adding machines than
today’s computers. A program written for these early machines was entered into the computer by setting an
array of relays to be either an electrical short or open circuit. This was often accomplished with the aid of a
panel of plug-in contact points and cables. After setting the relays, the program could be executed. To execute
a new program, the cables needed to be moved to form a new network of relays.
With the invention of the vacuum tube in the 1940s, faster computers could be designed which could also
run more complicated programs. The real genesis of modern computers, however, came with the practice of
storing a program in memory. The possibility of storing much larger programs in memory became a reality with
the invention of ferrite core memory in the 1950s.
According to mathematician John von Neumann, for a machine to be a computer it must have the following:
3. A program counter.
Put another way, it must be programmable. A computer executes the following simple loop for each program.
pc = 0;
do {
    instruction = memory[pc++];
    decode( instruction );
    fetch( operands );
    execute;
    store( results );
} while( instruction != halt );
Note:
• Instructions are the verbs and operands are the objects of this process.
• In some architectures, such as the SPARC, the program counter is advanced by a set amount after each
instruction is read. In the Intel x86, however, the size of the instruction varies. So as the instruction is
read and decoded, the amount which the program counter should be advanced is also determined.
The important computer architecture components from von Neumann’s stored program control computer are:
CPU Central processing unit The engine of the computer that executes programs.
ALU Arithmetic logic unit This is the part of the CPU that executes individual instructions involving data
(operands).
[Figure: Computer Architecture Proposed by John von Neumann. The CPU contains the ALU, registers, the IR, and the PC, and is connected to memory holding data and instructions.]
Register A memory location in the CPU which holds a fixed amount of data. Registers of most current systems
hold 32 bits or 4 bytes of data.
PC Program counter, also called the instruction pointer, is a register which holds the memory address of the
next instruction to be executed.
IR Instruction register A register which holds the current instruction being executed.
Acc Accumulator A register designated to hold the result of an operation performed by the ALU.
Who is John von Neumann?
John Louis von Neumann was born 28 December 1903 in Budapest, Hungary, and died 8 February 1957 in Washington,
DC.
He was a brilliant mathematician, synthesizer, and promoter of the stored program concept, whose logical design of
the Institute for Advanced Study (IAS) computer became the prototype of most of its successors: the von Neumann
Architecture.
Von Neumann was a child prodigy, born into a banking family in Budapest, Hungary. When only six years old he could
divide eight-digit numbers in his head.
At a time of political unrest in central Europe, he was invited to visit Princeton University in 1930, and when the Institute
for Advanced Study was founded there in 1933, he was appointed to be one of the original six Professors of Mathematics,
a position which he retained for the remainder of his life. By the latter years of World War II von Neumann was playing
the part of an executive management consultant, serving on several national committees, applying his amazing ability to
rapidly see through problems to their solutions. Through this means he was also a conduit between groups of scientists
who were otherwise shielded from each other by the requirements of secrecy. He brought together the needs of the Los
Alamos National Laboratory (and the Manhattan Project) with the capabilities of the engineers at the Moore School of
Electrical Engineering who were building the ENIAC, and later built his own computer called the IAS machine. Several
“supercomputers” were built by National Laboratories as copies of his machine.
Following the war, von Neumann concentrated on the development of the IAS computer and its copies around the world.
His work with the Los Alamos group continued and he continued to develop the synergism between computer capabilities
and the needs for computational solutions to nuclear problems related to the hydrogen bomb.
His insights into the organization of machines led to the infrastructure which is now known as the “von Neumann Archi-
tecture”. However, von Neumann’s ideas were not along those lines originally; he recognized the need for parallelism in
computers but equally well recognized the problems of construction and hence settled for a sequential system of implemen-
tation. Through the report entitled “First Draft of a Report on the EDVAC” [1945], authored solely by von Neumann,
the basic elements of the stored program concept were introduced to the industry.
In the 1950s von Neumann was employed as a consultant to IBM to review proposed and ongoing advanced technology
projects. One day a week, von Neumann “held court” with IBM. On one of these occasions in 1954 he was confronted
with the FORTRAN concept. John Backus remembered von Neumann being unimpressed with the concept of high level
languages and compilers.
Donald Gillies, one of von Neumann’s students at Princeton, and later a faculty member at the University of Illinois,
recalled in the mid-1970s that the graduate students were being “used” to hand assemble programs into binary for their
early machine (probably the IAS machine). He took time out to build an assembler, but when von Neumann found out
about it he was very angry, saying (paraphrased), “It is a waste of a valuable scientific computing instrument to use it to
do clerical work.”
[Figure: Stack Machine Architecture. The CPU contains the ALU, a stack, the IR, and the PC, connected to memory holding data and instructions.]
[Figure: Accumulator Machine Architecture. The CPU contains the ALU, an accumulator (ACC), the IR, and the PC, connected to memory holding data and instructions.]
[Figure: Load/Store Machine Architecture. The CPU contains the ALU, a register file, the IR, and the PC, connected to memory holding data and instructions.]
Example Machine Instructions
Consider the statement y = y + 10;. In the notation used here, y’ ≡ &y is the address of y, and
[y’] ≡ *y’ = *&y = y is the contents stored at that address.
Machine Instructions
Machine instructions are classified into the following three categories:
1. data transfer operations (memory ⇔ register, register ⇔ register)
2. arithmetic logic operations (add, sub, and, or, xor, shift, etc)
3. program control operations (branch, call, interrupt)
How the operands are specified is called the addressing mode. We will discuss addressing modes more later.
The Computer’s Software
The program instructions are stored in memory in machine code or machine language format. An assembler is
the program used to translate symbolic programs (assembly language) into machine language programs.
machine language Low level computer instructions that are encoded into binary words.
assembly language The lowest level human readable programming language. All of the detailed instructions
for the computer are listed. Assembly programs are directly encoded into machine code. Assembly code
can be written by humans, but is more typically produced by a compiler.
high level language Humans typically write programs in a language which allows program logic to be ex-
pressed at a conceptual level, ignoring the implementation details which are required of assembly language
programs.
Years ago, hardware efficiency was extracted at the expense of the programmer’s time. If a fast program was
needed, then it was written in assembly language. Compilers were capable of translating programs from high
level languages, but they generated assembly language programs that were relatively inefficient as compared with
the same programs written by a programmer in assembly language. Programmers often found it necessary to
optimize the assembly language code created by a compiler to improve the performance and reduce the memory
requirements of the program.
This is no longer the case. Compilers have improved to the point that they can generate code comparable to,
or better than, the code most programmers can generate. Even if hand crafted optimizations could improve the
performance, there is little benefit derived from such a laborious activity. Many computers today execute so fast
and have enough memory that it is not necessary to optimize code at the assembly language level.
So, since it is increasingly rare for programmers to work at the assembly language level, why is it necessary
to learn assembly language? There are actually several reasons to study assembly language.
1. To understand or work on an operating system. Operating systems need to execute instructions which
can not be expressed in a high level language, so it is necessary that a portion of an operating system be
written in assembly language. Some instances when an operating system needs assembly language include:
initializing the hardware and data in the CPU at boot time, handling interrupts, low level interfaces with
hardware peripherals, and cases when a compiler’s protection features interfere with the needed operations.
3. Real time or embedded systems programming where there may be critical constraints for a program related
either to performance or available memory. In some cases with embedded systems, a compiler may not be
available.
4. To understand the internal working of a computer. Computer architecture can best be understood when
assembly language is used to supplement the study of computer architecture. Assembly language code does
not hide details about what the computer is doing.
required four or more parameters. As an example, the IBM System/370 has a single instruction that copies
a character string of arbitrary length from any location in memory to any other location in memory, while
translating characters according to a table stored in memory.
Computers which feature a large number of complex instructions are classified as complex instruction set
computers (CISC). Other examples of CISC computers include the Digital Equipment VAX and the Intel x86
line of processors. The DEC VAX has more than 200 instructions, dozens of distinct addressing modes and
instructions with as many as six operands.
The complexity of CISC was accommodated by the introduction of microprogramming, or microcode. Microcode
consists of low-level hardware instructions that implement the high-level instructions required by an architecture.
Microcode was placed in ROM or control-store RAM (which is more expensive, but faster, than the
ferrite-core memory used in many computers).
However, not all computer designers fell in line with the CISC philosophy. Seymour Cray, for one, believed that
complexity was bad, and continued to build the fastest computers in the world by using simple, register-oriented
instruction sets. Cray was a proponent of the Reduced Instruction Set Computer (RISC), which is the antidote
to CISC. The CDC 6600 and the Cray-1 supercomputer were the precursors of modern RISC architectures. In
1975, Cray made the following remarks about his computer design:
[Registers] made the instructions very simple. . . That is somewhat unique. Most machines have rather
elaborate instruction sets involving many more memory references in the instructions than the ma-
chines I have designed. Simplicity, I guess, is a way of saying it. I am all for simplicity. If it’s very
complicated, I cannot understand it.
Various technological changes in the 1980s made the architectural assumptions of the 1970s no longer valid.
• Faster (10 times or more) and cheaper semiconductor memory and integrated circuits began to replace
ferrite-core and transistor based discrete circuits.
• The invention of cache memories substantially improved the speed of non-microcoded programs.
• Compiler technology had progressed rapidly; optimizing compilers generated code that used only a small
subset of most instruction sets.
A new set of simplified design criteria emerged:
• Instructions should be simple unless there is a good reason for complexity. To be worthwhile, a new
instruction that increases cycle time by 10% must reduce the total number of cycles executed by at least
10%.
• Microcode is generally no faster than sequences of hardwired instructions. Moving software into microcode
does not make it better. It just makes it harder to modify.
• Fixed-format instructions and pipelined1 execution are more important than program size. As memory
becomes cheaper and faster, the space/time tradeoff is resolved in favor of time; reducing space no longer
decreases time.
• Compiler technology should simplify instructions, rather than generate more complex instructions. Instead
of adding a complicated microcoded instruction, optimizing compilers can generate sequences of simple,
fast instructions to do the job. Operands can be kept in registers to further increase speed.
What is RISC?
Assembly language programs occasionally use large sets of machine instructions, whereas high–level language
compilers generally do not. For example, SUN’s C compiler uses only about 30% of the available Motorola 68020
instructions. Studies show that approximately 80% of the computations in a typical program require only 20% of
a processor’s instruction set.
The designers of RISC machines strive for hardware simplicity, with close cooperation between machine
architecture and compiler design. In order to add a new instruction, computer architects must ask:
1 Pipelining refers to parallelizing the steps in the instruction execution loop. The next instruction is fetched and decoded
while the current instruction executes. We will discuss pipelining more when we study the Sun SPARC architecture.
• To what extent would the added instruction improve performance, and is it worth the cost of implementation?
• No matter how useful it is in an isolated instance, would it make all other instructions perform more slowly
by its mere presence?
The goal of RISC architecture is to maximize the effective speed of a design by performing infrequent functions
in software and by including in hardware only features that yield a net performance gain. Performance gains are
measured by conducting detailed studies of large high–level language programs.
RISC architectures eliminate complicated instructions that require microcode support.
RISC Architecture
The following characteristics are typical of RISC architectures. Although none of these are required for an
architecture to be called RISC, this list does describe most current RISC architectures, including the SPARC
design.
2. Hardwired control with little or no microcode: Microcode adds a level of complexity and raises the number
of cycles per instruction.
3. Load/Store, register-to-register design: All computational instructions involve registers. Memory accesses
are made with only load and store instructions.
4. Simple fixed-format instructions with few addressing modes: All instructions are the same length (typically
32 bits) and have just a few ways to address memory.
5. Pipelining: The instruction set design allows for the processing of several instructions at the same time.
6. High–performance memory: RISC machines have at least 32 general–purpose registers and large cache
memory.
7. Migration of functions to software: Only those features that measurably improve performance are imple-
mented in hardware. Software contains sequences of simple instructions for executing complex functions
rather than complex instructions themselves, which improves system efficiency.
8. More concurrency is visible to software: For example, branches take effect after execution of the following
instruction, permitting a fetch of the next instruction during execution of the current instruction.
The real keys to enhanced performance are single-cycle execution and keeping the cycle time as short as
possible. Many characteristics of RISC architectures, such as load/store and register-to-register design, facilitate
single-cycle execution. Simple fixed-format instructions, on the other hand, permit shorter cycles by reducing
decoding time.
[Figure: The CPU with its L1 cache connected to main memory by the bus.]
• Principle of locality. Spatial locality refers to the tendency of programs to access memory locations near
other recently accessed locations. Temporal locality refers to the tendency of programs to access the same data
several times in a short period of time.
• Smaller is faster.
These are the levels in a typical memory hierarchy. Moving farther away from the CPU, the memory in the
level becomes larger and slower.
When a memory lookup is required, the L1 cache is searched first. If the data is found, this is called a hit.
If the data is not in L1 cache, this is called a miss and the L2 cache is checked. If the data is not in the L2
cache, then the data is retrieved from main memory. When there is a miss at either the L1 or L2 cache, the data
retrieved from the next level is saved in the cache for future use. Cache hits make the program run much faster
than if all memory accesses must go to the main memory.
The connection between the CPU and main memory is called the front-side bus. A common design is for the
front-side bus to be divided into four channels. If the front-side bus speed is listed at 800 MHz, it is probably four
channels each running at 200 MHz. The connection between the CPU and the L2 cache is called the backside
bus.
[Figure: The memory hierarchy: Registers, L1 Cache, L2 Cache, Main Memory, Disk. Levels nearer the CPU use faster, more expensive memory; levels farther away provide a larger quantity of memory.]
Integer Variables
Unsigned variables that generally fall into the category of integers (char, short, int, long) are stored in straight
binary format, beginning with all zeros for zero up to all ones for the largest number that can be represented by
the data type.
The signed variables that generally fall into the category of integers (char, short, int, long) are stored in two’s
complement format. This ensures that the binary digits represent a continuous number line from the most negative
number to the largest positive number, with zero represented by all zero bits. The most significant bit is the
sign bit: it is one for negative numbers and zero for positive numbers.
    Decimal            32-bit        16-bit
-2,147,483,648       0x80000000
-2,147,483,647       0x80000001
     ...                ...
    -32,768          0xffff8000     0x8000
    -32,767          0xffff8001     0x8001
     ...                ...           ...
      -2             0xfffffffe     0xfffe
      -1             0xffffffff     0xffff
       0             0x00000000     0x0000
       1             0x00000001     0x0001
     ...                ...           ...
     32,767          0x00007fff     0x7fff
     ...                ...
 2,147,483,647       0x7fffffff
Any two binary numbers can thus be added together in a straightforward manner to get the correct answer.
If there is a carry bit beyond what the data type can represent, it is discarded.
1 0x0001
+(-1) + 0xffff
------ ---------
0 0x0000
To change the sign of any number, invert all the bits and add 1.
2 = 0x0002 = 000...010 ==> 111...101
+ 1
-----------
111...110 = 0xfffe = -2
[Figure: Memory maps of the byte 00010010 stored in a word at addresses X+4 through X+7. A Big Endian map labels addresses on the left (X+4, X+5, X+6, X+7); a Little Endian map labels them on the right (X+7, X+6, X+5, X+4).]
Byte Order
Not all computers store the bytes of a variable in the same order. The Intel x86 line of processors
stores the least significant byte at the lowest memory address (the right most position in a memory map) and the
most significant byte at the highest memory address. This scheme is called Little Endian.
Sun SPARC and most other UNIX platforms do the opposite. They store the most significant byte in the
lowest memory address. SPARC is thus considered a Big Endian machine. In a TCP/IP packet, the first
transmitted data is the most significant byte, thus the Internet is considered Big Endian.
The lowest memory address is considered the memory address for a variable. Hence we see a difference
between Little Endian and Big Endian when we draw memory maps. With Little Endian (Intel) we label the
location of an address on the right side of the map. With Big Endian (SPARC), labels are placed on the left side
of the map.
The term is used because of an analogy with the story Gulliver’s Travels, in which Jonathan Swift imagined
a never-ending fight between the kingdoms of the Big-Endians and the Little-Endians, whose only difference is
in where they crack open a hard-boiled egg.
11
IEEE FPS floating point formats:

Single Precision (32 bits): sign (1 bit), exponent (8 bits), mantissa (23 bits)
Double Precision (64 bits): sign (1 bit), exponent (11 bits), mantissa (52 bits)
3. The digits to the right of the binary point are then stored as the mantissa starting with the most significant
bits of the mantissa field. Because all numbers are normalized, there is no need to store the leading 1.
Note: Because the leading 1 is dropped, it is no longer proper to refer to the stored value as the mantissa.
In IEEE terms, this mantissa minus its leading digit is called the significand.
4. Add 127 to the exponent and convert the resulting sum to binary for the stored exponent value. For double
precision, add 1023 to the exponent. Be sure to include all 8 or 11 bits of the exponent.
5. The sign bit is a one for negative numbers and a zero for positive numbers.
6. Compilers often express FPS numbers in hexadecimal, so a quick conversion to hexadecimal might be
desired.
• 100 = 1100100 (binary) = 1.100100 x 2^6
  sign = 0, significand = 100100...,
  exponent = 6 + 127 = 133 = 10000101
management, and device control. It is not related to the user interface or utilities provided by the OS.
CPU when an interrupt is received. Thus, the reception of an interrupt is how user programs are suspended and
processing switched to the kernel.
Once the kernel gets control, it will want to save more registers from the user program, handle the hardware
event and check if work needs to be done related to internal operations such as memory or process management.
Then finally, the kernel will let a user program run again. In doing so, it will restore some registers and issue a
special instruction that causes the final registers to be restored and processing to switch back to the user program.
Since all the registers are restored, the user program never knows that it was interrupted.
There are three types of interrupts which the CPU recognizes.
Hardware Interrupt This is any type of hardware event, such as a key pressed on the keyboard, a hard disk
completing the reading or writing of data, or the reception of an ethernet packet. Many operating
systems program a clock to issue interrupts at regular intervals so that the kernel is guaranteed to get
control on a regular basis even if no hardware events occur and a user program never releases the CPU.
Software Interrupt When a user program needs to make a system call to the operating system, such as for
I/O or to request more memory, it may issue a special instruction called a software interrupt to cause
the CPU to switch processing to the kernel.
Trap A trap is issued by the CPU itself when it detects that something is wrong or needs special attention.
In most cases a trap is issued when a user program performs an illegal instruction such as a divide by
zero error or illegal memory reference. In the Sun SPARC, there are some traps which occur in normal
processing of a program.
Most of the kernel’s code is termed reentrant, meaning that it can safely be re-entered to handle additional
interrupts even while a previous interrupt is being processed. There are special assembly language instructions
to turn interrupts off or on. Interrupts are turned off in critical sections of the kernel where an interrupt could
cause memory corruption. While interrupts are turned off, pending interrupts are queued by the hardware and
delivered when interrupts are turned on again. A critical concern in operating system design is knowing when to
turn interrupts off and on. Interrupts should be left on except when absolutely necessary, so operating systems
use clever algorithms to make as much of the kernel reentrant as possible.
More will be discussed about operating systems as related to computer architecture and assembly language
later in the semester after more specifics of the processors and assembly language have been covered.