
ECE 365

Introduction to the Design of


Digital Computers
Prof. John Lee
Office: SL 160C
Tel: 278-2267
Email: johnlee@iupui.edu


Dept. of Electrical and Computer Engineering
IUPUI
Let's break
Born in South Korea

Worked for the Agency for Defense
Development for 10 years
Received Ph.D. at Georgia Tech
Course Information
Welcome to a boring class, computer architecture
Web page: Oncourse CL
Will be constantly updated, so check it out regularly
Prerequisites: ECE 270 and 362
Meeting time: MW 3:00 - 4:15pm, SL 109
Office hours: MTuWTh 1:30 - 2:30pm
Textbooks
Main: C. Hamacher et al., Computer Organization, Fifth Edition,
McGraw-Hill, 2002. ISBN: 0-07-232086-9
Auxiliary: N. Carter, Computer Architecture, Schaum's Outlines,
McGraw-Hill, 2002. ISBN: 0-07-136207-X
Reference: J. Levine, Linkers & Loaders, Morgan Kaufmann
Pub., ISBN: 1-55860-496-0
Hamacher and Carter books
Objectives To Learn
Hardware organization of computer systems
Instruction Set Architecture (ISA)
Instruction set and design consideration
Addressing modes, stacks, subroutines
Arithmetic and Logic Unit design
Performance consideration
Memory organization
I/O interface design
Direct Memory Access (DMA)
Pipelining
Shared and distributed memory processors
Computer simulation of digital systems
I will try to cover as much as possible
Outcomes expected
Identify the components needed in digital computers
Use different addressing modes and different instructions to
develop assembly code programs
Design a control strategy for a digital computer using a
hardwired or microprogrammed approach
Analyze the I/O organization of a digital computer
Describe the function and operation of a memory system
Design an Arithmetic Logic Unit
Analyze the execution of instructions for a pipelined processor
Describe the function and analyze the performance of the
various components of a computer system
Grading
Attendance and participation (5%)
1 Project consisting of two parts (15%)
Groups of two
Simulation using Unix machine
Practice: VHDL and its tool experience
Actual: Exploring and evaluating computer systems mainly in terms of
cache performance
A few Unix commands are necessary
Simulation tools require high-performance workstations
Special computer architecture related projects are also welcome
Due date to be posted
Exams
50%: best three of four tests
In class, 30-60 min, at the beginning of class
Tentative dates: 1/28, 2/18, 3/17, 4/7
Final: 30%, 5/2 (Wed), 3:30-5:30pm
The final grade is a combination of absolute score and standing relative to your peers in class
Nevertheless, aim for an A+ or first in the class, and you will get an A or A+
Warning
You may need to memorize a lot of material
due to architectural operations of components
Course will probably be boring
because there is neither much math nor visible phenomena
mainly understanding how something works
Relies completely on PowerPoint slides
explanation, explanation and more explanation
might not be necessary to attend every class
if you miss class, you will probably be able to
understand by perusing the textbook
but do not miss intentionally
roll call at each class
Oncourse will be used heavily
Reasons to study ECE 365
Know more about computers
Get a better grade
Proceed to next levels and go to graduate school
Design and construct your own or better computer
Be good at fixing computers
Use them effectively
Enjoy more with them
Help others
Spend money wisely (e.g., buy better ones)
Get a better job
Earn money
Dream come true
More
Miscellaneous Policies
Homework: Assigned, but do not submit.
Self-study, already uploaded on Oncourse
Each set is considered assigned upon completion of the corresponding
chapter
It is due one week after that; the solution will be uploaded on the due date
Tests
No crib sheets allowed
A calculator is allowed and will be cross-checked for memory erase
Cheating
The corresponding test is zero with possible F for the course
The incident is reported to the school for further action
During exams, touching any wireless device is considered
cheating
Cell phones
Set them to vibration mode upon entering the class
Turn them off during exams
Borrowing another's calculator during an exam is not allowed
Exams will not be returned, but you will have a chance to check the grading
and scores
Please give me feedback immediately for any issue
Other Expectations
Arrive on time
Stay for the entire class
Violating the above two disturbs the rest of the class
If you cannot organize your time, you should not be in the class
Leaving and coming back during class is prohibited except in an
emergency
Reading newspapers in class is prohibited
Using a laptop for anything other than the class is prohibited
Do not be discourteous, abrasive, aggressive, or hard to get along
with
Do not monopolize the class discussion
Always be honest, appropriate and professional
Official email must be used, and you are expected to read it within 24 hours
Solutions to homework or tests must not be released to anyone outside
this class


For unstated minor things: please visit http://rights.iupui.edu
Any Questions?
More materials for college life can be found in
Oncourse:
How to succeed in college classes
Keys to college success

Chapter 1. Introduction
Rapidly changing field:
vacuum tube -> transistor -> IC -> VLSI
doubling every 1.5 years:
memory capacity
processor speed (due to advances in technology and organization)
Things you'll be learning:
how computers work at the instruction level
how to program a computer system
how components interface with each other
issues affecting modern processors (caches, pipelines)
Why learn this stuff?
as an EE or ECE major, you want to answer computer-related
questions from your family, friends and relatives
you want to play or work with computers, or build nice
software that people enjoy using (this needs performance)
you need to make a purchasing decision or offer expert advice
Rapid Growth of Size & Complexity
Year   Transistors   Processor
1979   29,000        8088
1982   134,000       286
1985   275,000       386
1989   1,290,000     486
1993   3.1M+         Pentium
1995   5.5M+         Pentium Pro
1997   7.5M+         Pentium II
1999   9.5M+         Pentium III
2000   42M           Pentium 4
2002   220M          Itanium 2
2004   592M          Itanium 2 (9MB cache)
2005   1.72B         Dual Core Itanium

Moore's Law
Transistor count will be doubled
every 18 months

Gordon Moore, Intel co-founder
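As a rough illustration of the doubling rule above, the sketch below projects a transistor count forward in time. The function name `projected_transistors` and the choice of the 1979 8088 figure as a starting point are assumptions made for this example; real chips do not follow the rule exactly.

```python
def projected_transistors(start_count, start_year, target_year, doubling_months=18):
    """Project a transistor count assuming one doubling every `doubling_months`."""
    months = (target_year - start_year) * 12
    doublings = months / doubling_months
    return start_count * 2 ** doublings

# Starting point taken from the slide above: 8088 (1979), 29,000 transistors.
for year in (1982, 1985, 1989):
    print(year, round(projected_transistors(29_000, 1979, year)))
```

Comparing the printed projections with the actual counts listed earlier gives a feel for how loosely or tightly the rule fits.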
Integrated Circuits Capacity
Feature Size
We are currently at 65nm and moving towards 45nm
Intel's Roadmap

Average Transistor Cost Per Year
What is a computer?
Components:
Processor(s)
Co-processors (graphics, security)
Memory (SRAM, DRAM, disk drives, CD/DVD)
input (mouse, keyboard, mic)
output (display, printer)
network
[Figure: block diagram labeled Processor, Cache, Interrupts, Memory-I/O bus, Main memory, I/O controllers, Disks, Graphics output, Network]
Interfacing Processors and Peripherals
I/O Design affected by many factors (expandability,
resilience)
Performance:
access latency
throughput
connection between devices and the system
the memory hierarchy
the operating system
A variety of different users (e.g., banks,
supercomputers, engineers)
P4: Prescott w/ 2MB L2 (90nm)

Prescott runs very fast (3.4+ GHz)
2MB L2 Unified Cache
12K* trace cache (think I$)
16KB data cache

Where is the cache?
Why the visual differences?
Why is it square?
What's with the colors?

[Figure: Prescott die photo with regions labeled TC (trace cache), L1D (data cache), and L2]
Intel Dual-core D-Series

AMD Dual-core

Intel Core 2 Duo
Shared L2 cache architecture
advantages
DCA vs. FSB Approach
DCA: Direct Connect Architecture through HyperTransport
NUMA (non-uniform memory access)
AMD Quad-core
Discrete L2, Shared L3 cache architecture
Tilera's Tile64 Processor

Know more about your computer?
HWMonitor
CPU-Z
PC.Wizard
I/O Devices
Very diverse devices
behavior (i.e., input vs. output)
partner (who is at the other end?)
data rate
Device                  Behavior          Partner   Data rate (KB/sec)
Keyboard (100cwpm)      input             human     0.01
Mouse                   input             human     0.02
Voice input (2kHz)      input             human     4.00
Scanner (USB 2.0)       input             human     900.00
Audio output (CD)       output            human     88.00
Line printer (940lpm)   output            human     5.00
Laser printer (17ppm)   output            human     1000.00
Graphics display        output            human     60,000.00
Modem                   input or output   machine   2.00-8.00
Network/LAN             input or output   machine   500-1,000,000.00
Floppy disk             storage           machine   100.00
Optical disk            storage           machine   1000.00
Magnetic tape           storage           machine   2000.00
Magnetic disk           storage           machine   2000.00-10,000.00
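To make the spread of data rates concrete, the sketch below estimates how long each of a few devices would need to move 1 MB at its listed rate. The dictionary holds a subset of the rates from the table above; the function name `transfer_seconds` is an assumption for illustration, and the model ignores latency and protocol overhead.

```python
# Data rates in KB/sec, copied from the table above (a small subset).
DATA_RATE_KB_PER_SEC = {
    "Keyboard": 0.01,
    "Floppy disk": 100.0,
    "Optical disk": 1000.0,
    "Graphics display": 60_000.0,
}

def transfer_seconds(size_kb, rate_kb_per_sec):
    """Time to move `size_kb` kilobytes at a sustained rate, ignoring latency."""
    return size_kb / rate_kb_per_sec

for device, rate in DATA_RATE_KB_PER_SEC.items():
    print(f"{device}: {transfer_seconds(1024, rate):.2f} s for 1 MB")
```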
Instruction Set Architecture Abstraction
Delving into the depths reveals more information
An abstraction omits unneeded detail to help us cope with complexity
What are some of the details that appear in these familiar abstractions?

High Level Language:
    main() {
        int i, b, c, a[10];
        for (i = 0; i < 10; i++)
            a[2] = b + c*i;
    }
Compiler
Assembly language:
    lw  r2, mem[r7]
    add r3, r4, r2
    st  r3, mem[r8]
Assembler
ISA
Instruction Set Architecture (ISA)
A very important abstraction
interface between hardware and low-level software
standardizes instructions, machine language bit patterns, etc.
advantage: different implementations of the same architecture
Ex: Intel and AMD
disadvantage: sometimes prevents using new innovations

Modern instruction set architectures:
x86 (IA-32), PowerPC (e.g. G4, G5)
XScale, ARM, MIPS
Intel/HP EPIC (IA-64), AMD64, Intel's EM64T, SPARC, HP PA-RISC,
DEC/Compaq/HP Alpha
Review Basic Structure of Computers
A contemporary computer is a fast electronic
calculating machine that
accepts digitized input information,
processes it according to a list of internally stored
instructions, and
produces the resulting output information.

Definitions
Computer architecture:
The functional operation of the individual hardware units in a
computer system and the flow of information among and the
control of those units.
Types of computers
Personal computer: schools, business offices, desktop
Portable laptop: used mainly for word processing, desktop
High performance workstations: graphics and I/O capability,
higher computational power, desktop
Mainframes: business data processing in medium to large
range corporations
Supercomputers: large scale numerical calculations.
Basic Components of a Computer
Component definitions
CPU (central processing unit)
arithmetic and logic circuits in conjunction with the
main control circuits (or simply the processor).
Memory
internal storage (DRAM main memory, SRAM cache)
I/O
input and output equipment; some standard equipment
provides both input and output functions.
Example of I/O equipment: keyboard, screen.
Information
Information fed to the computer is either in the form of data or
instructions.
Instructions are explicit commands that
Govern the transfer of information within a computer as well as
between the computer and its I/O devices.
Specify the arithmetic and logic operations to be performed.
Program: a set of instructions that perform a task
The program is usually stored in the memory.
The processor fetches the instructions one at a time and performs
the desired operation.
Data are numbers and encoded characters that are used as
operands by the operations.

Information handled by a computer must be encoded in a suitable
format.
Each number, character or instruction is encoded as a string of
binary digits called bits, each having the value 0 or 1.
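The encoding idea can be sketched in a few lines. The example below assumes an 8-bit width and the ASCII code for characters; both are choices made here for illustration, since the slide does not fix a particular format. The helper name `to_bits` is hypothetical.

```python
def to_bits(value, width=8):
    """Return the unsigned binary representation of `value` as a bit string."""
    return format(value, f"0{width}b")

number = 5
character = "A"

print(to_bits(number))          # bit pattern for the number 5
print(to_bits(ord(character)))  # bit pattern for the ASCII code of 'A'
```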
Operation of Computer Units
INPUT UNIT
Computers accept coded information through input units (read
data).
Example: the keyboard is wired so that whenever a key is
pressed, the corresponding digit is automatically translated
to its corresponding code and sent to the memory or to the
processor.
Other examples: joysticks, trackballs and mice.
OUTPUT UNIT
Computers send processed results to the outside world.
Example: monitors, printers
CONTROL UNIT
The task of the control unit is to coordinate the operation
between the memory, ALU and I/O units.
Clock and Timing
Clock: a circuit that generates a signal at a regular
interval
Timing signals: signals that determine when a given
action is to take place.
Data transfers between the processor and the
memory are also controlled by the control unit
through timing signals.
The control circuitry is usually distributed throughout
the machine.

Memory Unit
The function of the memory unit is to store programs and data.
Primary memory
Primary storage or main memory is a fast memory that contains a
large number of storage cells.
Each cell is capable of storing one bit of information.
Cells are processed in groups of fixed size called words.
For easy access a distinct address is associated with each word
location.
Main memory is organized such that the contents of one word can be
stored or retrieved in one basic operation.
A given word is accessed by specifying its address and issuing a
control command that starts the store or retrieval process.
Memories in which any location can be reached in a short and fixed
amount of time after specifying the address are called random-
access memories (RAM).
Secondary memory
Examples of secondary storage: disks, USB storage devices
Primary storage is fast but expensive, while secondary storage is large
but slow
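The word-addressed access described for primary memory can be modeled as a small class. This is a toy sketch, not a description of any real memory device: the class name `WordMemory`, the 32-bit default word size, and the method names are all assumptions for illustration.

```python
class WordMemory:
    """A toy word-addressable memory: one distinct address per word."""

    def __init__(self, num_words, word_bits=32):
        self.word_bits = word_bits
        self.mask = (1 << word_bits) - 1   # keeps stored values within one word
        self.cells = [0] * num_words

    def store(self, address, value):
        """Write one word at `address` in a single basic operation."""
        self.cells[address] = value & self.mask

    def retrieve(self, address):
        """Read one word from `address` in a single basic operation."""
        return self.cells[address]

mem = WordMemory(num_words=16)
mem.store(3, 1234)
print(mem.retrieve(3))
```

Any address can be reached by the same list-index operation, which mirrors the random-access property: the cost does not depend on which location is requested.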
ALU - Arithmetic and Logic Unit
Most computer operations are executed in the ALU.
Example: addition of two numbers located in the main
memory
The numbers are brought into the arithmetic unit.
Actual addition is carried out in the ALU
The sum is then stored in the memory or retained in
the processor for immediate use
Similarly, other arithmetic or logic operations are
initiated by bringing the required operands into the
ALU.
Not all operands are located in main memory;
some are kept in temporary
storage (registers) for frequent access.
Access to registers is 10 or more times faster
than access to memory.
Basic Operation Concepts
Example: Add the content of memory location LOCA to
the content of register R0 and place the sum in R0.

Assembly Instruction: Add LOCA,R0

The instruction Add LOCA,R0 combines a memory
access operation with an ALU operation.
The instruction is first fetched from main memory.
The operand at LOCA is then fetched and added to the
contents of R0.
The resulting sum is stored in R0.
An alternative way to implement the
instruction

In most computers, the above task is performed
using two instructions:
Load LOCA,R1
Add R1,R0

Advantage: the job is divided into small units so that multiple
hardware units can work in parallel.
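The two ways of performing the addition can be compared side by side. In this sketch memory and registers are plain dictionaries, and the function names and the values 7 and 5 are hypothetical; it only shows that both instruction sequences leave the same sum in R0.

```python
def add_memory_to_register(memory, regs):
    """Add LOCA,R0: one instruction that reads memory and adds."""
    regs["R0"] = regs["R0"] + memory["LOCA"]

def load_then_add(memory, regs):
    """Load LOCA,R1 followed by Add R1,R0."""
    regs["R1"] = memory["LOCA"]            # Load LOCA,R1
    regs["R0"] = regs["R0"] + regs["R1"]   # Add R1,R0

memory = {"LOCA": 7}
one_step = {"R0": 5, "R1": 0}
two_step = {"R0": 5, "R1": 0}
add_memory_to_register(memory, one_step)
load_then_add(memory, two_step)
print(one_step["R0"], two_step["R0"])
```

The two-instruction form uses an extra register (R1) but keeps the memory access and the ALU operation in separate steps.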
Processor block diagram
The processor contains
ALU
Control circuitry
IR register: holds the instruction that is currently being executed.
The IR contents are available to the control circuits which
generate timing signals.
PC register: keeps track of the execution of a program; it
contains the memory address of the next instruction to be
executed.
R0 through Rn-1: n general purpose registers.
MAR: memory address register; holds the address of the location
to or from which data are to be transferred.
MDR: memory data register; contains the data to be written into or read out of the
addressed location.
Instruction Execution Steps
The PC is set to point to the first instruction in the program.
The contents of the PC are transferred to the MAR.
A read control signal is sent to memory.
The addressed word (i.e., instruction) is read from memory and
loaded into the MDR.
The content of MDR is transferred to IR.
If the instruction involves an ALU operation, obtain the required
operands.
If an operand resides in memory, it has to be fetched by sending
its address to MAR and initiating a read cycle.
When the operand has been read into MDR, it will be transferred
to the ALU.
Instruction Execution Steps (cont.)
When all the operands are available, the ALU performs the
operation.
If the result is to be stored in memory, it is sent to the MDR and
its corresponding address is placed in the MAR and a write cycle
is initiated.
While an instruction is executed, the contents of the PC are
incremented so that the PC points to the next instruction.
Normal execution may be preempted with interrupts.
An interrupt is a request from an I/O device for a service by the
processor.
The processor responds by executing an interrupt service routine.
The state of the processor is saved prior to the interrupt service
routine and restored after the return from the interrupt service
routine.
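The fetch and execute steps listed above can be traced with a very small simulator. This is a sketch under several assumptions: instructions are stored as Python tuples rather than bit patterns, the `Store` and `Halt` operations and the addresses 10 and 11 are invented for the example, and interrupts are not modeled.

```python
def run(memory, regs, start):
    """Execute instructions from `memory` beginning at address `start`."""
    pc = start                      # PC points to the first instruction
    while True:
        mar = pc                    # PC contents go to the MAR
        mdr = memory[mar]           # read: the addressed word arrives in the MDR
        ir = mdr                    # MDR contents go to the IR
        pc += 1                     # PC now points to the next instruction
        op, src, dst = ir
        if op == "Halt":
            return regs
        if op == "Load":            # memory operand -> register
            mar = src
            mdr = memory[mar]
            regs[dst] = mdr
        elif op == "Add":           # register + register -> register
            regs[dst] = regs[src] + regs[dst]
        elif op == "Store":         # register -> memory (write cycle)
            mar, mdr = dst, regs[src]
            memory[mar] = mdr
        else:
            raise ValueError(f"unknown operation {op!r}")

LOCA, SUM = 10, 11                  # hypothetical data addresses
memory = {
    0: ("Load", LOCA, "R1"),
    1: ("Add", "R1", "R0"),
    2: ("Store", "R0", SUM),
    3: ("Halt", None, None),
    LOCA: 7,
    SUM: 0,
}
regs = run(memory, {"R0": 5, "R1": 0}, start=0)
print(regs["R0"], memory[SUM])
```

The local variables `mar`, `mdr`, and `ir` stand in for the registers of the same names, so each assignment corresponds to one transfer in the step list.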
Bus Structures
For performance purposes, a computer is organized such that all of its
units can handle one full word of data at a given time.
All bits in a word are transferred in parallel over a group of wires.
Bus: a group of wires that connects several devices.
Buses usually carry data, address and control signals.
Single bus configuration: all units are connected to this bus.
Only one unit can control the bus at any given instant.
Low cost, new devices can be easily added on the bus.
Systems that contain multiple buses achieve more parallelism (better
performance at an increased cost).
Possible hierarchical structure
Software
In order for a user to enter and run an application, the computer
must already have some system software, which is a collection of
programs used to perform the following functions.
Receive and interpret user commands.
Enter and edit application programs.
Store files in secondary storage.
Manage the storage and retrieval of files in secondary storage.
Run standard applications such as spreadsheets.
Control I/O units to receive input information and produce
results.
Translate programs from source prepared by the user into
object form (machine instructions).
This system software is known as the BIOS and/or the OS
Software Development Environment
Application programs are usually written in a high-
level language (Basic, C, C++, C#, etc.) independent
of a particular computer.
The text entry and editing system software allows
users to write a source program and store it in a file.
A system software program, the compiler, translates the
high-level language into machine language for a specific
computer.
A linker combines user-written applications with
existing standard libraries so that they can run.

Operating Systems
The operating system (another system software) is a collection
of routines used to control the sharing and interaction among
different computer units.
The OS assigns resources such as main memory and disk space to
individual tasks, moves data between memory and disk, and
handles I/O operations.
Example: part of a program's task involves
Reading a data file from the disk into the main memory
Performing some computation on the data
Printing the result
When execution of the program reaches the point where the data
file is needed, the program asks the OS to transfer the
data file from disk to memory.
When computation is completed, the application transfers
control to the OS.
An OS routine is used to print the results.
Computer time usage
Comments for Computer Time Usage
The disk and the processor are idle during most of
the time period t4 to t5.
The operating system can load the next program to
be executed into memory while the printer is
operating.
The operating system is responsible for using the
resources as efficiently as possible when several
application programs are to be executed.

Performance
The total time needed to execute an application program is
the most important measure of performance.
Performance is affected by
The choice of machine language instructions.
The design of the hardware that constitutes the computer.
The way in which the compiler translates programs into
the machine language.
At the start of execution, all instructions and data are
stored in the main memory.
The processor clock cycle is an important parameter
Performance analysis
Clock rate: cycles per second, hertz (Hz).
Example: 200 million cycles per second = 200 megahertz (MHz).
A computer with a higher clock rate executes programs faster in
general.

EXECUTION TIME ANALYSIS
Let T be the time required to execute a program that is written
in a given high level language.
Complete execution of the program requires the execution of N
machine language instructions, N is the total number of
instructions executed including repeated instructions.
Let S be the average number of basic steps per machine
instruction.
Performance analysis (cont.)
Let R be the clock rate in cycles per second
$$T = \frac{N \times S}{R}$$
The performance parameter T for an application program is
more important than the value of R.
Rule: minimize N and S, and maximize R.
How?
A substantial improvement in performance can be achieved by
allowing execution of instructions to overlap.
=> pipelining
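A short calculation shows how N, S, and R combine into the execution time T. The program size, step count, and clock rate below are hypothetical values chosen only to make the arithmetic easy to follow, and treating overlap as a drop in the effective S to 1 is an idealization.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """T = (N x S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz

# Hypothetical program: 1 million instructions, 4 steps each, 200 MHz clock.
baseline = execution_time(1_000_000, 4, 200e6)
# Same program if overlapping instructions brought the effective S down to 1.
overlapped = execution_time(1_000_000, 1, 200e6)
print(baseline, overlapped)
```

With N and R held fixed, T scales directly with S, which is why overlapping the steps of successive instructions pays off.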
Cache memory
The internal speed of instruction execution is very high. It is
considerably faster than the speed at which instructions and
data are fetched from the main memory
Performance can be improved by minimizing the movement of
instructions and data from the main memory.
HOW?
Processing is done by bringing instructions and data from the
main memory into the cache when they are first needed.
Subsequent requests for the same data and instructions (e.g.
loop) are satisfied from the cache.
Access time to the cache is much faster than to the main
memory.
Due to the limited storage capacity of the cache, the information
in the cache must be replaced when new data and instructions
are needed.
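The behavior described above (first access goes to main memory, repeated accesses are served from the cache, old entries are replaced when space runs out) can be sketched as follows. The slide does not say which replacement policy is used; least-recently-used replacement, the fully associative layout, and the class name `TinyCache` are assumptions made for this example.

```python
from collections import OrderedDict

class TinyCache:
    """A small fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)      # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict least recently used
            self.lines[address] = main_memory[address]
        return self.lines[address]

main_memory = {addr: addr * 10 for addr in range(100)}
cache = TinyCache(capacity=4)
# A loop that touches the same three addresses five times.
for _ in range(5):
    for addr in (0, 1, 2):
        cache.read(addr, main_memory)
print(cache.hits, cache.misses)
```

Only the first pass through the loop misses; every later pass is served from the cache because the three addresses fit within its capacity.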

Other Performance Issues
Transferring program and data files from
secondary storage (disks) into the main memory takes additional time
It is possible to perform transfers to and from secondary
storage in parallel with processing
The use of pipelining and parallelism leads to increased program
execution rates.

PARALLEL and DISTRIBUTED COMPUTING
Computer systems have evolved from machines based on a
single processing unit into configurations that contain a number
of processors.
Computer systems with multiple processors are useful because
large computations can often be partitioned into a set of tasks
and some of these can be executed in parallel (concurrently).
A cluster of computing machines cooperates to
solve a large problem by dividing it into small pieces (divide-and-conquer).
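Partitioning a computation into tasks can be sketched with Python's standard thread pool. This only illustrates the task structure (split, compute the pieces, combine): the function names are hypothetical, and threads in CPython will not necessarily run this arithmetic faster than a single loop.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One task: sum a single piece of the input."""
    return sum(chunk)

def parallel_sum(values, workers=4):
    """Split `values` into chunks, sum each chunk as a separate task, then combine."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum(list(range(1, 101))))
```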