
Evolution of Microprocessor

A 30-year history of microprocessors
  Four generations of innovation

High-performance microprocessor drivers:
  Memory hierarchies
  Instruction-level parallelism (ILP)

Where are we and where are we going?
Focus on desktop/server microprocessors vs. embedded/DSP microprocessors

Microprocessor Generations
First generation: 1971-78
  Behind the power curve (16-bit, <50k transistors)

Second generation: 1979-85
  Becoming real computers (32-bit, >50k transistors)

Third generation: 1985-89
  Challenging the establishment (Reduced Instruction Set Computer/RISC, >100k transistors)

Fourth generation: 1990-
  Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)

In the beginning (8-bit) Intel 4004

First general-purpose, single-chip microprocessor
Shipped in 1971
8-bit architecture, 4-bit implementation
2,300 transistors
Performance < 0.1 MIPS (Million Instructions Per Second)
8008: 8-bit implementation in 1972
  3,500 transistors
  First microprocessor-based computer (Micral)
    Targeted at laboratory instrumentation
    Mostly sold in Europe

1st Generation (16-bit) Intel 8086


Introduced in 1978
  Performance < 0.5 MIPS

New 16-bit architecture
  Assembly language compatible with 8080
  29,000 transistors
  Includes memory protection, support for floating-point coprocessor

In 1981, IBM introduces the PC
  Based on the 8088, the 8-bit bus version of the 8086

2nd Generation (32-bit) Motorola 68000


Major architectural step in microprocessors:
  First 32-bit architecture
    Initial 16-bit implementation
  First flat 32-bit address
    Support for paging
  General-purpose register architecture
    Loosely based on PDP-11 minicomputer

First implementation in 1979
  68,000 transistors
  < 1 MIPS (Million Instructions Per Second)

Used in
  Apple Mac
  Sun, Silicon Graphics, & Apollo workstations

3rd Generation: MIPS R2000


Several firsts:
  First (commercial) RISC microprocessor
  First microprocessor to provide integrated support for instruction & data cache
  First pipelined microprocessor (sustains 1 instruction/clock)

Implemented in 1985
  125,000 transistors
  5-8 MIPS (Million Instructions Per Second)

4th Generation (64 bit) MIPS R4000


First 64-bit architecture
Integrated caches
  On-chip
  Support for off-chip, secondary cache
Integrated floating point
Implemented in 1991:
  Deep pipeline
  1.4M transistors
  Initially 100 MHz
  > 50 MIPS

Intel translates 80x86/Pentium X instructions into RISC internally

Key Architectural Trends


Increase performance at 1.6x per year (2x every 1.5 years)
  True from 1985 to present

Combination of technology and architectural enhancements
  Technology provides faster transistors (∝ 1/lithographic feature size) and more of them
    Faster transistors lead to high clock rates
    More transistors (Moore's Law)
  Architectural ideas turn transistors into performance
    Responsible for about half the yearly performance growth

Two key architectural directions
  Sophisticated memory hierarchies
  Exploiting instruction-level parallelism

Memory Hierarchies

Caches: hide latency of DRAM and increase bandwidth (BW)

Trend 1: Increasingly large caches
  CPU-DRAM access gap has grown by a factor of 30-50!
  On-chip: from 128 bytes (1984) to 100,000+ bytes
  Multilevel caches: add another level of caching
    First multilevel cache: 1986
    Secondary cache sizes today: 128,000 B to 16,000,000 B
    Third-level caches: 1998

Trend 2: Advances in caching techniques:
  Reduce or hide cache miss latencies
  Cache-aware combos: computers, compilers, code writers
  Prefetching: instruction to bring data into cache early
  Early restart after cache miss (1992)
  Nonblocking caches: continue during a cache miss (1994)

The 1st uP: Intel 4004

Introduced 1971
2,300 transistors
108 kHz, 60,000 ops/sec
16 pins
10-micron process
As powerful as the ENIAC, which had 18,000 tubes and occupied a large room
Targeted use: calculators
Cost: less than $100

Currently Popular Intel Pentium 4 (2.2GHz)

Introduced December 2001
55 million transistors
32-bit word size
2 ALUs, each working at 4.4 GHz
128-bit FPU
0.13-micron process
Targeted use: PCs and low-end workstations
Cost: around $600

Moore's Law
In 1965, Gordon Moore, one of the founders of Intel, predicted that the number of transistors on an IC (and therefore the capability of microprocessors) would double every year.
Later he revised the period to two years; a figure of 18 months is also widely quoted.
His prediction still held true in 2002. In fact, the time required for doubling is contracting toward the original prediction, and is closer to a year now.
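As a rough check on the doubling period, the sketch below computes the average time per doubling between two data points, assuming steady exponential growth. The function name is made up for illustration; the two data points are the transistor counts quoted elsewhere in these slides for the 4004 (1971) and the Pentium 4 (2001). The result is an average over the whole span and says nothing about the rate in any single year.

```python
import math

def doubling_time_years(count_start, year_start, count_end, year_end):
    """Average years per doubling, assuming steady exponential growth."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings

# Data points taken from the slides: 4004 (1971) and Pentium 4 (2001).
years = doubling_time_years(2_300, 1971, 55_000_000, 2001)
print(f"Average doubling time: {years:.2f} years")
```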

[Figure: Evolution of Intel Microprocessors. Vertical axis from 1,000 to 100,000,000 (logarithmic), horizontal axis from 1970 to 2005, with points for the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium 2, Pentium 3 and Pentium 4.]

4-, 8-, 16-, 32-, 64-bit (Word Length)

The 4004 dealt with data in chunks of 4 bits at a time
The Pentium 4 deals with data in chunks (words) of 32-bit length
The new Itanium processor deals with 64-bit chunks (words) at a time
Why have more bits (longer words)?

kHz, MHz, GHz (Clock Frequency)
The 4004 worked at a clock frequency of 108 kHz
The latest processors have clock frequencies in the GHz range
Of two uPs with similar designs, the one with the higher clock frequency will be more powerful
The same is not true for two uPs of dissimilar designs. Example: of a PowerPC and a Pentium 4 working at the same frequency, the former performs better due to superior design. The same holds for the Athlon uP when compared with a Pentium

Basic Components of Digital Computer

CPU
Memory
I/O

[Figure: CPU, Memory and I/O blocks. Could be a chip, a board, or several boards.]

Microcontrollers

[Figure: a single chip containing CPU, Memory (ROM, RAM) and I/O Subsystems: Timers, Counters, Analog Interfaces, I/O interfaces.]

A microprocessor system?
uPs are powerful pieces of hardware, but not very useful on their own
Just as the human brain needs hands, feet, eyes, ears and a mouth to be useful, so does the uP
A uP system is a uP plus all the components it requires to do a certain task
A microcomputer is one example of a uP system

[Figure: microprocessor block diagram. Blocks shown: RAM and I/O on the Memory Bus and System Bus, Bus Interface Unit, Data Cache, Instruction Cache, Instruction Decoder, Control Unit, Arithmetic & Logic Unit with Registers, Floating Point Unit with Registers.]

General-purpose microprocessor
CPU for computers
No RAM, ROM, I/O on the CPU chip itself
Examples: Intel's x86, Motorola's 680x0
Many chips on the motherboard

[Figure: General-Purpose Microprocessor System. The CPU (general-purpose microprocessor) shares a Data Bus and an Address Bus with separate RAM, ROM, I/O Port, Timer and Serial COM Port blocks.]

Microcontroller:
A smaller computer
On-chip RAM, ROM, I/O ports...
Examples: Motorola's 6811, Intel's 8051, Zilog's Z8 and PIC 16X

[Figure: Microcontroller. A single chip containing CPU, RAM, ROM, I/O Port, Timer and Serial COM Port.]

Microprocessor vs. Microcontroller


Microprocessor
  CPU is stand-alone; RAM, ROM, I/O, timer are separate
  Designer can decide on the amount of ROM, RAM and I/O ports
  Expensive
  Versatility
  General-purpose

Microcontroller
  CPU, RAM, ROM, I/O and timer are all on a single chip
  Fixed amount of on-chip ROM, RAM, I/O ports
  For applications in which cost, power and space are critical
  Single-purpose

Block Diagram

[Figure: microcontroller block diagram. Blocks shown: CPU, Interrupt Control (external interrupts), on-chip ROM for program code, on-chip RAM, Timer/Counter (Timer 0, Timer 1, counter inputs), OSC, Bus Control (address/data), 4 I/O Ports (P0, P1, P2, P3), Serial Port (TxD, RxD).]

8086 microprocessor
[Figure: 8086 microprocessor with Address Bus (20 lines, A19 to A0), Data Bus (16 lines, D15 to D0) and control signals.]

16-bit microprocessor? 16-bit data bus.

20-bit address bus?


It can address any one of 1,048,576 (= 2^20) memory locations/addresses.
Each memory location is one byte wide.
To store a 16-bit word, 2 memory locations are required.
If the first byte of the word is at an even address, the 8086 can read the entire word in one operation.
If the first byte of the word is at an odd address, the 8086 reads the first byte with one bus operation and the second byte with another bus operation.
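The even/odd rule above can be sketched as a small function. This is only an illustration of the rule as stated on the slide; the function name is hypothetical.

```python
ADDRESS_SPACE = 1 << 20  # 2^20 = 1,048,576 byte-wide locations

def bus_operations_for_word(address):
    """Bus operations the 8086 needs to read a 16-bit word starting at `address`."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address must fit in 20 bits")
    # Even start address: both bytes arrive in one operation.
    # Odd start address: the two bytes need one operation each.
    return 1 if address % 2 == 0 else 2

print(ADDRESS_SPACE)                     # 1048576
print(bus_operations_for_word(0x18000))  # 1 (even address)
print(bus_operations_for_word(0x18001))  # 2 (odd address)
```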

[Figure: Memory Address Space. Address lines A19 to A0 range from all 0s (00000H) to all 1s (FFFFFH): 1,048,576 memory locations = 1 MByte.]

8086 INTERNAL ARCHITECTURE

Two units:
1. BIU
2. EU

Fig: 8086 internal block diagram.

BIU and EU
The BIU (bus interface unit) sends out addresses, fetches instructions from memory, reads data from ports and memory, and writes data to ports and memory. In other words, the BIU handles all transfers of data and addresses on the buses for the execution unit.
The EU (execution unit) of the 8086 tells the BIU where to fetch instructions or data from, decodes instructions, and executes instructions.

Bus Interface Unit

Receives instructions & data from main memory
Instructions are then sent to the instruction cache, data to the data cache
Also receives the processed data and sends it to main memory

Floating-Point Unit (FPU)

Also known as the Numeric Unit
It performs calculations that involve numbers represented in scientific notation (also known as floating-point numbers). This notation can represent extremely small and extremely large numbers in a compact form
Floating-point calculations are required for graphics, engineering and scientific work
The ALU can do these calculations as well, but will do them very slowly

Registers
Both ALU & FPU have a very small amount of super-fast private memory placed right next to them for their exclusive use. These are called registers
The ALU & FPU store intermediate and final results from their calculations in these registers
Processed data goes back to the data cache and then to main memory from these registers

Control Unit
The brain of the uP
Manages the whole uP
Tasks include fetching instructions & data, storing data, managing input/output devices

Overview
Intel 8088 facts
20-bit address bus allows access to 1 M memory locations
16-bit internal data bus and 8-bit external data bus. Thus, it needs two read (or write) operations to read (or write) a 16-bit datum
Byte addressable and byte-swapping
  Word: 5A2F
  Memory location 18001: 5A (high byte of word)
  Memory location 18000: 2F (low byte of word)

[Figure: 8088 signal classification. VDD (5V), GND, CLK, 8-bit data, 20-bit address, control signals to the 8088, control signals from the 8088.]
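The byte-swapped layout of the word 5A2F at locations 18000 and 18001 can be reproduced with a short sketch. The helper names and the dictionary used as memory are illustrative assumptions; the point is that the low byte goes to the lower address, and that an 8-bit external bus moves the word as two separate byte transfers.

```python
def store_word(memory, address, word):
    """Store a 16-bit word as two bytes, low byte first (two 8-bit transfers)."""
    memory[address] = word & 0xFF             # low byte at the lower address
    memory[address + 1] = (word >> 8) & 0xFF  # high byte at the next address

def load_word(memory, address):
    """Rebuild the 16-bit word from its two bytes."""
    return memory[address] | (memory[address + 1] << 8)

memory = {}
store_word(memory, 0x18000, 0x5A2F)
print(f"{memory[0x18000]:02X}")  # 2F
print(f"{memory[0x18001]:02X}")  # 5A
print(f"{load_word(memory, 0x18000):04X}")  # 5A2F
```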

Organization of 8088/8086
[Figure: organization of the 8088/8086. Execution Unit (EU): general-purpose registers AH/AL, BH/BL, CH/CL, DH/DL; SP, BP, SI, DI; ALU on a 16-bit ALU data bus; flag register; EU control. Bus Interface Unit (BIU): segment registers CS, DS, SS, ES; IP; instruction queue; bus control. A 20-bit address bus and a 16-bit data bus connect to the external bus.]

General Purpose Registers


Data Group (16-bit registers; bits 15-8 form the high byte, bits 7-0 the low byte)
  AX = AH:AL   Accumulator
  BX = BH:BL   Base
  CX = CH:CL   Counter
  DX = DH:DL   Data

Pointer and Index Group
  SP   Stack Pointer
  BP   Base Pointer
  SI   Source Index
  DI   Destination Index

Arithmetic Logic Unit (ALU)


[Figure: ALU with two n-bit inputs A and B, function-select input F, result Y, and status outputs Carry, Y = 0?, A > B?]

F        Y
0 0 0    A + B
0 0 1    A - B
0 1 0    A - 1
0 1 1    A and B
1 0 0    A or B
1 0 1    not A

Signal F controls which function is performed by the ALU.
Signal F is generated according to the current instruction.
Basic arithmetic operations: addition, subtraction, ...
Basic logic operations: and, or, xor, shifting, ...
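A minimal software model of the function-select idea, using the six F codes from the slide. The 8-bit default width, the function name, and the way the carry flag is derived for subtraction (treated as a borrow) are assumptions made for this sketch, not a description of the actual 8086 circuitry.

```python
def alu(f, a, b, n_bits=8):
    """Return (y, carry, zero) for function code f applied to n-bit inputs a and b."""
    mask = (1 << n_bits) - 1
    if f == 0b000:
        raw = a + b
    elif f == 0b001:
        raw = a - b
    elif f == 0b010:
        raw = a - 1
    elif f == 0b011:
        raw = a & b
    elif f == 0b100:
        raw = a | b
    elif f == 0b101:
        raw = ~a
    else:
        raise ValueError("unsupported function code")
    y = raw & mask  # keep only n bits, as a hardware ALU would
    # Carry/borrow is only meaningful for the three arithmetic functions.
    carry = f in (0b000, 0b001, 0b010) and (raw > mask or raw < 0)
    return y, carry, y == 0

print(alu(0b000, 0xFF, 0x01))      # (0, True, True): 8-bit overflow sets carry
print(alu(0b011, 0b1100, 0b1010))  # (8, False, False)
```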

Flag Register

The flag register contains information reflecting the current status of the microprocessor. It also contains information which controls the operation of the microprocessor.

Flags, from bit 15 down to bit 0 (unused bits omitted): NT, IOPL, OF, DF, IF, TF, SF, ZF, AF, PF, CF

Control Flags
  IF: Interrupt enable flag
  DF: Direction flag
  TF: Trap flag

Status Flags
  CF: Carry flag
  PF: Parity flag
  AF: Auxiliary carry flag
  ZF: Zero flag
  SF: Sign flag
  OF: Overflow flag
  NT: Nested task flag

Instruction Machine Codes


Instruction machine codes are binary numbers
  For example: 1000101011000011 is MOV AL, BL

Machine code structure
  Opcode | Mode | Operand1 | Operand2
  In the example, the opcode is MOV and the mode is register mode

Some instructions do not have operands, or have only one operand

Opcode tells what operation is to be performed.
Mode indicates the type of an instruction: register type or memory type.
Operands tell what data should be used in the operation. Operands can be addresses telling where to get data (or where to store results).
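To make the opcode/mode/operand fields concrete, the sketch below decodes one narrow case: a two-byte register-to-register MOV in the standard 8086 encoding (opcode bits 100010, then a direction bit d and a width bit w, followed by a mode byte with mod, reg and r/m fields). The function name is hypothetical and every other instruction form is rejected. Note that the direction bit decides which register field is the destination.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def decode_mov_reg(code):
    """Decode a 16-bit register-to-register MOV (opcode 100010dw, mod = 11)."""
    opcode_byte, mode_byte = code >> 8, code & 0xFF
    if opcode_byte >> 2 != 0b100010:
        raise ValueError("not a MOV register/memory instruction")
    d = (opcode_byte >> 1) & 1   # 1: reg field is the destination
    w = opcode_byte & 1          # 1: 16-bit registers, 0: 8-bit registers
    mod = mode_byte >> 6
    reg = (mode_byte >> 3) & 0b111
    rm = mode_byte & 0b111
    if mod != 0b11:
        raise ValueError("only register mode is handled in this sketch")
    names = REG16 if w else REG8
    dest, src = (names[reg], names[rm]) if d else (names[rm], names[reg])
    return f"MOV {dest}, {src}"

print(decode_mov_reg(0b1000101011000011))  # MOV AL, BL
print(decode_mov_reg(0b1000100011000011))  # MOV BL, AL (direction bit cleared)
```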

EU Operation
1. Fetch an instruction from the instruction queue
2. According to the instruction, EU control logic generates control signals. (This process is also referred to as instruction decoding.)
3. Depending on the control signal, the EU performs one of the following operations:
   An arithmetic operation
   A logic operation
   Storing a datum into a register
   Moving a datum from a register
   Changing the flag register

[Figure: execution unit with general-purpose registers (AH/AL, BH/BL, CH/CL, DH/DL), SP, BP, SI, DI, the 16-bit ALU data bus, ALU, flag register and EU control. An instruction (1011000101001010) enters EU control, and the EU control logic generates the ALU control signal.]

Generating Memory Addresses

How can a 16-bit microprocessor generate 20-bit memory addresses?

    Segment: 16-bit register, shifted left 4 bits (0000 appended)
  + Offset:  16-bit register
  = 20-bit memory address

[Figure: Intel 80x86 memory address generation. Within the 1M memory space (00000 to FFFFF), a 64K segment starts at the segment address Addr1 and ends at Addr1 + 0FFFF; the offset selects a location inside the segment.]

Memory Segmentation
A segment is a 64KB block of memory starting from any 16-byte boundary
For example: 00000, 00010, 00020, 20000, 8CE90, and E0840 are all valid segment addresses
The requirement of starting from a 16-byte boundary is due to the 4-bit left shift

Segment registers in BIU

Each segment register is 16 bits wide (bits 15 to 0):
  CS   Code Segment
  DS   Data Segment
  SS   Stack Segment
  ES   Extra Segment

Memory Address Calculation

Segment addresses must be stored in segment registers
Offset is derived from the combination of pointer registers, the Instruction Pointer (IP), and immediate values

Memory address = segment address with 0000 (binary) appended + offset

Examples
    3 4 8 A 0    CS (shifted)
  +   4 2 1 4    IP
  = 3 8 A B 4    Instruction address

    1 2 3 4 0    DS (shifted)
  +   0 0 2 2    DI
  = 1 2 3 6 2    Data address

    5 0 0 0 0    SS (shifted)
  +   F F E 0    SP
  = 5 F F E 0    Stack address
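The three worked examples can be checked with a one-line calculation. The function name is an assumption made for this sketch; the final mask keeps the result within 20 bits, which models the address wrapping at 1 MB on a 20-bit bus.

```python
def physical_address(segment, offset):
    """Shift the 16-bit segment left by 4 bits and add the 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 address bits

print(f"{physical_address(0x348A, 0x4214):05X}")  # 38AB4 (CS:IP)
print(f"{physical_address(0x1234, 0x0022):05X}")  # 12362 (DS:DI)
print(f"{physical_address(0x5000, 0xFFE0):05X}")  # 5FFE0 (SS:SP)
```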

Fetching Instructions
Where to fetch the next instruction?
  8088: CS = 1234, IP = 0012, so the fetch address is 12340 + 0012 = 12352
  Memory location 12352 holds MOV AL, 0

Update IP
  After an instruction is fetched, register IP is updated as follows:
  IP = IP + length of the fetched instruction
  For example: the length of MOV AL, 0 is 2 bytes. After fetching it, the IP is updated to 0014
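A small sketch of the fetch step, using the values from the slide (CS = 1234, IP = 0012, a 2-byte instruction). The function name is illustrative, and wrapping IP at 16 bits is an assumption that follows from IP being a 16-bit register.

```python
def fetch_step(cs, ip, instruction_length):
    """Return (fetch address, updated IP) for one instruction fetch."""
    fetch_address = ((cs << 4) + ip) & 0xFFFFF
    next_ip = (ip + instruction_length) & 0xFFFF  # IP is a 16-bit register
    return fetch_address, next_ip

address, next_ip = fetch_step(0x1234, 0x0012, 2)  # MOV AL, 0 is 2 bytes long
print(f"{address:05X}")  # 12352
print(f"{next_ip:04X}")  # 0014
```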

Accessing Data Memory

There are a number of methods to generate the memory address when accessing data memory. These methods are referred to as addressing modes.

Examples:
Direct addressing: MOV AL, [0300H] (assume DS=1234H)
    1 2 3 4 0    DS (shifted)
  +   0 3 0 0    Offset given in the instruction
  = 1 2 6 4 0    Memory address

Register indirect addressing: MOV AL, [SI] (assume DS=1234H, SI=0310H)
    1 2 3 4 0    DS (shifted)
  +   0 3 1 0    SI
  = 1 2 6 5 0    Memory address
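The two addressing modes differ only in where the offset comes from. The sketch below models that difference; the function name and the convention of passing either a number (direct) or a register name (register indirect) are assumptions made for illustration.

```python
def data_address(ds, operand, registers):
    """Memory address for MOV AL, [operand].

    operand is an int for direct addressing, or a register name
    (for example "SI") for register indirect addressing.
    """
    if isinstance(operand, int):
        offset = operand             # direct: offset is in the instruction
    else:
        offset = registers[operand]  # register indirect: offset is in a register
    return ((ds << 4) + offset) & 0xFFFFF

registers = {"SI": 0x0310}
print(f"{data_address(0x1234, 0x0300, registers):05X}")  # 12640
print(f"{data_address(0x1234, 'SI', registers):05X}")    # 12650
```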

Reserved Memory Locations


Some memory locations are reserved for special purposes.
Programs should not be loaded in these areas.

Reset instruction area: FFFF0H to FFFFFH
  Used for system reset code

Interrupt pointer table: 00000H to 003FFH
  It has 256 table entries
  Each table entry is 4 bytes
  256 × 4 = 1024 bytes of memory addressing space, from 00000H to 003FFH
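The size arithmetic for the interrupt pointer table can be sketched as follows. The function and constant names are illustrative; the calculation only uses the figures given above (256 entries of 4 bytes, starting at 00000H).

```python
ENTRY_SIZE = 4     # bytes per table entry
ENTRY_COUNT = 256  # number of interrupt types

def table_entry_range(interrupt_number):
    """First and last byte address of the table entry for one interrupt."""
    if not 0 <= interrupt_number < ENTRY_COUNT:
        raise ValueError("interrupt number must be 0..255")
    first = interrupt_number * ENTRY_SIZE
    return first, first + ENTRY_SIZE - 1

print(ENTRY_COUNT * ENTRY_SIZE)                         # 1024
print([f"{a:05X}" for a in table_entry_range(0)])      # ['00000', '00003']
print([f"{a:05X}" for a in table_entry_range(255)])    # ['003FC', '003FF']
```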

Interrupts

An interrupt is an event that occurs while the processor is executing a program

The interrupt temporarily suspends execution of the program and switches the processor to executing a special routine (interrupt service routine)
When the execution of the interrupt service routine is complete, the processor resumes the execution of the original program

Interrupt classification
Hardware Interrupts
  Caused by activating the processor's interrupt control signals (NMI, INTR)

Software Interrupts
  Caused by the execution of an INT instruction
  Caused by an event which is generated by the execution of a program, such as division by zero

The 8088 can have 256 interrupts

Minimum and Maximum Operation modes

Intel 8088 (8086) has two operation modes:

Minimum Mode
  8088 generates control signals for memory and I/O operations
  Some functions are not available in minimum mode
  Compatible with 8085-based systems

Maximum Mode
  It needs the 8288 bus controller to generate control signals for memory and I/O operations
  It allows the use of the 8087 coprocessor; it also provides other functions

8086/8088 Functional Units

[Figure: 8086/8088 MPU, made up of the Execution Unit (EU) and the Bus Interface Unit (BIU). The BIU fetches opcodes, reads operands, and writes data.]

8086/8088 Internal Organisation


[Figure: 8086/8088 internal organisation. EU: registers AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, DI, temporary registers, ALU, flags, EU control. BIU: CS, DS, SS, ES, internal communications registers, summation block, instruction queue, bus control, 20-bit address bus, data bus, 8088 bus.]

8086/8088 20-bit Addresses


    CS: 16-bit segment base address, followed by 0000
  + IP: 16-bit offset address
  = 20-bit physical address

Exercise: 20-bit Addressing


1. CS contains 0A820h, IP contains 0CE24h. What is the resulting physical address?
2. CS contains 0B500h, IP contains 0024h. What is the resulting physical address?


i8086 Circuit - Maximum Mode

[Figure: 8086 CPU in maximum mode (MN/MX#). An 8284A clock generator (RDY input) supplies CLK, READY and RESET. An 8288 bus controller takes S0#, S1#, S2# and CLK and produces MRDC#, MWTC#, AMWC#, IORC#, IOWC#, AIOWC#, INTA#, DEN, DT/R# and ALE. Three 74LS373 latches (LE, OE#) take AD15:AD0, A19:A16 and BHE# from the multiplexed ADDR/DATA lines and output A19:A0 and BHE#. Two 74LS245 transceivers (DIR, EN#) output D15:D0. INTR and Vcc are also shown.]

8086/8 Maximum Mode


In maximum mode, the 8288 uses a set of status signals (S0, S1, S2) to rebuild the normal bus control signals of the microprocessor
  MRDC#, MWTC#, IORC#, IOWC# etc.
  Equivalent to MEMR# etc.

Look at some special signals briefly

RESET# Signal
The active-low RESET# signal puts the 8086/8 into a defined state
Clears the flags register, segment registers etc.
Sets the effective program address to 0FFFF0h (CS=0F000h, IP=0FFF0h)
8086/8 programs always start at 0FFFF0h after Reset has been asserted and removed
Continues into latest-generation CPUs

BHE# Signal (8086 Only)


The 8086 processor can address memory a byte at a time
Its data bus is 16 bits wide
It uses the BHE# signal and A0 (sometimes called BLE#) to address bytes using its 16-bit bus

Use of BHE#/A0(BLE#)
[Figure: byte-wide addressing (8088) shows consecutive addresses 00000, 00001, 00002, ... FFFFF in a single bank. The 8086 uses A19..A1 with two banks: odd addresses (00001, 00003, 00005, ... FFFFF) on D15:D8, selected by BHE#, and even addresses (00000, 00002, 00004, ... FFFFE) on D7:D0, selected by A0/BLE#.]

Use of BHE#/BLE#
BHE#   A0/BLE#   Selection
0      0         Whole word (16 bits)
0      1         High byte to/from odd address
1      0         Low byte to/from even address
1      1         No selection
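The selection table reduces to two independent active-low enables, one per byte lane. A minimal sketch, with a hypothetical function name and return format:

```python
def selected_lanes(bhe_n, a0):
    """Return (high_lane_enabled, low_lane_enabled) for the 16-bit data bus.

    Both signals are active low: BHE# = 0 enables D15:D8 (odd addresses),
    A0/BLE# = 0 enables D7:D0 (even addresses).
    """
    return bhe_n == 0, a0 == 0

for bhe_n in (0, 1):
    for a0 in (0, 1):
        print(bhe_n, a0, selected_lanes(bhe_n, a0))
```

With both lanes enabled the whole word is transferred; with neither enabled, nothing is selected.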

ALE and Address/data Bus Multiplexing


The 8086/8 multiplexes the address and data signals onto the same set of pins
Off-chip logic is needed to separate the signals
Transparent latches are designed just for address demultiplexing

ALE and 74HC373 Transparent Latch


[Figure: ALE timing. Traces for the clock, the address/data bus (address time followed by data time), ALE, and the output of the 74HC373 (the microcomputer address bus).]

[Figure: 74HC373 or equivalent. The address/data bus feeds In0:In7, outputs Q0:Q7 drive the system address bus, and ALE drives LE. The tri-state control signal OE# is shown connected to GND for simplicity.]

Use of ALE (Address Latch Enable)
ALE is used with an external latch (74HC373) to demultiplex the address and data lines
The 74HC373 is transparent when its LE input (connected to ALE) is high
When ALE goes low, the 373 holds the last data until ALE goes high again
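The transparent-latch behaviour can be modelled in a few lines. This is a behavioural sketch only: the class name is made up, the output enable is assumed to be permanently active, and the bus values are arbitrary examples.

```python
class TransparentLatch:
    """Behavioural model of a transparent latch such as the 74HC373."""

    def __init__(self):
        self.q = 0

    def update(self, le, d):
        """While LE is high the output follows D; while LE is low it holds."""
        if le:
            self.q = d
        return self.q

latch = TransparentLatch()
# Address time: ALE is high and the multiplexed pins carry the address.
print(f"{latch.update(le=1, d=0x2352):04X}")  # 2352
# Data time: ALE is low and the same pins now carry data; the address is held.
print(f"{latch.update(le=0, d=0x00B0):04X}")  # 2352
```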

8288 Bus Controller and Bus Transceivers


[Figure: the 8288 Bus Controller drives DEN# and DT/R# to the EN# and DIR inputs of two 74HC245 transceivers, which turn CPU[D15:D8] and CPU[D7:D0] into Buffered[D15:D8] and Buffered[D7:D0], going to the memory and I/O systems.]

The 8288 Bus Controller also generates direction and enable signals for bidirectional transceivers
Supports buffering the system data bus

8086 Read Cycle

8086 Write Cycle

8086 Read Cycle (1 Wait State)

8086/8088 Summary
First generation (introduced June 1978)
One of the first 16-bit processors on the market
16-bit internal registers
16/8-bit external data bus
20-bit address bus (1 MB addressable)
Used in 1st-generation IBM PCs (1981)

80186/80188
Evolution of 8086/8088 to 80186/80188
Increased instruction set
On-chip system components (clock generator, DMA, interrupt, timers)
Unsuccessful in PCs
Popular in embedded systems

2nd Generation Processor 286


P2 (286) = 2nd Generation Processor
Introduced in 1982
CPU behind the IBM AT
Throughput of the original IBM AT (6 MHz) was about 500% of the IBM PC (4.77 MHz)
Level of integration: 134k transistors (vs. 29k in 8086)
Still a 16-bit processor
Available in higher clock frequencies: 25 MHz

2nd Generation Processors 286


Fully backwards compatible with 8086
  80286 runs 8086 software without modification
Improved instruction execution
  Average instruction takes 4.5 cycles vs. 12 cycles (8086)
Improved instruction set
Real mode and protected mode
  Multitasking support. What happens in one area of memory doesn't affect other programs.
  Protected mode supported by Windows 3.0.
16 MB addressable physical memory
On-chip MMU (1 GB virtual memory)
Non-multiplexed address bus and data bus

Improving Computer Performance


We've seen how 16-bit computer technology based on the 8086 and 80286 processors developed
These computers are not powerful enough for today's applications
How do you improve the performance of your computer?
Let's start with the CPU

CPU Performance (1)


MOST OBVIOUS: Processor clock frequency
Increased frequency → increased execution rate
State of the art: >4 GHz (03/2005)
Memory and I/O access times can be a performance bottleneck unless you take some special measures

CPU Performance (2)


ALU register width
A processor is an n-bit processor, where n represents the precision of the ALU
n can be 4, 8, 16, 32, or 64
The wider the registers, the more processing per clock

Data bus width

The wider the data bus, the faster we can transfer data
Since the memory and I/O device access times are finite, the more bits transferred per cycle the better

CPU Performance (3)


Address bus width
Increased address width doesn't provide a speed increase as such
CPU can directly address more memory
PCs use big programs, which would not fit in a smaller address space
Overcoming a small address space takes time
  Impacts on overall system performance
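How much memory a given address bus can reach follows directly from its width: n address lines select 2^n byte locations. The sketch below applies this to the 20-bit bus of the 8086/8088, a 24-bit bus (matching the 16 MB quoted for the 286), and a 32-bit bus (matching the 4 GB quoted for the 386). The function name is illustrative.

```python
def addressable_bytes(address_bits):
    """Number of byte locations reachable with the given address bus width."""
    return 1 << address_bits

for bits in (20, 24, 32):
    size = addressable_bytes(bits)
    print(f"{bits}-bit address bus: {size:,} bytes = {size // 2**20:,} MB")
```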

3rd Generation Processor 386


P3 (386) = 3rd Generation Processor
Introduced: 10/1985
Full 32-bit processor
  (32-bit registers, 32-bit internal and external data bus, 32-bit address bus)
275k transistors. CMOS. 132-pin PGA package.
  (Supply current Icc = 400 mA. Roughly the same as the 8086!)
Clock speeds: 16-33 MHz
P3 processors were far ahead of their time:
  First 386 PCs in early 1987 (COMPAQ)
  It took 10 years before 32-bit operating systems became mainstream!

3rd Generation Processor 386


Modes of operation:
  Real. Protected. Virtual Real.

Protected mode of the 386 is fully compatible with the 286
  Protected mode = native mode of operation. Chips are designed for advanced operating systems such as Windows NT

New virtual real mode
  Processor can run with hardware memory protection while simulating the 8086's real-mode operation. Multiple copies of e.g. DOS can run simultaneously, each in a protected area of memory. If a program in one memory area crashes, the rest of the system is protected.

Intel 32-bit Architecture:IA-32


[Figure: 80386 block diagram. Addressing Unit (AU) driving Address; Bus Unit (BU) with Prefetch Queue carrying Data; Instruction Unit (IU); Execution Unit (EU) with ALU, Control Unit (CU) and Registers.]

The 80386 includes a Bus Interface Unit for reading and providing data and instructions, with a Prefetch Queue, an IU for controlling the EU with its registers, as well as an AU for generating memory and I/O addresses

80386 Features
32-bit general and offset registers
16-byte prefetch queue
Memory management unit with segmentation unit and paging unit
32-bit address and data bus
4 GB physical address space
64 TB virtual address space
i387 numerical coprocessor
Implementation of real, protected and virtual 8086 modes

80386 Operating Modes


Protected Mode for multitasking support
Real Mode (native 8086 mode)
  Processor powers up in Real Mode
System Management Mode
  Power management or system security
  Processor switches to a separate address space, while saving the entire context of the currently running program or task

80386 Register Set


[Figure: 80386 register set]
General-purpose registers (32 bits, bits 31 to 0; the low 16 bits are the 8086 registers)
  EAX (AX = AH:AL), EBX (BX = BH:BL), ECX (CX = CH:CL), EDX (DX = DH:DL)
  ESI (SI), EDI (DI), EBP (BP), ESP (SP)
Instruction pointer: EIP (32 bits); the low 16 bits are IP
Flag register: EFLAGS (32 bits); the low 16 bits are FLAGS
Segment registers (16 bits, bits 15 to 0): CS, SS, DS, ES, FS, GS

80386 Prefetch Queue

[Figure: the Bus Interface Unit fills a 16-byte-deep instruction queue over the 32-bit data bus; the Execution Unit reads from the queue.]

Fetching from the on-chip queue is fast
Reading from off-chip memory is slow

80386 Prefetch Queue


The 80386 prefetch queue is 16 bytes deep
1. The instruction fetch can read from the prefetch queue faster than from memory
2. The prefetcher can do some work while the execution unit is doing other tasks in parallel

Coprocessor: i387
The hardware implementation of floating point processing in the i387 means floating point operations run at much higher speed. The i386 can execute all mathematical expressions using software emulation of the i387.

80386: Classic CISC Processor


CISC = Complex Instruction Set Computer
Complex instructions
  ...but code-size efficient
Micro-encoding of the machine instructions
Extensive addressing capabilities for memory operations
Few, but very useful CPU registers

80386 Execution Sequence


[Figure: CISC processor with coprocessor. Blocks shown: Bus Interface, Prefetch Queue, Decoding Unit, Microcode ROM, Microcode Queue, Control Unit, Execution Unit with ALU and registers.]

In a microprogrammed CISC the processor fetches the instructions via the bus interface into a prefetch queue, which transfers them to a decoding unit. The decoding unit breaks the machine instruction into many elementary micro-instructions and applies them to a microcode queue. The micro-instructions are transferred from the microcode queue to the control and execution unit, which drives the ALU and the registers

80386 Complex Instructions


CISC drawback: Most instructions are so complicated that they have to be broken into a sequence of micro-steps
These steps are called micro-code
  Stored in a ROM in the processor core
  Micro-code ROM: access time and size...
They require extra ROM and decode logic
