Features of 80386
The programming model consists of a set of “program-visible” registers that are used during
application programming. The figure illustrates the programming model of the i386 and i486
microprocessors. It consists of all the registers of the 8086 and 80286 microprocessors, plus 32-bit
extensions of the general-purpose, pointer, index, instruction pointer and flag registers. The 32-bit
extended registers are EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, EIP and EFLAGS. The segment
registers are still 16 bits wide, but the 80386 adds two segment registers, FS and GS. In addition to
the 16-bit “visible” portion, each segment register has a 64-bit (8-byte) program-invisible part (not
shown in the figure), which holds the segment descriptor corresponding to that segment register.
Other than the registers in the programming model, 80386 has many special registers such as control
registers, debug registers and test registers. All these are 32-bit registers.
Control Registers
There are four control registers: CR0 – CR3.
CR0 contains a number of special control bits that are used in memory paging, math coprocessor
selection, protected mode selection, etc. Memory paging is one useful feature which allows any linear
address to be assigned to any physical memory location in the system.
CR1 is not used in the 80386; it is reserved.
CR2 holds the 32-bit linear address that caused the most recent page fault.
CR3 holds the physical base address of the page directory.
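
As a rough illustration of how paging uses the page directory whose base address is held in CR3, the sketch below splits a 32-bit linear address into the three fields used by the 80386 paging unit with 4 KB pages: a 10-bit page directory index, a 10-bit page table index and a 12-bit offset. The field widths are not stated in the text above, and the function name split_linear_address is hypothetical.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into (directory index, table index, offset).

    Assumes 80386-style paging with 4 KB pages (10 + 10 + 12 bits).
    """
    linear &= 0xFFFFFFFF                      # keep only 32 bits
    directory_index = linear >> 22            # top 10 bits select a page directory entry
    table_index = (linear >> 12) & 0x3FF      # middle 10 bits select a page table entry
    offset = linear & 0xFFF                   # low 12 bits are the offset inside the page
    return directory_index, table_index, offset


print(split_linear_address(0x00403025))  # (1, 3, 37)
```

The directory index selects an entry in the page directory located through CR3; that entry locates a page table, and the table index selects the entry that gives the physical page frame.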
Debug Registers
There are eight debug registers: DR0 – DR7.
The first four debug registers (DR0–DR3) contain the 32-bit linear breakpoint addresses. These
breakpoint addresses point to instruction or data. These addresses are always compared with the
address generated by the program. If there is a match between the breakpoint address and the address
generated by the program, then the microprocessor causes a type-1 interrupt. This feature is an
extension of the basic tracing or single-step mechanism used in the earlier microprocessors.
DR4 and DR5 are reserved in the 80386 and are not used as breakpoint registers.
DR6 and DR7 contain a number of control bits which are used to control the way in which the
microprocessor responds to a breakpoint (debug or TRAP) interrupt.
Test Registers
There are two test registers: TR6 and TR7. They are used to test the translation lookaside buffer
(TLB) that is used with the paging unit in the 80386. The TLB holds the most commonly used page table
address translations, thereby reducing the number of memory read accesses needed for address
translation. The TLB in the 80386 holds 32 entries, and is tested with the TR6 and TR7 registers.
TR6 holds the tag field (linear address) of a TLB entry, and TR7 holds the physical address field of
that entry. Both registers also contain several bits that control the way in which the TLB is accessed.
In the protected mode operation of the 80386, a memory address consists of a 16-bit segment selector
and a 32-bit offset address. The selector points to a descriptor for the segment in the descriptor
table. The offset address specifies the location of the desired code or data within the segment. Since
the offset is 32 bits wide, the maximum size of a segment in the 80386 is 2^32 bytes = 4 gigabytes (GB).
The figure shows how the 80386 uses a selector to access a descriptor from the descriptor table, and how
it computes the physical address. The 16-bit selector which is contained in the segment register
consists of 13-bit index, 1-bit table indicator (TI) and 2-bit requested privilege level (RPL). If TI=0,
then the 13-bit index selects a segment descriptor from the global descriptor table. If TI=1, then the
13-bit index selects a segment descriptor from the local descriptor table. The 2-bit RPL is part of the
80386 processor’s built-in protection features, which we shall not discuss here.
The 13-bit index part of the selector is multiplied by 8 and used as a byte offset to a descriptor in
the descriptor table. Multiplication by 8 is done because each descriptor is 8 bytes long. The
descriptor contains, among other things, the 32-bit base address of the segment and the limit, or
maximum size, of the segment. The memory management unit (MMU) of the 80386 adds the 32-bit
base address from the descriptor to the 32-bit offset, or effective address, to generate the 32-bit
linear address of the code or data byte. When paging is disabled, this linear address is used directly
as the physical address.
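
The selector fields and the address arithmetic described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the descriptor base is passed in directly instead of being read from a descriptor table, and the limit and privilege checks performed by the processor are left out.

```python
def decode_selector(selector):
    """Split a 16-bit selector into (index, TI, RPL)."""
    selector &= 0xFFFF
    index = selector >> 3          # upper 13 bits: descriptor index
    ti = (selector >> 2) & 1       # table indicator: 0 = global table, 1 = local table
    rpl = selector & 0b11          # lowest 2 bits: requested privilege level
    return index, ti, rpl


def descriptor_offset(selector):
    """Byte offset of the descriptor inside its table (each descriptor is 8 bytes)."""
    index, _, _ = decode_selector(selector)
    return index * 8


def segment_address(base, offset):
    """Add the 32-bit segment base to the 32-bit offset, wrapping to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


print(decode_selector(0x0023))                    # (4, 0, 3)
print(descriptor_offset(0x0023))                  # 32
print(hex(segment_address(0x00100000, 0x1234)))   # 0x101234
```

Here the selector 0023h picks descriptor number 4 from the global descriptor table, which starts 32 bytes into the table.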
The size of physical memory in the 80386 is 4 GB. If virtual addressing is used, then 64 terabytes
(TB) of virtual memory can be mapped onto the 4 GB physical memory space (by swapping memory with
the hard disk). The figure shows the organization of the 80386 physical memory system.
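
The 64 TB figure can be checked with a short calculation. The sketch assumes the usual explanation: a task can address segments through two descriptor tables (global and local), each indexed by the 13-bit selector index, and each segment can be up to 4 GB. The constant names are illustrative.

```python
INDEX_BITS = 13            # width of the index field in a selector
TABLES = 2                 # global and local descriptor tables
SEGMENT_SIZE = 2 ** 32     # maximum segment size in bytes (4 GB)

segments = TABLES * 2 ** INDEX_BITS        # 16384 segments in total
virtual_bytes = segments * SEGMENT_SIZE    # 2**46 bytes

print(segments)                  # 16384
print(virtual_bytes // 2 ** 40)  # 64 (terabytes)
```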
The memory system is organized into four banks. Each bank is 8 bits wide and holds 1 GB of
memory. To store a 16-bit number, the lower byte is stored in one bank and the upper byte in
another bank. To store a 32-bit number, one byte is stored in each bank (8 bits × 4 = 32 bits). This
arrangement allows a byte, word or double-word that does not cross a 4-byte boundary to be accessed
in a single memory cycle.
Each memory byte is numbered in hexadecimal across the four banks, starting from 00000000h to
FFFFFFFFh, as shown in the figure. In the 8086 and 80286, there are two banks, which are enabled
using the A0 and BHE# signals. In the 80386, the four banks are individually enabled using four
bank enable signals, BE0#–BE3#.
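
The way the two low address bits and the transfer size determine which banks are enabled can be sketched as below. The function name bank_enables is hypothetical, and the sketch only handles byte, word and double-word transfers that fit inside one 4-byte group; a transfer that crosses the boundary would need a second bus cycle and is rejected here.

```python
def bank_enables(address, size):
    """Return (BE0#, BE1#, BE2#, BE3#) as active-low levels (0 = bank enabled).

    address: byte address of the first byte transferred
    size:    number of bytes transferred (1, 2 or 4)
    """
    start = address & 0b11                     # position inside the 4-byte group
    if size not in (1, 2, 4) or start + size > 4:
        raise ValueError("transfer does not fit in a single bus cycle in this sketch")
    # A bank is enabled (level 0) when its byte lies inside the transfer.
    return tuple(0 if start <= bank < start + size else 1 for bank in range(4))


print(bank_enables(0x1000, 4))  # (0, 0, 0, 0)  double-word: all four banks
print(bank_enables(0x1002, 2))  # (1, 1, 0, 0)  word in the upper two banks
print(bank_enables(0x1001, 1))  # (1, 0, 1, 1)  single byte in bank 1
```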
A pipeline is a special way of handling memory access, so that the memory gets additional time to
access data. Pipelining extends the memory access time from 50 ns (without pipeline) to 80 ns (with
pipeline), assuming 16 MHz clock frequency.
The pipeline works as follows: When an instruction is fetched from memory, the microprocessor often
has extra time before the next instruction needs to be fetched. During this extra time, the address of
the next instruction is sent out through the address bus, ahead of time. This extra time gives more
access time to slower memory devices.
Not all memory references can take advantage of the pipeline, because sometimes the microprocessor
needs to fetch code or data from a memory location that does not immediately follow the previous
memory location. In that case, the memory cycle will be non-pipelined. Overall, the pipeline is a
cost-saving feature: it gives the memory more time to respond, so slower and cheaper memory devices
can be used without inserting wait states.
In systems with higher clock frequencies, another technique, called cache memory, is used to increase
the speed of the memory cycle. A cache memory is a high-speed static memory (SRAM) that is placed
between the microprocessor and the DRAM main memory. SRAMs have access times of less than 10 ns, and
thus speed up the memory cycle.
The appropriate size of the cache memory depends on the type of application program that is running
on the microprocessor. If the program is small and works on small amounts of data, then a small cache
is beneficial. If the program is large and works on large blocks of data, then a large cache is
recommended. 80386-based computer systems often use a 256 KB cache.
An interleaved memory system is another technique for improving the speed of the system. It requires
two or more complete sets of address buses, and a controller that provides addresses for each bus.
Depending on the number of buses present, the system is either two-way or four-way interleaved.
In this technique, the memory system is divided into two or four parts. Suppose that there are two
parts. One part contains the 32-bit (4-byte) locations at addresses 00000000h–00000003h,
00000008h–0000000Bh, and so on. The other part contains the locations at addresses
00000004h–00000007h, 0000000Ch–0000000Fh, and so on.
While the microprocessor is processing the data from one part of the memory, the interleave
control logic generates the address for the next data which is in the other part of memory. In this way,
the memory devices get more access time, without the need for inserting wait-states in the memory
cycle. This speeds up the system.
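
The address split described above amounts to using the address bits just above the byte-select bits to choose the memory part. A minimal sketch, with a hypothetical function name interleave_part:

```python
def interleave_part(address, ways=2):
    """Return which part of an interleaved memory holds the given byte address.

    Each part is 4 bytes (32 bits) wide, so consecutive double-words
    alternate between the parts.
    """
    if ways not in (2, 4):
        raise ValueError("only two-way and four-way interleave are sketched here")
    return (address >> 2) % ways


for addr in (0x00000000, 0x00000004, 0x00000008, 0x0000000C):
    print(hex(addr), interleave_part(addr))
# 0x0 0
# 0x4 1
# 0x8 0
# 0xc 1
```

With two-way interleave, sequential double-word accesses alternate between part 0 and part 1, which is what gives each part extra time to complete its access.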
Input/Output System
The I/O system in the 80386 is similar to that of an 8086 microprocessor-based system. In isolated I/O,
each I/O location is given a 16-bit address, so altogether there are 64K (65,536) I/O locations. The I/O map of the
80386 is similar to the memory map, shown in page #5, except that, in I/O, the address ranges from
0000h to FFFFh.
As in the case of memory system, the I/O system is also organized into four banks. Even though most
of the I/O data transfers are 8-bit wide, 16-bit and 32-bit data transfers are used in the recent disk
drives and video display interfaces. The wider (32-bit) I/O data path increases the data transfer rate
between the microprocessor and the I/O devices.
A31–A2: The address bus carries the upper 30 bits of the 32-bit address. Address bits A1 and A0 are
not available as pins; instead, they are encoded into the bank enable signals (BE0#–BE3#).
D31–D0: The data bus consists of 32 lines for data transfer between the microprocessor and memory,
and between the microprocessor and I/O devices.
BE3#–BE0#: The bank enable signals select one bank (for byte access), two banks (for word
access) or all four banks (for double-word access).
M/IO#: This signal selects a memory device when it is at logic high, and an I/O device
when it is at logic low.
W/R#: This signal indicates a write operation when it is at logic high, and a read operation
when it is at logic low.
ADS#: The address strobe signal becomes active whenever the microprocessor issues a valid
memory or I/O address. It is similar to the ALE signal of the 8086.
RESET: The reset signal initializes the microprocessor, causing it to begin executing
software at memory location FFFFFFF0h.
CLK2: The clock times-2 pin is connected to a clock signal that is twice the operating frequency of
the microprocessor. For example, if the microprocessor operates at 16 MHz, then a 32 MHz
clock signal is connected to this pin.
BS16#: The bus size 16 signal selects a 16-bit data bus when it is at logic 0, and a 32-bit data bus
when it is at logic 1.
NA#: The next address signal causes the 80386 to output the address of the next instruction or
data during the current bus cycle. This is useful in pipelining.
Intel’s 80486 is a 32-bit microprocessor. It has a 32-bit data bus and a 32-bit address bus.
It has all the features of the 80386 microprocessor.
It is made up of over 1.2 million transistors.
It has a built-in numeric coprocessor similar to the 80387.
It has an 8 KB level-1 cache memory; it also supports an external (level-2) cache memory.
It can operate at clock frequencies of 25 MHz, 33 MHz, 50 MHz, 66 MHz and 100 MHz.
It has a parity generator and checker unit, which generates and checks parity for each byte of memory.
It has a built-in self-test (BIST) that tests the microprocessor and other parts during start-up.
It is packaged in a 168-pin pin grid array (PGA) package.
The Pentium processor uses a superscalar architecture. It executes instructions in five pipeline
stages, allowing the processor to overlap multiple instructions, so that it takes less time to execute
two instructions in a row. It has two separate 8 KB caches on chip: one for instructions and the other
for data. This allows the processor to fetch data and instructions simultaneously from the caches.
When data is modified, only the copy in the cache is changed; the copy in memory is changed only when
the processor writes the cache line back to DRAM.
The Pentium microprocessor uses branch prediction logic to reduce the time required for a branch
operation. A branch predictor is a digital logic circuit that guesses which way a branch will go before
this is known for sure.
Two-way branching is usually implemented with a conditional jump instruction. A conditional jump
can either be “not taken” and continue execution with the code which immediately follows the jump
instruction, or the jump can be “taken”, and the jump takes place to a different program memory
location. It is not known for sure whether the conditional jump will be taken or not taken, until the
condition has been calculated and the jump instruction has passed the execution stage in the
instruction pipeline.
Without branch prediction, the processor would have to wait until the conditional jump instruction has
passed the execution stage, before the next instruction can enter the fetch stage in the pipeline. The
branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump
is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched
and speculatively executed. If it is later detected that the guess was wrong then the speculatively
executed or partially executed instructions are discarded and the pipeline starts over with the correct
branch, incurring a delay.
There are two types of branch prediction: static and dynamic. Static prediction is the simplest
branch prediction technique because it does not rely on information about the history of the
executing code. Instead, it predicts the outcome of a branch based solely on the branch instruction
itself. Dynamic branch prediction uses information about taken and not-taken branches gathered at
run-time to predict the outcome of a branch. The Pentium processor uses dynamic branch prediction.
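
To make the idea of dynamic prediction concrete, the sketch below uses a 2-bit saturating counter per branch address, a common textbook scheme. It is an assumption chosen for illustration and is not claimed to be the exact mechanism inside the Pentium; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter branch predictor (illustrative only)."""

    def __init__(self):
        # branch address -> counter in 0..3; values 2 and 3 mean "predict taken"
        self.counters = {}

    def predict(self, branch_address):
        # Unknown branches start at 1 ("weakly not taken").
        return self.counters.get(branch_address, 1) >= 2

    def update(self, branch_address, taken):
        counter = self.counters.get(branch_address, 1)
        # Move towards 3 on a taken branch, towards 0 on a not-taken branch.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[branch_address] = counter


def count_correct(outcomes, branch_address=0x1000):
    """Run one branch through the predictor and count the correct guesses."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct


# A loop branch that is taken nine times and then falls through once.
print(count_correct([True] * 9 + [False]))  # 8
```

The predictor misses the first iteration (it has no history yet) and the final loop exit, and guesses the other eight outcomes correctly. A static predictor would make the same guess on every iteration regardless of this history.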
The Pentium processor has a separate 8 KB L1 cache for data and an 8 KB L1 cache for code; both
are located inside the chip.
The data cache is configurable as a write-back or write-through on a line-by-line basis. When the
cache is configured as write-back, the cache acts like a buffer by receiving data from the processor
and writing data back to main memory whenever the system bus is available. The advantage to the
write-back process is that the processor is freed up to continue with other tasks while main memory is
updated at a later time. The disadvantage to this approach is that, by having the cache handle
write-backs to memory, the cost and complexity of the cache consequently increase. The second
alternative is to configure the Pentium cache as write-through. In a write-through cache scheme, the
processor handles writes to main memory instead of the cache. The cache may update its contents as
the data comes through from the processor; however, the write operation does not end until the
processor has written the data to main memory. The advantage to this approach is that the cache
does not have to be as complex, which thus makes it less expensive to implement. The disadvantage
of course is that the processor must wait until the main memory accepts the data before moving on to
its next task.
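
The difference between the two policies can be illustrated by counting how many writes reach main memory. The model below is a deliberately simplified sketch: the class name SimpleCache is hypothetical, the cache never evicts lines, and write-back only happens on an explicit flush, whereas a real cache writes lines back on eviction or when the bus is free.

```python
class SimpleCache:
    """Toy cache model that counts writes to main memory (illustrative only)."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.lines = {}          # address -> cached value
        self.dirty = set()       # addresses modified but not yet written to memory
        self.memory = {}         # address -> value held in main memory
        self.memory_writes = 0

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            self.dirty.add(address)             # defer the memory update
        else:
            self._write_memory(address, value)  # write-through: update memory now

    def flush(self):
        """Write every dirty line back to main memory."""
        for address in sorted(self.dirty):
            self._write_memory(address, self.lines[address])
        self.dirty.clear()


def run(write_back, writes):
    cache = SimpleCache(write_back)
    for address, value in writes:
        cache.write(address, value)
    cache.flush()
    return cache.memory_writes, cache.memory


writes = [(0x100, 1), (0x100, 2), (0x100, 3), (0x104, 9)]
print(run(False, writes)[0])  # 4 memory writes with write-through
print(run(True, writes)[0])   # 2 memory writes with write-back
```

Both policies leave main memory with the same final contents; write-back simply combines the three writes to address 100h into one memory update.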
Cache consistency on the Pentium processor is maintained using the MESI (Modified, Exclusive,
Shared, Invalid) protocol. The protocol is used to decide whether a cache entry should be updated or
invalidated.
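
The state changes of a single cache line under the generic textbook MESI protocol can be sketched as a small function. This is a simplified model under stated assumptions, not a description of the Pentium's exact bus behaviour: the event names and the others_have_copy flag are hypothetical, and write-backs of modified data are implied rather than modelled.

```python
def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line.

    state: "M", "E", "S" or "I"
    event: "local_read", "local_write" (this processor) or
           "remote_read", "remote_write" (snooped from another bus master)
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("unknown state")
    if event == "local_read":
        if state == "I":
            # A miss loads the line as Shared if another cache holds it, else Exclusive.
            return "S" if others_have_copy else "E"
        return state                  # a hit leaves the state unchanged
    if event == "local_write":
        return "M"                    # the line now differs from main memory
    if event == "remote_read":
        # Any valid copy becomes Shared (a Modified line is written back first).
        return "I" if state == "I" else "S"
    if event == "remote_write":
        return "I"                    # another master owns the data now
    raise ValueError("unknown event")


state = "I"
for event in ("local_read", "local_write", "remote_read", "remote_write"):
    state = next_state(state, event)
    print(event, "->", state)
# local_read -> E
# local_write -> M
# remote_read -> S
# remote_write -> I
```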