The 80386 was Intel's first 32-bit microprocessor, with a 32-bit data bus and a 32-bit address bus. Through the 32-bit address bus, the 80386 addresses up to 4 Gbytes (= 2^32 bytes) of memory. The 32-bit data bus allows a single-precision floating-point number (32 bits) to be read from or written to memory in a single memory read or write cycle. This speeds up any program on the 80386 that manipulates real numbers. Most high-level language programs and database management systems use real numbers for data storage. The 80386 uses the 80387 numeric coprocessor to perform floating-point operations.

Oxford University Press 2013

Architecture of 80386

The internal architecture of the 80386 is divided into three units: the bus interface unit, the memory management unit and the central processing unit. The central processing unit is further divided into the execution unit and the instruction unit. The execution unit has eight general-purpose and eight special-purpose registers, which are used either for handling data or for calculating offset addresses. The instruction unit decodes the opcode bytes received from the 16-byte instruction queue and places them in a queue of three decoded instructions, from which they pass to the control section to derive the necessary control signals. The barrel shifter speeds up the execution of shift and rotate instructions. A 32-bit multiplication can be executed within one microsecond by the multiply/divide logic.

The memory management unit (MMU) consists of a segmentation unit and a paging unit. The segmentation unit allows the use of two address components, segment and offset, so that code and data can be relocated and shared by many programs. The maximum size of a segment is 4 Gbytes. The paging unit organizes the physical memory in pages of 4 Kbytes each. The paging unit works under the control of the segmentation unit (i.e., each segment is divided into pages).
The virtual memory is also organized in terms of segments and pages by the MMU. The 80386 requires a single +5 V power supply for its operation. The different versions of the 80386 run at clock frequencies of 16 MHz, 20 MHz, 25 MHz and 33 MHz.

[Figure: Functional block diagram of 80386]

Register organization of 80386

The general-purpose, pointer, index and flag registers of the 80386 are 32 bits wide, while the segment registers remain 16 bits wide. A 32-bit register, known as an extended register, is named by prefixing the 16-bit register name with E (for example, EAX). The 16-bit registers such as AX, BX and CX and the 8-bit registers such as AH, AL and BH are still available in the 80386, as in the 8086. Two additional segment registers, FS and GS, provide two further segments that a program can access. The 80386 includes a memory management unit (MMU) that allows memory resources to be allocated and managed by the operating system.

The segment descriptor registers are not available to the programmer; they are used internally to store segment descriptor information such as the base address, limit and attributes of the different segments. These registers are loaded automatically whenever the corresponding segment registers are loaded with new selectors. GDTR, IDTR, LDTR and TR are used to access the GDT, IDT, LDT and TSS descriptor respectively.
The registers DR0 to DR3 store four program-controllable breakpoint addresses at which execution of a program breaks, which makes it easy to debug a program using breakpoints. DR6 and DR7 hold breakpoint status and breakpoint control information respectively. Control register CR1 is reserved for use in future Intel processors. The bits PE and PG (bits 0 and 31) of CR0 enable protected mode operation and paging respectively. CR3 holds the base address of the page directory in memory. CR2 holds the linear address for which a page fault (the required page not being present in physical memory) has occurred; using this address, the operating system can load the required page into physical memory from secondary memory.
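The role of the PE and PG bits can be illustrated with a minimal Python sketch. It only manipulates an integer that stands in for CR0; the function names and the starting value are hypothetical, and the bit positions (0 and 31) are the ones given above.

```python
# Bit positions taken from the text: PE is bit 0 and PG is bit 31 of CR0.
CR0_PE = 1 << 0
CR0_PG = 1 << 31


def describe_cr0(cr0: int) -> dict:
    """Report which of the two CR0 features named in the text are enabled."""
    return {
        "protected_mode": bool(cr0 & CR0_PE),
        "paging": bool(cr0 & CR0_PG),
    }


def enable(cr0: int, mask: int) -> int:
    """Return a copy of the CR0 value with the given bit set, kept to 32 bits."""
    return (cr0 | mask) & 0xFFFFFFFF


cr0 = 0x00000000             # hypothetical starting value: real mode, paging off
cr0 = enable(cr0, CR0_PE)    # switch on protected mode
cr0 = enable(cr0, CR0_PG)    # switch on paging
print(hex(cr0), describe_cr0(cr0))
```

On real hardware the register is changed with privileged MOV instructions; the sketch only shows how the two bits combine in one 32-bit value.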
[Figure: Segment registers and MMU registers]
[Figure: Control registers in 80386 to Pentium]
[Figure: Debug registers and Test registers in 80386 to Pentium]
[Figure: Segment registers and their default offset registers in 80386 to Pentium processors]

Instruction set of 80386
The instruction set of the 80386 is upward compatible with the earlier 8086 and 80286 processors. The memory management instructions and techniques used by the 80386 are also compatible with the 80286. These features allow 16-bit software written for the 8086 and 80286 to run on the 80386 as well. The 80386 adds a few instructions that reference the 32-bit registers and manage the memory system: BSF, BSR, BT, BTR, BTS, CDQ, CWDE, LFS, LGS, LSS, MOVSX, MOVZX, SETcc, SHLD and SHRD.
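To give a feel for four of the instructions listed above, the following Python sketch models what MOVZX, MOVSX, BSF and BSR compute. It is an illustration of their arithmetic only, not an emulator: the function signatures are hypothetical, flags are ignored, and a zero source for BSF/BSR is reported as None because the real instructions leave the destination undefined in that case.

```python
def movzx(value: int, src_bits: int) -> int:
    """Zero-extend: keep the low src_bits, upper bits of the result are 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int, dst_bits: int = 32) -> int:
    """Sign-extend: copy the source sign bit into all upper result bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):      # source is negative
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


def bsf(value: int):
    """Bit scan forward: index of the least significant 1 bit."""
    if value == 0:
        return None                        # destination undefined on real hardware
    return (value & -value).bit_length() - 1


def bsr(value: int):
    """Bit scan reverse: index of the most significant 1 bit."""
    if value == 0:
        return None
    return value.bit_length() - 1


print(hex(movzx(0x80, 8)))   # 0x80
print(hex(movsx(0x80, 8)))   # 0xffffff80
print(bsf(0b101000), bsr(0b101000))   # 3 5
```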
Addressing memory by 80386 in protected mode

The base address is 32 bits (B31-B0) and the limit is 20 bits (L19-L0). If the granularity (G) bit is 1, the limit field is multiplied by 4K to get the size of the segment in bytes, so a segment can be up to 4 Gbytes; if G is 0, the limit field itself gives the size of the segment in bytes, up to 1 Mbyte. The available (AV) bit is left for the operating system, which can use it to indicate whether the segment is available (AV = 1) or not available (AV = 0). The D bit indicates how the 80386 to Pentium 4 access register and memory data in protected and real mode. If D is 0, the instructions are 16-bit instructions, compatible with the 8086 to 80286, and use 16-bit registers and 16-bit offsets by default. If D is 1, the instructions are 32-bit instructions, as in the 80386 to Pentium 4, and use 32-bit registers and 32-bit offsets by default.

[Figure: Addressing of memory when the 80386 to Pentium operates in protected mode]
[Figure: Format of segment descriptor in 80386 to Pentium]

Accessing memory by different addressing modes in protected mode

From the 80386 onwards, Intel introduced a new addressing mode called scaled index addressing, in which the content of the index register can be multiplied by a factor of 1, 2, 4 or 8 while calculating the effective address (EA).
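The scaled index calculation can be sketched in a few lines of Python. The function name and the register values in the example are hypothetical; the formula EA = base + index x scale + displacement, truncated to 32 bits, is the usual form of this addressing mode.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """EA = base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical example: element 5 of an array of 4-byte double words that
# starts at offset 0x1000, with a further displacement of 8 bytes.
ea = effective_address(base=0x1000, index=5, scale=4, displacement=8)
print(hex(ea))   # 0x101c
```

The scale factors match the operand sizes of 1, 2, 4 and 8 bytes, which is why the mode is convenient for indexing arrays of bytes, words, double words and quad words.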
[Figure: Accessing memory by different addressing modes in protected mode by 80386 to Pentium]

Physical memory organization of 80386

The physical memory system of the 80386 is 4 Gbytes. If virtual addressing is used through the LDT and GDT, 64 Tbytes (2^14 segment descriptors x 2^32 bytes/segment = 2^46 bytes = 2^6 x 2^40 bytes) of virtual memory are mapped onto the 4 Gbytes of physical memory by the memory management unit and the descriptors. The physical memory is divided into four 8-bit wide memory banks, each containing up to 1 Gbyte of memory. This 32-bit wide memory organization allows bytes, words or double words of memory data to be accessed directly. The physical memory addresses range from 00000000H to FFFFFFFFH. The location with address 00000000H is in bank 0, address 00000001H is in bank 1, address 00000002H is in bank 2, address 00000003H is in bank 3, and so on.

[Figure: Physical memory system of 80386]

Paging mechanism in 80386
The paging mechanism provides an efficient way of handling virtual memory. Paging is enabled or disabled by setting or clearing the PG bit in control register CR0 respectively. When paging is enabled, each segment is divided into fixed-size pages of 4 Kbytes each, and the address generated by the segmentation mechanism is known as the linear address. The information about each page is stored in a page table as a 4-byte entry known as a PTE (Page Table Entry). Each page table can store a maximum of 1024 (= 2^10) PTEs, so the size of a page table is 4 Kbytes. There can be a maximum of 1024 (= 2^10) page tables. The information about each page table is stored in a page directory as a 4-byte entry known as a PDE (Page Directory Entry). There is only one page directory; it can store a maximum of 1024 (= 2^10) PDEs, so the size of the page directory is also 4 Kbytes.

[Figure: Paging mechanism of 80386]
[Figure: Conversion of linear address to physical address by the paging mechanism in 80386]

80486 microprocessor

The main units of the 80486 are as follows:
The bus interface, which is connected to the external system bus and to the on-chip cache and prefetcher unit.
The prefetcher, which includes a 32-byte queue of prefetched instructions and is connected to the bus interface, cache, instruction decoder and segmentation unit.
The cache unit, which includes an 8 Kbyte cache storing both code and data, and cache management logic. It is connected through a 64-bit interunit transfer (data) bus to the segmentation unit, ALU and FPU. The cache unit is also directly connected to the paging unit, bus interface and prefetcher through 128 lines, permitting 16 bytes of instructions to be prefetched simultaneously. The cache is four-way set-associative and write-through, with 16 bytes per line.
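The cache figures quoted above (8 Kbytes, four ways, 16 bytes per line) fix the number of sets, and a short Python sketch makes the arithmetic explicit. The tag/set/offset split shown is the conventional one for a set-associative cache and is an assumption here, as are all the names; the source does not describe how the 80486 wires these fields.

```python
CACHE_BYTES = 8 * 1024     # 8 Kbyte unified cache (from the text)
WAYS = 4                   # four-way set-associative (from the text)
LINE_BYTES = 16            # 16 bytes per line (from the text)

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)    # 8192 / 64 = 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 4 bits select a byte in the line
INDEX_BITS = SETS.bit_length() - 1           # 7 bits select the set


def split_address(addr: int):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345678)
print(SETS, hex(tag), set_index, offset)   # 128 0x2468a 103 8
```

With 4 offset bits and 7 index bits, the remaining 21 bits of a 32-bit address form the tag that is compared against the four lines of the selected set.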
In the write-through policy, when a write operation hits in the cache, both the cache and main memory are updated together.

The instruction decode unit, which receives three bytes of undecoded instructions from the prefetcher queue and transmits decoded instructions to the control and protection test unit.
The control and protection test unit, which generates microinstructions that are transmitted to the other units and performs protection testing.
The ALU, which includes the general-purpose register file, a barrel shifter and registers for microcode use.
The FPU, which includes floating-point registers, an adder, a multiplier and a shifter.
The segmentation unit, which includes segmentation management logic, descriptor registers and breakpoint logic.
The paging unit, which includes paging management logic and a 32-entry TLB.

Pentium microprocessor

The Pentium has a five-stage integer pipeline, which branches into two paths, U and V, in the last three stages. The Pentium pipeline stages are as follows:
PF - Prefetch. The CPU prefetches code from the code cache and aligns it to the initial byte of the next instruction to be decoded.
D1 - First decode. The CPU decodes the instruction to generate a control word. A single control word causes direct execution of an instruction. More complex instructions require microcoded control sequencing.
D2 - Second decode. The CPU decodes the control word generated in stage D1 for use in the execute (E) stage that follows. In addition, addresses for data memory references are generated.
E - Execute. The instruction is executed in the ALU. The barrel shifter or other functional units are used if necessary. The data cache is accessed at this stage if necessary.
WB - Write back. The CPU stores the results and updates the flags at this stage.
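How the five stages overlap can be sketched with a small Python timing model. It is deliberately idealized: it assumes no stalls, and for the two-pipe case it assumes every pair of instructions can be issued together, which the real U/V pairing rules do not guarantee. The function names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "E", "WB"]   # stage names from the list above


def pipeline_chart(n_instructions: int):
    """One row per instruction, one column per clock, for a single ideal pipe."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name            # instruction i enters stage s at clock i + s
        rows.append(row)
    return rows


def cycles_needed(n_instructions: int, pipes: int = 1) -> int:
    """Clocks to finish n instructions when `pipes` instructions enter per clock."""
    groups = -(-n_instructions // pipes)  # ceiling division
    return groups + len(STAGES) - 1


for row in pipeline_chart(3):
    print(" ".join(f"{cell:>2}" for cell in row))
print(cycles_needed(4, pipes=1), cycles_needed(4, pipes=2))   # 8 6
```

Without pipelining, four instructions of five stages each would need 20 clocks; the model shows 8 clocks for one ideal pipe and 6 when both pipes are always filled.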
[Figure: Block diagram of Pentium]
[Figure: Integer pipeline stages in Pentium]

8-stage floating-point pipeline

PF - Prefetch. Instructions are prefetched from the code cache.
D1 - First decode. Same as in the integer pipeline.
D2 - Second decode. Same as in the integer pipeline.
E - Operand fetch. Operands are fetched either from the floating-point register file or from the cache.
X1 - First execute. First step of the floating-point execution by the FPU (Floating-Point Unit).
X2 - Second execute. Second step of the floating-point execution by the FPU.
WF - Write float. The FPU completes the floating-point computation and writes the result into the floating-point register file.
ER - Error reporting. The FPU reports internal special situations that might require additional processing to complete execution, and updates the floating-point status word.
[Figure: Physical memory (1 Gbytes) organization in Pentium]

Pentium Pro processor

The Pentium Pro processor, formerly named the P6 microprocessor, contains 21 million transistors, three integer units and a floating-point unit to increase the performance of most software. The initial versions of the Pentium Pro, made available in late 1995, had basic clock frequencies of 150 MHz and 166 MHz. In addition to the internal 16 Kbyte level-one (L1) cache (8K for code and 8K for data), the Pentium Pro also contains a 256 Kbyte level-two (L2) cache. The Pentium Pro uses three execution engines, so it can execute up to three instructions at a time; these instructions can conflict and still execute in parallel. The Pentium Pro can address either a 4 Gbyte or a 64 Gbyte memory system. It has a 36-bit address bus when configured for a 64 Gbyte memory system (2^36 bytes = 64 Gbytes).
Pentium II

The Pentium II processor was released in 1997. Instead of packaging it as a single integrated circuit, Intel placed the Pentium II on a small circuit board. The main reason for the change is that the L2 cache found on the main circuit board of a Pentium system was not fast enough to justify a new microprocessor. On the Pentium system, the L2 cache operates at the system bus speed of 60 MHz or 66 MHz. The microprocessor and L2 cache are on a circuit board called the Pentium II module. This on-board L2 cache operates at a speed of 133 MHz and stores 512 Kbytes of information. The microprocessor on the Pentium II module is actually a Pentium Pro with MMX (multimedia) extensions and no internal L2 cache.
Pentium II Xeon

In mid-1998 Intel introduced a new version of the Pentium II called the Xeon, designed specifically for high-end workstation and server applications. The main difference between the Pentium II and the Pentium II Xeon is that the Xeon is available with an L1 cache of 32 Kbytes and an L2 cache of 512 Kbytes, 1 Mbyte or 2 Mbytes. The Xeon functions with the 440GX chipset. The Xeon is also designed so that four Xeons can operate in the same system, as is possible with the Pentium Pro.
Pentium III

The Pentium III uses a faster core than the Pentium II, but it is still a P6 or Pentium Pro processor. The Pentium III is available in two versions: the Slot 1 version, mounted on a plastic cartridge, and a Socket 370 version called a flip-chip, which looks like the older Pentium package and costs less. The Pentium III is available with clock frequencies of up to 1 GHz. The Slot 1 version contains a 512 Kbyte cache and the flip-chip version contains a 256 Kbyte cache. The speeds of the two versions are comparable because the cache in the Slot 1 version runs at one-half the clock speed, while the cache in the flip-chip version runs at the full clock speed. Both versions use a memory bus speed of 100 MHz, while the Celeron uses a memory bus clock speed of 66 MHz. The front-side bus, which connects the Pentium III to the memory controller, PCI controller and AGP controller, now runs at either 100 MHz or 133 MHz.
Pentium 4

The Pentium 4 was released in late 2000. It replaces the P6 architecture of the Pentium Pro, Pentium II and Pentium III with Intel's NetBurst microarchitecture. The most visible differences are that the Pentium 4 is available in 1.3, 1.4 and 1.5 GHz versions, and that the chipset that supports it uses RAMBUS memory technology in place of SDRAM technology. These higher microprocessor speeds are made possible by an improvement in the size of the internal integration. The Pentium 4 contains an 8 Kbyte L1 cache, which may be increased to 32 Kbytes in future versions. The L2 cache remains at 256 Kbytes. The front-side bus speed is increased from the previous maximum of 133 MHz to 200 MHz or higher.

Summary

The 80186 has the same architecture as the 8086 but adds more on-chip peripherals: three timers, two DMA controllers, one interrupt controller, and peripheral and memory select logic.
Protected mode operation was introduced by Intel in the 80286 and exists in all later Intel processors, such as the 80386, 80486 and Pentium.
There are four privilege levels in protected mode operation, namely levels 00, 01, 10 and 11 (binary), of which level 00 is the highest privilege level and level 11 is the lowest.
In protected mode, a segment is accessed using a segment descriptor, which is present either in the GDT or in an LDT.
In an 80X86-based system there is only one GDT, and there are as many LDTs as there are tasks currently being executed by the CPU.
The segment descriptor contains the complete details of a segment, such as its base address, size, access rights and other information regarding the segment.
With the help of the task register and the task state segment, task switching is easily realized in the 80286 to Pentium.
From the 80386 onwards, the register size is increased to 32 bits and two additional segment registers, FS and GS, are present. These processors can also access 4 Gbytes of main memory.
The paging mechanism introduced from the 80386 onwards makes the handling of virtual memory easier.
The 80486 microprocessor has an on-chip 8 Kbyte unified cache and a floating-point unit.
The Pentium is the first two-issue superscalar processor introduced by Intel, made possible by its two integer pipelines, U and V.
The floating-point pipeline in the Pentium makes operations on floating-point numbers faster.
Key Terms

Real mode addressing: In real mode addressing, the 80286 to Pentium can access only 1 Mbyte of physical memory, like the 8086.
Protected Virtual Address Mode (PVAM) or protected mode addressing: In PVAM, the 80286 can access 16 Mbytes of physical memory and 1 Gbyte of virtual memory. In PVAM, the 80386 to Pentium can access 4 Gbytes of physical memory and 64 Tbytes of virtual memory.
Segment descriptor: An 8-byte entry in the LDT or GDT that contains the physical base address of a segment, the size of the segment and the access rights allotted to the segment.
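The 8-byte segment descriptor can be unpacked with a short Python sketch. The byte layout used is the standard 80386 one (limit bits 15-0 in bytes 0-1, base bits 23-0 in bytes 2-4, access rights in byte 5, G, D, AV and limit bits 19-16 in byte 6, base bits 31-24 in byte 7). The function name and the example descriptor are hypothetical, and the size calculation assumes an ordinary expand-up segment, where the limit is the last valid offset and the size is therefore one more.

```python
def decode_descriptor(raw: bytes) -> dict:
    """Unpack base, limit and the G, D and AV bits from an 8-byte descriptor."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is exactly 8 bytes")
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    g = bool(raw[6] & 0x80)
    # With G = 1 the limit counts 4 Kbyte units; the low 12 bits are all ones.
    last_offset = ((limit << 12) | 0xFFF) if g else limit
    return {
        "base": base,
        "limit": limit,
        "access_rights": raw[5],
        "G": g,
        "D": bool(raw[6] & 0x40),
        "AV": bool(raw[6] & 0x10),
        "size_bytes": last_offset + 1,    # assumes an expand-up segment
    }


# Hypothetical example: a 32-bit code segment covering the whole 4 Gbytes.
flat_code = bytes([0xFF, 0xFF, 0x00, 0x00, 0x00, 0x9A, 0xCF, 0x00])
info = decode_descriptor(flat_code)
print(hex(info["base"]), hex(info["limit"]), info["G"], info["D"],
      info["size_bytes"] == 2**32)
```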
Selector: In protected mode, the 16-bit value in a segment register is known as a selector; it is used to select one of the segment descriptors in the LDT or GDT.
GDT (Global Descriptor Table): The GDT contains the segment descriptors of those segments (belonging to the operating system, compiler, assembler, etc.) that can be used by all programs in a multi-user system. There is only one GDT in the system.
LDT (Local Descriptor Table): An LDT contains the segment descriptors of those segments that belong to a particular user or task. In a multi-user or multitasking environment, there are as many LDTs as there are users or tasks handled by the CPU.
IDT (Interrupt Descriptor Table): The interrupt descriptors for the different interrupt types, starting from interrupt type 00H, are stored successively in the IDT in memory.
Pointer: In PVAM, the 16-bit selector and the 16/32-bit offset are combined to form a 32/48-bit pointer.
GDTR (Global Descriptor Table Register): Holds the base address of the GDT in memory.
LDTR (Local Descriptor Table Register): Holds the base address of the LDT in memory.
IDTR (Interrupt Descriptor Table Register): Holds the base address of the IDT in memory.
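How a selector picks a descriptor from the GDT or LDT can be sketched in Python. The field layout is the standard one (bits 15-3 index, bit 2 table indicator with 0 for the GDT and 1 for the LDT, bits 1-0 requested privilege level), and each descriptor occupies 8 bytes. The function names and the table base addresses are hypothetical stand-ins for the values the processor keeps for the GDT and the current LDT.

```python
DESCRIPTOR_BYTES = 8   # each GDT/LDT entry is 8 bytes


def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into index, table indicator and RPL."""
    return {
        "index": (selector >> 3) & 0x1FFF,          # 13-bit descriptor index
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,                      # requested privilege level
    }


def descriptor_address(selector: int, gdt_base: int, ldt_base: int) -> int:
    """Address of the descriptor that the selector refers to."""
    fields = decode_selector(selector)
    base = ldt_base if fields["table"] == "LDT" else gdt_base
    return base + fields["index"] * DESCRIPTOR_BYTES


# Hypothetical table locations, for illustration only.
GDT_BASE, LDT_BASE = 0x1000, 0x2000
print(decode_selector(0x0008), hex(descriptor_address(0x0008, GDT_BASE, LDT_BASE)))
print(decode_selector(0x001F), hex(descriptor_address(0x001F, GDT_BASE, LDT_BASE)))
```

The 13-bit index gives 8192 descriptors per table, and two tables (GDT and LDT) give the 2^14 descriptors behind the 64 Tbyte virtual memory figure.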
Multitasking: The ability of a CPU to execute many user programs simultaneously.
Task switching: When the time slot for the execution of the current task is over, the current state of the CPU is saved in a task state segment (TSS); the CPU state for the next task is then loaded into the CPU registers from its TSS and execution begins.
TR (Task Register): Used to select one of the TSS descriptors from the GDT.
TSS (Task State Segment): The segment in which the current state of the CPU is saved during task switching.
Paging: An efficient method of handling virtual memory in which a segment is divided into equal-sized pages.
Linear address: When paging is enabled, the address obtained by adding the base address from the segment descriptor to the offset address.
Physical address: The address assigned to each location in the physical memory, such as RAM or ROM.
PD (Page Directory): Holds the page directory entries (PDEs) of the different page tables in memory. The base address of the PD is held in control register CR3.
PT (Page Table): Holds the page table entries (PTEs) of the different pages in memory.
PDE (Page Directory Entry): Contains the base address of a page table in memory and control information related to that page table.
PTE (Page Table Entry): Contains the base address of a page in memory and control information related to that page.
TLB (Translation Lookaside Buffer): A small cache memory, used with the paging mechanism, that holds the 32 most recently used linear addresses and their corresponding physical addresses.
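The terms linear address, PD, PT, PDE and PTE fit together in the two-level lookup sketched below in Python. The split of the linear address into a 10-bit directory index, a 10-bit table index and a 12-bit offset follows from the 1024-entry tables and 4 Kbyte pages, and bit 0 of each entry is the standard present bit. The toy memory, its addresses and all names are hypothetical; a real TLB would cache the result to avoid the two memory reads.

```python
PRESENT = 0x1              # bit 0 of a PDE or PTE: entry is present
FRAME_MASK = 0xFFFFF000    # upper 20 bits: 4 Kbyte aligned base address


class PageFault(Exception):
    """Raised with the faulting linear address, as CR2 would record it."""


def split_linear(linear: int):
    """Return (directory index, table index, offset) of a linear address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear: int, cr3: int, read_u32) -> int:
    """Walk the page directory and page table to get the physical address."""
    dir_index, table_index, offset = split_linear(linear)
    pde = read_u32((cr3 & FRAME_MASK) + 4 * dir_index)     # 4-byte entries
    if not pde & PRESENT:
        raise PageFault(linear)
    pte = read_u32((pde & FRAME_MASK) + 4 * table_index)
    if not pte & PRESENT:
        raise PageFault(linear)
    return (pte & FRAME_MASK) | offset


# Toy memory (hypothetical): page directory at 0x1000, one page table at
# 0x2000, and one mapped page whose frame starts at 0x5000.
memory = {
    0x1000 + 4 * 1: 0x2000 | PRESENT,   # PDE 1 -> page table at 0x2000
    0x2000 + 4 * 2: 0x5000 | PRESENT,   # PTE 2 -> page frame at 0x5000
}


def read_u32(address: int) -> int:
    return memory.get(address, 0)       # unmapped entries read as not present


print(hex(translate(0x00402034, cr3=0x1000, read_u32=read_u32)))   # 0x5034
```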
Unified cache: A cache that contains both code and data.
FPU (Floating-Point Unit): Used to perform floating-point operations quickly.
Two-issue superscalar execution: Two instructions are decoded and executed simultaneously.
Pipeline operation: Different stages of operation are performed on different instructions at the same time; that is, the fetching, decoding and execution of different instructions are overlapped.
Dual cache: Two cache memories, one holding code alone and the other holding data alone.