VLIW Architecture
Increasing Processor Performance

Semiconductor technology
Parallel processing
    Multiprocessors, multicomputers
    Parallelism within the processor
        Pipelining
        ILP
VLIW

ILP (Instruction Level Parallelism)

Parallel execution of instructions.
Overlapping of instructions.
ILP processors:
    Superscalar processors
    VLIW processors

Scalar Processors
Fetch and execute one instruction at a time.
A program represents a plan of execution.
The processor acts as an interpreter that executes the instructions in the program one at a time.

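As a rough illustration of the "processor as interpreter" view, the sketch below runs a toy program one instruction at a time. The tuple-based instruction layout, the opcode names and the `run_scalar` function are assumptions made for this example and do not correspond to any real ISA.

```python
def run_scalar(program, registers):
    """Execute a toy program one instruction at a time (hypothetical ISA)."""
    pc = 0          # program counter: index of the next instruction
    executed = 0    # number of instructions executed so far
    while pc < len(program):
        opcode, dest, src1, src2 = program[pc]   # fetch and decode
        if opcode == "add":                      # execute
            result = registers[src1] + registers[src2]
        elif opcode == "mul":
            result = registers[src1] * registers[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        registers[dest] = result                 # write back
        pc += 1                                  # move on to the next instruction
        executed += 1
    return executed


regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
program = [
    ("add", "r3", "r1", "r2"),   # r3 = r1 + r2
    ("mul", "r4", "r3", "r3"),   # r4 = r3 * r3
]
steps = run_scalar(program, regs)
```

Nothing in the loop overlaps: the second instruction is not even fetched until the first has written its result.
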
Execution in a Scalar Processor
[Figure: pipeline timing diagram with stages Fetch, Decode, Execute, Write Back]

Superscalar processors
Decisions about which operations to issue together are made by hardware.
More than one instruction at a time.
Dynamic scheduling.

Basic Superscalar Approach
[Figure: block diagram. Instruction cache; instruction buffers, decoders, dispatcher; register file; reorder buffer; execution units #1 to #4; data cache]

Execution in Superscalar
[Figure: pipeline timing diagram with stages Fetch, Decode, Execute, Write Back, at degree 4]

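The benefit of issuing several instructions per cycle can be estimated with a small calculation. The sketch below assumes an ideal pipeline with the four stages named in the diagram, no hazards or stalls, and issue groups that are always as full as possible; `ideal_cycles` is a hypothetical helper, not a model of any specific processor.

```python
import math


def ideal_cycles(num_instructions, stages=4, degree=1):
    """Cycles needed on an ideal pipeline that never stalls."""
    if num_instructions <= 0:
        return 0
    # Instructions enter the pipeline in groups of up to `degree`.
    groups = math.ceil(num_instructions / degree)
    # The first group needs `stages` cycles; each later group finishes one cycle after the previous one.
    return stages + groups - 1


scalar = ideal_cycles(12, stages=4, degree=1)
degree4 = ideal_cycles(12, stages=4, degree=4)
```

For 12 instructions this gives 15 cycles at degree 1 and 6 cycles at degree 4. Real programs fall short of this bound because dependent instructions cannot share an issue group.
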
Disadvantages of Superscalar
Complexity of hardware.
Constrained window size, which limits the capacity to detect independent instructions.
More power consumption.

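To see why a constrained window limits the independent instructions the hardware can find, consider the minimal sketch below. `Instr`, `issuable_in_window` and the register names are hypothetical; real issue logic also deals with register renaming and multi-cycle latencies, which are ignored here.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def issuable_in_window(program, window_size):
    """Indices of instructions in the window that depend on no earlier instruction in it."""
    window = program[:window_size]   # hardware only inspects this many instructions
    ready = []
    written = set()                  # registers written by earlier instructions in the window
    read = set()                     # registers read by earlier instructions in the window
    for i, ins in enumerate(window):
        raw = any(s in written for s in ins.srcs)   # read after write
        waw = ins.dest in written                   # write after write
        war = ins.dest in read                      # write after read
        if not (raw or waw or war):
            ready.append(i)
        written.add(ins.dest)
        read.update(ins.srcs)
    return ready


stream = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),    # needs r1 from the first instruction
    Instr("r6", ("r7", "r8")),
    Instr("r9", ("r10", "r11")),
]
small = issuable_in_window(stream, 2)
large = issuable_in_window(stream, 4)
```

With a window of 2 only the first instruction can issue; with a window of 4 the hardware also finds the third and fourth. A compiler looking at the whole program is not bound by such a window.
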
VLIW
Very Long Instruction Word.
Instructions are hundreds of bits in length.
Each long instruction is called a Multiop.
Multiple functional units are used concurrently.
Functional units share a common register file.
Code compaction is done by the compiler.

A Brief History
Joseph Fisher, trace scheduling, 1979.
He coined the acronym VLIW.
In 1984, two companies were started:
    Multiflow, started by Joseph Fisher
    Cydrome, founded by Bob Rau

In 1987, Cydrome delivered its first machine, the 256-bit Cydra 5.
Multiflow delivered:
    Trace/200 - 1987
    Trace/300 - 1988
    Trace/500 - 1990

Cydrome closed in 1988.
Multiflow closed in 1990.
Since then, VLIW machines have seen a revival and some degree of success.

Basic VLIW Approach
[Figure: block diagram. Instruction cache; instruction register; register file; execution units #1 to #4; data cache]

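Instruction Format
Instruction word fields: FP ADD | FP MULT | INT ALU | Branch | Load/Store
[Figure: the instruction issue unit sends each field to its functional unit (FP ADD, FP MULT, Int ALU, Branch, Load/Store); the units are connected to the register file]
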
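The compiler's job of compacting operations into this five-field format can be sketched as a greedy scheduler. This is a simplified illustration, not the algorithm of any particular compiler: it assumes every operation takes one cycle, considers only the true data dependences listed in `deps`, and fills unused fields with NOPs. The names `SLOTS` and `pack_multiops` and the sample operations are hypothetical.

```python
SLOTS = ("FP_ADD", "FP_MULT", "INT_ALU", "BRANCH", "LOAD_STORE")


def pack_multiops(ops):
    """Greedily place operations into long instruction words.

    `ops` is a list of (name, unit, deps) in program order, where `deps`
    names earlier operations whose results are needed.
    """
    cycle_of = {}    # operation name -> index of the word it was placed in
    multiops = []    # each word maps a slot name to an operation name or "NOP"
    for name, unit, deps in ops:
        # Earliest cycle in which all producers have finished (one-cycle latency assumed).
        cycle = max((cycle_of[d] + 1 for d in deps), default=0)
        while True:
            if cycle == len(multiops):
                multiops.append({slot: "NOP" for slot in SLOTS})
            if multiops[cycle][unit] == "NOP":
                break        # the required functional unit is free in this word
            cycle += 1       # unit busy: try the next word
        multiops[cycle][unit] = name
        cycle_of[name] = cycle
    return multiops


ops = [
    ("load_a", "LOAD_STORE", []),
    ("load_b", "LOAD_STORE", []),
    ("fmul", "FP_MULT", ["load_a", "load_b"]),
    ("iadd", "INT_ALU", []),
    ("fadd", "FP_ADD", ["fmul"]),
]
schedule = pack_multiops(ops)
nop_count = sum(1 for word in schedule for op in word.values() if op == "NOP")
```

The five operations fit in four words, but 15 of the 20 fields hold NOPs, which shows how much of a fixed-format word can go unused when the code has little parallelism.
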
VLIW Execution
[Figure: pipeline timing diagram with stages Fetch, Decode, Execute, Write Back, at degree 4]

Case Studies
Defoe.
Intel Itanium Processor.
Transmeta Crusoe Processor.

Defoe Architecture
[Figure: block diagram. Fetch and dispersal unit fed from the L2 cache; scoreboard; functional units: two simple integer, one complex integer, two load/store, one branch/compare; 64-entry register file; 16 predicate registers; D-cache connected to the L2 cache]

Instruction Encoding
64-bit compressed VLIW architecture.
Uses variable-length multiops.
Individual operations are encoded as 32-bit words.
A special stop bit indicates the end of an instruction word.

Operation format (32 bits):
Stop bit (1 bit) | Predicate (4 bits) | OPCODE (9 bits) | RDEST (6 bits) | RSRC 1 (6 bits) | RSRC 2 (6 bits)

Abhilash.P.K.

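The field widths above add up to exactly 32 bits, so one operation can be packed into a single word. The sketch below shows one way to do that and to split an operation stream into instruction words at each stop bit. The field widths come from the slide; placing the stop bit in the most significant position, and the helper names `encode_op`, `decode_op` and `split_multiops`, are assumptions made for illustration.

```python
# (field name, width in bits), listed from most to least significant.
# Widths follow the slide; the bit ordering is an assumption.
FIELDS = (("stop", 1), ("pred", 4), ("opcode", 9),
          ("rdest", 6), ("rsrc1", 6), ("rsrc2", 6))


def encode_op(**values):
    """Pack the named fields into one 32-bit operation word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_op(word):
    """Unpack a 32-bit operation word into its fields."""
    fields = {}
    for name, width in reversed(FIELDS):   # peel fields off the low end
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def split_multiops(words):
    """Group operation words into instruction words, ending each at a set stop bit."""
    multiops, current = [], []
    for word in words:
        current.append(word)
        if decode_op(word)["stop"] == 1:
            multiops.append(current)
            current = []
    if current:                            # trailing operations without a stop bit
        multiops.append(current)
    return multiops


a = encode_op(stop=0, pred=0, opcode=1, rdest=1, rsrc1=2, rsrc2=3)
b = encode_op(stop=1, pred=3, opcode=0x155, rdest=10, rsrc1=20, rsrc2=63)
c = encode_op(stop=1, pred=0, opcode=2, rdest=4, rsrc1=5, rsrc2=6)
groups = split_multiops([a, b, c])
```

Here `a` and `b` form one two-operation instruction word and `c` forms a one-operation word, with no NOP padding stored for the unused functional units.
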
Intel Itanium Processor

Intel's first implementation of IA-64.
IA-64 is an ISA for the EPIC (Explicitly Parallel Instruction Computing) style of VLIW, developed jointly by Intel and HP.

64-bit processor, with:

    4 integer units
    4 multimedia units
    2 load/store units
    2 extended-precision floating-point units
    2 single-precision floating-point units

Transmeta Crusoe Processor

Designed to reduce power consumption.
Dynamic scheduling consumes more power.
VLIW replaces the complex hardware mechanisms for extracting ILP with simpler, more power-efficient ones.

Instruction Format
Instructions are either 64 or 128 bits long.
Each instruction is called a molecule and is made up of RISC-like operations called atoms.
64 general-purpose registers (GPRs).

Compiler Support
Instruction scheduling algorithms are critical.
Three important scheduling algorithms:
    Trace scheduling
    Trace scheduling-2
    Super Block scheduling

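The first step of trace scheduling is choosing which path through the control-flow graph to treat as one long block. The sketch below covers only that selection step, in a simplified forward-only form: it seeds each trace with the most frequently executed unplaced block and follows the most likely successor. Scheduling the trace and inserting compensation code on the off-trace paths are not shown. `select_traces` and the profile numbers are hypothetical.

```python
def select_traces(block_counts, edge_probs):
    """Partition basic blocks into traces (simplified, forward-only selection).

    block_counts: block name -> estimated execution count.
    edge_probs:   block name -> {successor name: branch probability}.
    """
    unvisited = set(block_counts)
    traces = []
    while unvisited:
        # Seed the trace with the hottest block not yet placed (ties broken by name).
        block = max(sorted(unvisited), key=lambda b: block_counts[b])
        trace = []
        while block is not None:
            trace.append(block)
            unvisited.discard(block)
            # Follow the most likely successor that is still available.
            candidates = {s: p for s, p in edge_probs.get(block, {}).items()
                          if s in unvisited}
            block = max(sorted(candidates), key=candidates.get) if candidates else None
        traces.append(trace)
    return traces


# A diamond-shaped control-flow graph: A branches to B (likely) or C (unlikely), both reach D.
counts = {"A": 100, "B": 90, "C": 10, "D": 100}
probs = {"A": {"B": 0.9, "C": 0.1}, "B": {"D": 1.0}, "C": {"D": 1.0}}
traces = select_traces(counts, probs)
```

The likely path A, B, D becomes the first trace and can be scheduled as if it were a single block; the rarely executed block C is left in a trace of its own.
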
Advantages
Less hardware complexity.
Static scheduling.
Much more hardware can be devoted to useful computation.
Software has a larger window to look at, so it can find more ILP.

Shortcomings
Wasteful encoding with NOPs.
    Increased program size.
    Compiler has to explicitly add NOPs.
Hard to maintain code compatibility between generations.
    New versions of the architecture can force major rewriting of the compiler.

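The cost of NOP padding can be put into numbers with a short calculation. The sketch below compares a fixed-width encoding, where every field of every word is stored, with a compressed encoding that stores only real operations and marks word boundaries with a stop bit. The defaults of 5 slots and 32-bit operations are taken from earlier slides and combined here purely for illustration; `code_size_bits` is a hypothetical helper.

```python
def code_size_bits(multiops, slots=5, op_bits=32):
    """Return (fixed-width size, compressed size, NOP count) for a schedule.

    multiops: list of lists; each inner list holds the real operations of one word.
    """
    for word in multiops:
        if not 1 <= len(word) <= slots:
            raise ValueError("each word needs between 1 and `slots` operations")
    real_ops = sum(len(word) for word in multiops)
    fixed = len(multiops) * slots * op_bits   # every slot stored, NOPs included
    compressed = real_ops * op_bits           # only real operations stored
    nops = len(multiops) * slots - real_ops   # padding needed by the fixed format
    return fixed, compressed, nops


schedule_ops = [["load_a", "iadd"], ["load_b"], ["fmul"], ["fadd"]]
fixed, compressed, nops = code_size_bits(schedule_ops)
```

For this schedule the fixed format needs 640 bits, of which 15 fields are NOPs, while the compressed form needs 160 bits.
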
Future of VLIW
Newer processors are mainly used for:
    Stream and image processing, e.g. Philips TriMedia
    Digital signal processing, e.g. TMS320C62x from Texas Instruments
    Mobile computing, e.g. Transmeta Crusoe
    High-end server applications, e.g. Intel Itanium

Stream and media processing contain large amounts of ILP and so lend themselves to the VLIW style.
Superscalars will be forced to use simpler structures and seek help from software.