A property of MVG_OMALLOOR
Analog Devices
SHARC
CS 433
Processor Presentation Series
Prof. Luddy Harrison
Overview
• Processor History
• Physical packaging
• Data paths, register files, computational units
• Pipelining, timing information
• Memory
• Instruction Set Architecture (ISA)
• Applications targeted
• Systems employing the SHARC
SHARC Features
• ADSP-2106x (2000)
– Single computational units based on predecessor ADSP-2100 Family
– 40 MHz core
• ADSP-2116x (2001)
– SIMD (Single-Instruction Multiple-Data) dual computational unit architecture added
– 150-200 MHz core, 1-2 MB RAM
• ADSP-2126x, ADSP-2136x (2003 – Future)
– Integrated audio-centric peripherals (128-140 dB Sample Rate Conversion) added
– 333-400 MHz core, 2-3 MB RAM
CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois
ADSP-2106x Overview
[Block diagram: core processor (timer, instruction cache, data register file), dual-ported SRAM (two independent dual-ported blocks, BLOCK 0 and BLOCK 1, each with a processor port and an I/O port), PM and DM data buses with bus connect (PX), and the I/O side (host port, multiprocessor interface, data bus mux, IOP registers, DMA controller).]
ADSP-2106x Core
• Computational Units
– ALU, Multiplier, and Shifter can all perform independent operations in a single cycle
• Register File
– Two sets (primary and alternate) of 16 registers, each 40 bits wide
• Program Sequencer and Data Address Generators
– Allow computational units to operate independently of instruction fetch and program counter increment
ADSP-2106x Packaging
[Figure: ADSP-2106x system connections. Signal groups shown: clock (CLKIN), boot EPROM (EBOOT, LBOOT, BMS), interrupts, flags and timer (IRQ, FLAG, TIMEXP), external memory and peripherals (ADDR31-0, DATA47-0, RD, WR, ACK, MS3-0, PAGE, SBTS, SW, ADRCLK), link devices (LxCLK, LxACK, LxDAT), two serial devices (TCLKn, RCLKn, TFSn, RFSn, DTn, DRn), DMA devices (DMAR1-2, DMAG1-2), host processor interface (HBR, HBG, REDY, CS), and multiprocessing (RPBA, ID2-0, BR1-6, CPA).]
• IRQ2-0 Interrupt Request Lines: edge-triggered or level-sensitive
ADSP-2106x Registers
• Data Registers
– R15 – R0 (fixed-point), F15 – F0 (floating-point)
• Program Sequencer
– PC (program counter), PCSTKP (PC stack pointer), FADDR (fetch address), etc.
• Data Address Generator
– I7 – I0 (DAG1 index), M7 – M0 (DAG1 modify)
– L7 – L0 (DAG1 length), B7 – B0 (DAG1 base)
• Bus Exchange, Timer, and System Registers
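The index, modify, length, and base registers are what make circular buffers cheap on a DSP. The following is a minimal Python sketch of post-modify circular addressing, assuming the usual wraparound rule (the index stays within the range from B to B + L - 1). The class name `Dag` and its method are hypothetical illustrations, not part of any Analog Devices tool or API.

```python
class Dag:
    """Toy model of one DAG register set: index I, modify M, length L, base B."""

    def __init__(self, base, length, modify):
        self.b = base      # B: first address of the circular buffer
        self.l = length    # L: number of locations (0 means linear addressing)
        self.m = modify    # M: step added to I after each access
        self.i = base      # I: current address

    def post_modify(self):
        """Return the current address, then advance I with wraparound."""
        addr = self.i
        nxt = self.i + self.m
        if self.l > 0:
            # Wrap back into [B, B + L) for a circular buffer.
            nxt = self.b + (nxt - self.b) % self.l
        self.i = nxt
        return addr


dag = Dag(base=0x100, length=4, modify=1)
addresses = [dag.post_modify() for _ in range(6)]
print([hex(a) for a in addresses])
```

With a length of 4, the fifth access wraps back to the base address instead of running past the end of the buffer.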
ADSP-2106x Buses
• Address
– Program Memory Address – 24 bits wide
– Data Memory Address – 32 bits wide
• Data
– Program Memory Data – 48 bits wide
  – Carries instructions and data for dual-fetches
– Data Memory Data – 40 bits wide
  – Carries data operands
ADSP-2106x I/O
• Serial Ports
– Operate at clock rate of processor
• DMA
– Port data can be automatically transferred to and from on-chip memory
ADSP-2106x DMA
• I/O port block transfers (link/serial)
• External memory block transfers
• DMA Channel setup by writing memory buffer parameters to DMA parameter registers
– Starting Address for Buffer
– Address Modifier
– Word Count
• Interrupt generated when transfer completes (i.e. Word Count = 0)
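The three channel parameters above are enough to describe a block transfer. As a rough sketch (a software model only, with a hypothetical function name and a Python list standing in for on-chip memory), the channel walks the buffer by the address modifier and signals completion when the word count reaches zero:

```python
def dma_transfer(source, memory, start, modifier, count):
    """Copy `count` words from `source` into `memory`, DMA-style.

    start    -- starting address for the buffer
    modifier -- address increment applied after each word
    count    -- word count; the transfer ends when it reaches 0
    Returns True to stand in for the completion interrupt.
    """
    address = start
    words = iter(source)
    while count > 0:
        memory[address] = next(words)
        address += modifier
        count -= 1
    return count == 0  # "interrupt": word count has reached zero


memory = [0] * 16
done = dma_transfer(source=[10, 20, 30], memory=memory, start=4, modifier=2, count=3)
print(done, memory)
```

A modifier of 2 places the incoming words at every other location, which is how a single channel can fill an interleaved buffer.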
[Figure: DMA control register, bits 15-0. Fields shown:]
• DEN: DMA enable
• CHEN: DMA chaining enable
• TRAN: DMA channel direction
• PS: packing status
• DTYPE: data type
• PMODE: packing mode
• MSWF: most significant word first
• MASTER: DMA master mode
• HSHAKE: DMA handshake
• INTIO: single-word interrupts
• EXTERN: ext. devices to ext. mem.
• FLSH: flush ext. port FIFO
• FS: ext. port FIFO
ADSP-2106x Pipelining
• Three phases
– Fetch
  – Read from cache or program memory
– Decode
  – Generate conditions for instruction
– Execute
  – Operations specified by instruction completed
Pipelining
• Branches
– Delayed
  – Two instructions after branch are executed
– Non-delayed
  – Program sequencer suppresses instruction execution for next two instructions

CLOCK CYCLES →

Non-delayed
Fetch     n+2    j       j+1     j+2
Decode    n+1    n+2     j       j+1
Execute   n      no-op   no-op   j

Delayed
Fetch     n+2    j       j+1     j+2
Decode    n+1    n+2     j       j+1
Execute   n      n+1     n+2     j
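The difference between the two branch types comes down to what happens to the two instructions already in the pipeline when the branch at n executes. A small Python sketch of the Execute row, using the slide's labels (n, n+1, n+2, j) as plain strings; the function name is hypothetical:

```python
def executed_sequence(delayed):
    """Instructions reaching Execute for a branch at n that targets j.

    n+1 and n+2 are already in the pipeline when the branch executes.
    """
    in_flight = ["n+1", "n+2"]
    if not delayed:
        # Non-delayed: the sequencer turns both into no-ops.
        in_flight = ["no-op"] * len(in_flight)
    return ["n"] + in_flight + ["j"]


print("non-delayed:", executed_sequence(delayed=False))
print("delayed:    ", executed_sequence(delayed=True))
```

Either way the target j reaches Execute in the same cycle; a delayed branch simply does useful work in the two slots that a non-delayed branch wastes.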
ADSP-2106x Memory
[Figure: on-chip SRAM memory map in normal word addressing for the ADSP-21060, ADSP-21062, and ADSP-21061, showing BLOCK 0 and BLOCK 1 (128K x 32-bit words) with boundary addresses 0x0002 0000, 0x0003 0000, 0x0003 FFFF, 0x0004 0000, 0x0006 0000, and 0x0007 FFFF.]
ADSP-2106x ISA
Compute Examples
• Single function
– F6 = (F2 + F3);
• Multi-function
– Distinct parallel operations supported
– Parallel computations and data transfers
  – R1 = R2 * R6, M4 = R0;
– Simultaneous multiplier and ALU operations
  – R1 = R2 * R6, F6 = F2 + F3;
0 CU OPCODE RN RX RY
• CU specifies
– 00: ALU
– 01: Multiplier
– 10: Shifter
• OPCODE indicates operation type (add, subtract, etc.)
• RN specifies result register
• RX and RY specify operand registers
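To make the field layout concrete, here is a small Python decoder for the compute field. The slides name the fields but not their widths, so the bit positions below are an assumption (a 23-bit field with a leading 0, a 2-bit CU, an 8-bit OPCODE, and three 4-bit register numbers), and the opcode value in the example is made up.

```python
UNITS = {0b00: "ALU", 0b01: "Multiplier", 0b10: "Shifter"}


def decode_compute(field):
    """Split a 23-bit compute field into CU, OPCODE, RN, RX, RY.

    Assumed layout (not stated in the slides):
    bit 22 = 0, CU = bits 21-20, OPCODE = bits 19-12,
    RN = bits 11-8, RX = bits 7-4, RY = bits 3-0.
    """
    if field >> 22:
        raise ValueError("bit 22 must be 0 for a single-function compute")
    return {
        "unit": UNITS.get((field >> 20) & 0b11, "reserved"),
        "opcode": (field >> 12) & 0xFF,
        "rn": (field >> 8) & 0xF,
        "rx": (field >> 4) & 0xF,
        "ry": field & 0xF,
    }


# Hypothetical encoding: ALU unit, made-up opcode 0x81, RN=6, RX=2, RY=3.
example = (0b00 << 20) | (0x81 << 12) | (6 << 8) | (2 << 4) | 3
decoded = decode_compute(example)
print(decoded)
```

Four bits per register field is what lets each operand name one of the 16 data registers.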
Multi-function Compute
• Parallel ALU and Multiplier operations
[Figure: multi-function compute field layout, bits 22-0.]
ADSP-2106x ISA
Program Flow Control
• Instructions follow the format
IF condition JUMP/CALL, ELSE op2;
• No optional fields
• Loops follow the format
DO <addr24> UNTIL termination;
– <addr24> is the address of the last instruction in the loop
– termination is the loop ending condition to check after each iteration
• Conditional Execution
– IF GT R1 = R2 * R6;
– IF NE JUMP label2;
• Also used for Call/Return
main: CALL routine;
routine: ...
RTS; /*return to main*/
ADSP-2106x ISA
Immediate Data Move
• Instructions follow the format
ureg = <data32>;
DM(<data32>, Ia) = ureg;
PM(<data24>, Ia) = ureg;
• Ia is an optional index register used for indirect addressing
• DM is a 32-bit data memory address
• PM is a 24-bit program memory address
ADSP-2106x ISA
Addressing Examples
• Direct
– JUMP <data24>;
• Relative to Program Counter
– JUMP (PC, <data24>);
• Register Indirect (using DAG registers)
– Pre-Modify (the modifier is added to the index before the address is used)
  – JUMP (M0, I0);
ADSP-2116x Overview
• Extension of 2106x, adding 150 MHz core and SIMD (Single-Instruction Multiple-Data) support via dual hardware
[Block diagram: a single program sequencer drives two processing elements, each with its own data register file, multiplier, barrel shifter, and ALU, connected through the PM and DM data buses and bus connect. Caption: different data goes to each element.]
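The idea can be shown in a few lines of Python: one instruction, two sets of operands. This is only a conceptual sketch with a hypothetical function name; it does not model the actual register pairing or bus behavior of the ADSP-2116x.

```python
def simd_add(x_operands, y_operands):
    """Apply one add instruction to two processing elements.

    Each argument is an (a, b) pair of operands for one element.
    """
    elements = [x_operands, y_operands]
    # Same operation, different data per element.
    return [a + b for a, b in elements]


results = simd_add((1.0, 2.0), (10.0, 20.0))
print(results)
```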
ADSP-2126x Overview
ADSP-2136x Overview
[Block diagram: core processor (program sequencer, DAG1, DAG2) connected over the PM and DM address buses and the PM and DM data buses to four blocks of on-chip memory (BLOCK 0 to BLOCK 3), each with its own address and data ports.]
SHARC Benchmarks
• Algorithm benchmarks supplied by manufacturer:
Applications Targeted
• SHARC designed to
– Simplify Development
– Speed time to Market
– Reduce Product Costs
• Products targeted
– A/V Receivers
– 7.1 Surround Sound Decoding
– Mixing Consoles
– Digital Synthesizers
– Automobiles
SHARC Melody
Conclusion
• SHARC offers a great deal of computational power, with on-chip SRAM and SIMD architecture
• A variety of applications (especially audio processing) exploit it
Citations
• Processor details taken from product manuals and descriptions at http://www.analog.com
• This mode is a good way to specify initial values for registers.
• We’ve already used immediate addressing several times.
– It appears in the string conversion program you just saw.

• Here the effective address is 500, the same as the operand.
• This is useful for working with pointers.
– You can think of the constant as a pointer.
– The register gets loaded with the data at that address.
– R0 is a pointer, and R1 is loaded with the data at that address.
– This is similar to R1 = *R0 in C or C++.
• So what’s the difference between direct mode and this one?
– In direct mode, the address is a constant that is hard-coded into the program and cannot be changed.
– Here the contents of R0, and hence the address being accessed, can easily be changed.

LD R2, (R0) // R2 contains the second element
• This is so common that some instruction sets can automatically increment the register for you:
LD R1, (R0)+ // R1 contains the first element
LD R2, (R0)+ // R2 contains the second element
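A short Python model of the post-increment load may help. It assumes word-addressed memory, so the pointer advances by 1 per element; the addresses and contents are made up for illustration.

```python
memory = {100: 7, 101: 9}   # two-element array starting at address 100
regs = {"R0": 100, "R1": 0, "R2": 0}


def ld_post_increment(dest, pointer):
    """LD dest, (pointer)+ : load M[pointer], then add 1 to pointer."""
    regs[dest] = memory[regs[pointer]]
    regs[pointer] += 1


ld_post_increment("R1", "R0")   # R1 gets the first element
ld_post_increment("R2", "R0")   # R2 gets the second element
print(regs)
```

After the two loads R0 points one past the array, which is the state a loop would test to know it is done.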
• The effective address here is M[360].
• Indirect addressing is useful for working with multi-level pointers, or “handles.”
– The constant represents a pointer to a pointer.
– In C, we might write something like R1 = **ptr.

Indexed    LD R1, CONST(R0)   R1 ← M[R0 + CONST]
Relative   LD R1, $CONST      R1 ← M[PC + CONST]
Indirect   LD R1, [CONST]     R1 ← M[M[CONST]]
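The three register-transfer descriptions (indexed, relative, indirect) translate directly into code. The sketch below uses a Python dict as memory; the function name, the addresses, and the stored values are all made up for illustration.

```python
def load(mode, const, mem, r0=0, pc=0):
    """Return the value LD R1 would fetch under each addressing mode."""
    if mode == "indexed":      # R1 <- M[R0 + CONST]
        return mem[r0 + const]
    if mode == "relative":     # R1 <- M[PC + CONST]
        return mem[pc + const]
    if mode == "indirect":     # R1 <- M[M[CONST]]
        return mem[mem[const]]
    raise ValueError(f"unknown mode: {mode}")


mem = {500: 360, 360: 42, 20: 5, 130: 8}   # made-up contents
indirect_value = load("indirect", 500, mem)        # follows 500 -> 360
indexed_value = load("indexed", 10, mem, r0=10)    # reads M[20]
relative_value = load("relative", 30, mem, pc=100) # reads M[130]
print(indirect_value, indexed_value, relative_value)
```

Only the indirect mode touches memory twice, which is the cost of following a pointer to a pointer.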
TOS ← R1 + R2

LD R1, C         R1 ← M[C]
LD R2, D         R2 ← M[D]
ADD R1, R1, R2   R1 ← R1 + R2   // R1 = M[C] + M[D]
MUL R1, R1, R3   R1 ← R1 * R3   // R1 has the result
ST X, R1         M[X] ← R1      // Store that into M[X]

MOVE X, A        M[X] ← M[A]          // Copy M[A] to M[X] first
ADD X, B         M[X] ← M[X] + M[B]   // Add M[B]
MOVE T, C        M[T] ← M[C]          // Copy M[C] to M[T]
ADD T, D         M[T] ← M[T] + M[D]   // Add M[D]
MUL X, T         M[X] ← M[X] * M[T]   // Multiply
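The memory-to-memory sequence (MOVE, ADD, MUL on X and T) can be traced with a tiny interpreter. This is a sketch only: the tuple encoding of instructions and the input values are invented here, and the sequence is read as computing (A + B) * (C + D).

```python
def run(program, mem):
    """Execute a list of (op, dest, src) two-address instructions on mem."""
    for op, dest, src in program:
        if op == "MOVE":
            mem[dest] = mem[src]
        elif op == "ADD":
            mem[dest] = mem[dest] + mem[src]
        elif op == "MUL":
            mem[dest] = mem[dest] * mem[src]
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


program = [
    ("MOVE", "X", "A"), ("ADD", "X", "B"),   # X = A + B
    ("MOVE", "T", "C"), ("ADD", "T", "D"),   # T = C + D
    ("MUL", "X", "T"),                       # X = X * T
]
mem = run(program, {"A": 1, "B": 2, "C": 3, "D": 4, "X": 0, "T": 0})
print(mem["X"])
```

Because every operand lives in memory, the two-address form needs the temporary T and an extra MOVE for each sum.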
TI C64: Architecture
Texas Instruments C64
VLIW signal processor
Program cache/program memory
32-bit addresses
256-bit data
TMS320C64x CPU
Program fetch
Instruction dispatch
Instruction decode
Functional units:
6 ALUs (L1, L2, S1, S2, D1, D2)
2 multipliers (M1, M2)
Register file A, Register file B
[Figure: C64x data paths, showing the functional units with their src1, src2, dst, long src, and long dst ports, register file A (A0-A31), register file B (B0-B31), load paths (LD1a, LD1b), store paths (ST1a, ST1b, ST2a, ST2b), and data address paths (DA1, DA2).]
The data path of C64x has the following components:
• Two load-from-memory data paths;
• Two store-to-memory data paths;
• Two data address paths;
• Two register file data cross paths;
• Each functional unit has its own 32-bit write port into a GPR. Each functional unit reads directly from its own data path;
• All units ending in 1 write to register file A, and all units ending in 2 write to register file B;
• Each functional unit has two 32-bit read ports for source operands src1 and src2;
• L and S units have an extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads;
TI C64: .L (.L1 and .L2) Unit Operations Performed
.S (.S1 and .S2) Unit Operations Performed
.M (.M1 and .M2) Unit Operations Performed
• 16 x 16 multiply operations
• 16 x 32 multiply operations
• Vector Operations
– Quad 8 x 8 multiply operations
– Dual 16 x 16 multiply operations
– Dual 16 x 16 multiply with add/subtract operations
– Quad 8 x 8 multiply with add operation
• Bit expansion
• Bit interleaving/de-interleaving
• Variable shift operations
• Rotation
• Galois Field Multiply
.D (.D1 and .D2) Unit Operations Performed
• 32-bit add, subtract, linear and circular address calculation (for circular arrays)
• Loads and stores with 5-bit constant offset
• Loads and stores with 15-bit constant offset (.D2 only)
• Load and store double words with 5-bit constant
• Load and store non-aligned words and double words
• 5-bit constant generation
• 32-bit logical operations
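As an illustration of what a "dual 16 x 16 multiply" means, the Python sketch below treats each 32-bit word as two packed signed 16-bit halves and multiplies them pairwise. It shows the general packed-multiply idea only; it is not claimed to match the exact operand order or result format of any particular C64x instruction, and the function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dual_mul16(a, b):
    """Multiply the high and low 16-bit halves of two 32-bit words."""
    hi = to_signed16(a >> 16) * to_signed16(b >> 16)
    lo = to_signed16(a) * to_signed16(b)
    return hi, lo


a = (3 << 16) | 0xFFFE   # halves: 3 and -2
b = (5 << 16) | 7        # halves: 5 and 7
products = dual_mul16(a, b)
print(products)
```

Two products per instruction is the kind of data-level parallelism that the vector operations in the .M unit list provide.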
ADI TigerSHARC: Core Block Diagram
ADI TigerSHARC: Computation Block Block Diagram
ARM and Thumb
Low Power General Purpose Microprocessors
ARM11 MicroArchitecture
28 Jan 2005 Copyright ARM Ltd. 2002
December 8, 2003 Other ISA's
ARMv5T
(ARM)
Summary
• Instruction sets can be classified along several lines.
– Addressing modes let instructions access memory in various ways.
– Data manipulation instructions can have from 0 to 3 operands.
– Those operands may be registers, memory addresses, or both.
• Instruction set design is intimately tied to processor datapath design.