
The SHARC

Super Harvard Architecture Computer

The SHARC
Developed by Analog Devices
Optimized for demanding DSP and imaging applications.
32-bit floating point, with 40-bit extended floating-point capabilities.
Large on-chip memory.
Ideal for scalable multi-processing applications.

Harvard Architecture
Program memory can store data.
Able to simultaneously read or write data at one
location and get instructions from another place in
memory.
Two buses:
1 Data memory bus.
2 Program memory bus.

Either two separate memories or a single dual-port memory.

Super Harvard Architecture


Many processors employ a Harvard architecture by integrating two separate memories or caches into the processor chip.
The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data.
This is what makes it SUPER!

DSP
Digital Signal Processor.
High speed, low overhead data movement
and rapid computations required.
Usually has small on-board ROM and RAM, and a single-cycle multiplier.
Designed to run single-stream, serial-in/serial-out signal-processing applications very fast.

DSP Computations
The inner product of two vectors is a common
computation for determining energy or
correlation.
The following C code is an example:
result = 0;
for (n = 0; n < length; n++)
    result += x[n] * y[n];
The processor that executes each iteration of this loop in the fewest instruction cycles will have the best performance.
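A minimal runnable version of the loop above, as a sketch: the function names `dot_product` and `energy` are hypothetical, and C `float` is assumed to stand in for the SHARC's 32-bit floating-point format. Energy is simply the inner product of a signal with itself.

```c
#include <assert.h>
#include <stddef.h>

/* Inner product of two vectors: the multiply-accumulate loop a DSP is
   built to run quickly. The accumulator is initialized before the loop. */
float dot_product(const float *x, const float *y, size_t length)
{
    float result = 0.0f;
    for (size_t n = 0; n < length; n++)
        result += x[n] * y[n];
    return result;
}

/* Signal energy: the inner product of a vector with itself. */
float energy(const float *x, size_t length)
{
    return dot_product(x, x, length);
}
```

For example, with x = {1, 2, 3} and y = {4, 5, 6}, `dot_product(x, y, 3)` returns 32 and `energy(x, 3)` returns 14.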

SHARC DSP
The SHARC incorporates features aimed at
optimizing such loops.
High-Speed Floating Point Capability
Extended Floating Point

These features are DSP-specific: when applied to non-DSP applications, performance may be less than optimal.

Floating Point and Extended Floating Point
The SHARC supports floating-point, extended floating-point, and fixed-point (non-floating-point) data.
No additional clock cycles are needed for floating-point computations.
Data is automatically truncated (register to memory) or zero-padded (memory to register) when moved between 32-bit memory and the 40-bit internal registers.
Not accurate enough for scientific algorithms that require double precision.
Excellent signal-to-noise ratio.
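The sketch below models the padding and truncation between 32-bit memory and 40-bit registers. It assumes, as a simplification, that a 32-bit memory word occupies the upper 32 bits of a 40-bit register and that the lowest 8 bits are the extended-precision bits; the 40-bit value is held in a `uint64_t`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 40-bit register value is held in the low 40 bits
   of a uint64_t. The 32-bit memory word maps to the upper 32 of those
   40 bits; the lowest 8 bits are the extended-precision bits. */
#define REG40_MASK ((UINT64_C(1) << 40) - 1)

/* Memory -> register: shift the word up and zero-pad the low 8 bits. */
uint64_t load_zero_padded(uint32_t word)
{
    return ((uint64_t)word << 8) & REG40_MASK;
}

/* Register -> memory: drop (truncate) the low 8 extension bits. */
uint32_t store_truncated(uint64_t reg40)
{
    return (uint32_t)((reg40 & REG40_MASK) >> 8);
}
```

A memory-to-register-to-memory round trip loses nothing, while a register-to-memory-to-register round trip clears the 8 extension bits: `load_zero_padded(store_truncated(0x12345678AB))` gives `0x1234567800`.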

SHARC's Internal Memory
Makes SHARC unique.
Size
Allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory.
Memory size is significantly larger than that of most other high-speed computational devices.

Dual-block, Dual-port
Optimizes the Harvard Architecture by allowing instructions to be fetched while data memory accesses are performed.

Multiply and Accumulate Instructions on the SHARC
Like most DSPs, the SHARC is able to compute a product and add it to a running total in a single clock cycle.
The SHARC's super instruction is that it can multiply and accumulate while also adding, subtracting, or averaging data in two other registers.
These instructions give the SHARC its 120 megaflop rating.
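The peak rating follows from clock rate times operations per cycle. The sketch below assumes a 40 MHz instruction clock and three floating-point operations per cycle (one multiply plus a parallel add and subtract); both figures are assumptions for illustration rather than values stated in this presentation, and the type and function names are hypothetical.

```c
#include <assert.h>

/* Results of one hypothetical parallel SHARC-style instruction:
   a multiply plus an add and a subtract, all issued in the same cycle. */
typedef struct {
    float product;
    float sum;
    float difference;
} parallel_result;

parallel_result parallel_mul_addsub(float mx, float my, float ax, float ay)
{
    parallel_result r;
    r.product = mx * my;    /* multiplier */
    r.sum = ax + ay;        /* ALU add */
    r.difference = ax - ay; /* ALU subtract, same operands */
    return r;
}

/* Peak rate in MFLOPS = clock rate in MHz * floating-point ops per cycle. */
double peak_mflops(double clock_mhz, int flops_per_cycle)
{
    return clock_mhz * flops_per_cycle;
}
```

With the assumed figures, `peak_mflops(40.0, 3)` gives 120. This is a peak value; sustained throughput is lower whenever the parallel units cannot all be kept busy.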

Zero Overhead Looping on the SHARC
A single instruction placed before the loop performs the loop set-up, informing the SHARC that a loop is approaching.
The instruction also includes the iteration count and termination condition.
This keeps the pipeline full during loop execution and allows the termination condition to be tested in parallel.
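A back-of-the-envelope cycle model shows why this matters for short loops. All cycle costs are hypothetical: a software loop is assumed to spend a fixed number of extra cycles per iteration on updating the counter, testing, and branching, while a zero-overhead loop pays only a one-time set-up cost.

```c
#include <assert.h>

/* Hypothetical cost models, in clock cycles. */

/* Software loop: every iteration pays for the body plus the
   counter update, termination test, and branch back. */
unsigned long software_loop_cycles(unsigned long iterations,
                                   unsigned long body_cycles,
                                   unsigned long overhead_per_iteration)
{
    return iterations * (body_cycles + overhead_per_iteration);
}

/* Zero-overhead loop: one set-up instruction before the loop, after which
   the counter and termination test run in parallel with the body. */
unsigned long zero_overhead_loop_cycles(unsigned long iterations,
                                        unsigned long body_cycles,
                                        unsigned long setup_cycles)
{
    return setup_cycles + iterations * body_cycles;
}
```

For a one-cycle body, such as the multiply-accumulate of the inner-product loop, run 100 times with an assumed 3 cycles of per-iteration overhead, the software loop needs 400 cycles against 101 for the zero-overhead version.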

DAGs on the SHARC


Data Address Generators (DAGs) are integer computation units that manage the index registers used to address memory.
They allow the SHARC to fetch a value and update the index value.
If the updated value exceeds a limit, the DAG adjusts the index so that it wraps around.
This occurs in the same clock cycle as the read or write.
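The index update can be modeled in software as below. The register names I (index), M (modify), B (base) and L (length) follow the SHARC's DAG registers, but the struct and function are hypothetical, and the step is assumed to be smaller in magnitude than the buffer length so that one correction is enough.

```c
#include <assert.h>

/* Hypothetical model of one DAG addressing channel. */
typedef struct {
    long i; /* index: current address */
    long m; /* modify: signed step added after each access */
    long b; /* base: first address of the circular buffer */
    long l; /* length of the circular buffer (0 = linear addressing) */
} dag_channel;

/* Post-modify access: return the current address, then advance the index,
   wrapping it back into [b, b + l) when circular buffering is enabled. */
long dag_post_modify(dag_channel *d)
{
    assert(d->l == 0 || (d->m < d->l && -d->m < d->l)); /* |m| < l */
    long addr = d->i;
    d->i += d->m;
    if (d->l != 0) {
        if (d->i >= d->b + d->l)
            d->i -= d->l;
        else if (d->i < d->b)
            d->i += d->l;
    }
    return addr;
}
```

With a 4-entry buffer at address 100 and a step of 1, successive accesses visit 100, 101, 102, 103 and then wrap back to 100, so the oldest entry is overwritten by the newest without moving any data.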

DAG Capabilities
Circular Buffering
Rather than actually moving data in and out of a vector, circular buffers are used.
By updating the index modulo the buffer length, the oldest entry can be conveniently replaced by the newest entry.

Bit Reverse Addressing


The bit pattern of a vector index is reversed.
Done automatically by the SHARC.
Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
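In software, bit-reversed ordering takes an explicit loop like the one sketched below, whereas the SHARC's DAG generates these addresses as part of the memory access. This is a generic radix-2 reordering with hypothetical function names, not SHARC code.

```c
#include <assert.h>

/* Reverse the low `bits` bits of index n. */
unsigned bit_reverse(unsigned n, unsigned bits)
{
    unsigned r = 0;
    assert(bits <= 31);
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (n & 1u); /* move the lowest bit of n into r */
        n >>= 1;
    }
    return r;
}

/* Reorder an array of length 2^bits into bit-reversed order in place,
   swapping each pair only once. */
void bit_reverse_permute(float *data, unsigned bits)
{
    unsigned size = 1u << bits;
    for (unsigned n = 0; n < size; n++) {
        unsigned r = bit_reverse(n, bits);
        if (r > n) {
            float tmp = data[n];
            data[n] = data[r];
            data[r] = tmp;
        }
    }
}
```

For an 8-point FFT (bits = 3), index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110), giving the order 0, 4, 2, 6, 1, 5, 3, 7.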

SHARC DSP
What Makes the SHARC unique?
It also has some features not directly related to optimizing numeric computations:
Pipelining
Handling Branches

Why has this not emerged sooner?
Technology has only recently made it economical to integrate these general-purpose features into a single computing device.

SHARC's Pipeline
3 stages
1 Instruction Fetch
2 Decode
3 Execution

It takes three clock cycles for an instruction to propagate through the pipeline.
The processor still executes one instruction per clock cycle, even though each instruction requires three clock cycles to complete.
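A small simulation of an ideal three-stage pipeline, assuming no branches or stalls, illustrates the throughput claim. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define STAGES 3

/* Total cycles for n instructions in an ideal pipeline: the first
   instruction needs STAGES cycles, each later one finishes one cycle
   after its predecessor. */
unsigned pipeline_cycles(unsigned n)
{
    return n == 0 ? 0 : n + STAGES - 1;
}

/* Print which instruction occupies each stage in every cycle
   ("-" means the stage is empty). */
void print_pipeline(unsigned n)
{
    static const char *names[STAGES] = {"Fetch", "Decode", "Execute"};
    for (unsigned cycle = 0; cycle < pipeline_cycles(n); cycle++) {
        printf("cycle %u:", cycle + 1);
        for (unsigned s = 0; s < STAGES; s++) {
            /* The instruction in stage s entered the pipeline s cycles ago. */
            long instr = (long)cycle - (long)s;
            if (instr >= 0 && instr < (long)n)
                printf("  %s=I%ld", names[s], instr + 1);
            else
                printf("  %s=-", names[s]);
        }
        printf("\n");
    }
}
```

For example, `print_pipeline(4)` prints 6 cycles: the first instruction completes in cycle 3, and one more completes in each following cycle, so for long runs the rate approaches one instruction per cycle.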

SHARC's Handling of Branches

Delayed Branching
When a branch instruction is encountered, the two instructions after it, which have already been fetched and decoded, are executed before the branch takes effect.
This keeps the pipeline full and avoids discarding those two instructions and reloading the pipeline.
Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions would otherwise be significant.

SHARC's Handling of Branches

Non-delayed Branching
Traditional branching.
If the code cannot be reordered to use delayed branching, non-delayed branching saves space: it uses only one word of storage, instead of filling the two slots after a delayed branch with NOPs.
However, it takes three cycles, because the pipeline must be reloaded.
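The trade-off between the two branch types can be put into numbers with a simple per-iteration model of a loop closed by a branch. It assumes single-cycle instructions, two delay slots for a delayed branch, and the three-cycle non-delayed branch described here; padding unfilled delay slots with NOPs is an assumption about how such code would be written, and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical cost of one loop iteration: `body` single-cycle
   instructions followed by a branch back to the top of the loop. */
typedef struct {
    unsigned cycles;
    unsigned words; /* program memory words used */
} branch_cost;

/* Delayed branch: `filled` of the two delay slots hold useful body
   instructions (moved there by reordering); the rest hold NOPs. */
branch_cost delayed_branch(unsigned body, unsigned filled)
{
    branch_cost c;
    assert(filled <= 2 && filled <= body);
    unsigned nops = 2 - filled;
    c.cycles = body + 1 + nops; /* branch takes 1 cycle, NOPs still execute */
    c.words = body + 1 + nops;
    return c;
}

/* Non-delayed branch: one word, but the pipeline is flushed and
   refilled, so the branch itself costs three cycles. */
branch_cost non_delayed_branch(unsigned body)
{
    branch_cost c;
    c.cycles = body + 3;
    c.words = body + 1;
    return c;
}
```

For a 3-instruction loop body, a fully filled delayed branch takes 4 cycles per iteration versus 6 for a non-delayed branch. When no instructions can be moved into the slots, both take 6 cycles, but the non-delayed branch saves the two NOP words.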

Multi-processing
SHARC is uniquely equipped for multiprocessing.
Its link ports provide very powerful multiprocessing capabilities.
Two main program models depending on the
application.
Adapts well to different multi-processing
architectures.

Multi-processing
SHARC Links
SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec.
Links are designed for point-to-point connections.
Data can be transmitted in either direction, but not in both simultaneously.
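Reading the 40 Mbytes/sec figure as the peak rate of each individual port (an interpretation, not something the slide states outright), the arithmetic for aggregate bandwidth and transfer time is sketched below. Decimal megabytes are assumed and the names are hypothetical. Because each link carries data in only one direction at a time, a two-way exchange over a single link has to take turns.

```c
#include <assert.h>

#define LINK_PORTS 6
#define LINK_MBYTES_PER_SEC 40.0 /* assumed peak rate of each link port */

/* Aggregate peak bandwidth when `ports_in_use` links are busy at once. */
double aggregate_link_mbytes_per_sec(unsigned ports_in_use)
{
    assert(ports_in_use <= LINK_PORTS);
    return ports_in_use * LINK_MBYTES_PER_SEC;
}

/* Time in microseconds to move `bytes` over one link at its peak rate.
   1 Mbyte/sec is taken as 10^6 bytes/sec, i.e. 1 byte per microsecond. */
double link_transfer_us(double bytes)
{
    return bytes / LINK_MBYTES_PER_SEC;
}
```

Under this reading, all six ports together give 6 × 40 = 240 Mbytes/sec peak, and a 4000-byte block takes 100 microseconds over a single link.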


Multi-processing Program Model


MIMD
Multiple instruction, multiple data.
Good for applications that require multiple
instruction threads to execute concurrently.
Processors operate individually.
Each processor executes different code.

Typically used for image reconstruction and multi-channel DSP.

Multi-processing Program Model


SIMD
Single instruction, multiple data.
Works best when all processors execute identical instruction sequences.
Does not require overhead for inter-processor synchronization.
Typically used for synthetic aperture radar
and automatic target recognition.
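In the SIMD model every processor runs the same routine on its own slice of the data. The sketch below, with hypothetical names, shows one common way to split N samples into P nearly equal contiguous blocks and applies identical per-block code to each. The processors are simulated sequentially, so this illustrates the programming model rather than SHARC multiprocessor code.

```c
#include <assert.h>
#include <stddef.h>

/* Start of the block owned by processor p when n items are split across
   procs processors; the first n % procs blocks get one extra item. */
size_t block_start(size_t n, size_t procs, size_t p)
{
    size_t base = n / procs, extra = n % procs;
    return p * base + (p < extra ? p : extra);
}

/* The identical instruction sequence every processor executes:
   scale its own slice of the data. */
void scale_block(float *data, size_t begin, size_t end, float gain)
{
    for (size_t i = begin; i < end; i++)
        data[i] *= gain;
}

/* Sequential stand-in for procs processors all running the same code. */
void simd_scale(float *data, size_t n, size_t procs, float gain)
{
    assert(procs > 0);
    for (size_t p = 0; p < procs; p++)
        scale_block(data, block_start(n, procs, p),
                    block_start(n, procs, p + 1), gain);
}
```

Splitting 10 samples across 4 processors gives blocks starting at 0, 3, 6 and 8 (sizes 3, 3, 2, 2). No block depends on another, which is why no inter-processor synchronization is needed.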

Multi-processing Architectures
Cluster Design
Groups of up to 6 in a cluster.
Most common approach for joining multiple SHARCs.
All processors, global I/O, and global memory are connected to a common cluster bus.
Each SHARC can drive the bus.

Multi-processing Architectures
Mesh Design
All SHARCs are joined by their link ports and connected to a common bus.
In SIMD mode, a single master SHARC drives the bus.
In MIMD mode, the mesh architecture cannot function if the data set is larger than the available on-chip memory.
Offers advantageous scalability over a wider range of applications.

How optimal is the SHARC for non-DSP Applications?
It is obviously geared toward DSP applications.
While it may fare better than some other processors, it still lags behind those designed specifically for non-DSP applications.

