
Mekelle Institute of Technology

Embedded Systems (CSE507)


Department of Electronics and Communication Engineering/Computer Science and Engineering

Lecture 6
Hardware Acceleration and Embedded Networks

Hardware Acceleration

Hardware acceleration is the use of computer hardware to perform some function faster than is possible in software running on the general-purpose CPU. Examples of hardware acceleration include acceleration functionality in graphics processing units (GPUs) and instructions for complex operations in CPUs. (from Wikipedia)

Normally, processors are sequential, and instructions are executed one by one. Various techniques are used to improve performance; hardware acceleration is one of them. The main difference between hardware and software is concurrency, which allows hardware to be much faster than software. Hardware accelerators are designed for computationally intensive software code. Depending on granularity, hardware acceleration can vary from a small functional unit to a large functional block.


The hardware that performs the acceleration, when in a separate unit from the CPU, is referred to as a hardware accelerator, or often more specifically as a graphics accelerator, floating-point accelerator, etc. Those terms, however, are older and have been replaced with less descriptive terms like video card or graphics card.


Many hardware accelerators are built on top of field-programmable gate array chips.

CPUs and accelerators

Accelerated systems use an additional computational unit dedicated to some functions:

- hardwired logic
- extra CPU
- coprocessor

Applications

- graphics and multimedia (streaming data)
- encryption and compression
- communication devices (signal processing)
- supercomputing, numerical computations

Why Accelerators?

Better cost/performance:

- custom logic may be able to perform an operation faster than a CPU of equivalent cost
- CPU cost is a non-linear function of performance

Better real-time performance:

- put time-critical functions on less-loaded processing elements

Role of Performance Estimation

First, determine that the system really needs to be accelerated:

- How much faster is the accelerator on the core function (speedup)?
- How much data-transfer overhead?
- How much overhead for synchronization with the CPU?

Performance estimation must be done at all levels of abstraction:

- simulation-based methods (only average case; need to find test patterns that stress the system)
- analytic methods (only limited accuracy; quick; used mainly at the system level)

Accelerator Execution Time

Total accelerator execution time:

t_accel = t_in + t_x + t_out

where t_x is the execution time on the accelerator itself and t_in, t_out are the input/output times (bus transactions). The input/output time includes: flushing register/cache values to main memory; the time required for the CPU to set up the transaction; and the overhead of data transfers by the bus (packets, handshaking, etc.).

Accelerator Gain

For simplification, let us consider an application that:

- consists of one task only, repeated n times
- can be executed completely on the accelerator

But in general:

- not all tasks can be executed on the accelerator
- the CPU also has other obligations

Possibilities:

- single-threaded/blocking: the CPU waits for the accelerator
- multithreaded/non-blocking: the CPU continues to execute along with the accelerator; the CPU must have useful work to do, and the software must support multi-threading

Accelerator/CPU Interface Issues

Synchronization:

- via interrupts
- via special data and control registers at the accelerator

Data transfer to main memory:

- assisted by DMA or special logic within the accelerator
- caching problems arise because the CPU works on the cache while the accelerator works on main memory (declare the data area as non-cacheable, invalidate the cache after the transfer, use a write-through cache, etc.)

System Design Issues

Hardware/software co-design: meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design; the joint design of hardware and software architectures. Co-design problems have a different flavor according to the application domain, implementation technology, and design methodology.

Design a heterogeneous multiprocessor architecture:

- communication (bus, network, etc.)
- memory architecture
- interfaces and I/O
- processing elements (CPU, application-specific integrated circuit (ASIC), FPGA (field-programmable gate array))

Then program the system.

Networking for Embedded Systems


- Why we use networks.
- Network abstractions.
- Example networks.

Overheads for Computers as Components


Network elements

A distributed computing platform consists of processing elements (PEs) connected by communication links that form the network. PEs may be CPUs or ASICs.



Networks in embedded systems

In a typical embedded network, data flows from a sensor through a chain of PEs (initial processing, then more processing) to an actuator.

Why distributed?

- Higher performance at lower cost.
- Physically distributed activities: time constants may not allow transmission to a central site.
- Improved debugging: use one CPU in the network to debug others.
- You may buy subsystems that already have embedded processors.

Hardware architectures

Many different types of networks, distinguished by:

- topology;
- scheduling of communication;
- routing.


Point-to-point networks

One source, one or more destinations, no data switching (e.g., a serial port):

PE 1 --link 1--> PE 2 --link 2--> PE 3

Bus networks

Common physical connection shared by all PEs: PE 1, PE 2, PE 3, and PE 4 all attach to a single bus.

A typical bus packet format:

header | address | data | ECC


Bus arbitration

- Fixed: same order every time.
- Fair: every PE gets the same access over long periods.
  - round-robin: rotate top priority among PEs.

Example: A, B, and C all request the bus in two successive rounds.

fixed:       A B C | A B C
round-robin: A B C | B C A

Multi-stage networks

Use several stages of switching elements. Often blocking. Often smaller than a crossbar.


I2C bus

Designed for low-cost, medium-data-rate applications. Characteristics:

- serial;
- multiple-master;
- fixed-priority arbitration.

Several microcontrollers come with built-in I2C controllers.



The CAN Bus

- Originally designed for automotive electronics; now used for other applications as well.
- Bit-serial transmission, 500 Kb/s, over twisted pair, up to 40 m.
- Synchronous: nodes synchronize themselves by listening to the bit transitions on the bus.
- Arbitration by Carrier Sense Multiple Access with Arbitration on Message Priority (CSMA/AMP).
- For error handling, a special error frame and an overload frame are used, as well as acknowledgements.

