
Paper Title* (use style: paper title)

Subtitle as needed (paper subtitle)

Authors Name/s per 1st Affiliation (Author)
line 1 (of Affiliation): dept. name of organization
line 2-name of organization, acronyms acceptable
line 3-City, Country
line 4-e-mail address if desired

Authors Name/s per 2nd Affiliation (Author)
line 1 (of Affiliation): dept. name of organization
line 2-name of organization, acronyms acceptable
line 3-City, Country
line 4-e-mail address if desired

Abstract— The embedded world improves daily in integration level, power consumption, and performance. The System-on-Chip (SoC), with its heterogeneous structure and high performance, plays a central role in meeting these challenges. This paper presents how SoC components communicate with each other, with emphasis on communication between the Digital Signal Processor (DSP) and general-purpose processors and on the possibilities of computation offloading.

Keywords—System-on-Chip, DSP, CPU, offloading, IPC, GPU, HSA

I. INTRODUCTION

Not so long ago, a single main chip, the central processing unit (CPU), was responsible for all the work that had to be done: one general-purpose unit in charge of many different types of tasks. Besides the CPU, a system needed many other components, such as memory to store data, an audio chip to decode music, a GPU (Graphics Processing Unit) to draw graphics, and other smaller components that all have important tasks.

The evolution of electronics and the capabilities of high-level integration made it possible to produce these components in one chip, better known as a System-on-Chip. An SoC encapsulates most of the required modules, such as GPU, DSP, memory, and power management circuits, in one silicon chip. The very high level of integration and the short wiring also mean lower power consumption, which is mandatory in embedded devices like smartphones and tablets.

An SoC thus ensures better integration, and its heterogeneous structure with different types of processing units gives better performance. This paper focuses on the interconnection between different units in a single SoC, specifically on communication between the DSP and the CPU, and on computational offloading of the CPU, which is one of the key aspects of heterogeneous processing structures.

Section II describes why SoCs are used and why they are well suited to processing large amounts of different types of data. Section III describes how the processing units inside an SoC are interconnected and the concept of coherent memory, whose purpose is to reach maximum performance and to make computational offload easier. Section IV evaluates the benefits of fast core interconnection and task offloading.

II. RELATED WORK

Today's electronic devices are pushed into a corner by the large amount of data they need to process. The reason is that we demand more from them than computational work: we demand that they become more similar to us. They are becoming more anthropomorphic.

Two simple examples illustrate this. A smartphone needs to know how many people are in a picture, whether they are laughing, and whether there is sunlight, as well as the time and location at which the picture was taken. An even more obvious example comes from the ADAS industry. A vehicle needs to know its environment even better than a human does: how many other vehicles are on the road, their speed, and its own speed. It needs to recognize traffic signs and pedestrians. It may even need to recognize its owner or park itself in a parking space.

All these inputs consist of a large amount of data that must be processed in a very short time. This is not a matter of seconds: at a speed of 200 km/h a vehicle covers over 50 meters of road in one second. The system therefore has milliseconds, or even less, to collect data, process it, and act.

To process all this data in such a short time, a system must contain all the different types of processing units, behaving like one unit. This is the point at which silicon vendors such as Qualcomm, TI, Samsung, Intel, and many others started to produce powerful SoCs as a single integrated solution.

Figure 1
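The timing argument in Section II can be checked with a few lines of arithmetic. The sketch below converts a speed in km/h into the distance covered during a given processing latency. The function name and the sample latencies are illustrative assumptions; only the 200 km/h speed and the one-second interval come from the text.

```python
def distance_travelled_m(speed_kmh: float, latency_s: float) -> float:
    """Distance in meters covered at speed_kmh during latency_s seconds."""
    speed_ms = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_ms * latency_s


if __name__ == "__main__":
    # 200 km/h is the speed used in the text; the latencies are illustrative.
    for latency_s in (1.0, 0.1, 0.001):
        d = distance_travelled_m(200.0, latency_s)
        print(f"{latency_s * 1000:8.1f} ms -> {d:.3f} m")
```

At 200 km/h the vehicle moves roughly 55.6 m in one second, which is consistent with the "over 50 meters" stated above, and still several centimeters within a single millisecond.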
The Qualcomm Snapdragon 820 shown in Figure 1 consists of general-purpose processors, like ARM Cortex, DSP cores responsible for image processing and algorithms, and a GPU in charge of graphics.

This concept of combining general-purpose processing units with specifically designed cores is the basis for most SoC vendors. Table 1 compares SoCs from two vendors.

Vendor   Qualcomm Snapdragon 820     Texas Instruments TDA2x
CPU      4x Kryo @ 2200 MHz          2x ARM Cortex-A15, 2x ARM Cortex-M4
DSP      Hexagon 680 @ 4x500 MHz     2x C66x @ 750 MHz
GPU      Adreno 530 @ 624 MHz        2x SGX544 @ 384 MHz
Table 1

III. MAIN PART

With such powerful SoCs available, the question is how to use these resources to gain maximum performance. First, we need to know how the different types of cores are connected and how they can interact with each other. For example, if a CPU is running a task and we want part of that task to run on the DSP, the data must somehow be transferred from the CPU to the DSP. This paper presents the concept of coherent cached memory used in heterogeneous system architectures, known as HSA [3].

Cache coherency ensures that all cores in the SoC see the same data [1]. For example, if the CPU creates an object in its local cache and then passes that object to the DSP, both of them must be able to see the same data. There are three mechanisms [2] to maintain coherency:

• Caching disabled
• Software managed coherency
• Hardware managed coherency

Disabled caching is the simplest mechanism, but it degrades performance and in reality is not used.

Software managed coherency is the traditional solution, in which the device driver must clean dirty data from the cache and invalidate old data to enable sharing with other cores. Cache cleaning and invalidation must be done at the right time: done too often, it wastes core cycles; done too infrequently, it results in stale data. This is presented in Figure 2:

Figure 2

Software managed coherency has an advantage only in one case, when the number of cores in the system reaches a hundred or more [4].

Hardware managed coherency simplifies software. Once data is marked as 'shared', software no longer needs to update it explicitly; both the CPU and the DSP see exactly the same value. This requires a coherent bus protocol that allows cores to search the caches to see whether data is already on chip or needs to be fetched from external memory. An example of full memory coherency is presented in Figure 3:

Figure 3

With shared virtual memory, the CPU and the DSP can share physical memory and operate on the same virtual addresses. This means both processors can use the same buffer at the same time.

Heterogeneous system architecture always requires hardware coherency. The scope of shared virtual memory can be limited to shared buffers, in which case only the shared buffers, not the whole memory, are visible to both cores. Full coherency, besides allowing cores to use the whole shared memory, also supports atomic operations and synchronization between cores. HSA also defines different memory segments, such as global, shared, and private. Importantly, each address is unique regardless of whether it is private or global.

IV. EVALUATION

With hardware managed coherency, offloading tasks from one core to another becomes much simpler, because software no longer has to manage the caches itself.

Performance also improves. By using the same shared memory, cores do not need to spend time cleaning their own caches or invalidating data in external memory. Power consumption is reduced, which is very significant for the mobile and tablet industry: a DSP that has been optimized for a task uses less power than a general-purpose CPU for the same task.

Architectures like HSA use a new task queuing system and context switching, in which tasks and threads can be switched even while threads are running.

All these advanced mechanisms ensure maximum SoC utilization, which benefits not just software and hardware developers but end users as well.
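As a rough illustration of the software managed coherency sequence described in Section III (clean dirty data, then invalidate old copies), the toy model below simulates two write-back caches in front of one shared memory. It is a sketch under stated assumptions, not a model of any real SoC or driver API: the `Cache` class, its method names, and the address `0x100` are all hypothetical.

```python
class Cache:
    """Toy write-back cache in front of a shared memory (a plain dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # addr -> cached value
        self.dirty = set()   # addrs modified locally but not yet written back

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: memory is not updated yet
        self.dirty.add(addr)

    def clean(self, addr):
        """Write a dirty line back to shared memory (the 'clean' step)."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop the cached copy so the next read goes to shared memory."""
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


def demo():
    memory = {0x100: 0}                      # hypothetical shared buffer
    cpu, dsp = Cache(memory), Cache(memory)

    dsp.read(0x100)                          # DSP caches the old value 0
    cpu.write(0x100, 42)                     # CPU produces new data in its own cache

    stale = dsp.read(0x100)                  # no maintenance yet: DSP still sees 0

    cpu.clean(0x100)                         # driver step 1: write back the CPU's dirty line
    dsp.invalidate(0x100)                    # driver step 2: drop the DSP's old copy
    fresh = dsp.read(0x100)                  # DSP refetches the new value from memory
    return stale, fresh
```

Calling `demo()` returns the value the DSP reads before and after the maintenance steps. With hardware managed coherency, the two explicit maintenance calls would not be needed, which is the simplification the evaluation refers to.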

References
[1] Kai Li, Paul Hudak, "Memory coherence in shared virtual memory systems", ACM Transactions on Computer Systems (TOCS).
[2] Sultan Almakdi, Abdulwahab Alazeb, Mohammed Alshehri, "Cache coherence mechanisms", College of Computer Science and Information System, Najran University, Najran, Saudi Arabia.
[3] Phil Rogers (HSA Foundation President), "Heterogeneous System Architecture Overview", HSA Foundation.
[4] Xiaocheng Zhou, Hu Chen, Sai Luo, Ying Gao, Shoumeng Yan, Wei Liu, Brian Lewis, Bratin Saha, "A Case for Software Managed Coherence in Many-core Processors", Intel Corporation.
