
2012 12th International Conference on ITS Telecommunications

A novel multicore SDR architecture for smart vehicle systems
Yao-Hua Chen, Chia-Pin Chen, Pei-Wei Hsu, Chun-fan Wei, Wei-Min Cheng, Hsun-Lun Huang, Tai-Yuan Cheng
Information and Communications Research Laboratories
Industrial Technology Research Institute
Taiwan, R.O.C.
{YaoHuaChen, apple.chen, pwhsu, PatrickWei, wmcheng, cf, tychen}@itri.org.tw

Albert Y.P. Chen
SVP/CTO
Inventec
Taiwan, R.O.C.
Chen.AlbertYP@inventec.com

Abstract- A transceiver architecture with multi-core software defined radio (SDR) technology is proposed for the physical layer inner processing of IEEE 802.11p in intelligent transportation systems (ITS). By localizing the data transmissions between the adjacent digital signal processors (DSP), concatenate memories and concatenate buses are introduced to ease the bandwidth requirement for the data communication among multicores. The proposed transceiver architecture is verified by the electronic system-level (ESL) virtual platform with two application-specific instruction-set processors (ASIP). The high level power estimation results are also provided in this paper. To enhance the channel estimation and equalization performance of IEEE 802.11p, the capability of the proposed architecture with the decision feedback algorithm is analyzed.

Keywords-IEEE 802.11p; software-defined radio (SDR); intelligent transportation systems (ITS); application-specific instruction-set processor (ASIP); electronic system-level (ESL)

I. INTRODUCTION

IEEE 802.11p [5] is an approved amendment to the IEEE 802.11 standard that adds wireless access in vehicular environments (WAVE) for supporting Intelligent Transportation Systems (ITS) applications. The physical layer spec is almost the same as that of IEEE 802.11a, except that the channel bandwidth and the data rate of IEEE 802.11p are only half of those of IEEE 802.11a. Intuitively, the transceiver for IEEE 802.11a can be applied to IEEE 802.11p. However, the design constraints of the receivers for IEEE 802.11p and other IEEE 802.11 families are quite different. For the IEEE 802.11 families in low mobility applications, achieving the data throughput and processing latency requirements is the main design issue. For IEEE 802.11p, however, the channel estimation dominates the system performance due to the high mobility in vehicular environments. Thus, to achieve maximal hardware sharing, the Software-Defined Radio (SDR) technology is proposed in this paper for the implementation of 802.11p and other IEEE 802.11 standard families. SDR can offer significant advantages over dedicated hardware designs: high flexibility, a short design cycle and even high performance when cooperating with hardware accelerators. By selecting the appropriate software modules, different radio applications can co-exist in the same equipment. Furthermore, the system specs can be easily upgraded by loading the software with updated programs.

The SDR architectures can be classified into two categories: reconfigurable architectures and DSP-centered with accelerator-assisted architectures [2]. The second approach has a high degree of flexibility and is capable of supporting multiple standards in mobile devices. To meet the throughput and latency requirements of high data rate applications, an application-specific instruction-set processor (ASIP) [4] is usually used to cover the common operations for SDR. These specific instructions may include complex MAC, complex butterfly, etc. [3]. Moreover, multi-core architectures are utilized for further improvement of the processing data throughput.

Conventionally, the data transmissions among DSPs in a multicore system go through a shared bus with an arbitrator, a network with routers and switches, or a cache with synchronization mechanisms. The transmitted data is usually stored in a shared memory that is hooked on the shared bus or the network and visible to all DSPs, as shown in Figure 1. Due to the frequent accesses, the shared memory must provide high bandwidth and may become the performance bottleneck of the system. Moreover, complicated arbitration, routing design or synchronization mechanisms are required to avoid data collisions.

In this paper, an ASIP for the baseband processing with an instruction set simulator, disassembler and linker is designed. A multicore architecture with concatenate memories and concatenate buses is proposed to ease the bandwidth and throughput requirements of the shared bus and shared memory in conventional multicore systems. An ESL virtual platform is built for functionality verification, power estimation and decision feedback analysis for the inner receiver processing of IEEE 802.11p. The paper is organized as follows. Section II provides the proposed universal modem architecture. The processing criteria and the algorithms for the inner receiver of IEEE 802.11p are introduced in Section III. Section IV describes the detailed implementation of the ESL virtual platform for IEEE 802.11p and the simulation results. The conclusions are drawn in Section V.
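The statement that IEEE 802.11p halves the channel bandwidth and data rate of IEEE 802.11a can be made concrete with a short calculation. The sketch below is illustrative only: the 802.11a figures (20 MHz channel, 64-point FFT, 16-sample cyclic prefix, 6 to 54 Mb/s rate set) are taken from the standard rather than from this paper, and the names `OfdmTiming`, `IEEE_80211A` and `IEEE_80211P` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfdmTiming:
    """OFDM timing derived from the channel bandwidth (assumed equal to the sample rate)."""
    bandwidth_mhz: float
    fft_size: int = 64      # subcarriers per OFDM symbol (assumption from the standard)
    cp_samples: int = 16    # cyclic-prefix samples (assumption from the standard)

    @property
    def symbol_us(self) -> float:
        # Samples per symbol divided by samples per microsecond.
        return (self.fft_size + self.cp_samples) / self.bandwidth_mhz

    @property
    def subcarrier_spacing_khz(self) -> float:
        return 1000.0 * self.bandwidth_mhz / self.fft_size


IEEE_80211A = OfdmTiming(bandwidth_mhz=20.0)
IEEE_80211P = OfdmTiming(bandwidth_mhz=10.0)   # half the bandwidth of 802.11a

# Halving the sample clock halves every data rate of the 802.11a rate set.
RATES_80211A_MBPS = (6, 9, 12, 18, 24, 36, 48, 54)
RATES_80211P_MBPS = tuple(rate / 2 for rate in RATES_80211A_MBPS)

if __name__ == "__main__":
    for name, phy in (("802.11a", IEEE_80211A), ("802.11p", IEEE_80211P)):
        print(name, phy.symbol_us, "us/symbol,", phy.subcarrier_spacing_khz, "kHz spacing")
    print("802.11p rates (Mb/s):", RATES_80211P_MBPS)
```

Under these assumptions the symbol duration doubles when the bandwidth is halved, which is why the same processing chain can serve both standards while the time-varying channel, rather than raw throughput, becomes the harder problem for 802.11p.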

978-1-4673-3070-1/12/$31.00 2012 IEEE


Figure 1. Conventional architecture
[Figure: DSPs (DSP2, DSP3, ...) exchanging data through a shared memory; data flow without CC Bus.]

Figure 2. Proposed architecture of universal modem
[Figure: DSPs connected serially by the CC Bus.]

Figure 3. The physical layer frame structure of IEEE 802.11p
[Figure: short preamble and long preamble (16 + 16 = 32 µs), followed by the SIGNAL field (RATE, LENGTH) and DATA. Short preamble: signal detect, AGC, diversity selection, coarse frequency offset estimation, timing synchronization. Long preamble: channel and fine frequency offset estimation.]

Figure 4. The baseband signal processing of generic receivers for OFDM-modulated IEEE 802.11 families.

II. PROPOSED UNIVERSAL MODEM ARCHITECTURE

The baseband data processing can be partitioned into two categories: streaming-based processing and block-based processing. The streaming-based processing performs symbol-by-symbol operations, whereas the block-based processing must wait for the collection of a block of data before starting to perform the necessary operations. The streaming-based processing includes modulation, demodulation, channel estimation, equalization, etc., and the block-based processing includes interleaving, deinterleaving, channel decoding, etc. Based on these processing partitions, the multicore SDR architecture is investigated for the universal modem.

Figure 2 is an example of the proposed architecture. It comprises DSPs, concatenate memories (CC Mem), a shared memory, a concatenate bus (CC Bus) and a public bus. Accelerating coprocessors may also be included in the architecture for performance enhancement if necessary. The DSPs are configured to perform the software functions required by the target radio application. The CC Bus connects DSPs, hardware accelerators and concatenate memories serially. The public bus connects DSPs, hardware accelerators and the shared memory. The streaming-based processing is performed by the elements concatenated by the CC Bus, and the data for block-based processing is transmitted from the DSPs or hardware accelerators on the CC Bus to the shared memory via the public bus. The block-based processing can be started once the block data in the shared memory is ready.

The concatenate memory, CC Mem ij, is only accessible by the hardware accelerators or DSPs at stages i and j on the CC Bus. By localizing the data transmissions between the adjacent DSPs or hardware accelerators on the CC Bus, the stream-based data transmission can be achieved by passing or exchanging data in the CC Mems. The data for block-based processing, broadcasting operations, feedback operations, or traveling between non-adjacent elements on the CC Bus can be stored in the shared memory via the public bus. Since most operations in the inner transceiver processing are streaming-based, the bandwidth and throughput requirements of the shared bus and shared memory can be greatly reduced by the CC Bus and CC Mems.

III. INNER RECEIVER OF IEEE 802.11P

The physical layer frame structure of IEEE 802.11p is given in Figure 3. Each frame contains three fields: preamble, signal and data. The preamble field is used for signal detection, automatic gain control (AGC), timing synchronization, initial channel estimation, etc. The signal field carries the information about the data field, such as the data length and data rate. The data field carries the baseband-processed OFDM symbols of the user data.

A. The timing-related parameters of IEEE 802.11p

For IEEE 802.11p, 80 subcarriers (including the cyclic prefix) must be processed by the streaming-based processing within the duration of one OFDM symbol, which is 8 µs. Thus the throughput criterion is 10M subcarriers per second. The latency criterion for the baseband processing is determined by the parameter SIFS (short inter frame spacing), which is the short time interval between the data frame and its acknowledgement. For 802.11p, the SIFS parameter is 32 µs.
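The two timing criteria translate directly into a cycle budget once a clock frequency is fixed. The sketch below works this out using the figures from the text (80 subcarriers per 8 µs symbol, SIFS of 32 µs) together with the 240 MHz system clock chosen in Section IV. The function names are hypothetical and the code is only a worked calculation, not part of the authors' tool flow.

```python
# Figures quoted in the paper.
SUBCARRIERS_PER_SYMBOL = 80   # cyclic prefix included
SYMBOL_US = 8                 # duration of one OFDM symbol in microseconds
SIFS_US = 32                  # short inter frame spacing in microseconds
CLOCK_MHZ = 240               # system clock selected in Section IV


def required_throughput_msubcarriers() -> float:
    """Throughput criterion in millions of subcarriers per second."""
    return SUBCARRIERS_PER_SYMBOL / SYMBOL_US


def cycles_in(duration_us: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Clock cycles available in a time window given in microseconds."""
    return duration_us * clock_mhz


def cycles_per_subcarrier(clock_mhz: float = CLOCK_MHZ) -> float:
    """Cycle budget per subcarrier if the throughput criterion is just met."""
    return clock_mhz / required_throughput_msubcarriers()


if __name__ == "__main__":
    print("throughput criterion:", required_throughput_msubcarriers(), "M subcarriers/s")
    print("cycles per subcarrier:", cycles_per_subcarrier())
    print("cycles per OFDM symbol:", cycles_in(SYMBOL_US))
    print("cycles within SIFS:", cycles_in(SIFS_US))
```

The per-symbol and SIFS budgets obtained this way are the quantities against which the measured cycle counts of the two DSPs are later compared.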


Figure 5. The pilot allocation of IEEE 802.11p
[Figure: time-frequency grid of one frame; axes: frequency (subcarrier index), time (OFDM symbol).]

Figure 6. The decision feedback processing of IEEE 802.11p inner receiver

Figure 7. The virtual platform of IEEE 802.11p receiver.

B. The baseband signal processing of the receiver

The baseband processing of the generic OFDM-based receiver for the IEEE 802.11 standard families is shown in Figure 4. The processing can be divided into two parts: an inner and an outer part [6]. The inner part deals with the streaming-based processing: carrier synchronization, channel estimation and channel compensation. The outer part deals with the block-based processing, such as de-interleaving and error correction. In this paper, the outer receiver is implemented by ASIC hardware accelerators due to its high computational requirement, and we focus on the ASIP design for the inner receiver.

C. The decision feedback equalization for IEEE 802.11p

Since IEEE 802.11p is designed for outdoor and high mobility environments, the channel coherence time may be shorter than a packet transmission time, and the Doppler effects are significant. Thus the pilot density is crucial for channel estimation. Figure 5 shows the pilot allocation for IEEE 802.11p, with grey squares representing pilot positions. Let Nf and Nt be the maximal spacing of adjacent pilots in the frequency domain and time domain respectively. From Figure 5 it can be seen that Nt = 1 and Nf = 14 for IEEE 802.11p. To fulfill the sampling theorem for channel estimation [13], Nf and Nt must satisfy the following equations

$$N_f \le N_{f\_min} = \frac{N}{L}, \qquad N_t \le N_{t\_min} = \frac{1}{2 f_d \left(1 + L/N\right)} \qquad (1)$$

where N is the number of subcarriers in the OFDM symbol, L is the number of taps of the delay spread and fd is the normalized Doppler frequency. For the case with a vehicle speed of 120 km/hr, we have Nt_min = 111 and Nf_min = 10. Obviously, the pilot allocation in IEEE 802.11p cannot meet the channel estimation constraints in (1) when the vehicle speed is 120 km/hr. To enhance the channel estimation performance of the IEEE 802.11p receiver, a decision feedback algorithm is utilized to increase the pilot density. The hard decision results of the modulated data symbols are used as pseudo pilots for re-estimation of the channel equalizer. Figure 6 shows the block diagram of the decision feedback processing of the IEEE 802.11p inner receiver.

IV. MULTICORE SDR VIRTUAL PLATFORM FOR IEEE 802.11p

In order to examine whether the proposed architecture meets the timing parameters of IEEE 802.11p, the ESL virtual platform is introduced. In this paper, the instruction set architecture (ISA) of PACDSP [8] is selected as the initial reference. After analysis of the assembly codes of IEEE 802.11p, it is found that the sine and cosine calculations for frequency offset compensation consume too many instructions and cycles. In order to meet the timing criterion of IEEE 802.11p, a hardware accelerator, CORDIC, is introduced to perform the sine, cosine and phase calculations.

A. The ASIP

Three types of application-specific instructions are implemented in the ISA design of the ASIP. The first type contains complex vector instructions, such as complex vector multiplication, addition, etc. The second type is for FFT acceleration and contains radix-2 and radix-4 complex butterflies and bit-reorder instructions. The last type is for speeding up the soft output calculation when demapping the QAM constellations [9], such as the instruction "subabs", which calculates X-ABS(Y) with two inputs X and Y.

To verify the proposed processor, the Language for Instruction Set Architectures (LISA) [10] is applied for generating the instruction set simulator. LISA is a mixed behavioral/structural modeling language for programmable processor architectures with peripherals and interfaces. The commercial tool, Synopsys Processor Designer (PD) [11], is used to generate the tool chain (including assembler, linker, simulator and debugger) of

the DSP. The assembly codes for each ASIP are verified on the ISS generated by PD before being integrated into the multicore system.

B. The ESL virtual platform

To verify the functionality of the IEEE 802.11p inner receiver, the ESL virtual platform based on the Synopsys Platform Architect (PA) [12] is built, as shown in Figure 7. The virtual platform is a heterogeneous multi-core architecture composed of an ARM926 processor, two DSPs, two CC Mems, two CORDICs, two CC Buses, input memory (InMem), output memory (OutMem) and the shared memory (Shared Memory). Figure 8 shows the task partition of DSP1 and DSP2.

Figure 8. Task partition of DSP1 and DSP2 for the IEEE 802.11p inner receiver
[Figure: per-field timeline (LP0, LP1, Signal, data0, data1, ...) of the tasks assigned to DSP1 and DSP2, including DeQAM and detection.]

The InMem and OutMem in Figure 7 store the received data from the digital front end and the operation results of the inner receiver, respectively. The common information and the data for the block-based operations are stored in the Shared Memory. The ARM processor, built from a commercial IP on Synopsys Platform Architect, is only used to initialize and trigger the two DSPs in this virtual platform. The functional SystemC models of the ASIPs generated by Synopsys Processor Designer are used for the two DSPs. The hardware accelerator, CORDIC, is modeled with SystemC in transaction-level modeling (TLM). The CC Bus and the interfaces of the hardware in the virtual platform are modeled in compliance with the TLM 2.0 standard, and all memories are modeled as storage arrays with SystemC. The simulation results on the ESL virtual platform show that the public bus is activated only when the ARM processor initializes the two DSPs, or when the DSPs access the MCS (modulated coding scheme) processed by the outer receiver. Thus the bandwidth of the shared bus and shared memory can be greatly reduced compared with the conventional multicore architecture.

C. Simulation Results

Since the throughput criterion of IEEE 802.11p is 10M subcarriers per second, the clock frequency of the universal modem system is selected as 240 MHz, which is 24 times the throughput criterion. TABLE I shows the simulation results for the system using the generic baseband processing of the receivers for OFDM-modulated IEEE 802.11 families in Figure 4. It can be seen that both the throughput and latency criteria are met. Moreover, at least 40% timing margin of DSP1 is left and can be used for the decision feedback algorithm to enhance the channel estimation of IEEE 802.11p.

TABLE I. SIMULATION RESULTS OF THE PROPOSED SYSTEM

Item                                            Time (us)    Cycle count
Throughput criteria (per OFDM symbol)           8            1920
SIFS                                            32           7680
Latency criteria (16 us for outer receiver)     16           3840
DSP1 averaged usage                             5.0          1203
DSP2 averaged usage                             4.6          1100
Universal modem total latency                   9.6          2303

D. High level power estimation

In order to estimate the power consumption of the proposed architecture, an instruction-based power analysis is applied. The instructions are classified into 5 types according to their function, as shown in TABLE II. The total cycle count for the simulation is C = C1+C2+C3+C4+C5. The total consumed energy is E = (C1*P1 + C2*P2 + C3*P3 + C4*P4 + C5*P5), and the average power is E/C. In this paper, the power approximation is derived from the TSMC CLN90G 90nm specs. For Type 1 and Type 4 instructions, the power data is estimated by the power of 16 and 4 16*16 multipliers respectively. The power data for Type 2 and Type 3 instructions is approximated by the active power of the CCMem, which is a 92S*32 single port SRAM, plus 5 and 1 16*16 multipliers respectively. For Type 5 instructions, no arithmetic operations are involved. Thus the power of Type 5 instructions is estimated by the idle power of the CCMem. Figure 9 shows the instruction statistics of the two DSPs in the proposed system according to the instruction classification in TABLE II. Figure 10 shows the high level power analysis results of the proposed architecture. The average active powers for DSP1 and DSP2 are 4.6 mW and 7.7 mW respectively. Thus the total active power consumption of the proposed system is about 123.6 mW for IEEE 802.11p inner receiver processing.

TABLE II. INSTRUCTION BASED POWER ANALYSIS

Type      Instructions                      Cycle counts    Approximated power (mW)
Type 1    Complex vector multiplication     C1              P1 = 26
Type 2    Hardware accelerator              C2              P2 = 15.38
Type 3    Memory access                     C3              P3 = 6.5
Type 4    Arithmetic                        C4              P4 = 7.26
Type 5    Others                            C5              P5 = 0.244

E. Decision Feedback Analysis

According to the simulation results in TABLE I, at least 40% timing margin of DSP1 can be used for the decision feedback algorithm to enhance the channel estimation performance of IEEE 802.11p. TABLE III shows the cycle count analysis when applying the decision feedback algorithm in Figure 6. It can be seen that for the data modulated by


16QAM, 45 extra cycles per subcarrier are needed for performing the decision feedback operations. Thus 15 subcarriers can be used as pseudo pilots for channel estimation enhancement. Similarly, for the data modulated by 64QAM, 12 subcarriers can be used as pseudo pilots. If all the pseudo pilots are distributed evenly, we have Nf = 4. Thus the pilot allocation criterion to fulfill the sampling theorem of channel estimation can be satisfied.

TABLE III. DECISION FEEDBACK CYCLE COUNT ANALYSIS

Function                              Cycle counts of 16QAM    Cycle counts of 64QAM
                                      modulated data           modulated data
Phase compensation
Hard decision                         20                       32
Re-estimate equalizer
Update equalizer                      19                       19
Total cycle counts per subcarrier     45                       57

Figure 9. The instruction statistics for IEEE 802.11p inner receiver
[Figure: pie charts "DSP1 Instruction Statistics" and "DSP2 Instruction Statistics" by instruction type.]

Figure 10. The active power analysis for IEEE 802.11p inner receiver
[Figure: pie charts "DSP1 Power Analysis" and "DSP2 Power Analysis" by instruction type.]

V. CONCLUSIONS

In this paper, a multi-core SDR architecture targeted at the IEEE 802.11p standard in the ITS system is proposed. Due to the increasing bandwidth requirements of shared buses on SDR platforms with multicore systems, a platform with concatenate buses and concatenate memories is proposed to avoid the bandwidth bottleneck. The proposed transceiver architecture is verified by the electronic system-level (ESL) virtual platform with two application-specific instruction-set processors (ASIP). According to the simulation results, both the throughput and latency requirements are satisfied. The high level power estimation results are also provided in this paper. To enhance the channel estimation and equalization performance of 802.11p, the capability of the proposed architecture with the decision feedback algorithm is analyzed.

REFERENCES

[1] T. Ulversoy, "Software defined radio: challenges and opportunities," IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 531-550, 2010.
[2] U. Ramacher, "Software-defined radio prospects for multistandard mobile phones," IEEE Computer, vol. 40, no. 10, pp. 62-69, 2007.
[3] A. Nilsson, E. Tell, D. Liu, "An 11 mm2, 70 mW Fully Programmable Baseband Processor for Mobile WiMAX and DVB-T/H in 0.12 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 90-97, 2009.
[4] G. Xuan, "Hierarchical design of an application-specific instruction set processor for high-throughput and scalable FFT processing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 3, pp. 551-563, 2012.
[5] IEEE P802.11p: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Amendment 6: Wireless Access in Vehicular Environments, IEEE Std. 802.11p-2010.
[6] M. Speth, S.A. Fechtel, G. Fock, H. Meyr, "Optimum receiver design for wireless broad-band systems using OFDM-part I," IEEE Transactions on Communications, vol. 47, no. 11, pp. 1668-1677, 1999.
[7] M. Sandell and O. Edfors, "A comparative study of pilot-based channel estimators for wireless OFDM," Sep. 1996. (http://http://www.sm.luth.se/csee/sp/research/reportlsae96r.pdf)
[8] T. Vogt, N. Wehn, "A reconfigurable ASIP for convolutional and Turbo decoding in an SDR environment," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 10, pp. 1309-1320, 2008.
[9] C.-N. Liu, "Optimization techniques of AAC decoder on PACDSP VLIW processor," IEEE International Symposium on Circuits and Systems, pp. 1468-1471, May 2008.
[10] F. Tosato, P. Bisaglia, "Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2," IEEE International Conference on Communications, vol. 2, pp. 664-668, Aug. 2002.
[11] U. Meyer-Baese, G. Botella, S. Mookherjee, E. Castillo, A. Garcia, "Energy Optimization of Application-Specific Instruction-Set Processors by using Hardware Accelerators in Semicustom ICs Technology," Microprocessors and Microsystems, vol. 36, no. 2, pp. 127-137, 2012.
[12] Synopsys Inc., Synopsys Processor Designer, http://www.synopsys.comlSystems/BlockDesigniprocessorDev/Pages/default.aspx
[13] Synopsys Inc., Synopsys Platform Architect, http://www.synopsys.comlSystems/ArchitectureDesign/pages/PlatformArchitect.aspx
