
The speed of military targets can exceed several times the speed of sound. This can lead to a maximum allowable processing latency in the order of tens of milliseconds.

The application of parallel DSP architectures to radar signal processing

The Benefits of Parallelism


Parallel architectures allow the design of a system to
accurately reflect the scale of a given problem. A conventional

by J.Tulodd&
Senior Engineer
Marconi Radar Systems
Chelmsford

hardware approach can lead to over-engineering. Moreover,


the reusability of a parallel solution from one application to the

Summary
Increasing requirements for radar system performance have led to the need for improvements in signal processing computational capacity. Coupled with the need for the flexibility and adaptability of programmable solutions, this has led to the development of parallel DSP architectures. Such solutions offer potential benefits for fault tolerant and scalable systems. Marconi Radar Systems has implemented massively parallel signal processing architectures based on the Inmos Transputer and Texas Instruments TMS320C40 devices. The characteristics of each device differ in ways which offer advantages for certain architectural configurations and applications. Examples are given from practical experience. Issues such as processing bandwidth, bus architecture, communications bandwidth and topology, multi-processor support and development environment are considered. Future trends in algorithms and architectures are discussed. The coupling between these two design factors is seen to depend on the development of an automated parallel DSP design environment.

The Radar Problem


The performance requirements of a radar system vary according to many different factors. It is the scale of this variation that demands a flexible approach to system design. The largest computational demand is placed upon military systems. Although the volume of civil air traffic is rapidly increasing, the signal processing algorithms required to detect and track cooperative aircraft are much simpler than those required in defence applications. The technological limitations of a radar system are specifically exploited in military scenarios. Sea skimming missiles deliberately attempt to hide their signal returns amongst the returns from the sea itself. Stealth targets attempt to reduce the effective cross sectional area presented to the radar. Radar jammers attempt to confuse detection algorithms by broadcasting high levels of noise. Alongside these tactical techniques are the natural obstacles within a radar environment. Unwanted returns can be generated from sea, land and rain. These returns are known as 'clutter'. All these factors increase the complexity of the required processing algorithms.


next reduces the cost of future hardware alterations.
With a large number of identical boards, an economy of scale leads to a reduction in product life cycle cost (such as in manufacturing and test). Parallel solutions also reduce the diversity of component spares required. Inherently, a parallel architecture offers scope for fault tolerance and graceful degradation above that of a more conventional approach by providing alternative routes for data flow. Also, software methodologies can be chosen such that the functionality mapped onto an array of processors is very much independent of the array size. This generalisation can be taken further by providing a layer of abstraction whereby the functionality can be transported onto a number of different physical platforms (so enhancing reusability and transportability).

Hardware Options
DSP devices provide customized architectures in support of classical DSP algorithms such as the Fourier transform and time-domain filtering. This is achieved by the customisation of the arithmetic hardware and instruction set. General purpose processors provide a lower cost solution aimed at a larger market. Categorizing the available products on the market is often difficult and arguably only a marketing issue. Having selected a suitable device, there is then the inter-connection choice between off-the-shelf parallel processing hardware and in-house board design. This choice will largely depend on project time-scales, product availability and the amount of software support provided by the supplier.

Hardware Features
In addition to factors such as the availability of a device and the likely software and hardware support, the most significant hardware issues considered are typically:

* the numerical accuracy
* the FLOPS or MIPS figure
* the number/speed of communication links
* the amount of on-chip memory
* the amount of possible off-chip memory
* the amount of on-chip cache
* the support for link handling
* the support for process scheduling
* the support for debugging

© 1995 The Institution of Electrical Engineers.
Printed and published by the IEE, Savoy Place, London WC2R 0BL, UK.

Two devices that have proved successful in radar applications are the Inmos family of general purpose processors known as the Transputer and the Texas Instruments TMS320C40 DSP device.

Radar System Design


Having identified an application that justifies a parallel hardware solution, the overall constraints on the signal processor must be identified. These will be defined partly by the customer and partly by the nature of the expected radar environment. Typical considerations are:

* the algorithms required
* the reliability figure (MTBF)
* the numerical accuracy required
* the maximum dynamic range
* the maximum acceptable latency
* the input data rate
* the maximum task rate

These factors determine which hardware features are of most importance. For example, floating point arithmetic is more appropriate than fixed point for Fourier analysis. However, this can be a significantly more expensive option.
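To put the accuracy trade-off in numbers, a common rule of thumb is that an ideal N-bit fixed point word spans about 20·log10(2^N) dB, roughly 6 dB per bit, so a 16-bit word gives about 96 dB before any headroom is allowed for. The sketch below is a back-of-envelope calculation under that idealised assumption (full scale over one least significant bit, no rounding-noise model); it is not a figure quoted for either system described here.

```python
import math

def fixed_point_dynamic_range_db(bits: int) -> float:
    """Idealised dynamic range of a `bits`-wide fixed point word, in dB.

    Taken as 20*log10(full scale / one LSB) = 20*log10(2**bits); headroom,
    sign-bit usage and rounding noise are deliberately ignored.
    """
    return 20.0 * math.log10(2 ** bits)

if __name__ == "__main__":
    for bits in (12, 16, 24, 32):
        print(f"{bits:2d}-bit word: {fixed_point_dynamic_range_db(bits):6.1f} dB")
```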

Partitioning the Problem


A signal processor is composed of many separable functions such as data distribution, Fourier analysis, plot and clutter thresholding and the packaging and output of results. A strategy is required to map these functions onto a parallel resource. An efficient scheme is required whereby processor idling time is reduced to an absolute minimum.
The two most common partitioning options are data farming and process farming (or a combination of both). Process farming allocates a sub-set of the required functionality to each processor, which acts upon the input data in a pipeline fashion. Data farming allocates a sub-set of the input data to each processor whilst applying the complete required functionality.
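The difference between the two options can be sketched as a single-threaded model. The "processors" here are notional, and the stages (`scale`, `magnitude`, `threshold`) are hypothetical point-wise operations chosen so that both schemes must produce the same result; real stages that need neighbouring range cells are what limit how finely data can be farmed.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[float]], List[float]]

def process_farm(data: List[float], stages: Sequence[Stage]) -> List[float]:
    """Process farming: stage k notionally runs on processor k, and the whole
    data set flows through the stages in pipeline order."""
    for stage in stages:
        data = stage(data)
    return data

def data_farm(data: List[float], stages: Sequence[Stage], n_proc: int) -> List[float]:
    """Data farming: processor p notionally owns one contiguous block of the
    input and applies the complete chain of stages to that block."""
    n = len(data)
    out: List[float] = []
    for p in range(n_proc):
        block = data[p * n // n_proc:(p + 1) * n // n_proc]
        out.extend(process_farm(block, stages))  # full functionality per block
    return out

# Hypothetical point-wise stages, used only for illustration.
def scale(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]

def magnitude(xs: List[float]) -> List[float]:
    return [abs(x) for x in xs]

def threshold(xs: List[float]) -> List[float]:
    return [1.0 if x > 3.0 else 0.0 for x in xs]

if __name__ == "__main__":
    samples = [-2.5, 1.0, 0.5, 4.0, -1.0, 3.0]
    chain = [scale, magnitude, threshold]
    print(process_farm(samples, chain))
    print(data_farm(samples, chain, n_proc=4))
```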
Classical radar signal processing algorithms, required to extract signals from noise and other interference, are largely independent of the range at which they are applied. Moreover, they may be adapted to operate over a limited range extent. A good example is range-averaging CFAR (Constant False Alarm Rate) for plot thresholding [1]. Thus, it is often appropriate to use data farming by allocating a subset of range to each device. This is especially true where the processing does not reduce data bandwidth, such as in producing a video output.
There is a practical limitation on the number of processors that can be applied to a given data set. Over-partitioning is constrained by two factors.

Firstly, a point is reached whereby over-partitioning of data can lead to unacceptable communication overheads. This is an inherent property of parallel systems. Secondly, algorithms will dictate the minimum number of range cells that must be processed collectively. Ideally then, these range cells should reside on a single processor.
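Both limits can be made concrete with a small partitioning sketch. Assume a sliding-window algorithm that needs `halo` cells on each side of every cell it processes (for a cell-averaging CFAR, the reference plus guard cells). Each processor must then receive its own block plus `halo` extra cells at every internal boundary. The partition works for any processor count, which is the sense in which the functionality is independent of array size, while `overhead_fraction` shows the duplicated transfer growing as blocks shrink. The function names and figures are hypothetical.

```python
from typing import List, Tuple

def partition_with_halo(n_cells: int, n_proc: int, halo: int) -> List[Tuple[int, int, int, int]]:
    """Split range cells 0..n_cells-1 across n_proc processors.

    Returns (own_lo, own_hi, in_lo, in_hi) per processor: the half-open range
    of cells it is responsible for, and the wider half-open range it must
    receive so that a window needing `halo` cells either side can run on its
    own cells without further communication.
    """
    parts = []
    for p in range(n_proc):
        own_lo = p * n_cells // n_proc
        own_hi = (p + 1) * n_cells // n_proc
        parts.append((own_lo, own_hi, max(0, own_lo - halo), min(n_cells, own_hi + halo)))
    return parts

def overhead_fraction(n_cells: int, n_proc: int, halo: int) -> float:
    """Cells transferred beyond n_cells, as a fraction of n_cells."""
    received = sum(hi - lo for _, _, lo, hi in partition_with_halo(n_cells, n_proc, halo))
    return (received - n_cells) / n_cells

if __name__ == "__main__":
    for n_proc in (8, 32, 128):
        print(n_proc, round(overhead_fraction(1024, n_proc, halo=5), 3))
```

For 1024 range cells and a halo of 5, eight processors duplicate 70 cells (about 7%), whereas 128 processors of 8 cells each duplicate 1270 cells, more than the data set itself.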

System Architecture
Input interfaces must more than accommodate the specified data rate. Distribution and collation of data must be implemented with the minimum possible communication overhead. In addition, suitable processor inter-connectivity must exist to reflect the algorithmic requirements. The architecture should be modular and expandable with few board types. The total number of devices should account for the likely memory and code space requirements. Obvious bottlenecks in data flow can be identified at this stage.

Performance Prediction
Other than costing and technical risk analysis, the main task that remains is to predict the achievable performance. Inherently, parallel systems are more complex to analyse than conventional designs. This is mainly due to the more complex data flow, which can lead to memory contention, input/output bottlenecks, and communication and general interrupt handling overheads.

In an ideal processor environment, the behaviour of a radar processing algorithm can be classified in terms of the dependence of its execution time on the data. A Fourier transform will be deterministic whereas a thresholding task will depend on the number of items being thresholded. Some algorithms depend not only on the number of items to be processed but on the actual values in the data, such as in a sorting task. Such variations can be investigated by prototyping code on the target hardware or, commonly, on a hardware simulator. Also, instruction sets can be analysed together with data books to estimate the worst case number of instruction cycles required.
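Once worst case cycle counts have been estimated per task, checking them against the latency budget is simple arithmetic. The sketch assumes each task costs a fixed overhead plus a worst case cost per item, evaluated at the largest expected item count; data-dependent tasks such as sorting must be entered with their worst case data. All task names and cycle counts are hypothetical, and the 50 ns cycle time assumes one instruction per cycle at 20 MHz.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class TaskEstimate:
    name: str
    fixed_cycles: int            # per-task overhead (set-up, I/O handling)
    worst_cycles_per_item: int   # worst case cycles per range cell or item
    max_items: int               # largest number of items expected

    def worst_cycles(self) -> int:
        return self.fixed_cycles + self.worst_cycles_per_item * self.max_items

def check_budget(tasks: Iterable[TaskEstimate], cycle_time_s: float,
                 budget_s: float) -> Tuple[float, bool]:
    """Return (worst case time in seconds, whether it fits the budget)."""
    total_s = sum(t.worst_cycles() for t in tasks) * cycle_time_s
    return total_s, total_s <= budget_s

if __name__ == "__main__":
    tasks = [
        TaskEstimate("integration", 1_000, 20, 512),   # hypothetical figures
        TaskEstimate("threshold", 500, 30, 512),
    ]
    worst_s, ok = check_budget(tasks, cycle_time_s=50e-9, budget_s=20e-3)
    print(f"worst case {worst_s * 1e3:.3f} ms, within budget: {ok}")
```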
The resulting performance estimate may prompt more thought on the distribution of functionality in the system. A combination of process farming and data farming may be considered.

Ground Based 3-D Air Surveillance


The first application that will be considered is that of 3-dimensional air surveillance from a land based site. These radars are required in early warning scenarios where long range coverage is required. For each detection, the range, azimuth and elevation are required. Also, general background clutter maps must be created. Such systems use one of two principles to obtain elevation information.


The results from the node are sent to an executive processing module where corresponding data from other beams are used to derive the elevation estimates.
The inter-node architecture is designed to allow alternate routes to be defined to overcome individual node failure. Each node monitors itself and its immediate neighbours. If a fault is detected, it is reported to the executive. A reconfiguration programme then maps out the faulty node, so allowing it to be physically replaced whilst the system remains on line. This new configuration employs a previously redundant node to maintain full system performance. The system is also capable of graceful degradation in performance by reconfiguring with reduced range coverage.
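The mapping-out step can be reduced, for illustration, to updating a logical-to-physical node map. This sketch is a deliberate simplification: it ignores the inter-node routing that the real reconfiguration programme must also redefine, and the map layout and spare list are hypothetical. If no spare is left, the failed node's logical role is dropped, standing in for degraded operation with reduced range coverage.

```python
from typing import Dict, List, Tuple

def remap(mapping: Dict[int, int], spares: List[int],
          failed: int) -> Tuple[Dict[int, int], List[int]]:
    """Map a failed physical node out of a logical->physical node map.

    Returns the new map and the spares still unused; the inputs are not
    modified. If no spare is available, the logical role that the failed
    node held is dropped (degraded operation).
    """
    new_map = dict(mapping)
    remaining = list(spares)
    for logical, physical in mapping.items():
        if physical == failed:
            if remaining:
                new_map[logical] = remaining.pop(0)   # bring a redundant node in
            else:
                del new_map[logical]                  # degrade: role is lost
    return new_map, remaining

if __name__ == "__main__":
    nodes = {0: 10, 1: 11, 2: 12}   # hypothetical logical -> physical node IDs
    nodes, spare = remap(nodes, [13], failed=11)
    print(nodes, spare)
    nodes, spare = remap(nodes, spare, failed=12)
    print(nodes, spare)
```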

The intra-node data flow is based upon four worker chains (See Figure 3). Four head workers exchange data directly with the server (S). Each head worker exchanges data with lower level workers to complete the distribution. Each worker chain communicates with the foreman (F). Fixed point hardware is employed. This is adequate because the surveillance algorithms used do not require a very large dynamic range.

The behaviour of the software is also well defined. This is because the implementation of software using occam on the Transputer is inherently efficient. The philosophy of this approach means that it is largely neither necessary nor, in fact, wise to hard-wire occam with lower level machine language. This was very much the case in the development of the TAP system. The result was that very little software optimization was required at a machine level.
All these factors assist the designer in generally matching the processing rate to the required input task rate. When this is not the case, any short term overlap between processing the current data and processing the next task is managed by the Transputer scheduler (i.e. there is the capability to cope with more than one azimuth sector at a given instant in time).
The TAP node has proved to be easily upgradable whilst retaining the same basic architecture. A 32-bit floating point node (based upon the T801 Transputer) has been constructed within the same mechanical frame as that used by the fixed point node. Also, for the fixed point node, pin compatible Transputer upgrades to 25 and 30 MHz are possible.

Multifunction Radar
This category of radar systems is the second application that will be considered. The principle is to combine surveillance capabilities with tracking by suitable control of a narrow beam [2] (See Figure 4). This permits the tracking of multiple targets simultaneously (in contrast with a dedicated mechanically steered tracker). The transmitted waveforms are tailored to the mode of operation. There will be a predictable search pattern in normal surveillance mode. In a threat scenario, the beam management software must redistribute the scanning priority, for example, by giving higher priority to tracking.

Figure 3: TAP Node Data Flow


The amount of data allocated to each processor is fixed, as already described. Also, the data rate and task rate are fixed. This very much simplifies system analysis and performance prediction.
The processing latency budget typically means that each node must process its allocation of data in tens of milliseconds. To determine the worst case process time, the worst case execution time per range cell must be determined. In the surveillance radar application, the computation required is reasonably deterministic, with algorithms such as integration, moving target indication (MTI) and temporal thresholding [1].
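MTI is a good example of deterministic computation: a fixed number of operations per range cell regardless of the data values. As an illustration, the sketch assumes a textbook three-pulse canceller with binomial weights (1, -2, 1); the filter actually used in the TAP is not described here.

```python
from typing import List, Sequence

def mti_three_pulse(pulses: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Three-pulse MTI canceller, weights (1, -2, 1), applied to every range cell.

    pulses[k][r] is the complex sample of range cell r on pulse k. The output
    has len(pulses) - 2 rows. The work per output cell is fixed, so execution
    time depends only on the number of pulses and cells, not on their values.
    """
    out: List[List[complex]] = []
    for k in range(2, len(pulses)):
        out.append([pulses[k][r] - 2 * pulses[k - 1][r] + pulses[k - 2][r]
                    for r in range(len(pulses[k]))])
    return out

if __name__ == "__main__":
    # Cell 0: stationary clutter (constant return). Cell 1: a moving target
    # whose phase advances by 90 degrees from pulse to pulse.
    pulses = [[5 + 0j, 1j ** k] for k in range(4)]
    for row in mti_three_pulse(pulses):
        print([round(abs(v), 6) for v in row])
```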

Figure 4: Multifunction Radar

The algorithms selected can also depend on the beam orientation. This demands rapid real-time algorithm reconfiguration under very short latency constraints (tens of milliseconds). Signal returns from sea clutter are most significant at low elevation angles. Under these conditions it is more appropriate to use coherent processing to provide Doppler discrimination.

A fixed eight beam arrangement is used.

One method is to scan the elevation plane for each azimuth position using electronic beam forming. The second method is to permanently create a number of beams in elevation using a single beam forming antenna array. The relative signal strength in two neighbouring beams is used to interpolate the elevation angle (See Figure 1).
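The interpolation can be illustrated under a simplifying assumption that is not taken from the source: that each beam's voltage pattern is Gaussian with the same half-power beamwidth. The log-ratio of the amplitudes in two neighbouring beams is then linear in elevation and can be inverted in closed form. The beam pointing angles and beamwidth are hypothetical parameters; a real system would use measured patterns or a calibration table.

```python
import math

def elevation_from_two_beams(a1: float, a2: float, theta1_deg: float,
                             theta2_deg: float, beamwidth_deg: float) -> float:
    """Interpolate target elevation from amplitudes a1, a2 in two adjacent beams.

    Assumes Gaussian voltage patterns g_i(t) = exp(-k (t - theta_i)**2) with
    k = 2 ln 2 / beamwidth**2, so the power pattern is 3 dB down at
    +/- beamwidth / 2. Then ln(a1 / a2) = k (theta1 - theta2)(2t - theta1 - theta2),
    which is solved for t.
    """
    k = 2.0 * math.log(2.0) / beamwidth_deg ** 2
    midpoint = 0.5 * (theta1_deg + theta2_deg)
    return midpoint + math.log(a1 / a2) / (2.0 * k * (theta1_deg - theta2_deg))

if __name__ == "__main__":
    # Hypothetical beams pointing at 2 and 6 degrees, 4 degree beamwidth.
    k = 2.0 * math.log(2.0) / 4.0 ** 2
    target = 3.2
    a1 = math.exp(-k * (target - 2.0) ** 2)
    a2 = math.exp(-k * (target - 6.0) ** 2)
    print(elevation_from_two_beams(a1, a2, 2.0, 6.0, 4.0))
```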

The signal processor is typically composed of ~100 TAP nodes. A TAP node is the lowest level line replaceable unit in the system. Each node employs 50 Transputers, giving an overall peak system performance of ~100 GIPS (at 1000 MIPS per node). An array of nodes is configured in such a way that each can communicate directly with other immediately neighbouring nodes (See Figure 2). Each node is composed of three main parts. The server is a data interface module (using a T225 Transputer and an ASIC). This links the node to a high speed interface using the FDDI (Fibre Distributed Data Interface) protocol. Raw data is recognized by the node using tagged data packets. On the arrival of new data, a packet is captured and replaced with processed data from the node. The main core of the node is a collection of child modules, also known as the workers (each a 20 MHz T225 Transputer). These each perform exactly the same processing on a sub-set of data. The data processed by each worker is that produced from a single beam (representing a fixed elevation sector) at the current azimuth for a fixed subset of range.

Figure 1: 3-D Air Surveillance

The Inmos Transputer


The Transputer family of general purpose processors offers a considerable number of building blocks for use in the design of a radar signal processor. The most important hardware features are listed below (using the specific case of a T225 running at 20 MHz):

* 16 bit fixed point architecture
* 20 MIPS (peak)
* 4 x 20 Mbit/sec serial links
* 4 Kbytes on-chip static RAM
* 64 Kbytes addressable off-chip memory
* dedicated link interface
* microcoded scheduler

In addition to a good technical performance, the Transputer offers a purpose built programming environment. Occam is a high level language designed specifically around the Transputer architecture. Many parallel processes can be defined, with point to point communication automatically synchronized and unbuffered. The hardware scheduler removes the need for a software kernel and so reduces the associated interrupt overheads. The scheduler, together with a link interface, is able to automatically deschedule an I/O process for the duration of a message transfer, effectively decoupling the CPU from the link transfer.
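The occam channel semantics just described (synchronized, unbuffered, point to point) can be mimicked in Python to show what "synchronized and unbuffered" means in practice: `send` does not return until the receiver has taken the value. This is only a behavioural sketch built from threads and semaphores; it says nothing about how the Transputer's microcoded scheduler and link hardware implement channels, and the `Channel` class is hypothetical.

```python
import threading

class Channel:
    """Unbuffered, synchronized, point-to-point channel (occam-like sketch).

    send() blocks until the single receiver has taken the value, so the two
    processes rendezvous on every communication and nothing is buffered.
    """

    def __init__(self) -> None:
        self._value = None
        self._ready = threading.Semaphore(0)   # a value has been offered
        self._taken = threading.Semaphore(0)   # the value has been consumed
        self._writer = threading.Lock()        # one sender at a time

    def send(self, value) -> None:
        with self._writer:
            self._value = value
            self._ready.release()
            self._taken.acquire()   # wait for the receiver: the rendezvous

    def receive(self):
        self._ready.acquire()
        value = self._value
        self._taken.release()
        return value

if __name__ == "__main__":
    ch = Channel()

    def producer() -> None:
        for i in range(5):
            ch.send(i * i)

    worker = threading.Thread(target=producer)
    worker.start()
    print([ch.receive() for _ in range(5)])
    worker.join()
```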

The Transputer Array Processor


The Marconi Radar Systems Transputer Array Processor (TAP) was designed specifically for multi-beam surveillance radars. Several of these systems have been successfully employed around the world.

Figure 2: TAP System Architecture


The child modules transform the raw data into processed video suitable for display at an operator's terminal. This data is routed back to the FDDI interface. Secondly, the child modules produce candidate detections and clutter estimates that are routed to a single processor on the node known as the foreman. This device (a T425 Transputer) produces the final confirmed detection list and clutter map for the node.


These waveforms require longer integration periods. In clear search environments (high elevation angles), it is more appropriate to use non-coherent algorithms with shorter integration periods.
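The elevation-dependent choice between the two processing styles amounts to a per-dwell selection made before the data arrive. A minimal sketch, assuming a single hypothetical elevation boundary (`LOW_ELEVATION_LIMIT_DEG`) and a flag saying whether sea clutter is expected; a real beam manager would draw on far more information.

```python
from enum import Enum

class Processing(Enum):
    COHERENT = "coherent, Doppler discrimination, longer integration"
    NON_COHERENT = "non-coherent, range only, shorter integration"

# Hypothetical boundary between 'low' and 'high' elevation beams.
LOW_ELEVATION_LIMIT_DEG = 3.0

def select_processing(elevation_deg: float, sea_clutter_expected: bool = True) -> Processing:
    """Pick the processing style for the next dwell from its beam elevation."""
    if sea_clutter_expected and elevation_deg < LOW_ELEVATION_LIMIT_DEG:
        return Processing.COHERENT
    return Processing.NON_COHERENT

if __name__ == "__main__":
    for elev in (0.5, 2.0, 10.0, 45.0):
        print(elev, select_processing(elev).value)
```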

The main system architecture is based upon two identical systems (known as slices) operating side by side (See Figure 5). Digital Pulse Compression (DPC) is applied to the data before the EPICs. The DPC boards are not shown.

These algorithms operate only in the range dimension (no Doppler discrimination) and so are simpler to implement and potentially offer a more efficient use of the search time budget. The search flexibility is partly made possible with phase scanned array antenna technology, which has provided the required versatility. Electronic as well as mechanical control is used to steer a narrow beam in elevation and azimuth.


The Texas Instruments TMS320C40 DSP Processor

Abbreviated as the C40, this processor has proved to be a high performance DSP device ideal for parallel processing in radar applications. The most important features of the device are listed below (using the example case of a 50 MHz part):

* 32 bit floating point architecture
* 50 MFLOPS (peak)
* 6 x 20 MBytes/sec communication ports
* 8 KBytes on-chip RAM
* 16 GBytes addressable off-chip memory
* 512 Bytes on-chip cache
* dedicated DMA coprocessor
* JTAG debugging support

One of the most notable features of the C40 is its powerful debugging environment, provided by the enhanced use of JTAG (Joint Test Action Group) technology together with the PDM (Parallel Debug Manager) tool. A chain of processors is physically linked by JTAG. They can then be accessed from a global level command line, typically through an OSR windows interface. Any sub-set of processors can be halted at any time and examined. They may then be restarted to allow other dependent processors to be examined in the same way.

The C40 DMA (Direct Memory Access) coprocessor is used to copy data from memory to link and vice versa. This operation is largely independent of the CPU.

The EMPAR Signal Processor


The Marconi Radar Systems EMPAR (European Multifunction Phased Array Radar) signal processor has recently been developed for and delivered to a customer. The system is composed of a number of identical C40 processing modules known as EC40 boards. Each contains nine C40s, providing a peak performance of 450 MFLOPS per board. C40s are also used for raw data distribution and control in the EPIC (EMPAR Processor Interface Card) units.

Figure 5: EMPAR Signal Processor

Plot extraction and display processing, as well as task scheduling, are performed by a higher level control unit known as the RMC (Radar Management Computer).

Functionally, the system operates at a regional level in the main processing channel (the sum channel) using data farming on small sub-sets of range across a given number of pulse intervals. At this level, a degree of uncertainty exists in the extracted data. At a global level, more certainty is added by collating the information from the regional EC40s onto a single processor. This head global processor decides upon further processing, if necessary, and this is communicated back to a regional level. When global processing has no further need for regional processing, it releases it to allow the next batch of regional data to be processed. At the same time, global processing performs further thresholding tasks on the current data. On completion of this task, results are sent to difference processing, where azimuth and elevation angles are computed. Thus, the EMPAR system is a combination of data farming and process farming with a multi-stage pipeline architecture.
The number of boards in each slice is sufficient to process the largest expected data set without exceeding the regional processing latency budget (tens of milliseconds). Secondly, the combination of two slices and several boards per slice is such that throughput can be maintained for any possible sequence of tasks. The multi-slice architecture can be tailored to deal with the expected load variation of any application. The number of slices in the system and the number of processing boards within each slice can easily be modified. The number of boards within each slice need not be the same. In this case, the slice selection logic could not only select the least loaded slice but also choose the slice that is most suitable for the size of the next data set.
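The slice selection rule described above can be sketched as follows. The load measure (outstanding range cells per board) and the per-slice size limit are hypothetical choices made for illustration; the source only states that the least loaded slice, and possibly the slice best suited to the data set size, is chosen.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Slice:
    name: str
    boards: int       # processing boards in this slice
    max_cells: int    # largest data set this slice can take within its latency budget
    outstanding: List[int] = field(default_factory=list)  # sizes of tasks logged on

    def log_on(self, cells: int) -> None:
        self.outstanding.append(cells)

    def log_off(self, cells: int) -> None:
        self.outstanding.remove(cells)

    def load(self) -> float:
        """Outstanding range cells per board (hypothetical load measure)."""
        return sum(self.outstanding) / self.boards

def select_slice(slices: List[Slice], task_cells: int) -> Optional[Slice]:
    """Least loaded slice among those able to take a task of this size."""
    eligible = [s for s in slices if s.max_cells >= task_cells]
    return min(eligible, key=Slice.load) if eligible else None

if __name__ == "__main__":
    a = Slice("A", boards=4, max_cells=4096, outstanding=[3000])
    b = Slice("B", boards=2, max_cells=2048)
    for cells in (1000, 3000, 5000):
        chosen = select_slice([a, b], cells)
        print(cells, chosen.name if chosen else "no slice can take it")
```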

Within each regional EC40, processed data is farmed onto and off the board via a head regional worker in a tree-based structure (See Figure 6). The overall system is headed by the head global worker, with a top level branch to each regional EC40. This structure is duplicated for each slice in the system.


With this information, not only can the optimum slice be selected for the next task, but also any indication of potential queue saturation can be detected and acted upon. The regional queue size is determined by estimating the worst case task arrival sequence. This is possible because, although the amount of data to be processed is unknown, the minimum and maximum amounts are well defined. The exact optimum queue size is determined by analysis and simulation.
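The worst case queue depth for a candidate arrival sequence can be found with a very small simulation. The sketch assumes a fixed task interval (constant task rate), a single FIFO processing stage handling one task at a time, and service times drawn from the minimum to maximum range of data set sizes; the interval and service times used are hypothetical.

```python
from typing import List, Sequence

def peak_queue_depth(arrival_interval: float, service_times: Sequence[float]) -> int:
    """Peak number of tasks held (waiting or in service) by one FIFO stage.

    Task i arrives at time i * arrival_interval (fixed task rate) and needs
    service_times[i]; only one task is processed at a time.
    """
    completions: List[float] = []
    finish = 0.0
    peak = 0
    for i, service in enumerate(service_times):
        arrival = i * arrival_interval
        finish = max(arrival, finish) + service   # FIFO, single server
        completions.append(finish)
        held = sum(1 for c in completions if c > arrival)
        peak = max(peak, held)
    return peak

if __name__ == "__main__":
    interval = 10.0                 # hypothetical time between task arrivals
    smallest, largest = 5.0, 25.0   # hypothetical service-time bounds
    print(peak_queue_depth(interval, [largest] * 8))
    print(peak_queue_depth(interval, [smallest, largest] * 4))
```

Comparing candidate sequences in this way, for example a run of largest-possible data sets against mixed sequences, gives a first estimate of the queue length to provide before more detailed analysis.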

The EMPAR signal processor has the capability to provide fault tolerance and graceful degradation. Redundant boards could stand by within each slice. Simple reconfiguration of the software could map the boards into and out of the system. Also, if the application demands, graceful degradation could be achieved by operating with only one slice or by reducing the range coverage within either or both slices. In the latter case, the slice selection logic could selectively degrade the range coverage for only the least critical tasks.


Figure 6: Regional EC40 Data Flow

As with the TAP node, the EC40 board has proved to be easily upgradable with no changes to the board or system architecture. Early development of the system operated at 40 MHz. The current implementation operates at 50 MHz. A 60 MHz upgrade is also possible (the system can in fact operate with a mixture of clock rates).

The system is implemented using a mixture of C programming and C40 assembly coding. The former is used for high level control code. The latter is used for time critical operations such as data communications, sorting tasks and FFTs (Fast Fourier Transforms). Thus, maximum use is made of the C40 facilities such as the parallel instruction set, circular addressing, delayed branching and the on-chip RAM. At the same time, the high level code provides the required readability at a system level (simplifying the task of high level debugging).

The dynamic behaviour of the system is dependent on the exact sequence of tasks (with varying size data sets) given to the processor and also on the data content within each task. Process times, on the same amount of data, may vary two-fold or more between a clear data set and one containing many candidate detections. To manage these variables a queue management system is required.

A task allocation programme monitors the loading on each slice by logging tasks onto the system and logging them off. In this way, the least loaded slice can be given the next task that arrives. This degree of task management arises from the expected variation in the input task rate. Although the incoming data rate is constant, the amount of data to be processed cannot be predicted by the processor. On the regional EC40s, a server processor gathers new data for the board from an EPIC. This is farmed out to four worker processors where it is placed on a fixed size queue. Data are released from the queue when all eight workers are ready for the next task. Many tasks may be queued but only a single task is processed at any one time. DMA is used to manage the raw data transfers with minimal impact on CPU time.

As already described, a feedback loop exists whereby the data distribution control software has an estimate of the current task queue depth.

Concluding remarks

Two very different applications have been described. It is not intended that the reader should simply choose between them. The important issue is the versatility of parallel design. Both systems have demonstrated the benefits of this approach. Both are evolving systems with scope for scalability and upgradability. They achieve their objectives in very different but appropriate ways. They both provide a cost effective high performance machine that is not constrained to any specific requirement. Each supports its own category of requirements, namely surveillance and multifunction radar.

Future Algorithms

Two areas of continuing research are neural networks and chaos mathematics. Both are of interest in the unpredictable world of radar data. As higher resolution radars are developed, more information may be extracted from the environment. The use of parallel architectures is a natural choice for neural network computation. One or more neurons may be physically allocated to a single processor.

Future Architectures
Optical back-planes for parallel systems have already been developed [3]. Potentially, they provide more choice of inter-connection topologies but with only a small communications overhead. One principle is to use a holographic plate to define a board level inter-connection.
Silicon technology within a single device is approaching its physical limits. This has led to a trend towards multi-processor fabrication on a single piece of silicon.
Cross-bar switching and shared memory technology are continually improving. Hardware is becoming more reprogrammable with gate array technology. All these factors add to the efficiency with which parallel systems can be implemented.

Automation of Design
The ideal design environment is one where the software developer need not have any knowledge of the parallel machine architecture. Also, the machine architecture need not make any assumptions about the application. The physical inter-processor connectivity can be defined by the application. This connectivity and the functional mapping of a given algorithm onto a parallel resource should be an automated process. To apply this level of automation to a large collection of dependent algorithms, equating to a radar signal processor, is a very difficult task. The coupling between algorithms and architectures in this context is the subject of much research. Some tools are already emerging. The Ptolemy development tool (University of California) allows a graphical interpretation of a system to be allocated basic functionality with automatic code generation. Gate array technology has been applied in an attempt to automate the design of an occam based parallel system. Problem-specific hardware can be realised entirely by a software process [4]. Another field of research is in parallel programming languages and compilers. The objective is to provide complete abstraction between the required functionality and the physical solution [5].

A more certain development is the increased use of parallel systems across the radar product range. More use in commercial radar applications will maximize the returns from this relatively new technology.

Acknowledgements
The author wishes to thank all members of Marconi Radar Systems who have provided comment on this paper.

References

[1] SKOLNIK M.I., Introduction to Radar Systems, 2nd Edition, McGraw-Hill, 1981.
[2] INGIB R.J., THOMAS A.S., Signal Processing for Multifunction Radars, GEC Review, Vol. 10, No. 1, 1995.
[3] FELDMAN M., Holographic Optical Interconnects for Multichip Modules, Electronic Engineering, September 1992.
[4] PAGE I., LUK W., Compiling occam into FPGAs, in FPGAs, eds. MOORE W., LUK W., pp. 271-283, Abingdon EE&CS Books, 1991.
[5] BISSELING R.H., McCOLL W.F., Scientific Computing on Bulk Synchronous Parallel Architectures, Technical Report 836, Department of Mathematics, University of Utrecht, December 1994.

