
A Parallel Scan Conversion Algorithm with Anti-Aliasing for a General-Purpose Ultracomputer

Eugene Fiume
Alain Fournier
Computer Systems Research Group
Department of Computer Science
University of Toronto
Toronto, Ontario, M5S 1A4

Larry Rudolph
Department of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213

ABSTRACT

Popular approaches to speeding up scan conversion often employ parallel processing. Recently, several special-purpose parallel architectures have been suggested. We propose an alternative to these systems: the general-purpose ultracomputer, a parallel processor with many autonomous processing elements and a shared memory. The "serial semantics/parallel execution" feature of this architecture is exploited in the formulation of a scan conversion algorithm. Hidden surfaces are removed using a single scanline, z-buffer algorithm. Since exact anti-aliasing is inherently slow, a novel parallel anti-aliasing algorithm is presented in which subpixel coverage by edges is approximated using a look-up table. The ultimate intensity of a pixel is the weighted sum of the intensity contribution of the closest edge, that of the "losing" edges, and that of the background. The algorithm is fast and accurate, it is attractive even in a serial environment, and it avoids several artifacts that commonly occur in animated sequences.

CR Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles - Shared Memory; D.3.3 [Programming Languages]: Language Constructs - Concurrent programming structures; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Geometrical problems and computations; I.3.1 [Computer Graphics]: Hardware Architecture - Raster display devices; I.3.3 [Computer Graphics]: Picture/Image Generation - Display algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Visible line/surface algorithms.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© ACM 0-89791-109-1/83/007/0141 $00.75

1. Introduction

The performance of a raster graphics system is strongly influenced by the inefficiency of scan conversion. Several recent papers have proposed speeding up scan conversion by employing special-purpose hardware. These systems exploit parallel processing in various ways, some of which are:

(1) "Intelligent" VLSI-based memory. This includes systems such as PIXEL-PLANES, by Fuchs et al. [FuPo81, FPPB82], the smart memory architecture by Gupta et al. [GuSS81], and the Rectangular Area Filling Display System Architecture by Whelan [Whel82].
(2) Hardware enhancements or graphics engines. Clark's geometry engine, although not a scan conversion system, illustrates the latter [Clar82], and Whitted's enhanced frame buffer is an example of the former [Whit81]. The proposed systems of Fussell and Rathi [FuRa82], and Weinberg [Wein81], are graphics engines.
(3) Special-purpose, multiple-processor systems. These systems incorporate special-purpose hardware to broadcast image descriptions to the processors. Image memory is often partitioned to enhance parallelism. Examples include Fuchs's central broadcast controller [Fuch77], Parke's splitter tree, and Parke's splitter tree/broadcast controller hybrid [Park80].

Obviously, any parallel-processing scheme should demonstrably hasten scan conversion. The above proposals are no exception. Several issues remain open, however. First, few proposals address the aliasing problem. Indeed, anti-aliasing is difficult to perform on the systems of Fuchs et al., Fussell and Rathi, Whelan, Fuchs, and Parke. Second, display systems exploiting parallelism should always exhibit subserial behaviour. Third, it is not clear that a special-purpose system is the best approach if similar computational power is required for other tasks. It is likely that the feasibility of large-scale display processors with special-purpose hardware will coincide with that of general-purpose parallel processors. The ultracomputer, described below, is one such processor. We wish to demonstrate that the ultracomputer can be a very effective "graphics engine" in its own right. This is illustrated by presenting a parallel scan conversion
algorithm including anti-aliasing. The worst case behaviour of the algorithm is subserial.

Not all problems necessarily have faster parallel implementations. However, problems such as scan conversion, which naturally decompose into a large set of independent subproblems, are good candidates for parallel processing. The objective of a general-purpose parallel processor design is to maximise the degree of subproblem independence over a wide class of tasks. Otherwise, the major advantage of such a processor over special-purpose systems is lost. In our ultracomputer model, subproblem independence is facilitated by a small repertoire of powerful concurrent operations on shared memory. To each processing element (PE) of the ultracomputer, a concurrent operation appears to execute indivisibly. In fact, an intelligent, multistage network cleverly connects the PEs to shared memory, and combines all operations simultaneously directed at a variable into one operation. Programs thus appear to have a serial semantics, but parallel execution. Moreover, parallel programs are simply expressed, unlike the often more complicated techniques required to optimise computations on vector or pipeline processors. Because of this "serial semantics/parallel execution" property, the algorithms below can be implemented on any processor capable of simulating the concurrent operations, although the resulting programs may run more slowly.

Section 2 outlines the basic ultracomputer architecture. A scan conversion algorithm that utilises this parallel processing model is presented in Section 3. A novel anti-aliasing algorithm is given as an integral part of scan conversion. Lastly, the generality of the ultracomputer is illustrated by noting other problems to which it can be applied. This is discussed in Section 4, as are topics for future research.

2. Ultracomputer Architecture

An ultracomputer is a parallel processor composed of many processing elements (PEs), which have multiple-cycle access to shared memory. Ultracomputers are a good model of parallel computation. Schwartz has made an extensive survey of this field, summarising various upper and lower bounds for parallel sorting algorithms, set operations, matrix multiplication, etc. [Schw80]. Ultracomputers are more than just a theoretical model, however. Indeed, our ultracomputer model is based on the work done at New York University, at which a large-scale implementation is planned [GGKM81]. The model we propose is a very slight extension of the NYU model, incorporating additional concurrent instructions.

An NYU Ultracomputer is composed of N = 2^D autonomous PEs and connected to N shared memory modules. Local memory for each PE is provided by means of a partitioned memory cache. PEs access shared memory via a D = log_2 N-stage connection network composed of an N x D array of "intelligent" 2-input, 2-output switches^1. Switch interconnection is based on Lawrie's omega-network [Lawr75], illustrated in Figure 1.

^1 The entire architecture can be easily generalised to N = k^D PEs and a D = log_k N-stage network using k-input, k-output switches.

[Figure 1: diagram of an omega-network with columns labelled Stage 1, Stage 2, Stage 3]
Figure 1. Routing through an omega-network for 8 PEs. Connections between PEs, switches, and MMs are by means of a shuffle-exchange: an object numbered d_1 d_2 ... d_D in binary is connected to the object numbered d_2 ... d_D d_1 in the next stage of the network. A message transmitted from PE p_1 ... p_D to MM m_1 ... m_D uses output port m_i when leaving the i-th switch. Similarly for travelling from MM to PE. The route from PE 5 (101_2) to MM 2 (010_2) is indicated.

The novelty of the NYU design rests in the intelligent switches, which implement concurrent access to variables in shared memory. The network easily realises concurrent fetch or store operations. Other more powerful concurrent operations can be implemented. Presently, one such instruction is supported: the replace-add, which creates the illusion of indivisibly adding a value to a shared variable, and returning the sum to the requesting PE. Specifically, the format of the operation is RepAdd(V,e)^2, where V denotes a shared (integer) variable and e is an integer expression. Let V have value v. Suppose PE_i issues the command S_i = RepAdd(V,e_i), and PE_j issues the command S_j = RepAdd(V,e_j) simultaneously. Then, assuming V is not simultaneously updated by another PE, either

    S_i = v + e_i
    S_j = v + e_i + e_j,

or

    S_i = v + e_i + e_j
    S_j = v + e_j,

and in either case, the new value of V is v + e_i + e_j. Note that RepAdd(V,0) is a fetch instruction.

^2 The semantics of this operation has recently been modified in the NYU design, and has since been renamed FetchAdd. Since RepAdd can be easily constructed from FetchAdd, we will continue to use RepAdd in our ultracomputer model.

When operations on the same cell in shared memory meet at a switch, they are synthesised into a single instruction. This is sent to the next stage in the network in one cycle. Instruction combining can occur at any stage in the network. Hence of all the operations simultaneously directed at a single variable, V, only one cumulative operation actually "reaches" V. Thus memory traffic is reduced and network bandwidth is increased. Indeed, the processor has the following surprising property: it is
particularly efficient when many operations are concurrently issued on a small set of variables. Simultaneous update of the same variable by all N PEs is resolved in O(log N) time, compared to O(N) time for typical parallel processors using semaphore-like mutual exclusion. This is a useful property which is often exploited. For example, RepAdd makes an effective synchronisation primitive [GoLR83]. Moreover, data structures allowing parallel access are conveniently implemented using RepAdd. A polygon display list can be nicely implemented as a parallel queue. Suppose the index NextPolygon is used as a subscript into a polygon list. Then every PE executing RepAdd(NextPolygon,1) is guaranteed to get a unique value for NextPolygon.

The standard NYU ultracomputer model supports the three concurrent instructions described above: fetch, store, and RepAdd. To realise these operations, a switch only needs a small amount of memory, and an adder. Implementation details, together with a network performance analysis, are found in [GGKM81]. Although these instructions have proved useful for constructing good parallel solutions to scientific and operating system problems, we believe a concurrent, flexible comparison instruction is needed. We propose a new concurrent instruction, replace-minimum, or RepMin, which is easily realised by adding a comparator to each switch. Its semantics is defined as follows. Let V denote a cell of shared memory having value v, and let e be an expression such that both v and e are pairs (intensity, depth) of values^3. Then RepMin(V,e) causes all of V to be replaced by e iff e.depth < v.depth. The value returned by RepMin will be discussed presently. The utility of RepMin in scan conversion is obvious. Consider the following parallel version of the z-buffer algorithm found in [NeSp79]. Here, the entire z-buffer is assumed to be addressable as an n x m array of shared memory. Each PE executes the following.

    while polygons remain do begin
        get P from polygon list (use RepAdd)
        ∀ pixels (x,y) ∈ P do begin
            i := PolygonIntensity(P,x,y)
            z := PolygonDepth(P,x,y)
            RepMin((x,y), (i,z))
        end
    end

^3 To make the replace-minimum instruction quite general, the extent of the intensity and depth subwords could be controlled by a modifiable bit-mask stored in each switch. Clearly, the names of the subwords, "intensity" and "depth", are illustrative. In practice, the subwords could be known by arbitrary names.

Let us now discuss the value returned by a RepMin operation. We only consider the case where n PEs (0 < n ≤ N) simultaneously issue a RepMin for cell V. Informally, of all the RepMin's simultaneously directed at variable V, the value returned to a PE is one which has "lost" in at least one comparison. Moreover, any value sent by a particular PE is returned exactly once. Perhaps surprisingly, this is achievable in the switches, and can be shown by induction on n.

The NYU ultracomputer also presently lacks concurrent logical bit operations. The scan conversion algorithm below makes use of another concurrent instruction, the RepAnd. This operation has the same format as the RepAdd, but performs a logical and of the arguments instead of an addition. Note that in principle, only a few Nand gates in each switch would be required to realise all 16 boolean operations as concurrent instructions.

In general, an instruction supported by the connection network must be associative. Thus concurrent floating point operations cannot be properly realised^4. There exist inherently non-associative operations, such as the group (or Fourier) commutator. Defined as [a,b] = a b a^-1 b^-1, this operation is not associative for non-commutative groups; thus a "RepCom" instruction for matrices under multiplication is inherently unrealisable.

^4 In most computers, ((10^a - 10^a) + 1) ≠ (10^a + (-10^a + 1)), for large a.

The serialisation principle is a necessary property of the connection network: The network ensures that the effect of simultaneous operations by the PEs is equivalent to some serialisation of the operations.

3. A Fast Parallel Scan Conversion Algorithm

3.1. Preliminaries

Our definition of scan conversion is the traditional one (e.g. [NeSp79]). Given a scene represented by P simple polygons, determine the set of pixels and their intensities that best approximates the scene. The solution, based on the conventional single-scanline z-buffer algorithm, performs hidden-surface removal and anti-aliasing. Serial scanline algorithms typically require a YX-sort of polygon spans intersecting with a given scanline [SuSS74]. However, RepMin allows us to drop the X sort. The shared memory storing ultimate scanline intensities is assumed to be available to a video controller, by dual-ported memory, for instance.

3.2. The Algorithm

First we briefly outline the major steps performed by each PE. As in traditional scanline algorithms, a Y-scanline bucket is employed to determine polygon segments that enter the scene at scanline y.

(1) Remove backfacing polygons.
(2) Convert remaining polygons into sets of span-areas, i.e. trapezoidal or triangular regions. Insert each span-area into the Y-bucket corresponding to its largest y-value.
(3) Scan convert span-areas:
    for y := ymin to ymax do
    (a) The span-areas from bucket y are inserted into the active span list (ASL).
    (b) Process active spans for scanline y. Each PE takes a span from the ASL. If the span is large, only a fraction of it is taken at a time, thus permitting parallel processing of the span. For each pixel in its portion of a span, the PE computes intensity and depth values, and performs a table look-up to approximate the portion of the pixel covered by the span. The left and right endpoints of the span are then updated. If the span-area is exhausted, it is removed from the ASL.
    (c) Anti-aliasing. For each non-empty pixel, an approximate anti-aliasing procedure is performed by
determining the intensity contribution of the closest span, and adding in the average contribution of the "losers". The coverage information computed in step (b) is used in these calculations.

3.2.1. Data Structures

For clarity, we only use static storage in shared memory. Assume there are P input polygons found in the array InputList. In what follows, let V_i be the number of vertices in input polygon P_i, and let V be the largest such V_i. Assume the PEs are programmed in a high-level language such as Pascal or Euclid which allows programmer-defined data types. Note that arrays in shared memory are possible, since their starting addresses can be stored in the local memory for each PE. The names assigned to variables in shared memory begin with an upper case letter. Local variables begin with a lower case letter.

    { Polygon display list }
    InputList: array 1..P of Polygon
    Ngon: array 1..P of Polygon
    - each polygon P_i contains an array 1..V_i of (x,y,z).

    { Y bucket. Yp gives next available position for scanline y }
    Y: matrix ymin..ymax 1..PV of SpanArea
    Yp: array ymin..ymax of 0..P

    { Active Span List. S reflects the number of spans. }
    ASL: array 1..PV of SpanArea
    S: 1..PV := 0

    { Some indices }
    PolyIn, PolyOut, CurrentSpan: Integer

    { Locks for synchronisation. Assume they are initialised to 0 }
    Lock1, Lock2: 0..P := 0

    SpanArea: type
        record of
            yt    { top y }
            dy    { height of span-area }
            xl    { current LHS }
            xr    { current RHS }
            xm    { multiplicity - see below; initially xm = xl - M }
            dxl   { Δx/Δy of LHS }
            dxr   { Δx/Δy of RHS }
            dyl   { Δy/Δx of LHS }
            dyr   { Δy/Δx of RHS }
            DepthInfo
            IntensityInfo
        end

3.2.2. Synchronisation, Initialisation, and Backfacing Polygon Removal

Since the code in this section is familiar, it is a good place to illustrate some principles of synchronisation and initialisation. Assume each PE has access to a unique identifier in the manifest constant PEid, which takes on a value between 1 and N. The following code initialises PolyIn and PolyOut, performs synchronisation, and removes backfacing polygons as in [NeSp79, Appendix III]. We assume the polygons in the input list have undergone perspective transformation. The reader may wish to verify that two locks are necessary to have fully reusable locks for synchronisation.

    i, j, p: integer
    InputList, Ngon, Lock1, Lock2, PolyIn, PolyOut: shared

    { The first PE in initialises PolyIn, PolyOut }
    if RepAdd(Lock1,1) = 1 then PolyIn := PolyOut := Lock2 := 0
    while Lock1 < N do {nothing}
    { The last PE out resets Lock1 for future use }
    if Lock2 = N-1 then Lock1 := 0
    RepAdd(Lock2,1)
    while Lock2 < N do {nothing}
    p := RepAdd(PolyOut,1)
    while p ≤ P do begin
        for polygon InputList[p], calculate c
            c := \sum_{i=1}^{V_p} (V[i].x - V[j].x)(V[i].y + V[j].y)
            where j = i+1 if i < V_p; otherwise j = 1
        if c < 0 then {the polygon faces us, add it to Ngon}
            Ngon[RepAdd(PolyIn,1)] := InputList[p]
        p := RepAdd(PolyOut,1)
    end

In the average case, each PE processes about P/N polygons. This algorithm assumes that N < P, since otherwise those PEs with PEid > P do no work. The amount of memory traffic this algorithm would cause is suboptimal, since polygon definitions are moved around, rather than their pointers.

3.2.3. Decomposition of Polygons into Span-areas

As presented in this paper, the scan conversion algorithm presumes the input polygon list has been decomposed into span-areas: trapezoidal or triangular regions. This idea is not new (see [Lee81, Wein81, WhWe81]). Unlike polygons, span-areas have a bounded, concise specification in terms of left and right edges. Thus span-areas are useful in scanline-oriented algorithms. However, desirable properties of trapezoids such as planarity are not necessarily preserved after geometric transformations. Consequently, the input polygon list is preprocessed for each frame. This additional computation can be circumvented if polygons are triangulated once and for all, since triangles remain (trivially) planar after geometric transformations (see [FuRa82, Whit81]). The scan conversion algorithm easily adapts to triangles, but since span-areas are so simple to work with, the algorithm is presented using span-areas. Both triangles and span-areas can lead to fragmentation of very small (pixel-sized) polygons, making anti-aliasing critical.

A maximum of V-1 span-areas are generated for a polygon of V vertices. An O(V log V) serial algorithm to decompose a simple polygon into span-areas was recently published [Lee81]. A straightforward, polygon-per-PE parallelisation of this algorithm would yield an O((P/N) V log V) average-case running time. As each span-area is generated, it is inserted into the Y-bucket corresponding to the largest y value of the span-area. This can be determined on-the-fly with no change in the order statistic.
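As a rough illustration of the backfacing-polygon removal of Section 3.2.2, the following Python sketch emulates RepAdd with a lock-protected counter and lets several threads, standing in for PEs, claim polygons from the input list and keep those whose sum c is negative. The names SharedInt, facing_sum and remove_backfacing are hypothetical, the lock is only a software stand-in for the combining network, the two-lock barrier is replaced by joining the threads, and the test c < 0 simply follows the listing as transcribed above. It is a sketch under those assumptions, not the authors' implementation.

```python
import threading


class SharedInt:
    """Shared integer cell. rep_add emulates RepAdd with a lock (an assumption;
    the ultracomputer combines requests in the network instead)."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def rep_add(self, e):
        # Indivisibly add e and return the sum, as RepAdd is described in Section 2.
        with self._lock:
            self.value += e
            return self.value


def facing_sum(vertices):
    """c = sum over i of (x_i - x_j)(y_i + y_j), with j the cyclic successor of i."""
    n = len(vertices)
    c = 0
    for i in range(n):
        xi, yi = vertices[i][0], vertices[i][1]
        xj, yj = vertices[(i + 1) % n][0], vertices[(i + 1) % n][1]
        c += (xi - xj) * (yi + yj)
    return c


def remove_backfacing(input_list, num_pes=4):
    """Return the polygons with c < 0; output order depends on thread scheduling."""
    total = len(input_list)
    ngon = [None] * total
    poly_in = SharedInt(0)
    poly_out = SharedInt(0)

    def pe():
        p = poly_out.rep_add(1)          # claim a unique 1-based polygon index
        while p <= total:
            polygon = input_list[p - 1]
            if facing_sum(polygon) < 0:  # sign convention assumed from the listing
                ngon[poly_in.rep_add(1) - 1] = polygon
            p = poly_out.rep_add(1)

    threads = [threading.Thread(target=pe) for _ in range(num_pes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ngon[:poly_in.value]
```

Because every call to rep_add returns a distinct value, no polygon is examined twice and no slot of ngon is written twice, which is the parallel-queue property the text attributes to RepAdd.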

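The parallel z-buffer built on RepMin, described in Section 2, can be emulated in the same hedged way. In the sketch below each pixel cell holds an (intensity, depth) pair and rep_min replaces it only when the new depth is smaller; threads stand in for PEs and a lock stands in for the comparator in each switch. The names Cell, Counter and parallel_zbuffer are hypothetical, polygons are given directly as lists of (x, y, intensity, depth) samples in place of PolygonIntensity and PolygonDepth, and the value that RepMin returns to the issuing PE is not modelled.

```python
import threading


class Cell:
    """One shared-memory cell holding an (intensity, depth) pair."""

    def __init__(self, intensity, depth):
        self.value = (intensity, depth)
        self._lock = threading.Lock()

    def rep_min(self, e):
        # Replace the whole cell by e iff e.depth < v.depth.
        # The "losing" value returned by the real RepMin is omitted here.
        with self._lock:
            if e[1] < self.value[1]:
                self.value = e


class Counter:
    """Lock-based stand-in for RepAdd on a shared index (an assumption)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def rep_add(self, e):
        with self._lock:
            self.value += e
            return self.value


def parallel_zbuffer(polygons, width, height, num_pes=4):
    """polygons: list of lists of (x, y, intensity, depth) pixel samples.
    Returns the intensity of the closest sample at every pixel."""
    buf = [[Cell(0.0, float("inf")) for _ in range(width)] for _ in range(height)]
    next_polygon = Counter()

    def pe():
        p = next_polygon.rep_add(1)      # get P from the polygon list
        while p <= len(polygons):
            for x, y, i, z in polygons[p - 1]:
                buf[y][x].rep_min((i, z))
            p = next_polygon.rep_add(1)

    threads = [threading.Thread(target=pe) for _ in range(num_pes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [[cell.value[0] for cell in row] for row in buf]
```

Since the minimum over depths does not depend on the order in which the updates arrive, the resulting image is the same for any interleaving of the PEs, which is why no X sort is needed.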
3.2.4. S c a n Conversion
Each PE performs the following s c a n conversion p r o c e d u r e G e t S p a n ( v a r span: SpanArea;
loop. var spansLeft: Boolean)
gotSpan: B o o l e a n
for y:=ymin to ymax do begin S, ASL, CurrentSpan: shared
UpdateASL(y) M: Constant
InitialiseScanLine newLHS: Integer
ScanConvert(y)
<synchronise> spansLeft := CurrentSpan .c S
end for g o t S p a n := f a l s e
w h i l e ~ g o t S p a n a n d spansLeft do begin
U p d a t e A S L p l a c e s t h e c o n t e n t s of b u c k e t Y[y] i n t o span := ASL[CurrentSpan]
t h e a c t i v e s p a n l i s t . All P E s s y n c h r o n i s e a t t h e c o m - with ASL[CurrentSpan] do b e ~
p l e t i o n of s c a n c o n v e r s i o n f o r e a c h s c a n l i n e . This is } calculate new LHS of span, and see if LHS>RHS
newLHS := RepAdd(xm,M)
n o t n e c e s s a r y . If s u f f i c i e n t m e m o r y is a v a i l a b l e ,
gotSpan := newLHS ~ xr
the algorithm easily generalises to k-scanlines, k if ~gotSpan then
1. We n o w c o n s i d e r t h e s c a n c o n v e r s i o n p r o c e s s i n if span is exhausted, the firstPE advances CurrentSpan
more detail. and processes the remaining span segment
if newLHS-xr < M the,,
p r o c e d u r e InitialiseScanLine RepAdd(CurrentSpan, 1)
InitialiseXBucket spansLeft := CurrentSpan ~ S
CurrentSpan := 1 end with/while
<synchronise> if gotSpan t h e n span.xm := newLHS
end I n i t i a l i s e S c a n L i n e spansLeft := gotSpan
e n d GetSpan
p r o c e d u r e SeanConvert(y: ymin..ymax)
span: SpanArea
spansLeft: B o o l e a n
X: s h a r e d 3.2.5. A n t i - a l i a s i r ~
Get Span(span,spansLeft) The a l i a s i n g p r o b l e m is i m m e d i a t e l y a p p a r e n t t o
w h i l e spansLeft do begin anyone who has seen synthesised raster images.
Vx E span calculate pixeILnfe: Various aliasing artifacts are possible in both still
intensity, depth, and coverage mask and moving images. An a b u n d a n t literature
UpdatePixel(x, pixelInfe)
Get Span(span,spansLeft) d e s c r i b e s t h e p r o b l e m a n d s o m e of i t s s o l u t i o n s .
end while S e e [Crow77, Crow81] f o r a s t a r t . It is t h u s of p r i m e
AntiAliasScanline(y) importance to examine whether anti-aliasing can be
end ScanConvert incorporated into our algorithm. Since we
currently compute the picture scanline by scanline
The X b u c k e t contains all required scanline informa-
w i t h o u t b a c k t r a c k i n g o v e r s c a n l i n e s , we c a n n o t u s e
tion. It will be discussed shortly, as will the rou-
any scheme where the value at one pixel depends on
tines UpdatePixel a n d AntiAliasScanline.
GetSpan does the obvious: it returns an unprocessed span to the scan converter. However, the routine is complicated by the fact that we wish to get a subserial worst case behaviour. In particular, large spans should receive parallel treatment, for otherwise all PEs could wait for one PE to complete a long span. One approach is for PEs to recursively subdivide large spans so that each PE processes a smaller portion of the span. However elegant this solution appears, it is likely to increase memory traffic substantially. The approach we have taken avoids this problem. Assume there is a constant M which denotes the maximum number of pixels in a span that a PE is allowed to process at a time. This value may be empirically or theoretically determined, and represents a good balance between the overhead in GetSpan and the increased efficiency in parallel processing of large spans. Multiple copies of a span may be returned; the index xm is used to indicate the leftmost point of the unprocessed portion of the spans. The following is one possible implementation of GetSpan. It is somewhat tricky since it must cope with the unlikely event that two PEs simultaneously try to get an exhausted span.

the value of some of its neighbours, unless we arbitrarily privilege the x direction (footnote 6).

The best solution under the circumstances is what we can call the Exact Area Sampling (EAS) solution, where the intensity for the pixel is $I = \sum_i A_i I_i / A$. $A_i$ and $I_i$ are the areas and intensities of the visible surfaces within the pixel, and $A$ is the total area of the pixel. If colour is used, this formula is used for the three primaries. As pointed out in [Catm78], and implemented there and in [FuBar79], this requires a hidden surface algorithm at the pixel level.

We can establish a more formal lower bound, by showing that any algorithm that computes the EAS can be used to determine the order of a list of $n$ non-negative integers. The reduction is as follows. Given a list $N_1, N_2, \ldots, N_n$ of numbers, construct a scene with $n$ rectangles of depth $N_i$, with the left, top and bottom edges coincident with the pixel left, top and bottom, and the right edge of rectangle $i$ at $N_i$. Without loss of generality, assume that the pixel right edge is at $\max(N_i)$. Let the intensity $I_i$ of each rectangle be $D^{i-1}$, where $D$ is a constant greater than $\max(N_i) - \min(N_i)$. The answer to the EAS problem is then:

5. See the definition of the SpanArea data type above.

6. The idea is not totally without merit, since, as seen on broadcast television, it produces decent images. Note, moreover, that a k-scanline version (k > 1) of the algorithm would permit a multiple-pixel anti-aliasing scheme.

$H \times \sum_i D^{i-1} (N_i - N_p)$

where $H$ is the height of the pixel, and $N_p$ is the predecessor of $N_i$ in the sorted order. The predecessor of $\min(N_i)$ is 0. This transformation can be done in O(n) time. It is clear that the answer, when expressed as a base $D$ number, contains $N_i - N_p$ in the $i$th digit (from the least significant), and that therefore in O(n) time one can find, for every number, its predecessor in the sorted order. Computing the answer to the EAS problem allows sorting with an O(n) time transformation, and therefore takes at least O(n log n) time. While this does not prove that it is necessary to solve the hidden surface problem to solve the EAS problem, this shows that nothing easier than sorting will do it. For other results about the EAS, see [FoFu83].

In view of this result, we will aim for an approximate solution. Our approach will be to limit the amount of computation and to utilise parallelism as much as possible.

We subdivide the pixel into $n \times n$ subpixels. It is convenient to have $n$ a power of 2, for example $n = 2^3 = 8$. For each line which intersects a pixel, the two intersection points along the boundaries of the pixel are used as an index into a lookup table, whose entries give the subpixels covered by the halfplane defined by this line. We will call this entry the mask for this halfplane. In our example, the mask would be a 64 bit number. Each intersection with the boundaries of the pixel is computed with $k$ bits of fraction (where there are $2^k$ intervals, since the fraction $n/n = 1$ in the current pixel is 0 on the next pixel). In our example, then, $k = 3$. Thus each intersection can be fully described as a $k+2$ bit number, 2 bits to identify the boundary crossed, and $k$ bits to give the crossing position along the boundary (see Figure 2). The total entry for a line is then a $2(k+2)$ bit number, which in our example is a 10 bit number. This gives a 1K × 64 bit table, which is small enough to allow a copy for each PE. Alternatively, several PEs could directly share such a read-only table.

Figure 2. Pixel-line intersection encoding.

The order of the intersections is relevant, since the line should be oriented. We adopt a convention that the inside is to the right when going from the first intersection to the second. The size of the table can be reduced by making it into a triangular array, and using an extra bit to indicate the direction, which will tell whether to complement the mask or not. The table is of course precomputed, and each bit is on if the subpixel corresponding to it is more than half-covered by the halfplane described by the index.

Of the four edges of a normal span-area, two are horizontal, and are relevant only at the start and at the end of its scanning. For these, a small special lookup table can be used, with the y fraction used as the index. For the other two edges, updating the intersection information from pixel to pixel is fairly simple, and requires only additions and subtractions.

The coverage mask has interesting boolean properties. Indeed, the mask for the intersection of a span-area with a pixel is the and of the masks of the span-area's edges which cross the pixel. Thus we get an accurate representation of the subpixels covered by a given span-area. It is also easily seen that the mask for the background (indicating the subpixels where the background is seen) is the complement of the or of all the span-area masks for this pixel. It is unfortunately impossible to go much farther without making some approximations. The problem is that we do not want to compute the Z values at the subpixel resolution, since it would be tantamount to going to a higher resolution. Each span-area at a given pixel is then associated with only one Z value, namely its Z at the centre of the pixel. Given that, we cannot guarantee that the depth comparison allows the visible areas to be exactly determined, unless the planes of support of the span-areas do not intersect within the pixel (see Figure 3). We will give two approximation algorithms, and discuss where they succeed, and where they fail. Let weight(mask) be the fraction of the pixel covered by a mask (this can be easily computed by counting the number of one bits in the mask). The span-area with the smallest Z value is called the winner; the others are called losers.

There are two ways to compute the final pixel intensity. One way necessitates the use of an X-bucket to hold pixel information for each span-area intersecting with the current scanline; a pass over the content of this bucket would be performed at the end of the scanline, since the final intensity cannot be computed until the winner is known. The other approximation can be computed on-the-fly, and is almost as accurate as the first. The two methods calculate intensities $I_1$ and $I_2$, respectively, as follows.

$I_1 = \mathit{WinnerComp} + \mathit{LoserComp}_1 + \mathit{BackgroundComp}$
$I_2 = \mathit{WinnerComp} + \mathit{LoserComp}_2 + \mathit{BackgroundComp}$

$\mathit{WinnerComp} = I_w \times \mathit{weight}(\mathit{mask}_w)$
$\mathit{BackgroundComp} = I_b \times \mathit{weight}(\mathit{mask}_b)$
$\mathit{LoserComp}_1 = \mathit{Cor}_1 \times \sum_l I_l \times \mathit{weight}(\mathit{mask}_l \wedge \overline{\mathit{mask}_w})$
$\mathit{LoserComp}_2 = \mathit{Cor}_2 \times \mathit{weight}(\overline{\mathit{mask}_w}) \times \sum_l I_l \times \mathit{weight}(\mathit{mask}_l)$
$\mathit{Cor}_1 = \mathit{weight}(\overline{\mathit{mask}_b} \wedge \overline{\mathit{mask}_w}) \,/\, \sum_l \mathit{weight}(\mathit{mask}_l \wedge \overline{\mathit{mask}_w})$
$\mathit{Cor}_2 = \mathit{weight}(\overline{\mathit{mask}_b} \wedge \overline{\mathit{mask}_w}) \,/\, \bigl(\mathit{weight}(\overline{\mathit{mask}_w}) \times \sum_l \mathit{weight}(\mathit{mask}_l)\bigr)$

The subscripts w, l, and b stand for "winner", "loser", and "background", respectively. The correction factors are ratios of the actual coverage by the losers compared to the sum of their individual coverage as computed by each algorithm. Therefore the correction factors give a measure of the amount of overlap of the losers, hence of the possible error.

3.2.5.1. First approximate anti-aliasing algorithm

This solution requires an X bucket. For each pixel, several additional pieces of information are kept: the current winner, background data, the losers' intensity, and their sum of coverage-mask weights. The following data structures are used.

    { X bucket. Xp holds the number of span-areas per pixel }
    X: matrix xmin..xmax, 1..PV of PixelInfo
    Xp: array xmin..xmax of 0..PV := 0

    { Additional pixel information }
    Pixels: array xmin..xmax of
        Winner, Back: PixelInfo
        LoserInt, SumOfWeights: Integer

    PixelInfo: type record of
        Depth
        Int     { intensity }
        Mask    { coverage mask }
    end PixelInfo

The ScanConvert routine above executes the following version of UpdatePixel and AntiAliasScanline. Recall that each PE executes ScanConvert.

    procedure UpdatePixel(x: xmin..xmax, pix: PixelInfo)
        { Add pixel from this span into bucket }
        X[x, RepAdd(Xp[x], 1)] := pix
        { pix may be a "winner" }
        RepMin(Pixels[x].Winner, pix)
        { Determine how much background is covered by pix }
        RepAnd(Pixels[x].Back.Mask, ¬pix.Mask)
    end UpdatePixel

    procedure AntiAliasScanline(y: ymin..ymax)
        x: Integer
        winner, pix: PixelInfo

        Initialise Cx to xmin
        while Cx ≤ xmax do begin
            { Many PEs work on each pixel (i.e. X bucket) }
            x := RepAdd(Xp[Cx], -1) + 1     { get pixel info for span }
            winner := Pixels[Cx].Winner
            while x > 0 do begin
                pix := X[Cx, x]
                if pix ≠ winner then begin
                    { pix is a loser, calculate its contribution }
                    newMask := pix.Mask ∧ ¬winner.Mask
                    newInt := Weight(newMask) × pix.Int
                    RepAdd(Pixels[Cx].LoserInt, newInt)
                    RepAdd(Pixels[Cx].SumOfWeights, Weight(newMask))
                end if
                x := RepAdd(Xp[Cx], -1) + 1
            end while
            if x = 0 then begin
                { PE with x = 0 adds background and losers' contrib }
                { for Pixels[Cx], compute: }
                c := Weight(¬Back.Mask ∧ ¬Winner.Mask) / SumOfWeights
                RepAdd(Winner.Int, c × LoserInt + Back.Int)
                RepAdd(Cx, 1)
            end then
            else <synchronise>      { all other PEs wait }
        end while
    end AntiAliasScanline

3.2.5.2. Second approximate anti-aliasing algorithm

No X bucket is required in this solution. We only keep four pieces of information for each pixel: Winner, Back, SumOfWeights, and Losers. Winner, Back, and SumOfWeights are as in the first solution; Losers is used to keep track of the losers' coverage and intensity contributions on-the-fly.

    procedure UpdatePixel(x: xmin..xmax, pix: PixelInfo)
        loser: PixelInfo

        loser := RepMin(Pixels[x].Winner, pix)
        intContrib := loser.Int × Weight(loser.Mask)
        RepAdd(Pixels[x].Losers.Int, intContrib)
        RepAnd(Pixels[x].Back.Mask, ¬loser.Mask)
        RepAdd(Pixels[x].SumOfWeights, Weight(pix.Mask))
    end UpdatePixel

    procedure AntiAliasScanline(y: ymin..ymax)
        { each PE handles a pixel, so if N > X, some PEs are idle }
        x := PEid + xmin - 1
        while x ≤ xmax do begin
            pix := Pixels[x]
            { compute background and losers' intensity contrib }
            backInt := pix.Back.Int × Weight(pix.Back.Mask)
            c := Weight(¬pix.Back.Mask ∧ ¬pix.Winner.Mask) / pix.SumOfWeights
            loserInt := pix.Losers.Int × Weight(¬pix.Winner.Mask) × c
            RepAdd(Pixels[x].Winner.Int, backInt + loserInt)
            x := x + N
        end while
    end AntiAliasScanline
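To make the two intensity formulas concrete, the following is a minimal serial sketch in Python, not the parallel implementation described in the paper. It assumes 8 × 8 subpixels stored in a 64-bit integer, and it assumes that each correction factor is the actual loser coverage outside the winner divided by the summed individual coverage that the corresponding algorithm computes. The names `weight` and `approximate_intensities` are hypothetical.

```python
FULL = (1 << 64) - 1          # all 64 subpixels of an 8 x 8 pixel


def weight(mask):
    """Fraction of the pixel covered by a mask (count of one bits / 64)."""
    return bin(mask & FULL).count("1") / 64


def approximate_intensities(winner, losers, background_intensity):
    """Return (I1, I2) for one pixel.

    winner is an (intensity, mask) pair; losers is a list of such pairs.
    Assumption: Cor = actual loser coverage / summed individual coverage.
    """
    i_w, m_w = winner
    not_w = ~m_w & FULL

    covered = m_w
    for _, m_l in losers:
        covered |= m_l
    m_b = ~covered & FULL                 # background: complement of the OR

    winner_comp = i_w * weight(m_w)
    background_comp = background_intensity * weight(m_b)
    actual = weight(covered & not_w)      # losers' true coverage outside winner

    # First algorithm: each loser is clipped against the winner's mask.
    sum1 = sum(weight(m_l & not_w) for _, m_l in losers)
    cor1 = actual / sum1 if sum1 else 0.0
    loser1 = cor1 * sum(i_l * weight(m_l & not_w) for i_l, m_l in losers)

    # Second algorithm: only the losers' weights are known, not their overlap
    # with the winner, so each is scaled by the uncovered fraction.
    sum2 = weight(not_w) * sum(weight(m_l) for _, m_l in losers)
    cor2 = actual / sum2 if sum2 else 0.0
    loser2 = cor2 * weight(not_w) * sum(i_l * weight(m_l) for i_l, m_l in losers)

    return (winner_comp + loser1 + background_comp,
            winner_comp + loser2 + background_comp)


# The configuration of Figure 4: the winner covers the left half of the pixel
# and hides loser 3; loser 2 covers the right half.
LEFT_HALF = 0x0F0F0F0F0F0F0F0F
RIGHT_HALF = 0xF0F0F0F0F0F0F0F0
i1, i2 = approximate_intensities((8.0, LEFT_HALF),
                                 [(4.0, RIGHT_HALF), (2.0, LEFT_HALF)],
                                 background_intensity=0.0)
```

With intensities 8, 4 and 2 for spans 1, 2 and 3, the first approximation gives the exact value (8 + 4)/2, while the second differs from it by (4 - 2)/4, in line with the error stated for Figure 4.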

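The coverage-mask lookup table used by both algorithms can also be sketched in Python. This is an illustrative construction under stated assumptions, not the authors' table: the boundary numbering (`BOTTOM`, `RIGHT`, `TOP`, `LEFT`) and all function names are hypothetical, and the "more than half-covered" rule is evaluated by testing the subpixel centre, which is equivalent for square subpixels because any line through the centre splits the square into equal areas.

```python
K = 3                        # bits of fraction per boundary crossing
N = 1 << K                   # subpixels per pixel side (8)
FULL = (1 << (N * N)) - 1    # 64-bit mask with every subpixel on

# Hypothetical boundary numbering; the paper's Figure 2 fixes its own.
BOTTOM, RIGHT, TOP, LEFT = 0, 1, 2, 3


def encode_crossing(boundary, pos):
    """Pack one crossing into k+2 bits: 2 boundary bits, then k position bits."""
    assert 0 <= boundary < 4 and 0 <= pos < N
    return (boundary << K) | pos


def table_index(first, second):
    """Concatenate two crossing codes into the 2(k+2)-bit table index."""
    return (first << (K + 2)) | second


def crossing_point(code):
    """Map a crossing code to a point on the boundary of the unit pixel."""
    boundary, t = code >> K, (code & (N - 1)) / N
    return {BOTTOM: (t, 0.0), RIGHT: (1.0, t),
            TOP: (t, 1.0), LEFT: (0.0, t)}[boundary]


def halfplane_mask(first, second):
    """Mask of subpixels to the right of the line first -> second."""
    (x1, y1), (x2, y2) = crossing_point(first), crossing_point(second)
    mask = 0
    for row in range(N):
        for col in range(N):
            cx, cy = (col + 0.5) / N, (row + 0.5) / N
            # A negative cross product puts the centre on the right-hand side.
            if (x2 - x1) * (cy - y1) - (y2 - y1) * (cx - x1) < 0:
                mask |= 1 << (row * N + col)
    return mask


def build_table():
    """Precompute all 2^(2(k+2)) entries: 1K masks of 64 bits when k = 3."""
    low = (1 << (K + 2)) - 1
    return [halfplane_mask(i >> (K + 2), i & low)
            for i in range(1 << (2 * (K + 2)))]


table = build_table()
up = table_index(encode_crossing(BOTTOM, 4), encode_crossing(TOP, 4))
down = table_index(encode_crossing(TOP, 4), encode_crossing(BOTTOM, 4))
```

Reversing the direction of a line complements its mask (here `table[down]` is the complement of `table[up]`), which is the property that lets the table be folded into a triangular array with one direction bit.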
3.2.5.3. Analysis of the approximations

These approximations, and indeed all approximations of this kind, should be characterised in three ways: when they are right (here right is to be understood as exact within the subpixel resolution), when they are wrong and how wrong they can be, and when they are consistently wrong. The last is important, since aliasing is particularly noticeable

Computer Graphics Volume 17, Number 3 July 1983

in motion, by crawling, scintillation and other annoying artifacts. If an algorithm computes a wrong shade, but is consistent as the polygons move, then these artifacts will be avoided.

Both solutions will be right when there is only one span-area within the pixel, whether it covers the whole pixel or not. As long as a span-area covers at least one subpixel (1/64 of a pixel in our example), it will contribute to the total intensity of the pixel. Both solutions are also right when none of the span-areas overlap. This is especially important, since we might have cut a polygon into numerous small span-areas. Fortunately we will not have to pay a heavy price in aliasing problems. In fact, the problems, if any, will be at the silhouette edges of the objects, and not against the background, but against each other. The first solution has the additional advantage of being right when the winner overlaps the losers, but the losers do not overlap each other. The second algorithm will be right in case of overlap by the winner if the loser coverage ratio is sensibly the same under the winner as in the rest of the pixel.

Figures 3 and 4 give examples of wrong cases, and the errors made by each algorithm. Figure 3 shows the worst case for both algorithms, where the amount of overlap of the losers and the area they cover is maximal. Figure 4 shows a case where the first algorithm is right and the second is wrong.

Figure 3. The worst case for both algorithms. Span 1, the winner, covers a sliver of the pixel. Losing span 2 obscures another loser, 3.

$Z_1 < Z_2 < Z_3$
Correct Answer $= I_2$
Computed Answer $= \frac{I_2 + I_3}{2}$
Error $= \frac{I_2 - I_3}{2}$

Figure 4. A bad case for algorithm 2 only. Winning span 1, covering half the pixel, obscures losing span 3. Losing span 2 covers half the pixel as well.

$Z_1 < Z_2 < Z_3$
Correct Answer $= \frac{I_1 + I_2}{2}$
Algorithm 1 $= \frac{I_1 + I_2}{2}$
Algorithm 2 $= \frac{I_1 + \frac{I_2 + I_3}{2}}{2}$
Error 2 $= \frac{I_2 - I_3}{4}$

A gross estimate of the extent of the errors for $10^3$ polygons, covering an average of $10^3$ pixels each, and with $10^2$ boundary pixels each, on a screen with $10^6$ pixels, shows that less than 5% of the pixels would have an error, and that for these the average error would be less than 10% of the shade of the pixel.

As the polygons move with respect to each other, we avoid the numerous problems of point sampling. Since the wrong cases are computed from averages, the errors made will not exhibit large discontinuities, but will be consistent from frame to frame. In the example of Figure 3, as polygon 3 moves out of the pixel, its contribution to the pixel intensity will go smoothly from $I_3/2$ (which is wrong), to 0 (which is right).

3.2.6. Discussion

An implementation of this algorithm has been made, demonstrating that the approach works, and illustrating a realisation in a pseudo-concurrent language. The implementation is written in Concurrent Euclid, a language developed at the University of Toronto which supports processes and monitors. Concurrent operations such as RepAdd are simulated using monitors. The only relevant aspect in which our implementation differs from one on an ultracomputer is speed. The lack of "true" concurrency, and the O(N) performance of concurrent operations (compared to O(log N) on an ultracomputer), make our implementation somewhat slower than would be expected on an ultracomputer.

The algorithm above has several appealing properties. It is independent of N, the number of PEs in the ultracomputer. Indeed, the running time of the algorithm is inversely proportional to N, up to a lower bound constant when $N \to \infty$. A good serial algorithm is obtained when N = 1. We emphasise the fact that the anti-aliasing techniques presented here easily transfer to serial environments, as illustrated here. Another property of the algorithm is that although it scan converts polygons, the general approach adapts to other scene representations (e.g. scanline methods for parametric surfaces as in [BCLW80]).

Several improvements could be made to the algorithm. An issue deserving of attention is space complexity and memory traffic. By using dynamically allocated shared memory and pointers, the amount of storage required would be drastically reduced; memory traffic would decrease, since pointers would travel through shared memory. However, indirect shared memory references require two passes through the connection network. A solution is to make greater use of the cache memory local to each PE. A copy of the static pointers may be placed in the local memory for each PE, thus saving the O(log N) connection network cycle time.

4. Other Ultracomputer Applications and Future Research

As the plethora of published parallel algorithms shows [Schw80], the ultracomputer is truly a powerful, general-purpose tool. Fast parallel algorithms exist for matrix multiplication, sorting, linear programming, fluid dynamics, etc. We hope to have demonstrated that the ultracomputer has great

potential in the computer graphics field. Other applications would also significantly benefit from ultracomputer implementation. For instance, a parallel queue could be exploited to parallelise ray-tracing algorithms [Whit80]. Since the processing of one ray (or alternatively, one pixel) is an independent task, we believe significant speed-up in ray-tracing can be achieved on an ultracomputer. Similarly, we believe many problems in image processing, signal processing, and artificial intelligence are likely to benefit.

5. Acknowledgements

We wish to thank John Amanatides and Peter Schoeler for their suggestions, which have improved the clarity of this paper. The first two authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada.

References

BCLW80  Blinn, J.F., L.C. Carpenter, J.M. Lane, and T. Whitted, "Scan line methods for displaying parametrically defined surfaces", Comm. ACM 23, 1 (Jan. 1980), 23-34.

Catm78  Catmull, E., "A Hidden-Surface Algorithm with Anti-Aliasing", Computer Graphics (ACM) 12, 3 (Aug. 1978), 6-11.

Clar82  Clark, J.H., "The geometry engine: a VLSI geometry system for graphics", Computer Graphics (ACM) 16, 3 (July 1982), 127-134.

Crow77  Crow, F.C., "The Aliasing Problem in Computer-Generated Shaded Images", Comm. ACM 20, 11 (Nov. 1977), 799-805.

Crow81  Crow, F.C., "A Comparison of Antialiasing Techniques", IEEE Computer Graphics and Applications 1, 1 (Jan. 1981), 40-49.

FPPB82  Fuchs, H., J. Poulton, A. Paeth, and A. Bell, "Developing PIXEL-PLANES, a smart memory-based raster graphics system", 1982 Conference on Advanced Research in VLSI, MIT, January 1982, 137-146.

FoFu83  Fournier, A. and D. Fussell, "On the Power of the Frame Buffer", unpublished manuscript, 1983.

FuBar79  Fuchs, H. and J. Barros, "Efficient Generation of Smooth Line Drawings on Video Displays", Computer Graphics 13, 2 (Aug. 1979), 260-269.

FuPo81  Fuchs, H., and J. Poulton, "PIXEL-PLANES: a VLSI-oriented design for 3-D raster graphics", CMCCS Conference Proceedings (June 1981), 343-348.

FuRa82  Fussell, D., and B.D. Rathi, "A VLSI-oriented architecture for real-time raster display of shaded polygons", Graphics Interface '82, May 1982, 373-380.

Fuch77  Fuchs, H., "Distributing a visible surface algorithm over multiple processors", Proceedings of ACM 1977, Seattle (Oct. 1977), 449-481.

GGKM83  Gottlieb, A., R. Grishman, C.P. Kruskal, K.P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer: designing an MIMD shared memory parallel computer", IEEE Transactions on Computers C-32, 2 (Feb. 1983), 175-189.

GoLR83  Gottlieb, A., B.D. Lubachevsky, and L. Rudolph, "Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors", Transactions on Programming Languages and Systems (ACM) 5, 2 (Apr. 1983), 164-189.

GuSS81  Gupta, S., R.F. Sproull, and I.E. Sutherland, "A VLSI architecture for updating raster scan displays", Computer Graphics (ACM) 15, 3 (Aug. 1981), 71-78.

Lawr75  Lawrie, D.H., "Access and alignment of data in an array processor", IEEE Transactions on Computers C-24, 12 (Dec. 1975), 1145-1155.

Lee81  Lee, D.T., "Shading of regions on vector display devices", Computer Graphics (ACM) 15, 3 (Aug. 1981), 37-44.

NeSp79  Newman, W.M., and R.F. Sproull, Principles of Interactive Computer Graphics, Second Edition, McGraw-Hill, New York, 1979.

Park80  Parke, F.I., "Simulation and expected performance of multiple processor z-buffer systems", Computer Graphics (ACM) 14, 3 (July 1980), 48-56.

Schw80  Schwartz, J.T., "Ultracomputers", Transactions on Programming Languages and Systems (ACM) 2, 4 (Oct. 1980), 484-522.

SuSS74  Sutherland, I.E., R.F. Sproull, and R.A. Schumacker, "A characterization of ten hidden-surface algorithms", Computing Surveys (ACM) 6, 1 (March 1974), 1-55.

Wein81  Weinberg, R., "Parallel processing image synthesis and anti-aliasing", Computer Graphics (ACM) 15, 3 (Aug. 1981), 53-62.

WhWe81  Whitted, T., and D.M. Weimer, "A software test-bed for the development of 3-D raster graphics systems", Computer Graphics (ACM) 15, 3 (Aug. 1981), 271-277.

Whel82  Whelan, D.S., "A rectangular area filling display system architecture", Computer Graphics (ACM) 16, 3 (July 1982), 147-153.

Whit80  Whitted, T., "An improved illumination model for shaded display", Comm. ACM 23, 6 (June 1980), 343-349.

Whit81  Whitted, T., "Hardware enhanced 3-D raster display systems", CMCCS Conference Proceedings (June 1981), 349-356.


Figure 5. Aliased cube. Enlarged view of 256 × 256 resolution.

Figure 6. Anti-aliased cube.

Figure 7. Aliased Italian tablecloth.

Figure 8. Anti-aliased Italian tablecloth.

Figure 9. Close-up of some coverage masks. Each yellow area indicates the subpixel coverage of an oriented line.
