
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 1, NO. 1, MARCH 1991 147


An Entropy Coding System for Digital HDTV
Applications
Shaw-Min Lei, Member, IEEE, and Ming-Ting Sun, Senior Member, IEEE
Abstract: Run-length coding (RLC) and variable-length coding (VLC)
are widely used techniques for lossless data compression. A high-speed
entropy coding system using these two techniques is considered for
digital high definition television (HDTV) applications. Traditionally,
VLC decoding is implemented through a tree-searching algorithm as the
input bits are received serially. For HDTV applications, it is very
difficult to implement a real-time VLC decoder of this kind due to the
very high data rate required. In this paper, we introduce a parallel
structured VLC decoder which decodes each codeword in one clock
cycle regardless of its length. The required clock rate of the decoder is
thus lower and parallel processing architectures become easy to adopt in
the entropy coding system. The parallel entropy coder and decoder will
be implemented in two experimental prototype chips which are designed
to encode and decode more than 52 million samples/s. Some related
system issues, such as the synchronization of variable-length codewords
and error concealment, are also discussed in this paper.
I. INTRODUCTION
RUN-LENGTH coding (RLC) [1] and variable-length coding
(VLC) [2] are two widely adopted techniques for lossless
data compression. They have become part of the international
digital facsimile coding standard [3], and will be included in the
low-bit rate video coding standard [4] as well. In image or video
coding applications, these two statistical coding techniques are
often used in conjunction with various lossy coding techniques,
such as DCT, subband, or DPCM [5]-[7], to reduce the data
rate further without adding any degradation to the data. The
RLC represents consecutive zeros¹ by their run lengths, thus
reducing the number of samples. The VLC assigns shorter
codewords to more frequent source symbols, and vice versa, so
that the average bit rate is reduced. Usually, the output distribution
of a lossy coder is very uneven for different quantization
levels and there are many zeros clustering together. Thus, RLC
and VLC can compress the data effectively.
In the digital advanced television (ATV) project of Bellcore,
these two techniques are used along with subband/DPCM source
coding algorithms [6], [7] as data compression algorithms for
the high definition television on broadband integrated services
digital network (HDTV-on-BISDN) experimental research prototype
system [25]. Although there are other lossless data compression
techniques, the combination of RLC and VLC was
chosen because it offers very good compression efficiency,
feasible hardware implementation, and reasonable means of
recovering from errors. The overall compression performance
was reported in [6] and [7]. A discussion of the overall proposed
Manuscript received July 3, 1990; revised October 20, 1990. This paper
was presented in part at the IEEE International Symposium on Circuits and
Systems, New Orleans, LA, May 1990 and the 3rd International Workshop
on HDTV, Italy, August 1989.
The authors are with Bellcore, 331 Newman Springs Road, Red Bank, NJ
07701.
IEEE Log Number 901881.
¹ Although RLC on the other symbols is also possible, only zero run-length
coding is really effective in our applications.
HDTV system can be found in [7] and [25]. In this paper, we
focus on the entropy coding part of the whole system.
Some of the important parameters of our prototype system are
as follows. The sampling rate of the HDTV system is about 52
MHz². The sampling ratio between the luminance (Y) and chrominance
(U and V) components is 4 : 2 : 2. Thus, the total sample
rate is about 104 MHz. The video data will be compressed to
roughly 130 Mbps, and then carried in SONET STS-3c [8] at a
gross rate of 155.52 Mbps. The data rates involved in this
HDTV application are much higher than those in the digital
facsimile or low-bit rate video applications. The design of a
real-time implementation of an entropy coding (RLC/VLC)
system with such a high throughput is an important and challenging
problem.
Other issues also need to be considered in the design of the
high-throughput entropy coding system. Since there is no explicit
word-boundary in the variable-length coded data stream, a
transmission error will cause the succeeding codewords to be
decoded erroneously. Means must be provided to recover from
such error propagation and minimize its effect on the quality of
the reconstructed picture.
The organization of this paper is as follows. A parallel
architecture for the entropy coding system that can achieve high
throughput with lower speed requirement is discussed in Section
II. The implementation of a parallel entropy coder and decoder
which can achieve the required throughput for HDTV applications
is described in Section III. The codeword synchronization
and error concealment are discussed in Section IV. Finally, a
summary is given in the last section.
II. THE SYSTEM ARCHITECTURE
In this section, we will discuss a parallel architecture for the
entropy coding system which achieves high throughput with
lower speed requirement. First, we will discuss the structures
for the VLC coder and decoder that are suitable for parallel
signal processing. Second, the overall entropy coding system is
described.
A. The Structures of the VLC Coder and Decoder:
Parallel Versus Serial
VLC encoding is a table look-up operation corresponding to a
mapping between source symbols and variable-length codewords.
The concatenation of these variable-length codewords is
usually done by shifting out each codeword bit serially. On the
other side, the VLC decoding is usually carried out by tracing
along a coding tree at the input serial bit rate until a leaf (i.e.,
decoded data) is reached [9]. The operations of these encoding
and decoding methods are bit-serial. For the real-time HDTV
system, the lowest bit-serial rate is the coded compressed data
rate which is about 130 Mbps. Such high clock rate processing is
not easy to achieve with today's low-cost CMOS VLSI technology.
Can parallel processing reduce this rate? Unfortunately, the
partition of the source data into multiple paths cannot guarantee
that the coded data rate is equally distributed for each path.
Thus, the possible peak rate for each path may still be close to
130 Mbps. The other possibility is to operate these serial coder
and decoder at the rate derived from the source sample rate such
that parallel processing can effectively reduce the clock rate. If
the maximal code length of VLC is 16, the maximum bit-serial
rate is 104M × 16 = 1664 Mbps. In order to reduce the clock
rate to lower than 100 MHz, which is more comfortable for
today's CMOS VLSI technology, more than sixteen paths are
needed. This implies too many hardware duplications.
² Although some of the existing HDTV systems have sampling rates higher
than 70 MHz, 51.84 MHz (1/3 of SONET STS-3c rate) has been chosen for
this experimental prototype HDTV system. As the VLSI technology advances,
higher sampling rates will be achieved automatically with the same
architecture.
1051-8215/91/0300-0147$01.00 © 1991 IEEE
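The rate arithmetic in the preceding paragraph can be checked with a few lines of Python. This is only a sketch using the figures quoted in the text; the function names are hypothetical and introduced here for illustration.

```python
# Figures quoted in the text; the helper names are illustrative assumptions.
TOTAL_SAMPLE_RATE_MHZ = 104   # luminance plus chrominance samples
MAX_CODE_LENGTH_BITS = 16     # longest VLC codeword
TARGET_CLOCK_MHZ = 100        # clock rate considered comfortable for CMOS


def worst_case_serial_rate(sample_rate_mhz: int, max_code_length: int) -> int:
    """Bit-serial rate in Mbps if every sample used the longest codeword."""
    return sample_rate_mhz * max_code_length


def min_paths_below(rate_mbps: int, target_clock_mhz: int) -> int:
    """Smallest number of equal paths whose per-path rate is strictly
    below the target clock."""
    return rate_mbps // target_clock_mhz + 1


rate = worst_case_serial_rate(TOTAL_SAMPLE_RATE_MHZ, MAX_CODE_LENGTH_BITS)
paths = min_paths_below(rate, TARGET_CLOCK_MHZ)
print(rate, paths)  # 1664 Mbps and 17 paths, i.e. more than sixteen
```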
In Section III, we will introduce VLC coder and decoder
structures based on a parallel approach [12]-[14]. The parallel
coder (or decoder) encodes (or decodes) each codeword in one
clock cycle regardless of its code length and thus can operate at
the video sample rate, instead of a high bit-serial rate. Since they
operate at the video sample rate, the parallel VLC coder and
decoder also permit a system design based on parallel paths to
further reduce the speed requirement. For example, if two paths
are used in the entropy coding system, each VLC coder or
decoder needs to operate only at a peak rate of 52 MHz.
Other parallel decoding methods can be found in [10] and
[11]. In [10], Peake used a barrel shifter and two programmable
logic arrays (PLA's) to achieve decoding of each codeword in
one clock cycle. However, the decoder's interface and control
circuitry for the barrel shifter were not discussed. In [11], a read-only
memory (ROM) and an N-shift register were used, and the
VLC was modified to allow fewer code lengths for a high-speed
implementation. Such constraints on the VLC usually introduce
some degradation in compression efficiency.
B. A System Overview
A general block diagram of an ATV codec is shown in Fig. 1.
The video codec consists of an A/D and D/A, a video source
coder/decoder, an entropy coder/decoder, and channel interfaces.
The input of the entropy coder comes from the source
coder. In order to achieve the required data compression, the
source coder may transform its input video signals into multi-subband
signals (such as in the DCT or subband coding [25],
[7]) so that each subband can be quantized more efficiently. The
source coder outputs can then be further compressed by the
entropy coder. After the entropy coder, the instantaneous data
rate may vary over a wide range. Thus, a large buffer is needed
to smooth out the data rate so that the average rate is near the
channel rate.
It is very inefficient to use one entropy coder/decoder for each
subband since a lot of hardware would have to be duplicated. It
has been suggested³ that the multi-subband signals generated by
the source coder be multiplexed into the least number of paths
such that the speed requirement of the entropy coder/decoder is
still feasible for implementation. A data shuffler is added after
the multiplexer to shuffle the data on a line-by-line basis (i.e.,
one line of data from one subband followed by one line of data
from another subband) so that the zero-runs of each subband are
preserved [25], [7].
³ The idea was suggested by Dr. J. A. Bellisio.
Fig. 1. System diagram of generic HDTV codec.
It is possible to multiplex all the luminance and chrominance
subbands into a serial data stream with a total sample rate of 104
MHz⁴ so that only one entropy coder/decoder is needed. However,
assuming a maximum codeword length of 16, the VLC
coder/decoder would need to achieve a worst-case throughput of
1.664 Gbps. Also, the required interface bandwidth of the VLC
decoder and the buffer would have to be 1.664 Gbps. It is very
difficult and expensive to implement circuits to achieve this rate.
Moreover, the 104 MHz clock is more difficult to obtain in a
system where the video sampling clock is 52 MHz.
A block diagram of a two-path entropy coder is shown in Fig.
2. This architecture is more general and can be easily extended
to systems having more paths. In Fig. 2, the subbands are
separated into two paths, one for luminance and the other for
chrominance, such that the peak rate in each path is only 52
MHz. Since the output of the VLC coder is bursty, a small line
first-in-first-out buffer (FIFO) is added after each VLC coder.
This FIFO can accommodate the maximal amount of data that
the buffer can absorb within one line period so that the data can be
multiplexed into the buffer on a line-by-line basis. Also, this
FIFO smooths out the data rate over one line period and the
resulting average rate is much lower than the peak rate. Thus,
the interface bandwidth to the buffer can be reduced from 1.664
Gbps to a moderate amount (e.g., 416 Mbps in our system) so
that this interface is easier to implement while still having
enough bandwidth to accommodate most lines. Since the entropy
coder only compresses data statistically, in some very rare but
possible cases, the actual output rate of the entropy coder within
a line may be higher than the reduced interface bandwidth. Some
data have to be discarded by the multiplexer in these cases. The
degradation caused by this loss of data is minimized by coding
the important subbands first. Thus, if some data need to be
discarded, only the less important subbands are affected. A
special codeword may be needed to mark this forced-end-of-line
case. The entropy decoder is an inverse of the entropy coder
except that some error-handling functions are included.
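The forced-end-of-line behavior described above can be illustrated with a small sketch. It assumes, purely for simplicity, that data are discarded at whole-subband granularity and that the per-line budget is given in bits; the paper does not specify either detail, and the function name is hypothetical.

```python
from typing import List, Tuple


def multiplex_line(coded_subbands: List[str], budget_bits: int) -> Tuple[List[str], bool]:
    """Keep coded subbands, most important first, while they fit the budget.

    coded_subbands: bit strings for one line, ordered by importance.
    budget_bits: bits the buffer interface can absorb in one line period
                 (assumed parameter).
    Returns the kept bit strings and a flag saying whether the line was cut
    short, in which case a forced-end-of-line marker would be needed.
    """
    kept: List[str] = []
    used = 0
    for bits in coded_subbands:
        if used + len(bits) > budget_bits:
            return kept, True      # less important subbands are dropped
        kept.append(bits)
        used += len(bits)
    return kept, False


kept, forced_eol = multiplex_line(["1010", "111", "00000"], budget_bits=8)
print(kept, forced_eol)
```

Because the most important subbands are placed first, a cut only ever removes the tail of the list.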
III. THE IMPLEMENTATION OF THE ENTROPY CODER AND
DECODER
In this section, the structures of the entropy coder and decoder
based on a parallel approach are described first. Then full-custom
VLSI implementations of these two experimental prototype chips
are introduced.
A. The Entropy Coder
The entropy coder consists of two major parts: RLC and VLC
coders. The input data is run-length coded first, and then
variable-length coded. In the RLC coder, input data that is not
part of a zero run is passed through, otherwise the length of the
zero run is determined and properly encoded. For both cases,
one extra bit is added to indicate whether the symbol represents
a zero run or a single sample. The RLC coder can be easily
implemented by a counter, some registers and logic gates. When
a zero run is present, the RLC coder generates no output until
the last zero or maximum run length is reached. Therefore, the
output of the RLC coder is not continuous, and the operation of
the downstream VLC coder is gapped.
⁴ Many source coders preserve the sample rate.
Fig. 2. Block diagram of two-path entropy coder.
Fig. 3. VLC coder.
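The zero run-length coding described above can be sketched in software as follows. This is a behavioral illustration rather than the hardware design: the symbol format `(is_run, value)` and the default `max_run` of 255 are assumptions made here.

```python
from typing import List, Tuple

Symbol = Tuple[int, int]  # (is_run flag, run length or sample value)


def rlc_encode(samples: List[int], max_run: int = 255) -> List[Symbol]:
    """Replace runs of zeros by (1, run_length); pass other samples as (0, sample).

    A run is emitted when it ends or when it reaches max_run, so nothing is
    output while a run is still growing.
    """
    out: List[Symbol] = []
    run = 0
    for sample in samples:
        if sample == 0:
            run += 1
            if run == max_run:
                out.append((1, run))
                run = 0
        else:
            if run:
                out.append((1, run))
                run = 0
            out.append((0, sample))
    if run:
        out.append((1, run))
    return out


def rlc_decode(symbols: List[Symbol]) -> List[int]:
    """Inverse of rlc_encode: expand runs back into zeros."""
    out: List[int] = []
    for is_run, value in symbols:
        out.extend([0] * value if is_run else [value])
    return out


line = [5, 0, 0, 0, -2, 0, 7, 0, 0]
coded = rlc_encode(line)
print(coded)  # [(0, 5), (1, 3), (0, -2), (1, 1), (0, 7), (1, 2)]
```

The extra flag in each symbol plays the role of the one extra bit mentioned in the text.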
The VLC coder maps the input data into variable-length
codewords, concatenates them together, and segments them into
16-bit words for output. The parallel VLC coder shown in Fig. 3
encodes each codeword in one clock cycle regardless of its
length. The functions of some major circuit components in Fig.
3 are as follows. The PLA does the table look-up of the
codewords. The barrel shifter BS1 concatenates these codewords
together and the barrel shifter BS2 segments them into 16-bit
words for output. The function of a 4-bit accumulator of code
lengths is mimicked by the barrel shifter BS3 and the register
L1. The carry-out of the accumulator forms the output-available
signal of the VLC coder. The signal en in Fig. 3 is an enable
signal which is derived from the data-available of the RLC
coder. When there is a zero run in the source data, operation of
the VLC coder is suspended until the RLC coder obtains its run
length. During this time period, en is low and the registers in
the VLC coder will retain their old data.
The codeword look-up table can be implemented by a read-only
memory (ROM), programmable logic array (PLA), or
random access memory (RAM). Using RAM, a user-programmable
VLC can be implemented. However, the size will be
larger, the speed will be slower than the other two approaches,
and extra circuitry is needed for preloading the codebooks. A
ROM is more suitable when the number of codebook entries is
2^n, where n is an integer, otherwise some address locations are
wasted. For these reasons, a PLA is used in our current implementation.
Although there is only one PLA shown in Fig. 3,
multiple paged PLA's will be implemented to allow different
tables for different subband signals to achieve higher compression
efficiency.
Registers W0 and L0 store the results of the PLA table
look-up, i.e., codeword and code length respectively. The maximal
code length is 16 bits in our implementations. The codeword
from the PLA and in W0 is left-adjusted and stuffed with 1's, if
necessary, on the right. The first bit is on the left. The code
length is represented by 16 bits in a decoded form, i.e., the
position of the only 1 indicates the length. W1 stores the
concatenated previous codewords which have not yet been output.
We will call these codeword bits the residual bits. W2 is the
output latch. L1 records the number of the residual bits in W1.
C3 indicates whether there is an output available in the output
latch W2.
The parallel concatenation of codewords is done by barrel
shifters BS1 and BS2. Functionally, the output of a barrel
shifter is a sliding window on its input data. BS1 and BS2 both
provide 16-bit wide windows on their 31 input bits. BS3 has a
32-bit wide window on its 47 input bits. They all have 16
different shift positions. BS1 is controlled by the current code
length stored in L0 and shifts the codeword from W0 into W1
so that the rightmost bit of W1 is the last bit of the codeword.
Consequently, the data stored in register W1 is ready to concatenate
with the next codewords. The barrel shifter BS2 is controlled
by the number of residual bits as recorded by L1. The
residual bits in W1 are left-adjusted by BS2. If the sum of the
residual bits and the current codeword is more than 16, the
output of BS2 contains the first 16 bits which have not yet been
output but are ready for output.
The combination of the barrel shifter BS3 and the latch L1
functions as a 4-bit accumulator of code lengths. If the residual
bit length is greater than 16, the right 16 bits of BS3's output are
all 0's and C3 is set to 1 to indicate output-available. The 16-bit
pattern from the latch L1 is partially duplicated into a 31-bit
input to BS3 so that the barrel shifter functions as a rotator. The
other 16 input bits of BS3 are connected to "0" for detecting
the carry-out condition. If the right 16 bits of BS3's output are
all 0's, there is a carry-out. The left 16 bits of BS3's output
update the new number of residual bits in W1. The number of
residual bits is usually between 1 and 16 except at the beginning
of the operation when it is zero. Since L1 is 16 bits long, it can
only represent 1 to 16 in the decoded form. Thus, at the
beginning of the operation, L1 needs to be set to 16, which is
modulo-16 equivalent to zero, and C2 is set to 1 to indicate this
is a zero, not 16. The example in Fig. 4 illustrates how this VLC
coder works.
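A behavioral model may help in following the concatenation and segmentation performed by the VLC coder. The sketch below handles one codeword per call, mirroring the one-codeword-per-clock operation, but it uses Python strings instead of barrel shifters and one-hot registers. The `VlcPacker` class, its `flush` method, and the padding with 1's at the end are assumptions introduced here; the codebook is the example codebook of Fig. 6.

```python
from typing import List, Optional

# Example codebook of Fig. 6 (symbol -> codeword).
CODEBOOK = {"a": "00", "b": "01", "c": "100", "d": "101",
            "e": "110", "f": "1110", "g": "11110", "h": "11111"}


class VlcPacker:
    """Concatenate codewords and cut the stream into 16-bit words."""

    WORD = 16

    def __init__(self) -> None:
        self.residual = ""  # bits not yet output (stand-in for W1 and L1)

    def push(self, codeword: str) -> Optional[str]:
        """Accept one codeword of 1 to 16 bits.

        Returns a 16-bit word when residual plus codeword exceeds 16 bits,
        otherwise None. The residual never grows beyond 16 bits.
        """
        if not 1 <= len(codeword) <= self.WORD:
            raise ValueError("codeword must be 1 to 16 bits long")
        bits = self.residual + codeword
        if len(bits) > self.WORD:
            self.residual = bits[self.WORD:]
            return bits[:self.WORD]
        self.residual = bits
        return None

    def flush(self, pad: str = "1") -> Optional[str]:
        """Emit the remaining bits as a final padded word (assumed behavior)."""
        if not self.residual:
            return None
        word = self.residual.ljust(self.WORD, pad)
        self.residual = ""
        return word


packer = VlcPacker()
words: List[str] = []
for symbol in "bgaafebabcdb":
    word = packer.push(CODEBOOK[symbol])
    if word is not None:
        words.append(word)
last = packer.flush()
if last is not None:
    words.append(last)
print(words)  # ['0111110000011101', '1001000110010101']
```

The two output words together form the 32-bit input stream used in the decoder example of Fig. 6.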
There are several reasons for using the barrel shifter BS3
instead of a 4-bit accumulator: 1) a barrel shifter is faster than an
accumulator; 2) since the output is already a 16-bit decoded
pattern, a 4-to-16 decoder is not required for BS2; and 3) the
decoded representation of a code length reduces the capacitive
loading on the bit-line in the PLA code length OR-plane. Consequently,
using the barrel shifter results in faster circuitry and
also saves design time since the design of the barrel shifter is
available anyway.
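The idea of accumulating code lengths by shifting a decoded (one-hot) pattern can be sketched as follows. The bit ordering (value v stored as a 1 at bit index v - 1) and the function names are assumptions made for illustration; the sketch models the arithmetic only, not the 47-bit input wiring of BS3.

```python
from typing import Tuple

WIDTH = 16
MASK = (1 << WIDTH) - 1


def one_hot(value: int) -> int:
    """Encode a count of 1 to 16 as a 16-bit pattern with a single 1."""
    if not 1 <= value <= WIDTH:
        raise ValueError("value must be in 1..16")
    return 1 << (value - 1)


def from_one_hot(pattern: int) -> int:
    """Decode a one-hot pattern back to its count."""
    if pattern == 0 or pattern & (pattern - 1):
        raise ValueError("pattern must contain exactly one 1")
    return pattern.bit_length()


def accumulate(residual_pattern: int, code_length: int) -> Tuple[int, bool]:
    """Add code_length (1 to 16) to a one-hot residual count by shifting.

    Returns the new one-hot count and a carry flag. The carry is detected
    by the low 16 bits of the shifted pattern being all 0's.
    """
    shifted = residual_pattern << code_length   # at most 32 bits wide
    low, high = shifted & MASK, shifted >> WIDTH
    carry = low == 0
    return (high if carry else low), carry


pattern, carry = accumulate(one_hot(15), 3)
print(from_one_hot(pattern), carry)  # 2 True, since 15 + 3 = 16 + 2
```

No adder and no 4-to-16 decoder appear in this model, which is the point made in reasons 1) and 2) above.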
⁵ The 16-bit maximal code length limitation has been found to incur very
little penalty (less than 1%) on the coding rates of all the test sequences we
used. The circuits discussed here can also be modified easily to accommodate
longer code length.
Fig. 4. Example of VLC coder operations.
B. The Entropy Decoder
The entropy decoder contains a VLC decoder followed by a
RLC decoder. It performs an inverse function of the entropy
coder. The RLC decoder passes the VLC-decoded codewords
through if they are not run-length codes, otherwise it outputs the
specified number of zeros. During a zero run, while zeros are
being output, the operation of the VLC decoder must be suspended.
Thus, the output of the RLC decoder is continuous, but
the operation of the VLC decoder needs to be intermittent in
analogy to the VLC coder. Similar to the RLC coder, the RLC
decoder can also be easily implemented by a counter, some
registers, and logic gates.
The VLC decoder is more difficult to implement than the VLC
coder. The input to the VLC decoder is a bit stream without
explicit word-boundaries. The VLC decoder has to decode a
codeword, determine its length, and shift the input data stream
by the number of bits corresponding to the decoded code length,
before decoding the next codeword. These are recursive operations
that cannot be pipelined.
A block diagram of a parallel VLC decoder is shown in Fig.
5. The functions of its major components are described as
follows. The PLA is the codebook table. It matches a codeword
and outputs the corresponding symbol and code length. The code
length is accumulated by barrel shifter BS1 and register D2. The
barrel shifter BS0 then shifts its opening window to the next
codeword according to this accumulated code length. An example
of this VLC decoder's operation is shown in Fig. 6.
In principle, a decoding table could be implemented by a
ROM; however, it would require a 2^16-word ROM which would
be very wasteful. It is much more efficient to use a content-addressable
memory (CAM) [12] or a PLA [10] whose sizes are
determined only by the number of codebook entries. A user-programmable
VLC decoder can be implemented by using a
CAM [12]; however, it would result in a circuit much larger and
slower than a circuit using a PLA. In the following discussion,
use of a PLA is assumed. The operation of the circuitry is as
follows.
The input data are stored in registers D0 and D1. The 16-bit
pattern in D2 represents the number of decoded bits (i.e.,
accumulated code length) in D1. The number can lie between 0
and 15. This pattern controls the barrel shifter BS0 so that the
undecoded bits appear at the output of the barrel shifter.
The AND-plane of the PLA essentially performs a parallel
pattern matching on the data stream. When a codeword is
matched, the corresponding word-line in the PLA AND-plane is
activated which enables the corresponding word transistors in
the OR-planes to output the decoded codewords and the code
length.
The decoded code length is used to control the second barrel
shifter BS1, whose function is a 4-bit accumulator, analogous to
the BS3 in the VLC coder. BS1 shifts the pattern of D2's output
according to the newly decoded code length. The resultant new
pattern corresponds to the accumulated code length. This new
pattern controls BS0 so as to output the correct window of 16
bits for the next decoding cycle.
Fig. 5. VLC decoder.
Fig. 6. Example of VLC decoder operations.
Example codebook (symbol, codeword, code length):
a  00     2
b  01     2
c  100    3
d  101    3
e  110    3
f  1110   4
g  11110  5
h  11111  5
Input bit stream: 01111100000111011001000110010101
When the accumulated code length exceeds 15, a carry-out bit
becomes 1. It indicates that all the bits in D1 have been used and
that D0 may not contain the whole next codeword. In this case,
when the gapped ready clock generated by the RLC decoder
latches the decoded output, a read signal is generated. The
contents of D0 are loaded into D1, a new 16-bit word is loaded
into D0, and the barrel shifter shifts to the new position, all at
the same time, to prepare for the next decoding cycle.
If the accumulated code length does not exceed 15, the
carry-out signal is 0. Since the maximum code length is 16 and
at least 16 bits of data in D0 and D1 are not used yet, there are
always enough bits for the next decoding cycle. D0 and D1 are
left unchanged. The new accumulated code length pattern simply
controls the BS0 to shift to the correct position for the next
decoding cycle. Thus, VLC decoding is achieved in one clock
cycle regardless of the code length.
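The decoding cycle described above can be modeled in a few lines. The sketch below keeps two 16-bit registers and an accumulated offset, and decodes one codeword per loop iteration using the example codebook and input stream of Fig. 6. It is a behavioral stand-in only: the sequential prefix search replaces the parallel match of the PLA AND-plane, and the function name and zero padding at the end of the stream are assumptions.

```python
from typing import List

# Example codebook of Fig. 6 (codeword -> symbol).
CODEBOOK = {"00": "a", "01": "b", "100": "c", "101": "d",
            "110": "e", "1110": "f", "11110": "g", "11111": "h"}
WORD = 16


def vlc_decode(words: List[str], n_symbols: int) -> List[str]:
    """Decode n_symbols from a list of 16-bit words given as bit strings."""
    stream = iter(words)
    pad = "0" * WORD                  # assumed filler once the input runs out
    d1 = next(stream, pad)            # older word
    d0 = next(stream, pad)            # newer word
    acc = 0                           # decoded bits in d1, always 0..15
    symbols: List[str] = []
    for _ in range(n_symbols):
        window = (d1 + d0)[acc:acc + WORD]     # what BS0 would output
        for length in range(1, WORD + 1):      # stand-in for the PLA match
            symbol = CODEBOOK.get(window[:length])
            if symbol is not None:
                break
        else:
            raise ValueError("no codeword matches window " + window)
        symbols.append(symbol)
        acc += length
        if acc > WORD - 1:                     # carry-out: d1 is used up
            d1, d0 = d0, next(stream, pad)
            acc -= WORD
    return symbols


decoded = vlc_decode(["0111110000011101", "1001000110010101"], 12)
print("".join(decoded))  # bgaafebabcdb
```

Each iteration performs exactly one match and one offset update, which corresponds to the one-codeword-per-clock behavior regardless of code length.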
C. A Full-custom IC Implementation
Since the circuit components of the entropy coder are similar
to those of the entropy decoder and the speed requirement of the
entropy decoder is more difficult to achieve, we will focus on the
entropy decoder here. The mask size and the simulated speed of
some critical parts, barrel shifters and the PLA, are shown in
Table I. A mask layout of the decoder chip is shown in Fig. 7.
Six codebooks are included in the chip. The core of the chip
contains about 37K transistors in an area about 3.8 mm × 4.0
mm. Operating speed higher than the required 52 MHz is
anticipated. For our usage, the maximum number of entries in a
codebook is 160. However, extension to more codebook entries
is easy.
Fig. 7. Chip layout of entropy decoder.
TABLE I
THE LAYOUT SIZE AND SIMULATION SPEED OF THE CRITICAL PARTS
Technology              1.2 µm double-metal CMOS
Barrel shifter BS0      Mask size: 304 µm × 271 µm
                        Delay time: 1.5 ns from input and shift to output
Barrel shifter BS1      Mask size: 598 µm × 282 µm
                        Delay time: ~1.5 ns from input and shift to output
PLA (with 160 entries)  Mask size: 557 µm × 2937 µm
                        Evaluation time: ~4.2 ns to wordlength OR-plane
                        output, including an output buffer (assuming the
                        maximum number of entries with same length is 60)
The PLA's are implemented using Domino CMOS circuits
[15] which achieve high speed operations with low power consumption
[16]. The AND-plane and OR-plane of the PLA are
precharged in the first half clock cycle and evaluated in the
second. During the precharge time of the VLC decoder, the signals
are propagating through D0, D1, BS0, and the PLA address
buffers. Thus, the time during the precharge is not wasted.
The critical path of the VLC decoder includes the PLA
AND-plane evaluation time, the code length OR-plane evaluation
time, and the BS1 delay time. The PLA AND-plane evaluation
is sped up by employing larger-than-minimum size
transistors. The OR-plane evaluation time is very dependent on
the capacitive loading on the bit-lines. Since the code length is
fully decoded, the transistors in the code length OR-plane are
very sparsely populated. This greatly reduces the capacitive
loading and increases the evaluation speed. To further speed up
the operation in the OR-plane, each bit is implemented by
multiple transistors sharing drain diffusions. This improves the
ratio of transistor strength to load capacitance. Also, the word-lines
from the PLA address decoder are buffered between the
code length OR-plane and the decoded-word OR-plane. This
minimizes the capacitive loading on the word-lines which address
the code length OR-plane.
To minimize the capacitive loading on the bit-line of the
codeword OR-plane, the transistors on the OR-plane are populated
in such a way that no bit-line ever has more than 50%
occupancy of transistor drains on it. If a bit-line has more "1"
entries, the polarity of that bit-line is inverted and the output
polarity is corrected by an inverting sense amplifier [17].
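The 50% occupancy rule can be illustrated with a small table transformation. The sketch assumes that a "1" entry corresponds to a transistor on the bit-line and that a table is given as one bit string per word-line; the function names are hypothetical.

```python
from typing import List, Tuple


def plan_bit_lines(table: List[str]) -> Tuple[List[str], List[bool]]:
    """Invert every column (bit-line) that holds more than 50% ones.

    Returns the table as it would be stored and, per column, whether an
    inverting sense amplifier is needed to restore the original polarity.
    """
    n_rows, n_cols = len(table), len(table[0])
    inverted = [2 * sum(row[c] == "1" for row in table) > n_rows
                for c in range(n_cols)]
    stored = ["".join("1" if (row[c] == "1") != inverted[c] else "0"
                      for c in range(n_cols))
              for row in table]
    return stored, inverted


def read_word(stored_row: str, inverted: List[bool]) -> str:
    """Recover the original entry from a stored row."""
    return "".join("1" if (bit == "1") != inv else "0"
                   for bit, inv in zip(stored_row, inverted))


table = ["110", "111", "100", "101"]
stored, inverted = plan_bit_lines(table)
print(stored, inverted)  # ['010', '011', '000', '001'] [True, False, False]
```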
The PLA layouts are generated by a PLA generator written in
the C language. This PLA generator makes the chips mask-programmable
for other systems using different codebooks.
IV. CODEWORD SYNCHRONIZATION AND ERROR
CONCEALMENT
One major concern in using variable-length code is its error
propagation property. An erroneous bit from transmission or
storage of the encoded bit stream will cause the codeword to be
misinterpreted and, as the codeword's length is not fixed, this
may result in a loss of synchronization of the bit stream.
Decoding errors may propagate to the subsequent source symbols.
Although resynchronization may naturally occur after a
while [18], [19] or it can be guaranteed by careful designs of the
code [20]-[22], the number of the decoded symbols may not be
correct. In video or image coding applications, this would result
in a shift on part of the reconstructed picture which is very
objectionable. A technique for resynchronizing both the codeword
and the sample position is the use of synchronizing words
at suitable intervals. These synchronizing words have to be
recognizable whether the decoding is synchronized or not. A
codeword with this property is called a clear codeword. Basically,
a clear codeword is a codeword which cannot be generated
by any concatenations of other codewords. The end-of-line
(EOL) codeword used in the international digital facsimile coding
standard [3] is an example of the clear codeword.
The identification of these clear codewords is not dependent
on the correct decoding of their preceding symbols; they can be
identified by their special codeword patterns. Thus, if there is
any bit error in the coded data stream, the error propagation is
confined at most until the next clear codeword. Furthermore,
since the number of codewords between each pair of clear
codewords is known, most of the errors can be detected by
counting the decoded samples between the clear codewords.
Such error detection not only can prevent a possible position
shift of the decoded samples following the erroneous segment,
but also can activate error concealment mechanisms for the
erroneous segment. For example, if a bad line is detected, we
may repeat the previous line. If the erroneous segment is a
high-frequency subband, we can retain other correct subbands
and only replace the erroneous subband by zeros. According to
the study in [23], these two error concealment techniques are
very effective.
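The count-based error detection and the two concealment options described above might look as follows in software. This is a sketch under stated assumptions: each segment between clear codewords is taken to be one line with a known sample count, and a bad first line (with no previous line to repeat) is replaced by zeros, a case the text does not address.

```python
from typing import List


def conceal_lines(decoded_lines: List[List[int]], expected_len: int,
                  is_high_band: bool = False) -> List[List[int]]:
    """Detect bad segments by sample count and conceal them.

    decoded_lines: samples decoded between consecutive clear codewords.
    expected_len: known number of samples per segment.
    is_high_band: if True, a bad segment is replaced by zeros; otherwise
                  the previous output line is repeated.
    """
    out: List[List[int]] = []
    for line in decoded_lines:
        if len(line) == expected_len:
            out.append(list(line))             # count matches: accept
        elif is_high_band or not out:
            out.append([0] * expected_len)     # zero out the bad segment
        else:
            out.append(list(out[-1]))          # repeat the previous line
    return out


lines = [[1, 2, 3], [4, 5], [6, 7, 8]]
print(conceal_lines(lines, 3))                     # middle line repeats the first
print(conceal_lines(lines, 3, is_high_band=True))  # middle line becomes zeros
```

Because every output line has the expected length, a bad segment cannot shift the position of the samples that follow it.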
If multiple variable-length coded bit streams have to be multiplexed
together, usually they cannot be multiplexed directly in a
word-interleaving fashion due to a different number of words for
each bit stream. By using clear codewords as segment delimiters,
they can be multiplexed segment by segment. These clear
codewords are recognizable to the demultiplexer and the segments
can be demultiplexed without the requirement of
variable-length decoding first.
Fig. 8. Example of construction of code tree with shorter clear codeword.
Although the clear codeword, EOL, is used in the international
digital FAX coding standard [3], the design of an optimal
Huffman codebook which also includes clear codewords has not
been shown. A clear codeword cannot be obtained automatically
from the Huffman algorithm. Usually, the codewords generated
by the Huffman algorithm are not clear, i.e., they can be
generated by a concatenation of other codewords. In order to
make a clear codeword, a reserved codeword has to be extended
by several bits. Naturally, it is desired to make these extension
bits as few as possible. With the code lengths given by the
Huffman algorithm, there are many different codes. For different
codes or different reserved codeword patterns, the number of the
needed extension bits may be different.
A method of finding an efficient code with a clear codeword
will be introduced here. As shown heuristically in [24], a good
bit pattern for the reserved codeword is all 1's (or all 0's).
(Since they are equivalent, we will only discuss the former case
without loss of any generality.) The all-ones reserved codeword
tends to require fewer extension bits than others in order to
make it clear. To convert the reserved all-ones codeword into a
clear codeword, a suffix of a few more 1's followed by a 0 is
needed. The suffix 0 after the all 1's pattern in the clear
codeword is needed to mark the end of the clear codeword.
Otherwise, an all 1's clear codeword cannot be clearly located
when it succeeds a codeword with suffix 1's or when it precedes
a codeword with prefix 1's.
If the reserved codeword and the clear codeword have the
patterns described above, it is simple to observe that, in
order to obtain a shorter clear codeword, we should arrange the
code tree such that a concatenation of codewords other than the
reserved codeword will not form a long segment of consecutive
1's. For any code tree, all codewords (except the reserved
all-ones codeword) must contain at least one 0; otherwise the
prefix property⁶ is contradicted. Thus, we only have to consider
the consecutive 1's formed by the concatenation of two
codewords. The longest possible run of 1's in the concatenation
of codewords, other than the reserved codeword, is formed by the
codeword with the longest suffix of 1's followed by the codeword
with the longest prefix of 1's. If the reserved codeword is n 1's,
there must be at least one codeword having a prefix of n - 1 1's
followed by one 0. Also, no codeword except the reserved codeword
has a prefix of 1's longer than n - 1, as this would contradict
the prefix property. Thus, for the codewords other than the
reserved one, the maximal number of prefix 1's is always n - 1.
Therefore, in order to obtain the shortest clear codeword, we
only need to minimize the maximal number of consecutive 1's in
the suffixes of the codewords. The resulting clear codeword is
the reserved all 1's followed by this maximal number of suffix
1's and a 0. Thus, knowing the code length for each codeword from
the Huffman algorithm, we can rearrange the shape of the code
tree such that the longest suffix of 1's among the codewords
(excluding the reserved codeword) is minimized.
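The rule for forming the clear codeword can be illustrated with a
minimal Python sketch. It assumes codewords are held as strings of
'0' and '1'; the function names and the small five-word code are
hypothetical and are not taken from the paper.

```python
def trailing_ones(word: str) -> int:
    """Number of consecutive 1's at the end of a bit string."""
    return len(word) - len(word.rstrip("1"))


def leading_ones(word: str) -> int:
    """Number of consecutive 1's at the start of a bit string."""
    return len(word) - len(word.lstrip("1"))


def clear_codeword(codewords, reserved):
    """Reserved all-ones word + longest suffix run of 1's + a closing 0."""
    if not reserved or set(reserved) != {"1"}:
        raise ValueError("reserved codeword must be all 1's")
    others = [w for w in codewords if w != reserved]
    longest_suffix = max(trailing_ones(w) for w in others)
    return reserved + "1" * longest_suffix + "0"


def longest_data_run(codewords, reserved):
    """Longest run of 1's that ordinary (non-reserved) codewords can form.

    Every non-reserved codeword contains a 0, so a run spans at most
    two codewords: a suffix of 1's followed by a prefix of 1's.
    """
    others = [w for w in codewords if w != reserved]
    return (max(trailing_ones(w) for w in others)
            + max(leading_ones(w) for w in others))


# Hypothetical prefix code used only for illustration; "111" is reserved.
code = ["00", "01", "10", "110", "111"]
print(clear_codeword(code, "111"))    # 11110
print(longest_data_run(code, "111"))  # 3
```

In this toy code the ordinary codewords can form at most three
consecutive 1's, while the clear codeword contains four, so it cannot
be imitated by any concatenation of the other codewords.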
In the tree representation of a variable-length code, the
number of leaves on level k is the number of codewords with
length k, while the root is viewed as level 0. The key idea for
minimizing the suffix 1's of codewords is to assign the nodes
with the longest suffix 1's (except the all-ones reserved
pattern) as leaves on each level. Thus, the nodes with longer
suffix 1's are terminated as leaves, which prevents their suffix
1's from growing on the next level. This assignment results in a
code tree with the shortest suffix 1's among its codewords. This
code tree is thus optimal in the sense that its reserved all-ones
codeword needs the fewest extension bits to make it clear.
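The level-by-level leaf assignment can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function name, the dictionary input, and the tie-breaking order among
nodes with equally long suffix 1's are hypothetical choices, so the
resulting tree may differ in detail from the one drawn in the paper.

```python
def trailing_ones(word: str) -> int:
    """Number of consecutive 1's at the end of a bit string."""
    return len(word) - len(word.rstrip("1"))


def assign_codewords(leaf_counts, reserved_len):
    """Assign leaves level by level, longest suffix 1's first.

    leaf_counts maps a code length to the number of codewords of that
    length (the reserved codeword is counted at reserved_len).
    Returns the codewords as bit strings, shortest first.
    """
    if leaf_counts.get(reserved_len, 0) < 1:
        raise ValueError("no codeword of the reserved length")
    internal = [""]          # the root, level 0
    code = []
    for level in range(1, max(leaf_counts) + 1):
        nodes = [parent + bit for parent in internal for bit in "01"]
        need = leaf_counts.get(level, 0)
        all_ones = "1" * level
        # The all-ones node becomes a leaf only at the reserved length.
        leaves = [all_ones] if level == reserved_len else []
        # Remaining candidates, longest suffix of 1's first
        # (ties broken lexicographically; an arbitrary choice).
        candidates = sorted(
            (w for w in nodes if w != all_ones),
            key=lambda w: (-trailing_ones(w), w),
        )
        leaves += candidates[: need - len(leaves)]
        if len(leaves) != need:
            raise ValueError(f"not enough nodes on level {level}")
        code += leaves
        internal = [w for w in nodes if w not in leaves]
    return code


# Code lengths of a 19-word code with a 5-bit reserved codeword.
code = assign_codewords({3: 3, 4: 6, 5: 6, 6: 4}, reserved_len=5)
print(code[:3])  # ['011', '001', '101']
print(max(trailing_ones(w) for w in code if w != "11111"))  # 2
```

Terminating the nodes with the longest suffix 1's first means that
only nodes with short suffix runs are left to branch, which is what
keeps the suffix 1's from growing on the next level.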
An example of this optimal code tree is shown in Fig. 8. In
this example, the variable-length code contains three 3-bit
codewords, six 4-bit codewords, six 5-bit codewords, and four
6-bit codewords. In total, there are 19 codewords, including the
reserved codeword with a code length of 5 bits. The first and
second levels have no leaves since no codeword is one or two bits
long. On the third level, there are 8 nodes and three of them
must be assigned as leaves, since three 3-bit codewords are
required. The nodes with longer suffix 1's (except the all-ones
node) are assigned to be leaves first. For example, node 011 has
the longest suffix 1's (except the all-ones pattern), so it is
the first node we want to assign as a leaf. Besides this, nodes
001 and 101, which have the second longest suffix 1's, are also
assigned as leaves. The leaf assignment for the other levels is
similar except that, on level 5, the node 11111 is chosen first
as a leaf since it is the required 5-bit reserved codeword. In
this example, the longest suffix 1's is 2 bits. To make the
reserved 11111 clear, it has to be extended by three more bits
as 11111110.

⁶ No codeword is a prefix of any other codeword.
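The result of this example can be checked by brute force. The sketch
below assumes one particular leaf assignment that follows the stated
rule for the given code lengths; the 18 ordinary codewords listed are
an assumption made for illustration and are not read from Fig. 8.

```python
from itertools import product

RESERVED = "11111"
CLEAR = RESERVED + "11" + "0"   # two more 1's and a closing 0: 11111110

# Assumed ordinary codewords (reserved word excluded): three 3-bit,
# six 4-bit, five 5-bit, and four 6-bit words.
CODE = [
    "011", "001", "101",
    "0001", "0101", "1001", "1101", "0000", "0100",
    "10001", "11001", "11101", "10000", "11000",
    "111001", "111101", "111000", "111100",
]


def clear_is_emulated(code, clear, span=4):
    """True if `clear` occurs inside some concatenation of `span` codewords.

    With 3-bit minimum codewords, an 8-bit window can overlap at most
    four consecutive codewords, so span=4 covers every case here.
    """
    return any(clear in "".join(ws) for ws in product(code, repeat=span))


print(clear_is_emulated(CODE, CLEAR))  # False

# The clear codeword is found at its true position even after a
# codeword pair that forms the longest ordinary run of six 1's.
stream = "011" + "111100" + CLEAR + "001" + "10000"
print(stream.find(CLEAR))  # 9
```

A pattern match on the 8-bit clear codeword is therefore enough to
locate it in the bit stream, without variable-length decoding.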
V. SUMMARY
In this paper, a complete entropy coding system for HDTV
applications is described. Parallel structures of the entropy coder
and decoder are introduced. The parallel entropy coder (or
decoder) encodes (or decodes) each codeword in one clock cycle
regardless of its code length. Thus, the required clock rate is
lower and a parallel processing system is also easy to design.
The parallel entropy coder and decoder are implemented in
two experimental prototype chips which are capable of encoding
and decoding 52 million samples/s. Clear codewords are
introduced for variable-length codeword synchronization and
multiplexing. A systematic method of designing an efficient code
with clear codewords is described.
ACKNOWLEDGMENT
We would like to thank J. A. Bellisio, P. E. Fleischer, and
M. E. Lukacs for stimulating discussions and valuable
comments. We would also like to thank S. Palaniraj for helping us
with the layouts and for writing the PLA generator.
REFERENCES
[1] W. K. Pratt, Digital Image Processing. New York: Wiley,
p. 632, 1978.
[2] D. A. Huffman, "A method for the construction of
minimum-redundancy codes," Proc. IRE, vol. 40, pp. 1098-1101,
Sept. 1952.
[3] R. Hunter and A. H. Robinson, "International digital facsimile
coding standards," Proc. IEEE, vol. 68, no. 7, pp. 854-867,
July 1980.
[4] CCITT SGXV, "Draft Revised Recommendation H.261-Video
codec for audiovisual services at p × 64 kbit/s," COM XV-R
17-E, CCITT Study Group XV-Report R 17, Specialist Group
on Coding for Visual Telephony, Jan. 1990.
[5] W. H. Chen and W. K. Pratt, "Scene adaptive coder," IEEE
Trans. Commun., vol. COM-32, no. 3, pp. 225-232, Mar.
1984.
[6] T.-C. Chen, P. E. Fleischer, and S.-M. Lei, "A subband scheme
for advanced TV coding in BISDN applications," presented at
3rd Int. Workshop on HDTV, Italy, Aug. 1989.
[7] P. E. Fleischer, T.-C. Chen, and S.-M. Lei, "Coding of
advanced TV for BISDN using multiple subbands," Proc. of Int.
Symp. on Circuits and Systems, New Orleans, LA, pp.
1314-1318, May 1990.
[8] R. Ballart and Y.-C. Ching, "SONET: Now it's the standard
optical network," IEEE Communications Magazine, Mar.
1989.
[9] J. L. Sicre and A. Leger, "Silicon complexity of VLC decoder
vs Q-coder," JPEG N258, ISO/JTC1/SC2/WG8, CCITT SGVII,
Feb. 1989.
[10] J. W. Peake, "Decompaction," IBM Technical Disclosure
Bulletin, vol. 26, no. 9, pp. 4794-4797, Feb. 1984.
[11] M. E. Lukacs, "Variable word length coding for a high data rate
DPCM video coder," in Proc. Picture Coding Symp., pp.
54-56, 1986.
[12] M.-T. Sun, K.-M. Yang, and K.-H. Tzou, "A high-speed
programmable VLSI for decoding variable-length codes,"
Applications of Digital Image Processing XII, A. G. Tescher, ed.,
Proc. SPIE 1153, Aug. 1989.
[13] M.-T. Sun and S.-M. Lei, "A parallel variable-length-code
decoder for advanced television applications," presented at 3rd
Int'l Workshop on HDTV, Italy, Aug. 1989.
[14] S.-M. Lei, M.-T. Sun, K. Ramachandran, and S. Palaniraj,
"VLSI implementation of an entropy coder and decoder for
advanced TV applications," in Proc. of Int. Symp. on Circuits
and Systems, New Orleans, LA, pp. 3030-3033, May 1990.
[15] R. H. Krambeck, C. M. Lee, and H. S. Law, "High-speed
compact circuits with CMOS," IEEE J. Solid-State Circuits,
vol. SC-17, pp. 614-619, June 1982.
[16] J. A. Pretorius, A. S. Shubat, and A. T. Salama, "Charge
redistribution and noise margins in domino CMOS logic," IEEE
Trans. Circuits Syst., vol. CAS-33, no. 8, pp. 786-793, Aug.
1986.
[17] P. C. Rossbach, R. W. Linderman, and D. M. Gallagher, "An
optimizing XROM silicon compiler," Proc. IEEE Custom
Integrated Circuits Conf., Portland, OR, pp. 13-16, May 4-7,
1987.
[18] J. C. Maxted and J. P. Robinson, "Error recovery for variable
length codes," IEEE Trans. Inform. Theory, vol. IT-31, no.
6, pp. 794-801, Nov. 1985.
[19] B. Rudner, "Construction of minimum-redundancy codes with an
optimum synchronizing property," IEEE Trans. Inform.
Theory, vol. IT-17, pp. 478-487, July 1971.
[20] T. J. Ferguson and J. H. Rabinowitz, "Self-synchronizing
Huffman codes," IEEE Trans. Inform. Theory, vol. IT-30,
no. 4, pp. 687-693, July 1984.
[21] P. G. Neumann, "Self-synchronizing sequential coding with
low redundancy," Bell Sys. Tech. Journal, vol. 50, no. 3, pp.
951-981, Mar. 1971.
[22] P. G. Neumann, "Efficient error-limiting variable-length
codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 292-304,
July 1962.
[23] D. S. Lee and K. H. Tzou, "Hierarchical DCT coding of HDTV
for ATM networks," Proc. ICASSP, vol. 4, pp. 2249-2252,
Apr. 1990.
[24] S.-M. Lei, "The construction of efficient variable-length codes
with clear synchronizing codewords for digital video
applications," Packet Video '91, Kyoto, Japan, Mar. 18-19,
1991.
[25] T.-C. Chen, P. E. Fleischer, and K.-H. Tzou, "Multiple
block-size transform coding for video using a subband structure,"
IEEE Trans. Circuits Syst. Video Technol., vol. 1, no. 1, Mar.
1991.
Shaw-Min Lei (S'87-M'88) received the B.S.
and M.S. degrees from the National Taiwan
University, Taipei, R.O.C., in 1980 and 1982,
respectively, and the Ph.D. degree from the University of
California, Los Angeles, in 1988, all in electrical
engineering.
From 1982 to 1984, he was an Instructor of
Electrical Engineering at the Naval Academy,
Taiwan. He has been with Bellcore, Red Bank, NJ,
since August 1988. Presently, he is a Member
of Technical Staff in the Digital Video District.
His current research interests include video coding, HDTV signal
processing, digital filter structure, VLSI architecture for digital signal
processing, data compression, and error control coding.
Ming-Ting Sun (S'79-M'85-SM'89) received
the B.S. degree from National Taiwan
University in 1976, the M.S. degree from the
University of Texas at Arlington in 1981, and the
Ph.D. degree from the University of California,
Los Angeles, in 1985, all in electrical
engineering.
He has been with Bellcore, Red Bank, NJ,
since 1985, where he is a Member of Technical
Staff. His research interests include VLSI
architecture and algorithms for video processing,
digital signal processing, and adaptive filters.
Dr. Sun received an Award of Excellence from Bellcore in 1987. He
has authored about 30 publications and has been awarded a
patent. He is the Chairman of the IEEE CAS Standards Committee and
is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND
SYSTEMS FOR VIDEO TECHNOLOGY.
