
Implementation Approaches to Huffman Decoding

Satish D. Warhade
Senior Software Engineer
Wipro Technologies,
26, Chamundi Complex, Bommanahalli
Bangalore-560068 INDIA
Tel +91-80-5732296
satish.warhade@wipro.com

ABSTRACT
Processors have brought flexibility and programmability to the computational world. Emerging DSPs (Digital Signal Processors) are becoming fast enough to run state-of-the-art applications such as audio, imaging, video and vocoders. Though the emerging DSPs have (except for video) sufficient MIPS (Million Instructions Per Second) for running the above-mentioned applications and standards, it is imperative that the applications consume less MIPS and memory in order to provide cost-effective solutions to end users. Though DSP architectures are optimized for signal-processing applications, they are not optimized for search algorithms. Huffman encoding/decoding, which relies on search, has nevertheless become one of the essential components of the compression standards. Hence the Huffman encoder/decoder should be implemented efficiently on the chosen DSP. The complexity of the decoder's implementation lies in searching quickly for the symbol encoded in the bit-stream without consuming large memory. These two requirements conflict, and in addition the standards have multiple tables with large code lengths. This paper describes the implementation of three different Huffman decoding techniques, discusses their relative merits and demerits, and suggests a MIPS- and memory-efficient Huffman decoder.

Categories and Subject Descriptors
Algorithms

General Terms
Algorithms

Keywords
Huffman Decoding, Performance optimization

1. INTRODUCTION
Compression technology has gained significance for various reasons. A typical audio clip lasts at least 3 minutes, which amounts to a storage requirement of around 8-9 MB for PCM audio samples. VLSI technology is enabling faster DSPs providing more computational power. At the same time, memory is becoming cheaper, though not at the same rate as processor speed. The availability of fast processors enabled applications that need high MIPS. But due to relatively expensive on-chip memory and the expensive bandwidth connecting PCs (Personal Computers) over the Internet, the need for efficient data compression schemes for storage and transmission was felt.

This resulted in a spurt of activity generating audio, video, imaging and speech codec (coder/decoder) standards. Storing an uncompressed signal requires a large amount of space. Data compression reduces the number of bits required to store and transmit the information. Due to the explosion of multimedia applications, effective compression techniques are becoming essential.

Huffman coding is a lossless compression technique, often used in lossy compression schemes as the final step after decomposition and quantization of a signal. Huffman coding uses unique variable-length code words, and no Huffman code is a prefix of any other code in the table. For a given probability density function of the symbol set, by assigning short codes to frequently occurring symbols and longer codes to infrequently occurring symbols, Huffman's minimum-redundancy encoding minimizes the average number of bits required to represent the data. Huffman coding is one form of Variable Length Coding (VLC).

Huffman coding is useful for reducing the bit-rate by exploiting statistical redundancies and for encoding a "minimum set" of information using an entropy-coding technique. Entropy coders usually exploit symbol probabilities independently of the previous symbol, and hence they are optimal for uncorrelated sequences.

[Figure 1 shows the Huffman encoder and decoder: table-selection logic chooses a table number, the encoder maps each selected symbol to the encoded bit-stream, and the decoder uses the same table number to recover the symbol from the bit-stream.]
Figure 1: Functional Block Diagram of Huffman Encoder and Decoder
2. SCOPE OF THE PAPER
Since Huffman coding is one of the key components of the encoder/decoder in many compression standards, it must be implemented on DSPs for a cost-effective solution. But the architecture and instruction set of DSPs are optimised for computations with operands of byte, half-word or word size. Since the symbols have variable length after encoding, code words that are not necessarily byte, half-word or word sized must be extracted. On the other hand, memory accesses always fetch data that are byte, half-word or word aligned. Because of this, the speed of a Huffman decoder implemented on a DSP is lower than that of a corresponding implementation on dedicated hardware. In addition, most compression standards have multiple Huffman tables containing long code words that have to be stored in expensive on-chip memory for fast access. These two factors emphasize the importance of a high-speed and memory-efficient implementation of Huffman decoding on Digital Signal Processors.

This paper describes three different implementation techniques for Huffman decoding that facilitate a trade-off between memory and processing-power requirements. The paper does not cover the essentials of Huffman coding or the proof of its optimality. Depending on the requirements of memory efficiency and time cost, different implementations of Huffman decoding can be used.

This paper explains the following implementation techniques of Huffman decoding:
• Look-Up Table Method
• N-Level Look-Up Table Method
• Binary Tree Search Method

3. HUFFMAN DECODING TECHNIQUES
3.1 Look-up Table Method
In this implementation, the Huffman tables used by the algorithm are converted into look-up table form as explained below. The number of rows in the table will be 2^L, where L is the maximum Huffman code length. Tables used for this technique must contain an entry for each symbol together with its Huffman code length. The Huffman code length of each symbol is stored in the table to determine the number of bits actually used in decoding that symbol; the remaining bits are put back into the bit-stream, where the next symbol starts. The number of bits extracted from the bit-stream for every symbol search equals the maximum Huffman code length in the table. Huffman symbols are decoded within one search, since the extracted bits directly give the address of the symbol in the table.

Table 1: Huffman Table
Symbol   Huffman code length   Huffman code   Binary Huffman code
0        1                     0              0
1        2                     3              11
3        3                     4              100
6        3                     5              101

3.1.1 Procedure for Table conversion
Table 1 is used to demonstrate the conversion of a valid Huffman table into look-up form. Table 1 has a maximum Huffman code length of 3.

The first row of the table has symbol = 0, Huffman code length = 1 and Huffman code = 0. Since its Huffman code length is less than the maximum value, this symbol has more than one entry in the converted table.

Addresses for the look-up table entries are calculated as:
Bits required for appending to the Huffman code = maximum Huffman code length - length of Huffman code
Bits required to append to the Huffman code = 3 - 1 = 2

000 = 0, 001 = 1, 010 = 2, 011 = 3

These are the four addresses, 0 to 3, for which symbol = 0 and Huffman code length = 1 in the converted table.

The next row of the table has symbol = 1, Huffman code length = 2 and Huffman code = 3. Number of bits required to append to the Huffman code = 3 - 2 = 1.

110 = 6, 111 = 7

Addresses 6 and 7 will have symbol = 1 and Huffman code length = 2. In this way, the remaining addresses of the table entries are calculated. Since the maximum code length for Table 1 is 3, the number of rows in the converted table will be 2^3 = 8.

Table 2: Converted Huffman Table for Table 1
Address   Symbol   Huffman code length
0         0        1
1         0        1
2         0        1
3         0        1
4         3        3
5         6        3
6         1        2
7         1        2
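To make the conversion concrete, the following C sketch builds such a flat look-up table from the (symbol, code length, code) entries of Table 1. The data layout and names (LutEntry, build_lut) are illustrative choices, not part of any particular standard; the printed output corresponds to Table 2.

#include <stdio.h>

#define MAX_CODE_LEN 3                     /* maximum Huffman code length in Table 1 */
#define LUT_ROWS     (1 << MAX_CODE_LEN)   /* 2^L rows in the converted table        */

typedef struct {
    int symbol;
    int code_len;
} LutEntry;

/* Source Huffman table (Table 1): symbol, code length, code value */
static const int src_symbol[]   = { 0, 1, 3, 6 };
static const int src_code_len[] = { 1, 2, 3, 3 };
static const int src_code[]     = { 0, 3, 4, 5 };
static const int src_rows       = 4;

/* Fill every address whose top 'code_len' bits equal the code. */
static void build_lut(LutEntry lut[LUT_ROWS])
{
    for (int i = 0; i < src_rows; i++) {
        int pad  = MAX_CODE_LEN - src_code_len[i];      /* bits to append        */
        int base = src_code[i] << pad;                  /* first address covered */
        for (int j = 0; j < (1 << pad); j++) {
            lut[base + j].symbol   = src_symbol[i];
            lut[base + j].code_len = src_code_len[i];
        }
    }
}

int main(void)
{
    LutEntry lut[LUT_ROWS];
    build_lut(lut);
    for (int a = 0; a < LUT_ROWS; a++)
        printf("address %d: symbol %d, length %d\n", a, lut[a].symbol, lut[a].code_len);
    return 0;   /* output matches Table 2 */
}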
3.1.2 Decoding Procedure
For one symbol extraction the algorithm is:

Algorithm: Look-Up Table
if ( !(end of bit-stream) )
{
    bits = extract maximum-Huffman-code-length number of bits from the bit-stream
    symbol = symbol[bits]
    length = Huffman code length[bits]
    move the bit-stream pointer backward by (maximum Huffman code length - length) bits
}

Example: Let "1000110" be a valid bit-stream. Since 3 is the maximum Huffman code length of Table 2, extract 3 bits from the bit-stream. The first 3 bits give address "100" (= 4), which corresponds to symbol 3 and Huffman code length = 3 in the look-up table. Again extract 3 bits, "011" (= 3), to get the next symbol from the bit-stream. This address gives symbol = 0 and Huffman code length = 1. Since 3 bits were extracted from the bit-stream and the actual length of the Huffman code is 1, 2 bits are put back into the bit-stream; the bit-stream pointer now points to the 5th bit. Again extract 3 bits from the bit-stream, giving address "110" (= 6), whose corresponding symbol = 1 and Huffman code length = 2, and put back 1 bit into the bit-stream. In this way the symbols are searched in the Look-Up Table method.
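The decoding loop of the example above can be sketched in C as follows. The bit-reader here simply walks a string of '0'/'1' characters and zero-pads past the end of the stream, which is a simplification for illustration only; a real decoder would read packed words and know how many symbols to decode.

#include <stdio.h>
#include <string.h>

#define MAX_CODE_LEN 3

/* Converted look-up table (Table 2): indexed by MAX_CODE_LEN bits */
static const int lut_symbol[8] = { 0, 0, 0, 0, 3, 6, 1, 1 };
static const int lut_length[8] = { 1, 1, 1, 1, 3, 3, 2, 2 };

/* Peek the next 'n' bits from a textual bit-stream, zero-padding past the end. */
static int peek_bits(const char *bs, int pos, int len, int n)
{
    int v = 0;
    for (int i = 0; i < n; i++) {
        int bit = (pos + i < len) ? bs[pos + i] - '0' : 0;
        v = (v << 1) | bit;
    }
    return v;
}

int main(void)
{
    const char *bitstream = "1000110";
    int len = (int)strlen(bitstream);
    int pos = 0;

    while (pos < len) {
        int addr   = peek_bits(bitstream, pos, len, MAX_CODE_LEN);
        int symbol = lut_symbol[addr];
        int used   = lut_length[addr];
        printf("symbol = %d (code length %d)\n", symbol, used);
        pos += used;   /* only the consumed bits advance; the rest stay in the stream */
    }
    return 0;   /* prints 3, 0, 1 and a trailing 0 for "1000110" */
}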
This method requires huge memory for tables having long Huffman codes and is hence inefficient for such tables, but it is very useful for tables having small Huffman codes. It is the fastest Huffman decoding technique. The table size requirement is:

TableSize ∝ 2^L, where L is the maximum Huffman code length.

3.2 N-Level Look-Up Table Method
This method is an extension of the Look-Up Table method described in sec. 3.1 and involves at most N levels of look-up tables. Based on the requirements of the Huffman symbol search, the Huffman codes can be split across N levels of look-up tables. Tables used for this technique must contain, for each entry, either a symbol or a next-table index, either the corresponding Huffman code length or the number of bits required for the next-level look-up, and a hit-or-miss status. The number of rows in each table will be 2^L, where L is the maximum Huffman code length handled by that table. The Huffman code length of each symbol is stored in the table to determine the number of bits actually used in decoding that symbol; the remaining bits are put back into the bit-stream, where the next symbol starts. The number of bits extracted from the bit-stream for every symbol search equals the maximum Huffman code length of the corresponding table.

The hit-or-miss entry in a table is either "1" or "0" respectively. For every hit entry the corresponding symbol and Huffman code length are stored, and every miss entry contains the corresponding next-table index and the number of bits required for the next-level look-up. In the worst case, the symbol search can go up to the last look-up table. The way the Huffman table is split into N-level look-up tables decides the maximum number of searches required to get a symbol.

If the maximum length of the Huffman code word is M and the number of levels chosen is 3, the code word is split as shown in Figure 2 to address the 3-level look-up tables.

[Figure 2 shows a code word of maximum length M split into fields of L1, L2 and L3 bits (L1 + L2 + L3 = M) that address the Level 1, Level 2 and Level 3 tables T1 to T5; each miss entry points to a table at the next level. Hit = 1 and Miss = 0.]
Figure 2: Block Diagram of N-Level Look-Up Table Method

3.2.1 Procedure for Table conversion
Table 3 is used to demonstrate the conversion of a valid Huffman table into N-Level Look-Up Table form. For this table, the maximum Huffman code length is 9. Let us choose the 3-Level look-up table method with L1 = L2 = L3 = 3. All tables will have 8 (2^3) entries, since the maximum Huffman code length handled per table is restricted to 3. Since L1 is restricted to 3, the codes of length <= 3 are placed in the level-1 look-up table. Codes of length > 3 and <= 6 go into level-2 look-up tables, and codes of length > 6 go into level-3 look-up tables. Each look-up table is formed as described in sec. 3.1.1. In the worst case, 3 searches are required to get a symbol. For every hit, the respective symbol and Huffman code length are stored. For every miss, the respective next-level table index and the number of bits required for the next-level look-up are stored.

The 3-Level Look-Up Table method is illustrated with the following example:

Table 3: Huffman Table
Symbol   Huffman code length   Huffman code   Binary Huffman code
2        1                     0              0
4        2                     3              11
5        3                     5              101
7        4                     9              1001
9        5                     17             10001
10       7                     64             1000000
11       7                     66             1000010
17       8                     131            10000011
19       8                     134            10000110
25       8                     135            10000111
27       9                     260            100000100
31       9                     261            100000101

Table 4, Table 5, Table 6 and Table 7 (shown after the decoding procedure below) are the converted look-up tables for the 3-Level Look-Up Table method applied to Table 3 (Note: Hit = 1 and Miss = 0).

3.2.2 Decoding Procedure
Algorithm: N-Level Look-Up Table
hit = 0
m = L1
while ( !(end of bit-stream) )
{
    bits = extract m number of bits from the bit-stream
    hit = Hit/Miss[bits]
    if (hit == 1)
    {
        symbol = symbol[bits]
        length = Huffman code length/Bits for next level[bits]
        move the bit-stream pointer backward by (total bits extracted for this symbol - length) bits
        break
    }
    else
    {
        select next table = symbol[bits]
        m = Huffman code length/Bits for next level[bits]
    }
}

Example: Let "1001101011" be a valid bit-stream. Since the Huffman tables are split for a 3-bit look-up, extract 3 bits from the bit-stream. The first 3 bits give address "100" (= 4), which corresponds to a "0" (miss) entry in the table; the index of the next table is Table 5 and 3 bits are required for the next-level look-up. Again extract 3 bits, "110" (= 6), to search for the symbol. This address gives "1" as the hit-or-miss entry, so the valid symbol = 7 and the Huffman code length = 4. By now 6 bits (3 + 3) have been extracted from the bit-stream, and the actual length of the Huffman code is 4, so 2 bits are put back into the bit-stream; the bit-stream pointer now points to the fifth bit. Again extract 3 bits from the bit-stream, giving address "101" (= 5); the hit-or-miss entry is "1", with corresponding symbol = 5 and Huffman code length = 3. In this way the symbols are searched in the 3-Level Look-Up Table method.

This method uses modest memory and gives a faster approach for the Huffman symbol search. It is the best method when both the memory and the time taken to search Huffman symbols are critical, and it is memory efficient for tables having long Huffman codes. As can be seen in Figure 2, the table size requirement is:

TableSize ∝ [2^L1 + Σ_level (2^L_level × Misses_(level-1))]
Table 4: 1-Level Look-Up Table
Address   Symbol/Table Index   Huffman code length/Bits for next level   Hit/Miss
0         2                    1                                         1
1         2                    1                                         1
2         2                    1                                         1
3         2                    1                                         1
4         Table 5              3                                         0
5         5                    3                                         1
6         4                    2                                         1
7         4                    2                                         1

Table 5: 2-Level Look-Up Table
Address   Symbol/Table Index   Huffman code length/Bits for next level   Hit/Miss
0         Table 6              3                                         0
1         Table 7              3                                         0
2         9                    5                                         1
3         9                    5                                         1
4         7                    4                                         1
5         7                    4                                         1
6         7                    4                                         1
7         7                    4                                         1

Table 6: 3-Level Look-Up Table
Address   Symbol/Table Index   Huffman code length/Bits for next level   Hit/Miss
0         10                   7                                         1
1         10                   7                                         1
2         10                   7                                         1
3         10                   7                                         1
4         27                   9                                         1
5         31                   9                                         1
6         17                   8                                         1
7         17                   8                                         1

Table 7: 3-Level Look-Up Table
Address   Symbol/Table Index   Huffman code length/Bits for next level   Hit/Miss
0         11                   7                                         1
1         11                   7                                         1
2         11                   7                                         1
3         11                   7                                         1
4         19                   8                                         1
5         19                   8                                         1
6         25                   8                                         1
7         25                   8                                         1
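Before moving to the next method, the sketch below shows the N-level decoding loop in C, with Tables 4 to 7 hard-coded as data. The structure and table layout are one possible realisation of the scheme described in sec. 3.2.2, using the same simplified textual bit-stream as the earlier sketch; it decodes the example bit-stream "1001101011".

#include <stdio.h>
#include <string.h>

typedef struct {
    int sym_or_next;   /* symbol on a hit, next-table index on a miss          */
    int len_or_bits;   /* code length on a hit, bits for next level on a miss  */
    int hit;           /* 1 = hit, 0 = miss                                    */
} NLevelEntry;

enum { T4, T5, T6, T7 };   /* table indices: Tables 4 to 7 of the paper */

static const NLevelEntry tables[4][8] = {
    /* Table 4 (level 1) */
    { {2,1,1},{2,1,1},{2,1,1},{2,1,1},{T5,3,0},{5,3,1},{4,2,1},{4,2,1} },
    /* Table 5 (level 2) */
    { {T6,3,0},{T7,3,0},{9,5,1},{9,5,1},{7,4,1},{7,4,1},{7,4,1},{7,4,1} },
    /* Table 6 (level 3) */
    { {10,7,1},{10,7,1},{10,7,1},{10,7,1},{27,9,1},{31,9,1},{17,8,1},{17,8,1} },
    /* Table 7 (level 3) */
    { {11,7,1},{11,7,1},{11,7,1},{11,7,1},{19,8,1},{19,8,1},{25,8,1},{25,8,1} },
};

/* Peek the next 'n' bits from a textual bit-stream, zero-padding past the end. */
static int peek_bits(const char *bs, int pos, int len, int n)
{
    int v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | ((pos + i < len) ? bs[pos + i] - '0' : 0);
    return v;
}

int main(void)
{
    const char *bitstream = "1001101011";
    int len = (int)strlen(bitstream), pos = 0;

    while (pos < len) {
        int start = pos, table = T4, m = 3;    /* L1 = L2 = L3 = 3 */
        for (;;) {
            NLevelEntry e = tables[table][peek_bits(bitstream, pos, len, m)];
            pos += m;                          /* bits tentatively consumed */
            if (e.hit) {                       /* hit: keep only the real code length */
                pos = start + e.len_or_bits;
                printf("symbol = %d (code length %d)\n", e.sym_or_next, e.len_or_bits);
                break;
            }
            table = e.sym_or_next;             /* miss: descend to the next-level table */
            m = e.len_or_bits;
        }
    }
    return 0;   /* "1001101011" decodes to symbols 7, 5, 2, 4 */
}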
3.3 Binary Tree Search Method
In this implementation technique, the Huffman tables used are converted into binary trees as explained below. A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree; the other two subsets are the left and right sub-trees of the original tree. Each element of a binary tree is called a node, a branch connects two nodes, and nodes without any branch are called leaves.

The Binary Tree Search method uses the leaves of the tree to store the symbols. Tables used for this technique must contain, for each entry, either a symbol or a miss-offset, and a hit-or-miss status. A hit corresponds to "1" and a miss corresponds to "0" for the respective entry in the table. For every hit the corresponding symbol is stored, and for every miss the corresponding miss-offset is stored. Huffman decoding for a symbol begins at the root of the binary tree and ends at one of the leaves; one bit per node is extracted from the bit-stream while traversing the tree. The Huffman code length of a decoded symbol is therefore equal to the number of bits extracted from the bit-stream. The maximum number of searches required for decoding a symbol depends on the maximum Huffman code length in the table. Compared to the N-Level Look-Up Table method, this method trades a larger number of searches for a smaller memory requirement.

3.3.1 Procedure for Table Conversion
Table 3 is used to demonstrate the conversion of a valid Huffman table into Binary Tree Search form. Figure 3 shows the binary-tree representation of Table 3; from Figure 3, Table 8 is constructed for the Binary Tree Search method. For every hit the corresponding symbol is stored, and for every miss the corresponding miss-offset is stored in the table.

[Figure 3 shows the binary tree for Table 3: each left branch is labelled 0 and each right branch 1, the leaves hold the symbols 2, 4, 5, 7, 9, 10, 11, 17, 19, 25, 27 and 31, and the internal nodes carry the miss-offsets used in Table 8.]
Figure 3: Binary Tree Structure representation of Table 3

The converted table for the Binary Tree Search method of the Huffman table (Table 3) is:

Table 8
Address   Symbol/Offset   Hit/Miss
0         2               1
1         1               0
2         2               0
3         4               1
4         2               0
5         5               1
6         2               0
7         7               1
8         2               0
9         9               1
10        2               0
11        3               0
12        10              1
13        3               0
14        11              1
15        3               0
16        4               0
17        17              1
18        19              1
19        25              1
20        27              1
21        31              1

Note: Entries with Hit/Miss = 1 in the Symbol/Offset column are symbols; the remaining entries are miss-offsets.

3.3.2 Decoding Procedure
Algorithm: Binary Tree Search
Address = 0
offset = 0
hit = 0
while ( !(end of bit-stream) )
{
    bit = extract one bit from the bit-stream
    Address = Address + offset + bit
    hit = Hit/Miss[Address]
    if (hit == 1)
    {
        symbol = symbol/offset[Address]
        break
    }
    else
    {
        offset = symbol/offset[Address]
    }
}
Example: Let "1001101011" be a valid bit-stream. Since Table 8 is arranged for Binary Tree Search, extract 1 bit from the bit-stream for every search. The first bit from the bit-stream is "1" (= 1), which corresponds to a "0" (miss) entry in the table with miss-offset = 1. Extract the next bit, "0" (= 0), to search for the symbol; this index gives "0" as the hit-or-miss entry and miss-offset = 2. Again extract the next bit, "0", from the bit-stream; this also gives a hit-or-miss entry of "0" and miss-offset = 2. The next bit from the bit-stream is "1", for which the hit-or-miss entry is "1", so the valid symbol = 7. By now 4 bits have been extracted from the bit-stream, so the Huffman code length = 4. In this way the symbols are searched in the Binary Tree Search method.
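A C sketch of this traversal, using the flattened tree of Table 8 and the same simplified textual bit-stream as the earlier sketches, is given below. The array layout follows Table 8: entries whose Hit/Miss flag is 1 hold a symbol, the others hold a miss-offset.

#include <stdio.h>
#include <string.h>

/* Flattened binary tree (Table 8): hit entries hold the symbol,
   miss entries hold the miss-offset to the next node pair. */
static const int sym_or_offset[22] = { 2,1,2,4,2,5,2,7,2,9,2,
                                       3,10,3,11,3,4,17,19,25,27,31 };
static const int hit_miss[22]      = { 1,0,0,1,0,1,0,1,0,1,0,
                                       0,1,0,1,0,0,1,1,1,1,1 };

int main(void)
{
    const char *bitstream = "1001101011";
    int len = (int)strlen(bitstream), pos = 0;

    while (pos < len) {
        int address = 0, offset = 0, code_len = 0;
        while (pos < len) {
            int bit = bitstream[pos++] - '0';        /* one bit per tree node */
            code_len++;
            address = address + offset + bit;
            if (hit_miss[address]) {                 /* leaf reached: a symbol */
                printf("symbol = %d (code length %d)\n",
                       sym_or_offset[address], code_len);
                break;
            }
            offset = sym_or_offset[address];         /* internal node: follow offset */
        }
    }
    return 0;   /* "1001101011" decodes to symbols 7, 5, 2, 4 */
}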
This method is memory efficient for tables having long Huffman codes, but it increases the number of searches needed to decode a symbol. As can be seen in Figure 3, the table size requirement is:

TableSize ∝ [2 + Σ_(bit=2)^(max code length) (2 × Misses_(bit-1))]

4. IMPLEMENTATION CONSIDERATIONS
• For Huffman code tables having a small maximum code length, the Look-Up Table method is efficient both in terms of memory and speed. But if the maximum length is large, this implementation requires huge memory.
• On the other hand, the Binary Tree Search method is memory efficient, since it consumes the least memory space. But the speed is reduced by a factor of up to the maximum Huffman code length.
• If both memory and speed are a concern, which is generally the case, a combination of these two methods can be employed after careful analysis of the Huffman tables. In most tables the code words are listed in decreasing order of probability; if that is not the case, they can be rearranged so. This helps in splitting the tables appropriately in order to combine the two methods, giving fast decoding in most cases while consuming less memory. In this case, the first search is done on the first part of the table using the Look-Up Table method; if there is a miss, the search continues in the second part of the table using the Binary Tree Search method.
• In the above method, if the sequence of code words is such that the search visits the second table very frequently, there will be excessive overhead.
• The N-Level Look-Up Table method is a compromise between memory and speed. It guarantees sustained better performance, relative to the combination of methods 1 and 3, in terms of speed. But the challenge here is to arrive at the optimum number of levels N and the number of bits (L1, L2, …) in each level. These values have to be chosen after careful analysis of the tables to fit the decoding scheme within acceptable limits of memory and speed.
• In most applications, a single method for all tables may not provide the required memory and speed. Hence, different methods have to be chosen judiciously for the various Huffman tables.
• Symbols, Huffman code lengths and the Hit/Miss status can be packed into a single word to reduce the memory requirement (see the packing sketch after this list).
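As an illustration of the last point, one possible packing puts the hit flag, the code length (or bits for the next level) and the symbol (or next-table index) into a single 16-bit word. The field widths below are assumptions sized for the tables in this paper, not a prescribed format.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* One packed table entry: [15] hit, [14:10] length/bits, [9:0] symbol/index. */
#define PACK(hit, len, sym)   ((uint16_t)(((hit) << 15) | ((len) << 10) | (sym)))
#define GET_HIT(e)            (((e) >> 15) & 0x1)
#define GET_LEN(e)            (((e) >> 10) & 0x1F)
#define GET_SYM(e)            ((e) & 0x3FF)

int main(void)
{
    uint16_t e = PACK(1, 9, 27);     /* hit entry: symbol 27, code length 9 */
    assert(GET_HIT(e) == 1 && GET_LEN(e) == 9 && GET_SYM(e) == 27);
    printf("hit=%u len=%u sym=%u (stored in one %zu-byte word)\n",
           (unsigned)GET_HIT(e), (unsigned)GET_LEN(e), (unsigned)GET_SYM(e), sizeof e);
    return 0;
}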
sustained better performance relative to the combination
5. CONCLUSION
Comparison of different Huffman decoding methods for Table 3:

Huffman decoding method              Memory required for converted tables (in words)   Operations required for decoding one symbol in the worst case
[M1] Look-Up Table method            512*2 = 1024                                       4
[M2] 2-Level Look-Up Table method    48*3 = 144                                         11
[M3] 3-Level Look-Up Table method    8*4*3 = 96                                         16
[M4] 4-Level Look-Up Table method    28*3 = 84                                          21
[M5] Binary Tree search method       22*2 = 44                                          54

Note: Operations include extraction of bits, table look-up, pointer arithmetic, etc. Extraction of bits from the bit-stream is counted as a single operation.

6. ACKNOWLEDGEMENT
My sincere thanks to Mr. Arun D. Naik, without whose everlasting inspiration and valuable suggestions this paper would not have been possible. I want to express my gratitude to Mr. Madhu Parthasarathy for his constant encouragement and moral support. I am deeply indebted to the review team members and all my friends for their valuable comments to improve this paper.

7. REFERENCE
[1] Michael Schindler, "Practical Huffman Coding" (www.compressconsult.com/huffman)
