Overview
In this chapter, we:
- describe a very popular coding algorithm called the Huffman coding algorithm;
- present a procedure for building Huffman codes when the probability model for the source is known;
- present a procedure for building codes when the source statistics are unknown;
- describe a new technique for code design that is in some sense similar to the Huffman coding approach;
- discuss some applications.
Codeword assignment
Start from the smallest reduced source and work back to the original source. Each merging point corresponds to a node in the binary codeword tree.
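This build-back procedure can be sketched with a priority queue; a minimal illustration (my own code and names, not the chapter's notation):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Sketch of Huffman code construction: repeatedly merge the two
    smallest-probability entries; each merge is a node in the binary
    codeword tree, so one more bit is prepended to every symbol it covers."""
    tick = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability
        p2, _, c2 = heapq.heappop(heap)   # second smallest
        merged = {s: "0" + w for s, w in c1.items()}   # 0 on one branch
        merged.update({s: "1" + w for s, w in c2.items()})  # 1 on the other
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]
```

The 0/1 assignment at each merge is arbitrary, so only the codeword lengths (not the exact bit patterns) are guaranteed.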
Example 1
We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.
Example 1
The counts are converted into probabilities by normalizing by the total number of pixels:
Gray level 0 has 20 pixels (p = 0.2)
Gray level 1 has 30 pixels (p = 0.3)
Gray level 2 has 10 pixels (p = 0.1)
Gray level 3 has 40 pixels (p = 0.4)
a. Step 1: Histogram
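Step 1 can be sketched directly; the pixel counts are the ones listed above (the image itself is not shown, so the counts are taken as given):

```python
# Gray level -> number of pixels, as listed above
counts = {0: 20, 1: 30, 2: 10, 3: 40}
total = sum(counts.values())              # 100 pixels (10 rows x 10 columns)
probs = {g: n / total for g, n in counts.items()}
print(probs)  # {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
```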
Example 1
In Step 2, the probabilities are sorted in decreasing order.
Example 1
In Step 3, the two smallest probabilities are combined by addition.
Example 1
Step 4 repeats Steps 2 and 3: we reorder (if necessary) and add the two smallest probabilities.
Example 1
In Step 5, the actual code assignment is made.
Start at the right-hand side of the tree and assign 0s & 1s: 0 is assigned to the 0.6 branch & 1 to the 0.4 branch.
Example 1
The assigned 0 & 1 are brought back along the tree, & wherever a branch occurs the code bit is put on both branches.
Example 1
Assign the 0 & 1 to the branches labeled 0.3, appending to the existing code.
Example 1
Finally, the codes are brought back one more level, & where the branch splits another 0/1 assignment occurs (at the 0.1 & 0.2 branches).
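Steps 2-5 above can be sketched as code: sort, combine the two smallest, and push one more bit back onto every symbol in the merged branches. This is an illustrative reimplementation that tracks only codeword lengths, since the 0/1 choices themselves are arbitrary:

```python
def huffman_lengths(probs):
    """Track how many bits each symbol accumulates as the codes are
    brought back along the tree (steps 2-5 in the example)."""
    groups = [(p, [s]) for s, p in probs.items()]
    lengths = {s: 0 for s in probs}
    while len(groups) > 1:
        groups.sort(reverse=True)                        # step 2: order
        (p1, g1), (p2, g2) = groups.pop(), groups.pop()  # two smallest
        for s in g1 + g2:
            lengths[s] += 1    # the assigned bit reaches every branch member
        groups.append((p1 + p2, g1 + g2))                # step 3: combine
    return lengths

example1 = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}  # gray-level probabilities
lens = huffman_lengths(example1)
print(lens)                    # {0: 3, 1: 2, 2: 3, 3: 1}
avg = sum(p * lens[g] for g, p in example1.items())
print(round(avg, 3))           # 1.9 bits/pixel on average
```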
Example 1
Exercise
Using Example 1, find a Huffman code using the minimum-variance procedure.
Example 2
Step 1: Source reduction
symbol x   p(x)
S          0.5     0.5        0.5
N          0.25    0.25       0.5 (NEW)
E          0.125   0.25 (EW)
W          0.125
Example 2
Step 2: Codeword assignment

symbol x   p(x)    codeword
S          0.5     0
N          0.25    10
E          0.125   110
W          0.125   111
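A quick check of this assignment: since every probability here is a negative power of 2, the average codeword length should equal the entropy exactly (zero redundancy). A sketch:

```python
from math import log2

probs = {"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}
codes = {"S": "0", "N": "10", "E": "110", "W": "111"}

avg_len = sum(probs[s] * len(codes[s]) for s in probs)  # expected code length
entropy = -sum(p * log2(p) for p in probs.values())     # source entropy
print(avg_len, entropy)  # 1.75 1.75
```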
Example 2
binary codeword tree representation:

        r
      0/ \1
      S   (NEW)
         0/ \1
         N   (EW)
            0/ \1
            E    W
The codeword assignment is not unique: at each merging point (node), we can arbitrarily assign 0 and 1 to the two branches, and the average code length stays the same.
Example 2
Step 1: Source reduction

symbol x   p(x)
e          0.4    0.4       0.4         0.6 (aiou)
a          0.2    0.2       0.4 (iou)   0.4
i          0.2    0.2
o          0.1    0.2 (ou)
u          0.1

(ou), (iou), (aiou) are compound symbols
Example 2
Step 2: Codeword assignment

symbol x   p(x)   codeword
e          0.4    1
a          0.2    01
i          0.2    000
o          0.1    0010
u          0.1    0011

At the final merge, 0 goes to the 0.6 branch (aiou) and 1 to the 0.4 branch (e); (ou), (iou), (aiou) are compound symbols
Example 2
binary codeword tree representation:

              r
            0/ \1
        (aiou)   e
         0/ \1
     (iou)    a
      0/ \1
      i   (ou)
          0/ \1
          o    u
Example 2
symbol x   p(x)   codeword   length
e          0.4    1          1
a          0.2    01         2
i          0.2    000        3
o          0.1    0010       4
u          0.1    0011       4

Average length: l = Σi pi·li = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bps
Entropy: H(X) = -Σi pi·log2 pi = 2.122 bps
Redundancy: r = l - H(X) = 2.2 - 2.122 = 0.078 bps
If we use a fixed-length code, we have to spend three bits per symbol, which gives a redundancy of 3 - 2.122 = 0.878 bps.
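The three figures above can be checked from the table's probabilities and codeword lengths (a sketch):

```python
from math import log2

probs   = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
lengths = {"e": 1, "a": 2, "i": 3, "o": 4, "u": 4}  # from the table above

l_bar = sum(probs[s] * lengths[s] for s in probs)   # average code length
h = -sum(p * log2(p) for p in probs.values())       # source entropy
print(round(l_bar, 3), round(h, 3), round(l_bar - h, 3))  # 2.2 2.122 0.078
```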
Example 3
Step 1: Source reduction
compound symbol
Example 3
Step 2: Codeword assignment
compound symbol
Update Procedure
T
Stage 1 (First occurrence of t):

    r
   / \
  0   t(1)

Order: 0, t(1)

r represents the root, 0 represents the null node, and t(1) denotes the occurrence of t with a frequency of 1.
TE
Stage 2 (First occurrence of e):

      r
     / \
    1   t(1)
   / \
  0   e(1)

Order: 0, e(1), 1, t(1)
TEN
Stage 3 (First occurrence of n):

        r
       / \
      2   t(1)
     / \
    1   e(1)
   / \
  0   n(1)

Order: 0, n(1), 1, e(1), 2, t(1) : Misfit (the weights in the ordered list must be non-decreasing, but t(1) follows the internal node of weight 2, so the tree has to be reordered)
Reorder: TEN
        r
       / \
    t(1)  2
         / \
        1   e(1)
       / \
      0   n(1)

Order: 0, n(1), 1, e(1), t(1), 2
TENN
Stage 4 (Repetition of n):

        r
       / \
    t(1)  3
         / \
        2   e(1)
       / \
      0   n(2)

Order: 0, n(2), 2, e(1), t(1), 3 : Misfit
Reorder: TENN
        r
       / \
    n(2)  2
         / \
        1   e(1)
       / \
      0   t(1)

Order: 0, t(1), 1, e(1), n(2), 2
t(1) and n(2) are swapped
TENNE
Stage 5 (Repetition of e):

        r
       / \
    n(2)  3
         / \
        1   e(2)
       / \
      0   t(1)

Order: 0, t(1), 1, e(2), n(2), 3
TENNES
Stage 6 (First occurrence of s):

          r
         / \
      n(2)  4
           / \
          2   e(2)
         / \
        1   t(1)
       / \
      0   s(1)

Order: 0, s(1), 1, t(1), 2, e(2), n(2), 4
TENNESS
Stage 7 (Repetition of s):

          r
         / \
      n(2)  5
           / \
          3   e(2)
         / \
        2   t(1)
       / \
      0   s(2)

Order: 0, s(2), 2, t(1), 3, e(2), n(2), 5 : Misfit
Reorder: TENNESS
          r
         / \
      n(2)  5
           / \
          3   e(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(2), n(2), 5
s(2) and t(1) are swapped
TENNESSE
Stage 8 (Second repetition of e):

          r
         / \
      n(2)  6
           / \
          3   e(3)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(3), n(2), 6 : Misfit
Reorder: TENNESSE
          r
         / \
      e(3)  5
           / \
          3   n(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(3), 5
n(2) and e(3) are swapped
TENNESSEE
Stage 9 (Third repetition of e), with the 0/1 edge labels shown:

            r
          0/ \1
       e(4)    5
             0/ \1
            3    n(2)
          0/ \1
         1    s(2)
       0/ \1
      0    t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(4), 5
ENCODING
The letters can be encoded as follows: e: 0, n: 11, s: 101, t: 1001
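No codeword above is a prefix of another, so a greedy bit-by-bit decoder recovers the text; a sketch (the helper names are my own):

```python
codes = {"e": "0", "n": "11", "s": "101", "t": "1001"}
decode_table = {w: s for s, w in codes.items()}

def decode(bits):
    """Walk the bit string; emit a symbol whenever the buffer matches a
    codeword (valid because the code is prefix-free)."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

encoded = "".join(codes[c] for c in "tennessee")
print(len(encoded), decode(encoded))  # 18 tennessee
```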
Frequency counts after TENNESSEE: e = 4, n = 2, s = 2, t = 1 (9 letters in total).
ENTROPY
Entropy = -Σi pi·log2 pi
= -( (4/9)·log2(4/9) + (2/9)·log2(2/9) + (2/9)·log2(2/9) + (1/9)·log2(1/9) )
≈ 1.8366 bits/symbol
(p(e) = 4/9 ≈ 0.44, p(n) = p(s) = 2/9 ≈ 0.22, p(t) = 1/9 ≈ 0.11)
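The same value follows from the exact counts (e: 4, n: 2, s: 2, t: 1 out of 9 letters); a sketch:

```python
from math import log2

counts = {"e": 4, "n": 2, "s": 2, "t": 1}
total = sum(counts.values())  # 9 letters in TENNESSEE
h = -sum((c / total) * log2(c / total) for c in counts.values())
print(round(h, 4))  # 1.8366
```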
SUMMARY
The average code length of ordinary (static) Huffman coding seems to be better than that of the dynamic version in this exercise, but in practice the performance of dynamic coding is better. The problem with static coding is that the tree has to be constructed at the transmitter and sent to the receiver. The tree may also change, because the frequency distribution of English letters varies with the kind of text: plain prose, a technical paper, a piece of code, etc. Since the tree in dynamic coding is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better. Also, the average code length improves as the transmitted text gets longer.