
IS502: MULTIMEDIA DESIGN FOR INFORMATION SYSTEM

MULTIMEDIA DATA COMPRESSION
Presenter Name: Mahmood A.Moneim
Supervised By: Prof. Hesham A.Hefny
Winter 2014

Multimedia Data Compression

Compression reduces the size of data:
- It reduces storage space and hence storage cost.
- It reduces the time to retrieve and transmit data.

compression ratio = original data size / compressed data size
[Diagram: original data --compress--> compressed data; compressed data --decompress--> decompressed data]

Lossless and Lossy Compression

- Compression ratios of lossy compressors are generally much higher than those of lossless compressors, e.g. 100:1 (lossy) vs. 2:1 (lossless).
- Lossless compression is essential in applications such as text file compression.
- Lossy compression is acceptable in many imaging and voice applications, e.g. JPEG, MP3, etc.


Kinds of Lossless Compressors

[1] Model and code
- The source is modeled as a stochastic process.
- The probabilities (or statistics) are given or acquired.

[2] Dictionary-based
- There is no explicit model and no explicit statistics gathering. Instead, a codebook (or dictionary) is used to map source words into codewords.


Model and Code

Examples:
- Shannon code
- Huffman code
- Arithmetic code


Dictionary-Based

Examples:
- LZ family
- Run-length code


Basics of Information Theory

Entropy is a measure of the disorder (uncertainty) of a system; for a data source, it gives the lower bound, in bits per symbol, on any lossless code.
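
For a source whose symbols occur with probabilities p_i, the entropy is H = -sum_i p_i log2(p_i) bits per symbol. A minimal Python sketch (the function name is an illustrative choice) that computes this from symbol counts:

from collections import Counter
from math import log2

def entropy(message):
    # H = -sum(p_i * log2(p_i)) over the symbol probabilities p_i
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(entropy("HELLO"))  # about 1.92 bits/symbol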


Shannon-Fano Algorithm

To illustrate the algorithm, suppose the symbols to be coded are the characters of the word HELLO. The frequency counts of the symbols are H:1, E:1, L:2, O:1.

The top-down algorithm proceeds as follows:
- Sort the symbols according to their frequency counts, in descending order.
- Recursively divide the symbols into two parts, each with approximately the same total count, until every part contains only one symbol (a Python sketch follows).
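
A minimal Python sketch of this top-down procedure (the function name and the exact tie-breaking of the split point are illustrative choices, not from the slides):

def shannon_fano(symbols):
    # symbols: list of (symbol, count) pairs, sorted by count, descending.
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    # Find the split that keeps the two halves' total counts roughly equal.
    total, running, split = sum(c for _, c in symbols), 0, 0
    for i, (_, c) in enumerate(symbols):
        if i > 0 and running + c > total / 2:
            break
        running += c
        split = i + 1
    codes = {}
    for s, code in shannon_fano(symbols[:split]).items():
        codes[s] = "0" + code      # first part gets prefix 0
    for s, code in shannon_fano(symbols[split:]).items():
        codes[s] = "1" + code      # second part gets prefix 1
    return codes

print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
# {'L': '0', 'H': '10', 'E': '110', 'O': '111'} -- 10 bits for HELLO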


Coding Tree for HELLO by the Shannon-Fano Algorithm

[Figure: the Shannon-Fano coding tree for HELLO]

Cont.

Entropy of HELLO: H = (2/5) log2(5/2) + 3 × (1/5) log2(5) ≈ 1.92 bits/symbol. The Shannon-Fano code uses 10 bits for the 5 symbols (2 bits/symbol), close to this bound.

Huffman Code

Huffman coding, illustrated with a manageable example:

Letter    Frequency (%)
A         25
B         15
C         10
D         20
E         30

Huffman Code: Code Formation

- Assign a weight (its frequency) to each character.
- Merge the two lightest weights into one root node whose weight is the sum of the two.
- Repeat until only one tree is left.
- Traverse the tree from the root to each leaf; at each node, assign 0 to the left branch and 1 to the right (a Python sketch follows).
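
A minimal Python sketch of this merge loop using a heap (representing each partial tree as a dict of partial codes is an illustrative shortcut, not the slides' method):

import heapq
from itertools import count

def huffman_codes(freqs):
    # Each heap item is (weight, tiebreak, {symbol: partial code}).
    tiebreak = count()
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest tree
        w2, _, right = heapq.heappop(heap)   # second lightest tree
        merged = {s: "0" + c for s, c in left.items()}         # 0 = left
        merged.update({s: "1" + c for s, c in right.items()})  # 1 = right
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_codes({"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}))

Ties between equal weights can be merged in any order, so several equally optimal code tables exist for the same frequencies.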

Huffman Code: Code Interpretation

- Prefix property: the code for a character never appears as the prefix of another character's code (verify this on the codes above).
- The receiver keeps reading bits until the accumulated bits match a code, then emits that character and starts over.
- Exercise: extract the string from 01110001110110110111.

Example. Find the Huffman codes and the compression ratio (C.R.) for Table 1, assuming that the uncompressed representation takes 8 bits per character and that the size of the Huffman table is not part of the compressed size.

Table 1:

Char    Freq    Huffman Code    (equivalent code with 0/1 swapped)
A       90      00              11
B       60      01              10
C       50      10              01
D       20      111             000
E       12      1101            0010
F       8       11001           00110
G       7       110000          001111
H       3       110001          001110

Huffman Tree

                250
              /     \
           150       100
          /   \     /   \
         A     B   C     50
                        /  \
                      30    D
                     /  \
                   18    E
                  /  \
                10    F
               /  \
              G    H

Char    Freq    Huffman Code
A       90      00
B       60      01
C       50      10
D       20      111
E       12      1101
F       8       11001
G       7       110000
H       3       110001

C.R. = (250 × 8) / (2×90 + 2×60 + 2×50 + 3×20 + 4×12 + 5×8 + 6×7 + 6×3)
     = 2000 / 608 ≈ 3.29

Decompression - Huffman Codes

Char    Huffman Code
A       00
B       01
C       10
D       111
E       1101
F       11001
G       110000
H       110001

Compress DEAF using the above Huffman codes.
Ans.: 111 1101 00 11001

Decompress 110001 1101 00 111.
Ans.: HEAD
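
A minimal Python sketch of the receiver loop described earlier, checked against the two exercises above (the dict layout is an illustrative choice):

def huffman_decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:   # prefix property: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

codes = {"A": "00", "B": "01", "C": "10", "D": "111",
         "E": "1101", "F": "11001", "G": "110000", "H": "110001"}
print(huffman_decode("11111010011001", codes))   # -> DEAF
print(huffman_decode("110001110100111", codes))  # -> HEAD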

Arithmetic Compression

Arithmetic compression is based on interpreting a character string as a single real number.

Letter    Frequency (%)    Subinterval [p, q)
A         25               [0.00, 0.25)
B         15               [0.25, 0.40)
C         10               [0.40, 0.50)
D         20               [0.50, 0.70)
E         30               [0.70, 1.00)

Arithmetic Compression: Coding CABAC

Generate subintervals of decreasing length; the subintervals depend uniquely on the string's characters and their frequencies. If the current interval [x, y) has width w = y - x, a character with subinterval [p, q) narrows it to

    x' = x + w·p,    y' = x + w·q

Step 1: C  [0, 1)               -> [0.4, 0.5)               (p = 0.40, q = 0.50)
Step 2: A  [0.4, 0.5)           -> [0.4, 0.425)             (p = 0.00, q = 0.25)
Step 3: B  [0.4, 0.425)         -> [0.40625, 0.41)          (p = 0.25, q = 0.40)
Step 4: A  [0.40625, 0.41)      -> [0.40625, 0.4071875)     (p = 0.00, q = 0.25)
Step 5: C  [0.40625, 0.4071875) -> [0.406625, 0.40671875)   (p = 0.40, q = 0.50)

Final representation: the midpoint of the last interval, 0.406671875.
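
A minimal Python sketch of this interval-narrowing loop (plain floats are fine at this toy scale; real coders use scaled integer arithmetic to avoid precision loss):

def arith_encode(message, intervals):
    x, y = 0.0, 1.0
    for ch in message:
        p, q = intervals[ch]
        w = y - x                      # current interval width
        x, y = x + w * p, x + w * q    # zoom into the character's subinterval
    return (x + y) / 2                 # any number in [x, y) identifies the string

intervals = {"A": (0.00, 0.25), "B": (0.25, 0.40),
             "C": (0.40, 0.50), "D": (0.50, 0.70), "E": (0.70, 1.00)}
print(arith_encode("CABAC", intervals))  # ~0.406671875, the midpoint above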

Arithmetic Compression: Extracting CABAC

N         Interval [p, q)    Width    Character    N - p     (N - p) / width
0.4067    [0.40, 0.50)       0.10     C            0.0067    0.067
0.067     [0.00, 0.25)       0.25     A            0.067     0.268
0.268     [0.25, 0.40)       0.15     B            0.018     0.12
0.12      [0.00, 0.25)       0.25     A            0.12      0.48
0.48      [0.40, 0.50)       0.10     C            0.08      0.8

When to stop? A terminal character is added to the original character set and encoded; during decompression, the process stops once it is encountered.
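
The extraction table above is exactly this loop; a minimal Python sketch (the message length is passed explicitly here as an illustrative stand-in for the terminal character the slide describes):

def arith_decode(number, intervals, length):
    out, n = [], number
    for _ in range(length):
        for ch, (p, q) in intervals.items():
            if p <= n < q:              # find the subinterval containing N
                out.append(ch)
                n = (n - p) / (q - p)   # rescale: (N - p) / width
                break
    return "".join(out)

intervals = {"A": (0.00, 0.25), "B": (0.25, 0.40),
             "C": (0.40, 0.50), "D": (0.50, 0.70), "E": (0.70, 1.00)}
print(arith_decode(0.4067, intervals, 5))  # -> CABAC, matching the table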

LZW Algorithm

LZW Compression:

Begin
    s = next input character;
    while not EOF
    {
        c = next input character;
        if s + c exists in the dictionary
            s = s + c;
        else
        {
            output the code for s;
            add the string s + c to the dictionary with a new code;
            s = c;
        }
    }
    output the code for s;
End
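
A direct Python transcription of the pseudocode above (the starting dictionary and code numbering follow the ABABBABCABABBA example on the next slide):

def lzw_compress(text, alphabet):
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    s, output = text[0], []
    for c in text[1:]:
        if s + c in dictionary:
            s = s + c                       # grow the current match
        else:
            output.append(dictionary[s])    # emit the code for s
            dictionary[s + c] = next_code   # register the new string
            next_code += 1
            s = c
    output.append(dictionary[s])            # flush the final match
    return output

print(lzw_compress("ABABBABCABABBA", "ABC"))  # [1, 2, 4, 5, 2, 3, 4, 6, 1]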


LZW for the String ABABBABCABABBA

The dictionary initially contains only three characters: A = 1, B = 2, C = 3.

s      c     Output    New dictionary entry
A      B     1         AB = 4
B      A     2         BA = 5
A      B     -         (AB is in the dictionary; s = AB)
AB     B     4         ABB = 6
B      A     -         (BA is in the dictionary; s = BA)
BA     B     5         BAB = 7
B      C     2         BC = 8
C      A     3         CA = 9
A      B     -         (AB is in the dictionary; s = AB)
AB     A     4         ABA = 10
A      B     -         (AB is in the dictionary; s = AB)
AB     B     -         (ABB is in the dictionary; s = ABB)
ABB    A     6         ABBA = 11
A      EOF   1         (output the code for the final s)

Output code sequence: 1 2 4 5 2 3 4 6 1.

LZW Decompression

Begin
    s = NIL;
    while not EOF
    {
        k = next input code;
        entry = dictionary entry for k;
        output entry;
        if s != NIL
            add the string s + first character of entry to the dictionary
            with a new code;
        s = entry;
    }
End

Cont.

The input code for the decoder is 1 2 4 5 2 3 4 6 1. Starting from the same initial dictionary (A = 1, B = 2, C = 3), the decoder rebuilds entries 4-11 exactly as the encoder did and recovers ABABBABCABABBA.
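
A minimal Python sketch of the decoder; the else branch handles the one case the simplified pseudocode above skips, a code that is not yet in the dictionary (it does not occur in this example):

def lzw_decompress(codes, alphabet):
    dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    s = dictionary[codes[0]]
    output = [s]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:
            entry = s + s[0]   # code the encoder issued on this very step
        output.append(entry)
        dictionary[next_code] = s + entry[0]   # rebuild the encoder's entry
        next_code += 1
        s = entry
    return "".join(output)

print(lzw_decompress([1, 2, 4, 5, 2, 3, 4, 6, 1], "ABC"))  # ABABBABCABABBA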

Run Length Encoding

Huffman coding requires:
- frequency values;
- bits grouped into characters or units.

Many kinds of data do not fall into this category:
- machine code files
- facsimile data (bits corresponding to the light and dark areas of a page)
- video signals

Run Length Encoding

For such files, RLE is used. Instead of sending a long run of 0s or 1s, it sends only how many bits are in the run. Since roughly 70%-80% of a typed page is white space, RLE is very effective there.

Run Length Encoding

Runs with different characters: send the actual character together with the run length.

HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG
code = 7, H, 1, U, 9, F, 11, Y, 1, D, 5, G

Savings in bits (considering ASCII): the original 34 characters take 34 × 8 = 272 bits; the 6 (count, character) pairs take 12 bytes = 96 bits if each count fits in one byte, saving 176 bits (a Python sketch follows).
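
A minimal Python sketch of character run-length encoding, reproducing the pairs above (itertools.groupby already yields maximal runs):

from itertools import groupby

def rle_encode(text):
    # One (run length, character) pair per maximal run.
    return [(len(list(group)), ch) for ch, group in groupby(text)]

print(rle_encode("HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG"))
# [(7, 'H'), (1, 'U'), (9, 'F'), (11, 'Y'), (1, 'D'), (5, 'G')]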

QUESTIONS?
