Vous êtes sur la page 1sur 8

DATA

COMPRESSION
TEXT

DATA COMPRESSION
In Data Compression or bit-rate reduction involves encoding information
using fewer bits than the original representation.

Compression can be either lossy or lossless. Lossless compression reduces


bits by identifying and eliminating statistical redundancy.

No information is lost in lossless compression.

Lossy compression reduces bits by identifying marginally important


information and removing it.
Lets take an string : thisisthe
This would take 72 bits ! We can do better.
For this tiny string we would need 9 bits allowing up
to 512 unique characters.

Current Next Output Add to the


dictionary
t h t 116 th
256
h i h 104 hi
257
Current Next Output Add to the dictionary
i s i 105 is
258
s i s 115 si
259
i s is is in the dictionary
is t is 258 ist
260
t h th is in the dictionary
th e th 256 the
261
e - e 101 -
TEXT

RUN LENGTH ALGORITHM


Run-length encoding (RLE) is a very simple form of data
compression in which runs of data (that is, sequences in which
the same data value occurs in many consecutive data elements)
are stored as a single data value and count, rather than as the
original run.
Let us take a hypothetical single scan line, with B representing a
black pixel and W representing white:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWW
WWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) data compression
algorithm to the above hypothetical scan line, we get the
following: 12W1B12W3B24W1B14W
TEXT

HUFFMAN CODING
Huffman coding is a lossless data compression
algorithm. The idea is to assign variable length codes
to input characters, lengths of the assigned codes
are based on the frequencies of corresponding
characters.
The most frequent character gets the smallest code
and the least frequent character gets the largest
code.
TEXT

HUFFMAN CODING
1. Create a leaf node for each unique character and build a min
heap of all leaf nodes (Min Heap is used as a priority queue. The
value of frequency field is used to compare two nodes in min heap.
Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min
heap.
3. Create a new internal node with frequency equal to the sum of
the two nodes frequencies. Make the first extracted node as its left
child and the other extracted node as its right child. Add this node
to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node.
The remaining node is the root node and the tree is complete.
TEXT

BURROW-WHEELER ALGORITHM
When a character string is transformed by the BWT, none
of its characters change value.
The transformation permutes the order of the characters.
If the original string had several substrings that occurred
often, then the transformed string will have several places
where a single character is repeated multiple times in a
row.
This is useful for compression, since it tends to be easy to
compress a string that has runs of repeated characters by
techniques such as move-to-front transform and run-
length encoding.

Vous aimerez peut-être aussi