Greedy Algorithms
Idea: When we have a choice to make, make the one that looks best right now
A greedy algorithm makes the choice that looks best at the moment, in the hope of arriving at an optimal solution. Greedy algorithms do not always yield an optimal solution.
Knapsack capacity: W
There are n items: the i-th item has value v_i and weight w_i
Goal:
find x_i with 0 ≤ x_i ≤ 1 for i = 1, 2, ..., n such that Σ_i w_i x_i ≤ W and Σ_i v_i x_i is maximized
E.g.:
Item 1: weight 10, value $60; Item 2: weight 20, value $100; Item 3: weight 30, value $120; capacity W = 50
Optimal fractional solution: all of item 1 ($60), all of item 2 ($100), and 20 of the 30 pounds of item 3 ($80), for a total value of $240
Greedy strategy 1:
E.g.:
Pick the item with the maximum value per pound v_i/w_i. If the supply of that item is exhausted and the thief can carry more, take as much as possible of the item with the next greatest value per pound.
Assume the items are indexed so that v_1/w_1 ≥ v_2/w_2 ≥ ... ≥ v_n/w_n
While w > 0 and items remain:
  pick the item i with maximum v_i/w_i
  x_i ← min(1, w/w_i)
  w ← w − x_i · w_i
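The strategy above can be sketched in Python; this is a minimal illustration, and the function name and the (value, weight) item format are my own:

```python
def fractional_knapsack(items, W):
    """items: list of (value, weight) pairs; W: knapsack capacity.
    Returns (total value, list of fractions x_i taken of each item)."""
    # Consider items in order of decreasing value density v_i / w_i.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    x = [0.0] * len(items)
    remaining = W
    total = 0.0
    for i in order:
        if remaining <= 0:
            break
        v, w = items[i]
        x[i] = min(1.0, remaining / w)   # take as much of item i as fits
        total += x[i] * v
        remaining -= x[i] * w
    return total, x

# The example from above: weights 10, 20, 30; values $60, $100, $120; W = 50.
items = [(60, 10), (100, 20), (120, 30)]
total, fractions = fractional_knapsack(items, 50)
print(total)      # 240.0
print(fractions)  # item 1 and 2 in full, 2/3 of item 3
```

Because fractions are allowed, this greedy strategy is optimal for the fractional problem; for the 0-1 knapsack problem it is not.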
Huffman's algorithm achieves data compression by finding the best variable-length binary encoding scheme for the symbols that occur in the file to be compressed.
The more frequently a symbol occurs, the shorter the Huffman codeword representing it should be.
Overview
Huffman codes: compressing data (savings of 20% to 90%)
Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string
C: Alphabet
Example
Assume we are given a data file that contains only 6 symbols, namely a, b, c, d, e, f, with the following frequency table:
Find a variable-length prefix-free encoding scheme that compresses this data file as much as possible.
The left tree represents a fixed-length encoding scheme; the right tree represents a Huffman encoding scheme
Cost of a Tree T
Let f(c) be the frequency of c in the file, and let d_T(c) be the depth of c's leaf in the tree
Let B(T) be the number of bits required to encode the file (called the cost of T)
B(T) = Σ_{c ∈ C} f(c) · d_T(c)
The running-time analysis of Huffman's algorithm assumes that Q is implemented as a binary min-heap. For a set C of n characters, the initialization of Q in line 2 can be performed in O(n) time using the BUILD-MIN-HEAP procedure. The for loop in lines 3-8 is executed exactly n - 1 times, and since each heap operation requires O(lg n) time, the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).
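The algorithm described above can be sketched in Python using the standard-library `heapq` module as the min-priority queue Q; the function name, the tuple-based tree representation, and the example frequencies in the usage snippet are my own illustrative choices:

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict mapping symbol -> frequency.
    Returns a dict mapping each symbol to its binary codeword."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)                       # BUILD-MIN-HEAP: O(n)
    for _ in range(len(freqs) - 1):           # exactly n - 1 merges
        f1, _, left = heapq.heappop(heap)     # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node: two children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                 # leaf: depth == codeword length
            codes[node] = prefix
    walk(tree, "")
    return codes

# Assumed example frequencies for the six symbols a..f:
freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freqs)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # cost B(T) = 224
```

The final `print` computes B(T) = Σ f(c) · d_T(c), since a leaf's depth in the tree equals the length of its codeword.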
Prefix Code
Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous)
An optimal data compression achievable by a character code can always be achieved with a prefix code
Prefix codes simplify encoding (compression) and decoding
Encoding: abc → 0 · 101 · 100 = 0101100
Decoding: 001011101 = 0 · 0 · 101 · 1101 → aabe
Use a binary tree to represent prefix codes for easy decoding
An optimal code is always represented by a full binary tree, in which every non-leaf node has two children: |C| leaves and |C| - 1 internal nodes
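The encoding and decoding steps above can be demonstrated with the slide's own codewords (a = 0, b = 101, c = 100, e = 1101); the table-scanning decoder below is a simple sketch, not the tree-walking decoder:

```python
# Prefix-free code table taken from the examples above.
code = {'a': '0', 'b': '101', 'c': '100', 'e': '1101'}

def encode(text):
    # Concatenate codewords; no separators are needed.
    return ''.join(code[ch] for ch in text)

def decode(bits):
    # Because no codeword is a prefix of another, we can scan left to
    # right and emit a symbol as soon as the buffer matches a codeword.
    inverse = {w: ch for ch, w in code.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ''
    return ''.join(out)

print(encode('abc'))        # 0101100
print(decode('001011101'))  # aabe
```

The prefix property is what makes the single left-to-right scan unambiguous: a match in the buffer can never be the start of a longer codeword.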
Huffman Code
If no characters occur more frequently than others, then no advantage over ASCII
Encoding:
Given the characters and their frequencies, perform the algorithm and generate a code. Write the characters using the code
Decoding:
Given the Huffman tree, figure out what each character is (possible because of prefix property)
Both the .mp3 and .jpg file formats use Huffman coding at one stage of their compression pipelines
Dynamic programming
We make a choice at each step
The choice depends on solutions to subproblems
Bottom-up solution, from smaller to larger subproblems
Greedy algorithm
Make the greedy choice FIRST, then solve the subproblem arising after the choice is made
The choice we make may depend on previous choices, but not on solutions to subproblems
Top down solution, problems decrease in size