
CS500: Fundamentals of Algorithm Design and Analysis Fall 2006

Lecture 5 — September 26, 2006


Prof. Will Evans Scribe: David Meger

In this lecture we:

• Analyzed the expected running time of the randomized K-Select algorithm;

• Introduced the deterministic K-Select algorithm and analyzed its worst-case performance;

• Began investigation of a lower bound for Median Finding.

Reading: Will mentioned several useful references during the course of this lecture:

• “generatingfunctionology” by H. Wilf

• “An Introduction to the Analysis of Algorithms” by Robert Sedgewick and Philippe Flajolet

both of which provide more information on generating functions and:

• “Computer Algorithms: Introduction to Design and Analysis” by Sara Baase and Allen Van
Gelder

which gives information on adversary arguments.

1 Expected Running Time of Randomized K-Select

The randomized K-Select algorithm is as follows:

RSelect(A, k)
1 if |A| = 1
2 then return A[1]
3 Pick pivot p from A[1...n] at random
4 L = {A[i]|A[i] < p}
5 R = {A[i]|A[i] > p}
6 if |L| ≥ k
7 then return RSelect(L, k)
8 else if |L| = k − 1
9 then return p
10 else return RSelect(R, k − |L| − 1)
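
A minimal executable version in Python may help make the pseudocode concrete (the function name
rselect and the 1-based k are my own choices, and it assumes the elements of A are distinct, since
values equal to the pivot are discarded by the partition, exactly as in the pseudocode above):

    import random

    def rselect(A, k):
        # Return the k-th smallest element of A (k = 1 gives the minimum).
        # Assumes the elements of A are distinct.
        if len(A) == 1:
            return A[0]
        p = random.choice(A)                   # pick a pivot uniformly at random
        L = [x for x in A if x < p]            # elements smaller than the pivot
        R = [x for x in A if x > p]            # elements larger than the pivot
        if len(L) >= k:
            return rselect(L, k)               # answer lies among the smaller elements
        elif len(L) == k - 1:
            return p                           # the pivot itself is the k-th smallest
        else:
            return rselect(R, k - len(L) - 1)  # discard the pivot and everything smaller

For example, rselect([9, 2, 7, 4, 5], 3) returns 5.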

There are several significant observations to make about this algorithm:

• Randomization in this algorithm protects against malicious input ordering.

• If we can guarantee that the pivot is a “middle element”, then the size of the problem would
decrease “substantially” with each step.
• Formally, if the pivot has ≤ (1 − ε)n elements smaller and ≤ (1 − ε)n elements larger, then
the running time satisfies the following recurrence:

T(n) ≤ cn + T((1 − ε)n)
     ≤ cn + c(1 − ε)n + c(1 − ε)^2 n + ...
     ≤ cn(1 + (1 − ε) + (1 − ε)^2 + ...)
     = cn/ε

For instance, a pivot that always lands in the middle half of the elements corresponds to ε = 1/4
and gives T(n) ≤ 4cn.

Intuition: A random pivot will be a “middle element” often enough to cause the problem size to
decrease rapidly. To formalize this, suppose we want a pivot that has ≤ 3n/4 elements smaller and
≤ 3n/4 elements larger. Graphically, this means we want a pivot in the following range:

A pivot in the middle n/2 numbers is acceptable.


The probability of choosing such a “good” pivot is at least one half, and if we choose such a pivot,
we reduce the number of elements under consideration to at most 3/4 of the previous size.
Analysis of Expected Running Time:
Let Xi be the time spent by the algorithm when the size under consideration is between n(3/4)^i and
n(3/4)^(i+1). The total expected running time will be:

X = X0 + X1 + X2 + ...
E[X] = E[X0] + E[X1] + E[X2] + ...
     = Σ_{i≥0} E[Xi]

We must calculate the expected running time in each phase, Xi. Since a “good” pivot is chosen with
probability at least 1/2, the expected number of repetitions within a phase is at most 2, and each
repetition does the work needed to partition the set into L and R, which is proportional to the size
of the set. Therefore, E[Xi] ≤ 2cn(3/4)^i.

Note: The algorithm still reduces the size of the problem if it does not choose a “good” pivot. In
fact, it might never choose a “good” pivot and it will still terminate. Or, it might choose a really
good pivot in phase i and completely skip one or several phases. We can ignore both of these facts
because we’re interested in upper bounding the expected running time.
The overall expected running time is:

E[X] ≤ Σ_{i≥0} 2cn(3/4)^i
     = 2cn · 4
     = 8cn

We are done. The expected running time of the randomized K-Select algorithm is linear in the
input size.

2 Worst Case Analysis of Deterministic K-Select

This algorithm was published by: M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E.
Tarjan in 1971. We can think of it as a modification of the randomized version as follows:

BSelect(A, k)
1 if |A| = 1
2 then return A[1]
3 p = GoodPivot(A, n)
4 L = {A[i]|A[i] < p}
5 R = {A[i]|A[i] > p}
6 if |L| ≥ k
7 then return BSelect(L, k)
8 else if |L| = k − 1
9 then return p
10 else return BSelect(R, k − |L| − 1)

The GoodPivot algorithm would return the true median in the ideal case. Of course, this is too
expensive to employ as a sub-step. Instead the following algorithm picks an element which is close
enough using less computation:

GoodPivot(A, n)
1 Divide n elements into groups of 5 elements each
2 Find the median of each group of 5
3 Use BSelect to find the median p of the n/5 medians
4 return p
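
A corresponding Python sketch of the deterministic version (again, the names bselect and good_pivot,
the 1-based k, and the assumption of distinct elements are mine; the last group may have fewer than
5 elements):

    def good_pivot(A):
        # Split A into groups of 5, sort each small group to find its median,
        # then recursively select the median of those group medians.
        groups = [A[i:i + 5] for i in range(0, len(A), 5)]
        medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
        return bselect(medians, (len(medians) + 1) // 2)

    def bselect(A, k):
        # Return the k-th smallest element of A (k = 1 gives the minimum).
        if len(A) == 1:
            return A[0]
        p = good_pivot(A)                      # guaranteed "good" pivot
        L = [x for x in A if x < p]            # elements smaller than the pivot
        R = [x for x in A if x > p]            # elements larger than the pivot
        if len(L) >= k:
            return bselect(L, k)
        elif len(L) == k - 1:
            return p
        else:
            return bselect(R, k - len(L) - 1)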

Note: Five is not the only possible choice of group size. In fact, 7 element groups give better
performance, but the pictures take more time to draw.
The following picture is useful for analysis:
[Figure: the n/5 groups arranged in order of their medians, with the median-of-medians p in the
median row. Region A holds elements guaranteed smaller than p and region B holds elements
guaranteed larger than p; ⌊n/10⌋ groups have median smaller than p and ⌊n/10⌋ groups have
median larger than p.]

The GoodPivot algorithm returns the element p shown above. All elements in group A are guaranteed
to be ≤ p and all elements in group B are guaranteed to be ≥ p. This gives at least 3n/10 − 1
elements ≤ p, which implies |R| ≤ 7n/10. Similarly, at least 3n/10 − 1 elements are ≥ p, which
implies |L| ≤ 7n/10. With this information, we can write the running time recurrence:

T (n) ≤ cn + T (n/5) + T (7n/10)

The first term represents the cost to partition the set into L and R after the pivot is chosen and to
find the median of each group in GoodPivot. The second term represents the call to BSelect which
is made in GoodPivot in order to find the median of the n/5 group medians. The third term is the
recursive call to BSelect on either L or R. This running time is linear because the coefficients 1/5
and 7/10 sum to 9/10 < 1. We will formalize this using a recursion tree:

[Recursion tree; the left side lists the subproblem sizes at each level and the right column gives
the total cost per level:
  Level 0:  n                                                 cn
  Level 1:  n/5,  7n/10                                       cn(9/10)
  Level 2:  n/25, (7/10)(n/5), (1/5)(7n/10), (49/100)n        cn(9/10)^2
  ...]

The sum of the costs at each level is:

T(n) ≤ cn Σ_{i≥0} (9/10)^i
     ≤ 10cn
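
As a quick sanity check (a standard substitution argument, not carried out in the lecture), assume
T(m) ≤ 10cm for all m < n; then

T(n) ≤ cn + T(n/5) + T(7n/10) ≤ cn + 2cn + 7cn = 10cn,

so the guess is consistent with the recurrence.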

3 Lower Bounds for the Median Finding Problem

In order to find a lower bound for the Median Finding problem, we can first try to use a decision
tree in much the same way it is used to analyze sorting.

[Figure: a comparison decision tree. The root compares x1 : x2, its children compare x1 : x3 and
x2 : x3, and so on; leaf nodes hold the outputs (single elements such as x7 or x1).]

Recall: To get the sorting lower bound, each possible ordering of the n inputs must appear at a leaf
node. This requires n! leaves, and hence a tree depth of at least log2(n!) ≈ n log2(n).
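To spell out that last step (a standard bound, not worked out in the lecture): the last n/2 terms of
log2(n!) = Σ_{i=1}^{n} log2(i) are each at least log2(n/2), so log2(n!) ≥ (n/2) log2(n/2) = Ω(n log n),
and log2(n!) ≤ n log2(n) gives the matching upper bound.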
For the Median Problem, we require each of the n inputs to appear as a leaf node because the
decision tree for the median must be able to output any one of these inputs. A tree with n leaves
implies the depth of the tree is ≥ log2(n).
This is not a very good lower bound since we know that we must consider every element at least
once and that implies a lower bound of n. Better bounds will be shown next class.
