
Algorithms for Data Science

CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Tuesday, September 15, 2015

Outline

1 Recap

2 The running time of Mergesort and solving recurrences

3 Binary search

4 Integer multiplication

5 Fast matrix multiplication


Recap

- Introduced asymptotic notation
- The divide & conquer principle; application: Mergesort
  - Divide the problem into a number of subproblems that are smaller instances of the same problem.
  - Conquer the subproblems by solving them recursively.
  - Combine the solutions to the subproblems into the solution for the original problem.
- Analyzed Mergesort
  1. Correctness: by induction on the size of the array
  2. Space: extra Θ(n) space
     - Input to Merge is stored in auxiliary arrays
     - Unlike insertion-sort, Mergesort does not sort in place


Mergesort: pseudocode

Mergesort(A, left, right)
    if right == left then return
    end if
    middle = left + ⌊(right − left)/2⌋
    Mergesort(A, left, middle)
    Mergesort(A, middle + 1, right)
    Merge(A, left, middle, right)

Remarks

- Mergesort is a recursive procedure (why?)
- Initial call: Mergesort(A, 1, n)
- Subroutine Merge merges two sorted lists of sizes ⌊n/2⌋, ⌈n/2⌉ into one sorted list of size n. How can we accomplish this?
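One standard way to accomplish this (a Python sketch, not the course's reference code, and returning a new list rather than writing back into A): walk both sorted halves with two pointers, always copying the smaller head to the output, for O(n) total work.

```python
def merge(left_half, right_half):
    """Merge two sorted lists into one sorted list in O(n) time."""
    merged = []
    i = j = 0
    while i < len(left_half) and j < len(right_half):
        if left_half[i] <= right_half[j]:
            merged.append(left_half[i])
            i += 1
        else:
            merged.append(right_half[j])
            j += 1
    # One half is exhausted; the remainder of the other is already sorted.
    merged.extend(left_half[i:])
    merged.extend(right_half[j:])
    return merged

def mergesort(A):
    """Sort A by recursively sorting the two halves and merging them."""
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    return merge(mergesort(A[:mid]), mergesort(A[mid:]))
```

Note that `merge` allocates auxiliary output storage, which is exactly the extra Θ(n) space charged to Mergesort in the recap.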

Running time of Mergesort

The running time of Mergesort satisfies:

T(n) = 2T(n/2) + cn, for n ≥ 2, constant c > 0
T(1) = c

This structure is typical of recurrence relations:
- an inequality or equation bounds T(n) in terms of an expression involving T(m) for m < n
- a base case generally says that T(n) is constant for small constant n

Remarks

- We ignore floor and ceiling notations.
- A recurrence does not provide an asymptotic bound for T(n): to this end, we must solve the recurrence.

Solving recurrences, method 1: recursion trees

The technique consists of three steps:
1. Analyze the first few levels of the tree of recursive calls
2. Identify a pattern
3. Sum over all levels of recursion

Example: analysis of running time of Mergesort

T(n) = 2T(n/2) + cn, n ≥ 2
T(1) = c

The recursion tree of a generic recurrence relation

The running times of many recursive algorithms can be expressed by the following recurrence:

T(n) = aT(n/b) + cn^k, for a, c > 0, b > 1, k ≥ 0

What is the tree of recursive calls for this recurrence?
- a is the branching factor
- b is the factor by which the size of each subproblem shrinks
- at level i, there are a^i subproblems, each of size n/b^i
- each subproblem at level i requires c(n/b^i)^k work
- the height of the tree is log_b n levels

Total work:

Σ_{i=0}^{log_b n} a^i · c(n/b^i)^k = cn^k · Σ_{i=0}^{log_b n} (a/b^k)^i
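The level sums form a geometric series in the ratio a/b^k; evaluating it in each regime is where the three cases of the Master theorem (stated next) come from. A sketch of the calculation:

```latex
\sum_{i=0}^{\log_b n} a^i \, c\left(\frac{n}{b^i}\right)^k
  = c\,n^k \sum_{i=0}^{\log_b n} \left(\frac{a}{b^k}\right)^i
  = \begin{cases}
      O(n^k), & a < b^k \text{ (decreasing series, sums to } O(1)\text{)}\\
      O(n^k \log n), & a = b^k \text{ (} \log_b n + 1 \text{ equal terms)}\\
      O(n^{\log_b a}), & a > b^k \text{ (last term } c\,a^{\log_b n} = c\,n^{\log_b a} \text{ dominates)}
    \end{cases}
```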

Solving recurrences, method 2: the Master theorem

Theorem 1 (Master theorem).
If T(n) = aT(⌈n/b⌉) + O(n^k) for some constants a > 0, b > 1, k ≥ 0, then

T(n) = O(n^(log_b a))  if a > b^k
T(n) = O(n^k log n)    if a = b^k
T(n) = O(n^k)          if a < b^k

Example: running time of Mergesort
- T(n) = 2T(n/2) + cn: a = 2, b = 2, k = 1, b^k = 2 = a ⇒ T(n) = O(n log n)

Solving recurrences, method 3: the substitution method

The technique consists of two steps:
1. Guess a bound
2. Use (strong) induction to prove that the guess is correct

Remark 1 (simple vs strong induction).
- Simple induction: the induction step at n requires that the inductive hypothesis holds at step n − 1.
- Strong induction: the induction step at n requires that the inductive hypothesis holds at all steps 1, 2, ..., n − 1.

Strong induction is most useful when several instances of the hypothesis are required to show the inductive step.

Exercise: show inductively that Mergesort runs in time O(n log n).

What about...

1. T(n) = 2T(n − 1) + 1, T(1) = 2
2. T(n) = 2T²(n − 1), T(1) = 4
3. T(n) = T(2n/3) + T(n/3) + cn
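For the first recurrence, the substitution method starts with a guess; a quick way to form and sanity-check one is to evaluate the recurrence directly for small n. The closed form 3 · 2^(n−1) − 1 below is my guess, to be proved by induction as in method 3:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = 2T(n-1) + 1 with T(1) = 2, directly from the recurrence."""
    return 2 if n == 1 else 2 * T(n - 1) + 1

# Guessed bound (an assumption, not from the slides): T(n) = 3 * 2**(n-1) - 1.
# Checking small cases does not prove it; induction does.
for n in range(1, 20):
    assert T(n) == 3 * 2 ** (n - 1) - 1
```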



Example: A = {0, 2, 3, 5, 6, 7, 9, 11, 13}, n = 9, x = 7

Searching a sorted array

- Input: sorted list A of n integers, integer x
- Output:
  1. index j s.t. 1 ≤ j ≤ n and A[j] = x; or
  2. "no" if x is not in A

Example: A = {0, 2, 3, 5, 6, 7, 9, 11, 13}, n = 9, x = 7

Idea: use the fact that the array is sorted and probe specific entries in the array.

Binary search

First, probe the middle entry. Let mid = ⌈n/2⌉.
- If x == A[mid], return mid.
- If x < A[mid] then look for x in A[1, mid − 1];
- Else if x > A[mid] look for x in A[mid + 1, n].

Initially, the entire array is active, that is, x might be anywhere in the array.

[Figure: the array, with entries ≤ A[mid] to the left of position mid and entries ≥ A[mid] to its right]

Suppose x > A[mid]. Then the active area of the array, where x might be, is to the right of mid.

[Figure: the active area now starts at position mid + 1]

Binary search pseudocode

binarysearch(A, left, right)
    if right == left then
        if A[left] == x then
            return left
        else
            return no
        end if
    else
        mid = left + ⌈(right − left)/2⌉
        if A[mid] == x then
            return mid
        else
            if A[mid] < x then left = mid + 1
            else right = mid − 1
            end if
            return binarysearch(A, left, right)
        end if
    end if
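The same algorithm as an iterative Python sketch (0-indexed, and returning None in place of "no"; the value x is passed explicitly rather than treated as a global as in the pseudocode):

```python
def binary_search(A, x):
    """Return an index j with A[j] == x in sorted list A, or None if absent."""
    left, right = 0, len(A) - 1
    while left <= right:
        mid = left + (right - left) // 2  # this form avoids overflow in fixed-width languages
        if A[mid] == x:
            return mid
        elif A[mid] < x:
            left = mid + 1    # active region is to the right of mid
        else:
            right = mid - 1   # active region is to the left of mid
    return None
```

For example, binary_search([0, 2, 3, 5, 6, 7, 9, 11, 13], 7) returns 5, the (0-indexed) position of 7.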


Binary search running time

Observation: At each step there is a region of A where x could be, and we shrink the size of this region by a factor of 2 with every probe:
- If n is odd, then we are throwing away ⌈n/2⌉ elements.
- If n is even, then we are throwing away at least n/2 elements.

Hence the recurrence for the running time is

T(n) ≤ T(n/2) + O(1)

Sublinear running time

Here are two ways to analyze the recurrence:
1. Master theorem: a = 1, b = 2, k = 0 ⇒ T(n) = O(log n).
2. We can reason as follows:
   - Starting with an array of size n, after k probes we are left with an array of size at most n/2^k (since every time we probe an entry the active portion of the array halves).
   - Hence after k = log n probes, we are left with an array of constant size (i.e., O(1)). At that time we can just search linearly for x in the constant-size array.

Concluding remarks on binary search

1. The right data structure can improve the running time of the algorithm significantly.
   - What if we used a linked list to store the input?
   - Arrays allow for random access of their elements: given an index, we can read any entry in an array in time O(1) (constant time).
2. In general, we obtain running time O(log n) when the algorithm does a constant amount of work to throw away a constant fraction of the input.


Integer multiplication

- How do we multiply two integers x and y?
- Elementary school method: compute a partial product by multiplying every digit of y separately with x and then add up all the partial products.
- Remark: this method works the same in base 10 or base 2.

Examples: (12)₁₀ × (11)₁₀ and (1100)₂ × (1011)₂


      12              1100
    × 11            × 1011
    ————            ——————
      12              1100
     12              1100
    ————            0000
     132           1100
                ——————————
                  10000100

Elementary algorithm running time

A more reasonable model of computation: a single operation on a pair of digits (bits) is a primitive computational step. Assume we are multiplying n-digit (bit) numbers.
- O(n) time to compute a partial product.
- O(n) time to combine it into a running sum of all partial products so far.
- There are n partial products, each consisting of O(n) bits, hence the total number of operations is O(n^2).

Can we do better?

A first divide & conquer approach

Consider n-digit decimal numbers x, y:

x = x_{n−1} x_{n−2} ... x_0
y = y_{n−1} y_{n−2} ... y_0

Idea: rewrite each number as the sum of the n/2 high-order digits and the n/2 low-order digits:

x = x_{n−1} ... x_{n/2} x_{n/2−1} ... x_0 = x_H · 10^(n/2) + x_L
y = y_{n−1} ... y_{n/2} y_{n/2−1} ... y_0 = y_H · 10^(n/2) + y_L

where each of x_H, x_L, y_H, y_L is an n/2-digit number: x_H consists of the high-order digits x_{n−1} ... x_{n/2}, x_L of the low-order digits x_{n/2−1} ... x_0, and similarly for y.

Examples

- n = 2, x = 12, y = 11:
  12 = 1 · 10^1 + 2   (x_H = 1, x_L = 2)
  11 = 1 · 10^1 + 1   (y_H = 1, y_L = 1)

- n = 4, x = 1000, y = 1110:
  1000 = 10 · 10^2 + 0    (x_H = 10, x_L = 0)
  1110 = 11 · 10^2 + 10   (y_H = 11, y_L = 10)


A first divide & conquer approach

x · y = (x_H · 10^(n/2) + x_L) · (y_H · 10^(n/2) + y_L)
      = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^(n/2) + x_L y_L

In words, we reduced the problem of solving 1 instance of size n (i.e., one multiplication between two n-digit numbers) to the problem of solving 4 instances, each of size n/2 (i.e., computing the products x_H y_H, x_H y_L, x_L y_H and x_L y_L).

This is a divide and conquer solution!
- Recursively solve the 4 subproblems.
- Multiplication by 10^n is easy (shifting): O(n) time.
- Combine the solutions from the 4 subproblems into an overall solution using 3 additions on O(n)-digit numbers: O(n) time.


Karatsuba's observation

Running time: T(n) ≤ 4T(n/2) + cn
- by the Master theorem: T(n) = O(n^2), so no improvement

However, if we only needed three n/2-digit multiplications, then by the Master theorem

T(n) ≤ 3T(n/2) + cn = O(n^1.59) = o(n^2).

Recall that

x · y = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^(n/2) + x_L y_L

Key observation: we do not need each of x_H y_L, x_L y_H.
We only need their sum, x_H y_L + x_L y_H.


Gauss's observation on multiplying complex numbers

A similar problem: multiply two complex numbers a + bi, c + di

(a + bi)(c + di) = ac + (ad + bc)i + bd·i^2

Gauss's observation: this can be done with just 3 multiplications

(a + bi)(c + di) = ac + ((a + b)(c + d) − ac − bd)i + bd·i^2,

at the cost of a few extra additions and subtractions.
Unlike multiplications, additions and subtractions of n-digit numbers are cheap: O(n) time!
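Gauss's trick in a few lines of Python (a sketch; returns the real and imaginary parts of the product, using i^2 = −1):

```python
def gauss_complex_multiply(a, b, c, d):
    """Compute (a + bi)(c + di) with 3 multiplications instead of 4."""
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d) - ac - bd  # equals ad + bc
    return (ac - bd, cross)              # (real part, imaginary part)
```

For example, gauss_complex_multiply(1, 2, 3, 4) gives (-5, 10), matching (1 + 2i)(3 + 4i) = −5 + 10i.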

Karatsuba's algorithm

x · y = (x_H · 10^(n/2) + x_L) · (y_H · 10^(n/2) + y_L)
      = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^(n/2) + x_L y_L

Similarly to Gauss's method for multiplying two complex numbers, compute only the three products

x_H y_H, x_L y_L, (x_H + x_L)(y_H + y_L)

and obtain the sum x_H y_L + x_L y_H from

(x_H + x_L)(y_H + y_L) − x_H y_H − x_L y_L = x_H y_L + x_L y_H.

Combining requires O(n) time, hence

T(n) ≤ 3T(n/2) + cn = O(n^(log_2 3)) = O(n^1.59)

Pseudocode

Let k be a small constant.

Integer-Multiply(x, y)
    if n == k then
        return x · y
    end if
    write x = x_H · 10^(n/2) + x_L, y = y_H · 10^(n/2) + y_L
    compute x_H + x_L, y_H + y_L
    product = Integer-Multiply(x_H + x_L, y_H + y_L)
    x_H y_H = Integer-Multiply(x_H, y_H)
    x_L y_L = Integer-Multiply(x_L, y_L)
    return x_H y_H · 10^n + (product − x_H y_H − x_L y_L) · 10^(n/2) + x_L y_L
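The pseudocode as a runnable Python sketch. The base case here is "a single-digit factor" rather than a tunable constant k, and `divmod` performs the split into high- and low-order digits; both are choices of this sketch, not of the slides:

```python
def karatsuba(x, y):
    """Multiply nonnegative integers x, y using three half-size products."""
    if x < 10 or y < 10:                 # base case: a single-digit factor
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    xH, xL = divmod(x, 10 ** half)       # x = xH * 10^half + xL
    yH, yL = divmod(y, 10 ** half)       # y = yH * 10^half + yL
    high = karatsuba(xH, yH)
    low = karatsuba(xL, yL)
    cross = karatsuba(xH + xL, yH + yL) - high - low  # = xH*yL + xL*yH
    return high * 10 ** (2 * half) + cross * 10 ** half + low
```

For example, karatsuba(12, 11) computes the three products 1·1, 2·1 and 3·2 and combines them into 132.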

Concluding remarks

- To reduce the number of multiplications we do a few more additions/subtractions: these are fast compared to multiplications.
- There is no reason to continue with recursion once n is small enough: the conventional algorithm is probably more efficient since it uses fewer additions.
- When we recursively compute (x_H + x_L)(y_H + y_L), each of x_H + x_L, y_H + y_L might be an (n/2 + 1)-digit integer. This does not affect the asymptotics.


Fast matrix multiplication

- Matrix multiplication: a fundamental primitive in numerical linear algebra, scientific computing, machine learning and large-scale data analysis.
- Input: m × n matrix A, n × p matrix B
- Output: m × p matrix C = AB

Example:

A = [ 1 0 ]   B = [ 1 1 ]   C = AB = [ 1 1 ]
    [ 0 1 ]       [ 1 1 ]            [ 1 1 ]

- Lower bounds on matrix multiplication algorithms for m, p = Θ(n)?

Conventional matrix multiplication

for 1 ≤ i ≤ m do
    for 1 ≤ j ≤ p do
        c_{i,j} = Σ_{k=1}^{n} a_{i,k} · b_{k,j}
    end for
end for

- Running time?
- Can we do better?
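The triple loop above as a short Python sketch (matrices as lists of rows); it performs m·n·p scalar multiplications, i.e. Θ(n^3) for square matrices:

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, both lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # c[i][j] = sum over k of a[i][k] * b[k][j]
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```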

A first divide & conquer approach: 8 subproblems

Assume square A, B where n = 2^k for some k > 0.
Idea: express A, B as 2 × 2 block matrices, with n/2 × n/2 blocks, and use the conventional algorithm to multiply the two block matrices.

[ A11 A12 ] [ B11 B12 ]   [ C11 C12 ]
[ A21 A22 ] [ B21 B22 ] = [ C21 C22 ]

where

C11 = A11 B11 + A12 B21
C12 = A11 B12 + A12 B22
C21 = A21 B11 + A22 B21
C22 = A21 B12 + A22 B22

Running time?

Strassen's breakthrough: 7 subproblems suffice (part 1)

Compute the following ten n/2 × n/2 matrices:
1. S1 = B12 − B22
2. S2 = A11 + A12
3. S3 = A21 + A22
4. S4 = B21 − B11
5. S5 = A11 + A22
6. S6 = B11 + B22
7. S7 = A12 − A22
8. S8 = B21 + B22
9. S9 = A11 − A21
10. S10 = B11 + B12

Running time?

Strassen's breakthrough: 7 subproblems suffice (part 2)

Compute the following seven products of n/2 × n/2 matrices:
1. P1 = A11 · S1
2. P2 = S2 · B22
3. P3 = S3 · B11
4. P4 = A22 · S4
5. P5 = S5 · S6
6. P6 = S7 · S8
7. P7 = S9 · S10

Compute C as follows:
1. C11 = P4 + P5 + P6 − P2
2. C12 = P1 + P2
3. C21 = P3 + P4
4. C22 = P1 + P5 − P3 − P7

Running time?
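The two parts above can be assembled into a compact Python sketch for n a power of 2 (the helpers `add`, `sub` and `blocks` are mine, not from the slides; a practical implementation would switch to the conventional algorithm below some cutoff size):

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    """Multiply n x n matrices (n a power of 2) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def blocks(M):  # split M into its four n/2 x n/2 blocks
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    A11, A12, A21, A22 = blocks(A)
    B11, B12, B21, B22 = blocks(B)
    P1 = strassen(A11, sub(B12, B22))            # A11 * S1
    P2 = strassen(add(A11, A12), B22)            # S2 * B22
    P3 = strassen(add(A21, A22), B11)            # S3 * B11
    P4 = strassen(A22, sub(B21, B11))            # A22 * S4
    P5 = strassen(add(A11, A22), add(B11, B22))  # S5 * S6
    P6 = strassen(sub(A12, A22), add(B21, B22))  # S7 * S8
    P7 = strassen(sub(A11, A21), add(B11, B12))  # S9 * S10
    C11 = add(sub(add(P4, P5), P2), P6)
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P1, P5), P3), P7)
    # Stitch the four blocks back into one n x n matrix.
    return ([r11 + r12 for r11, r12 in zip(C11, C12)] +
            [r21 + r22 for r21, r22 in zip(C21, C22)])
```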

Strassen's running time and concluding remarks

- Recurrence: T(n) = 7T(n/2) + cn^2
- By the Master theorem: T(n) = O(n^(log_2 7)) = O(n^2.81)
- Recently, there is renewed interest in Strassen's algorithm for high-performance computing: thanks to its lower communication cost (number of bits exchanged between machines in the network or data center), it is better suited than the traditional algorithm for multi-core processors.
