
MH1402

Power of Randomization:
Fingerprinting, Hashing,
sorting, searching, etc.
Danupon Nanongkai
Please leave your feedback for this lecture at http://goo.gl/xC4cl

In the next few classes, we'll learn

A Fingerprinting Algorithm
One of the Hashing Algorithms
One of the Pattern Matching Algorithms (Ctrl+F)

And, above all,

the power of a coin toss!

Basic Probability
If you toss a coin 1 time, what is the probability
that you will get a head?
(1/2)

If you toss a coin 2 times, what is the probability
that you will get a head every time?
(1/4)

If you toss a coin 10 times, what is the probability
that you will get a head every time?
(1/2)^10 = 1/1,024 ≈ 1/1,000

If you toss a coin 20 times, what is the probability
that you will get a head every time?
(1/2)^20 = 1/1,048,576 ≈ 1/1,000,000

Warm-up Non-Programming Question:

Cotton Candy Roti

How cotton candy roti is sold: the number
of rotis you get depends on the number
on the wheel of fortune.

Simplified Situation

[Figure: a wheel of fortune with two outcomes, 1 and 2]

Problem: Cheating Sellers

Some sellers are fair: you have the same
chance of getting 1 and 2.
But some sellers are cheating: you will
never get number 2!
How can you check?
For now, let's assume that cheating sellers will
never let you get number 2. Smarter sellers
will give you number 2 sometimes. They
can be checked as well, but it will be harder.

Hint
You will never be sure whether a seller is cheating or
not.
But you can spend a few dollars and get a
certain amount of confidence.
How many times do you have to buy a roti to
be right with probability at
least 999/1,000?

Solution
Play 10 times. If you get 1 every single time,
then announce that the seller is cheating.
If you get 2 at least once, then
announce that the seller is fair.

[Figure: wheel spins with outcomes 1 and 2; one sequence of results is labelled "Cheating!" and the other "Fair!"]

What is the probability that you are wrong?
If the seller is cheating, you will announce
cheating for sure. So, you will always be
correct in this case.
If the seller is honest, you might be unlucky
and get 1 every time. What's the chance that
this happens?
It's (1/2 x 1/2 x ... x 1/2) = (1/2)^10 = 1/1,024.
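The 1/1,024 error bound for the 10-play test can be checked empirically. The following is a minimal C++ sketch, not code from the lecture: the `Seller` struct and the functions `declare_cheating` and `count_false_accusations` are hypothetical names introduced here, and the fair wheel is modelled (as an assumption) by the lowest bit of a `std::mt19937` draw.

```cpp
#include <cassert>
#include <random>

// Hypothetical model of one seller: spin() returns 1 or 2.
// A cheating seller never returns 2; a fair one returns each with probability 1/2.
struct Seller {
    bool cheating;
    int spin(std::mt19937& rng) const {
        if (cheating) return 1;
        return (rng() & 1u) ? 2 : 1;  // lowest bit of a 32-bit draw acts as a fair coin
    }
};

// The test from the slide: play `plays` times and declare "cheating"
// if and only if every spin shows 1.
bool declare_cheating(const Seller& s, int plays, std::mt19937& rng) {
    assert(plays > 0);
    for (int i = 0; i < plays; ++i) {
        if (s.spin(rng) == 2) return false;  // saw a 2, so the seller must be fair
    }
    return true;
}

// Counts how often a fair seller is wrongly declared cheating over `trials` runs.
int count_false_accusations(int plays, int trials, unsigned seed) {
    std::mt19937 rng(seed);
    const Seller fair{false};
    int wrong = 0;
    for (int t = 0; t < trials; ++t) {
        if (declare_cheating(fair, plays, rng)) ++wrong;
    }
    return wrong;
}
```

With 10 plays and 100,000 simulated fair sellers, the count of false accusations should come out near 100,000/1,024 (roughly 98), although the exact number depends on the seed. A cheating seller is always declared cheating, so the error is one-sided.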

Test yourself
How many times do you have to play if you
want the chance of being wrong to be at most
1/1,000,000?
Answer: 20 times: (1/2)^20 = 1/1,048,576 ≈ 1/1,000,000
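The same calculation works for any target error. As a small sketch (the function name `plays_needed` is an assumption made for this example, not part of the lecture code), the number of plays is the smallest k with 2^k at least as large as the denominator of the allowed error:

```cpp
#include <cassert>

// Smallest k such that (1/2)^k <= 1/denominator, i.e. 2^k >= denominator.
int plays_needed(unsigned long long denominator) {
    assert(denominator >= 1);
    assert(denominator <= (1ULL << 62));  // keeps `power` from overflowing
    int k = 0;
    unsigned long long power = 1;  // invariant: power == 2^k
    while (power < denominator) {
        power *= 2;
        ++k;
    }
    return k;
}
```

Here `plays_needed(1000)` returns 10 and `plays_needed(1000000)` returns 20, matching the two answers on the slides.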

Try it yourself!

No, I'm kidding. All games in casinos are designed to be
unfair and hard to check. That's how they make money,
and you'll end up losing all your money.

Generating a random number in C++
(generate_random_number.cpp)

Learn more at http://www.cplusplus.com/articles/EywTUR/
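The contents of generate_random_number.cpp are not reproduced in these notes. As a stand-in, here is a minimal sketch using the C++11 `<random>` header; this is an assumption about how one might do it and may differ from the lecture file and from the linked article. The helper names `random_int` and `make_rng` are hypothetical.

```cpp
#include <cassert>
#include <random>

// Returns a uniformly distributed integer in the closed range [lo, hi].
int random_int(int lo, int hi, std::mt19937& rng) {
    assert(lo <= hi);
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(rng);
}

// Creates a Mersenne Twister generator seeded from std::random_device,
// so that different runs usually produce different sequences.
std::mt19937 make_rng() {
    std::random_device rd;
    return std::mt19937(rd());
}
```

A typical use is to create the generator once with `make_rng()` and then call `random_int(0, n - 1, rng)` each time a random array index is needed.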

Warm-up Programming Question:

Unanimity vs. No Majority

Problem: You are given an array A of n integers,
where n is large (on the order of 1,000,000). This array either
has the same number in all entries (unanimity),
or
every number appears at most n/2 times (no
majority).

Examples (n=6)
Unanimity:
1 1 1 1 1 1
No majority:
1 2 1 3 4 1
Not an input:
1 2 1 1 2 1 (1 is a majority)

Question: How many entries in the array do you
have to look at?
If you want to be correct for sure?
If you want to be correct with probability
999,999/1,000,000?

Algorithm that is always correct

Check if all entries are the same:
for i:=0 to n-1 do
    for j:=0 to n-1 do
        if A[i] ≠ A[j] then
            return no majority
return unanimity

Test yourself: How many comparisons does it take?
a) n b) n^2 c) n^3

Can you do better?

Better algorithm that is always correct

Check if all entries are the same as A[0]:
for i:=1 to n-1 do
    if A[i] ≠ A[0] then
        return no majority
return unanimity

Test yourself: How many comparisons does it take?
a) n b) n^2 c) n^3

Can you do a bit better?

Even better algorithm that is always correct

Check if the first (n/2)+1 entries are the same:
for i:=1 to n/2 do
    if A[i] ≠ A[0] then
        return no majority
return unanimity

The number of comparisons will be n/2.

Can you do a bit better?

Code (uniform_vs_no_majority.cpp)
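The original uniform_vs_no_majority.cpp is not included in these notes. The following is a sketch of the "even better" deterministic check, written under the problem's promise that the input is either unanimous or has no majority; the function name `is_unanimous_exact` is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true for "unanimity" and false for "no majority".
// Relies on the promise that the input is one of these two cases.
bool is_unanimous_exact(const std::vector<int>& a) {
    assert(!a.empty());
    const std::size_t half = a.size() / 2;
    // Compare A[1..n/2] against A[0]: at most n/2 comparisons.
    // If no value appears more than n/2 times, the first n/2 + 1 entries
    // cannot all be equal, so a mismatch must show up in this range.
    for (std::size_t i = 1; i <= half; ++i) {
        if (a[i] != a[0]) return false;
    }
    return true;
}
```

For the slide examples, the array (1 1 1 1 1 1) yields true and (1 2 1 3 4 1) yields false.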

Randomized Solution that is correct
with probability 999,999/1,000,000
Main idea: Pick a few random entries and check if
they are all the same as A[0].
Think for a minute: How many random entries
do you need?
Answer: 20 (why?)

for i:=1 to 20 do
    let r be a random number between 0 and n-1
    if A[r] ≠ A[0] then
        return no majority
return unanimity

How many comparisons does it need?
a) 10-100
b) n
c) n^2
Answer: a) (20, to be exact)

Code (uniform_vs_no_majority.cpp)
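As with the deterministic version, the lecture's source file is not shown here. A sketch of the randomized check might look as follows; the name `is_unanimous_randomized`, the `samples` parameter, and the use of `std::mt19937` with `std::uniform_int_distribution` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns true for "unanimity" and false for "no majority".
// A "false" answer is always right; a "true" answer is wrong with
// probability at most (1/2)^samples when the input has no majority.
bool is_unanimous_randomized(const std::vector<int>& a,
                             std::mt19937& rng,
                             int samples = 20) {
    assert(!a.empty());
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    for (int s = 0; s < samples; ++s) {
        if (a[pick(rng)] != a[0]) return false;  // found a different value
    }
    return true;  // every sampled entry matched A[0]
}
```

The running time depends only on `samples`, not on the array length, which is why this version needs 20 comparisons regardless of n.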

Probability Analysis
If A is unanimous, the algorithm will be
correct.
If A has no majority (e.g. A = (1 1 2 3 3 4)):
The chance that a random A[r] is the same as A[0]
is at most ?
At most 1/2, since each number can appear at most n/2
times.

If we repeat 20 times, the chance that we get the
same number as A[0] every time is ?
At most (1/2)^20 ≈ 1/1,000,000.

This is the only case in which we are wrong.
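Written out as a single chain, the bound looks as follows. This is a sketch of the calculation under the assumption that the 20 indices are drawn independently and uniformly (with replacement); the symbols r_1, ..., r_20 for the sampled indices are introduced here and do not appear on the slides.

```latex
\Pr[\text{wrong}]
  = \Pr\bigl[A[r_1] = A[0],\ \dots,\ A[r_{20}] = A[0]\bigr]
  = \prod_{i=1}^{20} \Pr\bigl[A[r_i] = A[0]\bigr]
  \le \left(\tfrac{1}{2}\right)^{20}
  = \frac{1}{1{,}048{,}576}
  < \frac{1}{1{,}000{,}000}
```

The product step uses independence of the samples, and the inequality uses the fact that the value A[0] occupies at most n/2 of the n positions.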

How long will these algorithms take?

Recall that the first algorithm takes n/2
comparisons.
Recall that the second algorithm takes 20
comparisons.
For n=2,000,000, the first algorithm makes
1,000,000 comparisons, which is 50,000 times more!

Result from
uniform_vs_no_majority.cpp
I used n=2,000,000.
The first algorithm, which is always correct, takes 2.754
milliseconds.
The second algorithm, which could be wrong, takes 0.009
milliseconds.
The second algorithm is about 300 times faster (2.754 / 0.009 ≈ 306)!
Why didn't I get the speedup predicted from the comparison counts?

We just counted the number of comparisons. There are many other
factors. Regardless, our prediction that the second algorithm is
much faster is correct.

I ran the second algorithm 10 times, and it always gave a
correct answer.

Why should you care?

Basic idea behind the Deutsch-Jozsa Algorithm
Used in the study of Quantum Computing
Used in formal language theory, etc.
This idea will be used again and again in other
contexts.

Let's take a break and watch YouTube:

Random Sequences: Human vs Coin

Question: Can a human guess a
sequence of heads and tails that looks
as if it comes from a random coin?

This video shows why humans are terrible at
generating random sequences and why flipping a
coin is different, and introduces the concept of
frequency stability.

http://www.youtube.com/watch?v=H2lJLXS3AYM&list=PLzQKN18Zu9GGLrUP2Jnc3PJ-h7xUPFy76&index=8

Real Programming Question:

Finding if a frequent number
exists in an array

Frequent Number Problem

Input: You are given an array A of n integers,
where n is large (on the order of 2,000,000).

Problem: Is there a number that appears at
least 0.9n times? Output such a number if it exists.

If you want to be correct for sure?
If you want to be correct with probability
999,999/1,000,000?

Algorithm that is always correct

Sort the elements in A.
The number x that appears at least 0.9n
times will appear in entry n/2 (WHY?)
So, count the number of times A[n/2] appears.

Example: (n=10)
Input A = (1 1 1 1 2 1 1 1 1 1)
Sorted A = (1 1 1 1 1 1 1 1 1 2)
Position n/2 = 5 holds the value 1.

Code (frequent.cpp)
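The file frequent.cpp is not reproduced in these notes. The sort-based algorithm can be sketched as below; the function name `frequent_exact` and its out-parameter interface are assumptions chosen for this example, and the 0.9n threshold is tested with integer arithmetic to avoid rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// If some value appears at least 0.9n times, stores it in `result` and
// returns true; otherwise returns false and leaves `result` untouched.
bool frequent_exact(std::vector<int> a, int& result) {  // by value: sorts a copy
    assert(!a.empty());
    std::sort(a.begin(), a.end());
    // Equal values are contiguous after sorting. A run of length >= 0.9n
    // is longer than n/2, so it must cover index n/2.
    const int candidate = a[a.size() / 2];
    const std::size_t count = static_cast<std::size_t>(
        std::count(a.begin(), a.end(), candidate));
    // count >= 0.9n  is equivalent to  10 * count >= 9 * n
    if (10 * count >= 9 * a.size()) {
        result = candidate;
        return true;
    }
    return false;
}
```

On the slide example (1 1 1 1 2 1 1 1 1 1), the candidate at the middle position is 1, it appears 9 times out of 10, and the function reports 1.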

How long will it take?

Question: How much time (e.g. number of
comparisons) does sorting take?

Answer: Approximately n log n.

Randomized algorithm that is correct
with probability 999,999/1,000,000
Main idea: Pick a few random entries and count the
number of times each picked entry appears.
Think for a minute: If number x appears at least 0.9n
times, what is the chance that a randomly picked
number is x?
Answer: At least 9/10.

Observe: We will be done if we get x! (Why?)

Question: How many random entries do you
need?
a) 6
b) 10
c) 20
Answer: 6, since (1/10)^6 = 1/1,000,000

for i:=1 to 20 do
    let r be a random number between 0 and n-1
    count the number of times A[r] appears
    if it is at least 0.9n then return A[r]
return no frequent number

Code (frequent.cpp)
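A sketch of the randomized version follows. It is not the lecture's frequent.cpp: the name `frequent_randomized`, the out-parameter, and the `samples` default of 20 (taken from the pseudocode) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Samples random entries and counts how often each sampled value occurs.
// Returning true is always right; returning false can be wrong, with
// probability at most (1/10)^samples when a frequent number exists.
bool frequent_randomized(const std::vector<int>& a, std::mt19937& rng,
                         int& result, int samples = 20) {
    assert(!a.empty());
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    for (int s = 0; s < samples; ++s) {
        const int candidate = a[pick(rng)];
        // One full scan of the array per sampled entry.
        const std::size_t count = static_cast<std::size_t>(
            std::count(a.begin(), a.end(), candidate));
        if (10 * count >= 9 * a.size()) {  // count >= 0.9n
            result = candidate;
            return true;
        }
    }
    return false;  // no sampled value was frequent
}
```

Each sample costs one pass over the array, so the total work is at most `samples` times n lookups, with no sorting step.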

How long will it take?

Question: How many times do we have to look at the
entries?

Answer:
- We have to scan through all entries once for each
of the 20 random entries.
- So, it will take 20n lookups, or roughly n up to a constant factor.
- The number of comparisons will be roughly the
same.

Comparison of rough time

Always-correct algorithm: n log n
Can-be-wrong algorithm: n
So, the algorithm that is always correct should
be slower (at least for large n).

Results from the program
frequent.cpp
I used n=2,000,000.
The first algorithm, which is always correct,
takes 177.926 milliseconds.
The second algorithm, which could be wrong,
takes 5.155 milliseconds.
The second algorithm is 35 times faster!
I ran the second algorithm 10 times and it
always gave a correct result.

(Less) Frequent Number Problem
(Please Think)
Problem: You are given an array A of n integers,
where n is large (on the order of 2,000,000).

Question: Is there a number that appears more
than n/3 times? Output one such number, if it
exists.
If you want to be correct for sure?
If you want to be correct with probability
999,999/1,000,000?

Further reading/watching
Read the following link to learn how to
generate a random number:
http://www.cplusplus.com/articles/EywTUR/