THE SINGLE BIGGEST PROBLEM IN COMMUNICATION IS THE ILLUSION THAT IT HAS TAKEN PLACE.
GEORGE BERNARD SHAW

WORDS EMPTY AS THE WIND ARE BEST LEFT UNSAID.
HOMER

EVERYTHING BECOMES A LITTLE DIFFERENT AS SOON AS IT IS SPOKEN OUT LOUD.
HERMANN HESSE

LANGUAGE IS A VIRUS FROM OUTER SPACE.
WILLIAM S. BURROUGHS

ANUP RAO, AMIR YEHUDAYOFF

COMMUNICATION COMPLEXITY

(EARLY DRAFT)
Contents

I Fundamentals

1 Deterministic Protocols
  Some Examples of Communication Problems and Protocols
  Defining 2-party protocols
  Rectangles
  Balancing Protocols
  From Rectangles to Protocols
  Some lower bounds
  Rectangle Covers
  Counterexample to deterministic direct sum of relations

2 Rank
  Basic Properties of Rank
  Lower bounds using Rank
  Towards the Log-Rank Conjecture
  Non-negative Rank and Covers

3 Randomized Protocols
  Variants of Randomized Protocols
  Public Coins vs Private Coins
  Nearly Monochromatic Rectangles

4 Numbers On Foreheads
  Cylinder Intersections
  Lower bounds from Ramsey Theory

5 Discrepancy
  Some Examples Using Convexity in Combinatorics
  Lower bounds for Inner-Product
  Lower bounds for Disjointness in the Number-on-Forehead model

6 Information
  Entropy, Divergence and Mutual Information
  Some Examples from Combinatorics
  Lower bound for Indexing
  Randomized Communication of Disjointness
  Lower bounds on Non-Negative Rank
  Lower bound for Number of Rounds

7 Compressing Communication
  Correlated Sampling
  Compressing a Single Round of Communication
  Compressing Entire Protocols with Low Internal Information
  Lower bounds from Compression Theorems

II Applications

Bibliography
Introduction
Acknowledgements
Thanks to Morgan Dixon, Abe Friesen, Mika Göös, Jeff Heer, Pavel
Hrubeš, Guy Kindler, Vincent Liew, Venkatesh Medabalimi, Shay
Moran, Rotem Oshman, Sebastian Pokutta, Kayur Patel, Sivaramakr-
ishnan Natarajan Ramamoorthy, Cyrus Rashtchian, Thomas Rothvoß,
and Makrand Sinha for many contributions to this book.
Conventions and Preliminaries
In this chapter, we set up notation and explain some basic facts that are used throughout the book.
For a positive integer h, we use [h] to denote the set {1, 2, . . . , h}, and 2^[h] to denote the power set, namely the family of all subsets of [h]. All logarithms are computed base 2 unless otherwise specified. A boolean function is a function whose values are in the set {0, 1}.

Random variables are denoted by capital letters (e.g. A) and values they attain are denoted by lower-case letters (e.g. a). Events in a probability space will be denoted by calligraphic letters (e.g. E). Given a = a_1, a_2, . . . , a_n, we write a_{≤i} to denote a_1, . . . , a_i. We define a_{<i} similarly. We write a_S to denote the projection of a to the coordinates specified in the set S ⊆ [n]. [k]^{<n} denotes the set of all strings of length less than n over the alphabet [k], including the empty string. |z| denotes the length of the string z.
Graphs
Probability
Throughout this book, we consider only finite probability spaces. We use the notation p(a) to denote both the distribution on the variable a, and the number Pr_p[A = a]. The meaning will be clear from context.
We write p( a|b) to denote either the distribution of A conditioned on
the event B = b, or the number Pr[ A = a| B = b]. Given a distribution
p( a, b, c, d), we write p( a, b, c) to denote the marginal distribution on
the variables a, b, c (or the corresponding probability). We often write
p( ab) instead of p( a, b) for conciseness of notation. If E is an event,
we write p(E) to denote its probability according to p. We denote by E_{p(a)}[g(a)] the expected value of g(a) in p. We write A − M − B to assert that p(amb) = p(m) · p(a|m) · p(b|m); in words, A and B are independent conditioned on M.
The statistical distance (also known as total variation distance) between p(x) and q(x) is defined to be:

    |p − q| = (1/2) ∑_x |p(x) − q(x)|.
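Since all probability spaces in this book are finite, the statistical distance can be computed directly. A small sketch (the function name is ours, not the book's):

```python
def statistical_distance(p, q):
    """Statistical (total variation) distance between two distributions,
    each given as a dict mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support) / 2

# Equivalently, it is the largest advantage in distinguishing p from q
# from a single sample: the maximum over events E of |p(E) - q(E)|.
fair = {"heads": 0.5, "tails": 0.5}
biased = {"heads": 0.75, "tails": 0.25}
print(statistical_distance(fair, biased))  # 0.25
```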
Markov's inequality says that for a non-negative random variable X and a number γ > 0,

    E[X] ≥ p(X > γ) · γ,

so p(X > γ) ≤ E[X]/γ.

[Margin figure: plots of e^{−x}, 1 − x, and 2^{−2x}.]
Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality says that for two vectors x, y ∈ R^n, their inner product is at most the product of their lengths:

    ∑_{i=1}^n x_i y_i = ⟨x, y⟩ ≤ ‖x‖ · ‖y‖ = √(∑_{i=1}^n x_i²) · √(∑_{i=1}^n y_i²).
Convexity

A function f : R → R is said to be convex if (f(x) + f(y))/2 ≥ f((x + y)/2) for all x, y in the domain. It is said to be concave if (f(x) + f(y))/2 ≤ f((x + y)/2). Some convex functions: x², e^x, x log x. Some concave functions: log x, √x.

Jensen's inequality says that if a function f is convex, then E[f(X)] ≥ f(E[X]), for any real-valued random variable X. Similarly, if f is concave, then E[f(X)] ≤ f(E[X]).

A consequence of Jensen's inequality is the Arithmetic-Mean Geometric-Mean inequality:

    (∑_{i=1}^n a_i)/n ≥ (∏_{i=1}^n a_i)^{1/n},

for non-negative numbers a_1, . . . , a_n.

[Margin figures: plots of x log x and √x.]
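Both inequalities are easy to sanity-check numerically; a small sketch for a uniform distribution on a few values:

```python
import math

# Jensen's inequality for the convex function f(x) = x^2:
# E[f(X)] >= f(E[X]) when X is uniform on a few values.
xs = [1.0, 2.0, 3.0, 4.0]
mean = sum(xs) / len(xs)
assert sum(x * x for x in xs) / len(xs) >= mean ** 2

# Jensen for the concave function f(x) = log x: E[log X] <= log E[X].
assert sum(math.log(x) for x in xs) / len(xs) <= math.log(mean)

# AM-GM, which follows from Jensen applied to log:
geometric_mean = math.prod(xs) ** (1 / len(xs))
assert mean >= geometric_mean
print(mean, geometric_mean)  # the arithmetic mean 2.5 dominates
```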
I Fundamentals

1 Deterministic Protocols
Equality  Alice and Bob are given two n-bit strings x, y ∈ {0, 1}^n and want to know if x = y. There is a trivial solution: Alice can send her input to Bob, and Bob can let her know if x = y. This is a deterministic¹ protocol that takes n + 1 bits of communication, and we shall prove that no deterministic protocol is more efficient. On the other hand, for every number k, there is a randomized¹ protocol that uses only k + 1 bits of communication and errs with probability at most 2^{−k}: the parties can hash their inputs and check that the hashes are the same. There is a non-deterministic¹ protocol that uses O(log n) bits of communication: if Alice guessed an index i where x_i ≠ y_i, she could send it to Bob and they could confirm that their inputs are not the same.

¹ These terms will be made clear in due course.
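The randomized protocol can be made concrete with random-parity hashes: each shared hash is the parity of a random subset of coordinates, so two unequal strings collide on a single hash with probability exactly 1/2. A sketch (this particular hash is a standard choice, not necessarily the one the book has in mind):

```python
import random

def equality_protocol(x, y, k, seed):
    """Public-coin protocol for equality: compare k random-subset
    parities of x and y. Alice sends k bits (her parities), Bob
    replies with one bit. Never errs when x == y; errs with
    probability at most 2^-k when x != y."""
    rng = random.Random(seed)  # the shared public coins
    n = len(x)
    for _ in range(k):
        subset = [i for i in range(n) if rng.random() < 0.5]
        if sum(x[i] for i in subset) % 2 != sum(y[i] for i in subset) % 2:
            return False  # a hash differs, so certainly x != y
    return True

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(equality_protocol(x, x, k=10, seed=7))  # True
# For unequal inputs, each hash detects the difference with
# probability 1/2, so the answer is False except with probability 2^-10.
```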
Cliques and Independent Sets  Alice and Bob are given a graph G on n vertices, and two subsets A, B ⊆ [n] of the vertices. Alice knows A and G, and Bob knows B and G. In addition, A is always a clique in the graph (a set of vertices that are all connected to each other), and B is always an independent set (a set of vertices that contains no edges). They want to know whether A intersects B or not. There is no one-way protocol that solves this problem efficiently using less than n bits of communication. However, there is an interactive protocol that uses O(log² n) bits of communication to solve the problem. If the parties interact for k rounds, there is a protocol with communication O(k log n) that solves this problem, but any protocol with fewer than k rounds requires Ω(n) bits of communication.
Graph Connectivity The input is an undirected graph on the vertices
[n]. There are k parties, and the j’th party knows all of the edges
except those that touch the vertices of [( j − 1)n/k, jn/k]. The parties
want to know whether 1 is connected to n in the graph. The trivial
deterministic protocol takes O(n2 /k) bits of communication. One
can show that there is no randomized protocol with less than n/2k
bits of communication.
Fact 1.2. The number of leaves in the protocol tree of π is at most 2^{‖π‖}.
Rectangles
Xv = { x ∈ X : ∃y ∈ Y ( x, y) ∈ Rv },
Yv = {y ∈ Y : ∃ x ∈ X ( x, y) ∈ Rv }.
Lemma 1.4. For every vertex v in the protocol tree, Rv is a rectangle with
Rv = Xv × Yv . Moreover, the rectangles given by all the leaves of the
protocol tree form a partition of the inputs to the protocol.
The lemma follows by induction. For the root vertex r, we see that R_r = X × Y, so indeed the lemma holds. Now consider an arbitrary vertex v such that R_v = X_v × Y_v. Let u, w be the children of v in the protocol tree. Suppose the first party is associated with v, and u is the vertex that the players move to when f_v(x) = 0. Thus

    X_u = {x ∈ X_v : f_v(x) = 0},
    X_w = {x ∈ X_v : f_v(x) = 1},

[Figure 1.2: A partition of the space into rectangles.]
Fact 1.2 gives the best bound when the protocol tree is a full binary tree; then the number of leaves in the protocol tree is exactly 2^c. Does it ever make sense to have a protocol tree that is not balanced? It turns out that one can always balance an unbalanced tree.
Theorem 1.6. If π is a protocol with ℓ leaves, then there is a protocol that computes the outcome π(x, y) with length at most ⌈2 log_{3/2} ℓ⌉.

To prove the theorem, we need a simple lemma about trees.

Lemma 1.7. In every protocol tree that has ℓ > 1 leaves, there is a vertex v such that the subtree rooted at v contains r leaves, and ℓ/3 ≤ r < 2ℓ/3.

Proof. Consider the sequence of vertices v_1, v_2, . . . defined as follows. The vertex v_1 is the root of the tree, which is not a leaf by the assumption on ℓ. For each i > 0, the vertex v_{i+1} is the child of v_i that has the most leaves under it, breaking ties arbitrarily. Let ℓ_i denote the number of leaves in the subtree rooted at v_i. Then ℓ_{i+1} ≥ ℓ_i/2, and ℓ_{i+1} < ℓ_i. Since ℓ_1 = ℓ, and the sequence is decreasing until it hits 1, there must be some i for which ℓ/3 ≤ ℓ_i < 2ℓ/3.

In each step of the balanced protocol (see Figure 1.4), the parties pick a vertex v as promised by Lemma 1.7, and decide whether (x, y) ∈ R_v using two bits of communication. That is, Alice sends a bit indicating if x ∈ X_v and Bob sends a bit indicating if y ∈ Y_v.

Figure 1.4: Balancing Protocols.
  Input: Alice knows x ∈ X, Bob knows y ∈ Y; both know a protocol π that has ℓ leaves.
  Output: The outcome of the protocol π.
  while π has more than 1 leaf do
    Find a vertex v as promised by Lemma 1.7;
    Alice and Bob exchange two bits indicating if their inputs are consistent with the path in the protocol tree to v;
    if both inputs are consistent with v then
      replace π with the subtree rooted at v;
    else
      remove v from the protocol tree, and replace v's parent with v's sibling;
    end
  end
  Output the unique leaf in π.
Observe that the set of horizontally good rectangles is determined by x, and the set of vertically good rectangles is determined by y. Suppose g(x, y) = 0. Then there must be a rectangle R_{x,y} ∈ R_0 that contains (x, y). Since the rectangles of R_1 are disjoint from R_{x,y}, Fact 1.9 implies that every rectangle in R_1 does not intersect R_{x,y} both horizontally and vertically. Thus either at most half of the rectangles in R_1 intersect R_{x,y} horizontally, or at most half of them intersect R_{x,y} vertically. Moreover, any such rectangle is consistent with both players' inputs. So we have shown:

Claim 1.11. Any rectangle of R_0 that contains (x, y) is either horizontally good, or vertically good.

In each step of the protocol, one of the parties announces the name of a rectangle that is either horizontally good or vertically good, if such a rectangle exists. This leads to half of the rectangles in R_1 being discarded. If no such rectangle exists, then it must mean that no rectangle of R_0 covers (x, y), and so g(x, y) = 1. Since R_1 can survive at most c + 1 such discards, and a rectangle in the family can be described with c bits of communication, the communication complexity of the protocol is at most O(c²).

Figure 1.7: A Protocol from Monochromatic Rectangle Covers.
  Input: Alice knows x ∈ X, Bob knows y ∈ Y; both know a set of monochromatic rectangles R whose union contains (x, y).
  Output: g(x, y).
  while R_1 is not empty do
    if ∃R ∈ R_0 that is horizontally good then
      Alice sends Bob the name of R;
      both parties discard all rectangles from R_1 that do not horizontally intersect R;
    else if ∃R ∈ R_0 that is vertically good then
      Bob sends Alice the name of R;
      both parties discard all rectangles from R_1 that do not vertically intersect R;
    else
      the parties output 1;
    end
  end
  The parties output 0.

Recent work⁴ has shown that there is a function g under which the inputs can be partitioned into 2^c monochromatic rectangles, yet no protocol can compute g using o(c²) bits of communication, showing that Theorem 1.8 is tight.

⁴ Göös et al., 2015; and Kothari, 2015.
Alice can send Bob her input, and Bob can respond with the value of the function, giving a protocol with complexity n + 1. Is there a protocol with complexity n? Since any such protocol induces a partition into 2^n monochromatic rectangles, a first attempt at proving a lower bound might be to try to show that there is no large monochromatic rectangle. If we could prove that, then we could argue that many monochromatic rectangles are needed to cover the whole input. However, the equality function does have large monochromatic rectangles. For example, consider the rectangle R = {(x, y) : x_1 = 0, y_1 = 1}. This rectangle has density 1/4, and it is monochromatic, since EQ(x, y) = 0 for every (x, y) ∈ R. We will instead show that equality does not have a large 1-monochromatic rectangle, and argue that this is good enough to prove a lower bound.
Observe that if x 6= x 0 , then the points ( x, x ) and ( x 0 , x 0 ) cannot be
in the same monochromatic rectangle. Otherwise, by Lemma 1.3,
( x, x 0 ) would also have to be included in this rectangle. Since the
rectangle is monochromatic, we would have EQ( x, x 0 ) = EQ( x, x ),
which is a contradiction. We have shown:
Alice can send her whole set X to Bob, which gives a protocol
with communication n + 1. Can we prove that this is optimal? Once
again, this function does have large monochromatic rectangles,
for example the rectangle R = {( X, Y ) : 1 ∈ X, 1 ∈ Y }, but we
shall show that there are no large monochromatic 1-rectangles.
Indeed, suppose R = A × B is a 1-monochromatic rectangle. Let X′ = ∪_{X∈A} X and Y′ = ∪_{Y∈B} Y. Then X′ and Y′ must be disjoint, so |X′| + |Y′| ≤ n. On the other hand, |A| ≤ 2^{|X′|} and |B| ≤ 2^{|Y′|}, so |R| = |A||B| ≤ 2^n. We have shown:
Richness

Sometimes we need to understand asymmetric communication protocols, where we need separate bounds on the communication complexity of Alice and Bob. The concept of richness⁵ is useful here:

⁵ Miltersen et al., 1998.
The disjointness matrix here is at least (t^k, 2^{kt})-rich, since every choice of Y allows for t^k possible choices for X that are disjoint from it. By Lemma 1.17, any protocol where Alice sends a bits and Bob sends b bits induces a 1-monochromatic rectangle with dimensions t^k/2^a × 2^{kt−a−b}, so Claim 1.18 gives:

    2^{kt−a−b} ≤ 2^{kt − kt/2^{a/k}}
    ⇒ a + b ≥ n/2^{a/k+1}.

We conclude:

Theorem 1.19. If X, Y ⊆ [n], |X| = k, and Alice sends at most a bits and Bob sends at most b bits in a protocol computing Disj(X, Y), then a + b ≥ n/2^{a/k+1}.
For example, for k = 2, if Alice sends at most log n bits to Bob, then Bob must send at least Ω(√n) bits to Alice in order to solve lopsided disjointness.
Span  Suppose Alice is given a vector x ∈ {0, 1}^n, and Bob is given an n/2-dimensional subspace V ⊆ {0, 1}^n. Their goal is to figure out whether or not x ∈ V. As in the case of disjointness, we start by claiming that the inputs do not have 1-monochromatic rectangles of a certain shape:

Claim 1.20. If A × B is a 1-monochromatic rectangle, then |B| ≤ 2^{n²/2 − n log|A|}.

Proof. The set of x's in the rectangle spans a subspace of dimension at least log|A|. The number of n/2-dimensional subspaces that contain this span is thus at most (2^n)^{n/2 − log|A|} ≤ 2^{n²/2 − n log|A|}.
The problem we are working with is at least (2^{n/2}, 2^{n²/4}/n!)-rich², since there are at least 2^{n²/4}/n! subspaces, and each contains 2^{n/2} vectors. Applying Lemma 1.17 and Claim 1.20, we get that if there is a protocol where Alice sends a bits and Bob sends b bits, then

    2^{n²/4 − a − b}/n! ≤ 2^{n²/2 − n log 2^{n/2 − a}}
    ⇒ n²/4 − a(n + 1) − n log n ≤ b.

² A subspace of dimension n/2 can be specified by picking the n/2 basis vectors. For each such vector, there are at least 2^{n/2} available choices. However, we have over-counted by a factor of at most n!, since every permutation of the basis vectors gives the same subspace. This gives that there are at least 2^{n²/4}/n! subspaces.

Theorem 1.21. If Alice sends a bits and Bob sends b bits to solve the span problem, then b ≥ n²/4 − a(n + 1) − n log n.
For example, if Alice sends at most n/8 bits, then Bob must
send at least Ω(n2 ) bits in order to solve the span problem. One of
the players must send a linear number of the bits in their input.
Fooling Sets
A set S ⊆ X × Y is called a fooling set if every monochromatic rectangle can share at most 1 element with S. Fooling sets can be used to prove several basic lower bounds on communication.
Krapchenko’s Method
We end the lower bounds part of this chapter with a method for non-boolean relations. Let X = {x ∈ {0, 1}^n : ∑_{i=1}^n x_i = 0 mod 2} and Y = {y ∈ {0, 1}^n : ∑_{i=1}^n y_i = 1 mod 2}. Since X and Y are disjoint, for every x ∈ X, y ∈ Y, there is an index i such that x_i ≠ y_i. Suppose Alice is given x and Bob is given y, and they want to find such an index i. How much communication is required?
Perhaps the most trivial protocol is for Alice to send Bob her entire string, but we can use binary search to do better. Notice that

    ∑_{i≤n/2} x_i + ∑_{i>n/2} x_i ≠ ∑_{i≤n/2} y_i + ∑_{i>n/2} y_i  (mod 2).
Alice and Bob can thus exchange ∑i≤n/2 xi mod 2 and ∑i≤n/2 yi
mod 2. If these values are not the same, they can safely restrict their
attention to the strings x≤n/2 , y≤n/2 and continue. On the other hand,
if the values are the same, they can continue the protocol on the
strings x>n/2 , y>n/2 . In this way, in every step they communicate
2 bits and eliminate half of their input string, giving a protocol of
communication complexity 2 log n.
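The binary-search protocol can be written out directly; a sketch that also counts the bits exchanged (the bookkeeping is ours):

```python
def krapchenko_protocol(x, y):
    """Alice holds x with even parity, Bob holds y with odd parity.
    Each round they exchange the parities of the left half of the
    current window (2 bits) and keep the half whose parities differ.
    Returns (an index i with x[i] != y[i], bits communicated)."""
    lo, hi = 0, len(x)  # invariant: x[lo:hi], y[lo:hi] have different parity
    bits = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        bits += 2  # one parity bit from each party
        if sum(x[lo:mid]) % 2 != sum(y[lo:mid]) % 2:
            hi = mid  # the left halves disagree in parity
        else:
            lo = mid  # then the right halves must disagree
    return lo, bits

x = [0, 0, 0, 0, 0, 0, 0, 0]      # parity 0
y = [0, 0, 0, 1, 0, 0, 0, 0]      # parity 1
print(krapchenko_protocol(x, y))  # (3, 6): 2 * log2(8) = 6 bits
```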
It is easy to see that log n bits of communication are necessary, because that is how many bits it takes to write down the answer; indeed, we need at least n monochromatic rectangles to cover pairs of the type (0, e_i), where e_i is the i'th unit vector. Now we shall prove that 2 log n bits are necessary, using a variant of fooling sets. Consider the set of inputs
    2^{2n−2} ≥ ∑_{i=1}^t r_i² ≥ (∑_{i=1}^t r_i / √t)² = n² 2^{2n−2}/t,

proving that t ≥ n². This shows that the binary search protocol is the best one can do.
Rectangle Covers
g^k((x_1, . . . , x_k), (y_1, . . . , y_k)) = (g(x_1, y_1), g(x_2, y_2), . . . , g(x_k, y_k)).

We shall use many of the ideas we have developed so far to prove that⁷:

⁷ Feder et al., 1995.
In fact, one can show that even computing the two bits ∧_{i=1}^k g(x_i, y_i) and ∨_{i=1}^k g(x_i, y_i) requires k(√c − log n − 1) bits of communication.⁸

⁸ See Exercise ??.

Theorem 1.8 and Lemma 1.29 imply that g has a protocol with communication (ℓ/k + log n + 1)², as required.
Now we turn to proving Lemma 1.29. We find rectangles that
cover the inputs to g iteratively. Let S ⊆ {0, 1}n × {0, 1}n denote
the set of inputs to g that have not yet been covered by one of the
monochromatic rectangles we have already found. Initially, S is the
set of all inputs. We claim:
We repeatedly pick rectangles using Claim 1.30 until all of the inputs to g are covered. After ⌈2n · 2^{ℓ/k}⌉ steps, the number of uncovered inputs is at most

    2^{2n} · (1 − 2^{−ℓ/k})^{2n·2^{ℓ/k}} ≤ 2^{2n} e^{−2^{−ℓ/k} · 2n · 2^{ℓ/k}} = 2^{2n} · e^{−2n} < 1,

using 1 − x ≤ e^{−x} for all x.
The most basic quantity associated with a matrix is its rank. The
rank of a matrix is the maximum size of a set of linearly independent
rows in the matrix. Its versatility stems from the fact that it has many
interpretations:
0's with 1. Then we see that M′ = J − 2M, where J is the all-1's matrix, and so
The matrices we are working with are boolean, so one can view
the entries of the matrix as real numbers, or rationals, or coming
from the field of integers modulo 2: F2 . This potentially leads to 3
different notions of rank, but we have:
Lemma 2.7. The real rank of a boolean matrix is the same as its rational
rank. The real rank is always at least as large as the rank over F2 .
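The gap allowed by Lemma 2.7 is real: the complement of the 3 × 3 identity has real (equivalently, rational) rank 3 but rank 2 over F_2. A sketch checking this with exact rational elimination (the helper names are ours):

```python
from fractions import Fraction

def rank_rational(M):
    """Rank over the rationals, by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col] != 0:
                f = M[r][col] / M[rank][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def rank_f2(M):
    """Rank over F2: the same elimination, with arithmetic mod 2."""
    M = [[v % 2 for v in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                M[r] = [(a + b) % 2 for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

M = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # J - I for n = 3
print(rank_rational(M), rank_f2(M))    # 3 2
```

Over F_2 the three rows sum to zero, which is exactly the dependency that the rationals do not see.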
The proof of the first fact follows from Gaussian elimination. If the rank over the rationals is r, we can always apply a linear transformation to the rows using rational coefficients to bring the matrix into this form:

    1 0 0 … 0  M_{1,r+1} … M_{1,n}
    0 1 0 … 0  M_{2,r+1} … M_{2,n}
    0 0 1 … 0  M_{3,r+1} … M_{3,n}
    ⋮ ⋮ ⋮ ⋱    ⋮            ⋮
    0 0 0 … 1  M_{r,r+1} … M_{r,n}
    0 0 0 … 0  0        … 0

This transformation does not affect the rank over the reals, and now it is clear that the rank is exactly r. For the second fact, observe that if a set of rows is linearly dependent over the rationals, then after scaling we obtain an integer dependency whose coefficients are not all even, and reducing it modulo 2 gives a dependency over F_2. So rows that are linearly independent over F_2 are also linearly independent over the rationals.
The trivial protocol takes n bits, and one can use bounds on the size of the largest rectangle to show that the communication is at least Ω(n) (see Exercise ??). Here it will be helpful to use Fact 2.4. If P_n represents the matrix whose entries are (−1)^{⟨x,y⟩}, sorting the rows and columns lexicographically, we see that

    P_n = [P_{n−1} P_{n−1}; P_{n−1} −P_{n−1}] = [1 1; 1 −1] ⊗ P_{n−1},
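The tensor structure pins down the rank: rank(A ⊗ B) = rank(A) · rank(B), so rank(P_n) = 2^n. A sketch verifying this for n = 3 through the orthogonality of the rows, P_n P_nᵀ = 2^n I (the helpers are ours):

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices (lists of lists)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

H = [[1, 1], [1, -1]]
P, n = [[1]], 3
for _ in range(n):
    P = kron(H, P)  # P_n = [1 1; 1 -1] tensor P_{n-1}

# P_n is symmetric with pairwise-orthogonal rows of squared length 2^n,
# so P_n * P_n = 2^n * I, and P_n has full rank 2^n over the reals.
identity_scaled = [[2 ** n if i == j else 0 for j in range(2 ** n)]
                   for i in range(2 ** n)]
assert matmul(P, P) == identity_scaled
print(len(P))  # 8 rows, all linearly independent
```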
Theorem 2.13. If the rank of a matrix is r, its communication complexity is at most O(√r log² r).

Lovett actually proves that the communication is bounded by O(√r log r), but we prove the weaker bound here for ease of presentation.

The proof of Theorem 2.13 relies³ on a powerful theorem from convex geometry called John's theorem⁴. We use it to show:

³ Rothvoß, 2014.
⁴ John, 1948.

Lemma 2.14. Any m × n boolean matrix of rank r > 1 must have a monochromatic rectangle of size at least mn · 2^{−20√r log r}.

Let us see how to use Lemma 2.14 to get a protocol. Let R be the rectangle promised by the lemma. Then, rearranging the rows and columns, we can write the matrix as M = [R A; B C]. Now we claim⁵ that
    rank([R; B]) + rank([R A]) ≤ rank([R A; B C]) + 3.

Indeed, by Fact 2.3,

    rank([R; B]) + rank([R A]) ≤ rank(A) + rank(B) + 2
                               ≤ rank([0 A; B C]) + 2
                               ≤ rank([R A; B C]) + 3.    (2.2)

Now suppose [R; B] has the smaller rank. Then Bob sends the bit 0 if his input is consistent with R, and 1 otherwise. If it is consistent, then if rank(M) > 9, the players have reduced⁶ the rank of the matrix by a factor of at least 2/3. If it is not consistent, the players have reduced the size of the matrix by a factor of 1 − 2^{−20√r log r}.

⁶ Since (t + 3)/2 ≤ 2t/3, when t ≥ 9.

By Lemma 2.8, we can assume that any matrix of rank r has at most 2^r rows and columns. The number of 0 transmissions in this protocol is at most 2r ln 2 · 2^{20√r log r}, since after that many transmissions, the number of entries in the matrix has been reduced to⁷

    2^{2r} (1 − 2^{−20√r log r})^{2r ln 2 · 2^{20√r log r}} < 2^{2r} e^{−2^{−20√r log r} · 2r ln 2 · 2^{20√r log r}} = 2^{2r} e^{−2r ln 2} = 1.

⁷ Using the fact that 1 − x ≤ e^{−x}, for x ≥ 0.

The number of 1 transmissions is at most O(log_{3/2} r), since after that many transmissions, the rank of the matrix is reduced to less than 6. Thus, the number of leaves in this protocol is at most

    (2r ln 2 · 2^{20√r log r})^{O(log_{3/2} r)} ≤ 2^{O(√r log² r)}.

By Theorem 1.6, we can balance the protocol tree to obtain a protocol with communication O(√r log² r) that computes the same function.

Figure 2.1: Protocol for Low Rank Matrices with 2^{O(√r log² r)} leaves.
  Input: Alice knows i, Bob knows j.
  Output: M_{i,j}.
  while rank(M) > 9 do
    Find a monochromatic rectangle R as promised by Lemma 2.14;
    Write M = [R A; B C];
    if rank([R; B]) > rank([R A]) then
      if i is consistent with R then
        both parties replace M with [R A];
      else
        both parties replace M with [B C];
      end
    else
      if j is consistent with R then
        both parties replace M with [R; B];
      else
        both parties replace M with [A; C];
      end
    end
  end
  The parties exchange at most 9 bits to compute M_{i,j}, using Theorem 2.9.

It only remains to prove Lemma 2.14. To prove it, we need to understand John's theorem. A set K ⊆ R^r is called convex if whenever x, y ∈ K, then all the points on the line from x to y are also in K.
For the rest of the proof, we assume that M has at least mn/2 0's.
We can do this, because if M has more 1’s than 0’s, we can replace M
with J − M, where J is the all 1’s matrix. This can increase the rank by
at most 1, but now the role of 0’s and 1’s has been reversed.
Lemma 2.17 says something about the angles between the vectors we have found. Define θ_{i,j} = arccos(⟨v_i, w_j⟩ / (‖v_i‖ ‖w_j‖)). Then observe that when v_i, w_j are orthogonal, the angle is π/2. But when the inner product is 1, the angle is at most arccos(1/√r) ≤ π/2 − 2π/(7√r). So we get:

    θ_{i,j} = π/2,                 if M_{i,j} = 0,
    θ_{i,j} ≤ π/2 − 2π/(7√r),      if M_{i,j} = 1.
For a fixed (i, j) and k, the probability that ⟨v_k, z_k⟩ > 0 and ⟨w_k, z_k⟩ < 0 is exactly 1/4 − (π/2 − θ_{i,j})/(2π). So we get:

    Pr_R[(i, j) ∈ R] = (1/4)^t,                if M_{i,j} = 0,
    Pr_R[(i, j) ∈ R] ≤ (1/4 − 1/(7√r))^t,      if M_{i,j} = 1.

    E[Q] ≥ (mn/2) · 2^{−14√r log r} · (1 − 1/r)
         ≥ mn · 2^{−16√r log r},    since r > 1.
[Figure 2.5: Going from a nearly monochromatic rectangle to a monochromatic rectangle, via T, T′, A_1, . . . , A_{r′}, T″.]

Another way to measure the complexity of a matrix is by measuring its non-negative rank. The non-negative rank of an m × n boolean matrix M is the smallest number r such that M = AB, where A, B are matrices with non-negative entries, such that A is an m × r matrix and B is an r × n matrix. Equivalently, it is the smallest number of non-negative rank-1 matrices that sum to M. Clearly, the non-negative rank is always at least as large as the rank.
Moreover, one can prove that if a matrix has both small rank and a small cover, then there is a small communication protocol¹⁰:

¹⁰ Lovász, 1990.
Proof. The protocol is similar to the one used to prove Theorem 1.8. For every rectangle R in the cover, we can write

    M = [R A; B C],

where either

    rank([R A]) ≤ (rank(M) − 3)/2,    (2.3)

or

    rank([R; B]) ≤ (rank(M) − 3)/2.    (2.4)
Exercise 2.1
Fix a function f : X × Y → {0, 1} with the property that in
every row and column of the communication matrix M f there are
exactly t ones. Cover the zeros of M f using O(t(log| X | + log|Y |))
monochromatic rectangles.
Exercise 2.2
Show that the Nisan-Wigderson protocol (i.e., the proof of Theorem 2.13) goes through even if we weaken Lemma 2.14 to only guarantee a rectangle with rank at most r/8 (instead of rank at most one, or monochromatic).
Exercise 2.3
Recall that for a simple, undirected graph G the chromatic number
χ( G ) is the minimum number of colors needed to color the vertices
of G so that no two adjacent vertices have the same color. Show that
log χ( G ) is at most the deterministic communication complexity of
G’s adjacency matrix.
Exercise 2.4
For any symmetric matrix M ∈ {0, 1}^{n×n} with ones in all diagonal entries, show that

    2^c ≥ n²/|M|,

where c is the deterministic communication complexity of M, and |M| is the number of ones in M.
Exercise 2.5
For any boolean matrix M, define rank2 ( M) to be the rank of M
over F2 , the field with two elements. Exhibit an explicit family of ma-
trices M ∈ {0, 1}n×n with the property that c ≥ rank2 ( M )/10, where c
is the deterministic communication complexity of M. Conclude that
this falsifies the analogue of the log-rank conjecture for rank_2.
Exercise 2.6
Show that if f has a fooling set of size s, then rank(M_f) ≥ √s. Hint: tensor product.
3
Randomized Protocols
used the rank method to argue that at least log(n choose k) ≈ k log(n/k) bits of communication are required. Here we give a randomized protocol² that requires only O(k) bits of communication³, which is more efficient when k ≪ n.

² Håstad and Wigderson, 2007.
³ Later, we show that Ω(k) bits are required.

Alice and Bob sample a sequence of sets R_1, R_2, . . . ⊆ [n], uniformly at random. They exchange 2 bits to announce whether or not their sets are empty. If neither set is empty, Alice announces the index of the first set R_i that contains her set, and Bob announces the index of the first set R_j that contains his set. Now Alice can safely replace her set with X ∩ R_j, and Bob can replace his set with Y ∩ R_i. If at any point one of the parties is left with an empty set, they can safely conclude that the inputs were disjoint. We will argue that if the sets are disjoint, this process terminates after O(k) bits of communication.

Assume that X, Y are disjoint. Let us start by analyzing the expected number of bits that will be communicated in the first step. We claim:

Claim 3.1. E[i] = 2^{|X|}, E[j] = 2^{|Y|}.

Proof. The probability that the first set of the sequence contains X is exactly 2^{−|X|}. In the event that it does not contain X, we are picking the first set that contains X from the rest of the sequence.

Figure 3.3: Public-coin protocol for greater-than.
  Let J = [n];
  while |J| > 1 do
    Let J′ be the first |J|/2 elements of J;
    Both parties use shared randomness to sample a random function h : {0,1}^{|J′|} → {0,1}^{2 log log ℓ};
    Alice sends h evaluated on her bits in J′, h(x_{J′});
    Bob announces whether or not h(x_{J′}) = h(y_{J′});
    if h(x_{J′}) = h(y_{J′}) then
      Alice and Bob replace J = J \ J′;
    else
      Alice and Bob replace J = J′;
    end
  end
  Both parties announce x_J, y_J;
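The set-sampling disjointness protocol above can be simulated directly. In this sketch we add an explicit communication budget so that the simulation also halts on intersecting inputs; the budget and the bit-accounting are our additions:

```python
import random

def sampled_disjointness(X, Y, n, budget, seed):
    """Alice holds X, Bob holds Y, subsets of {0, ..., n-1}. Shared
    coins give a common sequence R_1, R_2, ... of uniform subsets.
    Each round, Alice names the first R_i containing X and Bob the
    first R_j containing Y; then X becomes X & R_j and Y becomes
    Y & R_i. An element of X & Y survives every round, so only
    disjoint inputs can ever reach an empty set."""
    rng = random.Random(seed)  # public coins
    bits = 0
    while bits < budget:
        if not X or not Y:
            return True  # certainly disjoint
        i = j = Ri = Rj = None
        k = 0
        while Ri is None or Rj is None:
            k += 1
            R = {v for v in range(n) if rng.random() < 0.5}
            if Ri is None and X <= R:
                i, Ri = k, R
            if Rj is None and Y <= R:
                j, Rj = k, R
        bits += i.bit_length() + j.bit_length()  # naming the two indices
        X, Y = X & Rj, Y & Ri
    return False  # out of budget: report "intersecting"

print(sampled_disjointness({1, 4}, {1, 7}, n=10, budget=300, seed=0))  # False
# Disjoint inputs empty out quickly, after O(|X| + |Y|) expected bits:
print(sampled_disjointness({1, 4}, {5, 7}, n=10, budget=300, seed=0))
```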
Theorem 3.4. Let M be an m × n matrix. Then

    min_{x≥0} max_{y≥0} x M y = max_{y≥0} min_{x≥0} x M y.

The minimax principle can also be seen as a consequence of linear-programming duality.
Theorem 3.5. If g : {0, 1}^n × {0, 1}^n → {0, 1} can be computed with c bits of communication and error ε in the worst case, then it can be computed by a private-coin protocol with c + log(n/ε²) + O(1) bits of communication, and error 2ε in the worst case.

It is known that computing whether or not two n-bit strings are equal requires Ω(log n) bits of communication if only private coins are used. This shows that Theorem 3.5 is tight.

Proof. We use the probabilistic method to find the required private-coin protocol. Let us pick t independent random strings, each of which can be used as the randomness for the given public-coin protocol.

For any fixed input, some of these t random strings lead to the public-coin protocol computing the right answer, and some of them lead to the protocol computing the wrong answer. By the Chernoff bound, the probability that more than a 2ε fraction of the t strings lead to the wrong answer is at most 2^{−Ω(ε²t)}. We set t = O(n/ε²) to be large enough so that this probability is less than 2^{−2n}. Then by the union bound, the probability that 2εt of the strings give the wrong answer for some input is less than 1. Thus there must be some fixed strings with this property.

The private-coin protocol is now simple. Alice samples one of the t strings and sends its index to Bob, which takes at most log(n/ε²) + O(1) bits. Alice and Bob then run the original public-coin protocol.
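The probabilistic argument can be sampled empirically for a small n: fix t shared random strings for the parity-hash equality protocol and measure, over all inputs, the worst-case fraction of strings that err. All parameters here are ours, chosen for the experiment:

```python
import random

def public_equality(x, y, n, seed, k=5):
    """One run of a public-coin equality protocol: compare k
    random-subset parities of the n-bit integers x and y."""
    rng = random.Random(seed)
    for _ in range(k):
        S = [i for i in range(n) if rng.random() < 0.5]
        if sum((x >> i) & 1 for i in S) % 2 != sum((y >> i) & 1 for i in S) % 2:
            return False
    return True

n, t = 4, 64  # t plays the role of the O(n / eps^2) fixed strings
worst = 0.0
for x in range(2 ** n):
    for y in range(2 ** n):
        errors = sum(public_equality(x, y, n, s) != (x == y) for s in range(t))
        worst = max(worst, errors / t)

# No input fools more than a small fraction of the t fixed strings, so
# a private-coin protocol can simply send the index of a random one of
# them (log t bits) and then run the public-coin protocol with it.
print(worst)  # small: unequal pairs err on roughly a 1/32 fraction
assert worst < 0.25
```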
    Pr_µ[g(x, y) = b | (x, y) ∈ R] ≥ 1 − ε.

Theorem 3.7. If there is a c-bit protocol that computes g with error ε under a distribution µ, then one can partition the inputs into 2^c rectangles, such that the average bias of a random rectangle from the partition is at least 1 − ε.

Theorem 3.8. If there is a c-bit protocol that computes g with error ε under µ, then for every ℓ, there are disjoint (1 − ℓε)-monochromatic rectangles R_1, R_2, . . . , R_{2^c} such that Pr_µ[(x, y) ∈ ∪_i R_i] ≥ 1 − 1/ℓ.

Theorem 3.8 will be instrumental in proving lower bounds on randomized protocols.

As a corollary, we get:

Corollary 3.9. If there is a c-bit protocol that computes g with error ε under µ, then for every ℓ, there is a (1 − ℓε)-monochromatic rectangle of density at least 2^{−c}(1 − 1/ℓ).
Exercise 3.1
In this exercise we will develop a randomized protocol for greater-
than that requires only O(log log n) bits of communication. Let
x, y ∈ {0, 1}` be two strings. Alice and Bob want to find the smallest i
such that xi 6= yi .
Exercise 3.2
In this exercise, we design a randomized protocol for finding the
first difference between two n-bit strings. Alice and Bob are given n
bit strings x 6= y and want to find the smallest i such that xi 6= yi . In
class we saw how to accomplish this using O(log n log log n) bits of
communication. Here we do it with O(log n) bits of communication.
Define a rooted tree as follows. Every vertex will correspond to an
interval of coordinates from [n]. The root corresponds to the interval
I = [n]. Every internal vertex corresponding to the interval I will
have two children, the left child corresponding to the first half of I
and the right child corresponding to the right half of I. This defines
a tree of depth log n, where the leaves correspond to intervals of
size 1 (i.e. coordinates) of the input. At each leaf, attach a path of
length 3 log n. Every vertex of this path represents the same interval
of size 1. The depth of the tree is now 4 log n.
3. Use the Chernoff bound to argue that the number of hashes that
gives the right answer is high enough to ensure that the protocol
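For comparison, the simpler binary-search approach mentioned above (which costs O(log n log log n) bits) is easy to simulate. In the sketch below, the 1-bit inner-product hash and the final verification step that peeks at both inputs are conveniences of this toy simulation, not part of any actual protocol:

```python
import random

random.seed(1)

def prefix_hash(s, length, r):
    # Toy 1-bit inner-product hash of the prefix s[:length]; a real protocol
    # sends O(log log n)-bit hashes so that a union bound over the log n
    # rounds keeps the total error small.
    return sum(si & ri for si, ri in zip(s[:length], r)) % 2

def first_difference(x, y, hash_bits=16):
    # Binary search for the smallest i with x[i] != y[i], comparing
    # hash_bits independent 1-bit hashes of each probed prefix.  The final
    # check (which peeks at both inputs) simply re-runs the search on the
    # rare event of a hash collision; it is only for this demo.
    n = len(x)
    while True:
        lo, hi = 0, n   # invariant: length-lo prefixes agree, length-hi prefixes differ
        while hi - lo > 1:
            mid = (lo + hi) // 2
            rs = [[random.randrange(2) for _ in range(mid)] for _ in range(hash_bits)]
            if all(prefix_hash(x, mid, r) == prefix_hash(y, mid, r) for r in rs):
                lo = mid   # prefixes (probably) agree: difference is later
            else:
                hi = mid   # prefixes certainly differ: difference is earlier
        if x[:lo] == y[:lo] and x[lo] != y[lo]:
            return lo

x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 1, 0, 0, 0, 1, 1]
print(first_difference(x, y))
```

The exercise's tree-based protocol improves on this by tolerating occasional hash errors and backtracking, instead of paying for long hashes at every step.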
Exercise 3.3

Show that if the inputs to greater-than are sampled uniformly
and independently, then there is a protocol that communicates only
O(log(1/ε)) bits and has error at most ε under this distribution.
4
Numbers On Foreheads

¹ Chandra et al., 1983

The number-on-forehead model¹ of communication is one way
to generalize the case of two party communication to the multiparty
setting. There are k parties communicating, and the i'th party has an
input drawn from the set X_i written on their forehead. Each party can
see all of the inputs except the one that is written on their own forehead.
The fact that each party can see most of the inputs means that the parties
do not need to communicate as much. It also means that proving
lower bounds against this model is particularly hard. Indeed, we
do not yet know how to prove optimal lower bounds in this model
of computation, in stark contrast to models where the inputs are
completely private.

When there are only k = 2 parties, this model is identical to the model of 2 party communication. Moreover, optimal lower bounds in this model would have very interesting consequences for the study of circuit complexity.

We start with some examples of clever number-on-forehead protocols.
Intersection size. Suppose there are k parties, and the i'th party has a
subset X_i ⊆ [n] on their forehead. The parties want to compute
the size of the intersection ∩_i X_i. We shall describe a protocol² that
requires only O(k⁴n/2^k) bits of communication. A protocol solving this problem would compute both the disjointness function and the inner product function.

² Grolmusz, 1998; and Babai et al., 2003

We start by describing a protocol that requires only k² log n bits
of communication, as long as n < \binom{k}{k/2}. It is helpful to think of the
input as a k × n boolean matrix. Each of the parties knows all but
one row of this matrix, and they wish to compute the number of
all-1's columns. In Chapter 5, we prove that at least n/4^k bits of communication are required.
Let C_{i,j} denote the number of columns containing
j 1's that are visible to the i'th party. The parties compute and
announce the values of C_{i,j}, for each i, j. The communication of the
protocol is at most k² log n bits. Let A_j denote the actual number of
columns with j ones in them.

Claim 4.1. If there are two valid solutions A_k, . . . , A_0 and A'_k, . . . , A'_0
that are both consistent with the values C_{i,j}, then either A'_k = A_k, or for
each j, |A_j − A'_j| ≥ \binom{k}{j}.
as required.
Claim 4.1 implies that if n < \binom{k}{k/2} ≈ 2^k/√k, then there can
only be one solution for A_k, since |A_{k/2} − A'_{k/2}| cannot exceed
n. To get the final protocol, the parties divide the columns of
the matrix into blocks of size at most \binom{k}{k/2}, and compute A_k for
each such block separately. The total communication is then
\[
\frac{n \cdot k^2 \log \binom{k}{k/2}}{\binom{k}{k/2}} = O(k^4 n / 2^k).
\]
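A minimal simulation of the counting step (an illustrative sketch with hypothetical variable names, not the book's notation): each party tallies its visible column counts C[i][j], the counts are aggregated, and, as Claim 4.1 guarantees, when n < \binom{k}{k/2} only one consistent solution survives:

```python
import random
from math import comb

random.seed(2)
k, n = 6, 10
assert n < comb(k, k // 2)          # n < binom(k, k/2) = 20, so A_k is forced
M = [[random.randrange(2) for _ in range(n)] for _ in range(k)]

# Party i sees every row except row i; C[i][j] = number of columns in
# which party i sees exactly j ones.
C = [[0] * k for _ in range(k)]
for i in range(k):
    for c in range(n):
        C[i][sum(M[r][c] for r in range(k) if r != i)] += 1

# Announcing all C[i][j] costs k^2 log n bits.  Summing over i gives
# B[j] = (k - j) * A[j] + (j + 1) * A[j + 1], where A[w] is the true
# number of columns with exactly w ones.
B = [sum(C[i][j] for i in range(k)) for j in range(k)]

# Every consistent solution is determined by its value A[0]; enumerate
# all candidates that are non-negative integers summing to n.
solutions = []
for a0 in range(n + 1):
    A, ok = [a0], True
    for j in range(k):
        nxt, rem = divmod(B[j] - (k - j) * A[j], j + 1)
        if rem or nxt < 0:
            ok = False
            break
        A.append(nxt)
    if ok and sum(A) == n:
        solutions.append(A)

true_A = [0] * (k + 1)
for c in range(n):
    true_A[sum(M[r][c] for r in range(k))] += 1

print(solutions)
```

Two differing solutions of the recurrence must differ by ±Δ·\binom{k}{j} in coordinate j, which is impossible here since \binom{6}{3} = 20 > n.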
\[
a + 2b + a = 2(a + b)
\Rightarrow a + 2b - W(a + 2b) + a - W(a) = 2(a + b - W(a + b)),
\]
as required.
Cylinder Intersections

A cylinder intersection is a subset S ⊆ X_1 × · · · × X_k whose indicator function can be written as
\[
\chi_S(x_1, \dots, x_k) = \prod_{i=1}^k \chi_i(x_1, \dots, x_k),
\]
where χ_i is a boolean function that does not depend on the i'th input.
protocol that computes this function using O(√(log n)) bits of commu-
nication. Here we show that Ω(log log log n) bits of communication
are required⁵.

⁵ Chandra et al., 1983
Let c_n be the communication complexity of the exactly-n problem. Three points
of [n] × [n] form a corner if they are of the form (x, y), (x + d, y), (x, y + d).
A coloring of [n] × [n] with C colors is a function g : [n] × [n] → [C].
We say that the coloring avoids monochromatic corners if there is
no corner with g(x, y) = g(x + d, y) = g(x, y + d). Let C_n be the
minimum number of colors required to avoid monochromatic corners
in any coloring of [n] × [n]. We claim that C_n essentially captures the
value of c_n:
Next we prove⁶:

Theorem 4.7. C_n ≥ Ω(log log n / log log log n).

We shall prove by induction that as long as C > 3, if n ≥ 2^{C^{2r}},
then any coloring of [n] × [n] with C colors must contain either a
monochromatic corner, or a rainbow corner with r colors. When
r = C + 1, this means that if n ≥ 2^{C^{2(C+1)}}, [n] × [n] must contain a
monochromatic corner, proving that C_n ≥ Ω(log log n / log log log n).
(If monochromatic corners are avoided, we must have 2(C_n + 1) log C_n ≥ log log n, which cannot happen if C_n = o(log log n / log log log n).)

For the base case, when r = 2, n = 4, two of the points of the type
(x, n − x) must have the same color. If (x, n − x) and (x′, n − x′) have
the same color, with x > x′, then (x′, n − x), (x, n − x), (x′, n − x′) are
either a monochromatic corner, or a rainbow corner with 2 colors.
For the inductive step, if n = 2^{C^{2r}}, then [n] contains m = 2^{C^{2r} - C^{2(r-1)}}
consecutive disjoint intervals: [n] = I_1 ∪ I_2 ∪ . . . ∪ I_m, each of size
exactly 2^{C^{2(r-1)}}. By induction, each of the sets I_j × I_j must have either
a monochromatic corner, or a rainbow corner with r − 1 colors. If
one of them has a monochromatic corner, we are done, so suppose
they all have rainbow corners with r − 1 colors. Since a rainbow
corner is specified by choosing the center, choosing the colors and
choosing the offsets for each color, there are at most
\[
\left(2^{C^{2(r-1)}}\right)^2 \cdot 2^C \cdot \left(2^{C^{2(r-1)}}\right)^C
= 2^{2C^{2(r-1)} + C + C^{2r-1}} < 2^{C^{2r} - C^{2(r-1)}} = m
\]
rainbow corners in each interval, so there must be j < j′ that
have exactly the same rainbow corner with the same coloring. Then
we see (Figure 4.9) that these two rainbow corners must induce a
monochromatic corner centered in the box I_j × I_{j′}, or a rainbow corner
with r colors.

Figure 4.9: A rainbow-corner induced by two smaller rainbow corners.
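The definitions above can be checked by brute force on very small grids. The sketch below is a toy, feasible only for tiny n, and letting the offset d be negative as well as positive is a convention choice of this demo:

```python
from itertools import product

def has_mono_corner(coloring, n):
    # coloring: dict (x, y) -> color on [n] x [n].  A corner is
    # (x, y), (x + d, y), (x, y + d) with d != 0.
    for x in range(n):
        for y in range(n):
            for d in range(-n + 1, n):
                if d and 0 <= x + d < n and 0 <= y + d < n:
                    if coloring[x, y] == coloring[x + d, y] == coloring[x, y + d]:
                        return True
    return False

def min_colors(n):
    # Smallest C such that some C-coloring of [n] x [n] avoids
    # monochromatic corners (exhaustive search).
    C = 1
    while True:
        cells = [(x, y) for x in range(n) for y in range(n)]
        for colors in product(range(C), repeat=len(cells)):
            if not has_mono_corner(dict(zip(cells, colors)), n):
                return C
        C += 1

print([min_colors(n) for n in (1, 2)])
```

Theorem 4.7 says this quantity grows with n, but only extremely slowly, which is why the brute force above is hopeless beyond toy sizes.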
Exercise 4.1

Define the generalized inner product function GIP as follows. Here
each of the k players is given a binary string x_i ∈ {0, 1}^n. They want
to compute GIP(x) = ∑_{j=1}^n ∏_{i=1}^k x_{i,j} (mod 2).
This exercise outlines a number-on-forehead GIP protocol using
O(n/2^k + k) bits. It will be convenient to think about the input X as a
k × n matrix with rows corresponding to x_1, . . . , x_k.

Each vector x_i can be interpreted as a subset of [n]. Our set intersection protocol computes GIP with O(k⁴n/2^k) bits. This improved protocol, by A. Chattopadhyay, slightly improves a famous protocol of V. Grolmusz.
• Fix z ∈ {0, 1}^n. Assume the first t coordinates of z are ones and
the rest are zeros. For ℓ ∈ {0, 1, . . . , k − 1} define c_ℓ as the number
of columns in X with ℓ ones, followed by either a one or zero,
followed by k − ℓ − 1 zeros. Note that GIP(x) = c_k (mod 2). Find
a protocol to compute GIP(x) using O(k) bits assuming the players
know c_t (mod 2).
• Exhibit an overall protocol for GIP by showing that the players can
agree upon a vector z and communicate to determine c_t (mod 2)
using O(n/2^k + k) bits.

One can extend this protocol to compute any function of the number of all-ones columns using O(n/2^k + k log n) bits.
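The function itself is easy to compute centrally. The sketch below (a toy check, not a protocol) verifies the identification used throughout this chapter: GIP(x) is the parity of the intersection size when the rows are read as indicator vectors of sets:

```python
import random
from functools import reduce

random.seed(3)

def gip(X):
    # GIP(x) = sum over columns j of the AND of column j, mod 2;
    # i.e. the parity of the number of all-ones columns.
    k, n = len(X), len(X[0])
    return sum(reduce(lambda a, b: a & b, (X[i][j] for i in range(k)))
               for j in range(n)) % 2

k, n = 4, 12
X = [[random.randrange(2) for _ in range(n)] for _ in range(k)]

# Rows as sets: GIP equals |X_1 ∩ ... ∩ X_k| mod 2.
sets = [{j for j in range(n) if X[i][j]} for i in range(k)]
inter = set.intersection(*sets)
print(gip(X), len(inter) % 2)
```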
Exercise 4.2

Show that lim_{n→∞} c_n = ∞.
Exercise 4.3

A three player NOF puzzle demonstrates that unexpected effi-
ciency is sometimes possible.

Inputs: Alice has a number i ∈ [n] on her forehead, Bob has a
number j ∈ [n] on his forehead, and Charlie has a string x ∈ {0, 1}^n
on his forehead.

Output: On input (i, j, x) the goal is for Charlie to output the bit x_k
where k = i + j (mod n).

Question: Find a deterministic protocol such that Bob sends one bit
to Charlie, and Alice sends ⌊n/2⌋ bits to Charlie. Alice and Bob must
Exercise 4.4

Show that any degree d polynomial over F_2 in the variables
x_1, . . . , x_n can be computed by d + 1 players with O(d) bits of number-
on-forehead communication, for any partition of the inputs where
each party has n/(d + 1) bits on their forehead. (You may assume
d + 1 divides n exactly.)
5
Discrepancy

\[
2^c \ge \frac{1 - 2\epsilon}{\max_R \mathbb{E}_{x,y}\left[\chi_R(x, y) \cdot (-1)^{g(x,y)}\right]}.
\]
Lemma 5.3. Every n-vertex graph with ε\binom{n}{2} edges has at least (εn − 1)⁴/4
4-cycles.

Proof. Let 1_{x,y} be 1 when there is an edge between the vertices x and
y, and 0 otherwise. Then if x, x′, y, y′ are chosen uniformly at random,

Figure 5.1: A dense graph with no 3-cycles.
We can use similar ideas to prove that every dense bipartite graph
must contain a reasonably large bipartite clique. Next we show a
slightly different way to prove this:

So if d_i is the degree of the i'th vertex, the expected size of the set R
is at least
\[
\sum_{i=1}^{n} \left(\frac{d_i}{en}\right)^{\frac{\log n}{2\log(e/\epsilon)}}
\ge n \cdot \left(\frac{1}{n}\sum_{i=1}^{n} \frac{d_i}{en}\right)^{\frac{\log n}{2\log(e/\epsilon)}}
\ge n \cdot \left(\frac{\epsilon}{e}\right)^{\frac{\log n}{2\log(e/\epsilon)}}
= \sqrt{n},
\]
by convexity.
Say Alice and Bob are given x, y ∈ {0, 1}^n and want to compute
⟨x, y⟩ mod 2. We have seen that this requires n + 1 bits of communi-
cation using a deterministic protocol. Here we show that it requires
≈ n/2 bits of communication even using a randomized protocol.
Lemma 5.5. For any rectangle R, the discrepancy of R with respect to the
inner product is at most 2^{-n/2}.

Theorem 5.6. Any 2-party protocol that computes the inner-product with
error at most ε over the uniform distribution must have communication at
least n/2 − log(1/(1 − 2ε)).
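Lemma 5.5 can be verified exhaustively for a tiny n by maximizing over all rectangles A × B. This brute force is an illustration, not the proof; it is only feasible because there are just 2^{2^n} choices for each side:

```python
from itertools import chain, combinations, product

n = 3
cube = list(product([0, 1], repeat=n))
N = len(cube)

def ip(x, y):
    # <x, y> mod 2
    return sum(a & b for a, b in zip(x, y)) % 2

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# disc(R) = | E_{x,y uniform} [ chi_R(x, y) * (-1)^{<x,y>} ] | for R = A x B.
# Lemma 5.5 says this is at most 2^{-n/2} for every rectangle.
best = 0.0
for A in subsets(cube):
    col = {y: sum((-1) ** ip(x, y) for x in A) for y in cube}
    for B in subsets(cube):
        s = sum(col[y] for y in B)
        best = max(best, abs(s) / N ** 2)

print(best, 2 ** (-n / 2))
```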
where the inequality follows from the fact that E[Z]² ≤ E[Z²] for
any real valued random variable Z. Now we can drop χ_k(x) from
this expression to get:
\[
\begin{aligned}
\mathbb{E}_x\left[\chi_S(x) \cdot (-1)^{\mathrm{GIP}(x)}\right]^2
&\le \mathbb{E}_{x_1,\dots,x_{k-1}}\left[\mathbb{E}_{x_k}\left[\prod_{i=1}^{k-1}\chi_i(x) \cdot (-1)^{\mathrm{GIP}(x)}\right]^2\right] \\
&= \mathbb{E}_{x_1,\dots,x_k,x_k'}\left[\prod_{i=1}^{k-1}\chi_i(x)\chi_i(x') \cdot (-1)^{\sum_{j=1}^n (x_{k,j}+x_{k,j}')\prod_{i=1}^{k-1}x_{i,j}}\right]
\end{aligned}
\]
Theorem 5.8. Any randomized protocol for computing the generalized inner
product in the number-on-forehead model with error ε requires n/4^{k-1} −
log(1/(1 − 2ε)) bits of communication.
At first it may seem that the discrepancy method is not very useful
for proving lower bounds against functions like disjointness, which
do have large monochromatic rectangles.

Suppose Alice and Bob are given two sets X, Y ⊆ [n] and want to
compute disjointness. If we use a distribution on inputs that gives
intersecting sets with probability at most ε, then there is a trivial
protocol with error at most ε. On the other hand, if the probability of
intersection is at least ε, then there must be some fixed coordi-
nate i such that an intersection occurs in coordinate i with probability
at least ε/n. Setting R = {(X, Y) : i ∈ X, i ∈ Y}, we get
\[
\mathbb{E}\left[\chi_R(X, Y) \cdot (-1)^{\mathrm{Disj}(X,Y)}\right] \ge \epsilon/n,
\]
\[
\begin{aligned}
\mathbb{E}\left[\chi_R(X,Y) \cdot (-1)^{\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i)}\right]^2
&= \mathbb{E}\left[A(X) \cdot B(Y) \cdot (-1)^{\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i)}\right]^2 \\
&\le \mathbb{E}\left[A(X)^2\right] \cdot \mathbb{E}_X\left[\mathbb{E}_Y\left[B(Y) \cdot (-1)^{\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i)}\right]^2\right] \\
&\le \mathbb{E}_{X,Y,Y'}\left[B(Y)B(Y') \cdot (-1)^{\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i)+\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i')}\right] \\
&\le \mathbb{E}_{Y,Y'}\left[\left|\mathbb{E}_X\left[(-1)^{\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i)+\sum_{i=1}^m \mathrm{Disj}(X_i,Y_i')}\right]\right|\right]
\end{aligned}
\]
Lemma 5.9 may not seem useful at first, because under the given
distribution, the probability that X, Y are disjoint is 2^{-m}. However,
we can actually use it to give a linear lower bound on the communi-
cation of deterministic protocols. Suppose a deterministic protocol
for disjointness has communication c. Then there must be at most 2^c
monochromatic 1-rectangles R_1, . . . , R_t that cover all the 1's. When-
ever X, Y are disjoint, we have that ∑_{j=1}^m Disj(X_j, Y_j) = m. On the
other hand, the probability that X, Y are disjoint is exactly 2^{-m}. Thus,
we get
\[
2^{-m} \le \mathbb{E}\left[\sum_{i=1}^t \chi_{R_i}(X,Y) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(X_j,Y_j)}\right]
\le \sum_{i=1}^t \mathbb{E}\left[\chi_{R_i}(X,Y) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(X_j,Y_j)}\right]
\le 2^c \cdot \prod_{j=1}^m \frac{1}{\sqrt{|I_j|}}.
\]
\[
\mathbb{E}\left[\chi_S(X) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(X_{1,j},\dots,X_{k,j})}\right]
\le \prod_{j=1}^m \frac{2^{k-1}-1}{\sqrt{|I_j|}}.
\]
\[
\begin{aligned}
\mathbb{E}\left[\chi_S(X) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(X_{1,j},\dots,X_{k,j})}\right]^2
&\le \mathbb{E}_{X_1,\dots,X_{k-1}}\left[\chi_k(X)^2 \cdot \mathbb{E}_{X_k}\left[\prod_{i=1}^{k-1}\chi_i(X) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(T_j)}\right]^2\right] \\
&\le \mathbb{E}_{X_1,\dots,X_{k-1}}\left[\mathbb{E}_{X_k}\left[\prod_{i=1}^{k-1}\chi_i(X) \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(T_j)}\right]^2\right] \\
&= \mathbb{E}_{X_1,\dots,X_{k-1},X_k,X_k'}\left[\prod_{i=1}^{k-1}\chi_i(X)\chi_i(X') \cdot (-1)^{\sum_{j=1}^m \mathrm{Disj}(T_j)+\mathrm{Disj}(T_j')}\right], \quad (5.2)
\end{aligned}
\]
Then we get:
\[
(5.2) \le \mathbb{E}\left[\prod_{j=1}^m Z_j\right] \le \prod_{j=1}^m \mathbb{E}\left[Z_j\right],
\]
Proof.
\[
\mathbb{E}\left[\frac{1}{|Q|}\right]
= \sum_{Q,v} \frac{(1/|I_j|) \cdot \gamma^{|Q|-1}(1-\gamma)^{|I_j|-|Q|}}{|Q|}
= \frac{1}{\gamma|I_j|}\sum_{Q\ne\emptyset} \gamma^{|Q|}(1-\gamma)^{|I_j|-|Q|}
\le \frac{(1-\gamma+\gamma)^{|I_j|}}{\gamma|I_j|} = \frac{1}{\gamma|I_j|}.
\]
\[
Z_j = \frac{(2^{k-2}-1)^2}{\sqrt{|X_{k,j}\setminus X_{k,j}'| \cdot |X_{k,j}'\setminus X_{k,j}|}}
\le \frac{(2^{k-2}-1)^2}{2} \cdot \left(\frac{1}{|X_{k,j}\setminus X_{k,j}'|} + \frac{1}{|X_{k,j}'\setminus X_{k,j}|}\right),
\]
by the arithmetic mean - geometric mean inequality: √(ab) ≤ (a + b)/2.
\[
\mathbb{E}\left[Z_j\right] \le \Pr[v = v'] + \mathbb{E}\left[\frac{(2^{k-2}-1)^2}{|X_{k,j}\setminus X_{k,j}'|}\right]
\le \frac{2^{k-1}-1}{|I_j|} + \frac{2(2^{k-1}-1)(2^{k-2}-1)^2}{(2^{k-2}-1)|I_j|}
= \frac{(2^{k-1}-1)^2}{|I_j|},
\]
as required.
¹ Shannon, 1948

Shannon's seminal work on information theory¹ has had a big im-
pact on communication complexity. Shannon wanted to measure the
amount of information (or entropy) contained in a random variable
X. Shannon's definition was motivated by the observation that the
amount of information contained in a message is not the same as the
length of the message. Suppose we are working in the distributional
setting, where the inputs are sampled from some distribution µ.
• Consider a protocol where Alice's first message to Bob is a c-bit
string that is always 0^c, no matter what her input is. This message
does not convey any information to Bob. We might as well run the
protocol imagining that this first message has already been sent,
and so reduce the communication of the first step to 0. The entropy of the message is 0.

• Consider a protocol where Alice's first message to Bob is a random
string from a set S ⊆ {0, 1}^c, with |S| ≪ 2^c. In this case, the
parties should use log |S| bits to index the elements of the set,
reducing the communication from c to log |S|. The entropy of the message is log |S|.
Figure 6.1: The entropy of a bit with p(1) = ε.
\[
H(X) = \sum_x p(x) \log\frac{1}{p(x)} = \mathbb{E}_{p(x)}\left[\log\frac{1}{p(x)}\right].
\]
so some vertex will be available at the j’th step. This ensures that
every step of this process succeeds.
Conversely, suppose X can be encoded in such a way that i is
encoded using ℓ_i bits. Then the expected length of the encoding is:
\[
\mathbb{E}_{p(i)}\left[\ell_i\right]
= \mathbb{E}_{p(i)}\left[\log(1/p(i))\right] - \mathbb{E}_{p(i)}\left[\log\left(2^{-\ell_i}/p(i)\right)\right]
\ge H(X) - \log\left(\mathbb{E}_{p(i)}\left[2^{-\ell_i}/p(i)\right]\right)
= H(X) - \log\left(\sum_i 2^{-\ell_i}\right).
\]
By convexity of the log function, E[log Y] ≤ log(E[Y]).

If you pick a random path starting from the root in the protocol tree,
you hit the leaf encoding i with probability 2^{-ℓ_i}. The probability
that you hit one of the leaves encoding a number from [n] is thus
∑_i 2^{-ℓ_i} ≤ 1. Thus log(∑_i 2^{-ℓ_i}) is at most 0, and the entropy is at
most the expected length of the encoding.
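A small numeric check of this correspondence, on an arbitrary toy distribution: Shannon code lengths ℓ_i = ⌈log(1/p_i)⌉ satisfy Kraft's inequality ∑_i 2^{-ℓ_i} ≤ 1, so a prefix-free code with these lengths exists, and its expected length sits between H(X) and H(X) + 1:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]    # toy distribution on four symbols

# Entropy H(X) = sum p(x) log(1/p(x))
H = sum(pi * math.log2(1 / pi) for pi in p)

# Shannon code lengths l_i = ceil(log 1/p_i)
lengths = [math.ceil(math.log2(1 / pi)) for pi in p]
kraft = sum(2.0 ** -l for l in lengths)               # <= 1: a prefix code exists
expected = sum(pi * l for pi, l in zip(p, lengths))   # H <= expected < H + 1

print(H, expected, kraft)
```

For this dyadic distribution the code is optimal and the expected length equals the entropy exactly.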
Fact 6.2. D(p(x)‖q(x)) ≥ 0.

Proof.
\[
\mathsf{D}(p(x)\|q(x)) = \mathbb{E}_{p(x)}\left[\log\frac{p(x)}{q(x)}\right]
= -\sum_x p(x)\log\frac{q(x)}{p(x)}
\ge -\log\sum_x p(x)\frac{q(x)}{p(x)} = -\log 1 = 0.
\]
The inequality follows from the convexity of the log function.
information of any random variable with itself is the same as its en-
tropy: I(A : A) = H(A). On the other hand, if A, B are independent,
I(A : B) = 0. In general, the mutual information is always a num-
ber between these two quantities: 0 ≤ I(A : B) ≤ H(A). The first
inequality follows from Fact 6.2, and the second by observing:
\[
H(A) - I(A : B)
= \mathbb{E}_{p(a,b)}\left[\log\frac{1}{p(a)} - \log\frac{p(a,b)}{p(a)p(b)}\right]
= \mathbb{E}_{p(a,b)}\left[\log\frac{p(b)}{p(a,b)}\right] \ge 0.
\]
Chain Rules

Chain rules allow one to relate bounds on the information of a
collection of random variables to the information associated with
each variable. Suppose p(a, b) and q(a, b) are two distributions. Then
we have
\[
\mathsf{D}(p(a,b)\|q(a,b))
= \mathbb{E}_{p(a,b)}\left[\log\frac{p(a)\cdot p(b|a)}{q(a)\cdot q(b|a)}\right]
= \mathbb{E}_{p(a,b)}\left[\log\frac{p(a)}{q(a)}\right] + \mathbb{E}_{p(a,b)}\left[\log\frac{p(b|a)}{q(b|a)}\right]
= \mathsf{D}(p(a)\|q(a)) + \mathbb{E}_{p(a)}\left[\mathsf{D}(p(b|a)\|q(b|a))\right].
\]
In words, the total divergence is the sum of the divergence from the
first variable, plus the expected divergence from the second variable.
Similar chain rules hold for the entropy and mutual informa-
tion. Suppose A, B are two random bits that are always equal. Then
H(AB) = 1 ≠ H(A) + H(B), so the entropy does not add in general.
Nevertheless, a chain rule does exist for entropy. Denote
\[
H(B | A) = \mathbb{E}_{p(a,b)}\left[\log\frac{1}{p(b|a)}\right].
\]
Then we have the chain rule²: H(AB) = H(A) + H(B | A).

² H(AB) = E_{p(a,b)}[log(1/(p(a)p(b|a)))] = E_{p(a,b)}[log(1/p(a)) + log(1/p(b|a))] = H(A) + H(B | A).

Suppose A, B, C are three random bits that are all equal to each
other. Then I(AB : C) = 1 < 2 = I(A : C) + I(B : C). On the other
hand, if A, B, C are three random bits satisfying A + B + C = 0 mod 2,
we have I(AB : C) = 1 > 0 = I(A : C) + I(B : C). Nevertheless, a
chain rule does hold for mutual information, after we use the right
definition. Denote:
\[
I(B : C | A) = \mathbb{E}_{p(a,b,c)}\left[\log\frac{p(b,c|a)}{p(b|a)p(c|a)}\right].
\]
Then we have the chain rule³: I(AB : C) = I(A : C) + I(B : C | A).

³ I(AB : C) = E_{p(a,b,c)}[log((p(a,c) · p(b|a,c))/(p(a)p(b|a) · p(c)))] = I(A : C) + E_{p(a,b,c)}[log(p(b|a,c)/p(b|a))] = I(A : C) + E_{p(a,b,c)}[log(p(b,c|a)/(p(b|a) · p(c|a)))] = I(A : C) + I(B : C | A).
Subadditivity

Each of the definitions we have seen so far satisfies the property that
conditioning on variables can either only increase the quantity or
only decrease the quantity, a property that we loosely refer to as
subadditivity. We start with the divergence. Suppose p(a, b), q(a, b) are
two distributions. Then:
\[
\mathbb{E}_{p(b)}\left[\mathsf{D}(p(a|b)\|q(a))\right]
= \mathbb{E}_{p(a,b)}\left[\log\frac{p(a)}{q(a)} + \log\frac{p(a|b)}{p(a)}\right]
= \mathsf{D}(p(a)\|q(a)) + I(A : B) \ge \mathsf{D}(p(a)\|q(a)).
\]
One consequence of this last inequality is:

Fact 6.4. If q(x_1, . . . , x_n) is a product distribution, then for any p,
\[
\mathsf{D}(p(x_1,\dots,x_n)\|q(x_1,\dots,x_n)) \ge \sum_{i=1}^n \mathsf{D}(p(x_i)\|q(x_i)).
\]
Indeed,
\[
\mathsf{D}(p(x_1,\dots,x_n)\|q(x_1,\dots,x_n))
= \sum_{i=1}^n \mathbb{E}_{p(x_{<i})}\left[\mathsf{D}(p(x_i|x_{<i})\|q(x_i|x_{<i}))\right]
= \sum_{i=1}^n \mathbb{E}_{p(x_{<i})}\left[\mathsf{D}(p(x_i|x_{<i})\|q(x_i))\right]
\ge \sum_{i=1}^n \mathsf{D}(p(x_i)\|q(x_i)).
\]

This last inequality also implies that
\[
H(AB) = H(A) + H(B) - I(A : B) \le H(A) + H(B),
\]
and
\[
H(A) \ge H(AB) - H(B) = H(A | B).
\]

We have already seen that conditioning on a random variable can
either decrease or increase the mutual information. Nevertheless,
when A, B are independent, we can prove⁴:
\[
I(AB : C) \ge I(A : C) + I(B : C).
\]

⁴ I(AB : C) − I(A : C) − I(B : C) = I(B : C | A) − I(B : C) = H(B | A) − H(B | AC) − H(B) + H(B | C) ≥ 0, since H(B | AC) ≤ H(B | C), and H(B | A) = H(B).

Shearer's Inequality

A useful consequence of subadditivity is Shearer's inequality:

Lemma 6.5. Suppose X = X_1, . . . , X_n is a random variable and S ⊆ [n] is a
set sampled independently of X. Then if p(i ∈ S) ≥ ε for every i ∈ [n], we
have H(X_S | S) ≥ ε · H(X).

Proof. Suppose S = {a, b, c}, with a < b < c. Then we can express
\[
H(X_S) = H(X_a) + H(X_b | X_a) + H(X_c | X_a, X_b)
\ge H(X_a | X_{<a}) + H(X_b | X_{<b}) + H(X_c | X_{<c}),
\]
by subadditivity. In general, we get that
\[
H(X_S | S) \ge \mathbb{E}_S\left[\sum_{i\in S} H(X_i | X_{<i})\right]
= \sum_{i=1}^n p(i \in S) H(X_i | X_{<i}) \ge \epsilon \cdot H(X).
\]
Pinsker’s Inequality
Pinsker’s inequality bounds the statistical distance between two
distributions in terms of the divergence between them.
Lemma 6.6. D(p(x)‖q(x)) ≥ (2/ln 2) · |p − q|².

See the notational remarks in the Probability section of the Conventions Chapter.

Proof. Let T be the set that maximizes p(T) − q(T), and define
\[
x_T = \begin{cases} 1 & \text{if } x \in T, \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
\mathsf{D}(p(x)\|q(x)) \ge \mathsf{D}(p(x_T)\|q(x_T))
\ge \frac{2}{\ln 2}\cdot\left(p(x_T = 1) - q(x_T = 1)\right)^2
= \frac{2}{\ln 2}\cdot|p - q|^2.
\]
The first inequality follows from the chain rule for divergence. It
only remains to prove the second inequality. Suppose p(x_T = 1) =
ε ≥ q(x_T = 1) = γ. Then we shall show that
\[
\epsilon\log\frac{\epsilon}{\gamma} + (1-\epsilon)\log\frac{1-\epsilon}{1-\gamma} - \frac{2}{\ln 2}\cdot(\epsilon - \gamma)^2 \qquad (6.1)
\]
is non-negative.

Figure 6.4: f = ε log(ε/(2/3)) + (1 − ε) log((1 − ε)/(1/3)), g = (2/ln 2)(ε − 2/3)².
\[
\frac{-\epsilon}{\gamma\ln 2} + \frac{1-\epsilon}{(1-\gamma)\ln 2} - \frac{4(\gamma-\epsilon)}{\ln 2}
= \frac{\gamma - \epsilon\gamma - \epsilon + \epsilon\gamma}{\gamma(1-\gamma)\ln 2} - \frac{4(\gamma-\epsilon)}{\ln 2}
= \frac{(\gamma-\epsilon)}{\ln 2}\left(\frac{1}{\gamma(1-\gamma)} - 4\right).
\]
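Lemma 6.6 is easy to sanity-check numerically for Bernoulli distributions; the grid check below is an illustration, not a proof:

```python
import math

def kl_bits(p, q):
    # Divergence (base 2) between Bernoulli(p) and Bernoulli(q).
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)
    return term(p, q) + term(1 - p, 1 - q)

# Pinsker's inequality in bits: D(p || q) >= (2 / ln 2) * |p - q|^2,
# where for two coins the statistical distance is just |p - q|.
ok = True
for i in range(1, 100):
    for j in range(1, 100):
        p, q = i / 100, j / 100
        if kl_bits(p, q) < (2 / math.log(2)) * (p - q) ** 2 - 1e-12:
            ok = False

print(ok)
```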
\[
H(B)/n \ge I(A_1, \dots, A_n : B)/n
\ge (1/n)\sum_{j=1}^n I(A_j : BA_{<j}).
\]
In particular, E[I(A_i : BA_{<i})] ≤ H(B)/n.
Let S be a set of n³ points in R³, and let S_{xy}, S_{yz}, S_{xz} denote the pro-
jections of S onto the xy, yz, xz planes.

Claim 6.9. One of the three projections must have size at least n².
\[
\frac{H(XY) + H(YZ) + H(XZ)}{3} \ge \frac{2}{3}\cdot H(XYZ) = 2\log n,
\]
so one of the three terms must be at least 2 log n, proving that
the projection must be of size at least n².
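Claim 6.9 is easy to test empirically. The sketch below draws random sets of n³ points and confirms that some projection always has size at least n²; the claim guarantees this for every point set, so the random trials are only an illustration:

```python
import random

random.seed(4)

def projection_sizes(S):
    # Sizes of the three coordinate-plane projections of a 3D point set.
    return (len({(x, y) for x, y, z in S}),
            len({(y, z) for x, y, z in S}),
            len({(x, z) for x, y, z in S}))

n = 5
ok = True
for _ in range(20):
    S = set()
    while len(S) < n ** 3:
        S.add((random.randrange(50), random.randrange(50), random.randrange(50)))
    # Shearer: |S| <= sqrt(|Sxy| |Syz| |Sxz|), so max projection >= |S|^(2/3).
    if max(projection_sizes(S)) < n ** 2:
        ok = False

print(ok)
```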
\[
H(XYZ) = H(XY) + H(Z | Y).
\]
To bound H(Z | Y) we use convexity:
\[
H(Z | Y) = \sum_i \frac{d_i}{2m}\log d_i
= \frac{n}{2m}\cdot\sum_i \frac{d_i}{n}\log d_i
\ge \frac{n}{2m}\cdot d\log d = \log d,
\]
since the function x log x is convex.
\[
H(XYZW) = H(XYZ) + H(W | XZ)
\ge H(XYZ) + H(XWZ) - H(XZ),
\]
using subadditivity. Combining this with our bound for H(XYZ), we get that H(XYZW) ≥
log(4m⁴/n⁴). There are some redundant cycles where two of the ver-
tices are the same. After accounting for these, we are left with at least
4m⁴/n⁴ − 4n³ distinct cycles.⁷

⁷ Very similar reasoning can be used to give a bound in the case that the girth is even.

Finally we turn to bounding the girth of a graph. The girth is the
length of the shortest cycle in the graph.
\[
H(X | X_0) = \sum_{i=1}^{(g-1)/2} H(X_i | X_{i-1}).
\]
\[
H(X_i | X_{i-1}) = \sum_v \frac{d_v}{2m}\log(d_v - 1)
= \frac{n}{2m}\cdot\sum_v \frac{d_v}{n}\log(d_v - 1)
\ge \log(d - 1).
\]
\[
H(X | X_0) \ge \frac{g-1}{2}\log(d - 1).
\]
On the other hand, since the girth of the graph is g, the entire path X
is determined by X_0, X_t. So we have log n ≥ H(X | X_0) ≥ ((g − 1)/2) log(d − 1), as required.
with ε = √(ℓ ln 2 / (2n)). Since p(x_i) is uniform for each i, the probability
that Bob makes an error in the i'th coordinate must be at least 1/2 −
|p(x_i|m) − p(x_i)|. So the probability that Bob makes an error is at least
1/2 − √(ℓ ln 2 / (2n)), proving that at least Ω(n) bits must be transmitted if the
protocol has a small probability of error.

Note that if Alice has a random set from a family of sets of size 2^{Ω(n)}, the lower bound for indexing would still hold. The lower bound even extends to the case that Bob knows x_1, . . . , x_{i−1}.
Randomized Communication of Disjointness

One of the triumphs of information theory is its ability to prove
optimal lower bounds on the randomized communication complexity
of functions like disjointness¹³, which we do not know how to prove
in any other way.

¹³ Kalyanasundaram and Schnitger, 1992; Razborov, 1992; Bar-Yossef et al., 2004; and Braverman and Moitra, 2013

This result is especially impactful because many other lower bounds in other models (more in Part II) are consequences of Theorem 6.13.

Theorem 6.13. Any randomized protocol that computes the disjointness
function with error 1/2 − ε must have communication Ω(ε²n).
Proof of Claim 6.14. For any h, m, let α_{hm} be the statistical distance of
p(x_t|hm) from uniform, and let β_{hm} denote the distance of p(y_t|hm)
from uniform. Let

In particular, since
\[
p(\alpha_{hm} \le \gamma \wedge \beta_{hm} \le \gamma)
\ge p(x_t = 0 = y_t) \cdot p(\alpha_{hm} \le \gamma \wedge \beta_{hm} \le \gamma \mid x_t = 0 = y_t)
\ge \frac{1 - 4\gamma}{4},
\]
we get
\[
p(\mathcal{E}) \ge \frac{1 - 4\gamma}{4}\cdot\left(\frac{1}{4} - 2\gamma\right)
= \frac{1}{16} - \frac{3\gamma}{4} + 2\gamma^2,
\]
proving that γ ≥ Ω(1), as required.
\[
A_{x,y} = 1 \text{ if } x, y \text{ are disjoint}, \qquad
A_{x,y} \le 1 - \delta \text{ if } x, y \text{ are not disjoint}.
\]
\[
q(xy) = \frac{A_{x,y}}{\sum_{a,b} A_{a,b}}, \qquad
q(xy|m) = \frac{A(m)_{x,y}}{\sum_{a,b} A(m)_{a,b}}, \qquad
q(m) = \frac{\sum_{a,b} A(m)_{a,b}}{\sum_{a,b} A_{a,b}}.
\]
q(xy|m) is a product distribution, since A(m) has non-negative rank 1: if the rank of A(m) is 1, there must be v_x, v_y ∈ R such that q(xy|m) = v_x · v_y.

Let D denote the event that the sets X, Y sampled in this distribu-
tion are disjoint. Since A_{x,y} = 1 for disjoint sets x, y, q(xy|D) is the
uniform distribution on all pairs of disjoint sets. Let H_i = X_{<i}Y_{>i}.
\[
I(X_i : M | H_i, Y_i, D) + I(Y_i : M | H_i, X_i, D) \ge \Omega(\delta^4).
\]
Before proving Claim 6.17, we show how to use it. By Lemma 6.15,
we get that
\[
2\log r \ge \sum_{i=1}^n I(X_i : M | H_i, Y_i, D) + I(Y_i : M | H_i, X_i, D)
\ge \Omega(\delta^4 n),
\]
proving that r ≥ 2^{Ω(δ⁴n)} as required. Next we turn to proving Claim
6.17.
\[
I(X_i : M | h_i, Y_i, D) + I(Y_i : M | h_i, X_i, D)
\ge \frac{2}{3}\left(I(X_i : M | h_i, Y_i = 0, D) + I(Y_i : M | h_i, X_i = 0, D)\right)
\ge \Omega(\delta^4).
\]
\[
I(X_i : M | h_i, Y_i = 0, D) = \gamma^4,
\]
In this sum, the contribution of the terms for which α_m > γ is at most
\[
\sum_{m:\alpha_m > \gamma} p(m, x_i = 0 = y_i) \le \sum_{m:\alpha_m > \gamma} p(m | y_i = 0)
\]
Randomized Pointer-Chasing

In the k step pointer-chasing problem, the input is a directed graph
on the vertex set [2n], where every vertex has exactly one edge com-
ing out of it. Let 1 = z_0, z_1, z_2, . . . , z_k be the path of length k starting
at the first vertex. However, Alice only knows all of the edges that
originate in the vertices A = {1, . . . , n}, and Bob knows all of the
edges that originate at the vertices B = {n + 1, . . . , 2n}. The goal of the
parties is to output whether or not z_k is even¹⁵.

Figure 6.7: An example of an input to pointer chasing, with n = 8, k = 7.

There is an obvious deterministic protocol that takes k rounds
and k log n bits of communication: in each step one of the players
announces z_1, z_2, . . . , z_k. There is a randomized protocol with k − 1
rounds and O((k + n/k) log n) bits of communication. In the first
step, Alice and Bob use shared randomness to pick 10n/k vertices
in the graph and announce the edges that originate at these vertices.
Alice and Bob then continue to use the deterministic protocol, but
do not communicate if one of the edges they need has already been
announced. In expectation, this protocol will have k + 1 − 10 rounds¹⁶.
We shall prove that any randomized or deterministic protocol with
k − 1 rounds must have much more communication.

Let the graph be sampled uniformly, subject to the constraint that
every edge goes either from A → B or from B → A. Let m_{≤k−1}
denote the first k − 1 messages of a protocol whose communication
complexity is ℓ. The key idea here is quite similar to the lower bound
Alice sends the i + 1'st message. In this case, fixing r_{i−1} leaves m_i and z_i
independent. Pick m_i by greedily setting each bit of m_i in such a
way that the probability of that bit is maximized conditioned on
r_{i−1} and all previous bits. This ensures that p(m_i|r_{i−1}) ≥ 2^{−|m_i|}.

To choose z_i, define
\[
B_1 = \{z_0, z_1, \dots, z_{i-1}\},
\]
\[
B_2 = \left\{ j : \frac{p(x_j | m_i, r_{i-1})}{p(x_j)} > 4\cdot\frac{\ell+k}{n} \right\},
\]
\[
B_3 = \left\{ j : \frac{p(Z_i = j | r_{i-1})}{p(Z_i = j | z_{\le i-1})} < 1/2 \right\}.
\]
We shall prove:
\[
|B_2 - B_1|\cdot 4\cdot\frac{\ell+k}{n}
\le \sum_{j\in B_2 - B_1} \frac{p(x_j | m_i, r_{i-1})}{p(x_j)}
= \sum_{j\in B_2 - B_1} \frac{p(x_j | m_{\le i}, z_{\le i-1})}{p(x_j | z_{\le i-1})},
\]
since x_j is independent of z_{≤i−1} for all j ∉ B_1.
\[
p(m_{\le i} | z_{\le i-1}) = p(m_i | r_{i-1})\cdot p(m_{\le i-1} | z_{\le i-1})
\ge 2^{-|m_i|}\cdot 2^{-|m_{\le i-1}|-i+1} \ge 2^{-\ell-k},
\]
\[
p(m_{\le i} | z_{\le i}) = p(m_i | r_{i-1})\cdot p(m_{\le i-1} | z_{\le i})
\ge 2^{-|m_i|}\cdot \frac{p(z_i | m_{\le i-1}, z_{\le i-1})}{p(z_i | z_{\le i-1})}\cdot p(m_{\le i-1} | z_{\le i-1})
\ge 2^{-|m_{\le i}|-i}\cdot(1/2) = 2^{-|m_{\le i}|-(i+1)},
\]
by the choice of m_i, and the fact that z_i ∉ B_3. Here we used
\[
p(m_{\le i-1} | z_{\le i}) = \frac{p(m_{\le i-1}, z_i | z_{\le i-1})}{p(z_i | z_{\le i-1})}
= \frac{p(z_i | m_{\le i-1}, z_{\le i-1})}{p(z_i | z_{\le i-1})}\cdot p(m_{\le i-1} | z_{\le i-1}).
\]
Bob sends the i + 1'st message. In this case, we pick z_i first. Define the
sets:
\[
B_1 = \{z_0, z_1, \dots, z_{i-1}\},
\]
\[
B_2 = \left\{ j : \frac{p(x_j | r_{i-1})}{p(x_j)} > 4\cdot\frac{\ell+k}{n} \right\},
\]
\[
B_3 = \left\{ j : \frac{p(Z_i = j | r_{i-1})}{p(Z_i = j | z_{\le i-1})} < 1/2 \right\}.
\]
\[
|B_2 - B_1|\cdot 4\cdot\frac{\ell+k}{n}
\le \sum_{j\in B_2 - B_1} \frac{p(x_j | r_{i-1})}{p(x_j)}
= \sum_{j\in B_2 - B_1} \frac{p(x_j | m_{\le i-1}, z_{\le i-1})}{p(x_j | z_{\le i-1})},
\]
since x_j is independent of z_{≤i−1} for all j ∉ B_1.
\[
p(m_{\le i} | z_{\le i}) \ge p(m_i | r_{i-1}, z_i)\cdot p(m_{\le i-1} | z_{\le i})
\ge 2^{-|m_i|}\cdot \frac{p(z_i | r_{i-1})}{p(z_i | z_{\le i-1})}\cdot p(m_{\le i-1} | z_{\le i-1})
\ge 2^{-|m_{\le i}|-(i+1)},
\]
as required.
Exercise 6.1

Show that for any two joint distributions p(x, y), q(x, y) with the same
support, we have
\[
\mathbb{E}_{p(y)}\left[\mathsf{D}(p(x|y)\|p(x))\right] \le \mathbb{E}_{p(y)}\left[\mathsf{D}(p(x|y)\|q(x))\right].
\]
Exercise 6.2

Suppose n is odd, and x ∈ {0, 1}^n is sampled uniformly at random
from the set of strings that have more 1's than 0's. Use Pinsker's
inequality to show that the expected number of 1's in x is at most
n/2 + O(√n).
Exercise 6.3

Let X be a random variable supported on [n] and g : [n] → [n] be a
function. Prove that
\[
\Pr[X \ne g(X)] \ge \frac{H(X | g(X)) - 1}{\log n}.
\]
Use the fact that α log α ≥ −(log e)/e ≥ −1, for α > 0.
Exercise 6.4

Let G be a family of graphs on n vertices, such that every two
graphs in the family share a clique on r vertices. Show that the
number of graphs in the family is at most 2^{\binom{n}{2}}/2^{r-1}.

Hint: Partition the graph into r parts uniformly at random and throw away all edges that do not stay within a part. Analyse the entropy of the resulting distribution on graphs from the family.
\[
I(X : M | YR) + I(Y : M | XR)
= \sum_i I(X : M_i | YRM_{<i}) + I(Y : M_i | XRM_{<i}),
\]
Alice or Bob sends the next bit of the protocol. If Alice sends the next
bit, then
\[
I(Y : M_i | XRm_{<i}) = 0,
\]
because M_i is determined by the variables XRm_{<i}. Similarly, if Bob
sends the next bit in the protocol, then
\[
I(X : M_i | YRm_{<i}) = 0.
\]
Moreover, if Alice sends the next bit, then by the chain rule, we have
\[
I(X : M_i | YRm_{<i})
\le I(X : M_i | YRm_{<i}) + I(Y : M_i | Rm_{<i})
= I(XY : M_i | Rm_{<i}),
\]
where the inequality is an equality when X, Y are independent of
each other, because in this case Y is independent of M_i after fixing
R, m_{<i}. Similarly, if Bob sends the next bit, we have
I(Y : M_i | XRm_{<i}) ≤ I(XY : M_i | Rm_{<i}), and so
\[
I(X : M | YR) + I(Y : M | XR)
= \sum_i I(X : M_i | YRM_{<i}) + I(Y : M_i | XRM_{<i})
\le \sum_i I(XY : M_i | RM_{<i})
= I(XY : M | R),
\]
so the internal information never exceeds the external information.
The two quantities are equal when X, Y are independent.

What we are really after is an analogue of Theorem 6.1: we want
to show that information characterizes communication. Such a state-
ment would be immensely useful, because the quantities defining
information are much easier to work with than communication
complexity.

The same argument proves that I(X : M | YR) is at most the expected number of bits sent by Alice in the protocol, and I(Y : M | XR) is at most the expected number of bits sent by Bob in the protocol.
Correlated Sampling

Figure 7.1: An example of the sampling procedure. (m_4, ρ_4) is selected in this case.

Lemma 7.1. There is a protocol for Alice and Bob, who are each given
distributions p(m), q(m), to use public randomness and no communication
in such a way that Alice samples M_A distributed according to p(m), Bob
samples M_B distributed according to q(m), and the probability that M_A ≠
M_B is at most 2|p − q|.

Proof. Alice and Bob will use public randomness to sample a se-
quence (m_1, ρ_1), (m_2, ρ_2), . . . , where m_i is a uniformly random ele-
ment from the support of m, and ρ_i is uniformly random from [0, 1].
Alice will set m = m_i, where i is the minimum number for which
ρ_i ≤ p(m_i). Similarly, Bob will set m′ = m_j, where j is the minimum
number such that ρ_j ≤ q(m_j).

The expected values of i and j are proportional to the size of the universe, so the time required to carry out this procedure is also proportional to the size of the universe.

Let r(m_A, m_B) denote the joint distribution of the outputs of Alice
and Bob. Let E denote the event that Alice sets i = 1. Then we claim
that r(M_A = m_A | E) = p(M = m_A). Indeed, by the definition of the
process, we have r(M_A = m_A | ¬E) = r(M_A = m_A). Since
\[
r(M_A = m_A) = r(E)r(M_A = m_A | E) + (1 - r(E))r(M_A = m_A | \neg E),
\]
we have r(M_A = m_A) = r(M_A = m_A | E).
r(B | F) = ∑_m |p(m) − q(m)| / ∑_m max{p(m), q(m)}
         ≤ ∑_m |p(m) − q(m)| / ∑_m p(m)
         = ∑_m |p(m) − q(m)|,    since ∑_m p(m) = 1,

as required.
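The rejection-sampling process in the proof of Lemma 7.1 is easy to
simulate. Below is a minimal Python sketch (the two distributions and
all variable names are made up for illustration); it estimates the
disagreement probability and compares it to 2|p − q| = ∑_m |p(m) − q(m)|.

```python
import random

def correlated_sample(p, q, rng):
    # Shared randomness: a stream of pairs (m_i, rho_i) with m_i uniform
    # over the support and rho_i uniform in [0, 1].  Alice keeps the first
    # m_i with rho_i <= p(m_i); Bob keeps the first m_i with rho_i <= q(m_i).
    universe = sorted(p)
    ma = mb = None
    while ma is None or mb is None:
        m = rng.choice(universe)
        rho = rng.random()
        if ma is None and rho <= p[m]:
            ma = m
        if mb is None and rho <= q[m]:
            mb = m
    return ma, mb

# Two nearby distributions on a three-element universe (illustrative numbers).
p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
q = {'a': 0.4, 'b': 0.4, 'c': 0.2}
rng = random.Random(0)
trials = 20000
disagree = 0
for _ in range(trials):
    ma, mb = correlated_sample(p, q, rng)
    disagree += (ma != mb)
bound = sum(abs(p[m] - q[m]) for m in p)  # 2|p - q|
print(disagree / trials, "<=", bound)
```

Note that no communication takes place: both samples are functions of
the shared stream alone.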
External Information
Suppose we would just like to compress the first message in a protocol
down to its external information. If the message M is sent by Alice,
who has the input X, and Bob has the input Y, then the external
information can be expressed as

I(XY : M) = I(X : M) + I(Y : M | X) = I(X : M),

since after fixing X, the variables Y and M are independent.
In analogy with Theorem 6.1, we prove that there is a way to simulate
the sending of the message M using I(X : M) + O(log I(X : M)) bits of
communication in expectation (Harsha et al., 2007; Braverman and Garg,
2014). The simulation follows from the following theorem:

Theorem 7.2. Suppose Alice knows two distributions p, q, and Bob knows
q. There is a protocol for Alice and Bob to sample an element according
to p using

E_{m∼p}[log(p(m)/q(m))] + 2 log E_{m∼p}[log(p(m)/q(m))] + O(1)

bits of communication in expectation.
As a corollary, we get:

Corollary 7.3. Alice and Bob can use public randomness to simulate
sending M with expected communication I(X : M) + 2 log I(X : M) + O(1).

To prove the corollary, if r(m, x) denotes the joint distribution of
X, M, let p(m) = r(m|x) and q(m) = r(m). Then Jensen's inequality
proves that the expected communication of the resulting protocol is at
most I(X : M) + 2 log I(X : M) + O(1).

The protocol we use is inspired by the correlated sampling idea. The
public random tape will consist of a sequence of samples
(m_1, ρ_1), (m_2, ρ_2), . . . , where each m_i is a uniformly random
element from the support of m, and ρ_i is a uniformly random number
from [0, 1].
102 communication complexity
Figure 7.2: The sampling procedure of Theorem 7.2. Here T is 3 and the
sampled point is the 3rd point of S_T.

Given this public randomness, Alice finds the minimum index r such
that p(M = m_r) ≥ ρ_r. The value m_r has exactly the right
distribution. Unfortunately, communicating r can be too expensive, so
Alice cannot simply send r to Bob. Instead, Alice computes the positive
integer

T = ⌈ρ_r / q(M = m_r)⌉,

and sends T to Bob. Given T, Alice and Bob both compute the set

S_T = { j : T = ⌈ρ_j / q(M = m_j)⌉ }.

Alice sends Bob the number K for which r is the K'th smallest element
of S_T.

What is the expected communication complexity of sending r?
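The computation of T, S_T, and K can be sketched in code. The Python
fragment below is a toy simulation over a small made-up universe (the
distributions and names are ours, chosen only for illustration):

```python
import math
import random

def compress_one_message(p, q, rng):
    # Alice, who knows p and q, scans the public stream (m_i, rho_i) for
    # the first index r with rho_r <= p(m_r), then computes
    # T = ceil(rho_r / q(m_r)) and the rank K of r inside S_T.
    universe = sorted(p)
    samples = []
    while True:
        m, rho = rng.choice(universe), rng.random()
        samples.append((m, rho))
        if rho <= p[m]:
            r = len(samples) - 1
            break
    m_r, rho_r = samples[r]
    T = math.ceil(rho_r / q[m_r])
    # S_T is computable from the public randomness and q alone, so Bob
    # can recover r (and hence m_r) from T and K.
    S_T = [j for j, (m, rho) in enumerate(samples)
           if math.ceil(rho / q[m]) == T]
    K = S_T.index(r) + 1  # r is the K'th smallest element of S_T
    return m_r, T, K

p = {'a': 0.7, 'b': 0.2, 'c': 0.1}
q = {'a': 0.3, 'b': 0.3, 'c': 0.4}
rng = random.Random(1)
counts = {m: 0 for m in p}
for _ in range(5000):
    m, T, K = compress_one_message(p, q, rng)
    counts[m] += 1
```

The empirical frequencies in `counts` track p, even though T and K are
computed relative to q.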
We have already shown, as in the proof of Lemma 7.1, that the sample
m_r has the right distribution. To analyze the expected communication
of the protocol, we need two basic claims. The first claim, whose proof
we sketch in the margin, is used to encode the integers sent in the
protocol.
Claim 7.4. One can encode all positive integers in such a way that at
most log z + 2 log log z + O(1) bits are used to encode the integer z.

Proof of Claim 7.4: A naive encoding would have Alice send a bit to
indicate whether there is another bit left to send in the encoding, and
then send the bit of data. This would take 2⌈log z⌉ + O(1) bits. To get
a better bound, first send the integer ⌈log z⌉ using the naive
encoding, and then send ⌈log z⌉ more bits to encode z.

To argue that the expected length of T is small, we need the following
claim:

Claim 7.5. For any two distributions p(m), q(m), the contribution of
the terms with p(m) < q(m) to the divergence is at least −1:

∑_{m : p(m) < q(m)} p(m) log(p(m)/q(m)) > −1.
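The encoding of Claim 7.4 can be sketched in Python (the function names
`naive`, `encode`, and `decode` are our own, not the text's):

```python
import math

def naive(z):
    # Self-delimiting encoding: before each data bit, a flag bit saying
    # "another bit follows"; a final 0 marks the end.  About 2*log(z) bits.
    return ''.join('1' + b for b in bin(z)[2:]) + '0'

def encode(z):
    # Claim 7.4: send ceil(log z) with the naive encoding, then the bits
    # of z themselves.  Total: log z + 2 log log z + O(1) bits.
    bits = bin(z)[2:]
    return naive(len(bits)) + bits

def decode(s):
    i, length_bits = 0, []
    while s[i] == '1':
        length_bits.append(s[i + 1])
        i += 2
    length = int(''.join(length_bits), 2)
    return int(s[i + 1:i + 1 + length], 2)

for z in [1, 2, 17, 1000, 123456789]:
    assert decode(encode(z)) == z
    assert len(encode(z)) <= math.log2(z) + 2 * math.log2(math.log2(z) + 1) + 5
```

The length check confirms the claimed log z + 2 log log z + O(1) bound
on these examples.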
Now, to bound the expected number of bits required to transmit T,
observe that by Claim 7.4, this is at most

E[log T + 2 log log T + O(1)] ≤ E[log T] + 2 log E[log T] + O(1),

where the inequality follows from Jensen's inequality. Since
T = ⌈ρ_r / q(M = m_r)⌉ and ρ_r ≤ p(m_r), we have
log T ≤ max{0, log(p(m_r)/q(m_r))} + 1, so by Claim 7.5, we can bound

E[log T] ≤ ∑_{m : p(m) > q(m)} p(m) log(p(m)/q(m)) + 1
         ≤ ∑_m p(m) log(p(m)/q(m)) − ∑_{m : p(m) < q(m)} p(m) log(p(m)/q(m)) + 1
         ≤ ∑_m p(m) log(p(m)/q(m)) + 2.

Proof of Claim 7.5: Let E denote the subset of m's for which
p(m) < q(m). Then we have

∑_{m : p(m) < q(m)} p(m) log(p(m)/q(m))
  = −p(E) · ∑_{m ∈ E} p(m|E) log(q(m)/p(m))
  ≥ −p(E) · log ∑_{m ∈ E} p(m|E) · q(m)/p(m)
  = −p(E) · log(q(E)/p(E))
  ≥ p(E) · log p(E).

For 0 ≤ x ≤ 1, x log x is minimized when its derivative is 0:
log x + log e = 0. So the minimum is attained at x = 1/e, proving that
p(E) log p(E) ≥ −(log e)/e > −1.
and

E[Z | ¬A] = Pr[i ∈ S_T | ¬A]
          ≤ (1/u) ∑_m (T · p(m) − (T − 1) · p(m)) / ((1/u) ∑_m (1 − p(m)))
          = (1/u) ∑_m p(m) / ((1/u) ∑_m (1 − p(m)))
          = (1/u) / ((1/u)(u − 1)) = 1/(u − 1).

Thus we get

E[K] ≤ 1 + ((1 − 1/u)/(u − 1)) / (1/u) = 2.

So the expected number of bits required to transmit K is a constant.
Internal Information
Now suppose we wish to compress a single message sent from Alice to
Bob down to its internal information. This is at least as hard as the
problem for external information: when Y is a constant, the two
problems are the same.
Theorem 7.6. Suppose Alice knows a distribution p, and Bob knows a
distribution q. For every ε, there is a protocol for Alice to sample an
element according to the distribution p while communicating

E_{m∼p}[log(p(m)/q(m))] + √(E_{m∼p}[log(p(m)/q(m))]) + log(1/ε)

bits in expectation, such that Bob also computes the same sample,
except with probability ε.
As a corollary, we get:

Corollary 7.7. Alice and Bob can use public randomness to simulate
sending M with expected communication
I(X : M | Y) + √(I(X : M | Y)) + log(1/ε).

To prove the corollary, if r(x, y, m) denotes the joint distribution of
X, Y, M, let p(m) = r(m|x) = r(m|xy) and q(m) = r(m|y). Then Jensen's
inequality proves that the expected communication of the resulting
protocol is at most I(X : M | Y) + √(I(X : M | Y)).

We shall use very similar ideas to obtain a protocol as in the previous
section. However, our simulating protocol will be interactive, and
there will be a small probability of error.

As in the previous section, Alice and Bob will use public randomness
to sample a sequence of points (m_1, ρ_1), (m_2, ρ_2), . . . , where each
Figure 7.3: Sampling from p when the sender knows only one
distribution.
m_i is a uniformly random element of the support, and ρ_i is a
uniformly random number in [0, 1]. As before, Alice picks the smallest
index r such that p(M = m_r) ≥ ρ_r. In analogy with the idea from
external compression, we would really like to compute
⌈ρ_r / q(M = m_r)⌉ with small communication. Unfortunately, Alice does
not know q, so she cannot compute this ratio without interacting with
Bob. Instead, Alice and Bob will try to guess the ratio. To do so, they
will gradually increase a threshold T until it is larger than this
ratio. They will then use hashing to find r.
For each index i, let h(i) = h(i)_1, h(i)_2, . . . be an infinite
sequence of uniformly random bits, sampled publicly. h(i) is a hash
function that Alice and Bob will try to use to quickly agree on the
value of r. The protocol will proceed in rounds. In round k, Alice and
Bob set T = 2^{k²}, and Bob computes the set

Q_T = { j : T ≥ ⌈ρ_j / q(M = m_j)⌉ }.

For parameters α_k, β_k, Alice will send Bob all the bits of h(r)_{≤α_k}
that she has not already sent him. For each i = 1, 2, . . . , T, and
j = 1, 2, . . . , α_k, Bob will compute the value g(i, j): Bob's best
guess for the element of Q_i that is consistent with the first j bits
of h(r) that he sees. If there is any index s ≤ k such that
g(2^{s²}, α_k) = g(2^{s²}, α_k − β_k), then Bob stops the protocol and
outputs g(2^{s²}, α_k) for the smallest such index s. If there is no
such index, Bob sends Alice a bit to indicate that the protocol should
continue, and the parties begin the next round.
Intuitively, if k is large enough so that Q_T contains r, then all
indices of Q_T that are less than r will eventually become inconsistent
with h(r). If T is smaller, then the probability that any index will
remain consistent with the hashes for β_k steps is small.
First, let us analyze the probability that the protocol makes an
error. The protocol outputs g(2^{s²}, α_k − β_k) ≠ r only if we have
g(2^{s²}, α_k − β_k) = g(2^{s²}, α_k). The probability of this event is
at most 2^{−β_k}. Thus the probability of an error is at most

∑_{k=1}^∞ ∑_{s=1}^k 2^{−β_k} = ∑_{k=1}^∞ k · 2^{−β_k}.
I = I(X : M | YR) + I(Y : M | XR).

We shall prove:

Theorem 7.8. One can simulate any such protocol π with communication
complexity O(√(I · C) · log C).
The idea for the proof is quite straightforward. Alice and Bob use
correlated sampling to repeatedly guess the bits of the messages in
the protocol, without communicating. Then, they communicate a few
bits to fix the errors in the transmissions.
First observe that without loss of generality, we can assume that
there is no public randomness in the protocol we are simulating. This
is because for each fixing of the public randomness R = r, if the
internal information cost is I_r, and we obtain a simulating protocol
with communication √(I_r · C) · log C, then the expected number of
bits communicated for an average r is

E[√(I_r · C) · log C] ≤ √(E[I_r] · C) · log C = √(I · C) · log C,

by convexity.
γ(m_{<i}) = p(M_i = 1 | x y m_{<i}).

These numbers define the correct m that our simulation protocol will
attempt to compute. To define the correct m, for each i, set m_i = 1 if
ρ_i < γ(m_{<i}), and set m_i = 0 otherwise. The correct m has exactly
the right distribution—the probability of obtaining m is

∏_{i=1}^C γ(m_{<i})^{m_i} (1 − γ(m_{<i}))^{1−m_i}
  = ∏_{i=1}^C p(m_i | x y m_{<i}) = p(m | xy).

Although Alice and Bob cannot compute γ(m_{<i}) without communicating,
they can compute the numbers:

γ_A(m_{<i}) = p(M_i = 1 | x m_{<i}) and γ_B(m_{<i}) = p(M_i = 1 | y m_{<i}).

Moreover, if it is Alice's turn to speak, then γ_A(m_{<i}) = γ(m_{<i}),
and if it is Bob's turn to speak, then γ_B(m_{<i}) = γ(m_{<i}), so:

Claim 7.9. Either γ(m_{<i}) = γ_A(m_{<i}), or γ(m_{<i}) = γ_B(m_{<i}).

So, Alice and Bob use these numbers to try and guess the correct m.
Alice computes m^A by setting m^A_i = 1 if and only if
ρ_i < γ_A(m^A_{<i}), and Bob computes m^B by setting m^B_i = 1 if and
only if ρ_i < γ_B(m^B_{<i}). Of course, m^A and m^B are likely to be
quite different. However, by Claim 7.9, if they are the same, then they
must both be equal to m. To compute m, Alice and Bob communicate to
find the first index j where m^A_j ≠ m^B_j. Using the results of
Exercise 3.1, this takes O(log(C/ε)) communication, if the probability
of making an error is ε.

Input: Alice knows x ∈ X, Bob knows y ∈ Y.
Output: m distributed according to p(m | xy).

  Alice and Bob use public randomness to sample
  ρ_1, . . . , ρ_C ∈ [0, 1] uniformly and independently;
  Set j = 0;
  while j ≤ C do
    for i = j + 1, . . . , C do
      If ρ_i < γ_A(m^A_{<i}), set m^A_i = 1, otherwise set m^A_i = 0;
      If ρ_i < γ_B(m^B_{<i}), set m^B_i = 1, otherwise set m^B_i = 0;
    end
    if m^A = m^B then
      Set j = C + 1;
    else
      Set j to be the smallest number such that m^A_j ≠ m^B_j;
      If Alice was to send the j'th bit after m^A_{<j}, set
      m^B_j = m^A_j, otherwise set m^A_j = m^B_j;
    end
  end

Figure 7.4: Compressing protocols to their internal information.
If m^A_{<j} dictates that Alice was supposed to send the j'th bit, then
Bob sets m^B_j = m^A_j; otherwise Alice sets m^A_j = m^B_j. The two
parties then use ρ_{j+1}, . . . , ρ_C to recompute m^A, m^B. They
repeat this procedure until m^A = m^B = m.
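This resampling loop can be sketched in Python. The toy protocol below
is our own construction (alternating turns, made-up bias functions
standing in for γ_A and γ_B); it illustrates how mismatches are found
and fixed. Each fix settles a strictly later index, so at most C
corrections are ever needed.

```python
import random

def simulate(gamma_a, gamma_b, owner, C, rng):
    # gamma_a(prefix), gamma_b(prefix): Alice's and Bob's estimates of
    # Pr[next bit = 1]; owner(i) says who sends bit i (and so whose
    # estimate is the correct gamma there, by Claim 7.9).
    rho = [rng.random() for _ in range(C)]
    fixed = {}   # positions whose value the owner has corrected
    mistakes = 0
    while True:
        ma, mb = [], []
        for i in range(C):
            if i in fixed:
                ma.append(fixed[i])
                mb.append(fixed[i])
            else:
                ma.append(int(rho[i] < gamma_a(tuple(ma))))
                mb.append(int(rho[i] < gamma_b(tuple(mb))))
        if ma == mb:
            return ma, mistakes
        j = next(i for i in range(C) if ma[i] != mb[i])
        # The owner of bit j sampled it with the correct gamma.
        fixed[j] = ma[j] if owner(j) == 'A' else mb[j]
        mistakes += 1

x, y = 1, 0
gamma_a = lambda pre: 0.2 + 0.6 * x if len(pre) % 2 == 0 else 0.5
gamma_b = lambda pre: 0.2 + 0.6 * y if len(pre) % 2 == 1 else 0.5
owner = lambda i: 'A' if i % 2 == 0 else 'B'

rng = random.Random(3)
m, mistakes = simulate(gamma_a, gamma_b, owner, 6, rng)
assert len(m) == 6 and mistakes <= 6
```

In the real protocol, finding the first mismatch index j costs
O(log(C/ε)) bits rather than being read off directly.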
To analyze the correctness of the simulation, we need to argue that
Alice and Bob can find m with small communication. To prove that the
communication complexity of the protocol is small, we appeal to
Pinsker's inequality. We say that the protocol made a mistake at i if
during its execution, m^A_i was found to be not equal to m^B_i. This
happens exactly when ρ_i lies in between the numbers γ_A(m_{<i}) and
γ_B(m_{<i}), so given that m is sampled by the protocol, the
probability that there is a mistake at i is at most

E_{p(xym)} [ |γ_A(m_{<i}) − γ_B(m_{<i})| ].

Now for each fixing of m_{<i}, if the i'th message is supposed to be
sent by Alice, we have
Figure 7.5: Finding the correct path. In this case the correct path is
obtained after 3 mistakes have been fixed.
In the tree pointer chasing problem, the parties work with a tree
T_{a,b} in which every vertex at even depth has a children, and every
vertex at odd depth has b children. Let E denote a subset of the edges
of the tree, such that every vertex is connected to exactly one of its
children in E. The goal of Alice and Bob is to compute the unique leaf
in T_{a,b} that remains connected to the root of the tree. However,
Alice only knows a subset E_A ⊆ E of the edges, and Bob only knows a
subset E_B ⊆ E. E_A is promised to contain all the edges of E at even
depth, and moreover, if a vertex v at odd depth is to the left of a
sibling that is picked by its parent in E_A, then all edges of E in the
subtree rooted at v are included in E_A. Similarly, E_B contains all
edges of E at odd depth, and in addition, if a vertex v at even depth
is to the left of a sibling that is picked by its parent in E_B, then
all edges of E in the subtree rooted at v are included in E_B. A
natural hard distribution for this problem is when the edges are
sampled uniformly and independently. See Figure 7.6 for an example.

Including the edges in the left subtrees corresponds to knowing
x_1, . . . , x_{i−1} in the indexing problem. The inputs to Alice and
Bob are correlated, even though the edges sampled are independent.
Theorem 7.10 shows that any protocol that has few rounds of
communication must make an error:

Corollary 7.11. Any randomized (k − 1)-round protocol where Alice sends
the first message, Alice sends a′ bits in each round, and Bob sends b′
bits in each round, must make an error with probability at least

1/2 − (k − 1) (√(b′ ln 2 / 2a) + √(a′ ln 2 / 2b)),

By symmetry, the same result holds when Bob sends the first message of
the protocol but Alice knows the first edge of the tree.
Figure 7.6: An example input to the tree pointer chasing problem. The
marked edges are those in E_B, those in E_A, and those in E_A \ E_B.

as required.
Theorem 7.12. Any k-round protocol for computing the greater-than
function on x, y ∈ [2^n] must transmit at least Ω(n^{1/k}/k²) bits in
some round.

Proof. We prove the theorem by appealing to the lower bound for the
tree pointer chasing problem (Theorem 7.10). Say we have an input to
the tree pointer chasing problem on a tree T_{a,a} of depth k. We set
n = a^k.

We show how Alice and Bob can transform the tree into inputs
x ∈ {0, 1, . . . , 2^n − 1} and y ∈ {0, 1, . . . , 2^n − 1} for the
greater-than problem, without communicating. The numbers x, y are best
thought of as a^{k−1}-digit numbers written in base a. The
transformation will guarantee that the value of the greater-than
function reveals the identity of the last leaf in the tree.

We describe how to carry out the reduction. If the tree is of depth 1,
with the edge coming out of the root in Alice's input, we set
x ∈ {0, 1, . . . , a − 1}, and y = ⌈a/2⌉. If the edge from the root is
in Bob's
Applications
8
Circuits, Branching Programs
It is well known that every (monotone) function f : {0, 1}^n → {0, 1}
can be computed by a (monotone) circuit of depth n and size at most
O(2^n/n).

The size of the circuit captures the total number of basic operations
needed to evaluate it. The depth captures the number of parallel steps
needed to evaluate it: if a circuit has size s and depth d, then it can
be evaluated by s processors in d time steps.

The importance of understanding boolean circuits stems from the fact
that they are a universal model of computation. Any function that can
be computed by an algorithm in T(n) steps can also be computed by
circuits of size Õ(T(n)). Thus to prove lower bounds on the time
complexity of algorithms, it is enough to prove that there are no
small circuits that can carry out the computation. However, we know of
no explicit function (even outside NP) for which we can prove a
super-linear lower bound, highlighting the difficulty in proving lower
bounds on algorithms. In contrast, counting arguments imply that
almost every function requires circuits of exponential size.

A super-polynomial lower bound on the circuit size of an NP problem
would imply that P ≠ NP, resolving the most famous open problem in
computer science. The number of circuits of size s can be bounded by
2^{O(s log s)}, while the number of functions f is 2^{2^n}, so if
s ≪ 2^n/n, one cannot hope to compute every function with a circuit of
size s.
Karchmer-Wigderson Games with a circuit of size s.
X ⊆ ∪_{ρ∈R} S_ρ, and Y is disjoint from ∪_{ρ∈R} S_ρ.

Proof. For each message m that Alice sends, let X_m denote the set of
inputs that are consistent with that message. Then by Lemma 8.2, we get
that there is a restriction ρ_m such that X_m ⊆ S_{ρ_m} but Y is
disjoint from S_{ρ_m}. Since every x ∈ X is consistent with some
message m, the sets S_ρ obtained in this way must cover X.

Figure 8.3: Lemma 8.3.
Matching
One of the most well studied combinatorial problems is the problem of
finding the largest matching in a graph. A matching is a set of
disjoint edges. Today, we know of several polynomial time algorithms
that can find the matching of largest size in a given graph (Kleinberg
and Tardos, 2006). This translates to polynomial sized circuits for
computing whether or not a graph has a matching of any given size.
circuits, branching programs 121
Theorem 8.5. Any randomized protocol solving this game must communi-
cate Ω(n) bits.
As a corollary, we get:
Corollary 8.6. Every monotone circuit computing Match has depth Ω(n).
Set m = n/3. We shall show that if the parties can solve the monotone
Karchmer-Wigderson game using c bits of communication, then they can
get a randomized protocol for computing disjointness on a universe of
size m. Since any such randomized protocol requires linear
communication (by Theorem 6.13), this gives a communication lower
bound of Ω(n). Suppose Alice and Bob get inputs X ⊆ [m] and Y ⊆ [m].
Alice constructs the graph G_X on the vertex set [3m + 2] such that for
each i, G_X contains the edge {3i, 3i − 1} if i ∈ X, and contains the
edge {3i, 3i − 2} if i ∉ X. In addition, G_X contains the edge
{3m + 1, 3m + 2}. Alice's graph consists of m + 1 disjoint edges. Bob
uses Y to build a graph H_Y on the same vertex set as follows. For
each i ∈ [m], Bob connects 3i − 2 to all the other 3m + 1 vertices of
the graph if i ∈ Y. If i ∉ Y, Bob connects 3i to all the other
vertices. Note that every edge of H_Y contains at least one of the
vertices 3i − 2 or 3i for some i ∈ [m].
Alice and Bob permute the vertices of the graph randomly and run the
protocol promised by the monotone Karchmer-Wigderson game on G_X and
H_Y. If X and Y are disjoint, the outcome of the protocol must be the
edge corresponding to {3m + 1, 3m + 2}. On the other hand, if X and Y
intersect in k elements, then the outcome of the protocol is equally
likely to be one of the edges that corresponds to these k elements, so
the probability that it is the edge {3m + 1, 3m + 2} is at most 1/2.
If Bob sees that the i'th edge is output, then
allowed to restrict the input, then one can simulate any protocol for a
Karchmer-Wigderson game with fewer rounds. This is the technical
heart of the proof of Theorem 8.7.
Suppose Alice is given an input from a set X ⊆ {0, 1}^n, and Bob is
given an input from the set Y ⊆ {0, 1}^n. Let α ∈ {0, 1, ∗}^n be a
restriction. Applying the restriction gives us new sets of inputs:

X_α = { x ∈ X : x ∥ α },  Y_α = { y ∈ Y : y ∥ α }.

Note that the sets X_α, Y_α may become empty.
The heart of the proof of Theorem 8.7 is the following lemma. Say
that a two round protocol where Alice speaks first has restrictions of
size t, if every restriction sent by Alice has size at most t.
Before we give the proof of Lemma 8.9, let us use it to prove Theorem
8.7. Set t = n^{1/(d−1)}/16. Suppose the circuit has
s < (8/7)^{n^{1/(d−1)}/16}/d = (8/7)^t/d gates, and is of depth d. Now
consider the corresponding Karchmer-Wigderson protocol. There are at
most s possible two round protocols that are executed as the last two
round protocol in the Karchmer-Wigderson game. We claim:
Claim 8.10. There is a k-restriction β, with k < n − n^{1/(d−1)}/2,
such that π_β can be simulated by a two-round protocol with
restrictions of size t.

Thus the probability that we are left with a two-round protocol with
restrictions of size t is at least 1 − 1/d − (4/7)^t > 0. This proves
that some choice of restriction gives the claim.
solve the Karchmer-Wigderson game. Once Alice has ρ_y, she can find
γ ∈ R_α that is consistent with her input x, which allows her to solve
the game, since x ∥ γ and γ ∦ ρ_y. We only need to show that ρ_y is of
size at most t with high probability.

Claim 8.11. If ρ^y_i ≠ ∗, there is a β ∈ R_α such that β_i ≠ ∗.

Proof. If not, we could set ρ^y_i = ∗ to obtain a smaller restriction
that is inconsistent with every γ ∈ R_α, contradicting the minimality
of ρ_y.

Figure 8.6: The restrictions from R must cover ρ_y.
Recall that by Lemmas 8.2 and 8.3, S_{ρ_y} must be disjoint from S_β,
for every β ∈ R_α.

Given ρ_y and α, if |ρ_y| > t, we shall compute an (ℓ + t)-restriction
ρ and blame it. ρ will have the property that if ρ_i ≠ ∗, then
ρ^y_i ≠ ∗ or α_i ≠ ∗. To compute ρ, let ρ = α to begin with, so
|ρ| = ℓ. We need to set t more coordinates of ρ to bit values.
Whenever ρ^y_i ≠ ∗, we must
Claim 8.12. For any restriction ρ, the number of restrictions α that
could lead to ρ being blamed is at most (2t)^t.

  Set ρ = α;
  while |ρ| < ℓ + t do
    Let β ∈ R_α be the lexicographically first restriction such that
    there is an i ∈ [n] with β_i ≠ ∗ ≠ ρ^y_i, yet ρ_i = ∗;
    Set ρ_i = β_i.
  end
  Output ρ;

Figure 8.7: Algorithm for computing ρ from α, ρ_y.

Proof. Given ρ, one can immediately identify the lexicographically
first restriction β ∈ R_α: it is the first restriction in R that is
consistent with ρ. There are then at most t options for the first
coordinate that was set in ρ. Given the first coordinate that was set
in ρ, there are at most 2t options for the next coordinate of ρ that
was set: it is either one of the coordinates of β, or a coordinate in
the next restriction in R that is consistent with ρ after setting the
first coordinate to ∗. In this way, we see that there are at most
(2t)^t choices for the set of coordinates that were set by the
algorithm for computing ρ from α.
Claim 8.12 implies that the probability that any ρ is blamed when R_α
contains no restriction of size bigger than t is at most

( (n choose ℓ+t) · 2^{ℓ+t} · (2t)^t ) / ( (n choose ℓ) · 2^ℓ )
  ≤ ( e(n − ℓ)/(ℓ + t) )^t · 2^t · (2t)^t
  ≤ ( en/(n/2) )^t · 2^t · (2t)^t
  = (8te)^t,

since the number of (ℓ + t)-restrictions is (n choose ℓ+t) · 2^{ℓ+t},
the number of ℓ-restrictions is (n choose ℓ) · 2^ℓ, and ℓ + t ≥ n/2.
Theorem 8.13. Any circuit of depth k − 1 that computes F must have size
at least 2^{n^{1/(k−1)}/16}.
Branching Programs
on the left and R_1, . . . , R_{(1/2) log n} on the right, such that
|Q_i| = log n / (2 log 3e), |R_i| = √n, and Q_i, R_i form a bipartite
clique. The parties simply pick the set S in such a way that the i'th
player's input can be mapped to R_i. This proves the theorem.
Boolean Formulas
Theorem 8.15. Any formula computing Distinct must use Ω(n²) gates.

Lemma 8.16. If there is a 1-round protocol where Alice sends Bob t bits
and Bob outputs Distinct(y_1, . . . , y_n, z), then
t ≥ log (2n choose n) = 2n − O(log n).

Armed with Lemma 8.16, we are ready to prove the formula lower bound:
Resolution Refutations
F = (x_2 ∨ x_1) ∧ (¬x_2 ∨ x_1) ∧ (¬x_1 ∨ x_3 ∨ ¬x_4)
    ∧ (¬x_1 ∨ x_3 ∨ x_4) ∧ (¬x_1 ∨ ¬x_3)

(a ∨ b) ∧ (¬a ∨ c) ⇒ (b ∨ c).

The resolution refutation for F shown in Figure 9.1 uses this rule to
give a proof that F cannot be satisfied. In general, a resolution
refutation is a sequence of clauses where each clause is derived by
combining two previously derived clauses using the resolution rule.
The proof ends when the empty clause (namely a contradiction) is
derived. The proof is said to be tree-like if every derived clause is
used only once.
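The resolution rule itself is straightforward to implement. A minimal
Python sketch (clauses as frozensets of signed literals; the
representation and names are ours):

```python
def resolve(c1, c2):
    # Apply the resolution rule: from (a ∨ b) and (¬a ∨ c) derive (b ∨ c).
    # A clause is a frozenset of literals; a literal is (variable, sign).
    for (v, s) in c1:
        if (v, not s) in c2:
            return frozenset((c1 - {(v, s)}) | (c2 - {(v, not s)}))
    return None  # the clauses have no complementary pair of literals

c1 = frozenset({('a', True), ('b', True)})    # (a ∨ b)
c2 = frozenset({('a', False), ('c', True)})   # (¬a ∨ c)
derived = resolve(c1, c2)
assert derived == frozenset({('b', True), ('c', True)})  # (b ∨ c)

# Resolving (x1) with (¬x1) yields the empty clause: a contradiction.
assert resolve(frozenset({('x1', True)}),
               frozenset({('x1', False)})) == frozenset()
```

A refutation is then a sequence of such `resolve` steps ending in the
empty frozenset.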
Finding solutions to boolean formulas is a central problem because of
its connection to the complexity classes NP and coNP (Wikipedia,
2016b). The best
Figure 9.1: A tree-like resolution refutation of F: resolving
(x_2 ∨ x_1) with (¬x_2 ∨ x_1) gives (x_1); resolving
(¬x_1 ∨ x_3 ∨ x_4) with (¬x_1 ∨ x_3 ∨ ¬x_4) gives (¬x_1 ∨ x_3), which
resolved with (¬x_1 ∨ ¬x_3) gives (¬x_1); finally (x_1) and (¬x_1)
give the empty clause ().
Axiom 9.2. Each hole contains exactly one pigeon, and the n − 1
pigeons that are in the holes are distinct.

This can only make it easier to derive a contradiction. Indeed, H is
implied by Axiom 9.2.

Claim 9.3. One of the big clauses must survive the assignment.

assignment to all the variables where ∧_{i ∈ S \ {i′}} P_i is true,
yet C is false. Suppose i″ ∉ S and x_{i″,j} is set to true in this
assignment. Then consider what happens when we set x_{i″,j} to be
false and x_{i′,j} to be true and leave the rest of the variables as
they are. Doing so must make C true, since ∧_{i ∈ S} P_i is now true
in the assignment. Since C is a disjunction of unnegated variables,
this can only happen if C contains x_{i′,j}. Thus for each i′ ∈ S,
there must be at least 3n/4 − n/2 = n/4 values of j for which x_{i′,j}
is in the clause C. So C is big.
Claim 9.4. If C is big, then the probability that C survives the
random assignment is at most (63/64)^{n/8}.

Proof. Since C is big, in each of the first n/8 assignments, there are
at least n/4 − n/8 = n/8 pigeons which if assigned to one of
n/4 − n/8 = n/8 holes would lead to the clause vanishing from the
proof. Thus the probability that the clause survives each of the first
n/8 assignments of pigeons to holes is at most

1 − (n/4 − n/8)(n/4 − n/8)/n² = 1 − 1/64 = 63/64.

Now suppose the proof has less than (64/63)^{n/8} clauses. Then by
Claim 9.4, there is an assignment of the pigeons to holes such that
every big clause does not survive. On the other hand, by Claim 9.3, at
least one big clause must survive. So the proof must have at least
(64/63)^{n/8} clauses.
Cutting Planes
( a ∨ b) ⇒ a + b ≥ 1
(¬ a ∨ c) ⇒ 1 − a + c ≥ 1.
proof systems 135
Adding the two inequalities gives

1 + b + c ≥ 2 ⇒ b + c ≥ 1,

which corresponds to the clause (b ∨ c). So cutting planes are at
least as expressive as resolution refutations.
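This translation can be sketched in code. In the Python fragment below
(our own representation: an inequality is a pair (coeffs, rhs) meaning
∑ coeffs[v]·v ≥ rhs), adding the inequalities for (a ∨ b) and (¬a ∨ c)
cancels a and yields b + c ≥ 1, and `divide` implements the rounding
step used in cutting planes derivations:

```python
from math import ceil

def add(ineq1, ineq2):
    # Add two inequalities sum(coeffs[v] * v) >= rhs term by term.
    coeffs = dict(ineq1[0])
    for v, c in ineq2[0].items():
        coeffs[v] = coeffs.get(v, 0) + c
    return ({v: c for v, c in coeffs.items() if c != 0},
            ineq1[1] + ineq2[1])

def divide(ineq, d):
    # Cutting planes rounding: when every coefficient is divisible by d,
    # the right hand side may be rounded up after dividing by d.
    coeffs, rhs = ineq
    assert all(c % d == 0 for c in coeffs.values())
    return ({v: c // d for v, c in coeffs.items()}, ceil(rhs / d))

clause_ab = ({'a': 1, 'b': 1}, 1)    # (a ∨ b):  a + b >= 1
clause_ac = ({'a': -1, 'c': 1}, 0)   # (¬a ∨ c): (1 - a) + c >= 1
assert add(clause_ab, clause_ac) == ({'b': 1, 'c': 1}, 1)  # b + c >= 1

# Rounding: 2b + 2c >= 3 implies b + c >= 2 over the integers.
assert divide(({'b': 2, 'c': 2}, 3), 2) == ({'b': 1, 'c': 1}, 2)
```

The text's pigeonhole derivation uses the same rounding step, written
there for ≤-inequalities (rounding 3/2 down to 1).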
Lemma 9.5. If a formula can be refuted in s steps using resolution, then it
can be refuted in O(s) steps using cutting planes.
In fact, cutting planes gives a strictly stronger proof system. For
example, one can give a cutting planes proof of the pigeonhole
principle using just O(n2 ) proof steps. Rewriting the clauses of the
pigeonhole principle as linear inequalities, we get:
⇒ x_{i,j} + x_{i′,j} ≤ 1.

We shall use these inequalities to derive the inequality:

while

∑_{i=1}^n P_i ≡ ∑_{i=1}^n ∑_{j=1}^{n−1} x_{i,j} ≥ n.
It only remains to show how to derive L_{k,j}. L_{1,j} and L_{2,j}
follow immediately from H_{1,2,j}. To derive L_{k,j} from L_{k−1,j},
for every r < q ≤ k, we derive the inequality

E_{r,q,j} ≡ ∑_{i=1}^r x_{i,j} + x_{q,j} ≤ 1.

E_{r−1,q,j} + L_{r,j} + H_{r,q,j} ≡ ∑_{i=1}^r 2x_{i,j} + 2x_{q,j} ≤ 3
  ⇒ ∑_{i=1}^r x_{i,j} + x_{q,j} ≤ 3/2
  ⇒ ∑_{i=1}^r x_{i,j} + x_{q,j} ≤ 1
  ≡ E_{r,q,j}.

Figure 9.2: Adding E_{r−1,q,j}, L_{r,j}, H_{r,q,j} gives the terms of
E_{r,q,j}.
We shall prove:

Theorem 9.6. When k = Ω(n), any tree-like cutting planes proof of the
unsatisfiability of F must derive 2^{Ω(n/log n)} inequalities.

Claim 9.8. There must be an inequality L in the tree-like proof such
that at least s/3, but no more than 2s/3, of the inequalities in the
proof are used to derive L.
where here all of the variables on the left hand side are known to
Alice, and all the variables on the right hand side are known to Bob.
Since the variables are boolean, there are at most 2^{3k} possible
values for the left hand side, and at most 2^{3k(k−1) + (3k choose 2)}
possible values that can be taken by the right hand side. Thus Alice
and Bob can use the randomized protocol for solving the greater-than
problem on a set of size 2^{O(k²)} to compute whether or not this
inequality is satisfied by their variables. They expend
O(log(k) + log log(s)) bits of communication in order to make sure
that the output of their computation is correct with error 1/log(s).
If the inequality L is not satisfied, Alice and Bob can safely discard
the rest of the proof, and continue to find a clause used to derive
L that evaluates to false. Otherwise, all of the inequalities used
to derive L can safely be discarded, and Alice and Bob can start
their search from the beginning of the proof after discarding all
the inequalities used to derive L. In either case, they discard at
least s/3 inequalities. Thus this process can repeat at most O(log s)
Exercise 9.1
Show that the formula that asserts that there cannot be a graph
which both has a k-matching and a set of size k − 1 that covers every
edge requires an exponential number of inequalities to prove in the
cutting planes proof system.
Exercise 9.2

Show that the formula that asserts that there cannot be a graph on [n]
which both has a path from 1 to n and a set S ⊂ [n] with 1 ∈ S,
n ∉ S, such that no edge of the graph connects S to its complement,
requires an exponential number of inequalities to prove in the cutting
planes proof system.
10
Data Structures
Sort Statistics
Suppose we want to maintain a set S ⊆ [n] of k numbers, so that you
can quickly add and delete numbers from the set, as well as compute
the minimum of the set. A trivial solution is to store the k numbers
in a list. Then adding a number is fast, but finding the minimum might
take as long as k steps. A better solution is to maintain the numbers
in a heap. The numbers are stored in a balanced binary tree, with the
property that every node is at most as large as the value of its
children. One can add a number to the heap by adding it at a leaf, and
bubbling it up the tree. One can delete the minimum by deleting the
number at the root, inserting one of the numbers at a leaf into the
root, and bubbling down the number. This takes only O(log k) time for
each operation. See Figure 10.1 for an example.

These operations need to be carried out efficiently in the execution
of the fastest algorithms for computing the shortest path connecting
two vertices of a graph.
Another solution is to maintain the numbers in a balanced binary
tree (Figure 10.2). Each memory location corresponds to a node in
the binary tree. Each leaf corresponds to an element of [k]. Each node
Figure 10.1: Adding and deleting numbers in a heap.

Figure 10.2: Maintaining a set in a balanced binary tree, where each
node stores information about the leaves below it.
Predecessor Search
Suppose we want to maintain a set of numbers S ⊆ [n] and be able to
quickly determine the predecessor of x, defined as

P(x) = arg max_{y ∈ S, y ≤ x} y,
data structures 143
Number of bits in each cell: w. This is the word-size of the data
structure.
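For contrast, here is a naive predecessor structure over a sorted list
(a Python sketch of our own, with O(n)-time insertions; the point of
the data structures in this chapter is to answer P(x) much faster):

```python
from bisect import bisect_right, insort

class NaivePredecessor:
    # Maintains S in a sorted list; predecessor(x) = max{y in S : y <= x}.
    def __init__(self):
        self.items = []

    def insert(self, x):
        insort(self.items, x)            # O(n) time per update

    def predecessor(self, x):
        i = bisect_right(self.items, x)  # O(log n) time per query
        return self.items[i - 1] if i > 0 else None

s = NaivePredecessor()
for v in [3, 7, 11]:
    s.insert(v)
assert s.predecessor(8) == 7
assert s.predecessor(3) == 3
assert s.predecessor(2) is None
```

The class name and interface are hypothetical, chosen only to mirror
the definition of P(x) above.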
Set Intersection
Suppose we wish to store an arbitrary subset Y ⊆ [n], so that on input
X ⊆ [n], one can quickly compute whether or not X ∩ Y is empty
(Miltersen et al., 1998). There are several solutions one could come
up with:

• We could store Y as a string of n bits, broken up into words of size
  w. This would give the parameters s = ⌈n/w⌉, t = ⌈n/w⌉.
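The first solution can be sketched as follows (Python; the word size w
and all function names are illustrative):

```python
def store(Y, n, w):
    # Store Y ⊆ [n] as s = ceil(n/w) cells of w bits each.
    cells = [0] * ((n + w - 1) // w)
    for y in Y:
        cells[(y - 1) // w] |= 1 << ((y - 1) % w)
    return cells

def disjoint(y_cells, x_cells):
    # Reading all cells: t = ceil(n/w) probes, each of w bits.
    return all(a & b == 0 for a, b in zip(y_cells, x_cells))

n, w = 16, 4
Y = store({2, 9, 16}, n, w)
assert len(Y) == 4                        # s = ceil(16/4) cells
assert not disjoint(Y, store({9}, n, w))  # 9 lies in the intersection
assert disjoint(Y, store({5}, n, w))      # {5} misses Y
```

Each query reads every cell, so both the space s and the time t are
⌈n/w⌉, as stated above.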
tw ≥ n²/4 − t log s · (n + 1) − n log n.

As a consequence, we get:

Theorem 10.4. In any static data structure solving the span problem,
if s < 2^{n/8t}, then tw = Ω(n²).
Predecessor Search
In the predecessor search problem, the data structure is required to
encode a subset S ⊆ [u] of size n. The data structure should also be
able to compute the predecessor P(x) of any element x ∈ [u]. This is
the largest element of S that is at most x. We have seen that there is
a data structure that can handle all of these operations in time
log log u. Here we show that this bound is essentially tight (Ajtai,
1988; Beame and Fich, 2002; Pǎtraşcu and Thorup, 2006; Sen and
Venkatesh, 2008).

Theorem 10.5. Any data structure solving the predecessor search
problem with s = poly(n) must either have time at least
Ω(log n / log(w log n)), or must work only when
log n · log log n ≥ Ω(log log u − log log w).
So the protocol for tree pointer chasing must make an error. Since the
protocol for predecessor search is correct, it must be the case that the
protocol works only when u < (a + b)^{b^k}. So we must have
log log u ≤ k log b + log log(a + b)
≤ log n · log((25 log s · log n)²) + log log((25 log n(log s + w))²)
≤ O(log n log log n) + log log w,   since by assumption log s = O(log n),
as claimed.
Next we describe how to carry out the reduction. If the tree is of
depth 1, with the edge coming out of the root in Alice’s input, we set
S = {0, 1, . . . , a − 1}, and x ∈ {0, 1, . . . , a − 1} to be the name of the
child that is connected to the root. If Bob knows the edge coming out
of the root, we set S = {i } ⊂ {0, 1, . . . , b − 1}, where i corresponds to
the leaf of the tree that is connected to the root, and we set x = b − 1.
In either case, S is a set of size at most a, defined on a universe of size at most (a + b), and the predecessor of x in S determines the output
of the tree pointer chasing problem.
If the tree is of depth k > 1, with Alice knowing the edge coming
out of the root, we first compute x0 , . . . , x a−1 , S0 , . . . , Sa−1 , which
x = i · t + x_i,
and
S = ∪_{i=0}^{a−1} { i · t + y : y ∈ S_i },
where t = (a + b)^{b^{k−1}}. The new universe is of size at most a · (a + b)^{b^{k−1}} ≤ (a + b)^{b^k}, and |S| is at most a · a^{k−1} = a^k. If the edge touching
the root belongs to Bob, we compute x_0, . . . , x_{b−1} and S_0, . . . , S_{b−1} using the b subtrees of depth k − 1. (This corresponds to writing x in base t with the digits x_0, x_1, . . . , x_{b−1}, and setting the y'th element of S to be x_0, x_1, . . . , x_{i−1}, y, 0, . . . , 0, where y ∈ S_i.) If i ∈ {0, 1, . . . , b − 1} corresponds to the edge touching the root, we set
x = ∑_{j=0}^{b−1} x_j · t^{b−j−1},
and
S = { ∑_{j=0}^{i−1} x_j · t^{b−j−1} + y · t^{b−i−1} : y ∈ S_i },
where t = (a + b)^{b^{k−1}}. Since Bob knows x_0, . . . , x_{i−1}, S can be computed by Bob. The size of the universe in this case is at most ((a + b)^{b^{k−1}})^b ≤ (a + b)^{b^k}.
In both cases, the predecessor of x in S determines the relevant
predecessor of x_i in S_i, and hence determines the output of the tree pointer chasing instance.
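The base-t combining step in Bob's case can be sketched concretely. The function and parameter names below are illustrative, and a small value (t = 10) stands in for t = (a + b)^{b^{k−1}}:

```python
def combine(digits, S_i, i, t):
    """Combine subproblem answers as in Bob's case above: x is the base-t
    number with digits x_0, ..., x_{b-1}, and each y in S_i becomes the
    base-t number with digits x_0, ..., x_{i-1}, y, 0, ..., 0."""
    b = len(digits)
    x = 0
    for d in digits:  # x = sum_j x_j * t^(b-j-1)
        x = x * t + d
    prefix = 0
    for d in digits[:i]:  # the base-t number x_0 ... x_{i-1}
        prefix = prefix * t + d
    # append digit y, then pad with b-i-1 zero digits
    S = {(prefix * t + y) * t ** (b - i - 1) for y in S_i}
    return x, S
```

With digits [3, 7, 2], S_i = {5, 9} at position i = 1 and t = 10, we get x = 372 and S = {350, 390}; the predecessor of 372 in S is 350, whose i'th digit recovers 5, the predecessor of x_1 = 7 in S_i.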
t_u · ∑_{s=j+1}^{r} k^{r−s} = t_u · ∑_{s=0}^{r−j−1} k^s = t_u · (k^{r−j} − 1)/(k − 1) ≤ 2t_u k^{r−j−1}.
H(A | BC) ≥ H(A | B) − H(C | B)   (by subadditivity of entropy)
≥ k^{r−j} − 2k^{r−j−1} t_u(w + log s)   (since the number of bits needed to describe C is at most 2k^{r−j−1} · t_u · (w + log s))
= k^{r−j} ( 1 − 2t_u(w + log s)/k ).
∑_{i=1}^{k^{r−j}} H(A_i | A_{<i} BC) ≥ k^{r−j} ( 1 − 2t_u(w + log s)/k ).   (10.1)
Combining this bound with (10.1), and setting k = 4t_u(w + log s)/(1 − h(e)), we get that the probability that a cell belonging to round j is queried is at least
E[Q] ≥ 1 − 2t_u(w + log s)/k − h(e) = (1 − h(e))/2.
The expected number of cells queried is then at least r · (1 − h(e))/2,
as required.
Graph Connectivity
ensures that the trees of the data structure remain balanced. This
gives a simple data structure of size O(n log n) where each operation
takes O(log n) time.
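A minimal sketch of such a structure, assuming the intended data structure is a union-find forest with union by size (hanging the smaller tree under the larger keeps every tree of depth O(log n)):

```python
class Connectivity:
    """Dynamic connectivity via union-find with union by size.
    The smaller tree is always hung under the larger, so tree depth
    stays O(log n) and each operation takes O(log n) time."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def root(self, v):
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        ru, rv = self.root(u), self.root(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru  # hang the smaller tree under the larger
        self.size[ru] += self.size[rv]

    def connected(self, u, v):
        return self.root(u) == self.root(v)
```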
u_{−i} = (u_1, u_2, . . . , u_{i−1}, u_{i+1}, . . . , u_d), and for every i, u_{−i}, let S_{i,u_{−i}} be a
Suppose Alice and Bob are given subsets X, Y ⊆ ∪_{i,u_{−i}} S_{i,u_{−i}}, such that for every i, u_{−i}, |X ∩ S_{i,u_{−i}}| = 1. So k = |X| = dB^{d−1}, while the size of Y could be as large as dB^d. We shall show how Alice and Bob can use the data structure to solve the lopsided disjointness problem with inputs X, Y.
Figure 10.7: The butterfly graph with d = 3, B = 2.
≤ t ( (n log log n / log² n) · log( es log² n / (n log log n) ) + n / log² n )
= o(n log log n / log n).
Theorem 10.10. Any data structure solving the graph connectivity problem with error e must satisfy:
t_q · log( t_u(w + log s) / (1 − h(e)) ) ≥ Ω((1 − h(e)) log n).
Setting w = log n, s = poly(n), e < 1/2 to be constant, and t_u = polylog(n), we get that t_q ≥ Ω(log n / log log n).
Proof. For parameters k, r, we sample a uniformly random graph that
consists of two disconnected k-ary trees of depth r. The number of
vertices in such a graph is n = 2(k^{r+1} − 1)/(k − 1). We add the edges
of this graph to the data structure in r rounds. In the first round,
we add all the edges at depth r. In the j’th round, we add the edges
at depth r − j + 1 using the data structure. Finally, we pick two
random leaves from the graph and query whether or not they are
connected.
Say that a cell of the data structure belongs to round j if it was
last touched in round j of the updates. Let B denote all edges not
added to the graph in the j'th round. Let C denote the locations
and contents of all cells that belong to rounds i > j. After fixing
B, the roots of the two trees have been fixed, and the identities of
the leaves have also been determined. Let A ∈ {0, 1}^{2k^{r−j+1}} be the
random variable which has a bit for each vertex v in the graph at
depth r − j + 1, such that Av = 0 if v is connected to the first tree in
the graph, and Av = 1 if v is connected to the second tree.
Then we see that
H(A | B) = log (2k^{r−j+1} choose k^{r−j+1}) ≥ 2k^{r−j+1} − 1 − log √(k^{r−j+1}),
using the fact that (2a choose a) ≥ 2^{2a−1}/√a.
The number of edges added in the rounds after the j'th is
∑_{i=j+1}^{r} 2k^{r−i+1} ≤ 2 ∑_{i=0}^{r−j} k^i = 2 · (k^{r−j+1} − 1)/(k − 1) ≤ 4k^{r−j},   since k − 1 ≥ k/2.
By subadditivity, we get
H(A | BC) ≥ H(A | B) − H(C | B)
≥ 2k^{r−j+1} − log √(k^{r−j+1}) − 1 − 4k^{r−j} t_u(w + log s)
≥ 2k^{r−j+1} − 6k^{r−j} t_u(w + log s),   since log k^{(r−j+1)/2} ≤ k^{(r−j+1)/2} ≤ k^{r−j}.
H(A_U, A_V | UVBC)
≥ ( 1/k^{r−j+1} − 1/(2k^{r−j+1})² ) · ( 2k^{r−j+1} − 6k^{r−j} t_u(w + log s) )
≥ 2 − 1/(2k^{r−j+1}) − 6t_u(w + log s)/k
≥ 2 − 7t_u(w + log s)/k,   since k^{r−j+1} ≥ k.
Let Q be 1 if the data structure queries a cell that belongs to round
j, and 0 otherwise. Let X, Y be two random leaves that are queried for connectivity. For each fixing of B, these leaves correspond to
two vertices U, V at depth r − j + 1 that we wish to compute the
connectivity of. For each fixing of X, Y, B, C, let e_{X,Y,B,C} denote the
probability that the data structure makes an error in answering
the query. If the data structure makes a query to a cell belonging to round j, then H(A_U, A_V) ≤ 1 + Q, and if it does not, then H(A_U, A_V) ≤ H(A_U ⊕ A_V) + H(A_U) ≤ 1 + h(e_{X,Y,B,C}), since the parity of A_U, A_V can have at most h(e_{X,Y,B,C}) bits of entropy. Thus we
get:
H(A_U, A_V | XYBC) ≤ E_{X,Y,B,C} [ 1 + Q + h(e_{X,Y,B,C}) ]
≤ E[1 + Q] + h( E_{X,Y,B,C}[e_{X,Y,B,C}] )   by concavity of h
= 1 + E[Q] + h(e).
Thus,
E[Q] ≥ 1 − h(e) − 7t_u(w + log s)/k.
We set k = 14t_u(w + log s)/(1 − h(e)) to get E[Q] ≥ (1 − h(e))/2. Thus the expected number of queries made must be at least
r · (1 − h(e))/2 ≥ ( log n / log( 14t_u(w + log s)/(1 − h(e)) ) ) · (1 − h(e))/2,
as required.
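Spelled out, the arithmetic behind this choice of k is a direct substitution:

```latex
\frac{7t_u(w+\log s)}{k}
  = 7t_u(w+\log s)\cdot\frac{1-h(e)}{14\,t_u(w+\log s)}
  = \frac{1-h(e)}{2},
\qquad\text{so}\qquad
\mathbb{E}[Q] \ge 1-h(e)-\frac{1-h(e)}{2} = \frac{1-h(e)}{2}.
```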
Dictionaries
Exercise 10.1
Modify the Van Emde Boas tree data structure so that it can maintain
the median of n numbers, with time O(log log n) for adding, deleting
and querying the median.
11
Extension Complexity of Convex Polytopes
A convex polytope is a subset of Euclidean space that can be defined by a finite number of linear inequalities. Any n × d matrix A and an n × 1 vector b define the polytope
P = { x ∈ R^d : Ax ≤ b }.
A set S is convex if whenever x, y ∈ S, the line between x, y is also in S. We see from the definition that P is always convex.
that only intersects P on its boundary. A face must have dimension less than the polytope, since the polytope has dimension d. When the dimension of the face is exactly one less than the dimension of the polytope itself, we call the face a facet. If the polytope is defined by n inequalities, it can have at most n facets, though it may have fewer facets than inequalities. For example, if the polytope is defined by a 3 × d matrix A via Ax ≤ b, and A_1 + A_2 = A_3, b_1 + b_2 = b_3, then the third inequality is implied by the first two, and the polytope can have at most 2 facets.
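As a tiny concrete instance of the definition, the unit square in the plane is cut out by four inequalities, and so has at most four facets; a plain-Python membership check (illustrative names, not from the text):

```python
def in_polytope(A, b, x, eps=1e-9):
    """Decide x ∈ P = {x : Ax <= b} by checking each inequality."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi + eps
               for row, bi in zip(A, b))

# The unit square [0,1]^2: x1 <= 1, -x1 <= 0, x2 <= 1, -x2 <= 0.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
```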
Fact 11.1. Every face of P can be expressed as the intersection of some subset
of the facets of P.
Fact 11.2. v ∈ P is a vertex of P if and only if there are no distinct u, w ∈ P such that v = (u + w)/2.
Figure 11.2: A polytope in the plane with 5 facets and 5 vertices.
x = V · µ,
µ_i ≥ 0 for i = 1, 2, . . . , k,
∑_{i=1}^{k} µ_i = 1,

x = V · µ,
µ_i ≥ 0 for i = 1, 2, . . . , k,

E(x) ≤ 0,
U(x) ≥ z ⇔ z − U(x) ≤ 0,
L(x) ≤ z ⇔ L(x) − z ≤ 0,
where here E, U, L are affine maps. (An affine map is a function of the type F(x) = a_0 + ∑_{i=1}^{k} a_i x_i.) Every inequality E(x) ≤ 0 induces the same inequality for P. Every pair of inequalities L, U induces the inequality
L(x) ≤ U(x) ⇔ L(x) − U(x) ≤ 0.
Now given a graph with the edge set E, consider the problem of finding the point in the graph polytope that minimizes the linear function
L(x) = ∑_{{u,v} ∉ E} n · x_{{u,v}} + ∑_{{u,v} ∈ E} x_{{u,v}}.
L(x) = ∑_{{u,v} ∈ E} x_{{u,v}} − ∑_{{u,v} ∉ E} x_{{u,v}}.
We claim that the size of the largest matching in the graph is the
same as maxx∈ M L( x ). Indeed, the largest matching in the graph
itself has value equal to its size. On the other hand, since every
point of the polytope is a convex combination of matchings, if
there is a point in the polytope that achieves a larger value, then
there is a matching that achieves a larger value. Such a matching
cannot have x_{{u,v}} > 0 for any {u, v} ∉ E, since setting x_{{u,v}} = 0
extension complexity (Kaibel and Pashkovich, 2010). (The result can also be proved when n is not a power of 2, though this requires more work.)
Consider any polytope P = π(Q) ⊆ R² which is the projection of a polytope Q = {Ax ≤ b} ⊆ R^d. We can assume that P is obtained from Q by restricting the point in Q to the first 2 coordinates. Suppose x_1 ≥ 0 is an inequality defining a facet
c_1 x_{d+1} + ∑_{i=2}^{d} c_i x_i ≤ b_i,
∑_{i∈S} x_i = ∑_{i=1}^{|S|} i
and
∑_{i∈T} x_i = ∑_{i=1}^{|T|} i,
∑_{i∈S∪T} x_i = ∑_{i∈S} x_i + ∑_{i∈T} x_i − ∑_{i∈T∩S} x_i
≤ ∑_{i=1}^{|S|} i + ∑_{i=1}^{|T|} i − ∑_{i=1}^{|T∩S|} i   (using the inequality applied to the set T ∩ S)
= ∑_{i=1}^{|S|} i + ∑_{i=|T∩S|+1}^{|T|} i < ∑_{i=1}^{|S∪T|} i,
contradicting the constraint for S ∪ T. Thus all the sets that give
constraints that x satisfies with equality correspond to sets S1 ⊆
S2 ⊆ . . . ⊆ Sk that form a chain. Let π be a permutation with
π (Si ) = [|Si |] for i = 1, 2, . . . , k. This permutation also satisfies the
same equations with equality. If x ≠ π, for small enough e, we get that x + eπ ∈ Q and x − eπ ∈ Q are two distinct points and x = ((x + eπ) + (x − eπ))/2 lies on the line between them. (Recall Fact 11.2.) This contradicts the fact that x is a vertex of Q.
To see that each of these inequalities gives a facet, observe that
every permutation satisfying π (S) = [|S|] gives a point in the
halfspace corresponding to the inequality for S. Moreover if a, b
are such that ⟨a, π⟩ = b, then it must be the case that a_i = a_j whenever i, j ∈ S and whenever i, j ∉ S. Otherwise we could
violate this constraint by swapping the values of π (i ), π ( j). This
proves that the dimension of all such permutations is at least n − 2,
and so the inequality defines a facet of the permutahedron (Goemans, 2015).
for any Y ∈ Q,
∑_{i=1}^{n} (Yv)_i = ∑_{i=1}^{n} ∑_{j=1}^{n} Y_{ij} · j = ∑_{j=1}^{n} j.
some i ∈ S, j > |S| such that Y_{ij} > 0. Then by decreasing Y_{ij} and Y_{i′j′}, and increasing Y_{ij′} and Y_{i′j}, we get a new point Y ∈ Q with a smaller value in (11.1). Thus we can assume that the minimum value of (11.1) is achieved with Y_{i′j′} = 0 for all i′ ∉ S, j′ ≤ |S|.
v_g = 1 − v_h,
v_g ≤ v_h,
v_g ≥ v_h,
v_g ≥ v_r,
v_g ≤ v_h + v_r.
Lemma 11.9 gives a powerful way to prove both upper and lower bounds on the extension complexity of polytopes. The lower bounds
are usually proved using ideas strongly inspired by lower bounds in
communication complexity.
∑_{u,v} x_{{u,v}} = n − 1
∑_{u,v∈S} x_{{u,v}} ≤ |S| − 1   for every S ⊆ [n]   (the number of edges within any subset is less than the size of the subset)
Then we see that ∑_{a,b,c} u_{a,b,c} v_{a,b,c} is exactly the slack of T from the facet of the set S. So setting U to be the matrix whose rows correspond to the vectors u for each set S, and V to be the matrix whose columns correspond to the vectors v for each tree T, we can express the slack matrix as UV. This proves that the non-negative rank of the slack matrix is at most n³. By Lemma 11.9, the spanning tree polytope has an extended formulation with at most n³ facets.
Separating Polytopes
Consider any separating polytope for the function f : {0, 1}^n → {0, 1}. We can use Lemma 11.10 to get a lower bound on the extension complexity. (Since we showed that the minimal extension complexity of a separating polytope is at most linear in the size of the smallest circuit computing f, the bound we prove here gives a lower bound on the circuit size of f.)
Let
∆(x, y) = ∑_{i=1}^{n} y_i(1 − x_i) + (1 − y_i)x_i.
Theorem 11.11. Suppose for every e > 0, rank₊(M_{f,e}) > k. Then the extension complexity of every separating polytope for f is at least k.
∆( x, y) ≥ e.
We claim that for y, there is a value e > 0 such that this inequality is
satisfied by all the points in P. Indeed, if this is not the case, then one
can find a sequence of points x_1, x_2, . . . ∈ P such that ∆(x_j, y) ≤ 1/j. Since we must have ‖x_j − y‖ ≤ √( ∑_{i=1}^{n} (1/j)² ) = √n/j, this sequence
must converge to y. Since P is compact, this implies that y ∈ P, which contradicts the fact that P is a separating polytope for f. (A compact set is a set that is both closed and bounded. The limit of every convergent sequence contained in a compact set must itself lie in the set.)
Thus, by taking the smallest e valid for all y's, we get that ∆(x, y) ≥
The cut polytope K is the convex hull of all cuts in a graph. Formally, for every set A ⊆ [n + 1] define the vertex
y^A_{{i,j}} = 1 if i ∈ A, j ∉ A, and 0 otherwise,   for all i ≠ j ∈ [n + 1].
The cut polytope is the convex hull of all such vertices. Observe that y^A = y^{A^c}, so there are 2^n vertices.
In fact, C and K are isomorphic. Consider the linear map π : C → K, where for i ≤ j ∈ [n + 1] we set
π(x)_{{i,j}} = x_{{i}} + x_{{j}} − 2x_{{i,j}}   if i, j ≤ n,
π(x)_{{i,j}} = x_{{i}}   if j = n + 1.
Proof. For every set of size 1, {i}, C contains the unit vector x^{{i}} in the corresponding direction. For every set {i, j} of size 2, C contains the unit vector x^{{i,j}} − x^{{i}} − x^{{j}} in that direction. So the dimension of C is full.
∑_{i∈B} x_{{i}} ≤ 1 + 2 ∑_{S ∈ (B choose 2)} x_S
holds. Indeed, we see that for any vertex x^A, the left hand side is exactly |A ∩ B|, and the right hand side is exactly 1 + 2(|A ∩ B| choose 2), so the inequality holds. Moreover, the equation is satisfied with equality exactly when |A ∩ B| = 1. In all other cases, the slack of the inequality, (|A ∩ B| − 1)², is at least 1. So if we let Q be the polytope defined by the inequalities given by every set B, and the inequalities 1 ≥ x_{{i,j}} ≥ 0, then Q is a bounded polytope that contains C.
Matching Polytope
We defined the matching polytope as the convex hull of all matchings in a graph on n vertices. Here we prove (Rothvoß, 2014):
Like the lower bound for the correlation polytope, the proof will rely crucially on entropy based inequalities. Recall that the facets of this polytope are given by:
Given a set X and an edge, we say that the set cuts the edge if the
edge goes from inside the set to outside the set. A matching is called
perfect if every vertex of the graph is contained in some edge of the
matching. When the matching is a perfect matching, the inequality of each of these facets corresponds to asserting that every odd set must cut at least one edge of the matching. (The proof will also show that the convex hull of perfect matchings has exponential extension complexity.)
It will be convenient to work with 4n + 6 vertices. Let S be the slack matrix of the matching polytope. We shall prove that rank₊(S) ≥ 2^{Ω(n)}. (S_{x,y} is one less than the number of edges of y cut by x.)
Consider the distribution on cuts and matchings given by:
q(xy) = S_{x,y} / ∑_{i,j} S_{i,j}.
Let D denote the event that X, Y are consistent with P, and for
every i, Xi does not cut the edges of Yi . We will show:
Before proving Lemma 11.15, we show how to use it. For each
fixing of P, conditioned on the event D , every pair X, Y has exactly
the same probability, and the cut and matching are sampled inde-
pendently in each of the cycles. So the pairs ( X1 , Y1 ), . . . , ( Xn , Yn ) are
mutually independent. Thus by Lemma 6.15, we get
2 log r ≥ ∑_{i=1}^{n} [ I(X_i : T | X_{<i} Y_{≥i} P D) + I(Y_i : T | X_{≤i} Y_{>i} P D) ] ≥ Ω(n),   by Lemma 11.15.
In fact, we shall prove the stronger fact that for any fixed z, if
p( xyct) = q( xyct|zE ).
A direct computation shows that the weights in S look like

            Y_i ≠ A_i    Y_i = A_i
X_i = ∅        2            2
X_i ≠ ∅        2            4

(When x_i = ∅, there are 2 potential values for y_i, each with relative weight 2. When x_i ≠ ∅, there is one value of y_i with a weight of 4, and one value with a weight of 2.)
So
p(X_i = ∅, Y_i ≠ A_i) = p(X_i ≠ ∅, Y_i = A_i)/2.
We will prove:
p(X_i = ∅, Y_i ≠ A_i) ≤ p(X_i ≠ ∅, Y_i = A_i) · (2/5) · (1 + 4η)/(1 − 4η) + 4η,
which implies that η ≥ Ω(1). To prove the bound, we start by expressing:
p(X_i = ∅, Y_i ≠ A_i) = ∑_{t=1}^{r} p(X_i = ∅, Y_i ≠ A_i, t).
β_{ct} = | p(y_i | ct, X_i = ∅) − p(y_i | c, X_i = ∅) |.   (Note that p(y_i | ct, X_i = ∅) = p(y_i | ct).)
So we can bound
p(X_i = ∅, Y_i ≠ A_i) ≤ 4η + ∑_{(c,t)∈G} p(X_i = ∅, Y_i ≠ A_i, c, t).
When (c, t) ∈ G,
p(X_i = ∅, Y_i ≠ A_i | ct) / p(X_i ≠ ∅, Y_i = A_i | ct) ≤ (1/4 + η) / (1/4 − η) = (1 + 4η)/(1 − 4η).
Which implies:
p(X_i = ∅, Y_i ≠ A_i)
≤ 4η + (1 + 4η)/(1 − 4η) · ∑_{(c,t)∈G} p(X_i ≠ ∅, Y_i = A_i, c, t)
≤ 4η + (1/(5 choose 3)) · (1 + 4η)/(1 − 4η) · ∑_{(c,t)∈G} p(X_i ≠ ∅, Y_i = A_i, t)
≤ 4η + (4/(5 choose 3)) · (1 + 4η)/(1 − 4η) · ∑_{t=1}^{r} p(X_i ≠ ∅, Y_i = A_i, t)   by Claim 11.17
= 4η + (2/5) · (1 + 4η)/(1 − 4η) · p(X_i ≠ ∅, Y_i = A_i),
as promised.
Now we turn to proving each of the claims:
Proof of Claim 11.17. We claim that if c and c′ are two sets with |c ∩ c′| ≤ 1, then we cannot have (c, t) ∈ G and (c′, t) ∈ G. Indeed, suppose (c, t), (c′, t) ∈ G are as in Figure 11.12. Then since α_{ct} < η < 1/2, p(x_i = ∅ | ct) is positive, and the cut shown in Figure 11.12 has positive probability conditioned on t. Similarly, since β_{c′t} < η < 1/2, the two edges shown in Figure 11.12 have positive probability conditioned on t. However, since the matching and cut are independent conditioned on t, both have positive probability conditioned on t. But this cannot happen, since such a matching and cut have 0 probability in q.
This means that all of the sets c, c′ that are in G must intersect in at least 2 elements. We claim that a family of subsets in ([5] choose 3) can have at most 4 sets, if all pairs of sets are to have pairwise intersections of size 2. (Here ([n] choose k) denotes the collection of subsets of [n] of size k.) We can obtain 4 = (4 choose 3) such sets by taking the collection ([4] choose 3), and we claim that there is no family of 5 sets whose pairwise
We claim:
p(β_{b_i t} > η | X_i = ∅, Y_i ≠ A_i) ≤ p(β_{b_i t} > η | X_i = ∅) / p(Y_i ≠ A_i | X_i = ∅) ≤ 2η.
Exercise 11.1
Give a factorization of the slack matrix of the permutahedron S = UV, where U is a non-negative (2^n − 2) × r matrix, V is a non-negative r × n! matrix, and r = O(n²).
Exercise 11.2
The cube in n dimensions is the convex hull of the set {0, 1}^n. Identify the facets of the cube. Is it possible that the extension complexity of the cube is O(√n)?
Exercise 11.3
Show that the non-negative rank of the slack matrix of a regular 2^k-sided polygon in the plane is at most O(k) by giving a factorization of the matrix into non-negative matrices.
Exercise 11.4
Given two disjoint sets A, B, each of size n, define the bipartite
matching polytope to be the convex hull of all bipartite matchings:
matchings where every edge goes from A to B. Using what we know
about the permutahedron, show that the extension complexity of the
bipartite matching polytope is at most O(n²).
Exercise 11.5
Show that there is a k for which the convex hull of cliques of size k has extension complexity 2^{Ω(n)}/n.
12
Distributed Computing
Lemma 12.1. There is a family of t subsets of [5d² log t], such that for any d + 1 sets S_1, . . . , S_{d+1} in the family, S_1 is not contained in the union of S_2, . . . , S_{d+1}.
Proof. Pick the t sets at random from [5d² log t], where each element is included in each set independently with probability 1/d. Then for a particular choice of S_1, . . . , S_{d+1}, the probability that a particular element j is in S_1 but not in any of the other sets is
(1/d) · (1 − 1/d)^d ≥ 2^{−2}/d = 1/(4d).
The probability that there is no such j is then at most (1 − 1/(4d))^{5d² log t} ≤ e^{−(5/4) d log t}.
The number of choices for d + 1 such sets from the family is at most (d + 1) · (t choose d+1) ≤ t^{d+1} = e^{(d+1) log t}. Thus, by the union bound, the probability that the family does not have the property we need is at most e^{−(5/4) d log t} · e^{(d+1) log t} = e^{(1 − d/4) log t} < 1, for d ≥ 5.
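The probabilistic construction in the proof can be checked directly for small parameters; the sketch below samples a family as in the proof and tests the property exhaustively (the parameter values in the example are illustrative, not from the text):

```python
import random
from itertools import permutations

def sample_family(t, d, m, rng):
    """t random subsets of {0, ..., m-1}, each element kept w.p. 1/d."""
    return [{j for j in range(m) if rng.random() < 1.0 / d} for _ in range(t)]

def has_property(family, d):
    """Lemma 12.1's property: no set in the family is contained in the
    union of any d of the other sets."""
    for tup in permutations(range(len(family)), d + 1):
        union = set().union(*(family[i] for i in tup[1:]))
        if family[tup[0]] <= union:
            return False
    return True
```

Sampling with t = 6, d = 2 over a universe of size roughly 5d² log t succeeds after a few attempts, matching the positive probability guaranteed by the proof.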
In each round of the protocol, all parties send their current color
to all the other parties. If there are t colors in a particular round,
each party looks at the d colors she received and associates each
with a set from the family promised by Lemma 12.1. She picks a
color by picking an element that belongs to her own set but not to
any of the others. Thus, the next round will have at most 5d² log t colors. Continuing in this way, the number of colors will be reduced to O(d² log d) in log* n rounds.
Consider the protocol obtained when Alice simulates all the ver-
tices close to A, B, and Bob simulates all the nodes close to C, D. This
protocol must solve the disjointness problem, and so has communication at least Ω(n²). This proves that the O(n) links that cross from the left to the right in the above network must carry at least Ω(n²) bits of communication to compute the diameter of the graph.
Detecting Triangles
Another basic measure associated with a graph is its girth, which is the length of the shortest cycle in the graph. Here we show that any distributed protocol for computing the girth of an n vertex graph must involve at least Ω(n² 2^{−O(√(log n))}) bits of communication.
We prove this (Drucker et al., 2014) by showing that any such protocol can be used to compute disjointness in the number-on-forehead model with 3 parties, and a universe of size Ω(n² 2^{−O(√(log n))}). Applying Theorem 5.12 gives the lower bound.
Suppose Alice, Bob and Charlie have 3 sets X, Y, Z ⊆ U written
on their foreheads, where U is a set that we shall soon specify. Let
A, B, C be 3 disjoint sets of size 2n. We shall define a graph GX,Y,Z on
the vertex set A ∪ B ∪ C, that will have a triangle (namely a cycle of
length 3) if and only if X ∩ Y ∩ Z is non-empty.
To construct G_{X,Y,Z} we need the coloring promised by Theorem 4.2. This is a coloring of [n] with 2^{O(√(log n))} colors such that there are no monochromatic arithmetic progressions. Since such a coloring exists, there must be a subset Q ⊆ [n] of size n2^{−O(√(log n))} that does not contain any non-trivial arithmetic progressions.
Figure 12.2: The graph G.
Now define a graph G on the vertex set A ∪ B ∪ C, where for each a ∈ A, b ∈ B, c ∈ C,
distributed computing 183
(a, b) ∈ G ⇔ b − a ∈ Q,
(b, c) ∈ G ⇔ c − b ∈ Q,
(a, c) ∈ G ⇔ (c − a)/2 ∈ Q.
Claim 12.3. The graph G has at least n|Q| = Ω(n² 2^{−O(√(log n))}) triangles, and no two triangles in G share an edge.
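The claim can be checked on a small instance. The sketch below uses a greedy 3-AP-free set as a stand-in for the set Q from Theorem 4.2 (Behrend's construction gives the much denser set the proof needs); names and parameters are illustrative:

```python
def greedy_ap_free(n):
    """Greedily build a subset of {1, ..., n} with no non-trivial
    3-term arithmetic progression (a small stand-in for Q)."""
    Q = set()
    for q in range(1, n + 1):
        # q would complete the progression (2b - q, b, q) for some b in Q
        if all(2 * b - q not in Q for b in Q):
            Q.add(q)
    return Q

def build_graph(n, Q):
    """Edges of G on A, B, C, each identified with {0, ..., 2n-1}."""
    N = 2 * n
    ab = {(a, b) for a in range(N) for b in range(N) if b - a in Q}
    bc = {(b, c) for b in range(N) for c in range(N) if c - b in Q}
    ac = {(a, c) for a in range(N) for c in range(N)
          if (c - a) % 2 == 0 and (c - a) // 2 in Q}
    return ab, bc, ac

def triangles(n, ab, bc, ac):
    return [(a, b, c) for (a, b) in ab for c in range(2 * n)
            if (b, c) in bc and (a, c) in ac]
```

Because Q has no non-trivial progressions, every triangle has b − a = c − b, so each edge lies in at most one triangle, which the test below verifies directly.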
(a, b) ∈ G ⇔ a triangle of U containing (a, b) is in Z,
(b, c) ∈ G ⇔ a triangle of U containing (b, c) is in X,
(a, c) ∈ G ⇔ a triangle of U containing (a, c) is in Y.
Figure 12.3: The graph G_{X,Y,Z}.
Given sets X, Y, Z as input, Alice, Bob and Charlie build the network G_{X,Y,Z} and execute the protocol for detecting triangles, with Alice,
Bob, Charlie simulating the behavior of the nodes in A, B, C of the
network. Each of the players knows enough information to simulate
the behavior of these nodes. By Theorem 5.12, the total communication of triangle detection must be at least Ω(n² 2^{−O(√(log n))}), as required.
Noga Alon, Shlomo Hoory, and Nathan Linial. The Moore bound for irregular graphs. Graphs and Combinatorics, 18(1):53–57, 2002. URL http://dx.doi.org/10.1007/s003730200002.
Paul Beame and Faith E. Fich. Optimal bounds for the predecessor
problem and related problems. J. Comput. Syst. Sci., 65(1):38–72, 2002.
Tomás Feder, Eyal Kushilevitz, Moni Naor, and Noam Nisan. Amortized communication complexity. SIAM Journal on Computing, 24(4):736–750, 1995. Preliminary version by Feder, Kushilevitz, and Naor in FOCS 1991.
Samuel Fiorini, Thomas Rothvoß, and Hans Raj Tiwary. Extended formulations for polygons. Discrete & Computational Geometry, 48(3):658–668, 2012. URL http://dx.doi.org/10.1007/s00454-012-9421-9.
Pavel Hrubeš and Anup Rao. Circuits with medium fan-in. In 30th Conference on Computational Complexity, CCC 2015, June 17-19, 2015, Portland, Oregon, USA, volume 33, pages 381–391, 2015.