Conic Programming

Linear Conic Programming
Yinyu Ye
December 2004
ii
Preface
This monograph is developed for MS&E 314, Semidenite Programming,
which I am teaching at Stanford. Information, supporting materials, and computer programs related to this book may be found at the following address on
the World-Wide Web:
http://www.stanford.edu/class/msande314
Please report any question, comment and error to the address:
yinyu-ye@stanford.edu
A little story in the development of semidenite programming. One day
in 1990, I visited the Computer Science Department of the University of Minnesota and met a young graduate student, Farid Alizadeh. He, working then on
combinatorial optimization, introduced me semidenite optimization or linear
programming over the positive denite matrix cone. We had a very extensive
discussion that afternoon and concluded that interior-point linear programming
algorithms could be applicable to solving SDP. I urged Farid to look at the LP
potential functions and to develop an SDP primal potential reduction algorithm.
He worked hard for several months, and one afternoon showed up in my oce
in Iowa City, about 300 miles from Minneapolis. He had everything worked out,
including potential, algorithm, complexity bound, and even a dictionary from
LP to SDP, but was stuck on one problem which was how to keep the symmetry
of the matrix. We went to a bar nearby on Clinton Street in Iowa City (I paid
for him since I was a third-year professor then and eager to demonstrate that I
could take care of students). After chatting for a while, I suggested that he use
X 1/2 X 1/2 to keep symmetry instead of X 1 which he was using, where
X is the current symmetric positive denite matrix and is the symmetric
directional matrix. He returned to Minneapolis and moved to Berkeley shortly
after, and few weeks later sent me an e-mail message telling me that everything
had worked out beautifully.
At the same time or even earlier, Nesterov and Nemirovskii developed a
more general and powerful theory in extending interior-point algorithms for
solving conic programs, where SDP was a special case. Boyd and his group
later presented a wide range of SDP applications and formulations, many of
which were incredibly novel and elegant. Then came the primal-dual algorithm,
the Max-Cut, ... and SDP established its full popularity.
YINYU YE
iii
iv
Stanford, 2002
PREFACE
Chapter 1
Introduction and
Preliminaries
1.1
Introduction
Semidenite Programming, hereafter SDP, is a natural extension of Linear programming (LP) that is a central decision model in Management Science and
Operations Research. LP plays an extremely important role in the theory and
application of Optimization. In one sense it is a continuous optimization problem in minimizing a linear objective function over a convex polyhedron; but it
is also a combinatorial problem involving selecting an extreme point among a
nite set of possible vertices. Businesses, large and small, use linear programming models to optimize communication systems, to schedule transportation
networks, to control inventories, to adjust investments, and to maximize productivity.
In LP, the variables form a vector which is regiured to be nonnegative, where
in SDP they are components of a matrix and it is constrained to be positive
semidenite. Both of them may have linear equality constraints as well. One
thing in common is that interior-point algorithms developed in past two decades
for LP are naturally applied to solving SDP.
Interior-point algorithms are continuous iterative algorithms. Computation
experience with sophisticated procedures suggests that the number of iterations
necessarily grows much more slowly than the dimension grows. Furthermore,
they have an established worst-case polynomial iteration bound, providing the
potential for dramatic improvement in computation eectiveness.
The goal of the monograph is to provide a text book for teaching Semidenite
Programming, a modern Linear Programming decision model and its applications in other scientic and engineering elds. One theme of the monograph is
the mapping between SDP and LP, so that the reader, with knowledge of LP,
can understand SDP with little eort.
The monograph is organized as follows. In Chapter 1, we discuss some
1
CHAPTER 1. INTRODUCTION AND PRELIMINARIES
necessary mathematical preliminaries. We also present several decision and

optimization problems and several basic numerical procedures used throughout
the text.
Chapter 2 is devoted to studying the theories and geometries of linear
and matrix inequalities, convexity, and semidenite programming. Almost all
interior-point methods exploit rich geometric properties of linear and matrix
inequalities, such as center, volume, potential, etc. These geometries are
also helpful for teaching, learning, and research.
Chapter 3 is focused on interior-point algorithms. Here, we select two types
algorithms: the path-following algorithm and the potential reduction algorithm.
Each algorithm has three forms, the primal, the dual and the primal-dual form.
We analyze the worst-case complexity bound for them, where we will use the
real number computation model in our analysis because of the continuous nature
of interior-point algorithms. We also compare the complexity theory with the
convergence rate used in numerical analysis.
Not only has the convergnece speed of SDP algorithms been signicantly
improved during the last decade, but also the problem domain applicable by
SDP has dramatically widened. Chapters 4, 5, and 6 would describe some
of SDP applications and new established results in Engineering, Combinatory
Optimization, Robust Optimization, Euclidean Geometry Computation, etc.
Finally, we discuss major computational issues in Chapter 7. We discuss several eective implementation techniques frequently used in interior-point SDP
software, such as the sparse linear system, the predictor and corrector step, and
the homogeneous and self-dual formulation. We also present major diculties
and challengies faced by SDP.
1.2
Mathematical Preliminaries
This section summarizes mathematical background material for linear algebra,

linear programming, and nonlinear optimization.
1.2.1
Basic notations
The notation described below will be followed in general. There may be some
deviation where appropriate.
By R we denote the set of real numbers. R+ denotes the set of nonnegative
real numbers, and R+ denotes the set of positive numbers. For a natural number
n, the symbol Rn (Rn , Rn ) denotes the set of vectors with n components in

+
+
R (R+ , R+ ). We call Rn the interior of Rn .

+
+
The vector inequality x y means xj yj for j = 1, 2, ..., n. Zero represents
a vector whose entries are all zeros and e represents a vector whose entries
are all ones, where their dimensions may vary according to other vectors in
expressions. A vector is always considered as a column vector, unless otherwise
stated. Upper-case letters will be used to represent matrices. Greek letters will
1.2. MATHEMATICAL PRELIMINARIES
typically be used to represent scalars. For convenience, we sometime write a

column vector x as
x = (x1 ; x2 ; . . . ; xn )
and a row vector as
x = (x1 , x2 , . . . , xn ).
Addition of matrices and multiplication of matrices with scalars are standard. The superscript T denotes transpose operation. The inner product in
Rn is dened as follows:
n
x, y := xT y =
xj yj
for
x, y Rn .
j=1
The l2 norm of a vector x is given by

x
xT x,
and the l norm is

x
= max{|x1 |, |x2 |, ..., |xn |}.
In general, the p norm is

1/p
|xj |
p = 1, 2, ...
The dual of the p norm, denoted by . , is the q norm, where

1 1
+ = 1.
p q
In this book, . generally represents the l2 norm.
For natural numbers m and n, Rmn denotes the set of real matrices with m
rows and n columns. For A Rmn , we assume that the row index set of A is
{1, 2, ..., m} and the column index set is {1, 2, ..., n}. The ith row of A is denoted
by ai. and the jth column of A is denoted by a.j ; the i and jth component of
A is denoted by aij . If I is a subset of the row index set and J is a subset of
the column index set, then AI denotes the submatrix of A whose rows belong
to I, AJ denotes the submatrix of A whose columns belong to J, AIJ denotes
the submatrix of A induced by those components of A whose indices belong to
I and J, respectively.
The identity matrix is denoted by I. The null space of A is denoted N (A)
and the range of A is R(A). The determinant of an n n-matrix A is denoted
by det(A). The trace of A, denoted by tr(A), is the sum of the diagonal entries
in A. The operator norm of A, denoted by A , is
A
:= max n
0=xR
Ax 2
.
x 2
For a vector x Rn , Dx represents a diagonal matrix in Rnn whose diagonal entries are the entries of x, i.e.,
Dx = diag(x).
A matrix Q Rnn is said to be positive denite (PD), denoted by Q
if
xT Qx > 0,
for all
x = 0,
and positive semi-denite (PSD), denoted by Q

xT Qx 0,
for all
0,
0, if
x.
If Q 0, then Q is called negative denite (ND), denoted by Q 0; if Q 0,

then Q is called negative semi-denite (NSD), denoted by Q
0. If Q is
symmetric, then its eigenvalues are all real numbers; furthermore, Q is PSD if
and only if all its eigenvalues are non-negative, and Q is PD if and only if all its
eigenvalue are positive. Given a PD matrix Q we can dene a Q-norm , . Q ,
for vector x as
x Q = xT Qx .
Mn denotes the space of symmetric matrices in Rnn . The inner product
in Mn is dened as follows:
X, Y := X Y = trX T Y =
for X, Y Mn .
Xi,j Yi,j
i,j
This is a generalization of the vector inner product to matrices. The matrix

norm associated with the inner product is called Frobenius norm:
X f = trX T X .
Mn denote the set of positive semi-denite matrices in Mn . Mn denotes the

+
+
set of positive denite matrices in Mn . We call Mn the interior of Mn .

+
+
{xk } is an ordered sequence x0 , x1 , x2 , ..., xk , .... A sequence {xk } is
0
0
convergent to x, denoted xk x, if
xk x 0.
A point x is a limit point of {xk } if there is a subsequence of {xk } convergent

0
to x.
If g(x) 0 is a real valued function of a real nonnegative variable, the
notation g(x) = O(x) means that g(x) cx for some constant c; the notation
g(x) = (x) means that g(x) cx for some constant c; the notation g(x) = (x)
means that cx g(x) cx. Another notation is g(x) = o(x), which means that
g(x) goes to zero faster than x does:

lim
x0
g(x)
= 0.
x
1.2.2
Convex sets
If x is a member of the set , we write x ; if y is not a member of , we

write y . The union of two sets S and T is denoted S T ; the intersection
of them is denoted S T . A set can be specied in the form = {x : P (x)} as
the set of all elements satisfying property P .
For y Rn and > 0, B(y, ) = {x : x y } is the ball or sphere of
radius with center y. In addition, for a positive denite matrix Q of dimension
n, E(y, Q) = {x : (x y)T Q(x y) 1} is called an ellipsoid. The vector y is
the center of E(y, Q).
A set is closed if xk x, where xk , implies x . A set is open
if around every point y there is a ball that is contained in , i.e., there is
an > 0 such that B(y, ) . A set is bounded if it is contained within a
ball with nite radius. A set is compact if it is both closed and bounded. The
(topological) interior of any set , denoted , is the set of points in which
are the centers of some balls contained in . The closure of , denoted , is
the smallest closed set containing . The boundary of is the part of that
is not in .
A set C is said to be convex if for every x1 , x2 C and every real number
, 0 < < 1, the point x1 + (1 )x2 C. The convex hull of a set is the
intersection of all convex sets containing .
A set C is a cone if x C implies x C for all > 0. A cone that is also
convex is a convex cone. For a cone C E, the dual of C is the cone
C := {y :
x, y 0 for all
x C},
where , is an inner product operation for space E.

Example 1.1 The n-dimensional non-negative orthant, Rn = {x Rn : x
+
0}, is a convex cone. The dual of the cone is also Rn ; it is self-dual.
+
Example 1.2 The set of all positive semi-denite matrices in Mn , Mn , is a
+
convex cone, called the positive semi-denite matrix cone. The dual of the cone
is also Mn ; it is self-dual.
+
Example 1.3 The set {(t; x) Rn+1 : t x } is a convex cone in Rn+1 ,
called the second-order cone. The dual of the cone is also the second-order cone
in Rn+1 ; it is self-dual.
A cone C is (convex) polyhedral if C can be represented by
C = {x : Ax 0}
for some matrix A (Figure 1.1).
Example 1.4 The non-negative orthant is a polyhedral cone, and neither the
positive semi-denite matrix cone nor the second-order cone is polyhedral.
Polyhedral Cone
Nonpolyhedral Cone
Figure 1.1: Polyhedral and nonpolyhedral cones.

The most important type of convex set is a hyperplane. Hyperplanes dominate the entire theory of optimization. Let a be a nonzero n-dimensional vector,
and let b be a real number. The set
H = {x Rn : aT x = b}
is a hyperplane in Rn (Figure 1.2). Relating to hyperplane, positive and negative
closed half spaces are given by
H+ = {x : aT x b}
H = {x : aT x b}.
+
0
Figure 1.2: A hyperplane and half-spaces.
A set which can be expressed as the intersection of a nite number of closed
half spaces is said to be a convex polyhedron:
P = {x : Ax b}.
A bounded polyhedron is called polytope.

Let P be a polyhedron in Rn , F is a face of P if and only if there is a vector
c for which F is the set of points attaining max {cT x : x P } provided the
this maximum is nite. A polyhedron has only nite many faces; each face is a
nonempty polyhedron.
The most important theorem about the convex set is the following separating
theorem (Figure 1.3).
Theorem 1.1 (Separating hyperplane theorem) Let C E, where E is either
Rn or Mn , be a closed convex set and let y be a point exterior to C. Then there
is a vector a E such that
a, y < inf a, x .
xC
The geometric interpretation of the theorem is that, given a convex set C

and a point y outside of C, there is a hyperplane containing y that contains C
in one of its open half spaces.
Figure 1.3: Illustration of the separating hyperplane theorem; an exterior point

y is separated by a hyperplane from a convex set C.
Example 1.5 Let C be a unit circle centered at the point (1; 1). That is, C =
{x R2 : (x1 1)2 + (x2 1)2 1}. If y = (2; 0), a = (1; 1) is a separating
hyperplane vector. If y = (0; 1), a = (0; 1) is a separating hyperplane vector.
It is worth noting that these separating hyperplanes are not unique.
We use the notation E to represent either Rn or Mn , depending on the context, throughout this book, because all our decision and optimization problems
take variables from one or both of these two vector spaces.
1.2.3
Real functions
The real function f (x) is said to be continuous at x if xk x implies f (xk )

f (x). The real function f (x) is said to be continuous on set E, where recall
that E is either Rn or Mn , if f (x) is continuous at x for every x .
A function f (x) is called homogeneous of degree k if f (x) = k f (x) for all

0.
Example 1.6 Let c Rn be given and x Rn . Then cT x is homogeneous of

+
degree 1 and
n
P(x) = n log(c x)
log xj
j=1
is homogeneous of degree 0, where log is the natural logarithmic function. Let
C Mn be given and X Mn . Then xT Cx is homogeneous of degree 2, C X

+
and det(X) are homogeneous of degree 1, and
P(X) = n log(C X) log det(X)
is homogeneous of degree 0.
A set of real-valued function f1 , f2 , ..., fm dened on E can be written as a
single vector function f = (f1 , f2 , ..., fm )T Rm . If f has continuous partial
derivatives of order p, we say f C p . The gradient vector or matrix of a
real-valued function f C 1 is a vector or matrix
f (x) = {f /xij },
for i, j = 1, ..., n.
If f C 2 , we dene the Hessian of f to be the n2 -dimensional symmetric matrix

2
f (x) =
2f
xij xkl
for i, j, k, l = 1, ..., n.
If f = (f1 , f2 , ..., fm )T Rm , the Jacobian matrix of f is
f1 (x)
.
...
f (x) =
fm (x)
f is a (continuous) convex function if and only if for 0 1,
f (x + (1 )y) f (x) + (1 )f (y).
f is a (continuous) quasi-convex function if and only if for 0 1,
f (x + (1 )y) max[f (x), f (y)].
Thus, a convex function is a quasi-convex function. The level set of f is given
by
L(z) = {x : f (x) z}.
f is a quasi-convex function implies that the level set of f is convex for any
given z (see Exercise 1.9).
A group of results that are used frequently in analysis are under the heading
of Taylors theorem or the mean-value theorem. The theorem establishes the
linear and quadratic approximations of a function.
Theorem 1.2 (Taylor expansion) Let f C 1 be in a region containing the line

segment [x, y]. Then there is a , 0 1, such that
f (y) = f (x) +
f (x + (1 )y)(y x).
Furthermore, if f C 2 then there is a , 0 1, such that

f (y) = f (x) +
f (x)(y x) + (1/2)(y x)T
f (x + (1 )y)(y x).
We also have several propositions for real functions. The rst indicates that
the linear approximation of a convex function is a under-estimate.
Proposition 1.3 Let f C 1 . Then f is convex over a convex set if and
only if
f (y) f (x) + f (x)(y x)
for all x, y .
The following proposition states that the Hessian of a convex function is
positive semi-denite.
Proposition 1.4 Let f C 2 . Then f is convex over a convex set if and
only if the Hessian matrix of f is positive semi-denite throughout .
1.2.4
Inequalities
There are several important inequalities that are frequently used in algorithm
design and complexity analysis.
Cauchy-Schwarz: given x, y Rn , then
xT y x y .
Arithmetic-geometric mean: given x Rn ,
+
xj
xj
1/n
Harmonic: given x Rn ,
+
xj
1/xj n2 .
Hadamard: given A Rmn with columns a1 , a2 , ..., an , then

det(AT A)
aj .
10
1.3
Some Basic Decision and Optimization Problems
A decision or optimization problem has a form that is usually characterized by

the decision variables and the constraints. A problem, P, consists of two sets,
data set Zp and solution set Sp . In general, Sp can be implicitly dened by the
so-called optimality conditions. The solution set may be empty, i.e., problem P
may have no solution.
Theorem 1.5 Weierstrass theorem A continuous function f dened on a compact set (bounded and closed) E has a minimizer in ; that is, there is an
x such that for all x , f (x) f (x ).
In what follows, we list several decision and optimization problems. More
problems will be listed later when we address them.
1.3.1
System of linear equations
Given A Rmn and b Rm , the problem is to solve m linear equations for n

unknowns:
Ax = b.
The data and solution sets are
Zp = {A Rmn , b Rm }
and Sp = {x Rn : Ax = b}.
Sp in this case is an ane set. Given an x, one can easily check to see if x is in
Sp by a matrix-vector multiplication and a vector-vector comparison. We say
that a solution of this problem is easy to recognize.
To highlight the analogy with the theories of linear inequalities and linear
programming, we list several well-known results of linear algebra. The rst
theorem provides two basic representations, the null and row spaces, of a linear
subspaces.
Theorem 1.6 Each linear subspace of Rn is generated by nitely many vectors,
and is also the intersection of nitely many linear hyperplanes; that is, for each
linear subspace of L of Rn there are matrices A and C such that L = N (A) =
R(C).
The following theorem was observed by Gauss. It is sometimes called the
fundamental theorem of linear algebra. It gives an example of a characterization
in terms of necessary and sucient conditions, where necessity is straightforward, and suciency is the key of the characterization.
Theorem 1.7 Let A Rmn and b Rm . The system {x : Ax = b} has a
solution if and only if that AT y = 0 implies bT y = 0.
A vector y, with AT y = 0 and bT y = 1, is called an infeasibility certicate for
the system {x : Ax = b}.
Example 1.7 Let A = (1; 1) and b = (1; 1). Then, y = (1/2; 1/2) is an
infeasibility certicate for {x : Ax = b}.
1.3. SOME BASIC DECISION AND OPTIMIZATION PROBLEMS
1.3.2
11
Linear least-squares problem
Given A Rmn and c Rn , the system of equations AT y = c may be overdetermined or have no solution. Such a case usually occurs when the number of
equations is greater than the number of variables. Then, the problem is to nd
an y Rm or s R(AT ) such that AT y c or s c is minimized. We can
write the problem in the following format:
(LS)
or
minimize
subject to
AT y c
y Rm ,
(LS)
minimize
subject to
sc 2
s R(AT ).
In the former format, the term AT y c 2 is called the objective function,

y is called the decision variable. Since y can be any point in Rm , we say this
(optimization) problem is unconstrained. The data and solution sets are
Zp = {A Rmn , c Rn }
and
Sp = {y Rm : AT y c
AT x c
for every x Rm }.
Given a y, to see if y Sp is as the same as the original problem. However, from

a projection theorem in linear algebra, the solution set can be characterized and
represented as
Sp = {y Rm : AAT y = Ac},
which becomes a system of linear equations and always has a solution. The
vector s = AT y = AT (AAT )+ Ac is the projection of c onto the range of AT ,
where AAT is called normal matrix and (AAT )+ is called pseudo-inverse. If A
has full row rank then (AAT )+ = (AAT )1 , the standard inverse of full rank
matrix AAT . If A is not of full rank, neither is AAT and (AAT )+ AAT x = x
only for x R(AT ).
The vector c AT y = (I AT (AAT )+ A)c is the projection of c onto the
null space of A. It is the solution of the following least-squares problem:
(LS) minimize
subject to
xc 2
x N (A).
In the full rank case, both matrices AT (AAT )1 A and I AT (AAT )1 A are
called projection matrices. These symmetric matrices have several desired properties (see Exercise 1.15).
1.3.3
System of linear inequalities
Given A Rmn and b Rm , the problem is to nd a solution x Rn

satisfying Ax b or prove that the solution set is empty. The inequality
12
problem includes other forms such as nding an x that satises the combination
of linear equations Ax = b and inequalities x 0. The data and solution sets
of the latter are
Zp = {A Rmn , b Rm }
and Sp = {x Rn : Ax = b, x 0}.
Traditionally, a point in Sp is called a feasible solution, and a strictly positive

point in Sp is called a strictly feasible or interior feasible solution.
The following results are Farkas lemma and its variants.
Theorem 1.8 (Farkas lemma) Let A Rmn and b Rm . Then, the system
{x : Ax = b, x 0} has a feasible solution x if and only if that AT y 0
implies bT y 0.
A vector y, with AT y 0 and bT y = 1, is called a (primal) infeasibility certicate for the system {x : Ax = b, x 0}. Geometrically, Farkas lemma means
that if a vector b Rm does not belong to the cone generated by a.1 , ..., a.n ,
then there is a hyperplane separating b from cone(a.1 , ..., a.n ).
Example 1.8 Let A = (1, 1) and b = 1. Then, y = 1 is an infeasibility
certicate for {x : Ax = b, x 0}.
Theorem 1.9 (Farkas lemma variant) Let A Rmn and c Rn . Then, the
system {y : AT y c} has a solution y if and only if that Ax = 0 and x 0
imply cT x 0.
Again, a vector x 0, with Ax = 0 and cT x = 1, is called a (dual) infeasibility
certicate for the system {y : AT y c}.
Example 1.9 Let A = (1; 1) and c = (1; 2). Then, x = (1; 1) is an infeasibility certicate for {y : AT y c}.
We say {x : Ax = b, x 0} or {y : AT y c} is approximately feasible in the
sense that we have an approximate solution to the equations and inequalities.
In this case we can show that any certicate proving their infeasibility must
have large norm. Conversely, if {x : Ax = b, x 0} or {y : AT y c} is
approximately infeasible in the sense that we have an approximate certicate
in Farkas lemma, then any feasible solution must have large norm.
Example 1.10 Given > 0 but small. Let A = (1, 1) and b = . Then,
x = (0; 0) is approximately feasible for {x : Ax = b, x 0}, and the infeasibility
certicate y = 1/ has a large norm.
Let A = (1; 1) and c = (1; 1 ). Then, y = 1 is approximately feasible
for {y : AT y c}, and the infeasibility certicate x = (1/ ; 1/ ) has a large
norm.
1.3.4
13
Linear programming (LP)
Given A Rmn , b Rm and c, l, u Rn , the linear programming (LP)

problem is the following optimization problem:
minimize
subject to
cT x
Ax = b, l x u,
where some elements in l may be meaning that the associated variables

are unbounded from below, and some elements in u may be meaning that
the associated variables are unbounded from above. If a variable is unbounded
either from below or above, then it is called a free variable
The standard form linear programming problem is given below, which we
will use throughout this book:
(LP ) minimize cT x
subject to Ax = b, x 0.
The linear function cT x is called the objective function, and x is called the
decision variables. In this problem, Ax = b and x 0 enforce constraints on
the selection of x. The set Fp = {x : Ax = b, x 0} is called feasible set or
feasible region. A point x Fp is called a feasible point, and a feasible point
x is called an optimal solution if cT x cT x for all feasible points x. If there
is a sequence {xk } such that xk is feasible and cT xk , then (LP) is said
to be unbounded.
The data and solution sets for (LP), respectively, are
Zp = {A Rmn , b Rm , c Rn }
and
Sp = {x Fp : cT x cT y,
for every y Fp }.
Again, given an x, to see if x Sp is as dicult as the original problem.

However, due to the duality theorem, we can simplify the representation of the
solution set signicantly.
With every (LP), another linear program, called the dual (LD), is the following problem:
(LD)
maximize
subject to
bT y
AT y + s = c, s 0,
where y Rm and s Rn . The components of s are called dual slacks. Denote

by Fd the sets of all (y, s) that are feasible for the dual. We see that (LD) is
also a linear programming problem where y is a free vector.
The following theorems give us an important relation between the two problems.
Theorem 1.10 (Weak duality theorem) Let Fp and Fd be non-empty. Then,
cT x bT y
where
x Fp , (y, s) Fd .
14
This theorem shows that a feasible solution to either problem yields a bound
on the value of the other problem. We call cT x bT y the duality gap. From
this we have important results.
Theorem 1.11 (Strong duality theorem) Let Fp and Fd be non-empty. Then,
x is optimal for (LP) if and only if the following conditions hold:
i) x Fp ;
ii) there is (y , s ) Fd ;
iii) cT x = bT y .
Theorem 1.12 (LP duality theorem) If (LP) and (LD) both have feasible solutions then both problems have optimal solutions and the optimal objective values
of the objective functions are equal.
If one of (LP) or (LD) has no feasible solution, then the other is either
unbounded or has no feasible solution. If one of (LP) or (LD) is unbounded
then the other has no feasible solution.
The above theorems show that if a pair of feasible solutions can be found to
the primal and dual problems with equal objective values, then these are both
optimal. The converse is also true; there is no gap. From this condition, the
solution set for (LP) and (LD) is
cT x bT y
n
m
n
Ax
Sp = (x, y, s) (R+ , R , R+ ) :
AT y s
= 0
= b
,
= c
(1.1)
which is a system of linear inequalities and equations. Now it is easy to verify

whether or not a pair (x, y, s) is optimal.
For feasible x and (y, s), xT s = xT (c AT y) = cT x bT y is called the
complementarity gap. If xT s = 0, then we say x and s are complementary to
each other. Since both x and s are nonnegative, xT s = 0 implies that xj sj = 0
for all j = 1, . . . , n. Thus, one equation plus nonnegativity are transformed into
n equations. Equations in (1.1) become
Dx s =
Ax =
AT y s =
0
b
c.
(1.2)
This system has total 2n + m unknowns and 2n + m equations including n

nonlinear equations.
The following theorem plays an important role in analyzing LP interiorpoint algorithms. It give a unique partition of the LP variables in terms of
complementarity.
15
Theorem 1.13 (Strict complementarity theorem) If (LP) and (LD) both have
feasible solutions then both problems have a pair of strictly complementary solutions x 0 and s 0 meaning
X s = 0
and
x + s > 0.
Moreover, the supports

P = {j : x > 0}
j
and
Z = {j : s > 0}
j
are invariant for all pairs of strictly complementary solutions.

Given (LP) or (LD), the pair of P and Z is called the (strict) complementarity partition. {x : AP xP = b, xP 0, xZ = 0} is called the primal
optimal face, and {y : cZ AT y 0, cP AT y = 0} is called the dual
Z
P
optimal face.
Select m linearly independent columns, denoted by the index set B, from A.
Then matrix AB is nonsingular and we may uniquely solve
AB xB = b
for the m-vector xB . By setting the variables, xN , of x corresponding to the
remaining columns of A equal to zero, we obtain a solution x such that
Ax = b.
Then, x is said to be a (primal) basic solution to (LP) with respect to the basis
AB . The components of xB are called basic variables. A dual vector y satisfying
AT y = cB
B
is said to be the corresponding dual basic solution. If a basic solution x 0,
then x is called a basic feasible solution. If the dual solution is also feasible,
that is
s = c AT y 0,
then x is called an optimal basic solution and AB an optimal basis. A basic
feasible solution is a vertex on the boundary of the feasible region. An optimal
basic solution is an optimal vertex of the feasible region.
If one or more components in xB has value zero, that basic solution x is
said to be (primal) degenerate. Note that in a nondegenerate basic solution the
basic variables and the basis can be immediately identied from the nonzero
components of the basic solution. If all components, sN , in the corresponding
dual slack vector s, except for sB , are non-zero, then y is said to be (dual)
nondegenerate. If both primal and dual basic solutions are nondegenerate, AB
is called a nondegenerate basis.
Theorem 1.14 (LP fundamental theorem) Given (LP) and (LD) where A has
full row rank m,
16
i) if there is a feasible solution, there is a basic feasible solution;

ii) if there is an optimal solution, there is an optimal basic solution.
The above theorem reduces the task of solving a linear program to that
searching over basic feasible solutions. By expanding upon this result, the simplex method, a nite search procedure, is derived. The simplex method is to
proceed from one basic feasible solution (an extreme point of the feasible region)
to an adjacent one, in such a way as to continuously decrease the value of the
objective function until a minimizer is reached. In contrast, interior-point algorithms will move in the interior of the feasible region and reduce the value of
the objective function, hoping to by-pass many extreme points on the boundary
of the region.
1.3.5
Quadratic programming (QP)
Given Q Rnn , A Rmn , b Rm and c Rn , the quadratic programming

(QP) problem is the following optimization problem:
(QP )
minimize
subject to
q(x) := (1/2)xT Qx + cT x
Ax = b, x 0.
We may denote the feasible set by Fp . The data and solution sets for (QP) are
Zp = {Q Rnn , A Rmn , b Rm , c Rn }
and
Sp = {x Fp : q(x) q(y) for every y Fp }.
A feasible point x is called a KKT point, where KKT stands for KarushKuhn-Tucker, if the following KKT conditions hold: there exists (y Rm , s
Rn ) such that (x , y , s ) is feasible for the following dual problem:
(QD) maximize
subject to
d(x, y) := bT y (1/2)xT Qx
AT y + s Qx = c, x, s 0,
and satises the complementarity condition

(x )T s = (1/2)(x )T Qx + cT x (bT y (1/2)(x )T Qx = 0.
Similar to LP, we can write the KKT condition as:
(x, y, s) (Rn , Rm , Rn )
+
+
and
Dx s = 0
Ax = b
AT y + Qx s = c.
(1.3)
Again, this system has total 2n + m unknowns and 2n + m equations including

n nonlinear equations.
1.4. ALGORITHMS AND COMPUTATIONS
17
The above condition is also called the rst-order necessary condition. If Q is

positive semi-denite, then x is an optimal solution for (QP) if and only if x
is a KKT point for (QP). In this case, the solution set for (QP) is characterized
by a system of linear inequalities and equations. One can see (LP) is a special
case of (QP).
1.4
Algorithms and Computations
An algorithm is a list of instructions to solve a problem. For every instance of

problem P, i.e., for every given data Z Zp , an algorithm for solving P either
determines that Sp is empty or generates an output x such that x Sp or x is
close to Sp in certain measure. The latter x is called an approximate solution.
Let us use Ap to denote the collection of all possible algorithm for solving
every instance in P. Then, the (operation) complexity of an algorithm A Ap
for solving an instance Z Zp is dened as the total arithmetic operations: +,
, , /, and comparison on real numbers. Denote it by co (A, Z). Sometimes it
is convenient to dene the iteration complexity, denoted by ci (A, Z), where we
assume that each iteration costs a polynomial number (in m and n) of arithmetic operations. In most iterative algorithms, each iteration can be performed
eciently both sequentially and in parallel, such as solving a system of linear
equations, rank-one updating the inversion of a matrix, pivoting operation of a
matrix, multiplying a matrix by a vector, etc.
In the real number model, we introduce , the error for an approximate
solution as a parameter. Let c(A, Z, ) be the total number of operations of
algorithm A for generating an -approximate solution, with a well-dened measure, to problem P. Then,
c(A, ) := sup c(A, Z, ) fA (m, n, ) for any
> 0.
ZZp
We call this complexity model error-based. One may also view an approximate
solution an exact solution to a problem -near to P with a well-dened measure
in the data space. This is the so-called backward analysis model in numerical
analysis.
If fA (m, n, ) is a polynomial in m, n, and log(1/ ), then algorithm A is
a polynomial algorithm and problem P is polynomially solvable. Again, if
fA (m, n, ) is independent of and polynomial in m and n, then we say algorithm A is a strongly polynomial algorithm. If fA (m, n, ) is a polynomial
in m, n, and (1/ ), then algorithm A is a polynomial approximation scheme
or pseudo-polynomial algorithm . For some optimization problems, the complexity theory can be applied to prove not only that they cannot be solved
in polynomial-time, but also that they do not have polynomial approximation
schemes. In practice, approximation algorithms are widely used and accepted
in practice.
Example 1.11 There is a strongly polynomial algorithm for sorting a vector in
18
descending or ascending order, for matrix-vector multiplication, and for computing the norm of a vector.
Example 1.12 Consider the bisection method to locate a root of a continuous
function f (x) : R R within interval [0, 1], where f (0) > 0 and f (1) < 0.
The method calls the oracle to evaluate f (1/2) (counted as one operation). If
f (1/2) > 0, we throw away [0, 1/2); if f (1/2) < 0, we throw away (1/2, 1]. Then
we repeat this process on the remaining half interval. Each step of the method
halves the interval that contains the root. Thus, in log(1/ ) steps, we must have
an approximate root whose distance to the root is less than . Therefore, the
bisection method is a polynomial algorithm.
We have to admit that the criterion of polynomiality is somewhat controversial. Many algorithms may not be polynomial but work ne in practice. This
is because polynomiality is built upon the worst-case analysis. However, this
criterion generally provides a qualitative statement: if a problem is polynomial
solvable, then the problem is indeed relatively easy to solve regardless of the
algorithm used. Furthermore, it is ideal to develop an algorithm with both
polynomiality and practical eciency.
1.4.1
Convergence rate
Most algorithms are iterative in nature. They generate a sequence of everimproving points x0 , x1 , ..., xk , ... approaching the solution set. For many optimization problems and/or algorithms, the sequence will never exactly reach the
solution set. One theory of iterative algorithms, referred to as local or asymptotic convergence analysis, is concerned with the rate at which the optimality
error of the generated sequence converges to zero.
Obviously, if each iteration of competing algorithms requires the same amount
of work, the speed of the convergence of the error reects the speed of the algorithm. This convergence rate, although it may hold locally or asymptotically,
provides evaluation and comparison of dierent algorithms. It has been widely
used by the nonlinear optimization and numerical analysis community as an efciency criterion. In many cases, this criterion does explain practical behavior
of iterative algorithms.
Consider a sequence of real numbers {rk } converging to zero. One can dene
several notions related to the speed of convergence of such a sequence.
Denition 1.1 . Let the sequence {rk } converge to zero. The order of convergence of {rk } is dened as the supermum of the nonnegative numbers p satisfying
0 lim sup
k
|rk+1 |
< .
|rk |p
Denition 1.2 . Let the sequence {rk } converge to zero such that
lim sup
k
|rk+1 |
< .
|rk |2
1.5. BASIC COMPUTATIONAL PROCEDURES
19
Then, the sequence is said to converge quadratically to zero.

It should be noted that the order of convergence is determined only by the
properties of the sequence that holds as k . In this sense we might say that
the order of convergence is a measure of how good the tail of {rk } is. Large
values of p imply the faster convergence of the tail.
Denition 1.3 . Let the sequence {rk } converge to zero such that
lim sup
k
|rk+1 |
= < 1.
|rk |
Then, the sequence is said to converge linearly to zero with convergence ratio .
Linear convergence is the most important type of convergence behavior. A
linearly convergence sequence, with convergence ratio , can be said to have a
tail that converges to zero at least as fast as the geometric sequence c k for a
xed number c. Thus, we also call linear convergence geometric convergence.
As a rule, when comparing the relative eectiveness of two competing algorithms both of which produce linearly convergent sequences, the comparison
is based on their corresponding convergence ratiothe smaller the ratio, the
faster the algorithm. The ultimate case where = 0 is referred to as superlinear convergence.
1
Example 1.13 Consider the conjugate gradient algorithm for minimizing 2 xT Qx+
c. Starting from an x0 Rn and d0 = Qx0 +c, the method uses iterative formula
xk+1 = xk k dk
where
k =
(dk )T (Qxk + c)
,
dk 2
Q
and
dk+1 = Qxk+1 k dk
where
k =
(dk )T Q(Qxk+1 + c)
.
dk 2
Q
This algorithm is superlinearly convergent (in fact, it converges in nite number

of steps).
1.5
Basic Computational Procedures
There are several basic numerical problems frequently solved by interior-point

algorithms.
20
1.5.1
Gaussian elimination method
Probably the best-known algorithm for solving a system of linear equations is

the Gaussian elimination method. Suppose we want to solve
Ax = b.
We may assume a11 = 0 after some row switching, where aij is the component
of A in row i and column j. Then we can subtract appropriate multiples of the
rst equation from the other equations so as to have an equivalent system:
a11
0
A1.
A
x1
x
b1
b
This is a pivot step, where a11 is called a pivot, and A is called a Schur complement. Now, recursively, we solve the system of the last m 1 equations for
x . Substituting the solution x found into the rst equation yields a value for
x1 . The last process is called back-substitution.
In matrix form, the Gaussian elimination method transforms A into the form
U
0
C
0
where U is a nonsingular, upper-triangular matrix,

U
0
A=L
C
0
and L is a nonsingular, lower-triangular matrix. This is called the LU -decomposition.

Sometimes, the matrix is transformed further to a form
D
0
C
0
where D is a nonsingular, diagonal matrix. This whole procedure uses about

nm2 arithmetic operations. Thus, it is a strong polynomial-time algorithm.
1.5.2
Choleski decomposition method
Another useful method is to solve the least squares problem:

(LS)
minimize
AT y c .
The theory says that y minimizes AT y c if and only if

AAT y = Ac.
So the problem is reduced to solving a system of linear equations with a symmetric semi-positive denite matrix.
1.5. BASIC COMPUTATIONAL PROCEDURES
21
One method is Choleskis decomposition. In matrix form, the method transforms AAT into the form
AAT = LLT ,
where L is a lower-triangular matrix and is a diagonal matrix. (Such a
transformation can be done in about nm2 arithmetic operations as indicated in
the preceding section.) L is called the Choleski factor of AAT . Thus, the above
linear system becomes
LLT y = Ac,
and y can be obtained by solving two triangle systems of linear equations.
1.5.3
The Newton method
The Newton method is used to solve a system of nonlinear equations: given

f (x) : Rn Rn , the problem is to solve n equations for n unknowns such that
f (x) = 0.
The idea behind Newtons method is to use the Taylor linear approximation at
the current iterate xk and let the approximation be zero:
f (x)
f (xk ) +
f (xk )(x xk ) = 0.
The Newton method is thus dened by the following iterative formula:

xk+1 = xk ( f (xk ))1 f (xk ),
where scalar 0 is called step-size. Rarely, however, is the Jacobian matrix
f inverted. Generally the system of linear equations
f (xk )dx = f (xk )
is solved and xk+1 = xk + dx is used. The direction vector dx is called a
Newton step, which can be carried out in strongly polynomial time.
A modied or quasi Newton method is dened by
xk+1 = xk M k f (xk ),
where M k is an n n symmetric matrix. In particular, if M k = I, the method
is called the gradient method, where f is viewed as the gradient vector of a real
function.
The Newton method has a superior asymptotic convergence order equal 2
for f (xk ) . It is frequently used in interior-point algorithms, and believed to
be the key to their eectiveness.
22
1.5.4
Solving ball-constrained linear problem
The ball-constrained linear problem has the following form:

(BP )
or
minimize
subject to
(BD)
cT x
Ax = 0,
minimize
subject to
bT y
AT y
1,
1.
x minimizes (BP) if and only if there always exists a y such that they satisfy
AAT y = Ac,
and if c AT y = 0 then
x = (c AT y)/ c AT y ;
otherwise any feasible x is a solution. The solution y for (BD) is given as
follows: Solve
AAT y = b,
and if y = 0 then set
y = / AT y ;
y
otherwise any feasible y is a solution. So these two problems can be reduced to

solving a system of linear equations.
1.5.5
Solving ball-constrained quadratic problem
The ball-constrained quadratic problem has the following form:

(BP )
minimize (1/2)xT Qx + cT x
subject to Ax = 0, x 2 1,
(BD)
minimize (1/2)y T Qy + bT y
subject to
y 2 1.
or simply
This problem is used by the classical trust region method for nonlinear optimization. The optimality conditions for the minimizer y of (BD) are
(Q + I)y = b,
and
0,
(Q + I)
1,
(1 y 2 ) = 0,
0.
These conditions are necessary and sucient. This problem can be solved in
polynomial time log(1/ ) and log(log(1/ )) by the bisection method or a hybrid
of the bisection and Newton methods, respectively. In practice, several trust
region procedures have been very eective in solving this problem.
The ball-constrained quadratic problem will be used an a sub-problem by
several interior-point algorithms in solving complex optimization problems. We
will discuss them later in the book.
1.6. NOTES
1.6
23
Notes
The term complexity was introduced by Hartmanis and Stearns [155]. Also
see Garey and Johnson [118] and Papadimitriou and Steiglitz [248]. The N P
theory was due to Cook [70] and Karp [180]. The importance of P was observed
by Edmonds [88].
Linear programming and the simplex method were introduced by Dantzig
[73]. Other inequality problems and convexity theories can be seen in Gritzmann and Klee [141], Grtschel, Lovsz and Schrijver [142], Grnbaum [143],
o
a
u
Rockafellar [264], and Schrijver [271]. Various complementarity problems can be
found found in Cottle, Pang and Stone [72]. The positive semi-denite programming, an optimization problem in nonpolyhedral cones, and its applications can
be seen in Nesterov and Nemirovskii [241], Alizadeh [8], and Boyd, Ghaoui,
Feron and Balakrishnan [56]. Recently, Goemans and Williamson [125] obtained several breakthrough results on approximation algorithms using positive
semi-denite programming. The KKT condition for nonlinear programming was
given by Karush, Kuhn and Tucker [195].
It was shown by Klee and Minty [184] that the simplex method is not a
polynomial-time algorithm. The ellipsoid method, the rst polynomial-time algorithm for linear programming with rational data, was proven by Khachiyan
[181]; also see Bland, Goldfarb and Todd [52]. The method was devised independently by Shor [277] and by Nemirovskii and Yudin [239]. The interior-point
method, another polynomial-time algorithm for linear programming, was developed by Karmarkar. It is related to the classical barrier-function method studied
by Frisch [109] and Fiacco and McCormick [104]; see Gill, Murray, Saunders,
Tomlin and Wright [124], and Anstreicher [21]. For a brief LP history, see the
excellent article by Wright [323].
The real computation model was developed by Blum, Shub and Smale [53]
and Nemirovskii and Yudin [239]. Other complexity issues in numerical optimization were discussed in Vavasis [318].
Many basic numerical procedures listed in this chapter can be found in
Golub and Van Loan [133]. The ball-constrained quadratic problem and its
solution methods can be seen in Mor [229], Sorenson [281], and Dennis and
e
Schnable [76]. The complexity result of the ball-constrained quadratic problem
was proved by Vavasis [318] and Ye [332].
1.7
Exercises
1.1 Let Q Rnn be a given nonsingular matrix, and a and b be given Rn

vectors. Show
(Q + abT )1 = Q1
1
1+
bT Q1 a
Q1 abT Q1 .
This formula is called the Sherman-Morrison-Woodbury formula.
24
1.2 Prove that the eigenvalues of all matrices Q Mnn are real. Furthermore, show that Q is PSD if and only if all its eigenvalues are non-negative,
and Q is PD if and only if all its eigenvalues are positive.
1.3 Using the ellipsoid representation in Section 1.2.2, nd the matrix Q and
vector y that describes the following ellipsoids:
1. The 3-dimensional sphere of radius 2 centered at the origin;
2. The 2-dimensional ellipsoid centered at (1; 2) that passes the points (0; 2),
(1; 0), (2; 2), and (1; 4);
3. The 2-dimensional ellipsoid centered at (1; 2) with axes parallel to the line
y = x and y = x, and passing through (1; 0), (3; 4), (0; 3), and (2; 1).
1.4 Show that the biggest coordinate-aligned ellipsoid that is entirely contained
in Rn and has its center at xa Rn can be written as:

+
+
E(xa ) = {x Rn :
(X a )1 (x xa ) 1}.
1.5 Show that the non-negative orthant, the positive semi-denite cone, and
the second-order cone are all self-dual.
1.6 Consider the convex set C = {x R2 : (x1 1)2 + (x2 1)2 1} and let
y R2 . Assuming y C,
1. Find the point in C that is closest to y;
2. Find a separating hyperplane vector as a function of y.
1.7 Using the idea of Exercise 1.6, prove the separating hyperplane theorem
1.1.
1.8
Given an m n matrix A and a vector c Rn , consider the function

n
B(y) = j=1 log sj where s = c AT y > 0. Find B(y) and 2 B(y)
in terms of s.
Given C Mn , Ai Mn , i = 1, , m, and b Rm , consider the

m
function B(y) := log det(S), where S = C

and
yi Ai
0. Find
B(y)
i=1
B(y) in terms of S.
The best way to do this is to use the denition of the partial derivative
f (y)i = lim
f (y1 , y2 , ..., yi + , ..., ym ) f (y1 , y2 , ..., yi , ..., ym )

.
1.9 Prove that the level set of a quasi-convex function is convex.

1.10 Prove Propositions 1.3 and 1.4 for convex functions in Section 1.2.3.
1.7. EXERCISES
25
1.11 Let f1 , . . ., fm be convex functions. Then, the function f (x) dened below
is also convex:
max fi (x)
i=1,...,m
fi (x)
i=1
1.12 Prove the Harmonic inequality described in Section 1.2.4.

1.13 Prove Farkas lemma 1.7 for linear equations.
1.14 Prove the linear least-squares problem always has a solution.
1.15 Let P = AT (AAT )1 A or P = I AT (AAT )1 A. Then prove
1. P = P 2 .
2. P is positive semi-denite.
3. The eigenvalues of P are either 0 or 1.
1.16 Using the separating theorem, prove Farkas lemmas 1.8 and 1.9.
1.17 If a system AT y c of linear inequalities in m variables has no solution,
show that AT y c has a subsystem (A )T y c of at most m + 1 inequalities
having no solution.
1.18 Prove the LP fundamental theorem 1.14.
1.19 If (LP) and (LD) have a nondegenerate optimal basis AB , prove that the
strict complementarity partition in Theorem 1.13 is
P = B.
1.20 If Q is positive semi-denite, prove that x is an optimal solution for
(QP) if and only if x is a KKT point for (QP).
1.21 Prove X S 0 if both X and S are positive semi-denite matrices.
1.22 Prove that two positive semi-denite matrices are complementary to each
other, X S = 0, if and only if XS = 0.
1.23 Let both (LP) and (LD) for a given data set (A, b, c) have interior feasible
points. Then consider the level set
(z) = {y : c AT y 0, z + bT y 0}
where z < z and z designates the optimal objective value. Prove that (z) is
bounded and has an interior for any nite z < z , even Fd is unbounded.
26
1.24 Given an (LP) data set (A, b, c) and an interior feasible point x0 , nd
the feasible direction dx (Adx = 0) that achieves the steepest decrease in the
objective function.
1.25 Given an (LP) data set (A, b, c) and a feasible point (x0 , y 0 , s0 ) (Rn , Rm , Rn )
+
+
for the primal and dual, and ignoring the nonnegativity condition, write the systems of linear equations used to calculate the Newton steps for nding points that
satisfy the optimality equations (1.2) and (1.3), respectively.
1.26 Show the optimality conditions for the minimizer y of (BD) in Section
1.5.5:
(Q + I)y = b,
and
are necessary and sucient.
0,
y 1,
(Q + I)
0,
(1 y ) = 0,
Chapter 2
Semidenite Programming
2.0.1
Semi-denite programming (SDP)
Given C Mn , Ai Mn , i = 1, 2, ..., m, and b Rm , the semi-denite

programming problem is to nd a matrix X Mn for the optimization problem:
(SDP )
inf
subject to
C X
Ai X = bi , i = 1, 2, ..., m, X
0.
Recall that the operation is the matrix inner product

A B := trAT B.
The notation X 0 means that X is a positive semi-denite matrix, and X 0
means that X is a positive denite matrix. If a point X
0 and satises all
equations in (SDP), it is called a (primal) strictly or interior feasible solution. .
The dual problem to (SDP) can be written as:
(SDD)
sup
bT y
m
subject to
i yi Ai + S = C, S
0,
which is analogous to the dual (LD) of LP. Here y Rm and S Mn . If

a point (y, S
0) satises all equations in (SDD), it is called a dual interior
feasible solution.
m
Example 2.1 Let P (y Rm ) = C + i yi Ai , where C and Ai , i = 1, . . . , m,

are given symmetric matrices. The problem of minimizing the max-eigenvalue
of P (y) can be cast as a (SDD) problem.
In semi-denite programming, we minimize a linear function of a matrix in
the positive semi-denite matrix cone subject to ane constraints. In contrast
to the positive orthant cone of linear programming, the positive semi-denite
27
28
CHAPTER 2. SEMIDEFINITE PROGRAMMING
matrix cone is non-polyhedral (or non-linear), but convex. So positive semidenite programs are convex optimization problems. Semi-denite programming unies several standard problems, such as linear programming, quadratic
programming, and convex quadratic minimization with convex quadratic constraints, and nds many applications in engineering, control, and combinatorial
optimization.
From Farkas lemma for linear programming, a vector y, with AT y 0 and
T
b y = 1, always exists and is called a (primal) infeasibility certicate for the
system {x : Ax = b, x 0}. But this does not hold for matrix equations in
the positive semi-denite matrix cone.
Example 2.2 Consider
A1 =
1 0
0 0
and
b=
A2 =
0
1
1
0
0
2
where we have the following matrix system

A1 X = 0, A2 X = 2,
X M2 .
+
The problem is that {y : yi = Ai X, i = 1, 2, ..., m, X 0} is not a closed set.

Similary,
0 1
1 0
C=
and A1 =
1 0
0 0
makes C A1 y1 infeasible but it does not have an infeasible certicate.
We have several theorems analogous to Farkas lemma.
Theorem 2.1 (Farkas lemma in SDP) Let Ai Mn , i = 1, ..., m, have rank m
m
(i.e., i yi Ai = 0 implies y = 0) and b Rm . Then, there exists a symmetric
matrix X 0 with
Ai X = bi , i = 1, ..., m,
if and only if
m
i
yi Ai
m
i
0 and
yi Ai = 0 implies bT y < 0.
Corollary 2.2 (Farkas lemma in SDP) Let Ai Mn , i = 1, ..., m, have rank

m
m
m (i.e., i yi Ai = 0 implies y = 0) and C Mn . Then, i yi Ai C if and
only if X 0, X = 0 and
Ai X = 0,
i = 1, ..., m,
implies C X > 0.
In other words, an X
0, X = 0,
Ai X = 0,
m
i
i = 1, ..., m,
and C X 0 proves that

yi Ai
C is impossible. Note the dierence
between the LP and SDP.
The weak duality theorem for SDP is identical to that of (LP) and (LD).
29
Corollary 2.3 (Weak duality theorem in SDP) Let Fp and Fd , the feasible sets
for the primal and dual, be non-empty. Then,
C X bT y
where
X Fp , (y, S) Fd .
But we need more to make the strong duality theorem hold.

Theorem 2.4 (Strong duality theorem in SDP) Let Fp and Fd be non-empty
and at least one of them has an interior. Then, X is optimal for (PS) if and
only if the following conditions hold:
i) X Fp ;
ii) there is (y, S) Fd ;
iii) C X = bT y or X S = 0.
Again note the dierence between the above theorem and the strong duality
theorem for LP.
Example 2.3 The
0 1
C= 1 0
0 0
following SDP has
0
0
0 , A1 = 0
0
0
and
b=
a duality gap:
0
0 0
1 0 , A1 = 1
0
0 0
0
10
1 0
0 0
0 2
Two positive semi-denite matrices are complementary to each other, XS =

0, if and only if XS = 0 (Exercise 1.22). From the optimality conditions, the
solution set for certain (SDP) and (SDD) is
Sp = {X Fp , (y, S) Fd : C X bT y = 0},
or
Sp = {X Fp , (y, S) Fd : XS = 0},
which is a system of linear matrix inequalities and equations.
In general, we have
Theorem 2.5 (SDP duality theorem) If one of (SDP) or (SDD) has a strictly
or interior feasible solution and its optimal value is nite, then the other is
feasible and has the same optimal value. If one of (SDP) or (SDD) is unbounded
then the other has no feasible solution.
Note that a duality gap may exists if neither (SDP) nor (SDD) has a strictly
feasible point. This is in contrast to (LP) and (LD) where no duality gap exists
if both are feasible.
Although semi-denite programs are much more general than linear programs, they are not much harder to solve. It has turned out that most interiorpoint methods for LP have been generalized to semi-denite programs. As in
LP, these algorithms possess polynomial worst-case complexity under certain
computation models. They also perform well in practice. We will describe such
extensions later in this book.
30
2.1
2.1.1
Analytic Center
AC for polytope
Let be a bounded polytope in Rm represented by n (> m) linear inequalities,

i.e.,
= {y Rm : c AT y 0},
where A Rmn and c Rn are given and A has rank m. Denote the interior
of by
m
T
= {y R : c A y > 0}.
Dene
(cj aT y),
j
d(y) =
y ,
j=1
where a.j is the jth column of A. Traditionally, we let s := c AT y and call

it a slack vector. Thus, the function is the product of all slack variables. Its
logarithm is called the (dual) potential function,
n
log(cj aT y) =
.j
B(y) := log d(y) =

j=1
log sj ,
(2.1)
j=1
and B(y) is the classical logarithmic barrier function. For convenience, in what
follows we may write B(s) to replace B(y) where s is always equal to c AT y.
Example 2.4 Let A = (1, 1) and c = (1; 1). Then the set of is the interval
[1, 1]. Let A = (1, 1, 1) and c = (1; 1; 1). Then the set of is also the
interval [1, 1]. Note that
d(1/2) = (3/2)(1/2) = 3/4
and
B(1/2) = log(3/4),
and
d (1/2) = (3/)(1/2)(1/2) = 3/8
and
B (1/2) = log(3/8).
The interior point, denoted by y a and sa = c AT y a , in that maximizes

the potential function is called the analytic center of , i.e.,
B() := B(y a , ) = max log d(y, ).
y
(y a , sa ) is uniquely dened, since the potential function is strictly concave in a
bounded convex . Setting B(y, ) = 0 and letting xa = (S a )1 e, the analytic

center (y a , sa ) together with xa satisfy the following optimality conditions:
Xs =
Ax =
AT y s =
e
0
c.
(2.2)
Note that adding or deleting a redundant inequality changes the location of the
analytic center.
2.1. ANALYTIC CENTER
31
Example 2.5 Consider = {y R : y 0, y 1}, which is interval [0, 1].

The analytic center is y a = 1/2 with xa = (2, 2)T .
Consider
n times
= {y R : y 0, , y 0, y 1},
which is, again, interval [0, 1] but y 0 is copied n times. The analytic
center for this system is y a = n/(n + 1) with xa = ((n + 1)/n, , (n +
1)/n, (n + 1))T .
The analytic center can be dened when the interior is empty or equalities
are presented, such as
= {y Rm : c AT y 0, By = b}.
Then the analytic center is chosen on the hyperplane {y : By = b} to maximize
the product of the slack variables s = c AT y. Thus, the interior of is not
used in the sense that the topological interior for a set is used. Rather, it refers
to the interior of the positive orthant of slack variables: Rn := {s : s 0}.
+
When say has an interior, we mean that
Rn {s : s = c AT y for some y where By = b} = .

+
Again Rn := {s Rn : s > 0}, i.e., the interior of the orthant Rn . Thus, if

+
+
+
has only a single point y with s = c AT y > 0, we still say is not empty.
Example 2.6 Consider the system = {x : Ax = 0, eT x = n, x 0}, which
is called Karmarkars canonical set. If x = e is in then e is the analytic center
of , the intersection of the simplex {x : eT x = n, x 0} and the hyperplane
{x : Ax = 0} (Figure 2.1).
2.1.2
AC for SDP
Let be a bounded convex set in Rm represented by n (> m) a matrix inequality, i.e.,

m
= {y Rm : C
yi Ai
0, }.
Let S = C
m
i
yi Ai and
m
B(y) := log det(S)) = log det(C
yi Ai ).
(2.3)
i
m
a
The interior point, denoted by y a and S a = C i yi Ai , in that maximizes the potential function is called the analytic center of , i.e.,
max B(y).
y
32
x3
(0,0,3)
Ax=0
x1
(1,1,1)
(3,0,0)
(0,3,0)
x2
Figure 2.1: Illustration of the Karmarkar (simplex) polytope and its analytic
center.
(y a , S a ) is uniquely dened, since the potential function is strictly concave in a
bounded convex . Setting B(y, ) = 0 and letting X a = (S a )1 , the analytic

center (y a , S a ) together with X a satisfy the following optimality conditions:
XS
AX
AT y S
2.2
= I
= 0
= C.
(2.4)
Potential Functions for LP and SDP
We show how potential functions can be dened to solve linear programming

problems and semi-denite programming.
We assume that for a given LP data set (A, b, c), both the primal and dual
have interior feasible point. We also let z be the optimal value of the standard
form (LP) and (LD). Denote the feasible sets of (LP) and (LD) by Fp and Fd ,
respectively. Denote by F = Fp Fd , and the interior of F by F .
2.2.1
Primal potential function for LP
Consider the level set

(z) = {y Rm : c AT y 0, z + bT y 0},
(2.5)
where z < z . Since both (LP) and (LD) have interior feasible point for given
(A, b, c), (z) is bounded and has an interior for any nite z, even := Fd is
unbounded (Exercise 1.23). Clearly, (z) , and if z2 z1 , (z2 ) (z1 )
and the inequality z + bT y is translated from z = z1 to z = z2 .
2.2. POTENTIAL FUNCTIONS FOR LP AND SDP
33
From the duality theorem again, nding a point in (z) has a homogeneous
primal problem
minimize cT x zx0
s.t.
Ax bx0 = 0, (x , x0 ) 0.
For (x , x0 ) satisfying
Ax bx0 = 0, (x , x0 ) > 0,
let x := x /x0 F p , i.e.,

Ax = b, x > 0.
Then, the primal potential function for (z) (Figure 2.2), as described in the
preceding section, is
n
P(x , (z)) = (n + 1) log(cT x zx0 )
log xj
j=0
= (n + 1) log(cT x z)
log xj =: Pn+1 (x, z).

j=1
The latter, Pn+1 (x, z), is the Karmarkar potential function in the standard LP
form with a lower bound z for z .
b T y = bT y a
ya
ya
bT y = z
The objective hyperplane
The updated objective hyperplane
Figure 2.2: Intersections of a dual feasible region and the objective hyperplane;
bT y z on the left and bT y bT y a on the right.
One algorithm for solving (LD) is suggested in Figure 2.2. If the objective
hyperplane is repeatedly translated to the analytic center, the sequence of new
34
analytic centers will converge to an optimal solution and the potentials of the
new polytopes will decrease to .
As we illustrated before, one can represent (z) dierently:
times
(z) = {y : c AT y 0, z + bT y 0, , z + bT y 0},
(2.6)
i.e., z + bT y 0 is copied times. Geometrically, this representation does

not change (z), but it changes the location of its analytic center. Since the last
inequalities in (z) are identical, they must share the same slack value and
the same corresponding primal variable. Let (x , x0 ) be the primal variables.
Then the primal problem can be written as
times
minimize
c x zx0 zx0
times
s.t.
Ax bx0 bx0 = 0, (x , x0 ) 0.
Let x = x /(x0 ) F p . Then, the primal potential function for the new (z)
given by (2.6) is
n
P(x, (z))
(n + ) log(cT x z(x0 ))
log xj log x0
j=1
(n + ) log(cT x z)
log xj + log
j=1
=: Pn+ (x, z) + log .

The function
n
Pn+ (x, z) = (n + ) log(cT x z)
log xj
(2.7)
j=1
is an extension of the Karmarkar potential function in the standard LP form

with a lower bound z for z . It represents the volume of a coordinate-aligned
ellipsoid whose intersection with A(z) contains S(z) , where z + bT y 0 is
duplicated times.
2.2.2
Dual potential function for LP
We can also develop a dual potential function, symmetric to the primal, for
(y, s) F d
Bn+ (y, s, z) = (n + ) log(z bT y)
log sj ,
j=1
(2.8)
2.2. POTENTIAL FUNCTIONS FOR LP AND SDP
35
where z is a upper bound of z . One can show that it represents the volume of a
coordinate-aligned ellipsoid whose intersection with the ane set {x : Ax = b}
contains the primal level set
times
T
{x Fp : c x z 0, , cT x z 0},
where cT x z 0 is copied times (Exercise 2.1). For symmetry, we may
write Bn+ (y, s, z) simply by Bn+ (s, z), since we can always recover y from s
using equation AT y = c s.
2.2.3
Primal-dual potential function for LP
A primal-dual potential function for linear programming will be used later. For
x F p and (y, s) F d it is dened by

n
n+ (x, s) := (n + ) log(xT s)
log(xj sj ),
(2.9)
j=1
where 0.
We have the relation:
n
n+ (x, s) =
(n + ) log(cT x bT y)
log xj
j=1
log sj
j=1
Pn+ (x, bT y)
log sj
j=1
n
Bn+ (s, cT x)
log xj .
j=1
Since
n+ (x, s) = log(xT s) + n (x, s) log(xT s) + n log n,
then, for > 0, n+ (x, s) implies that xT s 0. More precisely, we

have
n+ (x, s) n log n
xT s exp(
).
We have the following theorem:

Theorem 2.6 Dene the level set
() := {(x, y, s) F : n+ (x, s) }.
i)
( 1 ) ( 2 )
if
1 2 .
36
ii)
() = {(x, y, s) F : n+ (x, s) < }.
iii) For every , () is bounded and its closure () has non-empty intersection with the solution set.
Later we will show that a potential reduction algorithm generates sequences
{x , y k , sk } F such that
n+n (xk+1 , y k+1 , sk+1 ) n+n (xk , y k , sk ) .05
for k = 0, 1, 2, .... This indicates that the level sets shrink at least a constant
rate independently of m or n.
2.2.4
Potential function for SDP
The potential functions for SDP of Section 2.0.1 are analogous to those for LP.
For given data, we assume that both (SDP) and (SDD) have interior feasible
points. Then, for any X F p and (y, S) F d , the primal potential function is
dened by
Pn+ (X, z) := (n + ) log(C X z) log det(X),
z z;
the dual potential function is dened by

Bn+ (y, S, z) := (n + ) log(z bT y) log det(S),
z z,
where 0 and z designates the optimal objective value.
For X F p and (y, S) F d the primal-dual potential function for SDP is

dened by
n+ (X, S) :=
=
=
=
(n + ) log(X S) log(det(X) det(S))

(n + ) log(C X bT y) log det(X) log det(S)
Pn+ (X, bT y) log det(S)
Bn+ (S, C X) log det(X),
where 0. Note that if X and S are diagonal matrices, these denitions

reduce to those for LP.
Note that we still have (Exercise 2.2)
n+ (X, S) = log(X S) + n (X, S) log(X S) + n log n.
Then, for > 0, n+ (X, S) implies that X S 0. More precisely, we
have
n+ (X, S) n log n
X S exp(
).
We also have the following corollary:
2.3. CENTRAL PATHS OF LP AND SDP
37
Corollary 2.7 Let (SDP) and (SDD) have non-empty interior and dene the
level set
() := {(X, y, S) F : n+ (X, S) }.
i)
( 1 ) ( 2 )
if
1 2 .
ii)
() = {(X, y, S) F : n+ (X, S) < }.
iii) For every , () is bounded and its closure () has non-empty intersection with the solution set.
2.3
Central Paths of LP and SDP
Many interior-point algorithms nd a sequence of feasible points along a central path that connects the analytic center and the solution set. We now present
this one of the most important foundations for the development of interior-point
algorithms.
2.3.1
Central path for LP
Consider a linear program in the standard form (LP) and (LD). Assume that
F = , i.e., both F p = and F d = , and denote z the optimal objective value.

The central path can be expressed as
C=
(x, y, s) F : Xs =
xT s
e
n
in the primal-dual form. We also see
C = (x, y, s) F : n (x, s) = n log n .

For any > 0 one can derive the central path simply by minimizing the
primal LP with a logarithmic barrier function:
(P )
minimize
s.t.
cT x j=1 log xj
Ax = b, x 0.
Let x() F p be the (unique) minimizer of (P). Then, for some y Rm it

satises the optimality conditions
Xs =
Ax =
AT y s =
e
b
c.
(2.10)
38

Consider minimizing the dual LP with the barrier function:
(D)
maximize
s.t.
bT y + j=1 log sj
AT y + s = c, s 0.
Let (y(), s()) F d be the (unique) minimizer of (D). Then, for some x Rn
it satises the optimality conditions (2.10) as well. Thus, both minimizers x()
and (y(), s()) are on the central path with x()T s() = n.
Another way to derive the central path is to consider again the dual level
set (z) of (2.5) for any z < z (Figure 2.3).
Then, the analytic center (y(z), s(z)) of (z) and a unique point
(x (z), x0 (z)) satises
Ax (z) bx0 (z) = 0, X (z)s = e, s = c AT y, and x0 (z)(bT y z) = 1.
Let x(z) = x (z)/x0 (z), then we have
Ax(z) = b, X(z)s(z) = e/x0 (z) = (bT y(z) z)e.
Thus, the point (x(z), y(z), s(z)) is on the central path with = bT y(z) z
and cT x(z) bT y(z) = x(z)T s(z) = n(bT y(z) z) = n. As we proved earlier
in Section 2.2, (x(z), y(z), s(z)) exists and is uniquely dened, which imply the
following theorem:
Theorem 2.8 Let both (LP) and (LD) have interior feasible points for the
given data set (A, b, c). Then for any 0 < < , the central path point
(x(), y(), s()) exists and is unique.
ya
The objective hyperplanes

Figure 2.3: The central path of y(z) in a dual feasible region.
The following theorem further characterizes the central path and utilizes it
to solving linear programs.
39
Theorem 2.9 Let (x(), y(), s()) be on the central path.

i) The central path point (x(), s()) is bounded for 0 < 0 and any given
0 < 0 < .
ii) For 0 < < ,
cT x( ) < cT x()
bT y( ) > bT y().
and
iii) (x(), s()) converges to an optimal solution pair for (LP) and (LD). Moreover, the limit point x(0)P is the analytic center on the primal optimal
face, and the limit point s(0)Z is the analytic center on the dual optimal
face, where (P , Z ) is the strict complementarity partition of the index
set {1, 2, ..., n}.
Proof. Note that
(x(0 ) x())T (s(0 ) s()) = 0,
since (x(0 ) x()) N (A) and (s(0 ) s()) R(AT ). This can be rewritten
as
n
s(0 )j x()j + x(0 )j s()j = n(0 + ) 2n0 ,

j
or
n
j
x()j
s()j
+
x(0 )j
s(0 )j
2n.
Thus, x() and s() are bounded, which proves (i).

We leave the proof of (ii) as an exercise.
Since x() and s() are both bounded, they have at least one limit point
which we denote by x(0) and s(0). Let x (x = 0) and s (s = 0),
P
Z
Z
P
respectively, be the unique analytic centers on the primal and dual optimal
faces: {xP : AP xP = b, xP 0} and {sZ : sZ = cZ AT y
Z
0, cP AT y = 0}. Again, we have
P
n
s x()j + x s()j = n,
j
j
j
or
jP
x
j
x()j
+
jZ
s
j
s()j
Thus, we have
x()j x /n > 0, j P
j
and
s()j s /n > 0, j Z .
j
= n.
40
This implies that
x()j 0, j Z
and
s()j 0, j P .
Furthermore,
jP
or
jZ
x
j
x()j
x
j
jP
s
j
jZ
s
j
1
s()j
x()j
jP
s()j .
jZ
However, ( jP x )( jZ s ) is the maximal value of the potential function

j
j
over all interior point pairs on the optimal faces, and x(0)P and s(0)Z is one
interior point pair on the optimal face. Thus, we must have

jP
Therefore,
s =
j
x
j
x(0)j
jP
jZ
x(0)P = x
P
s(0)j .
jZ
and s(0)Z = s ,
Z
since x and s are the unique maximizer pair of the potential function. This
P
Z
also implies that the limit point of the central path is unique.
We usually dene a neighborhood of the central path as
N () =
(x, y, s) F :
Xs e
and =
where . can be any norm, or even a one-sided norm as

x
= | min(0, min(x))|.
We have the following theorem:
xT s
n
41
Theorem 2.10 Let (x, y, s) N () for constant 0 < < 1.

i) The N () {(x, y, s) : xT s n0 } is bounded for any given 0 < .
ii) Any limit point of N () as 0 is an optimal solution pair for (LP) and
(LD). Moreover, for any j P
xj
(1 )x
j
,
n
where x is any optimal primal solution; for any j Z

sj
(1 )s
j
,
n
where s is any optimal dual solution.
2.3.2
Central path for SDP
Consider a SDP problem in Section 2.0.1 and Assume that F = , i.e., both
F p = and F d = . The central path can be expressed as
C = (X, y, S) F : XS = I, 0 < < ,

or a symmetric form
C = (X, y, S) F : X .5 SX .5 = I, 0 < < ,

where X .5 Mn is the square root matrix of X Mn , i.e., X .5 X .5 = X.
+
+
We also see
C = (X, y, S) F : n (X, S) = n log n .

When X and S are diagonal matrices, this denition is identical to LP.
We also have the following corollary:
Corollary 2.11 Let both (SDP) and (SDD) have interior feasible points. Then
for any 0 < < , the central path point (X(), y(), S()) exists and is
unique. Moreover,
i) the central path point (X(), S()) is bounded where 0 < 0 for any
given 0 < 0 < .
ii) For 0 < < ,
C X( ) < C X()
and
bT y( ) > bT y().
iii) (X(), S()) converges to an optimal solution pair for (SDP) and (SDD),
and the rank of the limit of X() is maximal among all optimal solutions
of (SDP) and the rank of the limit S() is maximal among all optimal
solutions of (SDD).
42
2.4
Notes
The SDP with a duality gap was constructed by Freund.
2.5
Exercises
2.1 Let (LP) and (LD) have interior. Prove the dual potential function Bn+1 (y, s, z),
where z is a upper bound of z , represents the volume of a coordinate-aligned
ellipsoid whose intersection with the ane set {x : Ax = b} contains the primal
level set {x Fp : cT x z}.
2.2 Let X, S Mn be both positive denite. Then prove
n (X, S) = n log(X S) log(det(X) det(S)) n log n.
2.3 Consider linear programming and the level set
() := {(X, y, S) F : n+ (x, s) }.
Prove that
( 1 ) ( 2 )
if
1 2 ,
and for every () is bounded and its closure () has non-empty intersection
with the solution set.
2.4 Prove (ii) of Theorem 2.9.
2.5 Prove Theorem 2.10.
2.6 Prove Corollary 2.11. Here we assume that X() = X( ) and y() =
y(mu ).
Chapter 3
Interior-Point Algorithms
This pair of semidenite programs can be solved in polynomial time. There
are actually several polynomial algorithms. One is the primal-scaling algorithm,
and it uses only X to generate the iterate direction. In other words,
X k+1
S k+1
= Fp (X k ),
where Fp is the primal algorithm iterative mapping.

Another is the dual-scaling algorithm which is the analogue of the dual potential reduction algorithm for linear programming. The dual-scaling algorithm
uses only S to generate the new iterate:
X k+1
S k+1
= Fd (S k ),
where Fd is the dual algorithm iterative mapping.

The third is the primal-dual scaling algorithm which uses both X and S to
generate the new iterate and references therein):
X k+1
S k+1
= Fpd (X k , S k ),
where Fpd is the primal-dual algorithm iterative mapping.

All these algorithms generate primal and dual iterates simultaneously, and
possess O( n ln(1/ )) iteration complexity to yield the duality gap accuracy .

Other scaling algorithms have been proposed in the past. For example, an SDP
equivalent of Dikins ane-scaling algorithm could be very fast. However this
algorithm may not even converge.
Recall that Mn denotes the set of symmetric matrices in Rnn . Let Mn
+
denote the set of positive semi-denite matrices and Mn the set of positive
+
denite matrices in Mn . The goal of this section is to extend interior-point
algorithms to solving the positive semi-denite programming problem (SDP)
and (SDD) presented in Section 2.0.1.
43
44
CHAPTER 3. INTERIOR-POINT ALGORITHMS
(SDP) and (SDD) are analogues to linear programming (LP) and (LD). In
fact, as the notation suggest, (LP) and (LD) can be expressed as a positive
semi-denite program by dening
C = diag(c),
Ai = diag(ai ),
b = b,
where ai is the ith row of matrix A. Many of the theorems and algorithms
used in LP have analogues in SDP. However, while interior-point algorithms for
LP are generally considered competitive with the simplex method in practice
and outperform it as problems become large, interior-point methods for SDP
outperform other methods on even small problems.
Denote the primal feasible set by Fp and the dual by Fd . We assume that
both F p and F d are nonempty. Thus, the optimal solution sets for both (SDP )
and (SDD) are bounded and the central path exists, see Section 2.3 Let z
denote the optimal value and F = Fp Fd . In this section, we are interested
in nding an approximate solution for the SDP problem:
C X bT y = S X .
For simplicity, we assume that a central path pair (X 0 , y 0 , S 0 ), which satises
(X 0 ).5 S 0 (X 0 ).5 = 0 I
and
0 = X 0 S 0 /n,
is known. We will use it as our initial point throughout this section.
Let X F p , (y, S) F d , and z z . Then consider the primal potential

function
P(X, z) = (n + ) log(C X z) log det X,
and the primal-dual potential function
(X, S) = (n + ) log(S X) log det XS,
where =
n. Let z = bT y. Then S X = C X z, and we have

(x, s) = P(x, z) log det S.
Like in Chapter 4, these functions will be used to solve SDP problems.

Dene the -norm,, which is the traditional l2 operator norm for matrices,
of Mn by
X := max {|j (X)|},
j{1,...,n}
where j (X) is the jth eigenvalue of X, and the Euclidean or l2 norm, which
is the traditional Frobenius norm, by
X := X
(j (X))2 .
X X =
j=1
3.1. POTENTIAL REDUCTION ALGORITHM FOR LP
45
We rename these norms because they are perfect analogues to the norms of
vectors used in LP. Furthermore, note that, for X Mn ,
n
tr(X) =
j (X)
and
det(I + X) =
j=1
(1 + j (X)).
j=1
Then, we have the following lemma which resembles Lemma 3.1.

We rst prove two important lemmas.
Lemma 3.1 If d Rn such that d
< 1 then
eT d
log(1 + di ) eT d
i=1
Lemma 3.2 Let X Mn and X
d 2
2(1 d
< 1. Then,
tr(X) log det(I + X) tr(X)
3.1
X 2
2(1 X
Potential Reduction Algorithm for LP
Let (x, y, s) F . Then consider the primal-dual potential function:

n
n+ (x, s) = (n + ) log(xT s)
log(xj sj ),
j=1
where 0. Let z = bT y, then sT x = cT x z and we have

n
n+ (x, s) = Pn+ (x, z)
log sj .
j=1
Recall that when = 0, n+ (x, s) is minimized along the central path. However, when > 0, n+ (x, s) means that x and s converge to the
optimal face, and the descent gets steeper as increases. In this section we
choose = n.
The process calculates steps for x and s, which guarantee a constant reduction in the primal-dual potential function. As the potential function decreases,
both x and s are forced to an optimal solution pair.
Consider a pair of (xk , y k , sk ) F . Fix z k = bT y k , then the gradient vector

of the primal potential function at xk is
Pn+ (xk , z k ) =
(n + )
(n + )
c (X k )1 e = T k
c (X k )1 e.
k )T xk
(s
c x zk
We directly solve the ball-constrained linear problem for direction dx :

minimize
s.t.
Pn+ (xk , z k )T dx
Adx = 0, (X k )1 dx .
46
Let the minimizer be dx . Then

dx =
X k pk
,
pk
where
pk = p(z k ) :== (I X k AT (A(X k )2 AT )1 AX k )X k Pn+ (xk , z k ).
Update
xk+1 = xk + dx = xk
X k pk
,
pk
(3.1)
and,
Pn+ (xk+1 , z k ) Pn+ (xk , z k ) pk +
2
.
2(1 )
Here, we have used the fact

Pn+ (xk+1 , z k ) Pn+ (xk , z k )
n+
(X k )1 dx 2
cT dx eT (X k )1 dx +
cT xk z k
2(1 (X k )1 dx
(X k )1 dx 2
=
P(xk , z k )T dx +
2(1 (X k )1 dx )
2
= pk +
2(1 )
Thus, as long as pk > 0, we may choose an appropriate such that

Pn+ (xk+1 , z k ) Pn+ (xk , z k )
for some positive constant . By the relation between n+ (x, s) and Pn+ (x, z),
the primal-dual potential function is also reduced. That is,
n+ (xk+1 , sk ) n+ (xk , sk ) .
However, even if pk is small, we will show that the primal-dual potential
function can be reduced by a constant by increasing z k and updating (y k , sk ).
We focus on the expression of pk , which can be rewritten as
pk
= (I X k AT (A(X k )2 AT )1 AX k )(
=
(n + )
X k s(z k ) e,
cT xk z k
(n + )
X k c e)
cT xk z k
(3.2)
where
s(z k ) = c AT y(z k )
(3.3)

and
47
x z
y(z k ) = y2 c (n+) y1 ,
y1 = (A(X k )2 AT )1 b,
y2 = (A(X k )2 AT )1 A(X k )2 c.
(3.4)
Regarding pk = p(z k ) , we have the following lemma:

Lemma 3.3 Let
k =
cT xk z k
(xk )T sk
=
n
n
and
(xk )T s(z k )
.
n
If
n
,1 ,
n + 2
p(z k ) < min
(3.5)
then the following three inequalities hold:

s(z k ) > 0,
X k s(z k ) e < ,
and
< (1 .5/ n)k .
(3.6)
Proof. The proof is by contradiction.

i) If the rst inequality of (3.6) is not true, then j such that sj (z k ) 0 and
p(z k ) 1
(n + )
xj sj (z k ) 1.
nk
ii) If the second inequality of (3.6) does not hold, then

p(z k )
=
=
(n + ) k k
(n + )
(n + )
X s(z )
e+
ee 2
nk
nk
nk
(n + ) 2 k k
(n + )
(
) X s(z ) e 2 +
ee 2
k
n
nk
(n + ) 2 2
(n + )
(
) +(
1)2 n
(3.7)
nk
nk
n
2
,
n + 2
where the last relation prevails since the quadratic term yields the minimum at
n
(n + )
=
.
nk
n + 2
iii) If the third inequality of (3.6) is violated, then
(n + )
1
.5
(1 + )(1 ) 1,
nk
n
n
48

which, in view of (3.7), leads to
p(z k )
(n + )
1)2 n
nk
.5
1
((1 + )(1 ) 1)2 n
n
n
2
(1 )
2 2 n
(
(1 )2 .
The lemma says that, when p(z k ) is small, then (xk , y(z k ), s(z k )) is in
the neighborhood of the central path and bT y(z k ) > z k . Thus, we can increase
z k to bT y(z k ) to cut the dual level set (z k ). We have the following potential
reduction theorem to evaluate the progress.
Theorem 3.4 Given (xk , y k , sk ) F . Let = n, z k = bT y k , xk+1 be given

k+1
k
k+1
by (3.1), and y
= y(z ) in (3.4) and s
= s(z k ) in (3.3). Then, either
n+ (xk+1 , sk ) n+ (xk , sk )
or
n+ (xk , sk+1 ) n+ (xk , sk )
where > 1/20.
Proof. If (3.5) does not hold, i.e.,
p(z k ) min
n
,1 ,
n + 2
then
Pn+ (xk+1 , z k ) Pn+ (xk , z k ) min
n
2
,1 +
,
n + 2
2(1 )
hence from the relation between Pn+ and n+ ,

n+ (xk+1 , sk ) n+ (xk , sk ) min
2
n
,1 +
.
n + 2
2(1 )
Otherwise, from Lemma 3.3 the inequalities of (3.6) hold:
i) The rst of (3.6) indicates that y k+1 and sk+1 are in F d .
49
ii) Using the second of (3.6) and applying Lemma 3.1 to vector X k sk+1 /, we
have
n
log(xk sk+1 )
j j
n log(xk )T sk+1
j=1
n
log(xk sk+1 /)
j j
= n log n
j=1
X k sk+1 / e 2
2(1 X k sk+1 / e
2
n log n +
2(1 )
n log n +
n log(xk )T sk
log(xk sk ) +
j j
j=1
2
.
2(1 )
iii) According to the third of (3.6), we have
n(log(xk )T sk+1 log(xk )T sk ) =
n log
.
k
2
Adding the two inequalities in ii) and iii), we have

n+ (xk , sk+1 ) n+ (xk , sk )
2
+
.
2 2(1 )
Thus, by choosing = .43 and = .3 we have the desired result.
Theorem 3.4 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where xk and y k are. In practice,
one can perform the line search to minimize the primal-dual potential function.
This results in the following primal-dual potential reduction algorithm.
Algorithm 3.1 Given a central path point (x0 , y 0 , s0 ) F . Let z 0 = bT y 0 . Set

k := 0.
While (sk )T xk do
1. Compute y1 and y2 from (3.4).
2. If there exists z such that s(z) > 0, compute
z = arg min n+ (xk , s(z)),
and if n+ (xk , s()) < n+ (xk , sk ) then y k+1 = y(), sk+1 = s() and
z
z
z
z k+1 = bT y k+1 ; otherwise, y k+1 = y k , sk+1 = sk and z k+1 = z k .
50

3. Let xk+1 = xk X k p(z k+1 ) with
= arg min n+ (xk X k p(z k+1 ), sk+1 ).
4. Let k := k + 1 and return to Step 1.

The performance of the algorithm results from the following corollary:
Corollary 3.5 Let =
bT y 0 )/ ) iterations with
n. Then, Algorithm 3.1 terminates in at most O( n log(cT x0

cT xk bT y k .
Proof. In O( n log((x0 )T s0 / )) iterations
n log((x0 )T s0 / ) = n+ (xk , sk ) n+ (x0 , s0 )
n log(xk )T sk + n log n n+ (x0 , s0 )
=
n log((xk )T sk /(x0 )T s0 ).
Thus,
n log(cT xk bT y k ) =
n log(xk )T sk
n log ,
i.e.,
cT xk bT y k = (xk )T sk .
3.2
Primal-Dual (Symmetric) Algorithm for LP
Another technique for solving linear programs is the symmetric primal-dual
algorithm. Once we have a pair (x, y, s) F with = xT s/n, we can generate

a new iterate x+ and (y + , s+ ) by solving for dx , dy and ds from the system of
linear equations:
Sdx + Xds = e Xs,
Adx = 0,
(3.8)
AT dy ds = 0.
Let d := (dx , dy , ds ). To show the dependence of d on the current pair (x, s)
and the parameter , we write d = d(x, s, ). Note that dT ds = dT AT dy = 0
x
x
here.
The system (3.8) is the Newton step starting from (x, s) which helps to nd
the point on the central path with duality gap n, see Section 2.3.1. If = 0, it
steps toward the optimal solution characterized by the system of equations (1.2);
if = 1, it steps toward the central path point (x(), y(), s()) characterized
by the system of equations (2.10); if 0 < < 1, it steps toward a central
path point with a smaller complementarity gap. In the algorithm presented in
3.2. PRIMAL-DUAL (SYMMETRIC) ALGORITHM FOR LP
51
this section, we choose = n/(n + ) < 1. Each iterate reduces the primaldual potential function by at least a constant , as does the previous potential
reduction algorithm.
To analyze this algorithm, we present the following lemma, whose proof is
omitted.
Lemma 3.6 Let the direction d = (dx , dy , ds ) be generated by equation (3.8)
with = n/(n + ), and let
=
min(Xs)
xT s
(XS)1/2 ( (n+) e
Xs)
(3.9)
where is a positive constant less than 1. Let

x+ = x + dx ,
y + = y + dy ,
and
s+ = s + ds .
Then, we have (x+ , y + , s+ ) F and

n+ (x+ , s+ ) n+ (x, s)

min(Xs) (XS)1/2 (e
(n + )
2
Xs) +
.
xT s
2(1 )
Let v = Xs. Then, we can prove the following lemma (Exercise 3.3):
Lemma 3.7 Let v Rn be a positive vector and n. Then,

min(v) V 1/2 (e
(n + )
v)
eT v
3/4 .
Combining these two lemmas we have

n+ (x+ , s+ ) n+ (x, s)

3/4 +
2
=
2(1 )
for a constant . This result will provide a competitive theoretical iteration

bound, but a faster algorithm may be again implemented by conducting a line
search along direction d to achieve the greatest reduction in the primal-dual
potential function. This leads to
Algorithm 3.2 Given (x0 , y 0 , s0 ) F . Set

While (sk )T xk do
n and k := 0.
1. Set (x, s) = (xk , sk ) and = n/(n + ) and compute (dx , dy , ds ) from

(3.8).
52

2. Let xk+1 = xk + dx , y k+1 = y k + dy , and sk+1 = sk + ds where
= arg min n+ (xk + dx , sk + ds ).
Theorem 3.8 Let = O( n). Then, Algorithm 3.2 terminates in at most
O( n log((x0 )T s0 / )) iterations with

cT xk bT y k .
3.3
Potential Reduction Algorithm for SDP
Consider a pair of (X k , y k , S k ) F . Fix z k = bT y k , then the gradient matrix of

the primal potential function at X k is
P(X k , z k ) =
n+
C (X k )1 .
Sk X k
The following corollary is an analog to LP.
Corollary 3.9 Let X k Mn and (X k ).5 (X X k )(X k ).5

+
X
Mn
+
and
P(X, z k ) P(X k , z k )
+
P(X k , z k ) (X X k )
(X k ).5 (X X k )(X k ).5 2

2(1 (X k ).5 (X X k )(X k ).5
Let
A1
A
A = 2 .
...
Am
Then, dene
A1 X
A X
= b,
AX = 2
...
Am X
and
AT y =
y i Ai .
i=1
Then, we directly solve the following ball-constrained problem:

minimize
s.t.
P(X k , z k ) (X X k )
A(X X k ) = 0,
(X k ).5 (X X k )(X k ).5 < 1.
< 1. Then,
3.3. POTENTIAL REDUCTION ALGORITHM FOR SDP
53
Let X = (X k ).5 X(X k ).5 . Note that for any symmetric matrices Q, T Mn
and X Mn ,
+
Q X .5 T X .5 = X .5 QX .5 T and XQ
= QX
= X .5 QX .5 .
Then we transform the above problem into

minimize
s.t.
where
(X k ).5 P(X k , z k )(X k ).5 (X I)

A (X I) = 0, i = 1, 2, ..., i,
X I ,
(X k ).5 A1 (X k ).5
A1
(X k ).5 A2 (X k ).5
A2
A =
... :=
...
Am
(X k ).5 Am (X k ).5
Let the minimizer be X and let X k+1 = (X k ).5 X (X k ).5 . Then

X I =
X k+1 X k =
Pk
Pk
(X k ).5 P k (X k ).5
,
Pk
(3.10)
where
Pk
Pk =
and
PA (X k ).5 P(X k , z k )(X k ).5
=
or
(X k ).5 P(X k , z k )(X k ).5 A y k
n+
(X k ).5 (C AT y k )(X k ).5 I,
Sk X k
Sk X k
T
(A A )1 A (X k ).5 P(X k , z k )(X k ).5 .
n+
is the projection operator onto the null space of A , and
A1 A1 A1 A2 ... A1 Am
A A1 A2 A2 ... A2 Am
T
Mm .
A A := 2
...
...
...
...
Am A1 Am A2 ... Am Am
yk =
Here, PA
In view of Corollary 3.9 and

P(X k , z k ) (X k+1 X k )
P(X k , z k ) (X k ).5 P k (X k ).5

Pk
(X k ).5 P(X k , z k )(X k ).5 P k
Pk
Pk 2
= P k ,
Pk
=
=
=
54
we have
P(X k+1 , z k ) P(X k , z k ) P k +
2
.
2(1 )
Thus, as long as P k > 0, we may choose an appropriate such that

P(X k+1 , z k ) P(X k , z k )
for some positive constant .
Now, we focus on the expression of P k , which can be rewritten as
P (z k ) := P k =
with
n+
(X k ).5 S(z k )(X k ).5 I
Sk X k
S(z k ) = C AT y(z k )
and
y(z k ) := y k = y2
(3.11)
(3.12)
Sk X k
C X k zk
y1 = y2
y1 ,
n+
n+
(3.13)
where y1 and y2 are given by

y1
y2
= (A A )1 A I = (A A )1 b,
T
= (A A )1 A (X k ).5 C(X k ).5 .
(3.14)
Regarding P k = P (z k ) , we have the following lemma resembling Lemma

3.3.
Lemma 3.10 Let
k =
Sk X k
C X k zk
=
n
n
and
S(z k ) X k
.
n
If
P (z k ) < min
n
,1 ,
n + 2
(3.15)

S(z k )
0,
(X k ).5 S(z k )(X k ).5 e < ,
and
< (1 .5/ n)k .

(3.16)
Proof. The proof is by contradiction. For example, if the rst inequality of

(3.16) is not true, then (X k ).5 S(z k )(X k ).5 has at least one eigenvalue less than
or equal to zero, and
P (z k ) 1.
The proof of the second and third inequalities are similar to that of Lemma 3.3.
Based on this lemma, we have the following potential reduction theorem.
3.3. POTENTIAL REDUCTION ALGORITHM FOR SDP
55
Theorem 3.11 Given X k F p and (y k , S k ) F d , let = n, z k = bT y k ,

X k+1 be given by (3.10), and y k+1 = y(z k ) in (3.13) and S k+1 = S(z k ) in
(3.12). Then, either
(X k+1 , S k ) (X k , S k )
or
(X k , S k+1 ) (X k , S k ) ,
where > 1/20.

Proof. If (3.15) does not hold, i.e.,
P (z k ) min
n
,1 ,
n + 2
then, since (X k+1 , S k ) (X k , S k ) = P(X k+1 , z k ) P(X k , z k ),

(X k+1 , S k ) (X k , S k ) min
n
,1
n + 2
2
.
2(1 )
Otherwise, from Lemma 3.10 the inequalities of (3.16) hold:
i) The rst of (3.16) indicates that y k+1 and S k+1 are in F d .

ii) Using the second of (3.16) and applying Lemma 3.2 to matrix
(X k ).5 S k+1 (X k ).5 /, we have
n log S k+1 X k log det S k+1 X k
= n log S k+1 X k / log det(X k ).5 S k+1 (X k ).5 /
= n log n log det(X k ).5 S k+1 (X k ).5 /
(X k ).5 S k+1 (X k ).5 / I 2
n log n +
2(1 (X k ).5 S k+1 (X k ).5 / I )
2
n log n +
2(1 )
2
n log S k X k log det S k X k +
.
2(1 )
iii) According to the third of (3.16), we have
n(log S k+1 X k log S k X k ) =
n log
.
k
Adding the two inequalities in ii) and iii), we have

(X k , S k+1 ) (X k , S k )
+
.
2
2(1 )
Thus, by choosing = .43 and = .3 we have the desired result.
56
Theorem 3.11 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where X k and y k are. In practice,
one can perform the line search to minimize the primal-dual potential function.
This results in the following potential reduction algorithm.
Algorithm 3.3 Given x0 F p and (y 0 , s0 ) F d . Let z 0 = bT y 0 . Set k := 0.

While S k X k do
1. Compute y1 and y2 from (3.14).
2. Set y k+1 = y(), S k+1 = S(), z k+1 = bT y k+1 with
z
z
z = arg min (X k , S(z)).
zz k
If (X k , S k+1 ) > (X k , S k ) then y k+1 = y k , S k+1 = S k , z k+1 = z k .

3. Let X k+1 = X k (X k ).5 P (z k+1 )(X k ).5 with
= arg min (X k (X k ).5 P (z k+1 )(X k ).5 , S k+1 ).

The performance of the algorithm results from the following corollary:
Corollary 3.12 Let = n. Then, Algorithm 3.3 terminates in at most
O( n log(C X 0 bT y 0 )/ ) iterations with

C X k bT y k .
Proof. In O( n log(S 0 X 0 / )) iterations
n log(S 0 X 0 / )
Thus,
= (X k , S k ) (X 0 , S 0 )
n log S k X k + n log n (X 0 , S 0 )
n log(S k X k /S 0 X 0 ).
=
n log(C X k bT y k ) =
n log S k X k
i.e.,
C X k bT y k = S k X k .
n log ,
3.4. PRIMAL-DUAL (SYMMETRIC) ALGORITHM FOR SDP
3.4
57
Primal-Dual (Symmetric) Algorithm for SDP
Once we have a pair (X, y, S) F with = S X/n, we can apply the primaldual Newton method to generate a new iterate X + and (y + , S + ) as follows:
Solve for dX , dy and dS from the system of linear equations:
D1 dX D1 + dS
AdX
AT dy dS
= R := X 1 S,
= 0,
= 0,
(3.17)
where
D = X .5 (X .5 SX .5 ).5 X .5 .
Note that dS dX = 0.
This system can be written as
dX + dS
A dX
T
A dy dS
= R,
= 0,
= 0,
(3.18)
where
dX = D.5 dX D.5 ,
and
dS = D.5 dS D.5 ,
R = D.5 (X 1 S)D.5 ,
D.5 A1 D.5
A1
D.5 A2 D.5
A2
A =
... :=
...
D.5 Am D.5
Am
Again, we have dS dX = 0, and

T
dy = (A A )1 A R , dS = A dy , and dX = R dS .
Then, assign
dS = AT dy
and
dX = D(R dS )D.
Let
V 1/2 = D.5 XD.5 = D.5 SD.5 Mn .

+
Then, we can verify that S X = I V . We now present the following lemma,
whose proof is very similar to that for LP and 3.6 and will be omitted.
Lemma 3.13 Let the direction dX , dy and dS be generated by equation (3.17)
with = n/(n + ), and let
=
1/2
IV
1/2
n+ V
V 1/2
(3.19)
58
where is a positive constant less than 1. Let

X + = X + dX ,
y + = y + dy ,
and
S + = S + dS .
Then, we have (X + , y + , S + ) F and

(X + , S + ) (X, S)

V 1/2 n+ V 1/2
2
IV
.
+
2(1 )
V 1/2
Applying Lemma 3.7 to v Rn as the vector of the n eigenvalues of V , we

can prove the following lemma:
Lemma 3.14 Let V Mn and

+
n. Then,
V 1/2 n+ V 1/2
IV
V 1/2
3/4.
From these two lemmas we have

(X + , S + ) (X, S)

3/4 +
2
=
2(1 )
for a constant . This leads to Algorithm 3.4.
Algorithm 3.4 Given (X 0 , y 0 , S 0 ) F . Set =

While S k X k do
n and k := 0.
1. Set (X, S) = (X k , S k ) and = n/(n + ) and compute (dX , dy , dS ) from

(3.17).
2. Let X k+1 = X k + dX , y k+1 = y k + dy , and S k+1 = S k + dS , where
= arg min (X k + dX , S k + dS ).
Theorem 3.15 Let = n. Then, Algorithm 3.4 terminates in at most
O( n log(S 0 X 0 / )) iterations with

C X k bT y k .
3.5. DUAL ALGORITHM FOR SDP
3.5
59
Dual Algorithm for SDP
An open question is how to exploit the sparsity structure by polynomial interiorpoint algorithms so that they can also solve large-scale problems in practice. In
this paper we try to respond to this question. We show that many large-scale
semidenite programs arisen from combinatorial and quadratic optimization
have features which make the dual-scaling interior-point algorithm the most
suitable choice:
1. The computational cost of each iteration in the dual algorithm is less that
the cost the primal-dual iterations. Although primal-dual algorithms may
possess superlinear convergence, the approximation problems under consideration require less accuracy than some other applications. Therefore,
the superlinear convergence exhibited by primal-dual algorithms may not
be utilized in our applications. The dual-scaling algorithm has been shown
to perform equally well when only a lower precision answer is required.
2. In most combinatorial applications, we need only a lower bound for the
optimal objective value of (SDP). Solving (SDD) alone would be sucient
to provide such a lower bound. Thus, we may not need to generate an X
at all. Even if an optimal primal solution is necessary, our dual-scaling
algorithm can generate an optimal X at the termination of the algorithm
with little additional cost.
3. For large scale problems, S tends to be very sparse and structured since it
is the linear combination of C and the Ai s. This sparsity allows considerable savings in both memory and computation time. The primal matrix,
X, may be much less sparse and have a structure unknown beforehand.
Consequently, primal and primal-dual algorithms may not fully exploit
the sparseness and structure of the data.
These problems include the semidenite relaxations of the graph-partition
problem, the box-constrained quadratic optimization problem, the 0 1 integer
set covering problem, etc.
The dual-scaling algorithm, which is a modication of the dual-scaling linear programming algorithm, reduces the Tanabe-Todd-Ye primal-dual potential
function
(X, S) = ln(X S) ln det X ln det S.
The rst term decreases the duality gap, while the second and third terms keep
X and S in the interior of the positive semidenite matrix cone. When > n,
the inmum of the potential function occurs at an optimal solution. Also note
that, using the arithmetic-geometric mean inequality, we have
n ln(X S) ln det X ln det S n ln n.
60

Let operator A(X) : S n
be dened as
A1 X
A2 X
A(X) =
.
.
.
.
Am X
Since A(X)T y =
AT : m S n is
m
i=1
m
i=1
yi (Ai X) = (
yi Ai ) X, the adjoint operator
m
T
A (y) =
yi Ai .
i=1
Let z = C X for some feasible X and consider the dual potential function
(y, z ) = ln( bT y) ln det S.
z
Its gradient is
(y, z ) =
b + A(S 1 ).
z bT y
For any given y and S = C AT (y) such that S
(3.20)
0 and
(S k ).5 AT (y y k ) (S k ).5 < 1,

using the above lemma, the concavity of the rst term in the potential function,
and the fact that
(S k ).5 S(S k ).5 I = (S k ).5 (S S k )(S k ).5 = (S k ).5 AT (y y k ) (S k ).5 ,
we establish an overestimater for the potential reduction:
(y, z k ) (y k , z k )
= ln(k bT y) ln(k bT y k ) ln det((S k ).5 S(S k ).5 )

z
z
ln(k bT y) ln(k bT y k ) + trace((S k ).5 S(S k ).5 I)
z
z
(S k ).5 (AT (yy k ))(S k ).5
+ 2(1 (S k ).5 (AT (yyk ))(S k ).5 )
= ln(k bT y) ln(k bT y k ) + A((S k )1 )T (y y k )
z
z
(S k ).5 (AT (yy k ))(S k ).5
+ 2(1 (S k ).5 (AT (yyk ))(S k ).5 )
(S k ).5 (AT (yy k ))(S k ).5
(y k , z k )T (y y k ) + 2(1 (S k ).5 (AT (yyk ))(S k ).5 ) .
(3.21)
Therefore, beginning with a strictly feasible dual point (y k , S k ) and upper

bound z k , each iteration solves this following problem.
Minimize
Subject to
T (y k , z k )(y y k )
(S k ).5 AT (y y k ) (S k ).5 ,
(3.22)
where is a positive constant less than 1. For simplicity, in what follows we let
k = z k b T y k .
61
The rst order Karusch-Kuhn-Tucker conditions state that the minimum

point, y k+1 , of this convex problem satises
M k (y k+1 y k )+ (y k , z k ) = M k (y k+1 y k )+( k
b+A((S k )1 )) = 0
z bT y k
(3.23)
for a positive value of , where
A1 (S k )1 (S k )1 A1 A1 (S k )1 (S k )1 Am
.
.
..
.
.
Mk =
.
.
.
Am (S k )1 (S k )1 A1
and
Am (S k )1 (S k )1 Am
A1 (S k )1
.
.
A((S k )1 ) =
.
.
Am (S k )1
The matrix M k is a Gram matrix and is positive denite when S k 0 and the
Ai s are linearly independent. In this paper, it will sometimes be referred to as
M.
Using the ellipsoidal constraint, the minimal solution, y k+1 , of (3.22) is given
by
y k+1 y k =
d(k )y
z
(3.24)
T (y k , z k )(M k )1 (y k , z k )
where
d(k )y = (M k )1 (y k , z k ).
z
(3.25)
Unlike linear programming, semidenite programs require a signicant amount
of the time to compute the system of equations used to to determine the step
direction. For arbitrary symmetric matrices Ai , the AHO direction can be
computed in 5nm3 + n2 m2 + O(max{m, n}3 ) operations, the HRVW/KSH/M
direction uses 2nm3 + n2 m2 + O(max{m, n}3 ) operations, and the NT direction
uses nm3 + n2 m2 /2 + O(max{m, n}3 ) operations. The complexity of computing
the matrix is a full order of magnitude higher than any other step of the algorithm. Fujisawa, Kojima and Nakata explored another technique for computing
primal-dual step directions that exploit the sparsity of the data matrices. However, it is our belief that only the dual-scaling algorithm can fully exploit the
structure and sparsity of many problems, as explained below.
k
Generally, Mij = Ai (S k )1 (S k )1 Aj . When Ai = ai aT , the Gram matrix
i
can be rewritten in the form
T k 1 2
(a1 (S ) a1 )
(aT (S k )1 am )2
1
.
.
..
.
.
Mk =
(3.26)
.
.
.
(aT (S k )1 a1 )2
m
and
(aT (S k )1 am )2
m
aT (S k )1 a1
1
.
.
A((S k )1 ) =
.
.
T
k 1
am (S ) am
62
This matrix can be computed very quickly without computing, or saving, (S k )1 .

Instead, S k can be factored, and then we can use
63
Algorithm M: To compute M k and A((S k )1 ), factor S k = Lk (Lk )T and do

the following:
For i = 1 : m;
Solve Lk wi = ai ;
T
k
A((S k )1 )i = wi wi and Mii = (A((S k )1 )i )2 ;
T
k
Mij = (wi wj )2 ;
For j = 1 : m 1;
end;
end.
Solving each of the m systems of equations uses n2 + O(n) oating point operations. Since there are m(m + 1)/2 vector multiplications, Algorithm M, uses
nm2 + n2 m + O(nm) operations after factoring S k . Note that these operations
can be signicantly reduced if S k is structured and sparse. In applications like
the maximum cut problem, the matrix S k is indeed very sparse while its inverse
is usually dense, so working with S k is faster than working with its inverse. Using matrices of the form Ai = ai aT also reduces the complexity of primal-dual
i
algorithms by a factor of n, but even the quickest direction to compute takes
about twice as long as our dual-scaling direction. Furthermore, they all need to
handle dense X.
Algorithm M needs to store all vectors w1 , ..., wm and they are generally
dense. To save storage and exploit the sparsity of ai , ..., am , an alternative
algorithm is
Algorithm M: To compute M k and A((S k )1 ), factor S k = Lk (Lk )T and do
the following:
For i = 1 : m;
Solve S k wi = ai ;
k
T
A((S k )1 )i = wi ai and Mii = (A((S k )1 )i )2 ;
For j = i + 1 : m;
k
T
Mij = (wi aj )2 ;
end;
end.
Algorithm M does not need to store wj but uses one more back-solver for wi .
To nd a feasible primal point X, we solve the least squares problem
Minimize
Subject to
(S k ).5 X(S k ).5

A(X) = b.
k
I
(3.27)
This problem looks for a matrix X(k ) near the central path. Larger values of
z
generally give a lower objective value, but provide a solution matrix that is
not positive denite more frequently. The answer to (3.27) is a by-product of
computing (3.25), given explicitly by
X(k ) =
z
k k 1 T
A (d(k )y ) + S k (S k )1 .
z
(S )
(3.28)
64
Creating the primal matrix may be costly. However, the evaluation of the
primal objective value C X(k ) requires drastically less work.
z
C X(k )
z
= bT y k + X(k ) S k
z
= bT y k + trace
= bT y k +
= bT y k +
k
k 1
(S )
k 1
AT (d(k )y ) + S k (S k )1 S k
z
k
AT (d(k )y )
z
trace (S )
k
k 1
k T
d( )y A((S ) ) + n
z
+I
Since the vectors A((S k )1 ) and d(k )y were previously found in calculating
z
the dual step direction, the cost of computing a primal objective value is the
cost of a vector dot product! The matrix X(k ) never gets computed during
z
the iterative process, saving time and memory. On the other hand, primal-dual
methods require far more resources to compute the primal variables X.
Dening
P (k ) = k (S k ).5 X(k )(S k ).5 I,

z
z
(3.29)
we have the following lemma:

Lemma 3.16 Let k =
n + n, and < 1. If
k
n
z k bT y k
,
n
P (k ) < min(
z
X(k )S k
z
n
CX(k )bT y k
z
,
n
n
, 1 ),
n + 2
(3.30)

1. X(k )
z
0;
2. (S k ).5 X(k )(S k ).5 I ;

z
3. (1 .5/ n)k .
Proof. The proofs are by contradiction. If the rst inequality is false, then
(S k ).5 X(k )(S k ).5 has at least one nonpositive eigenvalue, which by (3.29) imz
plies that P (k ) 1.
z
If the second does not hold, then
P (k )
z
=
=
=
>
(S k ).5 X(k )(S k ).5 I 2

z
nk
k .5
(S ) X(k )(S k ).5 nk I + nk I I 2
z
nk
k .5
k
k .5
2
(S ) X( )(S ) nk I + nk I I 2
z
nk
2
2
2 + nk 1 n
nk
n
2 n+2
where the last inequality is true because the quadratic term has a minimum at
n
= n+2 .
nk
65
If the third inequality does not hold, then
>
nk
1
1+
n
.5
1
n
1.
which leads to
P (k )
z
1
nk
1
1+
n
1
2 n

2
2 n
(1 )2 .
Focusing on the expression P (k ), it can be rewritten as

z
k
P (k ) = k (S k ).5 (S k )1 AT (d(k )y ) + S k (S k )1 (S k ).5 I

z
z
k .5 T
k
k .5
= (S ) A d( )y (S )
z
= (S k ).5 AT
y k+1 y k
(S k ).5
which by (3.22), makes

T (y k , z k )d(k )y = P (k )
z
z
and
T (y k , z k )(y k+1 y k ) = P (k ) .
(3.31)
(3.32)
Updating the dual variables according to

y k+1 = y k +
d()y
z
P (k+1 )
z
and
S k+1 = C AT (y k+1 ), (3.33)
assures the positive deniteness of S k+1 when < 1, which assures that they
are feasible. Using (3.32) and (3.21), the reduction in the potential function
satises the inequality
(y k+1 , z k ) (y k , z k ) P (k ) +
2
.
2(1 )
(3.34)
The theoretical algorithm can be stated as follows.

DUAL ALGORITHM. Given an upper bound z 0 and a dual point (y 0 , S 0 )
such that S 0 = C AT y 0
0, set k = 0, > n + n, (0, 1), and do the
following:
while z k bT y k do
begin
66

1. Compute A((S k )1 ) and the Gram matrix M k (3.26) using Algorithm M
or M.
2. Solve (3.25) for the dual step direction d(k )y .
z
3. Calculate P (k ) using (3.31).
z
4. If (3.30) is true, then X k+1 = X(k ), z k+1 = CX k+1 , and (y k+1 , S k+1 ) =
z
(y k , S k );
P (k )
z
else y k+1 = y k +
and z k+1 = z k .
d(k+1 )y , S k+1 = C AT (y k+1 ), X k+1 = X k ,

z
endif
5. k := k + 1.
end
We can derive the following potential reduction theorem based on the above
lemma:
Theorem 3.17
(X k+1 , S k+1 ) (X k , S k )
where > 1/50 for a suitable .
Proof.
(X k+1 , S k+1 )(X k , S k ) = (X k+1 , S k+1 ) (X k+1 , S k ) + (X k+1 , S k ) (X k , S k ) .
In each iteration, one of the dierences is zero. If P (k ) does not satisfy
z
(3.30), the dual variables get updated and (3.34) shows sucient improvement
in the potential function when = 0.4.
On the other hand, if the primal matrix gets updated, then using Lemma
3.2 and the rst two parts of Lemma 3.16,
n ln X k+1 S k ln det X k+1 ln det S k
= n ln X k+1 S k ln det X k+1 S k
= n ln X k+1 S k / ln det X k+1 S k /
= n ln n ln det (S k ).5 X k+1 (S k ).5 /
k .5
) X k+1 (S k ).5 /I
n ln n + 2(1 (S k ).5 X k+1 (S k ).5 /I )
(S
2
2(1)
k
k
n ln n +
n ln X S
ln det X k ln det S k +
2
2(1)
Additionally, by the third part of Lemma 3.16
n ln(X k+1 S k ) ln(X k S k ) =
n ln
k
2
3.6. NOTES
67
Adding the two inequalities gives

(X k+1 , S k ) (X k , S k )
2
+
2
2(1 )
By choosing = 0.4 again, we have the desired result.

This theorem leads to
Corollary 3.18 Let n + n and (X 0 , S 0 ) ( n) ln(X 0 S 0 ). Then,

the algorithm terminates in at most O(( n) ln(X 0 S 0 / )) iterations.
Proof. In O(( n) ln(X 0 S 0 / )) iterations,
(X k , S k ) ( n) ln( )).
Also,
(n) ln(C X k bT y k ) = (n) ln(X k S k ) (X k , S k )n ln n (X k , S k ).
Combining the two inequalities,
C X k bT y k = X k S k < .
Again, from (3.28) we see that the algorithm can generate an X k as a byproduct. However, it is not needed in generating the iterate direction, and it is
only explicitly used for proving convergence and complexity.
Theorem 3.19 Each iteration of the dual algorithm uses O(m3 + nm2 + n2 m +
n3 ) oating point iterations.
Proof. Creating S, or S + AT (d(k )), uses matrix additions and O(mn2 ) opz
erations; factoring it uses O(n3 ) operations. Creating the Gram matrix uses
nm2 + 2n2 m + O(nm) operations, and solving the system of equations uses
O(m3 ) operations. Dot products for z k+1 and P (k ) , and the calculation of
z
k+1
y
use only O(m) operations. These give the desired result.
3.6
Notes
The primal potential reduction algorithm for positive semi-denite programming

is due to Alizadeh [9, 8], in which Ye has suggested studying the primal-dual
potential function for this problem and looking at symmetric preserving scal1/2
1/2
ings of the form X0
XX0
, and to Nesterov and Nemirovskii [241], and
the primal-dual algorithm described here is due to Nesterov and Todd [242, 243].
One can also develop a dual potential reduction algorithm. In general, consider
(P SP )
inf
s.t.
C X
A X = b, X K,
68
and its dual
(P SD)
sup
s.t.
bT y
A Y + S = C, S K,
where K is a convex homogeneous cone.

Interior-point algorithms compute a search direction (dX , dY , dS ) and a new
strictly feasible primal-dual pair X + and (Y + ; S + ) is generated from
X + = X + dX , Y + = Y + dY , S + = S + dS ,
for some step-sizes and .
The search direction (dX , dY , dS ) is determined by the following equations:
A dX = 0,
dS = A dY
and
dX + F (S)dS =
or
dS + F (X)dX =
(feasibility)
n+
X F (S) (dual scaling),
X S
n+
S F (X) (primal scaling),
X S
or
(3.35)
(3.36)
(3.37)
n+
S F (X) (joint scaling),
X S
(3.38)
S = F (Z)X.
dS + F (Z)dX =
(3.39)
where Z is chosen to satisfy

The dierences among the three algorithms are the computation of the search
direction and their theoretical close-form step-sizes. All three generate an optimal solution (X, Y, S), i.e.,
X S
in a guaranteed polynomial time.
Other primal-dual algorithms for positive semi-denite programming are in
Alizadeh, Haeberly and Overton [10, 12], Boyd, Ghaoui, Feron and Balakrishnan [56], Helmberg, Rendl, Vanderbei and Wolkowicz [156], Jarre [170], de
Klerk, Roos and Terlaky.[185], Kojima, Shindoh and Hara [190], Monteiro and
Zhang [228], Nesterov, Todd and Ye [244], Potra and Sheng [257], Shida, Shindoh and Kojima [276], Sturm and Zhang [285], Tseng [305], Vandenberghe and
Boyd [313, 314], and references therein. Ecient interior-point algorithms are
also developed for optimization over the second-order cone; see Andersen and
Christiansen [17], Lobo, Vandenberghe and Boyd [204], and Xue and Ye [326].
These algorithms have established the best approximation complexity results
for some combinatorial problems.
Primal-dual adaptive path-following algorithms, the predictor-corrector algorithms and the wide-neighborhood algorithms can also be developed for solving (SDP).
3.7. EXERCISES
3.7
69
Exercises
3.1 Prove a slightly stronger variant of Lemma 3.1: If D Mn such that

0 D < 1 then
Trace(D) log det(I D) Trace(D)
3.2 Given S
0, nd the minimizer of the least-squares problem

minimize
s.t.
Given X
D 2
2(1 D
S 1/2 XS 1/2 I
AX = 0.
0 nd the minimizer of the least-squares problem

minimize X 1/2 SX 1/2 I
s.t.
S = C AT y.
3.3 Let v Rn be a positive vector and

min(v) V 1/2 (e
n. Prove
(n + )
v)
eT v
3/4 .
3.4 Prove the following convex quadratic inequality

(Ay + b)T (Ay + b) cT y d 0
is equivalent to a matrix inequality
I
(Ay + b)T
Ay + b
cT y + d
0.
Using this relation to formulate a convex quadratic minimization problem with

convex quadratic inequalities as an (SDD) problem.
3.5 Prove Corollary 3.9.
3.6 Prove Lemma 3.13.
3.7 Describe and analyze a dual potential algorithm for positive semi-denite
programming in the standard form.
70
Chapter 4
SDP for Combinatorial

Optimization
4.1
Approximation
A (randomized) algorithm for a maximization problem is called (randomized)

r-approximation algorithm, where 0 < r 1, if it outputs a feasible solution
with its (expected) value at least r times the optimum value for all instances of
the problem.
More precisely, let w (> 0) be the (global) maximum value of a given problem instance. Then, a r-approximate maximizer x satises
w(x) r w .
or
E[w(x)] r w .
A (randomized) algorithm for a minimization problem is called (randomized)
r-approximation algorithm, where 1 r, if it outputs a feasible solution with
its (expected) value at most r times the optimum value for all instances of the
problem.
More precisely, let w (> 0) be the (global) minimal value of a given problem
instance. Then, a r-approximate minimizer x satises
w(x) r w .
or
E[w(x)] r w .
71
72
CHAPTER 4. SDP FOR COMBINATORIAL OPTIMIZATION
4.2
Ball-Constrained Quadratic Minimization

z :=
Minimize
xT Qx + 2q T x
(BQP)
Subject to
(4.1)
1.
Here, the given matrix Q Mn , the set of n-dimensional symmetric matrices;

vector q Rn ; and . is the Euclidean norm.
4.2.1
Homogeneous Case: q = 0
Matrix-formulation: Let
z =
X = xxT .
Minimize
QX
Subject to
I X 1,
(BQP)
X
0,
Rank(X) = 1.
SDP-relaxation: Remove the rank-one constraint.

z SDP :=
Minimize
QX
Subject to
I X 1,
(SDP)
X
0.
The dual of (SDP) can be written as:

z SDP =
Maximize
Subject to
yI + S = Q,
(DSDP)
y 0,
0.
X is a minimal matrix solution to SDP if and only if there exist a feasible

dual variable y 0 such that
S = Q y I
y (1 I X ) = 0
S X = 0.
Observation: z SDP z .
Theorem 4.1 The SDP relaxation is exact, meaning
z SDP z .
4.2. BALL-CONSTRAINED QUADRATIC MINIMIZATION

Moreover, let a decomposition of X be
r
X =
xj xT ,
j
j=1
where r is the rank of X . Then, for any j,

x = (I X ) xj / xj
j
is a solution to (BQP).
4.2.2
Non-Homogeneous Case
X = (1; x)(1; x)T Mn+1 ,
Q =
0
q
qT
Q
I =
0
0
0T
I
I1 =
1
0
0T
0
and
z =
Minimize
Q X
Subject to
I X 1,
(BQP)
I1 X = 1,
X 0, Rank(X) = 1.
z =
Minimize
Q X
Subject to
I X 1,
(SDP)
I1 X = 1,
X 0.
z SDP =
Maximize
y1 + y2
Subject to
y 1 I + y 2 I1 + S = Q ,
(DSDP)
y1 0,
0.
73
74
X is a minimal matrix solution to SDP if and only if there exist a feasible
dual variables (y1 0, y2 ) such that
S = Q y 1 I y 2 I1
y1 (1 I X ) = 0
S X = 0.
Lemma 4.2 Let X be a positive semidenite matrix of rank r, A be a given
symmetric matrix. Then, there is a decomposition of X
r
xj xT ,
j
X=
j=1
such that for all j,

xT Axj = A (xj xT ) = A X/r.
j
j
Proof. We prove for the case A X = 0. Let
r
pj pT
j
X=
j=1
be any decomposition of X. Assume that

pT Ap1 < 0
1
and
pT Ap2 > 0.
2
Now let
x1 = (p1 + p2 )/
1 + 2
x2 = (p2 p1 )/
1 + 2,
and
where makes
(p1 + p2 )T A(p1 + p2 ) = 0.
We have
p1 pT + p2 pT = x1 xT + x2 xT
2
1
2
1
and
A (X x1 xT ) = 0.
1
Note that X x1 xT is still positive semidenite and its rank is r 1.
1
z SDP z .
4.3. HOMOGENEOUS QUADRATIC MINIMIZATION
75
Moreover, there is a decomposition of X be

r
X =
xj xT ,
j
j=1
where r is the rank of X , such that for any j,

x = xj /(x0 )
j
j
is a solution to (BQP).
4.3
Homogeneous Quadratic Minimization

z :=
Minimize
xT Qx
Subject to
xT A1 x 1,
(QP-2)
(4.2)
xT A2 x 1.
z =
X = xxT .
Minimize
QX
Subject to
A1 X 1,
(QP-2)
A2 X 1,
X 0, Rank(X) = 1.
z =
Minimize
QX
Subject to
A1 X 1,
(SDP)
A2 X 1,
X 0.
z SDP =
Maximize
y1 + y2
Subject to
y1 A1 + y2 A2 + S = Q,
(DSDP)
y1 , y2 0,
0.
Assume that there is no gap between the primal and dual. Then, X is a
minimal matrix solution to SDP if and only if there exist a feasible dual variable

(y1 , y2 ) 0 such that
S = Q y1 A1 y2 A2
76
y1 (1 A1 X ) = 0
y2 (1 A2 X ) = 0
S X = 0.
z SDP z .
Moreover, there is a decomposition of X
r
xj xT ,
j
X =
j=1
where r is the rank of X , such that for any j,

x = xj
j
for some is a solution to (QP-2).
4.4
Max-Cut Problem
Consider the Max Cut problem on an undirected graph G = (V, E) with nonnegative weights wij for each edge in E (and wij = 0 if (i, j) E), which is the
problem of partitioning the nodes of V into two sets S and V \ S so that
w(S) :=
wij
iS, jV \S
is maximized. A problem of this type arises from many network planning, circuit
design, and scheduling applications.
This problem can be formulated by assigning each node a binary variable
xj :
1
z = Maximize w(x) :=
wij (1 xi xj )
4 i,j
(MC)
Subject to
x2 = 1,
i
i = 1, ..., n.
The Coin-Toss Method: Let each node be selected to one side, or xi be 1,

independently with probability .5. Then, swap nodes from the majority side to
the minority side using the greedy method.
E[w(x)] 0.5 z .
4.4. MAX-CUT PROBLEM
4.4.1
77
Semi-denite relaxation
z SDP :=
The dual is
minimize
s.t.
QX
Ij X = 1, j = 1, ..., n,
X 0.
z SDP = maximize
s.t.
eT y
Q D(y).
(4.3)
(4.4)
Let V = (v1 , . . . , vn ) Rnn , i.e., vj is the jth column of V , such that

X = V TV .
Generate a random vector u N (0, I):
x = sign(V T u),
sign(xj ) =
4.4.2
(4.5)
1 if xj 0
1 otherwise.
Approximation analysis
Then, one can prove from Sheppard [274] (see Goemans and Williamson [125]
and Bertsimas and Ye [50]):
E[i xj ] =
x
arcsin(Xij ),
i, j = 1, 2, . . . , n.
V j /||Vj ||
U
= -1
=1
arccos(.)
V i /||Vi ||
=1
= -1
vT u
vT u
j
i
Figure 4.1: Illustration of the product ( vi ) ( vj ) on the 2-dimensional
unit circle. As the unit vector u is uniformly generated along the circle, the
product is either 1 or 1.
78
Lemma 4.5 For x [1, 1)

1 (2/) arcsin(x)
.878.
1x
Lemma 4.6 Let X
0 and d(X) 1. Then arcsin[X]
X.
Theorem 4.7 We have

i) If Q is a Laplacian matrix, then
E(T Q) .878z SDP .878z ,
x x
so that
z .878z SDP .
ii) If Q is positive semidenite
E(T Q)
x x
so that
z
4.5
2 SDP
2
z
z,
2 SDP
z
.
Multiple Ellipsoid-Constrained Quadratic Maximization

(HQP ) z :=
maximize
subject to
xT Q0 x
xT Qi x 1, i = 1, ..., m.
The previous best approximation ratio, when Qi
0 for all i, was
1
,
min(m2 , n2 )
that is, we can compute a feasible solution x, in polynomial time, such that
x T Q0 x
4.5.1
1
z.
min(m2 , n2 )
SDP relaxation
(SDP ) z SDP := maximize Q0 X
subject to Qi X 1, i = 1, ..., m
X 0.
4.5. MULTIPLE ELLIPSOID-CONSTRAINED QUADRATIC MAXIMIZATION79

and its dual
(SDD) z SDD :=
m
i=1
minimize
subject to
yi
m
Z = i=1 yi Qi Q0
Z 0, yi 0, i = 1, ..., m.
We assume that (SDD) is feasible. Then (SDP) and (SDD) have no duality
gap, that is,
z SDP = z SDD .
Moreover, z = z SDP if and only if (SDP) has a feasible rank-one matrix
solution and meet complementarity condition.
4.5.2
Bound on Rank
Lemma 4.8 Let X be an maximizer of (SDP). Then we can compute another

maximizer of (SDP) whose rank is no more than m by solving a linear program.
Suppose the rank of initial X is r. Then, there exist column vectors xj ,
j = 1, ..., r, such that
r
X =
xj xT .
j
j=1
Let
aij = Qi xj xT = xT Qi xj .
j
j
Then consider linear program
maximize
subject to
r
j=1
r
j=1
a0j vj
aij vj 1, i = 1, ..., m
vj 0, j = 1, ..., r.
Note that v1 = ... = vr = 1 is an optimal solution for the problem.

But we should have at least r inequalities active at an optimal basic solution.
Thus, we should have at most m of vj are positive at an optimal basic solution.
Let X be an SDP maximizer whose rank r min(m, n). Then, there exist
column vectors xj , j = 1, ..., r, such that
r
xj xT .
j
X =
j=1
Note that for each i 1 and j

xT Qi xj = Qi xj xT Qi X 1.
j
j
Thus, each xj is a feasible solution of (HQP). Then, we can select an xj such
that xT Q0 xj is the largest, and have
j
max xT Q0 xj
j
j=1,...,r
1
r
xT Q0 xj =
j
j=1
80

1
1
1
1
Q0 X = z SDP z
z.
r
r
r
min(m, n)
Thus, we have
Theorem 4.9 For solving HQP-m, where Qi

approximation ratio
1
.
min(m, n)
4.5.3
0 for i = 1, ..., m, we have
Improvements
Lemma 4.10 Let X be an maximizer of (SDP). Then we can compute another

maximizer of (SDP) whose rank is no more than m 1 by solving a linear
program.
Thus, we have
Theorem 4.11 For solving HQP-m, where Qi
approximation ratio
1
.
min(m 1, n)
0 for i = 1, ..., m, we have
For large m, consider SDP in the standard form:

z := Minimize
Subject to
C X
Ai X = bi , i = 1, ..., m
X
0.
Theorem 4.12 (Pataki [252], Barvinok [34]) Let X be a minimizer of the

SDP in the standard form. Then we can compute a minimizer of the problem
whose rank r satises r(r + 1)/2 m in polinomial time.
Proof. If the rank of X , r, satises the inequality, then we need do nothing.
Thus, we assume r(r + 1)/2 > m, and let
V V T = X ,
V Rnr .
Then consider
Minimize
V T CV U
Subject to
V T Ai V U = bi , i = 1, ..., m
U
0.
Note that V T CV , V T Ai V s and U are r r symmetric matrices and

V T CV I = C V V T = C X = z .
(4.6)
4.5. MULTIPLE ELLIPSOID-CONSTRAINED QUADRATIC MAXIMIZATION81

Moreover, for any feasible solution of (4.6) one can construct a feasible solution
for the original SDP using
X(U ) = V U V T
and C X(U ) = V T CV U.
(4.7)
Thus, the minimal value of (4.6) is also z , and U = I is a minimizer of (4.6).

Now we show that any feasible solution U to (4.6) is a minimizer for (4.6);
thereby X(U ) of (4.7) is a minimizer for the original SDP. Consider the dual of
(4.6)
m
z := Maximize bT y = i=1 bi yi
m
i=1
V T CV
Subject to
yi V T Ai V T .
(4.8)
Let y be a dual maximizer. Since U = I is an interior optimizer for the primal,

the strong duality condition holds, i.e.,
m
T
yi V T Ai V T ) = 0
I (V CV
i=1
so that we have
V T CV
yi V T Ai V T = 0.
i=1
Then, any feasible solution of (4.6) satises the strong duality condition so that
it must be also optimal.
Consider the system of homogeneous linear equations
V T Ai V W = 0, i = 1, ..., m
where W is a r r symmetric matrices (does not need to be denite). This
system has r(r + 1)/2 real number variables and m equations. Thus, as long as
r(r +1)/2 > m, we must be able to nd a symmetric matrix W = 0 to satisfy all
m equations. Without loss of generality, let W be either indenite or negative
semidenite (if it is positive semidenite, we take W as W ), that is, W has at
least one negative eigenvalue, and consider
U () = I + W.
Choosing = 1/|| where is the least eigenvalue of W , we have

U ( )
and it has at least one 0 eigenvalue or rank(U ( )) < r, and

V T Ai V U ( ) = V T Ai V (I + W ) = V T Ai V I = bi , i = 1, ..., m.
That is, U ( ) is a feasible and so it is an optimal solution for (4.6). Then,
X(U ( )) = V U ( )V T
82
is a new minimizer for the original SDP, and rank(X(U ( ))) < r.
This process can be repeated till the system of homogemeous linear equations
has only all zero solution, which is necessarily given by r(r + 1)/2 m. Thus,
we must be able to nd an SDP solution whose rank satises the inequality. The
total number of such reduction steps is bounded by n 1 and each step uses
no more than O(m2 n) arithmetic operations. Therefore, the total number of
arithmetic operations is a polynomial in m and n, i.e., in (strongly) polynomial
time given the least eigenvalue of W .
4.5.4
Randomized Rounding
Let X = V T V and V is r n, and consider

z SDP :=
maximize
subject to
V Q0 V T U

V Qi V U 1, i = 1, ..., m
U 0.
Note that U = I is an optimal solution for the problem.
Without loss of generality, assume that D := V Q0 V T is a diagonal matrix

Qi V , i = 1, ..., m. Rewrite the problem
and Ai = V
z SDP := maximize D U
subject to Ai U 1, i = 1, ..., m
U 0,
and U = I is an optimal solution for the problem.
Generates a random r-dimension vector u where, independently,
uj =
Note that
and
1 with probability.5
1 otherwise.
E[uuT ] = I,
D uuT = D I = z SDP .
Let
u=
Then, for i = 1, ..., m,
1
Ai uuT
maxi
u.
Ai uuT 1.
That is, uuT is a feasible solution, and
D uuT =
z SDP
.
maxi (Ai uuT )
Using the linear transformation

x = V T u,

4.6. MAX-BISECTION PROBLEM
83
xxT is a feasible solution for (SDP), and
Q0 xxT =
z SDP
,
maxi (Ai uuT )
where for i = 1, ..., m

Ai I 1.
Lemma 4.13 For any given vector a, the probability
Pr(aaT uuT > a 2 ) 2 exp(/2).
Lemma 4.14 The probability
Pr(Ai uuT > ) Pr(Ai uuT > (Ai I)) 2r exp(/2).
Lemma 4.15 The probability
Pr(max(Ai uuT ) > ) 2mr exp(/2).
i
Choose
= 2 ln(4m2 ),
then
Pr(max(Ai uuT ) > 2 ln(4m2 ))
i
1
r
,
2m
2
or
Pr
1
1
maxi (Ai uuT )

2 ln(4m2 )
1
.
2
Theorem 4.16 With probability at least .5

E(T Q0 x) =
x
4.6
1
1
z SDP
z.
2 ln(4m2 )
2 ln(4m2 )
Max-Bisection Problem
Consider the Max-Bisection problem on an undirected graph G = (V, E) with

non-negative weights wij for each edge in E (and wij = 0 if (i, j) E), which
is the problem of partitioning the even number of nodes in V into two sets S
and V \ S of equal cardinality so that
w(S) :=
wij
iS, jV \S
84
is maximized. This problem can be formulated by assigning each node a binary

variable xj :
w :=
Maximize
1
4
(MB)
wij (1 xi xj )
i,j
subject to
xj = 0
or
eT x = 0
j=1
x2 = 1, j = 1, . . . , n,
j
where e n (n even) is the column vector of all ones, superscript T is the
transpose operator. Note that xj takes either 1 or 1, so that we can choose
either S = {j : xj = 1} or S = {j : xj = 1}. The constraint eT x = 0 ensures
that |S| = |V \ S|.
A problem of this type arises from many network planning, circuit design,
and scheduling applications. In particular, the popular and widely used MinBisection problem sometimes can be solved by nding the Max-Bisection over
the complementary graph of G.
The Coin-Toss Method: Let each node be selected to one side, or xi be 1,
independently with probability .5. Then, swap nodes from the majority side to
the minority side using the greedy method.
E[w(x)] 0.5 z .
4.6.1
SDP relaxation
Maximize
Subject to
1
4
wij (1 Xij )
i,j
Xii = 1,
X
i,j
i = 1, ..., n,
Xij = 0,
0.
What complicates matters in Max-Bisection, comparing to Max-Cut, is that

two objectives are actually soughtthe objective value of w(S) and the size of
S. Therefore, in any (randomized) rounding method at the beginning, we need
to balance the (expected) quality of w(S) and the (expected) size of S. We want
high w(S); but, at the same time, zero or small dierence between |S| and n/2,
since otherwise we have to either add or subtract nodes from S, resulting in a
deterioration of w(S) at the end. Our method is built upon a careful balance
of the two, plus an improved proof technique.
4.6.2
85
The .651-method of Frieze and Jerrum
We rst review the Frieze and Jerrum method, and then proceed with our
improved method.
Let X be an optimal solution of Problem (SDP). Since X is positive semidefinite, the randomization method of Goemans and Williamson [125] essentially
generates a random vector u from a multivariate normal distribution with 0
mean and covariance matrix X, that is, using (4.5) to generate S = {i : xi = 1}
or S = {i : xi = 1}.
Then, one can prove Sheppard [274] (see Goemans and Williamson [125],
Frieze and Jerrum [110], and Bertsimas and Ye [50]):
E[i xj ] =
x
and
1
arcsin(Xij ),
i, j = 1, 2, . . . , n,
(4.9)
arcsin(Xij ) (1)(1 Xij ),
where
(1) := min
arcsin(y)
.878567
1y
1y<1
which is a special case of Denition (4.21) in Section 5.

Thus,
E[w(S)] =
1
4
1
4
wij 1
i,j
arcsin(Xij )
wij (1)(1 Xij )

i,j
= (1) wSD
(1) w ;
(4.10)
and
n2
(eT x)2
4
4
1
4
1
4
1
i,j
(1)(1 Xij )
i,j
= (1)
since
arcsin(Xij )
n2
,
4
(4.11)
Xij = eeT X = 0.
i,j
However, x may not satisfy eT x = 0, i.e., S may not be a bisection. Then,
using a greedy method, Frieze and Jerrum have adjusted S by swapping nodes
86
from the majority block into the minority block until they are equally sized.
Note that inequality (4.11) assures that not too many nodes need to be swapped.
Also, in selecting swapping nodes, Frieze and Jerrum make sure that the least
weighted node gets swapped rst. More precisely, (w.l.o.g) let |S| n/2; and for
each i S, let (i) = jS wij and S = {i1 , i2 , ..., i|S| }, where (i1 ) (i2 )
... (i|S| ). Then, assign S = {i1 , i2 , ..., in/2 }. Clearly, the construction of
guarantees that
bisection S
w(S)
n w(S)
,
2|S|
n/2 |S| n.
(4.12)
In order to analyze the quality of bisection S, they dene two random variables:
1
w := w(S) =
wij (1 xi xj )

4 i,j
and
m := |S|(n |S|) =
n2
(eT x)2
=
4
4
4
(1 xi xj )

i,j
(since (eT x)2 = (2|S| n)2 ). Then, from (4.10) and (4.11),
E[w] (1) w
E[m] (1) m ,
and
where
m =
n2
.
4
Thus, if a new random variable

z=
w
m
+ ,
w
m
(4.13)
then
E[z] 2(1),
and
z3
since w/w 2|S|/n 2 and m/m 1.

For simplicity, Frieze and Jerrums analysis can be described as follows. They
repeatedly generate samples u N (0, X), create x or S using the randomization
using the swapping procedure, and record the largest

method (4.5), construct S
value of w(S) among these samples. Since E[z] 2(1) and z 3, they are
almost sure to have one z to meet its expectation before too long, i.e., to have
z 2(1).
Moreover, when (4.14) holds and suppose
w(S) = w ,
which from (4.13) and (4.14) implies that
m
2(1) .
m
(4.14)
87
Suppose that |S| = n; then the above inequality implies

2(1) 4(1 ).
(4.15)
Applying (4.12) and (4.15), one can see that
w(S)
=
w(S)
2
w
2
(2(1) 4(1 ))w
2
2( 2(1) 1)w .
The last inequality follows from simple calculus that = 2(1)/2 yields the
minimal value for (2(1) 4(1 ))/(2) when 0 1. Note that for
(1) .878567, 2( 2(1) 1) > .651. Since the largest w(S) in the process
is at least as good as the one who meets z 2(1), they proceed to prove a
.651-approximation algorithm for Max-Bisection.
4.6.3
A modied rounding and improved analyses
Our improved rounding method is to use a convex combination of X and a

positive semidenite matrix P as the covariance matrix to generate u and x, i.e,
u N (0, X + (1 )P ),
x = sign(u),
S = {i : xi = 1}
or S = {i : xi = 1}
such that |S| n/2, and then S from the Frieze and Jerrum swapping procedure.
One choice of P is
n
1
P =
(I eeT ),
n1
n
1
where I is the n-dimensional identity matrix. Note that I n eeT is the projection matrix onto the null space of eT x = 0, and P is a feasible solution for
(SDP). The other choice is P = I, which was also proposed in Nesterov [240],
and by Zwick [345] for approximating the Max-Cut problem when the graph is
sparse.
Again, the overall performance of the rounding method is determined by two
factors: the expected quality of w(S) and how much S need to be downsized.
The convex combination parameter used in X + (1 )P provides a balance
between these two factors. Typically, the more use of X in the combination
results in higher expected w(S) but larger expected dierence between |S| and
n/2; and the more use of P results in less expected w(S) and more accurate |S|.
88
More precisely, our hope is that we could provide two new inequalities in
replacing (4.10) and (4.11):
1
4
E[w(S)] =
wij (1 E[i xj ]) w
x
(4.16)
i,j
and
E[|S|(n |S|)] = E
(eT x)2
n2
1
=
4
4
4
(1 E[i xj ])
x
i,j
n2
,
4
(4.17)
such that would be slightly less than (1) but would be signicantly greater
than (1). Thus, we could give a better overall bound than .651 for MaxBisection.
Before proceed, we prove a technical result. Again, let two random variables
w := w(S) =
1
4
wij (1 xi xj ) and

m := |S|(n |S|) =
i,j
1
4
(1 xi xj ).

i,j
Then, from (4.16) and (4.17),

E[w] w
and E[m] m ,
where m =
n2
.
4
Furthermore, for a given parameter 0, let new random variable

z() =
w
m
+ .
w
m
(4.18)
Then, we have
E[z()] +
and
z 2 + .
Note that Frieze and Jerrum used = 1 in their analyses.

Now we prove the following lemma:
Lemma 4.17 Assume (4.16) and (4.17) hold. Then, for any given /(4
), if random variable z() meets its expectation, i.e., z() + , then
w(S) 2
( + ) w .
In particular, if
=
(
1)
2
1
(which is greater than /(4 ) since > 0), then
w(S)
w .
1+ 1

Proof. Suppose
w(S) = w
89
and
|S| = n,
which from (4.18) and z() + implies that

+ 4(1 ).
Applying (4.12) we see that
w(S)
w(S)
2
w
=
2
+ 4(1 )
w
2
2( ( + ) ) w .
The last inequality follows from simple calculus that
+
=
2
yields the minimal value for ( + 4(1 ))/(2) in the interval [0, 1], if
/(4 ).
In particular, substitute
1
1)
(
2
1
into the rst inequality, we have the second desired result in the lemma.
The motivation to select =
1
2 ( 1
1) is that it yields the maximal
value for 2( ( + ) ). In fact, when both = = (1) .878567 as in

the case of Frieze and Jerrum,
1+
> 0.6515,
1
which is just slightly better than 2( 2(1) 1) 0.6511 proved by Frieze and
Jerrum. So their choice = 1 is almost optimal. We emphasize that is
only used in the analysis of the quality bound, and is not used in the rounding
method.
4.6.4
A simple .5-approximation
To see the impact of in the new rounding method, we analyze the other
extreme case where = 0 and
P =
1
n
(I eeT ).
n1
n
90
That is, we generate u N (0, P ), then x and S. Now, we have
1
1
2
1
E[w(S)] = E
wij (1 xi xj ) =

1 + arcsin(
)
4 i,j
4
n1
wij .5w
i=j
(4.19)
and
n2
(eT x)2
n2 n n(n 1) 2
1
1 n2
+
arcsin(
) (1 ) , (4.20)
4
4
4
4
4
n1
n
4
where from (4.9) we have used the facts that

E[i xj ] =
x
and
1
2
2
1
arcsin(
),
n1
i=j
wij w .
wij =
i<j
i=j
In other words, we have in Lemma 4.17

= .5
and
=1
1
.
n
Comparing (4.19) and (4.20) to (4.10) and (4.11), here the rst inequality on
w(S) is worse, .5 vs .878567; but the second inequality is substantially better,
1
1 n vs .878567; i.e., x here is a bisection with probability almost 1 when n is
large. Using Lemma 4.17, we see that the method is a
.5
>
1+ 1
1 + 1/n
approximation method. The same ratio can be established for P = I.
4.6.5
A .699-approximation
We now prove our main result. For simplicity, we will use P = I in our rounding
method. Therefore, we discuss using the convex combination of X + (1 )I

for a given 0 1, i.e.,
as the covariance matrix to generate u, x, S and S
u N (0, X + (1 )P ),
x = sign(u),
S = {i : xi = 1}
or S = {i : xi = 1}
such that |S| n/2, and then S from the Frieze and Jerrum swapping procedure.
Dene
2
1 arcsin(y)
;
(4.21)
() := min
1y<1
1y
91
and
() := (1
1
)b() + c(),
n
(4.22)
where
b() = 1
2
arcsin()
and c() = min
1y<1
2 arcsin() arcsin(y)
.
1y
Note that (1) = (1) .878567 as shown in Goemans and Williamson [125];
1
and (0) = .5 and (0) = 1 n . Similarly, one can also verify that
(.89) .835578,
b(.89) .301408
and
c(.89) .660695.
We now prove another technical lemma:

Lemma 4.18 For any given 0 1 in our rounding method, inequalities
(4.16) and (4.17) hold for
= ()
Proof. Since 1
have
and
= ().
arcsin() 0 for any 0 1, from denition (4.21) we
E[w(S)] =
1
4
1
4
wij 1
i,j
arcsin(Xij )
wij ()(1 Xij )

i,j
= () wSD
() w .
Noting that
Xij = n,
i=j
from the denition (4.22) we have

E[|S|(n |S|)]
=
=
n2
(eT x)2
4
4
1
2
1 arcsin(Xij )
4
i=j
1
4
1
4
1
4
1
i=j
2
2
2
arcsin() + arcsin() arcsin(Xij )
b() +
i=j
2
2
arcsin() arcsin(Xij )
b() + c()(1 Xij )

i=j
92

=
=
=
1
(n2 n)b() + (n2 n)c() + nc()
4
1
n2
(1 )b() + c()
n
4
2
n
()
.
4
Lemmas 4.17 and 4.18 together imply that for any given between 0 and 1,
our rounding method will generate a
()
w(S)
1+
1 ()
as soon as (bounded) z() of (4.18) meets its expectation. Thus, we can set
()
to a value in [0, 1] such that
is maximized, that is, let
1+
1()
= arg max
[0,1]
and
rM B =
()
1+
1 ()
( )
1+
1 ( )
In particular, if = .89 is selected in the new rounding method,

(.89) > .8355,
and for n suciently large ( 104 )
(.89) = (1
1
)b(.89) + c(.89) > .9620,
n
which imply
rM B =
( )
1+
1 ( )
(.89)
1+
1 (.89)
> .69920.
This bound yields the nal result:

Theorem 4.19 There is a polynomial-time approximation algorithm for MaxBisection whose expected cut is at least rM B times the maximal bisection cut, if
the number of nodes in the graph is suciently large. In particular, if parameter
= .89 is used, our rounding method is a .699-approximation for Max-Bisection.
The reader may ask why we have used two dierent formulations in dening
() of (4.21) and () of (4.22). The reason is that we have no control on the
ratio, , of the maximal bisection cut w over the total weight i<j wij , i.e.,
:=
w
i<j
wij
4.7. QUADRATIC OPTIMIZATION
93
Note that ranges from 1/2 to 1. Indeed, using the second derivation in Lemma
4.18, we can also prove that in Lemma 4.17
1
b() + c().
2
Thus, in the worse case = 1, we can only establish
1
b() + c().
2
Then, for = .89, we have .8113, which is less than .8355 established by
using the rst derivation.
However, if .8, then we have .8490. For Max-Bisection on these
graphs, our method is a .710 approximation for setting = .89. This bound
can be further improved by setting a smaller . In general, the quality bound
improves as decreases. When near 1/2, we have a close to 1 approximation
1
1
if = 0 is chosen, since b(0) = 1, c(0) = 0, 2 and 1 n .
In any case, we can run our rounding method 100 times for parameter =
.00, .01, ..., .98, .99 and report the best rounding solution among the 100
tries. This will ensure us to produce a solution with a near best guarantee, but
without the need to know .
4.7
Quadratic Optimization
In this section we consider approximating the global minimizer of the following

QP problem with certain quadratic constraints:
q(Q) :=
minimize
s.t.
q(x) := xT Qx
n
2
j=1 aij xj = bi , i = 1, . . . , m,
e x e,
where symmetric matrix Q Mn , A = {aij } Rmn and b Rm are given

and e Rn is, again, the vector of all ones. We assume that the problem is
feasible and denote by x(Q) its global minimizer.
Normally, there is a linear term in the objective function:
q(x) = xT Qx + cT x.
However, the problem can be homogenized as
minimize
s.t.
xT Qx + t cT x
n
2
j=1 aij xj = bi , i = 1, . . . , m,
e x e, 1 t 1
by adding a scalar variable t. There always is an optimal solution (x, t) for this
problem in which t = 1 or t = 1. If t = 1, then x is also optimal for the
94
original problem; if t = 1, then x is optimal for the original problem. Thus,

without loss of generality, we can assume q(x) = xT Qx throughout this section.
The function q(x) has a global maximizer over the bounded feasible set as
well. Let q := q(Q) and q := q(Q) denote their maximal and minimal
objective values, respectively. We now present a fast algorithm to compute a

4/7-approximate minimizer, that is, to compute a feasible x, such that
q() q
x
4/7.
qq
4.7.1
Positive semi-denite relaxation
The algorithm for approximating the QP minimizer is to solve a positive semidenite programming relaxation problem:
p(Q) :=
minimize
s.t.
QX
Ai X = bi , i = 1, . . . , m,
d(X) e, X 0.
(4.23)
Here, Ai = diag(ai ), ai = (ai1 , . . . , ain ), and unknown X Mn is a symmetric

matrix. Furthermore, d(X) is a vector containing the diagonal components of
X. Note that d(X) e can be written as Ij X 1, j = 1, . . . , n, where Ij is
the all-zero matrix except the jth diagonal component equal to 1.
The dual of the relaxation is
p(Q) =
maximize
s.t.
eT z + bT y
Q D(z) +
m
i=1
yi Ai , z 0,
(4.24)
where D(z) is the diagonal matrix such that d(D(z)) = z Rn . Note that the
relaxation is feasible and its dual has an interior feasible point so that there is
no duality gap between the primal and dual. Denote by X(Q) and (y(Q), z(Q))
an optimal solution pair for the primal (4.23) and dual (4.24). For simplicity,
in what follows we let x = x(Q) and X = X(Q).
We have the following relations between the QP problem and its relaxation:
Proposition 4.20 Let q := q(Q), q := q(Q), p := p(Q), p := p(Q),
(, z ) = (y(Q), z(Q)). Then,

y
i) q is the maximal objective value of xT Qx in the feasible set of the QP problem;
ii) p = eT z + bT y and it is the maximal objective value of Q X in the feasible
m
set of the relaxation, and D() + i=1 yi Ai Q 0;
z
iii)
p q q p.

95
Proof. The rst and second statements are straightforward to verify. Let
X = xxT Mn . Then X 0, d(X) e,
n
Ai X = xT Ai x =
aij (xj )2 = bi , i = 1, . . . , m,
j=1
and Q X = xT Qx = q(x) = q(Q). Thus, we have q(Q) p(Q), or p q.

Similarly, we can prove q(Q) p(Q), or p q .

Since X is positive semi-denite, there is a matrix factorization V =
(v 1 , . . . , v n ) Rnn , i.e., v j is the jth column of V , such that X = V T V .
Then, after obtaining X and V , we generate a random vector u uniformly distributed on the unit sphere in Rn and assign
x = D(V T u),
where
(4.25)
D = diag( v 1 , . . . , v n ) = diag( x11 , . . . , xnn ),
and, for any x Rn , (x) Rn is the vector whose jth component is sign(xj ):
sign(xj ) =
1 if xj 0
1 otherwise.
It is easily see that x is a feasible point for the QP problem and we will show
later that its expected objective value, Eu q(), satises

x
Eu q() q
x
4
1 .
qq
2
7
That is, x is a 4/7-approximate minimizer for the QP problem expectantly. One
can generate u repeatedly and choose the best x in the process. Thus, we will
almost surely generate a x that is a 4/7-approximate minimizer.
4.7.2
Approximation analysis
First, we present a lemma which will be only used in our analysis.

Lemma 4.21 Let u be uniformly distributed on the unit sphere in Rn . Then,
q(Q) =
minimize Eu ((V T u)T DQD(V T u))

s.t.
Ai (V T V ) = bi , i = 1, . . . , m,
vj 1, j = 1, . . . , n,
where
D = diag( v1 , . . . , vn ).
96
Proof. Since, for any feasible V , D(V T u) is a feasible point for the QP
problem, we have
q(Q) Eu ((V T u)T DQD(V T u)).
On the other hand, for any xed u with u = 1, we have
n
Eu ((V T u)T DQD(V T u)) =
qij vi
T
T
vj Eu ((vi u)(vj u)).
(4.26)
i=1 j=1
Let us choose vi =
problem. Then
xi
x
x, i = 1, . . . , n. Note that V is feasible for the above
T
T
Eu ((vi u)(vj u)) =
Thus,
1 if (xi ) = (xj )
1
otherwise.
T
T
vj Eu ((vi u)(vj u)) = xi xj
vi
which implies that for this particular feasible V

q(Q) = q(x) = Eu ((V T u)T DQD(V T u)).
These two relations give the desired result.
For any function of one variable f (t) and X Rnn , let f [X] Rnn be the
matrix with the components f (xij ). We have the next technical lemma whose
proof is an exercise.
Lemma 4.22 Let X
X.
Now we are ready to prove the following theorem, where we use inmum
to replace minimum, since for simplicity we require X to be positive denite
in our subsequent analysis.
Theorem 4.23
q(Q) =
inmum
s.t.
2
1
XD1 ]D)
Q (D arcsin[D
Ai X = bi , i = 1, . . . , m,
d(X) e, X
where
0,
D = diag( x11 , . . . , xnn ).
Proof. For any X = V T V
0, d(X) e, we have
T
T
Eu ((vi u)(vj u))
T
T
= 1 2Pr{(vi u) = (vj u)}
= 1 2Pr (
T
T
vj u
vi u
) = (
) .
vi
vj

We have the following relation:
Pr (
97
T
T
vj u
vi u
) = (
)
vi
vj
T
vi vj
vi vj
1
arccos
as illustrated in Figure 4.1. Then, noting arcsin(t) + arccos(t) =

T
T
Eu ((vi u)(vj u)) =
2
arcsin
T
vi vj
vi vj
we have
and using equality (4.26) we further have

n
Eu ((V T u)T DQD(V T u))
qij vi
vj
i=1 j=1
2
arcsin
T
vi vj
vi vj
2
Q (D arcsin[D1 XD1 ]D).
Finally, Lemma 4.21 gives us the desired result.

Theorem 4.23 and Lemma 4.22 lead to our main result:
Theorem 4.24 We have
i)
pq
2
( p).
p
qp
2
( p).
p
ii)
iii)
ppqq
4
( p).
p
Proof. We prove (i). Recall z = z(Q) 0, y = y(Q), p = p(Q) =
m
T
T
e z + b y , and D() + i=1 yi Ai Q 0. Thus, for any X 0, d(X) e and
D = diag( x11 , . . . , xnn ), from Theorem 4.23
q = q(Q)
2
2
Q (D arcsin[D1 XD1 ]D)
m
Q D()
z
i=1
yi Ai + D() +
yi Ai
i=1
Lemma 1.2 of Goemans and Williamson [125].
D arcsin[D1 XD1 ]D
98

m
Q D()
z
D arcsin[D1 XD1 ]D
yi Ai
i=1
+ D() +
z
D arcsin[D1 XD1 ]D
yi Ai
i=1
m
Q D()
z
DD1 XD1 D
yi Ai
i=1
+ D() +
z
D arcsin[D1 XD1 ]D
yi Ai
i=1
m
(since Q D()
z
yi Ai
0 and arcsin[D1 XD1 ]
D1 XD1 .)
i=1
m
Q D()
z
yi Ai
i=1
m
+ D() +
z
D arcsin[D1 XD1 ]D
yi Ai
i=1
m
QX
D() +
z
yi Ai
i=1
m
+ D() +
z
D arcsin[D1 XD1 ]D
yi Ai
i=1
m
Q X z T d(X)
yi aT d(X)
i
i=1
m
+ d(D arcsin[D
z
XD
yi aT d(D arcsin[D1 XD1 ]D)

i
]D) +
i=1
Q X z T d(X) y T b + z T ( d(X)) + y T ( b)
2
2
1
1
(since d(D arcsin[D XD ]D) = d(X) and aT d(X) = bi )
i
2
T
T
Q X + ( 1) d(X) + ( 1) b
z
y
2
2
Q X + ( 1)(T e + y T b)
z
2
(since 0 d(X) e and z 0)
p
Q X + ( 1).
2
Let X
0 converge to X, then Q X p and we prove (i).

Replacing Q with Q proves (ii) in the theorem.
Adding the rst two inequalities gives (iii) of the theorem.
4.8. NOTES
99
The result indicates that the positive semi-denite relaxation value p p
is a constant approximation of q q. Similarly, the following corollary can be
devised:
Corollary 4.25 Let X = V T V
0, d(X) e, Ai X = bi (i = 1, . . . , m), D =
diag( x11 , . . . , xnn ), and x = D(V T u) where u with u = 1 is a random
vector uniformly distributed on the unit sphere. Moreover, let X

0 X.
Then,
lim Eu (q()) = lim
x
XX
XX
2
2
2
Q (D arcsin[D1 XD1 ]D) p + (1 ).
p
Finally, we have the following theorem:

Theorem 4.26 Let x be randomly generated from X. Then
Eu q() q
x
1 < 4/7.
qq
2
Proof. Since
pq

2
2
2
2
p + (1 )p (1 ) + p q p,
we have, from Corollary 4.25,

Eu q() q
x
qq
=
=
4.8
2
p
2
p
+ (1 ) q
qq
2
p +
2
p +
2
p +
2
p +
2
(1 ) q
p
2
(1 )p q
2
(1 ) p
p
2
(1 )p p
2
(1 )( p)
p
2
p
( p)
2
=
2
2
1.
Notes
Semidenite relaxations have recently appeared in relation to relaxations for

0-1 optimization problems. In [207], a lifting procedure is presented to obtain
2
a problem in n ; and then the problem is projected back to obtain tighter
100
inequalities. See also [29]. Several of the operators that arise in our applications are similar to those that appear in [207]. However, our motivation and
approach is dierent. A discussion of several applications for semidenite relaxation appears in [9]. Also see recent papers by Fujie and Kojima [113] and
Polijak, Rendl and Wolkowicz [254].
During the past ten years, there have been several remarkable results on approximating specic quadratic problems using positive semi-denite programming. Goemans and Williamson [125] proved an approximation result for the
Maxcut problem where 1 0.878. Nesterov [240] extended their result to
approximating a boolean QP problem where 4/7. Most of the results are
related to graph partition.
Given an undirected graph G = (V, E) with |V | = n, non-negative weights
wij on edges (i, j) E, and an integer k (1 < k < n), the maximization graph
partition (MAX-GP) problem is to determine a subset S V of k nodes such
that an objective function w(S) is maximized. Some examples of MAX-GP are:
Dense-k-Subgraph (DSP), where the total edge weights of the subgraph induced
by S is maximized; Max-Cut with size k (MC), where the total edge weights of
the edges crossing between S and V \ S is maximized; Max-Not-Cut with size k
(MNC), where the total edge weights of the non-crossing edges between S and
V \ S is maximized; Max-Vertex-Cover with size k (MVC), where the total edge
weights of the edges covered by S is maximized.
Since these MAX-GP problems are NP-hard (e.g., see [100] for DSP, [2] for
MC, [151] for MNC, and [253] for MVC), one should not expect to nd polynomial time algorithms for computing their optimal solutions. Therefore, we
are interested in how close to optimality one can approach in polynomial time.
A (randomized) polynomial time approximation algorithm for a maximization
problem has a performance guarantee or worst case ratio 0 < r 1, if it outputs
a feasible solution whose (expected) value is at least r times the maximal value
for all instance of the problem. Such an algorithm is often called (randomized)
r-approximation algorithm. A key step in designing a good approximation algorithm for such a maximization problem is to establish a good upper bound
on the maximal objective value. Linear programming (LP) and semidenite
programming (SDP) have been frequently used to provide such upper bounds
for many NP-hard problems.
There are several approximation algorithms for DSP. Kortsarz and Peleg
[193] devised an approximation algorithm which has a performance guarantee
O(n0.3885 ). Feige, Kortsarz and Peleg [101] improved it to O(n1/3+ ), for
some > 0. The other approximation algorithms have performance guarantees
which are the function of k/n, e.g., a greedy heuristic by Asahiro et.al. [23],
and SDP relaxation based algorithms developed by Feige and Langberg [99]
and Feige and Seltser [100], and Srivastav and Wolf [282]. The previously best
performance guarantee, k/n for general k and k/n+ k for k n/2, was obtained
by Feige and Langberg [99]. Moreover, for the case k = n/2, Ye and Zhang [151],
using a new SDP relaxation, obtained an improved 0.586 performance guarantee
from 0.517 of Feige and Langberg [99].
For approximating the MC problem with size k, both LP and SDP based
4.9. EXERCISES
101
approximation algorithms have a performance ratio 1/2 for all k, see Ageev
and Sviridenko [2] and Feige and Langberg [99]. For k = n/2, i.e., the MaxBisection, Frieze and Jerrum [110] obtained a 0.651-approximation algorithm
(the same bound was also obtained by Andersson [16] in his paper for max-psection). Subsequently, this ratio has been improved to 0.699 by Ye [334]. Both
of their approximation algorithms are based on SDP relaxations.
For approximating the MNC problem with size k, the LP-based approximation algorithm has a performance ratio 1 2k(nk) for all k, and the SDP-based
n(n1)
algorithm has a ratio .5 + k for k n/2, see Feige and Langberg [99]. Again,
Ye and Zhang [151] obtained a 0.602-approximation algorithm for MNC when
k = n/2, comparing to .541 of Feige and Langberg [99].
Han et al. [151] has presented an improved method to round an optimal
solution of the SDP relaxation of the MAX-GP problem for general k. This
rounding technique is related to the well-known rounding method introduced by
Goemans and Williamson [125], Feige and Goemans [98] for MAX-DICUT and
MAX 2-SAT, Zwick [344] for constraint satisfaction problems, and Nesterov[240]
and Zwick [345] for MAX-CUT. This kind of randomized algorithm can be derandomized by the technique of Mahajan and Ramesh[214].
What complicates matters in the MAX-GP problem, comparing to the MAXCUT problem, is that two objectives are soughtthe objective value of w(S)
and the size of S. Therefore, in any (randomized) rounding method, we need to
balance the (expected) quality of w(S) and the (expected) size of S. One wants
high w(S); but, at the same time, zero or small dierence between |S| and k,
since otherwise we have to either add or subtract nodes from S, resulting in a
deterioration of w(S) at the end. Our improved rounding method is built upon
this balance need.
As consequences of the improved rounding method, they have yielded improved approximation performance ratios for DSP, MC, MNC and MVC, on
a wide range of k. On approximating DSP, for example, our algorithm has
guaranteed performance ratios .648 for k = 3n/5, 0.586 for k = n/2, 0.486 for
k = 2n/5 and 0.278 for k = n/4. For MC and MNC, the performance guarantees
are also much better than 0.5 for a wide range of k. On approximating MVC,
our algorithm has guaranteed performance ratios .845 for k = 3n/5, 0.811 for
k = n/2, and 0.733 for k = 2n/5.
4.9
Exercises
4.1 Let X be a positive semidenite matrix of rank r, A be a given symmetric

matrix. Then, there is a decomposition of X
r
xj xT ,
j
X=
j=1
such that for all j,
xT Axj = A (xj xT ) = A X/r.

j
j
102
4.2 Prove Theorem 4.4.

4.3 The s t Max-Cut Problem: This is a max-cut problem where we require
that a pair of nodes s and t be separated by the cut.
Construct the SDP relaxation for the problem.
Show that the problem can be approximated by the factor 0.878.
The result holds for more pairs need to be separated.
4.4 Triangle Inequality: In the Max-Cut or other problems, decision variable
xi is either 1 or 1. Thus, for any three variables we have
|xi + xj + xk | 1
and
|xi xj | + |xj xk | |xk xi |
which call triangle-type inequalities. Show how to incoporate these inequalities
in the SDP relaxation of the problem.
4.5 Let X
X.
4.6 Prove Lemma 4.10.

4.7 In solving the standard SDP problem with m equality constraints, we have
shown that there is an optimal SDP solution whose rank r satisfying
r(r + 1)
m,
2
and there is a strongly polynomial-time algorithm to nd such a solution from
any exactly optimal SDP solution. In reality, one may never generate an exactly optimal SDP solution but an -optimal solution X, meaning its minimal
objective value is away from the optimal z :
C X z + .
Prove that, from X, one can nd an -optimal SDP solution whose rank r
satises
r(r + 1)
m+1
2
in strongly polynomial time. Show how the rank-reduction procedure works.
4.8 Consider the following eigenvalue optimization problem
minimize
nmax (Q + Diag())
y
subject to
eT y = 0
(4.27)
y Rm ,
where max (A) is the maximum eigenvalue of the matrix A. Show that this
problem is equivalent to the dual of the SDP relaxation for Max-Cut.
4.9. EXERCISES
103
4.9 Quadratic 0-1 Programming: The following problem is referred as the

Quadratic 0-1 programming
maximize
xT Bx
subject to
x {0, 1}n .
(4.28)
Show that the following matrix representation is equivalent to the above

quadratic 0-1 programming problem (4.28):
maximize
BX
subject to X =
diag(X)T
diag(X)
0,
(4.29)
rank(X) = 1,
where diag(A) is the vector of diagnoal elements in A.
Note 1. It suces to show that both problems have the same feasible region.
Note 2. Problem (4.29) without the rank 1 constraint on X is an SDP
relaxation of (4.28).
4.10 The 2-Catalog Segmentation Problem ([325]): Given a ground set I of n
items, a family {S1 , S2 , , Sm } of subsets of I and an integer 1 k n. The
problem is to nd two subsets A1 , A2 I such that |A1 | = |A2 | = k to maximize
m
i=1 max{|Si A1 |, |Si A2 |}. Here I can be the list of goods; there are m
customers where customer i is interested in the goods of Si ; A1 and A2 are the
two catalogs one of which could be sent to each of the customers such that the
(total) satisfaction is maximized. Find an SDP relaxation to the problem.
4.11 The Sparsest Cut Problem ([22]): Given a graph G = (V, E). For any
cut (S, S) with |S| |V |/2, the edge expansion of the cut is |E(S, S)|/|S|. The
problem is to nd the S such that |E(S, S)|/|S| is the smallest. Find an SDP
relaxation to the problem.
104
Chapter 5
SDP for Geometry

Computation
One of the most studied problems in distance geometry is the Graph Realization
problem, in which one is given a graph G = (V, E) and a set of nonnegative
weights {dij : (i, j) E} on its edges, and the goal is to compute a realization
of G in the Euclidean space Rd for a given dimension d, i.e. to place the
vertices of G in Rd such that the Euclidean distance between every pair of
adjacent vertices vi , vj equals to the prescribed weight dij . This problem and its
variants arise from applications in various areas, such as molecular conformation,
dimensionality reduction, Euclidean ball packing, and more recently, wireless
sensor network localization. In this chapter, we discuss SDP formulations of
such Euclidean distance geometry problems.
5.1
The basic SDP model
The basic mathematical model of Euclidean distance geometry optimization

can be described as a quadratically constrained optimization problem. Let n
unknown points xj Rd , j = 1, ..., n. For each pair of (i, j), we are given
Euclidean distance upper bound dij and lower bound dij between the two points
xi and xj , and an objective function of x1 , ..., xn . Moreover, we require that
each point xj where is a convex set. Then, the basic model can be
formulated as:
minimize
subject to
f (x1 , ..., xn )
(dij )2 xi xj
xj , j,
(dij )2 , i < j
(5.1)
Note that the bounding from below constraint

(xi xj )
105
(dij )2
(5.2)
106
CHAPTER 5. SDP FOR GEOMETRY COMPUTATION
is not convex, so that the ecient convex optimization techniques cannot apply
to solving this problem.
Let X = [x1 x2 ... xn ] be the d n matrix that needs to be determined.
Then
xi xj 2 = eT X T Xeij ,
ij
where eij is the vector with 1 at the ith position, 1 at the jth position and
zero everywhere else. Let Y = X T X. Then problem (5.1) can be rewritten as:
minimize
subject to
f (X, Y )
(dij )2 eT Y eij (dij )2 , i < j,

ij
xj , j,
Y = X T X.
(5.3)
Our approach is to relax problem (5.3) to a semidenite program:

minimize
subject to
f (X, Y )
(dij )2 eT Y eij (dij )2 , i < j,

ij
xj , j,
Y
X T X.
(5.4)
The last matrix inequality is equivalent to (Boyd et al. [56])

Z :=
I
XT
X
Y
0.
Then, the problem can be written as a standard SDP problem:

minimize
subject to
f (Z)
Z(1 : d, 1 : d) = I,
(dij )2 (0; eij )T Z(0; eij ) (dij )2 , i < j,

Z(1 : d, j) j = d + 1, ..., d + n,
Z 0.
(5.5)
If f (Z) is a convex function of Z, then (5.5) becomes a convex optimization

problem. In particular, if f (Z) is a linear function of Z and is a polyhedron,
then (5.5) is a (linear) semidenite program.
5.2
Wireless sensor network localization
There has been an increase in the use of semidenite programming (SDP) for
solving wide range of Euclidean distance geometry problems, such as data compression, metric-space embedding, ball packing, chain folding etc. One application of the SDP Euclidean distance geometry model lies in ad hoc wireless
sensor networks which are constructed for monitoring environmental information(temperature, sound levels, light etc) across an entire physical space. Typical networks of this type consist of a large number of densely deployed sensor
5.2. WIRELESS SENSOR NETWORK LOCALIZATION
107
nodes which gather local data and communicate with other nodes within a small
range. The sensor data from these nodes are relevant only if we know what location they refer to. Therefore knowledge of of each sensor position becomes
imperative. Generally, the use of a GPS system is a very expensive solution to
this requirement.
Indeed, other techniques to estimate node positions are being developed that
rely just on the measurements of distances between neighboring nodes [51, 65,
79, 116, 161, 163, 245, 268, 269, 270, 273]. The distance information could
be based on criterion like time of arrival, angle of arrival and received signal
strength. Depending on the accuracy of these measurements and processor,
power and memory constraints at each of the nodes, there is some degree of
error in the distance information. Furthermore, it is assumed that we already
know the positions of a few anchor nodes. The problem of nding the positions of
all the nodes given a few anchor nodes and partial distance information between
the nodes is called the position estimation or localization problem.
In particular, the paper [51] describes an SDP relaxation based model for
the position estimation problem in sensor networks. The optimization problem
is set up so as to minimize the error in the approximate distances between the
sensors. Observable traces are developed to measure the quality of the distance
data. The basic idea behind the technique is to convert the non-convex quadratic
distance constraints into linear constraints by introducing relaxations to remove
the quadratic term in the formulation. The performance of this technique is
highly satisfactory compared to other techniques. Very few anchor nodes are
required to accurately estimate the position of all the unknown sensors in a
network. Also the estimation errors are minimal even when the anchor nodes
are not suitably placed within the network. More importantly, for each sensor
the model generates numerical data to measure the reliability and accuracy of
the positions computed from the model, which can be used to detect erroneous
or outlier sensors.
5.2.1
An SDP relaxation model
For simplicity, let the sensor points be placed on a plane. Recall that we have
m known anchor points ak R2 , k = 1, ..., m, and n unknown sensor points
xj R2 , j = 1, ..., n. For every pair of two points, we have a Euclidean distance
measure if the two are within a communication distance range R. Therefore,
say for (i, j) Nx , we are given Euclidean distance data dij between unknown
kj between anchor k and
sensors i and j, and for (k, j) Na we know distance d
sensor j. Note that for the rest of pairs we have only a lower bound R for their
pair-wise distances. Therefore, the localization problem can be formulated as
108
an error minimization problem with mixed equalities and inequalities:

minimize
subject to
|ij | + k,jNa |kj |
= (dij )2 + ij , (i, j) Nx , i < j,

2
= (dkj )2 + kj , for (k, j) Na ,

2
R2 , for the rest i < j,
2
R2 , for the rest k, j.
i,jNx , i<j
2
xi xj
ak xj
xi xj
ak xj
Thus, we relax the problem to a semidenite program:

minimize
subject to
|ij | + k,jNa |kj |

(1; 0; 0) Z(1; 0; 0) = 1
(0; 1; 0)T Z(0; 1; 0) = 1
(1; 1; 0)T Z(1; 1; 0) = 2
(0; eij )T Z(0; eij ) = (dij )2 + ij , (i < j, j) Nx ,

T
(ak ; ej ) Z(ak ; ej ) = (dkj )2 + kj , (k, j) Na ,

T
2
(0; eij ) Z(0; eij ) R , (i < j, j) Nx ,
/
(ak ; ej )T Z(ak ; ej ) R2 , (k, j) Na ,
/
Z 0.
i,jNx , i<j
T
(5.6)
The matrix of Z has 2n + n(n + 1)/2 unknown variables. Consider the case
that among {k, i, j}, there are 2n + n(n + 1)/2 of the pairs where each of them
has same distance upper and lower bounds, and = 0 for the minimal solution
of (5.6). Then we have at least 2n + n(n + 1)/2 linear equalities among the
constraints. Moreover, if these equalities are linearly independent, then Z has
a unique solution. Therefore, we can show
Proposition 5.1 If there are 2n + n(n + 1)/2 distance pairs each of which has
an accurate distance measure and other distance bounds are feasible. Then, the
minimal value of = 0 in (5.6). Moreover, if (5.6) has a unique minimal
solution
I
X
,
Z=
T Y
then we must have Y = (X)T X and X equal true positions of the unknown
sensors. That is, the SDP relaxation solves the original problem exactly.
Proof. Let X be the true locations of the n points, and
Z =
I
(X )T
X
(X ) X
T
Then Z and = 0 is a feasible solution for (5.6).
On the other hand, since Z is the unique solution to satisfy the 2n+n(n+1)/2

equalities, we must have Z = Z so that Y = (X )T X = X T X.
We present a simple case to show what it means for the system has a unique
solution. Consider n = 1 and m = 3. The accurate distance measures from
109
unknown b1 to known a1 , a2 and a3 are d11 , d21 and d31 , respectively. Therefore,
the three linear equations are
y 2xT a1
y 2xT a2
y 2xT a3
= (d11 )2 a1
= (d21 )2 a2
= (d31 )2 a3
2
2
2
This system has a unique solution if it has a solution and the matrix
1
a1
1
a2
1
a3
is nonsingular. This essentially means that the three points a1 , a2 and a3 are
not on the same line, and then x = b1 can be uniquely determined. Here, the
SDP method reduces to the so-called triangular method. Proposition 5.1 and
the example show that the SDP relaxation method has the advantage of the
triangular method in solving the original problem.
5.2.2
Probabilistic or error analyses
The case discussed in Proposition 5.1 is deterministic. Alternatively, each xj

can be viewed a random point xj since the distance measures contain random
errors. Then the solution to the SDP problem provides the rst and second
moment information on xj , j = 1, ..., n. Such an interpretation appears to be
rst stated in Bertsimas and Ye [50].

Generally, we have
E[j ] xj , j = 1, ..., n
x
and
E[T xj ] Yij ,
xi
where
Z=
i, j = 1, ..., n.
XT
is the optimal solution of the SDP problem. Thus,

Y XT X
represents the co-variance matrix of xj , j = 1, ..., n.
These quantities also constitute error management and analyses of the original problem data. For example,
n

Trace(Y X T X) =
(Yjj xj
),
j=1
the total trace of the co-variance matrix, measures the quality of distance sample
data dij and dkj . In particular, individual trace
Yjj xj
(5.7)
110
which is also the variance of xj , helps us to detect possible distance measure
errors, and outlier or defect sensors. These errors often occur in real applications
either due to the lack of data information or noisy measurement, and are often
dicult to detect since the true location of sensors is unknown.
We again use the same simple case to illustrate our theory. Consider n = 1
and m = 3. The inexact distance measures from unknown b1 to known a1 , a2
and a3 are d11 + , d21 + and d31 + , respectively, where is a random error
with zero mean. Therefore, the three linear equations are
y 2T a1 + a1
x
y 2T a2 + a2
x
y 2T a3 + a3
2
2
2
= (d11 )2 + 2 d11 +
= (d21 )2 + 2 d21 +
= (d31 )2 + 2 d31 +
2
2
2
Taking expect values on both sides, we have

E[] 2E[]T a1 + a1
y
x
E[] 2E[]T a2 + a2
y
x
E[] 2E[]T a3 + a3
y
x
= (d11 )2 + E[ 2 ]
= (d21 )2 + E[ 2 ]
= (d31 )2 + E[ 2 ]
2
2
or
E[] E[]T E[] + E[] a1
y
x
x
x
E[] E[]T E[] + E[] a2
y
x
x
x
T
E[] E[] E[] + E[] a3
y
x
x
x
2
2
2
= (d11 )2 + E[ 2 ]
= (d21 )2 + E[ 2 ]
= (d31 )2 + E[ 2 ].
The solution to the linear equation is

E[] = b1 ,
x
and
E[] E[]T E[] = E[ 2 ]
y
x
x
or
E[] = E[ 2 ] + b1 2 .
y
That is, x is a point estimate of b1 . Moreover, from
E[] E[ x 2 ] + E[ x 2 ] b1
y
= E[] b1
y
= E[ 2 ],
we have
E[ x 2 ] E[ 2 ]
y
and E[ x 2 ] b1
E[ 2 ],
so that the quantity of y x 2 is a lower bound estimate for the error variance
and the variance of x is also bounded by the error variance. This quantity gives
an interval estimation of b1 .
More generally, we have
111
Proposition 5.2 Let the noisy measurements
dij = dij +
j,
i=j
and
dkj = dkj +
j,
k, j
where dij are the true distances and j are independent random errors with zero
mean. Moreover, let the minimal = 0 in (5.6) and the anchor points are
linear independent. Then, we have
E[j ] = bj
x
and
E[Yjj ] = bj
+ E[ 2 ] j
j
and
E[Yij ] = (bi )T bj i = j,
where bj is the true position of xj , j = 1, ..., n, and
Z=
XT
is the minimizer of (5.6).

Proof. Since = 0, we have, for all i, j, k,
Yii 2Yij + Yjj

T
Yjj 2j ak + ak 2
x
= (dij +
= (dkj +
+ j )2
2
j) .
Taking expect values on both sides, we have
E[Yii ] 2E[Yij ] + E[Yjj ]
E[Yjj ] E[j ]T E[j ] + E[j ] ak

x
x
x
(dij )2 + E[ 2 ] + E[ 2 ]
i
j
= (dkj )2 + E[ 2 ].
j
or
E[Yii ] 2E[Yij ] + E[Yjj ] E[i ] E[j ] 2 + E[j ] E[i ]

x
x
x
x
E[Yjj ] E[j ]T E[j ] + E[j ] ak

x
x
x
(dij )2 + E[ 2 ] + E[ 2 ]
i
j
(dkj )2 + E[ 2 ].
j
Thus,
E[j ] = bj
x
and E[Yjj ] = bj
+ E[ 2 ] j
j
and
E[Yij ] = (bi )T bj i = j
is the solution satisfying these equations.
112
5.3
General SDP theory on graph realization
In the previous section, we show that if all accurate distances are given, then
the graph can be localized by solving an SDP problem in polynomial time. In
real applications, only partial distances(or edges) are known to the model. Can
these graph be localized or realized in polynomial time? In this section, we
given a positive answer to a family of uniquely localizable graphs. Informally,
a graph is uniquely localizable in dimension d if (i) it has a unique realization
in Rd , and (ii) it does not have any nontrivial realization whose ane span is
Rh , where h > d. Specically, we present an SDP model that guarantees to
nd the unique realization in polynomial time when the input graph is uniquely
localizable. The proof employs SDP duality theory and properties of interior
point algorithms for SDP. To the best of our knowledge, this is the rst time such
a theoretical guarantee is proven for a general localization algorithm. Moreover,
in view of the hardness result of Aspnes et. al. [24], The result is close to be the
best possible in terms of identifying the largest family of eciently realizable
graphs. We also introduce the concept of strong localizability. Informally, a
graph is strongly localizable if it is uniquely localizable and remains so under
slight perturbations. We show that the SDP model will identify all the strongly
localizable subgraphs in the input graph.
5.3.1
Preliminaries
For the simplication of our analyses, we ignore the lower bound constraints
and study the Graph Localization problem that is dened as follows: nd a
realization of x1 , . . . , xn Rd such that
ak xj
xi xj
= d2
kj
(k, j) Na
d2
ij
(i, j) Nx
(5.8)
Recall that m anchor points a1 , . . . , am Rd whose locations are known, and n

sensor points x1 , . . . , xn Rd whose locations we wish to determine. Further
more, we are given partial accurate Euclidean distance values dkj between ak
and xj for (k, j) Na , and dij between xi and xj for some (i < j, j) Nx .
Again, we can write the relaxed problem as a standard SDP problem, namely,
nd a symmetric matrix Z R(d+n)(d+n) to:
maximize
subject to
Z1:d,1:d = Id
(0; eij )(0; eij )T Z = d2
ij
(i, j) Nx
(ak ; ej )(ak ; ej ) Z = d2
kj
(k, j) Na
(5.9)
where Z1:d,1:d is the d d principal submatrix of Z. Note that this formulation

forces any possible feasible solution matrix to have rank at least d.
5.3. GENERAL SDP THEORY ON GRAPH REALIZATION
113
The dual of the SDP relaxation is given by:

minimize
(i,j)Nx
V
0
subject to
wkj d2
kj
yij d2 +
ij
Id V +
0
0
(k,j)Na
yij (0; eij )(0; eij )T
wkj (ak ; ej )(ak ; ej )T
(5.10)
(i,j)Nx
(k,j)Na
Note that the dual is always feasible, as V = 0, yij = 0 for all (i, j) Nx and
wkj = 0 for all (k, j) Na is a feasible solution.
5.3.2
Analysis of the SDP relaxation
We now investigate when will the SDP (5.9) have an exact relaxation, i.e. when
will the solution matrix Z have rank d. Suppose that problem (5.9) is feasible.
This occurs when, for instance, dkj and dij represent exact distance values for
the positions X = [1 x2 . . . xn ]. Then, the matrix Z = (Id ; X)T (Id ; X) is a

x
feasible solution for (5.9). Now, since the primal is feasible, the minimal value
of the dual must be 0, i.e. there is no duality gap between the primal and dual.
Let U be the (d + n)dimensional dual slack matrix, i.e.:
U
V
0
0
0
yij (0; eij )(0; eij )T
+
(i,j)Nx
wkj (ak ; ej )(ak ; ej )T
+
(k,j)Na
Then, from the duality theorem for SDP (see, e.g., [10]), we have:
Theorem 5.3 Let Z be a feasible solution for (5.9) and U be an optimal slack
matrix of (5.10). Then,

1. complementarity condition holds: Z U = 0 or Z U = 0;
2. Rank(Z) + Rank(U ) d + n;
3. Rank(Z) d and Rank(U ) n.

An immediate result from the theorem is the following:
Corollary 5.4 If an optimal dual slack matrix has rank n, then every solution
of (5.9) has rank d. That is, problems (5.8) and (5.9) are equivalent and (5.8)
can be solved as an SDP in polynomial time.
Another technical result is the following:
114
Proposition 5.5 If every sensor point is connected, directly or indirectly, to

an anchor point in (5.8), then any solution to (5.9) must be bounded, that is,
Yjj is bounded for all j = 1, . . . , n.
Proof. If sensor point xj is connected to an anchor point ak , then we have:
xj
2aT xj + ak
k
Yjj 2aT xj + ak
k
= d2
kj
so that from the triangle inequality xj in (5.9) is bounded. Hence, we have:
Yjj d2 + 2 ak
kj
xj ak
Furthermore, if xi is connected to xj and Yjj is bounded, we have:

Yii 2
Yii Yjj + Yjj Yii 2Yij + Yjj = d2

ij
so that from the triangle inequality Yii must be also bounded.

In general, a primal (dual) maxrank solution is a solution that has the
highest rank among all solutions for primal (5.9) (dual (5.10)). It is known
[147, 130, 210] that various pathfollowing interiorpoint algorithms compute
the maxrank solutions for both the primal and dual in polynomial time. This
motivates the following denition.
Denition 5.1 Problem (5.8) is uniquely localizable if there is a unique local
ization X Rdn and there is no xj Rh , j = 1, . . . , n, where h > d, such
that:
(ak ; 0) xj 2 = d2
(k, j) Na
kj
2
2
xi xj = dij
(i, j) Nx
xj = (j ; 0)
x
for some j {1, . . . , n}
The latter says that the problem cannot have a nontrivial localization in some
higher dimensional space Rh (i.e. a localization dierent from the one obtained
by setting xj = (j ; 0) for j = 1, . . . , n), where anchor points are augmented to
x
(ak ; 0) Rh , for k = 1, . . . , m.
We now develop the following theorem:
Theorem 5.6 Suppose that the network is connected.
statements are equivalent:
Then, the following
1. Problem (5.8) is uniquely localizable.

2. The maxrank solution matrix of (5.9) has rank d.
3. The solution matrix of (5.9) satises Y = X T X.
Proof. The equivalence between 2. and 3. is straightforward.
Now, since any rank d solution of (5.9) is a solution to (5.8), from 2. to 1.
we need to prove that if the maxrank solution matrix of (5.9) has rank d then
it is unique. Suppose not, i.e., (5.9) has two rankd feasible solutions:
Z1 =
Id
T
X1
X1
T
X1 X1
and Z2 =
Id
T
X2
X2
T
X2 X2
115
Then, the matrix Z = Z1 + Z2 , where + = 1 and , > 0 is a feasible

solution and its rank must be d, since all feasible solution of (5.9) has rank at
least d but the maxrank is assumed to be d. Therefore, we have:
=
Id
T
T
X1 + X2
Id
BT
X1 + X2
T
T
X1 X1 + X2 X2
B
BT B
where B = X1 +X2 . It follows that (X1 X2 )T (X1 X2 ) = 0, or X1 X2 =

0, i.e. Z1 = Z2 , which is a contradiction.
Next, we prove the direction from 1. to 2., that is, the rank of a maxrank
solution of (5.9) is d. Suppose that there is a feasible solution Z of (5.9) whose
rank is greater than d. Then, we must have Y
X T X and Y = X T X. Thus, we
T
T
have the decomposition Y X X = (X ) X , where X = [x1 , . . . , xn ] Rrn
and r is the rank of Y X T X. Now, consider the point:
xj
xj
xj =
Then, we have:
xj
Rd+r
for j = 1, . . . , n
= Yjj , (i )T xj = Yij
x
i, j;
and there exist at least one xj such that xj = xj or xj = 0. Moreover,
since the network is connected, we conclude from Proposition 5.5 that Yii and
Yij are bounded for all i, j. Hence, we have:
(ak ; 0) xj
xi xj
2
2
= d2
kj
= d2
ij
(k, j) Na
(i, j) Nx
In other words, xj is a localization of problem (5.8) in Rd+r , which is a contra

diction.
Theorem 5.6 establishes, for the rst time, that as long as problem (5.8) is
uniquely localizable, then the realization can be computed in polynomial time
by solving the SDP relaxation. Conversely, if the relaxation solution computed
by an interiorpoint algorithm (which generates maxrank feasible solutions)
has rank d (and hence Y = X T X), then X is the unique realization of problem
(5.8). Moreover, as the recent result of Aspnes et. al. [24] shows, the results of
Theorem 5.6 are close to be the best possible, since the problem of computing
a realization of the sensors on the plane is NPcomplete in general, even when
the instance has a unique solution on the plane.
5.3.3
Strongly Localizable Problem
Although unique localizability is an useful notion in determining the solvability

of the Sensor Network Localization problem, it is not stable under perturbation.
As we shall see in Section 5.3.4, there exist networks which are uniquely localizable, but may no longer be so after small perturbation of the sensor points.
This motivates us to dene another notion called strong localizability.
116
Denition 5.2 We say problem (5.8) is strongly localizable if the dual of its
SDP relaxation (5.10) has an optimal dual slack matrix with rank n.
Note that if a network is strongly localizable, then it is uniquely localizable from
Theorems 5.3 and 5.6, since the rank of all feasible solution of the primal is d.
We show how we can construct a rankn optimal dual slack matrix. First,
note that if U is an optimal dual slack matrix of rank n, then it can be written
in the form U = (X T ; In )T W (X T ; In ) for some positive denite matrix W
of rank n. Now, consider the dual matrix U . It has the form:
U=
U11
T
U12
U12
U22
where U22 is an n n matrix. Moreover, it can be decomposed as U22 = A + D,

where Aij = yij if i = j, Aii = j Aij ; and D is a diagonal matrix where
Dii = (k,i)Na wki . (If there is no (k, i) Na , then Dii = 0.) Note that if
we impose the constraints yij 0 and wki 0, then both A and D are positive
semidenite. Moreover, we have the following:
Proposition 5.7 Suppose that the network is connected. Furthermore, suppose
that yij < 0 for all (i, j) Nx , and that wki < 0 for some (k, i) Na , with
Na = . Then, U22 is positive denite, i.e. it has rank n.
Proof. Since A and D are positive semidenite, we have xT U22 x 0 for all
x Rn . We now show that there is no x Rn \{0} such that xT Ax = xT Dx =
0. Suppose to the contrary that we have such an x. Then, since D is diagonal,
n
we have xT Dx = i=1 Dii x2 = 0. In particular, for Dii > 0, we have xi = 0.
i
Now, note that:
n
xT Ax =
(xi xj )2 Aij
xi xj Aij =
i=1 j=1
i<j
Thus, xT Ax = 0 implies that xi = xj for all i, j. Since Na = , there exists an i

such that Dii > 0, whence xi = 0. It follows that x = 0. Proposition 5.7 gives
us a recipe for putting U into the desired form. First, we set U22 to be a positive
denite matrix. Then, we need to set U12 = XU22 , where X is the matrix
containing the true locations of the sensors. We now investigate when this is
possible. Note that the above condition is simply a system of linear equations.
Let Ai be the set of sensors connected to anchor i, and let E be the number of
sensorsensor edges. Then, the above system has E + i |Ai | variables. The
number of equations is E + 3m, where m is the number of sensors that are
connected to some anchors. Hence, a sucient condition for solvability is that
the system of equations are linearly independent, and that i |Ai | 3m. In
particular, this shows that the trilateration graphs dened in [89] are strongly
localizable.
We now develop the next theorem.
117
Theorem 5.8 If a problem (graph) contains a subproblem (subgraph) that is

strongly localizable, then the submatrix solution corresponding to the subproblem
in the SDP solution has rank d. That is, the SDP relaxation computes a solution
that localizes all possibly localizable unknown points.
Proof. Let the subproblem have ns unknown points and they are indexed as
1, . . . , ns . Since it is strongly localizable, an optimal dual slack matrix Us of
the SDP relaxation for the subproblem has rank ns . Then in the dual problem
of the SDP relaxation for the whole problem, we set V and those w associated
with the subproblem to the optimal slack matrix Us and set all other w equal
0. Then, the slack matrix:
U=
Us
0
0
0
must be optimal for the dual of the (wholeproblem) SDP relaxation, and it
is complementary to any primal feasible solution of the (wholeproblem) SDP
relaxation:
Zs
Id Xs
Z=
0 where Zs =
T
Xs Ys
However, we have 0 = Z U = Zs Us and Us , Zs
0. The rank of Us is ns
implies that the rank of Zs is exactly d, i.e. Ys = (Xs )T Xs , so Xs is the unique
realization of the subproblem.
5.3.4
A Comparison of Notions
In this section, we will show that the notions of unique localizability, strong
localizability and rigidity in R2 are all distinct.
5.3.5
Unique Localizability Strong Localizability
We have already remarked earlier that a strongly localizable graph is necessarily uniquely localizable. However, as we shall see, the converse is not true.
Let G1 be the network shown in Figure 5.1(a). The key feature of G1 is that
the sensor x2 lies on the line joining anchors a1 and a3 . It is not hard to check
that this network is uniquely localizable. Now, suppose to the contrary that G1
is strongly localizable. Then, the dual slack matrix U admits the decomposition
U = (X T , I)T W (X T , I). It is easy to verify that:

U12
= (21 a2 + y31 a3 , y12 a1 + y32 a3 )

y
U22
(21 + y31 ) y12

y
y12
y12
(12 + y32 ) y12
y
and the form of U requires that U12 = XU22 . This is equivalent to the
following system of equations:
(1 a2 )21 + (1 a3 )31 = (1 x2 )y12
x
y
x
y
x
(5.11)
(2 a1 )12 + (2 a3 )32 = (1 x2 )y12

x
y
x
y
x
(5.12)
118
Since x2 lies on the ane space spanned by a1 and a3 , equation (5.12) implies
that y12 = 0. However, equation (5.11) would then imply that x1 lies on the
ane space spanned by a2 and a3 , which is a contradiction. Thus, we conclude

that G1 is not strongly localizable.
a1
a1=(0,1.4)
x1
x2
x2=(0.6,0.7)
x1=(0,0.5)
a2=(1,0)
a2
a3=(1,0)
a3
(a) A uniquely localizable, but not

strongly localizable network
(b) A rigid network that is not

uniquely localizable
Figure 5.1: A comparison of graph notions
5.3.6
Rigid in R2 Unique Localizability
By denition, a uniquely localizable network is rigid in R2 . However, the converse is not true. To see this, let G2 be the network shown in Figure 5.1(b).
Note that G2 can be viewed as a perturbed version of G1 . It is easy to verify
that G2 is rigid. Thus, by Theorem 5.6, it can fail to be uniquely localizable
only if it has a realization in some higher dimension. Indeed, the above network
has an 3dimensional realization. The idea for constructing such a realization
is as follows. Let us rst remove the edge (x1 , x2 ). Then, reect the subgraph
induced by a1 , x2 , a2 across the dotted line. Now, consider two spheres, one
centered at a2 and the other centered at a3 , both having radius 5/2. The
intersection of these spheres is a circle, and we can move x1 along this circle
until the distance between x1 and x2 equals to the prespecied value. Then, we
can put the edge (x1 , x2 ) back and obtain an 3dimensional realization of the
network.
More precisely, for the above realization, the reected version of x2 has co
495
112
ordinates x2 = 173 , 185 , 0 . Now, let x1 = 0, 23 , 64 . It is straightforward
370
64
to verify that:
x1 a2
= x1 a2
5
4
x1 a3
= x1 a3
5
4
x1 x2
= x1 x2
2
5
119
Hence, we conclude that G2 is not uniquely localizable.

It would be nice to have a characterization on those graphs which are rigid
in the plane but have higher dimensional realizations. However, nding such
a characterization remains a challenging task, as such characterization would
necessarily be noncombinatorial, and would depend heavily on the geometry
of the network. For instance, the networks shown in Figure 5.2, while having
the same combinatorial property as the one shown in Figure 5.1(b), are uniquely
localizable (in fact, they are both strongly localizable):
a1
a1
x2
x2
x1
a2
a2
a3
(a)
a3
x1
(b)
Figure 5.2: Strongly localizable networks
5.3.7
Preliminary computational and simulation results
The computational results presented here were generated using the interiorpoint algorithm SDP solvers SeDuMi of Sturm and DSDP2.0 of Benson and
Ye. The performance of this technique seems highly satisfactory compared to
other techniques. Very few anchor nodes are required to accurately estimate
the position of all the unknown nodes in a network. Also the estimation errors
are minimal even when the anchor nodes are not suitably placed within the
network.
Simulations were performed on a network of 50 sensors or nodes randomly
generated in a square region of [.5 .5] [.5 .5]. The distances between the
nodes was then calculated. If the distance between two notes was less than a
given radiorange between [0, 1], a random error was added to it
dij = dij (1 + randn(1) noisyf actor),

where noisyf actor was a given number between [0, 1], and then both upper and
lower bound constraints were applied for that distance in the SDP model. If the
distance was beyond the given radio range, only the lower bound constraint,
120
1.001 radiorange2 , was applied. The average estimation error is dened by

n
xj aj ,
n j=1
where xj comes from the SDP solution and aj is the true position of the jth

node. As discussed earlier, we called the trace of Y X T X the total trace.
Connectivity indicates how many of the nodes, on average, are within the radio
range of a sensor.
Also the original and the estimated sensors were plotted. The blue diamond
nodes refer to the positions of the anchor nodes, green circle nodes to the true
locations of the unknown sensors and red star nodes to their estimated positions
from X. The discrepancies in the positions can be estimated by the osets

between the true and the estimated points as indicated by the solid lines, see
Figure 5.3. Even with noisy data measurement, the position estimations for the
sensors are fairly accurate. Figure 5.4 shows the correlation between individual
oset errors (not observable) and individual traces (observable) of corresponding
sensors when the radio range is set at 0.25.
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0.1
0.1
0.2
0.2
0.3
0.3
0.4
0.4
0.5
0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
(a) error:0.023, connectivity:7.8,

trace:0.16, radio range=0.20.
0.4
0.5
0.5
0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
total (b) error:0.002, connectivity:10.5,

trace:0.04, radio range=0.30.
0.4
0.5
total
Figure 5.3: Position estimations with 3 anchors, noisy factor=0, and various
radio ranges.
Several key questions have to be answered for the future research: What is
the minimal radio range such that the network is localizable? How does the
sensor geographical distribution change the minimal radio range? What is the
sensitivity of the distance data? How to lter the data noisy? etc.
5.4. OTHER DISTANCE GEOMETRY PROBLEMS

0.5
121
0.25
0.4
0.3
0.2
0.2
0.1
0.15
0.1
0.1
0.2
0.3
0.05
0.4
0.5
0.5
0.4
0.3
(a) number
range=0.25.
0.2
0.1
of
0.1
0.2
0.3
0.4
anchors=3,
0.5
10
15
radio (b) number

range=0.30.
20
of
25
30
35
anchors=3,
40
45
50
radio
Figure 5.4: Diamond: the oset distance between estimated and true positions,
Box: the square root of trace Yjj xj 2 .
5.4
Other distance geometry problems
We list few other variances of the distance geometry or graph localization problem.
5.4.1
Metric distance embedding
The basic mathematical model of the metric distance embedding problem can be
described as a quadratically constrained optimization problem. We are given
metric p-norm distances dij for all i, j = 1, ..., n. The problem is to nd n
unknown points xj Rn , j = 1, ..., n such that
xi xj
= (dij )2 i = j
or the error is minimized.

Thus, an optimization problem can be formulated as:
minimize
subject to
(dij )2 xi xj 2 (dij )2 , i < j

xj Rn , j = 1, ..., n.
(5.13)
Its SDP relaxation becomes
p :=
minimize
subject to
(dij )2 (eij eT ) Y (dij )2 , i < j

ij
Y
0.
(5.14)
122
Again
eij = (0, ..., 1, ..., 1, ...)T .
The SDP dual is
p :=
Maximize
Subject to
i<j
uij (dij )2
T
i,j (uij vij )eij eij ,
2
i,j vij (dij ) = 1
uij , vij 0.
Result and questions:
2 = 1, and p O(log n) for all p.
Reduce Y s rank to O(log n) for 2 = 1 + .
rank O(log n) for p O(log n) for all p.
Better bound for 1 ?
5.4.2
Molecular conrmation
The basic mathematical model of the molecular conrmation problem can be

described as a quadratically constrained optimization problem as well. For a
pair of (i, j) N , we are given Euclidean distance upper bound dij and lower
bound dij between points i and j, the problem is to nd Y such that
(dij )2 (eij eT ) Y (dij )2 , i < j N .

ij
Furthermore, it is desired to nd other missing dij such that the Euclidean
distance matrix can be completed.
The progem can be formulated as an SDP:
minimize I Y
subject to (dij )2 (eij eT ) Y (dij )2 , i < j N

ij
Y
0.
5.4.3
(5.15)
Euclidean ball parking
The Euclidean ball packing problem is an old mathematical geometry problem

with plenty modern applications in Bio-X and chemical structures. It is a special
case of the Euclidean distance geometry problem, say d = 3, xj R3 is the
unknown position of the center of ball j, dij = and dij = ri + rj , where ri

and rj are the radii of balls i and j, respectively; and represents a convex
container where the balls should be packed.
5.4. OTHER DISTANCE GEOMETRY PROBLEMS
123
Say that we want to pack n balls in a box with width and length equal 2R
and like to minimize the height of the box, we can formulate it as
minimize
subject to
xi xj 2 (ri + rj )2 , i = j,
R + rj xj (1) R rj , j,
R + rj xj (2) R rj , j,
rj xj (3) rj , j,
(5.16)
The rst constraint tells that the distance between any two balls should be
greater than or equal to the sum of the two balls radii, the second says that
the rst coordinate of xj is within the width constraint and the third is for the
length requirement, and the fourth indicates that the ball center has to be above
rj from the bottom and its height is below rj where is to be minimized.
5.4.4
Data dimensionality reduction
Given P , a data point set of p1 , ..., pn Rd , a fundamental question is how to

embed P into Q of q1 , ..., qn Rk , where k
d, such that qj s keep all essential
information of P , such as the norms, distances and angles between pj s. In other
words, nd a d k-dimension subspace such that the projections of pj s onto the
subspace has a minimal information loss. We believe that this problem can
be posted as an Euclidean distance geometry problem
Lk :=
minimize
subject to
nk
T
2
i=1 ((pl ) xi ) ,
nk
T
2
i=1 ((pl pm ) xi )
2
xi = 1, i = 1, ..., n
l,
, l < m,
k, (xh )T xi = 0, h < i.
(5.17)
Here, x1 , ..., xnk represent the n k orthogonal bases of a n k-dimension subspace. Thus, (pT xi ; ...; pT xnk ) represents the projection of pl onto the n k
l
l
subspace. After the problem being solved and the subspace is found, the projections of pl s onto the complement k-dimension subspace must have all remaining
information of pl s. In fact, the model without pair-wise distance constraints
is called the minimum radius problem, which is a fundamental computational
geometry and statistics problem.
Assume that x1 , x2 , , xnk Rd are the optimal solution of (5.17). Then
one can easily verify that the matrix Y = x1 xT + x2 xT + + xnk xT
1
2
nk is a
feasible solution for the following SDP model:
k :=
minimize
subject to
pT Y pl , l,
l
(pl pm )T Y (pl pm ) , l < m,
Trace(Y ) = n k, I Y
0, Y
(5.18)
0.
It follows that k Rk (P )2 . Then, we can design a rounding procedure to

, a solution to (5.18), into orthogonal n k bases and prove their
round Y
quality.
124
Theorem 5.9 We can computed in polynomial time, a (n k)-subspace such

2
that, with probability at least 1 n , its projected value of (5.17) is at most
12 log(n) of the minimal one.
5.5
A special case: The k-radius of P
The outer-radius, Rk (P ), is the minimum of the Euclidean radius of P projected

onto a k-dimensional subspace. Computing the minimal radius is a fundamental
problem in computational convexity with applications in global optimization,
data mining, statistics and clustering, and has received considerable attention
in the computational geometry literature [60, 153].
The square of Rk (P ) can be dened by the optimal value of the following
Euclidean distance minimization:
Rk (P )2 := minimize
k
T
2
subject to
i=1 ((pl ) xi ) , pl P,
2
xi = 1, i = 1, ..., k, (xi )T xj = 0, i = j k.
(5.19)
This leads to the following SDP relaxation:
k :=
minimize
subject to
Trace(pl pT X) , pl P,
l
Trace(X) = k, I X 0, X
(5.20)
0.
It follows that k Rk (P )2 .
Lemma 5.10 There exists an integer r k such that we can compute, in

polynomial time, r nonnegative reals 1 , 2 , , r and r orthogonal unit vectors
v1 , v2 , , vr such that
r
i=1
(i).
i = k.
(ii). max1ir i 1.
(iii).
r
i=1
i p, vi
Rk (P )2 , for any p P .
Proof. We solve the semidente program (5.20), and let X be an optimal

solution of (5.20). We claim that the rank of X , say r, is at least k. This
follows from the fact that Trace(X ) = k and I X
0. In other words,
Trace(X ) = k implies that the sum of the eigenvalues of X is equal to k,
and I X
0 implies that the all eigenvalues are less than or equal to 1.
Therefore, X has at least k non-zero eigenvalues, which implies that the rank
of X is at least k. Let 1 , 2 , , r be the r nonnegative eigenvalues and
r
v1 , v2 , , vr be the corresponding eigenvectors. Then we have i=1 i = k
and max1ir i 1. Furthermore, for any p P ,
r
i p, vi
i=1
= Trace(ppT
i vi vi ) = Trace(ppT X ) k Rk (P )2 .
i=1
5.5. A SPECIAL CASE: THE K-RADIUS OF P
5.5.1
125
Deterministic First Rounding
In this section, we prove a lemma concerning how to deterministically group the

eigenvalues and their eigenvectors. The proof of the lemma is elementary but it
plays an important role for proving our main result.
Lemma 5.11 The index set {1, 2, , r} can be partitioned into k sets I1 , I2 , , Ik
such that
(i). k Ii = {1, 2, , r}, and for any i = j, Ii Ij = .
i=1
(ii). For any i : 1 i k,
jIi
1
j 2 .
Proof. Recall that

j=1 j = k and 0 j 1 for all j. Without
loss of generality, we can assume that 1 2 r . Our partitioning
algorithm is the same as the Longest-Processing-Time heuristic algorithm for
parallel machine scheduling problem. The algorithm works as follows:
STEP 1. For i = 1, 2, , k, set Ii = and let Li = 0. Let I = {1, 2, , r}.
STPE 2. While I =
choose j from I with the smallest index;
choose set i with the smallest value Li ;
Let Ii := Ii {j}, Li := Li + j and I := I {j}.
It is clear that when the algorithm stops, the sets I1 , I2 , , Ik satisfy condition (i). Now we prove condition (2) by contradiction. Assume that there exists
1
some t such that jIt j < 2 .
We now claim that, for all i, jIi j 1. Otherwise, suppose jIt j > 1
for some t . Note that j 1 for every j and thus there are at least two
eigenvalues are assigned to It . Denote the last eigenvalue by s . It follows
that jIt j s = jIt \{s } j jIt j since, otherwise, we would
have not assigned s to It in the algorithm. However, since jIt j < 1 , we
2
must have jIt j s = jIt \{s } j < 1 . Thus, s > jIt j 1 > 1 .
2
2
2
This is impossible since s is the last eigenvalue assigned to It , which implies
s j for every j It , and we have already proved that there must exist an
l such that s = l It and l jIt \{s } j < 1 . Therefore, jIi j 1
2
for all i, and in particular
jIt
j <
1
2.
It follows that
k
i=1
k
i=1
jIi j
r
j=1 j = k.
< k.
This
However, we know that, by condition (i),
jIi j =
results a contradiction. Therefore, such t does not exists and we have proved
condition (ii).
Notice that the running time of the partitioning algorithm is bounded by
O(r k). An alternative way of partitioning the eigenvalues is the following:
First, put the eigenvalues that are greater than or equal to 1/2 into distinct
subsets. If the number of such eigenvalues, say l, is not less than k, then we are
done. Otherwise, arbitrarily put the remaining eigenvalues into k l subsets
such that the sum of eigenvalues in each subset is greater than or equal to 1/2.
This method is suggested by an anonymous referee.
126
5.5.2
Randomized Second Rounding
Assume now that we have found I1 , I2 , , Ik . Then our next randomized rounding procedure works as follows.
STEP 1. Generate a r dimensional random vector such that each entry of
takes value, independently, 1 or 1 with probability 1 each way.
2
STEP 2. For i = 1, 2, , k, let
jIi
xi =
j vj
jIi j
The following Lemmas show that x1 , x2 , , xk form a feasible solution for

the original problem. In other words, they are k orthogonal unit vectors.
Lemma 5.12 For i = 1, 2, , k, xi = 1.
Proof. Recall that vl , vj = 0 for any l = j and vj = 1. By denition,
xi
= xi , xi
jIi
j
j
j v j
j
j vj ,
j
jIi
j vj , j
j v j
(j )2 j vj
j v j
jIi
jIi
1
jIi
jIi
jIi
1
jIi j
jIi
1
jIi
1
jIi
1
jIi
j v j
j
jIi
j
jIi
j
jIi
= 1
Lemma 5.13 If s = t then xs , xt = 0.

Proof. The proof is similar as that of Lemma 5.12.
xs , xt
j vj
5.5. A SPECIAL CASE: THE K-RADIUS OF P

j
j v j
jIs
jIs
j v j
jIt
=
jIs
jIt
jIt
127
j v j ,
j v j
jIt
jIs
0.
The last equality holds since for any j Is and l It , vj , vl = 0.

Now we establish a bound on the performance of our algorithm. First, let us
introduce Bernsteins Theorem (see, e.g., [238]), which is a form of the Cherno
Bound.
Lemma 5.14 Let be a random vector whose entries are independent and
either 1 or 1 with probability .5 each way. Then, for any vector e and > 0,
2
prob{ , e
Let Cip =
jIi
> e 2 } < 2 exp( ).

2
j p, vj 2 . Then we have
Lemma 5.15 For each i = 1, 2, , k and each p P , we have

prob{ p, xi
> 12 log(n) Cip } <
2
.
n3
Proof. Given i and p, dene a |Ii | dimensional vector e such that its entries are
j p, vj , j Ii , respectively. Furthermore, we dene the vector |Ii whose
entries are those of with indices in Ii . First notice that
e
j p, vj )2 =
jIi
j p, vj
= Cip .
jIi
On the other hand,

p, xi
=
jIi
p,
jIi
j vj j
j
1
jIi
p,
j v j j
jIi
2
2 p,
j v j j
jIi
(since
j
jIi
1
)
2
128
= 2
j j p, vj
jIi
= 2 |Ii , e
Thus
prob{ p, xi
> 12 log(n)Cip } prob{ |Ii , e
> 6 log(n) e 2 }.
Therefore, the conclusion of the lemma follows by using Lemma 5.14 and by
letting = 6 log(n).
Theorem 5.16 We can computed in polynomial time, a (d k)-at such that,
2
with probability at least 1 n , the distance between any point p P and F is
at most 12 log(n) Rk (P ).
Proof. For given i = 1, 2, , k and p P , consider the event
Bip = {| p, xi
> 12 log(n) Cip }
and B = i,p Bip . The probability that the event B happens is bounded by
prob{ p, xi
> 12 log(n) Cip } <
i,p
2kn
2
.
3
n
n
If B does not happen, then for any i and p,

p, xi
12 log(n) Cip .
Therefore, for each p P ,

k
p, xi
i=1
Cip 12 log(n) Rk (P )2 .
12 log(n)
i=1
The last inequality follows from Lemma 5.10. This completes the proof by
taking F as the at which is orthogonal to the vectors x1 , x2 , , xk .
5.6
Distributed SDP computing
The SDP problems presented in this chapter can be solved in a distributed fashion which has not been studied before. Here we describe an iterative distributed
SDP computation scheme to for sensor network localization. We rst partition
the anchors into many clusters according to their physical positions, and assign
some sensors into these clusters if a sensor has a direct connection to one of the
5.6. DISTRIBUTED SDP COMPUTING
129
anchors. We then solve semidenite programs independently at each cluster,

and x those sensors positions which have high accuracy measures according
the SDP computation. These positioned sensors become ghost anchors and
are used to decide the remaining un-positioned sensors. The distributed scheme
then repeats. A round of the distributed computation method is straightforward
and intuitive:
1. Partition the anchors into a number of clusters according to their geographical positions. In our implementation, we partition the entire sensor
area into a number of equal-sized squares and those anchors in a same
square form a regional cluster.
2. Each (unpositioned) sensor sees if it has a direct connection to an anchor
(within the communication range to an anchor). If it does, it becomes an
unknown sensor point in the cluster to which the anchor belongs. Note
that a sensor may be assigned into multiple clusters and some sensors are
not assigned into any cluster.
3. For each cluster of anchors and unknown sensors, formulate the error
minimization problem for that cluster, and solve the resulting SDP model
if the number of anchors is more than 2. Typically, each cluster has less
than 100 sensors and the model can be solved eciently.
4. After solving each SDP model, check the individual trace (5.7) for each
unknown sensor in the model. If it is below a predetermined small tolerance, label the sensor as positioned and its estimation xj becomes an
anchor. If a sensor is assigned in multiple clusters, we choose the xj
that has the smallest individual trace. This is done so as to choose the
best estimation of the particular sensor from the estimations provided by
solving the dierent clusters.
5. Consider positioned sensors as anchors and return to Step 1 to start the
next round of estimation.
Note that the solution of the SDP problem in each cluster can be carried
out at the cluster level so that the computation is highly distributive. The only
information that needs to be passed among the neighboring clusters is which of
the unknown sensors become positioned after a round of SDP solutions.
In solving the SDP model for each cluster, even if the number of sensors is
below 100, the total number of constraints could be in the range of thousands.
However, many of those bounding away constraints, i.e., the constraints between two remote points, are inactive or redundant at the optimal solution.
Therefore, we adapt an iterative active constraint generation method. First,
we solve the problem including only partial equality constraints and completely
ignoring the bounding-away inequality constraints to obtain a solution. Secondly we verify the equality and inequality constraints and add those violated
at the current solution into the model, and then resolve it with a warm-start
solution. We can repeat this process until all of the constraints are satised.
130
Typically, only about O(n + m) constraints are active at the nal solution so
that the total number of constraints in the model can be controlled at O(n+m).
5.6.1
Preliminary computational and simulation results
Simulations were performed on networks of 2, 000 to 4, 000 sensor points which

are randomly generated in a square region of [.5 .5] [.5 .5]. We generally
select the rst 510% of the points as anchors, that is, anchors are also uniformly
distributed in the same random manner. The tolerance for labeling a sensor as
positioned is set at 0.01 (1 + noisyf actor) radiorange. One simulation solves
a network localization with 4, 000 sensors, where the entire sensor region is
partitioned into 10 10 equal-sized squares, that is, 100 clusters, and the radio
range is set at .035. The total solution time for the ve round computation on
the single Pentium 1.2 GHz and 500 MB PC, excluding computing dij , is about
four minutes using DSDP2.0. The nal solution is ploted in Figure 3.
0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
0.4
0.1
0.1
0.2
0.3
0.4
0.5
Figure 5.5: Third and nal round position estimations in the 4, 000 sensor
network, noisy-factor=0, radio-range=0.045, and the number of clusters=100.
It is usually the outlying sensors at the boundary or the sensors which do not
have many anchors within the radio range that are not estimated in the initial
5.7. NOTES
131
stages of the method. Gradually, as the number of well estimated sensors or

ghost anchors grows, more and more of these points are positioned. We have
also estimated the same 4, 000 network using noisy data. It is noted that the
erroneous points are isolated within particular regions. This clearly indicates
that the clustering approach prevents the propagation of errors to other clusters.
We feel that our distributed computation techlogy is promising for solving very
very large scale distance geometry related problems.
5.7
Notes
The distance geometry problem and its variants arise from applications in
various areas, such as molecular conformation, dimensionality reduction, Euclidean ball packing, and more recently, wireless sensor network localization
[5, 51, 79, 159, 269, 273]. In the sensor networks setting, the vertices of G correspond to sensors, the edges of G correspond to communication links, and the
weights correspond to distances. Furthermore, the vertices are partitioned into
two sets one is the anchors, whose exact positions are known (via GPS, for
example); and the other is the sensors, whose positions are unknown. The goal
is to determine the positions of all the sensors. We shall refer to this problem
as the Sensor Network Localization problem. Note that we can view the Sensor
Network Localization problem as a variant of the Graph Realization problem in
which a subset of the vertices are constrained to be in certain positions.
In many sensor networks applications, sensors collect data that are location
dependent. Thus, another related question is whether the given instance has
a unique realization in the required dimension (say, in R2 ). Indeed, most
of the previous works on the Sensor Network Localization problem fall into
two categories one deals with computing a realization of a given instance
[51, 79, 89, 159, 268, 269, 270, 273], and the other deals with determining
whether a given instance has a unique realization in Rd using graph rigidity
[89, 137]. It is interesting to note that from an algorithmic viewpoint, the
two problems above have very dierent characteristics. Under certain non
degeneracy assumptions, the question of whether a given instance has a unique
realization on the plane can be decided eciently [167], while the problem of
computing a realization on the plane is NPcomplete in general, even if the given
instance has a unique realization on the plane [24]. Thus, it is not surprising
that all the aforementioned heuristics for computing a realization of a given instance do not guarantee to nd it in the required dimension. On another front,
there has been attempts to characterize families of graphs that admit polynomial time algorithms for computing a realization in the required dimension. For
instance, Eren et. al. [89] have shown that the family of trilateration graphs has
such property. (A graph is a trilateration graph in dimension d if there exists
an ordering of the vertices 1, . . . , d + 1, d + 2, . . . , n such that (i) the rst d + 1
vertices form a complete graph, and (ii) each vertex j > d + 1 has at least d + 1
edges to vertices earlier in the sequence.) However, the question of whether
there exist larger families of graphs with such property is open.
132
We should mention here that various researchers have used SDP to study
the distance geometry problem (or its variants) before. For instance, Barvinok
[33, 34] has studied this problem in the context of quadratic maps and used
SDP theory to analyze the possible dimensions of the realization. Alfakih and
Wolkowicz [6, 7] have related this problem to the Euclidean Distance Matrix
Completion problem and obtained an SDP formulation for the former. Moreover, Alfakih has obtained a characterization of rigid graphs in [3] using Euclidean distance matrices and has studied some of the computational aspects of
such characterization in [4] using SDP. However, these papers mostly addressed
the question of realizability of the input graph, and the analyses of their SDP
models only guarantee that they will nd a realization whose dimension lies
within a certain range. Thus, these models are not quite suitable for our application. In contrast, our analysis takes advantage of the presence of anchors and
gives a condition which guarantees that our SDP model will nd a realization
in the required dimension. We remark that SDP has also been used to compute
and analyze distance geometry problems where the realization is allowed to have
a certain amount of distortion in the distances [35, 203]. Again, these methods can only guarantee to nd a realization that lies in a range of dimensions.
Thus, it would be interesting to extend our method to compute lowdistortion
realizations in a given dimension. For some related work in this direction, see,
e.g., [26].
The SDP models were rst used by [63, 64] for the ad hoc wireless sensor
network localization, and the results described here were based on their work.
The result on the radii of point sets is due to [339], which
improves the ratio
of [317] by a factor of O( log d) that could be as large as O( log n). This ratio
also matches the previously best known ratio for approximating the special case
R1 (P ) of [238], the width of point set P .
5.8
Exercises
5.1 Find the SDP relaxation to the following minimization problem

minimize
subject to
2
2
k,jNa (kj )
i,jNx , i<j (ij ) +
2
2
xi xj = (dij ) + ij , (i, j) Nx , i <
ak xj 2 = (dkj )2 + kj , for (k, j) Na ,

xi xj 2 R2 , for the rest i < j,
ak xj 2 R2 , for the rest k, j;
j,
and interpret the objective function.

5.2 Find the SDP relaxation to the following minimization problem
minimize
subject to
2
i,jNx , i<j (ij )
xi xj
ak xj
xi xj
ak xj
+ k,jNa (kj )2
= dij + ij , (i, j) Nx , i < j,
= dkj + kj , for (k, j) Na ,

2
R2 , for the rest i < j,
2
R2 , for the rest k, j;
5.8. EXERCISES
133
and interpret the objective function.

5.3 The Un-Folding Problem: with the same notions, consider the problem
maximize
subject to
i
ak
xi 2
xj
xi xj
= d2
kj
(k, j) Na
= d2
ij
(i, j) Nx
(5.21)
Find the SDP relaxation and its dual, and explain what they are.
5.4 Given one sensor and three anchors in R2 , and let the three distances from
the sensor to three anchors are given. Show that the graph is stronly localizable
if and only if the sensors position lies in the liner space of the three anchors
positions.
5.5 Prove that the graphs depicted in Figures 5.2(a) and 5.2(b) are strongly
localizable.
5.6 Suppose for the angle of two edges joint at a point is known. How to incoporate this information in the SDP relaxation of the distance geometry problem?
134
Chapter 6
SDP for Robust

Optimization
6.1
Robust Optimization
Consider an optimization problem

Minimize
f (x, )
Subject to
F (x, ) K Rm .
(OPT)
(6.1)
where is the data of the problem and x Rn is the decision vector, and K is
a convex cone.
For deterministic optimization, we assume is known and xed. In reality,
may not be certain.
Knowledge of belonging to a given uncertain set U .
The constraints must be satised for every in the uncertain set U .
Optimal solution must give the best guaranteed value of supU f (x, ).
This leads to the so-called robust counterpart:
Minimize
supU f (x, )
Subject to
F (x, ) K
(ROPT)
for all
Other methods dealing with data uncertainty include:

135
U.
(6.2)
136
CHAPTER 6. SDP FOR ROBUST OPTIMIZATION
6.1.1
Stochastic Method
Minimize
E [f (x, )]
Subject to
E [F (x, )] K.
(EOPT)
6.1.2
Sampling Method
Minimize
Subject to
F (x, k ) K
(SOPT)
f (x, k ) z
for large sxamples
k U.
This problem may be a semin-innite programming problem.
6.2
Robust Quadratic Optimization

qT x
Minimize
(EQP)
Subject to
Ax
(6.3)
1.
Here, vector q Rn and A Rmn ; and . is the Euclidean norm.
6.2.1
Ellipsoid uncertainty
Let A be uncertain and

k
A = A0 +
uj Aj | uT u 1.
j=1
Minimize
qT x
(REQP)
Subject to
(A0 +
k
j=1
uj Aj )x 1
uT u 1.
uj Aj )x
uT u 1.
The problem can be rewritten as

Minimize
qT x
(REQP)
Subject to
Let
(A0 +
k
j=1
F (x) = (A0 x, A1 x, Ak x).
6.2. ROBUST QUADRATIC OPTIMIZATION

Then
(A0 +
uj Aj )x
1
u
j=1
137
1
u
F (x)T F (x)
The problem becomes

qT x
Minimize
(REQP)
1
u
Subject to
0
0
1
u
6.2.2
1
0
1
0
F (x)T F (x)
0
I
1
u
1
u
0.
S-Lemma
Lemma 6.1 Let P and Q be two symmetric matrices such that there exists u0
satisfying (u0 )T P u0 > 0. Then the implication that
uT P u 0 uT Qu 0
holds true if and only if there exists 0 such that
Q
P.
From the lemma, we immediately see

T
1
u
1
0
1
u
0
0
1
u
F (x)T F (x)
1
0
1
u
0
I
if and only if there is a 0 such that

1
0
This is equivalent to
0
0
1
0
F (x)T F (x)
0
0
1
0
1
0
0
I
0
I
F (x)
F (x)
which is a SDP constraint since F (x) is linear in x. Here we have used another
technical lemma
Lemma 6.2 Let P a symmetric matrix and A be a rectangle matrix. Then
P AT A
if and only if
P
A
AT
I
0
0.
138
6.2.3
SDP for Robust QP
Consider and x as variables, REQP nally becomes a SDP problem:

qT x
Minimize
(REQP)
Subject to
6.3
1 0
0
I
F (x)
F (x)
0.
General Robust Quadratic Optimization

Minimize
qT x
Subject to
xT AT Ax + 2bT x + 0.
(EQP)
(6.4)
Here, vector q, b Rn and and A Rmn .
6.3.1

k
(A, b, ) = (A0 , b0 , 0 ) +
uj (Aj , bj , j )| uT u 1.
j=1
Minimize
qT x
Subject to
xT AT Ax + 2bT x + 0
(REQP)
uT u 1.
Let again
F (x) = (A0 x, A1 x, Ak x).
Then
xT AT Ax + 2bT x + =
1
u
0 + 2xT b0
1 /2 + xT b1
k /2 + xT bk
1 /2 + xT b1
0
k /2 + xT bk
0
F (x)T F (x)
1
u
6.4. MORE ROBUST QUADRATIC OPTIMIZATION
6.3.2
139
SDP formulation
From the S-lemma, we immediately see if and only if there is 0 such that
0
+ 2xT b0 1 /2 + xT b1 k /2 + xT bk
1 /2 + xT b1
F (x)T F (x)) 1 0
0 I
k /2 + xT bk
0
0
which is is equivalent to
0
+ 2xT b0 1 /2 + xT b1
1 /2 + xT b1
0

k /2 + xT bk
0
k /2 + xT bk
0
+
0
F (x)
1
0
0
I
F (x)T
0.
which is a SDP constraint since the matrix is linear in x.

Consider and x as variables, REQP nally becomes a SDP problem:
Minimize
(REQP)

Subject to
6.4
qT x
0 + 2xT b0 1 /2 + xT b1
1 /2 + xT b1
k /2 + xT bk
0
F (x)
k /2 + xT bk
More Robust Quadratic Optimization

Minimize
qT x
Subject to
xT AT Ai x + 2bT x + i 0
i
i
(EQP)
i = 1, ..., t.
(6.5)
Here, vector q, bi Rn and and Ai Rmn .
6.4.1

k
j
uj (Aj , bj , i )| uT u 1.
i i
0
(Ai , bi , i ) = (A0 , b0 , i ) +
i i
j=1
Minimize
qT x
Subject to
xT AT Ai x + 2bT x + i 0, i = 1, ..., t,
i
i
(REQP)
uT u 1.
F (x)T
0.
140
6.4.2
SDP formulation
Let again
Fi (x) = (A0 x, A1 x, Ak x).
i
i
i
Consider i , i = 1, ..., t, and x as variables, REQP nally becomes a SDP
problem:
Min q T x

s.t.
0
i + 2xT b0 i
i
1 /2 + xT b1
i
i

i /2 + xT bk
i
1
i /2 + xT b1
i
i
0
Fi (x)
k
i /2 + xT bk
i
0
Fi (x)T
i
I
i = 1, ..., t.
6.5
Tool for Robust Optimization
Recall:
Minimize
supU f (x, )
Subject to
F (x, ) 0 for all U.
(ROPT)
(6.6)
Express
= (u)
where
u U.
Then,
sup f (x, ) = sup f (x, (u))
U
uU
In convex cases, we represent

sup f (x, (u))
uU
by its conic dual problem and let the dual objective function be
(0 , x)
where 0 is the dual variables. The dual objective function is an upper bound
on f (x, (u)) so we could represent the problem by
Minimize
(0 , x)
Subject to
F (x, ) 0 for all U
(ROPT)
dual constraints on 0 .
(6.7)
0,
6.5. TOOL FOR ROBUST OPTIMIZATION
141
We handle the constraints in the exact same manner, and the problem becomes
Minimize
(0 , x)
(ROPT)
(6.8)
Subject to (, x) C
dual constraints on 0 , ,
where (, x) is, component-wise, the dual objective function of
sup F (x, (u)),
uU
and C is a suitable cone.
6.5.1
Examples
ax + b 0
where
(a, b) = (a0 , b0 ) +
u U.
ui (ai , bi )|
i=1
Case of u 1:
k
a0 x + b0 +
max
u: u 1
ui (ai x + bi )
i=1
The dual is
Minimize
Subject to
(a0 x + b0 )
k
i
i=1 (a x
+ bi )2 .
Thus, the robust constraint becomes

k
0 (a0 x + b0 ) +
(ai x + bi )2 .
i=1
Case of u
1:
k
a0 x + b0 +
max
u: u
ui (ai x + bi )
i=1
The dual is
Minimize
Subject to
(a0 x + b0 )
k
i=1
|(ai x + bi )|.
142

k
0 (a0 x + b0 ) +
|(ai x + bi )|.
i=1
Case of u
1:
k
a0 x + b0 +
max
u: u
1 1
ui (ai x + bi )
i=1
The dual is
Minimize
Subject to
(a0 x + b0 ) maxi {|(ai x + bi )|}.

0 (a0 x + b0 ) + max{|(ai x + bi )|}.
i
6.6
6.6.1
Applications
Robust Linear Least Squares
Minimize
Ay c
y Y.
If the data (A, c) are from the set

k
(A, c) = (A0 , c0 ) +
ui (Ai , ci )| u 1,
i=1
the robust counter part can be drawn from the earlier disscusion, since the
original problem is a convex QP problem.
6.6.2
Heat dissipation problem
Consider an electric circuit represented as a connected oriented graph with

N arcs, each of which possesses a given conductance sij . Assuming that the
arcs of the circuit are charged with external voltages y and internal voltages v.
These voltages induce currents in the circuit; as a result, the circuit dissipates
heat. According to Kirchos laws, the heat is given by
H = min (Qv + P y)T S(Qv + P y),
m
vR
6.6. APPLICATIONS
143
where Q and P are some matrices given by the topology of the circuit, m is the
number of nodes of the circuit, and S is N N diagonal matrix with sij as its
diagonal entries.
The problem is to choose y such that
min min (Qv + P y)T S(Qv + P y) ,
m
yY
vR
where Y is a convex set.

Now if
k
S = S(u) = S 0 +
ui S i | u 1,
i=1
we have a robust optimization counterpart:

min max min (Qv + P y)T S(u)(Qv + P y)
m
yY
vR
uU
Since (Qu + P y)T S(u)(Qu + P y) is convex in (y, v) and linear in u, this problem
is equivalent to
min
yY,vRm
6.6.3
max(Qv + P y)T S(u)(Qv + P y) .
uU
Portfolio optimization
Minimize
xT V x
Subject to
T x
eT x = 1.
Now V and may be uncertain, for example,

k
0
ui (V i , i )| u 1.
(V, ) = (V , ) +
i=1
The robust counterpart of the portfolio problem is

Minimize
maxV V [xT V x]
Subject to
minS [T x]
eT x = 1
144
6.7
Exercises
6.1 Prove the S-lemma.

6.2 In the linear least squares problem, assume that
k
(A, c) = (A0 , c0 ) +
ui (Ai , ci )| u 1.
i=1
Construct the robust counterparts of the problem for three norms on u:

u
1,
1,
and
1.
6.3 In the Heat dissipation problem, assume that

k
S = S(u) = S 0 +
ui S i | u 1,
i=1
and si 0 for all i = 0, ..., k. Construct the robust counterparts of the problem
for three norms on u:
u
1,
1,
and
1.
6.4 In the portafolio optimization problem, assume that

k
(V, ) = (V 0 , 0 ) +
ui (V i , i )| u 1,
i=1
and V i 0 for all i = 0, ..., k. Construct the robust counterparts of the problem
for three norms on u:
u
1,
1,
and
1.
Chapter 7
SDP for Quantum

Computation
7.1
Quantum Computation
Quantum Computation uses Quantum Mechanics to enhance computation power.

First concrete result was developed for Integer Factorization by Shor in 1994;
and then for Quantum search algorithm and space reduction by Grover in 1996.
The eld is closely linked to Quantum Information Theory and Statistics, Quantum Channels and Optimization.
To describe quantum mechanics, Hilbert sapces and linear operators are
generally used. In this lecture, we restrict ourselves to nite-dimensional real
vector spaces. In classical quantum physics, a quantum state, or a state, is
N
a
column vector v in the N -dimensional vector space R with norm ||v||2 =
v v = 1.
Nowadays, another formalism is widely accepted, in which a state is a postive
semidenite symmetric matrix V with trace 1 that belongs to RN N . A matrix
of such property is called a density matrix. For example, a density matrix V
with rank one can be represented as V = vv T , and T r(V ) = 1. A density matrix
of rank higher than 1 can be written as
N
T
i v i v i ,
V =
i=1
where i 0 for all i and
i = 1.
A density matrix of rank 1 is called a pure sate, and a density matrix of rank
higher than 1 is called a mixed state. The state V is interpreted as a probalistic
T
mixture of pure states vi vi with mixing probabilities i .
145
146
7.2
CHAPTER 7. SDP FOR QUANTUM COMPUTATION
Completely Positive Map
The state of a system may change as time goes on.

Given {A1 , A2 , , Ak }, where Ai RM N , such that
k
Ai AT = I,
i
i=1
trace-preserving complete positive (CP) map of V is

k
Ai V A T .
i
T (V ) =
i=1
Note that T (V ) RM M , so CP is a linear mapping that conveys quantum

states in RN N to the states in RM M . In quantum computing, the pure state
n
is in R2 or N = 2n , called n-qubit. When n = 1, a unit orthogonal (1; 0) and
(0; 1) represent the classical bits.
A measurement is described as a nite set of symmetric positive semidenite
matrices {M1 , M2 , , Mk } whose sum is an identity matrix, i.e.,
k
Mi
0, i = 1, , k, and
Mi = I,
i=1
called Positive Operator-Valued Measures (POVM).

Given the system in state V and a measurement {M1 , M2 , , Mk } performed on the system, we obtain a random varible X with
Pr(X = l) = V Ml ,
l = 1, , k.
A fundamental problem is quantum detection theory is to nd an optimal

measurement for given mixed states.
Suppose we are given m states V1 , V2 , , Vm with corresponding prior probabilities 1 , 2 , , m . For a measurement M = {M1 , . . . , Mk }, we have
P (j|i) = T r(Vi Mj ).
Let cij ( 0) be the cost (penlaty) of taking a true state i to be j. Then the
average cost is
C(M ) =
i P (j|i)cij =
i,j
Tr
j
i Vi cij
Mj .
Let Wj = i i Vi cij , j = 1, . . . , k. The problem of minimizing the average

cost can be formulated as the following SDP:
Minimize
k
j=1
T r(Wj Mj )
Subject to
k
j=1
Mj = I,
Mj
0, j = 1, . . . , k.
7.3. CHANNEL CAPACITY PROBLEM
147
What is the dual?

k
j=1
Maximize
Subject to
T r(U )
Wj
U, j = 1, . . . , k.
Thus, the optimality Condition becomes

k
Wi
Mj Wj =
j=1
7.3
Wj Mj ,
i = 1, ..., k.
j=1
Channel Capacity Problem
A classical discrete memoryless channel consists of a set of input symbols A =

{a1 , a2 , . . . , am }, a set of output symbols B = {b1 , b2 , . . . , bn } and a set of conn
ditional probabilities Vij = P (bj |ai ) with
Vij = 1, for i = 1, . . . , m.
j=1
Therefore, V = {Vij } is a stochastic matrix, where Vij is the probability

that bj is received when ai is sent through the channel. The capacity C(V ) of a
channel V , dened as the maximum of rates (speeds) at which information can
be transmitted reliably, is given by following theorem due to Shannon.
The classical channel coding theorem is modeled as:
C(V )
Maximize
I(p, V )
Subject to
m
i=1
pi = 1, with p = {pi }
pi 0, i = 1, . . . , m,
where
I(p, V ) =
pi
Vij
Vij log
pk Vkj
j
k
The Kullback -Leibler divergence, dened as follows

D(q||r) =
qj log
j
qj
,
rj
is a measure of distance between two probability distributions q = (qj ) and

r = (rj ), and plays an important role in information theory.
Then, the function I(p, V ) can be written in terms of the above KullbackLeibler divergence
I(p, V ) =
pi D(Vi ||pV ),
i
where r = pV is a distribution dened by rj = i pi Vij . With a channgel V

exed, I(p, V ) is concave in p = (pi ). Hence, the computatution of C(V ) is a
concave maximization problem with linear constraints.
148
A quantum memoryless channel consists of a set S1 of input states (or density

matrices) in Rmm , S2 of output states (or density matrices) in Rnn , and a
CP (map) : S1 S2 . The capacity C() of a quantum channel , dened
to be the maximum of rates (speeds) at which reliable transmission of classical
information through the channel is possible.
The Quantum channel coding theorem due to A. S. Holevo is modeled by
C(V )
Maximize
I({i }, {Xi }, )
Subject to
X i S1 ,
d
i=1
i = 1, . . . , d
i = 1
i 0, i = 1, . . . , d,
where d = n2 , and
i T r((Xi )) log T r((Xi )) log T r((X)) ,
I({i }, {Xi }, ) =
i
with X = i i Xi .
Observe that I({i }, {Xi }, ) can be written in terms of the quantum counterpart of the Kullback-Leibler divergence (often called relative entropy)
D(X||Y ) = T r(X)[log T r(X) log T r(Y )].
Once the ensemble X = {Xi } is xed, I({i }, {Xi }, ) is a concave function in
= {i }. This is again a convex programming problem.
But, in general, I({i }, {Xi }, ) is neither convex nor concave in (, X).
7.4
Quantum Interactive Proof System
Interation Between Prover and Verier.

Consider two CP mappings: given {A1 , A2 , , Ak } and {B1 , B2 , , Bl },
where Ai , Bi RM N , such that
l
Ai AT = I
i
T
Bi Bi = I,
and
i=1
i=1
Dene
Ai V AT
i
T1 (V ) =
i=1
and
l
T
Bi V Bi .
T2 (V ) =
i=1
7.5. NOTES
149
The question is, are there two states V1 and V2 such that
T1 (V1 ) = T2 (V2 );
or for all V1 and V2
T1 (V1 ) T2 (V2 ) .
An SDP Formulation for this problem is
Minimize
Subject to
I V1 = I V2 = 1,
V1 , V 2 0
T1 (V1 ) T2 (V2 ) + t I
t I T1 (V1 ) + T2 (V2 )
7.5
Notes
7.6
Exercises
0
0.
150
Bibliography
[1] M. Abdi, H. El Nahas, A. Jard, and E. Moulines. Semidenite positive relaxation of the maximum-likelihood criterion applied to multiuser
detection in a CDMA context. IEEE Signal Processing Letters, 9(6):165
167, June 2002.
[2] A. A. Ageev and M. I. Sviridenko, An Approximation Algorithm for
Hypergraph Max k-Cut with Given Sizes of Parts, Proceedings of
ESA2000, Lecture Notes in Computer Science 1879, (2000) 32-41.
[3] A. Y. Alfakih. Graph Rigidity via Euclidean Distance Matrices. Linear
Algebra and Its Applications 310:149165, 2000.
[4] A. Y. Alfakih. On Rigidity and Realizability of Weighted Graphs. Linear
Algebra and Its Applications 325:5770, 2001.
[5] A. Y. Alfakih, A. Khandani, H. Wolkowicz. Solving Euclidean Distance
Matrix Completion Problems Via Semidenite Programming. Comput.
Opt. and Appl. 12:1330, 1999.
[6] A. Y. Alfakih, H. Wolkowicz. On the Embeddability of Weighted Graphs
in Euclidean Spaces. Research Report CORR 9812, University of Waterloo, Dept. of Combinatorics and Optimization, 1998.
[7] A. Y. Alfakih, H. Wolkowicz. Euclidean Distance Matrices and the Molecular Conformation Problem. Research Report CORR 200217, University of Waterloo, Dept. of Combinatorics and Optimization, 2002.
[8] F. Alizadeh. Combinatorial optimization with interior point methods
and semidenite matrices. Ph.D. Thesis, University of Minnesota, Minneapolis, MN, 1991.
[9] F. Alizadeh. Optimization over the positive semi-denite cone: Interior
point methods and combinatorial applications. In P. M. Pardalos, editor,
Advances in Optimization and Parallel Computing, pages 125. North
Holland, Amsterdam, The Netherlands, 1992.
[10] F. Alizadeh. Interior point methods in semidenite programming with
applications to combinatorial optimization. SIAM J. Optimization, 5:13
51, 1995.
151
152
BIBLIOGRAPHY
[11] F. Alizadeh, J. P. A. Haeberly, and M. L. Overton, Primal-dual interior point methods for semidenite programming, Technical Report
659, Computer Science Department, Courant Institute of Mathematical
Sciences, New York University, 1994.
[12] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity
and nondegeneracy in semidenite programming. Math. Programming,
77:111128, 1997.
[13] B. Alkire and L. Vandenberghe. Convex optimization problems involving
nite autocorrelation sequences. Mathematical Programming. To appear.
[14] E. D. Andersen and K. D. Andersen. The APOS linear programming
solver: an implementation of the homogeneous algorithm. Working Paper, Department of Management, Odense University, Denmark, 1997.
[15] E. D. Andersen and Y. Ye. On a homogeneous algorithm for the monotone complementarity problem. Working Paper, Department of Management Sciences, The University of Iowa, IA, 1995. To appear in Math.
Programming.
[16] G. Andersson and L. Engebrestsen, Better Approximation Algorithms
for Set Splitting and Not-All-Equal SAT, Information Processing Letters
65, (1998) 305-311.
[17] K. D. Andersen and E. Christiansen. A newton barrier method for
minimizing a sum of Euclidean norms subject to linear equality constraints. Preprint 95-07, Department of Mathematics and Computer
Science, Odense University, Denmark, 1995.
[18] K. M. Anstreicher. A standard form variant and safeguarded linesearch
for the modied Karmarkar algorithm. Math. Programming, 47:337351,
1990.
[19] K. M. Anstreicher and M. Fampa, A long-step path following algorithm
for semidenite programming problems, Working Paper, Department of
Management Science, The University of Iowa, Iowa City, 1996.
[20] K. M. Anstreicher, D. den Hertog, C. Roos, and T. Terlaky. A long
step barrier method for convex quadratic programming. Algorithmica,
10:365382, 1993.
[21] K. M. Anstreicher. On long step path following and SUMT for linear
and quadratic programming. SIAM J. Optimization, 6:33-46, 1996.
[22] S. Arora, S. Rao and U. Vazirani. Expander ows, geometric embeddings
and graph partitioning. The 35th STOC04 2004.
BIBLIOGRAPHY
153
[23] Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama, Greedily nding

a dense subgraph, in Proceedings of the 5th Scandinavian Workshop on
Algorithm Theory (SWAT), Lecture Notes in Computers Science, 1097,
136-148, Springer-Verlag, 1996.
[24] James Aspnes, David Goldenberg, Yang Richard Yang. On the Computational Complexity of Sensor Network Localization. ALGOSENSORS
2004, in LNCS 3121:3244, 2004.
[25] M. Badoiu, S. Har-Peled and P. Indyk, Approximate Clustering via
Core-sets, In Proc. ACM Symp. Theory of Computing, 2002.
[26] M. Bdoiu. Approximation Algorithm for Embedding Metrics into a
a
TwoDimensional Space. Proc. 14th SODA 434-443, 2003.
[27] M. Bdoiu, E. D. Demaine, M. T. Hajiaghayi, P. Indyk. Low
a
Dimensional Embedding with Extra Information. Proc. 20th SoCG 320
329, 2004.
[28] O. Bahn, O. Du Merle, J. L. Gon, and J. P. Vial. A cutting plane
method from analytic centers for stochastic programming. Math. Programming, 69:4574, 1995.
[29] Balas, E., Ceria, S. and Cornuejols, G., A lift-and-project cutting plane
algorithm for mixed 0-1 programs. Mathematical Programming, 58:295
324, 1993.
[30] R. Banavar and A. Kalele. A mixed norm performance measure for the
design of multirate lterbanks. IEEE Transactions on Signal Processing,
49(2):354359, Feb. 2001.
[31] I. Brny and Fredi. Computing the volume is dicult. In Proceedings
aa
u
of the 18th Annual ACM Symposium on Theory of Computing, page 442447, ACM, New York, 1986.
[32] G. Barequet and S. Har-Peled, Eciently Approximating the MinimumVolume Bounding Box of a Point Set in Three Dimensions, J. Algorithms, 38:91-109, 2001.
[33] A. Barvinok. Problems of Distance Geometry and Convex Properties of
Quadratic Maps. Disc. Comp. Geom. 13:189202, 1995.
[34] A. Barvinok. A Remark on the Rank of Positive Semidenite Matrices
Subject to Ane Constraints. Disc. Comp. Geom. 25(1):2331, 2001.
[35] A. Barvinok. A Course in Convexity. AMS, 2002.
[36] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear programming, Part I: Ane and projective scaling trajectories. Transactions
of the American Mathematical Society, 314(2):499526, 1989.
154
BIBLIOGRAPHY
[37] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear programming, Part II: Legendre transform coordinates. Transactions of the
American Mathematical Society, 314(2):527581, 1989.
[38] D. A. Bayer and J. C. Lagarias. Karmarkars linear programming algorithm and Newtons method. Math. Programming, 50:291330, 1991.
[39] M. Bellare and P. Rogaway. The complexity of approximating a nonlinear
program. Math. Programming, 69:429-442, 1995.
[40] S. J. Benson and Y. Ye.
DSDP User Guide.
unix.mcs.anl.gov/ benson, November 2000.
http://www-
[41] S. J. Benson, Y. Ye, and X. Zhang. Mixed linear and semidenite programming for combinatorial optimization. Optimization Methods and
Software, 10:515544, 1999.
[42] S. J. Benson, Y. Ye, and X. Zhang. Solving large scale sparse semidenite
programs for combinatorial optimization. SIAM Journal of Optimization,
10:443461, 2000.
[43] A. Ben-Tal, M. Kocvara, A. Nemirovski, and J. Zowe. Free material optimization via semidenite programming: the multiload case with contact
conditions. SIAM Review, 42(4):695715, 2000.
[44] A. Ben-Tal and A. Nemirovski. Robust truss topology design via semidefinite programming. SIAM J. Optim., 7(4):9911016, 1997.
[45] A. Ben-Tal and A. Nemirovski. Structural design via semidenite programming. In Handbook on Semidenite Programming, pages 443467.
Kluwer, Boston, 2000.
[46] A. BenTal and A. S. Nemirovskii. Interior point polynomial time
method for truss topology design. SIAM J. Optimization, 4:596612,
1994.
[47] D. P. Bertsekas and J. N. Tsitsiklis. Introduction to Linear Optimization.
Athena Scientic, Belmont, MA, 1997.
[48] D. P. Bertsekas. Nonlinear Programming. Athena Scientic, Belmont,
MA, 1995.
[49] D. Bertsimas and J. Nino-Mora. Optimization of multiclass queuing
networks with changeover times via the achievable region approach: part
ii, the multi-station case. Mathematics of Operations Research, 24(2),
May 1999.
[50] D. Bertsimas and Y. Ye. Semidenite relaxations, multivariate normal
distributions, and order statistics. Handbook of Combinatorial Optimization (Vol. 3), D.-Z. Du and P.M. Pardalos (Eds.) pp. 1-19, (1998 Kluwer
Academic Publishers).
BIBLIOGRAPHY
155
[51] P. Biswas, Y. Ye. Semidenite Programming for Ad Hoc Wireless Sensor

Network Localization. Proc. 3rd IPSN 4654, 2004.
[52] R. G. Bland, D. Goldfarb and M. J. Todd. The ellipsoidal method: a
survey. Operations Research, 29:10391091, 1981.
[53] L. Blum, M. Shub and S. Smale. On a theory of computations over
the real numbers: NP-completeness, recursive functions and universal
machines. Proc. 29th Symp. Foundations of Computer Science, 387397,
1988.
[54] H.L. Bodlaender, P. Gritzmann, V. Klee and J. Van Leeuwen, The
Computational Complexity of Norm Maximization, Combinatorica, 10:
203-225, 1990.
[55] K. H. Borgwardt. The Simplex Method: A Probabilistic Analysis.
Springer, Berlin, 1987.
[56] S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix
Inequalities in System and Control Science. SIAM Publications. SIAM,
Philadelphia, 1994.
[57] B. Borchers. CSDP, a C library for semidenite programming. Optimization Software and Methods, 11:613623, 1999.
[58] B. Borchers. SDPLIB 1.0 : A collection of semidenite programming test
problems. Technical report, Faculty of Mathematics, Institute of Mining
and Technology, New Mexico Tech, Socorro, NM, USA, July 1998.
[59] B. Borchers. Presentation at the Seventh DIMACS Implementation Challenge Piscataway, NJ, November 2, 2000.
[60] A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovasz and M. Simonovits, Deterministic and Randomized Polynomial-time Approximation of Radii, To appear in Mathematika.
[61] A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovasz and M. Simonovits, Approximation of Diameters: Randomization Doesnt Help,
In Proc. IEEE Symp. Foundations of Comp. Sci., 244-251, 1998.
[62] A. Brieden, Geometric Optimization Problems Likely Not Contained in
APX, Discrete Comput. Geom., 28:201-209, 2002.
[63] P. Biswas and Y. Ye. Semidenite Programming for Ad Hoc Wireless Sensor Network Localization. Management Science and Engineering, Stanford, CA 94305, September 2003.
[64] P. Biswas and Y. Ye. A distributed method for solving semideinite programs arising from Ad Hoc Wireless Sensor Network Localization. Management Science and Engineering, Stanford, CA 94305, October 2003.
156
BIBLIOGRAPHY
[65] N. Bulusu, J. Heidemann, D. Estrin. GPS-less low cost outdoor localization for very small devices. TR 00-729, Computer Science, University of
Southern California, April, 2000.
[66] S. Burer and R. D. C. Monteiro. An ecient algorithm for solving the
MAXCUT SDP relaxation. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA USA,
December 1998.
[67] S. Burer, R. D. C. Monteiro, and Y. Zhang. Solving semidenite programs via nonlinear programming. part I: Transformations and derivatives. Technical Report TR99-17. Department of Computational and
Applied Mathematics, Rice University, TX, September 1999.
[68] S. Burer, R. D. C. Monteiro, and Y. Zhang. Solving semidenite programs via nonlinear programming. part II: Interior point methods for a
subclass of SDPs. Technical Report TR99-23. Department of Computational and Applied Mathematics, Rice University, TX, October 1999.
[69] G. Calaore and M. Indri. Robust calibration and control of robotic
manipulators. In American Control Conference, pages 20032007, 2000.
[70] S. A. Cook. The complexity of theorem-proving procedures. Proc. 3rd
ACM Symposium on the Theory of Computing, 151158, 1971.
[71] R. W. Cottle and G. B. Dantzig. Complementary pivot theory in mathematical programming. Linear Algebra and its Applications, 1:103-125,
1968.
[72] R. Cottle, J. S. Pang, and R. E. Stone. The Linear Complementarity
Problem, chapter 5.9 : Interiorpoint methods, pages 461475. Academic
Press, Boston, 1992.
[73] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
[74] L. Danzer, B. Grnbaum and V. Klee. Hellys theorem and its relatives.
u
in V. Klee, editor, Convexity, page 101180, AMS, Providence, RI, 1963.
[75] T. Davidson, Z. Luo, and K. Wong. Design of orthogonal pulse shapes
for communications via semidenite programming. IEEE Transactions
on Communications, 48(5):14331445, May 2000.
[76] J. E. Dennis and R. E. Schnabel. Numerical Methods for Unconstrained
Optimization and Nonlinear Equations. PrenticeHall, Englewood Clis,
New Jersey, 1983.
[77] C. de Soua, R. Palhares, and P. Peres. Robust H lter design for
uncertain linear systems with multiple time-varying state delays. IEEE
Transactions on Signal Processing, 49(3):569575, March 2001.
BIBLIOGRAPHY
157
[78] I. I. Dikin. On the convergence of an iterative process. Upravlyaemye

Sistemi, 12:5460, 1974. (In Russian).
[79] L. Doherty, L. E. Ghaoui, S. J. Pister. Convex Position Estimation in
Wireless Sensor Networks. Proc. 20th INFOCOM Volume 3:16551663,
2001.
[80] A. Doherty, P. Parrilo, and F. Spedalieri. Distinguishing separable and
entangled states. Physical Review Letters, 88(18), 2002.
[81] Y. Doids, V. Guruswami, and S. Khanna. The 2-catalog segmentation
problem. In Proceedings of SODA, pages 378380, 1999.
[82] C. Du, L. Xie, and Y. Soh. H ltering of 2-D discrete systems. IEEE
Transactions on Signal Processing, 48(6):17601768, June 2000.
[83] H. Du, L. Xie, and Y. Soh. H reduced-order approximation of 2-D digital lters. IEEE Transactions on Circuits and Systems I: Fundamental
Theory and Applications, 48(6):688698, June 2001.
[84] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory,
volume 36 of Texts in Applied Mathematics. Springer-Verlag, 2000.
[85] B. Dumitrescu, I. Tabus, and P Stoica. On the parameterization of positive real sequences and MA parameter estimation. IEEE Transactions
on Signal Processing, 49(11):26302639, November 2001.
[86] M. E. Dyer and A. M. Frieze. The complexity of computing the volume
of a polyhedron. SIAM J. Comput., 17:967974, 1988.
[87] M. E. Dyer, A. M. Frieze and R. Kannan. A random polynomial time
algorithm for estimating volumes of convex bodies. J. Assoc. Comp.
Mach., 38:117, 1991.
[88] J. Edmonds. Systems of distinct representatives and linear algebra. J.
Res. Nat. Bur. Standards, 71B, 1967, pages 241245.
[89] T. Eren, D. K. Goldenberg, W. Whiteley, Y. R. Yang, A. S. Moore,
B. D. O. Anderson, P. N. Belhumeur. Rigidity, Computation, and Randomization in Network Localization. Proc. 23rd INFOCOM, 2004.
[90] G. Elekes. A geometric inequality and the complexity of computing
volume. Discr. Comput. Geom., 1:289292, 1986.
[91] J. Elzinga and T. G. Moore. A central cutting plane algorithm for the
convex programming problem. Math. Programming, 8:134145, 1975.
[92] U. Faigle, W. Kern, and M. Streng, Note on the Computaional Complexity of j-Radii of Polytopes in Rn , Mathematical Programming, 73:15, 1996.
158
BIBLIOGRAPHY
[93] J. Falkner, F. Rendl, and H. Wolkowicz, A computational study of graph

partitioning, Mathematical Programming, 66(2):211-240, 1994.
[94] S. C. Fang and S. Puthenpura. Linear Optimization and Extensions.
PrenticeHall, Englewood Clis, NJ, 1994.
[95] L. Faybusovich. Jordan algebras, Symmetric cones and Interior-point
methods. Research Report, University of Notre Dame, Notre Dame,
1995.
[96] L. Faybusovich. Semidenite programming: a path-following algorithm
for a linear-quadratic functional. SIAM J. Optimization, 6:10071024,
1996.
[97] U. Feige and M. X. Goemans. Approximating the value of two prover
proof systems, with applications to max 2sat and max dicut. In Proceedings of the 3nd Israel Symposium on Theory and Computing Systems,
pages 182189, 1995.
[98] U. Feige and M. X. Goemans, Approximating the value of two prover
proof systems, with applications to MAX 2SAT and MAX DICUT, in
Proceedings of the Third Israel Symposium on Theory of Computing and
Systems, (1995) 182-189.
[99] U. Feige and M. Langberg, Approximation algorithms for maximization
problems arising in graph partitioning, to appear in J. Algorithms.
[100] U. Feige and M. Seltser, On the densest k-subgraph problem, Technical
report, Department of Applied Mathematics and Computer Science, The
Weizmann Institute, Rehovot, September 1997.
[101] U. Feige, G. Kortsarz and D. Peleg, The dense k-subgraph problem,
to appear in Algorithmica.
[102] U. Feige and M. Langberg. Approximation algorithms for maximization
problems arising in graph partitioning. Journal of Algorithms, 41:174
211, 2001.
[103] U. Feige and M. Langberg. The rpr2 rounding technique for semidente
programs. In ICALP, Lecture Notes in Computer Science. Springer,
Berlin, 2001.
[104] A. V. Fiacco and G. P. McCormick. Nonlinear Programming : Sequential
Unconstrained Minimization Techniques. John Wiley & Sons, New York,
1968. Reprint : Volume 4 of SIAM Classics in Applied Mathematics,
SIAM Publications, Philadelphia, PA, 1990.
[105] R. Freund and F. Jarre. An interiorpoint method for fractional programs
with convex constraints. Math. Programming, 67:407440, 1994.
BIBLIOGRAPHY
159
[106] R. M. Freund. Dual gauge programs, with applications to quadratic

programming and the minimum-norm problem. Math. Programming,
38:4767, 1987.
[107] R. M. Freund. Polynomialtime algorithms for linear programming based
only on primal scaling and projected gradients of a potential function.
Math. Programming, 51:203222, 1991.
[108] E. Fridman and U. Shaked. A new H lter design for linear time delay
systems. IEEE Transactions on Signal Processing, 49(11):28392843,
July 2001.
[109] K. R. Frisch. The logarithmic potential method for convex programming. Unpublished manuscript, Institute of Economics, University of
Oslo, Oslo, Norway, 1955.
[110] A. Frieze and M. Jerrum. Improved approximation algorithms for max
k-cut and max bisection. Algorithmica, 18:6781, 1997.
[111] M. Fu, C. de Souza, and Z. Luo. Finite-horizon robust Kalman lter
design. IEEE Transactions on Signal Processing, 49(9):21032112, Sept.
2001.
[112] M. Fu, Z.-Q. Luo and Y. Ye. Approximation algorithms for quadratic
programming. manuscript, Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, CANADA L8S 4K1,
1996.
[113] T. Fujie and M. Kojima, Semidenite programming relaxation for nonconvex quadratic programs, Journal of Global Optimization 10:367-380,
(1997).
[114] K. Fujisawa, M. Fukuda, M. Kojima, and K. Nakata. Numerical Evaluation of SDPA (SemiDenite Programming Algorithm). In High Performance Optimization. Kluwer Academic Press, 1999, pp. 267301.
[115] K. Fujisawa, M. Kojima and K. Nakata, Exploiting Sparsity in PrimalDual Interior Point Methods for Semidenite Programming Research
Report B-324, Department of Mathematical and Computing Sciences,
Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152,
September 1997.
[116] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S.
Wicker. An empirical study of epidemic algorithms in large scale multihop wireless networks. UCLA/CSD-TR-02-0013, Computer Science,
UCLA, 2002.
[117] D. Gale. The Theory of Linear Economic Models. McGraw-Hill, New
York, 1960.
160
BIBLIOGRAPHY
[118] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide

to the Theory of NP-Completeness. W. H. Freeman, San Francisco, 1979.
[119] D. M. Gay. A variant of Karmarkars linear programming algorithm for
problems in standard form. Math. Programming, 37:8190, 1987. Errata
in Math. Programming, 40:111, 1988.
[120] J. Geromel. Optimal linear ltering under parameter uncertainty. IEEE
Transactions on Signal Processing, 47(1):168175, Jan. 1999.
[121] J. Geromel and M. De Oliveira. H2 /H robust ltering for convex
bounded uncertain systems. IEEE Transactions on Automatic Control,
46(1):100107, Jan. 2001.
[122] L. E. Gibbons, D. W. Hearn and P. M. Pardalos. A continuous based
heuristic for the maximum clique problem. DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, 26:103-124, 1996.
[123] P. E. Gill, W. M. Murray, and M. H. Wright. Practical Optimization.
Academic Press, London, 1981.
[124] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright.
On projected Newton barrier methods for linear programming and an
equivalence to Karmarkars projective method. Math. Programming,
36:183209, 1986.
[125] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisability problems using semidenite
programming. Journal of Assoc. Comput. Mach., 42:11151145, 1995.
[126] M. X. Goemans and D. P. Williamson, Approximation algorithms
for MAX-3-CUT and other problems via complex semidenite programming, in 33rd STOC, pp. 443-452, 2001.
[127] J. L. Gon, Z. Q. Luo, and Y. Ye. Complexity analysis of an interiorpoint cutting plane method for convex feasibility problem. SIAM J.
Optimization, 6(3):638652, 1996.
[128] J. L. Gon and J. P. Vial. Cutting planes and column generation techniques with the projective algorithm. J. Optim. Theory Appl., 65:409
429, 1990.
[129] D. Goldfarb and G. Iyengar. Robust portfolio selection problems. Mathematics of Operations Research. Submitted.
[130] D. Goldfarb, K. Scheinberg. Interior Point Trajectories in Semidenite
Programming. SIAM J. Opt. 8(4):871886, 1998.
BIBLIOGRAPHY
161
[131] D. Goldfarb and M. J. Todd. Linear Programming. In G. L. Nemhauser,

A. H. G. Rinnooy Kan, and M. J. Todd, editors, Optimization, volume 1
of Handbooks in Operations Research and Management Science, pages
141170. North Holland, Amsterdam, The Netherlands, 1989.
[132] A. J. Goldman and A. W. Tucker. Polyhedral convex cones. In H. W.
Kuhn and A. W. Tucker, Editors, Linear Inequalities and Related Systems, page 1940, Princeton University Press, Princeton, NJ, 1956.
[133] G. H. Golub and C. F. Van Loan. Matrix Computations, 2nd Edition.
Johns Hopkins University Press, Baltimore, MD, 1989.
[134] C. C. Gonzaga. Polynomial ane algorithms for linear programming.
[135] C. C. Gonzaga. An algorithm for solving linear programming problems
in O(n3 L) operations. In N. Megiddo, editor, Progress in Mathematical
Programming : Interior Point and Related Methods, Springer Verlag,
New York, 1989, pages 128.
[136] C. C. Gonzaga. Path following methods for linear programming. SIAM
Review, 34(2):167227, 1992.
[137] J. Graver, B. Servatius, H. Servatius. Combinatorial Rigidity. AMS,
1993.
[138] P. Gritzmann and V. Klee, Inner and Outer j-Radii of Convex Bodies
in Finite-Dimensional Normed Spaces, Discrete Comput. Geom., 7:255280, 1992.
[139] P. Gritzmann and V. Klee, Computational Complexity of Inner and
Outer j-Radii of Polytopes in Finite-Dimenional Normed Spaces, Math.
Program., 59:162-213, 1993.
[140] P. Gritzmann and V. Klee, On the Complexity of Some basic Problems
in Computational Convexity: I. Containment Problems, Discrete Math.,
136:129-174, 1994.
[141] P. Gritzmann and V. Klee. Mathematical programming and convex geometry. In P. Gruber and J. Wills, editors, Hand Book of Convex Geometry, North Holland, Amsterdam, 1993, page 627674.
[142] M. Grtschel, L. Lovsz and A. Schrijver. Geometric Algorithms and
o
a
Combinatorial Optimization. Springer, Berlin, 1988.
[143] B. Grnbaum. Convex Polytopes. John Wiley & Sons, New York, NY,
u
1967.
[144] O. Gler. Barrier functions in interior-point methods. Mathematics of
u
Operations Research, 21(4):860885, 1996.
162
BIBLIOGRAPHY
[145] O. Gler. Existence of interior points and interior paths in nonlinear

u
monotone complementarity problems. Mathematics of Operations Research, 18(1):128147, 1993.
[146] O. Gler and L. Tuncel. Characterizations of the barrier parameter of
u
homogeneous convex cones. Research Report CORR 95-14, Department
of Combinatorics and Optimization, University of Waterloo, Waterloo,
Ontario, Canada, 1995. To appear in Math. Programming.
[147] O. Gler and Y. Ye. Convergence behavior of interior point algorithms.
u
[148] D. Guo, L. Rasmussen, S. Sun, and T. Lim. A matrix-algebraic approach
to linear parallel interference cancellation in CDMA. IEEE Transactions
on Communications, 48(1):152161, Jan. 2000.
[149] E. Halperin and U. Zwick, Approximation algorithms for MAX 4-SAT
and rounding procedures for semidenite programs, Proceedings of 7th
IPCO (1999), 202-217.
[150] E. Halperin and U. Zwick. A unied framework for obtaining improved
approximation algorithms for maximum graph bisection problems. In
IPCO, Lecture Notes in Computer Science. Springer, Berlin, 2001.
[151] Q. Han, Y. Ye, and J. Zhang. An improved rounding method and semidefinite programming relaxation for graph partition. Math. Programming,
92:509535, 2002.
[152] S. Har-Peled and K.R. Varadarajan, Approximate Shape Fitting via
Linearization, In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci.,
66-73, 2001.
[153] S. Har-Peled and K. Varadarajan, Projective Clustering in High Dimensions Using Core-sets, In Proc. ACM Symp. Comput. Geom., 2002.
[154] S. Har-Peled and K. Varadarajan, High-Dimensional Shap Fitting in
Linear Time, In Proc. ACM Symp. Comput. Geom., 2003.
[155] J. Hartmanis and R. E. Stearns, On the computational complexity of
algorithms. Trans. A.M.S, 117:285306, 1965.
[156] C. Helmberg and F. Rendl. A spectral bundle method for semidenite
programming. SIAM Journal of Optimization, 10:673696, 2000.
[157] C. Helmberg, F. Rendl, R. J. Vanderbei and H. Wolkowicz. An interior point method for semidenite programming. SIAM J. Optimization,
6:342361, 1996.
[158] B. Hendrickson. Conditions for Unique Graph Realizations. SIAM J.
Comput. 21(1):6584, 1992.
BIBLIOGRAPHY
163
[159] B. Hendrickson. The Molecule Problem: Exploiting Structure in Global

Optimization. SIAM J. Opt. 5(4):835857, 1995.
[160] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Programming, Algorithms and Complexity. Ph.D. Thesis, Faculty of
Mathematics and Informatics, TU Delft, NL2628 BL Delft, The Netherlands, 1992.
[161] J. Hightower and G. Boriello. Location systems for ubiquitous computing.
IEEE Computer, 34(8) (2001) 57-66.
[162] R. Horst, P. M. Pardalos, and N. V. Thoai. Introduction to Global Optimization. Kluwer Academic Publishers, Boston, 1995.
[163] A. Howard, M. J. Mataric, and G. S. Sukhatme. Relaxation on a mesh:
a formalism for generalized localization. In Proc. IEEE/RSJ Intl Conf.
on Intelligent Robots and Systems (IROS01) (2001) 1055-1060.
[164] L. Huaizhong and M. Fu. A linear matrix inequality approach to robust
H ltering. IEEE Transactions on Signal Processing, 45(9):23382350,
Sept. 1997.
[165] P. Huard. Resolution of mathematical programming with nonlinear constraints by the method of centers. In J. Abadie, editor, Nonlinear Programming, pages 207219. North Holland, Amsterdam, The Netherlands,
1967.
[166] P. Huard and B. T. Liu. La mthode des centres dans un espace
e
e
topologique. Numeriche Methematik, 8:56-67, 1966.
[167] B. Jackson, T. Jordn. Connected Rigidity Matroids and Unique Reala
izations of Graphs. Preprint, 2003.
[168] R. Jagannathan and S. Schaible. Duality in generalized fractional programming via Farkas lemma. Journal of Optimization Theory and Applications, 41:417424, 1983.
[169] B. Jansen. Interior Point Techniques in Optimization. Ph.D. Thesis,
Faculty of Mathematics and Informatics, TU Delft, NL2628 BL Delft,
The Netherlands, 1995.
[170] F. Jarre. An interiorpoint method for minimizing the largest eigenvalue
of a linear combination of matrices. SIAM J. Control and Optimization,
31:13601377, 1993.
[171] F. Jarre. Interiorpoint methods for convex programming. Applied Mathematics & Optimization, 26:287311, 1992.
[172] F. Jarre, M. Kocvara, and J. Zowe. Optimal truss design by interior
point methods. SIAM J. Optim., 8(4):10841107, 1998.
164
BIBLIOGRAPHY
[173] F. John. Extremum problems with inequalities as subsidiary conditions.

In Sduties and Essays, presented to R. Courant on his 60th Birthday,
January 8, 1948, Interscience, New York, 1948, page 187-204.
[174] N. Johnson and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions, John Wiley & Sons, 1972.
[175] W.B. Johnson and J. Lindenstrauss, Extensions of Lipshitz Mapping
into Hilbert Space, Comtemporary Mathematics, 26:189-206, 1984.
[176] M. J Kaiser, T. L. Morin and T. B. Trafalis. Centers and invariant points
of convex bodies. In P. Gritzmann and B. Sturmfels, editors, Applied
Geometry and Discrete Mathematics: The Victor Klee Festschrift, Amer.
Math. Soc. and Ass. Comput. Mach., Providence, RI, 1991.
[177] D. Karger, R. Motwani and M. Sudan, Approximate graph coloring by
semidenite programming, in 35th FOCS (1994), 2-13.
[178] S. E. Karisch, F. Rendl, and J. Clausen, Solving graph bisection problems with semidenite programming, Technical Report DIKU-TR-97/9,
Department of Computer Science, University of Copenhagen, July 1,
1997.
[179] N. K. Karmarkar. A new polynomialtime algorithm for linear programming. Combinatorica, 4:373395, 1984.
[180] R. M. Karp. Reducibility among combinatorial problems. R. E.
Miller and J. W. Thatcher, eds., Complexity of Computer Computations,
Plenum Press, New York, 1972, pages 85103.
[181] L. G. Khachiyan. A polynomial algorithm for linear programming. Doklady Akad. Nauk USSR, 244:10931096, 1979. Translated in Soviet Math.
Doklady, 20:191194, 1979.
[182] L. G. Khachiyan and M. J. Todd. On the complexity of approximating
the maximal inscribed ellipsoid for a polytope. Math. Programming,
61:137160, 1993.
[183] J. Kleinberg, C. Papadimitriou, and P. Raghavan. Segmentation problems. In Proceedings of the 30th Symposium on Theory of Computation,
pages 473482, 1998.
[184] V. Klee and G. J. Minty. How good is the simplex method. In O. Shisha,
editor, Inequalities III, Academic Press, New York, NY, 1972.
[185] E. de Klerk, C. Roos and T. Terlaky. Initialization in semidenite programming via a selfdual skewsymmetric embedding. Report 9610,
Faculty of Technical Mathematics and Computer Science, Delft University of Technology, Delft, 1996. To appear in Operations Research Letters.
BIBLIOGRAPHY
165
[186] M. Kocvara, J. Zowe, and A. Nemirovski. Cascadingan approach to

robust material optimization. Computers and Structures, 76:431442,
2000.
[187] M. Kojima, N. Megiddo, T. Noma and A. Yoshise. A unied approach to
interior point algorithms for linear complementarity problems. Lecture
Notes in Computer Science 538, Springer Verlag, New York, 1991.
[188] M. Kojima, S. Mizuno, and A. Yoshise. A polynomialtime algorithm
for a class of linear complementarity problems. Math. Programming,
44:126, 1989.
[189] M. Kojima, S. Mizuno, and A. Yoshise. An O( nL) iteration potential

reduction algorithm for linear complementarity problems. Math. Programming, 50:331342, 1991.
[190] M. Kojima, S. Shindoh, and S. Hara Interior-point methods for the monotone semidenite linear complementarity problem in symmetric matrices.
SIAM J. Optimization, 7:86125, 1997.
[191] M. Kojima, N. Megiddo, and Y. Ye. An interior point potential reduction
algorithm for the linear complementarity problem. Math. Programming,
54:267279, 1992.
[192] K. O. Kortanek, F. Potra, and Y. Ye. On some ecient interior point
methods for nonlinear convex programming. Linear Algebra and Its Applications, 152:169189, 1991.
[193] G. Kortsarz and D. Peleg, On choosing a dense subgraph, in 34th
FOCS, (1993) 692-701.
[194] E. Kranich. Interiorpoint methods bibliography. SIAG/OPT Views
andNews, A Forum for the SIAM Activity Group on Optimization, 1:11,
1992.
[195] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In J. Neyman,
editor, Proceedings of the Second Berkeley Symposium on Mathematical
Statistics and Probability, pages 481492, University of California Press,
Berkeley and Los Angeles, CA, 1961.
[196] J. C. Lagarias. The nonlinear geometry of linear programming, Part III :
Projective Legendre transform coordinates and Hilbert geometry. Transactions of the American Mathematical Society, 320:193225, 1990.
[197] J. C. Lagarias. A collinear scaling interpretation of Karmarkars linear
programming algorithm. SIAM J. Optimization, 3:630636, 1993.
[198] J. Lasserre. Global optimization with polynomials and the problem of
moments. SIAM J. Optimization, 11:796817, 2001.
166
BIBLIOGRAPHY
[199] J. Lasserre. Semidenite programming vs. lp relaxation for polynomial

programming. Mathematics of Operations Research, 27(2):347360, May
2002.
[200] M. Laurent. Matrix Completion Problems. The Encyclopedia of Optimization 3:221229, 2001.
[201] A. Levin. On an algorithm for the minimization of convex functions.
Soviet Math. Doklady, 6:286-290, 1965.
[202] C. Lin and R. Saigal, On solving large scale semidenite programming
problems - a case study of quadratic assignment problem, Technical Report, Dept. of Ind. and Oper. Eng., University of Michigan, Ann Arbor,
MI, 1997.
[203] N. Linial, E. London, Yu. Rabinovich. The Geometry of Graphs and
Some of Its Algorithmic Applications. Combinatorica 15(2):215245,
1995.
[204] M. S. Lobo, L. Vandenberghe, and S. Boyd. Second-order cone programming. Manuscript, Information Systems Laboratory, Dept. of Electrical
Engineering, Stanford University, Stanford, CA, 1997.
[205] L. Lovasz. An Algorithmic Theory of Numbers, Graphs and Convexity,
volume 50 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1986.
[206] L. Lovsz, On the Shannon Capacity of a graph, IEEE Transactions
a
of Information Theory IT-25(1):1-7, Jan. 1979.
[207] L. Lovsz and A. Shrijver. Cones of matrices and setfunctions, and 0 1
a
optimization. SIAM J. Optimization, 1:166-190, 1990.
[208] W. Lu. A unied approach for the design of 2-D digital lters via semidefinite programming. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 49(6):814826, June 2002.
[209] D. G. Luenberger. Linear and Nonlinear Programming, 2nd Edition.
AddisonWesley, Menlo Park, CA, 1984.
[210] Z. Q. Luo, J. Sturm and S. Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidenite programming.
SIAM J. Optimization 8(1):59-81, 1998.
[211] Z. Q. Luo, J. Sturm and S. Zhang. Duality and self-duality for conic
convex programming. Report 9620/A, Econometric Institute, Erasmus
University Rotterdam, 1996.
[212] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience
with a globally convergent primaldual predictorcorrector algorithm for
linear programming. Math. Programming, 66:123135, 1994.
BIBLIOGRAPHY
167
[213] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point methods : Computational state of the art. ORSA Journal on Computing, 6:1
14, 1994.
[214] S. Mahajan and H. Ramesh, Derandomizing semidenite programming
based approximation algorithms, in 36th FOCS, pp. 162-169, 1995.
[215] M. Mahmoud and A. Boujarwah. Robust H ltering for a class of linear
parameter-varying systems. IEEE Transactions on Circuits and Systems
I: Fundamental Theory and Applications, 48(9):11311138, Sept. 2001.
[216] L. McLinden. The analogue of Moreaus proximation theorem, with
applications to the nonlinear complementarity problem. Pacic Journal
of Mathematics, 88:101161, 1980.
[217] N. Megiddo. Pathways to the optimal set in linear programming. In
N. Megiddo, editor, Progress in Mathematical Programming : Interior
Point and Related Methods, pages 131158. Springer Verlag, New York,
1989. Identical version in : Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, 1986, pages 135.
[218] S. Mehrotra. Quadratic convergence in a primaldual method. Mathematics of Operations Research, 18:741-751, 1993.
[219] S. Mehrotra. On the implementation of a primaldual interior point
method. SIAM J. Optimization, 2(4):575601, 1992.
[220] G. Millerioux and J. Daafouz. Global chaos synchronization and robust
ltering in noisy context. IEEE Transactions on Circuits and Systems I:
Fundamental Theory and Applications, 48(10):11701176, Oct. 2001.
[221] Hans D. Mittelmann.
Independent
http://plato.la.asu.edu/, November 2000.
Benchmark
Results
[222] S. Mizuno, M. J. Todd, and Y. Ye. On adaptive step primaldual

interiorpoint algorithms for linear programming. Mathematics of Operations Research, 18:964981, 1993.
[223] S. Mizuno, M. J. Todd, and Y. Ye. A surface of analytic centers and
primaldual infeasibleinteriorpoint algorithms for linear programming.
Mathematics of Operations Research, 20:135162, 1995.
[224] R. D. C. Monteiro and I. Adler. Interior path following primaldual
algorithms : Part I : Linear programming. Math. Programming, 44:27
41, 1989.
[225] R. D. C. Monteiro and I. Adler. Interior path following primaldual algorithms : Part II : Convex quadratic programming. Math. Programming,
44:4366, 1989.
168
BIBLIOGRAPHY
[226] R. D. C. Monteiro and J-S. Pang. On two interior-point mappings for

nonlinear semidenite complementarity problems. Working Paper, John
Hopkins University, 1996.
[227] R. D. C. Monteiro and P. R. Zanjcomo, Implementation of primal-dual
a
methods for semidenite programming based on Monteiro and Tsuchiya
directions and their variants, Technical Report, School of Ind. and Systems Engineering, Georgia Institute of Technology, Atlanta, 1997.
[228] R. D. C. Monteiro and Y. Zhang. A unied analysis for a class of pathfollowing primal-dual interior-point algorithms for semidenite programming. School of ISyE, Georgia Tech, GA 30332, 1995.
[229] J. J. Mor. The Levenberg-Marquardt algorithm: implementation and
e
theory. In G. A. Watson, editor: Numerical Analysis Springer-Verlag,
New York, 1977.
[230] J. Mor, Z. Wu. Global Continuation for Distance Geometry Problems.
e
SIAM J. Opt. 7:814836, 1997.
[231] K. G. Murty. Linear Complementarity, Linear and Nonlinear Programming. Heldermann, Verlag, Berlin, 1988.
[232] K. G. Murty and S. N. Kabadi. Some NP-complete problems in quadratic
and nonlinear programming. Math. Programming, 39:117-129, 1987.
[233] M. Muramatsu. Ane scaling algorithm fails for semidenite programming. Research report No.16, Department of Mechanical Engineering,
Sophia University, Japan, July 1996.
[234] M. Muramatsu and T. Tsuchiya. An ane scaling method with an
infeasible starting point: convergence analysis under nondegeneracy assumption. Annals of Operations Research, 62:325356, 1996.
[235] M. Muramatsu and R. Vanderbei, Primal-dual ane scaling algorithms
fail for semidenite programming, Technical Report, SOR, Princeton
University, Princeton, NJ, 1997.
[236] S. G. Nash and A. Sofer. A barrier method for largescale constrained
optimization. ORSA Journal on Computing, 5:4053, 1993.
[237] D. J. Newman. Location of the maximum on unimodal surfaces. J.
Assoc. Comput. Math., 12:395-398, 1965.
[238] A. Nemirovski, C. Roos and T. Terlaky, On Maximization of Quadratic
Forms Over Intersection of Ellipsoids with Common Center, Math. Program., 86:463-473, 1999.
BIBLIOGRAPHY
169
[239] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method

Eciency in Optimization. John Wiley and Sons, Chichester, 1983.
Translated by E. R. Dawson from Slozhnost Zadach i Eektivnost
Metodov Optimizatsii, 1979, Glavnaya redaktsiya ziko-matematicheskoi
literatury, Izdatelstva Nauka.
[240] Y. E. Nesterov. Semidenite relaxation and nonconvex quadratic optimization. Optimization Methods and Software 9 (1998) 141-160.
[241] Yu. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms. SIAM Publications. SIAM, Philadelphia, 1993.
[242] Yu. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-point
methods for convex programming. Mathematics of Operations Research,
22(1):142, 1997.
[243] Yu. E. Nesterov and M. J. Todd. Primal-dual interior-point methods
for self-scaled cones. Technical Report No. 1125, School of Operations
Research and Industrial Engineering, Cornell University, Ithaca, NY,
1995, to appear in SIAM J. Optimization.
[244] Yu. Nesterov, M. J. Todd and Y. Ye. Infeasible-start primal-dual methods
and infeasibility detectors for nonlinear programming problems. Technical Report No. 1156, School of Operations Research and Industrial
Engineering, Cornell University, Ithaca, NY, 1996.
[245] D. Niculescu and B. Nath. Ad-hoc positioning system. In IEEE GlobeCom, Nov. 2001.
[246] R. Palhares, C. de Souza, and P. Dias Peres. Robust H ltering for
uncertain discrete-time state-delayed systems. IEEE Transactions on
Signal Processing, 48(8):16961703, Aug. 2001.
[247] R. Palhares and P. Peres. LMI approach to the mixed H2 /H ltering design for discrete-time uncertain systems. IEEE Transactions on
Aerospace and Electronic Systems, 37(1):292296, Jan. 2001.
[248] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. PrenticeHall, Englewood Clis, NJ, 1982.
[249] P. M. Pardalos and J. B. Rosen. Constrained Global Optimization: Algorithms and Applications. Springer-Verlag, Lecture Notes in Computer
Sciences 268, 1987.
[250] J. Park, H. Cho, and D. Park. Design of GBSB neural associative memories using semidenite programming. IEEE Transactions on Neural
Networks, 10(4):946950, July 1999
170
BIBLIOGRAPHY
[251] P. Parrilo. Semidenite programming relaxations for semialgebraic problems. Mathematical Programming. To appear.
[252] G. Pataki. On the Rank of Extreme Matrices in Semidenite Programs
and the Multiplicity of Optimal Eigenvalues. Mathematics of Operations
Research, 23(2):339-358, 1998.
[253] E. Petrank, The hardness of approximation: gap location, Computational Complexity, 4 (1994), 133-157.
[254] S. Polijak, F. Rendl and H. Wolkowicz. A recipe for semidenite relaxation for 0-1 quadratic programming. Journal of Global Optimization,
7:51-73, 1995.
[255] R. Polyak. Modied barrier functions (theory and methods). Math.
Programming, 54:177222, 1992.
[256] F. A. Potra and Y. Ye. Interior point methods for nonlinear complementarity problems. Journal of Optimization Theory and Application,
88:617642, 1996.
[257] F. Potra and R. Sheng. Homogeneous interiorpoint algorithms for
semidenite programming. Reports On Computational Mathematics,
No. 82/1995, Department of Mathematics, The University of Iowa, 1995.
[258] E. Rains. A semidenite program for distillable entanglement. IEEE
Transactions on Information Theory, 47(7):29212933, November 2001.
[259] M. Ramana. An exact duality theory for semidenite programming and
its complexity implications. Math. Programming, 77:129162, 1997.
[260] M. Ramana, L. Tuncel and H. Wolkowicz. Strong duality for semidefinite programming. CORR 95-12, Department of Combinatorics and
Optimization, University of Waterloo, Waterloo, Ontario, Canada, 1995.
To appear in SIAM J. Optimization.
[261] L. Rasmussen, T. Lim, and A. Johansson. A matrix-algebraic approach
to successive interference cancellation in CDMA. IEEE Transactions on
Communications, 48(1):145151, Jan. 2000.
[262] F. Rendl and H. Wolkowicz. A semidenite framework for trust region
subproblems with applications to large scale minimization. Math. Programming, 77:273300, 1997.
[263] J. Renegar. A polynomialtime algorithm, based on Newtons method,
for linear programming. Math. Programming, 40:5993, 1988.
[264] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970.
BIBLIOGRAPHY
171
[265] C. Roos, T. Terlaky and J.-Ph. Vial. Theory and Algorithms for Linear
Optimization: An Interior Point Approach. John Wiley & Sons, Chichester, 1997.
[266] M. Saadat and T. Chen. Design of nonuniform multirate lter banks by
semidenite programming. IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, 47(11):13111314, November
2000.
[267] R. Saigal and C. J. Lin. An infeasible start predictor corrector method
for semi-denite linear programming. Tech. Report, Dept. of Industrial
and Operations Engineering, University of Michigan, Ann Arbor, MI,
Dec 1995.
[268] C. Savarese, J. Rabay, K. Langendoen. Robust Positioning Algorithms
for Distributed AdHoc Wireless Sensor Networks. USENIX Annual
Technical Conference, 2002.
[269] A. Savvides, C.C. Han, M. B. Srivastava. Dynamic FineGrained Localization in AdHoc Networks of Sensors. Proc. 7th MOBICOM, 166179,
2001.
[270] A. Savvides, H. Park, M. B. Srivastava. The Bits and Flops of the n
hop Multilateration Primitive for Node Localization Problems. Proc. 1st
WSNA, 112121, 2002.
[271] A. Schrijver. Theory of Linear and Integer Programming. John Wiley &
Sons, New York, 1986.
[272] U. Shaked, L. Xie, and Y. Soh. New approaches to robust minimum variance lter design. IEEE Transactions on Signal Processing, 49(11):2620
2629, November 2001.
[273] Y. Shang, W. Ruml, Y. Zhang, M. P. J. Fromherz. Localization from
Mere Connectivity. Proc. 4th MOBIHOC, 201212, 2003.
[274] W. F. Sheppard, On the calculation of the double integral expressing
normal correlation, Transactions of the Cambridge Philosophical Society
19 (1900) 23-66.
[275] H. D. Sherali and W. P. Adams, Computational advances using the
reformulation-linearization technique (rlt) to solve discrete and continuous nonconvex problems. Optima, 49:1-6, 1996.
[276] Masayuki Shida, Susumu Shindoh and Masakazu Kojima. Existence of
search directions in interior-point algorithms for the SDP and the monotone SDLCP. Research Report B-310, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro,
Tokyo 152 Japan, 1996. To appear in SIAM J. Optimization.
172
BIBLIOGRAPHY
[277] N. Z. Shor, Quadratic optimization problems, Soviet Journal of Computer and Systems Sciences, 25:1-11, 1987. Originally published in
Tekhnicheskaya Kibernetika, No. 1, 1987, pp. 128-139.
[278] M. Skutella, Semidenite relaxations for parallel machine scheduling,
in Proceedings of the 39th Annual IEEE Symposium on Foundations of
Computer Science (FOCS98), pp. 472-481, 1998.
[279] M. Skutella, Convex quadratic and semidenite programming relaxations in scheduling, Journal of the ACM, 48(2), 206-242, 2001.
[280] G. Sonnevend. An analytic center for polyhedrons and new classes
of global algorithms for linear (smooth, convex) programming. In
A. Prekopa, J. Szelezsan, and B. Strazicky, editors, System Modelling
and Optimization : Proceedings of the 12th IFIPConference held in Budapest, Hungary, September 1985, volume 84 of Lecture Notes in Control
and Information Sciences, pages 866876. Springer Verlag, Berlin, Germany, 1986.
[281] D. C. Sorenson. Newtons method with a model trust region modication.
SIAM J. Numer. Anal., 19:409-426, 1982.
[282] A. Srivastav and K. Wolf, Finding dense subgraphs with semidenite
programming, in Approximation Algorithms for Combinatorial Optimization, K. Jansen and J. Rolim (Eds.) (1998), 181-191.
[283] P. Stoica, T. McKelvey, and J. Mari. Ma estimation in polynomial time.
IEEE Transactions on Signal Processing, 48(7):19992012, July 2000.
[284] J.
F.
Sturm.
SeDumi,
Let
SeDuMi
seduce
http://fewcal.kub.nl/sturm/software/sedumi.html, October 2001.
you.
[285] J. F. Sturm and S. Zhang. Symmetric primal-dual path following algorithms for semidenite programming. Report 9554/A, Econometric
Institute, Erasmus University, Rotterdam, 1995
[286] J.F. Sturm and S. Zhang. On cones of nonnegative quadratic functions.
SEEM2001-01, Chinese University of Hong Kong, March 2001.
[287] K. Tanabe. Complementarityenforced centered Newton method for
mathematical programming. In K. Tone, editor, New Methods for Linear
Programming, pages 118144, The Institute of Statistical Mathematics,
467 Minami Azabu, Minatoku, Tokyo 106, Japan, 1987.
[288] K. Tanabe. Centered Newton method for mathematical programming. In
M. Iri and K. Yajima, editors, System Modeling and Optimization, Proceedings of the 13th IFIP Conference, pages 197206, Springer-Verlag,
Tokyo, Japan, 1988.
BIBLIOGRAPHY
173
[289] R. A. Tapia. Current research in numerical optimization. SIAM News,

20:1011, March 1987.
[290] S. P. Tarasov, L. G. Khachiyan, and I. I. Erlikh. The method of inscribed

ellipsoids. Soviet Mathematics Doklady, 37:226230, 1988.
[291] H. Tan and L. K. Rasmussen. The application of semidenite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8):14421449, August 2001.
[292] Z. Tan, Y. Soh, and L. Xie. Envelope-constrained H lter design: an
LMI optimization approach. IEEE Transactions on Signal Processing,
48(10):29602963, Oct. 2000.
[293] Z. Tan, Y. Soh, and L. Xie. Envelope-constrained H FIR lter design.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, 47(1):7982, Jan. 2000.
[294] T. Terlaky, editor. Interior-Point Methods in Mathematical Programming
Kluwer Academic Publisher, Boston, 1996.
[295] M. J. Todd, Improved bounds and containing ellipsoids in Karmarkars
linear programming algorithm. Mathematics of Operations Research,
13:650-659, 1988.
[296] M. J. Todd. Potential-reduction methods in mathematical programming.
[297] M. J. Todd, A study of search directions in primal-dual interior-point
methods for semidenite programming, Optimization Methods and Software 11&12 (1999) 1-46.
[298] M. J. Todd and B. P. Burrell. An extension of Karmarkars algorithm
for linear programming using dual variables. Algorithmica, 1(4):409424,
1986.
[299] M. J. Todd and Y. Ye. A centered projective algorithm for linear programming. Mathematics of Operations Research, 15:508529, 1990.
[300] K. C. Toh. A note on the calculation of steplengths in interior point
methods for semidenite programming. To appear Computational Optimization and Applications, 2001.
[301] K. C. Toh, M. J. Todd, and R. H. Ttnc. SDPT3 A MATLAB
uu u
software package for semidenite programming, version 1.3 Optimization
Software and Methods, 11:545-581, 1999.
[302] K. C. Toh and M. Kojima. Solving some large scale semidenite programs
via the conjugate residual method. Working paper, August 2000.
174
BIBLIOGRAPHY
[303] Y. L. Tong, The Multivariate Normal Distribution, Springer-Verlag, New

York, 1990.
[304] J. F. Traub, G. W. Wasilkowski and H. Wozniakowski. InformationBased Complexity. Academic Press, San Diego, CA, 1988.
[305] P. Tseng. Search directions and convergence analysis of some infeasible path-following methods for the monotone semi-denite LCP. Report,
Department of Mathematics, University of Washington, Seattle, Washington 98195, 1996.
[306] P. Tseng. Approximating QP. Report, Department of Mathematics,
University of Washington, Seattle, Washington, 2003.
[307] C. Tseng and B. Chen. H fuzzy estimation for a class of nonlinear
discrete-time dynamic systems. IEEE Transactions on Signal Processing,
49(11):26052619, Nov. 2001.
[308] H. Tuan, P. Apkarian, and T. Nguyen. Robust and reduced-order ltering: new LMI-based characterizations and methods. IEEE Transactions
on Signal Processing, 49(12):29752984, Dec. 2001.
[309] H. Tuan, P. Apkarian, T. Nguyen, and T. Narikiyo. Robust mixed
H2 /H ltering of 2-D systems. IEEE Transactions on Signal Processing, 50(7):17591771, July 2002.
[310] A. W. Tucker. Dual systems of homogeneous linear relations, in H. W.
Kuhn and A. W. Tucker, Editors, Linear Inequalities and Related Systems, page 3, Princeton University Press, Princeton, NJ, 1956.
[311] P. M. Vaidya. A new algorithm for minimizing a convex function over
convex sets. Math. Programming, 73:291341, 1996.
[312] P. M. Vaidya and D. S. Atkinson. A technique for bounding the number of
iterations in path following algorithms. In P. Pardalos, editor, Complexity
in Numerical Optimization page 462489, World Scientic, New Jersey,
1993.
[313] L. Vandenberghe and S. Boyd. A primaldual potential reduction method
for problems involving matrix inequalities. Math. Programming, 69:205
236, 1995.
[314] L. Vandenberghe and S. Boyd. Semidenite programming. SIAM Review,
38(1):4995, 1996.
[315] L. Vandenberghe, S. Boyd, and S.P. Wu. Determinant maximization
with linear matrix inequality constraints. Technical Report, Information
Systems Laboratory, Stanford University, CA, 1996.
BIBLIOGRAPHY
175
[316] R. J. Vanderbei. LOQO: An Interior Point Code for Quadratic Programming. Program in Statistics & Operations Research, Princeton University, NJ, 1995.
[317] K. Varadarajan, S. Venkatesh and J. Zhang, On Approximating the
Radii of Point Sets in High Dimensions, In Proc. 43rd Annu. IEEE
Sympos. Found. Comput. Sci., 2002.
[318] S. A. Vavasis. Nonlinear Optimization: Complexity Issues. Oxford Science, New York, NY, 1991.
[319] F. Wang and V. Balakrishnan. Robust Kalman lters for linear timevarying systems with stochastic parametric uncertainties. IEEE Transactions on Signal Processing, 50(4):803813, April 2002.
[320] S. Wang, L. Xie, and C. Zhang. H2 optimal inverse of periodic FIR digital
lters. IEEE Transactions on Signal Processing, 48(9):26962700, Sept.
2000.
[321] T. Wang, R. D. C. Monteiro, and J.-S. Pang. An interior point potential reduction method for constrained equations. Math. Programming,
74(2):159196, 1996.
[322] H. Wolkowicz and Q. Zhao, Semidenite programming for the graph
partitioning problem, manuscript, 1996, to appear Discrete Applied
Math..
[323] M. H. Wright. A brief history of linear programming. SIAM News,
18(3):4, November 1985.
[324] S. J. Wright. Primal-Dual Interior-Point Methods SIAM, Philadelphia,
1996.
[325] C. Xu, Y. Ye and J. Zhang. Approximate the 2-Catalog Segmentation
Problem Using Semidenite Programming Relaxations. Optimization
Methods and Software, 18:(6):705 - 719, 2003
[326] G. Xue and Y. Ye. Ecient algorithms for minimizing a sum of Euclidean
norms with applications. Working Paper, Department of Management
Sciences, The University of Iowa, 1995. To appear in SIAM J. Optimization.
[327] F. Yang and Y. Hung. Robust mixed H2 /H ltering with regional
pole assignment for uncertain discrete-time systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications,
49(8):12361241, Aug. 2002.
[328] M. Yannakakis. Computing the minimum llin is NPcomplete. SIAM
J. Algebraic Discrete Methods, pages 7779, 1981.
176
BIBLIOGRAPHY
[329] Y. Ye. Interior Point Algorithms : Theory and Analysis. WileyInterscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, New York, 1997.
[330] Y. Ye. Karmarkars algorithm and the ellipsoid method. Operations
Research Letters, 6:177182, 1987.
[331] Y. Ye. An O(n3 L) potential reduction algorithm for linear programming.
[332] Y. Ye. Combining binary search and Newtons method to compute real
roots for a class of real functions. Journal of Complexity, 10:271280,
1994.
[333] Y. Ye. Approximating quadratic programming with bound and quadratic
constraints. Mathematical Programming 84 (1999) 219-226.
[334] Y. Ye. A .699 approximation algorithm for Max-Bisection. Mathematical
Programming 90:1 (2001) 101-111
[335] Y. Ye and M. Kojima. Recovering optimal dual solutions in Karmarkars
polynomial algorithm for linear programming. Math. Programming,
39:305317, 1987.
[336] Y. O. Gler, R. A. Tapia, and Y. Zhang. A quadratically convergent
Ye,
u
O( nL)iteration algorithm for linear programming. Math. Programming, 59:151162, 1993.
[337] Y. Ye, M. J. Todd and S. Mizuno. An O( nL)iteration homogeneous

and selfdual linear programming algorithm. Mathematics of Operations
Research, 19:5367, 1994.
[338] Y. Ye and S. Zhang. New results on quadratic minimization. Working
Paper, Department of SSEM, The Chinese University of Hong Kong
(May, 2001), to appear in SIAM J. Optimization.
[339] Y. Ye and J. Zhang. An improved algorithm for approximating the radii
of point sets. APPROX 2003, August 2003.
[340] Q. Zhao, Semidenite Programming for Assignment and Partitioning
Problems PhD thesis, University of Waterloo, 1996.
[341] Q. Zhao, S. Karisch, F. Rendl, and H. Wolkowicz, Semidenite programming relaxations for the quadratic assignment problems, J. Combinatorial Optimization, 2.1, 1998, CORR 95-27.
[342] H. Zhou, L. Xie, and C. Zhang. A direct approach to H2 optimal deconvolution of periodic digital channels. IEEE Transactions on Signal
Processing, 50(7):16851698, July 2002.
BIBLIOGRAPHY
177
[343] J. Zowe, M. Kocvara, and M. Bendsoe. Free material optimization

via mathematical programming. Mathematical Programming, 9:445466,
1997.
[344] U. Zwick, Approximation algorithms for constraint satisfaction problems involving at most three variables per constraint, In 9th SODA
(1998) 201-210.
[345] U. Zwick. Outward rotationvs: a tool for rounding solutions of semidefinite programming relaxations, with applications to max cut and other
problems. In Proceedings of the 31th Symposium on Theory of Computation, pages 679687, 1999.

Conic Programming

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Conic Programming

Transféré par

Droits d'auteur :

Formats disponibles

Linear Conic Programming

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

necessary mathematical preliminaries. We also present several decision and

This section summarizes mathematical background material for linear algebra,

n, the symbol Rn (Rn , Rn ) denotes the set of vectors with n components in

R (R+ , R+ ). We call Rn the interior of Rn .

1.2. MATHEMATICAL PRELIMINARIES

typically be used to represent scalars. For convenience, we sometime write a

The l2 norm of a vector x is given by

and the l norm is

= max{|x1 |, |x2 |, ..., |xn |}.

In general, the p norm is

The dual of the p norm, denoted by . , is the q norm, where

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

and positive semi-denite (PSD), denoted by Q

If Q 0, then Q is called negative denite (ND), denoted by Q 0; if Q 0,

This is a generalization of the vector inner product to matrices. The matrix

Mn denote the set of positive semi-denite matrices in Mn . Mn denotes the

set of positive denite matrices in Mn . We call Mn the interior of Mn .

A point x is a limit point of {xk } if there is a subsequence of {xk } convergent

g(x) goes to zero faster than x does:

1.2. MATHEMATICAL PRELIMINARIES

If x is a member of the set , we write x ; if y is not a member of , we

(topological) interior of any set , denoted , is the set of points in which

are the centers of some balls contained in . The closure of , denoted , is

where , is an inner product operation for space E.

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

Figure 1.1: Polyhedral and nonpolyhedral cones.

1.2. MATHEMATICAL PRELIMINARIES

A bounded polyhedron is called polytope.

The geometric interpretation of the theorem is that, given a convex set C

Figure 1.3: Illustration of the separating hyperplane theorem; an exterior point

The real function f (x) is said to be continuous at x if xk x implies f (xk )

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

A function f (x) is called homogeneous of degree k if f (x) = k f (x) for all

Example 1.6 Let c Rn be given and x Rn . Then cT x is homogeneous of

is homogeneous of degree 0, where log is the natural logarithmic function. Let

C Mn be given and X Mn . Then xT Cx is homogeneous of degree 2, C X

If f C 2 , we dene the Hessian of f to be the n2 -dimensional symmetric matrix

If f = (f1 , f2 , ..., fm )T Rm , the Jacobian matrix of f is

1.2. MATHEMATICAL PRELIMINARIES

Theorem 1.2 (Taylor expansion) Let f C 1 be in a region containing the line

Furthermore, if f C 2 then there is a , 0 1, such that

f (x)(y x) + (1/2)(y x)T

Hadamard: given A Rmn with columns a1 , a2 , ..., an , then

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

Some Basic Decision and Optimization Problems

A decision or optimization problem has a form that is usually characterized by

System of linear equations

Given A Rmn and b Rm , the problem is to solve m linear equations for n

1.3. SOME BASIC DECISION AND OPTIMIZATION PROBLEMS

Linear least-squares problem

In the former format, the term AT y c 2 is called the objective function,

Given a y, to see if y Sp is as the same as the original problem. However, from

System of linear inequalities

Given A Rmn and b Rm , the problem is to nd a solution x Rn

CHAPTER 1. INTRODUCTION AND PRELIMINARIES

Traditionally, a point in Sp is called a feasible solution, and a strictly positive

1.3. SOME BASIC DECISION AND OPTIMIZATION PROBLEMS

Linear programming (LP)

Given A Rmn , b Rm and c, l, u Rn , the linear programming (LP)

where some elements in l may be meaning that the associated variables

Again, given an x, to see if x Sp is as dicult as the original problem.

where y Rm and s Rn . The components of s are called dual slacks. Denote