
Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems

Euhanna Ghadimi, Andre Teixeira, Mikael Johansson

ACCESS Linnaeus Center, Electrical Engineering, Royal Institute of Technology, Stockholm, Sweden (e-mail: {euhanna, andretei, mikaelj}@ee.kth.se).

This work was sponsored in part by the Swedish Foundation for Strategic Research, SSF, and the Swedish Research Council, VR.
Abstract: The alternating direction method of multipliers (ADMM) has emerged as a powerful technique for large-scale structured optimization. Despite many recent results on the convergence properties of ADMM, a quantitative characterization of the impact of the algorithm parameters on the convergence times of the method is still lacking. In this paper we find the optimal algorithm parameters that minimize the convergence factor of the ADMM iterates in the context of constrained quadratic programming. A numerical study shows that our parameter selection rules significantly improve the convergence time of the algorithm.

Keywords: Algorithm, Optimization, Convergence rate
1. INTRODUCTION
The alternating direction method of multipliers is a powerful algorithm for solving structured convex optimization problems. While the ADMM method was introduced for optimization in the 1970s, its origins can be traced back to the alternating direction implicit techniques for solving elliptic and parabolic partial difference equations developed in the 1950s (see Boyd et al. (2011) and references therein). ADMM enjoys the strong convergence properties of the method of multipliers and the decomposability property of dual ascent, and it is particularly useful for solving optimization problems that are too large to be handled by generic optimization solvers. The method has found a large number of recent applications in diverse areas such as compressed sensing Yang and Zhang (2011), regularized estimation Wahlberg et al. (2012), image processing Figueiredo and Bioucas-Dias (2010), machine learning Forero et al. (2010), and resource allocation in wireless networks Joshi et al. (2012). This broad range of applications has triggered a strong interest in developing a better understanding of the theoretical properties of ADMM Deng and Yin (2012); Luo (2012); Boley (2012).
Mathematical decomposition is a classical approach for parallelizing numerical optimization algorithms. If the decision problem has a favorable structure, then decomposition techniques like primal and dual decomposition allow the computations to be distributed over multiple processors Lasdon (1970); Bertsekas and Tsitsiklis (1989). The processors are coordinated towards optimality by solving a suitable master problem using gradient or subgradient techniques. If problem parameters such as Lipschitz constants and convexity parameters of the cost function are known, the optimal step-sizes and associated convergence rates are well-known (e.g., Nesterov (2004)). A drawback of the gradient method is its sensitivity to the choice of the step-size, even to the point where poor parameter selection can lead to algorithm divergence. In contrast, the ADMM technique is surprisingly robust to poorly selected algorithm parameters. In fact, under rather mild conditions, the method is guaranteed to converge for all positive values of its single step-size parameter. It is now known that if the objective functions are strongly convex and have Lipschitz-continuous gradients, then some variations of the ADMM iterations converge linearly to the stationary point Deng and Yin (2012); Luo (2012). The application of ADMM to quadratic problems was considered in Boley (2012), where it was conjectured that the iterates converge linearly in a neighborhood of the optimal solution. It is important to stress that even when the ADMM method has a linear convergence rate, the number of iterations needed to ensure a desired accuracy, i.e. the convergence time, is heavily affected by the choice of the algorithm parameter. In this paper, we find the algorithm parameters that minimize the convergence factor of the ADMM iterations for quadratic programming with linear inequality constraints. We establish a linear convergence rate and develop techniques to minimize the convergence factor of the ADMM iterates. This allows us to give explicit expressions for the optimal algorithm parameters and the associated convergence factors.
2. OPTIMAL CONVERGENCE FACTOR FOR QUADRATIC PROGRAMMING
In this section, we consider a quadratic programming (QP) problem of the form
$$
\begin{array}{ll}
\text{minimize} & \frac{1}{2} x^\top Q x + q^\top x \\
\text{subject to} & Ax \leq c
\end{array}
\qquad (1)
$$
where $Q \in \mathbb{S}^{n}_{++}$, $q \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$ is full rank, and $c \in \mathbb{R}^{m}$.
The QP-problem (1) can be put on the ADMM standard form by introducing a slack vector $z$ and putting an infinite penalty on negative components of $z$, i.e.
$$
\begin{array}{ll}
\text{minimize} & \frac{1}{2} x^\top Q x + q^\top x + \mathcal{I}_{+}(z) \\
\text{subject to} & Ax - c + z = 0,
\end{array}
\qquad (2)
$$
where $\mathcal{I}_{+}(\cdot)$ is the indicator function of the positive orthant. The associated augmented Lagrangian is
$$
L_{\rho}(x, z, u) = \frac{1}{2} x^\top Q x + q^\top x + \mathcal{I}_{+}(z) + \frac{\rho}{2} \left\| Ax - c + z + u \right\|_{2}^{2},
$$
where $u = \lambda / \rho$ is the scaled dual variable, which leads to the scaled ADMM iterations
$$
\begin{aligned}
x^{k+1} &= -\left( Q + \rho A^\top A \right)^{-1} \left[ q + \rho A^\top (z^{k} + u^{k} - c) \right], \\
z^{k+1} &= \max\{ 0,\; -A x^{k+1} - u^{k} + c \}, \\
u^{k+1} &= u^{k} + A x^{k+1} - c + z^{k+1}.
\end{aligned}
\qquad (3)
$$
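To make the iterations concrete, here is a minimal sketch of (3) in Python/NumPy. It is illustrative only: the function name, stopping rule, and tolerance are our own choices, and a practical implementation would factor $Q + \rho A^\top A$ once and reuse it across iterations.

```python
import numpy as np

def admm_qp(Q, q, A, c, rho, max_iter=5000, tol=1e-8):
    """Scaled ADMM iterations (3) for the QP (1): min 0.5 x'Qx + q'x s.t. Ax <= c."""
    m, _ = A.shape
    z = np.zeros(m)
    u = np.zeros(m)
    M = Q + rho * (A.T @ A)  # x-update matrix; constant, so factor and reuse in practice
    for k in range(max_iter):
        # x-update: solve grad_x L_rho(x, z, u) = 0
        x = -np.linalg.solve(M, q + rho * (A.T @ (z + u - c)))
        # z-update: projection onto the positive orthant
        z = np.maximum(0.0, -A @ x - u + c)
        # scaled dual update, driven by the primal residual Ax - c + z
        r = A @ x - c + z
        u = u + r
        if np.linalg.norm(r) < tol:  # simple, illustrative stopping rule
            break
    return x, k + 1
```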
The next theorem guarantees that (3) converges globally at a linear rate to the global optimum of (1). The optimal step-size $\rho^{\star}$ and the smallest achievable convergence factor $\zeta^{\star}$ are characterized immediately afterwards.

Theorem 1. Consider the QP (1) and the corresponding ADMM iterations (3). For all values $\rho \in \mathbb{R}_{++}$, the algorithm (3) converges at a linear rate to the global optimum of (1).
Theorem 2. Consider the QP (1) and the corresponding ADMM iterations (3). If the constraint matrix $A$ is either full row-rank or invertible, then the optimal step-size and the corresponding convergence factor are
$$
\rho^{\star} = \left( \lambda_{1}(AQ^{-1}A^\top)\, \lambda_{n}(AQ^{-1}A^\top) \right)^{-1/2},
\qquad
\zeta^{\star} = \frac{\lambda_{n}(AQ^{-1}A^\top)}{\lambda_{n}(AQ^{-1}A^\top) + \sqrt{\lambda_{1}(AQ^{-1}A^\top)\, \lambda_{n}(AQ^{-1}A^\top)}},
\qquad (4)
$$
where $\lambda_{1}(AQ^{-1}A^\top)$ and $\lambda_{n}(AQ^{-1}A^\top)$ are the smallest and largest eigenvalues of $AQ^{-1}A^\top$, respectively.
3. NUMERICAL EXAMPLES
In this section, we evaluate our parameter selection rules on numerical examples. Fig. 1 illustrates the convergence of the ADMM iterations for the 170 QPs (available online at Ghadimi et al. (2013)) as a function of the step-size $\rho$. Since $A^\top$ has a non-empty null-space, our $\rho^{\star}$ from (4) is a heuristic for the actual, unknown optimal parameter. As shown in Fig. 1, our heuristic step-size $\rho^{\star}$ results in a number of iterations close to the empirical minimum.
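The experiment behind Fig. 1 can be reproduced in spirit with the two sketches above. The following example uses synthetic problem data; the sizes, seed, and $\rho$ grid are our own choices, not the published 170-QP set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
L = rng.standard_normal((n, n))
Q = L @ L.T + np.eye(n)          # Q in S^n_++
q = rng.standard_normal(n)
A = rng.standard_normal((n, n))  # square, generically invertible
c = rng.standard_normal(n)

rho_star, zeta_star = optimal_admm_parameters(Q, A)
for rho in [0.1 * rho_star, rho_star, 10.0 * rho_star]:
    _, iters = admm_qp(Q, q, A, c, rho)
    print(f"rho = {rho:9.4f}  ->  {iters} iterations (zeta* = {zeta_star:.3f})")
```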
4. CONCLUSIONS AND FUTURE WORK
We studied optimal parameter selection for the alternating direction method of multipliers for quadratic programming. We established global convergence of the algorithm at a linear rate and provided explicit expressions for the parameters that ensure the smallest possible convergence factor. We validated the analytical results on numerical examples. As future work, we plan to extend the analytical results to more general classes of objective functions.
[Figure omitted: number of iterations (log scale, $10^{1}$ to $10^{4}$) versus the step-size $\rho \in [0, 100]$, showing $N_{avg}$, $N_{max}$, and $N_{min}$, with $\rho^{\star}$ marked.]

Fig. 1. Number of iterations for ADMM applied to the QP problems.
REFERENCES
Bertsekas, D.P. and Tsitsiklis, J.N. (1989). Parallel and distributed computation: numerical methods. Prentice-Hall, Upper Saddle River, NJ, USA.
Boley, D. (2012). Linear convergence of ADMM on a model problem. Technical Report TR 12-009, Department of Computer Science and Engineering, University of Minnesota.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.
Deng, W. and Yin, W. (2012). On the global and linear convergence of the generalized alternating direction method of multipliers. Technical Report TR12-14, Rice University CAAM.
Figueiredo, M. and Bioucas-Dias, J. (2010). Restoration of Poissonian images using alternating direction optimization. IEEE Transactions on Image Processing, 19(12), 3133–3145.
Forero, P.A., Cano, A., and Giannakis, G.B. (2010). Consensus-based distributed support vector machines. J. Mach. Learn. Res., 99, 1663–1707.
Ghadimi, E., Teixeira, A., Shames, I., and Johansson, M. (2013). mpc dataset. Technical report. URL https://www.dropbox.com/s/x2w74mpbezejbee/MPC_QP_quadtank_170_Np5_SxQ.mat.
Joshi, S., Codreanu, M., and Latva-aho, M. (2012). Distributed resource allocation for MISO downlink systems via the alternating direction method of multipliers. In Forty-Sixth Asilomar Conference on Signals, Systems and Computers. doi:10.1109/ACSSC.2012.6489052.
Lasdon, L. (1970). Optimization theory for large systems.
Courier Dover Publications.
Luo, Z. (2012). On the linear convergence of the alternating direction method of multipliers. ArXiv e-prints.
Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Springer-Verlag New York, LLC.
Wahlberg, B., Boyd, S., Annergren, M., and Wang, Y. (2012). An ADMM algorithm for a class of total variation regularized estimation problems. In 16th IFAC Symposium on System Identification.
Yang, J. and Zhang, Y. (2011). Alternating direction algorithms for $\ell_1$-problems in compressive sensing. SIAM J. Sci. Comput., 33(1), 250–278.
