by
Shidong Shan
BSc (Hon.), Acadia University, 2006
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
in
The Faculty of Graduate Studies
(Computer Science)
The University of British Columbia (Vancouver)
July 2008
© Shidong Shan 2008
To my father, Kunsong Shan
Abstract

The well-known Levenberg-Marquardt method is used extensively for solving nonlinear least-squares problems. We describe an extension of the Levenberg-Marquardt method to problems with bound constraints on the variables. Each iteration of our algorithm approximately solves a linear least-squares problem subject to the original bound constraints. Our approach is especially suited to large-scale problems whose functions are expensive to compute; only matrix-vector products with the Jacobian are required. We present the results of numerical experiments that illustrate the effectiveness of the approach. Moreover, we describe its application to a practical curve-fitting problem in fluorescence optical imaging.
Contents

Abstract
List of Tables
List of Figures
Acknowledgements
Bibliography

Acknowledgements
I am grateful to my supervisor Dr. Michael Friedlander, who made the subject of Numerical Optimization exciting and beautiful. With his knowledge, guidance, and assistance, I have managed to accomplish this work.

I would like to thank Dr. Ian Mitchell for his careful reading of this thesis and for his helpful comments, which have improved it.

I am fortunate to have had financial support for this work from the Natural Sciences and Engineering Research Council of Canada (NSERC) in the form of a postgraduate scholarship. I am also thankful for the MITACS Accelerate BC internship opportunity at ART Advanced Research Technologies Inc. It has been a privilege to work with Guobin Ma, Nicolas Robitaille, and Simon Fortier in the Research and Development group at ART Advanced Research Technologies Inc. Their work and ideas on curve fitting in fluorescence optical imaging have formed a strong foundation for Chapter 6 of this thesis. In addition, I wish to thank the three of them for reading a draft of Chapter 6 and for their valuable comments and suggestions for improving the chapter. The Scientific Computing Laboratory at the University of British Columbia has been a pleasant place to work. I have enjoyed the support and company of my current and past colleagues: Ewout van den Berg, Elizabeth Cross, Marisol Flores Garrido, Hui Huang, Dan Li, Tracy Lau, and Jelena Sirovljevic. Among them, Ewout van den Berg has always been of great help to me.

I am deeply indebted to my parents and family for their unconditional love and understanding. My brother Hongqing has inspired me and set a great example for me in fulfilling my dreams. Thanks are also due to many of my friends for their care and friendship over the years. Above all, I sincerely thank my best friend and partner, Bill Hue, whose love, support, and faithful prayers give me strength and courage to cope in difficult times. Without him, I could not have come this far in my academic endeavours. Finally, I want to dedicate this thesis to my late father, who passed away suddenly while I was working on it. In his life, my father was always passionate about ensuring that his children would have every educational opportunity. There is no doubt that my father would be very proud if he had lived to witness another academic accomplishment of his youngest daughter.
Chapter 1

Introduction
This thesis proposes an algorithm for nonlinear least-squares problems subject to simple bound constraints on the variables. The problem has the general form

$$\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\|r(x)\|_2^2 \quad \text{subject to} \quad \ell \le x \le u,$$

where $\ell, u \in \mathbb{R}^n$ are vectors of lower and upper bounds on $x$, and $r$ is a vector of real-valued nonlinear functions

$$r(x) = \begin{pmatrix} r_1(x) \\ r_2(x) \\ \vdots \\ r_m(x) \end{pmatrix};$$

we assume throughout that each $r_i(x) : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable nonlinear function.
Nonlinear least-squares problems arise in many data-fitting applications in science and engineering. For instance, suppose that we model some process by a nonlinear function $\phi$ that depends on some parameters $x$, and that we have actual measurements $y_i$ at times $t_i$; our aim is to determine the parameters $x$ so that the discrepancy between the predicted and observed measurements is minimized. The discrepancy between the predicted and observed data can be expressed as

$$r_i(x) = \phi(x, t_i) - y_i.$$

Then the parameters $x$ can be estimated by solving the nonlinear least-squares problem

$$\min_x \ \tfrac{1}{2}\|r(x)\|_2^2. \tag{1.1}$$
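To make the data-fitting setup concrete, the following minimal sketch builds the residual vector and the objective of (1.1) for a hypothetical exponential-decay model $\phi(x, t) = x_1 e^{x_2 t}$. The model, the data, and all names here are invented for illustration and are not from the thesis.

```python
import math

# Hypothetical model: phi(x, t) = x1 * exp(x2 * t), parameters x = (x1, x2).
def phi(x, t):
    return x[0] * math.exp(x[1] * t)

# Made-up observations y_i taken at times t_i.
t_data = [0.0, 1.0, 2.0, 3.0]
y_data = [2.0, 1.2, 0.8, 0.5]

# Residual vector r(x): discrepancy between predicted and observed values.
def residuals(x):
    return [phi(x, t) - y for t, y in zip(t_data, y_data)]

# Objective of (1.1): one half of the sum of squared residuals.
def objective(x):
    return 0.5 * sum(r * r for r in residuals(x))
```

A solver for (1.1) would then search over $x$ to drive `objective(x)` down; only `residuals` changes from one application to another.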
Very large nonlinear least-squares problems arise in numerous areas of application, such as medical imaging, tomography, geophysics, and economics. In many instances, both the number of variables and the number of residuals are large. It is also very common that only the number of residuals is large. Many approaches exist for the solution of linear and nonlinear least-squares problems, especially for the case where the problems are unconstrained. See Björck [3] for a comprehensive discussion of algorithms for least squares, including detailed error analyses.
Because of high demand in industry, nonlinear least-squares software is fairly prevalent. Major numerical software libraries such as SAS, and programming environments such as Mathematica and Matlab, contain nonlinear least-squares implementations. Other high-quality implementations include, for example, DFNLP [50], MINPACK [38, 41], and NL2SOL [27]; see Moré and Wright [44, Chapter 3]. However, most research has focused on nonlinear least-squares problems without constraints. For problems with constraints, the approaches are fewer (see also Björck [3]). In practice, the variables x often have known lower and upper bounds that come from their physical or natural settings. The aim of this thesis is to develop an efficient solver for large-scale nonlinear least-squares problems with simple bounds on the variables.
The organization of the thesis is as follows. In the remainder of this chapter, we introduce some definitions and notation used throughout the thesis. Chapter 2 reviews some theoretical background on unconstrained and bound-constrained optimization. For unconstrained optimization, we summarize the basic strategies, such as line-search and trust-region methods. For bound-constrained optimization, we review active-set and gradient-projection methods. We also briefly discuss some existing algorithms for solving bound-constrained optimization problems, including ASTRAL [53], BCLS [18], and L-BFGS-B [57].

Chapter 3 focuses on unconstrained least-squares problems. We start with a brief review of linear least-squares problems and the normal equations. We also review two-norm regularization techniques for solving ill-conditioned linear systems. For nonlinear least-squares problems, we discuss two classical algorithms: the Gauss-Newton method and the Levenberg-Marquardt method. We also explain the self-adaptive updating strategy for the damping parameter in the Levenberg-Marquardt method. These two methods and the updating strategy are closely related to the algorithm proposed in this thesis.

Chapter 4 contains the main contribution of this thesis. We explain our proposed algorithm, named BCNLS, for solving bound-constrained nonlinear least-squares problems. We describe our algorithm within the ASTRAL [53] framework. The BCNLS algorithm is based on solving a sequence of linear least-squares subproblems; the bounds in each subproblem are copied verbatim from the original nonlinear problem. The BCLS least-squares solver forms the computational kernel of the approach.

The results of numerical experiments are presented in Chapter 5. In Chapter 6, we illustrate a real data-fitting application arising in fluorescence optical imaging. The BCNLS software package and its documentation are available at http://www.cs.ubc.ca/labs/scl/bcnls/.
In this section, we briefly define the notation used throughout the thesis. The gradient of the function $f(x)$ is given by the $n$-vector

$$g(x) = \nabla f(x) = \begin{pmatrix} \partial f(x)/\partial x_1 \\ \partial f(x)/\partial x_2 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix},$$
and the Hessian is given by

$$H(x) = \nabla^2 f(x) = \begin{pmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{pmatrix}.$$
The Jacobian of $r(x)$ is the $m$-by-$n$ matrix

$$J(x) = \nabla r(x)^T = \begin{pmatrix} \nabla r_1(x)^T \\ \nabla r_2(x)^T \\ \vdots \\ \nabla r_m(x)^T \end{pmatrix}.$$
For nonlinear least-squares problems, the objective function has the special form

$$f(x) = \tfrac{1}{2} r(x)^T r(x), \tag{1.2}$$

and, in this case,

$$g(x) = J(x)^T r(x), \tag{1.3}$$

$$H(x) = J(x)^T J(x) + \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x). \tag{1.4}$$
Most specialized algorithms for nonlinear least-squares problems exploit the special structure of the objective function (1.2). Our proposed algorithm also exploits this structure and makes use of the above structural relations.
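The structural relation (1.3) is easy to check numerically. The sketch below, built around a hypothetical exponential residual invented for this example, computes $g = J^T r$ and compares it against a central finite-difference approximation of $\nabla f$:

```python
import math

# Hypothetical residuals r_i(x) = x1*exp(x2*t_i) - y_i with made-up data.
t_data = [0.0, 1.0, 2.0]
y_data = [2.0, 1.0, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(t_data, y_data)]

# Analytic Jacobian: row i holds (dr_i/dx1, dr_i/dx2).
def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in t_data]

# Gradient of f(x) = (1/2) r(x)^T r(x) via the structural relation (1.3).
def gradient(x):
    r = residuals(x)
    J = jacobian(x)
    return [sum(J[i][j] * r[i] for i in range(len(r))) for j in range(2)]

# Central finite-difference gradient, used here only as a sanity check.
def fd_gradient(x, h=1e-6):
    def f(z):
        return 0.5 * sum(ri * ri for ri in residuals(z))
    g = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g
```

The Gauss-Newton approximation drops the second term of (1.4) and keeps only $J^T J$, which the same `jacobian` routine would supply.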
We use the following abbreviations throughout:

• $x_k$: the value of $x$ at iteration $k$;
• $x^*$: a local minimizer of $f$;
• $\ell_k, u_k$: the lower and upper bounds for the subproblem at iteration $k$;
• $r_k = r(x_k)$: the value of $r$ at $x_k$;
• $f_k = f(x_k)$: the value of $f$ at $x_k$;
• $J_k = J(x_k)$: the Jacobian of $r(x)$ at $x_k$;
• $g_k = g(x_k)$: the gradient of $f(x)$ at $x_k$;
• $H_k = H(x_k)$: the Hessian of $f(x)$ at $x_k$.
Chapter 2

Background on Optimization
This chapter reviews some fundamental techniques for unconstrained and bound-constrained optimization. We begin with some basic strategies for unconstrained optimization problems, and then examine some specialized methods for bound-constrained problems.
2.1 Unconstrained optimization

The general unconstrained optimization problem has the simple mathematical formulation

$$\min_{x \in \mathbb{R}^n} \ f(x). \tag{2.1}$$

Hereafter we assume that $f(x) : \mathbb{R}^n \to \mathbb{R}$ is a smooth real-valued function. By Taylor's theorem, a smooth function can be approximated by a quadratic model in some neighbourhood of each point $x \in \mathbb{R}^n$.
The optimality conditions for the unconstrained optimization problem (2.1) are well known. We summarize the conditions below without explanation; for more detailed descriptions, see Nocedal and Wright [47, Chapter 2].

First-Order Necessary Conditions: If $x^*$ is a local minimizer and $f$ is continuously differentiable in an open neighbourhood of $x^*$, then $g(x^*) = 0$.

Second-Order Necessary Conditions: If $x^*$ is a local minimizer of $f$ and $\nabla^2 f$ exists and is continuous in an open neighbourhood of $x^*$, then $g(x^*) = 0$ and $H(x^*)$ is positive semidefinite.

Second-Order Sufficient Conditions: Suppose that $\nabla^2 f$ is continuous in an open neighbourhood of $x^*$, $g(x^*) = 0$, and $H(x^*)$ is positive definite. Then $x^*$ is a strict local minimizer of $f$.
In general, algorithms for unconstrained optimization generate a sequence of iterates $\{x_k\}_{k=0}^{\infty}$ using the rule

$$x_{k+1} = x_k + d_k,$$

where $d_k$ is a direction of descent. The two main algorithmic strategies are based on different subproblems for computing the updates $d_k$.
2.1.1 Line-search methods

Line-search methods obtain a descent direction $d_k$ in two stages. First, $d_k$ is found such that

$$g_k^T d_k < 0. \tag{2.2}$$

That is, we require that $f$ is strictly decreasing along $d_k$ within some neighbourhood of $x_k$. The most obvious choice of search direction is $d_k = -g_k$, which is called the steepest-descent direction. However, any direction $d_k$ that solves the equations

$$B_k d_k = -g_k$$

for some positive definite matrix $B_k$ also satisfies (2.2).

Second, the distance to move along $d_k$ is determined by a step length $\alpha_k$ that approximately solves the one-dimensional minimization problem

$$\min_{\alpha > 0} \ f(x_k + \alpha d_k).$$

Popular line-search conditions require that $\alpha_k$ gives sufficient decrease in $f$ and ensures that the algorithm makes reasonable progress. For example, the Armijo condition imposes sufficient decrease by requiring that

$$f(x_k + \alpha_k d_k) \le f(x_k) + \sigma_1 \alpha_k g_k^T d_k \tag{2.3}$$

for some constant $\sigma_1 \in (0, 1)$. The curvature condition imposes "reasonable progress" by requiring that $\alpha_k$ satisfies

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma_2 g_k^T d_k \tag{2.4}$$

for $\sigma_2 \in (\sigma_1, 1)$. The sufficient-decrease condition (2.3) and the curvature condition (2.4) are collectively known as the Wolfe conditions. The strong Wolfe conditions strengthen the curvature condition and require that, in addition to (2.3), $\alpha_k$ satisfies

$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma_2 |g_k^T d_k|.$$
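A backtracking line search that enforces the Armijo condition (2.3) can be sketched as follows. The constants $\sigma_1 = 10^{-4}$ and the halving factor are conventional illustrative choices, not values prescribed by the text, and the quadratic test function is invented for the example:

```python
def armijo_backtracking(f, grad, x, d, sigma1=1e-4, shrink=0.5,
                        alpha0=1.0, max_iter=50):
    """Backtrack along d until the sufficient-decrease condition (2.3) holds."""
    fx = f(x)
    # Directional derivative g_k^T d_k; must be negative for a descent direction.
    slope = sum(gi * di for gi, di in zip(grad(x), d))
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + sigma1 * alpha * slope:   # Armijo condition (2.3)
            return alpha
        alpha *= shrink
    return alpha

# Example: f(x) = x1^2 + 4 x2^2 with the steepest-descent direction d = -g.
f = lambda x: x[0] ** 2 + 4 * x[1] ** 2
grad = lambda x: [2 * x[0], 8 * x[1]]
x0 = [1.0, 1.0]
d = [-gi for gi in grad(x0)]
alpha = armijo_backtracking(f, grad, x0, d)
```

Backtracking alone enforces only (2.3); enforcing the curvature condition (2.4) as well requires a more elaborate bracketing search.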
2.1.2 Trust-region methods

Trust-region methods were first developed for nonlinear least-squares problems [32, 37, 45]. When $f(x)$ is twice continuously differentiable, Taylor's theorem gives

$$f(x + d) = f + g^T d + \tfrac{1}{2} d^T H d + O(\|d\|^3).$$

Thus the quadratic model function $m_k$ used at each iterate $x_k$ is

$$m_k(d) = f_k + g_k^T d + \tfrac{1}{2} d^T H_k d. \tag{2.5}$$

The fundamental idea of trust-region methods is to define a trust-region radius $\Delta_k$ for each subproblem. At each iteration, we seek a solution $d_k$ of the subproblem based on the quadratic model (2.5), restricted to the trust region:

$$\min_{d \in \mathbb{R}^n} \ m_k(d) \quad \text{subject to} \quad \|d\|_2 \le \Delta_k. \tag{2.6}$$

One of the key ingredients in a trust-region algorithm is the strategy for choosing the trust-region radius $\Delta_k$ at each iteration. Given a step $d_k$, we define the quantity

$$\rho_k = \frac{f(x_k) - f(x_k + d_k)}{m_k(0) - m_k(d_k)}, \tag{2.7}$$

which gives the ratio between the actual reduction and the predicted reduction. The actual reduction represents the actual decrease of the objective function for the trial step $d_k$. The predicted reduction in the denominator of (2.7) is the reduction predicted by the model function $m_k$. The choice of $\Delta_k$ is at least partially determined by the ratio $\rho_k$ at previous iterations. Note that the predicted reduction should always be nonnegative. If $\rho_k$ is negative, the new objective value $f(x_k + d_k)$ is greater than the current value $f_k$, so the step must be rejected. If $\rho_k$ is close to 1, there is good agreement between the model and the function over the step, so it is safe to expand the trust region for the next iteration. If $\rho_k$ is positive but significantly smaller than 1, we do not alter the trust region; but if it is close to zero or negative, we shrink the trust region at the next iteration.
Given some constants $\eta_1$, $\eta_2$, $\gamma_1$, and $\gamma_2$ that satisfy

$$0 < \eta_1 \le \eta_2 < 1 \quad \text{and} \quad 0 < \gamma_1 \le \gamma_2 < 1,$$

we can update the trust-region radius by

$$\Delta_{k+1} \in \begin{cases} (\Delta_k, \infty), & \text{if } \rho_k > \eta_2, \\ [\gamma_2 \Delta_k, \Delta_k], & \text{if } \rho_k \in [\eta_1, \eta_2], \\ (\gamma_1 \Delta_k, \gamma_2 \Delta_k), & \text{if } \rho_k < \eta_1; \end{cases} \tag{2.8}$$

see, e.g., Conn, Gould, and Toint [9, Chapter 7]. Note that (2.6) can be solved equivalently as

$$\min_{d \in \mathbb{R}^n} \ g_k^T d + \tfrac{1}{2} d^T (H_k + \lambda_k I) d,$$

for some positive $\lambda_k$ that is larger than the negative of the leftmost eigenvalue of $H_k$. Thus, the solution satisfies

$$(H_k + \lambda_k I) d = -g_k. \tag{2.9}$$

The relationship between (2.6) and (2.9) is used in Chapter 4 when we describe the BCNLS algorithm. For a summary of trust-region methods, see Nocedal and Wright [47, Chapter 4].
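The ratio (2.7) and the radius rule (2.8) translate directly into code. In the sketch below the thresholds and the expansion/shrink factors are illustrative choices drawn from the intervals that (2.8) allows (e.g. $\eta_1 = 0.25$, $\eta_2 = 0.75$, doubling on expansion, halving on shrink); they are not values fixed by the text:

```python
def rho(f_x, f_trial, m0, m_d):
    """Ratio (2.7) of actual reduction to predicted reduction."""
    return (f_x - f_trial) / (m0 - m_d)

def update_radius(delta, rho_k, eta1=0.25, eta2=0.75):
    """One concrete instance of the radius update rule (2.8).

    The factor 2 lies in (delta, inf), keeping delta lies in
    [gamma2*delta, delta], and halving lies in (gamma1*delta, gamma2*delta)
    for, say, gamma1 = 0.25 and gamma2 = 0.75.
    """
    if rho_k > eta2:
        return 2.0 * delta       # good agreement: expand the trust region
    elif rho_k >= eta1:
        return delta             # acceptable agreement: keep the radius
    else:
        return 0.5 * delta       # poor agreement: shrink the trust region
```

A full trust-region loop would also reject the step (keeping $x_{k+1} = x_k$) whenever $\rho_k$ is negative, as the surrounding text describes.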
2.2 Bound-constrained optimization

Bound-constrained optimization problems have the general form

$$\min_{x \in \mathbb{R}^n} \ f(x) \quad \text{subject to} \quad \ell \le x \le u, \tag{2.10}$$

where $\ell, u \in \mathbb{R}^n$ are lower and upper bounds on the variables $x$. The feasible region is often called a "box" because of its rectangular shape. Note that both the lower and upper bounds may be optional; when some components of $x$ lack an upper or a lower bound, we set the appropriate components of $\ell$ or $u$ to $-\infty$ or $+\infty$, respectively.
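Methods for (2.10), such as the gradient-projection methods mentioned in Chapter 1, repeatedly project trial points back onto the box. A minimal sketch of that componentwise projection (the function name is ours):

```python
def project_onto_box(x, lower, upper):
    """Componentwise projection of x onto the box {z : lower <= z <= upper}.

    Missing bounds are represented by -inf / +inf, matching the
    convention used for (2.10).
    """
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lower, upper)]

inf = float("inf")
x = [2.5, -3.0, 0.4]
lower = [0.0, -1.0, -inf]
upper = [1.0, 1.0, inf]
p = project_onto_box(x, lower, upper)   # -> [1.0, -1.0, 0.4]
```

Components already inside the box pass through unchanged, so the projection is the identity on the feasible region.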
We define the set of binding constraints [20, 53] by

$$B(x^*) = \left\{ i : x_i^* = \ell_i \text{ and } g(x^*)_i \ge 0, \ \text{or} \ x_i^* = u_i \text{ and } g(x^*)_i \le 0, \ \text{or} \ \ell_i = u_i \right\}.$$

Furthermore, the strictly binding set is defined as

$$B_s(x^*) = B(x^*) \cap \{ i : g(x^*)_i \ne 0 \}.$$