
Numer Algor (2016) 72:425–433

DOI 10.1007/s11075-015-0053-z

ORIGINAL PAPER

A modified scaling parameter for the memoryless BFGS updating formula

Saman Babaie–Kafaki1

Received: 21 March 2015 / Accepted: 2 September 2015 / Published online: 10 September 2015
© Springer Science+Business Media New York 2015

Abstract Based on an eigenvalue analysis, the condition number of the scaled memoryless BFGS (Broyden–Fletcher–Goldfarb–Shanno) updating formula is obtained. Then, a modified scaling parameter is proposed for this updating formula, minimizing the given condition number. The suggested scaling parameter can be considered a modified version of the self–scaling parameter proposed by Oren and Spedicato. Numerical experiments demonstrate the practical effectiveness of the proposed scaling parameter.

Keywords Unconstrained optimization · Large–scale optimization · Memoryless BFGS update · Eigenvalue · Condition number

Mathematics Subject Classification (2010) 90C53 · 49M37 · 65F15

1 Introduction

Unconstrained optimization problems, comprising a class of optimization problems of great significance, have the form
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$
where the objective function $f: \mathbb{R}^n \to \mathbb{R}$ is here assumed to be continuously differentiable. As is well known, quasi–Newton methods are useful tools for solving (1.1) because

Saman Babaie–Kafaki
sbk@semnan.ac.ir

1 Department of Mathematics, Faculty of Mathematics, Statistics and Computer Science, Semnan University, P.O. Box: 35195–363, Semnan, Iran

they do not require an explicit expression of the (inverse) Hessian and are often globally and locally superlinearly convergent [26].
Iterations of quasi–Newton methods have the following form:
$$x_0 \in \mathbb{R}^n, \quad x_{k+1} = x_k + s_k, \quad s_k = \alpha_k d_k, \quad k = 0, 1, \ldots, \qquad (1.2)$$
where $\alpha_k$ is a step length, often determined to satisfy the Wolfe conditions [26], i.e.,
$$f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k \nabla f(x_k)^T d_k, \qquad (1.3)$$
$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma \nabla f(x_k)^T d_k, \qquad (1.4)$$
with $0 < \delta < \sigma < 1$, and $d_k$ is the search direction computed by
$$d_k = -H_k g_k,$$
in which $g_k = \nabla f(x_k)$ and $H_k \in \mathbb{R}^{n \times n}$ is an approximation of the inverse Hessian; more exactly, $H_k \approx \nabla^2 f(x_k)^{-1}$. The significant step of quasi–Newton methods is updating the matrix $H_k$ to achieve a new approximation $H_{k+1}$ of $\nabla^2 f(x_{k+1})^{-1}$, generally in the following form:
$$H_{k+1} = H_k + \Delta H_k,$$
where $\Delta H_k$ is a correction matrix. The matrix $H_{k+1}$ is required to satisfy a particular equation, namely the secant (quasi–Newton) equation, which implicitly includes the Hessian information. Among the most popular secant equations is the standard secant equation, that is,
$$H_{k+1} y_k = s_k,$$
in which $y_k = g_{k+1} - g_k$.
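As an illustration of the Wolfe conditions (1.3) and (1.4) above, the following minimal Python sketch checks whether a trial step length is acceptable; the function and argument names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def satisfies_wolfe(f, grad, x_k, d_k, alpha, delta=1e-4, sigma=0.9):
    """Check the Wolfe conditions (1.3)-(1.4) for a trial step length alpha,
    given callables f and grad, the iterate x_k and the direction d_k.
    Requires 0 < delta < sigma < 1."""
    slope = grad(x_k) @ d_k                     # directional derivative at x_k
    x_trial = x_k + alpha * d_k
    armijo = f(x_trial) - f(x_k) <= delta * alpha * slope       # condition (1.3)
    curvature = grad(x_trial) @ d_k >= sigma * slope            # condition (1.4)
    return armijo and curvature
```

In practice, a line search routine such as Algorithm 3.5 of [22] generates trial values of the step length until such a test is passed.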
One of the well–known quasi–Newton updating formulae is the BFGS formula, given by
$$H_{k+1}^{BFGS} = H_k - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right)\frac{s_k s_k^T}{s_k^T y_k}, \qquad (1.5)$$
which is computationally superior to the other formulae [26]. It can be seen that if $H_k$ is a positive definite matrix and the line search ensures that $s_k^T y_k > 0$, as guaranteed by the Wolfe conditions (1.3) and (1.4), then $H_{k+1}^{BFGS}$ is also a positive definite matrix [26] and, consequently, the generated search direction is a descent direction. Also, under a convexity assumption on the objective function, the BFGS method has been shown to be globally and locally superlinearly convergent [26]. It is worth noting that Li and Fukushima [20] recently proposed a modified BFGS method which is globally and locally superlinearly convergent even for nonconvex objective functions.
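The secant and positive-definiteness properties of (1.5) can be verified numerically; the following sketch (with illustrative names, not taken from the paper) applies one BFGS update to random data with positive curvature.

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation, formula (1.5);
    H is assumed symmetric, so s y^T H = outer(s, H y)."""
    sy = s @ y                          # curvature s_k^T y_k, assumed positive
    Hy = H @ y
    return (H
            - (np.outer(s, Hy) + np.outer(Hy, s)) / sy
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy)

rng = np.random.default_rng(0)
n = 5
H = np.eye(n)
s = rng.standard_normal(n)
y = s + 0.1 * rng.standard_normal(n)    # chosen so that s^T y > 0 in this sketch
H_new = bfgs_update(H, s, y)
print(np.allclose(H_new @ y, s))                # standard secant equation holds
print(np.all(np.linalg.eigvalsh(H_new) > 0))    # positive definiteness preserved
```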
In order to increase the numerical stability of quasi–Newton methods, in the sense of improving the condition number of successive approximations of the inverse Hessian, scaled quasi–Newton methods have been developed [26]. The main aim of the scaling approach is to achieve an ideal distribution of the eigenvalues of the quasi–Newton updating formulae. In an essential scheme, replacing $H_k$ by $\theta_k H_k$ in (1.5), where $\theta_k > 0$ is called the scaling parameter, the scaled BFGS updating formula can be proposed as follows:
$$\hat{H}_{k+1} = \theta_k H_k - \theta_k \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \theta_k \frac{y_k^T H_k y_k}{s_k^T y_k}\right)\frac{s_k s_k^T}{s_k^T y_k}. \qquad (1.6)$$
The most popular choices for $\theta_k$ in (1.6) have been suggested by Oren and Spedicato [25],
$$\theta_k = \frac{s_k^T y_k}{y_k^T H_k y_k}, \qquad (1.7)$$
and by Oren and Luenberger [24],
$$\theta_k = \frac{s_k^T H_k^{-1} s_k}{s_k^T y_k}, \qquad (1.8)$$
(see also [23]). Note that a scaled BFGS updating formula in the form of (1.6) with
one of the parameters (1.7) or (1.8) is called a self–scaling BFGS updating formula.
An important disadvantage of the self–scaling BFGS method, making it improper for solving large–scale problems, is that the matrix $H_k \in \mathbb{R}^{n \times n}$ must be stored in each iteration. To overcome this defect, replacing $H_k$ by the identity matrix in (1.6), the self–scaling memoryless BFGS updating formula has been proposed as follows:
$$\bar{H}_{k+1} = \theta_k I - \theta_k \frac{s_k y_k^T + y_k s_k^T}{s_k^T y_k} + \left(1 + \theta_k \frac{y_k^T y_k}{s_k^T y_k}\right)\frac{s_k s_k^T}{s_k^T y_k}. \qquad (1.9)$$
Similarly, the memoryless versions of the scaling parameters (1.7) and (1.8) are respectively given by
$$\theta_k = \frac{s_k^T y_k}{\|y_k\|^2}, \qquad (1.10)$$
and
$$\theta_k = \frac{\|s_k\|^2}{s_k^T y_k}, \qquad (1.11)$$
where $\|\cdot\|$ stands for the Euclidean norm. Thus, the search direction $d_{k+1} = -\bar{H}_{k+1} g_{k+1}$ can be computed by a few inner products.
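To make this matrix-free computation concrete, the following sketch evaluates $d_{k+1} = -\bar{H}_{k+1} g_{k+1}$ directly from (1.9) using only vector operations; the function and argument names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def memoryless_bfgs_direction(g_new, s, y, theta):
    """Search direction d_{k+1} = -H_bar_{k+1} g_{k+1}, with H_bar_{k+1} given by (1.9),
    computed without forming any n-by-n matrix (only a few inner products).
    theta may be chosen, e.g., by (1.10), (1.11) or, later in the paper, (2.7)."""
    sy = s @ y                      # curvature s_k^T y_k, assumed positive
    sg = s @ g_new                  # s_k^T g_{k+1}
    yg = y @ g_new                  # y_k^T g_{k+1}
    return (-theta * g_new
            + theta * (sg * y + yg * s) / sy
            - (1.0 + theta * (y @ y) / sy) * sg * s / sy)
```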
Recently, the scaled memoryless BFGS updating formula has attracted special atten-
tion. For example, Andrei [1–5] dealt with the scaled memoryless BFGS precon-
ditioned conjugate gradient methods based on the updating formula (1.9) with
the scaling parameter (1.11). Then, for uniformly convex objective functions [26],
Babaie–Kafaki [8, 11] established the sufficient descent property for the methods
suggested in [1–5]. To employ function values in addition to the gradient information,
Babaie–Kafaki [10, 12] made some modifications to Andrei's approach, using
the modified secant equations proposed by Zhang et al. [31, 32] (see also [15, 30])
and Wei et al. [28] (see also [7, 21, 29]). To achieve global convergence as well as
sufficient descent property without convexity assumption on the objective function,
Babaie–Kafaki and Ghanbari [14] suggested a memoryless BFGS method using the
modified secant equation proposed by Li and Fukushima [20] (see also [33]). More-
over, Dai and Kou [16] proposed a family of nonlinear conjugate gradient methods
based on the scaled memoryless BFGS updating formula (1.9). More recently, Kou
and Dai [19] proposed a modified self–scaling memoryless BFGS method which
satisfies the sufficient descent condition without convexity assumption on the objec-
tive function. Also, Babaie–Kafaki [13] dealt with optimality of the self–scaling
parameters (1.7) and (1.8) for the memoryless quasi–Newton updating formulae.
Here, as a result of an eigenvalue analysis, a scaling parameter is suggested for
the memoryless BFGS updating formula which can be regarded as a modified ver-
sion of the scaling parameter (1.10). This work is organised as follows. In Section 2,
a modified scaled memoryless BFGS method is proposed. The method is numeri-
cally compared with the scaled memoryless BFGS methods proposed by Oren et al.
[24, 25] in Section 3. Finally, conclusions are drawn in Section 4.

2 A modified scaled memoryless BFGS method

In order to deal with our modified scaling parameter, it is necessary to conduct an


eigenvalue analysis on the matrix $\bar{H}_{k+1}$ defined by (1.9). Hereafter, we assume that for all $k \ge 0$, the scaling parameter $\theta_k$ and the curvature $s_k^T y_k$ are positive. Hence, $\bar{H}_{k+1}$ is a positive definite matrix [26].
Since $s_k^T y_k > 0$, we have $s_k \ne 0$ and $y_k \ne 0$. So, there exists a set of mutually orthonormal vectors $\{u_k^i\}_{i=1}^{n-2}$ such that
$$s_k^T u_k^i = y_k^T u_k^i = 0, \quad i = 1, \ldots, n-2,$$
which yields
$$\bar{H}_{k+1} u_k^i = \theta_k u_k^i, \quad i = 1, \ldots, n-2.$$
That is, $\bar{H}_{k+1}$ has $n-2$ eigenvalues equal to $\theta_k$. Next, we find the two remaining eigenvalues of $\bar{H}_{k+1}$, namely $\lambda_k^-$ and $\lambda_k^+$.

As known, the trace of $\bar{H}_{k+1}$ is equal to the sum of its eigenvalues. Thus, we get
$$\lambda_k^- + \lambda_k^+ = \left(1 + \theta_k \frac{\|y_k\|^2}{s_k^T y_k}\right)\frac{\|s_k\|^2}{s_k^T y_k}. \qquad (2.1)$$
On the other hand, from the Sherman–Morrison formula [26], it can be seen that
$$\bar{H}_{k+1}^{-1} = \frac{1}{\theta_k} I - \frac{1}{\theta_k}\frac{s_k s_k^T}{s_k^T s_k} + \frac{y_k y_k^T}{s_k^T y_k}.$$

Note that $\bar{H}_{k+1}^{-1}$ has $n-2$ eigenvalues equal to $\theta_k^{-1}$, and two other eigenvalues equal to $(\lambda_k^-)^{-1}$ and $(\lambda_k^+)^{-1}$. Hence,
$$\det(\bar{H}_{k+1}^{-1}) = \frac{1}{\theta_k^{n-2}} \times \frac{1}{\lambda_k^- \lambda_k^+}. \qquad (2.2)$$

Also, $\bar{H}_{k+1}^{-1}$ is a rank–two update of the matrix $\frac{1}{\theta_k} I$. Therefore, considering equality (1.2.70) of [26], we get
$$\det(\bar{H}_{k+1}^{-1}) = \frac{1}{\theta_k^{n-1}} \times \frac{s_k^T y_k}{\|s_k\|^2},$$
which together with (2.2) yields
$$\lambda_k^- \lambda_k^+ = \theta_k \frac{\|s_k\|^2}{s_k^T y_k}. \qquad (2.3)$$
Now, considering (2.1) and (2.3), the eigenvalues $\lambda_k^-$ and $\lambda_k^+$ can be computed by
$$\lambda_k^\pm = \frac{1}{2}\left(1 + \theta_k \frac{\|y_k\|^2}{s_k^T y_k}\right)\frac{\|s_k\|^2}{s_k^T y_k} \pm \frac{1}{2}\sqrt{\left(1 + \theta_k \frac{\|y_k\|^2}{s_k^T y_k}\right)^2\frac{\|s_k\|^4}{(s_k^T y_k)^2} - 4\theta_k\frac{\|s_k\|^2}{s_k^T y_k}}.$$
Moreover, after some algebraic manipulations it can be seen that
$$\left(1 + \theta_k \frac{\|y_k\|^2}{s_k^T y_k}\right)\frac{\|s_k\|^2}{s_k^T y_k} - 2\theta_k \le \sqrt{\left(1 + \theta_k \frac{\|y_k\|^2}{s_k^T y_k}\right)^2\frac{\|s_k\|^4}{(s_k^T y_k)^2} - 4\theta_k\frac{\|s_k\|^2}{s_k^T y_k}},$$
and as a result,
$$0 < \lambda_k^- \le \theta_k \le \lambda_k^+. \qquad (2.4)$$
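The spectrum derived above can be checked numerically; the sketch below (illustrative, not from the paper) forms $\bar{H}_{k+1}$ from (1.9) for random data and compares its eigenvalues with $\theta_k$ (multiplicity $n-2$) and the closed-form values $\lambda_k^\pm$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
s = rng.standard_normal(n)
y = s + 0.2 * rng.standard_normal(n)       # chosen so that s^T y > 0 here
theta = (s @ y) / (y @ y)                  # e.g. the Oren-Spedicato value (1.10)

sy, ss, yy = s @ y, s @ s, y @ y
H_bar = (theta * np.eye(n)
         - theta * (np.outer(s, y) + np.outer(y, s)) / sy
         + (1.0 + theta * yy / sy) * np.outer(s, s) / sy)      # formula (1.9)

# Closed-form pair: sum from (2.1), product from (2.3).
trace_part = (1.0 + theta * yy / sy) * ss / sy
disc = np.sqrt(trace_part ** 2 - 4.0 * theta * ss / sy)
lam_minus, lam_plus = (trace_part - disc) / 2, (trace_part + disc) / 2

spectrum = np.sort(np.concatenate(([lam_minus, lam_plus], np.full(n - 2, theta))))
print(np.allclose(spectrum, np.linalg.eigvalsh(H_bar)))        # expected: True
```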
As known, the matrix condition number plays an important role in the error analysis of a numerical problem related to a matrix, in the sense of measuring the sensitivity of the solution to data perturbations [27]. A matrix with a large condition number is called ill–conditioned. To compute the condition number of the scaled memoryless BFGS updating formula (1.9), note that from inequality (2.4) we have $\|\bar{H}_{k+1}\| = \lambda_k^+$ and $\|\bar{H}_{k+1}^{-1}\| = (\lambda_k^-)^{-1}$. So,
$$\kappa(\bar{H}_{k+1}) = \|\bar{H}_{k+1}\| \times \|\bar{H}_{k+1}^{-1}\| = \frac{\lambda_k^+}{\lambda_k^-}, \qquad (2.5)$$
where $\kappa(\cdot)$ stands for the spectral condition number. As seen, the greater the distance between $\lambda_k^-$ and $\lambda_k^+$, the larger $\kappa(\bar{H}_{k+1})$ becomes. Thus, in order to enhance numerical stability of the scaled memoryless BFGS method, it is reasonable to compute $\theta_k$ in (1.9) by minimizing $\kappa(\bar{H}_{k+1})$. In a simple scheme, considering (2.4) and (2.5), an optimal value for the scaling parameter $\theta_k$, namely $\theta_k^*$, can be computed as follows:
$$\theta_k^* = \arg\min_{\theta_k} (\lambda_k^+ - \lambda_k^-),$$

making $\lambda_k^-$ as close as possible to $\lambda_k^+$ and, as a result, making $\kappa(\bar{H}_{k+1}) \ge 1$ as close as possible to 1. More exactly,
$$\theta_k^* = \frac{s_k^T y_k}{\|y_k\|^2}\left(\frac{2(s_k^T y_k)^2}{\|s_k\|^2 \|y_k\|^2} - 1\right). \qquad (2.6)$$

As an important issue, the scaling parameter $\theta_k^*$ may be a nonpositive scalar. More precisely, if $\frac{(s_k^T y_k)^2}{\|s_k\|^2 \|y_k\|^2} \le \frac{1}{2}$, then $\theta_k^* \le 0$. In such a situation, we employ the scaling parameter (1.10) proposed by Oren and Spedicato [25]. In other words, we use the following hybridization of the scaling parameters (1.10) and (2.6):
$$\bar{\theta}_k^* = \frac{s_k^T y_k}{\|y_k\|^2} \begin{cases} \dfrac{2(s_k^T y_k)^2}{\|s_k\|^2 \|y_k\|^2} - 1, & \dfrac{(s_k^T y_k)^2}{\|s_k\|^2 \|y_k\|^2} > \dfrac{1}{2}, \\[2mm] 1, & \text{otherwise}. \end{cases} \qquad (2.7)$$
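A direct implementation of the hybrid rule (2.7) needs only three inner products per iteration; the following sketch is a minimal illustration with assumed function and variable names, not taken from the paper.

```python
import numpy as np

def modified_scaling_parameter(s, y):
    """Hybrid scaling parameter of (2.7): the condition-number minimizer (2.6)
    when it is positive, and the Oren-Spedicato value (1.10) otherwise."""
    sy = s @ y                                # curvature s_k^T y_k, assumed positive
    cos2 = sy ** 2 / ((s @ s) * (y @ y))      # squared cosine of the angle between s_k and y_k
    if cos2 > 0.5:
        return (sy / (y @ y)) * (2.0 * cos2 - 1.0)    # theta_k^* of (2.6)
    return sy / (y @ y)                               # fall back to (1.10)
```

Combined with a matrix-free direction routine such as the one sketched after (1.11), this yields the iteration studied in Section 3, up to the line search.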

Remark 2.1 Another well–known quasi–Newton updating formula is the DFP (Davidon–Fletcher–Powell) formula, which has a dual relationship with the BFGS updating formula [26]. Hence, a similar eigenvalue analysis can be carried out for the scaled memoryless DFP updating formula [13].

3 Numerical experiments

Here, we present some numerical results obtained by applying a MATLAB 7.7.0.471 (R2008b) implementation of the scaled memoryless BFGS method with the updating formula (1.9), in which the scaling parameter $\theta_k$ is computed by the Oren–Spedicato formula (1.10), the Oren–Luenberger formula (1.11) and the formula (2.7); the corresponding variants are respectively abbreviated to SBFGS–OS, SBFGS–OL and SBFGS–N. The runs were performed on a set of 105 unconstrained optimization test problems of the CUTEr collection [18] with minimum dimension equal to 100, as specified in [9], using an Intel(R) Core(TM)2 Duo 2.00 GHz computer with 1 GB of RAM.

Fig. 1 Total number of function and gradient evaluations performance profiles for SBFGS–N, SBFGS–OS and SBFGS–OL

Fig. 2 CPU time performance profiles for SBFGS–N, SBFGS–OS and SBFGS–OL
In the line search procedure, the Wolfe conditions (1.3) and (1.4) are used with
δ = 0.0001 and σ = 0.9, and the step length αk is computed using Algorithm 3.5
of [22]. Moreover, all attempts to find an approximation of the solution were terminated upon reaching a maximum of 10000 iterations or achieving a solution with $\|g_k\|_\infty < 10^{-6}(1 + |f(x_k)|)$.
Efficiency comparisons were drawn using the Dolan–Moré performance profile [17] on the running time and on the total number of function and gradient evaluations, computed as $N_f + 3N_g$, where $N_f$ and $N_g$ respectively denote the number of function and gradient evaluations. The performance profile gives, for every $\omega \ge 1$, the proportion $p(\omega)$ of the test problems on which each considered algorithmic variant has a performance within a factor of $\omega$ of the best. Figures 1 and 2 demonstrate the results of the comparisons.
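For completeness, a performance profile in the sense of Dolan and Moré [17] can be computed as in the brief sketch below; the array layout and names are assumptions made for illustration, not a description of the authors' code.

```python
import numpy as np

def performance_profile(T, omegas):
    """Dolan-Moré performance profiles.

    T      : (n_problems, n_solvers) array of positive performance measures
             (e.g. CPU time or N_f + 3 N_g), with np.inf marking failures;
             each problem is assumed solved by at least one variant.
    omegas : one-dimensional array of factors omega >= 1.
    Returns a (len(omegas), n_solvers) array whose rows give p(omega) per solver."""
    ratios = T / T.min(axis=1, keepdims=True)    # ratio to the best solver on each problem
    return np.array([(ratios <= w).mean(axis=0) for w in omegas])
```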
As seen, SBFGS–OS outperforms SBFGS–OL while SBFGS–N is preferable to
SBFGS–OS with respect to the total number of function and gradient evaluations as
well as with respect to the running time. This provides promising practical support
for our theoretical discussion. As a detailed observation for the SBFGS–N method, it is notable that, on average, in 38.45% of the iterations we have $\bar{\theta}_k^* = \theta_k^*$, with $\theta_k^*$ and $\bar{\theta}_k^*$ respectively given by (2.6) and (2.7).

4 Conclusions

Computation of the parameter of the scaled memoryless BFGS updating formula by decreasing its condition number has been investigated through an eigenvalue analysis. Considering the importance of scaled memoryless BFGS preconditioning for conjugate gradient methods, this study can be regarded as an effort to find a reasonable solution to the 8th open problem in nonlinear conjugate gradient methods posed by Andrei [6]. Our discussion led to a scaling parameter which can be regarded as a modified version of the Oren–Spedicato parameter. Numerical results showed that the proposed scaling parameter is practically effective.

Acknowledgments This research was supported by the Research Council of Semnan University. The author is grateful to Professor Michael Navon for providing the line search code. He also thanks the anonymous reviewers for their valuable comments and suggestions, which helped to improve the quality of this work.

References

1. Andrei, N.: A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimiza-
tion. Appl. Math. Lett. 20(6), 645–650 (2007)
2. Andrei, N.: Scaled conjugate gradient algorithms for unconstrained optimization. Comput. Optim.
Appl. 38(3), 401–416 (2007)
3. Andrei, N.: Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained
optimization. Optim. Methods Softw. 22(4), 561–571 (2007)
4. Andrei, N.: A scaled nonlinear conjugate gradient algorithm for unconstrained optimization. Opti-
mization 57(4), 549–570 (2008)
5. Andrei, N.: Accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for
unconstrained optimization. European J. Oper. Res. 204(3), 410–420 (2010)
6. Andrei, N.: Open problems in conjugate gradient algorithms for unconstrained optimization. Bull. Malays. Math. Sci. Soc. 34(2), 319–330 (2011)
7. Babaie–Kafaki, S.: A modified BFGS algorithm based on a hybrid secant equation. Sci. China Math.
54(9), 2019–2036 (2011)
8. Babaie–Kafaki, S.: A note on the global convergence theorem of the scaled conjugate gradient
algorithms proposed by Andrei. Comput. Optim. Appl. 52(2), 409–414 (2012)
9. Babaie–Kafaki, S.: A quadratic hybridization of Polak–Ribière–Polyak and Fletcher–Reeves conju-
gate gradient methods. J. Optim. Theory Appl. 154(3), 916–932 (2012)
10. Babaie–Kafaki, S.: A modified scaled memoryless BFGS preconditioned conjugate gradient method
for unconstrained optimization. 4OR 11(4), 361–374 (2013)
11. Babaie–Kafaki, S.: A new proof for the sufficient descent condition of Andrei’s scaled conjugate
gradient algorithms. Pac. J. Optim. 9(1), 23–28 (2013)
12. Babaie–Kafaki, S.: Two modified scaled nonlinear conjugate gradient methods. J. Comput. Appl.
Math. 261(5), 172–182 (2014)
13. Babaie–Kafaki, S.: On optimality of the parameters of self–scaling memoryless quasi–Newton
updating formulae. J. Optim. Theory Appl. (2015). doi:10.1007/s10957-015-0724-x
14. Babaie–Kafaki, S., Ghanbari, R.: A modified scaled conjugate gradient method with global conver-
gence for nonconvex functions. Bull. Belg. Math. Soc. Simon Stevin 21(3), 465–477 (2014)
15. Babaie–Kafaki, S., Ghanbari, R., Mahdavi–Amiri, N.: Two new conjugate gradient methods based on
modified secant equations. J. Comput. Appl. Math. 234(5), 1374–1386 (2010)
16. Dai, Y.H., Kou, C.X.: A nonlinear conjugate gradient algorithm with an optimal property and an
improved Wolfe line search. SIAM J. Optim. 23(1), 296–320 (2013)
17. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2, Ser. A), 201–213 (2002)
18. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEr: a constrained and unconstrained testing environment,
revisited. ACM Trans. Math. Softw. 29(4), 373–394 (2003)
19. Kou, C.X., Dai, Y.H.: A modified self–scaling memoryless Broyden–Fletcher–Goldfarb–Shanno
method for unconstrained optimization. J. Optim. Theory Appl. 165(1), 209–224 (2015)

20. Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex
minimization. J. Comput. Appl. Math. 129(1–2), 15–35 (2001)
21. Li, G., Tang, C., Wei, Z.: New conjugacy condition and related new conjugate gradient methods for
unconstrained optimization. J. Comput. Appl. Math. 202(2), 523–539 (2007)
22. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
23. Oren, S.S.: Self–scaling variable metric (SSVM) algorithms. II. Implementation and experiments.
Management Sci. 20(5), 863–874 (1974)
24. Oren, S.S., Luenberger, D.G.: Self–scaling variable metric (SSVM) algorithms. I. Criteria and
sufficient conditions for scaling a class of algorithms. Management Sci. 20(5), 845–862 (1973/74)
25. Oren, S.S., Spedicato, E.: Optimal conditioning of self–scaling variable metric algorithms. Math.
Program. 10(1), 70–90 (1976)
26. Sun, W., Yuan, Y.X.: Optimization Theory and Methods: Nonlinear Programming. Springer, New
York (2006)
27. Watkins, D.S.: Fundamentals of Matrix Computations. Wiley, New York (2002)
28. Wei, Z., Li, G., Qi, L.: New quasi–Newton methods for unconstrained optimization problems. Appl.
Math. Comput. 175(2), 1156–1188 (2006)
29. Yuan, Y.X.: A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 11(3),
325–332 (1991)
30. Yuan, Y.X., Byrd, R.H.: Non–quasi–Newton updates for unconstrained optimization. J. Comput.
Math. 13(2), 95–107 (1995)
31. Zhang, J., Xu, C.: Properties and numerical performance of quasi–Newton methods with modified
quasi–Newton equations. J. Comput. Appl. Math. 137(2), 269–278 (2001)
32. Zhang, J.Z., Deng, N.Y., Chen, L.H.: New quasi–Newton equation and related methods for uncon-
strained optimization. J. Optim. Theory Appl. 102(1), 147–167 (1999)
33. Zhou, W., Zhang, L.: A nonlinear conjugate gradient method based on the MBFGS secant condition.
Optim. Methods Softw. 21(5), 707–714 (2006)
