
Printed as manuscript

Technische Universität Dresden


Institut für Numerische Mathematik

An LP-Newton Method: Nonsmooth Equations,
KKT Systems, and Nonisolated Solutions

F. Facchinei, A. Fischer, and M. Herrich

MATH–NM–5–2011
September 2011
An LP-Newton Method: Nonsmooth Equations,
KKT Systems, and Nonisolated Solutions

Francisco Facchinei
Department of Computer and
System Sciences Antonio Ruberti
Università di Roma “La Sapienza”
Via Ariosto 25, 00185 Rome, Italy
facchinei@dis.uniroma1.it

Andreas Fischer1 and Markus Herrich


Institute of Numerical Mathematics
Department of Mathematics
Technische Universität Dresden
01062 Dresden, Germany
Andreas.Fischer@tu-dresden.de
Markus.Herrich@tu-dresden.de

September 15, 2011

Abstract. We define a new Newton-type method for the solution of constrained systems of
equations and analyze its properties in detail. Under suitable conditions that do not include
differentiability or local uniqueness of solutions, the method converges locally quadratically
to a solution of the system of equations, thus filling an important gap in the existing theory.
The new algorithm improves on known methods and, when particularized to KKT systems
deriving from optimality conditions for constrained optimization or variational inequalities, it
has theoretical advantages even over methods specifically designed to solve such systems.

Keywords. Quadratic convergence, Newton method, nonsmooth system, nonisolated solution,
KKT system

Mathematics Subject Classification (2010). 90C30, 90C33, 49M15, 65K05

1 Part of this research was done while this author was visiting the Department of Computer and System Sciences

Antonio Ruberti at the University of Rome La Sapienza. The financial support by the University of Rome La
Sapienza is gratefully acknowledged.

1 Introduction
In this paper we develop a fast, local method for the solution of the constrained system of
equations
F(z) = 0, z ∈ Ω, (1)
where Ω ⊆ Rn is a nonempty and closed set and F : Rn → Rm is a given continuous map.
This problem has a long and glorious history and its importance cannot be overestimated.
To put this paper in perspective it might be useful to present a very short review of results. We
begin by considering the most classical case of problem (1): the solution of a square system of
equations, which corresponds to n = m and Ω = Rn , that is we consider first the solution of

F(z) = 0 (2)

with F : Rn → Rn . The prototype of all fast, local algorithms is Newton’s method which starts
at a point z0 “close to a solution” and generates a sequence {zk } by setting zk+1 equal to the
solution of the linear system
F 0 (zk )(z − zk ) = −F(zk ), (3)
where F 0 denotes the Jacobian of F. The classical result for this method states that if z∗ is
a solution of (2), F is twice continuously differentiable in a neighborhood of z∗ , and F 0 (z∗ )
is nonsingular, then {zk } converges quadratically to z∗ provided that z0 belongs to a suitably
small neighborhood of z∗ (throughout this paper all convergence rates are quotient rates, so that
quadratic stands for Q-quadratic). This result is the cornerstone for the development of a host
of variants and relaxations which all assume at least the continuous differentiability of F and
the nonsingularity of the Jacobian of F at z∗ .
However, in the optimization community the need for results that go beyond the classical
ones has long been felt. The development of new applicative fields and the refinements of the
analysis of standard settings all pointed to the need to extend the classical Newton method in at
least three directions:

1. Relaxation of the differentiability assumption;

2. Relaxation of the nonsingularity assumption;

3. The ability to find a solution lying in some prescribed set Ω.

The differentiability issue is probably the one that has attracted the most attention, due to its
practical importance. In fact it is well understood that nondifferentiable systems of equations
arise quite naturally both when modeling natural phenomena and in the study of problems that
are usually thought of as “smooth”, for example the Kojima reformulation of the KKT system
of an optimization problem or the equation reformulations of complementarity systems. We
cannot go into the details of these developments, but it is safe to say that, computationally,
the big breakthrough has been the development of semismooth Newton methods ([26, 30, 31]
and [12] for more bibliographical references). For example, if the function F is assumed to be
strongly semismooth the iteration (3) can be substituted by

Vk (z − zk ) = −F(zk ), Vk ∈ ∂ F(zk ),

where ∂ F(zk ) denotes the generalized Jacobian of Clarke. Assuming that all the elements in
∂ F(z∗ ) are nonsingular, the semismooth Newton method can be shown to be well defined and
quadratically convergent to z∗ provided that z0 belongs to a suitably small neighborhood of
z∗ . The remarkable things about this semismooth Newton method are its simplicity and for-
mal similarity to the classical Newton method and the fact that when F happens to be twice
continuously differentiable around z∗ it reduces automatically to the classical method.
Advancements on the relaxation of the nonsingularity assumption that are useful in opti-
mization contexts are more recent. They were prompted by the desire to develop fast methods
for the solution of classical problems, typically KKT systems and complementarity problems,
under assumptions that are weaker than traditional ones. For example, the study of the conver-
gence properties of algorithms for the solution of the KKT system of a constrained optimization
problem was traditionally carried out assuming the linear independence of the gradients of the
active constraints. But this is a strong assumption, and as soon as one relaxes it, the solutions
of the KKT system become nonisolated in general, since the multipliers are no longer unique.
It is then clear that in this setting the nonsingularity assumption is not reasonable, since it ob-
viously implies the local uniqueness of the solution. The key to advancements on these issues
was the understanding that crucial to the quadratic convergence of the Newton method is not
the nonsingularity of the Jacobian per se, but rather one of its consequences: the “error bound
condition”. Let Z be the set of solutions of the system F(z) = 0. We say that F provides a local
error bound around some z∗ ∈ Z if there exist positive constants ` and δ such that

dist[s, Z] ≤ `kF(s)k for all s ∈ Bδ (z∗ ),

where Bδ (z∗ ) := {z ∈ Rn | kz − z∗ k ≤ δ } is the closed ball of radius δ around z∗ and
dist[s, Z] := inf{ks − zk | z ∈ Z} denotes the distance of a point s to the solution set Z. Roughly speak-
ing, the error bound condition holds if the function F itself provides an (over-) estimate of the
distance to the solution set for every point sufficiently close to the solution z∗ . Assuming for
simplicity twice continuous differentiability of F and this error bound condition, it is possible
to design a Levenberg-Marquardt method that retains quadratic convergence even in the case of
nonisolated solutions. With this approach, given zk , the new iterate zk+1 is the unique solution
of the following convex quadratic optimization problem

min_z kF(zk ) + F 0 (zk )(z − zk )k22 + µk kz − zk k22 , (4)

where µk is a strictly positive scalar. If the sequence {µk } is chosen appropriately, it can be
shown that the resulting algorithm generates a sequence {zk } that converges quadratically to
a possibly nonisolated solution of (2) [13, 17, 38]; see also [10] for the use of a Levenberg-
Marquardt method to compute an isolated solution of a nonsmooth system of equations. One
important point to note is however that, in spite of the many efforts devoted to this issue, the
semismooth Newton method and the Levenberg-Marquardt method under an error bound con-
dition seem “incompatible”: fine details apart, when F is nondifferentiable, to date there is no
general method with a fast local convergence rate for systems F(z) = 0 with nonisolated solu-
tions under a mere error bound condition. This is a crucial issue since it is by now very clear that
this is precisely the feature one needs in order to make interesting advancements in the solution
of structured systems of equations like those arising from the KKT conditions of optimization
problems, variational inequalities and generalized Nash games (see [12] for more information
about variational inequalities and [11] for a discussion of generalized Nash games).
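
For illustration, the strictly convex subproblem (4) reduces to a linear system via its optimality conditions. The following is a minimal numpy sketch (an illustration only, not the implementation of [13, 17, 38]; F and its Jacobian are assumed to be available as callables, and µk := kF(zk )k2 is taken as one common choice in this literature):

    import numpy as np

    def lm_step(F, J, zk):
        # One Levenberg-Marquardt step: minimize, over d = z - zk,
        # ||F(zk) + F'(zk) d||_2^2 + mu_k ||d||_2^2 with mu_k = ||F(zk)||_2^2.
        Fk, Jk = F(zk), J(zk)
        mu = Fk @ Fk                     # mu_k = ||F(zk)||^2 (one common choice)
        # Normal equations of the strictly convex subproblem (4):
        # (Jk^T Jk + mu I) d = -Jk^T Fk
        d = np.linalg.solve(Jk.T @ Jk + mu * np.eye(zk.size), -Jk.T @ Fk)
        return zk + d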

Turning to the addition of the constraint z ∈ Ω, the subject is still in its infancy. The utility
of limiting the search for solutions to a prescribed set is of obvious practical importance. And
indeed, in many situations it is known a priori that the solutions we are interested in should
belong to some set: for example, the multipliers relative to inequality constraints in a KKT system
should be nonnegative. The presence of a constraint z ∈ Ω could also have a less obvious
technical significance: by restricting the region where we are looking for the solution we could
also obtain gains in terms of differentiability and nonsingularity assumptions that could lead to
improvements with respect to the developments outlined above. This is a rather technical issue
that we don’t discuss further at this point, but that will play an important role in this paper. For
the time being we only mention that the presence of a constraint set Ω, if convex, can easily
be incorporated in the Levenberg-Marquardt approach, essentially by changing the subproblem
(4) to
min kF(zk ) + F 0 (zk )(z − zk )k22 + µk kz − zk k22 , s.t. z ∈ Ω, (5)
z
see [25] and [1]. Other approaches, essentially interior point methods, may also be suited for
the solution of constrained systems of equations, but are not very relevant to our developments,
we refer the interested reader to [12] for further information.
In this paper we propose a new, fast, local algorithm for the solution of the constrained
system of equations (1). The new method is rather different from previous methods. Given an
iterate zk ∈ Ω, we take zk+1 as a solution of the subproblem

min γ
z,γ
z ∈ Ω,
kF(zk ) + G(zk )(z − zk )k ≤ γkF(zk )k2 ,
kz − zk k ≤ γkF(zk )k,
γ ≥ 0,

where G(zk ) is a suitable substitute for the Jacobian of F at zk (if the function F is differentiable
then we can take it to be the Jacobian). If k · k is the infinity norm and if Ω is polyhedral (an
assumption which is satisfied in practically all applications), the above problem is a simple
linear program, whence the name LP-Newton method. The main contributions we provide in
this paper are:

• The definition of the new LP-Newton method and the investigation of its local conver-
gence properties based on a new set of assumptions;

• A thorough analysis of the assumptions under which quadratic convergence can be guar-
anteed, proving that the new method can successfully deal with interesting classes of
constrained nonsmooth systems of equations with nonisolated solutions, thus overcom-
ing one of the main limitations of all Newton-type methods proposed so far;

• A detailed discussion of the applicability of the LP-Newton method to KKT systems
arising from constrained minimization problems or variational inequalities, showing that
the new method compares favorably to existing, specialized methods for KKT systems,
as those in [14, 16, 20, 22, 23, 24, 35, 36, 37].

We also give a few hints to the applicability of the LP-Newton method to a wide range of
problems beyond KKT systems and discuss some numerical examples.
The rest of the paper is organized as follows. Section 2 describes our new method for the
solution of problem (1). It is shown that this method converges quadratically to a solution of
(1) under weak assumptions. In Section 3 these assumptions are analyzed in great detail. The
results of Section 3 are applied in Section 4 to establish the local convergence properties of our
method for the solution of KKT systems. In the last section we give further applications of our
method and some numerical examples.

2 The Algorithm and its Local Convergence Properties


In this section we describe our iterative method for the solution of the constrained system of
nonlinear equations (1) and show that the algorithm is well-defined. Then, after stating our
basic assumptions, we investigate the local convergence properties of the algorithm. The main
result of this section is Theorem 1 on the local quadratic convergence of the method.
Throughout the paper the solution set of (1) is denoted by Z and it is assumed that this set is
nonempty, i.e.,
Z := {z ∈ Ω | F(z) = 0} 6= ∅. (6)
We recall that Ω is simply assumed to be a closed set. For computational reasons, see comments
after (7), the convexity of Ω is a highly desirable characteristic, but convexity is not strictly
necessary from the theoretical point of view. However, even though we will not assume convexity
explicitly, we stress from the outset that, in essentially all problems we are interested in, Ω is
either just the whole space, Ω = Rn , or a polyhedral set.
In the description of the algorithm we use a mapping G : Rn → Rm×n . We will shortly de-
scribe the requirements this mapping must meet. However, for the time being, one should think
of G(s) as a suitable substitute of the Jacobian. If the function F is continuously differentiable,
then taking G(s) := F 0 (s) will be the natural choice, while, if F is only locally Lipschitzian,
G(s) can be taken to be an element of the B-subdifferential ∂B F(s) (see Subsection 3.1 for the
definition) or of its convex hull ∂ F(s).
For given s ∈ Ω we consider the following optimization problem:

min γ
z,γ
z ∈ Ω,
kF(s) + G(s)(z − s)k ≤ γkF(s)k2 , (7)
kz − sk ≤ γkF(s)k,
γ ≥ 0.

The subproblems of our algorithm will be of this form. By k · k we denote an arbitrary but fixed
vector norm in Rn or Rm . A convenient choice of the norm is the infinity norm, i.e., k ·k = k ·k∞ .
Then, if Ω is polyhedral, (7) is just a linear optimization problem. It is here that the convexity
of Ω plays an important computational role since, whatever the norm, if Ω is convex, then
(7) is a convex optimization problem and, as such, “easily” solvable.
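
To make this computational remark concrete, the following minimal sketch (an illustration under the stated assumptions, not the authors' implementation) poses (7) as a linear program with scipy for k · k = k · k∞ and a polyhedral Ω = {z | Az ≤ b}; the decision variables are (z, γ):

    import numpy as np
    from scipy.optimize import linprog

    def lp_newton_subproblem(F, G, s, A, b):
        # Subproblem (7) in the infinity norm; F and G are callables.
        Fs, Gs = F(s), G(s)
        m, n = Gs.shape
        nF = np.linalg.norm(Fs, np.inf)
        c = np.zeros(n + 1)
        c[-1] = 1.0                                     # minimize gamma
        rows, rhs = [], []
        for sign in (+1.0, -1.0):
            # |F(s) + G(s)(z - s)| <= gamma * ||F(s)||^2, componentwise
            rows.append(np.hstack([sign * Gs, -np.full((m, 1), nF ** 2)]))
            rhs.append(sign * (Gs @ s - Fs))
            # |z - s| <= gamma * ||F(s)||, componentwise
            rows.append(np.hstack([sign * np.eye(n), -np.full((n, 1), nF)]))
            rhs.append(sign * s)
        rows.append(np.hstack([A, np.zeros((A.shape[0], 1))]))  # z in Omega
        rhs.append(b)
        res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                      bounds=[(None, None)] * n + [(0, None)])  # gamma >= 0
        return res.x[:n], res.x[-1]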
The next proposition shows that problem (7) has a solution for any s ∈ Rn .

Proposition 1 For any s ∈ Rn ,

(a) the optimization problem (7) has a solution and

(b) the optimal value of (7) is zero if and only if s is a solution of (1).

Proof. Assertion (b) is obvious, so we only prove (a). If s ∈ Z the assertion is clear. Otherwise,
let z̄ be an element of Ω. With

γ̄ := max{kF(s) + G(s)(z̄ − s)kkF(s)k−2 , kz̄ − skkF(s)k−1 },

we see that the feasible set of (7) contains (z̄, γ̄). This remains true if we modify (7) by adding
the constraint γ ≤ γ̄. The modified problem has a nonempty compact feasible set and a contin-
uous objective. Thus, the theorem of Weierstrass shows that the modified problem is solvable.
Since (7) has the same solution set it also has a solution. 
Due to Proposition 1 the optimal value of problem (7) is well-defined for any s and will be
denoted by γ(s). Now, we formally describe our method for the solution of problem (1).

Algorithm 1: LP-Newton Algorithm


(S.0) : Choose a starting point z0 ∈ Ω. Set k := 0.
(S.1) : If zk ∈ Z then stop.
(S.2) : Compute a solution (zk+1 , γ k+1 ) of (7) with s := zk .
(S.3) : Set k := k + 1 and go to (S.1).

Algorithm 1 is well-defined for any starting point z0 due to Proposition 1. Moreover, although
subproblem (7) need not have a unique solution with respect to the z-part, it suffices that the
algorithm picks an arbitrary solution of the subproblem.
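
Given such a subproblem solver, the whole method is a short loop. A minimal sketch (reusing the hypothetical lp_newton_subproblem from above and replacing the exact test zk ∈ Z in (S.1) by a numerical tolerance):

    def lp_newton(F, G, z0, A, b, tol=1e-12, max_iter=50):
        z = z0
        for _ in range(max_iter):
            if np.linalg.norm(F(z), np.inf) <= tol:          # step (S.1), numerically
                break
            z, _gamma = lp_newton_subproblem(F, G, z, A, b)  # step (S.2)
        return z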
To analyze local convergence properties of a sequence generated by Algorithm 1 we now
state some assumptions. To this end, let z∗ ∈ Z denote an arbitrary but fixed solution of (1).
Moreover, let δ > 0 be the arbitrary but fixed radius of the ball Bδ (z∗ ) around z∗ .

Assumption 1 There exists L > 0 such that

kF(s)k ≤ L dist[s, Z]

holds for all s ∈ Bδ (z∗ ) ∩ Ω.

This assumption is a very weak one and it is satisfied if F is locally Lipschitz continuous.

Assumption 2 There exists ` > 0 such that

dist[s, Z] ≤ `kF(s)k

holds for all s ∈ Bδ (z∗ ) ∩ Ω.

If Assumption 2 holds, we also say that F provides a local error bound around z∗ on Ω. As
we discussed in the Introduction, Assumption 2 turned out to be a key condition for proving
local superlinear convergence in cases when z∗ is not an isolated solution of problem (1), see
[1, 9, 13, 17, 18, 19, 25, 33, 38] as examples for its use in Levenberg-Marquardt methods and
[4, 5, 14, 16, 17, 20, 23, 24, 27, 35, 37, 39] for further methods and related assumptions.
The next two assumptions are rather more technical and, at least in the analysis of Newton-type
methods, new to some extent. While their technicality may make it difficult to immediately
appreciate their significance, it is precisely this quality that will allow us to obtain
some of the strong results that will be described later on. These assumptions will be discussed
in depth in the next section. Note also that these assumptions give implicit conditions on the
choice of the mapping G.

Assumption 3 There exists Γ ≥ 1 such that

γ(s) ≤ Γ

holds for all s ∈ Bδ (z∗ ) ∩ Ω.

Recall that γ(s), the optimal value of program (7), is well-defined by Proposition 1. Assumption
3 requires that these optimal values are uniformly bounded in a neighborhood of z∗ (intersected
with Ω). In the analysis of stability issues of parametric optimization problems an assumption
that turns out to be important is an inf-boundedness condition, see for example [2] and refer-
ences therein. It can be shown that Assumption 3 is in fact equivalent to requiring uniform
inf-boundedness around z∗ of the subproblems (7) when these are seen as parametric problems
with s as parameter. A key part of our developments will be understanding when, in practical
settings, this condition is satisfied. In Section 3 we will show that Assumption 3 is satisfied in
the standard case in which F is differentiable, has a locally Lipschitz continuous derivative and
provides a local error bound on Ω. In addition, in Section 3 we will present further sufficient
conditions for Assumption 3 to hold that cover nonsmooth settings.

Assumption 4 There exists α̂ > 0 such that


 
w ∈ L (s, α) := {w ∈ Ω | kw − sk ≤ α, kF(s) + G(s)(w − s)k ≤ α 2 }

implies
kF(w)k ≤ α̂α 2
for all s ∈ (Bδ (z∗ ) ∩ Ω) \ Z and all α ∈ [0, δ ].

This assumption requires that the mapping

w 7−→ F(s) + G(s)(w − s)

is in some sense a good approximation of the mapping w 7−→ F(w) for w ∈ Ω with w suffi-
ciently close to s. In Section 3 we will provide some sufficient conditions for Assumption 4. In
particular, we will prove in Subsection 3.1 that Assumption 4 also holds in the standard setting
in which F is differentiable with F 0 being locally Lipschitzian. Moreover, Subsection 3.3 will
show that this assumption may also hold for structured nondifferentiable functions.
The next theorem provides the local quadratic convergence of Algorithm 1 if Assumptions
1 – 4 are satisfied.

Theorem 1 Algorithm 1 is well-defined for any starting point z0 ∈ Ω. If Assumptions 1 – 4 are
satisfied, then there is r > 0 such that any infinite sequence {zk } generated by Algorithm 1 with
starting point z0 ∈ Br (z∗ ) ∩ Ω converges quadratically to some ẑ ∈ Z.

In order to prove this theorem we need two preliminary lemmas that are stated and proved next.

Lemma 1 Let Assumption 3 be satisfied and define the set F (s, Γ) by

F (s, Γ) := {z ∈ Ω | kz − sk ≤ ΓkF(s)k, kF(s) + G(s)(z − s)k ≤ ΓkF(s)k2 }.

Then, for any s ∈ Bδ (z∗ ) ∩ Ω, the set F (s, Γ) is nonempty. If, in addition, Assumption 1 is
satisfied, then

kF(s) + G(s)(z − s)k ≤ ΓL2 dist[s, Z]2 and kz − sk ≤ ΓL dist[s, Z]

hold for all z ∈ F (s, Γ).

Proof. Let us choose any s ∈ Bδ (z∗ ) ∩ Ω and let (z(s), γ(s)) be a solution of problem (7). Then,
Assumption 3 yields z(s) ∈ F (s, Γ) so that this set is nonempty.
Now, let z ∈ F (s, Γ) be arbitrary but fixed. The definition of the set F (s, Γ) and Assumption
1 imply
kF(s) + G(s)(z − s)k ≤ ΓkF(s)k2 ≤ ΓL2 dist[s, Z]2
and

kz−sk ≤ ΓkF(s)k ≤ ΓL dist[s, Z]. 

Lemma 2 Let Assumptions 1 – 4 be satisfied. Then, there are ε > 0 and C > 0 such that, for
any s ∈ Bε (z∗ ) ∩ Ω,

dist[z, Z] ≤ C dist[s, Z]2 ≤ (1/2) dist[s, Z]
holds for all z ∈ F (s, Γ).

Proof. Let us first choose any ε according to

0 < ε ≤ (1/2) min{δ , δ Γ−1 L−1 , α̂ −1 `−1 Γ−2 L−2 }. (8)
For s ∈ Z the assertion is clear because then F (s, Γ) = {s} holds.
So, let us choose s ∈ (Bε (z∗ )∩Ω)\Z and z ∈ F (s, Γ). Lemma 1, together with (8), provides

kz − sk ≤ ΓL dist[s, Z] ≤ ΓLks − z∗ k ≤ ΓLε ≤ δ /2.
Therefore,
kz∗ − zk ≤ kz∗ − sk + ks − zk ≤ δ /2 + δ /2 = δ ,
i.e., z ∈ Bδ (z∗ ) ∩ Ω follows. Since z ∈ F (s, Γ) and Γ ≥ 1 yield

kF(s) + G(s)(z − s)k ≤ ΓkF(s)k2 ≤ Γ2 kF(s)k2

and
kz − sk ≤ ΓkF(s)k,
we have z ∈ L (s, α), with α := ΓkF(s)k. Moreover,

α = ΓkF(s)k ≤ ΓL dist[s, Z] ≤ ΓLε ≤ δ /2
follows by Assumption 1 and (8). Thus, Assumption 4 implies

kF(z)k ≤ α̂α 2 = α̂Γ2 kF(s)k2 .

Using this, Assumptions 1 and 2, and (8), we obtain

dist[z, Z] ≤ `kF(z)k
≤ α̂`Γ2 kF(s)k2
≤ α̂`Γ2 L2 dist[s, Z]2
≤ α̂`Γ2 L2 ε dist[s, Z]
≤ (1/2) dist[s, Z].
Hence, the assertion follows with C := α̂`Γ2 L2 . 

Proof of Theorem 1
We already noted that Algorithm 1 is well-defined for any z0 ∈ Ω. With ε according to (8) let
us choose r so that

0 < r ≤ ε/(1 + 2ΓL).
We first show by induction that the assertions

zk ∈ Bε (z∗ ) ∩ Ω (9)

and
zk+1 ∈ F (zk , Γ) (10)
are valid for all k ∈ N. For k = 0 the first assertion is clear by r < ε. Moreover, since r < ε < δ
due to (8), Assumption 3 implies z1 ∈ F (z0 , Γ). Suppose now that (9) and (10) hold for k =
0, . . . , ν. To show zν+1 ∈ Bε (z∗ ) ∩ Ω we first note that

kzν+1 − z∗ k ≤ kzν − z∗ k + kzν+1 − zν k ≤ kz0 − z∗ k + ∑_{j=0}^{ν} kz j+1 − z j k, (11)

where kz0 − z∗ k ≤ r.

Due to (9) and (10) for k = 0, . . . , ν and Lemma 1 we have, for all j = 0, . . . , ν,

kz j+1 − z j k ≤ ΓL dist[z j , Z]. (12)

Because of (9) and (10), Lemma 2 implies

dist[z j , Z] ≤ (1/2) dist[z j−1 , Z] ≤ · · · ≤ (1/2)^j dist[z0 , Z]

for j = 0, . . . , ν. Therefore, using (11), (12), dist[z0 , Z] ≤ kz0 − z∗ k ≤ r, and ∑_{j=0}^{ν} (1/2)^j ≤ 2, we obtain

kzν+1 − z∗ k ≤ r + ΓL dist[z0 , Z] ∑_{j=0}^{ν} (1/2)^j ≤ (1 + 2ΓL)r ≤ ε.
Thus, zν+1 ∈ Bε (z∗ ) ∩ Ω is valid. This and Assumption 3 imply zν+2 ∈ F (zν+1 , Γ). Hence, (9)
and (10) hold for k = ν + 1 and, consequently, for all k ∈ N.
Because of (9) and (10), Lemma 2 provides

dist[zk+1 , Z] ≤ C dist[zk , Z]2 ≤ (1/2) dist[zk , Z] (13)
for all k ∈ N. This yields
lim_{k→∞} dist[zk , Z] = 0. (14)
For j, k ∈ N with k > j we obtain from Lemma 1, (10), and Lemma 2 that
kzk − z j k ≤ ∑_{i= j}^{k−1} kzi+1 − zi k ≤ ΓL dist[z j , Z] ∑_{i= j}^{k−1} (1/2)^{i− j} ≤ 2ΓL dist[z j , Z]. (15)

So, due to (14), {zk } is a Cauchy sequence and thus, by the closedness of Z, converges to some
ẑ ∈ Z.
Finally, we prove the convergence rate. The use of (15) for k + 1 instead of j and k + j
instead of k together with (13) leads to
kzk+ j − zk+1 k ≤ 2ΓL dist[zk+1 , Z] ≤ 2CΓL dist[zk , Z]2
for any k, j ∈ N with j > 1. Passing to the limit for j → ∞ we obtain
kẑ − zk+1 k ≤ 2CΓL dist[zk , Z]2 ≤ 2CΓLkẑ − zk k2 .
The latter shows the quadratic convergence of {zk } to ẑ and completes the proof. 
Remark 1 Algorithm 1 can be modified in such a way that in each step the subproblem (7) with
s := zk is not solved exactly. Instead, only a feasible point (zk+1 , γ k+1 ) of (7) is determined. If,
for some S > 0, γ k ≤ S can be ensured for all k ∈ N the theory provided in this section is
applicable (just replace Γ by S in the proofs) and shows the local quadratic convergence of the
modified algorithm. There are several possibilities to achieve this. Such techniques for saving
computational costs are part of our current research [6] but they are beyond the scope
of this paper. 
It may be useful to conclude this section by illustrating the behavior of the method on a
very simple linear problem. Suppose that F : R → R is given by F(z) := z, Ω := R, and
G(s) := F 0 (s) = 1. The only solution of this problem is z∗ = 0. It can be readily checked
that Assumptions 1 – 4 are all satisfied. Given zk , subproblem (7) can be written as
min γ
z,γ
−γ|zk |2 ≤ z ≤ γ|zk |2 ,
−γ|zk | ≤ z − zk ≤ γ|zk |,
γ ≥ 0.

It is easy to verify that this problem has a unique solution given by

zk+1 = (zk )3 / (|zk | + (zk )2 ).

Using this, it is easily seen that {zk } converges to 0 from any starting point z0 and that the
convergence rate is quadratic.
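A few iterations of this recursion from z0 = 1 illustrate the rate (a minimal sketch; note that zk+1 = (zk )2 /(1 + zk ) ≈ (zk )2 for small zk > 0):

    z = 1.0
    for k in range(5):
        z = z ** 3 / (abs(z) + z ** 2)   # the closed-form solution of (7) here
        print(k + 1, z)
    # 1: 0.5,  2: 0.1667,  3: 0.0238,  4: 5.5e-04,  5: 3.1e-07,
    # i.e. the error roughly squares at each step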
We remark that the standard Newton method would converge to the solution in one iteration
on this linear problem. The non-finiteness on linear problems is in a sense a “price” we have
to pay in order to be able to tackle a much wider array of problems than the standard Newton
method does. However, without going into details, we mention that under suitable assumptions
the standard Newton method may be regarded as an inexact version of Algorithm 1, cf. Remark
1.

3 Discussion of Assumptions
In this section we discuss in detail the new Assumptions 3 and 4. To this end, we first investigate
these assumptions in relation to existing conditions, particularly the (semi)smoothness of F. In
Subsection 3.1 we show that some standard conditions that have been used in the literature to
establish local convergence of Newton-type methods in nonstandard situations, imply our as-
sumptions. Furthermore, in Subsections 3.2 and 3.3 we establish that our key Assumptions 3 and
4 can hold for some genuinely nonsmooth functions F with nonisolated solutions. In particular,
in Subsection 3.2 we develop sufficient conditions for Assumption 3 that are applicable when F
is a continuous selection of functions, while Subsection 3.3 deals with conditions which guar-
antee Assumption 4 for an important class of continuous selections. The results of this section
will be used in Sections 4 and 5 to analyze the applicability of Algorithm 1 to the solution of,
among others, KKT systems, complementarity and feasibility problems.

3.1 Relations to Existing Conditions


In this subsection we first introduce two conditions that have been used in the literature to estab-
lish state-of-the-art local quadratic convergence results for systems that are either nonsmooth
or have nonisolated solutions. Roughly speaking, we prove that either of these two conditions,
Conditions 1 and 2 below, implies our Assumptions 3 and 4. Examples in Section 5 show that
the reverse implications do not hold, thus establishing that our framework improves on existing
methods.
Condition 1 below is a smoothness condition that, in combination with Assumption 2, was
used to prove local convergence properties of a Levenberg-Marquardt method for the solution of
constrained systems of equations with possibly nonisolated solutions, see [25]. Like Assumptions
3 and 4, Condition 1 restricts the choice of the mapping G.

Condition 1 There exist κ0 > 0 and δ0 > 0 such that

kF(s) + G(s)(w − s) − F(w)k ≤ κ0 kw − sk2 (16)

holds for all pairs (w, s) with w ∈ Bδ0 (z∗ ) ∩ Ω and s ∈ (Bδ0 (z∗ ) ∩ Ω) \ Z.

Note that Condition 1 implies the differentiability of F at all s in the interior of the set
(Bδ0 (z∗ ) ∩ Ω) \ Z. Vice versa, if F is continuously differentiable with a Lipschitz continuous
derivative on Bδ0 (z∗ ) ∩ Ω then Condition 1 holds. Proposition 2 will show that Condition 1 implies both
Assumption 3 (if F provides a local error bound on Ω) and Assumption 4.
The second condition we will consider plays a crucial role for proving local quadratic con-
vergence of semismooth Newton methods. Let F be locally Lipschitz continuous around a
solution z∗ ; we denote by ∂ F(z) the generalized Jacobian of Clarke that, in turn, is the convex
hull of the limiting Jacobian (or B-subdifferential) ∂B F(z), i.e., ∂ F(z) := conv ∂B F(z) with

∂B F(z) := { lim_{`→∞} F 0 (z` ) | lim_{`→∞} z` = z, z` ∈ DF },

where DF ⊆ Rn is the set of points where F is differentiable.


Condition 2 F is locally Lipschitz continuous and there exists κ1 > 0 such that

sup {kF(s) +V (z∗ − s)k |V ∈ ∂ F(s)} ≤ κ1 kz∗ − sk2 (17)

holds for all s ∈ Bδ (z∗ ) ∩ Ω.


Condition 2 does not imply differentiability of F. Even for Ω = Rn , this assumption is slightly
weaker than the strong semismoothness of F at a solution z∗ , see [15, Section 5] for a discus-
sion. Moreover, it is easily seen that we can equivalently state the condition by replacing ∂ F(z)
with ∂B F(z) in (17). To prove superlinear or quadratic convergence of the semismooth New-
ton method the nonsingularity of all matrices in ∂B F(z∗ ) is needed in [30], where it is further
assumed that m = n. In our setting, in which m and n can be different, we will generalize the
condition in [30] by assuming that every matrix in ∂B F(z∗ ) has rank n. Note that this condition implies that
the solution z∗ is locally unique. Proposition 4 will show that Condition 2 and the local unique-
ness of z∗ imply Assumption 3 when G(s) ∈ ∂ F(s). Moreover, it will be clear from Proposition
5 that Condition 2 with the rank condition above guarantees Assumption 4 for G(s) ∈ ∂B F(s).
Given the facts described above, to show that Assumptions 3 and 4 can hold although neither
Condition 1 nor Condition 2 (the latter associated with the full rank condition) is satisfied, we
refer to Examples 5 and 7 – 11 in Section 5.
We now show that in our setting Condition 1 implies both Assumptions 3 and 4.
Proposition 2 Let Condition 1 be satisfied. Then, for δ > 0 sufficiently small, the following
assertions are valid:
(a) If Assumption 2 holds then Assumption 3 is satisfied.
(b) Assumption 4 is satisfied.
Proof. Let us choose δ according to 0 < δ ≤ δ0 /2 with δ0 from Condition 1.

(a) First note that γ(s) = 0 for any s ∈ Z. Now, for s ∈ (Bδ (z∗ ) ∩ Ω) \ Z let s⊥ ∈ Z be so that
ks − s⊥ k = dist[s, Z] holds. Then, we have

ks⊥ − z∗ k ≤ ks⊥ − sk + ks − z∗ k ≤ 2δ ≤ δ0 .

Thus, inequality (16) with w := s⊥ and Assumption 2 yield

kF(s) + G(s)(s⊥ − s)k ≤ κ0 ks⊥ − sk2 ≤ κ0 `2 kF(s)k2 .

Assumption 2 further provides

ks − s⊥ k = dist[s, Z] ≤ `kF(s)k.

Obviously, (s⊥ , Γ⊥ ) with Γ⊥ := ` max{1, κ0 `} is feasible for problem (7). Hence, As-
sumption 3 is satisfied.

(b) Let s ∈ (Bδ (z∗ ) ∩ Ω) \ Z and α ∈ [0, δ ] be arbitrarily chosen. Then, w ∈ L (s, α) implies
w ∈ Ω and
kw − z∗ k ≤ kw − sk + ks − z∗ k ≤ α + δ ≤ 2δ ≤ δ0 .
By (16) and w ∈ L (s, α) it follows that

kF(w)k − kF(s) + G(s)(w − s)k ≤ κ0 kw − sk2 ≤ κ0 α 2 , with kF(s) + G(s)(w − s)k ≤ α 2 ,

and therefore
kF(w)k ≤ (1 + κ0 )α 2 .
Thus, Assumption 4 is satisfied, with α̂ := 1 + κ0 . 

The next result easily follows by recalling that Condition 1 holds if F is sufficiently smooth in
a neighborhood of z∗ .

Corollary 1 Let F be differentiable with a locally Lipschitz continuous derivative around z∗ .
Then, with G(s) = F 0 (s) for all s ∈ Bδ (z∗ ) ∩ Ω, assertions (a) and (b) of Proposition 2 are valid
for δ > 0 sufficiently small.

We now consider Condition 2 and its relationship to Assumptions 3 and 4. To analyze this point
we first state a further condition, Condition 3 below; Proposition 3 shows that it is equivalent to
Assumption 3, provided that Assumptions 1 and 2 hold.

Condition 3 There exists κ > 0 such that, for any s ∈ Bδ (z∗ ) ∩ Ω, an s̄ ∈ Ω exists with

ks − s̄k ≤ κ dist[s, Z] (18)

and
kF(s) + G(s)(s̄ − s)k ≤ κ dist[s, Z]2 . (19)

Note that, if z∗ is an isolated solution and δ > 0 is sufficiently small, we can take s̄ := z∗ .
Then, (18) holds for any κ ≥ 1 and (19) is related to Condition 2. This shows that Condition 3,
and therefore Assumption 3 (see next proposition), can also be viewed as a relaxation of strong
semismoothness.

Proposition 3 The following assertions are valid:

(a) If Assumption 2 is satisfied then Condition 3 implies Assumption 3.

(b) If Assumption 1 is satisfied then Assumption 3 implies Condition 3.

Proof. Let s ∈ Bδ (z∗ ) ∩ Ω be arbitrarily chosen.
(a) Due to Condition 3, there is s̄ ∈ Ω satisfying (18) and (19). This and Assumption 2 imply

ks − s̄k ≤ κ`kF(s)k

and
kF(s) + G(s)(s̄ − s)k ≤ κ`2 kF(s)k2 .
Obviously, (s̄, Γ̄) with Γ̄ := κ` max{1, `} is feasible for problem (7). Hence, Assump-
tion 3 is satisfied.

(b) Let (z(s), γ(s)) denote a solution of problem (7) which exists due to Proposition 1. By
Assumption 3, we have z(s) ∈ F (s, Γ) (see Lemma 1 for the definition of the set F (s, Γ)).
This, together with Assumption 1, yields

kz(s) − sk ≤ ΓkF(s)k ≤ ΓL dist[s, Z]

and
kF(s) + G(s)(z(s) − s)k ≤ ΓkF(s)k2 ≤ ΓL2 dist[s, Z]2 .
Thus, Condition 3 is satisfied for s̄ := z(s) and κ := ΓL max{1, L}. 
The following two propositions establish the desired relations between Condition 2 and As-
sumptions 3 and 4, respectively.
Proposition 4 Let Assumption 2 and Condition 2 be satisfied and suppose that z∗ is the only
solution of (1) within Bδ1 (z∗ )∩Ω for some δ1 > 0. Then, with G(s) ∈ ∂ F(s) for all s ∈ Bδ (z∗ )∩
Ω, Assumption 3 holds for δ > 0 sufficiently small.
Proof. Since z∗ is the only solution of (1) in Bδ1 (z∗ ) ∩ Ω, we have ks − z∗ k = dist[s, Z] for all
s ∈ Bδ (z∗ ) ∩ Ω if δ ∈ (0, δ1 /2]. Furthermore, (17) in Condition 2 and G(s) ∈ ∂ F(s) yield

kF(s) + G(s)(z∗ − s)k ≤ κ1 ks − z∗ k2 = κ1 dist[s, Z]2 .

Thus, Condition 3 is satisfied with κ := max{1, κ1 } and with s̄ := z∗ for all s ∈ Bδ (z∗ ) ∩ Ω.
By Proposition 3 (a), Assumption 3 holds for δ ∈ (0, δ1 /2]. 
Proposition 5 Suppose that Condition 2 is satisfied and that the rank of all matrices in ∂B F(z∗ )
is equal to n. Then, with G(s) ∈ ∂B F(s) for all s ∈ Bδ (z∗ ) ∩ Ω, Assumption 4 holds for δ > 0
sufficiently small.
Proof. Let s ∈ (Bδ (z∗ ) ∩ Ω) \ Z and α ∈ [0, δ ] be arbitrarily chosen. Take any w ∈ L (s, α).
Then, the inequalities

kF(s) + G(s)(w − s)k ≤ α 2 and kw − sk ≤ α (20)

hold and, by the latter inequality,


w ∈ B2δ (z∗ ) ∩ Ω (21)
follows. The local Lipschitz continuity of F in Condition 2 implies that, for all x, y ∈ B2δ (z∗ ) ∩
Ω,
kF(x) − F(y)k ≤ L0 kx − yk,

with some L0 > 0. In particular,

kF(w)k = kF(w) − F(z∗ )k ≤ L0 kw − z∗ k (22)

holds. Furthermore, choosing δ > 0 sufficiently small, the rank condition on ∂B F(z∗ ) and the
upper semicontinuity of ∂B F : Rn ⇒ Rm×n yield that any V ∈ ∂B F(s) has (full) rank n for all
s ∈ Bδ (z∗ ). In addition, there exists c > 0 such that, for all s ∈ Bδ (z∗ ) ∩ Ω,

kV + k ≤ c for all V ∈ ∂B F(s), (23)

where V + ∈ Rn×m denotes the pseudo-inverse of V . Note that rank(V ) = n ≤ m implies V +V =
I ∈ Rn×n . Setting V := G(s), we therefore obtain

w − z∗ = w − s + s − z∗ = V + ( (F(s) +V (w − s)) − F(s) −V (z∗ − s) ).

Thus, by (23), (20), and Condition 2 it follows that

kw − z∗ k ≤ kV + k(kF(s) +V (w − s)k + kF(s) +V (z∗ − s)k)
≤ c(α 2 + κ1 kz∗ − sk2 )
≤ c(α 2 + κ1 (kw − z∗ k2 + 2kw − z∗ kkw − sk + kw − sk2 ))
≤ c(α 2 + κ1 (kw − z∗ k2 + 2αkw − z∗ k + α 2 ))

and
kw − z∗ k(1 − cκ1 kw − z∗ k − 2cκ1 α) ≤ c(1 + κ1 )α 2 . (24)
For δ > 0 sufficiently small, we have

1 − cκ1 kw − z∗ k − 2cκ1 α ≥ 1/2
due to (21) and α ∈ [0, δ ] (as needed for Assumption 4). This, (22), and (24) yield

kF(w)k ≤ L0 kw − z∗ k ≤ 2cL0 (1 + κ1 )α 2 .

Hence, Assumption 4 holds with α̂ := 2cL0 (1 + κ1 ) and δ > 0 sufficiently small. 


The next corollary summarizes the assertions of the last two propositions.

Corollary 2 Let Condition 2 be satisfied and suppose that the rank of all matrices in ∂B F(z∗ ) is
equal to n. Then, with G(s) ∈ ∂B F(s) for all s ∈ Bδ (z∗ ) ∩ Ω, Assumptions 3 and 4 are satisfied
for δ > 0 sufficiently small.

3.2 Assumption 3
This subsection deals with continuous selections of functions. We will provide conditions under
which Assumption 3 holds for continuous selections. The results of this subsection are used in
Section 4 to prove convergence properties of Algorithm 1 for the solution of reformulated KKT
systems.

Definition 1 Let F 1 , . . . , F p : Rn → Rm be continuous functions. A function F : Rn → Rm is said
to be a continuous selection of the functions F 1 , . . . , F p on the set N ⊆ Rn if F is continuous
on N and F(z) ∈ {F 1 (z), . . . , F p (z)} for all z ∈ N . We denote by

A (z) := {i ∈ {1, . . . , p} | F(z) = F i (z)}

the set of indices of those selection functions that are active at z and by

Zi := {z ∈ Ω | F i (z) = 0}

the solution set of the constrained equation F i (z) = 0, z ∈ Ω.


The function F is termed piecewise affine (linear) if N = Rn and the selection functions
are all affine (linear).

For a better understanding we give an example of a continuous selection.

Example 1 Let H : Rn → Rn be a continuous function. Then, the function F : Rn → Rn with

F(z) := (min{z1 , H1 (z)}, . . . , min{zn , Hn (z)})>

is a continuous selection of functions F 1 , F 2 , . . . , F p with p = 2^n on Rn . For example, if n = 2
we have the functions F i with i = 1, . . . , 4 given by

F 1 (z) := (z1 , z2 )> , F 2 (z) := (H1 (z), z2 )> ,
F 3 (z) := (z1 , H2 (z))> , F 4 (z) := (H1 (z), H2 (z))> . 
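
For instance, one admissible choice G(z) ∈ {(F i )0 (z) | i ∈ A (z)} can be assembled row by row from the active pieces. A minimal sketch (H and Hprime, the Jacobian of H, are hypothetical callables; H is assumed differentiable here):

    import numpy as np

    def G_of(z, H, Hprime):
        # Row t comes from an active piece of min{z_t, H_t(z)}.
        Hz, Jz = H(z), Hprime(z)
        G = np.eye(z.size)       # rows where the piece z_t attains the min
        active_H = Hz < z        # ties keep the z_t-piece; either is admissible
        G[active_H, :] = Jz[active_H, :]
        return G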

The next is the key result of this section, roughly speaking showing that, with respect to As-
sumption 3, a continuous selection inherits the behavior of its “pieces”.

Theorem 2 Let F : Rn → Rm be a continuous selection of the functions F 1 , . . . , F p : Rn → Rm
on the set Bδ (z∗ ) ∩ Ω. Moreover, suppose that, for every i ∈ A (z∗ ), a number Γi ≥ 1 and a
mapping Gi : Rn → Rm×n exist such that

γi (s) ≤ Γi

holds for all s ∈ Bδ (z∗ ) ∩ Ω, where γi (s) denotes the optimal value of the program

min γ
z,γ
z ∈ Ω,
kF i (s) + Gi (s)(z − s)k ≤ γkF i (s)k2 , (25)
kz − sk ≤ γkF i (s)k,
γ ≥ 0.

Then, with G(s) ∈ {Gi (s) | i ∈ A (s)} for all s ∈ Bδ (z∗ ) ∩ Ω, Assumption 3 is satisfied.

Proof. Let us assume the contrary. Then, a sequence {sk } in Bδ (z∗ ) ∩ Ω exists such that {sk }
converges to z∗ and {γ(sk )} tends to infinity. The latter means that the sequence of optimal
values of the programs

min γ
z,γ
z ∈ Ω,
kF(sk ) + G(sk )(z − sk )k ≤ γkF(sk )k2 ,
kz − sk k ≤ γkF(sk )k,
γ ≥ 0

goes to infinity. Passing to a subsequence if necessary, we can assume without loss of generality that there
is an index j ∈ {1, . . . , p} with j ∈ A (sk ) for all k ∈ N such that

F(sk ) = F j (sk ) and G(sk ) = G j (sk )

for all k ∈ N. Therefore, we have


lim_{k→∞} γ j (sk ) = ∞ (26)

for the optimal values γ j (sk ) of the programs (25) for i = j. By continuity of F j and F, we
obviously get
F j (z∗ ) = lim_{k→∞} F j (sk ) = lim_{k→∞} F(sk ) = F(z∗ )

so that j ∈ A (z∗ )
follows. Therefore, by assumption, γ j (sk ) ≤ Γ j holds for all sufficiently large
k, which contradicts (26). Hence, Assumption 3 is valid. 
In order to guarantee that the “pieces” of a continuous selection satisfy the assumptions of
Theorem 2 we can use, in view of Proposition 2 (see below for the details), the following
condition.

Condition 4 The function F : Rn → Rm is a continuous selection of the functions F 1 , . . . , F p :
Rn → Rm on the set Bδ (z∗ ) ∩ Ω and, for every i ∈ A (z∗ ), an `i > 0 exists such that

dist[s, Zi ] ≤ `i kF i (s)k

holds for all s ∈ Bδ (z∗ ) ∩ Ω.

Note that, for every i ∈ A (z∗ ), Zi is nonempty since F i (z∗ ) = F(z∗ ) = 0. The next result uses
Condition 4 to guarantee Assumption 3 is satisfied.

Corollary 3 Suppose that Condition 4 is satisfied. Moreover, suppose that the functions
F 1 , . . . , F p : Rn → Rm are differentiable on Bδ (z∗ ) ∩ Ω with locally Lipschitz continuous deriva-
tives. Then, with G(s) ∈ {(F i )0 (s) | i ∈ A (s)} for all s ∈ Bδ (z∗ ) ∩ Ω, Assumption 3 is satisfied
for δ > 0 sufficiently small.

Proof. Let i ∈ A (z∗ ) be arbitrary but fixed. Note that Condition 4 implies Assumption 2 for
the function F i . Since F i is differentiable and has a locally Lipschitz continuous derivative,
Assumption 3 holds for F i (for δ > 0 sufficiently small) due to Corollary 1. So, there is a
constant Γi ≥ 1 such that the optimal values γi (s) of the programs (25), with Gi (s) := (F i )0 (s),
are bounded above by Γi for all s ∈ Bδ (z∗ ) ∩ Ω. Therefore, the hypotheses of Theorem 2 are
satisfied and Assumption 3 holds. 
From the result just proved we can easily deduce that Assumption 3 is in particular satisfied if
F is piecewise affine and Ω is a polyhedral set.
Corollary 4 Let F : Rn → Rm be piecewise affine and Ω ⊆ Rn a polyhedral set. Then, As-
sumption 2 is satisfied. Moreover, with G(s) ∈ {(F i )0 (s) | i ∈ A (s)} for all s ∈ Bδ (z∗ ) ∩ Ω,
Assumption 3 is also valid for δ > 0 sufficiently small.
Proof. The validity of Assumption 2 can be derived from Theorem 2.1 in [28]. Due to Hoff-
man’s error bound [21], Condition 4 is satisfied and, by Corollary 3, Assumption 3 as well (for
δ > 0 sufficiently small). 
The next two results, assuming that the function F has a rather special structure, are more tech-
nical; their importance will become clear when we deal with KKT systems in the next section.
Let us consider the situation where the mapping F and the vector z are split so that

F(z) = (Fa (z), Fb (z)) ∈ Rma × Rmb , z = (x, y) ∈ Rnx × Rny , (27)

where ma , mb , nx , ny ∈ N with ma + mb = m and nx + ny = n. Moreover, the matrix G(z) ∈ Rm×n
(i.e., the Jacobian or a substitute) is also split accordingly, namely

G(z) = (Gx (z), Gy (z)) = ( Gxa (z)  Gya (z)
                            Gxb (z)  Gyb (z) )

with Gxa (z) ∈ Rma ×nx , Gya (z) ∈ Rma ×ny , Gxb (z) ∈ Rmb ×nx , and Gyb (z) ∈ Rmb ×ny .
Theorem 3 Let Assumption 2 be satisfied and let F and z be split according to (27). Suppose
that
z = (x, y) ∈ Z ∩ Bδ (z∗ ) implies x = x∗
and that Ω has the form Ω = Rnx × Ω0 for some polyhedral set Ω0 ⊆ Rny . Moreover, suppose
that F is a continuous selection of functions F 1 , . . . , F p on Rn , that Fa is differentiable with
locally Lipschitz continuous derivative, that Fa (x∗ , ·) is affine and that Fb does not depend on
the variable x and is piecewise affine. Then, with G(s) ∈ {(F i )0 (s) | i ∈ A (s)} for all s ∈
Bδ (z∗ ) ∩ Ω, Assumption 3 is satisfied for δ > 0 sufficiently small.
Proof. By the assumptions made, the mapping F(x∗ , ·) : Rny → Rm is piecewise affine and thus,
by Corollary 4, the optimal value γ(y) of the program
min γ
w,γ
(x∗ , w) ∈ Ω,
kF(x∗ , y) + Gy (x∗ , y)(w − y)k ≤ γkF(x∗ , y)k2 ,
kw − yk ≤ γkF(x∗ , y)k,
γ ≥ 0
is bounded by some Γ∗ ≥ 1 for all (x∗ , y) ∈ Bδ (z∗ ) ∩ Ω with δ > 0 sufficiently small. This
means that, for any (x∗ , y) ∈ Bδ (z∗ ) ∩ Ω, there is ŷ ∈ Ω0 such that
kF(x∗ , y) + Gy (x∗ , y)(ŷ − y)k ≤ Γ∗ kF(x∗ , y)k2 (28)

and
kŷ − yk ≤ Γ∗ kF(x∗ , y)k. (29)
Moreover, note that, for all (x, y) ∈ Rnx × Rny ,
Fb (x, y) = Fb (x∗ , y), (30)
Gxb (x, y) = 0, and Gyb (x, y) = Gyb (x∗ , y) (31)
hold since Fb does not depend on x.
Now let us choose any s := (x, y) ∈ Bδ (z∗ ) ∩ Ω. Setting ŝ := (x∗ , ŷ) we obtain from (28) –
(31) that
kF(s) + G(s)(ŝ − s)k
= kF(s) + Gx (s)(x∗ − x) + Gy (s)(ŷ − y)k
= kF(x∗ , y) + Gy (x∗ , y)(ŷ − y) + F(s) − F(x∗ , y) + Gx (s)(x∗ − x) + (Gy (s) − Gy (x∗ , y))(ŷ − y)k
≤ Γ∗ kF(x∗ , y)k2 + kFa (s) − Fa (x∗ , y) + Gxa (s)(x∗ − x)k + kFb (s) − Fb (x∗ , y) + Gxb (s)(x∗ − x)k
   + kGya (s) − Gya (x∗ , y)k ky − ŷk + kGyb (s) − Gyb (x∗ , y)k ky − ŷk
≤ Γ∗ kF(x∗ , y)k2 + L0 kx − x∗ k2 + 0 + L0 kx − x∗ k ky − ŷk + 0
≤ Γ∗ kF(x∗ , y)k2 + L0 (dist[s, Z]2 + Γ∗ kF(x∗ , y)k dist[s, Z]), (32)


where L0 > 0 exists due to the local Lipschitz continuity of (Gxa , Gya ) = Fa0 . By the assumptions
made, F is Lipschitz continuous on Bδ (z∗ ) ∩ Ω with some modulus L1 > 0. This yields
kF(x∗ , y)k − kF(s)k ≤ kF(x∗ , y) − F(x, y)k ≤ L1 kx − x∗ k.
Thus, with dist[s, Z] ≤ `kF(s)k by Assumption 2,
kF(x∗ , y)k ≤ L1 kx − x∗ k + kF(s)k
≤ L1 dist[s, Z] + kF(s)k (33)
≤ (L1 ` + 1)kF(s)k
follows. Therefore, using Assumption 2 again, we obtain from (32) that
kF(s) + G(s)(ŝ − s)k ≤ (Γ∗ (L1 ` + 1)2 + L0 `2 + L0 Γ∗ `(L1 ` + 1))kF(s)k2 (34)
and, with (29) and (33), that
ks − ŝk ≤ kx − x∗ k + ky − ŷk
≤ dist[s, Z] + Γ∗ (L1 ` + 1)kF(s)k (35)
≤ (` + Γ∗ (L1 ` + 1))kF(s)k.
Hence, setting
Γ̂ := max{Γ∗ (L1 ` + 1)2 + L0 `2 + L0 Γ∗ `(L1 ` + 1), ` + Γ∗ (L1 ` + 1)}
(34) and (35) show that the point (ŝ, Γ̂) is feasible for problem (7) so that its optimal value is
bounded by Γ̂. Since s ∈ Bδ (z∗ )∩Ω was chosen arbitrarily, Assumption 3 is satisfied for Γ := Γ̂
and δ > 0 sufficiently small. 

Proposition 6 Let Assumption 2 be satisfied and let F be split according to (27). Suppose that
F is a continuous selection of functions F 1 , . . . , F p : Rn → Rm on Rn , that Fa is differentiable
with locally Lipschitz continuous derivative, that Fb is piecewise affine, and δ̂ > 0 exists such
that A (z) = A (z∗ ) holds for all z ∈ Z ∩ Bδ̂ (z∗ ). Then, with G(s) ∈ {(F i )0 (s) | i ∈ A (s)} for all
s ∈ Bδ (z∗ ) ∩ Ω, Assumption 3 is satisfied for δ > 0 sufficiently small.

Proof. By the continuity of the selection functions F 1 , . . . , F p we can assume that δ > 0 is
sufficiently small so that

A (s) ⊆ A (z∗ ) for all s ∈ Bδ (z∗ ). (36)

For the rest of the proof, let s ∈ Bδ (z∗ ) ∩ Ω be arbitrarily chosen and s⊥ ∈ Z satisfy dist[s, Z] =
ks − s⊥ k. This implies s⊥ ∈ Z ∩ B2δ (z∗ ). Thus, with δ ∈ (0, δ̂ /2], we obtain that A (s⊥ ) = A (z∗ )
by assumption. Due to (36) this implies A (s) ⊆ A (s⊥ ). Hence, there is an index i ∈ A (s),
such that
F(s) = F i (s), G(s) = (F i )0 (s), and F(s⊥ ) = F i (s⊥ ) (37)
hold. Since (F i )b is affine, we have (F i )b (s) = Ai s + bi , with some Ai ∈ Rmb ×n and bi ∈ Rmb , and
(Gxb (s), Gyb (s)) = Ai . This together with (37) yields

Fb (s) + (Gxb , Gyb )(s⊥ − s) = Ai s + bi + Ai s⊥ − Ai s = Ai s⊥ + bi = Fb (s⊥ ) = 0.

Thus, taking into account Assumption 2, we obtain

kF(s) + G(s)(s⊥ − s)k = kFa (s) + (Gxa , Gya )(s⊥ − s)k
= kFa (s) + Fa0 (s)(s⊥ − s)k
≤ L0 ks − s⊥ k2 (38)
≤ L0 `2 kF(s)k2 ,

where L0 > 0 exists due to the local Lipschitz continuity of Fa0 . Assumption 2 also ensures

ks − s⊥ k = dist[s, Z] ≤ `kF(s)k for all s ∈ Bδ (z∗ ). (39)

Since s ∈ Bδ (z∗ ) ∩ Ω was chosen arbitrarily, (38) and (39) show that Assumption 3 is satisfied
for Γ := ` max{L0 `, 1} and δ > 0 sufficiently small. 

3.3 Assumption 4
In this subsection we show that Assumption 4 is satisfied by a particular, but practically impor-
tant class of nonsmooth functions. The interesting thing to note is that the set Ω will play an
important role in the analysis. It will turn out that even if Ω = Rn in the statement of the prob-
lem it might be crucial to force Ω to be a smaller set. This will be done without eliminating any
solution and will have the advantage of making Assumption 4 satisfied. In the Introduction we
mentioned that the set Ω could have a less obvious technical use: the results in this subsection
substantiate this statement.
We already presented sufficient conditions for Assumption 4 in Propositions 2 and 5. In
this subsection we investigate Assumption 4 for a class of continuous selections with a par-
ticular structure. To this end, let F be split again according to (27). Moreover, suppose that

Fa : Rn → Rma is differentiable with a locally Lipschitz continuous derivative and that Fb :
Rn → Rmb is given componentwise by

Fb (z) := min_{1≤ j≤q} {B j (z)}, i.e., (Fb )t (z) := min_{1≤ j≤q} {(B j )t (z)} for t = 1, . . . , mb , (40)

where B1 , . . . , Bq : Rn → Rmb are given differentiable functions with a locally Lipschitz continu-
ous derivative. Obviously, F is a continuous selection of functions F 1 , . . . , F p : Rn → Rm on Rn
with p = q^{mb} , where F i (for i = 1, . . . , p) is differentiable and has a locally Lipschitz continuous
derivative.
To motivate our approach we first consider a simple example whose purpose is twofold. On
the one hand it shows how the structure described above arises quite naturally when dealing
with optimization and complementarity problems. On the other hand it illustrates how we can
add some constraints to a system of equations without changing its solution set.

Example 2 The complementarity problem

T (x) ≥ 0, x ≥ 0, x> T (x) = 0

with a function T : Rn0 → Rn0 can be reformulated as the following system of equations

F(x, y) := (T (x) − y, min{x, y}) = 0.

This system fits the above setting for T sufficiently smooth; just set Fa (z) := T (x) − y, B1 (z) := x,
and B2 (z) := y. It is clear that the solutions of the system

F(x, y) = 0, (x, y) ∈ R2n0

are the solutions of the original complementarity system. We can also define an alternative,
constrained system by setting

Ω := {(x, y) | x ≥ 0, y ≥ 0}.

Then, the solutions of the constrained system

F(x, y) = 0, (x, y) ∈ Ω

are the same as those of the unconstrained system and of the original complementarity problem
(if we disregard the y-component). We will show shortly that although equivalent from the
point of view of the solution sets, the constrained system is advantageous with respect to the
unconstrained one in that Assumption 4 can be shown to hold for the constrained system. 
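
A direct transcription of this reformulation (a sketch; T is assumed to be available as a callable):

    import numpy as np

    def F_cp(x, y, T):
        # F(x, y) = (T(x) - y, min{x, y}) from Example 2
        return np.concatenate([T(x) - y, np.minimum(x, y)])

    # Omega = {(x, y) | x >= 0, y >= 0} then enters only through the
    # constraint z in Omega of subproblem (7); it removes no solution.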

The next theorem formalizes and extends what we illustrated in Example 2.

Theorem 4 Let F be split according to (27) with Fb : Rn → Rmb given by (40) and consider the
problem
F(z) = 0, z ∈ Ω̄, (41)
where Ω̄ ⊆ Rn is a nonempty and closed set. Suppose that F is a continuous selection of
functions F 1 , . . . , F p on Rn , that Fa : Rn → Rma and B1 , . . . , Bq : Rn → Rmb are differentiable
with locally Lipschitz continuous derivatives. For Ω := Ω̄ ∩ {z ∈ Rn | B j (z) ≥ 0, j = 1, . . . , q},
the solution set of the problem
F(z) = 0, z∈Ω (42)
coincides with the solution set of (41). Furthermore, with G(s) ∈ {(F i )0 (s) | i ∈ A (s)} for all
s ∈ Bδ (z∗ ) ∩ Ω, Assumption 4 is satisfied for problem (42).

Proof. By (40), the equivalence of the solution sets of (41) and (42) is obvious and does not
need a proof. To prove the assertion on Assumption 4 let s ∈ Bδ (z∗ ) ∩ Ω, α ∈ [0, δ ], and
w ∈ L (s, α) be arbitrarily chosen. Then,

kw − sk ≤ α and kF(s) + G(s)(w − s)k = kF i(s) (s) + (F i(s) )0 (s)(w − s)k ≤ α 2 (43)

hold for i(s) ∈ A (s) chosen so that G(s) = (F i(s) )0 (s). Since the selection functions F 1 , . . . , F p are differentiable and have
locally Lipschitz continuous Jacobians, a constant L > 0 exists such that

kF i(s) (w) − F i(s) (s) − (F i(s) )0 (s)(w − s)k ≤ Lkw − sk2 .

Therefore, by (43), we obtain

kF i(s) (w)k ≤ Lkw − sk2 + kF i(s) (s) + (F i(s) )0 (s)(w − s)k ≤ (L + 1)α 2 . (44)

For any i(w) ∈ A (w), this implies


|(F i(w) )t (w)| = |(F i(s) )t (w)| ≤ (L + 1)α 2 for all t = 1, . . . , ma (45)

since the a-part of each selection function F i coincides with Fa . Because of w ∈ L (s, α) ⊆ Ω we obtain (taking into account
the definition of Ω)
0 ≤ min_{1≤ j≤q} {(B j )t (w)} ≤ (Bi )t (w) for all i = 1, . . . , q and all t = 1, . . . , mb .

Using this, (40), and (44), we get, for t = 1, . . . , mb ,


0 ≤ min_{1≤ j≤q} {(B j )t (w)} = Fma +t (w) = (F i(w) )ma +t (w) ≤ (F i(s) )ma +t (w) ≤ (L + 1)α 2 . (46)

Now, the latter and (45) provide

kF(w)k∞ ≤ (L + 1)α 2 .

By the equivalence of norms in Rm , this shows that Assumption 4 is satisfied. 


We already discussed the importance of the convexity of the set Ω from a computational point
of view. If Ω̄ in the previous theorem is convex, then Ω is convex as well, provided that all B j
are concave. In particular, this condition is satisfied if the B j are affine; this latter case is
the most common one and will be illustrated in the next section.
be of interest from the computational point of view, is that the set Ω, as defined in Theorem 4,
requires the use of q additional constraints. In some cases it is possible to reduce this number
and define a different Ω. Corollary 5 below deals exactly with this case and covers a situation
of interest. In particular it can be applied to Example 2 and to the reformulation of the KKT
systems we discuss in the next section. We need a simple preliminary result.
Lemma 3 Suppose that a, b ∈ R are given so that a + b ≥ 0. Then,

| min{a, b}| ≤ max{|a|, |b|}

holds. Indeed, assume a ≤ b without loss of generality. If a ≥ 0 the claim is trivial; if a < 0,
then a + b ≥ 0 yields | min{a, b}| = −a ≤ b = |b|.

Corollary 5 Suppose that the assumptions of Theorem 4 are satisfied except that now q := 2
and that Ω := Ω̄ ∩ {z ∈ Rn | B1 (z) + B2 (z) ≥ 0}. Then, the assertion of Theorem 4 still holds.

Proof. Just repeat the proof of Theorem 4 for q = 2 and note that, instead of (46), we get

|(F i(w) )ma +t (w)| ≤ |(F i(s) )ma +t (w)| ≤ (L + 1)α 2

for all t = 1, . . . , mb . To verify this note first that the right inequality is implied by (44) like
in the proof of Theorem 4. The left inequality follows from the application of Lemma 3, with
a := (B1 )t (w), b := (B2 )t (w), by taking into account that a + b ≥ 0 due to the definition of Ω and that

(F i(w) )ma +t (w) = min{(B1 )t (w), (B2 )t (w)}, (F i(s) )ma +t (w) ∈ {(B1 )t (w), (B2 )t (w)}

holds by (40). 
One might think that our analysis is simply not sharp enough and that, in the setting of
Theorem 4, Assumption 4 also holds for the problem F(z) = 0, z ∈ Ω̄.
The following example shows that this is not the case and that the redefinition of Ω is indeed
necessary.
Example 3 Let F : R2 → R be given by F(x, y) := min{x, y} and take Ω̄ = R2 , i.e., we want to
solve the problem
min{x, y} = 0, (x, y) ∈ R2 .
Theorem 4 gives us the alternative formulation

min{x, y} = 0, x ≥ 0, y ≥ 0;

while Corollary 5 gives rise to

min{x, y} = 0, x + y ≥ 0.

Theorem 4 and Corollary 5 ensure that the second and third constrained systems do satisfy
Assumption 4. We now show that the first, unconstrained reformulation instead does not. To
this end take z∗ := (0, 0), s := (β , 0), and w := (−β , 0) (see the definition of Assumption 4),
where β is a positive number. Then, F(s) = 0, F(w) = −β and G(s) = (0, 1). Whatever α > 0,
if we take β = α/2 we have w ∈ L (s, α). But |F(w)| = α/2 and therefore it is not possible
to find a positive α̂ for which α/2 = |F(w)| ≤ α̂α 2 for any α small enough, as required by
Assumption 4. 
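
A quick numerical check of this failure (a sketch; here F(w) = min{−β , 0} = −β and β = α/2):

    for alpha in (1e-1, 1e-2, 1e-3):
        beta = alpha / 2
        # |F(w)| / alpha^2 = 1 / (2 alpha) grows without bound as alpha -> 0
        print(alpha, abs(min(-beta, 0.0)) / alpha ** 2)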

Functions of the type (Fa , Fb ) considered so far in this subsection, with Fa continuously differ-
entiable and Fb of the form (40), encompass a rather wide range of problems, see also the next
section. However, it is interesting to observe that we can apply the results of this subsection
to a still wider range of functions. In fact, to satisfy Assumption 4, it is not too difficult to see
that we can deal with functions of the type (Fa , Fb ) with smooth Fa and with Fb such that, roughly
speaking, each of its components is the composition of a finite number of min- and max-
functions and of smooth functions. Since the corresponding theory is in principle very similar
to that developed in Theorem 4, but requires a very complicated notation, we chose to restrict
our presentation and to illustrate the approach to Fb of a more complex nature only through
two examples: one here below, and Example 5 in Section 5, where numerical results are also
reported.
Example 4 Consider the nonsmooth equation in two variables
F(x) := max{e^{x1} − 1, min{x1 , x2 − x1 }} + min{1 − x1^2 , x1 − x2^2 } = 0.
By introducing additional variables y1 := min{x1 , x2 − x1 } and y2 := min{1 − x1^2 , x1 − x2^2 } it is
easy to see that our equation is equivalent to the system

( min{x1 , x2 − x1 } − y1
  min{1 − x1^2 , x1 − x2^2 } − y2
  − min{−(e^{x1} − 1), −y1 } + y2 ) = 0, (47)
where the equivalence is to be understood in the sense that a point x∗ is a solution of the original
equation if and only if x∗ together with some suitable y∗ solves system (47). In turn, system
(47) is obviously equivalent to

( min{x1 − y1 , x2 − x1 − y1 }
  min{1 − x1^2 − y2 , x1 − x2^2 − y2 }
  min{−(e^{x1} − 1) − y2 , −y1 − y2 } ) = 0
We see then that, at the expense of an increase in the number of variables, we have reduced
the original equation to a system of the type considered in this subsection, to which we can
consequently apply Theorem 4. 
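
A direct transcription of the lifted system (a sketch):

    import numpy as np

    def F_lifted(x1, x2, y1, y2):
        # the three componentwise-min equations equivalent to Example 4
        return np.array([
            min(x1 - y1, x2 - x1 - y1),
            min(1 - x1 ** 2 - y2, x1 - x2 ** 2 - y2),
            min(-(np.exp(x1) - 1) - y2, -y1 - y2),
        ])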

4 Application to Karush-Kuhn-Tucker Systems


In this section we will apply Algorithm 1 to a nonsmooth reformulation of Karush-Kuhn-Tucker
(KKT) systems. Subsequently, in Section 5 we will discuss some examples that show these
conditions are the weakest among those known to ensure superlinear convergence and we will
also briefly illustrate the applicability of the theory developed so far to some other problems.
Let H : Rn0 → Rn0 be differentiable with a locally Lipschitz continuous derivative and g :
Rn0 → Rmg be twice differentiable with a locally Lipschitz continuous second-order derivative.
We consider the KKT system
H(x) + g0 (x)> u = 0, g(x) ≤ 0, u ≥ 0, u> g(x) = 0 (48)
arising from an optimization problem (in which case H is the gradient of the objective function)
or a variational inequality (in which case H is the function defining the variational inequality).

To simplify the presentation, we only deal with inequality constraints; the results in this section
can easily be extended if additional equality constraints are present.
It is well known that the KKT system (48) can be reformulated as a system of equations by
means of C-functions φ : R2 → R that are characterized by
φ (a, b) = 0 ⇐⇒ a ≥ 0, b ≥ 0, ab = 0
for all a, b ∈ R. Obviously, the solution sets of (48) and of

Φ(x, u) := ( H(x) + g0 (x)> u
             φ (u1 , −g1 (x))
             ...
             φ (umg , −gmg (x)) ) = 0 (49)
are equal. Recall that a solution (x∗ , u∗ ) of (49) is called degenerate if there is an index i ∈
{1, . . . , mg } so that u∗i = gi (x∗ ) = 0. It is known from [33] that Assumption 2 cannot hold at
a degenerate solution if the C-function φ is differentiable. Hence, we choose a nonsmooth
C-function. From now on, we always use the C-function φ := φmin , where
φmin (a, b) := min{a, b} for all a, b ∈ R.
An investigation about the use of other C-functions can be found in [33].
To apply the convergence theory developed in Section 2, the validity of Assumptions 1 – 4
is crucial. In order to facilitate their verification, system (49) is reformulated as the constrained
system of equations:
F(z) := ( H(x) + g'(x)^T u,  g(x) + v,  min{u1, v1},  . . . ,  min{umg, vmg} )^T = 0,   z := (x, u, v) ∈ Ω,    (50)

where Ω denotes a closed convex set with Ω ⊇ R^n0 × R_+^mg × R_+^mg. The solution set of (50)
is indicated by Z. Obviously, a point (x∗ , u∗ ) solves (49) if and only if z∗ = (x∗ , u∗ , −g(x∗ ))
is a solution of (50). Accordingly, a solution z∗ = (x∗ , u∗ , v∗ ) of the latter problem is called
degenerate if u∗i = v∗i = 0 for some i ∈ {1, . . . , mg }. The function F : Rn → Rn defined by (50)
with n := n0 + 2mg can be viewed in two ways, both of which will be useful in the application of
the results of the previous section. On the one hand F is a continuous selection of differentiable
functions F^1, . . . , F^p : R^n → R^n, with p := 2^mg; on the other hand, the last mg components of
F can be written as min{B1(z), B2(z)}, where B1(z) = u and B2(z) = v. Due to Theorem 4 and
Corollary 5 possible choices for Ω are
Ω+ := R^n0 × R_+^mg × R_+^mg    or    Ω0 := {(x, u, v) ∈ R^n0 × R^mg × R^mg | u + v ≥ 0}.
A further advantage of the reformulation (50) is that we can use Theorem 3 and Proposition 6
to prove the validity of Assumption 3, since the last mg components of F are piecewise affine
and independent of x.
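To make the structure of (50) and the choice of G(z) concrete, here is a minimal Python sketch; H, g and Jg are hypothetical user-supplied callables for H(x), g(x) and g'(x), and the code illustrates the reformulation, not Algorithm 1 itself. The first n0 + mg rows of an admissible G(z) are simply the Jacobian of the smooth part (H(x) + g'(x)^T u, g(x) + v), which additionally requires the derivative of H and the second-order derivatives of g and is therefore omitted here:

    import numpy as np

    def kkt_residual(z, n0, H, g, Jg):
        # residual F(z) of reformulation (50) for z = (x, u, v)
        mg = (z.size - n0) // 2
        x, u, v = z[:n0], z[n0:n0 + mg], z[n0 + mg:]
        return np.concatenate([H(x) + Jg(x).T @ u,   # stationarity
                               g(x) + v,             # slacks for g(x) <= 0
                               np.minimum(u, v)])    # complementarity

    def min_rows(z, n0):
        # rows of one admissible G(z) in {(F^i)'(z) | i in A(z)} for the
        # last mg components: each row selects the gradient of the smaller
        # of u_i and v_i (ties resolved toward u_i), i.e. the derivative of
        # an active selection function
        mg = (z.size - n0) // 2
        u, v = z[n0:n0 + mg], z[n0 + mg:]
        rows = np.zeros((mg, z.size))
        for i in range(mg):
            rows[i, n0 + i if u[i] <= v[i] else n0 + mg + i] = 1.0
        return rows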
Along these lines, the next theorem describes several situations where Assumptions 1 – 4
hold and, thus, the local quadratic convergence of a sequence {zk } generated by Algorithm 1 to
a solution of (50) follows.

Theorem 5 Let z∗ be an arbitrary but fixed solution of problem (50) and suppose that
Ω = Ω+ or Ω = Ω0 . If any of the following conditions (a) – (d) holds, then Assumptions
1 – 4 are satisfied and there is r > 0 so that any infinite sequence {zk } generated by Algorithm
1 converges quadratically to a solution ẑ of (50), provided that z0 belongs to Br (z∗ ) ∩ Ω and
that G(zk) ∈ {(F^i)'(zk) | i ∈ A(zk)} for all k ∈ N.

(a) The functions H and g are affine.

(b) Assumption 2 holds and z∗ is a nondegenerate solution of (50).

(c) Assumption 2 holds and there is a neighborhood U (z∗ ) of z∗ = (x∗ , u∗ , v∗ ) so that, for all
z = (x, u, v) ∈ Z ∩ U (z∗ ),
x = x∗ .

(d) Assumption 2 holds and there is a neighborhood U (z∗ ) of z∗ = (x∗ , u∗ , v∗ ) so that, for all
i ∈ {1, . . . , mg } and all z = (x, u, v) ∈ Z ∩ U (z∗ ),

u∗i = v∗i = 0 implies ui = vi = 0.

Proof. To apply Theorem 1 we will show that Assumptions 1 – 4 are satisfied for each of the
conditions (a) – (d). Obviously, the smoothness properties of H and g imply that F defined in
(50) is locally Lipschitz continuous and, thus, Assumption 1 is always satisfied. Assumption 4
also holds. To see this, note that the function F can be split according to (27):

Fa(z) := ( H(x) + g'(x)^T u,  g(x) + v )^T,    Fb(z) := ( min{u1, v1},  . . . ,  min{umg, vmg} )^T.

The function Fa is differentiable with a locally Lipschitz continuous derivative; the function Fb
is of type (40). Thus, setting y := (u, v), the validity of Assumption 4 follows from Theorem 4 or
Corollary 5 depending on the choice of Ω as Ω+ or Ω0 . Moreover, Assumption 2 holds either
because it is explicitly assumed in (b) – (d), or because of Corollary 4 in case (a). The latter
corollary also implies Assumption 3 so that this assumption has to be verified only if one of the
conditions (b), (c), or (d) holds.
(b) Since z∗ is a nondegenerate solution of (50), F is differentiable with a locally Lipschitz
continuous derivative in a sufficiently small neighborhood of z∗ . Due to Corollary 1, Assump-
tion 3 is satisfied.
(c) As we mentioned above, the function F can be split into functions Fa and Fb . Fa is
differentiable with a locally Lipschitz continuous derivative. Fb is piecewise affine and does not
depend on x. Moreover, Ω is a polyhedral set since it is chosen as Ω+ or Ω0 . Furthermore, by
assumption, the x-components of the solutions of (50) are locally unique. Thus, the assumptions
of Theorem 3 hold and so Assumption 3 is satisfied.
(d) Similar to case (c) we can show that the assumptions of Proposition 6 hold. Thus,
Assumption 3 is satisfied. 
We remark that, with the exception of case (c), the hypotheses of the theorem above do not
require that the x-part of the solution be locally unique; this requirement is instead shared by
all methods we are aware of, with the exception of a result in [17, Subsection 5.3.1]. In fact,

other Newton-type methods especially designed for the solution of KKT systems, see e.g. [14,
16, 20, 22, 23, 24, 35, 36, 37], all suppose that the primal part of the solution is locally unique
and, in addition, assume hypotheses that all turn out to imply or to be equivalent to Assumption 2,
so that the settings of these papers are covered by Theorem 5 (c). In the recent report [22]
the local convergence of Josephy-Newton methods for generalized equations and KKT systems
was investigated under weakened differentiability assumptions but assuming strong regularity,
which implies local uniqueness of the solution.
Next, we give a further sufficient condition for the validity of Assumptions 2 and 3 and, thus,
for the quadratic convergence of Algorithm 1. To this end, let the function Ψ : Rn0 +mg → Rn0
and the cone C (x) for x ∈ Rn0 be defined by
Ψ(x, u) := H(x) + g'(x)^T u   for all (x, u) ∈ R^n0 × R^mg

and

C(x) := { d ∈ R^n0 | H(x)^T d = 0, ∇gi(x)^T d ≤ 0 for all i ∈ I(x) },

where

I(x) := { i ∈ {1, . . . , mg} | gi(x) = 0 }.


The following theorem says that Assumptions 2 and 3 are satisfied if a certain second-order con-
dition holds at z∗ . This second-order condition was used in [14] to derive a local error bound
for KKT systems arising from variational inequalities and to prove quadratic convergence of a
method that extends the stabilized SQP [20, 24, 35] method to solve these systems. For opti-
mization problems this second-order condition reduces to the classical second-order sufficiency
condition. By means of the latter condition local fast convergence of Newton-type methods
(combined with an active set method) was shown in [23, 37].
Theorem 6 Let z∗ = (x∗ , u∗ , v∗ ) be a solution of (50) and assume that Ω = Ω+ or Ω = Ω0 .
Furthermore, suppose that at z∗ the following second-order condition (SOC) is valid:
d^T Ψx(x∗, u∗) d ≠ 0   for all d ∈ C(x∗) \ {0}.    (51)
Then, Assumptions 2 and 3 are satisfied for problem (50), for δ > 0 sufficiently small, provided
that G(s) ∈ {(F^i)'(s) | i ∈ A(s)} for all s ∈ Bδ(z∗) ∩ Ω.
Proof. Since z∗ = (x∗ , u∗ , v∗ ) is a solution of (50), (x∗ , u∗ ) solves problem (49), with φ = φmin .
For that problem the function Φ provides a local error bound near (x∗ , u∗ ). This is shown in [14,
Theorem 4]. Furthermore, it is shown there that the primal component x∗ is locally unique.
Moreover, it holds that Φ provides a local error bound near (x∗ , u∗ ) for problem (49) if
and only if F provides a local error bound near z∗ for problem (50). This follows by slightly
modifying Lemma 3.5 in [34]. Thus, Assumption 2 is satisfied for problem (50). Since the
component x∗ is locally unique, the hypotheses of Theorem 3 hold and therefore Assumption 3
is satisfied as well. 
As a consequence of the last theorem we obtain
Corollary 6 Let z∗ = (x∗ , u∗ , v∗ ) be a solution of (50) and assume that Ω = Ω+ or Ω = Ω0 .
Furthermore, suppose that at z∗ the second-order condition (51) holds. Then, there is r > 0 so
that any infinite sequence {zk } generated by Algorithm 1 converges quadratically to a solution
ẑ of (50), provided that z0 belongs to Br(z∗) ∩ Ω and that G(zk) ∈ {(F^i)'(zk) | i ∈ A(zk)} for all
k ∈ N.

The last results say that the second-order condition (51) is sufficient for quadratic convergence
of Algorithm 1 applied to problem (50). In the next section we present examples where (51) is not
satisfied but Assumptions 1–4 do hold so that Algorithm 1 is locally quadratically convergent.

5 Further Applications and Examples


In the previous section we analyzed in detail the application of Algorithm 1 to KKT systems.
The purpose of this section is threefold. On the one hand we want to illustrate, through exam-
ples and informal discussion, the wide applicability of Algorithm 1 to other systems of equa-
tions that can typically be encountered in optimization: complementarity systems, feasibility
systems, systems of (nonsmooth) equations. On the other hand, we give several examples of
KKT systems for which we can establish quadratic convergence while smoothness or regularity
assumptions needed by other methods, in some cases especially designed for the solution of
KKT systems, are not satisfied. Finally, the examples illustrate the use of some of the sufficient
conditions developed in Sections 3 and 4.
For all the examples considered we will also report some very limited numerical results
that show quadratic convergence and can be useful to give the reader a feel for the behavior
of Algorithm 1. However, we stress that the numerical testing of Algorithm 1 is outside the
scope of this paper and is a subject of our current research along with suitable globalization
techniques.
Our first example is a nonsmooth equation from [3]. This example gives a further illustration
of the approach discussed just before Example 4.

Example 5 Consider the nonlinear equation

max{min{x1, −x2}, x2 − x1} = 0.

By introducing slack variables v1 , v2 this equation can be reformulated as


 
F(z) := F(x, v) := ( x2 − x1 − v1,  min{x1, −x2} − v2,  max{v1, v2} )^T = 0,   z ∈ Ω,    (52)

where Ω := {(x, v) ∈ R^2 × R^2 | x1 ≥ v2, −x2 ≥ v2, v1 ≤ 0, v2 ≤ 0}. The solution set of (52) is
given by

Z = {(t, t, 0, −|t|) | t ∈ R} ∪ {(t, 0, −t, 0) | t ≥ 0} ∪ {(0, t, t, 0) | t ≤ 0}.
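A quick numerical check (a Python sketch in our own notation) confirms that F indeed vanishes on all three families:

    import numpy as np

    def F(z):
        # residual of reformulation (52)
        x1, x2, v1, v2 = z
        return np.array([x2 - x1 - v1,
                         min(x1, -x2) - v2,
                         max(v1, v2)])

    for t in np.linspace(-2.0, 2.0, 9):
        assert np.allclose(F((t, t, 0.0, -abs(t))), 0.0)
        if t >= 0:
            assert np.allclose(F((t, 0.0, -t, 0.0)), 0.0)
        if t <= 0:
            assert np.allclose(F((0.0, t, t, 0.0)), 0.0)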

Since the mapping F is piecewise affine, Assumptions 2 and 3 are satisfied due to Corollary 4.
Furthermore, Assumption 4 holds by the results in Subsection 3.3. Thus, Algorithm 1 converges
locally with a quadratic rate from the neighborhood of any solution.
Consider now the solution z∗ = (0, 0, 0, 0)^T. Obviously, Condition 1 is not valid in any
neighborhood of z∗ because F is not differentiable there. Moreover, for any solution z ∈ Z the
rank of all V ∈ ∂B F(z) is less than n = 4. Hence, convergence conditions used for Levenberg-
Marquardt methods or semismooth Newton methods are not satisfied at z∗.
Table 1 shows the behavior of the sequence {‖F(zk)‖∞} where the iterates zk were generated
by Algorithm 1 with starting point z0 := (2, 1, 0, 0)^T. Quadratic convergence can be observed.

 k    ‖F(zk)‖∞
 0    5.0000e−01
 1    1.6666e−01
 2    2.3809e−02
 3    5.5370e−04
 4    3.0642e−07
 5    2.8421e−14

Table 1: Numerical results for Example 5

We note that in this, as in all following tables, we report the values of ‖F(zk)‖∞ and not, as
is more usual, the distance of zk from its accumulation point. In fact, this point
is not known a priori, so such an estimate would be inaccurate. However, by the error
bound condition, quadratic convergence of the sequence {zk} is equivalent to the quadratic
convergence of {‖F(zk)‖∞} to zero. Furthermore, in this, as in all subsequent examples, we
used the infinity norm in the definition of subproblems (7), which therefore are simple linear
programs. 
The equation in the example above is a simple instance of a piecewise affine function. More
generally, it is known that any piecewise affine system of equations has a max-min representation,
see [32]. Therefore, by the discussion before Example 4 and reasoning as in Example 5, it is
possible to see that conceptually we can always reformulate any piecewise affine equation as a
piecewise affine system for which Assumptions 1 – 4 hold. From the practical point of view this
might not be convenient, though, since it could lead to a large increase in the number of variables.
In many applications, simpler reformulations are easily available and should be exploited; this
is the case for the linear complementarity problems that we discuss shortly.
We next consider a simple feasibility problem. The feasibility problem is, of course, a part
of the KKT conditions, so we should expect that our algorithm is applicable. It is interesting to
see that the analysis in this case can become particularly simple.
Example 6 Consider the system of two inequalities
x1^2 + x2^2 − 1 ≤ 0,
(x1 − 1)^2 + x2^2 − 1 ≤ 0.
With Ω := R^2 × R_+^2, this can be written as

F(z) := F(x, v) := ( x1^2 + x2^2 − 1 + v1,  (x1 − 1)^2 + x2^2 − 1 + v2 )^T = 0,   z ∈ Ω.    (53)
It is easy to prove that the Mangasarian-Fromovitz constraint qualification is satisfied at each
feasible point of the system above. Thus, it is well known that F provides a local error bound
on Ω, so that Assumption 2 is valid. Furthermore, since F is differentiable and its
derivative is locally Lipschitz continuous, Assumptions 3 and 4 hold. This follows from Corol-
lary 1. In Table 2 a test run of our algorithm applied to this problem is presented. The starting
point z0 := (3, 2, 0, 0)^T was chosen. As expected, we observe quadratic convergence of the se-
quence {‖F(zk)‖∞}. Finally, note that a constrained Levenberg-Marquardt method would also
provide fast local convergence, see [1, 25]. 

 k    ‖F(zk)‖∞
 0    1.2000e+01
 ⋮    ⋮
 5    1.9964e−01
 6    2.5705e−02
 7    4.6560e−04
 8    1.6695e−07
 9    7.2053e−14

Table 2: Numerical results for Example 6
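To give a concrete picture of what one iteration involves, the following Python sketch implements subproblem (7) for the reformulation (53) by means of scipy.optimize.linprog. It assumes that subproblem (7) has the LP-Newton form introduced in Section 2, namely: minimize γ over (z, γ) subject to ‖F(zk) + G(zk)(z − zk)‖∞ ≤ γ‖F(zk)‖∞^2, ‖z − zk‖∞ ≤ γ‖F(zk)‖∞ and z ∈ Ω; the variable names and the setup are ours, and the code is an illustration rather than a tuned implementation:

    import numpy as np
    from scipy.optimize import linprog

    def F(z):                                # residual of (53)
        x1, x2, v1, v2 = z
        return np.array([x1**2 + x2**2 - 1 + v1,
                         (x1 - 1)**2 + x2**2 - 1 + v2])

    def G(z):                                # Jacobian of F
        x1, x2, _, _ = z
        return np.array([[2 * x1,       2 * x2, 1.0, 0.0],
                         [2 * (x1 - 1), 2 * x2, 0.0, 1.0]])

    def lp_newton_step(z):
        # LP in the variables (d, gamma) with d = z_new - z:
        #   min gamma  s.t.  |F(z) + G(z) d|_inf <= gamma * |F(z)|_inf^2,
        #                    |d|_inf <= gamma * |F(z)|_inf,  z + d in Omega
        Fz, Gz = F(z), G(z)
        nf = np.linalg.norm(Fz, np.inf)
        n = z.size
        c = np.zeros(n + 1)
        c[-1] = 1.0                          # objective: gamma
        A, b = [], []
        for i in range(2):                   # residual constraints
            A.append(np.r_[Gz[i], -nf**2]); b.append(-Fz[i])
            A.append(np.r_[-Gz[i], -nf**2]); b.append(Fz[i])
        for j in range(n):                   # step-length constraints
            e = np.zeros(n + 1); e[-1] = -nf
            e[j] = 1.0;  A.append(e.copy()); b.append(0.0)
            e[j] = -1.0; A.append(e);        b.append(0.0)
        for j in (2, 3):                     # Omega: v-components stay >= 0
            e = np.zeros(n + 1); e[j] = -1.0
            A.append(e); b.append(z[j])
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                      bounds=[(None, None)] * n + [(0.0, None)])
        return z + res.x[:n]

    z = np.array([3.0, 2.0, 0.0, 0.0])       # starting point used for Table 2
    for k in range(10):
        print(k, np.linalg.norm(F(z), np.inf))
        if np.linalg.norm(F(z), np.inf) < 1e-12:
            break
        z = lp_newton_step(z)

Starting from z0 = (3, 2, 0, 0)^T one should observe the quadratic decrease of ‖F(zk)‖∞ reported in Table 2, possibly with slightly different intermediate values since the solutions of the linear programs need not be unique.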

We discuss next complementarity problems of the form


x ≥ 0,   T(x) ≥ 0,   x^T T(x) = 0,    (54)
where the function T : Rn0 → Rn0 is assumed to be differentiable with a locally Lipschitz con-
tinuous derivative. Let us consider first the reformulation

F(z) := F(x, v) := ( T(x) − v,  min{v1, x1},  . . . ,  min{vn0, xn0} )^T = 0,   z ∈ Ω,    (55)

where Ω := R_+^n0 × R_+^n0 or Ω := {z = (x, v) ∈ R^n0 × R^n0 | v + x ≥ 0}. If T is affine, so that F in
(55) is piecewise affine, then Assumptions 1 – 4 are satisfied, see Corollary 4 for Assumptions 2 and 3, Theorem 4 and
Corollary 5 for Assumption 4. Therefore, Algorithm 1 applied to (55) with T affine always
generates a sequence that is locally quadratically convergent to a solution of (55). We are
not aware of any other algorithm with this property but would like to mention the interesting
approach in [4] for monotone complementarity problems. We remark that one key ingredient
of [4], as well as of some Newton-type methods for the solution of KKT systems, is the use
of active-set identification techniques [7, 8, 23, 29, 36, 37], which however leads to algorithms
whose nature is quite dissimilar to that of our approach.
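For the affine case, a minimal sketch of the residual of (55) may look as follows (in Python; M and q are hypothetical LCP data with T(x) = Mx + q):

    import numpy as np

    M = np.array([[2.0, 1.0],     # hypothetical LCP data, T(x) = M x + q
                  [1.0, 2.0]])
    q = np.array([-1.0, -1.0])

    def F_lcp(z):
        # piecewise affine residual of reformulation (55) for z = (x, v)
        n0 = M.shape[0]
        x, v = z[:n0], z[n0:]
        return np.concatenate([M @ x + q - v, np.minimum(v, x)])

    z_star = np.array([1/3, 1/3, 0.0, 0.0])   # a solution of this LCP
    assert np.allclose(F_lcp(z_star), 0.0)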
If T is nonlinear, checking whether Assumptions 2 – 4 hold might be less straightforward.
Then, the following way of reformulating the complementarity problem (54) as a constrained
system of equations can be helpful:

F(z) := F(x, v, w) := ( T(x) − v,  x − w,  min{v1, w1},  . . . ,  min{vn0, wn0} )^T = 0,   z ∈ Ω    (56)

with Ω := R^n0 × R_+^n0 × R_+^n0 or Ω := {z = (x, v, w) ∈ R^n0 × R^n0 × R^n0 | v + w ≥ 0}. In this reformu-
lation the second part of the function F is independent of x and so Theorem 3 and Proposition
6 can be applied to check the validity of Assumption 3. Then, by a trivial modification of The-
orem 5, convergence results analogous to those obtained for KKT systems in that theorem can
be derived.

Example 7 The nonlinear complementarity problem with

T(x) := ( x1 x2,  x1^2 + x2 − 1 )^T

can be reformulated as

F(z) := F(x, v) := ( x1 x2 − v1,  x1^2 + x2 − 1 − v2,  min{x1, v1},  min{x2, v2} )^T = 0,   z ∈ Ω    (57)

with slack variables v1, v2 and Ω := R_+^2 × R_+^2. The solution set of (57) is given by

Z = {(t, 0, 0, t^2 − 1)^T | t ≥ 1} ∪ {(0, 1, 0, 0)^T}.

It is not difficult to verify that at each solution of (57) both Assumption 2 and Condition 4
are satisfied. The latter implies the validity of Assumption 3. Furthermore, Assumption 4
is valid due to Theorem 4. Hence, Algorithm 1 generates a locally quadratically convergent
sequence. In Table 3 the behavior of the residuals kF(zk )k∞ is shown. The chosen starting point
is z0 = (2, 1, 0, 0)> . We can consider the solution corresponding to t = 1, i.e., z∗ = (1, 0, 0, 0)> .

 k    ‖F(zk)‖∞
 0    4.0000e+00
 ⋮    ⋮
 3    1.1633e−01
 4    1.2125e−02
 5    1.4527e−04
 6    3.2637e−08
 7    5.6843e−14

Table 3: Numerical results for Example 7

It is easy to see that F is not differentiable at z∗, so that Condition 1 does not hold. Furthermore,
it is immediate to verify that ∂B F(z∗) contains singular matrices, so that Condition
2 is not met either. Altogether, convergence conditions used in Levenberg-Marquardt methods and
semismooth Newton methods are not satisfied at z∗. 
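The claim about ∂B F(z∗) is easy to verify numerically. The sketch below (our notation) assembles the two elements of ∂B F(z∗) at z∗ = (1, 0, 0, 0)^T, which differ only in the branch chosen for the bi-active component min{x2, v2}, and prints their ranks:

    import numpy as np

    z_star = np.array([1.0, 0.0, 0.0, 0.0])   # (x1, x2, v1, v2)
    x1, x2, v1, v2 = z_star

    row1 = [x2, x1, -1.0, 0.0]        # gradient of x1*x2 - v1
    row2 = [2 * x1, 1.0, 0.0, -1.0]   # gradient of x1^2 + x2 - 1 - v2
    row3 = [0.0, 0.0, 1.0, 0.0]       # min{x1, v1}: v1 = 0 < 1 = x1 (v1 branch)
    # min{x2, v2}: x2 = v2 = 0, so both branches occur in the B-differential
    for row4 in ([0.0, 1.0, 0.0, 0.0],    # x2 branch
                 [0.0, 0.0, 0.0, 1.0]):   # v2 branch
        V = np.array([row1, row2, row3, row4])
        print(row4, "rank:", np.linalg.matrix_rank(V))
    # the x2 branch yields rank 3 (a singular matrix), the v2 branch rank 4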

The next four examples are all KKT systems and illustrate many of the points made in previous
sections. Note that in all following examples the solution set has a nonisolated primal part, and
therefore superlinear or quadratic convergence cannot be obtained by the specialized methods
in [14, 20, 35, 36, 37].

Example 8 The KKT conditions for the convex minimization problem

min_x   x1^2   s.t.   x1 ≥ 0,  x2 ≥ 0    (58)

can be written as

F(z) := F(x, u) := ( 2x1 − u1,  −u2,  min{x1, u1},  min{x2, u2} )^T = 0,   z ∈ Ω,    (59)
where Ω := R_+^2 × R_+^2. The KKT conditions (59) are both necessary and sufficient optimality
conditions for problem (58). The solution set of (59) is obviously given by
Z = {(0, t, 0, 0)^T | t ≥ 0}.
The mapping F is locally Lipschitz continuous and, for any z = (x, u) with x1 = u1 or x2 = u2 ,
nondifferentiable. Therefore F is nondifferentiable at every solution of (59) and Condition 1
cannot hold for any solution because this condition would imply the differentiability of F in the
interior of (Bδ (z∗ ) ∩ Ω) \ Z. Moreover, for any z ∈ Z, there is at least one matrix V ∈ ∂B F(z)
whose rank is 3 < 4 = n. If z ∈ Z with x2 > 0 then all matrices in ∂B F(z) have rank 3 < 4. Thus,
although Condition 2 holds for any fixed z∗ ∈ Z, the condition that all elements of ∂B F(z∗) have rank n does not
hold. Therefore, the existing tools for local convergence analysis are not applicable. Of course,
the second-order condition (51) cannot be satisfied either, since otherwise the primal solutions
would have to be locally unique.
However, since F is piecewise affine, Assumptions 1 – 4 are valid. This follows from The-
orem 5 (see condition (a)). Hence, Algorithm 1 generates a quadratically convergent sequence,
provided that the starting point is chosen sufficiently close to z∗ . For a numerical test we have
chosen z0 = (2, 4, 3, 2)^T. The results are shown in Table 4. 

 k    ‖F(zk)‖∞
 0    2.0000e+00
 ⋮    ⋮
 5    1.5305e−01
 6    4.8160e−02
 7    6.0798e−03
 8    1.4527e−04
 9    2.9328e−09

Table 4: Numerical results for Example 8

Example 9 The optimization problem


min_x   x1^2 − x2^2 + x3^2   s.t.   x1^2 + x2^2 − x3^2 ≤ 0,   x1 x3 ≤ 0    (60)
leads to the KKT system

F(z) := F(x, u, v) := ( 2x1 + 2u1 x1 + u2 x3,  −2x2 + 2u1 x2,  2x3 − 2u1 x3 + u2 x1,
                        x1^2 + x2^2 − x3^2 + v1,  x1 x3 + v2,  min{u1, v1},  min{u2, v2} )^T = 0    (61)

with multipliers u1 , u2 and slack variables v1 , v2 . This system has nonisolated solutions. Its
primal components are
x∗ ∈ {(0, t, t)^T | t ∈ R} ∪ {(0, t, −t)^T | t ∈ R}.

These are exactly the solutions of the optimization problem. Hence, the KKT system (61)
provides a necessary and sufficient optimality condition for problem (60). For x∗ = 0 the related
multipliers u∗ ∈ R^2 are arbitrary, while the slack variables satisfy v∗ = 0. For all other primal
solutions, i.e., for t ≠ 0, the multipliers and slacks are unique with

u∗ = (1, 0)^T   and   v∗ = 0.

Similar to Example 8 it can be argued that Condition 1 cannot hold and that, for any z ∈ Z,
there is at least one element of ∂B F(z) whose rank is less than n = 7. Furthermore, the second-
order condition (51) is not satisfied for any solution because otherwise the primal solutions
would have to be locally unique. Nevertheless, it can be shown that, at z∗ := (x∗ , u∗ , v∗ ) with
x∗ ≠ (0, 0, 0)^T arbitrary but fixed, u∗ := (1, 0)^T, v∗ := 0, the function F provides a local error
bound on Bδ(z∗) for δ > 0 sufficiently small, i.e., Assumption 2 is valid. For example, if
x∗ = (0, 2, 2)^T, it holds that

dist[s, Z] ≤ 6 ‖F(s)‖∞   for all s ∈ B_{1/3}(z∗).

Furthermore, Proposition 6 is applicable for system (61). Thus, Assumption 3 also holds.
Finally, with Ω := R^3 × R_+^2 × R_+^2, Assumption 4 can be guaranteed by means of Theorem
4. Hence, given any solution corresponding to t ≠ 0, Algorithm 1 converges locally with a
quadratic rate. Table 5 shows the behavior of the residuals ‖F(zk)‖∞, with z0 := (1, 1, 1, 0, 0, 0, 0)^T.

 k    ‖F(zk)‖∞
 0    2.0000e+00
 1    1.0000e+00
 2    3.3333e−01
 3    6.0993e−02
 4    4.2452e−03
 5    2.9145e−05
 6    1.0382e−09

Table 5: Numerical results for Example 9

We already observed that for t = 0, i.e., for the solution with primal part x∗ = 0, our Assump-
tions 1 – 4 are not all satisfied. In Table 6 we report the results for the starting point z0 :=
(0, 1, 1, 0, 0, 0, 0)^T, from which the algorithm converges to the critical solution given by t = 0.
It may be observed that although convergence occurs, the sequence of residuals does not seem
to exhibit the typical behavior of quadratic convergence. 

Example 10 The KKT conditions for the convex optimization problem

min_x   x2   s.t.   x1^2 + x2^2 − 1 ≤ 0,   −x2 ≤ 0    (62)

 k    ‖F(zk)‖∞
 0    2.0000e+00
 ⋮    ⋮
11    1.6934e−06
12    4.2396e−07
13    1.0604e−07
14    2.6517e−08
15    6.6301e−09

Table 6: Numerical results for Example 9 with z0 := (0, 1, 1, 0, 0, 0, 0)^T

can be written as

F(z) := F(x, u, v) := ( 2u1 x1,  2u1 x2 − u2 + 1,  x1^2 + x2^2 − 1 + v1,
                        −x2 + v2,  min{u1, v1},  min{u2, v2} )^T = 0.    (63)
This system is a sufficient and necessary optimality condition for problem (62). It has an infinite
number of solutions that are nonisolated. The primal components of the solutions are
x∗ ∈ {(t, 0)^T | t ∈ [−1, 1]}.

Both the related vectors of multipliers and slacks are unique:

u∗ = (0, 1)^T   and   v∗ = (1 − t^2, 0)^T.

Similar to Example 8 it can be argued that Condition 1 cannot hold at solutions z∗ with x1∗ ∈
{−1, 1} and that, for any z ∈ Z, there is at least one element of ∂B F(z) whose rank is less than
n = 6. Furthermore, the second-order condition (51) is not satisfied for any solution because
otherwise the primal solutions would have to be locally unique. However, since the optimization
problem is convex, the linear independence constraint qualification holds, and the objective
function satisfies a quadratic growth condition, the function F provides a local error bound (see
[17, Theorem 6] for details). Furthermore, it can be shown that Condition 4 is satisfied for any
solution in Z. Thus, due to Corollary 3, Assumption 3 holds. As in Example 9, Assumption 4
can be easily satisfied by using Ω := R^2 × R_+^2 × R_+^2. Therefore, Algorithm 1 converges locally
with a quadratic rate. Table 7 shows numerical results for this example with the starting point
z0 := (2, 1, 0, 0, 0, 0)^T. 

Our final example shows that Assumptions 1 – 4 can hold for the reformulation of the KKT
conditions of an optimization problem even if the following “nasty” situation occurs:
– all primal solutions are nonisolated,
– all dual solutions are nonisolated, and
– the reformulation is nonsmooth at any of its solutions.

 k    ‖F(zk)‖∞
 0    4.0000e+00
 ⋮    ⋮
 3    2.3560e−01
 4    4.4926e−02
 5    1.9315e−03
 6    3.7237e−06
 7    3.8653e−12

Table 7: Numerical results for Example 10

Example 11 For the optimization problem


min_x   (x1 + x2) x4 + (1/2)(x2 + x3)^2
s.t.     x1 ≤ 0
         x1 ≥ 0                                      (64)
         x2 ≥ 1
         x4 ≥ 0
         x1 + x2 + x3 ≥ 0
we consider the KKT system

F(z) := F(x, u, v) := ( x4 + u1 − u2 − u5,
                        x4 + x2 + x3 − u3 − u5,
                        x2 + x3 − u5,
                        x1 + x2 − u4,
                        x1 + v1,
                        −x1 + v2,
                        1 − x2 + v3,
                        −x4 + v4,
                        −x1 − x2 − x3 + v5,
                        min{u1, v1},  min{u2, v2},  min{u3, v3},  min{u4, v4},  min{u5, v5} )^T = 0    (65)
with multipliers u1 , . . . , u5 and slack variables v1 , . . . , v5 . The solution set of optimization prob-
lem (64) is
{(0, t, −t, 0)^T | t ≥ 1}.
For each of these solutions (identified by some t) the corresponding set of multipliers is given
by
{(s, s, 0, t, 0)^T | s ≥ 0}.
On this basis it is easy to check that the solution set of (65) is

Z = {(0, t, −t, 0, s, s, 0, t, 0, 0, 0, t − 1, 0, 0)^T | t ≥ 1, s ≥ 0}.

Since at any solution both u5 and v5 are zero, F is nondifferentiable at any point in Z. Although
the optimization problem (64) has a nonconvex objective, any solution of the KKT system (65)
is associated with one of the global solutions of (64). Since the function F is piecewise affine,
Assumptions 1 – 4 are satisfied for any z∗ ∈ Z if Ω = R^4 × R_+^5 × R_+^5. The results of a numerical
test are shown in Table 8, where z0 = (1, 4, −2, 1, 3, 3, 1, 4, 1, 0, 1, 3, 1, 3)^T is used as the starting
point. 

 k    ‖F(zk)‖∞
 0    1.0000e+00
 ⋮    ⋮
 5    4.8716e−02
 6    7.9447e−03
 7    2.4470e−04
 8    2.3928e−07
 9    2.3447e−13

Table 8: Numerical results for Example 11

Note that, for an optimization problem with affine constraints ci(x) ≥ 0, there are reason-
able alternative options for reformulating its KKT conditions as a system of constrained equa-
tions (1). A first option is the one used in Examples 8 and 11. Another option is obtained
if, instead of having ci(x) − vi = 0 and min{ui, vi} in the reformulated KKT system, we only
use min{ui, ci(x)} = 0. Then, according to Theorem 4, the definition of the set Ω should be
augmented by including ui ≥ 0 and ci(x) ≥ 0. Still another option is to include all the affine
constraints directly in the definition of the set Ω. The discussion and analysis of possible ad-
vantages of alternative reformulations are left to future study.
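For instance, under the sign conventions of (48) with gi := −ci (so that ci(x) ≥ 0 becomes gi(x) ≤ 0) and with m constraints, a sketch of the second option reads

F(z) := ( H(x) + g'(x)^T u,  min{u1, c1(x)},  . . . ,  min{um, cm(x)} )^T = 0,   z = (x, u) ∈ Ω,

with Ω ⊇ {(x, u) | u ≥ 0, c(x) ≥ 0}; its analysis is part of the future study mentioned above.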

References
[1] Behling, R., Fischer, A.: A unified local convergence analysis of inexact constrained
Levenberg-Marquardt methods. Optim. Lett., doi:10.1007/s11590-011-0321-3

[2] Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer,
New York (2000)

[3] Clarke F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)

[4] Dan, H., Yamashita, N., Fukushima, M.: A superlinearly convergent algorithm for the
monotone nonlinear complementarity problems without uniqueness and nondegeneracy
conditions. Math. Oper. Res. 27, 743–753 (2002)

[5] Dong, Y., Fischer, A.: A framework for analyzing local convergence properties with ap-
plications to proximal-point algorithms. J. Optim. Theory Appl. 131, 53–68 (2006)

[6] Facchinei, F., Fischer, A., Herrich, M.: A family of Newton methods for nonsmooth
constrained systems with nonisolated solutions. Technical Report MATH-NM-07-2011,
Technische Universität Dresden (2011)

[7] Facchinei, F., Fischer, A., Kanzow, C.: On the accurate identification of active constraints.
SIAM J. Optim. 9, 14–32 (1998)

[8] Facchinei, F., Fischer, A., Kanzow, C.: On the identification of zero variables in an
interior-point framework. SIAM J. Optim. 10, 1058–1078 (2000)

[9] Facchinei, F., Fischer, A., Piccialli, V.: Generalized Nash Equilibrium Problems and New-
ton Methods. Math. Program. 117, 163–194 (2009)

[10] Facchinei, F., Kanzow, C.: A nonsmooth inexact Newton method for the solution of large-
scale nonlinear complementarity problems. Math. Program. 76, 493–512 (1997)

[11] Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. 4OR 5, 173–210
(2007)

[12] Facchinei, F., Pang, J.-S.: Finite Dimensional Variational Inequalities and Complementar-
ity Problems. Springer, New York (2003)

[13] Fan, J.Y., Yuan, Y.X.: On the quadratic convergence of the Levenberg-Marquardt method
without nonsingularity assumption. Comput. 74, 23–39 (2005)

[14] Fernández, D., Solodov, M.: Stabilized sequential quadratic programming for optimiza-
tion and a stabilized Newton-type method for variational problems. Math. Program. 125,
47–73 (2010)

[15] Fischer, A.: Solution of monotone complementarity problems with locally Lipschitzian
functions. Math. Program. 76, 513–532 (1997)

[16] Fischer, A.: Modified Wilson’s method for nonlinear programs with nonunique multipli-
ers. Math. Oper. Res. 24, 699–727 (1999)

[17] Fischer, A.: Local behavior of an iterative framework for generalized equations with non-
isolated solutions. Math. Program. 94, 91–124 (2002)

[18] Fischer, A., Shukla, P.K.: A Levenberg-Marquardt algorithm for unconstrained multicri-
teria optimization. Oper. Res. Lett. 36, 643–646 (2008)

[19] Fischer, A., Shukla, P.K., Wang, M.: On the inexactness level of robust Levenberg-
Marquardt methods. Optim. 59, 273–287 (2010)

[20] Hager, W.W.: Stabilized Sequential Quadratic Programming. Comput. Optim. Appl. 12,
253–273 (1999)

[21] Hoffman, A.J.: On approximate solutions of systems of linear equations and inequalities.
J. Res. Natl. Bur. Stand. 49, 263–265 (1952)

[22] Izmailov, A.F., Kurennoy, A.S., Solodov, M.V.: The Josephy-Newton method for semis-
mooth generalized equations and semismooth SQP for optimization. Technical Report,
April 2011

[23] Izmailov, A.F., Solodov, M.V.: Newton-type methods for optimization problems without
constraint qualifications. SIAM J. Optim. 15, 210–228 (2004)

[24] Izmailov, A.F., Solodov, M.V.: Stabilized SQP revisited. Math. Program., published online
September 26 (2010)

[25] Kanzow, C., Yamashita, N., Fukushima, M.: Levenberg-Marquardt methods with strong
local convergence properties for solving equations with convex constraints. J. Comput.
Appl. Math. 172, 375–397 (2004)

[26] Kummer, B.: Newton’s method for non-differentiable functions. In Guddat, J. et al. (eds.)
Mathematical Research Advances in Mathematical Optimization, pp. 114–125. Akademie
Verlag, Berlin (1988)

[27] Luque, F.J.: Asymptotic convergence analysis of the proximal point algorithm. SIAM J.
Control Optim. 22, 277–293 (1984)

[28] Ng, K.F., Zheng, X.Y.: Error bounds of constrained quadratic functions and piecewise
affine inequality systems. J. Optim. Theory Appl. 118, 601–618 (2003)

[29] Oberlin, C., Wright, S.: Active Constraint Identification in Nonlinear Programming. SIAM
J. Optim. 17, 577–605 (2006)

[30] Qi, L.: Convergence analysis of some algorithms for solving nonsmooth equations. Math.
Oper. Res. 18, 227–244 (1993)

[31] Qi, L., Sun J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367
(1993)

[32] Scholtes, S.: Introduction to piecewise differentiable equations. Preprint 53/1994, Insti-
tut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe, Karlsruhe
(1994)

[33] Shukla, P.K.: Levenberg-Marquardt Algorithms for Nonlinear Equations, Multi-objective
Optimization and Complementarity Problems. Dissertation, Technische Universität Dres-
den, Dresden (2010)

[34] Tseng, P.: Growth behavior of a class of merit functions for the nonlinear complementarity
problem. J. Optim. Theory Appl. 89, 17–37 (1996)

[35] Wright, S.: Superlinear convergence of a stabilized SQP method to a degenerate solution.
Comput. Optim. Appl. 11, 253–275 (1998)

[36] Wright, S.: Constraint identification and algorithm stabilization for degenerate nonlinear
programs. Math. Program. 95, 137–160 (2003)

[37] Wright, S.: An algorithm for degenerate nonlinear programming with rapid local conver-
gence. SIAM J. Optim. 15, 673–696 (2005)

[38] Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg-Marquardt
method. Comput. [Suppl.] 15, 239–249 (2001)

[39] Zhou, G., Qi, L.: On the convergence of an inexact Newton-type method. Oper. Res. Lett.
34, 647–652 (2006)

