

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 10, OCTOBER 2017 2395

Lagrange Programming Neural Network for Nondifferentiable Optimization Problems in Sparse Approximation

Ruibin Feng, Chi-Sing Leung, Senior Member, IEEE, Anthony G. Constantinides, Life Fellow, IEEE, and Wen-Jun Zeng

Abstract— The major limitation of the Lagrange programming neural network (LPNN) approach is that the objective function and the constraints should be twice differentiable. Since sparse approximation involves nondifferentiable functions, the original LPNN approach is not suitable for recovering sparse signals. This paper proposes a new formulation of the LPNN approach based on the concept of the locally competitive algorithm (LCA). Unlike the classical LCA approach, which is able to solve unconstrained optimization problems only, the proposed LPNN approach is able to solve constrained optimization problems. Two problems in sparse approximation are considered. They are basis pursuit (BP) and constrained BP denoise (CBPDN). We propose two LPNN models, namely, BP-LPNN and CBPDN-LPNN, to solve these two problems. For these two models, we show that the equilibrium points of the models are the optimal solutions of the two problems, and that the optimal solutions of the two problems are the equilibrium points of the two models. Besides, the equilibrium points are stable. Simulations are carried out to verify the effectiveness of these two LPNN models.

Index Terms— Lagrange programming neural networks (LPNNs), locally competitive algorithm (LCA), optimization.

Manuscript received November 27, 2015; revised May 28, 2015, January 12, 2016, and April 9, 2016; accepted May 19, 2016. Date of publication November 27, 2015; date of current version September 15, 2017. This work was supported by the Research Grants Council, Hong Kong, under Grant CityU 115612.
R. Feng, C.-S. Leung, and W.-J. Zeng are with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong (e-mail: rfeng4-c@my.cityu.edu.hk; eeleungc@cityu.edu.hk; wenjzeng@gmail.com).
A. G. Constantinides is with Imperial College London, London SW7 2AZ, U.K. (e-mail: a.constantinides@imperial.ac.uk).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2016.2575860
2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

USING analog neural networks for solving nonlinear constrained optimization problems has been studied over many decades [1]–[5]. The analog neural network approach is more effective when real-time solutions are required [1]–[3], [6]. The use of neural networks for optimization could be dated back at least to the 1980s [2], [7]. In [7], the Hopfield model was demonstrated to have the ability to solve several optimization problems. In [3], a canonical nonlinear programming circuit was proposed to solve nonlinear programming problems with inequality constraints. A recurrent neural network model [8] was proposed for quadratic optimization with bound constraints, and its convergence proof was given in [9].

Based on the concept of variational inequalities [10]–[12], a number of projection neural network models [13]–[16] for constrained optimization problems were proposed, in which a projection circuit is required. For simple constraints, such as a box set, the projection circuit is very simple. However, when complicated constraints are considered, the projection circuit is difficult to implement. Recently, a projection neural network model [17] was proposed for handling complex variables. In [18]–[20], the concept of projection neural networks was extended to handle l1-norm problems. When we use these models (in which the objective function contains an l1-norm term), the number of neurons is doubled. Many existing models are designed for solving a particular form of optimization problems. For example, in [5], the model was designed for the quadratic programming problem with equality constraints.

The Lagrange programming neural network (LPNN) approach [21]–[25] provides a general framework for solving various nonlinear constrained optimization problems. Furthermore, with the augmented term concept, the LPNN approach is able to solve nonconvex optimization problems. Although the LPNN approach was developed in the early 1990s, the formal proof [26], [27] of the global convergence for convex problems was given in the early 2000s. Recently, some new applications of the LPNN approach [24], [25], including target localization in multiple-input multiple-output systems and waveform design in radar systems, were reported. In these signal processing applications, the optimization problems are nonconvex, and the LPNN approach is superior to the traditional numerical approaches. However, the major limitation of LPNN is that it cannot handle nondifferentiable objective functions and constraints.

Sparse approximation [28], [29] aims at recovering an unknown sparse signal. Its objective function or its constraint is usually not differentiable. There are many digital algorithms (numerical algorithms) for sparse approximation. For example, log-barrier and spectral projected gradient (SPG) [30] are two representative methods in some sparse approximation packages [31], [32].

In [33], the locally competitive algorithm (LCA) was proposed for solving the unconstrained basis pursuit denoise problem [34]. Unlike conventional models, the analysis of the LCA is difficult because it uses a nonsmooth, unbounded, and not strictly increasing activation function. The LCA

properties were reported in [35] and [36]. To escape from the differentiability requirement, the LCA introduces the internal state concept. It then defines the dynamics for the internal states. However, the LCA is able to handle unconstrained problems only. In sparse approximation, there are many constrained problems.

There are some neural-based optimization algorithms for nondifferentiable functions [37]–[40]. In these algorithms, the dynamics are directly related to the subdifferential of the objective function. However, there is no discussion on the way to select the appropriate subgradient when the system state is at a nondifferentiable point. In addition, choosing a fixed subgradient is not an option, because this way cannot ensure that an optimal solution is a stable equilibrium.

This paper develops two LPNN models for handling two sparse approximation problems: BP and constrained BP denoise (CBPDN). Since the l1-norm term in these problems is not differentiable, we adopt the LCA concept to escape from the differentiability requirement. We propose two models: BP-LPNN for the BP problem and CBPDN-LPNN for the CBPDN problem. For these two models, we show that the equilibrium points of the networks are the optimal solutions of these two problems. Furthermore, we show that the equilibrium points are stable.

This paper is organized as follows. Section II presents the background of sparse approximation and LPNNs. Sections III and IV present the BP-LPNN model and the CBPDN-LPNN model, respectively. Section V presents our simulation results. Section VI discusses some existing analog network algorithms and how our approach can handle other sparse approximation problems. Section VII concludes our results.

II. BACKGROUND

A. Subdifferential, Sparse Approximation, and LCA

The definition of the subdifferential [41], [42] is given by the following.

Definition 1: Let f : Rⁿ → R be a convex function. A vector ρ is a subgradient of f at x ∈ dom f if

f(y) ≥ f(x) + ρ^T(y − x), ∀y ∈ dom f.   (1)

Definition 2: The subdifferential ∂f(x) at x is the set of all subgradients, given by

∂f(x) = {ρ : ρ^T(y − x) ≤ f(y) − f(x), ∀y ∈ dom f}.   (2)

We use two examples to explain the subdifferential concept.

Example 1: For f(x) = |x|, x ∈ R, we have

∂|x| = [−1, 1] for x = 0, and ∂|x| = sign(x) for x ≠ 0.   (3)

Example 2: For f(x) = ‖x‖₁, x ∈ Rⁿ, we have

∂‖x‖₁ = [∂|x₁|, . . . , ∂|xₙ|]^T.   (4)

In sparse approximation [28], [29], we need to estimate an unknown sparse vector x ∈ Rⁿ from the observation b = Φx, where b ∈ Rᵐ is the observation vector, Φ ∈ Rᵐˣⁿ is the measurement matrix with a rank of m, and m < n. The estimation problem can be stated as

min_x ‖x‖₀, s.t. Φx = b.   (5)

Unfortunately, the problem stated in (5) is NP-hard. Therefore, we usually replace the l0-norm with the l1-norm. The problem becomes the well-known BP problem [34], [43]

min_x ‖x‖₁, s.t. Φx = b.   (6)

When residue is allowed in the constraint of (6), the problem becomes the CBPDN

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ ϵ   (7)

where ϵ > 0. In many situations, the measurement process may contain noise, given by b = Φx + ε, where ε = [ε₁, . . . , εₘ]^T and the εᵢ values are independently identically distributed random variables with zero mean and variance σ². In this case, we can set ϵ = mσ². That means, (7) becomes

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ mσ².   (8)

[Fig. 1. Threshold function.]

An LCA network [33], [35], [36] contains n neurons. Their internal states and outputs are denoted by u and x, respectively. The LCA aims at minimizing the following objective function:

L_lca = (1/2)‖b − Φx‖₂² + κ‖x‖₁   (9)

where κ is a tradeoff parameter. The mapping from u to x is defined by a threshold function, given by

x_i = T_κ(u_i) = 0 for |u_i| ≤ κ, and x_i = T_κ(u_i) = u_i − κ sign(u_i) for |u_i| > κ.   (10)

The LCA embeds the tradeoff parameter into the threshold function T_κ(·). Some threshold functions with different values of κ are shown in Fig. 1. For |u_i| > κ, the mapping from u_i to x_i is one-to-one. For |u_i| ≤ κ, the mapping is many-to-one.

In [33] and [36], the relationship among ∂|x_i|, x_i, and u_i is established. For x_i ≠ 0, the inverse mapping T_κ⁻¹ is one-to-one. For x_i = 0, the inverse mapping is one-to-many, i.e., T_κ⁻¹(0) is equal to a set, given by [−κ, κ]. From (10), given an x_i not equal to zero, u_i − x_i = κ sign(u_i) = κ sign(x_i). Also, from (10), given x_i = 0, u_i belongs to the set [−κ, κ]. On the other hand, from (3), given an x_i, ∂|x_i| = sign(x_i) for x_i ≠ 0, and ∂|x_i| = [−1, 1] for x_i = 0. Hence, given an x, we have

u − x = κ∂‖x‖₁.   (11)
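As a concrete sketch, the threshold mapping (10) and the internal-state dynamics that follow from (11) can be written in a few lines of Python. The function names, the forward-Euler discretization, and the step size dt are our own illustrative choices; they are not part of the original analog circuit.

```python
import numpy as np

def soft_threshold(u, kappa):
    """Threshold function T_kappa(u) in (10): outputs 0 for |u_i| <= kappa,
    and u_i - kappa*sign(u_i) otherwise."""
    return np.where(np.abs(u) <= kappa, 0.0, u - kappa * np.sign(u))

def lca_step(u, Phi, b, kappa, dt=0.01):
    """One forward-Euler step of the LCA internal-state dynamics
    du/dt = -u + x + Phi^T (b - Phi x), with output x = T_kappa(u)."""
    x = soft_threshold(u, kappa)
    return u + dt * (-u + x + Phi.T @ (b - Phi @ x))
```

Iterating lca_step drives u toward a point where u − x, which lies in κ∂‖x‖₁ by (11), balances Φ^T(b − Φx), i.e., toward a minimizer of (9).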

If κ = 1, then u − x = ∂‖x‖₁. Of course, given a known u,

(u − x) ∈ ∂‖x‖₁.   (12)

The generalized gradient of (9) is given by

∂_x L_lca = κ∂‖x‖₁ − Φ^T(b − Φx).   (13)

The LCA defines the dynamics on u, given by

du/dt = −∂_x L_lca = −κ∂‖x‖₁ + Φ^T(b − Φx).   (14)

Practically, it is impossible to implement ∂‖x‖₁, since ∂‖x‖₁ may be equal to a set. However, by introducing the internal state u, from (11), we can replace κ∂‖x‖₁ with u − x. Then, the dynamics become

du/dt = −u + x + Φ^T(b − Φx).   (15)

The advantage of introducing the internal state is that we do not need to implement ∂‖x‖₁ directly. However, the limitation of the LCA is that it is designed for solving the unconstrained optimization problem only. One may suggest that we can directly implement dx/dt = −∂_x L_lca. However, in this direct approach, we have the subgradient selection problem that will be discussed in Section V-A.

B. LPNN

The LPNN approach is able to solve a general nonlinear constrained optimization problem given by

min_x f(x), s.t. h(x) = 0   (16)

where f : Rⁿ → R is the objective function, and h : Rⁿ → Rᵐ (m < n) describes the m equality constraints. The two functions f and h are assumed to be twice differentiable. In the LPNN approach, we first set up a Lagrangian function, given by

L_ep = f(x) + λ^T h(x)   (17)

where λ = [λ₁, . . . , λₘ]^T is the Lagrange multiplier vector. There are two kinds of neurons: variable neurons and Lagrange neurons. The variable neurons hold the variable vector x. The Lagrange neurons hold the Lagrange multiplier vector λ. The dynamics of the neurons are given by

τ₀ dx/dt = −∂L_ep/∂x, and τ₀ dλ/dt = ∂L_ep/∂λ   (18)

where τ₀ is the time constant of the circuit. Without loss of generality, we consider that τ₀ is equal to 1. With (18), the network will settle down at a stable state [21] if the network satisfies some mild conditions. Although the LPNN approach is able to provide a general framework for various kinds of optimization problems, the objective function and the constraints should be differentiable.

III. BP-LPNN

A. Properties of the BP Problem

Recall that the BP problem is given by

min_x ‖x‖₁, s.t. Φx = b.   (19)

The BP problem is nondifferentiable but convex. We have the following result for the BP problem [41], [44]–[47].

Proposition 1: A point x* is an optimal solution of the BP problem if and only if there exists a λ* (Lagrange multiplier vector) such that

0 ∈ ∂‖x*‖₁ + Φ^T λ*   (20a)
Φx* = b.   (20b)

In Proposition 1, (20) summarizes the Karush–Kuhn–Tucker (KKT) conditions. Since the problem is convex, the KKT conditions are sufficient and necessary.

B. BP-LPNN Dynamics

One may suggest that we can consider the following Lagrangian function, given by L_bp = ‖x‖₁ + λ^T(b − Φx), to construct the BP-LPNN dynamics. However, the Lagrangian function L_bp is a first-order function of x, and it may create a stability problem around an equilibrium. To solve it, we introduce an augmented term (1/2)‖b − Φx‖₂² into the Lagrangian function:

L_bp = ‖x‖₁ + (1/2)‖b − Φx‖₂² + λ^T(b − Φx).   (21)

Introducing the augmented term does not affect the objective value at an equilibrium x*, because b − Φx* = 0.

The gradients of L_bp are given by

∂_x L_bp = ∂‖x‖₁ − Φ^T(b − Φx) − Φ^T λ   (22a)
∂L_bp/∂λ = b − Φx.   (22b)

Following the concept of the LCA, we introduce the internal state vector u. The relationship between u and x is given by

x_i = T₁(u_i) = 0 for |u_i| ≤ 1, and x_i = T₁(u_i) = u_i − sign(u_i) for |u_i| > 1.

From (11), (18), and (22), we obtain the BP-LPNN dynamics

du/dt = −u + x + Φ^T(b − Φx) + Φ^T λ   (23a)
dλ/dt = b − Φx.   (23b)

C. Property of the Dynamics

In the BP-LPNN, we are interested in two issues. The first one is whether an equilibrium point of (23) satisfies the KKT conditions of the BP problem. The second one is whether the equilibrium is stable or not.

The optimal solutions of the BP problem are in terms of x, while the equilibrium points of (23) involve the hidden

state u. Hence, we need Theorem 1 to explicitly show that the equilibrium points are the optimal solutions.

Theorem 1: Let {u*, λ*} be an equilibrium point of the BP-LPNN dynamics (23). At the equilibrium point, the KKT conditions (20) of the BP problem are satisfied. Since the KKT conditions of the BP problem are necessary and sufficient, the equilibrium point of (23) is equivalent to the optimal solution of the BP problem.

Proof: According to the definition of equilibrium points,

du*/dt = 0, and dλ*/dt = 0.   (24)

From (23) and (24), we have

−u* + x* + Φ^T(b − Φx*) + Φ^T λ* = 0   (25a)
b − Φx* = 0.   (25b)

Clearly, (25b) is identical to (20b). With (25b), (25a) becomes

−u* + x* + Φ^T λ* = 0.   (26)

On the other hand, from (12),

−u* + x* + Φ^T λ* ∈ −∂‖x*‖₁ + Φ^T λ*.   (27)

Hence, from (26) and (27), we obtain

0 ∈ −∂‖x*‖₁ + Φ^T λ*.   (28)

That means, (20a) is also satisfied. In a similar way, one can prove that (20) leads to (25). The proof is completed. ∎

In order to discuss the stability, we introduce the concepts of active neurons and inactive neurons in [35] and [36].

Definition 3: For the active neurons, the magnitudes of their internal state values u_i are greater than 1. The collection of indices of the active neurons is denoted by Ω = {i ∈ [1, n] : |u_i| > 1}. Also, we define Φ_Ω as the matrix composed of the columns of Φ indexed by Ω.

Definition 4: For the inactive neurons, the magnitudes of their internal state values u_i are less than or equal to 1, and the corresponding outputs x_i are equal to 0. The collection of indices of the inactive neurons is denoted by Ωᶜ = {i ∈ [1, n] : |u_i| ≤ 1}. Also, we define Φ_Ωᶜ as the matrix composed of the columns of Φ indexed by Ωᶜ.

Let {u*, x* = T₁(u*), λ*} be an equilibrium point (optimal solution). Furthermore, let Ω* and Ω*ᶜ be the active set and the inactive set of x*, respectively. We define two constants for them, given by

γ = min_{i∈Ω*} |x*_i| and α = min_{i∈Ω*ᶜ} (1 − |u*_i|).   (29)

Also, we introduce the notations

ũ = u − u*, x̃ = x − x*, and λ̃ = λ − λ*.   (30)

We define two energy functions, given by

V₁(x̃_{Ω*}(t), λ̃(t)) = ‖x̃_{Ω*}(t)‖₂² + ‖λ̃(t)‖₂²   (31)
V₂(ũ_{Ω*ᶜ}(t)) = ‖ũ_{Ω*ᶜ}(t)‖₂²   (32)

and three balls around the equilibrium point, given by

B_x = {(x̃_{Ω*}, λ̃) : ‖x̃_{Ω*}‖₂² + ‖λ̃‖₂² ≤ γ²}   (33)
B_u = {ũ_{Ω*ᶜ} : ‖ũ_{Ω*ᶜ}‖₂² ≤ α²}   (34)
B_λ = {(x̃_{Ω*}, λ̃) : ‖x̃_{Ω*}‖₂² + ‖λ̃‖₂² ≤ α²/(ς² + ω²)}   (35)

where ς = ‖Φ_{Ω*ᶜ}^T Φ_{Ω*}‖₂ and ω = ‖Φ_{Ω*ᶜ}^T‖₂.

Theorem 2 tells us about the stability of the equilibrium points.

Theorem 2: For the BP-LPNN model, when there is a small perturbation on an equilibrium point, the state converges to the optimal solution of the BP problem.

Proof: Since the proof is complicated, we give an overview first. The proof contains three parts. In Part 1, we show that if the initial state {x̃_{Ω*}(0), λ̃(0)} is inside B_x, then for t ≥ 0, V₁(t) decreases with time, as long as there are no inactive neurons switching to be active. Furthermore, as V₁(t) < V₁(0) for t > 0, we have

‖x̃_{Ω*}(t)‖₂² + ‖λ̃(t)‖₂² < γ²   (36)

for t ≥ 0. Inequality (36) implies that |x̃_i(t)| < γ, ∀i ∈ Ω*, i.e., no active neuron switches to be inactive.

In Part 2, we show that if ũ_{Ω*ᶜ}(0) is inside B_u and {x̃_{Ω*}(0), λ̃(0)} is inside B_λ, then for t > 0, there are no inactive neurons switching to be active. In the proof, we first show that for t > 0, ũ_{Ω*ᶜ}(t) remains inside B_u. When ũ_{Ω*ᶜ}(t) is inside B_u, we have

−α ≤ u_i(t) − u*_i ≤ α, ∀i ∈ Ω*ᶜ.   (37)

From (29), (37) becomes

−1 ≤ u_i(t) ≤ 1, ∀i ∈ Ω*ᶜ.   (38)

That means, there are no inactive neurons switching to be active.

Based on Parts 1 and 2, in Part 3, we prove that if the initial state is close to an equilibrium point (optimal point), then lim_{t→∞} x̃_{Ω*}(t) = 0, and λ̃(t) and ũ_{Ω*ᶜ}(t) converge.

(Proof of Part 1): The dynamics of the inactive neurons and the active neurons can be rewritten as

dũ_{Ω*ᶜ}/dt = −ũ_{Ω*ᶜ}(t) − Φ_{Ω*ᶜ}^T Φ_{Ω*} x̃_{Ω*}(t) + Φ_{Ω*ᶜ}^T λ̃(t)   (39)
dx̃_{Ω*}/dt = −ũ_{Ω*}(t) + x̃_{Ω*}(t) − Φ_{Ω*}^T Φ_{Ω*} x̃_{Ω*}(t) + Φ_{Ω*}^T λ̃(t)   (40)
dλ̃/dt = −Φ_{Ω*} x̃_{Ω*}(t).   (41)

For a point {x̃_{Ω*}(t), λ̃(t)} inside B_x,

dV₁/dt = 2 Σ_{i∈Ω*} (−x̃_i(t)ũ_i(t) + x̃_i²(t)) − 2 (x̃_{Ω*}(t))^T Φ_{Ω*}^T Φ_{Ω*} x̃_{Ω*}(t).   (42)

Notice that x̃_i and ũ_i have the same sign and |ũ_i| > |x̃_i|. Hence, we have dV₁/dt ≤ 0.

In sparse approximation, we usually use random matrices as the measurement matrix, and the number of nonzero elements in a sparse solution is much less than the number m of measurements. Let n_a be the number of active neurons in

the equilibrium point, with n_a ≤ m. If rank(Φ_{Ω*}) = n_a, then Φ_{Ω*}^T Φ_{Ω*} is positive definite and rank(Φ_{Ω*}^T Φ_{Ω*}) = n_a. From the fact [48] that, for n_a ≤ m, when the elements of Φ are independently identical Gaussian random variables or ±1 random variables, the probability that rank(Φ_{Ω*}) = n_a tends to 1. Hence, Φ_{Ω*}^T Φ_{Ω*} is positive definite and dV₁/dt < 0 for x̃_{Ω*}(t) ≠ 0. Hence, as long as there are no inactive neurons switching to be active, V₁(t) strictly decreases with time, i.e., V₁(t) < V₁(0) for t > 0.

Furthermore, we have

V₁(t) = ‖x̃_{Ω*}(t)‖₂² + ‖λ̃(t)‖₂² < γ²   (43)

for t > 0. Inequality (43) means that |x̃_i(t)| < γ, ∀i ∈ Ω*, i.e., no active neuron switches to be inactive.

(Proof of Part 2): First of all, we state an important fact. For the inactive neurons,

dV₂/dt ≤ −2‖ũ_{Ω*ᶜ}(t)‖₂² + 2‖ũ_{Ω*ᶜ}(t)‖₂ (ς‖x̃_{Ω*}(t)‖₂ + ω‖λ̃(t)‖₂).   (44)

Clearly, if ς‖x̃_{Ω*}(t)‖₂ + ω‖λ̃(t)‖₂ is less than ‖ũ_{Ω*ᶜ}(t)‖₂, then dV₂/dt < 0.

Consider that ũ_{Ω*ᶜ}(0) is inside B_u, and that {x̃_{Ω*}(0), λ̃(0)} is inside B_x and B_λ, that is,

‖x̃_{Ω*}(0)‖₂² + ‖λ̃(0)‖₂² < min(γ², α²/(ς² + ω²)).   (45)

In the following, we will show that ũ_{Ω*ᶜ}(t) stays inside B_u. From Part 1, at the beginning,

V₁(t) = ‖x̃_{Ω*}(t)‖₂² + ‖λ̃(t)‖₂²   (46)

strictly decreases with time, and it is less than min(γ², α²/(ς² + ω²)) for t > 0. For V₂(t) = ‖ũ_{Ω*ᶜ}(t)‖₂², it may decrease or increase with time. However, ũ_{Ω*ᶜ}(t) must be inside B_u. It is because, even though V₂(t) may reach α² at time t_b, we still have

V₁(t_b) = ‖x̃_{Ω*}(t_b)‖₂² + ‖λ̃(t_b)‖₂² < min(γ², α²/(ς² + ω²))   (47)

because V₁(t) decreases with time. Notice that for any positive η₁, η₂, μ₁, and μ₂,

(η₁² + η₂²)(μ₁² + μ₂²) ≥ (η₁μ₁ + η₂μ₂)².   (48)

Hence, V₁(t_b) < α²/(ς² + ω²) implies that

ς‖x̃_{Ω*}(t_b)‖₂ + ω‖λ̃(t_b)‖₂ < α.   (49)

From (44), if (49) holds, then V₂(t) starts to decrease with time from t_b. Hence, for t > 0, ũ_{Ω*ᶜ}(t) is inside B_u, i.e., there are no inactive neurons switching to be active.

(Proof of Part 3): Suppose ũ_{Ω*ᶜ}(0) is inside B_u, and {x̃_{Ω*}(0), λ̃(0)} is inside B_x and B_λ. From Parts 1 and 2, V₁(t) is strictly decreasing with time. Since V₁(t) is lower bounded, it converges to a state with dV₁(t)/dt = 0, i.e., lim_{t→∞} x̃_{Ω*}(t) = 0 and lim_{t→∞} x_{Ω*}(t) = x*_{Ω*}. Furthermore, from Part 2, no inactive nodes switch to active. Hence, lim_{t→∞} x(t) = x*.

To complete the analysis, we would like to know the behavior of the Lagrangian neurons and the hidden state of the inactive neurons [36]. From (23b) and Part 1 (V₁(t) < V₁(0)), when lim_{t→∞} x(t) = x*, we have lim_{t→∞} λ(t) = λᵒ. Define

x̄_{Ω*}(t) = x_{Ω*}(t) − x*_{Ω*}   (50)
λ̄(t) = λ(t) − λᵒ   (51)
uᵒ = −Φ_{Ω*ᶜ}^T Φ_{Ω*} x*_{Ω*} + Φ_{Ω*ᶜ}^T λᵒ + Φ_{Ω*ᶜ}^T b.   (52)

From (23a) and (50)–(52), the dynamics of the inactive neurons are

du_{Ω*ᶜ}/dt = −u_{Ω*ᶜ}(t) + uᵒ − Φ_{Ω*ᶜ}^T Φ_{Ω*} x̄_{Ω*}(t) + Φ_{Ω*ᶜ}^T λ̄(t).   (53)

The solution of the dynamics [36] is given by

u_{Ω*ᶜ}(t) = uᵒ + e^{−t}(u_{Ω*ᶜ}(0) − uᵒ) − e^{−t} ∫₀ᵗ eˢ Φ_{Ω*ᶜ}^T Φ_{Ω*} x̄_{Ω*}(s) ds + e^{−t} ∫₀ᵗ eˢ Φ_{Ω*ᶜ}^T λ̄(s) ds.   (54)

Define

ϑ(t) = e^{−t} ∫₀ᵗ eˢ Q x̄_{Ω*}(s) ds, where Q = Φ_{Ω*ᶜ}^T Φ_{Ω*}   (55)
θ(t) = e^{−t} ∫₀ᵗ eˢ Φ_{Ω*ᶜ}^T λ̄(s) ds.   (56)

Consider ‖ϑ(t)‖₂ ≤ e^{−t} ‖Q‖₂ ∫₀ᵗ eˢ ‖x̄_{Ω*}(s)‖₂ ds. Since we have lim_{t→∞} x_{Ω*}(t) = x*_{Ω*} (that is, lim_{t→∞} x̄_{Ω*}(t) = 0), given any ξ > 0, there exists a t_c ≥ 0 such that ∀t ≥ t_c, ‖x̄(t)‖₂ ≤ ξ. Let κ be the maximum of ‖x̄(t)‖₂ for t < t_c. For ∀t ≥ 2t_c, we have

‖ϑ(t)‖₂ ≤ e^{−t} ‖Q‖₂ (κ ∫₀^{t_c} eˢ ds + ξ ∫_{t_c}^{t} eˢ ds)   (57)
≤ ‖Q‖₂ (κ(e^{−t/2} − e^{−t}) + ξ).   (58)

Clearly, as t → ∞, the first term tends to 0. Besides, ξ is an arbitrarily small value. Hence, lim_{t→∞} ϑ(t) = 0. Similarly, we can prove that lim_{t→∞} θ(t) = 0. From (54), we obtain lim_{t→∞} u_{Ω*ᶜ}(t) = uᵒ.

Now, we have lim_{t→∞} x(t) = x*, lim_{t→∞} u_{Ω*}(t) = u*_{Ω*}, lim_{t→∞} λ(t) = λᵒ, and lim_{t→∞} u_{Ω*ᶜ}(t) = uᵒ. Since every equilibrium corresponds to the optimal solution, {x*, u*_{Ω*}, uᵒ, λᵒ} is an optimal solution too. If there is one optimal solution only, then {x*, u*_{Ω*}, uᵒ, λᵒ} = {x*, u*_{Ω*}, u*_{Ω*ᶜ}, λ*}. The proof is complete. ∎

IV. CBPDN-LPNN

A. Property of the CBPDN Problem

Recall that the CBPDN problem is given by

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ mσ².   (59)

Again, the problem is convex. From the well-known convex optimization result [41], [45]–[47], we have the following proposition for the CBPDN.

Proposition 2: A point x* is an optimal solution of (59) if and only if there exists a β (Lagrange multiplier) such that

0 ∈ ∂‖x*‖₁ − 2βΦ^T(b − Φx*)   (60a)
‖b − Φx*‖₂² − mσ² ≤ 0   (60b)
β ≥ 0   (60c)
β(‖b − Φx*‖₂² − mσ²) = 0.   (60d)

For the CBPDN problem, we do not need to consider ‖b‖₂² ≤ mσ². It is because, if ‖b‖₂² ≤ mσ², then ‖b − Φ0‖₂² ≤ mσ² and the constraint in (59) is satisfied; the trivial solution of x is a zero vector. Hence, we only need to consider ‖b‖₂² > mσ². Then, the KKT conditions can be simplified, and the result is stated in Theorem 3.

Theorem 3: Given that ‖b‖₂² > mσ², the optimization problem (59) becomes

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² = mσ².   (61)

Besides, x* is an optimal solution if and only if there exists a β (Lagrange multiplier) such that

0 ∈ ∂‖x*‖₁ − 2βΦ^T(b − Φx*)   (62a)
‖b − Φx*‖₂² − mσ² = 0   (62b)
β > 0.   (62c)

Proof: We first show that, given ‖b‖₂² > mσ², β in Proposition 2 must be greater than zero. From Proposition 2 [see (60)], β is either greater than zero or equal to zero. We will use contradiction to exclude the possibility of β = 0 when ‖b‖₂² > mσ². From (60a), if β = 0, then 0 ∈ ∂‖x*‖₁. That implies x* = 0.¹ It follows that ‖b‖₂² ≤ mσ² [see (60b)]. That contradicts our earlier assumption ‖b‖₂² > mσ². Hence, β must be strictly greater than zero.

As β is strictly greater than zero, from (60d), we obtain that ‖b − Φx*‖₂² − mσ² = 0. Hence, if ‖b‖₂² > mσ², then the KKT conditions of Proposition 2 (necessary and sufficient) become

0 ∈ ∂‖x*‖₁ − 2βΦ^T(b − Φx*)
‖b − Φx*‖₂² − mσ² = 0
β > 0.   (63)

Furthermore, since ‖b − Φx*‖₂² − mσ² = 0, the optimization problem (59) can be rewritten as

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² = mσ².   (64)

The proof is complete. ∎

With Theorem 3, the inequality constraint becomes an equality constraint. Hence, we can directly use the LPNN concept to solve the CBPDN problem.

B. CBPDN-LPNN Dynamics

For the CBPDN problem, the Lagrangian function is

L_bpdn = ‖x‖₁ + λ²(‖b − Φx‖₂² − mσ²).   (65)

Introducing λ² rather than β in the second term is to ensure that the convexity property holds at an equilibrium point. The gradients of L_bpdn are given by

∂_x L_bpdn = ∂‖x‖₁ − 2λ²Φ^T(b − Φx)   (66a)
∂L_bpdn/∂λ = 2λ(‖b − Φx‖₂² − mσ²).   (66b)

¹Given an x_i, ∂|x_i| either is equal to sign(x_i) or belongs to the interval [−1, 1]. This means, 0 ∈ ∂|x_i| implies x_i = 0.

Following the method used in the BP-LPNN, we obtain the CBPDN-LPNN dynamics:

du/dt = −u + x + 2λ²Φ^T(b − Φx)   (67a)
dλ/dt = 2λ(‖b − Φx‖₂² − mσ²).   (67b)

C. Properties of CBPDN-LPNN

Again, the optimal solutions of the CBPDN problem, stated in Theorem 3, are in terms of x, while the equilibrium points of (67) involve the hidden state u. Hence, we need Theorem 4 to explicitly show that the equilibrium points of the CBPDN-LPNN are the optimal solutions. Theorem 4 does not tell us that the equilibrium points are achievable. Hence, we need to investigate this issue. Theorem 5 tells us that the equilibrium points are stable, i.e., the equilibrium points are achievable.

Theorem 4: Let {u*, λ*} be an equilibrium point of (67) with u* ≠ 0. At the equilibrium point, the KKT conditions (62) of Theorem 3 are satisfied. Since the KKT conditions of Theorem 3 are necessary and sufficient, the equilibrium point of (67) is equivalent to the optimal solution of the CBPDN problem.

Proof: Denote x* as the output of u*. First of all, if {u*, λ*} is an equilibrium point, then from (67), we obtain

−u* + x* + 2λ*²Φ^T(b − Φx*) = 0   (68)
2λ*(‖b − Φx*‖₂² − mσ²) = 0.   (69)

Besides, from (11) and (12),

−u* + x* + 2λ*²Φ^T(b − Φx*) ∈ −∂‖x*‖₁ + 2λ*²Φ^T(b − Φx*).   (70)

Since −u* + x* + 2λ*²Φ^T(b − Φx*) = 0, we obtain

0 ∈ −∂‖x*‖₁ + 2λ*²Φ^T(b − Φx*).   (71)

This means, (62a) is satisfied.

Since λ*(‖b − Φx*‖₂² − mσ²) = 0, λ* is either equal to 0 or not equal to 0. We will show that λ* ≠ 0 by contradiction. Assume that λ* = 0. From (68), we obtain u* = x*. The solution for u* = x* is u* = x* = 0 (see Fig. 1). That contradicts our earlier assumption u* ≠ 0. Therefore, we have λ* ≠ 0, and then from (69), we obtain (62b).

As λ* ≠ 0, we conclude that λ*² > 0. This means, there exists a β = λ*² greater than zero. Hence, (62c) is satisfied. In addition, the KKT conditions of Theorem 3 are sufficient and necessary. Hence, any equilibrium point {u*, λ*} with u* ≠ 0 is an optimal solution of the CBPDN problem. In a similar way, we can also show that (62) leads to (68) and (69). The proof is complete. ∎

Another issue in the LPNN approach is whether the equilibrium points are stable or not. Theorem 5 summarizes the stability of the equilibrium points.

Theorem 5: Given that ‖b‖₂² > mσ² and an equilibrium point {u*, λ*} of (67) with u* ≠ 0, the equilibrium point is an asymptotically stable point.

Proof: As the proof is a bit complicated, we first give a From the definition of G [see (75)], we also have
brief introduction of the proof. Since the inactive neurons have    
0 −4τ λ 
T (b − 
x )
no effect on the active neurons and the Lagrange neurons, the G = . (78)
τ 0
dynamics can be rewritten as
At the equilibrium point (u , λ ), from (72), we have
d u
 
= −u
+ x
+ 2λ2 
T (b − 
x
) (72a)
dt −u
+ x
= −2λ2 
T b − 
x
. (79)
dλ  
= 2λ b − 
x
22 − mσ 2 (72b) For the active neurons, we have x i < u i for i ∈
. Besides,
dt λ = 0, and then, we have
d u
c  
= −u
c + 2λ2 
T c (b − 
x
). (72c) 
T b − 
x
= 0. (80)
dt
From Appendix A, the linearization of (72) around the equi- Therefore, from (77), (78), and (80), we obtain τ = 0. That
librium point is contradicts the eigenvector assumption. That means, χ = 0.
⎡ ⎤
d u
 Now, we will show that the real part of the eigenvalue is
 ⎡ ⎤ positive. Since (χ T , τ )T is an eigenvector, we have
⎢ dt ⎥ u
− u

⎢ dλ ⎥   
⎢ ⎥ χ  
⎢ ⎥ = −H ⎣ λ − λ ⎦ (73) T
Re [χ̃ τ̃ ]G = Re(ζ ) χ 22 + τ 22 . (81)
⎢ dt ⎥ τ
⎣ d u c ⎦ u
c − u
c


 From the definition of G [see (75)], we also have

dt (u ,λ )   
χ  
where T
Re [χ̃ τ̃ ]G = Re 2λ 2 χ̃ T 
T 
χ . (82)
⎡ ⎤ τ
2λ 2 

T 


−4λ 

T (b −  x )


⎢ ⎥ That means, we have
⎢ 4λ (b −  x )T  ∅⎥    
H= ⎢

0 ⎥. Re 2λ 2 χ̃ T 
T 
H = \begin{bmatrix} 2\lambda^{\star 2}\Phi_{\Gamma}^{T}\Phi_{\Gamma} & -4\lambda^{\star}\Phi_{\Gamma}^{T}(b-\Phi x^{\star}) & \emptyset \\ 4\lambda^{\star}(b-\Phi x^{\star})^{T}\Phi_{\Gamma} & 0 & \emptyset \\ 2\lambda^{\star 2}\Phi_{\Gamma^{c}}^{T}\Phi_{\Gamma} & -4\lambda^{\star}\Phi_{\Gamma^{c}}^{T}(b-\Phi x^{\star}) & I \end{bmatrix}.   (74)

From the classical optimization theory, if all the eigenvalues of H have positive real parts, then the equilibrium point is an asymptotically stable point. In the following, we show that all the eigenvalues of H have positive real parts. We first define

G = \begin{bmatrix} 2\lambda^{\star 2}\Phi_{\Gamma}^{T}\Phi_{\Gamma} & -4\lambda^{\star}\Phi_{\Gamma}^{T}(b-\Phi x^{\star}) \\ 4\lambda^{\star}(b-\Phi x^{\star})^{T}\Phi_{\Gamma} & 0 \end{bmatrix}.   (75)

Then, H can be rewritten as

H = \begin{bmatrix} G & \emptyset \\ B & I \end{bmatrix}   (76)

where B = [ 2\lambda^{\star 2}\Phi_{\Gamma^{c}}^{T}\Phi_{\Gamma} \mid -4\lambda^{\star}\Phi_{\Gamma^{c}}^{T}(b-\Phi x^{\star}) ]. The proof consists of two parts. In the first part, we show that all the eigenvalues of G have positive real parts. Based on the first part, we then show that all the eigenvalues of H have positive real parts too.

Eigenvalues of G: Clearly, 2λ⋆²Φ_Γᵀ Φ_Γ is positive semidefinite. In the proof of Theorem 2, we already discussed that the probability that rank(Φ_Γ) = n_a tends to 1 for large m. When Φ_Γᵀ Φ_Γ is positive definite, the matrix G is full rank, i.e., rank(G) = n_a + 1 (see Appendix B).

Denote ṽ as the conjugate of v. Let ζ be an eigenvalue of G, and let (χᵀ, τ)ᵀ ≠ (0ᵀ, 0)ᵀ be the corresponding eigenvector. Since (χᵀ, τ)ᵀ is an eigenvector of G, it cannot be a zero vector. Now, we show that χ ≠ 0, by contradiction. Assume that χ = 0. From the definition of an eigenvector, we have

G \begin{bmatrix} 0 \\ \tau \end{bmatrix} = \zeta \begin{bmatrix} 0 \\ \tau \end{bmatrix}.   (77)

Since Φ_Γᵀ(b − Φx⋆) ≠ 0 at the equilibrium point, (77) forces τ = 0, so the whole eigenvector would be zero, which is a contradiction. Hence χ ≠ 0. Premultiplying the eigen-equation by the conjugate transpose of the eigenvector and taking the real part (the contributions of the off-diagonal blocks of G cancel, since they are negative transposes of each other), we obtain

\tilde{\chi}^{T}\big(2\lambda^{\star 2}\Phi_{\Gamma}^{T}\Phi_{\Gamma}\big)\chi = \operatorname{Re}(\zeta)\big(\|\chi\|_{2}^{2} + |\tau|^{2}\big).   (83)

Since λ⋆²Φ_Γᵀ Φ_Γ is positive definite, the left-hand side must be greater than zero, i.e., Re(ζ) > 0. That means all the eigenvalues of G have positive real parts.

Eigenvalues of H: As G is full rank, it can be diagonalized as

G = V \Upsilon V^{-1}   (84)

where ϒ is a diagonal matrix whose diagonal elements ϒ_i are the eigenvalues of G, the column vectors of V are the right eigenvectors of G, and the row vectors of V⁻¹ are the left eigenvectors of G. As shown in the first part, the real parts of the eigenvalues of G are positive.

Consider a matrix

\Omega = \begin{bmatrix} V & \emptyset \\ \emptyset & I \end{bmatrix}, \quad \text{and} \quad \Omega^{-1} = \begin{bmatrix} V^{-1} & \emptyset \\ \emptyset & I \end{bmatrix}.   (85)

Define H̃ = Ω⁻¹HΩ. From (76), we obtain

\tilde{H} = \begin{bmatrix} \Upsilon & \emptyset \\ BV & I \end{bmatrix}.   (86)

Clearly, H̃ is a lower triangular matrix with diagonal elements

\{\Upsilon_{1}, \ldots, \Upsilon_{n_a+1}, \underbrace{1, \ldots, 1}_{(n-n_a)\ \text{1's}}\}   (87)

which are its eigenvalues. Besides, their real parts are greater than zero (from the first part). From Appendix C, H and H̃ have the same set of eigenvalues. This means that the real parts of the eigenvalues of H are positive. Hence, the equilibrium point {u⋆, λ⋆} is an asymptotically stable point. The proof is complete. ∎

Note that the global convergence properties of the CBPDN-LPNN model are not known yet. However, this does not limit its application or performance. The experimental results in Section V show that the performance of the CBPDN-LPNN model is identical to that of the two numerical methods.
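The two linear-algebra facts used above — that the eigenvalues of the block lower-triangular matrix in (76) are those of G together with (n − n_a) ones, and that a similarity transform preserves the spectrum (Appendix C) — can be sanity-checked numerically. The sketch below is illustrative only; the sizes and the random G are arbitrary choices, not the paper's matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
na, nc = 4, 3                      # size of the G block and of the identity block

G = rng.standard_normal((na, na))  # stand-in for G in (75); eigenvalues arbitrary
B = rng.standard_normal((nc, na))  # stand-in for the coupling block B

# H = [[G, 0], [B, I]], the block form in (76).
H = np.block([[G, np.zeros((na, nc))],
              [B, np.eye(nc)]])

eigH = np.sort_complex(np.linalg.eigvals(H))
expected = np.sort_complex(np.concatenate([np.linalg.eigvals(G),
                                           np.ones(nc)]))
assert np.allclose(eigH, expected)   # eig(H) = eig(G) together with nc ones

# A similarity transform keeps the spectrum (Appendix C).
Omega = rng.standard_normal((na + nc, na + nc)) + 5.0 * np.eye(na + nc)
H_tilde = np.linalg.inv(Omega) @ H @ Omega
assert np.allclose(np.sort_complex(np.linalg.eigvals(H_tilde)), eigH, atol=1e-8)
```

Here G is a generic random matrix, so its eigenvalues need not have positive real parts; the check only concerns how the spectrum of H is assembled from that of G.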
Fig. 2. Simulation results for the BP-problem, where n = 512 and n = 4096. (a) and (d)–(f) MSE performances among the three methods. The experiments
are repeated 100 times using different random matrices. (b), (c), and (g)–(i) Some dynamics examples for the BP-LPNN model.
V. SIMULATIONS

A. Setting

We use standard configurations [31], [49] to test the two proposed models. We consider two signal lengths, n = 512 and n = 4096. For n = 512, 15 data points have nonzero values (±5). For n = 4096, the numbers of nonzero values (±5) are {75, 100, 125}. The measurement matrix Φ is a ±1 random matrix, further normalized by the signal length. We repeat each experiment 100 times with different random matrices, initial states, and sparse signals.

B. BP-LPNN

We compare our analog method with two digital numerical approaches: the primal-dual interior point method from the L1Magic package [31] and the SPG method from the SPGL1 package [32]. We would like to investigate whether the analog BP-LPNN method produces the same MSE performance as the two digital numerical approaches.

Fig. 2(a) and (d)–(f) shows the MSE performances. When the number of measurements reaches a threshold, the reconstruction errors become very small. This phenomenon agrees with the well-known property of sparse approximation. The MSE performance of the BP-LPNN is quite similar to that of the two traditional digital methods. All three methods have a similar threshold. There is no significant difference among the BP-LPNN, the L1Magic package, and the SPGL1 package.

For n = 512, when the number of measurements is less than or equal to 70, the reconstruction errors are greater than 0.2. When 95 or more measurements are used, the reconstruction errors of the three models are much less than 0.001, as shown in the zoomed-in subfigure in Fig. 2(a). For n = 4096 with 75 nonzero elements, around 450 measurements are required. When 465 or more measurements are used, the reconstruction errors of the three methods are much smaller than 0.001, as shown in the zoomed-in subfigure in Fig. 2(d). Note that when the number of measurements is greater than the threshold, there are some small differences in the reconstruction errors among the three methods. This is because the two numerical methods have some tuning parameters that affect the accuracy of the solution.

Fig. 2(b), (c), and (g)–(i) shows the output x_i(t) values of the active set of the equilibrium point under different settings, where i ∈ Γ. We would like to see when the output x_i(t) values settle down. Since the nonzero elements of the original signal are equal to ±5, the outputs converge to values close to ±5. For n = 512 and the selected settings, the outputs settle down within around 50–150 characteristic time units; after this amount of time, there are no big changes in the outputs. For n = 4096 and the selected settings, there are no big changes in the outputs after 150 characteristic time units.
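As a rough digital baseline for the experiments above, the BP problem min ‖x‖₁ s.t. Φx = b can be recast as a linear program and handed to a generic LP solver. The sketch below mimics the setting of this section at a much smaller scale; the sizes, seed, and use of SciPy's linprog (instead of the L1Magic or SPGL1 packages) are illustrative choices, not the paper's setup:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 60, 30, 3                      # signal length, measurements, sparsity

# +/-1 random measurement matrix, normalized (scaling is an assumption here).
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)

x0 = np.zeros(n)                         # ground-truth sparse signal (+/-5 spikes)
x0[rng.choice(n, size=k, replace=False)] = rng.choice([-5.0, 5.0], size=k)
b = Phi @ x0

# BP as an LP: x = p - q with p, q >= 0; minimize sum(p + q) s.t. Phi(p - q) = b.
c = np.ones(2 * n)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]

mse = np.mean((x_hat - x0) ** 2)         # expected to be tiny when m is large enough
```

Because x0 is feasible for this LP, optimality alone guarantees ‖x̂‖₁ ≤ ‖x0‖₁; exact recovery of x0 additionally requires enough measurements, which is the thresholding behavior seen in Fig. 2.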
Fig. 3. Simulation results for the CBPDN-problem, where n = 512. First row: MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. Second row: some dynamics examples.
Fig. 4. Simulation results for the CBPDN-problem, where n = 4096. First–third rows: MSE performances among the three methods. The experiments are
repeated 100 times using different random matrices. Fourth row: some dynamics examples.
C. CBPDN-LPNN

Two digital numerical approaches, the log barrier method from the L1Magic package and the SPG method from the SPGL1 package, are used for comparison. We expect the three methods to have similar performance. Three noise levels are considered: σ² = {−26 dB, −32 dB, −46 dB}. Figs. 3 and 4 show their performances.

From Figs. 3 and 4, when the number of measurements reaches a threshold, the reconstruction errors drop to a very small value. The threshold mainly depends on the number of nonzero elements and is not sensitive to the noise level. This phenomenon agrees with the well-known property of sparse approximation. The MSE performance of the CBPDN-LPNN is quite similar to that of the two traditional digital methods. All three methods have a similar threshold.

From Fig. 3, for n = 512, around 95 measurements are required for all noise levels. The noise level affects only the reconstruction quality and does not much affect the number of required measurements. For instance, with a noise level of −26 dB, when 100 or more measurements are used, the MSEs of the three methods are ∼0.005–0.015. With a smaller noise level σ² = −46 dB, when 100 or more measurements are used, the MSEs are much smaller than 0.001.

For n = 4096, when there are 75 nonzero elements, from the first row of Fig. 4, around 450 measurements are required, regardless of the noise level. When we increase the number of nonzero elements to 125, we should use 650 measurements, as shown in the third row of Fig. 4. Again, the noise level affects only the reconstruction quality. For instance, with 75 nonzero elements and a noise level of σ² = −26 dB, when 600 or more measurements are used, the MSEs of the three methods are ∼0.005–0.015. With a smaller noise level σ² = −46 dB, when 600 or more measurements are used, the MSEs of the three methods are much smaller than 0.0005. There are small MSE differences among the three methods because the two numerical methods have some tuning parameters that affect the accuracy of the solution.

The convergent behavior of the CBPDN-LPNN model is shown in the second row of Fig. 3 and the fourth row of Fig. 4: the output x_i(t) values of the active set of the equilibrium point under different settings, where i ∈ Γ. For n = 512, the outputs of the active set settle down within around 7–15 characteristic time units. For n = 4096, the outputs of the active set settle down within 10–60 characteristic time units.

Fig. 5. Some recovered signals from nonsparse signals. Note that there are no obvious visual differences among the three reconstruction methods.

TABLE I. MSE OF THE RECOVERY SIGNALS FROM NONSPARSE SIGNALS. THE EXPERIMENTS ARE REPEATED 100 TIMES USING DIFFERENT RANDOM MATRICES.

D. Recovery for Nonsparse Signal

Sparse approximation can be extended to handle nonsparse signals [50]. Let z be a nonsparse signal, and let Q be an invertible transform. By setting x = Qz, we can use the LPNN to recover nonsparse signals. The measurement signal is given by b = ΦQz. The problem becomes

\min_{z} \|Qz\|_{1}, \quad \text{s.t.}\ \|b - \Phi Qz\|_{2}^{2} \le \epsilon   (88)

where ε > 0 is the residual. The 1-D signal that we consider is extracted from one row of the image CameraMan, and the transform used is the Haar transform. We vary the number of measurements. The MSE values of the reconstructed signals are summarized in Table I, and some reconstructed signals are shown in Fig. 5. From Table I, when more measurements are used, we obtain a better reconstruction. Again, the performance of the CBPDN-LPNN is very similar to that of the L1Magic and SPGL1 packages. Also, there are no obvious visual differences among the reconstructed signals from the three methods, as shown in Fig. 5.

VI. DISCUSSION

A. Other Recurrent Neural Network Models

There are some recurrent neural network models [37]–[40] for nonsmooth convex problems. However, they may not be suitable for sparse approximation. Since their dynamics are directly related to the subdifferential, there is no simple way to select a proper subgradient from the subdifferential such that the optimal solution is an equilibrium point. We use the algorithm in [40] to illustrate that these methods are not suitable for sparse approximation.

Let us consider a simplified version of [40], given by

\min_{x} f(x), \quad \text{s.t.}\ h(x) \ge 0.   (89)

According to (8) in [40], the dynamics are given by

\frac{dx}{dt} \in -2(x - \tilde{x}) \quad \text{and} \quad \frac{d\lambda}{dt} = -(\lambda - \tilde{\lambda})   (90)

where x̃ = (x − ∂f(x) + λ∂h(x)) and λ̃ = max(0, λ − h(x)). Recall that in our case, the problem can be rewritten as

\min_{x} \|x\|_{1}, \quad \text{s.t.}\ m\sigma^{2} - \|b - \Phi x\|_{2}^{2} \ge 0.   (91)

Let us consider the following two cases.

Case 1: Φ = [1, 5/8], b = 5.5, σ² = 0.25 with optimal solution x⋆ = [5, 0]ᵀ.

Case 2: Φ = [−1, −5/8], b = 5.5, σ² = 0.25 with optimal solution x⋆ = [−5, 0]ᵀ.

The contour plot of these two cases is shown in Fig. 6.
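The two cases above can be probed numerically: the sketch below checks that x⋆ is feasible for (91) (it sits on the constraint boundary, with residual 0.5) and that no nearby feasible point achieves a smaller ℓ1 cost. This is a small brute-force probe under the stated Φ, b, and σ², not a proof:

```python
import numpy as np

def feasible(Phi, b, sigma2, x, m=1):
    """Constraint of (91): m*sigma^2 - ||b - Phi x||_2^2 >= 0 (tiny slack for rounding)."""
    r = b - Phi @ x
    return m * sigma2 - float(np.dot(r, r)) >= -1e-12

cases = [(np.array([1.0, 5 / 8]), np.array([5.0, 0.0])),     # Case 1
         (np.array([-1.0, -5 / 8]), np.array([-5.0, 0.0]))]  # Case 2
b, sigma2 = 5.5, 0.25

for Phi, x_opt in cases:
    assert feasible(Phi, b, sigma2, x_opt)      # boundary point: residual is 0.5
    best = np.sum(np.abs(x_opt))                # l1 cost of the claimed optimum (= 5)
    # Brute-force probe: no nearby feasible point should have a smaller l1 cost.
    for d1 in np.linspace(-0.2, 0.2, 41):
        for d2 in np.linspace(-0.2, 0.2, 41):
            x = x_opt + np.array([d1, d2])
            if feasible(Phi, b, sigma2, x):
                assert np.sum(np.abs(x)) >= best - 1e-9
```

The grid probe only checks local optimality around x⋆; the global claim follows from the geometry of the ℓ1 ball against the band-shaped feasible region in Fig. 6.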

points of the network correspond to the optimal solution


of the BP-problem, and that the equilibrium points are
stable. For the CBPDN-LPNN model, the equilibrium points
of the network correspond to the optimal solution of the
CBPDN-problem. Besides, the CBPDN-LPNN model is
locally stable. The experimental results showed that in term
of MSE performance, there are no significant differences
between our analog approach and the traditional numerical
methods. Besides, we briefly discuss the way to use the
Fig. 6. Contour of x1 and the feasible regions in the two situations. LPNN approach to solve the LASSO problem.
From the approach of [40], the dynamics are Although we only prove that the CBPDN-LPNN model
is locally stable, it does not mean that the CBPDN-LPNN
dx
∈ −2(∂|x| − 2λT (b − x)) (92a) model is not global stable. The global convergent property
dt is not known yet. Hence, for completeness, one of the future

= −(λ − λ̃) (92b) directions is to theoretically investigate the global convergence
dt of the CBPDN-LPNN model. Another important extension is
where λ̃ = (λ − mσ 2 + |b − x|22 )+ . According to (92a), to investigate the nonconvex problems, which involves the l p -
at the optimal solution x , we should choose a suitable norm with p < 1 in the objective function and constraints, in
subgradient η from the differential ∂x 1 , such that sparse approximation.
A PPENDIX A
η = 2λ T (b − x ).
L INEARIZATION OF THE DYNAMICS
Otherwise, the optimal solution is not stable. In Case 1, Consider a vector valued function F : n → m and
the optimal point x is at [5, 0]T . We should choose η = F( y) = [F1 ( y), . . . , Fm ( y)]T , and a system, given by
[1, 5/8]T , i.e., we set ∂|x 2 | to 5/8 when x 2 = 0. On the other dy
hand, in Case 2, the optimal point x is at [−5, 0]T . We should ẏ = = F( y). (96)
dt
choose η = [−1, −5/8]T , i.e., we set ∂|x 2 | to −5/8 when
The linearized system at y is given by
x 2 = 0. Clearly, the subgradient should be chosen carefully.
dy
Otherwise, an optimal point may not be stable. The above ≈ F( y ) + J ( y )( y − y ) (97)
example means that when the optimal solution contains some dt
zero elements, we have the subgradient selection problem. where J( y ) is the Jacobian matrix, given by
Similarly, when an optimal point contains multiple zero ⎡ ∂F
1 ∂ F1 ⎤
elements, we cannot use a fixed subgradient. Besides, an ··· 
⎢ ∂y1 ∂yn ⎥
equilibrium point with a particular subgradient may become ⎢ . .. .. ⎥

J( y ) = ⎢ ⎢ .. . . ⎥ . (98)
unstable when we select another subgradient. In contrast, our ⎣ ∂F ⎦ 
m ∂ Fm 
method does not have this selection problem. ··· 
∂y1 ∂yn y= y
B. Least Absolute Shrinkage and Selection Operator Problem
If y is an equilibrium point, then F( y ) is equal to a zero
This section briefly discusses the way to use the LPNN
vector and the linearized system becomes
approach for the least absolute shrinkage and selection oper-
dy
ator (LASSO) problem [51]–[53], given by ≈ J ( y )( y − y ). (99)
dt
min b − x22 , s.t. x1 ≤ ψ (93) For our case, the Jacobian matrix at an equilibrium point is
x
⎡ ⎤
where ψ > 0. Under some conditions, the LASSO problem d u̇
d u̇
d u̇

⎢ ⎥
becomes ⎢ d u
c dλ d u
c ⎥
⎢ ⎥
min b − x22 , s.t. x1 − ψ = 0. ⎢ ⎥
(94) ⎢ d λ̇ d λ̇ d λ̇ ⎥
x ⎢
J (u , λ ) = ⎢ ⎥ . (100)
d u dλ ∂ u ⎥
Although the constraint in the LASSO problem is not differen- ⎢

⎥
c
⎢ ⎥
tiable, we can use the LPNN framework to solve the problem. ⎢ d u̇
c d u̇
c d u̇
c ⎥
⎣ ⎦
Its dynamics are given by d u
dλ d u
c 

(u ,λ )
du dλ
= T (b−)x − λ(u − x), and = x1 − ψ. (95) Define H = − J(u , λ ). After evaluating the elements in
dt dt
J(u , λ ), we obtain
VII. C ONCLUSION ⎡ ⎤
2λ 2 

T 


−4λ 

T (b −  x ) ∅


This paper proposed two LPNN models for solving opti- ⎢ ⎥
⎢ ∅⎥
mization problems in sparse approximation. The BP-LPNN H = ⎢ 4λ (b − 
x )T 
0 ⎥.
⎣ ⎦
model is designed for the BP problem, while the 2λ 

T 
c
s t ar
T
−4λ 
c (b − 
x
c ) I
CBPDN-LPNN is designed for the CBPDN problem.

For the BP-LPNN model, we showed that the equilibrium (101)
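The linearization test of Appendix A can be illustrated on a toy system with a finite-difference Jacobian. The dynamics below are a made-up example (not the LPNN dynamics): the origin is an equilibrium point, and H = −J there has eigenvalues with positive real parts, so the stability criterion based on (99) is met:

```python
import numpy as np

def F(y):
    """Toy dynamics dy/dt = F(y) with an equilibrium at y = 0 (made-up example)."""
    return np.array([-y[0] + y[1] ** 2,
                     -2.0 * y[1] + y[0] * y[1]])

def jacobian(f, y, eps=1e-6):
    """Central-difference approximation of the Jacobian matrix J(y) in (98)."""
    n = len(y)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(y + e) - f(y - e)) / (2 * eps)
    return J

y_star = np.zeros(2)
assert np.allclose(F(y_star), 0)              # (96): y* is an equilibrium point

J = jacobian(F, y_star)                       # analytically, J = diag(-1, -2) here
H = -J
assert np.all(np.linalg.eigvals(H).real > 0)  # eigenvalue test used in the proofs
```

For the actual LPNN models, the same finite-difference check could be applied to the network dynamics around a computed equilibrium, instead of deriving (100)–(101) by hand.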
In the derivation of the Jacobian matrix, we use the facts that for active neurons (dx_i/du_i) = 1, and that for nonactive neurons (dx_i/du_i) = 0. Also, at an equilibrium point, ‖b − Φx⋆‖₂² − mσ² = 0.

APPENDIX B
PROOF OF THE RANK OF G EQUAL TO n_a + 1

Recall that

G = \begin{bmatrix} 2\lambda^{\star 2}\Phi_{\Gamma}^{T}\Phi_{\Gamma} & -4\lambda^{\star}\Phi_{\Gamma}^{T}(b-\Phi x^{\star}) \\ 4\lambda^{\star}(b-\Phi x^{\star})^{T}\Phi_{\Gamma} & 0 \end{bmatrix}.   (102)

Since −u⋆_Γ + x⋆_Γ ≠ 0, we have −4λ⋆Φ_Γᵀ(b − Φx⋆) ≠ 0. Without loss of generality, we consider

G = \begin{bmatrix} A & \upsilon \\ -\upsilon^{T} & 0 \end{bmatrix}   (103)

where A is an n_a × n_a symmetric positive definite matrix and υ is a nonzero column vector. Since A is invertible, we have

\begin{bmatrix} A & \upsilon \\ -\upsilon^{T} & 0 \end{bmatrix} \begin{bmatrix} I & -A^{-1}\upsilon \\ \emptyset & 1 \end{bmatrix} = \begin{bmatrix} A & \emptyset \\ -\upsilon^{T} & \upsilon^{T}A^{-1}\upsilon \end{bmatrix}.   (104)

Taking the determinant of both sides, we obtain

\begin{vmatrix} A & \upsilon \\ -\upsilon^{T} & 0 \end{vmatrix} = |A| \cdot |\upsilon^{T}A^{-1}\upsilon|.   (105)

Since A⁻¹ is positive definite, |υᵀA⁻¹υ| is nonzero. This means that the determinant of G is nonzero, and hence the rank of G is equal to n_a + 1. The proof is complete. ∎

APPENDIX C
PROPERTY OF EIGENVALUES OF MATRICES

Given a full-rank square matrix H (symmetric or nonsymmetric), it can be diagonalized as H = UΛU⁻¹, where Λ is a diagonal matrix whose diagonal elements Λ_i are the eigenvalues of H, the column vectors of U are the right eigenvectors of H, and the row vectors of U⁻¹ are the left eigenvectors of H. Consider H̃ = ΩHΩ⁻¹, where Ω is an invertible matrix. The matrix H̃ can be diagonalized too, given by H̃ = ΩUΛU⁻¹Ω⁻¹ = ΩUΛ(ΩU)⁻¹ = ŨΛŨ⁻¹.

Considering the normalization of the column vectors of Ũ, we obtain Ũ = ÜΞ, where Ξ is a diagonal matrix whose elements are greater than zero, and the length of the column vectors of Ü is equal to one. With the normalization

\tilde{H} = \tilde{U}\Lambda\tilde{U}^{-1} = \ddot{U}\Xi\Lambda\Xi^{-1}\ddot{U}^{-1} = \ddot{U}\Lambda\ddot{U}^{-1}.   (106)

Hence, H and H̃ have the same set of eigenvalues.

REFERENCES

[1] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. London, U.K.: Wiley, 1993.
[2] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554–2558, Jan. 1982.
[3] L. O. Chua and G.-N. Lin, "Nonlinear programming without computation," IEEE Trans. Circuits Syst., vol. 31, no. 2, pp. 182–188, Feb. 1984.
[4] Y. Xia, G. Feng, and J. Wang, "A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints," IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1340–1353, Aug. 2008.
[5] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 558–570, Apr. 2008.
[6] S. Bharitkar, K. Tsuchiya, and Y. Takefuji, "Microcode optimization with neural networks," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 698–703, May 1999.
[7] D. Tank and J. Hopfield, "Simple 'neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986.
[8] A. Bouzerdoum and T. R. Pattison, "Neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 293–304, Mar. 1993.
[9] X.-B. Liang, "A complete proof of global exponential convergence of a neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 636–639, May 2001.
[10] M. Fukushima, "Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems," Math. Program., vol. 53, no. 1, pp. 99–110, Jan. 1992.
[11] T. L. Friesz, D. H. Bernstein, N. J. Mehta, R. L. Tobin, and S. Ganjalizadeh, "Day-to-day dynamic network disequilibria and idealized traveler information systems," Oper. Res., vol. 42, no. 6, pp. 1120–1136, Jun. 1994.
[12] B. He and H. Yang, "A neural network model for monotone linear asymmetric variational inequalities," IEEE Trans. Neural Netw., vol. 11, no. 1, pp. 3–16, Jan. 2000.
[13] X. Hu and J. Wang, "Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1487–1499, Nov. 2006.
[14] X. B. Gao, "Exponential stability of globally projected dynamic systems," IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 426–431, Mar. 2003.
[15] Y. Xia, "An extended projection neural network for constrained optimization," Neural Comput., vol. 16, no. 4, pp. 863–883, Apr. 2004.
[16] X. Hu and J. Wang, "A recurrent neural network for solving a class of general variational inequalities," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 3, pp. 528–539, Jun. 2007.
[17] S. Zhang, Y. Xia, and J. Wang, "A complex-valued projection neural network for constrained optimization of real functions in complex variables," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3227–3238, Dec. 2015.
[18] Y. Xia, "A compact cooperative recurrent neural network for computing general constrained L1 norm estimators," IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3693–3697, Sep. 2009.
[19] Y. Xia, C. Sun, and W. X. Zheng, "Discrete-time neural network for fast solving large linear L1 estimation problems and its application to image restoration," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 812–820, May 2012.
[20] Y. Xia and J. Wang, "Low-dimensional recurrent neural network-based Kalman filter for speech enhancement," Neural Netw., vol. 67, pp. 131–139, Jul. 2015.
[21] S. Zhang and A. G. Constantinides, "Lagrange programming neural networks," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.
[22] X. Zhu, S.-W. Zhang, and A. G. Constantinides, "Lagrange neural networks for linear programming," J. Parallel Distrib. Comput., vol. 14, no. 3, pp. 354–360, Mar. 1992.
[23] V. Sharma, R. Jha, and R. Naresh, "An augmented Lagrange programming optimization neural network for short-term hydroelectric generation scheduling," Eng. Optim., vol. 37, pp. 479–497, Jul. 2005.
[24] J. Liang, H. C. So, C. S. Leung, J. Li, and A. Farina, "Waveform design with unit modulus and spectral shape constraints via Lagrange programming neural network," IEEE J. Sel. Topics Signal Process., vol. 9, no. 8, pp. 1377–1386, Dec. 2015.
[25] J. Liang, C. S. Leung, and H. C. So, "Lagrange programming neural network approach for target localization in distributed MIMO radar," IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1574–1585, Mar. 2016.
[26] Y. Xia, "Global convergence analysis of Lagrangian networks," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 6, pp. 818–822, Jun. 2003.
[27] X. Lou and J. A. K. Suykens, "Stability of coupled local minimizers within the Lagrange programming network framework," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 377–388, Feb. 2013.
[28] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 2001.
[29] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Proc. Nat. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003.
[30] E. van den Berg and M. P. Friedlander, "Sparse optimization with least-squares constraints," SIAM J. Optim., vol. 21, no. 4, pp. 1201–1229, 2011.
[31] E. Candès and J. Romberg. (Oct. 2005). ℓ1-MAGIC: Recovery of Sparse Signals via Convex Programming. [Online]. Available: http://users.ece.gatech.edu/justin/l1magic/downloads/l1magic.pdf
[32] E. van den Berg and M. P. Friedlander. (Jun. 2007). SPGL1: A Solver for Large-Scale Sparse Reconstruction. [Online]. Available: http://www.cs.ubc.ca/labs/scl/spgl1
[33] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, "Sparse coding via thresholding and local competition in neural circuits," Neural Comput., vol. 20, no. 10, pp. 2526–2563, Oct. 2008.
[34] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, Jan. 1998.
[35] A. Balavoine, C. J. Rozell, and J. Romberg, "Global convergence of the locally competitive algorithm," in Proc. IEEE Digit. Signal Process. Workshop, IEEE Signal Process. Edu. Workshop (DSP/SPE), Sedona, AZ, USA, Jan. 2011, pp. 431–436.
[36] A. Balavoine, J. Romberg, and C. J. Rozell, "Convergence and rate analysis of neural networks for sparse approximation," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 9, pp. 1377–1389, Sep. 2012.
[37] Q. Liu and J. Wang, "A one-layer recurrent neural network for constrained nonsmooth optimization," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1323–1333, Oct. 2011.
[38] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network for nonsmooth nonlinear programming problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004.
[39] L. Cheng, Z. G. Hou, M. Tan, X. Wang, Z. Zhao, and S. Hu, "A recurrent neural network for non-smooth nonlinear programming problems," in Proc. IEEE IJCNN, Aug. 2007, pp. 596–601.
[40] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W. C. Zhang, and F.-X. Wu, "Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks," IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 714–726, May 2011.
[41] G. Gordon and R. Tibshirani, "Karush–Kuhn–Tucker conditions," in Proc. Optim. Fall Lecture Notes, 2012, pp. 1–26. [Online]. Available: https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf
[42] B. Guenin, J. Könemann, and L. Tunçel, A Gentle Introduction to Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[43] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[44] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004.
[45] J. J. Fuchs, "Convergence of a sparse representations algorithm applicable to real or complex data," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 598–605, Dec. 2007.
[46] J. Dutta and C. S. Lalitha, "Optimality conditions in convex optimization revisited," Optim. Lett., vol. 7, no. 2, pp. 221–229, 2013.
[47] A. Dhara and J. Dutta, Optimality Conditions in Convex Optimization. New York, NY, USA: Taylor & Francis, 2011.
[48] X. Feng and Z. Zhang, "The rank of a random matrix," Appl. Math. Comput., vol. 185, no. 1, pp. 689–694, Jan. 2007.
[49] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[50] Y. Tsaig and D. L. Donoho, "Extensions of compressed sensing," Signal Process., vol. 86, no. 3, pp. 549–571, Mar. 2006.
[51] H. Zhang, W. Yin, and L. Cheng, "Necessary and sufficient conditions of solution uniqueness in 1-norm minimization," J. Optim. Theory Appl., vol. 164, no. 1, pp. 109–122, 2015.
[52] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Rev., vol. 51, no. 1, pp. 34–81, Feb. 2009.
[53] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.

Ruibin Feng is currently pursuing the Ph.D. degree with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His current research interests include neural networks and machine learning.

Chi-Sing Leung (M'05–SM'15) received the Ph.D. degree in computer science from the Chinese University of Hong Kong, Hong Kong, in 1995. He is currently a Professor with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He has authored over 120 journal papers in the areas of digital signal processing, neural networks, and computer graphics. His current research interests include neural computing and computer graphics.
Dr. Leung was a member of the Organizing Committee of ICONIP2006. He received the 2005 IEEE Transactions on Multimedia Prize Paper Award for his paper titled "The Plenoptic Illumination Function" in 2005. He was the Program Chair of ICONIP2009 and ICONIP2012. He is/was the Guest Editor of several journals, including Neural Computing and Applications, Neurocomputing, and Neural Processing Letters. He is a Governing Board Member of the Asian Pacific Neural Network Assembly (APNNA) and the Vice President of APNNA.

Anthony G. Constantinides (S'68–M'74–SM'78–F'98) is currently the Professor of Communications and Signal Processing with Imperial College London, London, U.K. He has been actively involved in research in various aspects of digital signal processing for more than 45 years. He has authored several books and over 400 articles in digital signal processing.
Prof. Constantinides is a fellow of the Royal Academy of Engineering, the Institute of Electrical and Electronics Engineers, USA, and the Institution of Electrical Engineers, U.K. He has served as the First President of the European Association for Signal Processing and has contributed in this capacity to the establishment of the European Journal for Signal Processing. He received the Medal of the Association, Palmes Academiques, in 1986, and the Medal of the University of Tienjin, Shanghai, China, in 1981. He received honorary doctorates from European and Far Eastern universities; among these, he values highly the honorary doctorate from the National Technical University of Athens, Athens, Greece. He organized the first international series of meetings on Digital Signal Processing, in London initially in 1967, and in Florence with Prof. V. Cappellini at the University of Florence, Florence, Italy, since 1972. In 1985, he was decorated by the French government with the Honour of Chevalier, Palmes Academiques, and in 1996 with the elevation to Officer, Palmes Academiques. His life work has been recorded in a series of audio and video interviews for the IEEE (USA) Archives as a Pioneer of Signal Processing. He has acted as an Advisor to many organizations and governments on modern technology and development. He has served on Professorial Selection Committees around the world (15 during the last five years) and on EU University Appraising Panels, and as a member of IEE/IEEE Awards Committees and the Chair (or Co-Chair) of international conferences.

Wen-Jun Zeng (S'10–M'11) received the M.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 2008. He was a Research Assistant with Tsinghua University from 2006 to 2009. From 2009 to 2011, he was a Faculty Member with the Department of Communication Engineering, Xiamen University, Xiamen, China. He is currently a Senior Research Associate with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His current research interests include mathematical signal processing, including convex optimization, array processing, sparse approximation, and inverse problems, with applications to wireless radio and underwater acoustic communications.