Abstract— The major limitation of the Lagrange programming neural network (LPNN) approach is that the objective function and the constraints should be twice differentiable. Since sparse approximation involves nondifferentiable functions, the original LPNN approach is not suitable for recovering sparse signals. This paper proposes a new formulation of the LPNN approach based on the concept of the locally competitive algorithm (LCA). Unlike the classical LCA approach, which is able to solve unconstrained optimization problems only, the proposed LPNN approach is able to solve constrained optimization problems. Two problems in sparse approximation are considered. They are basis pursuit (BP) and constrained BP denoise (CBPDN). We propose two LPNN models, namely, BP-LPNN and CBPDN-LPNN, to solve these two problems. For these two models, we show that the equilibrium points of the models are the optimal solutions of the two problems, and that the optimal solutions of the two problems are the equilibrium points of the two models. Besides, the equilibrium points are stable. Simulations are carried out to verify the effectiveness of these two LPNN models.

Index Terms— Lagrange programming neural networks (LPNNs), locally competitive algorithm (LCA), optimization.

I. INTRODUCTION

USING analog neural networks to solve nonlinear constrained optimization problems has been studied for many decades [1]–[5]. The analog neural network approach is more effective when real-time solutions are required [1]–[3], [6]. The use of neural networks for optimization can be dated back at least to the 1980s [2], [7]. In [7], the Hopfield model was demonstrated to have the ability to solve several optimization problems. In [3], a canonical nonlinear programming circuit was proposed to solve nonlinear programming problems with inequality constraints. A recurrent neural network model [8] was proposed for quadratic optimization with bound constraints, and its convergence proof was given in [9].

Based on the concept of variational inequalities [10]–[12], a number of projection neural network models [13]–[16] for constrained optimization problems were proposed, in which a projection circuit is required. For simple constraints, such as a box set, the projection circuit is very simple. However, when complicated constraints are considered, the projection circuit is difficult to implement. Recently, a projection neural network model [17] was proposed for handling complex variables. In [18]–[20], the concept of projection neural networks was extended to handle l1-norm problems. When we use these models (in which the objective function contains an l1-norm term), the number of neurons is doubled. Many existing models are designed for solving a particular form of optimization problem. For example, in [5], the model was designed for the quadratic programming problem with equality constraints.

The Lagrange programming neural network (LPNN) approach [21]–[25] provides a general framework for solving various nonlinear constrained optimization problems. Furthermore, with the augmented term concept, the LPNN approach is able to solve nonconvex optimization problems. Although the LPNN approach was developed in the early 1990s, the formal proof [26], [27] of its global convergence for convex problems was given in the early 2000s. Recently, some new applications of the LPNN approach [24], [25], including target localization in multiple-input multiple-output (MIMO) radar and waveform design in radar systems, were reported. In these signal processing applications, the optimization problems are nonconvex, and the LPNN approach is superior to the traditional numerical approaches. However, the major limitation of LPNN is that it cannot handle nondifferentiable objective functions and constraints.

Sparse approximation [28], [29] aims at recovering an unknown sparse signal. Its objective function or its constraint is usually not differentiable. There are many digital (numerical) algorithms for sparse approximation. For example, log-barrier and spectral projected gradient (SPG) [30] are two representative methods in some sparse approximation packages [31], [32].

In [33], the locally competitive algorithm (LCA) was proposed for solving the unconstrained basis pursuit denoise problem [34]. Unlike the conventional models, the analysis of the LCA is difficult because it uses a nonsmooth, unbounded, and not strictly increasing activation function. The LCA

Manuscript received November 27, 2015; revised May 28, 2015, January 12, 2016, and April 9, 2016; accepted May 19, 2016. Date of publication November 27, 2015; date of current version September 15, 2017. This work was supported by the Research Grants Council, Hong Kong, under Grant CityU 115612.
R. Feng, C.-S. Leung, and W.-J. Zeng are with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong (e-mail: rfeng4-c@my.cityu.edu.hk; eeleungc@cityu.edu.hk; wenjzeng@gmail.com).
A. G. Constantinides is with Imperial College London, London SW7 2AZ, U.K. (e-mail: a.constantinides@imperial.ac.uk).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2016.2575860
2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
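The nondifferentiability at stake here is concrete: the l1 norm has subdifferential ∂|x_i| = {sign(x_i)} for x_i ≠ 0 and [−1, 1] at x_i = 0. The LCA sidesteps this by pushing the nonsmoothness into its activation function, a soft threshold. A minimal sketch of the unit-threshold case (which matches the |u_i| > 1 active-neuron definition used later in the paper; the function name is illustrative):

```python
import numpy as np

def soft_threshold(u, theta=1.0):
    """LCA-style activation: zero inside [-theta, theta], shifted linear outside.

    This is the proximal operator of theta * ||.||_1. It is continuous but
    nondifferentiable at |u| = theta, which is why the classical LPNN
    analysis (requiring twice-differentiable functions) does not apply.
    """
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

# Neurons with |u| <= 1 are "inactive" (output 0); neurons with |u| > 1
# are "active" (output u shifted toward zero by the threshold).
u = np.array([-2.0, -0.5, 0.0, 1.5])
print(soft_threshold(u))
```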
2396 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 10, OCTOBER 2017
state u. Hence, we need Theorem 1 to explicitly show that the equilibrium points are the optimal solutions.

Theorem 1: Let {u*, λ*} be an equilibrium point of the BP-LPNN dynamics (23). At the equilibrium point, the KKT conditions (20) of the BP problem are satisfied. Since the KKT conditions of the BP problem are necessary and sufficient, the equilibrium point of (23) is equivalent to the optimal solution of the BP problem.

Proof: According to the definition of equilibrium points

    du/dt = 0, and dλ/dt = 0.    (24)

From (23) and (24), we have

    −u* + x* + Φᵀ(b − Φx*) + Φᵀλ* = 0    (25a)
    b − Φx* = 0.    (25b)

Clearly, (25b) is identical to (20b). With (25b), (25a) becomes

    −u* + x* + Φᵀλ* = 0.    (26)

On the other hand, from (12)

    −u* + x* + Φᵀλ* ∈ −∂‖x*‖₁ + Φᵀλ*.    (27)

Hence, from (26) and (27), we obtain

    0 ∈ −∂‖x*‖₁ + Φᵀλ*.    (28)

That means, (20a) is also satisfied. In a similar way, one can prove that (20) leads to (25). The proof is completed.

In order to discuss the stability, we introduce the concepts of active neurons and inactive neurons in [35] and [36].

Definition 1: For the active neurons, the magnitudes of their internal state u_i values are greater than 1. The collection of indices of the active neurons is denoted by Γ = {i ∈ [1, n] : |u_i| > 1}. Also, we define Φ_Γ as the matrix composed of the columns of Φ indexed by Γ.

Definition 2: For the inactive neurons, the magnitudes of their internal state u_i values are less than or equal to 1, and the corresponding outputs x_i are equal to 0. The collection of indices of the inactive neurons is denoted by Γᶜ = {i ∈ [1, n] : |u_i| ≤ 1}. Also, we define Φ_Γᶜ as the matrix composed of the columns of Φ indexed by Γᶜ.

Let {u*, x* = T₁(u*), λ*} be an equilibrium point (optimal solution). Furthermore, let Γ and Γᶜ be the active set and the inactive set of x*, respectively. We define two constants for them, given by

    γ = min_{i∈Γ} |x_i*| and α = min_{i∈Γᶜ} (1 − |u_i*|).    (29)

Let

    ũ = u − u*, x̃ = x − x*, and λ̃ = λ − λ*.    (30)

We define two energy functions, given by

    V₁(x̃_Γ(t), λ̃(t)) = ‖x̃_Γ(t)‖₂² + ‖λ̃(t)‖₂²    (31)
    V₂(ũ_Γᶜ(t)) = ‖ũ_Γᶜ(t)‖₂²    (32)

and three balls around the equilibrium point, given by

    B_x = {(x̃_Γ, λ̃) : ‖x̃_Γ‖₂² + ‖λ̃‖₂² ≤ γ²}    (33)
    B_u = {ũ_Γᶜ : ‖ũ_Γᶜ‖₂² ≤ α²}    (34)
    B_λ = {(x̃_Γ, λ̃) : ‖x̃_Γ‖₂² + ‖λ̃‖₂² ≤ α²/(ς² + ω²)}    (35)

where ς = ‖Φ_Γᶜᵀ Φ_Γ‖₂ and ω = ‖Φ_Γᶜᵀ‖₂.

Theorem 2 tells us about the stability of the equilibrium points.

Theorem 2: For the BP-LPNN model, when there is a small perturbation on an equilibrium point, the state converges to the optimal solution of the BP problem.

Proof: Since the proof is complicated, we give an overview first. The proof contains three parts. In Part 1, we show that if the initial state {x̃_Γ(0), λ̃(0)} is inside B_x, then for t ≥ 0, V₁(t) decreases with time, as long as there are no inactive neurons switching to be active. Furthermore, as V₁(t) < V₁(0) for t > 0, we have

    ‖x̃_Γ(t)‖₂² + ‖λ̃(t)‖₂² < γ²    (36)

for t ≥ 0. Inequality (36) implies that |x̃_i(t)| < γ, ∀i ∈ Γ, i.e., no active neuron switches to be inactive.

In Part 2, we show that if ũ_Γᶜ(0) is inside B_u and {x̃_Γ(0), λ̃(0)} is inside B_λ, then for t > 0, there are no inactive neurons switching to be active. In the proof, we first show that for t > 0, ũ_Γᶜ(t) remains inside B_u. When ũ_Γᶜ(t) is inside B_u, we have

    −α ≤ u_i(t) − u_i* ≤ α, ∀i ∈ Γᶜ.    (37)

From (29), (37) becomes

    −1 ≤ u_i(t) ≤ 1, ∀i ∈ Γᶜ.    (38)

That means, there are no inactive neurons switching to be active.

Based on Parts 1 and 2, in Part 3, we will prove that if the initial state is close to an equilibrium point (optimal point), then lim_{t→∞} x̃_Γ(t) = 0, and λ̃(t) and ũ_Γᶜ(t) converge.

(Proof of Part 1): The dynamics of the inactive neurons and the active neurons can be rewritten as

    dũ_Γᶜ/dt = −ũ_Γᶜ(t) − Φ_Γᶜᵀ Φ_Γ x̃_Γ(t) + Φ_Γᶜᵀ λ̃(t)    (39)
    dx̃_Γ/dt = −ũ_Γ(t) + x̃_Γ(t) − Φ_Γᵀ Φ_Γ x̃_Γ(t) + Φ_Γᵀ λ̃(t)    (40)
    dλ̃/dt = −Φ_Γ x̃_Γ(t).    (41)

For a point {x̃_Γ(t), λ̃(t)} inside B_x

    dV₁/dt = 2 Σ_{i∈Γ} (−x̃_i(t) ũ_i(t) + x̃_i²(t)) − 2‖Φ_Γ x̃_Γ(t)‖₂².

Notice that x̃_i and ũ_i are with the same sign and |ũ_i| > |x̃_i|. Hence, we have dV₁/dt ≤ 0.

In sparse approximation, we usually use random matrices as the measurement matrix, and the number of nonzero elements in a sparse solution is much less than the number m of measurements. Let n_a be the number of active neurons in
FENG et al.: LPNN FOR NONDIFFERENTIABLE OPTIMIZATION PROBLEMS 2399
For the CBPDN problem, we do not need to consider ‖b‖₂² ≤ mσ². It is because if ‖b‖₂² ≤ mσ², then ‖b − Φ0‖₂² ≤ mσ² and the constraint in (59) is satisfied; the trivial solution of x is a zero vector. Hence, we only need to consider ‖b‖₂² > mσ². Then, the KKT conditions can be simplified, and the result is stated in Theorem 3.

Theorem 3: Given that ‖b‖₂² > mσ², the optimization problem (59) becomes

    min_x ‖x‖₁, s.t. ‖b − Φx‖₂² = mσ².    (61)

Besides, x* is an optimal solution, if and only if, there exists a β (Lagrange multiplier) such that

    0 ∈ ∂‖x*‖₁ − β Φᵀ(b − Φx*)    (62a)
    ‖b − Φx*‖₂² − mσ² = 0    (62b)
    β > 0.    (62c)

Proof: We first show that, given that ‖b‖₂² > mσ², β in Proposition 2 must be greater than zero. From Proposition 2 [see (60)], β is either greater than zero or equal to zero. We will use contradiction to exclude the possibility of β = 0 when ‖b‖₂² > mσ². From (60a), if β = 0, then 0 ∈ ∂‖x*‖₁. That implies x* = 0.¹ It follows that ‖b‖₂² ≤ mσ² [see (60b)]. That contradicts our earlier assumption ‖b‖₂² > mσ². Hence, β must be strictly greater than zero.

As β is strictly greater than zero, from (60d), we obtain that ‖b − Φx*‖₂² − mσ² = 0. Hence, if ‖b‖₂² > mσ², then the KKT conditions of Proposition 2 (necessary and sufficient) become (62).

Following the method used in BP-LPNN, we obtain the CBPDN-LPNN dynamics:

    du/dt = −u + x + 2λ² Φᵀ(b − Φx)    (67a)
    dλ/dt = 2λ (‖b − Φx‖₂² − mσ²).    (67b)

C. Properties of CBPDN-LPNN

Again, the optimal solutions of the CBPDN problem, stated in Theorem 3, are in terms of x, while the equilibrium points of (67) involve the hidden state u. Hence, we need Theorem 4 to explicitly show that the equilibrium points of the CBPDN-LPNN are the optimal solutions. Theorem 4 does not tell us that the equilibrium points are achievable. Hence, we need to investigate this issue. Theorem 5 tells us that the equilibrium points are stable, i.e., the equilibrium points are achievable.

Theorem 4: Let {u*, λ*} be an equilibrium point of (67) and u* ≠ 0. At the equilibrium point, the KKT conditions (62) of Theorem 3 are satisfied. Since the KKT conditions of Theorem 3 are necessary and sufficient, the equilibrium point of (67) is equivalent to the optimal solution of the CBPDN problem.

Proof: Denote x* as the output of u*. First of all, if {u*, λ*} is an equilibrium point, then from (67), we obtain

    −u* + x* + 2λ*² Φᵀ(b − Φx*) = 0    (68)
    2λ* (‖b − Φx*‖₂² − mσ²) = 0.    (69)
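The simplified KKT conditions (62) are easy to verify numerically for a candidate x*. A small sketch (illustrative, not from the paper): a checker for (62a)-(62c), exercised on a toy case with Φ = I, where the optimal x soft-thresholds b and the threshold is chosen so that the residual energy equals mσ²:

```python
import numpy as np

def check_cbpdn_kkt(Phi, b, x, beta, m, sigma2, tol=1e-8):
    """Verify the simplified CBPDN KKT conditions (62):
    0 in d||x||_1 - beta*Phi^T(b - Phi x), ||b - Phi x||^2 = m*sigma2, beta > 0."""
    g = beta * Phi.T @ (b - Phi @ x)        # must lie in the subdifferential of ||x||_1
    ok_a = True
    for gi, xi in zip(g, x):
        if xi != 0.0:
            ok_a &= abs(gi - np.sign(xi)) <= tol   # active entry: g_i = sign(x_i)
        else:
            ok_a &= abs(gi) <= 1.0 + tol           # zero entry: |g_i| <= 1
    ok_b = abs(np.sum((b - Phi @ x) ** 2) - m * sigma2) <= tol
    return bool(ok_a and ok_b and beta > 0)

# Toy case with Phi = I: soft-thresholding b by 0.5 leaves a residual of
# 4 * 0.5^2 = 1 = m*sigma2, and the multiplier is beta = 1/0.5 = 2.
Phi = np.eye(4)
b = np.array([3.0, -3.0, 3.0, -3.0])
x_opt = np.array([2.5, -2.5, 2.5, -2.5])
print(check_cbpdn_kkt(Phi, b, x_opt, beta=2.0, m=4, sigma2=0.25))       # True
print(check_cbpdn_kkt(Phi, b, np.zeros(4), beta=2.0, m=4, sigma2=0.25))  # False
```

Note that ‖b‖₂² = 36 > mσ² = 1 in this toy case, so the premise of Theorem 3 holds and the residual constraint is active, as (62b) requires.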
Proof: As the proof is a bit complicated, we first give a brief introduction of the proof. Since the inactive neurons have no effect on the active neurons and the Lagrange neurons, the dynamics can be rewritten as

    du_Γ/dt = −u_Γ + x_Γ + 2λ² Φ_Γᵀ(b − Φ_Γ x_Γ)    (72a)
    dλ/dt = 2λ (‖b − Φ_Γ x_Γ‖₂² − mσ²)    (72b)
    du_Γᶜ/dt = −u_Γᶜ + 2λ² Φ_Γᶜᵀ(b − Φ_Γ x_Γ).    (72c)

From Appendix A, the linearization of (72) around the equilibrium point is

    [du_Γ/dt; dλ/dt; du_Γᶜ/dt] = −H [u_Γ − u_Γ*; λ − λ*; u_Γᶜ − u_Γᶜ*]    (73)

evaluated at (u*, λ*), where

    H = [ 2λ*² Φ_ΓᵀΦ_Γ           −4λ* Φ_Γᵀ(b − Φ_Γ x_Γ*)    ∅
          4λ* (b − Φ_Γ x_Γ*)ᵀΦ_Γ   0                          ∅
          2λ*² Φ_ΓᶜᵀΦ_Γ          −4λ* Φ_Γᶜᵀ(b − Φ_Γ x_Γ*)   I ].    (74)

From the classical optimization theory, if all the eigenvalues of H are with positive real parts, then the equilibrium point is an asymptotically stable point.

In the following, we will show that all the eigenvalues of H are with positive real parts. We first define

    G = [ 2λ*² Φ_ΓᵀΦ_Γ           −4λ* Φ_Γᵀ(b − Φ_Γ x_Γ*)
          4λ* (b − Φ_Γ x_Γ*)ᵀΦ_Γ   0 ]    (75)

and

    H = [ G  ∅
          B  I ]    (76)

where B = [2λ*² Φ_ΓᶜᵀΦ_Γ | −4λ* Φ_Γᶜᵀ(b − Φ_Γ x_Γ*)]. The proof consists of two parts. In the first part, we show that all the eigenvalues of G are with positive real parts. Based on the first part, we then show that all the eigenvalues of H are with positive real parts too.

Eigenvalues of G: Clearly, 2λ*² Φ_ΓᵀΦ_Γ is positive/semipositive definite. In the proof of Theorem 2, we already discussed that the probability that rank(Φ_Γ) = n_a tends to 1 for large m. When Φ_ΓᵀΦ_Γ is positive definite, the matrix G is full rank, i.e., rank(G) = n_a + 1 (see Appendix B).

Denote ṽ as the conjugate of v. Let ζ be an eigenvalue of G, and (χᵀ, τ)ᵀ ≠ (0ᵀ, 0)ᵀ be the corresponding eigenvector. If (χᵀ, τ)ᵀ is an eigenvector of G, it cannot be a zero vector. Now, we are going to show that χ ≠ 0. We use contradiction. Assume that χ = 0. From the definition of eigenvector, we have

    G [0; τ] = ζ [0; τ].    (77)

From the definition of G [see (75)], we also have

    G [0; τ] = [ −4τλ* Φ_Γᵀ(b − Φ_Γ x_Γ*) ; 0 ].    (78)

At the equilibrium point (u*, λ*), from (72), we have

    −u_Γ* + x_Γ* = −2λ*² Φ_Γᵀ(b − Φ_Γ x_Γ*).    (79)

For the active neurons, we have |x_i*| < |u_i*| for i ∈ Γ. Besides, λ* ≠ 0, and then, we have

    Φ_Γᵀ(b − Φ_Γ x_Γ*) ≠ 0.    (80)

Therefore, from (77), (78), and (80), we obtain τ = 0. That contradicts the eigenvector assumption. That means, χ ≠ 0.

Now, we will show that the real part of the eigenvalue is positive. Since (χᵀ, τ)ᵀ is an eigenvector, we have

    Re( [χ̃ᵀ τ̃] G [χ; τ] ) = Re(ζ) (‖χ‖₂² + |τ|²).    (81)

From the definition of G [see (75)], we also have

    Re( [χ̃ᵀ τ̃] G [χ; τ] ) = Re( 2λ*² χ̃ᵀ Φ_ΓᵀΦ_Γ χ ).    (82)

That means, we have

    Re( 2λ*² χ̃ᵀ Φ_ΓᵀΦ_Γ χ ) = Re(ζ) (‖χ‖₂² + |τ|²).    (83)

Since λ*² Φ_ΓᵀΦ_Γ is positive definite, the left-hand side must be greater than zero, i.e., Re(ζ) > 0. That means, all the eigenvalues of G are with positive real parts.

Eigenvalues of H: As G is full rank, it can be diagonalized

    G = V ϒ V⁻¹    (84)

where ϒ is a diagonal matrix whose diagonal elements ϒ_i are the eigenvalues of G, and the column vectors of V are the eigenvectors of G. Define

    P = [ V⁻¹  ∅ ; ∅  I ], and P⁻¹ = [ V  ∅ ; ∅  I ].    (85)

Define H̃ as H̃ = P H P⁻¹. From (76), we obtain

    H̃ = [ ϒ  ∅ ; B V  I ].    (86)

Clearly, H̃ is a lower block triangular matrix with diagonal elements

    {ϒ₁, . . . , ϒ_{n_a+1}, 1, . . . , 1}    (87)

(with (n − n_a) 1's) being its eigenvalues. Besides, their real parts are greater than zero (from the first part). From Appendix C, H and H̃ are with the same set of eigenvalues. This means, the real parts of the eigenvalues of H are positive. Hence, the equilibrium point {u*, λ*} is an asymptotically stable point. The proof is complete.

Note that the global convergence properties of the CBPDN-LPNN model are not known yet. However, this does not limit its application or performance. The experimental results, in Section V, show that the performance of the CBPDN-LPNN model is identical to that of the two numerical methods.
Fig. 2. Simulation results for the BP problem, where n = 512 and n = 4096. (a) and (d)–(f) MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. (b), (c), and (g)–(i) Some dynamics examples for the BP-LPNN model.
V. SIMULATIONS

A. Setting

We use standard configurations [31], [49] to test the two proposed models. We consider two signal lengths, n = 512 and n = 4096. For n = 512, 15 data points have nonzero values (±5). For n = 4096, the numbers of nonzero values (±5) are {75, 100, 125}. The measurement matrix is a ±1 random matrix, and it is further normalized with the signal length. We repeat our experiment 100 times with different random matrices, initial states, and sparse signals.

B. BP-LPNN

We compare our analog method with two digital numerical approaches: the primal-dual interior point method from the L1Magic package [31] and the SPG method from the SPGL1 package [32]. We would like to investigate whether the analog BP-LPNN method produces the same MSE performance that the two digital numerical approaches do.

Fig. 2(a) and (d)–(f) shows the MSE performances. When the number of measurements reaches a threshold, the reconstruction errors become very small. This phenomenon agrees with the well-known property of sparse approximation. The MSE performance of the BP-LPNN is quite similar to that of the two traditional digital methods. All three methods have a similar threshold. There is no significant difference among the BP-LPNN, the L1Magic package, and the SPGL1 package.

For n = 512, when the number of measurements is less than or equal to 70, the reconstruction errors are greater than 0.2. When 95 or more measurements are used, the reconstruction errors of the three models are much less than 0.001, as shown in the zoomed-in subfigure in Fig. 2(a). For n = 4096, when there are 75 nonzero elements, around 450 measurements are required. When 465 or more measurements are used, the reconstruction errors of the three methods are very small. They are much smaller than 0.001, as shown in the zoomed-in subfigure in Fig. 2(d). Note that when the number of measurements is greater than the threshold, there are some small differences in the reconstruction errors among the three methods. This is because the two numerical methods have some tuning parameters that affect the accuracy of the solution.

Fig. 2(b), (c), and (g)–(i) shows the output x_i(t) values of the active set of the equilibrium point under different settings, where i ∈ Γ. We would like to see when the output x_i(t) values settle down. Since the nonzero elements of the original signal are equal to ±5, the outputs converge to values close to ±5. For n = 512 and the selected settings, the outputs settle down within around 50–150 characteristic time units. After these amounts of time, there are no big changes in the outputs. For n = 4096 and the selected settings, there are no big changes in the outputs after 150 characteristic time units.
Fig. 3. Simulation results for the CBPDN problem, where n = 512. First row: MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. Second row: some dynamics examples.
Fig. 4. Simulation results for the CBPDN problem, where n = 4096. First–third rows: MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. Fourth row: some dynamics examples.
C. CBPDN-LPNN

Two digital numerical approaches, the log-barrier method from the L1Magic package and the SPG method from the SPGL1 package, are used for comparison. We expect that the three methods would have a similar performance. Three noise levels are considered: σ² = {−26 dB, −32 dB, −46 dB}. Figs. 3 and 4 show their performances.

From Figs. 3 and 4, when the number of measurements reaches a threshold, the reconstruction errors drop to a very small value. The threshold mainly depends on the number
Fig. 5. Some recovered signals from nonsparse signals. Note that there are no obvious visual differences among the three reconstruction methods.
VI. CONCLUSION

This paper proposed two LPNN models for solving optimization problems in sparse approximation. The BP-LPNN model is designed for the BP problem, while the CBPDN-LPNN model is designed for the CBPDN problem.

APPENDIX A

The linearization of the CBPDN-LPNN dynamics around the equilibrium point uses the Jacobian matrix

    H = [ 2λ*² Φ_ΓᵀΦ_Γ           −4λ* Φ_Γᵀ(b − Φ_Γ x_Γ*)    ∅
          4λ* (b − Φ_Γ x_Γ*)ᵀΦ_Γ   0                          ∅
          2λ*² Φ_ΓᶜᵀΦ_Γ          −4λ* Φ_Γᶜᵀ(b − Φ_Γ x_Γ*)   I ].

In the derivation of the Jacobian matrix, we use the facts that for active neurons dx_i/du_i = 1, and that for inactive neurons dx_i/du_i = 0. Also, at an equilibrium point, ‖b − Φ_Γ x_Γ*‖₂² − mσ² = 0.

APPENDIX B
PROOF OF THE RANK OF G EQUAL TO n_a + 1

Recall that

    G = [ 2λ*² Φ_ΓᵀΦ_Γ           −4λ* Φ_Γᵀ(b − Φ_Γ x_Γ*)
          4λ* (b − Φ_Γ x_Γ*)ᵀΦ_Γ   0 ].    (102)

Since −u_Γ* + x_Γ* ≠ 0, we have −4λ* Φ_Γᵀ(b − Φ_Γ x_Γ*) ≠ 0. Without loss of generality, we consider

    G = [ A  υ ; −υᵀ  0 ]    (103)

where A is an n_a × n_a symmetric positive definite matrix and υ is a nonzero column vector. Since A is invertible, we have

    [ A  υ ; −υᵀ  0 ] = [ I  ∅ ; −υᵀA⁻¹  1 ] [ A  υ ; ∅  υᵀA⁻¹υ ].    (104)

Taking the determinant of both sides, we obtain

    det [ A  υ ; −υᵀ  0 ] = |A| · |υᵀA⁻¹υ|.    (105)

Since A⁻¹ is positive definite, |υᵀA⁻¹υ| is nonzero. This means that the determinant of G is nonzero, and then the rank of G is equal to n_a + 1. The proof is complete.

APPENDIX C
PROPERTY OF EIGENVALUES OF MATRICES

Given a full-rank square matrix H (symmetric or nonsymmetric), it can be diagonalized, given by H = UΛU⁻¹, where Λ is a diagonal matrix whose diagonal elements Λ_i are the eigenvalues of H, the column vectors of U are right eigenvectors of H, and the row vectors of U⁻¹ are the left eigenvectors of H. Consider H̃ = PHP⁻¹, where P is an invertible matrix. The matrix H̃ can be diagonalized too, given by H̃ = PUΛU⁻¹P⁻¹ = (PU)Λ(PU)⁻¹ = ŨΛŨ⁻¹. Considering the normalization of the column vectors of Ũ, we obtain Ũ = ÜD, where D is a diagonal matrix whose elements are greater than zero, and the length of the column vectors of Ü is equal to one. With the normalization

    H̃ = ŨΛŨ⁻¹ = ÜDΛD⁻¹Ü⁻¹ = ÜΛÜ⁻¹.    (106)

Hence, H and H̃ are with the same set of eigenvalues.

REFERENCES

[1] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. London, U.K.: Wiley, 1993.
[2] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554–2558, Jan. 1982.
[3] L. O. Chua and G.-N. Lin, "Nonlinear programming without computation," IEEE Trans. Circuits Syst., vol. 31, no. 2, pp. 182–188, Feb. 1984.
[4] Y. Xia, G. Feng, and J. Wang, "A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints," IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1340–1353, Aug. 2008.
[5] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 558–570, Apr. 2008.
[6] S. Bharitkar, K. Tsuchiya, and Y. Takefuji, "Microcode optimization with neural networks," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 698–703, May 1999.
[7] D. Tank and J. Hopfield, "Simple 'neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986.
[8] A. Bouzerdoum and T. R. Pattison, "Neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 293–304, Mar. 1993.
[9] X.-B. Liang, "A complete proof of global exponential convergence of a neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 636–639, May 2001.
[10] M. Fukushima, "Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems," Math. Program., vol. 53, no. 1, pp. 99–110, Jan. 1992.
[11] T. L. Friesz, D. H. Bernstein, N. J. Mehta, R. L. Tobin, and S. Ganjalizadeh, "Day-to-day dynamic network disequilibria and idealized traveler information systems," Oper. Res., vol. 42, no. 6, pp. 1120–1136, Jun. 1994.
[12] B. He and H. Yang, "A neural network model for monotone linear asymmetric variational inequalities," IEEE Trans. Neural Netw., vol. 11, no. 1, pp. 3–16, Jan. 2000.
[13] X. Hu and J. Wang, "Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1487–1499, Nov. 2006.
[14] X. B. Gao, "Exponential stability of globally projected dynamic systems," IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 426–431, Mar. 2003.
[15] Y. Xia, "An extended projection neural network for constrained optimization," Neural Comput., vol. 16, no. 4, pp. 863–883, Apr. 2004.
[16] X. Hu and J. Wang, "A recurrent neural network for solving a class of general variational inequalities," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 3, pp. 528–539, Jun. 2007.
[17] S. Zhang, Y. Xia, and J. Wang, "A complex-valued projection neural network for constrained optimization of real functions in complex variables," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3227–3238, Dec. 2015.
[18] Y. Xia, "A compact cooperative recurrent neural network for computing general constrained L1 norm estimators," IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3693–3697, Sep. 2009.
[19] Y. Xia, C. Sun, and W. X. Zheng, "Discrete-time neural network for fast solving large linear L1 estimation problems and its application to image restoration," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 812–820, May 2012.
[20] Y. Xia and J. Wang, "Low-dimensional recurrent neural network-based Kalman filter for speech enhancement," Neural Netw., vol. 67, pp. 131–139, Jul. 2015.
[21] S. Zhang and A. G. Constantinides, "Lagrange programming neural networks," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.
[22] X. Zhu, S.-W. Zhang, and A. G. Constantinides, "Lagrange neural networks for linear programming," J. Parallel Distrib. Comput., vol. 14, no. 3, pp. 354–360, Mar. 1992.
[23] V. Sharma, R. Jha, and R. Naresh, "An augmented Lagrange programming optimization neural network for short-term hydroelectric generation scheduling," Eng. Optim., vol. 37, pp. 479–497, Jul. 2005.
[24] J. Liang, H. C. So, C. S. Leung, J. Li, and A. Farina, "Waveform design with unit modulus and spectral shape constraints via Lagrange programming neural network," IEEE J. Sel. Topics Signal Process., vol. 9, no. 8, pp. 1377–1386, Dec. 2015.
[25] J. Liang, C. S. Leung, and H. C. So, "Lagrange programming neural network approach for target localization in distributed MIMO radar," IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1574–1585, Mar. 2016.
[26] Y. Xia, "Global convergence analysis of Lagrangian networks," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 6, pp. 818–822, Jun. 2003.
[27] X. Lou and J. A. K. Suykens, "Stability of coupled local minimizers within the Lagrange programming network framework," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 377–388, Feb. 2013.
[28] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 2001.
[29] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proc. Nat. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003.
[30] E. van den Berg and M. P. Friedlander, "Sparse optimization with least-squares constraints," SIAM J. Optim., vol. 21, no. 4, pp. 1201–1229, 2011.
[31] E. Candès and J. Romberg. (Oct. 2005). l1-MAGIC: Recovery of Sparse Signals via Convex Programming. [Online]. Available: http://users.ece.gatech.edu/justin/l1magic/downloads/l1magic.pdf
[32] E. van den Berg and M. P. Friedlander. (Jun. 2007). SPGL1: A Solver for Large-Scale Sparse Reconstruction. [Online]. Available: http://www.cs.ubc.ca/labs/scl/spgl1
[33] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, "Sparse coding via thresholding and local competition in neural circuits," Neural Comput., vol. 20, no. 10, pp. 2526–2563, Oct. 2008.
[34] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, Jan. 1998.
[35] A. Balavoine, C. J. Rozell, and J. Romberg, "Global convergence of the locally competitive algorithm," in Proc. IEEE Digit. Signal Process. Workshop, IEEE Signal Process. Edu. Workshop (DSP/SPE), Sedona, AZ, USA, Jan. 2011, pp. 431–436.
[36] A. Balavoine, J. Romberg, and C. J. Rozell, "Convergence and rate analysis of neural networks for sparse approximation," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 9, pp. 1377–1389, Sep. 2012.
[37] Q. Liu and J. Wang, "A one-layer recurrent neural network for constrained nonsmooth optimization," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1323–1333, Oct. 2011.
[38] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network for nonsmooth nonlinear programming problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004.
[39] L. Cheng, Z. G. Hou, M. Tan, X. Wang, Z. Zhao, and S. Hu, "A recurrent neural network for non-smooth nonlinear programming problems," in Proc. IEEE IJCNN, Aug. 2007, pp. 596–601.
[40] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W. C. Zhang, and F.-X. Wu, "Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks," IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 714–726, May 2011.
[41] G. Gordon and R. Tibshirani, "Karush–Kuhn–Tucker conditions," in Proc. Optim. Fall Lecture Notes, 2012, pp. 1–26. [Online]. Available: https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf
[42] B. Guenin, J. Könemann, and L. Tunçel, A Gentle Introduction to Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[43] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[44] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004.
[45] J. J. Fuchs, "Convergence of a sparse representations algorithm applicable to real or complex data," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 598–605, Dec. 2007.

Ruibin Feng is currently pursuing the Ph.D. degree with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His current research interests include neural networks and machine learning.

Chi-Sing Leung (M'05–SM'15) received the Ph.D. degree in computer science from the Chinese University of Hong Kong, Hong Kong, in 1995. He is currently a Professor with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He has authored over 120 journal papers in the areas of digital signal processing, neural networks, and computer graphics. His current research interests include neural computing and computer graphics. Dr. Leung was a member of the Organizing Committee of ICONIP2006. He received the 2005 IEEE Transactions on Multimedia Prize Paper Award for his paper titled "The Plenoptic Illumination Function" in 2005. He was the Program Chair of ICONIP2009 and ICONIP2012. He is/was the Guest Editor of several journals, including Neural Computing and Applications, Neurocomputing, and Neural Processing Letters. He is a Governing Board Member of the Asian Pacific Neural Network Assembly (APNNA) and the Vice President of APNNA.

Anthony G. Constantinides (S'68–M'74–SM'78–F'98) is currently the Professor of Communications and Signal Processing with Imperial College London, London, U.K. He has been actively involved in research in various aspects of digital signal processing for more than 45 years. He has authored several books and over 400 articles in digital signal processing. Prof. Constantinides is a Fellow of the Royal Academy of Engineering, the Institute of Electrical and Electronics Engineers, USA, and the Institution of Electrical Engineers, U.K. He has served as the First President of the European Association for Signal Processing and has contributed in this capacity to the establishment of the European Journal for Signal Processing. He received the Medal of the Association, Palmes Academiques, in 1986, and the Medal of the University of Tienjin, Shanghai, China, in 1981. He received honorary doctorates from European and Far Eastern universities. Among these, he values highly the honorary doctorate from the National Technical University of Athens, Athens, Greece. He organized the first international series of meetings on Digital Signal Processing, London, initially in 1967, and in Florence with Prof. V. Cappellini at the University of Florence, Florence, Italy, since 1972. In 1985, he was decorated by the French government with the Honour of Chevalier, Palmes Academiques, and in 1996 with the elevation to Officer, Palmes Academiques. His life work has been recorded in a series of audio and video interviews for the IEEE (USA) Archives as a Pioneer of Signal Processing. He has acted as an Advisor to many organizations and governments on modern technology and development. He has served on the Professorial Selection Committees around the world (15 during the last
[46] J. Dutta and C. S. Lalitha, “Optimality conditions in convex optimization five years) and the EU University Appraising Panels, and as a member of
revisited,” Optim. Lett., vol. 7, no. 2, pp. 221–229, 2013. IEE/IEEE Awards Committees and the Chair (or Co-Chair) of international
[47] A. Dhara and J. Dutta, Optimality Conditions in Convex Optimization. conferences.
New York, NY, USA: Taylor & Francis, 2011.
[48] X. Feng and Z. Zhang, “The rank of a random matrix,” Appl. Math. Wen-Jun Zeng (S’10–M’11) received the M.S.
Comput., vol. 185, no. 1, pp. 689–694, Jan. 2007. degree in electrical engineering from Tsinghua Uni-
[49] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. versity, Beijing, China, in 2008.
Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008. He was a Research Assistant with Tsinghua Uni-
[50] Y. Tsaig and D. L. Donoho, “Extensions of compressed sensing,” Signal versity, from 2006 to 2009. From 2009 to 2011,
Process., vol. 86, no. 3, pp. 549–571, Mar. 2006. he was a Faculty Member with the Department
[51] H. Zhang, W. Yin, and L. Cheng, “Necessary and sufficient conditions of Communication Engineering, Xiamen University,
of solution uniqueness in 1-norm minimization,” J. Optim. Theory Appl., Xiamen, China. He is currently a Senior Research
vol. 164, no. 1, pp. 109–122, 2015. Associate with the Department of Electronic Engi-
[52] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of neering, City University of Hong Kong, Hong Kong.
systems of equations to sparse modeling of signals and images,” SIAM His current research interests include mathematical
Rev., vol. 51, no. 1, pp. 34–81, Feb. 2009. signal processing, including convex optimization, array processing, sparse
[53] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE approximation, and inverse problem, with applications to wireless radio, and
Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005. underwater acoustic communications.