Vous êtes sur la page 1sur 126

1/142

Future Random Matrix Tools


for Large Dimensional Signal Processing
EUSIPCO 2014, Lisbon, Portugal.

Abla KAMMOUN1 and Romain COUILLET2

1 Kings Abdullah University of Technology and Science, Saudi Arabia


2 SUPELEC, France

September 1st, 2014


Introduction/ 2/142

High-dimensional data

I Consider n observations x1 , , xn of size N,


 independent and identically distributed with
zero-mean and covariance CN , i.e, E x1 xH 1 = CN ,
I Let XN = [x1 , , xn ]. The sample covariance estimate SN of CN is given by:
X n
SN = n1 XN XH
N = n
1 xi xi ,
i =1
I From the law of large numbers, as n +,
a.s.
SN CN .

Convergence in the operator norm


I In practice, it might be difficult to afford n +,
I if n  N, SN can be sufficiently accurate,
I if N /n = O(1), we model this scenario by the following assumption: N + and n + with
N
n c,
I Under this assumption, we have pointwise convergence to each element of CN , i.e,
 
a.s.
SN (CN )i,j
i,j

but kSN CN k does not converge to zero.


The convergence in the operator norm does not hold.
Introduction/ 3/142

Illustration
Consider CN = IN , the spectrum of SN is different from that of CN
Spectrum of eigenvalues
Marchenko-Pastur Law
1

0.8

0.6
Histogram

0.4

0.2

0 0.5 1 1.5 2

Eigenvalues of SbN

Figure: Spectrum of eigenvalues when N = 400 and n = 2000

The asymptotic spectrum can be characterized by the Marchenko-Pastur Law.


Introduction/ 4/142

Reasons of interest for signal processing


I Scale similarity in array processing applications: large antenna arrays vs limited number of
observations,
I Need for detection and estimation based on large dimensional random inputs: subspace
methods in array processing.
I The assumption number of obervations  dimension of observation is no longer valid:
large arrays, systems with fast dynamics.

Example
MUSIC with few samples (or in large arrays) Call A() = [a(1 ), . . . , a(K )] CN K , N large,
K small, the steering vectors to identify and X = [x1 , . . . , xn ] CN n the n samples, taken from

X
K

xt = a(k ) p k sk,t + wt .
k =1

W U
The MUSIC localization function reads () = a()H U H a() in the signal vs. noise
W
spectral decomposition XXH = US H + U
SU W WUH .
S W
Writing equivalently A()PA()H + 2 IN = US S UHS + UW UW , as n, N , n/N c,
2 H

from our previous remarks


W U
U H 6 UW UH
W W

Music is NOT consistent in the large N, n regime! We need improved RMT-based solutions.
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 7/142

Stieltjes Transform

Definition
Let F be a real probability distribution function. The Stieltjes transform mF of F is the function
defined, for z C+ , as Z
1
mF (z ) = dF ()
z
For a < b continuity points of F , denoting z = x + iy , we have the inverse formula
Zb
1
F (b ) F (a) = lim =[mF (x + iy )]dx
y 0 a

If F has a density f at x, then


1
f (x ) = lim =[mF (x + iy )]
y 0

The Stieltjes transform is to the Cauchy transform as the characteristic functin is to the Fourier
transform.
Equivalence F mF
Similar to the Fourier transform, knowing mF is the same as knowing F .
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 8/142

Stieltjes transform of a Hermitian matrix

I Let X be a N N randomP matrix. Denote by dF X the empirical measure of its eigenvalues


1 , , N , i.e, dF X = N1 Ni =1 i . The Stieltjes transform of X denoted by mX = mF is
the stieltjes transform of its empirical measure:
Z
1 X 1
N
1 1
mX (z ) = dF () = = tr (X zIN )1 .
z N i z N
i =1

I The Stieltjes transform of a random matrix is the trace of the resolvent matrix
Q(z ) = (X zIN )1 . The resolvent matrix plays a key role in the derivation of many of the
results of random matrix theory.
I For compactly supported F , mF (z ) is linked to the moments Mk = E N1 tr Xk ,

X
+
mF (z ) = Mk z k 1
k =0

I mF is defined in general on C+ but exists everywhere outside the support of F .


Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 9/142

Side remark: the Shannon-transform

A. M. Tulino, S. Verd`
u, Random matrix theory and wireless communications, Now Publishers
Inc., 2004.

Definition
Let F be a probability distribution, mF its Stieltjes transform, then the Shannon-transform VF of
F is defined as Z Z  
1
VF (x ) , log(1 + x )dF () = mF (t ) dt
0 x t

I This quantity is fundamental to wireless communication purposes!


I Note that mF itself is of interest, not F !
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 10/142

Proof of the Marcenko-Pastur law

V. A. Mar
cenko, L. A. Pastur, Distributions of eigenvalues for some sets of random matrices,
Math USSR-Sbornik, vol. 1, no. 4, pp. 457-483, 1967.
The theorem to be proven is the following
Theorem
Let XN CN n have i.i.d. zero mean variance 1/n entries with finite eighth order moments. As
n, N with Nn c (0, ), the e.s.d. of XN XHN converges almost surely to a nonrandom
distribution function Fc with density fc given by
1 p
fc (x ) = (1 c 1 )+ (x ) + (x a ) + ( b x ) +
2cx

where a = (1 c )2 , and b = (1 + c )2 .
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 11/142

The Marcenko-Pastur density


1.2
c = 0.1
c = 0.2
1 c = 0.5

0.8
Density fc (x )

0.6

0.4

0.2

0
0 0.5 1 1.5 2 2.5 3

Figure: Mar
cenko-Pastur law for different limit ratios c = limN N /n.
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 12/142

Diagonal entries of the resolvent

Since we want an expression of mF , we start by identifying the diagonal entries of the resolvent
( XN XH
N zIN )
1 of X XH . Denote
N N
yH
 
XN =
Y
Now, for z C+ , we have
1
yH y z yH Y H
 1 
XN XH
N zIN =
Yy YYH zIN 1

Consider the first diagonal element of (RN zIN )1 . From the matrix inversion lemma,
1
(A BD1 C)1 A1 B(D CA1 B)1
  
A B
=
C D (A BD1 C)1 CA1 (D CA1 B)1

which here gives  1  1


XN XH
N zIN =
11 z zyH (YH Y zIn )1 y
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 13/142

Trace Lemma

Z. Bai, J. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer Series
in Statistics, 2009.
To go further, we need the following result,
Theorem
Let {AN } CN N with bounded spectral norm. Let {xN } CN , be a random vector of i.i.d.
entries with zero mean, variance 1/N and finite 8th order moment, independent of AN . Then

1 a.s.
xH
N AN xN tr AN 0.
N

For large N, we therefore have approximately


 1  1
XN XHN zIN '
11 z z N1 tr (YH Y zIn )1
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 14/142

Rank-1 perturbation lemma

J. W. Silverstein, Z. D. Bai, On the empirical distribution of eigenvalues of a class of large


dimensional random matrices, Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175-192,
1995.
It is somewhat intuitive that adding a single column to Y wont affect the trace in the limit.
Theorem
Let A and B be N N with B Hermitian positive definite, and v CN . For z C \ R ,

tr (B zIN )1 (B + vvH zIN )1 A 6 1
1  kAk
N N dist(z, R+ )

with kAk the spectral norm of A, and dist(z, A) = inf y A ky z k.


Therefore, for large N, we have approximately,
 1  1
XN XH
N zIN '
11 z z N1 tr (YH Y zIn )1
1
'
z z N1 tr (XH
N XN zIn )
1

1
=
z z Nn mF (z )

in which we recognize the Stieltjes transform mF of the l.s.d. of XH


N XN .
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 15/142

End of the proof

We have again the relation


n N n 1
m (z ) = mF (z ) +
N F N z
hence  1  1
XN XH
N zIN ' n
11 N 1 z zmF (z )
Note that the choice (1, 1) is irrelevant here, so the expression is valid for all pair (i, i ). Summing
over the N terms and averaging, we finally have
1  1 1
mF (z ) = tr XN XH
N zIN '
N c 1 z zmF (z )

which solve a polynomial of second order. Finally


p
c 1 1 (c 1 z )2 4z
mF (z ) = + .
2z 2 2z
From the inverse Stieltjes transform formula, we then verify that mF is the Stieltjes transform of
the Mar
cenko-Pastur law.
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 16/142

Related bibliography

I V. A. Mar
cenko, L. A. Pastur, Distributions of eigenvalues for some sets of random matrices, Math
USSR-Sbornik, vol. 1, no. 4, pp. 457-483, 1967.
I J. W. Silverstein, Z. D. Bai, On the empirical distribution of eigenvalues of a class of large dimensional
random matrices, Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175-192, 1995.
I Z. D. Bai and J. W. Silverstein, Spectral analysis of large dimensional random matrices, 2nd Edition
Springer Series in Statistics, 2009.
I R. B. Dozier, J. W. Silverstein, On the empirical distribution of eigenvalues of large dimensional
information-plus-noise-type matrices, Journal of Multivariate Analysis, vol. 98, no. 4, pp. 678-694, 2007.
I V. L. Girko, Theory of Random Determinants, Kluwer, Dordrecht, 1990.
I A. M. Tulino, S. Verd`
u, Random matrix theory and wireless communications, Now Publishers Inc., 2004.
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 17/142

Asymptotic results involving Stieltjes transform

J. W. Silverstein, Z. D. Bai, On the empirical distribution of eigenvalues of a class of large


dimensional random matrices, Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175-192,
1995.

Theorem 1
Let YN = 1 XN CN2 , where XN CnN has i.i.d entries of mean 0 and variance 1. Consider the
n
regime n, N + with Nn c. Let m N be the Stieltjes transform associated to XN XN . Then,
N mN 0 almost surely for all z C\R+ , where mN (z ) is the unique solution in the set
m
{z C+ , mN (z ) C+ } to:
1
ctdF CN
Z
m N (z ) = z
1 + tmN (z )

I in general, no explicit expression for F N , the distribution whose Stietljes transform is mN (z ).


I The theorem above characterizes also the Stieltjes transform of BN = XH
N XN denoted by mN ,

1
mN = cmN + (c 1)
z
This gives access to the spectrum of the sample covariance matrix model of x, when
1
yi = CN2 xi , xi i.i.d., CN = E [yyH ].
Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 18/142

0
Getting F from mF

I Remember that, for a < b real,


1
F 0 (x ) = lim =[mF (x + iy )]
y 0

where mF is (up to now) only defined on C+ .


I to plot the density F 0 ,
I first approach: span z = x + iy on the line {x R, y = } parallel but close to the real axis, solve
mF (z ) for each z, and plot =[mF (z )].
I refined approach: spectral analysis, to come next.

Example (Sample covariance matrix)


1 1
For N multiple of 3, let F C (x ) = 31 1x 61 + 13 1x 63 + 13 1x 6K and let BN = n1 CN2 ZH 2
N ZN CN with
F BN F , then
1
mF = cmF + (c 1)
z
 Z 1
t
mF (z ) = c dF C (t ) z
1 + tmF (z )

We take c = 1/10 and alternatively K = 7 and K = 4.


Part 1: Fundamentals of Random Matrix Theory/1.1. The Stieltjes Transform Method 19/142

Spectrum of the sample covariance matrix

Empirical eigenvalue distribution Empirical eigenvalue distribution


Limit law Limit law
0.6 0.6

0.4 0.4
Density

Density
0.2 0.2

0 0
1 3 7 1 3 4

Eigenvalues Eigenvalues

1 1
Figure: Histogram of the eigenvalues of BN = n1 CN2 ZH 2
N ZN CN , N = 3000, n = 300, with CN diagonal composed
of three evenly weighted masses in (i) 1, 3 and 7 on top, (ii) 1, 3 and 4 at bottom.
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 21/142

Support of a distribution
The support of a density f is the closure of the set {x, f (x ) 6=
0}.
cenko-Pastur law is (1 c )2 , (1 + c )2 .

For instance the support of the mar
1.2

0.8
Density fc (x )

0.6

0.4

0.2

" #
Support of the Marchenko-Pastur law
0

0.2
0 0.5 1 1.5 2 2.5 3

Figure: Mar
cenko-Pastur law for different limit ratios c = 0.5.
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 22/142

Extreme eigenvalues

I Limiting spectral results are insufficient to infer about the location of extreme eigenvalues.
P
I Example: Consider dFN (x ) = N1 N 0 N 1 1
k =1 ak . Then, dFN = N dFN + N AN (x ) and dFN with
AN > aN satisfy:
dFN dFN0 0.
I However, the supports of FN and FN0 differ by the mass AN .
Question: How is the behaviour of the extreme eigenvalues of random covariance matrices?
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 23/142

No eigenvalue outside the support of sample covariance matrices

Z. D. Bai, J. W. Silverstein, No eigenvalues outside the support of the limiting spectral


distribution of large-dimensional sample covariance matrices, The Annals of Probability, vol. 26,
no.1 pp. 316-345, 1998.

Theorem
Let XN CN n with i.i.d. entries with zero mean, unit variance and infinite fourth order. Let
CN CN N be nonrandom and bounded in norm. Let mN be the unique solution in C+ of
 Z 1
N N N n 1
mN = z dF CN () , m N (z ) = m (z ) + , z C+ ,
n 1 + m N n N n z

Let FN be the distribution associated to the Stieltjes transform mN (z ). Consider


1 1
BN = n1 CN2 XN XH 2
N CN . We know that F
BN F converge weakly to zero. Choose N N and
N 0
[a, b ], a > 0, outside the support of FN for all N > N0 . Denote LN the set of eigenvalues of BN .
Then,
P (LN [a, b ] 6= i.o.) = 0.
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 24/142

No eigenvalue outside the support: which models?

J. W. Silverstein, P. Debashis, No eigenvalues outside the support of the limiting empirical


spectral distribution of a separable covariance matrix, J. of Multivariate Analysis vol. 100, no. 1,
pp. 37-57, 2009.
I It has already been shown that (for all large N) there is no eigenvalues outside the support of
I cenko-Pastur law: XXH , X i.i.d. with zero mean, variance 1/N, finite 4th order moment.
Mar
1 1
I Sample covariance matrix: C 2 XXH C 2 and XH CX, X i.i.d. with zero mean, variance 1/N, finite 4th
order moment.
1 1
I Doubly-correlated matrix: R 2 XCXH R 2 , X with i.i.d. zero mean, variance 1/N, finite 4th order
moment.

J. W. Silverstein, Z.D. Bai, Y.Q. Yin, A note on the largest eigenvalue of a large dimensional
sample covariance matrix, Journal of Multivariate Analysis, vol. 26, no. 2, pp. 166-168, 1988.
I If 4th order moment is infinite,
H
lim sup XX
max =
N

J. Silverstein, Z. Bai, No eigenvalues outside the support of the limiting spectral distribution of
information-plus-noise type matrices to appear in Random Matrices: Theory and Applications.
I Only recently, information plus noise models, X with i.i.d. zero mean, variance 1/N, finite

4th order moment


(X + A)(X + A)H ,
and the generally correlation model where each column of X has correlation Ri .
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 25/142

Extreme eigenvalues: Deeper into the spectrum

I In order to derive statistical detection tests, we need more information on the extreme
eigenvalues.
I We will study the fluctuations of the extreme eigenvalues (second order statistics)
I However, the Stieltjes transform method is not adapted here!
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 26/142

Distribution of the largest eigenvalues of XXH

C. A. Tracy, H. Widom, On orthogonal and symplectic matrix ensembles, Communications in


Mathematical Physics, vol. 177, no. 3, pp. 727-754, 1996.
K. Johansson, Shape Fluctuations and Random Matrices, Comm. Math. Phys. vol. 209, pp.
437-476, 2000.

Theorem
Let X CN n have i.i.d. Gaussian entries of zero mean and variance 1/n. Denoting +
N the
largest eigenvalue of XXH , then
+ 2
2 (1 + c)
N3 N 4 1 X F
+ +
(1 + c ) 3 c 2

with c = limN N /n and F + the Tracy-Widom distribution given by


 Z 
F + (t ) = exp (x t )2 q 2 (x )dx
t

with q the Painlev


e II function that solves the differential equation
q 00 (x ) = xq (x ) + 2q 3 (x )
q (x ) x Ai(x )

in which Ai(x ) is the Airy function.


Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 27/142

The law of Tracy-Widom


0.5
Empirical Eigenvalues
Tracy-Widom law F +

0.4

0.3
Density

0.2

0.1

0
4 2 0 2

Centered-scaled largest eigenvalue of XXH

2 1 4
Figure: Distribution of N 3 c 2 (1 + c ) 3 + c )2 against the distribution of X + (distributed as
 
N (1 +
Tracy-Widom law) for N = 500, n = 1500, c = 1/3, for the covariance matrix model XXH . Empirical
distribution taken over 10, 000 Monte-Carlo simulations.
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 28/142

Techniques of proof
Method of proof requires very different tools:
I orthogonal (Laguerre) polynomials: to write joint unordered eigenvalue distribution as a
kernel determinant.
p
N (1 , . . . , p ) = det KN (i , j )
i,j =1

with K (x, y ) the kernel Laguerre polynomial.


I Fredholm determinants: we can write hole probability as a Fredholm determinant.
  X (1)k Z Z k Y
P N 2/3 i (1 + c )2 A, i = 1, . . . , N = 1 +

det KN (xi , xj ) dxi
k! Ac Ac i,j =1
k >1

, det(IN KN ).

I kernel theory: show that KN converges to a Airy kernel.

Ai(x )Ai 0 (y ) Ai 0 (x )Ai(y )


KN (x, y ) KAiry (x, y ) = .
x y

I differential equation tricks: hole probability in [t, ) gives right-most eigenvalue distribution,
which is simplified as solution of a Painelve differential equation: the Tracy-Widom
distribution.
R 2
F + (t ) = e t (x t )q (x ) dx , q 00 = tq + 2q 3 , q (x ) x Ai(x ).
Part 1: Fundamentals of Random Matrix Theory/1.2 Extreme eigenvalues 29/142

Comments on the Tracy-Widom law

I deeper result than limit eigenvalue result


I gives a hint on convergence speed
I fairly biased on the left: even fewer eigenvalues outside the support.
I can be shown to hold for other distributions than Gaussian under mild assumptions
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 31/142

Spiked models

I We consider n independent observations x1 , , xn of size N,


I The correlation structure is in general white + low rank,

E x1 xH1 = I + P
 

where P is of low rank,


I Objective: to infer the eigenvalues and/or the eigenvectors of P
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 32/142

The first result

J. Baik, J. W. Silverstein, Eigenvalues of large sample covariance matrices of spiked population


models, Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382-1408, 2006.

Theorem 1 1
Let BN = n1 (I + P) 2 XN XH
N (I + P) , where XN C
2 N n has i.i.d., zero mean and unit variance

entries, and PN RN N with eigenvalues given by:

eig(P) = diag(1 , . . . , K , 0, . . . , . . . , 0)
| {z }
N K

with 1 > . . . > K > 1, c = limN N /n. Let 1 , , N be the eigenvalues of BN . We then
have
a.s. 1+
I if >
j c, j 1 + j + c j (i.e. beyond the Mar cenkoPastur bulk!)
j
a.s. 2
I if (0, c ], j (1 + c ) (i.e. right-edge of the Mar cenkoPastur bulk!)
j
a.s. 2
I if [ c, 0 ) , ( 1 c ) (i.e. left-edge of the Mar
cenkoPastur bulk!)
j j
I for the other eigenvalues, we discriminate over c:
a.s. 1+j
I if j < c, c < 1, j 1 + j + c j (i.e. beyond the Mar
cenkoPastur bulk!)
a.s.
I if j < c, c > 1, j (1 c )2 (i.e. left-edge of the Mar
cenkoPastur bulk!)
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 33/142

Illustration of spiked models


Mar
cenko-Pastur law, c = 1/3
0.8 Empirical Eigenvalues

0.6
Density

0.4

0.2

0
1+1 1+2
1 + 1 + c 1 , 1 + 2 + c 2

Eigenvalues

1 1
Figure: Eigenvalues of BN = 1
n (P + I)
2 XN XN H (P + I) 2 , where 1 = 2 = 1 and 3 = 4 = 2 Dimensions:
N = 500, n = 1500.
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 34/142

Interpretation of the result

I if c is large, or alternatively, if some population spikes are small, part to all of the
population spikes are attracted by the support!
I if so, no way to decide on the existence of the spikes from looking at the largest eigenvalues
I in signal processing words, signals might be missed using largest eigenvalues methods.
I as a consequence,
I the more the sensors (N),
I the larger c = lim N /n,
I the more probable we miss a spike
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 35/142

Sketch of the proof


I We start with a study of the limiting extreme eigenvalues.
I Let x > 0, then

det(BN xIN ) = det(IN + P) det(XXH xIN + x [IN (IN + P)1 ])


= det(IN + P) det(XXH xIN )1 det(IN + xP(IN + P)1 (XXH xIN )1 ).


I if x eigenvalue of BN but not of XXH , then for n large, x > (1 + c )2 (edge of MP law
support) and

det(IN + xP(IN + P)1 (XXH xIN )1 ) = det(Ir + x (IN + )1 UH (XXH xIN )1 U) = 0

with P = UUH , U CN r .
I due to unitary invariance of X,
Z
a.s.
UH (XXH xIN )1 U (t x )1 dF MP (t )Ir , m(x )Ir

with F MP the MP law, and m(x ) the Stieltjes transform of the MP law (often known for
r = 1 as trace lemma).
1
I finally, we have that the limiting solutions xk satisfy xk m(xk ) + (1 + k )
k = 0.
I replacing m(x ), this is finally:
a.s. 1
k xk , 1 + k + c (1 + k ) k , if k > c
Part 1: Fundamentals of Random Matrix Theory/1.3 Extreme eigenvalues: the spiked models 36/142

Comments on the result

I there exists a phase



transition when the largest population eigenvalues move from inside to
outside (0, 1 + c ).

I more importantly, for t1 < 1 + c, we still have the same Tracy-Widom,
I no way to see the spike even when zooming in
I in fact, simulation suggests that convergence rate to the Tracy-Widom is slower with spikes.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 38/142

Stieltjes transform inversion for covariance matrix models

J. W. Silverstein, S. Choi, Analysis of the limiting spectral distribution of large dimensional


random matrices, Journal of Multivariate Analysis, vol. 54, no. 2, pp. 295-309, 1995.
1
I We know for the model CN2 XN , XN CN n that, if F CN F C , the Stieltjes transform of the
a.s.
e.s.d. of BN = n1 XH
N CN XN satisfies mBN (z ) mF (z ), with
 Z 1
t
mF (z ) = z c dF C (t )
1 + tmF (z )

which is unique on the set {z C+ , mF (z ) C+ }.


I This can be inverted into
Z
1 t
zF ( m ) = c dF C (t )
m 1 + tm

for m C+ .
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 39/142

Stieltjes transform inversion and spectrum characterization

Remember that we can evaluate the spectrum density by taking a complex line close to R and
evaluating =[mF (z )] along this line. Now we can do better.

It is shown that
lim mF (z ) = m0 (x ) exists.
z x R
z C+

We also have,
I for x0 inside the support, the density f (x ) of F in x0 is 1 =[m0 ] with m0 the unique solution
m C+ of Z
1 t
[zF (m) =] x0 = c dF C (t )
m 1 + tm
I let m0 R and xF the equivalent to zF on the real line. Then x0 outside the support of F
is equivalent to xF0 (mF (x0 )) > 0, mF (x0 ) 6= 0, 1/mF (x0 ) outside the support of F C .

This provides another way to determine the support!. For m (, 0), evaluate xF (m).
Whenever xF decreases, the image is outside the support. The rest is inside.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 40/142

Another way to determine the spectrum: spectrum to analyze


Empirical eigenvalue distribution
Limit law
0.6

0.4
Density

0.2

0
1 3 7

Eigenvalues

1 1
Figure: Histogram of the eigenvalues of BN = n1 CN2 XN XH 2
N CN , N = 300, n = 3000, with CN diagonal composed
of three evenly weighted masses in 1, 3 and 7.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 41/142

Another way to determine the spectrum: inverse function method


xF (m), m B
Support of F

7
xF ( m )

1 13 17 0

1 1
Figure: Stieltjes transform of BN = n1 CN2 XN XH 2
N CN , N = 300, n = 3000, with CN diagonal composed of three
evenly weighted masses in 1, 3 and 7. The support of F is read on the vertical axis, whenever mF is decreasing.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 42/142

Cluster boundaries in sample covariance matrix models

Xavier Mestre, Improved estimation of eigenvalues of covariance matrices and their associated
subspaces using their sample estimates, IEEE Transactions on Information Theory, vol. 54, no.
11, Nov. 2008.

Theorem
Let XN CN n have i.i.d. entries of zero mean, unit variance, and CN be diagonal such that
F CN F C , as n, N , N /n c, where F C has K masses in t1 , . . . , tK with multiplicity
1 1
N CN has support S given by
n1 , . . . , nK respectively. Then the l.s.d. of BN = n1 CN2 XN XH 2

S = [x1 , x1+ ] [x2 , x2+ ] . . . [xQ



, xQ+ ]

with xq = xF (mq ), xq+ = xF (mq+ ), and

1X
K
1 tk
xF (m) = c nk
m n 1 + tk m
k =1

with 2Q the number of real-valued solutions counting multiplicities of xF0 (m) = 0 denoted in
order m1 < m1+ 6 m2 < m2+ 6 . . . 6 mQ

< mQ+
.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 43/142

Comments on spectrum characterization

Previous results allows to determine


I the spectrum boundaries
I the number Q of clusters
I as a consequence, the total separation (Q = K ) or not (Q < K ) of the spectrum in K
clusters.

Mestre goes further: to determine local separability of the spectrum,


I identify the K inflexion points, i.e. the K solutions m1 , . . . , mK to

xF00 (m) = 0

I check whether xF0 (mi ) > 0 and xF0 (mi +1 ) > 0


I if so, the cluster in between corresponds to a single population eigenvalue.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 44/142

Exact eigenvalue separation

Z. D. Bai, J. W. Silverstein, Exact Separation of Eigenvalues of Large Dimensional Sample


Covariance Matrices, The Annals of Probability, vol. 27, no. 3, pp. 1536-1555, 1999.
I Recall that the result on no eigenvalue outside the support
I says where eigenvalues are not to be found
I does not say, as we feel, that (if cluster separation) in cluster k, there are exactly nk eigenvalues.
I This is in fact the case,

Empirical eigenvalue distribution


Limit law
0.6

0.4
Density

0.2

0
1 3 7
n1 n2 n3
Eigenvalues
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 45/142

Eigeninference: Introduction of the problem

I Reminder: for a sequence x1 , . . . , xn CN of independent random variables,

X
n
N = 1
C xk xH
n k
k =1

is an n-consistent estimator of CN = E [x1 xH


1 ].
I If n, N have comparable sizes, this no longer holds.
I Typically, n, N-consistent estimators of the full CN matrix perform very badly.
I If only the eigenvalues of CN are of interest, things can be done. The process of retrieving
information about eigenvalues, eigenspace projections, or functional of these is called
eigen-inference.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 46/142

Girko and the G -estimators

V. Girko, Ten years of general statistical analysis,


http://www.general-statistical-analysis.girko.freewebspace.com/chapter14.pdf
I Girko has come up with more than 50 N, n-consistent estimators, called G -estimators
(Generalized estimators). Among those, we find
I G1 -estimator of generalized variance. For
" #
N ) = 1 n(n 1)N
G1 ( C n log det(CN ) + log Q
(n N ) Nk =1 ( n k )

with n any sequence such that 2


n log(n/(n N )) 0, we have

N )
G1 (C 1
n log det(CN ) 0

in probability.
I However, Girkos proofs are rarely readable, if existent.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 47/142

A long standing problem

X. Mestre, Improved estimation of eigenvalues and eigenvectors of covariance matrices using


their sample estimates, IEEE trans. on Information Theory, vol. 54, no. 11, pp. 5113-5129,
2008.
1 1
I Consider the model BN = n1 CN2 XN XH 2
N CN , where F
CN is formed of a finite number of masses

t1 , . . . , tK .
I It has long been thought the inverse problem of estimating t1 , . . . , tK from the Stieltjes
transform method was not possible.
I Only trials were iterative convex optimization methods.
I The problem was partially solved by Mestre in 2008!
I His technique uses elegant complex analysis tools. The description of this technique is the
subject of this course.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 48/142

Reminders

1 1
I Consider the sample covariance matrix model BN = n1 CN2 XN XH 2
N CN .
I Up to now, we saw:
I that there is no eigenvalue outside the support with probability 1 for all large N.
I that for all large N, when the spectrum is divided into clusters, the number of empirical eigenvalues
in each cluster is exactly as we expect.
I these results are of crucial importance for the following.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 49/142

Eigen-inference for the sample covariance matrix model

X. Mestre, Improved estimation of eigenvalues and eigenvectors of covariance matrices using


their sample estimates, IEEE trans. on Information Theory, vol. 54, no. 11, pp. 5113-5129,
2008.

Theorem 1 1
Consider the model BN = n1 CN2 XN XH 2
N CN , with XN C
N n , i.i.d. with entries of zero mean, unit

variance, and CN RN N is diagonal with K distinct entries t1 , . . . , tK of multiplicity N1 , . . . , NK


of same order as n. Let k {1, . . . , K }. Then, if the cluster associated to tk is separated from the
clusters associated to k 1 and k + 1, as N, n , N /n c,
n X
tk = (m m )
Nk
mNk

P PK
is an N, n-consistent estimator of tk , where Nk = {N K i =k Ni + 1, . . . , N i =k +1 Ni },
1 , . . . , N are the eigenvalues of BN and 1 , . . . , N are the N solutions of

m XH C () = 0
N N XN

1
T
or equivalently, 1 , . . . , N are the eigenvalues of diag() N .
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 50/142

Remarks on Mestres result

Assuming cluster separation, the result consists in


I taking the empirical ordered i s inside the cluster (note that exact separation ensures there
are Nk of these!)
I getting the ordered eigenvalues 1 , . . . , N of
1 T
diag()
N
with = (1 , . . . , N )T . Keep only those of index inside Nk .
I take the difference and scale.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 51/142

How to obtain this result?

I Major trick requires tools from complex analysis


I Silversteins Stieltjes transform identity: for the conjugate model BN = n1 XH
N CN XN ,
 Z 1
t
mN (z ) = z c dF CN (t )
1 + tmN (z )

with mN the deterministic equivalent of mBN . This is the only random matrix result we need.
I Before going further, we need some reminders from complex analysis.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 52/142

Limiting spectrum of the sample covariance matrix

J. W. Silverstein, Z. D. Bai, On the empirical distribution of eigenvalues of a class of large


dimensional random matrices, J. of Multivariate Analysis, vol. 54, no. 2, pp. 175-192, 1995.
Reminder:
a.s.
I If F CN F C , then mBN (z ) mF (z ) such that
 Z 1
t
mF (z ) = c dF C (t ) z
1 + tmF (z )
or equivalently 
mF C 1/mF (z ) = zmF (z )mF (z )
with mF (z ) = cmF (z ) + (c 1) z1 and N /n c.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 53/142

Reminders of complex analysis

I Cauchy integration formula


Theorem
Let U C be an open set and f : U C be holomorphic on U. Let U be a continuous
contour (i.e. closed path). Then, for a inside the surface formed by , we have
I
1 f (z )
dz = f (a)
2i z a

while for a outside the surface formed by ,


I
1 f (z )
dz = 0.
2i z a
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 54/142

Complex integration
I From Cauchy integral formula, denoting Ck a contour enclosing only tk ,
I I I
1 X
K
1 1 N
tk = d = Nj d = mF C ()d .
2i Ck tk 2i Ck Nk tj 2iNk Ck
j =1

I After the variable change = 1/mF (z ),


I mF0 (z )
N 1
tk = zmF (z ) dz,
Nk 2i CF ,k mF2 (z )

I When the system dimensions are large,

1 X
N
1
mF (z ) ' mBN (z ) , , with (1 , . . . , N ) = eig(BN ) = eig(YYH ).
N k z
k =1

I Dominated convergence arguments then show


I mB0 (z )
a.s. N 1
tk tk 0 with tk = zmBN (z ) N
2 (z )
dz
Nk 2i CF ,k mB
N
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 55/142

Understanding the contour change


xF (m), m B
Support of F

7
xF ( m )

m2

m1
1
1/x2 1/x1
1 13 17 0

I IF CF ,k encloses cluster k with real points m1 < m2


I THEN 1/m1 = x1 < tk < x2 = 1/m2 and Ck encloses tk .
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 56/142

Poles and residues

I we find two sets of poles (outside zeros):


I 1 , . . . , N , the eigenvalues of BN .
I the solutions 1 , . . . , N to m N (z ) = 0.
I remember that
n nN 1
mBN (w ) = m (w ) +
N BN N w

  0 (w )
mB
residue calculus, denote f (w ) = n nN N
N wmBN (w ) + N ,
I
mB (w )2
N
I the k s are poles of order 1 and
n
lim (z k )f (z ) =
z k N k

I the k s are also poles of order 1 and by LHospitals rule


0
n (z k )zmBN (z ) n
lim (z k )f (z ) = lim =
z k z k N mBN (z ) N k

I So, finally
n X
tk = (m m )
Nk mcontour
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 57/142

Which poles in the contour?

I we now need to determine which poles are in the contour of interest.


I Since the i are rank-1 perturbations of the i , they have the interleaving property

1 < 2 < 2 < . . . < N < N

I what about 1 ? the trick is to use the fact that


I
1 1
dz = 0
2i Ck z

which leads to I
1 mF0 (w )
dw = 0
2i k mF (w )2
the empirical version of which is

#{i : i k } #{i : i k }

Since their difference tends to 0, there are as many k s as k s in the contour, hence 1 is
asymptotically in the integration contour.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 58/142

Related bibliography

I C. A. Tracy and H. Widom, On orthogonal and symplectic matrix ensembles, Communications in Mathematical Physics, vol. 177, no. 3, pp.
727-754, 1996.
I G. W. Anderson, A. Guionnet, O. Zeitouni, An introduction to random matrices, Cambridge studies in advanced mathematics, vol. 118, 2010.
I F. Bornemann, On the numerical evaluation of distributions in random matrix theory: A review, Markov Process. Relat. Fields, vol. 16, pp.
803-866, 2010.
I Y. Q. Yin, Z. D. Bai, P. R. Krishnaiah, On the limit of the largest eigenvalue of the large dimensional sample covariance matrix, Probability
Theory and Related Fields, vol. 78, no. 4, pp. 509-521, 1988.
I J. W. Silverstein, Z.D. Bai and Y.Q. Yin, A note on the largest eigenvalue of a large dimensional sample covariance matrix, Journal of Multivariate
Analysis, vol. 26, no. 2, pp. 166-168. 1988.
I C. A. Tracy, H. Widom, On orthogonal and symplectic matrix ensembles, Communications in Mathematical Physics, vol. 177, no. 3, pp. 727-754,
1996.
I Z. D. Bai, J. W. Silverstein, No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance
matrices, The Annals of Probability, vol. 26, no.1 pp. 316-345, 1998.
I Z. D. Bai, J. W. Silverstein, Exact Separation of Eigenvalues of Large Dimensional Sample Covariance Matrices, The Annals of Probability, vol.
27, no. 3, pp. 1536-1555, 1999.
I J. W. Silverstein, P. Debashis, No eigenvalues outside the support of the limiting empirical spectral distribution of a separable covariance matrix,
J. of Multivariate Analysis vol. 100, no. 1, pp. 37-57, 2009.
I J. W. Silverstein, J. Baik, Eigenvalues of large sample covariance matrices of spiked population models Journal of Multivariate Analysis, vol. 97,
no. 6, pp. 1382-1408, 2006.
I I. M. Johnstone, On the distribution of the largest eigenvalue in principal components analysis, Annals of Statistics, vol. 99, no. 2, pp. 295-327,
2001.
I K. Johansson, Shape Fluctuations and Random Matrices, Comm. Math. Phys. vol. 209, pp. 437-476, 2000.
I J. Baik, G. Ben Arous, S. P
ech
e, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, The Annals of
Probability, vol. 33, no. 5, pp. 1643-1697, 2005.
Part 1: Fundamentals of Random Matrix Theory/1.4 Spectrum Analysis and G-estimation 59/142

Related bibliography (2)

I J. W. Silverstein, S. Choi, Analysis of the limiting spectral distribution of large dimensional random matrices, Journal of Multivariate Analysis, vol.
54, no. 2, pp. 295-309, 1995.
I W. Hachem, P. Loubaton, X. Mestre, J. Najim, P. Vallet, A Subspace Estimator for Fixed Rank Perturbations of Large Random Matrices, arxiv
preprint 1106.1497, 2011.
I R. Couillet, W. Hachem, Local failure detection and diagnosis in large sensor networks, (submitted to) IEEE Transactions on Information Theory,
arXiv preprint 1107.1409.
I F. Benaych-Georges, R. Rao, The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, Advances in
Mathematics, vol. 227, no. 1, pp. 494-521, 2011.
I X. Mestre, On the asymptotic behavior of the sample estimates of eigenvalues and eigenvectors of covariance matrices, IEEE Transactions on
Signal Processing, vol. 56, no.11, 2008.
I X. Mestre, Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates, IEEE trans. on Information
Theory, vol. 54, no. 11, pp. 5113-5129, 2008.
I R. Couillet, J. W. Silverstein, Z. Bai, M. Debbah, Eigen-Inference for Energy Estimation of Multiple Sources, IEEE Transactions on Information
Theory, vol. 57, no. 4, pp. 2420-2439, 2011.
I P. Vallet, P. Loubaton and X. Mestre, Improved subspace estimation for multivariate observations of high dimension: the deterministic signals
case, arxiv preprint 1002.3234, 2010.
Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 62/142

Problem formulation

I We want to test the hypothesis H0 against H1 ,



hxT + W , information plus noise, hypothesis H1
CN n 3 Y =
W , pure noise, hpothesis H0

with h CN , x CN , W CN n .
I We assume no knowledge whatsoever but that W has i.i.d. (non-necessarily Gaussian)
entries.
Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 63/142

Exploiting the conditioning number

L. S. Cardoso, M. Debbah, P. Bianchi, J. Najim, Cooperative spectrum sensing using random


matrix theory, International Symposium on Wireless Pervasive Computing, pp. 334-338 , 2008.
I under either hypothesis,
I if H0 , for N large, we expect FYYH close to the Mar
cenko-Pastur law, of support
2 2
[2 1 c , 2 1 + c ].
q
I if H1 , if population spike more than 1 + N
n , largest eigenvalue is further away.
I the conditioning number of YYH is therefore asymptotically, as N, n , N /n c,
I if H0 ,
2
max 1 c
cond(Y) , 2
min 1+ c
I if H1 ,
2
ct1 1 c
cond(Y) t1 + > 2
t1 1 1+ c
PN
with t1 = k =1 |hk |2 + 2
I the conditioning number is independent of . We then have the decision criterion, whether
or not is known,


 q 2

1 N
n
H0 : if cond(YYH ) 6  q 2 +
decide

1 + N
n


H : otherwise. 1

for some security margin .


Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 64/142

Comments on the method

I Advantages:
I much simpler than finite size analysis
I ratio independent of , so needs not be known
I Drawbacks:
I only stands for very large N (dimension N for which asymptotic results arise function of !)
I ad-hoc method, does not rely on performance criterion.
Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 65/142

Generalized likelihood ratio test

P. Bianchi, M. Debbah, M. Maida, J. Najim, Performance of Statistical Tests for Source


Detection using Random Matrix Theory, IEEE Trans. on Information Theory, vol. 57, no. 4, pp.
2400-2419, 2011.

I Alternative generalized likelihood ratio test (GLRT) decision criterion, i.e.

sup2 ,h PY|h,2 (Y, h, 2 )


C (Y ) = .
sup2 PY|2 (Y|2 )

I Denote
max (YYH )
TN = 1 H
N tr YY
To guarantee a maximum false alarm ratio of ,
 (1N )n n  (1N )n
TN
decide H1 : if 1 N1 TN 1 N > N
H0 : otherwise.

for some threshold N that can be explicitly given as a function of .


I Optimal test with respect to GLR.
I Performs better than conditioning number test.
Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 66/142

Performance comparison for unknown 2 , P


0.7
Neyman-Pearson, Jeffreys prior
Neyman-Pearson, uniform prior
0.6 Conditioning number test
GLRT
0.5
Correct detection rate

0.4

0.3

0.2

0.1

0
0.1 0.5 1 2

False alarm rate 102

Figure: ROC curve for a priori unknown 2 of the Neyman-Pearson test, conditioning number method and
GLRT, K = 1, N = 4, M = 8, SNR = 0 dB. For the Neyman-Pearson test, both uniform and Jeffreys prior,
with exponent = 1, are provided.
Application to Signal Sensing and Array Processing/2.1 Eigenvalue-based detection 67/142

Related biography

I R. Couillet, M. Debbah, A Bayesian Framework for Collaborative Multi-Source Signal Sensing, IEEE Transactions on Signal Processing, vol. 58,
no. 10, pp. 5186-5195, 2010.
I T. Ratnarajah, R. Vaillancourt, M. Alvo, Eigenvalues and condition numbers of complex random matrices, SIAM Journal on Matrix Analysis and
Applications, vol. 26, no. 2, pp. 441-456, 2005.
I M. Matthaiou, M. R. McKay, P. J. Smith, J. A. Mossek, On the condition number distribution of complex Wishart matrices, IEEE Transactions on
Communications, vol. 58, no. 6, pp. 1705-1717, 2010.
I C. Zhong, M. R. McKay, T. Ratnarajah, K. Wong, Distribution of the Demmel condition number of Wishart matrices, IEEE Trans. on
Communications, vol. 59, no. 5, pp. 1309-1320, 2011.
I L. S. Cardoso, M. Debbah, P. Bianchi, J. Najim, Cooperative spectrum sensing using random matrix theory, International Symposium on Wireless
Pervasive Computing, pp. 334-338 , 2008.
I P. Bianchi, M. Debbah, M. Maida, J. Najim, Performance of Statistical Tests for Source Detection using Random Matrix Theory, IEEE Trans. on
Information Theory, vol. 57, no. 4, pp. 2400-2419, 2011.
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 69/142

Source localization

A uniform array of M antennas receives signal from K radio sources during n signal snapshots.
Objective: Estimate the arrival angles 1 , , K .

1
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 70/142

Source Localization using Music Algorithm


We consider the scenario of K sources and N antenna-array capturing n observations:

X
K
xt = a(k )sk,t + wt , t = 1, , n
k =1


1
e sin
I AN = [aN (1 ), , aN (K )] with aN () =

e (N 1) sin
I 2 is the noise variance and is set 1 for simplicity,
I Objective: infer 1 , , K from the n observations
I Let XN = [x1 , , xn ], then,
 
S
X = AS + W = [A IN ]
W

I If K is finite while n, N +, the model correponds to the spiked covariance model.


I MUSIC Algorithm: Let be the orthogonal projection matrix on the span of AA and
= IN (orthogonal projector on the noise subspace). Angles 1 , , K are the
unique ones verifying
() , aN () aN () = 0
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 71/142

Traditional MUSIC algorithm

I Traditional MUSIC algorithm: Angles are estimated as local minima of:

aN ()
aN ()

where is the orthogonal projection matrix on the eigenspace associated to the K largest
eigenvalues of n1 XN XN
I It is well-known that this estimator is consistent when n + with K , N fixed,
I We consider the case of K finite spiked covariance model
I What happens when n, N + ?
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 72/142

Asymptotic behaviour of the traditional MUSIC (1)

We first need to understand the spectrum of 1 H


n XX
I We know that the weak spectrum is the MP law
I Up to K eigenvalues can leave the support: we identify here these eigenvalues

Denote P = AAH = US UH T T T
S , = diag(1 , . . . , K ), and Z = [S W ] to recover (up to
one row) the generic spiked model
1
X = (IN + P) 2 Z.

1 H
I Reminder: If x eigenvalue of n XX with x > (1 + c )2 (edge of MP law), for all large n,
a.s. 1
x , k k , 1 + k + c (1 + k )
k , if k > c

for some k.
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 73/142

Asymptotic behaviour of the traditional MUSIC (2)


Recall the MUSIC approach: we want to estimate

() = a()H UW UH
W a() (UW CN (N K ) such that UH
W US = 0)

Instead of this quantity, we start with the study of

a()H u H
i ui a(), k = 1, . . . , K

1 , . . . , u
with u N the eigenvectors belonging to 1 > . . . > N .
To fall back on known RMT quantities, we use the Cauchy-integral:
I
1 1
a()H u H
i ui a() = a()H ( XXH zIN )1 a()dz
2 Ci n

with Ci a contour enclosing i only.


Woodburys identity (A + UCV )1 = A1 A1 U (C 1 + VA1 U )1 VA1 gives:
I I
1 1 ZZH 1 1
aH u H
i u i a= aH (IN + P) 2 ( zIN )1 (IN + P) 2 adz + aH
1H
b 1
a2 dz
2 Ci n 2 Ci

where P = US UH
S , and
1

H
b = IK + z (IK + )1 UH H
S ( n ZZ zIN )
1
US
1
aH
= za()H (IN + P) 2 ( n1 ZZH zIN )1 US


1
1

a2 = (IK + ) 1
UH 1
S ( n ZZ
H zIN )1 (IN + P) 2 a().
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 74/142

Asymptotic behaviour of the traditional MUSIC (3)


I For large n, the first term has no pole, while the second converges to

I
H = IK + zm(z )(IK + )1
1 H 1 1
Ti , a H a2 dz, with a1 = zm(z )a (IN + P) 2 US
H
2 Ci 1
21
a2 = m(z )(IK + ) UH 1
S (IN + P) a

which after development is

X
K I
1 1 zm2 (z )
Ti = 1+`
dz.
1 + ` 2
`=1 Ci ` + zm(z )

I Using residue calculus, the sole pole is in i and we find


2
a.s. 1 c
a()H u H
i ui a()
i
1
a()H ui uH
i a().
1 + c i

Therefore,

a.s. X
K
1 c 2
a() a()a()H
() = a()H i
a()H ui uH
i a()
1
i =1
1 + c
i
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 75/142

Improved G-MUSIC

Recall that:
1
1 + c a.s.
a()H uk uH
k a()
k
2
a()H u H
k uk a() 0
1 c k

The k are however unknown. But they can be estimated from


a.s. 1
k k = 1 + k + c (1 + k )
k

This gives finally

X
K

1 + c 1
G () ' a()H a()
k
2
a()H u H
k u k a()
k =1

1 c k

with

k (c + 1)
q
k =
+ (c + 1
k )2 4c )
2

We then obtain another (N, n)-consistent MUSIC estimator, only valid for K finite!
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 76/142

Simulation results
0

10
Cost function [dB]

15

20

25
35 37
MUSIC
G-MUSIC
30
angle [deg] -10 35 37

angle [deg]

Figure: MUSIC against G-MUSIC for DoA detection of K = 3 signal sources, N = 20 sensors, M = 150
samples, SNR of 10 dB. Angles of arrival of 10 , 35 , and 37 .
Application to Signal Sensing and Array Processing/2.2 The spiked G-MUSIC algorithm 77/142

Outline of the tutorial

I Part 1: Basics of Random Matrix Theory for Sample Covariance Matrices


I 1.1. Introduction to the Stieltjes transform method, Mar
cenkoPastur law, advanced models
I 1.2. Extreme eigenvalues: no eigenvalue outside the support, exact separation, TracyWidom law
I 1.3. Extreme eigenvalues: the spiked models
I 1.4. Spectrum analysis and G-estimation
I Part 2: Application to Signal Sensing and Array Processing
I 2.1. Eigenvalue-based detection
I 2.2. The (spiked) G-MUSIC algorithm
I Part 3: Advanced Random Matrix Models for Robust Estimation
I 3.1. Robust estimation of scatter
I 3.2. Robust G-MUSIC
I 3.3. Robust shrinkage in finance
I 3.4. Second order robust statistics: GLRT detectors
I Part 4: Future Directions
I 4.1. Kernel random matrices and kernel methods
I 4.2. Neural network applications
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 80/142

Covariance estimation and sample covariance matrices

P.J. Huber, Robust Statistics, 1981.


Many statistical inference techniques rely on the sample covariance matrix (SCM) taken
from i.i.d. observations x1 , . . . , xn of a r.v. x CN .
I The main reasons are:
I Assuming E [x ] = 0, E [xx ] = CN , with X = [x1 , . . . , xn ], by the LLN
1 a.s.
SN , XX CN as n .
n
Hence, if = f (CN ), we often use the n-consistent estimate = f (SN ).
I The SCM SN is the ML estimate of CN for Gaussian x
One therefore expects to closely approximate for all finite n.
I This approach however has two limitations:
I if N, n are of the same order of magnitude,
kSN CN k 6 0 as N, n , N /n c > 0, so that in general |
| 6 0

This motivated the introduction of G-estimators.


I if x is not Gaussian, but has heavier tails, SN is a poor estimator for CN .
This motivated the introduction of robust estimators.
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 81/142

Reminders on robust estimation

J. T. Kent, D. E. Tyler, Redescending M-estimates of multivariate location and scatter, 1991.


R. A. Maronna, Robust M-estimators of multivariate location and scatter, 1976.
Y. Chitour, F. Pascal, Exact maximum likelihood estimates for SIRV covariance matrix:
Existence and algorithm analysis, 2008.
The objectives of robust estimators:
I Replace the SCM SN by another estimate CN of CN which:
I rejects (or downscales) observations deterministically
I or rejects observations inconsistent with the full set of observations
Example: Huber estimator, CN defined as solution of
 
1X
n
k2
CN = i xi xi with i = min 1, 1 for some > 1, k 2 function of CN .
n 1
C
i =1 N xi N xi

I Provide scale-free estimators of CN :


Example: Tylers estimator: if one observes xi = i zi for unknown scalars i ,

1X
n
1
CN = 1 1
xi xi
n
i =1 N xi CN xi

I existence and uniqueness of CN defined up to a constant.


I few constraints on x1 , . . . , xn (N + 1 of them must be linearly independent)
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 82/142

Reminders on robust estimation

The objectives of robust estimators:


I replace the SCM SN by the ML estimate for CN .
Example: Maronnas estimator for elliptical x

1X
n  
1 1
CN = u xi CN xi xi xi
n N
i =1

with u (s ) such that


(i) u (s ) is continuous and non-increasing on [0, )
(ii) (s ) = su (s ) is non-decreasing, bounded by > 1. Moreover, (s ) increases where (s ) < .
(note that Hubers estimator is compliant with Maronnas estimators)
I existence is not too demanding
I uniqueness imposes strictly increasing u (s ) (inconsistent with Hubers estimate)
I consistency result: CN CN if u (s ) meets the ML estimator for CN .
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 83/142

Robust Estimation and RMT

So far, RMT has mostly focused on the SCM SN .


I x = AN w , w having i.i.d. zero-mean unit variance entries,
I x satisfies concentration inequalities, e.g. elliptically distributed x.

Robust RMT estimation


Can we study the performance of estimators based on the CN ?
N ?
I what are the spectral properties of C

I can we generate RMT-based estimators relying on CN ?


Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 84/142

Setting and assumptions


I Assumptions:
1
I Take x1 , . . . , xn CN elliptical-like random vectors, i.e. xi = i CN2 wi where
I , . . . , n R+ random or deterministic with 1
Pn a.s.
1 n i =1 i 1
I w , . . . , wn CN random independent with w / N uniformly distributed over the unit-sphere
1 i
I C CN N deterministic, with C  0 and lim sup kC k <
N N N N
I We denote cN , N /n and consider the growth regime cN c (0, 1).
I Maronnas estimator of scatter: (almost sure) unique solution to

1X
n  
1 1
CN = u xi CN xi xi xi
n N
i =1

where u satisfies
(i) u : [0, ) (0, ) nonnegative continuous and non-increasing
(ii) : x 7 xu (x ) increasing and bounded with limx (x ) , > 1
1
(iii) < c+ .
1 Pn
I Additional technical assumption: Let n , n i =1 i . For each a > b > 0, a.s.

lim supn n ((t, ))


lim sup = 0.
t (at ) (bt )

Controls relative speed of the tail of n versus the flattening speed of (x ) as x .


Examples:
I i < M for each i. In this case, n ((t, )) = 0 a.s. for t > M.
I For u (t ) = (1 + )/( + t ), > 0, and i i.i.d., it is sufficient to have E [11+ ] < .
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 85/142

Heuristic approach
I Major issues with CN :
I Defined implicitly q
I Sum of non-independent rank-one matrices from vectors u ( N1 xi CN1 xi )xi (CN depends on all xj s).
I But there is some hope:
I First remark: we can work with CN = IN without generality restriction!
I Denote
1X
n  
1 1
C(j ) = u x C x x x
n N i N i i i
i 6=j

Then intuitively, C(j ) and xj are only weakly dependent.


I We expect in particular (highly non-rigorous but intuitive!!):
1 1 1 1
x C x ' i tr C(i )1 ' i tr CN1 .
N i (i ) i N N
I Our heuristic approach:
I Rewrite N1 xi CN1 xi as f ( N1 xi C(i )1 xi ) for some function f (later called g 1 )
I Deduce that
1X
n  
1 1
CN = (u f ) xi C(i ) xi xi xi
n N
i =1
I Use 1
N xi C(i )1 xi ' i 1
N tr CN1 to get

1X
n  
1
CN ' (u f ) i tr CN1 xi xi
n N
i =1
I Use random matrix results to find a limiting value for 1
N tr CN1 , and conclude
1X
n
CN ' (u f )(i )xi xi .
n
i =1
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 86/142

Heuristic approach in detail: f and

I Determination of f : Recall the identity (A + tvv )1 v = A1 /(1 + tv A1 v ). Then


1 1
1 1 N xi C(i ) xi
xi CN xi =
N 1 + cN u ( N xi CN1 xi ) N1 xi C(i )1 xi
1

so that
1 1
1 1 N xi CN xi
xi C(i ) xi = .
N 1 cN ( N1 xi CN1 xi )
Now the function g : x 7 x /(1 cN (x )) is monotonous increasing (we use the assumption
< c 1 !), hence, with f = g 1 ,
 
1 1 1 1
xi CN xi = g 1 xi C(i ) xi .
N N
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 87/142

Heuristic approach in detail: f and


I Determination of : From previous calculus, we expect
1X 1X
n   n
1
CN ' (u g 1 ) i tr CN1 xi xi ' (u g 1 ) (i ) xi xi .
n N n
i =1 i =1
Hence !1
1X
n
1 1
' tr CN1 ' tr (u g 1 ) (i ) i wi wi .
N N n
i =1
Since i are independent of wi and deterministic, this is a Bai-Silverstein model
1
WDW , W = [w1 , . . . , wn ], D = diag(Dii ) = u g 1 (i ).
n
And we have:
1 Z !1
t (u g 1 )(t )

1 1
' tr WDW = m 1 WDW (0) ' 0+ N ( dt )
N n n 1 + c (u g 1 )(t )m 1 WDW (0)
n
!1
1X
n
i (u g 1 )(i )
= .
n 1 + c i (u g 1 )(i )m 1 WDW (0)
i =1 n

Since ' m 1 WDW (0), this defines as a solution of a fixed-point equation:


n
!1
1X
n
i (u g 1 )(i )
= .
n 1 + c i (u g 1 )(i )
i =1
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 88/142

Main result

R. Couillet, F. Pascal, J. W. Silverstein, The Random Matrix Regime of Maronnas M-estimator


with elliptically distributed samples, (submitted to) Elsevier Journal of Multivariate Analysis.
Theorem (Asymptotic Equivalence)
Under the assumptions defined earlier, we have

1X
n
a.s.
CN SN 0, where SN , v (i )xi xi
n
i =1

v (x ) = (u g 1 )(x ), (x ) = xv (x ), g (x ) = x /(1 c (x )) and > 0 unique solution of

1 X (i )
n
1= .
n 1 + c (i )
i =1

I Remarks:
I Th. says: first order substitution of CN by SN allowed for large N, n.
I It turns out that v u and in general behavior.
I Corollaries:
a.s.
max i (SN ) i (CN ) 0

16i 6n
1 1 a.s.
tr (CN zIN )1 tr (SN zIN )1 0
N N
Important feature for detection and estimation.
I Proof: So far in the tutorial, we do not have a rigorous proof!
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 89/142

Proof

Fundamental idea: Showing that all 1 1 1


i N xi C(i ) xi converge to the same limit .
I

I Technical trick: Denote  


v 1 1
N xi C(i ) xi
ei ,
v (i )
and relabel terms such that
e1 6 . . . 6 en
We shall prove that, for each ` > 0,

e1 > 1 ` i.o. and en < 1 + ` i.o.

Some basic inequalities: Denoting di , 1 1 1 1 1


i N xi C(i ) xi = N wi C ( i ) wi , we have
I

  P 1    P 1 
v j N1 wj n1 i 6=j i v (i di )wi wi wj v j N1 wj n1 i 6=j i v (i )ei wi wi wj
ej = =
v (j ) v (j )
  P 1  

 P 1 
v j N1 wj n1 i 6=j i v (i )en wi wi wj v enj 1
N wj
1
n i 6=j i v (i )wi wi wj
6 =
v (j ) v (j )
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 90/142

Proof
I Specialization to en :
  P 1 
n 1 1
v e n N wn n i 6=n i v (i )wi wi wn
en 6
v (n )
or equivalently, recalling (x ) = xv (x ),
 P 1
  P 1 
n 1 1
1 1 e n N wn i v (i )wi wi wn
N wn n i 6=n i v (i )wi wi wn n i 6=n
6 .
(n )

I Random Matrix results:


I By trace lemma, we should have
1 1
1 1 X 1 1 X
w i v (i )wi wi wn ' tr i v (i )wi wi '
N n n N n
i 6=n i 6=n

(by definition of as in previous slides). . .


I DANGER: by relabeling, wn no longer independent of w1 , . . . , wn1 !
Broken trace lemma!
I Solution: uniform convergence result.
By (higher order) moment bounds, Markov inequality, and Borel Cantelli, for all large n a.s.
1
X

1
1


max wj i v (i )wi wi wj < .

16j 6n N n
i 6=j
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 91/142

Proof
I Back to the original problem: For all large n a.s., we then have (using the growth of ψ)
$$\frac{\gamma-\varepsilon}{\gamma}\le\frac{\psi\left(\frac{\tau_n(\gamma+\varepsilon)}{e_n}\right)}{\psi(\tau_n\gamma)}.$$
I Proof by contradiction: Assume e_n > 1 + ℓ i.o.; then on a subsequence e_n > 1 + ℓ always and
$$\frac{\gamma-\varepsilon}{\gamma}\le\frac{\psi\left(\frac{\tau_n(\gamma+\varepsilon)}{1+\ell}\right)}{\psi(\tau_n\gamma)}.$$
I Bounded support for the τ_i: If 0 < τ_- < τ_i < τ_+ < ∞ for all i, n, then on a subsequence where τ_n → τ_0,
$$\underbrace{\frac{\gamma-\varepsilon}{\gamma}}_{\to\,1\ \text{as}\ \varepsilon\to0}\ \le\ \underbrace{\frac{\psi\left(\frac{\tau_0(\gamma+\varepsilon)}{1+\ell}\right)}{\psi(\tau_0\gamma)}}_{\to\,\frac{\psi(\tau_0\gamma/(1+\ell))}{\psi(\tau_0\gamma)}<1\ \text{as}\ \varepsilon\to0}\qquad\text{CONTRADICTION!}$$
I Unbounded support for the τ_i: importance of the relative growth of τ_n versus the convergence of e_n to 1.
The proof consists in dividing {τ_i} into two groups: a few large ones versus all the others.
Sufficient condition:
$$\limsup_{t\to\infty}\ \frac{\limsup_n \nu_n\big((t,\infty)\big)}{\psi(at)-\psi(bt)}=0.$$
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 92/142

Simulations
Figure: Histogram of the eigenvalues of $\frac1n\sum_{i=1}^n x_ix_i^*$ for n = 2500, N = 500, C_N = diag(I_125, 3 I_125, 10 I_250), τ_1, . . . , τ_n with Γ(.5, 2)-distribution.
Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 93/142

Simulations

Figure: Histogram of the eigenvalues of ĈN (left) and ŜN (right) for n = 2500, N = 500, C_N = diag(I_125, 3 I_125, 10 I_250), τ_1, . . . , τ_n with Γ(.5, 2)-distribution.

I Remark/Corollary: the spectrum of ĈN is a.s. bounded, uniformly in n.


Advanced Random Matrix Models for Robust Estimation/3.1 Robust Estimation of Scatter 94/142

Hint on potential applications

I Spectrum boundedness: for impulsive noise scenarios,


I SCM spectrum grows unbounded
I robust scatter estimator spectrum remains bounded
Robust estimators improve spectrum separability (important for many statistical inference
techniques seen previously)
I Spiked model generalization: we may expect a generalization to spiked models
I spikes are swallowed by the bulk in SCM context
I we expect spikes to re-emerge in robust scatter context
We shall see that we get even better than this. . .
I Application scenarios:
I Radar detection in impulsive noise (non-Gaussian noise, possibly clutter)
I Financial data analytics
I Any application where Gaussianity is too strong an assumption. . .
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 96/142

System Setting
I Signal model:
$$y_i=\sum_{l=1}^L\sqrt{p_l}\,a_l\,s_{li}+\sqrt{\tau_i}\,w_i=A_i\tilde w_i$$
$$A_i\triangleq\big[\sqrt{p_1}a_1,\ \ldots,\ \sqrt{p_L}a_L,\ \sqrt{\tau_i}\,I_N\big],\qquad \tilde w_i\triangleq[s_{1i},\ldots,s_{Li},w_i^{\mathsf T}]^{\mathsf T},$$
with y_1, . . . , y_n ∈ C^N satisfying:
1. τ_1, . . . , τ_n > 0 random, such that $\nu_n\triangleq\frac1n\sum_{i=1}^n\delta_{\tau_i}\to\nu$ weakly and $\int t\,\nu(dt)=1$;
2. w_1, . . . , w_n ∈ C^N random, independent, unitarily invariant, with $\|w_i\|^2=N$;
3. L ≪ N, p_1 ≥ . . . ≥ p_L > 0 deterministic;
4. a_1, . . . , a_L ∈ C^N deterministic or random with $A^*A\xrightarrow{a.s.}\operatorname{diag}(p_1,\ldots,p_L)$ as N → ∞, where $A\triangleq[\sqrt{p_1}a_1,\ldots,\sqrt{p_L}a_L]\in\mathbb C^{N\times L}$;
5. s_{11}, . . . , s_{Ln} ∈ C independent with zero mean and unit variance.
I Relation to the previous model: If L = 0, y_i = √τ_i w_i.
Elliptical model whose covariance is a low-rank (L) perturbation of I_N.
We expect a spiked version of the previous results.
I Application contexts:
I wireless communications: signals s_{li} from L transmitters, N-antenna receiver; a_l random i.i.d. channels ($a_l^*a_{l'}\to\delta_{ll'}$, e.g. $a_l\sim\mathcal{CN}(0,I_N/N)$);
I array processing: L sources emit signals s_{li} at steering angles θ_l, a_l = a(θ_l). For a ULA, $[a(\theta)]_j=N^{-1/2}\exp(2\pi\imath\, d\, j\sin(\theta))$.
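For concreteness, the following hedged Python sketch draws one dataset from this spiked elliptical model with L = 2 unit-power ULA sources; the angles, powers and the heavy-tailed texture law are illustrative choices loosely inspired by the simulation figures that follow.

```python
import numpy as np

# Hedged data-generation sketch for y_i = sum_l sqrt(p_l) a_l s_li + sqrt(tau_i) w_i.
N, n, L, d = 200, 1000, 2, 0.5
p = np.array([1.0, 1.0])                                  # source powers p_1 = p_2 = 1
theta = np.deg2rad([10.0, 12.0])                          # source angles (illustrative)
rng = np.random.default_rng(0)

idx = np.arange(N).reshape(-1, 1)                         # ULA antenna index j = 0..N-1
A = np.exp(2j * np.pi * d * idx * np.sin(theta)) / np.sqrt(N)   # columns a(theta_l)

s = (rng.standard_normal((L, n)) + 1j * rng.standard_normal((L, n))) / np.sqrt(2)  # unit-variance symbols
tau = rng.standard_t(df=3, size=n) ** 2                   # heavy-tailed textures (illustrative law)
tau /= tau.mean()                                         # enforce the normalization E[tau] = 1
G = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))
w = np.sqrt(N) * G / np.linalg.norm(G, axis=0)            # unitarily invariant, ||w_i||^2 = N

Y = A @ (np.sqrt(p)[:, None] * s) + np.sqrt(tau) * w      # one sample per column
print(Y.shape)                                            # (200, 1000)
```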
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 97/142

Some intuition

I Signal detection/estimation in impulsive environments: Two scenarios


I heavy-tailed noise (elliptical, Gaussian mixtures, etc.)
I Gaussian noise with spurious impulses
I Problems expected with SCM: Respectively,
I unbounded limiting spectrum, no source separation!
Invalidates G-MUSIC
I isolated eigenvalues due to spikes in time direction
False alarms induced by noise impulses!

I Our results: In a spiked model with noise impulses,
I whatever the noise impulse type, the spectrum of ĈN remains bounded
I isolated largest eigenvalues may appear, two classes:
I isolated eigenvalues due to noise impulses CANNOT exceed a threshold!
I all isolated eigenvalues beyond this threshold are due to signal
Detection criterion: everything above threshold is signal.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 98/142

Theoretical results
Theorem (Extension to spiked robust model)
Under the same assumptions as in the previous section,
$$\left\|\hat C_N-\hat S_N\right\|\xrightarrow{a.s.}0$$
where
$$\hat S_N\triangleq\frac1n\sum_{i=1}^n v(\tau_i\gamma)\,A_i\tilde w_i\tilde w_i^*A_i^*$$
with γ the unique solution to
$$1=\int\frac{\psi(t\gamma)}{1+c\,\psi(t\gamma)}\,\nu(dt)$$
and we recall
$$A_i\triangleq\big[\sqrt{p_1}a_1,\ \ldots,\ \sqrt{p_L}a_L,\ \sqrt{\tau_i}\,I_N\big],\qquad \tilde w_i=[s_{1i},\ldots,s_{Li},w_i^{\mathsf T}]^{\mathsf T}.$$
I Remark: For L = 0, A_i reduces to √τ_i I_N.
We recover the previous result: A_i w̃_i becomes √τ_i w_i.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 99/142

Localization of eigenvalues

Theorem (Eigenvalue localization)
Denote
I u_k the eigenvector of the k-th largest eigenvalue of $AA^*=\sum_{i=1}^L p_i a_ia_i^*$
I û_k the eigenvector of the k-th largest eigenvalue of ĈN.
Also define δ(x), the unique positive solution to
$$\delta(x)=c\left(-x+\int\frac{t\,v_c(t)}{1+\delta(x)\,t\,v_c(t)}\,\nu(dt)\right)^{-1}.$$
Further denote
$$p_-\triangleq\lim_{x\downarrow S_+}\,c\,\delta(x)\left(\int\frac{v_c(t)}{1+\delta(x)\,t\,v_c(t)}\,\nu(dt)\right)^{-1},\qquad S_+\triangleq\frac{(1+\sqrt c)^2}{1-c}.$$
Then, if p_j > p_-, $\hat\lambda_j\xrightarrow{a.s.}\Lambda_j>S_+$; otherwise $\limsup_n\hat\lambda_j\le S_+$ a.s., with Λ_j the unique positive solution to
$$c\,\delta(\Lambda_j)\left(\int\frac{v_c(\tau)}{1+\delta(\Lambda_j)\,\tau\,v_c(\tau)}\,\nu(d\tau)\right)^{-1}=p_j.$$
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 100/142

Simulation
Figure: Histogram of the eigenvalues of $\frac1n\sum_i y_iy_i^*$ against the limiting spectral measure, L = 2, p_1 = p_2 = 1, N = 200, n = 1000, Student-t impulses.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 101/142

Simulation
[Plot: eigenvalues of ĈN and the limiting spectral density; the right edge of the support and the threshold S_+ are marked to the right of the bulk.]

Figure: Histogram of the eigenvalues of ĈN against the limiting spectral measure, for u(x) = (1+α)/(α+x) with α = 0.2, L = 2, p_1 = p_2 = 1, N = 200, n = 1000, Student-t impulses.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 102/142

Comments

I SCM vs robust: Spikes invisible in SCM in impulsive noise, reborn in robust estimate of
scatter.
I Largest eigenvalues:
I λ_i(ĈN) > S_+ ⇒ presence of a source!
I λ_i(ĈN) ∈ (sup(Support), S_+) ⇒ may be due to a source or to a noise impulse.
I λ_i(ĈN) < sup(Support) ⇒ as usual, nothing can be said.
This induces a natural source detection algorithm.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 103/142

Eigenvalue and eigenvector projection estimates


I Two scenarios:
I known $\nu=\lim_n\frac1n\sum_{i=1}^n\delta_{\tau_i}$
I unknown ν

Theorem (Estimation under known ν)
1. Power estimation. For each p_j > p_-,
$$c\,\delta(\hat\lambda_j)\left(\int\frac{v_c(\tau)}{1+\delta(\hat\lambda_j)\,\tau\,v_c(\tau)}\,\nu(d\tau)\right)^{-1}\xrightarrow{a.s.}p_j.$$
2. Bilinear form estimation. For each a, b ∈ C^N with ‖a‖ = ‖b‖ = 1, and p_j > p_-,
$$\sum_{k,\,p_k=p_j}a^*\hat u_k\hat u_k^*b-\sum_{k,\,p_k=p_j}w_k\,a^*u_ku_k^*b\xrightarrow{a.s.}0$$
where
$$w_k=\frac{\displaystyle\int\frac{v_c(t)}{\big(1+\delta(\hat\lambda_k)\,t\,v_c(t)\big)^2}\,\nu(dt)}{\displaystyle\int\frac{v_c(t)}{1+\delta(\hat\lambda_k)\,t\,v_c(t)}\,\nu(dt)\left(1-\frac1c\int\frac{\delta(\hat\lambda_k)^2\,t^2\,v_c(t)^2}{\big(1+\delta(\hat\lambda_k)\,t\,v_c(t)\big)^2}\,\nu(dt)\right)}.$$
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 104/142

Eigenvalue and eigenvector projection estimates


Theorem (Estimation under unknown ν)
1. Purely empirical power estimation. For each p_j > p_-,
$$\hat\delta(\hat\lambda_j)\left(\frac1N\sum_{i=1}^n\frac{v(\hat\tau_i\hat\gamma_n)}{1+\hat\delta(\hat\lambda_j)\,\hat\tau_i\,v(\hat\tau_i\hat\gamma_n)}\right)^{-1}\xrightarrow{a.s.}p_j.$$
2. Purely empirical bilinear form estimation. For each a, b ∈ C^N with ‖a‖ = ‖b‖ = 1, and each p_j > p_-,
$$\sum_{k,\,p_k=p_j}a^*\hat u_k\hat u_k^*b-\sum_{k,\,p_k=p_j}\hat w_k\,a^*u_ku_k^*b\xrightarrow{a.s.}0$$
where
$$\hat w_k=\frac{\displaystyle\frac1n\sum_{i=1}^n\frac{v(\hat\tau_i\hat\gamma_n)}{\big(1+\hat\delta(\hat\lambda_k)\,\hat\tau_i\,v(\hat\tau_i\hat\gamma_n)\big)^2}}{\displaystyle\frac1n\sum_{i=1}^n\frac{v(\hat\tau_i\hat\gamma_n)}{1+\hat\delta(\hat\lambda_k)\,\hat\tau_i\,v(\hat\tau_i\hat\gamma_n)}\left(1-\frac1N\sum_{i=1}^n\frac{\hat\delta(\hat\lambda_k)^2\,\hat\tau_i^2\,v(\hat\tau_i\hat\gamma_n)^2}{\big(1+\hat\delta(\hat\lambda_k)\,\hat\tau_i\,v(\hat\tau_i\hat\gamma_n)\big)^2}\right)},$$
in which γ̂_n and the τ̂_i are the natural empirical counterparts of γ and the τ_i, built from the quadratic forms $\frac1N y_i^*\hat C_{(i)}^{-1}y_i$, and δ̂(x) is defined as δ(x) but with (τ_i, ν) replaced by (τ̂_i, ν̂_n).
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 105/142

Application to G-MUSIC
I Assume the model a_l = a(θ_l) with
$$a(\theta)=N^{-\frac12}\big[\exp(2\pi\imath\,d\,j\sin(\theta))\big]_{j=0}^{N-1}.$$

Corollary (Robust G-MUSIC)
Define $\eta_{RG}(\theta)$ and $\hat\eta^{\rm emp}_{RG}(\theta)$ as
$$\eta_{RG}(\theta)=1-\sum_{k=1}^{|\{j,\,p_j>p_-\}|}w_k\,a(\theta)^*\hat u_k\hat u_k^*a(\theta)$$
$$\hat\eta^{\rm emp}_{RG}(\theta)=1-\sum_{k=1}^{|\{j,\,p_j>p_-\}|}\hat w_k\,a(\theta)^*\hat u_k\hat u_k^*a(\theta).$$
Then, for each p_j > p_-,
$$\hat\theta_j\xrightarrow{a.s.}\theta_j,\qquad\hat\theta_j^{\rm emp}\xrightarrow{a.s.}\theta_j$$
where
$$\hat\theta_j\triangleq\operatorname*{argmin}_{\theta\in\mathcal R_j}\{\eta_{RG}(\theta)\},\qquad\hat\theta_j^{\rm emp}\triangleq\operatorname*{argmin}_{\theta\in\mathcal R_j}\{\hat\eta^{\rm emp}_{RG}(\theta)\},$$
with $\mathcal R_j$ a small neighborhood of θ_j.
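In practice, once the isolated eigenvectors û_k of the robust scatter estimate and the weights ŵ_k of the previous theorem are available, the empirical robust G-MUSIC localization function is straightforward to evaluate. The sketch below (an illustration under stated assumptions, not the authors' code) scans a ULA steering grid and returns the deepest local minima; û_k and ŵ_k are assumed given.

```python
import numpy as np

def ula_steering(theta, N, d=0.5):
    """Columns a(theta) of a ULA, [a(theta)]_j = N^{-1/2} exp(2*pi*1j*d*j*sin(theta))."""
    idx = np.arange(N).reshape(-1, 1)
    return np.exp(2j * np.pi * d * idx * np.sin(theta)) / np.sqrt(N)

def robust_gmusic_angles(u_hat, w_hat, n_sources, d=0.5):
    """eta(theta) = 1 - sum_k w_hat[k] |a(theta)^* u_hat[:,k]|^2, minimized over a grid.

    u_hat: (N, K) isolated eigenvectors of the robust scatter estimate (assumed given);
    w_hat: (K,)  weights computed as in the previous theorem (assumed given)."""
    N = u_hat.shape[0]
    grid = np.deg2rad(np.linspace(-90.0, 90.0, 7201))
    A = ula_steering(grid, N, d)                      # (N, G) steering vectors on the grid
    proj = np.abs(u_hat.conj().T @ A) ** 2            # (K, G) values |u_k^* a(theta)|^2
    eta = 1.0 - w_hat @ proj                          # localization function on the grid
    inner = np.arange(1, grid.size - 1)
    is_min = (eta[inner] < eta[inner - 1]) & (eta[inner] < eta[inner + 1])
    cand = inner[is_min]                              # local minima of eta
    cand = cand[np.argsort(eta[cand])][:n_sources]    # keep the n_sources deepest ones
    return np.sort(np.rad2deg(grid[cand]))
```

Setting all weights to 1 in the same routine essentially gives the plain MUSIC cost built on the robust estimate's signal eigenvectors, which is how the localization functions of the next figures can be compared.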
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 106/142

Simulations: Single-shot in elliptical noise



Figure: Random realization of the localization functions for the various MUSIC estimators (robust G-MUSIC, empirical robust G-MUSIC, G-MUSIC, empirical G-MUSIC, robust MUSIC, MUSIC), with N = 20, n = 100, two sources at 10° and 12°, Student-t impulses with parameter 100, u(x) = (1+α)/(α+x) with α = 0.2. Powers p_1 = p_2 = 10^{0.5} = 5 dB.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 107/142

Simulations: Elliptical noise



Figure: Mean square error performance of the estimation of θ_1 = 10°, with N = 20, n = 100, two sources at 10° and 12°, Student-t impulses with parameter 10, u(x) = (1+α)/(α+x) with α = 0.2, p_1 = p_2.
Advanced Random Matrix Models for Robust Estimation/3.2 Spiked model extension and robust G-MUSIC 108/142

Simulations: Spurious impulses



Figure: Mean square error performance of the estimation of θ_1 = 10°, with N = 20, n = 100, two sources at 10° and 12°, sample outlier scenario (τ_i = 1 for i < n, τ_n = 100), u(x) = (1+α)/(α+x) with α = 0.2, p_1 = p_2.
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 110/142

Context

Ledoit and Wolf, 2004. A well-conditioned estimator for large-dimensional covariance matrices.
Pascal, Chitour, Quek, 2013. Generalized robust shrinkage estimator Application to STAP data.
Chen, Wiesel, Hero, 2011. Robust shrinkage estimation of high-dimensional covariance matrices.

I Shrinkage covariance estimation: For N > n or N ≃ n, shrinkage estimator
$$(1-\rho)\frac1n\sum_{i=1}^nx_ix_i^*+\rho I_N,\qquad\text{for some }\rho\in[0,1].$$

I ρ > 0 allows for invertibility and better conditioning
I ρ may be chosen to minimize an expected error metric
I Limitation of Maronna's estimator:
I Maronna and Tyler estimators are limited to N < n, otherwise they do not exist
I introducing shrinkage in the robust estimator cannot do much harm anyhow...
I Introducing the robust-shrinkage estimator: The literature proposes two such estimators (a fixed-point computation sketch follows below):
$$\check C_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\check C_N^{-1}(\rho)x_i}+\rho I_N,\qquad\rho\in\big(\max\{0,1-\tfrac nN\},1\big]\qquad\text{(Pascal)}$$
$$\hat C_N(\rho)=\frac{\hat B_N(\rho)}{\frac1N\operatorname{tr}\hat B_N(\rho)},\qquad\hat B_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\hat C_N^{-1}(\rho)x_i}+\rho I_N,\qquad\rho\in(0,1]\qquad\text{(Chen)}$$
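Both estimators are defined only implicitly; in practice they are obtained by iterating their defining equation until convergence. The sketch below (illustrative, for Pascal's estimator; all data choices are assumptions) does exactly that.

```python
import numpy as np

def pascal_shrinkage(X, rho, n_iter=100, tol=1e-9):
    """Fixed-point iteration for the (Pascal) robust-shrinkage estimator:
       C <- (1-rho)*(1/n) sum_i x_i x_i^* / (x_i^* C^{-1} x_i / N) + rho*I_N."""
    N, n = X.shape
    C = np.eye(N, dtype=complex)
    for _ in range(n_iter):
        q = np.real(np.einsum('ij,ij->j', X.conj(), np.linalg.solve(C, X))) / N
        C_new = (1.0 - rho) * (X / q) @ X.conj().T / n + rho * np.eye(N)
        if np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C):
            return C_new
        C = C_new
    return C

# illustrative usage in the regime N close to n, where rho > 0 matters
rng = np.random.default_rng(0)
N, n = 100, 120
tau = rng.gamma(0.5, 2.0, size=n)
X = np.sqrt(tau) * (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
C_hat = pascal_shrinkage(X, rho=0.5)
```

Chen's estimator is computed analogously, simply renormalizing by the trace at each iteration.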
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 111/142

Main theoretical result

I Which estimator is better?


Having asked to authors of both papers, their estimator was much better than the
other, but the arguments we received were quite vague...

I Our result: In the random matrix regime, both estimators tend to be one and the same!
I Assumptions: As before, elliptical-like model
$$x_i=\sqrt{\tau_i}\,C_N^{1/2}w_i.$$
This time, C_N cannot be taken equal to I_N (because of the +ρI_N term)!
Maronna-based shrinkage is possible but more involved...
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 112/142

Pascal's estimator

Theorem (Pascal's estimator)
For ε ∈ (0, min{1, c^{-1}}), define $R_\varepsilon=[\varepsilon+\max\{0,1-c^{-1}\},1]$. Then, as N, n → ∞ with N/n → c ∈ (0, ∞),
$$\sup_{\rho\in R_\varepsilon}\left\|\check C_N(\rho)-\check S_N(\rho)\right\|\xrightarrow{a.s.}0$$
where
$$\check C_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\check C_N^{-1}(\rho)x_i}+\rho I_N$$
$$\check S_N(\rho)=\frac{1}{\gamma(\rho)}\,\frac{1-\rho}{1-(1-\rho)c}\,\frac1n\sum_{i=1}^nC_N^{1/2}w_iw_i^*C_N^{1/2}+\rho I_N$$
and γ(ρ) is the unique positive solution of the equation in γ
$$1=\frac1N\sum_{i=1}^N\frac{\lambda_i(C_N)}{\gamma\rho+(1-\rho)\lambda_i(C_N)}.$$
Moreover, ρ ↦ γ(ρ) is continuous on (0, 1].
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 113/142

Chen's estimator

Theorem (Chen's estimator)
For ε ∈ (0, 1), define $R_\varepsilon=[\varepsilon,1]$. Then, as N, n → ∞ with N/n → c ∈ (0, ∞),
$$\sup_{\rho\in R_\varepsilon}\left\|\hat C_N(\rho)-\hat S_N(\rho)\right\|\xrightarrow{a.s.}0$$
where
$$\hat C_N(\rho)=\frac{\hat B_N(\rho)}{\frac1N\operatorname{tr}\hat B_N(\rho)},\qquad\hat B_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\hat C_N^{-1}(\rho)x_i}+\rho I_N$$
$$\hat S_N(\rho)=\frac{1-\rho}{1-\rho+T_\rho}\,\frac1n\sum_{i=1}^nC_N^{1/2}w_iw_i^*C_N^{1/2}+\frac{T_\rho}{1-\rho+T_\rho}\,I_N$$
in which $T_\rho=\rho\,\gamma(\rho)\,F(\gamma(\rho);\rho)$ with, for all x > 0,
$$F(x;\rho)=\frac12\big(\rho-c(1-\rho)\big)+\sqrt{\frac14\big(\rho-c(1-\rho)\big)^2+(1-\rho)\frac1x}$$
and γ(ρ) is the unique positive solution of the equation in γ
$$1=\frac1N\sum_{i=1}^N\frac{\lambda_i(C_N)}{\gamma\rho+\frac{1-\rho}{(1-\rho)c+F(\gamma;\rho)}\lambda_i(C_N)}.$$
Moreover, ρ ↦ γ(ρ) is continuous on (0, 1].
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 114/142

Asymptotic Model Equivalence

Theorem (Model Equivalence)
For each $\bar\rho\in(0,1]$, there exist unique $\check\rho\in(\max\{0,1-c^{-1}\},1]$ and $\hat\rho\in(0,1]$ such that
$$\frac{\check S_N(\check\rho)}{\check\rho+\frac{1-\check\rho}{\gamma(\check\rho)\big(1-(1-\check\rho)c\big)}}=\hat S_N(\hat\rho)=(1-\bar\rho)\frac1n\sum_{i=1}^nC_N^{1/2}w_iw_i^*C_N^{1/2}+\bar\rho I_N.$$
Besides, the maps $(0,1]\to(\max\{0,1-c^{-1}\},1]$, $\bar\rho\mapsto\check\rho$ and $(0,1]\to(0,1]$, $\bar\rho\mapsto\hat\rho$ are increasing and onto.

I Up to normalization, both estimators behave asymptotically the same!
I Both estimators behave the same as an impulse-free Ledoit-Wolf estimator
I About uniformity: uniformity over ρ in the theorems is essential to find optimal values of ρ.
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 115/142

Optimal Shrinkage parameter


I Chen sought a Frobenius-norm-minimizing ρ but got stuck on the implicit nature of ĈN(ρ)
I Our results allow for a simplification of the problem for large N, n!
I Model equivalence says only one problem needs to be solved.

Theorem (Optimal Shrinkage)
For each ρ ∈ (0, 1], define
$$D_N(\rho)=\frac1N\operatorname{tr}\left[\left(\frac{\check C_N(\rho)}{\frac1N\operatorname{tr}\check C_N(\rho)}-C_N\right)^2\right],\qquad\hat D_N(\rho)=\frac1N\operatorname{tr}\left[\big(\hat C_N(\rho)-C_N\big)^2\right].$$
Denote $D^\star=c\,\frac{M_2-1}{c+M_2-1}$, $\ \bar\rho^\star=\frac{c}{c+M_2-1}$, $\ M_2=\lim_N\frac1N\sum_{i=1}^N\lambda_i(C_N)^2$, and ρ̌^⋆, ρ̂^⋆ the unique solutions to
$$\frac{\check\rho^\star}{\check\rho^\star+\frac{1-\check\rho^\star}{\gamma(\check\rho^\star)\big(1-(1-\check\rho^\star)c\big)}}=\frac{T_{\hat\rho^\star}}{1-\hat\rho^\star+T_{\hat\rho^\star}}=\bar\rho^\star.$$
Then, letting ε be small enough,
$$\inf_{\rho\in R_\varepsilon}D_N(\rho)\xrightarrow{a.s.}D^\star,\qquad\inf_{\rho\in R_\varepsilon}\hat D_N(\rho)\xrightarrow{a.s.}D^\star$$
$$D_N(\check\rho^\star)\xrightarrow{a.s.}D^\star,\qquad\hat D_N(\hat\rho^\star)\xrightarrow{a.s.}D^\star.$$
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 116/142

Estimating ρ̌^⋆ and ρ̂^⋆
I The theorem is only useful if ρ̌^⋆ and ρ̂^⋆ can be estimated!
I Careful control of the proofs provides many ways to estimate them.
I The proposition below provides one example.

Optimal Shrinkage Estimate
Let $\check\rho_N\in(\max\{0,1-c_N^{-1}\},1]$ and $\hat\rho_N\in(0,1]$ be solutions (not necessarily unique) to the empirical equations that equate $\frac{\check\rho_N}{\frac1N\operatorname{tr}\check C_N(\check\rho_N)}$ and its Chen counterpart, respectively, to the purely data-driven quantity
$$\frac{c_N}{\frac1N\operatorname{tr}\left[\left(\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N\|x_i\|^2}\right)^2\right]-1},$$
both being defined arbitrarily when no such solutions exist. Then
$$\check\rho_N\xrightarrow{a.s.}\check\rho^\star,\qquad\hat\rho_N\xrightarrow{a.s.}\hat\rho^\star,$$
$$D_N(\check\rho_N)\xrightarrow{a.s.}D^\star,\qquad\hat D_N(\hat\rho_N)\xrightarrow{a.s.}D^\star.$$
(A numerical sketch of the data-driven quantity above is given below.)
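The right-hand side above is directly computable from the data. The sketch below evaluates it on synthetic samples (all model choices are illustrative assumptions) and compares it with ρ̄^⋆ = c/(c + M_2 − 1).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, r = 32, 64, 0.7
C = r ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # [C_N]_ij = r^{|i-j|}
tau = rng.gamma(0.5, 2.0, size=n)                                # impulsive textures
X = np.linalg.cholesky(C) @ rng.standard_normal((N, n)) * np.sqrt(tau)

# sample covariance of the norm-normalized samples x_i / sqrt(||x_i||^2 / N)
Xn = X / np.sqrt(np.sum(X ** 2, axis=0) / N)
S = Xn @ Xn.T / n
rho_bar_hat = (N / n) / (np.trace(S @ S) / N - 1.0)              # data-driven estimate

M2 = np.trace(C @ C) / N              # here (1/N) tr C_N = 1, the usual normalization
print(rho_bar_hat, (N / n) / (N / n + M2 - 1.0))                 # estimate vs rho_bar*
```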
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 117/142

Simulations

Figure: Performance of the optimal shrinkage, averaged over 10 000 Monte Carlo simulations, for N = 32 and various values of n, [C_N]_{ij} = r^{|i-j|} with r = 0.7; ρ̂_N as above; ρ̂_O the clairvoyant estimator proposed in (Chen'11). The curves compare inf_{ρ∈(0,1]} D̂_N(ρ), D̂_N(ρ̂_N), D^⋆ and D̂_N(ρ̂_O) (normalized Frobenius norm).
Advanced Random Matrix Models for Robust Estimation/3.3 Robust shrinkage and application to mathematical finance 118/142

Simulations

Figure: Shrinkage parameters averaged over 10 000 Monte Carlo simulations, for N = 32 and various values of n, [C_N]_{ij} = r^{|i-j|} with r = 0.7; ρ̌_N and ρ̂_N as above; ρ̂_O the clairvoyant estimator proposed in (Chen'11); also shown are the oracle minimizers argmin_{ρ∈(max{0,1-c^{-1}},1]} D_N(ρ) and argmin_{ρ∈(0,1]} D̂_N(ρ).
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 120/142

Context
I Hypothesis testing problem: Two sets of data
I Initial pure-noise data: x_1, . . . , x_n, with $x_i=\sqrt{\tau_i}\,C_N^{1/2}w_i$ as before.
I New incoming data y given by:
$$y=\begin{cases}x,&H_0\\ \alpha p+x,&H_1\end{cases}$$
with $x=\sqrt{\tau}\,C_N^{1/2}w$, p ∈ C^N deterministic and known, α unknown.
I GLRT detection test:
$$T_N(\rho)\underset{H_0}{\overset{H_1}{\gtrless}}\Gamma$$
for some detection threshold Γ, where
$$T_N(\rho)\triangleq\frac{\left|y^*\check C_N^{-1}(\rho)p\right|}{\sqrt{y^*\check C_N^{-1}(\rho)y}\ \sqrt{p^*\check C_N^{-1}(\rho)p}}$$
and ČN(ρ) is the regularized robust estimator defined in the previous section (a short numerical sketch follows below).
In fact, the GLRT was originally derived with ČN(0), but
I this is only valid for N < n
I introducing ρ may bring improved performance for arbitrary N/n ratios.
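As a quick illustration, the statistic itself is a one-liner once the regularized scatter estimate is available. The sketch below assumes C has already been computed (for instance by the fixed-point iteration sketched in the previous section).

```python
import numpy as np

def glrt_statistic(y, p, C):
    """T_N(rho) = |y^* C^{-1} p| / sqrt( (y^* C^{-1} y) (p^* C^{-1} p) )."""
    Ci_y = np.linalg.solve(C, y)
    Ci_p = np.linalg.solve(C, p)
    num = np.abs(np.vdot(y, Ci_p))                                     # |y^* C^{-1} p|
    den = np.sqrt(np.real(np.vdot(y, Ci_y)) * np.real(np.vdot(p, Ci_p)))
    return num / den

# decide H1 whenever sqrt(N) * glrt_statistic(y, p, C) exceeds a threshold
# calibrated from the false-alarm analysis of the next slides
```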
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 121/142

Objectives and main results

I Initial observations:
I As N, n → ∞ with N/n → c > 0, under H_0,
$$T_N(\rho)\xrightarrow{a.s.}0.$$
A trivial result of little interest!
I Natural question: for finite N, n and a given threshold Γ, find ρ such that
$$P\big(T_N(\rho)>\Gamma\big)=\min$$
I It turns out the correct non-trivial object is, for Γ > 0 fixed,
$$P\big(\sqrt N\,T_N(\rho)>\Gamma\big)=\min$$
I Objectives:
I for each ρ, develop a central limit theorem to evaluate
$$\lim_{\substack{N,n\to\infty\\ N/n\to c}}P\big(\sqrt N\,T_N(\rho)>\Gamma\big)$$
I determine the ρ minimizing this limit
I empirically estimate this minimizing ρ
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 122/142

What do we need?

CLT for ČN statistics
I We know that $\|\check C_N(\rho)-\check S_N(\rho)\|\xrightarrow{a.s.}0$
Key result so far!
I What about $\|\sqrt N\,(\check C_N(\rho)-\check S_N(\rho))\|$?
It does not converge to zero!!!
I But there is hope. . . :
$$\sqrt N\,\big(a^*\check C_N^{-1}(\rho)b-a^*\check S_N^{-1}(\rho)b\big)\xrightarrow{a.s.}0$$
This is our main result!
I This requires a much more delicate treatment, not discussed in this tutorial.
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 123/142

Main results

Theorem (Fluctuation of bilinear forms)
Let a, b ∈ C^N with ‖a‖ = ‖b‖ = 1. Then, as N, n → ∞ with N/n → c > 0, for any ε > 0 and every k ∈ Z,
$$\sup_{\rho\in R_\varepsilon}N^{1-\varepsilon}\left|a^*\check C_N^k(\rho)b-a^*\check S_N^k(\rho)b\right|\xrightarrow{a.s.}0$$
where $R_\varepsilon=[\varepsilon+\max\{0,1-1/c\},1]$.


Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 124/142

False alarm performance

Theorem (Asymptotic detector performance)
As N, n → ∞ with N/n → c ∈ (0, ∞),
$$\sup_{\rho\in R_\varepsilon}\left|P\big(\sqrt N\,T_N(\rho)>\Gamma\big)-\exp\left(-\frac{\Gamma^2}{2\sigma_N^2(\bar\rho)}\right)\right|\to0$$
where ρ ↦ ρ̄ is the aforementioned mapping and
$$\sigma_N^2(\bar\rho)\triangleq\frac12\,\frac{p^*C_NQ_N^2(\bar\rho)p}{p^*Q_N(\bar\rho)p\cdot\frac1N\operatorname{tr}\big(C_NQ_N(\bar\rho)\big)}\cdot\frac{1}{1-c(1-\bar\rho)^2m(-\bar\rho)^2\,\frac1N\operatorname{tr}\big(C_N^2Q_N^2(\bar\rho)\big)}$$
with $Q_N(\bar\rho)\triangleq\big(\bar\rho\,I_N+(1-\bar\rho)\,m(-\bar\rho)\,C_N\big)^{-1}$.
I Limiting Rayleigh distribution:
weak convergence of $\sqrt N\,T_N(\rho)$ to a Rayleigh variable $R_N(\bar\rho)$.
I Remark: σ_N(ρ̄) depends on ρ but not on Γ.
There exists a uniformly optimal ρ!
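A practical consequence of the Rayleigh-type limit: once σ²_N(ρ̄) (or its empirical estimate introduced below) is available, the threshold meeting a prescribed false-alarm rate follows in closed form. A minimal sketch, with an illustrative σ² value:

```python
import numpy as np

def threshold_for_false_alarm(sigma2, eta):
    """Gamma such that exp(-Gamma^2 / (2*sigma2)) = eta."""
    return np.sqrt(-2.0 * sigma2 * np.log(eta))

def asymptotic_false_alarm(sigma2, gamma):
    return np.exp(-gamma ** 2 / (2.0 * sigma2))

gamma = threshold_for_false_alarm(sigma2=0.5, eta=1e-3)     # sigma2 = 0.5 is illustrative
print(gamma, asymptotic_false_alarm(0.5, gamma))            # recovers eta = 1e-3
```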
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 125/142

Simulation


Figure: Histogram and distribution function of $\sqrt N\,T_N(\rho)$ versus $R_N(\bar\rho)$, N = 20, $p=N^{-1/2}[1,\ldots,1]^{\mathsf T}$, C_N Toeplitz from an AR(1) model with coefficient 0.7, c_N = 1/2, ρ = 0.2.
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 126/142

Simulation


Figure: Histogram and distribution function of $\sqrt N\,T_N(\rho)$ versus $R_N(\bar\rho)$, N = 100, $p=N^{-1/2}[1,\ldots,1]^{\mathsf T}$, C_N Toeplitz from an AR(1) model with coefficient 0.7, c_N = 1/2, ρ = 0.2.
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 127/142

Empirical estimation of the optimal ρ

I The optimal ρ can be found by line search. . . but C_N is unknown!
I We shall successively:
I empirically estimate σ_N(ρ̄)
I minimize the estimate
I prove, by uniformity, the asymptotic optimality of the estimate

Theorem (Empirical performance estimation)
For ρ ∈ (max{0, 1 - c_N^{-1}}, 1), let $\hat\sigma_N^2(\rho)$ be a purely empirical estimate of $\sigma_N^2(\bar\rho)$, an explicit function of observable quantities such as $p^*\check C_N^{-1}(\rho)p$, $\frac1N\operatorname{tr}\check C_N(\rho)$ and $\frac1N\operatorname{tr}\check C_N^{-1}(\rho)$ (no unknown C_N is involved). Also let $\hat\sigma_N^2(1)\triangleq\lim_{\rho\uparrow1}\hat\sigma_N^2(\rho)$. Then
$$\sup_{\rho\in R_\varepsilon}\left|\hat\sigma_N^2(\rho)-\sigma_N^2(\bar\rho)\right|\xrightarrow{a.s.}0.$$
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 128/142

Final result

Theorem (Optimality of empirical estimator)
Define
$$\hat\rho_N^\star=\operatorname*{argmin}_{\rho\in R_\varepsilon}\left\{\hat\sigma_N^2(\rho)\right\}.$$
Then, for every Γ > 0,
$$P\big(\sqrt N\,T_N(\hat\rho_N^\star)>\Gamma\big)-\inf_{\rho\in R_\varepsilon}P\big(\sqrt N\,T_N(\rho)>\Gamma\big)\to0.$$
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 129/142

Simulations

Figure: False alarm rate $P(\sqrt N\,T_N(\rho)>\Gamma)$ as a function of ρ, for Γ = 2 and Γ = 3, N = 20, $p=N^{-1/2}[1,\ldots,1]^{\mathsf T}$, C_N Toeplitz from an AR(1) model with coefficient 0.7, c_N = 1/2.
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 130/142

Simulations

Figure: False alarm rate $P(\sqrt N\,T_N(\rho)>\Gamma)$ as a function of ρ, for Γ = 2 and Γ = 3, N = 100, $p=N^{-1/2}[1,\ldots,1]^{\mathsf T}$, C_N Toeplitz from an AR(1) model with coefficient 0.7, c_N = 1/2.
Advanced Random Matrix Models for Robust Estimation/3.4 Optimal robust GLRT detectors 131/142

Simulations

Figure: False alarm rate $P(\sqrt N\,T_N(\rho)>\Gamma)$ as a function of ρ for N = 20 and N = 100, $p=N^{-1/2}[1,\ldots,1]^{\mathsf T}$, $[C_N]_{ij}=0.7^{|i-j|}$, c_N = 1/2.
Future Directions/4.1 Kernel matrices and kernel methods 134/142

Motivation: Spectral Clustering

N. El Karoui, The spectrum of kernel random matrices, The Annals of Statistics, 38(1):1-50, 2010.

I Objective: Clustering data x_1, . . . , x_n ∈ C^N into k similarity classes
I a classical machine learning problem, brought here to big data!
I assumes a similarity function, e.g. the Gaussian kernel
$$f(x_i,x_j)=\exp\left(-\frac{\|x_i-x_j\|^2}{2\sigma^2}\right)$$
I which naturally brings the kernel matrix:
$$W=[W_{ij}]_{1\le i,j\le n}=[f(x_i,x_j)]_{1\le i,j\le n}.$$
I Letting x_1, . . . , x_n be random leads naturally to studying kernel random matrices.
I Little is known on such random matrices, but for x_i i.i.d. with zero mean and covariance I_N:
$$\left\|W-\alpha\,11^{\mathsf T}-\frac{\beta}{n}\,X^*X\right\|\xrightarrow{a.s.}0,\qquad X=[x_1,\ldots,x_n],$$
for some α, β depending on f and its derivatives.
Basically, to first order W is equivalent to a rank-one matrix.
Future Directions/4.1 Kernel matrices and kernel methods 135/142

Motivation: Spectral Clustering

I Clustering x_1, . . . , x_n into k classes is often written as:
$$\text{(RatioCut)}\qquad\min_{\substack{S_1\cup\ldots\cup S_k=S\\ i\neq j,\ S_i\cap S_j=\emptyset}}\ \sum_{i=1}^k\frac{1}{|S_i|}\sum_{j\in S_i,\ \bar j\in S_i^c}f(x_j,x_{\bar j}).$$
But this is difficult to solve, NP hard!
I It can be equivalently rewritten
$$\text{(RatioCut)}\qquad\min_{M\in\mathcal M,\ M^{\mathsf T}M=I_k}\operatorname{tr}\big(M^{\mathsf T}LM\big)$$
where $\mathcal M=\{M=[m_{ij}]_{1\le i\le n,\,1\le j\le k},\ m_{ij}=|S_j|^{-\frac12}\,\delta_{x_i\in S_j}\}$ and
$$L=[L_{ij}]_{1\le i,j\le n}=[-W+\operatorname{diag}(W1)]_{1\le i,j\le n}=\left[-f(x_i,x_j)+\delta_{i,j}\sum_{l=1}^nf(x_i,x_l)\right]_{1\le i,j\le n}.$$
I Relaxing the constraint M ∈ M to orthogonality only leads to a simple eigenvalue/eigenvector problem:
Spectral clustering (a short numerical sketch follows below).
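The relaxed problem is solved by the eigenvectors of L associated with its k smallest eigenvalues, whose rows are then clustered (typically by k-means). A self-contained sketch, with an illustrative two-cluster toy dataset:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma):
    """RatioCut relaxation: Gaussian kernel W, Laplacian L = diag(W 1) - W,
    k eigenvectors of L with smallest eigenvalues, then k-means on their rows."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T         # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))                    # kernel matrix f(x_i, x_j)
    L = np.diag(W.sum(axis=1)) - W                          # graph Laplacian
    _, U = np.linalg.eigh(L)                                # eigenvalues in ascending order
    M = U[:, :k]                                            # relaxed RatioCut solution
    _, labels = kmeans2(M, k, minit='points')
    return labels

# toy example (illustrative): two Gaussian clouds in dimension N = 50
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 50)),
               3.0 + rng.standard_normal((100, 50))])
print(np.bincount(spectral_clustering(X, k=2, sigma=5.0)))  # expect two groups of 100
```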
Future Directions/4.1 Kernel matrices and kernel methods 136/142

Objectives

I Generalization to k distributions for x_1, . . . , x_n should lead to asymptotically rank-k W matrices.
I If this is established, specific choices of kernels known to work well would be better understood.
I Eventually, find optimal choices of kernels.
Future Directions/4.2 Neural networks 138/142

Echo-state neural networks


I Neural network:
I input neuron signal s_t ∈ R (could be multivariate)
I output neuron signal y_t ∈ R (could be multivariate)
I N neurons with
I state x_t ∈ R^N at time t
I connectivity matrix W ∈ R^{N×N}
I connectivity vector to the input w_I ∈ R^N
I connectivity vector to the output w_O ∈ R^N
I State evolution: x_0 = 0 (say) and
$$x_{t+1}=S(Wx_t+w_Is_t)$$
with S an entry-wise sigmoid function.
I Output observation:
$$y_t=w_O^{\mathsf T}x_t.$$

I Classical neural networks:
I Learning phase: input-output data (s_t, y_t) are used to learn W, w_O, w_I (via e.g. LS)
I Interpolation phase: W, w_O, w_I fixed, we observe the output y_t from new data s_t.
This poses overfitting problems, is difficult to set up, and demands lots of learning data.

I Echo-state neural networks: To solve these problems,
I W and w_I are set to be random, no longer learned
I only w_O is learned
This reduces the amount of data needed for learning and shows striking performance in some scenarios.
Future Directions/4.2 Neural networks 139/142

ESN and random matrices

I W, w_I being random, the performance study involves random matrices.
Stability, chaos regime, etc., involve the extreme eigenvalues of W
I the main difficulty is the non-linearity caused by S
I Performance measures:
I MSE on the training data
I MSE on the interpolated data
Optimization to be performed over the regression method, e.g.
$$w_O=\big(X_{\rm train}X_{\rm train}^{\mathsf T}+\lambda I_N\big)^{-1}X_{\rm train}\,y_{\rm train}$$
with $X_{\rm train}=[x_1,\ldots,x_T]$, $y_{\rm train}=[y_1,\ldots,y_T]^{\mathsf T}$, T the training period (a short numerical sketch follows below).
I In a first approximation: S = Id.
The MSE performance with stationary inputs leads to studying
$$\sum_{j=1}^\infty W^jw_Iw_I^{\mathsf T}(W^{\mathsf T})^j.$$
A new random matrix model, which can nonetheless be analyzed with the usual tools.
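To make the setting concrete, the sketch below simulates a small echo-state network with S = tanh and learns the readout w_O by the ridge regression displayed above; the reservoir size, spectral radius, task and ridge parameter are illustrative choices.

```python
import numpy as np

N, T, ridge = 400, 1000, 1e-2
rng = np.random.default_rng(0)

W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # keep spectral radius < 1 (stability)
w_in = rng.standard_normal(N)

s = np.sin(0.1 * np.arange(T + 1))                   # toy input signal s_t
y_target = s[1:]                                     # task: predict s_{t+1} (illustrative)

# state recursion x_{t+1} = S(W x_t + w_in s_t), with S = tanh
X = np.zeros((N, T))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + w_in * s[t])
    X[:, t] = x

# ridge-regression readout  w_O = (X X^T + ridge * I_N)^{-1} X y
w_out = np.linalg.solve(X @ X.T + ridge * np.eye(N), X @ y_target)
print("train MSE:", np.mean((w_out @ X - y_target) ** 2))
```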
Future Directions/4.2 Neural networks 140/142

Related bibliography

I J. T. Kent, D. E. Tyler, Redescending M-estimates of multivariate location and scatter, 1991.


I R. A. Maronna, Robust M-estimators of multivariate location and scatter, 1976.
I Y. Chitour, F. Pascal, Exact maximum likelihood estimates for SIRV covariance matrix: Existence and algorithm analysis, 2008.
I N. El Karoui, Concentration of measure and spectra of random matrices: applications to correlation matrices, elliptical distributions and beyond,
2009.
I R. Couillet, F. Pascal, J. W. Silverstein, Robust M-Estimation for Array Processing: A Random Matrix Approach, 2012.
I J. Vinogradova, R. Couillet, W. Hachem, Statistical Inference in Large Antenna Arrays under Unknown Noise Pattern, (submitted to) IEEE
Transactions on Signal Processing, 2012.
I F. Chapon, R. Couillet, W. Hachem, X. Mestre, On the isolated eigenvalues of large Gram random matrices with a fixed rank deformation,
(submitted to) Electronic Journal of Probability, 2012, arXiv Preprint 1207.0471.
I R. Couillet, M. Debbah, Signal Processing in Large Systems: a New Paradigm, IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 24-39, 2013.
I P. Loubaton, P. Vallet, Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Application to the spiked models,
Electronic Journal of Probability, 2011.
I P. Vallet, W. Hachem, P. Loubaton, X. Mestre, J. Najim, On the consistency of the G-MUSIC DOA estimator. IEEE Statistical Signal Processing
Workshop (SSP), 2011.
Future Directions/4.2 Neural networks 141/142

To know more about all this

Our webpages:
I http://couillet.romain.perso.sfr.fr

I http://sri-uq.kaust.edu.sa/Pages/KammounAbla.aspx
Future Directions/4.2 Neural networks 142/142

Spraed it!

To download this presentation (PDF format):


I Log in to your Spraed account (www.spraed.net)
I Scan this QR code.
