00 vote positif00 vote négatif

7 vues89 pagesOct 06, 2016

© © All Rights Reserved

PDF, TXT ou lisez en ligne sur Scribd

© All Rights Reserved

7 vues

00 vote positif00 vote négatif

© All Rights Reserved

Vous êtes sur la page 1sur 89

Matrices: A review

Matrices: A review

Robert M. Gray

Deptartment of Electrical Engineering

Stanford University

Stanford 94305, USA

rmgray@stanford.edu

Contents

Chapter 1 Introduction

2.1

2.2

2.3

2.4

Eigenvalues

Matrix Norms

Asymptotically Equivalent Matrices

Asymptotically Absolutely Equal Distributions

5

8

11

18

25

3.1

3.2

Properties

25

28

31

4.1

4.2

4.3

31

36

41

Finite Order Toeplitz Matrices

Absolutely Summable Toeplitz Matrices

v

vi CONTENTS

4.4

4.5

4.6

4.7

Inverses of Hermitian Toeplitz Matrices

Products of Toeplitz Matrices

Toeplitz Determinants

52

53

57

60

63

5.1

5.2

5.3

5.4

65

68

70

73

Autoregressive Processes

Factorization

Differential Entropy Rate of Gaussian Processes

Acknowledgements

75

References

77

Abstract

t0 t1 t2

t1

t0 t1

t2

t1

t0

..

..

.

.

tn1

t(n1)

..

.

t0

The fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of finite section Toeplitz matrices and

Toeplitz matrices with absolutely summable elements are derived in a

tutorial manner. Mathematical elegance and generality are sacrificed

for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance

to attack the mathematical literature on the subject. By limiting the

generality of the matrices considered the essential ideas and results

can be conveyed in a more intuitive manner without the mathematical

machinery required for the most general cases. As an application the

results are applied to the study of the covariance matrices and their

factors of linear models of discrete time random processes.

vii

1

Introduction

a matrix of the form

t0 t1 t2 t(n1)

t1

t0 t1

..

.

(1.1)

Tn = t2

t1

t0

.

..

.

..

.

tn1

t0

Examples of such matrices are covariance matrices of weakly stationary stochastic time series and matrix representations of linear timeinvariant discrete time filters. There are numerous other applications

in mathematics, physics, information theory, estimation theory, etc. A

great deal is known about the behavior of such matrices the most

common and complete references being Grenander and Szego [13] and

Widom [29]. A more recent text devoted to the subject is Bottcher and

Silbermann [4]. Unfortunately, however, the necessary level of mathematical sophistication for understanding reference [13] is frequently

beyond that of one species of applied mathematician for whom the

theory can be quite useful but is relatively little understood. This caste

1

Introduction

consists of engineers doing relatively mathematical (for an engineering background) work in any of the areas mentioned. This apparent

dilemma provides the motivation for attempting a tutorial introduction on Toeplitz matrices that proves the essential theorems using the

simplest possible and most intuitive mathematics. Some simple and

fundamental methods that are deeply buried (at least to the untrained

mathematician) in [13] are here made explicit.

In addition to the fundamental theorems, several related results that

naturally follow but do not appear to be collected together anywhere

are presented.

The essential prerequisites for this book are a knowledge of matrix

theory, an engineers knowledge of Fourier series and random processes

and calculus (Riemann integration). A first course in analysis would

be helpful, but it is not assumed. Several of the occasional results required of analysis are usually contained in one or more courses in the

usual engineering curriculum, e.g., the Cauchy-Schwarz and triangle

inequalities. Hopefully the only unfamiliar results are a corollary to

the Courant-Fischer theorem and the Weierstrass approximation theorem. The latter is an intuitive result which is easily believed even if

not formally proved. More advanced results from Lebesgue integration,

functional analysis, and Fourier series are not used.

The approach of this book is to relate the properties of Toeplitz matrices to those of their simpler, more structured cousin the circulant

or cyclic matrix. These two matrices are shown to be asymptotically

equivalent in a certain sense and this is shown to imply that eigenvalues,

inverses, products, and determinants behave similarly. This approach

provides a simplified and direct path (to the authors point of view)

to the basic eigenvalue distribution and related theorems. This method

is implicit but not immediately apparent in the more complicated and

more general results of Grenander in Chapter 7 of [13]. The basic results for the special case of a finite order Toeplitz matrix appeared in

[10], a tutorial treatment of the simplest case which was in turn based

on the first draft of this work. The results were subsequently generalized using essentially the same simple methods, but they remain less

general than those of [13].

As an application several of the results are applied to study certain

3

models of discrete time random processes. Two common linear models are studied and some intuitively satisfying results on covariance

matrices and their factors are given. As an example from Shannon information theory, the Toeplitz results regarding the limiting behavior

of determinants is applied to find the differential entropy rate of a stationary Gaussian random process.

We sacrifice mathematical elegance and generality for conceptual

simplicity in the hope that this will bring an understanding of the

interesting and useful properties of Toeplitz matrices to a wider audience, specifically to those who have lacked either the background or the

patience to tackle the mathematical literature on the subject.

2

The Asymptotic Behavior of Matrices

theorem and proceed to a discussion of the asymptotic eigenvalue, product, and inverse behavior of sequences of matrices. The major use of

the theorems of this chapter is that we can often study the asymptotic behavior of complicated matrices by studying a more structured

and simpler asymptotically equivalent matrix, as will be developed in

subsequent chapters.

2.1

Eigenvalues

n n matrix A are the solutions to the equation

Ax = x

(2.1)

and hence the eigenvalues are the roots of the characteristic equation

of A:

det(A I) = 0 .

(2.2)

If the eigenvalues are real, then we will usually assume that they are

ordered in nonincreasing fashion, i.e., 1 2 3 .

5

A = U RU ,

(2.3)

U 1 = U , and R = {rk,j } is an upper triangular matrix ([15], p. 79).

The eigenvalues of A are the principal diagonal elements of R. If A

is normal, i.e., if A A = AA , then R is a diagonal matrix, which we

denote as R = diag(k ). If A is Hermitian, i.e., if A = A, then the

eigenvalues are real. If a matrix is Hermitian, then it is also normal.

A matrix A is nonnegative definite if x Ax 0 for all nonzero vectors x. The matrix is positive definite if the inequality is strict for all

nonzero vectors x. (Some books refer to these properties as positive

definite and strictly positive definite, respectively. If an Hermitian matrix is nonnegative definite, then its eigenvalues are all nonnegative. If

the matrix is positive definite, then the eigenvalues are all (strictly)

positive.

The extreme values of the eigenvalues of an Hermitian matrix H

can be characterized in terms of the Rayleigh quotient H and a vector

(complex ntuple) x defined by

RH (x) = (x Hx)/(x x).

(2.4)

As the result is both important and simple prove, we state and prove it

formally. The result will be useful in specifying the interval containing

the eigenvalues of an Hermitian matrix.

Usually in books on matrix theory it is proved as a corollary to

the variational description of eigenvalues given by the Courant-Fischer

theorem (see, e.g., [15], p. 116, for the case of real symmetric matrices),

but the following result is easily demonstrated directly.

Lemma 2.1. Given an Hermitian matrix H, let M and m be the

maximum and minimum eigenvalues of H, respectively. Then

x Hx

(2.5)

x Hx

(2.6)

x:x x=1

x:x x=1

2.1. Eigenvalues

minimum and maximum eigenvalues m and M , respectively. Then

RH (em ) = m and RH (eM ) = M and therefore

m min RH (x)

(2.7)

(2.8)

max RH (x).

x

A is the diagonal matrix of the eigenvalues k , and therefore

x Hx

x U AU x

=

x x

x x

Pn

2

y Ay

k=1 |yk | k

P

=

=

,

n

2

y y

k=1 |yk |

unitary so that x x = y y. But for all vectors y, this ratio is bound

below by m and above by M and hence for all vectors x

m RH (x) M

(2.9)

the lemma. The right-hand equalities are easily seen to hold since if x

minimizes (maximizes) the Rayleigh quotient, then the normalized vector x/x x satisfies the constraint of the minimization (maximization)

to the right, hence the minimum (maximum) of the Rayleigh quotion

must be bigger (smaller) than the constrained minimum (maximum) to

the right. Converseley, if x achieves the rightmost optimization, then

the same x yields a Rayleigh quotient of the the same optimum value.

2

The following lemma is useful when studying non-Hermitian matrices and products of Hermitian matrices. First note that if A is an

arbitrary complex matrix, then the matrix A A is both Hermitian and

nonnegative definite. It is Hermitian because (A A) = A A and it is

nonnegative definite since if for any complex vector x we define the

complex vector y = Ax, then

n

X

x (A A)x = y y =

|yk |2 0.

k=1

Lemma 2.2. Let A be a matrix with eigenvalues k . Define the eigenvalues of the Hermitian nonnegative definite matrix A A to be k 0.

Then

n1

n1

X

X

k

|k |2 ,

(2.10)

k=0

k=0

matrix. The trace is invariant to unitary operations so that it also is

equal to the sum of the eigenvalues of a matrix, i.e.,

Tr{A A} =

n1

X

(A A)k,k =

k=0

n1

X

k .

(2.11)

k=0

We have

Tr{A A} = Tr{R R}

=

n1

X n1

X

k=0 j=0

n1

X

k=0

|rj,k |2

|k |2 +

n1

X

k=0

|k |2

X

|rj,k |2

k6=j

(2.12)

Equation (2.12) will hold with equality iff R is diagonal and hence iff

A is normal.

2

Lemma 2.2 is a direct consequence of Shurs theorem ([15], pp. 229231) and is also proved in [13], p. 106.

2.2

Matrix Norms

or equivalently a norm of the appropriate kind. Two norms the

operator or strong norm and the Hilbert-Schmidt or weak norm (also

called the Frobenius norm or Euclidean norm when the scaling term n

is removed) will be used here ([13], pp. 102-103).

Let A be a matrix with eigenvalues k and let k 0 be the eigenvalues of the Hermitian nonnegative definite matrix A A. The strong

norm k A k is defined by

k A k= max RA A (x)1/2 = max

[x A Ax]1/2 .

x:x x=1

k A k2 = max k = M .

(2.13)

(2.14)

The strong norm of A can be bounded below by letting eM be the eigenvector of A corresponding to M , the eigenvalue of A having largest

absolute value:

k A k2 = max

x A Ax (eM A )(AeM ) = |M |2 .

(2.15)

x:x x=1

If A is itself Hermitian, then its eigenvalues k are real and the eigenvalues k of A A are simply k = 2k . This follows since if e(k) is an

eigenvector of A with eigenvalue k , then A Ae(k) = k A e(k) = 2k e(k) .

Thus, in particular, if A is Hermitian then

k A k= max |k | = |M |.

(2.16)

A = {ak,j } is defined by

|A| = n1

= (n

1/2

n1

X n1

X

|ak,j |2

k=0 j=0

1/2

Tr[A A])

n1

X

k=0

!1/2

(2.17)

The quantity n|A| is sometimes called the Frobenius norm or Euclidean norm. From lemma 2.2 we have

2

|A| n

n1

X

k=0

(2.18)

10

2

k A k = max k n

n1

X

k=0

k = |A|2 .

(2.19)

Note that both the strong and the weak norms are in fact norms

in the linear space of matrices, i.e., both satisfy the following three

axioms:

1. k A k 0 , with equality iff A = 0 , the all zero matrix.

2. k A + B kk A k + k B k

3. k cA k= |c| k A k

.

(2.20)

The triangle inequality in (2.20) will be used often as is the following

direct consequence:

k A B k | k A k k B k |.

(2.21)

The weak norm is usually the most useful and easiest to handle of

the two but the strong norm is handy in providing a bound for the

product of two matrices as shown in the next lemma.

Lemma 2.3. Given two n n matrices G = {gk,j } and H = {hk,j },

then

|GH| k G k |H|.

(2.22)

Proof. Expanding terms yields

XX X

|GH|2 = n1

| gi,k hk,j |2

i

= n1

XXXX

i

= n1

X

hj G Ghj ,

j

m,j

gi,k gi,m hk,j h

(2.23)

11

(hj G Ghj )/(hj hj ) k G k2

and therefore

|GH|2 n1 k G k2

X

j

hj hj =k G k2 |H|2 .

2

Lemma 2.2 is the matrix equivalent of 7.3a of ([13], p. 103). Note

that the lemma does not require that G or H be Hermitian.

2.3

n becomes large. As might be expected, we will use the weak norm of

the difference of two matrices as a measure of the distance between

them. Two sequences of n n matrices An and Bn are said to be

asymptotically equivalent if

(1) An and Bn are uniformly bounded in strong (and hence in

weak) norm:

k An k ,

k Bn k M <

(2.24)

and

(2) An Bn = Dn goes to zero in weak norm as n :

lim |An Bn | = lim |Dn | = 0.

We can immediately prove several properties of asymptotic equivalence which are collected in the following theorem.

Theorem 2.1.

(1) If An Bn , then

lim |An | = lim |Bn |.

(2.25)

12

(3) If An Bn and Cn Dn , then An Cn Bn Dn .

(4) If An Bn and k A1

k, k Bn1 k K < , i.e., A1

n

n

and Bn1 exist and are uniformly bounded by some constant

1

independent of n, then A1

n Bn .

1

(5) If An Bn Cn and k A1

n k K < , then Bn An Cn .

(6) If An Bn , then there are finite constants m and M such

that

m n,k , n,k M , n = 1, 2, . . .

k = 0, 1, . . . , n 1.

(2.26)

Proof.

(1) Eq. (2.25) follows directly from (2.21).

(3) Applying lemma 2.2 yields

|An Cn Bn Dn |

|An Cn An Dn + An Dn Bn Dn |

k An k |Cn Dn |+ k Dn k |An Bn |

0.

(4)

1

|A1

n Bn |

|Bn1 Bn An Bn1 An A1

n |

k Bn1 k k A1

n k |Bn An |

0.

(5)

Bn A1

n Cn

1

A1

n An Bn An Cn

k A1

n k |An Bn Cn |

0.

by some finite number M and hence from (2.18), |n.k | M

13

result holds for m = M and it may hold for larger m, e.g.,

m = 0 if the matrices are all nonnegative definite.

2

The above results will be useful in several of the later proofs. Asymptotic equality of matrices will be shown to imply that eigenvalues, products, and inverses behave similarly. The following lemma provides a

prelude of the type of result obtainable for eigenvalues and will itself

serve as the essential part of the more general results to follow. It shows

that if the weak norm of the difference of the two matrices is small, then

the sums of the eigenvalues of each must be close.

Lemma 2.4. Given two matrices A and B with eigenvalues n and

n , respectively, then

|n1

n1

X

k=0

k n1

n1

X

k=0

k | = |n1

n1

X

k=0

(k k )| |A B|.

n1

X

k=0

n1

X

k=0

k = Tr(A) Tr(B)

= Tr(D).

Tr(D) yields

2

n1

X

|Tr(D)|2 = dk,k

k=0

n1

X

n1

X n1

X

k=0

|dk,k |2

k=0 j=0

= n2 |D|2 .

|dk,j |2

(2.27)

14

2

An immediate consequence of the lemma is the following corollary.

Corollary 2.1. Given two sequences of asymptotically equivalent matrices An and Bn with eigenvalues n,k and n,k , respectively, then

lim n1

n1

X

n,k = lim n1

n

k=0

n1

X

n,k

(2.28)

k=0

or, equivalently,

lim n1

n1

X

k=0

(n,k n,k ) = 0.

(2.29)

lim n1 Tr(Dn ) = 0.

(2.30)

0 |n1 Tr(Dn )|2 |Dn |2

(2.31)

2

The previous corollary can be interpreted as saying the sample or

arithmetic means of the eigenvalues of two matrices are asymptotically

equal if the matrices are asymptotically equivalent. It is easy to see

that if the matrices are Hermitian, a similar result holds for the means

of the squared eigenvalues. From (2.21) and (2.18),

|Dn |

| |An | |Bn | |

v

v

u

u

n1

n1

u

X

u

X

t

2

2 |

1

| n

n,k tn1

n,k

k=0

k=0

15

Corollary 2.2. Given two sequences of asymptotically equivalent Hermitian matrices An and Bn with eigenvalues n,k and n,k , respectively,

then

n1

n1

X

X

2

lim n1

2n,k = lim n1

n,k

(2.32)

n

k=0

k=0

or, equivalently,

lim n1

n1

X

k=0

2

(2n,k n,k

) = 0.

(2.33)

eigenvalues or moments of an eigenvalue distribution rather than individual eigenvalues. Equations (2.28) and (2.32) are special cases of

the following fundamental theorem of asymptotic eigenvalue distribution.

Theorem 2.2. Let An and Bn be asymptotically equivalent sequences

of matrices with eigenvalues n,k and n,k , respectively. Assume that

n1

X

1

the eigenvalue moments of either matrix converge, e.g., lim n

sn,k

n

lim n

n1

X

sn,k

k=0

= lim n

n

n1

X

s

n,k

.

k=0

(2.34)

k=0

n . Since the eigenvalues of Asn are sn,k , (2.34) can be written in terms

of n as

lim n1 Trn = 0.

(2.35)

n

and Bn s but containing at least one Dn . Repeated application of lemma

2.3 thus gives

|n | K |Dn | n 0.

(2.36)

Corollary 2.1 to the matrices Asn and Dns to obtain (2.35) and hence

(2.34).

2

16

eigenvalue behavior. Most of the succeeding results on eigenvalues will

be applications or specializations of (2.34).

Since (2.32) holds for any positive integer s we can add sums corresponding to different values of s to each side of (2.32). This observation

immediately yields the following corollary.

Corollary 2.3. Let An and Bn be asymptotically equivalent sequences

of matrices with eigenvalues n,k and n,k , respectively, and let f (x)

be any polynomial. Then

lim n1

n1

X

f (n,k ) = lim n1

n

k=0

n1

X

f (n,k ) .

(2.37)

k=0

that (2.37) can hold for any analytic function f (x) since such functions

can be expanded into complex Taylor series, i.e., into polynomials. If

An and Bn are Hermitian, however, then a much stronger result is

possible. In this case the eigenvalues of both matrices are real and we

can invoke the Stone-Weierstrass approximation theorem ([19], p. 146)

to immediately generalize Corollary 2.3. This theorem, our one real

excursion into analysis, is stated below for reference.

Theorem 2.3. (Stone-Weierstrass) If F (x) is a continuous complex

function on [a, b], there exists a sequence of polynomials pn (x) such

that

lim pn (x) = F (x)

n

Stated simply, any continuous function defined on a real interval can

be approximated arbitrarily closely by a polynomial. Applying theorem

2.3 to Corollary 2.3 immediately yields the following theorem:

Theorem 2.4. Let An and Bn be asymptotically equivalent sequences

of Hermitian matrices with eigenvalues n,k and n,k , respectively.

From theorem 2.1 there exist finite numbers m and M such that

m n,k , n,k M , n = 1, 2, . . .

k = 0, 1, . . . , n 1.

17

lim n

n1

X

F (n,k ) = lim n

k=0

n1

X

F (n,k )

(2.38)

k=0

lim n

n1

X

k=0

(F (n,k ) F (n,k )) = 0

(2.39)

When two real sequences {n,k ; k = 0, 1, . . . , n 1} and {n,k ; k =

0, 1, . . . , n 1} satisfy (2.26)-(2.38), they are said to be asymptotically

equally distributed ([13], p. 62, where the definition is attributed to

Weyl).

As an example of the use of theorem 2.4 we prove the following

corollary on the determinants of asymptotically equivalent matrices.

Corollary 2.4. Let An and Bn be asymptotically equivalent Hermitian matrices with eigenvalues n,k and n,k , respectively, such that

n,k , n,k m > 0. Then

lim (det An )1/n = lim (det Bn )1/n .

(2.40)

lim n

n1

X

ln n,k = lim n

n

k=0

n1

X

ln n,k

k=0

and hence

"

lim exp n1 ln

n1

Y

k=0

"

n

n1

Y

n,k

k=0

or equivalently

lim exp[n1 ln det An ] = lim exp[n1 ln det Bn ],

18

With suitable mathematical care the above corollary can be extended to cases where n,k , n,k > 0 provided additional constraints

are imposed on the matrices. For example, if the matrices are assumed

to be Toeplitz matrices, then the result holds even if the eigenvalues can

get arbitrarily small but remain strictly positive. (See the discussion on

p. 66 and in Section 3.1 of [13] for the required technical conditions.)

In the preceding chapter the concept of asymptotic equivalence of

matrices was defined and its implications studied. The main consequences have been the behavior of inverses and products (theorem 2.1)

and eigenvalues (theorems 2.2 and 2.4). These theorems do not concern individual entries in the matrices or individual eigenvalues, rather

1

they describe an average behavior. Thus saying A1

n Bn means

1

that that |A1

n Bn | n 0 and says nothing about convergence of

individual entries in the matrix. In certain cases stronger results on a

type of elementwise convergence are possible using the stronger norm

of Baxter [1, 2]. Baxters results are beyond the scope of this book.

2.4

used in its derivation using reasonably elementary methods. The key

additional idea required is the Wielandt-Hoffman theorem [30], a result from matrix theory that is of independent interest. The theorem is

stated and a proof following Wilkinson [31] is presented for completeness.

Lemma 2.5. (Wielandt-Hoffman theorem) Given two Hermitian matrices A and B with eigenvalues k and k in nonincreasing order,

respectively, then

n

n1

X

k=0

|k k |2 |A B|2 .

U diag(k )U , B = W diag(k )W , where U and W are unitary. Since

19

|A B| = |U diag(k )U W diag(k )W |

= |diag(k )U U W diag(k )W |

= |diag(k )U W U W diag(k )|

= |diag(k )Q Qdiag(k )|,

where Q = U W = {qi,j } is also unitary. The (i, j) entry in the matrix

diag(k )Q Qdiag(k ) is (i j )qi,j and hence

|A B|2 = n1

n1

X n1

X

i=0 j=0

|i j |2 |qi,j |2 =

n1

X n1

X

i=0 j=0

|i j |2 pi,j (2.41)

that

n1

n1

X

X

2

|qi,j | =

|qi,j |2 = 1

(2.42)

i=0

or

n1

X

i=0

j=0

pi,j =

n1

X

pi,j =

j=0

1

.

n

(2.43)

This can be interpreted in probability terms: pi,j = n1 |qi,j |2 is a probability mass function or pmf on {0, 1, . . . , n1}2 with uniform marginal

probability mass functions. Recall that it is assumed that the eigenvalues are ordered so that 0 1 2 and 0 1 2 .

We claim that for all such matrices P satisfying (2.43), the right

hand side of (2.41) is minimized by P = n1 I, where I is the identity

matrix, so that

n1

X n1

X

i=0 j=0

|i j |2 pi,j

n1

X

i=0

|i i |2 ,

which will prove the result. To see this suppose the contrary. Let

be the smallest integer in {0, 1, . . . , n 1} such that P has a nonzero

element off the diagonal in either row or in column . If there is a

nonzero element in row off the diagonal, say p,a then there must also

20

be a nonzero element in column off the diagonal, say pb, in order for

the constraints (2.43) to be satisfied. Since is the smallest such value,

< a and < b. Let x be the smaller of pl,a and pb,l . Form a new

matrix P by adding x to p, and pb,a and subtracting x from pb, and

p,a . The new matrix still satisfies the constraints and it has a zero in

either position (b, ) or (, a). Furthermore the norm of P has changed

from that of P by an amount

x ( )2 + (b a )2 ( a )2 (b )2

= x( b )( a ) 0

since > b, > a, the eigenvalues are nonincreasing, and x is positive. Continuing in this fashion all nonzero offdiagonal elements can be

zeroed out without increasing the norm, proving the result.

2

Applying the Cauchy-Schwartz inequality and the WielandtHoffman theorem yields the following strengthening of lemma 2.4,

v

un1

n1

X

uX

1

1 t

(k k )2 n |An Bn |,

n

|k k | n

k=0

k=0

n and n in nonincreasing order, respectively, then

n1

n1

X

k=0

|k k | |A B|.

Note in particular that the absolute values are outside the sum in

lemma 2.4 and inside the sum in lemma 2.6. As was done in the weaker

case, the result can be used to prove a stronger version of theorem

2.4. This line of reasoning, using the Wielandt-Hoffman theorem, was

pointed out by William F. Trench who used special cases in his paper [20]. Similar arguments have become standard for treating eigenvalue distributions for Toeplitz and Hankel matrices. See, for example,

[7, 28, 3]. The following theorem provides the derivation. The specific

21

from William F. Trench. See also [?, 22, 23, 24, 25].

of Hermitian matrices with eigenvalues n,k and n,k in nonincreasing

order, respectively. From theorem 2.1 there exist finite numbers m and

M such that

m n,k , n,k M , n = 1, 2, . . .

k = 0, 1, . . . , n 1.

(2.44)

lim n1

n1

X

k=0

|F (n,k ) F (n,k )| = 0.

(2.45)

Note that the theorem strengthens the result of theorem 2.4 because

of the magnitude inside the sum. Following Trench [?] in this case the

eigenvalues are said to be asymptotically absolutely equally distributed.

Proof: From lemma 2.6

n1

X

k=0

(2.46)

which implies (2.45) for the case F (r) = r. For any nonnegative integer

j

j

|jn,k n,k

| j max(|m|, |M |)j1 |n,k n,k |.

(2.47)

shows that

j

X

aj bj

=

ajl bl1

ab

l=1

22

so that

aj bj

| =

ab

|aj bj |

|a b|

= |

j

X

l=1

ajl bl1 |

j

X

|ajl bl1 |

j

X

|a|jl |b|l1

l=1

l=1

j max(|m|, |M |)j1 ,

which proves (2.47). This immediately implies that (2.45) holds for

functions of the form F (r) = r j for positive integers j, which in turn

means the result holds for any polynomial. If F is an arbitrary continuous function on [m, M ], then from theorem 2.3 given > 0 there is a

polynomial P such that

23

n

n1

X

k=0

|F (n,k ) F (n,k )|

= n

n1

X

n1

X

|F (n,k ) P (n,k )| + n1

k=0

n1

k=0

+n1

n1

X

k=0

2 + n1

n1

X

k=0

|P (n,k ) P (n,k )|

|P (n,k ) F (n,k )|

n1

X

k=0

|P (n,k ) P (n,k )|

since can be made arbitrarily small.

2

3

Circulant Matrices

The properties of circulant matrices are well known and easily derived

([15], p. 267,[6]). Since these matrices are used both to approximate and

explain the behavior of Toeplitz matrices, it is instructive to present

one version of the relevant derivations here.

3.1

c0

c1

c2

cn1 c0

c1 c2

..

.

cn1 c0 c1

C=

..

..

.. ..

.

.

.

.

c1

cn1

cn1

..

c2

c1

c0

(3.1)

where each row is a cyclic shift of the row above it. The structure can

also be characterized by noting that the (k, j) entry of C, Ck,j , is given

by

Ck,j = c(jk) mod n ,

25

26

Circulant Matrices

The eigenvalues k and the eigenvectors y (k) of C are the solutions

of

Cy = y

(3.2)

or, equivalently, of the n difference equations

m1

X

cnm+k yk +

k=0

n1

X

k=m

ckm yk = ym ; m = 0, 1, . . . , n 1.

(3.3)

n1m

X

k=0

ck yk+m +

n1

X

k=nm

ck yk(nm) = ym ; m = 0, 1, . . . , n 1. (3.4)

by guessing an (hopefully) intuitive solution and then proving that it

works. Since the equation is linear with constant coefficients a reasonable guess is yk = k (analogous to y(t) = es in linear time invariant

differential equations). Substitution into (3.4) and cancellation of m

yields

n1m

n1

X

X

ck k + n

ck k = .

k=0

k=nm

roots of unity, then we have an eigenvalue

=

n1

X

ck k

(3.5)

k=0

y = n1/2 1, , 2 , . . . , n1 ,

(3.6)

to give the eigenvector unit energy. Choosing m as the complex nth

root of unity, m = e2im/n , we have eigenvalue

m =

n1

X

k=0

ck e2imk/n

(3.7)

27

and eigenvector

y (m) = n1/2 1, e2im/n , , e2i(n1)/n .

C = U U ,

(3.8)

where

U

= n1/2 e2imk/n ; m, k = 0, 1, . . . , n 1

= {k kj }

(

1

m =

0

m=0

otherwise

To verify (3.8) denote that the (k, j)th element of U U by ak,j and

observe that ak,j will be the product of the kth row of U , which

is {n1/2 e2imk/n k ; m = 0, 2, . . . , n 1}, times the jth row of U ,

{n1/2 e2imj/n ; m = 0, 2, . . . , n 1} so that

ak,j = n1

n1

X

e2im(jk)/n m

m=0

= n1

n1

X

e2im(jk)/n

m=0

= n1

n1

X

But we have

n1

X

m=0

2im(jkr)/n

cr e2imr/n

r=0

cr

r=0

n1

X

n1

X

e2im(jkr)/n .

(3.9)

m=0

(

n

0

k j = r mod n

otherwise

so that ak,j = c(kj) mod n . Thus (3.8) and (3.1) are equivalent. Furthermore (3.9) shows that any matrix expressible in the form (3.8) is

circulant.

28

Circulant Matrices

It should also be familiar to those with standard engineering backgrounds that m in (3.7) is simply the discrete Fourier transform (DFT)

of the sequence ck and (3.8) can be interpreted as a combination of

the Fourier inversion formula and the Fourier cyclic shift formula.

Since C is unitarily similar to a diagonal matrix it is normal. Note

that all circulant matrices have the same set of eigenvectors.

3.2

Properties

The following theorem summarizes the properties derived in the previous section regarding eigenvalues and eigenvectors of circulant matrices

and provides some easy implications.

Theorem 3.1. Let C = {ckj } and B = {bkj } be circulant n n

matrices with eigenvalues

m =

n1

X

ck e2imk/n

k=0

m =

n1

X

bk e2imk/n ,

k=0

respectively.

CB = BC = U U ,

where = {m m k,m }, and CB is also a circulant matrix.

(2) C + B is a circulant matrix and

C + B = U U,

where = {(m + m )k,m }

(3) If m 6= 0; m = 0, 1, . . . , n 1, then C is nonsingular and

C 1 = U 1 U

so that the inverse of C can be constructed in a straightforward manner.

3.2. Properties

29

diagonal matrices with elements m k,m and m k,m , respectively.

(1) CB = U U U U

= U U

= U U = BC

Since is diagonal, (3.9) implies that CB is circulant.

(2) C + B = U ( + )U

(3) C 1 = (U U )1

= U 1 U

if is nonsingular.

2

Circulant matrices are an especially tractable class of matrices since

inverses, products, and sums are also circulant matrices and hence both

straightforward to construct and normal. In addition the eigenvalues of

such matrices can easily be found exactly.

In the next chapter we shall see that certain circulant matrices

asymptotically approximate Toeplitz matrices and hence from Chapter

2 results similar to those in theorem 3.1 will hold asymptotically for

Toeplitz matrices.

4

Toeplitz Matrices

4.1

In this chapter the asymptotic behavior of inverses, products, eigenvalues, and determinants of finite Toeplitz matrices is derived by constructing an asymptotically equivalent circulant matrix and applying

the results of the previous chapters. Consider the infinite sequence

{tk ; k = 0, 1, 2, } and define the finite (n n) Toeplitz matrix

Tn = {tkj } as in (1.1). Toeplitz matrices can be classified by the restrictions placed on the sequence tk . If there exists a finite m such that

tk = 0, |k| > m, then Tn is said to be a finite order Toeplitz matrix. If

tk is an infinite sequence, then there are two common constraints. The

most general is to assume that the tk are square summable, i.e., that

k=

|tk |2 < .

(4.1)

assumed here; i.e., Lebesgue integration and a relatively advanced

knowledge of Fourier series. We will make the stronger assumption that

31

32

Toeplitz Matrices

k=

|tk | < .

(4.2)

This assumption greatly simplifies the mathematics but does not alter

the fundamental concepts involved. As the main purpose here is tutorial

and we wish chiefly to relay the flavor and an intuitive feel for the

results, we here confine interest to the absolutely summable case. The

main advantage of (4.2) over (4.1) is that it ensures the existence and

continuity of the Fourier series f () defined by

f () =

tk eik = lim

k=

n

X

tk eik .

(4.3)

k=n

Not only does the limit in (4.3) converge if (4.2) holds, it converges

uniformly for all , that is, we have that

n1

n

X

X

X

tk eik +

tk eik

tk eik =

f ()

k=

k=n

k=n+1

n1

X

X

ik

ik

tk e +

tk e ,

k=

n1

X

|tk | +

k=

k=n+1

|tk |

k=n+1

where the righthand side does not depend on and it goes to zero as

n from (4.2), thus given there is a single N , not depending on

, such that

n

X

ik

f

()

t

e

(4.4)

, all [0, 2] , if n N.

k

k=n

(

)2

X

X

2

|tk |

|tk | .

k=

k=

33

|f ()|

|tk eik |

k=

k=

we denote the least upper bound and greatest lower bound of f () by

Mf and mf , respectively. Observe that max(|mf |, |Mf |) M|f | .

Since f () is the Fourier series of the sequence tk , we could alternatively begin with a bounded and hence Riemann integrable function

f () on [0, 2] (|f ()| M|f | < for all ) and define the sequence

of n n Toeplitz matrices

Z 2

1

i(kj)

Tn (f ) = (2)

f ()e

d ; k, j = 0, 1, , n 1 .

0

(4.5)

As before, the Toeplitz matrices will be Hermitian iff f is real. The assumption that f () is Riemann integrable implies that f () is continuous except possibly at a countable number of points. Which assumption

is made depends on whether one begins with a sequence tk or a function f () either assumption will be equivalent for our purposes since

it is the Riemann integrability of f () that simplifies the bookkeeping

in either case. Before finding a simple asymptotic equivalent matrix to

Tn , we use lemma 2.1 to find a bound on the eigenvalues of Tn when

it is Hermitian and an upper bound to the strong norm in the general

case.

If Tn (f ) is Hermitian, then

mf n,k Mf .

(4.6)

k Tn (f ) k 2M|f |

(4.7)

34

Toeplitz Matrices

max n,k = max(x Tn x)/(x x)

(4.8)

x

so that

x Tn x

n1

X n1

X

tkj xk x

j

k=0 j=0

n1

X n1

X

(2)1

k=0 j=0

(2)1

2 n1

X

and likewise

x x=

n1

X

k=0

2

0

xk

k=0

xk x

j

(4.9)

2

f () d

eik

|xk | = (2)

f ()ei(kj) d

d|

0

n1

X

k=0

xk eik |2 .

n1

2

Z 2

X

ik

df () xk e

0

x Tn x

k=0

=

Mf ,

mf

2

Z 2 n1

x x

X

d xk eik

0

(4.10)

(4.11)

k=0

e(k) is the eigenvector associated with n,k , then the quadratic form

P

2

with x = e(k) yields x Tn x = n,k n1

k=0 |xk | . Thus (4.11) implies (4.6)

directly.

2

35

|n,M | max(|Mf |, |mf |) which in turn must be less than M|f | , which

proves (4.7) for Hermitian matrices.. Suppose that Tn (f ) is not Hermitian or, equivalently, that f is not real. Any function f can be written

in terms of its real and imaginary parts, f = fr + ifi , where both fr

and fi are real. In particular, fr = (f + f )/2 and fi = (f f )/2i.

Since the strong norm is a norm,

k Tn (f ) k = k Tn (fr + ifi ) k

= k Tn (fr ) + iTn (fi ) k

k Tn (fr ) k + k Tn (fi ) k

M|fr | + M|fi | .

Since |(f f )/2 (|f |+ |f |)/2 M|f | , M|fr | + M|fi | 2M|f | , proving

(4.7).

2

Note for later use that the weak norm between Toeplitz matrices

has a simpler form than (2.14). Let Tn = {tkj } and Tn = {tkj } be

Toeplitz, then by collecting equal terms we have

|Tn

Tn |2

n1

n1

X n1

X

k=0 j=0

= n1

n1

X

|tkj tkj |2

(n |k|)|tk tk |2

k=(n1)

n1

X

(4.12)

(1 |k|/n)|tk tk |2

k=(n1)

We are now ready to put all the pieces together to study the asymptotic behavior of Tn . If we can find an asymptotically equivalent circulant matrix then all of the results of Chapters 2 and 3 can be instantly

36

Toeplitz Matrices

applied. The main difference between the derivations for the finite and

infinite order case is the circulant matrix chosen. Hence to gain some

feel for the matrix chosen we first consider the simpler finite order case

where the answer is obvious, and then generalize in a natural way to

the infinite order case.

4.2

that is, ti = 0 unless |i| m. Since we are interested in the behavior

or Tn for large n we choose n >> m. A typical Toeplitz matrix will

then have the appearance of the following matrix, possessing a band of

nonzero entries down the central diagonal and zeros everywhere else.

With the exception of the upper left and lower right hand corners that

Tn looks like a circulant matrix, i.e., each row is the row above shifted

to the right one place.

t0

t1

..

.

t1

t0

t

m

..

.

T =

tm

0

..

..

tm

t1 t0 t1

..

tm

tm

tm

..

.

t0 t1

t1 t0

(4.13)

We can make this matrix exactly into a circulant if we fill in the upper

right and lower left corners with the appropriate entries. Define the

t0

t1 tm

t1

.

..

..

.

tm

..

tm

t1 t0 t1

T =

tm

.

..

.

.

.

t1

tm

(n)

(n)

c

cn1

0

(n)

(n)

c

n1 c0

..

(n)

(n)

(n)

c1

cn1 c0

(n)

tm

37

..

.

t1

..

.

tm

tm

..

tm

tm

t0

t1

(4.14)

(n)

tk

k = 0, 1, , m

(n)

ck =

(4.15)

t(nk)

k = n m, , n 1

0

otherwise

Cn (f ).

..

.

t1

t0

38

Toeplitz Matrices

asymptotically equivalent to Tn (f ) we need only prove that it is

indeed both asymptotically equivalent and simple.

Lemma 4.2. The matrices Tn and Cn defined in (4.13) and (4.14) are

asymptotically equivalent, i.e., both are bounded in the strong norm

and

lim |Tn Cn | = 0.

(4.16)

n

Proof. The tk are obviously absolutely summable, so Tn are uniformly bounded by 2M|f | from lemma 4.1. The matrices Cn are also

uniformly bounded since Cn Cn is a circulant matrix with eigenvalues

|f (2k/n)|2 4M|f2 | . The weak norm of the difference is

|Tn Cn |2 = n1

m

X

k(|tk |2 + |tk |2 )

k=0

mn1

m

X

k=0

(|tk |2 + |tk |2 ) n 0

2

The above lemma is almost trivial since the matrix Tn Cn has

fewer than m2 non-zero entries and hence the n1 in the weak norm

drives |Tn Cn | to zero.

From lemma 4.2 and theorem 2.2 we have the following lemma:

Lemma 4.3. Let Tn and Cn be as in (4.13) and (4.14) and let their

eigenvalues be n,k and n,k , respectively, then for any positive integer

s

n1

n1

X

X

s

s

lim n1

n,k

= lim n1

n,k

.

(4.17)

n

k=0

k=0

n1

n1

X

1 X s

s

n,k n1

n,k

n

Kn1/2 ,

k=0

k=0

(4.18)

39

Proof. Equation (4.17) is direct from lemma 4.2 and theorem 2.2.

Equation (4.18) follows from Corollary 2.1 and lemma 4.2.

2

Lemma 4.3 is of interest in that for finite order Toeplitz matrices

one can find the rate of convergence of the eigenvalue moments. It can

be shown that K sMfs1 .

The above two lemmas show that we can immediately apply the

results of Section II to Tn and Cn . Although theorem 2.1 gives us immediate hope of fruitfully studying inverses and products of Toeplitz

matrices, it is not yet clear that we have simplified the study of the

eigenvalues. The next lemma clarifies that we have indeed found a useful approximation.

let n,k be the eigenvalues of Cn (f ), then for any positive integer s we

have

lim n1

n1

X

k=0

s

n,k

= (2)1

f s () d.

(4.19)

continuous on [mf , Mf ] we have

lim n1

n1

X

k=0

F (n,k ) = (2)1

F [f ()] d.

0

(4.20)

40

Toeplitz Matrices

n,j =

n1

X

(n)

ck e2ijk/n

k=0

m

X

tk e2ijk/n +

k=0

m

X

n1

X

tnk e2ijk/n .

(4.21)

k=nm

tk e2ijk/n = f (2jn1 )

k=m

uniformly spaced between 0 and 2. Defining 2k/n = k and 2/n =

we have

lim n

n1

X

s

n,k

k=0

lim n

n1

X

f (2k/n)s

k=0

n1

X

lim

f (k )s /(2)

k=0

= (2)

f ()s d,

(4.22)

(4.22) as a Riemann integral. If Tn and Cn are Hermitian than the n,k

and f () are real and application of the Stone-Weierstrass theorem to

(4.22) yields (4.20). Lemma 4.2 and (4.21) ensure that n,k and n,k

are in the real interval [mf , Mf ].

2

Combining lemmas 4.2-4.4 and theorem 2.2 we have the following

special case of the fundamental eigenvalue distribution theorem.

Theorem 4.1. If Tn (f ) is a finite order Toeplitz matrix with eigenvalues n,k , then for any positive integer s

Z 2

n1

X

1

s

1

lim n

n,k = (2)

f ()s d.

(4.23)

n

k=0

41

lim n

n1

X

F (n,k ) = (2)

F [f ()] d;

(4.24)

k=0

i.e., the sequences n,k and f (2k/n) are asymptotically equally distributed.

x and Cn x = x, n > 2m + 1, are essentially the same nth order

difference equation with different boundary conditions. It is in fact the

nice boundary conditions that make easy to find exactly while

exact solutions for are usually intractable.

With the eigenvalue problem in hand we could next write down

theorems on inverses and products of Toeplitz matrices using lemma

4.2 and the results of Chapters 2-3. Since these theorems are identical in statement and proof with the infinite order absolutely summable

Toeplitz case, we defer these theorems momentarily and generalize theorem 4.1 to more general Toeplitz matrices with no assumption of finite

order.

4.3

Obviously the choice of an appropriate circulant matrix to approximate a Toeplitz matrix is not unique, so we are free to choose a

construction with the most desirable properties. It will, in fact, prove

useful to consider two slightly different circulant approximations to a

given Toeplitz matrix. Say we have an absolutely summable sequence

{tk ; k = 0, 1, 2, } with

f () =

tk eik

k=

tk = (2)1

.

Z

2

0

f ()eik

(4.25)

42

Toeplitz Matrices

Define Cn (f ) to be the

(n) (n)

(n)

(c0 , c1 , , cn1 ) where

n1

X

(n)

ck = n1

circulant

matrix

with

top

f (2j/n)e2ijk/n .

row

(4.26)

j=0

(n)

lim ck

lim n1

= (2)1

n1

X

f (2j/n)e2ijk/n

j=0

(4.27)

f ()eik d = tk

(n)

and hence the ck are simply the sum approximations to the Riemann

integral giving tk . Equations (4.26), (3.7), and (3.9) show that the

eigenvalues n,m of Cn are simply f (2m/n); that is, from (3.7) and

(3.9)

n,m =

n1

X

(n)

ck e2imk/n

k=0

n1

X

k=0

n1

n1

X

j=0

n1

X

f (2j/n)e2ijk/n e2imk/n

(

f (2j/n) n1

j=0

n1

X

22ik(jm)/n

k=0

(4.28)

= f (2m/n)

Thus, Cn (f ) has the useful property (4.21) of the circulant approximation (4.15) used in the finite case. As a result, the conclusions of lemma

4.4 hold for the more general case with Cn (f ) constructed as in (4.26).

Equation (4.28) in turn defines Cn (f ) since, if we are told that Cn is a

circulant matrix with eigenvalues f (2m/n), m = 0, 1, , n 1, then

43

from (3.9)

(n)

ck

n1

n1

X

n,m e2imk/n

m=0

= n1

n1

X

(4.29)

f (2m/n)e2imk/n

m=0

The fact that lemma 4.4 holds for Cn (f ) yields several useful properties as summarized by the following lemma.

Lemma 4.5.

(1) Given a function f of (4.25) and the circulant matrix Cn (f )

defined by (4.26), then

(n)

ck

tk+mn ,

m=

k = 0, 1, , n 1.

(4.30)

(2) Given Tn (f ) where f () is real and f () mf > 0, then

Cn (f )1 = Cn (1/f ).

(3) Given two functions f () and g(), then

Cn (f )Cn (g) = Cn (f g).

Proof.

(1) Since e2imk/n is periodic with period n, we have that

f (2j/n) =

tm ei2jm/n

m=

n1

X

l=0 m=

tm ei2jm/n

m=

t1+mn e2ijl/n

44

Toeplitz Matrices

(n)

ck

n1

n1

X

f (2j/n)e2ijk/n

j=0

= n1

n1

X n1

X

t1+mn e2ij(kl)/n

j=0 l=0 m=

n1

X

t1+mn

n1

l=0 m=

n1

X

e2ij(kl)/n

j=0

tk+mn

m=

Cn (f )1 has eigenvalues 1/f (2k/n), and hence from (4.29)

and the fact that Cn (f )1 is circulant we have Cn (f )1 =

Cn (1/f ).

(3) Follows immediately from theorem 3.1 and the fact that, if

f () and g() are Riemann integrable, so is f ()g().

2

Equation (4.30) points out a shortcoming of Cn (f ) for applications

as a circulant approximation to Tn (f ) it depends on the entire sequence {tk ; k = 0, 1, 2, } and not just on the finite collection of

elements {tk ; k = 0, 1, , n 1} of Tn (f ). This can cause problems

in practical situations where we wish a circulant approximation to a

Toeplitz matrix Tn when we only know Tn and not f . Pearl [16] discusses several coding and filtering applications where this restriction

is necessary for practical reasons. A natural such approximation is to

form the truncated Fourier series

fn () =

n

X

tk eik ,

(4.31)

k=n

45

circulant matrix

Cn = Cn (fn );

(4.32)

(n)

(n)

c0 , , cn1 ) where

(n)

ck

n1

n1

X

fn (2j/n)e2ijk/n

j=0

n1

n1

X

j=0

n

X

m=n

n

X

tm

e2ijk/n

m=n

tm n1

n1

X

j=0

e2ijk/n

(4.33)

e2ij(k+m)/n .

and hence

(n)

ck = tk + tnk , k = 0, 1, , n 1.

for an r th order Toeplitz matrix if n > 2r + 1.

The matrix Cn does not have the property (4.28) of having eigenvalues f (2k/n) in the general case (its eigenvalues are fn (2k/n), k =

0, 1, , n 1), but it does not have the desirable property of depending only on the entries of Tn . The following lemma shows that these

circulant matrices are asymptotically equivalent to each other and to

Tm .

k=

f () =

|tk | <

k=

tk eik .

46

Toeplitz Matrices

(4.31)-(4.32). Then,

Cn (f ) Cn Tn .

(4.34)

Proof. Since both Cn (f ) and Cn are circulant matrices with the same

eigenvectors (theorem 3.1), we have from part 2 of theorem 3.1 and

(2.14) and the comment following it that

|Cn (f ) Cn |2 = n1

n1

X

k=0

|f (2k/n) fn (2k/n)|2 .

Recall from (4.4) and the related discussion that fn () uniformly converges to f (), and hence given > 0 there is an N such that for n N

we have for all k, n that

|f (2k(n) fn (2k/n)|2

and hence for n N

|Cn (f ) Cn |2 n1

n1

X

= .

i=0

Since is arbitrary,

lim |Cn (f ) Cn | = 0

proving that

Cn (f ) Cn .

(4.35)

|Tn Cn |2 =

n1

X

(1 |k|/n)|tk tk |2 .

k=(n1)

(n)

= t|k| + tn|k|

c|k|

tk =

(n)

cnk = t|nk| + tk

k0

k0

(4.36)

47

and hence

|Tn Cn |2 = |tn1 |2 +

n1

X

= |tn1 |2 +

n1

X

k=0

k=0

(1 k/n)(|tnk |2 + |t(nk) |2 )

.

(k/n)(|tk |2 + |tk |2 )

lim |tn1 |2 = 0

k=N

|tk |2 + |tk |2

and hence

lim |Tn Cn | =

lim

n1

X

k=0

(k/n)(|tk |2 + |tk |2 )

(N 1

)

n1

X

X

lim

(k/n)(|tk |2 + |tk |2 ) +

(k/n)(|tk |2 + |tk |2 ) .

k=0

lim n1

k=N

N

1

X

k=0

k(|tk |2 + |tk |2 )

k=N

(|tk |2 + |tk |2 )

Since is arbitrary,

lim |Tn Cn | = 0

and hence

Tn Cn ,

(4.37)

2

Pearl [16] develops a circulant matrix similar to Cn (depending only

on the entries of Tn ) such that (4.37) holds in the more general case

where (4.1) instead of (4.2) holds.

48

Toeplitz Matrices

to Tn and whose eigenvalues, inverses and products are known exactly.

We can now use lemmas 4.2-4.4 and theorems 2.2-2.3 to immediately

generalize theorem 4.1

f () is Riemann integrable, e.g., f () is bounded or the sequence tk is

absolutely summable. Then if n,k are the eigenvalues of Tn and s is

any positive integer

lim n1

n1

X

s

n,k

= (2)1

k=0

f ()s d.

(4.38)

F (x) continuous on [mf , Mf ]

lim n

n1

X

F (n,k ) = (2)

F [f ()] d.

(4.39)

k=0

of Szego [1]. The approach used here is essentially a specialization of

Grenanders ([13], ch. 7).

Theorem 4.2 yields the following two corollaries.

Corollary 4.1. Let Tn (f ) be Hermitian and define the eigenvalue distribution function Dn (x) = n1 (number of n,k x). Assume that

Z

d = 0.

:f ()=x

given by

Z

D(x) = (2)1

d.

f ()x

49

set of for which f () = x is needed to ensure that x is a point of

continuity of the limiting distribution.

Proof. Define the characteristic function

mf x

1

1x () =

.

0

otherwise

We have

D(x) = lim n1

n

n1

X

1x (n,k ) .

k=0

4.2 cannot be immediately implied. To get around this problem we

mimic Grenander and Szego p. 115 and define two continuous functions

that provide upper and lower bounds to 1x and will converge to it in

the limit. Define

1

+

1x () = 1 x

x<x+

0

x+<

1x () = 1

x+

x<x

x<

The idea here is that the upper bound has an output of 1 everywhere

1x does, but then it drops in a continuous linear fashion to zero at x +

instead of immediately at x. The lower bound has a 0 everywhere 1x

does and it rises linearly from x to x to the value of 1 instead of

instantaneously as does 1x . Clearly

+

1

x () < 1x () < 1x ()

for all .

50

Toeplitz Matrices

Since both 1+

x and 1x are continuous, theorem 4 can be used to

conclude that

n1

X

1

lim n

1+

x (n,k )

n

k=0

= (2)

= (2)

(2)1

1+

x (f ()) d

d + (2)

(1

d + (2)1

f ()x

and

lim n

n1

X

x<f ()x+

f ()x

f () x

) d

x<f ()x+

1

x (n,k )

k=0

= (2)

= (2)1

= (2)1

1

x (f ()) d

d + (2)1

d + (2)1

(2)

= (2)1

f ()x

f ()x

f () (x )

) d

x<f ()x

(1

x<f ()x

(x f ()) d

f ()x

f ()x

d (2)1

x<f ()x

These inequalities imply that for any > 0, as n grows the sample

P

average n1 n1

k=0 1x (n,k ) will be sandwiched between

Z

Z

1

1

(2)

d + (2)

d

f ()x

and

(2)1

f ()x

x<f ()x+

d (2)1

x<f ()x

d.

51

Since can be made arbitrarily small, this means the sum will be

sandwiched between

Z

1

(2)

d

f ()x

and

1

(2)

f ()x

Thus if

d (2)

d.

f ()=x

d = 0,

f ()=x

then

D(x) =

(2)1

= (2)1 v

1x [f ()]d

0

.

d

f ()x

lim max n,k = Mf

Z

D(mf + ) =

d > 0.

f ()mf +

lim n1 {number of n,k in [mf , mf + ]} > 0

small . Since n,k mf by lemma 4.1, the minimum result is proved.

The maximum result is proved similarly.

2

52

Toeplitz Matrices

4.4

In some applications a sequence of Toeplitz matrices Tn (f ) is constructed using an integrable f which is not bounded. Nonetheless, we

would like to be able to say something about the asymptotic distribution of the eigenvalues. A related problem arises when the function

f is bounded, but we wish to study the asymptotic distribution of a

function F (n,k ) of the eigenvalues that is not continuous at the minimum or maximum value of f . For example, F (f ()) = 1/f () or

F (f ()) = ln f () is not continuous at f () = 0. In either of the cases

the basic asymptotic eigenvalue distribution theorem 4.2 breaks down

and the limits and integral involved might not exist. The limits might

exist and equal something else, or they might simply fail to exist. In

the next section we delve into a particular case arising when treating the inverses of Toeplitz matrices when f has zeros, but we state

without proof an intuitive extension of the fundamental Toeplitz result

that shows how to find asymptotic distributions of suitably truncated

functions. To state the result, define the mid function

z y z

mid(x, y, z) = y x y z

(4.40)

x y z

x < z and the function can be thought of as having input y and thresholds z and X and it puts out y if y is between z and x, z if y is smaller

than z, and x if y is greater than x. The following result was proved in

[11] and extended in [22]. See also [23, 24, 25].

Theorem 4.3. Consider a sequence Tn (f ) of Hermitian Toeplitz matrices (f () is real and integrable) then for any function F (x) continuous on [, ]

Z 2

n1

X

1

1

lim n

F (mid(, n,k , ) = (2)

F [mid(, f (), ] d.

n

k=0

(4.41)

is continuous on the closed interval [, ]. These need not be the min-

53

imum and maximum of f , and indeed these extrema may not even be

bounded.

4.5

Theorem 4.4. Let Tn (f ) be a sequence of Hermitian Toeplitz matrices such that f () is Riemann integrable and f () 0 with equality

holding at most at a countable number of points.

(a) Tn (f ) is nonsingular

(b) If f () mf > 0, then

Tn (f )1 Cn (f )1 ,

(4.42)

Cn (f ) = Dn then Tn (f )1 has the expansion

Tn (f )1 = [Cn (f ) + Dn ]1

1

= Cn (f )1 I + Dn Cn (f )1

h

i

2

= Cn (f )1 I + Dn Cn (f )1 + Dn Cn (f )1 +

(4.43)

and the expansion converges (in weak norm) for sufficiently large n.

(c) If f () mf > 0, then

Z

1

1

i(kj)

Tn (f ) Tn (1/f ) = (2)

de

/f () ;

(4.44)

that is, if the spectrum is strictly positive then the inverse of a Toeplitz

matrix is asymptotically Toeplitz. Furthermore if n,k are the eigenvalues of Tn (f )1 and F (x) is any continuous function on [1/Mf , 1/mf ],

then

Z

n1

X

1

1

lim n

F (n,k ) = (2)

F [(1/f ()] d.

(4.45)

n

k=0

54

Toeplitz Matrices

exists and is bounded, then Tn (f )1 is not bounded, 1/f () is not

integrable and hence Tn (1/f ) is not defined and the integrals of (4.41)

may not exist. For any finite , however, the following similar fact is

true: If F (x) is a continuous function of [1/Mf , ], then

lim n1

n1

X

F [min(n,k , )] = (2)1

F [min(1/f (), )] d.

k=0

(4.46)

Proof.

(a) Since f () > 0 except at possible a finite number of points, we

have from (4.9)

2

Z n1

1

X

ik

x Tn x =

xk e f ()d > 0

2

k=0

k

and hence

det Tn =

n1

Y

k=0

n,k 6= 0

so that Tn (f ) is nonsingular.

(b) From lemma 4.6, Tn Cn and hence (4.40) follows from theorem

2.1 since f () mf > 0 ensures that

k Tn1 k, k Cn1 k 1/mf < .

The series of (4.41) will converge in weak norm if

|Dn Cn1 | < 1

(4.47)

since

(4.45) must hold for large enough n. From (4.40), however, if n is large

enough, then probably the first term of the series is sufficient.

55

(c) We have

|Tn (f )1 Tn (1/f )| |Tn (f )1 Cn (f )1 | + |Cn (f )1 Tn (1/f )|.

From (b) for any > 0 we can choose an n large enough so that

|Tn (f )1 Cn (f )1 | .

(4.48)

2

From theorem 3.1 and lemma 4.5, Cn (f )1 = Cn (1/f ) and from lemma

4.6 Cn (1/f ) Tn (1/f ). Thus again we can choose n large enough to

ensure that

|Cn (f )1 Tn (1/f )| /2

(4.49)

so that for any > 0 from (4.46)-(4.47) can choose n such that

|Tn (f )1 Tn (1/f )|

which is (4.42). Equation (4.43) follows from (4.42) and theorem 2.4.

Alternatively, if G(x) is any continuous function on [1/Mf , 1/mf ] and

(4.43) follows directly from lemma 4.6 and theorem 2.4 applied to

G(1/x).

(d) When f () has zeros (mf = 0) then from Corollary 4.2

lim minn,k = 0 and hence

n

k

(4.50)

hence that Tn (1/f ) does not exist we define the sets

Ek = { : 1/k f ()/Mf > 1/(k + 1)}

= { : k Mf /f () < k + 1}

(4.51)

since f () is continuous on [0, Mf ] and has at least one zero all of these

sets are nonzero intervals of size, say, |Ek |. From (4.49)

Z

X

d/f ()

|Ek |k/Mf

(4.52)

k=1

df

.

(4.53)

d

56

Toeplitz Matrices

Z

X

d/f ()

(k/Mf )(1/k 1/(k + 1)/

k=1

(4.54)

X

1

= (Mf )

1/(k + 1)

k=1

which diverges so that 1/f () is not integrable. To prove (4.44) let F (x)

be continuous on [1/Mf , ], then F [min(1/x, )] is continuous on [0, Mf ]

and hence theorem 2.4 yields (4.44). Note that (4.44) implies that the

eigenvalues of Tn1 are asymptotically equally distributed up to any

finite as the eigenvalues of the sequence of matrices Tn [min(1/f, )].

2

A special case of part 4 is when Tn (f ) is finite order and f () has

at least one zero. Then the derivative exists and is bounded since

m

X

df /d =

iktk eik

k=m

m

X

|k||tk | <

k=m

The series expansion of part 2 is due to Rino [6]. The proof of part

4 is motivated by one of Widom [29]. Further results along the lines

of part 4 regarding unbounded Toeplitz matrices may be found in [11].

Related results considering asymptotically equal distributions of unbounded sequences can be found in Tyrtyshnikov [28] and Trench [22].

These works extend Weyls definition of asymptotically equal distributions to unbounded sequences and develop conditions for and implications of such equal distributions.

Extending (a) to the case of non-Hermitian matrices can be somewhat difficult, i.e., finding conditions on f () to ensure that Tn (f ) is

invertible. Parts (a)-(d) can be straightforwardly extended if f () is

continuous. For a more general discussion of inverses the interested

reader is referred to Widom [29] and the cited references. It should be

pointed out that when discussing inverses Widom is concerned with the

57

are similar. The results of Baxter [1] can also be applied to consider

the asymptotic behavior of finite inverses in quite general cases.

4.6

We next combine theorems 2.1 and lemma 4.6 to obtain the asymptotic

behavior of products of Toeplitz matrices. The case of only two matrices

is considered first since it is simpler.

and g() are two bounded Riemann integrable functions. Define Cn (f )

and Cn (g) as in (4.29) and let n,k be the eigenvalues of Tn (f )Tn (g)

(a)

Tn (f )Tn (g) Cn (f )Cn (g) = Cn (f g).

(4.55)

Tn (f )Tn (g) Tn (g)Tn (f ).

lim n

n1

X

sn,k

k=0

= (2)

[f ()g()]s d s = 1, 2, . . . .

(4.56)

(4.57)

(b) If Tn (t) and Tn (g) are Hermitian, then for any F (x) continuous on

[mf mg , Mf Mg ]

lim n

n1

X

F (n,k ) = (2)

F [f ()g()] d.

(4.58)

k=0

(c)

Tn (f )Tn (g) Tn (f g).

(4.59)

(d) Let f1 (), .., fm () be Riemann integrable. Then if the Cn (fi ) are

defined as in (4.29)

!

!

m

m

m

Y

Y

Y

Tn (fi ) Cn

fi Tn

fi .

(4.60)

i=1

i=1

i=1

58

Toeplitz Matrices

m

Y

Tn (fi ), then for any positive integer

i=1

s

lim n1

n1

X

sn,k = (2)1

k=0

m

Y

!s

fi ()

i=1

(4.61)

If the Tn (fi ) are Hermitian, then the n,k are asymptotically real,

i.e., the imaginary part converges to a distribution at zero, so that

!s

Z 2 Y

n1

m

X

s

1

1

lim n

(Re[n,k ]) = (2)

fi () d.

(4.62)

n

k=0

lim n1

n1

X

i=1

([n,k ])2 = 0.

(4.63)

k=0

Proof.

(a) Equation (4.53) follows from lemmas 4.5 and 4.6 and theorems 2.1

and 3. Equation (4.54) follows from (4.53). Note that while Toeplitz

matrices do not in general commute, asymptotically they do. Equation

(4.55) follows from (4.53), theorem 2.2, and lemma 4.4.

(b) Proof follows from (4.53) and theorem 2.4. Note that the eigenvalues of the product of two Hermitian matrices are real ([15], p. 105).

(c) Applying lemmas 4.5 and 4.6 and theorem 2.1

|Tn (f )Tn (g) Tn (f g)|

= |Tn (f )Tn (g) Cn (f )Cn (g) + Cn (f )Cn (g) Tn (f g)|

(e) Equation (4.60) follows from (d) and theorem 2.1. For the Hermitian case, however,

Y we cannot simply apply theorem 2.4 since the

eigenvalues n,k of

Tn (fi ) may not be real. We can show, however,

i

that they are asymptotically real in the sense that the imaginary part

59

vanishes in the limit. Let n,k = n,k + in,k where n,k and n,k are

real. Then from theorem 2.2 we have for any positive integer s

lim n1

n1

X

(n,k + in,k )s =

k=0

lim n1

= (2)

m

Y

fi

i=1

n1

n1

X

k=0

|n,k |2 = n1

n1

X

k=0

i=1

= (2)1

2

2n,k + n,k

2

m

Y

lim Tn (fi ) = lim Cn

n

n

s

n,k

k=0

n1

X

2

"m

#s

Y

fi () d (4.64)

i=1

. From (2.14)

2

m

Y

Tn (fi ) .

i=i

!2

m

Y

fi

i=1

!2

m

Y

fi () d

(4.65)

i=1

lim n1

n1

X

k=1

2

n,k

0.

Thus the distribution of the imaginary parts tends to the origin and

hence

#s

Z 2 "Y

n1

m

X

lim n1

sn,k = (2)1

fi () d.

n

k=0

i=1

2

Parts (d) and (e) are here proved as in Grenander and Szego ([13],

pp. 105-106.

We have developed theorems on the asymptotic behavior of eigenvalues, inverses, and products of Toeplitz matrices. The basic method

60

Toeplitz Matrices

special simple structure as developed in Chapter 3 could be directly

related to the Toeplitz matrices using the results of Chapter 2. We began with the finite order case since the appropriate circulant matrix is

there obvious and yields certain desirable properties that suggest the

corresponding circulant matrix in the infinite case. We have limited our

consideration of the infinite order case to absolutely summable coefficients or to bounded Riemann integrable functions f () for simplicity.

The more general case of square summable tk or bounded Lebesgue

integrable f () treated in Chapter 7 of [13] requires significantly more

mathematical care but can be interpreted as an extension of the approach taken here.

4.7

Toeplitz Determinants

The fundamental Toeplitz eigenvalue distribution theory has an interesting application for characterizing the limiting behavior of determinants. Suppose now that Tn (f ) is a sequence of Hermitian Toeplitz

matrices such that that f () mf > 0. Let Cn = Cn (f ) denote the

sequence of circulant matrices constructed from f as in (4.26). Then

from (4.28) the eigenvalues of Cn are f (2m/n) for m = 0, 1, . . . , n 1

Q

and hence detCn = n1

m=0 f (2m/n). This in turn implies that

1

ln (det(Cn )) n =

n1

1

1 X

m

ln detCn =

ln f (2 ).

n

n m=0

n

whence

Z 1

1

n

lim ln (det(Cn )) =

ln f (2) d.

n

positive arguments, and changing the variables of integration yields

1

lim (det(Cn )) n = e 2

R 2

0

ln f () d.

and Corollary 2.4 togther yield the following result ([13], p. 65).

61

such that ln f () is Riemann integrable and f () mf > 0. Then

1

R 2

0

ln f () d.

(4.66)

5

Applications to Stochastic Time Series

random processes. Covariance matrices of weakly stationary processes

are Toeplitz and triangular Toeplitz matrices provide a matrix representation of causal linear time invariant filters. As is well known and

as we shall show, these two types of Toeplitz matrices are intimately

related. We shall take two viewpoints in the first section of this chapter

section to show how they are related. In the first part we shall consider two common linear models of random time series and study the

asymptotic behavior of the covariance matrix, its inverse and its eigenvalues. The well known equivalence of moving average processes and

weakly stationary processes will be pointed out. The lesser known fact

that we can define something like a power spectral density for autoregressive processes even if they are nonstationary is discussed. In the

second part of the first section we take the opposite tack we start

with a Toeplitz covariance matrix and consider the asymptotic behavior of its triangular factors. This simple result provides some insight

into the asymptotic behavior of system identification algorithms and

Wiener-Hopf factorization.

The second section provides another application of the Toeplitz

63

64

Shannon information rate of a stationary Gaussian random process.

Let {Xk ; k I} be a discrete time random process. Generally we

take I = Z, the space of all integers, in which case we say that the

process is two-sided, or I = Z+ , the space of all nonnegative integers,

in which case we say that the process is one-sided. We will be interested

in vector representations of the process so we define the column vector

(ntuple) X n = (X0 , X1 , . . . , Xn1 )t , that is, X n is an n-dimensional

column vector. The mean vector is defined by mn = E(X n ), which we

usually assume is zero for convenience. The n n covariance matrix

Rn = {rj,k } is defined by

Rn = E[(X n mn )(X n mn ) ].

(5.1)

This is the autocorrelation matrix when the mean vector is zero. Subscripts will be dropped when they are clear from context. If the matrix

Rn is Toeplitz, say Rn = Tn (f ), then rk,j = rkj and the process is said

X

to be weakly stationary. In this case we can define f () =

rk eik

k=

Toeplitz but is asymptotically Toeplitz, i.e., Rn Tn (f ), then we say

that the process is asymptotically weakly stationary and once again

define f () as the power spectral density. The latter situation arises, for

example, if an otherwise stationary process is initialized with Xk = 0,

k 0. This will cause a transient and hence the process is strictly

speaking nonstationary. The transient dies out, however, and the statistics of the process approach those of a weakly stationary process as n

grows.

The results derived herein are essentially trivial if one begins and

deals only with doubly infinite matrices. As might be hoped the results

for asymptotic behavior of finite matrices are consistent with this case.

The problem is of interest since one often has finite order equations

and one wishes to know the asymptotic behavior of solutions or one

has some function defined as a limit of solutions of finite equations.

These results are useful both for finding theoretical limiting solutions

and for finding reasonable approximations for finite order solutions. So

much for philosophy. We now proceed to investigate the behavior of

65

two common linear models. For simplicity we will assume the process

means are zero.

5.1

pass a zero mean, independent identically distributed (iid) sequence of

random variables Wk with variance 2 through a linear time invariant

discrete time filtered to obtain the desired process. The process Wk is

discrete time white noise. The most common such model is called a

moving average process and is defined by the difference equation

n

n

X

X

Un =

bk Wnk =

bnk Wk

(5.2)

k=0

k=0

Un = 0; n < 0.

can incorporate b0 into 2 . Note that (5.2) is a discrete time convolution, i.e., Un is the output of a filter with impulse response (actually

Kronecker response) bk and input Wk . We could be more general by

allowing the filter bk to be noncausal and hence act on future Wk s.

We could also allow the Wk s and Uk s to extend into the infinite past

rather than being initialized. This would lead to replacing of (5.2) by

Un =

bk Wnk =

k=

bnk Wk .

(5.3)

k=

We will restrict ourselves to causal filters for simplicity and keep the

initial conditions since we are interested in limiting behavior. In addition, since stationary distributions may not exist for some models it

would be difficult to handle them unless we start at some fixed time.

For these reasons we take (5.2) as the definition of a moving average.

Since we will be studying the statistical behavior of Un as n gets

arbitrarily large, some assumption must be placed on the sequence bk

to ensure that (5.2) converges in the mean-squared sense. The weakest

possible assumption that will guarantee convergence of (5.2) is that

X

|bk |2 < .

(5.4)

k=0

66

stronger assumption

X

|bk | < .

(5.5)

k=0

Equation (5.2) can be rewritten as a matrix equation by defining

the lower triangular Toeplitz matrix

1

0

b

1

1

b1

b2

Bn = .

(5.6)

.

.

.. ..

b2

..

bn1 . . .

b 2 b1 1

so that

U n = Bn W n .

(5.7)

addition (5.3) held, i.e., we looked at the entire process at each time

instant, then (5.7) would require infinite vectors and matrices as in

Grenander and Rosenblatt [12]. Since the covariance matrix of Wk is

simply 2 In , where In is the n n identity matrix, we have for the

covariance of Un :

(n)

RU

= EU n (U n ) = EBn W n (W n ) Bn

= Bn Bn

(5.8)

or, equivalently

rk,j = 2

n1

X

bkbj

=0

(5.9)

min(k,j)

= 2

b+(kj)b

=0

From (5.9) it is clear that rk,j is not Toeplitz because of the min(k, j) in

the sum. However, as we next show, as n the upper limit becomes

67

(n)

b() =

bk eik

(5.10)

k=0

then

Bn = Tn (b)

(5.11)

RU = 2 Tn (b)Tn (b) .

(5.12)

so that

(n)

We can now apply the results of the previous sections to obtain the

following theorem.

(n)

matrix RUn . Let n,k be the eigenvalues of RU . Then

(n)

RU 2 Tn (|b|2 ) = Tn ( 2 |b|2 )

(5.13)

is any continuous function on [m, M ], then

Z 2

n1

X

lim n1

F (n,k ) = (2)1

F ( 2 |b()|2 ) d.

(5.14)

n

k=0

(n) 1

RU

2 Tn (1/|b|2 ).

(5.15)

2

If the process Un had been initiated with its stationary distribution

then we would have had exactly

(n)

RU = 2 Tn (|b|2 ).

(n) 1

can be gained from theorem

4.3, e.g., circulant approximations. Note that the spectral density of

the moving average process is 2 |b()|2 and that sums of functions of

eigenvalues tend to an integral of a function of the spectral density. In

effect the spectral density determines the asymptotic density function

for the eigenvalues of Rn and Tn .

68

5.2

Autoregressive Processes

defined by

Xn =

n

X

ak Xnk + Wn

n = 0, 1, . . .

k=1

Xn = 0

n < 0.

(5.16)

Wiener process. Equation (5.16) can be rewritten as a vector equation

by defining the lower triangular matrix.

1

a

1

0

1

a1 1

(5.17)

An =

.. ..

.

.

an1

a1 1

so that

An X n = W n .

We have

(n)

(n)

RW = An RX An

(5.18)

(n)

1

RX = 2 A1

n An

or

(5.19)

(n)

(RX )1 = 2 An An

or equivalently, if

tk,j =

(n)

(RX )1

n

X

(5.20)

= {tk,j } then

nmax(k,j)

a

mk amj =

m=0

am am+(kj) .

m=0

Unlike the moving average process, we have that the inverse covariance

matrix is the product of Toeplitz triangular matrices. Defining

a() =

X

k=0

ak eik

(5.21)

we have that

(n)

69

(5.22)

(n)

matrix RX with eigenvalues n,k . Then

(n)

(RX )1 2 Tn (|a|2 ).

(5.23)

lim n

n1

X

F (1/n,k ) = (2)

k=0

2

0

F ( 2 |a()|2 ) d,

(5.24)

(n)

(n)

RX 2 Tn (1/|a|2 )

(5.25)

Proof. Theorem 5.1.

2

Note that if |a()|2 > 0, then 1/|a()|2 is the spectral density of Xn .

(n)

If |a()|2 has a zero, then RX may not be even asymptotically Toeplitz

and hence Xn may not be asymptotically stationary (since 1/|a()|2

may not be integrable) so that strictly speaking xk will not have a

spectral density. It is often convenient, however, to define 2 /|a()|2 as

the spectral density and it often is useful for studying the eigenvalue

(n)

distribution of Rn . We can relate 2 /|a()|2 to the eigenvalues of RX

even in this case by using theorem 4.3 part 4.

Corollary 5.1. If Xk is an autoregressive process and n,k are the

(n)

eigenvalues of RX , then for any finite and any function F (x) continuous on [1/m , ]

lim n

n1

X

k=0

F [min(n,k , )] = (2)

F [min(1/|a()|2 , )] d.

(5.26)

70

2

If we consider two models of a random process to be asymptotically

equivalent if their covariances are asymptotically equivalent, then from

theorems 5.1d and 5.2 we have the following corollary.

Corollary 5.2. Consider the moving average process defined by

U n = Tn (b)W n

and the autoregressive process defined by

Tn (a)X n = W n .

Then the processes Un and Xn are asymptotically equivalent if

a() = 1/b()

and M a() m > 0 so that 1/b() is integrable.

Proof. Follows from theorems 4.3 and 4.5 and

(n)

RX

2 Tn (1/a)Tn (1/a)

2 Tn (1/a) Tn (1/a).

(5.27)

2

The methods above can also easily be applied to study the mixed

autoregressive-moving average linear models [29].

5.3

Factorization

of a sequence of Hermitian covariance matrices Tn (f ). It is well known

that any such matrix can be factored into the product of a lower triangular matrix and its conjugate transpose ([12], p. 37), in particular

Tn (f ) = {tk,j } = Bn Bn ,

(5.28)

(n)

(5.29)

5.3. Factorization

71

where (j, k) is the determinant of the matrix Tk with the right hand

column replaced by (tj,0 , tj,1 , . . . , tj,k1 )t . Note in particular that the

diagonal elements are given by

(n)

(5.30)

Equation (5.29) is the result of a Gaussian elimination or a GramSchmidt procedure. The factorization of Tn allows the construction of a

linear model of a random process and is useful in system identification

and other recursive procedures. Our question is how Bn behaves for

large n; specifically is Bn asymptotically Toeplitz?

Assume that f () m > 0. Then ln f () is integrable and we can

perform a Wiener-Hopf factorization of f (), i.e.,

f () = 2 |b()|2

b() = b()

b() =

(5.31)

bk eik

k=0

b0 = 1

From (5.28) and theorem 4.4 we have

Bn Bn = Tn (f ) Tn (b)Tn (b) .

(5.32)

Bn Tn (b).

(5.33)

since det Bn = [det Tn (f )]1/2 we have from theorem 4.3 part 1 that

det Tn (f ) 6= 0 so that Bn is invertible. Thus from theorem 2.1 (e) and

(5.32) we have

Tn1 Bn = [Bn1 Tn ]1 Tn Bn1 = [Bn1 Tn ] .

(5.34)

hence Bn Tn and [Bn1 Tn ]1 . Thus (5.34) states that a lower triangular

72

is only possible if both matrices are asymptotically equivalent to a

(n)

diagonal matrix, say Gn = {gk,k k,j }. Furthermore from (5.34) we have

Gn G1

n

n

o

(n)

|gk,k |2 k,j In .

(5.35)

Since Tn (b) is lower triangular with main diagonal element , Tn (b)1

is lower triangular with all its main diagonal elements equal to 1/ even

(n)

(n)

though the matrix Tn (b)1 is not Toeplitz. Thus gk,k = bk,k /. Since

Tn (f ) is Hermitian, bk,k is real so that taking the trace in (5.35) yields

lim 2 n1

n1

X

(n) 2

bk,k

k=0

= 1.

(5.36)

From (5.30) and Corollary 2.4, and the fact that Tn (b) is triangular

we have that

lim 1 n1

n

n1

X

k=0

(n)

n

n

= 1 = 1

(5.37)

lim |Bn1 Tn In | = 0.

(5.38)

2

Since the only real requirements for the proof were the existence of

the Wiener-Hopf factorization and the limiting behavior of the determinant, this result could easily be extended to the more general case

that ln f () is integrable. The theorem can also be derived as a special

case of more general results of Baxter [1] and is similar to a result of

Rissanen and Barbosa [18].

5.4

73

As a final application of the Toeplitz eigenvalue distribution theorem, we consider a property of a random process that arises in Shannon information theory. Given a random process {Xn } for which a

probability density function fX n (xn ) is for the random vector X n =

(X0 , X1 , . . . , Xn1 ) defined for all positive integers n, the Shannon differential entropy h(X n ) is defined by the integral

Z

n

h(X ) = fX n (xn ) log fX n (xn ) dxn

and the differential entropy rate is defined by the limit

h(X) = lim

1

h(X n )

n

if the limit exists. (See, for example, Cover and Thomas[5].) The logarithm is usually taken as base 2 and the units are bits. We will use the

Toeplitz theorem to evaluate the differential entropy rate of a stationary Gaussian random process.

A stationary zero mean Gaussian random process is completely described by its mean correlation function RX (k, m) = RX (k m) =

E[(Xk m)(Xk m)] or, equivalently, by its power spectral density

function

X

S(f ) =

RX (n)e2inf ,

n=

integer n, the probability density function is

fX n (xn ) =

1

n

n

21 (xn mn )t R1

n (x m )

e

,

(2)n/2 det(Rn )1/2

0, 1, . . . , n1. A straightforward multidimensional integration using the

properties of Gaussian random vectors yields the differential entropy

h(X n ) =

1

log(2e)n detRn .

2

If we now identify the the covariance matrix Rn as the Toeplitz matrix generated by the power spectral density, Tn (S), then from theorem

74

h(X) =

where

1

2

log(2e)

2

(5.39)

Z 2

1

=

ln S(f ) df.

(5.40)

2 0

The Toeplitz distribution theorems have also found application

in more complicated information theoretic evaluations, including the

channel capacity of Gaussian channels [27, 26] and the rate-distortion

functions of autoregressive sources [9, 14]. The discussion by Hashimoto

and Arimoto [14] claims that there is a contradiction between their

results and the results of [9] and implicitly in [27, 26], but the alleged

contradiction arises only because they attempt to apply the equal distribution theorem to a situation where it does not apply. Specifically,

they look at a limiting point where a parameter is 0 while the other

cited results require > 0. Hashimoto and Arimoto essentially demonstrate that the results are discontinuous as 0. Eventually the author and Hashimoto intend to write a correspondence item discussing

the apparent discrepancy.

2

Acknowledgements

of the Philips Research Labs in correcting many typos and errors in

the 1993 revision, Liu Mingyu in pointing out errors corrected in the

1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa for

pointing out an incorrect corollary and providing the correction, and

to David Neuhoff of the University of Michigan for pointing out several typographical errors and some confusing notation. For corrections,

comments, and improvements to the 2001 revision thanks are due to

William Trench, John Dattorro, and Young-Han Kim. In particular,

Professor Trench brought the Wielandt-Hoffman theorem and its use

to prove strengthened results to my attention. Section 2.4 largely follows his suggestions, although I take the blame for any introduced

errors. For the 2002 revision, particular thanks to Cynthia Pozun of

ENST for several corrections. For the 2005 revisions, special thanks to

Jean-Francois Chamberland-Tremblay.

75

References

Illinois J. Math., 1962, pp. 97103.

[2] G. Baxter, An Asymptotic Result for the Finite Predictor, Math. Scand., 10,

pp. 137144, 1962.

[3] A. B

ottcher and S.M. Grudsky, Toeplitz Matrices, Asymptotic Linear Algebra,

and Functional Analysis, Birkh

auser, 2000.

[4] A. B

ottcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer, New York, 1999.

[5] T. A. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New

York, 1991.

[6] P. J. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979.

[7] D. Fasino and P. Tilli, Spectral clustering properties of block multilevel Hankel

matrices, Linear Algebra and its Applications, Vol. 306, pp. 155-163, 2000.

[8] F.R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY 1960.

[9] R.M. Gray, Information Rates of Autoregressive Processes, IEEE Trans. on

Info. Theory, IT-16, No. 4, July 1970, pp. 412421.

[10] R. M. Gray, On the asymptotic eigenvalue distribution of Toeplitz matrices,

IEEE Transactions on Information Theory, Vol. 18, November 1972, pp. 725

730.

[11] R.M. Gray, On Unbounded Toeplitz Matrices and Nonstationary Time Series

with an Application to Information Theory, Information and Control, 24, pp.

181196, 1974.

[12] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series, Wiley and Sons, NY, 1966, Chapter 1.

77

78

References

o, Toeplitz Forms and Their Applications, University

of Calif. Press, Berkeley and Los Angeles, 1958.

[14] T. Hashimoto and S. Arimoto, On the rate-distortion function for the nonstationary Gaussian autoregressive process, IEEE Transactions on Information

Theory, Vol. IT-26, pp. 478-480, 1980.

[15] P. Lancaster, Theory of Matrices, Academic Press, NY, 1969.

[16] J. Pearl, On Coding and Filtering Stationary Signals by Discrete Fourier

Transform, IEEE Trans. on Info. Theory, IT-19, pp. 229232, 1973.

[17] C.L. Rino, The Inversion of Covariance Matrices by Finite Fourier Transforms, IEEE Trans. on Info. Theory, IT-16, No. 2, March 1970, pp. 230232.

[18] J. Rissanen and L. Barbosa, Properties of Infinite Covariance Matrices and

Stability of Optimum Predictors, Information Sciences, 1, 1969, pp. 221236.

[19] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, NY, 1964.

[20] W. F. Trench, Asymptotic distribution of the even and odd spectra of real

symmetric Toeplitz matrices, Linear Algebra Appl., Vol. 302-303, pp. 155-162,

1999.

[21] W. F. Trench,, Absolute equal distribution of the spectra of Hermitian matrices,

Lin. Alg. Appl., 366 (2003), 417-431.

[22] W. F. Trench, Absolute equal distribution of families of finite sets, Lin. Alg.

Appl. 367 (2003), 131-146.

[23] W. F. Trench, A note on asymptotic zero distribution of orthogonal polynomials,

Lin. Alg. Appl. 375 (2003) 275281

[24] W. F. Trench, Simplification and strengthening of Weyls definition of asymptotic equal distribution of two families of finite sets, Cubo A Mathematical

Journal Vol. 06 N 3 (2004), 4754.

[25] W. F. Trench, Absolute equal distribution of the eigenvalues of discrete Sturm

Liouville problems, J. Math. Anal. Appl., in press.

[26] B.S. Tsybakov, Transmission capacity of memoryless Gaussian vector channels, (in Russian),Probl. Peredach. Inform., Vol 1, pp. 2640, 1965.

[27] B.S. Tsybakov, On the transmission capacity of a discrete-time Gaussian channel with filter, (in Russian),Probl. Peredach. Inform., Vol 6, pp. 7882, 1970.

[28] E.E. Tyrtyshnikov, A unifying approach to some old and new theorems on

distribution and clustering, Linear Algebra and its Applications, Vol. 232, pp.

1-43, 1996.

[29] H. Widom, Toeplitz Matrices, in Studies in Real and Complex Analysis,

edited by I.I. Hirschmann, Jr., MAA Studies in Mathematics, Prentice-Hall,

Englewood Cliffs, NJ, 1965.

[30] A.J. Hoffman and H. W. Wielandt, The variation of the spectrum of a normal

matrix, Duke Math. J., Vol. 20, pp. 37-39, 1953.

[31] James H. Wilkinson, Elementary proof of the Wielandt-Hoffman theorem and

of its generalization, Stanford University, Department of Computer Science

Report Number CS-TR-70-150, January 1970 .

Index

absolutely summable Toeplitz

matrices, 41

analytic function, 16

asymptotic equivalence, 38

asymptotically

absolutely

equally distributed, 21

asymptotically equally distributed, 17, 56

asymptotically equivalent matrices, 11

asymptotically weakly stationary, 64

autocorrelation matrix, 64

autoregressive process, 68

20

characteristic equation, 5

circulant matrix, 2, 25

conjugate transpose, 6, 70

continuous, 17, 21, 22, 33, 39,

41, 48, 49, 52, 5457,

67, 69

continuous complex function,

16

convergence

uniform, 32

Courant-Fischer theorem, 6

covariance matrix, 1, 63, 64

cyclic matrix, 2

cyclic shift, 25

bounded matrix, 10

bounded Toeplitz matrices, 31

DFT, 28

diagonal, 8

differential entropy, 73

79

80

INDEX

discrete time, 63

eigenvalue, 5, 26, 31

eigenvalue distribution theorem, 40, 48

eigenvector, 5, 26

Euclidean norm, 9

factorization, 70

filter, 1

linear time invariant, 63

finite order, 31

finite order Toeplitz matrix, 36

Fourier series, 32

truncated, 44

Fourier transform

discrete, 28

Frobenius norm, 9

function, analystic, 16

Gaussian process, 73

Hermitian, 6

Hilbert-Schmidt norm, 8, 9

identity matrix, 19

impulse respone, 65

information theory, 73

inverse, 29, 31, 53

Kronecker delta, 27

Kronecker delta response, 65

linear difference equation, 26

matrix

bounded, 10

circulant, 2, 25

covariance, 1

cyclic, 2

Hermitian, 6

normal, 6

Toeplitz, 26, 31

matrix, Toeplitz, 1

mean, 64

metric, 8

moments, 15

moving average, 65

noncausal, 65

nonnegative definite, 6

nonsingular, 53

norm, 8

axioms, 10

Euclidean, 9

Frobenius, 9

Hilbert-Schmidt, 8, 9

operator, 8

strong, 8, 9

weak, 8, 9

normal, 6, 8

one-sided, 64

operator norm, 8

polynomials, 16

positive definite, 6

power specral density, 64

power spectral density, 63, 64,

73

probability mass function, 19

product, 29, 31

random process, 63

INDEX

discrete time, 64

Rayleigh quotient, 6

Riemann integrable, 48, 53, 61

Shannon information theory, 73

Shurs theorem, 8

spectrum, 53

square summable, 31

Stone-Weierstrass approximation theorem, 16

Stone-Weierstrass theorem, 40

strictly positive definite, 6

sum, 29

Taylor series, 16

time series, 63

Toeplitz determinant, 60

Toeplitz matrix, 1, 26, 31

Toeplitz matrix, finite order, 31

trace, 8

transpose, 26

triangle inequality, 10

triangular, 63, 66, 68, 70, 72

two-sided, 64

uniform convergence, 32

uniformaly bounded, 38

unitary, 6

upper triangular, 6

weak norm, 9

weakly stationary, 64

asymptotically, 64

white noise, 65

Wielandt-Hoffman theorem, 18,

20

Wiener-Hopf factorization, 63

81

## Bien plus que des documents.

Découvrez tout ce que Scribd a à offrir, dont les livres et les livres audio des principaux éditeurs.

Annulez à tout moment.