
MAHALANOBIS DISTANCE

Def. The Euclidean distance between two points $x = (x_1, \ldots, x_p)^t$ and $y = (y_1, \ldots, y_p)^t$ in the $p$-dimensional space $\mathbb{R}^p$ is defined as
$$d_E(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(x - y)^t (x - y)}$$
and $d_E(x, 0) = \|x\|_2 = \sqrt{x_1^2 + \cdots + x_p^2} = \sqrt{x^t x}$ is the Euclidean norm of $x$.

It follows immediately that all points with the same distance from the origin, $\|x\|_2 = c$, satisfy $x_1^2 + \cdots + x_p^2 = c^2$, which is the equation of a sphere. This means that all components of an observation $x$ contribute equally to the Euclidean distance of $x$ from the center. In statistics, however, we prefer a distance that takes the variability of each component (each variable) into account when determining its distance from the center: components with high variability should receive less weight than components with low variability. This can be obtained by rescaling the components. Denote
$$u = \left(\frac{x_1}{s_1}, \ldots, \frac{x_p}{s_p}\right)^t \quad \text{and} \quad v = \left(\frac{y_1}{s_1}, \ldots, \frac{y_p}{s_p}\right)^t$$
and then define the distance between $x$ and $y$ as
$$d(x, y) = d_E(u, v) = \sqrt{\left(\frac{x_1 - y_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p - y_p}{s_p}\right)^2} = \sqrt{(x - y)^t D^{-1} (x - y)}$$

where $D = \operatorname{diag}(s_1^2, \ldots, s_p^2)$. Now the norm of $x$ equals
$$\|x\| = d(x, 0) = d_E(u, 0) = \|u\|_2 = \sqrt{\left(\frac{x_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p}{s_p}\right)^2} = \sqrt{x^t D^{-1} x}$$
and all points with the same distance from the origin, $\|x\| = c$, satisfy
$$\left(\frac{x_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p}{s_p}\right)^2 = c^2$$

which is the equation of an ellipsoid centered at the origin whose principal axes coincide with the coordinate axes. Finally, we also want to take the correlation between variables into account when computing statistical distances. Correlation means that there are associations between the variables, and we want the axes of the ellipsoid to reflect this correlation. This is obtained by allowing the axes of the ellipsoid at constant distance to rotate, which yields the following general form for the statistical distance between two points.

Def. The statistical distance or Mahalanobis distance between two points $x = (x_1, \ldots, x_p)^t$ and $y = (y_1, \ldots, y_p)^t$ in the $p$-dimensional space $\mathbb{R}^p$ is defined as
$$d_S(x, y) = \sqrt{(x - y)^t S^{-1} (x - y)}$$
and $d_S(x, 0) = \|x\|_S = \sqrt{x^t S^{-1} x}$ is the norm of $x$.

Points with the same distance from the origin, $\|x\|_S = c$, satisfy $x^t S^{-1} x = c^2$, which is the general equation of an ellipsoid centered at the origin. In general the center of the observations will differ from the origin, and we will be interested in the distance of an observation from its center $\bar{x}$, given by
$$d_S(x, \bar{x}) = \sqrt{(x - \bar{x})^t S^{-1} (x - \bar{x})}.$$
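To make the three distances concrete, the sketch below (not part of the original notes) computes the Euclidean distance, the variance-rescaled distance with $D = \operatorname{diag}(s_1^2, \ldots, s_p^2)$, and the full Mahalanobis distance of one observation from the sample center. The data matrix X, the helper function, and the use of NumPy are all assumptions made only for this illustration.

```python
import numpy as np

# Toy data: 200 correlated observations in R^3 (invented for the illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

xbar = X.mean(axis=0)            # center of the observations
S = np.cov(X, rowvar=False)      # sample covariance matrix S

def dist(x, center, S):
    """sqrt((x - center)^t S^{-1} (x - center))."""
    diff = x - center
    return np.sqrt(diff @ np.linalg.solve(S, diff))

x = X[0]
d_E = np.linalg.norm(x - xbar)               # Euclidean distance d_E(x, xbar)
d_D = dist(x, xbar, np.diag(np.diag(S)))     # rescaled distance, D = diag(s_1^2, ..., s_p^2)
d_S = dist(x, xbar, S)                       # Mahalanobis distance d_S(x, xbar)
print(d_E, d_D, d_S)
```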

Result 1. Consider any three $p$-dimensional observations $x$, $y$ and $z$ of a $p$-dimensional random variable $X = (X_1, \ldots, X_p)^t$. The Mahalanobis distance satisfies the following properties:
$d_S(x, y) = d_S(y, x)$
$d_S(x, y) > 0$ if $x \neq y$
$d_S(x, y) = 0$ if $x = y$
$d_S(x, y) \leq d_S(x, z) + d_S(z, y)$ (triangle inequality)

MATRIX ALGEBRA

Def. A $p$-dimensional square matrix $Q$ is orthogonal if
$$Q Q^t = Q^t Q = I_p, \quad \text{or equivalently} \quad Q^t = Q^{-1}.$$
This implies that the rows of $Q$ have unit norms and are orthogonal to each other. The columns have the same property.

Def. A $p$-dimensional square matrix $A$ has an eigenvalue $\lambda$ with corresponding eigenvector $x \neq 0$ if
$$A x = \lambda x.$$
If the eigenvector $x$ is normalized, which means that $\|x\| = 1$, then we will denote the normalized eigenvector by $e$.

Result 1. A symmetric $p$-dimensional square matrix $A$ has $p$ pairs of eigenvalues and eigenvectors $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$. The eigenvectors can be chosen to be normalized ($e_1^t e_1 = \cdots = e_p^t e_p = 1$) and orthogonal ($e_i^t e_j = 0$ if $i \neq j$). If all eigenvalues are different, then the eigenvectors are unique.

Result 2. Spectral decomposition. The spectral decomposition of a $p$-dimensional symmetric square matrix $A$ is given by
$$A = \lambda_1 e_1 e_1^t + \cdots + \lambda_p e_p e_p^t$$
where $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$ are the eigenvalue/normalized eigenvector pairs of $A$.

Example. Consider the symmetric matrix
$$A = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix}$$
From the characteristic equation $|A - \lambda I_3| = 0$ we obtain the eigenvalues $\lambda_1 = 9$, $\lambda_2 = 9$, and $\lambda_3 = 18$. The corresponding normalized eigenvectors are solutions of the equations $A e_i = \lambda_i e_i$ for $i = 1, 2, 3$. For example, with $e_3 = (e_{13}, e_{23}, e_{33})^t$ the equation $A e_3 = \lambda_3 e_3$ gives
$$13 e_{13} - 4 e_{23} + 2 e_{33} = 18 e_{13}$$
$$-4 e_{13} + 13 e_{23} - 2 e_{33} = 18 e_{23}$$
$$2 e_{13} - 2 e_{23} + 10 e_{33} = 18 e_{33}$$
Solving this system of equations yields the normalized eigenvector $e_3 = (2/3, -2/3, 1/3)^t$. For the other eigenvalue $\lambda_1 = \lambda_2 = 9$ the corresponding eigenvectors are not unique. An orthogonal pair is given by $e_1 = (1/\sqrt{2}, 1/\sqrt{2}, 0)^t$ and $e_2 = (1/\sqrt{18}, -1/\sqrt{18}, -4/\sqrt{18})^t$. With these solutions it can now easily be verified that
$$A = \lambda_1 e_1 e_1^t + \lambda_2 e_2 e_2^t + \lambda_3 e_3 e_3^t.$$

Def. A symmetric $p \times p$ matrix $A$ is called nonnegative definite if $0 \leq x^t A x$ for all $x \in \mathbb{R}^p$. $A$ is called positive definite if $0 < x^t A x$ for all $x \neq 0$.
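As a numerical check of this worked example, the short sketch below (not part of the original notes; NumPy is assumed) recovers the eigenvalues 9, 9, 18 and rebuilds $A$ from its spectral decomposition.

```python
import numpy as np

A = np.array([[13., -4.,  2.],
              [-4., 13., -2.],
              [ 2., -2., 10.]])

# Eigenvalues (in ascending order) and normalized eigenvectors (columns of P).
lam, P = np.linalg.eigh(A)
print(lam)                                       # [ 9.  9. 18.]

# Rebuild A from the spectral decomposition A = sum_i lambda_i e_i e_i^t.
A_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(3))
print(np.allclose(A_rebuilt, A))                 # True
```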

It follows (from the spectral decomposition) that $A$ is positive definite if and only if all eigenvalues of $A$ are strictly positive, and $A$ is nonnegative definite if and only if all eigenvalues are greater than or equal to zero.

Remark. The Mahalanobis distance of a point was defined through $d_S^2(x, 0) = x^t S^{-1} x$, which requires $x^t S^{-1} x > 0$ for all $x \neq 0$, that is, all eigenvalues of the symmetric matrix $S^{-1}$ have to be positive.

From the spectral decomposition we obtain that a symmetric positive definite $p$-dimensional square matrix $A$ equals
$$A = \sum_{i=1}^{p} \lambda_i e_i e_i^t = P \Lambda P^t$$
with $P = (e_1, \ldots, e_p)$ a $p$-dimensional square matrix whose columns are the normalized eigenvectors of $A$ and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ a $p$-dimensional diagonal matrix whose diagonal elements are the eigenvalues of $A$. Note that $P$ is an orthogonal matrix. It follows that
$$A^{-1} = P \Lambda^{-1} P^t = \sum_{i=1}^{p} \frac{1}{\lambda_i} e_i e_i^t$$
and we define the square root of $A$ by
$$A^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, e_i e_i^t = P \Lambda^{1/2} P^t.$$

Result 3. The square root of a symmetric, positive definite $p \times p$ matrix $A$ has the following properties:
$(A^{1/2})^t = A^{1/2}$ (that is, $A^{1/2}$ is symmetric)
$A^{1/2} A^{1/2} = A$
$A^{-1/2} = (A^{1/2})^{-1} = \sum_{i=1}^{p} \frac{1}{\sqrt{\lambda_i}} e_i e_i^t = P \Lambda^{-1/2} P^t$
$A^{1/2} A^{-1/2} = A^{-1/2} A^{1/2} = I_p$ and $A^{-1/2} A^{-1/2} = A^{-1}$
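The sketch below shows how $A^{1/2}$, $A^{-1/2}$ and $A^{-1}$ can be assembled from $P$ and $\Lambda$ and how the properties in Result 3 can be verified numerically; it reuses the matrix $A$ of the example above and assumes NumPy (it is an illustration, not part of the original notes).

```python
import numpy as np

A = np.array([[13., -4.,  2.],
              [-4., 13., -2.],
              [ 2., -2., 10.]])
lam, P = np.linalg.eigh(A)                        # all eigenvalues > 0: A is positive definite

A_half     = P @ np.diag(np.sqrt(lam))     @ P.T  # A^{1/2}  = P Lambda^{1/2} P^t
A_half_inv = P @ np.diag(1 / np.sqrt(lam)) @ P.T  # A^{-1/2} = P Lambda^{-1/2} P^t
A_inv      = P @ np.diag(1 / lam)          @ P.T  # A^{-1}   = P Lambda^{-1} P^t

print(np.allclose(A_half, A_half.T))                 # (A^{1/2})^t = A^{1/2}
print(np.allclose(A_half @ A_half, A))               # A^{1/2} A^{1/2} = A
print(np.allclose(A_half @ A_half_inv, np.eye(3)))   # A^{1/2} A^{-1/2} = I_p
print(np.allclose(A_half_inv @ A_half_inv, A_inv))   # A^{-1/2} A^{-1/2} = A^{-1}
```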

Result 4. Cauchy-Schwarz inequality. Let $b, d \in \mathbb{R}^p$ be two $p$-dimensional vectors, then we have that
$$(b^t d)^2 \leq (b^t b)(d^t d)$$
with equality if and only if there exists a constant $c \in \mathbb{R}$ such that $b = c d$.

Result 5. Extended Cauchy-Schwarz inequality. Let $b, d \in \mathbb{R}^p$ be two $p$-dimensional vectors and $B$ a $p$-dimensional positive definite matrix, then
$$(b^t d)^2 \leq (b^t B b)(d^t B^{-1} d)$$
with equality if and only if there exists a constant $c \in \mathbb{R}$ such that $b = c B^{-1} d$.

Proof. The result is obvious if $b = 0$ or $d = 0$. For the other cases we use that $b^t d = b^t B^{1/2} B^{-1/2} d = (B^{1/2} b)^t (B^{-1/2} d)$ and apply the previous result to $B^{1/2} b$ and $B^{-1/2} d$.

Result 6. Maximization Lemma. Let $B$ be a $p$-dimensional positive definite matrix and $d \in \mathbb{R}^p$ a $p$-dimensional vector, then
$$\max_{x \neq 0} \frac{(x^t d)^2}{x^t B x} = d^t B^{-1} d$$
with the maximum attained when there exists a constant $c \neq 0$ such that $x = c B^{-1} d$.

Proof. From the previous result, we have that
$$(x^t d)^2 \leq (x^t B x)(d^t B^{-1} d).$$

Because $x \neq 0$ and $B$ is positive definite, $x^t B x > 0$, which yields
$$\frac{(x^t d)^2}{x^t B x} \leq d^t B^{-1} d$$
for all $x \neq 0$. From the extended Cauchy-Schwarz inequality we know that the maximum is attained for $x = c B^{-1} d$.

Result 7. Maximization of quadratic forms. Let $B$ be a $p$-dimensional positive definite matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > 0$ and associated normalized eigenvectors $e_1, \ldots, e_p$. Then
$$\max_{x \neq 0} \frac{x^t B x}{x^t x} = \lambda_1 \quad (\text{attained when } x = e_1)$$
$$\min_{x \neq 0} \frac{x^t B x}{x^t x} = \lambda_p \quad (\text{attained when } x = e_p)$$
More generally,
$$\max_{x \perp e_1, \ldots, e_k} \frac{x^t B x}{x^t x} = \lambda_{k+1} \quad (\text{attained when } x = e_{k+1},\ k = 1, \ldots, p-1).$$

Proof. We will prove the first result. Write $B = P \Lambda P^t$ and denote $y = P^t x$. Then $x \neq 0$ implies $y \neq 0$ and
$$\frac{x^t B x}{x^t x} = \frac{y^t \Lambda y}{y^t y} = \frac{\sum_{i=1}^{p} \lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} \leq \lambda_1,$$
since $\lambda_i \leq \lambda_1$ for all $i$. Now take $x = e_1$; then $y = P^t e_1 = (1, 0, \ldots, 0)^t$ such that $y^t \Lambda y / y^t y = \lambda_1$.

Remark. Note that since
$$\max_{x \neq 0} \frac{x^t B x}{x^t x} = \max_{\|x\| = 1} x^t B x$$
the previous result shows that $\lambda_1$ is the maximal value and $\lambda_p$ is the smallest value of the quadratic form $x^t B x$ on the unit sphere.
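The following sketch illustrates Result 7 numerically; the positive definite matrix $B$ is constructed arbitrarily for the occasion and NumPy is assumed (this is an illustration only, not part of the original notes). It evaluates the quotient $x^t B x / x^t x$ at random directions and at the eigenvectors.

```python
import numpy as np

# Arbitrary positive definite B, constructed for the demonstration only.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
B = M @ M.T + 4 * np.eye(4)

lam, E = np.linalg.eigh(B)       # ascending eigenvalues; last column of E is e_1, first is e_p

def quotient(x, B):
    return (x @ B @ x) / (x @ x)

# Random directions stay between lambda_p and lambda_1 ...
samples = [quotient(rng.normal(size=4), B) for _ in range(10_000)]
print(lam[0] <= min(samples), max(samples) <= lam[-1])     # True True

# ... and the bounds are attained at the corresponding eigenvectors.
print(np.isclose(quotient(E[:, -1], B), lam[-1]))          # x = e_1 gives lambda_1
print(np.isclose(quotient(E[:, 0], B), lam[0]))            # x = e_p gives lambda_p
```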

Result 8. Singular Value Decomposition. Let $A$ be an $(m \times k)$ matrix. Then there exist an $m \times m$ orthogonal matrix $U$, a $k \times k$ orthogonal matrix $V$ and an $m \times k$ matrix $\Lambda$ with entries $(i, i)$ equal to $\lambda_i \geq 0$ for $i = 1, \ldots, r = \min(m, k)$ and all other entries zero such that
$$A = U \Lambda V^t = \sum_{i=1}^{r} \lambda_i u_i v_i^t.$$
The positive constants $\lambda_i$ are called the singular values of $A$. The $(\lambda_i^2, u_i)$ are the eigenvalue/eigenvector pairs of $A A^t$ with $\lambda_{r+1} = \cdots = \lambda_m = 0$ if $m > k$, and then $v_i = \frac{1}{\lambda_i} A^t u_i$. Alternatively, the $(\lambda_i^2, v_i)$ are the eigenvalue/eigenvector pairs of $A^t A$ with $\lambda_{r+1} = \cdots = \lambda_k = 0$ if $k > m$.
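As a sanity check of Result 8, the sketch below (an arbitrary matrix and NumPy are assumed; it is not part of the original notes) verifies the reconstruction $A = U \Lambda V^t$, the relation between the singular values of $A$ and the eigenvalues of $A^t A$, and the identity $v_i = \frac{1}{\lambda_i} A^t u_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))                          # m = 5, k = 3, r = min(m, k) = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)     # A = U diag(s) V^t, singular values s
print(np.allclose(U @ np.diag(s) @ Vt, A))           # the decomposition reproduces A

# s_i^2 are the eigenvalues of A^t A (and the nonzero eigenvalues of A A^t).
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A)))

# v_i = (1/lambda_i) A^t u_i, i.e. V^t = diag(1/s) U^t A.
print(np.allclose(Vt, np.diag(1 / s) @ U.T @ A))
```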

RANDOM VECTORS

Suppose that $X = (X_1, \ldots, X_p)^t$ is a $p$-dimensional vector of random variables, also called a random vector. Each of the components of $X$ is a univariate random variable $X_j$ ($j = 1, \ldots, p$) with its own marginal distribution having expected value $\mu_j = E[X_j]$ and variance $\sigma_j^2 = E[(X_j - \mu_j)^2]$. The expected value of $X$ is then defined as the vector of expected values of its components, that is
$$E[X] = (E[X_1], \ldots, E[X_p])^t = (\mu_1, \ldots, \mu_p)^t = \mu.$$
The population covariance matrix of $X$ is defined as
$$\operatorname{Cov}[X] = E[(X - \mu)(X - \mu)^t] = \Sigma.$$
That is, the diagonal elements of $\Sigma$ equal $E[(X_j - \mu_j)^2] = \sigma_j^2$. The off-diagonal elements of $\Sigma$ equal $E[(X_j - \mu_j)(X_k - \mu_k)] = \operatorname{Cov}(X_j, X_k) = \sigma_{jk}$ ($j \neq k$), the covariance between the variables $X_j$ and $X_k$. Note that $\sigma_{jj} = \sigma_j^2$. The population correlation between two variables $X_j$ and $X_k$ is defined as
$$\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}$$
and measures the amount of linear association between the two variables. The population correlation matrix of $X$ is then defined as
$$\rho = \begin{pmatrix} 1 & \rho_{12} & \ldots & \rho_{1p} \\ \rho_{21} & 1 & \ldots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \ldots & 1 \end{pmatrix} = V^{-1} \Sigma V^{-1}$$
with $V = \operatorname{diag}(\sigma_1, \ldots, \sigma_p)$. It follows that $\Sigma = V \rho V$.

Result 1. Linear combinations of random vectors. Consider $X$ a $p$-dimensional random vector and $c \in \mathbb{R}^p$; then $c^t X$ is a one-dimensional random variable with
$$E[c^t X] = c^t \mu, \qquad \operatorname{Var}[c^t X] = c^t \Sigma c.$$
In general, if $C \in \mathbb{R}^{q \times p}$ then $C X$ is a $q$-dimensional random vector with
$$E[C X] = C \mu, \qquad \operatorname{Cov}[C X] = C \Sigma C^t.$$
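A brief sketch of this result (the values of $\mu$, $\Sigma$ and $C$ are invented for the illustration, NumPy is assumed, and the block is not part of the original notes): the population moments of $CX$ follow directly from the formulas, and a Monte Carlo simulation with a multivariate normal $X$ gives matching empirical moments.

```python
import numpy as np

# Invented population parameters for the illustration.
mu    = np.array([1.0, -2.0, 0.5])                 # E[X] = mu
Sigma = np.array([[ 4.0,  1.0,  0.5],
                  [ 1.0,  9.0, -1.0],
                  [ 0.5, -1.0,  2.0]])             # Cov[X] = Sigma

C = np.array([[1.0, 1.0,  0.0],
              [0.0, 2.0, -1.0]])                   # C in R^{q x p}, here q = 2, p = 3

print(C @ mu)              # E[CX]   = C mu
print(C @ Sigma @ C.T)     # Cov[CX] = C Sigma C^t

# Monte Carlo check with X ~ N(mu, Sigma): empirical moments of Y = CX agree.
rng = np.random.default_rng(3)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = X @ C.T
print(Y.mean(axis=0))                  # close to C mu
print(np.cov(Y, rowvar=False))         # close to C Sigma C^t
```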
