
INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION

Econometrics 2
Heino Bohn Nielsen
September 21, 2005

This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a $(*)$.

Conventions for Scalar Functions

Let $\beta = (\beta_1, \ldots, \beta_k)'$ be a $k \times 1$ vector and let $f(\beta) = f(\beta_1, \ldots, \beta_k)$ be a real-valued function that depends on $\beta$, i.e. $f(\cdot): \mathbb{R}^k \mapsto \mathbb{R}$ maps the vector $\beta$ into a single number, $f(\beta)$. Then the derivative of $f(\beta)$ with respect to $\beta$ is defined as

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} \\
\vdots \\
\frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\tag{1}
\]

This is a $k \times 1$ column vector with typical element $i$ given by the partial derivative $\frac{\partial f(\beta)}{\partial \beta_i}$. Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result.¹

¹ We can note that Wooldridge (2003, p. 783) does not follow this convention and lets $\frac{\partial f(\beta)}{\partial \beta}$ be a $1 \times k$ row vector.

Similarly, the derivative of a scalar function with respect to a row vector yields the $1 \times k$ row vector

\[
\frac{\partial f(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\]
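
As a quick illustration of the column-vector convention in (1) (a sketch, not part of the original note), the following snippet compares the analytical gradient of a simple scalar function with a finite-difference approximation. The example function $f(\beta) = \beta_1^2 + 3\beta_1\beta_2$ and the step size are arbitrary choices made for the illustration.

```python
import numpy as np

def f(beta):
    """Example scalar function f(beta) = beta_1^2 + 3*beta_1*beta_2 (arbitrary choice)."""
    return beta[0]**2 + 3.0 * beta[0] * beta[1]

def grad_f(beta):
    """Analytical gradient, stacked as a k x 1 column vector (the convention in (1))."""
    return np.array([[2.0 * beta[0] + 3.0 * beta[1]],
                     [3.0 * beta[0]]])

def numerical_gradient(func, beta, h=1e-6):
    """Central finite-difference approximation, returned as a k x 1 column vector."""
    k = beta.size
    g = np.zeros((k, 1))
    for i in range(k):
        e = np.zeros(k)
        e[i] = h
        g[i, 0] = (func(beta + e) - func(beta - e)) / (2.0 * h)
    return g

beta = np.array([1.0, 2.0])
print(grad_f(beta))                 # [[8.], [3.]]
print(numerical_gradient(f, beta))  # approximately the same
```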

Conventions for Vector Functions

Now let

\[
g(\beta) =
\begin{pmatrix}
g_1(\beta) \\
\vdots \\
g_n(\beta)
\end{pmatrix}
\]

be a vector function depending on $\beta = (\beta_1, \ldots, \beta_k)'$, i.e. $g(\cdot): \mathbb{R}^k \mapsto \mathbb{R}^n$ maps the $k \times 1$ vector $\beta$ into an $n \times 1$ vector, where $g_i(\beta) = g_i(\beta_1, \ldots, \beta_k)$, $i = 1, 2, \ldots, n$, is a real-valued function. Since $g(\beta)$ is a column vector it is natural to consider the derivatives with respect to a row vector, $\beta'$, i.e.

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_1(\beta)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_n(\beta)}{\partial \beta_k}
\end{pmatrix},
\tag{2}
\]

where each row, $i = 1, 2, \ldots, n$, contains the derivative of the scalar function $g_i(\beta)$ with respect to the elements in $\beta$. The result is therefore an $n \times k$ matrix of derivatives with typical element $(i, j)$ given by $\frac{\partial g_i(\beta)}{\partial \beta_j}$. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, $\beta$. We can note that it holds in general that

\[
\frac{\partial g(\beta)'}{\partial \beta} = \left( \frac{\partial g(\beta)}{\partial \beta'} \right)',
\tag{3}
\]

which in the case above is a $k \times n$ matrix.

Applying the conventions in (1) and (2) we can define the Hessian matrix of second derivatives of a scalar function $f(\beta)$ as

\[
\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta'} =
\frac{\partial}{\partial \beta'} \left( \frac{\partial f(\beta)}{\partial \beta} \right) =
\begin{pmatrix}
\frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k}
\end{pmatrix},
\]

which is a $k \times k$ matrix with typical element $(i, j)$ given by the second derivative $\frac{\partial^2 f(\beta)}{\partial \beta_i \partial \beta_j}$. Note that it does not matter whether we first take the derivative with respect to the column or the row.
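
To make the shape convention in (2) concrete, here is a small sketch (not from the note) that builds the $n \times k$ Jacobian of an arbitrary example function $g: \mathbb{R}^2 \to \mathbb{R}^3$ by finite differences; row $i$ is the derivative of $g_i$ with respect to the elements of $\beta$.

```python
import numpy as np

def g(beta):
    """Example vector function g: R^2 -> R^3 (arbitrary choice for illustration)."""
    return np.array([beta[0] + beta[1],
                     beta[0] * beta[1],
                     beta[1]**2])

def numerical_jacobian(func, beta, h=1e-6):
    """n x k matrix with element (i, j) = d g_i / d beta_j, as in (2)."""
    n, k = func(beta).size, beta.size
    J = np.zeros((n, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        J[:, j] = (func(beta + e) - func(beta - e)) / (2.0 * h)
    return J

beta = np.array([1.0, 2.0])
print(numerical_jacobian(g, beta))  # 3 x 2 matrix: rows are the gradients of g_1, g_2, g_3
```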

Some Special Functions

First, let $c$ be a $k \times 1$ vector and let $\beta$ be a $k \times 1$ vector of parameters. Next define the scalar function $f(\beta) = c'\beta$, which maps the $k$ parameters into a single number. It holds that

\[
\frac{\partial (c'\beta)}{\partial \beta} = c.
\tag{$*$}
\]

To see this, we can write the function as $f(\beta) = c'\beta = c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k$. Taking the derivative with respect to $\beta$ yields

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k)}{\partial \beta_1} \\
\vdots \\
\frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
c_1 \\
\vdots \\
c_k
\end{pmatrix}
= c,
\]

which is a $k \times 1$ vector as expected. Also note that since $\beta'c = c'\beta$, it holds that

\[
\frac{\partial (\beta'c)}{\partial \beta} = c.
\tag{$*$}
\]
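
A quick numerical sanity check of this rule (a sketch, not part of the note): for a randomly drawn $c$ and $\beta$, a finite-difference gradient of $f(\beta) = c'\beta$ should reproduce $c$.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
c = rng.normal(size=k)
beta = rng.normal(size=k)

f = lambda b: c @ b            # f(beta) = c'beta
h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2 * h)
                 for i in range(k)])

print(np.allclose(grad, c))    # True: d(c'beta)/d(beta) = c
```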

Now, let $A$ be an $n \times k$ matrix and let $\beta$ be a $k \times 1$ vector of parameters. Furthermore define the vector function $g(\beta) = A\beta$, which maps the $k$ parameters into $n$ function values. $g(\beta)$ is an $n \times 1$ vector and the derivative with respect to $\beta'$ is an $n \times k$ matrix given by

\[
\frac{\partial (A\beta)}{\partial \beta'} = A.
\tag{$*$}
\]

To see this, write the function as

\[
g(\beta) = A\beta =
\begin{pmatrix}
A_{11}\beta_1 + A_{12}\beta_2 + \ldots + A_{1k}\beta_k \\
\vdots \\
A_{n1}\beta_1 + A_{n2}\beta_2 + \ldots + A_{nk}\beta_k
\end{pmatrix},
\]

and find the derivative

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial (A_{11}\beta_1 + \ldots + A_{1k}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{11}\beta_1 + \ldots + A_{1k}\beta_k)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial (A_{n1}\beta_1 + \ldots + A_{nk}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{n1}\beta_1 + \ldots + A_{nk}\beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1k} \\
\vdots & \ddots & \vdots \\
A_{n1} & \cdots & A_{nk}
\end{pmatrix}
= A.
\]

Similarly, if we consider the transposed function, $g(\beta)' = \beta'A'$, which is a $1 \times n$ row vector, we can find the $k \times n$ matrix of derivatives as

\[
\frac{\partial (\beta'A')}{\partial \beta} = A'.
\tag{$*$}
\]

This is just an application of the result in (3).
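
Analogously, a finite-difference Jacobian of $g(\beta) = A\beta$ recovers $A$ (again a small sketch with arbitrary dimensions, not from the note).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 4
A = rng.normal(size=(n, k))
beta = rng.normal(size=k)

g = lambda b: A @ b            # g(beta) = A beta
h = 1e-6
# column j of the Jacobian is the derivative of g with respect to beta_j
J = np.column_stack([(g(beta + h * np.eye(k)[j]) - g(beta - h * np.eye(k)[j])) / (2 * h)
                     for j in range(k)])

print(np.allclose(J, A))       # True: d(A beta)/d(beta') = A
```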

Now consider a quadratic function $f(\beta) = \beta'V\beta$ for some $k \times k$ matrix $V$. This function maps the $k$ parameters into a single number. Here we find the derivatives as the $k \times 1$ column vector

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} = (V + V')\beta,
\tag{$*$}
\]

or the row variant

\[
\frac{\partial (\beta'V\beta)}{\partial \beta'} = \beta'(V + V').
\tag{$*$}
\]

If $V$ is symmetric this reduces to $2V\beta$ and $2\beta'V$, respectively. To see how this works, consider the simple case $k = 3$ and write the function as

\[
\beta'V\beta =
\begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix}
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= V_{11}\beta_1^2 + V_{22}\beta_2^2 + V_{33}\beta_3^2 + (V_{12} + V_{21})\beta_1\beta_2 + (V_{13} + V_{31})\beta_1\beta_3 + (V_{23} + V_{32})\beta_2\beta_3.
\]

Taking the derivative with respect to $\beta$, we get

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (\beta'V\beta)}{\partial \beta_1} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_2} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_3}
\end{pmatrix}
=
\begin{pmatrix}
2V_{11}\beta_1 + (V_{12} + V_{21})\beta_2 + (V_{13} + V_{31})\beta_3 \\
2V_{22}\beta_2 + (V_{12} + V_{21})\beta_1 + (V_{23} + V_{32})\beta_3 \\
2V_{33}\beta_3 + (V_{13} + V_{31})\beta_1 + (V_{23} + V_{32})\beta_2
\end{pmatrix}
=
\begin{pmatrix}
2V_{11} & V_{12} + V_{21} & V_{13} + V_{31} \\
V_{12} + V_{21} & 2V_{22} & V_{23} + V_{32} \\
V_{13} + V_{31} & V_{23} + V_{32} & 2V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
=
\left[
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
+
\begin{pmatrix}
V_{11} & V_{21} & V_{31} \\
V_{12} & V_{22} & V_{32} \\
V_{13} & V_{23} & V_{33}
\end{pmatrix}
\right]
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= (V + V')\beta.
\]
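
The quadratic-form rule can be verified the same way (a sketch with an arbitrary, deliberately non-symmetric $V$, so the general form $(V + V')\beta$ applies).

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
V = rng.normal(size=(k, k))    # deliberately not symmetric
beta = rng.normal(size=k)

f = lambda b: b @ V @ b        # f(beta) = beta' V beta
h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2 * h)
                 for i in range(k)])

print(np.allclose(grad, (V + V.T) @ beta))  # True: d(beta'V beta)/d(beta) = (V + V')beta
```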

The Linear Regression Model

To illustrate the use of matrix differentiation consider the linear regression model in matrix notation,

\[
Y = X\beta + \epsilon,
\]

where $Y$ is a $T \times 1$ vector of stacked left-hand-side variables, $X$ is a $T \times k$ matrix of explanatory variables, $\beta$ is a $k \times 1$ vector of parameters to be estimated, and $\epsilon$ is a $T \times 1$ vector of error terms. Here $k$ is the number of explanatory variables and $T$ is the number of observations.

One way to motivate the ordinary least squares (OLS) principle is to choose the estimator, $\widehat{\beta}_{OLS}$ of $\beta$, as the value that minimizes the sum of squared residuals, i.e.

\[
\widehat{\beta}_{OLS} = \arg\min_{\widehat{\beta}} \sum_{t=1}^{T} \widehat{\epsilon}_t^{\,2} = \arg\min_{\widehat{\beta}} \widehat{\epsilon}'\widehat{\epsilon}.
\]

Looking at the function to be minimized, we find that

\[
\widehat{\epsilon}'\widehat{\epsilon}
= \left( Y - X\widehat{\beta} \right)' \left( Y - X\widehat{\beta} \right)
= Y'Y - \widehat{\beta}'X'Y - Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta}
= Y'Y - 2Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta},
\]

where the last expression uses the fact that $Y'X\widehat{\beta}$ and $\widehat{\beta}'X'Y$ are identical scalars. Note that $\widehat{\epsilon}'\widehat{\epsilon}$ is a scalar function, and taking the first derivative with respect to the $k \times 1$ vector $\widehat{\beta}$ yields

\[
\frac{\partial \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta}}
= \frac{\partial \left( Y'Y - 2Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta} \right)}{\partial \widehat{\beta}}
= -2X'Y + 2X'X\widehat{\beta}.
\]

Solving the $k$ equations,

\[
\frac{\partial \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta}} = 0,
\]

yields the OLS estimator

\[
\widehat{\beta}_{OLS} = \left( X'X \right)^{-1} X'Y,
\]

provided that $X'X$ is non-singular.

To make sure that $\widehat{\beta}_{OLS}$ is a minimum of $\widehat{\epsilon}'\widehat{\epsilon}$ and not a maximum, we should formally take the second derivative and make sure that it is positive definite. The $k \times k$ Hessian matrix of second derivatives is given by

\[
\frac{\partial^2 \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta} \, \partial \widehat{\beta}'}
= \frac{\partial \left( -2X'Y + 2X'X\widehat{\beta} \right)}{\partial \widehat{\beta}'}
= 2X'X,
\]

which is positive definite by construction whenever $X'X$ is non-singular.
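
The closed-form estimator can be illustrated with simulated data (a sketch; the dimensions, the true $\beta$, and the noise level are arbitrary choices). The normal-equations solution $(X'X)^{-1}X'Y$ is compared with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 100, 3                           # number of observations and regressors
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # include a constant
beta_true = np.array([1.0, 2.0, -0.5])  # arbitrary true parameters
Y = X @ beta_true + rng.normal(scale=0.1, size=T)

# OLS from the normal equations: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))  # True
```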

References
[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, Second edition, John Wiley and Sons.

[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, Second edition, South-Western College Publishing.
