
INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION

Econometrics 2
Heino Bohn Nielsen
September 21, 2005

This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a $(*)$.

Conventions for Scalar Functions

Let $\beta = (\beta_1, \ldots, \beta_k)'$ be a $k \times 1$ vector and let $f(\beta) = f(\beta_1, \ldots, \beta_k)$ be a real-valued function that depends on $\beta$, i.e. $f(\cdot): \mathbb{R}^k \mapsto \mathbb{R}$ maps the vector $\beta$ into a single number, $f(\beta)$. Then the derivative of $f(\beta)$ with respect to $\beta$ is defined as

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} \\
\vdots \\
\frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\tag{1}
\]

This is a $k \times 1$ column vector with typical element $i$ given by the partial derivative $\frac{\partial f(\beta)}{\partial \beta_i}$. Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result.¹

¹ We can note that Wooldridge (2003, p. 783) does not follow this convention and lets $\frac{\partial f(\beta)}{\partial \beta}$ be a $1 \times k$ row vector.

Similarly, the derivative of a scalar function with respect to a row vector yields the $1 \times k$ row vector

\[
\frac{\partial f(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\]
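
As a quick illustration of the column-vector convention in (1) (a sketch, not part of the original note), the following snippet compares the analytical gradient of a simple scalar function with a finite-difference approximation. The example function $f(\beta) = \beta_1^2 + 3\beta_1\beta_2$ and the step size are arbitrary choices made for the illustration.

```python
import numpy as np

def f(beta):
    """Example scalar function f(beta) = beta_1^2 + 3*beta_1*beta_2 (arbitrary choice)."""
    return beta[0]**2 + 3.0 * beta[0] * beta[1]

def grad_f(beta):
    """Analytical gradient, stacked as a k x 1 column vector (the convention in (1))."""
    return np.array([[2.0 * beta[0] + 3.0 * beta[1]],
                     [3.0 * beta[0]]])

def numerical_gradient(func, beta, h=1e-6):
    """Central finite-difference approximation, returned as a k x 1 column vector."""
    k = beta.size
    g = np.zeros((k, 1))
    for i in range(k):
        e = np.zeros(k)
        e[i] = h
        g[i, 0] = (func(beta + e) - func(beta - e)) / (2.0 * h)
    return g

beta = np.array([1.0, 2.0])
print(grad_f(beta))                 # [[8.], [3.]]
print(numerical_gradient(f, beta))  # approximately the same
```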

Conventions for Vector Functions

Now let

\[
g(\beta) =
\begin{pmatrix}
g_1(\beta) \\
\vdots \\
g_n(\beta)
\end{pmatrix}
\]

be a vector function depending on $\beta = (\beta_1, \ldots, \beta_k)'$, i.e. $g(\cdot): \mathbb{R}^k \mapsto \mathbb{R}^n$ maps the $k \times 1$ vector $\beta$ into an $n \times 1$ vector, where $g_i(\beta) = g_i(\beta_1, \ldots, \beta_k)$, $i = 1, 2, \ldots, n$, is a real-valued function. Since $g(\beta)$ is a column vector it is natural to consider the derivatives with respect to a row vector, $\beta'$, i.e.

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_1(\beta)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_n(\beta)}{\partial \beta_k}
\end{pmatrix},
\tag{2}
\]

where each row, $i = 1, 2, \ldots, n$, contains the derivative of the scalar function $g_i(\beta)$ with respect to the elements in $\beta$. The result is therefore an $n \times k$ matrix of derivatives with typical element $(i, j)$ given by $\frac{\partial g_i(\beta)}{\partial \beta_j}$. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, $\beta$. We can note that it holds in general that

\[
\frac{\partial g(\beta)'}{\partial \beta} = \left( \frac{\partial g(\beta)}{\partial \beta'} \right)',
\tag{3}
\]

which in the case above is a $k \times n$ matrix.

Applying the conventions in (1) and (2) we can define the Hessian matrix of second derivatives of a scalar function $f(\beta)$ as

\[
\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta'} =
\frac{\partial}{\partial \beta'} \left( \frac{\partial f(\beta)}{\partial \beta} \right) =
\begin{pmatrix}
\frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k}
\end{pmatrix},
\]

which is a $k \times k$ matrix with typical element $(i, j)$ given by the second derivative $\frac{\partial^2 f(\beta)}{\partial \beta_i \partial \beta_j}$. Note that it does not matter whether we first take the derivative with respect to the column or the row.
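
To make the shape convention in (2) concrete, here is a small sketch (not from the note) that builds the $n \times k$ Jacobian of an arbitrary example function $g: \mathbb{R}^2 \to \mathbb{R}^3$ by finite differences; row $i$ is the derivative of $g_i$ with respect to the elements of $\beta$.

```python
import numpy as np

def g(beta):
    """Example vector function g: R^2 -> R^3 (arbitrary choice for illustration)."""
    return np.array([beta[0] + beta[1],
                     beta[0] * beta[1],
                     beta[1]**2])

def numerical_jacobian(func, beta, h=1e-6):
    """n x k matrix with element (i, j) = d g_i / d beta_j, as in (2)."""
    n, k = func(beta).size, beta.size
    J = np.zeros((n, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        J[:, j] = (func(beta + e) - func(beta - e)) / (2.0 * h)
    return J

beta = np.array([1.0, 2.0])
print(numerical_jacobian(g, beta))  # 3 x 2 matrix: rows are the gradients of g_1, g_2, g_3
```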

Some Special Functions

First, let $c$ be a $k \times 1$ vector and let $\beta$ be a $k \times 1$ vector of parameters. Next define the scalar function $f(\beta) = c'\beta$, which maps the $k$ parameters into a single number. It holds that

\[
\frac{\partial (c'\beta)}{\partial \beta} = c.
\tag{$*$}
\]

To see this, we can write the function as $f(\beta) = c'\beta = c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k$. Taking the derivative with respect to $\beta$ yields

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k)}{\partial \beta_1} \\
\vdots \\
\frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \ldots + c_k \beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
c_1 \\
\vdots \\
c_k
\end{pmatrix}
= c,
\]

which is a $k \times 1$ vector as expected. Also note that since $\beta'c = c'\beta$, it holds that

\[
\frac{\partial (\beta'c)}{\partial \beta} = c.
\tag{$*$}
\]
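
A quick numerical sanity check of this rule (a sketch, not part of the note): for a randomly drawn $c$ and $\beta$, a finite-difference gradient of $f(\beta) = c'\beta$ should reproduce $c$.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
c = rng.normal(size=k)
beta = rng.normal(size=k)

f = lambda b: c @ b            # f(beta) = c'beta
h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2 * h)
                 for i in range(k)])

print(np.allclose(grad, c))    # True: d(c'beta)/d(beta) = c
```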

Now, let $A$ be an $n \times k$ matrix and let $\beta$ be a $k \times 1$ vector of parameters. Furthermore define the vector function $g(\beta) = A\beta$, which maps the $k$ parameters into $n$ function values. $g(\beta)$ is an $n \times 1$ vector and the derivative with respect to $\beta'$ is an $n \times k$ matrix given by

\[
\frac{\partial (A\beta)}{\partial \beta'} = A.
\tag{$*$}
\]

To see this, write the function as

\[
g(\beta) = A\beta =
\begin{pmatrix}
A_{11}\beta_1 + A_{12}\beta_2 + \ldots + A_{1k}\beta_k \\
\vdots \\
A_{n1}\beta_1 + A_{n2}\beta_2 + \ldots + A_{nk}\beta_k
\end{pmatrix},
\]

and find the derivative

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial (A_{11}\beta_1 + \ldots + A_{1k}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{11}\beta_1 + \ldots + A_{1k}\beta_k)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial (A_{n1}\beta_1 + \ldots + A_{nk}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{n1}\beta_1 + \ldots + A_{nk}\beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1k} \\
\vdots & \ddots & \vdots \\
A_{n1} & \cdots & A_{nk}
\end{pmatrix}
= A.
\]

Similarly, if we consider the transposed function, $g(\beta)' = \beta'A'$, which is a $1 \times n$ row vector, we can find the $k \times n$ matrix of derivatives as

\[
\frac{\partial (\beta'A')}{\partial \beta} = A'.
\tag{$*$}
\]

This is just an application of the result in (3).
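
Analogously, a finite-difference Jacobian of $g(\beta) = A\beta$ recovers $A$ (again a small sketch with arbitrary dimensions, not from the note).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 4
A = rng.normal(size=(n, k))
beta = rng.normal(size=k)

g = lambda b: A @ b            # g(beta) = A beta
h = 1e-6
# column j of the Jacobian is the derivative of g with respect to beta_j
J = np.column_stack([(g(beta + h * np.eye(k)[j]) - g(beta - h * np.eye(k)[j])) / (2 * h)
                     for j in range(k)])

print(np.allclose(J, A))       # True: d(A beta)/d(beta') = A
```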

Now consider a quadratic function $f(\beta) = \beta'V\beta$ for some $k \times k$ matrix $V$. This function maps the $k$ parameters into a single number. Here we find the derivatives as the $k \times 1$ column vector

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} = (V + V')\beta,
\tag{$*$}
\]

or the row variant

\[
\frac{\partial (\beta'V\beta)}{\partial \beta'} = \beta'(V + V').
\tag{$*$}
\]

If $V$ is symmetric this reduces to $2V\beta$ and $2\beta'V$, respectively. To see how this works, consider the simple case $k = 3$ and write the function as

\[
\beta'V\beta =
\begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix}
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= V_{11}\beta_1^2 + V_{22}\beta_2^2 + V_{33}\beta_3^2 + (V_{12} + V_{21})\beta_1\beta_2 + (V_{13} + V_{31})\beta_1\beta_3 + (V_{23} + V_{32})\beta_2\beta_3.
\]

Taking the derivative with respect to $\beta$, we get

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (\beta'V\beta)}{\partial \beta_1} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_2} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_3}
\end{pmatrix}
=
\begin{pmatrix}
2V_{11}\beta_1 + (V_{12} + V_{21})\beta_2 + (V_{13} + V_{31})\beta_3 \\
2V_{22}\beta_2 + (V_{12} + V_{21})\beta_1 + (V_{23} + V_{32})\beta_3 \\
2V_{33}\beta_3 + (V_{13} + V_{31})\beta_1 + (V_{23} + V_{32})\beta_2
\end{pmatrix}
=
\begin{pmatrix}
2V_{11} & V_{12} + V_{21} & V_{13} + V_{31} \\
V_{12} + V_{21} & 2V_{22} & V_{23} + V_{32} \\
V_{13} + V_{31} & V_{23} + V_{32} & 2V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
=
\left[
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
+
\begin{pmatrix}
V_{11} & V_{21} & V_{31} \\
V_{12} & V_{22} & V_{32} \\
V_{13} & V_{23} & V_{33}
\end{pmatrix}
\right]
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= (V + V')\beta.
\]
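
The quadratic-form rule can be verified the same way (a sketch with an arbitrary, deliberately non-symmetric $V$, so the general form $(V + V')\beta$ applies).

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
V = rng.normal(size=(k, k))    # deliberately not symmetric
beta = rng.normal(size=k)

f = lambda b: b @ V @ b        # f(beta) = beta' V beta
h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2 * h)
                 for i in range(k)])

print(np.allclose(grad, (V + V.T) @ beta))  # True: d(beta'V beta)/d(beta) = (V + V')beta
```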

The Linear Regression Model

To illustrate the use of matrix differentiation consider the linear regression model in matrix notation,

\[
Y = X\beta + \epsilon,
\]

where $Y$ is a $T \times 1$ vector of stacked left-hand-side variables, $X$ is a $T \times k$ matrix of explanatory variables, $\beta$ is a $k \times 1$ vector of parameters to be estimated, and $\epsilon$ is a $T \times 1$ vector of error terms. Here $k$ is the number of explanatory variables and $T$ is the number of observations.

One way to motivate the ordinary least squares (OLS) principle is to choose the estimator, $\widehat{\beta}_{OLS}$ of $\beta$, as the value that minimizes the sum of squared residuals, i.e.

\[
\widehat{\beta}_{OLS} = \arg\min_{\widehat{\beta}} \sum_{t=1}^{T} \widehat{\epsilon}_t^{\,2} = \arg\min_{\widehat{\beta}} \widehat{\epsilon}'\widehat{\epsilon}.
\]

Looking at the function to be minimized, we find that

\[
\widehat{\epsilon}'\widehat{\epsilon}
= \left( Y - X\widehat{\beta} \right)' \left( Y - X\widehat{\beta} \right)
= Y'Y - \widehat{\beta}'X'Y - Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta}
= Y'Y - 2Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta},
\]

where the last expression uses the fact that $Y'X\widehat{\beta}$ and $\widehat{\beta}'X'Y$ are identical scalars. Note that $\widehat{\epsilon}'\widehat{\epsilon}$ is a scalar function, and taking the first derivative with respect to the $k \times 1$ vector $\widehat{\beta}$ yields

\[
\frac{\partial \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta}}
= \frac{\partial \left( Y'Y - 2Y'X\widehat{\beta} + \widehat{\beta}'X'X\widehat{\beta} \right)}{\partial \widehat{\beta}}
= -2X'Y + 2X'X\widehat{\beta}.
\]

Solving the $k$ equations,

\[
\frac{\partial \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta}} = 0,
\]

yields the OLS estimator

\[
\widehat{\beta}_{OLS} = \left( X'X \right)^{-1} X'Y,
\]

provided that $X'X$ is non-singular.

To make sure that $\widehat{\beta}_{OLS}$ is a minimum of $\widehat{\epsilon}'\widehat{\epsilon}$ and not a maximum, we should formally take the second derivative and make sure that it is positive definite. The $k \times k$ Hessian matrix of second derivatives is given by

\[
\frac{\partial^2 \widehat{\epsilon}'\widehat{\epsilon}}{\partial \widehat{\beta} \, \partial \widehat{\beta}'}
= \frac{\partial \left( -2X'Y + 2X'X\widehat{\beta} \right)}{\partial \widehat{\beta}'}
= 2X'X,
\]

which is positive definite by construction whenever $X'X$ is non-singular.
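
The closed-form estimator can be illustrated with simulated data (a sketch; the dimensions, the true $\beta$, and the noise level are arbitrary choices). The normal-equations solution $(X'X)^{-1}X'Y$ is compared with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 100, 3                           # number of observations and regressors
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # include a constant
beta_true = np.array([1.0, 2.0, -0.5])  # arbitrary true parameters
Y = X @ beta_true + rng.normal(scale=0.1, size=T)

# OLS from the normal equations: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))  # True
```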

References
[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, Second edition, John Wiley and Sons.

[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, Second edition, South-Western College Publishing.
