Matrix calculus
From too much study, and from extreme passion, cometh madnesse.
− Isaac Newton [179, §5]
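while the second-order gradient of the twice differentiable real function with respect to its
vector argument is traditionally called the Hessian;
\[
\nabla^2 f(x) \triangleq
\begin{bmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\
\frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(x)}{\partial x_K \partial x_1} & \frac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix}
\in \mathbb{S}^K \tag{1956}
\]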
Dattorro, Convex Optimization Euclidean Distance Geometry 2ε, Mεβoo, v2018.09.21. 549
550 APPENDIX D. MATRIX CALCULUS
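\[
\nabla^2 h(x) \triangleq
\begin{bmatrix}
\nabla\frac{\partial h_1(x)}{\partial x_1} & \nabla\frac{\partial h_2(x)}{\partial x_1} & \cdots & \nabla\frac{\partial h_N(x)}{\partial x_1} \\
\nabla\frac{\partial h_1(x)}{\partial x_2} & \nabla\frac{\partial h_2(x)}{\partial x_2} & \cdots & \nabla\frac{\partial h_N(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\nabla\frac{\partial h_1(x)}{\partial x_K} & \nabla\frac{\partial h_2(x)}{\partial x_K} & \cdots & \nabla\frac{\partial h_N(x)}{\partial x_K}
\end{bmatrix} \tag{1960}
\]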
where the gradient of each real entry is with respect to vector x as in (1955).
The gradient of real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ on matrix domain is
where gradient $\nabla_{X(:,i)}$ is with respect to the i-th column of X. The strange appearance of
(1961) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not
a diagonal matrix). The second-order gradient has representation
D.1 The word matrix comes from the Latin for womb ; related to the prefix matri- derived from mater
meaning mother.
D.1. GRADIENT, DIRECTIONAL DERIVATIVE, TAYLOR SERIES 551
and so on.
Because gradient of the product (1969) requires total change with respect to change in
each entry of matrix X , the Xb vector must make an inner product with each vector in
that second dimension of the cubix indicated by dotted line segments;
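\[
\begin{aligned}
\nabla_X(X^{T}a)\,Xb
% each ordered pair lists the two cubix entries along the dimension drawn with dotted lines
&= \begin{bmatrix} (a_1,\,0) & (0,\,a_1) \\ (a_2,\,0) & (0,\,a_2) \end{bmatrix}
\begin{bmatrix} b_1X_{11}+b_2X_{12} \\ b_1X_{21}+b_2X_{22} \end{bmatrix}
\in \mathbb{R}^{2\times 1\times 2} \\
&= \begin{bmatrix}
a_1(b_1X_{11}+b_2X_{12}) & a_1(b_1X_{21}+b_2X_{22}) \\
a_2(b_1X_{11}+b_2X_{12}) & a_2(b_1X_{21}+b_2X_{22})
\end{bmatrix} \in \mathbb{R}^{2\times 2} \\
&= ab^{T}X^{T}
\end{aligned} \tag{1973}
\]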
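where the cubix appears as a complete 2 × 2 × 2 matrix. In like manner for the second
term $\nabla_X(g)\,f$
\[
\begin{aligned}
\nabla_X(Xb)\,X^{T}a
% each ordered pair lists the two cubix entries along the dimension drawn with dotted lines
&= \begin{bmatrix} (b_1,\,0) & (b_2,\,0) \\ (0,\,b_1) & (0,\,b_2) \end{bmatrix}
\begin{bmatrix} X_{11}a_1+X_{21}a_2 \\ X_{12}a_1+X_{22}a_2 \end{bmatrix}
\in \mathbb{R}^{2\times 1\times 2} \\
&= X^{T}ab^{T} \in \mathbb{R}^{2\times 2}
\end{aligned} \tag{1974}
\]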
The solution
\[
\nabla_X\, a^{T}X^{2}b = ab^{T}X^{T} + X^{T}ab^{T} \tag{1975}
\]
can be found from Table D.2.1 or verified using (1968). □
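As a numerical sanity check of (1975), the following sketch compares the closed-form gradient with central finite differences taken entry by entry. It uses only the Python standard library; the helper names (`matmul`, `grad_formula`, `grad_numeric`) and the test data are illustrative assumptions, not part of the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def g(X, a, b):
    """Real function g(X) = a^T X^2 b."""
    X2 = matmul(X, X)
    return sum(a[i] * X2[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def grad_formula(X, a, b):
    """Closed form (1975): a b^T X^T + X^T a b^T."""
    abT, Xt = outer(a, b), transpose(X)
    left, right = matmul(abT, Xt), matmul(Xt, abT)
    n = len(X)
    return [[left[k][l] + right[k][l] for l in range(n)] for k in range(n)]

def grad_numeric(X, a, b, h=1e-5):
    """Central difference in each entry X_kl."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][l] += h
            Xm[k][l] -= h
            G[k][l] = (g(Xp, a, b) - g(Xm, a, b)) / (2 * h)
    return G

# Hypothetical test data
X = [[1.0, 2.0], [-0.5, 3.0]]
a = [0.7, -1.2]
b = [2.0, 0.4]

max_err = max(abs(p - q)
              for rp, rq in zip(grad_formula(X, a, b), grad_numeric(X, a, b))
              for p, q in zip(rp, rq))
print("largest entrywise discrepancy:", max_err)
```

Because g is quadratic in X, the central difference is exact up to rounding, so the discrepancy should sit near machine precision scaled by 1/h.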
\[
\nabla_X\, g\big(f(X)^{T},\, h(X)^{T}\big) = \nabla_X f^{T}\,\nabla_f\, g + \nabla_X h^{T}\,\nabla_h\, g \tag{1987}
\]
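where $e_k$ is the k-th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the l-th standard basis vector in
$\mathbb{R}^L$. The total number of partial derivatives equals KLMN while the gradient is defined in
their terms; the mn-th entry of the gradient is
\[
\nabla g_{mn}(X) =
\begin{bmatrix}
\frac{\partial g_{mn}(X)}{\partial X_{11}} & \frac{\partial g_{mn}(X)}{\partial X_{12}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{1L}} \\
\frac{\partial g_{mn}(X)}{\partial X_{21}} & \frac{\partial g_{mn}(X)}{\partial X_{22}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{2L}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial g_{mn}(X)}{\partial X_{K1}} & \frac{\partial g_{mn}(X)}{\partial X_{K2}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}
\in \mathbb{R}^{K\times L} \tag{1994}
\]
which may be interpreted as the change in $g_{mn}$ at X when the change in $X_{kl}$ is equal
to $Y_{kl}$, the kl-th entry of any $Y \in \mathbb{R}^{K\times L}$. Because the total change in $g_{mn}(X)$ due to Y is
the sum of change with respect to each and every $X_{kl}$, the mn-th entry of the directional
derivative is the corresponding total differential [393, §15.8]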
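\[
dg_{mn}(X)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X)}{\partial X_{kl}}\, Y_{kl} = \operatorname{tr}\big(\nabla g_{mn}(X)^{T} Y\big) \tag{1998}
\]
\[
= \sum_{k,l} \lim_{\Delta t\to 0} \frac{g_{mn}(X + \Delta t\, Y_{kl}\, e_k e_l^{T}) - g_{mn}(X)}{\Delta t} \tag{1999}
\]
\[
= \lim_{\Delta t\to 0} \frac{g_{mn}(X + \Delta t\, Y) - g_{mn}(X)}{\Delta t} \tag{2000}
\]
\[
= \left.\frac{d}{dt}\right|_{t=0} g_{mn}(X + t\,Y) \tag{2001}
\]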
where $t \in \mathbb{R}$. Assuming finite Y, equation (2000) is called the Gâteaux differential
[43, App.A.5] [230, §D.2.1] [405, §5.28] whose existence is implied by existence of the
Fréchet differential (the sum in (1998)). [285, §7.2] Each may be understood as the change
in $g_{mn}$ at X when the change in X is equal in magnitude and direction to Y (footnote D.2). Hence
the directional derivative,
\[
\overset{\rightarrow Y}{dg}(X) \triangleq
\left.\begin{bmatrix}
dg_{11}(X) & dg_{12}(X) & \cdots & dg_{1N}(X) \\
dg_{21}(X) & dg_{22}(X) & \cdots & dg_{2N}(X) \\
\vdots & \vdots & & \vdots \\
dg_{M1}(X) & dg_{M2}(X) & \cdots & dg_{MN}(X)
\end{bmatrix}\right|_{dX\to Y}
\in \mathbb{R}^{M\times N}
\]
Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$
\[
g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + O(t^2) \tag{2004}
\]
which is the first-order multidimensional Taylor series expansion about X. [393, §18.4]
[177, §2.3.4] Differentiation with respect to t and subsequent t-zeroing isolates the second
term of expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in t is an operation
equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in
(2001). So the directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ in any direction $Y \in \mathbb{R}^{K\times L}$
evaluated at $X \in \operatorname{dom} g$ becomes
\[
\overset{\rightarrow Y}{dg}(X) = \left.\frac{d}{dt}\right|_{t=0} g(X + t\,Y) \in \mathbb{R}^{M\times N} \tag{2005}
\]
D.2 Although Y is a matrix, we may regard it as a vector in $\mathbb{R}^{KL}$.
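[Figure: graph of f(x) with hyperplane ∂H passing through the point (α, f(α)); labels shown are ∇_x f(α), f(α + t y), υ^T, and the vector υ ≜ (∇_x f(α), ½ df(α)) with the directional derivative df(α) taken in direction ∇_x f(α).]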
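[315, §2.1, §5.4.5] [36, §6.3.1] which is simplest. In case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$
\[
\overset{\rightarrow Y}{dg}(X) = \operatorname{tr}\big(\nabla g(X)^{T} Y\big) \tag{2027}
\]
In case $g(X) : \mathbb{R}^{K} \to \mathbb{R}$
\[
\overset{\rightarrow Y}{dg}(X) = \nabla g(X)^{T} Y \tag{2030}
\]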
Unlike gradient, directional derivative does not expand dimension; directional
derivative (2005) retains the dimensions of g. The derivative with respect to t makes
the directional derivative resemble ordinary calculus (§D.2); e.g., when g(X) is linear,
$\overset{\rightarrow Y}{dg}(X) = g(Y)$. [285, §7.2]
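Both remarks can be illustrated numerically. The sketch below takes a linear map g(X) = AXB (the matrices A, B and the sizes K=2, L=3, M=3, N=2 are hypothetical choices made only for illustration), approximates the derivative in t by a central difference, and compares the result with g(Y).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def axpy(X, t, Y):
    """Return X + t*Y entrywise."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical data: A is M x K and B is L x N
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
B = [[1.0, 0.0], [2.0, 1.0], [-1.0, 4.0]]

def g(X):
    """Linear map g(X) = A X B from R^{2x3} to R^{3x2}."""
    return matmul(matmul(A, X), B)

X = [[0.3, -1.0, 2.0], [1.5, 0.2, -0.7]]
Y = [[1.0, 0.5, -2.0], [0.0, 3.0, 1.0]]

# Central difference in t at t = 0 approximates d/dt g(X + tY)
h = 1e-3
gp, gm = g(axpy(X, h, Y)), g(axpy(X, -h, Y))
dg = [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(gp, gm)]

gY = g(Y)
shape = (len(dg), len(dg[0]))
max_err = max(abs(d - v) for rd, rv in zip(dg, gY) for d, v in zip(rd, rv))
print("shape of directional derivative:", shape)
print("largest deviation from g(Y):", max_err)
```

The directional derivative has the M × N shape of g itself, whereas the gradient of the same function would carry the additional K × L dimensions.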
\[
\overset{\rightarrow X - X^{\star}}{df}(X) \geq 0 \tag{2006}
\]
⋄
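\[
\nabla^2 g(X)^{T_1} =
\begin{bmatrix}
\nabla\frac{\partial g(X)}{\partial X_{11}} & \nabla\frac{\partial g(X)}{\partial X_{12}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{1L}} \\
\nabla\frac{\partial g(X)}{\partial X_{21}} & \nabla\frac{\partial g(X)}{\partial X_{22}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{2L}} \\
\vdots & \vdots & & \vdots \\
\nabla\frac{\partial g(X)}{\partial X_{K1}} & \nabla\frac{\partial g(X)}{\partial X_{K2}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix}
\in \mathbb{R}^{K\times L\times M\times N\times K\times L} \tag{2015}
\]
\[
\nabla^2 g(X)^{T_2} =
\begin{bmatrix}
\frac{\partial \nabla g(X)}{\partial X_{11}} & \frac{\partial \nabla g(X)}{\partial X_{12}} & \cdots & \frac{\partial \nabla g(X)}{\partial X_{1L}} \\
\frac{\partial \nabla g(X)}{\partial X_{21}} & \frac{\partial \nabla g(X)}{\partial X_{22}} & \cdots & \frac{\partial \nabla g(X)}{\partial X_{2L}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial \nabla g(X)}{\partial X_{K1}} & \frac{\partial \nabla g(X)}{\partial X_{K2}} & \cdots & \frac{\partial \nabla g(X)}{\partial X_{KL}}
\end{bmatrix}
\in \mathbb{R}^{K\times L\times K\times L\times M\times N} \tag{2016}
\]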
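Assuming the limits to exist, we may state the partial derivative of the mn-th entry of g
with respect to kl-th and ij-th entries of X;
\[
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij}
= \lim_{\Delta\tau,\,\Delta t\to 0}
\frac{\big(g_{mn}(X + \Delta t\, Y_{kl}\, e_k e_l^{T} + \Delta\tau\, Y_{ij}\, e_i e_j^{T}) - g_{mn}(X + \Delta t\, Y_{kl}\, e_k e_l^{T})\big) - \big(g_{mn}(X + \Delta\tau\, Y_{ij}\, e_i e_j^{T}) - g_{mn}(X)\big)}{\Delta\tau\,\Delta t}
\]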
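\[
d^2 g_{mn}(X)\big|_{dX\to Y} = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij}
= \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g_{mn}(X)^{T} Y\big)^{T} Y\Big) \tag{2019}
\]
\[
= \sum_{i,j} \lim_{\Delta t\to 0} \frac{\partial g_{mn}(X + \Delta t\, Y) - \partial g_{mn}(X)}{\partial X_{ij}\,\Delta t}\, Y_{ij} \tag{2020}
\]
\[
= \lim_{\Delta t\to 0} \frac{g_{mn}(X + 2\Delta t\, Y) - 2 g_{mn}(X + \Delta t\, Y) + g_{mn}(X)}{\Delta t^2} \tag{2021}
\]
\[
= \left.\frac{d^2}{dt^2}\right|_{t=0} g_{mn}(X + t\,Y) \tag{2022}
\]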
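Hence the second directional derivative,
\[
\overset{\rightarrow Y}{dg^{2}}(X) \triangleq
\left.\begin{bmatrix}
d^2 g_{11}(X) & d^2 g_{12}(X) & \cdots & d^2 g_{1N}(X) \\
d^2 g_{21}(X) & d^2 g_{22}(X) & \cdots & d^2 g_{2N}(X) \\
\vdots & \vdots & & \vdots \\
d^2 g_{M1}(X) & d^2 g_{M2}(X) & \cdots & d^2 g_{MN}(X)
\end{bmatrix}\right|_{dX\to Y}
\in \mathbb{R}^{M\times N}
\]
\[
= \begin{bmatrix}
\operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{11}(X)^{T} Y)^{T} Y\big) & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{12}(X)^{T} Y)^{T} Y\big) & \cdots & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{1N}(X)^{T} Y)^{T} Y\big) \\
\operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{21}(X)^{T} Y)^{T} Y\big) & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{22}(X)^{T} Y)^{T} Y\big) & \cdots & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{2N}(X)^{T} Y)^{T} Y\big) \\
\vdots & \vdots & & \vdots \\
\operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{M1}(X)^{T} Y)^{T} Y\big) & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{M2}(X)^{T} Y)^{T} Y\big) & \cdots & \operatorname{tr}\big(\nabla \operatorname{tr}(\nabla g_{MN}(X)^{T} Y)^{T} Y\big)
\end{bmatrix}
\]
\[
= \begin{bmatrix}
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{11}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{12}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{1N}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} \\
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{21}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{22}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{2N}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} \\
\vdots & \vdots & & \vdots \\
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M1}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M2}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij} & \cdots & \sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{MN}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij}
\end{bmatrix} \tag{2023}
\]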
Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$
\[
g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\, t^2\, \overset{\rightarrow Y}{dg^{2}}(X) + O(t^3) \tag{2025}
\]
which is the second-order multidimensional Taylor series expansion about X. [393, §18.4]
[177, §2.3.4] Differentiating twice with respect to t and subsequent t-zeroing isolates the
third term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in t is an
operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$
as in (2022). So the second directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ becomes
[315, §2.1, §5.4.5] [36, §6.3.1]
\[
\overset{\rightarrow Y}{dg^{2}}(X) = \left.\frac{d^2}{dt^2}\right|_{t=0} g(X + t\,Y) \in \mathbb{R}^{M\times N} \tag{2026}
\]
which is again simplest. (confer (2005)) Directional derivative retains the dimensions of g.
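The difference quotient (2021) and the closed form (2026) can be compared on a small case. The sketch below assumes the hypothetical function g(X) = XᵀX, for which g(X + tY) = XᵀX + t(XᵀY + YᵀX) + t²YᵀY, so the second directional derivative should equal 2YᵀY; the data and helper names are illustrative only.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def axpy(X, t, Y):
    """Return X + t*Y entrywise."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def g(X):
    """Matrix-valued function g(X) = X^T X (an illustrative choice)."""
    return matmul(transpose(X), X)

# Hypothetical 3 x 2 data
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
Y = [[0.5, 1.0], [2.0, 0.0], [-1.0, 1.0]]

# Second difference as in (2021), applied to every entry at once
h = 1e-3
g2, g1, g0 = g(axpy(X, 2 * h, Y)), g(axpy(X, h, Y)), g(X)
second = [[(a - 2 * b + c) / h**2 for a, b, c in zip(r2, r1, r0)]
          for r2, r1, r0 in zip(g2, g1, g0)]

YtY = matmul(transpose(Y), Y)
max_err = max(abs(s - 2 * y)
              for rs, ry in zip(second, YtY) for s, y in zip(rs, ry))
print("second directional derivative:", second)
print("largest deviation from 2 Y^T Y:", max_err)
```

Since g is quadratic in t along any direction, the second difference reproduces 2YᵀY up to rounding, and the result keeps the 2 × 2 shape of g.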
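\[
\overset{\rightarrow Y}{dg^{2}}(X) = \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g(X)^{T} Y\big)^{T} Y\Big) = \operatorname{tr}\Big(\nabla_X\, \overset{\rightarrow Y}{dg}(X)^{T}\, Y\Big) \tag{2028}
\]
\[
\overset{\rightarrow Y}{dg^{3}}(X) = \operatorname{tr}\bigg(\nabla_X \operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla g(X)^{T} Y\big)^{T} Y\Big)^{T} Y\bigg) = \operatorname{tr}\Big(\nabla_X\, \overset{\rightarrow Y}{dg^{2}}(X)^{T}\, Y\Big) \tag{2029}
\]
\[
\overset{\rightarrow Y}{dg^{2}}(X) = Y^{T}\, \nabla^2 g(X)\, Y \tag{2031}
\]
\[
\overset{\rightarrow Y}{dg^{3}}(X) = \nabla_X\big(Y^{T}\, \nabla^2 g(X)\, Y\big)^{T}\, Y \tag{2032}
\]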
and so on.
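\[
g(X + \mu Y) = g(X) + \mu\, \overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\, \mu^2\, \overset{\rightarrow Y}{dg^{2}}(X) + \frac{1}{3!}\, \mu^3\, \overset{\rightarrow Y}{dg^{3}}(X) + O(\mu^4) \tag{2033}
\]
or on some open interval of $\|Y\|_2$
\[
g(Y) = g(X) + \overset{\rightarrow Y-X}{dg}(X) + \frac{1}{2!}\, \overset{\rightarrow Y-X}{dg^{2}}(X) + \frac{1}{3!}\, \overset{\rightarrow Y-X}{dg^{3}}(X) + O(\|Y\|^4) \tag{2034}
\]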
which are third-order expansions about X. The mean value theorem from calculus is what
ensures finite order of the series. [393] [44, §1.1] [43, App.A.5] [230, §0.4] These somewhat
unbelievable formulae (footnote D.3) imply that a function can be determined over the whole of its
domain by knowing its value and all its directional derivatives at a single point X.
\[
\overset{\rightarrow Y}{dg^{2}}(X) = \left.\frac{d^2}{dt^2}\right|_{t=0} g(X + t\,Y) = 2X^{-1} Y X^{-1} Y X^{-1} \tag{2036}
\]
D.3 e.g., real continuous and differentiable function of real variable $f(x) = e^{-1/x^2}$ has no Taylor series
expansion about x = 0, of any practical use, because each derivative equals 0 there.
\[
\overset{\rightarrow Y}{dg^{3}}(X) = \left.\frac{d^3}{dt^3}\right|_{t=0} g(X + t\,Y) = -6X^{-1} Y X^{-1} Y X^{-1} Y X^{-1} \tag{2037}
\]
Let's find the Taylor series expansion of g about X = I: Since g(I) = I, for $\|Y\|_2 < 1$
(µ = 1 in (2033))
If Y is small, $(X + Y)^{-1} \approx X^{-1} - X^{-1} Y X^{-1}$. □
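A quick numerical illustration of this first-order approximation follows. The sketch assumes 2 × 2 matrices so that the inverse can be written in closed form; the matrices X and Y0 and the two scale factors are hypothetical. If the neglected terms begin at second order in Y, shrinking Y by a factor of 10 should shrink the error by roughly a factor of 100.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def approx_error(X, Y):
    """Largest entry of (X+Y)^{-1} - (X^{-1} - X^{-1} Y X^{-1})."""
    Xi = inv2(X)
    exact = inv2([[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)])
    corr = matmul(matmul(Xi, Y), Xi)
    return max(abs(e - (xi - c))
               for re, rxi, rc in zip(exact, Xi, corr)
               for e, xi, c in zip(re, rxi, rc))

# Hypothetical data
X = [[2.0, 1.0], [1.0, 3.0]]
Y0 = [[1.0, -1.0], [0.5, 2.0]]

def scaled(s):
    return [[s * y for y in row] for row in Y0]

err_big = approx_error(X, scaled(1e-2))
err_small = approx_error(X, scaled(1e-3))
ratio = err_big / err_small
print("error with Y = 1e-2 * Y0:", err_big)
print("error with Y = 1e-3 * Y0:", err_small)
print("ratio:", ratio)
```

The leading neglected term is X⁻¹YX⁻¹YX⁻¹, half of the second directional derivative (2036), which accounts for the quadratic decay of the error.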
D.1.8.1 first-order
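\[
\operatorname{tr}\big(\nabla_X\, g_{mn}(X + t\,Y)^{T}\, Y\big) = \frac{d}{dt}\, g_{mn}(X + t\,Y) \tag{2041}
\]
which is valid at t = 0, of course, when $X \in \operatorname{dom} g$. In the important case of a real
function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, from (2027) we have simply
\[
\operatorname{tr}\big(\nabla_X\, g(X + t\,Y)^{T}\, Y\big) = \frac{d}{dt}\, g(X + t\,Y) \tag{2042}
\]
\[
\nabla_X\, g(X + t\,Y)^{T}\, Y = \frac{d}{dt}\, g(X + t\,Y) \tag{2043}
\]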
D.4 Had we instead set $g(Y) = (I + Y)^{-1}$, then the equivalent expansion would have been about X = 0.
D.5 Justified by replacing X with X + t Y in (1998)-(2000); beginning,
\[
dg_{mn}(X + t\,Y)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X + t\,Y)}{\partial X_{kl}}\, Y_{kl}
\]
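\[
\operatorname{tr}\big(\nabla_X\, g(X + t\,Y)^{T}\, Y\big) = \operatorname{tr}\big(2ww^{T}(X^{T} + t\,Y^{T})\,Y\big) \tag{2044}
\]
\[
= 2w^{T}(X^{T}Y + t\,Y^{T}Y)\,w \tag{2045}
\]
\[
\frac{d}{dt}\, g(X + t\,Y) = \frac{d}{dt}\, w^{T}(X + t\,Y)^{T}(X + t\,Y)\,w \tag{2046}
\]
\[
= w^{T}\big(X^{T}Y + Y^{T}X + 2t\,Y^{T}Y\big)\,w \tag{2047}
\]
\[
= 2w^{T}(X^{T}Y + t\,Y^{T}Y)\,w \tag{2048}
\]
\[
\begin{aligned}
\operatorname{tr}\big(\nabla_X\, g(X + t\,Y)^{T}\, Y\big) &= 2w^{T}(X^{T}Y + t\,Y^{T}Y)\,w = 2\operatorname{tr}\big(ww^{T}(X^{T} + t\,Y^{T})\,Y\big) \\
\operatorname{tr}\big(\nabla_X\, g(X)^{T}\, Y\big) &= 2\operatorname{tr}\big(ww^{T}X^{T}Y\big) \\
&\Leftrightarrow \\
\nabla_X\, g(X) &= 2Xww^{T}
\end{aligned} \tag{2049}
\]
□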
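The chain (2044)-(2049) can be checked numerically at a nonzero t. The sketch below assumes the function of (2046), g(X) = wᵀXᵀXw, with hypothetical data X, Y, w and t = 0.3; it compares a central difference in t with the closed form (2048) and with the trace form built from the gradient 2Xwwᵀ of (2049) evaluated at X + tY.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def axpy(X, t, Y):
    """Return X + t*Y entrywise."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical data: X, Y are 3 x 2 and w has 2 entries
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
Y = [[0.5, 1.0], [2.0, 0.0], [-1.0, 1.0]]
w = [1.0, -2.0]

def g(X):
    """g(X) = w^T X^T X w, i.e. the squared length of X w."""
    return sum(c * c for c in matvec(X, w))

t, h = 0.3, 1e-4

# Derivative in t by central difference, right side of (2042)
numeric = (g(axpy(X, t + h, Y)) - g(axpy(X, t - h, Y))) / (2 * h)

# Closed form (2048): 2 w^T (X^T Y + t Y^T Y) w = 2 ((X + tY) w)^T (Y w)
Zw = matvec(axpy(X, t, Y), w)
Yw = matvec(Y, w)
formula = 2 * sum(p * q for p, q in zip(Zw, Yw))

# Trace form, left side of (2042), with gradient 2 Z w w^T at Z = X + tY
G = [[2 * zk * wl for wl in w] for zk in Zw]
trace_form = sum(G[k][l] * Y[k][l]
                 for k in range(len(Y)) for l in range(len(w)))

print(numeric, formula, trace_form)
```

All three numbers should agree to within rounding, since g(X + tY) is a quadratic polynomial in t and the central difference is therefore exact apart from floating-point error.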
D.1.8.2 second-order
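Likewise removing the evaluation at t = 0 from (2026),
\[
\overset{\rightarrow Y}{dg^{2}}(X + t\,Y) = \frac{d^2}{dt^2}\, g(X + t\,Y) \tag{2050}
\]
we can find a similar relationship between second-order gradient and second derivative: In
the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ from (2019) and (2022),
\[
\operatorname{tr}\Big(\nabla_X \operatorname{tr}\big(\nabla_X\, g_{mn}(X + t\,Y)^{T}\, Y\big)^{T}\, Y\Big) = \frac{d^2}{dt^2}\, g_{mn}(X + t\,Y) \tag{2051}
\]
From (2031), the simpler case, where real function $g(X) : \mathbb{R}^{K} \to \mathbb{R}$ has vector argument,
\[
Y^{T}\, \nabla_X^2\, g(X + t\,Y)\, Y = \frac{d^2}{dt^2}\, g(X + t\,Y) \tag{2053}
\]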
\[
\nabla^2 g(X)_{kl} = \nabla h(X)_{kl} = -\big(X^{-1} e_k e_l^{T} X^{-1}\big) \in \mathbb{R}^{K\times K} \tag{2060}
\]
□
From all these first- and second-order expressions, we may generate new ones by evaluating
both sides at arbitrary t (in some open interval) but only after differentiation.
D.2.1 algebraic
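\[
\begin{array}{ll}
\nabla_x (Ax - b) = A^{T} & \\[4pt]
\nabla_x \big(x^{T}A - b^{T}\big) = A & \\[4pt]
\nabla_x \big(x^{T}Ax + 2x^{T}By + y^{T}Cy\big) = \big(A + A^{T}\big)x + 2By & \\[4pt]
\nabla_x\, (x + y)^{T}A(x + y) = \big(A + A^{T}\big)(x + y) & \\[4pt]
\nabla_x^2 \big(x^{T}Ax + 2x^{T}By + y^{T}Cy\big) = A + A^{T} & \\[4pt]
 & \nabla_X\, a^{T}X^{-1}b = -X^{-T}ab^{T}X^{-T} \\[4pt]
 & \nabla_X (X^{-1})_{kl} = \dfrac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1} e_k e_l^{T} X^{-1}, \quad \text{confer (1996) (2060)} \\[4pt]
\nabla_x\, a^{T}x^{T}xb = 2xa^{T}b & \nabla_X\, a^{T}X^{T}Xb = X\big(ab^{T} + ba^{T}\big)
\end{array}
\]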
algebraic continued
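\[
\begin{aligned}
\tfrac{d}{dt}\,(X + t\,Y) &= Y \\
\tfrac{d}{dt}\, B^{T}(X + t\,Y)^{-1}A &= -B^{T}(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}A \\
\tfrac{d}{dt}\, B^{T}(X + t\,Y)^{-T}A &= -B^{T}(X + t\,Y)^{-T}\, Y^{T} (X + t\,Y)^{-T}A \\
\tfrac{d}{dt}\, B^{T}(X + t\,Y)^{\mu}A &= \ldots\,, \qquad -1 \leq \mu \leq 1\,, \quad X, Y \in \mathbb{S}^{M}_{+} \\
\tfrac{d^2}{dt^2}\, B^{T}(X + t\,Y)^{-1}A &= 2B^{T}(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}A \\
\tfrac{d^3}{dt^3}\, B^{T}(X + t\,Y)^{-1}A &= -6B^{T}(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}A \\
\tfrac{d}{dt}\, \big((X + t\,Y)^{T}A(X + t\,Y)\big) &= Y^{T}AX + X^{T}AY + 2t\,Y^{T}AY \\
\tfrac{d^2}{dt^2}\, \big((X + t\,Y)^{T}A(X + t\,Y)\big) &= 2\,Y^{T}AY \\
\tfrac{d}{dt}\, \big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1} &= -\big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1} \big(Y^{T}AX + X^{T}AY + 2t\,Y^{T}AY\big) \big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1} \\
\tfrac{d}{dt}\, \big((X + t\,Y)A(X + t\,Y)\big) &= YAX + XAY + 2t\,YAY \\
\tfrac{d^2}{dt^2}\, \big((X + t\,Y)A(X + t\,Y)\big) &= 2\,YAY
\end{aligned}
\]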
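\[
\nabla^2_{\operatorname{vec} X} \operatorname{tr}(AXBX^{T}) = \nabla^2_{\operatorname{vec} X} \operatorname{vec}(X)^{T} (B^{T} \otimes A) \operatorname{vec} X = B \otimes A^{T} + B^{T} \otimes A \tag{1977}
\]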
D.2. TABLES OF GRADIENTS AND DERIVATIVES 567
D.2.3 trace
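\[
\begin{array}{ll}
\nabla_x\, \mu x = \mu I & \nabla_X \operatorname{tr} \mu X = \nabla_X\, \mu \operatorname{tr} X = \mu I \\[4pt]
\nabla_x\, \mathbf{1}^{T}\delta(x)^{-1}\mathbf{1} = \tfrac{d}{dx}\, x^{-1} = -x^{-2} & \nabla_X \operatorname{tr} X^{-1} = -X^{-2T} \\[4pt]
\nabla_x\, \mathbf{1}^{T}\delta(x)^{-1} y = -\delta(x)^{-2} y & \nabla_X \operatorname{tr}(X^{-1}Y) = \nabla_X \operatorname{tr}(YX^{-1}) = -X^{-T}Y^{T}X^{-T} \\[4pt]
\tfrac{d}{dx}\, x^{\mu} = \mu x^{\mu-1} & \nabla_X \operatorname{tr} X^{\mu} = \mu X^{\mu-1}\,, \quad X \in \mathbb{S}^{M} \\[4pt]
 & \nabla_X \operatorname{tr} X^{j} = jX^{(j-1)T} \\[4pt]
\nabla_x\, (b - a^{T}x)^{-1} = (b - a^{T}x)^{-2}\, a & \nabla_X \operatorname{tr}\big((B - AX)^{-1}\big) = \big((B - AX)^{-2}A\big)^{T} \\[4pt]
 & \nabla_X \operatorname{tr}\big((X + Y)^{T}(X + Y)\big) = 2(X + Y) = \nabla_X \|X + Y\|_F^2
\end{array}
\]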
trace continued
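\[
\begin{aligned}
\tfrac{d}{dt} \operatorname{tr} g(X + t\,Y) &= \operatorname{tr} \tfrac{d}{dt}\, g(X + t\,Y) \qquad \text{[234, p.491]} \\
\tfrac{d}{dt} \operatorname{tr}(X + t\,Y) &= \operatorname{tr} Y \\
\tfrac{d}{dt} \operatorname{tr}^{j}(X + t\,Y) &= j \operatorname{tr}^{j-1}(X + t\,Y) \operatorname{tr} Y \\
\tfrac{d}{dt} \operatorname{tr}(X + t\,Y)^{j} &= j \operatorname{tr}\big((X + t\,Y)^{j-1}\,Y\big) \qquad (\forall\, j) \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)Y\big) &= \operatorname{tr} Y^2 \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{k}\,Y\big) = \tfrac{d}{dt} \operatorname{tr}\big(Y(X + t\,Y)^{k}\big) &= k \operatorname{tr}\big((X + t\,Y)^{k-1}\,Y^2\big)\,, \qquad k \in \{0, 1, 2\} \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{k}\,Y\big) = \tfrac{d}{dt} \operatorname{tr}\big(Y(X + t\,Y)^{k}\big) &= \operatorname{tr} \sum_{i=0}^{k-1} (X + t\,Y)^{i}\, Y\, (X + t\,Y)^{k-1-i}\, Y \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{-1}Y\big) &= -\operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big) \\
\tfrac{d}{dt} \operatorname{tr}\big(B^{T}(X + t\,Y)^{-1}A\big) &= -\operatorname{tr}\big(B^{T}(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}A\big) \\
\tfrac{d}{dt} \operatorname{tr}\big(B^{T}(X + t\,Y)^{-T}A\big) &= -\operatorname{tr}\big(B^{T}(X + t\,Y)^{-T}\, Y^{T} (X + t\,Y)^{-T}A\big) \\
\tfrac{d}{dt} \operatorname{tr}\big(B^{T}(X + t\,Y)^{-k}A\big) &= \ldots\,, \qquad k > 0 \\
\tfrac{d}{dt} \operatorname{tr}\big(B^{T}(X + t\,Y)^{\mu}A\big) &= \ldots\,, \qquad -1 \leq \mu \leq 1\,, \quad X, Y \in \mathbb{S}^{M}_{+} \\
\tfrac{d^2}{dt^2} \operatorname{tr}\big(B^{T}(X + t\,Y)^{-1}A\big) &= 2 \operatorname{tr}\big(B^{T}(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}A\big) \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{T}A(X + t\,Y)\big) &= \operatorname{tr}\big(Y^{T}AX + X^{T}AY + 2t\,Y^{T}AY\big) \\
\tfrac{d^2}{dt^2} \operatorname{tr}\big((X + t\,Y)^{T}A(X + t\,Y)\big) &= 2 \operatorname{tr}\big(Y^{T}AY\big) \\
\tfrac{d}{dt} \operatorname{tr}\Big(\big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1}\Big) &= -\operatorname{tr}\Big(\big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1} \big(Y^{T}AX + X^{T}AY + 2t\,Y^{T}AY\big) \big((X + t\,Y)^{T}A(X + t\,Y)\big)^{-1}\Big) \\
\tfrac{d}{dt} \operatorname{tr}\big((X + t\,Y)A(X + t\,Y)\big) &= \operatorname{tr}(YAX + XAY + 2t\,YAY) \\
\tfrac{d^2}{dt^2} \operatorname{tr}\big((X + t\,Y)A(X + t\,Y)\big) &= 2 \operatorname{tr}(YAY)
\end{aligned}
\]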
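\[
\begin{array}{ll}
\tfrac{d}{dx} \log x = x^{-1} & \nabla_X \log\det X = X^{-T} \\[4pt]
 & \nabla_X^2 \log\det(X)_{kl} = \dfrac{\partial X^{-T}}{\partial X_{kl}} = -\big(X^{-1} e_k e_l^{T} X^{-1}\big)^{T}, \quad \text{confer (2013) (2060)} \\[4pt]
\tfrac{d}{dx} \log x^{-1} = -x^{-1} & \nabla_X \log\det X^{-1} = -X^{-T} \\[4pt]
\tfrac{d}{dx} \log x^{\mu} = \mu x^{-1} & \nabla_X \log{\det}^{\mu} X = \mu X^{-T} \\[4pt]
 & \nabla_X \log\det X^{\mu} = \mu X^{-T} \\[4pt]
\nabla_x \log(a^{T}x + b) = a\, \dfrac{1}{a^{T}x + b} & \nabla_X \log\det(AX + B) = A^{T}(AX + B)^{-T}
\end{array}
\]
\[
\begin{aligned}
\tfrac{d}{dt} \log\det(X + t\,Y) &= \operatorname{tr}\big((X + t\,Y)^{-1}Y\big) \\
\tfrac{d^2}{dt^2} \log\det(X + t\,Y) &= -\operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big) \\
\tfrac{d}{dt} \log\det(X + t\,Y)^{-1} &= -\operatorname{tr}\big((X + t\,Y)^{-1}Y\big) \\
\tfrac{d^2}{dt^2} \log\det(X + t\,Y)^{-1} &= \operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big) \\
\tfrac{d}{dt} \log\det\big(\delta(A(x + t\,y) + a)^2 + \mu I\big) &= \operatorname{tr}\Big(\big(\delta(A(x + t\,y) + a)^2 + \mu I\big)^{-1}\, 2\,\delta(A(x + t\,y) + a)\,\delta(Ay)\Big)
\end{aligned}
\]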
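The first two derivative entries for log det(X + tY) lend themselves to a direct numerical check. The sketch below assumes 2 × 2 symmetric matrices with X + tY positive definite near the chosen t; the data, the step size h, and the helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def trace(A):
    return A[0][0] + A[1][1]

# Hypothetical data; X + tY stays positive definite for t near 0.2
X = [[2.0, 1.0], [1.0, 3.0]]
Y = [[1.0, 0.5], [0.5, -1.0]]

def Z(t):
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def f(t):
    return math.log(det2(Z(t)))

t, h = 0.2, 1e-4
first_numeric = (f(t + h) - f(t - h)) / (2 * h)
second_numeric = (f(t + h) - 2 * f(t) + f(t - h)) / h**2

M = matmul(inv2(Z(t)), Y)            # (X + tY)^{-1} Y
first_formula = trace(M)
second_formula = -trace(matmul(M, M))

print(first_numeric, first_formula)
print(second_numeric, second_formula)
```

The second difference amplifies rounding error by 1/h², so agreement to about five or six digits is the most that should be expected from that comparison.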
D.2.5 determinant
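\[
\begin{aligned}
\tfrac{d}{dt} \det(X + t\,Y) &= \det(X + t\,Y) \operatorname{tr}\big((X + t\,Y)^{-1}Y\big) \\
\tfrac{d^2}{dt^2} \det(X + t\,Y) &= \det(X + t\,Y) \Big(\operatorname{tr}^2\big((X + t\,Y)^{-1}Y\big) - \operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big)\Big) \\
\tfrac{d}{dt} \det(X + t\,Y)^{-1} &= -\det(X + t\,Y)^{-1} \operatorname{tr}\big((X + t\,Y)^{-1}Y\big) \\
\tfrac{d^2}{dt^2} \det(X + t\,Y)^{-1} &= \det(X + t\,Y)^{-1} \Big(\operatorname{tr}^2\big((X + t\,Y)^{-1}Y\big) + \operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big)\Big) \\
\tfrac{d}{dt} {\det}^{\mu}(X + t\,Y) &= \mu\, {\det}^{\mu}(X + t\,Y) \operatorname{tr}\big((X + t\,Y)^{-1}Y\big)
\end{aligned}
\]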
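For 2 × 2 matrices the first two entries can be checked against an exact reference, because det(X + tY) is then a quadratic polynomial in t whose leading coefficient is det Y; its second derivative is therefore 2 det Y for every t. The sketch below relies on that 2 × 2 special case; the data and helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def trace(A):
    return A[0][0] + A[1][1]

# Hypothetical data
X = [[2.0, 1.0], [1.0, 3.0]]
Y = [[1.0, -1.0], [0.5, 2.0]]

def Z(t):
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

t, h = 0.4, 1e-3
M = matmul(inv2(Z(t)), Y)            # (X + tY)^{-1} Y
dZ = det2(Z(t))

# Table entries for the first and second derivative of det(X + tY)
first_formula = dZ * trace(M)
second_formula = dZ * (trace(M) ** 2 - trace(matmul(M, M)))

# References: a central difference (exact for a quadratic, up to rounding)
# and the closed form 2 det(Y) that holds in the 2 x 2 case
first_numeric = (det2(Z(t + h)) - det2(Z(t - h))) / (2 * h)
second_exact = 2 * det2(Y)

print(first_formula, first_numeric)
print(second_formula, second_exact)
```

For larger matrices det(X + tY) is a higher-degree polynomial in t and the closed-form reference no longer applies, though a finite-difference comparison still would.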
D.2.6 logarithmic
Matrix logarithm.
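\[
\begin{aligned}
\tfrac{d}{dt} \log(X + t\,Y)^{\mu} &= \mu\, Y (X + t\,Y)^{-1} = \mu\, (X + t\,Y)^{-1}\, Y\,, \qquad XY = YX \\
\tfrac{d}{dt} \log(I - t\,Y)^{\mu} &= -\mu\, Y (I - t\,Y)^{-1} = -\mu\, (I - t\,Y)^{-1}\, Y \qquad \text{[234, p.493]}
\end{aligned}
\]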
D.2.7 exponential
Matrix exponential. [84, §3.6, §4.5] [374, §5.4]
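\[
\begin{aligned}
\nabla_X\, e^{\operatorname{tr}(Y^{T}X)} &= \nabla_X \det e^{Y^{T}X} = e^{\operatorname{tr}(Y^{T}X)}\, Y \qquad (\forall\, X, Y) \\
\nabla_X \operatorname{tr} e^{YX} &= e^{Y^{T}X^{T}}\, Y^{T} = Y^{T} e^{X^{T}Y^{T}} \qquad (\forall\, X, Y) \\
\nabla_X \operatorname{tr}\big(A e^{YX}\big) &= \ldots \\
\nabla_x\, \mathbf{1}^{T} e^{Ax} &= A^{T} e^{Ax} \\
\nabla_x \log(\mathbf{1}^{T} e^{x}) &= \frac{1}{\mathbf{1}^{T} e^{x}}\, e^{x} \\
\nabla_x^2 \log(\mathbf{1}^{T} e^{x}) &= \frac{1}{\mathbf{1}^{T} e^{x}} \left( \delta(e^{x}) - \frac{1}{\mathbf{1}^{T} e^{x}}\, e^{x} e^{x^{T}} \right) \\
\nabla_x \prod_{i=1}^{k} x_i^{1/k} &= \frac{1}{k} \left( \prod_{i=1}^{k} x_i^{1/k} \right) 1/x \\
\nabla_x^2 \prod_{i=1}^{k} x_i^{1/k} &= -\frac{1}{k} \left( \prod_{i=1}^{k} x_i^{1/k} \right) \left( \delta(x)^{-2} - \frac{1}{k}\, (1/x)(1/x)^{T} \right) \\
\tfrac{d}{dt}\, e^{tY} &= e^{tY}\, Y = Y e^{tY} \\
\tfrac{d}{dt}\, e^{X + t\,Y} &= e^{X + t\,Y}\, Y = Y e^{X + t\,Y}\,, \qquad XY = YX \\
\tfrac{d^2}{dt^2}\, e^{X + t\,Y} &= e^{X + t\,Y}\, Y^2 = Y e^{X + t\,Y}\, Y = Y^2 e^{X + t\,Y}\,, \qquad XY = YX \\
\tfrac{d^{j}}{dt^{j}}\, e^{\operatorname{tr}(X + t\,Y)} &= e^{\operatorname{tr}(X + t\,Y)} \operatorname{tr}^{j}(Y)
\end{aligned}
\]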
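Two of the vector entries above, the gradient and Hessian of log(1ᵀeˣ), can be checked with a few lines of standard-library Python. In the sketch below the test point x, the step size h, and the function names are hypothetical; the gradient is compared with central differences, and the Hessian formula is checked for the property that each of its rows sums to zero, which follows from the formula because δ(eˣ)1 = eˣ and eˣeˣᵀ1 = (1ᵀeˣ)eˣ.

```python
import math

def log_sum_exp(x):
    """Real function log(1^T e^x)."""
    return math.log(sum(math.exp(v) for v in x))

def grad_formula(x):
    """Table entry: e^x / (1^T e^x)."""
    e = [math.exp(v) for v in x]
    s = sum(e)
    return [ei / s for ei in e]

def hess_formula(x):
    """Table entry: (delta(e^x) - e^x e^{xT} / (1^T e^x)) / (1^T e^x)."""
    e = [math.exp(v) for v in x]
    s = sum(e)
    n = len(x)
    return [[((e[i] if i == j else 0.0) - e[i] * e[j] / s) / s
             for j in range(n)] for i in range(n)]

def grad_numeric(x, h=1e-5):
    """Central difference in each coordinate."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((log_sum_exp(xp) - log_sum_exp(xm)) / (2 * h))
    return out

# Hypothetical test point
x = [0.5, -1.0, 2.0]

grad_err = max(abs(p - q) for p, q in zip(grad_formula(x), grad_numeric(x)))
row_sums = [sum(row) for row in hess_formula(x)]
print("gradient discrepancy:", grad_err)
print("Hessian row sums:", row_sums)
```

The gradient entries are positive and sum to one, so the gradient of log(1ᵀeˣ) is what is commonly called the softmax of x.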