Examples of Variational Inference With Gaussian-Gamma Distribution

Let $D = \{x_1, \dots, x_n\}$.
Every data point $x_i$ is a random variable with a normal distribution:
$$p(x_i \mid \tau, \mu) = \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left(-\frac{\tau}{2}(x_i - \mu)^2\right)$$
where $\tau = 1/\sigma^2$ is the precision.
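Given $\tau$ and $\mu$, the data points in $D$ are independent and normally distributed. Viewed as a function of $\tau$ and $\mu$, $p(D \mid \tau, \mu)$ is also the likelihood of $\tau$ and $\mu$:
$$\begin{aligned}
p(D \mid \tau, \mu) &= \prod_{i=1}^{n} p(x_i \mid \tau, \mu) \\
&= \prod_{i=1}^{n} \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left(-\frac{\tau}{2}(x_i - \mu)^2\right) \\
&= \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left(-\frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right)
\end{aligned}$$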
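Taking the logarithm gives
$$\begin{aligned}
\ln p(D \mid \tau, \mu) &= \ln\left(\left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left(-\frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right)\right) \\
&= \frac{n}{2}\ln\tau - \frac{n}{2}\ln 2\pi - \frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
&= \frac{n}{2}\ln\tau - \frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2 + \text{const}
\end{aligned}$$
where const collects the terms that do not depend on $\tau$ or $\mu$.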
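As a quick numerical check of the expansion above, the following minimal sketch evaluates the log-likelihood in two ways: with the closed form in terms of $n$, $\tau$ and the sum of squares, and by summing the per-point log densities. The function names are illustrative and not part of the original notes.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, tau):
    """ln p(D | tau, mu) from the closed form, with tau the precision."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (0.5 * n * np.log(tau)
            - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * tau * np.sum((x - mu) ** 2))

def gaussian_log_likelihood_pointwise(x, mu, tau):
    """Same quantity, obtained by summing ln p(x_i | tau, mu) over the data."""
    x = np.asarray(x, dtype=float)
    return np.sum(0.5 * np.log(tau / (2.0 * np.pi)) - 0.5 * tau * (x - mu) ** 2)
```

Both functions should agree up to floating-point rounding for any data set, mean and positive precision.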
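The prior distribution of $\tau$ and $\mu$ is a normal-gamma distribution
$$p(\tau, \mu) = p(\mu \mid \tau)\,p(\tau)$$
where
$$p(\mu \mid \tau) = \mathcal{N}(\mu \mid \mu_0, (\lambda_0\tau)^{-1}) = \left(\frac{\lambda_0\tau}{2\pi}\right)^{1/2} \exp\left(-\frac{\lambda_0\tau}{2}(\mu - \mu_0)^2\right)$$
$$p(\tau) = \operatorname{Gamma}(\tau \mid a_0, b_0) = \frac{b_0^{a_0}}{\Gamma(a_0)}\,\tau^{a_0 - 1} e^{-b_0\tau}$$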
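We have
$$\begin{aligned}
\ln p(\mu \mid \tau) &= \ln\left(\left(\frac{\lambda_0\tau}{2\pi}\right)^{1/2} \exp\left(-\frac{\lambda_0\tau}{2}(\mu - \mu_0)^2\right)\right) \\
&= \frac{1}{2}\ln\lambda_0 + \frac{1}{2}\ln\tau - \frac{1}{2}\ln 2\pi - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2 \\
&= \frac{1}{2}\ln\tau - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2 + \text{const}
\end{aligned}$$
$$\begin{aligned}
\ln p(\tau) &= \ln\left(\frac{b_0^{a_0}}{\Gamma(a_0)}\,\tau^{a_0 - 1} e^{-b_0\tau}\right) \\
&= a_0\ln b_0 - \ln\Gamma(a_0) + (a_0 - 1)\ln\tau - b_0\tau \\
&= (a_0 - 1)\ln\tau - b_0\tau + \text{const}
\end{aligned}$$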
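Because the normal-gamma distribution is the conjugate prior of the normal distribution, the posterior distribution of $\tau$ and $\mu$ is normal-gamma as well.
$$p(\tau, \mu \mid D) \propto p(D \mid \tau, \mu)\,p(\tau, \mu)$$
$$p(\tau, \mu \mid D) = \mathcal{N}(\mu \mid \mu_n, (\lambda_n\tau)^{-1})\,\operatorname{Gamma}(\tau \mid a_n, b_n)$$
where
$$\begin{aligned}
\mu_n &= \frac{\lambda_0\mu_0 + n\bar{x}}{\lambda_0 + n} \\
\lambda_n &= \lambda_0 + n \\
a_n &= a_0 + \frac{n}{2} \\
b_n &= b_0 + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{\lambda_0 n(\bar{x} - \mu_0)^2}{2(\lambda_0 + n)}
\end{aligned}$$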
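The four update equations for the exact posterior translate directly into code. The sketch below is one possible transcription; the function name and argument order are assumptions made for illustration.

```python
import numpy as np

def normal_gamma_posterior(x, mu0, lambda0, a0, b0):
    """Exact normal-gamma posterior parameters (mu_n, lambda_n, a_n, b_n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)
    lambda_n = lambda0 + n
    a_n = a0 + n / 2.0
    b_n = (b0
           + 0.5 * np.sum((x - xbar) ** 2)
           + lambda0 * n * (xbar - mu0) ** 2 / (2.0 * (lambda0 + n)))
    return mu_n, lambda_n, a_n, b_n
```

For example, with data `[1, 2, 3]` and prior `mu0=0, lambda0=1, a0=1, b0=1`, the function returns `mu_n=1.5, lambda_n=4, a_n=2.5, b_n=3.5`.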
However, for demonstration purposes we will consider a factorized variational approximation to the posterior distribution, given by
$$q(\mu, \tau) = q_\mu(\mu)\,q_\tau(\tau)$$
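We know
$$\begin{aligned}
\ln p(D, \mu, \tau) &= \ln\big(p(D \mid \tau, \mu)\,p(\tau, \mu)\big) \\
&= \ln\big(p(D \mid \tau, \mu)\,p(\mu \mid \tau)\,p(\tau)\big) \\
&= \ln p(D \mid \tau, \mu) + \ln p(\mu \mid \tau) + \ln p(\tau) \\
&= \frac{n}{2}\ln\tau - \frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
&\quad + \frac{1}{2}\ln(\lambda_0\tau) - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2 \\
&\quad + (a_0 - 1)\ln\tau - b_0\tau \\
&\quad + \text{const}
\end{aligned}$$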
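We have
$$\begin{aligned}
\ln q_\mu^*(\mu) &= E_{\neq\mu}[\ln p(D, \mu, \tau)] + \text{const} \\
&= E_\tau[\ln p(D, \mu, \tau)] + \text{const} \\
&= E_\tau[\ln p(D \mid \tau, \mu) + \ln p(\mu \mid \tau) + \ln p(\tau)] + \text{const} \\
&= E_\tau[\ln p(D \mid \tau, \mu) + \ln p(\mu \mid \tau)] + \text{const} \\
&= E_\tau\left[-\frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2 - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2\right] + \text{const} \\
&= -\frac{E_\tau[\tau]}{2}\left[\sum_{i=1}^{n}(x_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2\right] + \text{const}
\end{aligned}$$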
Notes:
$q_\mu^*(\mu)$ is a function of $\mu$ only, and this coordinate descent step optimizes with respect to $q_\mu$ alone, so all terms that contain only $\tau$ can be treated as const.
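Because
$$\begin{aligned}
&\sum_{i=1}^{n}(x_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2 \\
&= \sum_{i=1}^{n}(x_i^2 - 2x_i\mu + \mu^2) + \lambda_0(\mu^2 - 2\mu\mu_0 + \mu_0^2) \\
&= \sum_{i=1}^{n}x_i^2 - 2n\mu\bar{x} + n\mu^2 + \lambda_0\mu^2 - 2\lambda_0\mu\mu_0 + \lambda_0\mu_0^2 \\
&= -2n\mu\bar{x} + n\mu^2 + \lambda_0\mu^2 - 2\lambda_0\mu\mu_0 + \text{const} \\
&= (n + \lambda_0)\mu^2 - 2(n\bar{x} + \lambda_0\mu_0)\mu + \text{const} \\
&= (n + \lambda_0)\left(\mu - \frac{n\bar{x} + \lambda_0\mu_0}{n + \lambda_0}\right)^2 + \text{const}
\end{aligned}$$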
Notes:
$\sum_{i=1}^{n}x_i^2$ and $\lambda_0\mu_0^2$ are fixed constants that do not depend on $\mu$.
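We can get
$$\ln q^*(\mu) = -\frac{E_\tau[\tau](n + \lambda_0)}{2}\left(\mu - \frac{n\bar{x} + \lambda_0\mu_0}{n + \lambda_0}\right)^2 + \text{const}$$
This is the logarithm of a Gaussian density with precision $E_\tau[\tau](n + \lambda_0)$, so
$$q^*(\mu) = \mathcal{N}\left(\mu \,\Big|\, \frac{n\bar{x} + \lambda_0\mu_0}{n + \lambda_0},\ \big(E_\tau[\tau](n + \lambda_0)\big)^{-1}\right)$$
Because $q_\tau(\tau)$ is a Gamma distribution (derived below), with parameters $a_n$ and $b_n$, we have $E_\tau[\tau] = \frac{a_n}{b_n}$ and therefore
$$q^*(\mu) = \mathcal{N}\left(\mu \,\Big|\, \frac{n\bar{x} + \lambda_0\mu_0}{n + \lambda_0},\ \left(\frac{a_n}{b_n}(n + \lambda_0)\right)^{-1}\right)$$
Here $a_n$ and $b_n$ denote the parameters of the variational factor $q_\tau$, which differ from the exact posterior parameters of the same name given earlier.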
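The update for $q_\mu$ derived above needs only the data, the prior parameters $\mu_0$ and $\lambda_0$, and the current value of $E_\tau[\tau]$. A minimal sketch, with an illustrative function name that returns the mean and the precision (not the variance) of the Gaussian factor:

```python
import numpy as np

def update_q_mu(x, mu0, lambda0, e_tau):
    """Return (mean, precision) of the Gaussian factor q_mu given E[tau]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = (n * x.mean() + lambda0 * mu0) / (n + lambda0)
    precision = e_tau * (n + lambda0)
    return mean, precision
```

Note that the mean does not depend on $E_\tau[\tau]$, so only the precision changes from one iteration to the next.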
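We also have
$$\begin{aligned}
\ln q_\tau^*(\tau) &= E_{\neq\tau}[\ln p(D, \mu, \tau)] + \text{const} \\
&= E_\mu[\ln p(D, \mu, \tau)] + \text{const} \\
&= E_\mu[\ln p(D \mid \tau, \mu) + \ln p(\mu \mid \tau) + \ln p(\tau)] + \text{const} \\
&= E_\mu\Big[\frac{n}{2}\ln\tau - \frac{\tau}{2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
&\qquad + \frac{1}{2}\ln\tau - \frac{\lambda_0\tau}{2}(\mu - \mu_0)^2 \\
&\qquad + (a_0 - 1)\ln\tau - b_0\tau\Big] + \text{const}
\end{aligned}$$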
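We can bring the terms without $\mu$ outside of the expectation:
$$\begin{aligned}
\ln q_\tau^*(\tau) &= \frac{n + 1}{2}\ln\tau + (a_0 - 1)\ln\tau - b_0\tau \\
&\quad - \frac{\tau}{2}E_\mu\left[\sum_{i=1}^{n}(x_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2\right] + \text{const} \\
&= \left(\frac{n + 1}{2} + a_0 - 1\right)\ln\tau - \tau\left(b_0 + \frac{1}{2}E_\mu\left[\sum_{i=1}^{n}(x_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2\right]\right) + \text{const}
\end{aligned}$$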
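We set
$$a_n = \frac{n + 1}{2} + a_0$$
$$\begin{aligned}
b_n &= b_0 + \frac{1}{2}E_\mu\left[\sum_{i=1}^{n}(x_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2\right] \\
&= b_0 + \frac{1}{2}E_\mu\left[\sum_{i=1}^{n}(x_i^2 - 2x_i\mu + \mu^2) + \lambda_0(\mu^2 - 2\mu\mu_0 + \mu_0^2)\right] \\
&= b_0 + \frac{1}{2}E_\mu\left[\sum_{i=1}^{n}x_i^2 - 2n\bar{x}\mu + n\mu^2 + \lambda_0\mu^2 - 2\lambda_0\mu\mu_0 + \lambda_0\mu_0^2\right] \\
&= b_0 + \frac{1}{2}\left(E_\mu\left[-2\mu n\bar{x} + n\mu^2 + \lambda_0\mu^2 - 2\lambda_0\mu_0\mu\right] + \sum_{i=1}^{n}x_i^2 + \lambda_0\mu_0^2\right) \\
&= b_0 + \frac{1}{2}\left((n + \lambda_0)E_\mu[\mu^2] - 2(n\bar{x} + \lambda_0\mu_0)E_\mu[\mu] + \sum_{i=1}^{n}x_i^2 + \lambda_0\mu_0^2\right)
\end{aligned}$$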
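The moments $E_\mu[\mu^2]$ and $E_\mu[\mu]$ can be computed because $q_\mu$ is known from the previous result. With $a_n$ and $b_n$ defined this way,
$$q_\tau^*(\tau) = \operatorname{Gamma}(\tau \mid a_n, b_n)$$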
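For a Gaussian factor $q_\mu$ with mean $m$ and precision $s$ (shorthand introduced here, not used in the original notes), the required moments are $E_\mu[\mu] = m$ and $E_\mu[\mu^2] = m^2 + 1/s$. Under that assumption, the update for $q_\tau$ can be sketched as follows; the function name is illustrative.

```python
import numpy as np

def update_q_tau(x, mu0, lambda0, a0, b0, m, s):
    """Return (a_n, b_n) of the Gamma factor q_tau, assuming q_mu = N(m, 1/s)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    e_mu = m                   # first moment of q_mu
    e_mu2 = m ** 2 + 1.0 / s   # second moment = mean^2 + variance
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (
        (n + lambda0) * e_mu2
        - 2.0 * (n * x.mean() + lambda0 * mu0) * e_mu
        + np.sum(x ** 2)
        + lambda0 * mu0 ** 2
    )
    return a_n, b_n
```

Substituting the moments shows that the bracketed term equals $\sum_{i}(x_i - m)^2 + \lambda_0(m - \mu_0)^2 + (n + \lambda_0)/s$, which can serve as a sanity check on an implementation.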
Notes:
It should be emphasized that we did not assume these specific functional forms for the optimal distributions $q_\mu^*(\mu)$ and $q_\tau^*(\tau)$. They arose naturally from the structure of the likelihood function and the corresponding conjugate priors.
Thus we have expressions for the optimal distributions $q_\mu^*(\mu)$ and $q_\tau^*(\tau)$, each of which depends on moments evaluated with respect to the other distribution. One approach to finding a solution is therefore to make an initial guess for, say, the moment $E[\tau]$ and use this to recompute the distribution $q_\mu^*(\mu)$. Given this revised distribution we can then extract the required moments $E[\mu]$ and $E[\mu^2]$, and use these to recompute the distribution $q_\tau^*(\tau)$, and so on.
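The iterative scheme described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a reference implementation: the function name, the initial guess `e_tau_init`, the stopping tolerance `tol` and the iteration cap `max_iter` are all choices made here, and $q_\mu$ is represented by its mean `m` and precision `s`.

```python
import numpy as np

def cavi_normal_gamma(x, mu0, lambda0, a0, b0,
                      e_tau_init=1.0, max_iter=100, tol=1e-10):
    """Alternate the q_mu and q_tau updates until E[tau] stops changing.

    Returns (m, s, a_n, b_n): q_mu = N(m, 1/s), q_tau = Gamma(a_n, b_n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    sum_sq = np.sum(x ** 2)

    # These two quantities do not depend on the other factor, so they are fixed.
    m = (n * xbar + lambda0 * mu0) / (n + lambda0)
    a_n = a0 + (n + 1) / 2.0

    e_tau = e_tau_init
    for _ in range(max_iter):
        # Update q_mu: its precision depends on the current E[tau].
        s = e_tau * (n + lambda0)
        e_mu = m
        e_mu2 = m ** 2 + 1.0 / s

        # Update q_tau: b_n depends on the moments of q_mu.
        b_n = b0 + 0.5 * ((n + lambda0) * e_mu2
                          - 2.0 * (n * xbar + lambda0 * mu0) * e_mu
                          + sum_sq + lambda0 * mu0 ** 2)
        new_e_tau = a_n / b_n

        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break

    s = e_tau * (n + lambda0)  # precision of q_mu at the final E[tau]
    return m, s, a_n, b_n
```

A call such as `cavi_normal_gamma([1.0, 2.0, 3.0], mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0)` returns the parameters of both factors; only `b_n` and the precision `s` actually change across iterations.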
Problem: How can the exponential family be used to simplify the calculation?
