
ADVANCED MACHINE LEARNING
Module 11: Learning in PGMs

Instructor: Amit Sethi


Co-developer: Neeraj Kumar
TAs: Gaurav Yadav, Niladri Bhattacharya
Page: AdvancedMachineLearning.weebly.com
IITG Course No: EE 622
Learning or parameter estimation
We want to estimate parameters

Supervised: if we knew both v and h, the parameters would be easy to estimate

Unsupervised: we don't even know h; only v (in general, x) is observed


The key idea is to maximize the log likelihood of the data given the parameters:
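In the notation of the cited tutorial (assumed here), with i.i.d. training data S = {v_1, ..., v_N}:

\ln \mathcal{L}(\theta \mid S) = \sum_{i=1}^{N} \ln p(v_i \mid \theta)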

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
Learning is parameter estimation
The first method is gradient descent:

The three terms should be familiar (the full update rule is sketched below):


Gradient
Weight decay
Momentum
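As a sketch, the update combining all three terms (η, λ, ν denote the learning rate, weight-decay coefficient, and momentum parameter; notation assumed here):

\theta^{(t+1)} = \theta^{(t)} + \eta\, \frac{\partial \ln \mathcal{L}(\theta^{(t)} \mid v)}{\partial \theta} - \lambda\, \theta^{(t)} + \nu\, \Delta\theta^{(t-1)}

Note the plus sign on the gradient term: the log likelihood is maximized, so this is gradient ascent (equivalently, descent on the negative log likelihood).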

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
MRF with hidden and visible variables
The log likelihood becomes the following:
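A sketch, assuming a Gibbs distribution with energy E(v, h) as in the cited tutorial:

\ln \mathcal{L}(\theta \mid v) = \ln p(v \mid \theta) = \ln \sum_{h} e^{-E(v, h)} - \ln \sum_{v', h} e^{-E(v', h)}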

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
Gradient descent for an RBM
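Differentiating the hidden-variable log likelihood above gives the general form (a sketch in the same notation):

\frac{\partial \ln \mathcal{L}(\theta \mid v)}{\partial \theta} = -\sum_{h} p(h \mid v)\, \frac{\partial E(v, h)}{\partial \theta} + \sum_{v', h} p(v', h)\, \frac{\partial E(v', h)}{\partial \theta}

The first term is an expectation under the conditional p(h | v); the second, under the full model distribution, is what makes the exact gradient intractable.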

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012

For the RBM's bilinear energy function, this becomes:
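A sketch for the weight w_ij connecting hidden unit i and visible unit j (RBM notation assumed, with σ the logistic function):

\frac{\partial \ln \mathcal{L}(\theta \mid v)}{\partial w_{ij}} = p(H_i = 1 \mid v)\, v_j - \sum_{v'} p(v')\, p(H_i = 1 \mid v')\, v'_j

where p(H_i = 1 \mid v) = \sigma\big(\sum_j w_{ij} v_j + c_i\big); analogous expressions hold for the bias parameters.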
Contrastive divergence shortens the Gibbs sampling chain
For a given training instance v, repeat for k steps (usually k = 1):
Sample h given v
Sample v given h
The resulting gradient estimate is sketched below
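In the tutorial's notation (v^(0) is the training instance, v^(k) the sample after k Gibbs steps):

\mathrm{CD}_k(\theta, v^{(0)}) = -\sum_{h} p(h \mid v^{(0)})\, \frac{\partial E(v^{(0)}, h)}{\partial \theta} + \sum_{h} p(h \mid v^{(k)})\, \frac{\partial E(v^{(k)}, h)}{\partial \theta}

The intractable expectation under the model distribution is replaced by an evaluation at the chain's k-th sample.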

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
The complete batch CD algorithm
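A minimal NumPy sketch of one batch CD-k parameter update for a binary RBM, in the spirit of the tutorial's algorithm (all names, e.g. cd_k_update, are illustrative, not from the paper):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(W, b, c, batch, k=1, lr=0.1):
    # W: (n_hid, n_vis) weights; b: (n_vis,) visible bias; c: (n_hid,) hidden bias
    # batch: (n, n_vis) binary training vectors
    v0 = batch
    ph0 = sigmoid(v0 @ W.T + c)          # p(H_i = 1 | v^(0))
    vk, phk = v0, ph0
    for _ in range(k):                   # k steps of block Gibbs sampling
        hk = (rng.random(phk.shape) < phk).astype(float)  # sample h given v
        pvk = sigmoid(hk @ W + b)        # p(V_j = 1 | h)
        vk = (rng.random(pvk.shape) < pvk).astype(float)  # sample v given h
        phk = sigmoid(vk @ W.T + c)      # p(H_i = 1 | v^(k))
    n = v0.shape[0]
    # positive (data) phase minus negative (k-step sample) phase, averaged
    dW = (ph0.T @ v0 - phk.T @ vk) / n
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    # ascent step: we are maximizing the log likelihood
    return W + lr * dW, b + lr * db, c + lr * dc

Looping this over mini-batches of the training set gives the complete batch algorithm; weight decay and momentum can be folded into the three updates as in the earlier update rule.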

Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
Conditional Random Fields

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
Relation between graphical models

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
Naïve Bayes and logistic regression as a discriminative-generative pair

One can model many different forms of features without worrying about their independence. CRFs make assumptions about y, not x
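A sketch of the pairing (the weight notation θ is assumed here): naive Bayes models the joint

p(y, \mathbf{x}) = p(y) \prod_{i} p(x_i \mid y)

while logistic regression models the conditional directly,

p(y \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \theta_y + \sum_{i} \theta_{y,i}\, x_i \Big)

Conditioning on x removes the need to model dependencies among the features x_i.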

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
From HMM to CRF
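A sketch of the transition, in the tutorial's notation (λ_k are weights, f_k feature functions): the HMM joint

p(\mathbf{y}, \mathbf{x}) = \prod_{t} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)

can be rewritten with exponential-family factors; conditioning on x then yields the linear-chain CRF

p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t} \exp\Big( \sum_{k} \lambda_k f_k(y_t, y_{t-1}, x_t) \Big)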

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
Parameter estimation in a fully observed CRF
Conditional likelihood
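In the tutorial's notation (x^{(i)}, y^{(i)} denote the i-th training sequence and its labels):

\ell(\theta) = \sum_{i} \ln p(y^{(i)} \mid x^{(i)})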

Substituting
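With the linear-chain CRF form from the previous slide, this gives (a sketch):

\ell(\theta) = \sum_{i} \sum_{t} \sum_{k} \lambda_k f_k(y_t^{(i)}, y_{t-1}^{(i)}, x_t^{(i)}) - \sum_{i} \ln Z(x^{(i)})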

Adding regularization (a Gaussian prior with zero mean and σ²I covariance)
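This yields the regularized objective (σ² is the prior variance):

\ell_{\mathrm{reg}}(\theta) = \ell(\theta) - \sum_{k} \frac{\lambda_k^2}{2\sigma^2}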

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
Taking the partial derivative with respect to λ_k
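A sketch of the resulting gradient, following the cited tutorial:

\frac{\partial \ell}{\partial \lambda_k} = \sum_{i} \sum_{t} f_k(y_t^{(i)}, y_{t-1}^{(i)}, x_t^{(i)}) - \sum_{i} \sum_{t} \sum_{y, y'} f_k(y, y', x_t^{(i)})\, p(y, y' \mid x^{(i)}) - \frac{\lambda_k}{\sigma^2}

Empirical feature counts minus expected counts under the model, minus the prior's shrinkage term; the pairwise marginals p(y, y' | x^{(i)}) come from forward-backward inference.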

Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton

In summary
There are some general algorithms for learning in PGMs, such as Expectation Maximization combined with Markov chain Monte Carlo sampling
However, for special dependence structures and
distributions, simplified algorithms are available
With these algorithms, models can be trained for many practical problems
