LEARNING
Module 11: Learning in PGMs
Source: An Introduction to Restricted Boltzmann Machines, Asja Fischer and Christian Igel, CIARP 2012
Learning is parameter estimation
The first method is gradient descent:
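As a toy illustration (not from the slides), gradient descent on the negative log-likelihood, equivalently gradient ascent on the log-likelihood, recovers the maximum-likelihood estimate of a Bernoulli parameter, which for coin-flip data is simply the sample mean:

```python
import numpy as np

# Toy illustration (not from the slides): maximum-likelihood estimation
# of a Bernoulli parameter by gradient ascent on the log-likelihood.
# For coin-flip data the closed-form MLE is the sample mean, so we can
# check the gradient-based estimate against it.
rng = np.random.default_rng(0)
x = (rng.random(1000) < 0.3).astype(float)    # Bernoulli(0.3) samples

theta = 0.5                                   # initial guess for p
lr = 0.001
for _ in range(2000):
    # d/dtheta of the mean log-likelihood:
    #   mean(x)/theta - mean(1 - x)/(1 - theta)
    grad = x.mean() / theta - (1 - x).mean() / (1 - theta)
    theta += lr * grad                        # ascent step
    theta = min(max(theta, 1e-6), 1 - 1e-6)   # keep theta inside (0, 1)

print(abs(theta - x.mean()) < 1e-3)
```

The same data-gradient-update loop carries over to PGMs; what changes is that the gradient of the log-likelihood is generally intractable to compute exactly.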
MRF with hidden and visible variables
The log-likelihood becomes the following:
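For an MRF with visible variables $v$, hidden variables $h$, and energy function $E(v,h)$, the log-likelihood takes the form given in Fischer & Igel:

```latex
\ln \mathcal{L}(\theta \mid v) = \ln p(v \mid \theta)
  = \ln \sum_{h} e^{-E(v,h)} - \ln \sum_{v',h} e^{-E(v',h)}
```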
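Differentiating the log-likelihood of an MRF with hidden variables with respect to a parameter $\theta$ gives the standard two-expectation form, a positive phase under $p(h \mid v)$ and a negative phase under the full model distribution:

```latex
\frac{\partial \ln \mathcal{L}(\theta \mid v)}{\partial \theta}
  = -\sum_{h} p(h \mid v)\, \frac{\partial E(v,h)}{\partial \theta}
    + \sum_{v',h} p(v',h)\, \frac{\partial E(v',h)}{\partial \theta}
```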
Gradient descent for an RBM
This becomes
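For an RBM with energy $E(v,h) = -\sum_{i,j} h_i w_{ij} v_j - \sum_j b_j v_j - \sum_i c_i h_i$, the weight gradient specializes to the form given by Fischer & Igel:

```latex
\frac{\partial \ln \mathcal{L}(\theta \mid v)}{\partial w_{ij}}
  = p(H_i = 1 \mid v)\, v_j - \sum_{v'} p(v')\, p(H_i = 1 \mid v')\, v'_j
```

The first term is easy to evaluate; the second requires an expectation over the model distribution $p(v')$, which motivates the sampling-based approximations below.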
Contrastive divergence shortens the Gibbs
sampling chain
Repeat the following for k steps (usually k = 1):
Sample h given v, for a given instance of v
Sample v given h
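Both sampling steps are cheap because the conditional distributions of an RBM factorize over units; with $\sigma$ the logistic sigmoid:

```latex
p(H_i = 1 \mid v) = \sigma\Big(\sum_j w_{ij} v_j + c_i\Big), \qquad
p(V_j = 1 \mid h) = \sigma\Big(\sum_i w_{ij} h_i + b_j\Big)
```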
The complete batch CD algorithm
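A minimal sketch of batch CD-1 training for a Bernoulli-Bernoulli RBM, following the structure of the algorithm in the slides. The array names (`W`, `b`, `c`), the toy patterns, and the hyperparameters are our own choices, and refinements such as momentum and weight decay are omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, lr, k = 6, 3, 0.1, 1

# Toy training set: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 10, dtype=float)

W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)            # visible biases
c = np.zeros(n_hidden)             # hidden biases

def recon_error(v):
    # Mean-field reconstruction error, used only as a progress check.
    return np.mean((v - sigmoid(sigmoid(v @ W + c) @ W.T + b)) ** 2)

err_before = recon_error(data)
for epoch in range(200):
    v0 = data
    ph0 = sigmoid(v0 @ W + c)      # positive phase: p(H=1 | data)
    vk = v0
    for _ in range(k):             # k Gibbs steps (CD-k)
        h = (rng.random((len(vk), n_hidden)) < sigmoid(vk @ W + c)).astype(float)
        pv = sigmoid(h @ W.T + b)
        vk = (rng.random(pv.shape) < pv).astype(float)
    phk = sigmoid(vk @ W + c)      # negative phase: p(H=1 | chain sample)
    # Approximate gradient step: data statistics minus model statistics.
    W += lr * (v0.T @ ph0 - vk.T @ phk) / len(v0)
    b += lr * (v0 - vk).mean(axis=0)
    c += lr * (ph0 - phk).mean(axis=0)
err_after = recon_error(data)
print(err_after < err_before)
```

The update replaces the intractable model expectation in the gradient with statistics from the short Gibbs chain started at the data.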
Conditional Random Fields
Source: An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton
Relation between graphical models
Naïve Bayes and logistic regression as a
discriminative-generative pair
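The pairing can be seen from the model forms discussed by Sutton: naïve Bayes models the joint distribution with a per-feature factorization, while logistic regression models the corresponding family of conditionals directly:

```latex
p_{\text{NB}}(y, x) = p(y) \prod_{j} p(x_j \mid y), \qquad
p_{\text{LR}}(y \mid x) = \frac{1}{Z(x)} \exp\Big(\lambda_y + \sum_{j} \lambda_{y,j}\, x_j\Big)
```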
From HMM to CRF
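Rewriting the HMM joint distribution in exponential form and then conditioning on the observation sequence yields the linear-chain CRF of Sutton:

```latex
p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T}
  \exp\Big( \sum_{k} \lambda_k\, f_k(y_t, y_{t-1}, x_t) \Big)
```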
Parameter estimation in a fully observed
CRF
Conditional likelihood
Substituting
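Written out, the conditional log-likelihood over training pairs $(x^{(i)}, y^{(i)})$, after substituting the linear-chain CRF model, is:

```latex
\ell(\theta) = \sum_{i} \log p\big(y^{(i)} \mid x^{(i)}\big)
  = \sum_{i} \sum_{t} \sum_{k} \lambda_k\, f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}_t\big)
    - \sum_{i} \log Z\big(x^{(i)}\big)
```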
Taking partial derivative
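Differentiating with respect to $\lambda_k$ gives Sutton's gradient: empirical feature counts minus expected feature counts under the model:

```latex
\frac{\partial \ell}{\partial \lambda_k}
  = \sum_{i} \sum_{t} f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}_t\big)
    - \sum_{i} \sum_{t} \sum_{y, y'} f_k\big(y, y', x^{(i)}_t\big)\, p\big(y, y' \mid x^{(i)}\big)
```

The pairwise marginals $p(y, y' \mid x^{(i)})$ needed here are computed exactly by the forward-backward algorithm, which is what makes learning in fully observed linear-chain CRFs tractable.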
In summary
There are general algorithms for learning in
PGMs, such as Markov chain Monte Carlo
Expectation Maximization
However, for special dependence structures and
distributions, simplified algorithms are available
With these algorithms, models can be trained for
many practical problems