Académique Documents
Professionnel Documents
Culture Documents
3
2.5 0.4
2
1.5
1 1
0.5 0.8 0.2
0 0.6
0 0.2 0.4 y
0.4 0.2
0.6 0.8 0 0
x 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
x
1 (y−(θ1 x+θ2 ))2
Maximizing P (y|x) = √ e− 2σ 2 w.r.t. θ1, θ2
2πσ
N
X
= minimizing E =
j =1
(yj − (θ1xj + θ2))2
That is, minimizing the sum of squared errors gives the ML solution
for a linear fit assuming Gaussian noise of fixed variance
Chapter 20, Sections 1–3 12
Summary
Full Bayesian learning gives best possible predictions but is intractable
MAP learning balances complexity with accuracy on training data
Maximum likelihood assumes uniform prior, OK for large data sets
1. Choose a parameterized family of models to describe the data
requires substantial insight and sometimes new models
2. Write down the likelihood of the data as a function of the parameters
may require summing over hidden variables, i.e., inference
3. Write down the derivative of the log likelihood w.r.t. each parameter
4. Find the parameter values such that the derivatives are zero
may be hard/impossible; modern optimization techniques help
Chapter 20, Sections 1–3 13