Maximum Likelihood Estimation

April 4, 2016

The probability of observing a particular set of outcomes from a probability distribution depends on the parameters of that distribution. For example, given an unbiased coin with equal probability of landing heads and tails, what is the probability of observing the sequence "HHTH"? Since the tosses are independent Bernoulli trials, the probability of the observation is 0.5^4 = 0.0625. But what if the coin were biased, with probability 0.7 of landing heads (and 0.3 of landing tails)? The answer would be 0.7^3 × 0.3 ≈ 0.103.

Thus the probability of the observation is a function of the probability of heads (or tails). Let's denote that probability as P(O; p), where 'p' is the probability of heads:

P(O = HHTH; p = 0.5) = 0.0625 and P(O = HHTH; p = 0.7) ≈ 0.103.
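
To make this concrete, here is a minimal Python sketch (the function name is my own, not from the post) that evaluates P(O; p) for the sequence and any value of p:

```python
def sequence_probability(sequence, p):
    """Probability of an observed H/T sequence when P(heads) = p."""
    prob = 1.0
    for toss in sequence:
        prob *= p if toss == 'H' else (1 - p)
    return prob

print(sequence_probability("HHTH", 0.5))  # 0.0625
print(sequence_probability("HHTH", 0.7))  # ~0.1029
```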

But in the more common scenario, we are given the observation and need to estimate the probabilities. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely; in other words, given the observation 'O', what should the value of 'p' be so that P(O; p) is maximized?

Continuing with our coin tossing example, let's suppose we do not know the value of 'p' and have to estimate it from the observation 'HHTH'. Since the probability of heads is 'p', the probability of tails is (1 − p). The likelihood of the observed sequence as a function of 'p' is:

$$L(p;\, O = \text{HHTH}) = p^3 (1 - p)$$

Our goal is to maximize this quantity.

[Figure: probability of the observation as a function of the unknown parameter 'p']

There are many possible ways to maximize the above function. If the function is too complex, we normally use numerical algorithms to estimate the values of the parameters. But for simple functions like the one above, we can find a solution by setting the first derivative of the likelihood w.r.t. the unknown parameters to zero and solving.
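
As an illustration of the numerical route (not part of the original post), one could hand the negative likelihood to a generic optimizer such as scipy.optimize.minimize_scalar and let it locate the maximizing value of p:

```python
from scipy.optimize import minimize_scalar

# Negative likelihood of the sequence HHTH as a function of p;
# minimizing it is equivalent to maximizing the likelihood.
def neg_likelihood(p):
    return -(p ** 3) * (1 - p)

result = minimize_scalar(neg_likelihood, bounds=(0.0, 1.0), method='bounded')
print(result.x)  # close to 0.75, the analytic answer derived below
```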

Sometimes the function may have local minima and maxima, and hence the solution obtained may not be the global optimum.

[Figure: a complex curve demonstrating local maxima and minima]

Since we are dealing with probabilities (real numbers between 0 and 1), raising them to exponents and taking products will result in floating-point underflow. We can overcome that by working with the log likelihood instead, i.e.

$$\log L(p;\, O = \text{HHTH}) = 3 \log(p) + \log(1 - p)$$
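
To see why this matters, here is a tiny illustration (my own, with arbitrary numbers): multiplying a few thousand probabilities underflows to zero in floating point, while summing their logarithms stays perfectly usable.

```python
import math

p, n = 0.5, 2000

product = 1.0
for _ in range(n):
    product *= p
print(product)          # 0.0 -- the product has underflowed

print(n * math.log(p))  # about -1386.29 -- the log likelihood is fine
```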

Taking the derivative of the above quantity w.r.t. 'p' gives 3/p − 1/(1 − p); setting it to zero gives 3(1 − p) = p, and solving for p gives p = 3/4 = 0.75. The answer seems pretty intuitive, and one might wonder why we went through the pain of computing derivatives and roots. Fair enough, but the point was to illustrate that the approach is generic.
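
The derivative-and-root calculation can also be checked symbolically, for example with sympy (purely an illustration, not part of the original derivation):

```python
import sympy as sp

p = sp.symbols('p', positive=True)
log_likelihood = 3 * sp.log(p) + sp.log(1 - p)

# Differentiate w.r.t. p and solve dLogL/dp = 0
derivative = sp.diff(log_likelihood, p)
print(sp.solve(sp.Eq(derivative, 0), p))  # [3/4]
```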

Let's look at a more complex example with the normal distribution.

Given a sequence of independent and identically distributed (I.I.D.) observations X1, X2, ..., Xn from a normal distribution with unknown parameters μ and σ, estimate the parameters so as to fit the distribution to the observations.

The Gaussian distribution is defined as


$$N(x;\, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
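
Written out in code (a small sketch, checked against scipy.stats.norm.pdf rather than anything in the post), the density looks like this:

```python
import math
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(gaussian_pdf(1.0, 0.0, 2.0))        # hand-rolled density
print(norm.pdf(1.0, loc=0.0, scale=2.0))  # same value from scipy
```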

Hence the probability of the sequence of observations is the product of the probabilities of the individual observations (since the observations are I.I.D.), thus:
$$P(O \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}$$

Taking the log likelihood of the above, we get:

$$\log L(\mu, \sigma \mid O) = \sum_{i=1}^{n} \left[ \log\!\left(\frac{1}{\sqrt{2\pi}}\right) - \log(\sigma) - \frac{(X_i - \mu)^2}{2\sigma^2} \right]$$
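
As a sketch (variable and function names are mine), the log likelihood of a sample under given values of μ and σ could be computed with numpy as follows:

```python
import numpy as np

def normal_log_likelihood(x, mu, sigma):
    """Log likelihood of I.I.D. observations x under N(mu, sigma)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.log(1.0 / np.sqrt(2 * np.pi))
                  - np.log(sigma)
                  - (x - mu) ** 2 / (2 * sigma ** 2))

sample = [1.2, 0.7, -0.3, 2.1, 0.9]
print(normal_log_likelihood(sample, mu=1.0, sigma=1.0))
```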

To maximize the above function, we will use the same technique as before, i.e. take the derivatives w.r.t. the unknown parameters and find their roots. Taking the derivative w.r.t. μ and setting it to zero, we get:
$$\sum_{i=1}^{n} (X_i - \mu) = 0,$$

solving for μ we get

$$\mu = \frac{\sum_{i=1}^{n} X_i}{n},$$

i.e. the mean of the observations. Similarly, taking the derivative w.r.t. σ and setting it to zero, we get the following:
$$-\frac{n}{\sigma} + \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^3} = 0,$$

solving for σ, we get

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}},$$


which is the standard deviation of the


observations.
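
Putting the two closed-form estimates together in code (a minimal sketch on a synthetic sample, so the true parameters are known):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # true mu=5, sigma=2

mu_hat = np.sum(data) / len(data)
sigma_hat = np.sqrt(np.sum((data - mu_hat) ** 2) / len(data))

print(mu_hat, sigma_hat)        # should be close to 5.0 and 2.0
print(data.mean(), data.std())  # numpy's mean/std (ddof=0) match the MLE
```

Note that the MLE of σ divides by n (not n − 1), which is exactly what numpy's default std (ddof=0) computes.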
