Aleksandr Palatnik
Completed: November 28, 2010
1 Bayesian Networks
For reference, we’ll refer to the following rules for active paths:
1. X → Y → Z is active if Y ∉ O
2. X ← Y ← Z is active if Y ∉ O
3. X ← Y → Z is active if Y ∉ O
4. X → Y ← Z is active if Y ∈ O or some descendant of Y is in O
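The four rules above can be expressed as a small predicate. Below is a minimal sketch; the network from the problem is not reproduced here, so the `descendants` function is supplied by the caller and the node names in the examples are hypothetical:

```python
def triple_active(kind, y, observed, descendants):
    """Is the consecutive triple through middle node y active?

    kind: 'chain' for X -> Y -> Z or X <- Y <- Z (rules 1-2),
          'fork' for X <- Y -> Z (rule 3),
          'collider' for X -> Y <- Z (rule 4).
    observed: the set O of observed nodes.
    descendants: maps a node to an iterable of its descendants.
    """
    if kind == 'collider':
        # Rule 4: active if Y or any descendant of Y is observed.
        return y in observed or any(d in observed for d in descendants(y))
    # Rules 1-3: active if Y is unobserved.
    return y not in observed

# A path is active iff every consecutive triple along it is active.
```

For instance, an unobserved collider with no observed descendants blocks its path, which is the reasoning used in item 2 below.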
2. True. A path connecting a and g has to pass through d or f , but since they’re unobserved,
by rule 4 there is no active path through them.
4. True. Considering the path from j to d, we look at the path from j to e. This path has
to pass through g and possibly h. Looking at path j-i-g-e, g creates an inactive path since
it is observed (by rule 3). Looking at path j-i-h-g-e, h being observed allows an active path
by rule 4, but g creates an inactive path again by rule 3. Therefore, there is no active
path from j to d.
5. True. Any path between b and i has to pass through d or f . Because neither d nor f is
observed, rule 4 cannot be applied to create an active path through those nodes.
2 Markov Decision Processes
1. This problem can be reduced to 3 states, {2, 1, 0}, each representing the
distance between P1 and P2. The state transitions are governed by the following probabilities:
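As a sketch of how the reduced chain can be represented, the snippet below encodes a 3-state transition matrix and pushes a distribution through one step. The actual transition probabilities are not reproduced in this copy, so the values in `P` are hypothetical placeholders:

```python
STATES = [2, 1, 0]  # distance between P1 and P2

# P[s][s2] = probability of moving from distance s to distance s2.
# Hypothetical values; substitute the probabilities from the problem.
P = {
    2: {2: 0.5, 1: 0.5, 0: 0.0},
    1: {2: 0.25, 1: 0.25, 0: 0.5},
    0: {2: 0.0, 1: 0.0, 0: 1.0},  # distance 0 treated as absorbing
}

def step(dist):
    """Push a distribution over states through one transition."""
    out = {s: 0.0 for s in STATES}
    for s, p in dist.items():
        for s2, q in P[s].items():
            out[s2] += p * q
    return out

d1 = step({2: 1.0, 1: 0.0, 0: 0.0})  # one step from distance 2
```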
3 Information Gathering
1. Given the table, we can estimate the following probabilities:
   x   P(S = x)        i   P(Yi = 1)
   1   1/4             1   1/2
   2   1/6             2   1/2
   3   1/3             3   1/2
   4   1/4             4   7/12
We can use this information to calculate the entropy prior to observation, H(S), using
Shannon entropy:

H(S) = − Σ_x P(S = x) log2 P(S = x)
     = (1/4)log2(4) + (1/6)log2(6) + (1/3)log2(3) + (1/4)log2(4)
     = 1/2 + .4308 + .5283 + 1/2
     = 1.9591
The next step is to compute the entropy after observing each Yi:

H(S|Y1 = 1) = (1/3)log2(3) + (1/3)log2(3) + (1/3)log2(3)
            = log2(3) = 1.585
H(S|Y1 = 0) = (1/6)log2(6) + (1/3)log2(3) + (1/3)log2(3) + (1/6)log2(6)
            = (1/3)log2(6) + (2/3)log2(3) = .8616 + 1.0566 = 1.9182
H(S|Y2 = 1) = (1/2)log2(2) + (1/2)log2(2)
            = log2(2) = 1
H(S|Y2 = 0) = (1/3)log2(3) + (1/6)log2(6) + (1/2)log2(2)
            = .5283 + .4308 + .5 = 1.4591
H(S|Y3 = 1) = (1/2)log2(2) + (1/2)log2(2)
            = log2(2) = 1
H(S|Y3 = 0) = (1/2)log2(2) + (1/3)log2(3) + (1/6)log2(6)
            = 1.4591
H(S|Y4 = 1) = (1/7)log2(7) + (1/7)log2(7) + (2/7)log2(7/2) + (3/7)log2(7/3)
            = .8021 + .5164 + .5239 = 1.8424
H(S|Y4 = 0) = (2/5)log2(5/2) + (1/5)log2(5) + (2/5)log2(5/2)
            = 1.0575 + .4644 = 1.5219
Now that we have the post-observation entropies, we can calculate the conditional
entropies and the information gain of each question:

H(S|Y1) = Σ_{y1} P(Y1 = y1) H(S|Y1 = y1) = 1/2(1.585) + 1/2(1.9182) = 1.7516

(and similarly for Y2, Y3, Y4)

I(S; Yi) = H(S) − H(S|Yi) = 1.9591 − H(S|Yi) = < .2075, .7295, .7295, .2502 >
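The arithmetic above is easy to check mechanically. The sketch below recomputes H(S) and each I(S; Yi) from per-question species counts; the 12-specimen counts are reconstructed from the posteriors quoted in this section (the raw observation table itself is assumed, not reproduced here):

```python
from math import log2

# counts[i][y] = species counts (species 1..4) among the 12 specimens
# whose answer to question i was y; reconstructed from the posteriors
# used in the text.
counts = {
    1: {1: [2, 0, 2, 2], 0: [1, 2, 2, 1]},
    2: {1: [3, 0, 3, 0], 0: [0, 2, 1, 3]},
    3: {1: [0, 0, 3, 3], 0: [3, 2, 1, 0]},
    4: {1: [1, 1, 2, 3], 0: [2, 1, 2, 0]},
}

def entropy(ns):
    """Shannon entropy in bits of a vector of counts."""
    total = sum(ns)
    return -sum(n / total * log2(n / total) for n in ns if n > 0)

H_S = entropy([3, 2, 4, 3])  # prior: P(S) = 1/4, 1/6, 1/3, 1/4

def info_gain(i):
    """I(S; Yi) = H(S) - sum_y P(Yi = y) H(S | Yi = y)."""
    return H_S - sum(sum(ns) / 12 * entropy(ns)
                     for ns in counts[i].values())

gains = [round(info_gain(i), 4) for i in (1, 2, 3, 4)]
# H_S = 1.9591, gains = [0.2075, 0.7296, 0.7296, 0.2503]
# (the text truncates .7295 and .2502; the last digit is rounding)
```

Questions 2 and 3 tie for the highest gain, matching the choice of question 2 as the root below.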
This implies that questions 2 and 3 are the most informative, since they have the highest
information gain. Suppose we start with question 2 in building the decision tree. We now
have to do additional calculations to find which of the remaining questions is
most informative:
Case Y2 = 1:

H(S|Y2 = 1) = H(S′) = 1 (as calculated before)
H(S′|Y1 = 1) = (2/3)log2(3/2) + (1/3)log2(3) = .9183
H(S′|Y1 = 0) = (1/3)log2(3) + (2/3)log2(3/2) = .9183
H(S′|Y3 = 1) = −1·log2(1) = 0
H(S′|Y3 = 0) = (3/4)log2(4/3) + (1/4)log2(4) = .8113
H(S′|Y4 = 1) = (1/2)log2(2) + (1/2)log2(2) = 1
H(S′|Y4 = 0) = (1/2)log2(2) + (1/2)log2(2) = 1
H(S′|Y1) = 1/2(.9183) + 1/2(.9183) = .9183
H(S′|Y3) = 1/3(0) + 2/3(.8113) = .5409
H(S′|Y4) = 1/3(1) + 2/3(1) = 1
I(S′; Y) = < .0817, .4591, 0 >
This implies that after question 2, we should ask question 3. At this point we
consider the cases for question 3. If it returns 1, the result is species
3; otherwise it could be species 1 or 3, so we compute the entropies for that case:
H(S′|Y3 = 0) = H(S″) = .8113
H(S″|Y1 = 1) = −1·log2(1) = 0
H(S″|Y1 = 0) = (1/2)log2(2) + (1/2)log2(2) = 1
H(S″|Y4 = 1) = (1/2)log2(2) + (1/2)log2(2) = 1
H(S″|Y4 = 0) = −1·log2(1) = 0
H(S″|Y1) = 1/2(0) + 1/2(1) = .5
H(S″|Y4) = 1/2(1) + 1/2(0) = .5
I(S″; Y) = < .5, .5 >
This indicates that either question is equally useful at this point. Suppose we choose
question 1: if its result is 1, then the species is 1; otherwise we ask question 4. If
that results in 1, it's species 3; otherwise species 1.
Case Y2 = 0:

H(S|Y2 = 0) = H(S′) = 1.4591 (as calculated before)
H(S′|Y1 = 1) = (1/3)log2(3) + (2/3)log2(3/2) = .9183
H(S′|Y1 = 0) = (2/3)log2(3/2) + (1/3)log2(3) = .9183
H(S′|Y3 = 1) = (1/4)log2(4) + (3/4)log2(4/3) = .8113
H(S′|Y3 = 0) = −1·log2(1) = 0
H(S′|Y4 = 1) = (1/5)log2(5) + (1/5)log2(5) + (3/5)log2(5/3) = 1.3710
H(S′|Y4 = 0) = −1·log2(1) = 0
H(S′|Y1) = 1/2(.9183) + 1/2(.9183) = .9183
H(S′|Y3) = 2/3(.8113) + 1/3(0) = .5409
H(S′|Y4) = 5/6(1.3710) + 1/6(0) = 1.1425
I(S′; Y) = < .5408, .9182, .3166 >
This means that question 3 is the next most informative. If the result of question 3
is 0, the result is species 2. The dataset at this point contains no samples with question
4 answered 0, so question 4 is no longer useful. This means that the next question is question 1.
If the result of question 1 is 0, the species is species 4; otherwise it is inconclusive
between species 3 and 4.
The resulting decision tree:

Question 2
|- 1 -> Question 3
|       |- 1 -> Species 3
|       '- 0 -> Question 1
|               |- 1 -> Species 1
|               '- 0 -> Question 4
|                       |- 1 -> Species 3
|                       '- 0 -> Species 1
'- 0 -> Question 3
        |- 0 -> Species 2
        '- 1 -> Question 1
                |- 0 -> Species 4
                '- 1 -> Species 3 or 4 (inconclusive)
In practice, you can stop expanding the tree as soon as one result has a much higher
probability of being correct than any of the others since the cost of expanding further
may be greater than the cost of giving the wrong result.
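The greedy procedure used above (at each node, ask the remaining question with the highest information gain) can be sketched as a recursive builder. Since the original observation table is not reproduced, the usage example below runs it on a small hypothetical table:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, q):
    """Entropy drop from splitting on question q."""
    gain = entropy(labels)
    for ans in {r[q] for r in rows}:
        sub = [s for r, s in zip(rows, labels) if r[q] == ans]
        gain -= len(sub) / len(rows) * entropy(sub)
    return gain

def build_tree(rows, labels, questions):
    """Greedy information-gain decision tree (ID3-style)."""
    if len(set(labels)) == 1 or not questions:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    q = max(questions, key=lambda p: info_gain(rows, labels, p))
    rest = [p for p in questions if p != q]
    branches = {}
    for ans in {r[q] for r in rows}:
        idx = [k for k, r in enumerate(rows) if r[q] == ans]
        branches[ans] = build_tree([rows[k] for k in idx],
                                   [labels[k] for k in idx], rest)
    return (q, branches)

# Hypothetical 4-row table: each row is (answer to Q0, answer to Q1).
rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
labels = ['species a', 'species a', 'species b', 'species c']
tree = build_tree(rows, labels, [0, 1])  # Q0 has the higher gain here
```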
For each observation strategy we compute E, the best expected utility achievable across the
answer branches, using the utility table U(x, a) from the problem:

Question 1:

E1 = max(max_a Σ_x P(x|Y1 = 1) U(x, a), max_a Σ_x P(x|Y1 = 0) U(x, a))
   = max(max_a ((1/3)U(1,a) + (1/3)U(3,a) + (1/3)U(4,a)),
         max_a ((1/6)U(1,a) + (1/3)U(2,a) + (1/3)U(3,a) + (1/6)U(4,a)))
E1 = −1

Question 2:

Mathematica code: Max[Max[Table[(1/2)U[1,i] + (1/2)U[3,i], {i,0,4}]],
                      Max[Table[(1/3)U[2,i] + (1/6)U[3,i] + (1/2)U[4,i], {i,0,4}]]]
E2 = −1
Question 3:

Mathematica code: Max[Max[Table[(1/2)U[3,i] + (1/2)U[4,i], {i,0,4}]],
                      Max[Table[(1/2)U[1,i] + (1/3)U[2,i] + (1/6)U[3,i], {i,0,4}]]]
E3 = −1
Question 4:

Mathematica code: Max[Max[Table[(1/7)U[1,i] + (1/7)U[2,i] + (2/7)U[3,i] + (3/7)U[4,i], {i,0,4}]],
                      Max[Table[(2/5)U[1,i] + (1/5)U[2,i] + (2/5)U[3,i], {i,0,4}]]]
E4 = −1
Questions 1,4:

E14 = max(max_a Σ_x P(x|Y1 = 1, Y4 = 1) U(x, a), max_a Σ_x P(x|Y1 = 1, Y4 = 0) U(x, a),
          max_a Σ_x P(x|Y1 = 0, Y4 = 1) U(x, a), max_a Σ_x P(x|Y1 = 0, Y4 = 0) U(x, a))

Mathematica code: Max[
    Max[Table[(1/4)U[1,i] + (1/4)U[3,i] + (1/2)U[4,i], {i,0,4}]],
    Max[Table[(1/2)U[1,i] + (1/2)U[4,i], {i,0,4}]],
    Max[Table[(1/3)U[2,i] + (1/3)U[3,i] + (1/3)U[4,i], {i,0,4}]],
    Max[Table[(1/3)U[1,i] + (1/3)U[2,i] + (1/3)U[3,i], {i,0,4}]]
]
E14 = −1
Questions 2,3:

E23 = max(max_a Σ_x P(x|Y2 = 1, Y3 = 1) U(x, a), max_a Σ_x P(x|Y2 = 1, Y3 = 0) U(x, a),
          max_a Σ_x P(x|Y2 = 0, Y3 = 1) U(x, a), max_a Σ_x P(x|Y2 = 0, Y3 = 0) U(x, a))

Mathematica code: Max[
    Max[Table[(1)U[3,i], {i,0,4}]],
    Max[Table[(3/4)U[1,i] + (1/4)U[3,i], {i,0,4}]],
    Max[Table[(1/4)U[3,i] + (3/4)U[4,i], {i,0,4}]],
    Max[Table[(1)U[2,i], {i,0,4}]]
]
E23 = 1
Individually, E2 = E3 = −1, but asking both questions gives E23 = 1: the value of the pair
exceeds the sum of the individual values. Since submodularity requires diminishing returns
as observations are added, this shows that the value of information is not submodular.
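The comparison driving this conclusion can be reproduced from the posteriors above. The assignment's utility table U(x, a) is not included in this writeup, so the `U` below is a hypothetical stand-in (+1 for naming the correct species, −1 for a wrong guess, 0 for action 0, taken here to mean declining to answer); with it, the pair of questions is likewise worth more than either question alone:

```python
from fractions import Fraction as F

def U(x, a):
    """Hypothetical utility: guess species a (1-4) or decline (a = 0)."""
    if a == 0:
        return 0
    return 1 if a == x else -1

def branch_value(posterior, actions=range(5)):
    """Best expected utility max_a sum_x P(x|y) U(x, a) for one branch."""
    return max(sum(p * U(x, a) for x, p in posterior.items())
               for a in actions)

def E(posteriors):
    """Max over answer branches, mirroring the computation above."""
    return max(branch_value(p) for p in posteriors)

# Question 2 alone (posteriors from the writeup):
E2 = E([{1: F(1, 2), 3: F(1, 2)},               # Y2 = 1
        {2: F(1, 3), 3: F(1, 6), 4: F(1, 2)}])  # Y2 = 0

# Questions 2 and 3 together:
E23 = E([{3: F(1)},                             # Y2 = 1, Y3 = 1
         {1: F(3, 4), 3: F(1, 4)},              # Y2 = 1, Y3 = 0
         {3: F(1, 4), 4: F(3, 4)},              # Y2 = 0, Y3 = 1
         {2: F(1)}])                            # Y2 = 0, Y3 = 0
```

With this stand-in utility, E2 = 0 while E23 = 1, the same superadditive pattern the writeup finds with its own utility table.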