Decision Trees: USC Linguistics July 26, 2007

Decision Trees
USC Linguistics
July 26, 2007
Suppose we wish to classify events according to their temporal orientations.[1]
Sentence            -ed   -s   will   be   [sing]   t(e)

He read.             F     F     F     F      T     t(e)<ST
He was hungry.       F     F     F     T      T     t(e)<ST
He ran.              F     F     F     F      T     t(e)<ST
They liked it.       T     F     F     F      F     t(e)<ST
He reads.            F     T     F     F      T     t(e)∩ST
He is happy.         F     F     F     T      T     t(e)∩ST
They need it.        T     F     F     F      F     t(e)∩ST
They want it.        F     F     F     F      F     t(e)∩ST
He will read.        F     F     T     F      T     t(e)>ST
They will get it.    F     F     T     F      F     t(e)>ST
$$ I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i) \tag{1} $$

$$ I\left(\tfrac{4}{10}, \tfrac{4}{10}, \tfrac{2}{10}\right) = .5288 + .5288 + .4644 = 1.522 \tag{2} $$
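Equations (1) and (2) can be checked numerically. The following is a minimal sketch; the function name `information` is an assumption chosen for illustration, and zero-probability terms are taken to contribute nothing to the sum.

```python
import math

def information(probs):
    """Entropy in bits of a discrete distribution (equation 1).

    Terms with probability 0 are skipped, i.e. treated as contributing 0.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Class distribution of the ten example sentences:
# 4 with t(e)<ST, 4 overlapping ST, 2 with t(e)>ST.
print(round(information([4 / 10, 4 / 10, 2 / 10]), 3))  # 1.522
```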
$$ \mathrm{Remainder}(A) = \sum_{i} \frac{c(A_i)}{N}\, I\left(\frac{c(v_1)}{c(A_i)}, \ldots, \frac{c(v_n)}{c(A_i)}\right) \tag{3} $$

$$ \mathrm{Gain}(A) = I(P(v_1), \ldots, P(v_n)) - \mathrm{Remainder}(A) \tag{4} $$
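$$ R(\text{-ed}) = \tfrac{2}{10}\, I\left(\tfrac{1}{2}, \tfrac{1}{2}, 0\right) + \tfrac{8}{10}\, I\left(\tfrac{3}{8}, \tfrac{3}{8}, \tfrac{2}{8}\right) = .2 \cdot 1 + .8 \cdot 1.56 = 1.45 \tag{5} $$

$$ R(\text{-s}) = \tfrac{1}{10}\, I(0, 1, 0) + \tfrac{9}{10}\, I\left(\tfrac{4}{9}, \tfrac{3}{9}, \tfrac{2}{9}\right) = .1 \cdot 0 + .9 \cdot 1.53 = 1.38 \tag{6} $$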
[1] Notice immediately that there is no way to disambiguate the written form “They read”, unless, perhaps, the past tense is more likely to be eventive and more likely to take a direct object.
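$$ R(\text{will}) = \tfrac{2}{10}\, I(0, 0, 1) + \tfrac{8}{10}\, I\left(\tfrac{4}{8}, \tfrac{4}{8}, 0\right) = .2 \cdot 0 + .8 \cdot 1 = .8 \tag{7} $$

$$ R(\text{be}) = \tfrac{2}{10}\, I\left(\tfrac{1}{2}, \tfrac{1}{2}, 0\right) + \tfrac{8}{10}\, I\left(\tfrac{3}{8}, \tfrac{3}{8}, \tfrac{2}{8}\right) = .2 \cdot 1 + .8 \cdot 1.56 = 1.45 \tag{8} $$

$$ R(\text{[sing]}) = \tfrac{6}{10}\, I\left(\tfrac{3}{6}, \tfrac{2}{6}, \tfrac{1}{6}\right) + \tfrac{4}{10}\, I\left(\tfrac{1}{4}, \tfrac{2}{4}, \tfrac{1}{4}\right) = .6 \cdot 1.45 + .4 \cdot 1.5 = 1.475 \tag{9} $$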
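The remainders in equations (5) through (9) can be recomputed from the feature table. The sketch below is illustrative only: the encoding of the table as flag strings, the class names `past`, `pres`, `fut` (standing for t(e)<ST, t(e)∩ST, t(e)>ST), and all function names are assumptions, not part of the handout.

```python
import math
from collections import Counter, defaultdict

ATTRS = ["-ed", "-s", "will", "be", "[sing]"]

# One entry per sentence: T/F flags in ATTRS order, then the class.
RAW = [
    ("FFFFT", "past"),  # He read.
    ("FFFTT", "past"),  # He was hungry.
    ("FFFFT", "past"),  # He ran.
    ("TFFFF", "past"),  # They liked it.
    ("FTFFT", "pres"),  # He reads.
    ("FFFTT", "pres"),  # He is happy.
    ("TFFFF", "pres"),  # They need it.
    ("FFFFF", "pres"),  # They want it.
    ("FFTFT", "fut"),   # He will read.
    ("FFTFF", "fut"),   # They will get it.
]
ROWS = [(dict(zip(ATTRS, flags)), label) for flags, label in RAW]

def information(counts):
    """Entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def remainder(rows, attr):
    """Equation (3): weighted entropy of the classes after splitting on attr."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[features[attr]].append(label)
    n = len(rows)
    return sum(len(g) / n * information(Counter(g).values())
               for g in groups.values())

def gain(rows, attr):
    """Equation (4): entropy before the split minus the remainder."""
    labels = [label for _, label in rows]
    return information(Counter(labels).values()) - remainder(rows, attr)

for attr in ATTRS:
    print(attr, round(remainder(ROWS, attr), 3), round(gain(ROWS, attr), 3))
```

The attribute with the smallest remainder (equivalently, the largest gain) is the one chosen for the split; on this table that is `will`.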
              <<<< ∩∩∩∩ >>
             /            \
        will:T            will:F
          >>            <<<< ∩∩∩∩
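$$ I\left(\tfrac{4}{8}, \tfrac{4}{8}\right) = 1 \tag{10} $$

$$ R(\text{-ed}) = \tfrac{2}{8}\, I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{6}{8}\, I\left(\tfrac{3}{6}, \tfrac{3}{6}\right) = .25 + .75 = 1 \tag{11} $$

$$ R(\text{-s}) = \tfrac{1}{8}\, I(0, 1) + \tfrac{7}{8}\, I\left(\tfrac{4}{7}, \tfrac{3}{7}\right) = 0 + .86 = .86 \tag{12} $$

$$ R(\text{be}) = \tfrac{2}{8}\, I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{6}{8}\, I\left(\tfrac{3}{6}, \tfrac{3}{6}\right) = .25 + .75 = 1 \tag{13} $$

$$ R(\text{[sing]}) = \tfrac{5}{8}\, I\left(\tfrac{3}{5}, \tfrac{2}{5}\right) + \tfrac{3}{8}\, I\left(\tfrac{1}{3}, \tfrac{2}{3}\right) = .61 + .34 = .95 \tag{14} $$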
              <<<< ∩∩∩∩ >>
             /            \
        will:T            will:F
          >>            <<<< ∩∩∩∩
                        /        \
                    -s:T          -s:F
                      ∩         <<<< ∩∩∩
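The two splits shown so far (first on `will`, then on `-s` within the `will:F` branch) follow from repeatedly choosing the attribute with the smallest remainder. The sketch below builds the whole tree that way. It is an ID3-style illustration under assumptions of my own: the flag-string encoding of the table, the class names `past`, `pres`, `fut`, the rule that an attribute is not reused further down a branch, and the stopping rule (pure node or no attributes left) are not stated in the handout.

```python
import math
from collections import Counter, defaultdict

ATTRS = ["-ed", "-s", "will", "be", "[sing]"]
RAW = [
    ("FFFFT", "past"), ("FFFTT", "past"), ("FFFFT", "past"), ("TFFFF", "past"),
    ("FTFFT", "pres"), ("FFFTT", "pres"), ("TFFFF", "pres"), ("FFFFF", "pres"),
    ("FFTFT", "fut"), ("FFTFF", "fut"),
]
ROWS = [(dict(zip(ATTRS, flags)), label) for flags, label in RAW]

def information(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def partition(rows, attr):
    """Group rows by their value for attr."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[features[attr]].append((features, label))
    return groups

def remainder(rows, attr):
    return sum(
        len(subset) / len(rows)
        * information(Counter(label for _, label in subset).values())
        for subset in partition(rows, attr).values()
    )

def build(rows, attrs):
    """Return a leaf (dict of class counts) or a pair (attribute, children)."""
    classes = Counter(label for _, label in rows)
    if len(classes) == 1 or not attrs:
        return dict(classes)
    best = min(attrs, key=lambda a: remainder(rows, a))
    rest = [a for a in attrs if a != best]
    return (best, {value: build(subset, rest)
                   for value, subset in partition(rows, best).items()})

tree = build(ROWS, ATTRS)
print(tree[0])          # attribute tested at the root
print(tree[1]["T"])     # leaf under will:T
print(tree[1]["F"][0])  # attribute tested under will:F
```

Because several sentences share identical feature values but differ in class (for example "He was hungry." and "He is happy."), some leaves of the full tree remain impure with these features.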
Sentence            -ed   -s   w-1    [sing]   t(e)

He read.             F     F    -        T     t(e)<ST
He was hungry.       F     F    was      T     t(e)<ST
He ran.              F     F    -        T     t(e)<ST
They liked it.       T     F    -        F     t(e)<ST
He reads.            F     T    -        T     t(e)∩ST
He is happy.         F     F    is       T     t(e)∩ST
They need it.        T     F    -        F     t(e)∩ST
They want it.        F     F    -        F     t(e)∩ST
He will read.        F     F    will     T     t(e)>ST
They will get it.    F     F    will     F     t(e)>ST
$$ R(w_{-1}) = \tfrac{6}{10}\, I\left(\tfrac{3}{6}, \tfrac{3}{6}, 0\right) + \tfrac{1}{10}\, I(1, 0, 0) + \tfrac{1}{10}\, I(0, 1, 0) + \tfrac{2}{10}\, I(0, 0, 1) = .6 \tag{15} $$
                    <<<< ∩∩∩∩ >>

  w-1:-         w-1:was       w-1:is        w-1:will
 <<< ∩∩∩           <             ∩             >>
Sentence            -ed   -s   w-1    [sing]   t(e)

He read.             F     F    he       T     t(e)<ST
He was hungry.       F     F    was      T     t(e)<ST
He ran.              F     F    he       T     t(e)<ST
They liked it.       T     F    they     F     t(e)<ST
He reads.            F     T    he       T     t(e)∩ST
He is happy.         F     F    is       T     t(e)∩ST
They need it.        T     F    they     F     t(e)∩ST
They want it.        F     F    they     F     t(e)∩ST
He will read.        F     F    will     T     t(e)>ST
They will get it.    F     F    will     F     t(e)>ST
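$$ R(w_{-1}) = \tfrac{3}{10}\, I\left(\tfrac{2}{3}, \tfrac{1}{3}, 0\right) + \tfrac{1}{10}\, I(1, 0, 0) + \tfrac{3}{10}\, I\left(\tfrac{1}{3}, \tfrac{2}{3}, 0\right) + \tfrac{1}{10}\, I(0, 1, 0) + \tfrac{2}{10}\, I(0, 0, 1) \tag{16} $$

$$ R(w_{-1}) = .55 \tag{17} $$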
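Both versions of the multi-valued w-1 feature can be checked with the same remainder computation. This is a small sketch; the list names `AUX_ONLY` and `ANY_WORD` and the class names `past`, `pres`, `fut` are labels introduced here for illustration, with values listed in the row order of the tables.

```python
import math
from collections import Counter, defaultdict

def information(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def remainder(values, labels):
    """Weighted class entropy after splitting on a (possibly multi-valued) feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(g) / n * information(Counter(g).values())
               for g in groups.values())

# Classes in table order: four t(e)<ST, four overlapping ST, two t(e)>ST.
LABELS = ["past"] * 4 + ["pres"] * 4 + ["fut"] * 2

# w-1 recorded only when it is an auxiliary ("-" otherwise).
AUX_ONLY = ["-", "was", "-", "-", "-", "is", "-", "-", "will", "will"]
# w-1 recorded as whatever word precedes the verb.
ANY_WORD = ["he", "was", "he", "they", "he", "is", "they", "they", "will", "will"]

print(round(remainder(AUX_ONLY, LABELS), 2))  # 0.6
print(round(remainder(ANY_WORD, LABELS), 2))  # 0.55
```

The finer-grained feature has the lower remainder even though "he" versus "they" is unlikely to be informative about tense, which is one way to see why a feature with many values can look better than it is.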
Venkataraman[2] credits Quinlan with the concept of GainRatio:

$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\sum_{v \in A} -P(v) \log_2 P(v)} \tag{18} $$
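With w-1 taken as any preceding word (remainder .55):

$$ \mathrm{GainRatio}(w_{-1}) = \frac{1.522 - .55}{I\left(\tfrac{3}{10}, \tfrac{1}{10}, \tfrac{3}{10}, \tfrac{1}{10}, \tfrac{2}{10}\right)} = \frac{.972}{2.17} = .45 \tag{19} $$

With w-1 restricted to auxiliaries (remainder .6):

$$ \mathrm{GainRatio}(w_{-1}) = \frac{1.522 - .6}{I\left(\tfrac{6}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{2}{10}\right)} = \frac{.922}{1.57} = .59 \tag{20} $$

$$ \mathrm{GainRatio}(\text{will}) = \frac{1.522 - .8}{I\left(\tfrac{8}{10}, \tfrac{2}{10}\right)} = \frac{.722}{.722} = 1 \tag{21} $$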
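The three gain ratios can be reproduced from the class counts, the remainders already computed, and the number of rows taking each value of the feature. A minimal sketch follows; the function and parameter names are assumptions for illustration.

```python
import math

def information(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(class_counts, remainder_value, value_counts):
    """Equation (18): gain divided by the entropy of the feature's own values."""
    gain = information(class_counts) - remainder_value
    return gain / information(value_counts)

CLASS_COUNTS = [4, 4, 2]  # t(e)<ST, overlapping ST, t(e)>ST

# w-1 as any preceding word: he, was, they, is, will -> 3, 1, 3, 1, 2 rows.
print(round(gain_ratio(CLASS_COUNTS, 0.55, [3, 1, 3, 1, 2]), 2))  # 0.45
# w-1 as auxiliary only: -, was, is, will -> 6, 1, 1, 2 rows.
print(round(gain_ratio(CLASS_COUNTS, 0.60, [6, 1, 1, 2]), 2))     # 0.59
# will: F, T -> 8, 2 rows.
print(round(gain_ratio(CLASS_COUNTS, 0.80, [8, 2]), 2))           # 1.0
```

Dividing by the entropy of the feature's value distribution penalises features that split the data into many small groups, so `will` now ranks above either version of w-1.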
1 χ²
• “Don’t use low numbers, especially not zero.”
$$ E = \frac{\mathrm{Total}(\text{row}) \cdot \mathrm{Total}(\text{col})}{N} \tag{22} $$
‘Expected’ by the “independent hypothesis”[3]:

Observed                              Expected
          <     ∩     >    total                <     ∩     >    total
-        30    30     0      60       -        24    24    12      60
was      10     0     0      10       was       4     4     2      10
is        0    10     0      10       is        4     4     2      10
will      0     0    20      20       will      8     8     4      20
total    40    40    20     100       total    40    40    20     100

Table 1: y (rows): independent variable; x (columns): dependent variable.
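The expected counts in Table 1 follow directly from the row and column totals of the observed counts. A short sketch, with the function name `expected_counts` assumed for illustration:

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Observed counts from Table 1; rows are w-1 = -, was, is, will;
# columns are the classes <, ∩, >.
OBSERVED = [
    [30, 30, 0],
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 20],
]

for row in expected_counts(OBSERVED):
    print(row)
```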
$$ \chi^2 = \sum \frac{(O - E)^2}{E} \tag{23} $$

\begin{align}
\chi^2 ={} & \frac{(30-24)^2}{24} + \frac{(30-24)^2}{24} + \frac{(0-12)^2}{12} + \frac{(10-4)^2}{4} + \frac{(0-4)^2}{4} + \frac{(0-2)^2}{2} \tag{24} \\
 & + \frac{(0-4)^2}{4} + \frac{(10-4)^2}{4} + \frac{(0-2)^2}{2} + \frac{(0-8)^2}{8} + \frac{(0-8)^2}{8} + \frac{(20-4)^2}{4} = 125 \tag{25}
\end{align}

[2] http://www.speech.sri.com/people/anand/771/html/node29.html
[3] Null in the sense of ‘to be nullified’, not in the sense of maximum entropy.
$$ f^\circ = (\text{rows} - 1)(\text{cols} - 1) \tag{26} $$
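The statistic and its degrees of freedom for Table 1 can be computed as follows. This is a minimal sketch with assumed function names; the observed and expected counts are copied from Table 1.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells (equation 23)."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def degrees_of_freedom(table):
    """(rows - 1)(cols - 1) for a rectangular table without its totals."""
    return (len(table) - 1) * (len(table[0]) - 1)

OBSERVED = [[30, 30, 0], [10, 0, 0], [0, 10, 0], [0, 0, 20]]
EXPECTED = [[24, 24, 12], [4, 4, 2], [4, 4, 2], [8, 8, 4]]

print(chi_square(OBSERVED, EXPECTED))  # 125.0
print(degrees_of_freedom(OBSERVED))    # 6
```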

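gamma function:

$$ \Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx \tag{27} $$

for integers:

$$ \Gamma(n) = (n - 1)! \tag{28} $$

gamma pdf:

$$ p(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x} \tag{29} $$

χ²:

$$ \alpha = \frac{f^\circ}{2}, \quad \lambda = \frac{1}{2} \tag{30} $$

$$ p(\chi^2) = \frac{\left(\tfrac{1}{2}\right)^{f^\circ/2}}{\Gamma\left(\tfrac{f^\circ}{2}\right)}\, (\chi^2)^{\frac{f^\circ}{2} - 1}\, e^{-\frac{1}{2}\chi^2} \tag{31} $$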
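The χ² density is just the gamma density with the parameters of equation (30) substituted in. The sketch below uses only the standard library's `math.gamma`; the function names `gamma_pdf` and `chi2_pdf` are assumptions for illustration.

```python
import math

def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and rate lam (equation 29), for x > 0."""
    return lam ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

def chi2_pdf(x, dof):
    """Chi-square density: gamma density with alpha = dof / 2 and lam = 1/2."""
    return gamma_pdf(x, dof / 2, 0.5)

# Gamma(n) = (n - 1)! for positive integers (equation 28).
print(math.gamma(5), math.factorial(4))

# With two degrees of freedom the density reduces to 0.5 * exp(-x / 2).
print(chi2_pdf(2.0, 2), 0.5 * math.exp(-1.0))
```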
[Figure: Gamma PDF (χ²: f° = 2α; λ = .5). Curves shown for α = 10, λ = 2; α = 5, λ = 2; α = 3, λ = .5; α = 1, λ = .5; α = .5, λ = .5. Horizontal axis from 0 to 10; vertical axis from 0 to 0.5.]
$$ p(\chi^2 > 125,\ f^\circ = 6) \approx 1.45 \times 10^{-24} \tag{32} $$
Observed                              Expected
-s        <     ∩     >    total      -s        <     ∩     >    total
T         0    10     0      10       T         5     5     0      10
F        40    30     0      70       F        35    35     0      70
total    40    40     0      80       total    40    40     0      80

$$ \chi^2 = 5 + 5 + 0 + .71 + .71 + 0 = 11.42, \quad f^\circ = 2 \tag{33} $$

$$ p(\chi^2 > 11.42,\ f^\circ = 2) = .003 \tag{34} $$
Observed                              Expected
-s        <     ∩     >    total      -s        <     ∩     >    total
T         0     1     0       1       T        .5    .5     0       1
F         4     3     0       7       F       3.5   3.5     0       7
total     4     4     0       8       total     4     4     0       8

$$ \chi^2 = .5 + .5 + 0 + .071 + .071 + 0 = 1.142, \quad f^\circ = 2 \tag{35} $$

$$ p(\chi^2 > 1.142,\ f^\circ = 2) = .56 \tag{36} $$
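The two tail probabilities for the -s split can be checked without a statistics library, because for an even number of degrees of freedom the upper tail of the χ² distribution has a closed form: exp(-x/2) times the sum of (x/2)^k / k! for k below f°/2. The sketch below relies on that identity and therefore only covers even f°; the function name is an assumption, and for odd f° a library routine would be needed instead.

```python
import math

def chi2_tail_even_dof(x, dof):
    """P(chi-square > x) for a positive even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(dof // 2))

# Counts in the tens: the split on -s looks clearly significant.
print(round(chi2_tail_even_dof(11.42, 2), 3))  # 0.003
# Same proportions with one tenth of the counts: no longer significant.
print(round(chi2_tail_even_dof(1.142, 2), 2))  # 0.56
```

The two tables have identical proportions; only the sample size differs, which is why the statistic shrinks by a factor of ten and the evidence for the split disappears.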
