Contents
1. Linear Regression
2. Logistic Regression
2.1. The Cost Function J(θ)
1. Linear Regression

2. Logistic Regression

For logistic regression, we use $h_\theta(x^{(i)})$ as the estimated probability that the training example $x^{(i)}$ is in class $y = 1$ (or is labeled as $y = 1$). Here, we assume that the response variables are Bernoulli distributed: $y^{(1)}, y^{(2)}, \dots, y^{(m)} \sim \mathrm{Bern}(p = \phi_i)$. The hypothesis $h_\theta(x)$ is the logistic function:
\[
h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}
\]
The derivative of the sigmoid function has the following nice property (the proof is very easy, so we will not give it here):
\[
g'(z) = g(z)\,\bigl(1 - g(z)\bigr)
\]
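The document gives no code, but the identity above is easy to check numerically. A minimal sketch in NumPy (the test point `z = 0.7` and the tolerance are arbitrary choices, not from the text) compares the analytic derivative $g(z)(1-g(z))$ against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Check g'(z) = g(z) * (1 - g(z)) against a central finite difference.
z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
assert abs(numeric - analytic) < 1e-8
```

The central difference has $O(\varepsilon^2)$ error, so the two values agree to far better than the $10^{-8}$ tolerance used here.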
Let's consider $X \in \mathbb{R}^{m \times (n+1)}$ and $Y \in \mathbb{R}^{m \times 1}$ as our dataset. Now, for the parameters $\theta \in \mathbb{R}^{n+1}$, each entry of $X\theta$ is a linear combination of the features of $X$:
\[
X\theta =
\begin{bmatrix}
x^{(1)T}\theta \\ x^{(2)T}\theta \\ \vdots \\ x^{(m)T}\theta
\end{bmatrix}_{m \times 1}
=
\begin{bmatrix}
\theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \vdots \\ \theta^T x^{(m)}
\end{bmatrix}_{m \times 1}
\]
Let's define the vector $h \in \mathbb{R}^{m \times 1}$ such that:
\[
[h]_i = g(\theta^T x^{(i)}) = g(x^{(i)T}\theta)
\qquad\Longrightarrow\qquad
h = g(X\theta)
\]
($g$ is the sigmoid function; in $h = g(X\theta)$, $g$ is the matrix version of the sigmoid function, applied element-wise.)
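The vectorized hypothesis $h = g(X\theta)$ can be sketched directly in NumPy. The dataset below is a hypothetical toy example (4 examples, 2 features plus an intercept column); only the shapes matter:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: m = 4 examples, n = 2 features plus intercept column.
X = np.array([[1.0,  0.5,  1.2],
              [1.0, -0.3,  0.8],
              [1.0,  2.0, -1.0],
              [1.0,  0.0,  0.0]])      # shape (m, n+1)
theta = np.array([0.1, -0.4, 0.25])    # shape (n+1,)

# h = g(X @ theta): one estimated probability per training example.
h = sigmoid(X @ theta)
assert h.shape == (4,)
assert np.all((h > 0) & (h < 1))
```

A single matrix-vector product replaces the loop over $i$; each entry of `h` is $g(\theta^T x^{(i)})$, a probability strictly between 0 and 1.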
2.1. The Cost Function J(θ). We can get the cost function by using maximum likelihood $L(\theta)$ over the joint distribution of the dataset, or by constructing a cost function that penalizes misclassification. We present the following similar cost functions:
\[
J_1(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right]
\]
\[
J_2(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right]
\]
\[
J_3(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log\bigl(h_\theta(x^{(i)})\bigr) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right]
\]
The last cost function $J_3(\theta)$ is the log-likelihood $\ell(\theta) = \log(L(\theta))$ of the parameters $\theta$. It is easy to demonstrate that minimizing $J_1(\theta)$ is the same as maximizing $J_2(\theta)$ and $J_3(\theta)$:
\[
\arg\min_\theta J_1(\theta) = \arg\max_\theta J_2(\theta) = \arg\max_\theta J_3(\theta)
\]
2.1.1. Matrix notation for J(θ). Let's consider the matrix notation for $J_1(\theta)$:
\[
J_1(\theta) = -\frac{1}{m} \left[ Y^T \log(h) + (1 - Y)^T \log(1 - h) \right]
\]
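The matrix form of $J_1(\theta)$ translates line-for-line into NumPy. A minimal sketch, using a hypothetical toy dataset (the values are illustrative, not from the text); at $\theta = 0$ every prediction is $0.5$, so the cost must equal $\log 2$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J1(theta, X, y):
    """Vectorized cost: -(1/m) [ y^T log(h) + (1-y)^T log(1-h) ]."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Hypothetical toy data (intercept column included).
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)

# With theta = 0, h = 0.5 everywhere, so J1 = log(2).
assert np.isclose(J1(theta, X, y), np.log(2))
```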
\[
\nabla_\theta J_1(\theta) = \ ?
\]
First, using $g'(z) = g(z)(1 - g(z))$, the partial derivative of the hypothesis is:
\[
\frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})
= \frac{\partial}{\partial \theta_j} g(\theta^T x^{(i)})
= g(\theta^T x^{(i)}) \left( 1 - g(\theta^T x^{(i)}) \right) x_j^{(i)}
= h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x_j^{(i)}
\]
Then, for $J_1(\theta)$:
\[
\frac{\partial J_1(\theta)}{\partial \theta_j}
= -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})
\]
\[
= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left( 1 - h_\theta(x^{(i)}) \right) - (1 - y^{(i)})\, h_\theta(x^{(i)}) \right] x_j^{(i)}
= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\]
The expression above in vector notation uses the $m \times 1$ vector of residuals:
\[
h - y =
\begin{bmatrix}
h_\theta(x^{(1)}) - y^{(1)} \\
h_\theta(x^{(2)}) - y^{(2)} \\
\vdots \\
h_\theta(x^{(m)}) - y^{(m)}
\end{bmatrix}_{m \times 1}
\]
\[
\frac{\partial J_1(\theta)}{\partial \theta_j} = \frac{1}{m}\, x_j^T (h - y)
\]
where $x_j$ is the $j$-th column of $X$, so that:
\[
\nabla_\theta J_1(\theta) = \frac{1}{m}\, X^T (h - y)
\]
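The closed-form gradient $\frac{1}{m} X^T(h - y)$ can be verified against a numerical gradient of $J_1$. A minimal sketch, again with hypothetical toy data and an arbitrary test point $\theta$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J1(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def grad_J1(theta, X, y):
    """Analytic gradient: (1/m) X^T (h - y)."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

# Hypothetical toy data; compare against a central finite difference.
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.2, -0.1])

eps = 1e-6
numeric = np.array([
    (J1(theta + eps * e, X, y) - J1(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(len(theta))
])
assert np.allclose(numeric, grad_J1(theta, X, y), atol=1e-8)
```

Each coordinate of the finite-difference gradient matches the analytic expression, which is a standard sanity check before using the gradient in an optimizer.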
\[
\nabla_\theta J_2(\theta) = \ ?
\]
Since $J_2(\theta) = -J_1(\theta)$, the same steps give:
\[
\frac{\partial J_2(\theta)}{\partial \theta_j}
= -\frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
= -\frac{1}{m}\, x_j^T (h - y)
\]
\[
\nabla_\theta J_2(\theta) = -\frac{1}{m}\, X^T (h - y)
\]
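With the gradient of $J_1$ in hand, the cost can be minimized by batch gradient descent, $\theta \leftarrow \theta - \alpha \cdot \frac{1}{m} X^T(h - y)$. The document does not give a training loop; the sketch below is a hypothetical one on a small separable toy dataset, with an arbitrary learning rate and iteration count:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linearly separable toy data (intercept column included).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Batch gradient descent on J1: theta <- theta - alpha * (1/m) X^T (h - y).
theta = np.zeros(2)
alpha, m = 0.5, X.shape[0]
for _ in range(2000):
    h = sigmoid(X @ theta)
    theta -= alpha * X.T @ (h - y) / m

# The fitted model should classify the training set correctly.
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
assert np.array_equal(preds, y)
```

Maximizing $J_3(\theta)$ by gradient ascent would use the same update with the sign flipped, since $\nabla_\theta J_3 = -m \, \nabla_\theta J_1$.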