Simplified Cost Function and Gradient Descent
Note: [6:53 - the gradient descent equation should have a 1/m factor]
We can compress our cost function's two conditional cases into one case:
$$\mathrm{Cost}(h_\theta(x),y) = -y \log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$$
Notice that when y is equal to 1, then the second term $(1-y)\log(1-h_\theta(x))$ will be zero and will not affect the result. If y is equal to 0, then the first term $-y\log(h_\theta(x))$ will be zero and will not affect the result.
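Substituting the two label values shows how the compressed expression recovers the original conditional cases:

$$\mathrm{Cost}(h_\theta(x),y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1-h_\theta(x)) & \text{if } y = 0 \end{cases}$$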
We can fully write out our entire cost function as follows:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1-y^{(i)}\big) \log\big(1-h_\theta(x^{(i)})\big) \right]$$
A vectorized implementation is:
$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \cdot \left( -y^{T} \log(h) - (1-y)^{T} \log(1-h) \right)$$
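As a rough NumPy sketch of this vectorized cost (the function names, and the assumption that X already includes a column of ones for the intercept term, are illustrative rather than from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Vectorized J(theta): X is m x (n+1), y is an m-vector of 0/1 labels,
    # theta is an (n+1)-vector of parameters.
    m = y.shape[0]
    h = sigmoid(X @ theta)                      # h = g(X * theta)
    return (1.0 / m) * (-y @ np.log(h) - (1 - y) @ np.log(1 - h))
```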
Gradient Descent
Remember that the general form of gradient descent is:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
}
We can work out the derivative part using calculus to get:
Repeat {
$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
}
Notice that this algorithm is identical to the one we used in linear regression.
We still have to simultaneously update all values in θ.
A vectorized implementation is:
$$\theta := \theta - \frac{\alpha}{m} X^{T} \left( g(X\theta) - \vec{y} \right)$$
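A minimal NumPy sketch of this vectorized update, assuming X includes the intercept column and that the learning rate alpha and the iteration count num_iters are chosen by the caller (both names are illustrative):

```python
import numpy as np

def gradient_descent(theta, X, y, alpha, num_iters):
    # Repeatedly apply theta := theta - (alpha/m) * X^T (g(X*theta) - y),
    # which updates every component of theta simultaneously.
    m = y.shape[0]
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h = g(X * theta)
        theta = theta - (alpha / m) * (X.T @ (h - y))
    return theta
```

Because the whole vector theta is recomputed in one expression, every theta_j is updated from the same previous parameter values, which is exactly the simultaneous update the algorithm requires.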