Simplified Cost Function and Gradient Descent

Note: [at 6:53 in the video, the gradient descent equation should have a 1/m factor]

We can compress our cost function's two conditional cases into one case:

\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

Notice that when y is equal to 1, then the second term, (1 - y)\log(1 - h_\theta(x)), will be zero and will not affect the result. If y is equal to 0, then the first term, -y\log(h_\theta(x)), will be zero and will not affect the result.
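As a quick numerical check of the two cases, here is a minimal sketch (the `cost` helper is illustrative, not from the lecture):

```python
import math

def cost(h, y):
    # Compressed logistic cost for a single example:
    # -y*log(h) - (1 - y)*log(1 - h)
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

# When y = 1, only the first term contributes:
print(cost(0.9, 1))  # same as -math.log(0.9), about 0.105
# When y = 0, only the second term contributes:
print(cost(0.9, 0))  # same as -math.log(0.1), about 2.303
```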

We can fully write out our entire cost function as follows:

J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)})) \right]

A vectorized implementation is:

h = g(X\theta)

J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h) - (1 - y)^{T}\log(1 - h)\right)
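The vectorized cost translates almost symbol for symbol into NumPy. A minimal sketch (the toy data and the names `sigmoid` and `compute_cost` are illustrative assumptions, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    # h = g(X theta); J = (1/m) * (-y' log(h) - (1-y)' log(1-h))
    m = y.size
    h = sigmoid(X @ theta)
    return (1.0 / m) * (-y @ np.log(h) - (1 - y) @ np.log(1 - h))

# Toy data: X includes a leading column of ones for the intercept term.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(compute_cost(theta, X, y))  # log(2), about 0.6931, since h = 0.5 everywhere
```

With theta = 0, the hypothesis outputs 0.5 for every example, so each example contributes -log(0.5) and the cost is exactly log 2.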

Gradient Descent

Remember that the general form of gradient descent is:

Repeat \; \lbrace

\;\;\; \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta)

\rbrace

We can work out the derivative part using calculus to get:

Repeat \; \lbrace

\;\;\; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}

\rbrace

Notice that this algorithm is identical in form to the one we used in linear regression; the difference is that the hypothesis h_\theta(x) is now the sigmoid of \theta^T x rather than \theta^T x itself.

We still have to simultaneously update all the values in \theta.
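Spelling one step out element by element makes the simultaneous-update requirement explicit. A minimal sketch (the `gd_step` helper is illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_step(theta, X, y, alpha):
    # One gradient descent step, written element by element to mirror the
    # update rule above. Every theta_j is computed from the *old* theta
    # and only then assigned, i.e. a simultaneous update.
    m, n = X.shape
    new_theta = np.empty(n)
    for j in range(n):
        grad_j = sum((sigmoid(X[i] @ theta) - y[i]) * X[i, j]
                     for i in range(m))
        new_theta[j] = theta[j] - (alpha / m) * grad_j
    return new_theta
```

Updating `theta[j]` in place inside the loop would let later components see partially updated values, which is the bug the "simultaneous update" rule guards against.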

A vectorized implementation is:

\theta := \theta - \frac{\alpha}{m} X^{T} \left(g(X\theta) - \vec{y}\right)
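Putting the vectorized update into a full training loop might look as follows. This is a sketch under assumptions: the toy data set, the learning rate, and the iteration count are all hypothetical choices, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # Repeatedly apply theta := theta - (alpha/m) * X' (g(X theta) - y).
    # The single vectorized expression updates all of theta at once.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy linearly separable data (first column of ones is the intercept):
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # matches y on this separable toy set
```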
