Gradient descent algorithm:
Repeat
{
$\theta_j = \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\ldots,\theta_n)$
} (simultaneously update for every $j = 0, 1, \ldots, n$)
For linear regression, the update rule becomes: $\theta_j = \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$
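A minimal NumPy sketch of this vectorized batch update (the function name `gradient_descent` and its parameters are illustrative, not from the course):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent for linear regression.

    X : (m, n+1) design matrix whose first column is all ones.
    y : (m,) target vector.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        predictions = X @ theta  # h_theta(x^(i)) for all i at once
        theta = theta - alpha * (X.T @ (predictions - y)) / m  # simultaneous update of all theta_j
    return theta
```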
Feature scaling and mean normalization: rescale each feature, e.g. $x_j := \frac{x_j - \mu_j}{s_j}$, so that all features are on a similar scale and gradient descent converges faster.
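A sketch of mean normalization, assuming each column of `X` is one feature and using the standard deviation as the scale $s_j$:

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize and scale each feature column: (x_j - mu_j) / s_j."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```

The same `mu` and `sigma` computed on the training set should also be applied to new examples at prediction time.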
$\alpha$ too small: slow convergence.
$\alpha$ too large: $J(\theta)$ may not decrease on every iteration and may not converge.
Try several values of $\alpha$ and plot $J(\theta)$ against the number of iterations.
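An illustrative experiment on synthetic data (the data and learning rates here are made up for the sketch): run gradient descent with a few values of $\alpha$ and compare the learning curves.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
m = 100
x = rng.uniform(0, 10, size=m)
y = 3 + 2 * x + rng.normal(0, 1, size=m)
x = (x - x.mean()) / x.std()            # mean normalization (see above)
X = np.column_stack([np.ones(m), x])    # prepend the intercept column

def cost(X, y, theta):
    """Squared-error cost J(theta) for linear regression."""
    r = X @ theta - y
    return (r @ r) / (2 * m)

for alpha in (0.001, 0.01, 0.1):
    theta = np.zeros(X.shape[1])
    history = [cost(X, y, theta)]
    for _ in range(200):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(X, y, theta))
    plt.plot(history, label=f"alpha={alpha}")
plt.xlabel("iterations")
plt.ylabel("J(theta)")
plt.legend()
plt.show()
```

A curve that decreases slowly suggests $\alpha$ is too small; one that oscillates or grows suggests $\alpha$ is too large.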
Polynomial regression: fit the data with polynomial terms of a feature, e.g. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$; a sketch follows below.
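For example, a hypothetical helper that maps a single feature to its powers, whose output can then be fed to the same linear-regression machinery:

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D feature x to the columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])
```

Since $x$, $x^2$, $x^3$, … have very different ranges, feature scaling is especially important here.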
Normal equation: solve $\frac{\partial}{\partial\theta_j}J(\theta)=0$ for every $j$, which gives the closed-form solution $\theta = (X^TX)^{-1}X^Ty$.
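A one-line NumPy sketch of the closed-form solution:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y.

    np.linalg.pinv (pseudo-inverse) is used instead of inv so this still
    works when X^T X is singular (redundant features, or m <= n).
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```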
Pros and cons of Gradient Descent vs. the Normal Equation:
- Gradient descent: needs a choice of $\alpha$ and many iterations, but works well even when the number of features $n$ is large.
- Normal equation: no $\alpha$ to choose and no iterations, but computing $(X^TX)^{-1}$ costs roughly $O(n^3)$, so it becomes slow when $n$ is large.