Understanding Linear Regression and Gradient Descent in Depth: The ethen8181 Machine Learning Project as an Example



machine-learning: machine learning tutorials (mainly in Python3). Project URL: https://gitcode.com/gh_mirrors/mach/machine-learning

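Introduction

Linear regression is one of the most fundamental algorithms in machine learning, and gradient descent is a classic method for optimizing model parameters. Based on the linear regression implementation in the ethen8181 machine learning project, this article explains how gradient descent is applied to linear regression, covering both the underlying principles and practical techniques.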

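Gradient Descent Basics

Basic Concepts

Gradient descent is an iterative optimization algorithm for finding a minimum of a function. The core idea is to adjust the parameters step by step in the direction opposite to the gradient until the updates converge to a (local) minimum.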

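In mathematical form: $$\text{repeat until convergence: } \{\; x := x - \alpha \nabla F(x) \;\}$$

where:

$\alpha$: the learning rate, which controls the step size
$\nabla F(x)$: the gradient of the function at x (for a single-variable function, its derivative)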
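
The "repeat until convergence" rule can be sketched in R as follows. This is a minimal illustration, not code from the project: the function name `gradient_descent`, the stopping rule (stop when the step is smaller than `epsilon`), the `max_iter` cap, and the test function $F(x) = (x-3)^2$ are all assumptions made for the example.

```r
# Minimal gradient descent with an explicit convergence check.
# `grad` is the derivative of the function being minimized.
gradient_descent <- function(grad, x0, alpha, epsilon = 1e-6, max_iter = 1000) {
  x <- x0
  for (i in seq_len(max_iter)) {
    step <- alpha * grad(x)
    x <- x - step
    # Stop once the update becomes negligibly small
    if (abs(step) < epsilon) {
      return(list(x = x, iterations = i, converged = TRUE))
    }
  }
  list(x = x, iterations = max_iter, converged = FALSE)
}

# Hypothetical test function: F(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
fit <- gradient_descent(function(x) 2 * (x - 3), x0 = 0, alpha = 0.1)
print(fit$x)  # close to 3
```

Capping the number of iterations guards against a learning rate that is too large, in which case the updates may never become small.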

A Simple Example

Consider the function $F(x) = 1.2\times(x-2)^2 + 3.2$. Gradient descent can be used to find its minimum:

# Define the function and its derivative
Formula <- function(x) 1.2 * (x - 2)^2 + 3.2
Derivative <- function(x) 2 * 1.2 * (x - 2)

# Gradient descent settings
learning_rate <- 0.6
x <- 0.1  # initial value

# Iterate
for (i in 1:10) {
  x <- x - learning_rate * Derivative(x)
  print(paste("Iteration", i, ": x =", x, "F(x) =", Formula(x)))
}

This simple example shows how gradient descent approaches the minimizer x = 2 step by step.
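
For this particular F, the behavior of the iterates can be worked out by hand. The short derivation below uses only the function, derivative, learning rate and initial value from the example; the iteration index $k$ is notation introduced here.

$$ x_{k+1} - 2 = (x_k - 2) - \alpha \cdot 2.4\,(x_k - 2) = (1 - 2.4\,\alpha)\,(x_k - 2) $$

With $\alpha = 0.6$ the factor is $1 - 1.44 = -0.44$, so the distance to the minimizer shrinks by 56% per step while its sign flips, and the iterates alternate between the two sides of 2. Starting from $x_0 = 0.1$, the first step gives $x_1 = 2 + (-0.44)(-1.9) = 2.836$. The iteration converges only when $|1 - 2.4\,\alpha| < 1$, that is, for $0 < \alpha < 1/1.2 \approx 0.833$; a larger learning rate makes the iterates diverge.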

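Gradient Descent in Linear Regression

Problem Definition

Predicting a house's price from its area and number of bedrooms is a typical multiple linear regression problem. The model is: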

$$ h_{\theta}(x) = \theta_0 + \theta_1 x_{area} + \theta_2 x_{bedrooms} $$
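
In code, the model is usually evaluated for all houses at once as a matrix product. The sketch below assumes made-up feature values and parameter values chosen purely for illustration; neither comes from the project's data.

```r
# Hypothetical toy data: area and number of bedrooms for three houses
area     <- c(2104, 1600, 2400)
bedrooms <- c(3, 3, 4)

# Design matrix with a leading column of ones for the intercept theta_0
X <- cbind(intercept = 1, area = area, bedrooms = bedrooms)

# Hypothetical parameter values, chosen only for illustration
theta <- c(50000, 100, 10000)

# h_theta(x) for every house at once
predictions <- as.vector(X %*% theta)
print(predictions)
```

Adding the column of ones lets the intercept be handled like any other parameter.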

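Cost Function

We use the mean squared error (MSE) as the cost function, scaled by an extra factor of 1/2 so that the 2 produced by differentiating the square cancels: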

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 $$

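where m is the number of samples, $x^{(i)}$ and $y^{(i)}$ are the features and target of the i-th sample, and $\theta$ is the vector of parameters to be optimized.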
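
A minimal R sketch of this cost function is shown below. The function name `Cost` and the toy data are assumptions for illustration, and the design matrix `X` is assumed to already contain a leading column of ones.

```r
# Cost J(theta) = 1/(2m) * sum((X %*% theta - y)^2)
# X is assumed to already contain a leading column of ones.
Cost <- function(X, y, theta) {
  m <- length(y)
  residuals <- as.vector(X %*% theta) - y
  sum(residuals^2) / (2 * m)
}

# Hypothetical toy data generated from y = 1 + 2 * x
X <- cbind(1, c(1, 2, 3))
y <- c(3, 5, 7)

print(Cost(X, y, c(1, 2)))  # perfect fit, so the cost is 0
print(Cost(X, y, c(0, 0)))  # (9 + 25 + 49) / (2 * 3)
```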

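Parameter Update Rule

Each parameter $\theta_j$ is updated by the rule below, with all parameters updated simultaneously and $x_0^{(i)} = 1$ for the intercept term: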

$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} $$
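
The per-parameter rule can be written as one vectorized step, $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$. The following is a sketch under assumptions: the function name `UpdateTheta`, the toy data, the learning rate of 0.1 and the 5000 iterations are illustrative choices, not taken from the project.

```r
# One batch gradient descent update for all parameters simultaneously:
# theta := theta - alpha * (1/m) * t(X) %*% (X %*% theta - y)
UpdateTheta <- function(X, y, theta, alpha) {
  m <- length(y)
  residuals <- as.vector(X %*% theta) - y
  gradient <- as.vector(t(X) %*% residuals) / m
  theta - alpha * gradient
}

# Hypothetical toy data generated from y = 1 + 2 * x
X <- cbind(1, c(1, 2, 3))
y <- c(3, 5, 7)

theta <- c(0, 0)
for (i in 1:5000) {
  theta <- UpdateTheta(X, y, theta, alpha = 0.1)
}
print(theta)  # approaches c(1, 2)
```

Computing the whole gradient from the current `theta` before assigning the result is what makes the update simultaneous.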

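Implementation Details

Data Preprocessing

Feature scaling is an important preprocessing step for gradient descent, especially when the features are on very different scales, as house area and number of bedrooms are here: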

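# Z-score standardization
Normalize <- function(x) (x - mean(x)) / sd(x)

# Standardize the feature columns of the housing data
# (column 3, dropped here, is assumed to be the target price)
normed_housing <- apply(housing[, -3], 2, Normalize)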
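
The effect of the standardization can be checked on a small stand-in data set. The data frame `housing_toy` below is hypothetical (its values are illustrative only); the sketch confirms that each standardized column has mean 0 and standard deviation 1, and that base R's `scale()` gives the same result.

```r
# Z-score standardization, as defined in the text
Normalize <- function(x) (x - mean(x)) / sd(x)

# Hypothetical stand-in for the housing data (values are illustrative only)
housing_toy <- data.frame(
  area     = c(2104, 1600, 2400, 1416, 3000),
  bedrooms = c(3, 3, 3, 2, 4),
  price    = c(399900, 329900, 369000, 232000, 539900)
)

# Standardize every feature column, leaving the target (column 3) untouched
normed_toy <- apply(housing_toy[, -3], 2, Normalize)

print(round(colMeans(normed_toy), 10))  # each column has mean 0
print(apply(normed_toy, 2, sd))         # each column has standard deviation 1

# Base R's scale() performs the same centering and scaling
same_as_scale <- all.equal(normed_toy, scale(housing_toy[, -3]),
                           check.attributes = FALSE)
print(same_as_scale)
```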

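Gradient Descent Implementation

The ethen8181 project implements the full gradient descent algorithm as a function with the following signature (the body is omitted here):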

# Main gradient descent function
GradientDescent <- function(data, target, learning_rate, iteration, 
                           epsilon=0.001, normalize=TRUE, method="batch") {
  # implementation details...
}

# Example usage
result <- GradientDescent(data=housing, target="price", 
                         learning_rate=0.05, iteration=500)
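
Since the body is omitted above, here is a simplified, batch-only sketch of what a function with similar arguments might do. It is not the project's code: the name `BatchGDSketch`, the stopping rule based on the change in cost, the shape of the returned list, and the toy data are all assumptions made for illustration.

```r
# A hypothetical, simplified batch-only variant; NOT the project's actual code.
BatchGDSketch <- function(data, target, learning_rate, iteration,
                          epsilon = 0.001, normalize = TRUE) {
  y <- data[[target]]
  features <- as.matrix(data[, setdiff(names(data), target), drop = FALSE])
  if (normalize) features <- scale(features)  # z-score each feature column
  X <- cbind(intercept = 1, features)
  m <- length(y)

  theta <- rep(0, ncol(X))
  theta_trace <- matrix(NA_real_, nrow = iteration, ncol = ncol(X),
                        dimnames = list(NULL, colnames(X)))
  cost_trace <- rep(NA_real_, iteration)

  for (i in seq_len(iteration)) {
    residuals <- as.vector(X %*% theta) - y
    theta <- theta - learning_rate * as.vector(t(X) %*% residuals) / m
    theta_trace[i, ] <- theta
    cost_trace[i] <- sum((as.vector(X %*% theta) - y)^2) / (2 * m)
    # Assumed stopping rule: quit when the cost improves by less than epsilon
    if (i > 1 && abs(cost_trace[i - 1] - cost_trace[i]) < epsilon) break
  }
  # Keep only the iterations that were actually run
  list(theta = theta_trace[seq_len(i), , drop = FALSE],
       cost = cost_trace[seq_len(i)])
}

# Hypothetical toy data (illustrative values only)
housing_toy <- data.frame(
  area     = c(2104, 1600, 2400, 1416, 3000),
  bedrooms = c(3, 3, 3, 2, 4),
  price    = c(399900, 329900, 369000, 232000, 539900)
)
fit <- BatchGDSketch(housing_toy, target = "price",
                     learning_rate = 0.05, iteration = 500)
print(tail(fit$theta, 1))
```

Recording the parameters and the cost at every iteration makes it easy to plot the trajectory and to judge whether the chosen learning rate and iteration count were sufficient; with strongly correlated features, more iterations may be needed before the parameters settle.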

Comparison with Linear Regression

Compare the gradient descent result with R's built-in lm function:

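# Gradient descent result: the last row of the parameter trace
# (assumes result$theta stores one row of parameters per iteration)
parameters_gd <- as.numeric(unlist(result$theta[nrow(result$theta), ]))

# Linear regression result; lm() needs a data frame that also contains price
normed_data <- data.frame(normed_housing, price = housing$price)
model_lm <- lm(price ~ area + bedrooms, data = normed_data)
coefficients_lm <- coef(model_lm)

# Compare
data.frame(Gradient_Descent = parameters_gd, Linear_Regression = coefficients_lm)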
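
The same comparison can be reproduced end to end on a small stand-in data set. The sketch below is an illustration under assumptions: `housing_toy` is hypothetical data, and the learning rate of 0.05 with 5000 iterations was chosen so that plain batch gradient descent has time to converge on it.

```r
# Hypothetical toy data (illustrative values only)
housing_toy <- data.frame(
  area     = c(2104, 1600, 2400, 1416, 3000),
  bedrooms = c(3, 3, 3, 2, 4),
  price    = c(399900, 329900, 369000, 232000, 539900)
)

# Standardize the features; keep price on its original scale
normed_toy <- data.frame(scale(housing_toy[, c("area", "bedrooms")]),
                         price = housing_toy$price)

# Reference fit with lm()
coef_lm <- coef(lm(price ~ area + bedrooms, data = normed_toy))

# Plain batch gradient descent on the same standardized data
X <- cbind(1, as.matrix(normed_toy[, c("area", "bedrooms")]))
y <- normed_toy$price
theta <- rep(0, ncol(X))
for (i in 1:5000) {
  theta <- theta - 0.05 * as.vector(t(X) %*% (X %*% theta - y)) / length(y)
}

comparison <- data.frame(Gradient_Descent = theta, Linear_Regression = coef_lm)
print(comparison)
```

Both methods minimize the same squared-error cost, so once gradient descent has converged the two columns should agree up to numerical tolerance.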