[Machine Learning] A Regression Case Study: Data Processing, Modeling, and Tuning

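This article works through a regression problem in machine learning using a practical case. It covers the data preprocessing steps (data cleaning, feature engineering, and missing-value handling), then shows how to build regression models such as linear regression and decision tree regression, and discusses model evaluation metrics. Finally, it describes parameter tuning methods such as grid search and random search for improving predictive performance.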
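As a rough illustration of the tuning step mentioned above, the sketch below compares grid search with randomized search. It is a minimal example under stated assumptions, not the article's own code: the data is a synthetic stand-in from `make_regression`, and the choice of `KNeighborsRegressor`, the candidate values for `n_neighbors`, and the 5-fold split are all illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the article uses the Boston housing file).
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=7)

# Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsRegressor())])
param_grid = {'knn__n_neighbors': [1, 3, 5, 7, 9, 11]}  # illustrative candidates
cv = KFold(n_splits=5, shuffle=True, random_state=7)

# Grid search evaluates every candidate in param_grid.
grid = GridSearchCV(pipe, param_grid, scoring='neg_mean_squared_error', cv=cv)
grid.fit(X, y)

# Randomized search evaluates only n_iter candidates sampled from the same space.
rand = RandomizedSearchCV(pipe, param_grid, n_iter=3,
                          scoring='neg_mean_squared_error', cv=cv, random_state=7)
rand.fit(X, y)

print('grid search best  :', grid.best_params_, grid.best_score_)
print('random search best:', rand.best_params_, rand.best_score_)
```

Because the score is the negated mean squared error, values closer to zero are better. Grid search covers the whole candidate list, so on the same folds it can never score worse than randomized search; randomized search trades that guarantee for fewer model fits.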


# -*- coding: utf-8 -*-
"""Regression problem case study (回归问题案例.ipynb)

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1l8xlYKSd8nljVVEEriZyoc0oivqMDWR0
"""

# Import the required packages
import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
from pandas import set_option
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error

# Load the data
filename = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited; sep=r'\s+' replaces the deprecated delim_whitespace=True
data = read_csv(filename, names=names, sep=r'\s+')

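# Dimensions of the data (rows, columns)
print(data.shape)

# First five rows
print(data.head())

# Descriptive statistics
print(data.describe())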

# Understand the data: type of each column
print(data.dtypes)

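# Descriptive statistics, displayed with one decimal place
# (the full option name is required; the bare 'precision' key is ambiguous in recent pandas)
set_option('display.precision', 1)
print(data.describe())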

# Pairwise Pearson correlations between the features
set_option('display.precision', 2)
print(data.corr(method='pearson'))

# Data visualization
# Single-feature plots: histograms
data.hist(sharex=False, sharey=False, xlabelsize=1, ylabelsize=1, layout=(3,5), bins=100)
plt.show()



# Density plots: a smoother view of each feature's distribution
data.plot(kind='density', subplots=True, layout=(4,4), sharex=False, fontsize=1)
plt.show()

# Box plots
data.plot(kind='box', subplots=True, layout=(4,4), sharex=False, sharey=False, fontsize=8)
plt.show()
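
The listing stops at the plots, but the imports at the top (`train_test_split`, `KFold`, `cross_val_score`, `Pipeline`, `StandardScaler`, and several regressors) point toward comparing models under cross-validation. The following is a minimal sketch of what such a step might look like, not the original code. The assumptions: synthetic data from `make_regression` stands in for the housing file so the block runs offline, 20% of the rows are held out for validation, and the fold count, seed, and model list are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in with 13 features (an assumption). With the real data, X would be
# the 13 feature columns and y the MEDV column.
X, y = make_regression(n_samples=300, n_features=13, noise=15.0, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=7)

# Candidate models with default settings (illustrative selection).
models = {
    'LR': LinearRegression(),
    'LASSO': Lasso(),
    'EN': ElasticNet(),
    'KNN': KNeighborsRegressor(),
    'CART': DecisionTreeRegressor(random_state=7),
    'SVM': SVR(),
}

cv = KFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    # Standardize inside the pipeline so scaling is fitted on each training fold only.
    pipe = Pipeline([('scaler', StandardScaler()), ('model', model)])
    scores = cross_val_score(pipe, X_train, y_train, cv=cv,
                             scoring='neg_mean_squared_error')
    results[name] = scores.mean()
    print(f'{name}: {scores.mean():.2f} ({scores.std():.2f})')

# Refit the best-scoring candidate and check it on the held-out validation rows.
best_name = max(results, key=results.get)
best = Pipeline([('scaler', StandardScaler()), ('model', models[best_name])])
best.fit(X_train, y_train)
val_mse = mean_squared_error(y_val, best.predict(X_val))
print(f'best: {best_name}, validation MSE: {val_mse:.2f}')
```

Scores are negated mean squared errors, so the largest (closest to zero) value marks the best candidate. Which model wins on the synthetic data says nothing about the housing data; only the structure of the comparison carries over.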