Fixing the error raised when importing the Boston housing dataset for machine learning

kaka_R-Py

License: CC 4.0 BY-SA

Tags: machine learning, artificial intelligence

Article link: https://blog.csdn.net/2301_76574743/article/details/139919740

When importing the Boston dataset for machine learning with:

from sklearn.datasets import load_boston

the following error was raised:

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as investigated in [1], the authors of this dataset engineered a non-invertible variable "B" assuming that racial self-segregation had a positive impact on house prices [2]. Furthermore the goal of the research that led to the creation of this dataset was to study the impact of air quality but it did not give adequate demonstration of the validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning.

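This happens because, starting with scikit-learn 1.2, the statement from sklearn.datasets import load_boston is no longer supported. Replace it with the following code; this workaround is given in the error message itself: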

import pandas as pd
import numpy as np

data_url = "http://lib.stat.cmu.edu/datasets/boston"
# Raw string (r"...") avoids Python's invalid-escape warning for "\s"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

This solves the problem.
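The slicing in the workaround relies on the layout of the CMU file: after its 22 header lines, each house occupies two physical lines, with 11 values on the first and 3 on the second (B, LSTAT and the target MEDV). The sketch below checks that logic offline on a tiny made-up sample instead of the real download; the sample values are invented and only the two-line layout matches the real file.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the CMU file after its header lines: each record
# spans two lines, 11 values on the first and 3 on the second.
SAMPLE = (
    "1 2 3 4 5 6 7 8 9 10 11\n"
    "12 13 14\n"
    "21 22 23 24 25 26 27 28 29 30 31\n"
    "32 33 34\n"
)

# The first line fixes the width at 11 columns; shorter lines are padded with NaN.
raw_df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None)

# Even rows hold the first 11 features; on odd rows, columns 0-1 are the
# last 2 features and column 2 is the target.
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

print(raw_df.shape)  # (4, 11)
print(data.shape)    # (2, 13)
print(target)        # [14. 34.]
```

On the real file the same indexing yields data with shape (506, 13) and a target of length 506.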

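# Complete version: load the data, add column names and save it as a CSV file
import os

import numpy as np
import pandas as pd

# Read the data from the web
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two lines: 11 features, then 2 more features plus the target
boston = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# Build a DataFrame with the 13 feature names and a target column
data = pd.DataFrame(boston)
data.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
data['target'] = target

# Create the output folder ('数据集', i.e. "dataset") first; to_csv fails if it is missing
os.makedirs('数据集', exist_ok=True)
data.to_csv('数据集/boston.csv', index=False)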
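Once boston.csv has been saved, it can be read back and split into features and target for later experiments. This is a minimal sketch, assuming the column layout written above (13 feature columns plus target); load_boston_csv is a hypothetical helper name, and the demo writes two invented rows to a temporary folder rather than touching 数据集/boston.csv.

```python
import os
import tempfile

import pandas as pd

FEATURES = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
            'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']


def load_boston_csv(path):
    """Hypothetical helper: read the saved CSV and return (features, target)."""
    df = pd.read_csv(path)
    X = df[FEATURES]   # the 13 feature columns
    y = df['target']   # target column (MEDV in the original data)
    return X, y


# Demo on two invented rows written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'boston.csv')
    demo = pd.DataFrame([list(range(13)), list(range(13, 26))], columns=FEATURES)
    demo['target'] = [24.0, 21.6]
    demo.to_csv(path, index=False)
    X, y = load_boston_csv(path)

print(X.shape)  # (2, 13)
```

Selecting the columns by name, rather than by position, makes the helper fail loudly if the CSV was written with a different column order or set.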