K-Nearest Neighbors (KNN) Algorithm

Article link: https://blog.csdn.net/hawk2014bj/article/details/143466215

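The K-nearest neighbors (KNN) algorithm in machine learning computes the distance between an input sample and every sample in the training set, selects the k closest samples, and assigns the input to whichever class is most common among them. Distance in feature space can be measured in several ways; the most common is the Euclidean distance, which for two n-dimensional points x and y is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
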
KNN has no real training phase: instead of learning parameters, it simply searches the training set at prediction time.
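
The procedure described above (measure distances, keep the k nearest samples, take a majority vote) can be sketched from scratch in a few lines. This is only an illustrative sketch, not the implementation used later in this post: the helper names `euclidean_distances` and `knn_predict` and the toy data are made up for the example, and ties in the vote are resolved simply by whichever label `Counter` saw first.

```python
from collections import Counter

import numpy as np


def euclidean_distances(X_train, x):
    """Euclidean distance from a single point x to every row of X_train."""
    return np.sqrt(np.sum((X_train - x) ** 2, axis=1))


def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query row by majority vote among its k nearest training rows."""
    y_train = np.asarray(y_train).ravel()
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        distances = euclidean_distances(X_train, x)
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Majority vote over the labels of those neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Toy data (hypothetical): two well-separated clusters labeled 0 and 1
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
demo_pred = knn_predict(X_demo, y_demo, np.array([[0.5, 0.5], [5.5, 5.5]]), k=3)
print(demo_pred)  # [0 1]
```

Note that nothing is learned ahead of time: every prediction scans the whole training set, which is why the cost of KNN grows with the number of stored samples.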

Data preparation

Use the Iris dataset and split it into a training set and a test set.

# Import the required modules
import numpy as np
from sklearn import datasets
from sklearn.utils import shuffle
# Load the sklearn iris dataset
iris = datasets.load_iris()
# Shuffle the data and labels together
X, y = shuffle(iris.data, iris.target, random_state=13)
# Convert the data to float32
X = X.astype(np.float32)
# Simple train/test split with a 7:3 ratio
offset = int(X.shape[0] * 0.7)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
# Reshape the labels into column vectors
y_train = y_train.reshape((-1,1))
y_test = y_test.reshape((-1,1))
# Print the training and test set sizes
print('X_train=', X_train.shape)
print('X_test=', X_test.shape)
print('y_train=', y_train.shape)
print('y_test=', y_test.shape)
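
As an alternative to shuffling and slicing by hand, the same 7:3 split can be sketched with scikit-learn's `train_test_split`. This is not what the post uses; the `stratify` argument is an extra assumption added here so that each class keeps the same proportion in both sets, and the variable names `X_tr`, `X_te`, `y_tr`, `y_te` are chosen only for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# 70% training, 30% test; stratify keeps the class balance in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=13, stratify=iris.target
)

print('train:', X_tr.shape, 'test:', X_te.shape)
# Number of test samples per class
print('test class counts:', np.bincount(y_te))
```

The labels stay one-dimensional here, which is the shape scikit-learn estimators expect for `y`.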

Implementing the K-nearest neighbors algorithm

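# Import the KNeighborsClassifier class
from sklearn.neighbors import KNeighborsClassifier
# Create a k-nearest neighbors instance
neigh = KNeighborsClassifier(n_neighbors=10)
# Fit the model; ravel() passes the labels as a 1-D array, which sklearn expects
neigh.fit(X_train, y_train.ravel())
# Predict on the test set
y_pred = neigh.predict(X_test)
# Reshape the predictions into a column vector to match y_test
y_pred = y_pred.reshape((-1, 1))
# Count the correct predictions
num_correct = np.sum(y_pred == y_test)
# Compute the accuracy
accuracy = float(num_correct) / X_test.shape[0]
print('Got %d / %d correct => accuracy: %f' % (num_correct, X_test.shape[0], accuracy))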
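
The hand-computed accuracy can be cross-checked against scikit-learn's own helpers. The sketch below is self-contained and repeats the same shuffle and split, but keeps the labels one-dimensional; the name `clf` is chosen only for this example. Keeping both arrays 1-D matters: comparing a shape `(n,)` array with a shape `(n, 1)` array broadcasts to an `(n, n)` matrix and would inflate the count of "correct" predictions.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle

# Same shuffle and 7:3 split as above, with 1-D labels
iris = datasets.load_iris()
X, y = shuffle(iris.data, iris.target, random_state=13)
offset = int(X.shape[0] * 0.7)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

clf = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Three ways to get the same number
manual_acc = float(np.sum(y_pred == y_test)) / X_test.shape[0]
metric_acc = accuracy_score(y_test, y_pred)
score_acc = clf.score(X_test, y_test)
print(manual_acc, metric_acc, score_acc)
```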
