
How to Visualize KNN in Python

Last Updated : 25 Nov, 2024

Visualizing the K-Nearest Neighbors (KNN) algorithm in Python is a great way to understand how this supervised learning method works and how it makes predictions. In essence, visualizing KNN means plotting the decision boundaries the algorithm creates for a given number of nearest neighbors (K). Visualization shows how changing the value of K affects both the decision boundaries and the model's accuracy, and it can also reveal outliers or skew in the data. Here is a straightforward guide on how to do this.

To visualize KNN in Python, you typically follow these steps:

Python
# Import necessary libraries
import matplotlib.pyplot as plt
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split
from mlxtend.plotting import plot_decision_regions

# Step 1: Generate a synthetic dataset with 2 features and 4 centers (clusters)
X, y = datasets.make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1.5, random_state=4)

# Step 2: Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Initialize the KNN classifier with 5 neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=5)

# Step 4: Train the KNN model using the training data
knn.fit(X_train, y_train)

# Step 5: Visualize the decision regions of the trained KNN model
plot_decision_regions(X, y, clf=knn, legend=2)

# Step 6: Label the axes and add a title to the plot
plt.xlabel('X')  
plt.ylabel('Y')
plt.title('KNN with K=5')

# Step 7: Save the plot as an image file with tight bounding box and high resolution (150 dpi)
plt.savefig('KNN with K=5.jpeg', bbox_inches="tight", dpi=150)

# Step 8: Display the plot
plt.show()


This code demonstrates how to implement a K-Nearest Neighbors (KNN) classifier on a synthetic dataset with 2 features and 4 clusters. It generates the dataset, splits it into training and testing sets, and trains the KNN model with 5 neighbors. The decision boundaries of the classifier are visualized using plot_decision_regions(), and the plot is saved as an image before being displayed.

Output:

[Figure: Decision regions of KNN with K=5 on the synthetic blob dataset]
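
The train/test split above is not used by the plot itself, but it lets you check how well the learned boundary generalizes. Below is a minimal sketch, assuming knn, X_test, and y_test from the snippet above are still in scope (the query point is chosen arbitrarily):

Python
# Minimal check, assuming knn, X_test and y_test from the snippet above
accuracy = knn.score(X_test, y_test)  # mean accuracy on the held-out 20%
print(f"Test accuracy with K=5: {accuracy:.3f}")

# Predict the class of a single new point (coordinates chosen arbitrarily)
print(knn.predict([[0.0, 5.0]]))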

Visualizing Unique Features in KNN in Python

  • Changing K values: One of the most instructive aspects of visualizing KNN is seeing how different values of K affect the decision boundaries. A low K value (e.g., K=1) produces a complex, jagged boundary that closely traces individual points, while a high K value (e.g., K=20) produces a simpler boundary but may over-smooth and miss genuine structure.
  • Decision boundaries: The decision boundaries are the lines or curves that separate the classes in the feature space. Visualizing them shows how the model makes predictions and where it might be overfitting or underfitting.
  • Outliers and noise: Visualizations also highlight how outliers and noisy data points distort the decision boundaries, which is key to judging the robustness of the KNN model; a sketch demonstrating this follows the comparison example below.
Python
import matplotlib.pyplot as plt
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split
from mlxtend.plotting import plot_decision_regions

# Generate synthetic data
X, y = datasets.make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1.5, random_state=4)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train KNN models with different K values
for k in [1, 5, 20]:
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    plot_decision_regions(X, y, clf=knn, legend=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title(f'KNN with K={k}')
    plt.show()


This code generates a synthetic 2D dataset with 4 clusters and applies the K-Nearest Neighbors (KNN) algorithm with different values of K (1, 5, and 20). It splits the data into training and test sets, trains the KNN model for each K, and visualizes the decision boundaries using the plot_decision_regions function. The plot is displayed for each K value, showing how the decision regions change as K varies.

Output:

[Figure: Decision regions of KNN for K=1, K=5, and K=20 on the synthetic blob dataset]
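
To see the effect of outliers concretely, the sketch below injects ten randomly placed, randomly labeled points into the same synthetic data and re-plots the boundaries for a low and a high K. This assumes mlxtend is installed; the number of injected points and their labels are arbitrary choices for illustration:

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, neighbors
from mlxtend.plotting import plot_decision_regions

# Recreate the synthetic dataset used above
X, y = datasets.make_blobs(n_samples=500, n_features=2, centers=4,
                           cluster_std=1.5, random_state=4)

# Inject 10 randomly placed points with random labels (simulated outliers/noise)
rng = np.random.default_rng(0)
X_noisy = np.vstack([X, rng.uniform(X.min(), X.max(), size=(10, 2))])
y_noisy = np.concatenate([y, rng.integers(0, 4, size=10)])

# K=1 bends its boundary around every outlier; K=20 largely ignores them
for k in [1, 20]:
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_noisy, y_noisy)
    plot_decision_regions(X_noisy, y_noisy, clf=knn, legend=2)
    plt.title(f'KNN with K={k} on noisy data')
    plt.show()

With K=1 the boundary wraps tightly around each injected point, while K=20 averages them away, which is exactly the robustness trade-off described in the list above.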

Understanding how to visualize KNN in Python is crucial for gaining insights into how the algorithm works and how to optimize it for better performance. This visualization can help in identifying the optimal K value and understanding the impact of outliers and noisy data on the model's predictions.
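
To complement visual inspection with a quantitative check, here is a short sketch using scikit-learn's cross_val_score to compare candidate K values on the same synthetic data (the candidate list is an arbitrary choice):

Python
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score

X, y = datasets.make_blobs(n_samples=500, n_features=2, centers=4,
                           cluster_std=1.5, random_state=4)

# 5-fold cross-validated accuracy for each candidate K
for k in [1, 3, 5, 10, 20]:
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"K={k:2d}: mean accuracy = {scores.mean():.3f}")

Comparing these scores against the plots helps confirm whether the K that looks right visually also generalizes best.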

