Assignment 6

Uploaded by

dattatreyasaha
Question-1: Write Python code to implement the K-Means clustering algorithm.


Implementation Code :
from sklearn.cluster import KMeans # KMeans for clustering
from sklearn import datasets # Load datasets
from sklearn.utils import shuffle # Shuffle data
import numpy as np # Numerical operations
import matplotlib.pyplot as plt # Plotting
import matplotlib.colors as colors # Color handling
# Load iris dataset
iris = datasets.load_iris() # Iris dataset
X = iris.data # Features (inputs)
y = iris.target # Target (labels)
names = iris.feature_names # Feature names
print(names) # Show feature names
# Shuffle the dataset
X, y = shuffle(X, y, random_state=42) # Shuffle data, set seed
# KMeans clustering model
model = KMeans(n_clusters=3, random_state=42) # 3 clusters, seed 42
# Fit model to data
iris_kmeans = model.fit(X) # Train KMeans
# Get cluster labels
iris_kmeans.labels_ # Output labels (cluster)

Execution and Output :


['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 1,
0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0,
0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0,
1, 2, 0, 1, 1, 0, 1, 1, 1, 1, 2, 1, 0, 1, 2, 0, 0, 1, 2, 0, 1, 0,
0, 1, 1, 2, 1, 2, 2, 1, 0, 0, 1, 2, 0, 0, 0, 1, 2, 0, 2, 2, 0, 1,
1, 1, 1, 2, 0, 2, 1, 2, 1, 1, 1, 0, 1, 1, 0, 1, 2, 2, 0, 1, 2, 2,
0, 2, 0, 2, 2, 2, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 1, 2])

Discussion:
• The Iris dataset is loaded, and its features and labels are shuffled together to randomize the sample order.
• A KMeans clustering model is created with 3 clusters and a fixed random state for
reproducibility.
• The model is trained on the shuffled data, grouping the data points into clusters.
• Cluster labels for each data point are generated, but no further analysis or
visualization is performed in this snippet.
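The snippet above delegates the clustering to scikit-learn. As a rough illustration of what the algorithm does internally, the following is a minimal NumPy sketch of the standard assign-and-update (Lloyd's) iteration. The function name `kmeans_sketch`, its parameters, and the toy data are assumptions made for this example; it uses plain random initialization and is not scikit-learn's implementation, which defaults to k-means++ initialization.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance from each point to each center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # Centers stopped moving, so the assignments are stable.
        centers = new_centers
    return labels, centers

# Toy data (made up for illustration): two well-separated groups of 2-D points.
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans_sketch(X_toy, k=2)
print(labels)   # Cluster index per point; which group gets 0 or 1 is arbitrary.
print(centers)  # One row per cluster: the mean of its assigned points.
```

The same function could be called on the Iris features with `k=3`, although the cluster numbering would not necessarily match the one produced by `KMeans`.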
Name - Sandipan Rakshit | Roll No. - CSE/22/014 | CSE1 – Advanced IT Workshop
- Lab (PC-CS 591)
Implementation Code :
y = np.choose(y, [1, 2, 0]).astype(int) # Remap target labels: 0→1, 1→2, 2→0, then convert to int
print(y) # Print the remapped target labels

Execution and Output :


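[2 1 0 2 2 1 2 0 2 2 0 1 1 1 1 2 0 2 2 0 1 0 1 0 0 0 0 0 1 1 1 1 2 1 1 0 2
 1 1 1 0 2 2 1 1 2 0 0 2 0 2 0 2 1 0 2 1 1 1 2 0 1 1 1 2 1 2 0 1 2 0 1 0 0
 2 2 0 2 1 2 0 1 1 2 2 1 0 1 1 2 2 0 2 0 0 2 1 1 0 0 1 1 1 2 0 1 0 0 1 2 2
 0 2 0 1 0 2 0 2 2 2 1 2 2 1 2 0 0 1 2 0 0 1 0 1 2 0 0 2 0 2 2 0 0 1 2 0 1
 2 0]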

Discussion:
• The code uses `np.choose()` to rearrange the class labels `y` of the Iris dataset by
mapping the original values `[0, 1, 2]` to `[1, 2, 0]`.
• This transformation changes the original class labels to a new order and converts
them to integers using `.astype(int)`.
• The result is printed, showing the new class labels after reordering.
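• K-Means numbers its clusters arbitrarily, so a remapping like this is typically used to
line up the class labels with the cluster IDs before the two are compared. The mapping
must match the numbering of the particular run: in the outputs shown here, the first
remapped labels (2, 1, 0, ...) differ from the first cluster labels (1, 0, 2, ...).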
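To make the remapping concrete, here is a small sketch on a made-up label array (`y_demo` is an assumption for illustration, not data from the assignment). It also shows an equivalent lookup-table form using NumPy integer indexing.

```python
import numpy as np

# Made-up labels for illustration only.
y_demo = np.array([0, 1, 2, 2, 0])

# np.choose uses each value of y_demo as an index into the choices list,
# so 0 becomes 1, 1 becomes 2, and 2 becomes 0.
remapped = np.choose(y_demo, [1, 2, 0])
print(remapped)  # [1 2 0 0 1]

# Equivalent form: index a lookup array with the labels.
lookup = np.array([1, 2, 0])
print(np.array_equal(lookup[y_demo], remapped))  # True
```

The lookup-array form may be easier to extend when the number of classes grows, since `np.choose` is limited in how many choices it accepts.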

Implementation Code :
from sklearn.metrics import confusion_matrix # Import confusion matrix function
# Create confusion matrix
conf_matrix = confusion_matrix(y, iris_kmeans.labels_) # Compare true labels with predicted clusters
# Plot confusion matrix
fig, ax = plt.subplots(figsize=(7.5, 7.5)) # Create figure with custom size
ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3) # Display matrix as an image with blue color
# Annotate confusion matrix cells
for i in range(conf_matrix.shape[0]): # Iterate over rows
    for j in range(conf_matrix.shape[1]): # Iterate over columns
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center',
                size='xx-large') # Add text
# Add labels and title
plt.xlabel('Predictions', fontsize=18) # X-axis label (predicted clusters)
plt.ylabel('Actuals', fontsize=18) # Y-axis label (actual labels)
plt.title('Confusion Matrix', fontsize=18) # Title of the plot
# Show the plot
plt.show() # Display the confusion matrix
# Print the cluster centers
print(iris_kmeans.cluster_centers_) # Display the coordinates of cluster centers

Execution and Output:

[[5.006      3.428      1.462      0.246     ]
 [5.9016129  2.7483871  4.39354839 1.43387097]
 [6.85       3.07368421 5.74210526 2.07105263]]

Discussion:
• The code calculates a confusion matrix using `confusion_matrix()`, comparing the
true labels `y` and predicted cluster labels from KMeans (`iris_kmeans.labels_`).
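• A confusion matrix helps evaluate clustering by showing how the predicted clusters
correspond to the actual classes. Because cluster IDs are arbitrary, a good clustering
shows one dominant count per row and column, not necessarily on the diagonal.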
• The matrix is visualized using `matshow()` with a blue color map to represent the
matrix values, and each cell is annotated with its corresponding number using
`ax.text()`.
• Labels for the x-axis (predictions) and y-axis (actuals) are added, and the matrix is
displayed with a title.
• Finally, the cluster centers of the KMeans model are printed, showing the mean
feature values for each cluster.
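Since cluster IDs carry no inherent meaning, one way to read such a matrix is to map each cluster to the class that occurs most often in it. The sketch below does this with NumPy on made-up labels. The names `contingency_table`, `y_true_demo`, and `cluster_demo` are assumptions for illustration, and majority voting is only one possible alignment strategy: it can map two clusters to the same class.

```python
import numpy as np

def contingency_table(y_true, y_pred, n_classes):
    """Count how often each (true class, cluster) pair occurs (hypothetical helper)."""
    table = np.zeros((n_classes, n_classes), dtype=int)
    # Rows are true classes, columns are cluster IDs.
    np.add.at(table, (y_true, y_pred), 1)
    return table

# Made-up labels for illustration only.
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
cluster_demo = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])

table = contingency_table(y_true_demo, cluster_demo, n_classes=3)
print(table)

# For each cluster (column), pick the true class with the highest count.
cluster_to_class = table.argmax(axis=0)
print(cluster_to_class)  # [1 2 0]

# Translate cluster IDs into class labels and measure the agreement.
aligned = cluster_to_class[cluster_demo]
agreement = (aligned == y_true_demo).mean()
print(agreement)  # 8 of the 9 points agree after alignment
```

In this toy case the raw diagonal of the table is nearly empty even though the clustering is almost perfect, which is why the alignment step matters before reading off an accuracy-like figure.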