Assignment 6
Discussion:
• The Iris dataset is loaded, and features are shuffled to randomize the data order.
• A KMeans clustering model is created with 3 clusters and a fixed random state for
reproducibility.
• The model is trained on the shuffled data, grouping the data points into clusters.
• Cluster labels for each data point are generated, but no further analysis or
visualization is performed in this snippet.
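The steps described above can be sketched roughly as follows. This is a minimal sketch, not the original snippet: the use of `sklearn.utils.shuffle`, the seed value 42, and `n_init=10` are assumptions, and only the name `iris_kmeans` is taken from the later code in this report.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.utils import shuffle

iris = load_iris()
# Shuffle features and labels together so each row keeps its own label
# (random_state=42 is an assumed seed, not taken from the report)
X, y = shuffle(iris.data, iris.target, random_state=42)

# Three clusters, fixed random state for reproducibility
# (n_init=10 is set explicitly here as an assumption)
iris_kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
iris_kmeans.fit(X)  # Group the data points into clusters

labels = iris_kmeans.labels_  # One cluster index per data point
print(labels[:10])
```

Shuffling the features and labels in a single call keeps them paired, which matters as soon as the cluster labels are compared against the true classes.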
Name - Sandipan Rakshit | Roll No. - CSE/22/014 | CSE1 – Advanced IT Workshop
- Lab (PC-CS 591)
Implementation Code:
# Remap target labels: 0→1, 1→2, 2→0, then convert to int
y = np.choose(y, [1, 2, 0]).astype(int)
print(y)  # Print the remapped target labels
Discussion:
• The code uses `np.choose()` to rearrange the class labels `y` of the Iris dataset by
mapping 0→1, 1→2, and 2→0, then converts the result to integers and prints it.
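To see what the remapping does, here is a small illustration on a made-up label array. The array `y_demo` is hypothetical and is not the Iris target vector.

```python
import numpy as np

# Hypothetical sample of class labels (not the real Iris targets)
y_demo = np.array([0, 1, 2, 2, 0, 1])

# Each label value is used as an index into the list [1, 2, 0]:
# 0 picks 1, 1 picks 2, 2 picks 0
remapped = np.choose(y_demo, [1, 2, 0]).astype(int)
print(remapped)  # [1 2 0 0 1 2]
```

Indexing a lookup array, as in `np.array([1, 2, 0])[y_demo]`, gives the same result.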
Implementation Code:
import matplotlib.pyplot as plt  # Import plotting library used below
from sklearn.metrics import confusion_matrix  # Import confusion matrix function
# Create confusion matrix: compare true labels with predicted clusters
conf_matrix = confusion_matrix(y, iris_kmeans.labels_)
# Plot confusion matrix
fig, ax = plt.subplots(figsize=(7.5, 7.5))  # Create figure with custom size
# Display matrix as an image with a light blue color map
ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3)
# Annotate confusion matrix cells
for i in range(conf_matrix.shape[0]):  # Iterate over rows
    for j in range(conf_matrix.shape[1]):  # Iterate over columns
        # Add the cell count as centered text
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center',
                size='xx-large')
# Add labels and title
plt.xlabel('Predictions', fontsize=18)  # X-axis label (predicted clusters)
plt.ylabel('Actuals', fontsize=18)  # Y-axis label (actual labels)
plt.title('Confusion Matrix', fontsize=18)  # Title of the plot
# Show the plot
plt.show()  # Display the confusion matrix
# Print the cluster centers
print(iris_kmeans.cluster_centers_)  # Display the coordinates of cluster centers
Discussion:
• The code calculates a confusion matrix using `confusion_matrix()`, comparing the
true labels `y` and predicted cluster labels from KMeans (`iris_kmeans.labels_`).
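• A confusion matrix helps evaluate clustering by counting, for each actual class
(row), how many points fall into each predicted cluster (column). Because KMeans
assigns cluster indices arbitrarily, the diagonal is meaningful only after the class
labels and cluster indices have been aligned.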
• The matrix is visualized using `matshow()` with a blue color map to represent the
matrix values, and each cell is annotated with its corresponding number using
`ax.text()`.
• Labels for the x-axis (predictions) and y-axis (actuals) are added, and the matrix is
displayed with a title.
• Finally, the cluster centers of the KMeans model are printed, showing the mean
feature values for each cluster.
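How the matrix is read can be shown on a tiny example. The two label arrays below are hypothetical values chosen for illustration, not results from the Iris run, and the agreement ratio is an added convenience that assumes the class and cluster indices are already aligned.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for six points (not taken from the Iris experiment)
y_true = np.array([0, 0, 1, 1, 2, 2])  # actual classes
y_pred = np.array([0, 0, 1, 2, 2, 2])  # cluster indices, assumed already aligned

# Rows are actual classes, columns are predicted clusters
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Fraction of points on the diagonal, i.e. cluster index equals class index
agreement = np.trace(cm) / cm.sum()
print(agreement)
```

Here one point of class 1 lands in cluster 2, so it appears off the diagonal in row 1, column 2.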