
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE ENGINEERING

Academic Year: 2024-25

Semester: II

LAB MANUAL OF

417535: Computer Laboratory IV


(2019 Course)

Class: BE

Subject In Charge: Prof. Bhagyashree Patil


PART II (417532): ELECTIVE VI
417531(B): Big Data Analytics
S.R. No. | Name of the Experiment | Date of Conduction | Date of Checking | Pg. No. | Sign

1  Develop a MapReduce program to calculate the frequency of a given word in a given file.
2  Implement Matrix Multiplication using Map-Reduce.
3  Develop a MapReduce program to find the grades of students.
4  MongoDB: Installation and creation of database and collection; CRUD documents: Insert, Query, Update and Delete documents.
5  Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and how many persons survived in each class.

417532(B): Business Intelligence

1  Import data from different sources such as Excel, SQL Server, Oracle, etc., and load it into the targeted system.
2  Data Visualization from the Extraction, Transformation and Loading (ETL) process.
3  Perform the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server / Power BI.
4  Data Analysis and Visualization using Advanced Excel.
5  Perform data clustering using any clustering algorithm.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
ENGINEERING

Academic Year: 2024-25

CERTIFICATE
This is to certify that Mr./Miss. _______________ of Class B.E. AI-DS, Roll No. ___________, Exam Seat No. ____________, has satisfactorily completed the practical work of the subject “Computer Laboratory IV - 417535” for the IInd semester of Academic Year 2024 – 2025.

Date:

Prof. Bhagyashree Patil          Prof. (Dr.) R. V. Babar          Prof. (Dr.) J. B. Sankpal

Subject In Charge Head of Department Principal


EXP NO. 1
Develop a MapReduce program to calculate the frequency of a given word in a
given file.
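
A minimal Python sketch of the map and reduce logic for this experiment is given below; the shuffle phase is simulated with a dictionary, and the input file name and target word are placeholder assumptions (on a cluster, the same mapper/reducer pair would run under Hadoop Streaming).

from collections import defaultdict

TARGET_WORD = "hadoop"      # word whose frequency we want (assumed)
INPUT_FILE = "input.txt"    # input file path (assumed)

def mapper(line):
    # Emit (word, 1) for every occurrence of the target word in this line
    for token in line.strip().lower().split():
        if token == TARGET_WORD:
            yield (token, 1)

def reducer(word, counts):
    # Sum all the 1s emitted for the same word
    return (word, sum(counts))

if __name__ == "__main__":
    intermediate = defaultdict(list)            # stands in for the shuffle/sort phase
    with open(INPUT_FILE) as f:
        for line in f:                          # map phase
            for key, value in mapper(line):
                intermediate[key].append(value)
    for word, counts in intermediate.items():   # reduce phase
        print(reducer(word, counts))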
EXP NO. 2
Implement Matrix Multiplication using Map-Reduce
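
A minimal local sketch of one common Map-Reduce formulation of matrix multiplication (C = A x B): the mapper replicates each element under a join key (i, k), and the reducer multiplies and sums the matching entries. The sample matrices and dimensions below are assumptions for illustration.

from collections import defaultdict

# Sparse (row, col) -> value representation; dimensions are assumed known
A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}   # sample 2x2 matrix A (assumed)
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}   # sample 2x2 matrix B (assumed)
M, N, P = 2, 2, 2                                  # A is MxN, B is NxP

def mapper():
    # Each A[i][j] is needed for every column k of C; each B[j][k] for every row i
    for (i, j), a in A.items():
        for k in range(P):
            yield ((i, k), ("A", j, a))
    for (j, k), b in B.items():
        for i in range(M):
            yield ((i, k), ("B", j, b))

def reducer(key, values):
    # Join A and B entries on the inner index j, multiply, and sum
    a_vals = {j: v for tag, j, v in values if tag == "A"}
    b_vals = {j: v for tag, j, v in values if tag == "B"}
    return key, sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)

if __name__ == "__main__":
    grouped = defaultdict(list)                 # stands in for the shuffle/sort phase
    for key, value in mapper():
        grouped[key].append(value)
    for key in sorted(grouped):
        print(reducer(key, grouped[key]))       # prints ((i, k), C[i][k])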
EXP NO 3
Develop a MapReduce program to find the grades of students.
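
A minimal local sketch of a Map-Reduce flow for grading: the mapper emits each student's average mark, and the reducer converts it to a letter grade. The input record format and the grade thresholds below are assumptions.

from collections import defaultdict

RECORDS = [
    "Asha 85 90 78",     # student name followed by marks (assumed format)
    "Ravi 60 55 70",
    "Meena 40 35 45",
]

def mapper(record):
    # Emit (student, average mark) for one input line
    parts = record.split()
    name, marks = parts[0], [int(m) for m in parts[1:]]
    yield (name, sum(marks) / len(marks))

def reducer(name, averages):
    # Convert the average mark into a letter grade (assumed thresholds)
    avg = sum(averages) / len(averages)
    if avg >= 80:
        grade = "A"
    elif avg >= 60:
        grade = "B"
    elif avg >= 40:
        grade = "C"
    else:
        grade = "F"
    return (name, grade)

if __name__ == "__main__":
    grouped = defaultdict(list)             # stands in for the shuffle/sort phase
    for rec in RECORDS:
        for key, value in mapper(rec):
            grouped[key].append(value)
    for name, avgs in grouped.items():
        print(reducer(name, avgs))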
EXP NO 4
MongoDB: Installation and creation of database and collection; CRUD documents: Insert, Query, Update and Delete documents.

1. Introduction to MongoDB
2. Installation & Database Creation
3. CRUD Operations
4. Embedded Documents and Arrays

CRUD (Create, Read, Update, Delete) operations are the fundamental building blocks for interacting with a MongoDB database. MongoDB provides various methods to insert, query, update, and delete documents.

Operation   Method                                      Description

Create      insertOne(data, options)                    Inserts a single document into a collection.
            insertMany(data, options)                   Inserts multiple documents into a collection.

Read        find(filter, options)                       Retrieves all documents matching a filter.
            findOne(filter, options)                    Retrieves the first document matching a filter.

Update      updateOne(filter, data, options)            Updates a single document that matches the filter.
            updateMany(filter, data, options)           Updates multiple documents that match the filter.
            replaceOne(filter, replacement, options)    Replaces a single document entirely.

Delete      deleteOne(filter, options)                  Deletes a single document matching the filter.
            deleteMany(filter, options)                 Deletes multiple documents that match the filter.

1. CREATE Operations in MongoDB


To add documents to a collection, MongoDB offers the following methods:
db.collection.insertOne()

db.collection.insertMany()

Insert operations in MongoDB always target a single collection at a time, ensuring data
integrity and structure.

1. Practical Example: Inserting a Single Document

This method inserts a single object into the database.


db.passengers.insertOne({"name": "Jennifer", "age": 21, "seat": 34})

2. Practical Example: Inserting Multiple Documents

This method inserts an array of objects into the database.


db.passengers.insertMany([
  { "name": "Michael", "age": 30, "seat": 12 },
  { "name": "Sarah", "age": 25, "seat": 18 },
  { "name": "David", "age": 40, "seat": 22 },
  { "name": "Emily", "age": 28, "seat": 7 },
  { "name": "James", "age": 35, "seat": 15 }
])

II. Find Data (Retrieving) Operations in MongoDB

In MongoDB, we can retrieve documents from a collection using two methods: find() and findOne().

3. Fetching Multiple Documents with find()

The find() method is used to retrieve multiple documents that match a given condition. If no condition is specified, it will return all documents in the collection.

Example: Retrieve all passengers

db.passengers.find()
4. Fetching a Single Document with findOne()

The findOne() method returns the first document that matches the given condition (or the first document in the collection if no condition is given).

Example: Retrieve the passenger named "Jennifer"

db.passengers.findOne({ name: "Jennifer" })

5. Projections: Choosing Which Fields to Return

When using the find() method, you can specify which fields to include or exclude in the results using a projection object.

1 means include the field.
0 means exclude the field.

Example 1: Include only the name and seat fields

db.passengers.find({}, { name: 1, seat: 1 })

Example 2: Exclude the _id field while including name and seat

db.passengers.find({}, { _id: 0, name: 1, seat: 1 })

III. Updating Documents in MongoDB

To update existing documents, MongoDB provides the updateOne() and updateMany() methods.

The first parameter is a query object that defines which document(s) should be updated.
The second parameter is an update object that specifies the new data.

1. updateOne()
The updateOne() method updates only the first document that matches the given query.

Example: Add a new field destination to Jennifer’s document

1. Find the document first (optional):

db.passengers.find({ name: "Jennifer" })

2. Update the document using updateOne():

db.passengers.updateOne(
  { name: "Jennifer" },
  { $set: { destination: "New York" } }
)

After updating the document, if you query the passengers collection to find
Jennifer’s document again, it will include the newly added field destination.
2. updateMany()
If you want to update multiple documents at once, you can use updateMany().
It will update all documents that match the query.

Example: Update the seat numbers


For example, if all passengers with seat: 34 should now be assigned seat: 30, you can do
this:

1. Find the documents first (optional):

db.passengers.find({ seat: 34 })

2. Update the documents using updateMany():

db.passengers.updateMany(
  { seat: 34 },
  { $set: { seat: 30 } }
)

3. Check the updated passengers:

db.passengers.find({ seat: 30 })

3. replaceOne() – Replace a Document

The replaceOne() method replaces the entire document that matches the query with a new document. This is different from updateOne(), where you only update specific fields in the document.

Example:
Let’s say you want to replace the document for the passenger named
“Jennifer” with a completely new document. Here’s how you would do it:
db.passengers.replaceOne(
{ name: "Jennifer" },
{ name: "Jennifer", age: 22, seat: 45, destination: "New York" }
)
The first parameter ({ name: "Jennifer" }) specifies the query to find the document.
The second parameter ({ name: "Jennifer", age: 22, seat: 45, destination:
"New York" }) is the new document that will replace the existing one.
- replaceOne() will completely replace the existing document with the new one.
- Any fields you do not include in the new document will be removed from the document.
- Unlike updateOne(), it does not modify only specific fields but replaces the whole document.

IV. Delete Documents in MongoDB

To remove documents from a MongoDB collection, you can use the following methods:
- deleteOne(): Deletes a single document that matches the query.
- deleteMany(): Deletes all documents that match the query.

deleteOne() – Remove One Document


The deleteOne() method deletes the first document that matches the query.

Example: To delete a passenger named “Jennifer” from the flights.passengers collection, you can use:

db.passengers.deleteOne({ name: "Jennifer" })

This will remove only the first document that matches the name “Jennifer”.

deleteMany() – Remove Multiple Documents


Example: To delete all passengers whose destination is “New York” from the flights.passengers collection:

db.passengers.deleteMany({ destination: "New York" })

Once you’re connected to your MongoDB flights database in DbSchema, you’ll be able to view the collections created during the previous lessons. In this case, we’re working with the passengers collection, which already has data and fields defined.

1. Access the Data Editor

- First, click on the passengers table in the DbSchema interface.

2. Visualizing the Data

In the Data Editor, you can see your data as it is stored. This view makes it easy to work directly with your documents.

3. Adding a New Field (Like the CREATE operation in MongoShell)

To add a new field to a document, simply click the “+” button. Enter the data you want to add in the new field, and then click the Save icon to save your changes.
(In MongoShell, this would be similar to using the insertOne() or insertMany() methods to insert a new document with additional fields into the collection.)

4. Filtering Data (Like the FIND operation in MongoShell)

If you want to view only specific data, right-click on the Data Editor grid and select “Filter”. This allows you to set filters for certain fields, making it easy to narrow down the results and find the exact data you’re looking for.
(In MongoShell, this would be like using the find() or findOne() methods with query parameters to filter the data.)

5. Updating Data (Like the UPDATE operation in MongoShell)


You can modify any field’s data directly in the grid. For example, you can
update the name, age, destination, and other fields visually.

6. Modifying the Collection (DDL - Like ALTER operations in MongoShell)

If you need to modify the collection itself (e.g., add, update, or delete fields in the structure), you can do this directly from the diagram view in DbSchema. This allows you to visually change the schema of your MongoDB collection.
(In MongoShell, this would be equivalent to using commands like db.collection.update() for modifying documents or db.createCollection() for creating new collections.)

7. Deleting Data (Like the DELETE operation in MongoShell)

To delete an entry (or data) from the collection, click the bin icon or right-click on a record and choose “Delete Record”. This will remove the document from the collection.
(In MongoShell, this would be similar to using the deleteOne() or deleteMany() methods to delete documents from the collection.)
EXP NO. 5
Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and how many persons survived in each class.
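
A minimal local sketch of this analysis as a map/reduce flow. The CSV column names (Survived, Pclass, Sex, Age) follow the commonly used Kaggle Titanic layout, which is an assumption about the dataset supplied in the lab; the file name is a placeholder.

import csv
from collections import defaultdict

INPUT_FILE = "titanic.csv"   # assumed file name

def mapper(row):
    # Emit ('died-age', sex) -> age for casualties with a known age,
    # and ('survived', pclass) -> 1 for survivors
    if row["Survived"] == "0" and row["Age"]:
        yield (("died-age", row["Sex"]), float(row["Age"]))
    elif row["Survived"] == "1":
        yield (("survived", row["Pclass"]), 1)

def reducer(key, values):
    if key[0] == "died-age":
        return (key, sum(values) / len(values))   # average age of the deceased, per gender
    return (key, sum(values))                     # number of survivors in this class

if __name__ == "__main__":
    grouped = defaultdict(list)                   # stands in for the shuffle/sort phase
    with open(INPUT_FILE, newline="") as f:
        for row in csv.DictReader(f):             # map phase
            for key, value in mapper(row):
                grouped[key].append(value)
    for key in sorted(grouped):                   # reduce phase
        print(reducer(key, grouped[key]))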
417532(B): Business Intelligence

EXPERIMENT NO. 1

➢ Aim: Import data from different sources such as Excel, SQL Server, Oracle, etc., and load it into the targeted system.

➢ Outcome: Data from diverse sources (Excel files, SQL Server and Oracle databases, etc.) is imported and loaded into the target system accurately, ensuring timely availability of consolidated data for analysis and decision-making.

➢ Hardware Requirement: Hardware requirements for the data import process typically include a robust computer or server with sufficient processing power (CPU), memory (RAM), and storage space. Additionally, a graphics card may be beneficial for rendering complex visualizations quickly.

➢ Software Requirement: Ubuntu OS, Python editor (Python interpreter), and supporting Python libraries (e.g., pandas, SQLAlchemy, pyodbc).

➢ Objective: The objective of this practical is to import data from diverse sources such as Excel spreadsheets, SQL Server databases, Oracle databases, etc., and load this data into a targeted system for further analysis or processing.

➢ Theory:

Background:
In real-world scenarios, organizations often deal with data stored in various formats
and locations. These can include structured data in databases like SQL Server and
Oracle, as well as semi-structured or unstructured data in files like Excel
spreadsheets. Importing this data into a centralized system for analysis, reporting,
or other purposes is a common requirement.

Importance of Data Import:


Efficient data import processes are crucial for maintaining data integrity, reducing
manual effort, and ensuring timely availability of updated information for decision-
making. Automated data import procedures also enhance productivity and minimize
errors compared to manual data entry.

Tools and Technologies:


Source Data Formats:

Excel: Tabular data with sheets, rows, and columns.
SQL Server: Structured data in relational databases.
Oracle: Similar to SQL Server, storing structured data.

Target System:

Could be a database management system (DBMS) like MySQL, PostgreSQL, or


another SQL-based system.

Alternatively, it could be a data warehouse, data lake, or analytics platform.


Integration Tools:
ETL (Extract, Transform, Load) tools like Apache NiFi, Talend, or Informatica.
Database management tools with import/export capabilities. Programming
languages like Python with libraries such as pandas, SQLAlchemy, or pyodbc.

Network Connectivity: Required for accessing remote data sources such as SQL
Server or Oracle databases.
Procedure Overview:

I. Data Source Identification: Identify the data sources from which data needs to be imported. This could include Excel files, SQL Server databases, Oracle databases, or other sources.
II. Data Extraction: Extract data from the identified sources using appropriate methods. For example:
   - Excel: Read data using libraries like pandas in Python or built-in Excel functions.
   - SQL Server/Oracle: Use SQL queries to extract data based on defined criteria.
III. Data Transformation (if required): Perform any necessary data transformations such as data cleansing, formatting, or aggregation to prepare the data for import into the target system.
IV. Data Loading: Load the transformed data into the targeted system. This could involve using SQL INSERT statements, bulk import utilities, or ETL tools depending on the target system and data volume (a minimal Python sketch follows this procedure).
V. Data Validation: Validate the imported data in the target system to ensure accuracy and completeness.
VI. Error Handling: Implement error handling mechanisms to address any issues encountered during data import, such as data format mismatches or connectivity problems.
VII. Logging and Reporting: Maintain logs of the import process for auditing purposes and generate reports on import status, errors, and data quality metrics.
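
A minimal sketch of the import flow using pandas and SQLAlchemy (both mentioned under Integration Tools). The file name, connection strings, and table names are placeholders; the SQLite target is used here only so the sketch is self-contained and can be swapped for any SQLAlchemy URL (MySQL, PostgreSQL, SQL Server, etc.).

import pandas as pd
from sqlalchemy import create_engine

# 1. Extract: read an Excel sheet and a table from a source SQL database
excel_df = pd.read_excel("sales.xlsx", sheet_name="Sheet1")              # assumed file
source_engine = create_engine("mssql+pyodbc://user:password@SourceDSN")  # assumed SQL Server DSN
customers_df = pd.read_sql("SELECT * FROM customers", source_engine)     # assumed table

# 2. Transform: basic cleansing before loading
excel_df = excel_df.dropna().rename(columns=str.lower)
customers_df = customers_df.drop_duplicates()

# 3. Load: write both frames into the target system
#    (SQLite here for a self-contained run; any SQLAlchemy URL works)
target_engine = create_engine("sqlite:///target.db")
excel_df.to_sql("sales", target_engine, if_exists="replace", index=False)
customers_df.to_sql("customers", target_engine, if_exists="replace", index=False)

# 4. Validate: row counts in the target should match the extracted frames
print(pd.read_sql("SELECT COUNT(*) AS n FROM sales", target_engine))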

Output:

EXPERIMENT NO. 2

➢ Aim: Data Visualization from the Extraction, Transformation and Loading (ETL) process.

➢ Outcome: Effective data visualizations derived from the ETL process provide clear
insights, facilitating informed decision-making and enhancing understanding of the data.

➢ Hardware Requirement: Hardware requirements for data visualization from the ETL
process typically include a robust computer or server with sufficient processing power
(CPU), memory (RAM), and storage space. Additionally, a graphics card may be
beneficial for rendering complex visualizations quickly.

➢ Software Requirement: Ubuntu OS, Python editor (Python interpreter), and supporting Python libraries (pandas, Matplotlib, scikit-learn).

➢ Theory:

1. ETL Process: The ETL process involves three main steps:

○ Extraction: Retrieving data from various sources such as databases, files,


APIs, etc.
○ Transformation: Cleaning, structuring, and enriching the extracted data
to make it suitable for analysis and visualization.
○ Loading: Storing the transformed data in a database, data warehouse, or
other storage systems for further analysis or reporting.

2. Data Visualization: Data visualization is the graphical representation of data


to communicate insights and facilitate understanding. It helps identify trends,
patterns, and relationships within the data. Effective visualizations use charts,
graphs, maps, and other graphical elements to convey information clearly and
efficiently.

3. Purpose of Data Visualization: Data visualization serves several purposes:

● Exploratory Analysis: Exploring data to discover trends, anomalies, and


patterns.
● Presentation: Communicating insights and findings to stakeholders in a
clear and compelling manner.
● Decision Making: Supporting decision-making processes by providing
actionable insights derived from data analysis.

4. Visualization Tools: There are various tools available for creating data
visualizations, ranging from standalone software like Tableau and Power BI to
libraries and frameworks in programming languages such as Python (Matplotlib,
Seaborn, Plotly) and R (ggplot2). These tools offer different features, capabilities,
and levels of customization to suit different needs and preferences.

5. Best Practices for Data Visualization:

● Understand the Audience: Tailor visualizations to the needs and preferences of


the intended audience.
● Choose Appropriate Visualizations: Select visualization types that effectively
represent the underlying data and insights.
● Simplify and Clarify: Avoid clutter and unnecessary complexity to enhance
clarity and readability.
● Use Color and Formatting Wisely: Use color, size, labels, and annotations
strategically to highlight key points and aid interpretation.
● Provide Context: Provide context and explanations to help viewers understand the significance of the data and visualizations.

➢ PROGRAM :

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris # To load the Iris dataset

# Data Extraction (using sklearn.datasets)


iris = load_iris() # Load the Iris flower dataset

# Data Transformation (Optional)


# The Iris data is pre-processed and ready for use

# Data Visualization (Multiple Examples)

# Example 1: Sepal Length vs. Sepal Width colored by Species (Scatter Plot)
sepal_length = iris.data[:, 0]
sepal_width = iris.data[:, 1]
target_names = iris.target_names # Get species names

scatter = plt.scatter(sepal_length, sepal_width, c=iris.target, cmap='plasma')  # Color by target species
plt.xlabel('Sepal length (cm)')
plt.ylabel('Sepal width (cm)')
plt.title('Iris Flower Dataset - Sepal Dimensions by Species')
plt.legend(handles=scatter.legend_elements()[0], labels=list(target_names))  # One legend entry per species
plt.show()

# Example 2: Distribution of Petal Lengths across Species (Box Plot)

petal_length = iris.data[:, 2]  # Petal length feature
plt.boxplot([petal_length[iris.target == 0], petal_length[iris.target == 1],
             petal_length[iris.target == 2]], notch=True, vert=False, patch_artist=True,
            labels=target_names)  # Separate boxes by species
plt.xlabel('Petal Length (cm)')
plt.ylabel('Species')
plt.title('Distribution of Petal Length by Iris Species')
plt.show()

➢ OUTPUT :

EXPERIMENT NO. 3

➢ Aim: Perform the ELT process to construct the database in SQL Server / Power BI.

➢ Outcome: Effective data visualizations derived from the ETL process provide clear
insights, facilitating informed decision-making and enhancing understanding of the
data.

➢ Hardware Requirement: Refer to the hardware recommendations provided in the previous experiments for setting up SQL Server and Power BI environments.

➢ Software Requirement: SQL Server Management Studio (SSMS), Power BI Desktop

➢ Theory:

Step 1: Extraction

1)Identify the data sources from which you will extract data. These could be
relational databases, flat files (CSV, Excel), APIs, etc.

2)Use SQL Server Integration Services (SSIS) or any other ETL tool to extract data
from the sources and load it into a staging area.

3)Ensure that the extracted data is structured and ready for loading into the SQL
Server database.

Step 2: Loading

1)Open SQL Server Management Studio (SSMS) and connect to your SQL Server
instance.

2)Create a new database where you will load the extracted data. You can use the
following SQL script:

CREATE DATABASE YourDatabaseName;

3)Design the schema for your database, including tables, columns, data types, and
relationships based on the extracted data.

4)Use SQL scripts or SSIS packages to load the data from the staging area into the
database tables.

5)Monitor the loading process and ensure data integrity.

Step 3: Transformation

1)Once the data is loaded into the database, perform any necessary transformations
to prepare it for analysis and reporting.

2)Use SQL queries to clean, filter, aggregate, and join data as per your requirements.

3)Create views, stored procedures, or user-defined functions (UDFs) to encapsulate complex transformation logic and make it reusable (see the sketch after this step).

4)Validate the transformed data to ensure accuracy and consistency.
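
As referenced in step 3, a minimal Python sketch of running such a transformation against SQL Server with pyodbc (listed earlier among the integration tools). The connection string, staging table, and view names are placeholders, and CREATE OR ALTER VIEW assumes SQL Server 2016 SP1 or later.

import pyodbc

# Placeholder connection string; adjust driver, server, and database as needed
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=YourDatabaseName;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Encapsulate cleaning/aggregation logic in a reusable view (assumed staging table)
cursor.execute("""
    CREATE OR ALTER VIEW dbo.vw_CleanSales AS
    SELECT CustomerID,
           SUM(Amount) AS TotalAmount
    FROM   dbo.StagingSales
    WHERE  Amount IS NOT NULL
    GROUP  BY CustomerID
""")
conn.commit()

# Validate the transformed data with a simple row count
cursor.execute("SELECT COUNT(*) FROM dbo.vw_CleanSales")
print(cursor.fetchone()[0])
conn.close()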

Step 4: Utilization in Power BI

1)Open Power BI Desktop and connect to your SQL Server database as a data source.

2)Import tables or views from the database into Power BI.

3)Design your data model in Power BI by creating relationships between tables,


adding calculated columns, measures, and hierarchies.

4)Build interactive reports and dashboards using Power BI visuals such as charts,
tables, maps, etc.

5)Enhance your reports with additional features like slicers, filters, and drill-down
capabilities.

6)Publish your Power BI report to the Power BI service for sharing and
collaboration with others.

OUTPUT :

EXPERIMENT NO. 4
➢ Aim: Perform data classification using any classification algorithm.

➢ Outcome:
The objective of this lab session is to perform data classification using the K- Nearest
Neighbors (KNN) algorithm. By the end of this lab, you should be able to:
• Understand the KNN algorithm and its working principle.
• Implement KNN classification using Python and scikit-learn.
• Evaluate the performance of the KNN classifier.
• Interpret the outcomes and draw conclusions.

➢ Hardware Requirement:
• Personal computer or laptop with a modern processor (e.g., Intel Core i3 or
higher).
• Sufficient RAM for running Python and the required libraries.

➢ Software Requirement:
• Python (3.0 or later)
• Jupyter Notebook (optional but recommended)
• Libraries: NumPy, pandas, scikit-learn

➢ Theory:

K-Nearest Neighbors (KNN) is a simple yet powerful classification algorithm that


classifies data points based on the majority class of their nearest neighbors in the
feature space. The key steps involved in the KNN algorithm are as follows:

• Training: The algorithm stores all the available data points and their
corresponding class labels.
• Prediction: For a new data point, the algorithm calculates the distances to all
training data points and selects the K nearest neighbors.
• Majority Voting: It then assigns the class label to the new data point based on the
majority class among its K nearest neighbors.

➢ PROGRAM :

import numpy as np # linear algebra


import matplotlib.pyplot as plt # For plotting
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Importing the dataset


dataset = pd.read_csv("/content/Social_Network_Ads.csv")
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

dataset.info()

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)


classifier.fit(X_train, y_train)

# Predicting the Test set results


y_pred = classifier.predict(X_test)

# Making the Confusion Matrix


from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy # Error rate is 1 - Accuracy
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Print the results


print("Confusion Matrix:")
print(cm)
print("Accuracy:", accuracy)
print("Error Rate:", error_rate)
print("Precision:", precision)
print("Recall:", recall)

import seaborn as sns

def plot_confusion_matrix(cm):
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

plot_confusion_matrix(cm)

y_test

y_pred

OUTPUT :

EXPERIMENT NO. 5

➢ Aim: Perform data clustering using any clustering algorithm.

➢ Introduction:
Clustering is an unsupervised learning technique used to group data points or objects based on their similarities. K-Means is one of the most popular clustering algorithms, widely used for partitioning data into clusters. It aims to minimize the sum of squared distances between data points and their corresponding cluster centroids.
➢ Objective:
In this lab session, you will understand the working principle of the K-Means clustering algorithm, implement the K-Means algorithm using Python and a relevant library, apply K-Means clustering to a sample dataset, and analyze and interpret the results.

➢ Software Requirement:
Python environment (Jupyter Notebook recommended); required libraries: NumPy, Pandas, Matplotlib, and Scikit-learn; a sample dataset (can be generated or obtained from any reliable source).

➢ Theory:

Step 1: Understanding K-Means Algorithm:

Initialization: Randomly select K centroids to represent initial cluster centers.

Assignment: Assign each data point to the nearest centroid, forming K clusters.

Update: Recalculate the centroids based on the mean of data points in each cluster.

Repeat: Repeat steps 2 and 3 until convergence (i.e., centroids do not change
significantly).

Step 2: Implementation of K-Means Algorithm:

Import necessary libraries: NumPy, Pandas, Matplotlib, and Scikit-learn.

Load or generate a sample dataset.

Implement the K-Means algorithm using Scikit-learn's KMeans class.

Fit the model to the dataset.

Retrieve cluster assignments and centroids.

Step 3: Applying K-Means Clustering:

Choose the number of clusters (K).

Perform K-Means clustering on the dataset.

Visualize the clustering results using scatter plots.

Step 4: Analysis and Interpretation:

Evaluate the quality of clustering using relevant metrics (e.g., silhouette score).

Analyze the distribution of data points in each cluster.

Interpret the results and discuss any insights gained.

Expected Outcome:

Upon completing this experiment, you should be able to successfully implement


the K-Means clustering algorithm, apply it to a dataset, and interpret the
clustering results.

Safety Precautions:

Ensure the dataset used does not contain sensitive or confidential information.

Handle the Python environment with care, following standard coding practices.

Save your work periodically to prevent data loss.

➢ PROGRAM :

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the Iris dataset


iris = load_iris()
data = iris.data

# Standardize the data (scaling)


scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Initialize an empty list to store the within-cluster sum of squares (WCSS) values
wcss = []

# Determine the number of clusters using the elbow method


for i in range(1, 11):
kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=30)
kmeans.fit(data_scaled)
wcss.append(kmeans.inertia_)

# Plot the elbow graph


plt.figure(figsize=(8, 6))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method for Optimal k')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS (Within-Cluster Sum of Squares)')
plt.grid()
plt.show()
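
The listing above stops at the elbow plot. A minimal continuation sketch, assuming k = 3 is chosen from the elbow for the Iris data, fits the final model, visualizes the clusters, and reports the silhouette score mentioned in Step 4 (it reuses data_scaled, KMeans, and plt from the listing above):

from sklearn.metrics import silhouette_score

# Fit K-Means with the chosen number of clusters (k = 3 assumed from the elbow plot)
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, random_state=42)
labels = kmeans.fit_predict(data_scaled)

# Scatter plot of the first two (scaled) features, coloured by cluster assignment
plt.figure(figsize=(8, 6))
plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.xlabel('Sepal length (scaled)')
plt.ylabel('Sepal width (scaled)')
plt.title('K-Means Clusters (k=3) on the Iris Dataset')
plt.legend()
plt.show()

# Evaluate clustering quality with the silhouette score
print("Silhouette score:", silhouette_score(data_scaled, labels))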

OUTPUT :
