CL IV Lab Manual
Semester II
LAB MANUAL OF COMPUTER LABORATORY IV (417535)
Class: BE
CERTIFICATE
This is to certify that Mr./Miss. ____________________ of Class
B.E. AI-DS, Roll No. ___________, Exam Seat No. ____________, has satisfactorily
completed the practical work of the subject “Computer Laboratory IV (417535)” for the second
semester of the Academic Year 2024–2025.
Date:
1. Introduction to MongoDB
2. Installation & Database Creation
3. CRUD Operations (You are here).
4. Embedded Documents and Arrays
CRUD (Create, Read, Update, Delete) operations are the fundamental building blocks for
interacting with a MongoDB database. MongoDB provides various methods to insert, query,
update, and delete documents.
- updateMany(filter, data, options): Updates multiple documents that match the filter.
- replaceOne(filter, replacement, options): Replaces a single document entirely.
db.collection.insertMany()
Insert operations in MongoDB always target a single collection at a time, ensuring data
integrity and structure.
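For example, a minimal sketch of insertMany() (the collection and field names follow the
passengers examples later in this section; the values are illustrative):
db.passengers.insertMany([
  { name: "Jennifer", age: 21, seat: 12 },
  { name: "Rahul", age: 35, seat: 7 }
])
insertMany() takes an array of documents and inserts them in one call; insertOne() works the
same way for a single document.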
In MongoDB, we can retrieve documents from a collection using two methods: find()
and findOne().
db.passengers.find()
4. Fetching a Single Document with findOne()
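A minimal sketch (using the same passengers collection as above):
db.passengers.findOne({ name: "Jennifer" })
findOne() returns the first matching document (or null if nothing matches), whereas find()
returns a cursor over all matching documents.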
In MongoDB, when using the find() method, you can specify which fields to
include or exclude in the results using a projection object.
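For instance, a minimal sketch that returns only the name and age fields (1 includes a field,
0 excludes it; _id is returned unless excluded explicitly):
db.passengers.find({}, { name: 1, age: 1, _id: 0 })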
1. updateOne()
The updateOne() method updates only the first document that matches the given query.
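The original command is not reproduced here; a minimal sketch of such an update, adding a
destination field with $set (the value is illustrative):
db.passengers.updateOne(
  { name: "Jennifer" },
  { $set: { destination: "New York" } }
)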
After updating the document, if you query the passengers collection to find
Jennifer’s document again, it will include the newly added field destination.
2. updateMany()
If you want to update multiple documents at once, you can use updateMany().
It will update all documents that match the query.
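A minimal sketch (the age filter and the checkedIn field are hypothetical, purely for
illustration):
db.passengers.updateMany(
  { age: { $gt: 18 } },
  { $set: { checkedIn: false } }
)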
Example:
Let’s say you want to replace the document for the passenger named
“Jennifer” with a completely new document. Here’s how you would do it:
db.passengers.replaceOne(
{ name: "Jennifer" },
{ name: "Jennifer", age: 22, seat: 45, destination: "New York" }
)
The first parameter ({ name: "Jennifer" }) specifies the query to find the document.
The second parameter ({ name: "Jennifer", age: 22, seat: 45, destination:
"New York" }) is the new document that will replace the existing one.
- replaceOne() will completely replace the existing document with the new one.
- Any fields you do not include in the new document will be removed from it.
- Unlike updateOne(), it does not modify only specific fields but replaces the whole
document.
To remove documents from a MongoDB collection, you can use the following methods:
- deleteOne(): Deletes a single document that matches the query.
- deleteMany(): Deletes all documents that match the query.
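A minimal sketch with deleteOne() (passing the same filter to deleteMany() would remove
every matching document, and an empty filter {} would remove them all):
db.passengers.deleteOne({ name: "Jennifer" })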
This will remove only the first document that matches the name “Jennifer”.
If you want to view only specific data, right-click on the Data Editor grid and
select “Filter”.
This allows you to set filters for certain fields, making it easy to narrow
down the results and find the exact data you’re looking for.
(In the Mongo shell, this would be like using the find() or findOne()
methods with query parameters to filter the data.)
If you need to modify the collection itself (e.g., add, update, or delete
fields in the structure), you can do this directly from the diagram
view in DbSchema. This allows you to visually change the schema of
your MongoDB collection.
(In the Mongo shell, this would be equivalent to using commands like
db.collection.updateOne() for modifying documents or
db.createCollection() for creating new collections.)
EXPERIMENT NO. 1
➢ Aim : Import data from different sources such as Excel, SQL Server, Oracle, etc., and
load it into a targeted system.
➢ Outcome: Effective data visualizations derived from the ETL process provide clear
insights,
facilitating informed decision-making and enhancing understanding of the data.
➢ Objective: The objective of this practical is to import data from diverse sources such as
Excel spreadsheets, SQL Server databases, Oracle databases, etc., and load this data into
a targeted system for further analysis or processing.
➢ Theory:
Background:
In real-world scenarios, organizations often deal with data stored in various formats
and locations. These can include structured data in databases like SQL Server and
Oracle, as well as semi-structured or unstructured data in files like Excel
spreadsheets. Importing this data into a centralized system for analysis, reporting,
or other purposes is a common requirement.
Data Sources:
Excel: Tabular data with sheets, rows, and columns.
SQL Server: Structured data in relational databases.
Target System:
Network Connectivity: Required for accessing remote data sources such as SQL
Server or Oracle databases.
Procedure Overview:
I. Data Source Identification: Identify the data sources from which data needs
to be imported. This could include Excel files, SQL Server databases, Oracle
databases, or other sources.
II. Data Extraction: Extract data from the identified sources using appropriate
methods. For example:
- Excel: Read data using libraries like pandas in Python or built-in Excel
functions.
- SQL Server/Oracle: Use SQL queries to extract data based on defined criteria.
III. Data Transformation (if required): Perform any necessary data
transformations such as data cleansing, formatting, or aggregation to prepare
the data for import into the target system.
IV. Data Loading: Load the transformed data into the targeted system. This could
involve using SQL INSERT statements, bulk import utilities, or ETL tools
depending on the target system and data volume (a short Python sketch of
steps II–IV follows this list).
V. Data Validation: Validate the imported data in the target system to ensure
accuracy and completeness.
VI. Error Handling: Implement error handling mechanisms to address any issues
encountered during data import, such as data format mismatches or
connectivity problems.
VII. Logging and Reporting: Maintain logs of the import process for auditing
purposes and generate reports on import status, errors, and data quality
metrics.
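A minimal Python sketch of steps II–IV above (the Excel file name, target table name, and
SQL Server connection string are placeholders, and the SQLAlchemy/pyodbc driver is assumed
to be installed; adapt them to your environment):
import pandas as pd
from sqlalchemy import create_engine

# Extract: read tabular data from an Excel workbook (hypothetical file and sheet names)
df = pd.read_excel("source_data.xlsx", sheet_name="Sheet1")

# Transform: simple cleansing - drop fully empty rows and normalise the column names
df = df.dropna(how="all")
df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]

# Load: write the frame into a target SQL Server table (placeholder connection string)
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/TargetDB?driver=ODBC+Driver+17+for+SQL+Server"
)
df.to_sql("imported_data", con=engine, if_exists="replace", index=False)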
➢ OUTPUT :
EXPERIMENT NO. 2
➢ Aim : Data Visualization from Extraction Transformation and Loading (ETL) Process.
➢ Outcome: Effective data visualizations derived from the ETL process provide clear
insights, facilitating informed decision-making and enhancing understanding of the data.
➢ Hardware Requirement: Hardware requirements for data visualization from the ETL
process typically include a robust computer or server with sufficient processing power
(CPU), memory (RAM), and storage space. Additionally, a graphics card may be
beneficial for rendering complex visualizations quickly.
➢ Theory:
4. Visualization Tools: There are various tools available for creating data
visualizations, ranging from standalone software like Tableau and Power BI to
libraries and frameworks in programming languages such as Python (Matplotlib,
Seaborn, Plotly) and R (ggplot2). These tools offer different features, capabilities,
and levels of customization to suit different needs and preferences.
5. Best Practices for Data Visualization:
➢ PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris  # To load the Iris dataset

# Load the Iris dataset
iris = load_iris()

# Example 1: Sepal Length vs. Sepal Width colored by Species (Scatter Plot)
sepal_length = iris.data[:, 0]
sepal_width = iris.data[:, 1]
target_names = iris.target_names  # Get species names
plt.scatter(sepal_length, sepal_width, c=iris.target)  # color points by species code
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Sepal Length vs. Sepal Width by Iris Species')
plt.show()

# Example 2: Distribution of Petal Length by Species (horizontal Box Plot)
petal_length = iris.data[:, 2]
plt.boxplot([petal_length[iris.target == i] for i in range(3)],
            labels=target_names, vert=False)  # Separate boxes by species
plt.xlabel('Petal Length (cm)')
plt.ylabel('Species')
plt.title('Distribution of Petal Length by Iris Species')
plt.show()
➢ OUTPUT :
EXPERIMENT NO. 3
➢ Aim : Perform the ELT process to construct the database in SQL Server / Power BI.
➢ Outcome: Effective data visualizations derived from the ETL process provide clear
insights, facilitating informed decision-making and enhancing understanding of the
data.
➢ Theory:
Step 1: Extraction
1) Identify the data sources from which you will extract data. These could be
relational databases, flat files (CSV, Excel), APIs, etc.
2) Use SQL Server Integration Services (SSIS) or any other ETL tool to extract data
from the sources and load it into a staging area.
3) Ensure that the extracted data is structured and ready for loading into the SQL
Server database.
Step 2: Loading
1) Open SQL Server Management Studio (SSMS) and connect to your SQL Server
instance.
2) Create a new database where you will load the extracted data (a sample SQL
script is sketched after this list).
3) Design the schema for your database, including tables, columns, data types, and
relationships based on the extracted data.
4) Use SQL scripts or SSIS packages to load the data from the staging area into the
database tables.
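A minimal T-SQL sketch of steps 2) to 4) above (the database, table, and staging-area names
are placeholders, not taken from the manual):
CREATE DATABASE TargetDB;
GO
USE TargetDB;
GO
-- Target table whose schema mirrors the extracted data
CREATE TABLE dbo.Sales (
    SaleID   INT PRIMARY KEY,
    SaleDate DATE,
    Amount   DECIMAL(10, 2)
);
-- Load rows from the staging area into the target table
INSERT INTO dbo.Sales (SaleID, SaleDate, Amount)
SELECT SaleID, SaleDate, Amount
FROM StagingDB.dbo.Sales_Staging;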
Step 3: Transformation
1) Once the data is loaded into the database, perform any necessary transformations
to prepare it for analysis and reporting.
2) Use SQL queries to clean, filter, aggregate, and join data as per your requirements
(a small example follows this list).
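For example, a small aggregation over the placeholder Sales table from the previous sketch:
SELECT YEAR(SaleDate) AS SaleYear, SUM(Amount) AS TotalAmount
FROM dbo.Sales
WHERE Amount IS NOT NULL
GROUP BY YEAR(SaleDate);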
Step 4: Visualization in Power BI
1) Open Power BI Desktop and connect to your SQL Server database as a data source.
4) Build interactive reports and dashboards using Power BI visuals such as charts,
tables, maps, etc.
5) Enhance your reports with additional features like slicers, filters, and drill-down
capabilities.
6) Publish your Power BI report to the Power BI service for sharing and
collaboration with others.
➢ OUTPUT :
EXPERIMENT NO. 4
➢ Aim : Perform data classification using any classification algorithm.
➢ Outcome:
The objective of this lab session is to perform data classification using the K-Nearest
Neighbors (KNN) algorithm. By the end of this lab, you should be able to:
• Understand the KNN algorithm and its working principle.
• Implement KNN classification using Python and scikit-learn.
• Evaluate the performance of the KNN classifier.
• Interpret the outcomes and draw conclusions.
➢ Hardware Requirement:
• Personal computer or laptop with a modern processor (e.g., Intel Core i3 or
higher).
• Sufficient RAM for running Python and the required libraries.
➢ Software Requirement:
• Python (3.0 or later)
• Jupyter Notebook (optional but recommended)
• Libraries: NumPy, pandas, scikit-learn
➢ Theory:
• Training: The algorithm stores all the available data points and their
corresponding class labels.
• Prediction: For a new data point, the algorithm calculates the distances to all
training data points and selects the K nearest neighbors.
• Majority Voting: It then assigns the class label to the new data point based on the
majority class among its K nearest neighbors.
➢ PROGRAM :
# Imports (the original listing did not show these, so they are added here)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer  # example binary-class dataset (an assumption; any labelled dataset works)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Load a labelled dataset into a DataFrame (the original data source is not shown)
dataset = load_breast_cancer(as_frame=True).frame
dataset.info()
X = dataset.drop(columns='target')
y = dataset['target']

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Train the KNN classifier and predict on the test set
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Evaluate the classifier
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy  # Error rate is 1 - Accuracy
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print('Accuracy:', accuracy, 'Error rate:', error_rate)
print('Precision:', precision, 'Recall:', recall)

def plot_confusion_matrix(cm):
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

plot_confusion_matrix(cm)
➢ OUTPUT :
EXPERIMENT NO. 5
➢ Aim : Perform data clustering using any clustering algorithm.
➢ Introduction:
Clustering is an unsupervised learning technique used to group data points or
objects based on their similarities. K-Means is one of the most popular
clustering algorithms, widely used for partitioning data into clusters. It aims to
minimize the sum of squares of distances between data points and their
corresponding cluster centroids.
➢ Objective:
In this lab session, you will:
• Understand the working principle of the K-Means clustering algorithm.
• Implement the K-Means algorithm using Python and a relevant library.
• Apply K-Means clustering to a sample dataset.
• Analyze and interpret the results.
➢ Software Requirement:
• Python environment (Jupyter Notebook recommended)
• Required libraries: NumPy, Pandas, Matplotlib, and Scikit-learn
• Sample dataset (can be generated or obtained from any reliable source)
➢ Theory:
Initialization: Choose K initial centroids (for example, K randomly selected data points).
Assignment: Assign each data point to the nearest centroid, forming K clusters.
Update: Recalculate the centroids based on the mean of data points in each cluster.
Repeat: Repeat the assignment and update steps until convergence (i.e., the centroids do not
change significantly).
Step 3: Applying K-Means Clustering:
Evaluate the quality of clustering using relevant metrics (e.g., silhouette score).
Expected Outcome:
Safety Precautions:
Ensure the dataset used does not contain sensitive or confidential information.
Handle the Python environment with care, following standard coding practices.
➢ PROGRAM :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
X = StandardScaler().fit_transform(load_iris().data)  # standardise the Iris sample dataset

# Initialize an empty list to store the values of the within-cluster sum of squares (WCSS)
wcss = []
for k in range(1, 11):
    # Fit K-Means for each candidate K and record its WCSS (inertia_)
    wcss.append(KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_)

# Elbow plot: choose K where the WCSS curve flattens
plt.plot(range(1, 11), wcss, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('WCSS')
plt.title('Elbow Method for K-Means')
plt.show()
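To evaluate clustering quality as the theory suggests (e.g., the silhouette score), a short
follow-on sketch that reuses X from the program above, with K fixed at 3 as an assumption:
from sklearn.metrics import silhouette_score
kmeans3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print('Silhouette score for K=3:', silhouette_score(X, kmeans3.labels_))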
➢ OUTPUT :