FDS Lab Manual
II CSE - IV SEMESTER
2025 - 2026
Name :
Register No. :
Roll No. :
PANIMALAR ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
(A CHRISTIAN MINORITY INSTITUTION) JAISAKTHI EDUCATIONAL TRUST
ACCREDITED BY NATIONAL BOARD OF ACCREDITATION (NBA)
Bangalore Trunk Road, Varadharajapuram, Nasarathpettai,
Poonamallee, Chennai – 600 123
Bonafide Certificate
Staff in charge
Program:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import linalg
A = np.array([[1, 2], [3, 4]])
linalg.inv(A)
print("A: ", A)
b = np.array([[5, 6]])
A.dot(b.T)
print("multiplication: ", A)
print("norm: ", linalg.norm(A))
data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
                dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df = pd.DataFrame(data, columns=['c', 'a'])
print("shape: ", df.shape)
print("size: ", df.size)
print("dim: ", df.ndim)
print("mean: ", np.mean(df))
print("std: ", np.std(df))
Output:
A: [[1 2]
[3 4]]
multiplication: [[1 2]
[3 4]]
norm: 5.477225575051661
shape: (3, 2)
size: 6
dim: 2
mean: 5.0
std:
c 2.44949
a 2.44949
Result:
Thus the program to explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages is executed.
EX. NO.: 2
DATE:
Create an empty and a full NumPy array
Program:
import numpy as np
a = np.empty((3, 3))
print(a)
a = np.zeros((3, 3))
print(a)
a = np.ones((3, 3))
print(a)
a = np.full((3, 3), 5)
print(a)
Output:
[[1.06099790e-312 1.10343781e-312 1.20953760e-312]
[1.20953760e-312 6.79038653e-313 9.76118064e-313]
[1.10343781e-312 1.10343781e-312 1.01855798e-312]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[[5 5 5]
[5 5 5]
[5 5 5]]
Result:
Thus the program to create and initialize NumPy arrays using the empty(), zeros(),
ones() and full() functions is executed.
EX. NO.: 3
DATE:
Program to remove rows in a NumPy array that contain
non-numeric values
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
a = np.array([['1', '2', '3'],
              ['6', 'x', '5'],
              ['7', '8', '9']])
mask = np.char.isnumeric(a).all(axis=1)
c = a[mask]
print(c)
Output:
[['1' '2' '3']
['7' '8' '9']]
Result:
Thus we have removed the non-numeric values from the array.
EX. NO.: 4
DATE:
Reading data from text file, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris
data set
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
i = pd.read_csv('iris_dataset.csv')
print(i.head())
with open('hello.txt', 'r') as t:
    print(t.read())
d = pd.read_excel('data.xlsx')
print(d.head())
Output:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
target
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
hello world!
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
i = sns.load_dataset('iris')
print(i.head())
print(i.describe())
print(i.info())
print(i.duplicated())
print(i['species'].value_counts())
print(i.shape)
sns.scatterplot(data=i, x='sepal_width', y='sepal_length', hue='species')
plt.show()
sns.pairplot(i, hue='species')
plt.show()
sns.countplot(data=i, x='species')
plt.show()
Output:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
0 False
1 False
2 False
3 False
4 False
145 False
146 False
147 False
148 False
149 False
Length: 150, dtype: bool
species
setosa 50
versicolor 50
virginica 50
Name: count, dtype: int64
(150, 5)
Result:
Thus the program to read data from a text file, Excel and the web, and to explore
various commands for doing descriptive analytics on the Iris data set, is
executed.
EX. NO.: 5
DATE:
Use the diabetes data set from UCI and Pima Indians Diabetes
data set for performing the following:
a. Univariate analysis
b. Bivariate analysis
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the
two data sets.
a. Univariate Analysis:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
uci = pd.read_csv('uci/diabetes.csv')
pima = pd.read_csv('pima/diabetes.csv')
print(uci.head())
print(pima.head())
print(uci.describe())
print(pima.describe())
def uni(ds):
    print("mean:\n", ds.mean())
    print()
    print("standard deviation\n", ds.std())
    print("median:\n", ds.median())
    print("mode:\n", ds.mode())
    print("variance\n", ds.var())
    print("skewness:\n", ds.skew())
    print("kurtosis:\n", ds.kurtosis())
uni(pima)
Output:
Unnamed: 0 patient_id date time code value
0 0 1 04-21-1991 9:09 58 100
1 1 1 04-21-1991 9:09 33 9
2 2 1 04-21-1991 9:09 34 13
3 3 1 04-21-1991 17:08 62 119
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
standard deviation
Pregnancies 3.369578
Glucose 31.972618
BloodPressure 19.355807
SkinThickness 15.952218
Insulin 115.244002
BMI 7.884160
DiabetesPedigreeFunction 0.331329
Age 11.760232
Outcome 0.476951
dtype: float64
median:
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64
mode:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 1.0 99 70.0 0.0 0.0 32.0
1 NaN 100 NaN NaN NaN NaN
Result:
Thus the program for performing the Univariate analysis is executed.
b. Bivariate Analysis:
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
pima = pd.read_csv("pima/diabetes.csv")
X = np.array(pima['Glucose']).reshape(-1, 1)
y = np.array(pima['BloodPressure']).reshape(-1, 1)
sns.scatterplot(pima, x='Glucose', y='BloodPressure')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lr()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", 100 - mse(y_test, preds))
Output:
MSE: -274.49327891888
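A note on the metric: "100 - MSE" is not a standard accuracy measure, which is why the value above can go negative. sklearn's r2_score is the usual goodness-of-fit statistic for regression; a sketch on synthetic stand-in data (the real Glucose/BloodPressure columns are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=100).reshape(-1, 1)   # stand-in predictor
y = 0.2 * X.ravel() + rng.normal(0, 5, size=100)    # linear signal + noise
model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print(r2)   # 1.0 is a perfect fit; 0.0 is no better than predicting the mean
```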
Result:
Thus the program for performing the Bivariate analysis is executed.
c. Multiple Regression Analysis:
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
pima = pd.read_csv("pima/diabetes.csv")
X = np.array(pima['Glucose']).reshape(-1, 1)
y = np.array(pima['Outcome']).reshape(-1, 1)
sns.scatterplot(pima, x='BloodPressure', y='Age')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lr()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", 100 - mse(y_test, preds))
Output:
MSE: 99.81831030079522
Result:
Thus the program for performing the Multivariate analysis is executed.
EX. NO.: 6
DATE:
Write a Pandas program to create and display a DataFrame from
a specified
dictionary data which has the index labels.
Program:
import pandas as pd
data = {
"name" : ['john', 'jack', 'jim', 'josh', 'joel'],
"age" : [30, 40, 25, 46, 32],
"city" : ['NY', 'CA', 'WS', 'TX', 'KS']
}
i = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, index=i)
print(df)
Output:
name age city
1 john 30 NY
2 jack 40 CA
3 jim 25 WS
4 josh 46 TX
5 joel 32 KS
Result:
The Pandas program successfully created and displayed a DataFrame from the
specified dictionary data with custom index labels.
EX. NO.: 7
DATE:
Write a Pandas program to get the first 3 rows of a given DataFrame.
Program:
import pandas as pd
data = {
"name" : ['john', 'jack', 'jim', 'josh', 'joel'],
"age" : [30, 40, 25, 46, 32],
"city" : ['NY', 'CA', 'WS', 'TX', 'KS']
}
i = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, index=i)
f3 = df.head(3)
print(f3)
Output:
name age city
1 john 30 NY
2 jack 40 CA
3 jim 25 WS
Result:
The program successfully displayed the first 3 rows of the given DataFrame.
EX. NO.: 8
DATE:
Program to find the variance and standard deviation of set of
elements
Program:
import math
dt = [10, 12, 23, 23, 16, 23, 21, 16]
m = sum(dt)/len(dt)
print("mean:", m)
var = 0.0
for i in dt:
    var += (i - m)**2
var = var/len(dt)
print("variance:", var)
print("STD:", math.sqrt(var))
Output:
mean: 18.0
variance: 24.0
STD: 4.898979485566356
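The hand-rolled loop can be cross-checked against the standard library; statistics.pvariance and statistics.pstdev use the same population (divide-by-n) convention as the code above:

```python
import statistics

dt = [10, 12, 23, 23, 16, 23, 21, 16]
print("mean:", statistics.mean(dt))          # 18
print("variance:", statistics.pvariance(dt)) # 24
print("STD:", statistics.pstdev(dt))         # 4.898979485566356
```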
Result:
The program successfully calculates the variance and standard deviation of the
given dataset using basic mathematical operations in Python.
EX. NO.: 9
DATE:
Write a Python program to draw line charts of the financial data
of Alphabet Inc. between October 3, 2016 to October 7, 2016.
Program:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("alphabet.csv")
data = pd.DataFrame(data)
plt.plot(data['Date'][0:5], data['Close'][0:5])
plt.show()
Output:
Result:
The program successfully visualized the financial data of Alphabet Inc. between
October 3, 2016, and October 7, 2016, using a line chart.
EX. NO.: 10
DATE:
Program to plot a Correlation and scatter plots
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
d = pd.read_csv("heart.csv")
dt = pd.DataFrame(d[['chol', 'age', 'cp', 'thalach']])
sns.heatmap(dt.corr(), annot=True)
sns.scatterplot(x=d['chol'], y=d['thalach'])
Output:
Result:
Correlation Matrix: Provided insights into feature relationships, identifying
strong positive or negative correlations.
Scatter Plot: Visualized dependencies and patterns, such as linear trends or
clustering of data points by category.
This method is effective for exploratory data analysis, feature selection, and
understanding the structure of the dataset.
EX. NO.: 11
DATE:
Program for Linear Regression and Logistic Regression
Linear Regression
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
data = pd.read_csv("heart.csv")
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=80)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lrr = LinearRegression()
lrr.fit(X_train, y_train)
lrr_pred = lrr.predict(X_test)
lrr_acc = lrr.score(X_test, y_test)
print("accuracy:", lrr_acc*100)
mse = mean_squared_error(y_test, lrr_pred)
print(f"MSE: {mse}")
intercept = lrr.intercept_
coefficients = lrr.coef_
print(f"Intercept: {intercept}")
print(f"Coefficients: {coefficients}")
Output:
accuracy: 54.554261744998044
Result:
Thus the program for performing linear regression is executed.
Logistic Regression
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, \
    classification_report, ConfusionMatrixDisplay
data = pd.read_csv("heart.csv")
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=80)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_conf = confusion_matrix(y_test, lr_pred)
lr_acc = lr.score(X_test, y_test)
print("confusion matrix:\n", lr_conf)
ConfusionMatrixDisplay.from_estimator(lr, X_test, y_test)
print("accuracy:", lr_acc*100)
print(classification_report(y_test, lr_pred))
Output:
confusion matrix:
[[81 16]
[ 9 99]]
accuracy: 87.8048780487805
precision recall f1-score support
0 0.90 0.84 0.87 97
1 0.86 0.92 0.89 108
accuracy 0.88 205
macro avg 0.88 0.88 0.88 205
weighted avg 0.88 0.88 0.88 205
Result:
Thus the program for performing logistic regression is executed.
EX. NO.: 12
DATE:
Apply and explore various plotting functions on UCI data sets.
a. Normal curves:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
data = pd.read_csv("heart.csv")
sns.histplot(data["chol"], bins=30, kde=True, stat="count", color="skyblue",
             label="Histogram + KDE")
x = np.linspace(min(data["chol"]), max(data["chol"]), 100)
plt.plot(x, norm.pdf(x, np.mean(data["chol"]), np.std(data["chol"])),
         color="red", lw=2, label="Normal Distribution")
plt.legend()
plt.show()
Output:
Result:
This exploration highlights the applicability of normal curve plotting to
understand data distributions and identify key statistical properties.
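One caveat for the listing above: the histogram is drawn with stat="count" while norm.pdf is a probability density, so the two curves sit on very different vertical scales. Passing stat="density" to sns.histplot puts them on the same axis. A sketch on synthetic, cholesterol-like values (the real column is not reproduced here):

```python
import numpy as np
from scipy.stats import norm

data = np.random.default_rng(1).normal(246, 52, size=300)  # synthetic values
x = np.linspace(data.min(), data.max(), 100)
pdf = norm.pdf(x, np.mean(data), np.std(data))
# A density peaks near 1/(std*sqrt(2*pi)) -- far below any raw bin count,
# which is why the overlay looks flat when drawn over stat="count".
print(pdf.max())
# Matching histogram call: sns.histplot(data, bins=30, kde=True, stat="density")
```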
b. Density and Contour Plot
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
data = pd.read_csv("heart.csv")
sns.kdeplot(x=data["chol"], y=data["thalach"], fill=True, levels=20)
sns.kdeplot(data["chol"], fill=True)
Output:
Result:
The visualizations effectively demonstrated relationships between features and
the distribution of data in the UCI dataset.
C: Correlation and Scatter plots:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
plt.figure(figsize=(12, 12))
sns.heatmap(data.corr(), annot=True, cmap="coolwarm", fmt=".2f")
sns.scatterplot(x=data["chol"], y=data["thalach"])
Output:
Result:
Correlation Heatmap: Identified numerical features with strong or weak
relationships in the dataset.
Scatter Plot: Provided a visual representation of the dependency between two
features, confirming trends indicated by the correlation matrix.
This approach helps understand feature interactions and guides further
exploratory or predictive analysis.
d. Histograms:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
data.hist(figsize=(15, 15))
Output:
Result:
Insights Gained: The histograms provided a clear understanding of the
frequency distribution of numerical attributes.
Histograms are an effective tool for identifying data spread, outliers, and
patterns.
e. Three Dimensional Scatter Plotting:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
fig = px.scatter_3d(data, x="age", y="chol", z="thalach", color='slope')
fig.show()
Output:
Result:
The 3D plot effectively visualized interactions between three numerical features
in the dataset.
3D visualizations are useful for uncovering complex patterns, clusters, or
outliers that are not apparent in 2D plots.
EX. NO.: 13
DATE:
Perform Mini Project on Fake News Detection.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk as nlp
import re
import string
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
d_fake = pd.read_csv("Fake.csv")
d_true = pd.read_csv("True.csv")
d_true["text"] = d_true["text"].replace("(Reuters)", "", regex=True)
d_fake['target'] = 0
d_true['target'] = 1
d_fake = d_fake.drop(["title", "subject", "date"], axis=1)
d_true = d_true.drop(["title", "subject", "date"], axis=1)
df = pd.concat([d_fake, d_true], axis=0)
df = df.sample(frac=1)
df.reset_index(inplace=True)
df.drop(["index"], axis=1, inplace=True)
def wordopt(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'[()]', '', text)
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>', '', text)
    return text
def output(n):
    if (n == 0):
        return "Fake News"
    elif (n == 1):
        return "Not Fake News"
def testing(news):
    # vectorization, LR, DT, GBC and RFC come from the training step (not shown)
    testing_news = {"text": [news]}
    new_test = pd.DataFrame(testing_news)
    new_test = new_test["text"].apply(wordopt)
    new_xv_test = vectorization.transform(new_test)
    pred_lr = LR.predict(new_xv_test)
    pred_dtc = DT.predict(new_xv_test)
    pred_gbc = GBC.predict(new_xv_test)
    pred_rfc = RFC.predict(new_xv_test)
    return [pred_lr, pred_dtc, pred_gbc, pred_rfc]
def printout(pl):
    p = []
    p.append("Logistic Regression: " + output(pl[0]))
    p.append("Decision Tree Classifier: " + output(pl[1]))
    p.append("Gradient Boosting Classifier: " + output(pl[2]))
    p.append("Random Forest Classifier: " + output(pl[3]))
    return p
news = str(input())
printout(testing(news))
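The testing() function refers to vectorization, LR, DT, GBC and RFC, which come from a training step that did not survive on this page. A minimal sketch of what that step typically looks like for this exercise; the TF-IDF vectorizer, the model choices and the tiny corpus here are assumptions for illustration, not the original code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative corpus; in the lab this would be df["text"] / df["target"].
texts = ["miracle cure shocks doctors", "reuters reports quarterly earnings",
         "aliens endorse candidate", "government publishes budget figures"]
labels = [0, 1, 0, 1]              # 0 = fake, 1 = true, as in the listing

vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(texts)
LR = LogisticRegression().fit(xv_train, labels)
DT = DecisionTreeClassifier(random_state=0).fit(xv_train, labels)
# GBC and RFC are fitted the same way, with GradientBoostingClassifier
# and RandomForestClassifier.
pred = LR.predict(vectorization.transform(["reuters reports new figures"]))
print(pred)
```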
Output:
"The government announces a new stimulus package that will boost the
economy by 100%."
Result:
Thus the mini project to classify news articles as either fake or real based on
textual content is developed.
EX. NO.: 14
DATE:
Build an application to detect colors in the given picture using
Basic Data Science
Program:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import cv2
from google.colab import files
from PIL import Image
from io import BytesIO
import collections
uploaded = files.upload()
image = Image.open(BytesIO(uploaded[list(uploaded.keys())[0]]))
image_np = np.array(image)
image_colors = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
pixels = image_colors.reshape((-1, 3))
num_clusters = 10
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
kmeans.fit(pixels)
labels = kmeans.labels_
colors = kmeans.cluster_centers_
color_counts = collections.Counter(labels)
color_names = [
    "Red", "Green", "Blue", "Yellow", "Purple", "Orange", "Pink", "Brown",
    "Black", "White"
]
for i in range(num_clusters):
    color_name = color_names[i % len(color_names)]
    count = color_counts[i]
    rgb_color = tuple(map(int, colors[i]))
    print(f"Color: {color_name}, RGB: {rgb_color}, Count: {count}")
Output:
Color: Red, RGB: (9, 6, 134), Count: 580263
Color: Green, RGB: (14, 60, 200), Count: 140996
Color: Blue, RGB: (7, 37, 42), Count: 599477
Color: Yellow, RGB: (5, 141, 161), Count: 457859
Color: Purple, RGB: (96, 135, 166), Count: 201281
Color: Orange, RGB: (13, 102, 124), Count: 551993
Color: Pink, RGB: (75, 63, 40), Count: 140930
Color: Brown, RGB: (35, 169, 207), Count: 223069
Color: Black, RGB: (134, 190, 218), Count: 175200
Color: White, RGB: (10, 60, 77), Count: 615332
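Note that the name list in the listing is assigned by cluster index, not by the cluster's actual colour, which is how a centre with RGB (9, 6, 134) ends up labelled "Red" in the output above. A sketch of naming by nearest reference colour instead; the small palette here is an assumption for illustration:

```python
import numpy as np

# Minimal reference palette (assumed values; extend as needed).
palette = {
    "Red": (255, 0, 0), "Green": (0, 128, 0), "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0), "Black": (0, 0, 0), "White": (255, 255, 255),
}

def nearest_name(rgb):
    """Return the palette name with the smallest Euclidean RGB distance."""
    names = list(palette)
    dists = [np.linalg.norm(np.subtract(rgb, palette[n])) for n in names]
    return names[int(np.argmin(dists))]

print(nearest_name((9, 6, 134)))   # -> Blue
```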
Result:
Thus the application to detect colors in the given picture using Basic Data
Science is built and the output is verified.
DATE:
EX. NO.: 1
ADDITIONAL PROGRAMS
PROGRAM TO GENERATE RANDOM NUMBERS IN AN ARRAY
PROGRAM:
import numpy as np
random_integers = np.random.randint(0, 10, 5)
print("Random Integer Array:", random_integers)
random_floats = np.random.rand(5)
print("Random Float Array:", random_floats)
random_normal = np.random.randn(5)
print("Random Normal Distribution Array:", random_normal)
random_2d_array = np.random.randint(0, 10, (3, 3))
print("Random 2D Integer Array:\n", random_2d_array)
random_shape_array = np.random.random((4, 4))
print("Random Array with Shape (4x4):\n", random_shape_array)
OUTPUT:
Random Integer Array: [0 6 9 0 6]
Random Float Array: [0.910141 0.37395005 0.04729324 0.99031385
0.93985559]
Random Normal Distribution Array: [-0.58544812 -0.31042167 -0.31726232 -
2.18994984 0.42318254]
Random 2D Integer Array:
[[8 8 5]
[7 6 8]
[5 8 3]]
Random Array with Shape (4x4):
[[0.72800535 0.42900513 0.59822429 0.38490628]
[0.98336226 0.58878188 0.1133993 0.12893145]
[0.19279917 0.84499983 0.0170265 0.23372838]
[0.47922397 0.50141304 0.03462652 0.97882316]]
RESULT:
Thus the program was successfully executed and verified
DATE:
EX. NO.: 2
PROGRAM THAT COMPARES TWO ARRAYS ELEMENT-WISE AND
RETURNS A BOOLEAN ARRAY
PROGRAM:
import numpy as np
def compare_arrays(arr1, arr2):
    if arr1.shape != arr2.shape:
        raise ValueError("Arrays must have the same shape to compare")
    return arr1 == arr2
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 2, 0, 4, 5])
result = compare_arrays(arr1, arr2)
print("Boolean Comparison Result:", result)
OUTPUT:
Boolean Comparison Result: [ True True False True True]
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 3
CASE STUDY:
You are a data scientist working for a company that manages a chain of fitness
centres across various regions. The company collects data on the monthly
number of members attending classes in different fitness centres. You need to
analyse this data to understand key metrics like:
1. The average number of attendees per fitness centre.
2. The variation in attendance across different centres (standard deviation).
3. The centres with the minimum and maximum attendance over the month.
4. The centre with the highest average attendance.
The data is stored in a 2D array where each row represents a centre and each
column represents the number of members attending the class on a specific
day (30 days).
PROGRAM:
import numpy as np
attendance_data = np.array([
[50, 45, 52, 49, 47, 48, 50, 51, 49, 52, 46, 44, 50, 49, 53, 47, 50, 48, 52, 51,
46, 47, 50, 53, 49, 48, 50, 51, 52, 53],
[60, 58, 61, 59, 62, 60, 61, 63, 59, 64, 58, 57, 62, 60, 65, 59, 60, 58, 63, 62,
57, 58, 61, 65, 60, 59, 62, 63, 64, 65],
[40, 38, 42, 39, 41, 40, 41, 43, 39, 44, 38, 37, 41, 40, 45, 39, 40, 38, 43, 42,
37, 38, 41, 45, 40, 39, 42, 43, 44, 45]])
average_attendance = np.mean(attendance_data, axis=1)
std_deviation = np.std(attendance_data, axis=1)
total_attendance = np.sum(attendance_data, axis=1)
min_attendance_center = np.argmin(total_attendance)
max_attendance_center = np.argmax(total_attendance)
highest_avg_center = np.argmax(average_attendance)
print("Average Attendance per Center:", average_attendance)
print("Standard Deviation of Attendance per Center:", std_deviation)
print(f"Center with Minimum Attendance: Center {min_attendance_center + 1}")
print(f"Center with Maximum Attendance: Center {max_attendance_center + 1}")
print(f"Center with Highest Average Attendance: Center {highest_avg_center + 1}")
OUTPUT:
Average Attendance per Center: [49.4 60.83333333 40.8]
Standard Deviation of Attendance per Center: [2.38886305 2.38164276
2.37205958]
Center with Minimum Attendance: Center 3
Center with Maximum Attendance: Center 2
Center with Highest Average Attendance: Center 2
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 4
CASE STUDY:
You are an analyst working for an e-commerce platform that collects daily
transaction data for the products sold on its websites. The company is looking to
implement certain data processing tasks to gain better insight into sales
trends and optimize marketing strategies. They require you to apply some
fundamental mathematical transformations on the sales data for further analysis.
The dataset consists of daily sales figures for multiple products over a 30-day
period. Each row corresponds to a specific product and each column contains
the sales figures for that product on the given day. Given the nature of the
analysis, you are expected to perform the following operations on the data using
NumPy's unary universal functions.
1. Your task is to adjust the sales data by scaling it accordingly, reflecting
a potential price change.
2. Compute a logarithmic transformation of the sales data to achieve this goal.
3. You are tasked with adding tax to each sale in the dataset, adjusting
the sales figures accordingly.
4. You need to apply a mathematical transformation (sqrt) to the sales data to
model the phenomenon where larger sales figures are compressed more
than smaller ones.
5. Convert all sales data to negative values, which could be useful for calculating
refunds or losses.
6. You are tasked with applying rounding operations across the entire data set.
PROGRAM:
import numpy as np
# Sample sales data (5 products, sales over 30 days)
np.random.seed(0)
sales_data = np.random.randint(10, 500, (5, 30))
# 1. Scaling the Sales Data (e.g., increasing by 10%)
scaling_factor = 1.10
scaled_sales = sales_data * scaling_factor
# 2. Logarithmic Transformation
log_sales = np.log(sales_data + 1)  # Adding 1 to avoid log(0)
# 3. Adding Tax (e.g., 8% tax)
tax_rate = 1.08
taxed_sales = sales_data * tax_rate
# 4. Square-root transformation (compresses larger figures more)
sqrt_sales = np.sqrt(sales_data)
# 5. Negate all sales (useful for refund/loss calculations)
negative_sales = np.negative(sales_data)
# 6. Round the scaled figures across the data set
rounded_sales = np.round(scaled_sales)
print("Scaled Sales Data:\n", scaled_sales)
print("Log Transformed Sales Data:\n", log_sales)
print("Original Sales Data:\n", sales_data)
Scaled Sales Data:
[[302.5 455.4 137.5 521.4 278.3 227.7 379.5 485.1 503.8 382.8 119.9 530.2
205.7 278.3 324.5 172.7 172.7 448.8 476.3 327.8 504.9 302.5 214.5 150.7
46.2 45.1 233.2 279.4 177.1 190.3]
[515.9 418. 212.3 41.8 330. 151.8 151.8 473. 69.3 438.9 52.8 547.8
279.4 311.3 379.5 437.8 126.5 57.2 497.2 45.1 424.6 293.7 364.1 546.7
478.5 73.7 331.1 404.8 141.9 304.7]
[484. 101.2 111.1 433.4 448.8 119.9 69.3 446.6 144.1 479.6 103.4 234.3
367.4 299.2 508.2 62.7 150.7 155.1 517. 402.6 209. 547.8 378.4 168.3
173.8 260.7 497.2 317.9 238.7 447.7]
[421.3 386.1 63.8 346.5 86.9 196.9 190.3 503.8 115.5 227.7 114.4 292.6
416.9 206.8 332.2 470.8 345.4 394.9 436.7 118.8 57.2 518.1 415.8 546.7
456.5 232.1 432.3 11. 444.4 418. ]]
Log Transformed Sales Data:
[[5.20948615 4.06044301 4.85203026 5.31320598 5.81114099 5.5683445
5.32787617 5.91350301 2.99573227 5.40267738 5.66296048 5.53338949
5.71373281 4.58496748 4.39444915 6.18001665 4.59511985 6.00881319
5.78382518 5.31811999 6.20859003 3.91202301 4.58496748 5.22035583
4.59511985 5.85220248 5.170484 3.58351894 5.84064166 4.41884061]
[5.62040087 6.02827852 4.83628191 6.1633148 5.53733427 5.33753808
5.84643878 6.09130988 6.12905021 5.85507192 4.70048037 6.18001665
5.23644196 5.53733427 5.69035945 5.06259503 5.06259503 6.01371516
6.07304453 5.70044357 6.13122649 5.62040087 5.27811466 4.92725369
3.76120012 3.73766962 5.36129217 5.54126355 5.08759634 5.1590553 ]
[6.15273269 5.94279938 5.26785816 3.66356165 5.70711026 4.93447393
4.93447393 6.06610809 4.15888308 5.99146455 3.8918203 6.2126061
5.54126355 5.64897424 5.84643878 5.98896142 4.75359019 3.97029191
6.11589213 3.73766962 5.95842469 5.59098698 5.80513497 6.21060008
6.07764224 4.21950771 5.71042702 5.91079664 4.86753445 5.62762111]
[6.08904488 4.53259949 4.62497281 5.97888576 6.01371516 4.70048037
433 298 459 275 195 137 42 41 212 254 161 173]
[469 380 193 38 300 138 138 430 63 399 48 498 254 283 345 398 115 52
452 41 386 267 331 497 435 67 301 368 129 277]
[440 92 101 394 408 109 63 406 131 436 94 213 334 272 462 57 137 141
470 366 190 498 344 153 158 237 452 289 217 407]
[383 351 58 315 79 179 173 458 105 207 104 266 379 188 302 428 314 359
397 108 52 471 378 497 415 211 393 10 404 380]]
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 5
PROGRAM:
import numpy as np
arr = np.array([1, 2, 3, 2, 3, 4, 5, 5, 6, 3, 2, 1, 4])
unique_elements, counts = np.unique(arr, return_counts=True)
print("Unique elements:", unique_elements)
print("Counts of unique elements:", counts)
for i in range(len(unique_elements)):
    print(f"Element {unique_elements[i]} occurs {counts[i]} times")
OUTPUT:
Unique elements: [1 2 3 4 5 6]
Counts of unique elements: [2 3 3 2 2 1]
Element 1 occurs 2 times
Element 2 occurs 3 times
Element 3 occurs 3 times
Element 4 occurs 2 times
Element 5 occurs 2 times
Element 6 occurs 1 times
RESULT:
Thus the program to find the unique elements in an array and count their
occurrences was written and executed successfully.
DATE:
EX. NO.: 6
PROGRAM TO IMPLEMENT MATRIX MULTIPLICATION FOR
TWO DIMENSIONAL ARRAYS USING NUMPY
PROGRAM:
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])
result = np.dot(A, B)
print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)
print("\nResultant Matrix (A x B):")
print(result)
OUTPUT:
Matrix A:
[[1 2 3]
 [4 5 6]]

Matrix B:
[[ 7  8]
 [ 9 10]
 [11 12]]

Resultant Matrix (A x B):
[[ 58  64]
 [139 154]]
RESULT:
Thus the above program to implement matrix multiplication for two-dimensional
arrays using NumPy was written and executed successfully.
DATE:
EX. NO.: 7
PROGRAM:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
if (arr > 0).all():
    print("The array contains all positive numbers.")
else:
    print("The array does not contain all positive numbers.")
OUTPUT:
The array contains all positive numbers.
RESULT:
Thus the program to check if an array contains all positive numbers using a
comparison operator was written and executed successfully.
DATE:
EX NO: 8
PROGRAM TO CREATE A RANDOM 3×3 MATRIX AND FIND THE
MAXIMUM VALUE IN THE MATRIX
PROGRAM:
import numpy as np
matrix = np.random.randint(1, 101, (3, 3))
max_value = np.max(matrix)
print("Random 3x3 Matrix:")
print(matrix)
print("\nMaximum Value in the Matrix:", max_value)
OUTPUT:
Random 3x3 Matrix:
[[23 89 45]
 [67 12 98]
 [56 34 77]]

Maximum Value in the Matrix: 98
RESULT:
Thus the program to create a random 3×3 matrix and find the maximum
value in the matrix using NumPy was written and executed successfully.
DATE:
EX NO: 9
GIVEN A 3X3 MATRIX, SORT THE ARRAY IN ASCENDING ORDER
ALONG EACH ROW, SORT IT IN DESCENDING ORDER ALONG
EACH COLUMN. FIND THE INDEX POSITION OF EACH ELEMENT
AFTER SORTING.
PROGRAM:
import numpy as np
matrix = np.array([
    [4, 2, 9],
    [1, 8, 5],
    [7, 6, 3]])
original_matrix = matrix.copy()
matrix.sort(axis=1)
matrix = np.sort(matrix, axis=0)[::-1]
index_positions = {}
for i in range(3):
    for j in range(3):
        element = matrix[i, j]
        original_index = np.where(original_matrix == element)
        index_positions[element] = (original_index[0][0], original_index[1][0])
print("Sorted Matrix (Rows Ascending, Columns Descending):")
print(matrix)
print("\nIndex Positions (Original -> New):")
for element, pos in index_positions.items():
    print(f"Element {element}: Original Index {pos}")
OUTPUT:
Sorted Matrix (Rows Ascending, Columns Descending):
[[3 6 9]
[2 5 8]
[1 4 7]]
RESULT:
Thus the program to sort a 3x3 matrix in ascending order along each row and
then in descending order along each column was written and executed successfully.
DATE:
EX. NO.: 10
PROGRAM:
import numpy as np
data = np.array([10, 20, 30, 60, 40, 50, 70, 80])
mask = data <= 50
filtered_values = data[mask]
sum_filtered_values = np.sum(filtered_values)
print("Filtered values:", filtered_values)
print("Sum of remaining values:", sum_filtered_values)
OUTPUT:
Filtered values: [10 20 30 40 50]
Sum of remaining values: 150
RESULT:
Thus the program to filter out values greater than 50 and compute the sum of the
remaining values was written and executed successfully.
DATE:
EX. NO.: 11
CASE STUDY:
You are a data analyst working for a school district. The school wants to track
the grades of students in different subjects across multiple terms. You are tasked
with performing basic operations using NumPy. You have the grades of five
students in three subjects for two terms.
TASK TO PERFORM:
1. Increase the grades of all students by 5 points in each subject.
2. Calculate the average grade for each student across all subjects.
3. Find the highest grade in each subject.
4. Extract the grades of student 3.
5. Reshape the grade array so that subjects become rows and students
become columns.
6. Add 10 points to each student's Maths grade and 5 points to Science, with
no change in English.
PROGRAM:
import numpy as np
grades = np.array([
    [75, 80, 85],
    [88, 76, 90],
    [92, 85, 87],
    [78, 88, 82],
    [85, 89, 84]])
# 1. Increase all grades by 5 points
grades += 5
print("Grades after adding 5 points:")
print(grades)
# 2. Calculate the average grade for each student across all subjects
average_grades = np.mean(grades, axis=1)
print("Average grade per student:", average_grades)
# 3. Find the highest grade in each subject
print("Highest grade in each subject:", np.max(grades, axis=0))
# 4. Extract the grades of student 3
print("Grades of Student 3:")
print(grades[2])
# 5. Reshape so that subjects become rows and students become columns
print(grades.T)
# 6. Add 10 points to Maths, 5 to Science, none to English
grades = grades + np.array([10, 5, 0])
print("Grades after subject-wise bonus:")
print(grades)
Grades of Student 3:
[97 90 92]
RESULT:
Thus the Student performance analysis is executed successfully.