FDS Lab Manual
II CSE - IV SEMESTER
2025 - 2026
Name :
Register No. :
Roll No. :
PANIMALAR ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
(A CHRISTIAN MINORITY INSTITUTION) JAISAKTHI EDUCATIONAL TRUST
ACCREDITED BY NATIONAL BOARD OF ACCREDITATION (NBA)
Bangalore Trunk Road, Varadharajapuram, Nasarathpettai,
Poonamallee, Chennai – 600 123
Bonafide Certificate
Staff in charge
Program:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import linalg
A = np.array([[1, 2], [3, 4]])
linalg.inv(A)
print("A: ", A)
b = np.array([[5, 6]])
A.dot(b.T)
print("multiplication: ", A)
print("norm: ", linalg.norm(A))
data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
                dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df = pd.DataFrame(data, columns=['c', 'a'])
print("shape: ", df.shape)
print("size: ", df.size)
print("dim: ", df.ndim)
print("mean: ", np.mean(df))
print("std: ", np.std(df))
Output:
A: [[1 2]
[3 4]]
multiplication: [[1 2]
[3 4]]
norm: 5.477225575051661
shape: (3, 2)
size: 6
dim: 2
mean: 5.0
std:
c 2.44949
a 2.44949
Result:
Thus the program to explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages is executed.
EX. NO.: 2
DATE:
Create an empty and a full NumPy array
Program:
import numpy as np
a = np.empty((3, 3))
print(a)
a = np.zeros((3, 3))
print(a)
a = np.ones((3, 3))
print(a)
a = np.full((3, 3), 5)
print(a)
Output:
[[1.06099790e-312 1.10343781e-312 1.20953760e-312]
[1.20953760e-312 6.79038653e-313 9.76118064e-313]
[1.10343781e-312 1.10343781e-312 1.01855798e-312]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[[5 5 5]
[5 5 5]
[5 5 5]]
Result:
Thus the program to create and initialize NumPy arrays using the empty(), zeros(),
ones() and full() functions is executed.
EX. NO.: 3
DATE:
Program to remove rows in a NumPy array that contain
non-numeric values
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
a = np.array([['1', '2', '3'],
              ['6', 'x', '5'],
              ['7', '8', '9']])
mask = np.char.isnumeric(a).all(axis=1)
c = a[mask]
print(c)
Output:
[['1' '2' '3']
['7' '8' '9']]
Result:
Thus we have removed the non-numeric values from the array.
EX. NO.: 4
DATE:
Reading data from text file, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris
data set
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
i = pd.read_csv('iris_dataset.csv')
print(i.head())
with open('hello.txt', 'r') as t:
    print(t.read())
d = pd.read_excel('data.xlsx')
print(d.head())
Output:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
target
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
hello world!
Program:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
i = sns.load_dataset('iris')
print(i.head())
print(i.describe())
print(i.info())
print(i.duplicated())
print(i['species'].value_counts())
print(i.shape)
sns.scatterplot(data=i, x='sepal_width', y='sepal_length', hue='species')
plt.show()
sns.pairplot(i, hue='species')
plt.show()
sns.countplot(data=i, x='species')
plt.show()
Output:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
0 False
1 False
2 False
3 False
4 False
145 False
146 False
147 False
148 False
149 False
Length: 150, dtype: bool
species
setosa 50
versicolor 50
virginica 50
Name: count, dtype: int64
(150, 5)
Result:
Thus the program to read data from a text file, Excel and the web, and to explore
various commands for doing descriptive analytics on the Iris data set, is
executed.
EX. NO.: 5
DATE:
Use the diabetes data set from UCI and Pima Indians Diabetes
data set for performing the following:
a. Univariate analysis
b. Bivariate analysis
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the
two data sets.
a. Univariate Analysis:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
uci = pd.read_csv('uci/diabetes.csv')
pima = pd.read_csv('pima/diabetes.csv')
print(uci.head())
print(pima.head())
print(uci.describe())
print(pima.describe())
def uni(ds):
    print("mean:\n", ds.mean())
    print()
    print("standard deviation\n", ds.std())
    print("median:\n", ds.median())
    print("mode:\n", ds.mode())
    print("variance\n", ds.var())
    print("skewness:\n", ds.skew())
    print("kurtosis:\n", ds.kurtosis())
uni(pima)
Output:
Unnamed: 0 patient_id date time code value
0 0 1 04-21-1991 9:09 58 100
1 1 1 04-21-1991 9:09 33 9
2 2 1 04-21-1991 9:09 34 13
3 3 1 04-21-1991 17:08 62 119
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
standard deviation
Pregnancies 3.369578
Glucose 31.972618
BloodPressure 19.355807
SkinThickness 15.952218
Insulin 115.244002
BMI 7.884160
DiabetesPedigreeFunction 0.331329
Age 11.760232
Outcome 0.476951
dtype: float64
median:
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64
mode:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 1.0 99 70.0 0.0 0.0 32.0
1 NaN 100 NaN NaN NaN NaN
Result:
Thus the program for performing the Univariate analysis is executed.
b. Bivariate Analysis:
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
pima = pd.read_csv("pima/diabetes.csv")
X = np.array(pima['Glucose']).reshape(-1, 1)
y = np.array(pima['BloodPressure']).reshape(-1, 1)
sns.scatterplot(pima, x='Glucose', y='BloodPressure')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lr()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", 100 - mse(y_test, preds))
Output:
MSE: -274.49327891888
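A note on the metric: "100 - MSE" is not a standard accuracy measure, which is why the value above can go negative. sklearn's r2_score is the usual goodness-of-fit statistic for regression; a sketch on synthetic stand-in data (the real Glucose/BloodPressure columns are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=100).reshape(-1, 1)   # stand-in predictor
y = 0.2 * X.ravel() + rng.normal(0, 5, size=100)    # linear signal + noise
model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print(r2)   # 1.0 is a perfect fit; 0.0 is no better than predicting the mean
```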
Result:
Thus the program for performing the Bivariate analysis is executed.
c. Multiple Regression Analysis:
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
pima = pd.read_csv("pima/diabetes.csv")
X = np.array(pima['Glucose']).reshape(-1, 1)
y = np.array(pima['Outcome']).reshape(-1, 1)
sns.scatterplot(pima, x='BloodPressure', y='Age')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lr()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", 100 - mse(y_test, preds))
Output:
MSE: 99.81831030079522
Result:
Thus the program for performing the Multivariate analysis is executed.
EX. NO.: 6
DATE:
Write a Pandas program to create and display a DataFrame from
a specified
dictionary data which has the index labels.
Program:
import pandas as pd
data = {
"name" : ['john', 'jack', 'jim', 'josh', 'joel'],
"age" : [30, 40, 25, 46, 32],
"city" : ['NY', 'CA', 'WS', 'TX', 'KS']
}
i = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, index=i)
print(df)
Output:
name age city
1 john 30 NY
2 jack 40 CA
3 jim 25 WS
4 josh 46 TX
5 joel 32 KS
Result:
The Pandas program successfully created and displayed a DataFrame from the
specified dictionary data with custom index labels.
EX. NO.: 7
DATE:
Write a Pandas program to get the first 3 rows of a given DataFrame.
Program:
import pandas as pd
data = {
"name" : ['john', 'jack', 'jim', 'josh', 'joel'],
"age" : [30, 40, 25, 46, 32],
"city" : ['NY', 'CA', 'WS', 'TX', 'KS']
}
i = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, index=i)
f3 = df.head(3)
print(f3)
Output:
name age city
1 john 30 NY
2 jack 40 CA
3 jim 25 WS
Result:
The program successfully displayed the first 3 rows of the given DataFrame.
EX. NO.: 8
DATE:
Program to find the variance and standard deviation of set of
elements
Program:
import math
dt = [10, 12, 23, 23, 16, 23, 21, 16]
m = sum(dt)/len(dt)
print("mean:", m)
var = 0.0
for i in dt:
    var += (i - m)**2
var = var/len(dt)
print("variance:", var)
print("STD:", math.sqrt(var))
Output:
mean: 18.0
variance: 24.0
STD: 4.898979485566356
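The hand-rolled loop can be cross-checked against the standard library; statistics.pvariance and statistics.pstdev use the same population (divide-by-n) convention as the code above:

```python
import statistics

dt = [10, 12, 23, 23, 16, 23, 21, 16]
print("mean:", statistics.mean(dt))          # 18
print("variance:", statistics.pvariance(dt)) # 24
print("STD:", statistics.pstdev(dt))         # 4.898979485566356
```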
Result:
The program successfully calculates the variance and standard deviation of the
given dataset using basic mathematical operations in Python.
EX. NO.: 9
DATE:
Write a Python program to draw line charts of the financial data
of Alphabet Inc. between October 3, 2016 to October 7, 2016.
Program:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("alphabet.csv")
data = pd.DataFrame(data)
plt.plot(data['Date'][0:5], data['Close'][0:5])
plt.show()
Output:
Result:
The program successfully visualized the financial data of Alphabet Inc. between
October 3, 2016, and October 7, 2016, using a line chart.
EX. NO.: 10
DATE:
Program to plot a Correlation and scatter plots
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
d = pd.read_csv("heart.csv")
dt = pd.DataFrame(d[['chol', 'age', 'cp', 'thalach']])
sns.heatmap(dt.corr(), annot=True)
sns.scatterplot(x=d['chol'], y=d['thalach'])
Output:
Result:
Correlation Matrix: Provided insights into feature relationships, identifying
strong positive or negative correlations.
Scatter Plot: Visualized dependencies and patterns, such as linear trends or
clustering of data points by category.
This method is effective for exploratory data analysis, feature selection, and
understanding the structure of the dataset.
EX. NO.: 11
DATE:
Program for Linear Regression and Logistic Regression
Linear Regression
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
data = pd.read_csv("heart.csv")
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=80)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lrr = LinearRegression()
lrr.fit(X_train, y_train)
lrr_pred = lrr.predict(X_test)
lrr_acc = lrr.score(X_test, y_test)
print("accuracy:", lrr_acc*100)
mse = mean_squared_error(y_test, lrr_pred)
print(f"MSE: {mse}")
intercept = lrr.intercept_
coefficients = lrr.coef_
print(f"Intercept: {intercept}")
print(f"Coefficients: {coefficients}")
Output:
accuracy: 54.554261744998044
Result:
Thus the program for performing linear regression is executed.
Logistic Regression
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, \
    classification_report, ConfusionMatrixDisplay
data = pd.read_csv("heart.csv")
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=80)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_conf = confusion_matrix(y_test, lr_pred)
lr_acc = lr.score(X_test, y_test)
print("confusion matrix:\n", lr_conf)
ConfusionMatrixDisplay.from_estimator(lr, X_test, y_test)
print("accuracy:", lr_acc*100)
print(classification_report(y_test, lr_pred))
Output:
confusion matrix:
[[81 16]
[ 9 99]]
accuracy: 87.8048780487805
precision recall f1-score support
0 0.90 0.84 0.87 97
1 0.86 0.92 0.89 108
accuracy 0.88 205
macro avg 0.88 0.88 0.88 205
weighted avg 0.88 0.88 0.88 205
Result:
Thus the program for performing logistic regression is executed.
EX. NO.: 12
DATE:
Apply and explore various plotting functions on UCI data sets.
a. Normal curves:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
data = pd.read_csv("heart.csv")
sns.histplot(data["chol"], bins=30, kde=True, stat="count", color="skyblue",
             label="Histogram + KDE")
x = np.linspace(min(data["chol"]), max(data["chol"]), 100)
plt.plot(x, norm.pdf(x, np.mean(data["chol"]), np.std(data["chol"])),
         color="red", lw=2, label="Normal Distribution")
plt.legend()
plt.show()
Output:
Result:
This exploration highlights the applicability of normal curve plotting to
understand data distributions and identify key statistical properties.
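One caveat for the listing above: the histogram is drawn with stat="count" while norm.pdf is a probability density, so the two curves sit on very different vertical scales. Passing stat="density" to sns.histplot puts them on the same axis. A sketch on synthetic, cholesterol-like values (the real column is not reproduced here):

```python
import numpy as np
from scipy.stats import norm

data = np.random.default_rng(1).normal(246, 52, size=300)  # synthetic values
x = np.linspace(data.min(), data.max(), 100)
pdf = norm.pdf(x, np.mean(data), np.std(data))
# A density peaks near 1/(std*sqrt(2*pi)) -- far below any raw bin count,
# which is why the overlay looks flat when drawn over stat="count".
print(pdf.max())
# Matching histogram call: sns.histplot(data, bins=30, kde=True, stat="density")
```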
b. Density and Contour Plot
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
data = pd.read_csv("heart.csv")
sns.kdeplot(x=data["chol"], y=data["thalach"], fill=True, levels=20)
sns.kdeplot(data["chol"], fill=True)
Output:
Result:
The visualizations effectively demonstrated relationships between features and
the distribution of data in the UCI dataset.
C: Correlation and Scatter plots:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
plt.figure(figsize=(12, 12))
sns.heatmap(data.corr(), annot=True, cmap="coolwarm", fmt=".2f")
sns.scatterplot(x=data["chol"], y=data["thalach"])
Output:
Result:
Correlation Heatmap: Identified numerical features with strong or weak
relationships in the dataset.
Scatter Plot: Provided a visual representation of the dependency between two
features, confirming trends indicated by the correlation matrix.
This approach helps understand feature interactions and guides further
exploratory or predictive analysis.
d. Histograms:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
data.hist(figsize=(15, 15))
Output:
Result:
Insights Gained: The histograms provided a clear understanding of the
frequency distribution of numerical attributes.
Histograms are an effective tool for identifying data spread, outliers, and
patterns.
e. Three Dimensional Scatter Plotting:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("heart.csv")
scaler = StandardScaler()
X = scaler.fit_transform(data)
fig = px.scatter_3d(data, x="age", y="chol", z="thalach", color='slope')
fig.show()
Output:
Result:
The 3D plot effectively visualized interactions between three numerical features
in the dataset.
3D visualizations are useful for uncovering complex patterns, clusters, or
outliers that are not apparent in 2D plots.
EX. NO.: 13
DATE:
Perform Mini Project on Fake News Detection.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk as nlp
import re
import string
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
d_fake = pd.read_csv("Fake.csv")
d_true = pd.read_csv("True.csv")
d_true["text"] = d_true["text"].replace("(Reuters)", "", regex=True)
d_fake['target'] = 0
d_true['target'] = 1
d_fake = d_fake.drop(["title", "subject", "date"], axis=1)
d_true = d_true.drop(["title", "subject", "date"], axis=1)
df = pd.concat([d_fake, d_true], axis=0)
df = df.sample(frac=1)
df.reset_index(inplace=True)
df.drop(["index"], axis=1, inplace=True)
def wordopt(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'[()]', '', text)
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>', '', text)
    return text
def output(n):
    if (n == 0):
        return "Fake News"
    elif (n == 1):
        return "Not Fake News"
def testing(news):
    # vectorization, LR, DT, GBC and RFC come from the training step (not shown)
    testing_news = {"text": [news]}
    new_test = pd.DataFrame(testing_news)
    new_test = new_test["text"].apply(wordopt)
    new_xv_test = vectorization.transform(new_test)
    pred_lr = LR.predict(new_xv_test)
    pred_dtc = DT.predict(new_xv_test)
    pred_gbc = GBC.predict(new_xv_test)
    pred_rfc = RFC.predict(new_xv_test)
    return [pred_lr, pred_dtc, pred_gbc, pred_rfc]
def printout(pl):
    p = []
    p.append("Logistic Regression: " + output(pl[0]))
    p.append("Decision Tree Classifier: " + output(pl[1]))
    p.append("Gradient Boosting Classifier: " + output(pl[2]))
    p.append("Random Forest Classifier: " + output(pl[3]))
    return p
news = str(input())
printout(testing(news))
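The testing() function refers to vectorization, LR, DT, GBC and RFC, which come from a training step that did not survive on this page. A minimal sketch of what that step typically looks like for this exercise; the TF-IDF vectorizer, the model choices and the tiny corpus here are assumptions for illustration, not the original code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative corpus; in the lab this would be df["text"] / df["target"].
texts = ["miracle cure shocks doctors", "reuters reports quarterly earnings",
         "aliens endorse candidate", "government publishes budget figures"]
labels = [0, 1, 0, 1]              # 0 = fake, 1 = true, as in the listing

vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(texts)
LR = LogisticRegression().fit(xv_train, labels)
DT = DecisionTreeClassifier(random_state=0).fit(xv_train, labels)
# GBC and RFC are fitted the same way, with GradientBoostingClassifier
# and RandomForestClassifier.
pred = LR.predict(vectorization.transform(["reuters reports new figures"]))
print(pred)
```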
Output:
"The government announces a new stimulus package that will boost the
economy by 100%."
Result:
Thus the mini project to classify news articles as either fake or real based on
textual content is developed.
EX. NO.: 14
DATE:
Build an application to detect colors in the given picture using
Basic Data Science
Program:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
import cv2
from google.colab import files
from PIL import Image
from io import BytesIO
import collections
uploaded = files.upload()
image = Image.open(BytesIO(uploaded[list(uploaded.keys())[0]]))
image_np = np.array(image)
image_colors = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
pixels = image_colors.reshape((-1, 3))
num_clusters = 10
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
kmeans.fit(pixels)
labels = kmeans.labels_
colors = kmeans.cluster_centers_
color_counts = collections.Counter(labels)
color_names = [
    "Red", "Green", "Blue", "Yellow", "Purple", "Orange", "Pink", "Brown",
    "Black", "White"
]
for i in range(num_clusters):
    color_name = color_names[i % len(color_names)]
    count = color_counts[i]
    rgb_color = tuple(map(int, colors[i]))
    print(f"Color: {color_name}, RGB: {rgb_color}, Count: {count}")
Output:
Color: Red, RGB: (9, 6, 134), Count: 580263
Color: Green, RGB: (14, 60, 200), Count: 140996
Color: Blue, RGB: (7, 37, 42), Count: 599477
Color: Yellow, RGB: (5, 141, 161), Count: 457859
Color: Purple, RGB: (96, 135, 166), Count: 201281
Color: Orange, RGB: (13, 102, 124), Count: 551993
Color: Pink, RGB: (75, 63, 40), Count: 140930
Color: Brown, RGB: (35, 169, 207), Count: 223069
Color: Black, RGB: (134, 190, 218), Count: 175200
Color: White, RGB: (10, 60, 77), Count: 615332
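Note that the name list in the listing is assigned by cluster index, not by the cluster's actual colour, which is how a centre with RGB (9, 6, 134) ends up labelled "Red" in the output above. A sketch of naming by nearest reference colour instead; the small palette here is an assumption for illustration:

```python
import numpy as np

# Minimal reference palette (assumed values; extend as needed).
palette = {
    "Red": (255, 0, 0), "Green": (0, 128, 0), "Blue": (0, 0, 255),
    "Yellow": (255, 255, 0), "Black": (0, 0, 0), "White": (255, 255, 255),
}

def nearest_name(rgb):
    """Return the palette name with the smallest Euclidean RGB distance."""
    names = list(palette)
    dists = [np.linalg.norm(np.subtract(rgb, palette[n])) for n in names]
    return names[int(np.argmin(dists))]

print(nearest_name((9, 6, 134)))   # -> Blue
```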
Result:
Thus the application to detect colors in the given picture using Basic Data
Science is built and the output is verified.
DATE:
EX. NO.: 1
ADDITIONAL PROGRAMS
PROGRAM TO GENERATE RANDOM NUMBERS IN AN ARRAY
PROGRAM:
import numpy as np
random_integers = np.random.randint(0, 10, 5)
print("Random Integer Array:", random_integers)
random_floats = np.random.rand(5)
print("Random Float Array:", random_floats)
random_normal = np.random.randn(5)
print("Random Normal Distribution Array:", random_normal)
random_2d_array = np.random.randint(0, 10, (3, 3))
print("Random 2D Integer Array:\n", random_2d_array)
random_shape_array = np.random.random((4, 4))
print("Random Array with Shape (4x4):\n", random_shape_array)
OUTPUT:
Random Integer Array: [0 6 9 0 6]
Random Float Array: [0.910141 0.37395005 0.04729324 0.99031385
0.93985559]
Random Normal Distribution Array: [-0.58544812 -0.31042167 -0.31726232 -
2.18994984 0.42318254]
Random 2D Integer Array:
[[8 8 5]
[7 6 8]
[5 8 3]]
Random Array with Shape (4x4):
[[0.72800535 0.42900513 0.59822429 0.38490628]
[0.98336226 0.58878188 0.1133993 0.12893145]
[0.19279917 0.84499983 0.0170265 0.23372838]
[0.47922397 0.50141304 0.03462652 0.97882316]]
RESULT:
Thus the program was successfully executed and verified
DATE:
EX. NO.: 2
PROGRAM THAT COMPARES TWO ARRAYS ELEMENT-WISE AND
RETURNS A BOOLEAN ARRAY
PROGRAM:
import numpy as np
def compare_arrays(arr1, arr2):
    if arr1.shape != arr2.shape:
        raise ValueError("Arrays must have the same shape to compare")
    return arr1 == arr2
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 2, 0, 4, 5])
result = compare_arrays(arr1, arr2)
print("Boolean Comparison Result:", result)
OUTPUT:
Boolean Comparison Result: [ True True False True True]
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 3
CASE STUDY:
You are a data scientist working for a company that manages a chain of fitness
centres across various regions. The company collects data on the monthly
number of members attending classes in different fitness centres. You need to
analyse this data to understand key metrics like:
1. The average number of attendees per fitness centre.
2. The variation in attendance across different centres (standard deviation).
3. The centres with the minimum and maximum attendance over the month.
4. The centre with the highest average attendance.
The data is stored in a 2D array where each row represents a centre and each
column represents the number of members attending the class on a specific
day (30 days).
PROGRAM:
import numpy as np
attendance_data = np.array([
[50, 45, 52, 49, 47, 48, 50, 51, 49, 52, 46, 44, 50, 49, 53, 47, 50, 48, 52, 51,
46, 47, 50, 53, 49, 48, 50, 51, 52, 53],
[60, 58, 61, 59, 62, 60, 61, 63, 59, 64, 58, 57, 62, 60, 65, 59, 60, 58, 63, 62,
57, 58, 61, 65, 60, 59, 62, 63, 64, 65],
[40, 38, 42, 39, 41, 40, 41, 43, 39, 44, 38, 37, 41, 40, 45, 39, 40, 38, 43, 42,
37, 38, 41, 45, 40, 39, 42, 43, 44, 45]])
average_attendance = np.mean(attendance_data, axis=1)
std_deviation = np.std(attendance_data, axis=1)
total_attendance = np.sum(attendance_data, axis=1)
min_attendance_center = np.argmin(total_attendance)
max_attendance_center = np.argmax(total_attendance)
highest_avg_center = np.argmax(average_attendance)
print("Average Attendance per Center:", average_attendance)
print("Standard Deviation of Attendance per Center:", std_deviation)
print(f"Center with Minimum Attendance: Center {min_attendance_center + 1}")
print(f"Center with Maximum Attendance: Center {max_attendance_center + 1}")
print(f"Center with Highest Average Attendance: Center {highest_avg_center + 1}")
OUTPUT:
Average Attendance per Center: [49.4 60.83333333 40.8]
Standard Deviation of Attendance per Center: [2.38886305 2.38164276
2.37205958]
Center with Minimum Attendance: Center 3
Center with Maximum Attendance: Center 2
Center with Highest Average Attendance: Center 2
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 4
CASE STUDY:
You are an analyst working for an e-commerce platform that collects daily
transaction data for the products sold on its websites. The company is looking to
implement certain data processing tasks to gain better insight into sales
trends and optimize marketing strategies. They require you to apply some
fundamental mathematical transformations on the sales data for further analysis.
The dataset consists of daily sales figures for multiple products over a 30-day
period. Each row corresponds to a specific product and each column contains
the sales figures for that product on the given day. Given the nature of the
analysis, you are expected to perform the following operations on the data using
NumPy's unary universal functions.
1. Your task is to adjust the sales data by scaling it accordingly, reflecting
a potential price change.
2. Compute a logarithmic transformation of the sales data to achieve this goal.
3. You are tasked with adding tax to each sale in the dataset, adjusting
the sales figures accordingly.
4. You need to apply a mathematical transformation (sqrt) to the sales data to
model the phenomenon where larger sales figures are compressed more
than smaller ones.
5. Convert all sales data to negative values, which could be useful for calculating
refunds or losses.
6. You are tasked with applying rounding operations across the entire data set.
PROGRAM:
import numpy as np
# Sample sales data (5 products, sales over 30 days)
np.random.seed(0)
sales_data = np.random.randint(10, 500, (5, 30))
# 1. Scaling the Sales Data (e.g., increasing by 10%)
scaling_factor = 1.10
scaled_sales = sales_data * scaling_factor
# 2. Logarithmic Transformation
log_sales = np.log(sales_data + 1)  # Adding 1 to avoid log(0)
# 3. Adding Tax (e.g., 8% tax)
tax_rate = 1.08
taxed_sales = sales_data * tax_rate
# 4. Square-root transformation (compresses larger figures more)
sqrt_sales = np.sqrt(sales_data)
# 5. Negate all sales (useful for refund/loss calculations)
negative_sales = np.negative(sales_data)
# 6. Round the scaled figures across the data set
rounded_sales = np.round(scaled_sales)
print("Scaled Sales Data:\n", scaled_sales)
print("Log Transformed Sales Data:\n", log_sales)
print("Original Sales Data:\n", sales_data)
Scaled Sales Data:
[[302.5 455.4 137.5 521.4 278.3 227.7 379.5 485.1 503.8 382.8 119.9 530.2
205.7 278.3 324.5 172.7 172.7 448.8 476.3 327.8 504.9 302.5 214.5 150.7
46.2 45.1 233.2 279.4 177.1 190.3]
[515.9 418. 212.3 41.8 330. 151.8 151.8 473. 69.3 438.9 52.8 547.8
279.4 311.3 379.5 437.8 126.5 57.2 497.2 45.1 424.6 293.7 364.1 546.7
478.5 73.7 331.1 404.8 141.9 304.7]
[484. 101.2 111.1 433.4 448.8 119.9 69.3 446.6 144.1 479.6 103.4 234.3
367.4 299.2 508.2 62.7 150.7 155.1 517. 402.6 209. 547.8 378.4 168.3
173.8 260.7 497.2 317.9 238.7 447.7]
[421.3 386.1 63.8 346.5 86.9 196.9 190.3 503.8 115.5 227.7 114.4 292.6
416.9 206.8 332.2 470.8 345.4 394.9 436.7 118.8 57.2 518.1 415.8 546.7
456.5 232.1 432.3 11. 444.4 418. ]]
Log Transformed Sales Data:
[[5.20948615 4.06044301 4.85203026 5.31320598 5.81114099 5.5683445
5.32787617 5.91350301 2.99573227 5.40267738 5.66296048 5.53338949
5.71373281 4.58496748 4.39444915 6.18001665 4.59511985 6.00881319
5.78382518 5.31811999 6.20859003 3.91202301 4.58496748 5.22035583
4.59511985 5.85220248 5.170484 3.58351894 5.84064166 4.41884061]
[5.62040087 6.02827852 4.83628191 6.1633148 5.53733427 5.33753808
5.84643878 6.09130988 6.12905021 5.85507192 4.70048037 6.18001665
5.23644196 5.53733427 5.69035945 5.06259503 5.06259503 6.01371516
6.07304453 5.70044357 6.13122649 5.62040087 5.27811466 4.92725369
3.76120012 3.73766962 5.36129217 5.54126355 5.08759634 5.1590553 ]
[6.15273269 5.94279938 5.26785816 3.66356165 5.70711026 4.93447393
4.93447393 6.06610809 4.15888308 5.99146455 3.8918203 6.2126061
5.54126355 5.64897424 5.84643878 5.98896142 4.75359019 3.97029191
6.11589213 3.73766962 5.95842469 5.59098698 5.80513497 6.21060008
6.07764224 4.21950771 5.71042702 5.91079664 4.86753445 5.62762111]
[6.08904488 4.53259949 4.62497281 5.97888576 6.01371516 4.70048037
433 298 459 275 195 137 42 41 212 254 161 173]
[469 380 193 38 300 138 138 430 63 399 48 498 254 283 345 398 115 52
452 41 386 267 331 497 435 67 301 368 129 277]
[440 92 101 394 408 109 63 406 131 436 94 213 334 272 462 57 137 141
470 366 190 498 344 153 158 237 452 289 217 407]
[383 351 58 315 79 179 173 458 105 207 104 266 379 188 302 428 314 359
397 108 52 471 378 497 415 211 393 10 404 380]]
RESULT:
Thus the program was successfully executed and verified.
DATE:
EX. NO.: 5
PROGRAM:
import numpy as np
arr = np.array([1, 2, 3, 2, 3, 4, 5, 5, 6, 3, 2, 1, 4])
unique_elements, counts = np.unique(arr, return_counts=True)
print("Unique elements:", unique_elements)
print("Counts of unique elements:", counts)
for i in range(len(unique_elements)):
    print(f"Element {unique_elements[i]} occurs {counts[i]} times")
OUTPUT:
Unique elements: [1 2 3 4 5 6]
Counts of unique elements: [2 3 3 2 2 1]
Element 1 occurs 2 times
Element 2 occurs 3 times
Element 3 occurs 3 times
Element 4 occurs 2 times
Element 5 occurs 2 times
Element 6 occurs 1 times
RESULT:
Thus the program to find the unique elements in an array and count their
occurrences was written and executed successfully.
DATE:
EX. NO.: 6
PROGRAM TO IMPLEMENT MATRIX MULTIPLICATION FOR
TWO DIMENSIONAL ARRAYS USING NUMPY
PROGRAM:
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])
result = np.dot(A, B)
print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)
print("\nResultant Matrix (A x B):")
print(result)
OUTPUT:
Matrix A:
[[1 2 3]
 [4 5 6]]

Matrix B:
[[ 7  8]
 [ 9 10]
 [11 12]]

Resultant Matrix (A x B):
[[ 58  64]
 [139 154]]
RESULT:
Thus the above program to implement matrix multiplication for two-dimensional
arrays using NumPy was written and executed successfully.
DATE:
EX. NO.: 7
PROGRAM:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
if (arr > 0).all():
    print("The array contains all positive numbers.")
else:
    print("The array does not contain all positive numbers.")
OUTPUT:
The array contains all positive numbers.
RESULT:
Thus the program to check if an array contains all positive numbers using a
comparison operator was written and executed successfully.
DATE:
EX NO: 8
PROGRAM TO CREATE A RANDOM 3×3 MATRIX AND FIND THE
MAXIMUM VALUE IN THE MATRIX
PROGRAM:
import numpy as np
matrix = np.random.randint(1, 101, (3, 3))
max_value = np.max(matrix)
print("Random 3x3 Matrix:")
print(matrix)
print("\nMaximum Value in the Matrix:", max_value)
OUTPUT:
Random 3x3 Matrix:
[[23 89 45]
 [67 12 98]
 [56 34 77]]

Maximum Value in the Matrix: 98
RESULT:
Thus the program to create a random 3×3 matrix and find the maximum
value in the matrix using NumPy was written and executed successfully.
DATE:
EX NO: 9
GIVEN A 3X3 MATRIX, SORT THE ARRAY IN ASCENDING ORDER
ALONG EACH ROW, SORT IT IN DESCENDING ORDER ALONG
EACH COLUMN. FIND THE INDEX POSITION OF EACH ELEMENT
AFTER SORTING.
PROGRAM:
import numpy as np
matrix = np.array([
    [4, 2, 9],
    [1, 8, 5],
    [7, 6, 3]])
original_matrix = matrix.copy()
matrix.sort(axis=1)
matrix = np.sort(matrix, axis=0)[::-1]
index_positions = {}
for i in range(3):
    for j in range(3):
        element = matrix[i, j]
        original_index = np.where(original_matrix == element)
        index_positions[element] = (original_index[0][0], original_index[1][0])
print("Sorted Matrix (Rows Ascending, Columns Descending):")
print(matrix)
print("\nIndex Positions (Original -> New):")
for element, pos in index_positions.items():
    print(f"Element {element}: Original Index {pos}")
OUTPUT:
Sorted Matrix (Rows Ascending, Columns Descending):
[[3 6 9]
[2 5 8]
[1 4 7]]
RESULT:
Thus the program to sort a 3x3 matrix in ascending order along each row and
then in descending order along each column was written and executed successfully.
DATE:
EX. NO.: 10
PROGRAM:
import numpy as np
data = np.array([10, 20, 30, 60, 40, 50, 70, 80])
mask = data <= 50
filtered_values = data[mask]
sum_filtered_values = np.sum(filtered_values)
print("Filtered values:", filtered_values)
print("Sum of remaining values:", sum_filtered_values)
OUTPUT:
Filtered values: [10 20 30 40 50]
Sum of remaining values: 150
RESULT:
Thus the program to filter out values greater than 50 and compute the sum of the
remaining values was written and executed successfully.
DATE:
EX. NO.: 11
CASE STUDY:
You are a data analyst working for a school district. The school wants to track
the grades of students in different subjects across multiple terms. You are tasked
with performing basic operations using NumPy. You have the grades of five
students in three subjects for two terms.
TASK TO PERFORM:
1. Increase the grades of all students by 5 points in each subject.
2. Calculate the average grade for each student across all subjects.
3. Find the highest grade in each subject.
4. Extract the grades of student 3.
5. Reshape the grade array so that subjects become rows and students
become columns.
6. Add 10 points to each student's Maths grade and 5 points to Science, with
no change in English.
PROGRAM:
import numpy as np
grades = np.array([
    [75, 80, 85],
    [88, 76, 90],
    [92, 85, 87],
    [78, 88, 82],
    [85, 89, 84]])
# 1. Increase all grades by 5 points
grades += 5
print("Grades after adding 5 points:")
print(grades)
# 2. Calculate the average grade for each student across all subjects
average_grades = np.mean(grades, axis=1)
print("Average grade per student:", average_grades)
# 3. Find the highest grade in each subject
print("Highest grade in each subject:", np.max(grades, axis=0))
# 4. Extract the grades of student 3
print("Grades of Student 3:")
print(grades[2])
# 5. Reshape so that subjects become rows and students become columns
print(grades.T)
# 6. Add 10 points to Maths, 5 to Science, none to English
grades = grades + np.array([10, 5, 0])
print("Grades after subject-wise bonus:")
print(grades)
Grades of Student 3:
[97 90 92]
RESULT:
Thus the Student performance analysis is executed successfully.