Practical No. 1
Aim: Perform basic data pre-processing tasks such as handling missing values and outliers.
Code:-
For missing values:-
import pandas as pd
data = pd.read_csv("/content/[Link]")  # path to the practical's CSV dataset
df = pd.DataFrame(data)
# Checking for missing values using isnull()
missing_values = df.isnull()
print(missing_values)
For missing values in a column as a boolean mask:-
bool_series = pd.isnull(data["Gender"])
missing_gender_data = data[bool_series]
print(missing_gender_data)
For counting missing values per column:-
missing_values = data.isnull().sum()
print(missing_values)
For counting non-missing values per column:-
non_missing_values = data.notnull().sum()
print(non_missing_values)
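For handling missing values and outliers (a minimal illustrative sketch; the Age column, the small example DataFrame, the median fill, and the IQR rule are assumptions added here, not part of the original code):-
import pandas as pd
# Illustrative DataFrame; in the practical, `data` comes from read_csv above
data = pd.DataFrame({"Age": [22, 25, None, 28, 30, 120, 27]})
# Handle missing values: fill numeric gaps with the column median
data["Age"] = data["Age"].fillna(data["Age"].median())
# Flag outliers with the IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = data["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data["Age"] < lower) | (data["Age"] > upper)])
# One common way to handle the outliers: clip them to the IQR bounds
data["Age"] = data["Age"].clip(lower, upper)
print(data)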
Practical No. 2
Aim: Apply feature scaling techniques like standardization and normalization to numerical features.
Code:-
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import pandas as pd
# Data
df = pd.DataFrame({'Age': [25, 45, 35, 50, 23],
                   'Income': [50000, 120000, 80000, 110000, 75000]})
# Standardization: z = (x - mean) / std, giving zero mean and unit variance
df_standardized = df.copy()
df_standardized[['Age', 'Income']] = StandardScaler().fit_transform(df[['Age', 'Income']])
# Normalization: (x - min) / (max - min), rescaling each column to [0, 1]
df_normalized = df.copy()
df_normalized[['Age', 'Income']] = MinMaxScaler().fit_transform(df[['Age', 'Income']])
print("Standardized:\n", df_standardized, "\n")
print("Normalized:\n", df_normalized)
Output:-
Practical No. 3
Aim: To perform a practical of Hypothesis Testing (one-sample and two-sample t-tests).
Code:-
x = c(6.2, 6.6, 7.1, 7.4, 7.6, 7.9, 8, 8.3, 8.4, 8.5, 8.6, 8.8, 8.8, 9.1, 9.2, 9.4, 9.4, 9.7, 9.9, 10.2, 10.4, 10.8, 11.3, 11.9)
# One-sample t-test of H0: mean = 9 (equivalently, whether x - 9 has mean zero)
t.test(x - 9, alternative = "two.sided", conf.level = 0.95)
x = c(418, 421, 421, 422, 425, 427, 431, 434, 437, 439, 446, 447, 448, 453, 454, 463, 465)
y = c(429, 430, 430, 431, 436, 437, 440, 441, 445, 446, 447)
# Two-sample Welch t-test of H0: equal means (variances not assumed equal)
test2 <- t.test(x, y, alternative = "two.sided", mu = 0, var.equal = FALSE, conf.level = 0.95)
test2
Output:-
Practical No. 4
Aim: To perform a practical of Analysis of Variance (ANOVA).
Code:-
y1 <- c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)
y2 <- c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)
y3 <- c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)
y <- c(y1, y2, y3)
group <- factor(rep(1:3, each = length(y1)))
tapply(y, group, stem)
tmpfn <- function(x) {
  list(sum = sum(x), mean = mean(x), var = var(x), n = length(x))
}
tapply(y, group, tmpfn)
data <- data.frame(y = y, group = group)
fit <- lm(y ~ group, data)
anova_fit <- anova(fit)
df <- anova_fit[,"Df"]
names(df) <- c("trt", "err")
df
alpha <- c(0.05, 0.01)
qf(alpha, df["trt"], df["err"], lower.tail = FALSE)  # critical F values at the 5% and 1% levels
anova_fit["Residuals", "Sum Sq"]
anova_fit["Residuals", "Sum Sq"] / qchisq(c(0.025, 0.975), df["err"], [Link] = FALSE)
Output:-
Practical No. 5
Aim: Practical of Simple/Multiple Linear Regression.
Code:-
height <- c(102,117,105,141,135,115,138,114,137,100,131,119,115,121,113)
weight <- c(61,46,62,54,60,69,51,50,46,64,48,56,64,48,59)
student <- lm(weight ~ height)
student
predict(student, data.frame(height = 199), interval = "confidence")  # predicted weight at height = 199 with a confidence interval
plot(student)
Output:-
Practical No. 6
Aim: To perform a practical of Logistic Regression.
Code:-
# Load dataset
library(datasets)
ir_data <- iris
head(ir_data)
str(ir_data)
levels(ir_data$Species)
# Check for missing values
sum(is.na(ir_data))
# Subset the data for two species and 100 observations
ir_data <- ir_data[1:100, ]
# Split data into training and testing sets
set.seed(100)
samp <- sample(1:100, 80)
ir_test <- ir_data[samp, ]
ir_ctrl <- ir_data[-samp, ]
# Install and load libraries for visualization
if (!require("ggplot2")) [Link]("ggplot2", dependencies = TRUE)
library(ggplot2)
if (!require("GGally")) [Link]("GGally", dependencies = TRUE)
library(GGally)
# Pair plot for test data
ggpairs(ir_test)
# Logistic regression: predict Species from Sepal.Length
y <- as.numeric(ir_test$Species == "setosa") # Convert Species to binary (1 = setosa, 0 = versicolor)
x <- ir_test$Sepal.Length
glfit <- glm(y ~ x, family = "binomial")
summary(glfit)
# Predict on control data
newdata <- data.frame(x = ir_ctrl$Sepal.Length)
predicted_val <- predict(glfit, newdata, type = "response")
# Combine predictions with control data
prediction <- data.frame(
  sepal.length = ir_ctrl$Sepal.Length,
  actual.species = ir_ctrl$Species,
  predicted.value = predicted_val
)
print(prediction)
# Plot predictions
qplot(
  prediction$sepal.length,
  round(prediction$predicted.value),
  col = prediction$actual.species,
  xlab = "Sepal Length",
  ylab = "Prediction using Logistic Regression"
)
Output:-
Practical No. 7
Aim: K-Means Clustering.
Code:-
data(iris)
names(iris)
new_data<-subset(iris,select = c(-Species))
new_data
cl<-kmeans(new_data,3)
cl
data<-new_data
wss <- sapply(1:15, function(k){kmeans(data, k)$tot.withinss})
wss
plot(1:15, wss, type = "b", pch = 19, frame = FALSE, xlab = "Number of clusters K", ylab = "Total within-cluster sum of squares")
library(cluster)
clusplot(new_data,cl$cluster,color=TRUE,shade=TRUE, labels=2,lines=0)
cl$cluster
cl$centers
"agglomarative clustering"
clusters<-hclust(dist(iris[,3:4]))
plot(clusters)
clusterCut<-cutree(clusters,3)
table(clusterCut,iris$Species)
Output:-
Practical No. 8
Aim: Principal Component Analysis (PCA)
Code:-
data_iris <- iris[1:4]
cov_data <- cov(data_iris)
print(cov_data) # Print covariance matrix to check
Eigen_data <- eigen(cov_data)
print(Eigen_data$values) # Print eigenvalues to check
PCA_data <- princomp(data_iris, cor = FALSE)  # PCA on the covariance matrix
summary(PCA_data) # Print summary to ensure PCA ran correctly
model2 <- PCA_data$loadings[, 1]
print(model2) # Print the first principal component
model2_scores <- as.matrix(data_iris) %*% model2
print(head(model2_scores)) # Print the first few PCA scores
if (!require(e1071)) install.packages("e1071", dependencies = TRUE)
library(e1071)
mod1 <- naiveBayes(iris[, 1:4], iris[, 5])
mod2 <- naiveBayes(model2_scores, iris[, 5])
table(predict(mod1, iris[, 1:4]), iris[, 5])
table(predict(mod2, model2_scores), iris[, 5])
Output:-
Practical No. 9
Aim: Data Visualization and Storytelling.
Code:-
import seaborn as sns
import matplotlib.pyplot as plt
# Data
x = [15, 20, 25, 30, 35, 40]
y = [150, 180, 220, 250, 270, 300]
# Scatter plot
sns.scatterplot(x=x, y=y)
plt.title('Sales vs Advertising Spend')
plt.show()
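To tell a fuller story, the same data can also be shown with labelled axes and a fitted trend line (an optional added illustration using seaborn's regplot; the axis label text is an assumption):-
# Scatter plot with a linear trend line and labelled axes
sns.regplot(x=x, y=y)
plt.title('Sales vs Advertising Spend')
plt.xlabel('Advertising Spend')
plt.ylabel('Sales')
plt.show()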
Output:-