BDA LAB Experiments

Big Data Analytics lab

1. SETTING UP AND INSTALLING HADOOP

HADOOP
Hadoop is an open-source framework designed for processing and storing
large datasets across distributed computing clusters. It enables the handling of big
data by providing a scalable and fault-tolerant solution. The core components of
Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce
programming model. HDFS divides data into blocks and replicates them across
multiple nodes for reliability. MapReduce allows for parallel processing of data by
splitting it into smaller tasks that can be executed in parallel across the cluster.
Hadoop is widely used in various industries for tasks like data analysis, machine
learning, and log processing, providing a cost-effective and efficient solution for big
data processing.
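The MapReduce flow described above (split the data, map in parallel, shuffle by key, reduce) can be sketched with a word count, the canonical Hadoop example. The following is a minimal Python simulation of the two phases, not the lab's own code; on a cluster, the mapper and reducer would run as separate processes connected by the framework:

```python
# wordcount.py: word-count mapper and reducer in the Hadoop Streaming style.
# Streaming runs the mapper and reducer as separate processes that read
# stdin and write "key<TAB>value" lines; here the two phases are plain
# functions so the flow can be followed (and tested) locally.

def mapper(lines):
    # Map phase: emit (word, 1) for every word seen
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Reduce phase: pairs arrive sorted by key, so equal words are adjacent
    current, total = None, 0
    for word, count in pairs:
        if word != current:
            if current is not None:
                yield current, total
            current, total = word, 0
        total += count
    if current is not None:
        yield current, total

# Local simulation of map -> shuffle/sort -> reduce
data = ["hadoop stores big data", "big data needs hadoop"]
shuffled = sorted(mapper(data))  # sorting stands in for the shuffle step
for word, count in reducer(shuffled):
    print(f"{word}\t{count}")
```

On a real cluster the same two phases would be submitted through the streaming jar configured below (HADOOP_STREAMING), roughly `hadoop jar $HADOOP_STREAMING -mapper mapper.py -reducer reducer.py -input /in -output /out`; the script names and HDFS paths here are illustrative.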

Step 1
sudo apt install openjdk-11-jdk

Step 2
#Open the ~/.bashrc file and append the following lines
nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-11-openjdk-amd64/bin
export HADOOP_HOME=~/hadoop-3.3.4/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.4.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
Save: Control+O
Exit: Control+X
Apply the changes: source ~/.bashrc

Department of Artificial Intelligence and Data Science



Step 3:
( ssh — secure shell — protocol used to securely connect to remote server/system —
transfers data in encrypted form)

sudo apt-get install ssh

Step 4:
#now open hadoop-env.sh
sudo nano $HADOOP_CONF_DIR/hadoop-env.sh

Step 5:
#set JAVA_HOME inside hadoop-env.sh, then save and exit
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

step 6:
ssh localhost

step 7:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

step 8:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

step 9:
chmod 0600 ~/.ssh/authorized_keys

step 10:
hadoop-3.3.4/bin/hdfs namenode -format

step 11:
export PDSH_RCMD_TYPE=ssh

step 12:
start-all.sh

***** Once installed, how to restart Hadoop *****


stop-all.sh
hadoop namenode -format


start-all.sh

2. SHELL COMMANDS IN HADOOP

Shell commands in Hadoop provide a convenient and efficient way to interact


with the Hadoop ecosystem through the command line interface. These commands
allow users to perform various tasks related to managing and manipulating data in
Hadoop. For example, the "hadoop fs" command is used to interact with the Hadoop
Distributed File System (HDFS), allowing users to create, delete, copy, and move
files and directories. The "hadoop jar" command is used to submit MapReduce jobs to
the Hadoop cluster, enabling the processing of large datasets in a distributed manner.
Additionally, there are commands for monitoring cluster status, managing
permissions, and configuring Hadoop settings. Shell commands in Hadoop streamline
the management and execution of tasks, providing a flexible and powerful toolset for
working with big data.

1. mkdir: To make a directory


Syntax: hadoop fs -mkdir /<directory name>

2. nano: to create a file locally


Syntax: nano <filename>

3. rmdir: to remove a directory


Syntax: hadoop fs -rmdir /<directory name>

4. version: to get the current version of hadoop


Syntax: hadoop version

5. ls: To list out the files


Syntax: hadoop fs -ls

3. FILE MANAGEMENT TASKS IN HADOOP


1. put: To copy a file from the local file system into a hadoop directory


Syntax: hadoop fs -put <filename> /<directory_name>

2. get: To get a file from the hadoop directory to a local directory


Syntax: hadoop fs -get /<directory_name>/<file_name> <directory_name>

3. rm: To remove a file from hadoop directory


Syntax: hadoop fs -rm /<directory_name>/<file_name>

4. cp: to copy file from one directory to another


Syntax: hadoop fs -cp /<directory_name>/<file_name> /<directory_name>

4. PROGRAM TO FIND FACTORIAL AND CHECK PALINDROME

AIM
Write an R program to find factorial and check palindrome.

ALGORITHM

1. Start
2. Import library stringi
3. Algorithm for palindrome (x)
3.1 using stri_reverse function; if stri_reverse(x) is equal to x then
3.1.1 print that the number is a palindrome
3.2 else
3.2.1 print that the number is not a palindrome
4. Algorithm for factorial (y)
4.1 factt=1
4.2 if y<0 then
4.2.1 print that y is negative and factorial is not possible
4.3 else if y=0 then
4.3.1 print that the factorial of 0 is 1
4.3.2 Call function palindrome(0)
4.4 else
4.4.1 for i from 1 to y do
4.4.1.1 factt=factt*i
4.4.2 print factt
4.4.3 Call function palindrome(factt)


5. Read a number, k, from the user


6. call function factorial (k)
7. Stop.

PROGRAM
library(stringi)
palin<-function(x){
if(stri_reverse(x)==x){
print(paste(x," is a palindrome"))
}else{
print(paste(x," is not a palindrome"))
}
}

fact=function(y){
factt=1
if(y<0){
print(paste(y, "is a negative number"))
}else if(y==0) {
print("The factorial of 0 is 1")
palin(y)
}else{
for (i in 1:y){
factt=factt*i
}
print(paste("The factorial of ", y," is ",factt ))
palin(factt)
}

}
k=as.integer(readline("Enter a number: "))
fact(k)

OUTPUT

Enter a number: 1


[1] "The factorial of 1 is 1"


[1] "1 is a palindrome"

Enter a number: 11
[1] "The factorial of 11 is 39916800"
[1] "39916800 is not a palindrome"

Enter a number: 0
[1] "The factorial of 0 is 1"
[1] "0 is a palindrome"

Enter a number: -9
[1] "-9 is a negative number"
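The printed results can be cross-checked in a few lines of Python (used here only as an independent check; the lab program itself is the R code above):

```python
import math

def is_palindrome(n: int) -> bool:
    # A number is a palindrome if its decimal digits read the same reversed
    s = str(n)
    return s == s[::-1]

print(math.factorial(11))                 # 39916800, as in the output above
print(is_palindrome(1))                   # True
print(is_palindrome(math.factorial(11)))  # False
```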

5. PROGRAM TO CHECK IF A NUMBER IS PRIME

AIM
Write an R program to check if a number is prime

ALGORITHM
1. Start
2. Algorithm for prime (x)
2.1 flag=0
2.2 for i from 2 to x-1
2.2.1 if x mod i is equal to 0 then
2.2.1.1 flag=1
2.3 if x=2 then
2.3.1 flag=0
2.4 if flag=0 then
2.4.1 print x is a prime number
2.5 else
2.5.1 print x is not a prime number
3. Read a number, k
4. Call function prime(k)
5. Stop


PROGRAM
prim=function(x){
flag=0

for(i in 2:(x-1)){
if((x%%i)==0){
flag=1
}
}
if(x==2){
flag=0
}
if(flag==0){
print(paste(x," is a prime number"))
}else{
print(paste(x," is not a prime number"))
}
}

k=as.integer(readline("Enter a number: "))


prim(k)

OUTPUT

Enter a number: 6
[1] "6 is not a prime number"
> source("~/Desktop/07-05/primmm.R")
Enter a number: 5
[1] "5 is a prime number"
> source("~/Desktop/07-05/primmm.R")
Enter a number: 2
[1] "2 is a prime number"
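Trial division only needs to test divisors up to the square root of x, since any factor above it pairs with one below it. A Python sketch of that optimization, for comparison only (not part of the lab record):

```python
def is_prime(x: int) -> bool:
    # Numbers below 2 are not prime
    if x < 2:
        return False
    # Test divisors only up to the integer square root of x
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True

print([n for n in range(2, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```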


6. PROGRAM TO PRINT A PATTERN

AIM
Write an R program to print a pattern

ALGORITHM

1. Start
2. for i from 1 to 5
2.1 v=c()
2.2 for j from i to 1
2.2.1 v=c(v,c(“*”))
2.3 print v
3. for i from 1 to 5
3.1 v=c()
3.2 for j from i to 5
3.2.1 v=c(v,c(“*”))
3.3 print v
4. Stop

PROGRAM
print("pattern")
for(i in 1:5){
v=c()
for (j in i:1){
v=c(v,c("*"))
}
print(v)
}
for(i in 1:5){
v=c()
for (j in i:5){
v=c(v,c("*"))
}
print(v)
}

OUTPUT


[1] "pattern"
[1] "*"
[1] "*" "*"
[1] "*" "*" "*"
[1] "*" "*" "*" "*"
[1] "*" "*" "*" "*" "*"
[1] "*" "*" "*" "*" "*"
[1] "*" "*" "*" "*"
[1] "*" "*" "*"
[1] "*" "*"
[1] "*"

7. PROGRAM TO IMPLEMENT A SIMPLE CALCULATOR

AIM
Write an R program to implement a simple calculator

ALGORITHM

1. Start
2. Algorithm for add(a,b)
a. Return a+b
3. Algorithm for subtract(a,b)
a. Return a-b
4. Algorithm for multiply(a,b)
a. Return a*b
5. Algorithm for divide(a,b)
a. If b=0 then return 'Not Possible'
b. Else return a/b
6. Read a choice ‘ch’; 1 for addition, 2 for subtraction, 3 for multiplication and 4 for
division
7. Using switch() choose the operator
8. Read 2 values a and b
9. Using switch() call function based on the corresponding function is to be performed
and store the return value to a variable r
10. Print r
11. Stop


PROGRAM

add=function(a,b){
return(a+b)
}
sub=function(a,b){
return(a-b)
}
mul=function(a,b){
return(a*b)
}
div=function(a,b){
if(b==0){
return("Not Possible")
}
return(a/b)
}
print("Enter the choice: ")
print("1. Addition")
print("2. Subtraction")
print("3. Multiplication")
print("4. Division")
ch=as.integer(readline("Enter the choice: "))
a=as.integer(readline("Enter the first number: "))
b=as.integer(readline("Enter the second number: "))
op=switch(ch,"+","-","*","/")
r=switch(ch,add(a,b),sub(a,b),mul(a,b),div(a,b))
print(paste(a,op,b,"=",r))

OUTPUT

[1] "Enter the choice: "


[1] "1. Addition"
[1] "2. Subtraction"
[1] "3. Multiplication"
[1] "4. Division"
Enter the choice: 4
Enter the first number: 5
Enter the second number: 0


[1] "5 / 0 = Not Possible"

8. PROGRAM TO PRINT FIBONACCI SERIES

AIM
Write an R program to print fibonacci series of a number
ALGORITHM
1. Start
2. Algorithm for fib(x)
a. Assign x1=0 and x2=1
b. l=empty vector
c. l=c(l,x1)
d. l=c(l,x2)
e. For i from 3 to x do
i. xn=x1+x2
ii. l=c(l,xn)
iii. x1=x2
iv. x2=xn
f. Print l
3. Read an integer n
4. Call function fib(n)
5. Stop

PROGRAM

fib=function(x)#Function which prints the fibonacci series

{
x1=0
x2=1
l=c()
print(paste("The fibonacci series for ",x," numbers is: \n"))
l=c(l,x1)
l=c(l,x2)#Inserting the first two into the vector
for (i in 3:x)
{
xn=x1+x2
l=c(l,xn)


x1=x2
x2=xn
}
print(l)#printing the series as a vector
}
n=as.integer(readline(prompt = "Enter a number:"))
fib(n)

OUTPUT
Enter a number:8
[1] "The fibonacci series for 8 numbers is: \n"
[1] 0 1 1 2 3 5 8 13

9. PROGRAM TO FIND GCD OF 2 NUMBERS

AIM
Write an R program to print the GCD of 2 numbers

ALGORITHM

1. Start
2. Algorithm gcd(a,b)
a. Assign gcdd=1
b. If a<b then l=a else l=b
c. for i from 1 to l do
i. if a mod i = 0 and b mod i = 0 then gcdd=i
d. return gcdd
3. Read 2 integers a and b
4. Call function gcd(a,b) and store the result to a variable k
5. print k
6. Stop

PROGRAM
#greatest common divisor
gcd=function(a,b){
gcdd=1
if(a<b)
{


l=a
}
else
{
l=b
}
for (i in 1:l)
{
if((a%%i==0)&&(b%%i==0))
{
gcdd=i
}
}
return(gcdd)
}
a=as.integer(readline("Enter a number:"))
b=as.integer(readline("Enter another number:"))
k=gcd(a,b)
print(paste("The GCD is: ",k))

OUTPUT
Enter a number:6
Enter another number:5
[1] "The GCD is: 1"
> source("~/Desktop/14-03/gcd.R")
Enter a number:6
Enter another number:4
[1] "The GCD is: 2"
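The linear scan above works, but Euclid's algorithm is the standard faster route, and the LCM of the next experiment follows directly from lcm(a,b) = a*b / gcd(a,b). A Python sketch for comparison only:

```python
def gcd(a: int, b: int) -> int:
    # Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0
    while b != 0:
        a, b = b, a % b
    return a

def lcm(a: int, b: int) -> int:
    # The LCM follows directly from the GCD
    return a * b // gcd(a, b)

print(gcd(6, 4))  # 2, matching the output above
print(lcm(6, 5))  # 30
```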

10. PROGRAM TO FIND LCM OF 2 NUMBERS


AIM
Write an R program to find LCM of 2 numbers
ALGORITHM
1. Start
2. Algorithm lcm(a,b)
a. Assign lcmm=1
b. If a<b then g=b else g=a


c. While true do
i. if g mod a = 0 and g mod b = 0 then lcmm=g and break
ii. g=g+1
d. return lcmm
3. Read 2 integers a and b
4. Call function lcm(a,b) and store the result to a variable k
5. print k
6. Stop

PROGRAM

#lcm
lcm=function(a,b){
lcmm=1
if(a<b)
{
g=b
}
else
{
g=a
}
while(TRUE)
{
if((g%%a==0)&&(g%%b==0))
{
lcmm=g
break
}
g=g+1
}
return(lcmm)
}
a=as.integer(readline("Enter a number:"))
b=as.integer(readline("Enter another number:"))
k=lcm(a,b)
print(paste("The LCM is: ",k))

OUTPUT


Enter a number:6
Enter another number:5
[1] "The LCM is: 30"

11. PROGRAM TO PRINT THE SUM OF N NATURAL NUMBERS

AIM
Write an R program to print the sum of N natural numbers
ALGORITHM
1. Start
2. Read an integer n
3. Assign sum=0
4. For i from 1 to n do
a. sum=sum+i
5. Print sum
6. Stop

PROGRAM
#sum of n natural numbers
n=as.integer(readline("Enter a natural number:"))
sum=0
for (i in 1:n)
{
sum=sum+i
}
cat("The sum is: ",sum)

OUTPUT

Enter a natural number:5


The sum is: 15
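The loop result can be verified against the closed form n(n+1)/2. A quick Python check (illustrative only):

```python
def sum_natural(n: int) -> int:
    # Same accumulation loop as the R program
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

n = 5
print(sum_natural(n))    # 15, matching the output above
print(n * (n + 1) // 2)  # the closed form gives the same value
```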

12. PROGRAM TO PRINT OCCURRENCES OF N RANDOM NUMBERS


AIM


Write an R program to create a list of random numbers from a normal distribution and count
the occurrences of each value
ALGORITHM

1. Start
2. Using rnorm() function, generate 50 random values and take the floor of each value
3. Using table() function get the occurrences of each value
4. Print the table
5. Stop

PROGRAM

x=floor(rnorm(n=50))
t = table(x)
print("Occurrences of each value:")
print(t)

OUTPUT
[1] "Occurrences of each value:"
x
-3 -2 -1 0 1 2
1 10 19 15 4 1
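The same draw-and-tally can be sketched in Python, with random.gauss and collections.Counter playing the roles of rnorm() and table() (illustrative only; the counts differ from the run above because the generators differ):

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the tally is reproducible
# 50 standard-normal draws, floored to integers as in the R program
x = [math.floor(random.gauss(0, 1)) for _ in range(50)]
counts = Counter(x)
print(dict(sorted(counts.items())))
```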

13. PROGRAM TO PLOT A BARPLOT


AIM
Write an R program to plot a barplot
ALGORITHM

1. Start
2. Read the subjects
3. Read the corresponding marks
4. Using barplot() function, plot the graph
5. Stop

PROGRAM
subject=c("BDA","PR","RIS","IEFT","AAD")
mark=c(47,49,34,46,48)
barplot(mark,names.arg=subject,main="Bar Plot",xlab="Subject",ylab="Mark",col="royalblue2")


OUTPUT

(Bar plot figure: marks per subject, with subjects on the x-axis and marks on the y-axis)
14. PROGRAM TO PRINT SUM, MEAN AND PRODUCT OF A VECTOR


AIM
Write an R program to compute sum, mean and product of a given vector
ALGORITHM
1. Start
2. Read a vector, v
3. Sum of that vector can be found using sum() function
4. Mean can be found using mean() function
5. To a variable 'size', store the length of the vector v
6. Assign prod=1
7. For i from 1 to size do
a. prod=prod*v[i]
8. Print prod
9. Stop
PROGRAM
#sum mean and product
v=c(23,12,34,54,1,2,7,8)
cat("The Vector is: ",v,"\n")
cat("The sum of the vector is: ",sum(v),"\n")
cat("The mean of the vector is: ",mean(v),"\n")


size = length(v)
prod = 1
for(i in 1:size)
{
prod = v[i]*prod
}
cat("The product of the vector is: ",prod,"\n")

OUTPUT

The Vector is: 23 12 34 54 1 2 7 8


The sum of the vector is: 141
The mean of the vector is: 17.625
The product of the vector is: 56754432

15. PROGRAM TO PRINT A DATAFRAME


AIM
Write an R program to create a dataframe which contains the details of 5 employees and
display the details
ALGORITHM
1. Start
2. Read names to a vector
3. Read gender to a vector
4. Read age to a vector
5. Read designation to a vector
6. Using data.frame() function, make the vectors into a single dataframe and print it
7. Stop

PROGRAM
Name=c("John","Christy","Ivy","Maggie","Zayn")
Gender=c("M","M","F","F","M")
Age=c(21,24,25,27,36)
Designation=c("Clerk","Manager","Executive","CEO","Assistant")
Employees = data.frame(Name,Gender,Age,Designation);
print("EMPLOYEE DETAILS:")
print(Employees)


OUTPUT
[1] "EMPLOYEE DETAILS:"
Name Gender Age Designation
1 John M 21 Clerk
2 Christy M 24 Manager
3 Ivy F 25 Executive
4 Maggie F 27 CEO
5 Zayn M 36 Assistant

16. PROGRAM TO IMPLEMENT LINEAR REGRESSION


AIM
Write an R program to implement simple linear regression

ALGORITHM

1. Start
2. Read height values
3. Read weight values
4. Make the height and weight into a single data frame
5. Print the data frame
6. Write the data frame to a csv file
7. Create a linear regression model with height as independent variable and weight as
dependent variable using lm() function
8. Print the Slope and Y intercept of the line
9. Predict the weight of a given height using predict() function
10. Plot the linear regression graph using abline
11. Stop
PROGRAM
x=c(150,152,153,160,157,158,166,170,156,154)
y=c(45,50,49,59,53,55,55,67,54,52)
df=data.frame(x,y)
print(df)
write.csv(df,"file.csv")
v=lm(y~x)
print("The Formula, Slope and Y intercept of the line is: ")
print(v$coefficients)
pred=data.frame(x=156)
xpred=predict(v,pred)


print(paste("Predicted Weight of height 156 is:",xpred))


plot(x,y,col='blue',main='Height and Weight Regression',abline(v,col='red'),xlab="Height",ylab="Weight")

OUTPUT

x y
1 150 45
2 152 50
3 153 49
4 160 59
5 157 53
6 158 55
7 166 55
8 170 67
9 156 54
10 154 52
[1] "The Formula, Slope and Y intercept of the line is: "
(Intercept) x
-80.3518519 0.8518519
[1] "Predicted Weight of height 156 is: 52.537037037037"
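The slope and intercept that lm() reports are the ordinary least-squares estimates b = Sxy/Sxx and a = ȳ − b·x̄. A Python check against the printed coefficients, using the same x and y data (illustrative only):

```python
x = [150, 152, 153, 160, 157, 158, 166, 170, 156, 154]
y = [45, 50, 49, 59, 53, 55, 55, 67, 54, 52]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Sxy and Sxx are the centered cross- and self-products
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx
print(round(slope, 7), round(intercept, 7))  # 0.8518519 -80.3518519
print(round(intercept + slope * 156, 6))     # 52.537037
```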


17. PROGRAM TO IMPLEMENT LOGISTIC REGRESSION

AIM
Write an R program to implement Logistic Regression
ALGORITHM
1. Start
2. Import libraries caTools and ROCR
3. Load a csv file
4. Split the data into testing and training data with split ratio 80%
5. Create a logistic regression model with the training data using the function glm()
6. Print the summary of the model using the function summary()
7. Predict using the model the test values using the function predict()
8. Print the confusion matrix with the actual values and predicted values using
table() function
9. Predict the Outcome of the training data also, using the function predict(), and store to
predict1 variable
10. Create an ROC curve using the predict1 values using the prediction() function
11. Find the Area under the curve, AUC using the function performance()
12. Plot the ROC graph using plot() function
13. Stop

PROGRAM

#logistic regression
library(caTools)
library(ROCR)
df=read.csv("diabetes.csv")
spl=sample.split(df,SplitRatio=0.8)
train_reg <- subset(df, spl == "TRUE")
test_reg <- subset(df, spl == "FALSE")
model=glm(Outcome~.,data=train_reg,family="binomial")
print("The model is: ")
print(model)
summ=summary(model)
print("The summary of the model is")
print(summ)


predict<- predict(model,test_reg, type = "response")


p1=data.frame(predict)
print("The predicted values of Test Data is: ")

print(head(p1,10))
x<-table(test_reg$Outcome, predict>0.4)
print("The Confusion Matrix is: ")
print(x)
acc <-(x[[1,1]]+x[[2,2]])/sum(x)
print(paste("The Accuracy is:",acc))
predict1<- predict(model,train_reg, type = "response")
p2=data.frame(predict1)
print("The predicted values of Train Data is: ")
print(head(p2,10))
ROC1 <- prediction(predict1, train_reg$Outcome)
ROC2 <- performance(ROC1, measure = "tpr", x.measure = "fpr")
auc <- performance(ROC1, measure = "auc")
auc <- auc@y.values[[1]]
cat("The Area under the curve is: ",auc)
auc <- round(auc, 4)
plot(ROC2, colorize=TRUE,print.cutoffs.at = seq(0.1, by = 0.1), main = "ROC CURVE")
abline(a = 0, b = 1)

OUTPUT
[1] "The model is: "

Call: glm(formula = Outcome ~ ., family = "binomial", data = train_reg)

Coefficients:
(Intercept) Pregnancies Glucose
-8.639132 0.143143 0.037756
BloodPressure SkinThickness Insulin
-0.016310 0.001434 -0.001336
BMI DiabetesPedigreeFunction Age
0.096936 0.633853 0.011697

Degrees of Freedom: 596 Total (i.e. Null); 588 Residual


Null Deviance: 775.6


Residual Deviance: 550.2 AIC: 568.2
[1] "The summary of the model is"

Call:
glm(formula = Outcome ~ ., family = "binomial", data = train_reg)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.4706 -0.7023 -0.3893 0.7345 2.6794

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.639132 0.824683 -10.476 < 2e-16 ***
Pregnancies 0.143143 0.037795 3.787 0.000152 ***
Glucose 0.037756 0.004350 8.680 < 2e-16 ***
BloodPressure -0.016310 0.006011 -2.714 0.006656 **
SkinThickness 0.001434 0.007827 0.183 0.854605
Insulin -0.001336 0.001014 -1.318 0.187473
BMI 0.096936 0.017400 5.571 2.53e-08 ***
DiabetesPedigreeFunction 0.633853 0.331976 1.909 0.056219 .
Age 0.011697 0.010617 1.102 0.270576
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 775.56 on 596 degrees of freedom


Residual deviance: 550.22 on 588 degrees of freedom
AIC: 568.22

Number of Fisher Scoring iterations: 5

[1] "The predicted values of Test Data is: "


predict
3 0.8069695
8 0.7271615
12 0.9129396
17 0.3447163


21 0.3609391
26 0.4695308
30 0.2606044
35 0.4248690
39 0.1589572
44 0.9251604
[1] "The Confusion Matrix is: "

FALSE TRUE
0 99 15
1 29 28
[1] "The Accuracy is: 0.742690058479532"
[1] "The predicted values of Train Data is: "
predict
3 0.8069695
8 0.7271615
12 0.9129396
17 0.3447163
21 0.3609391
26 0.4695308
30 0.2606044
35 0.4248690
39 0.1589572
44 0.9251604
The Area under the curve is: 0.8481448
> source("~/.active-rstudio-document")
[1] "The model is: "

Call: glm(formula = Outcome ~ ., family = "binomial", data = train_reg)

Coefficients:
(Intercept) Pregnancies Glucose
-8.231923 0.133019 0.034235
BloodPressure SkinThickness Insulin
-0.012922 0.002832 -0.001240
BMI DiabetesPedigreeFunction Age
0.087621 0.968626 0.013071

Degrees of Freedom: 596 Total (i.e. Null); 588 Residual


Null Deviance: 781.4


Residual Deviance: 571.8 AIC: 589.8
[1] "The summary of the model is"

Call:
glm(formula = Outcome ~ ., family = "binomial", data = train_reg)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.5474 -0.7373 -0.4114 0.7502 2.8689

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.2319229 0.7966276 -10.333 < 2e-16 ***
Pregnancies 0.1330191 0.0372710 3.569 0.000358 ***
Glucose 0.0342347 0.0040348 8.485 < 2e-16 ***
BloodPressure -0.0129215 0.0058050 -2.226 0.026018 *
SkinThickness 0.0028324 0.0078585 0.360 0.718529
Insulin -0.0012398 0.0009722 -1.275 0.202237
BMI 0.0876209 0.0165019 5.310 1.1e-07 ***
DiabetesPedigreeFunction 0.9686256 0.3299302 2.936 0.003326 **
Age 0.0130708 0.0103317 1.265 0.205831
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 781.42 on 596 degrees of freedom


Residual deviance: 571.84 on 588 degrees of freedom
AIC: 589.84

Number of Fisher Scoring iterations: 5

[1] "The predicted values of Test Data is: "


predict
2 0.05296349
6 0.15162920
11 0.22250460
15 0.62678551


20 0.24266690
24 0.32726113
29 0.56138003
33 0.05283638
38 0.42753236
42 0.69997193
[1] "The Confusion Matrix is: "

FALSE TRUE
0 100 19
1 16 36
[1] "The Accuracy is: 0.795321637426901"
[1] "The predicted values of Train Data is: "
predict
2 0.05296349
6 0.15162920
11 0.22250460
15 0.62678551
20 0.24266690
24 0.32726113
29 0.56138003
33 0.05283638
38 0.42753236
42 0.69997193
The Area under the curve is: 0.8351925
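The accuracy figures printed above come straight from the confusion matrices as (TN+TP)/total. A Python re-check of both runs, with the matrices copied from the output (illustrative only):

```python
def accuracy(cm):
    # cm is [[TN, FP], [FN, TP]] as printed by table(actual, predicted > 0.4)
    tn, fp = cm[0]
    fn, tp = cm[1]
    return (tn + tp) / (tn + fp + fn + tp)

run1 = [[99, 15], [29, 28]]
run2 = [[100, 19], [16, 36]]
print(round(accuracy(run1), 4))  # 0.7427
print(round(accuracy(run2), 4))  # 0.7953
```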


18. PROGRAM TO IMPLEMENT STATISTICAL ANALYSIS


AIM
Write an R program to perform statistical Analysis

ALGORITHM
1. Start
2. Load datasets mtcars and iris
3. Perform one sample t test, 2 sample t test and paired t test using t.test() function
4. Perform anova test using aov() function
5. Perform Shapiro Normality test using shapiro.test() function
6. Perform Kolmogorov-Smirnov test using ks.test() function
7. Perform Kruskal test using kruskal.test() function
8. Perform Wilcoxon test using wilcox.test() function
9. Perform Fligner test using fligner.test() function
10. Perform Ansari test using ansari.test() function


11. Read two vectors, smokers and patients, and using this data perform a Proportion test
using prop.test() function
12. Perform Binomial test using binom.test() function
13. Stop

PROGRAM
#statistical analysis
data(mtcars)
data(iris)
print(t.test(mtcars$mpg, y=NULL)) # One sample
print(t.test(mpg ~ cyl, data = mtcars, subset = cyl %in% c(4, 6))) # Two sample
print(t.test(mtcars$mpg, mtcars$am, data = mtcars, paired = T)) # Paired t-test
print(aov(mpg ~ cyl, data = mtcars)) # ANOVA Test
print(shapiro.test(mtcars$wt)) # Shapiro Normality Test
print(ks.test(mtcars$wt, mtcars$disp)) # Kolmogorov-Smirnov test
print(kruskal.test(mpg ~ am, data = mtcars)) # Kruskal Test
print(wilcox.test(iris$Sepal.Length)) # Wilcoxon Test
print(fligner.test(mtcars$mpg, mtcars$am)) # Fligner Test
print(ansari.test(rnorm(20), rnorm(10, 0, 5), conf.int = T)) #Ansari Test
smokers <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
print(prop.test(smokers, patients)) # Proportion Test
print(binom.test(64, 100, 0.5)) # Binomial Test

OUTPUT
One Sample t-test

data: mtcars$mpg
t = 18.857, df = 31, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
17.91768 22.26357
sample estimates:
mean of x
20.09062

Welch Two Sample t-test


data: mpg by cyl


t = 4.7191, df = 12.956, p-value = 0.0004048
alternative hypothesis: true difference in means between group 4 and group 6 is not equal to
0
95 percent confidence interval:
3.751376 10.090182
sample estimates:
mean in group 4 mean in group 6
26.66364 19.74286

Paired t-test

data: mtcars$mpg and mtcars$am


t = 19.394, df = 31, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
17.61433 21.75442
sample estimates:
mean difference
19.68437

Call:
aov(formula = mpg ~ cyl, data = mtcars)

Terms:
cyl Residuals
Sum of Squares 817.7130 308.3342
Deg. of Freedom 1 30

Residual standard error: 3.205902


Estimated effects may be unbalanced

Shapiro-Wilk normality test

data: mtcars$wt
W = 0.94326, p-value = 0.09265


Exact two-sample Kolmogorov-Smirnov test

data: mtcars$wt and mtcars$disp


D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided

Kruskal-Wallis rank sum test

data: mpg by am
Kruskal-Wallis chi-squared = 9.7914, df = 1, p-value = 0.001753

Wilcoxon signed rank test with continuity correction

data: iris$Sepal.Length
V = 11325, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 0

Fligner-Killeen test of homogeneity of variances

data: mtcars$mpg and mtcars$am


Fligner-Killeen:med chi-squared = 4.4929, df = 1, p-value = 0.03404

Ansari-Bradley test

data: rnorm(20) and rnorm(10, 0, 5)


AB = 193, p-value = 0.002937
alternative hypothesis: true ratio of scales is not equal to 1
95 percent confidence interval:
0.05364466 0.73505807
sample estimates:
ratio of scales
0.1959577


4-sample test for equality of proportions without continuity correction

data: smokers out of patients


X-squared = 12.6, df = 3, p-value = 0.005585
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4
0.9651163 0.9677419 0.9485294 0.8536585

Exact binomial test

data: 64 and 100


number of successes = 64, number of trials = 100, p-value = 0.006637
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5378781 0.7335916
sample estimates:
probability of success
0.64

19. PROGRAM TO PRINT VARIANCE, COVARIANCE AND CORRELATION OF A DATASET

AIM
Write an R program to print variance, correlation and covariance of a data

ALGORITHM
1. Start
2. Load the first 4 columns of iris data and store that to a variable ‘data’
3. Find variance using apply(data,margin,function), where function =var
4. Find the covariance matrix using cov() function
5. Find the correlation matrix using cor() function
6. Stop
PROGRAM

#covariance & correlation


data=iris[1:4]
print("Variance:")
var = apply(data,2,var)
print(var)
print("Covariance:")
print(cov(data))
print("Correlation:")
print(cor(data,method='pearson'))

OUTPUT

[1] "Variance:"
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.6856935 0.1899794 3.1162779 0.5810063
[1] "Covariance:"
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 0.6856935 -0.0424340 1.2743154 0.5162707
Sepal.Width -0.0424340 0.1899794 -0.3296564 -0.1216394
Petal.Length 1.2743154 -0.3296564 3.1162779 1.2956094
Petal.Width 0.5162707 -0.1216394 1.2956094 0.5810063
[1] "Correlation:"
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
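Correlation is just covariance rescaled by the standard deviations, r = cov(x,y)/(σx·σy). A quick Python check using the printed Sepal.Length/Sepal.Width values (illustrative only):

```python
import math

var_sl = 0.6856935      # variance of Sepal.Length, from the output above
var_sw = 0.1899794      # variance of Sepal.Width
cov_sl_sw = -0.0424340  # their covariance
r = cov_sl_sw / (math.sqrt(var_sl) * math.sqrt(var_sw))
print(round(r, 5))  # -0.11757, matching the correlation matrix above
```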

20. PROGRAM TO IMPLEMENT SVM CLASSIFIER


AIM
Write an R program to implement SVM classifier

ALGORITHM
1. Start
2. Import libraries e1071 and caTools
3. Read a csv file to a variable ‘data’
4. Set the seed value to 123
5. Split the data into testing and training data with split ratio 80%
6. Create an SVM classifier model using the data, and svm() function


7. Predict the values of test data using the model


8. Create a confusion matrix with actual values and predicted values of test data
9. Find the accuracy
10. Stop
PROGRAM
library(e1071)
library(caTools)
file='/home/student/Desktop/27-04/social1.csv' #Social network ads
data=read.csv(file)
set.seed(123)
spl=sample.split(data,SplitRatio=0.75)
train_reg <- subset(data, spl == "TRUE")
test_reg <- subset(data, spl == "FALSE")
model=svm(Purchased~.,train_reg,type='C-classification',kernel='linear')
print(model)
pred=predict(model,test_reg)
cm=table(test_reg$Purchased,pred)
print(cm)
accu=(sum(diag(cm)))/sum(cm)
cat("Accuracy:",accu)

OUTPUT

Call:
svm(formula = Purchased ~ ., data = train_reg, type = "C-classification",
kernel = "linear")

Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1

Number of Support Vectors: 120

pred
0 1
0 54 1


1 18 27
Accuracy: 0.81

21. PROGRAM TO IMPLEMENT DECISION TREE ALGORITHM


AIM
Write an R program to implement decision tree algorithm

ALGORITHM

1. Start
2. Import libraries rpart, rpart.plot and caTools
3. Load iris dataset
4. Set seed value to 123
5. Split the data into training and testing data with split ratio 80%
6. Create a decision tree model using train data using the function rpart()
7. Predict the values of test data using model
8. Create a confusion matrix using actual values and predicted values of test data
9. Find the accuracy of the model
10. Stop
PROGRAM
library(rpart)
library(rpart.plot)
library(caTools)
data=iris
set.seed(123)
spl=sample.split(data,SplitRatio=0.8)
train_reg <- subset(data, spl == "TRUE")
test_reg <- subset(data, spl == "FALSE")
model<-rpart(Species~.,train_reg,method="class")
rpart.plot(model,type=4,extra=101)
pred=predict(model,test_reg,type='class')
cm=table(test_reg$Species,pred)
print(cm)
accu=(cm[1]+cm[5]+cm[9])/sum(cm)
cat("The Accuracy is:",accu)

OUTPUT
pred


setosa versicolor virginica


setosa 10 0 0
versicolor 0 10 0
virginica 0 3 7
The Accuracy is: 0.9

22. PROGRAM TO IMPLEMENT K-MEANS CLUSTERING

AIM
Write an R program to implement k-means clustering
ALGORITHM

1. Start
2. Import library cluster
3. Load iris data from columns 1 to 4
4. Create a k-means clustering model with number of clusters 3 using the kmeans() function
5. Using autoplot() plot the graph
6. Print the confusion matrix with species and the clusters of the model
7. Print the cluster centers
8. Stop
PROGRAM


library(ggplot2)
library(ggfortify) #provides autoplot() for kmeans objects
library(cluster)
data=iris
df=data[1:4] #remove the label bc unsupervised
print(head(df))
km=kmeans(df,centers = 3)
print("The Model is: ")
print(km)
plot(autoplot(km,df,frame=TRUE)) # to visually display that clusters are distinct
print("The Cluster Centers are:")
print(km$centers)
cm=table(data$Species,km$cluster)
print("The Confusion Matrix is:")
print(cm)

OUTPUT
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
[1] "The Model is: "
K-means clustering with 3 clusters of sizes 50, 38, 62

Cluster means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.006000 3.428000 1.462000 0.246000
2 6.850000 3.073684 5.742105 2.071053
3 5.901613 2.748387 4.393548 1.433871

Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[73] 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2
[109] 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2
[145] 2 2 3 2 2 3


Within cluster sum of squares by cluster:


[1] 15.15100 23.87947 39.82097
(between_SS / total_SS = 88.4 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"


[6] "betweenss" "size" "iter" "ifault"
[1] "The Cluster Centers are:"
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.006000 3.428000 1.462000 0.246000
2 6.850000 3.073684 5.742105 2.071053
3 5.901613 2.748387 4.393548 1.433871
[1] "The Confusion Matrix is:"

1 2 3
setosa 50 0 0
versicolor 0 2 48
virginica 0 36 14
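Because k-means cluster numbers are arbitrary, agreement with the species labels is read off by giving each cluster its majority species. A Python sketch using the confusion matrix printed above (illustrative only):

```python
# Rows: actual species; columns: clusters 1..3 (copied from the output above)
cm = {
    "setosa":     [50, 0, 0],
    "versicolor": [0, 2, 48],
    "virginica":  [0, 36, 14],
}
n_clusters = 3
total = sum(sum(row) for row in cm.values())
# For each cluster, count only its majority-species members as correct
correct = sum(max(cm[s][j] for s in cm) for j in range(n_clusters))
print(correct, total, round(correct / total, 4))  # 134 150 0.8933
```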
