0% found this document useful (0 votes)
117 views

Bigdata Programs&Solutions

1. The document describes several R programs for analyzing big data. It includes programs to create vectors and data frames, perform operations on datasets like mtcars, and write data to CSV files. 2. The solutions section provides code snippets to implement the programs, such as creating vectors with repeating values, turning strings into factors, plotting variables, and reading/writing to CSV files. 3. Examples analyze the mtcars dataset, creating vectors from repeated characters, joining data frames, and selecting rows based on column conditions. Functions like rep(), factor(), write.csv() and read.csv() are used.

Uploaded by

Anitha Mc
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
117 views

Bigdata Programs&Solutions

1. The document describes several R programs for analyzing big data. It includes programs to create vectors and data frames, perform operations on datasets like mtcars, and write data to CSV files. 2. The solutions section provides code snippets to implement the programs, such as creating vectors with repeating values, turning strings into factors, plotting variables, and reading/writing to CSV files. 3. Examples analyze the mtcars dataset, creating vectors from repeated characters, joining data frames, and selecting rows based on column conditions. Functions like rep(), factor(), write.csv() and read.csv() are used.

Uploaded by

Anitha Mc
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 7

Big Data Programs

Part A

(List of R programs)

1. Create vector for the following


a. (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3) where there are 10 occurrences of 4.
b. Use the function paste to create the following character vectors of length 30
("fn1", "fn2", ..., "fn30").In this case, there is no space between fn and the number

2. a. Turn the vector of character items "Control", "Control", "Control", "Ear Removal", "Ear Removal",
"Ear Removal", "Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake
Ear Removal" into a Factor variable and create a table from it to show the number of entries in each
treatment.
b. Create a vector of character variables that contains 25 ”a”, 15 ”b”, and 58 ”c” instances. What is
the length of this vector? Create a table from the entries.
3. a. Create three different variables, one that is numeric type and other two are vector of characters.
Use these to create data frame of student.(USN,Name,Marks)
b. Add a new numeric data column to the existing data frame (Age). Provide summary of the data
c. Display the list of student whose Age is less than 20 and Marks greater than 25

4. Write a program to create the csv file for storing Employee data. Containing the data
(EmpID, EmpName , DOJ, EmpCode, Dept, Desig.)
a. Read the suitable number of employee details from the user.
b. Create a dataframe of Employee
c. Store the dataframe in the csv file
d. Check the difference between csv and csv2 file
e. Read the data from csv and Display the contents
f. Append a new row into the csv file

5. Dataset example
a. List the data set available in your system using suitable command
b. Select “mtcars” data set, find and display the number of rows and columns in that data
set
c. Find are there more automatic (0) or manual (1) transmission-type cars in the
dataset? Hint: 9th column indicate the transmission type
d. Get a scatter plot of ‘hp’ vs ‘weight’.
e. Change ‘am’, ‘cyl’ and ‘vs’ to integer and store the new dataset as ‘newmtc’.
f. Extract the cases where cylinder is less than 5

6. Consider “Airquality” dataset


a. Display the dimension of the dataset
b. Display the class of each fields in the data set
c. Test the missing values
d. Recode the missing values, as mean of the column values
e. Exclude the missing values

Solutions

1. Create vector for the following


a. (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3) where there are 10 occurrences of 4.
b. Use the function paste to create the following character vectors of length 30
("fn1", "fn2", ..., "fn30").In this case, there is no space between fn and the number

a. tmp <- c(4,6,3) # Create the vector


rep(tmp,10) #Repeat the vector 10 times
b. paste("fn",1:30,sep="") # paste 1st and 2nd argument

2. a. Turn the vector of character items "Control", "Control", "Control", "Ear Removal", "Ear Removal",
"Ear Removal", "Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake
Ear Removal" into a Factor variable and create a table from it to show the number of entries in each
treatment.
b. Create a vector of character variables that contains 25 ”a”, 15 ”b”, and 58 ”c” instances. What is
the length of this vector? Create a table from the entries.

a. # Create the vector of strings


x<-c("Control", "Control", "Control", "Ear Removal", "Ear Removal", "Ear
Removal", "Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake Ear
Removal", "Fake Ear Removal")

# display the vector


> x
[1] "Control" "Control" "Control" "Ear Removal"
[5] "Ear Removal" "Ear Removal" "Ear Removal" "Fake Ear
Removal"
[9] "Fake Ear Removal" "Fake Ear Removal" "Fake Ear Removal"

#construct factor from the vector


> xfact<- factor(x)

#Display the vector


> xfact
[1] Control Control Control Ear Removal
[5] Ear Removal Ear Removal Ear Removal Fake Ear Removal
[9] Fake Ear Removal Fake Ear Removal Fake Ear Removal
Levels: Control Ear Removal Fake Ear Removal
> nlevels(xfact)
[1] 3

2b.

#Create the vector


> x<-c(rep("a",25),rep("b",15),rep("c",58))
> x
[1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
"a" "a"
[21] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
"b" "b"
[41] "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
"c" "c"
[61] "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
"c" "c"
[81] "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
> x<-c(rep("a",25),rep("b",15),rep("c",58))
> x
# Find the length of the vector

[1] 98
> table1<- data.frame(x) # Construct table from the vector
> table1
x
1 a
2 a
3 a
4 a
5 a
6 a
|
|
|
|
|
93 c
94 c
95 c
96 c
97 c
98 c

3. a. Create three different variables, one that is numeric type and other two are vector of characters.
Use these to create data frame of student.(USN,Name,Marks)
b. Add a new numeric data column to the existing data frame (Age). Provide summary of the data
c. Display the list of student whose Age is less than 20 and Marks greater than 25

n <- as.integer(readline(prompt = "Enter no of students")) # Read No of


students
# Declare the vector of character of length n
USN <- vector(mode="character", length= n)
Name <- vector(mode="character", length= n)
Marks <- vector(mode="numeric", length= n)

#Read the elements of the vector


print("Enter USN")
for (i in 1:n)
USN[i] <- as.character(readline())
print("Enter Name")
for (i in 1:n)
Name [i] <- readline()
print("Enter Marks" )
for (i in 1:n)
Marks[i] <- as.integer(readline())

#Construct the data frame from the vectors


student <- data.frame(USN,Name,Marks)
print("The Student detials are as follows")
print(student) # Display data frame

print("Enter Age") # Read the vector of Age


Age <- vector(mode="integer", length=n)
for (i in 1:n)
Age [i] <- readline()
student <- cbind(student,Age) # Append the vector to the data frame

print(student)

for(i in 1:n) # Print student age > 20 , marks > 25


if ( student[i,][3] > 25 )
if (student[i,][4] > 20)
print(student[i,])

4. Write a program to create the csv file for storing Employee data. Containing the data
(EmpID, EmpName , DOJ, EmpCode, Dept, Desig.)
a. Read the suitable number of employee details from the user.
b. Create a dataframe of Employee
c. Store the dataframe in the csv file
d. Read the data from csv and Display the contents
e. Append a new row into the csv file

a. n <- as.integer(readline(prompt = "Enter no of Employee"))


EmpId <- vector(mode="character", length= n)
EmpName <- vector(mode="character", length= n)
DOJ <- vector(mode="character", length= n)
EmpCode <- vector(mode="numeric",length = n)
Desig <- vector(mode="character",length = n)
Dept <- vector(mode="character",length = n)

print("Enter EmpId")
for (i in 1:n)
EmpId[i] <- as.character(readline())
print("Enter EmployeeName")
for (i in 1:n)
EmpName [i] <- readline()
print("Enter DOJ" )
for (i in 1:n)
DOJ[i] <- (readline())
print("Enter EmployeeCode" )
for (i in 1:n)
EmpCode[i] <- as.integer(readline())
print("Enter Designation" )
for (i in 1:n)
Desig[i] <- (readline())
print("Enter Dept" )
for (i in 1:n)
Dept[i] <- (readline())

b.
Emp <- data.frame(EmpId,EmpName,EmpCode,Desig,Dept,DOJ)

print("The Employee detials are as follows")


print(Emp)

c.
write.csv(Emp,"C:/Users/ARCHANA/Documents/Empfile.csv")

d.
readStudent=read.csv("C:/Users/ARCHANA/Documents/file.csv")

e.
print("Enter a new row")
u<- readline(prompt = "EmpId")
n<- readline(prompt = "EmpName")
m<- readline(prompt = "EmpCode")
A<- readline(prompt = "Desig")
s<- readline(prompt = "Dept")
t<- readline(prompt = "DOJ")

x<- data.frame(u,n,m,A,s,t)

write.table(x,"C:/Users/ARCHANA/Documents/Empfile.csv",col.names = FALSE, append = T,row.names


= T, quote= FALSE, sep = ",")

5. Dataset example
a. List the data set available in your system using suitable command
b. Select “mtcars” data set, find and display the number of rows and columns in that data
set
c. Find are there more automatic (0) or manual (1) transmission-type cars in the
dataset? Hint: 9th column indicate the transmission type
d. Get a scatter plot of ‘hp’ vs ‘weight’.
e. Change ‘am’, ‘cyl’ and ‘vs’ to integer and store the new dataset as ‘newmtc’.
f. Extract the cases where cylinder is less than 5
a. data()
head(mtcars)
b. # Number of rows (observations)
rownum <- nrow(mtcars)
# Number of columns (variables)
colnum <- ncol(mtcars)

c. x<- data.frame(mtcars)
automatic <-0
manual <-0
for (i in 1:rownum)
ifelse( x[i,9] == 1, automatic <- automatic + 1, manual <- manual +1)
ifelse (automatic > manual,
print("There are more automatic transmission type"),
print("There are more manual transmission type") )

d. //The scatter plot


HorsePower <- x[,4]
Weight <- x[,6]
scatter.smooth(HorsePower,Weight, span=2/3, degree = 1, family =c("symmetric","gaussian"))

// Plot histogram of Miles/gallon


Mpg <- x[,1]
hist(Mpg, breaks = 12, col ="lightblue", border = "pink")

e. // Solution for e
x[,2]<- as.integer(x[,2])
x[,8]<- as.integer(x[,8])
x[,9]<- as.integer(x[,9])
x[,2] <= 5

f. mtcars[mtcars$cyl <=5 ]

6. Consider “Airquality” dataset


a. Display the dimension of the dataset
b. Display the class of each fields in the data set
c. Test the missing values
d. Recode the missing values, as mean of the column values
e. Exclude the missing values

a. df <- airquality
dim(df)
b. sapply(df,class)
c. #Printing the missing values
print("The Missing values are as follows")
xcolNames <- colnames(df)
x<- colSums(is.na(df))
print(x)
d. which(is.na(df))
sum(is.na(df))
df1<- as.data.frame(df)
e. #Recoding the missing values
for(i in 1:4)
df1[,i]<- ifelse ( is.na(df[,i]), mean(df[,i], na.rm = TRUE), df[,i])
# Excluding the missing values

df2<-na.omit(df)

You might also like