Big Data Programs & Solutions
Part A
(List of R programs)
2. a. Turn the vector of character items "Control", "Control", "Control", "Ear Removal", "Ear Removal",
"Ear Removal", "Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake
Ear Removal" into a Factor variable and create a table from it to show the number of entries in each
treatment.
b. Create a character vector that contains 25 "a", 15 "b", and 58 "c" instances. What is
the length of this vector? Create a table from the entries.
3. a. Create three different variables, one of numeric type and the other two character vectors.
Use these to create a data frame of students (USN, Name, Marks).
b. Add a new numeric column (Age) to the existing data frame. Provide a summary of the data.
c. Display the list of students whose Age is less than 20 and whose Marks are greater than 25.
4. Write a program to create a csv file for storing Employee data, containing the fields
(EmpID, EmpName, DOJ, EmpCode, Dept, Desig).
a. Read a suitable number of employee details from the user.
b. Create a data frame of employees.
c. Store the data frame in the csv file.
d. Check the difference between the csv and csv2 files.
e. Read the data from the csv file and display the contents.
f. Append a new row to the csv file.
5. Dataset example
a. List the data sets available in your system using a suitable command.
b. Select the “mtcars” data set; find and display the number of rows and columns in that data
set.
c. Find whether there are more automatic (0) or manual (1) transmission-type cars in the
dataset. Hint: the 9th column indicates the transmission type.
d. Get a scatter plot of ‘hp’ vs ‘weight’.
e. Change ‘am’, ‘cyl’ and ‘vs’ to integer and store the new dataset as ‘newmtc’.
f. Extract the cases where cylinder is less than 5.
Solutions
2. a. Turn the vector of character items "Control", "Control", "Control", "Ear Removal", "Ear Removal",
"Ear Removal", "Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake Ear Removal", "Fake
Ear Removal" into a Factor variable and create a table from it to show the number of entries in each
treatment.
b. Create a character vector that contains 25 "a", 15 "b", and 58 "c" instances. What is
the length of this vector? Create a table from the entries.
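No listing is given for 2a, so the following is only a minimal sketch of one way to do it. The variable names treatment, treatment_factor and treatment_table are assumptions chosen for illustration; rep() is used simply to avoid retyping the eleven labels.

```r
# 2a (sketch): the eleven treatment labels from the question
treatment <- c(rep("Control", 3),
               rep("Ear Removal", 4),
               rep("Fake Ear Removal", 4))

# Convert the character vector into a factor
treatment_factor <- factor(treatment)

# Count the number of entries in each treatment
treatment_table <- table(treatment_factor)
print(treatment_table)
```

The table has one cell per factor level, so it reports three counts, one for each treatment.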
2b.
> length(x)
[1] 98
> table1 <- data.frame(x) # Construct a data frame from the vector (one row per entry)
> table1
x
1 a
2 a
3 a
4 a
5 a
6 a
... (rows 7 to 92 omitted)
93 c
94 c
95 c
96 c
97 c
98 c
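The 2b listing does not show how x is built. A minimal sketch, assuming the vector is assembled with rep(): the name counts is an assumption, and table(x) is shown as one way to get the number of entries per value, whereas the listing above stores the raw entries in a data frame.

```r
# 2b (sketch): 25 "a", 15 "b" and 58 "c" in one character vector
x <- c(rep("a", 25), rep("b", 15), rep("c", 58))

# Length of the vector: 25 + 15 + 58 entries
print(length(x))

# Frequency table of the entries
counts <- table(x)
print(counts)
```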
3. a. Create three different variables, one of numeric type and the other two character vectors.
Use these to create a data frame of students (USN, Name, Marks).
b. Add a new numeric column (Age) to the existing data frame. Provide a summary of the data.
c. Display the list of students whose Age is less than 20 and whose Marks are greater than 25.
print(student)
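Only the final print(student) call is shown for this program. The steps leading up to it could look like the sketch below. All student values (USN, Name, Marks, Age) are invented purely for illustration, and the name selected is an assumption.

```r
# 3a (sketch): one numeric vector and two character vectors (sample data is made up)
USN   <- c("U001", "U002", "U003", "U004")
Name  <- c("Asha", "Ravi", "Meena", "Kiran")
Marks <- c(28, 22, 30, 26)
student <- data.frame(USN, Name, Marks)

# 3b: add a numeric Age column and summarise the data frame
student$Age <- c(19, 18, 21, 19)
print(summary(student))

# 3c: students with Age below 20 and Marks above 25
# The trailing comma selects rows and keeps every column
selected <- student[student$Age < 20 & student$Marks > 25, ]
print(selected)
```

With this sample data only the first and last students satisfy both conditions, so selected has two rows.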
4. Write a program to create a csv file for storing Employee data, containing the fields
(EmpID, EmpName, DOJ, EmpCode, Dept, Desig).
a. Read a suitable number of employee details from the user.
b. Create a data frame of employees.
c. Store the data frame in the csv file.
d. Read the data from the csv file and display the contents.
e. Append a new row to the csv file.
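a.
# Number of employees to read
n <- as.integer(readline(prompt = "Enter the number of employees: "))
# Create empty vectors of length n before filling them in the loops
EmpId <- character(n)
EmpName <- character(n)
DOJ <- character(n)
EmpCode <- integer(n)
Desig <- character(n)
Dept <- character(n)
print("Enter EmpId")
for (i in seq_len(n))
  EmpId[i] <- readline()
print("Enter EmployeeName")
for (i in seq_len(n))
  EmpName[i] <- readline()
print("Enter DOJ")
for (i in seq_len(n))
  DOJ[i] <- readline()
print("Enter EmployeeCode")
for (i in seq_len(n))
  EmpCode[i] <- as.integer(readline())
print("Enter Designation")
for (i in seq_len(n))
  Desig[i] <- readline()
print("Enter Dept")
for (i in seq_len(n))
  Dept[i] <- readline()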
b.
Emp <- data.frame(EmpId,EmpName,EmpCode,Desig,Dept,DOJ)
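c.
# row.names = FALSE keeps the file to the six data columns, so appended rows line up
write.csv(Emp, "C:/Users/ARCHANA/Documents/Empfile.csv", row.names = FALSE)
d.
# Read back the file written in step c and display it
readEmp <- read.csv("C:/Users/ARCHANA/Documents/Empfile.csv")
print(readEmp)
e.
print("Enter a new row")
u <- readline(prompt = "EmpId: ")
n <- readline(prompt = "EmpName: ")
m <- readline(prompt = "EmpCode: ")
A <- readline(prompt = "Desig: ")
s <- readline(prompt = "Dept: ")
t <- readline(prompt = "DOJ: ")
# Use the same column names and order as Emp
x <- data.frame(EmpId = u, EmpName = n, EmpCode = as.integer(m), Desig = A, Dept = s, DOJ = t)
# Append the row to the existing file without repeating the header
write.table(x, "C:/Users/ARCHANA/Documents/Empfile.csv", sep = ",",
            append = TRUE, col.names = FALSE, row.names = FALSE)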
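The program list also asks for the difference between csv and csv2 files, which the listing above does not cover. A minimal sketch of that comparison follows. The data frame emp, its Salary column and the temporary files are hypothetical, chosen only so that a decimal value makes the difference visible.

```r
# Hypothetical two-row data frame with one decimal column
emp <- data.frame(EmpId = c("E1", "E2"), Salary = c(1234.5, 987.25))

# Temporary files, used here instead of a fixed path
csv_file  <- tempfile(fileext = ".csv")
csv2_file <- tempfile(fileext = ".csv")

# write.csv uses "," as separator and "." as decimal mark;
# write.csv2 uses ";" as separator and "," as decimal mark
write.csv(emp, csv_file, row.names = FALSE)
write.csv2(emp, csv2_file, row.names = FALSE)

csv_lines  <- readLines(csv_file)
csv2_lines <- readLines(csv2_file)
print(csv_lines)
print(csv2_lines)

# Each file must be read with the matching reader to recover the numbers
from_csv  <- read.csv(csv_file)
from_csv2 <- read.csv2(csv2_file)
```

Both readers return the same data frame; only the text stored on disk differs, which matters when the file is opened in a locale that uses a comma as the decimal mark.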
5. Dataset example
a. List the data sets available in your system using a suitable command.
b. Select the “mtcars” data set; find and display the number of rows and columns in that data
set.
c. Find whether there are more automatic (0) or manual (1) transmission-type cars in the
dataset. Hint: the 9th column indicates the transmission type.
d. Get a scatter plot of ‘hp’ vs ‘weight’.
e. Change ‘am’, ‘cyl’ and ‘vs’ to integer and store the new dataset as ‘newmtc’.
f. Extract the cases where cylinder is less than 5.
a. data()
head(mtcars)
b. # Number of rows (observations)
rownum <- nrow(mtcars)
print(rownum)
# Number of columns (variables)
colnum <- ncol(mtcars)
print(colnum)
c. x <- data.frame(mtcars)
automatic <- 0
manual <- 0
# Column 9 is 'am': 0 = automatic, 1 = manual
for (i in 1:nrow(x)) {
  if (x[i, 9] == 1) manual <- manual + 1 else automatic <- automatic + 1
}
if (automatic > manual) {
  print("There are more automatic transmission type cars")
} else {
  print("There are more manual transmission type cars")
}
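No listing is given for part d. A minimal sketch, assuming base graphics and that ‘weight’ refers to the mtcars column wt (weight in units of 1000 lbs); the axis labels and title are illustrative choices.

```r
# 5d (sketch): scatter plot of horsepower against weight
# mtcars stores weight in the column 'wt' and horsepower in 'hp'
plot(mtcars$wt, mtcars$hp,
     xlab = "Weight (1000 lbs)",
     ylab = "Gross horsepower",
     main = "hp vs weight (mtcars)")
```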
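e. # Solution for e: columns 2, 8 and 9 are 'cyl', 'vs' and 'am'
newmtc <- data.frame(mtcars)
newmtc[,2] <- as.integer(newmtc[,2])
newmtc[,8] <- as.integer(newmtc[,8])
newmtc[,9] <- as.integer(newmtc[,9])
f. # The comma after the condition selects rows and keeps all columns
mtcars[mtcars$cyl < 5, ]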
a. df <- airquality
dim(df)
b. sapply(df,class)
c. #Printing the missing values
print("The Missing values are as follows")
xcolNames <- colnames(df)
x<- colSums(is.na(df))
print(x)
d. which(is.na(df))
sum(is.na(df))
df1<- as.data.frame(df)
e. # Recoding the missing values with the column mean
for (i in 1:4)
  df1[,i] <- ifelse(is.na(df[,i]), mean(df[,i], na.rm = TRUE), df[,i])
# Excluding the missing values
df2 <- na.omit(df)
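The two strategies above (recoding missing values with the column mean, and dropping incomplete rows) can be compared with a short check. This is only a sketch: it loops over all columns by name rather than the first four, and the loop variable col and the vector missing are assumptions introduced for illustration.

```r
df <- airquality

# Strategy 1: replace each missing value with the mean of its column
df1 <- df
for (col in names(df1)) {
  missing <- is.na(df1[[col]])
  df1[[col]][missing] <- mean(df1[[col]], na.rm = TRUE)
}

# Strategy 2: drop every row that contains at least one missing value
df2 <- na.omit(df)

# Compare: missing values before and after, and the resulting dimensions
print(sum(is.na(df)))
print(sum(is.na(df1)))
print(dim(df1))
print(dim(df2))
```

Mean recoding keeps every row of the data, while na.omit() returns a smaller data frame containing only the complete rows.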