DS Lab

Uploaded by 018 Neelima

St. Ann's College for Women
(Autonomous), Affiliated to Osmania University
NAAC Reaccredited with 'A+' Grade, College with Potential for Excellence by UGC
Mehdipatnam, Hyderabad - 500028

Certificate of Completion
This is to certify that………….………………….………………………
of MCA bearing Hall Ticket Number….…….….….…………….
has successfully completed the necessary practical
record work in the subject: Data Science. Her work has
been duly corrected and certified.

Signature of Internal Examiner Signature of External Examiner

Head of the Department

INDEX

1. R AS CALCULATOR APPLICATION
   a. Using with and without R objects on console
   b. Using mathematical functions on console
   c. Write an R script to create R objects for a calculator application and save it in a specified location on disk.

2. DESCRIPTIVE STATISTICS IN R
   a. Write an R script to find basic descriptive statistics using the summary, str and quantile functions on the mtcars and cars datasets.
   b. Write an R script to find a subset of a dataset by using the subset() and aggregate() functions on the iris dataset.

3. READING AND WRITING DIFFERENT TYPES OF DATASETS
   a. Reading different types of data sets (.txt, .csv) from the web and disk, and writing a file to a specific disk location.
   b. Reading an Excel data sheet in R.
   c. Reading an XML dataset in R.

4. VISUALIZATIONS
   a. Find the data distributions using box and scatter plots.
   b. Find the outliers using plots.
   c. Plot the histogram, bar chart and pie chart on sample data.

5. CORRELATION AND COVARIANCE
   a. Find the correlation matrix.
   b. Plot the correlation plot on the dataset and visualize it, giving an overview of the relationships among the iris data.
   c. Analysis of covariance/variance (ANOVA), if the data have categorical variables, on the iris data.

6. REGRESSION MODEL
   Import data from web storage. Name the dataset and perform logistic regression to find the relation between the variables affecting the admission of a student to an institute, based on his or her GRE score, GPA obtained and rank. Also check whether the model fits. Require(foreign), require(MASS).

7. MULTIPLE REGRESSION MODEL
   Apply multiple regression if the data have a continuous independent variable. Apply it on the above dataset.

8. REGRESSION MODEL FOR PREDICTION
   Apply regression model techniques to predict the data on the above dataset.

9. CLASSIFICATION MODEL
   a. Install the relevant package for classification.
   b. Choose a classifier for the classification problem.
   c. Evaluate the performance of the classifier.

10. CLUSTERING MODEL
   a. Clustering algorithms for unsupervised classification.
   b. Plot the cluster data using R visualizations.
1. R AS CALCULATOR APPLICATION
a. Using with and without R objects on console
b. Using mathematical functions on console
c. Write an R script to create R objects for a calculator application and save it in a
specified location on disk.

R AS CALCULATOR APPLICATION:
# Write an R script to create R objects for a calculator application
add <- function(x, y) {
return(x + y)
}
subtract <- function(x, y) {
return(x - y)
}
multiply <- function(x, y) {
return(x * y)
}
divide <- function(x, y) {
return(x / y)
}
# take input from the user
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice = as.integer(readline(prompt="Enter choice[1/2/3/4]: "))
num1 = as.integer(readline(prompt="Enter first number: "))
num2 = as.integer(readline(prompt="Enter second number: "))
operator <- switch(choice,"+","-","*","/")
result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2),
divide(num1, num2))
print(paste(num1,operator, num2, "=", result))
Output:

[1] "Select operation."

[1] "1.Add"

[1] "2.Subtract"

[1] "3.Multiply"

[1] "4.Divide"

Enter choice[1/2/3/4]: 4

Enter first number: 20

Enter second number: 4

[1] "20 / 4 = 5"
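Part (c) also asks for the R objects to be saved to a specified location on disk. A minimal sketch, assuming the four calculator functions above are already defined in the session (the file path below is an example, not from the record):

```r
# Save the calculator functions to a chosen location on disk
# (the path is an example; adjust it to your system)
save(add, subtract, multiply, divide, file = "D:/DSLab/calculator.RData")

# In a later session the objects can be restored with:
# load("D:/DSLab/calculator.RData")
```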

2. DESCRIPTIVE STATISTICS IN R
a. Write an R script to find basic descriptive statistics using the summary, str and quantile
functions on the mtcars & cars datasets.
Compute the minimum, 1st quartile, median, mean, 3rd quartile and the maximum for all numeric
variables of a dataset at once using summary():

step1: summarydata = summary(mtcars)
step2: write.csv(summarydata, "x.csv", row.names = FALSE)
step3: read.csv("x.csv")
OUTPUT:
      mpg             cyl             disp             hp             drat             wt             qsec             vs
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695   Median :3.325   Median :17.71   Median :0.0000
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000

summary(cars)
step1: summarydata = summary(cars)
step2: write.csv(summarydata, "x.csv", row.names = FALSE)
step3: read.csv("x.csv")
output:
     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00

> str(mtcars)

The str() function in R is used to compactly display the internal structure of an R
object. It can display even the internal structure of large nested lists. It provides a one-line
output for basic R objects, letting the user know about the object and its constituents.

RANGE: The range can then be easily computed, as you may have guessed, by subtracting the
minimum from the maximum; range() returns both values at once.

> range(mtcars$mpg)

Output: [1] 10.4 33.9

The quantile() function in R is used to compute sample quantiles of a data set, for
probabilities in [0, 1].

The first quartile is at 0.25 (25%), the second at 0.50 (50%), and the third at 0.75 (75%).

Step1: quantile(mtcars$cyl)

  0%  25%  50%  75% 100%
   4    4    6    8    8
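quantile() also accepts a probs argument for arbitrary probabilities, not just the default quartiles; for example:

```r
# 10th, 50th and 90th percentiles of miles-per-gallon in mtcars
quantile(mtcars$mpg, probs = c(0.1, 0.5, 0.9))
```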

b. Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the
iris dataset.

Subsetting in R is a useful indexing feature for accessing object elements. It can be used to
select and filter variables and observations.

The aggregate() function in R splits the data into subsets, computes summary statistics for each
subset, and returns the result in a grouped form. It is similar to GROUP BY in SQL and is useful for
performing aggregate operations such as sum, count, mean, minimum and maximum.

# load dataset iris into r
> r = iris
> s = subset(r, r$Species == "virginica")
> write.csv(s, file = "su.csv")
> read.csv("su.csv")
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
101 6.3 3.3 6 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
103 7.1 3 5.9 2.1 virginica
104 6.3 2.9 5.6 1.8 virginica
105 6.5 3 5.8 2.2 virginica
106 7.6 3 6.6 2.1 virginica
107 4.9 2.5 4.5 1.7 virginica
108 7.3 2.9 6.3 1.8 virginica
109 6.7 2.5 5.8 1.8 virginica
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2 virginica
112 6.4 2.7 5.3 1.9 virginica
113 6.8 3 5.5 2.1 virginica
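The subset() output above covers one half of the task; a short aggregate() sketch for the other half:

```r
# aggregate(): mean of every numeric column of iris, grouped by Species
aggregate(. ~ Species, data = iris, FUN = mean)
```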

3A. Reading different types of data sets (.txt, .csv) from Web and disk and writing in file in
specific disk location

Create a Fruit.csv file with the following data:

Fruit Name   Fruit Color   Fruit Price

Apple        Red           100
Banana       Yellow        60
Watermelon   Green         120
Pineapple    Yellow        80
Grapes       Green         130
Banana       Green         50
Apple        Green         150

Reading CSV file using read.csv() function

read.csv("Fruit.csv")

Reading CSV file using read.table() function

read.table ("Fruit.csv", header=TRUE, sep=",")

OUTPUT:

Fruit.Name Fruit.Color Fruit.Price


1 Apple Red 100
2 Banana Yellow 60
3 Watermelon Green 120
4 Pineapple Yellow 80
5 Grapes Green 130
6 Banana Green 50
7 Apple Green 150
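The exercise also calls for reading from the web and writing to a specific disk location. A minimal sketch (the output path and the URL are examples, not from the record):

```r
fruits <- read.csv("Fruit.csv")

# Write the data frame to a chosen disk location
write.csv(fruits, "D:/DSLab/fruits_copy.csv", row.names = FALSE)

# Reading a .csv (or .txt) directly from the web works the same way,
# given a URL that serves the file (hypothetical URL):
# web_data <- read.csv("https://example.com/Fruit.csv")
```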

3b Reading Excel data sheet in R.

Softdrinks.xlsx

Soft drinks Price


Pepsi 40
ThumpsUP 60
Maaza 70
Limca 80
Sprite 90

Convert an Excel worksheet to a text file by using the Save As command:
1. Go to File > Save As.
2. Click Browse.
3. In the Save As dialog box, under the Save as type box, choose the text file format for the
worksheet; for example, click Text (Tab delimited) or CSV (Comma delimited).

Reading the spreadsheet (.xlsx) after converting it into a .csv file:

read.csv("softdrinks.csv")

OUTPUT:

Soft.drinks Price
1 Pepsi 40
2 ThumpsUP 60
3 Maaza 70
4 Limca 80
5 Sprite 90
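Alternatively, an .xlsx file can be read directly, without converting it first, using the readxl package (an assumption — this package is not part of the original record):

```r
# install.packages("readxl")   # once, if not already installed
library(readxl)

# read_excel() reads the first sheet by default
drinks <- read_excel("Softdrinks.xlsx")
print(drinks)
```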

3c. Reading XML dataset in R

<RECORDS>
<EMPLOYEE>
<ID>5</ID>
<NAME>Gary</NAME>
<SALARY>843.25</SALARY>
<STARTDATE>3/27/2015</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>6</ID>
<NAME>Nina</NAME>
<SALARY>578</SALARY>
<STARTDATE>5/21/2013</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>7</ID>
<NAME>Simon</NAME>
<SALARY>632.8</SALARY>
<STARTDATE>7/30/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>8</ID>
<NAME>Guru</NAME>
<SALARY>722.5</SALARY>
<STARTDATE>6/17/2014</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>

</RECORDS>

Load the package required to read XML files:

library("XML")

Load the other required package:

library("methods")

Give the input file name to the function:

xmldataframe <- xmlToDataFrame("employee.xml")

Print the output:

print(xmldataframe)

Output:
ID NAME SALARY STARTDATE DEPT
1 5 Gary 843.25 3/27/2015 Finance
2 6 Nina 578 5/21/2013 IT
3 7 Simon 632.8 7/30/2013 Operations
4 8 Guru 722.5 6/17/2014 Finance

VISUALIZATIONS
4a. Find the data distributions using box and scatter plot.

In R, a box-and-whisker plot is created using the boxplot() function.


The boxplot() function takes in any number of numeric vectors, drawing a boxplot for each
vector.
You can also pass in a list (or data frame) with numeric vectors as its components. Let us use the
built-in dataset airquality which has “Daily air quality measurements in New York, May to
September”
> str(airquality)

OUTPUT:

'data.frame': 153 obs. of 6 variables:

$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...

$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...

$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...

$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...

$ Month : int 5 5 5 5 5 5 5 5 5 5 ...

$ Day : int 1 2 3 4 5 6 7 8 9 10 ...

>boxplot (airquality)
>boxplot(airquality$Ozone)
OUTPUT:

>boxplot(airquality$Ozone,main = "Mean ozone in parts per billion at Roosevelt


Island",xlab = "Parts Per Billion",ylab = "Ozone",col = "orange",border =
"brown",horizontal = TRUE,notch = TRUE)
OUTPUT:

Scatterplot:

Plotting a scatter plot: the function is plot(), which takes two vectors, one for the x axis and
one for the y axis. The objective is to understand the relationship between numbers and their
sines. We will use two vectors: x, which holds a sequence of values between 1 and 25 at an
interval of 0.1, and y, which stores the sines of all the values held in x.

> x <-seq(1, 25, 0.1)


> y <-sin(x)

The plot function takes the values in the vector x and plots them on the horizontal axis, with the
values of y on the vertical axis.

> plot(x, y)
OUTPUT:

Scatter Diagram

4b. Find the outliers using plot.

Remove all the existing objects:


>rm(list = ls())

#Setting the working directory


>setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
>getwd()

#Load the dataset


>bike_data = read.csv("day.csv",header=TRUE)

### Missing Value Analysis ###


>sum(is.na(bike_data))
>summary(is.na(bike_data))

#From the above result, it is clear that the dataset contains NO Missing Values.
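The record stops at the missing-value check. A sketch of actually locating outliers with a plot, using the built-in airquality data since the bike-rental file is not available here (dataset choice is an assumption):

```r
# Points beyond the boxplot whiskers are the outliers;
# boxplot.stats() returns them in the $out component
boxplot(airquality$Ozone, main = "Ozone readings")
outliers <- boxplot.stats(airquality$Ozone)$out
print(outliers)   # the outlying Ozone readings
```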

4c. Plot the histogram, bar chart and pie chart on sample data

# Create data for the histogram

h <- c(8, 13, 30, 5, 28)

# Create the histogram for h

hist(h)
Output:

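The task also asks for a bar chart and a pie chart; on the same sample vector (the category labels are an assumption for illustration):

```r
h <- c(8, 13, 30, 5, 28)
labels <- c("A", "B", "C", "D", "E")   # example category names

# Bar chart of the sample data
barplot(h, names.arg = labels, main = "Bar chart")

# Pie chart of the same values
pie(h, labels = labels, main = "Pie chart")
```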
CORRELATION AND COVARIANCE
5a. Find the correlation matrix.
A correlation matrix is a table of correlation coefficients for a set of variables used to
determine if a relationship exists between the variables. The coefficient indicates both the
strength of the relationship as well as the direction (positive vs. negative correlations).

install.packages("corrplot")

source ("http://www.sthda.com/upload/rquery_cormat.r")
mydata <- mtcars[, c(1,3,4,5,6,7)]
head(mydata)
mpg disp hp drat wt qsec
Mazda RX4 21.0 160 110 3.90 2.620 16.46
Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02
Datsun 710 22.8 108 93 3.85 2.320 18.61
Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44
Hornet Sportabout 18.7 360 175 3.15 3.440 17.02
Valiant 18.1 225 105 2.76 3.460 20.22
>rquery.cormat(mydata)
$r
hp disp wt qsec mpg drat
hp 1
disp 0.79 1
wt 0.66 0.89 1
qsec -0.71 -0.43 -0.17 1
mpg -0.78 -0.85 -0.87 0.42 1
drat -0.45 -0.71 -0.71 0.091 0.68 1
$p
hp disp wt qsec mpg drat
hp 0
disp 7.1e-08 0
wt 4.1e-05 1.2e-11 0
qsec 5.8e-06 0.013 0.34 0
mpg 1.8e-07 9.4e-10 1.3e-10 0.017 0
drat 0.01 5.3e-06 4.8e-06 0.62 1.8e-05 0
$sym
hp disp wt qsec mpg drat
hp 1
disp , 1
wt , + 1
qsec , . 1
mpg , + + . 1
drat . , , , 1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1

mydata <- iris[, c(1,2,3,4)]
head(mydata)
>rquery.cormat(mydata)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4

5b. Plot the correlation plot on dataset and visualize giving an overview of relationships
among data on iris data.
Step 1- Load the relevant libraries

>library(ggplot2)
>library(tidyr)
>library(datasets)
>data("iris")
>summary(iris)
Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
Median :5.800   Median :3.000   Median :4.350   Median :1.300
Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
      Species
 setosa    :50
 versicolor:50
 virginica :50

Step 2 - Create a correlation matrix of the iris dataset using the plot_correlation() function
from the DataExplorer package (the function used in class in lab 3). Include only continuous
variables in the correlation plot to avoid confusion, as factor variables do not make sense in a
correlation plot.

>library(corrplot)
>library(DataExplorer)
>plot_correlation(iris[, 1:4])

Step 3 - Create three separate correlation matrices for each species of iris flower
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

>m <- levels(iris$Species)
>title0 <- "Setosa"
>setosaCorr = cor(iris[iris$Species == m[1], 1:4])
>corrplot(setosaCorr, method = "number", title = title0, mar = c(0, 0, 1, 0))
Output:
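The same procedure extends to the other two species (versicolor, virginica); a sketch looping over all Species levels:

```r
library(corrplot)

# One numeric correlation plot per species
for (sp in levels(iris$Species)) {
  corr <- cor(iris[iris$Species == sp, 1:4])
  corrplot(corr, method = "number", title = sp, mar = c(0, 0, 1, 0))
}
```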

5c. Analysis of covariance: variance (ANOVA), if data have categorical variables on iris
data.
>input <- mtcars[,c("am","mpg","hp")]
>print(head(input))

                  am  mpg  hp
Mazda RX4          1 21.0 110
Mazda RX4 Wag      1 21.0 110
Datsun 710         1 22.8  93
Hornet 4 Drive     0 21.4 110
Hornet Sportabout  0 18.7 175
Valiant            0 18.1 105
Model with interaction between categorical variable and predictor variable

>input <- mtcars


>result <- aov(mpg~hp*am,data = input)
>print(summary(result))

OUTPUT:

Df Sum Sq Mean Sq F value Pr(>F)


hp 1 678.4 678.4 77.391 1.50e-09 ***
am 1 202.2 202.2 23.072 4.75e-05 ***
hp:am 1 0.0 0.0 0.001 0.981
Residuals 28 245.4 8.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Model without interaction between categorical variable and predictor variable


>input <- mtcars
>result <- aov(mpg~hp+am,data = input)
>print(summary(result))

OUTPUT:

Df Sum Sq Mean Sq F value Pr(>F)


hp 1 678.4 678.4 80.15 7.63e-10 ***
am 1 202.2 202.2 23.89 3.46e-05 ***
Residuals 29 245.4 8.5
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

Comparing Two Models


>input <- mtcars
>result1 <- aov(mpg~hp*am,data = input)
>result2 <- aov(mpg~hp+am,data = input)
>print(anova(result1,result2))

OUTPUT:

Analysis of Variance Table


Model 1: mpg ~ hp * am
Model 2: mpg ~ hp + am
Res.Df RSS Df Sum of Sq F Pr(>F)
1 28 245.43
2 29 245.44 -1 -0.0052515 6e-04 0.9806

REGRESSION MODEL
6. Import data from web storage. Name the dataset and now do logistic regression to
find out the relation between the variables that are affecting the admission of a student to an
institute based on his or her GRE score, GPA obtained and rank. Also check whether the
model fits or not. Require(foreign), require(MASS).

>library(rio)
>data <- import("binary.sas7bdat")

Data Cleaning:
Looking at the structure of data set

>str(data)

## 'data.frame': 400 obs. of 4 variables:


## $ ADMIT: num 0 1 1 1 0 1 1 0 1 0 ...
## $ GRE : num 380 660 800 640 520 760 560 400 540 700 ...
## $ GPA : num 3.61 3.67 4 3.19 2.93 ...
## $ RANK : num 3 3 1 4 4 2 1 2 3 2 ...
## - attr(*, "label")= chr "LOGIT"

Variables ADMIT and RANK are of type numeric, but they should be factor variables since
we are not going to perform any mathematical operations on them.

>data$ADMIT<-as.factor(data$ADMIT)
>data$RANK<- as.factor(data$RANK)
>str (data)
# 'data.frame': 400 obs. of 4 variables:
## $ ADMIT: Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 1 2 1 ...
## $ GRE : num 380 660 800 640 520 760 560 400 540 700 ...
## $ GPA : num 3.61 3.67 4 3.19 2.93 ...
## $ RANK : Factor w/ 4 levels "1","2","3","4": 3 3 1 4 4 2 1 2 3 2 ...
## - attr(*, "label")= chr "LOGIT"
Looking at the summary of the dataset
>summary (data)
##  ADMIT        GRE             GPA         RANK
##  0:273   Min.   :220.0   Min.   :2.260   1: 61
##  1:127   1st Qu.:520.0   1st Qu.:3.130   2:151
##          Median :580.0   Median :3.395   3:121
##          Mean   :587.7   Mean   :3.390   4: 67
##          3rd Qu.:660.0   3rd Qu.:3.670
##          Max.   :800.0   Max.   :4.000

From the summary statistics we observe:

• Most of the students did not get admitted.

• There are no missing data values (NAs).

Checking for multicollinearity:

>plot(data$GPA, data$GRE, col = "red")
>cor(data$GRE, data$GPA)
Exploratory Data Analysis
We will explore the relationship between the dependent and independent variables by way of
visualization.
GRE
Since GRE is a numeric variable and the dependent variable is a factor variable, we plot a box plot:

library(ggplot2) # For plotting
ggplot(data,aes(ADMIT,GRE,fill=ADMIT))+
geom_boxplot()+
theme_bw()+
xlab("Admit")+
ylab("GRE")+
ggtitle("ADMIT BY GRE")

The two box plots differ in terms of displacement, and hence GRE is a significant
variable.
GPA
ggplot(data,aes(ADMIT,GPA,fill=ADMIT))+
geom_boxplot()+
theme_bw()+
xlab("Admit")+
ylab("GPA")+
ggtitle("ADMIT BY GPA")
There is a clear difference in displacement between the two box plots; hence GPA is an important
predictor.
RANK
RANK is a factor variable, and since the dependent variable is also a factor variable, we plot a bar plot.
ggplot(data,aes(RANK,ADMIT,fill=ADMIT))+
geom_col()+
xlab("RANK")+
ylab("COUNT-ADMIT")+
ggtitle("ADMIT BY RANK")

Modelling
Data Splitting
Before we fit a model, we need to split the dataset into training and test dataset to be able to
assess the performance of the model with the unseen test dataset.
library(caret)   # For data splitting
set.seed(125)    # For reproducibility
ind <- createDataPartition(data$ADMIT, p = 0.80, list = FALSE)
training <- data[ind,]    # Training data set
testing <- data[-ind,]    # Testing data set
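The record ends at the data split; the logistic regression itself can be sketched as follows (the model formula is assumed from the task statement, not shown in the original):

```r
# Fit a logistic regression model on the training data
model <- glm(ADMIT ~ GRE + GPA + RANK, data = training,
             family = binomial(link = "logit"))
summary(model)   # Wald z-tests for each coefficient

# A rough goodness-of-fit check: chi-squared tests comparing
# the deviance reduction as each term is added
anova(model, test = "Chisq")
```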

MULTIPLE REGRESSION MODEL
7. Apply multiple regression, if the data have a continuous independent variable. Apply it on the
above dataset.
tidyverse is used for data manipulation and visualization:

>library(tidyverse)

We’ll use the marketing data set (from the datarium package), which contains the impact of the
amount of money spent on three advertising media (youtube, facebook and newspaper) on sales.
First install the datarium package using devtools::install_github("kassambara/datarium"), then load
and inspect the marketing data as follows:
>data("marketing", package = "datarium")
> head(marketing, 4)

## youtube facebook newspaper sales


## 1 276.1 45.4 83.0 26.5
## 2 53.4 47.2 54.1 12.5
## 3 20.6 55.1 83.2 11.2
## 4 181.8 49.6 70.2 22.2

Building the model
We want to build a model for estimating sales based on the advertising budget invested in youtube,
facebook and newspaper:
sales = b0 + b1*youtube + b2*facebook + b3*newspaper
You can compute the model coefficients in R as follows:
>model <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
>summary(model)
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.59 -1.07 0.29 1.43 3.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.52667 0.37429 9.42 <2e-16 ***
## youtube 0.04576 0.00139 32.81 <2e-16 ***
## facebook 0.18853 0.00861 21.89 <2e-16 ***
## newspaper -0.00104 0.00587 -0.18 0.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.02 on 196 degrees of freedom
## Multiple R-squared: 0.897, Adjusted R-squared: 0.896
## F-statistic: 570 on 3 and 196 DF, p-value: <2e-16

Interpretation
The first step in interpreting the multiple regression analysis is to examine the F-statistic and the
associated p-value, at the bottom of model summary.
In our example, it can be seen that p-value of the F-statistic is < 2.2e-16, which is highly significant.
This means that, at least, one of the predictor variables is significantly related to the outcome
variable.
To see which predictor variables are significant, you can examine the coefficients table, which
shows the estimates of the regression beta coefficients and the associated t-statistic p-values:
>summary(model)$coefficient

## Estimate Std. Error t value Pr(>|t|)


## (Intercept) 3.52667 0.37429 9.422 1.27e-17
## youtube 0.04576 0.00139 32.809 1.51e-81
## facebook 0.18853 0.00861 21.893 1.51e-54
## newspaper -0.00104 0.00587 -0.177 8.60e-01

As the newspaper variable is not significant, it is possible to remove it from the model:
>model <- lm(sales ~ youtube + facebook, data = marketing)
>summary(model)
##

## Call:
## lm(formula = sales ~ youtube + facebook, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.557 -1.050 0.291 1.405 3.399
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.50532 0.35339 9.92 <2e-16 ***
## youtube 0.04575 0.00139 32.91 <2e-16 ***
## facebook 0.18799 0.00804 23.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.02 on 197 degrees of freedom
## Multiple R-squared: 0.897, Adjusted R-squared: 0.896
## F-statistic: 860 on 2 and 197 DF, p-value: <2e-16
Finally, our model equation can be written as follows: sales = 3.5 + 0.045*youtube +
0.187*facebook.
The confidence intervals of the model coefficients can be extracted as follows:
>confint(model)

## 2.5 % 97.5 %
## (Intercept) 2.808 4.2022
## youtube 0.043 0.0485
## facebook 0.172 0.2038

Model accuracy assessment


Residual Standard Error (RSE), or sigma:
The RSE estimate gives a measure of error of prediction. The lower the RSE, the more accurate
the model (on the data in hand).
The error rate can be estimated by dividing the RSE by the mean outcome variable:
>sigma(model)/mean(marketing$sales)

## [1] 0.12

That is, the average prediction error is about 12% of the mean sales.

REGRESSION MODEL FOR PREDICTION

8. Apply regression Model techniques to predict the data on above dataset.

Predicting Blood pressure using Age by Regression in R


We take a dataset of blood pressure and age, and with its help train a linear regression model
in R that will be able to predict blood pressure at ages that are not present in our dataset.

Equation of the regression line in our dataset

BP = 98.7147 + 0.9709 Age

Importing dataset
Importing a dataset of Age vs Blood Pressure which is a CSV file using function read.csv( ) in R
and storing this dataset into a data frame bp.

>bp <- read.csv("bp.csv")

Creating data frame for predicting values

Creating a data frame that stores Age 53; it will be used to predict blood pressure at Age 53
after the linear regression model is created.

>p <- as.data.frame(53)


>colnames(p) <- "Age"
Calculating the correlation between Age and Blood pressure

We can also verify our above analysis that there is a correlation between Blood pressure and Age
by taking the help of cor( ) function in R which is used to calculate the correlation between two
variables.

>cor(bp$BP,bp$Age)

[1] 0.6575673
Creating a Linear regression model
Now, with the help of the lm() function, we are going to make a linear model. lm() takes two
arguments: first a formula, where we use "BP ~ Age" because Age is the independent variable and
blood pressure the dependent variable; and second data, the name of the data frame containing
the data, which in this case is bp.

model <- lm(BP ~ Age, data = bp)

Summary of our linear regression model
summary(model)
Output:

##
## Call:
## lm(formula = BP ~ Age, data = bp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.724 -6.994 -0.520 2.931 75.654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.7147 10.0005 9.871 1.28e-10 ***
## Age 0.9709 0.2102 4.618 7.87e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.31 on 28 degrees of freedom
## Multiple R-squared: 0.4324, Adjusted R-squared: 0.4121
## F-statistic: 21.33 on 1 and 28 DF, p-value: 7.867e-05
Interpretation of the model

## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.7147 10.0005 9.871 1.28e-10 ***
## Age 0.9709 0.2102 4.618 7.87e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
B0 = 98.7147 (Y- intercept)
B1 = 0.9709 (Age coefficient)
BP = 98.7147 + 0.9709 Age
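The data frame p created earlier can now be passed to predict() to obtain the fitted blood pressure at Age 53 (a sketch; the record does not show this final step):

```r
# Predict blood pressure at Age 53 using the fitted model
predict(model, newdata = p)
# By the fitted equation: 98.7147 + 0.9709 * 53 ≈ 150.2
```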

CLASSIFICATION MODEL
9a. Install relevant package for classification.
9b. Choose classifier for classification problem
9c. Evaluate the performance of classifier
The R package "party" is used to create decision trees.
Install R Package
Use the below command in R console to install the package. You also have to install the dependent
packages if any.
>install.packages("party")
The package "party" has the function ctree() which is used to create and analyze decision trees.
Syntax
The basic syntax for creating a decision tree in R is −
>ctree(formula, data)
INPUT DATA:
>library(party)
>print(head(readingSkills))
When we execute the above code, it produces the following result and chart −
nativeSpeaker age shoeSize score
1 yes 5 24.83189 32.29385
2 yes 6 25.95238 36.63105
3 no 11 30.42170 49.60593
4 yes 7 28.66450 40.28456
5 yes 11 31.88207 55.46085
6 yes 10 30.07843 52.83124
Loading required package: methods
Loading required package: grid
...............................

Example
We will use the ctree() function to create the decision tree and see its graph.
# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Create the input data frame.


input.data <- readingSkills[c(1:105),]

# Give the chart file a name.


png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.data)

# Plot the tree.


plot(output.tree)

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
null device
1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

as.Date, as.Date.numeric

Loading required package: sandwich
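Part (c) — the record builds the tree but never evaluates it. A sketch of evaluating the classifier on held-out rows (the 106–200 train/test split is an assumption; readingSkills has 200 rows):

```r
library(party)

input.data <- readingSkills[1:105, ]      # training rows (as above)
test.data  <- readingSkills[106:200, ]    # held-out rows

output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score,
                     data = input.data)

# Predict on the unseen rows, then summarize performance
pred <- predict(output.tree, newdata = test.data)
print(table(Predicted = pred, Actual = test.data$nativeSpeaker))  # confusion matrix
cat("Accuracy:", mean(pred == test.data$nativeSpeaker), "\n")
```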

CLUSTERING MODEL
10a. Clustering algorithms for unsupervised classification
10b. Plot the cluster data using R visualizations

k-means clustering
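The matrix x is assumed to already exist in the session; one way such 300×2 sample data could be simulated (a sketch, not the original data):

```r
set.seed(1)
# Three roughly separated 2-D clusters, 300 points in total
x <- rbind(
  cbind(rnorm(100, mean = 2),   rnorm(100, mean = 2)),   # cluster near (2, 2)
  cbind(rnorm(150, mean = -5),  rnorm(150, mean = 2)),   # cluster near (-5, 2)
  cbind(rnorm(50,  mean = 0.7), rnorm(50,  mean = 0))    # cluster near (0.7, 0)
)
```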

str(x)
## num [1:300, 1:2] 3.37 1.44 2.36 2.63 2.4 ...
head(x)
## [,1] [,2]
## [1,] 3.370958 1.995379
## [2,] 1.435302 2.760242
## [3,] 2.363128 2.038991
## [4,] 2.632863 2.735072
## [5,] 2.404268 1.853527
## [6,] 1.893875 1.942113
# Create the k-means model: km.out
km.out <- kmeans(x, centers = 3, nstart = 20)

# Inspect the result


summary(km.out)
## Length Class Mode
## cluster 300 -none- numeric
## centers 6 -none- numeric
## totss 1 -none- numeric
## withinss 3 -none- numeric
## tot.withinss 1 -none- numeric
## betweenss 1 -none- numeric
## size 3 -none- numeric
## iter 1 -none- numeric
## ifault 1 -none- numeric

– Results of kmeans()

# Print the cluster membership component of the model


km.out$cluster
## [1] 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 3 3 3 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [176] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 2
## [281] 3 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 2 3 3
# Print the km.out object
km.out
## K-means clustering with 3 clusters of sizes 150, 98, 52
##
## Cluster means:
## [,1] [,2]
## 1 -5.0556758 1.96991743
## 2 2.2171113 2.05110690
## 3 0.6642455 -0.09132968
##
## Clustering vector:
## [1] 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 3 3 3 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [176] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 2
## [281] 3 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 2 3 3
##
## Within cluster sum of squares by cluster:
## [1] 295.16925 148.64781 95.50625
## (between_SS / total_SS = 87.2 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"

– Visualizing and interpreting results of kmeans()

# Scatter plot of x
plot(x,
col = km.out$cluster,
main = "k-means with 3 clusters",
xlab = "",
ylab = "")
