Input Data
Below is the sample data representing the observations −
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
lm() Function
This function creates the relationship model between the predictor and the
response variable.
Syntax
The basic syntax for lm() function in linear regression is −
lm(formula,data)
Following is the description of the parameters used −
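formula is a symbolic description of the relation between x and y, for example y ~ x.
data is the data frame containing the variables named in the formula. It can be omitted when those variables already exist in the workspace, as in the example that follows.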
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
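The data parameter can be illustrated with a minimal sketch. The data frame name hw, the column names height and weight, and the object name relation_df are assumptions introduced here for illustration; the values are the sample data above.

```r
# Hypothetical data frame holding the sample observations.
hw <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
)

# Fit weight as a linear function of height, taking both columns from hw.
relation_df <- lm(weight ~ height, data = hw)

# coef() returns the intercept and slope as a named numeric vector.
print(coef(relation_df))
```

Keeping the variables in a data frame makes it easier to reuse the same column names later when calling predict().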
predict() Function
Syntax
The basic syntax for predict() in linear regression is −
predict(object, newdata)
Following is the description of the parameters used −
object is the fitted model which is already created using the lm() function.
newdata is the data frame containing the new values for the predictor variable.
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
When we execute the above code, it produces the following result −
1
76.22869
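As a cross-check, the same prediction can be reproduced by hand from the fitted intercept and slope. This is a minimal sketch; the variable names b, manual and from_predict are introduced here for illustration only.

```r
# Sample data: heights (predictor) and weights (response).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Extract the intercept and slope from the fitted model.
b <- coef(relation)

# Prediction by hand: intercept + slope * height.
manual <- b[["(Intercept)"]] + b[["x"]] * 170

# Prediction via predict(), with the name dropped for comparison.
from_predict <- unname(predict(relation, data.frame(x = 170)))

print(manual)
print(all.equal(manual, from_predict))
```

The column name in newdata must match the predictor name used in the formula (here x), otherwise predict() cannot find the new values.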
Visualize the Regression Graphically
# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
# Give the chart file a name.
png(file = "linearregression.png")
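# Plot the chart with the predictor (height) on the x-axis.
plot(x, y, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")
# Add the fitted regression line from the model above.
abline(relation)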
# Save the file.
dev.off()
Note:
model <- lm(y ~ x1 + x2)
summary(model)
Fit the Model: The lm() function is used to fit a linear regression model.
Summary: The summary() function is used to view detailed results of the regression
model, including coefficients, standard errors, t-values, p-values, R-squared, etc.
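Linear regression equation
y = β0 + β1·x + β2·z + ε
where:
y is the response (also called outcome, or dependent variable)
x and z are the predictors (also called features, or independent variables)
β0 is the intercept, β1 and β2 are the regression coefficients, and ε is the error term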
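A model with a response y and two predictors x and z can be sketched with simulated data. Everything in this block is an assumption made for illustration: the true values b0, b1 and b2, the sample size, and the noise level are invented, and the data is random rather than taken from the examples in this document.

```r
# Simulated data, for illustration only.
set.seed(42)
n <- 200
x <- rnorm(n)
z <- rnorm(n)

# Hypothetical true intercept and coefficients.
b0 <- 2
b1 <- 3
b2 <- -1.5

# Response = linear combination of the predictors plus random noise.
y <- b0 + b1 * x + b2 * z + rnorm(n, sd = 0.5)

# Fit the model; the estimates should land close to b0, b1 and b2.
sim_model <- lm(y ~ x + z)
print(coef(sim_model))
```

Because the noise is small relative to the effects, the estimated coefficients are expected to be near the values used to generate the data, though not identical.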
2. boxplot(model$residuals)
Note:
Normality
To check whether the dependent variable follows a normal distribution, use
the hist() function.
3. hist(income.data$happiness)
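A histogram is a visual check. As a numeric complement, base R offers the Shapiro-Wilk test through shapiro.test(); this test is not mentioned in the source and is swapped in here as one possible option. Since income.data is not defined in this section, the sketch below uses the height and weight sample data and inspects the model residuals instead.

```r
# Sample data: heights (predictor) and weights (response).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Residuals are the differences between observed and fitted weights.
res <- residuals(relation)

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
sw <- shapiro.test(res)
print(sw$p.value)
```

With only ten observations the test has little power, so the result is best read alongside a plot rather than on its own.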
Linearity
The relationship between the independent and dependent variable must be linear.
We can test this visually with a scatter plot to see if the distribution of data points
could be described with a straight line.
plot(happiness ~ income, data = income.data)
Multiple regression
1. Independence of observations
Use the cor() function to test the relationship between your independent variables
and make sure they aren’t too highly correlated.
cor(heart.data$biking, heart.data$smoking)
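When there are more than two predictors, cor() can be applied to all of them at once to get a correlation matrix. The sketch below assumes a data frame named predictors (a name introduced here); its values are borrowed from the rating example in this document, because heart.data is not defined in this section.

```r
# Predictor columns taken from the rating example.
predictors <- data.frame(
  points   = c(8, 12, 16, 15, 22, 28, 24),
  assists  = c(4, 6, 6, 5, 3, 8, 7),
  rebounds = c(1, 4, 3, 3, 2, 6, 7)
)

# Pairwise correlations between all predictors.
cor_matrix <- cor(predictors)
print(round(cor_matrix, 2))
```

Off-diagonal entries close to 1 or -1 would indicate predictors that are highly correlated with each other.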
You can use the following methods to extract regression coefficients from the lm() function in R:
Method 1: Extract Regression Coefficients Only
model$coefficients
Method 2: Extract Regression Coefficients with Standard Errors, t-Statistics, and p-values
summary(model)$coefficients
The following example shows how to use these methods in practice.
Example: Extract Regression Coefficients from lm() in R
Suppose we fit the following multiple linear regression model
in R:
#create data frame
df <- data.frame(rating=c(67, 75, 79, 85, 90, 96, 97),
                 points=c(8, 12, 16, 15, 22, 28, 24),
                 assists=c(4, 6, 6, 5, 3, 8, 7),
                 rebounds=c(1, 4, 3, 3, 2, 6, 7))
#fit multiple linear regression model
model <- lm(rating ~ points + assists + rebounds, data=df)
We can use the summary() function to view the entire
summary of the regression model:
#view model summary
summary(model)
Call:
lm(formula = rating ~ points + assists + rebounds, data = df)
Residuals:
      1       2       3       4       5       6       7
-1.5902 -1.7181  0.2413  4.8597 -1.0201 -0.6082 -0.1644
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  66.4355     6.6932   9.926  0.00218 **
points        1.2152     0.2788   4.359  0.02232 *
assists      -2.5968     1.6263  -1.597  0.20860
rebounds      2.8202     1.6118   1.750  0.17847
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.193 on 3 degrees of freedom
Multiple R-squared: 0.9589, Adjusted R-squared: 0.9179
F-statistic: 23.35 on 3 and 3 DF, p-value: 0.01396
To view the regression coefficients only, we can
use model$coefficients as follows:
#view only regression coefficients of model
model$coefficients
(Intercept)      points     assists    rebounds
  66.435519    1.215203   -2.596789    2.820224
We can use these coefficients to write the following fitted regression equation:
Rating = 66.43552 + 1.21520(points) - 2.59679(assists) + 2.82022(rebounds)
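The fitted equation can be checked against the model by computing a fitted value by hand. The sketch below does this for the first row of the data (8 points, 4 assists, 1 rebound); the variable names b, manual and from_model are introduced here for illustration.

```r
# Data and model from the example.
df <- data.frame(rating=c(67, 75, 79, 85, 90, 96, 97),
                 points=c(8, 12, 16, 15, 22, 28, 24),
                 assists=c(4, 6, 6, 5, 3, 8, 7),
                 rebounds=c(1, 4, 3, 3, 2, 6, 7))
model <- lm(rating ~ points + assists + rebounds, data=df)

# Coefficients as a named vector.
b <- model$coefficients

# Fitted rating for the first player, computed from the equation.
manual <- b[["(Intercept)"]] + b[["points"]] * 8 +
  b[["assists"]] * 4 + b[["rebounds"]] * 1

# The same value as stored in the fitted model.
from_model <- unname(fitted(model)[1])

print(manual)
print(all.equal(manual, from_model))
```

The difference between the observed rating of 67 and this fitted value is the first residual reported in the model summary.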
To view the regression coefficients along with their standard
errors, t-statistics, and p-values, we can
use summary(model)$coefficients as follows:
#view regression coefficients with standard errors, t-statistics, and p-values
summary(model)$coefficients
             Estimate Std. Error   t value    Pr(>|t|)
(Intercept) 66.435519  6.6931808  9.925852 0.002175313
points       1.215203  0.2787838  4.358942 0.022315418
assists     -2.596789  1.6262899 -1.596757 0.208600183
rebounds     2.820224  1.6117911  1.749745 0.178471275
We can also access specific values in this output.
For example, we can use the following code to access the p-value for the points variable:
#view p-value for points variable
summary(model)$coefficients["points", "Pr(>|t|)"]
[1] 0.02231542
Or we could use the following code to access the p-value for
each of the regression coefficients:
#view p-value for all variables
summary(model)$coefficients[, "Pr(>|t|)"]
(Intercept)      points     assists    rebounds
0.002175313 0.022315418 0.208600183 0.178471275
The p-values are shown for each regression coefficient in the
model.
You can use similar syntax to access any of the values in the
regression output.
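As a sketch of that similar syntax, the block below pulls out the standard error for points and the two R-squared values. The variable names model_summary, se_points, r_squared and adj_r_squared are introduced here for illustration; the components r.squared and adj.r.squared are standard elements of the object returned by summary() for an lm model.

```r
# Data and model from the example.
df <- data.frame(rating=c(67, 75, 79, 85, 90, 96, 97),
                 points=c(8, 12, 16, 15, 22, 28, 24),
                 assists=c(4, 6, 6, 5, 3, 8, 7),
                 rebounds=c(1, 4, 3, 3, 2, 6, 7))
model <- lm(rating ~ points + assists + rebounds, data=df)

# Store the summary once and reuse it.
model_summary <- summary(model)

# Standard error of the points coefficient.
se_points <- model_summary$coefficients["points", "Std. Error"]

# Multiple and adjusted R-squared.
r_squared <- model_summary$r.squared
adj_r_squared <- model_summary$adj.r.squared

print(se_points)
print(r_squared)
print(adj_r_squared)
```

Storing the summary in a variable avoids recomputing it each time a value is extracted.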