Simple and Multiple Regression Analysis

The document provides an overview of simple and multiple regression analysis, detailing the statistical procedures used to predict the value of a response variable based on one or more independent variables. It includes explanations of regression equations, coefficients, and examples illustrating the calculation of these models. Additionally, it discusses the assumptions and design requirements for multiple regression, as well as significance testing for regression coefficients and model comparisons.


Simple Regression and Multiple Regression Analysis

By
Dr. S. B. Javali, Ph.D.
Associate Professor in Statistics
Department of Community Medicine
USM-KLE-IMP, Belagavi
Simple Regression Analysis (Linear)

Regression is a statistical procedure that attempts to predict or estimate the value of the response variable from known values of one or more independent variables.

Analysis using a single independent variable is called simple (linear) regression.
Linear Regression Equation

Y = a + bX

Suppose we want to test whether there is any relation between the birth weight (BW) of a baby and blood pressure (BP). Here the dependent variable is BP and the independent variable is BW. The equation is

BP = a + b(BW), i.e. Y = a + bX

In mathematics Y is called a function of X, but in statistics the term regression is used to describe the relationship.
Regression Equations

The regression equation of y on x is

y = a + bx

b is called the regression coefficient and a is called the arbitrary constant or Y-intercept. a and b are determined by the least-squares method.


Regression Coefficients
b is the slope or regression coefficient of the equation and a is the Y-intercept or constant. They are given by

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - \frac{(\sum x)^2}{n}} \quad\text{or}\quad b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$

and

$$a = \bar{y} - b\bar{x}$$

where
Σxy = sum of products of all individual values of x and y
Σx² = sum of squares of all individual values of x
Σx = sum of all individual values of x
n = number of pairs of x and y
x̄ and ȳ = means of x and y
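For illustration, here is a minimal Python sketch of these least-squares formulas (numpy only; the function name fit_y_on_x is just for this example):

```python
import numpy as np

def fit_y_on_x(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # b = (sum(xy) - n*xbar*ybar) / (sum(x^2) - n*xbar^2)
    b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
    a = y.mean() - b * x.mean()
    return a, b
```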


Regression Equation

The regression equation of x on y is

x = a₁ + b₁y

b₁ is called the regression coefficient and a₁ is called the arbitrary constant or intercept. a₁ and b₁ are determined by the least-squares method.


Regression Coefficients
b₁ is the slope or regression coefficient of this equation and a₁ is the intercept or constant. They are given by

$$b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum y^2 - \frac{(\sum y)^2}{n}} \quad\text{or}\quad b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum y^2 - n\bar{y}^2}$$

and

$$a_1 = \bar{x} - b_1\bar{y}$$

where
Σxy = sum of products of all individual values of x and y
Σy² = sum of squares of all individual values of y
Σy = sum of all individual values of y
n = number of pairs of x and y
x̄ and ȳ = means of x and y


Example: The data given below relate to the birth weight of a baby and blood pressure.

No    BW (kg)   BP
1     1.5       47.0
2     1.9       50.0
3     2.2       71.0
4     2.5       76.0
5     2.7       76.0
6     2.8       81.0
7     3.0       86.0
8     3.2       85.0
9     3.4       91.0
10    3.7       106.0

1. Fit a regression equation of y on x.
2. Calculate BP if BW is 3.5 kg.
The regression equation of y on x is y = a + bx, where

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \quad\text{and}\quad a = \bar{y} - b\bar{x}$$
Solution:

No     BW (x)   BP (y)   xy        x²
1      1.5      47.0     70.5      2.25
2      1.9      50.0     95.0      3.61
3      2.2      71.0     156.2     4.84
4      2.5      76.0     190.0     6.25
5      2.7      76.0     205.2     7.29
6      2.8      81.0     226.8     7.84
7      3.0      86.0     258.0     9.00
8      3.2      85.0     272.0     10.24
9      3.4      91.0     309.4     11.56
10     3.7      106.0    392.2     13.69
Total  26.90    769.00   2175.30   76.57
Mean   2.69     76.90
$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{2175.30 - 10(2.69)(76.90)}{76.57 - 10(2.69)^2} = \frac{106.69}{4.209} = 25.3481$$

and

$$a = \bar{y} - b\bar{x} = 76.90 - (25.3481)(2.69) = 8.7137$$
So the regression equation is

y = 8.7137 + 25.3481x

The regression coefficient b means that for each unit change in x (BW), y (BP) increases by 25.3481 units.

Calculate BP if BW is 3.5 kg:
y (BP) = 8.7137 + 25.3481(3.50) ≈ 97.43
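These hand calculations can be verified with a short numpy sketch; np.polyfit fits the same least-squares line:

```python
import numpy as np

bw = np.array([1.5, 1.9, 2.2, 2.5, 2.7, 2.8, 3.0, 3.2, 3.4, 3.7])            # x
bp = np.array([47.0, 50.0, 71.0, 76.0, 76.0, 81.0, 86.0, 85.0, 91.0, 106.0])  # y

b, a = np.polyfit(bw, bp, deg=1)  # returns slope first, then intercept
print(f"y = {a:.4f} + {b:.4f}x")              # y = 8.7137 + 25.3481x
print(f"BP at BW = 3.5 kg: {a + 3.5*b:.2f}")  # 97.43
```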
Relationship between the regression coefficients (b and b₁) and the correlation coefficient
The relationship is

$$b\,b_1 = \left(\frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}\right)\left(\frac{\sum xy - n\bar{x}\bar{y}}{\sum y^2 - n\bar{y}^2}\right) = \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)} = \left(\frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}\right)^2 = (r)^2 = r^2$$

Therefore r² = b·b₁, i.e. r = √(b·b₁).
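A quick numerical check of r² = b·b₁, using the birth weight/blood pressure data from the earlier example:

```python
import numpy as np

x = np.array([1.5, 1.9, 2.2, 2.5, 2.7, 2.8, 3.0, 3.2, 3.4, 3.7])            # BW
y = np.array([47.0, 50.0, 71.0, 76.0, 76.0, 81.0, 86.0, 85.0, 91.0, 106.0])  # BP

n = len(x)
sxy = np.sum(x * y) - n * x.mean() * y.mean()
b = sxy / (np.sum(x ** 2) - n * x.mean() ** 2)   # slope of y on x
b1 = sxy / (np.sum(y ** 2) - n * y.mean() ** 2)  # slope of x on y
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(b * b1, r ** 2))  # True: r^2 equals b*b1
```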
Types of regression
Simple linear regression
Multiple linear regression
Simple and multiple logistic regression
Forward, backward, and stepwise regression
Simple or multiple ordinal regression


Linear regression
Prediction of a dependent variable (quantitative) with only one independent variable (quantitative or qualitative).

Multiple linear regression
Prediction of a dependent variable (quantitative) with more than one independent variable (quantitative or qualitative).

Logistic regression
Prediction of a dependent variable (qualitative) with one or more independent variables (quantitative or qualitative).

Ordinal regression
Prediction of a dependent variable (qualitative/ordinal data) with one or more independent variables (quantitative or qualitative).
Example: A study was conducted by a researcher to establish a statistical relationship between X and Y. The data are given below. Suppose Y is the response variable and X is the independent variable.

Patient   X     Y
1         140   12
2         141   12
3         143   11
4         141   14
5         141   17
6         133   8
7         135   13
8         143   13
9         130   14
10        150   14
11        139   15
12        130   10
13        140   10
14        161   18
15        135   11

- Find the correlation coefficient.
- Estimate the regression equation.
- Use the regression equation to predict Y for a patient whose X is 145.
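A sketch of how the answers could be checked in Python; scipy.stats.linregress returns the correlation coefficient, slope, and intercept together:

```python
import numpy as np
from scipy import stats

x = np.array([140, 141, 143, 141, 141, 133, 135, 143, 130, 150,
              139, 130, 140, 161, 135], dtype=float)
y = np.array([12, 12, 11, 14, 17, 8, 13, 13, 14, 14,
              15, 10, 10, 18, 11], dtype=float)

res = stats.linregress(x, y)
print(f"r = {res.rvalue:.3f}")
print(f"y = {res.intercept:.3f} + {res.slope:.3f}x")
print(f"predicted Y at X = 145: {res.intercept + res.slope * 145:.2f}")
```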
Example: A researcher selected 9 sets of identical twins to determine whether there is a relationship between first-born and second-born twins in IQ scores. Is there a strong relationship in IQ score between identical twins? The following table gives their IQ scores.

Pair          1    2    3    4    5    6    7    8    9
First born    112  127  105  132  117  135  122  101  128
Second born   118  120  100  128  102  133  125  104  114
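The strength of the relationship can be gauged from the correlation coefficient, for example:

```python
import numpy as np

first_born = np.array([112, 127, 105, 132, 117, 135, 122, 101, 128], dtype=float)
second_born = np.array([118, 120, 100, 128, 102, 133, 125, 104, 114], dtype=float)

r = np.corrcoef(first_born, second_born)[0, 1]
print(f"r = {r:.3f}")  # a value near +1 suggests a strong positive relationship
```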
Multiple Regression Analysis
Topics
• Review of simple regression analysis
• Multiple regression analysis
  – Design requirements
  – Multiple regression model
  – R²
  – Testing R² and b's
  – Comparing models
  – Comparing standardized regression coefficients
Simple regression analysis
Simple regression considers the relation between a single explanatory/independent variable and a response/dependent variable,

i.e. Y = a + bX
Simple Regression Model
Regression coefficients are estimated by minimizing the sum of squared residuals, ∑residuals² (residual = observed − predicted), to derive the fitted model ŷ = a + bx.

The standard error of the regression (s_{Y|x}) is based on the squared residuals:

$$s_{Y|x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
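A minimal sketch of the residuals and s_{Y|x}, assuming the BW/BP data from the earlier example:

```python
import numpy as np

x = np.array([1.5, 1.9, 2.2, 2.5, 2.7, 2.8, 3.0, 3.2, 3.4, 3.7])
y = np.array([47.0, 50.0, 71.0, 76.0, 76.0, 81.0, 86.0, 85.0, 91.0, 106.0])

b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)                            # observed - predicted
s_yx = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))  # standard error of the regression
print(f"s_Y|x = {s_yx:.3f}")
```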
Multiple Regression Analysis

• Method for studying the relationship between a dependent variable and two or more independent variables.
• Purposes:
  – Prediction
  – Explanation
  – Theory building

Design Requirements

• One dependent variable (criterion)
• Two or more independent variables (predictor variables)
• Sample size: ≥ 30 (at least 10 times as many cases as independent variables)
Assumptions
• Independence: the scores of any particular
subject are independent of the scores of all
other subjects
• Normality: in the population, the scores on the
dependent variable are normally distributed
for each of the possible combinations of the
level of the X variables; each of the variables
is normally distributed
Assumptions
• Homoscedasticity: in the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal
• Linearity: in the population, the relation between the dependent variable and the independent variable is linear when all the other independent variables are held constant
Simple vs. Multiple Regression

Simple regression:
• One dependent variable Y predicted from one independent variable X
• One regression coefficient
• r²: proportion of variation in dependent variable Y predictable from X

Multiple regression:
• One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk)
• One regression coefficient for each independent variable
• R²: proportion of variation in dependent variable Y predictable by the set of independent variables (X's)
Regression Modeling

• A simple regression model (one independent variable) fits a regression line in 2-dimensional space
• A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space
Example: Self-Concept and Academic Achievement (N = 103)
In this example, AA denotes academic achievement, ASC academic self-concept, and GSC general self-concept.
The General Idea
Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y.

The intent is to look at the independent effect of each variable while "adjusting out" the influence of potential confounders.
Multiple Regression Model for 2 Variables
Again, estimates for the multiple slope coefficients are derived by minimizing ∑residuals² to derive this multiple regression model:

ŷ = a + b₁x₁ + b₂x₂

Again, the standard error of the regression is based on the ∑residuals²:

$$s_{Y|x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - k - 1}}$$

where k is the number of independent variables.
Example: The Multiple Model

The model or equation with k variables is given by

Y = a + b1X1 + b2X2 + … + bkXk

• The bi's (i = 1, 2, …, k) are called regression coefficients.

Our example, predicting AA:
• Y′ = 36.83 + (3.52)XASC + (−0.44)XGSC
• Predicted AA for a person with GSC of 4 and ASC of 6:
• Y′ = 36.83 + (3.52)(6) + (−0.44)(4) = 56.19
Multiple Correlation Coefficient (R) and Coefficient of Multiple Determination (R²)

• R = the magnitude of the relationship between the dependent variable and the best linear combination of the predictor variables
• R² = the proportion of variation in Y accounted for by the set of independent variables (X's)
Explaining Variation: How much?

The total variation in Y divides into variation predictable from the combination of independent variables and unpredictable variation.

Proportion of Predictable and Unpredictable Variation
[Venn diagram of Y = AA, X1 = ASC, X2 = GSC: R² = predictable (explained) variation in Y; 1 − R² = unpredictable (unexplained) variation in Y.]
Various Significance Tests

• Testing R²
  – Test R² through an F test
  – Test competing models (difference between R²s) through an F test of the difference of R²s
• Testing b
  – Test each partial regression coefficient (b) by a t-test, i.e. t = b / SE(b)
Example: Testing R²
• What proportion of variation in AA can be predicted from GSC and ASC?
  – Compute R²: R² = .16 (R = .41): 16% of the variance in AA can be accounted for by the composite of GSC and ASC

• Is R² significantly different from 0?
  – F test: Fobserved = 9.52, Fcritical(.05; 2, 100) = 3.09
  – Reject H0: in the population there is a significant relationship between AA and the linear composite of GSC and ASC
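The F statistic here follows directly from R², the number of predictors k, and the sample size N; a sketch, assuming F = (R²/k) / ((1 − R²)/(N − k − 1)):

```python
from scipy import stats

R2, k, N = 0.16, 2, 103
F = (R2 / k) / ((1 - R2) / (N - k - 1))
F_crit = stats.f.ppf(0.95, k, N - k - 1)      # upper 5% point of F(2, 100)
print(f"F = {F:.2f}, F_crit = {F_crit:.2f}")  # F = 9.52, F_crit = 3.09
```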
Example: Comparing Models - Testing R²
• Comparing models
  – Model 1: Y′ = 35.37 + (3.38)XASC
  – Model 2: Y′ = 36.83 + (3.52)XASC + (−0.44)XGSC
• Compute R² for each model
  – Model 1: R² = r² = .160
  – Model 2: R² = .161
• Test the difference between R²s
  – Fobs = .119, Fcrit(.05; 1, 100) = 3.94
  – Conclude that GSC does not add significantly to ASC in predicting AA
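The increment-in-R² F test for these two nested models, sketched with the values above:

```python
from scipy import stats

R2_full, R2_reduced = 0.161, 0.160
k_full, k_reduced, N = 2, 1, 103

df_num = k_full - k_reduced
df_den = N - k_full - 1
F = ((R2_full - R2_reduced) / df_num) / ((1 - R2_full) / df_den)
F_crit = stats.f.ppf(0.95, df_num, df_den)
print(f"F = {F:.3f}, F_crit = {F_crit:.2f}")  # F = 0.119, F_crit = 3.94
```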
Testing Significance of b’s

• H0:  = 0
• tobserved = b-
standard error of b

• with N-k-1 df
Example: t-test of b
• tobserved = (−0.44 − 0) / 14.24
• tobserved = −0.03
• tcritical(.05, two-tailed, df = 100) = 1.98

• Decision: cannot reject the null hypothesis.
• Conclusion: the population β for GSC is not significantly different from 0.
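The same t test in a few lines (two-tailed, df = N − k − 1 = 100):

```python
from scipy import stats

b, se_b, df = -0.44, 14.24, 100
t = (b - 0) / se_b               # hypothesized beta = 0
t_crit = stats.t.ppf(0.975, df)  # two-tailed, alpha = .05
print(f"t = {t:.3f}, t_crit = {t_crit:.2f}")  # t = -0.031, t_crit = 1.98
```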
Comparing Regression Coefficients

• Which is the stronger predictor? Comparing bGSC and bASC directly is misleading when the predictors are on different scales.
• Convert to standardized partial regression coefficients (beta weights, β's)
  – βGSC = −0.038
  – βASC = 0.417
  – These are on the same scale, so they can be compared: ASC is a stronger predictor than GSC
• Beta weights (β's) can also be tested for significance with t tests.
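Beta weights rescale each raw coefficient by the ratio of standard deviations, β_i = b_i · s_Xi / s_Y; a generic sketch (the function name is hypothetical):

```python
import numpy as np

def beta_weights(b, X, y):
    """Convert raw partial slopes b (one per column of X) to beta weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.asarray(b, dtype=float) * X.std(axis=0, ddof=1) / y.std(ddof=1)
```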
Different Ways of Building Regression Models

• Simultaneous: all independent variables entered together
• Stepwise: independent variables entered according to some order
  – By size of correlation with the dependent variable
  – In order of significance
• Hierarchical: independent variables entered in stages
