Lecture 18

Uploaded by

Fareh Iqbal

Regression methods

Curve Fitting
Types of Regression Models

Regression Models
  Simple (1 explanatory variable)
    Linear
    Non-linear
  Multiple (2+ explanatory variables)
    Linear
    Non-linear

Linear Regression Model

Linear Equations

$Y = mX + b$

m = slope = (change in Y) / (change in X)
b = Y-intercept

Linear Regression Model

• 1. Relationship between variables is a linear function

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

where
$Y_i$ – dependent (response) variable (e.g., CD+ c.)
$X_i$ – independent (explanatory) variable (e.g., Years s. serocon.)
$\beta_0$ – population Y-intercept
$\beta_1$ – population slope
$\epsilon_i$ – random error

Scatter plot
• 1. Plot of all (Xi, Yi) pairs
• 2. Suggests how well the model will fit

[Figure: scatter plot of Y against X, both axes running from 0 to 60]

Thinking Challenge

How would you draw a line through the points?
How do you determine which line ‘fits best’?

[Figure: scatter plot of Y against X, both axes running from 0 to 60]
[Figure: the same scatter plot, annotated "Slope changed" and "Intercept unchanged"]

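Parameter Estimation Solution

Prediction equation:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

Slope:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$$

Y-intercept:
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
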
Computation Table

Xi       Yi       Xi²      Yi²      XiYi
X1       Y1       X1²      Y1²      X1Y1
X2       Y2       X2²      Y2²      X2Y2
:        :        :        :        :
Xn       Yn       Xn²      Yn²      XnYn
ΣXi      ΣYi      ΣXi²     ΣYi²     ΣXiYi

Parameter Estimation Example
• Obstetrics: What is the relationship between mother’s estriol level and birthweight, using the following data?

Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4

Scatterplot
Birthweight vs. Estriol level

[Figure: scatterplot with Estriol level on the x-axis (0 to 6) and Birthweight on the y-axis (0 to 4)]

Parameter Estimation Solution Table

         Xi     Yi     Xi²    Yi²    XiYi
         1      1      1      1      1
         2      1      4      1      2
         3      2      9      4      6
         4      2      16     4      8
         5      4      25     16     20
Total    15     10     55     26     37

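Parameter Estimation Solution

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - 0.70(3) = -0.10$$
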
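The hand calculation for the estriol data can be reproduced in a few lines of Python. This is a minimal sketch that applies the sum-based slope and intercept formulas from the Parameter Estimation Solution slide; the function name `fit_simple_linear` and the variable names are illustrative assumptions, not part of the lecture.

```python
def fit_simple_linear(x, y):
    """Return (slope, intercept) from the sum-based least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (sum XY - sum X * sum Y / n) / (sum X^2 - (sum X)^2 / n)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean(Y) - slope * mean(X)
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept


# Data from the obstetrics example (estriol in mg/24h, birthweight in g/1000).
estriol = [1, 2, 3, 4, 5]
birthweight = [1, 1, 2, 2, 4]

slope, intercept = fit_simple_linear(estriol, birthweight)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The same function should work for any paired data with at least two distinct X values; with identical X values the denominator is zero and the slope is undefined.
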
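Coefficient Interpretation Solution
• 1. Slope ($\hat{\beta}_1$)
– Birthweight (Y) is expected to increase by 0.7 units (700 g, since Y is in g/1000) for each 1-unit (1 mg/24h) increase in estriol (X)
• 2. Intercept ($\hat{\beta}_0$)
– Average birthweight (Y) is -0.10 units when estriol level (X) is 0
• Difficult to explain: X = 0 lies outside the observed estriol range (1 to 5), so the intercept is an extrapolation
• The birthweight should always be positive
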
Parameter Estimation Thinking Challenge

• You’re a vet epidemiologist for the county cooperative. You gather the following data:

Food (lb.)    Milk yield (lb.)
4             3.0
6             5.5
10            6.5
12            9.0

• What is the relationship between cows’ food intake and milk yield?

Scattergram
Milk Yield vs. Food intake*

[Figure: scattergram with Food intake (lb.) on the x-axis (3 to 13) and Milk yield (lb.) on the y-axis (0 to 10)]

Parameter Estimation Solution Table*

         Xi     Yi      Xi²    Yi²       XiYi
         4      3.0     16     9.00      12
         6      5.5     36     30.25     33
         10     6.5     100    42.25     65
         12     9.0     144    81.00     108
Total    32     24.0    296    162.50    218

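Parameter Estimation Solution*

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - 0.65(8) = 0.80$$
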
Coefficient Interpretation Solution*
• 1. Slope ($\hat{\beta}_1$)
– Milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X)

• 2. Y-intercept ($\hat{\beta}_0$)
– Average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0

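One way to make "fits best" concrete for the milk yield data is to compare the sum of squared errors (SSE) of the fitted line against other candidate lines. The sketch below assumes SSE as the criterion; the two alternative lines (a steeper slope and a shifted intercept) are hypothetical choices for illustration and do not come from the lecture.

```python
def sse(intercept, slope, x, y):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Data from the thinking challenge (both variables in lb.).
food = [4, 6, 10, 12]
milk = [3.0, 5.5, 6.5, 9.0]

# Least squares line from the slides: intercept 0.80, slope 0.65.
sse_fitted = sse(0.80, 0.65, food, milk)

# Hypothetical alternatives, chosen only for comparison.
sse_steeper = sse(0.80, 0.75, food, milk)   # slope changed, intercept unchanged
sse_shifted = sse(1.00, 0.65, food, milk)   # intercept changed, slope unchanged

print(f"fitted:  {sse_fitted:.2f}")
print(f"steeper: {sse_steeper:.2f}")
print(f"shifted: {sse_shifted:.2f}")
```

The fitted line gives the smallest SSE of the three, which is what the least squares estimates are designed to achieve.
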
Simple Nonlinear Regression (Polynomial Regression)

Example
Fit a second-order polynomial to the data.

xi    0    1    2    3    4    5
yi    2    8    14   27   41   61

Solution:
Fitting $y = A + Bx + Cx^2$ by least squares gives
A = 2.5
B = 2.5214
C = 1.8214
Hence, the least squares quadratic equation for this problem is
$y = 2.5 + 2.5214x + 1.8214x^2$
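
The coefficients A, B and C can be checked by building the three normal equations for a quadratic and solving them. This is a sketch under the assumption that the fit minimizes the sum of squared residuals; the helper names `det3` and `fit_quadratic` are illustrative, and Cramer's rule is used only because the system is 3 × 3.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(x, y):
    """Least squares fit of y = A + B*x + C*x^2 via the normal equations."""
    n = len(x)
    sx = sum(x)
    sx2 = sum(v ** 2 for v in x)
    sx3 = sum(v ** 3 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(xv * yv for xv, yv in zip(x, y))
    sx2y = sum(xv ** 2 * yv for xv, yv in zip(x, y))

    lhs = [[n, sx, sx2],
           [sx, sx2, sx3],
           [sx2, sx3, sx4]]
    rhs = [sy, sxy, sx2y]

    # Cramer's rule: replace one column at a time with the right-hand side.
    d = det3(lhs)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


x = [0, 1, 2, 3, 4, 5]
y = [2, 8, 14, 27, 41, 61]
A, B, C = fit_quadratic(x, y)
print(f"A = {A:.4f}, B = {B:.4f}, C = {C:.4f}")
```

For higher-order polynomials or more predictors, a general linear solver would be a more practical choice than Cramer's rule.
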
Multiple Linear Regression
The General Idea

Simple regression considers the relation between a single explanatory variable and a response variable.

Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y. The intent is to look at the independent effect of each variable while “adjusting out” the influence of potential confounders.

• In many applications, there is more than one factor that influences the response.
• Multiple regression models thus describe how a single response variable Y depends linearly on a number of predictor variables.
– The selling price of a house can depend on the desirability of the location, the number of bedrooms, the number of bathrooms, the year the house was built, the square footage of the lot and a number of other factors.
– The height of a child can depend on the height of the mother, the height of the father, nutrition, and environmental factors.

Regression Modeling

• A simple regression model (one independent variable) fits a regression line in 2-dimensional space
• A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space

Basic Biostat

How do we “learn” parameters

• For the 2-d problem (a line) there are two coefficients: one for the bias (the y-intercept) and one for the independent variable (the slope)
• To find the coefficient values that minimize the objective function, take the partial derivatives of the objective function (the sum of squared errors, SSE) with respect to each coefficient, set them to 0, and solve.

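For the line case, the derivative step can be written out explicitly. The derivation below is a sketch that uses the symbols of the simple linear model ($Y_i$, $X_i$, $\beta_0$, $\beta_1$); it arrives at the same slope and intercept formulas given on the Parameter Estimation Solution slide.

```latex
\begin{align*}
\mathrm{SSE}(\beta_0, \beta_1)
  &= \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0
  \;\Longrightarrow\;
  n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial \beta_1}
  &= -2 \sum_{i=1}^{n} X_i \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0
  \;\Longrightarrow\;
  \hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i \\[4pt]
\text{First equation:}\quad
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X} \\[4pt]
\text{Substituting into the second:}\quad
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}
                     {\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}
\end{align*}
```
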
Multiple Regression Model

Again, estimates for the multiple slope coefficients are derived by minimizing Σ residuals² to derive the multiple regression model. With two explanatory variables it takes the form

$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2$

Multiple Regression Model

• Intercept α predicts where the regression plane crosses the Y axis
• Slope for variable X1 (β1) predicts the change in Y per unit X1, holding X2 constant
• Slope for variable X2 (β2) predicts the change in Y per unit X2, holding X1 constant

Basic Biostat 15: Multiple Linear Regression

Multiple Regression Model

A multiple regression model with k independent variables fits a regression “surface” in (k + 1)-dimensional space (cannot be visualized)

Multiple Linear Regression Models

• For example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A possible multiple regression model could be

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

where
Y – tool life
x1 – cutting speed
x2 – tool angle

Multiple Linear Regression Models

Figure 12-1 (a) The regression plane for the model $E(Y) = 50 + 10x_1 + 7x_2$. (b) The contour plot

Figure 12-2 (a) Three-dimensional plot of the regression model $E(Y) = 50 + 10x_1 + 7x_2 + 5x_1x_2$. (b) The contour plot

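The difference between the two figure models can be seen by evaluating them at a few points. The sketch below uses the two E(Y) equations from the figure captions; the function names and the chosen x2 values (0 and 2) are illustrative assumptions.

```python
def mean_plane(x1, x2):
    """E(Y) for the Figure 12-1 model (no interaction term)."""
    return 50 + 10 * x1 + 7 * x2


def mean_with_interaction(x1, x2):
    """E(Y) for the Figure 12-2 model (includes the 5*x1*x2 term)."""
    return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2


def effect_of_x1(model, x2):
    """Change in E(Y) when x1 increases from 0 to 1 at a fixed x2."""
    return model(1, x2) - model(0, x2)


for x2 in (0, 2):  # hypothetical values of x2
    print(
        f"x2 = {x2}: plane {effect_of_x1(mean_plane, x2)}, "
        f"interaction {effect_of_x1(mean_with_interaction, x2)}"
    )
```

In the plane model the effect of x1 is 10 at every x2, while in the interaction model it depends on x2, which is why the second surface is not a flat plane.
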
12-1: Multiple Linear Regression Models

12-1.2 Least Squares Estimation of the Parameters

Multiple Linear Regression
• The coefficients giving the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form.

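For two predictors, that matrix form can be sketched as follows. The choice of two predictors and the symbols $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$ are assumptions made here for illustration, based on the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$; each row comes from setting one partial derivative of the sum of squared residuals to zero.

```latex
\[
\begin{bmatrix}
n & \sum x_{1i} & \sum x_{2i} \\
\sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\
\sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2
\end{bmatrix}
\begin{bmatrix}
\hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2
\end{bmatrix}
=
\begin{bmatrix}
\sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i
\end{bmatrix}
\]
```
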
• The least squares normal equations are obtained by setting each partial derivative of the sum of squared residuals to zero, giving one equation per unknown coefficient.
• The solution to the normal equations gives the least squares estimators of the regression coefficients.

Example

• The data are generated from the equation $y = 7 + 3x_1 + 4x_2$. Use multiple linear regression to fit these data.
• The summations required for the normal equations are computed from the data.

Solving system of linear equations

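The whole example can be sketched in Python: generate data from $y = 7 + 3x_1 + 4x_2$, form the summations, and solve the resulting 3 × 3 system. The x1 and x2 values below are hypothetical, since the lecture's data table is not reproduced here, and the Gaussian elimination helper `solve_linear_system` is one possible way to solve the system, not necessarily the method used in the lecture.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Move the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Hypothetical predictor values (assumption); y follows the stated equation exactly.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0]
y = [7 + 3 * a + 4 * b for a, b in zip(x1, x2)]

# Summations required for the normal equations.
n = len(y)
s1, s2 = sum(x1), sum(x2)
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
sy = sum(y)
s1y = sum(a * v for a, v in zip(x1, y))
s2y = sum(b * v for b, v in zip(x2, y))

lhs = [[n, s1, s2],
       [s1, s11, s12],
       [s2, s12, s22]]
rhs = [sy, s1y, s2y]

b0, b1, b2 = solve_linear_system(lhs, rhs)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Because the data contain no noise, the fitted coefficients should recover the generating equation up to floating-point rounding.
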
