Measures of relationship
Presentation outline
• Learning objectives
• Introduction
• Regression Analysis
- Simple linear regression
- Multiple linear regression
- Logistic regression
• Correlation Analysis
- Curvilinear Relationship
- Coefficient of Determination
Learning objectives
After this topic, you should be able to:
• Distinguish between the basic purposes of regression
analysis and correlation analysis
• Compute and interpret a regression equation
• Interpret the coefficient of simple linear regression
• Know how to extend simple linear regression to consider
multiple risk variables
• Know how to apply some of the ideas from linear regression
when the outcome of interest is a binary outcome
• Know how and when to apply logistic regression and how to
interpret a relationship represented by an odds ratio
Introduction
• Statistical analysis is
concerned not only
with summarizing data but also with
investigating relationships
• Some of our most intriguing scientific
questions deal with the relationship between
two variables
• Does a relationship exist between use of
oral contraceptives and the incidence of disease?
• What is the relationship of a mother’s weight to
her baby’s birth weight?
• These are typical of countless questions we
pose in seeking to understand the relationship
between two variables
• There is an all-too-human tendency to attribute
a cause-and-effect relationship to variables
that might be related
• We discuss methods of measuring the
relationships in bivariate data, in order to:
- Determine the strength of the relationships, and
- Make inferences about the population from which
the sample was drawn
Regression Analysis
• Sir Francis Galton coined the term regression
during his study of heredity laws
• He observed that physical characteristics of
children were correlated with those of their
fathers
• He found that the sons of tall fathers tended to be
shorter than their fathers, whereas the sons of short
fathers tended to be taller than theirs
• A phenomenon he called “regression toward the
mean”
• Subsequently, statisticians embraced the term
regression line to describe a linear relationship
between two variables
Simple linear regression
• Gives the equation of the straight line that best
describes the linear relationship between two
numerical variables
• i.e. how one variable will behave as another
variable changes
• Enables the prediction of one variable using
another variable using the equation of the
straight line
Types of variable in linear regression
• The dependent variable (y) is the variable to
be predicted (i.e. usually the measured health
outcome of interest)
• The independent variable (x) or explanatory
variable is the variable used for predicting the
dependent variable
• NB: In correlation it does not matter which
variable is which, but in regression it matters
Research questions
• How does systolic blood pressure change as
age increases?
• Can a subject’s diastolic blood pressure
predict their systolic blood pressure?
• Can body fat be predicted from abdomen
circumference measurements?
• In each of these research questions which is
the dependent variable?
Equation of a straight line
• The equation of a straight line is
• y’ = a + bx
• y’ is the predicted value (of the dependent
variable y)
• a is the intercept
• b is the slope (or gradient) of the line
• x is the independent (explanatory) variable
Simple example
X Y
0 2
1 5
2 8
3 ?
4 14
5 17
6 ?
• Equation of line is y = 2 + 3x
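As a quick check, the line can be evaluated in code; the sketch below (Python) fills in the two missing values using the stated equation y = 2 + 3x.

```python
# Evaluate the example line y = 2 + 3x at each x in the table.
def predict(x):
    return 2 + 3 * x

for x in range(7):
    print(x, predict(x))
# x = 3 gives y = 11 and x = 6 gives y = 20, filling in the "?" rows.
```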
Linear regression equation
• y’ = a + b * x
• y’ = intercept + ( slope * x )
• We want the residuals (distances between the
observations and the line) to be small
• Residual: the difference between the actual value of the
dependent variable and the predicted value from the
regression line: ε = (y − y’)
• A residual is calculated for each observation
• The values of a and b for the regression equation are
calculated to minimise the sum of the squared residuals
–called the least squares fit
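A minimal sketch of what the least squares fit computes, using the closed-form estimates b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄; the data here are illustrative.

```python
import numpy as np

# Illustrative data; in practice x and y come from your sample.
x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2, 5, 8, 14, 17], dtype=float)

# Least squares estimates: these minimise the sum of squared residuals.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

y_pred = a + b * x        # predicted values y'
residuals = y - y_pred    # residual = actual - predicted
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 2.000, b = 3.000 for this data
```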
Regression coefficient (b)
• The slope, b, is often called the regression
coefficient
• It has the same sign as the correlation
coefficient
• When there is no correlation between x and y,
then the regression coefficient, b, equals 0
Predicted value (y’)
• The predicted value, y’, is subject to sampling
variation
• Its precision can be estimated (prediction
error) by the standard error of the estimate
• The greater the standard error, the greater the
dispersion of predicted y values around the
regression line and hence the larger the
prediction error
Statistical inference in regression
• Regression coefficients calculated from a sample
of observations are estimates of the population
regression coefficients
• Hypothesis tests and confidence intervals can be
constructed using sample estimates to make
inferences about population regression coefficients
• For valid use of these inferential approaches, it is
necessary to check the underlying assumptions of
the model (linearity, normality, constant variance) –
discussed later
Process for simple linear regression
• Check that there is a linear relationship (scatter
plot)
• Use SPSS to fit the simple linear regression model
to find the best straight line through the data
• Check R² to see the amount of variation in the
dependent variable explained by the explanatory
variable (for a good fit, R² should be close to 1)
• Write down the regression equation
• Check the assumptions to ensure that the equation
can be used to make predictions
Example: predicting body fat
• A fitness gym wishes to assess its clients’ body
fat. An accurate method of measuring body fat is
an underwater weighing technique, but this is
not a practical method for the fitness instructors to
carry out on the premises.
• They would like to be able to predict their clients’
body fat from other measurements, e.g. abdomen
circumference
• 252 men had their body fat and abdomen
circumference measured
Testing hypotheses
• H0: There is no linear relationship between body fat
and abdomen circumference in the population
• H1: There is a linear relationship between body fat
and abdomen circumference in the population
Or this can be rephrased as
• H0: Abdomen circumference does not account for
any variability in body fat in the population
• H1: Abdomen circumference does account for some
of the variability in body fat in the population
Simple linear regression in SPSS
• Analyze
–Regression
–Linear
• The dependent variable is body fat
• The independent variable is abdomen
circumference
SPSS: linear regression
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       0.814   0.662      0.661               4.5144
• R is the correlation between the two variables
= 0.814
• R square (R x R) is the proportion of variability
in body fat measurements that can be
explained by differences in abdomen
circumference = 0.662 or 66.2%
SPSS: linear regression
ANOVA (Model 1)
Source of variation   Sum of squares    df    Mean square   F
Regression                 9984.086      1       9984.086   489.903
Residual                   5094.931    250         20.380
Total                     15079.017    251
• A statistically significant (p<0.001) proportion
of the variability in body fat measurements can
be attributed to the regression model (i.e.
abdomen)
SPSS: regression equation
Coefficients (Model 1)
            B         Std. Error   Standardized coeff. (Beta)   t         Sig.
Constant    -35.197   2.462                                     -14.294   0.000
Abdomen     0.585     0.026        0.814                        22.134    0.000
• Predicted body fat = constant + B x abdomen circum.
• Predicted body fat = -35.197 + 0.585 x abdomen
circum.
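For readers working outside SPSS, the same model can be fitted in Python with statsmodels; the arrays below are placeholders rather than the real 252 measurements, so the printed coefficients would match the SPSS tables only with the actual data.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: substitute the 252 body fat (%) and abdomen (cm) values.
abdomen = np.array([85.2, 83.0, 87.9, 86.4, 100.0])  # illustrative only
bodyfat = np.array([12.3, 6.1, 25.3, 10.4, 28.7])    # illustrative only

X = sm.add_constant(abdomen)      # adds the intercept term (constant)
model = sm.OLS(bodyfat, X).fit()  # least squares fit
print(model.summary())            # R-squared, ANOVA F, coefficients, p-values
```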
Prediction
• How do you use linear regression for prediction?
• The regression equation allows you to predict the value
of the dependent variable (Y) for a particular value of the
independent variable (X)
• Predicted body fat = -35.197 + 0.585 abdomen circum
• What is the predicted body fat content for a man with an
abdomen circumference of 100cm?
• Predicted body fat = -35.197 + 0.585 x 100cm
= -35.197 + 58.5
• = 23.3%
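The prediction is just the fitted equation evaluated at x = 100; a one-line check in Python:

```python
# Predicted body fat for an abdomen circumference of 100 cm,
# using the regression equation from the SPSS output above.
predicted = -35.197 + 0.585 * 100
print(f"{predicted:.1f}%")  # 23.3%
```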
Assumptions of linear regression
• There should be a linear relationship between the
dependent variable and the independent variable
• For any value of the independent variable the
dependent variable values should follow a Normal
distribution (i.e. normally distributed residuals)
• The variance of the dependent variable values
should be the same for all independent variable
values
Checking the assumptions
• After the regression model has been fitted to
the data it is essential to check that the
assumptions of linear regression have not
been violated
• If any of the assumptions have been violated
then the regression model is likely to be invalid
• INVALID ASSUMPTIONS MEAN THAT THE
PREDICTIONS BASED ON THIS MODEL
MAY BE POOR
Assumptions
• Plot the dependent variable against the
independent variable
- A linear (“sausage”-shaped) pattern should be seen
if the linearity assumption is to hold
• Plot the residuals against the predicted
values
- No curvature in the plot should be seen for the
linearity assumption to hold
• Normally distributed residuals can be tested
by looking at a histogram of the residuals
• Normally distributed residuals can be tested
by looking at a normal probability plot (Normal
p-p plot)
• Constant variance of the residuals can be
assessed by plotting the residuals against the
predicted values
-There should be an even spread of residuals
around zero
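A sketch of these three diagnostic plots in Python (matplotlib/scipy), using simulated data in place of a real sample:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Simulated data standing in for a real sample.
rng = np.random.default_rng(0)
x = rng.uniform(70, 110, 100)                # e.g. abdomen circumference
y = -35 + 0.6 * x + rng.normal(0, 4.5, 100)  # e.g. body fat plus noise

model = sm.OLS(y, sm.add_constant(x)).fit()
fitted, residuals = model.fittedvalues, model.resid

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# 1. Residuals vs predicted: no curvature, even spread around zero.
axes[0].scatter(fitted, residuals)
axes[0].axhline(0, linestyle="--")
axes[0].set(xlabel="Predicted values", ylabel="Residuals")

# 2. Histogram of residuals: should look roughly Normal.
axes[1].hist(residuals, bins=15)
axes[1].set(xlabel="Residuals")

# 3. Normal probability plot: points should lie close to the line.
stats.probplot(residuals, plot=axes[2])

plt.tight_layout()
plt.show()
```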
Summary: simple linear regression
• Simple linear regression gives the equation of the
straight line that best describes the association
between two variables
– A linear relationship between the dependent
variable and the independent variable is required
– For any value of the independent variable the
dependent variable values should follow a Normal
distribution
– The variance of the dependent variable values
should be the same for all independent variable values
Multiple regression
• Extend the principles learnt today to multiple
linear regression
• To explore the dependency of one outcome
variable on two or more explanatory variables
simultaneously
• To study the relationship between two
variables after removing (adjusting for) the
possible effects of other “nuisance” variables
of less interest
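A minimal sketch of the extension in statsmodels, assuming a hypothetical second predictor (age) alongside abdomen circumference; the abdomen coefficient is then adjusted for age.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: one outcome and two explanatory variables.
abdomen = np.array([85.2, 83.0, 87.9, 86.4, 100.0, 92.1])
age     = np.array([23.0, 22.0, 35.0, 46.0, 51.0, 33.0])
bodyfat = np.array([12.3, 6.1, 25.3, 10.4, 28.7, 19.5])

# Stack the predictors column-wise and add an intercept.
X = sm.add_constant(np.column_stack([abdomen, age]))
model = sm.OLS(bodyfat, X).fit()
print(model.params)  # intercept, abdomen slope (adjusted for age), age slope
```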
Logistic regression
• A logistic regression model is simply a statistical
model that describes the relationship between a
qualitative dependent variable (e.g. presence or
absence of disease) and independent variables
(continuous and/or categorical)
• Continuous variables are not used as the dependent
variable in logistic regression
• The logistic model uses the odds ratio to determine
the effect a predictor variable has on the outcome
• An odds ratio is simply the ratio of 2 odds and is
used extensively in medical studies as a measure
of effect for categorical data
• Odds are usually expressed in terms of the probability
of an event
• If the probability of an event is p, then
• odds = p / (1 − p)
• Similarly, odds can be converted to a probability by
• p = odds / (1 + odds)
• As the probability goes from 0 to 1, the odds vary from
0 to infinity
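The two conversions are easy to express as functions; a small sketch:

```python
def odds_from_prob(p):
    """odds = p / (1 - p); grows without bound as p approaches 1."""
    return p / (1 - p)

def prob_from_odds(odds):
    """p = odds / (1 + odds); the inverse conversion."""
    return odds / (1 + odds)

print(odds_from_prob(0.5))  # 1.0 (even odds)
print(prob_from_odds(1.0))  # 0.5
```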
Example
               Renal dysfunction
Sex        Yes    No    Total
Male        16    14       30
Female      12    18       30
Total       28    32       60
• What are the odds of renal dysfunction?
• Compute the odds ratio
Solution
• P(renal dysfunction) = 28/60 = 0.467
• odds(renal dysfunction) = 0.467 / (1 − 0.467) = 28/32 = 0.875
• odds(male) = (16/30) / (1 − 16/30) = 16/14 = 1.143
• odds(female) = (12/30) / (1 − 12/30) = 12/18 = 0.667
• Therefore odds ratio = odds(male) / odds(female)
= 1.143/0.667 = 1.71
• i.e. the odds of renal dysfunction are 71% higher
in males than in females
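The worked example can be verified in code, and the same odds ratio recovered from a logistic regression with sex as the single (binary) predictor; exp(slope) reproduces the OR.

```python
import numpy as np
import statsmodels.api as sm

# 2x2 table counts: rows = sex, columns = renal dysfunction (yes/no).
male_yes, male_no = 16, 14
female_yes, female_no = 12, 18

# Odds ratio directly from the table.
odds_male = male_yes / male_no            # 16/14 = 1.143
odds_female = female_yes / female_no      # 12/18 = 0.667
print(round(odds_male / odds_female, 2))  # 1.71

# Same OR via logistic regression (male = 1, female = 0).
sex = np.array([1] * 30 + [0] * 30)
renal = np.array([1] * 16 + [0] * 14 + [1] * 12 + [0] * 18)
fit = sm.Logit(renal, sm.add_constant(sex)).fit(disp=0)
print(round(np.exp(fit.params[1]), 2))    # exp(slope) = 1.71
```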
Correlation
• Measures the strength of linear association
between two numerical (continuous or discrete) variables
• Can be positive or negative
• Can vary between -1 and +1
• Does not imply causation (there may be some
other factor that can explain the association)
Correlation coefficient
• The sample correlation coefficient is
represented using r and calculated as
• r = covariance between X and Y / (standard deviation
of X × standard deviation of Y)
• This is called the Pearson correlation coefficient
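A sketch of the calculation on illustrative data, checking the hand formula against numpy's built-in:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # illustrative data

# r = covariance(X, Y) / (SD of X * SD of Y)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(r, 4))
print(round(np.corrcoef(x, y)[0, 1], 4))  # same value from numpy directly
```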
Pearson correlation coefficient
• r = −1: perfect negative linear relationship; as the
value of X increases, the value of Y decreases
• r = 0: no linear relationship between X and Y
• r = +1: perfect positive linear relationship; as the
value of X increases, the value of Y increases
Hypothesis test for correlation coefficient
• It is possible to test whether a population
correlation coefficient differs significantly from
zero
• The significance of the correlation coefficient will
depend on the size of the correlation coefficient
and the number of observations in the sample
• The validity of this test requires that the variables
are observed on a random sample of
individuals and at least one of the variables
follows a normal distribution
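In practice this test is a single call in scipy; the data below are illustrative:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.9])  # illustrative data

# Tests H0: population correlation = 0, assuming a random sample
# and (approximate) normality of at least one variable.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```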