
Biostatistics Nurses HND

The document outlines a course on Research Methodology and Biostatistics at Biaka University, covering topics such as data collection, statistical measures, hypothesis testing, and the use of software in research. It defines key concepts like biostatistics, variables, and data sources, and emphasizes the importance of both descriptive and inferential statistics in public health. The course includes practical sessions and exercises to reinforce understanding of statistical methods and their applications in health sciences.

Uploaded by

dccx6vvznh
Copyright
© © All Rights Reserved

RESEARCH METHODOLOGY

AND BIOSTATISTICS
(NUS226)
Biaka University
Institute of Buea (BUIB)

Instructor

Yoah Aldof Tah


PhD Public Health (UB)
MSc. Epidemiology/Biostatistics
COURSE OUTLINE
INTRODUCTION

DATA

VARIABLES AND MEASUREMENT SCALES

MEASURES OF CENTRAL TENDENCY

MEASURES OF DISPERSION OR VARIABILITY

PRESENTATION/ORGANIZATION OF DATA

INFERENTIAL STATISTICS

HYPOTHESIS TESTING/P-VALUE

STATISTICAL TESTS

CALCULATING 95%CI FOR PROPORTIONS

SOFTWARES USED IN RESEARCH METHODOLOGY

PRACTICAL SESSION (QUANTITATIVE/QUALITATIVE ANALYSES)


2
TEXTBOOK(S)

- BIOSTATISTICS FOR HEALTH SCIENCES (ETHIOPIAN PUBLIC HEALTH TRAINING INITIATIVE) BY GETU DEGU

- MATHEMATICS FOR HEALTH SCIENCES BY WAYNE W. DANIEL

- BMC PUBLIC HEALTH

3
INTRODUCTION
“Biostatistics is the way to decision making in
Public Health”

4
 OBJECTIVES
 Define statistics
 Differentiate between descriptive and
inferential statistics
 Define biostatistics
 Define data and list some sources of data
 List the importance of statistics

5
Discovering the topic……
Below is the cholera vaccination status
of 10 inhabitants of Muea Community:
V, NV, V, V, NV, V, NV, NV, V, NV

V=Vaccinated, NV=Not vaccinated


Q: What can you say about cholera
vaccination in the Muea Community?
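
One way to start answering the question is to summarise the ten observations as counts and a proportion. The sketch below is only an illustration in Python; the variable names are assumptions, and the data are the ten values listed on the slide.

```python
from collections import Counter

# Vaccination status of the 10 inhabitants, as listed on the slide
status = ["V", "NV", "V", "V", "NV", "V", "NV", "NV", "V", "NV"]

counts = Counter(status)          # frequency of each category
n = len(status)                   # sample size
proportion_vaccinated = counts["V"] / n

print(f"Vaccinated: {counts['V']} of {n} ({proportion_vaccinated:.0%})")
print(f"Not vaccinated: {counts['NV']} of {n}")
```

Describing the sample this way is descriptive statistics; saying something about the whole Muea Community from these 10 people would be inferential statistics.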
6
Introduction
Some Basic concepts
Statistics is a field of study concerned with
1- collection, organization, summarization
and analysis of data.
2- drawing of inferences about a body of data
when only a part of the data is observed.
Statisticians try to interpret and
communicate the results to others.

NB: Statistics is the science of collecting,
organizing, analyzing and interpreting
numerical data for understanding a
phenomenon or making wise decisions.
7
Descriptive/Inferential
statistics
Descriptive: collection,
organization, summarization and
analysis of data. Describe in
terms of PPT (person, place and time).

Inferential: drawing of inferences
about a body of data when only a
part of the data is observed.
8
* Biostatistics:
The tools of statistics are employed in
many fields:
business, education, psychology,
agriculture, economics, … etc.

Definition: Biostatistics is the science of
collecting, organizing, analyzing and
interpreting numerical data from
biology, medicine and public health for
understanding a phenomenon or
making wise decisions.
9
Statistics: Method or
data
When it means 'statistical data', it refers
to numerical descriptions of things. Thus
statistics of malaria cases in Buea include
fever cases and the number of positives obtained.

When it means 'statistical methods', it refers to a body
of methods that are used for collecting,
organizing, analyzing and interpreting
numerical data for understanding a
phenomenon or making wise decisions.
10
Data:
• The raw material of Statistics is
data.
• We may define data as figures.
Figures result from the process of
counting or from taking a
measurement.
• For example:
• - When a hospital administrator
counts the number of patients
(counting).
• - When a nurse weighs a patient
(measurement)
11
* Sources of Data:
We search for suitable data to serve as
the raw material for our investigation.
Such data are available from one or
more of the following sources:
1- Routinely kept records.
For example:
- Hospital medical records contain
immense amounts of information on
patients.
- Hospital accounting records contain a
wealth of data on the facility’s
business activities.
12
2- External sources.
The data needed to answer a
question may already exist in the
form of
published reports, commercially
available data banks, or the research
literature, i.e. someone else has
already asked the same question.

13
3- Surveys:
The source may be a survey, if the data
needed is about answering certain
questions.
For example:
If the administrator of a clinic wishes to
obtain information regarding the
mode of transportation used by
patients to visit the clinic,
then a survey may be conducted
among
patients to obtain this
information.
14
4- Experiments.
Frequently the data needed to
answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of
several strategies is best for
maximizing patient compliance,
she might conduct an experiment in
which the different strategies of
motivating compliance are tried with different patients.
15
Primary and secondary
data
The statistical data may be classified under two categories,
depending upon the sources:
1) Primary data 2) Secondary data

Primary Data: data which are collected by
the investigator himself for the purpose of a specific
inquiry or study. They are more reliable and accurate, and are
generated from surveys and research.

Secondary Data: When an investigator uses data
which have already been collected by others, such data
are called "Secondary Data". They are
less expensive to collect, both in money and time.
16
* A variable:
It is a characteristic that takes on
different values in different
persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental
clinic.

17
* A population:
It is the largest collection of values
of a random variable for which we
have an interest at a particular
time.
For example:
The weights of all the children
enrolled in a certain elementary
school.
Populations may be finite or infinite.
18
* A sample:
It is a part of a population.
For example:
The weights of only a fraction of
these children.

Biostatisticians try as much as
possible to make their samples
representative.

19
Biostatistics
- More and more things are now measured
quantitatively in medicine, physiotherapy and
public health.

- There is a great deal of intrinsic (inherent)
variation in most biological processes.

- The planning, conduct, and interpretation of much
of medical research are becoming increasingly
reliant on statistical technology.

- Statistics pervades the medical literature.

- Statistics provides a way of organizing
information on a wider and more formal basis.
20
Exercise
1- Define the terms: statistics,
biostatistics, variable, data, sample.

2- List 3 sources of data in health
sciences.

3- Differentiate between primary and
secondary data.

21
B- VARIABLES AND
MEASUREMENT
SCALES

22
* A variable:
It is a characteristic that takes on
different values in different
persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental
clinic.

23
Types of variables
Quantitative variables:
A quantitative variable can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.

Qualitative (categorical) variables:
Many characteristics are not capable of being measured.
Some of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
24
Types of quantitative variables
Discrete:
A discrete variable is a quantitative variable
that takes only integers (whole numbers).
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child in an elementary school.

Continuous:
A continuous variable can assume any value within
a specified relevant interval of values assumed by the
variable.
For example: height, weight.

Interval scale: the intervals between values are the same,
but the scale is not a ratio scale, e.g. temperature (°C/°F) and
calendar year.
Ratio scale: the intervals are the same and the data have
a meaningful ratio, e.g. age and weight.
25
Categorical variables

• Nominal variables
Categorical variables that have no logical order, e.g. sex (M, F)

• Ordinal variables
Categorical variables with a logical order of importance or severity, e.g. stages of cancer, HIV

26
MEASUREMENT
SCALES

NOMINAL SCALE INTERVAL SCALE

ORDINAL SCALE RATIO SCALE

27
Types of variables
•Independent variable
– The “cause”, or the variable thought to influence
the dependent variable. In experimental research it
is the variable manipulated by the researcher.
•Dependent variable
– The “effect”: a response or behavior that is
influenced by the independent variable;
sometimes called the criterion variable.

28
Exercise
Identify the type of data (nominal, ordinal, interval or
ratio) represented by each of the following. Confirm your answers by
giving your own examples.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. WHO clinical stages of HIV/AIDS (I-IV)
29
Classify into nominal,
ordinal, interval and ratio
- Blood type (A, AB, O, B)
- BP (High, Normal)
- Weight ( )
- Gender (M, F)
- Distance ( )
- Temperature (°C or °F)
30
Descriptive Statistics
Measures of Central
Tendency
(Covered in previous
courses)
key words:
Descriptive Statistic, measure of
central tendency ,statistic, parameter,
mean (μ) ,median, mode.

33
The Statistic and The
Parameter
• A Statistic:
It is a descriptive measure computed from the
data of a sample.
• A Parameter:
It is a descriptive measure computed from
the data of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose
values are $x_1, x_2, \ldots, x_n$. From these data, we compute
the statistic.
34
Measures of Central
Tendency
A measure of central tendency is a measure which
indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the
Mode.
The Mean:
It is the average of the data.

35
The Population Mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
which is usually unknown, then we use the
sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.

$\bar{x} = (42 + 28 + \ldots + 37) / 10 = 36.6$


NB: Look for the differences between arithmetic and
geometric mean.
36
Properties of the Mean:
• Uniqueness. For a given set of data there is
one and only one mean.
• Simplicity. It is easy to understand and to
compute.
• Affected by extreme values (Outliers).
Since all values enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121
and 126. The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The
mean = 118, a value that is not representative of the set of
data as a whole.
37
The Median:
When the data are ordered, it is the observation that divides the
set of observations into two equal parts such that half of
the data are before it and the other half are after it.
* If n is odd, the median will be the middle observation. It
will be the (n+1)/2 th ordered observation.
When n = 11, then the median is the 6th observation.
* If n is even, there are two middle observations. The median
will be the mean of these two middle observations, i.e. of the
(n/2) th and (n/2 + 1) th ordered observations.
When n = 12, the position (n+1)/2 = 6.5 indicates a value
halfway between the 6th and 7th ordered
observations.

For grouped data:

38
Example:
For the same random sample, the ordered
observations will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5 th
observation, i.e. = (32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is
one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as
is the mean.
39
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative
data.
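
The three measures can be checked on the same random sample of ages with Python's standard `statistics` module. This is a minimal sketch, not part of the original slides; the variable names are illustrative.

```python
import statistics

# Random sample of 10 ages used in the examples above
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

mean_age = statistics.mean(ages)        # sum of the values divided by n
median_age = statistics.median(ages)    # mean of the 5th and 6th ordered values (n is even)
modes = statistics.multimode(ages)      # list of the most frequent value(s)

print("mean:", mean_age)
print("median:", median_age)
print("mode(s):", modes)
```

`multimode` returns a list because, as noted above, the mode is not always unique.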
40
Geometric mean
It is obtained by taking the nth root of the product of “n” values,
i.e. if the values of the observations are denoted by $x_1, x_2, \ldots, x_n$
then
$GM = \sqrt[n]{(x_1)(x_2) \cdots (x_n)}$.
The geometric mean is preferable to the arithmetic mean if the
series of observations contains one or more unusually large
values.
The logarithm of the geometric mean is equal to the arithmetic
mean of the logarithms of individual values. The actual process
involves obtaining the logarithm of each value, adding them and
dividing the sum by the number of observations. The quotient so
obtained is then looked up in the tables of anti-logarithms, which
will give us the geometric mean.

$\log GM = \frac{\sum \log x_i}{n}$
41
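
The logarithm procedure described above can be sketched in a few lines of Python. The function name and the three example values are hypothetical (they do not come from the slides); the anti-logarithm step is done with a power of 10 instead of tables.

```python
import math

def geometric_mean(values):
    """Geometric mean via logarithms: antilog of the mean of the logs.

    All values must be positive, since the logarithm is undefined otherwise.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log  # anti-logarithm of the mean log

# Hypothetical observations containing one unusually large value
observations = [10, 100, 1000]

gm = geometric_mean(observations)
am = sum(observations) / len(observations)  # arithmetic mean, for comparison

print("geometric mean:", gm)
print("arithmetic mean:", am)
```

The geometric mean is pulled up far less by the large value than the arithmetic mean, which is why it is preferred for such series.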
Normal distribution
When the distribution of a data set is symmetric and bell-shaped, so that its
mean, median and mode are the same, we say the data are Normally
Distributed.

Exploring a dataset permits the statistician to determine whether
the variables are normally distributed or not. The net effect is
to determine if we are to use parametric or non-parametric tests.
42
Skewness
Skewness: If extremely low or extremely high
observations are present in a distribution, then the mean
tends to shift towards those scores. Based on the type of
skewness, distributions can be:
a) Negatively skewed distribution: occurs when the majority
of scores are at the right end of the curve and a few small
scores are scattered at the left end.
b) Positively skewed distribution: Occurs when the
majority of scores are at the left end of the curve and a few
extreme large scores are scattered at the right end.
c) Symmetrical distribution: It is neither positively nor
negatively skewed. A curve is symmetrical if one half of
the curve is the mirror image of the other half.

43
Skewness

44
Exercise

Calculate:
a) Mean IP
b) Modal IP
c) Median IP

45
Exercise: compute mean, mode
and median

46
Descriptive Statistics
Measures of Dispersion
key words:
Descriptive Statistic, measure of
dispersion , range ,variance, coefficient of
variation.

49
2.5. Descriptive Statistics –
Measures of Dispersion:
• A measure of dispersion conveys information regarding
the amount of variability present in a set of data.
• Note:
1. If all the values are the same
→ There is no dispersion.
2. If all the values are different
→ There is dispersion:
a) If the values are close to each other
→ The amount of dispersion is small.
b) If the values are widely scattered
→ The dispersion is greater.

50
• ** Measures of Dispersion are:
1. Range (R).
1b. Interquartile Range (IQR)
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).

51
1. The Range (R):
• Range = Largest value − Smallest value =
$x_L - x_S$
• Note:
• The range depends on only two values (the largest and the smallest).
• Example:

• Data:
• 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the range.
• Range = 66 − 38 = 28

52
Interquartile Range (IQR)
IQR = 3rd quartile (Q3) − 1st quartile (Q1)
IQR = [value of the 3(n+1)/4 th ordered observation] − [value of the (n+1)/4 th ordered observation]
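
As an illustration, the range and the IQR can be computed for the ten values used in the range example (43, 66, 61, ...). The sketch below assumes that a fractional position such as 2.75 is handled by linear interpolation between the neighbouring ordered observations; the slides do not say how to treat fractional positions, so that rule and the helper name are assumptions.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring ordered observations (an assumed convention).
    """
    position = min(max(position, 1), len(ordered))  # keep the position inside the data
    lower = int(position)                           # whole part of the position
    fraction = position - lower                     # fractional part
    if lower == len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
ordered = sorted(data)
n = len(ordered)

data_range = ordered[-1] - ordered[0]               # largest minus smallest
q1 = value_at_position(ordered, (n + 1) / 4)        # 1st quartile
q3 = value_at_position(ordered, 3 * (n + 1) / 4)    # 3rd quartile
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Other software may use slightly different quartile conventions, so small differences in Q1 and Q3 between packages are expected.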

53
IQR

54
• Compute a) Range and b) IQR
for the data set below

• 19, 30, 23, 18, 28, 20

55
2. The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance ($S^2$):
• $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean

• Example:
• Refer to the previous example
• Find the sample variance of ages, $\bar{x} = 56$
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \ldots + (50-56)^2] / (10 - 1)$
• $= 810/9 = 90$
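
The sample variance of these ten ages can be verified step by step: compute the mean, sum the squared deviations from it, and divide by n − 1. This is a minimal Python sketch with illustrative variable names.

```python
import math

# Ages from the range example
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(ages)

mean_age = sum(ages) / n                                   # sample mean
squared_deviations = [(x - mean_age) ** 2 for x in ages]   # (x_i - mean)^2 for each value
sum_of_squares = sum(squared_deviations)

sample_variance = sum_of_squares / (n - 1)   # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)       # standard deviation = square root of variance

print("mean:", mean_age)
print("sum of squared deviations:", sum_of_squares)
print("sample variance:", sample_variance)
print("sample standard deviation:", round(sample_sd, 2))
```

Dividing by n instead of n − 1 would give the population variance formula, which is only appropriate when the data are the whole population.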

56
• b) Population Variance ($\sigma^2$):
• $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean

3. The Standard Deviation:
• is the square root of the variance $= \sqrt{\text{Variance}}$
a) Sample Standard Deviation $= S = \sqrt{S^2}$

b) Population Standard Deviation $= \sigma = \sqrt{\sigma^2}$

57
4. The Coefficient of Variation
(C.V):
• It is a measure used to compare the
dispersion in two sets of data, which is
independent of the unit of
measurement.
• $C.V = \frac{S}{\bar{X}} (100)$, where $S$ is the sample standard
deviation
• and $\bar{X}$ is the sample mean.

58
Example:
• Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds

59
• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10/145)*100 = 6.9%

• C.V (Sample 2) = (10/80)*100 = 12.5%

• The weights of the 11-year-olds (Sample 2) therefore show more
variation.
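
The comparison can be reproduced with a small helper. The function name is an assumption made for this sketch; the means and standard deviations are those given in the example.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return sd / mean * 100

cv_sample1 = coefficient_of_variation(sd=10, mean=145)  # 25-year-olds
cv_sample2 = coefficient_of_variation(sd=10, mean=80)   # 11-year-olds

print(f"C.V sample 1: {cv_sample1:.1f}%")
print(f"C.V sample 2: {cv_sample2:.1f}%")
```

Both samples have the same standard deviation, so it is only relative to the mean that the second sample turns out to be the more variable one.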

60
Exercise
Compute standard deviation

61
Application

When your data are normally distributed (few
outliers), use
Mean ± SD

If the data are skewed (many outliers), use
Median ± IQR

NB: The median and IQR are less affected or
influenced by outliers.
62
QUIZ
Consider the number of polio vaccine doses received by 10
randomly selected children aged less than 5 years in
Muyuka Health District in 2018.
1, 5, 2, 1, 0, 0, 4, 3, 1, 0
Compute the following:
a) Mean
b) Median
c) Mode
d) Range
e) Standard deviation and
f) Coefficient of variation

63
DATA PRESENTATION

64
STATISTICAL TABLES
A statistical table is an orderly and systematic
presentation of numerical data in rows and columns.

Simple or one-way table:
The simple frequency table is used when the individual
observations involve only a single variable.

65
One-way table

What do you
observe?

66
Two-way table
What do you
observe?

67
Graphs or charts
Bar chart: represents
and compares
frequencies of
categorical variables

68
Histogram
It is a graph of
frequency for
continuous
variables

69
Pie chart
It is a circle divided
into sectors
proportional to the
various frequencies

70
Line Diagram
The line graph is especially
useful for the study of some
variables
according to the passage of
time.

The time, in weeks, months
or years, is marked along
the horizontal axis; and the
value of the quantity that is
being studied is marked on
the vertical axis.

71
TUTORIAL SHEET I
Using sample data to make
estimates about population
parameters
 Key words:

Point estimate, interval estimate,
estimator,
Confidence level, α, Confidence interval for
mean μ, Confidence interval for two means,
Confidence interval for population
proportion P,
Confidence interval for two proportions
Text Book : Basic Concepts
and Methodology for the
Health Sciences
74
 6.1 Introduction:
 Statistical inference is the procedure by which
we reach a conclusion about a population on
the basis of the information contained in a
sample drawn from that population.
 Suppose that:
 an administrator of a large hospital is
interested in the mean age of patients admitted
to his hospital during a given year.
1. It will be too expensive to go through the
records of all patients admitted during that
particular year.
2. He consequently elects to examine a sample of
the records from which he can compute an
estimate of the mean age of patients admitted
to his hospital that year.
75
• For any parameter, we can compute two types of
estimate: a point estimate and an interval
estimate.
• A point estimate is a single numerical value
used to estimate the corresponding population
parameter.
• An interval estimate consists of two numerical
values defining a range of values that, with a
specified degree of confidence, we feel includes
the parameter being estimated.
• The Estimate and The Estimator:
• The estimate is a single computed value, but the
estimator is the rule that tells us how to compute
this value, or estimate.
• For example, $\bar{x} = \frac{\sum_i x_i}{n}$
is an estimator of the population mean, $\mu$. The
single numerical value that results from
evaluating this formula is called an
estimate of the parameter $\mu$.
76
Confidence Interval for a
Population Proportion (P):
A sample is drawn from the population of
interest, then the sample proportion $\hat{p}$ is computed
as
$\hat{p} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}} = \frac{a}{n}$

This sample proportion is used as the point
estimator of the population proportion. A
confidence interval is obtained by the following
formula:
$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
77
Example
The Pew Internet Life Project reported in 2003 that
18%
of internet users have used the internet to search for
information regarding experimental treatments or
medicine. The sample consisted of 1220 adult internet
users, and information was collected from telephone
interviews. We wish to construct a 98% C.I. for the
proportion of internet users who have searched for
information about experimental treatments or
medicine.
78
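Solution:
$1-\alpha = 0.98 \rightarrow \alpha = 0.02 \rightarrow \alpha/2 = 0.01 \rightarrow 1-\alpha/2 = 0.99$
$Z_{1-\alpha/2} = Z_{0.99} = 2.33$, $n = 1220$, $\hat{p} = \frac{18}{100} = 0.18$
The 98% C.I. is
$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.18 \pm 2.33 \sqrt{\frac{0.18(1-0.18)}{1220}}$

$= 0.18 \pm 0.0256 = (0.1544, 0.2056)$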
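
The same interval can be obtained in Python. In this sketch the z value is taken from the standard normal distribution (`statistics.NormalDist`) rather than from a printed table, so it is about 2.326 instead of the rounded 2.33; the function name is an assumption for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Confidence interval for a population proportion:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z value for 1 - alpha/2
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin

# Values from the example: 18% of 1220 internet users, 98% confidence
lower, upper = proportion_ci(p_hat=0.18, n=1220, confidence=0.98)

print(f"98% C.I.: ({lower:.4f}, {upper:.4f})")
```

The formula relies on the normal approximation, which is reasonable here because the sample is large.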

79
Using sample statistics to
Test Hypotheses
about population parameters
 Key words :

 Null hypothesis H0, Alternative hypothesis HA , testing


hypothesis , test statistic , P-value

81
Hypothesis Testing

 One type of statistical inference, estimation,


was discussed in the previous Chapter.

 The other type ,hypothesis testing ,is discussed


in this chapter.

82
Definition of a hypothesis

 It is a tentative statement that needs to be tested.

It is usually concerned with the parameters of the


population. e.g. the hospital administrator may
want to test the hypothesis that the average
length of stay of patients admitted to the
hospital is 5 days
83
Definition of Statistical hypotheses
 They are hypotheses that are stated in such a way that
they may be evaluated by appropriate statistical
techniques.
 There are two hypotheses involved in hypothesis
testing
 Null hypothesis H0: It is the hypothesis to be tested .
 Alternative hypothesis HA : It is a statement of what
we believe is true if our sample data cause us to reject
the null hypothesis

84
Testing a hypothesis about the mean
of a population:
• We have the following steps:
1. Data: determine the variable, sample size (n), sample
mean ($\bar{x}$), population standard deviation ($\sigma$), or sample
standard deviation (s) if $\sigma$ is unknown
2. Assumptions : We have two cases:
 Case1: Population is normally or approximately

normally distributed with known or unknown


variance (sample size n may be small or large),
 Case 2: Population is not normal with known or

unknown variance (n is large i.e. n≥30).

85
• 3. Hypotheses:
• we have three cases
• Case I : H0: μ = μ0
HA: μ ≠ μ0
• e.g. we want to test that the population mean is
different from 50
• Case II : H0: μ = μ0
HA: μ > μ0
• e.g. we want to test that the population mean is
greater than 50
• Case III : H0: μ = μ0
HA: μ < μ0
• e.g. we want to test that the population mean is less
than 50

86
4. Test Statistic:
• Case 1: population is normal or approximately
normal

- σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
- σ² is unknown, n large: $Z = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
- σ² is unknown, n small: $T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$

• Case 2: If population is not normally distributed and n is
large
i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
87
5. Decision Rule:
i) If HA: μ ≠ μ0
• Reject H0 if $Z > Z_{1-\alpha/2}$ or $Z < -Z_{1-\alpha/2}$
(when using the Z-test)
Or reject H0 if $T > t_{1-\alpha/2,\,n-1}$ or $T < -t_{1-\alpha/2,\,n-1}$
(when using the T-test)
• __________________________

• ii) If HA: μ > μ0
• Reject H0 if $Z > Z_{1-\alpha}$ (when using the Z-test)

Or reject H0 if $T > t_{1-\alpha,\,n-1}$ (when using the T-test)

88
 iii) If HA: μ< μ0
Reject H0 if $Z < -Z_{1-\alpha}$ (when using the Z-test)
Or
Reject H0 if $T < -t_{1-\alpha,\,n-1}$ (when using the T-test)
Note:
$Z_{1-\alpha/2}$, $Z_{1-\alpha}$, $Z_{\alpha}$ are tabulated values obtained
from table D
$t_{1-\alpha/2}$, $t_{1-\alpha}$, $t_{\alpha}$ are tabulated values obtained from
table E with (n-1) degrees of freedom (df)
89
• 6. Decision:
• If we reject H0, we can conclude that HA is
true.
• If, however, we do not reject H0, we may
conclude that H0 may be true.
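
The steps for the two-sided case with known σ² can be sketched as follows. The numbers (sample mean 27, μ0 = 30, σ² = 20, n = 10, α = 0.05) are illustrative values chosen for this sketch and do not come from the slides; the critical value is taken from `statistics.NormalDist` instead of table D.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: mu = mu0 against HA: mu != mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{1 - alpha/2}
    reject_h0 = abs(z) > critical                      # decision rule
    return z, critical, reject_h0

# Illustrative (hypothetical) data: sigma^2 = 20, so sigma = sqrt(20)
z, critical, reject_h0 = one_sample_z_test(
    sample_mean=27, mu0=30, sigma=math.sqrt(20), n=10, alpha=0.05
)

print(f"Z = {z:.2f}, critical value = {critical:.2f}")
print("Decision:", "reject H0" if reject_h0 else "do not reject H0")
```

When σ² is unknown and n is small, the same structure applies with s in place of σ and a t critical value with n − 1 degrees of freedom, which needs a t table or a library such as SciPy.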

90
An Alternative Decision Rule using the
p-value: Definition
 The p-value is defined as the smallest value of
α for which the null hypothesis can be
rejected.
 If the p-value is less than or equal to α ,we
reject the null hypothesis (p ≤ α)
 If the p-value is greater than α ,we do not
reject the null hypothesis (p > α)
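
For a Z-test, the p-value can be computed from the standard normal distribution and compared with α. In this sketch the function name, the `alternative` labels and the value z = −2.12 are assumptions made for illustration.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="two-sided"):
    """p-value of a Z statistic for the three forms of alternative hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # HA: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":     # HA: mu > mu0
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # HA: mu < mu0
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

alpha = 0.05
p_value = z_test_p_value(-2.12, alternative="two-sided")  # hypothetical Z value

decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

This gives the same decision as comparing Z with the tabulated critical value, but it also shows how strong the evidence against H0 is.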

91
STATISTICAL TESTS
- Normality
- Nature of dependent
variable

92
STATISTICAL TESTS
Presentation on t-test and
ANOVA

93
CORRELATION AND
REGRESSION
Relationship between two
continuous variables

94
Introduction
Correlation measures the closeness
(strength) and direction of the
association,

while linear regression gives the
equation of the straight line that best
describes it and enables the prediction
of one variable from the other.

95
REGRESSION, CORRELATION, EQUATION OF REGRESSION

Regression and correlation are statistical
techniques that use the idea that one variable, say Y, may
be related to one or more variables through an
equation.

Here we consider the relationship of two variables only
in a linear form, which is called linear regression and
linear correlation, or simple regression and correlation.

The relationships between more than two variables,
called multiple regression and correlation, will be
considered later.

The related method of correlation is used to measure
how strong the relationship between the two
variables is.
96
• Simple Linear Regression:
Key terms: line of regression, dependent variable, independent variable, two random variables (bivariate random variable)

Suppose that we are interested in a variable Y, but we
want to know about its relationship to another
variable X, or we want to use X to predict (or
estimate) the value of Y that might be obtained
without actually measuring it, provided the
relationship between the two can be expressed by
a line. 'X' is usually called the independent
variable and 'Y' is called the dependent variable.

We assume that the values of variable X are either
fixed or random. By fixed, we mean that the
values are chosen by the researcher: either an
experimental unit (patient) is given this value of X
(such as the dosage of a drug), or a unit (patient) is
chosen which is known to have this value of X.
By random, we mean that units (patients) are chosen
at random from all the possible units, and both
variables X and Y are measured.
We also assume that for each value x of X, there is
a whole range or population of possible Y values
and that the mean of the Y population at X = x,
denoted by $\mu_{y|x}$, is a linear function of x. That is,

$\mu_{y|x} = \alpha + \beta x$

97
ESTIMATION
We select a sample of
n observations ($x_i$, $y_i$)
from the population, WITH
the goals:
- Estimate α and β.
- Predict the value of Y at a
given value x of X.
- Make tests to draw
conclusions about the model
and its usefulness.

We estimate the parameters
α and β by ‘a’ and ‘b’
respectively by using the sample
regression line:
$\hat{Y} = a + bx$
Where we calculate

98
ESTIMATION AND CALCULATION OF CONSTANTS ‘a’ AND ‘b’

99
EXAMPLE
Investigators at a sports health centre are
interested in the relationship between
oxygen consumption and exercise time in
athletes recovering from injury. Appropriate
mechanics for exercising and measuring
oxygen consumption are set up, and the
results are presented below:

100
Exercise time (min)    Oxygen consumption
(x variable)           (y variable)

0.5                    620
1.0                    630
1.5                    800
2.0                    840
2.5                    840
3.0                    870
3.5                    1010
4.0                    940
4.5                    950
5.0                    1130
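
A possible way to obtain the constants for these data is the usual least-squares method, sketched below in Python. The formulas used, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, are the standard least-squares estimates and are assumed here, since the formula slide did not survive extraction; the `predict` helper is hypothetical.

```python
# Exercise time (min) and oxygen consumption from the table above
exercise_time = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
oxygen = [620, 630, 800, 840, 840, 870, 1010, 940, 950, 1130]

n = len(exercise_time)
mean_x = sum(exercise_time) / n
mean_y = sum(oxygen) / n

# Sums of cross-products and squares of deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exercise_time, oxygen))
s_xx = sum((x - mean_x) ** 2 for x in exercise_time)

b = s_xy / s_xx            # slope: change in oxygen consumption per extra minute
a = mean_y - b * mean_x    # intercept: estimated consumption at time 0

def predict(x):
    """Estimated oxygen consumption at exercise time x (minutes)."""
    return a + b * x

print(f"Regression line: Y = {a:.1f} + {b:.2f} x")
print(f"Predicted consumption at 3.2 min: {predict(3.2):.0f}")
```

Predictions are only trustworthy inside the range of observed exercise times (0.5 to 5.0 minutes).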

101
Calculations

102
Pearson’s Correlation Coefficient

• With the aid of Pearson’s correlation


coefficient (r), we can determine the strength
and the direction of the relationship between
X and Y variables,
• both of which must have been measured and
must be quantitative.
• For example, we might be interested in
examining the association between height
and weight for the following sample of eight
children:

103
Heights and weights of 8 children
Child      Height (inches) X    Weight (pounds) Y

A          49                   81
B          50                   88
C          53                   87
D          55                   99
E          60                   91
F          55                   89
G          60                   95
H          50                   90
Average    54 inches            90 pounds

104
Scatter plot for the 8 children
[Figure: scatter plot of weight (vertical axis) against height (horizontal axis)]

105
Table: The Strength of a Correlation

• Value of r (positive or negative)    Meaning
• ________________________________________________________
• 0.00 to 0.19    A very weak correlation
• 0.20 to 0.39    A weak correlation
• 0.40 to 0.69    A modest correlation
• 0.70 to 0.89    A strong correlation
• 0.90 to 1.00    A very strong correlation
• ________________________________________________________

106
FORMULA FOR CORRELATION
COEFFICIENT ( r )

• With Pearson’s r, the sum $\sum (X - \bar{X})(Y - \bar{Y})$
• means that we add the products of the deviations to see if the
positive products or negative products are more abundant and
sizable. Positive products indicate cases in which the variables
go in the same direction (that is, both taller or heavier than
average or both shorter and lighter than average);
• negative products indicate cases in which the variables go in
opposite directions (that is, taller but lighter than average or
shorter but heavier than average).

107
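Computational Formula for Pearson’s Correlation Coefficient (r):
$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$

Where SP (sum of the products), $SS_x$ (sum of
the squares for x) and $SS_y$ (sum of the squares
for y) can be computed as follows:
$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}$, $SS_x = \sum X^2 - \frac{(\sum X)^2}{n}$, $SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n}$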

108
Child    X     Y     X²     Y²      XY

A        12    12    144    144     144
B        10    8     100    64      80
C        6     12    36     144     72
D        16    11    256    121     176
E        8     10    64     100     80
F        9     8     81     64      72
G        12    16    144    256     192
H        11    15    121    225     165

∑        84    92    946    1118    981
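
The column totals in this table are what the computational formula needs. The sketch below recomputes them and then r, assuming the usual definitions SP = ΣXY − (ΣX)(ΣY)/n, SSx = ΣX² − (ΣX)²/n and SSy = ΣY² − (ΣY)²/n. Child E is taken as X = 8, Y = 10, which is what the column totals imply.

```python
import math

# X and Y values for children A to H, read from the table above
x = [12, 10, 6, 16, 8, 9, 12, 11]
y = [12, 8, 12, 11, 10, 8, 16, 15]
n = len(x)

# Column totals
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v ** 2 for v in x)
sum_y2 = sum(v ** 2 for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

sp = sum_xy - sum_x * sum_y / n      # sum of products
ss_x = sum_x2 - sum_x ** 2 / n       # sum of squares for x
ss_y = sum_y2 - sum_y ** 2 / n       # sum of squares for y

r = sp / math.sqrt(ss_x * ss_y)      # Pearson's correlation coefficient

print("totals:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"SP = {sp}, SSx = {ss_x}, SSy = {ss_y}, r = {r:.2f}")
```

On the strength table given earlier, an r of this size would be read as a weak positive correlation.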

109
Table 2 : Chest circumference and Birth
Weight of 10 babies

• X(cm) y(kg) x2 y2 xy
• ___________________________________________________
• 22.4 2.00 501.76 4.00 44.8
• 27.5 2.25 756.25 5.06 61.88
• 28.5 2.10 812.25 4.41 59.85
• 28.5 2.35 812.25 5.52 66.98
• 29.4 2.45 864.36 6.00 72.03
• 29.4 2.50 864.36 6.25 73.5
• 30.5 2.80 930.25 7.84 85.4
• 32.0 2.80 1024.0 7.84 89.6
• 31.4 2.55 985.96 6.50 80.07
• 32.5 3.00 1056.25 9.00 97.5
• TOTAL
• 292.1 24.8 8607.69 62.42 731.61

110
Checking for significance

• There appears to be a strong correlation between chest circumference
and birth weight in babies.
• We need to check that such a correlation is unlikely to have
arisen by chance in a sample of ten babies.
• Tables are available that give the significant values of the
correlation coefficient at two probability levels.
• First we need to work out the degrees of freedom. They are the
number of pairs of observations less two, that is, (n - 2) = 8.
• Looking at the table, we find that our calculated value of r = 0.86
exceeds the tabulated value of 0.765 at 8 df and p = 0.01. Our
correlation is therefore statistically highly significant.

111
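The tabulated value of 0.765 can be related to the t distribution, since a critical r follows from a critical t through r = t / sqrt(t² + df). The sketch below is an illustration under one stated assumption: the two-sided 1% critical value t = 3.355 for 8 df is taken from a standard t table, not from these slides.

```python
from math import sqrt

r = 0.86      # calculated correlation for the ten babies
n = 10        # number of pairs of observations
df = n - 2    # degrees of freedom

# Assumed input: two-sided critical t for 8 df at p = 0.01 (standard t table).
t_critical = 3.355

# Critical r implied by the critical t.
r_critical = t_critical / sqrt(t_critical ** 2 + df)

# Equivalent t statistic for the observed r.
t_observed = r * sqrt(df / (1 - r ** 2))

significant = abs(r) > r_critical
print(round(r_critical, 3), round(t_observed, 2), significant)
```

Comparing r with the critical r, or the observed t with the critical t, leads to the same decision, so either table can be used.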
Exercise
Question 6: Correlation and Regression for Health
Sciences
Resting metabolic rate (RMR) is related to body weight.
Consider the table and scatter plot below.

112
Exercise (cont’d)

113
Exercise (cont’d)
a) Interpret the scatter plot above.
b) Interpret the correlation between
body weight and RMR (given that the
Pearson correlation coefficient, r = 0.987;
p < 0.001).
c) Determine the linear regression
equation between body weight and RMR
(from the scatter plot above).
d) Interpret the regression equation.
114
Analysis of Frequency Data
An Introduction to the Chi-
Square
Distribution
TESTS OF INDEPENDENCE
• A test of independence checks whether two criteria
of classification are independent. For example, we
may test whether socioeconomic status and area of
residence of people in a city are independent.
• We divide our sample according to status
(low, medium and high income, etc.), and the
same sample is categorized according to area
(urban, rural, suburban, slums, etc.).
• Put the first criterion in columns, equal in
number to the categories of the 1st criterion
(socioeconomic status), and the 2nd in rows,
where the number of rows equals the number of
categories of the 2nd criterion (area of the city).
Text Book: Basic Concepts and Methodology for the Health Sciences
116
The Contingency Table
• Table: Two-Way Classification of Sample

Second Criterion ↓    First Criterion of Classification →
                      1      2      3      …      c      Total
1                     N11    N12    N13    …      N1c    N1.
2                     N21    N22    N23    …      N2c    N2.
3                     N31    N32    N33    …      N3c    N3.
.                     .      .      .      …      .      .
r                     Nr1    Nr2    Nr3    …      Nrc    Nr.
Total                 N.1    N.2    N.3    …      N.c    N


117
Observed versus Expected
Frequencies

• Oij: The frequencies in the ith row and jth column
given in any contingency table are called
observed frequencies; they result from the cross-
classification according to the two criteria.
• eij: Expected frequencies, on the assumption of
independence of the two criteria, are calculated by
multiplying the row total and the column total for the cell
and then dividing by the total frequency.
Formula:

$$e_{ij} = \frac{(N_{i\cdot})(N_{\cdot j})}{N}$$

118
Chi-square Test
• After calculating the expected frequencies,
prepare a table of expected frequencies and use the
chi-square statistic

$$\chi^2 = \sum_{i=1}^{k} \left[ \frac{(o_i - e_i)^2}{e_i} \right]$$

where the summation is over all k = r × c cells.
• D.F.: the degrees of freedom for using the table
are (r-1)(c-1), for the α level of significance.
• Note that the test is always one-sided.
119
Example 12.401 (page 613)
The researchers are interested in determining whether
preconception use of folic acid and race are
independent. The data are:

Observed Frequencies Table

Race     Use of folic acid
         Yes    No     Total
White    260    299    559
Black     15     41     56
Other      7     14     21
Total    282    354    636

Expected Frequencies Table

Race     Yes                        No                         Total
White    (282)(559)/636 = 247.86    (354)(559)/636 = 311.14    559
Black    (282)(56)/636 = 24.83      (354)(56)/636 = 31.17      56
Other    (282)(21)/636 = 9.31       (354)(21)/636 = 11.69      21
Total    282                        354                        636
120
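The expected frequencies in this example follow from the row and column totals alone. The sketch below shows one way to compute them for any two-way table; the function name `expected_frequencies` is illustrative and not taken from the textbook.

```python
# Observed frequencies: rows are race (White, Black, Other),
# columns are preconception use of folic acid (Yes, No).
observed = [
    [260, 299],
    [15, 41],
    [7, 14],
]


def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected counts adds up to the same row total as the observed counts, which is a quick check that the arithmetic is right.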
Calculations and Testing
• Data: See the given table.
• Assumption: Simple random sample.
• Hypotheses: H0: race and use of folic acid are
independent.
HA: the two variables are not independent. Let α
= 0.05.
• The test statistic is the chi-square statistic given earlier.
• Distribution: when H0 is true, the statistic follows the chi-square
distribution with (r-1)(c-1) = (3-1)(2-1) = 2 d.f.
• Decision rule: Reject H0 if the computed value of $\chi^2$ is greater
than $\chi^2_{\alpha,\,(r-1)(c-1)} = 5.991$.
• Calculations:

$$\chi^2 = \frac{(260 - 247.86)^2}{247.86} + \frac{(299 - 311.14)^2}{311.14} + \cdots + \frac{(14 - 11.69)^2}{11.69} = 9.091$$

121
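The test statistic for the folic acid example can be reproduced from the observed table in a few lines. This is a minimal sketch; the critical value 5.991 is the tabulated chi-square value for α = 0.05 and 2 d.f. quoted in the decision rule, entered here by hand.

```python
# Observed frequencies: rows are race (White, Black, Other),
# columns are folic acid use (Yes, No).
observed = [[260, 299], [15, 41], [7, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (o - e)^2 / e over all r x c cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # tabulated value for alpha = 0.05 and 2 d.f.
reject_h0 = chi_square > critical_value
print(round(chi_square, 2), df, reject_h0)
```

The statistic comes out at about 9.09, which is well above 5.991, so H0 is rejected at the 0.05 level.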
Conclusion
• Statistical decision: We reject H0 since 9.08960 >
5.991.
• Conclusion: We conclude that H0 is false, and that
there is a relationship between race and
preconception use of folic acid.
• P value: Since 7.378 < 9.08960 < 9.210,
0.01 < p < 0.025.
• We also reject the hypothesis at the 0.025 level of
significance but do not reject it at the 0.01 level.
122
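For this example the p value can be worked out exactly without a table, because with 2 degrees of freedom the upper-tail area of the chi-square distribution is exp(-x/2). The sketch below relies on that shortcut, which holds only for 2 d.f.; the helper name `critical_value_2df` is illustrative.

```python
from math import exp, log

chi_square = 9.0896  # test statistic reported on the slide

# For exactly 2 degrees of freedom the upper-tail area is exp(-x / 2).
p_value = exp(-chi_square / 2)


def critical_value_2df(alpha):
    """Chi-square critical value for 2 d.f., obtained by inverting exp(-x / 2)."""
    return -2 * log(alpha)


print(round(p_value, 4))
print([round(critical_value_2df(a), 3) for a in (0.05, 0.025, 0.01)])
```

The p value is a little above 0.01, which agrees with the bracket 0.01 < p < 0.025, and the three critical values match the tabulated 5.991, 7.378 and 9.210.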
Software used in research
methodology and
biostatistics:
Quantitative,
Qualitative,
Referencing,
Anti-plagiarism,
Grammatical error corrector
(Give an example for each)

123
PRACTICAL SESSION
BUIB PROJECT
ANALYSIS

124
WELCOME TO THE WORLD OF
DECISION MAKING IN PUBLIC
HEALTH

125
