Biostatistics Nurses HND
AND BIOSTATISTICS
(NUS226)
Biaka University
Institute of Buea (BUIB)
Instructor
DATA
PRESENTATION/ORGANIZATION OF DATA
INFERENTIAL STATISTICS
HYPOTHESIS TESTING/P-VALUE
STATISTICAL TESTS
3
INTRODUCTION
“Biostatistics is the way to decision making in
Public Health”
4
OBJECTIVES
Define statistics
Differentiate between descriptive and
inferential statistics
Define biostatistics
Define data and list some sources of data
List the importance of statistics
5
Discovering the topic……
Below is the cholera vaccination status
of 10 inhabitants of Muea Community:
V, NV, V, V, NV, V, NV, NV, V, NV
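A minimal sketch of how such raw observations can be summarised, assuming that V stands for "vaccinated" and NV for "not vaccinated" (the slide does not spell the codes out). The variable names are illustrative only.

```python
from collections import Counter

# Vaccination status of the 10 inhabitants, copied from the slide.
# Assumption: "V" = vaccinated, "NV" = not vaccinated.
status = ["V", "NV", "V", "V", "NV", "V", "NV", "NV", "V", "NV"]

# Tally how many times each category occurs (a simple frequency table).
counts = Counter(status)

# Proportion of the sample recorded as "V".
proportion_v = counts["V"] / len(status)

for category, frequency in counts.items():
    print(category, frequency)
print("Proportion V:", proportion_v)
```

Turning a list of individual observations into counts and proportions is the first step of descriptive statistics.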
13
3- Surveys:
The source may be a survey, if the data
needed is about answering certain
questions.
For example:
If the administrator of a clinic wishes to
obtain information regarding the
mode of transportation used by
patients to visit the clinic,
then a survey may be conducted
among
patients to obtain this
information.
4- Experiments.
Frequently the data needed to
answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of
several strategies is best for
maximizing patient compliance,
she might conduct an experiment in
which the different strategies of
motivating compliance are tried with
different patients.
Primary and secondary
data
The statistical data may be classified under two categories,
depending upon the sources:
1) Primary data 2) Secondary data
17
* A population:
It is the largest collection of values
of a random variable for which we
have an interest at a particular
time.
For example:
The weights of all the children
enrolled in a certain elementary
school.
Populations may be finite or infinite.
18
* A sample:
It is a part of a population.
For example:
The weights of only a fraction of
these children.
19
Biostatistics
- More and more things are now measured
quantitatively in medicine, physiotherapy and
public health
21
B- VARIABLES AND
MEASUREMENT
SCALES
22
* A variable:
It is a characteristic that takes on
different values in different
persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental
clinic.
23
Types of variables
Quantitative
Qualitative
26
MEASUREMENT
SCALES
27
Types of variables
•Independent variable
– The “cause”, or the variable thought to influence
the dependent variable; in experimental research it
is the variable manipulated by the researcher.
•Dependent variable
– The “effect”: a response or behavior that is
influenced by the independent variable;
sometimes called the criterion variable.
28
Exercise
Identify the type of data (nominal, ordinal, interval and
ratio) represented
by each of the following. Confirm your answers by
giving your own
examples.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. WHO clinical stages of HIV/AIDS (I-IV)
29
Classify into Nominal,
ordinal, interval and ratio
- Blood type (A, AB, O, B)
- BP (High, Normal)
- Weight ( )
- Gender (M, F)
- Distance ( )
- Temperature (°C or °F)
30
Descriptive Statistics
Measures of Central
Tendency
(Covered in previous
courses)
key words:
Descriptive Statistic, measure of
central tendency, statistic, parameter,
mean (μ), median, mode.
33
The Statistic and The
Parameter
• A Statistic:
It is a descriptive measure computed from the
data of a sample.
• A Parameter:
It is a descriptive measure computed from
the data of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose
values are x1, x2, …, xn. From this data, we measure
the statistic.
34
Measures of Central
Tendency
A measure of central tendency is a measure which
indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the
Mode.
The Mean:
It is the average of the data.
35
The Population Mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
which is usually unknown, then we use the
sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example:
Here is a random sample of size 10 of ages, where
x1 = 42, x2 = 28, x3 = 28, x4 = 61, x5 = 31,
x6 = 23, x7 = 50, x8 = 34, x9 = 32, x10 = 37.
38
Example:
For the same random sample, the ordered
observations will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5 th
observation, i.e. = (32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is
one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as
is the mean.
39
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative
data.
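The three measures of central tendency for the random sample of 10 ages can be checked with a short script. This is a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Random sample of 10 ages from the slides.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

# Mean: sum of the observations divided by n.
mean_age = statistics.mean(ages)

# Median: with n = 10 (even), the average of the 5th and 6th ordered values.
median_age = statistics.median(ages)

# Mode: the most frequent value (28 occurs twice).
mode_age = statistics.mode(ages)

print("Ordered sample:", sorted(ages))
print("Mean:", mean_age)
print("Median:", median_age)
print("Mode:", mode_age)
```

The mean (36.6) lies above the median (33) here because the largest ages, 50 and 61, pull it upward, which illustrates why the median is described as unaffected by extreme values.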
40
Geometric mean
It is obtained by taking the nth root of the product of “n” values,
i.e., if the values of the observations are denoted by x1, x2, …, xn
then,
$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$.
The geometric mean is preferable to the arithmetic mean if the
series of observations contains one or more unusually large
values.
The logarithm of the geometric mean is equal to the arithmetic
mean of the logarithms of individual values. The actual process
involves obtaining the logarithm of each value, adding them and
dividing the sum by the number of observations. The quotient so
obtained is then looked up in the tables of anti-logarithms, which
will give us the geometric mean.
$\log GM = \frac{\sum \log x_i}{n}$
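The two routes to the geometric mean (nth root of the product, and antilog of the mean logarithm) can be compared numerically. The sketch below reuses the sample of 10 ages from the central tendency example purely as illustrative input; the slides do not work a geometric mean example.

```python
import math

# Illustrative input: the sample of 10 ages used earlier in the slides.
values = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
n = len(values)

# Route 1: nth root of the product of the n values.
gm_root = math.prod(values) ** (1 / n)

# Route 2: mean of the base-10 logarithms, then the antilogarithm.
mean_log = sum(math.log10(x) for x in values) / n
gm_logs = 10 ** mean_log

arithmetic_mean = sum(values) / n

print("GM by nth root:", round(gm_root, 2))
print("GM by logarithms:", round(gm_logs, 2))
print("Arithmetic mean:", arithmetic_mean)
```

Both routes give the same value, and it is smaller than the arithmetic mean because the geometric mean is less influenced by the large observations.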
Normal distribution
When data are Normally Distributed, the mean, median and
mode are the same, and the distribution is symmetrical
and bell-shaped about the mean.
Exploring a dataset permits the statistician to determine whether
the variables are normally distributed or not. The net effect
is
to determine if we are to use parametric or non-parametric tests.
Skewness
Skewness: If extremely low or extremely high
observations are present in a distribution, then the mean
tends to shift towards those scores. Based on the type of
skewness, distributions can be:
a) Negatively skewed distribution: occurs when majority
of scores are at the right end of the curve and a few small
scores are scattered at the left end.
b) Positively skewed distribution: Occurs when the
majority of scores are at the left end of the curve and a few
extreme large scores are scattered at the right end.
c) Symmetrical distribution: It is neither positively nor
negatively skewed. A curve is symmetrical if one half of
the curve is the mirror image of the other half.
43
Skewness
44
Exercise
Calculate:
a) Mean IP
b) Modal IP
c) Median IP
45
Exercise: compute mean, mode
and median
46
Descriptive Statistics
Measures of Dispersion
key words:
Descriptive Statistic, measure of
dispersion, range, variance, coefficient of
variation.
49
2.5. Descriptive Statistics –
Measures of Dispersion:
• A measure of dispersion conveys information regarding
the amount of variability present in a set of data.
• Note:
1. If all the values are the same
→ There is no dispersion.
2. If all the values are different
→ There is dispersion:
a) If the values are close to each other
→ The amount of dispersion is small.
b) If the values are widely scattered
→ The dispersion is greater.
50
Example
• ** Measures of Dispersion are:
1. Range (R).
1b. Interquartile Range (IQR)
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
51
1. The Range (R):
• Range = Largest value − Smallest value = $x_L - x_S$
• Note:
• The range depends on only two values.
• Example:
• Data:
• 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the Range.
• Range = 66 − 38 = 28
52
Interquartile Range (IQR)
IQR = 3rd Q – 1st Q
IQR = [3/4(n+1)th observation] – [1/4(n+1)th observation],
with the observations arranged in ascending order
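As a sketch, the range and the IQR can be computed for the ten values used in the range example (43, 66, 61, 64, 65, 38, 59, 57, 57, 50). The `method="exclusive"` option of `statistics.quantiles` places the quartiles at the 1/4(n+1)th and 3/4(n+1)th ordered observations and interpolates linearly between neighbours; treating fractional positions by interpolation is an assumption, since the slides do not say how to handle them.

```python
import statistics

# Data from the range example in the slides.
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 of the ordered data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print("Ordered data:", sorted(data))
print("Range:", data_range)
print("Q1:", q1, "Q3:", q3)
print("IQR:", iqr)
```

With n = 10, the first quartile sits at position 2.75 (between 43 and 50) and the third at position 8.25 (between 64 and 65), giving Q1 = 48.25, Q3 = 64.25 and IQR = 16.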
53
IQR
54
• Compute a) Range and b) IQR
for the data set below
55
2. The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance ($S^2$):
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Example:
• Refer to the previous example
• Find the Sample Variance of ages, $\bar{x}$ = 56
• Solution:
• $S^2$ = [(43−56)² + (66−56)² + … + (50−56)²] / 9
• = 810/9 = 90
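The sample variance for these ten values can be recomputed step by step. This is a minimal sketch using the n − 1 divisor of the sample variance formula; the standard deviation is taken as the square root of the variance.

```python
import math

# Data from the range and variance example in the slides.
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(data)

# Sample mean.
mean = sum(data) / n

# Sum of squared deviations about the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1, not n.
sample_variance = sum_sq_dev / (n - 1)

# Sample standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("Mean:", mean)
print("Sum of squared deviations:", sum_sq_dev)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", round(sample_sd, 2))
```

The squared deviations add up to 810, so the sample variance is 810/9 = 90 and the standard deviation is about 9.49.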
56
• b) Population Variance ($\sigma^2$):
• $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the Population mean
57
4. The Coefficient of Variation
(C.V):
• It is a measure used to compare the
dispersion in two sets of data, and it is
independent of the unit of
measurement.
• $C.V = \frac{S}{\bar{X}} (100)$, where S: Sample standard
deviation.
• $\bar{X}$: Sample mean.
58
Example:
• Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds
• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10/145)*100 = 6.9
• C.V (Sample 2) = (10/80)*100 = 12.5
• The weights of the 11-year-olds (Sample 2) are relatively more variable.
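A short sketch of the comparison for both samples, using the means and standard deviations given in the example. The helper name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage: (S / mean) * 100."""
    return sd / mean * 100


# Sample 1: 25-year-olds, mean weight 145 pounds, standard deviation 10 pounds.
cv_sample1 = coefficient_of_variation(10, 145)

# Sample 2: 11-year-olds, mean weight 80 pounds, standard deviation 10 pounds.
cv_sample2 = coefficient_of_variation(10, 80)

print("C.V sample 1:", round(cv_sample1, 1))
print("C.V sample 2:", round(cv_sample2, 1))
print("More variable:", "Sample 2" if cv_sample2 > cv_sample1 else "Sample 1")
```

Although both samples have the same standard deviation, the spread is larger relative to the mean in the sample with the smaller mean, which is what the coefficient of variation is designed to show.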
60
Exercise
Compute standard deviation
61
Application
63
DATA PRESENTATION
64
STATISTICAL TABLES
A statistical table is an orderly and systematic
presentation of numerical data in rows and
columns.
Simple or one-way table: The simple frequency
table is used when the individual observations
involve only a single variable.
65
One-way table
What do you
observe?
66
Two-way table
What do you
observe?
67
Graphs or charts
Bar chart: represents
and compares
frequencies of
categorical variables
68
Histogram
It is a graph of
frequency for
continuous
variables
69
Pie chart
It is a circle divided
into sectors
proportional to the
various frequencies
70
Line Diagram
The line graph is especially
useful for the study of some
variables
according to the passage of
time.
71
TUTORIAL SHEET I
Using sample data to make
estimates about population
parameters
Key words:
$\bar{x}$ is an estimator of the population mean, $\mu$. The
single numerical value that results from
evaluating this formula is called an
estimate.
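σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}}$
σ² is unknown, n large: $Z = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$
σ² is unknown, n small: $T = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$
ii) If $H_A: \mu > \mu_0$
Reject $H_0$ if $Z > Z_{1-\alpha}$ (when using the Z-test)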
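The one-sided decision rule can be sketched as follows. All the numbers (sample mean, hypothesised mean, known standard deviation, sample size and significance level) are hypothetical values chosen for illustration and do not come from the slides.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, for illustration only.
sample_mean = 32.0   # observed sample mean
mu_0 = 30.0          # mean under the null hypothesis
sigma = 5.0          # population standard deviation, assumed known
n = 25               # sample size
alpha = 0.05         # significance level

# Test statistic: Z = (sample mean - mu_0) / (sigma / sqrt(n)).
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Critical value Z_{1 - alpha} from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - alpha)

# One-sided rule for H_A: mu > mu_0.
reject_h0 = z > z_critical

# One-sided p-value: probability of a Z at least this large under H_0.
p_value = 1 - NormalDist().cdf(z)

print("Z =", z)
print("Critical value =", round(z_critical, 3))
print("p-value =", round(p_value, 4))
print("Reject H0:", reject_h0)
```

With these illustrative numbers Z = 2.0 exceeds the critical value of about 1.645, so the null hypothesis would be rejected at the 5% level.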
92
STATISTICAL TESTS
Presentation on t-test and
ANOVA
93
CORRELATION AND
REGRESSION
Relationship between two
continuous variables
94
Introduction
Correlation measures the closeness
(strength) and direction of the
association.
95
Regression and Correlation are both statistical
techniques that use the idea that one variable, say Y, may
be related to one or more variables through an
equation.
[Diagram labels: REGRESSION; CORRELATION; EQUATION OF REGRESSION]
Here we consider the relationship of two variables only
in a linear form, which is called linear regression and
linear correlation; or simple regression and correlation.
96
• Simple Linear Regression:
Suppose that we are interested in a variable Y, but we
want to know about its relationship to another
variable X, or we want to use X to predict (or
estimate) the value of Y that might be obtained
without actually measuring it, provided the
relationship between the two can be expressed by
a line. ‘X’ is usually called the independent
variable and ‘Y’ is called the dependent variable.
[Diagram labels: Line of Regression; DEPENDENT VARIABLE; INDEPENDENT VARIABLE; TWO RANDOM VARIABLES OR BIVARIATE RANDOM VARIABLE]
We assume that the values of variable X are either
fixed or random. By fixed, we mean that the
values are chosen by the researcher: either an
experimental unit (patient) is given this value of X
(such as the dosage of a drug), or a unit (patient) is
chosen which is known to have this value of X.
By random, we mean that units (patients) are chosen
at random from all the possible units, and both
variables X and Y are measured.
We also assume that for each value x of X, there is
a whole range or population of possible Y values
and that the mean of the Y population at X = x,
denoted by $\mu_{y|x}$, is a linear function of x. That is,
$\mu_{y|x} = \alpha + \beta x$
97
ESTIMATION
We select a sample of
n observations (xi, yi)
from the population,
WITH
the goals:
- Estimate α and β.
- Predict the value of Y at a given value x of X.
- Make tests to draw conclusions about the model
and its usefulness.
=B
99
EXAMPLE
Investigators at a sports health centre are
interested in the relationship between
oxygen consumption and exercise time in
athletes recovering from injury. Appropriate
mechanics for exercising and measuring
oxygen consumption are set up, and the
results are presented below:
x variable: exercise time (min)    y variable: oxygen consumption
0.5    620
1.0    630
1.5    800
2.0    840
2.5    840
3.0    870
3.5    1010
4.0    940
4.5    950
5.0    1130
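A regression line for these ten observations can be sketched with the usual least-squares estimates, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The slides' own calculation did not survive, so treating least squares as the intended method is an assumption, and the names `slope`, `intercept` and `predict` are illustrative.

```python
# Exercise time (min) and oxygen consumption from the example table.
exercise_time = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
oxygen = [620, 630, 800, 840, 840, 870, 1010, 940, 950, 1130]

n = len(exercise_time)
mean_x = sum(exercise_time) / n
mean_y = sum(oxygen) / n

# Sum of products of deviations and sum of squared x deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exercise_time, oxygen))
s_xx = sum((x - mean_x) ** 2 for x in exercise_time)

# Least-squares estimates of beta (slope) and alpha (intercept).
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(x):
    """Estimated mean oxygen consumption at exercise time x (minutes)."""
    return intercept + slope * x


print("Slope:", round(slope, 2))
print("Intercept:", round(intercept, 2))
print("Predicted oxygen consumption at 2.75 min:", round(predict(2.75), 1))
```

Under these assumptions the fitted line is roughly y = 594 + 97.8x, that is, each additional minute of exercise is associated with an increase of about 98 units of oxygen consumption.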
101
calculations
102
Pearson’s Correlation Coefficient
103
Heights and weights of 8 children
Child    Height (inches) X    Weight (pounds) Y
A        49                   81
B        50                   88
C        53                   87
D        55                   99
E        60                   91
F        55                   89
G        60                   95
H        50                   90
Average  54 inches            90 pounds
104
Scatter plot for 8 babies
[Figure: scatter plot of height (horizontal axis, 0 to 70) against weight (vertical axis, 0 to 120)]
105
Table: The Strength of a Correlation
Value of r (positive or negative)    Meaning
0.00 to 0.19                         A very weak correlation
0.20 to 0.39                         A weak correlation
0.40 to 0.69                         A modest correlation
0.70 to 0.89                         A strong correlation
0.90 to 1.00                         A very strong correlation
106
FORMULA FOR CORRELATION
COEFFICIENT ( r )
• With Pearson’s r, the term $\sum (X - \bar{X})(Y - \bar{Y})$
means that we add the products of the deviations to see if the
positive products or negative products are more abundant and
sizable. Positive products indicate cases in which the variables
go in the same direction (that is, both taller and heavier than
average or both shorter and lighter than average);
• negative products indicate cases in which the variables go in
opposite directions (that is, taller but lighter than average or
shorter but heavier than average).
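The products of deviations can be computed for the heights and weights of the 8 children. This sketch assumes the deviation form of Pearson's r, r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²], which is the standard definition but is not printed on the surviving slide.

```python
import math

# Heights (inches) and weights (pounds) of children A to H, from the table.
heights = [49, 50, 53, 55, 60, 55, 60, 50]
weights = [81, 88, 87, 99, 91, 89, 95, 90]

n = len(heights)
mean_h = sum(heights) / n   # average height
mean_w = sum(weights) / n   # average weight

# Product of deviations for each child: positive when both values lie on
# the same side of their averages, negative when they lie on opposite sides.
products = [(h - mean_h) * (w - mean_w) for h, w in zip(heights, weights)]
sum_products = sum(products)

# Sums of squared deviations for each variable.
ss_h = sum((h - mean_h) ** 2 for h in heights)
ss_w = sum((w - mean_w) ** 2 for w in weights)

# Pearson's r in deviation form.
r = sum_products / math.sqrt(ss_h * ss_w)

print("Products of deviations:", products)
print("Sum of products:", sum_products)
print("r =", round(r, 3))
```

Only one child (F) contributes a negative product, so the positive products dominate and r comes out positive, at about 0.61.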
107
Computational Formula for Pearson’s Correlation Coefficient (r)
108
Child    X    Y    X²    Y²    XY
109
Table 2: Chest circumference and Birth
Weight of 10 babies
x (cm)    y (kg)    x²         y²       xy
22.4      2.00      501.76     4.00     44.8
27.5      2.25      756.25     5.06     61.88
28.5      2.10      812.25     4.41     59.85
28.5      2.35      812.25     5.52     66.98
29.4      2.45      864.36     6.00     72.03
29.4      2.50      864.36     6.25     73.5
30.5      2.80      930.25     7.84     85.4
32.0      2.80      1024.0     7.84     89.6
31.4      2.55      985.96     6.50     80.07
32.5      3.00      1056.25    9.00     97.5
TOTAL
292.1     24.8      8607.69    62.42    731.61
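The column totals of Table 2 are exactly what a computational formula for r needs. The sketch below assumes the standard form r = [nΣxy − ΣxΣy] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}; the formula on the original slide was not recoverable, so this choice is an assumption.

```python
import math

# Totals taken from the last row of Table 2 (10 babies).
n = 10
sum_x = 292.1      # chest circumference (cm)
sum_y = 24.8       # birth weight (kg)
sum_x2 = 8607.69
sum_y2 = 62.42
sum_xy = 731.61

# Numerator: n * sum(xy) - sum(x) * sum(y).
numerator = n * sum_xy - sum_x * sum_y

# Denominator: square root of the product of the two corrected sums of squares.
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator

print("Numerator:", round(numerator, 2))
print("Denominator:", round(denominator, 2))
print("r =", round(r, 2))
```

The result, r of about 0.87, falls in the 0.70 to 0.89 band of the strength table, so it would be read as a strong positive correlation between chest circumference and birth weight.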
110
Checking for significance
111
Exercise
Question 6: Correlation and Regression for Health
Sciences
Resting metabolic rate (RMR) is related to body weight.
Consider the table and scatter plot below.
112
Exercise (cont’d)
113
Exercise (cont’d)
a) Interpret the scatter plot above.
b) Interpret the correlation between
Body weight and RMR (given that the
Pearson correlation coefficient, r = 0.987;
p<0.001)
c) Determine the linear regression
equation between body weight and RMR
(from the scatter plot above)
d) Interpret the regression equation.
114
Analysis of Frequency Data
An Introduction to the Chi-
Square
Distribution
TESTS OF INDEPENDENCE
To test whether two criteria of classification
are independent. For example, we may test whether
socioeconomic status and area of residence
of people in a city are independent.
We divide our sample according to status
(low, medium and high incomes, etc.) and the
same sample is categorized according to area
(urban, rural, suburban, slums, etc.).
Put the first criterion in columns, equal in
number to the categories of the 1st criterion
(socioeconomic status), and the 2nd in rows,
where the no. of rows is equal to the no. of
categories of the 2nd criterion (areas of cities).
Text Book: Basic Concepts and Methodology for the Health
Sciences
The Contingency Table
Table: Two-Way Classification of sample
Second Criterion ↓    First Criterion of Classification →
         1      2      3      …    c      Total
1        N11    N12    N13    …    N1c    N1.
2        N21    N22    N23    …    N2c    N2.
.        .      .      .      …    .      .
.        .      .      .      …    .      .
Reject $H_0$ if $\chi^2$ is greater
than
$\chi^2_{(r-1)(c-1)} = 5.991$
Calculations:
$\chi^2 = (260 - 247.86)^2 / 247.86 + (299 - 311.14)^2 / 311.14 + \dots + (14 - 11.69)^2 / 11.69 = 9.091 > 5.991$
0.01 < p < 0.025
We also reject the hypothesis at the 0.025 level of significance.
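The mechanics of the test of independence can be sketched on a small table. The observed counts below are hypothetical and are not the data behind the calculation on the slide; expected counts are taken as (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```python
# Hypothetical 2 x 2 table of observed counts, for illustration only.
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [
    [r_tot * c_tot / grand_total for c_tot in col_totals]
    for r_tot in row_totals
]

# Chi-square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Expected counts:", expected)
print("Chi-square:", chi_square)
print("Degrees of freedom:", df)
```

For this illustrative table the statistic is 4.0 with 1 degree of freedom, which exceeds the tabulated 5% value of 3.841, so independence would be rejected at that level.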
123
PRACTICAL SESSION
BUIB PROJECT
ANALYSIS
124
WELCOME TO THE WORLD OF
DECISION MAKING IN PUBLIC
HEALTH
125