DESCRIPTIVE STATISTICS
Q1. The weights (in kg) of 9 males were 53, 59, 45,
50, 80, 67, 59, 74, 62. Calculate the mean, median,
mode and SD.
Write down the formula, substitute the values and then write
the answer
Q2. The respiratory rates (per min) of 10 adults were 13,
12, 20, 14, 16, 21, 15, 18, 17, 19. Calculate the mean,
median, mode, SD and variance.
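To check hand calculations for Q1 and Q2, a minimal Python sketch using the standard library `statistics` module could look like the following. It assumes the sample formulas (divisor n - 1) for variance and SD, which the notes do not state explicitly. The names `describe`, `weights_kg` and `resp_rates` are illustrative only.

```python
import statistics

weights_kg = [53, 59, 45, 50, 80, 67, 59, 74, 62]      # Q1 data
resp_rates = [13, 12, 20, 14, 16, 21, 15, 18, 17, 19]  # Q2 data


def describe(values):
    """Return mean, median, mode(s), sample variance and sample SD."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode lists every most-frequent value; if all values
        # occur once, it returns all of them (i.e. no single mode).
        "modes": statistics.multimode(values),
        "variance": statistics.variance(values),  # ASSUMPTION: divisor n - 1
        "sd": statistics.stdev(values),           # ASSUMPTION: divisor n - 1
    }


q1 = describe(weights_kg)
q2 = describe(resp_rates)
print(q1)
print(q2)
```

In Q2 every value occurs exactly once, so `modes` lists all ten observations, which is read as "no mode".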
Coefficient of variation
Q1. In a sample of adults aged 21 years and children 3
months old the following data were obtained for
height. Find which series shows greater variation.
           Mean ht   SD
Adults     160 cm    10 cm
Children   60 cm     5 cm
Coeff of Variation = SD/Mean*100
Coeff of Variation for adults = 10/160*100 = 6.25%
Coeff of Variation for children = 5/60*100 = 8.33%
Height of children shows greater variation
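The comparison above can be reproduced with a few lines of Python. This is only a sketch; the function name `coefficient_of_variation` is made up for illustration.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation as a percentage: SD / mean * 100."""
    return sd / mean * 100


cv_adults = coefficient_of_variation(mean=160, sd=10)
cv_children = coefficient_of_variation(mean=60, sd=5)

# The series with the larger CV shows greater relative variation.
more_variable = "children" if cv_children > cv_adults else "adults"
print(f"Adults: {cv_adults:.2f}%  Children: {cv_children:.2f}%")
print(f"Greater variation: {more_variable}")
```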
NORMAL (GAUSSIAN) DISTRIBUTION
Curve is bell shaped and symmetrical about the mean
Curve is asymptotic (does not touch the baseline)
Mean = median = mode
68.3% of observations will be within mean ± 1SD
95.4% within mean ± 2SD
99.7% within mean ± 3SD
Area under the curve is equal to 1
Two parameters:
Mean (µ) – determines the position of the curve
SD (σ) – determines the shape of the curve
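The areas quoted above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` and the example mean and SD are assumptions chosen for illustration.

```python
from statistics import NormalDist


def share_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within mean ± k * sd."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {share_within(k) * 100:.2f}%")

# The share does not depend on which mean and SD are chosen.
# Illustrative values only (mean 160, SD 10).
same_share = share_within(1, mean=160, sd=10)
```

The loop prints 68.27%, 95.45% and 99.73%, the more precise versions of the rounded percentages listed above.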
STANDARD NORMAL DISTRIBUTION (Z)
Mean = 0 and SD = 1
68.3% b/w -1 and +1
95.4% b/w -2 and +2
99.7% b/w -3 and +3
Standard normal variate, $z = \frac{x - \text{mean}}{\text{SD}} = \frac{x - \mu}{\sigma}$
INFERENTIAL STATISTICS
95% confidence interval (CI)
1. 95% CI for population mean:
Sample mean ± $z_{\alpha/2}$ * SE (mean)
$= \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
2. 95% CI for population proportion:
Sample proportion ± $z_{\alpha/2}$ * SE (proportion)
$= p \pm 1.96 \times \sqrt{\frac{pq}{n}}$
Q1. The mean weight of 50 boys aged 5 years was 18.6 kg with SD 1.65 kg. Find the
95% confidence interval for the mean weight of 5-year-old boys.
95% CI for population mean (mean weight of 5-year-old boys)
$= \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
$= 18.6 \pm 1.96 \times \frac{1.65}{\sqrt{50}} = 18.6 \pm 0.46$, i.e. 18.14 – 19.06
95% CI for mean weight of 5-year-old boys is 18.14 kg – 19.06 kg
Q2. The mean pulse rate of 100 MBBS students of GMCT was 74
with SD 3. Find the 95% confidence interval for the mean pulse
rate of all MBBS students of GMCT.
95% CI for mean pulse rate
$= \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
$= 74 \pm 1.96 \times \frac{3}{\sqrt{100}} = 74 \pm 0.59$, i.e. 73.4 – 74.6
95% CI for mean pulse rate of MBBS students of GMCT is 73.4/min – 74.6/min
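The two worked examples on mean weight and mean pulse rate follow the same recipe, which can be sketched in Python as below. The function name `ci_mean_95` is hypothetical, and the multiplier 1.96 is taken directly from the notes.

```python
import math


def ci_mean_95(sample_mean, sd, n, z=1.96):
    """95% CI for a population mean: mean ± z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


weight_low, weight_high = ci_mean_95(18.6, 1.65, 50)  # weights of 5-year-old boys
pulse_low, pulse_high = ci_mean_95(74, 3, 100)        # pulse rates of MBBS students

print(f"Weight: {weight_low:.2f} kg - {weight_high:.2f} kg")
print(f"Pulse:  {pulse_low:.1f}/min - {pulse_high:.1f}/min")
```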
Q3. In a study to find the prevalence of refractive errors among adolescents, out of 250 adolescents
selected 45 had refractive errors. Using this data estimate a 95% CI for
prevalence of refractive errors among adolescents.
95% CI for population proportion (prevalence of refractive errors)
$= p \pm 1.96 \times \sqrt{\frac{pq}{n}}$
$p = \frac{45}{250} \times 100 = 18\%$, $q = 100 - 18 = 82\%$
$= 18 \pm 1.96 \times \sqrt{\frac{18 \times 82}{250}}$ = 13.24 – 22.76
95% CI for prevalence of refractive errors is 13.2% - 22.8%
Q4. In a study to find the prevalence of anemia among antenatal women, 200 antenatal patients were
selected and anemia was present in 40. Find the 95% CI for the prevalence of anemia in antenatal
patients.
95% CI for prevalence of anemia
$= p \pm 1.96 \times \sqrt{\frac{pq}{n}}$
$p = \frac{40}{200} \times 100 = 20\%$, $q = 100 - 20 = 80\%$
$= 20 \pm 1.96 \times \sqrt{\frac{20 \times 80}{200}}$ = 14.46 – 25.54
95% CI for prevalence of anemia in antenatal women is 14.5% - 25.5%
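The prevalence intervals in Q3 and Q4 can be sketched in Python in the same way. As in the notes, p and q are percentages, so q = 100 - p. The function name `ci_proportion_95` is an illustrative assumption.

```python
import math


def ci_proportion_95(cases, n, z=1.96):
    """95% CI (in %) for a population proportion: p ± z * sqrt(p*q/n)."""
    p = cases / n * 100          # sample proportion as a percentage
    q = 100 - p
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin


refractive_low, refractive_high = ci_proportion_95(45, 250)  # Q3
anemia_low, anemia_high = ci_proportion_95(40, 200)          # Q4

print(f"Refractive errors: {refractive_low:.1f}% - {refractive_high:.1f}%")
print(f"Anemia:            {anemia_low:.1f}% - {anemia_high:.1f}%")
```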
Testing of Hypothesis
1. One sample Z test for population mean
Q1. A random sample of 50 newborn babies had mean birth
weight 2.95 kg with SD 0.75 kg. Test whether these data justify the
statement that the mean birth weight of newborn babies is 2.8 kg at
5% level of significance.
Null hypothesis is H0: μ = 2.8
Alternative hypothesis is H1: μ ≠ 2.8
Test statistic is $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.95 - 2.8}{0.75/\sqrt{50}} = 1.41$
At 5% level of significance, $z_{\alpha/2} = 1.96$
Since $|z| = 1.41 < 1.96$, we accept H0 and reject H1: μ ≠ 2.8
Hence the data are consistent with a mean birth weight of 2.8 kg for newborn babies.
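A minimal Python sketch of this one-sample Z test is given below. It derives the two-sided critical value from `statistics.NormalDist` rather than hard-coding 1.96; the function name `one_sample_z_test` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sd, n, alpha=0.05):
    """Return (z, critical value) for a two-sided one-sample Z test."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, z_crit


z, z_crit = one_sample_z_test(sample_mean=2.95, mu0=2.8, sd=0.75, n=50)
reject_h0 = abs(z) >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```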
2. Two samples Z test for equality of two population means
Q2. The following data on blood sugar were obtained from a
study conducted to test the effect of two drugs, A and
B. Test whether the effects of the drugs in reducing
blood sugar are different at 5% level of significance.
Drug   n     Mean   SD
A      100   115    20
B      100   140    30
H0: There is no difference in the effect of the
drugs
(H0:μ1 = μ2)
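H1: There is a difference in the effect of the drugs
(H1:μ1 ≠ μ2)
Test statistic is $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$z = \frac{115 - 140}{\sqrt{\frac{20^2}{100} + \frac{30^2}{100}}} = \frac{-25}{\sqrt{13}}$
z = -6.93
At 5% level of significance, $z_{\alpha/2} = 1.96$
Since $|z| = 6.93 > 1.96$, we reject H0 and accept
H1: μ1 ≠ μ2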
Hence there is significant difference in the effect
of the drugs
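The drug comparison can be sketched in Python as follows. The function name `two_sample_z` and the constant `Z_CRIT_5PCT` are illustrative; the value 1.96 is the critical value used in the notes.

```python
import math

Z_CRIT_5PCT = 1.96  # two-sided critical value at the 5% level, as in the notes


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


z = two_sample_z(mean1=115, sd1=20, n1=100, mean2=140, sd2=30, n2=100)
significant = abs(z) >= Z_CRIT_5PCT

print(f"z = {z:.2f}")
print("Significant difference" if significant else "No significant difference")
```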
3. One sample Z test for population proportion
Q3. In an otological examination of school children, out of 150
children examined 21 were found to have some type of otological
abnormalities. Does this agree with the statement that prevalence
of otological abnormalities among school children is 16%?
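Null hypothesis is H0: P = 16%
Alternative hypothesis is H1: P ≠ 16%
Sample proportion $p = \frac{21}{150} \times 100 = 14\%$
Test statistic is $z = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{14 - 16}{\sqrt{\frac{16 \times 84}{150}}} = \frac{-2}{2.99} = -0.669$
At 5% level of significance, $z_{\alpha/2} = 1.96$
Since $|z| = 0.669 < 1.96$, we accept H0: P = 16% and reject H1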
Hence the data agree with the statement that the prevalence of otological
abnormalities among school children is 16%.
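A short Python sketch of the one-sample test for a proportion, working in percentages as the notes do, might look like this. The function name `one_sample_proportion_z` is an assumption made for illustration.

```python
import math


def one_sample_proportion_z(cases, n, p0_percent):
    """Z statistic comparing a sample percentage with a hypothesised one."""
    p = cases / n * 100                        # observed percentage
    q0 = 100 - p0_percent
    standard_error = math.sqrt(p0_percent * q0 / n)
    return (p - p0_percent) / standard_error


z = one_sample_proportion_z(cases=21, n=150, p0_percent=16)
reject_h0 = abs(z) >= 1.96                     # 5% level, two-sided

print(f"z = {z:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The statistic comes out negative because the observed 14% is below the hypothesised 16%; only its absolute value is compared with 1.96.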
4. Two samples Z test for equality of two population proportions
Q4. To compare the prevalence of dyslipidaemia between males and females,
random samples of 400 males and 400 females were selected. 120 males and
80 females were found to have dyslipidaemia. Test whether the prevalence of
dyslipidaemia between males and females is significantly different at 5%
level of significance.
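Null hypothesis is H0: P1 = P2
Alternative hypothesis is H1: P1 ≠ P2
Test statistic is $z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$, $q = 100 - p$
$p_1 = \frac{120}{400} \times 100 = 30\%$, $p_2 = \frac{80}{400} \times 100 = 20\%$
$p = \frac{400 \times 30 + 400 \times 20}{400 + 400} = 25$
q = 100 - 25 = 75
$z = \frac{30 - 20}{\sqrt{25 \times 75 \left(\frac{1}{400} + \frac{1}{400}\right)}} = \frac{10}{\sqrt{9.375}}$
z = 3.27
At 5% level of significance, $z_{\alpha/2} = 1.96$
Since $|z| = 3.27 > 1.96$, we reject H0 and accept H1: P1 ≠ P2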
Hence there is significant difference in the prevalence of
dyslipidaemia between males and females.
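The pooled-proportion calculation for the dyslipidaemia example can be sketched in Python as below. The function name `two_sample_proportion_z` is hypothetical; percentages are used throughout, as in the notes.

```python
import math


def two_sample_proportion_z(cases1, n1, cases2, n2):
    """Z statistic for the difference between two sample percentages."""
    p1 = cases1 / n1 * 100
    p2 = cases2 / n2 * 100
    p = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled percentage
    q = 100 - p
    standard_error = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


z = two_sample_proportion_z(cases1=120, n1=400, cases2=80, n2=400)
significant = abs(z) >= 1.96              # 5% level, two-sided

print(f"z = {z:.2f}")
print("Significant difference" if significant else "No significant difference")
```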
5. Chi square test
Used to test the association between two attributes
Null hypothesis is H0: There is no association b/w the
attributes
Alternative hypothesis is H1: There is association b/w
the attributes
Test statistic used is $\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed frequency and E is the expected
frequency of the cells.
$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
If there are r rows and c columns in the frequency table,
(r-1) x (c-1) gives the degrees of freedom.
Critical region (rejection region) is $\chi^2 \ge \chi^2_{\alpha,k}$, where α is
significance level and k is df
If $\chi^2 \ge \chi^2_{\alpha,k}$ we reject the null hypothesis and accept the
alternative hypothesis
If $\chi^2 < \chi^2_{\alpha,k}$ we accept the null hypothesis and reject the
alternative hypothesis
Q5. To study the association between smoking and lung
cancer, 80 patients and 120 controls were selected.
Among patients 53 were smokers and among controls
37 were smokers. Using the data test whether there is
any association between smoking and lung cancer
Null hypothesis is
H0 : There is no association between smoking
and lung cancer
Alternative hypothesis is
H1: There is association between smoking
and lung cancer
Smoking status   Patients   Controls   Total
Smokers          53         37         90
Non-smokers      27         83         110
Total            80         120        200
Expected frequencies are
E(53) = 90*80/200 = 36
E(37) = 90*120/200 = 54
E(27) = 110*80/200 = 44
E(83) = 110*120/200 = 66
$\chi^2 = \sum \frac{(O - E)^2}{E}$
$= \frac{(53 - 36)^2}{36} + \frac{(37 - 54)^2}{54} + \frac{(27 - 44)^2}{44} + \frac{(83 - 66)^2}{66}$
= 24.3
df = (r-1)*(c-1) = (2-1)*(2-1) = 1
$\chi^2_{\alpha,k} = 3.84$ for α = 5% and df = 1
Since $\chi^2 = 24.3 > 3.84$, we reject the null hypothesis and
accept the alternative hypothesis.
Conclusion: There is association between smoking and
lung cancer.
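The expected frequencies and the chi-square statistic for the smoking example can be computed with a short, dependency-free Python sketch. The function name `chi_square` is illustrative, and the critical value 3.84 is copied from the notes rather than computed.

```python
def chi_square(observed):
    """Chi-square statistic, df and expected counts for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = row total * column total / grand total, for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected


# Rows: smokers, non-smokers. Columns: patients, controls.
table = [[53, 37], [27, 83]]
statistic, df, expected = chi_square(table)
reject_h0 = statistic >= 3.84  # critical value for 5% and df = 1 (from the notes)

print(f"expected = {expected}")
print(f"chi-square = {statistic:.1f}, df = {df}, reject H0: {reject_h0}")
```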
When there is a 2x2 contingency table
                      Attribute 2 Present   Attribute 2 Absent   Total
Attribute 1 Present   a                     b                    a+b
Attribute 1 Absent    c                     d                    c+d
Total                 a+c                   b+d                  a+b+c+d (N)
the above formula is reduced to
$\chi^2 = \frac{(ad - bc)^2 \times (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)}$
and df = 1
Q6. A study was conducted to determine the association
between vitamin D deficiency and duration of exposure to
sunlight. 35 Vitamin D deficient subjects and 165 controls
were selected. 22 Vitamin D deficient subjects and 58
controls had a history of < 30min exposure to sunlight.
Test whether there is association between vitamin D
deficiency and duration of exposure to sunlight
Null hypothesis is
H0 : There is no association between vitamin D
deficiency and duration of exposure to sunlight
Alternative hypothesis is
H1: There is association between vitamin D
deficiency and duration of exposure to sunlight
                    Vit D deficient   Control   Total
< 30 min exposure   22 (a)            58 (b)    80
≥ 30 min exposure   13 (c)            107 (d)   120
Total               35                165       200
$\chi^2 = \frac{(ad - bc)^2 \times N}{(a + b)(c + d)(a + c)(b + d)}$
$= \frac{(22 \times 107 - 13 \times 58)^2 \times 200}{(22 + 58)(13 + 107)(22 + 13)(58 + 107)}$
$\chi^2 = 9.24$
$\chi^2_{\alpha,k} = 3.84$ for α = 5% and df = 1
Since $\chi^2 = 9.24 > 3.84$, we reject the null hypothesis and
accept the alternative hypothesis.
Conclusion: There is association between vitamin D
deficiency and duration of exposure to sunlight.
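The 2x2 shortcut formula is easy to script. The sketch below applies it to the vitamin D data and, as a cross-check, to the smoking and lung cancer table from Q5. The function name `chi_square_2x2` is an assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2x2 table via (ad - bc)^2 * N / product of margins."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


vit_d = chi_square_2x2(a=22, b=58, c=13, d=107)    # Q6
smoking = chi_square_2x2(a=53, b=37, c=27, d=83)   # Q5, same data as before

print(f"Vitamin D: chi-square = {vit_d:.2f}")
print(f"Smoking:   chi-square = {smoking:.1f}")
```

Both values exceed 3.84, the critical value for 5% and df = 1, and the smoking result matches the 24.3 obtained from the expected-frequency method.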
VITAL STATISTICS
Important vital rates and ratios
Crude Birth Rate (CBR) = (Total number of live births during the year / Mid year population of the same year) × 1000
Crude Death Rate (CDR) = (Total number of deaths during the year / Mid year population of the same year) × 1000
Natural Growth Rate (NGR) = ((CBR − CDR) / 1000) × 100
Age specific death Rate = (Total number of deaths in the specific age group during the year / Mid year population of the same age group) × 1000
Sex specific death Rate = (Total number of deaths in the specific sex during the year / Mid year population of the same sex) × 1000
Disease specific death Rate = (Total number of deaths from the specific disease during the year / Mid year population of the same year) × 1000
Still Birth Rate (SBR) = (Total number of still births / (Number of live births + still births)) × 1000
Perinatal Mortality Rate (PMR) = ((Total number of still births + deaths under 1 week of birth) / (Number of live births + still births)) × 1000
Neonatal Mortality Rate (NMR) = (Total number of deaths up to 28 days of life / Number of live births) × 1000
Post Neonatal Mortality Rate (PNMR) = (Total number of deaths after 28 days of birth up to 1 year / Number of live births) × 1000
Infant Mortality Rate (IMR) = (Total number of deaths within 1 year of birth / Number of live births) × 1000
Maternal Mortality Ratio (MMR) = (Total number of maternal deaths / Number of live births) × 100000
Proportional mortality rate from a specific disease = (Total number of deaths due to a specific disease in the year / Total number of deaths due to all causes in the same year) × 100
Proportional mortality rate aged 50 yrs and above = (Total number of deaths of persons aged 50 yrs and above in the year / Total number of deaths of all age groups in the same year) × 100
Case Fatality Rate (CFR) = (Total number of deaths due to a specific disease in a year / Total number of cases of the disease in the same year) × 100
Survival rate (5 years) = (Total number of patients alive after 5 years / Total number of patients diagnosed or treated) × 100
MEASUREMENTS OF MORBIDITY
Incidence = (Number of new cases of a disease during a given time period / Population at risk during that period) × 100
Point prevalence = (Number of all current cases (old and new) of a disease at a given point in time / Estimated population at the same point in time) × 100
Period prevalence = (Number of all existing cases (old and new) of a disease during a given period of time / Estimated mid interval population at risk) × 100
Q1. Census populations of an area in 2001 and 2011 were 231500 and
246500 respectively. During 2004 there were 7500 live births and 2500 deaths. Of
these, 50 deaths were within 1 year of birth and 15 deaths were under 28 days of life.
There were 2 maternal deaths and 1750 deaths of persons above 50 years. Calculate
possible vital rates.
Write down the formula, substitute values and then write answer with
unit
Q2. The census population of a town during 2001 was 47000 and in 2011 was
65000. There were 2000 live births and 700 deaths in the year 2002. Of the 20
infant deaths, 8 infants died in the first 28 days of life and 4 of them died in the
first week itself. There were 10 still births in the same year. Calculate all
possible vital rates.
Q3. Census population of an urban area during 2001 and 2011 were 5,10,000
and 5,31,000. During 2003 there were 12500 live births and 4000 deaths. Out
of the total deaths 150 deaths were of children below one year. There were 25
maternal deaths, 5 deaths due to TB and 2800 deaths of persons above 50
years. 200 cases of TB were reported during that year. Calculate various vital
rates.
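As one possible way to organise the arithmetic for Q1 of this set, the Python sketch below estimates the mid-year population by linear interpolation between the two censuses and then applies the rate formulas. Three things are assumptions not stated in the notes: the interpolation method, the 3.5 years taken to elapse between the 2001 census and mid-2004, and the reading that the 15 neonatal deaths are included in the 50 infant deaths. The function names are illustrative.

```python
def interpolated_population(pop_start, pop_end, years_between, years_elapsed):
    """Linear (arithmetic) interpolation between two census counts."""
    return pop_start + (pop_end - pop_start) * years_elapsed / years_between


def rate(events, denominator, per):
    """Generic rate: events / denominator * multiplier."""
    return events / denominator * per


# ASSUMPTION: 3.5 years elapse between the 2001 census and mid-2004.
mid_year_pop = interpolated_population(231500, 246500, 10, 3.5)

live_births, deaths = 7500, 2500
infant_deaths, neonatal_deaths = 50, 15   # ASSUMPTION: the 15 are part of the 50
maternal_deaths, deaths_over_50 = 2, 1750

cbr = rate(live_births, mid_year_pop, 1000)
cdr = rate(deaths, mid_year_pop, 1000)
rates = {
    "CBR per 1000 population": cbr,
    "CDR per 1000 population": cdr,
    "NGR (%)": (cbr - cdr) / 1000 * 100,
    "IMR per 1000 live births": rate(infant_deaths, live_births, 1000),
    "NMR per 1000 live births": rate(neonatal_deaths, live_births, 1000),
    "PNMR per 1000 live births": rate(infant_deaths - neonatal_deaths, live_births, 1000),
    "MMR per 100000 live births": rate(maternal_deaths, live_births, 100000),
    "Proportional mortality 50+ (%)": rate(deaths_over_50, deaths, 100),
}

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

A different convention for the census date changes `mid_year_pop`, and with it CBR, CDR and NGR; the rates based on live births or total deaths are unaffected.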