
Biostats Part 2

The document discusses sampling variability and significance, outlining the steps involved in sampling and the importance of selecting representative samples. It details various sampling techniques such as simple random sampling, systematic random sampling, stratified sampling, and others, along with tests of significance like t-tests and chi-square tests. Additionally, it explains correlation and regression, emphasizing the relationship between variables and the methods used to analyze these relationships.

Uploaded by

Waheed ahmad

SAMPLING VARIABILITY AND SIGNIFICANCE

The study of group variability from sample to sample, or from sample to population, involves 3 steps:

• Selecting sufficiently large, random samples that are representative of the population from which they are drawn.
• Finding the probability, or relative frequency, of the sample results occurring by chance.
• Drawing the inference.
Significance

• Experiments yield results such as means and proportions, which vary from sample to sample and from sample to universe (population). The observer wants to know whether the difference between his result and that of the population, or that of another observer, is significant. The observed difference is expressed in terms of significance, that is, the probability or relative frequency of its occurring by chance, and is judged against the sampling distribution.
SAMPLING

• Sampling refers to choosing a sample from a population. The number of individuals or observations in the sample is the sample size.
• When a large number of individuals, items or units has to be studied, we take a sample. A sampling frame is a listing of the members of the universe from which the sample is to be drawn. The accuracy and completeness of the sampling frame influence the quality of the sample drawn from it.
• There are 3 types of population: finite, infinite and hypothetical.
  ◦ A finite population has a finite number of members, e.g. the number of citizens in a country or the number of leprosy patients in a community.
  ◦ An infinite population has an infinite number of members, e.g. Hb values in a given interval.
  ◦ A hypothetical population is one that is assumed for theoretical purposes, e.g. a few guinea pigs are given a vitamin A deficient diet and are then watched for deficiency symptoms. All such groups constitute a population.
• The variation between one sample and another can be categorised as ‘sampling’ or ‘non-sampling’ variation. It may arise from differences in sampling method, inconsistencies in definitions, tools or the measurement process, non-response, carelessness, recording practices, etc.
• Another factor that affects the representative nature of the sample is bias in sampling.

SAMPLING TECHNIQUES

1. Simple random sampling

• Each and every unit in the population has an equal chance of being included in the sample; selection of a unit is determined by chance only. For this reason it is sometimes also called ‘unrestricted random sampling’.
• There are 2 main methods:
  ◦ Lottery method
  ◦ Table of random numbers
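The two selection methods named above can be sketched in Python. This is only an illustration: the sampling frame of 500 numbered patients, the sample size of 20 and the seed are assumptions, not values from the slides. `random.sample` draws without replacement, which mirrors the lottery method; drawing random serial numbers and skipping repeats stands in for reading a table of random numbers.

```python
import random

# Hypothetical sampling frame: patients numbered 1..500 (illustrative only).
sampling_frame = list(range(1, 501))
sample_size = 20

rng = random.Random(42)  # fixed seed so the draw can be repeated

# Lottery method: every unit has an equal chance; drawn without replacement.
lottery_sample = rng.sample(sampling_frame, sample_size)

# Random-number method: draw serial numbers at random, skipping repeats,
# much as one would read successive entries from a random-number table.
table_sample = []
while len(table_sample) < sample_size:
    serial = rng.randint(1, len(sampling_frame))  # inclusive on both ends
    if serial not in table_sample:
        table_sample.append(serial)

print(sorted(lottery_sample))
print(sorted(table_sample))
```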
2. Systematic random sampling

• A systematic sample is formed by selecting one unit at random and then selecting further units at evenly spaced intervals until a sample of the required size has been formed. This method is popular when a complete list of the population from which the sample is to be drawn is available.
• If there are N units in the population, a sample of size n is desired and k = N/n, then one number is selected at random between 1 and k. Each subsequent unit is obtained by adding k to the number previously selected. This k is called the sampling interval.
• This method may be selected to obtain a sample out of the
patients attending the OPD of dental
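A short sketch of the k = N/n rule, assuming a hypothetical register of 200 patients and a desired sample of 20 (so k = 10). The function name and the data are illustrative only; when N is not an exact multiple of n, this version rounds k down and trims the result to n units, which is one of several possible conventions.

```python
import random

def systematic_sample(population, n, rng=None):
    """Return a systematic sample of n units from an ordered list."""
    rng = rng or random.Random()
    k = len(population) // n      # sampling interval k = N/n (rounded down)
    start = rng.randint(1, k)     # random start between 1 and k
    # Take unit 'start', then every k-th unit after it (1-based positions).
    positions = range(start, len(population) + 1, k)
    return [population[i - 1] for i in positions][:n]

# Hypothetical register of 200 patients, sample of 20 -> k = 10.
register = list(range(1, 201))
sample = systematic_sample(register, 20, random.Random(1))
print(sample)
```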
3. Stratified random sampling

• This method is followed when the population is not homogeneous with regard to the characteristic under study. The population is first divided into homogeneous groups or classes called strata, and a sample is then drawn from each stratum at random, in proportion to its size.
• This method ensures greater representativeness, provides greater accuracy and can cover a wider geographical area.
• It gives representation to all strata of the society or population, for example by selecting samples from defined areas, classes, ages, etc.
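Drawing from each stratum in proportion to its size could look like the sketch below. The three age strata, their sizes and the total sample of 100 are assumptions made for illustration; allocations are simply rounded to the nearest whole unit, so with awkward stratum sizes the total may differ slightly from the target.

```python
import random

def proportional_stratified_sample(strata, total_n, rng=None):
    """Draw from each stratum at random, in proportion to its size."""
    rng = rng or random.Random()
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocation proportional to stratum size (rounded to nearest unit).
        n_stratum = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical strata by age group (sizes 500, 300 and 200).
strata = {
    "children": [f"C{i}" for i in range(500)],
    "adults": [f"A{i}" for i in range(300)],
    "elderly": [f"E{i}" for i in range(200)],
}
sample = proportional_stratified_sample(strata, 100, random.Random(7))
print({name: len(units) for name, units in sample.items()})
```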
4. Multistage sampling

• This method refers to sampling procedures carried out in several stages using random sampling techniques. It is employed in large country-wide surveys.
• In the first stage, districts are chosen at random in all the states; towns, villages and units are then chosen at random in the successive stages.
5. Cluster sampling

• A cluster is a randomly selected group. This method is used when the units of the population form natural groups or clusters, such as villages, wards or factories. First a sample of clusters is selected, and then all the units in each selected cluster are surveyed.
• Cluster sampling gives a higher standard error, but data collection is simpler and involves less time and cost than with other sampling techniques.
• It is most often used to evaluate vaccination coverage in the Expanded Programme on Immunization and the Universal Immunization Programme.
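The two steps of cluster sampling (select clusters, then survey every unit in them) can be sketched as follows. The 30 villages, their sizes and the choice of 5 clusters are made-up values for illustration, not a recommended survey design.

```python
import random

rng = random.Random(3)

# Hypothetical population: 30 villages (clusters), each a list of child IDs.
villages = {
    f"village_{v}": [f"v{v}_child{c}" for c in range(rng.randint(20, 40))]
    for v in range(1, 31)
}

# Step 1: select a random sample of clusters.
chosen_villages = rng.sample(sorted(villages), 5)

# Step 2: survey ALL units inside each selected cluster.
surveyed = [child for name in chosen_villages for child in villages[name]]

print(chosen_villages)
print(len(surveyed), "children surveyed")
```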
6. Multiphase sampling

• In this method, part of the information is collected from the whole sample and part from a subsample.
• A survey carried out by this procedure is less costly, less laborious and more purposeful.
TESTS OF SIGNIFICANCE

• The difference between the estimates obtained from different samples drawn from the same population is called sampling variability. The statistical techniques used to judge how far the differences between the estimates of different samples are due to sampling variation, or to something else, are called testing of hypothesis.
• The tests are used to compare 2 parameters, such as means or proportions, and to determine whether the difference between them is statistically significant. The various t-tests compare differences between means; z-tests compare differences between proportions. These tests make the comparison possible by calculating a ‘critical ratio’.

• A critical ratio is some parameter (e.g. a difference between the means of 2 sets of data) divided by the standard error (SE) of that parameter (here, the SE of the difference between means).

  Critical ratio = parameter ÷ SE of that parameter

Use of t-tests

• The t-test is applied to find the significance of the difference between two means, as either:
  ◦ the unpaired (Student’s) t-test, or
  ◦ the paired t-test.
• The purpose is to compare the means of a continuous variable in 2 research samples, such as a treatment group and a control group. If the samples come from 2 different groups (e.g. a group of men and a group of women), Student’s t-test is used. If they come from the same group (e.g. pretreatment and post-treatment values for the same study subjects), the paired t-test is used.
• The standard normal distribution is the z distribution. The t distribution is needed when the sample sizes of studies are small, because the observed estimates of the mean and variance are then subject to considerable error. The larger the sample size, the smaller the errors and the more the t distribution looks like the normal distribution.

Student’s t-test

• It can be one-tailed or two-tailed. The calculations are the same, but the interpretation of the resulting t differs.
• In both types of Student’s t-test, t is calculated by taking the observed difference between the means of the two groups (the numerator) and dividing it by the standard error of the difference between the means of the two groups (the denominator). Before t can be calculated, the standard error of the difference between the means (SED) must be determined. The basic formula for the SED is the square root of the sum of the respective population variances, each divided by its own sample size; in practice the population variances are estimated from the samples.
• The t-test is designed to help investigators distinguish “explained variation” from “unexplained variation” (random error or chance).
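As a rough illustration of the calculation described above, the sketch below computes t as the difference between two means divided by the SED. It assumes the common pooled-variance form of Student's t-test (both groups are taken to share one variance, estimated from the two samples); the blood pressure values are invented for the example.

```python
import math
import statistics

def unpaired_t(group1, group2):
    """Student's t for two independent samples (pooled-variance form)."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pooled estimate of the common variance from the two sample variances.
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the means (SED).
    sed = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return mean_diff / sed, n1 + n2 - 2   # t and degrees of freedom

# Hypothetical systolic BP values (mm Hg), illustrative only.
treatment = [120, 122, 118, 125, 121, 119, 123, 117, 124, 121]
control = [128, 130, 126, 133, 129, 127, 131, 125, 132, 129]
t, df = unpaired_t(treatment, control)
print(f"t = {t:.2f}, df = {df}")
```

The absolute value of t is then compared with the t-table value for the same degrees of freedom; for df = 18 at the two-tailed 5% level the table value is 2.101.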

Interpretation of results

• If the value of t is large, the p value is small, because it is unlikely that a large t ratio would be obtained by chance alone. If the p value is ≤ 0.05, it is customary to accept the difference as real. Such findings are called statistically significant.

One-tailed and two-tailed t-tests

• These tests are also called one-sided and two-sided tests.
• The two-tailed test is generally recommended because differences in either direction are usually important to document. For example, it is important to know whether a new treatment is better than a standard treatment, but it is also important to know whether it is significantly worse and should be avoided. In this situation the two-tailed test provides an accepted criterion for when a difference shows the new treatment to be better or worse.
• A one-tailed test is appropriate when only one direction of difference matters, for example when a new therapy is known to cost much more than the currently used treatment. It would not be used if it were worse than the current therapy, and it would not be used if it were merely as good as the current therapy. It would be used only if it were better than the current therapy.
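The practical effect of choosing one or two tails can be seen in a small sketch. It assumes a large sample, so that the critical ratio can be referred to the normal (z) distribution instead of the t distribution; with small samples the t distribution for the appropriate degrees of freedom would be needed. The value 1.80 is an arbitrary example.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """One- and two-tailed p values for a critical ratio z (normal approx.)."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return upper_tail, 2 * upper_tail

one_tailed, two_tailed = p_values_from_z(1.80)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
```

With this critical ratio the one-tailed p value falls below 0.05 while the two-tailed p value does not, which is why the choice of test should be made before the data are examined.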



Degrees of freedom

• The quantity in the denominator, which is one less than the number of independent observations in a sample, is called the degrees of freedom (df) and is used in preference to the sample size.
• In the unpaired t-test of the difference between two means, df = n1 + n2 - 2, where n1 and n2 are the numbers of observations in the two samples.
Paired t-test

• In many medical studies, individuals are followed over time to see a change in the value of a continuous variable. A typical example is a “before and after” experiment, such as one testing whether average blood pressure decreased after treatment or whether weight fell after the use of a special diet. The appropriate statistical test for this kind of data is the paired t-test.
• The paired t-test is more powerful (more likely to detect a real difference) than Student’s t-test because it considers the variation from only one group of people, whereas Student’s t-test considers variation from two groups.
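A minimal sketch of the paired calculation, under the usual formulation in which t is the mean of the within-subject differences divided by the standard error of that mean, with n - 1 degrees of freedom. The before and after weights are invented for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t: mean of the within-subject differences over its SE."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1     # t and degrees of freedom

# Hypothetical weights (kg) of 5 subjects before and after a special diet.
before = [82, 90, 76, 95, 88]
after = [80, 86, 75, 91, 85]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Only the five differences enter the calculation, so the variation between subjects (76 kg versus 95 kg, say) does not inflate the standard error.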
z-tests

• In contrast to t-tests, which compare differences between means, z-tests compare differences between proportions. In medicine, the proportions most frequently used are sensitivity, specificity, risks and the percentages of people with symptoms, illness or recovery. A typical goal is to see whether the proportion of patients surviving in a treated group differs from that in an untreated group.
• z is calculated by taking the observed difference between the two proportions (the numerator) and dividing it by the standard error of the difference between the two proportions (the denominator).
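The z calculation can be sketched as below. The sketch assumes the common pooled form of the standard error (both groups share one proportion under the null hypothesis) and large samples; the survival counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(events1, n1, events2, n2):
    """z for the difference between two independent proportions."""
    p1, p2 = events1 / n1, events2 / n2
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (events1 + events2) / (n1 + n2)
    se_diff = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_diff
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical survival counts: 80/100 treated vs 65/100 untreated.
z, p = two_proportion_z(80, 100, 65, 100)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```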

The chi-square test

• The test involves the calculation of a quantity called chi-square (χ²), named after the Greek letter chi, ‘χ’. The deviations of the observed numbers from those expected under the hypothesis form the basis of chi-square. The greater the difference between the 2 percentages in the 2 categories, the higher the value of chi-square.
• It is used for the comparison of groups when the data are expressed as counts or proportions, and is most commonly used when the data are frequencies.
• Chi-square is considered a measure of association between 2 categorical variables. Thus the null hypothesis is one of “no association” between the 2 variables.
• The data for a chi-square test are displayed in an arrangement called a contingency table.
• It is a non-parametric test: it is not based on any assumption about the distribution of any variable.
• Chi-square analysis is a popular tool in the analysis of research data. It has applications in at least 3 types of problem:

1. Test of proportions
• It is an alternative test for finding the significance of the difference between two or more proportions. With large binomial samples (size over 30), significance could instead be found by calculating the standard error of the difference between two proportions. The chi-square test has two further advantages:
  ◦ It can compare the values of two binomial samples even if they are small (less than 30), such as the incidence of diabetes in 20 non-obese persons with that in 20 obese persons.
  ◦ It can compare the frequencies of two multinomial samples, such as the numbers of diabetics and non-diabetics in groups weighing 40-50 kg, 50-60 kg, 60-70 kg and more than 70 kg.

2. Test of association (independence)
• The test of association measures the probability of association between two discrete attributes, that is, whether two events influence each other or not.
• The chi-square test has the added advantage that it can be applied to find an association between two discrete attributes when there are more than two groups.

3. Test of goodness of fit
• The chi-square test is also applied as a test of ‘goodness of fit’, to determine whether the actual numbers are similar to the expected or theoretical numbers (goodness of fit to a theory).
• The test determines whether an observed frequency distribution differs from the theoretical distribution by chance, or whether the sample is drawn from a different population.

• If the calculated value (CV) of chi-square for the sample is greater than the table value (TV) at the critical level of significance, i.e. probability 0.05, the hypothesis of no difference is rejected.
• If CV < TV, the null hypothesis is accepted, and it is concluded that the difference is due to chance or that the 2 characters are not associated.
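The comparison of a calculated chi-square with the table value can be sketched for a contingency table as follows. Expected counts are derived from the row and column totals under the null hypothesis of no association; no continuity correction is applied, and the 2 × 2 counts (diabetes in 20 obese and 20 non-obese persons) are hypothetical.

```python
def chi_square(observed):
    """Chi-square statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two attributes were not associated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = obese / non-obese, columns = diabetic / not.
table = [[12, 8],
         [5, 15]]
chi2, df = chi_square(table)
TABLE_VALUE_DF1_P05 = 3.841   # chi-square table value for df = 1, p = 0.05
reject_null = chi2 > TABLE_VALUE_DF1_P05
print(f"chi-square = {chi2:.2f}, df = {df}, reject null: {reject_null}")
```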
CORRELATION and REGRESSION
• Meaning of correlation: Often we wish to know whether there is a linear relation between 2 variables, e.g. height and weight, temperature and pulse, age and vital capacity. To find out whether or not there is a significant association between 2 variables, we calculate the coefficient of correlation, represented by the symbol ‘r’.
• ‘r’ lies between -1 and +1. A value near +1 indicates a strong positive association between the 2 variables, i.e. when one increases the other also increases. A value near -1 indicates a strong negative association, i.e. when one variable increases the other decreases. If r = 0, there is no linear association between the 2 variables. There are also tests to show whether or not the correlation could be due to chance. It should be noted, however, that correlation does not necessarily prove causation.
• The relationship or association between two quantitatively measured, or continuous, variables is called correlation. The extent or degree of the relationship between two sets of figures is measured by the correlation coefficient, denoted by ‘r’.
• There are 5 types of correlation, depending on its extent and direction:
  ◦ Perfect positive correlation: the 2 variables, denoted by the letters X and Y, are directly proportional and fully correlated with each other. Both variables rise or fall in the same proportion. The correlation coefficient r = +1.
  ◦ Perfect negative correlation: X and Y are inversely related. When one rises, the other falls in the same proportion. The correlation coefficient r = -1.
  ◦ Moderately positive correlation: r is non-zero and lies between 0 and +1.
  ◦ Moderately negative correlation: r is non-zero and lies between -1 and 0.
  ◦ Absolutely no correlation: r = 0, indicating that no linear relationship exists between the two variables.
• Pearson correlation coefficient: when the associated variables are normally distributed, such as height and weight, the correlation coefficient is called Pearson’s correlation coefficient.
• When two variables are correlated but do not follow a normal distribution, another correlation coefficient, Spearman’s rank order correlation coefficient, is used.
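Both coefficients can be computed with a few lines of Python. This is a sketch under simplifying assumptions: the heights and weights are invented, and the ranking helper assumes there are no tied values (real data with ties would need average ranks).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); assumes no tied values, for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical heights (cm) and weights (kg) of six people.
height = [150, 155, 160, 165, 170, 175]
weight = [50, 54, 59, 61, 68, 72]
r = pearson_r(height, weight)
rho = spearman_rho(height, weight)
print(f"r = {r:.3f}, rho = {rho:.3f}")
```

Because weight rises with every step in height in this example, the rank correlation is a perfect +1 even though the Pearson coefficient is slightly below 1.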

• Meaning of regression: If we wish to know, in an individual case, the value of one variable knowing the value of the other, we calculate the regression coefficient of one measurement upon the other. It is customary to denote the independent variate by x and the dependent variate by y. The value b is called the regression coefficient of y upon x. Similarly, we can obtain the regression of x upon y.
• Regression means change in the measurements of a variable character, on the positive or negative side, beyond the mean. The regression coefficient is a measure of the change in the dependent character Y for one unit change in the independent character X. It is denoted by the letter ‘b’.
• If the correlation coefficient (r) has already been calculated, the regression coefficient is derived as:

  byx = r × (SD of Y series) ÷ (SD of X series)
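The relation byx = r × SD of Y ÷ SD of X can be checked numerically against the slope computed directly from the data. The heights and weights below are hypothetical, and the prediction line y = mean of Y + b × (x - mean of X) is the standard least-squares form, stated here as an assumption since the slides do not write it out.

```python
import statistics

# Hypothetical heights (cm, X) and weights (kg, Y); illustrative only.
x = [150, 155, 160, 165, 170, 175]
y = [50, 54, 59, 61, 68, 72]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sum_xx = sum((a - mean_x) ** 2 for a in x)

# Direct least-squares slope of Y upon X.
b_direct = sum_xy / sum_xx

# Same slope via b_yx = r * (SD of Y) / (SD of X).
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
r = sum_xy / ((len(x) - 1) * sd_x * sd_y)
b_from_r = r * sd_y / sd_x

# Predicted weight for a height of 162 cm.
predicted = mean_y + b_direct * (162 - mean_x)
print(f"b_yx = {b_direct:.2f} kg per cm, predicted = {predicted:.1f} kg")
```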
Variance ratio test (F-test)

• Comparison of sample variances involves what is called the variance ratio test.
• The test uses another distribution, called the F-distribution.
• Calculate the two sample variances, S1² and S2².
• F = S1² ÷ S2² (S1² should be the greater of the two and be kept as the numerator).
• The degrees of freedom are n1 - 1 and n2 - 1 for the two samples.
• The significance of F is found by referring to the F table, which gives the variance ratio at different levels of significance, with df (n1 - 1) given horizontally and df (n2 - 1) given vertically.
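A small sketch of the variance ratio, keeping the larger sample variance in the numerator as described above. The two samples are invented; the returned degrees of freedom are the ones to use when looking up the F table.

```python
import statistics

def variance_ratio(sample_a, sample_b):
    """F = larger sample variance / smaller, with (numerator, denominator) df."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical samples (sample variances 10 and 2 respectively).
sample_1 = [10, 12, 14, 16, 18]
sample_2 = [11, 13, 12, 14, 15, 13]
f_value, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"F = {f_value:.2f}, df = ({df_num}, {df_den})")
```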
ANALYSIS OF VARIANCE
• This test is basically applied to compare the means of more than two samples drawn from corresponding normal populations.
• For example, suppose we want to know whether occupation has any effect on BP. Take the BP of 10 randomly selected officers, 10 clerks, 10 laboratory technicians and 10 lab attendants.
• If occupation plays no role, the 4 groups will not differ significantly when compared among themselves. If occupation plays a significant role, the 4 groups will differ significantly.
• To test whether or not the 4 means differ significantly, the F-test or ANOVA is carried out.
• We always start with the null hypothesis, here that BP is independent of occupation.
• Now compare the calculated F-ratio with the value given in the F table for the df between the classes and the df within the classes, at the 5% level of significance.
• If the calculated value > table value, the null hypothesis is rejected and the alternative hypothesis of a significant difference between the means is accepted.
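The F-ratio for the occupation example could be computed as in the sketch below, which assumes a one-way ANOVA (between-class mean square divided by within-class mean square). To keep it short, only 3 hypothetical BP readings per occupation are used instead of 10.

```python
def one_way_anova(groups):
    """Return F, df between classes and df within classes."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-class sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-class sum of squares: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical systolic BP (mm Hg); 3 readings per group for brevity.
officers = [130, 134, 138]
clerks = [126, 130, 134]
technicians = [124, 128, 132]
attendants = [120, 124, 128]
f_ratio, df_b, df_w = one_way_anova(
    [officers, clerks, technicians, attendants])
print(f"F = {f_ratio:.2f}, df between = {df_b}, df within = {df_w}")
```

The calculated F is then compared with the F-table value for these two degrees of freedom at the 5% level.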

DEMOGRAPHY AND VITAL STATISTICS

• Demography is a collective study of mankind. It is defined as the scientific study of human population, focusing attention on readily observable human phenomena, e.g. changes in population size, its composition and its distribution in space. It may be:
  ◦ Static demography: the study of the anatomy or structure of communities and their environment in a given population.
  ◦ Dynamic demography: deals with the physiology or function of communities as regards changing patterns of mortality, fertility and migration.
• Vital statistics are data that give quantitative information on vital events occurring in life, i.e. migration, births, marriages and deaths in a given population.
MEASURES OF VITAL STATISTICS
• Absolute numbers of vital events, such as births and deaths, are converted into rates and ratios so that vital statistics can be compared from place to place or from year to year, e.g. infant deaths in UP and Kerala.
• A rate differs from a proportion in the matter of time: no time factor is involved in a proportion. A rate is a measure of the speed at which new events are occurring in a community. Rates are in general of two types, crude and specific.

1. Crude rate
• Crude rate = (total no. of events that occurred in a given geographical area during a given year ÷ mid-year population of the same geographical area for the same period) × 1000

2. Specific rate
• Specific rate = (no. of events that occurred among a specific group of the population of a given geographical area during a given year ÷ mid-year population of that specific group in the same geographical area during the same period) × 1000
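Both formulas reduce to the same arithmetic with different numerators and denominators, as the sketch below shows. All figures are hypothetical, and the age group of 60 years and over is just one example of a specific group.

```python
def rate_per_1000(events, mid_year_population):
    """Events per 1000 mid-year population in a given area and year."""
    return events * 1000 / mid_year_population

# Hypothetical figures for one area in one year (illustrative only).
# Crude death rate: all deaths over the whole mid-year population.
crude_death_rate = rate_per_1000(events=800, mid_year_population=100_000)

# Specific death rate: deaths and population restricted to one group,
# here people aged 60 years and over.
death_rate_60_plus = rate_per_1000(events=450, mid_year_population=9_000)

print(f"crude death rate = {crude_death_rate:.1f} per 1000")
print(f"death rate, 60+ = {death_rate_60_plus:.1f} per 1000")
```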
CONCLUSION
• To summarise, statistics and biostatistics have found much wider application in modern times. Biostatistics has become an incredibly vast branch of science that needs to be thoroughly understood and applied.

