BIOSTATISTICS
DR. PRATHA AKOLU
MDS PART – I
CONTENTS
Introduction
Collection of Data
Sources & Presentation of Data
Measures of Central Tendency
Measures of Dispersion
Normal curve
Tests of significance
Sampling
Reference
INTRODUCTION
It is just a tool for analysis and interpretation.
Statistics is the science of compiling, classifying and tabulating numerical
data and expressing the results in mathematical or graphical form.
Biostatistics is that branch of statistics concerned with mathematical facts
and data related to biological events.
It is the term used when the tools of statistics are applied to data that is
derived from biological sciences such as medicine or dentistry.
COLLECTION OF DATA
DATA – Comprises details of population size, geographic distribution, ethnic
groups, socio-economic factors & their trends over time.
Depending on the nature of the variable, data is classified as:
Qualitative
• Collected on the basis of qualities
• Eg :- gender
Quantitative
• Collected through measurement
• Eg :- indices
Quantitative data is further divided into:
Discrete
• When the variable under observation takes only fixed values like whole numbers
• Eg :- DMF teeth
Continuous
• When the variable can take any value in a range
• Eg :- Arch length
METHOD OF DATA COLLECTION
Primary Source
• Data is obtained by the investigator himself (1st hand
information)
• Eg :- Personal Interviews
Secondary Source
• Already recorded data is utilized
• Eg :- OPD records
PRIMARY DATA
Direct Personal Interviews
• Face to face contact with the person from whom the information is obtained
Oral Health Examination
• Information on oral health, conducted by dentists
Questionnaire Method
• A list of questions pertaining to the survey is prepared; informants are asked to supply the information
PRESENTATION OF DATA
Data is collected and compiled from experimental work, surveys,
registers/ records.
Such raw data is not helpful for easy understanding.
Thus, data needs to be sorted and classified.
The objective is to make the data –
Simple
Concise
Meaningful
Interesting
Helpful in further analysis
DATA PRESENTATION
Tabulation
• Simple tables
• Frequency distribution tables
Charts and Diagrams
• Bar charts: simple bar chart, multiple bar chart, component bar chart
• Histograms
• Pie charts
• Pictogram
Line Diagrams
• Frequency polygon
• Frequency curve
Statistical Maps
TABULATION
Tables are simple devices used for presentation of statistical
data.
Data must be presented according to size or importance,
chronologically or alphabetically.
Tables should be self-explanatory; codes, abbreviations or symbols
should be explained in detail in a footnote.
Each row and each column should be labelled concisely and
clearly.
Every table should contain a title as to what is depicted in the
table. The title is commonly separated from the body of the
table by lines or spaces.
Master table
Contains all the data from a study.
Sr no. | Name | Age | Sex | Teeth present | Caries
Simple Table
Presents only one characteristic of data
Age Height
Frequency Distribution Table
Depicts frequency of occurrence
Marks Range Tally Frequency
0-25
25-50
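The tally table above can be built mechanically. The following is a minimal Python sketch, not a prescribed method: the marks are made up, the helper name `frequency_table` is hypothetical, and each class interval is assumed to include its lower limit and exclude its upper limit (so a mark of 25 falls in 25-50), because the source intervals share the boundary value.

```python
from collections import Counter


def frequency_table(values, width, start=0):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in sorted(counts):
        lower = start + k * width
        table.append((f"{lower}-{lower + width}", counts[k]))
    return table


# Hypothetical marks, for illustration only
marks = [12, 25, 31, 8, 47, 25, 19, 40]
for interval, freq in frequency_table(marks, width=25):
    # One tally stroke per observation, then the frequency
    print(f"{interval:>6}  {'|' * freq:<8}  {freq}")
```

Intervals with no observations are simply omitted in this sketch; a full table would normally list them with a frequency of zero.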
CHARTS AND DIAGRAMS
Purpose of diagrammatic and Graphic presentation:
They are self explanatory
They are simple and consistent.
They give a bird's-eye view of the entire data.
They are attractive to the eye and leave a lasting impression.
They have greater memorizing effect.
They facilitate the comparison of data relating to different
periods of time.
BAR CHART
Represents data along the length of the bar
Can be
Vertical
Horizontal
SIMPLE BAR CHART
Represents only one variable
MULTIPLE BAR CHART
Represents more than one variable
COMPONENT BAR CHART
Individual bars are divided into two or more parts
PIE CHART
Represents total frequency
Total angle – 360 degree
Divided into different sectors corresponding to frequencies of
variables
[Figure: pie chart titled "Caries" with sectors for the age groups 0-10 years, 10-20 years, 20-30 years and 30-40 years]
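Since the total angle is 360 degrees, each sector's angle is its frequency divided by the total frequency, multiplied by 360. A short Python sketch of that arithmetic follows; the caries counts and the function name `sector_angles` are hypothetical and only illustrate the calculation.

```python
def sector_angles(frequencies):
    """Return the angle in degrees of each pie sector: frequency * 360 / total."""
    total = sum(frequencies.values())
    return {label: freq * 360 / total for label, freq in frequencies.items()}


# Hypothetical caries counts by age group, for illustration only
caries = {"0-10 years": 10, "10-20 years": 20, "20-30 years": 30, "30-40 years": 40}
angles = sector_angles(caries)
for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```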
LINE DIAGRAM
Used to study values of variables over time, i.e. hours, days, weeks etc.
HISTOGRAM
Pictorial diagram of a frequency distribution
FREQUENCY POLYGON
It is also a pictorial diagram of frequency distribution.
The midpoints of the tops of the histogram bars are connected by a line.
CARTOGRAM/SPOT
MAP/SHADED MAP
These maps are used to show geographical distribution of frequencies of a
characteristic.
If shades are used, it is called a shaded map.
SCATTER DIAGRAM
It is a diagram which shows the relationship between two variables.
MEASURES OF CENTRAL
TENDENCY
The main objective of measures of central tendency is to condense the
entire mass of data and facilitate comparison.
The most common measures of central tendency that are used in dental
sciences are –
1. Arithmetic mean – mathematical estimate.
2. Median – positional estimate.
3. Mode – based on frequency.
ARITHMETIC MEAN
It is the simplest measure of central tendency.
It is obtained by adding the individual observations and dividing the sum by the
total number of observations.
Advantages – 1. Easy to calculate and understand.
2. It is most useful of all the averages.
3. Based on all items of the given data
MEDIAN
It is the middle most value in a distribution arranged either in an ascending
or descending order.
When the number of observations (n) is odd – M = the middle value.
When the number of observations is even – M = the sum of the (n/2)th value and
the next value, divided by two.
Advantages
1. Simple to understand and easy to compute.
2. Can be calculated for open-ended classes.
3. Not affected by extreme values.
4. Can be graphically determined.
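The definitions of the mean and the median above can be sketched in a few lines of Python. The DMF scores are invented for illustration; they include one extreme value to show why the median is described as not affected by extreme values.

```python
def mean(observations):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(observations) / len(observations)


def median(observations):
    """Median: middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: (n/2)th and next value


# Hypothetical DMF scores with one extreme value, for illustration only
scores = [1, 2, 2, 3, 12]
print(mean(scores), median(scores))
```

Here the single score of 12 pulls the mean up to 4.0, while the median stays at 2.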
MODE
The mode is the observation that occurs the greatest number of
times. In grouped data it is the class interval having the highest
frequency.
A set of observations can be described as either unimodal, bimodal or
multimodal depending on the number of modes present in that distribution.
Advantages
1. Easy to compute.
2. Is an average used in day-to-day life.
3. No need of arranging data.
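A small Python sketch of finding the mode(s) and labelling a data set as unimodal, bimodal or multimodal. It relies on `statistics.multimode` from the standard library (Python 3.8 or later); the helper name `describe_modes` and the sample values are hypothetical.

```python
from statistics import multimode


def describe_modes(observations):
    """Return the list of modes and a label based on how many modes there are."""
    modes = multimode(observations)  # every value tied for the highest frequency
    kind = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, kind


# Hypothetical observations, for illustration only
print(describe_modes([1, 2, 2, 3]))
print(describe_modes([2, 3, 3, 4, 4, 5]))
```

One caveat of this sketch: if every value occurs exactly once, all values tie for the highest frequency and the data would be labelled multimodal, although such data is usually said to have no mode.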
MEASURES OF DISPERSION
The different measures of dispersion are:
A] Range.
B] Mean deviation.
C] Standard deviation.
Objectives:
- Determine the reliability of an average.
- Serve as a basis for the control of variability.
- Enable other statistical analysis.
- Compare with other sets of data.
RANGE
Range is the difference between the highest and the lowest values found in
the set of observations.
Uses of range:
- Quality control.
- Variation in status.
- Weather reports.
MEAN DEVIATION
It is the mean of the absolute deviations (ignoring sign) of all the different
observations from the calculated mean.
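The range and the mean deviation can be sketched as follows. This is an illustrative Python example with made-up observations; it assumes the usual definition in which deviations are taken as absolute values, since signed deviations from the mean always add up to zero.

```python
def value_range(observations):
    """Range: highest value minus lowest value."""
    return max(observations) - min(observations)


def mean_deviation(observations):
    """Mean of the absolute deviations of the observations from their mean."""
    n = len(observations)
    m = sum(observations) / n
    return sum(abs(x - m) for x in observations) / n


# Hypothetical observations, for illustration only
data = [2, 4, 4, 6, 9]
signed_sum = sum(x - sum(data) / len(data) for x in data)  # signed deviations cancel out
print(value_range(data), mean_deviation(data), signed_sum)
```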
STANDARD DEVIATION
It was introduced by Karl Pearson. It is a useful measure of variability.
By squaring the deviations, it overcomes the problem that deviations from the mean always sum to zero.
Calculating the standard deviation:
$SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
Where,
SD = Standard deviation,
$\bar{x}$ = the calculated mean,
x = the different observations,
n = the number of observations.
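A minimal Python sketch of the standard deviation formula above, dividing by n as the slide does. The observations are hypothetical. Many textbooks and software packages divide by n - 1 instead when the data is a sample, which gives a slightly larger value.

```python
import math


def standard_deviation(observations):
    """Square root of the mean squared deviation from the mean (divisor n)."""
    n = len(observations)
    m = sum(observations) / n
    return math.sqrt(sum((x - m) ** 2 for x in observations) / n)


# Hypothetical observations, for illustration only: mean 5, squared deviations sum to 32
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(values))
```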
THE NORMAL DISTRIBUTION –
NORMAL CURVE
The most important probability density function used in
statistical analysis is the normal or gaussian distribution. The
features of this distribution are:
1. Shape: The normal distribution is represented by a smooth
bell shaped curve that is symmetric about the population
mean μ. The exact shape of a particular normal curve
depends on:
Location of its center, which is the mean μ.
The degree of spread of the other observations around
the center, which is the standard deviation σ.
2. Area under the curve: The relative frequencies of the values around the
mean in a normal distribution are (as given in fig)
Fifty percent of the observations lie above the mean and the
remaining 50% lie below the mean.
Approximately 68% (68.27% to be precise) lie within one standard
deviation of the mean.
Approximately 95% (95.45% to be precise) lie within two standard deviations of the mean.
Approximately 99.7% (99.73% to be precise) lie within three
standard deviations of the mean.
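The areas under the normal curve can be checked numerically. For a normal distribution, the proportion of observations within k standard deviations of the mean equals erf(k/√2), where erf is the error function available as `math.erf` in Python's standard library. The function name `area_within` is just an illustrative choice.

```python
import math


def area_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k) * 100:.2f}%")
```

The computed proportions are about 68.27%, 95.45% and 99.73% for one, two and three standard deviations.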
TEST OF SIGNIFICANCE
When different samples are drawn from the same population, the estimates
might differ.
This difference in the estimates is called sampling variability.
Tests of significance deal with techniques to determine how far differences
between estimates can be attributed to sampling variability.
1. Standard error of mean = $SD/\sqrt{n}$
2. Standard error of proportion = $\sqrt{pq/n}$, where p is the proportion and q = 1 – p
3. Standard error of difference between 2 means
4. Standard error of difference between 2 proportions
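The four standard errors listed above can be sketched in Python. The first two functions follow the formulas given on the slide. The last two are not spelled out in the source; the forms used here are the common large-sample (unpooled) formulas and should be read as an assumption. All function names are hypothetical.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def se_diff_means(sd1, n1, sd2, n2):
    """Assumed large-sample form: sqrt(SD1^2 / n1 + SD2^2 / n2)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def se_diff_proportions(p1, n1, p2, n2):
    """Assumed large-sample form: sqrt(p1*q1 / n1 + p2*q2 / n2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical inputs, for illustration only
print(se_mean(10, 100), se_proportion(0.5, 100))
```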
PARAMETRIC AND NON PARAMETRIC TESTS
Parametric tests
• Based on specific distribution such as Gaussian
• Student’s t- test
• Z test
Non Parametric tests
• Not based on any particular parameter such as mean
• Do not require that the means follow a particular distribution such as Gaussian.
• Chi-square test
CHI SQUARE TEST
Developed by Karl Pearson
The Chi Square Test is applied when data is measured in terms of attributes or
qualities, to test whether the difference in the distribution of attributes in
different groups is due to sampling variation or not.
Used
1. To test the significance of difference between two proportions
2. When there are more than two groups to be compared.
STEPS
1. State the null hypothesis
Eg: When testing for an association between oral hygiene instructions received
and the occurrence of new cavities, the null hypothesis is that there is no association
2. The $\chi^2$ statistic is calculated: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
O – Observed frequency
E – Expected frequency
3. Applying the test
4. Finding degrees of freedom – d.f (depends on no. of columns and rows)
d.f = (columns – 1) (rows – 1)
5. Probability table
The calculated value is compared with the table value and the null hypothesis is accepted or rejected
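The chi-square steps can be sketched in Python for a small contingency table. This is an illustrative example under stated assumptions: the counts are invented, the function name `chi_square` is hypothetical, each expected frequency is taken as (row total × column total) / grand total, and no continuity correction is applied. The value 3.841 is the tabulated critical value for d.f = 1 at the 5% level.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(observed[0]) - 1) * (len(observed) - 1)  # (columns - 1)(rows - 1)
    return statistic, df


# Hypothetical counts: rows = instructions received / not received,
# columns = new cavities / no new cavities
table = [[10, 40], [20, 30]]
stat, df = chi_square(table)
CRITICAL_5_PERCENT_DF1 = 3.841  # from a chi-square probability table
print(round(stat, 3), df, "reject null" if stat > CRITICAL_5_PERCENT_DF1 else "accept null")
```

With these made-up counts the statistic is about 4.76 on 1 degree of freedom, which exceeds 3.841, so the null hypothesis of no association would be rejected at the 5% level.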
STUDENT’S T TEST
Designed by W.S. Gosset
Applied to find the difference between two means
Criteria for applying t test
1. Random samples
2. Quantitative data
3. Sample size < 30
4. Variable normally distributed
Unpaired t test
When each individual in the two groups gives a single value,
to test for the difference between the groups
PAIRED T TEST
When each individual gives a pair of observations, to test for
the difference in the pair of values, the paired t test is utilized.
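Steps
1. State the null hypothesis
2. The difference in each pair is obtained: $d = x_1 - x_2$
3. Calculate the mean of the differences, $\bar{d}$
4. Calculate the standard deviation of the differences, SD
5. Test statistic is calculated: $t = \dfrac{\bar{d}}{SD/\sqrt{n}}$
6. Find degrees of freedom: d.f = n – 1
7. Compare the calculated value with the table value.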
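The steps above, for one pair of observations per individual, can be sketched in Python. The before/after scores are invented and the function name `paired_t` is hypothetical. One assumption to note: the standard deviation of the differences is computed with divisor n - 1, as is usual for the t test, not the divisor n used in the standard deviation formula earlier in these notes. The value 2.776 is the tabulated two-tailed critical value of t for d.f = 4 at the 5% level.

```python
import math


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    d = [x1 - x2 for x1, x2 in zip(first, second)]  # difference in each pair
    n = len(d)
    mean_d = sum(d) / n
    # Sample standard deviation of the differences (divisor n - 1, an assumption)
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    t = mean_d / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical plaque scores before and after an intervention, for illustration only
before = [5, 6, 7, 8, 9]
after = [3, 5, 4, 6, 7]
t, df = paired_t(before, after)
CRITICAL_5_PERCENT_DF4 = 2.776  # two-tailed value from a t table
print(round(t, 3), df, abs(t) > CRITICAL_5_PERCENT_DF4)
```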
Z TEST
Used to test the significance of difference in means for large
samples (>30)
Pre-requisites are:
1. Sample must be randomly selected
2. Quantitative data
3. Sample size > 30
4. Variable normally distributed
The test statistic is Z = (observation – mean) / standard deviation: $Z = \dfrac{x - \bar{x}}{SD}$
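A brief Python sketch of the Z calculation. The first function is the ratio shown above. The second applies the same idea to the difference between two large-sample means, dividing by the standard error of the difference; that form is an assumption, since the source does not spell it out. The numbers and function names are hypothetical. A value of |Z| greater than 1.96 corresponds to significance at the 5% level (two-tailed).

```python
import math


def z_score(x, mean, sd):
    """Z = (observation - mean) / standard deviation."""
    return (x - mean) / sd


def z_two_means(mean1, mean2, sd1, n1, sd2, n2):
    """Assumed large-sample form: difference in means / standard error of the difference."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


# Hypothetical values, for illustration only
print(z_score(130, 100, 15))
print(z_two_means(52, 50, 6, 36, 8, 64))
```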
SAMPLING
A sample is a part of a population.
Sampling is the process or technique of selecting a sample of
appropriate characteristics and adequate size.
The sampling frame is the list of all the elements of the survey
population.
Advantages
1. Reduces cost of investigation, time required, no. of personnel
involved.
2. Allows thorough investigation.
3. Provides adequate and in-depth coverage of sample units.
Ideal Requirements
1. Efficiency – Ability to yield the desired information
2. Representativeness – Should represent parent population
3. Measurability – the investigator should be able to estimate the
extent to which findings from sample are likely to differ from
parent population.
4. Size – Large enough to minimize sample variability.
5. Coverage – Adequate coverage is essential
6. Goal Orientation – sample should be oriented towards study
objectives and research design.
7. Feasibility – Simple enough to be carried out in practice.
8. Cost-effectiveness – Desired info with appreciable savings in
time.
SAMPLE SELECTION
Purposive selection
• Represents the population as whole.
• Purposively select individuals who seem to represent the
population.
• Easy to carry out
• Does not need preparation of samples
Random Selection
• Sample is selected in such a way that all characteristics of
population are reflected in sample.
SAMPLING METHODS
(DEPENDING UPON TYPE AND NATURE OF POPULATION)
Non-probability Sampling
(Not truly representative and are therefore less desirable than probability samples )
• Quota Sampling
• Purposive Sampling
• Convenience Sampling
Probability Sampling
(Each individual unit in the total population has a known probability of being
selected)
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
Non Probability Sampling
Quota Sampling
• General composition of the sample is decided in advance.
• The right no. of people must then be found to fill these quotas.
• Done to ensure inclusion of a particular segment.
• Eg- Minority samples
Purposive Sampling
• Constructed to serve a very specific need.
• One form is the snowball sample (chain referral sampling)
• Eg- Inclusion of business executives
Convenience Sampling
• Is a matter of taking what is available.
• Accidental sample.
• Eg- volunteers
Probability Sampling
Simple Random Sampling
• Selection of unit is determined by chance only.
• Procedure – sampling frame, Size of sample, required no. of units.
• Methods
• Lottery method – Blindfold Selection.
• Table of random nos. – random arrangement of digits in rows and
columns
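The lottery method can be imitated in Python with the standard library's `random.sample`, which draws units without replacement so that chance alone decides the selection. The sampling frame of patient IDs, the helper name `simple_random_sample` and the optional `seed` argument (used only to make a draw repeatable) are illustrative assumptions.

```python
import random


def simple_random_sample(frame, size, seed=None):
    """Draw `size` distinct units from the sampling frame purely by chance."""
    rng = random.Random(seed)
    return rng.sample(frame, size)


# Hypothetical sampling frame of 50 patient IDs
frame = list(range(1, 51))
chosen = simple_random_sample(frame, size=5, seed=1)
print(chosen)
```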
Systematic Sampling
• Obtained by selecting one unit at random.
• Then selecting additional units at evenly spaced interval.
• Adopted as long as no periodicity of occurrence of any particular event
in population.
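Systematic sampling as described above can be sketched in Python: pick a random starting unit within the first interval, then take every k-th unit after it. The sampling interval k is taken here as the frame size divided by the sample size (integer division), which is a common convention and an assumption of this sketch; the frame and the function name are hypothetical.

```python
import random


def systematic_sample(frame, size, seed=None):
    """Select one unit at random, then further units at evenly spaced intervals."""
    k = len(frame) // size  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return [frame[start + i * k] for i in range(size)]


# Hypothetical sampling frame of 100 patient IDs
frame = list(range(1, 101))
chosen = systematic_sample(frame, size=10, seed=1)
print(chosen)
```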
Stratified Sampling
• Population is divided into subgroups or strata
according to common characteristics.
• Types
• Stratified Random Sampling - used when the
population is heterogeneous with regard to
characteristic under study.
• Stratified Systematic Sampling – reduces sampling
variation.
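Stratified random sampling can be sketched in Python by drawing a simple random sample from each stratum separately. This example assumes proportional allocation (the same sampling fraction in every stratum), which the source does not specify; the strata, the function name `stratified_sample` and the `seed` argument are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical population of 100 patient IDs split into two strata
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
chosen = stratified_sample(strata, fraction=0.1, seed=1)
print({name: len(units) for name, units in chosen.items()})
```

Every stratum is guaranteed to be represented, which is the main reason for stratifying a heterogeneous population.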
Cluster Sampling
• Used when population forms natural groups or
clusters.
• Sampling units are clusters.
• It is less expensive.
OTHER SAMPLING METHODS
Multiphase Sampling
• Part of information is collected from whole sample and part from sub – sample.
Multistage Sampling
• 1st stage is to select the groups.
• Then subsamples are taken in as many subsequent stages as necessary to obtain desired sample size.
REFERENCES
Soben Peter. Essentials of Public Health Dentistry, 5th Edition.