BIOSTATISTICS IN
PSYCHIATRY
Chairperson:
Dr. Deepanjali Medhi,
Associate Professor,
Department of Psychiatry, GMCH.
Presenter:
Dr. Puneet Mathur,
Post Graduate trainee,
Department of Psychiatry, GMCH.
Plan of presentation
SECTION-A. INTRODUCTION TO BIOSTATISTICS.
1. Statistics
2. Biostatistics
3. Biostatistics for Psychiatry
4. Brief history of biostatistics.
SECTION-B. BIOSTATISTICS FOR DESCRIPTION.
1. Basic concepts
2. Presentation of data
3. Measures of central tendency
4. Measures of location
5. Measures of Variability
SECTION-C. BIOSTATISTICS FOR INFERENCE.
1. Basic concepts
2. Study designs
3. Sampling
4. Tests of significance
5. Specific statistical concepts
6. Computers in biostatistics.
INTRODUCTION TO
BIOSTATISTICS
STATISTICS
A piece of information stated as figures.
A science of figures.
A mathematical science.
A set of recorded data.
A multipurpose applied science
BIOSTATISTICS
It is the term used when the tools of statistics are applied to
data derived from the biological sciences, such
as medicine.
It is the science that deals with the development and
application of the most appropriate methods, in a
biological model, for the:
Collection of data.
Presentation of the collected data.
Analysis and interpretation of the results.
Making decisions on the basis of such analysis.
BIOSTATISTICS FOR PSYCHIATRY
In keeping records.
In describing a psychiatric disease.
In describing the epidemiology.
In testing rating scales.
In evaluating various tests.
In testing hypotheses.
In comparing different guidelines.
In drug trials.
In planning.
BRIEF HISTORY OF BIOSTATISTICS
Up to the early 19th century-
John Graunt published the Natural and Political Observations
upon the Bills of Mortality.
Blaise Pascal and Pierre de Fermat developed probability theory.
Late 19th and early 20th century-
Sir Francis Galton introduced the concepts of correlation
and regression.
Karl Pearson, the founder of mathematical statistics, introduced
the term standard deviation and developed the correlation
coefficient, the method of moments and Pearson’s
system of continuous curves.
Galton and Pearson founded Biometrika, the first journal of
mathematical statistics; Pearson also founded the first university
statistics department.
Through the 20th century-
Sir Ronald Fisher introduced the term
variance and the concepts of sufficiency, ancillary
statistics, Fisher’s linear discriminant and
Fisher information.
Egon Pearson and Jerzy Neyman introduced
the concepts of Type II error, power of a test
and confidence intervals.
Since the late 20th century, computers have handled
large-scale statistical computations.
DESCRIPTIVE
BIOSTATISTICS
BASIC CONCEPTS
DATA AND THE VARIABLE
A DATUM (plural: data) is a fact:
a fact that can be measured,
or a fact that has some character, characteristic
or quality.
Eg. A height of 6 feet.
Eg. A weight of 70 kg.
In statistical language, any character, characteristic or
quality that can vary is called a variable.
Eg. Basal metabolic rate, body temperature etc.
TYPES OF DATA
Qualitative (discrete) data-
They are defined by the frequency of a character,
and this character is not measurable.
Eg. 15 vaccinated and 15 unvaccinated.
Eg. 20 males and 20 females.
Quantitative (continuous) data-
They are defined by the frequency of a character
that has a measurable magnitude.
Eg. Height of students in a classroom.
TOOLS OF MEASUREMENTS IN
EPIDEMIOLOGY
A. Rate- measures the occurrence of an event in a given time
frame.
Eg. Crude death rate,
Crude birth rate.
B. Ratio- expresses the relation in size between two random
quantities.
Eg. Sex ratio,
Doctor-population ratio.
C. Proportion- the relation in magnitude of a part to
the whole.
Usually expressed as a percentage.
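A minimal Python sketch of the three measures above. The counts are hypothetical and the function names (`rate_per`, `ratio`, `percentage`) are illustrative only; the rate is expressed per 1000 population, as crude death and birth rates usually are.

```python
def rate_per(events, population, base=1000):
    """Events per `base` population in a time frame (e.g. crude death rate per 1000)."""
    return events * base / population

def ratio(x, y):
    """Relation in size between two quantities, expressed as x per one y."""
    return x / y

def percentage(part, whole):
    """Proportion of a part to the whole, expressed as a percentage."""
    return part * 100 / whole

# Hypothetical figures for one district in one year
crude_death_rate = rate_per(events=900, population=100_000)  # deaths per 1000 mid-year population
population_per_doctor = ratio(100_000, 50)                   # population served by each doctor
percent_female = percentage(480, 1000)                       # share of females in a sample

if __name__ == "__main__":
    print(crude_death_rate, population_per_doctor, percent_female)
```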
BASIC MEASUREMENTS IN
EPIDEMIOLOGY
Measurements of Mortality-
Eg. Crude death rate,
Specific death rate,
Case fatality rate (ratio)
Measurements of Morbidity-
Eg. Incidence
Prevalence
Various other Measurements-
Disability rates
Natality rates
Health needs
RELATIONSHIP B/W INCIDENCE &
PREVALENCE
P = I × D
where P is prevalence, I is incidence and D is the mean duration of disease.
Eg. With the advent of newer drugs and education, the
incidence of HIV has come down while the prevalence is
increasing, because treated patients survive longer (D increases).
Eg. With proper care of mentally ill patients,
life expectancy would increase, leading to
increased prevalence.
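The relation P = I × D can be checked with a short worked sketch. It assumes a steady state and a fairly rare disease, where the approximation holds; the incidence and duration figures are hypothetical and only illustrate how prevalence can rise while incidence falls.

```python
def prevalence(incidence, mean_duration):
    """Approximate prevalence P = I x D (steady state, low prevalence)."""
    return incidence * mean_duration

# Hypothetical: incidence in new cases per 1000 per year, duration in years
before = prevalence(incidence=2.0, mean_duration=5)   # 10 per 1000
after = prevalence(incidence=1.5, mean_duration=10)   # 15 per 1000: fewer new cases, longer survival

if __name__ == "__main__":
    print(before, after)
```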
METHODOLOGY OF DESCRIPTIVE
EPIDEMIOLOGY
Defining the population under study.
Defining the disease under the study.
Distribution by time, place and person.
Measuring the burden of disease.
Comparing with the known indices.
Formulation of an aetiological hypothesis.
APPLICATIONS OF DESCRIPTIVE
EPIDEMIOLOGY
Planning and organizing of adequate interventions.
Evaluation of existing programmes.
Formulation of new programmes.
Background for further research.
Funding for a particular disease.
Eg. Proper analysis of the prevalence of mental illnesses
would help secure an adequate budget from the
government.
PRESENTATION OF DATA
TABLES, GRAPHS AND DIAGRAMS
FREQUENCY TABLES
Value   Frequency   Percent   Cumulative percent
2       3           21.4      21.4
3       4           28.6      50.0
4       3           21.4      71.4
5       1           7.1       78.6
6       1           7.1       85.7
7       2           14.3      100.0
SAMPLE (n = 14): 2, 4, 3, 5, 2, 6, 4, 3, 3, 7, 4, 2, 3, 7
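The frequency table above can be built directly from the raw sample. This is a small Python sketch; `frequency_table` is an illustrative helper, not a standard function. It computes the cumulative percent from the running count rather than by adding rounded percents, which avoids rounding drift in the last column.

```python
from collections import Counter

def frequency_table(sample):
    """Return (value, frequency, percent, cumulative percent) rows in ascending order of value."""
    n = len(sample)
    counts = Counter(sample)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / n, 1),
            # cumulative percent from the running count, not from summing rounded percents
            round(100 * running / n, 1),
        ))
    return rows

sample = [2, 4, 3, 5, 2, 6, 4, 3, 3, 7, 4, 2, 3, 7]

if __name__ == "__main__":
    print("Value\tFreq\tPercent\tCum. percent")
    for row in frequency_table(sample):
        print(*row, sep="\t")
```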
HISTOGRAM
FREQUENCY POLYGON
[Figure: frequency polygon of percentage by age (25 to 65 years), males and females]
FREQUENCY CURVE
[Figure: frequency curve of frequency by age in years (20-29 to 60-69), males and females]
LINE CHART/ GRAPH
[Figure: line chart of MMR per 1000 by year, 1960 to 2000]
CUMULATIVE FREQUENCY
DIAGRAM
It is also called the OGIVE.
It is the graphic form of a cumulative frequency
table.
In the graph, the cumulative percentage is plotted against
successive values of the variable and rises to 100%; the more
spread out the data, the more gradually the curve rises.
The diagram indicates how the data are dispersed in
the sample.
Eg. Cumulative charting of total scores made
by a team.
SCATTER OR DOT DIAGRAM
They are plotted to check the relationship between two
variables.
A positive correlation appears as dots clustered closely
around a straight line rising from left to right.
Widely dispersed dots indicate no relationship.
Eg. People with more income consume
more protein, fat and sugar. If fat intake and sugar
intake are plotted on the x and y axes, the dots will
cluster around a rising line, with richer people
towards its upper end.
BAR DIAGRAM
[Figure: bar diagram of percentage by marital status: single, married, divorced, widowed]
PIE OR SECTORAL DIAGRAM
[Figure: pie diagram of deletion, inversion and translocation; translocation 79%, deletion and inversion sharing the remaining 18% and 3%]
PICTORIAL OR PICTURE DIAGRAM
Pictures or symbols are used to present the
data.
In essence, they are a form of bar diagram.
Eg. A picture of a doctor to present the population
per physician.
They are good for presenting ratios.
Easily understood by relatively less literate
people.
Useful at places like hospitals and railway stations.
MAP DIAGRAM OR SPOT MAP
To show data across a geographic or
administrative area.
Eg. Shaded maps or dot maps.
Shaded areas with different colours can be used.
Eg. Number of seats won by a political party in
terms of the area covered on the map.
Eg. Showing the distribution of a variable over
a large region.
MEASURES OF CENTRAL
TENDENCY
MEAN
The arithmetic average of the data.
Most commonly used average value in statistics.
The mean is defined as the sum of all observations
divided by the number of observations.
Advantages - easy to obtain
-easy to understand
-a true average.
Disadvantage - not useful in very asymmetrical
distributions, as it is pulled towards extreme values.
MODE
It is the most frequently occurring value in a
sample.
It is easy to understand.
Not affected by extreme values.
Not a true average.
Not usually used in biological statistics.
It has some use in finance and trade.
MEDIAN
It is the middle value in a sample when the observations
are arranged in ascending order.
Used for samples in which data are
asymmetrically distributed, as in such samples
the mean cannot provide a representative average value.
Not a true average.
It is used when the mean is not desirable.
Eg. Average number of deaths every month in the last 5
years in a region where an earthquake recently caused
many deaths.
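The three averages can be compared with Python's standard `statistics` module. This sketch reuses the 14-value sample from the frequency table; the monthly death counts are hypothetical and only mimic the earthquake example, showing how one extreme month drags the mean but barely moves the median.

```python
import statistics

# Sample from the frequency table
sample = [2, 4, 3, 5, 2, 6, 4, 3, 3, 7, 4, 2, 3, 7]
sample_mean = statistics.mean(sample)      # sum / n = 55 / 14
sample_median = statistics.median(sample)  # average of the 7th and 8th ordered values
sample_mode = statistics.mode(sample)      # most frequent value

# Hypothetical monthly death counts; the last month includes an earthquake
monthly_deaths = [12, 10, 11, 9, 13, 12, 450]
deaths_mean = statistics.mean(monthly_deaths)      # pulled far upward by the one extreme month
deaths_median = statistics.median(monthly_deaths)  # still reflects a typical month

if __name__ == "__main__":
    print(sample_mean, sample_median, sample_mode)
    print(deaths_mean, deaths_median)
```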
MEASURES OF LOCATION
PERCENTILES
The kth percentile is the value below which k% of the
observations lie.
Each value can thus be located by its position in the
ordered data on a scale of 0 to 100.
Frequently used for assessment in exams.
The 50th percentile is the median or
middle value.
Growth charts are good examples.
QUARTILES
The ordered data are divided into 4 equal parts, each
containing 25% of the observations.
The desired value can be traced to a particular
quartile.
The middle 50% of the data lies between the 25th (Q1)
and 75th (Q3) percentile values.
Quartiles can be used in place of
percentiles when a very large sample is being
dealt with.
Eg. Height and weight of soldiers in a big
regiment.
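Quartiles and percentiles can be computed with `statistics.quantiles` (Python 3.8+). This sketch assumes the "inclusive" method, which interpolates linearly between ordered values; other software may use slightly different percentile definitions and give slightly different answers. The scores reuse the sample given under RANGE.

```python
import statistics

scores = [83, 75, 81, 79, 71, 90, 95, 77, 94]

# Three cut points split the ordered data into 4 equal parts: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# With n=100 the 99 cut points are the 1st..99th percentiles; index 89 is the 90th
p90 = statistics.quantiles(scores, n=100, method="inclusive")[89]

if __name__ == "__main__":
    print(q1, q2, q3, p90)
```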
MEASURES OF VARIABILITY
RANGE
It is the simplest measure of dispersion.
It is defined as the difference between the highest
and lowest values in a given sample.
Eg. For a sample-
83, 75, 81, 79, 71, 90, 95, 77, 94
The range here is expressed as-
Either- 71 to 95
Or the actual difference, ie. 24.
INTERQUARTILE RANGE
When the data are divided into quartiles,
the range of different quartiles can be
presented separately.
The interquartile range (IQR = Q3 − Q1) is one such range;
it covers the middle 50% of the data, between Q1 and Q3,
and is not affected by extreme values.
Useful when a big sample is being
studied.
Eg. Interquartile range for purchases of arms per
month in the last 60 months.
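A short sketch contrasting the range with the interquartile range on the same sample. It assumes the "inclusive" quantile method of Python's `statistics.quantiles`; the extra value of 200 is hypothetical and added only to show that one extreme observation inflates the range far more than the IQR.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1 (inclusive, linear-interpolation quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

scores = [83, 75, 81, 79, 71, 90, 95, 77, 94]
data_range = max(scores) - min(scores)   # 95 - 71
scores_iqr = iqr(scores)

# Hypothetical extreme value added to the same sample
with_outlier = scores + [200]
outlier_range = max(with_outlier) - min(with_outlier)
outlier_iqr = iqr(with_outlier)

if __name__ == "__main__":
    print(data_range, scores_iqr, outlier_range, outlier_iqr)
```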
MEAN DEVIATION
The average of the deviations from the
arithmetic mean, taken without regard to sign.
Also known as average deviation.
It is obtained as the sum of the absolute deviations from
the mean divided by the number of observations
(with signs kept, the deviations would always sum to zero).
Gives an idea of the spread of the observations.
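A minimal sketch of the mean deviation in Python. The data are hypothetical; `mean_deviation` is an illustrative helper. The `signed_sum` line shows why the signs must be ignored: deviations from the mean always cancel out.

```python
import statistics

def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
signed_sum = sum(x - statistics.mean(data) for x in data)  # always 0, hence absolute values

if __name__ == "__main__":
    print(mean_deviation(data), signed_sum)
```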
STANDARD DEVIATION &
VARIANCE
Standard deviation is the most frequently used
measure of dispersion.
It is defined as the root-mean-square deviation of the
observations from the mean.
In a normal distribution curve, its value
determines the spread of data: about 68% of observations lie
within ±1 SD of the mean and about 95% within ±1.96 SD.
# Variance is a related term.
# Variance is the square of the standard deviation.
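The root-mean-square definition can be checked against Python's `statistics` functions. This sketch uses hypothetical data. Note the assumption about divisors: `pstdev` divides by n (as the root-mean-square definition does), while `stdev` and `variance` divide by n − 1, the usual choice when estimating from a sample.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(data)

# Root-mean-square deviation from the mean (divides by n)
rms_deviation = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

population_sd = statistics.pstdev(data)      # also divides by n
sample_sd = statistics.stdev(data)           # divides by n - 1
sample_variance = statistics.variance(data)  # square of sample_sd

if __name__ == "__main__":
    print(rms_deviation, population_sd, sample_sd, sample_variance)
```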
COEFFICIENT OF VARIATION
CV is a normalised measure of dispersion.
Also known as unitized risk or variation
coefficient.
It is sometimes known as relative standard
deviation.
It is defined as the ratio of the standard deviation to
the mean, usually expressed as a percentage.
Its use is mainly confined to data measured
on ratio scales.
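Because the CV has no units, it can compare the variability of measurements taken in different units. This sketch uses hypothetical heights and weights and assumes the sample standard deviation (n − 1); `coefficient_of_variation` is an illustrative helper.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean x 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]  # hypothetical
weights_kg = [50, 60, 70, 80, 90]       # hypothetical

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

if __name__ == "__main__":
    # Weight varies relatively more than height in these data
    print(round(cv_heights, 1), round(cv_weights, 1))
```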
STANDARD ERROR
It is the standard deviation of a sample statistic, describing
how precisely the statistic estimates the population value.
Standard error of the mean (SEM) is the standard
deviation of the sample mean as an estimate of the
population mean.
SEM is usually estimated as the standard
deviation divided by the square root of the sample
size.
# 95% confidence limits = mean ± (SEM × 1.96)
# Standard error is usually used for tests of
significance in parametric tests.
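A sketch of the SEM and the 95% confidence limits mean ± 1.96 × SEM, on hypothetical data. The 1.96 multiplier assumes a large sample; for a sample this small a t-table multiplier (which is larger) would normally be used, so treat the interval as illustrative.

```python
import math
import statistics

def mean_ci95(values):
    """Return (mean, SEM, (lower, upper)) using the normal multiplier 1.96."""
    n = len(values)
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # SD / sqrt(sample size)
    return m, sem, (m - 1.96 * sem, m + 1.96 * sem)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean, sem, (lower, upper) = mean_ci95(data)

if __name__ == "__main__":
    print(mean, round(sem, 3), round(lower, 2), round(upper, 2))
```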
DISTRIBUTION OF THE
DATA
THE NORMAL DISTRIBUTION
Normal distribution-
Also known as Gaussian distribution.
Data are symmetrically distributed.
Mean = mode = median
# The significance level is fixed on a normal
curve by choosing the desired confidence level,
which usually is 95%, ie. p = 0.05 (mean ± 1.96 SD).
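The figures used with the normal curve can be checked with `statistics.NormalDist` (Python 3.8+). This sketch works on the standard normal curve (mean 0, SD 1): it shows the area within ±1 SD and ±1.96 SD, and recovers 1.96 as the two-tailed cut-off for p = 0.05.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1_sd = z.cdf(1) - z.cdf(-1)           # about 0.683
within_1_96_sd = z.cdf(1.96) - z.cdf(-1.96)  # about 0.950
critical_z = z.inv_cdf(1 - 0.05 / 2)         # about 1.96 for a two-tailed p = 0.05

if __name__ == "__main__":
    print(round(within_1_sd, 3), round(within_1_96_sd, 3), round(critical_z, 2))
```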
CONCEPT OF SKEWING
Skewing is seen in asymmetrical distributions-
They show a skewed curve with the tail of the curve towards
one side of the midline.
Data are asymmetrically distributed.
Mean, mode and median are not equal.
# Median is used as the measure of central tendency in a skewed
curve.
INFERENTIAL
BIOSTATISTICS
BASIC CONCEPTS
CONCEPT OF SCREENING
A. SCREENING V/S DIAGNOSTIC TEST
B. CRITERIA FOR A SCREENING TEST
- Acceptability
- Repeatability
- Validity
- Yield
- Cost-effectiveness
C. EVALUATION OF A SCREENING TEST-
- Sensitivity
- Specificity
- Predictive value
D. USES OF SCREENING-
- Case detection
- Control of disease
- Research purpose
- Educational opportunities
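The evaluation measures listed under C can be computed from a 2×2 table of screening result against true disease status. This sketch uses hypothetical counts; tp, fp, fn and tn stand for true positives, false positives, false negatives and true negatives, and `screening_metrics` is an illustrative helper.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased people correctly test positive
        "specificity": tn / (tn + fp),  # healthy people correctly test negative
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical screening of 1000 people, 100 of whom have the disease
metrics = screening_metrics(tp=90, fp=30, fn=10, tn=870)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(name, round(value, 3))
```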
CONCEPT OF CAUSAL ASSOCIATION
A. ONE TO ONE CAUSAL RELATIONSHIP
B. MULTIFACTORIAL CAUSATION
C. ADDITIONAL CRITERIA-
- Temporal association
- Strength of association
- Specificity of association
- Consistency of association
- Biological plausibility
ESTIMATION OF RISK
EXPOSURE RATE-
It is the percentage of the study subjects who were exposed
to the risk factor or the causative agent.
A case-control study provides a direct estimate of the
exposure rate, or frequency of exposure.
ODDS RATIO-
Also known as cross-product ratio.
It is closely related to relative risk.
A measure of the strength of association between the risk factor and outcome.
Used in case-control studies.
Mathematically it is (A × D)/(B × C), where A = exposed cases, B = exposed
controls, C = unexposed cases and D = unexposed controls.
RELATIVE RISK-
It is the ratio of the incidence among the exposed to the
incidence among the non-exposed.
Used in cohort studies.
ATTRIBUTABLE RISK-
It is the difference in incidence rates of disease between an
exposed group and a non-exposed group.
It is often expressed as a percentage of the incidence among the exposed.
# Estimation of risk is done for aetiological enquiries. The larger the
risk, the stronger the association of the risk factor with the disease.
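A sketch of the three risk measures from a 2×2 table laid out as in the odds-ratio definition: rows are exposed/unexposed, and a, b, c, d are the cells in reading order. All counts are hypothetical; the odds ratio uses case-control data and the relative and attributable risks use cohort data, matching the study types named above.

```python
def odds_ratio(a, b, c, d):
    """Case-control: a exposed cases, b exposed controls, c unexposed cases, d unexposed controls."""
    return (a * d) / (b * c)

def incidences(a, b, c, d):
    """Cohort: a/b exposed with/without disease, c/d unexposed with/without disease."""
    return a / (a + b), c / (c + d)

def relative_risk(a, b, c, d):
    ie, iu = incidences(a, b, c, d)
    return ie / iu

def attributable_risk(a, b, c, d):
    """Return (difference in incidence, that difference as % of incidence among exposed)."""
    ie, iu = incidences(a, b, c, d)
    return ie - iu, (ie - iu) / ie * 100

# Hypothetical case-control study
or_ = odds_ratio(40, 20, 60, 80)
# Hypothetical cohort study: 100 exposed, 100 unexposed
rr = relative_risk(30, 70, 10, 90)
ar, ar_percent = attributable_risk(30, 70, 10, 90)

if __name__ == "__main__":
    print(round(or_, 2), round(rr, 2), round(ar, 2), round(ar_percent, 1))
```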
CONCEPT OF CORRELATION
Correlation refers to the direction of the relationship or
association between two quantitatively measured or
continuous variables.
The coefficient of correlation (r) measures the degree of this
association.
r = +1 means perfect positive correlation.
r = -1 means perfect negative correlation.
r = 0 means no (linear) correlation.
# The correlation coefficient is the normalised version of covariance
(covariance divided by the product of the two standard deviations);
covariance measures how much two random variables change together.
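Pearson's correlation coefficient can be written directly as covariance divided by the product of the two standard deviations. This is a sketch on hypothetical paired data; `pearson_r` is an illustrative helper (Python 3.10+ also provides `statistics.correlation`).

```python
import statistics

def pearson_r(x, y):
    """Pearson r = sample covariance / (SD of x * SD of y)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]  # hypothetical, eg. fat intake
y = [2, 4, 5, 4, 5]  # hypothetical, eg. sugar intake
r = pearson_r(x, y)  # moderate positive correlation

if __name__ == "__main__":
    print(round(r, 3))
    print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive
    print(pearson_r(x, [-v for v in x]))         # perfect negative
```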
THE REGRESSION
In simple words, regression describes a relationship.
It refers to the mathematical relationship
between two variables, which can be used to
estimate the value of one when the value of the other
variable is known.
# correlation- direction of association
# regression- prediction of the value of the
associated variable.
# hence regression quantifies how one variable changes with the
other; by itself it does not prove cause and effect.
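A least-squares sketch of simple linear regression, y = a + b·x, on hypothetical data: the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is a = ȳ − b·x̄. The fitted line is then used to predict y for a new x. `fit_line` is an illustrative helper; Python 3.10+ also offers `statistics.linear_regression`.

```python
import statistics

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

x = [1, 2, 3, 4, 5]  # hypothetical
y = [2, 4, 5, 4, 5]  # hypothetical
slope, intercept = fit_line(x, y)
predicted_at_6 = intercept + slope * 6  # estimate y for a new value of x

if __name__ == "__main__":
    print(round(slope, 2), round(intercept, 2), round(predicted_at_6, 2))
```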
THE NULL HYPOTHESIS
It is used in tests of significance.
In the null hypothesis it is assumed that there is no
significant difference between the given variables, or
that any difference is due to chance only.
This null hypothesis is rejected or not rejected
with the use of various tests of significance, eg. z-test,
t-test etc.
An alternative hypothesis is one which assumes
the opposite of the null hypothesis.
STUDY DESIGN
OBSERVATIONAL STUDIES
1. Descriptive studies-
Description in time, place and person.
2. Analytical studies-
A. Ecological or correlational studies.
B. Cross sectional or prevalence studies.
C. Case control or case reference studies.
D. Cohort or follow up studies
EXPERIMENTAL STUDIES
1. Randomized Controlled trial.
Patients are the unit of study.
2. Field trials.
Healthy subjects are the unit of study.
3. Community trials.
The community is the unit of study.
OTHER SPECIFIC STUDIES
1. Uncontrolled trials.
2. Natural experiments.
3. Before and after comparison studies.
4. Efficacy and effectiveness studies.
SAMPLING
CONCEPT OF SAMPLING
When a large number of individuals are to be
studied, it is desirable to take a sample.
It is easier and more economical to study the
sample than the whole population.
It is important to make sure that the individuals
in the sample are representative of the
study population.
# a sampling frame is the listing of the members
of the population from which the sample is to
be drawn.
METHODS OF SAMPLING
SIMPLE RANDOM SAMPLING
LOTTERY METHOD
TABLE OF RANDOM NUMBER METHOD
SYSTEMATIC SAMPLING
STRATIFIED SAMPLING
MULTISTAGE SAMPLING
CLUSTER SAMPLING
MULTIPHASE SAMPLING
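Simple random and systematic sampling can be sketched with Python's `random` module. The sampling frame of 100 IDs is hypothetical, and the helper names are illustrative. In systematic sampling the interval is k = N / n, a random start is chosen within the first interval, and every kth member is taken; `seed` only makes the draw reproducible.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; every member has an equal chance (lottery method)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Random start within the first interval, then every kth member of the frame."""
    k = len(frame) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)     # random start between 0 and k - 1
    return frame[start::k][:n]

# Hypothetical sampling frame: a listing of 100 population members
frame = [f"ID{i:03d}" for i in range(1, 101)]
srs = simple_random_sample(frame, 10, seed=1)
sys_s = systematic_sample(frame, 10, seed=1)

if __name__ == "__main__":
    print(srs)
    print(sys_s)
```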
TESTS OF SIGNIFICANCE
BASIC ELEMENTS OF TESTING A
HYPOTHESIS
1. P- value
2. Power of a test
3. Degree of freedom
4. Type-1 error
5. Type-2 error
TESTS OF “SIGNIFICANCE”
1. Parametric tests- for normally distributed data.
Eg. Z-test
Eg. T-test
Eg. Chi square test
Eg. F-test/ANOVA/ANCOVA
2. Non-parametric tests- for data that are not normally distributed.
Eg. One sample sign test.- for hypotheses concerning a single
value for a given sample.
Eg. Fisher-Irwin test.- for hypotheses concerning NO DIFFERENCE
among two or more sets of variables.
Eg. Kendall’s coefficient of concordance.- for hypotheses
concerning relationships between variables.
Eg. Kruskal-Wallis test.- for hypotheses concerning variations in a
given sample.
Eg. One sample RUNS test.- for hypotheses concerning randomness.
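As an example of a parametric test from the list above, this sketch computes a two-sample (Student's) t statistic with pooled variance, which assumes both groups come from normal populations with equal variances. The scores are hypothetical and `pooled_t` is an illustrative helper. The result is compared with a t-table value for the degrees of freedom (about 2.306 for 8 df at the 5% two-tailed level), so here the null hypothesis would not be rejected.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference in means
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, na + nb - 2

treated = [5, 6, 7, 8, 9]  # hypothetical scores
control = [3, 4, 5, 6, 7]  # hypothetical scores
t, df = pooled_t(treated, control)

if __name__ == "__main__":
    print(round(t, 3), df)
```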
CHI-SQUARE TEST
It is a test with qualities of both parametric and non-parametric
testing.
It tests the significance of the difference between
two proportions.
It is done in the following steps-
1. Observed (O) and expected (E) values are obtained from the
proportions.
2. The value of chi-square is obtained from O and E as chi-square = Σ (O − E)² / E.
3. The obtained value is compared against the value in the
probability table for the given degrees of freedom.
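The three steps above, applied to a 2×2 table, in a short Python sketch. The counts are hypothetical and no continuity (Yates) correction is applied. Expected values come from row total × column total / grand total; a 2×2 table has (2 − 1)(2 − 1) = 1 degree of freedom, for which the 5% table value is 3.841.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square = sum of (O - E)^2 / E over the four cells of a 2x2 table."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # step 1: expected value
            chi2 += (observed[i][j] - expected) ** 2 / expected  # step 2
    return chi2

# Hypothetical: outcome present/absent in two groups of 100
chi2 = chi_square_2x2(30, 70, 10, 90)
significant = chi2 > 3.841  # step 3: compare with table value for 1 df at p = 0.05

if __name__ == "__main__":
    print(chi2, significant)
```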
SPECIFIC STATISTICAL CONCEPTS
REGRESSION MODEL
GENERAL LINEAR MODEL
LOG LINEAR MODEL
LONGITUDINAL MODEL
PATH ANALYSIS
MULTIVARIATE ANALYSIS
META- ANALYSIS
COMPUTERS IN BIOSTATISTICS
APPLICATIONS OF COMPUTERS
For methods of biostatistics.
Public health management
Hospital management
Research management
SOFTWARE & BIOSTATISTICS
Statistical software application programs contain
a collection of statistical methods that help
provide solutions to problems in the health sciences
and other fields.
These programs allow users who are relatively
unskilled in statistics to access a wide variety of
statistical methods for their data sets of interest.
The “SPSS”
STATISTICAL PACKAGE FOR SOCIAL
SCIENCES.
This software offers computation for-
Frequency distribution
Descriptive measures
Confidence interval
T- test
Chi square test
ANOVA
Regression analysis.
THANK YOU
&
HAPPY SCHIZOPHRENIA DAY