DEPARTMENT OF BOTANY
METHODOLOGY AND PERSPECTIVE IN PLANT SCIENCE
4th SEMESTER BOTANY
Module 1: BIOSTATISTICS
Biostatistics can be defined as the application of the mathematical tools used in statistics to the
fields of biological sciences and medicine.
Biostatistics is the discipline concerned with the design and analysis of data from biomedical
studies.
It comprises a set of principles and methods for generating and using quantitative evidence to
address scientific questions, for estimating unknown quantities and for quantifying the
uncertainty in our estimates.
Methods of data collection
1. complete enumeration survey
It is the method of data collection in which all the units of study are visited for data
collection.
The units of study together constitute the study population.
The method of collecting data from all the units of study is called the census method or
complete enumeration. This is a highly reliable method, but it is time-consuming and
expensive.
2. Sampling
Studying a whole population on the basis of the study of samples drawn from it.
A sample is a representative subset of the whole population. It represents the whole
population in respect of the specific characteristics under investigation.
Study of the sample gives information about the whole population; this is called
statistical inference.
Sampling involves three principal steps
1. Selection of sample.
2. Collection of information about them.
3. Making inference about whole population.
The theory of sampling is based on certain principles derived from the theory of
probability
1. Principle of statistical regularity
It holds that a moderately large number of items chosen at random from
a large group are almost sure to possess the major characteristics of the
group as a whole.
It maintains that if samples are taken at random from a population, they
are likely to possess the characteristics of the population in general.
2. Principle of inertia of large numbers.
It is the concept that the larger the size of the sample, the more accurate
the results are likely to be. For example, if a coin is tossed only a few times the
proportion of heads may differ widely from one half, but over a very large number of
tosses heads and tails tend to appear in nearly equal numbers.
a. random sampling
I. simple or unrestricted random sampling
Random sampling in which all items of the population get an equal
chance of being included in the sample.
The selection is free from personal bias.
For random selection we may use
1. Lottery method (lot method)
2. Table of random numbers
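The lottery method can be imitated with a pseudo-random number generator. A minimal Python sketch (not part of the original notes; the population of 100 numbered units is hypothetical):

```python
# Simple (unrestricted) random sampling: every unit has an equal chance of selection.
import random

population = list(range(1, 101))          # hypothetical population of 100 numbered units
sample = random.sample(population, k=10)  # draw 10 units without replacement, like a lottery
print(sorted(sample))
```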
II. restricted random sampling
Random sampling, in which certain restrictions are imposed while
sampling.
1) stratified sampling
Here the whole population is divided into homogeneous groups called
strata.
A specific number of random samples is drawn from each stratum, and
finally all these selected samples are pooled together.
This is a method of random sampling with greater accuracy and greater
geographical concentration.
This method should be done under the complete supervision of a skilled
person.
2) systematic sampling/ quasi random sampling
Random sampling technique in which the population is arranged in
order, the first item is selected at random and further items are
selected at specific intervals.
It is a simple, effective and convenient method, and the time and effort
involved are relatively small.
The main limitation is that it becomes less representative if the
population has hidden periodicities.
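A minimal Python sketch of systematic selection (illustrative only; the population and sampling interval are hypothetical):

```python
# Systematic (quasi-random) sampling: random start, then every k-th unit.
import random

population = list(range(1, 101))   # hypothetical ordered population of 100 units
k = 10                             # sampling interval = population size / desired sample size
start = random.randint(0, k - 1)   # first item chosen at random
sample = population[start::k]      # further items selected at the fixed interval k
print(sample)
```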
3) multistage or cluster sampling
Sampling procedure carried out in several stages.
The population is divided into several groups, called clusters, and a
desired sample is selected from them to represent the whole
population.
This system introduces flexibility into the sampling method and
enables existing divisions and subdivisions of the population to be
used as units at various stages.
It enables intensive field work and coverage of large areas.
b. non-random sampling
I. judgment sampling
This is the method of sampling in which the choice of sample items
depends exclusively on the discretion of the investigator.
The investigator uses his judgement in the choice and includes in the
sample those items which he thinks are most typical of the universe with
regard to the characteristic under investigation.
II. convenience sampling
In this method, each unit is selected only for convenience. A unit
selected in this way is called a chunk.
A chunk refers to a fraction of population, selected only because of
convenience.
The results obtained following convenience sampling can hardly be
representative of the population. They are generally biased and
unsatisfactory.
III. quota sampling
This is a type of judgement sampling.
Quotas are set up according to some specific characteristics, such as so
many in each of several flower colour groups.
3. Captions or column headings
4. Stubs or row designations
5. Body
6. Foot notes
7. Sources of data
Types of tables
1. Time series table
It carries data based on the chronology of time.
2. Frequency table
A frequency table is constructed by arranging collected data values in ascending order of
magnitude with their corresponding frequencies.
a. Discrete frequency tables- variables have a definite gap between two values.
b. Continuous frequency tables- variables can take all possible intermediate values.
Diagrammatic and graphic representation of data
Collected data can be represented in diagrams and graphs.
Diagrams are of different types:
1. One dimensional diagrams
Bar-diagrams
a. Simple bar diagram
c. Multiple bar diagrams
f. Broken bars
2. Two dimensional diagrams
a. Rectangles
The area of a rectangle is equal to the product of its length and width.
b. Squares
The method of drawing a square diagram is very simple. One takes the square root
of the values of the various items that are to be shown in the diagram and then
selects a suitable scale to draw the squares.
c. Circles
The circles are to be compared on area basis rather than on diameter basis.
d. Pie diagram
3. Three dimensional diagrams
These are also known as volume diagrams and consist of cubes, cylinders, blocks etc.
Here length, width and height have to be taken into account.
4. Pictograms and cartograms
Pictogram
These are pictures used in representing statistical data.
They depict the data with pictures.
Pictures are attractive and easy to comprehend, and as such this method is particularly
useful in presenting statistics to laymen.
Graphs
Graphs are broadly classified into two
1. Graphs of time series or line graphs
Simplest form of graphical representation.
Graphs of time series can be constructed either on a natural scale or on a
ratio scale.
Here the values of a variable are plotted at different points of time; the series
so formed is known as a time series.
2. Graphs of frequency distribution
Here frequencies are distributed over the different classes.
The frequency distribution can be presented graphically in three ways
a. Histogram
A histogram is a set of bars whose areas are proportionate to the
frequencies represented.
b. Frequency polygon
It is the graph of frequency distribution. It is particularly effective in
comparing two or more frequency distributions.
It has a special advantage over the histogram: the frequency polygons of
several distributions can be plotted on the same axes, thereby making
certain comparisons possible. Histograms are preferable when classes
are numerous.
c. Smoothed frequency curve
It can be drawn through the various points of the polygon.
The curve is drawn freehand in such a manner that the area included
under the curve is approximately the same as that of the polygon.
OBSERVATION
Scientific observation leads an investigation or any research to success.
A keen observation of the study area or field gives correct or accurate results.
Observation can be classified under 4 major categories
1. Direct and indirect observation
This is the observation of an event personally by the observer. This method is very much
flexible and it allows the observer to see and record the necessary aspects of an event or
behaviour.
Indirect observation does not involve the physical presence of the observer; the
recording of observations is done by mechanical, photographic or electronic devices.
2. Controlled and uncontrolled observation
In a controlled study, researchers are able to determine which of their subjects receive the
factor that is being tested for having a causal influence upon another factor. This type of
observation may be carried out in a laboratory-type situation and, because variables are
manipulated, is said to be high in control.
In uncontrolled or observational studies, researchers have no such control over whether their
subjects receive the treatment being investigated.
3. Structured and unstructured observation
Structured observations are planned and systematic observations which record information or
data in a standard way, following certain designs, patterns and rules.
Unstructured observation is the opposite of structured observation. It is casual, unplanned and
non-systematic observation, without following any rules or directions.
Module 2: BIOSTATISTICS
ANALYSIS OF NUMERICAL DATA
Quantitative data collected from experimental populations are analyzed with the help of
different statistical tools so as to reach inferences on the different aspects under study.
The techniques are listed below
1. Study of central tendencies
2. Study of dispersion
3. Correlation studies
4. Regression studies etc.
1. Analysis of central tendencies
a. Arithmetic mean
Mean is the sum of all the results included in the sample divided by the
number of observations.
It is quick and easy to calculate, but it may not be representative of the whole sample.
Arithmetic mean can be obtained using different formulae:
1. Individual observations (unorganized data)
Arithmetic mean, X̄ = ∑X / N
X = value of each observation in the data
N = number of observations
2. Discrete frequency table
Arithmetic mean, X̄ = ∑fX / N
f = frequency of the corresponding X in the frequency table
N = ∑f
3. Continuous frequency table
Arithmetic mean, X̄ = ∑fm / N
m = mid value of each frequency class of the data
f = frequency of the corresponding class in the frequency table
N = ∑f
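A minimal Python sketch of these mean formulae (the data values are hypothetical):

```python
# Arithmetic mean for raw data and for a discrete frequency table.
raw = [12, 15, 11, 14, 15, 13]
mean_raw = sum(raw) / len(raw)                  # X-bar = sum(X) / N

values = [10, 20, 30]                           # X values
freqs = [3, 5, 2]                               # corresponding frequencies f
N = sum(freqs)                                  # N = sum(f)
mean_freq = sum(f * x for x, f in zip(values, freqs)) / N   # X-bar = sum(fX) / N

print(mean_raw, mean_freq)
```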
b. Median
Middle value of all the numbers in the sample.
It is the value that divides the data set so that 50% of the observations lie
above it and 50% lie below it.
Median can be obtained using different formulae:
1. Individual observations (unorganized data)
Median = size of the (N + 1)/2 th term
N = number of observations
2. Discrete frequency table
Median = size of the (N + 1)/2 th term, located using cumulative frequencies
N = number of observations
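A minimal Python sketch of the median of ungrouped data (hypothetical values):

```python
# Median: the (N + 1)/2 th term of the ordered data; for even N, the mean of the two middle terms.
def median(data):
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2:
        return xs[mid]                      # odd N
    return (xs[mid - 1] + xs[mid]) / 2      # even N

print(median([7, 3, 9, 5, 11]))             # -> 7
```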
c. Mode
It is the most frequently observed value of the measurements in the sample.
There can be more than one mode or no mode.
Mode can be obtained using different formulae:
1. Individual observations (unorganized data)
The observation or value occurring the highest number of times in the
distribution is the mode.
2. Discrete frequency table
The class with the highest frequency is the mode class. The value of the
variable corresponding to the highest frequency is the mode.
3. Continuous frequency table
Mode = L + [Δ1 / (Δ1 + Δ2)] × h
L = lower limit of the modal class
Δ1 = the difference between the frequency of the modal class and the
frequency of the premodal class
Δ2 = the difference between the frequency of the modal class and the
frequency of the postmodal class
h = width of the modal class (class interval)
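A minimal Python sketch of both mode calculations (all numbers are hypothetical):

```python
# Mode of ungrouped data, and mode of a continuous frequency table:
# Mode = L + d1 / (d1 + d2) * h
from collections import Counter

data = [4, 7, 7, 5, 7, 4]
mode_ungrouped = Counter(data).most_common(1)[0][0]    # value with the highest frequency -> 7

L, f_modal, f_pre, f_post, h = 20, 25, 15, 10, 10      # hypothetical modal class 20-30
d1 = f_modal - f_pre                                   # modal minus premodal frequency
d2 = f_modal - f_post                                  # modal minus postmodal frequency
mode_grouped = L + d1 / (d1 + d2) * h                  # -> 24.0
print(mode_ungrouped, mode_grouped)
```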
2. Measures of dispersion
Measures of dispersion or variation are statistical measures of the scatter of the
values of the variable under study from the central values.
The major measures of dispersion used in statistical analysis are listed below.
a. Range
It is the simplest method of studying dispersion of the values of the
variable under study.
It is the difference between the value of the smallest item and the value
of the largest item of the distribution.
It is based on the extreme items of the distribution only.
Range R= L-S, L=largest value of the variable. S= smallest value of the
variable.
b. Variance
It is the mean of the squared deviations of the individual values from
the mean of the distribution.
Variance can be obtained using different formulae:
1. Individual observations (unorganized data)
Variance = ∑d² / N
d = X − X̄
2. Discrete frequency table
Variance = ∑fd² / N
d = X − X̄, f = frequency, N = ∑f
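A minimal Python sketch of the variance formulae above (population form, dividing by N; the data are hypothetical):

```python
# Variance = sum(d^2) / N, with d = X - X-bar.
def variance(data):
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_freq(values, freqs):
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

print(variance([2, 4, 6, 8]))                  # -> 5.0
print(variance_freq([10, 20, 30], [3, 5, 2]))
```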
c. Quartile deviation
It gives the average quantity by which the two quartile differs from the
median.
In a symmetrical distribution, 2 quartiles Q1and Q3 are equidistant from
the median.
When quartile deviation is very small, it describes high uniformity or
small variation of the central 50% items and a high quartile deviation
means that the variation among the central items is large.
An interquartile range is a measure developed for this purpose. This is
the range which includes the middle 50% of the distribution.
Interquartile range = Q3- Q1
Q1= N/4 th term of the distribution.
Q3=3N/4 th term of the distribution.
Quartile deviation is an absolute measure of dispersion: Quartile deviation = (Q3 − Q1) / 2.
The relative measure corresponding to this measure is called the coefficient of
quartile deviation.
Coefficient of quartile deviation = (Q3 − Q1) / (Q3 + Q1)
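A minimal Python sketch of these quartile measures (hypothetical data; note that different textbooks and libraries locate Q1 and Q3 by slightly different rules):

```python
import statistics

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
q1, _, q3 = statistics.quantiles(sorted(data), n=4)  # Q1 and Q3 (library convention)
iqr = q3 - q1                                        # interquartile range = Q3 - Q1
qd = iqr / 2                                         # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)                     # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(coeff_qd, 3))
```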
d. Mean deviation
It is the average difference between the items in a distribution and the
median or mean of that series.
Mean deviation can be obtained using different formulae:
1. Individual observations (unorganized data)
MD = ∑|xi − a| / N
a = the average (mean or median) about which M.D. is calculated
N = total number of observations
xi = observations
2. Discrete frequency table
MD = ∑f|xi − a| / N
f = frequency
a = the average about which M.D. is calculated
N = total number of observations
xi = observations
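A minimal Python sketch of the mean deviation (here taken about the mean; the average a could equally be the median):

```python
# MD = sum(|xi - a|) / N
def mean_deviation(data, a=None):
    if a is None:
        a = sum(data) / len(data)          # default: deviations about the arithmetic mean
    return sum(abs(x - a) for x in data) / len(data)

print(mean_deviation([2, 4, 6, 8]))        # -> 2.0
```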
e. Standard deviation
Known as the root-mean square deviation.
It is the square root of the means of the squared deviations from
arithmetic mean.
Standard deviation measures the absolute dispersion or variability of a
distribution. The greater the amount of dispersion, the greater the SD, for
greater will be the magnitude of the deviations of the values from their mean.
A small SD means a high degree of uniformity of the observations.
Among two or more comparable series with identical or nearly identical means,
the distribution with the smallest SD has the most representative mean.
Standard deviation can be obtained by
1. Individual observations (unorganized data)
S.D., σ = √( ∑(X − X̄)² / N )
2. Discrete frequency table
S.D., σ = √( ∑fi(Xi − X̄)² / N )
fi = frequency of each observation
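A minimal Python sketch of the standard deviation formulae above (population form, dividing by N; the data are hypothetical):

```python
import math

# S.D. = sqrt(sum((X - X-bar)^2) / N) for raw data,
# S.D. = sqrt(sum(fi * (Xi - X-bar)^2) / N) for a frequency table.
def sd(data):
    n = len(data)
    m = sum(data) / n
    return math.sqrt(sum((x - m) ** 2 for x in data) / n)

def sd_freq(values, freqs):
    n = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / n
    return math.sqrt(sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / n)

print(sd([2, 4, 6, 8]))                    # -> ~2.236
```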
f. Coefficient of variation
It is the relative measure of SD.
Used to compare the variability of two or more series.
The series with the greater coefficient of variation is more variable.
Coefficient of variation = (S.D. / mean) × 100
Coefficient of S.D. = S.D. / mean
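A minimal Python sketch comparing the variability of two hypothetical series with the coefficient of variation:

```python
import statistics

# CV = (S.D. / mean) * 100; the series with the greater CV is the more variable one.
series_a = [12, 14, 13, 15, 11]
series_b = [120, 150, 110, 160, 130]
for s in (series_a, series_b):
    cv = statistics.pstdev(s) / statistics.mean(s) * 100   # pstdev = population S.D.
    print(round(cv, 1))
```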
3. Correlation analysis
Statistical correlation is the condition in which two variables are intimately
interrelated so that a change in the value of one variable may cause a
corresponding change in the value of the other.
Correlation analysis helps to determine the degree of linear relationships between
variables.
Types of correlation
1. Positive and negative correlation
2. Linear and non-linear correlations
3. Simple, multiple and partial correlation
Coefficient of correlation is a measure of the degree to which there is a linear
relationship between two variables.
Coefficient of correlation indicates how far a change in one variable is related to
the change in another.
Coefficient of correlation varies from −1 to +1. If it is −1, the correlation is perfect
negative; if it is +1, the correlation is perfect positive. If it is zero, there
is no correlation.
Coefficient of correlation, r = ∑xy / √( ∑x² × ∑y² )
x = X − X̄, y = Y − Ȳ
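A minimal Python sketch of Pearson's coefficient of correlation using the formula above (the paired data are hypothetical):

```python
import math

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
mx, my = sum(X) / len(X), sum(Y) / len(Y)
x = [xi - mx for xi in X]                  # deviations from X-bar
y = [yi - my for yi in Y]                  # deviations from Y-bar
r = sum(a * b for a, b in zip(x, y)) / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
print(round(r, 3))                         # a value between -1 and +1
```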
4. Regression analysis
Regression is the statistical method or tool which helps to estimate the unknown
values of one variable from the known values of another related variable.
The measurement of the probable form of relationship between two related
variables is known as regression analysis.
It helps to establish nature of the linear relationship between variables.
Regression equations
There are two regression lines of two variables X and Y.
There are two regression equations.
1. Equation of X on Y
Used to describe the variation in the value of X for given changes in Y.
X − X̄ = bxy (Y − Ȳ)
bxy = ∑xy / ∑y²
2. Equation of Y on X
Used to describe the variation in the values of Y for given changes in X.
Y − Ȳ = byx (X − X̄)
byx = ∑xy / ∑x²
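A minimal Python sketch of both regression equations, continuing the hypothetical paired data used for correlation above:

```python
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
mx, my = sum(X) / len(X), sum(Y) / len(Y)
x = [xi - mx for xi in X]
y = [yi - my for yi in Y]
sxy = sum(a * b for a, b in zip(x, y))
byx = sxy / sum(a * a for a in x)          # regression coefficient of Y on X
bxy = sxy / sum(b * b for b in y)          # regression coefficient of X on Y

def estimate_y(x_val):                     # Y - Y-bar = byx (X - X-bar)
    return my + byx * (x_val - mx)

def estimate_x(y_val):                     # X - X-bar = bxy (Y - Y-bar)
    return mx + bxy * (y_val - my)

print(estimate_y(6), estimate_x(6))
```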
1. Binomial distribution
The total occurrence of the different outcomes will be expressed as
(p + q)^n = p^n + np^(n-1)q + …… + q^n
The number of terms in a binomial expansion is always n + 1.
The exponents of p and q, for any single term, when added together, always sum to n.
The exponents of p are n, n−1, n−2, ……, 1, 0, respectively, and the exponents of q are
0, 1, 2, ……, n−1, n, respectively.
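A minimal Python sketch of the binomial terms, reading each term of (p + q)^n as the probability of k successes in n trials (the values of n and p are hypothetical):

```python
from math import comb

n, p = 5, 0.5
q = 1 - p
terms = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]   # n + 1 terms
print(terms, sum(terms))   # the terms sum to (p + q)^n = 1
```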
2. Poisson distribution
Discrete probability distribution and widely used in statistical work.
Limiting form of the binomial distribution as n becomes infinitely large and p
approaches zero.
The distribution is used to describe the behaviour of rare events and has been called
the law of improbable events.
Formula for Poisson probability:
P(x) = ( e^(−m) × m^x ) / x!
m = mean number of occurrences, x = number of occurrences (0, 1, 2, ……)
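A minimal Python sketch of the Poisson formula (the mean m = 2 is hypothetical):

```python
from math import exp, factorial

# P(x) = e^(-m) * m^x / x!
def poisson(x, m):
    return exp(-m) * m ** x / factorial(x)

print([round(poisson(x, 2), 4) for x in range(6)])   # P(0) ... P(5) for m = 2
```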
3. Normal distribution
RESEARCH METHODOLOGY
Chapter 1: Scientific methods
Scientific method is the systematic procedure and technique used in scientific inquiries and
investigations for explaining natural observations, acquiring new knowledge, correcting or
integrating previous knowledge, formulating generalizations and making predictions.
Major steps in scientific methods
1. Observation
Should be correct and repeatable.
Observation can be direct, using our sense organs or indirect with the help of
instruments.
2. Defining a problem
A problem should be defined in such a way that the different aspects of the observed
phenomena are amenable to investigation.
Questions about the what, why and how of the observation have to be asked and subjected
to scientific analysis.
3. Collection of information
Helps to know more about previous works and to avoid repetition.
Investigator collects and makes use of all the available information and data about the
problem.
4. Framing a hypothesis
Hypothesis is a supposition, assumption or provisional explanation which has to be
proved or disproved in the light of accepted facts.
It will provide a logical answer to a question and also helps to predict new facts.
5. Testing or experimenting
The correctness, validity or acceptability of a hypothesis has to be tested with the
help of a controlled experiment.
6. Theorizing
A theory should be formulated on the basis of experimental evidence or on the
basis of the analysis of interrelated facts.
A hypothesis becomes a theory when it is supported by a large body of observations
and experimental evidence.
3. Constructive method
It involves the deductive construction of scientific theories in mathematics and
logic.
The constructive method is applied solely to formal sciences, such as
mathematics, statistics and logic.
4. Hypothetico-deductive method
In this method, the initial hypothesis is evaluated by a complex, step-by-step
procedure leading either to its substantiation and acceptance or to its
rejection.
Hypothesis
Hypothesis is a testable tentative or provisional generalization based on previous
knowledge and it forms the basis for reasoning.
Hypothesis serves the following purposes.
1. Offers adequate explanation of related facts.
2. Helps in the formulation of laws, explaining facts.
3. Helps in the selection of the appropriate method of testing and verification.
4. Channels scientific inquiries in the right direction, suggesting experiments and
observations, and helping the collection of evidence.
5. Helps in drawing conclusions to expand the horizon of knowledge.
Features of good or valid hypothesis
1. It provides an answer to the problem of inquiry.
2. It must be straightforward, definite and unambiguous.
3. Hypothesis must be empirically testable or verifiable so that it can be ultimately
accepted, rejected or revised.
4. It must be simple with very high predictability.
5. Its explanation must be true to the existing state of knowledge.
Types of hypothesis
1. Crude and refined hypothesis
A crude hypothesis indicates the kind of data to be collected but does not lead to the
formulation of theories.
A refined hypothesis leads to the formulation of theories.
Refined hypotheses are classified into three:
a. Simple- does not involve much testing and verification.
b. Ideal- one which examines the causes and relations of natural phenomena.
c. Complex hypothesis- concerned with the interrelations of multiple variables
2. Null hypothesis
This is the hypothesis that is nullified by negative evidence from testing.
A null hypothesis is put forward in the belief that it might be true and could be used in
collecting data.
It is used in sampling theory and in providing additional support to an accepted hypothesis.
Formulation of hypothesis
Scientific hypothesis originate from different thought processes, such as analogy,
induction, deduction and intuition.
Verification and proof of hypothesis
Verification of a hypothesis means the testing of the truth of the hypothesis in the light of
facts. Verification can be direct or indirect.
Direct verification is through simple observation or experiment. In indirect verification,
the consequence deduced from the hypothesis is compared to actual facts.
In order to prove a hypothesis, it is essential first to verify it.
Impact factor
The impact factor is a scientometric index that reflects the yearly average number of
citations of the papers published in an academic journal. Journals with higher impact
factors are considered important.
For any given year, the impact factor is calculated as the ratio between the number of
citations received in that year by publications that appeared in the journal during the
two preceding years and the total number of citable items published in that journal
during those two years.
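A worked example with made-up numbers: suppose a journal published 60 citable items in 2021 and 40 in 2022 (100 in total), and in 2023 those items received 250 citations; its 2023 impact factor would then be 250 / 100 = 2.5.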
Sources of references
There are different sources which give a detailed picture of the references or
previous works already carried out in research areas related to a particular
research project.
Indexing and abstracting services are available in different disciplines to get the
abstract of papers related to any project or field of study.
Different reference sources are used in different disciplines of study:
1. Citation indexes
2. Google scholar
3. INFLIBNET
4. Shodhganga
5. e-PG Pathasala
6. ePathasala
7. NCBI
In modern science, different software companies have developed novel types of software
for effective and comprehensive presentation of information and very efficient
communication with the viewers.
In most cases slides are prepared using software like Microsoft PowerPoint, OpenOffice
Impress, Apple Keynote and Corel Presentations, and projected using LCD, LED or DLP
projectors.
The major presentation software packages are
1. Microsoft PowerPoint
2. OpenOffice Impress
3. Apple Keynote
4. Corel Presentations