Statistics

PRESENTED BY
SHIVASHANKAR K, PG 1ST YEAR
STATISTICAL TESTS USED
IN DENTISTRY

CONTENTS
• Introduction
• Descriptive Statistics
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Position and Outliers
• Inferential statistics
• Normal Distribution
• Parametric tests
• Non-parametric or Distribution free statistical tests
• Correlation and regression
• References

INTRODUCTION
Research can generally be categorised into either quantitative or
qualitative studies.
Three general domains of what statistics can do for scientific
investigations:
• Differences between groups
• Associations between groups
• Time-to-event data

DESCRIPTIVE STATISTICS
Measures of Central Tendency
1. Mean: The mean of a data set is the sum of the observations divided by
the number of observations.
2. Median: The median is the middle value of the data when the data have
been arranged in ascending/descending order. With an even number of
observations, it is the average of the two middle values.
3. Mode: The most frequently occurring data value in a set of data is
called the mode.
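The three measures can be sketched with Python's standard `statistics` module. The scores below are hypothetical values made up for illustration, not data from the presentation.

```python
import statistics

# Hypothetical scores for nine patients (illustrative values only).
scores = [2, 3, 3, 3, 4, 5, 6, 7, 12]

mean_score = statistics.mean(scores)      # sum of observations / number of observations
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)  # 5 4 3
```

Note how the single large value (12) pulls the mean above the median, while the mode reflects only the most common score.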

Measures of Dispersion
Range = Largest Value – Smallest Value
The range is based on only two of the items in the data set and
thus is influenced too much by extreme values.
Variance: Average Squared Deviation from the Mean
Standard Deviation = √(Variance)
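A minimal sketch of these dispersion measures, using hypothetical values. It assumes the "average squared deviation" definition above means dividing by n (the population variance); the sample variance, which divides by n - 1, is shown for comparison.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)          # largest value - smallest value
population_var = statistics.pvariance(values)   # squared deviations averaged over n
sample_var = statistics.variance(values)        # squared deviations divided by n - 1
population_sd = math.sqrt(population_var)       # standard deviation = sqrt(variance)

print(data_range, population_var, population_sd)  # range 7, variance 4, SD 2.0
```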

Measures of Outliers
Sometimes a set of data has one or more items with unusually large or
unusually small values. Extreme values such as these are called Outliers.
Z score
The Z-score of a data item is its standardized value: Z = (value - mean) / standard deviation.
It can be interpreted as a measure of the relative location of an item in the data.
A value with a Z-score > 3 or a Z-score < -3 is treated as an outlier.
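The Z-score rule can be sketched as follows. The probing depths are hypothetical values chosen so that one reading is clearly extreme, and the sample standard deviation is an assumed choice (the slides do not say which one to use).

```python
import statistics

# Hypothetical probing depths in mm (illustrative values only).
depths = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 12]

mean_depth = statistics.mean(depths)
sd_depth = statistics.stdev(depths)  # sample standard deviation (assumption)

# Standardize every value, then flag those more than 3 SDs from the mean.
z_scores = [(x - mean_depth) / sd_depth for x in depths]
outliers = [x for x, z in zip(depths, z_scores) if abs(z) > 3]

print(outliers)  # [12]
```

With very small samples no value can reach |Z| > 3, so this rule is only informative when there are enough observations.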

Measure of Position
Percentile: A percentile is a numerical measure that also locates values of
interest in the data set. A percentile provides information regarding how the
data items are spread over the interval from the lowest value to the highest
value.
The pth percentile of a data set is a value such that at least p percent of the
items take on this value or less and at least (100 – p) percent of the items take
on this value or more.

Quartiles: It is often desired to divide a data set into four parts with each part
containing one-fourth of the data.
Q1 = First Quartile = 25th percentile
Q2 = Second Quartile = 50th percentile
Q3 = Third Quartile = 75th percentile


The Five Number Summary and Box plots
The Interquartile Range (IQR): IQR = Q3 - Q1
Note: The IQR gives the range of the middle 50% of the observations.
The Five-Number Summary
The five number summary of a data set: Min, Q1, Q2, Q3, and Max.
Example 1: Find the five-number summary.
The data is 4, 5, 6, 8, 9, 10, 12, 14, 15, 15, 15, 16, 17, 18, 20, 26
Min = 4, Q1 = 8.5, Q2 = 14.5, Q3 = 16.5, and Max = 26.
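Example 1 can be reproduced with a short sketch. Quartile conventions differ between textbooks and software; the helper below (a hypothetical name) assumes the "median of each half" convention, which is the one that matches the values in the example.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                      # values below the median position
    upper = data[half:] if n % 2 == 0 else data[half + 1:]   # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [4, 5, 6, 8, 9, 10, 12, 14, 15, 15, 15, 16, 17, 18, 20, 26]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]

print(summary)  # (4, 8.5, 14.5, 16.5, 26)
print(iqr)      # 8.0
```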

Boxplot: Is Built to Detect Outliers
1. Find Q1, Q2, Q3, and IQR.
2. Compute Lower Fence and Upper Fence:
Lower Inner Fence = Q1 - 1.5(IQR), Upper Inner Fence = Q3 + 1.5(IQR)
Lower Outer Fence = Q1 - 3(IQR), Upper Outer Fence = Q3 + 3(IQR)
3. Draw the box plot indicating the Lower and Upper fences.
4. Determine whether there are any outliers.
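The fence calculation in steps 1, 2, and 4 can be sketched for the data of Example 1, taking Q1 = 8.5 and Q3 = 16.5 as given there. The "mild" and "extreme" labels are a common convention and an assumption here; the slides only speak of outliers.

```python
def fences(q1, q3):
    """Compute the inner and outer fences from the first and third quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }

data = [4, 5, 6, 8, 9, 10, 12, 14, 15, 15, 15, 16, 17, 18, 20, 26]
f = fences(8.5, 16.5)

# Values beyond the outer fences, and values between inner and outer fences.
extreme = [x for x in data if x < f["lower_outer"] or x > f["upper_outer"]]
mild = [x for x in data
        if x not in extreme and (x < f["lower_inner"] or x > f["upper_inner"])]

print(f)              # inner fences -3.5 and 28.5, outer fences -15.5 and 40.5
print(mild, extreme)  # [] []
```

For this data set every value lies inside the inner fences, so no observation is flagged.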

INFERENTIAL STATISTICS
Inferential statistics consist of statistical methods that are
used to test hypotheses that relate to relationships between
variables.

Null hypothesis:
Fundamental to statistical analysis is the concept of the null hypothesis.
Evolving from both inductive and deductive reasoning, the null
hypothesis was developed because it is easier to disprove than to
prove a hypothesis.

One and Two-Tailed tests
Most tests of significance are two-tailed, meaning that the null
hypothesis can be rejected irrespective of the direction of the effect.
A one-tailed test of significance is used when the researcher is sure
that differences can occur only in one direction.

Normal Distribution
• Family of frequency distributions that have same general shape
• Symmetrical
• More scores in the middle than in the tails
• Bell-shaped curve
• Height and spread of a normal distribution can be specified
mathematically in terms of Mean and Standard deviation

Properties
• Mean, median, and mode are equal (in sample data, very similar)
• Curve is symmetrical on either side of mean value
• Probability of any value being above, or below, the mean value is 0.5
• Any individual value is more likely to be closer to the central tendency than the
extremes of the curve
• There is a constant relationship between the standard deviation and probability
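The constant relationship between standard deviation and probability can be illustrated with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for a standard normal curve; the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

p_below_mean = standard_normal.cdf(0)  # probability of a value below the mean

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # about 0.6827, 0.9545, 0.9973
```

The same proportions hold for any mean and standard deviation, which is why a normal distribution can be specified by those two numbers alone.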

Procedure for Performing an Inferential Test
1. Start with a theory
2. Make a research hypothesis
3. Operationalize the variables
4. Identify the population to which the study results should apply (for example, children
in developed nations)
5. Form a null hypothesis for this population
6. Collect a sample from the population (here, children) and run the study
7. Perform statistical tests to see if the obtained sample characteristics are sufficiently
different from what would be expected under the null hypothesis to be able to reject the
null hypothesis.

Parametric Versus Non-Parametric Methods
Parametric statistical methods assume
• Normal distribution
• When comparing groups, scores exhibit similar variance or spread
• Interval or ratio level data
Non-parametric methods
• Do not make assumptions about the shape of the population distribution
• Data are categories or ranks (nominal or ordinal)
• Usually less powerful
• Need larger samples

Parametric tests:
T-TEST:
• The t-test was published by William Sealy Gosset in 1908
• t-tests are used when you want to test the difference between two groups on
some continuous variable
• t-test can also be used when testing the same group of people at two
different times.

Goal of t test: Is there a difference in the populations based on data from
samples of those populations?
Sample mean difference
• Difference between the two sample means
• Larger the mean difference, more likely there is a difference between the
two populations
Sample data variability
• Greater variability reduces the likelihood that the sample mean difference is
the result of a real difference

Unpaired t-test:
• An unpaired t-test is used to compare two population means using independent samples.
• Carrying out an unpaired t-test in SPSS:
• Analyze
• Compare Means
• Independent-Samples T Test
• Choose your outcome variable as the Test Variable
• Choose the variable that defines the groups as the Grouping Variable, then click on
Define Groups
• Enter the two codes which identify your two groups (for example, 1 in Group 1 and
2 in Group 2), then click on Continue
• Click OK
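Outside SPSS, the test statistic itself can be sketched in a few lines. The example assumes the classic equal-variance (pooled) form of the unpaired t-test; the function name and the two small groups are hypothetical.

```python
import math
import statistics

def unpaired_t(sample_1, sample_2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    # Weighted average of the two sample variances (assumes similar spread).
    pooled_var = ((n1 - 1) * statistics.variance(sample_1)
                  + (n2 - 1) * statistics.variance(sample_2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff / std_error, n1 + n2 - 2

# Hypothetical outcome values for two groups (illustrative only).
group_1 = [5, 6, 7, 8, 9]
group_2 = [1, 2, 3, 4, 5]

t_stat, df = unpaired_t(group_1, group_2)
print(round(t_stat, 3), df)  # 4.0 8
```

A larger mean difference raises t, and greater variability within the groups lowers it. The statistic is then compared with the critical value from a t-table for the given degrees of freedom (about 2.306 for a two-tailed test at the 5% level with 8 degrees of freedom).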

Paired t-test:
• A paired t-test is used to compare two population means where you have two
samples in which observations in one sample can be paired with observations in
the other sample.
• Before-and-after observations on the same subjects
• A comparison of two different methods of measurement or two different
treatments where the measurements/treatments are applied to the same
subjects

Carrying out a paired t-test in SPSS
• The simplest way to carry out a paired t-test in SPSS is to compute the
differences (using Transform, Compute) and then carry out a one-sample t-test
as follows:
• Analyze
• Compare Means
• One-Sample T Test
• Choose the difference variable as the Test Variable and click OK
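The same "differences, then one-sample t-test" idea can be sketched directly. The before-and-after scores are hypothetical, and the null hypothesis is assumed to be a mean difference of zero.

```python
import math
import statistics

# Hypothetical before/after scores for the same five subjects (illustrative only).
before = [10, 12, 9, 11, 13]
after = [8, 9, 8, 9, 11]

# Step 1: compute the difference for each pair.
differences = [b - a for b, a in zip(before, after)]

# Step 2: one-sample t-test of the differences against zero.
n = len(differences)
mean_diff = statistics.mean(differences)
std_error = statistics.stdev(differences) / math.sqrt(n)
t_stat = mean_diff / std_error
df = n - 1

print(differences)           # [2, 3, 1, 2, 2]
print(round(t_stat, 3), df)  # 6.325 4
```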

ANOVA:
• The ANOVA, which stands for analysis of variance, is like a
generalized version of the t-test that can be used to test the
difference in a continuous dependent variable between three or
more groups or to test the level of a continuous dependent variable
in a single group of respondents who were tested at three or more
points in time.
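For the between-groups case, the F statistic of a one-way ANOVA can be sketched as the ratio of between-group to within-group mean squares. The function name and the three small groups are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three groups (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)

print(round(f_stat, 3), df_between, df_within)  # 13.0 2 6
```

A large F indicates that the group means differ more than would be expected from the spread within groups, but it does not say which groups differ.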

ANCOVA: Analysis of Covariance
• Extends the ANOVA to include a quantitative independent variable (covariate)
• Used to reduce the within group error variance
• Used to eliminate confounders
• Most useful when the covariate is linearly related to the dependent variable
and is not related to the factors (independent qualitative variables)
• Similar assumptions to the ANOVA

Non-parametric or Distribution free statistical tests:
Advantages:
• Probability statements obtained from most non-parametric tests are exact
probabilities.
• Can be used with small sample sizes
• There are suitable tests for treating observations from samples drawn from
several different populations.
• Tests are available to treat data which are inherently in ranks as well as data
whose seemingly numerical scores have only the strength of ranks.
• Methods are available to treat data which are simply classificatory.
• Much easier to learn than parametric tests.

Disadvantages:
• There are no non-parametric methods for testing interactions in the analysis
of variance
• Tables of critical values may not be easily available.
• If all the assumptions of the parametric test are in fact met in the data, and if
the measurement is of the required strength, then non-parametric tests are
wasteful of data.

Chi square test
• The chi-square statistic was developed by Karl Pearson.
• The chi-square statistic is used to show whether or not there is a
relationship between two categorical variables. It can also be used to test
whether or not a number of outcomes are occurring in equal frequencies or
not, or conform to a known distribution.

Applications:
• Alternate test to find the significance of difference in two or more than two
proportions
• As a test of association between two events in binomial or multinomial
samples
• As a test of goodness of fit

Requirement to apply chi square test:
• Random samples
• Qualitative data
• Lowest expected frequency not less than 5
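As a test of association, the chi-square statistic compares observed counts with the counts expected if the two variables were unrelated. The sketch below uses a hypothetical 2 x 2 table and also reports the smallest expected count, since the minimum-frequency requirement is usually checked on expected counts.

```python
def chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(observed, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical counts: rows = two groups, columns = outcome present / absent.
table = [[20, 30],
         [30, 20]]

statistic, df, expected = chi_square(table)
smallest_expected = min(min(row) for row in expected)

print(statistic, df, smallest_expected)  # 4.0 1 25.0
```

With 1 degree of freedom the 5% critical value is 3.841, so a statistic of 4.0 would be judged significant at that level.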

Wilcoxon signed rank test
• Non-parametric equivalent of the paired t-test
Mann Whitney U test:
• Is used to determine whether two independent samples have been drawn
from the same population
• It is an alternative to Student’s t-test and requires at least ordinal
measurement
Kruskal Wallis test:
• The Kruskal-Wallis test is the more general form of the Mann-Whitney test, extended to
three or more groups
• It doesn’t assume normality and compares medians
• Used instead of one-way ANOVA
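The Mann-Whitney U statistic can be sketched by direct pair counting: for every pair made of one value from each sample, count which sample has the larger value. The function name and the two tiny samples are hypothetical.

```python
def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y): how often values in one sample exceed values in the other."""
    u_x = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u_x += 1      # x beats y
            elif x == y:
                u_x += 0.5    # ties count as half
    u_y = len(sample_x) * len(sample_y) - u_x
    return u_x, u_y

# Hypothetical ordinal scores for two independent groups (illustrative only).
group_x = [1, 2, 4]
group_y = [3, 5, 6]

u_x, u_y = mann_whitney_u(group_x, group_y)
u_stat = min(u_x, u_y)  # the smaller U is usually compared with the table value

print(u_x, u_y, u_stat)  # 1.0 8.0 1.0
```

Only the ordering of the values matters, which is why the test needs no assumption of normality.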

Friedman’s two way analysis of variance non-parametric hypothesis test
• It’s based on ranking the data in each row of the table from low to high
• Each row is ranked separately
• The ranks are then summed in each column (group)
• The test is based on a Chi squared distribution
• Just like with the ANOVA the Friedman test will only indicate a difference but
won’t say where the difference lies
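The ranking steps above can be sketched as follows, assuming no tied values within a row. The statistic uses the usual Friedman formula, 12 / (n k (k + 1)) times the sum of squared column rank sums, minus 3 n (k + 1), for n rows and k columns. The data and function names are hypothetical.

```python
def rank_row(row):
    """Rank the values in one row from low (1) to high; assumes no ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0] * len(row)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def friedman_statistic(table):
    """Return (statistic, column rank sums) for rows = subjects, columns = groups."""
    n, k = len(table), len(table[0])
    ranked = [rank_row(row) for row in table]         # each row ranked separately
    rank_sums = [sum(col) for col in zip(*ranked)]    # ranks summed in each column
    statistic = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return statistic, rank_sums

# Hypothetical scores: four subjects, each measured under three conditions.
table = [[10, 20, 30],
         [11, 31, 21],
         [12, 22, 32],
         [23, 13, 33]]

statistic, rank_sums = friedman_statistic(table)
print(rank_sums, statistic)  # [5, 8, 11] 4.5
```

The statistic is referred to a chi-squared distribution with k - 1 degrees of freedom.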

Correlation and regression:
• A relationship or association between two quantitatively measured or
continuous variables is called correlation.
• Regression describes how the value of one variable changes with a change in another variable.
Types of correlation:
• Perfect positive correlation r=1
• Perfect negative correlation r=-1
• Moderately positive correlation 0<r<1
• Moderately negative correlation -1<r<0
• Absolutely no correlation r=0

Pearson’s correlation coefficient:
• The purpose of the correlation coefficient is to determine whether there is a
significant relationship (i.e., correlation) between two variables.
• The correlation between any two variables using Pearson’s r will always be
between –1 and +1. A correlation coefficient of 0 means that there is no
relationship, either positive or negative, between these two variables.

• Spearman’s correlation coefficient is used with non-parametric tests
for ranked data.
• Kendall’s correlation coefficient is also used for ranked data.
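Pearson's r can be sketched from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name and the paired values are hypothetical.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists of numbers."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return sum_products / math.sqrt(sum(dx ** 2 for dx in dev_x)
                                    * sum(dy ** 2 for dy in dev_y))

# Hypothetical paired measurements (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_perfect = pearson_r(x, [2 * value for value in x])  # exact straight-line relationship

print(round(r, 4), round(r_perfect, 4))  # 0.7746 1.0
```

Applying the same function to the ranks of the data rather than the raw values gives Spearman's coefficient when there are no ties.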

Relative risk:
• Used in prospective studies
• Incidence of disease among exposed / incidence of disease among non
exposed
Odds ratio:
• Used in retrospective study
• Odds that the exposed individual will have the disease / odds that the non
exposed individual will have the disease
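Both measures can be computed from the same 2 x 2 table. The counts below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical 2 x 2 table (illustrative counts only).
exposed_diseased, exposed_healthy = 30, 70
unexposed_diseased, unexposed_healthy = 10, 90

# Relative risk: incidence among exposed / incidence among non-exposed.
incidence_exposed = exposed_diseased / (exposed_diseased + exposed_healthy)
incidence_unexposed = unexposed_diseased / (unexposed_diseased + unexposed_healthy)
relative_risk = incidence_exposed / incidence_unexposed

# Odds ratio: odds of disease among exposed / odds of disease among non-exposed.
odds_exposed = exposed_diseased / exposed_healthy
odds_unexposed = unexposed_diseased / unexposed_healthy
odds_ratio = odds_exposed / odds_unexposed

print(round(relative_risk, 3), round(odds_ratio, 3))  # 3.0 3.857
```

The two measures are close only when the disease is rare; here the disease is common in the exposed group, so the odds ratio is noticeably larger than the relative risk.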

REFERENCES
1. Elliot Abt (Department of Dentistry, Illinois Masonic Medical Center,
Chicago, Illinois, USA) understanding statistics 1, Evidence Based
Dentistry 2010:11.2 p 60-61
2. Sundar rao. Introduction to biostatistics and research methodology 5th
edition.
3. Interpretation and Uses of Medical Statistics by Leslie E Daly and
Geoffrey J. Bourke, Blackwell science
4. Mahajan.B.K methods in biostatistics 6th edition
5. Rosie Shier mathematics learning support centre, 2004
