An Introduction To T-Tests: Statistical Test Means Hypothesis Testing
An introduction to t-tests
A t-test is a statistical test that is used to compare the means of two groups. It is
often used in hypothesis testing to determine whether a process or treatment
actually has an effect on the population of interest, or whether two groups are
different from one another.
You want to know whether the mean petal length of iris flowers differs according to
their species. You find two different species of irises growing in a garden and
measure 25 petals of each species. You can test the difference between these two
groups using a t-test.
The null hypothesis (H0) is that the true difference between these group
means is zero.
The alternate hypothesis (Ha) is that the true difference is different from zero.
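The comparison above can be sketched as an independent two-sample t-test in Python; the petal-length values below are invented for illustration, not real iris measurements:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical petal lengths (cm) for two iris species -- illustrative only.
species_a = [1.4, 1.5, 1.3, 1.6, 1.4, 1.5, 1.7, 1.4, 1.5, 1.6]
species_b = [4.7, 4.5, 4.9, 4.4, 4.6, 4.8, 4.5, 4.7, 4.6, 4.4]

def two_sample_t(x, y):
    """Pooled two-sample t statistic for H0: the two group means are equal."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

t = two_sample_t(species_a, species_b)
print(round(t, 2))  # a large |t| is strong evidence against H0
```

A |t| well above the critical value (about 2.1 for 18 degrees of freedom at the 5% level) leads to rejecting H0.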
Slide - 23
The t-test is a parametric test of difference, meaning that it makes the same
assumptions about your data as other parametric tests. The t-test assumes your
data:
1. are independent,
2. are (approximately) normally distributed, and
3. have a similar amount of variance within each group being compared (a.k.a.
homogeneity of variance).
If your data do not fit these assumptions, you can try a nonparametric alternative to
the t-test, such as the Wilcoxon signed-rank test (for paired data) or the
Mann-Whitney U test (for two independent groups).
Slide - 24
One-sample t-test:
t = (x̅ − µ) / (σ / √n)
where x̅ is the mean of the sample, µ is the assumed mean, σ is the
standard deviation, and n is the number of observations.
T-test for the difference in means:
t = (x̅1 − x̅2) / (sp √(1/n1 + 1/n2)), where sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
where x̅1 and x̅2 are the two sample means, s1² and s2² the sample variances,
n1 and n2 the sample sizes, and sp the pooled standard deviation.
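The one-sample statistic described above is a one-line computation in Python (the numbers in the demo call are made up):

```python
from math import sqrt

def one_sample_t(x_bar, mu, sigma, n):
    """t = (x̅ − µ) / (σ/√n): how far the sample mean lies from the
    assumed mean, measured in standard-error units."""
    return (x_bar - mu) / (sigma / sqrt(n))

print(round(one_sample_t(103.0, 100.0, 15.0, 36), 2))  # 1.2
```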
Slide – 25
Slide – 26
Two sample t-test (two-tailed t-test)
A two sample t-test (two-tailed) is a method in which the critical area of a
distribution is two-sided, and the test is performed to determine whether
the population parameter is greater than or less than a specified value.
A two-tailed test rejects the null hypothesis in cases where the sample
mean is significantly higher or lower than the assumed value of the mean
of the population.
This type of test is appropriate when the null hypothesis is some
assumed value, and the alternative hypothesis is set as the value not equal
to the specified value of the null hypothesis.
The two-tailed test is appropriate when we have H0: µ = µ0 and Ha: µ ≠ µ0,
which may mean µ > µ0 or µ < µ0.
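A minimal sketch of this decision rule, using the normal critical value from Python's standard library (the t values passed in are illustrative):

```python
from statistics import NormalDist

def two_tailed_reject(t, alpha=0.05):
    """Reject H0: µ = µ0 when |t| exceeds the two-tailed critical value.
    Uses the normal approximation, as with large n or known σ."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(t) > critical

print(two_tailed_reject(2.3))   # True: reject H0
print(two_tailed_reject(-1.5))  # False: |t| < 1.96
```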
Slide – 27
Independent t-test
An independent t-test compares the means of two independent groups to
determine whether there is statistical evidence that the associated
population means are significantly different.
Subjects in each sample are also assumed to come from different
populations, that is, subjects in “Sample A” are assumed to come from
“Population A” and subjects in “Sample B” are assumed to come from
“Population B.”
The populations are assumed to differ only in the level of the
independent variable.
Thus, any difference found between the sample means should also exist
between population means, and any difference between the population
means must be due to the difference in the levels of the independent
variable.
Slide - 28
T-test example
If a sample of 10 copper wires is found to have a mean breaking strength
of 572 kgs, is it feasible to regard the sample as a part of a large
population with a mean breaking strength of 578 kgs and a standard
deviation of 12.72 kgs? Test at the 5% level of significance.
Taking the null hypothesis that the mean breaking strength of the population
is equal to 578 kgs, we can write:
H0 : µ = 578 kgs
Ha : µ ≠ 578 kgs
x̅ = 572 kgs , σ = 12.72 kgs , n = 10.
Based on the assumption that the population is normal, the test statistic t
can be written as:
t = (x̅ − µ) / (σ/√n) = (572 − 578) / (12.72/√10) = −6 / 4.022
t = −1.49
As Ha is two-sided in the given question, a two-tailed test is to be used to
determine the rejection region at a 5% level of significance, which,
using the normal curve area table, comes to:
R : | t | > 1.96
The observed value of t is −1.49, which lies in the acceptance region since
| −1.49 | < 1.96, and thus H0 is accepted: the sample can reasonably be
regarded as part of the stated population.
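The example's arithmetic can be checked in a few lines (x̅ = 572 kgs, µ = 578 kgs, σ = 12.72 kgs, n = 10):

```python
from math import sqrt

x_bar, mu, sigma, n = 572.0, 578.0, 12.72, 10
t = (x_bar - mu) / (sigma / sqrt(n))
print(round(t, 2))    # -1.49
print(abs(t) < 1.96)  # True: inside the acceptance region, H0 stands
```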
Slide - 29
T-test applications
The T-test is used to compare the mean of two samples, dependent or
independent.
It can also be used to determine if the sample mean is different from the
assumed mean.
T-test has an application in determining the confidence interval for a
sample mean.
Slide- 30
Definition of F-test:
In statistics, a test in which the test statistic has an F-distribution
under the null hypothesis is known as an F-test. It is used to compare
statistical models fitted to the available data set. George W. Snedecor
named the statistic in honor of Sir Ronald A. Fisher, hence the F-test
formula.
Slide- 31
Formula for F-Test to Compare Two Variances:
A statistical F-test uses an F statistic to compare two
variances, σ1² and σ2², by dividing them. The result will always be a
positive number because variances are always positive. Thus, the
equation for comparing two variances with the F-test is:
F-value = variance 1 / variance 2
i.e. F-value = σ1² / σ2²
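Assuming two made-up samples, the ratio can be computed with the standard-library statistics module; putting the larger variance in the numerator keeps F ≥ 1, which is the usual convention for table lookups:

```python
from statistics import variance

sample_1 = [21, 24, 19, 27, 23, 25, 22]   # illustrative data
sample_2 = [18, 19, 21, 20, 19, 18, 20]

def f_statistic(x, y):
    """F = larger sample variance / smaller sample variance."""
    v1, v2 = variance(x), variance(y)
    return max(v1, v2) / min(v1, v2)

print(round(f_statistic(sample_1, sample_2), 2))  # 5.65
```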
Slide- 32
Solved Examples for F Test Formula
Q.1: Conduct an F-test on two samples: sample-1 with variance 109.63
(n1 = 41) and sample-2 with variance 65.99 (n2 = 21).
Solution:
Step-1:- State the null hypothesis that the two population variances are
equal: H0: σ1² = σ2².
Step-2:- Calculate the F-value. Here take the highest variance
as the numerator and the lowest variance as the denominator:
F-value = σ1² / σ2²
F-value = 109.63 / 65.99
F-value = 1.66
Step-3:- Calculate the degrees of freedom: df = sample size − 1, so for
sample-1 it is 40 and for sample-2 it is 20.
Step-4:- Choose the alpha level. As no alpha level was given in the
question, we may use the standard level of 0.05. This needs to be
halved for the two-tailed test, so use 0.025.
Step-5:- Find the critical F-value using the F-table at 0.025.
Critical-F for (40, 20) at alpha (0.025) is 2.287.
Step-6:- Compare: since the calculated F-value of 1.66 is less than the
critical value of 2.287, we cannot reject the null hypothesis of equal
variances.
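The steps above reduce to a short check in code (the variances and the table value 2.287 are taken from the example):

```python
v1, v2 = 109.63, 65.99              # sample variances from the example
f_value = max(v1, v2) / min(v1, v2)
critical_f = 2.287                  # F(40, 20) at alpha = 0.025, from the table
print(round(f_value, 2))            # 1.66
print(f_value > critical_f)         # False: cannot reject equal variances
```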
Slide- 33
Confidence intervals
A confidence interval is the mean of your estimate plus and minus the variation in
that estimate. This is the range of values you expect your estimate to fall between if
you redo your test, within a certain level of confidence.
For example, if you construct a confidence interval with a 95% confidence level,
you expect that if you repeated the study 100 times, about 95 of the resulting
intervals would contain the true value.
Your desired confidence level is usually one minus the alpha (α) value you used in
your statistical test:
So if you use an alpha value of 0.05 for statistical significance, then your
confidence level would be 1 − 0.05 = 0.95, or 95%.
Slide- 34
Confidence intervals can be calculated for many kinds of estimates, such as:
Proportions
Population means
Differences between population means or proportions
Estimates of variation among groups
These are all point estimates, and don’t give any information about the variation
around the number. Confidence intervals are useful for communicating the variation
around a point estimate.
Slide- 35
Calculating a confidence interval: what you need to know
Most statistical programs will include the confidence interval of the estimate when
you run a statistical test.
If you want to calculate a confidence interval on your own, you need to know:
1. The point estimate you are constructing the confidence interval for
2. The critical values for the test statistic
3. The standard deviation of the sample
4. The sample size
Once you know each of these components, you can calculate the confidence interval
for your estimate by plugging them into the confidence interval formula that
corresponds to your data.
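As a sketch, assuming the normal critical value is an acceptable stand-in for the t value (reasonable for large samples), the four ingredients combine as "estimate ± critical value × standard error"; the inputs in the demo call are made up:

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(x_bar, s, n, confidence=0.95):
    """Interval x̅ ± z* · s/√n around a sample mean; z* is the normal
    critical value (a t critical value is more exact for small n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

lo, hi = confidence_interval(x_bar=50.0, s=8.0, n=100)
print(round(lo, 2), round(hi, 2))  # 48.43 51.57
```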