Two-Sample t-Test
What do we need?
For the two-sample t-test, we need two variables. One variable defines the
two groups. The second variable is the measurement of interest.
For very small groups of data, it can be hard to test these requirements.
Below, we'll discuss how to check the requirements using software and
what to do when a requirement isn’t met.
Our sample data is from a group of men and women who did workouts at
a gym three times a week for a year. Then their trainer measured each
person's body fat. The table below shows the data.
You can clearly see some overlap in the body fat measurements for the
men and women in our sample, but also some differences. Just by looking
at the data, it's hard to draw any solid conclusions about whether the
underlying populations of men and women at the gym have the same
mean body fat. That is the value of statistical tests – they provide a
common, statistically valid way to make decisions, so that everyone
makes the same decision on the same set of data values.
For the body fat data, here is how we address the requirements of the test:

- The data values are independent. The body fat for any one person does not depend on the body fat for another person.
- We assume the people measured represent a simple random sample from the population of members of the gym.
- We assume the data are normally distributed, and we can check this assumption.
- The data values are body fat measurements, and the measurements are continuous.
- We assume the variances for men and women are equal, and we can check this assumption.
Before jumping into analysis, we should always take a quick look at the
data. The figure below shows histograms and summary statistics for the
men and women.
Figure 1: Histogram and summary statistics for the body fat data
The two histograms are on the same scale. From a quick look, we can see
that there are no very unusual points, or outliers. The data look roughly
bell-shaped, so our initial idea of a normal distribution seems reasonable.
Table 2: Average, standard deviation and sample size statistics grouped by gender
Without doing any testing, we can see that the averages for men and
women in our samples are not the same. But how different are they? Are
the averages “close enough” for us to conclude that mean body fat is the
same for the larger population of men and women at the gym? Or are the
averages too different for us to make this conclusion?
We'll further explain the principles underlying the two-sample t-test in the
Statistical details section below, but let's first proceed through the steps
from beginning to end. We start by calculating our test statistic. This
calculation begins with finding the difference between the two averages.
Next, we calculate the pooled standard deviation, which starts from the
pooled variance. The pooled variance combines the two sample variances,
weighted by their degrees of freedom. For the body fat data, we have:
$$
s_p^2 = \frac{(10-1)(5.32^2) + (13-1)(6.84^2)}{10+13-2}
      = \frac{(9)(28.30) + (12)(46.82)}{21}
      = \frac{254.7 + 561.85}{21}
      = \frac{816.55}{21}
      = 38.88
$$
Next, we take the square root of the pooled variance to get the pooled
standard deviation. This is:
$s_p = \sqrt{38.88} = 6.24$
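As a quick check, here is a minimal Python sketch (my own illustration, not part of the original article) that reproduces the pooled variance and pooled standard deviation from the sample sizes and standard deviations reported above. Small differences from 38.88 and 6.24 come from rounding in the reported standard deviations.

```python
import math

# Summary statistics reported in the text
n1, s1 = 10, 5.32   # first group: sample size and standard deviation
n2, s2 = 13, 6.84   # second group: sample size and standard deviation

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
pooled_sd = math.sqrt(pooled_var)

print(f"pooled variance = {pooled_var:.2f}")  # close to the 38.88 shown above
print(f"pooled sd       = {pooled_sd:.2f}")   # close to the 6.24 shown above
```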
We now have all the pieces for our test statistic: the difference of the
averages, the pooled standard deviation and the sample sizes. We calculate
our test statistic by dividing the difference of the averages by the pooled
standard deviation, scaled for the two sample sizes:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$

We then compare this test statistic to a critical value from the t-distribution.
To find this value, we need the significance level (α = 0.05) and the
degrees of freedom. The degrees of freedom (df) are based on the
sample sizes of the two groups. For the body fat data, this is:
$df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21$
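To find the critical value itself, you would normally look it up in a t table or let software do it. A minimal sketch using scipy (one option among many) recovers the value of 2.080 used below:

```python
from scipy import stats

alpha = 0.05
df = 10 + 13 - 2  # 21 degrees of freedom

# Two-sided test, so alpha is split between the two tails
t_critical = stats.t.ppf(1 - alpha / 2, df)
print(f"critical t value = {t_critical:.3f}")  # about 2.080
```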
Statistical details
Let’s look at the body fat data and the two-sample t-test using statistical
terms.
Our null hypothesis is that the underlying population means are the same.
The null hypothesis is written as:
$H_0: \mu_1 = \mu_2$
The alternative hypothesis is that the means are not equal. This is written
as:
$H_a: \mu_1 \ne \mu_2$
We calculate the average for each group, and then calculate the difference
between the two averages. This is written as:
$\bar{x}_1 - \bar{x}_2$
Next, we estimate the common variance of the two groups with the pooled
variance:

$$
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

The formula shows the sample size for the first group as $n_1$ and the
second group as $n_2$. The standard deviations for the two groups are $s_1$
and $s_2$. This estimate allows the two groups to have different numbers of
observations. The pooled standard deviation, written as $s_p$, is the square
root of the pooled variance.
What if your sample sizes for the two groups are the same? In this
situation, the pooled estimate of variance is simply the average of the
variances for the two groups:
$$
s_p^2 = \frac{s_1^2 + s_2^2}{2}
$$
The test statistic for the pooled (equal-variance) test is calculated as:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$
The numerator of the test statistic is the difference between the two group
averages. It estimates the difference between the two unknown
population means. The denominator is an estimate of the standard error
of the difference between the two unknown population means.
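To make the pieces concrete, here is a small Python function (a hypothetical helper, not JMP's implementation) that builds the pooled t statistic from the group summary statistics:

```python
import math

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic assuming equal variances (pooled estimate)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Numerator: difference between the two group averages
    # Denominator: estimated standard error of that difference
    return (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
```

Plugging in the group averages, standard deviations and sample sizes from Table 2 gives the test statistic that is compared with the critical t value below.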
We then compare the test statistic to a t value with our chosen alpha value
and the degrees of freedom for our data. Using the body fat data as an
example, we set α = 0.05. The degrees of freedom (df) are based on the
group sizes and are calculated as:
$df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21$
The formula shows the sample size for the first group as n1 and the
second group as n2. Statisticians write the t value with α = 0.05 and 21
degrees of freedom as:
$t_{0.05,21}$
The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are
two possible results from our comparison:
- The test statistic (in absolute value) is lower than the t value. You fail to reject the hypothesis of equal means. You conclude that the data support the assumption that the men and women have the same average body fat.
- The test statistic (in absolute value) is higher than the t value. You reject the hypothesis of equal means. You do not conclude that men and women have the same average body fat.
What if you cannot use the pooled estimate of the standard deviation
because the variances for the two groups are not equal? Then the test
statistic is calculated from each group's own variance:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$
The numerator of the test statistic is the same. It is the difference between
the averages of the two groups. The denominator is an estimate of the
overall standard error of the difference between means. It is based on the
separate standard error for each group.
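As a sketch of the same calculation (again a hypothetical helper, not any package's API), the unequal-variance statistic can be written as:

```python
import math

def welch_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic without assuming equal variances."""
    # Standard error built from each group's own variance and sample size
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error
```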
The degrees of freedom calculation for the t value is more complex with
unequal variances than equal variances and is usually left up to statistical
software packages. The key point to remember is that if you cannot use
the pooled estimate of standard deviation, then you cannot use the simple
formula for the degrees of freedom.
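For reference, the approximation that most packages use is the Welch-Satterthwaite formula. Applying it to the sample sizes and standard deviations reported above gives roughly the following; the exact value your software reports may differ slightly because it works with unrounded variances.

```python
n1, s1 = 10, 5.32
n2, s2 = 13, 6.84

v1 = s1**2 / n1   # variance of the mean for the first group
v2 = s2**2 / n2   # variance of the mean for the second group

# Welch-Satterthwaite approximate degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(f"approximate df = {df_welch:.1f}")  # about 21.0 for these data; in general not an integer
```

With the raw data in hand, scipy.stats.ttest_ind(group1, group2, equal_var=False) runs the unequal-variance (Welch) version of the test and handles these degrees of freedom for you.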
Testing for normality

Figure 2: Normal quantile plot of the body fat measurements for men and women
You can also perform a formal test for normality using software. The
figure above shows results of testing for normality with JMP software. We
test each group separately. Both the test for men and the test for women
show that we cannot reject the hypothesis of a normal distribution. We
can go ahead with the assumption that the body fat data for men and for
women are normally distributed.
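If you are not using JMP, one common formal check is the Shapiro-Wilk test, run separately on each group. A minimal sketch, where the two variable names in the commented calls are placeholders for the raw measurements:

```python
from scipy import stats

def check_normality(values, label, alpha=0.05):
    """Shapiro-Wilk test for one group; p > alpha means we cannot reject normality."""
    statistic, p_value = stats.shapiro(values)
    verdict = "cannot reject normality" if p_value > alpha else "reject normality"
    print(f"{label}: W = {statistic:.3f}, p = {p_value:.3f} -> {verdict}")

# check_normality(men_body_fat, "Men")      # men_body_fat: list of the men's measurements
# check_normality(women_body_fat, "Women")  # women_body_fat: list of the women's measurements
```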
Testing for unequal variances

Figure 3: Test for unequal variances for the body fat data

Without diving into details of the different types of tests for unequal
variances, we will use the F test. Before testing, we decide we are willing to
accept a 10% risk of concluding the variances are unequal when they are in
fact equal. This means we have set α = 0.10.
Like most statistical software, JMP shows the p-value for a test. This is the
probability, assuming the variances really are equal, of finding a test
statistic at least as extreme as the one observed. It's difficult to calculate
by hand. For the figure above, with
the F test statistic of 1.654, the p-value is 0.4561. This is larger than our α
value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In
practical terms, we can go ahead with the two-sample t-test with the
assumption of equal variances for the two groups.
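The F ratio is easy to reproduce from the two sample variances. The sketch below computes the classical two-sided F test from the reported standard deviations; the p-value should land near 0.4561, with small differences due to rounding of the reported standard deviations.

```python
from scipy import stats

n1, s1 = 10, 5.32   # group with the smaller standard deviation
n2, s2 = 13, 6.84   # group with the larger standard deviation

# F ratio: larger sample variance over smaller sample variance
f_ratio = s2**2 / s1**2
dfn, dfd = n2 - 1, n1 - 1   # numerator and denominator degrees of freedom

# Two-sided p-value for the test of equal variances
p_value = 2 * min(stats.f.sf(f_ratio, dfn, dfd), stats.f.cdf(f_ratio, dfn, dfd))
print(f"F = {f_ratio:.3f}, p = {p_value:.4f}")
```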
Understanding p-values
Using a visual, you can check whether your test statistic is a more extreme
value in the distribution. The figure below shows a t-distribution with 21
degrees of freedom.

Since our test is two-sided and we have set α = 0.05, the figure shows that
the value of 2.080 "cuts off" 2.5% of the data in each of the two tails. Only
5% of the data overall falls farther out in the tails than 2.080.
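You can verify those tail areas numerically; a short scipy sketch:

```python
from scipy import stats

df = 21
upper_tail = stats.t.sf(2.080, df)    # area to the right of +2.080
lower_tail = stats.t.cdf(-2.080, df)  # area to the left of -2.080
print(f"each tail = {upper_tail:.3f}, both tails = {upper_tail + lower_tail:.3f}")  # about 0.025 and 0.05
```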