Two-Sample t-Test Introduction to Statistics JMP

The two-sample t-test is a statistical method used to determine if the means of two independent groups are equal. It is applicable when data values are independent, randomly sampled from normal populations, and have equal variances, although it can still be used with unequal variances. The document provides detailed guidance on performing the test, assumptions required, and examples, including a case study on body fat percentages among men and women.

Uploaded by

dalcheryn
Copyright
© All Rights Reserved


Statistics Knowledge Portal


A free online introduction to statistics


The Two-Sample t-Test


What is the two-sample t-test?
The two-sample t-test (also known as the independent samples t-test)
is a method used to test whether the unknown population means of two
groups are equal or not.

Is this the same as an A/B test?


Yes, a two-sample t-test is used to analyze the results from A/B
tests.

When can I use the test?


You can use the test when your data values are independent, are
randomly sampled from two normal populations and the two
independent groups have equal variances.

What if I have more than two groups?


Use a multiple comparison method. Analysis of variance (ANOVA)
is one such method. Other multiple comparison methods include
the Tukey-Kramer test of all pairwise differences, analysis of
means (ANOM) to compare group means to the overall mean or
Dunnett’s test to compare each group mean to a control mean.

What if the variances for my two groups are not equal?

You can still use the two-sample t-test. You use a different estimate
of the standard deviation, one that is not pooled across the two groups.

What if my data isn’t nearly normally distributed?

If your sample sizes are very small, you might not be able to test for
normality. You might need to rely on your understanding of the
data. When you cannot safely assume normality, you can perform a
nonparametric test that doesn’t assume normality.

See how to perform a two-sample t-test using statistical software

• Download JMP to follow along using the sample data included with the software.
• To see more JMP tutorials, visit the JMP Learning Library.

Using the two-sample t-test


The sections below discuss what is needed to perform the test, checking
our data, how to perform the test and statistical details.

What do we need?
For the two-sample t-test, we need two variables. One variable defines the
two groups. The second variable is the measurement of interest.

We also have an idea, or hypothesis, that the means of the underlying
populations for the two groups are different. Here are a couple of
examples:

• We have students who speak English as their first language and
  students who do not. All students take a reading test. Our two groups
  are the native English speakers and the non-native speakers. Our
  measurements are the test scores. Our idea is that the mean test scores
  for the underlying populations of native and non-native English
  speakers are not the same. We want to know if the mean score for the
  population of native English speakers is different from the mean score
  for the population of people who learned English as a second language.
• We measure the grams of protein in two different brands of energy
  bars. Our two groups are the two brands. Our measurement is the
  grams of protein for each energy bar. Our idea is that the mean grams
  of protein for the underlying populations for the two brands may be
  different. We want to know if we have evidence that the mean grams of
  protein for the two brands of energy bars are different.

Two-sample t-test assumptions

To conduct a valid test:

• Data values must be independent. Measurements for one observation
  do not affect measurements for any other observation.
• Data in each group must be obtained via a random sample from the
  population.
• Data in each group are normally distributed.
• Data values are continuous.
• The variances for the two independent groups are equal.

For very small groups of data, it can be hard to test these requirements.
Below, we'll discuss how to check the requirements using software and
what to do when a requirement isn’t met.

Two-sample t-test example


One way to measure a person’s fitness is to measure their body fat
percentage. Average body fat percentages vary by age, but according to
some guidelines, the normal range for men is 15-20% body fat, and the
normal range for women is 20-25% body fat.

Our sample data is from a group of men and women who did workouts at
a gym three times a week for a year. Then, their trainer measured the
body fat. The table below shows the data.

Table 1: Body fat percentage data grouped by gender

Group    Body Fat Percentages

Men      13.3    6.0   20.0    8.0   14.0
         19.0   18.0   25.0   16.0   24.0
         15.0    1.0   15.0

Women    22.0   16.0   21.7   21.0   30.0
         26.0   12.0   23.2   28.0   23.0

You can clearly see some overlap in the body fat measurements for the
men and women in our sample, but also some differences. Just by looking
at the data, it's hard to draw any solid conclusions about whether the
underlying populations of men and women at the gym have the same
mean body fat. That is the value of statistical tests – they provide a
common, statistically valid way to make decisions, so that everyone
makes the same decision on the same set of data values.

Checking the data


Let’s start by answering: Is the two-sample t-test an appropriate method
to evaluate the difference in body fat between men and women?

• The data values are independent. The body fat for any one person does
  not depend on the body fat for another person.
• We assume the people measured represent a simple random sample
  from the population of members of the gym.
• We assume the data are normally distributed, and we can check this
  assumption.
• The data values are body fat measurements. The measurements are
  continuous.
• We assume the variances for men and women are equal, and we can
  check this assumption.

Before jumping into analysis, we should always take a quick look at the
data. The figure below shows histograms and summary statistics for the
men and women.

Figure 1: Histogram and summary statistics for the body fat data

The two histograms are on the same scale. From a quick look, we can see
that there are no very unusual points, or outliers. The data look roughly
bell-shaped, so our initial idea of a normal distribution seems reasonable.

Examining the summary statistics, we see that the standard deviations
are similar. This supports the idea of equal variances. We can also check
this using a test for variances.

Based on these observations, the two-sample t-test appears to be an
appropriate method to test for a difference in means.

How to perform the two-sample t-test


For each group, we need the average, standard deviation and sample size.
These are shown in the table below.

Table 2: Average, standard deviation and sample size statistics grouped by gender

Group    Sample size (n)    Average (X-bar)    Standard deviation (s)

Women    10                 22.29              5.32
Men      13                 14.95              6.84
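
The summary statistics in Table 2 can be reproduced from the raw values in Table 1. The following is a minimal sketch using only Python's standard library; the variable names `men`, `women` and the helper `summarize` are illustrative choices, not part of the original material.

```python
from statistics import mean, stdev

# Body fat percentages copied from Table 1
men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

def summarize(values):
    """Return (sample size, average, sample standard deviation)."""
    # stdev() divides by n - 1, which gives the sample standard deviation
    return len(values), mean(values), stdev(values)

summary = {"Women": summarize(women), "Men": summarize(men)}
for group, (n, avg, sd) in summary.items():
    print(f"{group}: n={n}, average={avg:.2f}, s={sd:.2f}")
```

Rounded to two decimals, the output should agree with the values in Table 2.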

Without doing any testing, we can see that the averages for men and
women in our samples are not the same. But how different are they? Are
the averages “close enough” for us to conclude that mean body fat is the
same for the larger population of men and women at the gym? Or are the
averages too different for us to make this conclusion?

We'll further explain the principles underlying the two-sample t-test in the
statistical details section below, but let's first proceed through the steps
from beginning to end. We start by calculating our test statistic. This
calculation begins with finding the difference between the two averages:

22.29 − 14.95 = 7.34

This difference in our samples estimates the difference between the
population means for the two groups.

Next, we calculate the pooled standard deviation. This builds a combined
estimate of the overall standard deviation. The estimate adjusts for
different group sizes. First, we calculate the pooled variance:

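\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

\[ s_p^2 = \frac{(10 - 1)\,5.32^2 + (13 - 1)\,6.84^2}{10 + 13 - 2} \]

\[ = \frac{(9 \times 28.30) + (12 \times 46.82)}{21} \]

\[ = \frac{254.7 + 561.85}{21} \]

\[ = \frac{816.55}{21} = 38.88 \]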

Next, we take the square root of the pooled variance to get the pooled
standard deviation. This is:

√38.88 = 6.24
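
As a check on the arithmetic above, the pooled variance and pooled standard deviation can be computed directly from the raw data. This is a sketch using the standard library; the function name `pooled_variance` is an illustrative choice. Working from the unrounded group variances avoids small rounding differences that appear when the two-decimal standard deviations are squared by hand.

```python
import math
from statistics import variance

# Body fat percentages copied from Table 1
men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

def pooled_variance(group1, group2):
    """Weight each sample variance by its degrees of freedom (n - 1)."""
    n1, n2 = len(group1), len(group2)
    weighted = (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    return weighted / (n1 + n2 - 2)

sp2 = pooled_variance(women, men)
sp = math.sqrt(sp2)
print(f"pooled variance = {sp2:.2f}, pooled standard deviation = {sp:.2f}")
```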

We now have all the pieces for our test statistic. We have the difference of
the averages, the pooled standard deviation and the sample sizes. We
calculate our test statistic as follows:

\[ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{6.24 \times \sqrt{1/10 + 1/13}} = \frac{7.34}{2.62} = 2.80 \]
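
The same calculation can be written as a small function that takes the summary statistics as inputs. This is a sketch; the name `pooled_t_statistic` and its argument order are assumptions made for illustration.

```python
import math

def pooled_t_statistic(mean1, mean2, sp, n1, n2):
    """Return (standard error of the difference, t statistic) for a pooled test."""
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return se, (mean1 - mean2) / se

# Rounded values from the worked example: women first, then men
se, t_stat = pooled_t_statistic(22.29, 14.95, 6.24, 10, 13)
print(f"standard error = {se:.2f}, t = {t_stat:.2f}")
```

Because the inputs are already rounded to two decimals, the result matches the hand calculation only to about two decimal places.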

To evaluate the difference between the means in order to make a decision
about our gym programs, we compare the test statistic to a theoretical
value from the t-distribution. This activity involves four steps:

1. We decide on the risk we are willing to take for declaring a significant
   difference. For the body fat data, we decide that we are willing to take a
   5% risk of saying that the unknown population means for men and
   women are not equal when they really are. In statistics-speak, the
   significance level, denoted by α, is set to 0.05. It is a good practice to
   make this decision before collecting the data and before calculating
   test statistics.
2. We calculate a test statistic. Our test statistic is 2.80.
3. We find the theoretical value from the t-distribution based on our null
   hypothesis, which states that the means for men and women are equal.
   Most statistics books have look-up tables for the t-distribution. You can
   also find tables online. The most likely situation is that you will use
   software and will not use printed tables.

   To find this value, we need the significance level (α = 0.05) and the
   degrees of freedom. The degrees of freedom (df) are based on the
   sample sizes of the two groups. For the body fat data, this is:

   df = n1 + n2 − 2 = 10 + 13 − 2 = 21

   The t value for a two-sided test with α = 0.05 and 21 degrees of
   freedom is 2.080.

4. We compare the value of our statistic (2.80) to the t value. Since 2.80 >
   2.080, we reject the null hypothesis that the mean body fat for men
   and women is equal, and conclude that we have evidence that mean body
   fat in the population is different between men and women.
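
The four steps above can be sketched in code. In practice you would normally get the t value from statistical software; the version below is only an illustration that avoids third-party packages by integrating the t-distribution density numerically (Simpson's rule) and finding the critical value by bisection. The function names `t_pdf`, `two_sided_p` and `critical_value` are hypothetical helpers, not part of any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), using Simpson's rule on the interval [0, |t_stat|]."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -b and +b
    return 1 - central_area

def critical_value(alpha, df):
    """Bisection for the t value whose two-sided tail area equals alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05          # step 1: chosen significance level
t_stat = 2.80         # step 2: test statistic from the example
df = 10 + 13 - 2      # step 3: degrees of freedom
t_crit = critical_value(alpha, df)
reject = abs(t_stat) > t_crit  # step 4: compare
print(f"critical t = {t_crit:.3f}, p-value = {two_sided_p(t_stat, df):.4f}, reject = {reject}")
```

The critical value should agree with the 2.080 quoted above to three decimals. The sketch also prints a two-sided p-value, which leads to the same decision as comparing the statistic with the critical value.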

Statistical details
Let’s look at the body fat data and the two-sample t-test using statistical
terms.

Our null hypothesis is that the underlying population means are the same.
The null hypothesis is written as:

\[ H_0 : \mu_1 = \mu_2 \]

The alternative hypothesis is that the means are not equal. This is written
as:

\[ H_a : \mu_1 \neq \mu_2 \]

We calculate the average for each group, and then calculate the difference
between the two averages. This is written as:

\[ \bar{x}_1 - \bar{x}_2 \]

We calculate the pooled standard deviation. This assumes that the
underlying population variances are equal. The pooled variance formula
is written as:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

The formula shows the sample size for the first group as n1 and the
second group as n2. The standard deviations for the two groups are s1 and
s2. This estimate allows the two groups to have different numbers of
observations. The pooled standard deviation is the square root of the
variance and is written as sp.

What if your sample sizes for the two groups are the same? In this
situation, the pooled estimate of variance is simply the average of the
variances for the two groups:

\[ s_p^2 = \frac{s_1^2 + s_2^2}{2} \]
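
The simplification follows because, with equal group sizes, both variances get the same weight (n − 1), which cancels against the denominator 2(n − 1). A quick numerical check, assuming a hypothetical common group size of 12 and reusing the two standard deviations from the example:

```python
import math

def pooled_variance(n1, s1, n2, s2):
    """General pooled variance from group sizes and standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

n = 12                 # hypothetical common group size (not from the example data)
s1, s2 = 5.32, 6.84    # standard deviations from the worked example
pooled = pooled_variance(n, s1, n, s2)
simple_average = (s1**2 + s2**2) / 2
print(math.isclose(pooled, simple_average))
```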

The test statistic is calculated as:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} \]

The numerator of the test statistic is the difference between the two group
averages. It estimates the difference between the two unknown
population means. The denominator is an estimate of the standard error
of the difference between the two unknown population means.

Technical Detail: For a single mean, the standard error is s/√n.
The formula above extends this idea to two groups that use a
pooled estimate for s (standard deviation), and that can have
different group sizes.

We then compare the test statistic to a t value with our chosen alpha value
and the degrees of freedom for our data. Using the body fat data as an
example, we set α = 0.05. The degrees of freedom (df) are based on the
group sizes and are calculated as:

df = n1 + n2 − 2 = 10 + 13 − 2 = 21

The formula shows the sample size for the first group as n1 and the
second group as n2. Statisticians write the t value with α = 0.05 and 21
degrees of freedom as:

\[ t_{0.05,\,21} \]

The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are
two possible results from our comparison:

• The absolute value of the test statistic is lower than the t value. You
  fail to reject the hypothesis of equal means. The data do not provide
  evidence that men and women have different average body fat.
• The absolute value of the test statistic is higher than the t value. You
  reject the hypothesis of equal means. You conclude that there is
  evidence that men and women do not have the same average body fat.

t-Test with unequal variances


When the variances for the two groups are not equal, we cannot use the
pooled estimate of standard deviation. Instead, we take the standard error
for each group separately. The test statistic is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]

The numerator of the test statistic is the same. It is the difference between
the averages of the two groups. The denominator is an estimate of the
overall standard error of the difference between means. It is based on the
separate standard error for each group.

The degrees of freedom calculation for the t value is more complex with
unequal variances than equal variances and is usually left up to statistical
software packages. The key point to remember is that if you cannot use
the pooled estimate of standard deviation, then you cannot use the simple
formula for the degrees of freedom.
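
For illustration, the unequal-variance statistic can be computed for the body fat data. The text does not say which degrees-of-freedom formula the software applies; the sketch below assumes the widely used Welch-Satterthwaite approximation, and the function name `welch_t` is a hypothetical helper.

```python
import math
from statistics import mean, variance

# Body fat percentages copied from Table 1
men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

def welch_t(group1, group2):
    """Return (t statistic, approximate degrees of freedom) without pooling."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # squared standard error of the first mean
    v2 = variance(group2) / n2  # squared standard error of the second mean
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an assumption, see the note above)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t_welch, df_welch = welch_t(women, men)
print(f"t = {t_welch:.2f}, approximate df = {df_welch:.1f}")
```

For these data the approximate degrees of freedom come out close to the pooled value of 21, which is consistent with the two sample variances being fairly similar.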

Testing for normality


The normality assumption is more important when the two groups have
small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on
both sides of the center. Normal distributions do not have extreme values,
or outliers. You can check these two features of a normal distribution with
graphs. Earlier, we decided that the body fat data was “close enough” to
normal to go ahead with the assumption of normality. The figure below
shows a normal quantile plot for men and women, and supports our
decision.

Figure 2: Normal quantile plot of the body fat measurements for men and women

You can also perform a formal test for normality using software. The
figure above shows results of testing for normality with JMP software. We
test each group separately. Both the test for men and the test for women
show that we cannot reject the hypothesis of a normal distribution. We
can go ahead with the assumption that the body fat data for men and for
women are normally distributed.
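
The idea behind a normal quantile plot can be sketched without plotting: pair each sorted observation with the standard normal quantile for its rank and see how close the pairs are to a straight line. The code below is an informal check only, not the formal normality test reported by JMP. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, as are the helper names.

```python
import math
from statistics import NormalDist, mean

# Body fat percentages copied from Table 1
men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

def normal_quantile_pairs(values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))

def straightness(pairs):
    """Pearson correlation of the pairs; values near 1 suggest a straight line."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_men = straightness(normal_quantile_pairs(men))
r_women = straightness(normal_quantile_pairs(women))
print(f"men: r = {r_men:.3f}, women: r = {r_women:.3f}")
```

Correlations close to 1 for both groups are consistent with the roughly straight-line pattern expected in a normal quantile plot of normally distributed data.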

Testing for unequal variances


Testing for unequal variances is complex. We won’t show the calculations
in detail, but will show the results from JMP software. The figure below
shows results of a test for unequal variances for the body fat data.

Figure 3: Test for unequal variances for the body fat data

Without diving into details of the different types of tests for unequal
variances, we will use the F test. Before testing, we decide to accept a 10%
risk of concluding the variances are unequal when they are in fact equal.
This means we have set α = 0.10.

Like most statistical software, JMP shows the p-value for a test. This is the
probability, assuming the null hypothesis is true, of finding a value for the
test statistic at least as extreme as the one observed. It’s difficult to
calculate by hand. For the figure above, with the F test statistic of 1.654,
the p-value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail
to reject the hypothesis of equal variances. In practical terms, we can go
ahead with the two-sample t-test with the assumption of equal variances for
the two groups.
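
The F test statistic itself is simple to reproduce: it is the ratio of the two sample variances. The sketch below assumes the larger variance is placed in the numerator, which matches the reported value of 1.654. The p-value requires the F distribution and is not computed here.

```python
from statistics import variance

# Body fat percentages copied from Table 1
men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

var_men = variance(men)
var_women = variance(women)

# Ratio of the larger sample variance to the smaller one (assumed convention)
f_ratio = max(var_men, var_women) / min(var_men, var_women)
print(f"F ratio = {f_ratio:.3f}")
```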

Understanding p-values
Using a visual, you can check to see if your test statistic is a more extreme
value in the distribution. The figure below shows a t-distribution with 21
degrees of freedom.

Figure 4: t-distribution with 21 degrees of freedom and α = .05

Since our test is two-sided and we have set α = .05, the figure shows that
the value of 2.080 “cuts off” 2.5% of the data in each of the two tails. Only
