Student's t-distribution in Statistics
Last Updated: 04 Nov, 2025
The Student’s t-distribution (or simply the t-distribution) is a probability distribution used in statistics when making inferences about a population mean, particularly when the sample size is small (commonly taken as n ≤ 30) and the population standard deviation (σ) is unknown. It resembles the standard normal distribution but has heavier tails, which account for the extra uncertainty introduced by estimating σ from a small sample. The t-score indicates how many estimated standard errors the sample mean (x̄) is away from the population mean (μ).
Formula for the t-Score
The t-score (or t-statistic) quantifies how many estimated standard errors the sample mean (x̄) is from the population mean (μ):
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
where,
- t = t-score,
- x̄ = sample mean
- μ = population mean,
- s = standard deviation of the sample,
- n = sample size
The t-score helps determine how far the sample mean is from the population mean under the assumption of random sampling.
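As a minimal sketch, the t-score can be computed from summary statistics by dividing the difference x̄ − μ by the estimated standard error s/√n. The helper name `t_score` and the input values (including the hypothesized mean of 3.5) are illustrative assumptions, not taken from a real study.

```python
import math

def t_score(x_bar, mu, s, n):
    """Return (x_bar - mu) divided by the estimated standard error s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical inputs chosen only for illustration
t = t_score(x_bar=4.0, mu=3.5, s=1.5, n=20)
print(f"t-score: {t:.3f}")
```

A positive result means the sample mean lies above the hypothesized population mean; a negative result means it lies below.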
When to Use the t-Distribution
The Student's t-distribution is used when:
- The sample size is small (commonly n ≤ 30).
- The population standard deviation (σ) is unknown.
- The population distribution is approximately normal (unimodal and roughly symmetric).
Interpretation of t-Distribution with Example
Suppose a researcher wants to estimate the average daily study time of students before exams. A random sample of 20 students reports an average (x̄) of 4 hours with a sample standard deviation (s) of 1.5 hours. We want to construct a 90% confidence interval for the true population mean.
Given:
- x̄ = 4 hours, s = 1.5 hours, n = 20 and a 90% confidence level.
- Degrees of freedom = n – 1 = 19
- Critical t-value (from t-table) ≈ 1.729
CI = \bar{x} \pm t \times \frac{s}{\sqrt{n}}
Substituting the given values:
CI = 4 \pm 1.729 \times \frac{1.5}{\sqrt{20}} = (3.42, \, 4.58)
Interpretation: We are 90% confident that the true average study time for all students lies between 3.42 and 4.58 hours per day. Here "90% confident" refers to the procedure: if the sampling were repeated many times, about 90% of the intervals constructed this way would contain the true population mean.
Implementation
Let's implement the example in Python using scipy.stats:
Python
import numpy as np
from scipy import stats
x_bar = 4
s = 1.5
n = 20
confidence = 0.90
df = n - 1
t_critical = stats.t.ppf((1 + confidence) / 2, df)
margin_of_error = t_critical * (s / np.sqrt(n))
lower_bound = x_bar - margin_of_error
upper_bound = x_bar + margin_of_error
print(f"t-critical value: {t_critical:.3f}")
print(f"Confidence Interval (90%): ({lower_bound:.2f}, {upper_bound:.2f})")
Output:
t-critical value: 1.729
Confidence Interval (90%): (3.42, 4.58)
Properties of the t-Distribution
- It is symmetric and bell-shaped, like the normal distribution.
- The variable t ranges from −∞ to +∞.
- As degrees of freedom (df) increase, the t-distribution approaches the standard normal distribution (Z).
- The median and mode are zero for every df; the mean is also zero for df > 1 (it is undefined for df = 1).
- Variance = df / (df − 2) for df > 2, so the variance is always greater than 1 and decreases toward 1 as df increases.
- It has heavier tails than the normal distribution, so extreme values are more probable, especially when df is small.
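The properties above can be checked numerically. The following sketch uses `scipy.stats` to compare critical values, variances and tail probabilities of the t-distribution against the standard normal; the chosen degrees of freedom and the cutoff of 3 are arbitrary illustrative values.

```python
from scipy import stats

# Two-tailed 5% critical value for the standard normal, used as the reference
z_crit = stats.norm.ppf(0.975)

for df in (3, 10, 30, 1000):
    t_crit = stats.t.ppf(0.975, df)   # shrinks toward z_crit as df grows
    variance = stats.t.var(df)        # equals df / (df - 2) for df > 2
    tail = 2 * stats.t.sf(3, df)      # P(|T| > 3), larger for small df
    print(f"df={df:>4}: t_crit={t_crit:.3f}, var={variance:.3f}, P(|T|>3)={tail:.4f}")

print(f"normal : z_crit={z_crit:.3f}, var=1.000, P(|Z|>3)={2 * stats.norm.sf(3):.4f}")
```

Each column should move toward the normal-distribution value in the last row as df increases.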
Key Elements in Student’s t-Distribution
1. t-Distribution Table
- Provides critical t-values for various confidence levels and degrees of freedom.
- Used to determine whether the calculated t-score falls in the rejection region of a hypothesis test.
- Example: At a 5% significance level (α = 0.05), if the calculated |t| exceeds the tabulated t-value, the difference between sample and population means is considered statistically significant.
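In software, the table lookup is typically replaced by the inverse CDF. The sketch below finds the two-tailed critical value at α = 0.05 and compares it with a calculated t-score; the value `t_calculated = 2.50` and df = 19 are hypothetical inputs chosen for illustration.

```python
from scipy import stats

alpha = 0.05          # significance level
df = 19               # degrees of freedom (n - 1 for n = 20)
t_calculated = 2.50   # hypothetical test statistic

# Two-tailed test: alpha is split equally between both tails
t_critical = stats.t.ppf(1 - alpha / 2, df)
reject_null = abs(t_calculated) > t_critical

print(f"Critical t-value: {t_critical:.3f}")
print(f"Reject null hypothesis: {reject_null}")
```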
2. t-Score
- t-Score measures how far a sample mean deviates from the population mean in terms of standard errors.
- Helps determine statistical significance and construct confidence intervals.
- Larger |t| values indicate a greater difference between sample and population means.
3. p-Value
- The p-value represents the probability of obtaining a test result at least as extreme as the one observed, assuming the null hypothesis is true.
- A small p-value (typically < 0.05) suggests strong evidence against the null hypothesis, leading to its rejection.
- It can be obtained from statistical software or a t-table using the calculated t-score and degrees of freedom.
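As a rough sketch, a two-sided p-value can be obtained from a t-score with the survival function of the t-distribution. The t-score of 2.50 and df = 19 below are hypothetical values used only for illustration.

```python
from scipy import stats

t_calculated = 2.50   # hypothetical test statistic
df = 19               # degrees of freedom

# sf(x) = 1 - cdf(x) gives the upper-tail area; doubling it covers both tails
p_two_sided = 2 * stats.t.sf(abs(t_calculated), df)

print(f"Two-sided p-value: {p_two_sided:.4f}")
```

For a one-sided test, the upper-tail area `stats.t.sf(t_calculated, df)` would be used without doubling.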
Applications
- Testing a Population Mean: Determine if a sample mean significantly differs from a known or hypothesized population mean.
- Comparing Two Means: Evaluate whether two independent or paired samples have different means.
- Testing Correlation: Assess if the correlation coefficient between two variables significantly differs from zero.
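The applications above map onto functions in `scipy.stats`. The sketch below runs each test on simulated data; the arrays `group_a` and `group_b`, their parameters and the hypothesized mean of 3.5 are made up for illustration, and the paired test is applied to them only to show the call, since simulated independent samples are not truly paired.

```python
import numpy as np
from scipy import stats

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=4.0, scale=1.5, size=20)
group_b = rng.normal(loc=5.0, scale=1.5, size=20)

# Testing a population mean: is the mean of group_a different from 3.5?
one_sample = stats.ttest_1samp(group_a, popmean=3.5)

# Comparing two means: independent samples (equal variances assumed by default)
independent = stats.ttest_ind(group_a, group_b)

# Comparing two means: paired samples of equal length
paired = stats.ttest_rel(group_a, group_b)

# Testing correlation: is Pearson's r different from zero?
r, p_corr = stats.pearsonr(group_a, group_b)

print(f"One-sample:  t={one_sample.statistic:.3f}, p={one_sample.pvalue:.4f}")
print(f"Independent: t={independent.statistic:.3f}, p={independent.pvalue:.4f}")
print(f"Paired:      t={paired.statistic:.3f}, p={paired.pvalue:.4f}")
print(f"Correlation: r={r:.3f}, p={p_corr:.4f}")
```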
Limitations
- Assumes Normality: The t-distribution relies on the assumption that the underlying population is normally distributed. Significant deviations can lead to inaccurate results.
- Less Useful for Large Samples: As the sample size increases, the t-distribution approaches the normal distribution, so it offers little practical advantage over the latter for large datasets.
- Sensitive to Outliers: Extreme values can distort results, especially in small samples, because a single outlier can shift the sample mean and inflate the sample standard deviation.
- Requires Random Sampling: Valid conclusions depend on random, independent observations. Violating these assumptions reduces the reliability of inferences.
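The sensitivity to outliers can be illustrated with a small experiment. In this sketch, the data values and the hypothesized mean of 3.5 are invented for illustration; a single extreme observation is appended to an otherwise tightly clustered sample and the one-sample t-test is run on both versions.

```python
from scipy import stats

# Hypothetical small sample, tightly clustered around 4
clean = [4.1, 3.9, 4.3, 4.0, 3.8, 4.2, 4.1, 3.9]
# Same sample with one extreme value appended
with_outlier = clean + [9.0]

result_clean = stats.ttest_1samp(clean, popmean=3.5)
result_outlier = stats.ttest_1samp(with_outlier, popmean=3.5)

print(f"Without outlier: t={result_clean.statistic:.2f}, p={result_clean.pvalue:.4f}")
print(f"With outlier:    t={result_outlier.statistic:.2f}, p={result_outlier.pvalue:.4f}")
```

Although the outlier pulls the sample mean further from 3.5, it inflates the sample standard deviation even more, so the t-score drops and the p-value rises.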
Difference Between T-Distribution and Normal Distribution
| Aspect | t-Distribution | Normal Distribution |
|---|---|---|
| Definition | Defined by degrees of freedom (df) depending on sample size | Defined by mean (μ) and standard deviation (σ) |
| Sample Size | Used for small samples (n ≤ 30) | Used for large samples (n > 30) |
| Standard Deviation | Unknown (estimated from sample) | Known |
| Shape | Heavier tails more prone to extreme values | Lighter tails, data closer to mean |
| Application | Hypothesis testing when σ is unknown | When σ is known or sample size is large |
| Range of Critical Values | Wider range due to more uncertainty | Narrower range with less uncertainty |