EDA Final Topic

Calculate and Interpret the Results of Hypothesis Testing
Hypothesis testing is a statistical method used to determine
whether a hypothesis about a population parameter is true
or false. Here's a step-by-step guide:
Hypothesis Testing Process
1. Formulate the null and alternative hypotheses: Define the null
hypothesis (H0) and alternative hypothesis (H1).
2. Choose a significance level (α): Typically 0.05.
3. Select a sample and calculate the test statistic: Calculate the
sample mean, proportion, or other relevant statistic.
4. Determine the critical region: Identify the region where the
null hypothesis is rejected.
5. Calculate the p-value: Probability of observing the test statistic
under H0.
6. Make a decision: Reject H0 if p-value < α or test statistic falls
within the critical region.
7. Interpret the results: Discuss the implications of rejecting or
failing to reject H0.
Types of Hypothesis Tests
1. Z-test: For large samples, comparing means or
proportions.
2. T-test: For small samples, comparing means.
3. Chi-squared test: For categorical data, testing
independence or goodness-of-fit.
4. ANOVA: Comparing means across multiple groups.
5. Regression analysis: Testing relationships between
variables.
Common Hypothesis Testing Formulas
1. Z-score: Z = (X̄ - μ) / (σ / √n)
2. T-statistic: t = (X̄ - μ) / (s / √n)
3. Chi-squared statistic: χ² = Σ [(observed - expected)² /
expected]
4. p-value: Calculated using statistical software or tables.
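Here's a minimal R sketch of these formulas; the numbers are made up purely for illustration:

# z-score for a sample mean when the population sd (sigma) is known
xbar <- 52; mu <- 50; sigma <- 8; n <- 40
z <- (xbar - mu) / (sigma / sqrt(n))               # ≈ 1.58
# t-statistic when sigma is unknown and s estimates it
s <- 8.5
t_stat <- (xbar - mu) / (s / sqrt(n))              # ≈ 1.49
# chi-squared statistic from observed vs expected counts
observed <- c(18, 22, 20); expected <- c(20, 20, 20)
chi_sq <- sum((observed - expected)^2 / expected)  # 0.4
# two-sided p-value for the z statistic
p_z <- 2 * pnorm(-abs(z))                          # ≈ 0.11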
Interpretation of Results
1. Reject H0: Significant difference/effect exists (p-value < α).
2. Fail to reject H0: No significant difference/effect (p-value ≥ α).
3. Type I error: Rejecting H0 when true (α).
4. Type II error: Failing to reject H0 when false (β).
Statistical inference of two samples
Statistical inference for two samples involves comparing the
characteristics of two groups to determine if there's a
significant difference between them. Here's a comprehensive
overview:
Types of Two-Sample Tests
1. Independent Samples T-Test: Compares means of two
independent groups.
2. Paired Samples T-Test: Compares means of two related
groups (e.g., before-after).
3. Mann-Whitney U Test: Compares medians of two
independent groups (non-parametric).
4. Wilcoxon Signed-Rank Test: Compares medians of two related
groups (non-parametric).
5. Chi-Squared Test: Compares proportions of two independent
groups.
Assumptions
1. Independence: Samples are randomly selected and
independent.
2. Normality: Data follows a normal distribution (for parametric
tests).
3. Equal Variances: Variances are equal across groups (for
parametric tests).
Hypothesis Testing
1. Null Hypothesis (H0): No significant difference between
groups.
2. Alternative Hypothesis (H1): Significant difference between
groups.
3. Significance Level (α): Typically 0.05.
4. Test Statistic: Calculated value (e.g., t-statistic, z-score).
5. p-value: Probability of observing the test statistic under H0.
Interpretation
1. Reject H0: Significant difference between groups (p-value < α).
2. Fail to reject H0: No significant difference between groups (p-
value ≥ α).
3. Type I Error: Rejecting H0 when true (α).
4. Type II Error: Failing to reject H0 when false (β).
Common Formulas
1. Independent Samples T-Test:
t = (X̄1 - X̄2) / sqrt((s1^2/n1) + (s2^2/n2))
2. Paired Samples T-Test: t = (X̄d) / (sd / sqrt(n))
3. Mann-Whitney U Test: U = n1n2 + (n1(n1+1)/2) - R1
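Here's a minimal R sketch of the Mann-Whitney U formula above, with two tiny made-up samples:

g1 <- c(3, 5, 8); g2 <- c(2, 4, 9)   # hypothetical data
n1 <- length(g1); n2 <- length(g2)
R1 <- sum(rank(c(g1, g2))[1:n1])     # rank sum of group 1 in the pooled sample
U <- n1*n2 + n1*(n1 + 1)/2 - R1      # U = 4 for these data
# Note: base R's wilcox.test() reports W = R1 - n1(n1+1)/2,
# so the U above equals n1*n2 - W.
wilcox.test(g1, g2)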
Problem
A coffee shop owner claims that the average amount of coffee
consumed per customer is 250ml. A random sample of 36
customers shows an average consumption of 270ml with a
standard deviation of 50ml. Test the claim at a 5% significance
level.
Given Values
1. Sample size (n): 36
2. Sample mean (X̄): 270ml
3. Sample standard deviation (s): 50ml
4. Population mean (μ): 250ml (claimed)
5. Significance level (α): 0.05
Hypothesis
1. H0: μ = 250 (null hypothesis)
2. H1: μ ≠ 250 (alternative hypothesis)
Calculation
1. Calculate the test statistic (t):
t = (X̄ - μ) / (s / √n)
t = (270 - 250) / (50 / √36)
t = 20 / 8.33
t ≈ 2.4
2. Determine the degrees of freedom:
df = n - 1
df = 36 - 1
df = 35
3. Find the critical t-value:
Using a t-distribution table or software, find the critical t-value
for α = 0.05 and df = 35.
t-critical ≈ ±2.030
4. Calculate the p-value:
Using software or a t-distribution table, find the p-value
associated with
t ≈ 2.4 and df = 35.
p-value ≈ 0.022
Interpretation
Since the calculated t-value (2.4) exceeds the critical t-value
(2.030), and the p-value (0.022) is less than α (0.05), we:
1. Reject H0: The average amount of coffee consumed per
customer is significantly different from 250 ml.
2. Conclude: The coffee shop owner's claim is not supported; the
sample mean of 270 ml suggests actual consumption is higher than 250 ml.
Software Output
Base R's t.test() requires the raw observations, so with only summary statistics we reproduce the test directly (a minimal sketch):

xbar <- 270; mu0 <- 250; s <- 50; n <- 36
t_stat <- (xbar - mu0) / (s / sqrt(n))               # 2.4
df <- n - 1                                          # 35
p_value <- 2 * pt(-abs(t_stat), df)                  # ≈ 0.022
ci <- xbar + c(-1, 1) * qt(0.975, df) * s / sqrt(n)  # ≈ (253.1, 286.9)

These values match the manual calculations above.
One-Sided Hypothesis (Directional Test)
1. Alternative hypothesis (H1) is directional: Specifies the
direction of the difference or relationship.
2. Null hypothesis (H0) is opposite: States the absence of the
specified effect or difference.
3. Tested in one direction: Only one tail of the distribution is
considered.
4. Example: H0: μ ≤ 10, H1: μ > 10 (testing if the mean is greater
than 10)
Two-Sided Hypothesis (Non-Directional Test)
1. Alternative hypothesis (H1) is non-directional: Doesn't specify
the direction of the difference or relationship.
2. Null hypothesis (H0) states equality: States the absence of any
difference or relationship.
3. Tested in both directions: Both tails of the distribution are
considered.
4. Example: H0: μ = 10, H1: μ ≠ 10 (testing if the mean is
different from 10)
Key Differences
1. Directionality: One-sided tests have a directional alternative
hypothesis, while two-sided tests have a non-directional
alternative hypothesis.
2. Null hypothesis: One-sided tests have a null hypothesis that
specifies the absence of the effect in one direction, while two-
sided tests have a null hypothesis that specifies equality.
3. Critical region: One-sided tests have a critical region in one
tail, while two-sided tests have critical regions in both tails.
4. p-value calculation: One-sided tests calculate the p-value
using the area in one tail, while two-sided tests calculate the p-
value using the area in both tails.
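To make this p-value difference concrete, here's a minimal R sketch with a hypothetical t-statistic:

t_stat <- 2.0; df <- 30                       # made-up values
p_one <- pt(t_stat, df, lower.tail = FALSE)   # one tail ≈ 0.027
p_two <- 2 * p_one                            # both tails ≈ 0.054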
When to Use
1. One-sided test: Use when:
- You have a prior expectation about the direction of the effect.
- The alternative hypothesis is directional.
- You want to detect an increase or decrease.
2. Two-sided test: Use when:
- You don't have a prior expectation about the direction of the
effect.
- The alternative hypothesis is non-directional.
- You want to detect any difference (increase or decrease).
Examples
1. One-sided test:
Testing whether a new medicine increases blood pressure
(H0: μ ≤ 120, H1: μ > 120).
2. Two-sided test:
Testing whether a new exercise program affects blood pressure
(H0: μ = 120, H1: μ ≠ 120).
Common Mistakes
1. Incorrectly specifying the direction of the alternative
hypothesis.
2. Failing to consider the directionality of the test.
3. Misinterpreting the results of a one-sided test.
Problem
A company claims that the average lifespan of its batteries is at
least 500 hours. A random sample of 25 batteries has a mean
lifespan of 520 hours with a standard deviation of 30 hours. Test
the claim at a 5% significance level.
Given Values
1. Sample size (n): 25
2. Sample mean (X̄): 520
3. Sample standard deviation (s): 30
4. Population mean (μ): 500 (claimed)
5. Significance level (α): 0.05
Hypotheses
1. H0: μ ≤ 500 (null hypothesis)
2. H1: μ > 500 (alternative hypothesis, one-sided)
Test Statistic
1. t = (X̄ - μ) / (s / √n)
2. t = (520 - 500) / (30 / √25)
3. t = 20 / 6
4. t ≈ 3.33
Degrees of Freedom
1. df = n - 1
2. df = 25 - 1
3. df = 24
Critical Region
1. One-sided test, α = 0.05
2. Critical t-value ≈ 1.711 (using t-distribution table)
p-value
1. p-value ≈ 0.0014 (using t-distribution table or software)
Decision
1. Reject H0: Since t ≈ 3.33 > 1.711 and p-value ≈ 0.0014 < α = 0.05.
2. Conclude: The data support the claim that the average battery
lifespan is greater than 500 hours.
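Here's a minimal R sketch of this one-sided test, computed from the summary statistics above:

xbar <- 520; mu0 <- 500; s <- 30; n <- 25
t_stat <- (xbar - mu0) / (s / sqrt(n))                     # ≈ 3.33
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # ≈ 0.0014
# p-value < 0.05, so H0 is rejected in favor of mu > 500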
Here's a sample problem with a solution for a two-sided hypothesis:
Problem
A researcher claims that the average height of adults in
a population is 175 cm. A random sample of 36 adults has a
mean height of 178 cm with a standard deviation of 8 cm. Test
the claim at a 5% significance level.
Given Values
1. Sample size (n): 36
2. Sample mean (X̄): 178 cm
3. Sample standard deviation (s): 8 cm
4. Population mean (μ): 175 cm (claimed)
5. Significance level (α): 0.05
Hypotheses
1. H0: μ = 175 (null hypothesis)
2. H1: μ ≠ 175 (alternative hypothesis, two-sided)
Test Statistic
1. t = (X̄ - μ) / (s / √n)
2. t = (178 - 175) / (8 / √36)
3. t = 3 / 1.333
4. t = 2.25
Degrees of Freedom
1. df = n - 1
2. df = 36 - 1
3. df = 35
Critical Region
1. Two-sided test, α = 0.05
2. Critical t-values ≈ ±2.030 (using t-distribution table)
p-value
1. p-value ≈ 0.031 (using t-distribution table or software)
Decision
1. Reject H0:
Since t = 2.25 > 2.030 and p-value ≈ 0.031 < α = 0.05.
2. Conclude: The researcher's claim is not supported. The average
height of adults is significantly different from 175 cm.
Software Output
As before, t.test() needs raw data, so here's a minimal sketch reproducing the result from the summary statistics:

xbar <- 178; mu0 <- 175; s <- 8; n <- 36
t_stat <- (xbar - mu0) / (s / sqrt(n))                  # 2.25
p_value <- 2 * pt(-abs(t_stat), df = n - 1)             # ≈ 0.031
ci <- xbar + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)  # ≈ (175.3, 180.7); excludes 175
Interpretation
The test results indicate that the average height of adults is
significantly different from 175 cm, supporting the alternative
hypothesis. This suggests that the researcher's claim may be incorrect.
Here are key concepts and formulas for testing the mean of a
normal distribution:
Hypothesis Testing
1. Null Hypothesis (H0): μ = μ0 (population mean equals a known
value)
2. Alternative Hypothesis (H1): μ ≠ μ0 (two-tailed), μ > μ0 (one-
tailed, right), or μ < μ0 (one-tailed, left)
3. Test Statistic: z = (x̄ - μ0) / (σ / √n) or t = (x̄ - μ0) / (s / √n)
4. p-value: Probability of observing the test statistic under H0
5. Critical Region: Range of values where H0 is rejected
Types of Tests
1. One-Sample Z-Test: Known population standard deviation (σ)
2. One-Sample T-Test: Unknown population standard deviation
(s)
3. Two-Sample Z-Test: Comparing means of two independent
samples
4. Two-Sample T-Test: Comparing means of two independent
samples
Formulas
1. One-Sample Z-Test: z = (x̄ - μ0) / (σ / √n)
2. One-Sample T-Test: t = (x̄ - μ0) / (s / √n)
3. Two-Sample Z-Test:
z = ((x̄1 - x̄2) - (μ1 - μ2)) / √((σ1²/n1) + (σ2²/n2))
4. Two-Sample T-Test:
t = ((x̄1 - x̄2) - (μ1 - μ2)) / √((s1²/n1) + (s2²/n2))
Assumptions
1. Normality: Data follows a normal distribution
2. Independence: Observations are independent
3. Equal Variances: Variances are equal across groups (for two-
sample tests)
Example
Suppose we want to test whether the average height of adults in
a population is 175 cm, given a sample of 36 adults with a mean
height of 178 cm and standard deviation of 8 cm.
1. H0: μ = 175
2. H1: μ ≠ 175
3. Test statistic:
t = (178 - 175) / (8 / √36) = 2.25
4. p-value ≈ 0.031
5. Reject H0, conclude μ ≠ 175
Here's a sample problem with a solution:
Problem
A manufacturing company claims that the average weight of its
bags of flour is 2 kg. A random sample of 25 bags has a mean
weight of 2.1 kg and a standard deviation of 0.2 kg. Test the
claim at a 5% significance level.
Given Values
1. Sample size (n): 25
2. Sample mean (x̄): 2.1 kg
3. Sample standard deviation (s): 0.2 kg
4. Population mean (μ0): 2 kg (claimed)
5. Significance level (α): 0.05
Hypotheses
1. H0: μ = 2 kg (null hypothesis)
2. H1: μ ≠ 2 kg (alternative hypothesis, two-tailed)
Test Statistic
1. t = (x̄ - μ0) / (s / √n)
2. t = (2.1 - 2) / (0.2 / √25)
3. t = 0.1 / 0.04
4. t = 2.5
Degrees of Freedom
1. df = n - 1
2. df = 25 - 1
3. df = 24
Critical Region
1. Two-tailed test, α = 0.05
2. Critical t-values ≈ ±2.064 (using t-distribution table)
p-value
1. p-value ≈ 0.020 (using t-distribution table or software)
Decision
1. Reject H0: Since t = 2.5 > 2.064 and p-value ≈ 0.020 < α = 0.05.
2. Conclude: The company's claim is incorrect. The average
weight of bags of flour is significantly different from 2 kg.
Software Output
Again, a minimal summary-statistics sketch in place of t.test(), which needs raw data:

xbar <- 2.1; mu0 <- 2; s <- 0.2; n <- 25
t_stat <- (xbar - mu0) / (s / sqrt(n))                  # 2.5
p_value <- 2 * pt(-abs(t_stat), df = n - 1)             # ≈ 0.020
ci <- xbar + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)  # ≈ (2.017, 2.183); excludes 2
Interpretation
The test results indicate that the average weight of bags of flour
is significantly different from 2 kg, supporting the alternative
hypothesis. This suggests that the company's claim may be
incorrect.
Here's a comprehensive guide to testing the variance and
standard deviation of a normal distribution:
Hypothesis Testing
1. Null Hypothesis (H0): σ² = σ0² (population variance equals a
known value)
2. Alternative Hypothesis
(H1): σ² ≠ σ0² (two-tailed), σ² > σ0² (one-tailed, right),
or σ² < σ0² (one-tailed, left)
3. Test Statistic: χ² = (n - 1)s² / σ0²
4. p-value: Probability of observing the test statistic under H0
5. Critical Region: Range of values where H0 is rejected
Assumptions
1. Normality: Data follows a normal distribution
2. Independence: Observations are independent
3. Random Sampling: Sample is randomly selected
Test Procedure
1. Calculate sample variance (s²).
2. Choose significance level (α).
3. Determine degrees of freedom (df = n - 1).
4. Calculate test statistic (χ²).
5. Find p-value or critical χ²-value.
6. Make decision: Reject H0 if p-value < α or χ² > critical χ²-value.
Formulas
1. Sample Variance: s² = Σ(xi - x̄)² / (n - 1)
2. Test Statistic: χ² = (n - 1)s² / σ0²
3. Standard Error: SE = √(s² / (2(n - 1)))
Types of Tests
1. One-Sample Chi-Square Test: Testing variance of one sample.
2. Two-Sample F-Test: Comparing variances of two independent
samples.
Example
Suppose we want to test whether the variance of exam scores is
100, given a sample of 25 students with a sample variance of
120.
1. H0: σ² = 100
2. H1: σ² ≠ 100
3. α = 0.05
4. df = 25 - 1 = 24
5. χ² = (24)(120) / 100 = 28.8
6. p-value ≈ 0.45 (two-sided; upper-tail area ≈ 0.23)
7. Since 12.401 < 28.8 < 39.364 (the critical χ²-values for df = 24),
fail to reject H0; the data are consistent with σ² = 100.
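Here's a minimal R sketch of this variance test, assuming the numbers above:

n <- 25; s2 <- 120; sigma0_sq <- 100
chi_sq <- (n - 1) * s2 / sigma0_sq                         # 28.8
p_upper <- pchisq(chi_sq, df = n - 1, lower.tail = FALSE)  # ≈ 0.23
p_two_sided <- 2 * min(p_upper, 1 - p_upper)               # ≈ 0.45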
Here's a sample problem:
Problem
A manufacturer claims that the variance of the weights of its
bags of flour is 0.04 kg². A random sample of 25 bags has a
sample variance of 0.06 kg². Test the claim at a 5% significance
level.
Given Values
1. Sample size (n): 25
2. Sample variance (s²): 0.06 kg²
3. Population variance (σ0²): 0.04 kg² (claimed)
4. Significance level (α): 0.05
Hypotheses
1. H0: σ² = 0.04 kg² (null hypothesis)
2. H1: σ² ≠ 0.04 kg² (alternative hypothesis, two-tailed)
Test Statistic
1. χ² = (n - 1)s² / σ0²
2. χ² = (25 - 1)(0.06) / 0.04
3. χ² = 24(0.06) / 0.04
4. χ² = 36
Critical Region
1. Two-tailed test, α = 0.05
2. Critical χ²-values ≈ 12.401 and 39.364 (using χ²-distribution
table)
p-value
1. p-value ≈ 0.11 (two-sided; upper-tail area P(χ² ≥ 36) ≈ 0.055)
Decision
1. Fail to reject H0: Since 12.401 < χ² = 36 < 39.364 and p-value
≈ 0.11 > α = 0.05.
2. Conclude: There is not enough evidence that the variance of the
weights differs from 0.04 kg².
Interpretation
Although the sample variance (0.06 kg²) is larger than the claimed
0.04 kg², the test statistic does not fall in the critical region, so
the data do not contradict the manufacturer's claim at the 5%
significance level.
Here's a comprehensive guide on test on a population
proportion:
Hypothesis Testing
1. Null Hypothesis (H0): p = p0 (population proportion equals a
known value)
2. Alternative Hypothesis (H1): p ≠ p0 (two-tailed), p > p0 (one-
tailed, right), or p < p0 (one-tailed, left)
3. Test Statistic: z = (p̂ - p0) / √(p0(1-p0)/n)
4. p-value: Probability of observing the test statistic under H0
5. Critical Region: Range of values where H0 is rejected
Assumptions
1. Random Sampling: Sample is randomly selected
2. Independence: Observations are independent
3. Large Sample Size: np0 ≥ 5 and n(1 - p0) ≥ 5
Test Procedure
1. Calculate sample proportion (p̂).
2. Choose significance level (α).
3. Determine test statistic (z).
4. Find p-value or critical z-value.
5. Make decision: Reject H0 if p-value < α or z > critical z-value.
Formulas
1. Sample Proportion: p̂ = (Number of successes) / n
2. Test Statistic: z = (p̂ - p0) / √(p0(1-p0)/n)
3. Standard Error: SE = √(p0(1-p0)/n)
Types of Tests
1. One-Proportion Z-Test: Testing proportion of one population.
2. Two-Proportion Z-Test: Comparing proportions of two
independent populations.
Example
Suppose we want to test whether the proportion of smokers in a
population is 0.3, given a sample of 100 individuals with 35 smokers.
1. H0: p = 0.3
2. H1: p ≠ 0.3
3. α = 0.05
4. p̂ = 35/100 = 0.35
5. z = (0.35 - 0.3) / √(0.3(1-0.3)/100) = 0.05 / 0.0458 ≈ 1.09
6. p-value ≈ 0.28
7. Fail to reject H0; the data are consistent with p = 0.3
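Here's a minimal R sketch of this proportion test; prop.test() is base R's built-in version (it uses the equivalent chi-squared form):

p_hat <- 35/100; p0 <- 0.3; n <- 100
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)           # ≈ 1.09
p_value <- 2 * pnorm(-abs(z))                         # ≈ 0.28
prop.test(x = 35, n = 100, p = 0.3, correct = FALSE)  # same test, X-squared = z^2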
Here's a sample problem on testing a population proportion:
Problem
A company claims that 80% of its customers are satisfied with
their service. A random sample of 200 customers found 154
satisfied customers. Test the claim at a 5% significance level.
Given Values
1. Sample size (n): 200
2. Number of successes (x): 154 (satisfied customers)
3. Sample proportion (p̂): 154/200 = 0.77
4. Population proportion (p0): 0.8 (claimed)
5. Significance level (α): 0.05
Hypotheses
1. H0: p = 0.8 (null hypothesis)
2. H1: p ≠ 0.8 (alternative hypothesis, two-tailed)
Test Statistic
1. z = (p̂ - p0) / √(p0(1-p0)/n)
2. z = (0.77 - 0.8) / √(0.8(1-0.8)/200) = -0.03 / 0.0283
3. z ≈ -1.06
Degrees of Freedom: Not applicable for a one-proportion z-test.
Critical Region
1. Two-tailed test, α = 0.05
2. Critical z-values ≈ ±1.96
p-value
p-value ≈ 0.29 (using z-distribution table or software)
Decision
1. Fail to reject H0: Since |z| ≈ 1.06 < 1.96 and p-value ≈ 0.29 > α =
0.05.
2. Conclude: The sample does not provide significant evidence
against the company's claim that 80% of its customers are
satisfied.
Here's an overview of statistical inference for two samples:
Hypothesis Testing
1. Two-Sample T-Test: Compare means of two independent
samples.
2. Two-Sample Z-Test: Compare proportions of two independent
samples.
3. Wilcoxon Rank-Sum Test: Compare medians of two
independent samples (non-parametric).
4. Mann-Whitney U Test: Compare distributions of two
independent samples (non-parametric).
Confidence Intervals
1. Two-Sample T-Interval: Estimate difference between means.
2. Two-Sample Z-Interval: Estimate difference between
proportions.
Assumptions
1. Independence: Samples are randomly selected and
independent.
2. Normality: Data follows normal distribution (for parametric
tests).
3. Equal Variances: Variances are equal across samples (for
parametric tests).
Test Statistics
1. Two-Sample T-Test:
t = (x̄1 - x̄2) / sqrt((s1²/n1) + (s2²/n2))
2. Two-Sample Z-Test:
z = (p̂1 - p̂2) / sqrt((p̂1(1-p̂1)/n1) + (p̂2(1-p̂2)/n2))
Interpretation
1. p-value: Probability of observing test statistic under null
hypothesis.
2. Confidence Interval: Range of values for population
parameter.
3. Effect Size: Standardized difference between means (e.g.,
Cohen's d).
Common Tests
1. Paired T-Test: Compare means of paired samples.
2. Two-Sample Test of Proportions: Compare proportions of two
independent samples.
3. Kruskal-Wallis Test: Compare three or more independent
samples (non-parametric alternative to one-way ANOVA).
Example
Suppose we want to compare the average heights of males and
females.
1. H0: μ1 = μ2 (null hypothesis)
2. H1: μ1 ≠ μ2 (alternative hypothesis)
3. α = 0.05
4. Sample sizes: n1 = 50 (males), n2 = 50 (females)
5. Sample means: x̄1 = 175.2 cm, x̄2 = 162.1 cm
6. Sample standard deviations: s1 = 5.5 cm, s2 = 4.8 cm
7. t = 13.1 / √((5.5²/50) + (4.8²/50)) ≈ 12.7
8. p-value < 0.0001
9. Reject H0, conclude μ1 ≠ μ2.
Problem
A researcher wants to compare the average exam scores of students
from two different teaching methods: Method A (traditional) and
Method B (online). A random sample of 25 students from each
method yielded:
Given Values
1. Method A (Traditional):
- Sample size (n1): 25
- Sample mean (x̄1): 80
- Sample standard deviation (s1): 10
2. Method B (Online):
- Sample size (n2): 25
- Sample mean (x̄2): 85
- Sample standard deviation (s2): 12
3. Significance level (α): 0.05
Hypotheses
1. H0: μ1 = μ2 (null hypothesis)
2. H1: μ1 ≠ μ2 (alternative hypothesis, two-tailed)
Test Statistic
Two-Sample T-Test:
1. t = (x̄1 - x̄2) / sqrt((s1²/n1) + (s2²/n2))
2. t = (80 - 85) / sqrt((10²/25) + (12²/25)) = -5 / sqrt(9.76)
3. t ≈ -1.60
Degrees of Freedom
1. df = n1 + n2 - 2
2. df = 25 + 25 - 2
3. df = 48
Critical Region
1. Two-tailed test, α = 0.05
2. Critical t-values ≈ ±2.011 (df = 48, using t-distribution table)
p-value
1. p-value ≈ 0.12 (using t-distribution table or software)
Decision
1. Fail to reject H0: Since |t| ≈ 1.60 < 2.011 and p-value ≈ 0.12 > α
= 0.05.
2. Conclude: No significant difference between the average exam
scores of students under Method A and Method B.
Confidence Interval
1. Two-Sample T-Interval: (x̄1 - x̄2) ± tα/2 sqrt((s1²/n1) + (s2²/n2))
2. 95% CI: -5 ± 2.011(3.124) ≈ (-11.28, 1.28)
Software Output
t.test() needs the raw scores, so here's a minimal sketch reproducing the test from the summary statistics (with equal group sizes, the pooled and unpooled standard errors coincide):

x1 <- 80; x2 <- 85; s1 <- 10; s2 <- 12; n1 <- 25; n2 <- 25
se <- sqrt(s1^2/n1 + s2^2/n2)                    # ≈ 3.12
t_stat <- (x1 - x2) / se                         # ≈ -1.60
df <- n1 + n2 - 2                                # 48
p_value <- 2 * pt(-abs(t_stat), df)              # ≈ 0.12
ci <- (x1 - x2) + c(-1, 1) * qt(0.975, df) * se  # ≈ (-11.28, 1.28)
Inference on the difference in means of two normal distributions
with known variances involves hypothesis testing and confidence
intervals.
Hypothesis Testing
1. Null Hypothesis (H0): μ1 = μ2 (equal means)
2. Alternative Hypothesis (H1): μ1 ≠ μ2 (unequal means)
3. Test Statistic: Z = (x̄1 - x̄2) / sqrt((σ1^2 / n1) + (σ2^2 / n2))
4. Critical Region: Reject H0 if |Z| > Zα/2 (two-tailed test)
Confidence Interval
1. Confidence Interval: (x̄1 - x̄2) ± Zα/2 * sqrt((σ1^2 / n1) + (σ2^2 / n2))
2. Margin of Error: Zα/2 * sqrt((σ1^2 / n1) + (σ2^2 / n2))
Assumptions
1. Normality: Both populations are normally distributed.
2. Independence: Samples are independent.
3. Known Variances: Population variances (σ1^2, σ2^2) are
known.
Example
Suppose we want to compare the mean heights of men and women.
1. Sample Data:
- Men (n1 = 100): x̄1 = 175.2 cm, σ1^2 = 10^2
- Women (n2 = 100): x̄2 = 162.1 cm, σ2^2 = 8^2
2. Hypothesis Test:
- H0: μ1 = μ2
- H1: μ1 ≠ μ2
- Z = (175.2 - 162.1) / sqrt((10^2 / 100) + (8^2 / 100)) = 13.1 / sqrt(1.64) ≈ 10.23
- Reject H0 (p-value ≈ 0)
3. 95% Confidence Interval:
- (175.2 - 162.1) ± 1.96 * sqrt((10^2 / 100) + (8^2 / 100)) = 13.1 ± 2.51
≈ (10.6, 15.6)
Key Considerations
1. Sample Size:
Ensure adequate sample sizes (n1, n2) for reliable inference.
2. Variance Homogeneity: Verify equal variances (σ1^2 = σ2^2) for
pooled variance estimates.
3. Non-Parametric Alternatives: Consider non-parametric tests (e.g.,
Wilcoxon rank-sum test) for non-normal data.
Example
Suppose we want to compare the mean exam scores of two classes.
1. Sample 1 (Class A): x̄1 = 85, n1 = 50, σ1^2 = 100
2. Sample 2 (Class B): x̄2 = 80, n2 = 60, σ2^2 = 120
3. Hypothesis:
H0: μ1 = μ2
H1: μ1 ≠ μ2
4. α: 0.05
Solution
1. Test statistic: Z = (85 - 80) / sqrt((100 / 50) + (120 / 60)) = 5 / sqrt(4) = 2.5
2. Critical region: Reject H0 if |Z| > 1.96
3. p-value: P(|Z| > 2.5) ≈ 0.012
4. Decision: Reject H0 (p-value < 0.05); the mean scores differ significantly.
95% Confidence Interval
1. Margin of error: 1.96 * sqrt((100 / 50) + (120 / 60)) = 1.96 * 2 = 3.92
2. Confidence interval: (85 - 80) ± 3.92 = (1.08, 8.92)
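Here's a minimal R sketch of this known-variance z-test, using the class data above:

x1 <- 85; x2 <- 80; var1 <- 100; var2 <- 120; n1 <- 50; n2 <- 60
se <- sqrt(var1/n1 + var2/n2)           # 2
z <- (x1 - x2) / se                     # 2.5
p_value <- 2 * pnorm(-abs(z))           # ≈ 0.012
ci <- (x1 - x2) + c(-1, 1) * 1.96 * se  # (1.08, 8.92)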
Inference on the Difference in Means of Two Normal Distributions,
Variances Unknown
Hypothesis Testing
1. Null Hypothesis (H0): μ1 = μ2 (equal means)
2. Alternative Hypothesis (H1): μ1 ≠ μ2 (unequal means)
3. Test Statistic: t = (x̄1 - x̄2) / sqrt((s1^2 / n1) + (s2^2 / n2))
4. Degrees of Freedom: Typically, min(n1-1, n2-1)
Confidence Interval
1. Confidence Interval: (x̄1 - x̄2) ± tα/2 * sqrt((s1^2 / n1) + (s2^2 / n2))
2. Margin of Error: tα/2 * sqrt((s1^2 / n1) + (s2^2 / n2))
Assumptions
1. Normality: Both populations are normally distributed.
2. Independence: Samples are independent.
3. Equal Variances: Population variances (σ1^2, σ2^2) are equal.
Types of Tests
1. Pooled Variance Test: Assumes equal variances.
2. Welch's Test: Does not assume equal variances.
3. t-Test: Suitable for small samples.
Considerations
1. Sample Size: Ensure adequate sample sizes (n1, n2).
2. Variance Homogeneity: Verify equal variances.
3. Non-Parametric Alternatives: Consider non-parametric tests for non-
normal data.
Example
Compare the mean heights of men and women.
Given Data
1. Men: x̄1 = 175.2 cm, n1 = 100, s1^2 = 10^2
2. Women: x̄2 = 162.1 cm, n2 = 100, s2^2 = 8^2
3. α = 0.05
Hypothesis
1. H0: μ1 = μ2 (equal means)
2. H1: μ1 ≠ μ2 (unequal means)
Solution: Pooled Variance Test
1. Pooled variance: sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)
= (99(100) + 99(64)) / 198 = 82
2. Standard error: SE = sqrt(sp^2 * (1/n1 + 1/n2))
= sqrt(82 * (1/100 + 1/100)) ≈ 1.28
3. Test statistic: t = (x̄1 - x̄2) / SE = (175.2 - 162.1) / 1.28 ≈ 10.23
4. Degrees of freedom: df = n1 + n2 - 2 = 198
5. Critical value: t0.025,198 ≈ 1.97
6. p-value: P(|t| > 10.23) ≈ 0
Conclusion
Reject H0. Mean heights differ significantly (p < 0.05).
Confidence Interval
1. Margin of error: ME = t0.025,198 * SE ≈ 1.97 * 1.28 ≈ 2.52
2. 95% CI: (175.2 - 162.1) ± 2.52 ≈ (10.58, 15.62)
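Here's a minimal R sketch of the pooled-variance test from these summary statistics:

x1 <- 175.2; x2 <- 162.1; s1_sq <- 100; s2_sq <- 64; n1 <- 100; n2 <- 100
sp_sq <- ((n1 - 1)*s1_sq + (n2 - 1)*s2_sq) / (n1 + n2 - 2)  # 82
se <- sqrt(sp_sq * (1/n1 + 1/n2))                  # ≈ 1.28
t_stat <- (x1 - x2) / se                           # ≈ 10.23
p_value <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)  # ≈ 0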
Simple Linear Regression
1. Definition: A statistical method to model the relationship between a
dependent variable (y) and an independent variable (x).
2. Equation: y = β0 + β1x + ε (ε = error term)
3. Goals: Predict y values, identify relationships, and estimate
coefficients (β0, β1).
4. Assumptions: Linearity, independence, homoscedasticity, and
normality of errors (multicollinearity arises only with multiple predictors).
Correlation
1. Definition: Measures the strength and direction of a linear
relationship between two variables.
2. Types: Pearson's r (parametric), Spearman's ρ (non-parametric), and
Kendall's τ.
3. Interpretation: Values range from -1 (perfect negative correlation) to
1 (perfect positive correlation).
4. Correlation coefficient (r): Measures strength and direction.
Differences
1. Purpose: Regression predicts y values, while correlation measures
relationship strength.
2. Direction: Regression is directional (x predicts y), whereas correlation
is symmetric; neither by itself establishes causality.
3. Equation: Regression provides a predictive model, whereas correlation
provides a coefficient.
Relationship Between Regression and Correlation
1. Correlation coefficient (r): Square root of R-squared (the coefficient of
determination) in simple linear regression, with the sign of the slope.
2. R-squared: Measures variability explained by the regression model.
3. Regression slope (β1): Related to correlation coefficient (r).
Statistical Tests
1. t-test: Evaluates regression coefficients.
2. F-test: Assesses overall model significance.
3. p-value: Indicates probability of observing results by chance.
Common Metrics
1. Mean Squared Error (MSE): Measures regression model accuracy.
2. Coefficient of Determination (R-squared): Evaluates model fit.
3. Root Mean Squared Error (RMSE): Measures model accuracy.
Problem
A bakery wants to predict the number of bread loaves
sold based on the number of hours advertised (x). The data:
| Hours Advertised (x) | Bread Loaves Sold (y) |
| --- | --- |
| 2 | 100 |
| 4 | 150 |
| 6 | 200 |
| 8 | 250 |
| 10 | 300 |
Step-by-Step Solution
1. Calculate means:
x̄ = (2+4+6+8+10)/5 = 6,
ȳ = (100+150+200+250+300)/5 = 200
2. Calculate deviations:
| x | y | x - x̄ | y - ȳ | (x-x̄)(y-ȳ) | (x-x̄)² |
| --- | --- | --- | --- | --- | --- |
| 2 | 100 | -4 | -100 | 400 | 16 |
| 4 | 150 | -2 | -50 | 100 | 4 |
| 6 | 200 | 0 | 0 | 0 | 0 |
| 8 | 250 | 2 | 50 | 100 | 4 |
| 10 | 300 | 4 | 100 | 400 | 16 |
3. Calculate the slope (β1):
β1 = Σ[(x - x̄)(y - ȳ)] / Σ(x - x̄)²
= (400+100+0+100+400) / (16+4+0+4+16)
= 1000 / 40
= 25
4. Calculate intercept (β0):
β0 = 200 - 25*6
= 200 - 150
= 50
5. Linear Regression Equation:
y = 50 + 25x
6. Plot the data and fitted line on a scatter plot (see the R sketch below).
7. Interpret the results:
- Every additional hour advertised increases bread loaves sold by 25.
- The bakery sells 50 loaves when no hours are advertised.
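Here's a minimal R sketch that fits the same line with lm() and draws the scatter plot:

x <- c(2, 4, 6, 8, 10)
y <- c(100, 150, 200, 250, 300)
fit <- lm(y ~ x)   # least-squares fit
coef(fit)          # intercept 50, slope 25
plot(x, y)         # scatter plot of the data
abline(fit)        # overlay the fitted line y = 50 + 25x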
Empirical models are mathematical models based on observed
data, experience and statistical analysis rather than purely
theoretical assumptions. They describe relationships between
variables, predict outcomes and estimate parameters.
Types of Empirical Models
1. Linear Regression: Models linear relationships between
variables.
2. Non-Linear Regression: Models non-linear relationships
using polynomial or logarithmic functions.
3. Time Series Models: Analyze and forecast data with
temporal dependencies (e.g., ARIMA, Exponential Smoothing).
4. Machine Learning Models: Algorithms like decision trees,
random forests, and neural networks.
5. Econometric Models: Study economic relationships and
forecast economic indicators (e.g., GDP, inflation).
6. Statistical Models: Hypothesis testing and confidence
intervals (e.g., t-tests, ANOVA).
Characteristics
1. Data-driven: Derived from observational data.
2. Pragmatic: Focus on predictive accuracy rather than
theoretical purity.
3. Flexible: Can accommodate non-linear relationships and
interactions.
4. Interpretable: Provide insights into variable relationships.
Applications
1. Forecasting: Predict future values of economic indicators,
sales or demand.
2. Policy Evaluation: Assess impact of policy interventions.
3. Risk Analysis: Estimate probability of adverse events.
4. Optimization: Identify optimal settings for system
performance.
5. Data Mining: Discover hidden patterns and relationships.
Advantages
1. Improved prediction: Better forecasting accuracy.
2. Practical insights: Inform decision-making.
3. Flexibility: Handle complex relationships.
4. Interpretability: Understand variable interactions.
Limitations
1. Data quality: Sensitive to data errors and biases.
2. Overfitting: Models may fit noise rather than underlying
patterns.
3. Limited generalizability: Models may not apply outside the
data range.
4. Assumptions: Require careful validation.
Common Empirical Modeling Techniques
1. Least Squares Estimation
2. Maximum Likelihood Estimation
3. Cross-Validation
4. Bootstrap Resampling
5. Feature Engineering
Real-World Examples
1. Demand forecasting for supply chain optimization.
2. Credit risk modeling for loan approval.
3. Economic forecasting for fiscal policy.
4. Customer segmentation for targeted marketing.
5. Quality control in manufacturing.
Example: Predicting Coffee Shop Sales
Problem Statement
A coffee shop owner wants to predict daily sales based on
advertising expenditure.
Data
1. Independent variable (x): Advertising expenditure ($1,000s)
2. Dependent variable (y): Daily sales ($1,000s)
3. Sample size: 10 weeks
4. Data:
| Week | Advertising (x) | Sales (y) |
| --- | --- | --- |
| 1 | 2 | 10 |
| 2 | 3 | 12 |
| 3 | 4 | 15 |
| 4 | 2 | 9 |
| 5 | 5 | 18 |
| 6 | 3 | 11 |
| 7 | 4 | 16 |
| 8 | 6 | 20 |
| 9 | 5 | 19 |
| 10 | 4 | 17 |
Empirical Model: Linear Regression
y = β0 + β1x + ε
Estimated Model
1. β0: Intercept ≈ 3.64
2. β1: Slope ≈ 2.91
Estimated Model Equation
y = 3.64 + 2.91x
Interpretation
1. For every additional $1,000 spent on advertising, sales
increase by about $2,910.
2. Predicted sales are about $3,640 with no advertising.
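Here's a quick R check of these estimates, using the ten weeks of data above:

x <- c(2, 3, 4, 2, 5, 3, 4, 6, 5, 4)
y <- c(10, 12, 15, 9, 18, 11, 16, 20, 19, 17)
coef(lm(y ~ x))   # intercept ≈ 3.64, slope ≈ 2.91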
Limitations
1. Assumes linear relationship.
2. Ignores seasonality, competition and economic factors.
