
Interval Estimation and Sample Size

This document discusses interval estimation for a population mean. It defines key terms like point estimate, margin of error, and interval estimate. It provides the formulas for constructing a confidence interval for a population mean when the population standard deviation is known or unknown. It discusses choosing an appropriate sample size and using Excel to calculate a confidence interval.

Uploaded by

Park Mina
Copyright
© All Rights Reserved

CHAPTER 8 ---- INTERVAL ESTIMATION

Margin of Error and the Interval Estimate

• A point estimator cannot be expected to provide the exact value of the population parameter.
• An interval estimate can be computed by adding and subtracting a margin of error to the point estimate:
  Point Estimate ± Margin of Error
• The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
• The general form of an interval estimate of a population mean is: x̄ ± Margin of Error

Interval Estimate of a Population Mean: σ Known

• In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
  • the population standard deviation σ, or
  • the sample standard deviation s
• σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information.
• We refer to such cases as the σ known case.

Interval Estimate of μ:

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

where: x̄ is the sample mean
1 - α is the confidence coefficient
zα/2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ is the population standard deviation
n is the sample size

Values of zα/2 for the Most Commonly Used Confidence Levels
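The z values for the usual confidence levels can be recomputed rather than looked up. The following is a minimal Python sketch using only the standard library; the function name `z_upper_tail` is an illustrative choice, not something from the chapter.

```python
from statistics import NormalDist


def z_upper_tail(confidence: float) -> float:
    """Return the z value that leaves an area of alpha/2 in the upper tail."""
    alpha = 1.0 - confidence
    # inv_cdf takes the area to the LEFT of z, which is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_upper_tail(level):.3f}")
```

For 90%, 95% and 99% confidence this gives approximately 1.645, 1.960 and 2.576.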

Meaning of Confidence
• Because 90% of all the intervals constructed using x̄ ± 1.645 σ_x̄ will contain the population mean, we say we are 90% confident that the interval x̄ ± 1.645 σ_x̄ includes the population mean μ. Here σ_x̄ = σ/√n is the standard error of the mean.
• We say that this interval has been established at the 90% confidence level.
• The value .90 is referred to as the confidence coefficient.
Example: Lloyd’s Department Store
Each week Lloyd’s department store selects a simple random sample of 100 customers in order to learn about the amount spent per shopping trip. The historical data indicate that the population follows a normal distribution.
During the most recent week, Lloyd’s surveyed 100 customers (n = 100) and obtained a sample mean of x̄ = $82. Based on historical data, Lloyd’s now assumes a known value of σ = $20. The confidence coefficient to be used in the interval estimate is .95.

• 95% of the sample means that can be observed are within ± 1.96 σ_x̄ of the population mean μ. The margin of error is:

$$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \left( \frac{20}{\sqrt{100}} \right) = 3.92$$

• Interval estimate of μ is: $82 ± $3.92, or $78.08 to $85.92.
• We are 95% confident that the interval contains the population mean.
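The Lloyd’s calculation can be reproduced with a few lines of Python. This is a sketch under the example's stated assumptions (σ known, n = 100); the function name `mean_interval_sigma_known` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_sigma_known(xbar, sigma, n, confidence):
    """Return (lower, upper, margin) for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z value for alpha/2
    margin = z * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin, margin


# Values from the Lloyd's example: sample mean $82, sigma $20, n = 100.
low, high, margin = mean_interval_sigma_known(82, 20, 100, 0.95)
print(f"margin = {margin:.2f}, interval = {low:.2f} to {high:.2f}")
```

The result agrees with the hand calculation: a margin of error of about 3.92 and an interval of about 78.08 to 85.92.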

Using Excel to construct a confidence interval - σ Known

Example: Lloyd’s Department Store

In order to have a higher degree of confidence, the margin of error and thus the width of the confidence interval must be larger.
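To see the trade-off between confidence and width, the sketch below recomputes the margin of error for the Lloyd’s values (σ = 20, n = 100) at three confidence levels. The helper name `margin_of_error` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 20, 100  # values taken from the Lloyd's example


def margin_of_error(confidence, sigma=SIGMA, n=N):
    """Margin of error z * sigma / sqrt(n) at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


margins = {level: margin_of_error(level) for level in (0.90, 0.95, 0.99)}
for level, m in margins.items():
    print(f"{level:.0%} confidence: margin = {m:.2f}, width = {2 * m:.2f}")
```

With the sample size held fixed, each step up in confidence produces a larger margin and therefore a wider interval.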

Adequate Sample Size

• In most applications, a sample size of n ≥ 30 is adequate.
• If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
• If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
• If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.

• If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.
• This is the σ unknown case.
• In this case, the interval estimate for μ is based on the t distribution.
• (We’ll assume for now that the population is normally distributed.)
t Distribution
• William Gosset, writing under the name “Student”, is the founder of the t distribution.
• Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin.
• He developed the t distribution while working on small-scale materials and temperature experiments.
• The t distribution is a family of similar probability distributions.
• A specific t distribution depends on a parameter known as the degrees of freedom.
• Degrees of freedom refer to the number of independent pieces of information that go into the computation of s.
• A t distribution with more degrees of freedom has less dispersion.
• As the degrees of freedom increase, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.

Comparison of the standard normal distribution with t distributions having 10 and 20 degrees of freedom.

• For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value.
• The standard normal z values can be found in the infinite degrees row (labeled ∞) of the t distribution table.
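The convergence of t values toward the z value can be checked numerically. The sketch below is one possible standard-library approach, not the method used to build the t table: it integrates the t density with Simpson's rule and finds the upper-tail value by bisection. The function names, the step count, and the search range of 0 to 50 are all assumptions made for illustration.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3  # 0.5 is the area to the left of zero


def t_upper_tail(area: float, df: int) -> float:
    """t value with the given area in the upper tail, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > area:
            lo = mid  # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (10, 20, 100):
    print(f"df = {df:>3}: t.025 = {t_upper_tail(0.025, df):.3f}")
print(f"normal  : z.025 = {z:.3f}")
```

The computed t.025 values (about 2.228, 2.086 and 1.984 for 10, 20 and 100 degrees of freedom) move steadily toward z.025 = 1.960.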

Interval Estimate of a Population Mean: σ Unknown

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$

where: x̄ = the sample mean
1 - α = the confidence coefficient
tα/2 = the t value providing an area of α/2 in the upper tail of a t distribution with n - 1 degrees of freedom
s = the sample standard deviation
n = the sample size
Example: Credit card debt for the population of US households
The credit card balances of a sample of 70 households provided a mean credit card debt of $9312 with
a sample standard deviation of $4007.
Let us provide a 95% confidence interval estimate of the mean credit card debt for the population of US
households. We will assume this population to be normally distributed.

At 95% confidence, α = .05, and α/2 = .025.
t.025 is based on n - 1 = 70 - 1 = 69 degrees of freedom.

$$\bar{x} \pm t_{.025} \frac{s}{\sqrt{n}}$$

$$9312 \pm 1.995 \frac{4007}{\sqrt{70}} = 9312 \pm 955$$

We are 95% confident that the mean credit card debt for the population of US households is between $8357 and $10267.
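The credit card calculation can be checked in Python. This sketch takes the t value 1.995 (69 degrees of freedom) directly from the worked example instead of computing it; the function name `mean_interval_sigma_unknown` is hypothetical.

```python
from math import sqrt


def mean_interval_sigma_unknown(xbar, s, n, t_value):
    """Return (lower, upper, margin) using the sample standard deviation s."""
    margin = t_value * s / sqrt(n)
    return xbar - margin, xbar + margin, margin


T_025_DF69 = 1.995  # t value used in the worked example (69 degrees of freedom)

low, high, margin = mean_interval_sigma_unknown(9312, 4007, 70, T_025_DF69)
print(f"margin = {margin:.0f}, interval = {low:.0f} to {high:.0f}")
```

Rounded to whole dollars, this reproduces the margin of 955 and the interval from 8357 to 10267.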

Using Excel’s Descriptive Statistics Tool

Steps

Step 1: Click the Data tab on the Ribbon

Step 2: In the Analysis group click Data Analysis

Step 3: Choose Descriptive Statistics from the list of Analysis tools

Step 4: When the Descriptive Statistics dialog box appears:
    Enter Input Range
    Select Grouped by columns
    Select Labels in the first row
    Select Output range
    Enter C1 in the output range box
    Select summary statistics
    Select confidence level for mean
    Enter 95 in the confidence level for mean box
    Click OK

Excel worksheet: 95% confidence interval for credit card balances.
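For readers without Excel, the same quantities can be computed in Python. The sketch below assumes that the confidence level entry reported by the tool corresponds to the margin of error tα/2 · s/√n. The ten balances are invented for illustration and are not the chapter's data set; 2.262 is the t table value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical balances, invented for illustration only.
balances = [8200, 9100, 12400, 7600, 10350, 5900, 11800, 9950, 6700, 13100]
T_025_DF9 = 2.262  # t table value: upper-tail area .025, 9 degrees of freedom

n = len(balances)
xbar = mean(balances)              # point estimate of the population mean
s = stdev(balances)                # sample standard deviation (n - 1 divisor)
margin = T_025_DF9 * s / sqrt(n)   # margin of error at 95% confidence
low, high = xbar - margin, xbar + margin

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"margin of error = {margin:.2f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
```

Adding and subtracting the margin from the mean gives the interval, mirroring what is done by hand with the worksheet output.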
Adequate Sample Size
Usually, a sample size of n ≥ 30 is adequate when using the expression x̄ ± tα/2 s/√n to develop an interval estimate of a population mean.
If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.

Summary of Interval Estimation Procedures for a Population Mean

Sample Size for an Interval Estimate of a Population Mean

Let E = the desired margin of error.
E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined.

Margin of Error

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Necessary Sample Size

$$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}$$
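The step from the margin of error to the necessary sample size is a short rearrangement. Written out as a sketch using only the symbols already defined:

$$
\begin{aligned}
E &= z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{E} \\
n &= \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}
\end{aligned}
$$

Because n must be a whole number, a fractional result is rounded up to the next integer so that the margin of error does not exceed E.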

The Necessary Sample Size equation requires a value for the population standard deviation σ.
If σ is unknown, a preliminary or planning value for σ can be used in the equation.
1. Use the estimate of the population standard deviation computed in a previous study.
2. Use a pilot study to select a preliminary sample and use the sample standard deviation from that sample.
3. Use judgment or a “best guess” for the value of σ.

Example: Cost of Renting Automobiles in the United States
A previous study that investigated the cost of renting automobiles in the United States found a mean cost of approximately $55 per day for renting a midsize automobile with a standard deviation of $9.65.
Suppose the project director wants an estimate of the population mean daily rental cost such that there is a .95 probability that the sampling error is $2 or less.
How large a sample size is needed to meet the required precision?

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2$$

At 95% confidence, z.025 = 1.96. Recall that σ = 9.65.

$$n = \frac{(1.96)^2 (9.65)^2}{(2)^2} = 89.43 \approx 90$$

The sample size needs to be at least 90 midsize automobile rentals in order to satisfy the project director’s $2 margin-of-error requirement.
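The rental-cost sample size can be verified with a short Python sketch. The function name `sample_size_for_mean` is an illustrative choice; 9.65 serves as the planning value for σ, as in the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Unrounded sample size n = (z * sigma / E)^2 for a desired margin E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2


n_exact = sample_size_for_mean(sigma=9.65, margin=2, confidence=0.95)
n_required = ceil(n_exact)  # always round up to a whole observation
print(f"n = {n_exact:.2f}, so at least {n_required} rentals are needed")
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the $2 target.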

Interval Estimate of a Population Proportion


The general form of an interval estimate of a population proportion is:

p̄ ± Margin of Error

The sampling distribution of p̄ plays a key role in computing the margin of error for this interval estimate.

The sampling distribution of p̄ can be approximated by a normal distribution whenever np > 5 and n(1 - p) > 5.

Normal Approximation of Sampling Distribution of p̄

$$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

where: 1 - α is the confidence coefficient,
zα/2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution, and
p̄ is the sample proportion

Example: Survey of women golfers

A national survey of 900 women golfers was conducted to learn how women golfers view their treatment at golf courses in the United States. The survey found that 396 of the women golfers were satisfied with the availability of tee times.
Suppose one wants to develop a 95% confidence interval estimate for the proportion of the population of women golfers satisfied with the availability of tee times.

$$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

where: n = 900, p̄ = 396/900 = .44, zα/2 = 1.96

$$.44 \pm 1.96 \sqrt{\frac{.44(1-.44)}{900}} = .44 \pm .0324$$

Survey results enable us to state with 95% confidence that between 40.76% and 47.24% of all women golfers are satisfied with the availability of tee times.
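The golfer survey interval can be reproduced in Python. In this sketch the function name `proportion_interval` is hypothetical, and the normal-approximation check uses the sample proportion p̄ as a stand-in for the unknown p, which is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Return (p_bar, margin) for a population proportion."""
    p_bar = successes / n
    # Normal approximation condition, checked with p_bar in place of p.
    if n * p_bar <= 5 or n * (1 - p_bar) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, margin


# Survey values: 396 of 900 women golfers were satisfied.
p_bar, margin = proportion_interval(396, 900, 0.95)
low, high = p_bar - margin, p_bar + margin
print(f"{p_bar:.2f} ± {margin:.4f}: {low:.4f} to {high:.4f}")
```

The output matches the hand calculation: .44 ± .0324, or roughly 40.76% to 47.24%.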
Using Excel to construct a confidence interval
• Excel Formula and Value Worksheet
Sample Size for an Interval Estimate of a Population Proportion

Margin of Error

$$E = z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

Solving for the necessary sample size n, we get

$$n = \frac{(z_{\alpha/2})^2 \bar{p}(1-\bar{p})}{E^2}$$

However, p̄ will not be known until after we have selected the sample. We will use the planning value p* for p̄.

Necessary Sample Size

$$n = \frac{(z_{\alpha/2})^2 p^*(1-p^*)}{E^2}$$

The planning value p* can be chosen by:

1. Using the sample proportion from a previous sample of the same or similar units, or
2. Selecting a preliminary sample and using the sample proportion from this sample.
3. Using judgment or a “best guess” for a p* value.
4. Otherwise, using .50 as the p* value.

Example: Survey of women golfers

Suppose the survey director wants to estimate the population proportion with a margin of error of .025 at 95% confidence.

How large a sample size is needed to meet the required precision? (A previous sample of similar units yielded .44 for the sample proportion.)

$$E = z_{\alpha/2} \sqrt{\frac{p^*(1-p^*)}{n}} = .025$$

At 95% confidence, z.025 = 1.96. Recall that p* = .44.

$$n = \frac{(z_{\alpha/2})^2 p^*(1-p^*)}{E^2} = \frac{(1.96)^2 (.44)(.56)}{(.025)^2} = 1514.5$$

A sample of size 1515 is needed to reach a desired precision of ± .025 at 95% confidence.
Note: We used .44 as the best estimate of p in the preceding expression. If no information is available about p, then .5 is often assumed: p(1 - p) is largest at p = .5, so this choice provides the highest possible sample size. If we had used p = .5, the recommended n would have been 1537.
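Both planning values can be compared side by side with a short Python sketch. The function name `sample_size_for_proportion` is an illustrative choice.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_star, margin, confidence):
    """Unrounded sample size n = z^2 * p*(1 - p*) / E^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p_star * (1 - p_star) / margin ** 2


# Planning values from the example: .44 (previous sample) and .50 (no information).
results = {}
for p_star in (0.44, 0.50):
    n_exact = sample_size_for_proportion(p_star, margin=0.025, confidence=0.95)
    results[p_star] = ceil(n_exact)  # round up to a whole respondent
    print(f"p* = {p_star:.2f}: n = {n_exact:.1f}, round up to {results[p_star]}")
```

The conservative choice p* = .50 costs only about 22 extra respondents here, because .44 is already close to .50.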

Implications of Big Data

As the sample size becomes extremely large, the margin of error becomes extremely small and resulting
confidence intervals become extremely narrow.
No interval estimate will accurately reflect the parameter being estimated unless the sample is relatively free of
nonsampling error.
Statistical inference along with information collected from other sources can help in making the most informed
decision.
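The effect of very large samples on the margin of error can be illustrated numerically. This sketch borrows σ = 20 from the Lloyd’s example as an assumed value; the sample sizes are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # z value for 95% confidence
SIGMA = 20                          # assumed population standard deviation


def margin(n):
    """Margin of error z * sigma / sqrt(n); reflects sampling error only."""
    return Z_95 * SIGMA / sqrt(n)


sizes = (100, 10_000, 1_000_000)
margins = [margin(n) for n in sizes]
for n, m in zip(sizes, margins):
    print(f"n = {n:>9,}: margin of error = {m:.4f}")
```

Each hundredfold increase in n cuts the margin by a factor of ten. The calculation says nothing about nonsampling error, so a very narrow interval from a biased sample can still miss the parameter.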

End of Chapter 8
