BUS51A_lecture12

The document covers interval estimation and hypothesis testing in business analytics, highlighting the importance of using sample data to estimate population parameters and test hypotheses. It explains the concepts of point estimation, margin of error, t-distribution, and the types of errors in hypothesis testing, including Type I and Type II errors. Additionally, it provides examples and exercises to illustrate how to formulate hypotheses and interpret statistical results.

Uploaded by

emmafillen
Copyright
© © All Rights Reserved

Interval Estimation and Hypothesis Testing
BUS 51A: Lecture 12
Introduction to Business Analytics with Excel
• The score on the DAA has been updated.

• Final project team selection is due on Friday.

• Each final project will involve two different DAA teams working together (a total of 4-5 students).

• If you are unable to find team members, please inform me in advance.

• See the guideline in the syllabus.


Interval Estimation

• So far, we have learned about point estimation.

• The sample mean x̄ is a point estimator of the population mean 𝜇, and the sample proportion p̄ is a point estimator of the population proportion p.

• Because a point estimator cannot be expected to provide the exact value of the population parameter, interval estimation is frequently used to generate an estimate of the value of a population parameter.

• An interval estimate is often computed by adding and subtracting a value, called the margin of error, to the point estimate: point estimate ± margin of error.


Interval Estimation of the Population Mean

• The general form of an interval estimate of a population mean is x̄ ± margin of error.

• We have learned that

• The sampling distribution of x̄ has a mean equal to the population mean 𝜇.

• It has a standard deviation (the standard error) equal to the population standard deviation 𝜎 divided by the square root of the sample size n: 𝜎_x̄ = 𝜎/√n.

• For a large sample or for a sample taken from a normally distributed population, the sampling distribution of x̄ follows a normal distribution.

• Because the sampling distribution of x̄ shows how values of x̄ are distributed around 𝜇, it provides information about the possible differences between x̄ and 𝜇.
Interval Estimation of the Population Mean

• For any normally distributed random variable,

• 90% of the values lie within 1.645 standard deviations of the mean

• 95% of the values lie within 1.960 standard deviations of the mean

• 99% of the values lie within 2.576 standard deviations of the mean.

• Thus, when the sampling distribution of x̄ is normal,

• 90% of all values of x̄ must be within ±1.645𝜎_x̄ of the mean 𝜇

• 95% of all values of x̄ must be within ±1.960𝜎_x̄ of the mean 𝜇

• 99% of all values of x̄ must be within ±2.576𝜎_x̄ of the mean 𝜇.
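The multipliers 1.645, 1.960, and 2.576 can be checked numerically. The following is a minimal sketch in Python (the lecture itself works in Excel); the helper name z_critical is an illustrative assumption, not something from the slides.

```python
from statistics import NormalDist

def z_critical(level):
    """Number of standard deviations that contains `level` of a normal distribution."""
    alpha = 1 - level
    # Leave alpha/2 in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} of values lie within {z_critical(level):.3f} standard deviations")
```

The same lookup can be done in Excel with NORM.S.INV, for example =NORM.S.INV(0.95) for the 90% case.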


Sampling Distribution of x̄

• Consider 10 independent random samples when the sampling distribution of x̄ is normal.

• Since 90% of all values of x̄ must be within ±1.645𝜎_x̄ of the mean 𝜇, we expect about 9 of the values of x̄ for these 10 samples to be within ±1.645𝜎_x̄ of 𝜇.

• Let’s see whether this is really happening.
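One way to see whether this is really happening is to simulate it. The sketch below is in Python rather than Excel, and the population values (𝜇 = 100, 𝜎 = 15, n = 30) are made-up assumptions chosen only for illustration. It draws repeated samples from a normal population and reports the fraction of sample means that land within ±1.645𝜎_x̄ of 𝜇.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, num_samples, seed=0):
    """Fraction of sample means that fall within 1.645 standard errors of mu."""
    rng = random.Random(seed)
    # Margin of error for 90%: z_{0.05} times the standard error sigma / sqrt(n).
    margin = NormalDist().inv_cdf(0.95) * sigma / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) <= margin:
            hits += 1
    return hits / num_samples

# Hypothetical population, not from the lecture data.
print("10 samples:    ", coverage(100, 15, 30, num_samples=10))
print("10,000 samples:", coverage(100, 15, 30, num_samples=10_000))
```

With only 10 samples the fraction can easily be 0.8 or 1.0 rather than exactly 0.9; with many samples it settles close to 0.90.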
Interval Estimation of the Population Mean

• We now want to use what we know about the sampling distribution of x̄ to develop an interval estimate of the population mean 𝜇.

• However, when developing an interval estimate of a population mean 𝜇, we generally do not know the population standard deviation 𝜎 and therefore do not know the standard error 𝜎_x̄ = 𝜎/√n.

• In this case, we use sample data to estimate the standard error: s_x̄ = s/√n.

• When we estimate 𝜎_x̄ with s_x̄, we introduce an additional source of uncertainty about the distribution of values of x̄.

• If the sampling distribution of x̄ follows a normal distribution, we address this additional source of uncertainty by using a probability distribution known as the t distribution.
The Family of t Distributions

• If the population standard deviation 𝜎 is unavailable, we use the sample standard deviation 𝑠 instead.

• We address this additional uncertainty using a t distribution, a family of similar probability distributions that depend on the degrees of freedom.

• These t distributions are similar in shape to the standard normal distribution, but wider.

• As the degrees of freedom increase, the t distribution narrows, its peak becomes higher, and it becomes more similar to the standard normal distribution.
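The narrowing of the t distribution can be illustrated by comparing upper-tail critical values for several degrees of freedom with the standard normal value. This is a rough sketch in Python using only the standard library: the helpers t_pdf, t_cdf, and t_inv are illustrative names written for this example (numerical integration plus bisection), standing in for Excel's T.DIST and T.INV.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df):
    """Value t such that P(T <= t) = p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
for df in (5, 10, 29, 100):
    print(f"df = {df:>3}: t_0.025 = {t_inv(0.975, df):.3f}")
print(f"standard normal: z_0.025 = {z_crit:.3f}")
```

The critical values shrink toward the normal value of about 1.960 as the degrees of freedom grow, which is the "wider, then narrowing" behavior described above.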
t Distribution with n-1 Degrees of Freedom

• To compute the margin of error for the EAI example, we consider the t distribution with n − 1 degrees of freedom.

• At a 90% confidence level, we use the expression t_{0.05} to denote the value for which the area in the upper tail of a t distribution is 0.05.
• We can use Excel to compute the value of t_{α/2} at a 0.90 confidence coefficient.
• The significance level is computed as α = 1 − 0.90 = 0.10, so α/2 = 0.05.

• For 29 degrees of freedom, we would write, for example, =T.INV.2T(0.10, 29), which returns approximately 1.699.

Interval Estimate of the Population Mean

• The interval estimate of 𝜇, using s as an estimate of the population standard deviation 𝜎, is written as x̄ ± t_{α/2} · s/√n.

• If we want to find a 95% confidence interval for the mean manager salary in the EAI example, we observe that α/2 = 0.025 and compute t_{0.025} with n − 1 = 29 degrees of freedom as 2.045.

• Thus, we have $71,814 ± $1,250, where $71,814 is the sample mean x̄ and $1,250 is the margin of error 2.045 · s/√30.

• We are 95% confident that the mean manager salary at EAI is between $70,564 and $73,064.
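The interval above can be reproduced with a short calculation. The sketch below is in Python rather than Excel and makes two assumptions: the sample mean x̄ = 71,814 is the midpoint of the reported interval, and s = 3,348 is a sample standard deviation chosen to be consistent with that interval (it is not stated on the slide). The t helpers are illustrative stand-ins for Excel's T.INV.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df):
    """Value t such that P(T <= t) = p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(xbar, s, n, level):
    """Return (lower, upper) for xbar +/- t_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - level
    t_crit = t_inv(1 - alpha / 2, n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Assumed EAI sample statistics (see the note above): n = 30, xbar = 71,814, s = 3,348.
lower, upper = t_confidence_interval(71814.0, 3348.0, 30, 0.95)
print(f"95% confidence interval: ${lower:,.0f} to ${upper:,.0f}")
```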
Hypothesis Test

• Hypothesis testing determines whether a statement about the value of a population parameter should or should not be rejected.

• The hypothesis testing procedure uses data from a sample to test the validity of the two competing statements indicated by H0 and Ha.
• H0: the null hypothesis, is a statement (a tentative conjecture) about a population parameter.
• Ha: the alternative hypothesis, is the opposite of what is stated in the null hypothesis H0.

• We describe how hypothesis tests can be conducted about a population mean 𝜇 and a population proportion 𝑝.
Developing Null and Alternative Hypotheses
• The context of the situation and structure of the hypotheses should be properly understood so that the test conclusion provides the desired information.

• All hypothesis testing applications involve collecting a sample and using the sample results to provide evidence for drawing a conclusion.

• Questions to consider when formulating null and alternative hypotheses are:

• What is the purpose of collecting the sample?

• What conclusions are we hoping to make?

• In some situations, it is easier to identify the alternative hypothesis before the null hypothesis.

• In other situations, starting with the null hypothesis may be easier.


Ha as a Research Hypothesis

• When an attempt is being made to gather evidence in support of a research hypothesis, it is often best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support.

• The conclusion that the research hypothesis is true is made if the sample data provide sufficient evidence to show that the null hypothesis can be rejected.
Ha as a Research Hypothesis

• Example: A new fuel injection unit being developed is believed to provide more than 24 miles per gallon.

• First, we state the research claim about the population mean miles per gallon, 𝜇 > 24, as the alternative hypothesis Ha. Then, we set the null hypothesis as H0: 𝜇 ≤ 24.


H0 as an Assumption to be Challenged

• We might begin with a belief or assumption that a statement about the value of a population parameter is true.

• We then use a hypothesis test to challenge the assumption and determine if there is statistical evidence to conclude that the assumption is incorrect.

• In these situations, it is helpful to develop the null hypothesis first.


H0 as an Assumption to be Challenged

• Example: The label on a bottle states that it contains 67.6 fluid ounces.

• We assume the label is correct, that is, the population mean filling weight 𝜇 is 67.6 fluid ounces, and set it as the null hypothesis (H0: 𝜇 = 67.6). Then, we challenge it with the alternative hypothesis (Ha: 𝜇 ≠ 67.6).


Summary of Forms for H0 and Ha

• The equality part of the hypotheses always appears in the null hypothesis.

• In general, a hypothesis test about the value of a population mean must take one of the following three forms, where 𝜇₀ is the hypothesized value of the population mean:

• One-tailed (lower-tail): H0: 𝜇 ≥ 𝜇₀ vs. Ha: 𝜇 < 𝜇₀
• One-tailed (upper-tail): H0: 𝜇 ≤ 𝜇₀ vs. Ha: 𝜇 > 𝜇₀
• Two-tailed: H0: 𝜇 = 𝜇₀ vs. Ha: 𝜇 ≠ 𝜇₀

• If the sample results support Ha, then the conclusion is to reject H0. Otherwise, the conclusion is to not reject H0.


Exercise 1

• The manager of the Danvers-Hilton Resort Hotel stated that the mean guest bill for a weekend is $600 or less. A member of the hotel’s accounting staff noticed that the total charges for guest bills have been increasing in recent months. The accountant will use a sample of future weekend guest bills to test the manager’s claim.

• Which form of the hypotheses should be used to test the manager’s claim?
Exercise 1

• H0: 𝜇 ≤ 600 vs. Ha: 𝜇 > 600

• What conclusion is appropriate when H0 cannot be rejected?

• We are unable to conclude that the manager’s claim is wrong.

• What conclusion is appropriate when H0 can be rejected?

• We have some evidence to support Ha: 𝜇 > 600.
Exercise 2

• A production line operation is designed to fill cartons with laundry detergent to a mean weight of 32 ounces. A sample of cartons is periodically selected and weighed to determine whether underfilling or overfilling is occurring. If the sample data lead to a conclusion of underfilling or overfilling, the production line will be shut down and adjusted to obtain proper filling.

• Formulate the null and alternative hypotheses that will help in deciding whether to shut down and adjust the production line.
Exercise 2

• H0: 𝜇 = 32 vs. Ha: 𝜇 ≠ 32

• What conclusion is appropriate when H0 cannot be rejected?

• There is no evidence that the production line is not operating properly. Allow the production process to continue.

• What conclusion is appropriate when H0 can be rejected?

• We have some evidence to support Ha: 𝜇 ≠ 32. Consider shutting down and adjusting the production line.
Type I and Type II Errors

• Because hypothesis tests are based on sample data, we must allow for the possibility of errors.

• There are two types of errors that can be made in hypothesis tests:
• Type I error: rejecting H0 when it is true.

• The level of significance α is the probability of making a Type I error when the null hypothesis is true as an equality.

• Applications of hypothesis testing that only control for the Type I error are often called significance tests.
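The link between α and the Type I error rate can be checked by simulation. The sketch below is in Python and is deliberately simplified: it assumes the population standard deviation is known (so a z test is used instead of the t test covered later), and the values 𝜎 = 3 and n = 36 are made-up assumptions. Only 𝜇₀ = 24 comes from the fuel injection example. Samples are drawn with H0 true as an equality, and the function counts how often an upper tail test rejects.

```python
import random
from statistics import NormalDist

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Share of samples, drawn with mu = mu0, for which an upper tail z test rejects H0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / std_error
        p_value = 1 - std_normal.cdf(z)  # upper tail area
        if p_value <= alpha:
            rejections += 1
    return rejections / trials

# sigma and n are hypothetical; mu0 = 24 is from the fuel injection example.
print(type_one_error_rate(mu0=24, sigma=3.0, n=36, alpha=0.05, trials=10_000))
```

The printed rate should be close to 0.05: even when H0 is true, about 5% of samples lead to a (wrong) rejection.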
• Type II error: accepting H0 when it is false.

• Determining the probability of a Type II error is difficult, but when it can be determined and controlled, the “accept H0” conclusion is appropriate.

• Statisticians avoid the risk of making a Type II error by concluding “do not reject H0” rather than “accept H0”.
Example of Type I and Type II Errors

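• Recall the hypothesis testing example about the development of a new fuel injection unit, with H0: 𝜇 ≤ 24 and Ha: 𝜇 > 24.

• Conclusion “Accept H0”: correct conclusion if H0 is true in the population; Type II error if Ha is true.
• Conclusion “Reject H0”: Type I error if H0 is true in the population; correct conclusion if Ha is true.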
Example of Type I and Type II Errors

• A Type I error, rejecting H0 when it is true, is claiming that the new system improves the miles-per-gallon rating (𝜇 > 24) when in fact it does not.

• A Type II error, accepting H0 when it is false, is claiming that the new system does not improve the miles-per-gallon rating (𝜇 ≤ 24) when in fact it does.
Exercise 3

• Duke Energy reported that the cost of electricity for an efficient home in a particular neighborhood of Cincinnati, Ohio, was $104 per month.

• A researcher believes that the cost of electricity for a comparable neighborhood in Chicago, Illinois, is higher.

• A sample of homes in this Chicago neighborhood will be taken and the sample mean will be used to test the following null and alternative hypotheses: H0: 𝜇 ≤ 104 vs. Ha: 𝜇 > 104
Exercise 3

• H0: 𝜇 ≤ 104 vs. Ha: 𝜇 > 104

• What is the Type I error in this situation? What are the consequences of making this error?
• Reject H0 when it is true.

• This error occurs if the researcher concludes that the population mean monthly cost of electricity is greater than $104 in the Chicago neighborhood when the population mean cost is actually less than or equal to $104.
Exercise 3

• H0: 𝜇 ≤ 104 vs. Ha: 𝜇 > 104

• What is the Type II error in this situation? What are the consequences of making this error?
• Accept H0 when it is false.

• This error occurs if the researcher concludes that the population mean monthly cost for the Chicago neighborhood is less than or equal to $104 when it is not.
One-Tailed Test About a Population Mean

• One-tailed tests about a population mean take the form of a lower tail test or an upper tail test, depending on the inequality sign in the alternative hypothesis.

• Example: the Federal Trade Commission (FTC) is testing Hilltop Coffee’s claim that its large can contains at least 3 pounds of coffee.

• The FTC assumes that Hilltop Coffee’s claim is correct (𝜇 ≥ 3 lbs) and sets the alternative hypothesis to challenge the claim with a lower tail test, 𝜇 < 3 lbs.
• H0: 𝜇 ≥ 3 vs. Ha: 𝜇 < 3

• Suppose the FTC is willing to take a 1% risk of making a Type I error in its testing of Hilltop Coffee’s claim.
Test Statistic About a Population Mean

• Let’s use coffee.xlsx.

• There are 36 observations in the sample; assume that they were randomly sampled.

• Let’s figure out the sample mean (x̄) and the sample standard deviation (s).

• We can use the sample standard deviation as an estimate of the population standard deviation to write the sampling distribution of x̄ as
• normally distributed,

• centered around the hypothesized mean, 𝜇₀ = 3, and with

• estimated standard error, s_x̄ = s/√n = s/√36.

Test Statistic About a Population Mean

• If x̄ is normally distributed, the test statistic of a hypothesis test about a population mean, t = (x̄ − 𝜇₀)/(s/√n), follows a t distribution with n − 1 degrees of freedom.
Lower Tail p-value About a Population Mean

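• A p-value is a probability, computed using the test statistic, that measures the support provided by the sample for the null hypothesis.

• Given the sample mean x̄ (in pounds) and the sample standard deviation s, we write t = (x̄ − 3)/(s/√36).

• For a lower tail test, the p-value is the probability that a t random variable with n − 1 = 35 degrees of freedom is less than or equal to the observed test statistic.

• In Excel, the T.DIST function returns this lower tail area: =T.DIST(t, 35, TRUE).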
Rejection Rule Using p-value

• Whether we reject H0 depends upon the level of significance α for the test:

• If a p-value is less than or equal to the level of significance α, the value of the test statistic is in the rejection region.

• Because the FTC is willing to take a 1% risk of making a Type I error, we set the significance level α = 0.01.

• We write the rejection rule using the p-value as:

• Reject H0 if p-value ≤ α

• With the p-value ≤ 0.01, we reject H0.

• Thus, the FTC finds sufficient statistical evidence to conclude at the 0.01 level of significance that the mean filling weight of Hilltop Coffee cans is less than 3 lbs.


Exercise 4

• Which is cheaper: eating out or dining in?

• The mean cost of a flank steak, broccoli, and rice bought at the grocery store is $23.04.

• A sample of 100 neighborhood restaurants showed a mean price of $22.75 and a standard deviation of $2 for a comparable restaurant meal.

• Develop appropriate hypotheses for a test to determine whether the sample data support the conclusion that the mean cost of a restaurant meal is less than fixing a comparable meal at home.


Exercise 4

• H0: 𝜇 ≥ 23.04 and Ha: 𝜇 < 23.04

• Using the sample from the 100 restaurants, what is the p-value?

• t = (22.75 − 23.04)/(2/√100) = −1.45, with 100 − 1 = 99 degrees of freedom.

• p-value is the lower tail area at the test statistic: p-value ≈ 0.075

• At α = 0.05, what is your conclusion?

• p-value > 0.05; do not reject H0.

• We cannot conclude that the cost of a restaurant meal is significantly cheaper than a comparable meal fixed at home.
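The Exercise 4 calculation can be traced step by step. The following is a sketch in Python rather than Excel, using the numbers given in the exercise (x̄ = 22.75, 𝜇₀ = 23.04, s = 2, n = 100) and an assumed significance level of 0.05. The helpers t_pdf and t_cdf are illustrative stand-ins for Excel's T.DIST(t, df, TRUE).

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def lower_tail_test(xbar, mu0, s, n):
    """Return (t, p_value) for H0: mu >= mu0 vs. Ha: mu < mu0."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, t_cdf(t, n - 1)

alpha = 0.05
t_stat, p_value = lower_tail_test(xbar=22.75, mu0=23.04, s=2.0, n=100)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Because the p-value is above 0.05, the rule "reject H0 if p-value ≤ α" leaves H0 standing, which matches the conclusion above.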


Two-Tailed Test About a Population Mean

• Holiday Toys’ marketing director surveys a sample of retailers to test whether the per store demand for a new product is 𝜇₀ = 40 units (DATAfile: Orders).

• The director sets the hypotheses as a two-tailed test, with level of significance α.

• H0: 𝜇 = 40 and Ha: 𝜇 ≠ 40

• With the sample mean x̄ and the sample standard deviation s, we have t = (x̄ − 40)/(s/√n).

• Because the p-value is greater than α, H0 cannot be rejected, and the director does not have enough evidence to conclude that per store demand differs from 40 units.


Exercise 5

• Salary.com reports that national mean annual salary for a school administrator is $80,935 a year.

• A school official took a sample of 25 school administrators in the state of Ohio to learn about salaries in that state to see if they differed from the national average. (Use administrator.xlsx)

• Formulate hypotheses that can be used to determine whether the population mean annual administrator salary in Ohio differs from the national mean of $90,000.
Exercise 5

• H0: 𝜇 = 𝜇₀ and Ha: 𝜇 ≠ 𝜇₀, where 𝜇₀ is the national mean annual salary.

• What is the p-value?

• t = (x̄ − 𝜇₀)/(s/√n) and degrees of freedom: 25 − 1 = 24

• Because t < 0, the p-value is two times the lower tail area. Using the t table: the area in the lower tail is between 0.01 and 0.025; therefore, the p-value is between 0.02 and 0.05.

• Using Excel: p-value = 2*T.DIST(t, 24, TRUE)

• At α = 0.05, reject H0: The mean annual administrator salary in Ohio differs significantly from the national mean annual salary.


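Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value ≤ α.
Step 6. Interpret the statistical conclusion in the context of the application.

• Lower tail test. Hypotheses: H0: 𝜇 ≥ 𝜇₀ vs. Ha: 𝜇 < 𝜇₀. Test statistic: t = (x̄ − 𝜇₀)/(s/√n). p-value: lower tail area, =T.DIST(t, n − 1, TRUE).
• Upper tail test. Hypotheses: H0: 𝜇 ≤ 𝜇₀ vs. Ha: 𝜇 > 𝜇₀. Test statistic: t = (x̄ − 𝜇₀)/(s/√n). p-value: upper tail area, =1 − T.DIST(t, n − 1, TRUE).
• Two-tailed test. Hypotheses: H0: 𝜇 = 𝜇₀ vs. Ha: 𝜇 ≠ 𝜇₀. Test statistic: t = (x̄ − 𝜇₀)/(s/√n). p-value: double the minimum value of the two one-tail cases.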
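Steps 3 to 5 and the three p-value cases can be collected into one small routine. This is a sketch in Python under stated assumptions: the function name t_test_p_value and its tail argument are illustrative choices made here, the t helpers stand in for Excel's T.DIST, and the usage example reuses the Exercise 4 numbers with α = 0.05.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_test_p_value(xbar, mu0, s, n, tail):
    """Return (t, p_value); tail is 'lower', 'upper', or 'two'."""
    t = (xbar - mu0) / (s / math.sqrt(n))   # Step 3: test statistic
    lower = t_cdf(t, n - 1)                 # area to the left of t
    upper = 1 - lower                       # area to the right of t
    if tail == "lower":
        return t, lower
    if tail == "upper":
        return t, upper
    if tail == "two":
        return t, 2 * min(lower, upper)     # double the smaller one-tail area
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Usage with the Exercise 4 numbers (lower tail test, alpha = 0.05).
alpha = 0.05
t_stat, p_value = t_test_p_value(22.75, 23.04, 2.0, 100, tail="lower")
decision = "reject H0" if p_value <= alpha else "do not reject H0"  # Step 5
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}: {decision}")
```

Keeping the tail as an explicit argument mirrors Step 1: the form of Ha decides which area is the p-value, so the same sample can give different p-values under different hypotheses.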
Exercise 6

• The Coca-Cola Company reported that the mean per capita annual sales of its beverages in the United States was 423 eight-ounce servings.

• Suppose you are curious whether the consumption of Coca-Cola beverages is higher in Atlanta, the location of Coca-Cola’s HQ.

• A sample of 36 individuals from the Atlanta area showed a sample mean annual consumption of 460.4 eight-ounce servings with a standard deviation of s = 101.9 servings.

• Using α = 0.05, do the sample results support the conclusion that mean annual consumption of Coca-Cola beverages is higher in Atlanta?
Exercise 6

• H0: 𝜇 ≤ 423 and Ha: 𝜇 > 423

• t = (460.4 − 423)/(101.9/√36) ≈ 2.20 and degrees of freedom: 36 − 1 = 35

• P-value is upper-tail area!

• Using Excel: p-value = 1 − T.DIST(2.20, 35, TRUE) ≈ 0.017

• At α = 0.05, reject H0: Atlanta customers have a higher annual rate of consumption of Coca-Cola beverages.


Hypothesis Test About a Population Proportion

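• A hypothesis test about a population proportion p takes one of the following forms, where p₀ is the hypothesized value of the population proportion:

• Lower tail test: H0: p ≥ p₀ vs. Ha: p < p₀
• Upper tail test: H0: p ≤ p₀ vs. Ha: p > p₀
• Two-tailed test: H0: p = p₀ vs. Ha: p ≠ p₀

• Example: the course manager at Pine Creek Golf Course wants to determine whether a recent promotion targeted at women has increased their proportion from a historical 20% (H0: p ≤ 0.20 vs. Ha: p > 0.20).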
Test Statistic About a Population Proportion

• Suppose a random sample of n players was selected, and that x of them were women. The sample proportion of women golfers can be written as p̄ = x/n.

• For the test statistic of a hypothesis test about a population proportion, we can use the standard normal random variable, z, to describe the sampling distribution of p̄ so long as np₀ ≥ 5 and n(1 − p₀) ≥ 5.

• Thus, we can write the test statistic as: z = (p̄ − p₀)/√(p₀(1 − p₀)/n).


One-Tailed Test p-value About a Population Proportion

• We can use Excel to calculate the p-value for an upper tail test as =1 − NORM.S.DIST(z, TRUE).

• Because the p-value is less than or equal to α, we reject H0 and conclude that the proportion of women golfers at Pine Creek Golf Course has increased.
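The upper tail test for a proportion can be sketched as follows. This is Python rather than Excel, and the sample counts are assumptions made only for illustration (400 players, 100 of them women, α = 0.05); the only value taken from the example is the historical proportion p₀ = 0.20.

```python
import math
from statistics import NormalDist

def upper_tail_proportion_test(x, n, p0):
    """Return (p_bar, z, p_value) for H0: p <= p0 vs. Ha: p > p0."""
    # The normal approximation needs n*p0 >= 5 and n*(1 - p0) >= 5.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("sample too small for the normal approximation")
    p_bar = x / n
    std_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_bar - p0) / std_error
    p_value = 1 - NormalDist().cdf(z)  # same as =1 - NORM.S.DIST(z, TRUE)
    return p_bar, z, p_value

# Hypothetical sample: 100 women among 400 players; p0 = 0.20 is from the example.
alpha = 0.05
p_bar, z, p_value = upper_tail_proportion_test(x=100, n=400, p0=0.20)
print(f"p_bar = {p_bar:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Note that the standard error uses the hypothesized p₀, not the sample proportion, because the test statistic is computed under the assumption that H0 holds as an equality.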
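• Lower tail test. Hypotheses: H0: p ≥ p₀ vs. Ha: p < p₀. Test statistic: z = (p̄ − p₀)/√(p₀(1 − p₀)/n). p-value: lower tail area, =NORM.S.DIST(z, TRUE).
• Upper tail test. Hypotheses: H0: p ≤ p₀ vs. Ha: p > p₀. Test statistic: z = (p̄ − p₀)/√(p₀(1 − p₀)/n). p-value: upper tail area, =1 − NORM.S.DIST(z, TRUE).
• Two-tailed test. Hypotheses: H0: p = p₀ vs. Ha: p ≠ p₀. Test statistic: z = (p̄ − p₀)/√(p₀(1 − p₀)/n). p-value: double the minimum value of the two one-tail cases.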
Next lecture

• Linear Regression
