Math 10 Winter 2016 Reg
Calculator – A scientific calculator is sufficient. Cell phone calculators are not allowed on
exams.
Access to a computer outside of class; we will be using the computer lab and Minitab. Also,
you will need an e-mail address and access to the Internet. Course topics, homework, exam
information, handouts, data sets, and other information will be posted on the website.
Grading: Grading will be based on the following criteria. Grades are not negotiable.
Homework: Completed homework must be turned in by the due date, but homework should be completed daily. Homework
assignments may also be posted on the website. There is no credit for late homework.
Exams: There will be two exams during the quarter. Your final exam score will replace your lowest scoring exam if
it improves your grade. There are no make-up exams.
Final Exam: A comprehensive exam will be given on the final exam date.
Computer Lab: Lab classes will be held in the math computer lab: S44. You will use Minitab and other statistical
software in analyzing data, learning statistical models, and working on the class material. Computer
labs can be done in groups of no more than four people for a common grade and must be turned in by
email on the due date. Labs received after midnight on the due date are late and receive no
credit.
Adding/Dropping: If you choose not to complete the course, it is your responsibility to officially drop or withdraw
from the course by the deadline date. I will not sign late drop or withdrawal forms.
Attendance: It is expected that you attend both the lecture and labs. Attendance means arriving on time and
staying the entire scheduled period.
Changes: Information in this syllabus may be changed during the quarter, but you will be informed in advance.
Other Information: All students are expected to understand the college policy on cheating as outlined in the
student handbook. Plagiarism (submitting another’s work as your own) will result in an
immediate failure for the course for your entire group.
Cell phones and other electronic devices need to be turned off or silenced. Please arrive
on time and stay the entire period.
Read the Frequently Asked Questions on the website for other policies and procedures.
Student Learning Outcomes (SLOs) are also posted on the class website.
If you feel that you may need an accommodation based on the impact of a disability, you
should contact me privately to discuss your specific needs. Also, please contact Disability
Support Services (864-8753) or Educational Diagnostic Center (864-8839) for information or
questions about eligibility, services and accommodations for physical (DSS), psychological
(DSS) or learning (EDC) disabilities.
Tentative Schedule - Math 10 - Sec 28
Winter Quarter - 2016
Week of Jan 4-8: Part 1; Part 1; HW 0; Lab 1 Due
Week of Jan 11-15: Part 1/2; Part 2; Drop Deadline (Jan 17); HW 1; Lab 2 Due
Week of Jan 18-22: Holiday; Part 3; HW 2; Lab 3 Due
Week of Jan 25-29: Part 3/4; Part 4/Review; HW 3; Lab 4 Due
Week of Feb 1-5: Exam 1; Part 4/5; Part 5; HW 4; Lab 5 Due
Week of Feb 8-12: Part 5/6; Part 6; Holiday; HW 5; Lab 6 Due
Week of Feb 15-19: Part 6; Part 6; Lab 7 Due
Week of Feb 22-26: Holiday; Part 7; Withdraw Deadline; HW 6; Lab 8 Due
Week of Feb 29-Mar 4: Part 7; Review/Part 8; Lab 9 Due
Week of Mar 7-11: Exam 2; Part 8; Part 8; HW 7; Lab 10 Due
Week of Mar 14-18: Part 8/9; Part 9; Review; HW 8; Lab 11 Due
Week of Mar 21-25: Final Exam 4:00-6:00; HW 9
Slides | Topic | Illowsky/Dean | Geraghty
Part 1 | Descriptive Statistics | 1 (all), 2 (all), 6.3, 12.4, 12.6, 12.7 | Sec 4 - outliers
Part 9 | Regression | 12 |
DE ANZA COLLEGE – DEPARTMENT OF MATHEMATICS
Supplementary material for an introductory lower division course in Probability and Statistics
However, when the course turned to inference and hypothesis testing, I watched these students’
performance deteriorate. One student asked me after class to again explain the difference between the
Null and Alternative Hypotheses. I tried several methods, but it was clear these students never really
understood the logic or the reasoning behind the procedure. These students could easily perform the
calculations, but they had difficulty choosing the correct model, setting up the test, and stating the
conclusion.
These students (to their credit) continued to work hard; they wanted to understand the material, not
simply pass the class. Since these students had excellent math skills, I went deeper into the explanation
of Type II error and the statistical power function. Although they could compute power and sample size
for different criteria, they still didn’t conceptually understand hypothesis testing.
On my long drive home, I was listening to National Public Radio’s Talk of the Nation 1 where there was a
discussion on the difference between the reductionist and holistic approaches to the sciences, which the
commentator described as the western tradition vs. the eastern tradition. The reductionist or western
method of analyzing a problem, mechanism or phenomenon is to look at the component pieces of the
system being studied. For example, a nutritionist breaks a potato down into vitamins, minerals,
carbohydrates, fats, calories, fiber and proteins. Reductionist analysis is prevalent in all the sciences,
including Inferential Statistics and Hypothesis Testing.
Holistic or eastern tradition analysis is less concerned with the component parts of a problem,
mechanism or phenomenon but instead how this system operates as a whole, including its surrounding
environment. For example, a holistic nutritionist would look at the potato in its environment: when it
was eaten, with what other foods, how it was grown, or how it was prepared. In holism, the potato is
much more than the sum of its parts.
This illustrative example shows the difference between reductionist and holistic analyses. Each
rendering teaches something important about the fish: The reductionist drawing of the fish anatomy
helps explain how a fish is built and the holistic watercolor helps explain how a fish relates to its
environment. Both the reductionist and holistic methods add to knowledge and understanding, and
both philosophies are important. Unfortunately, much of Western science has been dominated by the
reductionist philosophy, including the backbone of the scientific method, Inferential Statistics.
Although science has traditionally been reluctant to embrace, and often hostile to, holistic
philosophy in the scientific method, many now support a multicultural or multi-philosophical
approach. In his book Holism and Reductionism in Biology and Ecology 4, Looijen claims
that “holism and reductionism should be seen as mutually dependent, and hence co-operating
research programs than as conflicting views of nature or of relations between sciences.” Holism
develops the “macro-laws” that reductionism needs to “delve deeper” into understanding or explaining
a concept or phenomenon. I believe this claim applies to the study of Statistics as well.
I realize that the problem of my high-achieving students being unable to comprehend hypothesis testing
could be cultural – these were international students who may have been schooled under a more
holistic philosophy. The Introductory Statistics curriculum and most texts give an incomplete
explanation of the logic of Hypothesis Testing, eliminating or barely explaining such topics as Power, the
consequence of Type II error or Bayesian alternatives. The problem is how to supplement an
Introductory Statistics course with a holistic philosophy without depriving the students of the required
reductionist course curriculum – all in one quarter or semester!
I believe it is possible to teach the concept of Inferential Statistics holistically. This course material is the
result of that inspiration and is designed to supplement, not replace, a traditional course textbook
or workbook. This supplemental material includes:
• Examples of deriving research hypotheses from general questions and explanatory conclusions
consistent with the general question and test results.
• An in-depth explanation of statistical power and type II error.
• Techniques for checking the validity of model assumptions and identifying potential outliers
using graphs and summary statistics.
• Replacement of the traditional step-by-step “cookbook” for hypothesis testing with interrelated
procedures.
• De-emphasis of algebraic calculations in favor of a conceptual understanding using computer
software to perform tedious calculations.
• Interactive Flash animations to explain the Central Limit Theorem, inference, confidence
intervals, and the general hypothesis testing model including Type II error and power.
• PowerPoint Slides of the material for classroom demonstration.
• Excel Data sets for use with computer projects and labs.
This material is limited to one population hypothesis testing but could easily be extended to other
models. My experience has been that once students understand the logic of hypothesis testing, the
introduction of new models is a minor change in the procedure.
This old story from China or India was made into the poem The Blind Man and the Elephant by John
Godfrey Saxe 5. Six blind men find excellent empirical evidence from different parts of the elephant and
all come to reasoned inferences that match their observations. Their research is flawless and their
conclusions are completely wrong, showing the necessity of including holistic analysis in the scientific
process.
The second feeling of the tusk, cried: "Ho! what have we here,
so very round and smooth and sharp? To me tis mighty clear,
this wonder of an elephant, is very like a spear!"
The fourth reached out his eager hand, and felt about the knee:
"What most this wondrous beast is like, is mighty plain," quoth he;
"Tis clear enough the elephant is very like a tree."
The fifth, who chanced to touch the ear, Said; "E'en the blindest man
can tell what this resembles most; Deny the fact who can,
This marvel of an elephant, is very like a fan!"
The first story is about a drug that was thought to be effective in research, but was pulled from the
market when it was found to be ineffective in practice.
Companies that market the suppositories, according to FDA, are Bio Pharm, Dispensing Solutions,
G&W Laboratories, Paddock Laboratories, and Perrigo New York. Bio Pharm also distributes the
products, along with Major Pharmaceuticals, PDRX Pharmaceuticals, Physicians Total Care,
Qualitest Pharmaceuticals, RedPharm, and Shire U.S. Manufacturing.
FDA had determined in January 1979 that trimethobenzamide suppositories lacked "substantial
evidence of effectiveness" and proposed withdrawing approval of any NDA for the products.
"There's a variety of reasons" why it has taken FDA nearly 30 years to finally get the suppositories
off the market, Levy said.
At least 21 infant deaths have been associated with unapproved carbinoxamine-containing products,
Levy noted.
Many products with unapproved labeling may be included in widely used pharmaceutical reference
materials, such as the Physicians' Desk Reference, and are sometimes advertised in medical journals,
he said.
Regulators urged consumers using suppositories containing trimethobenzamide to contact their health
care providers about the products.
The second story is about promising research that was abandoned because the test data showed no
significant improvement for patients taking the drug.
Treatment with interferon gamma-1b (Ifn-g1b) does not improve survival in people with a fatal lung
disease called idiopathic pulmonary fibrosis, according to a study that was halted early after no
benefit to participants was found.
Previous research had suggested that Ifn-g1b might benefit people with idiopathic pulmonary fibrosis,
particularly those with mild to moderate disease.
The new study included 826 people, ages 40 to 79, who lived in Europe and North America. They
were given injections of either 200 micrograms of Ifn-g1b (551 people) or a placebo (275) three times
a week.
After a median of 64 weeks, 15 percent of those in the Ifn-g1b group and 13 percent in the placebo
group had died. Symptoms such as flu-like illness, fatigue, fever and chills were more common
among those in the Ifn-g1b group than in the placebo group. The two groups had similar rates of
serious side effects, the researchers found.
"We cannot recommend treatment with interferon gamma-1b since the drug did not improve survival
for patients with idiopathic pulmonary fibrosis, which refutes previous findings from subgroup
analyses of survival in studies of patients with mild-to-moderate physiological impairment of
pulmonary function," Dr. Talmadge E. King Jr., of the University of California, San Francisco, and
colleagues wrote in the study published online and in an upcoming print issue of The Lancet.
The negative findings of this study "should be regarded as definite, [but] they should not discourage
patients to participate in one of the several clinical trials currently underway to find effective
treatments for this devastating disease," Dr. Demosthenes Bouros, of the Democritus University of
Thrace in Greece, wrote in an accompanying editorial.
Bouros added that people deemed suitable "should be enrolled early in the transplantation list, which
is today the only mode of treatment that prolongs survival."
Although these are both stories of failures in using drugs to treat diseases, they represent two different
aspects of hypothesis testing. In the first story, the suppositories were thought to be effective in treatment
from the initial trials, but were later shown to be ineffective in the general population. This is an
example of what statisticians call Type I Error, supporting a hypothesis (the suppositories are effective)
that later turns out to be false.
In the second story, researchers chose to abandon research when the interferon was found to be
ineffective in treating lung disease during clinical trials. Now this may have been the correct decision,
but what if this treatment was truly effective and the researchers just had an unusual group of test
subjects? This would be an example of what statisticians call Type II Error, failing to support a
hypothesis (the interferon is effective) that later turns out to be true. Unlike in the first story, we will never
find out the answer to this question, since the treatment will not be released to the general public.
In a traditional Introductory Statistics course, very little time is spent analyzing the potential error shown
in the second story. However, both types of error are important and will be explored in this course
material.
A student asked me about the distribution of exam scores after she saw her score of 87 out of 100. I told
her the distribution of test scores was approximately bell-shaped with a mean score of 75 and a
standard deviation of 10. Most people would have an intuitive grasp of the mean score as being the
“average student’s score” and would say this student did better than average. However, having an
intuitive grasp of standard deviation is more challenging. The Empirical Rule is a helpful tool in
explaining standard deviation.
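The standard deviation is a measure of variability or spread from the center of the data as defined by
the mean. The Empirical Rule states that for bell-shaped data:
• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.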
The student who scored an 87 would be in the upper 16% of the class, more than one standard
deviation above the mean score.
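The Empirical Rule percentages and the "upper 16%" figure can be checked against an exact Normal model. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming the exam scores are exactly Normal with mean 75 and standard deviation 10; real scores are only approximately bell-shaped, so treat the output as the idealized version of the rule.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)  # assumed Normal model for the exam scores

# Share of scores within k standard deviations of the mean, for k = 1, 2, 3
within = {k: scores.cdf(75 + k * 10) - scores.cdf(75 - k * 10) for k in (1, 2, 3)}

# Share of the class scoring more than one standard deviation above the mean (85)
above_one_sd = 1 - scores.cdf(85)

for k, share in within.items():
    print(f"within {k} SD: {share:.3f}")
print(f"above 85: {above_one_sd:.3f}")
```

The exact Normal shares (about 0.683, 0.954, and 0.997) are where the rounded 68/95/99.7 rule comes from, and the roughly 0.159 above 85 is half of the 1 − 0.683 left outside one standard deviation.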
Related to the Empirical Rule is the Z-score, which measures how many standard deviations a particular
data point is above or below the mean. Unusual observations would have a Z-score over 2 or under -2.
Extreme observations would have Z-scores over 3 or under -3 and should be investigated as potential
outliers.
Formula for Z-score: $Z = \dfrac{X_i - \bar{X}}{s}$
The student who received an 87 on the exam would have a Z-score of 1.2, meaning her score was well
above average, but not highly unusual.
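The Z-score calculation and the cutoffs for "unusual" (beyond ±2) and "extreme" (beyond ±3) can be written as two small functions. A minimal Python sketch; the names `z_score` and `classify` are illustrative, not part of the course materials.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def classify(z: float) -> str:
    """Label a Z-score using the text's cutoffs: |Z| > 3 extreme, |Z| > 2 unusual."""
    if abs(z) > 3:
        return "extreme: investigate as a potential outlier"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"


z = z_score(87, mean=75, sd=10)
print(round(z, 2), classify(z))  # 1.2 not unusual
```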
In the section on descriptive statistics, we studied the sample mean, X̄, as a measure of central tendency.
Now we want to consider X̄ as a Random Variable.
We start with a Random Sample X1, X2, …, Xn where each of the random variables Xi has the same
probability distribution and all are mutually independent of each other. The sample mean is a function of
these random variables (add them up and divide by the sample size), so X̄ is a random variable. So what
is the Probability Distribution Function (PDF) of X̄?
To answer this question, consider the following experiment: roll samples of n dice, determine
the mean roll, and create a PDF for different values of n.
For the case n=1, the distribution of the sample mean is the same as the distribution of the random
variable. Since each face of the die has the same chance of being rolled, the distribution is rectangular
(uniform) and centered at 3.5:
For the case n=2, the distribution of the sample mean starts to take on a triangular shape since some
values are more likely to be rolled than others. For example, there are six ways to roll a total of 7 and get a
sample mean of 3.5, but only one way to roll a total of 2 and get a sample mean of 1. Notice the PDF is
still centered at 3.5.
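The n=2 case can be enumerated exactly, which confirms the six-ways-versus-one-way count and the center at 3.5. A minimal Python sketch using only the standard library; exact fractions are used so that sample means like 3.5 are compared without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered outcome of rolling two fair dice (36 equally likely pairs)
outcomes = list(product(range(1, 7), repeat=2))

# Count how many pairs give each possible sample mean (total / 2)
ways = Counter(Fraction(a + b, 2) for a, b in outcomes)

for mean in sorted(ways):
    print(f"mean {float(mean):.1f}: {ways[mean]} ways, probability {ways[mean]}/36")

# Center of the distribution: the expected value of the sample mean
expected = sum(m * w for m, w in ways.items()) / len(outcomes)
print("expected sample mean:", float(expected))  # 3.5
```

The printed counts rise from 1 to 6 and fall back to 1, which is the triangular shape described for n=2.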
For the case n=10, the PDF of the sample mean now takes on a familiar bell shape that looks like a
Normal Distribution. The center is still at 3.5 and the values are now more tightly clustered around the
mean, implying that the standard deviation has decreased.
Finally, for the case n=30, the PDF continues to look like the Normal Distribution centered around the
same mean of 3.5, but more tightly clustered than the prior example:
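The shrinking spread for larger n can be reproduced by simulation. This is a sketch, assuming fair dice and an arbitrary 20,000 simulated samples per value of n; it compares the observed standard deviation of the sample means with σ/√n, where σ ≈ 1.708 is the standard deviation of a single die.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

FACES = range(1, 7)
SIGMA_ONE_DIE = statistics.pstdev([1, 2, 3, 4, 5, 6])  # about 1.708
REPS = 20_000  # simulated samples for each n (an arbitrary choice)

results = {}
for n in (1, 2, 10, 30):
    # Each simulated sample: roll n dice and record the mean roll
    means = [statistics.fmean(random.choices(FACES, k=n)) for _ in range(REPS)]
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"n={n:2d}: mean of X-bar = {results[n][0]:.3f}, "
          f"SD of X-bar = {results[n][1]:.3f}, sigma/sqrt(n) = {SIGMA_ONE_DIE / n ** 0.5:.3f}")
```

Each run should show the mean of X-bar staying near 3.5 while its standard deviation tracks σ/√n closely.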
This die-rolling example demonstrates the Central Limit Theorem’s three important observations about
the PDF of X̄ compared to the PDF of the original random variable.
1. $\mu_{\bar{X}} = \mu$
2. $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
3. The Distribution of X̄ is approximately Normal.
Combining all of the above into a single formula: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
where Z represents the Standard Normal Distribution.
This powerful result allows us to use the sample mean X̄ as an estimator of the population mean 𝜇. In
fact, most inferential statistics practiced today would not be possible without the Central Limit
Theorem.
Example:
$P(\bar{X} > 70) = P\left(Z > \dfrac{70 - 69.2}{2.9/\sqrt{60}}\right) = P(Z > 2.14) = 0.0162$
Compare this to the much larger probability that one male chosen will be over 70 inches tall:
This example demonstrates how the sample mean will cluster towards the population mean as the
sample size increases.
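Both probabilities can be computed directly from Normal models. A minimal sketch with `statistics.NormalDist`, assuming heights are Normal with 𝜇 = 69.2 and 𝜎 = 2.9 inches and a sample of n = 60; these values are read from the worked formula, since the example's setup text is not shown here.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 69.2, 2.9, 60  # values read from the worked formula

# Sampling distribution of the mean, per the Central Limit Theorem
sample_mean = NormalDist(mu=MU, sigma=SIGMA / sqrt(N))
# Distribution of a single male's height
one_male = NormalDist(mu=MU, sigma=SIGMA)

p_mean_over_70 = 1 - sample_mean.cdf(70)
p_one_over_70 = 1 - one_male.cdf(70)

print(f"P(mean of 60 > 70) = {p_mean_over_70:.4f}")
print(f"P(one male > 70)   = {p_one_over_70:.4f}")
```

The first value comes out near 0.016 (the text's 0.0162 reflects rounding Z to 2.14) and the second near 0.39, roughly 24 times larger.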
Example: Lupe is trying to sell her house and needs to determine the market value of the home. The
population in this example would be all the homes that are similar to hers in the neighborhood.
Lupe’s realtor chooses as the sample nine homes in this neighborhood that sold in the last six
months. The realtor then adjusts some of the sales prices to account for differences between Lupe’s
home and the sold homes.
Next the realtor takes the mean of the adjusted sample and recommends to Lupe a market value for
Lupe’s home of $450,000. The realtor has made an inference about the mean value of the population.
To measure the reliability of the inference, the realtor should consider factors such as the small sample
size, possible changes in home values over the last six months, and differences between Lupe’s home and
the sampled homes.
This is an example of Estimation, a branch of Inferential Statistics where sample statistics
are used to estimate the value of a population parameter. Lupe’s realtor was trying to estimate the
population mean (𝜇) based on the sample mean (X̄).
Sample Statistic ⟶ Population Parameter
Mean: X̄ ⟶ 𝜇
Standard Deviation: s ⟶ 𝜎
Proportion: 𝑝̂ ⟶ 𝑝
In the example above, Lupe’s realtor estimated the population mean of similar homes in Lupe’s
neighborhood by using the sample mean of $450,000 from the adjusted prices of the sampled homes.
Interval Estimation
A point estimate is our “best” estimate of a population parameter, but will most likely not exactly equal
the parameter. Instead, we will choose a range of values called an Interval Estimate that is likely to
include the value of the population parameter.
Using probability and the Central Limit Theorem, we can design an Interval Estimate called a Confidence
Interval that has a known probability (Level of Confidence) of capturing the true population parameter.
To find a confidence interval for the population mean (𝜇) when the population standard deviation (𝜎) is
known and n is sufficiently large, we use $\bar{X} \pm Z_c \dfrac{\sigma}{\sqrt{n}}$, where the critical value
Zc for the Level of Confidence comes from the Standard Normal Distribution:
c = Level of Confidence | Zc = Critical Value
90% | 1.645
95% | 1.960
99% | 2.576
Example: The Dean wants to estimate the mean number of hours worked per week by students. A
sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours. The point
estimate is 24 hours (sample mean). What is the 95% confidence interval for the average number of
hours worked per week by the students?
$24 \pm \dfrac{1.96 \cdot 4}{\sqrt{49}} = 24 \pm 1.12 = (22.88,\ 25.12)$ hours per week
The margin of error for the confidence interval is 1.12 hours. We can say with 95% confidence that the mean
number of hours worked by students is between 22.88 and 25.12 hours per week.
If the level of confidence is increased, then the margin of error will also increase. For example, if we
increase the level of confidence to 99% for the above example, then:
$24 \pm \dfrac{2.576 \cdot 4}{\sqrt{49}} = 24 \pm 1.47 = (22.53,\ 25.47)$ hours per week
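The critical values in the table and both intervals in the Dean's example can be reproduced with the inverse Normal CDF. A minimal sketch, assuming Python's `statistics.NormalDist`; `z_critical` and `z_interval` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value Zc: leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(mean: float, sigma: float, n: int, confidence: float):
    """Confidence interval for mu when sigma is known (or n is large)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return mean - margin, mean + margin, margin


for c in (0.90, 0.95, 0.99):
    lo, hi, moe = z_interval(24, 4, 49, c)
    print(f"{c:.0%}: Zc = {z_critical(c):.3f}, margin = {moe:.2f}, interval = ({lo:.2f}, {hi:.2f})")
```

The loop makes the trade-off visible: raising the confidence from 90% to 99% widens the margin from about 0.94 to about 1.47 hours.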
5.3.2 Confidence Interval for Population Mean using Sample Standard Deviation – Student’s t Distribution
The formula for the confidence interval for the mean requires knowledge of the population standard
deviation (𝜎). In most real-life problems, we do not know this value for the same reasons we do not
know the population mean. This problem was solved by the English statistician William Sealy Gosset, an
employee of the Guinness Brewery in Dublin. Gosset, however, was prohibited by Guinness from using his own
name in publishing scientific papers. He published under the pseudonym “Student”, and therefore the
distribution he discovered was named "Student's t-distribution" 8.
$\bar{X} \pm t_c \dfrac{s}{\sqrt{n}}$ with degrees of freedom = n − 1
Example
Last year Sally belonged to a Health Maintenance Organization (HMO) that had a population average
rating of 62 (on a scale from 0-100, with ‘100’ being best); this was based on records accumulated about
the HMO over a long period of time. This year Sally switched to a new HMO. To assess the population
mean rating of the new HMO, 20 members of this HMO are polled and they give it an average rating of
65 with a standard deviation of 10. Find and interpret a 95% confidence interval for the population average
rating of the new HMO.
The t distribution will have 20 − 1 = 19 degrees of freedom. Using a table or technology, the critical value for
the 95% confidence interval is tc = 2.093.
$65 \pm \dfrac{2.093 \cdot 10}{\sqrt{20}} = 65 \pm 4.68 = (60.32,\ 69.68)$ HMO rating
With 95% confidence we can say that the population mean rating of Sally’s new HMO is between 60.32 and
69.68. Since the value 62 is in the confidence interval, we cannot say with 95% confidence that the new HMO is
either better or worse than the previous HMO.
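The HMO interval and the "is 62 inside?" check can be sketched in a few lines. This sketch takes the critical value tc = 2.093 as an input, as if read from a t table, because Python's standard library has no t distribution; if SciPy is installed, `scipy.stats.t.ppf(0.975, 19)` returns the same value (about 2.093). The function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(mean: float, s: float, n: int, t_c: float):
    """Confidence interval for mu using the sample SD s and a t critical value t_c (df = n - 1)."""
    margin = t_c * s / sqrt(n)
    return mean - margin, mean + margin


# t_c = 2.093 for 95% confidence with 19 degrees of freedom, taken from a t table
lo, hi = t_interval(mean=65, s=10, n=20, t_c=2.093)
print(f"({lo:.2f}, {hi:.2f})")                # (60.32, 69.68)
print("62 inside interval:", lo <= 62 <= hi)  # True
```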
Recall from the section on random variables the binomial distribution where 𝑝 represented the
proportion of successes in the population. The binomial model was analogous to coin-flipping, or yes/no
question polling. In practice, we want to use sample statistics to estimate the population proportion (𝑝).
The sample proportion ( 𝑝̂ ) is the proportion of successes in the sample of size n and is the point
estimator for 𝑝. Under the Central Limit Theorem, if 𝑛𝑝 > 5 and 𝑛(1 − 𝑝) > 5, the distribution of the
sample proportion 𝑝̂ will have an approximately Normal Distribution.
Using this information we can construct a confidence interval for 𝑝, the population proportion:
Confidence interval for 𝒑: $\hat{p} \pm Z_c \sqrt{\dfrac{p(1-p)}{n}} \approx \hat{p} \pm Z_c \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
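The approximate interval for 𝑝 can be sketched as a function that also checks the sample-size condition. Two assumptions here: because 𝑝 is unknown, the condition is checked with 𝑝̂ in place of 𝑝 (the same substitution the approximate formula makes), and the 40-out-of-100 poll is hypothetical, used only to exercise the code.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float):
    """Normal-approximation interval for p, using p-hat in the standard error."""
    p_hat = successes / n
    # Condition from the text (np > 5 and n(1 - p) > 5), checked with p-hat in place of p
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("n*p-hat and n*(1 - p-hat) must both exceed 5")
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 40 "yes" answers out of 100 respondents (illustrative numbers only)
lo, hi = proportion_interval(40, 100, 0.95)
print(f"({lo:.3f}, {hi:.3f})")
```

Raising an error when the condition fails is one design choice; printing a warning instead would also be reasonable for classroom use.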
Example
The margin of error for this poll is 6% and we can say with 99% confidence that the true percentage of
drivers who are using their cell phones illegally is between 6.5% and 18.5%.
We often want to study the variability, volatility or consistency of a population. For example, two
investments may both have expected earnings of 6% per year, but one investment is much riskier, with
larger ups and downs. To estimate the variation or volatility of a data set, we will use the sample standard
deviation (𝑠) as a point estimator of the population standard deviation (𝜎).
Example
Investments A and B are both known to have a rate of return of 6% per year. Over the last 24 months,
Investment A has a sample standard deviation of 3% per month, while for Investment B, the sample
standard deviation is 5% per month. We would say that Investment B is more volatile and riskier than
Investment A due to the higher estimate of the standard deviation.
To create a confidence interval for an estimate of standard deviation, we need to introduce a new
distribution, called the Chi-square (χ²) distribution.
The Chi-square distribution is a family of distributions related to the Normal Distribution as it represents
a sum of independent squared standard Normal Random Variables. Like the Student’s t distribution, the
degrees of freedom will be n-1 and determine the shape of the distribution. Also, since the Chi-square
represents squared data, the inference will be about the variance rather than the standard deviation.
• It is positively skewed
• It is non-negative
• It is based on degrees of freedom (n-1)
• When the degrees of freedom change,
a new distribution is created
• $\dfrac{(n-1)s^2}{\sigma^2}$ will have a Chi-square distribution.
Since the Chi-square represents squared data, we can construct confidence intervals for the population
variance (σ²), and take the square root of the endpoints to get a confidence interval for the population
standard deviation. Due to the skewness of the Chi-square distribution the resulting confidence interval
will not be centered at the point estimator, so the margin of error form used in the prior confidence
intervals doesn’t make sense here.
Confidence interval for σ²: $\left( \dfrac{(n-1)s^2}{\chi^2_R},\ \dfrac{(n-1)s^2}{\chi^2_L} \right)$
Example
One can say with 95% confidence that the standard deviation for this mutual fund is between 3.8% and
7.3% per month.
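The variance interval and its square root can be sketched using the Investment B figures given earlier in this section (n = 24 months, s = 5% per month), since the mutual fund's own data are not shown here. The chi-square critical values for 23 degrees of freedom, 11.689 and 38.076, are assumed to be read from a standard chi-square table (Python's standard library has no chi-square quantile function). `sd_interval` is an illustrative name.

```python
from math import sqrt


def sd_interval(s: float, n: int, chi2_left: float, chi2_right: float):
    """Interval for sigma: square roots of ((n-1)s^2/chi2_R, (n-1)s^2/chi2_L)."""
    ss = (n - 1) * s ** 2
    var_lo, var_hi = ss / chi2_right, ss / chi2_left
    return sqrt(var_lo), sqrt(var_hi)


# Investment B: n = 24 monthly returns, s = 5% per month.
# 95% chi-square critical values for 23 df, from a table:
#   chi2_L = 11.689 (area 0.025 to its left), chi2_R = 38.076 (area 0.025 to its right)
lo, hi = sd_interval(s=5, n=24, chi2_left=11.689, chi2_right=38.076)
print(f"sigma between {lo:.2f}% and {hi:.2f}% per month")
```

The result, about 3.89% to 7.01%, is not centered at s = 5%, which illustrates why the margin-of-error form does not apply here.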
Others may choose a more formalized and detailed set of procedures, but the general concepts of
inspiration, design, experimentation, and conclusion allow one to see the whole process.
Most general questions start with an inspiration or an idea about a topic or phenomenon of interest.
Some examples of general questions:
• (Health Care) Would a public single payer health care system be more effective than the current
private insurance system?
• (Labor) What is the effect of undocumented immigration and outsourcing of jobs on the current
unemployment rate?
• (Economy) Is the federal economic stimulus package effective in lessening the impact of the
recession?
• (Education) Are colleges too expensive for students today?
• 𝑝 > 0.20
• 𝜇 > 5000
• 𝜇1 = 𝜇2
• 𝑝1 < 𝑝2
• 𝜎 > 10
Hypothesis Testing is a procedure, based on sample evidence and probability theory, used to determine
whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and
should be rejected. The hypothesis that is tested is called the Null Hypothesis, designated by the symbol
H0. If the Null Hypothesis is unreasonable and needs to be rejected, then the research supports an
Alternative Hypothesis, designated by the symbol Ha.
From these definitions it is clear that the Alternative Hypothesis will necessarily contradict the Null
Hypothesis; both cannot be true at the same time. Some other important points about hypotheses:
• Hypotheses must be statements about population parameters, never about sample statistics.
• In most hypothesis tests, equality (=, ≤, ≥) will be associated with the Null Hypothesis while
non-equality (≠, <, >) will be associated with the Alternative Hypothesis.
• It is the Null Hypothesis that is always tested, in an attempt to “disprove” it and support the
Alternative Hypothesis. This process is analogous in concept to a “proof by contradiction” in
Mathematics or Logic, but supporting a hypothesis with a level of confidence is not the same as
an absolute mathematical proof.
To test a hypothesis we need to use a statistical model that describes the behavior of the data and the type
of population parameter being tested. Because of the Central Limit Theorem, many statistical models
are from the Normal Family, most importantly the Z, t, χ², and F distributions. Other models that are
used when the Central Limit Theorem is not appropriate are called non-parametric Models and will not
be discussed here.
Each chosen model has requirements of the data, called model assumptions, that should be checked for
appropriateness. For example, many models require that the sample mean have an approximately Normal
Distribution, which may not be true for some smaller or heavily skewed data sets.
Once the model is chosen, we can then determine a test statistic, a value derived from the data that is
used to decide whether to reject or fail to reject the Null Hypothesis.
Mean vs. Hypothesized Value: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Proportion vs. Hypothesized Value: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Variance vs. Hypothesized Value: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
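The three test statistics translate directly into functions. A minimal Python sketch; the function names are illustrative, and the worked call reuses the HMO numbers from the confidence-interval section (X̄ = 65, s = 10, n = 20) tested against the old rating of 62.

```python
from math import sqrt


def t_stat_mean(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (X-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))


def z_stat_proportion(p_hat: float, p0: float, n: int) -> float:
    """Z = (p-hat - p0) / sqrt(p0 (1 - p0) / n); the standard error uses p0, not p-hat."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def chi2_stat_variance(s: float, sigma0: float, n: int) -> float:
    """chi-square = (n - 1) s^2 / sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * s ** 2 / sigma0 ** 2


# HMO example: X-bar = 65, s = 10, n = 20, tested against the hypothesized value 62
print(round(t_stat_mean(65, 62, 10, 20), 3))  # 1.342
```

Note the difference from the confidence interval for 𝑝: under the Null Hypothesis the value 𝑝0 is assumed true, so the test statistic uses it in the standard error.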
Whenever we make a decision or support a position, there is always a chance we make the wrong
choice. The hypothesis testing process requires us either to reject the Null Hypothesis and support
the Alternative Hypothesis or fail to reject the Null Hypothesis. This creates the possibility of two types
of error:
• Type I Error
Rejecting the null hypothesis when
it is actually true.
• Type II Error
Failing to reject the null hypothesis
when it is actually false.
In designing hypothesis tests, we need to carefully consider the probability of making either one of
these errors.
Example:
Recall the two news stories discussed earlier in Section 3. In the first story, a drug company marketed a
suppository that was later found to be ineffective (and often dangerous) in treatment. Before marketing
the drug, the company determined that the drug was effective in treatment, which means the company
rejected a Null Hypothesis that the suppository had no effect on the disease. This is an example of Type I
error.
In the second story, research was abandoned when the testing showed Interferon was ineffective in
treating a lung disease. The company in this case failed to reject a Null Hypothesis that the drug was
ineffective. What if the drug really was effective? Did the company make a Type II error? Possibly, but
since the drug was never marketed, we have no way of knowing the truth.
These stories highlight the problem of statistical research: errors can be analyzed using probability
models, but there is often no way of identifying specific errors. For example, there are unknown
innocent people in prison right now because a jury made a Type I error in wrongfully convicting
them. We must be open to the possibility of modifying or rejecting currently accepted
theories when new data are discovered.
In designing an experiment, we set a maximum probability of making a Type I error. This probability is
called the level of significance or significance level of the test and designated by the Greek letter α.
The analysis of Type II error is more problematic, as there are many possible values that would satisfy the
Alternative Hypothesis. For a specific value of the Alternative Hypothesis, the probability of
making a Type II error is called Beta (β), which will be analyzed in detail later in this section.
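To see why β depends on which specific value of the Alternative Hypothesis is true, here is a minimal sketch for a right-tailed Z test of H0: 𝜇 ≤ 𝜇0 against Ha: 𝜇 > 𝜇0. It assumes 𝜎 is known and X̄ is Normal; the design values (𝜇0 = 100, 𝜎 = 15, n = 25, α = 0.05) and the true means are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist


def beta_right_tailed(mu0: float, mu_true: float, sigma: float, n: int, alpha: float) -> float:
    """P(Type II error) for a right-tailed Z test of H0: mu <= mu0 vs Ha: mu > mu0.

    Assumes sigma is known and X-bar is Normal (Central Limit Theorem).
    """
    se = sigma / sqrt(n)
    # Critical value on the X-bar scale: reject H0 when X-bar exceeds this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta: chance X-bar lands in the "fail to reject" region when mu = mu_true
    return NormalDist(mu=mu_true, sigma=se).cdf(cutoff)


# Hypothetical design: mu0 = 100, sigma = 15, n = 25, alpha = 0.05
for mu_true in (102, 106, 110):
    beta = beta_right_tailed(100, mu_true, 15, 25, 0.05)
    print(f"true mu = {mu_true}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

The output shows β shrinking (and power, 1 − β, growing) as the true mean moves farther from 𝜇0.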
Once the significance level of the test is chosen, it is then possible to find the region(s) of the probability
distribution of the test statistic that would allow the Null Hypothesis to be rejected. This is
called the Rejection Region, and the boundary between the Rejection Region and the “Fail to Reject” region is
called the Critical Value.
There can be more than one critical value and rejection region. What matters is that the total area of the
rejection region equals the significance level α.
A test is one-tailed when the Alternative Hypothesis, Ha, states a direction, such as:
H0: The mean income of females is less than or equal to the mean income of males.
Ha: The mean income of females is greater than the mean income of males.
Since equality is usually part of the Null Hypothesis, it is the Alternative Hypothesis which determines
which tail to test.
A test is two-tailed when no direction is specified in the Alternative Hypothesis Ha, such as:
Ha: The mean income of females is different from the mean income of males.
In a two-tailed test, the significance level is split into two parts since there are two rejection regions. In
hypothesis testing where the statistical model is symmetrical (e.g., the Standard Normal Z or Student’s t
distribution), these two regions are equal, each with area α/2. There is a relationship between a confidence interval
and a two-tailed test: if the level of confidence for a confidence interval is equal to 1-α, where α is the
significance level of the two-tailed test, the critical values are the same.
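As a quick numerical check of the split of α and of the link to confidence intervals, here is a minimal Python sketch (standard library only). It assumes a Standard Normal Z model and α = 0.05; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # Standard Normal Z: mean 0, standard deviation 1

# One-tailed (right-tailed) test: the entire area alpha sits in the upper tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha is split into two tails of area alpha/2 each.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# Multiplier for a (1 - alpha) confidence interval: the middle area is 1 - alpha.
confidence = 1 - alpha
z_interval = std_normal.inv_cdf(0.5 + confidence / 2)

print(f"one-tailed critical value: {z_one_tailed:.3f}")
print(f"two-tailed critical values: ±{z_two_tailed:.3f}")
print(f"95% confidence interval multiplier: {z_interval:.3f}")
```

The one-tailed critical value is about 1.645, while the two-tailed critical values and the 95% interval multiplier are both about 1.960.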
Here are some examples for testing the mean µ against a hypothesized value µ0:
Ha: µ>µ0 means test the upper tail and is also called a right-tailed test.
Ha: µ<µ0 means test the lower tail and is also called a left-tailed test.
Ha: µ≠µ0 means test both tails.
Deciding when to conduct a one-tailed or two-tailed test is often controversial, and many authorities even go
so far as to say that only two-tailed tests should be conducted. Ultimately, the decision depends on the
wording of the problem. If we want to show that a new diet reduces weight, we would conduct a lower-tailed
test since we don’t care if the diet causes weight gain. If instead we wanted to determine whether the mean
crime rate in California was different from the mean crime rate in the United States, we would run a
two-tailed test, since “different” means either greater than or less than.
After collecting the data but before running the test, we need to verify the data. First, get a picture of
the data by making a graph (histogram, dot plot, box plot, etc.). Check for skewness, shape, and any
potential outliers in the data.
An outlier is a data point that is far removed from the other entries in the data set. Outliers could be
caused by:
• errors in collecting or recording the data,
• data that does not belong in the population being studied, or
• extreme values that genuinely belong to the population.
The first two cases are simple to deal with, as we can correct errors or remove data that does not
belong in the population. The third case is more problematic, as extreme outliers will increase the
standard deviation dramatically and heavily skew the data.
In The Black Swan, Nassim Nicholas Taleb argues that some populations with extreme outliers should not be
analyzed with traditional confidence intervals and hypothesis testing. 9 He defines a Black Swan to be an
unpredictable extreme outlier that causes dramatic effects on the population. A recent example of a
Black Swan was the catastrophic drop in the value of unregulated Credit Default Swap (CDS) real estate
insurance investments, which caused the near collapse of the international banking system in 2008. The
traditional statistical analysis that measured the risk of the CDS investments did not take into account
the consequences of a rapid increase in the number of home foreclosures. In this case, statistics that
measured investment performance and risk were useless and created a false sense of security for large
banks and insurance companies.
Example
2 2 3 4 5 5 6 6 7 50
In this example, the number 50 is an outlier. When calculating summary statistics, we can see that the
mean and standard deviation are dramatically affected by the outlier, while the median and the
interquartile range (which are based on the ranking of the data) are hardly changed. One solution when
dealing with a population with extreme outliers is to use inferential statistics based on the ranks of the data,
also called non-parametric statistics.
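To see how differently the summary statistics react to the outlier, here is a small Python sketch using only the standard `statistics` module. It compares the example data with and without the value 50; `stdev` is the sample standard deviation (dividing by n - 1).

```python
import statistics

data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 50]
without_outlier = [x for x in data if x != 50]

def summarize(values):
    """Return the mean, sample standard deviation and median of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values), statistics.median(values)

for label, values in [("with 50", data), ("without 50", without_outlier)]:
    mean, sd, median = summarize(values)
    print(f"{label:>10}: mean={mean:.2f}  sd={sd:.2f}  median={median:g}")
```

Removing the single value 50 moves the mean from 9.00 to about 4.44 and shrinks the standard deviation from about 14.5 to about 1.8, while the median stays at 5.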
• The “box” is the region between the 1st and 3rd quartiles.
• Possible outliers are more than 1.5 IQRs from the box (inner fence).
• Probable outliers are more than 3 IQRs from the box (outer fence).
• In the box plot of the realtor example below, the dotted lines represent the “fences” that are
1.5 and 3 IQRs from the box. The data point 50 is well outside the outer fence and is
therefore almost certainly an outlier.
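The fence rules can be sketched in a few lines of Python. This assumes quartiles computed with the (n + 1)p interpolation rule, which is `statistics.quantiles(..., method="exclusive")`; other software may use a slightly different quartile rule and so produce slightly different fences.

```python
import statistics

data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 50]

# Quartiles by the (n + 1)p interpolation rule ("exclusive" method).
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # beyond these: possible outliers
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # beyond these: probable outliers

for x in data:
    if x < outer_fences[0] or x > outer_fences[1]:
        print(f"{x}: probable outlier (beyond the outer fence)")
    elif x < inner_fences[0] or x > inner_fences[1]:
        print(f"{x}: possible outlier (beyond the inner fence)")
```

With this rule Q1 = 2.75, Q3 = 6.25 and IQR = 3.5, so the upper outer fence is 16.75 and only the value 50 is flagged.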
After the data is verified, we want to conduct the hypothesis test and come up with a decision, whether
or not to reject the Null Hypothesis. The decision process is similar to a “proof by contradiction” used in
mathematics:
• We assume Ho is true before observing data and design Ha to be the complement of Ho.
• Observe the data (evidence). How unusual are these data under Ho?
• If the data are too unusual, we have “proven” Ho is false: Reject Ho and support Ha (strong
statement).
• If the data are not too unusual, we fail to reject Ho. This “proves” nothing and we say the data are
inconclusive (weak statement).
• We can never “prove” Ho, only “disprove” it.
• “Prove” in statistics means support with (1-α)100% confidence. (Example: if α = .05, then we are at
least 95% confident in our decision to reject Ho.)
Earlier we introduced the idea of a test statistic, which is a value calculated from the data under the
appropriate Statistical Model that can be compared to the critical value of the Hypothesis
test. If the test statistic falls in the rejection region of the statistical model, we reject the Null
Hypothesis.
Recall that the critical value was determined by design based on the chosen level of significance α. The
preferred method of making decisions is to calculate the probability of getting a result at least as extreme
as the value of the test statistic. This probability is called the p-value, and it can be compared directly to
the significance level.
• p-value: the probability, assuming that the null hypothesis is true, of getting a value of the test
statistic at least as extreme as the computed value for the test.
• If the p-value is smaller than the significance level α, H0 is rejected.
• If the p-value is larger than the significance level α, H0 is not rejected.
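The definition above translates directly into code for a Z test statistic. The sketch below is a minimal Python illustration (standard library only); the function names `z_p_value` and `decide` and the `tail` labels are hypothetical, chosen to match the three forms of Ha used in this section.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a Z test statistic.

    tail: "right" for Ha: mu > mu0, "left" for Ha: mu < mu0,
          "two" for Ha: mu != mu0.
    """
    phi = NormalDist().cdf
    if tail == "right":
        return 1 - phi(z)
    if tail == "left":
        return phi(z)
    if tail == "two":
        # Values at least as extreme in either direction: |Z| >= |z|
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

def decide(p_value, alpha):
    return "Reject Ho" if p_value < alpha else "Fail to Reject Ho"

print(decide(z_p_value(2.10, "right"), 0.05))  # Reject Ho
print(decide(z_p_value(2.10, "two"), 0.01))    # Fail to Reject Ho
```

The same test statistic can lead to different decisions depending on the tail(s) tested and on α: z = 2.10 gives a right-tailed p-value near 0.018 but a two-tailed p-value near 0.036.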
Comparing p-value to α
Both the p-value and α are tail probabilities calculated assuming Ho is true: the p-value is the tail area beyond
the test statistic, and α is the tail area beyond the critical value.
The p-value is determined by the data and is related to the actual probability of making a Type I error
(rejecting a true Null Hypothesis). The smaller the p-value, the smaller the chance of making a Type I
error, and therefore the more likely we are to reject the Null Hypothesis.
The significance level α is determined by design and is the maximum probability we are willing to accept
of rejecting a true H0.
1. If the test statistic lies in the rejection region, reject Ho. (critical value method)
2. If the p-value < α, reject Ho. (p-value method)
This p-value method of comparison is preferred to the critical value method because the rule is the
same for all statistical models: Reject Ho if p-value < α.
Let’s see why these two rules are equivalent by analyzing a test of mean vs. hypothesized value.
Decision is Reject Ho
• Ho: µ = 10, Ha: µ > 10
• Design: Critical value is determined by significance level α.
• Data Analysis: p-value is determined by the test statistic.
• Test statistic falls in rejection region.
• p-value (blue) < α (purple)
• Reject Ho.
• Strong statement: Data supports the Alternative Hypothesis.
In this example, the test statistic lies in the rejection region (the area to the right of the critical value).
The p-value (the area to the right of the test statistic) is less than the significance level (the area to the
right of the critical value). The decision is Reject Ho.
In this example, the Test Statistic does not lie in the Rejection Region. The p-value (the area to the right
of the test statistic) is greater than the significance level (the area to the right of the critical value). The
decision is Fail to Reject Ho.
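The equivalence of the two decision rules can be checked numerically. This Python sketch assumes a right-tailed Z test (as in Ho: µ = 10, Ha: µ > 10) at α = 0.05 and compares both rules over a grid of possible test statistics; the grid is only an illustration.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
critical_value = std_normal.inv_cdf(1 - alpha)  # right-tailed test, about 1.645

def critical_value_rule(z):
    return z > critical_value                    # test statistic in the rejection region

def p_value_rule(z):
    return 1 - std_normal.cdf(z) < alpha         # area to the right of z is below alpha

# Compare the two rules for test statistics from -4 to 4 in steps of 0.01.
grid = [i / 100 for i in range(-400, 401)]
disagreements = [z for z in grid if critical_value_rule(z) != p_value_rule(z)]
print(disagreements)
```

The list of disagreements is empty: a test statistic lies beyond the critical value exactly when the area to its right is smaller than α.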
The hypothesis test has been conducted and we have reached a decision. We must now communicate
these conclusions so they are complete, accurate, and understood by the targeted audience. How a
conclusion is written is open to subjective analysis, but here are a few suggestions:
Rejecting Ho requires a strong statement in support of Ha, while failing to reject Ho does NOT support
Ho, but requires a weak statement of insufficient evidence to support Ha.
6.5.2 Use language that is clearly understood in the context of the problem.
Do not use technical language or jargon, but instead refer back to the language of the original general
question or research hypotheses. Saying less is better than saying more.
Care must be taken to describe the population being sampled and to understand that any claim is
limited to this sampled population. If a survey was taken of a subgroup of a population, then the
inference applies only to the subgroup.
For example, studies by pharmaceutical companies often test only adult patients, making it difficult to
determine effective dosage and side effects for children. “In the absence of data, doctors use their
medical judgment to decide on a particular drug and dose for children. ‘Some doctors stay away from
drugs, which could deny needed treatment,’ Blumer says. ‘Generally, we take our best guess based on
what's been done before.’ The antibiotic chloramphenicol was widely used in adults to treat infections
resistant to penicillin. But many newborn babies died after receiving the drug because their immature
livers couldn't break down the antibiotic.” 10 We can see in this example that applying inferences from
drug testing on adults to the unsampled population of children led to tragic results.
6.5.4 Report sampling methods that could question the integrity of the random sample assumption.
In practice it is nearly impossible to choose a random sample, and scientific sampling techniques that
attempt to simulate a random sample need to be checked for bias caused by under-sampling.
Telephone polling was found to under-sample young people during the 2008 presidential campaign
because of the increase in cell-phone-only households. Since young people were more likely to favor
Obama, this caused bias in the polling numbers. Additionally, caller ID has dramatically reduced the
percentage of successful connections with people being surveyed. The pollster Jay Leve of SurveyUSA
said telephone polling was “doomed” and that his company was already developing new methods for
polling. 11
Sampling that didn’t occur over the weekend may exclude many full time workers while self-selected
and unverified polls (like ratemyprofessors.com) could contain immeasurable bias.
6.5.5 Conclusions should address the potential or necessity of further research, sending the process
back to the first procedure.
Answers often lead to new questions. If changes are recommended in a researcher’s conclusion, then
further research is usually needed to analyze the impact and effectiveness of the implemented changes.
There may have been limitations in the original research project (such as funding resources, sampling
techniques, or unavailability of data) that warrant a more comprehensive study.
For example, a math department modifies its curriculum based on performance statistics for an
experimental course. The department would want to do further study of student outcomes to assess the
effectiveness of the new program.
The quality control statistician has been given the authority to sample 36 bottles of soy sauce and knows
from past testing that the population standard deviation is 0.5 ounces. The model will be a test of
population mean vs. hypothesized value of 16 oz. A two-tailed test is selected since the company is
concerned about both overfilling and underfilling the bottles, as the stated policy is that the labeled weight
match the actual weight of the product.
Since the population standard deviation is known, the test statistic will be $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. This model is
appropriate since the sample size assures the distribution of the sample mean is approximately Normal
from the Central Limit Theorem.
Type I error would be to reject the Null Hypothesis and say the machine is not running properly when in
fact it was operating properly. Since the company does not want to needlessly stop production and
recalibrate the machine, the statistician chooses to limit the probability of Type I error by setting the
level of significance (α) to 5%.
Next, the sample mean and the test statistic are calculated.
$\bar{X} = 16.12$ ounces
$Z = \frac{16.12 - 16}{0.5/\sqrt{36}} = 1.44$
Alternatively (and preferably) the statistician would use the p-value method of decision rule. The p-value
for a two-tailed test must include all values (positive and negative) more extreme than the Test Statistic,
so in this example we find the probability that Z < -1.44 or Z > 1.44 (the area shaded blue).
Using a calculator, computer software or a Standard Normal table, the p-value=0.1498. Since the p-
value is greater than α, the decision again is fail to reject Ho.
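The soy sauce calculation can be reproduced with a short Python sketch (standard library only). The inputs are the values given in the example; the exact p-value printed (about 0.1499) may differ in the last digit from a Standard Normal table lookup.

```python
from math import sqrt
from statistics import NormalDist

# Values from the soy sauce example
mu_0 = 16.0      # hypothesized mean fill (ounces)
sigma = 0.5      # known population standard deviation (ounces)
n = 36           # bottles sampled
x_bar = 16.12    # sample mean (ounces)
alpha = 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed: alpha/2 in each tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # P(Z < -|z|) + P(Z > |z|)

print(f"Z = {z:.2f}, critical values = ±{critical:.3f}, p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to Reject Ho")
```

Both decision rules agree here: 1.44 lies inside ±1.96, and the p-value is well above 0.05.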
The statistician makes the weak statement and is not stating that the machine is running properly, only
that there is not enough evidence to state that the machine is running improperly. The statistician also reports
concerns about the sampling of only one shift of employees (restricting the inference to the sampled
population) and recommends repeating the experiment over several shifts.
If a hypothesis test has low power, then it would be difficult to reject Ho, even if Ho were false; the research
would be a waste of time and money. However, analyzing power is difficult in that there are many
values of the population parameter that support Ha. For example, in the soy sauce bottling example, the
Alternative Hypothesis was that the mean was not 16 ounces. This means the machine could be filling
the bottles with a mean of 16.0001 ounces, making Ha technically true. So when analyzing power and
Type II error we need to choose a value for the population mean under the Alternative Hypothesis (µa)
that is “practically different” from the mean under the Null Hypothesis (µo). This practical difference is
called the effect size.
µo: The value of the population mean under the Null Hypothesis
µa: The value of the population mean under the Alternative Hypothesis
Effect Size: The “practical difference” between µo and µa = |µo − µa|
Example
Bus brake pads are claimed to last on average at least 60,000 miles and the company wants to test this
claim. The bus company considers a “practical” value for purposes of bus safety to be that the pads last
at least 58,000 miles. If the standard deviation is 5,000 and the sample size is 50, find the power of the
test when the mean is really 58,000 miles. (Assume α = .05)
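$\bar{X}_c = \mu_o - z_{\alpha}\frac{\sigma}{\sqrt{n}} = 60000 - (1.645)\frac{5000}{\sqrt{50}} = 58837$
$\text{Power} = P(\bar{X} < 58837 \mid \mu = \mu_a) = P\left(Z < \frac{58837 - \mu_a}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{58837 - 58000}{5000/\sqrt{50}}\right) = P(Z < 1.18) = .8810$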
Input Values
µo = 60,000 miles
µa = 58,000 miles
α = 0.05
n = 50
σ = 5000 miles
Calculated Values
Effect Size = 2000 miles
Critical Value = 58,837 miles
β = 0.1190 or about 12%
Power = 0.8810 or about 88%
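The power calculation for the brake pad example can be sketched in Python (standard library only), using the input values listed above. Because the code does not round Z to 1.18, it reports a power of about 0.8817 rather than the table-based .8810.

```python
from math import sqrt
from statistics import NormalDist

# Bus brake pad example (left-tailed test, Ha: mu < 60,000)
mu_0, mu_a = 60_000, 58_000   # means under Ho and under the chosen alternative
sigma, n, alpha = 5_000, 50, 0.05

std_normal = NormalDist()
se = sigma / sqrt(n)                             # standard error of the sample mean

# Critical value on the miles scale: reject Ho when the sample mean falls below it.
critical_mean = mu_0 - std_normal.inv_cdf(1 - alpha) * se

# Power: probability the sample mean lands in the rejection region when mu = mu_a.
power = std_normal.cdf((critical_mean - mu_a) / se)
beta = 1 - power

print(f"effect size = {abs(mu_0 - mu_a)}, critical value = {critical_mean:.0f}")
print(f"power = {power:.4f}, beta = {beta:.4f}")
```

The two steps mirror the hand calculation: find the critical sample mean under µo, then measure how much of the sampling distribution under µa falls beyond it.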
The procedures outlined for the test of population mean vs. hypothesized value with known population
standard deviation will apply to other models as well. All that really changes is the test statistic.
• Test of population mean vs. hypothesized value, population standard deviation unknown.
• Test of population proportion vs. hypothesized value.
• Test of population standard deviation (or variance) vs. hypothesized value.
The test statistic for the one sample case changes to a Student’s t distribution with degrees of freedom
equal to n-1: $t = \frac{\bar{X} - \mu_o}{s/\sqrt{n}}$
The shape of the t distribution is similar to the Z, except the tails are fatter, so the logic of the decision
rule is the same as the Z test statistic.
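A minimal Python sketch of the one-sample t statistic follows. The function name `one_sample_t` and the measurements are hypothetical, for illustration only; the resulting t would still be compared to a t table value with n - 1 degrees of freedom.

```python
import statistics
from math import sqrt

def one_sample_t(sample, mu_0):
    """t statistic and degrees of freedom for a test of mean vs. hypothesized value,
    population standard deviation unknown."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
    return (x_bar - mu_0) / (s / sqrt(n)), n - 1

# Hypothetical measurements, for illustration only
sample = [9.1, 9.4, 8.8, 9.9, 9.2, 9.0, 9.6, 8.7]
t, df = one_sample_t(sample, mu_0=9.6)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The only change from the Z statistic is that the sample standard deviation s replaces σ, which is why the fatter-tailed t distribution is used.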
Example
1. Design a hypothesis test where the alternative claim would be that the humerus bones were not from
Species A.
Research Hypotheses
Ho: µ = 9.6 (The humerus bones are from Species A)
Ha: µ ≠ 9.6 (The humerus bones are not from Species A)
Test Statistic (Model): t-test of mean vs. hypothesized value, unknown standard deviation
Model Assumptions: we may need to check the data for extreme skewness as the distribution of the
sample mean is assumed to be approximately the Normal Distribution.
2. Determine the power of this test if the bones actually came from Species B (assume a standard
deviation of 0.7).
Information needed for Power Calculation
• µo = 9.6 (Species A)
• µa = 9.1 (Species B)
• Effect Size = |µo − µa| = 0.5
• s = 0.7 (given)
• α = .05
• n = 21 (sample size)
• Two-tailed test
Results using Online Power Calculator 12
• Power = .8755
• β = 1 - Power = .1245
• If the humerus bones are from Species B, the test has an 87.55% chance of correctly
rejecting Ho and a Type II error probability of 12.45%.
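One way to double-check the calculator's power figure is a Monte Carlo simulation. The Python sketch below assumes the bones really come from Species B with mean 9.1 and standard deviation 0.7, draws many samples of n = 21, runs the two-tailed t test on each, and counts how often Ho: µ = 9.6 is rejected. The critical value 2.086 is the two-tailed t table value for α = .05 with 20 degrees of freedom; the seed and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

# Inputs from the humerus example; SIGMA is treated as the true standard deviation.
MU_0, MU_A, SIGMA, N = 9.6, 9.1, 0.7, 21
T_CRIT = 2.086   # two-tailed t table value, alpha = .05, df = N - 1 = 20
REPS = 20_000

def t_statistic(sample, mu_0):
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample standard deviation
    return (mean - mu_0) / (s / sqrt(n))

rng = random.Random(2016)   # fixed seed so the estimate is reproducible
rejections = 0
for _ in range(REPS):
    # One simulated sample of bones from Species B (true mean MU_A).
    sample = [rng.gauss(MU_A, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, MU_0)) > T_CRIT:   # two-tailed rejection region
        rejections += 1

power = rejections / REPS
print(f"estimated power = {power:.3f}, estimated beta = {1 - power:.3f}")
```

The estimate should land close to the calculator's .8755; simulation error with 20,000 repetitions is a few thousandths.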
Conclusion: The evidence supports the claim (p-value<.05) that the humerus bones are not from Species
A. The small sample size limited the power of the test, which prevented us from making a more
definitive conclusion. Recommend testing to see if bones are from Species B or other unknown species.
We are assuming that, since the bones were unearthed in the same location, they came from the same
species.
When our data is categorical and there are only two possible choices (for example, a yes/no question on
a poll), we may want to make a claim about a proportion or a percentage of the population ($p$) being
compared to a particular value ($p_o$). We will then use the sample proportion ($\hat{p}$) to test the claim.
$\hat{p}$ = sample proportion
$p_a$ = population proportion under Ha
Test Statistic: $Z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}}$
Requirement for Normality Assumption: $np(1 - p) > 5$
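The proportion test statistic and the normality rule of thumb above can be sketched in Python as follows. The counts (130 "yes" answers out of 1,000, tested against p_o = 0.10 with a right-tailed Ha) and the function names are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p_0):
    """Z statistic for a test of population proportion vs. hypothesized value p_0."""
    p_hat = successes / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

def normality_ok(n, p):
    """Rule of thumb from the text: n p (1 - p) > 5."""
    return n * p * (1 - p) > 5

# Hypothetical numbers: 130 "yes" answers out of 1,000, tested against p_0 = 0.10
z = one_proportion_z(130, 1000, 0.10)
p_value = 1 - NormalDist().cdf(z)          # right-tailed, Ha: p > 0.10
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, normality OK: {normality_ok(1000, 0.10)}")
```

The standard error uses the hypothesized p_o rather than the sample proportion, because the test statistic is computed assuming Ho is true.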
Example
Information needed for Sample Size Calculation
Results using online Power Calculator and Megastat
Since p-value < α, reject Ho and support Ha. Since the p-value is actually less than 0.01, we would go
further and say that the data supports rejecting Ho for α = .01.
Conclusion: The evidence supports the claim that the new letter is more effective. The 1652 test letters
were selected as a random sample from the charity’s mailing list. All letters were sent during the same time
period. The letters needed to be sent in a specific time period, so we were not able to control for
seasonal or economic factors. We recommend testing both solicitation methods over the entire year to
eliminate seasonal effects and to create a control group.
6.8.3 Test of population standard deviation (or variance) vs. hypothesized value.
We often want to make a claim about the variability, volatility or consistency of a population random
variable. Hypothesized values for the population variance σ² or standard deviation σ are tested with the
Chi-square (χ²) distribution.
Examples of Hypotheses:
• Ho: σ = 10 Ha: σ ≠ 10
• Ho: σ² = 100 Ha: σ² > 100
Test Statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma_o^2}$
$s^2$ = sample variance
$n - 1$ = degrees of freedom
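The Python standard library has no chi-square distribution, so the sketch below computes the chi-square CDF from the power series of the regularized lower incomplete gamma function (a standard identity, valid here for moderate values of the statistic). It is checked against the closed form for 2 degrees of freedom. The design values (n = 40, σo² = 900, Ha: σ² < 900) follow the 8th grade example; the sample standard deviation 21.0 is hypothetical.

```python
from math import exp, lgamma, log

def regularized_lower_gamma(a, x, tol=1e-15, max_terms=10_000):
    """P(a, x) = gamma(a, x) / Gamma(a), computed with its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < total * tol:
            break
    return total * exp(-x + a * log(x) - lgamma(a))

def chi_square_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x)."""
    return regularized_lower_gamma(df / 2, x / 2)

def chi_square_variance_test(s, n, sigma_0):
    """Test statistic (n - 1) s^2 / sigma_0^2 and its degrees of freedom."""
    return (n - 1) * s**2 / sigma_0**2, n - 1

# 8th grade design: Ho: sigma^2 = 900 vs Ha: sigma^2 < 900 with n = 40.
# The sample standard deviation below is hypothetical.
chi2, df = chi_square_variance_test(s=21.0, n=40, sigma_0=30.0)
p_value = chi_square_cdf(chi2, df)          # left-tailed: Ha says "less than"
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For a left-tailed alternative the p-value is the area to the left of the statistic; for Ha: σ² > σo² it would be 1 minus this CDF.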
Example
Design:
Research Hypotheses:
• Ho: σ² = 900
• Ha: σ² < 900
Results:
Decision: Reject Ho
Conclusion:
The evidence supports the claim (p-value<.01) that the standard deviation for 8th grade test scores is less
than 30. The 40 test scores were the results of the recently administered exam to the 8th grade students.
Since the exams were for the current class only, there is no assurance that future classes will achieve
similar results. Further research would be to compare results to other schools that administered the
same exam and to continue to analyze future class exams to see if the claim is holding true.
In this section we consider expanding the concepts from the prior section to design and conduct
hypothesis testing with two samples. Although the logic of hypothesis testing will remain the same, care
must be taken to choose the correct model. We will first consider comparing two population means.
In designing a two population test of means, first determine whether the experiment involves data that
is collected by independent or dependent sampling.
The data is collected by two simple random samples from separate and unrelated populations. This data
will then be used to compare the two population means. This is typical of an experimental or treatment
population versus a control population.
INDEPENDENT SAMPLING
Example
A community college mathematics department wants to know if an experimental algebra course has
higher success rates when compared to a traditional course. The mean grade points for 80 students in
the experimental course (treatment) is compared to the mean grade points for 100 students in the
traditional course (control).
The data consists of a single population and two measurements. A simple random sample is taken from
the population and pairs of measurements are collected. This is also called related sampling or matched
pair design. Dependent sampling actually reduces to a one population model of differences.
DEPENDENT SAMPLING
Example
An instructor of a statistics course wants to know if student scores are different on the second midterm
compared to the first exam. The first and second midterm scores for 35 students are taken and the mean
difference in scores is determined.
We will first consider the case when we want to compare the population means of two populations
using independent sampling.
Suppose we wanted to test the hypothesis $H_o: \mu_1 = \mu_2$. We have point estimators for both $\mu_1$ and $\mu_2$,
namely $\bar{X}_1$ and $\bar{X}_2$, which have approximately Normal distributions under the Central Limit Theorem, but
it would be useful to combine them into a single estimator. Fortunately, it is known that if two independent random
variables have Normal distributions, then so do their sum and difference. Therefore we can restate the
hypothesis as $H_o: \mu_1 - \mu_2 = 0$ and use the difference of sample means $\bar{X}_1 - \bar{X}_2$ as a point estimator for
the difference in population means $\mu_1 - \mu_2$.
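Distribution of $\bar{X}_1 - \bar{X}_2$ under the Central Limit Theorem:
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$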
7.2.2 Comparing two means, independent sampling: Model when population variances known
When the population variances are known, the Hypothesis $H_o: \mu_1 = \mu_2$ can be tested with the Normal
distribution Z test statistic $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$. This model can also be used if both
sample sizes n1 and n2 exceed 30.
Example
Example - Design
Research Hypotheses: Ho: µ1≤µ2 (Homes with pools do not have more mean square footage)
Ha: µ1>µ2 (Homes with pools do have more mean square footage)
Since both sample sizes are over 30, the model will be a Large sample Z test comparing two population
means with independent sampling. This model is appropriate since the sample sizes assure that the
distribution of the difference of sample means is approximately Normal from the Central Limit Theorem. A one-tailed
test is selected since we want to support the claim that homes with pools are larger. The test statistic
will be $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$.
Type I error would be to reject the Null Hypothesis and claim homes with pools are larger, when they are
not larger. It was decided to limit this error by setting the level of significance (α) to 1%.
The decision rule under the critical value method would be to reject the Null Hypothesis when the value
of the test statistic is in the rejection region. In other words, reject Ho when Z > 2.326. The decision
under the p-value method is to reject Ho if the p-value is < α.
Example - Data/Results
Since the test statistic (Z = 4.19) is greater than the critical value (2.326), Ho is rejected. Also, since the p-value
(0.000013) is less than α (0.01), the decision is Reject Ho.
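The critical value and p-value for the pool example, and the two-sample Z statistic itself, can be sketched in Python (standard library only). The function name and the summary statistics passed to it are hypothetical, since the example's raw data are not reproduced here; the computed p-value for Z = 4.19 (about 0.000014) may differ in the last digit from the reported value.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x_bar1, x_bar2, sd1, sd2, n1, n2, diff_0=0.0):
    """Z statistic for Ho: mu1 - mu2 = diff_0 with independent samples.
    sd1, sd2 are known population standard deviations (or sample values when
    both samples exceed 30, as in the large sample model)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((x_bar1 - x_bar2) - diff_0) / se

alpha = 0.01
critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed, about 2.326
p_value = 1 - NormalDist().cdf(4.19)         # Z = 4.19 from the pool example
print(f"critical value = {critical:.3f}, p-value for Z = 4.19: {p_value:.6f}")

# Hypothetical summary statistics, for illustration only
z = two_sample_z(x_bar1=2400, x_bar2=2100, sd1=500, sd2=450, n1=40, n2=45)
print(f"Z = {z:.2f}")
```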
Example - Conclusion
The researcher makes the strong statement that homes with pools have a significantly higher mean
square footage than homes without pools.
In the case when the population standard deviations are unknown, it seems logical to simply replace the
population standard deviations for each population with the sample standard deviations and use a t-
distribution as we did for the one population case. However, this is not so simple when the sample size
for either group is under 30.
We will consider two models. This first model (which we prefer to use since it has higher power)
assumes the population variances are equal and is called the pooled variance t-test. In this model we
combine or “pool” the two sample standard deviations into a single estimate called the pooled standard
deviation, sp. If the Central Limit Theorem applies, we can then substitute sp for s1 and s2 and get a
t-distribution with n1 + n2 - 2 degrees of freedom:
Pooled variance t-test to compare the means for two independent populations
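In the usual textbook form, the pooled variance is $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and the test statistic is $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}}$. The Python sketch below applies it to the compact car MPG summary statistics that appear in the unequal variance calculation later in this section (imports: mean 35.76, s = 3.86, n = 12; domestic: mean 33.59, s = 2.16, n = 15). This pairing reproduces the reported t = 1.85 with 25 degrees of freedom; 1.708 is the t table value quoted in the example.

```python
from math import sqrt

def pooled_t(x_bar1, x_bar2, s1, s2, n1, n2):
    """Pooled variance t statistic for Ho: mu1 = mu2 (equal variances assumed)."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)   # pooled standard deviation
    t = (x_bar1 - x_bar2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df, sp

# Compact car MPG example (population 1 = imported, population 2 = domestic)
t, df, sp = pooled_t(x_bar1=35.76, x_bar2=33.59, s1=3.86, s2=2.16, n1=12, n2=15)
print(f"sp = {sp:.3f}, t = {t:.2f}, df = {df}")
print("Reject Ho" if t > 1.708 else "Fail to Reject Ho")   # 1.708: df = 25, alpha = .05
```

The pooled standard deviation is a weighted average of the two sample variances, with weights given by each sample's degrees of freedom.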
Example
Example - Design
It is best to associate the subscript 2 with the control group; in this case we will let domestic cars be
population 2.
Research Hypotheses: Ho: µ1≤µ2 (Imported compact cars do not have a higher mean MPG)
Ha: µ1>µ2 (Imported compact cars have a higher mean MPG)
We will assume the population variances are equal, $\sigma_1^2 = \sigma_2^2$, so the model will be a Pooled variance
t-test. This model is appropriate if the distribution of the difference of sample means is approximately
Normal from the Central Limit Theorem. A one-tailed test is selected based on Ha.
Type I error would be to reject the Null Hypothesis and claim imports have a higher mean MPG, when
they do not have higher MPG. The test will be run at a level of significance (α) of 5%.
The degrees of freedom for this test are 25, so the decision rule under the critical value method would be
to reject Ho when t > 1.708. The decision under the p-value method is to reject Ho if the p-value is < α.
Example - Data/Results
Since 1.85 > 1.708, the decision would be to Reject Ho. Also the p-value is calculated to be .0381 which
again shows that the result is significant at the 5% level.
Example - Conclusion
Imported compact cars have a significantly higher mean MPG rating when compared to domestic cars.
In the prior example, we assumed the population variances were equal. However, when looking at the
box plot of the data or the sample standard deviations, it appears that the import cars have more
variability in MPG than domestic cars, which would violate the assumption of equal variances required for
the Pooled Variance t-test.
Fortunately, there is an alternative model that has been developed for when population variances are
unequal, called the Behrens-Fisher model 14, or the unequal variances t-test.
Unequal variance t-test to compare the means for two independent populations
The degrees of freedom will be less than or equal to $n_1 + n_2 - 2$, so this test will usually have less power
than the pooled variance t-test.
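The unequal variance t statistic and its degrees of freedom (the Welch-Satterthwaite approximation) can be sketched in Python using the same compact car summary statistics as the example that follows. The fractional degrees of freedom (about 16.4) are rounded down to 16 for the t table, which gives the slightly larger critical value.

```python
from math import sqrt

def unequal_variance_t(x_bar1, x_bar2, s1, s2, n1, n2):
    """Unequal variance t statistic with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (x_bar1 - x_bar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Compact car MPG example (population 1 = imported, population 2 = domestic)
t, df = unequal_variance_t(x_bar1=35.76, x_bar2=33.59, s1=3.86, s2=2.16, n1=12, n2=15)
print(f"t = {t:.2f}, df = {df:.2f} (round down to {int(df)} for the t table)")
```

Compared with the pooled version, each sample keeps its own variance in the standard error, and the degrees of freedom shrink to reflect the extra uncertainty.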
Example
We will repeat the prior example to see if we can support the claim that imported compact cars have
higher mean MPG when compared to domestic compact cars. This time we will assume that the
population variances are not equal.
Example - Design
Again we will let domestic cars be population 2.
Research Hypotheses: Ho: µ1≤µ2 (Imported compact cars do not have a higher mean MPG)
Ha: µ1>µ2 (Imported compact cars have a higher mean MPG)
We will assume the population variances are unequal, $\sigma_1^2 \neq \sigma_2^2$, so the model will be an unequal
variance t-test. This model is appropriate if the distribution of the differences of sample means is
approximately Normal from the Central Limit Theorem. A one-tailed test is selected based on Ha.
Type I error would be to reject the Null Hypothesis and claim imports have a higher mean MPG, when
they do not have higher MPG. The test will be run at a level of significance (α) of 5%.
The degrees of freedom for this test are 16 (see calculation below), so the decision rule under the critical
value method would be to reject Ho when t > 1.746. The decision under the p-value method is to reject
Ho if the p-value is < α.
Example - Data/Results
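$df = \frac{\left(\frac{2.16^2}{15} + \frac{3.86^2}{12}\right)^2}{\frac{\left(2.16^2/15\right)^2}{15-1} + \frac{\left(3.86^2/12\right)^2}{12-1}} = 16$
$t = \frac{(35.76 - 33.59) - 0}{\sqrt{\frac{2.16^2}{15} + \frac{3.86^2}{12}}} = 1.74$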
Since 1.74 < 1.746, the decision would be Fail to Reject Ho. Also the p-value is calculated to be .0504,
which again shows that the result is (barely) not significant at the 5% level.
Example - Conclusion
Insufficient evidence to claim imported compact cars have a significantly higher mean MPG rating when
compared to domestic cars.
You can see the lower power of this test when compared to the pooled variance t-test example where
Ho was rejected. We always prefer to run the test with higher power when appropriate.
The independent models shown above compared samples that were not related. However, it is often
advantageous to have related samples that are paired up: two measurements from a single
population. The model we will consider here is called the matched pairs t-test, also known as the paired
difference t-test. The advantage of this design is that we can eliminate variability due to other factors
not being studied, increasing the power of the design.
In this model we take the difference of each pair and create a new population of differences, so in effect
the hypothesis test is a one population test of mean that we already covered in the prior section.
Matched pairs t-test to compare the means for two dependent populations
• Dependent Sampling
• $X_d = X_1 - X_2$
• $\bar{X}_d = \bar{X}_1 - \bar{X}_2$ approximately Normal
Test statistic: $t = \frac{\bar{X}_d - \mu_d}{s_d/\sqrt{n}}$, with $df = n - 1$
Example
Notice in this example that cities are the single population being sampled and two measurements (Hertz
and Avis) are being taken from each city. Using the matched pair design, we can eliminate the variability
due to cities being differently priced (Honolulu is cheap because you can’t drive very far on Oahu!)
Example - Design
Research Hypotheses: Ho: µ1=µ2 (Hertz and Avis have the same mean price for compact cars.)
Ha: µ1≠µ2 (Hertz and Avis do not have the same mean price for compact cars.)
Model will be matched pair t-test and these hypotheses can be restated as: Ho: µd=0 Ha: µd≠0
Model is two tailed matched pairs t-test with 14 degrees of freedom. Reject Ho if t < -2.145 or t >2.145.
Example - Data/Results
We take the difference for each pair and find the sample mean and
standard deviation.
$\bar{X}_d = 1.80$
$s_d = 2.513$
$n = 15$
$t = \frac{1.80 - 0}{2.513/\sqrt{15}} = 2.77$
Since 2.77 > 2.145, the decision is Reject Ho.
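The matched pairs procedure (take the difference of each pair, then run a one-sample t test on the differences) can be sketched in Python. The function `matched_pairs_t` is a hypothetical helper for raw paired data; the last lines reuse the Hertz and Avis summary values from the example, with 2.145 being the two-tailed t table value quoted above.

```python
import statistics
from math import sqrt

def matched_pairs_t(first, second, mu_d=0.0):
    """Paired difference t statistic: a one-sample t test on X_d = X_1 - X_2."""
    if len(first) != len(second):
        raise ValueError("each observation needs a partner")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)
    return (d_bar - mu_d) / (s_d / sqrt(n)), n - 1

# Summary values from the Hertz - Avis example
d_bar, s_d, n = 1.80, 2.513, 15
t = (d_bar - 0) / (s_d / sqrt(n))       # Ho: mu_d = 0
t_crit = 2.145                          # two-tailed, df = 14, alpha = .05
print(f"t = {t:.2f} with df = {n - 1}")
print("Reject Ho" if abs(t) > t_crit else "Fail to Reject Ho")
```

Working with the differences is what removes the city-to-city variability: only the spread of the differences, s_d, enters the standard error.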
Example – Conclusion
There is a difference in mean price for compact cars between Hertz and Avis. Avis has lower mean
prices.
The advantage of the matched pair design is clear in this example. The sample standard deviation for the
Hertz prices is $5.23 and for Avis it is $5.62. Much of this variability is due to the cities, and the matched
pairs design dramatically reduces the standard deviation to $2.51, meaning the matched pairs t-test has
significantly more power in this example.
Sometimes we want to test if two populations have the same spread or variation, as measured by
variance or standard deviation. This may be a test on its own or a way of checking assumptions when
deciding between two different models (e.g.: pooled variance t-test vs. unequal variance t-test). We will
now explore testing for a difference in variance between two independent samples.
7.4.1 F distribution
The F distribution is a family of distributions related to the Normal Distribution. There are two different
degrees of freedom, usually represented as numerator (dfnum) and denominator (dfden). Also, since the F
represents squared data, the inference will be about the variance rather than the standard deviation.
Characteristics of F Distribution
• It is positively skewed
• It is non-negative
• There are 2 different degrees of freedom (dfnum, dfden)
• When the degrees of freedom change, a new distribution is created
• The expected value is close to 1 (exactly dfden/(dfden − 2) when dfden > 2).
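A quick simulation illustrates the characteristics listed above. This Python sketch builds F values from their definition as a ratio of chi-square variables, each divided by its degrees of freedom (a chi-square variable is simulated as a sum of squared standard Normal draws), and compares the average to dfden/(dfden − 2). The degrees of freedom are those used in this section's examples; the seed and repetition count are arbitrary.

```python
import random

def simulate_f_mean(df_num, df_den, reps=20_000, seed=1):
    """Average of simulated F values, F = (chi2_num / df_num) / (chi2_den / df_den)."""
    rng = random.Random(seed)

    def chi_square(df):
        # Sum of df squared standard Normal draws has a chi-square(df) distribution.
        return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

    total = 0.0
    for _ in range(reps):
        total += (chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)
    return total / reps

results = {}
for df_num, df_den in [(9, 7), (11, 14)]:
    results[(df_num, df_den)] = simulate_f_mean(df_num, df_den)
    theory = df_den / (df_den - 2)
    print(f"df = ({df_num}, {df_den}): simulated mean {results[(df_num, df_den)]:.3f}, "
          f"theory {theory:.3f}")
```

Because every simulated value is a ratio of sums of squares, it can never be negative, and the long right tail pulls the mean slightly above 1.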
Example - Design
Research Hypotheses: Ho: σ1≤σ2 (Software stocks do not have more variation)
Ha: σ1>σ2 (Software stocks do have more variation)
Model will be F test for variances and the test statistic from the table will be $F = s_1^2/s_2^2$. The degrees of freedom for numerator will be n1 − 1 = 9 and the degrees of freedom for denominator will be n2 − 1 = 7.
Critical Value for F with dfnum=9 and dfden=7 is 3.68. Reject Ho if F >3.68.
Example - Data/Results
$F = 4.9^2/3.5^2 = 1.96$, which is less than the critical value of 3.68, so Fail to Reject Ho.
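The calculation above in Python, as a minimal sketch: 4.9 and 3.5 are the two sample standard deviations from the example, and the critical value 3.68 is copied from the design step (the standard library cannot look up F table values). The helper name `variance_ratio` is our own.

```python
def variance_ratio(s1, s2):
    """F statistic for comparing two variances: s1^2 / s2^2."""
    return s1**2 / s2**2

f_stat = variance_ratio(4.9, 3.5)
critical_value = 3.68   # F table, dfnum = 9, dfden = 7 (from the text)

print(f"F = {f_stat:.2f}")
print("Reject Ho" if f_stat > critical_value else "Fail to reject Ho")
```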
Example – Conclusion
There is not enough evidence to conclude that software stocks have more variation.
When comparing two means from independent samples, you have a choice between the more powerful pooled variance t-test (assumption is σ1² = σ2²) or the less powerful unequal variance t-test (which does not require σ1² = σ2²). We can now design a hypothesis test to help us choose the appropriate model. Let us revisit the example of comparing the mpg for import and domestic compact cars. Consider this example a "test before the main test" to help choose the correct model for comparing means.
Example - Design
Research Hypotheses: Ho: σ1=σ2 (choose the pooled variance t-test to compare means)
Ha: σ1≠σ2 (choose the unequal variance t-test to compare means)
Model will be F test for variances and the test statistic from the table will be $F = s_1^2/s_2^2$ (s1 is larger). The degrees of freedom for numerator will be n1 − 1 = 11 and the degrees of freedom for denominator will be n2 − 1 = 14.
The test will be run at a level of significance (α) of 10%. Because the larger sample variance is always placed in the numerator, only the upper tail needs to be checked, so use the α = .05 table (half of 10%) for this two-tailed test.
Critical Value for F with dfnum=11 and dfden=14 is 2.57. Reject Ho if F >2.57.
Example - Data/Results
The p-value = 0.0438 is less than 0.10, which also makes the result significant: Reject Ho.
Example – Conclusion
Do not assume equal variances; run the unequal variance t-test to compare population means.
In Summary
Often we want to test claims about the characteristics of qualitative or categorical (non-numeric) data. In Section 6, we covered a test of one population proportion. In reality, this was a test of a categorical variable with 2 choices (success, failure). In this section, we will expand our study of hypothesis tests involving categorical data to include categorical random variables with more than two choices using a goodness-of-fit test. In addition, we will test two categorical variables for independence. Both of these models use a Chi-square test statistic, which measures the deviations between the observed values and expected values of the data.
A financial services company had anecdotal evidence that people were calling in sick on Monday and
Friday more frequently than Tuesday, Wednesday or Thursday. The speculation was that some
employees were using sick days to extend their weekends. A researcher for the company was asked to
determine if the data supported a significant difference in absenteeism due to the day of the week.
The categorical variable of interest here is “Day of Week” on which an employee called in sick (Monday through Friday). This is an example of a multinomial random variable, where we observe a fixed number of trials (the total number of sick days sampled) and at least 2 possible outcomes. (A binomial random variable is a special case of the multinomial random variable where there are exactly 2 possible outcomes; it was studied in Section 6 as a Z Test of Proportion.)
The Chi-square goodness-of-fit test is used to test if observed data from a categorical variable is
consistent with an expected assumption about the distribution of that variable.
• Oi = Observed count in category i
• pi = Expected proportion in category i
• Ei = n·pi = Expected count in category i
• Ei ≥ 5 for each i
• k = number of categories, n = sample size
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, df = k − 1
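The definitions above translate directly into a small function. This is a sketch under our own naming (`chi_square_gof`); it computes the expected counts Ei = n·pi, checks the Ei ≥ 5 assumption, and returns the statistic with df = k − 1. The observed sick-day counts are not reproduced in this text, so the usage line uses made-up counts purely for illustration.

```python
def chi_square_gof(observed, expected_proportions):
    """Chi-square goodness-of-fit statistic, degrees of freedom and expected counts.

    observed: observed counts O_i
    expected_proportions: proportions p_i under Ho (should sum to 1)
    """
    if len(observed) != len(expected_proportions):
        raise ValueError("need one expected proportion per category")
    n = sum(observed)
    expected = [n * p for p in expected_proportions]      # E_i = n * p_i
    if min(expected) < 5:
        raise ValueError("model assumption E_i >= 5 is not met")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                 # df = k - 1
    return statistic, df, expected

# Hypothetical counts for 4 equally likely categories (not data from the text)
statistic, df, expected = chi_square_gof([30, 20, 25, 25], [0.25] * 4)
print(statistic, df, expected)
```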
A researcher for the financial services company collected 400 records of what day of the week employees called in sick to work. Can the researcher conclude that the proportion of employees who call in sick is not the same for each day of the week? Design and conduct a hypothesis test at the 1% significance level.
Research Hypotheses: Ho: There is no difference in the proportion of employees who call in sick
due to the day of the week.
Ha: There is a difference in the proportion of employees who call in sick
due to the day of the week.
We can also state the hypotheses in terms of population parameters, pi for each category. Under the null hypothesis, we would expect 20% of sick days to occur on each weekday (Ho: p1 = p2 = p3 = p4 = p5 = 0.20).
Important Assumption: The Expected Value of Each Category needs to be greater than or equal to 5. In this example, $E_i = n p_i = (400)(0.20) = 80 \ge 5$ for each category, so the model is appropriate.
Test Statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, df = 5 − 1 = 4
Results:
The test statistic is χ² = 15.625. Since it exceeds the critical value of 13.277 (α = .01, df = 4), the Test Statistic is in the Rejection Region and the decision is to Reject Ho. Under the p-value method, Ho is also rejected since the p-value = P(χ² > 15.625) = 0.004, which is less than the Significance Level α of 1%.
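The p-value quoted above can be checked without statistical software when the degrees of freedom are even. For df = 2m, the right-tail area of a chi-square distribution has the closed form $P(\chi^2 > x) = e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j/j!$. The sketch below uses that identity (the function name is our own); odd df, such as df = 5 in the commuting example, would need a general routine such as the one in Minitab.

```python
import math

def chi_square_p_value_even_df(statistic, df):
    """Right-tail probability P(chi2 > statistic) for a positive EVEN number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for positive even df")
    half = statistic / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

p_value = chi_square_p_value_even_df(15.625, 4)
alpha = 0.01
print(f"p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```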
In the prior example, the Null Hypothesis was that all categories had the same proportion; in other words, there was no difference in counts due to the choices of a categorical variable. The same Chi-square goodness-of-fit test can also be used to compare the results of a current experiment to prior results. In these tests, the prior proportions are often not the same across categories.
Research Hypotheses: Ho: Workers in Santa Clara county choose methods of commuting that match
the United States averages.
Ha: Workers in Santa Clara county choose methods of commuting that do not
match the United States averages.
We can also state the hypotheses in terms of population parameters, pi for each category. Under the null hypothesis, each pi equals the corresponding United States proportion for that method of commuting.
Important Assumption: The Expected Value of Each Category needs to be greater than or equal to 5. In this example, check the category with the lowest pi: $E_5 = n p_5 = (1000)(0.018) = 18 \ge 5$, so the model is appropriate.
Test Statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, df = 6 − 1 = 5
After designing the experiment, we sampled workers in Santa Clara County; the results are shown in the Observed Frequency Column of the table below. The Expected Proportion and Expected Frequency Columns are calculated using the U.S. 2010 Census.
Results:
In 2014, Colorado became the first state to allow retail sales of marijuana for recreational use. Other states have joined Colorado, while some have decriminalized or authorized the medical use of marijuana. The question is whether marijuana should be legalized in all states. Suppose we took a poll of 1000 American adults, asked "Should marijuana be legal or not legal for recreational use?" and got the following results:
Marijuana should be   Count   Percent
Legal                  500      50%
Not Legal              450      45%
Don't know              50       5%
Total                 1000     100%
The interpretation of this poll is that 50% of adults polled favored the legalization of marijuana for
recreational use, while 45% opposed it. The remaining 5% were undecided.
At this time, you might have questions and want to explore this poll in more depth. For example, are younger people more likely to support legalization of marijuana? Do other demographic characteristics, such as gender, ethnicity, sexual orientation, or religion, affect people's opinions about legalization?
Let us explore the possibility of difference of opinion due to gender. Are men more likely (or less likely)
to oppose legalization of marijuana compared to women?
Two-way or contingency tables are used to summarize two categorical variables, also known as bivariate categorical data. In order to create a two-way table, the researcher must cross-tabulate each subject's responses to the two categorical questions.
In the example above, the two categorical variables are gender and opinion on marijuana legalization.
Gender has two choices (male or female) while opinion on marijuana legalization has three choices
(legal, not legal and unsure).
In the example above, suppose we have exactly 500 men and 500 women in the survey. What would we
expect to see in the data if there was no difference in opinion between men and women? We could then
simply apply the total percentages to each group.
To create a hypothetical two-way table if there was no difference in opinion between men and women, apply the total percentages for each choice of Opinion to the total number for each choice of Gender. For example, Men/Legal would be 50% of 500, or 250 people.

Marijuana should be   Men    Women   Total
Legal                 50%     50%     50%
Not Legal             45%     45%     45%
Unsure                 5%      5%      5%
Total                100%    100%    100%

Marijuana should be   Men    Women   Total
Legal                 250     250     500
Not Legal             225     225     450
Unsure                 25      25      50
Total                 500     500    1000
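The hypothetical table of counts can be generated from the margins alone: each cell is (Row Total)(Column Total)/(Grand Total), the same formula used for the expected counts Eij in the Chi-square Test for Independence. A minimal sketch, with our own function name:

```python
def expected_counts(row_totals, column_totals):
    """Expected cell counts if the two variables are independent."""
    grand_total = sum(row_totals)
    if grand_total != sum(column_totals):
        raise ValueError("row and column totals must have the same sum")
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

rows = [500, 450, 50]    # Legal, Not Legal, Unsure
columns = [500, 500]     # Men, Women
for label, row in zip(["Legal", "Not Legal", "Unsure"], expected_counts(rows, columns)):
    print(label, row)
```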
Let’s review from probability what independence means. If two events A and B are independent, then the following statements are true:
P(A given B) = P(A)
P(B given A) = P(B)
P(A and B) = P(A)P(B)
You can pick any two events in the table above to verify that Gender and Opinion of Legalization of
Marijuana are independent events. For example, compare the events Not Legal and Men.
P(Not Legal given Men) = 225/500 = 45% same as P(Not Legal) = 45%
P(Men given Not Legal) = 225/450 = 50% same as P(Men) = 50%
P(Not Legal and Men) = 225/1000 = 22.5% same as P(Not Legal)P(Men) = (45%)(50%) = 22.5%
Based on these probability rules, we can calculate the expected count for any pair of independent events (any cell of the table) by using the following formula:
Expected = (Row Total)(Column Total)/(Grand Total)
What if the events are not independent? Let's review the same survey. What would we expect to see in the data if there was a difference in opinion between men and women? Let's say women were more likely to support legalization. In that case, we would expect the 500 people who supported legalization of marijuana to include a higher number of women (and a smaller number of men) compared to the first table. Note that we only change the six interior cells of the table; the totals must remain the same.
This is an example of a hypothetical two-way table where women were more likely to support legalization.

Marijuana should be   Men    Women   Total
Legal                 40%     60%     50%
Not Legal             55%     35%     45%
Unsure                 5%      5%      5%
Total                100%    100%    100%
Now let's see the actual results of this survey and see what is happening:
Actual poll of 500 adult men and 500 adult women: Should marijuana be legal for recreational use?

Marijuana should be   Men    Women   Total
Legal                 54%     46%     50%
Not Legal             41%     49%     45%
Unsure                 5%      5%      5%
Total                100%    100%    100%

Marijuana should be   Men    Women   Total
Legal                 270     230     500
Not Legal             205     245     450
Unsure                 25      25      50
Total                 500     500    1000
In this poll, a higher percentage of men support legalization of marijuana for recreational use compared to women. Question: Is this evidence strong enough to support the claim that gender and opinion about marijuana legalization are not independent events? This question can be addressed by conducting a hypothesis test using the Chi-square Test for Independence model.
Are Gender and Opinion about legalization of marijuana for recreational use independent events? Conduct a hypothesis test with a significance level of 5%.
• Oij = Observed count in category ij
• Eij = n·pij = (Row Total)(Column Total)/(Grand Total) = Expected count in category ij
• Eij ≥ 5 for each ij
• r = number of row categories, c = number of column categories, n = sample size
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (r − 1)(c − 1)
Research Hypotheses: Ho: Gender and Opinion about legalization of marijuana for recreational use are
independent events.
Ha: Gender and Opinion about legalization of marijuana for recreational use are
dependent events.
Results
Unsure: Observed 25 (Men), 25 (Women), Total 50; Expected 25, 25; Contribution to χ² 0.000, 0.000
Important Assumption: The Expected Value of Each Category needs to be greater than or equal to 5. In this example, the lowest expected value is 25 (Men, Unsure and Women, Unsure), so the assumption is met.
Test Statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (3 − 1)(2 − 1) = 2
The test statistic is χ² = 6.756. Since the Test Statistic exceeds the critical value of 5.991 (α = .05, df = 2), the decision is to Reject Ho. Under the p-value method, Ho is also rejected since the p-value = P(χ² > 6.756) = 0.034, which is less than the Significance Level α of 5%.
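The whole test can be reproduced from the observed table of counts. This sketch (function name our own) computes the expected counts from the margins, the Chi-square statistic and df = (r − 1)(c − 1). For df = 2 the right-tail area of the chi-square distribution is exactly $e^{-x/2}$, so the p-value needs no table; that shortcut does not hold for other df.

```python
import math

def chi_square_independence(table):
    """Chi-square test for independence on an r x c table of observed counts.

    Returns (statistic, df, expected table).
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

observed = [
    [270, 230],   # Legal:     Men, Women
    [205, 245],   # Not Legal: Men, Women
    [25, 25],     # Unsure:    Men, Women
]
statistic, df, expected = chi_square_independence(observed)
p_value = math.exp(-statistic / 2)   # right tail; valid only because df = 2
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```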
Conclusion: Gender and Opinion about legalization of marijuana for recreational use are dependent
events. Men are more likely to support legalization of marijuana for recreational use.
In Section 7, we used statistical inference to compare two population means under a variety of models. These models can be expanded to compare more than two populations using a technique called Analysis of Variance, or ANOVA for short. There are many ANOVA models, but we limit our study to one of them, the One Factor ANOVA model, also known as One Way ANOVA.
Suppose we want to compare the means of more than two (k) independent populations and test the null hypothesis Ho: µ1 = µ2 = ⋯ = µk. If we can assume all population variances are equal, we can expand the pooled variance t-test for two populations to one factor ANOVA for k populations.
8.3.2 The logic of ANOVA - How comparing variances tests for a difference in means.
When running Analysis of Variance, the data is usually organized into a special ANOVA table, especially
when using computer software.
Sum of Squares: The total variability of the numeric data being compared is broken into the variability between groups (SSFactor) and the variability within groups (SSError). These formulas are the most tedious part of the calculation. Tc represents the sum of the data in the sample from population c, nc represents the size of that sample, and n is the total sample size. These formulas represent the numerator of the variance formula.
$SS_{Total} = \sum X^2 - \frac{(\sum X)^2}{n}$
$SS_{Factor} = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n}$
$SS_{Error} = SS_{Total} - SS_{Factor}$
Degrees of freedom: The total degrees of freedom (n − 1) is also partitioned into the Factor (k − 1) and Error (n − k) components.
Mean Square: This is a variance estimate, calculated by dividing each Sum of Squares by its degrees of freedom: MSFactor = SSFactor/(k − 1) and MSError = SSError/(n − k).
F: This is the test statistic for ANOVA, F = MSFactor/MSError. When the Null Hypothesis is true, both mean squares estimate the same population variance, and the ratio of two such sample variances has an F distribution. Computer software will then calculate the p-value to be used in testing the Null Hypothesis that all populations have the same mean.
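The pieces of the ANOVA table can be sketched in a few lines of Python using the computational formulas for the sums of squares. The function name and the three small groups in the usage line are hypothetical (the tofu pizza data are not reproduced in this text); software such as Minitab would also report the p-value, which this sketch does not compute.

```python
def one_way_anova(groups):
    """One factor ANOVA table entries from raw data (one list per population)."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    k = len(groups)
    correction = sum(all_values) ** 2 / n                           # (ΣX)^2 / n
    ss_total = sum(x * x for x in all_values) - correction
    ss_factor = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # Σ Tc^2/nc
    ss_error = ss_total - ss_factor
    df_factor, df_error = k - 1, n - k
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "SS": (ss_factor, ss_error, ss_total),
        "df": (df_factor, df_error, n - 1),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }

result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # hypothetical data
print(result)
```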
Example
At the .05 significance level can Hsieh Li conclude that there is a difference in the mean number of tofu
pizzas sold per day at the three pizzerias?
Example - Design
Research Hypotheses: Ho: µ1=µ2 =µ3 (Mean sales same at all restaurants)
Ha: At least one µi is different (Mean sales are not the same at all restaurants)
We will assume the population variances are equal, σ1² = σ2² = σ3², so the model will be One Factor ANOVA. This model is appropriate if the distribution of the sample means is approximately Normal from the Central Limit Theorem.
Type I error would be to reject the Null Hypothesis and claim mean sales are different, when they
actually are the same. The test will be run at a level of significance (α) of 5%.
The test statistic from the table will be $F = MS_{Factor}/MS_{Error}$. The degrees of freedom for numerator will be k − 1 = 3 − 1 = 2 and the degrees of freedom for denominator will be n − k = 13 − 3 = 10. (The total sample size turned out to be only 13, not 15 as planned.)
Critical Value for F at α of 5% with dfnum=2 and dfden=10 is 4.10. Reject Ho if F > 4.10. We will also run this test using the p-value method with statistical software, such as Minitab.
Example - Data/Results
$F = 38.125/0.975 = 39.10$, which is more than the critical value of 4.10, so Reject Ho.
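The degrees of freedom and the F ratio for this example, as a quick check. The error degrees of freedom are n − k = 13 − 3 = 10, which is the denominator df for which 4.10 is the α = .05 table value; the mean squares are the ones reported in the example, and the critical value is copied rather than computed.

```python
k = 3            # number of pizzerias (factor levels)
n = 13           # total sample size actually collected
df_factor = k - 1
df_error = n - k
ms_factor, ms_error = 38.125, 0.975   # mean squares from the example
f_stat = ms_factor / ms_error
critical_value = 4.10                 # F table, alpha = .05, dfnum = 2, dfden = 10

print(df_factor, df_error, round(f_stat, 2))
print("Reject Ho" if f_stat > critical_value else "Fail to reject Ho")
```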
Example – Conclusion
There is a difference in the mean number of tofu pizzas sold per day at the three pizzerias.
8.6 Post-hoc Analysis – Tukey’s Honestly Significant Difference (HSD) Test 15.
When the Null Hypothesis is rejected in one factor ANOVA, the conclusion is that not all means are the
same. This however leads to an obvious question: Which particular means are different? Seeking further
information after the results of a test is called post-hoc analysis.
Tukey's HSD test controls the overall significance level at α. This means that all pairwise tests can be run at the same time with an overall significance level of α.
Test Statistic: $HSD = q\sqrt{\frac{MSE}{n_c}}$
q = value from studentized range table
Computer software, such as MegaStat, will calculate the critical values and test statistics for this series of tests.
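A minimal sketch of the HSD formula and the usual decision rule (two means differ if their absolute difference exceeds the HSD). The q value must still come from a studentized range table for the given number of groups, error df and α; the numbers in the test are arbitrary. For unbalanced designs such as the tofu example (n = 4, 4, 5), software commonly uses the Tukey-Kramer form $q\sqrt{(MSE/2)(1/n_i + 1/n_j)}$, included here as a second function; both function names are our own.

```python
import math

def tukey_hsd(q, mse, n_c):
    """HSD = q * sqrt(MSE / n_c) for a balanced design with n_c per group."""
    return q * math.sqrt(mse / n_c)

def tukey_kramer_hsd(q, mse, n_i, n_j):
    """Tukey-Kramer version for a pair of groups with unequal sizes."""
    return q * math.sqrt(mse / 2 * (1 / n_i + 1 / n_j))

def significantly_different(mean_a, mean_b, hsd):
    """Two means differ if their absolute difference exceeds the HSD."""
    return abs(mean_a - mean_b) > hsd
```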
Example
Example - Design
Ho: µ1 = µ2   Ha: µ1 ≠ µ2
Ho: µ1 = µ3   Ha: µ1 ≠ µ3
Ho: µ2 = µ3   Ha: µ2 ≠ µ3
These three tests will be conducted with an overall significance level of α = 5%.
The model will be the Tukey HSD test.
The Minitab approach for the decision rule will be to reject Ho for each pair that does not share a
common group.
Example - Data/Results/Conclusion
Santa Clara has a significantly higher mean number of tofu pizzas sold compared to both San Jose and
Cupertino. There is no significant difference in mean sales between San Jose and Cupertino.
A different way of looking at this model is considering a single population with one numeric and one
categorical variable being sampled. The numeric variable is called the response (tofu pizzas sold) and the
categorical variable is the factor (location of restaurant). The possible responses to the factor are called
the levels (Cupertino, San Jose and Sunnyvale). The number of observations per level is called the replicates (n1=4, n2=4, n3=5 in our example). If the replicates are equal, the design is balanced. (Our example is not balanced.)
By thinking of the model in this way, it is easy to extend the concept to the multi-factor ANOVA models that are prevalent in the research you will encounter in future studies.
Beta (β)
The probability, set by design, of failing to reject the Null Hypothesis when it is actually false. Beta is
calculated for specific possible values of the Alternative Hypothesis.
Confidence Interval
An interval estimate that estimates a population parameter from a random sample using a
predetermined probability called the level of confidence.
Critical value(s)
The dividing point(s) between the region where the Null Hypothesis is rejected and the region where it is
not rejected. The critical value determines the decision rule.
Decision Rule
The procedure that determines what values of the result of an experiment will cause the Null Hypothesis
to be rejected. There are two methods that are equivalent decision rules:
1. If the test statistic lies in the Rejection Region, Reject Ho. (Critical Value method)
2. If the p-value < α, Reject Ho. (p-value method)
Dependent Sampling
A method of sampling where 2 or more variables are related to each other (paired or matched).
Examples would be the “Before and After” type models using the Matched Pairs t-test.
Effect Size
The “practical difference” between a population parameter under the Null Hypothesis and a selected value of the population parameter under the Alternative Hypothesis.
Estimation
An inference process that attempts to predict the values of population parameters based on sample
statistics.
F Distribution
A family of continuous random variables (based on 2 different degrees of freedom for numerator and denominator) whose probability density function is derived from the Normal Distribution: an F variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom. The F distribution is non-negative and skewed to the right and has many uses in statistical inference, such as comparing population variances, ANOVA, and regression.
Factor
In ANOVA, the categorical variable(s) that break the numeric response variable into multiple populations
or treatments.
Hypothesis
A statement about the value of a population parameter developed for the purpose of testing.
Hypothesis Testing
A procedure, based on sample evidence and probability theory, used to determine whether the
hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be
rejected.
Independent Sampling
A method of sampling where 2 or more variables are not related to each other. Examples would be the
“Treatment and Control” type models using the independent samples t-test.
Interval Estimate
A range of values based on sample data that is used to estimate a population parameter.
Level
In ANOVA, a possible value that a categorical variable factor could be. For example, if the factor was
shirt color, levels would be blue, red, yellow, etc.
Level of Confidence
The probability, usually expressed as a percentage, that a Confidence Interval will contain the true
population parameter that is being estimated.
Margin of Error
The distance in a symmetric Confidence Interval between the Point Estimator and an endpoint of the
interval. For example, a confidence interval for µ may be expressed as X̄ ± Margin of Error.
Model Assumptions
Criteria which must be satisfied to appropriately use a chosen statistical model. For example, a Student’s
t statistic used for testing a population mean vs. a hypothesized value requires random sampling and
that the sample mean has an approximately Normal Distribution.
Normal Distribution
Often called the “bell-shaped” curve, the Normal Distribution is a continuous probability distribution with Probability Density Function $f(x) = \exp[-(x - \mu)^2/(2\sigma^2)]/(\sigma\sqrt{2\pi})$. The special case where µ = 0 and σ = 1 is called the Standard Normal Distribution and designated by Z.
Outlier
A data point that is far removed from the other entries in the data set.
p-value
The probability, assuming that the Null Hypothesis is true, of getting a value of the test statistic at least
as extreme as the computed value for the test.
Parameter
A fixed numerical value that describes a characteristic of a population.
Point Estimate
A single sample statistic that is used to estimate a population parameter. For example, X̄ is a point estimator for µ.
Population
The set of all possible members, objects or measurements of the phenomena being studied.
Probability Distribution Function (PDF)
A function that assigns a probability to all possible values of a random variable. In the case of a
continuous random variable (like the Normal Distribution), the PDF refers to the area to the left of a
designated value under a Probability Density Function.
Random Sample
A sample where the values are equally likely to be selected and mutually independent of each other.
Random Variable
A numerical value that is determined by an experiment with a probability distribution function.
Replicate
In ANOVA, the sample size for a specific level of factor. If the replicates are the same for each level, the
design is balanced.
Rejection Region
Region(s) of the Statistical Model which contain the values of the Test Statistic where the Null
Hypothesis will be rejected. The area of the Rejection Region = α.
Response
In ANOVA, the numeric variable that is being tested under different treatments or populations.
Sample
A subset of the population.
Sample Mean
a) The arithmetic average of a data set.
b) A random variable that has an approximately Normal Distribution if the sample size is sufficiently
large.
Standard Deviation
The square root of the variance; it measures the spread of data (distance from the mean). The units of the standard deviation are the same units as the data.
Statistic
A value that is calculated from sample data only that is used to describe the data. Examples of statistics
are the sample mean, sample standard deviation, range, sample median and the interquartile range.
Since statistics depend on the sample, they are also random variables.
Statistical Inference
The process of estimating or testing hypotheses of population parameters using statistics from a random
sample.
Statistical Model
A mathematical model that describes the behavior of the data being tested.
Test Statistic
A value, determined from sample information, used to determine whether or not to reject the Null
Hypothesis.
Type I Error
Rejecting the Null Hypothesis when it is actually true.
Type II Error
Failing to reject the Null Hypothesis when it is actually false.
Variance
A measure of the mean squared deviation of the data from the mean. The units of the variance are the
square of the units of the data.
Z-score
A measure of relative standing that shows the distance in standard deviations a particular data point is
above or below the mean.
I have designed four interactive Flash animations that will provide the student with deeper insight into the
major concepts of inference and hypothesis testing. These animations are on my website
http://nebula2.deanza.edu/~mo/ .
Section 1:
Descriptive Statistics
Section 2:
Probability
Section 3:
Discrete Random Variables
Section 4:
Continuous Random Variables and the Central Limit Theorem (Partially covered in this text)
Section 5:
Point Estimation and Confidence Intervals (Covered in this text)
Section 6:
One Population Hypothesis Testing (Covered in this text)
Section 7:
Two Population Inference (Covered in this text)
Section 8:
Chi-square and ANOVA Tests (Partially covered in this text)
Section 9:
Correlation and Regression
5. The Poems of John Godfrey Saxe (Highgate Edition), Boston: Houghton, Mifflin and Company, 1881.
6. Donna Young, American Society of Health System Pharmacists, April 6, 2007, http://www.ashp.org/import/News/HealthSystemPharmacyNews/newsarticle.aspx?id=2517
7. The Lancet, news release, June 29, 2009, http://www.nlm.nih.gov/medlineplus/news/fullstory_86206.html
8. Ronald Walpole, Raymond Myers and Keying Ye, Probability and Statistics for Engineers and Scientists, 7th edition, Pearson Education, 2002.
9. Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable, Penguin, 2007.
10. Food and Drug Administration, FDA Consumer Magazine, Jan/Feb 2003.
11. Mark Blumenthal, Is Polling as we Know it Doomed?, The National Journal Online, August 10, 2009, http://www.nationaljournal.com/njonline/mp_20090810_1804.php
12. Russ Lenth, Java Applets for Power and Sample Size, University of Iowa, 2009, http://www.stat.uiowa.edu/~rlenth/Power/
13. J. B. Orris, MegaStat for Excel, Version 10.1, Butler University, 2007.
14. Shlomo S. Sawilowsky, Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means When σ1² ≠ σ2², Journal of Modern Applied Statistical Methods, Vol. 1, No. 2, Fall 2002.
15. Richard Lowry, One Way ANOVA – Independent Samples, Vassar.edu, 2011.
Dean Fearn, Elliot Nebenzahl, Maurice Geraghty, Student Guide for Elementary Business Statistics, Kendall/Hunt, 2003.
Math 10 – Part 1 Slides: Data and Descriptive Statistics
© Maurice Geraghty 2015
Introduction
• Green Sheet – Homework 0
• Math 10 Projects
• Computer Lab – S44
• Minitab
• Website http://nebula2.deanza.edu/~mo
• Tutor Lab - S43: Drop in or assigned tutors – get form from lab.
• Group Tutoring
• Other Questions
Crime Rate
In the last 18 years, has violent crime:
Increased?
Decreased?
Decline of MySpace
Total 30 1.000
(Figure: Cumulative Percent graph showing the Median)
c) 6
$s^2 = \frac{\sum x_i^2 - (\sum x_i)^2/n}{n - 1}$
x       x − x̄    (x − x̄)²
2        −4        16
2        −4        16
5        −1         1
9         3         9
12        6        36
Total 30  0        78
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{78}{5 - 1} = 19.5$
$s = \sqrt{19.5} \approx 4.42$
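Both variance formulas can be checked in Python on the data in the worked table (2, 2, 5, 9, 12, as we read them from the table's first column). The sketch computes the definitional formula, the shortcut formula and the standard library's `statistics.variance`, which also divides by n − 1.

```python
import math
import statistics

data = [2, 2, 5, 9, 12]
n = len(data)
mean = sum(data) / n

# Definitional formula: sum of squared deviations over n - 1
ss_deviations = sum((x - mean) ** 2 for x in data)
var_definition = ss_deviations / (n - 1)

# Shortcut formula: (Σx² - (Σx)²/n) / (n - 1)
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(mean, ss_deviations, var_definition, var_shortcut)
print(round(math.sqrt(var_definition), 2), statistics.stdev(data))
```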
Probable outliers are more than 3 IQRs from the box (outer fence).
Data: 2 2 3 4 5 5 6 6 7 50
In the box plot below, the dotted lines represent the “fences” that are 1.5 and 3 IQRs from the box. See how the data point 50 is well outside the outer fence and therefore an almost certain outlier. (Figure: box plots with outlier and without outlier)

Calculate the Z-score of the suspected outlier, using the mean and standard deviation of the data without the suspected outlier. If the Z-score is more than 3 or less than -3, that data point is a probable outlier.
$Z = \frac{50 - 4.4}{1.81} = 25.2$

For some populations, outliers don’t dramatically change the overall statistical analysis. Example: the tallest person in the world will not dramatically change the mean height of 10000 people.
However, for some populations, a single outlier will have a dramatic effect on statistical analysis (called “Black Swan” by Nassim Nicholas Taleb) and inferential statistics may be invalid in analyzing these populations. Example: the richest person in the world will dramatically change the mean wealth of 10000 people.
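Both outlier checks on the slide data, as a sketch. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions, so the fence values may differ slightly from Minitab's box plot; either way, 50 is far outside the outer fence. The Z-score uses the mean and standard deviation of the other nine values, as described above.

```python
import statistics

data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 50]
suspect = 50

# Fence check (quartile convention: statistics.quantiles, inclusive method)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
inner_fence = q3 + 1.5 * iqr
outer_fence = q3 + 3 * iqr

# Z-score check: mean and standard deviation computed WITHOUT the suspect value
others = [x for x in data if x != suspect]
z = (suspect - statistics.mean(others)) / statistics.stdev(others)

print(f"outer fence = {outer_fence}, beyond outer fence: {suspect > outer_fence}")
print(f"Z = {z:.1f}, probable outlier: {abs(z) > 3}")
```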
Graph as Scatterplot
(Figure: two scatterplots of Price versus Size)
X 10 15 20 30 40
Y 40 35 25 25 15
Example continued
$r = \frac{SSXY}{\sqrt{SSX \cdot SSY}} = \frac{-445}{\sqrt{580 \cdot 380}} = -0.9479$
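The correlation coefficient for the X and Y data above can be computed from the sums of squares. This sketch also shows that SSY is 380 for these data, the value that gives r = −0.9479.

```python
import math

x = [10, 15, 20, 30, 40]
y = [40, 35, 25, 25, 15]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = ss_xy / math.sqrt(ss_x * ss_y)

print(ss_x, ss_y, ss_xy, round(r, 4))
```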
Math 10 Part 2: Probability
© Maurice Geraghty 2015
Probability
• Classical probability: based on mathematical formulas
• Empirical probability: based on the relative frequencies of historical data
• Subjective probability
(Figure: chart of Rating categories)
P(A) = 1 – P(A’)
Example
              Accident   No Accident   Total
DUI               70         130         200
Non-DUI           30         770         800
Total            100         900        1000

Example
              Accident   No Accident   Total
US Car            60         540         600
Import Car        40         360         400
Total            100         900        1000
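One way to read these two tables is as a contrast between dependent and independent events (this is our reading of the slides). The sketch compares P(Accident given row) with P(Accident) for each row: for DUI they differ, so the events are dependent; for car origin they are equal, so the events are independent. The function name is our own.

```python
import math

def accident_rates(table):
    """Return P(Accident) and P(Accident given row) for each row.

    table maps a row label to (accident count, no-accident count).
    """
    total_accidents = sum(a for a, _ in table.values())
    grand_total = sum(a + b for a, b in table.values())
    p_accident = total_accidents / grand_total
    conditional = {label: a / (a + b) for label, (a, b) in table.items()}
    return p_accident, conditional

dui = {"DUI": (70, 130), "Non-DUI": (30, 770)}
cars = {"US Car": (60, 540), "Import Car": (40, 360)}

for name, table in [("DUI", dui), ("Car origin", cars)]:
    p, cond = accident_rates(table)
    independent = all(math.isclose(c, p) for c in cond.values())
    print(name, p, cond, "independent" if independent else "not independent")
```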
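Example
10% of prisoners in a Canadian prison are HIV positive. A test will correctly detect HIV 95% of the time, but will incorrectly “detect” HIV in non-HIV-positive prisoners 15% of the time.
Tree diagram: A (HIV+) .1, A’ (HIV−) .9; given A, the test detects HIV .95 and misses .05; given A’, the test “detects” HIV .15 and does not .85.

Example (continued)
Let B = the test detects HIV. Then P(A and B) = (.1)(.95) = .095 and P(B) = .095 + (.9)(.15) = .230.
$P(A \mid B) = \frac{.095}{.230} \approx .413$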
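The same Bayes' theorem calculation in Python, as a minimal sketch: the law of total probability gives P(B), the probability of a positive test, and dividing P(A and B) by it gives P(A | B).

```python
p_hiv = 0.10                 # P(A): prisoner is HIV positive
p_pos_given_hiv = 0.95       # P(B | A): test detects HIV when present
p_pos_given_no_hiv = 0.15    # P(B | A'): false "detection" rate

# Law of total probability: P(B)
p_pos = p_hiv * p_pos_given_hiv + (1 - p_hiv) * p_pos_given_no_hiv
# Bayes' theorem: P(A | B)
p_hiv_given_pos = p_hiv * p_pos_given_hiv / p_pos

print(round(p_pos, 3), round(p_hiv_given_pos, 3))
```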
Math 10 Part 3: Discrete Random Variables
© Maurice Geraghty 2013
Random Variable
• The value of the variable depends on an experiment, observation or measurement.
• The result is not known in advance.
• For the purposes of this class, the variable will be numeric.
μ = 2(2) = 4
$P(X = 6) = \frac{e^{-4} 4^6}{6!} \approx .1042$
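The Poisson probability above, as a short sketch with μ = 4 taken from the slide. The check that the probabilities sum to 1 is only a sanity test of the formula.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 2 * 2            # as computed on the slide
print(round(poisson_pmf(6, mu), 4))
```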
Math 10 Part 4 Slides: Continuous Random Variables and the Central Limit Theorem
© Maurice Geraghty, 2015
Continuous Distributions
• “Uncountable” number of possibilities
• Probability of a point makes no sense
• Probability is measured over intervals
• Comparable to Relative Frequency Histogram – find area under curve.
Exponential distribution
• Waiting time
• “Memoryless”
• $f(x) = (1/\mu)e^{-(1/\mu)x}$
• $P(X > a) = e^{-a/\mu}$
• mean = μ, σ² = μ²
• $P(X > a + b \mid X > b) = e^{-a/\mu}$
Examples of Exponential Distribution: Time until…
• a circuit will fail
• the next RM 7 Earthquake
• the next customer calls
• an oil refinery accident
• you buy a winning lotto ticket
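A numerical check of the memoryless property: dividing P(X > a + b) by P(X > b) gives back P(X > a). The mean μ = 10 and the values a = 3, b = 5 are hypothetical, chosen only for illustration.

```python
import math

def exp_survival(a, mu):
    """P(X > a) = exp(-a / mu) for an exponential variable with mean mu."""
    return math.exp(-a / mu)

mu = 10.0          # hypothetical mean waiting time
a, b = 3.0, 5.0
conditional = exp_survival(a + b, mu) / exp_survival(b, mu)   # P(X > a+b | X > b)
print(conditional, exp_survival(a, mu))                        # the two agree
```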
The Standard Normal Probability Distribution
A normal distribution with a mean of 0 and a standard deviation of 1 is called the standard normal distribution.
Z value: The distance between a selected value, designated x, and the population mean μ, divided by the population standard deviation, σ.

Areas Under the Normal Curve – Empirical Rule
About 68 percent of the area under the normal curve is within one standard deviation of the mean: μ ± 1σ.
About 95 percent is within two standard deviations of the mean: μ ± 2σ.
EXAMPLE
The daily water usage per person in a town is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons. About 68% of the daily water usage per person in New Providence lies between what two values?
μ ± 1σ = 20 ± 1(5). That is, about 68% of the daily water usage will lie between 15 and 25 gallons.

Normal Distribution – probability problem procedure
• Given: Interval in terms of X
• Convert to Z by $Z = \frac{X - \mu}{\sigma}$
• Look up probability in table.
EXAMPLE continued
The daily water usage per person in a town is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons. What percentage of the population uses more than 26.2 gallons?
The Z value associated with X = 26.2 is Z = (26.2 − 20)/5 = 1.24.
Thus P(X > 26.2) = P(Z > 1.24) = 1 − .8925 = .1075.

Normal Distribution – percentile problem procedure
• Given: probability or percentile desired.
• Look up Z value in table that corresponds to probability.
• Convert to X by the formula: X = μ + Zσ
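The probability procedure can be reproduced with `statistics.NormalDist`, whose `cdf` method plays the role of the table lookup. This sketch checks the empirical-rule interval and the P(X > 26.2) calculation for the water usage example.

```python
from statistics import NormalDist

water = NormalDist(mu=20, sigma=5)   # daily water usage, gallons

# About 68% within one standard deviation of the mean
print(water.mean - water.stdev, water.mean + water.stdev)
print(round(water.cdf(25) - water.cdf(15), 4))

# P(X > 26.2): convert to Z, then take the right tail
z = (26.2 - water.mean) / water.stdev
print(round(z, 2), round(1 - water.cdf(26.2), 4))
```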
EXAMPLE
The daily water usage per person in a town is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons. A special tax is going to be charged on the top 5% of water users. Find the value of daily water usage that generates the special tax.
The Z value associated with the 95th percentile = 1.645.
X = 20 + 5(1.645) = 28.2 gallons per day

EXAMPLE
Professor Kurv has determined that the final averages in his statistics course are normally distributed with a mean of 77.1 and a standard deviation of 11.2. He decides to assign his grades for his current course such that the top 15% of the students receive an A. What is the lowest average a student can receive to earn an A?
The top 15% is found by finding the 85th percentile. Find k such that P(X < k) = .85. The corresponding Z value is 1.04. Thus we have X = 77.1 + (1.04)(11.2), or X = 88.75.
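Both percentile examples follow X = μ + Zσ. A sketch in Python, using `NormalDist().inv_cdf` in place of the Z table. Because it uses Z ≈ 1.036 rather than the table's rounded 1.04, the grade cutoff comes out near 88.71 instead of 88.75.

```python
from statistics import NormalDist

def value_at_percentile(p, mu, sigma):
    """X = mu + Z*sigma, where Z is the standard normal value with area p below it."""
    z = NormalDist().inv_cdf(p)
    return mu + z * sigma

tax_cutoff = value_at_percentile(0.95, 20, 5)     # top 5% of water users
a_cutoff = value_at_percentile(0.85, 77.1, 11.2)  # top 15% of final averages
print(round(tax_cutoff, 2), round(a_cutoff, 2))   # 28.22 88.71
```

`NormalDist(mu, sigma).inv_cdf(p)` gives the same result in one call. The helper keeps the slide's two steps (find Z, then convert) visible.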
THEN the distribution of the sample mean has a Normal Distribution with:
$\mu_{\bar X} = \mu$   $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$
[Figure: distribution of the sample mean X̄]
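The statement about the distribution of the sample mean can be checked by simulation. This is a minimal Python sketch, not from the slides. It assumes a right-skewed exponential population with mean 4 (so σ = 4) and samples of size n = 25. Both values are hypothetical, and the slide's conditions for the result are taken as given.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

random.seed(1)                 # fixed seed so the run is repeatable
mu, n, reps = 4.0, 25, 20_000  # hypothetical population mean, sample size, repetitions

# Exponential population with mean 4 (strongly right-skewed; its sigma is also 4).
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

print(round(statistics.fmean(sample_means), 2))   # close to mu = 4
print(round(statistics.stdev(sample_means), 2))   # close to 4 / sqrt(25) = 0.8
print(standard_error(mu, n))                      # 0.8
```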
Inference Process
Math 10
Part 5 Slides
Confidence Intervals
© Maurice Geraghty, 2009
Using the 95% CI for the population mean, we have
24 ± 1.96(4/7) = 22.88 to 25.12
The endpoints of the confidence interval are the confidence limits. The lower confidence limit is 22.88 and the upper confidence limit is 25.12.

Using the 99% CI for the population mean, we have
24 ± 2.58(4/7) = 22.53 to 25.47
Compare to the 95% confidence interval. A higher level of confidence means the confidence interval must be wider.
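A sketch of the known-σ confidence interval in Python. The slide shows only 24 ± z(4/7). Reading 4/7 as σ/√n with σ = 4 and n = 49 is an assumption.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

for level in (0.95, 0.99):
    low, high = z_interval(24, sigma=4, n=49, confidence=level)
    print(level, round(low, 2), round(high, 2))
```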
$n = \left[\frac{(2.58)(20)}{5}\right]^2 = 106.5024 \approx 107$
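The slide does not state the inputs. This sketch assumes z = 2.58 (99% confidence), σ = 20 and a margin of error E = 5, which reproduce the numbers shown. The result is always rounded up, because rounding down would make the margin of error slightly larger than E.

```python
import math

def sample_size_mean(z, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_mean(2.58, 20, 5))  # 107
```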
Example – Confidence Interval
α/2 = .025
df = 13 − 1 = 12
t = 2.18

Confidence Intervals, Population Proportions
Point estimate for proportion of successes in population is: $\hat p = \frac{X}{n}$
X is the number of successes in a sample of size n.
Example
In polling, determine the minimum sample size needed to have a margin of error of 3% when p is unknown.
$n = (.5)(1 - .5)\left(\frac{1.96}{.03}\right)^2 = 1068$

Example
In polling, determine the minimum sample size needed to have a margin of error of 3% when p is known to be close to 1/4.
$n = (.25)(1 - .25)\left(\frac{1.96}{.03}\right)^2 = 801$
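The two polling calculations can be sketched with one helper. As with the sample size for a mean, the result is rounded up (1067.1 becomes 1068 and 800.3 becomes 801).

```python
import math

def sample_size_proportion(p, z, margin):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

print(sample_size_proportion(0.5, 1.96, 0.03))   # p unknown: use p = .5 -> 1068
print(sample_size_proportion(0.25, 1.96, 0.03))  # p close to 1/4 -> 801
```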
CHI-SQUARE DISTRIBUTION
Characteristics of the Chi-Square Distribution
The major characteristics of the chi-square distribution are:
It is positively skewed.
It is non-negative.
It is based on degrees of freedom.
When the degrees of freedom change, a new distribution is created.
[Figure: χ² curves for df = 3, df = 5, and df = 10]
Example (cont)
df = n − 1 = 19
95% CI for σ
Math 10
Part 6
Hypothesis Testing
© Maurice Geraghty, 2010
$9,000.
At least twenty percent of all juvenile offenders are caught and sentenced to prison.
The standard deviation for an investment portfolio is no more than 10 percent per month.
… unreasonable and should be rejected.
Ho is False: failing to reject Ho is a Type II error; rejecting Ho is a Correct Decision.
Determine Decision Criteria
α – Significance Level
β and Power Analysis
Definitions
Critical value(s): The dividing point(s) between the region where the null hypothesis is rejected and the region where it is not rejected. The critical value determines the decision rule.
Rejection Region: Region(s) of the statistical model which contain the values of the test statistic where the null hypothesis will be rejected. The area of the rejection region = α.

One-Tailed Tests of Significance
A test is one-tailed when the alternate hypothesis, Ha, states a direction, such as:
H0: The mean income of females is less than or equal to the mean income of males.
Ha: The mean income of females is greater than the mean income of males.
Equality is part of H0.
Ha determines which tail to test:
Ha: μ > μ0 means test upper tail.
Ha: μ < μ0 means test lower tail.
One-tailed test
H0: μ ≤ μ0
Ha: μ > μ0
α = .05
$Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$

Two-Tailed Tests of Significance
A test is two-tailed when no direction is specified in the alternate hypothesis Ha, such as:
H0: The mean income of females is equal to the mean income of males.
Ha: The mean income of females is not equal to the mean income of males.
Equality is part of H0.
Collect and Analyze Experimental Data
Conduct Experiment
Collect and Verify Data
Check for Outliers
Determine Test Statistic and/or p-value
Reject Ho and support Ha / Fail to Reject Ho
In the box plot below, the dotted lines represent the "fences" that are 1.5 and 3 IQRs from the box. See how the data point 50 is well outside the outer fence and therefore an almost certain outlier.
Probable outliers are more than 3 IQRs from the box (outer fence).
Calculate the Z-score of the suspected outlier. If the Z-score is more than 3 or less than −3, that data point is a probable outlier.
$Z = \frac{50 - 4.4}{1.81} = 25.2$
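Both outlier checks can be sketched in a few lines of Python. The quartiles Q1 = 3 and Q3 = 5.5 are hypothetical, since the box plot's values are not given in the text. The mean 4.4 and standard deviation 1.81 are the ones used in the Z-score above.

```python
def fences(q1, q3):
    """Return the inner (1.5 IQR) and outer (3 IQR) fences around the box."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def z_score(x, mean, sd):
    return (x - mean) / sd

inner, outer = fences(q1=3, q3=5.5)   # hypothetical quartiles
print(inner, outer)                   # (-0.75, 9.25) (-4.5, 13.0)

z = z_score(50, 4.4, 1.81)            # mean and standard deviation from the example
print(round(z, 1), abs(z) > 3)        # 25.2 True
```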
For some populations, outliers don't dramatically change the overall statistical analysis. Example: the tallest person in the world will not dramatically change the mean height of 10000 people.
However, for some populations, a single outlier will have a dramatic effect on statistical analysis (called a "Black Swan" by Nassim Nicholas Taleb), and inferential statistics may be invalid in analyzing these populations. Example: the richest person in the world will dramatically change the mean wealth of 10000 people.

Conduct Experiment
Check for Outliers
Determine Test Statistic and/or p-value
Compare to Critical Value / Compare to α
Make a Decision about Ho
Reject Ho and support Ha / Fail to Reject Ho
Conclusions need to use language that is clearly understood in the context of the problem.
Avoid technical or statistical language.
Refer to the language of the original general question.
Compare these two conclusions from a test of correlation between home square footage and price:
Conclusion 1: By rejecting the Null Hypothesis we are inferring that the Alternative Hypothesis is supported and that there exists a significant correlation between the independent and dependent variables in the original problem comparing home prices to square footage.
Conclusion 2: Homes with more square footage generally have higher prices.
[Figure: scatterplot "Housing Prices and Square Footage", Price vs. Size]

Conclusions need to limit the inference to the population that was sampled.
If a survey was taken of a sub-group of the population, then the inference applies only to the subgroup.
Example
Studies by pharmaceutical companies will only test adult patients, making it difficult to determine effective dosage and side effects for children.
“In the absence of data, doctors use their medical judgment to decide on a particular drug and dose for children. ‘Some doctors stay away from drugs, which could deny needed treatment,’ Blumer says. ‘Generally, we take our best guess based on what's been done before.’”
“The antibiotic chloramphenicol was widely used in adults to treat infections resistant to penicillin. But many newborn babies died after receiving the drug because their immature livers couldn't break down the antibiotic.”
Source: FDA Consumer Magazine – Jan/Feb 2003
Conclusions need to report sampling methods that could question the integrity of the random sample assumption.
Be aware of how the sample was obtained. Here are some examples of pitfalls:
Telephone polling was found to under-sample young people during the 2008 presidential campaign because of the increase in cell-phone-only households. Since young people were more likely to favor Obama, this caused bias in the polling numbers.
Sampling that didn't occur over the weekend may exclude many full-time workers.
Self-selected and unverified polls (like ratemyprofessors.com) could contain immeasurable bias.

Conclusions should address the potential or necessity of further research, sending the process back to the first procedure.
Answers often lead to new questions.
If changes are recommended in a researcher's conclusion, then further research is usually needed to analyze the impact and effectiveness of the implemented changes.
There may have been limitations in the original research project (such as funding resources, sampling techniques, unavailability of data) that warrant a more comprehensive study.
Example: A math department modifies its curriculum based on performance statistics for an experimental course. The department would want to do further study of student outcomes to assess the effectiveness of the new program.
The bus company considers a “practical” value for purposes of bus safety to be that the pads last at least 58,000 miles.
If the standard deviation is 5,000 and the sample size is 50, find the power of the test when the mean is really 58,000 miles. Assume α = .05.

Determine the Critical Value
Reject Ho if $\bar X < 58{,}837$
Calculate β and Power
β = 12%
Power = 1 − β = 88%
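A sketch of the β and power calculation in Python. The hypotheses are not shown in this part of the notes. The stated power of 88% only matches a lower-tail rejection region (reject when X̄ is below 58,837). For example, Ho: μ ≥ 60,000 would give the critical value 60,000 − 1.645(5,000/√50) ≈ 58,837, but that null value is an assumption.

```python
from statistics import NormalDist

def power_lower_tail(critical_xbar, true_mean, sigma, n):
    """Power of a test that rejects Ho when the sample mean falls below critical_xbar."""
    se = sigma / n ** 0.5                       # standard error of the sample mean
    return NormalDist(true_mean, se).cdf(critical_xbar)

power = power_lower_tail(58_837, 58_000, 5_000, 50)
beta = 1 - power
print(round(beta, 2), round(power, 2))   # 0.12 0.88
```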
The test statistic for the one sample case is given by:
$t = \frac{\bar X - \mu}{s/\sqrt{n}}$
The degrees of freedom for the test are n − 1.
The shape of the t distribution is similar to the Z, except the tails are fatter, so the logic of the decision rule is the same.

Like the normal distribution, the logic for one- and two-tail testing is the same.
For a two-tail test using the t-distribution, you will reject the null hypothesis when the value of the test statistic is greater than $t_{df,\alpha/2}$ or less than $-t_{df,\alpha/2}$.
For a left-tail test using the t-distribution, you will reject the null hypothesis when the value of the test statistic is less than $-t_{df,\alpha}$.
For a right-tail test using the t-distribution, you will reject the null hypothesis when the value of the test statistic is greater than $t_{df,\alpha}$.
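A minimal Python sketch of the one-sample t statistic and the three decision rules. The sample data are hypothetical, and the critical values are passed in from a t table because the standard library has no t distribution. For df = 4, $t_{4,.05} = 2.132$.

```python
import statistics

def t_statistic(sample, mu0):
    """One-sample t = (xbar - mu0) / (s / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (xbar - mu0) / (s / n ** 0.5), n - 1

def reject_two_tail(t, t_crit):    # t_crit = t_{df, alpha/2}
    return t > t_crit or t < -t_crit

def reject_left_tail(t, t_crit):   # t_crit = t_{df, alpha}
    return t < -t_crit

def reject_right_tail(t, t_crit):  # t_crit = t_{df, alpha}
    return t > t_crit

sample = [10, 12, 11, 13, 14]      # hypothetical data
t, df = t_statistic(sample, mu0=10)
print(round(t, 3), df)                      # 2.828 4
print(reject_right_tail(t, t_crit=2.132))   # True
```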
In the past, 15% of the mail order solicitations for a certain charity resulted in a financial contribution.
A new solicitation letter has been drafted and will be sent to a random sample of potential donors.
A hypothesis test will be run to determine if the new letter is more effective.
Determine the sample size so that:
The test can be run at the 5% significance level.
If the letter has an 18% success rate (an effect size of 3%), the power of the test will be 95%.
After determining the sample size, conduct the test.

Research Hypotheses
Ho: The new letter is not more effective.
Ha: The new letter is more effective.
In terms of the population proportion
Ho: p = 0.15
Ha: p > 0.15
Significance level
α = .05
Test Statistic (Model)
Z-test of proportion vs. hypothesized value.
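The notes do not show how the sample size was found. One standard normal-approximation formula for a one-tailed test of a proportion is $n = \left[\frac{z_\alpha\sqrt{p_0(1-p_0)} + z_\beta\sqrt{p_1(1-p_1)}}{p_1 - p_0}\right]^2$. With p0 = .15, p1 = .18, α = .05 and power .95, it gives 1652, matching the number of letters in the output that follows. Treat the formula choice as an assumption.

```python
import math
from statistics import NormalDist

def sample_size_one_proportion(p0, p1, alpha, power):
    """n for a one-tailed Z test of Ho: p = p0 vs Ha: p > p0 with the given power at p = p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)

print(sample_size_one_proportion(0.15, 0.18, alpha=0.05, power=0.95))  # 1652
```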
EXAMPLE
Example – Output of Data Analysis
Response: 286   No Response: 1366   (n = 1652)

Critical Value Method
Critical Value = 1.645 (95th percentile of the Normal Distribution.)
H0 is rejected if Z > 1.645
Test Statistic: $Z = \frac{\frac{286}{1652} - .15}{\sqrt{\frac{(.15)(.85)}{1652}}} = 2.63$
Since Z = 2.63 > 1.645, H0 is rejected. The new letter is more effective.

Alternative Method
P-value = .0042
α = 0.05
Since p-value < α, Ho is rejected and we support Ha.
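Both methods can be reproduced with a short Python sketch of the right-tailed Z test of a proportion. The normal CDF comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, p0):
    """Right-tailed Z test of Ho: p = p0 vs Ha: p > p0; returns (Z, p-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z, p_value = z_test_proportion(286, 286 + 1366, 0.15)
print(round(z, 2), round(p_value, 4))   # 2.63 0.0042
print(z > 1.645, p_value < 0.05)        # both methods reject Ho
```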
Significance level
α = .01
Test Statistic (Model)
χ²-test of variance vs. hypothesized value.
p-value = .0054, α = 0.01
Since p-value < α, Ho is rejected and we support Ha.
EXAMPLE
Critical Value Method
Critical Value = 22.164 (1st percentile of the Chi-square Distribution with df = 41 − 1 = 40.)
Alternative Method
Example – Decision Graph

Example – Conclusions
Results:
The evidence supports the claim (p-value < .01) that the standard deviation for 8th grade test scores is less than 30.
Sampling Methodology:
The 41 test scores were the results of the recently administered exam to the 8th grade students.
Limitations:
Since the exams were for the current class only, there is no assurance that future classes will achieve similar results.
Further Research:
Compare results to other schools that administered the same exam.
Continue to analyze future class exams to see if the claim is holding true.
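The standard deviation of $\bar X_1 - \bar X_2$ is given by the formula $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
If n1 and n2 are sufficiently large, $\bar X_1 - \bar X_2$ follows a normal distribution.
$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$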
Z = 4.19
p-value = 0.0000137
EXAMPLE 2
A recent EPA study compared the highway fuel economy of domestic and imported passenger cars.
A sample of 12 imported cars revealed a mean of 35.76 mpg with a standard deviation of 3.86 mpg.
A sample of 15 domestic cars revealed a mean of 33.59 mpg with a standard deviation of 2.16 mpg.
At the .05 significance level can the EPA conclude that the mpg is higher on the imported cars? (Let subscript 2 be associated with domestic cars.)

EXAMPLE 2
Hypotheses: Ho: μ1 ≤ μ2, Ha: μ1 > μ2
Significance level: α = .05
Test statistic: $t = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{1/n_1 + 1/n_2}}$
Decision rule: H0 is rejected if t > 1.708, df = 25
Result: t = 1.85. H0 is rejected. Imports have a higher mean mpg than domestic cars.
Test statistic: $t' = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Degrees of freedom: $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

Hypotheses: Ho: μ1 ≤ μ2, Ha: μ1 > μ2
Significance level: α = .05
Model: t′ test
Decision rule: H0 is rejected if t′ > 1.746, df = 16
Result: t′ = 1.74. H0 is not rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.

This test (also known as the Welch-Aspin Test) has less power than the prior test and should only be used when it is clear the population variances are different.
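A Python sketch comparing the pooled-variance t test and the Welch-Aspin t′ test on the EPA summary statistics. The Welch degrees of freedom (about 16.4) are rounded down before using a t table, which matches the df = 16 used above.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled-variance t statistic, assuming mu1 - mu2 = 0 under Ho."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch-Aspin t' statistic with its approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, math.floor(df)  # round down before using a t table

imports = (35.76, 3.86, 12)   # mean, standard deviation, sample size
domestic = (33.59, 2.16, 15)

t_pooled, df_pooled = pooled_t(*imports, *domestic)
t_welch, df_welch = welch_t(*imports, *domestic)
print(round(t_pooled, 2), df_pooled)   # 1.85 25
print(round(t_welch, 2), df_welch)     # 1.74 16
```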
domestic 29.8 33.3 34.7 37.4 34.4 32.7 30.2 36.2 35.5 34.6 33.2 35.1 33.6 31.3 31.9
import 39.0 35.1 39.1 32.2 35.6 35.5 40.8 34.7 33.2 29.4 42.3 32.2
Data for Hertz: $\bar X_1 = 46.67$, $s_1 = 5.23$
Data for Avis: $\bar X_2 = 44.87$, $s_2 = 5.62$
Differences: $\bar X_d = 1.80$, $s_d = 2.513$, n = 15
By taking the difference of each pair, variability (measured by standard deviation) is reduced.
EXAMPLE 3 continued
Ho: μd = 0, Ha: μd ≠ 0
α = .05
Matched pairs t test, df = 14
H0 is rejected if t < −2.145 or t > 2.145
$t = \frac{1.80}{2.513/\sqrt{15}} = 2.77$
Reject H0.
There is a difference in mean price for compact cars between Hertz and Avis.
Avis has lower mean prices.
Megastat Output – Example 3
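The matched-pairs test only needs the mean and standard deviation of the differences. A Python sketch using the summary values above, with the two-tailed critical value 2.145 taken from the slide:

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """Matched-pairs t = dbar / (s_d / sqrt(n)) with n - 1 degrees of freedom."""
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

t, df = paired_t(1.80, 2.513, 15)
t_crit = 2.145  # two-tailed critical value for df = 14, alpha = .05
print(round(t, 2), df, abs(t) > t_crit)   # 2.77 14 True
```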
Characteristics of F-Distribution
There is a “family” of F Distributions. Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to ∞. As F → ∞ the curve approaches the X-axis.

Test for Equal Variances
For the two tail test, the test statistic is given by: $F = \frac{s_i^2}{s_j^2}$
$s_i^2$ and $s_j^2$ are the sample variances for the two populations.
There are 2 sets of degrees of freedom: $n_i - 1$ for the numerator, $n_j - 1$ for the denominator.
EXAMPLE 4 continued
Hypotheses: Ho: σ1 ≤ σ2, Ha: σ1 > σ2
Significance level: α = .05
Model: F-test
Decision rule: H0 is rejected if F > 3.68, df = (9, 7)
Result: F = (4.92)²/(3.52)² = 1.96, so we Fail to Reject H0. There is insufficient evidence to claim more variation in the software stock.

Excel Example
Using Megastat – Test for equal variances under two population independent samples test and click the F-test box to test for equality of variances.
The default p-value is a two-tailed test, so take one-half the reported p-value for one-tailed tests.
Example – Domestic vs Import Data
Ho: σ1 = σ2, Ha: σ1 ≠ σ2
α = .10
Reject Ho means use unequal variance t-test.
FTR Ho means use pooled variance t-test.
Part 8
Chi-square and ANOVA tests
CHI-SQUARE DISTRIBUTION
[Figure: χ² curves for df = 3, df = 5, and df = 10]

Goodness-of-Fit Test: Equal Expected Frequencies
Let Oi and Ei be the observed and expected frequencies respectively for each category.
H0: there is no difference between Observed and Expected Frequencies
Ha: there is a difference between Observed and Expected Frequencies
The test statistic is: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
EXAMPLE 1
The following data on absenteeism was collected from a manufacturing plant. At the .01 level of significance, test to determine whether there is a difference in the absence rate by day of the week.

Day         Frequency
Monday      95
Tuesday     65
Wednesday   60
Thursday    80
Friday      100

EXAMPLE 1 continued
Assume equal expected frequency: (95+65+60+80+100)/5 = 80

Day     O     E     (O−E)²/E
Mon     95    80    2.8125
Tues    65    80    2.8125
Wed     60    80    5.0000
Thur    80    80    0.0000
Fri     100   80    5.0000
Total   400   400   15.625
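A Python sketch of the goodness-of-fit calculation for the absenteeism table. The critical value 13.277 (χ² with df = 4 at the .01 level) is read from a chi-square table. It does not appear on the slide.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [95, 65, 60, 80, 100]                     # Mon .. Fri absences
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # equal expected frequency: 80 each
stat = chi_square_gof(observed, expected)
df = len(observed) - 1
critical = 13.277  # chi-square table value, df = 4, alpha = .01
print(stat, df, stat > critical)   # 15.625 4 True
```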
ANOVA NOTES
Formulas for ANOVA
If there are k populations being sampled, then df (numerator) = k − 1.
If there are a total of n sample points, then df (denominator) = n − k.
$SS_{Total} = \sum X^2 - \frac{(\sum X)^2}{n}$
The test statistic is computed by: $F = \frac{SS_F/(k-1)}{SS_E/(n-k)}$
ANOVA TABLE
Source   SS      df   MS       F
Factor   76.25   2    38.125   39.10
Error    9.75    10   0.975
Total    86.00   12

Design: Ho: μ1 = μ2 = μ3; Ha: Not all the means are the same
α = .05
Model: One Factor ANOVA
H0 is rejected if F > 4.10
Data: Test statistic: F = [76.25/2]/[9.75/10] = 39.1026
H0 is rejected.
Conclusion: There is a difference in the mean number of pizzas sold at each pizzeria.
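A Python sketch of the F computation from the sums of squares. The pizzeria example has k = 3 groups and n = 13 observations (total df = 12). As a second check, the Valencia orange ANOVA in the Tukey HSD notes has k = 4 and n = 32.

```python
def anova_f(ss_factor, ss_error, k, n):
    """One-factor ANOVA: F = [SS_factor/(k-1)] / [SS_error/(n-k)]; returns (F, df)."""
    ms_factor = ss_factor / (k - 1)
    ms_error = ss_error / (n - k)
    return ms_factor / ms_error, (k - 1, n - k)

f, df = anova_f(76.25, 9.75, k=3, n=13)
print(round(f, 4), df)     # 39.1026 (2, 10)

f2, df2 = anova_f(69.59, 88.88, k=4, n=32)
print(round(f2, 2), df2)   # 7.31 (3, 28)
```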
Requirements: The model is usually balanced, which means that the sample size in each population should be the same. The samples taken in each population are called replicates. Each population is called a treatment. (Note: There are methods of approximating this model if the design is not balanced, but we will not cover them.)
Overall significance level of α: all pairwise tests can be run at the same time with an overall significance level of α.
Test Statistic: $HSD = q\sqrt{\frac{MSE}{n_c}}$
q = value from studentized range table.
Note: Minitab will group differences into families by assigning letters. Pairs that do not share a common letter are significantly different pairs.
Example:
Valencia oranges were tested for juiciness at 4 different orchards. Eight oranges were sampled from each orchard, and the total ml of juice per 20 gms of orange was calculated:

Source   DF   SS       MS      F      P
Factor   3    69.59    23.20   7.31   0.001
Error    28   88.88    3.17
Total    31   158.47

            N   Mean     Grouping
Orchard C   8   12.750   A
Orchard A   8   11.500   A B
Orchard B   8   9.375      B
Orchard D   8   9.250      B
Math 10
Part 9 Slides
Correlation and Regression
© Maurice Geraghty 2015

Mathematical Model
You have a small business producing custom t-shirts. Without marketing, your business has revenue (sales) of $1000 per week. Every dollar you spend on marketing will increase revenue by 2 dollars.
Let variable X represent amount spent on marketing and let variable Y represent revenue per week.
Write a mathematical model that relates X to Y.
Marketing (X)   Revenue (Y)
$0              $1000
$500            $2000
$1000           $3000
$1500           $4000
$2000           $5000
General model: Y = β0 + β1X
Y: Dependent Variable
X: Independent Variable
β0: Y-intercept
β1: Slope

T-shirt model: Y = 1000 + 2X
Y: Revenue
X: Marketing
β0: $1000
β1: $2 per $1 marketing
Make a Scatterplot
Find the least square line
[Figure: scatterplot of sunglasses sales per 1000 vs. rainfall]
X (rainfall)                    10   15   20   30   40
Y (sunglasses sales per 1000)   40   35   25   25   15
ΣX = 115,  ΣY = 140,  ΣX² = 3225,  ΣY² = 4300,  ΣXY = 2775
b1 = −.767
b0 = 45.647
$\hat Y = 45.647 - .767X$
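A Python sketch of the least squares calculation for the rainfall data, using the slope $b_1 = SS_{XY}/SS_X$ and intercept $b_0 = \bar Y - b_1\bar X$. It reproduces the line above.

```python
def least_squares(x, y):
    """Slope b1 = SSXY/SSX and intercept b0 = ybar - b1*xbar."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    b1 = ss_xy / ss_x
    b0 = sum(y) / n - b1 * sum(x) / n
    return b0, b1

rainfall = [10, 15, 20, 30, 40]
sales = [40, 35, 25, 25, 15]
b0, b1 = least_squares(rainfall, sales)
print(round(b0, 3), round(b1, 3))   # 45.647 -0.767
```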
Cities with more police per capita have more crime per capita.
As ice cream sales go up, shark attacks go up.
People with a cold who take a cough medicine feel better after some rest.

r² is the proportion of the total variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X.
The coefficient of determination is the square of the coefficient of correlation, and ranges from 0 to 1.
$SS_{XY} = \sum XY - \frac{1}{n}\left(\sum X \cdot \sum Y\right)$
$SSR = SS_Y - \frac{(SS_{XY})^2}{SS_X}$
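A Python sketch of r and r² for the rainfall data, plus SSR as written above. Since $1 - SSR/SS_Y$ equals r², SSR in this formula is the unexplained (residual) variation.

```python
import math

def correlation(x, y):
    """Return r, r^2 and SSR = SSY - SSXY^2/SSX."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    r = ss_xy / math.sqrt(ss_x * ss_y)
    ssr = ss_y - ss_xy ** 2 / ss_x
    return r, r * r, ssr

rainfall = [10, 15, 20, 30, 40]
sales = [40, 35, 25, 25, 15]
r, r2, ssr = correlation(rainfall, sales)
print(round(r, 3), round(r2, 3), round(ssr, 2))   # -0.948 0.898 38.58
```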
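Interval for the mean of Y at a given X: $\hat Y \pm t \cdot s_e \cdot \sqrt{\frac{1}{n} + \frac{(X - \bar X)^2}{SS_X}}$   Degrees of freedom for t = n − 2
Interval for an individual Y at a given X: $\hat Y \pm t \cdot s_e \cdot \sqrt{1 + \frac{1}{n} + \frac{(X - \bar X)^2}{SS_X}}$   Degrees of freedom for t = n − 2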
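A Python sketch of both intervals for the rainfall data. The value X = 25 is hypothetical. The t value 3.182 (df = n − 2 = 3, 95%, two-sided) comes from a t table. The individual-Y interval is always the wider of the two.

```python
import math

def regression_intervals(x, y, x0, t):
    """Return Y-hat at x0, the interval for the mean of Y, and the interval for an individual Y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_x = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / ss_x
    b0 = ybar - b1 * xbar
    y_hat = b0 + b1 * x0
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(sse / (n - 2))          # standard error of the estimate, df = n - 2
    core = 1 / n + (x0 - xbar) ** 2 / ss_x
    ci = t * se * math.sqrt(core)          # half-width for the mean of Y
    pi = t * se * math.sqrt(1 + core)      # half-width for an individual Y
    return y_hat, (y_hat - ci, y_hat + ci), (y_hat - pi, y_hat + pi)

rainfall = [10, 15, 20, 30, 40]
sales = [40, 35, 25, 25, 15]
y_hat, ci, pi = regression_intervals(rainfall, sales, x0=25, t=3.182)
print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```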
a. Categorical or Numeric Data?
b. One, Two or many populations?
c. Test of mean, proportion, standard deviation, or something else?
d. Independent or dependent sampling?
e. Large or small sample size?
f. One or two tailed test?

• One Population Tests
  o Numeric Data
    1. Z test for population mean vs. hypothesized value (Part 6 slides)
       • Test of mean, population standard deviation known
       • Ho: μ = 10, Ha: μ ≠ 10;  Ho: μ ≤ 10, Ha: μ > 10;  Ho: μ ≥ 10, Ha: μ < 10
       • $Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$   Degrees of freedom – Not Applicable
    2. t test for population mean vs. hypothesized value (Part 6 slides)
       • Test of mean, population standard deviation unknown
       • Ho: μ = 10, Ha: μ ≠ 10;  Ho: μ ≤ 10, Ha: μ > 10;  Ho: μ ≥ 10, Ha: μ < 10
       • $t = \frac{\bar X - \mu_0}{s/\sqrt{n}}$   Degrees of freedom = n − 1
    3. χ² test for variance vs. hypothesized value (Part 6 slides)
       • Test of standard deviation or variance
       • Ho: σ = 10, Ha: σ ≠ 10;  Ho: σ ≤ 10, Ha: σ > 10;  Ho: σ ≥ 10, Ha: σ < 10
       • $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$   Degrees of freedom = n − 1
  o Categorical Data
    4. Z test for proportion vs. hypothesized value (Part 6 slides)
       • Two choices (Yes/No): test of population proportion
       • Ho: p = 0.5, Ha: p ≠ 0.5;  Ho: p ≤ 0.5, Ha: p > 0.5;  Ho: p ≥ 0.5, Ha: p < 0.5
       • $Z = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}}$   Degrees of freedom = not applicable
    5. χ² Goodness of fit test (Part 8 Slides)
       • Multiple choices (k): test of multiple proportions
       • Ho: p1 = 0.4, p2 = 0.1, p3 = 0.5; Ha: At least one pi is different
       • $\chi^2 = \sum \frac{(O - E)^2}{E}$   Degrees of freedom = k − 1
• Two or more Population Tests
  o Numeric Data – One scale variable with two or more populations (factor variable)
    6. Independent Samples: Z test (Part 7 Slides)
       • Comparing 2 Means – Large Sample Size (n1, n2 > 30) or population standard deviation known
       • Ho: μ1 = μ2, Ha: μ1 ≠ μ2;  Ho: μ1 ≤ μ2, Ha: μ1 > μ2;  Ho: μ1 ≥ μ2, Ha: μ1 < μ2
       • $Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$   Degrees of freedom – Not Applicable
    7. Independent Sample t test with equal variances (pooled variance t test) (Part 7 Slides)
       • Comparing 2 Means – Not Large Sample Sizes, assume σ1 = σ2
       • Ho: μ1 = μ2, Ha: μ1 ≠ μ2;  Ho: μ1 ≤ μ2, Ha: μ1 > μ2;  Ho: μ1 ≥ μ2, Ha: μ1 < μ2
       • $t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$   Degrees of freedom = n1 + n2 − 2 (More power)
    8. Independent Sample t test with unequal variances (Part 7 Slides)
       • Comparing 2 Means – Not Large Sample Sizes, assume σ1 ≠ σ2
       • Ho: μ1 = μ2, Ha: μ1 ≠ μ2;  Ho: μ1 ≤ μ2, Ha: μ1 > μ2;  Ho: μ1 ≥ μ2, Ha: μ1 < μ2
       • $t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$   Degrees of freedom < n1 + n2 − 2 (Less power)
    9. Dependent Sampling – Matched Pairs (Part 7 Slides)
       • Comparing 2 Means – Look at differences of measurements
       • Ho: μd = 0, Ha: μd ≠ 0;  Ho: μd ≤ 0, Ha: μd > 0;  Ho: μd ≥ 0, Ha: μd < 0
       • $t = \frac{\bar X_d}{s_d/\sqrt{n}}$   Degrees of freedom = n − 1
    10. F test of Variances (Part 7 Slides)
       • Comparing 2 Variances
       • Ho: σ1 = σ2, Ha: σ1 ≠ σ2;  Ho: σ1 ≤ σ2, Ha: σ1 > σ2;  Ho: σ1 ≥ σ2, Ha: σ1 < σ2
       • $F = \frac{s_1^2}{s_2^2}$ or $F = \frac{s_2^2}{s_1^2}$   Degrees of freedom = (n1 − 1, n2 − 1) or (n2 − 1, n1 − 1)
       • Use this test to help choose between models 7 and 8 above.
    11. One Factor Analysis of Variance (Part 8 Slides)
       • Comparing 3 or more Means – (ANOVA) – F test
       • Ho: μ1 = μ2 = μ3 = ... = μk, Ha: at least one μi is different
       • (ANOVA table) $F = \frac{MS_{factor}}{MS_{error}}$   Degrees of freedom = k − 1, n − k
       • Post Hoc Pairwise comparisons – Tukey's HSD test (Part 8 Slides)
       • Categorical variable is Factor, Numeric Variable is Response
  o Categorical Data – Comparing 2 or more variables
    12. χ² Test for Independence (Part 8 Slides)
       • Test for a relationship between two variables (A and B) in a contingency table
       • Ho: A and B are Independent, Ha: A and B are dependent
       • $\chi^2 = \sum \frac{(O - E)^2}{E}$   Degrees of freedom = (rows − 1)(columns − 1)
       • E = (Row Total)(Column Total)/Grand Total
  o Two numeric variables (X, Y) – bivariate data
    13. Correlation Coefficient (Part 9 Slides)
       • Strength of Relationship between two variables
       • Correlation between two numeric variables
       • Ho: X and Y are not correlated, Ha: X and Y are correlated
       • (ANOVA table) $F = \frac{MS_{regression}}{MS_{error}}$   Degrees of freedom = 1, n − 2
    14. Simple Linear Regression – F test (Part 9 Slides)
       • Significance and prediction of Linear fit between two variables
       • Ho: slope = 0, Ha: slope ≠ 0
       • (ANOVA table) $F = \frac{MS_{regression}}{MS_{error}}$   Degrees of freedom = 1, n − 2
Math 10 ‐ Homework 0 Name:_____________________________
Course Syllabus and Materials
Questions about the Calendar (on the same page as the Syllabus)
9. What are the deadlines for dropping and withdrawing from the course?
Exploring the Website – Find the link for Math 10 Handouts and open the PDF file for Part 1. I highly recommend you
print this out and bring it to class to take notes.
Frequently Asked Questions – Find the “FAQ” at the top of the page. If Flash is not working, you can use the menu
sidebar on the home page.
12. Find the question that starts “I need help in this class…” What are three things you can do if you need help?
13. Read the two questions that have to do with cheating and sign and date the following statement:
I have read the course policy on cheating in both the syllabus and the Frequently Asked Questions. I understand and
agree to the terms as outlined in these policies.
___________________________________________________ __________________________
Signature Date
Please write any Comments or Questions about the Course Policies here:
Math 10 ‐ Homework 1
1. Identify the following data by type (categorical, discrete, continuous) and level (nominal, ordinal, interval, ratio)
b. Make of automobile.
c. Age of a fossil.
2. A poll was taken of 150 students at De Anza College. Each student was asked how many hours they work outside of
college. The students were interviewed in the morning between 8 AM and 11 AM on a Thursday. The sample mean
for these 150 students was 9.2 hours.
d. Is the sample mean of 9.2 a reasonable estimate of the mean number of hours worked for all students
at De Anza? Explain any possible bias.
City B 29 38 38 40 40 48 48 50 52 52 54 55 56 57 57 58
58 58 59 59 59 62 62 63 66 66 67 69 69 71 75 89
a. Construct a back‐to‐back stem and leaf diagram and interpret the results.
e. For each group, determine the z‐score for a commute of 75 minutes. For which group would a 75‐minute
commute be more unusual?
5. The February 10, 2009 Nielsen ratings of 20 TV programs shown on commercial television, all starting between 8 PM
and 10 PM, are given below:
2.1 2.3 2.5 2.8 2.8 3.6 4.4
4.5 5.7 7.6 7.6 8.1 8.7 10.0
10.2 10.7 11.8 13.0 13.6 17.3
a. Graph a stem and leaf plot with the tens and ones units making up the stem and the tenths unit being the leaf.
b. Group the data into intervals of width 2, starting the 1st interval at 2 and obtain the frequency of each of the
intervals.
d. Obtain the relative frequency, % and cumulative frequency and cumulative relative frequency for the intervals in
(b)
f. Obtain the sample mean and the median. Compare the median to the ogive.
i. Assuming the data are bell‐shaped, between what two numbers would you expect to find 68% of the data?
6. The following data represents recovery time for 16 patients (arranged in a table to help you out)
c. Use the range of the data to see if the standard deviation makes sense. (Range should be between 3 and
6 standard deviations)
d. Using the empirical rule between what two numbers should you expect to see 68% of the data? 95% of
the data? 99.7% of the data?
e. Calculate the Z‐score for each observation. Do you think any of these data are outliers?
7. The following data represents the heights (in feet) of 20 almond trees in an orchard.
b. Do you think the tree with height of 45 feet is an outlier? Use both methods we covered in class to
justify your answer.
8. Rank the following correlation coefficients from weakest to strongest.
9. If you were trying to think of factors that affect health care costs:
a. Choose a variable you believe would be positively correlated with health care costs.
b. Choose a variable you believe would be negatively correlated with health care costs.
c. Choose a variable you believe would be uncorrelated with health care costs.
Math 10 ‐ Homework 2
1. A student has a 90% chance of getting to class on time on Monday and a 70% chance of
getting to class on time on Tuesday. Assuming these are independent events, determine the
following probabilities:
2. A class has 10 students, 6 females and 4 males. 3 students will be sampled without
replacement for a group presentation.
a. Construct a tree diagram of all possibilities (there will be 8 total branches at the end)
3. 20% of professional cyclists are using a performance enhancing drug. A test for the drug has
been developed that has a 60% chance of correctly detecting the drug (true positive).
However, the test will come out positive in 2% of cyclists who do not use the drug (false
positive).
a. Construct a tree diagram where the first set of branches are cyclists with and without the
drug, and the 2nd set is whether or not they test positive.
d. If a cyclist tests positive, what is the probability that the cyclist really used the drug?
4. We wish to determine the morale for a certain company. We give each of the workers a
questionnaire and from their answers we can determine the level of their morale, whether it
is ‘Low’, ‘Medium’ or ‘High’; also noted is the ‘worker type’ for each of the workers. For
each worker type, the frequencies corresponding to the different levels of morale are given
below.
WORKER MORALE
Worker Type Low Medium High
Executive 1 14 35
Upper Management 5 30 65
Lower Management 5 40 55
Non‐Management 354 196 450
a. We randomly select 1 worker from this population. What is the probability that the
worker selected
• is an executive?
• is an executive, given the information that the worker has medium morale.
b. Given the information that the selected worker is an executive, what is the probability
that the worker
Additional Problems:
1. Explain the difference between population parameters and sample statistics. What symbols do we use for the
mean and standard deviation for each of these?
2. Consider the following probability distribution function of the random variable X which represents the number
of people in a group (party) at a restaurant:
X P(X)
1 .10
2 .25
3 .20
4 .20
5 .10
6 .05
7 .05
8 .05
c. Find the probability that the next party will be over 4 people.
d. Find the probability that the next three parties (assuming independence) will each be over 4 people.
3. 10% of all children in a large urban elementary school district have been diagnosed with learning disabilities. 10
children are randomly and independently selected from this school district.
a. Let X = the number of children with learning disabilities in the sample. What type of random variable is this?
c. Find the probability that exactly 2 of these selected children have a learning disability.
d. Find the probability that at least 1 of these children has a learning disability.
e. Find the probability that less than 3 of these children have a learning disability.
4. A general statement is made that an error occurs in 10% of all retail transactions. We wish to evaluate the
truthfulness of this figure for a particular retail store, say store A. Twenty transactions of this store are
randomly obtained. Assume that the 10% figure also applies to store A, and let X be the number of retail
transactions with errors in the sample.
a. The probability distribution function (pdf) of X is binomial. Identify the parameters n and p.
5. A newspaper finds a mean of 4 typographical errors per page. Assume the errors follow a Poisson distribution.
a. Let X equal the number of errors on one page. Find the mean and standard deviation of this random
variable.
b. Find the probability that exactly three errors are found on one page.
c. Find the probability that no more than 2 errors are found on one page.
d. Find the probability that no more than 2 errors are found on two pages.
6. Major accidents at a regional refinery occur on average once every five years. Assume the accidents follow a
Poisson distribution.
7. 20% of the people in a California town consider themselves vegetarians. If 20 people are randomly sampled,
find the probability that:
a. Exactly 3 are vegetarians.
b. At least 3 are vegetarians.
c. At most 3 are vegetarians.
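Exercise 7 is binomial with n = 20 and p = 0.20. The easiest route to "at least 3" is the complement, 1 - P(X ≤ 2). A minimal Python sketch:

```python
from math import comb

n, p = 20, 0.20
# P(X = k) for k = 0, 1, ..., 20 from the binomial formula
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p_exactly_3 = pmf[3]             # a.
p_at_least_3 = 1 - sum(pmf[:3])  # b. complement of P(X <= 2)
p_at_most_3 = sum(pmf[:4])       # c. P(X = 0) + ... + P(X = 3)

print(round(p_exactly_3, 4), round(p_at_least_3, 4), round(p_at_most_3, 4))  # 0.2054 0.7939 0.4114
```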
8. Cargo ships arrive at a loading dock at a rate of 2 per day. The dock has the capability of handling 3 arrivals per
day. How many days per month (assume 30 days in a month) would you expect the dock to be unable to handle
all arriving ships? (Hint: first find the probability that more than 3 ships arrive in one day, and then use that
probability to find the expected number of days in a month on which too many ships arrive.)
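The sketch below follows the hint. It assumes that daily arrivals follow a Poisson distribution with λ = 2; the exercise implies this but does not state it.

```python
from math import exp, factorial

lam = 2        # assumed Poisson mean: arrivals per day
capacity = 3   # arrivals the dock can handle per day
days = 30

# P(X <= 3) for a Poisson(2) count, then the complement P(X > 3)
p_at_most_capacity = sum(exp(-lam) * lam**k / factorial(k) for k in range(capacity + 1))
p_overloaded = 1 - p_at_most_capacity

# expected overloaded days = number of days * probability of overload on any one day
expected_days = days * p_overloaded

print(round(p_overloaded, 4), round(expected_days, 2))  # 0.1429 4.29
```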
Math 10 ‐ Homework 4
1. A ferry leaves the dock once per hour. Your waiting time for the next ferry follows a uniform
distribution from 0 to 60 minutes.
2. The cycle times for a truck hauling concrete to a highway construction site are uniformly distributed over the
interval 50 to 70 minutes.
a. Find the mean and variance for cycle times.
b. Find the 5th and 95th percentile of cycle times.
c. Find the interquartile range.
d. Find the probability that the cycle time for a randomly selected truck exceeds 62 minutes.
e. Given that the cycle time exceeds 55 minutes, find the probability that the cycle time is between 60
and 65 minutes.
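For X ~ Uniform(a, b), the standard results are:

- mean (a + b)/2 and variance (b - a)^2/12;
- q-th percentile a + q(b - a);
- probabilities equal to interval length divided by b - a.

Part e is a conditional probability. Because (60, 65) lies inside X > 55, P(60 < X < 65 | X > 55) = P(60 < X < 65) / P(X > 55). A minimal Python sketch; the helper names are illustrative:

```python
a, b = 50, 70  # cycle time X ~ Uniform(50, 70), in minutes

def percentile(q):
    """q-th quantile of Uniform(a, b), for 0 <= q <= 1."""
    return a + q * (b - a)

def prob_between(lo, hi):
    """P(lo < X < hi): length of the overlap with [a, b], divided by b - a."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

mean = (a + b) / 2                            # a. 60
variance = (b - a) ** 2 / 12                  # a. about 33.33
p5, p95 = percentile(0.05), percentile(0.95)  # b. 51 and 69
iqr = percentile(0.75) - percentile(0.25)     # c. 65 - 55 = 10
p_over_62 = prob_between(62, b)               # d. 8 / 20 = 0.4
# e. (60, 65) lies inside X > 55, so the intersection is just (60, 65)
p_60_65_given_over_55 = prob_between(60, 65) / prob_between(55, b)  # 5 / 15
```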
3. The amount of gas in a car’s tank (X) follows a Uniform Distribution where the minimum is zero and the
maximum is 12 gallons.
a. Find the mean and median amount of gas in the tank.
b. Find the variance and standard deviation of gas in the tank.
c. Find the probability that there are more than 3 gallons in the tank.
d. Find the probability that there are between 4 and 6 gallons in the tank.
e. Find the probability that there are exactly 3 gallons in the tank.
f. Find the 80th percentile of gas in the tank.
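The same Uniform(a, b) formulas apply with a = 0 and b = 12. For a continuous random variable, the probability of any single exact value, such as exactly 3 gallons, is 0. A minimal Python sketch:

```python
from math import sqrt

a, b = 0, 12  # gallons in the tank, X ~ Uniform(0, 12)

mean = median = (a + b) / 2        # a. symmetric distribution, so mean = median = 6
variance = (b - a) ** 2 / 12       # b. 144 / 12 = 12
sd = sqrt(variance)                # b. about 3.46 gallons
p_more_than_3 = (b - 3) / (b - a)  # c. 9 / 12
p_4_to_6 = (6 - 4) / (b - a)       # d. 2 / 12
p_exactly_3 = 0.0                  # e. a single point has zero width, hence zero probability
pct_80 = a + 0.80 * (b - a)        # f. about 9.6 gallons
```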
4. A normally distributed population of package weights has a mean of 63.5 g and a standard deviation of 12.2 g.
5. Assume the waiting time until the next RM (Richter Magnitude) 7.0 or greater earthquake somewhere
in California follows an exponential distribution with mean μ = 10 years.
a. Find the probability of waiting 10 or more years for the next RM 7.0 or greater earthquake.
b. Determine the median waiting time until the next RM 7.0 or greater earthquake.
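For an exponential waiting time with mean μ, P(X ≥ x) = e^(-x/μ). The median m solves e^(-m/μ) = 1/2, which gives m = μ ln 2. A minimal Python sketch with μ = 10 years:

```python
from math import exp, log

mu = 10  # mean waiting time in years

def prob_wait_at_least(x):
    """P(X >= x) for an exponential waiting time with mean mu."""
    return exp(-x / mu)

p_10_or_more = prob_wait_at_least(10)  # a. e^(-1)
median_wait = mu * log(2)              # b. mu * ln 2

print(round(p_10_or_more, 4), round(median_wait, 2))  # 0.3679 6.93
```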
6. High Fructose Corn Syrup (HFCS) is a sweetener in food products that is linked to obesity and type II diabetes.
The mean annual consumption of HFCS in the United States in 2008 was 60 lbs, with a standard deviation of 20
lbs. Assume the population follows a normal distribution.
a. Find the probability a randomly selected American consumes more than 50 lbs of HFCS per year.
b. Find the probability a randomly selected American consumes between 30 and 90 lbs of HFCS per year.
d. In a sample of 40 Americans, how many would you expect to consume more than 50 pounds of HFCS per
year?
e. Between what two values would you expect the annual HFCS consumption of 95% of Americans to fall?
g. A teenager who loves soda consumes 105 lbs of HFCS per year. Is this result unusual? Use probability to
justify your answer.
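Python's standard `statistics.NormalDist` (Python 3.8+) can stand in for a z-table here. Two choices in this sketch are assumptions, not the course's stated rules:

- Part g calls a result "unusual" when its tail probability is below 5%. This is a common convention.
- Part e uses exact 2.5% and 97.5% quantiles. The course may instead expect the empirical rule, μ ± 2σ, which gives 20 to 100 lbs.

```python
from statistics import NormalDist

hfcs = NormalDist(mu=60, sigma=20)  # annual HFCS consumption in lbs

p_over_50 = 1 - hfcs.cdf(50)                          # a. about 0.6915
p_30_to_90 = hfcs.cdf(90) - hfcs.cdf(30)              # b. about 0.8664
expected_over_50 = 40 * p_over_50                     # d. about 27.7 of 40 people
low, high = hfcs.inv_cdf(0.025), hfcs.inv_cdf(0.975)  # e. about 20.8 to 99.2 lbs
p_105_or_more = 1 - hfcs.cdf(105)                     # g. about 0.012

# g. assumed rule of thumb: unusual if the tail probability is below 0.05
is_unusual = p_105_or_more < 0.05
```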
7. State in your own words the 3 important parts of the Central Limit Theorem.
8. For women aged 18‐24, systolic blood pressures (in mmHg) are normally distributed with μ=114.8 and σ=13.1.
a. Find the probability a woman aged 18‐24 has systolic blood pressure exceeding 120.
b. If 4 women are randomly selected, find the probability that their mean blood pressure exceeds 120.
c. If 40 women are randomly selected, find the probability that their mean blood pressure exceeds 120.
d. If the pdf for systolic blood pressure did NOT follow a normal distribution, would your answer to part c
change? Explain.
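By the Central Limit Theorem, the mean of n readings has mean μ and standard deviation σ/√n. Here the population itself is normal, so the sample mean is exactly normal for any n. A minimal Python sketch; the helper name is illustrative:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 114.8, 13.1  # systolic blood pressure, mmHg

def prob_mean_exceeds(x, n):
    """P(mean of n randomly selected women > x); the sample mean has SD sigma / sqrt(n)."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(x)

p_one = prob_mean_exceeds(120, 1)     # a. a single woman, about 0.346
p_four = prob_mean_exceeds(120, 4)    # b. about 0.214
p_forty = prob_mean_exceeds(120, 40)  # c. about 0.006
```

For part d: with n = 40, the CLT is generally taken to make the sample mean approximately normal even when individual readings are not. Part b, with only n = 4, leans on the normality assumption.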
9. The following data represent 20 random samples from a discrete uniform distribution S = {1, 2, 3, 4, 5, 6, 7, 8, 9}.
The sample mean X̄ was calculated for each group:
a. Consider the sample mean X̄ (last column) as a random variable, group the data into the following
categories, and make a histogram:
Interval for X̄    Frequency    Rel Freq
(4.05 to 4.50)
(4.55 to 5.00)
(5.05 to 5.50)
(5.55 to 6.00)
Total
d. For this discrete uniform distribution, μ = 5 and σ = 2.58. Based on the Central Limit Theorem, what
would the mean and standard deviation of the sample mean random variable be? How do these
compare with the sample mean and standard deviation results from part c?
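You can check the stated μ = 5 and σ = 2.58 directly, since each value 1 through 9 has probability 1/9. The Central Limit Theorem then gives the sample mean a mean of μ and a standard deviation of σ/√n. Here n is the number of values averaged in each group. That size is not shown in this excerpt, so the sketch leaves it as a parameter.

```python
from math import sqrt
from statistics import mean, pstdev

population = list(range(1, 10))  # S = {1, 2, ..., 9}, each value equally likely

mu = mean(population)       # 5
sigma = pstdev(population)  # population SD, sqrt((9**2 - 1) / 12), about 2.58

def clt_sd_of_mean(n):
    """Standard deviation of the sample mean for groups of size n (sigma / sqrt(n))."""
    return sigma / sqrt(n)
```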
Math 10 - Homework 5
1. The average number of years of post-secondary education of employees in an industry is 1.5. A company claims
that this average is higher for its employees. A random sample of 16 of its employees has a mean of 2.1 years
of post-secondary education with a standard deviation of 0.6 years.
a. Find a 95% confidence interval for the mean number of years of post-secondary education for the
company’s employees. How does this compare with the industry value?
b. Find a 95% confidence interval for the standard deviation of the number of years of post-secondary education
for the company’s employees.
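Both intervals need critical values that Python's standard library does not provide. The sketch below hard-codes them from standard t and chi-square tables for 15 degrees of freedom. If SciPy is available, `scipy.stats.t.ppf(0.975, 15)`, `scipy.stats.chi2.ppf(0.975, 15)` and `scipy.stats.chi2.ppf(0.025, 15)` should give the same values. The intervals are:

- mean: x̄ ± t·s/√n;
- standard deviation: √((n - 1)s²/χ²_R) to √((n - 1)s²/χ²_L), which assumes the population is normal.

```python
from math import sqrt

n, x_bar, s = 16, 2.1, 0.6
df = n - 1

# Critical values for df = 15 from standard tables (95% confidence)
t_crit = 2.131       # t with 0.025 in the upper tail
chi2_right = 27.488  # chi-square with 0.025 in the upper tail
chi2_left = 6.262    # chi-square with 0.975 in the upper tail

# a. t-interval for the mean
margin = t_crit * s / sqrt(n)
mean_ci = (x_bar - margin, x_bar + margin)  # about (1.78, 2.42)

# b. chi-square interval for the standard deviation
sd_ci = (sqrt(df * s**2 / chi2_right), sqrt(df * s**2 / chi2_left))  # about (0.44, 0.93)
```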
2. When polling companies report a margin of error, they are usually referring to the half-width of a 95% confidence interval. Go to the
website www.pollingreport.com and verify the stated margins of error for 2 polls.
Constructing Confidence Intervals: In Exercises 3 and 4, you are given the sample mean and the sample standard
deviation. Assume the random variable is normally distributed and use a t-distribution to construct a 95%
confidence interval for the population mean µ. What is the margin of error of the confidence interval?
3. Repair Costs: Microwaves. In a random sample of five microwave ovens, the mean repair cost was $75.00 and
the standard deviation was $12.50.
4. Repair Costs: Computers. In a random sample of seven computers, the mean repair cost was $100.00 and the
standard deviation was $42.50.
5. You did some research on repair costs of microwave ovens and found that the population standard deviation is σ = $15.
Repeat Exercise 3 using a normal (z) distribution, since the population standard deviation is now
known. Compare the results.
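Exercises 3 and 4 use t critical values with 4 and 6 degrees of freedom. Exercise 5 uses z because σ is known. In this minimal Python sketch, the t values are hard-coded from a standard t table and the helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def interval(center, crit, spread, n):
    """Confidence interval center +/- crit * spread / sqrt(n); returns (low, high, margin)."""
    margin = crit * spread / sqrt(n)
    return center - margin, center + margin, margin

# Exercise 3: microwaves, n = 5, t with 4 df (0.025 in the upper tail) = 2.776
micro_t = interval(75.00, 2.776, 12.50, 5)      # margin about $15.52
# Exercise 4: computers, n = 7, t with 6 df = 2.447
computer_t = interval(100.00, 2.447, 42.50, 7)  # margin about $39.31
# Exercise 5: microwaves with sigma = $15 known, so use z (about 1.96)
z = NormalDist().inv_cdf(0.975)
micro_z = interval(75.00, z, 15.00, 5)          # margin about $13.15

for name, (low, high, margin) in [("micro t", micro_t), ("computer t", computer_t), ("micro z", micro_z)]:
    print(f"{name}: ({low:.2f}, {high:.2f}), margin {margin:.2f}")
```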
6. Mini-Soccer Balls: A soccer ball manufacturer wants to estimate the mean circumference of mini-soccer balls
within 0.15 inch. Assume that the population of circumferences is normally distributed.
(a) Determine the minimum sample size required to construct a 99% confidence interval for the population
mean. Assume the population standard deviation is 0.20 inch.
(b) Repeat part (a) using a standard deviation of 0.10 inch. Which standard deviation requires a larger sample
size? Explain.
(c) Repeat part (a) using a confidence level of 95%. Which level of confidence requires a larger sample size?
Explain.
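The minimum sample size for estimating a mean to within E, with σ known, is n = (z·σ/E)², always rounded up to the next whole number. A minimal Python sketch for parts (a) through (c):

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(confidence, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. ceil((z * sigma / margin) ** 2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil((z * sigma / margin) ** 2)

n_a = min_sample_size(0.99, 0.20, 0.15)  # (a)
n_b = min_sample_size(0.99, 0.10, 0.15)  # (b) smaller sigma
n_c = min_sample_size(0.95, 0.20, 0.15)  # (c) lower confidence

print(n_a, n_b, n_c)  # 12 3 7
```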
7. If all other quantities remain the same, how does the indicated change affect the minimum sample size
requirement (Increase, Decrease or No Change)?
8. Stressful Travel: In a survey of 3224 U.S. adults, 1515 said flying is the most stressful form of travel. Construct a
95% confidence interval for the proportion of all adults who say flying is the most stressful form of travel.
9. Accidents and Alcohol: A study of 2008 traffic fatalities found that 800 of the fatalities were alcohol related.
Find a 99% confidence interval for the population proportion and explain what it means.
10. Happy at Work? In a survey of 1003 U.S. adults, 662 would be happy spending the rest of their career with their
current employer. Construct a 90% confidence interval for the proportion who would be happy staying with
their current employer. Does this result surprise you?
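Exercises 8 and 10 use the large-sample interval p̂ ± z·√(p̂(1 - p̂)/n). It assumes that n·p̂ and n·(1 - p̂) are both at least 5, which holds for both surveys. A minimal Python sketch:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

flying = proportion_ci(1515, 3224, 0.95)  # Exercise 8: about (0.453, 0.487)
happy = proportion_ci(662, 1003, 0.90)    # Exercise 10: about (0.635, 0.685)
```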
11. Computer Repairs: You wish to estimate, with 95% confidence and within 3.5 percentage points of the true population proportion, the
proportion of computers that need repairs or have problems by the time they are three years old.
a. No preliminary estimate is available. Find the minimum sample size needed.
b. Find the minimum sample size needed, using a prior study that found that 19% of computers needed
repairs or had problems by the time the product was three years old.
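The minimum sample size for a proportion is n = p(1 - p)(z/E)², rounded up. With no prior estimate, p = 0.5 is the conservative choice, because it maximizes p(1 - p). A minimal Python sketch:

```python
from math import ceil
from statistics import NormalDist

def min_n_proportion(confidence, margin, p_guess=0.5):
    """n = p(1 - p)(z / E)^2 rounded up; p_guess = 0.5 when no prior estimate exists."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)

n_no_estimate = min_n_proportion(0.95, 0.035)        # a.
n_prior_study = min_n_proportion(0.95, 0.035, 0.19)  # b.

print(n_no_estimate, n_prior_study)  # 784 483
```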
12. Lawn Mowers: A lawn mower manufacturer is trying to determine the standard deviation of the life of one of its
lawn mower models. To do this, it randomly selects 12 lawn mowers that were sold several years ago and finds
that the sample standard deviation is 3.25 years. Use a 99% level of confidence to find a confidence interval for the
population standard deviation.
13. Monthly Income: The monthly incomes of 20 randomly selected individuals who have recently graduated with a
bachelor's degree in social science have a sample standard deviation of $107. Use a 95% level of confidence to
find a confidence interval for the population standard deviation.
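Exercises 12 and 13 both use the chi-square interval for σ, from √((n - 1)s²/χ²_R) to √((n - 1)s²/χ²_L). It assumes the population is normal. The chi-square critical values below are hard-coded from a standard chi-square table, because Python's standard library has no chi-square quantile function.

```python
from math import sqrt

def sd_interval(n, s, chi2_right, chi2_left):
    """CI for sigma: sqrt((n-1)s^2/chi2_right) to sqrt((n-1)s^2/chi2_left); assumes a normal population."""
    ss = (n - 1) * s**2
    return sqrt(ss / chi2_right), sqrt(ss / chi2_left)

# Exercise 12: n = 12 (df = 11), 99% confidence; upper-tail areas 0.005 and 0.995
mower = sd_interval(12, 3.25, 26.757, 2.603)  # about (2.08, 6.68) years
# Exercise 13: n = 20 (df = 19), 95% confidence; upper-tail areas 0.025 and 0.975
income = sd_interval(20, 107, 32.852, 8.907)  # about (81.37, 156.28) dollars
```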
14. Read the attached article on the CBS News poll regarding the birth control pill.
a. What would the point estimate be for the proportion of adults who believe the pill has made
women’s lives better?
c. What is the margin of error for this poll, as reported in the article? Assuming a 95% level of
confidence, verify it by calculation.
Poll: Most Say The Pill Improved Women's Lives - CBS News http://www.cbsnews.com/stories/2010/05/07/health/main6468828.shtml?...
May 7, 2010
The birth control pill was approved by the Food and Drug Administration in 1960. Today, 52 percent
of Americans say it has been one of the most significant medical developments of the last 50 years,
according to the poll, conducted on May 4th and 5th.
Four in five Americans think the birth control pill has had at least some effect on American society
overall, including 41 percent who say it’s had a great deal of impact.
Even more, 54 percent, think the birth control pill has had a great deal of impact on women’s lives in
particular.
Most Americans say women’s lives were changed for the better because of the birth control pill. Only a
quarter think it made no difference, and even fewer say the pill made women’s lives worse.
Men (59 percent), women (54 percent), and women who have ever taken the pill (54 percent) say that
women’s lives were improved as a result of the birth control pill.
More specifically, Americans think the birth control pill helped women enter the work force: 57
percent say the pill made it easier for women to have jobs and careers outside the home.
That number rises to 69 percent among Americans age 45 and over -- an age group more likely to have
felt the impact of the pill when it was first developed and put on the market. Among women age 45 and
older that figure is 64 percent.
By contrast, 53 percent of younger Americans say the birth control pill had no effect on the ability of
women to work outside the home.
Among working women, 55 percent say the birth control pill has made it easier for women to enter the
workforce.
Family Life and Attitudes Toward Sex
Roughly half of Americans say the birth control pill has improved American family life, while a third
doesn’t think it has had much effect.
Religion has some impact on these views. Among Catholics, whose church opposes non-natural forms
of birth control, just 38 percent believe the birth control pill has improved American family life. That
figure is 52 percent among Protestants.
Eight in ten Americans think the birth control pill has affected Americans’ attitudes toward sex,
including 51 percent who say it impacted those attitudes a great deal.
The poll finds public concerns about the safety of the birth control pill have diminished over time.
In 1966, six years after the pill was approved by the FDA, fewer than half of Americans - 43 percent -
told a Gallup Poll that birth control pills could be used safely without danger to a person’s health.
Among women, 58 percent now think the birth control pill can be used safely, as do a similar
percentage of women who have ever taken it.
Nearly half of women think the birth control pill is just as safe as other forms of birth control, and
another 20 percent believe the pill is safer. Still, one in five thinks it is less safe. Views are similar
among women who have ever taken birth control pills.
More than eight in 10 Americans (including 82 percent of women) say birth control pills are effective.
In a 1966 Gallup Poll, a smaller number of Americans (though still a 61 percent majority) thought the
birth control pill was effective.
Some medical research has been done on a contraceptive for men similar to that of the birth control
pill. A majority of women do not think most men would take birth control pills if they were available.
In contrast, two-thirds of men think most men would take the pill if it were available.
This poll was conducted among a random sample of 591 adults nationwide, interviewed by telephone
May 4-5, 2010. Phone numbers were dialed from random digit dial samples of both standard
land-line and cell phones. The error due to sampling for results based on the entire sample could be
plus or minus four percentage points. The error for subgroups is higher.
This poll release conforms to the Standards of Disclosure of the National Council on Public Polls.
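For part c, a common conservative check uses p = 0.5 in the margin-of-error formula z·√(p(1 - p)/n). The sample size, n = 591, comes from the methodology paragraph at the end of the article. A minimal Python sketch:

```python
from math import sqrt
from statistics import NormalDist

n = 591                           # sample size reported in the article
z = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
margin = z * sqrt(0.5 * 0.5 / n)  # p = 0.5 gives the largest possible margin

print(f"{margin:.1%}")  # 4.0%
```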
Math 10 - Homework 6
Part A
1. What are the two types of hypotheses used in a hypothesis test? How are they related?
True or False?
In Exercises 3-8, determine whether the statement is true or false. If it is false, rewrite it as a true
statement.
5. If you decide to reject the null hypothesis, you can support the alternative hypothesis.
6. The level of significance is the maximum probability you allow for rejecting a null hypothesis when
it is actually true.
Stating Hypotheses
In Exercises 9-14, use the given statement to represent a claim. Write its complement and state which is Ho and which is Ha.
9. p > 0.65
10. µ ≤ 128
11. σ² ≠ 5
12. µ = 1.2
13. p ≥ 0.45
15. You represent a chemical company that is being sued for paint damage to automobiles. You want
to support the claim that the mean repair cost per automobile is about $650. How would you
write the null and alternative hypotheses?
16. You are on a research team that is investigating the mean temperature of adult humans. The
commonly accepted claim is that the mean temperature is about 98.6°F. You want to show that
this claim is false. How would you write the null and alternative hypotheses?
17. A light bulb manufacturer claims that the mean life of a certain type of light bulb is at least 750
hours. You are skeptical of this claim and want to refute it.
18. As stated by a company's shipping department, the number of shipping errors per million
shipments has a standard deviation that is less than 3. Can you support this claim?
19. A research organization reports that 33% of the residents in Ann Arbor, Michigan are college
students. You want to reject this claim.
20. The results of a recent study show that the proportion of people in the western United States who
use seat belts when riding in a car or truck is under 84%. You want to support this claim.
PART B – Hypothesis Testing Procedure
21. In your work for a national health organization, you are asked to monitor the amount of sodium in a
certain brand of cereal. You find that a random sample of 82 cereal servings has a mean sodium content
of 232 milligrams with a standard deviation of 10 milligrams. At α = 0.01, can you conclude that the
mean sodium content per serving of cereal is over 230 milligrams?
(a) (DESIGN) State your hypotheses.
(DESIGN) State the significance level of the test and explain what it means.
(d) (DESIGN) Determine the decision rule (p-value method or critical value method).
(e) (DATA) Conduct the test and circle your decision.
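A minimal Python sketch of the right-tailed test in problem 21, H0: μ ≤ 230 versus Ha: μ > 230. Because n = 82, it uses the large-sample z statistic with s in place of σ. If the course uses a t-test here instead, the p-value changes only slightly.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s, mu_0, alpha = 82, 232, 10, 230, 0.01

z = (x_bar - mu_0) / (s / sqrt(n))            # test statistic, about 1.81
p_value = 1 - NormalDist().cdf(z)             # right-tailed, about 0.035
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 2.33

# the p-value method and the critical value method must agree
reject = p_value < alpha
assert reject == (z > z_critical)
print(round(z, 3), round(p_value, 4), "reject H0" if reject else "fail to reject H0")
```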
25. The geyser Old Faithful in Yellowstone National Park is claimed to erupt for about three minutes on average.
Thirty-six eruptions of Old Faithful were observed, and their durations were recorded (time in minutes):
1.8 1.98 2.37 3.78 4.3 4.53
1.82 2.03 2.82 3.83 4.3 4.55
1.88 2.05 3.13 3.87 4.43 4.6
1.9 2.13 3.27 3.88 4.43 4.6
1.92 2.3 3.65 4.1 4.47 4.63
1.93 2.35 3.7 4.27 4.47 6.13
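You can compute the summary statistics for a test about the mean eruption time from the 36 values above. In this minimal Python sketch, the null value μ = 3 comes from the claim. The two-sided alternative, Ha: μ ≠ 3, is an assumption based on the wording "about three minutes".

```python
from math import sqrt
from statistics import mean, stdev

durations = [
    1.8, 1.98, 2.37, 3.78, 4.3, 4.53,
    1.82, 2.03, 2.82, 3.83, 4.3, 4.55,
    1.88, 2.05, 3.13, 3.87, 4.43, 4.6,
    1.9, 2.13, 3.27, 3.88, 4.43, 4.6,
    1.92, 2.3, 3.65, 4.1, 4.47, 4.63,
    1.93, 2.35, 3.7, 4.27, 4.47, 6.13,
]
n = len(durations)       # 36
x_bar = mean(durations)  # about 3.39 minutes
s = stdev(durations)     # sample standard deviation (n - 1 in the denominator)

mu_0 = 3  # claimed mean eruption time in minutes
t_stat = (x_bar - mu_0) / (s / sqrt(n))  # compare with t critical values for n - 1 = 35 df
```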
a. Parameter
b. Statistic
c. Statistical Inference
d. Hypothesis
e. Hypothesis Testing
f. Null Hypothesis (Ho)
g. Alternative Hypothesis (Ha)
h. Type I Error
i. Type II Error
j. Level of Significance (α)
k. Beta (β)
l. Statistical Model
m. Test Statistic
n. Model Assumptions
o. Critical value(s)
p. Rejection Region
q. p-value
r. Decision Rule
s. Power
t. Effect Size
27. A study claims that more than 60% of students text-message frequently. In a poll of 1000 students, 660 students
said they text-message frequently. Can you support the study’s claim? Conduct the test with α = 1%.
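A minimal Python sketch of the one-proportion z-test for H0: p ≤ 0.60 versus Ha: p > 0.60. It uses the standard error under the null hypothesis, √(p₀(1 - p₀)/n):

```python
from math import sqrt
from statistics import NormalDist

n, successes, p_0, alpha = 1000, 660, 0.60, 0.01
p_hat = successes / n

# normal approximation is reasonable: n * p_0 = 600 and n * (1 - p_0) = 400 are both large
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)  # about 3.87
p_value = 1 - NormalDist().cdf(z)              # right-tailed, well below 0.01
reject = p_value < alpha
```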
28. Fifteen iPod users were asked how many songs were on their iPods. Here are the summary statistics of that
study:
X̄ = 650, s = 200
a. Can you support the claim that the mean number of songs on a user’s iPod is different from 500?
Conduct the test with α = 5%.
b. Can you support the claim that the population standard deviation is under 300? Conduct the test
with α = 5%.
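A minimal Python sketch of both tests in problem 28, with critical values hard-coded from standard tables:

- Part a is a two-sided t-test (H0: μ = 500, Ha: μ ≠ 500) with 14 degrees of freedom.
- Part b is a left-tailed chi-square test (H0: σ ≥ 300, Ha: σ < 300). It assumes the population is normal.

```python
from math import sqrt

n, x_bar, s = 15, 650, 200
df = n - 1

# a. two-sided t-test at alpha = 0.05; t with 0.025 in the upper tail, 14 df = 2.145
t_stat = (x_bar - 500) / (s / sqrt(n))  # about 2.90
reject_mean = abs(t_stat) > 2.145

# b. left-tailed chi-square test at alpha = 0.05; chi-square with 0.95 in the upper tail, 14 df = 6.571
chi2_stat = df * s**2 / 300**2          # about 6.22
reject_sd = chi2_stat < 6.571

print(round(t_stat, 3), reject_mean, round(chi2_stat, 3), reject_sd)
```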
29. Consider the design procedure in the test you conducted in Question 28a. Suppose you wanted to conduct
a power analysis assuming the population mean under Ha is actually 550. Use the online power calculator to
answer the following questions.
b. Determine Beta.
c. Determine the sample size needed if you wanted to conduct the test in Question 28a with 95%
power.
30. The drawing shown diagrams the design of a hypothesis test for a population mean under the Null Hypothesis (top
drawing) and a specific Alternative Hypothesis (bottom drawing). The sample size for the test is 200.
i. If the test was conducted, and the p-value was .085, would the decision be Reject or Fail to Reject
the Null Hypothesis?
j. If the sample size were changed to 100, would the shaded area on the bottom (Ha) graph
increase, decrease or stay the same?
Math 10 ‐ Homework 7
1. What is the difference between two samples that are dependent and two samples that are
independent? Give an example of two dependent samples and two independent samples.
2. What conditions are necessary in order to use the dependent samples t‐test for the mean of the
difference of two populations?
In Problems 3‐10, classify the two given samples as independent or dependent. Explain your reasoning.
3. Sample 1: The SAT scores for 35 high school students who did not take an SAT preparation course
Sample 2: The SAT scores for 40 high school students who did take an SAT preparation course
9. The table shows the braking distances (in feet) for each of four different sets of tires with the car's anti‐
lock braking system (ABS) on and with ABS off. The tests were done on ice with cars traveling at 15
miles per hour.
Tire Set 1 2 3 4
Braking distance with ABS 42 55 43 61
Braking distance without ABS 58 67 59 75
10. The table shows the heart rates (in beats per minute) of five people before exercising and after.
Person 1 2 3 4 5
Heart Rate before Exercising 42 55 43 61 65
Heart Rate after Exercising 58 67 59 75 90
11. In a study testing the effects of an herbal supplement on blood pressure in men, 11 randomly selected
men were given an herbal supplement for 15 weeks. The following measurements are each subject's
diastolic blood pressure taken before and after the 15‐week treatment period. At α = 0.10, can you support the
claim that diastolic blood pressure was lowered?
(a) (DESIGN) State your hypotheses.
(d) (DESIGN) Determine the decision rule (critical value method).
(e) (DATA) Conduct the test and circle your decision.
Experimental 395 389 421 394 407 411 389 402 422 416 402 408 400 386 411 405 389
Conventional 362 352 380 382 413 384 400 378 419 379 384 388 372 383
(a) (DESIGN) State your hypotheses.
(d) (DESIGN) Determine the decision rule (p-value method).
1. A bicycle safety organization claims that fatal bicycle accidents are uniformly distributed throughout the
week. The table shows the day of the week for which 911 randomly selected fatal bicycle accidents
occurred. At α= 0.10, can you reject the claim that the distribution is uniform?
(a) (DESIGN) State your hypotheses.
(d) (DATA) Conduct the test and circle your decision.
3. In a recent SurveyUSA poll, 500 American adults were asked if marijuana should be legalized. The results of
the poll were cross-tabulated as shown in the contingency tables below. Conduct two tests for
independence to determine whether opinion about legalization of marijuana is dependent on gender or age.
Male Female
Should be Legal 123 90
Should Not be Legal 127 160
a) Are gender and opinion on legalization dependent variables? Test using α = 1%.
b) Give a possible explanation for the conclusion you came up with in part a.
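A minimal Python sketch of the chi-square test for independence in part a. Each expected count is (row total × column total) / grand total, and the statistic sums (O - E)²/E over the cells. The 1% critical value for (2 - 1)(2 - 1) = 1 degree of freedom, 6.635, comes from a standard chi-square table.

```python
observed = [
    [123, 90],   # Should be Legal:     Male, Female
    [127, 160],  # Should Not be Legal: Male, Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2_stat += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # 1
reject_independence = chi2_stat > 6.635            # about 8.91 > 6.635
```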
5. A clinical psychologist completed a study on hyperactivity in children using one‐way ANOVA. The model was
balanced with 5 replicates per treatment. The factor was 3 types of school district (urban, rural and
suburban). Unfortunately, hackers broke into the psychologist’s computer and wiped out all the data. All
that remained was a fragment of the ANOVA table:
Fill in the table and conduct the hypothesis test that compares the mean level of hyperactivity in the 3 types of school district.
6. A sociologist was interested in commute time for workers in the Bay Area. She categorized commuters by 4
regions (North Bay, South Bay, East Bay and Peninsula) and designed a balanced model with 8 replicates per
region. The data are round-trip commute times in minutes. The results and ANOVA output are shown
below:
a. Test the Null Hypothesis that all regions have the same mean commute time at a significance level
of 5%. State your decision in non‐statistical language.
c. Explain the results of this experiment as if you were addressing a transportation committee. What
would you recommend?
MINITAB OUTPUT
Source DF SS MS F P
Factor 3 6392 2131 7.14 0.001
Error 28 8356 298
Total 31 14748
N Mean Grouping
South 8 49.38 A
East 8 35.00 A B
Pen 8 15.88 B
North 8 15.75 B
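You can check the Minitab table by hand. Each MS is SS/DF, and F = MS(Factor)/MS(Error). The p-value itself would need an F distribution, which Python's standard library does not provide. A minimal Python sketch:

```python
ss_factor, df_factor = 6392, 3  # 4 regions - 1
ss_error, df_error = 8356, 28   # 32 commuters - 4 regions

ms_factor = ss_factor / df_factor  # about 2131
ms_error = ss_error / df_error     # about 298
f_stat = ms_factor / ms_error      # about 7.14

assert ss_factor + ss_error == 14748 and df_factor + df_error == 31  # matches the Total row
```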
A test for a difference in calories due to hot dog type will be performed.
i. Design the test.
ii. Fill in the missing information in the ANOVA table below.
iii. Conduct the test with an overall significance level of 5%, including pairwise comparisons.
Source DF SS MS F p-value
Type ______ 17692 ________ ________ 0.000
Error ______ 28067 ________
Total ______ 45759
Q1 A manager is concerned that overtime (measured in hours) is contributing to more sickness (measured in sick
days) among the employees. Data records for 10 employees were sampled with the following results:
a) Find the least-squares line where Sick Days is dependent on Overtime. Interpret the slope.
e) What would your prediction of sick days be for an employee who works 100 hours of overtime?
f) Analyze the residuals and determine which pair of data is the most unusual.
g) Explain why this model would not be appropriate for an employee who works 500 hours of overtime.
Regression Analysis
r²                 n            10
r                  k            1
Std. Error         Dep. Var.    SickDays
ANOVA table
Source SS df MS F p-value
Regression 80.6944 1 80.6944 15.05 .0047
Residual 42.9056 8 5.3632
Total 123.6000 9
Observation    SickDays    Predicted SickDays    Residual
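The blank r², r and standard error entries in the regression output follow from the ANOVA table:

- r² = SS(Regression)/SS(Total);
- the standard error is √MS(Residual);
- F = MS(Regression)/MS(Residual).

The sketch computes only |r|, because r takes the sign of the fitted slope, which is not shown in this excerpt.

```python
from math import sqrt

ss_regression, ss_residual, ss_total = 80.6944, 42.9056, 123.6000
df_regression, df_residual = 1, 8

r_squared = ss_regression / ss_total                    # about 0.653
abs_r = sqrt(r_squared)                                 # about 0.808; r has the sign of the slope
ms_residual = ss_residual / df_residual                 # 5.3632
std_error = sqrt(ms_residual)                           # about 2.316 sick days
f_stat = (ss_regression / df_regression) / ms_residual  # about 15.05
```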