
11/9/2020

Review of Basic Statistics

Statistics, 12th Edition


James T. McClave, University of Florida
Terry T. Sincich, University of South Florida
ISBN-13: 9780321755933

Statistics

Chapter 1: Statistics, Data and Statistical Thinking

1
11/9/2020

1.1: The Science of Statistics


◼ Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing and interpreting numerical
information.

McClave, Statistics, 11th ed. Chapter 1: 3


Statistics, Data and Statistical Thinking

1.2: Types of Statistical Applications

Statistics divides into two branches: Descriptive Statistics and Inferential Statistics.

McClave, Statistics, 11th ed. Chapter 1: 4


Statistics, Data and Statistical Thinking

2
11/9/2020

1.2: Types of Statistical Applications

◼ Descriptive statistics utilizes numerical


and graphical methods to look for patterns
in a data set, to summarize the information
revealed in a data set and to present that
information in a convenient form.

McClave, Statistics, 11th ed. Chapter 1: 5


Statistics, Data and Statistical Thinking

1.2: Types of Statistical Applications

◼ Inferential statistics utilizes sample data


to make estimates, decisions, predictions
or other generalizations about a larger set
of data.

McClave, Statistics, 11th ed. Chapter 1: 6


Statistics, Data and Statistical Thinking

3
11/9/2020

1.3: Fundamental Elements of


Statistics
◼ A statistical inference is an estimate, prediction, or
some other generalization about a population based on
information contained in a sample.
◼ A measure of reliability is a statement about the degree
of uncertainty associated with a statistical inference.
◼ An inference is incomplete without a measure of its
reliability.

Based on our analysis, we think 56% of


soda drinkers prefer Pepsi to Coke, ± 5%.

McClave, Statistics, 11th ed. Chapter 1: 7


Statistics, Data and Statistical Thinking

1.3: Fundamental Elements of


Statistics
Descriptive Statistics
1. The population or sample of interest
2. One or more variables to be investigated
3. Tables, graphs or numerical summary tools
4. Identification of patterns in the data

Inferential Statistics
1. Population of interest
2. One or more variables to be investigated
3. The sample of population units
4. The inference about the population based on the sample data
5. A measure of reliability of the inference
McClave, Statistics, 11th ed. Chapter 1: 8
Statistics, Data and Statistical Thinking

4
11/9/2020

1.5: Collecting Data

◼ Designed Experiment
 Strict control over the experiment and the units
in the experiment
◼ Observational Study
 Observe units in natural settings
 No control over behavior of units
◼ Survey
 Gallup, Harris and other polls
 Nielsen
McClave, Statistics, 11th ed. Chapter 1: 9
Statistics, Data and Statistical Thinking

Statistics

Chapter 2: Methods for Describing Sets of Data

10

5
11/9/2020

2.4: Numerical Measures of


Central Tendency

◼ The mean of a set of quantitative data is the


sum of the observed values divided by the
number of values
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

McClave, Statistics , 11th ed. Chapter 2: 11


Methods for Describing Sets of Data

11

2.4: Numerical Measures of


Central Tendency
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$

◼ The mean of a sample is typically denoted by


x-bar, but the population mean is denoted by
the Greek symbol μ.

McClave, Statistics , 11th ed. Chapter 2: 12


Methods for Describing Sets of Data

12

6
11/9/2020

2.4: Numerical Measures of


Central Tendency

◼ If x1 = 1, x2 = 2, x3 = 3 and x4 = 4,

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = (1 + 2 + 3 + 4)/4 = 10/4 = 2.5$

McClave, Statistics , 11th ed. Chapter 2: 13


Methods for Describing Sets of Data

13

2.4: Numerical Measures of


Central Tendency

◼ The median of a set of quantitative data is


the value which is located in the middle of
the data, arranged from lowest to highest
values (or vice versa), with 50% of the
observations above and 50% below.

50% 50%

Lowest Value Median Highest Value

McClave, Statistics , 11th ed. Chapter 2: 14


Methods for Describing Sets of Data

14

7
11/9/2020

2.5: Numerical Measures of


Variability

◼ The sample variance, s2, for a sample


of n measurements is equal to the sum
of the squared distances from the
mean, divided by (n – 1).
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
McClave, Statistics , 11th ed. Chapter 2: 15
Methods for Describing Sets of Data

15

2.5: Numerical Measures of


Variability

◼ The sample standard deviation, s, for


a sample of n measurements is equal to
the square root of the sample variance.

$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

McClave, Statistics , 11th ed. Chapter 2: 16


Methods for Describing Sets of Data

16

8
11/9/2020

2.5: Numerical Measures of


Variability

◼ Say a small data set consists of the measurements 1, 2 and 3.
 x̄ = 2

$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \left[(3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2\right] / (3 - 1)$

$s^2 = (1^2 + 0^2 + 1^2)/2 = 2/2 = 1$

$s = \sqrt{s^2} = \sqrt{1} = 1$

McClave, Statistics , 11th ed. Chapter 2: 17


Methods for Describing Sets of Data

17
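The mean, variance, and standard deviation from this small example are easy to verify numerically. Here is a minimal sketch, assuming numpy is installed; the array name is purely illustrative.

```python
import numpy as np

data = np.array([1, 2, 3])        # the small data set from the slide

print(data.mean())                # (1 + 2 + 3) / 3 = 2.0
print(data.var(ddof=1))           # sample variance (divides by n - 1) = 1.0
print(data.std(ddof=1))           # sample standard deviation = 1.0
```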

2.5: Numerical Measures of


Variability

◼ As before, Greek letters are used for populations and Roman letters for samples:

s² = sample variance
s = sample standard deviation
σ² = population variance
σ = population standard deviation

McClave, Statistics , 11th ed. Chapter 2: 18


Methods for Describing Sets of Data

18

9
11/9/2020

2.7: Numerical Measures of


Relative Standing

◼ Percentiles: for any (large) set of n


measurements (arranged in ascending
or descending order), the pth percentile
is a number such that p% of the
measurements fall below that number
and (100 – p)% fall above it.

McClave, Statistics , 11th ed. Chapter 2: 19


Methods for Describing Sets of Data

19

2.7: Numerical Measures of


Relative Standing

◼ Finding percentiles is similar to finding the


median – the median is the 50th percentile.
 If you are in the 50th percentile for the GRE, half
of the test-takers scored better and half scored
worse than you.
 If you are in the 75th percentile, you scored
better than three-quarters of the test-takers.
 If you are in the 90th percentile, only 10% of all
the test-takers scored better than you.

McClave, Statistics , 11th ed. Chapter 2: 20


Methods for Describing Sets of Data

20

10
11/9/2020

2.7: Numerical Measures of


Relative Standing

◼ The z-score tells us how many standard deviations above or below the mean a particular measurement is.

◼ Sample z-score: $z = \frac{x - \bar{x}}{s}$

◼ Population z-score: $z = \frac{x - \mu}{\sigma}$

McClave, Statistics , 11th ed. Chapter 2: 21


Methods for Describing Sets of Data

21
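A small sketch of the sample z-score formula above, assuming numpy is available; the data values are made up for illustration.

```python
import numpy as np

x = np.array([25, 40, 46, 52, 104])       # hypothetical measurements
z = (x - x.mean()) / x.std(ddof=1)        # z = (x - x̄) / s for each value

print(np.round(z, 2))                     # standard deviations above/below the mean
```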

2.6: Interpreting the Standard


Deviation
◼ Since ~95% of all the
measurements will be within
2 standard deviations of the
mean, only ~5% will be
more than 2 standard
deviations from the mean.
◼ About half of this 5% will be
far below the mean, leaving
only about 2.5% of the measurements at least 2 standard
deviations above the mean.
◼ Z scores are related to the empirical rule:
For a perfectly symmetrical and mound-shaped distribution,
 ~68 % will have z-scores between -1 and 1
 ~95 % will have z-scores between -2 and 2
 ~99.7% will have z-scores between -3 and 3
McClave, Statistics , 11th ed. Chapter 2: 22
Methods for Describing Sets of Data

22

11
11/9/2020

2.8: Methods for Determining


Outliers

◼ An outlier is a measurement that is


unusually large or small relative to the
other values.
◼ Three possible causes:
 Observation, recording or data entry error
 Item is from a different population
 A rare, chance event

McClave, Statistics , 11th ed. Chapter 2: 23


Methods for Describing Sets of Data

23

2.8: Methods for Determining


Outliers

◼ The box plot is a graph representing


information about certain percentiles
for a data set and can be used to
identify outliers

McClave, Statistics , 11th ed. Chapter 2: 24


Methods for Describing Sets of Data

24

12
11/9/2020

2.8: Methods for Determining


Outliers
[Box plot: Wins by Team at the 2007 MLB All-Star Break (axis roughly 30–55 wins). The box runs from the lower quartile (QL, 25th percentile) through the median to the upper quartile (QU, 75th percentile); whiskers extend toward QL − 1.5(IQR) and QU + 1.5(IQR), with the minimum and maximum values marked.]

• Any value falling more than 3 standard deviations from the mean is considered an outlier.
• Any value falling beyond the whiskers should be investigated as a potential outlier.
McClave, Statistics , 11th ed. Chapter 2: 25
Methods for Describing Sets of Data

25

2.8: Methods for Determining


Outliers

Interquartile Range (IQR) = QU − QL

[Box plot: Wins by Team at the 2007 MLB All-Star Break (axis roughly 30–55 wins).]

McClave, Statistics , 11th ed. Chapter 2: 26


Methods for Describing Sets of Data

26

13
11/9/2020

2.8: Methods for Determining


Outliers
◼ Outliers and z-scores

#Wins (n = 30)
Mean: 45.68
Sample Variance: 146.69
Sample Standard Deviation: 12.11
Minimum: 25
Maximum: 104

Here are the descriptive statistics for the games won at the All-Star break, except one team had its total wins for 2006 recorded. That team, with 104 wins recorded, had a z-score of (104 − 45.68)/12.11 = 4.82. That's a very unlikely result, which isn't surprising given what we know about the observation.

McClave, Statistics , 11th ed. Chapter 2: 27


Methods for Describing Sets of Data

27

2.8: Methods for Determining


Outliers
[Box plot: Wins by Team at the 2007 MLB All-Star Break, with one team's 2006 total recorded (axis roughly 20–110 wins). The inner fence sits at QU + 1.5(IQR) and the outer fence at QU + 3(IQR).]

McClave, Statistics , 11th ed. Chapter 2: 28


Methods for Describing Sets of Data

28
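The quartiles, IQR, and fences described above can be computed directly. This is a sketch assuming numpy is installed; the win totals are invented, not the actual 2007 data.

```python
import numpy as np

wins = np.array([31, 36, 38, 40, 42, 44, 45, 47, 49, 51, 53, 104])  # made-up win totals

q_l, q_u = np.percentile(wins, [25, 75])      # lower and upper quartiles
iqr = q_u - q_l                               # interquartile range

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)    # inner fences
outer = (q_l - 3.0 * iqr, q_u + 3.0 * iqr)    # outer fences

print(wins[(wins < inner[0]) | (wins > inner[1])])   # values beyond the inner fences
```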

14
11/9/2020

2.8: Methods for Determining


Outliers

◼ Outliers and z-scores


 The chance that a z-score is between -3
and +3 is over 99%.

 Any measurement with |z| > 3 is


considered an outlier.

McClave, Statistics , 11th ed. Chapter 2: 29


Methods for Describing Sets of Data

29

Types and Uses of Graphs

McClave, Statistics, 11th ed. Chapter 1: 30


Statistics, Data and Statistical Thinking

30

15
11/9/2020

Basic Graphs
◼ Bar graphs to show numbers that are
independent of each other. Example data might
include things like the number of people who
preferred each of Chinese takeaways, Indian
takeaways and fish and chips.
◼ Pie charts to show you how a whole is divided
into different parts. You might, for example, want
to show how a budget had been spent on
different items in a particular year.
◼ Line graphs show you how numbers have
changed over time. They are used when you
have data that are connected, and to show
trends, for example, average night time
temperature in each month of the year.
McClave, Statistics, 11th ed. Chapter 1: 31
Statistics, Data and Statistical Thinking

31

Advanced Graphs

McClave, Statistics, 11th ed. Chapter 1: 32


Statistics, Data and Statistical Thinking

32

16
11/9/2020

Dashboards

McClave, Statistics, 11th ed. Chapter 1: 33


Statistics, Data and Statistical Thinking

33

Pictorial and Thematic Graphs

McClave, Statistics, 11th ed. Chapter 1: 34


Statistics, Data and Statistical Thinking

34

17
11/9/2020

Statistics

Chapter 4: Discrete Random Variables

35

4.1: Two Types of Random


Variables
◼ A random variable is a variable that assumes numerical values associated with the random outcome of an experiment, where one (and only one) numerical value is assigned to each sample point.
◼ A discrete random variable can assume a
countable number of values.
 Number of steps to the top of the Eiffel Tower*
◼ A continuous random variable can assume any
value along a given interval of a number line.
 The time a tourist stays at the top
once s/he gets there
McClave, Statistics, 11th ed. Chapter 4: 36
Discrete Random Variables

36

18
11/9/2020

4.1: Two Types of Random


Variables
◼ Discrete random variables
 Number of sales
 Number of calls
 Shares of stock
 People in line
 Mistakes per page
◼ Continuous random
variables
 Length
 Depth
 Volume
 Time
 Weight
McClave, Statistics, 11th ed. Chapter 4: 37
Discrete Random Variables

37

4.2: Probability Distributions


for Discrete Random Variables

◼ The probability distribution of a


discrete random variable is a graph,
table or formula that specifies the
probability associated with each
possible outcome the random variable
can assume.
 p(x) ≥ 0 for all values of x
 Σ p(x) = 1

McClave, Statistics, 11th ed. Chapter 4: 38


Discrete Random Variables

38

19
11/9/2020

4.2: Probability Distributions


for Discrete Random Variables
◼ Say a random variable x follows this pattern: p(x) = (.3)(.7)^(x−1) for x = 1, 2, 3, …
 This table gives the probabilities (rounded to two digits) for x between 1 and 10.

x     p(x)
1     .30
2     .21
3     .15
4     .11
5     .07
6     .05
7     .04
8     .02
9     .02
10    .01
McClave, Statistics, 11th ed. Chapter 4: 39
Discrete Random Variables

39

4.3: Expected Values of


Discrete Random Variables
◼ The mean, or expected value, of a discrete random variable is
$\mu = E(x) = \sum x\,p(x)$
◼ The variance of a discrete random variable x is
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x)$
◼ The standard deviation of a discrete random variable x is
$\sigma = \sqrt{E[(x - \mu)^2]} = \sqrt{\sum (x - \mu)^2 p(x)}$
McClave, Statistics, 11th ed. Chapter 4: 40
Discrete Random Variables

40

20
11/9/2020

4.3: Expected Values of


Discrete Random Variables
◼ In a roulette wheel in a U.S. casino, a $1 bet on
“even” wins $1 if the ball falls on an even
number (same for “odd,” or “red,” or “black”).
◼ The probability of winning this bet is 47.37%:

P(win $1) = .4737
P(lose $1) = .5263
$\mu = (+\$1)(.4737) + (-\$1)(.5263) = -\$0.0526$
$\sigma = .9986$

On average, bettors lose about a nickel for each dollar they put down on a bet like this.
(These are the best bets for patrons.)

McClave, Statistics, 11th ed. Chapter 4: 41


Discrete Random Variables

41

4.4: The Binomial Distribution

◼ A Binomial Random Variable


 n identical trials
 Two outcomes: Success or Failure
 P(S) = p ; P(F) = q = 1 – p
 Trials are independent
 x is the number of Successes in n trials
 Head or tail, pass or fail, male or female,
defective or non-defective.

McClave, Statistics, 11th ed. Chapter 4: 42


Discrete Random Variables

42

21
11/9/2020

4.4: The Binomial Distribution


◼ A Binomial Random Variable (flipping a coin)
 n identical trials → Flip a coin 3 times
 Two outcomes: Success or Failure → Outcomes are Heads or Tails
 P(S) = p; P(F) = q = 1 − p → P(H) = .5; P(T) = 1 − .5 = .5
 Trials are independent → A head on flip i doesn't change P(H) on flip i + 1
 x is the number of S's in n trials

McClave, Statistics, 11th ed. Chapter 4: 43


Discrete Random Variables

43

4.4: The Binomial Distribution

Results of 3 flips   Probability   Combined   Summary
HHH                  (p)(p)(p)     p³         (1)p³q⁰
HHT                  (p)(p)(q)     p²q
HTH                  (p)(q)(p)     p²q        (3)p²q¹
THH                  (q)(p)(p)     p²q
HTT                  (p)(q)(q)     pq²
THT                  (q)(p)(q)     pq²        (3)p¹q²
TTH                  (q)(q)(p)     pq²
TTT                  (q)(q)(q)     q³         (1)p⁰q³
McClave, Statistics, 11th ed. Chapter 4: 44
Discrete Random Variables

44

22
11/9/2020

4.4: The Binomial Distribution

◼ The Binomial Probability Distribution


 p = P(S) on a single trial
 q=1–p
 n = number of trials
 x = number of successes

$P(x) = \binom{n}{x} p^x q^{n-x}$
McClave, Statistics, 11th ed. Chapter 4: 45
Discrete Random Variables

45

4.4: The Binomial Distribution

◼ The Binomial Probability Distribution

$P(x) = \binom{n}{x} p^x q^{n-x}$
McClave, Statistics, 11th ed. Chapter 4: 46
Discrete Random Variables

46

23
11/9/2020

4.4: The Binomial Distribution

◼ Say 40% of the n


class is female. P( x) =   p x q n − x
 x
◼ What is the 10 
probability that 6 =  (.46 )(.610−6 )
of the first 10 6
students walking = 210(.004096)(.1296)
in will be female? = .1115

McClave, Statistics, 11th ed. Chapter 4: 47


Discrete Random Variables

47
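A quick check of the binomial calculation above, as a sketch assuming scipy is installed:

```python
from scipy.stats import binom

# P(x = 6) for n = 10 trials with success probability p = 0.4
print(binom.pmf(6, n=10, p=0.4))    # ≈ 0.1115
```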

4.4: The Binomial Distribution

◼ A Binomial Random Variable has
Mean: μ = np
Variance: σ² = npq
Standard Deviation: σ = √(npq)

McClave, Statistics, 11th ed. Chapter 4: 48


Discrete Random Variables

48

24
11/9/2020

4.4: The Binomial Distribution

◼ For 1,000 coin flips,

μ = np = 1000 × .5 = 500
σ² = npq = 1000 × .5 × .5 = 250
σ = √(npq) = √250 ≈ 16

The actual probability of getting exactly 500 heads out of 1,000 flips is just over 2.5%, but the probability of getting between 484 and 516 heads (that is, within one standard deviation of the mean) is about 68%.

McClave, Statistics, 11th ed. Chapter 4: 49


Discrete Random Variables

49
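A numerical check of the 1,000-flip figures quoted above (a sketch, assuming scipy is available):

```python
from scipy.stats import binom

n, p = 1000, 0.5
print(binom.pmf(500, n, p))                          # P(exactly 500 heads) ≈ 0.025
print(binom.cdf(516, n, p) - binom.cdf(483, n, p))   # P(484 ≤ x ≤ 516) ≈ 0.70, close to the 68% rule of thumb
```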

4.5: The Poisson Distribution


◼ Evaluates the probability of a (usually small)
number of occurrences out of many opportunities
in a …
 Period of time
 Area
 Volume
 Weight
 Distance
 Other units of measurement
 No. of traffic accidents per month at a busy intersection, no. of surface defects on a car body, no. of deaths per day, etc.
McClave, Statistics, 11th ed. Chapter 4: 50
Discrete Random Variables

50

25
11/9/2020

4.5: The Poisson Distribution


$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

◼ λ = mean number of occurrences in the given unit of time, area, volume, etc.
◼ e = 2.71828…
◼ µ = λ
◼ σ² = λ

McClave, Statistics, 11th ed. Chapter 4: 51


Discrete Random Variables

51

4.5: The Poisson Distribution


◼ Say in a given stream there are an average
of 3 striped trout per 100 yards. What is the
probability of seeing 5 striped trout in the
next 100 yards, assuming a Poisson
distribution?

$P(x = 5) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{3^5 e^{-3}}{5!} = .1008$

McClave, Statistics, 11th ed. Chapter 4: 52


Discrete Random Variables

52
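Both trout probabilities (this slide and the next, where λ drops to 1.5 for the shorter stretch) can be reproduced with scipy's Poisson distribution; a sketch assuming scipy is installed:

```python
from scipy.stats import poisson

print(poisson.pmf(5, mu=3.0))   # P(x = 5) with λ = 3 over 100 yards  ≈ 0.1008
print(poisson.pmf(5, mu=1.5))   # P(x = 5) with λ = 1.5 over 50 yards ≈ 0.0141
```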

26
11/9/2020

4.5: The Poisson Distribution


◼ How about in the next 50 yards, assuming a Poisson distribution?
 Since the distance is only half as long, λ is only half as large.

$P(x = 5) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1.5^5 e^{-1.5}}{5!} = .0141$

McClave, Statistics, 11th ed. Chapter 4: 53


Discrete Random Variables

53

Chapter 5: Continuous Random


Variables

54

27
11/9/2020

5.1: Continuous Probability


Distributions

◼ A continuous random variable can


assume any numerical value within some
interval or intervals.
◼ The graph of the probability distribution is a
smooth curve called a
 probability density function,
 frequency function or
 probability distribution.

McClave: Statistics, 11th ed. Chapter 5: 55


Continuous Random Variables

55

5.1: Continuous Probability


Distributions

◼ There are an infinite


number of possible
outcomes
 p(x) = 0
 Instead, find p(a<x<b)
☺ Table
☺ Software
 Integral calculus

McClave: Statistics, 11th ed. Chapter 5: 56


Continuous Random Variables

56

28
11/9/2020

5.2: The Uniform Distribution


◼ X can take on any value between c and d with equal probability 1/(d − c)
◼ For two values a and b (c ≤ a < b ≤ d):

$P(a \le x \le b) = \frac{b - a}{d - c}$

Mean: $\mu = \frac{c + d}{2}$

Standard Deviation: $\sigma = \frac{d - c}{\sqrt{12}}$

McClave: Statistics, 11th ed. Chapter 5: 57
Continuous Random Variables

57

5.2: The Uniform Distribution

Suppose a random variable x is


distributed uniformly with
c = 5 and d = 25.
What is P(10 < x < 18)?
$P(10 \le x \le 18) = \frac{18 - 10}{25 - 5} = .40$

McClave: Statistics, 11th ed. Chapter 5: 58


Continuous Random Variables

58
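The same uniform probability via scipy, which parameterizes the distribution by loc = c and scale = d − c (a sketch, assuming scipy is installed):

```python
from scipy.stats import uniform

c, d = 5, 25
dist = uniform(loc=c, scale=d - c)

print(dist.cdf(18) - dist.cdf(10))   # P(10 < x < 18) = 0.40
print(dist.mean(), dist.std())       # (c + d)/2 = 15.0,  (d - c)/sqrt(12) ≈ 5.77
```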

29
11/9/2020

5.3: The Normal Distribution


◼ Closely approximates many situations
 Perfectly symmetrical around its mean
◼ The probability density function f(x):

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[(x - \mu)/\sigma\right]^2}$

µ = the mean of x
σ = the standard deviation of x
π = 3.1416…
e = 2.71828…
McClave: Statistics, 11th ed. Chapter 5: 59
Continuous Random Variables

59

5.3: The Normal Distribution


◼ Each combination of µ and σ produces a unique normal curve
◼ The standard normal curve is used in practice, based on the standard normal random variable z (µ = 0, σ = 1), with the probability distribution

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

The probabilities for z are given in Table IV
McClave: Statistics, 11th ed. Chapter 5: 60
Continuous Random Variables

60

30
11/9/2020

5.3: The Normal Distribution


P(0  z  1.00) = .3413

P(−1.00  z  0) = .3413

P(−1  z  1) = .3413 + .3413


= .6826

P(1  z  1.25) =
P(0  z  1.25) − P(0  z  1.00)
= .3944 − .3413 = .0531

McClave: Statistics, 11th ed. Chapter 5: 61


Continuous Random Variables

61

McClave, Statistics, 11th ed. Chapter 1: 62


Statistics, Data and Statistical Thinking

62

31
11/9/2020

McClave, Statistics, 11th ed. Chapter 1: 63


Statistics, Data and Statistical Thinking

63

5.3: The Normal Distribution

For a normally distributed random variable x, if we know µ and σ,

$z_i = \frac{x_i - \mu}{\sigma}$

So any normally distributed variable can be analyzed with this single distribution.

McClave: Statistics, 11th ed. Chapter 5: 64


Continuous Random Variables

64

32
11/9/2020

5.3: The Normal Distribution


◼ Say a toy car goes an average of 3,000 yards between recharges, with a standard deviation of 50 yards (i.e., µ = 3,000 and σ = 50)
◼ What is the probability that the car will go more than 3,100 yards without recharging?

$P(x > 3100) = P\left(z > \frac{3100 - 3000}{50}\right) = P(z > 2.00) = 1 - .5 - P(0 \le z \le 2.00) = 1 - .5 - .4772 = .0228$
McClave: Statistics, 11th ed. Chapter 5: 65
Continuous Random Variables

65
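The toy-car tail probability can be checked with the normal survival function; a sketch assuming scipy is installed:

```python
from scipy.stats import norm

mu, sigma = 3000, 50
print(norm.sf(3100, loc=mu, scale=sigma))   # P(x > 3100) ≈ 0.0228 (sf = 1 - cdf)
print(norm.sf((3100 - mu) / sigma))         # same result via the z-score, P(z > 2.00)
```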

5.6: The Exponential


Distribution

◼ Probability Distribution for an Exponential Random Variable x
 Probability Density Function:

$f(x) = \frac{1}{\theta}\, e^{-x/\theta} \quad (x > 0)$

 Mean: µ = θ
 Standard Deviation: σ = θ
McClave: Statistics, 11th ed. Chapter 5: 66
Continuous Random Variables

66

33
11/9/2020

5.6: The Exponential


Distribution
Suppose the waiting time to see the nurse at the student
health center is distributed exponentially with a mean of
45 minutes. What is the probability that a student will
wait more than an hour to get his or her generic pill?

$P(x > a) = e^{-a/\theta}$

$P(x > 60) = e^{-60/45} = e^{-1.33} = .2645$
McClave: Statistics, 11th ed. Chapter 5: 67
Continuous Random Variables

67
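The waiting-time probability above, checked with scipy's exponential distribution (scale = θ = 45); a sketch assuming scipy is installed. The exact value is e^(−60/45) ≈ .2636; the slide's .2645 comes from rounding the exponent to 1.33.

```python
from scipy.stats import expon

theta = 45                        # mean waiting time in minutes
print(expon.sf(60, scale=theta))  # P(x > 60) = e^(-60/45) ≈ 0.2636
```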

6.3: The Sampling Distribution of x̄ and the Central Limit Theorem

Properties of the Sampling Distribution of x̄

The mean of the sampling distribution equals the mean of the population:

$\mu_{\bar{x}} = E(\bar{x}) = \mu$

The standard deviation of the sampling distribution [the standard error (of the mean)] equals the population standard deviation divided by the square root of n:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
McClave: Statistics, 11th ed. Chapter 6: 68
Sampling Distributions

68

34
11/9/2020

6.3: The Sampling Distribution of x̄ and the Central Limit Theorem

Here's our small population again, this time with the standard deviations of the sample means. Notice the mean of the sample means in each case equals the population mean and the standard error falls as n increases.

n = 1: samples 1, 2, 3 → sample means 1, 2, 3
n = 2: samples (1, 2), (1, 3), (2, 3) → sample means 1.5, 2, 2.5
n = 3 (= N): sample (1, 2, 3) → sample mean 2

µ_x̄ = 2 in every case;  σ_x̄ = .82 for n = 1,  σ_x̄ = .41 for n = 2,  σ_x̄ = 0 for n = 3

McClave: Statistics, 11th ed. Chapter 6: 69


Sampling Distributions

69
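The small-population table above can be reproduced by enumerating every sample of each size (without replacement, as on the slide). A sketch assuming numpy is installed:

```python
import numpy as np
from itertools import combinations

population = [1, 2, 3]

for n in (1, 2, 3):
    samples = list(combinations(population, n))       # all possible samples of size n
    means = np.array([np.mean(s) for s in samples])   # x-bar for each sample
    # mean of the sample means, and their standard deviation (the standard error)
    print(n, means, means.mean(), round(means.std(), 2))
```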

6.3: The Sampling Distribution of x̄ and the Central Limit Theorem

◼ If a random sample of n observations is drawn from a normally distributed population, the sampling distribution of x̄ will be normally distributed

McClave: Statistics, 11th ed. Chapter 6: 70


Sampling Distributions

70

35
11/9/2020

6.3: The Sampling Distribution of x̄ and the Central Limit Theorem

◼ The Central Limit Theorem
The sampling distribution of x̄, based on a random sample of n observations, will be approximately normal with µ_x̄ = µ and σ_x̄ = σ/√n.
The larger the sample size, the
better the sampling
distribution will approximate
the normal distribution.

McClave: Statistics, 11th ed. Chapter 6: 71


Sampling Distributions

71

Statistics

Chapter 7: Inferences Based on a


Single Sample: Estimation with
Confidence Interval

72

36
11/9/2020

7.1: Identifying the Target


Parameter

◼ The unknown population parameter


that we are interested in estimating is
called the target parameter.
Parameter   Key Word or Phrase                        Type of Data
µ           Mean, average                             Quantitative
p           Proportion, percentage, fraction, rate    Qualitative

McClave, Statistics, 11th ed. Chapter 7: 73


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

73

7.2: Large-Sample Confidence


Interval for a Population Mean

◼ A point estimator of a population


parameter is a rule or formula that tells
us how to use the sample data to
calculate a single number that can be
used to estimate the population
parameter.

McClave, Statistics, 11th ed. Chapter 7: 74


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

74

37
11/9/2020

7.2: Large-Sample Confidence


Interval for a Population Mean

◼ Suppose a sample of 225 college


students watch an average of 28 hours
of television per week, with a standard
deviation of 10 hours.
 What can we conclude about all college
students’ television time?

McClave, Statistics, 11th ed. Chapter 7: 75


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

75

7.2: Large-Sample Confidence


Interval for a Population Mean

◼ Assuming a normal distribution for television hours, we can be 95%* sure that

$\mu = \bar{x} \pm 1.96 \frac{s}{\sqrt{n}} = 28 \pm 1.96 \frac{10}{\sqrt{225}} = 28 \pm 1.96(.67) = 28 \pm 1.31$

*In the standard normal distribution, exactly 95% of the area under the curve is in the interval −1.96 … +1.96

McClave, Statistics, 11th ed. Chapter 7: 76
Inferences Based on a Single Sample:
Estimation by Confidence Intervals

76
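The television-hours interval computed above, as a short sketch assuming numpy and scipy are installed:

```python
import numpy as np
from scipy.stats import norm

x_bar, s, n = 28, 10, 225
z = norm.ppf(0.975)                      # ≈ 1.96 for 95% confidence
margin = z * s / np.sqrt(n)              # ≈ 1.31

print(x_bar - margin, x_bar + margin)    # ≈ (26.69, 29.31)
```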

38
11/9/2020

7.2: Large-Sample Confidence


Interval for a Population Mean
◼ An interval estimator or confidence interval is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

$\mu = \bar{x} \pm z\,\sigma_{\bar{x}}$

McClave, Statistics, 11th ed. Chapter 7: 77


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

77

7.2: Large-Sample Confidence


Interval for a Population Mean
◼ The confidence coefficient is the probability that a
randomly selected confidence interval encloses the
population parameter.
◼ The confidence level is the confidence coefficient
expressed as a percentage.
(90%, 95% and 99% are very commonly used.)
95% sure:  $\mu = \bar{x} \pm z\,\sigma_{\bar{x}}$
◼ The area (probability) outside the confidence interval is called α

So we are left with (1 − .95) = .05 = 5% = α uncertainty about µ

McClave, Statistics, 11th ed. Chapter 7: 78


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

78

39
11/9/2020

7.2: Large-Sample Confidence


Interval for a Population Mean

◼ Large-Sample 100(1 − α)% Confidence Interval for µ

$\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

◼ If σ is unknown and n is large, the confidence interval becomes

$\mu \approx \bar{x} \pm z_{\alpha/2}\,s_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
McClave, Statistics, 11th ed. Chapter 7: 79
Inferences Based on a Single Sample:
Estimation by Confidence Intervals

79

7.2: Large-Sample Confidence


Interval for a Population Mean
For the confidence interval to be valid …

the sample must be random, and …

the sample size n must be large.

If n is large, the sampling distribution of the sample mean is normal, and s is a good estimate of σ.
McClave, Statistics, 11th ed. Chapter 7: 80
Inferences Based on a Single Sample:
Estimation by Confidence Intervals

80

40
11/9/2020

7.3: Small-Sample Confidence


Interval for a Population Mean
Large Sample
◼ Sampling distribution of x̄ is normal
◼ σ known, or n large
◼ Standard Normal (z) Distribution
$\mu = \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$

Small Sample
◼ Sampling distribution of x̄ is unknown
◼ σ unknown and n small
◼ Student's t Distribution (with n − 1 degrees of freedom)
$\mu = \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
McClave, Statistics, 11th ed. Chapter 7: 81
Inferences Based on a Single Sample:
Estimation by Confidence Intervals

81

7.3: Small-Sample Confidence


Interval for a Population Mean
Large Sample: $\mu = \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$        Small Sample: $\mu = \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$

McClave, Statistics, 11th ed. Chapter 7: 82


Inferences Based on a Single Sample:
Estimation by Confidence Intervals

82

41
11/9/2020

McClave, Statistics, 11th ed. Chapter 1: 83


Statistics, Data and Statistical Thinking

83

Statistics

Chapter 8: Inferences Based on a


Single Sample: Tests of Hypotheses

84

42
11/9/2020

8.1: The Elements of a Test of


Hypotheses

Confidence Interval
µ? Where on the number line do the data point us? µ?
(No prior idea about the value of the parameter.)

Hypothesis Test
Do the data point us to this particular value? µ0?
(We have a value in mind from the outset.)

McClave, Statistics, 11th ed. Chapter 8: Inferences 85


Based on a Single Sample: Tests of Hypotheses

85

8.1: The Elements of a Test of


Hypotheses

Null Hypothesis: H0
•This will be supported
unless the data provide
evidence that it is false
• The status quo
Alternative Hypothesis: Ha
•This will be supported if
the data provide sufficient
evidence that it is true
• The research hypothesis

McClave, Statistics, 11th ed. Chapter 8: Inferences 86


Based on a Single Sample: Tests of Hypotheses

86

43
11/9/2020

8.1: The Elements of a Test of


Hypotheses

◼ A test statistic is a numerical value computed from the sample (a sample statistic); it is used to decide between the null and alternative hypotheses.
◼ If the test statistic has a high probability
when H0 is true, then H0 is not rejected.
◼ If the test statistic has a (very) low
probability when H0 is true, then H0 is
rejected.
McClave, Statistics, 11th ed. Chapter 8: Inferences 87
Based on a Single Sample: Tests of Hypotheses

87

8.1: The Elements of a Test of


Hypotheses

McClave, Statistics, 11th ed. Chapter 8: Inferences 88


Based on a Single Sample: Tests of Hypotheses

88

44
11/9/2020

8.1: The Elements of a Test of


Hypotheses
Reality ↓ / Test Result →   Do not reject H0                                             Reject H0
H0 is true                  Correct!                                                     Type I Error: rejecting a true null hypothesis; P(Type I error) = α
H0 is false                 Type II Error: not rejecting a false null hypothesis;        Correct!
                            P(Type II error) = β

Note: Null hypotheses are either rejected, or else there is insufficient evidence to reject them. (I.e., we don't accept a null hypothesis; we fail to reject it.)

McClave, Statistics, 11th ed. Chapter 8: Inferences 89


Based on a Single Sample: Tests of Hypotheses

89

8.1: The Elements of a Test of


Hypotheses
• Null hypothesis (H0): A theory about the values of one or more parameters
• Ex.: H0: µ = µ0 (a specified value for µ)
• Alternative hypothesis (Ha): Contradicts the null hypothesis
• Ex.: Ha: µ ≠ µ0
• Test Statistic: The sample statistic to be used to test the hypothesis
• Rejection region: The values for the test statistic which lead to rejection of
the null hypothesis
• Assumptions: Clear statements about any assumptions concerning the
target population
• Experiment and calculation of test statistic: The appropriate calculation for
the test based on the sample data
• Conclusion: Reject the null hypothesis (with possible Type I error) or do
not reject it (with possible Type II error)

McClave, Statistics, 11th ed. Chapter 8: Inferences 90


Based on a Single Sample: Tests of Hypotheses

90

45
11/9/2020

8.1: The Elements of a Test of


Hypotheses
Suppose a new interpretation of the rules by
soccer referees is expected to increase the
number of yellow cards per game. The
average number of yellow cards per game
had been 4. A sample of 121 matches
produced an average of 4.7 yellow cards
per game, with a standard deviation of .5
cards. At the 5% significance level, has
there been a change in infractions called?

McClave, Statistics, 11th ed. Chapter 8: Inferences 91


Based on a Single Sample: Tests of Hypotheses

91

8.1: The Elements of a Test of


Hypotheses
H0: µ = 4
Ha: µ ≠ 4
Sample statistic: x̄ = 4.7
α = .05

Assume the sampling distribution is normal.

Test statistic: $z^* = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{4.7 - 4}{.064} = 10.94$

Conclusion: z_{α/2} = z_{.025} = 1.96. Since |z*| > 1.96, reject H0.

(That is, there do seem to be more yellow cards.)

McClave, Statistics, 11th ed. Chapter 8: Inferences 92


Based on a Single Sample: Tests of Hypotheses

92
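The yellow-card test statistic can be reproduced directly from the slide's numbers (the slide's standard error of .064 is taken as given). A sketch assuming scipy is installed:

```python
from scipy.stats import norm

x_bar, mu0, se = 4.7, 4.0, 0.064          # values as given on the slide
z_star = (x_bar - mu0) / se               # ≈ 10.94
p_value = 2 * norm.sf(abs(z_star))        # two-tailed p-value, effectively 0

print(z_star, p_value, abs(z_star) > norm.ppf(0.975))   # reject H0 at α = .05
```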

46
11/9/2020

8.2: Large-Sample Test of a


Hypothesis about a Population Mean

The null hypothesis is


usually stated as an
equality … H0: µ = µ0

Ha: µ < µ0        Ha: µ ≠ µ0        Ha: µ > µ0

… even though the alternative hypothesis can be a one-sided (<, >) or two-sided (≠) inequality.

McClave, Statistics, 11th ed. Chapter 8: Inferences 93


Based on a Single Sample: Tests of Hypotheses

93

8.2: Large-Sample Test of a


Hypothesis about a Population Mean

McClave, Statistics, 11th ed. Chapter 8: Inferences 94


Based on a Single Sample: Tests of Hypotheses

94

47
11/9/2020

8.2: Large-Sample Test of a


Hypothesis about a Population Mean
Rejection Regions for Common Values of α

         Lower Tailed     Upper Tailed     Two Tailed
α = .10  z < −1.28        z > 1.28         |z| > 1.645
α = .05  z < −1.645       z > 1.645        |z| > 1.96
α = .01  z < −2.33        z > 2.33         |z| > 2.575

McClave, Statistics, 11th ed. Chapter 8: Inferences 95


Based on a Single Sample: Tests of Hypotheses

95

8.2: Large-Sample Test of a


Hypothesis about a Population Mean
One-Tailed Test
◼ H0: µ = µ0;  Ha: µ > µ0 (or µ < µ0)
Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$
Rejection Region: z > z_α (or z < −z_α)

Two-Tailed Test
◼ H0: µ = µ0;  Ha: µ ≠ µ0
Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$
Rejection Region: |z| > z_{α/2}

Conditions: 1) A random sample is selected from the target population.


2) The sample size n is large.

McClave, Statistics, 11th ed. Chapter 8: Inferences 96


Based on a Single Sample: Tests of Hypotheses

96

48
11/9/2020

8.2: Large-Sample Test of a


Hypothesis about a Population Mean
The Economics of Education Review (Vol. 21, 2002) reported a mean salary for males with postgraduate degrees of $61,340, with an estimated standard error (s_x̄) equal to $2,185. We wish to test, at the α = .05 level, H0: µ = $60,000.

◼ H0: µ = 60,000
Ha: µ ≠ 60,000

Test Statistic: $z = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{61{,}340 - 60{,}000}{2{,}185} = .613$

Rejection Region: |z| > z_{.025} = 1.96

Do not reject H0
McClave, Statistics, 11th ed. Chapter 8: Inferences 97
Based on a Single Sample: Tests of Hypotheses

97

8.3:Observed Significance Levels:


p - Values

Suppose z = 2.12.
P(z > 2.12) = .0170.

Reject H0 at the α = .05 level; do not reject H0 at the α = .01 level.

But it’s pretty close, isn’t it?

McClave, Statistics, 11th ed. Chapter 8: Inferences 98


Based on a Single Sample: Tests of Hypotheses

98

49
11/9/2020

8.3:Observed Significance Levels:


p - Values

The observed significance level, or p-value, for a test is the probability of observing a result at least as extreme as the one actually observed (z*), assuming the null hypothesis is true:

$P(z \ge z^* \mid H_0)$

The lower this probability, the stronger the evidence against H0.

McClave, Statistics, 11th ed. Chapter 8: Inferences Based 99


on a Single Sample: Tests of Hypotheses

99

8.3:Observed Significance Levels:


p - Values
Let's go back to the Economics of Education Review report (x̄ = $61,340, s_x̄ = $2,185). This time we'll test H0: µ = $65,000.

H0: µ = 65,000
Ha: µ ≠ 65,000

Test Statistic: $z = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{61{,}340 - 65{,}000}{2{,}185} = -1.675$

p-value: P(x̄ < 61,340 | H0) = P(z < −1.675) = .0475

McClave, Statistics, 11th ed. Chapter 8: Inferences 100


Based on a Single Sample: Tests of Hypotheses

100

50
11/9/2020

8.3:Observed Significance Levels:


p - Values

◼ Reporting test results


 Choose the maximum tolerable value of α
 If the p-value < α, reject H0
If the p-value > α, do not reject H0

McClave, Statistics, 11th ed. Chapter 8: Inferences 101


Based on a Single Sample: Tests of Hypotheses

101

8.4: Small-Sample Test of a


Hypothesis about a Population Mean
If the sample is small and σ is unknown,
testing hypotheses about µ requires the
t-distribution instead of the z-distribution.

x − 0
t=
s/ n

McClave, Statistics, 11th ed. Chapter 8: Inferences 102


Based on a Single Sample: Tests of Hypotheses

102

51
11/9/2020

8.4: Small-Sample Test of a


Hypothesis about a Population Mean
One-Tailed Test
◼ H0: µ = µ0;  Ha: µ > µ0 (or µ < µ0)
Test Statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Rejection Region: t > t_α (or t < −t_α)

Two-Tailed Test
◼ H0: µ = µ0;  Ha: µ ≠ µ0
Test Statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Rejection Region: |t| > t_{α/2}
Conditions: 1) A random sample is selected from the target population.
2) The population from which the sample is selected is
approximately normal.
3) The value of t α is based on (n – 1) degrees of freedom
McClave, Statistics, 11th ed. Chapter 8: Inferences 103
Based on a Single Sample: Tests of Hypotheses

103

8.4: Small-Sample Test of a


Hypothesis about a Population Mean
Suppose copiers average 100,000 copies between paper jams. A salesman claims his are better, and offers to leave 5 units for testing. The average number of copies between jams is 100,987, with a standard deviation of 157. Does his claim seem believable?

H0: µ = 100,000
Ha: µ > 100,000

Test Statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{100{,}987 - 100{,}000}{157/\sqrt{5}} = 14.06$

p-value: P(x̄ > 100,987 | H0) = P(t_{df=4} > 14.06) < .001
McClave, Statistics, 11th ed. Chapter 8: Inferences 104
Based on a Single Sample: Tests of Hypotheses

104

52
11/9/2020

8.4: Small-Sample Test of a


Hypothesis about a Population Mean
Suppose copiers average 100,000 copies between paper jams. A salesman claims his are better, and offers to leave 5 units for testing. The average number of copies between jams is 100,987, with a standard deviation of 157. Does his claim seem believable?

H0: µ = 100,000
Ha: µ > 100,000

Test Statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{100{,}987 - 100{,}000}{157/\sqrt{5}} = 14.06$

p-value: P(x̄ > 100,987 | H0) = P(t_{df=4} > 14.06) < .001

Reject the null hypothesis, based on the very low probability of seeing the observed results if the null were true. So, the claim does seem plausible.

McClave, Statistics, 11th ed. Chapter 8: Inferences 105


Based on a Single Sample: Tests of Hypotheses

105
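The copier example, reproduced as a small-sample one-tailed t test from the summary numbers (a sketch, assuming numpy and scipy are installed):

```python
import numpy as np
from scipy.stats import t

x_bar, mu0, s, n = 100_987, 100_000, 157, 5
t_stat = (x_bar - mu0) / (s / np.sqrt(n))   # ≈ 14.06
p_value = t.sf(t_stat, df=n - 1)            # one-tailed p-value with df = 4, < .001

print(t_stat, p_value)
```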

Statistics

Chapter 9: Inferences Based


on Two Samples: Confidence
Intervals and Tests of
Hypotheses

106

53
11/9/2020

9.1: Identifying the Target


Parameter

µ1 − µ2:   Mean difference; difference in averages (Quantitative Data)
p1 − p2:   Difference between proportions, percentages, fractions or rates; compare proportions (Qualitative Data)
σ1²/σ2²:   Ratio of variances; difference in variability or spread; compare variation (Quantitative Data)

McClave, Statistics, 11th ed. Chapter 9: 107


Inferences Based on Two Samples

107

9.2: Comparing Two Population


Means: Independent Sampling
Point Estimators
x̄ → µ;  (x̄1 − x̄2) → (µ1 − µ2)

To construct a confidence interval or conduct a hypothesis test, we need the standard deviation (standard error):

Single sample: $\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$

Two samples: $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

McClave, Statistics, 11th ed. Chapter 9: 108


Inferences Based on Two Samples

108

54
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
The Sampling Distribution for (x̄1 − x̄2)
1. The mean of the sampling distribution is (µ1 − µ2).
2. If the two samples are independent, the standard deviation of the sampling distribution (the standard error) is

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

3. The sampling distribution for (x̄1 − x̄2) is approximately normal for large samples.
McClave, Statistics, 11th ed. Chapter 9: 109
Inferences Based on Two Samples

109

9.2: Comparing Two Population


Means: Independent Sampling
The Sampling Distribution for (x̄1 − x̄2)

McClave, Statistics, 11th ed. Chapter 9: 110


Inferences Based on Two Samples

110

55
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
Large Sample Confidence Interval for (µ1 − µ2):

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

McClave, Statistics, 11th ed. Chapter 9: 111


Inferences Based on Two Samples

111

9.2: Comparing Two Population


Means: Independent Sampling
Two samples concerning retention rates for first-year students at private and
public institutions were obtained from the Department of Education’s data
base to see if there was a significant difference in the two types of colleges.
Private Colleges Public Universities
◼ n: 71 ◼ n: 32
◼ Mean: 78.17 ◼ Mean: 84
◼ Standard Deviation: 9.55 ◼ Standard Deviation: 9.88
◼ Variance: 91.17 ◼ Variance: 97.64
What does a 95% confidence interval tell us about retention rates?
Source: National Center for Education Statistics

McClave, Statistics, 11th ed. Chapter 9: 112


Inferences Based on Two Samples

112

56
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
Private Colleges: n = 71, Mean = 78.17, Standard Deviation = 9.55, Variance = 91.17
Public Universities: n = 32, Mean = 84, Standard Deviation = 9.88, Variance = 97.64

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$(78.17 - 84) \pm 1.96\sqrt{\frac{91.17}{71} + \frac{97.64}{32}}$

$-5.83 \pm 4.08$
McClave, Statistics, 11th ed. Chapter 9: 113
Inferences Based on Two Samples

113
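The retention-rate interval above, recomputed from the summary statistics (a sketch, assuming numpy and scipy are installed):

```python
import numpy as np
from scipy.stats import norm

x1, var1, n1 = 78.17, 91.17, 71    # private colleges
x2, var2, n2 = 84.00, 97.64, 32    # public universities

se = np.sqrt(var1 / n1 + var2 / n2)      # standard error of (x̄1 - x̄2) ≈ 2.08
diff = x1 - x2                           # ≈ -5.83
margin = norm.ppf(0.975) * se            # ≈ 4.08

print(diff - margin, diff + margin)      # the interval does not contain 0
```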

9.2: Comparing Two Population


Means: Independent Sampling
Private Colleges: n = 71, Mean = 78.17, Standard Deviation = 9.55, Variance = 91.17
Public Universities: n = 32, Mean = 84, Standard Deviation = 9.88, Variance = 97.64

$(78.17 - 84) \pm 1.96\sqrt{\frac{91.17}{71} + \frac{97.64}{32}} = -5.83 \pm 4.08$

Since 0 is not in the confidence interval, the difference in the sample means appears to indicate a real difference in retention.
McClave, Statistics, 11th ed. Chapter 9: 114
Inferences Based on Two Samples

114

57
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
One-Tailed Test
H0: (µ1 − µ2) = D0
Ha: (µ1 − µ2) > D0 (or < D0)
Rejection region: z > z_α (or z < −z_α)

Two-Tailed Test
H0: (µ1 − µ2) = D0
Ha: (µ1 − µ2) ≠ D0
Rejection region: |z| > z_{α/2}

Test Statistic: $z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}$

where $\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
McClave, Statistics, 11th ed. Chapter 9: 115
Inferences Based on Two Samples

115

9.2: Comparing Two Population


Means: Independent Sampling

Conditions Required for Valid Large-


Sample Inferences about (µ1 - µ2)
1. The two samples are randomly and
independently selected from the target
populations.
2. The sample sizes are both ≥ 30.

McClave, Statistics, 11th ed. Chapter 9: 116


Inferences Based on Two Samples

116

58
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
Let’s go back to the retention data and test the hypothesis that there is no
significant difference in retention at privates and publics.

Private Colleges Public Universities


◼ n: 71 ◼ n: 32
◼ Mean: 78.17 ◼ Mean: 84
◼ Standard Deviation: 9.55 ◼ Standard Deviation: 9.88
◼ Variance: 91.17 ◼ Variance: 97.64
$\sigma_{(\bar{x}_1 - \bar{x}_2)} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{91.17}{71} + \frac{97.64}{32}} \approx 2.08$
McClave, Statistics, 11th ed. Chapter 9: 117
Inferences Based on Two Samples

117

9.2: Comparing Two Population


Means: Independent Sampling
H0: (µ1 − µ2) = 0
Ha: (µ1 − µ2) ≠ 0
α = .05

Test statistic: $z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}} = \frac{-5.83}{2.08} = -2.799$

Reject the null hypothesis: |z| > z_{α/2}, since 2.799 > 1.96
McClave, Statistics, 11th ed. Chapter 9: 118
Inferences Based on Two Samples

118

59
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling

For small samples, the t-distribution can be used with a pooled sample estimator of σ², s_p²:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

McClave, Statistics, 11th ed. Chapter 9: 119


Inferences Based on Two Samples

119

9.2: Comparing Two Population


Means: Independent Sampling
Small Sample Confidence Interval for (µ1 − µ2):

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

The value of t_{α/2} is based on (n1 + n2 − 2) degrees of freedom.

McClave, Statistics, 11th ed. Chapter 9: 120


Inferences Based on Two Samples

120

60
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
One-Tailed Test
H0: (µ1 − µ2) = D0
Ha: (µ1 − µ2) > D0 (or < D0)
Rejection region: t > t_α (or t < −t_α)

Two-Tailed Test
H0: (µ1 − µ2) = D0
Ha: (µ1 − µ2) ≠ D0
Rejection region: |t| > t_{α/2}

Test Statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

McClave, Statistics, 11th ed. Chapter 9: 121


Inferences Based on Two Samples

121

9.2: Comparing Two Population


Means: Independent Sampling
Conditions Required for Valid Small-Sample
Inferences about (µ1 - µ2)
1. The two samples are randomly and
independently selected from the target
populations.
2. Both sampled populations have
distributions that are approximately normal.
3. The population variances are equal.

McClave, Statistics, 11th ed. Chapter 9: 122


Inferences Based on Two Samples

122

61
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
◼ Does class time affect performance?
 The test performance of students in two sections of international trade, meeting at different times, was compared.
8:00 a.m. Class 9:30 a.m. Class
Mean: 78 Mean: 82
Standard Deviation: 14 Standard Deviation: 17
Variance: 196 Variance: 289
n: 21 n: 21

With α = .05, test H0 : µ1 = µ2


McClave, Statistics, 11th ed. Chapter 9: 123
Inferences Based on Two Samples

123

9.2: Comparing Two Population


Means: Independent Sampling
8:00 a.m. Class: Mean = 78, Variance = 196, n = 21
9:30 a.m. Class: Mean = 82, Variance = 289, n = 21

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
α = .05

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(21 - 1)196 + (21 - 1)289}{21 + 21 - 2} = 242.5$

Test Statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(78 - 82) - 0}{\sqrt{242.5\left(\frac{1}{21} + \frac{1}{21}\right)}} = -.832$
McClave, Statistics, 11th ed. Chapter 9: 124
Inferences Based on Two Samples

124

62
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
8:00 a.m. Class: Mean = 78, Variance = 196, n = 21
9:30 a.m. Class: Mean = 82, Variance = 289, n = 21

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
α = .05

$s_p^2 = 242.5$;  Test Statistic: $t = -.832$ (from the previous slide)

With df = 21 + 21 − 2 = 40, t_{α/2} = t_{.025} = 2.021. Since our test statistic is t = −.832, |t| < t_{.025}.

Do not reject the null hypothesis.
McClave, Statistics, 11th ed. Chapter 9: 125
Inferences Based on Two Samples

125
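The pooled-variance (equal-variances) t test above can be run straight from the summary statistics; a sketch assuming scipy is installed (ttest_ind_from_stats takes standard deviations, so the variances are square-rooted):

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(mean1=78, std1=np.sqrt(196), nobs1=21,
                              mean2=82, std2=np.sqrt(289), nobs2=21,
                              equal_var=True)           # pooled-variance t test

print(result.statistic, result.pvalue)   # t ≈ -0.832, two-tailed p ≈ 0.41 → do not reject H0
```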

9.2: Comparing Two Population


Means: Independent Sampling
Several students in the 8:00 a.m. section sleep through the next exam. We can still compare the results, with some modifications.

8:00 a.m. Class: Mean = 72, Variance = 154, n = 13
9:30 a.m. Class: Mean = 86, Variance = 163, n = 21

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
α = .05

Test Statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(72 - 86) - 0}{\sqrt{\frac{154}{13} + \frac{163}{21}}} = -3.16$
McClave, Statistics, 11th ed. Chapter 9: 126
Inferences Based on Two Samples

126

63
11/9/2020

9.2: Comparing Two Population


Means: Independent Sampling
8:00 a.m. Class: Mean = 72, Variance = 154, n = 13
9:30 a.m. Class: Mean = 86, Variance = 163, n = 21

$t = -3.16$

Degrees of Freedom: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{(154/13 + 163/21)^2}{\frac{(154/13)^2}{13 - 1} + \frac{(163/21)^2}{21 - 1}} = 26.15 \approx 26$

$t_{.025,\,df=26} = 2.056$

McClave, Statistics, 11th ed. Chapter 9: 127


Inferences Based on Two Samples

127

9.2: Comparing Two Population


Means: Independent Sampling
8:00 a.m. Class: Mean = 72, Variance = 154, n = 13
9:30 a.m. Class: Mean = 86, Variance = 163, n = 21

$t = -3.16$,  degrees of freedom ≈ 26,  $t_{.025,\,df=26} = 2.056$

Since |t| > t_{.025,\,df=26}, reject the null hypothesis.

McClave, Statistics, 11th ed. Chapter 9: 128


Inferences Based on Two Samples

128
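With unequal sample sizes and no pooling, the same summary-statistics helper gives Welch's t test with Satterthwaite degrees of freedom (a sketch, assuming scipy is installed):

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(mean1=72, std1=np.sqrt(154), nobs1=13,
                              mean2=86, std2=np.sqrt(163), nobs2=21,
                              equal_var=False)          # Welch's t test (unequal variances)

print(result.statistic, result.pvalue)   # t ≈ -3.16, p ≈ 0.004 → reject H0 at α = .05
```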

64
11/9/2020

Summary

◼ Descriptive vs inferential statistics.


◼ Types of graphs.
◼ Describing data.
◼ Discrete random variables.
◼ Continuous random variables.
◼ Test of hypothesis for means
 Large samples
 Small samples
McClave, Statistics, 11th ed. Chapter 1: 129
Statistics, Data and Statistical Thinking

129

65
