Chapter 6.
1 - Statistics
Statistics
🞅 A branch of mathematics dealing with the collection, analysis, interpretation, and
presentation of masses of numerical data
🞅 A collection of quantitative data
2 Types of Statistics
🞅 Descriptive Statistics
- this involves the collection, organization, summarization and presentation of data.
🞅 Inferential Statistics
- this interprets and draws conclusions from the data.
Chap6.1 - Measures of Central Tendency
The Arithmetic Mean
🞅 The Arithmetic Mean is the most commonly used measure of central tendency.
🞅 The Arithmetic Mean of a set of numbers is often referred to as simply the mean.
🞅 To find the mean for a set of data, find the sum of the data values and divide by
the number of data values.
For instance, to find the mean of the 5 salaries listed below, add the salaries and divide the sum by 5.
$43,750 $39,500 $38,000 $41,250 $44,000
Solution
🞅 \[ \text{Mean} = \frac{\$43,750 + \$39,500 + \$38,000 + \$41,250 + \$44,000}{5} = \frac{\$206,500}{5} = \$41,300 \]
🞅 The traditional symbol used to indicate a summation is the Greek letter sigma, Σ . Thus
the notation Σx, called the summation notation, denotes the sum of all the numbers in a given set.
We can define the mean using summation notation.
Mean
The mean of n numbers is the sum of the numbers divided by n.
\[ \text{Mean} = \frac{\Sigma x}{n} \]
🞅 Statisticians often collect data from small portions of a large group in order to
determine information about the group. In such situations the entire group under
consideration is known as the population, and any subset of the population is called a
sample. It is traditional to denote the mean of a sample by x̄ (which is read as x bar)
and to denote the mean of a population by the Greek letter 𝜇 (lowercase mu).
Example : Find the Mean
🞅 Six friends in a biology class of 20 students received test grades of
92, 84, 65, 76, 88, and 90
Find the mean of these test scores.
Solution
🞅 The 6 friends are a sample of the population of 20 students. Use x̄ to represent the
mean.
\[ \bar{x} = \frac{\Sigma x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5 \]
🞅 The mean of these test scores is 82.5.
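The definition of the mean can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the variable names are assumptions made for this sketch, and the data are the six test scores from the example.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data value")
    return sum(values) / len(values)


# Test grades of the six friends (the sample from the example).
test_scores = [92, 84, 65, 76, 88, 90]
sample_mean = mean(test_scores)
print(sample_mean)
```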
Quiz on The Arithmetic Mean
🞅 A doctor ordered 4 separate blood tests to measure a patient's total blood
cholesterol levels. The test results were
245, 235, 220, and 210
Find the mean of the blood cholesterol levels.
The Median
🞅 Another type of average is the median. Essentially, the median is the middle number
or the mean of the two middle numbers in a list of numbers that have been arranged
in numerical order from smallest to largest or largest to smallest. Any list of numbers
that is arranged in numerical order from smallest to largest or largest to smallest is a
ranked list.
Median
The median of a ranked list of n numbers is :
* the middle number if n is odd.
* the mean of the two middle numbers if n is even.
Example : Find the Median
Find the median of the data in the following lists.
1. 4, 8, 1, 14, 9, 21, 12
2. 46, 23, 92, 89, 77, 108
Solution
🞅 1. Ranking the numbers from smallest to largest gives
1, 4, 8, 9, 12, 14, 21
The middle number is 9. Thus 9 is the median of the data.
🞅 2. Ranking the numbers from smallest to largest gives
23, 46, 77, 89, 92, 108
The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median
of the data.
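The two cases in the definition of the median (n odd, n even) can be sketched in Python as follows. The function name `median` is an assumption made for this illustration; the two lists are the ones from the example.

```python
def median(values):
    """Return the median of a list of numbers.

    The list is ranked first; the middle number is returned when n is odd,
    and the mean of the two middle numbers when n is even.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median requires at least one data value")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


median_odd = median([4, 8, 1, 14, 9, 21, 12])      # n = 7, middle number
median_even = median([46, 23, 92, 89, 77, 108])    # n = 6, mean of 77 and 89
print(median_odd, median_even)
```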
The Mode
🞅 A third type of average is the mode.
Mode
The mode of a list of numbers is the number that occurs most frequently.
Some lists of numbers do not have a mode. For instance, in the list 1, 6, 8, 10, 32, 15,
39, each number occurs exactly once. Because no number occurs more often than the
other numbers, there is no mode.
A list of numerical data can have more than one mode. For instance, in the list 4, 2, 6,
2, 7, 9, 2, 4, 9, 8, 9, 7 the number 2 occurs three times and the number 9 occurs three
times. Each of the other numbers occurs fewer than three times. Thus 2 and 9 are both
modes for the data.
Example: Find the Mode
Find the mode of the data in the following lists.
1. 18, 15, 21, 16, 15, 14, 15, 21
2. 2, 5, 8, 9, 11, 4, 7, 23
Solution
1. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the
other numbers. Thus 15 is the mode.
2. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. Because no number
occurs more often than the others, there is no mode.
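A minimal sketch of finding the mode in Python is shown below. The function name `modes` is hypothetical. It returns a list so that it can report no mode (an empty list) or several modes. One assumption goes slightly beyond the slides: a list with several distinct values that all occur equally often is treated as having no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the modes of `values` (empty if there is no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumption: if several distinct values all occur equally often,
    # no number occurs more often than the others, so there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == highest)


one_mode = modes([18, 15, 21, 16, 15, 14, 15, 21])
no_mode = modes([2, 5, 8, 9, 11, 4, 7, 23])
two_modes = modes([4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7])
print(one_mode, no_mode, two_modes)
```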
Conclusion
🞅 The mean, the median, and the mode are all averages; however, they are generally not
equal. The mean of a set of data is the most sensitive of the averages. A change in any
of the numbers changes the mean, and the mean can be changed drastically by
changing an extreme value.
🞅 In contrast, the median and the mode of a set of data are usually not changed
by changing an extreme value.
🞅 When a data set has one or more extreme values that are very different from the
majority of data values, the mean will not necessarily be a good indicator of an
average value.
Example
🞅 In the following example, we compare the mean, median, and mode for the salaries of
5 employees of a small company.
Salaries: $370,000 $60,000 $36,000 $20,000 $20,000
Solutions
🞅 \[ \text{Mean} = \frac{\$370,000 + \$60,000 + \$36,000 + \$20,000 + \$20,000}{5} = \frac{\$506,000}{5} = \$101,200 \]
🞅 The median is the middle number, $36,000.
🞅 The mode is $20,000 because it occurs more often than the other salaries.
🞅 The data contain one extreme value that is much larger than the other values. This
extreme value makes the mean considerably larger than the median. Most of the
employees of this company would probably agree that the median of $36,000
better represents the average of the salaries than does either the mean or the
mode.
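The effect of an extreme value can be checked with Python's standard `statistics` module. In this sketch the second list is hypothetical: it replaces the $370,000 salary with $70,000 purely to illustrate how the mean reacts while the median and mode stay put.

```python
import statistics

# Salaries of the 5 employees from the example.
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

# Hypothetical variation: the extreme salary is changed to $70,000.
adjusted = [70_000, 60_000, 36_000, 20_000, 20_000]
adjusted_mean = statistics.mean(adjusted)
adjusted_median = statistics.median(adjusted)
adjusted_mode = statistics.mode(adjusted)

print(mean_salary, median_salary, mode_salary)
print(adjusted_mean, adjusted_median, adjusted_mode)
```

Only the mean moves when the extreme value changes, which is why the median is the better summary for these salaries.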
The Weighted Mean
🞅 A value called the weighted mean is often used when some data values are more important
than others.
For instance, many professors determine a student's course grade from the student's quizzes
and the examination. Consider the situation in which a professor counts the examination score
as 2 quiz scores. To find the weighted mean of the student's scores, the professor first assigns
each of the quiz scores a weight of 1 and the exam score a weight of 2.
A student with quiz scores of 65, 70 and 75 and an exam score of 90 has a weighted mean of
\[ \text{Weighted Mean} = \frac{(65 \times 1) + (70 \times 1) + (75 \times 1) + (90 \times 2)}{5} = \frac{390}{5} = 78 \]
🞅 Note that the numerator of the weighted mean above is the sum of the products of each
test score and its corresponding weight. The number 5 in the denominator is the sum of all
weights (1 + 1 + 1 + 2 = 5).
🞅 The procedure for finding the weighted mean can be generalized as follows.
The Weighted Mean
The weighted mean of the n numbers x1, x2, x3, …, xn with the respective
assigned weights w1, w2, w3, …, wn is
\[ \text{Weighted Mean} = \frac{\Sigma(x \ast w)}{\Sigma w} \]
where Σ(x ∗ w) is the sum of the products formed by multiplying each number by its assigned
weight, and Σw is the sum of all weights.
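The weighted mean formula translates directly into code. The sketch below is illustrative only; the function name `weighted_mean` is an assumption, and the data are the quiz and exam scores from the professor example (weights 1, 1, 1 and 2).

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of the weights must not be zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


scores = [65, 70, 75, 90]   # three quizzes, then the exam
weights = [1, 1, 1, 2]      # the exam counts as two quizzes
course_score = weighted_mean(scores, weights)
print(course_score)
```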
Example on the Weighted Mean
🞅 Many colleges use the 4-point grading system:
A = 4, B = 3, C = 2, D = 1, F = 0
A student's grade point average (GPA) is calculated as a weighted mean, where
the student's grade in each course is given a weight equal to the number of units (or
credits) that course is worth.
🞅 Find the Weighted Mean
The table below shows Dillon's fall semester course grades. Use the weighted
mean formula to find Dillon's GPA for the fall semester.
Course      Course Grade   Course Units
English     B              4
History     A              3
Chemistry   D              3
Algebra     C              4
Solution
🞅 The B is worth 3 points, with a weight of 4; the A is worth 4 points, with a weight of 3;
the D is worth 1 point, with a weight of 3; and the C is worth 2 points, with a weight of
4. The sum of all weights is 4 + 3 + 3 + 4 = 14.
\[ \text{Weighted Mean} = \frac{(3 \times 4) + (4 \times 3) + (1 \times 3) + (2 \times 4)}{14} = \frac{35}{14} = 2.5 \]
Dillon's GPA for the fall semester is 2.5.
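The GPA calculation can be sketched as a weighted mean in which letter grades are first converted to points. The names `GRADE_POINTS` and `gpa` are hypothetical and introduced only for this illustration; the course data come from Dillon's table.

```python
# Points on the 4-point grading system.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, grade, units) rows from Dillon's fall semester table.
courses = [
    ("English", "B", 4),
    ("History", "A", 3),
    ("Chemistry", "D", 3),
    ("Algebra", "C", 4),
]


def gpa(rows):
    """Weighted mean of grade points, with course units as the weights."""
    total_points = sum(GRADE_POINTS[grade] * units for _, grade, units in rows)
    total_units = sum(units for _, _, units in rows)
    return total_points / total_units


dillon_gpa = gpa(courses)
print(dillon_gpa)
```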
Frequency Distribution
🞅 Raw Data
- Data that have not been organized or manipulated in any manner are called raw data.
A large collection of raw data may not provide much readily observable information. A
frequency distribution, which is a table that lists observed events and the frequency of occurrence
of each observed event, is often used to organize raw data. For instance, consider the
following table, which lists the number of laptop computers owned by families in each of 40
homes in a subdivision.
2 0 3 1 2 1 0 4
2 1 1 7 2 0 1 1
0 2 2 1 3 2 2 1
1 4 2 5 2 3 1 2
2 1 2 1 5 0 2 5
The table below is the frequency distribution of the raw data in the table above.
🞅 A Frequency Distribution
Observed event:                 Frequency:
Number of laptop computers, x   Number of households, f, with x laptop computers
0                               5
1                               12
2                               14
3                               3
4                               2
5                               3
6                               0
7                               1
                                Total = 40
Note: the row for x = 2 indicates that there are 14 households with 2 laptop computers.
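Building a frequency distribution from raw data amounts to counting how often each observed value occurs. A minimal sketch using `collections.Counter` from the Python standard library follows; the variable names are assumptions, and the list holds the 40 raw values from the laptop table.

```python
from collections import Counter

# Raw data: number of laptop computers in each of the 40 homes.
laptops_per_household = [
    2, 0, 3, 1, 2, 1, 0, 4,
    2, 1, 1, 7, 2, 0, 1, 1,
    0, 2, 2, 1, 3, 2, 2, 1,
    1, 4, 2, 5, 2, 3, 1, 2,
    2, 1, 2, 1, 5, 0, 2, 5,
]

counts = Counter(laptops_per_household)

# Include every x from 0 to the largest observed value, so that values
# that never occur (here x = 6) appear with a frequency of 0.
frequency = {x: counts[x] for x in range(max(laptops_per_household) + 1)}

for x, f in frequency.items():
    print(x, f)
print("Total =", sum(frequency.values()))
```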
🞅 The formula for a weighted mean can be used to find the mean of the data in a
frequency distribution. The only change is that the weights w1, w2, w3, …, wn are
replaced with frequencies f1, f2, f3, …, fn. This procedure is illustrated in the next
example.
Example : Find the Mean of Data Displayed in a Frequency Distribution
🞅 Find the mean of the data in the table below.
Observed event:                 Frequency:
Number of laptop computers, x   Number of households, f, with x laptop computers
0                               5
1                               12
2                               14
3                               3
4                               2
5                               3
6                               0
7                               1
                                Total = 40
Solution
🞅 \[ \text{Mean} = \frac{\Sigma(x \ast f)}{\Sigma f} = \frac{(0 \times 5) + (1 \times 12) + (2 \times 14) + (3 \times 3) + (4 \times 2) + (5 \times 3) + (6 \times 0) + (7 \times 1)}{40} = \frac{79}{40} = 1.975 \]
The mean number of laptop computers per household for the homes in the subdivision
is 1.975.
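The same calculation can be sketched in Python by treating the frequencies as weights. The function name `mean_from_frequencies` is hypothetical; the dictionary maps each observed value x to its frequency f from the table.

```python
# Frequency distribution: number of laptops x -> number of households f.
frequency = {0: 5, 1: 12, 2: 14, 3: 3, 4: 2, 5: 3, 6: 0, 7: 1}


def mean_from_frequencies(freq):
    """Return sum(x * f) / sum(f) for a frequency distribution."""
    total = sum(freq.values())
    if total == 0:
        raise ValueError("the frequencies must not sum to zero")
    return sum(x * f for x, f in freq.items()) / total


laptop_mean = mean_from_frequencies(frequency)
print(laptop_mean)
```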
Measures of Dispersion
The Range
🞅 In the preceding section we introduced three types of average values for a data set –
the mean, the median, and the mode. Some characteristics of a set of data may not
be evident from an examination of averages. For instance, consider a soft-drink
dispensing machine that should dispense 8 oz of your selection into a cup. The table
below shows data for two of these machines.
Machine 1 Machine 2
9.52 8.01
6.41 7.99
10.07 7.95
5.85 8.03
8.15 8.02
x̄ = 8.0 x̄ = 8.0
🞅 The mean data value for each machine is 8 oz. However, look at the variation in data
values for Machine 1. The quantity of soda dispensed is very inconsistent – in some
cases the soda overflows the cup, and in other cases too little soda is dispensed. The
machine obviously needs adjustment. Machine 2, on the other hand, is working just
fine. The quantity dispensed is very consistent, with little variation.
Range
The range of a set of data values is the difference between the greatest data
value and the least data value.
Example: Find the Range
🞅 Find the range of the numbers of ounces dispensed by Machine 1 in the table
below.
Machine 1 Machine 2
9.52 8.01
6.41 7.99
10.07 7.95
5.85 8.03
8.15 8.02
x̄ = 8.0 x̄ = 8.0
Solution
🞅 The greatest number of ounces dispensed is 10.07 and the least is 5.85. The range of
the numbers of ounces dispensed is 10.07 – 5.85 = 4.22 oz.
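A short Python sketch of the range is given below. The function name `data_range` is an assumption (chosen to avoid shadowing Python's built-in `range`). Results are rounded to two decimal places because the readings are decimal fractions that binary floating point cannot represent exactly.

```python
def data_range(values):
    """Return the greatest data value minus the least data value."""
    return max(values) - min(values)


# Ounces dispensed by each machine, from the table.
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

range_1 = round(data_range(machine_1), 2)
range_2 = round(data_range(machine_2), 2)
print(range_1, range_2)
```

Both machines have the same mean, yet their ranges differ greatly, which is the inconsistency the averages alone do not reveal.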
The Standard Deviation
🞅 The range of a set of data is easy to compute, but it can be deceiving. The range is a
measure that depends only on the two most extreme values, and as such it is very sensitive. A
measure of dispersion that is less sensitive to extreme values is the standard deviation.
🞅 The standard deviation of a set of numerical data makes use of the amount by which each
individual data value deviates from the mean. These deviations, represented by (x - x̄), are
positive when the data value x is greater than the mean x̄ and are negative when x is less than
the mean x̄. The sum of all the deviations (x - x̄) is 0 for all sets of data. This is shown in
the table below for Machine 2.
x       x - x̄
8.01    8.01 – 8 = 0.01
7.99    7.99 – 8 = -0.01
7.95    7.95 – 8 = -0.05
8.03    8.03 – 8 = 0.03
8.02    8.02 – 8 = 0.02
Sum of deviations = 0
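The claim that the deviations from the mean always sum to 0 can be checked numerically. The sketch below uses the Machine 2 readings; the variable names are assumptions. Because of floating-point rounding, the computed sum may be a tiny number rather than exactly 0, so it is compared with 0 using a tolerance.

```python
import math

# Ounces dispensed by Machine 2.
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

mean = sum(machine_2) / len(machine_2)
deviations = [x - mean for x in machine_2]   # each x - x̄
sum_of_deviations = sum(deviations)

# Positive and negative deviations cancel, up to rounding error.
print(math.isclose(sum_of_deviations, 0.0, abs_tol=1e-9))
```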
Standard Deviations for Populations and Samples
🞅 If x1, x2, x3, …, xn is a population of n numbers with a mean of 𝜇, then the
standard deviation of the population is
\[ \sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{n}} \]
🞅 If x1, x2, x3, …, xn is a sample of n numbers with a mean of x̄, then the
standard deviation of the sample is
\[ s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}} \]
Procedure for Computing a Standard Deviation
🞅 1. Determine the mean of the n numbers.
🞅 2. For each number, calculate the deviation (difference) between the number and
the mean of the numbers.
🞅 3. Calculate the square of each deviation and find the sum of these squared
deviations.
🞅 4. If the data is a population, then divide the sum by n. If the data is a sample, then
divide the sum by n – 1.
🞅 5. Find the square root of the quotient in Step 4.
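The five steps above can be sketched as a single Python function. The function name `standard_deviation` and its `sample` flag are assumptions made for this illustration. The usage lines treat the soft-drink machine readings from earlier in the chapter as samples, which is also an assumption.

```python
import math


def standard_deviation(values, sample=True):
    """Compute a standard deviation by following the five-step procedure.

    With sample=True the sum of squared deviations is divided by n - 1;
    with sample=False (a population) it is divided by n.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data values")
    mean = sum(values) / n                              # Step 1
    deviations = [x - mean for x in values]             # Step 2
    sum_of_squares = sum(d ** 2 for d in deviations)    # Step 3
    divisor = n - 1 if sample else n                    # Step 4
    return math.sqrt(sum_of_squares / divisor)          # Step 5


machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]
s_machine_1 = standard_deviation(machine_1)
s_machine_2 = standard_deviation(machine_2)
print(round(s_machine_1, 3), round(s_machine_2, 3))
```

The much smaller value for Machine 2 reflects how closely its readings cluster around the mean.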
Example : Find the Standard Deviation
🞅 The following numbers were obtained by sampling a population.
2, 4, 7, 12, 15
Find the standard deviation of the sample.
Solution
🞅 Step 1 : The mean of the numbers is
\[ \bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8 \]
🞅 Step 2 : For each number, calculate the deviation between the number and the
mean.
x x- x̄
2 2 – 8 = -6
4 4 – 8 = -4
7 7 – 8 = -1
12 12 – 8 = 4
15 15 – 8 = 7
🞅 Step 3 : Calculate the square of each deviation in Step 2, and find the sum of these
squared deviations.
x     x - x̄           (x - x̄)²
2     2 – 8 = -6      (-6)² = 36
4     4 – 8 = -4      (-4)² = 16
7     7 – 8 = -1      (-1)² = 1
12    12 – 8 = 4      (4)² = 16
15    15 – 8 = 7      (7)² = 49
Sum of the squared deviations = 118
🞅 Step 4 : Because we have a sample of n = 5 values, divide the sum 118 by n – 1, which is 4.
\[ \frac{118}{4} = 29.5 \]
🞅 Step 5 : The standard deviation of the sample is the square root of 29.5. To the nearest
hundredth, the standard deviation is s = √29.5 ≈ 5.43.
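The hand calculation can be cross-checked with Python's standard library. `statistics.stdev` computes the sample standard deviation (divisor n – 1); the variable names are assumptions made for this sketch.

```python
import math
import statistics

sample = [2, 4, 7, 12, 15]

# Library result: sample standard deviation with the n - 1 divisor.
s_library = statistics.stdev(sample)

# Hand calculation from Steps 3 to 5: square root of 118 / 4.
s_by_hand = math.sqrt(118 / 4)

print(round(s_library, 2), round(s_by_hand, 2))
```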
Another example : Use Standard Deviations
🞅 In this example we will use standard deviations to determine which company
produces batteries that are most consistent with regard to their life expectancy.
🞅 A consumer group tested a sample of 8 size-D batteries from each of 3 companies. The
results of the tests are shown in the following table. According to these tests, which
company produces batteries for which the values representing hours of constant use
have the smallest standard deviation?
Company Hours of constant use per Battery
EverSoBright 6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3
Dependable 6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2
Beacon 6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5
Solution
🞅 The mean for each sample of batteries is 7 h.
🞅 The batteries from EverSoBright have a standard deviation of
\[ s_1 = \sqrt{\frac{(6.2 - 7)^2 + (6.4 - 7)^2 + (7.1 - 7)^2 + (5.9 - 7)^2 + (8.3 - 7)^2 + (5.3 - 7)^2 + (7.5 - 7)^2 + (9.3 - 7)^2}{7}} = \sqrt{\frac{12.34}{7}} \approx 1.328 \text{ h} \]
🞅 The batteries from Dependable have a standard deviation of
\[ s_2 = \sqrt{\frac{(6.8 - 7)^2 + (6.2 - 7)^2 + (7.2 - 7)^2 + (5.9 - 7)^2 + (7.0 - 7)^2 + (7.4 - 7)^2 + (7.3 - 7)^2 + (8.2 - 7)^2}{7}} = \sqrt{\frac{3.62}{7}} \approx 0.719 \text{ h} \]
🞅 The batteries from Beacon have a standard deviation of
\[ s_3 = \sqrt{\frac{(6.1 - 7)^2 + (6.6 - 7)^2 + (7.3 - 7)^2 + (5.7 - 7)^2 + (7.1 - 7)^2 + (7.6 - 7)^2 + (7.1 - 7)^2 + (8.5 - 7)^2}{7}} = \sqrt{\frac{5.38}{7}} \approx 0.877 \text{ h} \]
🞅 The batteries from Dependable have the smallest standard deviation. According to these
results, the Dependable company produces the most consistent batteries with regard to life
expectancy under constant use.
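The comparison of the three companies can be reproduced with a short script. This is a sketch that uses `statistics.stdev` (sample standard deviation) from the Python standard library; the dictionary and variable names are assumptions, and the data are the battery test results from the table.

```python
import statistics

# Hours of constant use for the 8 batteries sampled from each company.
battery_hours = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable":   [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon":       [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}

# Sample standard deviation for each company, rounded to 3 decimal places.
deviations = {
    company: round(statistics.stdev(hours), 3)
    for company, hours in battery_hours.items()
}

# The most consistent company is the one with the smallest standard deviation.
most_consistent = min(deviations, key=deviations.get)

print(deviations)
print(most_consistent)
```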
The Variance
🞅 A statistic known as the variance is also used as a measure of dispersion. The variance
for a given set of data is the square of the standard deviation of the data. The
following chart shows the mathematical notations that are used to denote standard
deviations and variances.
Notations for Standard Deviation and Variance
𝜎 is the standard deviation of a population
𝜎² is the variance of a population
s is the standard deviation of a sample
s² is the variance of a sample
Example : Find the Variance
🞅 Find the variance for the sample given in Example : Find the Standard Deviation (slide 43
of this pdf).
Solution
In that example we found s = √29.5 ≈ 5.43. The variance is the square of the standard
deviation. Thus the variance is
\[ s^2 = (\sqrt{29.5})^2 = 29.5 \]
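The relationship between the variance and the standard deviation can be checked in Python. `statistics.variance` and `statistics.stdev` are the sample versions (divisor n – 1) in the standard library; the variable names are assumptions made for this sketch.

```python
import math
import statistics

sample = [2, 4, 7, 12, 15]

sample_variance = statistics.variance(sample)   # s squared
sample_std_dev = statistics.stdev(sample)       # s

# The variance is the square of the standard deviation (up to rounding).
squares_match = math.isclose(sample_std_dev ** 2, sample_variance)

print(sample_variance, squares_match)
```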