Sta15w1 Notes

Chapter 1 – Terminology
1.1 Definitions

Data/Data set – Set of values collected or obtained when gathering information on some
issue of interest.

Examples

1 The monthly sales of a certain vehicle collected over a period.

2 The number of passengers using a certain airline on various routes.

3 Rating (on a scale from 1 to 5) of a new product by customers.

4 The yields of a certain crop obtained after applying different types of fertilizer.

Statistics – Collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting the data, and drawing
conclusions from it.

Statistics in the above sense refers to the methodology used in drawing meaningful
information from a data set. This use of the term should not be confused with statistics
(referring to a set of numerical values) or statistics (referring to measures of description
obtained from a data set).

Descriptive Statistics – Collection, organization, summarization, and presentation of data.


To be discussed in chapter 2.

Population – All subjects possessing a common characteristic that is being studied.

Examples

1 The population of people inhabiting a certain country.

2 The collection of all cars of a certain type manufactured during a particular month.

3 All patients in a certain area suffering from AIDS.

4 Exam marks obtained by all students studying a certain statistics course.

Census – A study where every member (element) of the population is included.

Examples

1 Study of the entire population carried out by the government every 10 years.

2 Special investigations e.g., tax study commissioned by a government.



3 Any study of all the individuals/elements in a population.

A census is usually very costly and time consuming. It is therefore not carried out very often.
A study of a population is usually confined to a subgroup of the population.

Sample – A subgroup or subset of the population.

The number of values in the sample (sample size) is denoted by n.

The number of values in the population (population size) is denoted by N.

Statistical Inference – Generalizing from samples to populations and expressing the conclusions in the language of probability (chance). To be discussed in chapters 5-9 and 11-13.

Variable – Characteristic or attribute that can assume different values

Discrete variables – Variables which can assume a finite or countable number of possible
values. Such variables are usually obtained by counting.

Examples

1 The number of cars parked in a parking lot.

2 The number of students attending a statistics lecture.

Continuous variables – Variables which can assume an infinite number of possible values.
Such variables are usually obtained by measurement.

Examples

1 The body temperature of a person.

2 The weight of a person.

3 The height of a tree.

4 The contents of a bottle of cool drink.

Some continuous variables, e.g. age, can become discrete when they are rounded.

Measurement scales

Qualitative variables – Variables which assume non-numerical values.

Examples

1 The course of study at university (B.Com, B.Eng , BA etc.)

2 The grade (A, B, C, D or E) obtained in an examination.



Nominal scale – Level of measurement which classifies data into categories in which no
order or ranking can be imposed on the data.

A variable can be treated as nominal when its values represent categories with no intrinsic
ranking. For example, the department of the company in which an employee works.
Examples of nominal variables include region, postal code, or religious affiliation.

Ordinal scale – Level of measurement which classifies data into categories that can be
ordered or ranked. Differences between the ranks are not meaningful.

A variable can be treated as ordinal when its values represent categories with some intrinsic
order or ranking.

Examples

1 Levels of service satisfaction from very dissatisfied to very satisfied.

2 Attitude scores representing degree of satisfaction or confidence, and preference rating scores (low, medium, or high).

3 A person’s response (agree, not agree) to a statement. A one (1) is recorded when the
person agrees with the statement, a zero (0) is recorded when a person does not agree.

4 Likert scale responses to statements (strongly agree, agree, neutral, disagree, strongly
disagree).

Quantitative variables – Variables which assume numerical values.

Examples

Discrete and continuous variables examples given above.

Interval scale – Level of measurement which classifies data that can be ordered and ranked
and where differences are meaningful. However, there is no meaningful zero and ratios are
meaningless.

Examples

1 The difference between a temperature of 100 degrees and 90 degrees is the same
difference as that between 90 degrees and 80 degrees. Taking ratios in such a case does not
make sense.

2 When referring to dates (years) or temperatures measured in degrees Fahrenheit or Celsius, there is no natural zero point.

Ratio scale – Level of measurement where differences and ratios are meaningful and there is
a natural zero. This is the “highest” level of measurement in terms of possible operations that
can be performed on the data.

Examples

Variables like height, weight, mark (in test) and speed are ratio variables. These variables
have a natural zero and ratios make sense when doing calculations e.g., a weight of 80
kilograms is twice as heavy as one of 40 kilograms.

Summary of 4 measurement scales

Measurement scale   Examples                        Meaningful calculations
Nominal             Types of music,                 Put into categories
                    university faculties,
                    vehicle makes
Ordinal             Motion picture ratings:         Put into categories
                    G – General audiences           Put into order
                    PG – Parental guidance
                    PG-13 – Parents cautioned
                    R – Restricted
                    NC-17 – No one under 17
Interval            Years: 2009, 2010, 2011         Put into categories
                    Months: 1, 2, . . . , 12        Put into order
                                                    Find differences between values
Ratio               Rainfall                        Put into categories
                    Humidity                        Put into order
                    Income                          Find differences between values
                                                    Find ratios

Experiment – The process of observing some phenomenon that occurs.

An experiment can be observational or designed.

1 A designed experiment can be controlled to a certain extent by the experimenter.

Consider a study of 4 fuel additives on the reduction in oxides of nitrogen. You may have 4
drivers and 4 cars at your disposal. You are not particularly interested in any effects of cars or
drivers on the resultant oxide reduction. However, you do not want the results for the fuel
additives to be influenced by the driver or car. An appropriate design of the experiment (way
of performing the experiment) will allow you to estimate effects of all factors of interest
without these outside factors influencing the results.

2 An observational study is not controlled by the experimenter. The characteristic of interest is simply observed, and the results recorded. Examples are

2.1 Collecting data that compares reckless driving of female and male drivers.
2.2 Collecting data on smoking and lung cancer.

Parameter – Characteristic or measure of description obtained from a population.

Examples

1 Mean age of all employees working at a certain company.

2 The proportion of registered female voters in a certain country.

Statistic – Characteristic or measure of description obtained from a sample.

Examples

1 The mean monthly salary of 50 selected employees in a certain government department.

2 The proportion of smokers in a sample of 60 university students.

1.2 Sampling methods

When selecting a sample, the main objective is to ensure that it is as representative as possible of the population it is drawn from. When a sample fails to achieve this objective, it is said to be biased.

Sampling frame (synonyms: "sample frame", "survey frame") – The actual set of units from which a sample is drawn.

Example

Consider a survey aimed at establishing the number of potential customers for a new service
in a certain city. The research team has drawn 1000 numbers at random from a telephone
directory for the city, made 200 calls each day from Monday to Friday from 8am to 5pm and
asked some questions.

In this example, the population of interest is all the inhabitants in the city. The sampling
frame includes only those city dwellers that satisfy all the following conditions:

1 They have a telephone.

2 The telephone number is included in the directory.

3 They are likely to be at home from 8am to 5pm from Monday to Friday.

4 They are not people who refuse to answer telephone surveys.

The sampling frame in this case differs from the population. For example, it under-represents those who do not have a telephone (e.g. the poorest), who have an unlisted number, who were not at home at the time of the calls (e.g. employed people), or who do not like to participate in telephone interviews (e.g. busier and more active people). Such differences between the sampling frame and the population of interest are a main cause of bias when drawing conclusions based on the sample.

Probability samples – Samples drawn according to the laws of chance. These include simple
random sampling, systematic sampling and stratified random sampling.

Simple random sampling – Sampling in which each sample of a given size that can be
drawn will have the same chance of being drawn. Most of the theory in statistical inference is
based on random sampling being used.

Examples

1 The 6 winning numbers (drawn from 49 numbers) in a Lotto draw. Each potential sample
of 6 winning numbers has the same chance of being drawn.

2 Each name in a telephone directory could be numbered sequentially. If the sample size
was to include 2 000 people, then 2 000 numbers could be randomly generated by computer
or numbers could be picked out of a hat. These numbers could then be matched to names in
the telephone directory, thereby providing a list of 2 000 people.

A random sample can be selected by using a table of random numbers (see table at the back).

Examples

Example 1 Lotto draw using a table of random numbers

The first 6 random numbers in the table of random numbers are 10480, 22368, 24130, 42167, 37570, 77921. Use these numbers to select the 6 winning numbers in a Lotto draw.

The 49 numbers from which the draw is made all involve 2 digits i.e. 01, 02, . . . , 49.
Putting the above numbers from the table of random numbers next to each other in a string of
digits gives 10 48 02 23 68 24 13 04 21 67 37 57 07 79 21 .

The winning numbers can be selected by taking pairs of digits between 01 and 49 (discarding any pairs outside this range, as well as repeats), working either from left to right or from right to left in the above string.

By working from left to right the winning numbers are 10, 48, 2, 23, 24 and 13.
By working from right to left the winning numbers are 21, 7, 37, 4, 13 and 24.

Example 2

Generate 6 Lotto winning numbers using Excel.

1 Open an Excel sheet and type the numbers 1, 2, . . . , 49 in column A.
2 In cell B1 type =RAND() and drag the cursor down to cell B49. These cells are filled
with numbers between 0 and 1.
3 Highlight the entries in column B, right click, select "Copy", and then paste the values back (Paste Special → Values). This ensures that these numbers do not change.
4 Highlight all the entries in cells A1 to B49 and select Data → Sort, sorting the column B numbers from smallest to largest.

The numbers in cells A1 to A6 are the generated winning numbers.
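The same draw can also be done in a few lines of Python. This is only an illustrative sketch (the notes themselves use Excel); random.sample draws without replacement, so no repeats can occur.

import random

# Draw 6 distinct winning numbers from 1-49, the programmatic
# equivalent of the rand()-and-sort procedure above.
winning = sorted(random.sample(range(1, 50), 6))
print(winning)   # e.g. [3, 17, 22, 31, 40, 48]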



The advantage of simple random sampling is that it is simple and easy to apply when small
populations are involved. However, because every person or item in a population has to be
listed before the corresponding random numbers can be read, this method is very
cumbersome to use for large populations and cannot be used if no list of the population items
is available. It can also be very time consuming to try and locate every person included in the
sample. There is also a possibility that some of the persons in the sample cannot be contacted
at all.

Systematic sampling – Sampling in which data is obtained by selecting every kth object, where k is approximately N/n.

Examples

1 A manufacturer might decide to select every 20th item on a production line to test for
defects and quality. This technique requires the first item to be selected at random as a
starting point for testing and, thereafter, every 20th item is chosen.

2 A market researcher might select every 10th person who enters a particular store, after
selecting a person at random as a starting point, or interview occupants of every 5th house in
a street, after selecting a house at random as a starting point.

3 A systematic sample of 500 students is to be selected from a university with an enrolled population of 10 000. In this case the population size N = 10 000 and the sample size n = 500. Then every 10000/500 = 20th student will be included in the sample. The first student in the sample can be randomly selected from an alphabetical list of students and thereafter every 20th one can be selected until 500 names have been obtained.

Stratified random sampling – Sampling in which the population is divided into groups
(called strata) according to some characteristic. Each of these strata is then sampled using
random sampling.

Example

A general problem with random sampling is that you could, by chance, miss out a particular
group in the sample. However, if you subdivide the population into groups, and sample from
each group, you can make sure the sample is representative. Some examples of strata
commonly used are those according to province, age, and gender. Other strata may be
according to religion, academic ability, or marital status.

Example

In a study investigating the expenditure pattern of consumers, they were divided into low,
medium and high-income groups.

Income group   percentage of population
low            40
medium         45
high           15

A stratified sample of 500 consumers is to be selected for this study.

When sampling is proportional to size (an income group comprises the same percentage of
the sample as of the population) the sample sizes for the strata should be calculated as
follows.

low: 40*500/100 = 200, medium: 45*500/100 = 225, high: 15*500/100 = 75.
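As an illustration (not part of the original notes), the same proportional allocation can be computed in Python; the names used here are ours.

# Proportional allocation of a sample of n = 500 across the income strata.
percentages = {"low": 40, "medium": 45, "high": 15}
n = 500

strata_sizes = {group: pct * n // 100 for group, pct in percentages.items()}
print(strata_sizes)   # {'low': 200, 'medium': 225, 'high': 75}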

Convenience Sampling – Sampling in which data that are readily available are used, e.g., surveys done on the internet. Quota sampling is one such method.

Quota sampling

Quota sampling is performed in 4 stages.

Stage 1: Decide which characteristics of the elements/individuals in the population to be sampled are of importance.

Stage 2: Decide on the categories to be sampled from. These categories are determined by
cross-classification according to the characteristics chosen at stage 1.

Stage 3: Decide on the overall number (quota) and numbers (sub-quotas) to be sampled
from each of the categories specified in step 2.

Stage 4: Collect the information required until all the numbers (quotas) are obtained.

Example

A company is marketing a new product and needs to know how potential customers might
react to the product.

Stage 1: It is decided that age (the 3 groups under 20, 20-40, over 40) and gender (male,
female) are the characteristics that will determine the sample.

Stage 2: The 6 categories to be sampled from are (male under 20), (male 20-40), (male over
40), (female under 20), (female 20-40) and (female over 40).

Stage 3: Decide on the sub-quotas (required sample sizes) for the different subgroups.
Example

Category          Sub-quota
male under 20     40
male 20-40        60
male over 40      25
female under 20   35
female 20-40      65
female over 40    30
Total             255

The total quota is the total of all the sub-quotas i.e., 255.

Stage 4: Visit a place where individuals to be interviewed are readily available e.g., a large
shopping center and interview people until all the quotas are filled.

Quota sampling is a cheap and convenient way of obtaining a sample in a short space of time.
However, this method of sampling is not based on the laws of chance and cannot guarantee a
sample that is representative of the population from which it is drawn.

When obtaining a quota sample, interviewers often choose who they like (within criteria
specifications) and may therefore select those who are easiest to interview. Therefore,
sampling bias (uncontrolled factors that result in the sample being not representative of the
population) can result. It is also impossible to estimate the accuracy of quota sampling
(because the sampling is not random).

1.3 Computer output

Excel has a facility with which a random sample of a specified size can be selected from a given population. Below is the output for a random sample of size 5 selected from a population consisting of 10 items.

Population   sample
13           16
27           27
14           12
12           12
15           13
9
10
12
16
9

Chapter 2 – Descriptive Statistics (Exploratory Data Analysis)
All the data sets used in this chapter will be regarded as samples drawn from some
population. One of the main purposes of studying a sample is to get information about the
population. The focus here is on summarizing and describing some features of the data.

2.1 Graphs and diagrams



Line graph

A line graph is a graph used to present some characteristic recorded over time.

Example:

The graph for this example (not reproduced here) shows how a person's weight varied from the beginning of 1991 to the beginning of 1995.

Bar charts

A bar chart or bar graph is a chart consisting of rectangular bars with heights proportional to
the values that they represent. Bar charts are used for comparing two or more values that are
taken over time or under different conditions.

Simple Bar Chart

In a simple bar chart, the figures used to make comparisons are represented by bars. These
are either drawn vertically or horizontally. Only totals are represented. The height or length
of the bar is drawn in proportion to the size of the figure being presented. An example is
shown below.

Component Bar Chart

When you want to draw a bar chart to illustrate your data, it is often the case that the totals of
the figures can be broken down into parts or components.

Year   Total        Male         Female
1959   51 956 000   25 043 000   26 913 000
1969   55 461 000   26 908 000   28 553 000
1979   56 240 000   27 373 000   28 867 000
1989   57 365 000   27 988 000   29 377 000
1999   59 501 000   29 299 000   30 202 000

You start by drawing a simple bar chart with the total figures as shown above. The columns
or bars (depending on whether you draw the chart vertically or horizontally) are then divided
into the component parts.

Multiple (compound) Bar Chart

You may find that your data allows you to make comparisons of the component figures
themselves. If so, you will want to create a multiple (compound) bar chart. Each component
is represented by a separate bar, with all the components relating to a particular case (e.g., a year) next to each other. This type of chart enables you to trace the trends of each individual component, as well as to make comparisons between the components.

Pareto chart

This is a special bar chart where the frequencies (presented by bars) are arranged in
decreasing order of magnitude (largest to smallest).

Example

The table below shows the occurrence of diseases taken from a citrus orchard.

Citrus disease   frequency
Anthracnose      467
Canker           598
Melanose         532
Scab             503
Leaf miner       427
Sooty mold       568
Pest hole        415
Total            3510
[Pareto chart of citrus diseases (not reproduced): bars in decreasing order of frequency – Canker (598), Sooty mold (568), Melanose (532), Scab (503), Anthracnose (467), Leaf miner (427), Pest hole (415).]

From this chart the seriousness of the diseases (most to least) can be seen.

Dot Plot

This is a diagram where a line is drawn according to a scale appropriate for the data set and the values in the data set are plotted at their positions on the scale. If the same value occurs more than once, the multiple values are plotted on top of each other at the same point on the scale. For small data sets (few values) this plot can provide useful information regarding data patterns.

Example

Imagine that a medium-sized retailer, thinking of expanding into a new region, identifies a business that it considers as being ready for takeover. It finds the following annual profit figures (in tens of thousands of pounds) for the target retailer's last ten years of trading:

9 9 7 7 7 6 5 4 3 3

To draw a dot plot, we can begin by drawing a horizontal line across the page to represent the
range of values of all the numbers (scale). Then we can mark an 'x' above the appropriate
value along the line as follows:
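The original figure is not reproduced here; based on the ten profit figures (3, 3, 4, 5, 6, 7, 7, 7, 9, 9), a plain-text rendering of the dot plot looks like this:

            x
x           x     x
x  x  x  x  x     x
3  4  5  6  7  8  9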

Pie Chart

A Pie chart is a diagram that shows the subdivision of some entity/total into subgroups. The
diagram is in the form of a circle which is divided into slices with each slice having an area
according to the proportion that it makes up of the total.

Example

The pie chart below shows the ingredients used to make a sausage and mushroom pizza.

The degrees needed for each slice are found by calculating the appropriate percentage of 360, e.g., for sausage the degrees are 0.075*360 = 27, for cheese 0.25*360 = 90, etc. The complete calculations are shown in the table below.

Ingredient     Percentage   Degrees
Sausage        7.5          0.075*360 = 27
Cheese         25           0.25*360 = 90
Crust          50           0.50*360 = 180
Tomato sauce   12.5         0.125*360 = 45
Mushrooms      5            0.05*360 = 18

Stem-and-leaf plot

A stem-and-leaf plot is a device used for summarizing quantitative data in a table/graphical format to assist in visualizing the shape of a data set.

Examples

To construct a stem-and-leaf plot, the values must first be sorted in ascending order. Here is the sorted set of data values that will be used in the example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, it must be determined what the stems will represent and what the leaves will represent.
Typically, the leaf contains the last digit of the number, and the stem contains the other
digits. In the case of very large or very small numbers, the data values may be rounded to a particular place value (such as the hundredths place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stems.

In this example, the leaf represents the ones place and the stem the rest of the number (tens
place or higher).

The stem-and-leaf plot is drawn with two columns separated by a vertical line. The stems are
listed to the left and the leaves to the right of the vertical line. It is important that each stem is
listed only once and that no numbers are skipped, even if it means that some stems have no
leaves. The leaves are listed in increasing order in a row to the right of each stem.

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6

key: 5|4=54
leaf unit: 1.0
stem unit: 10.0

A stem-and-leaf plot enables the researcher to see patterns of clustered values e.g., 12 of the
17 values are greater or equal to 63 and less or equal to 88.

Two data sets can be compared by drawing a back-to-back stem-and-leaf plot.

As an example, suppose the fat contents (in grams) for eating English breakfasts and cold
meat sandwiches are to be compared. The fat contents are shown below.

Sandwiches: 6, 7, 12, 13, 17, 18, 20, 21, 21, 24, 26, 28, 30, 34

Breakfasts: 12, 14, 15, 16, 18, 23, 25, 25, 36, 36, 38, 41, 44, 45

A back-to-back stem-and-leaf plot is shown below.

Breakfasts        Sandwiches
          | 0 | 6 7
2 4 5 6 8 | 1 | 2 3 7 8
    3 5 5 | 2 | 0 1 1 4 6 8
    6 6 8 | 3 | 0 4
    1 4 5 | 4 |

key: 2|4=24
leaf unit: 1.0
stem unit: 10.0

Conclusion: The fat content in breakfasts appears to be higher than that in sandwiches.

2.2 Sigma and subscript notation

The symbol ∑ (sigma, the capital S of the Greek alphabet) is used to denote "the sum of" values.

Suppose the symbol x is used to denote some variable of interest in a study. To distinguish
between values of this variable, subscripts are used.

x1 – first value in the data set which has a subscript 1.

x2 – second value in the data set which has a subscript 2.


.
.
xn – nth value in the data set which has a subscript n.

The sum of these values is written in shorthand notation as

x1 + x2 + . . . + xn = ∑_{i=1}^{n} x_i.

If it is understood that the range of subscript indices over which the summation is taken involves all the x values, the summation can be written as just

x1 + x2 + . . . + xn = ∑x.
Example 1: Suppose x1 = 70, x2 = 74, x3 = 66, x4 = 68, x5 = 71.

Then ∑_{i=1}^{5} x_i = x1 + x2 + . . . + x5 = 70+74+66+68+71 = 349.

The sum of the squares of a set of values is written as

∑_{i=1}^{n} x_i² = x1² + x2² + . . . + xn², or ∑x² for short.

Example 2: For the data set in example 1, ∑_{i=1}^{5} x_i² = 70²+74²+66²+68²+71² = 24397.

Note that ∑_{i=1}^{n} x_i² ≠ (∑_{i=1}^{n} x_i)², e.g., for the abovementioned data ∑_{i=1}^{5} x_i² = 24397 ≠ 349² = 121801.

The summation notation can also be used to write the sum of products of corresponding values for 2 different sets of values.

∑_{i=1}^{n} x_i y_i = x1y1 + x2y2 + . . . + xnyn.

Example: Consider the following values.

i      1    2    3    4    5    6
xi     11   13   7    12   10   8
yi     8    5    7    6    9    11
xiyi   88   65   49   72   90   88

For this data ∑_{i=1}^{6} x_i y_i = 11*8+13*5+7*7+12*6+10*9+8*11 = 88+65+49+72+90+88 = 452.

Note that ∑_{i=1}^{n} x_i y_i ≠ (∑_{i=1}^{n} x_i)(∑_{i=1}^{n} y_i), e.g., for the abovementioned data ∑_{i=1}^{6} x_i = 61 and ∑_{i=1}^{6} y_i = 46, so (∑x_i)(∑y_i) = 61*46 = 2806 ≠ ∑x_i y_i = 452.

The summation notation is used extensively in specifying calculations in statistical formulae.
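As a quick illustration (Python is not used in these notes), the sums in the last example can be computed directly:

# The sums used above, computed term by term.
x = [11, 13, 7, 12, 10, 8]
y = [8, 5, 7, 6, 9, 11]

sum_x  = sum(x)                               # 61
sum_y  = sum(y)                               # 46
sum_xy = sum(xi*yi for xi, yi in zip(x, y))   # 452

print(sum_x * sum_y, sum_xy)   # 2806 and 452: (sum x)(sum y) is not sum xy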

2.3 Frequency distributions and related graphs

Frequency distribution

A frequency distribution is a table in which data are grouped into classes and the number of
values (frequencies) which fall in each class recorded.

The main purpose of constructing a frequency distribution is to get an insight into the
distribution pattern of the frequencies over the classes. Hence, the name frequency
distribution is used to refer to this pattern.

Examples

Example 1 In a survey of 40 families in a village, the number of children per family was recorded and the following data obtained.

1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5

number of children   Tally           frequency (f)
0                    |||             3
1                    ||||| ||        7
2                    ||||| |||||     10
3                    ||||| |||       8
4                    ||||| |         6
5                    ||||            4
6                    ||              2
Total                                40
Note: The sum of the frequencies = sample size i.e., ∑ f =n.

Example 2 Consider the following data of low temperatures (in degrees Fahrenheit to the
nearest degree) for 50 days. The highest temperature is 64 and the lowest temperature is 39.

Data Set - Low Temperatures for 50 Days


57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 64 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41

Constructing a frequency distribution

The classes into which the above values can be sorted can be found by following the steps
shown below.

1 Find the maximum (=64) and minimum (=39) values and calculate the

range = maximum–minimum = 64-39 = 25.

2 Decide on the number of classes. Use Sturges’ rule which states that

no of classes = k = the rounded up value of 1 + 1.44 ln n = 1 + 1.44*ln(50) = 6.63 i.e.,

k = 7.

3 Calculate the class width such that no of classes*class width > range i.e.

7* class width > 25.

This suggests a class width of 4.



4 Find the lower value that defines the first class. This is usually a value just below the
minimum value in the data set. Since the minimum value for this data set is 39, the lowest
class can have a minimum value one below this i.e., 38.

5 Find the lower values that define each of the classes that follow by successively adding the
class width to the lower value of class.

lower value of the second class = 38 + 4 = 42.

lower value of the third class = 42 + 4 = 46 etc.

The frequency distribution below shows the data values sorted into the classes

38-41, 42-45, 46-49, 50-53, 54-57, 58-61, 62-65

The table below shows the classes, their frequencies, relative frequencies, and cumulative frequencies for the temperatures data set.

class limits   class boundaries   f    relative frequency   cumulative frequency
38-41          37.5-41.5          4    0.08                 4
42-45          41.5-45.5          10   0.2                  14
46-49          45.5-49.5          8    0.16                 22
50-53          49.5-53.5          15   0.3                  37
54-57          53.5-57.5          9    0.18                 46
58-61          57.5-61.5          3    0.06                 49
62-65          61.5-65.5          1    0.02                 50
Total                             50
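As an illustrative aside (not part of the notes), the frequency column of this table can be reproduced in Python by tallying the 50 temperatures into the classes derived above:

# Tally the 50 low temperatures into classes of width 4 starting at 37.5.
# A value equal to a boundary goes to the higher class (lo <= t < hi).
temps = [57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
         58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
         49, 57, 55, 64, 45, 50, 45, 51, 54, 58,
         53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
         43, 47, 47, 43, 51, 55, 55, 46, 54, 41]

boundaries = [37.5 + 4*i for i in range(8)]   # 37.5, 41.5, . . . , 65.5
freq = [sum(lo <= t < hi for t in temps)
        for lo, hi in zip(boundaries, boundaries[1:])]
print(freq)   # [4, 10, 8, 15, 9, 3, 1]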

Frequency distribution terminology

Class limits – The values that define the classes of the frequency distribution in terms of the
rounded values in the data set.

lower class limit – minimum rounded value that defines a class of the frequency distribution.

upper class limit – maximum rounded value that defines a class of the frequency
distribution.

class boundaries – The values that define the classes of the frequency distribution in terms
of the actual values in the data set.

lower class boundary – minimum actual value that defines a class of the frequency
distribution.

upper class boundary – maximum actual value that defines a class of the frequency
distribution.

The first class in the above table has a lower class limit of 38 and a lower class boundary of 37.5 (since this is the smallest actual value that can be rounded up to 38).

The first class in the above table has an upper class limit of 41 and an upper class boundary of 41.5 (since this is the largest actual value that can be rounded down to 41).

The lower and upper class boundaries of a particular class can be calculated by using the following formulae.

lower class boundary (class i) = [upper class limit (class i−1) + lower class limit (class i)] / 2

upper class boundary (class i) = [upper class limit (class i) + lower class limit (class i+1)] / 2

For the second class (i=2) in the above frequency distribution, class (i−1) is the first class (since i−1 = 1). Hence

lower class boundary (class 2) = [upper class limit (class 1) + lower class limit (class 2)] / 2 = (41 + 42)/2 = 41.5.

upper class boundary (class 2) = [upper class limit (class 2) + lower class limit (class 3)] / 2 = (45 + 46)/2 = 45.5.

Note that for the frequency distribution in example 1,

lower class limit(boundary) = upper class limit (boundary).

This follows from the fact that for this distribution

1 Each class is defined by a single value and not by a range of values like in example 2.
Therefore, upper class limit (boundary) = lower class limit (boundary).

2 The values are accurately recorded (no rounding). Therefore, the limits and boundaries are
identical values.

In general, for accurately recorded (not rounded) data, the lower class limit = lower class boundary and the upper class limit = upper class boundary.

Example 3

The monthly expenditures (thousands of rands) of 60 households are shown below.


The values of this data set were accurately recorded (not rounded).
7.21741    7.8989     6.85461    10.31167   8.48253    5.17069
5.09063    8.16412    5.67094    7.7394     7.87423    5.41634
9.37265    10.14436   7.15675    10.31107   8.86571    10.1734
5.99276    6.5738     7.06965    8.82439    7.47467    9.50018
4.90014    5.50273    8.12516    5.51933    7.43641    10.95599
5.87188    9.36936    9.83773    10.18893   5.12028    9.60018
8.56534    9.27719    8.37107    7.03318    10.78344   9.08941
6.85749    7.7887     9.68159    6.75009    8.0521     8.19638
10.17312   7.51527    11.31383   8.5765     7.48021    8.39881
7.37565    7.28159    8.81773    5.53182    5.98515    7.71778

The frequency distribution shown below is a summary of this data set.

classes f
4.5-5.5 5
5.5-6.5 7
6.5-7.5 13
7.5-8.5 13
8.5-9.5 9
9.5-10.5 10
10.5-11.5 3
Total 60

For this distribution the lower (upper) class limit = lower (upper) class boundary for each of
the classes.

A value that falls on the boundary of 2 classes is allocated to the higher of the two classes
e.g., 5.50000 is allocated to the class 5.5-6.5 (not 4.5 to 5.5).

Class midpoints

The midpoint of a class (x_mid) can be calculated from

x_mid = [lower class limit (boundary) + upper class limit (boundary)] / 2.

Examples

1 For the frequency distribution in example 2, the class midpoints are given below.

class limits   class boundaries   midpoints
38-41          37.5-41.5          39.5
42-45          41.5-45.5          43.5
46-49          45.5-49.5          47.5
50-53          49.5-53.5          51.5
54-57          53.5-57.5          55.5
58-61          57.5-61.5          59.5
62-65          61.5-65.5          63.5

2 For the frequency distribution in example 3 the class midpoints are given below.

classes midpoints
4.5-5.5 5
5.5-6.5 6
6.5-7.5 7
7.5-8.5 8
8.5-9.5 9
9.5-10.5 10
10.5-11.5 11

Cumulative frequencies

The “less than” cumulative frequency of a class is the number of values in the sample that are
less than or equal to the upper-class boundary of the class.

Examples

1 See frequency distribution in example 2.

2 For the frequency distribution in example 3 the “less than” cumulative frequencies are
calculated as shown below.

classes     upper class boundary   f    cumulative frequency   calculations
4.5-5.5     5.5                    5    5                      5
5.5-6.5     6.5                    7    12                     5+7
6.5-7.5     7.5                    13   25                     5+7+13
7.5-8.5     8.5                    13   38                     5+7+13+13
8.5-9.5     9.5                    9    47                     5+7+13+13+9
9.5-10.5    10.5                   10   57                     5+7+13+13+9+10
10.5-11.5   11.5                   3    60                     5+7+13+13+9+10+3
Total                              60

Relative and percentage frequencies

Relative frequency = frequency/sample size, i.e., Rf = f/n.

Examples

1 See frequency distribution in example 2.

2 For the frequency distribution in example 3 the relative frequencies are calculated as
shown below.
23

classes     f    relative frequency
4.5-5.5     5    0.083
5.5-6.5     7    0.117
6.5-7.5     13   0.217
7.5-8.5     13   0.217
8.5-9.5     9    0.15
9.5-10.5    10   0.167
10.5-11.5   3    0.05
Total       60   1

The percentage frequency of a class is calculated from relative frequency * 100.

Why is it necessary to distinguish between the definition of classes for accurately recorded and rounded data? The following example explains why.

Example

Consider the data set (variable temperature) used in example 2. The classes suggested for grouping the values are 38-41, 42-45, . . . , 62-65 (grouping A). Suppose instead the classes are defined as 38-42, 42-46, . . . , 62-66 (grouping B). The first definition of classes allows for rounded values, but the second one does not. Consider the following.

actual   rounded value       class value should be in   class value is put    class value is put
value    (nearest integer)   according to grouping B    in with grouping B    in with grouping A
41.5     42                  38-42                      42-46                 42-45
45.5     46                  42-46                      46-50                 46-49
61.5     62                  58-62                      62-66                 62-65

Histogram

A histogram is the graphical representation of a frequency distribution. The frequency for each class is represented by a rectangular bar with the class boundaries as base and the frequency as height.

Example

A histogram of the frequency distribution in example 2 is shown below.


[Histogram (not reproduced): frequency (0 to 16) plotted as bars over the class boundaries 37.5-41.5, . . . , 61.5-65.5; horizontal axis: temperature.]
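A sketch of how such a histogram could be drawn in Python, assuming the matplotlib library is available (the notes themselves use spreadsheet-style charts):

import matplotlib.pyplot as plt

# Frequencies and class labels from the temperature frequency distribution.
freqs  = [4, 10, 8, 15, 9, 3, 1]
labels = ["37.5-41.5", "41.5-45.5", "45.5-49.5", "49.5-53.5",
          "53.5-57.5", "57.5-61.5", "61.5-65.5"]

plt.bar(labels, freqs, width=1.0, edgecolor="black")   # touching bars
plt.xlabel("temperature")
plt.ylabel("frequency")
plt.show()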

Frequency polygon

This is also a graphical representation of a frequency distribution. For each class the class
midpoint is plotted against the frequency and the plotted points joined by means of straight
lines.

The following values are plotted.

midpoint   35.5   39.5   43.5   47.5   51.5   55.5   59.5   63.5   67.5
f          0      4      10     8      15     9      3      1      0

The plot is shown below.

[Frequency polygon (not reproduced): frequency plotted against class midpoints and joined by straight lines, anchored at zero at midpoints 35.5 and 67.5.]

Note: The two plotted values at the lower and upper ends were added to anchor the graph to the horizontal axis. The lower end value is a plot of 0 versus the midpoint of the class below the first (lowest) class (35.5). This midpoint is obtained by subtracting the class width (4) from the midpoint of the lowest class (39.5). The upper end value is a plot of 0 versus the midpoint of the class above the last class (67.5). This midpoint is obtained by adding the class width (4) to the midpoint of the last (highest) class (63.5).

The histogram and frequency polygon are equivalent graphical representations of the pattern
of the frequencies shown in the frequency distribution. It can be shown that the areas under
the histogram and frequency polygon are the same. The total area under the histogram
(frequency polygon) represents the total number of observations in the data set (n).

The ratio

[area under the histogram (frequency polygon) between 2 values] / sample size
= [sum of frequencies between the 2 values] / sample size

is an estimate of the probability (chance) that a value drawn at random from the data set will lie between these two values.

Examples

1 For the frequency distribution in example 2 the estimated chance that a randomly drawn value will be between 45.5 and 57.5 is (8+15+9)/50 = 0.64.

2 For the frequency distribution in example 3 the estimated chance that a randomly drawn value will be greater than 7.5 is (13+9+10+3)/60 = 0.583.

“Less than” ogive

This is the graph of the “less than” cumulative frequencies versus the upper-class boundaries.

Example

“Less than” ogive of the frequency distribution in example 2.

The following values are plotted.

class boundary         37.5   41.5   45.5   49.5   53.5   57.5   61.5   65.5
cumulative frequency   0      4      14     22     37     46     49     50
[Less than ogive (not reproduced): cumulative frequency (0 to 50) plotted against the upper class boundaries 37.5 to 65.5.]

Note: The plotted value at the lower end was added to anchor the graph to the horizontal axis. The lower end value is a plot of 0 versus the upper class boundary of the class below the first (lowest) class (37.5). This upper class boundary is obtained by subtracting the class width (4) from the upper class boundary of the lowest class (41.5).

A percentage “less than” ogive can be plotted by just changing the vertical scale. In this
example the frequencies add up to 50. These frequencies can be converted to percentages, by
multiplying each frequency by 2. To draw the percentage ogive, each cumulative frequency
in the above table will have to be multiplied by 2. The resulting graph is shown below.
Values that have a given percentage of the observations in the data set less than it can be read
off from the ogive.

[Percentage less than ogive (not reproduced): percentage cumulative frequency (0 to 100) plotted against the upper class boundaries.]

The shape of a distribution

The main purpose of drawing a histogram is to describe the clustering pattern of the values in
the data set. For a large sample size, the histogram (frequency polygon) can be well
approximated by a smooth curve (called a frequency curve) that is fitted to the frequencies.
The following patterns of the shape of the frequency curve appear regularly in data sets.

Symmetric bell shape

[Symmetric bell-shaped frequency curve (not reproduced), centred at x = 0.]

This shape is for data sets where many values are in the central portion of the scale, with fewer and fewer values the further away from the center (in both directions). Many data sets have this shape. The graph has a symmetrical appearance, i.e., the two halves on either side of the zero x-value are identical. Examples are

1 Marks obtained in an examination.

2 Heights of a large group of adult males.

3 IQ scores in a large population.

Uniform (rectangular) shape


[Uniform (rectangular) frequency curve (not reproduced): roughly constant frequency across the range of x.]

This shape occurs when all the values in the data set occur approximately the same number of times. Examples are

1 Frequencies of winning numbers of Lotto draws performed a large number of times.

2 Frequencies of winning numbers of roulette games performed a large number of times.

3 Frequencies obtained when tossing an unbiased coin and recording 0 if tails comes up and 1 if heads comes up.

Bimodal shape
[Bimodal frequency curve (not reproduced): frequency against body length (mm), showing two distinct peaks.]

This pattern, which shows two distinct peaks (hence the name bimodal data), appears when there are two subgroups with different sets of values in the same data set.

Examples

1 Measuring the body lengths of ants when there are adults and juveniles together in the
same data set. The two peaks in the curve reflect the fact that juvenile ants have shorter body
lengths than adult ants.

2 Heights of a population of males and females. Since the females are shorter than the
males, the frequency curve will have two peaks. One peak will be located where the most
female heights are concentrated and one where the most male heights are concentrated.

Positive skew shape


[Positively skewed frequency curve (not reproduced).]

This shape shows a high clustering of values at the lower end of the scale and less and less
clustering further away from the lower end towards the upper end.

Example

The time it takes to serve a customer at a supermarket. For most customers the service time is quite short; the longer the service time, the smaller the number of customers.

Negatively skewed shape

[Negatively skewed frequency curve (not reproduced).]

This shape shows a high clustering of values at the upper end of the scale and less and less
clustering further away from the upper end towards the lower end.

Example

Marks in a test where most students did well, but a few performed poorly.

2.4 Measures of central tendency (location)

A measure of central tendency is a value that shows the location on the scale where a data
set is centrally located (most values are clustered around it).

In the calculations a distinction will be made between methods used when the data are in raw
form (values as collected) or grouped form (form of a frequency distribution).

For each of the measures discussed in sections 2.4 and 2.5 the formulas used will be based on
samples selected from the corresponding populations. The measures (statistics) are estimates
of the corresponding population parameters.

2.4.1 Raw data

The mean (average), median and mode

The mean (or average) of a set of data values is the sum of all the data values in the set divided by n, the number of data values. That is

mean = x̄ = (1/n) ∑x.

x̄ is pronounced "x bar".

Example

The marks of seven students in a mathematics test with a maximum possible mark of 20 are given below:

15 13 18 16 14 17 12

mean = x̄ = ∑x/n = (15+13+18+16+14+17+12)/7 = 105/7 = 15.

Median

The median is the value in the data set which is such that half of the values in the data set are
less than or equal to it and half are greater than or equal to it.

For an odd number of values in the data set, the median is the middle value of the data set
when it has been arranged in ascending order. That is, from the smallest value to the largest
value.

If the number of values in the data set is even, then the median is the average of the two
middle values.

Examples

1 The marks of nine students in a geography test that had a maximum possible mark of 50
are given below:

47 35 37 32 38 39 36 34 35

Find the median of this set of data values.

Arrange the data values in order from the lowest value to the highest value:

32 34 35 35 36 37 38 39 47

Since n = 9 is odd, the median is the middle (5th) value, i.e., median = 36.

2 Consider the above data set with the first value (47) omitted.

Arrange the data values in order from the lowest value to the highest value:

32 34 35 35 36 37 38 39

In this case the number of values n = 8, which is an even number. The two middle values in the data set are in positions n/2 = 8/2 = 4 and n/2 + 1 = 5, i.e., the values 35 and 36.

median = (35+36)/2 = 35.5.

Mode

The mode of a set of data values is the value(s) that occurs most often.

Example

Find the mode of the following data set:



48 44 48 45 42 49 48

The mode is 48 since it occurs most often.

Note

1 It is possible for a set of data values to have more than one mode.

2 If there are two data values that occur most frequently, we say that the set of data values is
bimodal e.g., the data set 2, 2, 4, 5, 5, 6 has two modes (2 and 5).

3 If no value in the data set occurs more than once, it has no mode e.g. the data set 4, 5, 7,
9 has no mode.

4 For continuous data, it is possible to have a large data set where no value occurs more than once. For such data, see remark 3 below.
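As an aside (not part of the notes), Python's statistics module computes all three measures; a quick check on the mode example above:

from statistics import mean, median, multimode

marks = [48, 44, 48, 45, 42, 49, 48]
print(mean(marks))        # 46.285714...
print(median(marks))      # 48 (middle value of the sorted marks)
print(multimode(marks))   # [48]; returns every mode for bimodal data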

Comparison of mean, median and mode

1 The mean is used as a measure of central tendency for symmetrical, bell-shaped data that
do not have extreme values (extreme values are called outliers). Outliers are unusually small
or large numbers.

2 The median may be more useful than the mean when there are extreme values in the data
set as it is not affected by the extreme values.

3 The mode is useful when the most common item, characteristic or value of a data set is
required. In such case a smooth function is fitted to the histogram of the data and the mode
defined as the value on the horizontal axis corresponding to the maximum point of the fitted
curve.

Examples

1 The amounts (thousands) for which each of 7 properties were sold are shown below.

280, 390, 412, 555, 698, 725, 2 350



For this data set mean = x̄ = 772.86. This value of the mean is not a central value for the data set (it is greater than all the values except the largest one). The reason for this is that the last value (2 350) has a considerable influence on the value of the mean.

The median = 555 is a value that is more centrally located than the mean. Unlike the mean, the median is not influenced by large values in the data set.

2 For qualitative (non-numerical) data only the mode can be calculated e.g., suppose 10
different rate payers are asked whether they think the percentage increase in rates is
reasonable. They can either agree (A), disagree (D) or be neutral (N) on the issue. Their
responses are shown below.

A, A, D, N, D, A, D, D, N, N.

For this data set the modal response is D (since D occurs more times than the other
responses). It is not possible to calculate a median or a mean for this data set.

The weighted mean

When calculating the mean for raw data, it is usually assumed that all the values in the data set are equally important. If the values are not all considered equally important, the weighted mean (x̄_w) is calculated according to the formula below.

x̄_w = ∑_{i=1}^{r} x_i w_i / ∑_{i=1}^{r} w_i.

In the formula x1, x2, . . ., xr are the values and w1, w2, . . ., wr their respective weights.

Example

The final mark (percentage) in a certain course is based on an assignment mark (which counts
for 10% of the final mark), a test mark (which counts for 30% of the final mark) and an exam
mark (which counts for 60% of the final mark). Calculate the final mark of a student who gets
a 65% assignment mark, a 70% test mark, and a 55% exam mark.

The above formula is applied with x1 = 65, x2 = 70, x3 = 55, w1 = 10, w2 = 30, w3 = 60.

x̄_w = (65*10 + 70*30 + 55*60) / (10 + 30 + 60) = 6050/100 = 60.5.
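An illustrative Python check of this calculation (not part of the notes):

# Weighted mean of the final-mark example.
x = [65, 70, 55]   # assignment, test and exam marks
w = [10, 30, 60]   # their percentage weights

x_w = sum(xi*wi for xi, wi in zip(x, w)) / sum(w)
print(x_w)   # 60.5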

2.4.2 Grouped data

Mean and mode

For grouped data the mean is calculated from the formula below.

x̄ = [∑_{i=1}^{k} x_mid(i) f_i] / n,

where x_mid(i) is the midpoint of the ith class, k the number of classes and n the sample size.

This formula is a special case of the weighted mean formula with w_i = f_i and ∑_{i=1}^{k} w_i = n.

Example

For the frequency distribution of temperatures (example 2 of the frequency distributions), the
mean can be calculated as shown below.

Class boundaries   x_mid(i)   f_i   x_mid(i) f_i
37.5-41.5          39.5       4     158
41.5-45.5          43.5       10    435
45.5-49.5          47.5       8     380
49.5-53.5          51.5       15    772.5
53.5-57.5          55.5       9     499.5
57.5-61.5          59.5       3     178.5
61.5-65.5          63.5       1     63.5
Total                         50    2487

mean = 2487/50 = 49.74.
Mode

m₀ = L + [(f_m − f_{m−1}) / ((f_m − f_{m−1}) + (f_m − f_{m+1}))] · c

Modal class – Class with the highest frequency.

L – Lower class boundary of the modal class.
f_m – Frequency of the modal class.
f_{m−1} – Frequency of the class before the modal class.
f_{m+1} – Frequency of the class after the modal class.
c – Class length.

Example

For the above frequency distribution


m₀ = L + [(f_m − f_{m−1}) / ((f_m − f_{m−1}) + (f_m − f_{m+1}))] · c = 49.5 + [(15−8) / ((15−8)+(15−9))] · 4 = 51.65.
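A small Python sketch (illustrative only) that computes both the grouped mean and this mode formula for the temperature distribution; it assumes the modal class is neither the first nor the last class:

midpoints = [39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]
freqs     = [4, 10, 8, 15, 9, 3, 1]
n = sum(freqs)

grouped_mean = sum(m*f for m, f in zip(midpoints, freqs)) / n   # 49.74

i = freqs.index(max(freqs))   # index of the modal class (49.5-53.5)
L, c = 49.5, 4                # lower boundary and width of the modal class
fm, fm1, fp1 = freqs[i], freqs[i-1], freqs[i+1]
mode = L + (fm - fm1) / ((fm - fm1) + (fm - fp1)) * c           # 51.65...
print(grouped_mean, mode)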

2.5 Measures of variability (variation, spread, dispersion)

Variability refers to the extent to which the values in a data set vary around (differ from) the
associated measure of central tendency.

Example

The performance of 2 different stocks is monitored over a period of 8 days. Their values are
shown in the table below.

day 1 2 3 4 5 6 7 8
A 100 120 110 108 130 106 120 112
B 112 97 88 123 153 84 146 110

The dot plot that follows shows the performance of each stock.

[Dot plots (not reproduced) of stocks A and B on a common scale from 80 to 160.]

The mean values for the two stocks are nearly the same (113.25 for stock A, 114.125 for stock B), but they differ in variability (extent of spread around the mean). Stock B has a far wider spread around the mean than stock A.

2.5.1 Raw data

Range

range = maximum value in data set – minimum value in data set

For the stocks data sets the range = 130-100 =30 (for stock A data set)

= 153-84 = 69 (for stock B data set).

The larger (wider) spread in stock B values is reflected in the larger range (more than twice
that of stock A).

Standard deviation and variance

A measure of deviation is based on the differences between the values in the data set and the mean (x̄). Since ∑_{i=1}^{n} (x_i − x̄) = ∑_{i=1}^{n} x_i − n x̄ = 0 for any data set, a measure of deviation is based on the sum of the squares of these differences.

The variance (denoted by S²) is a measure of variability based on squared differences between the values in the data set and the mean.

variance = S² = ∑_{i=1}^{n} (x_i − x̄)² / (n−1) = [∑_{i=1}^{n} x_i² − (∑_{i=1}^{n} x_i)²/n] / (n−1) = [∑_{i=1}^{n} x_i² − n x̄²] / (n−1).

Note: 1 Division in the above formula is by n−1 and not by n. The reason why this is done is that the variance estimator with division by n−1 has some superior properties for small values of n (to be discussed in a second year module).

2 For large values of n, the answers when dividing by n and n−1 differ very little. For such values of n the variance is calculated by dividing by n.

Unless told to do otherwise, the variance will be estimated by using the formula where division is by n−1.

The variance is expressed in the data units squared. The standard deviation, S = √S², which is the positive square root of the variance, is expressed in the same units as the data.

Example

For stock A the standard deviation is calculated as follows.

x (stock A)   x²
100           10000
120           14400
110           12100
108           11664
130           16900
106           11236
120           14400
112           12544
sum 906       103244

variance = S² = (103244 − 906²/8) / 7 = 91.357.

standard deviation = √91.357 = 9.558.

For stock B the standard deviation is 25.385 (check this using STATMODE).

Interpretation: The stock A values differ (on average) from the mean by 9.558, while stock
B values differ (on average) from the mean by almost 3 times this amount.
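An illustrative Python check of the stock A calculation, using the computational formula with division by n−1:

from math import sqrt

a = [100, 120, 110, 108, 130, 106, 120, 112]   # stock A values
n = len(a)

s2 = (sum(x**2 for x in a) - sum(a)**2 / n) / (n - 1)
print(s2, sqrt(s2))   # 91.357..., 9.558...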

2.5.2 Grouped data

Standard deviation and variance

For grouped data, the raw data formulae for the variance and standard deviation are slightly modified.

variance = S² = ∑_{i=1}^{k} (x_mid(i) − x̄)² f_i / (n−1) = [∑_{i=1}^{k} x_mid(i)² f_i − (∑_{i=1}^{k} x_mid(i) f_i)²/n] / (n−1).

As before, standard deviation = S = √S².

Example

For the frequency distribution of temperatures (example 2 of the frequency distributions), the
variance and standard deviation can be calculated as shown below.

Class boundaries   x_mid(i)   f_i   x_mid(i) f_i   x_mid(i)² f_i
37.5-41.5          39.5       4     158            6241
41.5-45.5          43.5       10    435            18922.5
45.5-49.5          47.5       8     380            18050
49.5-53.5          51.5       15    772.5          39783.75
53.5-57.5          55.5       9     499.5          27722.25
57.5-61.5          59.5       3     178.5          10620.75
61.5-65.5          63.5       1     63.5           4032.25
Total                         50    2487           125372.5

variance = S² = (125372.5 − 2487²/50) / 49 = 34.06367

standard deviation = S = √34.06367 = 5.836.

2.6 Mean deviation

A measure of variation can also be based on the absolute differences between the values and the mean, i.e., |x_i − x̄|.

MD = (1/n) ∑_{i=1}^{n} |x_i − x̄|.

Example

For the stock A data, the mean deviation can be calculated as shown below.

x       |x − 113.25|
100     13.25
120     6.75
110     3.25
108     5.25
130     16.75
106     7.25
120     6.75
112     1.25
Total   60.50

mean = 113.25 and MD = 60.5/8 = 7.5625.

2.7 Coefficient of variation

The standard deviations of 2 data sets that are expressed in different units cannot be
compared directly. Such a comparison can be done by calculating the

coefficient of variation = CV = (S/x̄) * 100%.

Example:

For the temperature data x̄= 49.74 and S=5.836.

For the expenditure data (see example 3 of the frequency distributions) x̄= 7.93333 and

S = 1.65567.

Since the two standard deviations that were calculated above are in different units, they
cannot be compared directly.

For the temperature data CV = 5.836*100/49.74 = 11.733%.

For the expenditure data CV = 1.65567*100/7.93333 = 20.87%.
The coefficient of variation calculations show that in relative terms the variability of the expenditure data set is greater than that of the temperature data set.

2.8 Chebychev’s theorem and bell-shaped data

Chebychev's theorem states that for any data set a proportion of at least 1 − 1/d² of the values lie within d standard deviations of the mean.

Examples

1 Proportion of values that lie within 2 standard deviations of the mean ≥ 1 − 1/2² = 0.75.

2 Proportion of values that lie within 3 standard deviations of the mean ≥ 1 − 1/3² = 0.889.

3 A coffee maker is regulated so that it takes an average of 5.8 min to brew a cup of coffee
with a standard deviation of 0.6 min. What proportion of the time will it take

(a) between 4.8 and 6.8 minutes

(b) less than or equal to 4.8 minutes or greater than or equal to 6.8 minutes

to brew a cup of coffee?

Solution

4.8 − 5.8 = −1

6.8 − 5.8 = 1.

d = 1/0.6 = 1.667 standard deviations.

(a) proportion of time between 4.8 and 6.8 minutes ≥ 1 − 1/1.667² = 1 − 0.36 = 0.64.

(b) From the answer to (a) and following from the fact that

proportion (between 4.8 and 6.8 minutes) + proportion(<= 4.8 minutes or >= 6.8 minutes) =1,

proportion (<= 4.8 minutes or >= 6.8 minutes) ≤ 1-0.64 =0.36.
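An illustrative Python check of part (a):

# Chebychev bound for the coffee-maker example.
mu, sd = 5.8, 0.6
lower, upper = 4.8, 6.8

d = (upper - mu) / sd    # 1.666... standard deviations
bound = 1 - 1 / d**2
print(round(bound, 2))   # 0.64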

The Empirical Rule (bell-shaped distributions)

If it is known that the data set of interest has a bell-shaped clustering pattern of the values,
results that are better than that of Chebychev’s theorem can be obtained. For data with such a
shape

(i) Approximately 68% of data values are within 1 standard deviation of the mean.
(ii) Approximately 95% of data values are within 2 standard deviations of the mean.
(iii) Approximately 99.7% of data values are within 3 standard deviations of the
mean.

Example: Men’s Heights have a bell-shaped distribution with a mean of 69.2 inches and a
standard deviation of 2.9 inches.

Approximately 68% of data values are within 69.2 ± 2.9 = (66.3, 72.1).
Approximately 95% of data values are within 69.2 ± 5.8 = (63.4, 75).
Approximately 99.7% of data values are within 69.2 ± 8.7 = (60.5, 77.9).

2.9 Measures of position – percentiles

2.9.1 Definitions

The ith percentile, Pi, is the value that has i% of the values in a data set less than or equal to it (0 < i ≤ 100).

Examples

1 Median = me = 50th percentile = P50.

2 First quartile = Q1 = 25th percentile = P25.

3 Third quartile = Q3 = 75th percentile = P75.

4 The 9 deciles D1, D2, . . . , D9 are the values that have 10%, 20%, . . . , 90% respectively
of the values in the data set less or equal to them.

D1 = P10, D2 = P20, . . . , D5 = P50 = me, . . . , D9 = P90.

The pth quantile (not to be confused with a quartile), denoted by Qt_p, is the point that divides a data set into two groups with 100p% of the data values below it and 100(1−p)% above it.

Examples

Qt_0.5 = median, Qt_0.25 = first quartile, Qt_0.60 = 6th decile, Qt_0.85 = 85th percentile.

2.9.2 Calculation of quartiles and quartile deviation for raw data

There are many methods (15, according to the article at the website address below) that can be used to calculate the first and third quartiles (Q1 and Q3).

http://jse.amstat.org/v14n3/langford.html

The method explained here is referred to as the “M&M method”.

For raw data the calculations of the first and third quartiles are based on the same principles
as that of the median.

Steps to be followed in calculating the first and third quartiles for raw data.

1 Organize the values in the data set in ascending order in magnitude.

2 Find the median.

3 Divide the data set into 2 portions of equal numbers of values – set 1 consists of those
values less or equal to the median and set 2 consists of those values greater or equal to the
median. When the data set has an odd number of values, the median is excluded from the
division of the data set into 2 portions.

4 The first quartile (Q1) is the median of set 1 and the third quartile (Q3) is the median of
set 2.

Examples

The distances from home to work (kilometers) of 11 employees at a certain company are shown below. Calculate Q1 and Q3.

6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36

Example 1

1 Ordered data set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49

2 Median = 40. After this step the median is deleted from the data set.

3 Set 1 – 5 values less than median i.e., 6, 7, 15, 36, 39.

Set 2 – 5 values greater than the median i.e., 41, 42, 43, 47, 49.

Q1 = median of set 1 = 15, Q3 = median of set 2 = 43.

Example 2 Suppose the data set consists of the above values and 56 (12 values).

6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36, 56

1 Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 56

2 Median = (40 + 41)/2 = 40.5. Unlike what was done in example 1, no values are deleted from
the data set.

3 Set 1 – 6 values less or equal than median i.e., 6, 7, 15, 36, 39, 40

Set 2 – 6 values greater or equal than the median i.e., 41, 42, 43, 47, 49, 56.

4 Q1 = median of set 1 = (15 + 36)/2 = 25.5, Q3 = median of set 2 = (43 + 47)/2 = 45.
Note: Another approach when n is odd is to include the median in both halves when
calculating the two quartiles. Using this approach Q1 will be the median of 6, 7, 15, 36, 39, 40
which is (15 + 36)/2 = 25.5 and Q3 will be the median of 40, 41, 42, 43, 47, 49 which is
(42 + 43)/2 = 42.5. This gives slightly different answers to the approach where the median is
excluded. This approach for calculating the quartiles is also known as the Tukey hinges
method.
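
As a quick check, base R reproduces both sets of answers; a minimal sketch (base R only, no extra packages):

x <- c(6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36)
fivenum(x)                            # 6.0 25.5 40.0 42.5 49.0 - the Tukey hinges method
quantile(x, c(0.25, 0.75), type = 1)  # 15 and 43 - agrees with the M&M answers here

fivenum() returns the minimum, the two hinges, the median and the maximum, while quantile() offers nine different definitions through its type argument, which is why different software can report slightly different quartiles for the same data.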

The quartile deviation Q = (Q3 − Q1)/2 can also be used as a measure of variability.
For the data set in example 1, quartile deviation = Q = (43 – 15)/2 = 14.

The quartile deviation value shows the extent to which the values in the data set deviate from
the median. For a skewed data set (heavy clustering at the lower or upper end of the scale) the
quartile deviation is a more appropriate measure of variability than the standard deviation
(which is more suitable as a measure of variability for symmetric data sets).

2.9.3 Calculation of median, quartiles and percentiles for grouped data

A formula for calculating the ith percentile Pi for grouped data is shown below.

Pi = Li + c(n×i/100 − Fless)/fi ,  i = 1, 2, . . . , 100.

Percentile class – class that contains the percentile that is calculated.

Li = lower class boundary of percentile class.



fi = frequency of percentile class

n – sample size

Fless – Sum of frequencies of classes less than percentile class.

c – class width.

Examples

For the frequency distribution of temperatures (example 2 of the frequency distributions –


table given below), the calculations of the median, first quartile, third quartile, 4th decile and
65th percentile are shown below.

class
boundaries f cumulative frequency
37.5-41.5 4 4
41.5-45.5 10 14
45.5-49.5 8 22
49.5-53.5 15 37
53.5-57.5 9 46
57.5-61.5 3 49
61.5-65.5 1 50
Total 50

1 Median.

The above formula with i = 50, n = 50 applies.

Step 1: Calculate position of median = i×n/100 = 50×50/100 = 25.

Step 2: Median class (class that contains 25th observation) is the class 49.5-53.5.

Step 3: L50 = 49.5, f50 =15, Fless= 22, c = 4.

Step 4: Substitute into the above formula.

Median = 49.5 + (25 − 22)×4/15 = 50.3.

First quartile

The above formula with i = 25, n = 50 applies.

Step 1: Calculate position of first quartile = i×n/100 = 25×50/100 = 12.5.

Step 2: First quartile class (class that contains 12.5th observation) is the class 41.5-45.5.

Step 3: L25 = 41.5, f25 =10, Fless= 4, c = 4.

Step 4: Substitute into the above formula.

Q1 = 41.5 + (12.5 − 4)×4/10 = 44.9.

Third quartile

The above formula with i = 75, n = 50 applies.

Step 1: Calculate position of third quartile = i×n/100 = 75×50/100 = 37.5.

Step 2: Third quartile class (class that contains 37.5th observation) is the class 53.5-57.5.

Step 3: L75 = 53.5, f75 = 9, Fless= 37, c = 4.

Step 4: Substitute into the above formula.

Q3 = 53.5 + (37.5 − 37)×4/9 = 53.72.

Fourth decile

The above formula with i = 40, n = 50 applies.

Step 1: Calculate position of 4th decile = i×n/100 = 40×50/100 = 20.

Step 2: 4th decile class (class that contains 20th observation) is the class 45.5-49.5.

Step 3: L40 = 45.5, f40 = 8, Fless= 14, c = 4.

Step 4: Substitute into the above formula.

D4 = 45.5 + (20 − 14)×4/8 = 48.5.

65th Percentile

The above formula with i = 65, n = 50 applies.

Step 1: Calculate position of 65th percentile = i×n/100 = 65×50/100 = 32.5.

Step 2: 65th percentile class (class that contains 32.5th observation) is the class 49.5-53.5.

Step 3: L65 = 49.5, f65 = 15 , Fless= 22, c = 4.

Step 4: Substitute into the above formula.

P65 = 49.5 + (32.5 − 22)×4/15 = 52.3.
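
The four steps above can be collected into a small R function; this is a sketch of the grouped-data formula (the function and argument names are our own, not standard R):

grouped_pctl <- function(i, lower, f, width, n) {
  cf <- cumsum(f)                  # cumulative frequencies
  pos <- n * i / 100               # step 1: position of the percentile
  k <- which(cf >= pos)[1]         # step 2: percentile class
  Fless <- if (k == 1) 0 else cf[k - 1]
  lower[k] + width * (pos - Fless) / f[k]   # steps 3 and 4
}

lower <- c(37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5)
f <- c(4, 10, 8, 15, 9, 3, 1)
grouped_pctl(50, lower, f, width = 4, n = 50)   # 50.3 (median)
grouped_pctl(65, lower, f, width = 4, n = 50)   # 52.3 (65th percentile)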

Percentiles can also be read off from a “less than” ogive.

Example

The following cumulative frequency graph shows the distribution of marks scored by a class
of 40 students in a test.

From the graph Q1 = 36, Me = 44, Q3 =52.

2.10 Five number summary and Box-and-Whisker plot

Five number summary

A five number summary of a data set is a summary using the minimum, 1st quartile, median,
3rd quartile and maximum as summary measures.

The five number summary shows the following types of information.

type                value(s)

central tendency    median

deviation           quartile deviation = Q = (Q3 − Q1)/2

extremes            minimum and maximum

Example

The IQ’s of 13 people are shown below.

92, 104, 93, 98, 112, 145, 88, 90, 104, 119, 101, 95, 154

minimum = 88, Q1 = 93, median = 101, Q3 = 112, maximum = 154.

In the above the 1st and 3rd quartiles were calculated according to the Tukey hinges
definitions.

The difference 2Q = Q3 − Q1 is known as the interquartile range and Q (above) as the semi-
interquartile range.

Box-and-Whisker plot

This is a graphical representation of the following 5 numbers.

lower whisker, 1st quartile, median, 3rd quartile, upper whisker

The “box” portion of this graph has the 1st and 3rd quartiles defining its lower and upper
limits and the median plotted at its position in between these limits.

The whiskers can be calculated by following these steps.

Step 1

Calculate the interquartile range 2Q = Q3 − Q1, Q* = Q1 − 1.5(2Q) = Q1 − 3Q and
Q** = Q3 + 1.5(2Q) = Q3 + 3Q.

Step 2

If Q* ≤ minimum then lower whisker = minimum


else lower whisker = smallest value in data set ≥ Q*.

Step 3

If Q** ≥ maximum then upper whisker = maximum


else upper whisker = largest value in data set ≤ Q**.

Example

For the IQ data set (see previous example) Q1 = 93, median = 101 and Q3 = 112 define the
“box” portion.

Q = (112 − 93)/2 = 9.5, Q* = 93 − 3×9.5 = 64.5, Q** = 112 + 3×9.5 = 140.5.

Since Q*=64.5 < minimum = 88, lower whisker = 88.

Since Q** = 140.5 < maximum = 154 and Q** = 140.5 < 2nd largest = 145, there are 2
outliers and upper whisker = 119 (largest value less than or equal to 140.5).

In the plot the maximum = 154 and 2nd largest = 145 are shown as separate points (outliers)
above the upper whisker.
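
The whisker and outlier calculations for the IQ data can be verified in base R (a sketch; boxplot() applies the same 1.5 × interquartile-range fence by default):

iq <- c(92, 104, 93, 98, 112, 145, 88, 90, 104, 119, 101, 95, 154)
h <- fivenum(iq)                          # 88 93 101 112 154 (Tukey hinges)
fences <- c(h[2] - 1.5 * (h[4] - h[2]),   # Q*  = 64.5
            h[4] + 1.5 * (h[4] - h[2]))   # Q** = 140.5
iq[iq < fences[1] | iq > fences[2]]       # outliers: 145 154
boxplot(iq)                               # whiskers drawn at 88 and 119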

Unusually large or small values (such as this maximum) are called outliers.

A Box-and-Whisker plot can also be used to assess the skewness (departure from symmetry)
of a variable. For positively skewed data most of the values are at the lower end of the scale
(mean > median, “box” section of the plot towards the lower end of the scale) and for
negatively skewed data most of the values are at the upper end of the scale (mean < median,
“box” section of the plot towards the upper end of the scale). In the previous example the data
set is positively skewed.

When several data sets are to be compared, several Box-and-Whisker plots can be plotted
side-by-side.

Example

The Box-and-Whisker plot shown below enables one to compare delays in departing flights
(in minutes) for certain days in December (16th to the 26th).

For all the days the data sets are positively skewed (data sets all have the “box” section closer
to the lower end of the scale with a long upper whisker). This means that there are short
delays in flight departures on all the days. The long upper whiskers that are visible show that
there were some quite late departures on 16, 17, 21, 22, 23, 24 and 25 December.

2.11 Skewness

Consider the following data

x1 =0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9,
10
x2 = 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8,
9, 10
x3 = 0, 1, 2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10,
10,10

The formula used to calculate the skewness is


skewness = [n / ((n − 1)(n − 2)S³)] Σ (xi − x̄)³ for n > 2, where S is the sample standard deviation.

Using the R software package, skewness(x1) = 0, skewness(x2) = 0.6204
and skewness(x3) = −0.9751.
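
A few lines of base R implement the formula above directly (a sketch; note that packaged skewness() functions have several "type" options, so their values can differ slightly from this estimator):

skew <- function(x) {
  n <- length(x)
  n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / sd(x))^3)
}
x1 <- rep(0:10, c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1))   # the symmetric data set above
skew(x1)                                              # 0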

Skewness – summary table.

Variable   Skewness   Boxplot (see below)                         Comment

x1         0          Median halfway between the 2 quartiles,     Symmetric.
                      whiskers of equal length.

x2         0.6204     Box at the lower end of the scale, median   Positively skewed.
                      closer to Q3, upper whisker longer than     Clustering at lower
                      the lower one.                              end of scale.

x3         -0.9751    Box at the upper end of the scale, median   Negatively skewed.
                      closer to Q3, lower whisker longer than     Clustering at upper
                      the upper one.                              end of scale.

[Boxplots and histograms of x1, x2 and x3]

2.12 Computer output

1 The following is the output of computer calculations (using excel) of the various measures
of description for the data set in example 3 in section 2.3.

Mean                7.939874
Standard Error      0.217106
Median              7.886565
Standard Deviation  1.681697
Sample Variance     2.828106
Skewness            -0.02094
Range               6.41369
Minimum             4.90014
Maximum             11.31383
Sum                 476.3924
Count               60

2 The following is a stem-and-leaf plot for the data (rounded to 1 decimal place) in example
3 in section 2.3 obtained from SPSS. The pattern of the plot is bimodal.

expend Stem-and-Leaf Plot



Frequency Stem & Leaf

1.00 4 . 9
11.00 5 . 01145556899
4.00 6 . 5788
15.00 7 . 001223444577788
12.00 8 . 011133455888
8.00 9 . 02335668
8.00 10 . 11113379
1.00 11 . 3

Stem width: 1.00000


Each leaf: 1 case(s)

Chapter 3 – Probability
3.1 Terminology

Probability (Chance)

A probability is the chance that something of interest (called an event) will happen.

A probability is usually expressed as a proportion (range of values from 0 to 1) but can also
be expressed as a percentage (range of values from 0 to 100).

Examples

1 The probability (chance) of rain tomorrow is 0.40 (40%).

2 The probability of winning the Lotto is 1/13 983 816 (0.00000715%).

3 The probability of a certain new product being successful is 0.75 (75%).

Random experiment

This is an experiment that gives different outcomes when repeated under similar conditions.

1 The experiment can have more than one possible outcome.

2 All possible outcomes can be listed.

3 The outcome that will occur when the experiment is performed depends on chance.

Examples

1 Tossing a coin (possible outcomes: heads, tails).

2 Rolling a die (possible outcomes: 1, 2, 3, 4, 5, 6).

3 Asking a person to assign a rating to a product (possible outcomes: A, B, C, D, E).



4 Drawing a card from a deck of cards (possible outcomes: 13 hearts, 13 clubs, 13 spades,
13 diamonds).

Set

A set is a collection of outcomes.

Sample space

The sample space is the set of all possible outcomes of a random experiment.

A sample space is usually denoted by the symbol S and the collection of elements contained
in S enclosed in curly brackets { }.

Sample point

A sample point is an individual outcome (element) in a sample space.

Examples

1 Tossing a single coin. S = {h, t}.

2 Tossing a die. S = {1, 2, 3, 4, 5, 6}.

3 Tossing a pair of dice


S=
{ (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}.

4 Tossing two coins.

S = {hh, ht, th, tt}.

5 Drawing a card from a deck of cards. The elements in the sample space are listed below.

Diamonds (Red): 2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦ A♦


Hearts (Red): 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥ A♥
Clubs (Black): 2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣ A♣
Spades (Black): 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠

Each outcome listed in the above examples is a sample point.

Event

An event is a subset of a sample space i.e., a collection of sample points taken from a sample
space.

Impossible event

An impossible event is an event that cannot happen (has probability zero).

Certain event

A certain event is an event that is sure to happen (has probability 1).

Simple events are events that involve only one outcome at a time.

Examples

1 Let E denote the event “an odd number is obtained when tossing a single die”.

Then E = {1, 3, 5}.

2 Let H denote the event “at least one head appears when tossing two coins”.

H = {hh, ht, th}. At least one means “one or more”.

3 Let C denote the event “at most one head appears when tossing two coins” and D the
event “at least one head appears when tossing two coins”.

C = {tt, ht, th}. At most one means “one or less”.


D = {ht, th, hh}. At least one means “one or more”.

4 Let B denote the event “obtaining a club and a heart in a single draw from a deck of
cards”. The event B is impossible. The set of outcomes of B is an empty set denoted by
B = { } = φ.

5 Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The event
A is a certain event i.e., one of the outcomes belonging to the set describing the event must
happen. This is denoted by A = S, where S is the sample space.

The events E, H, C, D, B and A above are all examples of simple events.

Venn diagrams

A Venn diagram is a drawing, in which circular areas represent groups of items usually
sharing common properties.

The drawing consists of two or more circles, each representing a specific group or set,
contained within a square that represents the sample space. Venn diagrams are often used as a
visual display when referring to sample spaces, events and operations involving events.

3.2 Complement, Union and Intersection of events



Compound events are events that involve more than one event. Such events can be obtained
by performing various operations involving two or more events.

Some of the operations that can be performed are described in the sections that follow.

Complementary events

The complementary event Ā of an event A is all the outcomes in S that are not in A (purple
part of the diagram below).

Examples

1 Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement of
the event A = “obtaining a 3 or less” = {1, 2, 3} is Ā = “obtaining a 4 or more” = {4, 5, 6}.
2 Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of the
event H = “at least one head”= {hh, ht, th} is H̄= “no heads” = {tt}.

Union and intersection of events

The union of two events A and B denoted by A ∪B is the set of outcomes that are in A or
in B or in both A and B i.e., the event “either A or B or both A and B occur”. The event
A ∪B can also be interpreted as the event “at least one of A or B occurs”.

The intersection of two events A and B denoted by A ∩B is the set of outcomes that are in
both A and B i.e., the event “both A and B occur”.

The Venn diagrams below show the sets A ∪B and A ∩B .

These definitions involving two events can be extended to ones involving 3 or more events
e.g., for the 3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2
or A3 occurs” and A1 ∩ A2 ∩ A3 the event “A1 and A2 and A3 occur”.

Ā∩B is the event “a sample point is in B but not in A”.

A∩B̄ is the event “a sample point is in A but not in B”.

Examples

1 Consider the events A = {1, 3, 6, 7, 8} and B = { 2, 3, 5, 7, 9} defined on a sample space


S = {1, 2, 3, . . . , 10}.

A ∪B = {1, 2, 3, 5, 6, 7, 8, 9} , A ∩B = { 3, 7}, Ā∩B = {2, 5, 9}, A∩B̄ = {1, 6, 8}.


2 Let C be the event “drawing a face card from a deck of cards” and A the event “drawing a
king or an ace from a deck of cards”.

C = {J diamonds, Q diamonds, K diamonds, J hearts, Q hearts, K hearts,


J spades, Q spades, K spades, J clubs, Q clubs, K clubs}

A = {A diamonds, A hearts, A spades, A clubs, K diamonds, K hearts, K spades, K clubs}.

C∪ A = {J diamonds, Q diamonds, K diamonds, J hearts, Q hearts, K hearts,


J spades, Q spades, K spades, J clubs, Q clubs, K clubs,
A diamonds, A hearts, A spades, A clubs}.

C ∩ A = { K diamonds, K hearts, K spades, K clubs}.



Mutually (disjoint) exclusive events

Two events A and B are mutually exclusive (disjoint) if they have no elements (outcomes) in
common. This also means that these events cannot occur together.

Examples

1 Let B be the event “drawing a black card from a deck of cards” and

R the event “drawing a red card from a deck of cards”.

The events B and R have no outcomes in common i.e. , B∩R=φ (empty set). Hence B and
R are mutually exclusive.

2 Let E be the event “an even number with a single throw of a die” and O the event “an odd
number with a single throw of a die”.

E = {2, 4, 6} and O = {1, 3, 5}.

E and O have no outcomes in common i.e., E∩O=φ and are therefore mutually exclusive.

3.3 Definitions of probability

Classical definition of probability

If a random experiment has n equally likely outcomes of which m are favourable to an event
A, then the probability of occurrence of the event A, denoted by P(A), is given by

P(A) = N(A)/N(S) = m/n,

where N(A) = m is the number of outcomes favourable to the event A and N(S) = n the
number of outcomes in the sample space S i.e., the total number of outcomes.

Note: Since N(A) ≥0 and N(A) ≤ N(S), 0 ≤ P(A) ≤ 1.



Examples

1 Two coins are tossed. Find the probability of getting

(i) exactly two heads,


(ii) at least one head.

Solution: Here, S ={hh, ht, th, tt} .


(i) Let A = getting exactly two heads = {hh}
∴ P(A) = ¼.
(ii) Let B = getting at least one head = {hh, ht, th}
∴ P(B) = ¾.

2 Two dice are rolled. Find the probability that a sum of 7 will occur.

Solution: The number of sample points in S is 36 (see example 3 under sample space).

Let A = “a sum of 7 will occur”.

= {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}

P(A) = 6/36 = 1/6.

The classical definition of probability requires the assumption that all the outcomes in the
sample space are equally likely. If this assumption is not met, this formula cannot be used.

Example

The possible temperatures (degrees Celsius) at a certain location on a particular day are

21, 22, 23, 24, 25, 26, 27.

P(temperature = 22) = 1/7 would be incorrect if all the temperature values are not equally likely
e.g., suppose that over the past year these temperatures occurred the following numbers of
times.

Temperature       21        22        23        24        25        26        27        Total
Number of days    15        16        20        25        19        21        14        130
Estimated Prob.   0.115385  0.123077  0.153846  0.192308  0.146154  0.161538  0.107692

Relative frequency (empirical) definition of probability

If an experiment is repeated n times and an event A is observed f times, then the estimated
probability of occurrence (empirical probability) of an event A is given by

P(A) = f/n.

Note: This formula differs from the classical formula in the sense that the classical formula
uses all the outcomes in the sample space as the total number of outcomes, while the relative
frequency formula uses the number of repetitions (n) of the experiment as the total number of
outcomes. In the classical formula the number of outcomes in the sample space is fixed,
while the number of repetitions of an experiment (n) can vary. It can be shown that the
empirical probability is a good approximation of the true probability when n is sufficiently
large (Law of large numbers).
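
The law of large numbers is easy to illustrate by simulation; a minimal sketch in R (assuming a fair coin, so the running relative frequency should settle near 0.5):

set.seed(1)                                           # for a reproducible run
tosses <- sample(c("h", "t"), 10000, replace = TRUE)
rel_freq <- cumsum(tosses == "h") / seq_along(tosses) # estimate after each toss
rel_freq[c(10, 100, 10000)]                           # estimates approach 0.5 as n grows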

Examples

1 A bent coin is tossed 1000 times with heads coming up 692 times.

An estimate of P(heads) is 692/1000 = 0.692.

2 A summary of the final marks in a certain statistics course is shown below.

mark f
less than 30 6
30-39 26
40-49 45
50-59 64
60-69 82
70-79 37
80-89 22
90-99 8
Total 290

From the table (using the empirical formula) the following probabilities can be estimated.

(a) P(mark less than 40) = (26 + 6)/290 = 0.110.

(b) P(pass) = (64 + 82 + 37 + 22 + 8)/290 = 1 − (6 + 26 + 45)/290 = 213/290 = 0.73.

(c) P(above 80) = (22 + 8)/290 = 0.103.

Probabilities involving the occurrence of single events are called marginal probabilities.
Probabilities involving the occurrence of two or more events are called joint probabilities
e.g., P(A∩B) is the joint probability of both A and B occurring.

Example

The preference probabilities according to gender for 2 different brands of a certain product
are summarized in the table below.

gender/brand            1      2      marginal probability

male                    0.20   0.32   0.52
female                  0.40   0.08   0.48
marginal probability    0.60   0.40   1

Joint probabilities: P(male ∩ brand 1) = 0.20, P(male ∩ brand 2) = 0.32,

P(female ∩ brand 1) = 0.40, P(female ∩ brand 2) = 0.08.

Marginal probabilities: P(male) = 0.52, P(female) = 0.48, P(brand 1) = 0.60,

P(brand 2) = 0.40.

The gender marginal probabilities are obtained by summing the joint probabilities over the
brands. The brand marginal probabilities are obtained by summing the joint probabilities over
the genders.

3.4 Counting formulae

The computation of probabilities using the classical definition involves counting the number
of outcomes favourable to the event of interest (say event A) and the total number of possible
outcomes in the sample space. The following formulae can be used to count numbers of
outcomes to be used in the classical definition formula.

Addition and Multiplication formulae for counting

Addition formula: If an experiment can be performed in n ways, and another experiment can
be performed in m ways then either of the two experiments can be performed in (m+n) ways.

This rule can be extended to any finite number of experiments. If one experiment can be done
in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then one of the k
experiments can be done in n1 + n2 +. . . + nk ways.

Example: Suppose there are 3 doors in a room, 2 on one side and 1 on other side. A person
wants to go out of the room. Obviously, he/she has 3 options to go out. He/she can go out by

door A or door B or door C.

Multiplication formula: If an experiment can be done in m ways and another experiment


can be done in n ways, then both the experiments can be done in m x n ways.

This rule can be extended to any finite number of experiments. If one experiment can be done
in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then the k experiments
together can be done in n1* n2 * . . . * nk ways.

Example 1: A basic meal consists of soup, a sandwich and a beverage. If a person having this
meal has 3 choices of soup, 4 choices of sandwiches and a choice of coffee or tea as a
beverage, how many such meals are possible?

Choosing soup (experiment 1) has 3 possibilities.


Choosing a sandwich (experiment 2) has 4 possibilities.
Choosing a beverage (experiment 3) has 2 possibilities.

Number of choices of meals = 3 x 4 x 2 = 24.

Example 2: A PIN to be used at an ATM can be formed by selecting 4 digits from the digits
0, 1, 2, . . . , 9 . How many choices of PIN are there if

(a) digits may be repeated?


(b) digits may not be repeated?

(a) First digit – 10 choices, second digit – 10 choices,


third digit – 10 choices, fourth digit – 10 choices.

number of choices = 10 x 10 x 10 x 10 = 10⁴ = 10 000.

(b) First digit – 10 choices, second digit – 9 choices,


third digit – 8 choices, fourth digit – 7 choices.

number of choices = 10 x 9 x 8 x 7 = 5040.

Factorial notation

In how many ways can n (n – integer) objects be arranged in a row?

Let n =2: 1st object – 2 choices


2nd object – 1 choice.

Number of ways = 2 x 1 = 2.

Let n = 3: 1st object – 3 choices


2nd object – 2 choices.
3rd object – 1 choice.

Number of ways = 3 x 2 x 1 = 6.

In general the number of ways is n x (n-1) x (n-2) x . . . x 2 x 1 = n! (n factorial)

Using this notation 2 x 1 = 2! = 2


3 x 2 x 1 = 3! = 6
4 x 3 x 2 x 1 = 4! = 24 etc.

Note: 1! = 1, 0! = 1.

The factorial notation is used in counting formulae.

Examples

1 In how many ways can 7 people be placed in a queue at a bus stop?

The 7 people are placed in the 7 positions from 1st to 7th.

no of ways = 7 x 6x 5 x . . . x 2 x 1 = 7! = 5040.

2 In how many ways can 5 books be arranged in a row?

no of ways = 5 x 4 x 3 x 2 x 1 = 5! = 120.

Permutations and combinations

A permutation is the number of different arrangements of a group of items where order


matters.

The number of permutations of n objects taken r at a time is calculated from

nPr = P(n, r) = n!/(n−r)!.

A combination is the number of different selections of a group of items where order does
not matter.

The number of combinations of a group of n objects taken r at a time is calculated from

nCr = C(n, r) = n!/[(n−r)! r!].

Examples: 1 Four people (A, B, C, D) serve on a board of directors. A chairman and vice-
chairman are to be chosen from these 4 people. In how many ways can this be done?

Chairman    Vice-chairman
A           B
B           A
A           C
C           A
A           D
D           A
B           C
C           B
B           D
D           B
C           D
D           C

Number of ways = 4 × 3 = 12.

2 Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from
them as members of a committee that will investigate fraud allegations. In how many ways
can this be done?

People chosen: A and B, A and C, A and D, B and C, B and D, C and D.

Number of ways = 6.

In both these examples a choice of 2 people from 4 people is made. In example 1 the order of
choice of the 2 people matters (since the one person chosen is chairman and the other one
vice-chairman). In example 2 the order does not matter. The only interest is in who serves on
the committee.

Application of formulae.

In question 1 the permutations formula applies with n = 4, r =2.

Number of ways = P(4, 2) = 4!/(4−2)! = 12.
In question 2 the combinations formula applies with n = 4, r =2.

Number of ways = C(4, 2) = 4!/[2!(4−2)!] = 6.

3 Find the number of ways to take 4 people and place them in groups of 3 at a time where
order does not matter.

Solution:
Since order does not matter, use the combination formula.
C(4,3) = 4!/[3!(4−3)!] = 24/6 = 4.
4 Find the number of way to arrange 6 items in groups of 4 at a time where order matters.
Solution:
P(6,4) = 6!/(6−4)! = 720/2 = 360.
There are 360 ways to arrange 6 items taken 4 at a time when order matters.
5 Find the number of ways to take 20 objects and arrange them in groups of 5 at a time
where order does not matter.
Solution:
C(20,5) = 20!/[5!(20−5)!] = (20×19×18×17×16)/(1×2×3×4×5) = 15504.
There are 15 504 ways to arrange 20 objects taken 5 at a time when order does not matter.
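In R, choose() and factorial() are built in, and a permutation helper is one line; a minimal sketch checking examples 3 to 5:

perm <- function(n, r) factorial(n) / factorial(n - r)   # nPr = n!/(n-r)!
choose(4, 3)    # 4      (example 3)
perm(6, 4)      # 360    (example 4)
choose(20, 5)   # 15504  (example 5)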
6 Determine the total number of five-card hands that can be drawn from a deck of 52 cards.
Solution:
When a hand of cards is dealt, the order of the cards does not matter. Thus, the combinations
formula is used.

There are 52 cards in a deck, and we want to know in how many ways we can draw them in
groups of five at a time when order does not matter. Using the combination formula gives

C(52,5) = 2 598 960.


7 There are five women and six men in a group. From this group a committee of 4 is to be
chosen. In how many ways can a committee be formed that contain at least three women?
Solution
Possibilities: 3 women, 1 man. Number of ways = C(5,3) x C(6,1) = 10 x 6 = 60
4 women, no men. Number of ways = C(5,4) = 5
Total number of ways = 60 + 5 = 65.
8 In how many ways can a phone number consisting of 6 digits, where the first digit is not a
0, be chosen from the digits 0, 1, 2, 3, . . . , 9 if no digits are to be repeated?

Solution

First digit: 9 choices.


Next 5 digits: P(9, 5) = 15 120.
Number of choices = 9 × 15 120 = 136 080

9 In how many ways can the 6 winning numbers in a Lotto draw be selected?

C(49,6) = 13 983 816.


10 In how many ways can a five-card hand consisting of three eight's and two sevens be
dealt?
Solution
The combination formula is used (why?).

We have 4 eights and 4 sevens.


We want 3 eights and 2 sevens.
C(have 4 eights, want 3 eights)*C(have 4 sevens, want 2 sevens)
= C(4,3)*C(4,2) = 24.

Therefore, there are 24 different ways in which to deal the desired hand.
11 How many different 5-card hands include 4 of a kind and one other card?
Solution:
We have 13 different ways to choose 4 of a kind: 2's, 3's, 4's, … Queens, Kings and Aces.
Once a set of 4 of a kind has been removed from the deck, 48 cards are left.
Remember OR means add.
The possible situations that will satisfy the above requirement are:
4 Aces and one other card C(4,4)*C(48,1) = 48.
or 4 Kings and one other card C(4,4)*C(48,1) = 48.
or 4 Queens and one other card C(4,4)*C(48,1) = 48.
.
.
.
or 4 twos and one other card C(4,4)*C(48,1) = 48.
Total of 48*13 = 624 ways.
12 A local delivery company has three packages to deliver to three different houses.
If the packages are delivered at random to the three houses, in how many ways can at least
one house get the wrong package?

Solution

The first package can be delivered to any of 3 houses.

Given that the first package was delivered, the second package can be delivered to any of 2
houses.

Given that the first two packages were delivered, the third package can be delivered to only
one house.

The 3 packages together can be delivered to the houses in 3*2*1 = 3! = 6 ways.

There is only one way in which all 3 packages can be delivered to the correct house.

The event “at least one house gets the wrong package” is the complement of the event
“all 3 packages are delivered to the correct house” (why?).

The number of ways at least one house gets the wrong package is therefore

6 - 1 = 5.

3.5 Basic probability formulae

Complementary events

For any event A defined on some sample space,



P( Ā ) = 1 – P(A).

Union of two or more events

For any two events A and B defined on some sample space,

P( A∪B )= P(A) + P(B) for mutually exclusive events.

= P(A) + P(B) −P( A∩B ) for events that are not mutually exclusive.
Proof: P(A∪B) = P(Ā∩B) + P(A∩B) + P(A∩B̄)
= [P(B) − P(A∩B)] + P(A∩B) + [P(A) − P(A∩B)]
= P(A) + P(B) − P(A∩B).
These formulae can be extended to probabilities involving more than two events e.g.,
for 3 events A, B and C defined on some sample space

P( A∪B∪C ) = P(A) + P(B) + P(C) for mutually exclusive events.

= P(A) + P(B) + P(C) – P(A∩B) – P(A∩C) – P(B∩C) + P(A∩B∩C)

for events that are not mutually exclusive.

This formula can easily be verified with the aid of the Venn diagram shown below.

From the above diagram the following sets can be written down.

A = {1, 2, 4, 5 }; B = {2, 3, 5, 6} ; C = {4, 5 ,6, 7} ; A ∩B = {2, 5} ; A ∩C = {4, 5 };


B∩C = {5, 6}; A ∩B ∩C = {5}; A ∪B ∪C = {1, 2, 3, 4, 5, 6, 7, 8}.

Exercise: Complete the verification of the result for P( A∪B∪C ) .


Note: This is not a proof of the formula. It is verification that the formula does work.

The result for P( A∪B∪C ) can also be theoretically proved by applying the result for
P( A∪B ) (for non mutually exclusive events) more than once.

Exercise: Prove the result for P( A∪B∪C ) .

De Morgan’s Laws
1 P(Ā ∩ B̄) = P((A∪B)′)

2 P(Ā ∪ B̄) = P((A∩B)′)

The two above results can also be written in a different notation as

1 P(A′ ∩ B′) = P((A∪B)′) and 2 P(A′ ∪ B′) = P((A∩B)′) respectively, where ′ is read
as “the complement of”.

Venn diagram verification of second result

Exercise: Verify the first result by using Venn diagrams.

Total probability formulae

P(A) = P( A∩B )+P( A∩B̄ )



P(B) = P( A∩B )+P( Ā∩B )

These formulae can be verified from the Venn diagram shown below.

The above formulae can be extended to probabilities involving more than two events.

Examples

1 There are two telephone lines A and B. Line A is engaged 50% of the time and line B is
engaged 60% of the time. Both lines are engaged 30% of the time. Calculate the probability
that

(a) at least one of the lines is engaged.


(b) none of the lines are engaged.
(c) line B is not engaged.
(d) line A is engaged, but line B is not engaged.
(e) only one line is engaged.

Solution

Let E1 denote the event “line A is engaged” and E2 the event “line B is engaged”.

Given: P(E1) = 0.5, P(E2) = 0.6, P(E1∩E2) = 0.3.

(a) P(at least one of the lines is engaged) = P(E1∪E2) = P(E1) + P(E2) – P(E1∩E2)

= 0.5 + 0.6 – 0.3 = 0.8

(b) P(none of the lines is engaged) = 1 – P(at least one of the lines is engaged) = 1 − 0.8
= 0.2.

(c) P(B not engaged) = 1 – P(B engaged) = 1 – P(E2) = 1-0.6 = 0.4.

(d) The event “line A is engaged, but line B is not engaged” can be written in symbols as

P(E1 ∩ Ē2) = P(E1) – P(E1 ∩ E2) = 0.5 − 0.3 = 0.2. (Using the total probability formula)

(e) P(only one line is engaged) = P(line A is engaged, but line B is not engaged) +
P(line B is engaged, but line A is not engaged)

= P(E1 ∩ Ē2) + P(Ē1 ∩ E2)

P(Ē1 ∩ E2) = P(E2) − P(E1 ∩ E2) = 0.6 − 0.3 = 0.3. (Using the total probability formula)

P(only one line is engaged) = 0.2 + 0.3 = 0.5

2 Let O be the event that a certain lecturer will be in his/her office on a particular afternoon
and L the event that he/she will be at a lecture. Suppose P(O) = 0.48 and P(L) = 0.27.

(a) State in words the event Ō∩ L̄ .

(b) Calculate P(Ō∩ L̄ ).

Solution

(a) Ō is the event that the lecturer will not be in his/her office on a particular afternoon.

L̄ is the event that the lecturer will not be at a lecture.

Ō∩ L̄ is the event that the lecturer will not be in his/her office and that the lecturer will not
be at a lecture i.e., that the lecturer will be neither in his/her office nor at a lecture.
(b) P(Ō ∩ L̄) = P((O∪L)′) = 1 – P(O∪L) (Using De Morgan’s law and the complementary
probability formula)

= 1 – [ P(O) + P(L) ] (Events O and L are mutually exclusive)

= 1 – 0.48 – 0.27 = 0.25.

3 A batch of 20 computers contains 3 that are faulty. Four (4) computers are selected at
random without replacement from this batch. Calculate the probability that

(a) all 4 the computers selected are not faulty.


(b) at least 2 of the computers selected are faulty.

Solution

There are C(20,4) = 4845 [why not P(20,4) ?] ways of selecting the 4 computers from the
batch of 20. Since random selection is used, all 4845 selections are equally likely. Let A
denote the event “all 4 the computers selected are not faulty” and B the event “at least 2 of
the computers selected are faulty”

Using the classical probability result,

(a) P(A) = N(A)/N(S) = C(17,4)/C(20,4) = 2380/4845 = 0.4912.

(b) P(B) = N(B)/N(S) = [N(2 faulty) + N(3 faulty)]/4845 = [C(17,2)×C(3,2) + C(17,1)×C(3,3)]/4845

= (136×3 + 17×1)/4845 = 425/4845 = 0.0877.

3.6 Conditional probability

The conditional probability of an event A occurring given that another event B has occurred
is given by

P(A|B) = P(A∩B)/P(B), where P(B) > 0.

Also, P(B|A) = P(A∩B)/P(A), where P(A) > 0.

Examples

Example 1

Five hundred (500) TV viewers consisting of 300 males and 200 females were asked whether
they were satisfied with the news coverage on a certain TV channel. Their replies are
summarized in the table below.

gender/answer satisfied not satisfied Total


male 180 120 300
female 90 110 200
Total 270 230 500

P(satisfied | male) = 180/300 = 0.6.

P(satisfied | female) = 90/200 = 0.45.

P(not satisfied | male) = 120/300 = 1 − 180/300 = 0.4.

P(not satisfied | female) = 110/200 = 1 − 90/200 = 0.55.

P(satisfied) = 270/500 = 0.54 and P(not satisfied) = 1 – 0.54 = 0.46.

Note

1 When calculating a conditional probability, the sample space is restricted to that


associated with the event that is known to occur.

2 The probability of a person being satisfied depends on the gender of the person being
interviewed. In this case females are less satisfied than males with the news coverage.

Example 2

At a certain university the probability of passing accounting is 0.68, the probability of passing
statistics 0.65 and the probability of passing both statistics and accounting is 0.57. Calculate
the probability that a student

(a) passes statistics when it is known that he/she passed accounting.

(b) passes accounting when it is known that he/she passed statistics.

(c) passes statistics when it is known that he/she did not pass accounting.

Solution

Let A denote the event “a student passes accounting” and B the event “a student passes
statistics”. Then Ā is the event “a student did not pass accounting”, A ∩B the event “a
student passes both statistics and accounting” and Ā∩B the event “a student passes
statistics, but not accounting”.

Given: P(A) = 0.68, P(B) = 0.65, P( A∩B ) = 0.57.

(a) P(B|A) = P(A∩B)/P(A) = 0.57/0.68 = 0.838.

(b) P(A|B) = P(A∩B)/P(B) = 0.57/0.65 = 0.877.

(c) P(B|Ā) = P(Ā∩B)/P(Ā) = [P(B) − P(A∩B)]/[1 − P(A)] = (0.65 − 0.57)/0.32 = 0.25.

Multiplication rule of probabilities

Suppose the joint probability P( A∩B ) is to be calculated if either of the conditional


probabilities [ P(A|B) or P(B|A) ] and the corresponding unconditional probability [ P(B) or
P(A) ] are known. Then the conditional probability formulae can be manipulated to obtain the
joint probability i.e.,

P(A∩B) = P(B) P(A|B) = P(A) P(B|A).

These formulae are known as the multiplication formulae of probabilities.

Examples

1 A box has 12 bulbs, 3 of which are defective. If two bulbs are selected at random without
replacement, then what is the probability that both are defective?

Solution

Let d1 denote the event “the first bulb is defective” and d2 the event “the second bulb is
defective”.
Then P(d1) = 3/12 and P(d2|d1) = 2/11. Using the above-mentioned multiplication formula,

P(d1 ∩ d2) = P(d1) P(d2|d1) = (3/12)(2/11) = 0.045.

2 Two cards are drawn at random from a deck of playing cards. What is the probability
that both these cards are aces?

Solution

Since there are 4 aces in a deck of 52 cards, the probability of drawing one ace is 4/52.
Having removed one ace and not replacing it reduces the probabilities of drawing another ace
on the second draw. The 51 cards remaining contain 3 aces and therefore the probability of
drawing an ace on the second draw is 3/51. We can multiply these probabilities and
determine the probability of drawing two aces.

P(drawing 2 aces) = (4/52)(3/51) = 1/221.

The multiplication rule can be extended to involve more than 2 events e.g., for 3 events A1,
A2 and A3 defined on the same sample space,

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1,A2).

3 Three cards are drawn at random from a deck of playing cards. What is the
probability that all 3 these cards are aces?

Solution

P(drawing 3 aces) = 4/52 ×3/51× 2/50 = 1/5525.

Independent events

Two events A and B are said to be independent if P(A|B) = P(A) or P(B|A) = P(B).

This means that the occurrence of B does not affect the probability that A occurs.

Substituting the above results into the multiplication formula for two events gives

P(A∩B) = P(A) P(B) if A and B are independent.

This formula is known as the product formula for independent events.

Example

1 The probability that person A will be alive in 20 years is 0.7 and the probability that
person B will be alive in 20 years is 0.5, while the probability that they will both be alive in
20 years is 0.45. Are the events E1 “A is alive in 20 years” and E2 “B is alive in 20 years”
independent?

Solution

P(E1) = 0.7, P(E2) = 0.5, P(E1∩E2) = 0.45.

Since P(E1) P(E2) = 0.7 x 0.5 = 0.35 ≠ P(E1∩E2), the events E1 and E2 are not independent.

2 Two coins are tossed. Using the classical definition of probability,

P(both tosses heads) = ¼ .

Assuming that both coins are unbiased, P(1st coin is heads) = P(2nd coin is heads) = ½ .

Since P(1st coin is heads) x P(2nd coin is heads) = ½ . ½ = ¼ = P(both tosses heads), the
events “heads on the first toss” and “heads on the second toss” are independent.

The multiplication rule for independent events can be extended to involve more than 2
events. In general, if the events A1, A2, . . . , An are independent then

P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1) P(A2) . . . P(An).

Examples

1 A coin is tossed and a single 6-sided die is rolled. Find the probability of “heads” and
rolling a 3 with the die.

P(head) = ½ and P(3) = 1/6.

Since the results of the coin and the die are independent,

P(heads and 3) = P(heads) P(3) = (1/2)(1/6) = 1/12.

2 A school survey found that 9 out of 10 students like pizza. If three students are chosen at
random with replacement, what is the probability that all three students like pizza?

P(student 1 likes pizza) = 9/10 = P(student 2 likes pizza) = P(student 3 likes pizza).

P(student 1 likes pizza and student 2 likes pizza and student 3 likes pizza) =

P(student 1 likes pizza) x P(student 2 likes pizza) x P(student 3 likes pizza) = (9/10)³ = 0.729.

3 It is known that 8% of all cars of a certain make that are sold encounter engine
overheating problems within 50 000 kilometers of travel. During the past week 4 such cars
were sold. Suppose that engine overheating problems for the 4 cars are encountered
independently. What is the probability that

(a) all 4 (b) none (c) at least one of these cars sold encounter engine overheating problems
within 50 000 kilometers of travel?

Solution

Let A denote the event “overheating problems within 50 000 kilometers of travel”.

(a) P(A) = 0.08.

P(all 4 have overheating problems) = [P(A)]⁴ = 0.08⁴ = 0.00004096.

(b) P(not overheating problems) = P( Ā ) = 1- 0.08 = 0.92.

P(none) = P(none of the 4 cars have overheating problems) = [P(Ā)]⁴ = 0.92⁴ = 0.7164.

(c) P(at least 1) = 1 – P(none) = 1 – 0.7164 = 0.2836.

Law of Total Probability

The result P(A) = Σi P(A ∩ Bi) = Σi P(A|Bi) P(Bi), where the events Bi divide the sample space
into mutually exclusive parts, is known as the Law of Total Probability.

Bayes’ theorem

When applying the conditional probability formula

P(A|B) = P(A∩B)/P(B), values for P(A∩B) and P(B) are needed.

Suppose that only the values for P(A), P(B|A) and P(B| Ā ) are available.

In this case the probabilities [P(A∩B) and P(B)] required for calculating P(A|B) can be
calculated from

P(A∩B) = P(A) P(B|A) (Using the conditional probability multiplication formula)

and

P(B) = P( A∩B )+P( Ā∩B ) = P(A) P(B|A) + P( Ā ) P(B| Ā ) .

(Using the total probability formula and the conditional probability multiplication formula)

Substituting these probabilities into the first conditional probability formula gives

P(A|B) = P(A) P(B|A) / [P(A) P(B|A) + P(Ā) P(B|Ā)].

This result is known as Bayes’ theorem after the person who proposed the method.

Example

When testing a person for a certain disease, the test can show either a positive result (the
person has the disease) or a negative result (the person does not have the disease).

When a person has the disease, the test shows positive 99% of the time. When the person
does not have the disease, the test shows negative 95% of the time. Suppose it is known that
only 0.1% of the people in the population have the disease.

(a) If a test turns out to be positive, what is the probability that the person has the disease?

(b) If the test turns out to be negative, what is the probability that the person does not have
the disease?

Solution

Let A be the event “the person has the disease”, and B be the event “the test returns a positive
result”.

Then Ā is the event “the person does not have the disease”, B|A is the event “the test is
positive given the person has the disease”, B| Ā the event “the test is positive given the
person does not have the disease” and B̄| Ā the event “the test is negative given the person
does not have the disease”.

(a) P(A) = 0.001 (given) , P( Ā ) = 1- P(A) = 0.999, P(B|A) = 0.99 (given), P( B̄| Ā ) =0.95
(given), P(B| Ā ) = 1- P( B̄| Ā ) = 0.05.

Substitution into the above formulae gives

P(A∩B) = P(A) P(B|A) = 0.001 x 0.99 = 0.00099

P(B) = P( A∩B )+P( Ā∩B ) = P(A) P(B|A) + P( Ā ) P(B| Ā ) = 0.001 x 0.99 + 0.999 x 0.05

= 0.00099 + 0.04995

= 0.05094

The above calculations can also be done as set out below.

unconditional probability   conditional probability   product
0.001                     x 0.99                      0.00099
0.999                     x 0.05                      0.04995

                                              sum     0.05094

P(A|B) = P(A∩B)/P(B) = 0.00099/0.05094 = 0.0194.

(b) P(Ā|B̄) = P(Ā∩B̄)/P(B̄) = P(Ā) P(B̄|Ā)/[1 − P(B)] = (0.999 x 0.95)/0.94906 = 0.9999895.

From the above it follows that a negative result of the test is very reliable (it will be wrong
only 105 times in 10 million cases). On the other hand, the chances that a person will have
the disease when the result of the test shows positive is 194 in 10 000.
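The whole calculation is a few lines of arithmetic in R; a minimal sketch using the probabilities given above:

p_A  <- 0.001                                  # P(person has the disease)
sens <- 0.99                                   # P(positive | disease)
spec <- 0.95                                   # P(negative | no disease)
p_B  <- p_A * sens + (1 - p_A) * (1 - spec)    # total probability: P(positive) = 0.05094
p_A * sens / p_B                               # P(disease | positive)    = 0.0194
(1 - p_A) * spec / (1 - p_B)                   # P(no disease | negative) = 0.9999895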

3.7 Probabilities and odds

Let a to b be the odds in favour of some event A. Suppose P(A) = p. Then P( Ā ) = 1-p. The
odds in favour of A is then defined as

a/b = p/(1 − p).

From the above it can be shown that p = a/(a + b) and 1 − p = b/(a + b).

The odds against A is b to a. From the above

b/a = (1 − p)/p.
Examples

1 A pair of balanced dice is tossed. What are the odds in favour of the sum of the numbers
showing a 6?

Total number of outcomes = 6 x 6 =36.

Possible ways of getting a sum of 6 : (1, 5), (2, 4), (3, 3), (4, 2), (5,1).

Number of ways of getting a sum of 6 = 5.

p = probability (sum=6) = 5/36 , 1-p = 31/36.

Odds in favour of a 6 is 5 to 31 or 1 to 6.2.

2 The odds in favour of a red number coming up in a game of roulette is 18 to 19 or 1 to


1.056.

probability(red number) = 18/(18 + 19) = 18/37 = 0.486.

3 The table below shows data that were collected from 781 middle aged female patients at a
certain hospital.

smoker/heart problems yes no total

yes 172 173 345

no 90 346 436

total 262 519 781

From the table it can be seen that

For smokers the odds in favour of heart problems are 172 to 173 or 1 to 1.0058

For non-smokers the odds in favour of heart problems are 90 to 346 or 1 to 3.8444.

From this it follows that smokers are much more at risk for heart problems than non-smokers.

3.8 Computer output

1 The following is an output of permutation [P(n,r)], combination [C(n,r)] and factorial (n!)
values calculated by using excel.

n r C(n,r) P(n,r)
5 3 10 60
6 4 20 360
7 4 35 840
10 6 120 151200
15 10 455 10897286400
20 12 1140 6.03398E+13
25 13 2300 3.23824E+16
n n!
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880

2 The following is the computer output of a cross classification of the cards in a deck of
cards according to colour and type of card (number, picture, ace).

colour
black red Total
type number card 18 18 36
picture card 6 6 12
ace 2 2 4
Total 26 26 52

From the above table various probabilities can easily be calculated e.g.,

P(number card) = 36/52 = 9/13, P(picture card and red) = 6/52 = 3/26,

P(not number card) = 16/52 = 4/13.

Chapter 4 – Probability distributions of discrete random


variables
4.1 Discrete random variables

A random variable is a variable whose value depends on the outcome of a random


experiment. A random variable is denoted by a capital letter and a particular value of a
random variable by a lower case (small) letter.

Examples:

1 T = the number of tails (t) when a coin is flipped 3 times.

2 X = the sum of the values (x) showing when two dice are rolled.

3 H = the height (h) of a woman chosen at random from a group.

4 V = the liquid volume (v) of soda in a can marked 12 oz.

There are two types of random variables:

Discrete Random Variables – Variables that have a finite or countable number of possible
values. These variables usually occur in counting experiments.

Continuous Random Variables – Variables that can take on any value in some interval i.e.,
they can take an infinite number of possible values. These variables usually occur in
experiments where measurements are taken.

Examples:

1 The variables T and X from the above examples are discrete random variables.

2 The variables H and V from the above examples are continuous random variables.

4.2 Discrete probability distributions and their graphical representations

A discrete probability distribution is a list of the possible distinct values of the random
variable together with their corresponding probabilities. The probability of the random
variable X assuming a particular value x is denoted by P(X=x) = P(x). This probability,
which is a function of x, is referred to as the probability mass function.

Examples:

1 As above, let T be the random variable that represents the number of tails obtained when a
coin is flipped three times. Then T has 4 possible values 0, 1, 2, and 3. The outcomes of the
experiment and the values of T are summarized in the next table.

Outcomes T

hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3

Assuming that the outcomes are all equally likely, the probability distribution for T is given
in the following table.

t 0 1 2 3 Total
P(t) 1/8 3/8 3/8 1/8 1

2 Let Y denote the number of tosses of a coin until heads appear first. Then

S = {h, th, tth, ttth, . . . } and Y =1, 2, 3, 4, . . . .

y      1    2      3      . . .    Total
P(y)   ½    (½)²   (½)³   . . .    1

Why is ½ + (½)² + (½)³ + . . . = 1 ?

3 A pair of dice is tossed. Let X denote the sum of the digits. The probability distribution of
X can be found from the following table. The entry in a particular cell is the sum of row and
column values

1st/2nd 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

x        2     3     4     5     6     7     8     9     10    11    12
P(X=x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Note: For any discrete random variable X the range of values that it can assume are such that

0 ≤ P(x) ≤ 1 and Σx P(x) = 1.

The cumulative distribution function

The cumulative distribution function is defined as

F(x) = P(X ≤ x) = Σr≤x P(r).

Examples

1 For the probability mass function in example 1 the cumulative distribution function is

x      0    1    2    3
F(x)   1/8  ½    7/8  1

2 For the probability mass function in example 3 the cumulative distribution function is

x      2     3     4     5      6      7      8      9      10     11     12
F(x)   1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  1
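
Since F(x) is just a running sum of the probability mass function, it can be computed with R's cumsum(); a minimal sketch for the sum of two dice:

p <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36   # P(X = 2), . . . , P(X = 12)
Fx <- cumsum(p)                                # F(2), . . . , F(12)
Fx[6]                                          # P(X <= 7) = 21/36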

3 Consider a discrete random variable with probability mass function given below.

x        1    2    3    4
P(X=x)   0.1  0.3  0.4  0.2

[Graphs: (a) CDF and (b) PMF]

The graphs above are plots of the probability mass function (graph on the right) and
cumulative distribution function (graph on the left).

A random variable can only take on one value at a time i.e., the events X=x1 and X=x2 for x1
≠ x2 are mutually exclusive. The probability of the variable taking on any number of
different values can be found by simply adding the appropriate probabilities.

Examples

1 Find the probability of getting 2 or more tails when a coin is flipped 3 times.

P(T≥2) = 3/8 + 1/8 = ½.

2 Find the probability of getting at least one tail when a coin is flipped 3 times.

P(at least 1) = P(1) + P(2) + P(3) = 3/8 + 3/8 +1/8 = 7/8 = 1 – P(0) = 1 – 1/8.

3 Find the probability of needing at most 3 tosses of a coin to get the first heads.

P(at most 3) = P(1) + P(2) + P(3) = ½ + (½)² + (½)³ = 7/8.

4 Find the probability of getting a sum of (a) 7 (b) at least 4 when tossing a pair of dice.

(a) P(7) = P(1st is 6, 2nd is 1) + P(1st is 5, 2nd is 2) + P(1st is 4, 2nd is 3) +

P(1st is 3, 2nd is 4) + P(1st is 2, 2nd is 5) + P(1st is 1, 2nd is 6)

= 6/36 = 1/6.

(b) P(at least 4) = P(4) + P(5) + . . . + P(12) = 1 – [P(2) + P(3)] = 1- 3/36 = 33/36 =11/12.

4.3 Mean (expected value), variance and standard deviation of a discrete random
variable

The mean or expected value of a random variable X is the average value that we would
expect for X when performing the random experiment many times.

Notation: The mean or expected value of a random variable X will be represented by μ or


E(X).

We can calculate the mean by using the formula

E(X) = μ = Σ x P(x).
Examples

1 The expected value of the random variable T from above is:

E(T) = Σ t P(t) = 0×(1/8) + 1×(3/8) + 2×(3/8) + 3×(1/8) = 3/2.

Thus if 3 coins are flipped many times, we should expect the average number of tails (per 3
flips) to be about 1.5. Since the number of tails is an integer value, it will never actually
assume the mean value of 1.5. This mean value more reflects the fact that the extreme values
(0 and 3) occur the same proportion of times (1/8th) and the middle values occur the same
proportion of times (3/8ths).

2 The score S obtained in a certain quiz is a random variable with probability distribution
given below.

s 0 1 2 3 4 5
P(S=s) 0.12 0.04 0.16 0.32 0.24 0.12

The mean of the random variable S can be calculated as shown below.

s        0     1     2     3     4     5     sum
P(S=s)   0.12  0.04  0.16  0.32  0.24  0.12  1
s*P(s)   0     0.04  0.32  0.96  0.96  0.60  2.88

μ = E(S) = 2.88

For a random variable X, the variance, denoted by σ², can be calculated by using the
formula

σ² = Σ (x − μ)² P(x) = Σ x² P(x) − μ².

The standard deviation of X, denoted by σ, is just the positive square root of σ². This is a
measure of the extent to which the values are spread around the mean.

The calculation of the standard deviation for a random variable is similar to that of the
calculation of the standard deviation for grouped data.

Example

Calculate the standard deviation of the random variable T from above.

t         0    1    2     3    sum
P(t)      1/8  3/8  3/8   1/8  1
t*P(t)    0    3/8  6/8   3/8  1.5
t²*P(t)   0    3/8  12/8  9/8  3

σ² = 3 − 1.5² = 0.75 and σ = √0.75 = 0.866.
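
The same tabular calculation takes a few lines in R (a sketch; the columns of the table become vectors):

tvals <- 0:3                        # values of T
p <- c(1, 3, 3, 1) / 8              # P(T = t)
mu <- sum(tvals * p)                # E(T) = 1.5
sigma2 <- sum(tvals^2 * p) - mu^2   # variance = 0.75
sqrt(sigma2)                        # standard deviation = 0.866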
4.4 Binomial, hypergeometric and bernoulli distributions

Assumptions of binomial distribution

A discrete random variable X is said to have a binomial distribution if a random experiment


satisfies the following conditions.

1 The experiment is repeated a fixed number of times. Each repetition is called a trial. The
number of trials is denoted by n.

2 All trials are independent of each other.

3 The outcome for each trial of the experiment can be one of two complementary outcomes,
one (s) labeled “success” and the other (f) labeled “failure”. A single such a trial is called a
Bernoulli trial.

4 The probability of success P(s) has a constant value of p for each trial.

5 The random variable X counts the number of successes that have occurred in n trials.

Examples:

1 Consider the experiment of flipping a coin 5 times. If we let the event of getting “tails” on
a flip be labeled “success” and “heads” failure, and if the random variable T represents the

number of tails obtained, then T will be binomially distributed with n = 5, p = ½ and q = ½.


2 A student answers 10 questions in a multiple-choice test by guessing each answer. For
each question, there are 5 possible answers, only one of which is correct. If we consider a
“success” as getting a question right and consider the 10 questions as 10 independent
Bernoulli trials, then the random variable X representing the number of correct answers will
be binomially distributed with n = 10, p = 0.2 and q = 0.8.

3 Fourteen percent of flights from a certain airport are delayed. If 20 flights are chosen at
random, then we can consider each flight to be an independent Bernoulli trial. If we define a
successful trial to be one where a flight takes off on time, then the random variable Z
representing the number of on-time flights will be binomially distributed with n = 20,
p = 0.86 and q = 0.14.

Tree diagram

The number of possible outcomes in a binomial experiment can be written down from a
diagram such as the one below. This diagram called a tree diagram enables one to write down
all the outcomes when this experiment is performed 3 times.

[Tree diagram: from the starting point the 1st trial branches into s and f; each branch splits
again into s and f for the 2nd trial, and once more for the 3rd trial, giving 8 paths in total.]

The following outcomes and their respective number of successes (x) can be written down
from the above tree diagram.

Outcomes          x
fff               0
ffs, fsf, sff     1
ssf, sfs, fss     2
sss               3

Formula for the calculation of binomial probabilities

A formula for the binomial probability mass function for the case n = 3 can be written down
from the above table by noting the following.

1 Each outcome is a sequence of s (success) and f (failure) values e.g., fff, ffs, ssf etc.

2 In a particular sequence s occurs x times and f (3-x) times for x = 0, 1, 2, 3.

3 Since the trials are independent, the probability of a particular sequence of s’s and f’s is
given by a product of p (the probability of success) and q (the probability of failure) values,
where p’s occur x times and q’s (3−x) times e.g., P(fff) = q³, P(ffs) = pq², P(ssf) = p²q etc.

4 The number of outcomes where there are x success and (3-x) failure outcomes can be
counted by using the formula C(3, x)= 3Cx .

By using the above, the binomial formula for n = 3 can be written down as

P(x) = 3Cx p^x q^(3−x) for x = 0, 1, 2, 3.

To write down the general formula, the same reasoning as explained above applies to
sequences with n outcomes consisting of s (x of these) and f (n-x of these) values. In the
formula the number 3 is just replaced by n i.e.,

P(x) = nCx p^x q^(n−x) for x = 0, 1, 2, . . . , n.

A short hand way of referring to a binomially distributed random variable X, based on n trials
with probability of success p, is X ~ B(n,p).

Examples

1 As in the previous examples, let T be the random variable representing the number of tails when a coin is flipped 3 times. Using the formula above with n = 3 and p = ½, we can calculate the probability of exactly 2 tails as:

P(2) = 3C2 (1/2)^2 (1/2)^1 = 0.375.

2 Let the random variable X represent the number of correct answers in the multiple-choice test described above. Then the probability of a student guessing 3 answers correctly is

10C3 (0.2)^3 (0.8)^7 = 0.2013,

while the probability of guessing seven answers correctly is

10C7 (0.2)^7 (0.8)^3 = 0.000786.
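These probabilities can be checked numerically. The following is a minimal sketch assuming Python with the scipy library is available (software is not part of these notes; any package with a binomial probability mass function would do):

from scipy.stats import binom

# P(T = 2) for the coin example: n = 3 trials, p = 1/2
print(binom.pmf(2, 3, 0.5))     # 0.375

# P(X = 3) and P(X = 7) for the multiple-choice example: n = 10, p = 0.2
print(binom.pmf(3, 10, 0.2))    # 0.2013
print(binom.pmf(7, 10, 0.2))    # 0.000786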

Mean and standard deviation of a binomial random variable

If X is a binomial random variable with n trials, probability of success p and probability of


failure q, then the mean, variance and standard deviation of X can be calculated by using the
following formulae.

mean = E(X) = μ = np, var(X) = σ² = npq and standard deviation(X) = √(npq).
Example

For T the number of tails when a coin is flipped 3 times, n = 3, p = q = ½.

1 3
E (T )  3   
 2  2 and σ=√ 3×0.5×0.5 = √ 0.75 =0.866.

Shape of the binomial distribution

A binomial distribution is symmetric if p = q, positively skewed if p < q and negatively skewed if p > q. These shapes are illustrated in the graphs for n = 20 shown below.

[Graph: Binomial distribution, n = 20, p = 0.5 — symmetric about x = 10.]

[Graph: Binomial distribution, n = 20, p = 0.1 — positively skewed.]

[Graph: Binomial distribution, n = 20, p = 0.9 — negatively skewed.]

Bernoulli trials where sampling is without replacement

The following experimental model is sometimes associated with the binomial distribution.

Consider a bowl with N marbles of which Np are blue and Nq red, where p + q = 1. If sampling is done with replacement and drawing a blue marble is labeled “success” (red marble labeled “failure”), then P(success) = Np/N = p and P(failure) = Nq/N = q. If P(x blue marbles in n draws) is required and sampling is with replacement, the binomial formula will still apply. If sampling is without replacement, P(success) is no longer constant (assumption 4 of the binomial experiment is violated) and the binomial formula will no longer apply for calculating the abovementioned probability. In such a case

P(x blue marbles in n draws) = (NpCx × NqC(n-x)) / NCn, where n ≥ x.



The abovementioned distribution is known as the hypergeometric distribution.

Example

A bowl contains 10 blue and 7 red marbles. Four (4) marbles are drawn at random from the
bowl. Calculate the probability of

(a) two (b) at least 3 blue marbles drawn when sampling is done

1 with replacement.
2 without replacement.

N = 17, Np = 10, Nq = 7, n = 4.

1(a) p = 10/17, q = 7/17, x = 2.

P(X=2) = 4C2 (10/17)^2 (7/17)^2 = 0.352.

(b) P(X≥3) = P(X=3) + P(X=4) = 4C3 (10/17)^3 (7/17)^1 + 4C4 (10/17)^4 (7/17)^0 = 0.335 + 0.120 = 0.455.

2(a) P(X=2) = (10C2 × 7C2) / 17C4 = (45 × 21)/2380 = 0.397.

(b) P(X≥3) = P(X=3) + P(X=4) = (10C3 × 7C1)/17C4 + (10C4 × 7C0)/17C4 = (840 + 210)/2380 = 0.441.
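The two sampling schemes can also be compared in software. Below is a minimal sketch assuming Python with scipy; scipy’s hypergeom takes the population size, the number of “success” items and the number of draws as parameters.

from scipy.stats import binom, hypergeom

# With replacement: binomial, n = 4 draws, p = 10/17
print(binom.pmf(2, 4, 10/17))                               # 1(a) 0.352
print(binom.pmf(3, 4, 10/17) + binom.pmf(4, 4, 10/17))      # 1(b) 0.455

# Without replacement: hypergeometric, N = 17 marbles, 10 blue, 4 draws
print(hypergeom.pmf(2, 17, 10, 4))                                  # 2(a) 0.397
print(hypergeom.pmf(3, 17, 10, 4) + hypergeom.pmf(4, 17, 10, 4))    # 2(b) 0.441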

Bernoulli distribution

An important special case of the binomial distribution is the case where n = 1. Then the
probability formula becomes

P(x) = 1Cx p^x q^(1-x) for x = 0, 1.

The probability distribution described by this formula is known as the Bernoulli


distribution. For this distribution

P(0) = q and P(1) = p.

The mean and variance (standard deviation) of the Bernoulli distribution can be written down
as special cases (n = 1) of the corresponding binomial formulae.

mean = E(X) = μ = p, var(X) = σ² = pq and standard deviation(X) = √(pq).
4.5 Poisson distribution

A Poisson random variable (X) is one that counts the number of events that occur at random
in an interval of time or space. The average number of events that occur in the time/space
interval is denoted by μ.

Examples

1 The number of bad cheques presented for daily payment at a bank.


2 The number of road deaths per month.
3 The number of bacteria in a culture.
4 The number of defects per square meter on metal sheets being manufactured.
5 The number of mistakes per typewritten page.
6 The number of phone calls arriving at company’s switchboard.

Formula (number of occurrences of an event)

The probability that x events occur in time/space is given by

P(X=x) = P(x) = μ^x e^(−μ) / x!, for x = 0, 1, 2, . . . (µ > 0)

A shorthand way of referring to a Poisson distributed random variable X with average (mean) rate of occurrence µ is X ~ Po(µ).

Examples

1 A bank receives on average μ=6 bad cheques per day. Calculate the probability of the
bank receiving

(a) exactly 4 (b) at least 3 bad cheques per day.

Solution

(a) Substituting μ = 6 and x = 4 into the above formula gives

P(4) = 6^4 e^(−6) / 4! = 0.134.

(b) P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − (e^(−6) + 6e^(−6)/1! + 6^2 e^(−6)/2!) = 1 − 0.062 = 0.938.

2 A secretary claims an average mistake rate of 1 per page. A sample page is selected at
random, and 5 mistakes found. What is the probability of her making 5 or more mistakes if
her claim of 1 mistake per page on average is correct?

In this case μ = 1 is claimed and X, the number of mistakes, is ≥ 5. If the claim is true,

P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − (e^(−1) + e^(−1) + e^(−1)/2! + e^(−1)/3! + e^(−1)/4!) = 1 − 0.9963 = 0.0037.

The above calculation shows that if the claim of 1 mistake per page on average is true, there is only a 37 in 10 000 chance of getting 5 or more mistakes on a page. This remote chance of 5 or more mistakes casts doubt on whether the claim of 1 mistake per page on average is in fact true.
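Both Poisson examples can be verified with a few lines of code, assuming Python with scipy is available; poisson.cdf gives P(X ≤ x), so “at least” probabilities follow by complement.

from scipy.stats import poisson

# Bank example: mu = 6 bad cheques per day
print(poisson.pmf(4, 6))        # (a) P(X = 4) = 0.134
print(1 - poisson.cdf(2, 6))    # (b) P(X >= 3) = 0.938

# Secretary example: mu = 1 mistake per page
print(1 - poisson.cdf(4, 1))    # P(X >= 5) = 0.0037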

Formula (Poisson approximation of binomial distribution)

The Poisson random variable can also be seen as an approximation to a binomial random
variable with n the number of trials large and p the probability of success small such that the
mean μ = np is of moderate size. This approximation is good when n≥20 and p ≤ 0.05 or
n≥100 and np< 10.

Example

A life insurance company has found that the probability is 0.000015 that a person aged 40-50
will die from a certain rare disease. If the company has 100 000 policy holders in this age
group, what is the probability that this company will have to pay out 4 claims or more
because of death from this disease?

For the following reasons a binomial distribution with n = 100 000 and p = 0.000015 is
reasonable in this case.

1 A person either dies or not from this disease (two outcomes).

2 The probability of dying from the disease is constant.

3 The death or not from this disease of one person does not affect that of another
person.

The Poisson distribution with µ = 100 000*(0.000015) = 1.5 can be used to approximate this
probability.

P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − (e^(−1.5) + 1.5e^(−1.5) + 1.5^2 e^(−1.5)/2! + 1.5^3 e^(−1.5)/3!) = 1 − 0.9344 = 0.0656.
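The quality of this approximation can be checked by computing the exact binomial probability alongside the Poisson value. A minimal sketch, again assuming scipy:

from scipy.stats import binom, poisson

n, p = 100000, 0.000015
mu = n * p                       # 1.5

print(1 - poisson.cdf(3, mu))    # Poisson approximation: 0.0656
print(1 - binom.cdf(3, n, p))    # exact binomial value, very close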

Mean and standard deviation of a Poisson random variable

The mean and variance of the Poisson distribution are given by E(X) = µ and var(X) = µ.
In the case of the Poisson approximation to the binomial distribution

mean(X) = variance(X) = np and standard deviation = √(np).

If the average rate of occurrence of µ is given for a particular time/space interval length/size,
probability calculations can also be carried out for an interval length/size which is different to
the one given.

Example

Calls arrive at switchboard at an average rate of 1 every 15 seconds. What is the probability
of not more than 5 calls arriving during a particular minute?

A mean rate of 1 every 15 seconds is equivalent to a mean rate of 4 every minute. Since the
question concerns an interval of 1 minute, µ = 4 (not µ = 1).

P(X ≤ 5) = e^(−4) + 4e^(−4) + 4^2 e^(−4)/2! + 4^3 e^(−4)/3! + 4^4 e^(−4)/4! + 4^5 e^(−4)/5! = 0.7851.

4.6 Computer output

The output of binomial and Poisson probability calculations done in Excel is shown below.

1 Binomial distribution probabilities

n=10 p
x 0.2 0.4 0.6 0.8
0 0.107 0.006 0 0
1 0.268 0.04 0.002 0
2 0.302 0.121 0.011 0
3 0.201 0.215 0.042 0.001
4 0.088 0.251 0.111 0.006
5 0.026 0.201 0.201 0.026
6 0.006 0.111 0.251 0.088
7 0.001 0.042 0.215 0.201
8 0 0.011 0.121 0.302
9 0 0.002 0.04 0.268
10 0 0 0.006 0.107
n=15 0.2 0.4 0.6 0.8
0 0.035 0 0 0
1 0.132 0.005 0 0
2 0.231 0.022 0 0
3 0.25 0.063 0.002 0
4 0.188 0.127 0.007 0
5 0.103 0.186 0.024 0
6 0.043 0.207 0.061 0.001
7 0.014 0.177 0.118 0.003
8 0.003 0.118 0.177 0.014
9 0.001 0.061 0.207 0.043
10 0 0.024 0.186 0.103
11 0 0.007 0.127 0.188
12 0 0.002 0.063 0.25
13 0 0 0.022 0.231
14 0 0 0.005 0.132
15 0 0 0 0.035

2 Poisson distribution probabilities

x/µ 0.2 0.4 1 2 3 6 10


0 0.8187 0.6703 0.3679 0.1353 0.0498 0.0025 0
1 0.1637 0.2681 0.3679 0.2707 0.1494 0.0149 0.0005
2 0.0164 0.0536 0.1839 0.2707 0.224 0.0446 0.0023
3 0.0011 0.0072 0.0613 0.1804 0.224 0.0892 0.0076
4 0.0001 0.0007 0.0153 0.0902 0.168 0.1339 0.0189
5 0 0.0001 0.0031 0.0361 0.1008 0.1606 0.0378
6 0 0 0.0005 0.012 0.0504 0.1606 0.0631
7 0 0 0.0001 0.0034 0.0216 0.1377 0.0901
8 0 0 0 0.0009 0.0081 0.1033 0.1126
9 0 0 0 0.0002 0.0027 0.0688 0.1251
10 0 0 0 0 0.0008 0.0413 0.1251
11 0 0 0 0 0.0002 0.0225 0.1137
12 0 0 0 0 0.0001 0.0113 0.0948
13 0 0 0 0 0 0.0052 0.0729
14 0 0 0 0 0 0.0022 0.0521
15 0 0 0 0 0 0.0009 0.0347
16 0 0 0 0 0 0.0003 0.0217
17 0 0 0 0 0 0.0001 0.0128
18 0 0 0 0 0 0 0.0071
19 0 0 0 0 0 0 0.0037
20 0 0 0 0 0 0 0.0019

Chapter 5 – The normal distribution


5.1 Probability distributions of continuous random variables

A random variable X is called continuous if it can assume any of the possible values in some
interval i.e., the number of possible values is infinite. In this case the definition of a discrete
random variable (list of possible values with their corresponding probabilities) cannot be used
(since there are an infinite number of possible values it is not possible to draw up a list of
possible values). For this reason, probabilities associated with individual values of a
continuous random variable X are taken as 0.

The clustering pattern of the values of X over the possible values in the interval is described
by a mathematical function f(x) called the probability density function. A high (low)
clustering of values will result in high (low) values of this function. For a continuous random
variable X, only probabilities associated with ranges of values (e.g., an interval of values
from a to b) will be calculated. The probability that the value of X will fall between the values
a and b is given by the area between a and b under the curve describing the probability
density function f(x). For any probability density function the total area under the graph of
f(x) is 1.

5.2 Normal distribution

A continuous random variable X is normally distributed (follows a normal distribution) if the


probability density function of X is given by
f(x) = (1/(√(2π) σ)) × exp[−(x − μ)²/(2σ²)] for −∞ < x < ∞.

The constants μ and σ can be shown to be the mean and standard deviation respectively of X. These constants completely specify the density function. A graph of the curve describing the probability function (known as the normal curve) for the case μ = 0 and σ = 1 is shown below.

[Graph: standard normal distribution — density p(z) plotted against z for −4 ≤ z ≤ 4, peaking at about 0.4 at z = 0.]

5.2.1 Properties of the normal distribution

The graph of the function defined above has a symmetric, bell-shaped appearance. The mean
µ is located on the horizontal axis where the graph reaches its maximum value. At the two
ends of the scale the curve describing the function gets closer and closer to the horizontal axis
without touching it. Many quantities measured in everyday life have a distribution which
closely matches that of a normal random variable (e.g., marks in an exam, weights of
products, heights of a male population). The parameter µ shows where the distribution is
centrally located and σ the spread of the values around µ. A short hand way of referring to a
random variable X which follows a normal distribution with mean µ and variance σ2 is by
writing X ~ N(µ, σ2). The next diagram shows graphs of normal distributions for various
values of μ and σ2.

An increase (decrease) in the mean µ results in a shift of the graph to the right (left) e.g. the
curve of the distribution with a mean of -2 is moved 2 units to the left. An increase
(decrease) in the standard deviation σ results in the graph becoming more (less) spread out
e.g. compare the curves of the distributions with σ2 = 0.2, 0.5, 1 and 5.

5.2.2 Empirical example – The normal distribution and a histogram

Consider the scores obtained by 4 500 candidates in a matric mathematics examination.

[Histogram: frequency of marks in intervals from 15 to 90, with a symmetric, bell-shaped pattern peaking near the centre.]

The histogram of the marks has an appearance that can be described by a normal curve i.e., it
has a symmetric, bell-shaped appearance. The mean of the marks is 51.95 and the standard
deviation 10.

5.3 The Standard Normal Distribution



To find probabilities for a normally distributed random variable, we need to be able to


calculate the areas under the graph of the normal distribution. Such areas are obtained from a
table showing the cumulative distribution of the normal distribution (see appendix). Since the
normal distribution is specified by the mean (µ) and standard deviation (σ), there are many
possible normal distributions that can occur. It will be impossible to construct a table for each
possible mean and standard deviation. This problem is overcome by transforming X the
normal random variable of interest [X ~ N(µ, σ2) ] to a standardized normal random variable

Z = (X − μ)/σ.

It can be shown that the transformed random variable Z ~ N(0, 1). The random variable Z can
be transformed back to X by using the formula

X = μ + Zσ.
The normal distribution with mean µ = 0 and standard deviation σ = 1 is called the standard
normal distribution. The symbol Z is reserved for a random variable with this distribution.
The graph of the standard normal distribution appears below.

Various areas under the above normal curve are shown. The standard normal table gives the
area under the curve to the left of the value z. The area to the right of z and the area between
two z values (z1 and z2) can be found by subtraction of appropriate areas as shown in the next
examples.

5.4 Calculating probabilities using the standard normal table

The first few lines of the standard normal table are shown below.

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.7 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
-3.6 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001

. . . ‘

. . . .

0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

. . . ‘

. . . .

3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999

When looking up a particular value of Z, the first two digits (units, tenths) can be found in the appropriate row on the left. The column entry is the second decimal digit (hundredths). The areas shown in the table are those under the standard normal curve to the left of the value of z looked up i.e. P(Z ≤ z) e.g. P(Z ≤ 0.14) = 0.5557.

Note

1 For negative values of z less than the minimum value (-3.79) in the table, the probabilities
are taken as 0 i.e.,

P(Z ≤ z) = 0 for z < -3.79.

2 For positive values of z greater than the maximum value (3.79) in the table, the
probabilities are taken as 1 i.e.

P(Z ≤ z) = 1 for z > 3.79.

Examples In all the examples that follow, Z ~ N(0, 1).

1 P(Z < 1.35) = 0.9115

2 P(Z > -0.47) = 1 - P(Z ≤ -0.47) = 1-0.3192 = 0.6808

3 P(-0.47 < Z < 1.35) = P(Z < 1.35) – P(Z < -0.47) = 0.9115-0.3192 = 0.5923

4 P(Z > 0.76) = 1 –P(Z < 0.76) = 1 – 0.7764 = 0.2236

5 P(0.95 ≤ Z ≤ 1.36) = P(Z ≤ 1.36) – P(Z ≤ 0.95) = 0.9131 – 0.8289 = 0.0842

6 P(-1.96 ≤ Z ≤ 1.96) = P(Z ≤ 1.96) – P(Z ≤ -1.96) = 0.9750 – 0.0250 = 0.95

In all the above examples an area was found for a given value of z. It is also possible to find a value of z when an area to its left is given. This can be written as P(Z ≤ zα) = α (α is the Greek letter for a and is pronounced “alpha”). In this case zα has to be found such that α is the area to its left.

Examples

1 Find the value of z that has an area of 0.0344 to its left.

Search the body of the table for the required area (0.0344) and then read off the value of z
corresponding to this area. In this case z0.0344 = -1.82.

2 Find the value of z that has an area of 0.975 to its left.

Finding 0.975 in the body of the table and reading off the z value gives z0.975 = 1.96.

3 Find the values of z that have areas of 0.95 and 0.05 to their left.

When searching the body of the table for 0.95 this value is not found. The z value
corresponding to 0.95 can be estimated from the following information obtained from the
table.

z area to left
1.64 0.9495
? 0.95
1.65 0.9505

Since the required area (0.95) is halfway between the 2 areas obtained from the table, the required z can be taken as the value halfway between the two z values that were obtained from the table i.e., z = (1.64 + 1.65)/2 = 1.645.

Exercise: Using the same approach as above, verify that the z value corresponding to an area
of 0.05 to its left is -1.645.

At the bottom of the standard normal table selected percentiles zα are given for different
values of α. This means that the area under the normal curve to the left of zα is α.

Examples: 1 α = 0.900, zα = 1.282 means P(Z < 1.282) = 0.900.

2 α = 0.995, zα = 2.576 means P(Z < 2.576) = 0.995.

3 α = 0.005, zα = -2.576 means P(Z < -2.576) = 0.005.

The standard normal distribution is symmetric with respect to the mean = 0. From this it
follows that the area under the normal curve to the right of a positive z entry in the standard
normal table is the same as the area to the left of the associated negative entry (-z) i.e.

P(Z ≥ z) = P(Z ≤ -z) .

E.g. P(Z ≥ 1.96) = 1 – 0.975 = 0.025 = P(Z ≤ -1.96).

5.5 Calculating probabilities for any normal random variable

Let X be a N(μ, σ2) random variable and Z a N(0, 1) random variable. Then

1 P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = P(Z ≤ (x − μ)/σ).

2 P(a ≤ X ≤ b) = P((a − μ)/σ ≤ (X − μ)/σ ≤ (b − μ)/σ) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ).

Examples:

1 The height H (in inches) of a population of women is approximately normally distributed with a mean of 63.5 inches and a standard deviation of 2.5 inches. To calculate the probability that a woman is less than 63 inches tall, we first find the z-score for 63 inches

z = (63 − 63.5)/2.5 = −0.2

and then use P(H ≤ 63) = P(Z ≤ −0.2) = 0.4207.

This means that 42.07% (a proportion of 0.4207) of women are less than 63 inches tall.

2 The length X (inches) of sardines is a N(4.62, 0.0529) random variable. What proportion
of sardines is

(a) longer than 5 inches? (b) between 4.35 and 4.85 inches?

(a) P(X > 5) = P(Z > (5 − 4.62)/0.23) = P(Z > 1.65) = 1 − P(Z ≤ 1.65) = 1 − 0.9505 = 0.0495.

(b) P(4.35 ≤ X ≤ 4.85) = P((4.35 − 4.62)/0.23 ≤ Z ≤ (4.85 − 4.62)/0.23) = P(−1.17 ≤ Z ≤ 1)

= P(Z ≤ 1) − P(Z ≤ −1.17)

= 0.8413 − 0.1210 = 0.7203.
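The same probabilities can be obtained without tables by using a cumulative distribution function in software. A minimal sketch assuming Python with scipy:

from scipy.stats import norm

# Heights: mean 63.5, standard deviation 2.5
print(norm.cdf(63, loc=63.5, scale=2.5))      # P(H <= 63) = 0.4207

# Sardine lengths: X ~ N(4.62, 0.0529), so sigma = 0.23
print(1 - norm.cdf(5, 4.62, 0.23))            # (a) ≈ 0.0495
print(norm.cdf(4.85, 4.62, 0.23) - norm.cdf(4.35, 4.62, 0.23))  # (b) ≈ 0.72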

5.6 Finding percentiles by using the standard normal table

The standard normal table can be used to find percentiles for random variables which are
normally distributed.

Example

The scores S obtained in a mathematics entrance examination are normally distributed with known mean μ and standard deviation σ. Find the score which marks the 80th percentile. From the standard normal table, the z-score which is closest to an entry of 0.80 in the body of the table is 0.84 (the actual area to its left is 0.7995). The score s which corresponds to a z-score of 0.84 can be found by solving s = μ + 0.84σ, which here yields a score of approximately 609 i.e., a score of approximately 609 is better than 80% of all other exam scores.

Exercises: All these exercises refer to the normal distribution above.

1 Find .

2 If a person scores in the top 5% of test scores, what is the minimum score they could have
received?

3 If a person scores in the bottom 10% of test scores, what is the maximum score they could
have received?

5.7 Continuity correction

A continuity correction is applied when you want to use a continuous distribution to


approximate a discrete distribution. For example, suppose you would like to find the
probability that a coin lands on heads less than or equal to 45 times during 100 flips. That is,
you want to find P(X ≤ 45). To use the normal distribution to approximate the binomial
distribution, you would instead find P(X ≤ 45.5).

Rules: 1 When a value is included in a lower limit of a probability calculation, subtract


0.5 from the limit. When a value is excluded in a lower limit of a probability calculation add
0.5 to the limit.

2 When a value is excluded in an upper limit of a probability calculation, subtract 0.5 from
the limit. When a value is included in an upper limit of a probability calculation add 0.5 to
the limit.

Examples

Probability Continuity correction Probability Continuity correction


P(a≤ X ≤ b) P(a−0.5 ≤ X ≤ b+0.5) P( X ≤ a) P( X ≤ a+ 0.5)
P(a< X ≤ b) P(a+0.5 ≤ X ≤ b+ 0.5) P( X< a) P( X ≤ a−0.5)
P(a≤ X < b) P(a−0.5 ≤ X ≤ b−0.5) P( X ≥ a) P( X ≥ a−0.5)
P(a< X <b) P(a+0.5 ≤ X ≤ b−0.5) P( X> a) P( X ≥ a+ 0.5)

Note: A continuity correction is only needed when using a continuous distribution (e.g.,
normal distribution) to approximate a discrete one and not when you calculate probabilities
involving a continuous random variable or a discrete random variable. Before using a
continuity correction, note the type of variable associated with the probability you are
approximating and the type of variable you are using to approximate it.

Example

The number of hamburgers (X) sold daily at a fast-food restaurant is a discrete random variable whose distribution is well approximated by a normal distribution with mean 65 and standard deviation 6.

1 Calculate the following probabilities.

1.1 P(X > 50)  1.2 P(X ≥ 50)  1.3 P(58 < X ≤ 62)  1.4 P(69 ≤ X < 73)  1.5 P(X ≤ 75)

2 Find the 2.1 10th 2.2 88th 2.3 95th percentiles of the daily hamburger sales.

Solution

μ=65 , σ =6. Since X is discrete, continuity corrections are needed.


1.1 P(X > 50) = P(X ≥ 50.5) (continuity correction)
= P((X − 65)/6 ≥ (50.5 − 65)/6) = P(Z ≥ −2.42) = 1 − P(Z < −2.42) = 1 − 0.0078 = 0.9922.

1.2 P(X ≥ 50) = P(X ≥ 49.5) = P((X − 65)/6 ≥ (49.5 − 65)/6) = P(Z ≥ −2.58) = 0.99506.

1.3 P(58 < X ≤ 62) = P(58.5 < X ≤ 62.5) = P(X ≤ 62.5) − P(X ≤ 58.5)
= P(Z ≤ (62.5 − 65)/6) − P(Z ≤ (58.5 − 65)/6) = P(Z ≤ −0.42) − P(Z ≤ −1.08)
= 0.33724 − 0.14007 = 0.19717.

1.4 P(69 ≤ X < 73) = P(68.5 ≤ X < 72.5) = P(0.58 ≤ Z < 1.25) = 0.89435 − 0.71904 = 0.17531.

1.5 P(X ≤ 75) = P(X ≤ 75.5) = P(Z ≤ (75.5 − 65)/6) = P(Z ≤ 1.75) = 0.95994.

2.1 P(Z ≤ z0.10) = 0.10. From the normal tables P(Z ≤ −1.28) = 0.10. Solving for x from (x − 65)/6 = −1.28 gives x = 65 − 1.28 × 6 = 57.32. The answer is 57.32 rounded to the nearest integer i.e., 57.

2.2 P(Z ≤ z0.88) = 0.88. From the normal tables P(Z ≤ 1.175) = 0.88. Solving for x from (x − 65)/6 = 1.175 gives x = 65 + 1.175 × 6 = 72.05. The answer is 72.05 rounded to the nearest integer i.e., 72.

2.3 P(Z ≤ z0.95) = 0.95. From the normal tables P(Z ≤ 1.645) = 0.95. Solving for x from (x − 65)/6 = 1.645 gives x = 65 + 1.645 × 6 = 74.87. The answer is 74.87 rounded to the nearest integer i.e., 75.
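The continuity-corrected calculations above can be reproduced in software; in the sketch below (assuming Python with scipy) norm.ppf, the inverse of the cumulative distribution function, supplies the percentiles.

from scipy.stats import norm

mu, sigma = 65, 6

print(1 - norm.cdf(50.5, mu, sigma))                          # 1.1: ≈ 0.992
print(norm.cdf(62.5, mu, sigma) - norm.cdf(58.5, mu, sigma))  # 1.3: ≈ 0.199 (0.197 with table-rounded z)
print(norm.cdf(75.5, mu, sigma))                              # 1.5: ≈ 0.960

print(norm.ppf(0.10, mu, sigma))   # 2.1: 57.31, rounds to 57
print(norm.ppf(0.95, mu, sigma))   # 2.3: 74.87, rounds to 75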

5.8 Computer output

Excel has a built-in function (called normdist) that can be used to find areas under the normal
curve for a given z-score or to calculate a z-score that has a given area under the normal
curve to its left.

1 The table below shows areas under the standard normal curve to the left of various z-
scores.

z-score area
-2.5 0.0062
-2 0.0228
-1.5 0.0668
-1 0.1587
-0.5 0.3085
0 0.5
0.5 0.6915
1 0.8413
1.5 0.9332
2 0.9772
2.5 0.9938

2 Using the inverse function of normdist, the table below shows z-scores for certain areas
under the standard normal curve to its left.

area z-score
0.005 -2.5758
0.01 -2.3263
0.025 -1.96
0.05 -1.6449
0.1 -1.2816
0.2 -0.8416
0.8 0.8416
0.9 1.2816
0.95 1.6449
0.975 1.96
0.99 2.3263
0.995 2.5758

Chapter 6 – Sampling distributions

6.1 Definitions

A sampling distribution arises when repeated samples are drawn from a particular
population (distribution) and a statistic (numerical measure of description of sample data) is
calculated for each sample. The interest is then focused on the probability distribution
(called the sampling distribution) of the statistic.

Sampling distributions arise in the context of statistical inference i.e., when statements are
made about a population by drawing random samples from it.

Example

Suppose all possible samples of size 2 are drawn with replacement from a population with
sample space S = {2, 4, 6, 8} and the mean calculated for each sample.

The different values that can be obtained and their corresponding means are shown in the
table below.

1st value/2nd value 2 4 6 8


2 2 3 4 5
4 3 4 5 6
6 4 5 6 7
8 5 6 7 8

In the above table the row and column entries indicate the two values in the sample (16 possibilities when combining rows and columns). The mean is located in the cell corresponding to these entries e.g., 1st value = 4, 2nd value = 6 has a mean entry of (4 + 6)/2 = 5.

Assuming that random sampling is used, all the mean values in the above table are equally
likely. Under this assumption the following distribution can be constructed for these mean
values.

x̄           2      3      4      5      6      7      8      sum
count       1      2      3      4      3      2      1      16
P(X̄ = x̄)   1/16   1/8    3/16   1/4    3/16   1/8    1/16   1

The above distribution is referred to as the sampling distribution of the mean for random
samples of size 2 drawn from this distribution. For this sampling distribution the population
size N = 4 and the sample size n = 2.

The mean of the population from which these samples are drawn is µ = 5 and the variance is

σ² = [Σx² − (Σx)²/N] ÷ N = (2² + 4² + 6² + 8² − 20²/4)/4 = 5.

The sampling distribution of the mean has mean μX̄ = 5 and variance

σ²X̄ = Σ x̄² P(X̄ = x̄) − µ² = 440/16 − 5² = 2.5 (verify this result).

Note that μX̄ = 5 = µ and that σ²X̄ = σ²/2 = 5/2 = 2.5.

Consider a population with mean µ and variance σ2. It can be shown that the mean and
variance of the sampling distribution of the mean, based on a random sample of size n, are
given by

μX̄ = μ and σ²X̄ = σ²/n.

σX̄ = σ/√n is known as the standard error.

In the preceding example n = 2.
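Because this population is so small, the sampling distribution can be verified by brute force: enumerate all 16 ordered samples of size 2 and compute the mean and variance of the 16 sample means. A minimal sketch using only the Python standard library:

from itertools import product
from statistics import mean

population = [2, 4, 6, 8]
means = [mean(sample) for sample in product(population, repeat=2)]

mu_xbar = mean(means)                                # 5.0  (equals µ)
var_xbar = mean((m - mu_xbar) ** 2 for m in means)   # 2.5  (equals σ²/2)
print(mu_xbar, var_xbar)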



Sampling distributions can involve different statistics (e.g., sample mean, sample proportion,
sample variance) calculated from different sample sizes drawn from different distributions.
Some of the important results from statistical theory concerning sampling distributions are
summarized in the sections that follow.

6.2 The Central Limit Theorem

The following result is known as the Central Limit Theorem.

Let X1, X2, . . . , Xn be a random sample of size n drawn from a distribution with mean µ and variance σ² (σ² should be finite). Then for sufficiently large n the mean X̄ = (X1 + X2 + . . . + Xn)/n is approximately normally distributed with mean μX̄ = μ and variance σ²X̄ = σ²/n.

This result can be written as X̄ ~ N(µ, σ²/n).


Note:

1 The random variable Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).

2 The value of n for which this theorem is valid depends on the distribution from which the
sample is drawn. If the sample is drawn from a normal population, the theorem is valid for all
n. If the distribution from which the sample is drawn is close to being normal, a value of n >
30 will suffice for the theorem to be valid. If the distribution from which the sample is drawn
is substantially different from a normal distribution e.g., positively, or negatively skewed, a
value of n much larger than 30 will be needed for the theorem to be valid.

3 There are various versions of the central limit theorem. The only other central limit
theorem result that will be used here is the following one.
If the population from which the sample is drawn is a Bernoulli distribution (consists of only values of 0 or 1 with probability p of drawing a 1 and probability of q = 1−p of drawing a 0), then S = X1 + X2 + . . . + Xn follows a binomial distribution with mean µS = np and variance σ²S = npq.

According to the central limit theorem, P̂ = S/n follows a normal distribution with mean µ(P̂) = µS/n = np/n = p and variance σ²(P̂) = σ²S/n² = npq/n² = pq/n when n is sufficiently large. P̂ is the proportion of 1’s in the sample and can be seen as an estimate of p, the proportion of 1’s in the population (distribution from which the sample is drawn).

Using the central limit theorem, it follows that

Z = (P̂ − µ(P̂))/σ(P̂) = (P̂ − p)/√(pq/n) ~ N(0, 1).

Example:

An electric firm manufactures light bulbs whose lifetime (in hours) follows a normal
distribution with mean 800 and variance 1600. A random sample of 10 light bulbs is drawn
and the lifetime recorded for each light bulb. Calculate the probability that the mean of this
sample

(a) differs from the actual mean lifetime of 800 by not more than 16 hours.

(b) differs from the actual mean lifetime of 800 by more than 16 hours.

(c) is greater than 820 hours.

(d) is less than 785 hours.

(a) P(−16 ≤ X̄ − 800 ≤ 16) = P(|X̄ − 800| ≤ 16) = P(|Z| ≤ 16/√(1600/10)) = P(|Z| ≤ 1.265)
= P(Z ≤ 1.265) − P(Z ≤ −1.265)
= 0.8971 − 0.1029 = 0.7942

(b) P(|X̄ − 800| > 16) = 1 − P(|X̄ − 800| ≤ 16) = 1 − 0.7942 = 0.2058

(c) P(X̄ > 820) = P(Z > (820 − 800)/√(1600/10)) = P(Z > 1.58) = 1 − 0.9429 = 0.0571

(d) P(X̄ < 785) = P(Z < (785 − 800)/√(1600/10)) = P(Z < −1.19) = 0.117
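A quick software check of these four answers (assuming Python with scipy): the standard error of the mean is √(1600/10) ≈ 12.65, and each answer is a normal cdf lookup.

from math import sqrt
from scipy.stats import norm

mu, se = 800, sqrt(1600 / 10)     # standard error of the sample mean

print(norm.cdf(816, mu, se) - norm.cdf(784, mu, se))        # (a) ≈ 0.794
print(1 - (norm.cdf(816, mu, se) - norm.cdf(784, mu, se)))  # (b) ≈ 0.206
print(1 - norm.cdf(820, mu, se))                            # (c) ≈ 0.057
print(norm.cdf(785, mu, se))                                # (d) ≈ 0.118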

6.3 The t-distribution (Student’s t-distribution)

The central limit theorem states that the statistic Z = (X̄ − μ)/(σ/√n) follows a standard normal distribution. If σ is not known, it would be logical to replace σ (in the formula for Z) by its sample estimate S. For small values of the sample size n, the statistic t = (X̄ − μ)/(S/√n) does not follow a normal distribution. If it is assumed that sampling is done from a population that is approximately a normal population, the distribution of the statistic t follows a t-distribution. This distribution changes with the degrees of freedom = df = n−1 i.e., for each value of degrees of freedom a different distribution is defined.

The t-distribution was first proposed in a paper by William Gosset in 1908 who wrote the
paper under the pseudonym “Student”. The t-distribution has the following properties.

1. The Student t-distribution is symmetric and bell-shaped, but for smaller sample sizes it
shows increased variability when compared to the standard normal distribution (its curve
has a flatter appearance than that of the standard normal distribution). In other words, the
distribution is less peaked than a standard normal distribution and with thicker tails. As
the sample size increases, the distribution approaches a standard normal distribution. For
n > 30, the differences are small.
2. The mean is zero (like the standard normal distribution).
3. The distribution is symmetrical about the mean.
4. The variance is greater than one but approaches one from above as the sample size
increases (σ2=1 for the standard normal distribution).

The graph below shows how the t-distribution changes for different values of ν (the degrees of freedom).

Tables for the t-distribution

The layout of the t-tables is as follows.

ν = df \ α    0.25     0.20     ..     0.0005
1 1 1.376 636.6
2 0.816 1.061 31.6
. .
. .
∞ 0.674 0.841 3.291

The row entry is the degrees of freedom (df) and the column entry (α) the area under the t-
curve to the right of the value that appears in the table at the intersection of the row and
column entry.

When a t-value that has an area less than 0.5 to its left is to be looked up, the fact that the t-
distribution is symmetrical around 0 is used i.e.,

P(t ≤ tα) = P(t ≤ -t1-α) = P(t ≥ t1-α) for α ≤ 0.5 (Using symmetry).

This means that tα = -t1-α .

Examples
1 For df = 2 and α = 0.005 the entry is 9.925. This means that for the t-distribution with 2
degrees of freedom P(t ≥ 9.925) = 0.005.
2 For df = ∞ and α = 0.95 the entry is 1.645. This means that for the t-distribution with ∞
degrees of freedom
P(t ≤ 1.645) = 0.95. This is the same as P(Z ≤ 1.645) , where Z ~ N(0,1).
3 For df = ν = 10 and α = 0.10 the value of t0.10 such that P(t ≤ t0.10) = 0.10 is found from
t0.10 = −t1−0.10 = −t0.90 = −1.372.
Note that the percentile values in the last row of the t-distribution are identical to the
corresponding percentile entries in the standard normal table. Since the t-distribution for large
samples (degrees of freedom) is the same as the standard normal distribution, their percentiles
should be the same.

6.4 The chi-square (χ2) distribution

The chi-square distribution arises in several sampling situations. These include the ones
described below.

1 Drawing repeated samples of size n from an approximate normal distribution with variance σ² and calculating the variance (S²) for each sample. It can be shown that the quantity

χ² = (n−1)S²/σ² follows a chi-square distribution with degrees of freedom = n−1.

2 When comparing sequences of observed and expected frequencies as shown in the table
below. The observed frequencies (referring to the number of times values of some variable of
interest occur) are obtained from an experiment, while the expected ones arise from some
pattern believed to be true.

observed frequency f1 f2 .. fk
expected frequency e1 e2 .. ek

The quantity χ² = Σ (fi − ei)²/ei (summed over i = 1, . . . , k) can be shown to follow a chi-square distribution with k−1 degrees of freedom. The purpose of calculating this χ² is to make an assessment as to how well the observed and expected frequencies correspond.

The chi-square curve is different for each value of degrees of freedom. The graph below
shows how the chi-square distribution changes for different values of ν (the degrees of
freedom).

Unlike the normal and t-distributions, the chi-square distribution is only defined for positive values and is not a symmetrical distribution. As the degrees of freedom increase, the chi-square distribution becomes more and more symmetrical. For a sufficiently large value of degrees of freedom the chi-square distribution approaches the normal distribution.

Tables for the chi-square distribution

The layout of the chi-square tables is as follows.

ν = df \ α    0.995      0.99       ..    0.01     0.005
1             0.000039   0.000157         6.63     7.88
2             0.010025   0.020101         9.21     10.60
.
30            13.79      14.95            50.89    53.67

The row entry is the degrees of freedom (df) and the column entry (α) the area under the chi-square curve to the right of the value that appears in the table at the intersection of the row and column entry.

Examples:

1 For df = 30 and α = 0.99 the entry is 14.95. This means that for the chi-square distribution
with 30 degrees of freedom

P(χ² ≥ 14.95) = 0.99.

2 For df = 30 and α = 0.005 the entry is 53.67. This means that for the chi-square distribution
with 30 degrees of freedom

P(χ² ≥ 53.67) = 0.005.

3 For df = 6 and α = 0.05 the entry is 12.59. This means that for the chi-square distribution
with 6 degrees of freedom

P(χ² ≤ 12.59) = 0.95 or P(χ² > 12.59) = 0.05.

This probability statement is illustrated in the next graph.

6.5 The F-distribution

Random samples of sizes n1 and n2 are drawn from normally distributed populations that are labeled 1 and 2 respectively. Denote the variances calculated from these samples by S1² and S2² respectively and their corresponding population variances by σ1² and σ2² respectively. The ratio

F = (S1²/σ1²) / (S2²/σ2²)

is distributed according to an F-distribution (named after the famous statistician R.A. Fisher) with degrees of freedom df1 = n1 − 1 (called the numerator degrees of freedom) and df2 = n2 − 1 (called the denominator degrees of freedom). When σ1² = σ2² the F-ratio is F = S1²/S2².

The F-distribution is positively skewed, and the F-values can only be positive. The graph below shows plots for three F-distributions (F-curves) with σ1² = σ2². These plots are referred to by F(df1, df2) e.g., F(33, 10) refers to an F-distribution with 33 degrees of freedom associated with the numerator and 10 degrees of freedom associated with the denominator. For each combination of df1 and df2 there is a different F-distribution. Three other important distributions are closely related to special cases of the F-distribution: the square of a standard normal random variable follows an F(1, ∞) distribution, the square of a t(n2) random variable an F(1, n2) distribution, and a chi-square random variable with n1 degrees of freedom, divided by n1, an F(n1, ∞) distribution.

Tables for the F-distribution


The layout of the F-distribution tables with σ1² = σ2² is as follows.

df2 \ df1    1        2       ...    ∞
1            161.5    199.5          254.3
2            18.51    19.0           19.5
.
∞            3.85     3.0     ...    1.01

The entry in the table corresponding to a pair of (df1, df2) values has an area of α under the F(df1, df2) curve to its right.

Examples

1 F(3, 26) = 2.98 has an area (under the F(3, 26) curve) of α = 0.05 to its right (see graph below).

2 F(4, 32) = 2.67 has an area (under the F(4, 32) curve) of α = 0.05 to its right (see graph below).

For each different value of α a different F-table is used to read off a value that has an area of α to its right i.e. a percentage of 100(1−α) to its left. The F-tables that are used and their α and 100(1−α) values are summarized in the table below.

α        Percentage point = 100(1−α)
0.05     95%
0.025    97.5%
0.01     99%

The α entry in the above table refers to the proportion of the area under the F-curve to the right of the F-value read off, and the percentage point entry to the percentage of the area under the F-curve to the left of this F-value.

Examples:

1 For df1 = 7, df2 = 5 the value read from the 95% F-distribution table is 4.88. This means that for this F-distribution 95% of the area under the F-curve is to the left of 4.88 (a proportion of 0.05 to the right of 4.88).

P(F ≤ 4.88) = 0.95
P(F > 4.88) = 0.05

2 For df1 = 7, df2 = 5 the value read from the 97.5% F-distribution table is 6.85. This means that for this F-distribution 97.5% of the area under the F-curve is to the left of 6.85 (a proportion of 0.025 to the right of 6.85).

P(F ≤ 6.85) = 0.975
P(F > 6.85) = 0.025

3 For df1 = 10, df2 = 17 the value read from the 99% F-distribution table is 3.59. This means that for this F-distribution 99% of the area under the F-curve is to the left of 3.59 (a proportion of 0.01 to the right of 3.59).

P(F ≤ 3.59) = 0.99
P(F > 3.59) = 0.01

Lower tail values from the F-distribution

Only upper tail values (areas of 5%, 2.5% and 1% above) can be read off from the F-tables.
Lower tail values can be calculated from the formula

F(df1, df2; α) = 1 / F(df2, df1; 1−α) i.e.

F value with an area α under the F-curve to its left
= 1 / (F value with an area 1−α under the F-curve to its left, with numerator and denominator degrees of freedom interchanged)

Examples

1 Find the value such that 2.5% of the area under the F(7,5) curve is to the left of it.

In the above formula df1 = 7, df2 = 5 and α = 0.025. Then

F(7, 5; 0.025) = 1 / F(5, 7; 0.975) = 1/5.29 = 0.189.
2 Find the value such that 1% of the area under the F(10,17) curve is to the left of it.

In the above formula df1 = 10, df2 = 17 and α = 0.01. Then

F(10, 17; 0.01) = 1 / F(17, 10; 0.99) = 1/4.49 = 0.223.
6.6 Computer output

In Excel, values from the t, chi-square and F-distributions that have a given area under the curve above them can be found by using the TINV(area, df), CHIINV(area, df) and FINV(area, df1, df2) functions respectively.

Examples

1 TINV(0.05, 15) = 2.13145. The area under the t(15) curve to the right of 2.13145 is 0.025
and to the left of -2.13145 is 0.025. Thus, the total tail area is 0.05.

2 CHIINV(0.01, 14) = 29.14124. The area under the chi-square (14) curve to the right of
29.14124 is 0.01.

3 FINV(0.05,10,8) = 3.347163. The area under the F (10, 8) curve to the right of 3.347163
is 0.05.
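If Excel is not available, the same critical values can be obtained from the percent point (inverse cdf) functions in, for example, Python's scipy library; note that TINV is two-tailed, so the one-tailed area used below is 0.025.

from scipy.stats import t, chi2, f

print(t.ppf(1 - 0.025, 15))      # TINV(0.05, 15)    = 2.13145
print(chi2.ppf(1 - 0.01, 14))    # CHIINV(0.01, 14)  = 29.14124
print(f.ppf(1 - 0.05, 10, 8))    # FINV(0.05, 10, 8) = 3.347163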

Chapter 7 – Statistical Inference: Estimation for one sample case

7.1 Statistical inference

Statistical inference (inferential statistics) refers to the methodology used to draw conclusions
(expressed in the language of probability) about population parameters by selecting random
samples from the population.

Examples

1 The government of a country wants to estimate the proportion of voters ( p ) in the country
that approve of their economic policies.

2 A manufacturer of car batteries wishes to estimate the average lifetime (µ) of their
batteries.

3 A paint company is interested in estimating the variability (as measured by the variance, σ²) in the drying time of their paints.

The quantities p, µ and σ² that are to be estimated are called population parameters.

A sample estimate of a population parameter is called a statistic. The table below gives examples of some commonly used parameters together with their statistics.

Parameter Statistic
p ^p
µ x̄
σ2 S2

7.2 Point and interval estimation

A point estimate of a parameter is a single value (point) that estimates a parameter.

An interval estimate of a parameter is a range of values from L (lower value) to U (upper


value) that estimate a parameter. Associated with this range of values is a probability or
percentage chance that this range of values will contain the parameter that is being estimated.

Examples

Suppose the mean time it takes to serve customers at a supermarket checkout counter is to be
estimated.

1 The mean service time of 100 customers of (say) x̄= 2.283 minutes is an example of a
point estimate of the parameter µ.

2 If it is stated that the probability is 0.95 (95% chance) that the mean service time will be
from 1.637 minutes to 4.009 minutes, the interval of values (1.637, 4.009) is an interval
estimate of the parameter µ.

The estimation approaches discussed will focus mainly on the interval estimate approach.

7.3 Confidence intervals terminology

A confidence interval is a range of values from L (lower value) to U (upper value) that
estimate a population parameter θ with 100(1-α )% confidence.

θ - pronounced “theta”.

L is the lower confidence limit.

U is the upper confidence limit.

The interval (L, U) is called the confidence interval.

1-α is called the confidence coefficient. It is the probability that the confidence interval will
contain θ the parameter that is being estimated.

100(1−α) is called the confidence percentage.

Example

Consider example 2 of the previous section.

θ , the parameter that is being estimated, is the population mean μ .

L = 1.637, U = 4.009

The confidence interval is the interval (1.637, 4.009).

α =0.05

The confidence coefficient is 1-α = 0.95

The confidence percentage is 100(1−α) = 95.

In the sections that follow the determination of L and U when estimating the parameters µ, p
and σ2 will be discussed.

7.4 Confidence interval for the population mean (population variance known)

The determination of the confidence limits is based on the central limit theorem (discussed in the previous chapter). This theorem states that for sufficiently large samples

the sample mean X̄ ~ N(µ, σ²/n) and hence that Z = (X̄ − µ)/(σ/√n) ~ N(0, 1).

Formulae for the lower and upper confidence limits can be constructed in the following way.

Since Z ~ N(0,1), it follows that

P(-1.96 ≤ Z ≤ 1.96) = 0.95.

P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 0.95 (substitute Z = (X̄ − μ)/(σ/√n) in the line above).

By a few steps of mathematical manipulation (not shown here), the above part in brackets can
be changed to have only the parameter µ between the inequality signs. This will give

P(X̄ − 1.96σ/√n ≤ µ ≤ X̄ + 1.96σ/√n) = 0.95.

Let L = X̄ − 1.96σ/√n and U = X̄ + 1.96σ/√n. Then the above formula can be written as

P(L ≤ µ ≤ U) = 0.95.

This formula is interpreted in the following way.

Since both L and U are determined by the sample values (which determine X̄ ), they (and the
confidence interval) will change for different samples. Since the parameter µ that is being
estimated remains constant, these intervals will either include or exclude µ. The Central Limit
Theorem states that such intervals will include the parameter µ with probability 0.95 (95 out
of 100 times).

In a practical situation the confidence interval will not be determined by many samples, but
by only one sample. Therefore, the confidence interval that is calculated in a practical
situation will involve replacing the random variable X̄ by the sample value x̄ . Then the
above formulae for a 95% confidence interval for the population mean µ becomes

(x̄ − 1.96σ/√n, x̄ + 1.96σ/√n) or x̄ ± 1.96σ/√n.
The percentage of confidence associated with the interval is determined by the value (called the z-multiplier) obtained from the standard normal distribution. In the above formula a z-multiplier of 1.96 determines a 95% confidence interval.

If a different percentage of confidence is required, the z – multiplier needs to be changed. The


table below is a summary of z-multipliers needed for different percentages associated with
confidence intervals.

confidence percentage 99 95 90
z-multiplier 2.576 1.96 1.645
α 0.01 0.05 0.10

Calculation of confidence interval for µ (σ2 known)


Step 1: Calculate x̄. Values of n, σ² and confidence percentage are given.
Step 2: Look up the z-multiplier for the given confidence percentage.
Step 3: Confidence interval is x̄ ± z-multiplier × σ/√n.

Example

The actual content of cool drink in a 500 milliliter bottle is known to vary. The standard deviation is known to be 5 milliliters. Thirty (30) of these 500 milliliter bottles were selected at random and their mean content found to be 498.5 milliliters. Calculate 95% and 99% confidence intervals for the population mean content of these bottles.

Solution

95% confidence interval



Substituting x̄ = 498.5, n = 30, σ = 5, z = 1.96 into the above formula gives

498.5 ± 1.96 × 5/√30 = (496.71, 500.29).

99% confidence interval

Substituting x̄ = 498.5, n = 30, σ = 5, z = 2.576 into the above formula gives

498.5 ± 2.576 × 5/√30 = (496.15, 500.85).
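The calculation generalizes directly to code. A minimal sketch assuming Python with scipy, where norm.ppf supplies the z-multiplier for any confidence percentage:

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 498.5, 5, 30
for conf in (0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2)     # z-multiplier: 1.96, 2.576
    half = z * sigma / sqrt(n)
    print(conf, (round(xbar - half, 2), round(xbar + half, 2)))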

7.5 Confidence interval for the population mean (population variance not known)

When the population variance (σ²) is not known, it is replaced by the sample variance (S²) in the formula for Z mentioned in the previous section. In such a case the quantity

t = (X̄ − μ)/(S/√n) follows a t-distribution with degrees of freedom = df = n−1.

The confidence interval formula used in the previous section is modified by replacing the z-
multiplier by the t-multiplier that is looked up from the t-distribution.

Calculation of confidence interval for µ (σ2 not known)


Step 1: Calculate x̄ and S. Values of n and confidence percentage are given.
Step 2: Look up the t-multiplier for the given confidence percentage and degrees of freedom = df.
Step 3: Confidence interval is x̄ ± t-multiplier × S/√n.

Example

The time (in seconds) taken to complete a simple task was recorded for each of 15 randomly
selected employees at a certain company. The values are given below.

38.2  43.9  38.4  26.2  41.3  42.3  37.5  37.2  41.2  42.3  50.3  37.1  36.3  31.7  31.8

Calculate 95% and 99% confidence intervals for the population mean time it takes to
complete this task.

Solution

n = 15 (given), x̄= 38.36, S = 5.78 (Calculated from the data)

95% confidence interval



Looking up the t-multiplier involves a row and column entry in the t-table.

Row entry: df = ν = 15-1 = 14

Column entry: The α entry is determined from the confidence % required.

100(1 − 2α) = 95 gives α = 0.025.

From the t-tables with df = ν = 14 and α = 0.025, t-multiplier = 2.145.

Substituting x̄ = 38.36, n = 15, S = 5.78, t = 2.145 into the above formula gives

38.36 ± 2.145 × 5.78/√15 = (35.16, 41.56).

99% confidence interval

Looking up the t-multiplier

Row entry: df = ν = 15-1 = 14

Column entry: 100(1 − 2α) = 99, which gives α = 0.005.

From the t-tables with df = ν=14 and α = 0.005, t-multiplier = 2.977.

Substituting x̄ = 38.36, n = 15, S = 5.78, t = 2.977 into the above formula gives

38.36 ± 2.977 × 5.78/√15 = (33.92, 42.80).
Interpretation of confidence interval

The confidence limits depend on the sample values and will therefore change as the sample changes. Suppose it is known that μ = 9, σ² = 2. The plot below shows 95% confidence intervals calculated from 100 different data sets of size n = 24 simulated from a N(9, 2) distribution. Most of the confidence intervals will include the true mean of 9. The expression “with 95% confidence” is interpreted as “95% of the simulated confidence intervals will include the true mean of 9”.

7.6 Confidence interval for population variance

The formula for the confidence interval of the population variance σ² follows from the fact that (n−1)S²/σ² follows a chi-square distribution with (n−1) degrees of freedom. Let χ²(1−α/2) and χ²(α/2) denote the 100(1−α/2) and 100(α/2) percentile points of the chi-square distribution with (n−1) degrees of freedom. These points are shown in the graph below.

For this distribution, it follows from the graph above that

P[χ²(α/2) ≤ (n−1)S²/σ² ≤ χ²(1−α/2)] = 1 − α.

By a few steps of mathematical manipulation (not shown here), the above part in brackets can be changed to have only the parameter σ² between the inequality signs. This will give

P[(n−1)S²/upper ≤ σ² ≤ (n−1)S²/lower] = 1 − α,

where upper = χ²(1−α/2), the larger of the 2 percentile points and

lower = χ²(α/2), the smaller of the 2 percentile points.

The values of α and α/2 are calculated from

confidence percentage = 100(1−α) e.g., if

confidence percentage = 95, α = 0.05, α/2 = 0.025.



Calculation of confidence interval for σ2


Step 1: Calculate S². Values of n and confidence percentage are given.
Step 2: Look up upper and lower chi-square values for the given confidence percentage and degrees of freedom = df.
Step 3: Confidence interval is [(n−1)S²/upper, (n−1)S²/lower].

Example

Calculate 90% and 95% confidence intervals for the population variance of the time taken to
complete the simple task (see previous example).

Solution

n =15, S2 = 33.3811 (Calculated from the data)

90% confidence interval

Look up upper and lower chi-square values by using df = ν = 14 and α = 0.10.

upper = χ²(1−α/2) = χ²(0.95) = 23.68 for ν = 14.

lower = χ²(α/2) = χ²(0.05) = 6.57 for ν = 14.

(n−1)S² = 14 × 33.3811 = 467.34

The confidence interval is (467.34/23.68, 467.34/6.57) = (19.74, 71.13).

95% confidence interval

Look up upper and lower chi-square values by using df = ν = 14 and α = 0.05.

upper = χ²(1−α/2) = χ²(0.975) = 26.12 for ν = 14.

lower = χ²(α/2) = χ²(0.025) = 5.63 for ν = 14.

(n−1)S² = 14 × 33.3811 = 467.34

The confidence interval is (467.34/26.12, 467.34/5.63) = (17.89, 83.01).
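A sketch of the same computation, assuming Python with scipy; chi2.ppf supplies the two percentile points:

from scipy.stats import chi2

s2, n = 33.3811, 15
for alpha in (0.10, 0.05):                  # 90% and 95% confidence
    lower = chi2.ppf(alpha / 2, n - 1)      # smaller percentile point
    upper = chi2.ppf(1 - alpha / 2, n - 1)  # larger percentile point
    print(((n - 1) * s2 / upper, (n - 1) * s2 / lower))
    # ≈ (19.74, 71.13) and (17.89, 83.03)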

7.7 Confidence interval for population proportion

In some experiments the interest is in whether items possess a certain characteristic of interest
(e.g., whether a patient improves or not after treatment, whether an item manufactured is
acceptable or not, whether an answer to a question is correct or incorrect). The population
proportion of items labeled “success” in such an experiment (e.g., patient improves, item is
acceptable, answer is correct) is estimated by calculating the sample proportion of “success”
items.

The determination of the confidence limits for the population proportion of items labeled “success” is based on the central limit theorem for the sample proportion P̂ = X/n, where X is the number of items in the sample labeled “success”.

This theorem states that for sufficiently large samples

the sample proportion of “success” items P̂ ~ N(p, pq/n) and hence that

Z = (P̂ − µ(P̂))/σ(P̂) = (P̂ − p)/√(pq/n) ~ N(0, 1).

Formulae for the lower and upper confidence limits can be constructed in the following way.

Since Z ~ N(0,1),

P(-1.96 ≤ Z ≤ 1.96) = 0.95

P(−1.96 ≤ (P̂ − p)/√(pq/n) ≤ 1.96) = 0.95

By a few steps of mathematical manipulation (not shown here), the above part in brackets can
be changed to have the parameter p (in the numerator) between the inequality signs. This will
give

P(P̂ − 1.96√(pq/n) ≤ p ≤ P̂ + 1.96√(pq/n)) = 0.95.

Since the confidence interval formula is based on a single sample, the random variable P̂ = X/n is replaced by its sample estimate p̂ = x/n and the parameters p and q = 1−p by their respective sample estimates p̂ and q̂ = 1 − p̂.

This gives the following 95% confidence interval for p: (p̂ − 1.96√(p̂q̂/n), p̂ + 1.96√(p̂q̂/n)).



If the percentage of confidence is changed, the z-multiplier is changed according to the


values given in the table below.

confidence percentage 99 95 90
z-multiplier 2.576 1.96 1.645
α 0.01 0.05 0.10

Calculation of confidence interval for p


Step 1: Calculate p̂ = x/n and q̂ = 1 − p̂. x, n and confidence percentage are given.
Step 2: Look up the z-multiplier for the given confidence percentage.
Step 3: Confidence interval is p̂ ± z-multiplier × √(p̂q̂/n).
Example

During a marketing campaign for a new product, 176 out of the 200 potential users of this
product that were contacted indicated that they would use it. Calculate a 90% confidence
interval for the proportion of potential users who would use this product.

Solution

x = 176, n = 200, confidence percentage = 90 (given)

p̂ = 176/200 = 0.88, q̂ = 1 − p̂ = 0.12.

z-multiplier = 1.645 (from the above table)

Confidence interval is (0.88 ± 1.645√(0.88 × 0.12/200)) = (0.88 ± 0.0378) = (0.842, 0.918).

The confidence interval for the proportion of successes calculated above is called the
binomial confidence interval (based on the binomial distribution formulae).

Other approaches used to calculate such a confidence interval are the Wilson, Clopper-
Pearson, Jeffreys, Agresti-Coull and Arcsine.
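A sketch of the binomial interval above, assuming Python with scipy; the commented lines show one way to obtain the alternative intervals if the statsmodels library is installed:

from math import sqrt
from scipy.stats import norm

x, n, conf = 176, 200, 0.90
p_hat = x / n
z = norm.ppf(1 - (1 - conf) / 2)            # 1.645 for 90% confidence
half = z * sqrt(p_hat * (1 - p_hat) / n)
print((p_hat - half, p_hat + half))         # ≈ (0.842, 0.918)

# e.g., the Wilson interval:
# from statsmodels.stats.proportion import proportion_confint
# proportion_confint(x, n, alpha=0.10, method="wilson")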

7.8 Sample size when estimating the population mean

Consider the formula for the confidence interval of the mean (µ) when σ² is known.

x̄ ± z-multiplier × σ/√n

The quantity E = z-multiplier × σ/√n is known as the error.

The smaller the error, the more accurately the parameter μ is estimated.

Suppose the size of the error is specified in advance and the sample size n is determined to
achieve this accuracy. This can be done by solving for n from the equation

E = z-multiplier × σ/√n, which gives

n = (z-multiplier × σ / E)².

The z-multiplier is determined by the percentage confidence required in the estimation.

Example

Consider the example on the interval estimation of the mean content of 500 millilitre cool
drink bottles. The standard deviation σ is known to be 5. Suppose it is desired to estimate the
mean with 95% confidence and an error that is not greater than 0.8. What sample size is
needed to achieve this accuracy?

Solution

σ = 5, E = 0.8 (given), z-multiplier = 1.96 (from 95% confidence requirement).

n = (1.96 × 5/0.8)² = 150.0625 = 151 (n is always rounded up).

7.9 Sample size for estimation of population proportion

The approach used in determining the sample size for the estimation of the population
proportion is much the same as that used when estimating the population mean.

The equation to be solved for n is

E = z-multiplier × √(pq/n).

When solving for n the formula becomes

n = pq (z-multiplier/E)².

A practical problem encountered when using this formula is that values for the parameters p
and q=1-p are needed. Since the purpose of this technique is to estimate p, these values of p
and q are obviously not known.

If no information on p is available, the value of p that will give the maximum value of

p(1-p) = pq will be taken.

It can be shown that p= ½ maximizes this expression. This gives

max pq = ¼.

Substituting this maximum value in the above formula gives

max n = ¼ (z-multiplier/E)².

If more accurate information on the value of p is known (e.g., some range of values), it
should be used in the above formula.

As explained before, the z-multiplier is determined by the percentage confidence required in


the estimation.

Example

Consider the problem (discussed earlier) of estimating the proportion of potential users who
would use a new product. Suppose this proportion is to be estimated with 99% confidence
and an error not exceeding 2% (proportion of 0.02) is required. What sample size is needed to
achieve this?

Solution

E = 0.02 (given), z-multiplier = 2.576 (99% confidence required)

n = ¼ (2.576/0.02)² = 4147.36 = 4148 (rounded up).
Suppose it is known that the value of p is between 0.8 and 0.9, i.e., 0.8 ≤ p ≤ 0.9. In such a case

max p(1−p) = pq = 0.8 × 0.2 = 0.16 (p = 0.8 is used because it is the value in this range closest to ½, where p(1−p) is largest).

By using this information, the value of n can be calculated as

n = 0.16 (2.576/0.02)² = 2654.31 = 2655 (rounded up).

The additional information on possible values for p reduces the sample size by 36%.
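Both sample sizes can be computed in a few lines of plain Python, using the notes' z-multiplier of 2.576 and rounding up with math.ceil:

from math import ceil

z, E = 2.576, 0.02                  # 99% confidence, error at most 0.02
print(ceil(0.25 * (z / E) ** 2))    # no prior information on p: 4148
print(ceil(0.16 * (z / E) ** 2))    # using 0.8 <= p <= 0.9:     2655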

7.10 Computer output



1 Confidence interval for the mean (σ² known). For the data in the example in section 7.4, the information can be typed on an Excel sheet and the confidence interval calculated as follows.

mean          498.5
sigma         5
n             30
z-multiplier  1.959964
Confidence interval
lower         496.71
upper         500.29

2 Confidence interval for the mean (σ² not known). For the data in the example in section 7.5, the information can be typed on an Excel sheet and the confidence interval calculated as follows.

mean          38.36
stand.dev     5.777642
n             15
t multiplier  2.144787

Confidence interval
lower         35.16
upper         41.56

3 Confidence interval for the variance. For the data in the example in section 7.6, the information can be typed on an Excel sheet and the confidence interval calculated as follows.

variance            33.38114
n                   15
degrees of freedom  14
lower chisq.        5.628726
upper chisq.        26.11895

Confidence interval
lower               17.89
upper               83.03

4 Confidence interval for the proportion of successes. For the data in the example in section 7.7, the information can be typed on an Excel sheet and the confidence interval calculated as follows.

n             200
x             176
z multiplier  1.644854
st.error      0.022978

Confidence interval
lower         0.842
upper         0.918
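The same four intervals can be reproduced outside Excel. Below is a minimal sketch, assuming Python with scipy.stats; note that the fourth sheet uses a 90% interval (z-multiplier 1.644854).

from math import sqrt
from scipy.stats import norm, t, chi2  # assumed available

# 1 Mean, sigma known (95%): 498.5 -/+ z * 5 / sqrt(30)
h = norm.ppf(0.975) * 5 / sqrt(30)
print(498.5 - h, 498.5 + h)                    # (496.71, 500.29)

# 2 Mean, sigma not known (95%): t-multiplier with 14 degrees of freedom
h = t.ppf(0.975, df=14) * 5.777642 / sqrt(15)
print(38.36 - h, 38.36 + h)                    # (35.16, 41.56)

# 3 Variance (95%): (n-1)S^2 divided by the upper and lower chi-square values
lo, up = chi2.ppf(0.025, df=14), chi2.ppf(0.975, df=14)
print(14 * 33.38114 / up, 14 * 33.38114 / lo)  # (17.89, 83.03)

# 4 Proportion (90%): p-hat -/+ z * st.error
p = 176 / 200
h = norm.ppf(0.95) * sqrt(p * (1 - p) / 200)
print(p - h, p + h)                            # (0.842, 0.918)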

Chapter 8 – Statistical Inference: Testing of hypotheses for one sample

8.1 Formulation of hypotheses and related terminology

A statistical hypothesis is an assertion (claim) made about a value(s) of a population parameter.

The purpose of testing of hypotheses is to determine whether a claim that is made could be
true. The conclusion about the truth of such a claim is not stated with absolute certainty, but
rather in terms of the language of probability.

Examples of claims to be tested

1 A supermarket receives complaints that the mean content of “1 kilogram” sugar bags that
are sold by them is less than 1 kilogram.

2 The variability in the drying time of a certain paint (as measured by the variance) has until recently been 65 minutes². It is suspected that the variability has now increased.

3 A construction company suspects that the proportion of jobs they complete behind
schedule is 0.20 (20%). They want to test whether this is indeed the case.

Null and alternative hypotheses

The null hypothesis (H0) is a statement concerning the value of the parameter of interest (θ) in a claim that is made. This is formulated as

H0: θ = θ0 (the statement that the parameter θ is equal to the hypothetical value θ0).

The alternative hypothesis (H1) is a statement about the possible values of the parameter θ that are believed to be true if H0 is not true. One of the alternative hypotheses shown below will apply.

H1a: θ < θ0 or H1b: θ > θ0 or H1c: θ ≠ θ0.

Examples

1 In the first example (above) the parameter of interest is the population mean µ and the
hypotheses to be tested are

H0: µ = 1 (Population mean is 1 kilogram)

versus

H1a: µ < 1 (Population mean is less than 1 kilogram)

In terms of the general notation stated above, θ = µ and θ0 = 1.

2 In the second example (above) the parameter of interest is the population variance σ 2 and
the hypotheses to be tested are

H0: σ2 = 65 (Population variance is 65)

versus

H1b: σ2 > 65 (Population variance is greater than 65)

In terms of the general notation stated above, θ = σ² and θ0 = 65.

3 In the third example (above) the parameter of interest is the population proportion, p, of
job completions behind schedule and the hypotheses to be tested are

H0: p = 0.20 (Population proportion is 0.20)

versus

H1c: p ≠ 0.20 (Population proportion is not equal to 0.20)

In terms of the general notation stated above, θ = p and θ0 = 0.20.

One and two-sided alternatives

A one-sided alternative hypothesis is one that specifies the alternative values (to the null
hypothesis) in a direction that is either below or above that specified by the null hypothesis.

Example

The alternative hypothesis H1a (see example 1 above) is the alternative that the value of the
parameter is less than that stated under the null hypothesis and the alternative H1b (see
example 2 above) is the alternative that the value of the parameter is greater than that stated
under the null hypothesis.

A two-sided alternative hypothesis is one that specifies the alternative values (to the null
hypothesis) in directions that can be either below or above that specified by the null
hypothesis.

Example

The alternative hypothesis H1c (see example 3 above) is the alternative that the value of the
parameter is either greater than that stated under the null hypothesis or less than that stated
under the null hypothesis.

8.2 Testing of hypotheses for one sample: Terminology and summary of procedure

The testing procedure and terminology will be explained for the test for the population mean
μ with population variance σ2 known.

The hypotheses to be tested are

H0: µ = µ0

versus

H1a: µ < µ0 or H1b: µ > µ0 or H1c: µ≠ µ0.

The data set that is needed to perform the test is x1, x2, . . . , xn ,

a random sample of size n drawn from the population for which the mean is tested. The test is
performed to see whether the sample data are consistent with what is stated by the null
hypothesis. The instrument that is used to perform the test is called a test statistic.

A test statistic is a quantity calculated from the sample data.

When testing for the population mean, the test statistic used is

z0 = (x̄ − μ0) / (σ/√n).

If the difference between x̄ and µ0 (and therefore the value of z0) is reasonably small, H0 will
not be rejected. In this case the sample mean is consistent with the value of the population
mean that is being tested. If this difference (and therefore the value of z0) is sufficiently large,
H0 will be rejected. In this case the sample mean is not consistent with the value of the
population mean that is being tested. To decide how large this difference between x̄ and μ0
(and therefore the value of z0) should be before H0 is rejected, the following should be
considered.

Type I error

A type I error is committed when the null hypothesis is rejected when, in fact, it is true i.e., H0 is wrongly rejected.

In this test, a type I error is committed when it is decided that the statement H 0: µ = μ0 should
be rejected when, in fact, it is true.

A type II error is committed when the null hypothesis is not rejected when, in fact, it is
false i.e., a decision not to reject H0 is wrong.

In this test, a type II error is committed when it is decided that the statement H 0: µ = μ0
should not be rejected when, in fact, it is false.

The power of a statistical test is the probability that H 0 is rejected when, in fact, it is false.

The following table gives a summary of possible conclusions and their correctness when
performing a test of hypotheses.

Actually true \ Conclusion   Reject H0            Do not reject H0
H0 is true                   Type I error         Correct conclusion
H0 is false                  Correct conclusion   Type II error

A type I error is often considered to be more serious, and therefore more important to avoid,
than a type II error. The hypothesis testing procedure is therefore designed so that there is a
guaranteed small probability of rejecting the null hypothesis wrongly. This probability is
never 0 (why?). Mathematically the probability of a type I error can be stated as

P(type I error) = P(Reject H0 | H0 is true) = α.

When testing for the population mean,

P(type I error) = P(reject μ = μ0 | μ = μ0 is true) = α and

P(type II error) = P(do not reject µ = µ0 | µ = µ0 is false) = β.

Probabilities of type I and type II errors work in opposite directions. The more reluctant you
are to reject H0, the higher the risk of accepting it when, in fact, it is false. The easier you
make it to reject H0, the lower the risk of accepting it when, in fact, it is false.

When taking power into account, the sample size to be used in a test is

n = (zα + zβ)² / ES²,

where ES = effect size = |μ1 − μ0| / σ, zα and zβ are the standard normal values corresponding to the level of significance α and to β, 1−β is the power, and μ0 and μ1 are the means under H0 and H1 respectively.
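As an illustration, the sketch below evaluates this formula for the sugar-bag setting used later in section 8.3 (μ0 = 1, σ = 0.025) against the alternative μ1 = 0.98; the choices α = 0.05 and power = 0.90 are assumptions for illustration, and scipy.stats supplies the multipliers.

from math import ceil
from scipy.stats import norm  # assumed available

def n_for_power(mu0, mu1, sigma, alpha, power):
    # n = (z_alpha + z_beta)^2 / ES^2, one-sided test, upper-tail multipliers
    es = abs(mu1 - mu0) / sigma
    z_alpha, z_beta = norm.ppf(1 - alpha), norm.ppf(power)
    return ceil(((z_alpha + z_beta) / es) ** 2)

print(n_for_power(mu0=1, mu1=0.98, sigma=0.025, alpha=0.05, power=0.90))  # 14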

Critical value(s) and critical region

The critical (cut-off) value (s) for tests of hypotheses is a value(s) with which the test
statistic is compared to determine whether or not the null hypothesis should be rejected.

The critical value is determined according to the specified value of α, the probability of a type
I error.

For the test of the population mean the critical value is determined in the following way.

Assuming that H0 is true, the test statistic Z0 = (X̄ − μ0) / (σ/√n) ~ N(0, 1).

(i) When testing H0 versus the alternative hypothesis H1a (µ < µ0), the critical value is the
value Zα which is such that the area under the standard normal curve to the left of Zα is
α i.e., P(Z0 < Zα) = α.

The graph below illustrates the case α = 0.05 i.e., P(Z0 < -1.645) = 0.05.

(ii) When testing H0 versus the alternative hypothesis H1b (µ > µ0), the critical value is the value Z1-α which is such that the area under the standard normal curve to the right of Z1-α is α i.e., P(Z0 > Z1-α) = α.

The graph below illustrates the case α = 0.05 i.e. P(Z0 > 1.645) = 0.05.

(iii) When testing H0 versus the alternative hypothesis H1c (µ ≠ µ0), the critical values are the
values Z1-α/2 and Zα/2 which are such that the area under the standard normal curve to the right
of Z1-α/2 is α/2 and the area under the standard normal curve to the left of Zα/2 is α/2. i.e. P(Z0
> Z1-α/2) = α/2 and P(Z0 < Zα/2) = α/2.

The area under the normal curve between these two critical values is 1-α. The graph below
illustrates the case α = 0.05 i.e. P(Z0 <-1.96 or Z0> 1.96) = 0.05.

The critical region CR, or rejection region R, is the set of values of the test statistic for
which the null hypothesis is rejected.

(i) When testing H0 versus the alternative hypothesis H1a , the rejection region is

{ z0 | z0 < Zα }.

(ii) When testing H0 versus the alternative hypothesis H1b , the rejection region is

{ z0 | z0 > Z1-α }.

(iii) When testing H0 versus the alternative hypothesis H1c , the rejection region is

{ z0 | z0 > Z 1-α/2 or z0 < Zα/2 }.


H0 is rejected when there is a sufficiently large difference between the sample mean and
the mean (μ0) under H0 . Such a large difference is called a significant difference (result of
the test is significant). The value of α is called the level of significance. It specifies the level beyond which this difference (between x̄ and μ0) is sufficiently large for H0 to be rejected.
The value of α is specified prior to performing the test and is usually taken as either 0.05 (5%
level of significance) or 0.01 (1% level of significance).

When H0 is rejected, it does not necessarily mean that it is not true. It means that according to
the sample evidence available it appears not to be true. Similarly, when H0 is not rejected it
does not necessarily mean that it is true. It means that there is not sufficient sample evidence
to disprove H0.

Critical values for tests based on the standard normal distribution can be found from the
selected percentiles listed at the bottom of the pages of the standard normal table.

Using the p-value in testing of hypotheses

The p-value is defined as the probability of getting a value more extreme than that of the test statistic. Suppose the hypothesis H0: µ = µ0 is being tested and the test statistic value is found to be z0.

(i) When testing H0 versus H1a: μ < μ0, p-value = P(Z < z0).

(ii) When testing H0 versus H1b: μ > μ0, p-value = P(Z > z0).

(iii) When testing H0 versus H1c: μ ≠ μ0, p-value = 2P(Z > |z0|).

To reach a decision to reject or not reject H0, the p-value is compared to α, the level of significance. If p-value < α, reject H0. If this is not true, do not reject H0.

To see that the above-mentioned decision rule based on the p-value will lead to the same conclusion as that based on a critical region determined by the level of significance α, consider the following.

(i) When testing H0 versus H1a: μ < μ0, the critical value Zα is chosen such that P(Z < Zα) = α. If p-value = P(Z < z0) < α, then z0 < Zα and H0 is rejected. If p-value = P(Z < z0) ≥ α, then z0 ≥ Zα and H0 is not rejected.
0

(ii) When testing H0 versus H1b: μ > μ0, the critical value Z1−α is chosen such that P(Z > Z1−α) = α. If p-value = P(Z > z0) < α, then z0 > Z1−α and H0 is rejected. If p-value = P(Z > z0) ≥ α, then z0 ≤ Z1−α and H0 is not rejected.

(iii) When testing H0 versus H1c: μ ≠ μ0, the critical values Zα/2 and Z1−α/2 are chosen such that P(Z < Zα/2) = α/2 and P(Z > Z1−α/2) = α/2. If p-value = 2P(Z > |z0|) < α, then P(Z > |z0|) < α/2, so that either z0 > Z1−α/2 or z0 < Zα/2, and H0 is rejected. If p-value = 2P(Z > |z0|) ≥ α, then Zα/2 ≤ z0 ≤ Z1−α/2 and H0 is not rejected.

In the above definitions, the calculation of the p-value is based on the standard normal distribution (of Z) and the test statistic value z0. When performing a different test of hypotheses, the p-value calculation will be based on the test statistic applicable to the test and its associated sampling distribution.

8.3 Test for the population mean (population variance known)

A summary of the steps to be followed in the testing procedure is shown below.

Test for μ when σ² is known

1 State null and alternative hypotheses.

H0: μ = μ0 versus H1a: μ < μ0 or H1b: μ > μ0 or H1c: μ ≠ μ0.

2 Calculate the test statistic z0 = (x̄ − μ0) / (σ/√n).
3 State the level of significance α and determine the critical value(s) and critical region.

(i) For alternative H1a the critical region is R = {z0 | z0 < Zα }.

(ii) For alternative H1b the critical region is R = {z0 | z0 > Z1-α }.

(iii) For alternative H1c the critical region is R = {z0 | z0 > Z1-α/2 or z0 < Zα/2 }.

4 If z0 lies in the critical region, reject H0, otherwise do not reject H0.

5 State conclusion in terms of the original problem.

Examples

1 A supermarket receives complaints that the mean content of “1 kilogram” sugar bags that are sold by them is less than 1 kilogram. A random sample of 40 sugar bags is selected from the shelves and the mean found to be 0.987 kilograms. The standard deviation of the contents of these bags is known to be 0.025 kilograms. Test, at the 5% level of significance, whether this complaint is justified.

H0 : μ = 1 (The complaint is not justified)

H1 : μ < 1 (The complaint is justified)

n = 40, x̄ = 0.987, σ = 0.025, μ0 = 1 (given)

Test statistic: z0 = (0.987 − 1) / (0.025/√40) = −3.289.

α = 0.05. Critical region R = {z0 < Z0.05 = -1.645}.

Since z0 = -3.289 < -1.645, H0 is rejected.

[p-value =P (Z <−3 .289 )=0.0005<0.05 ].

Conclusion: The complaint is justified.



Note on power of the test: The critical region can also be expressed in terms of values of x̄. In the above example H0 will be rejected if z0 = (x̄ − 1) / (0.025/√40) < −1.645. This means that values of x̄ < 1 − 1.645 × 0.025/√40 = 0.9935 will lead to the rejection of H0.

Suppose the alternative H1: μ = 0.98 is in fact true. The power of the test against this alternative will be

P(x̄ < 0.9935 | μ = 0.98) = P(z < (0.9935 − 0.98) / (0.025/√40) = 3.415) = 0.9997.
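A quick way to check both the test and the power calculation is to script them. The sketch below assumes Python with scipy.stats; norm.cdf plays the role of the standard normal table.

from math import sqrt
from scipy.stats import norm  # assumed available

# One-sample z-test: H0: mu = 1 versus H1: mu < 1 (sugar bags)
n, xbar, sigma, mu0 = 40, 0.987, 0.025, 1
z0 = (xbar - mu0) / (sigma / sqrt(n))
print(z0, norm.cdf(z0))      # -3.289, p-value approx 0.0005 < 0.05: reject H0

# Power against the alternative mu = 0.98, as in the note above
xbar_crit = mu0 + norm.ppf(0.05) * sigma / sqrt(n)       # 0.9935
print(norm.cdf((xbar_crit - 0.98) / (sigma / sqrt(n))))  # approx 0.9997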

2 A supermarket manager suspects that the machine filling “500 gram” containers of coffee is overfilling them i.e., the actual contents of these containers is more than 500 grams. A random sample of 30 of these containers is selected from the shelves and the mean found to be 501.8 grams. The variance of the contents of these containers is known to be 60 grams². Test at the 5% level of significance whether the manager’s suspicion is justified.

Solution

H0 : μ = 500 (Suspicion is not justified)

H1 : μ > 500 (Suspicion is justified)

n = 30, x̄ = 501.8, σ² = 60, μ0 = 500 (given)

Test statistic: z0 = (501.8 − 500) / √(60/30) = 1.273.

α = 0.05. Critical region R = {z0 > Z0.95 = 1.645}.

Since z0 = 1.273 < 1.645, H0 is not rejected. [p-value =P (Z >1. 273 )=0 . 102>0 .05 ].

Conclusion: The suspicion is not justified.

3 During a quality control exercise the manager of a factory that fills cans of frozen shrimp wants to check whether the mean weights of the cans conform to specifications i.e., the mean of these cans should be 600 grams as stated on the label of the can. He/she wants to guard against either over or under filling the cans. A random sample of 50 of these cans is selected and the mean found to be 595 grams. The standard deviation of the contents of these cans is known to be 20 grams. Test, at the 5% level of significance, whether the weights conform to specifications. Repeat the test at the 10% level of significance.

Solution

H0 : μ = 600 (Weights conform to specifications)

H1 : μ ≠ 600 (Weights do not conform to specifications)

n = 50, x̄ = 595, σ = 20, μ0 = 600 (given)



Test statistic: z0 = (595 − 600) / (20/√50) = −1.768.

α = 0.05. Critical region R = {z0 < Z0.025 = -1.96 or z0 > Z0.975 = 1.96}.

Since −1.96 < z0 = −1.768 < 1.96, H0 is not rejected.

Conclusion: The weights appear to conform to specifications.

Suppose the test is performed at the 10% level of significance. In such a case

α = 0.10. Critical region R = {z0 < Z0.05 = -1.645 or z0 > Z0.95 = 1.645}.

Since z0 = −1.768 < −1.645, H0 is rejected.


Conclusion: The weights appear not to conform to specifications.

The p-value =2 P( Z>1.768)=0.0771 is greater than 0.05 but less than 0.10.

Thus, being less strict about controlling a type I error (changing α from 0.05 to 0.10) results
in a different conclusion about H0 (reject instead of do not reject).

Note

1 In example 1 the alternative hypothesis H1a was used, in example 2 the alternative H1b and
in example 3 the alternative H1c.

2 Alternatives H1a and H1b [one-sided (tailed) alternatives] are used when there is a
particular direction attached to the range of mean values that could be true if H 0 is not true.

3 Alternative H1c [two-sided (tailed) alternative] is used when there is no direction attached
to the range of mean values that could be true if H0 is not true.

4 If, in the above examples, the level of significance had been changed to 1%, the critical values used would have been Z0.01 = −2.326 (in example 1), Z0.99 = 2.326 (in example 2) and Z0.005 = −2.576, Z0.995 = 2.576 (in example 3).

8.4 Test for the population mean (population variance not known): t-test

When performing the test for the population mean for the case where the population variance
is not known, the following modifications are made to the procedure.

1 In the test statistic formula the population standard deviation σ is replaced by the sample
standard deviation S.

2 Since the test statistic t0 = (x̄ − μ0) / (S/√n) is used to perform the test, it is based on a t-distribution with n−1 degrees of freedom. Critical values are looked up in the t-tables.

Assumption: Sample drawn from a normal distribution.

Test for μ when σ² is not known (t-test)

1 State null and alternative hypotheses.

H0: μ = μ0 versus H1a: μ < μ0 or H1b: μ > μ0 or H1c: μ ≠ μ0.

2 Calculate the test statistic t0 = (x̄ − μ0) / (S/√n).
3 State the level of significance α and determine the critical value(s) and critical region.

Degrees of freedom = ν = n-1.

(i) For alternative H1a the critical region is R = {t0 | t0 < tα}.

(ii) For alternative H1b the critical region is R = {t0 | t0 > t1-α}.

(iii) For alternative H1c the critical region is R = {t0 | t0 > t1-α/2 or t0 < tα/2}.

4 If t0 lies in the critical region, reject H0, otherwise do not reject H0.

5 State conclusion in terms of the original problem.

Examples

A paint manufacturer claims that the average drying time for a new paint is 2 hours (120
minutes). The drying times for 20 randomly selected cans of paint were obtained. The results
are shown below.

123 106 139 135
127 128 119 130
131 133 121 136
122 115 116 133
109 120 130 109

Assuming that the sample was drawn from a normal distribution,

(a) test whether the population mean drying time is greater than 2 hours (120 minutes)

(i) at the 5% level of significance.


(ii) at the 1% level of significance.

(b) test, at the 5% level of significance, whether the population mean drying time could be 2
hours (120 minutes).

Solution

(a) H0 : μ = 120 (mean is 2 hours)

H1 : μ > 120 (mean is greater than 2 hours)

n = 20, μ0 = 120 (given), x̄ = 124.1, S = 9.65674 (calculated from the data).

Test statistic: t0 = (124.1 − 120) / (9.65674/√20) = 1.899.

(i) If α = 0.05, 1-α = 0.95. From the t-distribution table with

degrees of freedom = ν = n-1 =19, t0.95 = 1.729.

Critical region R = {t0 > t0.95 = 1.729}.

Since 1.899 > 1.729, H0 is rejected. [p-value =P (t >1 .899 )=0. 0364 <0 .05 ].

Conclusion: The mean drying time appears to be greater than 2 hours.

(ii) If α = 0.01, 1-α = 0.99. From the t-distribution table with

degrees of freedom = ν = n-1 =19, t0.99 = 2.539.

Critical region R = {t0 > t0.99 = 2.539}.

Since 1.899 < 2.539, H0 is not rejected.

Conclusion: The mean drying time appears to be 2 hours.

Thus, being stricter about controlling a type I error (changing α from 0.05 to 0.01) results in
a different conclusion about H0 (Do not reject instead of reject).

(b) H0: μ = 120 (mean is 2 hours)

H1: μ ≠ 120 (mean is not equal to 2 hours)

n = 20, μ0 = 120 (given), x̄ = 124.1, S = 9.65674 (calculated from the data).

Test statistic: t0 = (124.1 − 120) / (9.65674/√20) = 1.899 (as calculated in part (a)).

If α = 0.05, α/2 = 0.025, 1-α/2 = 0.975. From the t-distribution table with

degrees of freedom = ν = n-1 =19, t0.025 = -2.093, t0.975= 2.093.

Critical region R = {t0 < t0.025 = -2.093 or t0 > t0.975 = 2.093}.

Since -2.093 <1.899 < 2.093, H0 is not rejected.

[p-value =2 P(t >1.899 )=2×0.0364=0.0729>0.05 ].

Conclusion: The mean drying time appears to be 2 hours.

Note: Despite the fact that the same data were used in the above examples, the conclusions
were different. In the first test H0 was rejected, but in the next 2 tests H0 was not rejected.

1 In the first test the probability of a type I error was set at 5%, while in the second test this was changed to 1%. To achieve this, the critical value was moved from 1.729 to 2.539, resulting in the test statistic value (1.899) being less than (instead of greater than) the critical value.

2 In the third test (which has a two-sided alternative hypothesis), the upper critical value was increased to 2.093 (to have an area of 0.025 under the t-curve to its right). Again, this resulted in the test statistic value (1.899) being less than (instead of greater than) the critical value.
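For checking, the whole t-test can be reproduced from the raw drying times. A minimal sketch, assuming Python with scipy.stats (t.sf gives the upper-tail probability used for the p-value):

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t  # assumed available

times = [123, 106, 139, 135, 127, 128, 119, 130, 131, 133, 121, 136,
         122, 115, 116, 133, 109, 120, 130, 109]
n, xbar, s = len(times), mean(times), stdev(times)  # 20, 124.1, 9.65674
t0 = (xbar - 120) / (s / sqrt(n))                   # 1.899
print(t0, t.sf(t0, df=n - 1))                       # one-sided p approx 0.0364
print(2 * t.sf(t0, df=n - 1))                       # two-sided p approx 0.0729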

8.5 Test for population variance

The test for the population variance is based on χ² = (n−1)S² / σ² following a chi-square distribution with n−1 degrees of freedom. The critical values are therefore obtained from the chi-square tables.

Test for the population variance σ²

1 State the null and alternative hypotheses.

H0: σ² = σ0² versus H1a: σ² < σ0² or H1b: σ² > σ0² or H1c: σ² ≠ σ0².

2 Calculate the test statistic χ0² = (n−1)S² / σ0².
3 State the level of significance α and determine the critical value(s) and critical region.

Degrees of freedom = ν = n-1.

(i) For alternative H1a the critical region is R = {χ0² | χ0² < χ²α}.

(ii) For alternative H1b the critical region is R = {χ0² | χ0² > χ²1−α}.

(iii) For alternative H1c the critical region is R = {χ0² | χ0² > χ²1−α/2 or χ0² < χ²α/2}.

4 If χ0² lies in the critical region, reject H0, otherwise do not reject H0.

5 State conclusion in terms of the original problem.

For a one-sided test with alternative hypothesis H1b the rejection region (highlighted area) is
shown in the graph below.

For a two-sided test with alternative hypothesis H1c the rejection region (highlighted area) is
shown in the graph below.

Example

1 Consider the example on the drying time of the paint discussed in the previous section.
Until recently it was believed that the variance in the drying time is 65 minutes². Suppose it is
suspected that this variance has increased. Test this assertion at the 5% level of significance.

Solution

H0 : σ2 = 65 (Variance has not increased)

H1 : σ2 > 65 (Variance has increased)


n = 20, σ0² = 65 (given), S = 9.65674 (calculated from the data).

Test statistic: χ0² = 19 × 9.65674² / 65 = 27.258.

α = 0.05, 1−α = 0.95. From the chi-square distribution table with degrees of freedom = ν = n−1 = 19, χ²0.95 = 30.14.

Critical region R = {χ0² > χ²0.95 = 30.14}.

Since 27.258 < 30.14, H0 is not rejected. [p-value = P(χ² > 27.258) = 0.0988 > 0.05].

Conclusion: Variance has not increased.

2 A manufacturer of car batteries guarantees that their batteries will last, on average 3 years
with a standard deviation of 1 year. Ten of the batteries have lifetimes of
1.2, 2.5, 3, 3.5, 2.8, 4, 4.3, 1.9, 0.7 and 4.3 years.

Test at the 5% level of significance whether the variability guarantee is still valid.

Solution

H0 : σ2 = 1 (Guarantee is valid)

H1 : σ2 ≠ 1 (Guarantee is not valid)

n = 10, σ0² = 1 (given), S = 1.26209702, S² = 1.592889 (calculated from the data).

Test statistic: χ0² = 9 × 1.592889 / 1 = 14.336.

α = 0.05, α/2 = 0.025, 1−α/2 = 0.975. From the chi-square distribution table with degrees of freedom = ν = n−1 = 9, χ²0.025 = 2.70, χ²0.975 = 19.02.

Critical region R = {χ0² < χ²0.025 = 2.70 or χ0² > χ²0.975 = 19.02}.

Since 2.70 < 14.336 < 19.02, H0 is not rejected. [p-value = P(χ² > 14.336) = 0.110865 > 0.05].

Conclusion: Variability guarantee appears to still be valid.
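The battery example can be verified directly from the lifetimes. A minimal sketch, assuming Python with scipy.stats:

from statistics import variance
from scipy.stats import chi2  # assumed available

life = [1.2, 2.5, 3, 3.5, 2.8, 4, 4.3, 1.9, 0.7, 4.3]
n, s2 = len(life), variance(life)   # 10, 1.592889
chi0 = (n - 1) * s2 / 1             # hypothesized variance 1; chi0 = 14.336
print(chi0)
print(chi2.ppf(0.025, df=n - 1), chi2.ppf(0.975, df=n - 1))  # 2.70, 19.02: do not reject H0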

8.6 Test for population proportion

The test for the population proportion (p) is based on the fact that the sample proportion P̂ = X/n ~ N(p, pq/n), where n is the sample size and X the number of items labeled “success” in the sample. From this result it follows that Z = (P̂ − p) / √(pq/n) ~ N(0, 1). For this reason, the critical value(s) and critical region are the same as those for the test for the population mean (both based on the standard normal distribution).

Test for the population proportion p


1 State the null and alternative hypotheses.

H0: p = p0 versus H1a: p < p0 or H1b: p > p0 or H1c: p ≠ p0.

2 Calculate the test statistic z0 = (p̂ − p0) / √(p0 q0 / n).
3 State the level of significance α and determine the critical value(s) and critical region.

(i) For alternative H1a the critical region is R = { z0 | z0 < Zα }.

(ii) For alternative H1b the critical region is R = { z0 | z0 > Z1-α }.

(iii) For alternative H1c the critical region is R = { z0 | z0 > Z1-α/2 or z0 < Zα/2 }.

4 If z0 lies in the critical region, reject H0, otherwise do not reject H0.

5 State conclusion in terms of the original problem.

Examples

1 A construction company suspects that the proportion of jobs they complete behind
schedule is 0.20 (20%). Of their 80 most recent jobs 22 were completed behind schedule.
Test at the 5% level of significance whether this information confirms their suspicion.

Solution

H0 : p = 0.20 (Suspicion is confirmed)

H1 : p ≠ 0.20 (Suspicion is not confirmed)

n = 80, x = 22 (given), p̂ = 22/80 = 0.275, p0 = 0.20.

Test statistic: z0 = (0.275 − 0.20) / √(0.20 × 0.80 / 80) = 1.677.

α = 0.05. Critical region R = { z0 < Z0.025 = -1.96 or z0 > Z0.975 = 1.96 }.

Since -1.96 < z0 = 1.677 < 1.96, H0 is not rejected.

[p-value = 2P(Z > 1.677) = 0.0935 > 0.05]

Conclusion: The suspicion is confirmed.

2 During a marketing campaign for a new product 176 out of the 200 potential users of this
product that were contacted indicated that they would use it. Is this evidence that more than
85% of all the potential customers will use the product? Use α = 0.01.

Solution

H0 : p = 0.85 (85% of all potential users will use the product)

H1 : p > 0.85 (More than 85% of all potential users will use the product)

n = 200, x = 176, p0 = 0.85 (given), p̂ = 176/200 = 0.88.

Test statistic: z0 = (0.88 − 0.85) / √(0.85 × 0.15 / 200) = 1.188.

α = 0.01. Critical region R = {z0 > Z0.99 = 2.326}.

Since z0 = 1.188 < 2.326, H0 is not rejected. [p-value = P(Z > 1.188) = 0.1174 > 0.01]

Conclusion: 85% of all potential users will use the product.
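This test is a few lines of code. A minimal sketch, assuming Python with scipy.stats (norm.sf is the upper-tail normal probability):

from math import sqrt
from scipy.stats import norm  # assumed available

# H0: p = 0.85 versus H1: p > 0.85 (new-product example)
n, x, p0 = 200, 176, 0.85
z0 = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
print(z0, norm.sf(z0))   # 1.188, p-value approx 0.1174 > 0.01: do not reject H0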

8.7 Computer output



1 The output shown below is when the test for the population mean, for the data in example 1 in section 8.4, is performed by using Excel.

t-Test: Mean

Mean                 124.1
Variance             93.25263158
Observations         20
Hypothesized Mean    120
df                   19
t Stat               1.898752271
P(T<=t) one-tail     0.036445557
t Critical one-tail  1.729132792

The value of the test statistic is t0 = 1.90 (2 decimal places). From the table, the one-tail probability P(T ≥ 1.90) = 0.036. This probability is known as the p-value (the probability of getting a t value more remote than the test statistic). When testing at the 5% level of significance, a p-value below 0.05 will cause the null hypothesis to be rejected.

2 The output shown below is when the test for the population variance in example 1 in section 8.5 (the data in example 1 in section 8.4) is performed by using Excel.

Chi-square test: Variance

Variance                          93.25263
Observations                      20
Hypothesized variance             65
df                                19
Chi-square stat                   27.25846
P(Chi-square>=27.25846) one-tail  0.098775
Chi-square critical one-tail      30.14353

The values of the test statistic and critical value are the same as in the example in section 8.5.
The p-value is 0.098775 (2nd to last entry in the 2nd column in the table above). Since
0.098775 >0.05 the null hypothesis cannot be rejected at the 5% level of significance.

Chapter 9 – Statistical Inference: Testing of hypotheses for two samples

9.1 Formulation of hypotheses, notation, and additional results

The tests discussed in the previous chapter involve hypotheses concerning parameters of a
single population and were based on a random sample drawn from a single population of
interest. Often the interest is in tests concerning parameters of two different populations
(labeled populations 1 and 2) where two random samples (one from each population) are
drawn.

Examples

1 Are the mean salaries the same for males and females with the same educational
qualifications and work experience?

2 Do smokers and non-smokers have the same mortality rate?

3 Are the variances in drying times for two different types of paints different?

4 Is a particular diet successful in reducing people’s weights?

Null and alternative hypotheses

The following hypotheses involving two samples will be tested.

1 The test for equality of two variances. As an example, see example 3 above.
2 The test for equality of two means (independent samples). As an example, see example 1
above.
3 The test for equality of two means (paired samples). As an example, see example 4 above.
4 The test for equality of two proportions. As an example, see example 2 above.

The parameters to be used, when testing the hypotheses, are summarized in the table below.

Parameter    Population 1   Population 2
mean         μ1             μ2
variance     σ1²            σ2²
proportion   p1             p2

The following null and alternative hypotheses (as defined in section 8.1) also apply in the
two- sample case.

H0: θ = θ0 (the statement that the parameter θ is equal to the hypothetical value θ0).

H1a: θ < θ0 or H1b: θ > θ0 or H1c: θ ≠ θ0.
1b 1c

Examples

1 When testing for equality of variances from 2 different populations labeled 1 and 2 the
hypotheses are
H0: σ1² = σ2²

H1a: σ1² < σ2² or H1b: σ1² > σ2² or H1c: σ1² ≠ σ2².

These hypotheses can also be written as

H0: σ1²/σ2² = 1

H1a: σ1²/σ2² < 1 or H1b: σ1²/σ2² > 1 or H1c: σ1²/σ2² ≠ 1.

In terms of the general notation stated above, θ = σ1²/σ2² and θ0 = 1.

2 When testing for equality of means from 2 different populations labeled 1 and 2 the
hypotheses are

H0: μ1 = μ2

H1a: μ1 < μ2 or H1b: μ1 > μ2 or H1c: μ1 ≠ μ2.

These hypotheses can also be written as

H0: μ1 − μ2 = 0

H1a: μ1 − μ2 < 0 or H1b: μ1 − μ2 > 0 or H1c: μ1 − μ2 ≠ 0.

In terms of the general notation stated above, θ = μ1 − μ2 and θ0 = 0.

3 When testing for equality of proportions from 2 different populations labeled 1 and 2 the
hypotheses are

H0: p1 = p2

H1a: p1 < p2 or H1b: p1 > p2 or H1c: p1 ≠ p2.

These hypotheses can also be written as

H0: p1 − p2 = 0

H1a: p1 − p2 < 0 or H1b: p1 − p2 > 0 or H1c: p1 − p2 ≠ 0.

In terms of the general notation stated above, θ = p1 − p2 and θ0 = 0.

Notation

The following notation will be used in the description of the two-sample tests.

Measure                                Population 1      Population 2
sample size                            n                 m
sample                                 x1, x2, ..., xn   x1, x2, ..., xm
sample mean                            x̄1                x̄2
sample variance (standard deviation)   S1² (S1)          S2² (S2)
sample proportion                      p̂1 = xn*/n        p̂2 = xm*/m

xn* and xm* are the numbers of “success” items in the samples from populations 1 and 2 respectively.

Standard error formulae

When testing hypotheses for the difference between two means (μ1 − μ2) or the difference between two proportions (p1 − p2), test statistics of the form Z = (θ̂ − θ0) / SE(θ̂) ~ N(0, 1) or t = (θ̂ − θ0) / ŜE(θ̂) ~ t-distribution are used. To calculate these test statistics, formulae for the standard errors of the corresponding sample differences (X̄1 − X̄2 when testing for the mean, P̂1 − P̂2 when testing for proportions) will be needed. These formulae are summarized in the table that follows.

Sample difference (θ̂)   Condition                                     Standard error SE(θ̂)
X̄1 − X̄2                 population variances not equal                (σ1²/n + σ2²/m)^(1/2)
X̄1 − X̄2                 population variances equal i.e. σ1²=σ2²=σ²    σ(1/n + 1/m)^(1/2)
P̂1 − P̂2                 population proportions not equal              [p1(1−p1)/n + p2(1−p2)/m]^(1/2)
P̂1 − P̂2                 population proportions equal i.e. p1=p2=p     [p(1−p)(1/n + 1/m)]^(1/2)

The above-mentioned formulae can also be used to calculate 100(1−α)% confidence intervals for θ. These are of the form θ̂ ± Zα/2 SE(θ̂) or θ̂ ± tα/2 ŜE(θ̂), depending on whether the normal or t distribution applies.

Two-sample sampling distribution results for differences between means and difference
between proportions

1 For sufficiently large random samples (both n , m>30 ) drawn from populations (with
known variances) that are not too different from a normal population, the statistic

Z = (X̄1 − X̄2 − (μ1 − μ2)) / (σ1²/n + σ2²/m)^(1/2) follows a N(0, 1) distribution. Here the population variances σ1² and σ2² are assumed to be known but are not equal.

2 When σ1² = σ2² = σ², the above-mentioned result still holds, but with σ(1/n + 1/m)^(1/2) in the denominator in the formula for Z.

3 When the population variances σ1², σ2² and σ², referred to in the two above-mentioned results, are not known, they may be replaced by their sample estimates S1², S2² and S² = [(n−1)S1² + (m−1)S2²] / (n + m − 2) respectively in the above formula for Z.

In both the results that follow, it is assumed that the samples drawn are independent samples from normally distributed populations.

(i) When the population variances σ1² and σ2² are not known but equal, it can be shown that t = (X̄1 − X̄2 − (μ1 − μ2)) / (S(1/n + 1/m)^(1/2)) follows a t-distribution with n + m − 2 degrees of freedom.

(ii) When the population variances σ1² and σ2² are not known and not equal, it can be shown that t = (X̄1 − X̄2 − (μ1 − μ2)) / ((S1²/n + S2²/m)^(1/2)) follows a t-distribution with degrees of freedom = the integer part of

v = (S1²/n + S2²/m)² / [(S1²/n)²/(n−1) + (S2²/m)²/(m−1)].

4 For sufficiently large random samples the statistic Z = (P̂1 − P̂2 − (p1 − p2)) / [p1(1−p1)/n + p2(1−p2)/m]^(1/2) follows a N(0, 1) distribution.

5 When p1 = p2 = p, the above-mentioned result still holds but with [p(1−p)(1/n + 1/m)]^(1/2) in the denominator.

6 Provided the sample sizes are sufficiently large, the two above-mentioned results will still be valid with p1, p2 and p in the denominator replaced by p̂1 = xn*/n, p̂2 = xm*/m and p̂ = (xn* + xm*)/(n + m) respectively.

9.2 Test for equality of population variances (F-test) and confidence interval for σ1²/σ2²

A summary of the steps to be followed in the testing procedure is shown below.

Test for σ1² = σ2²

Step 1: State null and alternative hypotheses

H0: σ1² = σ2² versus H1a: σ1² < σ2² or H1b: σ1² > σ2² or H1c: σ1² ≠ σ2²

Step 2: Calculate the test statistic F0 = max(S1², S2²) / min(S1², S2²)

Step 3: State the level of significance α and determine the critical value(s) and critical region.

Degrees of freedom are df1 = sample size (numerator sample variance) − 1 and df2 = sample size (denominator sample variance) − 1.

(i) For alternatives H1a and H1b the critical region is R = {F0 | F0 > F1−α}.

(ii) For alternative H1c the critical region is R = {F0 | F0 > F1−α/2}.

Step 4: If F0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

Confidence interval for σ1²/σ2²

Step 1: Calculate S1² and S2². Values of n, m and the confidence percentage are given.

Step 2: Determine the upper and lower F-distribution values for the given confidence percentage, df1 and df2.

Step 3: The confidence interval is ((S1²/S2²) × lower, (S1²/S2²) × upper).

Examples

1 The following sample information about the daily travel expenses of the sales (population
1) and audit (population 2) staff at a certain company was collected.

sales 1048 1080 1168 1320 1088 1136


audit 1040 816 1032 1142 1192 960 1112

(a) Test at the 10% level of significance whether the population variances could be the same.

(b) Calculate a 95% confidence interval for σ1²/σ2².

(a) H0: σ1² = σ2²

H1: σ1² ≠ σ2²

From the above information n = 6, m = 7, S1² = 9593.6 and S2² = 15884.

Test statistic: F0 = max(9593.6, 15884) / min(9593.6, 15884) = 15884 / 9593.6 = 1.656.

df1 = 7 − 1 = 6, df2 = 6 − 1 = 5, α = 0.10, α/2 = 0.05.

For df1 = 6, df2 = 5, F0.95 = 4.95.

Critical region R = {F0 > 4.95}.

Since F0 = 1.656 < 4.95, H0 is not rejected. [p-value = P(F > 1.656) = 0.2982]

Conclusion: The population variances could be the same.

(b) P(F0.025 < (S2²/σ2²) / (S1²/σ1²) < F0.975) = 0.95, or equivalently P(F0.025 < (S2²/S1²)(σ1²/σ2²) < F0.975) = 0.95.

In the above expression S2² is in the numerator and S1² in the denominator. Hence df1 = 6, df2 = 5 and upper = F0.975 = 6.98. Lower = F0.025 is found from F0.975 with df1 = 5, df2 = 6 i.e. lower = 1/5.99 = 0.1669.

Substituting S1²/S2² = 0.604, F0.025 = 0.1669 and F0.975 = 6.98 into the above gives a confidence interval of (0.604 × 0.1669, 0.604 × 6.98) = (0.101, 4.216).

2 The waiting times (minutes) for minor treatments were recorded at two different medical
centres. Below is a summary of the calculations made from the samples.

centre   sample size   mean    variance
1        12            25.69   7.200
2        10            27.66   22.017

Test at the 5% level of significance whether the population 1 variance is less than that for
population 2.

H0: σ1² = σ2²

H1: σ1² < σ2²

From the above table n = 12, m = 10, S1² = 7.200 and S2² = 22.017.

Test statistic: F0 = max(22.017, 7.200) / min(22.017, 7.200) = 22.017 / 7.200 = 3.058.

df1 = 10 − 1 = 9, df2 = 12 − 1 = 11, α = 0.05.

For df1 = 9, df2 = 11, F0.95 = 2.90.

Critical region R = {F0 > 2.90}.

Since F0 = 3.058 > 2.90, H0 is rejected. [p-value = P(F > 3.058) = 0.0422 < 0.05]

Conclusion: The variance for population 1 is probably less than that for population 2.
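A minimal sketch of this F-test, assuming Python with scipy.stats (f.ppf and f.sf are the F quantile and upper-tail functions):

from scipy.stats import f  # assumed available

s1_sq, n = 7.200, 12    # centre 1
s2_sq, m = 22.017, 10   # centre 2
F0 = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)   # 3.058, larger variance in the numerator
df1, df2 = m - 1, n - 1                      # numerator and denominator degrees of freedom
print(F0, f.ppf(0.95, df1, df2))             # 3.058 > 2.90: reject H0
print(f.sf(F0, df1, df2))                    # p-value approx 0.042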

9.3 Test for difference between means for independent samples

(i) For independent large samples (both sample sizes n , m>30 ) and population
variances known

Test for μ1 − μ2 = 0 (large samples, population variances known)

Step 1: State null and alternative hypotheses

H0: μ1 − μ2 = 0

H1a: μ1 − μ2 < 0 or H1b: μ1 − μ2 > 0 or H1c: μ1 − μ2 ≠ 0

Step 2: Calculate the test statistic z0 = (x̄1 − x̄2) / (σ1²/n + σ2²/m)^(1/2).

Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternative H1a the critical region is R = {z0 | z0 < Zα }.

(ii) For alternative H1b the critical region is R = {z0 | z0 > Z1-α}.

(iii) For alternative H1c the critical region is R = {z0 | z0 > Z1-α/2 or z0 < Zα/2}.

Step 4: If z0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

A 100(1−α)% confidence interval for μ1 − μ2 is given by x̄1 − x̄2 ± Z1−α/2 (σ1²/n + σ2²/m)^(1/2).

If the population variances σ1² and σ2² are not known, they can be replaced in the above formulae by their sample estimates S1² and S2² respectively with the testing procedure unchanged.

Examples:

1 Data were collected on the length of short term stay of patients at hospitals. Independent random samples of n = 40 male patients (population 1) and m = 35 female patients (population 2) were selected. The sample mean stays for male and female patients were x̄1 = 9 days and x̄2 = 7.2 days respectively. The population variances are known to be σ1² = 55 and σ2² = 47.

(a) Test at the 5% level of significance whether male patients stay longer on average than
female patients.

(b) Calculate a 95% confidence interval for the mean difference (in staying time) between
males and females.

(a) H0: μ1 − μ2 = 0 (mean staying times for males and females the same)

H1: μ1 − μ2 > 0 (mean staying time for males greater than for females)

Test statistic: z0 = (x̄1 − x̄2) / (σ1²/n + σ2²/m)^(1/2) = (9 − 7.2) / (55/40 + 47/35)^(1/2) = 1.8 / 1.6486 = 1.09184.

α =0 . 05 . Critical region R = { z 0 > Z 0. 95 = 1.645}.



Since z0 = 1.09184 < 1.645, H0 cannot be rejected. [p-value = P(Z > 1.09184) = 0.1375 > 0.05].

Conclusion: The mean staying times for males and females are probably the same.

(b) x̄1 − x̄2 = 1.8, (σ1²/n + σ2²/m)^(1/2) = 1.6486 (denominator value when calculating the test statistic), 1−α = 0.95, α = 0.05, α/2 = 0.025, Z0.975 = 1.96.

x̄1 − x̄2 − Z1−α/2 (σ1²/n + σ2²/m)^(1/2) = 1.8 − 1.96 × 1.6486 = −1.431

x̄1 − x̄2 + Z1−α/2 (σ1²/n + σ2²/m)^(1/2) = 1.8 + 1.96 × 1.6486 = 5.031

2 Researchers in obesity want to test the effectiveness of dieting with exercise against
dieting without exercise. Seventy-three patients who were on the same diet were randomly
divided into “exercise” (n =37 patients) and “no exercise” groups (m =36 patients). The
results of the weight losses (in kilograms) of the patients after 2 months are summarized in
the table below.

                  Diet with exercise group   Diet without exercise group
sample mean       x̄1 = 7.6                   x̄2 = 6.7
sample variance   S1² = 2.53                 S2² = 5.59

Test at the 5% level of significance whether there is a difference in weight loss between the 2
groups.

H0: μ1 − μ2 = 0 (No difference in weight loss)

H1: μ1 − μ2 ≠ 0 (There is a difference in weight loss)

Test statistic: z0 = (x̄1 − x̄2) / (S1²/n + S2²/m)^(1/2) = (7.6 − 6.7) / (2.53/37 + 5.59/36)^(1/2) = 0.9 / 0.473 = 1.903.

α =0 . 05 . Critical region R = { z 0 < Z 0. 025 = -1.96 or z 0 >Z 0 . 975=1 .96 }.

Since −1.96 < z0 = 1.903 < 1.96, H0 cannot be rejected.

[p-value = 2P(Z > 1.903) = 0.057 > 0.05]



Conclusion: There is not sufficient evidence to suggest a difference in weight loss between
the 2 groups.
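A minimal sketch of this large-sample test, assuming Python with scipy.stats:

from math import sqrt
from scipy.stats import norm  # assumed available

x1, s1_sq, n = 7.6, 2.53, 37   # diet with exercise
x2, s2_sq, m = 6.7, 5.59, 36   # diet without exercise
z0 = (x1 - x2) / sqrt(s1_sq / n + s2_sq / m)   # 1.903
print(z0, 2 * norm.sf(abs(z0)))                # two-sided p-value approx 0.057 > 0.05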

(ii) For independent samples from normal populations with variances unknown

The test to be performed in this case will be preceded by a test for equality of population
2 2 2
variances (σ 1= σ 2 = σ ) i.e., the F-test discussed in section 9.2. If the hypothesis of equal
variances cannot be rejected, the test described below should be performed. If this hypothesis
is rejected, the Welsh-Aspin test (see below) should be performed. If, in this case, the
assumption of samples from normal populations does not hold, a nonparametric test like the
Wilcoxon-Mann-Whitney test (to be discussed in later chapter) should be used.

Test for μ1 − μ2 = 0 (population variances unknown but equal)

Step 1: State null and alternative hypotheses

H0: μ1 − μ2 = 0

H1a: μ1 − μ2 < 0 or H1b: μ1 − μ2 > 0 or H1c: μ1 − μ2 ≠ 0

Step 2: Calculate the test statistic t0 = (x̄1 − x̄2) / (S(1/n + 1/m)^(1/2)) with S² = [(n−1)S1² + (m−1)S2²] / (n + m − 2)
Step 3: State the level of significance α and determine the critical value(s) and critical
region.

Degrees of freedom = ν = n+ m−2 .

(i) For alternative H1a the critical region is R = { t0 | t0 < tα }.

(ii) For alternative H1b the critical region is R = { t0 | t0 > t1-α }.

(iii) For alternative H1c the critical region is R = { t0 | t0 > t1-α/2 or t0 < tα/2 }.

Step 4: If t0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

A 100(1−α)% confidence interval for μ1 − μ2 is given by x̄1 − x̄2 ± t(n+m−2, 1−α/2) × S(1/n + 1/m)^(1/2).

Examples

1 Consider the above example on the comparison of the travel expenses for the sales and audit staff (see section 9.2, example 1 for the F-test).

(a) Test, at the 5% level of significance, whether the mean expenses for the two types of
staff could be the same.

(b) Calculate a 95% confidence interval for the difference between the mean expenses for the
two types of staff.

(a) Since the hypothesis of equal population variances was not rejected, the test described above can be performed. From the data given x̄1 = 1140, x̄2 = 1042, S1² = 9593.6 and S2² = 15884.

H0: μ1 − μ2 = 0 (Mean travel expenses for sales and audit staff the same)

H1: μ1 − μ2 ≠ 0 (Mean travel expenses for sales and audit staff not the same)

α = 0.05, α/2 = 0.025, 1−α/2 = 0.975. From the t-distribution with ν = n + m − 2 = 6 + 7 − 2 = 11 degrees of freedom, t0.975 = 2.201.

S² = [(n−1)S1² + (m−1)S2²] / (n + m − 2) = (5 × 9593.6 + 6 × 15884) / 11 = 13024.727, S = 114.126.

Test statistic: t0 = (1140 − 1042) / (114.126 × (1/6 + 1/7)^(1/2)) = 1.543.

Critical region R = {t0 < −t0.975 = −2.201 or t0 > t0.975 = 2.201}.

Since −2.201 < t0 = 1.543 < 2.201, H0 cannot be rejected. [p-value = 2P(t > 1.543) = 0.151 > 0.05]

Conclusion: Mean travel expenses for sales and audit staff are probably the same.

(b) A 95% confidence interval for the difference between sales and audit staff means is

1140 − 1042 ± 2.201 × 114.126 × (1/6 + 1/7)^(1/2), i.e. (−41.75, 237.75).

2 A certain hospital has been getting complaints that the response to calls from senior
citizens is slower (takes longer time on average) than that to calls from other patients. To test
this claim, a pilot study was carried out. The results are shown below.

Patient type      sample mean response time   sample standard deviation   sample size
Senior citizens   5.60 minutes                0.25 minutes                18
Others            5.30 minutes                0.21 minutes                13

Test, at the 1% level of significance, whether the complaint is justified.



Label the “senior citizens” and “others” populations as 1 and 2 and their population mean response times as μ1 and μ2 respectively.

H0: μ1 − μ2 = 0 (Mean response times the same)

H1: μ1 − μ2 > 0 (Mean response time for senior citizens longer than for others)

The hypothesis that the population variances are equal cannot be rejected (perform the F-test
to check this). Hence equal variances for the 2 populations can be assumed.

S² = (17 × 0.25² + 12 × 0.21²) / 29 = 0.0549, S = 0.2343.

Test statistic: t0 = (5.6 − 5.3) / (0.2343 × (1/18 + 1/13)^(1/2)) = 3.518.

α = 0.01, 1−α = 0.99. From the t-distribution table with ν = n + m − 2 = 18 + 13 − 2 = 29 degrees of freedom, t0.99 = 2.462.

Critical region R = {t0 > t0.99 = 2.462}.

Since t0 = 3.518 > 2.462, H0 is rejected. [p-value = P(t > 3.518) = 0.00073]

Conclusion: The claim is justified i.e., the mean response time for senior citizens is longer than that for others.
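A minimal sketch of the pooled two-sample t-test, assuming Python with scipy.stats:

from math import sqrt
from scipy.stats import t  # assumed available

x1, s1, n = 5.60, 0.25, 18   # senior citizens
x2, s2, m = 5.30, 0.21, 13   # others
s2_pool = ((n - 1) * s1**2 + (m - 1) * s2**2) / (n + m - 2)   # 0.0549
t0 = (x1 - x2) / sqrt(s2_pool * (1 / n + 1 / m))              # 3.518
print(t0, t.sf(t0, df=n + m - 2))                             # p-value approx 0.0007 < 0.01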

(iii) For independent samples from normal populations with population variances not known and not equal (Welch-Aspin test)

Test for μ1 − μ2 = 0 (population variances not known and not equal)

Step 1: State null and alternative hypotheses

H0: μ1 − μ2 = 0

H1a: μ1 − μ2 < 0 or H1b: μ1 − μ2 > 0 or H1c: μ1 − μ2 ≠ 0

Step 2: Calculate the test statistic t0 = (x̄1 − x̄2) / (S1²/n + S2²/m)^(1/2).

Step 3: State the level of significance α and determine the critical value(s) and critical region using the degrees of freedom defined below.

(i) For alternative H1a the critical region is R = { t0 | t0 < tα }.

(ii) For alternative H1b the critical region is R = { t0 | t0 > t1-α }.



(iii) For alternative H1c the critical region is R = { t0 | t0 > t1-α/2 or t0 < tα/2 }.

Step 4: If t0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

A 100(1−α)% confidence interval for μ1 − μ2 is given by x̄1 − x̄2 ± t1−α/2 (S1²/n + S2²/m)^(1/2).

In the above, the integer part of v = (S1²/n + S2²/m)² / [(S1²/n)²/(n−1) + (S2²/m)²/(m−1)] is used as the degrees of freedom to determine the value from the t-tables.

Example

The waiting times (minutes) for minor treatments were recorded at two different medical
centres. Below is a summary of the calculations made from the samples.

centre   sample size   mean    variance
1        12            25.69   7.200
2        10            27.66   22.017

(a) Test at the 5% level of significance whether the population means for the 2 centres could
be equal.

(b) Calculate a 95% confidence interval for the difference between the means of centres 1
and 2.

(a) H0: μ1 − μ2 = 0

H1: μ1 − μ2 ≠ 0

An F-test for equality of variances (see example 2 of the tests for equality of population variances) shows that the two population variances are probably not equal. Therefore, the test statistic to be used is

t0 = (x̄1 − x̄2) / (S1²/n + S2²/m)^(1/2) = (25.69 − 27.66) / (7.2/12 + 22.017/10)^(1/2) = −1.97 / 1.673828 = −1.177

v = (S1²/n + S2²/m)² / [(S1²/n)²/(n−1) + (S2²/m)²/(m−1)] = (7.2/12 + 22.017/10)² / [(7.2/12)²/11 + (22.017/10)²/9] = 13.74

The degrees of freedom are integer(v) = 13. From the t-distribution table with 13 degrees of freedom, t0.025 = −2.1604 and t0.975 = 2.1604.

Critical region R = {t0 < −2.1604 or t0 > 2.1604}.

Since −2.1604 < t0 = −1.177 < 2.1604, H0 is not rejected. [p-value = 2P(t < −1.177) = 0.2602 > 0.05]

Conclusion: The population means for the 2 centres could be equal.

(b) A 95% confidence interval for the difference between the means of centres 1 and 2 is

x̄1 − x̄2 ± t13; 0.975 × (S1²/n + S2²/m)^(1/2) = (25.69 − 27.66) ± 2.1604 × (7.2/12 + 22.017/10)^(1/2) = −1.97 ± 3.616 = (−5.586, 1.646).
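A minimal sketch of the Welch-Aspin calculation, assuming Python with scipy.stats:

from math import sqrt, floor
from scipy.stats import t  # assumed available

x1, s1_sq, n = 25.69, 7.200, 12
x2, s2_sq, m = 27.66, 22.017, 10
t0 = (x1 - x2) / sqrt(s1_sq / n + s2_sq / m)                  # -1.177
v = (s1_sq / n + s2_sq / m) ** 2 / (
    (s1_sq / n) ** 2 / (n - 1) + (s2_sq / m) ** 2 / (m - 1))  # 13.74
df = floor(v)                                                 # integer part: 13
print(t0, df, 2 * t.sf(abs(t0), df))                          # two-sided p approx 0.26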

9.4 Test for difference between means for paired (matched) samples

The tests for the difference between means in the previous section assumed independent
samples. In certain situations, this assumption is not met.

Examples

1 A group of patients going on a diet is weighed before going on the diet and again after
having been on the diet for one month. A test to determine whether the diet has reduced their
weight is to be performed.

2 The aptitudes of boys and girls for mathematics are to be compared. To eliminate the
effect of social factors, pairs of brothers and sisters are used in the comparison. Each (brother,
sister) pair is given the same test and the mean marks of boys and girls compared.

In each of these situations the two samples cannot be regarded as independent. In the first
example two readings (before and after readings) are made on the same subject. In the second
example the two samples are matched via a common factor (family connection).

The data layout for the experiments described above is shown below.

sample 1     x1             x2             ...   xn
sample 2     y1             y2             ...   yn
difference   d1 = x1 − y1   d2 = x2 − y2   ...   dn = xn − yn

The mean of the paired differences of the (x, y) values of the two populations is defined as μd. Under the assumption that the differences are sampled from a normal population, hypotheses concerning the mean of the differences μd can be tested by performing a one-sample t-test (described in the previous chapter) with the observed differences d1, d2, ..., dn

as the sample. The mean and standard deviation of these sample differences will be denoted by d̄ and Sd respectively.

Test for μd = 0 (paired samples)

Step 1: State null and alternative hypotheses

H0: μd = 0

H1a: μd < 0 or H1b: μd > 0 or H1c: μd ≠ 0

Step 2: Calculate the test statistic t0 = d̄ / (Sd/√n).
Step 3: State the level of significance α and determine the critical value(s) and critical
region.

Degrees of freedom = ν = n−1 .

(i) For alternative H1a the critical region is R = {t0 | t0 < tα }.

(ii) For alternative H1b the critical region is R = {t0 | t0 > t1-α }.

(iii) For alternative H1c the critical region is R = {t0 | t0 > t1-α/2 or t0 < tα/2 }.

Step 4: If t0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

A 100(1−α)% confidence interval for μd is given by d̄ ± t-multiplier × Sd/√n, where the t-multiplier is obtained from the t-tables with n−1 degrees of freedom with an area 1−α/2 under the t-curve below it.

Examples

1 A bank is considering loan applications for buying each of 10 homes. Two different
companies (company 1 and company 2) are asked to do an evaluation of each of these 10
homes. The evaluations (thousands of Rand) for these homes are shown in the table below.

Home                                1     2      3      4      5      6     7      8     9     10
company 1                           750   990    1025   1285   1300   875   1240   880   700   1315
company 2                           810   1000   1020   1320   1290   915   1250   910   650   1290
difference (company 2 − company 1)  60    10     -5     35     -10    40    10     30    -50   -25

(a) At the 5% level of significance, is there a difference in the mean evaluations for the 2
companies?

(b) Calculate a 95% confidence interval for the difference between the mean evaluations for
companies 1 and 2.

(a) H0: μd = 0 (No difference in mean evaluations)

H1: μd ≠ 0 (There is a difference in mean evaluations)

From the above table d̄ = 9.5, Sd = 33.12015, n = 10.

Test statistic: t0 = 9.5 / (33.12015/√10) = 0.907.
α = 0.05, α/2 = 0.025, 1−α/2 = 0.975. From the t-tables with ν = n − 1 = 9 degrees of freedom, t0.975 = 2.262.

Critical region R = {t0 < −2.262 or t0 > 2.262}.

Since −2.262 < t0 = 0.907 < 2.262, H0 is not rejected i.e., no difference in mean evaluations.
0

(b) A 95% confidence interval is given by 9.5 ± 2.262 × 33.12015/√10 = (−14.19, 33.19).
2 Each of 15 people going on a diet was weighed before going on the diet and again after
having been on the diet for one month. The weights (in kilograms) are shown in the table
below.

Person       1    2     3     4     5     6    7    8    9    10    11    12   13    14    15
before       90   110   124   116   105   88   86   92   101  112   138   96   102   111   82
after        85   105   126   118   94    84   87   87   99   105   130   93   95    102   83
difference   -5   -5    2     2     -11   -4   1    -5   -2   -7    -8    -3   -7    -9    1

Test, at the 1% level of significance, whether the mean weight after one month on the diet is
less than that before going on the diet.

Let μd denote the mean difference between the weight after having been on the diet for one month and the weight before going on the diet.

H0:
μd =0 (No difference in mean weights)

H1: μd < 0 (Mean weight after one month on diet less than before going on diet)

From the above table d̄ = −4, Sd = 4.1231, n = 15.

Test statistic: t0 = −4 / (4.1231/√15) = −3.757.

α = 0.01. From the t-tables with ν = n − 1 = 14 degrees of freedom, t0.01 = −t0.99 = −2.624.

Critical region R = {t0 < −2.624}.

Since t0 = −3.757 < −2.624, H0 is rejected.

Conclusion: The mean weight after one month on the diet is less than before going on the diet.
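A minimal sketch of this paired t-test from the differences, assuming Python with scipy.stats:

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t  # assumed available

d = [-5, -5, 2, 2, -11, -4, 1, -5, -2, -7, -8, -3, -7, -9, 1]  # after minus before
n, dbar, sd = len(d), mean(d), stdev(d)   # 15, -4, 4.1231
t0 = dbar / (sd / sqrt(n))                # -3.757
print(t0, t.cdf(t0, df=n - 1))            # p-value approx 0.001 < 0.01: reject H0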

9.5 Test for the difference between proportions for independent samples

When testing for the difference between the proportions of two different populations, the test
is based on the sampling distribution results 4-6 described in the first section of this chapter.

Test for p1 − p2 = 0

Step 1: State null and alternative hypotheses

H0: p1 − p2 = 0

H1a: p1 − p2 < 0 or H1b: p1 − p2 > 0 or H1c: p1 − p2 ≠ 0

Step 2: Calculate the test statistic z0 = (p̂1 − p̂2) / [p̂(1−p̂)(1/n + 1/m)]^(1/2) with p̂ = (xn* + xm*)/(n + m).

Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternative H1a the critical region is R = { z0 | z0 < Zα }.

(ii) For alternative H1b the critical region is R = { z0 | z0 > Z1-α }.

(iii) For alternative H1c the critical region is R = { z0 | z0 > Z1-α/2 or z0 < Zα/2 }.

Step 4: If z0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.



A 100(1−α)% confidence interval for p1 − p2 is given by

p̂1 − p̂2 ± Zα/2 [p̂1(1−p̂1)/n + p̂2(1−p̂2)/m]^(1/2).

Example

A perfume company is planning to market a new fragrance. To test the popularity of the
fragrance, 120 young women and 150 older women were selected at random and asked
whether they liked the new fragrance. The results of the survey are shown below.

women   like   did not like   sample size
young   48     72             120
older   72     78             150

(a) Test, at the 5% level of significance, whether older women like the new fragrance more
than young women.

(b) Calculate a 95% confidence interval for the difference between the proportions of older
and young women who like the fragrance.

(a) Let the older and younger women populations be labeled 1 and 2 respectively and p1 and p2 the respective population proportions that like the fragrance.

H0: p1 − p2 = 0

H1: p1 − p2 > 0

From the above table n = 150, m = 120, xn* = 72, xm* = 48.

p̂ = (72 + 48) / (150 + 120) = 4/9, p̂1 = 72/150 = 0.48, p̂2 = 48/120 = 0.40.

Test statistic: z0 = (0.48 − 0.40) / [(4/9)(5/9)(1/150 + 1/120)]^(1/2) = 0.08 / 0.060858 = 1.3145.

α =0 . 05 . Critical region R = { z 0 > Z 0. 95 = 1.645}.

Since
z 0 = 1.3145 < 1.645, H cannot be rejected.
0

Conclusion: There is not sufficient evidence to suggest that older women like the new
fragrance more than young women.

(b) p̂1 − p̂2 = 0.08 [numerator of z0 in part (a)], Z0.975 = 1.96

p̂1 − p̂2 ± Z0.975 [p̂1(1 − p̂1)/n + p̂2(1 − p̂2)/m]^(1/2)
= 0.08 ± 1.96 [(0.48 × 0.52)/150 + (0.40 × 0.60)/120]^(1/2)
= 0.08 ± 0.11864 = (-0.03864, 0.19864)
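In R the same test and interval can be obtained with prop.test. A sketch (with correct = FALSE the reported X-squared statistic is simply z0 squared, here 1.3145² ≈ 1.73):

# Two-sample test for p1 - p2 (older vs young women who like the fragrance)
prop.test(x = c(72, 48), n = c(150, 120), alternative = "greater", correct = FALSE)
# Two-sided call to obtain the 95% confidence interval for p1 - p2
prop.test(x = c(72, 48), n = c(150, 120), correct = FALSE)$conf.int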

9.6 Computer output

1 The test for the difference between population means in example 1 in section 9.3(ii) (the
data in example 1 in section 9.2) can be performed by using excel. What follows is the
output.

t-Test: Two-Sample Assuming Equal Variances


                              Variable 1   Variable 2
Mean 1140 1042
Variance 9593.6 15884
Observations 6 7
Pooled Variance 13024.73
Hypothesized Mean
Difference 0
df 11
t Stat 1.543458
P(T<=t) two-tail 0.150984
t Critical two-tail 2.200985

The p-value is 0.150984 > 0.05. At the 5% level of significance the null hypothesis cannot be
rejected.

2 The output shown below is when the test for equality of population variances for the data
in example 1 in section 9.2 is performed by using excel.

F-Test Two-Sample for Variances


Variable 1 Variable 2
Mean 1140 1042
Variance 9593.6 15884
Observations 6 7
df 5 6
f 0.603979
P(F<=f) 0.701718
F Critical two-tail 0.143266

The value of the test statistic shown in the above table is s1²/s2² = 9593.6/15884 = 0.603979. The critical value (last entry under Variable 1 in the above table) is

F5,6;0.025 = 1/F6,5;0.975 = 1/6.98 = 0.143266,

and the p-value (second to last entry under Variable 1 in the above table) is 0.701718. Since 0.701718 > 0.025, the null hypothesis cannot be rejected.

Chapter 10 – Linear Correlation and Regression

10.1 Bivariate data and scatter diagrams

Often two variables are measured simultaneously and relationships between these variables
explored. Data sets involving two variables are known as bivariate data sets.

The first step in the exploration of bivariate data is to plot the variables on a graph. From
such a graph, which is known as a scatter diagram (scatter plot, scatter graph), an idea can
be formed about the nature of the relationship.

Examples

1 The number of copies sold (y) of a new book is dependent on the advertising budget (x)
the publisher commits in a pre-publication campaign. The values of x and y for 12 recently
published books are shown below.
x (thousands of rands)    y (thousands)
8 12.5
9.5 18.6
7.2 25.3
6.5 24.8
10 35.7
12 45.4
11.5 44.4
14.8 45.8
17.3 65.3
27 75.7
30 72.3
25 79.2

Scatter diagram

[Figure: scatter plot "Advertising budget and copies sold" - copies sold (y) plotted against advertising budget (x).]

2 In a study of the relationship between the amount of daily rainfall (x) and the quantity of
air pollution removed (y), the following data were collected.

Rainfall (centimeters)    Quantity removed (micrograms per cubic meter)
4.3 126
4.5 121
5.9 116
5.6 118
6.1 114
5.2 118
3.8 132
2.1 141
7.5 108

Scatter diagram

[Figure: scatter plot "Rainfall and quantity removed" - quantity removed (y) plotted against rainfall (x).]

1 In both cases the relationship can be well described by means of a straight line i.e., both these relationships are linear relationships.

2 In the first example y tends to increase as x increases (positive linear relationship).

3 In the second example y tends to decrease as x increases (negative linear relationship).

4 In both the examples changes in the values of y are driven by changes in the values of x (not the other way round). The variable x is known as the explanatory (independent, predictor) variable and the variable y the response (dependent) variable.

In this section only linear relationships between 2 variables will be explored. The issues to be
explored are

1 Measuring the strength of the linear relationship between the 2 variables (the linear
correlation problem).

2 Finding the equation of the straight line that will best describe the relationship between the 2 variables (the linear regression problem). Once this line is determined, it can be used to estimate a value of y for a given value of x (linear estimation).

10.2 Linear Correlation

The calculation of the coefficient of correlation (r) is based on the closeness of the plotted
points (in the scatter diagram) to the line fitted to them. It can be shown that

-1 ≤ r ≤ 1.

If the plotted points are closely clustered around this line, r will lie close to either 1 or -1
(depending on whether the linear relationship is positive or negative). The further the plotted
points are away from the line, the closer the value of r will be to 0. Consider the scatter
diagrams below.

[Three illustrative scatter diagrams: strong positive correlation (r close to 1), strong negative correlation (r close to -1), no pattern (r close to 0).]

For a sample of n pairs of values (x1, y1), (x2, y2), . . . , (xn, yn), the coefficient of correlation can be calculated from the formula

r = [n∑xy − ∑x∑y] / {[n∑x² − (∑x)²][n∑y² − (∑y)²]}^(1/2).

Example

Consider the data on the advertising budget (x) and the number of copies sold (y) considered
earlier. For this data r can be calculated in the following way.

x y xy x2 y2
8 12.5 100 64 156.25
9.5 18.6 176.7 90.25 345.96
7.2 25.3 182.16 51.84 640.09
6.5 24.8 161.2 42.25 615.04
10 35.7 357 100 1274.49
12 45.4 544.8 144 2061.16
11.5 44.4 510.6 132.25 1971.36
14.8 45.8 677.84 219.04 2097.64
17.3 65.3 1129.69 299.29 4264.09
27 75.7 2043.9 729 5730.49
30 72.3 2169 900 5227.29
25 79.2 1980 625 6272.64
sum   178.8   545   10032.89   3396.92   30656.5

Substituting n = 12, ∑x = 178.8, ∑y = 545, ∑xy = 10032.89, ∑x² = 3396.92 and ∑y² = 30656.5 into the equation for r gives

r = [12 × 10032.89 − 178.8 × 545] / {[12 × 3396.92 − 178.8²][12 × 30656.5 − 545²]}^(1/2)
  = 22948.68 / (8793.6 × 70853)^(1/2) = 0.9194.

Comment: Strong positive correlation i.e., the increase in the number of copies sold is closely
linked with an increase in advertising budget.

Coefficient of determination

The square of the correlation coefficient (r²), called the coefficient of determination, is the proportion of the variability in the y variable that is accounted for by its linear relationship with the x variable.

Example

In the above example on copies sold (y) and advertising budget (x), the

coefficient of determination = r² = 0.9194² = 0.8453.

This means that 84.53% of the variability in copies sold is explained by its linear relationship with advertising budget.
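As a quick check, r and r² can also be computed in R (a sketch; the vectors are the (x, y) values from the table above):

# Correlation between advertising budget (x) and copies sold (y)
x <- c(8, 9.5, 7.2, 6.5, 10, 12, 11.5, 14.8, 17.3, 27, 30, 25)
y <- c(12.5, 18.6, 25.3, 24.8, 35.7, 45.4, 44.4, 45.8, 65.3, 75.7, 72.3, 79.2)
r <- cor(x, y)   # 0.9194
r^2              # coefficient of determination, 0.8453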

10.3 Linear Regression

Finding the equation of the line that best fits the (x, y) points is based on the least squares
principle. This principle can best be explained by considering the scatter diagram below.

The scatter diagram is a plot of the DBH (diameter at breast height) versus the age for 12 oak
trees. The data are shown in the table below.

Age x (years)   97     93     88   81    75     57   52     45   28   15    12   11
DBH y (inch)    12.5   12.5   8    9.5   16.5   11   10.5   9    6    1.5   1    1

According to the least squares principle, the line that “best” fits the plotted points is the one
that minimizes the sum of the squares of the vertical deviations (see vertical lines in the
above graph) between the plotted y and estimated y (values on the line). For this reason the
line fitted according to this principle is called the least squares line.

Calculation of least squares linear regression line

The equation for the line to be fitted to the (x, y) points is

ŷ = a + bx,

where ŷ is the fitted y value (y value on the line, which differs from the observed y value), a is the y-intercept and b the slope of the line.

It can be shown that the coefficients that define the least squares line can be calculated from

b = [n∑xy − ∑x∑y] / [n∑x² − (∑x)²] and a = ȳ − b x̄.
Example

For the above data on age (x) and DBH (y) the least squares line can be calculated as shown below.

x y xy x2

97 12.5 1212.5 9409


93 12.5 1162.5 8649
88 8 704 7744
81 9.5 769.5 6561
75 16.5 1237.5 5625
57 11 627 3249
52 10.5 546 2704
45 9 405 2025
28 6 168 784
15 1.5 22.5 225
12 1 12 144
11 1 11 121
sum 654 99 6877.5 47240

Substituting n = 12, ∑x = 654, ∑y = 99, ∑xy = 6877.5 and ∑x² = 47240 into the above equations gives

b = [12 × 6877.5 − 654 × 99] / [12 × 47240 − 654²] = 17784/139164 = 0.12779 and

a = 99/12 − 0.12779 × 654/12 = 1.285.

Therefore, the equation of the y on x least squares line that can be used to estimate values of y (DBH) based on x (age) is

ŷ = 1.285 + 0.12779x.

Suppose the DBH of a tree aged 90 years is to be estimated. This can be done by substituting the value x = 90 into the above equation. Then

ŷ = 1.285 + 0.12779 × 90 = 12.786.
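The same fit can be obtained in R (a sketch; lm computes the least squares coefficients directly):

# Least squares line for DBH (y) on age (x)
x <- c(97, 93, 88, 81, 75, 57, 52, 45, 28, 15, 12, 11)
y <- c(12.5, 12.5, 8, 9.5, 16.5, 11, 10.5, 9, 6, 1.5, 1, 1)
fit <- lm(y ~ x)
coef(fit)                                   # intercept 1.285, slope 0.12779
predict(fit, newdata = data.frame(x = 90))  # estimated DBH at age 90: 12.786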


A word of caution

1 The linear relationship between y and x is often only valid for values of x within a certain
range e.g., when estimating the DBH using age as explanatory variable, it should be taken
into account that at some age the tree will stop growing. Assuming a linear relationship
between age and DBH for values beyond the age where the tree stops growing would be
incorrect.

2 Only relationships between variables that could be related in a practical sense are explored
e.g., it would be pointless to explore the relationship between the number of vehicles in New
York and the number of divorces in South Africa. Even if data collected on such variables
might suggest a relationship, it cannot be of any practical value.

3 If variables are not linearly related, it does not necessarily mean that they are not related.
There are many situations where the relationships between variables are non-linear.

Example

A plot of the banana consumption (y) versus the price (x) is shown in the graph below. A
straight line will not describe this relationship very well, but the non-linear curve shown
below will describe it well.

[Figure: scatter plot of banana consumption (y) against price (x), with a fitted non-linear curve.]

10.4 Testing for the slope

The true regression equation that describes the relationship between y and x can be written
as y=α+βx + error. The coefficients a and b that were calculated in the previous section
are the least squares estimates of α and β respectively. A hypothesis that is often of interest
when exploring a linear relationship between x and y , is whether they are indeed linearly
related. When testing this hypothesis, it is assumed that the error term in the above formula is
normally distributed.

A summary of the steps to be followed in the testing procedure is shown below.

Test for zero slope i.e., β = 0

Step 1: State null and alternative hypotheses
H0: β = 0 versus H1a: β < 0 or H1b: β > 0 or H1c: β ≠ 0

Step 2: Calculate the test statistic t0 = b/(S/√Sxx), where

SSE = Syy − Sxy²/Sxx, S² = SSE/(n − 2),

Sxx = ∑x² − (∑x)²/n, Syy = ∑y² − (∑y)²/n and Sxy = ∑xy − (∑x)(∑y)/n.

Step 3: State the level of significance α and determine the critical value(s) and critical region. This is based on t0 ~ t(n−2).

(i) For alternative H1a the critical region is R = {t0 | t0 < tα}.

(ii) For alternative H1b the critical region is R = {t0 | t0 > t1−α}.

(iii) For alternative H1c the critical region is R = {t0 | t0 < tα/2 or t0 > t1−α/2}.

Step 4: If t0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem

A 100(1 − α)% confidence interval for β is given by b ± t1−α/2 × S/√Sxx. The degrees of freedom used to find the t-multiplier is n − 2.

The test described above is also a test for zero correlation between y and x in a linear relationship.

Examples

1 For the data on age and DBH

(a) Test at the 1% level of significance whether the slope could be 0.

(b) Calculate a 95% confidence interval for the slope.

(a)

H0: β=0
H1: β≠0

The following sums are needed in the calculation of the denominator of the test statistic.

x y xy x2 y2
97 12.5 1212.5 9409 156.25
93 12.5 1162.5 8649 156.25
88 8 704 7744 64
81 9.5 769.5 6561 90.25
75 16.5 1237.5 5625 272.25
57 11 627 3249 121
52 10.5 546 2704 110.25
45 9 405 2025 81
28 6 168 784 36
15 1.5 22.5 225 2.25
12 1 12 144 1
11 1 11 121 1
sum 654 99 6877.5 47240 1091.5

n = 12, b = 0.12779,

Sxx = ∑x² − (∑x)²/n = 47240 − 654²/12 = 11597,

Syy = ∑y² − (∑y)²/n = 1091.5 − 99²/12 = 274.75,

Sxy = ∑xy − (∑x)(∑y)/n = 6877.5 − 654 × 99/12 = 1482,

SSE = Syy − Sxy²/Sxx = 274.75 − 1482²/11597 = 85.36274,

S = (SSE/(n − 2))^(1/2) = (85.36274/10)^(1/2) = (8.536274)^(1/2) = 2.92169

Test statistic t0 = b/(S/√Sxx) = 0.12779/(2.92169/√11597) = 4.710158

Using n − 2 = 10 degrees of freedom, t0.005 = −3.169273 and t0.995 = 3.169273.

Critical region is R = {t0 | t0 < −3.169273 or t0 > 3.169273}.

Since t0 = 4.710158 > 3.169273, H0 is rejected, and it is concluded that β ≠ 0.

p-value = 2P(t > 4.710158) = 0.00083.


(b) Using n − 2 = 10 degrees of freedom, t0.975 = 2.228.

Confidence interval: 0.12779 ± 2.228 × 2.92169/√11597 = 0.12779 ± 0.06045 = (0.06734; 0.18824)

2 Suppose that in the above question the hypotheses H0: β = 0 versus H1: β > 0 were tested at the 1% level of significance. How will the testing procedure change?

The hypotheses to be tested will be H0: β = 0 and H1: β > 0.

The steps followed to calculate the test statistic will be the same as described above. Therefore, as shown above, t0 = 4.710158.

Since this is a one-sided alternative hypothesis, the critical region will change. Using 10 degrees of freedom, t0.99 = 2.7638 and the critical region is R = {t0 | t0 > 2.7638}.

Since t0 = 4.710158 > 2.7638, H0 is rejected and β > 0 is concluded.

[p-value = P(t > 4.710158) = 0.000414]
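Continuing the R sketch from section 10.3, the slope test and its confidence interval come straight out of the fitted model:

# t test for zero slope and confidence interval for the slope
summary(fit)                 # x coefficient: t = 4.710, p-value = 0.00083
confint(fit, level = 0.95)   # row "x": (0.06734, 0.18824)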

10.5 Computer output

Consider the data on age (x variable) and DBH (y variable). The output when performing a
straight line regression on this data on excel is shown below.

SUMMARY OUTPUT

Regression Statistics
R Square    0.689307572

ANOVA
            df   SS            MS         F        Significance F
Regression  1    189.3872553   189.3873   22.1862  0.000828626
Residual    10   85.36274468   8.536274
Total       11   274.75

            Coefficients   Standard Error   t Stat     P-value
Intercept   1.285353971    1.702259153      0.755087   0.46761
X Variable  0.12779167     0.027130722      4.71022    0.00083

1 The coefficient of determination in the above table is R square = 0.689307572.

2 The ANOVA (Analysis of Variance) table is constructed to test whether there is a


significant linear relationship between X and Y. The p-value for this test is the entry under
the Significance F heading in the ANOVA table. Since this p-value < 0.05 (or 0.01), the
hypothesis of “no linear relationship between X and Y” can be rejected and it can be
concluded that there is a significant linear relationship between X and Y.

3 The third of the tables in the summary output shows the intercept and slope values of the line. These are the first two entries under Coefficients. The remaining columns to the right of the Coefficients column concern the tests for zero intercept and zero slope. From the intercept and slope p-values (0.46761 and 0.00083 respectively) it follows that the intercept is not significantly different from zero at the 5% level of significance (0.46761 > 0.05) but that the slope is significantly different from zero at the 5% or 1% levels of significance (0.00083 < 0.01 < 0.05).

When the correlation coefficient is calculated for the above-mentioned data by using excel,
the output is as shown below.

           Column 1   Column 2
Column 1   1
Column 2   0.83025    1

The above table shows that the correlation between x and y is 0.83025.

Chapter 11 – Analysis of Variance

11.1 Comparison of k means

Consider an experiment where samples are drawn from k different populations that are labelled 1, 2, . . . , k. A test for the equality of the k different means μ1, μ2, ⋯, μk is to be performed. A random sample of size n1 is drawn from population 1, one of size n2 from population 2, . . . , one of size nk from population k. The k different populations can also be seen as k different treatments. The data layout and some of the important calculations are shown below.

               Observations           mean   Sum of squares
Treatment 1:   x11, x12, ⋯, x1n1      x̄1     ∑j (x1j − x̄1)² = (n1 − 1)S1²
Treatment 2:   x21, x22, ⋯, x2n2      x̄2     ∑j (x2j − x̄2)² = (n2 − 1)S2²
. . .
Treatment k:   xk1, xk2, ⋯, xknk      x̄k     ∑j (xkj − x̄k)² = (nk − 1)Sk²

Total sample size = n1 + n2 + ⋯ + nk = n. Overall (grand) mean x̄ = (1/n) ∑i ∑j xij.

The experiment described above can also be seen as randomly assigning n experimental units to k different treatments such that n1 units are assigned to treatment 1, n2 to treatment 2, . . . , nk to treatment k with n1 + n2 + ⋯ + nk = n. For this reason, this design is referred to as a completely randomized design.

Example

The sound distortion on 4 different types of coatings (A, B, C, D) on sound tapes is measured. The data collected are shown below.

Coating   Observations                 mean      Sum of squares
A         10, 15, 8, 12, 15            x̄1 = 12   (n1 − 1)S1² = (5 − 1) × 9.5 = 38
B         14, 18, 21, 15               x̄2 = 17   (n2 − 1)S2² = (4 − 1) × 10 = 30
C         17, 16, 14, 15, 17, 15, 18   x̄3 = 16   (n3 − 1)S3² = (7 − 1) × 2 = 12
D         12, 15, 17, 15, 16, 15       x̄4 = 15   (n4 − 1)S4² = (6 − 1) × 2.8 = 14

In this example k = 4, n1 = 5, n2 = 4, n3 = 7, n4 = 6, n = 22 and

x̄ = (12 × 5 + 17 × 4 + 16 × 7 + 15 × 6)/22 = 15.

11.2 Hypotheses, partitioning of sum of squares and ANOVA table

The hypotheses to be tested can be written as

H0: μ1 = μ2 = ⋯ = μk = μ
H1: Not all means are equal

or

H0: α1 = α2 = ⋯ = αk−1 = 0
H1: Not all α's equal to 0

In the above null hypothesis αi = μi − μ, i = 1, 2, ⋯, k is the effect of the ith treatment. Since it is assumed that ∑i αi = 0, it follows that (under H0) αk = 0.
The total sum of squares SST = ∑i ∑j (xij − x̄)² can be partitioned in the following way.

SST = ∑i ∑j (xij − x̄)² = ∑i ∑j (xij − x̄i + x̄i − x̄)²
    = ∑i ∑j (xij − x̄i)² + 2∑i ∑j (xij − x̄i)(x̄i − x̄) + ∑i ∑j (x̄i − x̄)²

The middle term on the right-hand side in the above expression can be shown to be 0. Therefore

SST = ∑i ∑j (xij − x̄i)² + ∑i ∑j (x̄i − x̄)² = SSE + SSTr
      (Error)              (Treatment)

In a similar fashion the degrees of freedom associated with each of the above sums of squares can be partitioned as n − 1 = (n − k) + (k − 1).

The above results can be summarized in the form of an Analysis of Variance (ANOVA) table as shown below.

Source      Sum of squares   df      Mean Square           F0
Treatment   SSTr             k − 1   MSTr = SSTr/(k − 1)   MSTr/MSE
Error       SSE              n − k   MSE = SSE/(n − k)
Total       SST              n − 1

When H0 is true and the y observations are assumed to be normally distributed with equal variances, the F statistic follows an F distribution with k − 1 and n − k degrees of freedom.

Since this test is concerned with testing the effect of a single factor, the testing procedure
described above is also known as one-way Analysis of Variance (ANOVA).

11.3 Computational formulae and testing steps

SST = ∑i ∑j (xij − x̄)² = ∑i ∑j xij² − T²/n, where T = ∑i ∑j xij is the grand total.

SSTr = ∑i ∑j (x̄i − x̄)² = ∑i Ti²/ni − T²/n, where Ti = ∑j xij.

SSE = SST − SSTr

Test for equality of k means (one-way Analysis of Variance)

Step 1: State null and alternative hypotheses
H0: μ1 = μ2 = ⋯ = μk = μ versus H1: Not all means are equal

Step 2: Calculate SST, SSTr and SSE and use these quantities to calculate F0 as explained in the above ANOVA table.

Step 3: State the level of significance α and determine the critical value(s) and critical region. This is based on F0 ~ F(k−1, n−k). Using k − 1 and n − k degrees of freedom, the critical region is R = {F0 | F0 > F1−α}.

Step 4: If F0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem

Example

By using the data on the sound distortion on the 4 different types of coatings, test at the 5% level of significance whether their population means could be equal.

H0: μ1 = μ2 = μ3 = μ4 = μ
H1: Not all means are equal

or

H0: α1 = α2 = α3 = 0
H1: Not all α's equal to 0, where αi = μi − μ, i = 1, 2, ⋯, 4.

SST = ∑i ∑j xij² − T²/n
    = (10² + 15² + 8² + 12² + 15² + 14² + 18² + 21² + 15² + 17² + 16² + 14² + 15² + 17² + 15² + 18² + 12² + 15² + 17² + 15² + 16² + 15²) − 330²/22
    = 5112 − 4950 = 162

SSTr = ∑i Ti²/ni − T²/n, where T1 = 60, T2 = 68, T3 = 112, T4 = 90
     = 60²/5 + 68²/4 + 112²/7 + 90²/6 − 4950 = (720 + 1156 + 1792 + 1350) − 4950
     = 5018 − 4950 = 68.

SSE = SST − SSTr = 162 − 68 = 94.

ANOVA table

Source      Sum of squares   df   Mean Square      F0
Treatment   68               3    68/3 = 22.67     22.67/5.22 = 4.34
Error       94               18   94/18 = 5.22
Total       162              21

Test statistic F0 = 4.34.

Using 3 and 18 degrees of freedom, the critical region is R = {F0 | F0 > 3.16}.

Since F0 = 4.34 > 3.16, H0 is rejected, and it is concluded that not all means are equal.

p-value = P(F > 4.34) = 0.0181.

11.4 Graphical analysis

A graph showing side-by-side box plots of the y observations from the different ANOVA
groups can reveal some useful additional information about the validity of the test
assumptions and conclusions made about the test results. As an example, the side-by-side box
plots for the sound distortion data shown below suggest that

1 The mean sound distortion for coating A appears to be less than that for coatings B, C and D.

2 The "box" part of the box plot for coatings A and B is greater than that for coatings C and D. This suggests that the assumption of equal variances of the y observations is probably not valid.

The traditional one-way ANOVA performed above is performed under the assumption of
equal population variances. The big differences between the 4 sample variances (9.5, 10, 2,
2.8) creates doubts about the validity of this assumption. An alternative test that allows
unequal variances is the Welch one-way ANOVA (not discussed here).

11.5 Follow up (post hoc) tests – Bonferroni adjustment to LSD method

When H0: μ1 = μ2 = ⋯ = μk is rejected, follow up tests to determine which means are different from each other are needed. This is done by testing for the equality of all possible pairs of means selected from the k population means. In general, there are c = k(k − 1)/2 pairs that are tested. When c such tests are each performed at the 100α% level of significance, the probability of a type I error associated with a conclusion based on the results of all these tests is αE = 1 − (1 − α)^c. It can be shown that αE ≤ cα. This means that if the overall probability of a type I error is to be no more than αE, it is specified that α = αE/c. This adjustment of α for the tests concerning the individual pairs of means is called the Bonferroni adjustment. Each test is then performed by comparing the absolute difference between the sample means for groups i and j i.e. |x̄i − x̄j| with the Least Significant Difference

LSD = t(n−k; 1−α/2) × [MSE × (1/ni + 1/nj)]^(1/2),

where ni and nj are the sample sizes of the groups that are being compared and MSE the mean square error that is obtained from the ANOVA table. If |x̄i − x̄j| > LSD the hypothesis of equal means is rejected, otherwise it is not rejected.

These post hoc tests can also be carried out by using other similar testing procedures (not discussed here).

Example

For the sound distortion data (discussed above) k = 4. Then c = (4 × 3)/2 = 6 and when α = 0.05, αE = 1 − (1 − 0.05)^6 = 0.265. This means that the overall probability of a type I error (0.265) for a conclusion based on the 6 tests for equality of means is more than 5 times the type I error for each individual test (0.05).

When testing for the equality of all possible pairs of means in the sound distortion example and specifying αE = 0.05, the probability of a type I error for the individual tests should be α = αE/c = 0.05/6 = 0.0083.

The 4 sample means are x̄1 = 12, x̄2 = 17, x̄3 = 16, x̄4 = 15, the sample sizes n1 = 5, n2 = 4, n3 = 7, n4 = 6 and MSE = 5.22 (from the ANOVA table). Using n − k = 22 − 4 = 18 degrees of freedom, t(18; 1−0.05/12) = 2.962. Then

LSD = 2.962 × [5.22 × (1/ni + 1/nj)]^(1/2) = 6.77 × (1/ni + 1/nj)^(1/2).
Tests for differences between means

H0: μ1 = μ2
|x̄1 − x̄2| = |12 − 17| = 5, LSD = 6.77 × (1/5 + 1/4)^(1/2) = 4.54
Conclusion: Since 5 > 4.54, μ1 and μ2 are different.

H0: μ1 = μ3
|x̄1 − x̄3| = |12 − 16| = 4, LSD = 6.77 × (1/5 + 1/7)^(1/2) = 3.96
Conclusion: Since 4 > 3.96, μ1 and μ3 are different.

H0: μ1 = μ4
|x̄1 − x̄4| = |12 − 15| = 3, LSD = 6.77 × (1/5 + 1/6)^(1/2) = 4.10
Conclusion: Since 3 < 4.10, μ1 and μ4 are not different.

H0: μ2 = μ3
|x̄2 − x̄3| = |17 − 16| = 1, LSD = 6.77 × (1/4 + 1/7)^(1/2) = 4.24
Conclusion: Since 1 < 4.24, μ2 and μ3 are not different.

H0: μ2 = μ4
|x̄2 − x̄4| = |17 − 15| = 2, LSD = 6.77 × (1/4 + 1/6)^(1/2) = 4.37
Conclusion: Since 2 < 4.37, μ2 and μ4 are not different.

H0: μ3 = μ4
|x̄3 − x̄4| = |16 − 15| = 1, LSD = 6.77 × (1/7 + 1/6)^(1/2) = 3.77
Conclusion: Since 1 < 3.77, μ3 and μ4 are not different.

Overall conclusion: Population 1 (coating A) has a mean sound distortion that is less than that of populations 2 and 3 (coatings B and C). This confirms the conclusion suggested by the side-by-side box plots.
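A similar set of comparisons can be produced in R with Bonferroni-adjusted p-values (a sketch; this compares adjusted p-values with αE = 0.05 instead of computing LSD values, so it is the same idea in a slightly different form):

# Bonferroni-adjusted pairwise comparisons for the coating data
y <- c(10, 15, 8, 12, 15, 14, 18, 21, 15, 17, 16, 14, 15, 17, 15, 18, 12, 15, 17, 15, 16, 15)
coating <- factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 7, 6)))
# pool.sd = TRUE pools the variance (MSE) across groups, as the LSD method does
pairwise.t.test(y, coating, p.adjust.method = "bonferroni", pool.sd = TRUE)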

11.6 Checking the assumption of normally distributed data

An important assumption in ANOVA is that the data are normally distributed. This can be
checked by drawing a histogram of the data or a Q-Q plot of the error (residual) values
obtained from fitting the model to the data. The code below (written in R) fits an ANOVA to
the data, draws a histogram and a Q-Q plot.

data=read.table("clipboard",header=T)
attach(data)
data
y coating
1 10 A
2 15 A
3 8 A
4 12 A
5 15 A
6 14 B
7 18 B
8 21 B
9 15 B
10 17 C
11 16 C
12 14 C
13 15 C
14 17 C
15 15 C
16 18 C
17 12 D
18 15 D
19 17 D
20 15 D
21 16 D
22 15 D
model <- aov(y ~ coating, data = data)
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
coating 3 68 22.667 4.34 0.0181 *
Residuals 18 94 5.222
---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Histogram
hist(y)

The histogram has a bell-shaped appearance, which suggests that the data could be normally
distributed.
# Draw a Q-Q plot
qqnorm(model$residuals, pch=16,cex=0.5)
qqline(model$residuals)

For the error (residual) terms to be normally distributed, the Q-Q (Quantile-Quantile) plot
should show a straight-line pattern. In the above plot, the points do show a straight-line
pattern. This indicates that the error terms are normally distributed.

Chapter 12 – Analysis of Categorical data

12.1 Tables of counts

Data that are measured on the nominal or ordinal scales are usually summarized in the form
of tables of counts. In such tables the observations are allocated to categories/combinations of
categories and the number of observations in each category/combination of categories
determined. When observations are allocated to two or more combinations of categories, the
resulting table of counts is referred to as a cross-classification table (cross tab) or contingency
table.

Examples

Table 12.1 –The marital status of 500 adults

Status Frequency
Married 200
Widowed 47
Divorced 84
Separated 59
Never married   110
Total 500

Table 12.2 – Educational status of a group of people that are 21 years or older

Status Frequency
Primary school or less 53
Less than grade 12 105
Grade 12 239
Diploma/Certificate 114
Degree 152
Post graduate 37
Total 700

Table 12.3 – Health classification of people on two different diets

Diet/Health   Excellent   Average   Poor   Total
A             37          24        19     80
B             17          33        20     70
Total         54          57        39     150

Table 12.4 – Cross-classification table of a newspaper readership survey

                           newspaper
occupation     G&M   Post   Star   Sun   Total
Blue collar    27    18     38     37    120
White collar   29    43     21     15    108
Professional   33    51     22     20    126
Total          89    112    81     72    354

1 In table 12.1 the factor “marital status” is measured on the nominal scale of measurement.
2 In table 12.2 the factor “educational status” is measured on the ordinal scale of
measurement.
3 In table 12.3 the row factor (diet) is measured on the nominal scale and the column factor
(health) on the ordinal scale.
4 In table 12.4 both the row and column factors are measured on the nominal scale.
5 Both tables 12.3 and 12.4 are cross tabs but they differ in the sense that in table 12.3 the row totals (margins) are fixed in advance (80 and 70), while in table 12.4 neither the row nor the column totals are fixed in advance.

12.2 The Pearson chi-square goodness-of-fit test

Each value in a random sample of n observations is classified into one of k categories. The resulting table looks as shown below.

Category         1    2    . . .   k
Count (number)   n1   n2   . . .   nk

In the above table n1 + n2 + ⋯ + nk = n. Let p1, p2, ⋯, pk denote the probabilities associated with the different cells. The hypothesis that these probabilities follow a particular pattern is to be tested.

Pearson chi-square goodness-of-fit test

Step 1: State null and alternative hypotheses
H0: p1 = p10, p2 = p20, ⋯, pk = pk0
H1: Probability pattern in H0 is not true.

p10, p20, ⋯, pk0 are the hypothesized probabilities.

Step 2: Calculate the expected frequencies assuming H0 is true i.e. e1 = n p10, e2 = n p20, ⋯, ek = n pk0 and use these to calculate the test statistic

χ0² = ∑i (ni − ei)²/ei.

Step 3: State the level of significance α and determine the critical value(s) and critical region. This is based on χ0² ~ χ²(k−1). Using k − 1 df, the critical region is R = {χ0² | χ0² > χ²1−α}.

Step 4: If χ0² lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem

Example

The number of births (per 10 000) by day of the week in the USA in 1971 is given below.

         Mon.    Tues.   Wed.    Thurs.   Fri.    Sat.    Sun.    Total
Number   52.09   54.46   52.68   51.68    53.83   47.21   44.36   356.31

Test whether all 7 days of the week are equally likely for childbirth.

H0: p1 = p2 = ⋯ = p7 = 1/7
H1: Not all pi's equal to 1/7

ni              52.09      54.46      52.68      51.68      53.83      47.21      44.36      (Total 356.31)
ei              50.90143   50.90143   50.90143   50.90143   50.90143   50.90143   50.90143   (Total 356.31)
(ni − ei)²/ei   0.027754   0.248783   0.062146   0.011909   0.168493   0.267707   0.84065    (Total 1.627441)

From the above table χ0² = 1.627441.

Using k − 1 = 6 df the critical region is R = {χ0² | χ0² > χ²0.95 = 12.5916}.

Since χ0² = 1.627441 < 12.5916, H0 is not rejected. [p-value = P(χ² > 1.627441) = 0.9506]

Conclusion: No reason to believe that all 7 days of the week are not equally likely for childbirth.
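A sketch of the same test in R (chisq.test normally expects counts; here the per-10 000 figures are used exactly as given in the notes):

# Goodness-of-fit test: are all 7 days equally likely?
births <- c(52.09, 54.46, 52.68, 51.68, 53.83, 47.21, 44.36)
chisq.test(births, p = rep(1/7, 7))
# X-squared = 1.6274, df = 6, p-value = 0.9506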

12.3 Test of homogeneity (one margin fixed)

Consider the following contingency table with the row totals decided beforehand (fixed).

Row/column   1          2          . . .   c          Total
1            n11        n12        . . .   n1c        n1. = r1
2            n21        n22        . . .   n2c        n2. = r2
Total        n.1 = c1   n.2 = c2   . . .   n.c = cc   n.. = n

Let p11, p12, ⋯, p1c and p21, p22, ⋯, p2c denote the probabilities associated with the cells in the first and second rows respectively. The test for homogeneity is a test that the probability patterns in the two rows are the same i.e. H0: p11 = p21, p12 = p22, ⋯, p1c = p2c.

Denote the common probabilities under H0 by p1, p2, ⋯, pc. Assuming H0 to be true, these probabilities can be estimated by p̂1 = c1/n, p̂2 = c2/n, ⋯, p̂c = cc/n. The expected frequencies for the first row can then be calculated by

e11 = r1 p̂1 = r1c1/n, e12 = r1 p̂2 = r1c2/n, ⋯, e1c = r1 p̂c = r1cc/n

and for the second row by

e21 = r2 p̂1 = r2c1/n, e22 = r2 p̂2 = r2c2/n, ⋯, e2c = r2 p̂c = r2cc/n.

Therefore, the general formula for the expected frequencies assuming H0 to be true is eij = ricj/n, i = 1, 2; j = 1, 2, ⋯, c.
Test of homogeneity (one margin fixed) – Test for equality of k proportions

Step 1: State the null and alternative hypotheses
H0: p11 = p21, p12 = p22, ⋯, p1c = p2c
H1: Not all proportions in the two rows are equal

Step 2: Calculate the expected frequencies assuming that H0 is true i.e. eij = ricj/n, i = 1, 2; j = 1, 2, ⋯, c, where ri and cj are the sums of the ith row and jth column respectively, and use these to calculate the test statistic

χ0² = ∑i ∑j (nij − eij)²/eij.

Step 3: State the level of significance α and determine the critical value(s) and critical region. This is based on χ0² ~ χ²(c−1). Using c − 1 df, the critical region is R = {χ0² | χ0² > χ²1−α}.

Step 4: If χ0² lies in the critical region, reject H0, otherwise do not reject H0.

Example

Consider the data on health and diet in table 12.3.

Diet/Health   Excellent   Average   Poor   Total
A             37          24        19     80
B             17          33        20     70
Total         54          57        39     150

Test at the 5% level of significance whether the health proportions are the same for diets A and B.

Refer to the diets A and B by i = 1, 2 and the health states by j = 1, 2, 3.
H0: p11 = p21, p12 = p22, p13 = p23 (Diet proportions are the same)
H1: Not all proportions in the two rows are equal (Diet proportions are not all the same)

The calculations of expected frequencies are shown in the table below.

        j = 1   j = 2   j = 3
i = 1   28.8    30.4    20.8    80
i = 2   25.2    26.6    18.2    70
        54      57      39      150

Each cell entry in the above table is cell row total × cell column total / 150 e.g., 28.8 = 80 × 54/150, 30.4 = 80 × 57/150 etc.

χ0² = ∑i ∑j (nij − eij)²/eij
    = (37 − 28.8)²/28.8 + (24 − 30.4)²/30.4 + (19 − 20.8)²/20.8 + (17 − 25.2)²/25.2 + (33 − 26.6)²/26.6 + (20 − 18.2)²/18.2
    = 8.224

Using c − 1 = 2 df the critical region is R = {χ0² | χ0² > χ²0.95 = 5.991}.

Since χ0² = 8.224 > 5.991, H0 is rejected, and it is concluded that the proportions for the 2 diets are not equal. [p-value = P(χ² > 8.224) = 0.0164]

The multiple bar chart below shows that following diet A leads to a better health than
following diet B.
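A sketch of this test in R, entering table 12.3 as a matrix (for tables larger than 2×2 chisq.test applies no continuity correction):

# Test of homogeneity for the diet and health data
diet <- matrix(c(37, 24, 19,
                 17, 33, 20), nrow = 2, byrow = TRUE,
               dimnames = list(c("A", "B"), c("Excellent", "Average", "Poor")))
chisq.test(diet)
# X-squared = 8.224, df = 2, p-value = 0.0164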

12.4 Test of independence of row and column factors (neither margin fixed)

As for the test described in the previous section, this test is also based on a two-factor table of counts, but in the construction of the table neither the row nor the column totals are fixed. The table can have any number of rows (say r) and any number of columns (say c) and has the following appearance.

Row/column   1          2          . . .   c          Total
1            n11        n12        . . .   n1c        n1. = r1
2            n21        n22        . . .   n2c        n2. = r2
⋮
r            nr1        nr2        . . .   nrc        nr. = rr
Total        n.1 = c1   n.2 = c2   . . .   n.c = cc   n.. = n

The hypotheses to be tested are H0: Row and column factors are independent
H1: Row and column factors are not independent

Assuming H0 to be true, the probability pij of an observation in row i and column j can be calculated as follows.

pij = P(row i and column j) = P(row i) × P(column j) (independence) = (ri/n) × (cj/n)

Assuming H0 to be true, the expected count can then be calculated as eij = n pij = n × (ri/n) × (cj/n) = ricj/n.

Test of independence of row and column factors (neither margin fixed)

Step 1: State null and alternative hypotheses
H0: Row and column factors are independent
H1: Row and column factors are not independent

Step 2: Calculate the expected frequencies assuming H0 is true i.e. eij = ricj/n, i = 1, 2, …, r; j = 1, 2, ⋯, c and use these to calculate the test statistic

χ0² = ∑i ∑j (nij − eij)²/eij.

Step 3: State the level of significance α and determine the critical value(s) and critical region. This is based on χ0² ~ χ²((r−1)(c−1)). Using (r − 1)(c − 1) df, the critical region is R = {χ0² | χ0² > χ²1−α}.

Step 4: If χ0² lies in the critical region, reject H0, otherwise do not reject H0.

Example

By using the counts in table 12.4 (see below), test at the 1% level of significance whether occupation and newspaper are independent.

                           newspaper
occupation     G&M        Post        Star       Sun        Total
Blue collar    27         18          38         37         r1 = 120
White collar   29         43          21         15         r2 = 108
Professional   33         51          22         20         r3 = 126
Total          c1 = 89    c2 = 112    c3 = 81    c4 = 72    n = 354

H0: Occupation and newspaper are independent
H1: Occupation and newspaper are not independent

Using eij = ricj/354, i = 1, 2, 3; j = 1, 2, ⋯, 4, the following table of expected frequencies is obtained.

        j = 1      j = 2      j = 3      j = 4
i = 1   30.16949   37.96610   27.45763   24.40678   120
i = 2   27.15254   34.16949   24.71186   21.96610   108
i = 3   31.67797   39.86441   28.83051   25.62712   126
        89         112        81         72         354

From the above tables of observed and expected frequencies it follows that

χ0² = ∑i ∑j (nij − eij)²/eij = 32.5726.

Using (r − 1)(c − 1) = 2 × 3 = 6 df, the critical region is R = {χ0² | χ0² > χ²0.99 = 16.8119}.

Since χ0² = 32.5726 > 16.8119, H0 is rejected, and it is concluded that occupation and newspaper are not independent.

[p-value = P(χ² > 32.5726) = 1.267 × 10⁻⁵]

The multiple bar chart below shows that the Star and Sun are mostly read by Blue Collar
workers, while the Post is mostly read by White Collar and Professional workers and G&M
by all 3 groups.
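A sketch of the independence test in R:

# Test of independence for occupation and newspaper (table 12.4)
paper <- matrix(c(27, 18, 38, 37,
                  29, 43, 21, 15,
                  33, 51, 22, 20), nrow = 3, byrow = TRUE,
                dimnames = list(c("Blue collar", "White collar", "Professional"),
                                c("G&M", "Post", "Star", "Sun")))
chisq.test(paper)
# X-squared = 32.573, df = 6, p-value = 1.267e-05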

Chapter 13 – Nonparametric tests

13.1 Parametric versus nonparametric tests

Parametric tests are based on assumptions about the distributions from which the sample(s),
used in the testing procedure, are drawn. For example, the one-sample t test is based on a
sample being drawn from a population which is normally distributed. Because the
distribution from which the sample is taken is specified by the values of two parameters, μ
2
andσ , the t test is a parametric procedure.

When using nonparametric tests, no assumptions concerning the distributions from which the
samples are drawn, or their parameters are made. For this reason, the term “Distribution
Free” is sometimes also used to refer to these tests. These tests will be used when there is
some doubt about the validity of the assumptions that are made for parametric tests.

13.2 The Wilcoxon signed rank test

This test is the nonparametric equivalent of the one sample t test or the t test for data
involving matched pairs.

13.2.1 Wilcoxon signed rank test for the median

The hypothesis H0: m = m0 is tested by drawing a random sample x1, x2, ⋯, xn of size n from some distribution. The absolute values of the differences x1 − m0, x2 − m0, ⋯, xn − m0 are ranked from smallest to largest and the ranks originating from positive and negative differences identified. The test statistic is T0 = min(T+, T−), where

T+ = sum of ranks from positive differences and
T− = sum of ranks from negative differences.

The critical region can be written down by specifying the level of significance α and looking up critical values from the tables of the signed rank distribution.

Wilcoxon signed rank test for the median

Step 1: State null and alternative hypotheses.
H0: m = m0
H1a: m < m0 or H1b: m > m0 or H1c: m ≠ m0.

Step 2: Calculate the test statistic T0.
Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternatives H1a and H1b the critical region is R = {T0 | T0 < Tα}.

(ii) For alternative H1c the critical region is R = {T0 | T0 < Tα/2}.
Step 4: If the test statistic is in the critical region reject H0, otherwise do not reject H0.
Step 5: State conclusion in terms of the original problem.

Comment: Since T is a discrete random variable it might not always be possible to find the critical value Tα such that P(T < Tα) = α. Therefore, the critical value Tc is defined as the maximum value such that P(T < Tc) ≤ α. This applies to all tests where the critical values are determined from discrete probability distributions.

Example

The following are the scores (x) obtained by 12 randomly selected people in a standard aptitude test: 107 113 108 128 146 103 109 118 111 119 155 140

(a) Represent this data in the form of a dot plot. Comment on its appearance.
(b) By performing the Wilcoxon signed rank test, test at the 5% level of significance whether the median could be 120.

(a) The dot plot below suggests that the distribution of scores is positively skewed. For this reason, there is some doubt about the assumption that the scores are normally distributed. Therefore, a nonparametric test is preferred to a t test.

(b)
H0: m = 120
H1: m ≠ 120

Calculation of the test statistic

x x-120 |x-120| rank sign


107 -13 13 8 -
113 -7 7 3 -
108 -12 12 7 -
128 8 8 4 +
146 26 26 11 +

103 -17 17 9 -
109 -11 11 6 -
118 -2 2 2 -
111 -9 9 5 -
119 -1 1 1 -
155 35 35 12 +
140 20 20 10 +

From the above table T+ = 4 + 11 + 12 + 10 = 37, T− = 8 + 3 + 7 + 9 + 6 + 5 + 2 + 1 = 41 and T0 = min(37, 41) = 37.

From the Wilcoxon signed rank table with n = 12, α = 0.05 (two-tailed) it follows that T0.025 = 14.

The critical region is R = {T0 | T0 < 14}. Since T0 = 37 > 14, H0 cannot be rejected.

Conclusion: The median is probably 120.
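A sketch of the same test in R (note that R's V statistic is T+, the sum of the positive ranks, rather than min(T+, T−)):

# Wilcoxon signed rank test for the median aptitude score
scores <- c(107, 113, 108, 128, 146, 103, 109, 118, 111, 119, 155, 140)
wilcox.test(scores, mu = 120)
# V = 37 (this is T+); the p-value is far above 0.05, so H0 is not rejected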

13.2.2 Wilcoxon signed rank test for the difference between paired (matched) samples

The data consist of pairs of observations (xi, yi), i = 1, 2, ⋯, n where the two samples are matched via some common factor (e.g., family connection, 2 different observations on the same subject).

The data layout for the experiments described above is shown below.

sample 1     x1             x2             . . . .   xn
sample 2     y1             y2             . . . .   yn
difference   d1 = x1 − y1   d2 = x2 − y2   . . . .   dn = xn − yn

The median of the paired differences of the (x, y) values of the two populations is denoted by md. The signed rank test can be used to test H0: md = d0. When performing this test, no assumption is made about the population from which the sample is drawn. The test statistic is calculated in much the same way as that for the signed rank test for the median. The absolute values of the differences d1 − d0, d2 − d0, ⋯, dn − d0 are ranked from smallest to largest and the ranks originating from positive and negative differences identified. The test statistic is T0 = min(T+, T−), where

T+ = sum of ranks from positive differences and
T− = sum of ranks from negative differences.

The critical region can be written down by specifying the level of significance α and looking up critical values from the tables of the signed rank distribution.

Wilcoxon signed rank test for the difference between paired (matched) samples
Step 1: State null and alternative hypotheses.
H0: md = d0
H1a: md < d0 or H1b: md > d0 or H1c: md ≠ d0.

Step 2: Calculate the test statistic T0.
Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternatives H1a and H1b the critical region is R = {T0 | T0 < Tα}.

(ii) For alternative H1c the critical region is R = {T0 | T0 < Tα/2}.
Step 4: If the test statistic is in the critical region reject H0, otherwise do not reject H0.
Step 5: State conclusion in terms of the original problem.

Example

In order to compare the effectiveness of two methods (A and B) for teaching mathematics, 10
randomly selected pairs of twins of school going age were tested. For each of these pairs of
twins one twin was selected at random and assigned to method A, while the other twin was
assigned to method B. After being taught mathematics for 2 months according to the chosen
methods all the twins were given identical mathematics tests. The test scores are shown in the
table below.

Twin       1    2    3    4    5    6    7    8    9    10
Method A   67   80   65   70   86   50   63   81   86   60
Method B   39   75   73   55   74   52   56   72   89   47

Test, at the 5% level of significance, whether method A leads to higher scores than method B.

H0: md = 0

H1: md > 0

Calculation of the test statistic



A B A-B |A-B| rank sign


67 39 28 28 10 +
80 75 5 5 3 +
65 73 -8 8 5 -
70 55 15 15 9 +
86 74 12 12 7 +
50 52 -2 2 1 -
63 56 7 7 4 +
81 72 9 9 6 +
86 89 -3 3 2 -
60 47 13 13 8 +

From the above table T+ = 10 + 3 + 9 + 7 + 4 + 6 + 8 = 47, T− = 5 + 1 + 2 = 8 and T0 = min(47, 8) = 8.

From the Wilcoxon signed rank table with n = 10, α = 0.05 (one-tailed) it is found that T0.05 = 11.

The critical region is R = {T0 | T0 < 11}. Since T0 = 8 < 11, H0 is rejected.

Conclusion: Method A has higher scores than method B.
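In R (a sketch; again V = T+ is reported rather than min(T+, T−)):

# One-sided Wilcoxon signed rank test for the twin data
A <- c(67, 80, 65, 70, 86, 50, 63, 81, 86, 60)
B <- c(39, 75, 73, 55, 74, 52, 56, 72, 89, 47)
wilcox.test(A, B, paired = TRUE, alternative = "greater")
# V = 47 (this is T+); the p-value is below 0.05, so H0 is rejected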

13.2.3 The normal approximation to the Wilcoxon signed rank test

For a sufficiently large sample size (n of about 25 or more), the normal approximation to the Wilcoxon signed rank test can be used. The hypotheses to be tested are the same as those formulated in the previous two sections. The test statistic used is z0 = (T0 − μT)/σT, where T0 is calculated as explained in the previous two sections and

μT = n(n + 1)/4, σT = [n(n + 1)(2n + 1)/24]^(1/2).

The critical region is determined from the fact that Z0 is approximately normally distributed.

The normal approximation to the Wilcoxon signed rank test

Step 1: State the appropriate null and alternative hypotheses
H0: m = m0
H1a: m < m0 or H1b: m > m0 or H1c: m ≠ m0.
or
H0: md = d0
H1a: md < d0 or H1b: md > d0 or H1c: md ≠ d0.

Step 2: Calculate the test statistic z0 = (T0 − μT)/σT, where μT = n(n + 1)/4 and σT = [n(n + 1)(2n + 1)/24]^(1/2).

Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternative H1a the critical region is R = {z0 | z0 < Zα}.

(ii) For alternative H1b the critical region is R = {z0 | z0 > Z1−α}.

(iii) For alternative H1c the critical region is R = {z0 | z0 > Z1−α/2 or z0 < Zα/2}.

Step 4: If z0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.

Example

Each of 25 subjects was asked to perform a certain task under normal and stress conditions.
Blood pressure readings that were taken under both conditions are shown in the table below.

person normal stress


1 115.96 120.55
2 110.86 107.95
3 122.89 119.45
4 120.38 123.71
5 117.75 120.45
6 120.1 119.63
7 123.16 127.95
8 125.14 125.69
9 116.31 127.23
10 121.14 123.23
11 122.16 123.93
12 116.71 126.78
13 126.43 127.72
14 115.84 117.95
15 119.23 115.48
16 125.26 129.06
17 123.43 126.63
18 119.48 124.91
19 120.7 122.93
20 120.65 129.16
21 127.42 128.15
22 126.53 123.15
23 120.47 121.22
24 116.39 123.73
25 116.59 125.92

Do the data present sufficient evidence to indicate higher blood pressure readings during
conditions of stress? Perform the test at the 1% level of significance.

Let di = normal − stress, i = 1, ⋯, 25 denote the difference between the normal and stress blood pressure readings for person i, and md the median of the random variable d.

H0: md = 0
H1: md < 0

Calculation of the test statistic

normal- |normal-
person normal stress stress stress| rank sign
1 115.96 120.55 -4.59 4.59 18 -
2 110.86 107.95 2.91 2.91 11 +
3 122.89 119.45 3.44 3.44 15 +
4 120.38 123.71 -3.33 3.33 13 -
5 117.75 120.45 -2.7 2.7 10 -
6 120.1 119.63 0.47 0.47 1 +
7 123.16 127.95 -4.79 4.79 19 -
8 125.14 125.69 -0.55 0.55 2 -
9 116.31 127.23 -10.92 10.92 25 -
10 121.14 123.23 -2.09 2.09 7 -
11 122.16 123.93 -1.77 1.77 6 -
12 116.71 126.78 -10.07 10.07 24 -
13 126.43 127.72 -1.29 1.29 5 -
14 115.84 117.95 -2.11 2.11 8 -
15 119.23 115.48 3.75 3.75 16 +
16 125.26 129.06 -3.8 3.8 17 -
17 123.43 126.63 -3.2 3.2 12 -
18 119.48 124.91 -5.43 5.43 20 -
19 120.7 122.93 -2.23 2.23 9 -
20 120.65 129.16 -8.51 8.51 22 -
21 127.42 128.15 -0.73 0.73 3 -
22 126.53 123.15 3.38 3.38 14 +
23 120.47 121.22 -0.75 0.75 4 -
24 116.39 123.73 -7.34 7.34 21 -
25 116.59 125.92 -9.33 9.33 23 -

From the above table T+ = 1 + 11 + 14 + 15 + 16 = 57,
T− = 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 12 + 13 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 = 268 and
T0 = min(57, 268) = 57.

Since n = 25, μT = n(n + 1)/4 = 25 × 26/4 = 162.5,

σT = [n(n + 1)(2n + 1)/24]^(1/2) = (25 × 26 × 51/24)^(1/2) = 37.16517 and

z0 = (T0 − μT)/σT = (57 − 162.5)/37.16517 = −2.839.

The critical region is R = {z0 | z0 < Z0.01 = −2.326}.

Since z0 = −2.839 < −2.326, H0 is rejected, and it is concluded that the blood pressure readings are higher during conditions of stress.
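In R the normal approximation is requested with exact = FALSE (a sketch; correct = FALSE also switches off the continuity correction so that z matches the hand calculation):

# Normal approximation for the blood pressure data
normal <- c(115.96, 110.86, 122.89, 120.38, 117.75, 120.10, 123.16, 125.14,
            116.31, 121.14, 122.16, 116.71, 126.43, 115.84, 119.23, 125.26,
            123.43, 119.48, 120.70, 120.65, 127.42, 126.53, 120.47, 116.39, 116.59)
stress <- c(120.55, 107.95, 119.45, 123.71, 120.45, 119.63, 127.95, 125.69,
            127.23, 123.23, 123.93, 126.78, 127.72, 117.95, 115.48, 129.06,
            126.63, 124.91, 122.93, 129.16, 128.15, 123.15, 121.22, 123.73, 125.92)
wilcox.test(normal, stress, paired = TRUE, alternative = "less",
            exact = FALSE, correct = FALSE)
# V = 57 (T+); the one-sided p-value corresponds to z = -2.839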

13.2.4 The normal approximation to the Wilcoxon signed rank test with tied ranks

When pairs of observations are tied (have the same value) the differences between them will be 0. In such cases the 0 values are removed from the data and the absolute values of the remaining values ranked. When tied values are encountered, average ranks are allocated to the tied values. The formula for σT² is modified to

σ²tied = n(n + 1)(2n + 1)/24 − (1/48) ∑i (ti³ − ti),

where ti is the number of tied ranks in group i = 1, 2, ⋯, g.

Example

A B difference
2.5 2 0.5
3.5 1.5 2
2 2 0
1.5 4 -2.5
4 3.5 0.5
3.5 4 -0.5
3 3 0
2.5 2 0.5
4 3.5 0.5
3.5 2.5 1
3.5 3.5 0
2.5 1.5 1
2 2 0
3 3 0
1.5 2.5 -1
1.5 1.5 0
1.5 1.5 0

2 2.5 -0.5
3.5 2.5 1
1.5 1.5 0
3 2 1
3.3 2.8 0.5
1.9 1.4 0.5
2.6 2.6 0
1.7 1.5 0.2

Perform a signed rank test for the difference between the medians for the paired data. Use α = 0.05.

H0: md = 0, H1: md ≠ 0

A B difference abs rank sign


1.5 4 -2.5 2.5 16 -
1.5 2.5 -1 1 12 -
2 2.5 -0.5 0.5 5.5 -
3.5 4 -0.5 0.5 5.5 -
1.7 1.5 0.2 0.2 1 +
1.9 1.4 0.5 0.5 5.5 +
2.5 2 0.5 0.5 5.5 +
2.5 2 0.5 0.5 5.5 +
3.3 2.8 0.5 0.5 5.5 +
4 3.5 0.5 0.5 5.5 +
4 3.5 0.5 0.5 5.5 +
2.5 1.5 1 1 12 +
3 2 1 1 12 +
3.5 2.5 1 1 12 +
3.5 2.5 1 1 12 +
3.5 1.5 2 2 15 +

The 9 tied pairs (difference 0) are deleted from the data. The n = 16 non-zero paired differences, the ranks of their absolute values and the signs of the ranks (+ or −) are shown in the above table.

From the above table: T− = 16 + 12 + (5.5 × 2) = 39,
T+ = 1 + (5.5 × 6) + (12 × 4) + 15 = 97 and T0 = min(39, 97) = 39.

The tied ranks and their group sizes are shown below.

rank         5.5   12
group size   8     5

From the above table: t1 = 8, t2 = 5 and ∑(ti³ − ti) = (8³ − 8) + (5³ − 5) = 624.

μT = 16 × 17/4 = 68 and σ²tied = 16 × 17 × 33/24 − 624/48 = 361.

z0 = (39 − 68)/√361 = −1.526. Since z0 lies between z0.025 = −1.96 and z0.975 = −z0.025 = 1.96, H0 cannot be rejected and it is concluded that there is no difference between the medians. The p-value is 2 × P(Z ≤ −1.526) = 0.127.

13.3 The Wilcoxon-Mann-Whitney rank sum test

13.3.1 Small samples (one or both sample sizes 10 or less)

This test is the nonparametric equivalent of the two sample tests based on independent samples. Random samples of sizes n1 and n2 are drawn from populations 1 and 2 respectively, where the populations are labelled such that n1 ≤ n2. If n1 = n2 the labelling of populations does not matter. It is assumed that the two populations are identical except for a difference in location. The hypothesis to be tested is H0: m1 = m2, where m1 and m2 are the medians for populations 1 and 2 respectively.

The test statistic is calculated as follows.

1 Pool (join) the two samples and allocate ranks to the observations in the pooled sample.

2 Let R01 denote the sum of the ranks associated with the observations drawn from population 1 and R02 the sum of the ranks associated with the observations drawn from population 2. Let w01 = R01 − n1(n1 + 1)/2 and w02 = R02 − n2(n2 + 1)/2.

3 The test statistic is w0 = min(w01, w02).

The critical region can be written down by specifying the level of significance α and looking up critical values from the tables of the rank sum distribution.

Wilcoxon-Mann-Whitney rank sum test for the difference between medians (independent samples)

Step 1: State null and alternative hypotheses.
H0: m1 = m2
H1a: m1 < m2 or H1b: m1 > m2 or H1c: m1 ≠ m2.

Step 2: Calculate the test statistic w0.
Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternatives H1a and H1b the critical region is R = {w0 | w0 < Wα}.

(ii) For alternative H1c the critical region is R = {w0 | w0 < Wα/2}.
Step 4: If the test statistic is in the critical region reject H0, otherwise do not reject H0.
Step 5: State conclusion in terms of the original problem.

Example

The strengths of two types of papers are to be compared. The one type of paper is made using
a standard process and the other by treating it with a chemical substance. A random sample of
size 10 is selected from each of the two types of paper and the strengths measured. The data
are shown below.

strength   type
1.21       Standard
1.43       Standard
1.35       Standard
1.51       Standard
1.39       Standard
1.17       Standard
1.48       Standard
1.42       Standard
1.28       Standard
1.40       Standard
1.49       Treated
1.38       Treated
1.67       Treated
1.50       Treated
1.31       Treated
1.29       Treated
1.52       Treated
1.37       Treated
1.44       Treated
1.53       Treated

Test, at the 5% level of significance, whether the treated paper has greater strength than the
standard paper.
Since n1 = n2 = 10, either population can be labelled as population 1. The "standard process" paper will be labelled population 1 and the "treated process" paper population 2.

H0: m1 = m2

H1: m1 < m2

Calculation of the test statistic

strength   rank   type
1.21       2      Standard
1.43       12     Standard
1.35       6      Standard
1.51       17     Standard
1.39       9      Standard
1.17       1      Standard
1.48       14     Standard
1.42       11     Standard
1.28       3      Standard
1.40       10     Standard
1.49       15     Treated
1.38       8      Treated
1.67       20     Treated
1.50       16     Treated
1.31       5      Treated
1.29       4      Treated
1.52       18     Treated
1.37       7      Treated
1.44       13     Treated
1.53       19     Treated

From the above table it follows that

R01 = 2 + 12 + 6 + 17 + 9 + 1 + 14 + 11 + 3 + 10 = 85; w01 = R01 − n1(n1 + 1)/2 = 85 − 10 × 11/2 = 30.

R02 = 15 + 8 + 20 + 16 + 5 + 4 + 18 + 7 + 13 + 19 = 125; w02 = R02 − n2(n2 + 1)/2 = 125 − 10 × 11/2 = 70.

Test statistic: w0 = min(w01, w02) = min(30, 70) = 30.

From the Wilcoxon rank sum table with n1 = n2 = 10, α = 0.05 (one-tailed) it is found that W0.05 = 27.

The critical region is R = {w0 | w0 < 27}.

Since w0 = 30 > 27, H0 is not rejected, and it is concluded that there is no evidence that the treated paper has greater strength than the standard paper.

It should be noted here that H0 is close to being rejected and that when larger or different
samples are used the conclusion might change.
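In R (a sketch; for two samples wilcox.test reports W, the Mann-Whitney statistic for the first sample, which equals w01 here):

# Rank sum test for the paper strength data
standard <- c(1.21, 1.43, 1.35, 1.51, 1.39, 1.17, 1.48, 1.42, 1.28, 1.40)
treated  <- c(1.49, 1.38, 1.67, 1.50, 1.31, 1.29, 1.52, 1.37, 1.44, 1.53)
wilcox.test(standard, treated, alternative = "less")
# W = 30; the p-value is above 0.05, in line with the borderline conclusion above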

13.3.2 Large samples (both sample sizes greater than 10)

When the sample sizes are both greater than 10, the normal approximation to the rank sum test can be used. The hypotheses to be tested are the same as those formulated in the previous section. The test statistic used is z0 = (w0 − μW)/σW, where w0 is calculated as explained in the previous section and

μW = n1n2/2, σW = [n1n2(n1 + n2 + 1)/12]^(1/2).

The critical region is determined from the fact that Z0 is approximately normally distributed.

The normal approximation to the Wilcoxon rank sum test

Step 1: State the appropriate null and alternative hypotheses
H0: m1 = m2
H1a: m1 < m2 or H1b: m1 > m2 or H1c: m1 ≠ m2.

Step 2: Calculate the test statistic z0 = (w0 − μW)/σW, where μW = n1n2/2 and σW = [n1n2(n1 + n2 + 1)/12]^(1/2).

Step 3: State the level of significance α and determine the critical value(s) and critical region.

(i) For alternatives H1a and H1b the critical region is R = {z0 | z0 < Zα}.

(ii) For alternative H1c the critical region is R = {z0 | z0 < Zα/2}.

Step 4: If z0 lies in the critical region, reject H0, otherwise do not reject H0.

Step 5: State the conclusion in terms of the original problem.



Example

Fifteen experimental batteries were selected at random from a lot at pilot plant A, and 15
standard batteries were selected at random from production at plant B. All 30 batteries were
simultaneously placed under an electrical load of the same magnitude. The first battery to fail
was an A, the second a B, the third a B, and so on. The following sequence shows the order
of failure for the 30 batteries:

ABBBABAABBBBABABBBBAABAAABAAAA

Using the large-sample theory for the rank sum test, determine whether there is sufficient evidence to conclude that the lengths of life for the experimental batteries tend to be greater than the lengths of life for the standard batteries. Use α = 0.05.

Denote the plant A batteries as population 1 and the plant B ones as population 2. Since the 2 samples are of equal size, n1 = n2 = 15.

H0: m1 = m2
H1: m1 > m2

Calculation of the test statistic


Failure A B B B A B A A B B B B A B A
Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Failure B B B B A A B A A A B A A A A
Rank 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

R01 = 1 + 5 + 7 + 8 + 13 + 15 + 20 + 21 + 23 + 24 + 25 + 27 + 28 + 29 + 30 = 276

R02 = 2 + 3 + 4 + 6 + 9 + 10 + 11 + 12 + 14 + 16 + 17 + 18 + 19 + 22 + 26 = 189

w01 = R01 − n1(n1 + 1)/2 = 276 − 15 × 16/2 = 276 − 120 = 156
w02 = R02 − n2(n2 + 1)/2 = 189 − 15 × 16/2 = 189 − 120 = 69
w0 = min(w01, w02) = min(156, 69) = 69

n1 = n2 = 15, μW = n1n2/2 = 15 × 15/2 = 112.5,
σW = [n1n2(n1 + n2 + 1)/12]^(1/2) = (15 × 15 × 31/12)^(1/2) = 24.10913

z0 = (w0 − μW)/σW = (69 − 112.5)/24.10913 = −1.8043

The critical region is R = {z0 | z0 < Z0.05 =-1.645}.

Since z0 = −1.8043 < −1.645, H0 is rejected, and it is concluded that the lengths of life for the experimental batteries (A) tend to be greater than the lengths of life for the standard batteries (B).

13.3.3 Normal approximation to Wilcoxon-Mann-Whitney rank sum test with tied


observations

When tied observations are present in the two samples the following modifications to the
testing procedure are needed.

1 The average rank is allocated to tied observations.


K

2 The modified variance formula σ 2ties=¿ n1 n2 (n ¿ ¿ 1+n2 +1)


∑ (t¿¿ k 3−t k ) is
−n1 n2 k=1 ¿¿
12 12 n(n−1)
used.

In the above formula n = n1 + n2, K is the number of groups of tied ranks, and tk is the number of tied observations in group k.
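A small computational sketch of this modification is given below in Python (the names are illustrative); it collects the tie group sizes from the assigned mid-ranks and returns the tie-corrected standard deviation:

import math
from collections import Counter

def sigma_ties(ranks1, ranks2):
    # Tie-corrected standard deviation for the rank sum statistic.
    # ranks1, ranks2 are the (mid-)ranks assigned to the two samples.
    n1, n2 = len(ranks1), len(ranks2)
    n = n1 + n2
    # t_k = number of tied observations sharing each distinct rank (t_k > 1)
    tie_sizes = [t for t in Counter(list(ranks1) + list(ranks2)).values() if t > 1]
    correction = sum(t**3 - t for t in tie_sizes)
    var = n1 * n2 * (n + 1) / 12 - n1 * n2 * correction / (12 * n * (n - 1))
    return math.sqrt(var)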

Example

The earthquake magnitudes in Chile according to the location (ocean or land) are shown in
the table below.

magnitude   location      magnitude   location
39          ocean         41          land
40          ocean         43          land
41          ocean         43          land
43          ocean         43          land
43          ocean         44          land
44          ocean         44          land
45          ocean         45          land
48          ocean         46          land
54          ocean         50          land
63          ocean         51          land
68          ocean         51          land
68          ocean

Test whether the magnitudes of ocean earthquakes are greater than those of land ones. Use α = 0.10.

H0: mocean = mland,  H1: mocean > mland



magnitude location rank
41 land 3.5
43 land 7
43 land 7
43 land 7
44 land 11
44 land 11
45 land 13.5
46 land 15
50 land 17
51 land 18.5
51 land 18.5
39 ocean 1
40 ocean 2
41 ocean 3.5
43 ocean 7
43 ocean 7
44 ocean 11
45 ocean 13.5
48 ocean 16
54 ocean 20
63 ocean 21
68 ocean 22.5
68 ocean 22.5

Label land as population 1 and ocean as population 2.

R01 = 3.5 + (7×3) + (11×2) + 13.5 + 15 + 17 + (18.5×2) = 129.

R02 = 23×24/2 − 129 = 147.  n1 = 11, n2 = 12.

w01 = 129 − (11×12)/2 = 63,  w02 = 147 − (12×13)/2 = 69.

w0 = min(w01, w02) = min(63, 69) = 63.

μW = n1n2/2 = (11×12)/2 = 66

The tied ranks and their group sizes are shown below.

rank        3.5  7  11  13.5  18.5  22.5
group size  2    5  3   2     2     2

t 1=2 ,t 2=5 , t 3=3 , t 4 =2 ,t 5=2 , t 6=2 . K=6.


209

∑k=1…6 (tk³ − tk) = 4(2³ − 2) + (5³ − 5) + (3³ − 3) = 24 + 120 + 24 = 168,  n = n1 + n2 = 11 + 12 = 23.

σ²ties = n1n2(n1 + n2 + 1)/12 − [n1n2 ∑(tk³ − tk)] / [12n(n − 1)] = (11×12×24)/12 − (11×12×168)/(12×23×22) = 264 − 3.6522 = 260.3478

z0 = (w0 − μW)/σties = (63 − 66)/√260.3478 = −0.186.
Since z0 = −0.186 > z0.10 = −1.282, H0 cannot be rejected: there is insufficient evidence that the magnitudes of ocean earthquakes are greater than those of land ones. p-value = P(Z ≤ −0.186) = 0.426 > 0.10.
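This example can also be checked with scipy (assuming it is available); with method='asymptotic', scipy applies the same tie-corrected normal approximation to the raw magnitudes and reproduces the p-value above:

from scipy.stats import mannwhitneyu

ocean = [39, 40, 41, 43, 43, 44, 45, 48, 54, 63, 68, 68]
land = [41, 43, 43, 43, 44, 44, 45, 46, 50, 51, 51]

# H1: magnitudes of ocean earthquakes are greater than those of land ones
u, p = mannwhitneyu(ocean, land, alternative='greater',
                    use_continuity=False, method='asymptotic')
print(u, p)  # p ≈ 0.426 > 0.10, so H0 cannot be rejected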

Additional reading:

https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/

To be added to the notes:

Exercises

Table of random numbers


Normal table
t distribution table
Chi-square distribution table
F distribution table
Critical values of the Wilcoxon signed rank test
Critical values of the Wilcoxon-Mann-Whitney rank sum test
Sharp calculator (model – EL – 531WHB) use (short manual)
Casio calculator (model – fx 82ZA Plus) use (short manual)

Appendix – Excel: Data manipulation and functions


A Data manipulation, Dot plot, Pie chart, Q-Q plot
A1 To export data from a Word/Text file to Excel.
1 Highlight the data in the Word/Text file, right click and select "Copy".
2 In Excel, click on the first cell where you want to paste the data. Then select Home – Paste –
Paste Option (right hand option).
A2 To move data from one location to another in Excel.
1 Highlight the data in Excel to be moved.
2 Right click and select "Cut".
3 Click on the first cell of the Excel area you want to move the data to and press Ctrl and V
simultaneously.

A3 To sort data in an Excel column.

1 Highlight the data in Excel to be sorted.
2 In Excel select Data – Sort, specify the order (smallest to largest, largest to smallest) and
click OK.
A4 To do a dot plot.

The data to be plotted should be in a single column. If the data values are not in a single
column, move the values into a single column (see explanation A2 above).
Suppose the data are in cells A1 to A80.
1 Type =COUNTIF($A$1:$A1,A1) in cell B1.
2 Position the cursor in the bottom right hand corner of cell B1 and drag it down to cell B80.
3 Highlight the data in A1 to B80, select Insert – Charts – Scatter (top of the Excel sheet)
and click on the Scatter icon to produce the dot plot.
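For readers working outside Excel, the same running-count trick behind the COUNTIF formula can be sketched in Python with matplotlib (assuming matplotlib is installed; the data values are illustrative):

import matplotlib.pyplot as plt

data = [2, 3, 3, 5, 5, 5, 7]  # illustrative values

# Replicates =COUNTIF($A$1:$A1,A1): the running count of each value,
# so repeated values stack vertically in the plot
counts, seen = [], {}
for x in data:
    seen[x] = seen.get(x, 0) + 1
    counts.append(seen[x])

plt.scatter(data, counts)  # one dot per observation
plt.show()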
A5 To do a pie chart.
1 Type the names of the components in the chart in column A and their corresponding sizes
in column B of an Excel sheet.
2 Highlight the data in columns A and B in the Excel sheet.
3 Go to the top of the Excel sheet and select Insert. Then click on the pie chart (round) icon.
A6 Q-Q plot (See notes)
B Excel add-in Data Analysis
See notes for making the add-in available in Excel.
B1 Descriptive Statistics
Highlight the data in Excel.
Data – Data Analysis – Descriptive Statistics: In the window that appears, specify the input
range (cells with data) and the first cell of the output. Select Summary Statistics, Confidence
level for mean, kth largest and kth smallest.
B2 Data Analysis (Analysis Tools) – Sampling
Put the data to be sampled in a single column.
Data – Data Analysis – Sampling. Specify the input range (data to be sampled), the number of
samples and the output range (cell containing the first value of the output).
B3 Data Analysis (Analysis Tools) – t-tests: paired two sample; two sample, equal variances;
two sample, unequal variances.
B4 Data Analysis (Analysis Tools) – Correlation, Regression. Data in two columns (x and y
columns).
B5 Data Analysis (Analysis Tools) – Anova: Single Factor. Data in 2 or more columns.
C Excel functions
=max(cells with data) – maximum value.
=min(cells with data) – minimum value.
=frequency(cells with data, cells with boundaries) – frequency distribution counts.
=sum(cells with data) – sum.
=sumsq(cells with data) – sum of squares.
=sumproduct(cells with data 1, cells with data 2) – sum of products of two data sets.
=log(value) – logarithm with base 10.
=ln(value) – logarithm with base e (natural logarithm).
=average(cells with data) – mean.
=stdev(cells with data) – standard deviation.
=var(cells with data) – variance.
=percentile(cells with data, percent/100) – percentile value for a given percentage. Special
cases are the 1st decile (percent = 10), 2nd decile (percent = 20), 1st quartile (percent = 25),
3rd decile (percent = 30), 4th decile (percent = 40), median (percent = 50), 6th decile
(percent = 60), 7th decile (percent = 70), 3rd quartile (percent = 75), 8th decile (percent = 80),
9th decile (percent = 90).
Multiplication (*), division (/), exponentiation (^), sqrt (square root).
=BINOM.DIST(x, n, p, cumulative = TRUE or cumulative = FALSE)
=combin(n, i) where n – number selected from and i – number selected.
=fact(n) – n! = n(n−1)(n−2)⋯2×1

=POISSON(x,mean, cumulative =TRUE or cumulative =FALSE)


=NORMDIST(x, mean, standard deviation, cumulative =TRUE or cumulative =FALSE) –
see notes
=NORMINV(probability, mean, standard deviation) – see notes
=RANDBETWEEN(a, b) – a and b are the lower and upper values of the interval you want to
sample from.
=RAND() – No argument inside the brackets. Randomly selects a number between 0 and 1.
tdist (for p-value); tinv (critical value) – see notes
chidist (for p-value); chiinv (critical value) – see notes
fdist (for p-value); finv (critical value) – see notes
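Several of the functions above have direct Python equivalents, which may help readers cross-check Excel output (a sketch using the standard library's statistics module; the data values are illustrative):

import statistics as st

data = [2, 4, 4, 5, 7, 9]
print(st.mean(data))             # =average
print(st.stdev(data))            # =stdev (sample standard deviation)
print(st.variance(data))         # =var (sample variance)
print(min(data), max(data))      # =min, =max
print(st.quantiles(data, n=4))   # quartiles, cf. =percentile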
Doing a scatterplot
1 Import the data (columns of x and y values) into Excel: highlight and copy (in the document
with the data), then Home – Paste (in Excel).
2 Highlight the data in Excel and select Insert – Chart. Specify a scatterplot.
3 To add a line, put the cursor on any plotted point in the plot, right click and select "Add
trendline".
