Edexcel AS and A Level
Modular Mathematics
Contents
About this book
1 Combinations of random variables
1.1 Finding the distribution of random variables
2 Sampling
2.1 Populations and sampling
2.2 Random sampling
2.3 Simple random sampling
2.4 Other methods of sampling
2.5 Non-random sampling
2.6 Primary and secondary sources of data
3 Estimation, confidence intervals and tests
3.1 Concept of a statistic and sampling distribution
3.2 Estimation of population parameters using a sample
3.3 Standard error of the mean
3.4 The Central Limit Theorem
3.5 Confidence intervals
3.6 Hypothesis tests
3.7 Hypothesis test for the difference between two means
3.8 Large samples
Review Exercise 1
4 Goodness of fit and contingency tables
4.1 Forming a hypothesis
4.2 Goodness of fit
4.3 Degrees of freedom
4.4 The chi-squared ($\chi^2$) family of distributions
4.5 Testing your hypothesis
4.6 The general method for testing the goodness of fit
4.7 Applying goodness-of-fit tests to discrete data
4.8 Applying goodness-of-fit tests to continuous distributions
4.9 Contingency tables
5 Regression and correlation
5.1 Spearman's rank correlation coefficient
5.2 Testing the hypothesis that a correlation coefficient is zero
5.3 Testing the hypothesis that Spearman's population rank correlation coefficient is zero
Review Exercise 2
Examination style paper
After completing this chapter you should be able to:
* combine independent normal random variables
* combine linear combinations of independent normal random variables.
Combinations of random variables
A sweet manufacturer produces two varieties of
fruit sweet, Xtras and Yummies. The weights,
X and Y, in grams, of randomly selected Xtras
and Yummies are such that X ~ N(30, 25) and
Y ~ N(32, 16). The manufacturer wishes to work
out the probability that the average weight of a
packet containing 6 Xtras and 4 Yummies lies
between 28 g and 33 g. By the end of this
chapter you will know how to combine these
distributions and work this out.

1.1 You need to be able to find the distribution of a combination of random variables.
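If $X$ and $Y$ are two random variables then
* $\mathrm{E}(X + Y) = \mathrm{E}(X) + \mathrm{E}(Y)$
* $\mathrm{E}(X - Y) = \mathrm{E}(X) - \mathrm{E}(Y)$
If $X$ and $Y$ are two independent random variables then
* $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
* $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
The proofs of these relationships are not needed at this stage but they are used when combining independent variables.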
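If $X$ is a random variable with $\mathrm{E}(X) = \mu_1$ and $\mathrm{Var}(X) = \sigma_1^2$, and $Y$ is an independent random variable with $\mathrm{E}(Y) = \mu_2$ and $\mathrm{Var}(Y) = \sigma_2^2$, find the mean and variance of:
a $X + Y$, b $X - Y$.
a $\mathrm{E}(X + Y) = \mathrm{E}(X) + \mathrm{E}(Y) = \mu_1 + \mu_2$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) = \sigma_1^2 + \sigma_2^2$
b $\mathrm{E}(X - Y) = \mathrm{E}(X) - \mathrm{E}(Y) = \mu_1 - \mu_2$
$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) = \sigma_1^2 + \sigma_2^2$
Variances are always added.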
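In S1, Chapter 8 the following properties of expectation were introduced.
* $\mathrm{E}(aX) = a\mathrm{E}(X)$
* $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$
Using these it can be shown that
* $\mathrm{E}(aX + bY) = a\mathrm{E}(X) + b\mathrm{E}(Y)$
* $\mathrm{E}(aX - bY) = a\mathrm{E}(X) - b\mathrm{E}(Y)$
* $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$
* $\mathrm{Var}(aX - bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$
(The two variance results require $X$ and $Y$ to be independent.)
A linear combination of normal variables is also normal and so if $X \sim \mathrm{N}(\mu_1, \sigma_1^2)$ and $Y \sim \mathrm{N}(\mu_2, \sigma_2^2)$ and $X$ and $Y$ are independent, then
* $aX + bY \sim \mathrm{N}(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$
* $aX - bY \sim \mathrm{N}(a\mu_1 - b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$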
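The general form can be extended to any number of independent random variables.
For example, if $X_1, X_2, \ldots, X_6$ are independent observations of $X \sim \mathrm{N}(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_5$ are independent observations of $Y \sim \mathrm{N}(\mu_2, \sigma_2^2)$, then
$X_1 + X_2 + \cdots + X_6 + Y_1 + Y_2 + \cdots + Y_5 \sim \mathrm{N}(6\mu_1 + 5\mu_2,\ 6\sigma_1^2 + 5\sigma_2^2)$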
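$X_1 \sim \mathrm{N}(15, 3)$ and $X_2 \sim \mathrm{N}(6, 2)$. ($\mathrm{N}(15, 3)$ means that the mean is 15 and the variance is 3.)
If $X_1$ and $X_2$ are independent find the distribution of $Y$ where:
a $Y = X_1 + X_2$
b $Y = 4X_1 - 2X_2$.
a Using $X_1 + X_2 \sim \mathrm{N}(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$:
$Y = X_1 + X_2 \sim \mathrm{N}(15 + 6,\ 3 + 2)$
$Y \sim \mathrm{N}(21, 5)$
b Using $4X_1 - 2X_2 \sim \mathrm{N}(4\mu_1 - 2\mu_2,\ 4^2\sigma_1^2 + 2^2\sigma_2^2)$:
$Y = 4X_1 - 2X_2 \sim \mathrm{N}(4 \times 15 - 2 \times 6,\ 16 \times 3 + 4 \times 2)$
$Y \sim \mathrm{N}(48, 56)$
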
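If $X_1$, $X_2$ and $X_3$ are independent normal random variables such that $X_1 \sim \mathrm{N}(8, 2^2)$, $X_2 \sim \mathrm{N}(13, 2^2)$ and $X_3 \sim \mathrm{N}(18, 3^2)$, and $Y$ is a random variable defined by $Y = 3X_1 - X_2 + X_3$, find the distribution of $Y$.
$Y \sim \mathrm{N}(3 \times 8 - 13 + 18,\ 9 \times 4 + 4 + 9)$
$Y \sim \mathrm{N}(29, 49)$
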
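The rule for a linear combination of independent normal variables can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `linear_combination` and the tuple layout `(coefficient, mean, variance)` are assumptions made for illustration. It reproduces the two worked answers above.

```python
def linear_combination(terms):
    """Mean and variance of sum(coefficient * variable) for INDEPENDENT normals.

    terms: list of (coefficient, mean, variance) tuples (hypothetical layout).
    Means combine with their signs; variances are always added,
    each scaled by the square of its coefficient.
    """
    mean = sum(c * m for c, m, _ in terms)
    variance = sum(c ** 2 * v for c, _, v in terms)
    return mean, variance


# Y = 4*X1 - 2*X2 with X1 ~ N(15, 3) and X2 ~ N(6, 2)
y_two = linear_combination([(4, 15, 3), (-2, 6, 2)])

# Y = 3*X1 - X2 + X3 with X1 ~ N(8, 2^2), X2 ~ N(13, 2^2), X3 ~ N(18, 3^2)
y_three = linear_combination([(3, 8, 4), (-1, 13, 4), (1, 18, 9)])

print(y_two, y_three)
```

Note that a sum of several separate observations of the same variable needs one term per observation with coefficient 1; a single term with coefficient 6 would model $6X$, which has a larger variance than $X_1 + \cdots + X_6$.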
Bottles of mineral water are delivered to shops in crates containing 12 bottles each. The weights of bottles are normally distributed with mean weight 2 kg and standard deviation 0.05 kg. The weights of empty crates are normally distributed with mean 2.5 kg and standard deviation 0.3 kg.
a Assuming that all random variables are independent, find the probability that a full crate will weigh between 26 kg and 27 kg.
b Two bottles are selected at random from a crate. Find the probability that they differ in weight by more than 0.1 kg.
c Find the maximum weight, $M$, that a full crate should have on its label so that there is only a 1% chance that it will weigh more than $M$.
a The weight of a full crate = the weight of 12 bottles + the weight of the empty crate.
Let $W = X_1 + X_2 + \cdots + X_{12} + C$, where $X \sim \mathrm{N}(2, 0.05^2)$ and $C \sim \mathrm{N}(2.5, 0.3^2)$.
$\mathrm{E}(W) = 12\mathrm{E}(X) + \mathrm{E}(C) = (12 \times 2) + 2.5 = 26.5$
$\mathrm{Var}(W) = 12\mathrm{Var}(X) + \mathrm{Var}(C) = (12 \times 0.05^2) + 0.3^2 = 0.12$ (using an extended version of the rule for $\mathrm{Var}(X + Y)$)
$W \sim \mathrm{N}(26.5, 0.12)$
$\mathrm{P}(26 < W < 27) = \mathrm{P}\left(\dfrac{26 - 26.5}{\sqrt{0.12}} < Z < \dfrac{27 - 26.5}{\sqrt{0.12}}\right) = \mathrm{P}(-1.44 < Z < 1.44)$
$= \Phi(1.44) - \Phi(-1.44) = 0.850$ (0.851)
You may use a calculator or the tables so both these answers are acceptable.
b $\mathrm{E}(X_1 - X_2) = 0$
$\mathrm{Var}(X_1 - X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = 0.05^2 + 0.05^2 = 0.005$
The difference in weight between two bottles is required to be more than 0.1 kg. Either bottle may be heavier, therefore $X_1 - X_2 < -0.1$ or $X_1 - X_2 > 0.1$.
$\mathrm{P}(|X_1 - X_2| > 0.1) = 2\mathrm{P}(X_1 - X_2 > 0.1) = 2\mathrm{P}\left(Z > \dfrac{0.1}{\sqrt{0.005}}\right) = 2(1 - \Phi(1.41)) = 0.159$ (0.157)
Give your answer to a minimum of 3 s.f.
c $\mathrm{P}(W > M) = 0.01$
Look up 0.01 in the percentage points of the normal distribution table. This gives $z = 2.3263$, so
$\dfrac{M - 26.5}{\sqrt{0.12}} = 2.3263$, giving $M = 27.3$ kg (3 s.f.).
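The crate calculation can be reproduced with Python's standard library. This is a sketch under the assumptions of the example (all weights independent and normally distributed); the variable names are chosen here for illustration and `statistics.NormalDist` takes a standard deviation, not a variance, so square roots are taken explicitly.

```python
from math import sqrt
from statistics import NormalDist

bottle_mean, bottle_sd = 2.0, 0.05   # kg, from the example
crate_mean, crate_sd = 2.5, 0.3      # kg, from the example
n_bottles = 12

# a) W = X1 + ... + X12 + C: means add, variances add
w_mean = n_bottles * bottle_mean + crate_mean
w_var = n_bottles * bottle_sd ** 2 + crate_sd ** 2
W = NormalDist(w_mean, sqrt(w_var))
p_between = W.cdf(27) - W.cdf(26)

# b) D = X1 - X2 has mean 0 and variance 0.05^2 + 0.05^2
D = NormalDist(0, sqrt(2 * bottle_sd ** 2))
p_differ = 2 * (1 - D.cdf(0.1))      # either bottle may be the heavier one

# c) M is the 99th percentile of W
M = W.inv_cdf(0.99)

print(round(p_between, 3), round(p_differ, 3), round(M, 1))
```

The results agree with the calculator values quoted in the example; the table values differ slightly because the tables round $z$ to two decimal places.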
1 Given the random variables $X \sim \mathrm{N}(80, 3^2)$ and $Y \sim \mathrm{N}(50, 2^2)$ where $X$ and $Y$ are independent find the distribution of $W$ where:
a $W = X + Y$, b $W = X - Y$.
2 Given the random variables $X \sim \mathrm{N}(45, 6)$, $Y \sim \mathrm{N}(54, 4)$ and $W \sim \mathrm{N}(49, 8)$ where $X$, $Y$ and $W$ are independent, find the distribution of $R$ where $R = X + Y + W$.
3 $X_1$ and $X_2$ are independent normal random variables. $X_1 \sim \mathrm{N}(60, 25)$ and $X_2 \sim \mathrm{N}(50, 16)$.
Find the distribution of $T$ where:
a $T = 3X_1$, b $T = 7X_2$, c $T = 3X_1 + 7X_2$, d $T = X_1 - 2X_2$.
4 $Y_1$, $Y_2$ and $Y_3$ are independent normal random variables. $Y_1 \sim \mathrm{N}(8, 2)$, $Y_2 \sim \mathrm{N}(12, 3)$ and $Y_3 \sim \mathrm{N}(15, 4)$. Find the distribution of $A$ where:
a $A = Y_1 + Y_2 + Y_3$, b $A = Y_3 - Y_1$, c $A = Y_1 - Y_2 + 3Y_3$,
d $A = 3Y_1 + 4Y_3$, e $A = 2Y_1 - Y_2 + Y_3$.
5 $A$, $B$ and $C$ are independent normal random variables. $A \sim \mathrm{N}(50, 6)$, $B \sim \mathrm{N}(60, 8)$ and $C \sim \mathrm{N}(80, 10)$. Find:
a $\mathrm{P}(A + B < 115)$, b $\mathrm{P}(A + B + C > 198)$, c $\mathrm{P}(B + C < 138)$,
d $\mathrm{P}(2A + B - C < 70)$, e $\mathrm{P}(A + 3B - C > 140)$, f $\mathrm{P}(105 < (A + B) < 116)$.
6 Given the random variables $X \sim \mathrm{N}(20, 5)$ and $Y \sim \mathrm{N}(\ldots, 4)$ where $X$ and $Y$ are independent, find
a $\mathrm{E}(X - Y)$, b $\mathrm{Var}(X - Y)$, c P(13 …
* know the advantages and disadvantages of the different
methods of sampling.
Sampling
A charity wants you to find out which of their
proposed projects is most popular with the public.
What should you do? Do you interview everyone in
the country or do you take a sample?
If you take a sample, what method should you use,
and why?
After reading this chapter you should be able to
answer these questions and carry out such an
investigation.
Statistically, a population is the whole set of
items that are of interest.
Information may be obtained from a population
by taking a census or by taking a sample. The information obtained is known as raw data.
Taking a census
A census observes or measures every
member of a population.
Perhaps the best known census is that conducted by the British Government. In this census every
known householder in Great Britain receives a census form every 10 years. Each householder is
required by law to complete and return the form by a certain date. The census form records a
variety of information, such as the number of people present, their ages, and so on.
A census is used if
* the size of the population is small, or if
* extreme accuracy is required.
Sampling
A sample is a selection of observations taken
from a sub-set of the population, which is used
to find out information about the population
as a whole. This is known as a sample survey.
[Figure: the heights of a class of 25 pupils as a population. In a census every member of the population is used. In a sample survey a selection of observations is taken and used.]
The sample will be truly representative of the population as a whole provided that you select it so that it is free from bias. To do this you must make sure that your selection is truly random.
The size of a sample (the number of people or units sampled) does not depend entirely on the size of the population. It depends on the accuracy you require and the resources you are willing to allocate to data collection. A large sample will usually be more accurate than a small one, but will need greater resources.
The number of items or people sampled may also be affected by the nature of the population: if the population is very variable you will require a larger sample size than you would if the population were more uniform.
Both methods have advantages and disadvantages.
Census
Advantages:
* It should give a completely accurate result.
Disadvantages:
* It is very time consuming and expensive.
* It cannot be used when the testing process is to destruction (for example, testing an apple for sweetness).
* The information is difficult to process because there is so much of it.
Sample survey
Advantages:
* A sample survey costs less than a census.
* Results are obtained quicker for a sample survey than for a census.
* Fewer people have to respond in the sample.
* There is less data to deal with than in a census.
Disadvantages:
* The data may not be as accurate.
* The sample may not be large enough to give information about small sub-groups of the population.

Give a brief explanation, and an example of the use, of
a a census, b a sample survey.
(Don't forget to give both an explanation and an example.)
a Census: every member of the population is observed.
Example: 10-year national census.
b Sample survey: a small portion of the population is observed.
Example: opinion polls.
2.2 You need to know about random sampling.
In random sampling each unit is chosen entirely by chance and each member of the
population has a known chance of being included in the sample.
Sampling with and without replacement
If the unit selected at each draw is replaced into the population before the next draw, then
it can appear more than once in the sample. This is known as sampling with replacement
or unrestricted random sampling.
If the unit is not replaced, so that only those units that have not previously been selected are
eligible for the next draw, then it is known as sampling without replacement.
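The difference between the two schemes can be illustrated with Python's `random` module. This is a small sketch, not part of the original text; the population of ten numbered units and the fixed seed are assumptions made only so the example is repeatable.

```python
import random

population = list(range(1, 11))   # hypothetical units numbered 1 to 10
rng = random.Random(0)            # fixed seed, for repeatability only

# Sampling with replacement: a unit may appear more than once.
with_replacement = rng.choices(population, k=5)

# Sampling without replacement: every selected unit is distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement, without_replacement)
```

`random.choices` draws each unit independently from the full population, whereas `random.sample` removes a unit from consideration once it has been drawn.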
Two well-known examples of random sampling are ERNIE (Electronic Random Number Indicating Equipment), which is used to select winning numbers on Premium Bonds, and the selection of numbers for the national lottery.

Suppose you wish to take a sample from a population of size N.
A sample of size n is called a simple random sample if every other possible sample of size n has an equal chance of being selected.
Simple random sampling is sampling without replacement.
To do simple random sampling you need a sampling frame.
A sampling frame is a list identifying every single sampling unit that could be included in the sample.
Simple random sampling
Advantages (provided that the population is small in size):
* It is cheap to do.
* It is simple to do.
* Standard formulae can be used to analyse the results.
* Each person or unit is included only once.
Disadvantages:
* It is not suitable where the population size is large.
* A sampling frame is required.
There are two simple techniques that are commonly used and do not require elaborate equipment:
* random number sampling
* lottery or ticket sampling.
Random number sampling
In random number sampling each element of the sampling frame is assigned a number. Once you have done this you can use tables of random sampling numbers such as the one at the back of this book (table on page 139). These tables contain 1000 or more digits, that is to say, integers starting from 0, i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The table is constructed with great care so that each digit is equally likely to appear.
Suppose you want a sample of 50. You will need to select 50 random numbers from the table. You could start at the top left hand corner and work down the column. If you reach the bottom of the table you could start again at the top with the next unused digits along the top row.
To obtain a set of random numbers, you may start at the top of the table and read downwards, but it is better to start at a randomly selected place in the table, and you may travel in any direction. If a number appears that has already appeared, it is ignored (in effect this is then sampling without replacement).
Once you have extracted 50 random numbers, the sample is selected from the numbered sampling frame by using these numbers.
In random number sampling, each element is given a number and the numbers of the required elements are selected by using random number tables or other random number generators.
[Figure: a sampling frame, a list of random numbers, and the resulting sample.]
You are going to take a sample of 50 from a population of size 400. Write down the first five random numbers starting at the seventh column from the left of the table on page 139 and working down.
The seventh column begins 37 26, 03 94, 17 26, ...
Starting at the top, form 3-digit numbers. The first is 372.
Continue down the column, ignoring any number that is greater than 400. For example, ignore 875 and 951; the next number is 117. The last is 053.
The numbers are 372, 039, 172, 117 and 053.
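The procedure of reading numbers from the table, skipping those outside the range and those already chosen, can be sketched in Python. The function name `select_from_table` is hypothetical, and the numbering of the population from 0 to 399 is an assumption; the candidate list uses only three-digit numbers quoted in the example above.

```python
def select_from_table(candidates, lowest, highest, sample_size):
    """Pick sample_size distinct numbers in [lowest, highest] from a stream
    of candidates read from a random number table.

    Numbers outside the range are ignored; repeats are ignored, which makes
    this sampling without replacement.
    """
    chosen = []
    for number in candidates:
        if lowest <= number <= highest and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Three-digit numbers read down the column in the example.
column = [372, 39, 172, 875, 951, 117, 53]
first_five = select_from_table(column, lowest=0, highest=399, sample_size=5)
print(first_five)
```

If the table runs out before enough valid numbers are found, the function simply returns the shorter list; in practice you would continue reading from another part of the table.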
Computers and calculators can produce lists of random numbers.
Random number sampling has several advantages and one disadvantage.
Random number sampling
Advantages:
* The numbers are truly random and free from bias.
* It is easy to use.
* Each number has a known, equal chance of selection.
Disadvantage:
* It is not suitable where the population size is large.
Lottery sampling
In lottery sampling each element of the population is identified by some characteristic such as a name or number, and this is put on a ticket. The tickets, which should all be the same size and shape, are put into a container and are drawn one at a time (without replacement). The elements of the population corresponding to the tickets are selected.
Lottery sampling
Advantages:
* The tickets are drawn at random.
* It is easy to use.
* Each ticket has a known chance of selection.
Disadvantages:
* It is not suitable where the population size is large.
* A sampling frame is needed.
Describe what is meant by a random sample, and give one advantage and one disadvantage associated with it.
A random sample is one in which every other possible sample of size n has an equal chance of being selected.
Advantage: It is free from bias. (Any of the other advantages could have been given here.)
Disadvantage: It is not suitable for large sample sizes.
The 100 members of a yacht club are listed numerically in the club's membership book. The committee wants to select a sample of 12 members to fill in a questionnaire about the facilities offered by the club.
a Explain how the committee could use a table of random numbers to take a simple random sample of the members.
b Give one advantage of this method over taking a census.
a Allocate a two-digit number to each person, starting at 00 and ending at 99.
Select a random starting point in the table (5th column and 7th row, for example).
Select 12 random numbers: for example 56, 86, 80, 57, 11, 78, 40, 58, 40, 86, 14, 31 reading across, or 56, 71, 66, 87, 09, 11, 48, 14, 33, 79, 12, 02 reading vertically. Note that you have to select 13 numbers in the first case because 86 occurs twice so is ignored the second time.
Go back to the original population and select the people corresponding to these numbers.
b A sample survey costs less than a census.
OR
Results are obtained quicker.
OR
Fewer people have to respond in the sample.
1 Explain briefly what is meant by the term sampling and give three advantages of taking a sample as opposed to a census.
2 Define what is meant by a census. By referring to specific examples, suggest two reasons why a census might be used.
3 A factory makes safety harnesses for climbers and has an order to supply 3000 harnesses. The buyer wishes to know that the load at which the harness breaks exceeds a certain figure. Suggest a reason why a census would not be used for this purpose.
4 Explain:
a why a sample might be preferred to a census,
b what you understand by a sampling frame,
c what effect the size of the population has on the size of the sampling frame,
d what effect the variability of the population has on the size of the sampling frame.
5 Using the random numbers 4 and 3 to give you the column and line respectively in the random number table (table on page 139), select a sample of size 6 from the numbers:
a 0-99, b 50-150, c 1-600.

2.4 You need to know about other methods of sampling.
Systematic sampling
In systematic sampling the required elements are chosen at regular intervals from an ordered list.
To take a systematic sample, you take every kth element from a sampling frame, where k, the sampling interval, is calculated as:
$k = \dfrac{\text{population size } (N)}{\text{sample size } (n)}$
For example, if k = 8, pick a number at random between 1 and 8 and if it is, say, the number 3, start at the third name on the list followed by the 11th, 19th, etc.
To overcome the objection that the first name is bound to be selected, you introduce a direct element of randomness by selecting the first item randomly.
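The steps for a systematic sample can be sketched as follows. This is an illustrative Python version, not taken from the original text: the function name `systematic_sample` is hypothetical, and it assumes the population size is an exact multiple of the sample size so that the interval k is a whole number.

```python
import random


def systematic_sample(frame, sample_size, rng=random):
    """Take every k-th element of an ordered sampling frame.

    k = population size // sample size (assumed to divide exactly).
    The starting position is chosen at random between 1 and k.
    """
    k = len(frame) // sample_size
    start = rng.randint(1, k)                      # 1-based position in the list
    positions = range(start, start + k * sample_size, k)
    return [frame[p - 1] for p in positions]       # convert to 0-based index


# Hypothetical frame: 480 workers numbered 1 to 480, sample of 30, so k = 16.
workers = list(range(1, 481))
sample = systematic_sample(workers, 30, random.Random(0))
print(sample[:3])
```

With a start of 3 and k = 8 the same function would pick the 3rd, 11th, 19th, ... names, matching the description above.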
[Figure: sampling frame, method, sample.]
When you are selecting the interval, it is possible to introduce bias if you are not careful. Suppose you were investigating the mean rainfall each month over 100 years: an interval of 12 months would introduce bias, as you would be looking at the same month in each year.
Systematic sampling is used when:
* the population is too large for simple random number sampling.
Systematic sampling
Advantages:
* It is simple to use.
* It is suitable for large samples.
Disadvantages:
* It is only random if the ordered list is truly random.
* It can introduce bias.
Stratified sampling
This is a form of random sampling in which the population is divided into groups or categories which are mutually exclusive, so no individual or item can be in two groups, and it is used where we may expect the observation of interest to vary between the different groups. These groups are called strata (singular: stratum). The strata would be decided according to one or more criteria such as gender, age, religion and so on. Within each of these strata a simple random sample is selected. The same proportion of each stratum is taken in the sample as is found in the population, so that each stratum will be represented in the correct proportion in the overall result.
$\text{number sampled in a stratum} = \dfrac{\text{number in stratum}}{\text{number in population}} \times \text{overall sample size}$
In stratified sampling the population is divided into mutually exclusive strata and a random sample is taken from each.
[Figure: the sampling frame of the population is split into a sampling frame of men and a sampling frame of women. The proportion for each stratum is the same as that in the population.]

A factory manager wants to find out what his workers think about the factory canteen facilities. He decides to give a questionnaire to a sample of 80 workers. It is thought that different age groups will have different opinions.
There are 75 workers between 18 and 32.
There are 140 workers between 33 and 47.
There are 85 workers between 48 and 62.
a Write down the name of the method of sampling the manager should use.
b Explain how he could use this method to select a sample of workers' opinions.
a Stratified sampling.
b Find the total number of workers. There are 75 + 140 + 85 = 300 workers altogether.
For each age group find the number of workers needed for the sample: number = proportion of workers × 80.
In the 18-32 age group he will select $\frac{75}{300} \times 80 = 20$ workers.
In the 33-47 age group he will select $\frac{140}{300} \times 80 = 37\frac{1}{3} \approx 37$ workers.
In the 48-62 age group he will select $\frac{85}{300} \times 80 = 22\frac{2}{3} \approx 23$ workers.
Where the required number of workers is not a whole number, it is rounded to the nearest whole number.
The workers in each age group would be numbered and a random number table (or generator) would produce the required quantity of random numbers. The workers corresponding to these numbers would be asked their opinions.
Stratified sampling is used when:
* the sample is large and
* the population divides naturally into mutually exclusive groups.
Stratified sampling
Advantages:
* It can give more accurate estimates than simple random sampling where there are clear strata present.
* It reflects the population structure.
Disadvantages:
* Within the strata, the problems are the same as for any simple random sample.
* If the strata are not clearly defined they may overlap.
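The proportional allocation used in stratified sampling is easy to compute. The sketch below is illustrative only; the function name `stratum_sizes` and the dictionary layout are assumptions, and the figures are those of the factory canteen example.

```python
def stratum_sizes(strata, sample_size):
    """Number to sample from each stratum, in proportion to its size.

    strata: dict mapping stratum name -> number in that stratum.
    Each result is rounded to the nearest whole number, so in general the
    rounded values are not guaranteed to add up to sample_size exactly.
    """
    total = sum(strata.values())
    return {name: round(count / total * sample_size)
            for name, count in strata.items()}


workers = {"18-32": 75, "33-47": 140, "48-62": 85}
allocation = stratum_sizes(workers, 80)
print(allocation)
```

Within each stratum the chosen number of workers would then be selected by simple random sampling, for example with random numbers.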
2.5 You need to know about non-random sampling.
The chief characteristic of simple random, systematic and stratified sampling is that every individual has a known probability of being included in the sample: the sample is random. Non-random sampling methods are used when it is not possible to use random methods, for example, when no sampling frame is available. An example of non-random sampling is quota sampling.
Quota sampling
In quota sampling the population is divided into groups in terms of gender, social class, etc. The number of people in each group is set to try and reflect the group's proportion in the whole population. The interviewer selects the actual sampling units.
When taking a quota sample, as you meet people you assess their age or socio-economic group, etc. After they have been interviewed, they are put towards the quota into which they fit. This continues until all the quotas have been filled. If a person refuses to be interviewed, or the quota into which they would fit is full, then you simply ignore them and pass on to the next person. In practice you might also decide to take gender into account, but the more characteristics you introduce the harder it becomes to select people fitting all the characteristics.
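The way quotas are filled can be illustrated with a short Python sketch. It is not part of the original text: the function name `fill_quotas`, the group labels and the list of people are all hypothetical, and refusals are not modelled.

```python
def fill_quotas(people, quotas):
    """Interview people in the order they are met until every quota is full.

    people: iterable of (name, group) pairs in the order encountered.
    quotas: dict mapping group -> number required.
    A person whose quota is already full (or whose group has no quota)
    is ignored and the interviewer passes on to the next person.
    """
    remaining = dict(quotas)
    sample = []
    for name, group in people:
        if remaining.get(group, 0) > 0:
            sample.append(name)
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return sample


met = [("P1", "A"), ("P2", "A"), ("P3", "B"), ("P4", "A"), ("P5", "B")]
chosen = fill_quotas(met, {"A": 2, "B": 1})
print(chosen)
```

Because the order in which people are met is decided by the interviewer rather than by chance, the selection is not random, which is why sampling errors cannot be estimated for this method.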
A quota sampling scheme
Age group | Socio-economic group | Number/Quota
18-29 | A/B | 4
18-29 | C | 13
18-29 | D/E | …
30-44 | A/B | 6
30-44 | C | 17
30-44 | D/E | 4
45-64 | A/B | 7
45-64 | C | 7
45-64 | D/E | 6
… | A/B | 4
… | C | 12
… | D/E | 6
Total | | 100
Quota sampling
Advantages:
* It enables the fieldwork to be done quickly because a representative sample can be achieved with a small sample size.
* Costs are kept to a minimum.
* Administering the test is easy.
Disadvantages:
* It is not possible to estimate the sampling errors. (The process is not a random process.)
* The interviewer has to choose the respondents and may not be able to judge the characteristics easily.
* Non-responses are not recorded. (Perhaps the non-respondent in the constituency survey did not agree to be interviewed because he was a 'don't know' voter.)
* It can introduce interviewer bias in who is included.

Primary data are data that are collected by, or on behalf of, the person who is going to use the data.
Secondary data are data that are neither collected by, nor on behalf of, the person who is going to use the data. The data are second hand.
Primary data
Advantages:
* The collection method is known.
* The accuracy is known.
* The exact data needed are collected.
Disadvantages:
* It is costly in time and effort.
Secondary data
Advantages:
* They are cheap to obtain: government publications, for example, are relatively cheap.
* A large quantity of data is available, for example, on the internet.
* Much of the data has been collected for years and can be used to plot trends.
Disadvantages:
* Bias is not always recognised.
* It can be in a form that is difficult to deal with.
1 Explain briefly the difference between a census and a sample survey.
Write brief notes on:
a simple random sampling, b stratified sampling,
c systematic sampling, d quota sampling.
Your notes should include the definition, and any advantages and disadvantages associated with each method of sampling.
2 a Explain the purpose of stratification in carrying out a sample survey.
b The headteacher of an infant school wishes to take a stratified sample of 20% of the pupils at his school. The school has the following numbers of pupils.
Year 1 | Year 2 | Year 3
40 | 60 | 80
Work out how many pupils in each age group there will be in the sample.
3 A survey is to be done on the adult population of a certain city suburb, the population of which is 2000. An ordered list of the inhabitants is available.
a What sampling method would you use and why?
b What condition would have to be applied to your ordered list if the selection is to be truly random?
4 In a marketing sample survey the sales of cigarettes in a variety of outlets are to be investigated. The outlets consist of small kiosks selling cigarettes and tobacco only, tobacconist's shops that sell cigarettes and related products, and shops that sell cigarettes and other unrelated products.
a Suggest the most suitable form of taking a random sample.
b Explain how you would conduct the sample survey.
c What are the advantages and disadvantages of the method chosen?
5 a Explain briefly
i why it is often desirable to take samples,
ii what you understand by a sampling frame.
b State one circumstance when you would consider using
i systematic sampling,
ii stratification when sampling from a population,
iii quota sampling.
6 A factory manager wants to get information about the ways his workers travel to work. There are 480 workers in the factory, and each has a clocking-in number. The numbers go from 1 to 480. Explain how the manager could take a systematic sample of size 30 from these workers.
1 Using the random numbers on page 139, and starting at the top of the column with the number 88 and working down, a simple random sample (without replacement) of size 10 was taken of numbers between 0 and 75 inclusive. The first two numbers were 17 and 52.
a Find the other eight numbers in the sample.
b Explain, with the aid of a practical situation, how this set of random numbers could be used to take a sample of size 10.
2 a Give one advantage and one disadvantage of using
i a census, ii a sample survey.
b It is decided to take a sample of 100 from a population consisting of 500 elements. Explain how you would obtain a simple random sample without replacement from this population.
3 a Explain briefly what you understand by
i a population,
ii a sampling frame.
b A market research organisation wants to take a sample of
i owners of diesel motor cars in the UK,
ii persons living in Oxford who suffered from injuries to the back during July 1996.
Suggest a suitable sampling frame in each case.
4 A gym keeps a numbered alphabetical list of their 200 clients. Explain how you would choose a simple random sample of 40 clients.
5 Write down one advantage and one disadvantage of using:
a stratified sampling, b simple random sampling.
6 The managing director of a factory wants to know what the workers think about the factory canteen facilities. One hundred people work in the offices and 200 work on the shop floor. He decides to ask the people who work in the offices.
a Suggest reasons why this is likely to produce a biased sample.
b Explain briefly how the factory manager could select a sample of 30 workers using:
i systematic sampling,
ii stratified sampling,
iii quota sampling.
7 A garden centre employs 150 workers. Sixty-five of the workers are women and 85 are men. Explain briefly how you would take a random sample of 30 workers using stratified sampling.
8 The 240 members of a bowling club are listed alphabetically in the club's membership book. The committee wishes to select a sample of 30 members to fill in a questionnaire about the facilities the club has to offer.
a Explain how the committee could use a table of random numbers to take a systematic sample.
b Give one advantage of this method over taking a simple random sample.
9 a Explain briefly what you understand by:
i a population, ii a sample.
b Give one advantage and one disadvantage of taking a sample.
10 A college of 3000 students has students registered in four departments: arts, science, education and crafts. The principal wishes to take a sample from the student population to gain information about the likely student response to a rearrangement of the college timetable in order to hold lectures on Wednesday, previously reserved for sports.
What sampling method would you advise the principal to use? Give reasons to justify your choice.
11 As part of her statistics project, Deepa decided to estimate the amount of time A-level students at her school spent on private study each week. She took a random sample of students from those studying arts subjects, science subjects and a mixture of arts and science subjects. Each student kept a record of the time they spent on private study during the third week of term.
a Write down the name of the sampling method used by Deepa.
b Give a reason for using this method and give one advantage this method has over simple random sampling.
12 There are 64 girls and 56 boys in a school.
Explain briefly how you could take a random sample of 15 pupils using
a a simple random sample,
b a stratified sample.
Summary of key points
1 A population is the whole set of items that are of interest.
2 A census observes or measures every member of a population.
3 A sample is a selection of observations taken from a sub-set of the population which is used to find out information about the population as a whole. This is known as a sample survey.
4 A random sample is one in which every possible sample of size n has an equal chance of being selected.
5 A sampling frame is a list identifying every single sampling unit that could be included in the sample.
6 In random number sampling, each element is given a number to identify it and the numbers of the required elements are selected by using random number tables or other random number generators.
7 In systematic sampling the required elements are chosen at regular intervals from an ordered list.
8 In stratified sampling the population is divided into mutually exclusive strata and a simple random sample is taken from each. The proportion of the strata in the sample is the same as the proportion of the strata in the population.
9 In quota sampling the population is divided into groups in terms of gender, social class, etc. The number of people in each group is set to try and reflect the group's proportion in the whole population. The interviewer selects the actual sampling units.

After studying this chapter you should
* understand the concept of an unbiased estimate
* appreciate the significance of the Central Limit Theorem
* know how to find confidence intervals for the population mean
* be able to test hypotheses about the population mean.
Estimation, confidence intervals and tests
The doctors say Jim is of average height. In fact,
he is 1.84 m tall; but how can you find the average
height of adult men? John was also told that he
was of average height, but he is 1.88 m tall. Does
this mean that the doctors are using a range of
values to describe average height, say 1.80 to
1.90 m perhaps? Paul is 1.92 m tall but he claims to
be of average height. How can you test this claim,
and what basis could you give for saying that Paul
was above average height? In this chapter you will
examine ways of finding estimates, as well as using
probability to test claims like Paul's.
You need to understand the concept of a statistic and a sampling distribution
Imagine that a new company is thinking of selling raincoats to students. The company would
like to know something about the heights of students, and in particular the mean height of a
student. Unfortunately, the number of students is so large that it is not practical to measure every
student and so a method of estimating this mean height is required. The heights of the students
at the college form a large population. As in book S2, here the mean height of the students
will be called $\mu$ (mu) and the standard deviation of the heights of the students $\sigma$ (sigma), and
these parameters will be referred to as population parameters. They are the mean and standard
deviation for the whole population. The company does not know the values of $\mu$ and $\sigma$ and it
cannot afford the time or money to find them.
The problem that the company has is how to estimate the parameter $\mu$. In order to answer this
question you take a sample from the population. In Chapter 2, several methods of sampling
were discussed but the theory of estimation that is used in this course assumes that a simple
random sample of size $n$ is used.
[Figure: a sample of size $n$ is drawn from the population $X$ = the height of students. The population mean $\mu$ and population standard deviation $\sigma$ are unknown population parameters. The sample will consist of observations of the random variable $X$. These are usually referred to as $X_1, X_2, \ldots, X_n$.]
A statistic is defined as follows.
If $X_1, X_2, X_3, \ldots, X_n$ is a random sample of size $n$ from some population then a statistic $T$ is a
random variable consisting of any function of the $X_i$ that involves no other quantities.
In particular a statistic should not involve any unknown population parameters.
A sample $X_1, X_2, \ldots, X_n$ is taken from a population with unknown population parameters $\mu$
and $\sigma$.
State whether or not each of the following are statistics.
Mth tt vy Beal v%
£ median(X,, X,,
max(, Xo, Xe) ed (%
Estimation, confidence intervals and tests
a This is a statistic. It is only a function of the sample $X_1, X_2, \ldots, X_n$. A statistic need not involve all members of the sample.
b This is not a statistic. The function contains an unknown population parameter.
c This is not a statistic. The function contains $\mu$.
d $\max(X_1, X_2, \ldots, X_n)$ is a statistic. It is only a function of the sample $X_1, X_2, \ldots, X_n$.
e This is not a statistic. The function contains $\mu$ and $\sigma$.
f $\mathrm{median}(X_1, X_2, \ldots, X_n)$ is a statistic. It is only a function of the sample $X_1, X_2, \ldots, X_n$.
Since it is possible to repeat the process of taking a sample, the particular value of a statistic $T$ in
a specific case, namely $t$, will be different for each sample. If all possible samples are taken, then
these values will form a probability distribution called the sampling distribution of $T$. This
will usually depend upon the distribution of the population $X$.
The sampling distribution of a statistic $T$ is the probability distribution of $T$.
In Chapter 1 you saw how linear combinations of independent normal distributions could be
combined. The rule for Var(aX + bY) in particular required that the random variables X and Y
were independent. For this reason the theory in this chapter is based upon the idea of a simple
random sample met in Chapter 2. The sample is usually referred to as a random sample and it
has the following definition:
A random sample of size $n$ consists of the observations $X_1, X_2, X_3, \ldots, X_n$ from a
population where the $X_i$
* are independent random variables,
* have the same distribution as the population.
(In this chapter, and throughout this series of books, we shall distinguish between the random variable $X_i$, representing the $i$th observation in general, and $x_i$, the value of the observation in a specific case. So, for example, if the fourth person measured was 1.85 m tall then $x_4 = 1.85$.)
Example 2
The noon day temperature, in °C, is measured for a random sample of 5 days in July in a certain
city and the following results were obtained:
28.3, 31.2, 24.0, 28.7, 30.9
Calculate the values of the following statistics.
a $\bar{X}$   b $\sum X_i^2$   c $\max(X_1, X_2, \ldots, X_5) - \min(X_1, X_2, \ldots, X_5)$
(We use $\bar{x}$ for the mean value and $\bar{X}$ for the statistic.)
c The minimum value is 24.0.
The maximum value is 31.2.
The statistic has a value of $31.2 - 24.0 = 7.2$. This, of course, is the statistic
commonly known as the range.
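The three statistics in this example can be checked with a few lines of code. This is only a sketch: it reads statistic b as $\sum X_i^2$, which is an assumption because the printed expression is not fully legible, and the function names are illustrative rather than taken from the book.

```python
# Noon day temperatures (in °C) for the 5 sampled days in the example above.
temps = [28.3, 31.2, 24.0, 28.7, 30.9]


def sample_mean(xs):
    """Value of the statistic X-bar for one observed sample."""
    return sum(xs) / len(xs)


def sum_of_squares(xs):
    """Value of the statistic sum(X_i^2) (assumed reading of part b)."""
    return sum(x * x for x in xs)


def sample_range(xs):
    """Value of max(X_1..X_n) - min(X_1..X_n), i.e. the range."""
    return max(xs) - min(xs)


print("mean:", round(sample_mean(temps), 2))
print("sum of squares:", round(sum_of_squares(temps), 2))
print("range:", round(sample_range(temps), 1))
```

Each function uses only the sample values, so each is a statistic in the sense defined earlier: no unknown population parameter appears anywhere.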
If the distribution of the population is known then the sampling distribution of a statistic can
sometimes be found.
The weights, in grams, of a consignment of apples are normally distributed with a mean $\mu$ and
standard deviation 4. A sample of size 25 is taken and the statistics $R$ and $T$ are calculated as follows:
$R = X_{25} - X_1$ and $T = X_1 + X_2 + \ldots + X_{25}$
Find the distributions of $R$ and $T$.
The sample will be $X_1, X_2, \ldots, X_{25}$ where each $X_i \sim N(\mu, 4^2)$. (State the distribution for $X_i$.)
Now $R = X_{25} - X_1 \Rightarrow R \sim N(\mu - \mu, 4^2 + 4^2)$
that is $R \sim N(0, (4\sqrt{2})^2)$. (Use the formulae for $E(X - Y)$ and $\mathrm{Var}(X - Y)$ from Chapter 1.)
So $R \sim N(0, 32)$.
Also $T = X_1 + X_2 + \ldots + X_{25} \Rightarrow T \sim N(25\mu, 25 \times 4^2)$
or $T \sim N(25\mu, 20^2)$. (Extend the formulae for $E(X + Y)$ and $\mathrm{Var}(X + Y)$.)
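The rules from Chapter 1 used here can be sketched as a small helper that returns the mean and variance of any linear combination of independent random variables. The helper name and the value chosen for $\mu$ are hypothetical; the example itself leaves $\mu$ unknown.

```python
def linear_combination(coeffs, means, variances):
    """Mean and variance of sum(a_i * X_i) for independent X_i.

    Uses E(sum a_i X_i) = sum a_i E(X_i) and
    Var(sum a_i X_i) = sum a_i^2 Var(X_i), which needs independence.
    """
    mean = sum(a * m for a, m in zip(coeffs, means))
    var = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, var


n = 25
mu = 100.0        # hypothetical value, only for illustration
sigma2 = 4 ** 2   # each X_i ~ N(mu, 4^2)

# R = X_25 - X_1: coefficient -1 on X_1, +1 on X_25, 0 on the rest.
r_coeffs = [-1] + [0] * (n - 2) + [1]
r_mean, r_var = linear_combination(r_coeffs, [mu] * n, [sigma2] * n)

# T = X_1 + X_2 + ... + X_25: coefficient 1 on every observation.
t_mean, t_var = linear_combination([1] * n, [mu] * n, [sigma2] * n)

print("R: mean", r_mean, "variance", r_var)
print("T: mean", t_mean, "variance", t_var)
```

Because each $X_i$ is normal and the observations are independent, $R$ and $T$ are themselves normal, so the mean and variance returned here determine their distributions completely.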
Example 4
A large bag contains counters. Sixty per cent of the counters have the number 0 on them and
forty per cent have the number 1.
a Find the mean $\mu$ and variance $\sigma^2$ for this population of counters.
A simple random sample of size 3 is taken from this population.
b List all possible samples.
c Find the sampling distribution for the mean
$\bar{X} = \frac{X_1 + X_2 + X_3}{3}$
where $X_1$, $X_2$ and $X_3$ are the three variables representing samples 1, 2 and 3.
d Hence find $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$.
e Find the sampling distribution for the mode $M$.
f Hence find $E(M)$ and $\mathrm{Var}(M)$.
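a The distribution of the population is
$x$ | 0 | 1
$P(X = x)$ | $\frac{3}{5}$ | $\frac{2}{5}$
(Use the methods from book S1 to find $\mu$ and $\sigma^2$.)
$\mu = E(X) = \sum x P(X = x) = 0 + 1 \times \frac{2}{5} \Rightarrow \mu = \frac{2}{5}$
$\sigma^2 = \mathrm{Var}(X) = \sum x^2 P(X = x) - \mu^2 = 0 + 1^2 \times \frac{2}{5} - \frac{4}{25} \Rightarrow \sigma^2 = \frac{6}{25}$
b The possible samples are
(0, 0, 0)
(1, 0, 0) (0, 1, 0) (0, 0, 1)
(1, 1, 0) (1, 0, 1) (0, 1, 1)
(1, 1, 1)
(List these systematically. Since the sample is random the observations are independent. So to find the probability of case (1, 0, 0) you can multiply the probabilities $P(X_1 = 1) \times P(X_2 = 0) \times P(X_3 = 0)$. Remember that each $X_i$ has the same distribution as $X$.)
c The sampling distribution of $\bar{X}$ is
$P(\bar{X} = 0) = \left(\frac{3}{5}\right)^3 = \frac{27}{125}$, i.e. the (0, 0, 0) case
$P(\bar{X} = \frac{1}{3}) = 3 \times \frac{2}{5} \times \left(\frac{3}{5}\right)^2 = \frac{54}{125}$, i.e. the (1, 0, 0); (0, 1, 0); (0, 0, 1) cases
$P(\bar{X} = \frac{2}{3}) = 3 \times \left(\frac{2}{5}\right)^2 \times \frac{3}{5} = \frac{36}{125}$, i.e. the (1, 1, 0); (1, 0, 1); (0, 1, 1) cases
$P(\bar{X} = 1) = \left(\frac{2}{5}\right)^3 = \frac{8}{125}$, i.e. the (1, 1, 1) case
d $E(\bar{X}) = 0 + \frac{1}{3} \times \frac{54}{125} + \frac{2}{3} \times \frac{36}{125} + 1 \times \frac{8}{125} = \frac{50}{125} = \frac{2}{5}$
$\mathrm{Var}(\bar{X}) = 0 + \frac{1}{9} \times \frac{54}{125} + \frac{4}{9} \times \frac{36}{125} + 1 \times \frac{8}{125} - \frac{4}{25} = \frac{6 + 16 + 8 - 20}{125} = \frac{2}{25}$
(General formulae for $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$ are given in Section 3.2.)
e The sampling distribution of $M$ is
$P(M = 0) = \frac{27}{125} + \frac{54}{125} = \frac{81}{125}$, i.e. cases (0, 0, 0); (1, 0, 0); (0, 1, 0); (0, 0, 1)
$P(M = 1) = \frac{36}{125} + \frac{8}{125} = \frac{44}{125}$, i.e. the other cases
f $E(M) = 0 + 1 \times \frac{44}{125} = \frac{44}{125}$
and $\mathrm{Var}(M) = 0 + 1^2 \times \frac{44}{125} - \left(\frac{44}{125}\right)^2 = 0.228$ (3 s.f.)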
Notice that $E(\bar{X}) = \mu$ but $E(M) \neq \mu$ and that neither $E(\bar{X})$ nor $E(M)$ is equal to the
population mode, which is of course zero as 60% of the counters have a zero on them.
These results will be examined in greater detail in Section 3.2.
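The sampling distributions in this example can be reproduced by listing every possible sample and multiplying probabilities, exactly as in the worked solution. The sketch below uses exact fractions; the helper names are illustrative and not part of the book.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population of counters: 60% show 0 and 40% show 1.
population = {0: Fraction(3, 5), 1: Fraction(2, 5)}


def sampling_distribution(statistic, n=3):
    """Exact sampling distribution of a statistic over all samples of size n."""
    dist = defaultdict(Fraction)
    for sample in product(population, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= population[x]  # observations are independent
        dist[statistic(sample)] += prob
    return dict(dist)


def expectation(dist):
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    mean = expectation(dist)
    return sum(value * value * prob for value, prob in dist.items()) - mean ** 2


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def sample_mode(sample):
    # With three 0/1 values there is always a unique mode.
    return max(set(sample), key=sample.count)


mean_dist = sampling_distribution(sample_mean)
mode_dist = sampling_distribution(sample_mode)

print("E(mean) =", expectation(mean_dist), " Var(mean) =", variance(mean_dist))
print("E(mode) =", expectation(mode_dist), " Var(mode) =", float(variance(mode_dist)))
```

Changing `population` or `n` gives the corresponding distributions for other small discrete populations, such as those in the exercises.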
1) The random variable $H \sim N(\mu, \sigma^2)$ represents the height of a variety of flower where $\mu$, $\sigma$
are unknown population parameters.
A random sample of 5 flowers of this variety is measured and their heights, in cm, are given
below.
$h_1 = 35.1$, $h_2 = 32.3$, $h_3 = 34.5$, $h_4 = 37.4$, $h_5 = 32.8$
Determine which of the following are statistics.
(X; — XP?
aDa-a by &e®
ere] a X,— Xs
2) A random sample of 6 apples is weighed and their weights, $x_i$ g, are recorded.
$x_1 = 168$, $x_2 = 185$, $x_3 = 161$, $x_4 = 172$, $x_5 = 187$, $x_6 = 176$
Calculate the values of the following statistics.
ay
X,+X,
2
dx
3) The lengths of nails produced by a certain machine are normally distributed with a mean $\mu$
and standard deviation $\sigma$. A random sample of 10 nails is taken and their lengths
$X_1, X_2, X_3, \ldots, X_{10}$ are measured.
i Write down the distributions of the following:
ce LK-w
> (454)
ii State which of the above are statistics.
4) A large bag of coins contains 1p, 5p and 10p coins in the ratio [Link]
a Find the mean $\mu$ and the variance $\sigma^2$ for the value of coins in this population.
A random sample of two coins is taken and their values $X_1$ and $X_2$ are recorded.
b List all possible samples.
c Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2}{2}$.
d Hence show that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{2}$.
3.2 You need to be able to estimate population parameters using a sample.
In Section 3.1, the problem of trying to estimate the mean height of students in a sixth form
college was considered. If you take a random sample of size n then you can find various statistics.
The question is, are any of these statistics useful in estimating the population parameters?
A statistic that is used to estimate a population parameter is called an estimator; the particular
value of this estimator generated by a particular sample is called an estimate.
Since all the $X_i$ are random variables having the same mean and variance as the population, you
can sometimes find the expected value of a statistic $T$, $E(T)$, and this will tell you what the 'average'
value of the statistic should be.
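Example 5
A random sample $X_1, X_2, \ldots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
Show that $E(\bar{X}) = \mu$.
In Chapter 6 of book S1 an important property of expected values was given:
$E(aX) = aE(X)$   (1)
Also in Chapter 1 of this book you saw that
$E(X + Y) = E(X) + E(Y)$   (2)
(You can extend formula (2) by multiple applications, e.g. $E((X_1 + X_2) + X_3) = E(X_1 + X_2) + E(X_3) = E(X_1) + E(X_2) + E(X_3)$.)
Now $\bar{X} = \frac{1}{n}(X_1 + \ldots + X_n)$
So $E(\bar{X}) = \frac{1}{n}E(X_1 + \ldots + X_n)$   by (1)
$= \frac{1}{n}[E(X_1) + \ldots + E(X_n)]$   by (2)
$= \frac{1}{n}[\mu + \ldots + \mu] = \frac{n\mu}{n}$
That is: $E(\bar{X}) = \mu$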
Example 5 shows that if we use the sample mean $\bar{X}$ as an estimator for $\mu$ then 'on average' it will
give us the correct result. This is an important property for an estimator to have and you say that
$\bar{X}$ is an unbiased estimator of $\mu$. (So a specific value of $\bar{x}$ will provide an unbiased estimate
of $\mu$.)
If a statistic $T$ is used as an estimator for a population parameter $\theta$ and $E(T) = \theta$ then $T$ is an
unbiased estimator for $\theta$.
It seems obvious that being unbiased is a desirable feature to have in an estimator, but not all
estimators possess this property. In Example 4 you found two statistics based on samples of size 3
from a population of counters of which 60% had the number 0 and 40% had the number 1. The
population mean $\mu$ was $\frac{2}{5}$ and the population mode was 0 (since 60% of the counters had 0 on
them). The two statistics that you calculated were the sample mean $\bar{X}$ and the sample mode $M$.
You could use either of them as estimators for $\mu$, the population mean, but you saw that
$E(\bar{X}) = \mu$ whereas $E(M) \neq \mu$, so you would prefer to use the sample mean $\bar{X}$ rather than the sample
mode $M$ as an estimator for $\mu$ in this case. How about an estimator for the population mode?
Neither of the statistics that you calculated had the property of being unbiased since $E(\bar{X}) = \mu = \frac{2}{5}$
and $E(M) = \frac{44}{125}$ whereas the population mode was 0.
Intuitively you might prefer the estimator $M$ since it is, after all, a mode and is also slightly closer
to the population mode. In this case you refer to $M$ as a biased estimator for the population
mode. The bias is simply the expected value of the estimator minus the parameter of the
population it is estimating.
If a statistic $T$ is used as an estimator for a population parameter $\theta$ then the bias $= E(T) - \theta$.
In this case the bias is $\frac{44}{125}$.
So far you have found an unbiased estimator for $\mu$, but how would you find an estimator for $\sigma^2$?
Before answering this question you need to find $\mathrm{Var}(\bar{X})$.
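Example 6
A random sample $X_1, X_2, \ldots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
Show that $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
In Chapter 6 of book S1 an important property of variances was given:
$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$   (1)
Also in Chapter 1 of this book you saw that
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ if $X$ and $Y$ are independent   (2)
Now $\bar{X} = \frac{1}{n}(X_1 + \ldots + X_n)$
So $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\mathrm{Var}(X_1 + \ldots + X_n)$   by (1)
$= \frac{1}{n^2}[\mathrm{Var}(X_1) + \ldots + \mathrm{Var}(X_n)]$   by (2)
$= \frac{1}{n^2}[\sigma^2 + \ldots + \sigma^2] = \frac{n\sigma^2}{n^2}$
That is: $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
(You will meet this result again later in Section 3.3.)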
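Example 7
Show that $S^2 = \frac{1}{n - 1}\left(\sum X^2 - n\bar{X}^2\right)$ is an unbiased estimator for $\sigma^2$.
In order to find $E(S^2)$ you need to recall certain facts about
expected values and variances. These are:
$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2$
so: $E(X^2) = \sigma^2 + \mu^2$   (1)
and $E(\bar{X}) = \mu$, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ (see Examples 5 and 6)
so: $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - \mu^2$
and $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$   (2)
So $E(S^2) = \frac{1}{n - 1}E\left(\sum X^2 - n\bar{X}^2\right)$
then $E(S^2) = \frac{1}{n - 1}\left[E\left(\sum X^2\right) - nE(\bar{X}^2)\right]$   (using $E(aX - bY) = aE(X) - bE(Y)$)
But $E\left(\sum X^2\right) = \sum E(X^2) = nE(X^2)$ since each $X_i$ has the same distribution as $X$,
so: $E(S^2) = \frac{n}{n - 1}\left[E(X^2) - E(\bar{X}^2)\right]$
$= \frac{n}{n - 1}\left[\sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2\right]$   by (1) and (2)
$= \frac{n}{n - 1} \times \frac{(n - 1)\sigma^2}{n}$
That is: $E(S^2) = \sigma^2$
and so the statistic $S^2$ is an unbiased estimator of the
population variance $\sigma^2$. It is because of this property of $S^2$ that
we use $s^2$ to estimate $\sigma^2$ in calculations where a value of $\sigma^2$ is
not known.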
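The effect of the divisor $n - 1$ can be seen exactly on a small discrete population. As a sketch, the code below reuses the counters population of Example 4 (for which $\sigma^2 = \frac{6}{25}$) and compares the expected value of $S^2$ with that of the same expression using divisor $n$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Counters population from Example 4: P(X = 0) = 3/5, P(X = 1) = 2/5.
population = {0: Fraction(3, 5), 1: Fraction(2, 5)}


def expected_value(statistic, n):
    """Exact E(statistic) over every possible random sample of size n."""
    total = Fraction(0)
    for sample in product(population, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= population[x]
        total += prob * statistic(sample)
    return total


def corrected_sum_of_squares(sample):
    """sum(X^2) - n * X-bar^2 for one sample."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum(x * x for x in sample) - n * xbar ** 2


def s_squared(sample):
    """The estimator S^2, with divisor n - 1."""
    return corrected_sum_of_squares(sample) / (len(sample) - 1)


def divisor_n_version(sample):
    """The same expression with divisor n, for comparison."""
    return corrected_sum_of_squares(sample) / len(sample)


print("E(S^2)            =", expected_value(s_squared, 3))
print("E(divisor-n form) =", expected_value(divisor_n_version, 3))
```

With divisor $n$ the expected value falls short of $\sigma^2$ by the factor $\frac{n - 1}{n}$, which is exactly the factor that the divisor $n - 1$ removes.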
Sometimes the 'hat notation' is used to describe an estimator of a parameter and $\hat{\theta}$ represents
an estimator of $\theta$.
So you might use $\hat{\mu}$ to represent an estimator of $\mu$; usually you have $\hat{\mu} = \bar{x}$.
Similarly you sometimes use $\hat{\sigma}^2$ to represent an estimator of $\sigma^2$. Following on from Example 7,
you usually have $\hat{\sigma}^2 = s^2$.
The table below summarises the number of breakdowns, $x$, on a town's bypass on 30 randomly
chosen days.
Number of breakdowns | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number of days | 3 | 5 | 4 | 3 | 5 | 4 | 4 | 2
Calculate unbiased estimates of the mean and variance of the number of breakdowns.
By calculator $\sum x = 160$ and $\sum x^2 = 990$.
So $\hat{\mu} = \bar{x} = \frac{160}{30} = 5.33$ (3 s.f.)
and $s^2 = \frac{990 - 30\bar{x}^2}{29} = 4.71$ (3 s.f.)
(You can use your calculator to find these values but it is recommended that you show them in the appropriate formulae. The working shown here is recommended when answering questions.)
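The calculation in this example needs only $n$, $\sum x$ and $\sum x^2$. A minimal sketch, with an illustrative function name, is:

```python
def unbiased_estimates(n, sum_x, sum_x2):
    """Return (x-bar, s^2) from the summary values n, sum(x) and sum(x^2).

    x-bar = sum(x) / n
    s^2   = (sum(x^2) - n * x-bar^2) / (n - 1)
    """
    mean = sum_x / n
    s2 = (sum_x2 - n * mean ** 2) / (n - 1)
    return mean, s2


# Summary values for the 30 days of breakdown data.
mean, s2 = unbiased_estimates(30, 160, 990)
print("estimate of mu:      ", round(mean, 2))
print("estimate of sigma^2: ", round(s2, 2))
```

The same function applies directly to exercise questions that give only summary values.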
The random variable $X$ has a continuous uniform distribution defined over the range $[0, a]$. A
random sample $X_1, X_2, \ldots, X_n$ is taken.
a Show that $\bar{X}$ is a biased estimator for $a$ and state the bias.
b Suggest a suitable unbiased estimator for $a$.
a Since $X \sim U[0, a]$, $\mu = E(X) = \frac{a}{2}$. (See the S2 part of the formula book where the formulae for mean and variance of a continuous uniform distribution are given.)
Then $E(\bar{X}) = \frac{a}{2}$ (see Example 5: $E(\bar{X}) = \mu$ for any sample),
so $\bar{X}$ is a biased estimator of $a$.
The bias is $E(\bar{X}) - a = \frac{a}{2} - a = -\frac{a}{2}$. (Use the definition of bias from page 28.)
b If $Y$ is an unbiased estimator of $a$ then
$E(Y) = a$. (This is the definition of an unbiased estimator.)
Since $E(\bar{X}) = \frac{a}{2}$, a sensible statistic for $Y$ is $Y = 2\bar{X}$.
Exercise 3B
1) Find unbiased estimates of the mean and variance of the populations from which the
following random samples have been taken.
a 21.3; 19.6; 18.5; 22.3; 17-4; 16.3; 18.9; 17.6; 18.7; 16.5; 19.
B 1; 255; 156545 15 3: 258; 556; 254; 351
© 120.4; 230.6; 356.1; 129.8; 185.6; 147.6; 258.3; 329.7; 249.3
0.862; 0.754; 0.459; 0.473; 0.493; 0.681; 0.743; 0.469; 0.53
1.8; 20.1; 22.0
361.
2) Find unbiased estimates of the mean and the variance of the populations from which
random samples with the following summaries have been taken,
a $n = 120$, $\sum x = 4368$, $\sum x^2 = 162\,466$
bn Ix = 270 Dx? = 2546
Yx= 1140.7 Lx? = 1278.08
d $n = 15$, $\sum x = 168$, $\sum x^2 = 1913$
3) The concentrations, in mg per litre, of a trace element in 7 randomly chosen samples of
water from a spring were:
240.8 237.3 236.7 236.6 234.2 233.9 232.5.
Determine unbiased estimates of the mean and the variance of the concentration of the
trace element per litre of water from the spring,
4) Cartons of orange are filled by a machine. A sample of 10 cartons selected at random from
the production contained the following quantities of orange (in ml).
201.2 205.0 209.1 202.3 204.6
206.4 210.1 201.9 203.7 207.3
Calculate unbiased estimates of the mean and variance of the population from which this
sample was taken,
5) A manufacturer of self-assembly furniture required bolts of two lengths, 5 cm and 10 cm, in
the ratio 2:1 respectively.
a Find the mean $\mu$ and the variance $\sigma^2$ for the lengths of bolts in this population.
A random sample of three bolts is selected from a large box containing bolts in the required
ratio.
b List all possible samples.
c Find the sampling distribution for the mean $\bar{X}$.
d Hence find $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$.
e Find the sampling distribution for the mode $M$.
f Hence find $E(M)$ and $\mathrm{Var}(M)$.
g Find the bias when $M$ is used as an estimator of the population mode.
6) A biased six-sided die has probability $p$ of landing on a six.
Every day, for a period of 25 days, the die is rolled 10 times and the number of sixes $X$ is
recorded giving rise to a sample $X_1, X_2, \ldots, X_{25}$.
a Write down $E(X)$ in terms of $p$.
b Show that the sample mean $\bar{X}$ is a biased estimator of $p$ and find the bias.
c Suggest a suitable unbiased estimator of $p$.
7) The random variable $X \sim U[-a, a]$.
a Find $E(X)$ and $E(X^2)$.
A random sample $X_1, X_2, X_3$ is taken and the statistic $Y = X_1^2 + X_2^2 + X_3^2$ is calculated.
b Show that $Y$ is an unbiased estimator of $a^2$.
3.3 You need to be able to use the standard error of the mean.
So far you have seen how to find unbiased estimators for $\mu$ and $\sigma^2$. A little thought will show
that, for a sample $X_1, X_2, \ldots, X_n$, $E(X_i) = \mu$ for every value of $i$. So why should you bother to
calculate the mean $\bar{X}$ when any member of the sample has the same property, namely that it provides an
unbiased estimator for the mean $\mu$? To answer this question you need to look back at Examples 5
and 6 where some important properties of the estimator $\bar{X}$ were found:
$E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
Notice that $\bar{X}$ is always an unbiased estimator of $\mu$ but also that, as the sample size $n$ increases,
the variance of this estimator decreases. It is this property of $\mathrm{Var}(\bar{X})$ that makes $\bar{X}$ a useful
estimator of $\mu$ and certainly a better estimator than $X_1$ or any other single member of the sample,
since a smaller variance means that the values of any estimates should be closer to the required
value $\mu$. (This principle is examined further in S4.)
The variance of an estimator is clearly a helpful guide about how useful a particular estimate may
be. In calculations you often want to find the standard deviation of the estimator, and this is
referred to as the standard error of the estimator. So if the estimator is the mean $\bar{X}$ then the
standard error of the mean is $\frac{\sigma}{\sqrt{n}}$. Since, in practice, you often have to use $s^2$ instead of $\sigma^2$, the
standard error of the mean is used to refer to either $\frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$ if $\sigma$ is not known.
Standard error of the mean is $\frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$.
Example 10
(This is an extension of Example 8.)
The table below summarises the number of breakdowns, $x$, on a town's bypass on 30 randomly
chosen days.
Number of breakdowns | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number of days | 3 | 5 | 4 | 3 | 5 | 4 | 4 | 2
a Calculate unbiased estimates of the mean and variance of the number of breakdowns.
Twenty more days were randomly sampled and this sample had a mean of 6.0 breakdowns and $s^2 = 5.0$.
b Treating the 50 results as a single sample, obtain further unbiased estimates of the population
mean and variance.
c Find the standard error of this new estimate of the mean.
d Estimate the size of sample required to achieve a standard error of less than 0.25.
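a By calculator:
$\sum x = 160$ and $\sum x^2 = 990$ (These calculations were completed in Example 8.)
$\hat{\mu} = \bar{x} = \frac{160}{30} = 5.33$ (3 s.f.)
$s_x^2 = \frac{990 - 30\bar{x}^2}{29} = 4.7126\ldots = 4.71$ (3 s.f.)
b New sample: $\bar{y} = 6.0 \Rightarrow \sum y = 20 \times 6.0 = 120$ (First you need to 'unwrap' the formulae for $\bar{y}$ and $s_y^2$ to find $\sum y$ and $\sum y^2$.)
$s_y^2 = \frac{\sum y^2 - 20 \times 6.0^2}{19} = 5.0$
so: $\sum y^2 = 5 \times 19 + 20 \times 36$
i.e. $\sum y^2 = 815$
So the combined sample ($w$) of size 50 has
$\sum w = 160 + 120 = 280$ (Now combine with $\sum x$ and $\sum x^2$. Let the combined variable be $w$.)
$\sum w^2 = 990 + 815 = 1805$
Then the combined estimate of $\mu$ is $\bar{w} = \frac{280}{50} = 5.6$
and the estimate of $\sigma^2$ is $s_w^2 = \frac{1805 - 50 \times 5.6^2}{49}$
i.e. $s_w^2 = 4.8367\ldots = 4.84$ (3 s.f.)
c The best estimate of $\sigma^2$ will be $s_w^2$ since it is based
on a larger sample than $s_x^2$ or $s_y^2$.
So the standard error is $\frac{s_w}{\sqrt{50}} = \sqrt{\frac{4.8367\ldots}{50}} = 0.311$ (3 s.f.) (Use the $\frac{s}{\sqrt{n}}$ formula for standard error.)
d To achieve a standard error $< 0.25$ you require
$\sqrt{\frac{4.8367\ldots}{n}} < 0.25$ (You do not know the value for $\sigma$ so you will have to use your best estimate of it, namely $s_w$.)
$\sqrt{n} > \frac{\sqrt{4.8367\ldots}}{0.25}$
$\sqrt{n} > 8.797\ldots$
$n > 77.38\ldots$
So we need a sample of at least 78.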
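The steps of this example (unwrapping the summary of the second sample, pooling the sums, then working with the standard error) can be sketched as follows. The function names are illustrative only.

```python
import math


def sums_from_estimates(n, mean, s2):
    """Recover (sum y, sum y^2) from n, y-bar and s_y^2."""
    return n * mean, (n - 1) * s2 + n * mean ** 2


def unbiased_estimates(n, sum_x, sum_x2):
    """Return (mean, s^2) from n, sum(x) and sum(x^2)."""
    mean = sum_x / n
    return mean, (sum_x2 - n * mean ** 2) / (n - 1)


def standard_error(s2, n):
    """Standard error of the mean, s / sqrt(n)."""
    return math.sqrt(s2 / n)


def min_sample_size(s2, target_se):
    """Smallest n for which sqrt(s2 / n) is strictly less than target_se."""
    return math.floor(s2 / target_se ** 2) + 1


# First sample: 30 days with sum x = 160 and sum x^2 = 990.
# Second sample: 20 days with mean 6.0 and s^2 = 5.0.
sum_y, sum_y2 = sums_from_estimates(20, 6.0, 5.0)
w_mean, w_s2 = unbiased_estimates(50, 160 + sum_y, 990 + sum_y2)

print("combined mean:", w_mean)
print("combined s^2:", round(w_s2, 2))
print("standard error:", round(standard_error(w_s2, 50), 3))
print("n needed for s.e. < 0.25:", min_sample_size(w_s2, 0.25))
```

Pooling the sums rather than averaging the two values of $s^2$ is what keeps the combined estimate of $\sigma^2$ unbiased.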
John and Mary each independently took a random sample of sixth-formers in their college
and asked them how much money, in pounds, they earned last week. John used his sample
of size 20 to obtain unbiased estimates of the mean and variance of the amount earned by a
sixth-former at their college last week. He obtained values of $\bar{x} = 15.5$ and $s_x^2 = 8.0$.
Mary's sample of size 30 can be summarised as $\sum y = 486$ and $\sum y^2 = 8222$.
a Use Mary's sample to find unbiased estimates of $\mu$ and $\sigma^2$.
b Combine the samples and use all 50 observations to obtain further unbiased estimates of
$\mu$ and $\sigma^2$.
c Find the standard error of the mean for each of these estimates of $\mu$.
d Comment on which estimate of $\mu$ you would prefer to use.
A machine operator checks a random sample of 20 bottles from a production line in order to
estimate the mean volume of bottles (in cm³) from this production run. The 20 values can be
summarised as $\sum x = 1300$ and $\sum x^2 = 84\,685$.
a Use this sample to find unbiased estimates of $\mu$ and $\sigma^2$.
A supervisor knows from experience that the standard deviation of volumes on this process,
$\sigma$, should be 3 cm³ and he wishes to have an estimate of $\mu$ that has a standard error of less
than 0.5 cm³.
b What size sample will he need to achieve this?
The supervisor takes a further sample of size 16 and finds x
c Combine the two samples to obtain a revised estimate of $\mu$.
The heights of certain seedlings after growing for 10 weeks in a greenhouse have a standard
deviation of 2.6 cm. Find the smallest sample that must be taken for the standard error of
the mean to be less than [Link].
The hardness of a plastic compound was determined by measuring the indentation produced
by a heavy pointed device.
The following observations in tenths of a millimetre were obtained:
4.7, 5.2, 5.4, 4.8, 4.5, 4.9, 4.5, 5.1, 5.0, 4.8
a Estimate the mean indentation for this compound.
b Estimate the standard error of the mean.
c Estimate the size of sample required in order that in future the standard error of the mean
should be just less than 0.08.
Prospective army recruits receive a medical test. The probability of each recruit passing the test
is $p$, independent of any other recruit. The medicals are carried out over two days and on the
first day $n$ recruits are seen and on the next day $2n$ are seen. Let $X_1$ be the number of recruits
who pass the test on the first day and let $X_2$ be the number passing on the second day.
a Write down $E(X_1)$, $E(X_2)$, $\mathrm{Var}(X_1)$ and $\mathrm{Var}(X_2)$.
b Show that $\frac{X_1}{n}$ and $\frac{X_2}{2n}$ are both unbiased estimates of $p$ and state, giving a reason, which
you would prefer to use.
c Show that $X = \frac{1}{2}\left(\frac{X_1}{n} + \frac{X_2}{2n}\right)$ is an unbiased estimator of $p$.
d Show that $Y = \frac{X_1 + X_2}{3n}$ is an unbiased estimator of $p$.
e Which of the statistics $\frac{X_1}{n}$, $\frac{X_2}{2n}$, $X$ or $Y$ is the best estimator of $p$?
The statistic $T = \frac{2X_1 + X_2}{3n}$ is proposed as an estimator of $p$.
f Find the bias.
‘Iwo independent random samples Xy, X2y +, Xyand ¥y, Yor ony You ate taken from a
population with mean $\mu$ and variance $\sigma^2$. The unbiased estimators $\bar{X}$ and $\bar{Y}$ of $\mu$ are
calculated. A new unbiased estimator $T$ of $\mu$ is sought of the form $T = r\bar{X} + s\bar{Y}$.
a Show that, since $T$ is unbiased, $r + s = 1$.
b By writing $T = r\bar{X} + (1 - r)\bar{Y}$, show that
vanT) = 0? [f+ LE)
¢ Show that the minimum variance of Tis when r= 5.
d Find the best (in the sense of minimum variance) estimator of $\mu$ of the form $r\bar{X} + s\bar{Y}$.
7) A large bag of counters has 40% with the number 0 on, 40% with the number 2 on and 20%
with the number 1.
a Find the mean $\mu$ and the variance $\sigma^2$ for this population of counters.
A random sample of size 3 is taken from the bag.
b List all possible samples.
c Find the sampling distribution for the mean $\bar{X}$.
d Find $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$.
e Find the sampling distribution for the median $N$.
f Hence find $E(N)$ and $\mathrm{Var}(N)$.
g Show that $N$ is an unbiased estimator of $\mu$.
h Explain which estimator, $\bar{X}$ or $N$, you would choose as an estimator of $\mu$.
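3.4 You need to be able to use the Central Limit Theorem to find an approximation
to the sampling distribution of $\bar{X}$.
A random sample $X_1, X_2, \ldots, X_n$ is taken from a population where $X \sim N(\mu, \sigma^2)$.
Show that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
$X_1 + X_2 \sim N(2\mu, 2\sigma^2)$ (Use the results from Chapter 1.)
$\sum X = X_1 + \ldots + X_n \sim N(n\mu, n\sigma^2)$ (Extend the above results.)
$\bar{X} = \frac{1}{n}\sum X$ and we have seen that
$E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ (See Example 5 and Example 6.)
So, since the population is normal we know that $\bar{X}$ will
be normal too and therefore $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.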
Example 11 has shown that if the distribution of the population is known to be normal then
the sampling distribution of $\bar{X}$ is normal too. However, in many cases the distribution of the
population is not known or it is clearly not normal so what will the distribution of $\bar{X}$ be when
the population from which the sample was taken does not have a normal distribution? The
answer, in general, is that it depends upon the distribution of the population and in most cases
there is no easy way of describing the distribution of $\bar{X}$. However, there is an important result
that enables you to say something about the distribution of $\bar{X}$ when the sample size $n$ is large.
This result is known as the Central Limit Theorem and it tells you that when $n$ is large $\bar{X}$ is
approximately normally distributed, whether or not the population is normally distributed.
The Central Limit Theorem says that if $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a
population with mean $\mu$ and variance $\sigma^2$ then $\bar{X}$ is approximately $\sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
This theorem is very important in statistics and is one of the main reasons why the normal
distribution is so useful. The theorem is an approximation but the approximation improves as
$n$, the sample size, increases; this is another reason (remember $\mathrm{Var}(\bar{X})$ gets smaller as $n$ increases)
why a large sample is often desirable.
A proof of this theorem is beyond the scope of this course, but the following example should
help you to see why it might be true.
A table of random digits is designed so that the value, $R$, of a digit comes from a discrete uniform
distribution over the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
a Find $\mu = E(R)$.
b Using the first row of the table on page 139 take a sample of size 10.
Calculate the sample mean.
a By symmetry $E(R) = 4.5$. (Use $\frac{1}{2}(0 + 9)$.)
b The first 10 random digits are:
8, 6, 1, 3, 8, 4, 1, 0, 0, 7
$\bar{x} = \frac{38}{10} = 3.8$
Notice that the sample has some high (e.g. 8) and some low (e.g. 0) digits but that the high and low
values tend to cancel each other out so that the mean value for the sample is close to the mean for
the population as a whole. It is therefore much more unlikely that you would get a mean $\bar{x}$ of 0 or 8 than a
value close to $\mu$. It is this 'cancelling out' effect of taking a mean that might lead you to expect the distribution
of $\bar{X}$ to peak close to $\mu$ and tail off at each end.
It is a worthwhile experiment to repeat this sampling of random numbers and obtain a large number
of observations of $\bar{X}$. A histogram of these $\bar{x}$ values can be plotted and a shape approximating to a
normal distribution should result. This can be done on a calculator or a computer.
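One way to carry out this experiment on a computer is sketched below. It uses Python's pseudo-random number generator in place of the printed table of random digits, which is an assumption of this sketch, and it prints a rough text histogram rather than a plotted one.

```python
import random
import statistics
from collections import Counter


def sample_means(num_samples, sample_size, seed=0):
    """Means of repeated samples of random digits 0-9 (pseudo-random)."""
    rng = random.Random(seed)
    return [
        sum(rng.randrange(10) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


def text_histogram(values, bar_unit=25):
    """Print one row of '#' marks per unit-wide class of sample means."""
    counts = Counter(int(v) for v in values)  # class k holds k <= mean < k + 1
    for k in range(10):
        print(f"{k}-{k + 1}: " + "#" * (counts.get(k, 0) // bar_unit))


means = sample_means(num_samples=5000, sample_size=10)
text_histogram(means)
print("mean of the sample means:", round(statistics.fmean(means), 3))
print("variance of the sample means:", round(statistics.pvariance(means), 3))
```

For a single digit the mean is 4.5 and the variance is $\frac{10^2 - 1}{12} = 8.25$, so the sample means should centre near 4.5 with variance near $\frac{8.25}{10} = 0.825$, and the histogram should peak in the middle classes and tail off at each end.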
A sample of size 9 is taken from a population with distribution $N(10, 2^2)$. Find the probability
that the sample mean $\bar{X}$ is more than 11.
The population is normal, so $\bar{X}$ will have a normal distribution
despite the small size of the sample.
(The mean of $\bar{X}$ is $\mu$ (= 10) and the variance of $\bar{X}$ is $\frac{\sigma^2}{n}$.)
$\mathrm{Var}(\bar{X}) = \frac{2^2}{9} = \frac{4}{9} = \left(\frac{2}{3}\right)^2$
So $\bar{X} \sim N\left(10, \left(\frac{2}{3}\right)^2\right)$.
The mean of $\bar{X}$ is 10 and the standard deviation is $\frac{2}{3}$ so:
$P(\bar{X} > 11) = P\left(Z > \frac{11 - 10}{\frac{2}{3}}\right)$
$= P(Z > 1.5)$
$= 1 - 0.9332$
$= 0.0668$
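The table look-up in this example can be checked with the standard library's `statistics.NormalDist`. This is a sketch with an illustrative function name; it assumes, as the example does, that $\sigma$ is known.

```python
import math
from statistics import NormalDist


def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(X-bar > threshold) when X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(threshold)


# Sample of size 9 from N(10, 2^2); probability that the mean is more than 11.
p = prob_mean_exceeds(threshold=11, mu=10, sigma=2, n=9)
print(round(p, 4))
```

The same function covers the exercise questions on normal populations by changing the four arguments.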
A cubical die is relabelled so that there are three faces marked 1, two faces marked 3 and one
marked 6. The die is rolled 40 times and the mean of the 40 scores is recorded.
Find an approximation for the probability that the mean is over 3.
Let the random variable $X$ = the score on a single roll; then
the distribution of $X$ is
$x$ | 1 | 3 | 6
$P(X = x)$ | $\frac{1}{2}$ | $\frac{1}{3}$ | $\frac{1}{6}$
(Use the techniques for finding means and variances of discrete distributions you met in book S1.)
So $\mu = E(X) = \sum x P(X = x) = 1 \times \frac{1}{2} + 3 \times \frac{1}{3} + 6 \times \frac{1}{6} = 2.5$
and $\sigma^2 = \mathrm{Var}(X) = \sum x^2 P(X = x) - \mu^2$
$= 1 \times \frac{1}{2} + 9 \times \frac{1}{3} + 36 \times \frac{1}{6} - (2.5)^2$
$= 3.25$ or $\frac{13}{4}$
(The population is clearly not normally distributed but the sample size ($n = 40$) is quite large so the Central Limit Theorem can be used.)
Now by the Central Limit Theorem $\bar{X}$ is approximately $\sim N\left(2.5, \frac{13}{160}\right)$
So $P(\bar{X} > 3) = P\left(Z > \frac{3 - 2.5}{\sqrt{\frac{13}{160}}}\right)$
$= P(Z > 1.75)$
$= 1 - 0.9599$
$= 0.0401$ or 0.040 (3 d.p.)
It is worth pointing out that although the $X_i$ and therefore $\bar{X}$ are discrete distributions,
whereas the normal distribution is a continuous distribution, a continuity correction is not
appropriate in this example. However, if you had been asked to find a probability for $\sum X$
such as $P(\sum X > 120)$, then a continuity correction as described in book [Link] be applied.
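The calculation in this example can be sketched in code: first the exact mean and variance of a single score, then the Central Limit Theorem approximation for the mean of 40 scores. The variable names are illustrative.

```python
import math
from fractions import Fraction
from statistics import NormalDist

# Relabelled die: three faces show 1, two faces show 3, one face shows 6.
score_probs = {1: Fraction(3, 6), 3: Fraction(2, 6), 6: Fraction(1, 6)}

mu = sum(x * p for x, p in score_probs.items())
var = sum(x * x * p for x, p in score_probs.items()) - mu ** 2

n = 40
# By the Central Limit Theorem X-bar is approximately N(mu, var / n).
z = (3 - float(mu)) / math.sqrt(float(var) / n)
p_over_3 = 1 - NormalDist().cdf(z)

print("mu =", mu, " sigma^2 =", var)
print("z =", round(z, 3))
print("P(mean > 3) is approximately", round(p_over_3, 3))
```

The code keeps the unrounded value of $z$, so its answer differs slightly in the later decimal places from the table-based value, which rounds $z$ to 1.75 before the look-up.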
A sample of size 6 is taken from a normal distribution $N(10, 2^2)$.
What is the probability that the sample mean exceeds 12?
A machine fills cartons in such a way that the amount of drink in each carton is distributed
normally with a mean of 40 cm³ and a standard deviation of 1.5 cm³.
a A sample of four cartons is examined.
Find the probability that the mean amount of drink is more than 40.5 cm³.
b A sample of 49 cartons is examined.
Find the probability that the mean amount of drink is more than 40.5 cm³ on this
occasion.
The lengths of bolts produced by a machine have an unknown distribution with mean
3.03 cm and standard deviation 0.20 cm.
A sample of 100 bolts is taken.
a Estimate the probability that the mean length of this sample is less than [Link].
b What size sample is required if the probability that the mean is less than 3 cm is to be less
than 1%?
Forty observations are taken from a population with distribution given by the probability
density function
w= [ , 0<2<3,
0, otherwise.
a Find the mean and variance of this population.
b Find an estimate of the probability that the mean of the 40 observations is more
than 2.10,
A fair die is rolled 35 times.
a Find the approximate probability that the mean of the 35 scores is more than 4.
b Find the approximate probability that the total of the 35 scores is less than 100.
The 25 children in a class each roll a fair die 30 times and record the number of sixes they
obtain.
Find an estimate of the probability that the mean number of sixes recorded for the class is
less than 4.5.
The error in mm made in measuring the length of a table has a uniform distribution over
the range [−5, 5]. The table is measured 20 times.
Find an estimate of the probability that the mean error is less than −1 mm.
8) Telephone calls arrive at an exchange at an average rate of two per minute. Over a period of
30 days a telephonist records the number of calls that arrive in the five-minute period before
her break.
a Find an approximation for the probability that the total number of calls recorded is more
than 350.
b Estimate the probability that the mean number of calls in the five-minute interval is less
than 9.0.
9) How many times must a fair die be rolled in order for there to be a less than 1% chance that
the mean of all the scores differs from 3.5 by more than 0.1?
10) The heights of women in a certain area have a mean of 175 cm and a standard deviation
of 2.5 cm. The heights of men in the same area have a mean of 177 cm and a standard
deviation of 2.0 cm. Samples of 40 women and 50 men are taken and their heights are
recorded. Find the probability that the mean height of the men is more than [Link] greater
than the mean height of the women.
11) A computer, in adding numbers, rounds each number off to the nearest integer. All the
rounding errors are independent and come from a uniform distribution over the range
[−0.5, 0.5].
a Given that 1000 numbers are added, find the probability that the total error is greater
than +10.
b Find how many numbers can be added together so that the probability that the
magnitude of the total error is less than 10 is at least 0.95.
12) An electrical company repairs very large numbers of television sets and wishes to estimate
the mean time taken to repair a particular fault. It is known from previous research that the
standard deviation of the time taken to repair this particular fault is 2.5 minutes.
The manager wishes to ensure that the probability that the estimate differs from the true
mean by less than 30 seconds is 0.95.
Find how large a sample is required.
3.5 You need to be able to calculate confidence intervals for a population parameter.
You are now in a position to complete the estimation of $\mu$, the population mean. In the previous
sections we considered taking a random sample of $n$ students and measuring their heights. Now
we shall assume that the standard deviation of heights of students, i.e. $\sigma$, is known but the mean
$\mu$ (in metres) is not known and this is the parameter we seek to estimate. Suppose the sample
gave an estimate $\hat{\mu} = 1.73$. What can you say about $\mu$?
You know that an estimate of $\mu$ is $\hat{\mu} = 1.73$, but it would be more helpful if you could give a
range of values for $\mu$ and also provide some measure of how reliable this range of values is.
People sometimes use phrases like 'I'm 90% (or 99% or 95%) certain that I left the keys on the
kitchen table'. In statistics we use the properties of the standard normal distribution, $N(0, 1^2)$, to
formalise this idea, and arrive at a range of values for $\mu$ about which we are, say, 95% confident.
Example 15
Show that a 95% confidence interval for $\mu$, based on a sample of size $n$, is given by
$\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$
$\bar{X}$ is approximately $\sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (You know by the Central Limit Theorem that $\bar{X}$ will be approximately normal.)
and therefore
$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1^2)$
Using the table on page 130 you can see that for
the $N(0, 1^2)$ distribution
$P(Z > 1.9600) = P(Z < -1.9600) = 0.025$
and so 95% of the distribution is between
$-1.9600$ and $1.9600$.
So P(-196 p>R- 196%
F- 196% 2 wth of 07 Cie 2 x 25766 x 25 AEE ENS
12.879. the definition for the width.
soyourequre 15> 2B72. :
ie. n> TBN9...
4.
soyouneed onEstimation, confidence Intervals and tests
A width of 1.5 ⇒ 1.5 = 2 × z × σ/√n
so z = 1.5
From the table on page 129 you find that
P(Z < 1.5) = 0.9332
and so P(Z > 1.5) = P(Z < −1.5)
= 1 − 0.9332
= 0.0668
So the confidence level is 100 × (1 − 2 × 0.0668) = 86.64%.
(The confidence level is given by the area between z = ±1.5.)
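The calculations in these examples can be checked numerically. The following Python sketch is one possible way to do so, using the standard library's `statistics.NormalDist` in place of the printed tables. The function names are hypothetical. The inputs σ = 2.5 and a standard error of 0.5 are read off from the arithmetic of the worked example (12.879 = 2 × 2.5758 × 2.5, and z = 1.5 for a width of 1.5) and should be treated as assumptions.

```python
import math
from statistics import NormalDist


def confidence_interval(xbar, sigma, n, level=0.95):
    """Confidence limits xbar -/+ z*sigma/sqrt(n) for mu when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def min_sample_size(sigma, max_width, level=0.95):
    """Smallest n for which the width 2*z*sigma/sqrt(n) is less than max_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_exact = (2 * z * sigma / max_width) ** 2
    return math.floor(n_exact) + 1  # first whole number strictly above n_exact


def confidence_level(width, standard_error):
    """Confidence level of an interval of the given total width."""
    z = width / (2 * standard_error)
    return 2 * NormalDist().cdf(z) - 1  # area between -z and +z


print(min_sample_size(sigma=2.5, max_width=1.5, level=0.99))        # 74
print(round(confidence_level(width=1.5, standard_error=0.5), 4))    # 0.8664
```

Because `NormalDist` works to more decimal places than the tables, intermediate values may differ slightly from the printed ones, but the sample size and the confidence level agree with the worked answers.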
1 A random sample of size 9 is taken from a normal distribution with variance 36.
The sample mean is 128.
a Find a 95% confidence interval for the mean μ of the distribution.
b Find a 99% confidence interval for the mean μ of the distribution.
2 A random sample of size 25 is taken from a normal distribution with standard deviation 4.
The sample mean is 85.
a Find a 90% confidence interval for the mean μ of the distribution.
b Find a 95% confidence interval for the mean μ of the distribution.
3 A normal distribution has mean μ and variance 4.41. A random sample has the following values:
23.1, 21.8, 24.6, 22.5
Use this sample to find 98% confidence limits for the mean μ.
4 A normal distribution has standard deviation 15. Estimate the sample size required if the
following confidence intervals for the mean should have width of less than 2.
a 90% b 95% c 99%
5 Repeat Question 4 for a normal distribution with standard deviation 2.4 and a desired width
of less than 0.8.
6 An experienced poultry farmer knows that the mean weight μ kg for a large population of
chickens will vary from season to season but the standard deviation of the weights should
remain at 0.70 kg. A random sample of 100 chickens is taken from the population and
the weight x kg of each chicken in the sample is recorded, giving Σx = 190.2. Find a 95%
confidence interval for μ.
7 A railway watchdog is studying the number of seconds that express trains are late in arriving.
Previous surveys have shown that the standard deviation is 50. A random sample of 200
trains was selected and gave rise to a mean of 310 seconds late. Find a 90% confidence
interval for the mean number of seconds that express trains are late.
8 An investigation was carried out into the total distance travelled by lorries in current use.
The standard deviation can be assumed to be 15 000 km. A random sample of 80 lorries were
stopped and their mean distance travelled was found to be 75 872 km.
Find a 90% confidence interval for the mean distance travelled by lorries in current use.
9 It is known that each year the standard deviation of the marks in a certain examination is
13.5 but the mean mark μ will fluctuate. An examiner wishes to estimate the mean mark
of all the candidates on the examination but he only has the marks of a sample of 250
candidates which give a sample mean of 68.4.
a What assumption about these candidates must the examiner make in order to use this
sample mean to calculate a confidence interval for μ?
b Assuming that the above assumption is justified, calculate a 95% confidence interval for μ.
Later the examiner discovers that the actual value of μ was 65.3.
c What conclusions might the examiner draw about his sample?
10 The number of hours for which an electronic device can retain information has a uniform
distribution over the range [μ − 10, μ + 10] but the value of μ is not known.
a Show that the variance of the number of hours the device can retain the information for
is 100/3.
A random sample of 120 devices were tested and the mean number of hours they retained
information for was 78.7.
b Find a 95% confidence interval for μ.
11 A statistics student calculated a 95% and a 99% confidence interval for the mean μ of a certain
population but failed to label them. The two intervals were (22.7, 27.3) and (23.2, 26.8).
a State, with a reason, which interval is the 95% one.
b Estimate the standard error of the mean in this case.
c What was the student's unbiased estimate of the mean μ in this case?
12 A 95% confidence interval for a mean μ is 85.3 ± 2.35. Find the following confidence
intervals for μ.
a 90% b 98% c 99%
13 The managing director of a certain firm has commissioned a survey to estimate the mean
expenditure of customers on electrical appliances. A random sample of 100 people were
questioned and the research team presented the managing director with a 95% confidence
interval of (£128.14, £141.86).
The director says that this interval is too wide and wants a confidence interval of total width
£10.
a Using the same value of x̄, find the confidence limits in this case.
b Find the level of confidence for the interval in part a.
The managing director is still not happy and now wishes to know how large a sample would
be required to obtain a 95% confidence interval of total width no more than £10.
c Find the smallest size of sample that will satisfy this request.
14 A plant produces steel sheets whose weights are known to be normally distributed with a
standard deviation of 2.4 kg. A random sample of 36 sheets had a mean weight of 31.4 kg.
Find 99% confidence limits for the population mean.
15 A machine is regulated to dispense liquid into cartons in such a way that the amount of
liquid dispensed on each occasion is normally distributed with a standard deviation of 20 ml.
Find 99% confidence limits for the mean amount of liquid dispensed if a random sample
of 40 cartons had an average content of 266 ml.
16 a The error made when a certain instrument is used to measure the body length of a
butterfly of a particular species is known to be normally distributed with mean 0 and
standard deviation 1 mm. Calculate, to 3 decimal places, the probability that the error
made when the instrument is used once is numerically less than 0.4 mm.
b Given that the body length of a butterfly is measured 9 times with the instrument,
calculate, to 3 decimal places, the probability that the mean of the 9 readings will be
within 0.5 mm of the true length.
c Given that the mean of the 9 readings was 22.53 mm, determine a 98% confidence
interval for the true body length of the butterfly.
3.6 You need to be able to test hypotheses about the mean of a normal distribution.
In book S2 you met the idea of a hypothesis test and a definition of it is given below.
■ A hypothesis test about a population parameter θ tests a null hypothesis H₀, specifying a
particular value for θ, against an alternative hypothesis H₁, which will indicate whether the
test is one-tailed or two-tailed.
In book S2 the parameters considered were the proportion p of a binomial distribution and
the mean λ or μ of a Poisson distribution. In this section you will learn how to extend the idea
to tests for μ, the mean of a normal distribution. The process is similar to that of a trial in a
courtroom. The null hypothesis is on trial, evidence is presented and the jury has to make a
decision 'on the balance of probability'.
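The mechanics of such a test can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's own procedure: it assumes the population standard deviation σ is known, standardises the sample mean under H₀, and compares the result with the critical value of N(0, 1²). The function name `z_test_mean`, its `tail` argument and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="lower"):
    """Test H0: mu = mu0 for a normal mean with known sigma.

    tail is "lower" (H1: mu < mu0), "upper" (H1: mu > mu0) or "two".
    Returns the test statistic z and whether H0 is rejected at level alpha.
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # standardised sample mean
    std = NormalDist()
    if tail == "lower":
        reject = z < std.inv_cdf(alpha)
    elif tail == "upper":
        reject = z > std.inv_cdf(1 - alpha)
    else:  # two-tailed: alpha is split equally between the two tails
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    return z, reject


# Hypothetical figures, for illustration only
z, reject = z_test_mean(xbar=49.2, mu0=50, sigma=2, n=25, alpha=0.05, tail="lower")
print(round(z, 2), reject)  # -2.0 True
```

Here z = −2.0 lies below the 5% one-tailed critical value of about −1.645, so in this made-up case H₀ would be rejected.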
Example
A certain company sells fruit juice in cartons. The amount of juice in a carton has a normal
distribution with a standard deviation of 3 ml.
The company claims that the mean amount of juice per carton, μ, is 60 ml. A trading inspector
has received complaints that the company is overstating the mean amount of juice per carton
and he wishes to investigate this complaint. The trading inspector took a random sample of
16 cartons which gave a mean of 59.1 ml.
Using a 5% level of significance, and stating your hypotheses clearly, test whether or not there is
evidence to uphold this complaint.