THEORY OF
ESTIMATION
IN THIS CHAPTER YOU WILL STUDY
Estimation, estimator, characteristics of good
estimators, methods of estimation (point
estimation & interval estimation)
ESTIMATION:
Introduction:
Estimation means guessing or predicting the value
of something that is not known exactly. It happens
in our daily life, knowingly or unknowingly. A
housewife estimates the amount of ingredients
before cooking each day, a farmer estimates the
amount of seed and fertilizer needed before
planting crops, and a scientist forms a hypothesis
about a phenomenon before testing it. From the
daily expenditure of a family to national policy,
estimation has become an inseparable part of human
activity. Here we study systematic methods of
estimation.
DEFINITION:
The process of guessing or predicting unknown
population parameters with the help of sample
statistics or past experience is known as
estimation. The sample statistics used to predict
the true but unknown population parameters are
known as estimators. For example, the sample mean,
sample proportion and sample variance are
estimators of the population mean, population
proportion and population variance respectively.
CHARACTERISTICS OF GOOD ESTIMATORS:
A good estimator is one whose value is as close as
possible to the true value of its population
parameter. A good estimator should have the
following properties:
(a)UNBIASEDNESS:
If the expected value of an estimator is equal to
its corresponding population parameter, then it is
said to be an unbiased estimator of that population
parameter, i.e. $E(\hat{\theta}) = \theta$, where
$\hat{\theta}$ represents the estimator and
$\theta$ represents its population parameter.
For example, the mean of the sampling distribution
of sample means equals the population mean, i.e.
$\mu_{\bar{x}} = \mu$, so the sample mean is an
unbiased estimator of the population mean.
(b)CONSISTENCY:
If the value of an estimator comes closer and
closer to the value of its population parameter as
the sample size is gradually increased, then the
estimator is said to be a consistent estimator of
that population parameter, i.e.
$\lim_{n \to \infty} \hat{\theta}_n = \theta$
(in the sense of convergence in probability).
For example, the sample mean and sample variance
are consistent estimators of their corresponding
population parameters.
(c) EFFICIENCY:
If an estimator has a smaller variance than another
estimator of the same parameter, then it is said to
be more efficient than the other. In other words,
among two or more unbiased estimators, the one with
the least variability is said to be the most
efficient.
(d) SUFFICIENCY:
If an estimator uses all the information about the
population parameter that is contained in the
sample, then it is said to be a sufficient
estimator. For example, for a normal population the
sample mean $\bar{x}$ and sample variance $s^2$ are
sufficient estimators of their population
parameters.
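The unbiasedness property can be checked exactly on a very small case. The sketch below assumes a hypothetical toy population {1, 2, 3} (not taken from the text) and lists every possible sample of size 2 drawn with replacement. It then averages the sample mean and the sample variance over all samples, using exact fractions so that no rounding is involved. The helper names `mean` and `variance` are illustrative only.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical toy population, not from the text
n = 2                    # sample size

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    values = list(values)
    return Fraction(sum(values)) / len(values)

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / divisor

mu = mean(population)                             # population mean
sigma_sq = variance(population, len(population))  # population variance

# Every possible sample of size n drawn with replacement (3**2 = 9 samples).
samples = list(product(population, repeat=n))

mean_of_sample_means = mean(mean(s) for s in samples)
mean_s2_unbiased = mean(variance(s, n - 1) for s in samples)  # divisor n - 1
mean_s2_biased = mean(variance(s, n) for s in samples)        # divisor n

print("mu =", mu, " average of sample means =", mean_of_sample_means)
print("sigma^2 =", sigma_sq, " average of s^2 (n - 1) =", mean_s2_unbiased)
print("average of s^2 with divisor n =", mean_s2_biased)
```

In this toy case the average of the sample means equals the population mean, and the sample variance with divisor n - 1 averages to the population variance, while the divisor n version averages to a smaller value.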
TYPES OF ESTIMATION
There are two types of estimation: point
estimation and interval estimation.
POINT ESTIMATION
The process of estimating the true but unknown
value of a population parameter by a single
numerical value computed from sample statistics
is called point estimation. For example, the
statement that 50 children are born daily in a
city hospital is a point estimate. Generally,
the sample mean $\bar{x}$ is a point estimator
of the population mean $\mu$, and the sample
variance $s^2$ is a point estimator of the
population variance $\sigma^2$.
INTERVAL ESTIMATION
The process of estimating the value of an
unknown population parameter by a range of
values (an interval) computed from a sample
statistic is known as interval estimation. For
example, the statement that the average life
expectancy of Nepalese people is between 65
and 75 years is an interval estimate.
A point estimate is almost never exactly equal
to the true value, so interval estimation plays
a vital role in allowing for this error. For
instance, if someone guesses that the average
life of Nepalese people is 70 years, the guess
is wrong even if the true value is 69 or 71.
But if the guess is 65 to 75 years, it is
almost certainly true. So some gap is left
below and above the sample statistic to make
the estimate more reliable, and the size of
that gap is determined by a scientific
technique. In this way an interval is formed
within which the true value is expected to lie.
Such an interval is called a confidence
interval, and its end points are called
confidence limits or fiducial limits.
HOW CONFIDENCE INTERVAL IS FORMED?
For the mean,
$Z = \frac{\text{Error}}{\text{Standard error}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
So, Error = Z × standard error (S.E.). This
error is subtracted from the sample value to
get the lower limit of the confidence interval
and added to get the upper limit. In this way
the confidence interval is formed.
Hence;
Confidence interval = Sample value ± Error
= Sample value ± Z × Standard error (S.E.)
HOW Z IS CHOSEN?
To obtain Z, a probability is required. The
area under the probability density curve
represents probability. The probability that
the interval contains the true value is called
the confidence level and is denoted by
$1 - \alpha$. The corresponding value of Z is
often written as $Z_{\alpha}$. It is obtained
from the standard normal table, so it is also
called the tabulated value of Z and written as
$Z_{tab}$.
The confidence level $1 - \alpha$ is chosen
according to the nature of the problem. It is
usually between 90% and 99%. Most often it is
given in the question; if it is not given, we
take it to be 95%. The probability may be one
sided or two sided. If it is two sided,
$Z_{tab} = Z_{\alpha/2}$, and if it is one
sided, $Z_{tab} = Z_{\alpha}$.
Hence, we write;
Confidence interval (C.I.)
= Sample value ± $Z_{tab}$ × S.E. of sample value.
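Instead of reading the table, the tabulated value of Z can be computed from the inverse of the standard normal distribution function. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library. The function names `z_tab` and `confidence_interval` are illustrative assumptions, not notation from the text.

```python
from statistics import NormalDist

def z_tab(confidence, two_sided=True):
    """Critical value of Z for a given confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha   # area left in one tail
    return NormalDist().inv_cdf(1.0 - tail)

def confidence_interval(sample_value, critical_value, standard_error):
    """Sample value -/+ critical value x standard error."""
    error = critical_value * standard_error
    return sample_value - error, sample_value + error

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: two sided Z = {z_tab(level):.3f}, "
          f"one sided Z = {z_tab(level, two_sided=False):.3f}")
```

For the usual confidence levels this gives two sided values close to 1.645, 1.96 and 2.576, the figures normally read from the table.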
WHAT IS 𝜶?
Definition:
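The probability that the true value of the population
parameter does not lie within the confidence interval is
called the level of significance and is denoted by
$\alpha$. The region in which the true value is not
expected to lie is called the critical region or
rejection region.
[Figure: curve marking the critical values $-Z_{\alpha/2}$ and $+Z_{\alpha/2}$]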
CRITICAL VALUE OR CUT OFF POINTS:
The value of Z which separates the confidence
region from the rejection region is called the
critical value. In the above diagram,
$\pm Z_{\alpha/2}$ are the critical values.
The lines which separate the confidence region and
the rejection region are called critical lines or
cut off lines.
REMARKS:
For large samples ($n \geq 30$), the critical
value is $Z_{\alpha}$ for a one tailed (one sided)
interval or $Z_{\alpha/2}$ for a two tailed (two
sided) interval. For small samples we need to use
t instead of Z. The value of t depends on the
sample size n, so the degrees of freedom must not
be ignored when finding the value of t. In this
case, the critical value is
$t_{tab} = t_{\alpha,\,n-1}$ (one tailed) or
$t_{\alpha/2,\,n-1}$ (two tailed).
CONFIDENCE INTERVAL FOR DIFFERENT
STATISTICS:
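| Statistic | C.I. |
|---|---|
| $\bar{x}$ | $\bar{x} \pm Z_{tab}\,\text{S.E.}(\bar{x})$ |
| $\bar{x}_1 - \bar{x}_2$ | $\lvert \bar{x}_1 - \bar{x}_2 \rvert \pm Z_{tab}\,\text{S.E.}(\bar{x}_1 - \bar{x}_2)$ |
| $\hat{p}$ | $\hat{p} \pm Z_{tab}\,\text{S.E.}(\hat{p})$ |
| $\hat{p}_1 - \hat{p}_2$ | $\lvert \hat{p}_1 - \hat{p}_2 \rvert \pm Z_{tab}\,\text{S.E.}(\hat{p}_1 - \hat{p}_2)$ |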
REMARKS: Remember that for small samples
($n < 30$), $t_{tab}$ is used instead of $Z_{tab}$.
STANDARD ERROR OF DIFFERENT STATISTICS:
For an infinite population, or for sampling with
replacement, with large samples ($n \geq 30$):
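| S.N. | Statistic | S.E. | Remarks |
|---|---|---|---|
| 1 | $\bar{x}$ | $\frac{\sigma}{\sqrt{n}}$ | |
| 2 | $\bar{x}$ | $\frac{s}{\sqrt{n}}$ | If $\sigma$ is not given. |
| 3 | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | |
| 4 | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | If $\sigma_1$ and $\sigma_2$ are not given but assumed to be different. |
| 5 | $\bar{x}_1 - \bar{x}_2$ | $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ | If $\sigma_1$ and $\sigma_2$ are not given but assumed to be equal. |
| 6 | $\hat{p}$ | $\sqrt{\frac{PQ}{n}}$ | |
| 7 | $\hat{p}$ | $\sqrt{\frac{\hat{p}\hat{q}}{n}}$ | If P is not given. |
| 8 | $\hat{p}_1 - \hat{p}_2$ | $\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$ | |
| 9 | $\hat{p}_1 - \hat{p}_2$ | $\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ | If $P_1$ and $P_2$ are not given and assumed to be unequal. |
| 10 | $\hat{p}_1 - \hat{p}_2$ | $\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$ or $\frac{x_1 + x_2}{n_1 + n_2}$ | If $P_1$ and $P_2$ are not given and assumed to be equal. |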
WORKED OUT EXAMPLES:
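1. In 39 soil samples tested for trace elements,
the average amount of copper was found to
be 22 mg, with a standard deviation of 4 mg.
Find (i) 90%, (ii) 95% and (iii) 99% confidence
intervals for the true mean copper content of the
soil from which these samples were taken.
Solution;
Here, $n = 39$, $\bar{x} = 22$ mg, $s = 4$ mg
(i) 90% confidence interval
$(1 - \alpha)100\% = 90\%$
$\therefore \alpha = 0.1$
Now, $Z_{tab} = |Z_{\alpha/2}| = |Z_{0.05}| = 1.645$
$\text{S.E.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{39}} = 0.6405$
We know,
C.I.(L, U) = $\bar{x} \pm Z_{tab}\,\text{S.E.}(\bar{x})$
= 22 ± 1.645 × 0.6405
= 22 ± 1.0536
= (20.9464, 23.0536)
(ii) 95% confidence interval
Similarly, $\alpha = 0.05$ and $Z_{tab} = |Z_{0.025}| = 1.96$, so
C.I.(L, U) = 22 ± 1.96 × 0.6405
= 22 ± 1.2554
= (20.7446, 23.2554)
(iii) 99% confidence interval
Similarly, $\alpha = 0.01$ and $Z_{tab} = |Z_{0.005}| = 2.576$, so
C.I.(L, U) = 22 ± 2.576 × 0.6405
= 22 ± 1.6499
= (20.3501, 23.6499)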
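The same calculation can be repeated for all three confidence levels asked for in the question. The sketch below is one way to do it, with Z computed from `statistics.NormalDist` rather than read from the table, so the last decimal place may differ slightly from a hand calculation. The function name `mean_ci` is an illustrative assumption.

```python
import math
from statistics import NormalDist

n, x_bar, s = 39, 22.0, 4.0      # data from the copper example
se = s / math.sqrt(n)            # S.E. of the sample mean

def mean_ci(confidence):
    """Two sided large sample confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z * se, x_bar + z * se

intervals = {level: mean_ci(level) for level in (0.90, 0.95, 0.99)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} C.I. = ({low:.3f}, {high:.3f})")
```

The interval becomes wider as the confidence level rises, roughly (20.95, 23.05) at 90%, (20.74, 23.26) at 95% and (20.35, 23.65) at 99%.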
2. The mean weight loss of n=16 grinding balls
after a certain length of time in mill slurry is
3.42 grams with a standard deviation of 0.68
grams. Construct a 99% confidence interval
for the true mean weight loss of such grinding
balls under the stated conditions.
Solution;
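Here $n = 16$, $\bar{x} = 3.42$ grams, $s = 0.68$ grams
$(1 - \alpha)100\% = 99\%$ $\therefore \alpha = 0.01$
Since the sample is small ($n < 30$), t is used.
Now, $t_{tab} = |t_{\alpha/2,\,n-1}| = |t_{0.005,\,15}| = 2.947$
$\text{S.E.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.68}{4} = 0.17$
We have,
C.I.(L, U) = $\bar{x} \pm t_{tab}\,\text{S.E.}(\bar{x})$
= 3.42 ± 2.947 × 0.17
= 3.42 ± 0.501
= (2.919, 3.921)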
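A small sample interval follows the same pattern with t in place of Z. The Python standard library has no t distribution, so the sketch below simply takes the tabulated value 2.947 used in the solution as an input. The function name `t_interval` is an illustrative assumption.

```python
import math

def t_interval(x_bar, s, n, t_tab):
    """Small sample C.I. for the mean; t_tab is read from the t table."""
    se = s / math.sqrt(n)          # S.E. of the sample mean
    error = t_tab * se
    return x_bar - error, x_bar + error

# Grinding ball data; t_tab = t_{0.005, 15} = 2.947 as quoted in the solution.
low, high = t_interval(x_bar=3.42, s=0.68, n=16, t_tab=2.947)
print(f"99% C.I. = ({low:.3f}, {high:.3f})")
```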
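4. Two independent random samples of sizes 80
and 35, drawn from two normal populations
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, have
means 50 and 45 and variances 16 and 25
respectively. Obtain a 95% confidence interval
for $(\mu_1 - \mu_2)$.
Solution;
$n_1 = 80$, $n_2 = 35$, $\bar{x}_1 = 50$, $\bar{x}_2 = 45$, $s_1^2 = 16$, $s_2^2 = 25$
$(1 - \alpha)100\% = 95\%$ $\therefore \alpha = 0.05$
Now;
$Z_{tab} = |Z_{\alpha/2}| = |Z_{0.025}| = 1.96$
Both populations have the same variance $\sigma^2$,
so the pooled standard deviation $s_p$ is used:
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{79 \times 16 + 34 \times 25}{113}} = \sqrt{18.708} = 4.325$
We have,
C.I.(L, U) = $|\bar{x}_1 - \bar{x}_2| \pm Z_{tab}\,\text{S.E.}(\bar{x}_1 - \bar{x}_2)$
= $|50 - 45| \pm 1.96 \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
= $5 \pm 1.96 \times 4.325 \times \sqrt{\frac{1}{80} + \frac{1}{35}}$
= 5 ± 1.96 × 4.325 × 0.2027
= 5 ± 1.718
= (3.282, 6.718)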
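The pooled calculation can be checked numerically. The sketch below applies the pooled formula with weights $n_1 - 1$ and $n_2 - 1$, exactly as written in the standard error table, to the data of this example. The function names `pooled_sd` and `diff_means_ci` are illustrative assumptions.

```python
import math

def pooled_sd(s1_sq, n1, s2_sq, n2):
    """Pooled standard deviation s_p from two sample variances."""
    return math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

def diff_means_ci(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2, z_tab):
    """C.I. for mu1 - mu2 when both populations share one variance."""
    se = pooled_sd(s1_sq, n1, s2_sq, n2) * math.sqrt(1 / n1 + 1 / n2)
    diff = abs(x1_bar - x2_bar)
    return diff - z_tab * se, diff + z_tab * se

s_p = pooled_sd(16, 80, 25, 35)
low, high = diff_means_ci(50, 45, 16, 25, 80, 35, z_tab=1.96)
print(f"s_p = {s_p:.3f}")
print(f"95% C.I. = ({low:.3f}, {high:.3f})")
```

With these inputs the pooled standard deviation is about 4.325 and the margin of error is about 1.72.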
PAIRED SAMPLES OR DEPENDENT SAMPLES:
If the n members of one sample are pairwise
dependent on the n members of another sample,
whether taken from the same population or from
different populations, then such samples are
called matched pairs, paired samples or
dependent samples.
For example: the marks of students before and
after tuition, the sugar level of patients before
and after medication, the sales of products
before and after advertisement, etc.
For this type of sample, the confidence interval
is given by the formula
$\text{C.I.} = \bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$
Where,
$d = y - x$, $\bar{d} = \frac{\sum d}{n}$, $s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$
x = set of sample values before implementation.
y = set of sample values after implementation.
3. Memory capacity of 9 students was tested
before and after training and following scores
were obtained.
Before 10 15 9 3 7 12 16 17 4
After 12 17 8 5 6 11 18 20 3
Construct a 95% confidence interval for
difference between the two mean scores.
Solution:
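Let x and y denote the memory capacity of the
students before and after training. These x and
y are pairwise dependent, so this is a case of
paired samples (matched pairs).

| Before (x) | After (y) | $d = y - x$ | $(d - \bar{d})^2$ |
|---|---|---|---|
| 10 | 12 | 2 | 1.4884 |
| 15 | 17 | 2 | 1.4884 |
| 09 | 08 | -1 | 3.1684 |
| 03 | 05 | 2 | 1.4884 |
| 07 | 06 | -1 | 3.1684 |
| 12 | 11 | -1 | 3.1684 |
| 16 | 18 | 2 | 1.4884 |
| 17 | 20 | 3 | 4.9284 |
| 04 | 03 | -1 | 3.1684 |
| | | $\sum d = 7$ | $\sum (d - \bar{d})^2 = 23.56$ |

$\bar{d} = \frac{\sum d}{n} = \frac{7}{9} = 0.78$
$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{23.56}{8}} = 1.71$
Now, $(1 - \alpha)100\% = 95\%$ $\therefore \alpha = 0.05$
$t_{tab} = t_{\alpha/2,\,n-1} = t_{0.025,\,8} = 2.306$
We know,
C.I. = $\bar{d} \pm t_{tab}\,\frac{s_d}{\sqrt{n}}$
= $0.78 \pm 2.306 \times \frac{1.71}{\sqrt{9}}$
= 0.78 ± 1.314
= (-0.534, 2.094)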
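The paired sample calculation can be reproduced directly from the two rows of scores. The sketch below uses `statistics.mean` and `statistics.stdev` (which divides by n - 1) and takes the tabulated value 2.306 from the solution. Because it does not round $\bar{d}$ and $s_d$ before using them, the limits differ from the hand calculation in the third decimal place.

```python
import math
from statistics import mean, stdev

before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]
t_tab = 2.306                      # t_{0.025, 8}, read from the t table

d = [y - x for x, y in zip(before, after)]   # pairwise differences
n = len(d)
d_bar = mean(d)                    # mean difference
s_d = stdev(d)                     # sample S.D. of the differences
error = t_tab * s_d / math.sqrt(n)
low, high = d_bar - error, d_bar + error

print("d =", d, " sum =", sum(d))
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
print(f"95% C.I. = ({low:.3f}, {high:.3f})")
```

The interval contains zero, so at this confidence level the data do not show a clear change in mean score after training.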
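9. The dean of a college wants to use the mean
of a random sample to estimate the average
amount of time students take to get from one
class to the next. She wants to be able to
assert with 99% confidence that the error is at
most 0.25 minutes. If it can be presumed from
experiment that $\sigma = 1.41$ minutes, how
large a sample will she have to take?
Solution;
Error = 0.25, $(1 - \alpha)100\% = 99\%$ $\therefore \alpha = 0.01$
$\sigma = 1.41$
We know,
$Z_{tab} = \frac{\text{Error}}{\text{Standard error}}$
Or, $Z_{\alpha/2} = \frac{0.25}{\sigma / \sqrt{n}}$
Or, $Z_{0.005} = \frac{0.25 \sqrt{n}}{1.41}$
Or, $2.576 = \frac{0.25 \sqrt{n}}{1.41}$
Or, $\sqrt{n} = \frac{2.576 \times 1.41}{0.25} = 14.5286$
Or, $n = 211.08$
$\therefore n = 212$ (rounded up to the next whole
number so that the error does not exceed 0.25).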
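Solving Error = $Z_{\alpha/2}\,\sigma / \sqrt{n}$ for n gives $n = (Z_{\alpha/2}\,\sigma / \text{Error})^2$, rounded up to a whole number. The sketch below evaluates this for the values stated in the question, $\sigma = 1.41$ and Error = 0.25. The function name `required_sample_size` is an illustrative assumption, and Z is computed with `statistics.NormalDist` instead of being read from the table.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, max_error, confidence):
    """Smallest n for which Z * sigma / sqrt(n) does not exceed max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / max_error) ** 2)

n = required_sample_size(sigma=1.41, max_error=0.25, confidence=0.99)
print("required sample size:", n)
```

With these inputs the formula gives about 211.1 before rounding, so 212 observations are needed. Rounding down would leave the error slightly above 0.25.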
CONFIDENCE INTERVAL FOR SAMPLE
PROPORTION:
1. A sample poll of 100 voters chosen at
random from all voters in a given district
shows that 55% of them were in favor of a
particular candidate. Find 95% confidence
limits for the proportion of voters in his
favor, if a very large number of voters are
allowed to cast their votes.
Solution;
Here, $n = 100$, $\hat{p} = 55\% = 0.55$, $\hat{q} = 1 - \hat{p} = 0.45$
$(1 - \alpha)100\% = 95\%$ $\therefore \alpha = 0.05$
$Z_{tab} = Z_{\alpha/2} = Z_{0.025} = 1.96$
$\text{S.E.}(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.55)(0.45)}{100}} = \sqrt{\frac{0.2475}{100}} = \sqrt{0.002475} = 0.0497$
We know;
C.I. = $\hat{p} \pm Z_{tab}\,\text{S.E.}(\hat{p})$
= 0.55 ± 1.96 × 0.0497
= 0.55 ± 0.0974
= (0.4526, 0.6474)
2. A poll is taken among city dwellers and
villagers in a country to determine the
feasibility of a development programme.
If 2400 of 5000 city dwellers and 1500 of
2500 villagers favor it, find a 95%
confidence interval for the true difference
in the proportions favoring the development
programme.
Solution;
$\hat{p}_1 = \frac{2400}{5000} = 0.48$, $\hat{q}_1 = 1 - \hat{p}_1 = 0.52$
$\hat{p}_2 = \frac{1500}{2500} = 0.6$, $\hat{q}_2 = 1 - \hat{p}_2 = 0.4$
$n_1 = 5000$ and $n_2 = 2500$
$(1 - \alpha)100\% = 95\%$ $\therefore \alpha = 0.05$
$\text{S.E.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ = …….
$Z_{tab} = Z_{\alpha/2} = Z_{0.025} = 1.96$
We know,
C.I. = $|\hat{p}_1 - \hat{p}_2| \pm Z_{tab}\,\text{S.E.}(\hat{p}_1 - \hat{p}_2)$
= …………… = (0.0963, 0.1437)
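The intermediate arithmetic for the difference of two proportions can be worked through numerically. The sketch below uses the unpooled standard error, as in the solution, with the counts from the question. The function name `diff_proportions_ci` is an illustrative assumption.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z_tab):
    """Return the S.E. and C.I. for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = abs(p1 - p2)
    return se, (diff - z_tab * se, diff + z_tab * se)

se, (low, high) = diff_proportions_ci(2400, 5000, 1500, 2500, z_tab=1.96)
print(f"S.E. = {se:.5f}")
print(f"95% C.I. = ({low:.4f}, {high:.4f})")
```

With these counts the standard error is about 0.0121, the margin of error is about 0.0237, and the limits are 0.12 - 0.0237 and 0.12 + 0.0237.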
DEGREE OF CONFIDENCE
1. A random sample of 100 teachers in a large
metropolitan area revealed a mean weekly salary of
£487 with a standard deviation of £48. With what
degree of confidence can you assert that the
average weekly salary of all the teachers in the
metropolitan area is between £472 and £502?
Solution;
$n = 100$, $\bar{x} = £487$, $s = £48$, C.I. = (472, 502)
$(1 - \alpha)100\% = ?$
Now, we have;
C.I. = $\bar{x} \pm Z_{tab}\,\frac{s}{\sqrt{n}} = 487 \pm Z_{tab}\,\frac{48}{\sqrt{100}} = 487 \pm 4.8\,Z_{tab}$
Taking the lower limit of the confidence interval,
we are given that
Lower limit = 472
Or, $487 - 4.8\,Z_{tab} = 472$
Or, $Z_{tab} = \frac{487 - 472}{4.8}$
Or, $Z_{tab} = 3.125$
Or, $Z_{\alpha/2} = 3.13$ (approximately)
From the standard normal table, the area to the
right of 3.13 is
$\frac{\alpha}{2} = 0.0009$
Or, $\alpha = 0.0018$
Or, $1 - \alpha = 0.9982$ $\therefore (1 - \alpha)100\% = 99.82\%$
Hence, the degree of confidence = 99.82%.
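The table lookup in the last steps can be replaced by the standard normal distribution function. The sketch below finds Z from the lower limit and then converts it to a two sided confidence level with `statistics.NormalDist`. The function name `degree_of_confidence` is an illustrative assumption.

```python
from statistics import NormalDist

def degree_of_confidence(x_bar, s, n, lower):
    """Z and two sided confidence level implied by a given lower limit."""
    se = s / n ** 0.5                 # S.E. of the sample mean
    z = (x_bar - lower) / se          # Z_tab implied by the interval
    confidence = 2 * NormalDist().cdf(z) - 1   # area between -z and +z
    return z, confidence

z, confidence = degree_of_confidence(x_bar=487, s=48, n=100, lower=472)
print(f"Z = {z:.3f}, degree of confidence = {confidence:.2%}")
```

The result agrees with the table based answer of about 99.82%.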