Unit 4 Sampling and Estimation_21MA41
UNIT-IV
SAMPLING AND ESTIMATION
SAMPLING THEORY
In a statistical investigation, the interest usually lies in assessing the general magnitude of,
and the variation in, one or more characteristics of the individuals belonging to a group.
This group of individuals under study is called the population or universe. Thus, in statistics,
a population is an aggregate of objects, animate or inanimate, under study. The population may
be finite or infinite.
For most statistical investigations, complete enumeration of the population is impracticable.
For example, to get an idea of the average per capita (monthly) income of the people in India,
we would have to enumerate all the earning individuals in the country, which is a very
difficult task.
If the population is infinite, complete enumeration is not possible. Also, if the units are
destroyed in the course of inspection (e.g., inspection of crackers, explosive materials, etc.),
100% inspection, though possible, is not at all desirable. Even if the population is finite and
the inspection is not destructive, 100% inspection is usually avoided for a number of reasons,
viz., administrative and financial implications, the time factor, etc., and we take the help of
sampling instead.
Size of the population is the number of objects or observations in the population and is denoted
by 𝑁. A finite subset of statistical individuals in a population is called a sample and the number
of individuals in a sample is called the sample size. Size of the sample is denoted by 𝑛. If 𝑛 ≥
30, the sampling is said to be large sampling. If 𝑛 < 30, the sampling is said to be small
sampling.
For the purpose of determining population characteristics, instead of enumerating the entire
population, only the individuals in the sample are observed. The sample characteristics are
then used to approximately determine or estimate the population characteristics. This method
is called statistical inference. For example, on examining a sample of a particular product we
arrive at a decision to purchase or reject that product. The error involved in such an
approximation is known as sampling error and is inherent and unavoidable in every sampling
scheme. But sampling results in considerable gains, especially in time and cost, not only in
making observations of characteristics but also in the subsequent handling of the data.
The entries in the first column are called the variables 𝑥𝑖 and the entries in the second column
are called the frequencies 𝑓𝑖 .
Further the data can be grouped as below.
The table of the above type is called a table of grouped frequency distribution. The entries in
the first column are called the class-intervals (or classes) and the entries in the second column
are the frequencies.
While analyzing statistical data, it is generally observed that the items or the frequencies cluster
around some central value of the variable. Such a central value is called a measure of central
tendency of the data. The mean (or average) is one such measure.
Mean:
1. For raw data consisting of $n$ items $x_1, x_2, x_3, \dots, x_n$, the arithmetic mean (or mean) is
defined by the formula
$$\text{Mean} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
2. For a frequency distribution $(x_i, f_i)$, the mean is defined by the formula
$$\text{Mean} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{\sum f_i x_i}{\sum f_i}$$
Variance:
1. For raw data, the variance is defined by
$$\text{Variance} = \frac{1}{n} \sum (x_i - \text{mean})^2 .$$
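The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the raw data reuses the four numbers 3, 7, 11, 15 that appear in the problems of this unit, and the small frequency table is made up for the example.

```python
def mean(values):
    """Arithmetic mean of raw data: (sum of x_i) / n."""
    return sum(values) / len(values)


def weighted_mean(xs, fs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def variance(values):
    """Variance of raw data with divisor n, as in the definition above."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


raw = [3, 7, 11, 15]
print(mean(raw))       # 9.0
print(variance(raw))   # 20.0

# Hypothetical frequency distribution: x = 1, 2, 3 with frequencies 2, 3, 5.
print(weighted_mean([1, 2, 3], [2, 3, 5]))  # (2 + 6 + 15) / 10, about 2.3
```

Note that the divisor here is n, matching the formula in the text; many libraries default to n − 1 instead.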
If the population is infinite (or if sampling from a finite population is done with
replacement), the formulas are
$$\mu_{\bar{x}} = \mu , \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
It can be proved that for samples of large size or for samples with replacement, the sampling
distribution of means is approximately a normal distribution for which the sample mean $\bar{x}$ is
the random variable. If the population itself is normally distributed, then the sampling
distribution of means is a normal distribution even for samples of small size. Accordingly,
the standard normal variate for the sampling distribution of means is given by
$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} .$$
Problems:
1. A population consists of four numbers 3, 7, 11, 15. Consider all possible samples of size 2
which can be drawn from this population with and without replacement. Find the mean and
standard deviation of the population and of the sampling distribution of means, and verify the
formulas for $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$.
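Solution: Given $N = 4$ and $n = 2$.
$$\text{Mean} = \mu = \frac{3 + 7 + 11 + 15}{4} = 9$$
$$\text{Variance} = \sigma^2 = \frac{1}{4}\{(3 - 9)^2 + (7 - 9)^2 + (11 - 9)^2 + (15 - 9)^2\} = 20$$
so that $\sigma = \sqrt{20}$.
Case i) Sampling with replacement:
The 16 possible ordered samples of size two give sample means with $\mu_{\bar{x}} = 9$ and
variance $\sigma_{\bar{x}}^2 = 10$.
Standard deviation $= \sigma_{\bar{x}} = \sqrt{10}$
$\Rightarrow \mu_{\bar{x}} = \mu$
and
$$\frac{\sigma}{\sqrt{n}} = \frac{\sqrt{20}}{\sqrt{2}} = \sqrt{10} = \sigma_{\bar{x}}$$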
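Case ii) Sampling without replacement:
The possible samples of size two which can be drawn without replacement are (3, 7),
(3, 11), (3, 15), (7, 11), (7, 15), (11, 15). The means of these 6 samples are 5, 7, 9,
9, 11, 13 respectively. For this distribution,
$$\text{Mean} = \mu_{\bar{x}} = \frac{5 + 7 + 9 + 9 + 11 + 13}{6} = 9 .$$
$$\text{Variance} = \sigma_{\bar{x}}^2 = \frac{1}{6}\{(5 - 9)^2 + (7 - 9)^2 + (9 - 9)^2 + (9 - 9)^2 + (11 - 9)^2 + (13 - 9)^2\} = \frac{20}{3} .$$
$$\text{Standard deviation} = \sigma_{\bar{x}} = \frac{\sqrt{20}}{\sqrt{3}} .$$
$$\frac{\sigma}{\sqrt{n}} \cdot \frac{\sqrt{N - n}}{\sqrt{N - 1}} = \frac{\sqrt{20}}{\sqrt{2}} \cdot \frac{\sqrt{4 - 2}}{\sqrt{4 - 1}} = \frac{\sqrt{20}}{\sqrt{3}} = \sigma_{\bar{x}} . \quad \text{Also, } \mu_{\bar{x}} = \mu .$$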
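The verification in this problem can also be done by brute force. The sketch below enumerates every sample of size 2 from the population 3, 7, 11, 15, with replacement (ordered samples) and without replacement (unordered samples), and compares the standard deviation of the sample means with σ/√n and with σ/√n · √((N − n)/(N − 1)). The helper names are hypothetical.

```python
from itertools import combinations, product
from math import isclose, sqrt

population = [3, 7, 11, 15]
N, n = len(population), 2


def mean(values):
    return sum(values) / len(values)


def pvariance(values):
    """Variance with divisor len(values), as used throughout this unit."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


mu = mean(population)
sigma = sqrt(pvariance(population))

# With replacement: all 4 * 4 = 16 ordered samples.
means_wr = [mean(s) for s in product(population, repeat=n)]
# Without replacement: all C(4, 2) = 6 unordered samples.
means_wor = [mean(s) for s in combinations(population, n)]

sd_wr = sqrt(pvariance(means_wr))
sd_wor = sqrt(pvariance(means_wor))

# Both sampling distributions are centred on the population mean.
assert isclose(mean(means_wr), mu) and isclose(mean(means_wor), mu)
# Standard error formulas, without and with the finite population correction.
assert isclose(sd_wr, sigma / sqrt(n))
assert isclose(sd_wor, sigma / sqrt(n) * sqrt((N - n) / (N - 1)))

print(sd_wr ** 2, sd_wor ** 2)  # about 10 and 20/3
```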
2. The daily wages of 3000 workers in a factory are normally distributed with mean equal to Rs
68 and standard deviation equal to Rs 3. If 80 samples consisting of 25 workers each are
obtained, what would be the mean and standard deviation of the sampling distribution of means
if sampling were done (a) with replacement (b) without replacement? In how many samples
is the mean likely to be (i) between Rs 66.8 and Rs 68.3 and (ii) less than Rs 66.4?
Solution: Given $N = 3000$, $\mu = 68$, $\sigma = 3$, $n = 25$.
In the case of sampling with replacement,
$$\mu_{\bar{x}} = \mu = 68 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{25}} = 0.6 .$$
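As a rough numerical check of this problem, the sketch below computes the standard error with and without replacement and then the expected number of samples (out of 80) whose mean falls in each range, using the with-replacement standard error. It is an illustration under stated assumptions, not part of the original solution: `phi` is a hypothetical helper that evaluates the standard normal CDF through `math.erf`, and the finite population correction √((N − n)/(N − 1)) is the one used in Problem 1.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


N, mu, sigma, n, num_samples = 3000, 68.0, 3.0, 25, 80

# (a) with replacement, (b) without replacement
se_wr = sigma / sqrt(n)
se_wor = se_wr * sqrt((N - n) / (N - 1))

# Expected number of samples whose mean lies in each range
# (with-replacement standard error).
p_between = phi((68.3 - mu) / se_wr) - phi((66.8 - mu) / se_wr)
p_below = phi((66.4 - mu) / se_wr)
count_between = num_samples * p_between
count_below = num_samples * p_below

print(round(se_wr, 4), round(se_wor, 4))
print(round(count_between, 1), round(count_below, 1))
```

Because N = 3000 is much larger than n = 25, the correction factor is close to 1 and the two standard errors are nearly equal.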
3. Let $\bar{x}$ be the mean of a random sample of size 50 drawn from a population with mean
112 and standard deviation 40. Find (a) the mean and standard deviation of $\bar{x}$, (b) the
probability that $\bar{x}$ assumes a value between 110 and 114, (c) the probability that $\bar{x}$
assumes a value greater than 113.
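Solution:
$n = 50$, $\mu = 112$, $\sigma = 40$
$$\mu_{\bar{x}} = \mu = 112, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{50}} = 5.6569 \quad \text{and} \quad z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$
$$P(110 < \bar{x} < 114) = P(-0.35 < z < 0.35) = 0.6368 - 0.3632 = 0.2736$$
$$P(\bar{x} > 113) = P(z > 0.18) = 1 - P(z \le 0.18) = 1 - 0.5714 = 0.4286$$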
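The same probabilities can be checked without a normal table. The sketch below is a minimal illustration in which `phi` is a hypothetical helper for the standard normal CDF built on `math.erf`. Because it does not round z to two decimals before the lookup, its results differ slightly from the table-based values in the solution (roughly 0.276 and 0.430).

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, mu, sigma = 50, 112.0, 40.0
se = sigma / sqrt(n)  # standard error of the mean

# (b) P(110 < x_bar < 114)
p_between = phi((114 - mu) / se) - phi((110 - mu) / se)
# (c) P(x_bar > 113)
p_above = 1.0 - phi((113 - mu) / se)

print(round(se, 4), round(p_between, 4), round(p_above, 4))
```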
4. An automobile battery manufacturer claims that its midgrade battery has a mean life
of 50 months with a standard deviation of 6 months. Suppose the distribution of battery
lives of this particular brand is approximately normal. On the assumption that the
manufacturer’s claims are true, find (a) the probability that a randomly selected battery
of this type will last less than 48 months, (b) the probability that the mean of a random
sample of 36 such batteries will be less than 48 months.
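Solution:
$n = 36$, $\mu = 50$, $\sigma = 6$
$$\mu_{\bar{x}} = \mu = 50, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{36}} = 1$$
(a) Using $z = \frac{x - \mu}{\sigma}$: $P(x < 48) = P(z < -0.33) = 0.3707$
(b) Using $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$: $P(\bar{x} < 48) = P(z < -2) = 0.0228$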
Exercise:
1. A population consists of four numbers 1, 5, 6, 8. Consider all possible samples of size
2 which can be drawn from this population with and without replacement. Find the
mean and standard deviation of the population and of the sampling distribution of
means, and verify the formulas for $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$.
2. The mean and the standard deviation of a normally distributed population of size 250
are 100 and 16 respectively. What are the mean and the standard deviation of the
sampling distribution of means for random samples of size 4 drawn with replacement
and without replacement?
3. With reference to Problem 2 above, what is the probability that the sample mean
lies between 95 and 105 for a sample of size 4 drawn with and without replacement?
4. The mean of a certain normal infinite population is equal to the standard error of the
distribution of means of samples of size 100 drawn from that population. Find the
probability that the mean of a sample of size 25 drawn from the population will be
negative.
5. If the mean of an infinite population is 575 with standard deviation 8.3 how large a
sample must be used in order that there be one chance in 100 that the mean of the sample
is less than 572?
6. Suppose that the number of customers entering a grocery shop each day over a five-
year period is a random variable with mean 100 and standard deviation of 10. What is
the probability that the average number of customers per day over a randomly selected
30-day period is between 95 and 105?
Answers: 1) With replacement: $\mu_{\bar{x}} = 5$ and $\sigma_{\bar{x}} = \frac{\sqrt{13}}{2}$. Without replacement: $\mu_{\bar{x}} = 5$
and $\sigma_{\bar{x}} = \sqrt{\frac{13}{6}}$. 2) With replacement: Mean = 100, SD = 8. Without replacement:
Suppose we obtain a frequency distribution of $t$ by computing the value of $t$ for each of a set
of samples of size $n$ drawn from a normal or a nearly normal population. The sampling
distribution so obtained is called the Student's t-distribution, with probability density function
$$Y(t) = Y_0 \left(1 + \frac{t^2}{\gamma}\right)^{-\frac{\gamma + 1}{2}}$$
with $\gamma = n - 1$ degrees of freedom (d.f.), where $Y_0$ is a constant and $-\infty < t < \infty$. The
4. Eleven school boys were given a test in mathematics carrying a maximum of 25 marks.
They were given a month’s extra coaching and a second test of equal difficulty was
held thereafter. The following table gives the marks in the two tests.
Boy            1   2   3   4   5   6   7   8   9   10  11
I Test Marks   23  20  19  21  18  20  18  17  23  16  19
II Test Marks  24  19  22  18  20  22  20  20  23  20  17
Problems:
1. A certain stimulus administered to each of 12 patients resulted in the following changes
in blood pressure: 5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4 (in appropriate units). Find Student's
t for the given sample, taking the mean of the population to be 0.
2. Nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51.
Find Student's t for the given sample, taking the mean of the population to be 47.5.
Answers: 1) 2.89, 2) 1.83
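These two answers can be checked with a short script. The sketch assumes the usual one-sample definition t = (x̄ − μ) / (s / √n) with s² = Σ(xᵢ − x̄)² / (n − 1); the function name `t_statistic` is hypothetical. Computed this way the values come out near 2.90 and 1.85, so small differences from the listed answers are presumably due to rounding in hand calculation.

```python
from math import sqrt


def t_statistic(sample, mu0):
    """One-sample Student's t, assuming s^2 uses the divisor n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / sqrt(s2 / n)


bp_changes = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]
items = [45, 47, 50, 52, 48, 47, 49, 53, 51]

t1 = t_statistic(bp_changes, 0.0)
t2 = t_statistic(items, 47.5)
print(round(t1, 3), round(t2, 3))
```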
where, 𝜈 = 𝑛 − 1.
Suppose that, in a random experiment, a set of events $E_1, E_2, E_3, \dots, E_n$ are observed to occur
with frequencies $f_1, f_2, f_3, \dots, f_n$. According to a theory based on probability rules, suppose the
same events are expected to occur with the frequencies $e_1, e_2, e_3, \dots, e_n$. Then
$$N = \sum_{i=1}^{n} f_i = \sum_{i=1}^{n} e_i$$
If the expected frequencies are at least equal to 5, then it can be proved that the sampling
distribution of the statistic $\chi^2$ is approximately identical with the probability distribution of the
variate $\chi^2$ whose density function is given by
$$P(\chi^2) = P_0 \, \chi^{\nu - 2} e^{-\chi^2 / 2} ,$$
where $\nu$ is a positive constant, called the number of degrees of freedom, and $P_0$ is such that the
total area under the corresponding probability curve is one.
Applications of 𝜒 2 -distribution:
The applications of 𝜒 2 -distribution are very wide in statistics. It is used:
• To test the hypothetical value of population variance.
• To test the goodness of fit, that is, to judge whether there is a discrepancy between
theoretical and experimental observations.
• To test the independence of two attributes, that is, to judge whether the two attributes
are independent.
Problems:
1. A manufacturer of car batteries guarantees that the batteries will last, on average, 3 years
with a standard deviation of 1 year. Assuming the battery lifetime follows a normal
distribution, find 𝜒 2 for the life time 1.9, 2.4, 3.0, 3.5 and 4.2 years of five of these batteries.
Solution:
Given $\mu = 3$ and $\sigma = 1$.
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2}$$
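Applying this formula to the five battery lifetimes is a short computation. The sketch below is illustrative, with a hypothetical function name; it evaluates Σ(xᵢ − x̄)² / σ² directly, and with x̄ = 3.0 and σ = 1 the squared deviations 1.21, 0.36, 0, 0.25, 1.44 add up to 3.26.

```python
def chi_square_for_variance(sample, sigma):
    """chi^2 = sum((x_i - x_bar)^2) / sigma^2, which equals (n - 1) s^2 / sigma^2."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / sigma ** 2


lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]  # years, from the problem statement
chi2 = chi_square_for_variance(lifetimes, sigma=1.0)
print(round(chi2, 2))  # 3.26
```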
$$+ \frac{(22 - 22.483)^2}{22.483} + \frac{(8 - 5.912)^2}{5.912} = 3.28$$
4. Fit a normal distribution to the following data of weights of 100 students of a certain college,
obtain the theoretical frequencies and hence find 𝜒 2 for the above data.
Weights (Kgs) 60-62 63-65 66-68 69-71 72-74
Frequency 5 18 42 27 8
5. Fit a binomial distribution to the following data:
𝑥𝑖 0 1 2 3 4 5
𝑓𝑖 38 144 342 287 164 25
Find the corresponding theoretical estimates for 𝑓𝑖 . Also find the statistic 𝜒 2 .
likelihood estimator of $p$ is
$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i .$$
distributed with unknown $\mu$ and known variance $\sigma^2$. The likelihood function of a random
sample of size $n$, say $X_1, X_2, \dots, X_n$, is
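$$L(x_1, x_2, x_3, \dots, x_n; \lambda) = \prod_{i=1}^{n} f(x_i; \lambda) = \frac{e^{-n\lambda} \lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$
Now
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$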
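$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$
$$\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$
Now
$$\frac{d \ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$$
Setting this derivative equal to zero and solving for $\lambda$ gives
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$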
Conclusion: Thus, the maximum likelihood estimator of 𝜆 is the reciprocal of the sample mean.
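A quick numerical sanity check of this result, as a sketch only: the data values below are hypothetical and the function names are illustrative. The log-likelihood n ln λ − λ Σxᵢ is evaluated at the closed-form estimate 1/x̄ and at nearby values of λ, and it should be largest at the estimate.

```python
from math import log


def exp_log_likelihood(lam, xs):
    """ln L(lambda) = n ln(lambda) - lambda * sum(x_i) for exponential data."""
    return len(xs) * log(lam) - lam * sum(xs)


def exp_mle(xs):
    """Closed-form maximum likelihood estimate: n / sum(x_i) = 1 / x_bar."""
    return len(xs) / sum(xs)


xs = [0.5, 1.2, 0.3, 2.0, 1.0]  # hypothetical observations, sum = 5.0
lam_hat = exp_mle(xs)

# The log-likelihood at lam_hat is at least as large as at nearby values.
for lam in (0.8 * lam_hat, 0.9 * lam_hat, 1.1 * lam_hat, 1.2 * lam_hat):
    assert exp_log_likelihood(lam_hat, xs) >= exp_log_likelihood(lam, xs)

print(lam_hat)
```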
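Let $X$ be normally distributed with mean $\mu$ and variance $\sigma^2$, where both $\mu$ and $\sigma^2$ are
unknown. The likelihood function for a random sample of size $n$ is
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$$
and
$$\ln L(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$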
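Now
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 .$$
The solutions to the above equations yield the maximum likelihood estimators
$$\hat{\mu} = \bar{x} , \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$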
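These two estimators are straightforward to compute. The sketch below is illustrative only: `normal_mle` is a hypothetical name and the data reuses the numbers 3, 7, 11, 15 from the sampling problems earlier in this unit. The point to notice is that the maximum likelihood estimate of the variance divides by n, not by n − 1.

```python
def normal_mle(xs):
    """Return (mu_hat, sigma2_hat) for a normal sample; divisor is n, not n - 1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


mu_hat, sigma2_hat = normal_mle([3, 7, 11, 15])
print(mu_hat, sigma2_hat)  # 9.0 20.0
```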
Video Links:
https://www.youtube.com/watch?v=zK70Fc_HHmg
https://www.youtube.com/watch?v=mlQRIoBErso
Disclaimer: The content provided is prepared by the Department of Mathematics for the specified
syllabus using the reference books mentioned in the syllabus. This material is specifically for the
use of RVCE students and for educational purposes only.