Sampling
Dr. Raji
Sampling may be defined as the selection of some part of an aggregate or totality on the basis of which a
judgement or inference about the aggregate or totality is made. In other words, it is the process of obtaining
information about an entire population by examining only a part of it. In most research work and surveys, the
usual approach is to generalise or draw inferences about the parameters of a population from samples taken
from it.
SOME FUNDAMENTAL DEFINITIONS
1. Universe/Population: From a statistical point of view, the term ‘Universe’ refers to the total of the items or
units in any field of inquiry, whereas the term ‘population’ refers to the total of items about which
information is desired. The attributes that are the object of study are referred to as characteristics and the
units possessing them are called elementary units. The aggregate of such units is generally described as
population. Thus, all units in any field of inquiry constitute universe and all elementary units (on the basis of
one characteristic or more) constitute population.
The population or universe can be finite or infinite. The population is said to be finite if it consists of a fixed
number of elements so that it is possible to enumerate it in its totality. For instance, the population of a city and
the number of workers in a factory are examples of finite populations. The symbol ‘N’ is generally used to indicate
the number of elements (or items) in a finite population. An infinite population is one in which it is
theoretically impossible to observe all the elements. Thus, in an infinite population the number of items is
infinite, i.e., we cannot have any idea about the total number of items. The number of stars in the sky and the
possible rolls of a pair of dice are examples of infinite populations.
2. Sampling frame: The elementary units, or groups or clusters of such units, may form the basis of the sampling
process, in which case they are called sampling units. A list containing all such sampling units is known as the
sampling frame. Thus, the sampling frame consists of a list of items from which the sample is to be drawn. For
instance, one can use a telephone directory as a frame for conducting an opinion survey in a city.
3. Sampling design: A sampling design is a definite plan for obtaining a sample from the sampling frame. It refers
to the technique or procedure the researcher would adopt in selecting the sampling units from which
inferences about the population are drawn. The sampling design is determined before any data are collected.
5. Sampling error: Sample surveys do imply the study of a small portion of the population and as such there
would naturally be a certain amount of inaccuracy in the information collected. This inaccuracy may be termed
as sampling error or error variance. In other words, sampling errors are those errors which arise on account of
sampling and they generally happen to be random variations (in case of random sampling) in the sample
estimates around the true population values. The meaning of sampling error can be easily understood from the
following diagram:
The magnitude of the sampling error depends upon the nature of the universe; the more homogeneous the
universe, the smaller the sampling error. Sampling error is inversely related to the size of the sample i.e.,
sampling error decreases as the sample size increases and vice-versa. A measure of the random sampling error
can be calculated for a given sample design and size and this measure is often called the precision of the
sampling plan. As opposed to sampling errors, we may have non-sampling errors which may creep in during the
process of collecting actual information and such errors occur in all surveys whether census or sample. We have
no way to measure non-sampling errors.
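The inverse relation between sampling error and sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 10,000 values (roughly normal around 4000), the number of repeats, and the function name `spread_of_sample_means` are hypothetical choices made for illustration, not part of the notes.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the sample mean over repeated simple random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population (an assumption): 10,000 values, roughly normal around 4000
population = [rng.gauss(4000, 500) for _ in range(10_000)]

# Larger samples give sample means that vary less around the population mean
spreads = {n: spread_of_sample_means(population, n, 300, rng) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: spread of sample means ~ {spread:.1f}")
```

The printed spread should shrink as the sample size grows, which is the random sampling error decreasing with sample size.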
6. Precision: Precision is the range within which the population average (or other parameter) will lie in
accordance with the reliability specified in the confidence level. It is expressed either as a percentage of the
estimate (±) or as a numerical quantity. For instance, if the estimate is Rs 4000 and the precision desired is ± 4%,
then the true value will be no less than Rs 3840 and no more than Rs 4160. This is the range (Rs 3840 to Rs 4160)
within which the true answer should lie. But if we desire that the estimate should not deviate from the actual
value by more than Rs 200 in either direction, the range would be Rs 3800 to Rs 4200.
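The arithmetic behind the two ranges above can be checked with a short sketch. The function name `precision_range` and its parameters are hypothetical, introduced here only for illustration.

```python
def precision_range(estimate, percent=None, amount=None):
    """Return (lower, upper) limits around an estimate.

    The precision is given either as a percentage of the estimate
    or as an absolute numerical quantity, never both.
    """
    if (percent is None) == (amount is None):
        raise ValueError("give exactly one of percent or amount")
    margin = estimate * percent / 100 if percent is not None else amount
    return estimate - margin, estimate + margin

print(precision_range(4000, percent=4))   # (3840.0, 4160.0)
print(precision_range(4000, amount=200))  # (3800, 4200)
```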
7. Confidence level and significance level: The confidence level or reliability is the expected percentage of times
that the actual value will fall within the stated precision limits. Thus, if we take a confidence level of 95%, then
we mean that there are 95 chances in 100 (or .95 in 1) that the sample results represent the true condition of the
population within a specified precision range against 5 chances in 100 (or .05 in 1) that it does not. Precision is
the range within which the answer may vary and still be acceptable; confidence level indicates the likelihood
that the answer will fall within that range, and the significance level indicates the likelihood that the answer will
fall outside that range. We can always remember that if the confidence level is 95%, then the significance level
will be (100 – 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%, and so on.
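One way to see what "95 chances in 100" means is to repeat the sampling many times and count how often the precision limits contain the true value. The sketch below rests on assumptions not stated in the notes: a hypothetical, roughly normal population, and limits computed as the sample mean ± 1.96 standard errors, which is the usual normal approximation for a 95% confidence level.

```python
import random
import statistics

Z_95 = 1.96  # normal critical value for a 95% confidence level (assumed approximation)

def covers_true_mean(population, true_mean, n, rng):
    """Draw one sample and report whether its limits contain the true mean."""
    sample = rng.sample(population, n)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean - Z_95 * std_err <= true_mean <= mean + Z_95 * std_err

rng = random.Random(1)  # fixed seed so the sketch is repeatable
population = [rng.gauss(4000, 500) for _ in range(20_000)]  # hypothetical population
true_mean = statistics.mean(population)

trials = 1000
hits = sum(covers_true_mean(population, true_mean, 100, rng) for _ in range(trials))
confidence = hits / trials     # share of samples whose limits contain the true value
significance = 1 - confidence  # share of samples whose limits miss it
print(f"inside the limits: {confidence:.1%}, outside: {significance:.1%}")
```

The observed share inside the limits should be close to 95% and the share outside close to 5%, mirroring the confidence level and the significance level.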
Types of Sample Design
Sampling is divided into two types:
• Probability sampling: In a probability sample, every unit in the population has a known, non-zero chance
(in simple random sampling, an equal chance) of being selected as a sample unit.
• Non-probability sampling: In non-probability sampling, the chance of any unit being selected is unknown;
some units may have little or no chance of being selected as a sample unit.
Probability Sampling Techniques
1. Random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multistage sampling
Non-probability Sampling Techniques
1. Deliberate sampling
2. Shopping mall intercept sampling
3. Sequential sampling
4. Quota sampling
5. Snowball sampling
6. Panel samples
• Random Sampling
Simple random sampling is a process in which every item of the population has an equal probability of
being chosen.
Two methods are used in random sampling:
1. Lottery method: Take a population containing four departmental stores: A, B, C and D. Suppose we
need to pick a sample of two stores from the population using a simple random procedure. We write
down all possible samples of two. The six different combinations, each containing two stores from the
population, are AB, AC, AD, BC, BD and CD. We now write the six sample combinations on six
identical pieces of paper and fold them so that they cannot be distinguished. We put them in a
box, mix them and pull one out at random. This procedure is the lottery method of making a random
selection.
2. Using random number table: A random number table consists of a group of digits that are arranged
in random order, i.e., any row, column, or diagonal in such a table contains digits that are not in any
systematic order.
Three commonly used tables of random numbers are:
(a) Tippett's table
(b) Fisher and Yates' table
(c) Kendall and Babington Smith's table.
The table for random number is as follows:
40743 39672
80833 18496
10743 39431
88103 23016
53946 43761
31230 41212
24323 18054
Example: Take the earlier example of stores. We first number the stores:
1  A
2  B
3  C
4  D
The stores A, B, C and D have been numbered 1, 2, 3 and 4. To select two of the four stores at random,
we proceed as follows:
Suppose we start with the second row in the first column of the table and decide to read diagonally.
The starting digit is 8. There is no departmental store numbered 8 in the population, since there
are only four stores. Move to the next digit on the diagonal, which is 0. Ignore it, since it does not
correspond to any of the stores in the population. The next digit on the diagonal is 1, which
corresponds to store A. Pick A and proceed until we have two stores. The next digit on the diagonal is 4,
which corresponds to store D. In this case, the two departmental stores are 1 and 4, so the sample
consists of departmental stores A and D.
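Both selection methods can be sketched in a few lines. The helper name `read_diagonal` is hypothetical; the digits are the first column of the random number table given above, and the starting position (second row, first digit) follows the worked example.

```python
import itertools
import random

stores = ["A", "B", "C", "D"]

# Lottery method: one slip of paper per possible sample of two stores
slips = ["".join(pair) for pair in itertools.combinations(stores, 2)]
print(slips)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
drawn = random.choice(slips)  # mix the slips and pull one at random

# Random number table method: first column of the table in the notes
table = ["40743", "80833", "10743", "88103", "53946", "31230", "24323"]
numbered = {1: "A", 2: "B", 3: "C", 4: "D"}

def read_diagonal(table, start_row, start_col, labels, wanted):
    """Read digits down a diagonal, keeping those that match an unused label."""
    chosen = []
    row, col = start_row, start_col
    while len(chosen) < wanted and row < len(table) and col < len(table[row]):
        digit = int(table[row][col])
        if digit in labels and digit not in chosen:
            chosen.append(digit)
        row += 1  # one row down ...
        col += 1  # ... and one digit to the right
    return chosen

# Second row (index 1), first digit (index 0), as in the worked example
picked = read_diagonal(table, start_row=1, start_col=0, labels=numbered, wanted=2)
print(picked, [numbered[d] for d in picked])  # [1, 4] ['A', 'D']
```

The diagonal yields the digits 8, 0, 1 and 4; the first two are skipped and the last two select stores A and D.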
In random sampling, there are two possibilities (a) Equal probability (b) Varying probability.
(a) Equal Probability: This is also called random sampling with replacement.
Example: Put 100 chits numbered 1 to 100 in a box and pick one at random. If the chit were kept out,
only 99 chits would remain for the second pick. To give every chit the same probability (1/100) at
every draw, the chit selected is put back into the box before the next one is picked.
(b) Varying Probability: This is also called random sampling without replacement. Once a number is
picked, it is not included again. Therefore, the probability of selecting a unit varies from one draw to
the next. In our example, it is 1/100, 1/99, 1/98 and 1/97 if we select four units out of 100.
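The difference between the two cases can be sketched with Python's standard `random` module, where `choices` draws with replacement and `sample` draws without. The helper `draw_probabilities` is a hypothetical name used only to reproduce the fractions in the example.

```python
import random
from fractions import Fraction

chits = list(range(1, 101))  # chits numbered 1 to 100
rng = random.Random(7)       # fixed seed so the sketch is repeatable

with_replacement = rng.choices(chits, k=4)   # a chit may be drawn more than once
without_replacement = rng.sample(chits, 4)   # a drawn chit is never drawn again

def draw_probabilities(population_size, draws, replace):
    """Chance of picking one particular remaining chit at each successive draw."""
    return [
        Fraction(1, population_size if replace else population_size - i)
        for i in range(draws)
    ]

print([str(p) for p in draw_probabilities(100, 4, replace=True)])
# ['1/100', '1/100', '1/100', '1/100']
print([str(p) for p in draw_probabilities(100, 4, replace=False)])
# ['1/100', '1/99', '1/98', '1/97']
```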
• Systematic Random Sampling
There are three steps:
1. The sampling interval K is determined by the following formula:
K = (No. of units in the population) / (No. of units desired in the sample)
2. One unit between the first and the Kth unit in the population list is randomly chosen.
3. K is then added repeatedly to the randomly chosen number to select every Kth unit thereafter.
Example: Consider 1,000 households from which we want to select 50 units.
K = 1000 / 50 = 20
To select the first unit, we randomly pick one number between 1 and 20, say 17. So our sample begins
with 17, 37, 57, ... Please note that only the first item was randomly selected. The rest are
systematically selected. This is a very popular method because we need only one random number.
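The three steps can be sketched as follows. The function name `systematic_sample` and its `start` parameter are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as in the household example.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Select every Kth unit after a random start between 1 and K."""
    k = population_size // sample_size   # step 1: sampling interval K
    if start is None:
        start = random.randint(1, k)     # step 2: the only random choice
    return [start + i * k for i in range(sample_size)]  # step 3: keep adding K

# Household example: 1,000 households, 50 wanted, random start assumed to be 17
units = systematic_sample(1000, 50, start=17)
print(units[:5], units[-1])  # [17, 37, 57, 77, 97] 997
```

Passing `start=17` reproduces the example; leaving it out lets the function pick the random start itself.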
• Stratified Random Sampling
This is a probability sampling procedure in which simple random sub-samples are drawn from within
different strata whose members are more or less alike on some characteristic. Stratified sampling is of two types:
1. Proportionate stratified sampling: The number of sampling units drawn from each stratum is in
proportion to the population size of that stratum.
2. Disproportionate stratified sampling: The number of sampling units drawn from each stratum is
based on the analytical consideration, but not in proportion to the size of the population of that
stratum.
Sample Proportionate
• N is the size of the population and Ni is the size of stratum i.
• n is the size of the sample and ni is the sample size to be drawn from stratum i.
• i takes the values 1, 2, 3, ..., k, where k is the number of strata in the population.
P = n1/N1 = n2/N2 = n3/N3 = ... = nk/Nk = n/N
For example, n1 is the sample size to be drawn from stratum 1.
n1 + n2 + ... + nk = n [total sample size across all strata]
Example: A survey is planned to analyse the perception of people towards their own religious practices.
The population consists of various religions, viz., Hindu, Muslim, Christian, Sikh and Jain, with an assumed total
of 10,000. Hindus, Muslims, Christians, Sikhs and Jains number 6,000, 2,000, 1,000, 500 and 500
respectively. Determine the sample size of each stratum by applying proportionate stratified sampling,
if the sample size required is 200.
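The allocation asked for in this example follows directly from P = n/N = 200/10,000 = 0.02 applied to each stratum. A minimal sketch, with the hypothetical function name `proportionate_allocation`:

```python
def proportionate_allocation(strata_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size (ni = Ni * n / N)."""
    population = sum(strata_sizes.values())
    # Rounding is exact here; with awkward proportions the rounded
    # figures may need a manual adjustment so that they still sum to n.
    return {
        name: round(size * sample_size / population)
        for name, size in strata_sizes.items()
    }

strata = {"Hindu": 6000, "Muslim": 2000, "Christian": 1000, "Sikh": 500, "Jain": 500}
allocation = proportionate_allocation(strata, 200)
print(allocation)
# {'Hindu': 120, 'Muslim': 40, 'Christian': 20, 'Sikh': 10, 'Jain': 10}
print(sum(allocation.values()))  # 200
```

Each stratum contributes 2% of its members, so the stratum samples add up to the required 200.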
• Cluster Sampling
Consider 16 houses grouped into four clusters (crosses) of four houses each:
Cross   Houses
1       X1    X2    X3    X4
2       X5    X6    X7    X8
3       X9    X10   X11   X12
4       X13   X14   X15   X16
We need to select eight houses. We can choose eight houses at random. Alternatively, two clusters,
each containing four houses, can be chosen. In this method, every house would have a known
probability of being chosen, i.e., a chance of one in two, since two of the four clusters are selected.
We must remember that the houses within a cluster are assumed to have similar characteristics. With
cluster sampling, it is impossible for certain samples to be selected. For example, in the cluster
sampling process described above, the following combination of houses could not occur:
X1 X2 X5 X6 X9 X10 X13 X14. This is because the original universe of 16 houses has been redefined
as a universe of four clusters, so only whole clusters can be chosen as a sample.
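The restriction that cluster sampling places on the possible samples can be checked by enumeration. This is a sketch using the 16 houses and four clusters of the example; the variable names are hypothetical.

```python
import itertools
import math

clusters = {
    1: ["X1", "X2", "X3", "X4"],
    2: ["X5", "X6", "X7", "X8"],
    3: ["X9", "X10", "X11", "X12"],
    4: ["X13", "X14", "X15", "X16"],
}

# Every sample that cluster sampling can produce: two whole clusters out of four
possible_samples = [
    frozenset(clusters[a] + clusters[b])
    for a, b in itertools.combinations(clusters, 2)
]
print(len(possible_samples))  # 6

# Chance that a given house is selected when each pair of clusters is equally likely
share_with_x1 = sum("X1" in s for s in possible_samples) / len(possible_samples)
print(share_with_x1)  # 0.5

# The mixed combination from the text can never be drawn
mixed = frozenset(["X1", "X2", "X5", "X6", "X9", "X10", "X13", "X14"])
print(mixed in possible_samples)  # False

# For comparison: samples of eight houses available under simple random sampling
print(math.comb(16, 8))  # 12870
```

Only 6 of the 12,870 possible samples of eight houses can occur, yet each house still has a one in two chance of selection.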
• Multistage Sampling
The name implies that sampling is done in several stages. This is used with stratified/cluster designs.
An illustration of double sampling is as follows.
The management of a newly opened club is soliciting new members. During the first round, details
were sent to all corporates so that those who are interested may enroll. Once they have enrolled, the
second round concentrates on how many are interested in enrolling for the various entertainment
activities that the club offers, such as billiards, indoor sports, swimming, gym, etc. After obtaining this
information, you might stratify the interested respondents. This will also tell you the reaction of new
members to the various activities. This technique is considered to be scientific, since there is no
possibility of ignoring the characteristics of the universe.
• Area Sampling
This is a probability sampling, a special form of cluster sampling.
Example: If someone wants to measure the sales of toffee in retail stores, one might choose city
localities and then audit toffee sales in the retail outlets in those localities.
The main problem in area sampling is the non-availability of lists of shops selling toffee in a particular
area. Therefore, it would be impossible to choose a probability sample from these outlets directly.
Thus, the first job is to choose a geographical area and then list out outlets selling toffee. Then follows
the probability sample for shops among the list prepared.
Example: You may like to choose shops which sell the brand Cadbury Dairy Milk. The disadvantage of
area sampling is that it is expensive and time-consuming.
Non-probability Sampling Techniques
Category         Quota
General merit    1,000
Sport              600
NRI                100
SC/ST              300
Total            2,000