7.1 Two Phase Sampling
1.0 INTRODUCTION
In sample surveys, information on an auxiliary variate x is often required, either for estimation (e.g., ratio, difference and regression estimation), for selection (e.g., PPS), or for stratification, in order to increase the efficiency of the estimator. When such information is lacking and it is relatively cheap to obtain information on x, we can take a large preliminary sample of size $n'$ to estimate the population mean $\bar{x}_N$ or the distribution of x, as the case may be, and only a small sample (generally a sub-sample) of size $n$ to measure the variate y, the character of interest.
This means devoting part of the resources to the large preliminary sample and therefore accepting a smaller sample for measuring the study variate y. This technique is known as two-phase sampling and was first proposed by Neyman (1938).
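The selection scheme described above can be sketched in code. This is only a minimal illustration, assuming simple random sampling without replacement at both phases and units labelled 0 to N-1; the function name and the sizes used are hypothetical and do not come from the text.

```python
import random

def draw_two_phase_sample(population_size, n_first, n_second, seed=None):
    """Return (first_phase, second_phase) lists of unit labels.

    Assumption: both phases use simple random sampling without replacement,
    and the second-phase sample is a sub-sample of the first-phase sample.
    """
    if not 0 < n_second <= n_first <= population_size:
        raise ValueError("need 0 < n_second <= n_first <= population_size")
    rng = random.Random(seed)
    # Phase 1: large preliminary sample on which only x is measured.
    first_phase = rng.sample(range(population_size), n_first)
    # Phase 2: smaller sub-sample on which y is measured as well.
    second_phase = rng.sample(first_phase, n_second)
    return first_phase, second_phase

# Hypothetical sizes: N = 500, n' = 40, n = 20.
first, second = draw_two_phase_sample(500, 40, 20, seed=1)
print(len(first), len(second), set(second) <= set(first))  # 40 20 True
```

Every unit in the second phase also belongs to the first phase, so x is known for all n' units and y only for the n sub-sampled units.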
Difference between two-phase and two-stage sampling:
The main difference is that two-phase sampling requires a complete sampling frame of the units, whereas two-stage sampling requires a sampling frame of the second-stage units only for the units selected at the first stage.
2.0 TWO PHASE SAMPLING FOR RATIO ESTIMATOR
The two-phase sampling technique consists in taking a large preliminary sample of size $n'$ to estimate the population mean $\bar{x}_N$, while a sub-sample of size $n$ is drawn from the $n'$ units to observe the character under study. The simplest biased ratio estimator based on a preliminary sample of size $n'$ is given by
$$\bar{y}_{Rd} = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{x}_{n'} = R_n\,\bar{x}_{n'} \tag{1}$$
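A short sketch may help to show how the ratio estimator in (1) combines the two samples. The function name and the toy data below are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
from statistics import mean

def two_phase_ratio_estimate(y_sub, x_sub, x_first):
    """Two-phase ratio estimate of the mean of y, following equation (1)."""
    if len(y_sub) != len(x_sub):
        raise ValueError("y_sub and x_sub must be paired observations")
    r_n = mean(y_sub) / mean(x_sub)  # R_n from the sub-sample of size n
    return r_n * mean(x_first)       # scaled by the first-phase mean of x

# Hypothetical toy data: x observed on 6 first-phase units, y on 3 of them.
x_first = [10, 12, 14, 16, 18, 20]   # mean 15
x_sub = [10, 14, 18]                 # mean 14
y_sub = [5, 7, 9]                    # mean 7
estimate = two_phase_ratio_estimate(y_sub, x_sub, x_first)
print(estimate)  # 7.5
```

Here the sub-sample mean of x (14) is below the first-phase mean (15), so the sub-sample mean of y (7) is adjusted upward to 7.5.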
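The relative bias of the estimator is
$$\text{Rel. Bias} = B_1(\bar{y}_{Rd}) = \frac{E(\bar{y}_{Rd}) - \bar{y}_N}{\bar{y}_N} = \left(\frac{1}{n} - \frac{1}{n'}\right)\left(C_x^2 - \rho\, C_x C_y\right) \tag{2}$$
where $C_x$ and $C_y$ are the coefficients of variation of x and y, and $\rho$ is the correlation coefficient between them. The bias is negligible if the sample size $n$ is sufficiently large, and it is zero to the first degree of approximation if the regression of y on x is linear and passes through the origin.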
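The variance of the estimator is
$$V(\bar{y}_{Rd}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(S_y^2 + R_N^2 S_x^2 - 2 R_N S_{xy}\right) \tag{3}$$
The estimate of variance is given as
$$\text{Est. } V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \left(\frac{1}{n'} - \frac{1}{N}\right) s_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right) \tag{4}$$
For large N,
$$\text{Est. } V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right) \tag{5}$$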
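The variance estimate can be written as one small function that covers both the finite-population form and the large-N form. This is a sketch only; the function name, argument names and summary figures are hypothetical.

```python
def ratio_variance_estimate(n, n1, s2_y, s2_x, s_xy, r_n, N=None):
    """Estimated variance of the two-phase ratio estimator.

    n1 is the first-phase size n', n the sub-sample size.
    With N given, the first term is (1/n1 - 1/N) * s2_y;
    with N=None the large-N form s2_y / n1 is used.
    """
    inv_N = 0.0 if N is None else 1.0 / N
    first = (1.0 / n1 - inv_N) * s2_y
    # Sample variance of the residuals y - R_n * x.
    residual = s2_y + r_n ** 2 * s2_x - 2.0 * r_n * s_xy
    second = (1.0 / n - 1.0 / n1) * residual
    return first + second

# Hypothetical figures for which the residual term is exactly zero.
args = dict(n=3, n1=6, s2_y=4.0, s2_x=16.0, s_xy=8.0, r_n=0.5)
v_large_n = ratio_variance_estimate(**args)
v_finite = ratio_variance_estimate(**args, N=60)
print(round(v_large_n, 4), round(v_finite, 4))  # 0.6667 0.6
```

In this toy case y is exactly proportional to x, so the second term vanishes and only the first-phase term remains.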
Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB for STA 453: Sampling Theory and Applications
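For the regression estimator, the two-phase sampling technique again consists in taking a large preliminary sample of size $n'$ to estimate the population mean $\bar{x}_N$, while a sub-sample of size $n$ is drawn from the $n'$ units to observe the character under study. The simplest biased regression estimator based on a preliminary sample of size $n'$ is given by
$$\bar{y}_{ld} = \bar{y}_n + b\,(\bar{x}_{n'} - \bar{x}_n) \tag{6}$$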
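The regression estimator adjusts the sub-sample mean of y by the difference between the two means of x. A minimal sketch follows, with a hypothetical function name and made-up figures.

```python
def two_phase_regression_estimate(y_bar_n, x_bar_n, x_bar_n1, b):
    """Two-phase regression estimate: y_bar_n + b * (x_bar_n1 - x_bar_n).

    x_bar_n1 is the first-phase mean of x (size n'); y_bar_n and x_bar_n
    are sub-sample means; b is the regression coefficient of y on x.
    """
    return y_bar_n + b * (x_bar_n1 - x_bar_n)

# Hypothetical figures: the sub-sample mean of x is first below, then above,
# the first-phase mean of 15.
adjusted_up = two_phase_regression_estimate(7.0, 14.0, 15.0, 0.5)
adjusted_down = two_phase_regression_estimate(7.0, 16.0, 15.0, 0.5)
print(adjusted_up, adjusted_down)  # 7.5 6.5
```

With a positive b, a sub-sample that under-represents x pulls the estimate up, and one that over-represents x pulls it down.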
Sl. No. of cut    Harvest yield
16                ...
17                ...
18                ...
19                ...
20                ...
Sub Total         ...
21                8.7
22                11.6
23                11.5
24                14.4
25                17.8
26                8.4
27                8.7
28                14.6
29                12.1
30                7.9
31                8.9
32                11.1
33                13.0
34                10.5
35                14.2
36                12.7
37                11.9
38                15.5
39                17.1
40                10.9
Total over all 40 cuts: $\sum_{i=1}^{n'} x'_i = 516.8$
(i) Estimate the average dry yield per hectare of maize, along with its sampling error, by utilizing the information on both harvest yield and dry yield, using (a) the ratio estimator in two-phase sampling, and (b) the regression estimator in two-phase sampling.
(ii) What would have been the loss in precision had we neglected the additional information on harvest yield and used only the sub-sample of 20 cuts for which dry yield is available?
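CALCULATION
Given: $n = 20$; $n' = 40$
$$\bar{x}_{n'} = \frac{\sum_{i=1}^{n'} x'_i}{n'} = \frac{516.8}{40} = 12.92; \quad \bar{y}_n = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{252.7}{20} = 12.64; \quad \bar{x}_n = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{275.3}{20} = 13.77$$
$$R_n = \frac{\bar{y}_n}{\bar{x}_n} = 0.9179$$
$$s_y^2 = \frac{\sum_{i=1}^{n} y_i^2 - n\,\bar{y}_n^2}{n-1}; \quad s_x^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}_n^2}{n-1}; \quad s_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}_n \bar{y}_n}{n-1}$$
$$s_y^2 = 5.45; \quad s_x^2 = 6.16; \quad s_{xy} = 5.78$$
$$b = \frac{s_{xy}}{s_x^2} = \frac{5.78}{6.16} = 0.9380; \quad r = \frac{s_{xy}}{s_x s_y} = \frac{5.78}{\sqrt{6.16 \times 5.45}} = 0.9968$$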
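The shortcut formulas for the sample variances, the covariance, b and r can be sketched as below. The individual observations of the example are not available here, so the function is run on a small hypothetical data set; the function name and the data are illustrative only.

```python
from math import sqrt

def summary_statistics(x, y):
    """Return the sample variances, covariance, b and r for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired, with at least 2 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Shortcut forms: (sum of squares or products - n * product of means) / (n - 1).
    s2_x = (sum(v * v for v in x) - n * x_bar ** 2) / (n - 1)
    s2_y = (sum(v * v for v in y) - n * y_bar ** 2) / (n - 1)
    s_xy = (sum(a * c for a, c in zip(x, y)) - n * x_bar * y_bar) / (n - 1)
    b = s_xy / s2_x               # regression coefficient of y on x
    r = s_xy / sqrt(s2_x * s2_y)  # correlation coefficient
    return {"s2_x": s2_x, "s2_y": s2_y, "s_xy": s_xy, "b": b, "r": r}

# Hypothetical paired observations (not the maize data).
stats = summary_statistics([2, 4, 6, 8], [1, 3, 2, 6])
print(round(stats["b"], 4), round(stats["r"], 4))  # 0.7 0.8367
```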
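(i) Estimate the average dry yield per hectare of dry maize
(a) Estimate of the average dry yield per hectare of dry maize by the ratio method of estimation in two-phase sampling
$$\bar{y}_{Rd} = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{x}_{n'}$$
$$\text{Est. } V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right)$$
$$\text{Est. } V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \frac{5.45}{40} + \left(\frac{1}{20} - \frac{1}{40}\right)\left(5.45 + 0.9179^2 \times 6.16 - 2 \times 0.9179 \times 5.78\right) = 0.1363 \times 10^{4}$$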
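The substitution above can be repeated numerically from the totals and summary statistics quoted in the calculation. This is a rough check only: the inputs are the rounded figures from the text, the result is in the units of the recorded data, and the power-of-ten factor attached to the variances in the text is not applied. The last digits may therefore differ from the quoted figure.

```python
# Totals and summary statistics as quoted in the worked example.
n, n1 = 20, 40                         # sub-sample size n and first-phase size n'
y_bar_n = 252.7 / n                    # sub-sample mean of dry yield
x_bar_n = 275.3 / n                    # sub-sample mean of harvest yield
x_bar_n1 = 516.8 / n1                  # first-phase mean of harvest yield
s2_y, s2_x, s_xy = 5.45, 6.16, 5.78    # rounded values from the text

r_n = y_bar_n / x_bar_n                # ratio R_n
y_rd = r_n * x_bar_n1                  # two-phase ratio estimate of the mean

# Large-N variance estimate of the ratio estimator.
residual = s2_y + r_n ** 2 * s2_x - 2.0 * r_n * s_xy
v_rd = s2_y / n1 + (1.0 / n - 1.0 / n1) * residual

print(f"R_n = {r_n:.4f}, ratio estimate = {y_rd:.2f}, v = {v_rd:.4f}")
```

The bracketed residual term is a small difference of much larger numbers, so it is sensitive to the rounding of the inputs.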
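(b) Estimate of the average dry yield per hectare of dry maize by the regression method of estimation in two-phase sampling
$$\bar{y}_{ld} = \bar{y}_n + b\,(\bar{x}_{n'} - \bar{x}_n)$$
$$\text{Est. } V(\bar{y}_{ld}) = v(\bar{y}_{ld}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)(1 - r^2)\, s_y^2$$
$$\text{Est. } V(\bar{y}_{ld}) = v(\bar{y}_{ld}) = \frac{5.45}{40} + \left(\frac{1}{20} - \frac{1}{40}\right)(1 - 0.9968^2) \times 5.45 = 0.1372 \times 10^{4}$$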
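The regression calculation can be checked in the same spirit. The sketch below uses the rounded values of b, r and the means as quoted in the text, a hypothetical function name, and no power-of-ten scaling, so it should be read as an approximate check rather than a reproduction of the printed figures.

```python
def regression_variance_estimate(n, n1, s2_y, r):
    """Large-N variance estimate of the two-phase regression estimator."""
    return s2_y / n1 + (1.0 / n - 1.0 / n1) * (1.0 - r ** 2) * s2_y

# Figures as quoted in the worked example (rounded).
n, n1 = 20, 40
y_bar_n = 252.7 / n
x_bar_n = 275.3 / n
x_bar_n1 = 516.8 / n1
b, r, s2_y = 0.9380, 0.9968, 5.45

y_ld = y_bar_n + b * (x_bar_n1 - x_bar_n)   # two-phase regression estimate
v_ld = regression_variance_estimate(n, n1, s2_y, r)

print(f"regression estimate = {y_ld:.2f}, v = {v_ld:.4f}")
```

Because r is close to 1, the second term is very small and the variance is dominated by the first-phase term.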
(c) Estimate of the variance of the average dry yield per hectare of dry maize under SRS wor
$$\text{Est. } V(\bar{y}_n) = v(\bar{y}_n) = \frac{s_y^2}{n}$$
$$\text{Est. } V(\bar{y}_n) = v(\bar{y}_n) = \frac{5.45}{20} = 0.277$$
(ii) What would have been the loss in precision had we neglected the additional information on harvest yield and used only the sub-sample of 20 cuts for which dry yield is available? That is, estimate the percentage gain in efficiency of the two-phase ratio and regression estimators with respect to the sample mean per element.
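(a) % gain in efficiency of the ratio estimator over the mean per unit under SRS wor
$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{Rd})} - 1\right) \times 100$$
where
$$v(\bar{y}_n) = \frac{s_y^2}{n} = \frac{5.45}{20} = 0.277 \times 10^{4}$$
$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{Rd})} - 1\right) \times 100 = \left(\frac{0.277 \times 10^{4}}{0.1363 \times 10^{4}} - 1\right) \times 100 = 103\%$$
(b) % gain in efficiency of the regression estimator over the mean per unit under SRS wor
$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{ld})} - 1\right) \times 100$$
$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{ld})} - 1\right) \times 100 = \left(\frac{0.277 \times 10^{4}}{0.1372 \times 10^{4}} - 1\right) \times 100 = 102\%$$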
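The percentage gain is a simple function of the two variance estimates. The sketch below, with a hypothetical function name, applies it to the variance figures quoted in the text; the common power-of-ten factor cancels in the ratio and is therefore omitted.

```python
def percent_gain(v_reference, v_estimator):
    """Percentage gain in efficiency of an estimator over a reference estimator."""
    return (v_reference / v_estimator - 1.0) * 100.0

# Variance estimates as quoted in the text (common scale factor dropped).
v_mean_per_unit = 0.277
v_ratio = 0.1363
v_regression = 0.1372

gain_ratio = percent_gain(v_mean_per_unit, v_ratio)
gain_regression = percent_gain(v_mean_per_unit, v_regression)
print(round(gain_ratio), round(gain_regression))  # 103 102
```

A gain near 100% means the mean per unit would need roughly twice the variance budget, that is, the two-phase estimators are about twice as efficient on these figures.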
EXERCISES
1. Define the (i) ratio and (ii) regression estimators in two-phase sampling.
2. State the variances, and the estimates of variance, of the (i) ratio and (ii) regression estimators.
3. For estimating the total cow population, a survey was conducted in a district in two consecutive years. The district has 50 villages in all. In the first year, a sample of 10 villages was selected with equal probability wor. In the second year, the survey was confined to a sub-sample of 5 villages selected from these 10 villages. The numbers of cows in the selected villages are given below:
Sl. No. Village
1
2
3
4
5
6
7
8
9
10
Estimate the total number of cows in the second year of the survey, with and without using the figures for the first year, and compare the efficiencies of the two estimates.