Business Mathematics in English 2
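6. DIFFERENTIAL EQUATIONS
1) $\left(\frac{dy}{dx}\right)^3 - \frac{dy}{dx} + 2y = e^x$
2) $\frac{d^2y}{dx^2} - 5\frac{dy}{dx} + 3y = 0$
3) $\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{3/2} = k\,\frac{d^2y}{dx^2}$
4) $x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = 0$
5) $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$
6) $\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = x + y$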
(1), (2) and (3) are ordinary differential equations and
(4), (5) and (6) are partial differential equations.
In this chapter we shall study ordinary differential equations
only.
6.1.1 Order and Degree of a Differential Equation
The order of the derivative of the highest order present in a
differential equation is called the order of the differential equation.
For example, consider the differential equation
$$x^2\left(\frac{d^2y}{dx^2}\right)^3 + 3\left(\frac{d^3y}{dx^3}\right)^2 + 7\frac{dy}{dx} - 4y = 0$$
The orders of $\frac{d^3y}{dx^3}$, $\frac{d^2y}{dx^2}$ and $\frac{dy}{dx}$ are 3, 2 and 1 respectively. So the highest order is 3. Thus the order of the differential equation is 3.
The degree (power) of the derivative of the highest order present in a differential equation is called the degree of the differential equation. Before the degree is read off, the differential coefficients should be made free from radicals and fractional exponents.
Thus the degree of
$$x^2\left(\frac{d^2y}{dx^2}\right)^3 + 3\left(\frac{d^3y}{dx^3}\right)^2 + 7\frac{dy}{dx} - 4y = 0$$
is 2, the power of the highest order derivative $\frac{d^3y}{dx^3}$.
Example 1
Write down the order and degree of the following differential equations.
(i) $\left(\frac{dy}{dx}\right)^3 - 4\frac{dy}{dx} + y = 3e^x$
(ii) $\left(\frac{d^2y}{dx^2}\right)^3 + 7\left(\frac{dy}{dx}\right)^4 = 3\sin x$
(iii) $\frac{d^2x}{dy^2} + a^2x = 0$
(iv) $\left(\frac{dy}{dx}\right)^2 - 3\frac{d^3y}{dx^3} + 7\frac{d^2y}{dx^2} + 4\frac{dy}{dx} - \log x = 0$
(v) $\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = 4x$
(vi) $\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{2/3} = \frac{d^2y}{dx^2}$
(vii) $\frac{d^2y}{dx^2} - \sqrt{\frac{dy}{dx}} = 0$
(viii) $\sqrt{1 + x^2} = \frac{dy}{dx}$
Solution :
The order and the degree respectively are,
(i) 1 ; 3 (ii) 2 ; 3 (iii) 2 ; 1 (iv) 3 ; 1
(v) 1 ; 2 (vi) 2 ; 3 (vii) 2 ; 2 (viii) 1 ; 1
Note
Before ascertaining the order and degree in (v), (vi) & (vii)
we made the differential coefficients free from radicals and fractional
exponents.
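The two definitions can be mechanised for equations that are polynomials in the derivatives. The Python sketch below assumes the equation has already been cleared of radicals and fractional exponents (as in the Note above) and uses an illustrative encoding that is not from the text: each term is a dict mapping a derivative order k (0 for y itself, 1 for dy/dx, 2 for d²y/dx², ...) to the power it carries, with coefficients and functions of x dropped.

```python
def order_and_degree(terms):
    """Return (order, degree) of a differential equation that is a polynomial
    in its derivatives.

    `terms` is a list of dicts, one per term; each dict maps a derivative
    order k (0 for y, 1 for dy/dx, 2 for d2y/dx2, ...) to its power in that term.
    """
    # Order: the highest derivative order that appears anywhere.
    order = max((k for term in terms for k in term if k > 0), default=0)
    # Degree: the highest power to which that derivative is raised.
    degree = max(term.get(order, 0) for term in terms)
    return order, degree


# x^2 (d2y/dx2)^3 + 3 (d3y/dx3)^2 + 7 dy/dx - 4y = 0
print(order_and_degree([{2: 3}, {3: 2}, {1: 1}, {0: 1}]))  # (3, 2)
```

For Example 1 (vi), cubing both sides gives $[1 + (dy/dx)^2]^2 = (d^2y/dx^2)^3$; encoding the expanded terms as `[{}, {1: 2}, {1: 4}, {2: 3}]` returns `(2, 3)`, in agreement with the answer above.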
6.1.2 Family of curves
Sometimes a family of curves can be represented by a single equation. In such a case the equation contains an arbitrary constant c. By assigning different values to c, we get the members of the family of curves. In this case c is called the parameter or arbitrary constant of the family.
Examples
(i) y = mx represents the equation of a family of straight lines through the origin, where m is the parameter.
(ii) $x^2 + y^2 = a^2$ represents the equation of a family of concentric circles having the origin as centre, where a is the parameter.
(iii) y = mx + c represents the equation of a family of straight lines in a plane, where m and c are parameters.
6.1.3 Formation of Ordinary Differential Equation
Consider the equation y = mx + c ---------(1)
where m is a constant and c is the parameter.
This represents a one-parameter family of parallel straight lines having the same slope m.
Differentiating (1) with respect to x, we get
$$\frac{dy}{dx} = m$$
This is the differential equation representing the above family of straight lines.
Similarly, for the equation $y = Ae^{5x}$ we form the differential equation $\frac{dy}{dx} = 5y$ by eliminating the arbitrary constant A (since $\frac{dy}{dx} = 5Ae^{5x} = 5y$).
The above functions represent one-parameter families. Each family has a differential equation. To obtain this differential equation, differentiate the equation of the family with respect to x, treating the parameter as a constant. If the derived equation is free from the parameter, then it is the differential equation of the family.
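A quick numerical sanity check of this idea, sketched in Python (the helper names are illustrative, not from the text): every member of the family $y = Ae^{5x}$ should satisfy $\frac{dy}{dx} = 5y$, whatever value the parameter A takes. The derivative is approximated by a central difference, so the residuals are only approximately zero.

```python
import math


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def residual(A, x):
    """dy/dx - 5y evaluated for the family member y = A e^(5x)."""
    def y(t):
        return A * math.exp(5 * t)
    return central_difference(y, x) - 5 * y(x)


# Whatever value the parameter A takes, the residual should be (numerically) zero.
for A in (-2.0, 0.5, 3.0):
    print(A, [residual(A, x) for x in (-0.3, 0.0, 0.4)])
```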
Note
(i) The differential equation of a two parameter family is obtained
by differentiating the equation of the family twice and by
eliminating the parameters.
(ii) In general, the order of the differential equation to be formed
is equal to the number of arbitrary constants present in the
equation of the family of curves.
Example 2
Form the differential equation of the family of curves y = A cos 5x + B sin 5x, where A and B are parameters.
Solution :
Given y = A cos 5x + B sin 5x
$$\frac{dy}{dx} = -5A\sin 5x + 5B\cos 5x$$
$$\frac{d^2y}{dx^2} = -25(A\cos 5x) - 25(B\sin 5x) = -25y$$
$$\therefore \frac{d^2y}{dx^2} + 25y = 0.$$
Example 3
Form the differential equation of the family of curves $y = ae^{3x} + be^{x}$, where a and b are parameters.
Solution :
$y = ae^{3x} + be^{x}$ ------------(1)
$\frac{dy}{dx} = 3ae^{3x} + be^{x}$ ------------(2)
$\frac{d^2y}{dx^2} = 9ae^{3x} + be^{x}$ ------------(3)
(2) − (1) ⇒ $\frac{dy}{dx} - y = 2ae^{3x}$ ------------(4)
(3) − (2) ⇒ $\frac{d^2y}{dx^2} - \frac{dy}{dx} = 6ae^{3x} = 3\left(\frac{dy}{dx} - y\right)$ [using (4)]
$\therefore \frac{d^2y}{dx^2} - 4\frac{dy}{dx} + 3y = 0$
Example 4
Find the differential equation of the family of curves given by y = a cos (mx + b), a and b being arbitrary constants.
Solution :
y = a cos (mx + b) ------------(1)
$\frac{dy}{dx} = -ma\sin(mx + b)$
$\frac{d^2y}{dx^2} = -m^2a\cos(mx + b) = -m^2y$ [using (1)]
$\therefore \frac{d^2y}{dx^2} + m^2y = 0$ is the required differential equation.
Example 5
Find the differential equation by eliminating the arbitrary constants a and b from y = a tan x + b sec x.
Solution :
y = a tan x + b sec x
Multiplying both sides by cos x we get,
y cos x = a sin x + b
Differentiating with respect to x we get
$y(-\sin x) + \frac{dy}{dx}\cos x = a\cos x$
Dividing by cos x,
$-y\tan x + \frac{dy}{dx} = a$ -----------(1)
Differentiating (1) with respect to x, we get
$$\frac{d^2y}{dx^2} - \frac{dy}{dx}\tan x - y\sec^2 x = 0$$
EXERCISE 6.1
1) Find the order and degree of the following :
(i) $x^2\frac{d^2y}{dx^2} - 3\frac{dy}{dx} + y = \cos x$
(ii) $\frac{d^3y}{dx^3} - 3\left(\frac{d^2y}{dx^2}\right)^2 + 5\frac{dy}{dx} = 0$
(iii) $\frac{d^2y}{dx^2} - \frac{dy}{dx} = 0$
(iv) $\left(1 + \frac{d^2y}{dx^2}\right)^{1/2} = \frac{dy}{dx}$
(v) $\left(1 + \frac{dy}{dx}\right)^{1/3} = \frac{d^2y}{dx^2}$
(vi) $1 + \frac{d^2y}{dx^2} = x\frac{dy}{dx}$
(vii) $\left(\frac{d^2y}{dx^2}\right)^{3/2} = \left(\frac{dy}{dx}\right)^{2}$
(viii) $3\frac{d^2y}{dx^2} + 5\left(\frac{dy}{dx}\right)^3 - 3y = e^x$
(ix) $\frac{d^2y}{dx^2} = 0$
(x) $\left(1 + \frac{d^2y}{dx^2}\right)^{2/3} = \left(\frac{dy}{dx}\right)^{1/3}$
2) Find the differential equation of the following
(i) y = mx (ii) $y = cx - c + c^2$
(iii) $y = mx + \frac{a}{m}$, where m is an arbitrary constant
(iv) y = mx + c, where m and c are arbitrary constants.
3) Form the differential equation of the family of rectangular hyperbolas whose asymptotes are the coordinate axes.
4) Find the differential equation of all circles $x^2 + y^2 + 2gx = 0$ which pass through the origin and whose centres are on the x-axis.
5) Form the differential equation of $y^2 = 4a(x + a)$, where a is the parameter.
6) Find the differential equation of the family of curves $y = ae^{2x} + be^{3x}$, where a and b are parameters.
7) Form the differential equation for y = a cos 3x + b sin 3x, where a and b are parameters.
8) Form the differential equation of $y = ae^{bx}$, where a and b are the arbitrary constants.
9) Find the differential equation for the family of concentric circles $x^2 + y^2 = a^2$, where a is the parameter.
6.2 FIRST ORDER DIFFERENTIAL EQUATIONS
6.2.1 Solution of a differential equation
A solution of a differential equation is an explicit or implicit
relation between the variables which satisfies the given differential
equation and does not contain any derivatives.
If the solution of a differential equation contains as many
arbitrary constants of integration as its order, then the solution is
said to be the general solution of the differential equation.
The solution obtained from the general solution by assigning particular values to the arbitrary constants is said to be a particular solution of the differential equation.
For example,
Differential equation | General solution | Particular solution
(i) $\frac{dy}{dx} = \sec^2 x$ | $y = \tan x + c$ (c is an arbitrary constant) | $y = \tan x - 5$
(ii) $\frac{dy}{dx} = x^2 + 2x$ | $y = \frac{x^3}{3} + x^2 + c$ | $y = \frac{x^3}{3} + x^2 + 8$
(iii) $\frac{d^2y}{dx^2} - 9y = 0$ | $y = Ae^{3x} + Be^{-3x}$ | $y = 5e^{3x} - 7e^{-3x}$
6.2.2 Variables Separable
If it is possible to rearrange the terms of a first order and first degree differential equation into two groups, each containing only one variable, the variables are said to be separable.
When the variables are separated, the differential equation takes the form f(x) dx + g(y) dy = 0, in which f(x) is a function of x only and g(y) is a function of y only.
Then the general solution is
$$\int f(x)\,dx + \int g(y)\,dy = c,$$
where c is an arbitrary constant.
For example, if $\frac{dy}{y} = \frac{dx}{x}$, then
$$\int \frac{dy}{y} = \int \frac{dx}{x} + k,$$
where k is a constant of integration, so
log y = log x + k.
The value of k varies from $-\infty$ to $\infty$.
This general solution can be expressed in a more convenient form by taking the constant of integration to be log c. This is possible because log c can also take all values between $-\infty$ and $\infty$, as k does. With this choice, the general solution takes the form
$$\log y - \log x = \log c \;\Rightarrow\; \log\left(\frac{y}{x}\right) = \log c$$
i.e. $\frac{y}{x} = c \Rightarrow y = cx$,
which is an elegant form of the solution of the differential equation.
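The separated form can also be checked numerically. The sketch below uses a hand-written classical fourth-order Runge-Kutta integrator (an illustrative choice, not a method used in the text; `rk4` is a hypothetical helper name) to integrate $\frac{dy}{dx} = \frac{y}{x}$ from the point (1, 2), for which c = 2, and compares the result with y = cx.

```python
def rk4(f, x0, y0, x1, steps=100):
    """Integrate dy/dx = f(x, y) from (x0, y0) to x1 with classical Runge-Kutta."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


# dy/y = dx/x, i.e. dy/dx = y/x; the curve through (1, 2) is y = 2x (c = 2).
y_at_3 = rk4(lambda x, y: y / x, 1.0, 2.0, 3.0)
print(y_at_3)  # approximately 6.0, the value of y = 2x at x = 3
```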
Note
(i) When y is absent, the general form of the first order, first degree differential equation reduces to $\frac{dy}{dx} = f(x)$ and therefore the solution is $y = \int f(x)\,dx + c$.
(ii) When x is absent, it reduces to $\frac{dy}{dx} = g(y)$ and in this case the solution is $\int \frac{dy}{g(y)} = \int dx + c$.
Example 6
Solve the differential equation x dy + y dx = 0
Solution :
x dy + y dx = 0; dividing by xy we get
$\frac{dy}{y} + \frac{dx}{x} = 0$. Then
$\int \frac{dy}{y} + \int \frac{dx}{x} = c_1$
$\log y + \log x = \log c \Rightarrow xy = c$
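Note
(i) $x\,dy + y\,dx = 0 \Rightarrow d(xy) = 0 \Rightarrow xy = c$, a constant.
(ii) $d\left(\frac{x}{y}\right) = \frac{y\,dx - x\,dy}{y^2}$, so $\int \frac{y\,dx - x\,dy}{y^2} = \int d\left(\frac{x}{y}\right) + c = \frac{x}{y} + c$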
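Example 7
Solve $\frac{dy}{dx} = e^{3x+y}$
Solution :
$\frac{dy}{dx} = e^{3x}e^{y} \Rightarrow e^{-y}\,dy = e^{3x}\,dx$
$\int e^{-y}\,dy = \int e^{3x}\,dx + c$
$-e^{-y} = \frac{e^{3x}}{3} + c$
$\therefore \frac{e^{3x}}{3} + e^{-y} = c$ (the arbitrary constant −c being written as c)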
Example 8
Solve $(x^2 - ay)\,dx = (ax - y^2)\,dy$
Solution :
Writing the equation as
$x^2\,dx + y^2\,dy = a(x\,dy + y\,dx)$
$x^2\,dx + y^2\,dy = a\,d(xy)$
$\int x^2\,dx + \int y^2\,dy = a\int d(xy) + c$
$\frac{x^3}{3} + \frac{y^3}{3} = a(xy) + c$
Hence the general solution is $x^3 + y^3 = 3axy + c$ (the arbitrary constant 3c being written as c).
Example 9
Solve $y\sqrt{1 + x^2}\,dy + x\sqrt{1 + y^2}\,dx = 0$
Solution :
$y\sqrt{1 + x^2}\,dy + x\sqrt{1 + y^2}\,dx = 0$ [dividing by $\sqrt{1 + x^2}\sqrt{1 + y^2}$]
$\frac{y}{\sqrt{1 + y^2}}\,dy + \frac{x}{\sqrt{1 + x^2}}\,dx = 0$
$\int \frac{y}{\sqrt{1 + y^2}}\,dy + \int \frac{x}{\sqrt{1 + x^2}}\,dx = c$
Put $1 + y^2 = t \Rightarrow 2y\,dy = dt$ and $1 + x^2 = u \Rightarrow 2x\,dx = du$:
$\frac{1}{2}\int t^{-1/2}\,dt + \frac{1}{2}\int u^{-1/2}\,du = c$
i.e. $\sqrt{t} + \sqrt{u} = c$ or $\sqrt{1 + y^2} + \sqrt{1 + x^2} = c$
Note : This problem can also be solved by using
$\int [f(x)]^n f'(x)\,dx = \frac{[f(x)]^{n+1}}{n+1}$ (n ≠ −1)
Example 10
Solve (sin x + cos x) dy + (cos x − sin x) dx = 0
Solution :
The given equation can be written as
$dy + \frac{\cos x - \sin x}{\sin x + \cos x}\,dx = 0$
$\int dy + \int \frac{\cos x - \sin x}{\sin x + \cos x}\,dx = c$
$y + \log(\sin x + \cos x) = c$
Example 11
Solve $x\frac{dy}{dx} + \cos y = 0$, given $y = \frac{\pi}{4}$ when $x = \sqrt{2}$
Solution :
$x\,dy = -\cos y\,dx$
$\int \sec y\,dy = -\int \frac{dx}{x} + k$, where k is a constant of integration.
$\log(\sec y + \tan y) + \log x = \log c$, where k = log c
or $x(\sec y + \tan y) = c$.
When $x = \sqrt{2}$, $y = \frac{\pi}{4}$, we have
$\sqrt{2}\left(\sec\frac{\pi}{4} + \tan\frac{\pi}{4}\right) = c$ or $c = \sqrt{2}(\sqrt{2} + 1) = 2 + \sqrt{2}$
The particular solution is $x(\sec y + \tan y) = 2 + \sqrt{2}$
Example 12
The marginal cost function for producing x units is $MC = 23 + 16x - 3x^2$ and the total cost for producing 1 unit is Rs.40. Find the total cost function and the average cost function.
Solution :
Let C(x) be the total cost function, where x is the number of units of output. Then
$\frac{dC}{dx} = MC = 23 + 16x - 3x^2$
$\int dC = \int (23 + 16x - 3x^2)\,dx + k$
$C = 23x + 8x^2 - x^3 + k$, where k is a constant
At x = 1, C(x) = 40 (given)
$23(1) + 8(1)^2 - (1)^3 + k = 40 \Rightarrow k = 10$
Total cost function $C(x) = 23x + 8x^2 - x^3 + 10$
Average cost function $= \frac{\text{Total cost function}}{x} = \frac{23x + 8x^2 - x^3 + 10}{x}$
Average cost function $= 23 + 8x - x^2 + \frac{10}{x}$
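The same steps can be written as a short Python sketch (function names are illustrative): integrate MC term by term, fix the constant from C(1) = 40, and divide by x to get the average cost.

```python
def total_cost(x, k=10):
    """C(x) = 23x + 8x^2 - x^3 + k, an antiderivative of MC = 23 + 16x - 3x^2."""
    return 23 * x + 8 * x**2 - x**3 + k


def average_cost(x):
    """AC(x) = C(x) / x = 23 + 8x - x^2 + 10/x."""
    return total_cost(x) / x


# The constant is fixed by the condition C(1) = 40.
k = 40 - total_cost(1, k=0)
print(k)                # 10
print(total_cost(1))    # 40
print(average_cost(2))  # 40.0, i.e. 23 + 16 - 4 + 5
```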
Example 13
What is the general form of the demand equation which has a constant elasticity of 1?
Solution :
Let x be the quantity demanded at price p. Then the elasticity is given by
$\eta_d = -\frac{p}{x}\frac{dx}{dp}$
Given $-\frac{p}{x}\frac{dx}{dp} = 1 \Rightarrow \frac{dx}{x} = -\frac{dp}{p}$
$\int \frac{dx}{x} = -\int \frac{dp}{p} + \log k$
$\log x = -\log p + \log k$, where k is a constant.
$\log x = \log\frac{k}{p} \Rightarrow x = \frac{k}{p} \Rightarrow p = \frac{k}{x}$
i.e. px = k, where k is a constant.
Example 14
The relationship between the cost of operating a warehouse and the number of units of items stored in it is given by $\frac{dC}{dx} = ax + b$, where C is the monthly cost of operating the warehouse and x is the number of units of items in storage. Find C as a function of x if $C = C_0$ when x = 0.
Solution :
Given $\frac{dC}{dx} = ax + b \Rightarrow dC = (ax + b)\,dx$
$\int dC = \int (ax + b)\,dx + k$ (k is a constant)
$C = \frac{ax^2}{2} + bx + k$ ------------(1)
When x = 0, $C = C_0$:
(1) ⇒ $C_0 = \frac{a}{2}(0) + b(0) + k \Rightarrow k = C_0$
Hence the cost function is given by
$C = \frac{a}{2}x^2 + bx + C_0$
Example 15
The slope of a curve at any point is the reciprocal of twice the ordinate of the point. The curve also passes through the point (4, 3). Find the equation of the curve.
Solution :
The slope of the curve at any point P(x, y) is the slope of the tangent at P(x, y).
$\therefore \frac{dy}{dx} = \frac{1}{2y} \Rightarrow 2y\,dy = dx$
$\int 2y\,dy = \int dx + c \Rightarrow y^2 = x + c$
Since the curve passes through (4, 3), we have
$9 = 4 + c \Rightarrow c = 5$
Equation of the curve is $y^2 = x + 5$
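As a numerical cross-check (a sketch only, using a hand-written Runge-Kutta integrator whose name `rk4` is illustrative), integrating $\frac{dy}{dx} = \frac{1}{2y}$ from (4, 3) should land on the curve $y^2 = x + 5$, for example at x = 11, where $y = \sqrt{16} = 4$.

```python
import math


def rk4(f, x0, y0, x1, steps=200):
    """Integrate dy/dx = f(x, y) from (x0, y0) to x1 with classical Runge-Kutta."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


# Slope = reciprocal of twice the ordinate; the curve passes through (4, 3).
y_at_11 = rk4(lambda x, y: 1 / (2 * y), 4.0, 3.0, 11.0)
print(y_at_11, math.sqrt(11 + 5))  # both approximately 4.0
```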
EXERCISE 6.2
1) Solve (i) $\frac{dy}{dx} + \frac{1 - y^2}{1 - x^2} = 0$ (ii) $\frac{dy}{dx} = \frac{1 + y^2}{1 + x^2}$
(iii) $\frac{dy}{dx} = \frac{y + 2}{x - 1}$ (iv) $x\sqrt{1 + y^2} + y\sqrt{1 + x^2}\,\frac{dy}{dx} = 0$
2) Solve (i) $\frac{dy}{dx} = e^{2x - y} + x^3e^{-y}$ (ii) $(1 - e^x)\sec^2 y\,dy + 3e^x\tan y\,dx = 0$
3) Solve (i) $\frac{dy}{dx} = 2xy + 2ax$ (ii) $x(y^2 + 1)\,dx + y(x^2 + 1)\,dy = 0$
(iii) $(x^2 - yx^2)\frac{dy}{dx} + y^2 + xy^2 = 0$
4) Solve (i) $x\,dy + y\,dx + 4\sqrt{1 - x^2y^2}\,dx = 0$ (ii) $y\,dx - x\,dy + 3x^2y^2e^{x^3}\,dx = 0$
5) Solve (i) $\frac{dy}{dx} = \frac{y^2 + 4y + 5}{x^2 - 2x + 2}$ (ii) $\frac{dy}{dx} + \frac{y^2 + y + 1}{x^2 + x + 1} = 0$
6) Find the equation of the curve whose slope at the point (x, y) is $3x^2 + 2$, if the curve passes through the point (1, −1).
7) The gradient of the curve at any point (x, y) is proportional to its abscissa. Find the equation of the curve if it passes through the points (0, 0) and (1, 1).
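8) Solve : $\sin^{-1}x\,dy + \frac{y}{\sqrt{1 - x^2}}\,dx = 0$
…
$\frac{2v}{1 - v^2}\,dv = \frac{dx}{x}$ or $\int \frac{-2v}{1 - v^2}\,dv = -\int \frac{dx}{x} + c_1$
$\log(1 - v^2) = -\log x + \log c$ [since $\int \frac{f'(x)}{f(x)}\,dx = \log f(x)$]
or $\log(1 - v^2) + \log x = \log c \Rightarrow (1 - v^2)x = c$
Replacing v by $\frac{y}{x}$, we get
$\left(1 - \frac{y^2}{x^2}\right)x = c$ or $x^2 - y^2 = cx$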
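Example 17
Solve : $(x^3 + y^3)\,dx = (x^2y + xy^2)\,dy$
Solution :
The given equation can be written as
$\frac{dy}{dx} = \frac{x^3 + y^3}{x^2y + xy^2}$ -------------- (1)
Put y = vx ⇒ $\frac{dy}{dx} = v + x\frac{dv}{dx}$
(1) ⇒ $v + x\frac{dv}{dx} = \frac{1 + v^3}{v + v^2}$
$x\frac{dv}{dx} = \frac{1 + v^3}{v + v^2} - v = \frac{1 - v^2}{v(1 + v)} = \frac{(1 - v)(1 + v)}{v(1 + v)} = \frac{1 - v}{v}$
$\therefore \int \frac{v}{1 - v}\,dv = \int \frac{1}{x}\,dx + c$ or $\int \frac{-(1 - v) + 1}{1 - v}\,dv = \int \frac{1}{x}\,dx + c$
$\int \left(-1 + \frac{1}{1 - v}\right)dv = \int \frac{1}{x}\,dx + c$
$-v - \log(1 - v) = \log x + c$
Replacing v by $\frac{y}{x}$, we get
$-\frac{y}{x} - \log(x - y) + \log x = \log x + c$
$\therefore \frac{y}{x} + \log(x - y) = c$ (the arbitrary constant −c being written as c)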
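Example 18
Solve $x\frac{dy}{dx} = y + \sqrt{x^2 + y^2}$
Solution :
Now, $\frac{dy}{dx} = \frac{y + \sqrt{x^2 + y^2}}{x}$ ------------(1)
Put y = vx ⇒ $\frac{dy}{dx} = v + x\frac{dv}{dx}$
(1) ⇒ $v + x\frac{dv}{dx} = \frac{vx + \sqrt{x^2 + v^2x^2}}{x} = v + \sqrt{1 + v^2}$
$x\frac{dv}{dx} = \sqrt{1 + v^2}$ or $\frac{dv}{\sqrt{1 + v^2}} = \frac{dx}{x}$
$\int \frac{dv}{\sqrt{1 + v^2}} = \int \frac{dx}{x} + c_1$
$\log\left(v + \sqrt{1 + v^2}\right) = \log x + \log c$
$\therefore v + \sqrt{1 + v^2} = cx$
i.e. $\frac{y}{x} + \sqrt{1 + \frac{y^2}{x^2}} = cx$ or $y + \sqrt{x^2 + y^2} = cx^2$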
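The solution can be verified by substitution. For c > 0, solving $y + \sqrt{x^2 + y^2} = cx^2$ for y gives the explicit form $y = \frac{1}{2}\left(cx^2 - \frac{1}{c}\right)$ (a small derivation added here, not in the text). The Python sketch below substitutes it back into $x\frac{dy}{dx} = y + \sqrt{x^2 + y^2}$, with dy/dx approximated by a central difference; the function names are illustrative.

```python
import math


def y_explicit(x, c):
    """Explicit form of y + sqrt(x^2 + y^2) = c x^2, valid for c > 0."""
    return (c * x**2 - 1 / c) / 2


def residual(x, c, h=1e-6):
    """x dy/dx - (y + sqrt(x^2 + y^2)), with dy/dx by central difference."""
    y = y_explicit(x, c)
    dydx = (y_explicit(x + h, c) - y_explicit(x - h, c)) / (2 * h)
    return x * dydx - (y + math.hypot(x, y))


for c in (0.5, 1.0, 2.0):
    print(c, [residual(x, c) for x in (0.5, 1.0, 3.0)])  # all approximately 0
```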
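Example 19
Solve $(x + y)\,dy + (x - y)\,dx = 0$
Solution :
The equation is $\frac{dy}{dx} = -\left(\frac{x - y}{x + y}\right)$ ------------ (1)
Put y = vx ⇒ $\frac{dy}{dx} = v + x\frac{dv}{dx}$; we get
$v + x\frac{dv}{dx} = -\left(\frac{x - vx}{x + vx}\right)$ or $v + x\frac{dv}{dx} = -\left(\frac{1 - v}{1 + v}\right)$
i.e. $x\frac{dv}{dx} = -\left(\frac{1 - v}{1 + v} + v\right)$ or $x\frac{dv}{dx} = -\frac{1 - v + v + v^2}{1 + v} = -\frac{1 + v^2}{1 + v}$
$\therefore \int \frac{1 + v}{1 + v^2}\,dv = -\int \frac{1}{x}\,dx$ or $\int \frac{dv}{1 + v^2} + \frac{1}{2}\int \frac{2v}{1 + v^2}\,dv = -\int \frac{1}{x}\,dx + c$
$\tan^{-1}v + \frac{1}{2}\log(1 + v^2) = -\log x + c$
i.e. $\tan^{-1}\left(\frac{y}{x}\right) + \frac{1}{2}\log\left(\frac{x^2 + y^2}{x^2}\right) = -\log x + c$
$\tan^{-1}\left(\frac{y}{x}\right) + \frac{1}{2}\log(x^2 + y^2) - \frac{1}{2}\log x^2 = -\log x + c$
i.e. $\tan^{-1}\left(\frac{y}{x}\right) + \frac{1}{2}\log(x^2 + y^2) = c$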
Example 20
The net profit p and quantity x satisfy the differential equation $\frac{dp}{dx} = \frac{2p^3 - x^3}{3xp^2}$. Find the relationship between the net profit and demand given that p = 20 when x = 10.
Solution :
$\frac{dp}{dx} = \frac{2p^3 - x^3}{3xp^2}$ -------------(1)
is a differential equation in x and p of homogeneous type.
Put p = vx ⇒ $\frac{dp}{dx} = v + x\frac{dv}{dx}$
(1) ⇒ $v + x\frac{dv}{dx} = \frac{2v^3 - 1}{3v^2}$
$x\frac{dv}{dx} = \frac{2v^3 - 1}{3v^2} - v = -\left[\frac{1 + v^3}{3v^2}\right]$
$\therefore \int \frac{3v^2}{1 + v^3}\,dv = -\int \frac{dx}{x} + \log k$
$\log(1 + v^3) = -\log x + \log k$, where k is a constant
$\log(1 + v^3) = \log\frac{k}{x}$ i.e. $1 + v^3 = \frac{k}{x}$
Replacing v by $\frac{p}{x}$, we get
$x^3 + p^3 = kx^2$
But when x = 10, it is given that p = 20
$\therefore (10)^3 + (20)^3 = k(10)^2 \Rightarrow k = 90$
$\therefore x^3 + p^3 = 90x^2 \Rightarrow p^3 = x^2(90 - x)$ is the required relationship.
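A short Python check of the result (a sketch; the function names are illustrative): $p = [x^2(90 - x)]^{1/3}$ should give p = 20 at x = 10 and should satisfy $\frac{dp}{dx} = \frac{2p^3 - x^3}{3xp^2}$ for 0 < x < 90, with dp/dx approximated by a central difference.

```python
def p_of_x(x):
    """Net profit from p^3 = x^2 (90 - x), valid for 0 < x < 90."""
    return (x**2 * (90 - x)) ** (1 / 3)


def residual(x, h=1e-5):
    """dp/dx - (2p^3 - x^3) / (3 x p^2), with dp/dx by central difference."""
    p = p_of_x(x)
    dpdx = (p_of_x(x + h) - p_of_x(x - h)) / (2 * h)
    return dpdx - (2 * p**3 - x**3) / (3 * x * p**2)


print(p_of_x(10))  # approximately 20 (the given condition)
for x in (10, 30, 60):
    print(x, residual(x))  # approximately 0
```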
Example 21
The rate of increase in the cost C of ordering and holding as the size q of the order increases is given by the differential equation $\frac{dC}{dq} = \frac{C^2 + 2Cq}{q^2}$. Find the relationship between C and q if C = 1 when q = 1.
Solution :
$\frac{dC}{dq} = \frac{C^2 + 2Cq}{q^2}$ ------------(1)
This is a homogeneous equation in C and q.
Put C = vq ⇒ $\frac{dC}{dq} = v + q\frac{dv}{dq}$
(1) ⇒ $v + q\frac{dv}{dq} = \frac{v^2q^2 + 2vq^2}{q^2} = v^2 + 2v$
$q\frac{dv}{dq} = v^2 + v = v(v + 1)$
$\frac{dv}{v(v + 1)} = \frac{dq}{q}$, so $\int \frac{(v + 1) - v}{v(v + 1)}\,dv = \int \frac{dq}{q} + k$, k is a constant
$\int \frac{dv}{v} - \int \frac{dv}{v + 1} = \int \frac{dq}{q} + \log k$
$\log v - \log(v + 1) = \log q + \log k$
$\log\frac{v}{1 + v} = \log qk$ or $\frac{v}{1 + v} = kq$
Replacing v by $\frac{C}{q}$ we get $C = kq(C + q)$
When C = 1 and q = 1,
$C = kq(C + q) \Rightarrow k = \frac{1}{2}$
$\therefore C = \frac{q(C + q)}{2}$ is the relation between C and q.
Example 22
The total cost of production y and the level of output x are related to the marginal cost of production by the equation $(6x^2 + 2y^2)\,dx - (x^2 + 4xy)\,dy = 0$. What is the relation between total cost and output if y = 2 when x = 1?
Solution :
Given $(6x^2 + 2y^2)\,dx = (x^2 + 4xy)\,dy$
$\frac{dy}{dx} = \frac{6x^2 + 2y^2}{x^2 + 4xy}$ ------------(1)
is a homogeneous equation in x and y.
Put y = vx ⇒ $\frac{dy}{dx} = v + x\frac{dv}{dx}$
(1) ⇒ $v + x\frac{dv}{dx} = \frac{6 + 2v^2}{1 + 4v}$, so $x\frac{dv}{dx} = \frac{6 - v - 2v^2}{1 + 4v}$
or $\frac{1 + 4v}{6 - v - 2v^2}\,dv = \frac{1}{x}\,dx$
$\int \frac{1 + 4v}{6 - v - 2v^2}\,dv = \int \frac{1}{x}\,dx + \log k$, where k is a constant
$-\log(6 - v - 2v^2) = \log x + \log k = \log kx$
$\frac{1}{6 - v - 2v^2} = kx$
$x = c(6x^2 - xy - 2y^2)$, where c = k and $v = \frac{y}{x}$
When x = 1 and y = 2, 1 = c(6 − 2 − 8) ⇒ $c = -\frac{1}{4}$
$\therefore 4x = 2y^2 + xy - 6x^2$
EXERCISE 6.3
1) Solve the following differential equations
(i) $\frac{dy}{dx} = \frac{y}{x} - \frac{y^2}{x^2}$ (ii) $2\frac{dy}{dx} = \frac{y}{x} - \frac{y^2}{x^2}$
(iii) $\frac{dy}{dx} = \frac{xy - 2y^2}{x^2 - 3xy}$ (iv) $x(y - x)\frac{dy}{dx} = y^2$
(v) $\frac{dy}{dx} = \frac{y^2 - 2xy}{x^2 - 2xy}$ (vi) $\frac{dy}{dx} = \frac{xy}{x^2 - y^2}$
(vii) $(x + y)^2\,dx = 2x^2\,dy$ (viii) $x\frac{dy}{dx} = y + \sqrt{x^2 + y^2}$
2) The rate of increase in the cost C of ordering and holding as the size q of the order increases is given by the differential equation $\frac{dC}{dq} = \frac{C^2 + 2Cq}{q^2}$. Find the relationship between C and q if C = 4 when q = 2.
3) The total cost of production y and the level of output x are related to the marginal cost of production by the equation $\frac{dy}{dx} = \frac{4x^2 - y^2}{2xy}$. What is the total cost function if y = 4 when x = 2?
6.2.5 First order linear differential equation
A first order differential equation is said to be linear when the dependent variable and its derivatives occur only in the first degree and no product of these occurs.
An equation of the form
$$\frac{dy}{dx} + Py = Q,$$
where P and Q are functions of x only, is called a first order linear differential equation.
For example,
(i) $\frac{dy}{dx} + 3y = x^3$; here P = 3, Q = $x^3$
(ii) $\frac{dy}{dx} + y\tan x = \cos x$; P = tan x, Q = cos x
(iii) $\frac{dy}{dx} - \frac{3y}{x} = xe^x$; P = $-\frac{3}{x}$, Q = $xe^x$
(iv) $(1 + x^2)\frac{dy}{dx} + xy = (1 + x^2)^3$; P = $\frac{x}{1 + x^2}$, Q = $(1 + x^2)^2$
are first order linear differential equations.
6.2.6 Integrating factor (I.F)
A given differential equation may not be integrable as such, but it may become integrable when it is multiplied by a suitable function. Such a function is called the integrating factor (I.F). Hence an integrating factor is one which changes a differential equation into one which is directly integrable.
Let us show that $e^{\int P\,dx}$ is the integrating factor for
$\frac{dy}{dx} + Py = Q$ ---------(1)
where P and Q are functions of x.
Now,
$$\frac{d}{dx}\left(ye^{\int P\,dx}\right) = \frac{dy}{dx}e^{\int P\,dx} + y\frac{d}{dx}\left(e^{\int P\,dx}\right)$$
$$= \frac{dy}{dx}e^{\int P\,dx} + ye^{\int P\,dx}\frac{d}{dx}\left(\int P\,dx\right)$$
$$= \frac{dy}{dx}e^{\int P\,dx} + ye^{\int P\,dx}P = \left(\frac{dy}{dx} + Py\right)e^{\int P\,dx}$$
When (1) is multiplied by $e^{\int P\,dx}$, it becomes
$$\left(\frac{dy}{dx} + Py\right)e^{\int P\,dx} = Qe^{\int P\,dx}$$
$$\Rightarrow \frac{d}{dx}\left(ye^{\int P\,dx}\right) = Qe^{\int P\,dx}$$
Integrating this, we have
$ye^{\int P\,dx} = \int Qe^{\int P\,dx}\,dx + c$ -------------(2)
So $e^{\int P\,dx}$ is the integrating factor of the differential equation.
Note
(i) $e^{\log f(x)} = f(x)$ when f(x) > 0
(ii) If Q = 0 in $\frac{dy}{dx} + Py = Q$, then the general solution is y (I.F) = c, where c is a constant.
(iii) For the differential equation $\frac{dx}{dy} + Px = Q$, where P and Q are functions of y alone, the I.F is $e^{\int P\,dy}$ and the solution is
$x\,(\text{I.F}) = \int Q\,(\text{I.F})\,dy + c$
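Formula (2) can be evaluated numerically when the integrals have no convenient closed form. The Python sketch below uses a hand-rolled composite Simpson's rule (an illustrative choice; `simpson` and `solve_linear` are hypothetical helper names). Taking the lower limit of every integral at the initial point $x_0$ fixes the constant as $c = y_0$, so the formula becomes $y(x)\,I(x) = y_0 + \int_{x_0}^{x} Q(t)I(t)\,dt$ with $I(x) = e^{\int_{x_0}^{x} P\,dt}$. As a check it is applied to $\frac{dy}{dx} + y\tan x = \sec x$, whose solution by the method above is $y\sec x = \tan x + c$, i.e. $y = \sin x + c\cos x$.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def solve_linear(P, Q, x0, y0, x):
    """y(x) for dy/dx + P(x) y = Q(x) with y(x0) = y0, via the integrating factor."""
    def integrating_factor(t):
        # e^(integral of P from x0 to t)
        return math.exp(simpson(P, x0, t))

    integral = simpson(lambda t: Q(t) * integrating_factor(t), x0, x)
    return (y0 + integral) / integrating_factor(x)


# dy/dx + y tan x = sec x with y(0) = 2; exact solution y = sin x + 2 cos x.
y_num = solve_linear(math.tan, lambda t: 1 / math.cos(t), 0.0, 2.0, 1.0)
print(y_num, math.sin(1.0) + 2.0 * math.cos(1.0))
```

The nested quadrature is slow for long intervals, but it keeps the sketch a direct transcription of formula (2).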
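Example 23
Solve the equation $(1 - x^2)\frac{dy}{dx} - xy = 1$
Solution :
The given equation is $(1 - x^2)\frac{dy}{dx} - xy = 1$
$\Rightarrow \frac{dy}{dx} - \frac{x}{1 - x^2}y = \frac{1}{1 - x^2}$
This is of the form $\frac{dy}{dx} + Py = Q$,
where $P = -\frac{x}{1 - x^2}$; $Q = \frac{1}{1 - x^2}$
$\text{I.F} = e^{\int P\,dx} = e^{\int \frac{-x}{1 - x^2}\,dx} = e^{\frac{1}{2}\log(1 - x^2)} = \sqrt{1 - x^2}$
The general solution is
$y\,(\text{I.F}) = \int Q\,(\text{I.F})\,dx + c$
$y\sqrt{1 - x^2} = \int \frac{1}{1 - x^2}\sqrt{1 - x^2}\,dx + c = \int \frac{dx}{\sqrt{1 - x^2}} + c$
$y\sqrt{1 - x^2} = \sin^{-1}x + c$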
Example 24
Solve $\frac{dy}{dx} + ay = e^x$ (where $a \ne -1$)
Solution :
The given equation is of the form $\frac{dy}{dx} + Py = Q$.
Here P = a; Q = $e^x$
$\text{I.F} = e^{\int P\,dx} = e^{ax}$
The general solution is
$y\,(\text{I.F}) = \int Q\,(\text{I.F})\,dx + c$
$ye^{ax} = \int e^xe^{ax}\,dx + c = \int e^{(a+1)x}\,dx + c$
$ye^{ax} = \frac{e^{(a+1)x}}{a + 1} + c$
Example 25
Solve $\cos x\frac{dy}{dx} + y\sin x = 1$
Solution :
The given equation can be reduced to
$\frac{dy}{dx} + y\frac{\sin x}{\cos x} = \frac{1}{\cos x}$ or $\frac{dy}{dx} + y\tan x = \sec x$
Here P = tan x; Q = sec x
$\text{I.F} = e^{\int \tan x\,dx} = e^{\log\sec x} = \sec x$
The general solution is
$y\,(\text{I.F}) = \int Q\,(\text{I.F})\,dx + c$
$y\sec x = \int \sec^2 x\,dx + c$
$y\sec x = \tan x + c$
Example 26
A bank pays interest by treating the annual interest as the instantaneous rate of change of the principal. A man invests Rs.50,000 in a bank deposit which accrues interest at 6.5% per year compounded continuously. How much will he get after 10 years? (Given: $e^{0.65} = 1.9155$)
Solution :
Let P(t) denote the amount of money in the account at time t. Then the differential equation governing the growth of money is
$\frac{dP}{dt} = \frac{6.5}{100}P = 0.065P$
$\int \frac{dP}{P} = \int 0.065\,dt + c$
$\log_e P = 0.065t + c \Rightarrow P = e^{0.065t}e^{c}$
$P = c_1e^{0.065t}$, where $c_1 = e^c$ -------------(1)
At t = 0, P = 50000.
(1) ⇒ $50000 = c_1e^{0}$ or $c_1 = 50000$
$\therefore P = 50000e^{0.065t}$
At t = 10, $P = 50000e^{0.065 \times 10} = 50000e^{0.65} = 50000 \times 1.9155 = \text{Rs.}95{,}775$.
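The arithmetic can be reproduced in a few lines of Python. This sketch compares the value obtained with `math.exp` against the four-figure value of $e^{0.65}$ given in the question, which is why the two amounts differ by about two rupees.

```python
import math

principal = 50_000
rate = 0.065   # 6.5% per year, compounded continuously
years = 10

amount_exact = principal * math.exp(rate * years)  # P = 50000 e^(0.065 t) at t = 10
amount_table = principal * 1.9155                  # using the given value of e^0.65
print(round(amount_exact, 2), round(amount_table, 2))
```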
Example 27
Solve $\frac{dy}{dx} + y\cos x = \frac{1}{2}\sin 2x$
Solution :
Here P = cos x; Q = $\frac{1}{2}\sin 2x$
$\int P\,dx = \int \cos x\,dx = \sin x$
$\text{I.F} = e^{\int P\,dx} = e^{\sin x}$
The general solution is
$y\,(\text{I.F}) = \int Q\,(\text{I.F})\,dx + c$
$ye^{\sin x} = \int \frac{1}{2}\sin 2x\,e^{\sin x}\,dx + c$
$= \int \sin x\cos x\,e^{\sin x}\,dx + c$
Let sin x = t; then cos x dx = dt.
$= \int te^t\,dt + c = e^t(t - 1) + c$
$= e^{\sin x}(\sin x - 1) + c$
Example 28
A manufacturing company has found that the cost C of operating and maintaining the equipment is related to the length m of intervals between overhauls by the equation $m^2\frac{dC}{dm} + 2mC = 2$, and C = 4 when m = 2. Find the relationship between C and m.
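Solution :
Given $m^2\frac{dC}{dm} + 2mC = 2$ or $\frac{dC}{dm} + \frac{2}{m}C = \frac{2}{m^2}$
This is a first order linear differential equation of the form $\frac{dy}{dx} + Py = Q$, where $P = \frac{2}{m}$; $Q = \frac{2}{m^2}$
$\text{I.F} = e^{\int P\,dm} = e^{\int \frac{2}{m}\,dm} = e^{\log m^2} = m^2$
General solution is
$C\,(\text{I.F}) = \int Q\,(\text{I.F})\,dm + k$
…
$\int P\,dx = \int -\frac{10}{x}\,dx = -10\log x = \log\left(\frac{1}{x^{10}}\right)$
$\text{I.F} = e^{\int P\,dx} = e^{\log\left(\frac{1}{x^{10}}\right)} = \frac{1}{x^{10}}$
General solution is
$C\,(\text{I.F}) = \int -\frac{10}{x^2}\cdot\frac{1}{x^{10}}\,dx + k$ or $\frac{C}{x^{10}} = \frac{10}{11}\left(\frac{1}{x^{11}}\right) + k$
When $C = C_0$, $x = x_0$:
$\frac{C_0}{x_0^{10}} = \frac{10}{11}\left(\frac{1}{x_0^{11}}\right) + k \Rightarrow k = \frac{C_0}{x_0^{10}} - \frac{10}{11}\cdot\frac{1}{x_0^{11}}$
The solution is
$\frac{C}{x^{10}} = \frac{10}{11}\left(\frac{1}{x^{11}}\right) + \left[\frac{C_0}{x_0^{10}} - \frac{10}{11}\cdot\frac{1}{x_0^{11}}\right]$
$\frac{C}{x^{10}} - \frac{C_0}{x_0^{10}} = \frac{10}{11}\left(\frac{1}{x^{11}} - \frac{1}{x_0^{11}}\right)$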
EXERCISE 6.4
1) Solve the following differential equations
(i) $\frac{dy}{dx} + y\cot x = \operatorname{cosec} x$
(ii) $\frac{dy}{dx} - \sin 2x = y\cot x$
(iii) $\frac{dy}{dx} + y\cot x = \sin 2x$
(iv) $\frac{dy}{dx} + y\cot x = 4x\operatorname{cosec} x$, if y = 0 when $x = \frac{\pi}{2}$
(v) $\frac{dy}{dx} - 3y\cot x = \sin 2x$, and y = 2 when $x = \frac{\pi}{2}$
(vi) $x\frac{dy}{dx} - 3y = x^2$
(vii) $\frac{dy}{dx} + \frac{2xy}{1 + x^2} = \frac{1}{(1 + x^2)^2}$, given that y = 0 when x = 1
(viii) $\frac{dy}{dx} - y\tan x = e^x\sec x$
(ix) $\log x\,\frac{dy}{dx} + \frac{y}{x} = \sin 2x$
2) A man plans to invest some amount in a small saving scheme with a guaranteed compound interest, compounded continuously at the rate of 12 percent for 5 years. How much should he invest if he wants an amount of Rs.25000 at the end of the 5-year period? ($e^{-0.6} = 0.5488$)
3) Equipment maintenance and operating cost C are related to the overhaul interval x by the equation $x^2\frac{dC}{dx} - (b - 1)Cx = ba$, where a, b are constants and $C = C_0$ when $x = x_0$. Find the relationship between C and x.
4) The change in the cost of ordering and holding C as the quantity q changes is given by $\frac{dC}{dq} = a - \frac{C}{q}$, where a is a constant. Find C as a function of q if $C = C_0$ when $q = q_0$.
6.3 SECOND ORDER LINEAR DIFFERENTIAL EQUATIONS WITH CONSTANT COEFFICIENTS
The general form of a linear second order differential equation with constant coefficients is
$$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x).$$
We shall consider the cases f(x) = 0 and $f(x) = Ke^{\alpha x}$.
For example,
(i) $3\frac{d^2y}{dx^2} - 5\frac{dy}{dx} + 6y = 0$ (or) $3y'' - 5y' + 6y = 0$
(ii) $\frac{d^2y}{dx^2} - 4\frac{dy}{dx} + 3y = e^{5x}$ (or) $(D^2 - 4D + 3)y = e^{5x}$, where $D = \frac{d}{dx}$
(iii) $\frac{d^2y}{dx^2} + \frac{dy}{dx} - y = 7$ (or) $(D^2 + D - 1)y = 7$
are second order linear differential equations.
6.3.1 Auxiliary equations and Complementary functions
For the differential equation $a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x)$, $am^2 + bm + c = 0$ is said to be the auxiliary equation. This is a quadratic equation in m. According to the nature of the roots $m_1$ and $m_2$ of the auxiliary equation, we write the complementary function (C.F) as follows.
Nature of roots | Complementary function
(i) Real and unequal ($m_1 \ne m_2$) | $Ae^{m_1x} + Be^{m_2x}$
(ii) Real and equal ($m_1 = m_2 = m$, say) | $(Ax + B)e^{mx}$
(iii) Complex roots ($\alpha \pm i\beta$) | $e^{\alpha x}(A\cos\beta x + B\sin\beta x)$
(In all the cases, A and B are arbitrary constants.)
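The three cases of the table can be selected mechanically from the discriminant $b^2 - 4ac$ of the auxiliary equation. A minimal Python sketch (the function name and the tuple it returns are illustrative conventions, not from the text):

```python
import math


def complementary_function(a, b, c):
    """Classify the C.F. of a y'' + b y' + c y = f(x) from a m^2 + b m + c = 0.

    Returns (kind, p, q):
      ("distinct", m1, m2)      -> A e^(m1 x) + B e^(m2 x)
      ("equal", m, m)           -> (Ax + B) e^(m x)
      ("complex", alpha, beta)  -> e^(alpha x) (A cos(beta x) + B sin(beta x))
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        return "distinct", (-b + root) / (2 * a), (-b - root) / (2 * a)
    if disc == 0:
        m = -b / (2 * a)
        return "equal", m, m
    return "complex", -b / (2 * a), math.sqrt(-disc) / abs(2 * a)


print(complementary_function(3, -5, 2))    # distinct roots 1 and 2/3
print(complementary_function(16, -24, 9))  # ('equal', 0.75, 0.75)
print(complementary_function(1, -6, 25))   # ('complex', 3.0, 4.0)
```

The exact comparison `disc == 0` is reliable for integer coefficients; with measured decimal coefficients a small tolerance would be needed.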
6.3.2 Particular Integral (P.I)
Consider $(aD^2 + bD + c)y = e^{\alpha x}$.
Let $f(D) = aD^2 + bD + c$.
Case 1 : If $f(\alpha) \ne 0$, then α is not a root of the auxiliary equation f(m) = 0.
Rule : $\text{P.I} = \frac{1}{f(D)}e^{\alpha x} = \frac{1}{f(\alpha)}e^{\alpha x}$.
Case 2 : If $f(\alpha) = 0$, then α satisfies the auxiliary equation f(m) = 0. We proceed as follows.
(i) Let the auxiliary equation have two distinct roots $m_1$ and $m_2$, and let $\alpha = m_1$.
Then $f(m) = a(m - m_1)(m - m_2) = a(m - \alpha)(m - m_2)$
Rule : $\text{P.I} = \frac{1}{a(D - \alpha)(D - m_2)}e^{\alpha x} = \frac{1}{a(\alpha - m_2)}xe^{\alpha x}$
(ii) Let the auxiliary equation have two equal roots, each equal to α, i.e. $m_1 = m_2 = \alpha$.
$\therefore f(m) = a(m - \alpha)^2$
Rule : $\text{P.I} = \frac{1}{a(D - \alpha)^2}e^{\alpha x} = \frac{1}{a}\cdot\frac{x^2}{2!}e^{\alpha x}$
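Cases 1 and 2 can be folded into one small function. The Python sketch below uses an equivalent restatement that is not the text's own wording: when α is a simple root, $a(\alpha - m_2)$ equals $f'(\alpha) = 2a\alpha + b$, and when α is a double root the factor $\frac{1}{a}\cdot\frac{x^2}{2!}$ is $\frac{x^2}{2a}$. Coefficients are kept exact with Python's `fractions.Fraction`; the returned pair (coefficient, k) stands for coefficient · $x^k e^{\alpha x}$, and the function name is illustrative.

```python
from fractions import Fraction


def particular_integral(a, b, c, K, alpha):
    """P.I. of (aD^2 + bD + c) y = K e^(alpha x) as (coefficient, k),
    meaning coefficient * x^k * e^(alpha x)."""
    f = a * alpha**2 + b * alpha + c   # f(alpha)
    f1 = 2 * a * alpha + b             # f'(alpha) = a(alpha - m2) when alpha = m1
    if f != 0:                         # Case 1: alpha is not a root
        return Fraction(K) / f, 0
    if f1 != 0:                        # Case 2(i): alpha is one of two distinct roots
        return Fraction(K) / f1, 1
    return Fraction(K) / (2 * a), 2    # Case 2(ii): alpha is a double root


print(particular_integral(1, -5, 6, 1, 5))   # (Fraction(1, 6), 0): e^(5x)/6
print(particular_integral(1, 10, 25, 1, -5)) # (Fraction(1, 2), 2): (x^2/2) e^(-5x)
```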
6.3.3 The General solution
The general solution of a second order linear differential
equation is y = Complementary function (C.F) + Particular
integral (P.I)
Example 30
Solve $3\frac{d^2y}{dx^2} - 5\frac{dy}{dx} + 2y = 0$
Solution :
The auxiliary equation is $3m^2 - 5m + 2 = 0$
$(3m - 2)(m - 1) = 0$
The roots are $m_1 = \frac{2}{3}$ and $m_2 = 1$ (real and distinct).
The complementary function is
C.F = $Ae^{\frac{2}{3}x} + Be^{x}$
The general solution is
$y = Ae^{\frac{2}{3}x} + Be^{x}$
Example 31
Solve $(16D^2 - 24D + 9)y = 0$
Solution :
The auxiliary equation is $16m^2 - 24m + 9 = 0$
$(4m - 3)^2 = 0 \Rightarrow m = \frac{3}{4}, \frac{3}{4}$
The roots are real and equal.
The C.F is $(Ax + B)e^{\frac{3}{4}x}$
The general solution is $y = (Ax + B)e^{\frac{3}{4}x}$
Example 32
Solve $(D^2 - 6D + 25)y = 0$
Solution :
The auxiliary equation is $m^2 - 6m + 25 = 0$
$m = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{6 \pm \sqrt{36 - 100}}{2} = \frac{6 \pm 8i}{2} = 3 \pm 4i$
The roots are complex and of the form $\alpha \pm i\beta$ with α = 3 and β = 4.
C.F = $e^{\alpha x}(A\cos\beta x + B\sin\beta x) = e^{3x}(A\cos 4x + B\sin 4x)$
The general solution is
$y = e^{3x}(A\cos 4x + B\sin 4x)$
Example 33
Solve $\frac{d^2y}{dx^2} - 5\frac{dy}{dx} + 6y = e^{5x}$
Solution :
The auxiliary equation is $m^2 - 5m + 6 = 0 \Rightarrow m = 3, 2$
Complementary function C.F = $Ae^{3x} + Be^{2x}$
P.I = $\frac{1}{D^2 - 5D + 6}e^{5x} = \frac{1}{25 - 25 + 6}e^{5x} = \frac{1}{6}e^{5x}$
The general solution is
y = C.F + P.I
$y = Ae^{3x} + Be^{2x} + \frac{e^{5x}}{6}$
Example 34
Solve $\frac{d^2y}{dx^2} + 4\frac{dy}{dx} + 4y = 2e^{-3x}$
Solution :
The auxiliary equation is $m^2 + 4m + 4 = 0 \Rightarrow m = -2, -2$
Complementary function is C.F = $(Ax + B)e^{-2x}$
P.I = $\frac{1}{D^2 + 4D + 4}2e^{-3x} = \frac{1}{(-3)^2 + 4(-3) + 4}2e^{-3x} = 2e^{-3x}$
The general solution is
y = C.F + P.I
$y = (Ax + B)e^{-2x} + 2e^{-3x}$
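Example 35
Solve $\frac{d^2y}{dx^2} - 2\frac{dy}{dx} + 4y = 5 + 3e^{-x}$
Solution :
The auxiliary equation is $m^2 - 2m + 4 = 0$
$m = \frac{2 \pm \sqrt{4 - 16}}{2} = \frac{2 \pm 2\sqrt{3}\,i}{2} = 1 \pm i\sqrt{3}$
C.F = $e^{x}(A\cos\sqrt{3}x + B\sin\sqrt{3}x)$
$\text{P.I}_1 = \frac{1}{D^2 - 2D + 4}5e^{0x} = \frac{1}{4}5e^{0x} = \frac{5}{4}$
$\text{P.I}_2 = \frac{1}{D^2 - 2D + 4}3e^{-x} = \frac{1}{(-1)^2 - 2(-1) + 4}3e^{-x} = \frac{3}{7}e^{-x}$
…
$\ldots\,\frac{5}{2} = \frac{1}{10}$
$\text{P.I}_2 = \frac{1}{D^2 + 10D + 25}e^{-5x} = \frac{1}{(D + 5)^2}e^{-5x} = \frac{x^2}{2!}e^{-5x} = \frac{x^2}{2}(e^{-5x})$
The general solution is
$y = \text{C.F} + \text{P.I}_1 + \text{P.I}_2$
$y = (Ax + B)e^{-5x} + \frac{1}{10} + \frac{x^2}{2}e^{-5x}$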
Example 38
Suppose that the quantity demanded is $Q_d = 42 - 4p - 4\frac{dp}{dt} + \frac{d^2p}{dt^2}$ and the quantity supplied is $Q_s = -6 + 8p$, where p is the price. Find the equilibrium price for market clearance.
Solution :
For market clearance, the required condition is $Q_d = Q_s$.
$\therefore 42 - 4p - 4\frac{dp}{dt} + \frac{d^2p}{dt^2} = -6 + 8p$
$48 - 12p - 4\frac{dp}{dt} + \frac{d^2p}{dt^2} = 0$
$\frac{d^2p}{dt^2} - 4\frac{dp}{dt} - 12p = -48$
The auxiliary equation is $m^2 - 4m - 12 = 0$
$\Rightarrow m = 6, -2$
C.F = $Ae^{6t} + Be^{-2t}$
P.I = $\frac{1}{D^2 - 4D - 12}(-48)e^{0t} = \frac{1}{-12}(-48) = 4$
The general solution is
p = C.F + P.I
$p = Ae^{6t} + Be^{-2t} + 4$
EXERCISE 6.5
1) Solve :
(i) $\frac{d^2y}{dx^2} - 10\frac{dy}{dx} + 24y = 0$ (ii) $\frac{d^2y}{dx^2} + \frac{dy}{dx} = 0$
(iii) $\frac{d^2y}{dx^2} + 4y = 0$ (iv) $\frac{d^2y}{dx^2} + 4\frac{dy}{dx} + 4y = 0$
2) Solve :
(i) $(3D^2 + 7D - 6)y = 0$ (ii) $(4D^2 - 12D + 9)y = 0$
(iii) $(3D^2 - D + 1)y = 0$
3) Solve :
(i) $(D^2 - 13D + 12)y = e^{2x} + 5e^{x}$
(ii) $(D^2 - 5D + 6)y = e^{x} + 3e^{2x}$
(iii) $(D^2 - 14D + 49)y = 3 + e^{7x}$
(iv) $(15D^2 - 2D - 1)y = e^{x/3}$
4) Suppose that $Q_d = 30 - 5P + 2\frac{dP}{dt} + \frac{d^2P}{dt^2}$ and $Q_s = 6 + 3P$. Find the equilibrium price for market clearance.
EXERCISE 6.6
Choose the correct answer
1) The differential equation of straight lines passing through the origin is
(a) $x\frac{dy}{dx} = y$ (b) $\frac{dy}{dx} = \frac{x}{y}$ (c) $\frac{dy}{dx} = 0$ (d) $x\frac{dy}{dx} = \frac{1}{y}$
2) The degree and order of the differential equation $\frac{d^2y}{dx^2} - 6\frac{dy}{dx} = 0$ are
(a) 2 and 1 (b) 1 and 2 (c) 2 and 2 (d) 1 and 1
3) The order and degree of the differential equation $\left(\frac{dy}{dx}\right)^2 - 3\frac{d^3y}{dx^3} + 7\frac{d^2y}{dx^2} + \frac{dy}{dx} = x + \log x$ are
(a) 1 and 3 (b) 3 and 1 (c) 2 and 3 (d) 3 and 2
4) The order and degree of $\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{2/3} = \frac{d^2y}{dx^2}$ are
(a) 3 and 2 (b) 2 and 3 (c) 3 and 3 (d) 2 and 2
5) The solution of x dy + y dx = 0 is
(a) x + y = c (b) $x^2 + y^2 = c$ (c) xy = c (d) y = cx
6) The solution of x dx + y dy = 0 is
(a) $x^2 + y^2 = c$ (b) $\frac{x}{y} = c$ (c) $x^2 - y^2 = c$ (d) xy = c
7) The solution of $\frac{dy}{dx} = e^{x-y}$ is
(a) $e^y - e^x = c$ (b) $y = \log ce^x$ (c) $y = \log(e^x + c)$ (d) $e^{x+y} = c$
8) The solution of $\frac{dp}{dt} = ke^{-t}$ (k is a constant) is
(a) $c - \frac{k}{e^t} = p$ (b) $p = ke^t + c$ (c) $t = \log\frac{c - p}{k}$ (d) $t = \log\frac{p}{c}$
9) In the differential equation $(x^2 - y^2)\,dy = 2xy\,dx$, if we make the substitution y = vx, then the equation is transformed into
(a) $\frac{1 + v^2}{v + v^3}\,dv = \frac{dx}{x}$ (b) $\frac{1 - v^2}{v(1 + v^2)}\,dv = \frac{dx}{x}$ (c) $\frac{dv}{v^2 - 1} = \frac{dx}{x}$ (d) $\frac{dv}{1 + v^2} = \frac{dx}{x}$
10) When y = vx, the differential equation $x\frac{dy}{dx} = y + \sqrt{x^2 + y^2}$ reduces to
(a) $\frac{dv}{\sqrt{v^2 - 1}} = \frac{dx}{x}$ (b) $\frac{v\,dv}{\sqrt{v^2 + 1}} = \frac{dx}{x}$ (c) $\frac{dv}{\sqrt{v^2 + 1}} = \frac{dx}{x}$ (d) $\frac{v\,dv}{\sqrt{1 - v^2}} = \frac{dx}{x}$
11) The solution of the equation of the type $\frac{dy}{dx} + Py = 0$ (P is a function of x) is given by
(a) $ye^{\int P\,dx} = c$ (b) $y\int P\,dx = c$ (c) $xe^{\int P\,dx} = y$ (d) y = cx
12) The solution of the equation of the type $\frac{dx}{dy} + Px = Q$ (P and Q are functions of y) is
(a) $y = \int Qe^{\int P\,dx}\,dy + c$ (b) $ye^{\int P\,dx} = \int Qe^{\int P\,dx}\,dx + c$ (c) $xe^{\int P\,dy} = \int Qe^{\int P\,dy}\,dy + c$ (d) $xe^{\int P\,dy} = \int Qe^{\int P\,dx}\,dx + c$
13) The integrating factor of $x\frac{dy}{dx} - y = e^x$ is
(a) log x (b) $\frac{1}{e^x}$ (c) $\frac{1}{x}$ (d) $-\frac{1}{x}$
14) The integrating factor of $(1 + x^2)\frac{dy}{dx} + xy = (1 + x^2)^3$ is
(a) $\sqrt{1 + x^2}$ (b) $\log(1 + x^2)$ (c) $e^{\tan^{-1}x}$ (d) $\log(\tan^{-1}x)$
15) The integrating factor of $\frac{dy}{dx} + \frac{2y}{x} = x^3$ is
(a) 2 log x (b) $e^{x^2}$ (c) $3\log(x^2)$ (d) $x^2$
16) The complementary function of the differential equation $(D^2 - D)y = e^x$ is
(a) $A + Be^{x}$ (b) $(Ax + B)e^{x}$ (c) $A + Be^{-x}$ (d) $(A + Bx)e^{-x}$
17) The complementary function of the differential equation $(D^2 - 2D + 1)y = e^{2x}$ is
(a) $Ae^{x} + Be^{-x}$ (b) $A + Be^{x}$ (c) $(Ax + B)e^{x}$ (d) $A + Be^{-x}$
18) The particular integral of the differential equation $\frac{d^2y}{dx^2} - 5\frac{dy}{dx} + 6y = e^{5x}$ is
(a) $\frac{e^{5x}}{6}$ (b) $\frac{xe^{5x}}{2!}$ (c) $6e^{5x}$ (d) $\frac{e^{5x}}{25}$
19) The particular integral of the differential equation $\frac{d^2y}{dx^2} - 6\frac{dy}{dx} + 9y = e^{3x}$ is
(a) $\frac{e^{3x}}{2!}$ (b) $\frac{x^2e^{3x}}{2!}$ (c) $\frac{xe^{3x}}{2!}$ (d) $9e^{3x}$
20) The solution of $\frac{d^2y}{dx^2} - y = 0$ is
(a) $(A + B)e^{x}$ (b) $(Ax + B)e^{x}$ (c) $Ae^{x} + \frac{B}{e^{x}}$ (d) $(A + Bx)e^{x}$
7. INTERPOLATION AND FITTING A STRAIGHT LINE
7.1 INTERPOLATION
Interpolation is the art of reading between the lines in a table. It means inserting or filling up intermediate values of a function from a given set of values of the function. The following table represents the population of a town in the decennial census.
Year : 1910 1920 1930 1940 1950
Population (in thousands) : 12 15 20 27 39
The process of finding the population for the years 1914, 1923, 1939, 1947, etc. with the help of the above data is called interpolation. The process of finding the population for the years 1955, 1960, etc. is known as extrapolation.
The following assumptions are to be kept in mind for interpolation :
(i) The values of the function should be either in increasing order or in decreasing order.
(ii) The rise or fall in the values should be uniform. In other words, there should be no sudden jumps or falls in the value of the function during the period under consideration.
The following methods are used in interpolation :
1) Graphic method, 2) Algebraic method
7.1.1 Graphic method of interpolation
Let y = f(x). Then we can plot a graph of the different values of x against the corresponding values of y. From the graph we can find the value of y for a given x.
Example 1
From the following data, estimate the population for the year 1986 graphically.
Year                      : 1960  1970  1980  1990  2000
Population (in thousands) :   12    15    20    26    33
Solution :
From the graph, the population for 1986 is found to be 24 thousand.
[Figure: population (in thousands) plotted against year through (1960, 12), (1970, 15), (1980, 20), (1990, 26) and (2000, 33); the curve is read at (1986, 24).]
Example 2
Using the graphic method, find the value of y when x = 27 from the following data.
x : 10  15  20  25  30
y : 35  32  29  26  23
Solution :
From the graph, the value of y when x = 27 is 24.8.
[Figure: y plotted against x through (10, 35), (15, 32), (20, 29), (25, 26) and (30, 23); the line is read at (27, 24.8).]
7.1.2 Algebraic methods of interpolation
There are many mathematical methods of interpolation. Of these, we shall study the following:
(i) Finite differences
(ii) Gregory-Newton's formula
(iii) Lagrange's formula
7.1.3 Finite differences
Consider the arguments $x_0, x_1, x_2, \ldots, x_n$ and the entries $y_0, y_1, y_2, \ldots, y_n$, where $y = f(x)$ is the function used in interpolation.
Let us assume that the x-values are in increasing order and equally spaced with step length h.
Then the values of x may be taken to be $x_0, x_0 + h, x_0 + 2h, \ldots, x_0 + nh$, and the function assumes the values $f(x_0), f(x_0 + h), f(x_0 + 2h), \ldots, f(x_0 + nh)$.
Forward difference operator
For any value of x, the forward difference operator $\Delta$ (delta) is defined by
$$\Delta f(x) = f(x + h) - f(x).$$
In particular, $\Delta y_0 = \Delta f(x_0) = f(x_0 + h) - f(x_0) = y_1 - y_0$.
$\Delta f(x), \Delta f(x + h), \Delta f(x + 2h), \ldots$ are the first order differences of f(x).
Consider
$$\begin{aligned}
\Delta^2 f(x) &= \Delta[\Delta f(x)] = \Delta[f(x + h) - f(x)] \\
&= \Delta f(x + h) - \Delta f(x) \\
&= [f(x + 2h) - f(x + h)] - [f(x + h) - f(x)] \\
&= f(x + 2h) - 2f(x + h) + f(x).
\end{aligned}$$
$\Delta^2 f(x), \Delta^2 f(x + h), \Delta^2 f(x + 2h), \ldots$ are the second order differences of f(x).
In a similar manner, the higher order differences $\Delta^3 f(x), \Delta^4 f(x), \ldots, \Delta^n f(x), \ldots$ are all defined.
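In practice the differences are arranged in a difference table. Each column holds the successive differences of the column before it. Here is a minimal Python sketch; the name `difference_table` is ours, not the text's.

```python
def difference_table(ys):
    """Return [ys, Δys, Δ²ys, ...]; column k holds the k-th order differences."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# y-values tabulated in Example 5 (x = 0, 1, 2, 3, 4)
for order, column in enumerate(difference_table([176, 185, 194, 202, 212])):
    print(order, column)
```

The first entries of the columns (9, 0, -1, 4) are $\Delta y_0, \Delta^2 y_0, \Delta^3 y_0, \Delta^4 y_0$. The last entries of the same columns (10, 2, 3, 4) are the backward differences $\nabla y_n, \nabla^2 y_n, \ldots$ used later, since $\nabla^k y_n = \Delta^k y_{n-k}$. With integer or `fractions.Fraction` inputs the table is exact.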
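Backward difference operator
For any value of x, the backward difference operator $\nabla$ (nabla) is defined by
$$\nabla f(x) = f(x) - f(x - h).$$
In particular, $\nabla y_n = \nabla f(x_n) = f(x_n) - f(x_n - h) = y_n - y_{n-1}$.
$\nabla f(x), \nabla f(x + h), \nabla f(x + 2h), \ldots$ are the first order backward differences of f(x).
Consider
$$\begin{aligned}
\nabla^2 f(x) &= \nabla[\nabla f(x)] = \nabla[f(x) - f(x - h)] \\
&= \nabla f(x) - \nabla f(x - h) \\
&= [f(x) - f(x - h)] - [f(x - h) - f(x - 2h)] \\
&= f(x) - 2f(x - h) + f(x - 2h).
\end{aligned}$$
$\nabla^2 f(x), \nabla^2 f(x + h), \nabla^2 f(x + 2h), \ldots$ are the second order backward differences of f(x).
In a similar manner, the higher order backward differences $\nabla^3 f(x), \nabla^4 f(x), \ldots, \nabla^n f(x), \ldots$ are all defined.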
Shifting operator
For any value of x, the shifting operator E is defined by
$$E[f(x)] = f(x + h).$$
In particular, $E(y_0) = E[f(x_0)] = f(x_0 + h) = y_1$.
Further, $E^2[f(x)] = E[E\{f(x)\}] = E[f(x + h)] = f(x + 2h)$.
Similarly, $E^3[f(x)] = f(x + 3h)$.
In general, $E^n[f(x)] = f(x + nh)$.
The relation between $\Delta$ and E
We have
$$\Delta f(x) = f(x + h) - f(x) = E f(x) - f(x) = (E - 1) f(x).$$
Hence $\Delta = E - 1$, i.e. $E = 1 + \Delta$.
Results
1) The differences of a constant function are zero.
2) If f(x) is a polynomial of the nth degree in x, then the nth difference of f(x) is constant and $\Delta^{n+1} f(x) = 0$.
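Since $\Delta = E - 1$, the binomial theorem applied to $(E - 1)^n$ gives $\Delta^n y_0 = \sum_{k=0}^{n} (-1)^k \binom{n}{k} y_{n-k}$. This is the identity used in Examples 3 and 4. The sketch below evaluates it; the helper name `nth_difference` is hypothetical. It also illustrates Result 2 on the cubes $y = x^3$ (h = 1), whose third differences are constant and whose fourth differences vanish.

```python
from math import comb

def nth_difference(ys, n, start=0):
    """Δ^n y_start computed from (E - 1)^n: sum of (-1)^k C(n, k) y_{start+n-k}."""
    return sum((-1) ** k * comb(n, k) * ys[start + n - k] for k in range(n + 1))

cubes = [x ** 3 for x in range(1, 8)]                   # 1, 8, 27, ..., 343
print([nth_difference(cubes, 3, s) for s in range(4)])  # [6, 6, 6, 6]
print([nth_difference(cubes, 4, s) for s in range(3)])  # [0, 0, 0]
```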
Example 3
Find the missing term from the following data.
x    : 1    2   3    4
f(x) : 100  --  126  157
Solution :
Since three values of f(x) are given, we assume that the polynomial is of degree two. Hence the third order differences are zero:
$$\Delta^3 f(x_0) = 0, \text{ i.e. } \Delta^3 y_0 = 0$$
$$\therefore (E - 1)^3 y_0 = 0 \qquad (\because \Delta = E - 1)$$
$$(E^3 - 3E^2 + 3E - 1) y_0 = 0$$
$$y_3 - 3y_2 + 3y_1 - y_0 = 0$$
$$157 - 3(126) + 3y_1 - 100 = 0$$
$$3y_1 = 321, \quad y_1 = 107$$
i.e. the missing term is 107.
Example 4
Estimate the production for 1962 and 1965 from the following data.
Year                 : 1961  1962  1963  1964  1965  1966  1967
Production (in tons) :  200    --   260   306    --   390   430
Solution :
Since five values of f(x) are given, we assume that the polynomial is of degree four. Hence the fifth order differences are zero:
$$\Delta^5 f(x_0) = 0, \text{ i.e. } \Delta^5 y_0 = 0$$
$$\therefore (E - 1)^5 y_0 = 0$$
$$(E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1) y_0 = 0$$
$$y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0$$
$$390 - 5y_4 + 10(306) - 10(260) + 5y_1 - 200 = 0$$
$$y_1 - y_4 = -130 \qquad \text{(1)}$$
Since the fifth order differences are zero, we also have
$$\Delta^5 f(x_1) = 0, \text{ i.e. } \Delta^5 y_1 = 0$$
$$\therefore (E - 1)^5 y_1 = 0$$
$$(E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1) y_1 = 0$$
$$y_6 - 5y_5 + 10y_4 - 10y_3 + 5y_2 - y_1 = 0$$
$$430 - 5(390) + 10y_4 - 10(306) + 5(260) - y_1 = 0$$
$$10y_4 - y_1 = 3280 \qquad \text{(2)}$$
Adding (1) and (2) gives $9y_4 = 3150$, so $y_4 = 350$. Then (1) gives $y_1 = 220$.
The productions for 1962 and 1965 are 220 tons and 350 tons respectively.
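Examples 3 and 4 follow one recipe:
- Assume that the differences of order m vanish, where m is the number of known values.
- Write one equation $\Delta^m y_j = 0$, expanded as $(E - 1)^m y_j = 0$, for each unknown.
- Solve the resulting linear equations.

The sketch below automates the recipe with exact fractions. `fill_missing` is a hypothetical helper, missing entries are passed as `None`, and the data are assumed equally spaced.

```python
from fractions import Fraction
from math import comb

def fill_missing(ys):
    """Fill the None entries of equally spaced data, assuming Δ^m y = 0
    where m is the number of known values (as in Examples 3 and 4)."""
    missing = [i for i, y in enumerate(ys) if y is None]
    m = len(ys) - len(missing)
    if len(missing) - 1 + m > len(ys) - 1:
        raise ValueError("not enough entries to write one equation per unknown")
    col = {idx: c for c, idx in enumerate(missing)}   # unknown index -> column
    rows = []
    for j in range(len(missing)):          # one equation Δ^m y_j = 0 per unknown
        coeffs, rhs = [Fraction(0)] * len(missing), Fraction(0)
        for k in range(m + 1):             # (E - 1)^m: term (-1)^k C(m, k) y_{j+m-k}
            idx, c = j + m - k, (-1) ** k * comb(m, k)
            if idx in col:
                coeffs[col[idx]] += c
            else:
                rhs -= c * ys[idx]
        rows.append(coeffs + [rhs])
    n = len(rows)                          # Gauss-Jordan elimination
    for p in range(n):
        pivot = next(r for r in range(p, n) if rows[r][p] != 0)
        rows[p], rows[pivot] = rows[pivot], rows[p]
        pv = rows[p][p]
        rows[p] = [v / pv for v in rows[p]]
        for r in range(n):
            if r != p:
                f = rows[r][p]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[p])]
    filled = list(ys)
    for idx, c in col.items():
        filled[idx] = rows[c][-1]
    return filled

print(fill_missing([100, None, 126, 157]))                   # Example 3
print(fill_missing([200, None, 260, 306, None, 390, 430]))   # Example 4
```

Exact fractions keep the answers 107, 220 and 350 free of rounding. A general linear solver is used so that any number of gaps can be handled in the same way.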
7.1.4 Derivation of Gregory-Newton's forward formula
Let the function y = f(x) be a polynomial of degree n which assumes (n + 1) values $f(x_0), f(x_1), f(x_2), \ldots, f(x_n)$, where $x_0, x_1, x_2, \ldots, x_n$ are in increasing order and equally spaced.
Let $x_1 - x_0 = x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h$ (a positive quantity).
Here $f(x_0) = y_0, f(x_1) = y_1, \ldots, f(x_n) = y_n$.
Now f(x) can be written as
$$f(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)(x - x_1)\cdots(x - x_{n-1}) \qquad \text{(1)}$$
When $x = x_0$, (1) gives
$$f(x_0) = a_0, \text{ i.e. } a_0 = y_0.$$
When $x = x_1$, (1) gives
$$f(x_1) = a_0 + a_1(x_1 - x_0)$$
i.e. $y_1 = y_0 + a_1 h$
$$\therefore a_1 = \frac{y_1 - y_0}{h} = \frac{\Delta y_0}{h}.$$
When $x = x_2$, (1) gives
$$f(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1)$$
$$y_2 = y_0 + \frac{\Delta y_0}{h}(2h) + a_2(2h)(h)$$
$$2h^2 a_2 = y_2 - y_0 - 2\Delta y_0 = y_2 - y_0 - 2(y_1 - y_0) = y_2 - 2y_1 + y_0 = \Delta^2 y_0$$
$$\therefore a_2 = \frac{\Delta^2 y_0}{2!\,h^2}.$$
In the same way we can obtain
$$a_3 = \frac{\Delta^3 y_0}{3!\,h^3}, \quad a_4 = \frac{\Delta^4 y_0}{4!\,h^4}, \quad \ldots, \quad a_n = \frac{\Delta^n y_0}{n!\,h^n}.$$
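Substituting these values in (1), we get
$$f(x) = y_0 + \frac{\Delta y_0}{h}(x - x_0) + \frac{\Delta^2 y_0}{2!\,h^2}(x - x_0)(x - x_1) + \cdots + \frac{\Delta^n y_0}{n!\,h^n}(x - x_0)(x - x_1)\cdots(x - x_{n-1}) \qquad \text{(2)}$$
Denoting $\frac{x - x_0}{h}$ by u, we get
$$x - x_0 = hu$$
$$x - x_1 = (x - x_0) - (x_1 - x_0) = hu - h = h(u - 1)$$
$$x - x_2 = (x - x_0) - (x_2 - x_0) = hu - 2h = h(u - 2)$$
$$x - x_3 = h(u - 3)$$
In general, $x - x_{n-1} = h\{u - (n - 1)\}$.
Thus (2) becomes
$$f(x) = y_0 + \frac{u}{1!}\Delta y_0 + \frac{u(u - 1)}{2!}\Delta^2 y_0 + \cdots + \frac{u(u - 1)(u - 2)\cdots(u - n + 1)}{n!}\Delta^n y_0, \quad \text{where } u = \frac{x - x_0}{h}.$$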
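Gregory-Newton's forward formula translates directly into a loop. The loop updates the factor $u(u-1)\cdots(u-k+1)/k!$ from one term to the next. The sketch below assumes equally spaced, increasing x-values and uses every tabulated point; `newton_forward` is an illustrative name.

```python
def newton_forward(xs, ys, x):
    """Gregory-Newton forward interpolation; xs equally spaced, increasing."""
    h = xs[1] - xs[0]
    u = (x - xs[0]) / h
    # leading entries y0, Δy0, Δ²y0, ... of the forward difference table
    column, leading = list(ys), []
    while column:
        leading.append(column[0])
        column = [column[i + 1] - column[i] for i in range(len(column) - 1)]
    result, factor = 0.0, 1.0
    for k, diff in enumerate(leading):
        result += factor * diff            # factor = u(u-1)...(u-k+1)/k!
        factor *= (u - k) / (k + 1)
    return result

print(newton_forward([0, 1, 2, 3, 4], [176, 185, 194, 202, 212], 0.2))
print(newton_forward([75, 80, 85, 90], [2459, 2018, 1180, 402], 82))
print(newton_forward([1.7, 1.8, 1.9, 2.0, 2.1],
                     [5.474, 6.050, 6.686, 7.389, 8.166], 1.75))
```

The three calls reproduce, up to floating-point rounding, the results of Examples 5, 6 and 7 (177.6176, 1704.848 and 5.7549375).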
$$y = y_0 + \frac{u}{1!}\Delta y_0 + \frac{u(u - 1)}{2!}\Delta^2 y_0 + \frac{u(u - 1)(u - 2)}{3!}\Delta^3 y_0 + \frac{u(u - 1)(u - 2)(u - 3)}{4!}\Delta^4 y_0, \quad \text{where } u = \frac{x - x_0}{h}$$
Here h = 1, $x_0 = 0$ and x = 0.2, so
$$u = \frac{0.2 - 0}{1} = 0.2.$$
The forward difference table :
x    y     Δy   Δ²y   Δ³y   Δ⁴y
0   176
            9
1   185          0
            9          -1
2   194         -1            4
            8           3
3   202          2
           10
4   212
$$y = 176 + \frac{0.2}{1!}(9) + \frac{0.2(0.2 - 1)}{2!}(0) + \frac{0.2(0.2 - 1)(0.2 - 2)}{3!}(-1) + \frac{0.2(0.2 - 1)(0.2 - 2)(0.2 - 3)}{4!}(4)$$
$$= 176 + 1.8 + 0 - 0.048 - 0.1344 = 177.6176$$
i.e. when x = 0.2, y = 177.6176.
Example 6
If $y_{75} = 2459$, $y_{80} = 2018$, $y_{85} = 1180$ and $y_{90} = 402$, find $y_{82}$.
Solution :
We can write the given data as follows:
x : 75    80    85    90
y : 2459  2018  1180  402
82 lies in the interval (80, 85), so we can use Gregory-Newton's forward interpolation formula. Since four values are given, the interpolation formula is
$$y = y_0 + \frac{u}{1!}\Delta y_0 + \frac{u(u - 1)}{2!}\Delta^2 y_0 + \frac{u(u - 1)(u - 2)}{3!}\Delta^3 y_0, \quad \text{where } u = \frac{x - x_0}{h}$$
Here h = 5, $x_0 = 75$ and x = 82, so
$$u = \frac{82 - 75}{5} = \frac{7}{5} = 1.4.$$
The forward difference table :
x     y       Δy     Δ²y    Δ³y
75   2459
             -441
80   2018           -397
             -838            457
85   1180             60
             -778
90    402
$$y = 2459 + \frac{1.4}{1!}(-441) + \frac{1.4(1.4 - 1)}{2!}(-397) + \frac{1.4(1.4 - 1)(1.4 - 2)}{3!}(457)$$
$$= 2459 - 617.4 - 111.16 - 25.592$$
y = 1704.848 when x = 82.
Example 7
From the following data, calculate the value of $e^{1.75}$.
x     : 1.7    1.8    1.9    2.0    2.1
$e^x$ : 5.474  6.050  6.686  7.389  8.166
Solution :
Since five values are given, the interpolation formula is
$$y_x = y_0 + \frac{u}{1!}\Delta y_0 + \frac{u(u - 1)}{2!}\Delta^2 y_0 + \frac{u(u - 1)(u - 2)}{3!}\Delta^3 y_0 + \frac{u(u - 1)(u - 2)(u - 3)}{4!}\Delta^4 y_0, \quad \text{where } u = \frac{x - x_0}{h}$$
Here h = 0.1, $x_0 = 1.7$ and x = 1.75, so
$$u = \frac{1.75 - 1.7}{0.1} = \frac{0.05}{0.1} = 0.5.$$
The forward difference table :
x      y       Δy      Δ²y     Δ³y     Δ⁴y
1.7   5.474
              0.576
1.8   6.050           0.060
              0.636           0.007
1.9   6.686           0.067             0
              0.703           0.007
2.0   7.389           0.074
              0.777
2.1   8.166
Since $\Delta^4 y_0 = 0$, the last term vanishes:
$$y = 5.474 + \frac{0.5}{1!}(0.576) + \frac{0.5(0.5 - 1)}{2!}(0.06) + \frac{0.5(0.5 - 1)(0.5 - 2)}{3!}(0.007)$$
$$= 5.474 + 0.288 - 0.0075 + 0.0004375$$
y = 5.7549375 when x = 1.75.
Example 8
From the following data, find the number of students whose height is between 80 cm and 90 cm.
Height in cm x    : 40-60  60-80  80-100  100-120  120-140
No. of students y :  250    120    100      70       50
Solution :
We interpolate on the cumulative frequencies (the number of students below each upper class limit).
The difference table
x            y     Δy    Δ²y   Δ³y   Δ⁴y
Below 60    250
                  120
Below 80    370          -20
                  100          -10
Below 100   470          -30           20
                   70           10
Below 120   540          -20
                   50
Below 140   590
Let us calculate the number of students whose height is less than 90 cm.
Here $x_0 = 60$, h = 20 and x = 90, so
$$u = \frac{x - x_0}{h} = \frac{90 - 60}{20} = 1.5.$$
$$y(90) = 250 + (1.5)(120) + \frac{(1.5)(1.5 - 1)}{2!}(-20) + \frac{(1.5)(1.5 - 1)(1.5 - 2)}{3!}(-10) + \frac{(1.5)(1.5 - 1)(1.5 - 2)(1.5 - 3)}{4!}(20)$$
$$= 250 + 180 - 7.5 + 0.625 + 0.46875 = 423.59 \approx 424$$
Therefore the number of students whose height is between 80 cm and 90 cm is y(90) - y(80), i.e. 424 - 370 = 54.
Example 9
Find the number of men getting wages between Rs.30 and Rs.35 from the following table.
Wages x      : 20-30  30-40  40-50  50-60
No. of men y :   9     30     35     42
Solution :
The difference table (of cumulative frequencies)
x           y     Δy   Δ²y   Δ³y
Under 30     9
                  30
Under 40    39          5
                  35          2
Under 50    74          7
                  42
Under 60   116
Let us calculate the number of men whose wages are less than Rs.35.
For x = 35, $u = \frac{x - x_0}{h} = \frac{35 - 30}{10} = 0.5$.
By Newton's forward formula,
$$y(35) = 9 + \frac{(0.5)}{1!}(30) + \frac{(0.5)(0.5 - 1)}{2!}(5) + \frac{(0.5)(0.5 - 1)(0.5 - 2)}{3!}(2)$$
$$= 9 + 15 - 0.625 + 0.125 = 23.5 \approx 24$$
Therefore the number of men getting wages between Rs.30 and Rs.35 is y(35) - y(30), i.e. 24 - 9 = 15.
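In Examples 8 and 9 the formula is applied to cumulative frequencies ("below the upper class limit"). The count for a sub-interval is then a difference of two interpolated values. The sketch below follows that reading. It repeats a compact forward-formula helper so that the block runs on its own; both function names are illustrative.

```python
from itertools import accumulate

def newton_forward(xs, ys, x):
    """Gregory-Newton forward interpolation on equally spaced xs."""
    u = (x - xs[0]) / (xs[1] - xs[0])
    result, factor, column = 0.0, 1.0, list(ys)
    for k in range(len(ys)):
        result += factor * column[0]          # u(u-1)...(u-k+1)/k! * Δ^k y0
        factor *= (u - k) / (k + 1)
        column = [b - a for a, b in zip(column, column[1:])]
    return result

def count_below(upper_limits, freqs, x):
    """Interpolated number of items below x, from class frequencies."""
    return newton_forward(upper_limits, list(accumulate(freqs)), x)

heights = ([60, 80, 100, 120, 140], [250, 120, 100, 70, 50])   # Example 8
print(count_below(*heights, 90) - count_below(*heights, 80))
wages = ([30, 40, 50, 60], [9, 30, 35, 42])                    # Example 9
print(count_below(*wages, 35) - count_below(*wages, 30))
```

Without intermediate rounding the differences are 53.59375 and 14.5. The text gets 54 and 15 by rounding y(90) and y(35) to whole numbers before subtracting.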
7.1.5 Gregory-Newton's backward formula
Let the function y = f(x) be a polynomial of degree n which assumes (n + 1) values $f(x_0), f(x_1), f(x_2), \ldots, f(x_n)$, where $x_0, x_1, x_2, \ldots, x_n$ are in increasing order and equally spaced.
Let $x_1 - x_0 = x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h$ (a positive quantity).
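Here f(x) can be written as
$$f(x) = a_0 + a_1(x - x_n) + a_2(x - x_n)(x - x_{n-1}) + \cdots + a_n(x - x_n)(x - x_{n-1})\cdots(x - x_1) \qquad \text{(1)}$$
When $x = x_n$, (1) gives
$$f(x_n) = a_0, \text{ i.e. } a_0 = y_n.$$
When $x = x_{n-1}$, (1) gives
$$f(x_{n-1}) = a_0 + a_1(x_{n-1} - x_n)$$
i.e. $y_{n-1} = y_n + a_1(-h)$
$$\therefore a_1 = \frac{y_n - y_{n-1}}{h} = \frac{\nabla y_n}{h}.$$
When $x = x_{n-2}$, (1) gives
$$f(x_{n-2}) = a_0 + a_1(x_{n-2} - x_n) + a_2(x_{n-2} - x_n)(x_{n-2} - x_{n-1})$$
$$y_{n-2} = y_n + \frac{\nabla y_n}{h}(-2h) + a_2(-2h)(-h)$$
$$2h^2 a_2 = (y_{n-2} - y_n) + 2\nabla y_n = y_{n-2} - y_n + 2(y_n - y_{n-1}) = y_{n-2} - 2y_{n-1} + y_n = \nabla^2 y_n$$
$$\therefore a_2 = \frac{\nabla^2 y_n}{2!\,h^2}.$$
Similarly,
$$a_3 = \frac{\nabla^3 y_n}{3!\,h^3}, \quad a_4 = \frac{\nabla^4 y_n}{4!\,h^4}, \quad \ldots, \quad a_n = \frac{\nabla^n y_n}{n!\,h^n}.$$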
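Substituting these values in (1),
$$f(x) = y_n + \frac{\nabla y_n}{h}(x - x_n) + \frac{\nabla^2 y_n}{2!\,h^2}(x - x_n)(x - x_{n-1}) + \cdots + \frac{\nabla^n y_n}{n!\,h^n}(x - x_n)(x - x_{n-1})\cdots(x - x_1) \qquad \text{(2)}$$
Further, denoting $\frac{x - x_n}{h}$ by u, we get
$$x - x_n = hu$$
$$x - x_{n-1} = (x - x_n) + (x_n - x_{n-1}) = hu + h = h(u + 1)$$
$$x - x_{n-2} = (x - x_n) + (x_n - x_{n-2}) = hu + 2h = h(u + 2)$$
$$x - x_{n-3} = h(u + 3)$$
In general, $x - x_{n-k} = h(u + k)$.
Thus (2) becomes
$$f(x) = y_n + \frac{u}{1!}\nabla y_n + \frac{u(u + 1)}{2!}\nabla^2 y_n + \cdots + \frac{u(u + 1)\cdots\{u + (n - 1)\}}{n!}\nabla^n y_n, \quad \text{where } u = \frac{x - x_n}{h}.$$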
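The backward formula mirrors the forward one. It starts from the last tabulated value $y_n$ and uses the backward differences $\nabla^k y_n$, which are the last entries of the columns of the difference table. Here is a minimal sketch assuming equally spaced, increasing x-values; `newton_backward` is an illustrative name.

```python
def newton_backward(xs, ys, x):
    """Gregory-Newton backward interpolation; xs equally spaced, increasing."""
    h = xs[1] - xs[0]
    u = (x - xs[-1]) / h                 # u = (x - x_n)/h, usually negative
    result, factor, column = 0.0, 1.0, list(ys)
    for k in range(len(ys)):
        result += factor * column[-1]    # column[-1] is ∇^k y_n
        factor *= (u + k) / (k + 1)      # next factor u(u+1)...(u+k)/(k+1)!
        column = [b - a for a, b in zip(column, column[1:])]
    return result

print(newton_backward([1961, 1971, 1981, 1991, 2001], [46, 66, 81, 93, 101], 1995))
print(newton_backward([40, 45, 50, 55, 60],
                      [114.84, 96.16, 83.32, 74.48, 68.48], 58))
print(newton_backward([1, 2, 3, 4, 5], [1, 8, 27, 64, 125], 4.5))
```

The calls correspond to Examples 10, 11 and 12 (about 96.8368, 70.585152 and 91.125). The last value equals $4.5^3$ because the data come from $y = x^3$.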
$$y = y_4 + \frac{u}{1!}\nabla y_4 + \frac{u(u + 1)}{2!}\nabla^2 y_4 + \frac{u(u + 1)(u + 2)}{3!}\nabla^3 y_4 + \frac{u(u + 1)(u + 2)(u + 3)}{4!}\nabla^4 y_4, \quad \text{where } u = \frac{x - x_4}{h}$$
Here h = 10, $x_4 = 2001$ and x = 1995, so
$$u = \frac{1995 - 2001}{10} = -0.6.$$
The backward difference table :
x       y     ∇y    ∇²y   ∇³y   ∇⁴y
1961    46
              20
1971    66          -5
              15           2
1981    81          -3          -3
              12          -1
1991    93          -4
               8
2001   101
$$y = 101 + \frac{(-0.6)}{1!}(8) + \frac{(-0.6)(-0.6 + 1)}{2!}(-4) + \frac{(-0.6)(-0.6 + 1)(-0.6 + 2)}{3!}(-1) + \frac{(-0.6)(-0.6 + 1)(-0.6 + 2)(-0.6 + 3)}{4!}(-3)$$
$$= 101 - 4.8 + 0.48 + 0.056 + 0.1008 = 96.8368$$
i.e. the population for the year 1995 is 96.837 thousand.
Example 11
From the following table, estimate the premium for a policy maturing at the age of 58.
Age x     : 40      45     50     55     60
Premium y : 114.84  96.16  83.32  74.48  68.48
Solution :
Since five values are given, the interpolation formula is
$$y = y_4 + \frac{u}{1!}\nabla y_4 + \cdots + \frac{u(u + 1)(u + 2)(u + 3)}{4!}\nabla^4 y_4, \quad \text{where } u = \frac{x - x_4}{h} = \frac{58 - 60}{5} = -0.4$$
The backward difference table :
x     y        ∇y       ∇²y     ∇³y     ∇⁴y
40   114.84
              -18.68
45    96.16             5.84
              -12.84            -1.84
50    83.32             4.00             0.68
               -8.84            -1.16
55    74.48             2.84
               -6.00
60    68.48
$$y = 68.48 + \frac{(-0.4)}{1!}(-6) + \frac{(-0.4)(0.6)}{2}(2.84) + \frac{(-0.4)(0.6)(1.6)}{6}(-1.16) + \frac{(-0.4)(0.6)(1.6)(2.6)}{24}(0.68)$$
$$= 68.48 + 2.4 - 0.3408 + 0.07424 - 0.028288$$
y = 70.585152, i.e. y ≈ 70.59
The premium for a policy maturing at the age of 58 is 70.59.
Example 12
From the following data, find y when x = 4.5.
x : 1  2  3   4   5
y : 1  8  27  64  125
Solution :
Since five values are given, the interpolation formula is
$$y = y_4 + \frac{u}{1!}\nabla y_4 + \cdots + \frac{u(u + 1)(u + 2)(u + 3)}{4!}\nabla^4 y_4, \quad \text{where } u = \frac{x - x_4}{h}$$
Here $u = \frac{4.5 - 5}{1} = -0.5$.
The backward difference table :
x    y     ∇y   ∇²y   ∇³y   ∇⁴y
1    1
            7
2    8          12
           19          6
3    27         18           0
           37          6
4    64         24
           61
5    125
Since $\nabla^4 y_4 = 0$,
$$y = 125 + \frac{(-0.5)}{1!}(61) + \frac{(-0.5)(0.5)}{2}(24) + \frac{(-0.5)(0.5)(1.5)}{6}(6)$$
$$= 125 - 30.5 - 3 - 0.375$$
y = 91.125 when x = 4.5.
7.1.6 Lagrange's formula
Let the function y = f(x) be a polynomial of degree n which assumes (n + 1) values $f(x_0), f(x_1), f(x_2), \ldots, f(x_n)$ corresponding to the arguments $x_0, x_1, x_2, \ldots, x_n$ (not necessarily equally spaced).
Here $f(x_0) = y_0, f(x_1) = y_1, \ldots, f(x_n) = y_n$.
Then Lagrange's formula is
$$f(x) = y_0\frac{(x - x_1)(x - x_2)\cdots(x - x_n)}{(x_0 - x_1)(x_0 - x_2)\cdots(x_0 - x_n)} + y_1\frac{(x - x_0)(x - x_2)\cdots(x - x_n)}{(x_1 - x_0)(x_1 - x_2)\cdots(x_1 - x_n)} + \cdots + y_n\frac{(x - x_0)(x - x_1)\cdots(x - x_{n-1})}{(x_n - x_0)(x_n - x_1)\cdots(x_n - x_{n-1})}$$
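Each term of Lagrange's formula is $y_i$ multiplied by a product over all the other arguments, so the formula can be coded directly. Below is a sketch with the illustrative name `lagrange`. Unlike the Gregory-Newton sketches, it does not need equally spaced x-values.

```python
def lagrange(xs, ys, x):
    """Lagrange interpolation through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)   # factor (x - x_j)/(x_i - x_j)
        total += term
    return total

print(lagrange([40, 50, 60, 70], [31, 73, 124, 159], 42))        # Example 13
print(lagrange([0, 3, 5, 6, 8], [276, 460, 414, 343, 110], 4))   # Example 14
print(lagrange([6, 7, 10, 12], [13, 14, 15, 17], 11))            # Example 15
```

The outputs agree with the worked values 37.48, 453.311 and 15.6667 up to rounding.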
Example 13
Using Lagrange's formula, find the value of y when x = 42 from the following table.
x : 40  50  60   70
y : 31  73  124  159
Solution :
By data we have
$x_0 = 40$, $x_1 = 50$, $x_2 = 60$, $x_3 = 70$ and x = 42,
$y_0 = 31$, $y_1 = 73$, $y_2 = 124$, $y_3 = 159$.
Using Lagrange's formula, we get
$$y = y_0\frac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)} + y_1\frac{(x - x_0)(x - x_2)(x - x_3)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)} + y_2\frac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)} + y_3\frac{(x - x_0)(x - x_1)(x - x_2)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)}$$
$$y(42) = 31\frac{(-8)(-18)(-28)}{(-10)(-20)(-30)} + 73\frac{(2)(-18)(-28)}{(10)(-10)(-20)} + 124\frac{(2)(-8)(-28)}{(20)(10)(-10)} + 159\frac{(2)(-8)(-18)}{(30)(20)(10)}$$
$$= 20.832 + 36.792 - 27.776 + 7.632$$
y = 37.48
Example 14
Using Lagrange's formula, find y when x = 4 from the following table.
x : 0    3    5    6    8
y : 276  460  414  343  110
Solution :
Given $x_0 = 0$, $x_1 = 3$, $x_2 = 5$, $x_3 = 6$, $x_4 = 8$ and x = 4,
$y_0 = 276$, $y_1 = 460$, $y_2 = 414$, $y_3 = 343$, $y_4 = 110$.
Using Lagrange's formula,
$$\begin{aligned}
y ={}& y_0\frac{(x - x_1)(x - x_2)(x - x_3)(x - x_4)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)(x_0 - x_4)} + y_1\frac{(x - x_0)(x - x_2)(x - x_3)(x - x_4)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)(x_1 - x_4)} \\
&+ y_2\frac{(x - x_0)(x - x_1)(x - x_3)(x - x_4)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)} + y_3\frac{(x - x_0)(x - x_1)(x - x_2)(x - x_4)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)(x_3 - x_4)} \\
&+ y_4\frac{(x - x_0)(x - x_1)(x - x_2)(x - x_3)}{(x_4 - x_0)(x_4 - x_1)(x_4 - x_2)(x_4 - x_3)}
\end{aligned}$$
$$= 276\frac{(1)(-1)(-2)(-4)}{(-3)(-5)(-6)(-8)} + 460\frac{(4)(-1)(-2)(-4)}{(3)(-2)(-3)(-5)} + 414\frac{(4)(1)(-2)(-4)}{(5)(2)(-1)(-3)} + 343\frac{(4)(1)(-1)(-4)}{(6)(3)(1)(-2)} + 110\frac{(4)(1)(-1)(-2)}{(8)(5)(3)(2)}$$
$$= -3.066 + 163.555 + 441.6 - 152.444 + 3.666$$
y = 453.311
Example 15
Using Lagrange's formula, find y(11) from the following table.
x : 6   7   10  12
y : 13  14  15  17
Solution :
Given $x_0 = 6$, $x_1 = 7$, $x_2 = 10$, $x_3 = 12$ and x = 11,
$y_0 = 13$, $y_1 = 14$, $y_2 = 15$, $y_3 = 17$.
Using Lagrange's formula,
$$y(11) = 13\frac{(4)(1)(-1)}{(-1)(-4)(-6)} + 14\frac{(5)(1)(-1)}{(1)(-3)(-5)} + 15\frac{(5)(4)(-1)}{(4)(3)(-2)} + 17\frac{(5)(4)(1)}{(6)(5)(2)}$$
$$= 2.1667 - 4.6667 + 12.5 + 5.6667$$
y = 15.6667
EXERCISE 7.1
1) Using the graphic method, find the value of y when x = 42 from the following data.
x : 20  30  40  50
y : 51  43  34  24
2) The population of a town is as follows.
Year x                   : 1940  1950  1960  1970  1980  1990
Population y (in lakhs)  :   20    24    29    36    46    50
Estimate the population for the year 1976 graphically.
3) From the following data, find f(3).
x    : 1  2  3   4   5
f(x) : 2  5  --  14  32
4) Find the missing term from the following data.
x : 0  5   10  15  20  25
y : 7  11  14  --  24  32
5) From the following data, estimate the export for the year 2000.
Year x             : 1999  2000  2001  2002  2003
Export y (in tons) :  443    --   369   397   467
6) Using Gregory-Newton's formula, find y when x = 145, given that
x : 140  150  160  170  180
y : 46   66   81   93   101
7) Using Gregory-Newton's formula, find y(8) from the following data.
x : 0  5   10  15  20  25
y : 7  11  14  18  24  32
8) Using Gregory-Newton's formula, calculate the population for the year 1975.
Year       : 1961   1971    1981    1991    2001
Population : 98572  132285  168076  198690  246050
9) From the following data, find the area of a circle of diameter 96 by using Gregory-Newton's formula.
Diameter x : 80    85    90    95    100
Area y     : 5026  5674  6362  7088  7854
10) Using Gregory-Newton's formula, find y when x = 85.
x : 50   60   70   80   90   100
y : 184  204  226  250  276  304
11) Using Gregory-Newton's formula, find y(22.4).
x : 19  20   21   22   23
y : 91  100  110  120  131
12) From the following data, find y(25) by using Lagrange's formula.
x : 20   30   40   50
y : 512  439  346  243
13) If f(0) = 5, f(1) = 6, f(3) = 50, f(4) = 105, find f(2) by using Lagrange's formula.
14) Apply Lagrange's formula to find y when x = 5, given that
x : 1  2  3  4   7
y : 2  4  8  16  128
7.2 FITTING A STRAIGHT LINE
A commonly occurring problem in many fields is that of studying the relationship between two (or more) variables. For example, the weight of a baby is related to its age, the price of a commodity is related to its demand, and the maintenance cost of a car is related to its age.
7.2.1 Scatter diagram
This is the simplest method of representing bivariate data diagrammatically.
Suppose x and y denote respectively the age and weight of an adult male. Consider a sample of n individuals with ages $x_1, x_2, x_3, \ldots, x_n$ and the corresponding weights $y_1, y_2, y_3, \ldots, y_n$. Plot the points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ on a rectangular co-ordinate system. The resulting set of points is called a scatter diagram.
[Fig. 7.1: scatter diagram of y against x]
From the scatter diagram it is often possible to visualize a smooth curve approximating the data. Such a curve is called an approximating curve. In Fig. 7.1, the data appear to be well approximated by a straight line, and we say that a linear relationship exists between the two variables.
7.2.2 Principle of least squares
Generally, more than one curve of a given type will appear to fit a set of data. In constructing lines it is therefore necessary to agree on a definition of the best fitting line.
Consider the data points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$. For a given value of x, say $x_1$, there will in general be a difference between the value $y_1$ and the corresponding value determined from the curve C (Fig. 7.2). We denote this difference by $d_1$; it is referred to as a deviation or error. Here $d_1$ may be positive, negative or zero. Similarly, corresponding to the values $x_2, x_3, \ldots, x_n$ we obtain the deviations $d_2, d_3, \ldots, d_n$.
A measure of the goodness of fit of the curve to the set of data is provided by the quantities $d_1^2, d_2^2, \ldots, d_n^2$.
Of all the curves approximating a given set of data points, the best fitting curve is the one for which $d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2$ is a minimum. If the approximating curve is a straight line, then such a line is called the line of best fit.
[Fig. 7.2: data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, a curve C, and the deviations $d_1, d_2, \ldots, d_n$ of the points from C]
7.2.3 Derivation of normal equations by the principle of least squares
Let us consider the fitting of a straight line
$$y = ax + b \qquad \text{(1)}$$
to a set of n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
For different values of a and b, equation (1) represents a family of straight lines. Our aim is to determine a and b so that the line (1) is the line of best fit. Now a and b are determined by applying the principle of least squares.
Let $P_i(x_i, y_i)$ be any point in the scatter diagram. Draw $P_iM$ perpendicular to the x-axis, meeting the line (1) in $H_i$. The x-coordinate of $H_i$ is $x_i$, and the ordinate of $H_i$ is $ax_i + b$.
[Fig. 7.3: the point $P_i(x_i, y_i)$, the foot M of the perpendicular on the x-axis, and the point $H_i(x_i, ax_i + b)$ on the line]
$$P_iH_i = P_iM - H_iM = y_i - (ax_i + b)$$
is the deviation for $y_i$.
According to the principle of least squares, we have to find a and b so that
$$E = \sum_{i=1}^{n} P_iH_i^2 = \sum_{i=1}^{n} [y_i - (ax_i + b)]^2$$
is a minimum.
For a maximum or minimum, the partial derivatives of E with respect to a and b should vanish separately.
$$\frac{\partial E}{\partial a} = 0 \Rightarrow -2\sum_{i=1}^{n} x_i[y_i - (ax_i + b)] = 0$$
$$\Rightarrow a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_iy_i \qquad \text{(2)}$$
$$\frac{\partial E}{\partial b} = 0 \Rightarrow -2\sum_{i=1}^{n} [y_i - (ax_i + b)] = 0$$
i.e. $\sum y_i - a\sum x_i - nb = 0$
$$\Rightarrow a\sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i \qquad \text{(3)}$$
(2) and (3) are known as the normal equations. Solving the normal equations, we get a and b.
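Solving (2) and (3) by Cramer's rule gives
$$a = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{\sum y_i - a\sum x_i}{n}.$$
The sketch below forms the sums and applies these expressions. The function name `fit_line` is illustrative. The data must contain at least two distinct x-values, otherwise the denominator is zero.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b from the normal equations (2) and (3)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

print(fit_line([0, 1, 2, 3, 4], [1, 1, 3, 4, 6]))   # data of Example 18
```

For the data of Example 18 this returns a = 1.3 and b = 0.4 (up to floating-point rounding).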
Note
The normal equations for the line of best fit of the form y = a + bx are
$$na + b\sum x_i = \sum y_i$$
$$a\sum x_i + b\sum x_i^2 = \sum x_iy_i$$
Example 16
Fit a straight line to the following: $\sum x = 10$, $\sum y = 19$, $\sum x^2 = 30$, $\sum xy = 53$ and n = 5.
Solution :
The line of best fit is y = ax + b. The normal equations are
$$\sum y = a\sum x + nb$$
$$\sum xy = a\sum x^2 + b\sum x$$
$$\therefore 10a + 5b = 19 \qquad \text{(1)}$$
$$30a + 10b = 53 \qquad \text{(2)}$$
Solving (1) and (2), we get a = 1.5 and b = 0.8.
The line of best fit is y = 1.5x + 0.8.
Example 17
In a straight line of best fit, find the x-intercept when $\sum x = 10$, $\sum y = 16.9$, $\sum x^2 = 30$, $\sum xy = 47.4$ and n = 7.
Solution :
The line of best fit is y = ax + b. The normal equations are
$$\sum y = a\sum x + nb$$
$$\sum xy = a\sum x^2 + b\sum x$$
$$\therefore 10a + 7b = 16.9 \qquad \text{(1)}$$
$$30a + 10b = 47.4 \qquad \text{(2)}$$
Solving (1) and (2), we get a = 1.48 and b = 0.3.
The line of best fit is y = 1.48x + 0.3.
Putting y = 0, the x-intercept of the line of best fit is
$$x = -\frac{0.3}{1.48} \approx -0.203$$
Example 18
Fit a straight line to the following data.
x : 0  1  2  3  4
y : 1  1  3  4  6
Solution :
The line of best fit is y = ax + b. The normal equations are
$$a\sum x + nb = \sum y \qquad \text{(1)}$$
$$a\sum x^2 + b\sum x = \sum xy \qquad \text{(2)}$$
Now from the data
x    y    x²   xy
0    1    0    0
1    1    1    1
2    3    4    6
3    4    9    12
4    6    16   24
10   15   30   43   (totals: Σx, Σy, Σx², Σxy)
By substituting these values in (1) and (2), we get
$$10a + 5b = 15 \qquad \text{(3)}$$
$$30a + 10b = 43 \qquad \text{(4)}$$
Solving (3) and (4), we get a = 1.3 and b = 0.4.
The line of best fit is y = 1.3x + 0.4.
Example 19
Fit a straight line to the following data:
x : 4  8  12  16  20  24
y : 7  9  13  17  21  25
Solution :
Take the origin at $\frac{12 + 16}{2} = 14$ and let $u_i = \frac{x_i - 14}{2}$. Here n = 6.
The line of best fit is y = au + b. The normal equations are
$$a\sum u + nb = \sum y \qquad \text{(1)}$$
$$a\sum u^2 + b\sum u = \sum uy \qquad \text{(2)}$$
x      y    u    u²   uy
4      7   -5   25   -35
8      9   -3    9   -27
12    13   -1    1   -13
16    17    1    1    17
20    21    3    9    63
24    25    5   25   125
Total 92    0   70   130
On substituting these values in the normal equations (1) and (2), we get 6b = 92 and 70a = 130, so a = 1.86 and b = 15.33.
The line of best fit is
$$y = 1.86\left(\frac{x - 14}{2}\right) + 15.33 = 0.93x + 2.31$$
Example 20
Fit a straight line to the following data.
x : 100   200   300   400   500   600
y : 90.2  92.3  94.2  96.3  98.2  100.3
Solution :
Let $u_i = \frac{x_i - 350}{50}$ and $v_i = y_i - 94.2$. Here n = 6.
The line of best fit is v = au + b. The normal equations are
$$a\sum u + nb = \sum v \qquad \text{(1)}$$
$$a\sum u^2 + b\sum u = \sum uv \qquad \text{(2)}$$
x     y       u    v      u²   uv
100   90.2   -5   -4      25   20
200   92.3   -3   -1.9     9    5.7
300   94.2   -1    0       1    0
400   96.3    1    2.1     1    2.1
500   98.2    3    4       9   12
600   100.3   5    6.1    25   30.5
Total         0    6.3    70   70.3
Substituting these values in (1) and (2), we get 6b = 6.3 and 70a = 70.3, so a = 1.0043 and b = 1.05.
The line of best fit is v = 1.0043u + 1.05, i.e.
$$y - 94.2 = 1.0043\left(\frac{x - 350}{50}\right) + 1.05, \quad \text{i.e. } y \approx 0.0201x + 88.22$$
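Examples 19 and 20 shorten the arithmetic by a change of origin and scale. They set $u = (x - x_0)/c$ (and, in Example 20, $v = y - y_0$), fit $v = au + b$, and then substitute back. Ordinary algebra turns the fitted line into $y = \frac{a}{c}x + \left(b + y_0 - \frac{a x_0}{c}\right)$. The sketch below follows this procedure; `fit_coded` and its parameters `x0`, `c`, `y0` are illustrative names.

```python
def fit_coded(xs, ys, x0, c, y0=0.0):
    """Fit y = slope*x + intercept via u = (x - x0)/c and v = y - y0."""
    us = [(x - x0) / c for x in xs]
    vs = [y - y0 for y in ys]
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    a = (n * suv - su * sv) / (n * suu - su ** 2)   # slope of v on u
    b = (sv - a * su) / n                          # intercept of v on u
    # undo the coding: v = a*u + b  =>  y = (a/c)*x + (b + y0 - a*x0/c)
    return a / c, b + y0 - a * x0 / c

print(fit_coded([4, 8, 12, 16, 20, 24], [7, 9, 13, 17, 21, 25], x0=14, c=2))
print(fit_coded([100, 200, 300, 400, 500, 600],
                [90.2, 92.3, 94.2, 96.3, 98.2, 100.3], x0=350, c=50, y0=94.2))
```

Without intermediate rounding, the first call gives a slope of about 0.9286 and an intercept of about 2.333. The text's 0.93x + 2.31 comes from rounding a to 1.86 before substituting back. The second call gives a slope of about 0.0201 and an intercept of about 88.22.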
EXERCISE 7.2
1) Define a scatter diagram.
2) State the principle of least squares.
3) Fit the line of best fit if $\sum x = 75$, $\sum y = 115$, $\sum x^2 = 1375$, $\sum xy = 1875$ and n = 6.
4) In a line of best fit, find the slope and the y-intercept if $\sum x = 10$, $\sum y = 25$, $\sum x^2 = 30$, $\sum xy = 90$ and n = 5.
5) Fit a straight line y = ax + b to the following data by the method of least squares.
x : 0  1  3  6  8
y : 1  3  2  5  4
6) A group of 5 students took tests before and after training and obtained the following scores.
Scores before training : 3  4  4  6  8
Scores after training  : 4  5  6  8  10
Find by the method of least squares the straight line of best fit.
7) By the method of least squares, find the best fitting straight line to the data given below:
x : 100   120   140   160   180   200
y : 0.45  0.55  0.60  0.70  0.80  0.85
8) Fit a straight line to the data given below. Also estimate the value of y at x = 3.5.
x : 0  1    2    3    4
y : 1  1.8  3.3  4.5  6.3
9) Find by the method of least squares the line of best fit for the following data.
Depth of water applied x (in cm) : 0   12  24  36  48
Average yield y (tons / acre)    : 35  55  65  80  90
10) The following data show the advertising expenses (expressed as a percentage of total expenses) and the net operating profits (expressed as a percentage of total sales) in a random sample of six drug stores.
Advertising expenses  : 0.4   1.0  1.3  1.5  2.0  2.8
Net operating profits : 1.90  2.8  2.9  3.6  4.3  5.4
Fit a line of best fit.
11) The following data give the number of hours which ten students studied for English and the scores obtained by them in the examination.
Hours studied x : 4   9   10  12  14  22
Test score y    : 31  58  65  68  73  91
(i) Fit a straight line y = ax + b.
(ii) Predict the score of a student who studied for 17 hours.
EXERCISE 7.3
Choose the correct answer
1) $\Delta f(x) =$
(a) f(x+h) (b) f(x) - f(x+h)
(c) f(x+h) - f(x) (d) f(x) - f(x-h)
2) $E^2 f(x) =$
(a) f(x+h) (b) f(x+2h) (c) f(2h) (d) f(2x)
3) E =
(a) 1+ (b) 1 (c) + 1 (d) 1
4) f(x+3h) =
(a) f(x+2h) (b) f(x+3h)-f(x+2h)
(c) f(x+3h) (d) f(x+2h) f(x 3h)
5) When h = 1, $\Delta(x^2) =$
(a) 2x (b) 2x - 1 (c) 2x + 1 (d) 1
6) The normal equations for estimating a and b so that the line y = ax + b may be the line of best fit are
(a) $a\sum x_i^2 + b\sum x_i = \sum x_iy_i$ and $a\sum x_i + nb = \sum y_i$
(b) $a\sum x_i + b\sum x_i^2 = \sum x_iy_i$ and $a\sum x_i^2 + nb = \sum y_i$
(c) $a\sum x_i + nb = \sum x_iy_i$ and $a\sum x_i^2 + b\sum x_i = \sum y_i$
(d) $a\sum x_i^2 + nb = \sum x_iy_i$ and $a\sum x_i + b\sum x_i = \sum y_i$
7) In a line of best fit y = 5.8(x - 1994) + 41.6, the value of y when x = 1997 is
(a) 50 (b) 54 (c) 59 (d) 60
8) Five data points relating to x and y are to be fitted by a straight line. It is found that $\sum x = 0$ and $\sum y = 15$. Then the y-intercept of the line of best fit is
(a) 1 (b) 2 (c) 3 (d) 4
9) The normal equations for fitting a straight line y = ax + b are 10a + 5b = 15 and 30a + 10b = 43. The slope of the line of best fit is
(a) 1.2 (b) 1.3 (c) 13 (d) 12
10) The normal equations obtained in fitting a straight line
y = ax + b by the method of least squares over n points (x, y)
are 4 = 4a + b and xy = 120a + 24b. Then n =
(a) 30 (b) 5 (c) 6 (d) 4
8. PROBABILITY DISTRIBUTIONS

8.1 RANDOM VARIABLE AND PROBABILITY FUNCTION
Random variable
A random variable is a real valued function defined on a sample space S and taking values in $(-\infty, \infty)$.
8.1.1 Discrete Random Variable
A random variable X is said to be discrete if it assumes only a finite or a countably infinite number of values.
Examples
(i) Consider the experiment of tossing a coin twice. The sample points of this experiment are $s_1 = (H, H)$, $s_2 = (H, T)$, $s_3 = (T, H)$ and $s_4 = (T, T)$.
Let the random variable X denote the number of heads obtained in the two tosses. Then
$X(s_1) = 2$, $X(s_2) = 1$, $X(s_3) = 1$, $X(s_4) = 0$
$\therefore R_X = \{0, 1, 2\}$
In general, if s is a typical element of the sample space, X(s) represents the real number which the random variable X associates with the outcome s. $R_X$, the set of all possible values of X, is called the range space of X.
(ii) Consider the experiment of rolling a pair of fair dice once. Then the sample space is
S = {(1, 1), (1, 2), ..., (1, 6),
     ...
     (6, 1), (6, 2), ..., (6, 6)}
Let the random variable X denote the sum of the scores on the two dice. Then $R_X = \{2, 3, 4, \ldots, 12\}$.
(iii) Consider the experiment of tossing 3 coins simultaneously. Let the random variable X be the number of heads obtained in this experiment. Then
S = {HHH, HHT, HTT, TTT, TTH, THH, HTH, THT}
$R_X = \{0, 1, 2, 3\}$
(iv) Suppose a random experiment consists of throwing 4 coins and recording the number of heads. Then $R_X = \{0, 1, 2, 3, 4\}$.
The number of printing mistakes on each page of a book and the number of telephone calls received by the telephone operator of a firm are some other examples of discrete random variables.
8.1.2 Probability function and probability distribution of a discrete random variable
Let X be a discrete random variable assuming values $x_1, x_2, x_3, \ldots$. If there exists a function p, denoted by $p(x_i) = P[X = x_i]$, such that
(i) $p(x_i) \ge 0$ for i = 1, 2, ...
(ii) $\sum_i p(x_i) = 1$
then p is called the probability function or probability mass function (p.m.f.) of X.
The collection of all pairs $(x_i, p(x_i))$ is called the probability distribution of X.
Example 1
Consider the experiment of tossing two coins. Let X be a random variable denoting the number of heads obtained.
X        : 0    1    2
$p(x_i)$ : 1/4  1/2  1/4
Is $p(x_i)$ a p.m.f.?
Solution :
(i) $p(x_i) \ge 0$ for all i.
(ii) $\sum p(x_i) = p(0) + p(1) + p(2) = \frac{1}{4} + \frac{1}{2} + \frac{1}{4} = 1$
Hence $p(x_i)$ is a p.m.f.
Example 2
Consider the discrete random variable X as the sum of the numbers that appear when a pair of dice is thrown. The probability distribution of X is
X        : 2     3     4     5     6     7     8     9     10    11    12
$p(x_i)$ : 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Is $p(x_i)$ a p.m.f.?
Solution :
(i) $p(x_i) \ge 0$ for all i.
(ii) $\sum p(x_i) = \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \cdots + \frac{1}{36} = \frac{36}{36} = 1$
Hence $p(x_i)$ is a p.m.f.
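The two conditions can be checked mechanically. The sketch below builds the distribution of Example 2 by enumerating the 36 equally likely outcomes of two fair dice. It then tests conditions (i) and (ii) using exact fractions, so that the sum is compared with 1 without rounding error. `is_pmf` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def is_pmf(p):
    """True if every probability is non-negative and they sum to exactly 1."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

# distribution of the sum of the scores on two fair dice (Example 2)
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dice_sum = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
print(dice_sum[7], is_pmf(dice_sum))

# distribution of the number of heads in two tosses (Example 1)
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(is_pmf(two_coins))
```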
8.1.3 Cumulative Distribution function (c.d.f.)
Let X be a discrete random variable. The function F(x) is said to be the cumulative distribution function (c.d.f.) of the random variable X if
$$F(x) = P(X \le x) = \sum_i p(x_i),$$
where the sum is taken over all i such that $x_i \le x$.
Remark : $P(a < X \le b) = F(b) - F(a)$
Example 3
A random variable X has the following probability function:
Values of X, x : -2   -1   0    1    2    3
p(x)           : 0.1  k    0.2  2k   0.3  k
(i) Find the value of k.
(ii) Construct the c.d.f. of X.
Solution :
(i) Since $\sum_i p(x_i) = 1$,
p(-2) + p(-1) + p(0) + p(1) + p(2) + p(3) = 1
0.1 + k + 0.2 + 2k + 0.3 + k = 1
0.6 + 4k = 1, so 4k = 1 - 0.6 = 0.4 and k = 0.4/4 = 0.1
Hence the given probability function becomes
x    : -2   -1   0    1    2    3
p(x) : 0.1  0.1  0.2  0.2  0.3  0.1
(ii) Cumulative distribution function F(x) = P(X ≤ x):
x    F(x) = P(X ≤ x)
-2   F(-2) = P(X ≤ -2) = 0.1
-1   F(-1) = P(X ≤ -1) = P(X = -2) + P(X = -1) = 0.1 + 0.1 = 0.2
0    F(0) = P(X ≤ 0) = P(X = -2) + P(X = -1) + P(X = 0) = 0.1 + 0.1 + 0.2 = 0.4
1    F(1) = P(X ≤ 1) = 0.6
2    F(2) = P(X ≤ 2) = 0.9
3    F(3) = P(X ≤ 3) = 1
$$F(x) = \begin{cases} 0 & \text{if } x < -2 \\ 0.1 & \text{if } -2 \le x < -1 \\ 0.2 & \text{if } -1 \le x < 0 \\ 0.4 & \text{if } 0 \le x < 1 \\ 0.6 & \text{if } 1 \le x < 2 \\ 0.9 & \text{if } 2 \le x < 3 \\ 1 & \text{if } x \ge 3 \end{cases}$$
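A discrete c.d.f. is a step function: F(x) adds up $p(x_i)$ for every value $x_i \le x$. The sketch below evaluates it for the distribution just found. It uses running totals from `itertools.accumulate` and `bisect_right` on the sorted values to count how many $x_i$ are $\le x$. The helper name `make_cdf` is illustrative.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_cdf(values, probs):
    """Return F with F(x) = P(X <= x) for a discrete distribution."""
    pairs = sorted(zip(values, probs))
    xs = [v for v, _ in pairs]
    totals = list(accumulate(p for _, p in pairs))
    def F(x):
        i = bisect_right(xs, x)          # number of values x_i <= x
        return totals[i - 1] if i else 0
    return F

# distribution of Example 3 with k = 0.1, kept exact as tenths
F = make_cdf([-2, -1, 0, 1, 2, 3], [Fraction(n, 10) for n in (1, 1, 2, 2, 3, 1)])
print([F(x) for x in (-3, -2, -0.5, 0, 1.7, 3, 10)])
print(F(2) - F(0))                       # P(0 < X <= 2) by the Remark
```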
Example 4
For the following probability distribution of X
X    : 0    1    2     3
p(x) : 1/6  1/2  3/10  1/30
find (i) P(X ≤ 1) (ii) P(X ≤ 2) (iii) P(0 < X < 2).
Solution :
(i) $P(X \le 1) = P(X = 0) + P(X = 1) = p(0) + p(1) = \frac{1}{6} + \frac{1}{2} = \frac{4}{6} = \frac{2}{3}$
(ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{1}{6} + \frac{1}{2} + \frac{3}{10} = \frac{29}{30}$
Aliter : P(X ≤ 2) can also be obtained as
$$P(X \le 2) = 1 - P(X > 2) = 1 - P(X = 3) = 1 - \frac{1}{30} = \frac{29}{30}$$
(iii) $P(0 < X < 2) = P(X = 1) = \frac{1}{2}$
8.1.4 Continuous Random Variable
A random variable X is said to be continuous if it takes a continuum of values, i.e. if it takes all possible values between certain defined limits.
For example,
(i) The amount of rainfall on a rainy day.
(ii) The height of individuals. (iii) The weight of individuals.
8.1.5 Probability function
A function f is said to be the probability density function (p.d.f.) of a continuous random variable X if the following conditions are satisfied:
(i) $f(x) \ge 0$ for all x (ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Remark :
(i) The probability that the random variable X lies in the interval (a, b) is given by $P(a < X < b) = \int_a^b f(x)\,dx$.
(ii) $P(X = a) = \int_a^a f(x)\,dx = 0$
(iii) Consequently, $P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)$.
8.1.6 Continuous Distribution function
If X is a continuous random variable with p.d.f. f(x), then the function
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
is called the distribution function (d.f.) or cumulative distribution function (c.d.f.) of the random variable X.
Properties : The cumulative distribution function has the following properties.
(i) $\lim_{x \to -\infty} F(x) = 0$, i.e. $F(-\infty) = 0$
(ii) $\lim_{x \to \infty} F(x) = 1$, i.e. $F(\infty) = 1$
(iii) Let F be the c.d.f. of a continuous random variable X with p.d.f. f. Then $f(x) = \frac{d}{dx}F(x)$ for all x at which F is differentiable.
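Example 5
A continuous random variable X has the following p.d.f.
$$f(x) = \begin{cases} k(2 - x) & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
Determine the value of k.
Solution :
If f(x) is the p.d.f., then $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$$\int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx + \int_{2}^{\infty} f(x)\,dx = 1$$
$$0 + \int_0^2 f(x)\,dx + 0 = 1$$
$$\int_0^2 k(2 - x)\,dx = 1$$
$$k\left(\int_0^2 2\,dx - \int_0^2 x\,dx\right) = k\left[2x - \frac{x^2}{2}\right]_0^2 = 2k = 1 \quad \therefore k = \frac{1}{2}$$
Hence
$$f(x) = \begin{cases} \frac{1}{2}(2 - x) & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$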
Example 6
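Verify that
$f(x) = \begin{cases} 3x^2, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
is a p.d.f. and evaluate the following probabilities:
(i) $P\left(X \le \frac{1}{3}\right)$ (ii) $P\left(\frac{1}{3} \le X \le \frac{1}{2}\right)$
Solution :
Clearly $f(x) \ge 0$ for all x, and hence one of the conditions for a p.d.f. is satisfied.
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} f(x)\,dx = \int_{0}^{1} 3x^2\,dx = 1$
The other condition for a p.d.f. is also satisfied. Hence the given function is a p.d.f.
(i) Since $P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$,
$P\left(X \le \frac{1}{3}\right) = \int_{-\infty}^{1/3} f(x)\,dx = \int_{0}^{1/3} 3x^2\,dx = \frac{1}{27}$
(ii) $P\left(\frac{1}{3} \le X \le \frac{1}{2}\right) = \int_{1/3}^{1/2} f(x)\,dx = \int_{1/3}^{1/2} 3x^2\,dx = \frac{1}{8} - \frac{1}{27} = \frac{19}{216}$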
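Both answers follow from the c.d.f. $F(x) = x^3$ on (0, 1). A minimal sketch with exact rational arithmetic (the function name `F` simply mirrors the notation of Section 8.1.6):

```python
from fractions import Fraction


def F(x):
    """c.d.f. of Example 6: x**3 on (0, 1), 0 to the left, 1 to the right."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return x ** 3


p_i = F(Fraction(1, 3))                       # P(X <= 1/3)
p_ii = F(Fraction(1, 2)) - F(Fraction(1, 3))  # P(1/3 <= X <= 1/2)
print(p_i, p_ii)  # 1/27 19/216
```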
Example 7
Given the p.d.f. of a continuous random variable X as follows:
$f(x) = \begin{cases} kx(1 - x), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
Find k and the c.d.f.
Solution :
If X is a continuous random variable with p.d.f. f(x), then
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_{0}^{1} kx(1 - x)\,dx = 1$
$k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{1} = \frac{k}{6} = 1 \implies k = 6$
Hence the given p.d.f. becomes
$f(x) = \begin{cases} 6x(1 - x), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
To find the c.d.f. F(x):
F(x) = 0 for x < 0
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} 6t(1 - t)\,dt = 3x^2 - 2x^3$ for $0 \le x \le 1$
F(x) = 1 for x > 1
The c.d.f. of X is as follows:
$F(x) = \begin{cases} 0, & x < 0 \\ 3x^2 - 2x^3, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$
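Property (iii) of Section 8.1.6 says the p.d.f. is the derivative of the c.d.f. The sketch below checks this for Example 7 with a central difference; the step size `h` is an arbitrary illustrative choice, and the names `pdf` and `cdf` are hypothetical.

```python
def pdf(x):
    """p.d.f. from Example 7 with k = 6."""
    return 6 * x * (1 - x) if 0 < x < 1 else 0.0


def cdf(x):
    """Piecewise c.d.f. derived in Example 7."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 3 * x**2 - 2 * x**3


# Compare dF/dx (central difference) with f(x) at a few interior points.
h = 1e-6
for x in (0.2, 0.5, 0.9):
    slope = (cdf(x + h) - cdf(x - h)) / (2 * h)
    print(x, round(slope, 6), round(pdf(x), 6))
```

The two printed columns agree to the displayed precision, and `cdf(1)` equals 1, as property (ii) requires.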
Example 8
Suppose that the life in hours of a certain part of a radio tube is a continuous random variable X with p.d.f. given by
$f(x) = \begin{cases} \frac{100}{x^2}, & x \ge 100 \\ 0, & \text{elsewhere} \end{cases}$
(i) What is the probability that all three such tubes in a given radio set will have to be replaced during the first 150 hours of operation?
(ii) What is the probability that none of the three original tubes will have to be replaced during the first 150 hours of operation?
Solution :
(i) A tube in a radio set will have to be replaced during the first 150 hours if its life is less than 150 hours. Hence the probability p that a tube is replaced during the first 150 hours is
$p = P(X < 150) = \int_{100}^{150} f(x)\,dx = \int_{100}^{150} \frac{100}{x^2}\,dx = 100\left[\frac{1}{100} - \frac{1}{150}\right] = \frac{1}{3}$
Since the three tubes are assumed to fail independently, the probability that all three of the original tubes will have to be replaced during the first 150 hours is
$p^3 = \left(\frac{1}{3}\right)^3 = \frac{1}{27}$
(ii) The probability that a tube is not replaced during the first 150 hours of operation is
$P(X > 150) = 1 - P(X < 150) = 1 - \frac{1}{3} = \frac{2}{3}$
Hence the probability that none of the three tubes will be replaced during the first 150 hours of operation is
$\left(\frac{2}{3}\right)^3 = \frac{8}{27}$
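With the c.d.f. $F(x) = 1 - \frac{100}{x}$ for $x \ge 100$ (obtained by integrating the p.d.f.), both answers reduce to a few exact operations. This is a minimal sketch; like the solution, it assumes the three tubes fail independently.

```python
from fractions import Fraction


def F(x):
    """c.d.f. for f(x) = 100 / x**2 on x >= 100, i.e. 1 - 100/x."""
    return Fraction(0) if x < 100 else 1 - Fraction(100, x)


p = F(150)                      # P(X < 150)
all_replaced = p ** 3           # independence of the three tubes assumed
none_replaced = (1 - p) ** 3
print(p, all_replaced, none_replaced)  # 1/3 1/27 8/27
```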
EXERCISE 8.1
1) Which of the following sets of functions define a probability space on $S = \{x_1, x_2, x_3\}$?
(i) $p(x_1) = \frac{1}{3}$, $p(x_2) = \frac{1}{2}$, $p(x_3) = \frac{1}{4}$
(ii) $p(x_1) = \frac{1}{3}$, $p(x_2) = \frac{1}{6}$, $p(x_3) = \frac{1}{2}$
(iii) $p(x_1) = 0$, $p(x_2) = \frac{1}{3}$, $p(x_3) = \frac{2}{3}$
(iv) $p(x_1) = p(x_2) = \frac{2}{3}$, $p(x_3) = \frac{1}{3}$
2) Consider the experiment of throwing a single die. The random variable X represents the score on the upper face and assumes the values as follows:
X : 1 2 3 4 5 6
$p(x_i)$ : 1/6 1/6 1/6 1/6 1/6 1/6
Is $p(x_i)$ a p.m.f.?
3) A random variable X has the following probability distribution.
Values of X, x : 0 1 2 3 4 5 6 7 8
p(x) : a 3a 5a 7a 9a 11a 13a 15a 17a
(i) Determine the value of a
(ii) Find P(X < 3), P(X > 3) and P(0< X < 5)
4) Verify that the following function is a probability mass function:
$p(x) = \begin{cases} \frac{1}{3}, & x = 1 \\ \frac{2}{3}, & x = 2 \\ 0, & \text{otherwise} \end{cases}$
Hence find the c.d.f.
5) Find k if the following function is a probability mass function:
$p(x) = \begin{cases} \frac{k}{6}, & x = 0 \\ \frac{k}{3}, & x = 2 \\ \frac{k}{2}, & x = 4 \\ 0, & \text{otherwise} \end{cases}$
6) A random variable X has the following probability distribution:
Values of X, x : -2 0 5
p(x) : 1/4 1/4 1/2
Evaluate the following probabilities:
(a) P(X < 0) (b) $P(X \le 0)$ (c) P(0 < X < 10)
7) A random variable X has the following probability function:
Values of X, x : 0 1 2 3
p(x) : 1/16 3/8 k 5/16
(i) Find the value of k. (ii) Construct the c.d.f. of X.
8) A continuous random variable has the following p.d.f.:
$f(x) = \begin{cases} kx^2, & 0 < x < 10 \\ 0, & \text{otherwise} \end{cases}$
Determine k and evaluate (i) P(0.2 < X < 0.5) (ii) P(X < 3).
9) If the function $f(x) = ce^{-x}$, $0 < x < \infty$, is a p.d.f., find the value of c.
10) Let X be a continuous random variable with p.d.f.
$f(x) = \begin{cases} ax, & 0 \le x < 1 \\ a, & 1 \le x < 2 \\ -ax + 3a, & 2 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$
(i) Determine the constant a.
(ii) Compute P(X < 1.5).
11) Let X be the life length of a certain type of light bulb in hours. Determine a so that the function
$f(x) = \begin{cases} \frac{a}{x^2}, & 1000 < x < 2000 \\ 0, & \text{otherwise} \end{cases}$
may be a probability density function.
12) The distance X (in thousands of km) that car owners get from a certain kind of tyre is a random variable having p.d.f.
$f(x) = \begin{cases} \frac{1}{20} e^{-x/20}, & x > 0 \\ 0, & x \le 0 \end{cases}$
Find the probabilities that one of these tyres will last
(i) at most 10,000 km
(ii) anywhere from 16,000 to 24,000 km
(iii) at least 30,000 km.
8.2 MATHEMATICAL EXPECTATION
The concept of mathematical expectation plays a vital role in statistics. The expected value of a random variable is a weighted average of all the possible outcomes of an experiment, each outcome being weighted by its probability.
If X is a discrete random variable which can assume the values $x_1, x_2, \ldots, x_n$ with respective probabilities $p(x_i) = P[X = x_i]$, $i = 1, 2, \ldots, n$, then its mathematical expectation is defined as
$E(X) = \sum_{i=1}^{n} x_i\, p(x_i)$ (here $\sum_{i=1}^{n} p(x_i) = 1$)
If X is a continuous random variable with probability density function f(x), then
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Note
E(X) is also known as the mean of the random variable X.
Properties
1) E(c) = c where c is constant
2) E(X + Y) = E(X) + E(Y)
3) E(aX + b) = aE(X) + b where a and b are constants.
4) E(XY) = E(X) E(Y) if X and Y are independent
Note
The above properties hold for both discrete and continuous random variables.
Variance
Let X be a random variable. Then the variance of X, denoted by Var(X) or $\sigma_x^2$, is
$\mathrm{Var}(X) = \sigma_x^2 = E[X - E(X)]^2 = E(X^2) - [E(X)]^2$
The positive square root of Var(X) is called the standard deviation of X and is denoted by $\sigma_x$.
Example 9
A multinational bank is concerned about the waiting time (in minutes) of its customers before they can use the ATM for their transactions. A study of a random sample of 500 customers reveals the following probability distribution.
X : 0 1 2 3 4 5 6 7 8
p(x) : .20 .18 .16 .12 .10 .09 .08 .04 .03
Calculate the expected waiting time X of a customer.
Solution :
Let X denote the waiting time (in minutes) per customer.
X : 0 1 2 3 4 5 6 7 8
p(x) : .20 .18 .16 .12 .10 .09 .08 .04 .03
Then $E(X) = \sum x\,p(x)$
= (0 × 0.20) + (1 × 0.18) + ... + (8 × 0.03) = 2.71
The expected value of X is 2.71 minutes. Thus the average waiting time of a customer before getting access to the ATM is 2.71 minutes.
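The definition $E(X) = \sum x_i\, p(x_i)$ translates directly into code. The sketch below is illustrative only; the helper `expectation` and its optional argument `g` (for computing $E[g(X)]$) are hypothetical names, and the tolerance used to check that the probabilities sum to 1 is an arbitrary choice.

```python
def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete random variable: sum of g(x_i) * p(x_i)."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(g(x) * p for x, p in zip(values, probs))


# Waiting-time distribution from Example 9 (minutes).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
p = [0.20, 0.18, 0.16, 0.12, 0.10, 0.09, 0.08, 0.04, 0.03]
print(round(expectation(x, p), 2))  # 2.71
```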
Example 10
Find the expected value of the number of heads
appearing when two fair coins are tossed.
Solution :
Let X be the random variable denoting the number of heads.
Possible values of X : 0 1 2
Probabilities $p(x_i)$ : 1/4 1/2 1/4
The expected value of X is
$E(X) = x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) = 0\left(\frac{1}{4}\right) + 1\left(\frac{1}{2}\right) + 2\left(\frac{1}{4}\right) = 1$
Therefore, the expected number of heads appearing in the
experiment of tossing 2 fair coins is 1.
Example 11
The probabilities that a man fishing at a particular place will catch 1, 2, 3 or 4 fish are 0.4, 0.3, 0.2 and 0.1 respectively. What is the expected number of fish caught?
Solution :
Let X be the number of fish caught.
Possible values of X : 1 2 3 4
Probabilities $p(x_i)$ : 0.4 0.3 0.2 0.1
$E(X) = \sum_{i} x_i\, p(x_i) = x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) + x_4 p(x_4)$
$= 1(0.4) + 2(0.3) + 3(0.2) + 4(0.1) = 0.4 + 0.6 + 0.6 + 0.4 = 2$
Example 12
A person receives a sum of rupees equal to the square of the number that appears on the face when a balanced die is tossed. How much money can he expect to receive?
Solution :
Let the random variable X be the square of the number that appears on the face of the die. Thus
Possible values of X : $1^2$ $2^2$ $3^2$ $4^2$ $5^2$ $6^2$
Probabilities $p(x_i)$ : 1/6 1/6 1/6 1/6 1/6 1/6
The expected amount that he receives is
$E(X) = 1^2\left(\frac{1}{6}\right) + 2^2\left(\frac{1}{6}\right) + \cdots + 6^2\left(\frac{1}{6}\right) = \text{Rs. } \frac{91}{6} \approx \text{Rs. } 15.17$
Example 13
A player tosses two fair coins. He wins Rs.5 if two heads
appear, Rs. 2 if 1 head appears and Rs.1 if no head occurs.
Find his expected amount of gain.
Solution :
Consider the experiment of tossing two fair coins. There are
four sample points in the sample space of this experiment.
i.e. S = {HH, HT, TH, TT}
Let X be the random variable denoting the amount that a
player wins associated with the sample point.
Thus,
Possible values of X (Rs.) : 5 2 1
Probabilities $p(x_i)$ : 1/4 1/2 1/4
$E(X) = 5\left(\frac{1}{4}\right) + 2\left(\frac{1}{2}\right) + 1\left(\frac{1}{4}\right) = \frac{5}{4} + 1 + \frac{1}{4} = \frac{10}{4} = \frac{5}{2} = \text{Rs. } 2.50$
Hence the expected amount of winning is Rs. 2.50.
Example 14
A random variable X has the probability function as
follows :
values of X : -1 0 1
probability : 0.2 0.3 0.5
Evaluate (i) E(3X + 1) (ii) $E(X^2)$ (iii) Var(X)
Solution :
X : -1 0 1
$p(x_i)$ : 0.2 0.3 0.5
(i) E(3X + 1) = 3E(X) + 1
Now E(X) = (-1)(0.2) + 0(0.3) + 1(0.5) = -0.2 + 0 + 0.5 = 0.3
Therefore E(3X + 1) = 3(0.3) + 1 = 1.9
(ii) $E(X^2) = \sum x^2 p(x) = (-1)^2(0.2) + (0)^2(0.3) + (1)^2(0.5) = 0.2 + 0 + 0.5 = 0.7$
(iii) $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 0.7 - (0.3)^2 = 0.61$
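Part (i) uses the property E(aX + b) = aE(X) + b. The sketch below computes E(3X + 1) both directly from the distribution and through that property, so the two can be compared; the helper `E` is a hypothetical name.

```python
values = [-1, 0, 1]
probs = [0.2, 0.3, 0.5]


def E(g):
    """Expectation of g(X) for the distribution in Example 14."""
    return sum(g(x) * p for x, p in zip(values, probs))


mean = E(lambda x: x)                 # E(X)
lhs = E(lambda x: 3 * x + 1)          # E(3X + 1) computed directly
rhs = 3 * mean + 1                    # via E(aX + b) = aE(X) + b
var = E(lambda x: x**2) - mean**2     # E(X^2) - [E(X)]^2
print(round(lhs, 4), round(rhs, 4), round(var, 4))  # 1.9 1.9 0.61
```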
Example 15
Find the mean, variance and the standard deviation for
the following probability distribution
Values of X, x : 1 2 3 4
probability, p(x) : 0.1 0.3 0.4 0.2
Solution :
Mean = $E(X) = \sum x\,p(x) = 1(0.1) + 2(0.3) + 3(0.4) + 4(0.2) = 2.7$
Variance = $E(X^2) - [E(X)]^2$
Now $E(X^2) = \sum x^2 p(x) = 1^2(0.1) + 2^2(0.3) + 3^2(0.4) + 4^2(0.2) = 8.1$
Variance = $8.1 - (2.7)^2 = 8.1 - 7.29 = 0.81$
Standard Deviation = $\sqrt{0.81} = 0.9$
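The variance can be computed either from the definition $E[X - E(X)]^2$ or from the shortcut $E(X^2) - [E(X)]^2$ used above. A minimal sketch that evaluates both for Example 15, so their agreement can be seen:

```python
import math

values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]

mean = sum(x * p for x, p in zip(values, probs))
# Definition: Var(X) = E[(X - E(X))^2]
var_def = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
# Shortcut: Var(X) = E(X^2) - [E(X)]^2
var_short = sum(x * x * p for x, p in zip(values, probs)) - mean ** 2
sd = math.sqrt(var_def)
print(round(mean, 4), round(var_def, 4), round(var_short, 4), round(sd, 4))
# 2.7 0.81 0.81 0.9
```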
Example 16
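Let X be a continuous random variable with p.d.f.
$f(x) = \begin{cases} \frac{1}{2}, & -1 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
Find (i) E(X) (ii) $E(X^2)$ (iii) Var(X)
Solution :
(i) $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ (by definition)
$= \frac{1}{2}\int_{-1}^{1} x\,dx = \frac{1}{2}\left[\frac{x^2}{2}\right]_{-1}^{1} = 0$
(ii) $E(X^2) = \int_{-1}^{1} x^2 f(x)\,dx = \int_{-1}^{1} \frac{x^2}{2}\,dx = \frac{1}{2}\left[\frac{x^3}{3}\right]_{-1}^{1} = \frac{1}{3}$
(iii) $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1}{3} - 0 = \frac{1}{3}$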
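For a continuous variable the sums become integrals. The sketch below approximates $E(X)$ and $E(X^2)$ for Example 16 with a midpoint rule; the number of subintervals is an arbitrary illustrative choice, and `midpoint_integral` is a hypothetical helper rather than a library function.

```python
def midpoint_integral(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    """Uniform p.d.f. on (-1, 1) from Example 16."""
    return 0.5 if -1 < x < 1 else 0.0


mean = midpoint_integral(lambda x: x * f(x), -1.0, 1.0)     # E(X), exactly 0
ex2 = midpoint_integral(lambda x: x * x * f(x), -1.0, 1.0)  # E(X^2), exactly 1/3
var = ex2 - mean ** 2
print(round(mean, 6), round(var, 6))
```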
EXERCISE 8.2
1) A balanced die is rolled. A person receives Rs. 10 when the number 1, 3 or 5 occurs and loses Rs. 5 when 2, 4 or 6 occurs. How much money can he expect on average per roll in the long run?
2) Two unbiased dice are thrown. Find the expected value of the
sum of the points thrown.
3) A player tossed two coins. If two heads show he wins Rs. 4.
If one head shows he wins Rs. 2, but if two tails show he must
pay Rs. 3 as penalty. Calculate the expected value of the sum
won by him.
4) The following represents the probability distribution of D, the
daily demand of a certain product. Evaluate E(D).
D : 1 2 3 4 5
P[D=d] : 0.1 0.1 0.3 0.3 0.2
5) Find E(2X - 7) and E(4X + 5) for the following probability distribution.
X : -3 -2 -1 0 1 2 3
p(x) : .05 .1 .3 0 .3 .15 .1
6) Find the mean, variance and standard deviation of the following probability distribution.
Values of X : -3 -2 -1 0 1 2 3
Probability p(x) : 1/7 1/7 1/7 1/7 1/7 1/7 1/7
7) Find the mean and variance for the following probability distribution.
$f(x) = \begin{cases} 2e^{-2x}, & x > 0 \\ 0, & x \le 0 \end{cases}$
8.3 DISCRETE DISTRIBUTIONS
We know that the frequency distributions are based on
observed data derived from the collected sample information. For
example, we may study the marks of the students of a class and
formulate a frequency distribution as follows:
Marks No. of students
0 - 20 10
20-40 12
40-60 25
60-80 15
80-100 18
Total 80
The above example clearly shows that the observed frequency
distributions are obtained by grouping. Measures like averages,
dispersion, correlation, etc. generally provide us a consolidated view
of the whole observed data. This may very well be used in
formulating certain ideas (inference) about the characteristics of the
whole set of data.
Another type of distribution, in which the variable is distributed according to some definite probability law that can be expressed mathematically, is called a theoretical probability distribution.
A probability distribution is a complete listing of the various values the random variable can take, along with the probability of each value. For example, consider the pattern of machine breakdowns in a manufacturing unit. The random variable would be the various values the number of machine breakdowns could assume. The probability corresponding to each value is the relative frequency of occurrence of that number of breakdowns. This probability distribution is constructed from the actual breakdown pattern observed over a period of time.
Theoretical probability distributions are basically of two types
(i) Discrete and (ii) Continuous
In this section, we will discuss theoretical discrete distributions
namely, Binomial and Poisson distributions.
8.3.1 Binomial Distribution
It is a distribution associated with repetition of independent trials of an experiment. Each trial has two possible outcomes, generally called success and failure. Such a trial is known as a Bernoulli trial.
Some examples of Bernoulli trials are :
(i) a toss of a coin (head or tail)
(ii) the throw of a die (even or odd number)
An experiment consisting of a repeated number of Bernoulli trials is called a binomial experiment. A binomial experiment must possess the following properties:
(i) there must be a fixed number of trials;
(ii) all trials must have the same probability of success (p), i.e. if we call one of the two outcomes success and the other failure, then the probability p of success remains constant throughout the experiment;
(iii) the trials must be independent of each other, i.e. the result of any trial must not be affected by any of the preceding trials.
Let X denote the number of successes in n trials of a binomial
experiment. Then X follows a binomial distribution with parameters
n and p and is denoted by X~B(n, p).
A random variable X is said to follow Binomial distribution
with parameters n and p, if it assumes only non-negative values and
its probability mass function is given by
$P[X = x] = p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$; $x = 0, 1, 2, \ldots, n$; $q = 1 - p$
Remark
(i) $\sum_{x=0}^{n} p(x) = \sum_{x=0}^{n} {}^{n}C_{x}\, p^{x} q^{n-x} = (q + p)^n = 1$
(ii) ${}^{n}C_{r} = \frac{n(n-1)\cdots(n-r+1)}{1 \cdot 2 \cdot 3 \cdots r}$
Mean and Variance
For the binomial distribution,
Mean = np
Variance = npq; Standard Deviation = $\sqrt{npq}$
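The p.m.f., Remark (i) and the formulas for the mean and variance can be checked numerically. A minimal sketch using `math.comb` from the Python standard library; the parameters n = 8, p = 0.5 are chosen only for illustration (they match Example 17 below), and `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(x, n, p):
    """P[X = x] = nCx * p**x * q**(n - x) for X ~ B(n, p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


n, p = 8, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)                                             # (q + p)^n = 1
mean = sum(x * px for x, px in enumerate(pmf))               # should equal np
var = sum(x * x * px for x, px in enumerate(pmf)) - mean**2  # should equal npq
print(round(total, 10), round(mean, 10), round(var, 10))  # 1.0 4.0 2.0
```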
Example 17
What is the probability of getting exactly 3 heads in 8 tosses of a fair coin?
Solution :
Let p denote the probability of getting a head in a toss, and let X be the number of heads in 8 tosses.
Then $p = \frac{1}{2}$, $q = \frac{1}{2}$ and n = 8.
The probability of getting exactly 3 heads is
$P(X = 3) = {}^{8}C_{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^5 = \frac{8 \times 7 \times 6}{1 \times 2 \times 3}\left(\frac{1}{2}\right)^8 = \frac{56}{256} = \frac{7}{32}$
Example 18
Write down the binomial distribution whose mean is 20 and variance is 4.
Solution :
Given mean, np = 20; variance, npq = 4.
Now $q = \frac{npq}{np} = \frac{4}{20} = \frac{1}{5}$, so $p = 1 - q = \frac{4}{5}$.
From np = 20, we have $n = \frac{20}{p} = \frac{20}{4/5} = 25$.
Hence the binomial distribution is
$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{25}C_{x}\left(\frac{4}{5}\right)^{x}\left(\frac{1}{5}\right)^{25-x}$, $x = 0, 1, 2, \ldots, 25$
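The steps of Example 18 (q = npq/np, p = 1 − q, n = mean/p) can be packaged as a small function. This is a sketch only; `binomial_params` is a hypothetical name, and the check that n is a whole number and 0 < p < 1 is an added safeguard, since not every pair of mean and variance belongs to a binomial distribution.

```python
from fractions import Fraction


def binomial_params(mean, variance):
    """Recover (n, p) of B(n, p) from np = mean and npq = variance."""
    q = Fraction(variance) / Fraction(mean)   # q = npq / np
    p = 1 - q
    n = Fraction(mean) / p
    if n.denominator != 1 or not (0 < p < 1):
        raise ValueError("no binomial distribution has these moments")
    return int(n), p


print(binomial_params(20, 4))   # (25, Fraction(4, 5))
```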
Example 19
On an average, if one vessel in every ten is wrecked, find the probability that out of five vessels expected to arrive, at least four will arrive safely.
Solution :
Let X be the number of vessels that arrive safely. The probability that a vessel will arrive safely is $p = \frac{9}{10}$.
Then the probability that a vessel will be wrecked is $q = 1 - p = \frac{1}{10}$.
Number of vessels, n = 5.
The probability that at least 4 out of 5 vessels arrive safely is
$P(X \ge 4) = P(X = 4) + P(X = 5) = {}^{5}C_{4}\left(\frac{9}{10}\right)^4\left(\frac{1}{10}\right) + {}^{5}C_{5}\left(\frac{9}{10}\right)^5$
$= 5(0.9)^4(0.1) + (0.9)^5 = 0.91854$
Example 20
For a binomial distribution with parameters n = 5 and p = 0.3, find the probabilities of getting (i) at least 3 successes (ii) at most 3 successes.
Solution :
Given n = 5, p = 0.3, q = 0.7.
(i) The probability of at least 3 successes is
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$
$= {}^{5}C_{3}(0.3)^3(0.7)^2 + {}^{5}C_{4}(0.3)^4(0.7) + {}^{5}C_{5}(0.3)^5(0.7)^0 = 0.1631$
(ii) The probability of at most 3 successes is
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= (0.7)^5 + {}^{5}C_{1}(0.7)^4(0.3) + {}^{5}C_{2}(0.7)^3(0.3)^2 + {}^{5}C_{3}(0.7)^2(0.3)^3 = 0.9692$
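Both tail probabilities are sums of binomial terms, which a short sketch can reproduce (the helper name `binom_pmf` is hypothetical). Note that P(X = 3) appears in both sums, exactly as in the solution.

```python
from math import comb


def binom_pmf(x, n, p):
    """nCx * p**x * (1 - p)**(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)


n, p = 5, 0.3
at_least_3 = sum(binom_pmf(x, n, p) for x in range(3, n + 1))  # x = 3, 4, 5
at_most_3 = sum(binom_pmf(x, n, p) for x in range(0, 4))       # x = 0, 1, 2, 3
print(round(at_least_3, 4), round(at_most_3, 4))  # 0.1631 0.9692
```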
8.3.2 Poisson distribution
The Poisson distribution is also a discrete probability distribution and is widely used in statistics. It arises when events do not occur as outcomes of a definite number of trials of an experiment but occur at random points of time and space, and our interest lies only in the number of occurrences of the event, not in its non-occurrences. This distribution is used to describe the behaviour of rare events such as
(i) number of accidents on a road
(ii) number of printing mistakes in a book
(iii) number of suicides reported in a particular city.
The Poisson distribution is an approximation of the binomial distribution when n (the number of trials) is large and p, the probability of success, is very close to zero, with np = λ remaining constant.
A random variable X is said to follow a Poisson distribution with parameter λ > 0 if it assumes only non-negative values and its probability mass function is given by
$P[X = x] = p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$; $x = 0, 1, 2, \ldots$
Remark
It should be noted that $\sum_{x=0}^{\infty} P[X = x] = \sum_{x=0}^{\infty} p(x) = 1$
Mean and Variance
For the Poisson distribution,
Mean, E(X) = λ; Variance, Var(X) = λ; S.D. = $\sqrt{\lambda}$
Note
For the Poisson distribution, the mean and variance are equal.
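The sketch below compares binomial and Poisson probabilities for a large n and a small p with np = 4 held fixed (n = 200, p = 0.02 are illustrative values), and then checks numerically that the Poisson mean and variance are both λ. The infinite sums are truncated at x = 99, an assumption that is harmless here because the neglected tail is vanishingly small for λ = 4.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)


# Large n, small p, np = 4: the two columns are close.
n, p = 200, 0.02
for x in range(6):
    print(x, round(binom_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))

# Mean and variance of the Poisson p.m.f. (sum truncated at x = 99).
lam = 4
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
var = sum(x * x * poisson_pmf(x, lam) for x in range(100)) - mean**2
print(round(mean, 6), round(var, 6))
```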
Example 21
Find the probability that at most 5 defective fuses will be found in a box of 200 fuses if experience shows that 2 percent of such fuses are defective. ($e^{-4} = 0.0183$)
Solution :
p = probability that a fuse is defective = $\frac{2}{100}$; n = 200
$\lambda = np = \frac{2}{100} \times 200 = 4$
Let X denote the number of defective fuses found in a box. Then the distribution is given by
$P[X = x] = p(x) = \frac{e^{-4}4^{x}}{x!}$
So the probability that at most 5 defective fuses will be found in a box of 200 fuses is
$P(X \le 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)$
$= e^{-4} + \frac{e^{-4}4}{1!} + \frac{e^{-4}4^2}{2!} + \frac{e^{-4}4^3}{3!} + \frac{e^{-4}4^4}{4!} + \frac{e^{-4}4^5}{5!}$
$= e^{-4}\left(1 + \frac{4}{1!} + \frac{4^2}{2!} + \frac{4^3}{3!} + \frac{4^4}{4!} + \frac{4^5}{5!}\right)$
$= 0.0183 \times \frac{643}{15} = 0.785$
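The bracketed sum in Example 21 can be evaluated exactly with rational arithmetic, which confirms the value 643/15. This minimal sketch then multiplies by the unrounded $e^{-4}$ from `math.exp` instead of the rounded table value 0.0183.

```python
from fractions import Fraction
from math import exp, factorial

lam = 4
# 1 + 4/1! + 4^2/2! + ... + 4^5/5!
bracket = sum(Fraction(lam**x, factorial(x)) for x in range(6))
print(bracket)                          # 643/15
print(round(exp(-lam) * bracket, 4))    # 0.7851
```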
Example 22
Suppose on an average 1 house in 1000 in a certain district has a fire during a year. If there are 2000 houses in that district, what is the probability that exactly 5 houses will have a fire during the year? ($e^{-2} = 0.13534$)
Solution :
p = probability that a house catches fire = $\frac{1}{1000}$
Here n = 2000, so $\lambda = np = 2000 \times \frac{1}{1000} = 2$.
Let X denote the number of houses that have a fire. Then the distribution is given by
$P[X = x] = \frac{e^{-2}2^{x}}{x!}$, $x = 0, 1, 2, \ldots$
The probability that exactly 5 houses will have a fire during the year is
$P[X = 5] = \frac{e^{-2}2^{5}}{5!} = \frac{32 \times 0.13534}{120} = 0.0361$
Example 23
The number of accidents in a year attributed to taxi drivers in a city follows a Poisson distribution with mean 3. Out of 1000 taxi drivers, find the approximate number of drivers with
(i) no accident in a year
(ii) more than 3 accidents in a year
Solution :
Here λ = 3 and N = 1000.
Then the distribution is
$P[X = x] = \frac{e^{-3}3^{x}}{x!}$
where X denotes the number of accidents.
(i) P(no accident in a year) = $P(X = 0) = e^{-3} \approx 0.05$
Number of drivers with no accident = 1000 × 0.05 = 50
(ii) P(more than 3 accidents in a year) = $P(X > 3) = 1 - P(X \le 3)$
$= 1 - \left[e^{-3} + \frac{e^{-3}3}{1!} + \frac{e^{-3}3^2}{2!} + \frac{e^{-3}3^3}{3!}\right]$
$= 1 - e^{-3}[1 + 3 + 4.5 + 4.5] = 1 - 13e^{-3} = 1 - 0.65 = 0.35$
Number of drivers with more than 3 accidents = 1000 × 0.35 = 350
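The expected counts in Example 23 are N times the Poisson probabilities. A minimal sketch with the unrounded $e^{-3}$:

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)


lam, drivers = 3, 1000
p_none = poisson_pmf(0, lam)
p_more_than_3 = 1 - sum(poisson_pmf(x, lam) for x in range(4))  # 1 - P(X <= 3)
print(round(drivers * p_none), round(drivers * p_more_than_3))  # 50 353
```

The second count comes out as about 353 rather than 350 because the worked solution rounds $13e^{-3}$ to 0.65; both figures are approximate counts.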
EXERCISE 8.3
1) Ten coins are thrown simultaneously. Find the probability of getting at least 7 heads.
2) In a binomial distribution consisting of 5 independent trials, the probabilities of 1 and 2 successes are 0.4096 and 0.2048 respectively. Find the parameter p of the distribution.
3) For a binomial distribution, the mean is 6 and the standard deviation is 2. Write down all the terms of the distribution.
4) The average percentage of failure in a certain examination is 40. What is the probability that out of a group of 6 candidates at least 4 pass the examination?
5) An unbiased coin is tossed six times. What is the probability of obtaining four or more heads?
6) It is stated that 2% of razor blades supplied by a manufacturer are defective. A random sample of 200 blades is drawn from a lot. Find the probability that 3 or more blades are defective. ($e^{-4} = 0.01832$)
7) Find the probability that at most 5 defective bolts will be found in a box of 200 bolts, if it is known that 2% of such bolts are expected to be defective. ($e^{-4} = 0.01832$)
8) An insurance company insures 4,000 people against loss of both eyes in car accidents. Based on previous data, the rates were computed on the assumption that on the average 10 persons in 1,00,000 will have car accidents each year that result in this type of injury. What is the probability that more than 3 of the injured will collect on their policy in a given year? ($e^{-0.4} = 0.6703$)
9) It is given that 3% of the electric bulbs manufactured by a company are defective. Find the probability that a sample of 100 bulbs will contain (i) no defective (ii) exactly one defective. ($e^{-3} = 0.0498$)
10) Suppose the probability that an item produced by a particular machine is defective equals 0.2. If 10 items produced from this machine are selected at random, what is the probability that not more than one defective is found? ($e^{-2} = 0.13534$)
8.4 CONTINUOUS DISTRIBUTIONS
The binomial and Poisson distributions discussed in the previous section are among the most useful theoretical discrete distributions. To deal with quantities whose magnitudes vary continuously, like the heights and weights of individuals, a continuous distribution is needed. The normal distribution is one of the most widely used continuous distributions.
8.4.1 Normal Distribution
The normal distribution is considered to be the most important and powerful of all the distributions in statistics. It was first introduced by De Moivre in 1733 in the development of probability. Laplace (1749-1827) and Gauss (1777-1855) were also associated with the development of the normal distribution.
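A random variable X is said to follow a normal distribution with mean μ and variance $\sigma^2$, denoted by $X \sim N(\mu, \sigma^2)$, if its probability density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$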
Remark
The parameters μ and $\sigma^2$ completely describe the normal distribution. The normal distribution can also be considered as a limiting form of the binomial distribution under the following conditions:
(i) n, the number of trials, is indefinitely large, i.e. $n \to \infty$;
(ii) neither p nor q is very small.
The graph of the p.d.f. of the normal distribution is called the normal curve, and it is given below.
[Figure: Normal probability curve, centred at x = μ]
8.4.2 Properties of Normal Distribution
The following are some of the important properties of the
normal curve and the normal distribution.
(i) The curve is bell-shaped and symmetric about x = μ.
(ii) The mean, median and mode of the distribution coincide.
(iii) There is one maximum point of the normal curve, which occurs at the mean (μ). The height of the curve declines as we go in either direction from the mean.
(iv) The two tails of the curve extend infinitely and never touch the horizontal (x) axis.
(v) Since there is only one maximum point, the normal curve is unimodal, i.e. it has only one mode.
(vi) Since f(x) is a probability density, it can never be negative, and hence no portion of the curve lies below the x-axis.
(vii) The points of inflection are given by x = μ ± σ.
(viii) The mean deviation about the mean is $\sigma\sqrt{\frac{2}{\pi}} \approx \frac{4}{5}\sigma$.
(ix) Its mathematical equation is completely determined if the mean and S.D. are known, i.e. for a given mean μ and S.D. σ there is only one normal distribution.
(x) Area property: for a normal distribution with mean μ and S.D. σ, the total area under the normal curve is 1, and
(a) $P(\mu - \sigma < X < \mu + \sigma) = 0.6826$, i.e. mean ± 1σ covers about 68.26% of the area;
(b) $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9544$, i.e. mean ± 2σ covers about 95.44% of the area;
(c) $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$, i.e. mean ± 3σ covers about 99.73% of the area.
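The area property can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Because $Z = (X - \mu)/\sigma$ turns every mean ± kσ interval into (−k, k), the sketch uses the standard normal only.

```python
from statistics import NormalDist

# Coverage of mean ± k*sigma is the same for every N(mu, sigma^2).
Z = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    coverage = Z.cdf(k) - Z.cdf(-k)
    print(k, round(coverage, 4))
```

The printed values are about 0.6827, 0.9545 and 0.9973; the figures 0.6826 and 0.9544 in the text come from doubling four-figure table entries (0.3413 and 0.4772).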
8.4.3 Standard Normal Distribution
A random variable which has a normal distribution with mean μ = 0 and standard deviation σ = 1 is said to have the standard normal distribution.
Remark
(i) If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma}$ is a standard normal variate with E(Z) = 0 and Var(Z) = 1, i.e. Z ~ N(0, 1).
(ii) It is to be noted that the standard normal distribution has the same shape as the normal distribution but with the special properties μ = 0 and σ = 1.
[Figure: standard normal curve, centred at Z = 0]
A random variable Z is said to have a standard normal distribution if its probability density function is given by
$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$, $-\infty < z < \infty$
Example 24
What is the probability that Z
(a) lies between 0 and 1.83
(b) is greater than 1.54
(c) is greater than -0.86
(d) lies between 0.43 and 1.12
(e) is less than 0.77?
Solution :
(a) Z lies between 0 and 1.83.
P(0 < Z < 1.83) = 0.4664 (obtained directly from the tables)
(b) Z is greater than 1.54, i.e. P(Z > 1.54).
[Figure: standard normal curves with Z = 0 and Z = 1.83 marked, and with Z = 0 and Z = 1.54 marked]
Since the total area to the right of Z = 0 is 0.5 and the area between Z = 0 and Z = 1.54 (from tables) is 0.4382,
P(Z > 1.54) = 0.5 - P(0 < Z < 1.54) = 0.5 - 0.4382 = 0.0618
(c) Z is greater than -0.86, i.e. P(Z > -0.86).
Here the area of interest P(Z > -0.86) is made up of two components:
(i) the area between Z = -0.86 and Z = 0, which is equal to 0.3051 (from tables);
(ii) the area for Z > 0, which is 0.5.
P(Z > -0.86) = 0.3051 + 0.5 = 0.8051
(d) Z lies between 0.43 and 1.12.
P(0.43 < Z < 1.12) = P(0 < Z < 1.12) - P(0 < Z < 0.43)
= 0.3686 - 0.1664 (from tables)
= 0.2022
[Figure: standard normal curves with Z = -0.86 and Z = 0 marked, and with Z = 0.43 and Z = 1.12 marked]
(e) Z is less than 0.77.
P(Z < 0.77) = 0.5 + P(0 < Z < 0.77)
= 0.5 + 0.2794 = 0.7794 (from tables)
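Instead of reading areas from the normal table, the standard normal c.d.f. $\Phi(z) = P(Z \le z)$ can be evaluated directly. A minimal sketch using `statistics.NormalDist` (Python 3.8 or later); each answer is written as a difference of Φ values rather than via the table's area from 0 to z.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal c.d.f., P(Z <= z)

answers = {
    "(a) P(0 < Z < 1.83)": Phi(1.83) - Phi(0),
    "(b) P(Z > 1.54)": 1 - Phi(1.54),
    "(c) P(Z > -0.86)": 1 - Phi(-0.86),
    "(d) P(0.43 < Z < 1.12)": Phi(1.12) - Phi(0.43),
    "(e) P(Z < 0.77)": Phi(0.77),
}
for label, value in answers.items():
    print(label, round(value, 4))
```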
Example 25
If X is a normal random variable with mean 100 and variance 36, find (i) P(X > 112) (ii) P(X < 106) (iii) P(94 < X < 106).
Solution :
Mean, μ = 100; variance, $\sigma^2 = 36$; S.D., σ = 6.
Then the standard normal variate Z is given by
$Z = \frac{X - \mu}{\sigma} = \frac{X - 100}{6}$
(i) When X = 112, $Z = \frac{112 - 100}{6} = 2$.
P(X > 112) = P(Z > 2)
= $P(0 < Z < \infty) - P(0 < Z < 2)$
= 0.5 - 0.4772 = 0.0228 (from tables)
[Figure: standard normal curve with Z = 0 and Z = 2 marked]
(ii) For X = 106, $Z = \frac{106 - 100}{6} = 1$.
P(X < 106) = P(Z < 1)
= $P(-\infty < Z < 0) + P(0 < Z < 1)$
= 0.5 + 0.3413 = 0.8413 (from tables)
(iii) When X = 94, $Z = \frac{94 - 100}{6} = -1$; when X = 106, $Z = \frac{106 - 100}{6} = +1$.
P(94 < X < 106) = P(-1 < Z < 1)
= P(-1 < Z < 0) + P(0 < Z < 1)
= 2P(0 < Z < 1) (by symmetry)
= 2(0.3413)
= 0.6826
[Figure: standard normal curves with Z = 0 and Z = 1 marked, and with Z = -1, Z = 0 and Z = 1 marked]
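Standardisation can also be skipped altogether by working with $N(\mu, \sigma^2)$ directly. The sketch below standardises X = 112 as in part (i) and compares a probability computed on the X scale with the same probability on the Z scale; `standardize` is a hypothetical helper, while `NormalDist` is from the Python standard library (3.8 or later).

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma = 100, 6          # Example 25: variance 36, so sigma = 6
X = NormalDist(mu, sigma)
Z = NormalDist()

print(standardize(112, mu, sigma))                       # 2.0
print(round(1 - X.cdf(112), 4), round(1 - Z.cdf(2), 4))  # same probability, two scales
print(round(X.cdf(106) - X.cdf(94), 4))                  # P(94 < X < 106)
```

The last value is about 0.6827; the text's 0.6826 is 2 × 0.3413 from the four-figure table.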
Example 26
In a sample of 1000 candidates, the mean of a certain test is 45 and the S.D. is 15. Assuming the normality of the distribution, find the following:
(i) How many candidates score between 40 and 60?
(ii) How many candidates score above 50?
(iii) How many candidates score below 30?
Solution :
Mean = μ = 45 and S.D. = σ = 15.
Then $Z = \frac{X - \mu}{\sigma} = \frac{X - 45}{15}$
(i) $P(40 < X < 60) = P\left(\frac{40 - 45}{15} < Z < \frac{60 - 45}{15}\right) = P\left(-\frac{1}{3} < Z < 1\right)$
$= P\left(-\frac{1}{3} < Z < 0\right) + P(0 < Z < 1)$
= P(0 < Z < 0.33) + P(0 < Z < 1)
= 0.1293 + 0.3413 (from tables)
P(40 < X < 60) = 0.4706
Hence the number of candidates scoring between 40 and 60 = 1000 × 0.4706 = 470.6 ≈ 471
[Figure: standard normal curve with Z = -1/3, Z = 0 and Z = 1 marked]
(ii) $P(X > 50) = P\left(Z > \frac{1}{3}\right) = 0.5 - P\left(0 < Z < \frac{1}{3}\right) = 0.5 - P(0 < Z < 0.33)$
= 0.5 - 0.1293 = 0.3707 (from tables)
Hence the number of candidates scoring above 50 = 1000 × 0.3707 ≈ 371
(iii) P(X < 30) = P(Z < -1)
= 0.5 - P(-1 < Z < 0)
= 0.5 - P(0 < Z < 1) (by symmetry)
= 0.5 - 0.3413 = 0.1587 (from tables)
Number of candidates scoring less than 30 = 1000 × 0.1587 ≈ 159
Example 27
The I.Q. (intelligence quotient) of a group of 1000 school children has mean 96 and standard deviation 12. Assuming that the distribution of I.Q. among school children is normal, find approximately the number of school children having I.Q.
(i) less than 72 (ii) between 80 and 120
Solution :
Given N = 1000, μ = 96 and σ = 12.
Then $Z = \frac{X - \mu}{\sigma} = \frac{X - 96}{12}$
(i) P(X < 72) = P(Z < -2)
= $P(-\infty < Z < 0) - P(-2 < Z < 0)$
= $P(0 < Z < \infty) - P(0 < Z < 2)$ (by symmetry)
= 0.5 - 0.4772 (from tables) = 0.0228
Number of school children having I.Q. less than 72 = 1000 × 0.0228 = 22.8 ≈ 23
[Figure: standard normal curve with Z = -2 and Z = 0 marked]
(ii) P(80 < X < 120) = P(-1.33 < Z < 2)
[Figure: standard normal curve with Z = -1.33, Z = 0 and Z = 2 marked]
= P(-1.33 < Z < 0) + P(0 < Z < 2)
= P(0 < Z < 1.33) + P(0 < Z < 2)
= 0.4082 + 0.4772 (from tables)
= 0.8854
Number of school children having I.Q. between 80 and 120 = 1000 × 0.8854 ≈ 885
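Example 28
In a normal distribution, 20% of the items are less than 100 and 30% are over 200. Find the mean and S.D. of the distribution.
Solution :
Representing the given data diagrammatically:
[Figure: normal curve with X = 100 (Z = Z1) to the left of Z = 0 and X = 200 (Z = Z2) to the right]
From the diagram,
$P(Z_1 < Z < 0) = 0.3$, i.e. $P(0 < Z < -Z_1) = 0.3$
$-Z_1 = 0.84$ (from the normal table), so $Z_1 = -0.84$
Hence $-0.84 = \frac{100 - \mu}{\sigma}$, i.e. $100 - \mu = -0.84\sigma$ ----------(1)
$P(0 < Z < Z_2) = 0.2$
$Z_2 = 0.525$ (from the normal table)
Hence $0.525 = \frac{200 - \mu}{\sigma}$, i.e. $200 - \mu = 0.525\sigma$ ----------(2)
Subtracting (1) from (2), $100 = 1.365\sigma$, so $\sigma \approx 73.26$, and then from (1), $\mu = 100 + 0.84\sigma \approx 161.54$.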
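The table look-ups in Example 28 can be replaced by the inverse c.d.f. `NormalDist().inv_cdf` (Python standard library, 3.8 or later), after which equations (1) and (2) are solved exactly as in the text. This is a sketch, not the textbook's method.

```python
from statistics import NormalDist

Z = NormalDist()
z1 = Z.inv_cdf(0.20)        # 20% of items lie below X = 100
z2 = Z.inv_cdf(1 - 0.30)    # 30% of items lie above X = 200

# Solve 100 = mu + z1*sigma and 200 = mu + z2*sigma simultaneously.
sigma = (200 - 100) / (z2 - z1)
mu = 100 - z1 * sigma
print(round(z1, 4), round(z2, 4), round(mu, 2), round(sigma, 2))
```

With the unrounded quantiles (about −0.8416 and 0.5244) this gives μ ≈ 161.61 and σ ≈ 73.21; the small difference from the worked answer comes from the rounded table values 0.84 and 0.525.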
EXERCISE 8.4
1) Find the area under the standard normal curve which lies
(i) to the right of Z = 2.70
(ii) to the left of Z = 1.73
2) Find the area under the standard normal curve which lies
(i) between Z = 1.25 and Z = 1.67
(ii) between Z = 0.90 and Z = 1.85
3) The distribution of marks obtained by a group of students may
be assumed to be normal with mean 50 marks and standard
deviation 15 marks. Estimate the proportion of students with
marks below 35.
4) The marks in Economics obtained by the students in the Public examination are assumed to be approximately normally distributed with mean 45 and S.D. 3. A student taking this subject is chosen at random. What is the probability that his mark is above 70?
5) Assume the mean height of soldiers to be 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 1000 would you expect to be over 6 feet tall?
6) The mean yield for a one-acre plot is 663 kg with an S.D. of 32 kg. Assuming a normal distribution, how many one-acre plots in a batch of 1000 plots would you expect to have a yield (i) over 700 kg (ii) below 650 kg?
7) A large number of measurements is normally distributed with
a mean of 65.5" and S.D of 6.2". Find the percentage of
measurements that fall between 54.8" and 68.8".
8) The diameter of shafts produced in a factory conforms to
normal distribution. 31% of the shafts have a diameter less
than 45mm. and 8% have more than 64mm. Find the mean
and standard deviation of the diameter of shafts.
9) The results of a particular examination are given below in summary form.
Result                          Percentage of candidates
1. Passed with distinction      10
2. Passed                       60
3. Failed                       30
It is known that a candidate fails if he obtains less than 40 marks out of 100, while he must obtain at least 75 marks in order to pass with distinction. Determine the mean and the standard deviation of the distribution, assuming it to be normal.
EXERCISE 8.5
Choose the correct answer
1) If a fair coin is tossed three times, the probability function p(x) of the number of heads x is
(a) x : 0 1 2 3
    p(x) : 1/8 1/8 2/8 3/8
(b) x : 0 1 2 3
    p(x) : 1/8 3/8 3/8 1/8
(c) x : 0 1 2 3
    p(x) : 1/8 1/8 2/8 3/8
(d) none of these
2) If a discrete random variable has the probability mass function
x : 0 1 2 3
p(x) : k 2k 3k 5k
then the value of k is
(a) $\frac{1}{11}$ (b) $\frac{2}{11}$ (c) $\frac{3}{11}$ (d) $\frac{4}{11}$
3) If the probability density function of a variable X is defined as f(x) = Cx(2 - x), 0 < x < 2, then the value of C is
(a) $\frac{4}{3}$ (b) $\frac{6}{4}$ (c) $\frac{3}{4}$ (d) $\frac{3}{5}$
4) The mean and variance of a binomial distribution are
(a) np, npq (b) pq, npq (c)np, npq (d) np, nq
5) If $X \sim N(\mu, \sigma^2)$, the standard normal variate is distributed as
(a) N(0, 0) (b) N(1, 0) (c) N(0, 1) (d) N(1, 1)
6) The normal distribution curve is
(a) Bimodal (b) Unimodal
(c) Skewed (d) none of these
7) If X is a Poisson variate with P(X = 1) = P(X = 2), the mean of the Poisson variate is equal to
(a) 1 (b) 2 (c) 2 (d) 3
8) The standard deviation of a Poisson variate is 2. The mean of the Poisson variate is
(a) 2 (b) 4 (c) 2 (d) $\frac{1}{2}$
9) The random variables X and Y are independent if
(a) E(X Y) = 1 (b) E(XY) = 0
(c) E(X Y) = E(X) E(Y) (d) E(X+Y) = E(X) + E(Y)
10) The mean and variance of a binomial distribution are 8 and 4 respectively. Then P(X = 1) is equal to
(a) $\frac{1}{2^{12}}$ (b) $\frac{1}{2^{4}}$ (c) $\frac{1}{2^{6}}$ (d) $\frac{1}{2^{10}}$
11) If $X \sim N(\mu, \sigma^2)$, the points of inflection of the normal distribution curve are
(a) + (b) + (c) + (d) + 2
12) If $X \sim N(\mu, \sigma^2)$, the maximum probability at the point of inflection of the normal distribution is
(a)
2
1
2
1
e
(b)
2
1
2
1
e
(c)
2
1
(d)
2
1
13) If a random variable X has the following probability distribution
X 1 2 1 2
p(x) 1/3 1/6 1/6 1/3
then the expected value of X is
(a) $\frac{3}{2}$ (b) $\frac{1}{6}$ (c) $\frac{1}{2}$ (d) $\frac{1}{3}$
14) If X~N (5, 1), the probability density function for the normal
variate X is
(a)
2
5
1
2
1
) (
2 5
1
x
e
(b)
2
5
1
2
1
) (
2
1
x
e
(c)
2
2
1
) 5 (
2
1
x
e
(d)
2
2
1
) 5 (
1
x
e
15) If X~N (8, 64), the standard normal variate Z will be
(a) z =
8
64 X
(b)
64
8 X
(c)
8
8 X
(d)
8
8 X
9.1 SAMPLING AND TYPES OF ERRORS
Sampling is used in our everyday life, often without our being aware of it. For example, a cook tests a small quantity of rice to see whether it has been well cooked, and a grain merchant does not examine each grain of what he intends to purchase but inspects only a small quantity of grains. Most of our decisions are based on the examination of a few items only.
In a statistical investigation, the interest usually lies in the
assessment of general magnitude and the study of variation with
respect to one or more characteristics relating to individuals
belonging to a group. This group of individuals or units under study
is called population or universe. Thus in statistics, population is
an aggregate of objects or units under study. The population may
be finite or infinite.
9.1.1 Sampling and sample
Sampling is a method of selecting units for analysis such as
households, consumers, companies etc. from the respective
population under statistical investigation. The theory of sampling is
based on the principle of statistical regularity. According to
this principle, a moderately large number of items chosen at random
from a large group are almost sure on an average to possess the
characteristics of the larger group.
A smallest non-divisible part of the population is called a unit.
A unit should be well defined and should not be ambiguous. For
example, if we define unit as a household, then it should be defined
that a person should not belong to two households nor should it
leave out persons belonging to the population.
A finite subset of a population is called a sample and the
number of units in a sample is called its sample size.
SAMPLING TECHNIQUES
AND STATISTICAL INFERENCE
9
112
By analysing the data collected from the sample one can draw
inference about the population under study.
9.1.2 Parameter and Statistic
The statistical constants of a population, such as the mean ($\mu$),
variance ($\sigma^2$) and proportion ($P$), are termed parameters.
Statistical measures such as the mean ($\bar{X}$), variance ($s^2$) and
proportion ($p$) computed from the sampled observations are known as
statistics.
Sampling is employed to throw light on the population parameters.
A statistic is an estimate based on sample data that is used to draw
inferences about the corresponding population parameter.
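To make the distinction concrete, here is a minimal Python sketch using the standard `statistics` module. The population of five values and the three-unit sample are hypothetical and serve only as an illustration. `mu` and `sigma2` act as parameters, while `x_bar` and `s2` act as statistics.

```python
from statistics import mean, pvariance, variance

# Hypothetical population of N = 5 units (illustrative values only).
population = [2, 3, 6, 8, 11]

# Parameters: computed from every unit of the population.
mu = mean(population)             # population mean
sigma2 = pvariance(population)    # population variance (divisor N)

# Statistics: computed from one hypothetical sample of n = 3 units.
sample = [3, 8, 11]
x_bar = mean(sample)              # sample mean
s2 = variance(sample)             # sample variance (divisor n - 1)

print("parameters:", mu, sigma2)
print("statistics:", round(x_bar, 2), round(s2, 2))
```

A different sample would produce different values of `x_bar` and `s2`. By contrast, `mu` and `sigma2` stay fixed.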
9.1.3 Need for Sampling
Suppose that the raw materials department in a company
receives items in lots and issues them to the production department
as and when required. Before accepting these items, the inspection
department inspects or tests them to make sure that they meet the
required specifications. It has two options:
(i) it could inspect all the items in the lot, or
(ii) it could take a sample, inspect the sample for defectives
and then estimate the total number of defectives in the
population as a whole.
The first approach is called complete enumeration (census).
It has two major disadvantages, namely the time it consumes and the
cost involved.
The second approach, which uses sampling, has two major
advantages: (i) it is significantly less expensive, and (ii) it takes
much less time while still giving reliable results.
When testing destroys the item (destructive testing), sampling is
the only option. A well-designed statistical sampling methodology
gives accurate results while reducing both cost and time. Thus
sampling is one of the best tools available to decision makers.
9.1.4 Elements of Sampling Plan
The main steps involved in the planning and execution of sample
survey are :
(i) Objectives
The first task is to lay down in concrete terms the basic
objectives of the survey. Failure to define the objective(s) will clearly
undermine the purpose of carrying out the survey itself. For example,
if a nationalised bank wants to study its savings bank account holders'
perception of the service quality rendered over a period of one
year, the objective of the sampling here is to analyse the perception
of the account holders in the bank.
(ii) Population to be covered
Based on the objectives of the survey, the population should
be well defined. The characteristics concerning the population under
study should also be clearly defined. For example, to analyse the
perception of the savings bank account holders about the service
rendered by the bank, all the account holders in the bank constitute
the population to be investigated.
(iii) Sampling frame
In order to cover the population decided upon, there should
be some list, map or other acceptable material (called the frame)
which serves as a guide to the population to be covered. The list or
map must be examined to be sure that it is reasonably free from
defects. The sampling frame will help us in the selection of sample.
All the account numbers of the savings bank account holders in the
bank are the sampling frame in the analysis of perception of the
customers regarding the service rendered by the bank.
(iv) Sampling unit
For the purpose of sample selection, the population should
be capable of being divided up into sampling units. The division of
the population into sampling units should be unambiguous. Every
element of the population should belong to just one sampling unit.
Each savings bank account holder in the bank forms a sampling unit,
as all the savings bank account holders in the bank constitute the
population.
(v) Sample selection
The size of the sample and the manner of selecting the sample
should be defined based on the objectives of the statistical
investigation. Estimating the population parameters along with their
margin of uncertainty is one of the important aspects to be
considered in sample selection.
(vi) Collection of data
The method of collecting the information has to be decided,
keeping in view the costs involved and the accuracy aimed at.
Physical observation, interviewing respondents and collecting data
through mail are some of the methods that can be followed in
collection of data.
(vii) Analysis of data
The collected data should be properly classified and subjected
to an appropriate analysis. The conclusions are drawn based on
the results of the analysis.
9.1.5 Types of Sampling
[Chart: Types of sampling]
Probability sampling (random sampling): simple random sampling,
stratified random sampling, systematic sampling, cluster sampling.
Non-probability sampling (non-random sampling): quota sampling,
expert sampling, convenience sampling.
The technique of selecting a sample from a population usually
depends on the nature of the data and the type of enquiry. The
procedure of sampling may be broadly classified under the following
heads :
(i) Probability sampling or random sampling and
(ii) Non-probability sampling or non-random sampling.
(i) Probability sampling
Probability sampling is a method of sampling that ensures that
every unit in the population has a known non-zero chance of being
included in the sample.
The different methods of random sampling are :
(a) Simple Random Sampling
Simple random sampling is the foundation of probability
sampling. It is a special case of probability sampling in which every
unit in the population has an equal chance of being included in a
sample. Simple random sampling also makes the selection of every
possible combination of the desired number of units equally likely.
Sampling may be done with or without replacement.
It may be noted that when the sampling is with replacement,
the units drawn are replaced before the next selection is made. The
population size therefore remains constant when the sampling is with
replacement.
If one wants to select $n$ units from a population of size $N$
without replacement, then every possible selection of $n$ units must
have the same probability. There are $^{N}C_{n}$ possible ways to
pick $n$ units from a population of size $N$, so simple random
sampling guarantees that each sample of $n$ units has the same
probability $\frac{1}{^{N}C_{n}}$ of being selected.
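A short Python sketch can check the count $^{N}C_{n}$ and the probability $1/^{N}C_{n}$. It assumes a small hypothetical population of five units. `srs_probability` is an illustrative helper. `math.comb` and `random.sample` are standard-library functions.

```python
import random
from math import comb

def srs_probability(N: int, n: int) -> float:
    """Probability that one particular set of n units is selected when
    n units are drawn without replacement from N by simple random sampling."""
    return 1 / comb(N, n)

# Small hypothetical population of N = 5 units, sample size n = 2.
N, n = 5, 2
print(comb(N, n))              # number of possible samples
print(srs_probability(N, n))   # probability of each particular sample

# random.sample draws n distinct units; every subset is equally likely.
units = list(range(1, N + 1))
print(random.sample(units, n))
```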
Example
A bank wants to study its savings bank account holders'
perception of the service quality rendered over a period of one
year. The bank has to prepare a complete list of the savings bank
account holders, called the sampling frame; say there are 500 of them.
The process then involves selecting a sample of 50 out of 500 and
interviewing them. This could be achieved in many ways. Two common
ways are :
(1) Lottery method : Select 50 slips, without replacement, from a box
containing 500 well-shuffled slips bearing the account numbers.
This method can be applied when the population is small
enough to handle.
(2) Random numbers method : When the population size is
very large, the most practical and inexpensive method of
selecting a simple random sample is by using the random
number tables.
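The sketch below applies this procedure to the bank example. It assumes a hypothetical frame in which the account numbers are simply 1 to 500. Python's pseudo-random generator takes the place of the shuffled slips or the random number table. The fixed seed is an assumption made only so that the draw can be reproduced.

```python
import random

# Hypothetical sampling frame: account numbers 1 to 500.
frame = list(range(1, 501))

# A seeded pseudo-random generator plays the role of the shuffled box
# (lottery method) or of the random number table.
rng = random.Random(2024)
selected = sorted(rng.sample(frame, 50))   # 50 accounts, without replacement

print(len(selected))
print(selected[:10])
```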
(b) Stratified Random Sampling
Stratified random sampling involves dividing the population
into a number of groups called strata in such a manner that the
units within a stratum are homogeneous and the units in different
strata are heterogeneous. The next step involves selecting a simple
random sample of appropriate size from each stratum. The sample
size in each stratum is usually either (a) equal for all strata, or
(b) proportionate to the number of units in the stratum.
For example, a marketing manager in a consumer product
company wants to study customers' attitudes towards a new
product in order to improve sales. Three typical cities that
influence sales can be considered as three strata. The customers
within a city are similar, but customers in different cities are
vastly different. The customers for the study must be selected
from each city by random sampling, so that meaningful inferences
can be drawn about the whole population.
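Allocation rule (b), proportionate allocation, can be sketched in Python as follows. The three city sizes and the total sample size of 200 are hypothetical. `proportional_allocation` is an illustrative helper, not a standard function.

```python
import random
from typing import Dict

def proportional_allocation(stratum_sizes: Dict[str, int], n: int) -> Dict[str, int]:
    """Share a total sample of n among strata in proportion to their sizes.
    Rounding may make the shares not add up exactly to n."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

# Hypothetical customer counts in the three cities (strata).
strata = {"City A": 5000, "City B": 3000, "City C": 2000}
alloc = proportional_allocation(strata, 200)
print(alloc)   # {'City A': 100, 'City B': 60, 'City C': 40}

# Simple random sample inside each stratum (customer ids 0..size-1 are hypothetical).
rng = random.Random(7)
sample = {city: rng.sample(range(strata[city]), k) for city, k in alloc.items()}
```

Under equal allocation (a), each city would instead receive about 200/3 units.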
(c) Systematic sampling
Systematic sampling is a convenient way of selecting a sample.
It requires less time and cost than simple random sampling.
In this method, the units are selected from the population at a
uniform interval. To facilitate this, we arrange the items in numerical,
alphabetical, geographical or any other order. This method is used
when a complete list of the population is available.
If we want to select a sample of size $n$ from a population of
size $N$ under systematic sampling, first compute $k = \frac{N}{n+1}$,
rounded to the nearest integer, and select an item $j$ at random
such that $1 \le j \le k$. Then the $j$-th, $(j + k)$-th, $(j + 2k)$-th, ...,
$(j + (n-1)k)$-th items constitute a systematic random sample.
For example, suppose we want to select a sample of 9 students out
of 105 students numbered 1, 2, ..., 105. Here $k = \frac{105}{9+1} = 10.5$,
so $k = 11$. Select a student among 1, 2, ..., 11 at random (say the
one at the 3rd position). Hence the students at positions 3, 14, 25,
36, 47, 58, 69, 80, 91 form a systematic random sample of size 9.
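The selection rule above can be sketched in Python. The helper `systematic_sample` is hypothetical. It follows the text's choice of $k = N/(n+1)$ rounded to the nearest integer, and passing `j=3` reproduces the worked example.

```python
import math
import random
from typing import List, Optional

def systematic_sample(N: int, n: int, j: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Positions of a systematic sample of size n from units numbered 1..N.

    Uses the interval k = N / (n + 1), rounded to the nearest integer
    (halves rounded up), as described in the text.
    """
    k = math.floor(N / (n + 1) + 0.5)     # 105 / 10 = 10.5 -> 11
    if j is None:
        j = (rng or random.Random()).randint(1, k)   # random start, 1 <= j <= k
    return [j + i * k for i in range(n)]

print(systematic_sample(105, 9, j=3))
# [3, 14, 25, 36, 47, 58, 69, 80, 91]
```

The code uses `math.floor(x + 0.5)` instead of Python's built-in `round`. The built-in rounds halves to the nearest even number, so it would turn 10.5 into 10 and give a different interval.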
(d) Cluster sampling
Cluster sampling is used when the population is divided into
groups or clusters such that each cluster is representative of the
population.
If a study has to be done to find out the number of children
that each family in Chennai has, then the city can be divided into
several clusters and a few clusters can be chosen at random. Every
family in the chosen clusters is then a sample unit.
In using cluster sampling, the following points should be noted :
(a) For getting precise results, clusters should be as small as
possible, consistent with the cost and limitations of the survey,
and
(b) The number of units in each cluster must be more or less equal.
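Here is a minimal sketch of cluster sampling in Python. It assumes five hypothetical areas of equal size, in line with point (b). The area and family identifiers are invented for illustration.

```python
import random

# Hypothetical clusters: areas of a city and the families living in each
# (illustrative identifiers only, not real data).
clusters = {
    "Area 1": ["F1", "F2", "F3"],
    "Area 2": ["F4", "F5", "F6"],
    "Area 3": ["F7", "F8", "F9"],
    "Area 4": ["F10", "F11", "F12"],
    "Area 5": ["F13", "F14", "F15"],
}

rng = random.Random(11)
chosen = rng.sample(sorted(clusters), 2)     # choose 2 clusters at random
# Every family in each chosen cluster becomes a sample unit.
sample = [family for area in chosen for family in clusters[area]]
print(chosen, sample)
```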
(ii) Non-Probability Sampling
The fundamental difference between probability sampling and
non-probability sampling is that in non-probability sampling the
selection procedure does not give the units a known chance of being
selected. In other words, the units are selected without using the
principle of probability. Even though non-probability sampling has
advantages such as reduced cost, speed and convenience in
implementation, it lacks accuracy because of selection bias.
Non-probability sampling is suitable for pilot studies and
exploratory research.
The methods of non-random sampling are :
(a) Purposive sampling
In this sampling, the sample is selected with a definite purpose
in view, and the choice of the sampling units depends entirely on the
discretion and judgement of the investigator.
For example, if an investigator wants to give the picture that
the standard of living has increased in the city of Madurai, he may
take the individuals in the sample from the posh localities and ignore
the localities where low income group and middle class families live.
(b) Quota sampling
This is a restricted type of purposive sampling. It consists
of specifying quotas of the samples to be drawn from different groups
and then drawing the required samples from these groups by
purposive sampling. Quota sampling is widely used in opinion and
market research surveys.
(c) Expert opinion sampling or expert sampling
Expert opinion sampling involves gathering a set of people
who have the knowledge and expertise in certain key areas that are
crucial to decision making. The advantage of this sampling is that it
acts as a support mechanism for some of our decisions in situations
where virtually no data are available. The major disadvantage is
that even the experts can have prejudices, likes and dislikes that
might distort the results.
9.1.6 Sampling and non-sampling errors
The errors involved in the collection, processing and
analysis of data may be broadly classified as (i) sampling errors
and (ii) non-sampling errors.
(i) Sampling errors
Sampling errors have their origin in sampling and arise from
the fact that only a part of the population has been used to estimate
population parameters and draw inferences about the population.
Increasing the sample size usually decreases the sampling error.
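The effect of sample size on sampling error can be illustrated through the standard error of the mean, $\sigma/\sqrt{n}$, which is discussed in Section 9.2. In this sketch the population standard deviation of 50 is a hypothetical value. Each fourfold increase in the sample size halves the standard error.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """sigma / sqrt(n): standard error of the sample mean (infinite population)."""
    return sigma / math.sqrt(n)

sigma = 50.0   # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error_of_mean(sigma, n))
# 25 10.0
# 100 5.0
# 400 2.5
```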
Sampling errors are primarily due to some of the following
reasons :
(a) Faulty selection of the sample
Bias is introduced when a defective sampling technique is used
to select the sample, for example when the investigator
deliberately selects a "representative" sample to obtain certain results.
(b) Substitution
If difficulty arises in enumerating a particular sampling unit
included in the random sample, investigators usually substitute a
convenient member of the population, which leads to sampling error.
(c) Faulty demarcation of sampling units
Bias due to defective demarcation of sampling units is
particularly significant in area surveys such as agricultural
experiments. Thus faulty demarcation could cause sampling error.
(ii) Non-sampling errors
The non-sampling errors primarily arise at the stages of
observation, classification and analysis of data.
Non-sampling errors can occur at every stage of the planning
or execution of census or sample surveys. Some of the more
important non-sampling errors arise from the following factors :
(a) Errors due to faulty planning and definitions
These errors arise from improper data specification, errors
in locating units and in measuring characteristics, and a lack
of trained investigators.
(b) Response errors
These errors occur as a result of the responses furnished by
the respondents.
(c) Non-response bias
Non-response bias occurs when complete information is not
obtained from all the sampling units.
(d) Errors in coverage
These errors occur in the coverage of sampling units.
(e) Compiling errors
These errors arise during compilation, for example in the editing
and coding of responses.
EXERCISE 9.1
1) Explain sampling distribution and standard error.
2) Distinguish between the terms parameter and statistic
3) Explain briefly the elements of sampling plan.
4) Discuss probability sampling.
5) Discuss non-probability sampling.
6) Distinguish between sampling and non sampling errors.
9.2 SAMPLING DISTRIBUTIONS
Consider all possible samples of size n which can be drawn
from a given population. For each sample we can compute a statistic
such as mean, standard deviation, etc. which will vary from sample
to sample. The aggregate of various values of the statistic under
consideration may be grouped into a frequency distribution. This
distribution is known as sampling distribution of the statistic. Thus
the probability distribution of all the possible values that a sample
statistic can take, is called the sampling distribution of the statistic.
The sample mean and the sample proportion based on a random sample
are examples of sample statistics.
Suppose a market research agency wants to estimate the
annual household expenditure on consumer durables among
the population of households (say 50000 households) in Tamil Nadu.
The agency can choose fifty different samples of 50 households
each. For each of the samples, we can calculate the mean annual
expenditure on consumer durables, as given in the following table :
Sample No.   Total expenditure for 50 households (Rs.)   Mean (Rs.)
1            100000                                       2000
2            300000                                       6000
3            200000                                       4000
4            150000                                       3000
.            .                                            .
.            .                                            .
.            .                                            .
49           600000                                       12000
50           400000                                       8000
The distribution of all the sample means is known as the
sampling distribution of the mean. The values Rs. 2000, Rs. 6000,
..., Rs. 8000 together form the sampling distribution of the mean.
Similarly, the sampling distribution of the sample variance
and sample proportion can also be obtained.
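In a sample of $n$ items, if $n_1$ items belong to Category-1 and
$n - n_1$ items belong to Category-2, then $p = \frac{n_1}{n}$ is defined as the
sample proportion belonging to the first category, and
$q = \frac{n - n_1}{n} = 1 - p$ is the sample proportion belonging to the second category.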
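It may be noted that
(i) the sample mean is $\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.
Thus $\bar{X}$ is a random variable and will be different every time
a new sample of $n$ observations is taken.
(ii) $\bar{X}$ is an unbiased estimator of the population mean $\mu$,
i.e. $E(\bar{X}) = \mu$, denoted by $\mu_{\bar{X}} = \mu$.
(iii) the standard deviation of the sample mean $\bar{X}$ is given by
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, if the population size is infinite or the sampling is with replacement;
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$, if the sampling is without replacement from a finite population of size $N$.
Similarly, the standard deviation of the sample proportion $p$ is given by
$\sigma_p = \sqrt{\frac{PQ}{n}}$, if the population size is infinite or the sampling is with replacement;
$\sigma_p = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}}$, if the sampling is without replacement from a finite population of size $N$,
where $P$ is the population proportion and $Q = 1 - P$.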
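The formulas above can be collected into two small Python helpers. The function names are illustrative. Passing `N` applies the factor $\sqrt{(N-n)/(N-1)}$ for sampling without replacement from a finite population.

```python
import math
from typing import Optional

def fpc(N: int, n: int) -> float:
    """Finite population correction sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def se_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sample mean; pass N for a finite
    population sampled without replacement."""
    se = sigma / math.sqrt(n)
    return se * fpc(N, n) if N is not None else se

def se_proportion(P: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sample proportion, with Q = 1 - P."""
    se = math.sqrt(P * (1 - P) / n)
    return se * fpc(N, n) if N is not None else se
```

For example, `se_mean(50, 4, N=1000)` returns about 24.96 and `se_proportion(0.02, 400)` returns 0.007. These agree with the figures in Examples 2 and 5 below.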
$= \frac{1}{5}\{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2\} = \frac{54}{5} = 10.8$
$\therefore$ the standard deviation of the population $= \sqrt{10.8} = 3.29$
(iii) There are 25 samples of size two which can be drawn with
replacement. They are
(2, 2) (2, 3) (2, 6) (2, 8) (2, 11)
(3, 2) (3, 3) (3, 6) (3, 8) (3, 11)
(6, 2) (6, 3) (6, 6) (6, 8) (6, 11)
(8, 2) (8, 3) (8, 6) (8, 8) (8, 11)
(11, 2) (11, 3) (11, 6) (11, 8) (11, 11)
The corresponding sample means are
2.0 2.5 4.0 5.0 6.5
2.5 3.0 4.5 5.5 7.0
4.0 4.5 6.0 7.0 8.5
5.0 5.5 7.0 8.0 9.5
6.5 7.0 8.5 9.5 11.0
The mean of the sampling distribution of means is
$\mu_{\bar{X}} = \frac{\text{sum of all sample means}}{25} = \frac{150}{25} = 6.0$
(iv) The variance $\sigma^2_{\bar{X}}$ of the sampling distribution of means is
obtained as follows :
$\sigma^2_{\bar{X}} = \frac{1}{25}\{(2-6)^2 + (2.5-6)^2 + \cdots + (6.5-6)^2 + \cdots + (9.5-6)^2 + (11-6)^2\} = \frac{135}{25} = 5.4$
$\therefore$ the standard error of means $\sigma_{\bar{X}} = \sqrt{5.4} = 2.32$
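Python can reproduce the enumeration in parts (iii) and (iv). As the working indicates, the sketch takes the population 2, 3, 6, 8, 11 and samples of size two drawn with replacement. It also confirms that $\sigma^2_{\bar{X}} = \sigma^2/n = 10.8/2 = 5.4$.

```python
from itertools import product
from statistics import mean, pvariance

population = [2, 3, 6, 8, 11]
sigma2 = pvariance(population)            # population variance, 10.8

# All 25 ordered samples of size 2 drawn with replacement.
samples = list(product(population, repeat=2))
means = [mean(s) for s in samples]

mean_of_means = mean(means)               # equals the population mean
var_of_means = pvariance(means)           # equals sigma2 / 2
print(len(samples), mean_of_means, var_of_means, round(var_of_means ** 0.5, 2))
```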
Example 2
Assume that the monthly savings of 1000 employees
working in a factory are normally distributed with mean
Rs. 2000 and standard deviation Rs. 50. If 25 samples
consisting of 4 employees each are obtained, what would be
the mean and standard deviation of the resulting sampling
distribution of means if sampling were done (i) with
replacement, (ii) without replacement?
Solution :
Given $N = 1000$, $\mu = 2000$, $\sigma = 50$, $n = 4$
(i) Sampling with replacement
$\mu_{\bar{X}} = \mu = 2000$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{50}{\sqrt{4}} = 25$
(ii) Sampling without replacement
The mean of the sampling distribution of the means is
$\mu_{\bar{X}} = \mu = 2000$
The standard deviation of the sampling distribution of means
is
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{50}{\sqrt{4}}\sqrt{\frac{1000-4}{1000-1}}$
$= (25)\sqrt{\frac{996}{999}} \approx (25)\sqrt{0.9970} = (25)(0.9985) = 24.96$
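The following lines give a quick numerical check of Example 2 in Python. The variable names mirror the symbols in the solution.

```python
import math

N, mu, sigma, n = 1000, 2000, 50, 4

se_with = sigma / math.sqrt(n)                        # (i) with replacement
se_without = se_with * math.sqrt((N - n) / (N - 1))   # (ii) without replacement

print(mu, se_with)                  # 2000 25.0
print(mu, round(se_without, 2))     # 2000 24.96
```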
Example 3
A random sample of size 5 is drawn without replacement
from a finite population consisting of 41 units. If the population
S.D. is 6.25, find the S.E. of the sample mean.
Solution :
Population size $N = 41$
Sample size $n = 5$
Standard deviation of the population $\sigma = 6.25$
S.E. of sample mean $= \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ ($N$ is finite)
$= \frac{6.25}{\sqrt{5}}\sqrt{\frac{41-5}{41-1}}$
$= \frac{6.25 \times 6}{\sqrt{5} \times 2\sqrt{10}}$
$= \frac{3 \times 6.25}{5\sqrt{2}}$
$= 2.65$
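A few lines of Python can check the arithmetic of Example 3. They also confirm that the simplified form $3 \times 6.25/(5\sqrt{2})$ equals the original expression.

```python
import math

N, n, sigma = 41, 5, 6.25
se = (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

print(round(se, 2))   # 2.65
```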
Example 4
The marks obtained by students in an aptitude test are
normally distributed with a mean of 60 and a standard
deviation of 30. A random sample of 36 students is drawn
from this population.
(i) What is the standard error of the sample mean?
(ii) What is the probability that the mean of a sample of
36 students will be either less than 50 or greater than 80?
[ P(0 < Z < 4) = 0.4999 ]
Solution :
(i) The standard error of the sample mean $\bar{X}$ is given by
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{36}} = 5$ ($N$ is not given, so no finite population correction is applied)
(ii) The random variable $\bar{X}$ follows a normal distribution with mean
$\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
To find $P(\bar{X} < 50 \text{ or } \bar{X} > 80)$:
$P(\bar{X} < 50 \text{ or } \bar{X} > 80) = P(\bar{X} < 50) + P(\bar{X} > 80)$
$= P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{50 - 60}{5}\right) + P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} > \frac{80 - 60}{5}\right)$
$= P(Z < -2) + P(Z > 4)$
$= [0.5 - P(0 < Z < 2)] + [0.5 - P(0 < Z < 4)]$
$= (0.5 - 0.4772) + (0.5 - 0.4999)$
$= 0.0228 + 0.0001 = 0.0229$, which is the required probability.
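Python's `math.erf` gives the standard normal distribution function as $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, so it can approximate the probability in Example 4 without normal tables. This sketch is only for checking the arithmetic, and `phi` is a helper defined here.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 60, 30, 36
se = sigma / math.sqrt(n)                   # standard error of the mean
z_low = (50 - mu) / se
z_high = (80 - mu) / se
prob = phi(z_low) + (1 - phi(z_high))       # P(X_bar < 50) + P(X_bar > 80)
print(se, z_low, z_high, round(prob, 4))    # 5.0 -2.0 4.0 0.0228
```

The result, about 0.0228, differs slightly from the table-based 0.0229 because the table values are rounded to four decimal places.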
Example 5
2% of the screws produced by a machine are defective.
What is the probability that in a consignment of 400 such
screws, 3% or more will be defective?
Solution :
Here $N$ is not given, but $n = 400$.
Population proportion $P = 2\% = 0.02$, $Q = 1 - P = 0.98$
Since the sample size is large, the sample proportion $p$ is normally distributed with mean
$\mu_p = P = 0.02$ and S.D. $= \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.02 \times 0.98}{400}} = 0.007$
Probability that the sample proportion $p \geq 0.03$
= Area under the normal curve to the right of $Z = 1.43$
$\left(Z = \frac{p - P}{\text{S.D.}} = \frac{0.03 - 0.02}{0.007} = 1.43\right)$
$\therefore$ required probability $= 0.5 -$ Area between $Z = 0$ and $Z = 1.43$
$= 0.5 - 0.4236 = 0.0764$
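Example 5 can be checked the same way, using a `phi` helper built on `math.erf`. No continuity correction is applied, matching the solution. Because $z$ is not rounded to 1.43 here, the printed probability (about 0.0766) is slightly larger than the table value 0.0764.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

P, n = 0.02, 400
se = math.sqrt(P * (1 - P) / n)     # S.D. of the sample proportion
z = (0.03 - P) / se                 # standardised value of p = 0.03
prob = 1 - phi(z)                   # P(p >= 0.03)
print(round(se, 3), round(z, 2), round(prob, 4))
```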
EXERCISE 9.2
1) A population consists of four numbers 3, 7, 11 and 15. Consider
all possible samples of size two which can be drawn with
replacement from this population.
Find (i) the population mean
(ii) the population standard deviation.
(iii) the mean of the sampling distribution of mean
(iv) the standard deviation of the sampling distribution of
mean.
2) A population consists of four numbers 3, 7, 11 and 15. Consider
all possible samples of size two which can be drawn without
replacement from this population.
Find (i) the population mean
(ii) the population standard deviation.
(iii) the mean of the sampling distribution of mean
(iv) the standard deviation of the sampling distribution of
mean.
3) The weights of 1500 iron rods are normally distributed with
a mean of 22.4 kg and a standard deviation of 0.048 kg. If 300
random samples of size 36 are drawn from this population,
determine the mean and standard deviation of the sampling
distribution of mean when sampling is done (i) with replacement
(ii) without replacement.
4) 1% of the outgoing +2 students in a school have joined I.I.T.
Madras. What is the probability that in a group of 500 such
students 2% or more will be joining I.I.T. Madras?
9.3 ESTIMATION
The technique used for generalising the results of the sample
to the population is provided by an important branch of statistics
called statistical inference. The concept of statistical inference
deals with two basic aspects namely (a) Estimation and (b) Testing
of hypothesis.
In statistics, estimation is concerned with making inference
about the parameters of the population using information available
in the samples. The parameter estimation is very much needed in
the decision making process.
The estimation of population parameters such as the mean,
variance and proportion from the corresponding sample statistics
is an important function of statistical inference.
9.3.1 Estimator
A sample statistic which is used to estimate a population
parameter is known as estimator.
A good estimator is one which is as close to the true value of
population parameter as possible. A good estimator possesses the
following properties:
(i) Unbiasedness
An estimator is said to be unbiased if its expected value is
equal to the parameter it estimates.
The sample mean $\bar{X} = \frac{1}{n}\sum x$ is an unbiased estimator of the
population mean $\mu$. For a random sample of size $n$ drawn with replacement
(or from a very large population), $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$ is an
unbiased estimator of the population variance. Hence $s^2$ is used in
estimation and in testing of hypothesis.
(ii) Consistency
An estimator is said to be consistent if the estimate tends to
approach the parameter as the sample size increases.
(iii) Efficiency
If we have two unbiased estimators for the same population
parameter, the first estimator is said to be more efficient than the
second if its standard error is smaller than that of the second
for the same sample size.
(iv) Sufficiency
If an estimator possesses all information regarding the
parameter, then the estimator is said to be a sufficient estimator.
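The unbiasedness of $s^2$ discussed under (i) can be checked by brute force. This sketch assumes a small hypothetical population (2, 3, 6, 8, 11) and enumerates all 25 samples of size two drawn with replacement. Averaged over these samples, $s^2$ with divisor $n-1$ equals the population variance 10.8. The divisor $n$ gives 5.4, an underestimate.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 3, 6, 8, 11]      # hypothetical population
sigma2 = pvariance(population)     # population variance, 10.8

samples = list(product(population, repeat=2))            # all 25 samples, with replacement
avg_s2_n_minus_1 = mean(variance(s) for s in samples)    # divisor n - 1
avg_s2_n = mean(pvariance(s) for s in samples)           # divisor n
print(sigma2, avg_s2_n_minus_1, avg_s2_n)
```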
9.3.2 Point Estimate and Interval Estimate
It is possible to find two types of estimates for a population
parameter. They are point estimate and interval estimate.
Point Estimate
An estimate of a population parameter given by a single
number is called a point estimate of the parameter. The sample mean
($\bar{x}$) and the sample variance [$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$] are
examples of point estimates.
A point estimate will rarely coincide with the true population
parameter value.
Interval Estimate
An estimate of a population parameter given by two numbers
between which the parameter is expected to lie is called an interval
estimate of the parameter.
An interval estimate indicates the accuracy of an estimate and is
therefore preferable to a point estimate. Since a point estimate provides
only a single value for the population parameter, it may not be suitable in
some situations.
For example,
if we say that a distance is measured as 5.28 mm, we are giving
a point estimate. On the other hand, if we say that the distance is
5.28 ± 0.03 mm, i.e. the distance lies between 5.25 mm and 5.31 mm,
we are giving an interval estimate.
9.3.3 Confidence Interval for population mean and proportion
The interval within which the unknown value of parameter is
expected to lie is called confidence interval. The limits so
determined are called confidence limits.
The confidence level attached to a confidence interval is the probability
that an interval constructed in this way contains the population parameter.
Computation of confidence interval
To compute confidence interval we require
(i) the sample statistic,
(ii) the standard error (S.E) of sampling distribution of the statistic
(iii) the degree of accuracy reflected by the Z-value.
If the sample size is sufficiently large, then the sampling
distribution is approximately normal. Therefore the sample value
(for example, the sample standard deviation $s$) can be used in place of
the population value when estimating the standard error. The
Z-distribution is used for large samples to estimate the confidence limits.
We give below values of Z corresponding to some confidence
levels.
Confidence level       99%    98%    96%    95%    80%    50%
Value of Z, $Z_c$       2.58   2.33   2.05   1.96   1.28   0.674
(i) Confidence interval estimates for means
Let $\mu$ and $\sigma$ be the mean and standard deviation
of the population.
Let $\bar{X}$ and $s$ be the mean and standard deviation of
the sample.
The confidence limits for $\mu$ are given below :
Population size   Sample size   Confidence limits for $\mu$
Infinite          $n$           $\bar{X} \pm (Z_c)\frac{s}{\sqrt{n}}$
Finite, $N$       $n$           $\bar{X} \pm (Z_c)\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
Here $Z_c$ is the value of $Z$ corresponding to the confidence level.
(ii) Confidence intervals for proportions
If $p$ is the proportion of successes in a sample of size $n$
drawn from a population with $P$ as its proportion of successes,
and $q = 1 - p$, then the confidence limits for $P$ are given below :
Population size   Sample size   Confidence limits for $P$
Infinite          $n$           $p \pm (Z_c)\sqrt{\frac{pq}{n}}$
Finite, $N$       $n$           $p \pm (Z_c)\sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}}$
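The confidence limits in the two tables can be wrapped in small Python helpers. This is a sketch that assumes large samples. The names `ci_mean`, `ci_proportion` and `Z_C` are illustrative, and only the confidence levels listed in the Z table are supported.

```python
import math
from typing import Optional, Tuple

# Z_c for some confidence levels, taken from the table in this section.
Z_C = {0.99: 2.58, 0.98: 2.33, 0.96: 2.05, 0.95: 1.96, 0.80: 1.28, 0.50: 0.674}

def _fpc(N: Optional[int], n: int) -> float:
    """Finite population correction; 1 when the population is infinite."""
    return 1.0 if N is None else math.sqrt((N - n) / (N - 1))

def ci_mean(x_bar: float, s: float, n: int, level: float = 0.95,
            N: Optional[int] = None) -> Tuple[float, float]:
    """Large-sample confidence limits x_bar +/- Z_c * s / sqrt(n) [* fpc]."""
    half = Z_C[level] * s / math.sqrt(n) * _fpc(N, n)
    return x_bar - half, x_bar + half

def ci_proportion(p: float, n: int, level: float = 0.95,
                  N: Optional[int] = None) -> Tuple[float, float]:
    """Large-sample confidence limits p +/- Z_c * sqrt(pq / n) [* fpc]."""
    half = Z_C[level] * math.sqrt(p * (1 - p) / n) * _fpc(N, n)
    return p - half, p + half
```

With the data of Example 7, `ci_mean(67.45, 3, 100)` returns limits of about 66.862 and 68.038.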
Example 6
Sensing the downward trend in demand for a leather
product, the financial manager was considering shifting his
company's resources to a new product area. He selected a
sample of 10 firms in the leather industry and found their
earnings (in %) on investment. Find point estimates of the
mean and variance of the population from the data given
below.
21.0 25.0 20.0 16.0 12.0 10.0 17.0 18.0 13.0 11.0
Solution :
$X$       $\bar{X}$   $X - \bar{X}$   $(X - \bar{X})^2$
21.0      16.3        4.7             22.09
25.0      16.3        8.7             75.69
20.0      16.3        3.7             13.69
16.0      16.3        -0.3            0.09
12.0      16.3        -4.3            18.49
10.0      16.3        -6.3            39.69
17.0      16.3        0.7             0.49
18.0      16.3        1.7             2.89
13.0      16.3        -3.3            10.89
11.0      16.3        -5.3            28.09
Total 163.0                           212.10
Sample mean, $\bar{X} = \frac{\sum X}{n} = \frac{163}{10} = 16.3$
Sample variance, $s^2 = \frac{1}{n-1}\sum (X - \bar{X})^2 = \frac{212.10}{9} = 23.57$ (the sample size is small)
Sample standard deviation $= \sqrt{23.57} = 4.85$
Thus the point estimates of the mean and of the variance of the
population from which the sample is drawn are 16.3 and 23.57
respectively.
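Python's `statistics` module can check the point estimates in Example 6. Its `variance` and `stdev` functions use the divisor $n - 1$.

```python
from statistics import mean, stdev, variance

earnings = [21.0, 25.0, 20.0, 16.0, 12.0, 10.0, 17.0, 18.0, 13.0, 11.0]
x_bar = mean(earnings)        # point estimate of the population mean
s2 = variance(earnings)       # divisor n - 1: point estimate of the variance
s = stdev(earnings)           # sample standard deviation
print(round(x_bar, 2), round(s2, 2), round(s, 2))   # 16.3 23.57 4.85
```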
Example 7
A sample of 100 students is drawn from a school.
The mean weight and variance of the sample are 67.45 kg
and 9 kg² respectively. Find (a) 95% and (b) 99%
confidence intervals for estimating the mean weight of the
students.
Solution :
Sample size, $n = 100$
The sample mean, $\bar{X} = 67.45$
The sample variance $s^2 = 9$
The sample standard deviation $s = 3$
Let $\mu$ be the population mean.
(a) The 95% confidence limits for $\mu$ are given by
$\bar{X} \pm (Z_c)\frac{s}{\sqrt{n}} = 67.45 \pm (1.96)\frac{3}{\sqrt{100}}$ (here $Z_c = 1.96$ for the 95% confidence level)
$= 67.45 \pm 0.588$
Thus the 95% confidence interval for estimating $\mu$ is
(66.86, 68.04)
(b) The 99% confidence limits for $\mu$ are given by
$\bar{X} \pm (Z_c)\frac{s}{\sqrt{n}} = 67.45 \pm (2.58)\frac{3}{\sqrt{100}}$ (here $Z_c = 2.58$ for the 99% confidence level)
$= 67.45 \pm 0.774$
Thus the 99% confidence interval for estimating $\mu$ is
(66.68, 68.22)
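Both intervals in Example 7 can be checked with a short Python script.

```python
import math

x_bar, s, n = 67.45, 3.0, 100
limits = {}
for level, z_c in ((95, 1.96), (99, 2.58)):
    half = z_c * s / math.sqrt(n)          # Z_c * s / sqrt(n)
    limits[level] = (round(x_bar - half, 2), round(x_bar + half, 2))

print(limits)   # {95: (66.86, 68.04), 99: (66.68, 68.22)}
```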
Example 8
A random sample of size 50 with mean 67.9 is drawn
from a normal population. If it is known that the standard error
of the sample mean is $\sqrt{0.7}$, find the 95% confidence interval for
the population mean.
Solution :
$n = 50$, sample mean $\bar{X} = 67.9$
The 95% confidence limits for the population mean $\mu$ are :
$\bar{X} \pm (Z_c)\{\text{S.E.}(\bar{X})\}$
$= 67.9 \pm (1.96)(\sqrt{0.7})$
$= 67.9 \pm 1.64$
Thus the 95% confidence interval for estimating $\mu$ is
(66.26, 69.54)
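The following Python lines check Example 8, taking the standard error as $\sqrt{0.7}$.

```python
import math

x_bar = 67.9
se = math.sqrt(0.7)        # standard error of the mean, given as sqrt(0.7)
half = 1.96 * se           # Z_c = 1.96 for 95% confidence
lower, upper = x_bar - half, x_bar + half
print(round(half, 2), round(lower, 2), round(upper, 2))   # 1.64 66.26 69.54
```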
Example 9
A random sample of 500 apples was taken from a large
consignment and 45 of them were found to be bad. Find the
limits within which the proportion of bad apples lies at the
99% confidence level.
Solution :
We shall find confidence limits for the proportion of bad apples.
Sample size $n = 500$
Proportion of bad apples in the sample $= \frac{45}{500} = 0.09$
$\therefore p = 0.09$
Proportion of good apples in the sample $q = 1 - p = 0.91$.
The confidence limits for the population proportion $P$ of bad
apples are given by
$p \pm (Z_c)\sqrt{\frac{pq}{n}}$
$= 0.09 \pm (2.58)\sqrt{\frac{(0.09)(0.91)}{500}}$
$= 0.09 \pm 0.033$
The required interval is (0.057, 0.123).
Thus, the proportion of bad apples in the consignment lies between 5.7%
and 12.3%.
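The following Python lines check the 99% limits in Example 9.

```python
import math

n, bad = 500, 45
p = bad / n                                  # sample proportion, 0.09
half = 2.58 * math.sqrt(p * (1 - p) / n)     # Z_c = 2.58 for 99% confidence
print(round(half, 3), round(p - half, 3), round(p + half, 3))   # 0.033 0.057 0.123
```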
Example 10
Out of 1000 TV viewers, 320 watched a particular
programme. Find 95% confidence limits for the proportion of TV
viewers who watched this programme.
Solution :
Sample size $n = 1000$
Sample proportion of TV viewers $p = \frac{x}{n} = \frac{320}{1000} = 0.32$
$q = 1 - p = 0.68$
S.E.$(p) = \sqrt{\frac{pq}{n}} = 0.0148$
The 95% confidence limits for the population proportion $P$ are
given by
$p \pm (1.96)\,\text{S.E.}(p) = 0.32 \pm 0.029$
$\therefore$ 0.291 and 0.349
The proportion of TV viewers of this programme lies between 29.1% and
34.9%.
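The following Python lines check Example 10. They show that $1.96 \times 0.0148 \approx 0.029$, which gives the limits 0.291 and 0.349.

```python
import math

n, x = 1000, 320
p = x / n                               # sample proportion, 0.32
se = math.sqrt(p * (1 - p) / n)         # S.E.(p)
half = 1.96 * se                        # Z_c = 1.96 for 95% confidence
print(round(se, 4), round(half, 3), round(p - half, 3), round(p + half, 3))
# 0.0148 0.029 0.291 0.349
```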
Example 11
Out of 1500 school students, a sample of 150 was selected
at random to test the accuracy of solving a problem in business
mathematics, and 10 of them made a mistake. At the 99% confidence
level, find the limits within which the number of students in the
whole universe of 1500 who did the problem wrongly lies.
Solution :
Population size, $N = 1500$
Sample size, $n = 150$
Sample proportion, $p = \frac{10}{150} = 0.067 \approx 0.07$
$q = 1 - p = 0.93$
Standard error of $p$, S.E.$(p) = \sqrt{\frac{pq}{n}} \approx 0.02$
The 99% confidence limits for the population proportion $P$ are
given by
$p \pm (Z_c)\sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}}$
$= 0.07 \pm (2.58)(0.02)\sqrt{\frac{1500-150}{1500-1}}$
$= 0.07 \pm 0.049$
The confidence interval for $P$ is (0.021, 0.119).
The number of students who did the problem wrongly
in the population of 1500 lies between $0.021 \times 1500 = 31.5$ and
$0.119 \times 1500 = 178.5$, that is, between about 32 and 178 students.
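The following Python sketch checks Example 11. It first follows the rounded working of the text ($p \approx 0.07$, S.E. $\approx 0.02$). It then repeats the calculation with $p = 10/150$ kept at full precision.

```python
import math

N, n, wrong = 1500, 150, 10
z_c = 2.58                                   # 99% confidence
fpc = math.sqrt((N - n) / (N - 1))           # finite population correction

# Following the text: p rounded to 0.07 and S.E.(p) rounded to 0.02.
half_text = z_c * 0.02 * fpc
print(round(half_text, 3))                   # 0.049

# Carrying full precision instead: p = 10/150.
p = wrong / n
half = z_c * math.sqrt(p * (1 - p) / n) * fpc
lower, upper = p - half, p + half
print(round(lower * N, 1), round(upper * N, 1))
```

At full precision, the limits for the number of students come out at roughly 25 and 175. These are somewhat lower than the rounded working suggests, because $p = 0.067$ was rounded up to 0.07.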
EXERCISE 9.3
1) A sample of five measurements of the diameter of a sphere
were recorded by a scientist as 6.33, 6.37, 6.36, 6.32 and 6.37
mm. Determine the point estimate of (a) mean, (b) variance.
2) Measurements of the weights of a random sample of 200 ball
bearings made by a certain machine during one week showed
mean of 0.824 newtons and a standard deviation of 0.042
newtons. Find (a) 95% and (b) 99% confidence limits for the
mean weight of all the ball bearings.
3) A random sample of 50 branches of State Bank of India out of
200 branches in a district showed a mean annual profit of Rs.75
lakhs and a standard deviation of 10 lakhs. Find the 95%
confidence limits for the estimate of mean profit of 200
branches.
4) A random sample of marks in mathematics secured by 50
students out of 200 students showed a mean of 75 and a
standard deviation of 10. Find the 95% confidence limits for
the estimate of their mean marks.
5) Out of 10000 customers' ledger accounts, a sample of 200
accounts was taken to test the accuracy of posting and
balancing wherein 35 mistakes were found. Find 95%
confidence limits within which the number of defective cases
can be expected to lie.
6) A sample poll of 100 voters chosen at random from all voters
in a given district indicated that 55% of them were in favour of
a particular candidate. Find (a) 95% confidence limits, (b)
99% confidence limits for the proportion of all voters in favour
of this candidate.
9.4 HYPOTHESIS TESTING
There are many problems in which, besides estimating the
value of a parameter of the population, we must decide whether a
statement concerning a parameter is true or false; that is, we must
test a hypothesis about a parameter.
To illustrate the general concepts involved in this kind of
decision problem, suppose that a consumer protection agency wants
to test a manufacturer's claim that the average lifetime of electric
bulbs produced by him is 200 hours. So it instructs a member of its
staff to take 50 electric bulbs from the godown of the company and
test them for their lifetime continuously, with the intention of rejecting
the claim if the mean lifetime of the bulbs is below 180 hours (say);
otherwise it will accept the claim.
Thus a hypothesis is an assumption that we make about an
unknown population parameter. We can collect sample data from
the population, arrive at the sample statistic and then test whether the
hypothesis about the population parameter is true.
9.4.1 Null Hypothesis and Alternative Hypothesis
In hypothesis testing, the statement of the hypothesis or
assumed value of the population parameter is always stated before
we begin taking the sample for analysis.
A statistical statement about the population parameter assumed
before taking the sample for possible rejection on the basis of
outcome of sample data is known as a null hypothesis.
The null hypothesis asserts that there is no difference between
the sample statistic and the population parameter, and that whatever
difference there is, is attributable to sampling error.
A hypothesis is said to be an alternative hypothesis when it is
complementary to the null hypothesis.
The null hypothesis and the alternative hypothesis are usually
denoted by $H_0$ and $H_1$ respectively.
For example, if we want to test the null hypothesis that the
average height of soldiers is 173 cms, then
$H_0 : \mu = 173 = \mu_0$ (say)
$H_1 : \mu \neq \mu_0$.
9.4.2 Types of Error
For testing the hypothesis, we take a sample from the
population, and on the basis of the sample result obtained, we decide
whether to accept or reject the hypothesis.
Here, two types of errors are possible. A null hypothesis
could be rejected when it is true. This is called a Type I error, and
the probability of committing a Type I error is denoted by $\alpha$.
Alternatively, an error could result from accepting a null
hypothesis when it is false. This is known as a Type II error, and the
probability of committing a Type II error is denoted by $\beta$.
This is illustrated in the following table :
Actual            Decision based on sampling    Error and its probability
$H_0$ is true     Rejecting $H_0$               Type I error ; $\alpha = P\{H_1 / H_0\}$
$H_0$ is false    Accepting $H_0$               Type II error ; $\beta = P\{H_0 / H_1\}$
9.4.3 Critical region and level of significance
A region in the sample space which amounts to rejection of the
null hypothesis ($H_0$) is called the critical region.
After formulating the null and alternative hypotheses about a
population parameter, we take a sample from the population and
calculate the value of the relevant statistic, and compare it with the
hypothesised population parameter.
After doing this, we have to decide the criteria for accepting
or rejecting the null hypothesis. These criteria are given as a range
of values in the form of an interval, say (a, b), so that if the statistic
value falls outside the range, we reject the null hypothesis.
If the statistic value falls within the interval (a, b), then we
accept $H_0$. This criterion has to be decided on the basis of the
level of significance. A 5% level of significance means that, when $H_0$
is true, 5% of the statistic values arrived at from the samples will fall
outside the range (a, b) and 95% of the values will be within the range (a, b).
Thus the level of significance is the probability $\alpha$ of a Type I
error. The levels of significance usually employed in testing of
hypotheses are 5% and 1%.
Choosing a higher significance level for testing a hypothesis
implies a higher probability of rejecting the null hypothesis when it
is true.
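The statement that the level of significance equals the probability of a Type I error can be checked by simulation. The following Python sketch is illustrative only. The function name and the population values are assumptions chosen for the demonstration and are not taken from the text: a normal population with mean 0 and known s.d. 1, and samples of size 25. The sketch draws many samples for which $H_0$ is true and counts how often the two-sided test still rejects $H_0$.

```python
import random
import statistics


def type_one_error_rate(mu0=0.0, sigma=1.0, n=25, z_crit=1.96,
                        trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for the two-sided test |Z| > z_crit.

    Assumption (illustrative only): the population is normal with mean mu0,
    so H0: mu = mu0 is true, and its s.d. sigma is known.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:   # statistic falls in the critical region
            rejections += 1   # a Type I error, since H0 is true here
    return rejections / trials


print(type_one_error_rate())             # should be close to 0.05
print(type_one_error_rate(z_crit=2.58))  # should be close to 0.01
```

The rejection rates come out near 5% and 1%, matching the chosen levels of significance.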
9.4.4 Test of significance
The tests of significance are (a) Test of significance for large
samples and (b) Test of significance for small samples.
For large samples (n > 30), distributions such as the Binomial and
Poisson are approximated by the normal distribution. Thus the normal
probability curve can be used for testing of hypotheses.
For the test statistic Z (standard normal variate), the critical
region at the 5% level is given by |Z| > 1.96, and hence the acceptance
region is |Z| < 1.96, whereas the critical region for Z at the 1% level
is |Z| > 2.58 and the acceptance region is |Z| < 2.58.
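The tabulated values 1.96 and 2.58 are the points that cut off $\alpha/2$ in each tail of the standard normal curve. A short Python sketch using the standard library's `statistics.NormalDist` reproduces them. The helper name is illustrative.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Return z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    z = two_sided_critical_value(alpha)
    print(f"alpha = {alpha}: critical region |Z| > {z:.2f}")
```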
Testing a hypothesis involves five steps :
(i) Formulating the null hypothesis and an alternative hypothesis.
(ii) Setting up a suitable significance level.
(iii) Setting up the statistical test criterion.
(iv) Setting up the rejection region for the null hypothesis.
(v) Drawing the conclusion.
Example 12
The mean lifetime of 50 electric bulbs produced by a
manufacturing company is estimated to be 825 hours with a
standard deviation of 110 hours. If $\mu$ is the mean lifetime of
all the bulbs produced by the company, test the hypothesis
that $\mu$ = 900 hours at 5% level of significance.
Solution :
Null Hypothesis, $H_0 : \mu = 900$
Alternative Hypothesis, $H_1 : \mu \neq 900$
Test statistic: Z is the standard normal variate.
Under $H_0$,
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where $\bar{X}$ is the sample mean and $\sigma$ is the s.d. of the population. For a large sample, $\sigma$ may be replaced by the sample s.d. s, so
$$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{825 - 900}{110 / \sqrt{50}} = -4.82$$
|Z| = 4.82
Significance level, $\alpha$ = 0.05 or 5%
Critical region is |Z| > 1.96
Acceptance region is |Z| < 1.96
The calculated |Z| is much greater than 1.96.
Decision : Since the calculated value |Z| = 4.82 falls in the critical region, the value of Z is significant at the 5% level.
∴ The null hypothesis is rejected.
∴ We conclude that the mean lifetime of the population of electric bulbs cannot be taken as 900 hours.
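Here is a minimal Python sketch of the large-sample Z test used in Example 12. Like the solution, it assumes that the sample s.d. can stand in for $\sigma$. The function `z_test_mean` is a hypothetical helper, not a library routine.

```python
import math


def z_test_mean(sample_mean, mu0, sd, n, z_crit=1.96):
    """Large-sample Z test of H0: mu = mu0 against H1: mu != mu0.

    `sd` is the population s.d.; for a large sample the sample s.d. s
    may be used in its place, as in Example 12.
    Returns the Z value and whether H0 is rejected at the given level.
    """
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    reject = abs(z) > z_crit
    return z, reject


z, reject = z_test_mean(sample_mean=825, mu0=900, sd=110, n=50)
print(f"Z = {z:.2f}, reject H0: {reject}")
```

Calling it with (51250, 50000, 2000, 64) and with (171.38, 171.17, 3.3, 400) gives Z = 5 and Z ≈ 1.27. These agree with the decisions reached in Examples 13 and 14.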
Example 13
A company markets car tyres. Their lives are
normally distributed with a mean of 50000 kilometers and
standard deviation of 2000 kilometers. A test sample of
64 tyres has a mean life of 51250 kms. Can you conclude
that the sample mean differs significantly from the
population mean? (Test at 5% level)
Solution :
Sample size, n = 64
Sample mean, $\bar{X}$ = 51250
$H_0$ : population mean $\mu$ = 50000
$H_1 : \mu \neq 50000$
Under $H_0$, the test statistic
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
$$Z = \frac{51250 - 50000}{2000 / \sqrt{64}} = \frac{1250}{250} = 5$$
Since the calculated Z is much greater than 1.96, it is highly significant.
∴ $H_0 : \mu = 50000$ is rejected.
∴ The sample mean differs significantly from the population mean.
Example 14
A sample of 400 students is found to have a mean height
of 171.38 cms. Can it reasonably be regarded as a sample
from a large population with mean height of 171.17 cms
and standard deviation of 3.3 cms? (Test at 5% level)
Solution :
Sample size, n = 400
Sample mean, $\bar{X}$ = 171.38
Population mean, $\mu$ = 171.17
Population standard deviation, $\sigma$ = 3.3
Set $H_0 : \mu = 171.17$ and $H_1 : \mu \neq 171.17$.
The test statistic is
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
$$= \frac{171.38 - 171.17}{3.3 / \sqrt{400}} = \frac{0.21}{0.165} = 1.273$$
Since |Z| = 1.273 < 1.96, we accept the null hypothesis at the 5% level of significance.
Thus the sample of 400 can reasonably be regarded as coming from the population with mean height of 171.17 cms.
EXERCISE 9.4
1) The mean I.Q of a sample of 1600 children was 99. Is it likely
that this was a random sample from a population with mean
I.Q 100 and standard deviation 15 ? (Test at 5% level of
significance)
2) The income distribution of the population of a village has a
mean of Rs.6000 and a variance of Rs.32400. Could a sample
of 64 persons with a mean income of Rs.5950 belong to this
population?
(Test at both 5% and 1% levels of significance)
3) The table below gives the total income in thousand rupees per
year of 36 persons selected randomly from a particular class
of people
Income (thousands Rs.)
6.5 10.5 12.7 13.8 13.2 11.4
5.5 8.0 9.6 9.1 9.0 8.5
4.8 7.3 8.4 8.7 7.3 7.4
5.6 6.8 6.9 6.8 6.1 6.5
4.0 6.4 6.4 8.0 6.6 6.2
4.7 7.4 8.0 8.3 7.6 6.7
On the basis of the sample data, can it be concluded that the
mean income of a person in this class of people is Rs. 10,000
per year? (Test at 5% level of significance)
4) To test the conjecture of the management that 60 percent of
employees favour a new bonus scheme, a sample of 150
employees was drawn and their opinion was taken as to whether
they favoured it or not. Only 55 employees out of 150 favoured
the new bonus scheme. Test the conjecture at 1% level of
significance.
EXERCISE 9.5
Choose the correct answer
1) The standard error of the sample mean is
(a) Type I error (b) Type II error
(c) Standard deviation of the sampling distribution of the mean
(d) Variance of the sampling distribution of the mean
2) If a random sample of size 64 is taken from a population
whose standard deviation is equal to 32, then the standard
error of the mean is
(a) 0.5 (b) 2 (c) 4 (d) 32
3) The central limit theorem states that the sampling distribution
of the mean will approach normal distribution
(a) As the size of the population increases
(b) As the sample size increases and becomes larger
(c) As the number of samples gets larger
(d) As the sample size decreases
4) The Z-value that is used to establish a 95% confidence interval
for the estimation of a population parameter is
(a) 1.28 (b) 1.65 (c) 1.96 (d) 2.58
5) Probability of rejecting the null hypothesis when it is true is
(a) Type I error (b) Type II error
(c) Sampling error (d) Standard error
6) Which of the following statements is true?
(a) Point estimate gives a range of values
(b) Sampling is done only to estimate a statistic
(c) Sampling is done to estimate the population parameter
(d) Sampling is not possible for an infinite population
7) The number of ways in which one can select 2 customers out
of 10 customers is
(a) 90 (b) 60 (c) 45 (d) 50
10 APPLIED STATISTICS
10.1 LINEAR PROGRAMMING
Linear programming is a general technique for the optimum allocation of limited resources such as labour, material, machines, capital etc., to several competing activities such as products, services, jobs, projects etc., on the basis of a given criterion of optimality. The term limited here is used to describe the availability of scarce resources during the planning period. The criterion of optimality generally means performance, return on investment, utility, time, distance etc. The word linear stands for the proportional relationship of two or more variables in a model. Programming means planning and refers to the process of determining a particular plan of action from amongst several alternatives. It is an extremely useful technique in the decision making process of management.
10.1.1 Structure of Linear Programming Problem (LPP)
The LP model includes the following three basic elements.
(i) Decision variables that we seek to determine.
(ii) Objective (goal) that we aim to optimize (maximize or minimize).
(iii) Constraints that we need to satisfy.
10.1.2 Formulation of the Linear Programming Problem
The procedure for the mathematical formulation of a linear programming problem consists of the following major steps.
Step 1 : Study the given situation to find the key decision to be made.
Step 2 : Identify the variables involved and designate them by symbols $x_j$ (j = 1, 2, ...).
Step 3 : Express the feasible alternatives mathematically in terms of the variables, which generally are : $x_j \geq 0$ for all j.
Step 4 : Identify the constraints in the problem and express them
as linear inequalities or equations involving the decision
variables.
Step 5 : Identify the objective function and express it as a linear
function of the decision variables.
10.1.3 Applications of Linear programming
Linear programming is used in many areas. Some of them are
(i) Transport : It is used to prepare the distribution plan between
sources of production and destinations.
(ii) Assignment : Allocation of tasks to the available persons
so as to get the maximum efficiency.
(iii) Marketing : To find the shortest route for a salesman who
has to visit different locations so as to minimize the total cost.
(iv) Investment : Allocation of capital to different activities so
as to maximize the return and minimize the risk.
(v) Agriculture : The allotment of land to different groups so as
to maximize the output.
10.1.4 Some useful Definitions
A feasible solution is a solution which satisfies all the
constraints (including non-negativity) of the problem.
A region which contains all feasible solutions is known as
feasible region.
A feasible solution which optimizes (maximizes or minimizes)
the objective function is called an optimal solution to the problem.
Note
Optimal solution need not be unique.
Example 1
A furniture manufacturing company plans to make
two products, chairs and tables from its available
resources, which consist of 400 board feet of mahogany
timber and 450 man-hours of labour. It knows that to make
a chair requires 5 board feet and 10 man-hours and yields
a profit of Rs.45, while each table uses 20 board feet and
15 man - hours and has a profit of Rs.80. How many chairs
and tables should the company make to get the maximum
profit under the above resource constraints? Formulate the
above as an LPP.
Solution :
Mathematical Formulation :
The data of the problem is summarised below:
Products             Raw material          Labour              Profit
                     (board feet, per unit) (man-hours, per unit) (per unit)
Chair                5                     10                  Rs. 45
Table                20                    15                  Rs. 80
Total availability   400                   450
Step 1 : The key decision to be made is to determine the number
of units of chairs and tables to be produced by the
company.
Step 2 : Let $x_1$ designate the number of chairs and $x_2$ designate the number of tables which the company decides to produce.
Step 3 : Since it is not possible to produce negative quantities, feasible alternatives are sets of values of $x_1$ and $x_2$ such that $x_1 \geq 0$ and $x_2 \geq 0$.
Step 4 : The constraints are the limited availability of raw material and labour. One unit of chair requires 5 board feet of timber and one unit of table requires 20 board feet of timber. Since $x_1$ and $x_2$ are the quantities of chairs and tables, the total requirement of raw material will be $5x_1 + 20x_2$, which should not exceed the available raw material of 400 board feet of timber. So, the raw material constraint becomes
$$5x_1 + 20x_2 \leq 400$$
Similarly, the labour constraint becomes
$$10x_1 + 15x_2 \leq 450$$
Step 5 : The objective is to maximize the total profit that the company gets out of selling its products, namely chairs and tables. This is given by the linear function
$$z = 45x_1 + 80x_2.$$
The linear programming problem can thus be put in the following mathematical form.
$$\text{Maximize } z = 45x_1 + 80x_2$$
subject to
$$5x_1 + 20x_2 \leq 400$$
$$10x_1 + 15x_2 \leq 450$$
$$x_1 \geq 0, \; x_2 \geq 0$$
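To make the formulation concrete, the model can be written down as data and any proposed production plan checked against it. This Python sketch is illustrative. The names `PROFIT`, `CONSTRAINTS`, `is_feasible` and `profit` are hypothetical. The code only tests candidate plans; it does not search for the optimum.

```python
# Example 1 written as data. Each constraint is (coefficients, limit),
# meaning coefficients[0]*x1 + coefficients[1]*x2 <= limit.
PROFIT = (45, 80)            # Rs. per chair, per table
CONSTRAINTS = [
    ((5, 20), 400),          # timber (board feet)
    ((10, 15), 450),         # labour (man-hours)
]


def is_feasible(plan):
    """True if plan = (x1, x2) is non-negative and meets every constraint."""
    if any(x < 0 for x in plan):
        return False
    return all(sum(a * x for a, x in zip(coeffs, plan)) <= limit
               for coeffs, limit in CONSTRAINTS)


def profit(plan):
    """Objective z = 45*x1 + 80*x2."""
    return sum(c * x for c, x in zip(PROFIT, plan))


for plan in [(10, 10), (0, 25)]:
    print(plan, "feasible:", is_feasible(plan), "profit:", profit(plan))
```

For instance, 10 chairs and 10 tables use 250 board feet and 250 man-hours and yield Rs.1250. Producing 25 tables alone would need 500 board feet and so would violate the timber constraint.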
Example 2
A firm manufactures headache pills in two sizes A and
B. Size A contains 2 mgs. of aspirin, 5 mgs. of bicarbonate
and 1 mg. of codeine. Size B contains 1 mg. of aspirin, 8 mgs.
of bicarbonate and 6 mgs. of codeine. It is found by users
that it requires at least 12 mgs. of aspirin, 74 mgs. of
bicarbonate and 24 mgs. of codeine for providing immediate
relief. It is required to determine the least number of pills a
patient should take to get immediate relief. Formulate the
problem as a standard LPP.
Solution :
The data (in mgs) can be summarised as follows :
Headache pills        Aspirin   Bicarbonate   Codeine
Size A (per pill)     2         5             1
Size B (per pill)     1         8             6
Minimum requirement   12        74            24
Decision variables :
$x_1$ = number of pills of size A
$x_2$ = number of pills of size B
Following the steps given in 10.1.2, the linear programming problem can be put in the following mathematical format :
$$\text{Minimize } z = x_1 + x_2$$
subject to
$$2x_1 + x_2 \geq 12$$
$$5x_1 + 8x_2 \geq 74$$
$$x_1 + 6x_2 \geq 24$$
$$x_1 \geq 0, \; x_2 \geq 0$$
10.1.5 Graphical method
Linear programming problem involving two decision variables
can be solved by graphical method. The major steps involved in
this method are as follows.
Step 1 : State the problem mathematically.
Step 2 : Plot a graph representing all the constraints of the problem
and identify the feasible region (solution space). The
feasible region is the intersection of all the regions
represented by the constraints of the problem and is
restricted to the first quadrant only.
Step 3 : Compute the co-ordinates of all the corner points of the
feasible region.
Step 4 : Find out the value of the objective function at each corner
point determined in Step 3.
Step 5 : Select the corner point that optimizes (maximizes or
minimizes) the value of the objective function. It gives the
optimum feasible solution.
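Steps 3 to 5 can be automated for two decision variables. Every corner of the feasible region is the intersection of two boundary lines, including the axes. So one can intersect every pair of lines, keep the feasible intersections and evaluate z at each. The Python sketch below is an illustration under stated assumptions, not a general LP solver; the function name is hypothetical. It takes integer coefficients and uses exact fractions. It presumes the optimum is attained at a corner, which holds when the feasible region is bounded. It also holds for minimisation with non-negative objective coefficients, as in Example 4. It returns `None` when no feasible corner exists, as happens in Example 5.

```python
from fractions import Fraction
from itertools import combinations


def solve_two_variable_lp(c, constraints, maximize=True):
    """Corner-point (graphical) method for a two-variable LP.

    c           -- integer objective coefficients (c1, c2) of z = c1*x1 + c2*x2
    constraints -- list of (a1, a2, sense, b) meaning a1*x1 + a2*x2 sense b,
                   with integer a1, a2, b and sense "<=" or ">=";
                   x1 >= 0 and x2 >= 0 are added automatically.
    Returns (best_point, best_z), or None if no feasible corner exists.
    """
    rows = list(constraints) + [(1, 0, ">=", 0), (0, 1, ">=", 0)]

    def feasible(x1, x2):
        for a1, a2, sense, b in rows:
            lhs = a1 * x1 + a2 * x2
            if (sense == "<=" and lhs > b) or (sense == ">=" and lhs < b):
                return False
        return True

    corners = set()
    # Each corner lies on two boundary lines: solve every pair by Cramer's rule.
    for (a1, a2, _, b), (d1, d2, _, e) in combinations(rows, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel lines never meet in a single point
        x1 = Fraction(b * d2 - a2 * e, det)
        x2 = Fraction(a1 * e - b * d1, det)
        if feasible(x1, x2):
            corners.add((x1, x2))

    if not corners:
        return None

    def objective(point):
        return c[0] * point[0] + c[1] * point[1]

    best = max(corners, key=objective) if maximize else min(corners, key=objective)
    return best, objective(best)


# Example 3 below: maximize z = 3x1 + 4x2.
print(solve_two_variable_lp((3, 4), [(2, 5, "<=", 120), (4, 2, "<=", 80)]))
```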
Example 3
A company manufactures two products $P_1$ and $P_2$. The
company has two types of machines A and B for processing
the above products. Product $P_1$ takes 2 hours on machine A
and 4 hours on machine B, whereas product $P_2$ takes 5 hours
on machine A and 2 hours on machine B. The profit realized
on sale of one unit of product $P_1$ is Rs.3 and that of product $P_2$
is Rs.4. If machines A and B can operate 24 and 16 hours per
day respectively, determine the weekly output for each
product in order to maximize the profit, through the graphical
method.
Solution :
The data of the problem is summarized below. Taking a 5-day working week, machines A and B are available for 24 × 5 = 120 and 16 × 5 = 80 hours per week.
Product   Hours on Machine A   Hours on Machine B   Profit (Rs. per unit)
$P_1$     2                    4                    3
$P_2$     5                    2                    4
Max. hours / week   120        80
Let $x_1$ be the number of units of $P_1$ and $x_2$ be the number of units of $P_2$ produced. Then the mathematical formulation of the problem is
$$\text{Maximize } z = 3x_1 + 4x_2$$
subject to
$$2x_1 + 5x_2 \leq 120$$
$$4x_1 + 2x_2 \leq 80$$
$$x_1, x_2 \geq 0$$
Solution by graphical method
Consider the equations $2x_1 + 5x_2 = 120$ and $4x_1 + 2x_2 = 80$.
Clearly (0, 24) and (60, 0) are two points on the line $2x_1 + 5x_2 = 120$. By joining these two points we get the straight line $2x_1 + 5x_2 = 120$. Similarly, by joining the points (20, 0) and (0, 40) we get the straight line $4x_1 + 2x_2 = 80$. (Fig. 10.1)
Now all the constraints have been represented graphically. The area bounded by all the constraints, called the feasible region or solution space, is shown in Fig. 10.1 by the shaded area $OCM_1B$.
The optimum value of the objective function occurs at one of the extreme (corner) points of the feasible region. The coordinates of the extreme points are
O = (0, 0), C = (20, 0), $M_1$ = (10, 20), B = (0, 24)
We now compute the z-values corresponding to the extreme points.
Extreme point   Coordinates $(x_1, x_2)$   $z = 3x_1 + 4x_2$
O               (0, 0)                     0
C               (20, 0)                    60
$M_1$           (10, 20)                   110
B               (0, 24)                    96
The optimum solution is that extreme point for which the objective function has the largest (maximum) value. Thus the optimum solution occurs at the point $M_1$, i.e. $x_1 = 10$ and $x_2 = 20$.
Hence, to obtain the maximum profit of Rs.110, the company should produce 10 units of $P_1$ and 20 units of $P_2$ per week.
(Fig. 10.1: the lines $4x_1 + 2x_2 = 80$ and $2x_1 + 5x_2 = 120$ plotted on the $x_1$ and $x_2$ axes, with the points A, B, C and $M_1$ marked and the feasible region $OCM_1B$ shaded)
Note
In case of a maximization problem, the corner point at which the
objective function has a maximum value represents the optimal solution.
In case of a minimization problem, the corner point at which the objective
function has a minimum value represents the optimal solution.
Example 4
Solve graphically :
$$\text{Minimize } z = 20x_1 + 40x_2$$
subject to
$$36x_1 + 6x_2 \geq 108$$
$$3x_1 + 12x_2 \geq 36$$
$$20x_1 + 10x_2 \geq 100$$
$$x_1, x_2 \geq 0$$
Solution :
A(0, 18) and B(3, 0); C(0, 3) and D(12, 0); E(0, 10) and F(5, 0) are points on the lines $36x_1 + 6x_2 = 108$, $3x_1 + 12x_2 = 36$ and $20x_1 + 10x_2 = 100$ respectively. Draw the above lines as in Fig. 10.2.
Now all the constraints of the given problem have been graphed. The area beyond the three lines (on the side away from the origin) represents the feasible region or solution space, as shown in Fig. 10.2. Any point from this region would satisfy the constraints.
(Fig. 10.2: the lines $36x_1 + 6x_2 = 108$, $3x_1 + 12x_2 = 36$ and $20x_1 + 10x_2 = 100$ plotted on the $x_1$ and $x_2$ axes, with the points A, B, C, D, E, F, $M_1$ and $M_2$ marked and the feasible region labelled)
The coordinates of the extreme points of the feasible region are:
A = (0, 18), $M_2$ = (2, 6), $M_1$ = (4, 2), D = (12, 0)
Now we compute the z-values corresponding to the extreme points.
Extreme point   Coordinates $(x_1, x_2)$   $z = 20x_1 + 40x_2$
A               (0, 18)                    720
$M_1$           (4, 2)                     160
$M_2$           (2, 6)                     280
D               (12, 0)                    240
The optimum solution is that extreme point for which the objective function has the minimum value. Thus the optimum solution occurs at the point $M_1$, i.e. $x_1 = 4$ and $x_2 = 2$, with the objective function value z = 160.
∴ Minimum z = 160 at $x_1 = 4$, $x_2 = 2$.
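The table of z-values can be reproduced in a few lines of Python. The sketch takes the corner points as read from Fig. 10.2 and does not derive them. It first asserts that each corner satisfies all three constraints, then picks the corner with the smallest z. The names are illustrative.

```python
# Corner points as read from Fig. 10.2 (taken from the text, not derived here).
CORNERS = {"A": (0, 18), "M2": (2, 6), "M1": (4, 2), "D": (12, 0)}
# Each constraint (a1, a2, b) means a1*x1 + a2*x2 >= b.
CONSTRAINTS = [(36, 6, 108), (3, 12, 36), (20, 10, 100)]


def z(x1, x2):
    """Objective of Example 4: z = 20*x1 + 40*x2."""
    return 20 * x1 + 40 * x2


for name, (x1, x2) in CORNERS.items():
    # Every listed corner must satisfy all three ">=" constraints.
    assert all(a1 * x1 + a2 * x2 >= b for a1, a2, b in CONSTRAINTS), name
    print(name, (x1, x2), z(x1, x2))

best = min(CORNERS, key=lambda name: z(*CORNERS[name]))
print("minimum at", best, CORNERS[best], "z =", z(*CORNERS[best]))
```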
Example 5
$$\text{Maximize } z = x_1 + x_2$$
subject to
$$x_1 + x_2 \leq 1$$
$$4x_1 + 3x_2 \geq 12$$
$$x_1, x_2 \geq 0$$
Solution :
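(Fig. 10.3: the lines $x_1 + x_2 = 1$ and $4x_1 + 3x_2 = 12$ plotted on the $x_1$ and $x_2$ axes, with the points A, B, C and D marked)
From the graph, we see that the given problem has no solution, as no point of the first quadrant lies both on or below the line $x_1 + x_2 = 1$ and on or above the line $4x_1 + 3x_2 = 12$; that is, the feasible region does not exist.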
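The graphical conclusion can also be confirmed algebraically. This short argument is an addition and is not part of the original solution. For any $x_1, x_2 \geq 0$ satisfying the first constraint,

$$4x_1 + 3x_2 \leq 4x_1 + 4x_2 = 4(x_1 + x_2) \leq 4 < 12,$$

so the second constraint $4x_1 + 3x_2 \geq 12$ can never hold at the same time, and the feasible region is empty.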
EXERCISE 10.1
1) A company produces two types of products say type A and B.
Profits on the two types of product are Rs.30/- and Rs.40/-
per kg. respectively. The data on resources required and
availability of resources are given below.
                          Requirements              Capacity available
                          Product A    Product B    per month
Raw materials (kgs)       60           120          12000
Machining hours / piece   8            5            600
Assembling (man hours)    3            4            500
Formulate this problem as a linear programming problem to
maximize the profit.
2) A firm manufactures two products A & B on which the profits
earned per unit are Rs.3 and Rs.4 respectively. Each product
is processed on two machines $M_1$ and $M_2$. Product A requires
one minute of processing time on $M_1$ and two minutes on $M_2$,
while B requires one minute on $M_1$ and one minute on $M_2$.
Machine $M_1$ is available for not more than 7 hrs 30 minutes
while $M_2$ is available for 10 hrs during any working day.
Formulate this problem as a linear programming problem to
maximize the profit.
3) Solve the following, using graphical method
$$\text{Maximize } z = 45x_1 + 80x_2$$
subject to the constraints
$$5x_1 + 20x_2 \leq 400$$
$$10x_1 + 15x_2 \leq 450$$
$$x_1, x_2 \geq 0$$
4) Solve the following, using graphical method
$$\text{Maximize } z = 3x_1 + 4x_2$$
subject to the constraints
$$2x_1 + x_2 \leq 40$$
$$2x_1 + 5x_2 \leq 180$$
$$x_1, x_2 \geq 0$$
5) Solve the following, using graphical method
$$\text{Minimize } z = 3x_1 + 2x_2$$
subject to the constraints
$$5x_1 + x_2 \geq 10$$
$$2x_1 + 2x_2 \geq 12$$
$$x_1 + 4x_2 \geq 12$$
$$x_1, x_2 \geq 0$$
10.2 CORRELATION AND REGRESSION
10.2.1 Meaning of Correlation
The term correlation refers to the degree of relationship
between two or more variables. If a change in one variable is accompanied by
a change in the other variable, the variables are said to be correlated.
There are basically three types of correlation, namely positive
correlation, negative correlation and zero correlation.
Positive correlation
If the values of two variables deviate (change) in the same
direction i.e. if the increase (or decrease) in one variable results in
a corresponding increase (or decrease) in the other, the correlation
between them is said to be positive.
Example
(i) the heights and weights of individuals
(ii) the income and expenditure
(iii) experience and salary.
Negative Correlation
If the values of the two variables constantly deviate (change)
in the opposite directions i.e. if the increase (or decrease) in one
results in corresponding decrease (or increase) in the other, the
correlation between them is said to be negative.
Example
(i) price and demand ,
(ii) repayment period and EMI
10.2.2 Scatter Diagram
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the n pairs of observations
of the variables x and y. If we plot the values of x along the x-axis
and the corresponding values of y along the y-axis, the diagram so
obtained is called a scatter diagram. It gives us an idea of the
relationship between x and y. The types of scatter diagram under
simple linear correlation are given below.
(Fig. 10.4: Positive Correlation; Fig. 10.5: Negative Correlation; Fig. 10.6: No Correlation. Each panel is a scatter diagram of Y against X with origin O)
(i) If the plotted points show an upward trend, the correlation
will be positive (Fig. 10.4).
(ii) If the plotted points show a downward trend, the correlation
will be negative (Fig. 10.5).
(iii) If the plotted points show no trend, the variables are said to be
uncorrelated (Fig. 10.6).
10.2.3 Co-efficient of Correlation
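Karl Pearson (1867-1936), a British biometrician, developed the coefficient of correlation to express the degree of linear relationship between two variables. The correlation co-efficient between two random variables X and Y, denoted by r(X, Y), is given by
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{SD(X)\, SD(Y)}$$
where
$$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum_i (X_i - \bar{X})(Y_i - \bar{Y}) \quad \text{(covariance between X and Y)}$$
$$SD(X) = \sigma_x = \sqrt{\frac{1}{n}\sum_i (X_i - \bar{X})^2} \quad \text{(standard deviation of X)}$$
$$SD(Y) = \sigma_y = \sqrt{\frac{1}{n}\sum_i (Y_i - \bar{Y})^2} \quad \text{(standard deviation of Y)}$$
Hence the formula to compute the Karl Pearson correlation co-efficient is
$$r(X, Y) = \frac{\frac{1}{n}\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n}\sum_i (X_i - \bar{X})^2}\,\sqrt{\frac{1}{n}\sum_i (Y_i - \bar{Y})^2}} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.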
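The deviation form of the formula translates directly into code. This Python sketch computes $\sum xy / \sqrt{\sum x^2 \sum y^2}$ from raw paired data; the function name `pearson_r` is an illustrative choice. The sample data are the heights of Example 6 below.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x = X - mean(X) and y = Y - mean(Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]   # deviations from the mean of X
    dy = [y - mean_y for y in ys]   # deviations from the mean of Y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(fathers, sons), 3))
```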
Note
The following formulas may also be used to compute the correlation co-efficient between the two variables X and Y.
(i) $$r(X, Y) = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
(ii) $$r(X, Y) = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$$
where $d_x = X - A$ and $d_y = Y - B$ are the deviations from arbitrary values A and B.
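Formulas (i) and (ii) need only column totals. In the Python sketch below, the helper names are hypothetical. `r_from_sums` implements (i). `r_assumed_means` implements (ii) by applying (i) to the deviations $d_x = X - A$ and $d_y = Y - B$. Because r is unchanged when a constant is subtracted from every value, any choice of A and B gives the same result.

```python
import math


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Formula (i): r from N, sum X, sum Y, sum X^2, sum Y^2 and sum XY."""
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den


def r_assumed_means(xs, ys, a, b):
    """Formula (ii): the same r from deviations dx = X - A, dy = Y - B."""
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    return r_from_sums(
        len(dx),
        sum(dx), sum(dy),
        sum(d * d for d in dx), sum(d * d for d in dy),
        sum(p * q for p, q in zip(dx, dy)),
    )


# Summary figures of Example 9 below.
print(round(r_from_sums(25, 125, 100, 650, 436, 520), 3))
```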
10.2.4 Limits for Correlation co-efficient
Correlation co-efficient lies between −1 and +1,
i.e. $-1 \leq r(X, Y) \leq 1$.
(i) If r(X, Y) = +1, the variables X and Y are said to be perfectly
positively correlated.
(ii) If r(X, Y) = −1, the variables X and Y are said to be perfectly
negatively correlated.
(iii) If r(X, Y) = 0, the variables X and Y are said to be
uncorrelated.
Example 6
Calculate the correlation co-efficient for the following
heights (in inches) of fathers(X) and their sons(Y).
X : 65 66 67 67 68 69 70 72
Y : 67 68 65 68 72 72 69 71
Solution :
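$$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$$
X      Y      $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
65     67     -3                  -2                  9       4       6
66     68     -2                  -1                  4       1       2
67     65     -1                  -4                  1       16      4
67     68     -1                  -1                  1       1       1
68     72     0                   3                   0       9       0
69     72     1                   3                   1       9       3
70     69     2                   0                   4       0       0
72     71     4                   2                   16      4       8
Total  544    552                 0                   0       36      44      24
Karl Pearson Correlation Co-efficient,
$$r(x, y) = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} = 0.603$$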
Since r(x, y) = .603, the variables X and Y are positively
correlated. i.e. heights of fathers and their respective sons are said
to be positively correlated.
Example 7
Calculate the correlation co-efficient from the data below:
X : 1 2 3 4 5 6 7 8 9
Y : 9 8 10 12 11 13 14 16 15
Solution :
X      Y      $X^2$   $Y^2$   $XY$
1      9      1       81      9
2      8      4       64      16
3      10     9       100     30
4      12     16      144     48
5      11     25      121     55
6      13     36      169     78
7      14     49      196     98
8      16     64      256     128
9      15     81      225     135
Total  45     108     285     1356    597
$$r(X, Y) = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
$$= \frac{9(597) - (45)(108)}{\sqrt{9(285) - (45)^2}\,\sqrt{9(1356) - (108)^2}} = \frac{513}{\sqrt{540}\,\sqrt{540}} = \frac{513}{540} = 0.95$$
X and Y are highly positively correlated.
Example 8
Calculate the correlation co-efficient for the ages of
husbands (X) and their wives (Y)
X : 23 27 28 29 30 31 33 35 36 39
Y : 18 22 23 24 25 26 28 29 30 32
Solution :
Let A = 30 and B = 26; then $d_x = X - 30$ and $d_y = Y - 26$.
X      Y      $d_x$   $d_y$   $d_x^2$   $d_y^2$   $d_x d_y$
23     18     -7      -8      49        64        56
27     22     -3      -4      9         16        12
28     23     -2      -3      4         9         6
29     24     -1      -2      1         4         2
30     25     0       -1      0         1         0
31     26     1       0       1         0         0
33     28     3       2       9         4         6
35     29     5       3       25        9         15
36     30     6       4       36        16        24
39     32     9       6       81        36        54
Total                 11      -3      215       159       175
$$r(x, y) = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$$
$$= \frac{10(175) - (11)(-3)}{\sqrt{10(215) - (11)^2}\,\sqrt{10(159) - (-3)^2}} = \frac{1783}{\sqrt{2029}\,\sqrt{1581}} \approx \frac{1783}{1791.0} \approx 0.996$$
X and Y are highly positively correlated. i.e. the ages of
husbands and their wives have a high degree of correlation.
Example 9
Calculate the correlation co-efficient from the following
data
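$N = 25,\ \sum X = 125,\ \sum Y = 100,\ \sum X^2 = 650,\ \sum Y^2 = 436,\ \sum XY = 520$
Solution :
We know,
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
$$= \frac{25(520) - (125)(100)}{\sqrt{25(650) - (125)^2}\,\sqrt{25(436) - (100)^2}} = \frac{500}{\sqrt{625}\,\sqrt{900}} = \frac{500}{750}$$
$$r = 0.667$$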
10.2.5 Regression
Sir Francis Galton (1822-1911), a British biometrician,
defined regression in the context of hereditary characteristics. The
literal meaning of the word regression is "stepping back towards
the average".
Regression is a mathematical measure of the average
relationship between two or more variables in terms of the original
units of the data.
There are two types of variables considered in regression
analysis, namely dependent variable and independent variable(s).
10.2.6 Dependent Variable
The variable whose value is to be predicted for a given
independent variable(s) is called dependent variable, denoted by
Y. For example, if advertising (X) and sales (Y) are correlated, we
could estimate the expected sales (Y) for given advertising
expenditure (X). So in this case Y is a dependent variable.
10.2.7 Independent Variable
The variable which is used for prediction is called an
independent variable. For example, it is possible to estimate the
required amount of expenditure (X) for attaining a given amount of
sales (Y), when X and Y are correlated. So in this case Y is
independent variable. There can be more than one independent
variable in regression.
The line of regression is the line which gives the best estimate
to the value of one variable for any specific value of the other
variable.
Thus the line of Regression is the line of best fit and is
obtained by the principle of least squares. (Refer pages 61 & 62
of Chapter 7). The equation corresponding to the line of regression is
also referred to as the regression equation.
10.2.8 Two Regression Lines
For the pair of values (X, Y), where X is an independent variable and Y is the dependent variable, the line of regression of Y on X is given by
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
where $b_{yx}$ is the regression co-efficient of Y on X, given by $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$. Here r is the correlation co-efficient between X and Y, and $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y respectively.
$$b_{yx} = \frac{\sum xy}{\sum x^2}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$
Similarly, when Y is treated as an independent variable and X as the dependent variable, the line of regression of X on Y is given by
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
where
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$
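The two regression lines can be computed from raw data with the deviation formulas $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$. The Python sketch below uses illustrative function names. The sample data are the marks of Example 11 below.

```python
def regression_lines(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations from the means:
    b_yx = sum(xy) / sum(x^2) and b_xy = sum(xy) / sum(y^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    b_yx = sxy / sum(a * a for a in dx)
    b_xy = sxy / sum(b * b for b in dy)
    return mean_x, mean_y, b_yx, b_xy


def predict_y(x, mean_x, mean_y, b_yx):
    """Line of regression of Y on X: Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


def predict_x(y, mean_x, mean_y, b_xy):
    """Line of regression of X on Y: X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


econ = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stats = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mx, my, b_yx, b_xy = regression_lines(econ, stats)
print(round(b_yx, 3), round(predict_y(30, mx, my, b_yx), 2))
```

Here $b_{yx} \approx -0.664$, and the predicted Statistics mark for an Economics mark of 30 is about 39.33, as in Example 11. `predict_x` and `predict_y` use different coefficients, which is why the Note that follows warns that the two regression equations are not interchangeable.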
Note
The two regression equations are not reversible or
interchangeable because of the simple reason that the basis and
assumption for deriving these equations are quite different.
Example 10
Calculate the regression equation of X on Y from the
following data.
X : 10 12 13 12 16 15
Y : 40 38 43 45 37 43
Solution :
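X      Y      $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
10     40     -3                  -1                  9       1       3
12     38     -1                  -3                  1       9       3
13     43     0                   2                   0       4       0
12     45     -1                  4                   1       16      -4
16     37     3                   -4                  9       16      -12
15     43     2                   2                   4       4       4
Total  78     246                 0                   0       24      50      -6
$$\bar{X} = \frac{\sum X}{n} = \frac{78}{6} = 13, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{246}{6} = 41$$
$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{-6}{50} = -0.12$$
Regression equation of X on Y is $X - \bar{X} = b_{xy}(Y - \bar{Y})$
$$X - 13 = -0.12(Y - 41)$$
$$X = 17.92 - 0.12Y$$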
Example 11
Marks obtained by 10 students in Economics and
Statistics are given below.
Marks in Economics : 25 28 35 32 31 36 29 38 34 32
Marks in Statistics : 43 46 49 41 36 32 31 30 33 39
Find (i) the regression equation of Y on X
(ii) estimate the marks in Statistics when the marks
in Economics are 30.
Solution :
Let the marks in Economics be denoted by X and the marks in Statistics by Y.
X      Y      $x = X - \bar{X}$   $y = Y - \bar{Y}$   $x^2$   $y^2$   $xy$
25     43     -7                  5                   49      25      -35
28     46     -4                  8                   16      64      -32
35     49     3                   11                  9       121     33
32     41     0                   3                   0       9       0
31     36     -1                  -2                  1       4       2
36     32     4                   -6                  16      36      -24
29     31     -3                  -7                  9       49      21
38     30     6                   -8                  36      64      -48
34     33     2                   -5                  4       25      -10
32     39     0                   1                   0       1       0
Total  320    380                 0                   0       140     398     -93
$$\bar{X} = \frac{\sum X}{n} = \frac{320}{10} = 32, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{380}{10} = 38$$
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{-93}{140} = -0.664$$
(i) Regression equation of Y on X is
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$Y - 38 = -0.664(X - 32)$$
$$Y = 59.25 - 0.664X$$
(ii) To estimate the marks in Statistics (Y) for given marks in Economics (X), put X = 30 in the above equation. We get
$$Y = 59.25 - 0.664(30) = 59.25 - 19.92 = 39.33 \approx 39$$
Example 12
Obtain the two regression equations for the following
data.
X : 4 5 6 8 11
Y : 12 10 8 7 5
Solution :
The above values are small in magnitude, so the following formulas may be used to compute the regression co-efficients.
$$b_{XY} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}, \qquad b_{YX} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$$
X      Y      $X^2$   $Y^2$   $XY$
4      12     16      144     48
5      10     25      100     50
6      8      36      64      48
8      7      64      49      56
11     5      121     25      55
Total  34     42      262     382     257
$$\bar{X} = \frac{\sum X}{n} = \frac{34}{5} = 6.8, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{42}{5} = 8.4$$
$$b_{XY} = \frac{5(257) - (34)(42)}{5(382) - (42)^2} = \frac{-143}{146} = -0.98$$
$$b_{YX} = \frac{5(257) - (34)(42)}{5(262) - (34)^2} = \frac{-143}{154} = -0.93$$
Regression equation of X on Y is
$$X - \bar{X} = b_{XY}(Y - \bar{Y})$$
$$X - 6.8 = -0.98(Y - 8.4)$$
$$X = 15.03 - 0.98Y$$
Regression equation of Y on X is
$$Y - \bar{Y} = b_{YX}(X - \bar{X})$$
$$Y - 8.4 = -0.93(X - 6.8)$$
$$Y = 14.72 - 0.93X$$
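When working from raw totals, as here, the same coefficients follow from $N$, $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$ and $\sum XY$. This Python sketch reproduces Example 12 without rounding the coefficients first. The function name is hypothetical.

```python
def regression_coefficients_from_sums(n, sx, sy, sxx, syy, sxy):
    """b_yx and b_xy from raw totals:
    b_yx = (N*sum XY - sum X * sum Y) / (N*sum X^2 - (sum X)^2)
    b_xy = (N*sum XY - sum X * sum Y) / (N*sum Y^2 - (sum Y)^2)"""
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)
    b_xy = num / (n * syy - sy ** 2)
    return b_yx, b_xy


xs = [4, 5, 6, 8, 11]
ys = [12, 10, 8, 7, 5]
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))
b_yx, b_xy = regression_coefficients_from_sums(n, sx, sy, sxx, syy, sxy)
mean_x, mean_y = sx / n, sy / n
# Intercepts of X = c1 + b_xy*Y and Y = c2 + b_yx*X.
print(f"X = {mean_x - b_xy * mean_y:.2f} + ({b_xy:.2f})Y")
print(f"Y = {mean_y - b_yx * mean_x:.2f} + ({b_yx:.2f})X")
```

With unrounded coefficients the sketch gives X ≈ 15.03 − 0.98Y and Y ≈ 14.71 − 0.93X. The worked answer has 14.72 because it rounds $b_{YX}$ to −0.93 before computing the intercept.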
EXERCISE 10.2
1) Calculate the correlation co-efficient from the following data.
X : 12 9 8 10 11 13 7
Y : 14 8 6 9 11 12 3
2) Find the co-efficient of correlation for the data given below.
X : 10 12 18 24 23 27
Y : 13 18 12 25 30 10
3) From the data given below, find the correlation co-efficient.
X : 46 54 56 56 58 60 62
Y : 36 40 44 54 42 58 54
4) For the data on price (in rupees) and demand (in tonnes) for a
commodity, calculate the co-efficient of correlation.
Price (X) : 22 24 26 28 30 32 34 36 38 40
Demand(Y) : 60 58 58 50 48 48 48 42 36 32
5) From the following data, compute the correlation co-efficient.
$N = 11, \ \sum X = 117, \ \sum Y = 260, \ \sum X^2 = 1313, \ \sum Y^2 = 6580, \ \sum XY = 2827$
6) Obtain the two regression lines from the following data.
X : 6 2 10 4 8
Y : 9 11 5 8 7
7) With the help of the regression equation for the data given
below calculate the value of X when Y = 20.
X : 10 12 13 17 18
Y : 5 6 7 9 13
8) Price indices of cotton (X) and wool (Y) are given below for
the 12 months of a year. Obtain the equations of lines of
regression between the indices.
X : 78 77 85 88 87 82 81 77 76 83 97 93
Y : 84 82 82 85 89 90 88 92 83 89 98 99
9) Find the two regression equations for the data given below.
X : 40 38 35 42 30
Y : 30 35 40 36 29
10.3 TIME SERIES ANALYSIS
Statistical data that relate to successive intervals or points of time are referred to as time series.
The following are a few examples of time series.
(i) Quarterly, half-yearly and yearly production of a particular commodity.
(ii) Amount of rainfall over a 10-year period.
(iii) Price of a commodity at different points of time.
It is commonly believed that the term time series refers only to economic data, but it applies equally to data arising in the natural and social sciences. Here the time sequence plays a vital role and requires special techniques for its analysis. In the analysis of time series, we analyse the past in order to understand the future better.
10.3.1 Uses of analysis of Time Series
(i) It helps to study the past conditions, assess the present
achievements and to plan for the future.
(ii) It gives reliable forecasts.
(iii) It provides the facility for comparison.
Thus wherever time related data is given in Economics,
Business, Research and Planning, the analysis of time series provides
the opportunity to study them in proper perspective.
10.3.2 Components of Time Series
A graphical representation of time series data generally shows the changes (variations) over time. These changes are known as the principal components of a time series. They are
(i) Secular trend (ii) Seasonal variation
(iii) Cyclical variation (iv) Irregular variation.
Secular trend (or Trend)
It is the smooth, regular, long-term movement of a series when observed over a sufficiently long period. The trend may be upward or downward, i.e. the series may increase or decrease over a period of time. For example, time series relating to population, prices, production, literacy, etc. may show an increasing trend, and time series relating to birth rate, death rate and poverty may show a decreasing trend.
Seasonal Variation
It is a short-term variation. It means a periodic movement in a time series where the period is not longer than one year. A periodic movement in a time series is one which recurs at regular intervals of time or periods. The following are examples of seasonal variation.
(i) passenger traffic during the 24 hours of a day
(ii) sales in a department store during the seven days of a week.
This type of variation is caused by climatic changes across the seasons and by the customs and habits of people. For example, more ice cream is sold in summer and more umbrellas are sold during the rainy season.
Cyclical Variation
It is also a short-term variation. It means the oscillatory movement in a time series, the period of oscillation being more than a year. One complete period is called a cycle. The business cycle is a suitable example of cyclical variation. Many time series relating to Economics and Business show wave-like movements called business cycles. The four phases of a business cycle, (i) prosperity, (ii) recession, (iii) depression and (iv) recovery, recur one after another regularly.
Irregular Variation
This type of variation does not follow any regularity. These
variations are either totally unaccountable or caused by unforeseen
events such as wars, floods, fires, strikes etc. Irregular variation is also called Erratic Variation.
10.3.3 Models
In a given time series, some or all of the four components, namely secular trend, seasonal variation, cyclical variation and irregular variation, may be present. It is important to separate the different components of a time series, because either our interest may be in a particular component or we may want to study the series after eliminating the effect of a particular component. Although many models exist, here we consider only two.
Multiplicative Model
According to this model, the four components are assumed to be related multiplicatively, i.e.,
$y_t = T_t \times S_t \times C_t \times I_t$,
where $y_t$ is the value of the variable at time t (the observed data at time t), $T_t$ is the secular trend (or trend), $S_t$ is the seasonal variation, $C_t$ is the cyclical variation and $I_t$ is the irregular (or erratic) variation.
(Fig. 10.7: The four phases of a business cycle, (i) prosperity, (ii) recession, (iii) depression and (iv) recovery, shown as oscillations about the normal level)
Additive Model
According to this model, $y_t$ is assumed to be the sum of the four components:
$y_t = T_t + S_t + C_t + I_t$
10.3.4 Measurement of secular trend
The following four methods are used to estimate the secular trend:
(i) Graphic method or free - hand method
(ii) Method of Semi - Averages
(iii) Method of Moving Averages
(iv) Method of least squares.
(i) Graphic Method / Free - hand Method
This is the simplest method of studying trend. Take time on the x-axis and the observed data on the y-axis. Mark a point on a graph sheet corresponding to each pair of time and observed value. After marking all such points, draw a straight line that best fits the data according to personal judgement.
The line should be drawn so that it passes between the plotted points in such a manner that the fluctuations in one direction are approximately equal to those in the other direction. When a trend line is fitted by the free hand method, attention should be paid to the following conditions.
(i) The number of points above the line is equal to the
number of points below the line, as far as possible.
(ii) The sum of the vertical deviations from the trend of the observations above the trend should equal the sum of the vertical deviations from the trend of the observations below the trend.
(iii) The sum of the squares of the vertical deviations of the
observations from the trend should be as small as
possible.
Example 13
Fit a trend line to the following data by the free hand method.
Year                                   1978  1979  1980  1981  1982  1983  1984  1985  1986
Production of steel (million tonnes)     20    22    24    21    23    25    23    26    25
Solution :
Note
(i) The trend line drawn by the free hand method can be extended to predict future values. However, since free hand curve fitting is too subjective, this method should not generally be used for prediction.
(ii) In the diagram (Fig. 10.8), a false base line (zig-zag) has been used. A false base line is generally used for the following reasons.
(a) Variations in the data are clearly shown.
(b) A large part of the graph is not wasted, i.e. space is saved.
(c) The graph provides better visual communication.
(Fig. 10.8: Trend by the freehand method; x-axis: Year, 1978 to 1986; y-axis: Production (million tonnes))
(ii) Method of Semi Averages
This method involves very simple calculations and is easy to adopt. The given data are divided into two equal parts. For example, if we are given data from 1980 to 1999, i.e. over a period of 20 years, the two equal parts will be the first 10 years, i.e. from 1980 to 1989, and the last 10 years, from 1990 to 1999. In the case of an odd number of years like 7, 11, 13 etc., two equal parts can be made by omitting the middle year. For example, if the data are given for 7 years from 1980 to 1986, the two equal parts would be from 1980 to 1982 and from 1984 to 1986; the middle year 1983 is omitted.
After dividing the data into two parts, find the arithmetic mean of each part. These are the semi-averages. Each semi-average is placed against the middle of its part, and the annual increase or decrease in the trend is the difference between the two semi-averages divided by the number of years between the two middle points.
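The procedure can be expressed as a short Python function. This is a sketch under two assumptions: the years are consecutive, and each semi-average belongs to the middle of its half (a half-year point when the half has an even number of years, as in Example 15). The function name `semi_average_trend` is hypothetical.

```python
def semi_average_trend(years, values):
    """Fit a trend by the method of semi-averages.

    Returns (annual_change, trend), where trend(year) gives the trend value
    for any year, including years outside the data (extrapolation).
    For an odd number of years the middle year is omitted.
    """
    n = len(years)
    half = n // 2
    first_years, first_values = years[:half], values[:half]
    second_years, second_values = years[n - half:], values[n - half:]  # skips middle year if n is odd

    m1 = sum(first_values) / half    # first semi-average
    m2 = sum(second_values) / half   # second semi-average
    c1 = sum(first_years) / half     # middle of the first part
    c2 = sum(second_years) / half    # middle of the second part
    annual_change = (m2 - m1) / (c2 - c1)

    def trend(year):
        return m1 + annual_change * (year - c1)

    return annual_change, trend


# Example 14 (7 years, middle year 1983 omitted)
years = list(range(1980, 1987))
sales = [102, 105, 114, 110, 108, 116, 112]
change, trend = semi_average_trend(years, sales)
print(change, [round(trend(y), 2) for y in years])
```

Applied to the figures of Example 15, the same function returns an annual change of -8.75, and `trend(2005)` gives the extrapolated value for 2005.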
Example 14
Find the trend values for the following data by the method of semi-averages.
Year 1980 1981 1982 1983 1984 1985 1986
Sales 102 105 114 110 108 116 112
Solution :
Number of years = 7 (odd). Omitting the middle year (1983), we have
Year   Sales   Semi-total   Semi-average
1980    102
1981    105       321           107
1982    114
1983    110
1984    108
1985    116       336           112
1986    112
Difference between the middle periods = 1985 - 1981 = 4
Difference between the semi-averages = 112 - 107 = 5
Annual increase in trend = $\frac{5}{4} = 1.25$
Year 1980 1981 1982 1983 1984 1985 1986
Trend 105.75 107 108.25 109.50 110.75 112 113.25
Example 15
The sales in tonnes of a commodity varied from 1994
to 2001 as given below:
Year 1994 1995 1996 1997
Sales 270 240 230 230
Year 1998 1999 2000 2001
Sales 220 200 210 200
Find the trend values by the method of semi-averages.
Estimate the sales in 2005.
Solution :
Year   Sales   Semi-total   Semi-average
1994    270
1995    240
                   970          242.5
1996    230
1997    230
1998    220
1999    200
                   830          207.5
2000    210
2001    200
Difference between the middle periods = 1999.5 - 1995.5 = 4
Decrease in semi-averages = 242.5 - 207.5 = 35
Annual decrease in trend = $\frac{35}{4} = 8.75$
Half-yearly decrease in trend = 4.375
Year 1994 1995 1996 1997
Sales trend 255.625 246.875 238.125 229.375
Year 1998 1999 2000 2001
Sales trend 220.625 211.875 203.125 194.375
Trend value for the year 2005 = 194.375 - (8.75 × 4) = 159.375
(iii) Method of Moving Averages
This is a simple and flexible algebraic method of measuring trend. The method of moving averages is a simple device for eliminating fluctuations and obtaining trend values with a fair degree of accuracy. The technique is based on the arithmetic mean, but with a distinction: in the arithmetic mean we sum all the items and divide the sum by the number of items, whereas in the moving average method there are several averages in one series, depending upon the number of years taken in the moving average. While applying this method, it is necessary to define a period for the moving average, such as a 3-yearly moving average (odd number of years), a 4-yearly moving average (even number of years), etc.
Moving Average - Odd number of years (say 3 years)
To find the trend values by the method of 3-yearly moving averages, the following steps are taken.
1) Add up the values of the first three years and place the sum against the middle year, i.e. the 2nd year.
2) Leave the first year's value and add up the values of the next three years, i.e. from the 2nd year to the 4th year, and place the sum (known as the moving total) against the 3rd year.
3) Leave the first two values and add up the values of the next three years, i.e. from the 3rd year to the 5th year, and place the sum (moving total) against the 4th year.
4) Continue this process until the value of the last item has been included in a moving total.
5) Divide each 3-yearly moving total by 3 to get the moving averages. These are the required trend values.
Note
The above 5 steps can be applied to get 5-yearly, 7-yearly, 9-yearly, etc. moving averages.
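As an illustration that is not part of the original text, the five steps above translate into the Python sketch below. The hypothetical function `moving_average_odd` places each moving total against the middle year of its window and divides by the period; it is run here on the production figures of Example 16.

```python
def moving_average_odd(values, period=3):
    """Moving averages for an odd period (3, 5, 7, ...).

    The result is aligned with `values`; the first and last period // 2
    positions have no complete window and are returned as None.
    """
    if period % 2 == 0:
        raise ValueError("period must be odd")
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        moving_total = sum(values[i - half:i + half + 1])  # placed against the middle item
        result[i] = moving_total / period
    return result


# Production figures of Example 16 (1973 to 1987)
production = [15, 21, 30, 36, 42, 46, 50, 56, 63, 70, 74, 82, 90, 95, 102]
for year, avg in zip(range(1973, 1988), moving_average_odd(production)):
    print(year, "---" if avg is None else f"{avg:.2f}")
```

Passing `period=5` or `period=7` gives the 5-yearly or 7-yearly moving averages mentioned in the note.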
Moving Average - Even number of years (say 4 years)
1) Add up the values of the first four years and place the sum against the middle of the 2nd and 3rd years.
2) Leave the first year's value, add the values from the 2nd year to the 5th year, and write the sum (moving total) against the middle of the 3rd and 4th years.
3) Leave the first two years' values and add the values of the next four years, i.e. from the 3rd year to the 6th year. Place the sum (moving total) against the middle of the 4th and 5th years.
4) Continue this process until the value of the last item has been taken into account.
5) Add the first two 4-yearly moving totals and write the sum against the 3rd year.
6) Leave the first 4-yearly moving total and add the next two 4-yearly moving totals. Place this sum against the 4th year.
7) Continue this process until all the 4-yearly moving totals have been summed in pairs and centred.
8) Divide each centred 4-yearly moving total by 8 and write the quotient in a new column. These are the required trend values.
Note
The above steps can be applied to get 6-yearly, 8-yearly, 10-yearly, etc. moving averages.
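The centring procedure for an even period can be sketched in Python as follows. This is an illustrative implementation, and the function name `centred_moving_average_even` is hypothetical. It forms the moving totals, adds them in consecutive pairs to centre them on a year, and divides each centred total by twice the period, i.e. by 8 for a 4-yearly moving average. It is run here on the values of Example 17.

```python
def centred_moving_average_even(values, period=4):
    """Centred moving averages for an even period (4, 6, 8, ...).

    The result is aligned with `values`; positions without a centred
    value are returned as None.
    """
    if period % 2 != 0:
        raise ValueError("period must be even")
    # Moving totals: each one falls between two items.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    # Adding consecutive pairs of totals centres them on an item.
    centred = [totals[j] + totals[j + 1] for j in range(len(totals) - 1)]
    result = [None] * len(values)
    for j, total in enumerate(centred):
        result[j + period // 2] = total / (2 * period)  # divide by 8 when period = 4
    return result


# Values of Example 17 (1974 to 1987)
values = [12, 25, 39, 54, 70, 37, 105, 100, 82, 65, 49, 34, 20, 7]
for year, trend in zip(range(1974, 1988), centred_moving_average_even(values)):
    print(year, "---" if trend is None else f"{trend:.2f}")
```

Dividing the sum of two moving totals by 2 × period is the same as averaging two consecutive moving averages, which is why the divisor is 8 rather than 4.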
Example 16
Calculate the 3-yearly moving averages of the production figures (in mat. tonnes) given below.
Year         1973  1974  1975  1976  1977  1978  1979  1980
Production     15    21    30    36    42    46    50    56
Year         1981  1982  1983  1984  1985  1986  1987
Production     63    70    74    82    90    95   102
Solution :
Calculation of 3-yearly moving averages
Year   Production (y)   3-yearly moving total   3-yearly moving average
1973 15 --- ---
1974 21 66 22.00
1975 30 87 29.00
1976 36 108 36.00
1977 42 124 41.33
1978 46 138 46.00
1979 50 152 50.67
1980 56 169 56.33
1981 63 189 63.00
1982 70 207 69.00
1983 74 226 75.33
1984 82 246 82.00
1985 90 267 89.00
1986 95 287 95.67
1987 102 --- ---
Example 17
Estimate the trend values using the data given below by
taking 4-yearly Moving Average.
Year 1974 1975 1976 1977 1978 1979 1980
Value 12 25 39 54 70 37 105
Year 1981 1982 1983 1984 1985 1986 1987
Value 100 82 65 49 34 20 7
Solution :
Year   Value   4-yearly        Centred total           Trend value
               moving total    (sum of two 4-yearly    (centred total / 8)
                               moving totals)
1974     12                          ---                   ---
1975     25                          ---                   ---
                  130
1976     39                          318                  39.75
                  188
1977     54                          388                  48.50
                  200
1978     70                          466                  58.25
                  266
1979     37                          578                  72.25
                  312
1980    105                          636                  79.50
                  324
1981    100                          676                  84.50
                  352
1982     82                          648                  81.00
                  296
1983     65                          526                  65.75
                  230
1984     49                          398                  49.75
                  168
1985     34                          278                  34.75
                  110
1986     20                          ---                   ---
1987      7                          ---                   ---
(iv) Method of Least Squares
The method of least squares is the method most widely used in practice. It may be used to fit a straight line trend, which is generally expressed by the equation
$y_t = a + bx$
where $y_t$ represents the trend values, a is the intercept and b is the slope of the line, also known as the rate of growth per unit of time. The variable x represents the time periods.
In order to determine the values of the unknown constants a and b, the following equations, known as normal equations, are used.
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$,
where n represents the number of observations (years, months or any other period) for which the data are given.
For the derivation of the normal equations, refer to pages 61 & 62 of Chapter 7. To solve the above normal equations and obtain the trend values, the computational steps are as follows.
Case (i) When the number of years is odd
1) Denote the years as the X variable and the corresponding values as Y.
2) Take the middle year as the origin and take deviations from it. Thus $\sum X = 0$.
3) Find $\sum X^2$, $\sum Y$ and $\sum XY$.
4) Substitute $\sum X$, $\sum X^2$, $\sum Y$ and $\sum XY$ in the above normal equations and solve them.
Hence $a = \frac{\sum Y}{n}$ and $b = \frac{\sum XY}{\sum X^2}$.
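With the middle year as origin, the normal equations reduce to the two formulas above. The Python sketch below is illustrative only: the function name `least_squares_trend_odd` is hypothetical, it assumes an odd number of consecutive years, and, since no worked least-squares example is included in this section, it reuses the sales figures of Example 14 purely for illustration.

```python
def least_squares_trend_odd(years, values):
    """Straight-line trend y = a + bX by least squares, for an odd number
    of consecutive years, taking the middle year as origin (sum of X = 0).

    With sum(X) = 0 the normal equations give
        a = sum(Y) / n   and   b = sum(XY) / sum(X^2).
    """
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this short-cut needs an odd number of years")
    origin = years[n // 2]
    xs = [year - origin for year in years]  # deviations from the middle year
    a = sum(values) / n
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    trend = [a + b * x for x in xs]
    return a, b, trend


years = list(range(1980, 1987))
sales = [102, 105, 114, 110, 108, 116, 112]  # Example 14 data, reused for illustration
a, b, trend = least_squares_trend_odd(years, sales)
print(f"y = {a:.3f} + {b:.3f}X (origin 1983)")
print([round(t, 2) for t in trend])
```

Because $\sum X = 0$, the trend values always add up to the observed total $\sum Y$, which gives a quick check on the arithmetic.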
$\times 100$, where $w = p_0 q_0$ is the weight assigned to the items and $P_{01}$ is the price index.
(ii) Paasche's price index
$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
Here the weights assigned to the items are the current year quantities, i.e. $w = p_0 q_1$.
(iii) Fisher's price index
$P_{01}^{F} = \sqrt{P_{01}^{L} \times P_{01}^{P}} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$
Note
Fisher's price index is the geometric mean of the Laspeyres and Paasche price index numbers.
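The three formulas can be checked with a small Python sketch. The function names `aggregate` and `price_indices` and the argument order are illustrative choices, not from the text. The lists hold base-year prices $p_0$, current-year prices $p_1$, base-year quantities $q_0$ and current-year quantities $q_1$, item by item. The data are those of Example 21.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Sum of price x quantity over all items, e.g. sum(p1 * q0)."""
    return sum(p * q for p, q in zip(prices, quantities))


def price_indices(p0, p1, q0, q1):
    """Return (Laspeyres, Paasche, Fisher) price indices with base = 100."""
    laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100  # base-year quantities as weights
    paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100    # current-year quantities as weights
    fisher = sqrt(laspeyres * paasche)                       # geometric mean of the two
    return laspeyres, paasche, fisher


# Data of Example 21 (commodities A, B, C, D)
p0, p1 = [2, 5, 4, 2], [4, 6, 5, 2]
q0, q1 = [8, 10, 14, 19], [6, 5, 10, 13]
L, P, F = price_indices(p0, p1, q0, q1)
print(f"Laspeyres = {L:.2f}, Paasche = {P:.2f}, Fisher = {F:.2f}")
```

Since both Laspeyres' and Paasche's indices already include the factor 100, their geometric mean is also on the base-100 scale.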
Example 21
Compute (i) Laspeyres', (ii) Paasche's and (iii) Fisher's index numbers for 2000 from the following:
Commodity Price Quantity
1990 2000 1990 2000
A 2 4 8 6
B 5 6 10 5
C 4 5 14 10
D 2 2 19 13
Solution :
(p0, q0: base-year price and quantity; p1, q1: current-year price and quantity)
Commodity   p0   p1   q0   q1   p0q0   p1q0   p0q1   p1q1
A            2    4    8    6     16     32     12     24
B            5    6   10    5     50     60     25     30
C            4    5   14   10     56     70     40     50
D            2    2   19   13     38     38     26     26
Total                            160    200    103    130
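(i) Laspeyres' index:
$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{200}{160} \times 100 = 125$
(ii) Paasche's index:
$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{130}{103} \times 100 = 126.21$
(iii) Fisher's index:
$P_{01}^{F} = \sqrt{P_{01}^{L} \times P_{01}^{P}} = \sqrt{125 \times 126.21} = 125.6$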
Example 22
From the following data, calculate the price index number by (a) Laspeyres' method, (b) Paasche's method and (c) Fisher's method.
Commodity Base year Current year
Price Quantity Price Quantity
A 2 40 6 50
B 4 50 8 40
C 6 20 9 30
D 8 10 6 20
E 10 10 5 20
Solution :
Commodity   p0   q0   p1   q1   p0q0   p1q0   p0q1   p1q1
A            2   40    6   50     80    240    100    300
B            4   50    8   40    200    400    160    320
C            6   20    9   30    120    180    180    270
D            8   10    6   20     80     60    160    120
E           10   10    5   20    100     50    200    100
Total                            580    930    800   1110
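(i) Laspeyres' price index:
$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{930}{580} \times 100 = 160.34$
(ii) Paasche's price index:
$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1110}{800} \times 100 = 138.75$
(iii) Fisher's index:
$P_{01}^{F} = \sqrt{\frac{930}{580} \times \frac{1110}{800}} \times 100 = 149.16$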
10.4.5 Test of adequacy for Index Number
Index numbers are constructed to study the relative changes in prices, quantities, etc. of one period in comparison with another. Several formulae have been suggested for constructing index numbers, and one should select the most appropriate one in a given situation. The following tests have been suggested for choosing an appropriate index.
1) Time reversal test
2) Factor reversal test
Time reversal test
It is a test to determine whether a given method will work both ways in time, forward and backward. When the data for any two years are treated by the same method, but with the bases reversed, the two index numbers obtained should be reciprocals of each other, so that their product is unity. Symbolically, the following relation should be satisfied:
$P_{01} \times P_{10} = 1$
(ignoring the factor 100 in each index), where $P_{01}$ is the index for the current period 1 on the base period 0, and $P_{10}$ is the index for period 0 on the base period 1.
Factor reversal test
This test holds that the product of a price index and the corresponding quantity index should equal the value index, i.e. the change in price multiplied by the change in quantity should equal the total change in value:
$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$
(ignoring the factor 100 in each index), where $P_{01}$ gives the relative change in price and $Q_{01}$ gives the relative change in quantity. The total value of a given commodity in a given year is the product of the quantity and the price per unit.
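Both tests can be checked numerically. The sketch below is illustrative, and all of its function names are hypothetical. It works with indices as ratios (without the factor 100), obtains $P_{10}$ by interchanging the roles of periods 0 and 1, and obtains Fisher's quantity index by interchanging prices and quantities. With the data of Example 21, Fisher's index passes both tests, while Laspeyres' index fails the time reversal test.

```python
from math import isclose, sqrt


def aggregate(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, p1, q0, q1):
    return aggregate(p1, q0) / aggregate(p0, q0)


def fisher_price(p0, p1, q0, q1):
    # sqrt(Laspeyres ratio x Paasche ratio)
    return sqrt(laspeyres(p0, p1, q0, q1) * aggregate(p1, q1) / aggregate(p0, q1))


def fisher_quantity(p0, p1, q0, q1):
    # Same formula with the roles of prices and quantities interchanged.
    return fisher_price(q0, q1, p0, p1)


def time_reversal_holds(index, p0, p1, q0, q1):
    # P01 x P10 = 1, where P10 swaps the roles of periods 0 and 1.
    return isclose(index(p0, p1, q0, q1) * index(p1, p0, q1, q0), 1.0)


def factor_reversal_holds(p0, p1, q0, q1):
    # P01 x Q01 = sum(p1 q1) / sum(p0 q0), the value ratio.
    value_ratio = aggregate(p1, q1) / aggregate(p0, q0)
    return isclose(fisher_price(p0, p1, q0, q1) * fisher_quantity(p0, p1, q0, q1), value_ratio)


p0, p1 = [2, 5, 4, 2], [4, 6, 5, 2]       # Example 21
q0, q1 = [8, 10, 14, 19], [6, 5, 10, 13]
print("Fisher, time reversal:", time_reversal_holds(fisher_price, p0, p1, q0, q1))
print("Laspeyres, time reversal:", time_reversal_holds(laspeyres, p0, p1, q0, q1))
print("Fisher, factor reversal:", factor_reversal_holds(p0, p1, q0, q1))
```

`math.isclose` is used instead of `==` because the square roots are computed in floating point, so the products equal 1 (or the value ratio) only up to rounding error.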
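$= \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{1380}{510} \times \frac{1500}{571}} \times 100 = 2.6661 \times 100 = 266.61$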
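Time reversal test
The test is satisfied when $P_{01} \times P_{10} = 1$.
$P_{01} = \sqrt{\frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_1 q_0}{\sum p_0 q_0}} = \sqrt{\frac{1500}{571} \times \frac{1380}{510}}$
$P_{10} = \sqrt{\frac{\sum p_0 q_0}{\sum p_1 q_0} \times \frac{\sum p_0 q_1}{\sum p_1 q_1}} = \sqrt{\frac{510}{1380} \times \frac{571}{1500}}$
$P_{01} \times P_{10} = \sqrt{\frac{1500}{571} \times \frac{1380}{510} \times \frac{510}{1380} \times \frac{571}{1500}} = \sqrt{1} = 1$
Hence Fisher's Ideal Index satisfies the Time Reversal Test.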
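Factor reversal test
The test is satisfied when $P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$.
$Q_{01} = \sqrt{\frac{\sum q_1 p_1}{\sum q_0 p_1} \times \frac{\sum q_1 p_0}{\sum q_0 p_0}} = \sqrt{\frac{1500}{1380} \times \frac{571}{510}}$
$P_{01} \times Q_{01} = \sqrt{\frac{1500}{571} \times \frac{1380}{510} \times \frac{1500}{1380} \times \frac{571}{510}} = \frac{1500}{510} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$
Hence Fisher's Ideal Index satisfies the Factor Reversal Test.
$= \sqrt{\frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_1 q_0}{\sum p_0 q_0}} \times 100 = \sqrt{\frac{530}{470} \times \frac{505}{425}} \times 100 = 115.75$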
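Time reversal test
The test is satisfied when $P_{01} \times P_{10} = 1$.
$P_{01} = \sqrt{\frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_1 q_0}{\sum p_0 q_0}} = \sqrt{\frac{530}{470} \times \frac{505}{425}}$
$P_{10} = \sqrt{\frac{\sum p_0 q_0}{\sum p_1 q_0} \times \frac{\sum p_0 q_1}{\sum p_1 q_1}} = \sqrt{\frac{425}{505} \times \frac{470}{530}}$
$P_{01} \times P_{10} = \sqrt{\frac{530}{470} \times \frac{505}{425} \times \frac{425}{505} \times \frac{470}{530}} = \sqrt{1} = 1$
Hence Fisher's Ideal Index satisfies the Time Reversal Test.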
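Factor reversal test
The test is satisfied when $P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$.
$Q_{01} = \sqrt{\frac{\sum q_1 p_1}{\sum q_0 p_1} \times \frac{\sum q_1 p_0}{\sum q_0 p_0}} = \sqrt{\frac{530}{505} \times \frac{470}{425}}$
$P_{01} \times Q_{01} = \sqrt{\frac{530}{470} \times \frac{505}{425} \times \frac{530}{505} \times \frac{470}{425}} = \frac{530}{425} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$
Hence Fisher's Ideal Index satisfies the Factor Reversal Test.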
This is the most popular method for constructing a cost of living index, and it is the same as Laspeyres' price index.
Family budget method
In this method, the value weights obtained by multiplying prices by the quantities consumed (i.e. $p_0 q_0$) are taken as weights. To get the cost of living index, find the sum of the products of the price relatives and the value weights, and divide this sum by the sum of the value weights.
Cost of living index $= \frac{\sum PV}{\sum V}$, where
$P = \frac{p_1}{p_0} \times 100$ is the price relative and
$V = p_0 q_0$ is the value weight for each item.
This method is the same as the weighted average of price relatives method.
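The two cost of living formulas can be sketched in Python as below. The function names are hypothetical. The sketch also illustrates an algebraic consequence of the definitions above: when the value weights are taken as $V = p_0 q_0$, the family budget formula reduces to $\sum p_1 q_0 / \sum p_0 q_0 \times 100$, the aggregate expenditure formula. The data are those of Examples 25 and 26.

```python
def cli_aggregate_expenditure(p0, p1, q0):
    """Aggregate expenditure method: sum(p1 * q0) / sum(p0 * q0) * 100."""
    current_cost = sum(p * q for p, q in zip(p1, q0))
    base_cost = sum(p * q for p, q in zip(p0, q0))
    return current_cost / base_cost * 100


def cli_family_budget(p0, p1, weights):
    """Family budget method: sum(P * V) / sum(V), with P = p1 / p0 * 100."""
    relatives = [new / old * 100 for old, new in zip(p0, p1)]
    return sum(r * v for r, v in zip(relatives, weights)) / sum(weights)


# Example 25 (aggregate expenditure method)
q0 = [100, 25, 10, 20, 65, 30]
p0 = [8, 6, 5, 48, 15, 19]
p1 = [12.00, 7.50, 5.25, 52.00, 16.50, 27.00]
print(round(cli_aggregate_expenditure(p0, p1, q0), 2))

# Example 26 (family budget method with the given weights)
print(cli_family_budget([200, 100, 150, 50, 100], [280, 200, 120, 100, 200], [30, 20, 20, 10, 20]))
```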
10.4.8 Uses of cost of living index number
(i) The cost of living index number is mainly used in wage
negotiations and wage contracts.
(ii) It is used to calculate the dearness allowance for the
employees.
Example 25
Calculate the cost of living index by the aggregate expenditure method.
Commodity   Quantity (2000)   Price in 2000 (Rs)   Price in 2003 (Rs)
A 100 8 12.00
B 25 6 7.50
C 10 5 5.25
D 20 48 52.00
E 65 15 16.50
F 30 19 27.00
Solution :
Commodity   q0 (2000)   p0 (2000)   p1 (2003)      p1q0     p0q0
A              100          8         12.00      1200.00     800
B               25          6          7.50       187.50     150
C               10          5          5.25        52.50      50
D               20         48         52.00      1040.00     960
E               65         15         16.50      1072.50     975
F               30         19         27.00       810.00     570
Total                                            4362.50    3505
C.L.I. $= \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{4362.50}{3505} \times 100 = 124.47$
Example 26
Construct the cost of living index number for 2003 on the basis of 2000 from the following data using the family budget method.
Items   Price in 2000   Price in 2003   Weights
Food 200 280 30
Rent 100 200 20
Clothing 150 120 20
Fuel & lighting 50 100 10
Miscellaneous 100 200 20
Solution :
Calculation of CLI by the family budget method
Items             p0    p1    Weights V   P = (p1/p0) × 100      PV
Food             200   280       30             140            4200
Rent             100   200       20             200            4000
Clothing         150   120       20              80            1600
Fuel & Lighting   50   100       10             200            2000
Misc.            100   200       20             200            4000
Total                           100                           15800
Cost of living index $= \frac{\sum PV}{\sum V} = \frac{15800}{100} = 158$
Hence there is a 58% increase in the cost of living in 2003 compared to 2000.
EXERCISE 10.4
1) Compute (i) Laspeyres', (ii) Paasche's and (iii) Fisher's index numbers.
Commodity Price Quantity
Base Current Base Current
year year year year
A 6 10 50 50
B 2 2 100 120
C 4 6 60 60
D 10 12 30 25
2) Construct the price index number from the following data by
applying
(i) Laspeyres' (ii) Paasche's (iii) Fisher's method
Commodity 1999 1998
Price Quantity Price Quantity
A 4 6 2 8
B 6 5 5 10
C 5 10 4 14
D 2 13 2 19
3) Compute (a) Laspeyres', (b) Paasche's and (c) Fisher's index numbers for 1990 from the following:
Commodity
Price Quantity
1980 1990 1980 1990
A 2 4 8 6
B 5 6 10 5
C 4 5 14 10
D 2 2 19 13
4) From the following data, calculate the price index number by
(a) Laspeyres' method (b) Paasche's method
(c) Fisher's method
Commodity Base year Current year
Price Quantity Price Quantity
A 5 25 6 30
B 10 5 15 4
C 3 40 2 50
D 6 30 8 35
5) Using the following data, construct Fisher's Ideal Index and show that it satisfies the Factor Reversal Test and the Time Reversal Test.
Commodity Price Quantity
Base Current Base Current
year year year year
A 6 10 50 56
B 2 2 100 120
C 4 6 60 60
D 10 12 30 24
E 8 12 40 36
6) Calculate Fisher's Ideal Index from the following data and show that it satisfies the time reversal test and the factor reversal test.
Commodity Base year Current year
(1997) (1998)
Price Quantity Price Quantity
A 10 10 12 8
B 8 12 8 13
C 12 12 15 8
D 20 15 25 10
E 5 8 8 8
F 2 10 4 6
7) Construct the cost of living index for 2000, taking 1999 as the base year, from the following data using the Aggregate Expenditure method.
Commodity   Quantity in 1999 (kg)   Price in 1999   Price in 2000
A 6 5.75 6.00
B 1 5.00 8.00
C 6 6.00 9.00
D 4 8.00 10.00
E 2 2.00 1.80
F 1 20.00 15.00
8) Calculate the cost of living index number using the Family Budget method.
Commodity                        A    B    C    D    E    F    G    H
Quantity in base year (units)   20   50   50   20   40   50   60   40
Price in base year (Rs.)        10   30   40  200   25  100   20  150
Price in current year (Rs.)     12   35   50  300   50  150   25  180
9) Calculate the cost of living index number using the Family Budget method for the following data, taking 1995 as the base year.
Commodity   Weight   Price per unit in 1995   Price per unit in 1996
A 40 16.00 20.00
B 25 40.00 60.00
C 5 0.50 0.50
D 20 5.12 6.25
E 10 2.00 1.50
10) From the data given below, construct a cost of living index number by the family budget method for 1986 with 1976 as the base year.
Commodity                         P    Q    R    S    T    U
Quantity in base year 1976       50   25   10   20   30   40
Price per unit in 1976 (Rs.)     10    5    8    7    9    6
Price per unit in 1986 (Rs.)      6    4    3    8   10   12
10.5 STATISTICAL QUALITY CONTROL (SQC)
Every product is manufactured for a specific purpose. If the product meets the specifications required for its intended use, it is of good quality; if not, its quality is considered poor.
It is well known that no repetitive process, however carefully arranged, produces exactly identical results; some variability is always present. Even when commodities are manufactured by highly specialised machines, it is not unusual to find differences between units of production. For example, in the manufacture of corks, bottles, etc., some differences may be noticed between units even though highly efficient machines are used. If the differences are small, they can be ignored and the product can be passed as within specifications. But if they are beyond certain limits, the article has to be rejected and the cause of the variation has to be investigated.
10.5.1 Causes for variation
Variation occurs due to two types of causes, namely
(i) Chance causes (ii) Assignable causes
(i) Chance causes
If the variation occurs due to some inherent pattern of variation and no cause can be assigned to it, it is called chance or random variation. Chance variation is tolerable, permissible and inevitable, and does not materially affect the quality of a product.
(ii) Assignable causes
Causes due to faulty processes and procedures are known as assignable causes. The variation due to assignable causes is non-random in nature. Chance causes cannot be detected; however, assignable causes can be detected and corrected.
10.5.2 Role and advantages of SQC
The role of statistical quality control is to collect and analyse relevant data in order to detect whether the process is under control or not.
The value of quality control lies in the fact that assignable causes in a process can be detected quickly. In fact, the variations are often discovered before the product becomes defective.
SQC is a well-accepted and widespread technique, on the basis of which it is possible to understand the principles and methods by which decisions are made based on variation.
Statistical quality control is only diagnostic. It tells us whether the standard is being maintained or not. The remedial action rests with the technicians, who employ techniques for maintaining uniform quality in a continuous flow of manufactured products.
SQC serves two purposes, namely (a) process control and (b) product control.
In process control, an attempt is made to find out whether a particular process is under control or not. Process control helps in studying future performance.
10.5.3 Process and Product control
The main objective in any production process is to control and maintain the quality of the manufactured product so that it conforms to specified quality standards. In other words, we want to ensure that the proportion of defective items in the manufactured product is not too large. This is called process control and is achieved through the technique of control charts.
On the other hand, by product control we mean controlling the quality of the product by critical examination at strategic points; this is achieved through product control plans pioneered by Dodge and Romig. Product control aims at guaranteeing a certain quality level to the consumer regardless of the quality level being maintained by the producer. In other words, it attempts to ensure that the product marketed by the sales department does not contain a large number of defective items.
10.5.4 Control Charts
The statistical tool applied in process control is the control chart. Control charts are devices for describing patterns of variation. Control charts were developed by the physicist Walter A. Shewhart of the Bell Telephone Company in 1924. He suggested that the control chart may serve, first, to define the goal or standard for the process that management might strive to attain; secondly, as an instrument to attain that goal; and thirdly, as a means of judging whether the goal is being achieved. Thus the control chart is an instrument to be used in specification, production and inspection, and is the core of statistical quality control.
A control chart is essentially a graphic device for presenting data so as to reveal directly the frequency and extent of variations from established standards or goals. Control charts are simple to construct and easy to interpret, and they tell the manager at a glance whether or not the process is in control, i.e. within the tolerance limits.
In general a control chart consists of three horizontal lines
(i) A central line to indicate the desired standard or level
of the process (CL)
(ii) Upper control limit (UCL) and
(iii) Lower control limit (LCL)
Outline of a control chart
From time to time a sample is taken and the data are plotted on the graph. If all the sample points fall within the upper and lower control limits, it is assumed that the process is in control and that only chance causes are present. When a sample point falls outside the control limits, it is assumed that the variations are due to assignable causes.
Types of Control Charts
Broadly speaking, control charts can be divided under two
heads.
(i) Control charts of Variables
(ii) Control charts of Attributes
Control charts of variables are concerned with measurable data on quality characteristics, which are usually continuous in nature. Such data are studied with the $\bar{X}$ and R charts.
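(Fig. 10.9: Outline of a control chart; quality scale against sample number, with the central line CL, the upper control limit UCL at +3σ and the lower control limit LCL at -3σ; points beyond the limits are marked "out of control")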
Control charts of attributes, namely the c, np and p charts, are concerned with data on quality characteristics which are not amenable to measurement, i.e. attributes (product defective or non-defective).
In this chapter, we consider only the control charts of variables, namely the $\bar{X}$ chart and the R chart.
R-Chart (Range chart)
The R chart is used to show the variability or dispersion of the quality produced by a given process. The R chart is the companion chart to the $\bar{X}$ chart, and both are usually required for an adequate analysis of the production process under study. The R chart is generally presented along with the $\bar{X}$ chart, and the general procedure for constructing it is similar to that for the $\bar{X}$ chart. The values required for constructing the R chart are:
(i) the range of each sample, R;
(ii) the mean of the sample ranges, $\bar{R}$;
(iii) the control limits, which are set at
U.C.L. $= D_4\bar{R}$
L.C.L. $= D_3\bar{R}$
The values of $D_4$ and $D_3$ can be obtained from tables.
$\bar{X}$ Chart
The $\bar{X}$ chart is used to show the quality averages of the samples drawn from a given process. The following values must be computed before an $\bar{X}$ chart is constructed.
1) Obtain the mean of each sample, $\bar{X}_i$, i = 1, 2, ..., n.
2) Obtain the mean of the sample means,
i.e. $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_n}{n}$, where n is the number of samples.
3) The control limits are set at
U.C.L. $= \bar{\bar{X}} + A_2\bar{R}$
L.C.L. $= \bar{\bar{X}} - A_2\bar{R}$, where $\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i$ and $R_i$ are the sample ranges.
The values of $A_2$ for different sample sizes can be obtained from tables.
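The computations for both charts can be collected in one small Python sketch. It is illustrative only. The function names are hypothetical, k denotes the number of samples, and the constants $A_2$, $D_3$ and $D_4$ must still be read from tables for the sample size in use. A process is judged in control here simply when every plotted point lies within its limits, as described above. The data are those of Example 29.

```python
def control_limits(sample_means, sample_ranges, A2, D3, D4):
    """Return ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart)."""
    k = len(sample_means)                   # number of samples
    x_double_bar = sum(sample_means) / k    # mean of the sample means
    r_bar = sum(sample_ranges) / k          # mean of the sample ranges
    xbar_chart = (x_double_bar - A2 * r_bar, x_double_bar, x_double_bar + A2 * r_bar)
    r_chart = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_chart, r_chart


def all_within(points, limits):
    lcl, _, ucl = limits
    return all(lcl <= p <= ucl for p in points)


# Example 29: ten samples of size 5 (A2 = 0.577, D3 = 0, D4 = 2.115)
means = [11.2, 11.8, 10.8, 11.6, 11.0, 9.6, 10.4, 9.6, 10.6, 10.0]
ranges = [7, 4, 8, 5, 7, 4, 8, 4, 7, 9]
xbar_chart, r_chart = control_limits(means, ranges, A2=0.577, D3=0, D4=2.115)
print("X-bar chart (LCL, CL, UCL):", [round(v, 3) for v in xbar_chart])
print("R chart (LCL, CL, UCL):", [round(v, 3) for v in r_chart])
print("In control:", all_within(means, xbar_chart) and all_within(ranges, r_chart))
```

Run with the sample means and ranges of Example 28 and its constants ($A_2 = 0.483$, $D_3 = 0$, $D_4 = 2.004$), the same functions flag the range of sample 5 as lying above the UCL of the R chart.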
Example 28
The following data relate to the life (in hours) of 10 samples of 6 electric bulbs each, drawn at intervals of one hour from a production process. Draw the control charts for $\bar{X}$ and R and comment.
Sample No. Life time (in hours)
1 620 687 666 689 738 686
2 501 585 524 585 653 668
3 673 701 686 567 619 660
4 646 626 572 628 631 743
5 494 984 659 643 660 640
6 634 755 625 582 683 555
7 619 710 664 693 770 534
8 630 723 614 535 550 570
9 482 791 533 612 497 499
10 706 524 626 503 661 754
(Given for n = 6: $A_2 = 0.483$, $D_3 = 0$, $D_4 = 2.004$)
Solution :
Sample No.   Total   Sample mean $\bar{X}$   Sample range R
1 4086 681 118
2 3516 586 167
3 3906 651 134
4 3846 641 171
5 4080 680 490
6 3834 639 200
7 3990 665 236
8 3622 604 188
9 3414 569 309
10 3774 629 251
Total 6345 2264
Central line X = mean of the sample means = 634.5
R = mean of the sample ranges = 226.4
U.C.L. = X+ A
2
R
= 634.5 + 0.483 x 226.4
= 634.5 + 109.35 = 743.85
L.C.L. = X- A
2
R
= 634.5 - 0.483 x 226.4
= 634.5 - 109.35 = 525.15
Central line R = 226.4
U.C.L. = D
4
R = 2.004 x 226.4
= 453.7056
L.C.L. = D
3
R = 0 x 226.4 = 0
(Fig. 10.10) $\bar{X}$ chart: sample mean plotted against sample no., with UCL, CL and LCL.
(Fig. 10.11) R chart: sample range plotted against sample no., with UCL, CL and LCL.
Conclusion :
All the sample means lie within the control limits of the $\bar{X}$ chart,
but the range of sample 5 (490) lies above the UCL of the R chart
(453.71). Hence the process is not in control.
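The whole of Example 28 can be checked in Python starting from the raw life times. This is a sketch rather than the textbook's own presentation: it rounds each sample mean to the nearest hour so that the results match the solution table, and the variable names are illustrative.

```python
from statistics import mean

# Life times (hours) of the 6 bulbs in each of the 10 samples of Example 28
samples = [
    [620, 687, 666, 689, 738, 686],
    [501, 585, 524, 585, 653, 668],
    [673, 701, 686, 567, 619, 660],
    [646, 626, 572, 628, 631, 743],
    [494, 984, 659, 643, 660, 640],
    [634, 755, 625, 582, 683, 555],
    [619, 710, 664, 693, 770, 534],
    [630, 723, 614, 535, 550, 570],
    [482, 791, 533, 612, 497, 499],
    [706, 524, 626, 503, 661, 754],
]
A2, D3, D4 = 0.483, 0, 2.004  # tabulated constants for sample size 6

# Sample means rounded to whole hours (as in the solution table) and ranges
xbars = [round(mean(s)) for s in samples]
ranges = [max(s) - min(s) for s in samples]

grand_mean = mean(xbars)  # central line of the X-bar chart
r_bar = mean(ranges)      # central line of the R chart

x_lcl, x_ucl = grand_mean - A2 * r_bar, grand_mean + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar

# Sample numbers (1-based) falling outside each chart's limits
x_out = [i for i, x in enumerate(xbars, 1) if not x_lcl <= x <= x_ucl]
r_out = [i for i, r in enumerate(ranges, 1) if not r_lcl <= r <= r_ucl]

print("X-bar chart:", round(x_lcl, 2), grand_mean, round(x_ucl, 2), x_out)
print("R chart:", r_lcl, r_bar, round(r_ucl, 4), r_out)
```

The script flags sample 5 on the R chart and no sample on the $\bar{X}$ chart, matching the conclusion above. Without rounding the sample means, the grand mean is 38068/60 ≈ 634.47 instead of 634.5; the limits shift by only a few hundredths and the conclusion is unchanged.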
Example 29
The following data show the values of the sample mean $\bar{X}$
and the range $R$ for ten samples of size 5 each. Calculate the
values of the central line and the control limits for the mean chart and the
range chart, and determine whether the process is in control.
Sample no.     1     2     3     4     5     6     7     8     9     10
Mean $\bar{X}$   11.2  11.8  10.8  11.6  11.0  9.6   10.4  9.6   10.6  10.0
Range $R$      7     4     8     5     7     4     8     4     7     9
(Given for n = 5: $A_2 = 0.577$, $D_3 = 0$, $D_4 = 2.115$)
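Solution :
Control limits for the $\bar{X}$ chart
$$\bar{\bar{X}} = \frac{1}{n}\sum \bar{X} = \frac{1}{10}(11.2 + 11.8 + 10.8 + \cdots + 10.0) = \frac{106.6}{10} = 10.66$$
$$\bar{R} = \frac{1}{n}\sum R = \frac{1}{10}(63) = 6.3$$
$$\text{U.C.L.} = \bar{\bar{X}} + A_2\bar{R} = 10.66 + (0.577 \times 6.3) \approx 14.295$$
$$\text{L.C.L.} = \bar{\bar{X}} - A_2\bar{R} = 10.66 - (0.577 \times 6.3) \approx 7.025$$
$$\text{C.L.} = \bar{\bar{X}} = 10.66$$
Range chart
$$\text{U.C.L.} = D_4\bar{R} = 2.115 \times 6.3 = 13.3245$$
$$\text{L.C.L.} = D_3\bar{R} = 0$$
$$\text{C.L.} = \bar{R} = 6.3$$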
Conclusion :
Since all the sample means and all the sample ranges lie within their
control limits, the process is in control.
(Fig. 10.12) $\bar{X}$ chart: sample mean plotted against sample no., with UCL, CL and LCL.
(Fig. 10.13) R chart: sample range plotted against sample no., with UCL, CL and LCL.
EXERCISE 10.5
1) The following are the $\bar{X}$ and $R$ values for 20 samples of 5
readings each. Draw the $\bar{X}$ chart and the $R$ chart and write your
conclusion.
Samples     1     2     3     4     5     6     7     8     9     10
$\bar{X}$   34    31.6  30.8  33    35    33.2  33    32.6  33.8  37.8
$R$         4     4     2     3     5     2     5     13    19    6
Samples     11    12    13    14    15    16    17    18    19    20
$\bar{X}$   35.8  38.4  34    35    38.8  31.6  33    28.2  31.8  35.6
$R$         4     4     14    4     7     5     5     3     9     6
(Given for n = 5: $A_2 = 0.58$, $D_3 = 0$, $D_4 = 2.12$)
2) From the following data, draw the $\bar{X}$ and $R$ charts and write your
conclusion.
Sample no. 1 2 3 4 5 6 7 8 9 10
140 138 139 143 142 136 142 143 141 142
143 143 133 141 142 144 147 137 142 137
137 143 147 137 145 143 137 145 147 145
134 145 148 138 135 136 142 137 140 140
135 146 139 140 136 137 138 138 140 132
Sample no. 11 12 13 14 15 16 17 18 19 20
137 137 142 137 144 140 137 137 142 136
147 146 142 145 142 132 137 142 142 142
142 142 139 144 143 144 142 142 143 140
137 142 141 137 135 145 143 145 140 139
135 140 142 140 144 141 141 143 135 137
(Given for n = 5: $A_2 = 0.58$, $D_3 = 0$, $D_4 = 2.12$)
3) From the following data, construct the $\bar{X}$ and $R$ charts and write
your conclusion.
Sample no. 1 2 3 4 5 6 7 8 9
46 41 43 37 37 37 44 35 37
40 42 40 40 40 38 39 39 44
48 49 46 47 46 49 43 48 48
Sample no. 10 11 12 13 14 15 16 17 18
45 48 36 40 42 38 47 42 47
43 44 42 39 40 40 44 45 42
49 48 48 48 48 48 49 37 49
(Given for n = 3: $A_2 = 1.02$, $D_3 = 0$, $D_4 = 2.58$)
EXERCISE 10.6
Choose the correct answer
1) A time series is a set of data recorded
(a) periodically (b) at equal time intervals
(c) at successive points of time (d) all the above
2) A time series consists of
(a) two components (b) three components
(c) four components (d) none of these
3) The component of a time series attached to long-term
variation is termed
(a) cyclic variations (b) secular trend
(c) irregular variation (d) all the above
4) The component of a time series which is attached to short-term
fluctuations is
(a) seasonal variation (b) cyclic variation
(c) irregular variation (d) all the above
5) Cyclic variations in a time series are caused by
(a) lock out in a factory (b) war in a country
(c) floods in the states (d) none of these
6) The terms prosperity, recession, depression and recovery are
in particular attached to
(a) Secular trend (b) seasonal fluctuation
(c) cyclic movements (d) irregular variation
7) An additive model of time series with the components T, S,
C and I is
(a) Y = T + S + C − I (b) Y = T + S × C + I
(c) Y = T + S + C + I (d) Y = T + S + C × I
8) A decline in the sales of ice cream during November to March
is associated with
(a) Seasonal variation (b) cyclical variation
(c) random variation (d) secular trend
9) An index number is
(a) a measure of relative changes
(b) a special type of average
(c) a percentage relative
(d) all the above.
10) Index numbers are expressed
(a) in percentages (b) in ratios
(c) in terms of absolute value (d) all the above
11) Most commonly used index numbers are
(a) Diffusion index number (b) price index number
(c) value index number (d) none of these
12) Most frequently used index number formulae are
(a) weighted formulae (b) Unweighted formulae
(c) fixed weighted formulae (d) none of these
13) Laspeyres index formula uses the weights of the
(a) base year quantities (b) current year prices
(c) average of the weights of number of years
(d) none of these
14) The weights used in Paasche's formula belong to
(a) the base period (b) the current period
(c) to any arbitrary chosen period (d) none of these
15) Variation in the items produced in a factory may be due to
(a) chance causes (b) assignable causes
(c) both (a) and (b) (d) neither (a) nor (b)
16) Chance variation in the manufactured product is
(a) controllable (b) not controllable
(c) both (a) and (b) (d) none of these
17) The causes leading to vast variation in the specification of a
product are usually due to
(a) random process (b) assignable causes
(c) non-traceable causes (d) all the above
18) Variation due to assignable causes in the product occurs due to
(a) faulty process (b) carelessness of operators
(c) poor quality of raw material (d) all the above
19) Control charts in statistical quality control consist of
(a) three control lines
(b) upper and lower control limits
(c) the level of process
(d) all the above
20) The range of correlation co-efficient is
(a) 0 to ∞ (b) 10
(c) −1 to 1 (d) none of these
21) If X and Y are two variates, there can be at most
(a) one regression line (b) two regression lines
(c) three regression lines (d) none of these
22) In a regression line of Y on X, the variable X is known as
(a) independent variable (b) dependent variable
(c) both (a) and (b) (d) none of these
23) A scatter diagram of the variate values (X, Y) gives an idea
about
(a) functional relationship (b) regression model
(c) distribution of errors (d) none of these
24) The lines of regression intersect at the point
(a) (X, Y) (b) ($\bar{X}$, $\bar{Y}$)
(c) (0, 0 ) (d) none of these
25) The term regression was introduced by
(a) R.A.Fisher (b) Sir Francis Galton
(c) Karl Pearson (d) none of these.