Research Methodology Part 3 - Shrivastava - Ibrg
CONTENTS
Objectives
Introduction
11.1 Definitions and Characteristics of Index Numbers
11.2 Uses of Index Numbers
11.3 Construction of Index Numbers
11.4 Price Index Numbers
11.4.1 Use of Price Index Numbers in Deflating
11.5 Quantity Index Numbers
11.6 Consumer Price Index Number
11.6.1 Construction of Consumer Price Index
11.6.2 Uses of Consumer Price Index
11.7 Problems in the Construction of Index Numbers
11.8 Limitations of Index Numbers
11.9 Summary
11.10 Keywords
11.11 Review Questions
11.12 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
An index number is a statistical measure used to compare the average level of magnitude of a
group of distinct but related variables in two or more situations. Suppose that we want to
compare the average price level of different items of food in 1992 with what it was in 1990. Let
the different items of food be wheat, rice, milk, eggs, ghee, sugar, pulses, etc. If the prices of all
these items change in the same ratio and in the same direction, say the prices of all the items
have increased by 10% in 1992 as compared with their prices in 1990, then there will be no
difficulty in finding out the average change in price level for the group as a whole. Obviously,
the average price level of all the items taken as a group will also be 10% higher in 1992 as
compared with prices of 1990. However, in real situations, neither the prices of all the items
change in the same ratio nor in the same direction, i.e., the prices of some commodities may
change to a greater extent as compared to prices of other commodities. Moreover, the price of
some commodities may rise while that of others may fall. For such situations, index numbers
are a very useful device for measuring the average change in prices or any other characteristic
like quantity, value, etc. for the group as a whole.
1. “An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations.”
2. “An index number is a special type of average that provides a measurement of relative
changes from time to time or from place to place.”
3. “Index number shows by its variation the changes in a magnitude which is not susceptible
either of accurate measurement in itself or of direct valuation in practice.”
— Edgeworth
4. “An index number is a single ratio (usually in percentage) which measures the combined
(i.e., averaged) change of several variables between two different times, places or
situations.”
— Tuttle
On the basis of the above definitions, the following characteristics of index numbers are worth
mentioning:
1. Index numbers are specialised averages: An average of data is its
representative summary figure. In a similar way, an index number is also an average,
often a weighted average, computed for a group. It is called a specialised average because
the figures that are averaged are not necessarily expressed in homogeneous units.
2. Index numbers measure the changes for a group which are not capable of being directly
measured: The examples of such magnitudes are: Price level of a group of items, level of
business activity in a market, level of industrial or agricultural output in an economy, etc.
3. Index numbers are expressed in terms of percentages: The changes in magnitude of a group
are expressed in terms of percentages which are independent of the units of measurement.
This facilitates the comparison of two or more index numbers in different situations.
Self Assessment
1. Index numbers are called a specialised average because the figures, that are averaged, are
not necessarily expressed in ………………..units.
1. To measure and compare changes: The basic purpose of the construction of an index number
is to measure the level of activity of phenomena like price level, cost of living, level of
agricultural production, level of business activity, etc. It is because of this reason that
sometimes index numbers are termed as barometers of economic activity. It may be
mentioned here that a barometer is an instrument which is used to measure atmospheric
pressure in physics.
The level of an activity can be expressed in terms of index numbers at different points of
time or for different places at a particular point of time. These index numbers can be easily
compared to determine the trend of the level of an activity over a period of time or with
reference to different places.
2. To help in providing guidelines for framing suitable policies: Index numbers are
indispensable tools for the management of any government or non-government
organisation.
Example: The increase in cost of living index is helpful in deciding the amount of
additional dearness allowance that should be paid to the workers to compensate them for the
rise in prices. In addition to this, index numbers can be used in planning and formulation of
various government and business policies.
3. Price index numbers are used in deflating: This is a very important use of price index
numbers. These index numbers can be used to adjust monetary figures of various periods
for changes in prices.
Example: The figure of national income of a country is computed on the basis of the
prices of the year in question. Such figures for various years, often known as national income at
current prices, do not reveal the real change in the level of production of goods and services. In
order to know the real change in national income, these figures must be adjusted for price
changes in various years. Such adjustments are possible only by the use of price index numbers
and the process of adjustment, in a situation of rising prices, is known as deflating.
4. To measure purchasing power of money: We know that there is inverse relation between
the purchasing power of money and the general price level measured in terms of a price
index number. Thus, reciprocal of the relevant price index can be taken as a measure of the
purchasing power of money.
Self Assessment
To illustrate the construction of an index number, we reconsider various items of food mentioned
earlier. Let the prices of different items in the two years, 1990 and 1992, be as given below:
The comparison of price of an item, say wheat, in 1992 with its price in 1990 can be done in two
ways, explained below:
1. By taking the difference of prices in the two years, i.e., 360 - 300 = 60, one can say that the
price of wheat has gone up by 60/quintal in 1992 as compared with its price in 1990.
2. By taking the ratio of the two prices, i.e., $\frac{360}{300} = 1.20$, one can say that if the price of wheat
in 1990 is taken to be 1, then it has become 1.20 in 1992. A more convenient way of
comparing the two prices is to express the price ratio in terms of percentage, i.e.,
$\frac{360}{300} \times 100 = 120$, known as the Price Relative of the item. In our example, the price relative of
wheat is 120, which can be interpreted as the price of wheat in 1992 when its price in 1990
is taken as 100. Further, the figure 120 indicates that the price of wheat has gone up by 120 – 100
= 20% in 1992 as compared with its price in 1990.
The first way of expressing the price change is inconvenient because the change in price depends
upon the units in which it is quoted. This problem is taken care of in the second method, where
price change is expressed in terms of percentage. An additional advantage of this method is that
various price changes, expressed in percentage, are comparable. Further, it is very easy to grasp
the 20% increase in price rather than the increase expressed as 60/quintal.
For the construction of index number, we have to obtain the average price change for the group
in 1992, usually termed as the Current Year, as compared with the price of 1990, usually called
the Base Year. This comparison can be done in two ways:
1. By taking suitable average of price relatives of different items. The methods of index
number construction based on this procedure are termed as Average of Price Relative
Methods.
2. By taking ratio of the averages of the prices of different items in each year. These methods
are popularly known as Aggregative Methods.
Since the average in each of the above methods can be simple or weighted, these can further be
divided as simple or weighted. Various methods of index number construction can be classified
as shown below:
In addition to this, a particular method would depend upon the type of average used. Although the
geometric mean is more suitable for averaging ratios, the arithmetic mean is often preferred because
of its simplicity with regard to computations and interpretation.
Current Year: The year under consideration for which the comparisons are to be computed
is called the current year. It is commonly denoted by writing ‘1’ as a subscript of the
variable.
Let there be n items in a group which are numbered from 1 to n. Let $p_{0i}$ denote the price of
the i th item in the base year and $p_{1i}$ denote its price in the current year, where i = 1, 2, ...... n. In a
similar way, $q_{0i}$ and $q_{1i}$ will denote the quantities of the i th item in the base and current years
respectively.
Using these notations, we can write an expression for the price relative of the i th item as
$P_i = \frac{p_{1i}}{p_{0i}} \times 100$ and the quantity relative of the i th item as $Q_i = \frac{q_{1i}}{q_{0i}} \times 100$.
Further, $P_{01}$ will be used to denote the price index number of period ‘1’ as compared with
the prices of period ‘0’. Similarly, $Q_{01}$ and $V_{01}$ would denote the quantity and the value
index numbers respectively of period ‘1’ as compared with period ‘0’.
Self Assessment
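The index number formula, using the arithmetic mean, is given by
$$P_{01} = \frac{\sum P_i}{n} \quad \text{or} \quad P_{01} = \frac{\sum \frac{p_{1i}}{p_{0i}} \times 100}{n}$$
Omitting the subscript i, the above formula can also be written as
$$P_{01} = \frac{\sum \frac{p_1}{p_0} \times 100}{n}$$
Using the geometric mean, the index number formula is given by
$$P_{01} = \left(P_1 \times P_2 \times \dots \times P_n\right)^{\frac{1}{n}} = \left(\prod_{i=1}^{n} P_i\right)^{\frac{1}{n}} = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log P_i}{n}\right]$$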
Example: Given below are the prices of 5 items in 1985 and 1990. Compute the simple price
index number of 1990 taking 1985 as base year. Use (a) arithmetic mean and (b) geometric mean.
Solution:
Calculation Table

| Item | Price in 1985 (p0i) | Price in 1990 (p1i) | Price Relative Pi = (p1i/p0i) × 100 | log Pi |
|---|---|---|---|---|
| 1 | 15 | 20 | 133.33 | 2.1249 |
| 2 | 8 | 7 | 87.50 | 1.9420 |
| 3 | 200 | 300 | 150.00 | 2.1761 |
| 4 | 60 | 110 | 183.33 | 2.2632 |
| 5 | 100 | 130 | 130.00 | 2.1139 |
| Total | | | 684.16 | 10.6201 |

$\therefore$ Index number, using A.M., is $P_{01} = \frac{684.16}{5} = 136.83$ and Index number, using G.M., is
$P_{01} = \text{Antilog}\left[\frac{10.6201}{5}\right] = 133.06$
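The calculation above can be checked with a short script. This is a minimal sketch, not part of the original text: the function names (`price_relatives`, `simple_am_index`, `simple_gm_index`) are illustrative assumptions, and base-10 logarithms stand in for the log tables used in the worked solution.

```python
import math

# Prices of the five items from the example (1985 = base year, 1990 = current year).
base_prices = [15, 8, 200, 60, 100]
current_prices = [20, 7, 300, 110, 130]


def price_relatives(p0, p1):
    """Return the price relative (p1 / p0) * 100 for each item."""
    return [cur / base * 100 for base, cur in zip(p0, p1)]


def simple_am_index(p0, p1):
    """Simple arithmetic mean of price relatives."""
    rel = price_relatives(p0, p1)
    return sum(rel) / len(rel)


def simple_gm_index(p0, p1):
    """Simple geometric mean of price relatives, computed via base-10 logs."""
    rel = price_relatives(p0, p1)
    mean_log = sum(math.log10(r) for r in rel) / len(rel)
    return 10 ** mean_log  # taking the antilog


am_index = simple_am_index(base_prices, current_prices)
gm_index = simple_gm_index(base_prices, current_prices)
print(round(am_index, 2), round(gm_index, 2))
```

The geometric-mean index is never larger than the arithmetic-mean index for the same price relatives, which is one reason the two methods give different answers here.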
In the method of simple average of price relatives, all the items are assumed to be of equal
importance in the group. However, in most real life situations, different items of a group
have different degrees of importance. In order to take this into account, weighting of different
items, in proportion to their degree of importance, becomes necessary.
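Let $w_i$ be the weight assigned to the i th item (i = 1, 2, ...... n). Thus, the index number, given by
the weighted arithmetic mean of price relatives, can be written as
$$P_{01} = \frac{\sum P_i w_i}{\sum w_i}$$
Similarly, the index number, given by the weighted geometric mean of price relatives, can be
written as follows:
$$P_{01} = \left[P_1^{w_1} \cdot P_2^{w_2} \cdots P_n^{w_n}\right]^{\frac{1}{\sum w_i}} = \left[\prod P_i^{w_i}\right]^{\frac{1}{\sum w_i}} \quad \text{or} \quad P_{01} = \text{Antilog}\left[\frac{\sum w_i \log P_i}{\sum w_i}\right]$$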
Nature of Weights
While taking weighted average of price relatives, the values are often taken as weights. These
weights can be the values of base year quantities valued at base year prices, i.e., $p_{0i}q_{0i}$, or the
values of current year quantities valued at current year prices, i.e., $p_{1i}q_{1i}$, or the values of current
year quantities valued at base year prices, i.e., $p_{0i}q_{1i}$, etc., or any other value.
Example: Construct an index number for 1989 taking 1981 as base for the following data, by
using
1. Weighted arithmetic mean of price relatives and
2. Weighted geometric mean of price relatives.

| Commodities | Prices in 1981 | Prices in 1989 | Weights |
|---|---|---|---|
| A | 60 | 100 | 30 |
| B | 20 | 20 | 20 |
| C | 40 | 60 | 24 |
| D | 100 | 120 | 30 |
| E | 120 | 80 | 10 |
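Solution:
Calculation Table

| Commodity | Price in 1981 (p0i) | Price in 1989 (p1i) | Weight (wi) | Price Relative Pi = (p1i/p0i) × 100 | Pi wi | log Pi | wi log Pi |
|---|---|---|---|---|---|---|---|
| A | 60 | 100 | 30 | 166.67 | 5000.1 | 2.2219 | 66.657 |
| B | 20 | 20 | 20 | 100.00 | 2000.0 | 2.0000 | 40.000 |
| C | 40 | 60 | 24 | 150.00 | 3600.0 | 2.1761 | 52.226 |
| D | 100 | 120 | 30 | 120.00 | 3600.0 | 2.0792 | 62.376 |
| E | 120 | 80 | 10 | 66.67 | 666.7 | 1.8239 | 18.239 |
| Total | | | 114 | | 14866.8 | | 239.498 |

$\therefore$ Index number using A.M. is $P_{01} = \frac{14866.8}{114} = 130.41$ and index number using G.M. is
$P_{01} = \text{Antilog}\left[\frac{239.498}{114}\right] = 126.15$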
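The weighted averages in this example can be reproduced as follows. This is a hedged sketch: the helper names are assumptions introduced for illustration, and the data are the 1981 and 1989 prices and weights given in the example.

```python
import math

# Data from the example: commodities A to E.
prices_1981 = [60, 20, 40, 100, 120]   # base year prices
prices_1989 = [100, 20, 60, 120, 80]   # current year prices
weights = [30, 20, 24, 30, 10]


def weighted_am_index(p0, p1, w):
    """Weighted arithmetic mean of price relatives: sum(P*w) / sum(w)."""
    rel = [cur / base * 100 for base, cur in zip(p0, p1)]
    return sum(r * wt for r, wt in zip(rel, w)) / sum(w)


def weighted_gm_index(p0, p1, w):
    """Weighted geometric mean of price relatives: antilog(sum(w*log P) / sum(w))."""
    rel = [cur / base * 100 for base, cur in zip(p0, p1)]
    mean_log = sum(wt * math.log10(r) for r, wt in zip(rel, w)) / sum(w)
    return 10 ** mean_log


wam = weighted_am_index(prices_1981, prices_1989, weights)
wgm = weighted_gm_index(prices_1981, prices_1989, weights)
print(round(wam, 2), round(wgm, 2))
```

Computed at full floating-point precision, the geometric-mean figure may differ in the last decimal place from a value worked out with four-figure logarithm tables.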
Task Taking 1983 as base year, calculate an index number of prices for 1990, for the
following data given in appropriate units, using:
1. Weighted arithmetic mean of price relatives by taking weights as the values of
current year quantities at base year prices, and
2. Weighted geometric mean of price relatives by taking weights as the values of base
year quantities at base year prices.
In this method, the simple arithmetic mean of the prices of all the items of the group for the
current as well as for the base year are computed separately. The ratio of current year average to
base year average multiplied by 100 gives the required index number.
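Using notations, the arithmetic mean of prices of n items in the current year is given by $\frac{\sum p_{1i}}{n}$
and the arithmetic mean of prices in the base year is given by $\frac{\sum p_{0i}}{n}$. Thus, the simple aggregative price index number is
$$P_{01} = \frac{\sum p_{1i}}{\sum p_{0i}} \times 100$$
Omitting the subscript i, the above index number can also be written as
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$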
Example: The following table gives the prices of six items in the years 1980 and 1981. Use
simple aggregative method to find index of 1981 with 1980 as base.
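| Item | Price in 1980 | Price in 1981 |
|---|---|---|
| A | 40 | 50 |
| B | 60 | 60 |
| C | 20 | 30 |
| D | 50 | 70 |
| E | 80 | 90 |
| F | 100 | 100 |

Solution:
Let $p_0$ be the price in 1980 and $p_1$ be the price in 1981. Thus, we have $\sum p_0 = 350$ and $\sum p_1 = 400$.
$$\therefore P_{01} = \frac{400}{350} \times 100 = 114.29$$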
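The simple aggregative method reduces to a ratio of two column totals, as the following minimal sketch shows. The function name `simple_aggregative_index` is an assumption made for illustration.

```python
# Prices of items A to F from the example.
prices_1980 = [40, 60, 20, 50, 80, 100]   # base year
prices_1981 = [50, 60, 30, 70, 90, 100]   # current year


def simple_aggregative_index(p0, p1):
    """Ratio of the sum of current prices to the sum of base prices, times 100."""
    return sum(p1) / sum(p0) * 100


simple_index = simple_aggregative_index(prices_1980, prices_1981)
print(round(simple_index, 2))
```

Because the prices are added directly, an item quoted in a large unit dominates the totals; this is the main reason weighted methods are usually preferred.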
This index number is defined as the ratio of the weighted arithmetic means of current to base
year prices multiplied by 100.
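Using the notations defined earlier, the weighted arithmetic mean of current year prices can be
written as $\frac{\sum p_{1i} w_i}{\sum w_i}$. Similarly, the weighted arithmetic mean of base year prices can be written as $\frac{\sum p_{0i} w_i}{\sum w_i}$. Thus, the price index number is
$$P_{01} = \frac{\sum p_{1i} w_i}{\sum p_{0i} w_i} \times 100$$
Omitting the subscript, we can also write $P_{01} = \frac{\sum p_1 w}{\sum p_0 w} \times 100$.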
Nature of Weights
In case of weighted aggregative price index numbers, quantities are often taken as weights.
These quantities can be the quantities purchased in base year or in current year or an average of
base year and current year quantities or any other quantities. Depending upon the choice of
weights, some of the popular formulae for weighted index numbers can be written as follows:
1. Laspeyres' Index: Laspeyres' price index number uses base year quantities as weights.
Thus, we can write
$$P_{01}^{La} = \frac{\sum p_{1i} q_{0i}}{\sum p_{0i} q_{0i}} \times 100 \quad \text{or} \quad P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$
2. Paasche's Index: This index number uses current year quantities as weights. Thus, we can
write
$$P_{01}^{Pa} = \frac{\sum p_{1i} q_{1i}}{\sum p_{0i} q_{1i}} \times 100 \quad \text{or} \quad P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$
3. Fisher's Ideal Index: As will be discussed later, the Laspeyres' Index has an upward
bias and the Paasche's Index has a downward bias. In view of this, Fisher suggested that an
ideal index should be the geometric mean of Laspeyres' and Paasche's indices. Thus, the
Fisher's formula can be written as follows:
$$P_{01}^{F} = \sqrt{P_{01}^{La} \times P_{01}^{Pa}} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$
If we write $L = \frac{\sum p_1 q_0}{\sum p_0 q_0}$ and $P = \frac{\sum p_1 q_1}{\sum p_0 q_1}$, the Fisher's Ideal Index can also be written as
$P_{01}^{F} = \sqrt{L \times P} \times 100$.
4. Drobisch and Bowley's Index: This index number is constructed by taking the arithmetic
mean of the Laspeyres' and Paasche's indices.
$$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 + \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100\right] = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100 = \frac{1}{2}\left[L + P\right] \times 100$$
5. Marshall and Edgeworth's Index: This index number uses the arithmetic mean of base and
current year quantities as weights.
$$P_{01}^{ME} = \frac{\sum p_1 \left(\frac{q_0 + q_1}{2}\right)}{\sum p_0 \left(\frac{q_0 + q_1}{2}\right)} \times 100 = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$
6. Walsh's Index: The geometric mean of base and current year quantities is used as the weight in
this index number.
$$P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$$
7. Kelly's Fixed Weights Aggregative Index: The weights, in this index number, are quantities
which may not necessarily relate to base or current year. The weights, once decided,
remain fixed for all periods. The main advantage of this index over Laspeyres' index is
that weights do not change with change of base year. Using symbols, the Kelly's Index can
be written as
$$P_{01}^{Ke} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$$
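The weighted aggregative formulae listed above differ only in the quantities used as weights, which a small sketch makes concrete. The function name, the dictionary keys and the three-item data set below are hypothetical and chosen only for illustration; they are not taken from a worked example in this section.

```python
import math


def weighted_aggregative_indices(p0, p1, q0, q1):
    """Return several weighted aggregative price indices as a dict."""
    def agg(prices, wts):
        # Sum of price * weight over all items.
        return sum(p * w for p, w in zip(prices, wts))

    laspeyres = agg(p1, q0) / agg(p0, q0) * 100        # base year quantities
    paasche = agg(p1, q1) / agg(p0, q1) * 100          # current year quantities
    q_sum = [a + b for a, b in zip(q0, q1)]            # factor 1/2 cancels in the ratio
    q_geo = [math.sqrt(a * b) for a, b in zip(q0, q1)]  # geometric mean of quantities
    return {
        "laspeyres": laspeyres,
        "paasche": paasche,
        "fisher": math.sqrt(laspeyres * paasche),
        "drobisch_bowley": (laspeyres + paasche) / 2,
        "marshall_edgeworth": agg(p1, q_sum) / agg(p0, q_sum) * 100,
        "walsh": agg(p1, q_geo) / agg(p0, q_geo) * 100,
    }


# Hypothetical prices and quantities for three items (illustrative only).
p0, q0 = [5, 8, 6], [10, 6, 3]
p1, q1 = [4, 7, 5], [12, 7, 4]
indices = weighted_aggregative_indices(p0, p1, q0, q1)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Fisher's index, being a geometric mean, always lies between the Laspeyres' and Paasche's values, and the same is true of the arithmetic mean used in the Drobisch and Bowley formula.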
Example: Calculate the weighted aggregative price index for 1990 from the following data:

| Item | Price in 1971 | Price in 1990 | Weights |
|---|---|---|---|
| A | 8 | 9.5 | 5 |
| B | 12 | 12.5 | 1 |
| C | 6.5 | 9 | 3 |
| D | 4 | 4.5 | 6 |
| E | 6 | 7 | 4 |
| F | 2 | 4 | 3 |

Solution:
Calculation Table

| Item | Price in 1971 (p0) | Price in 1990 (p1) | Weights (w) | p0w | p1w |
|---|---|---|---|---|---|
| A | 8 | 9.5 | 5 | 40.0 | 47.5 |
| B | 12 | 12.5 | 1 | 12.0 | 12.5 |
| C | 6.5 | 9 | 3 | 19.5 | 27.0 |
| D | 4 | 4.5 | 6 | 24.0 | 27.0 |
| E | 6 | 7 | 4 | 24.0 | 28.0 |
| F | 2 | 4 | 3 | 6.0 | 12.0 |
| Total | | | | 125.5 | 154.0 |

$\therefore$ Price Index (1971 = 100) $P_{01} = \frac{154.0}{125.5} \times 100 = 122.71$
The term within bracket, i.e., 1971 = 100, indicates that base year is 1971.
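The same result can be obtained with a few lines of code. This is a sketch under the assumption that the given weights are fixed quantities applied to both years; the function name is illustrative.

```python
# Data from the example: items A to F.
p_1971 = [8, 12, 6.5, 4, 6, 2]    # base year prices
p_1990 = [9.5, 12.5, 9, 4.5, 7, 4]  # current year prices
fixed_weights = [5, 1, 3, 6, 4, 3]


def fixed_weight_index(p0, p1, w):
    """Weighted aggregative index: sum(p1 * w) / sum(p0 * w) * 100."""
    numerator = sum(p * wt for p, wt in zip(p1, w))
    denominator = sum(p * wt for p, wt in zip(p0, w))
    return numerator / denominator * 100


index_1990 = fixed_weight_index(p_1971, p_1990, fixed_weights)
print(round(index_1990, 2))
```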
$$\text{Real Wage} = \frac{\text{Money Wage}}{\text{Consumer Price Index}} \times 100 \qquad \ldots (1)$$
Another application of the process of deflating is to find the value of output at constant prices so as
to facilitate the comparison of real changes in output. It may be pointed out here that the output
of a given year is often valued at the current year prices. Since prices in various years are often
different, the comparison of output at current year prices has no relevance.
The output at constant prices is obtained using the following formula.
$$\text{Output at constant prices} = \frac{\text{Output at current prices}}{\text{Price Index}} \times 100$$
Example: The following table gives the average monthly wages of a worker along with
the respective consumer price index numbers for ten years.
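| Years | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 |
|---|---|---|---|---|---|---|---|---|---|---|
| Average monthly wages | 500 | 525 | 560 | 600 | 630 | 635 | 700 | 740 | 800 | 900 |
| Consumer Price Index | 100 | 110 | 120 | 125 | 135 | 160 | 185 | 200 | 210 | 240 |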
Solution:
Computation of Real Wages

| Years | Average Monthly wage | Consumer Price Index | Real average monthly wage |
|---|---|---|---|
| 1980 | 500 | 100 | (500/100) × 100 = 500.00 |
| 1981 | 525 | 110 | (525/110) × 100 = 477.27 |
| 1982 | 560 | 120 | (560/120) × 100 = 466.67 |
| 1983 | 600 | 125 | (600/125) × 100 = 480.00 |
| 1984 | 630 | 135 | (630/135) × 100 = 466.67 |
| 1985 | 635 | 160 | (635/160) × 100 = 396.88 |
| 1986 | 700 | 185 | (700/185) × 100 = 378.38 |
| 1987 | 740 | 200 | (740/200) × 100 = 370.00 |
| 1988 | 800 | 210 | (800/210) × 100 = 380.95 |
| 1989 | 900 | 240 | (900/240) × 100 = 375.00 |
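Deflating a wage series is a single division per year, as the sketch below illustrates using the wage and consumer price index figures from the table. The names `real_wage` and `real_wages` are assumptions introduced for this illustration.

```python
years = list(range(1980, 1990))
money_wages = [500, 525, 560, 600, 630, 635, 700, 740, 800, 900]
cpi = [100, 110, 120, 125, 135, 160, 185, 200, 210, 240]


def real_wage(money_wage, price_index):
    """Deflate a money wage by the consumer price index (base = 100)."""
    return money_wage / price_index * 100


# Map each year to its real wage in base-year (1980) prices.
real_wages = {
    year: real_wage(wage, index)
    for year, wage, index in zip(years, money_wages, cpi)
}
for year, value in real_wages.items():
    print(year, f"{value:.2f}")
```

Although money wages rise from 500 to 900 over the period, the deflated series falls from 500 to 375, showing that prices rose faster than wages.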
When prices in general are rising, the real value of a rupee is declining. If, e.g., the price index in
1992 with base 1990 is 120, the real value of a rupee in 1992 as compared with its value in 1990 is $\frac{100}{120} \approx 0.83$.
This implies that a rupee in 1992 is worth only 83 paise of 1990.
From the above we note that the purchasing power of a rupee in the current year is equal to the
reciprocal of the price index multiplied by 100. Thus, we can write
$$\text{Purchasing Power of a Rupee} = \frac{1}{\text{Price Index}} \times 100$$
We can also write
$$\text{Price Index} = \frac{100}{\text{Constant Rupee}}$$
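The relation between the price index and the purchasing power of a rupee can be sketched as follows. The function name `purchasing_power` is a hypothetical helper, and the index value 120 is the one used in the discussion above.

```python
def purchasing_power(price_index):
    """Purchasing power of one rupee relative to the base year (base year = 1)."""
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return 100 / price_index


# Price index of 1992 with 1990 as base, as in the text.
pp_1992 = purchasing_power(120)
print(f"A 1992 rupee buys what {pp_1992 * 100:.0f} paise bought in 1990")
```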
Example: Given the following information on the Gross Domestic Product (in crores)
at the constant (1980 - 81) prices and at current prices for five years. Calculate the series of price
index numbers and of quantity index numbers for each of the five years with 1980 - 81 as base
year.
Solution:
Calculation of Price and Quantity Index Numbers

| Year | GDP at constant Prices | GDP at current Prices | Quantity Index Number Series | Price Index* Number Series |
|---|---|---|---|---|
| 1980-81 | 200 | 200 | 100 | (200/200) × 100 = 100 |
| 1981-82 | 150 | 240 | (150/200) × 100 = 75 | (240/150) × 100 = 160 |
| 1982-83 | 125 | 350 | (125/200) × 100 = 62.5 | (350/125) × 100 = 280 |
| 1983-84 | 120 | 360 | (120/200) × 100 = 60 | (360/120) × 100 = 300 |
| 1984-85 | 160 | 400 | (160/200) × 100 = 80 | (400/160) × 100 = 250 |
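The two series in the table follow from two simple ratios: constant-price GDP relative to the base year gives the quantity index, and current-price GDP relative to constant-price GDP gives the price index. A minimal sketch, with variable names chosen for illustration:

```python
# GDP figures from the solution table (base year 1980-81).
gdp_constant = {"1980-81": 200, "1981-82": 150, "1982-83": 125,
                "1983-84": 120, "1984-85": 160}
gdp_current = {"1980-81": 200, "1981-82": 240, "1982-83": 350,
               "1983-84": 360, "1984-85": 400}
base_year = "1980-81"

# Quantity index: constant-price GDP of each year relative to the base year.
quantity_index = {
    year: value / gdp_constant[base_year] * 100
    for year, value in gdp_constant.items()
}
# Price index: current-price GDP relative to constant-price GDP of the same year.
price_index = {
    year: gdp_current[year] / gdp_constant[year] * 100
    for year in gdp_constant
}
for year in gdp_constant:
    print(year, round(quantity_index[year], 1), round(price_index[year], 1))
```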
Self Assessment
Fill in the blanks:
7. In case of weighted aggregative price index numbers, quantities are often taken
as………………..
1. Simple aggregative index: $Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$
2. Simple average of quantity relatives
(a) Taking A.M.: $Q_{01} = \frac{\sum \frac{q_1}{q_0} \times 100}{n} = \frac{\sum Q}{n}$
(b) Taking G.M.: $Q_{01} = \text{Antilog}\left[\frac{\sum \log Q}{n}\right]$
3. Weighted aggregative index
(a) $Q_{01}^{La} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$ (base year prices are taken as weights)
(b) $Q_{01}^{Pa} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$ (current year prices are taken as weights)
(c) $Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$
Other aggregative formulae can also be written in a similar way.
4. Weighted average of quantity relatives
(a) Taking A.M.: $Q_{01} = \frac{\sum Q w}{\sum w}$
(b) Taking G.M.: $Q_{01} = \text{Antilog}\left[\frac{\sum w \log Q}{\sum w}\right]$
Like weighted average of price relatives, values are taken as weights.
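Example: Using Fisher's formula, calculate the quantity index number from the following data:

| Article | 1974 Price (Rs) | 1974 Value (Rs) | 1976 Price (Rs) | 1976 Value (Rs) |
|---|---|---|---|---|
| A | 5 | 50 | 4 | 48 |
| B | 8 | 48 | 7 | 49 |
| C | 6 | 18 | 5 | 20 |

Solution:
Calculation Table

| Article | p0 (1974) | V0 (1974) | q0 = V0/p0 | p1 (1976) | V1 (1976) | q1 = V1/p1 | p0q1 | p1q0 |
|---|---|---|---|---|---|---|---|---|
| A | 5 | 50 | 10 | 4 | 48 | 12 | 60 | 40 |
| B | 8 | 48 | 6 | 7 | 49 | 7 | 56 | 42 |
| C | 6 | 18 | 3 | 5 | 20 | 4 | 24 | 15 |
| Total | | Σp0q0 = 116 | | | Σp1q1 = 117 | | 140 | 97 |

$$Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{140}{116} \times \frac{117}{97}} \times 100 = 120.65$$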
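The steps of this example (recovering quantities from values, then applying Fisher's formula with the roles of price and quantity interchanged) can be sketched as follows. The variable and function names are assumptions made for illustration.

```python
import math

# Data from the example: articles A, B, C.
p0, v0 = [5, 8, 6], [50, 48, 18]   # 1974 prices and values
p1, v1 = [4, 7, 5], [48, 49, 20]   # 1976 prices and values

# Quantity = value / price for each article.
q0 = [v / p for v, p in zip(v0, p0)]
q1 = [v / p for v, p in zip(v1, p1)]


def total(a, b):
    """Sum of element-wise products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


laspeyres_q = total(q1, p0) / total(q0, p0)   # base year prices as weights
paasche_q = total(q1, p1) / total(q0, p1)     # current year prices as weights
fisher_q = math.sqrt(laspeyres_q * paasche_q) * 100
print(round(fisher_q, 2))
```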
10. The formulae for quantity index numbers can be directly written from
…………………….simply by interchanging the role of price and quantity.
The consumer price or the retail price is the price at which the ultimate consumer purchases his
goods and services from the retailer. According to the Labour Bureau, “with the help of Consumer
Price Index Number, it is intended to show over time the average change in prices paid by the consumers
belonging to the population group proposed to be covered by the index for a fixed list of goods and services
consumed by them”.
Formerly, this index was also known as the cost of living index. However, since this index
measures changes in cost of living due to changes in retail prices only and not due to changes in
living standards, etc., the name was changed to consumer price index or retail price index.
The following steps are involved in the construction of a consumer price index:
1. Scope and Coverage: The scope of consumer price index, proposed to be constructed, must
be very clearly defined. This implies the identification of the class of people for whom the
index will be constructed such as industrial workers, agricultural workers, urban wage
earners, etc. Further, it is also necessary to define the coverage of the class of people, i.e.,
the definition of geographical location of their stay such as a city or two or more villages,
etc. The selected class of people should form a homogeneous group so that weights of
various commodities are same for all the people.
2. Selection of Base Period: A normal period having comparative economic stability should
be selected as a base period in order that the consumption pattern used in the construction
of the index remains practically stable over a fairly long period.
3. Conducting Family Budget Enquiry: A family budget gives the details of expenditure
incurred by the family on various items in a given period. In order to estimate the
consumption pattern, a sample survey of family budgets of the group of people, for whom
the index is to be constructed, is conducted and from this an average family budget is
prepared. The goods and services that are to be included in the construction of the index
are selected from this average family budget. Efforts should be made to include as many
commodities as possible. Generally the commodities are divided into five broad groups:
(i) Food, (ii) Clothing, (iii) Fuel and Lighting, (iv) House Rent and (v) Miscellaneous.
If necessary, these groups may further be divided into sub-groups. Percentage expenditure
of a group is taken as its weight.
4. Obtaining Price Quotations: The next step in the construction of consumer price index is
to obtain the retail price quotations of various items that are selected. The price quotations
should be obtained from those markets from which the group of people, for whom the
index number is being constructed, normally make purchases. The quality of various
goods and services used by the group of people should also be kept in mind while obtaining
price quotations.
5. Computation of the Index Number: After the collection of necessary data, the consumer
price index can be computed by using either of the following formulae.
(a) Aggregate Expenditure Method: Base year quantities are taken as weights in the
aggregate expenditure method. The formula for the consumer price index is given
by $P_{01}^{CP} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$, which is the Laspeyres' formula.
(b) Family Budget Method: This method is also known as weighted average of price
relatives method and accordingly values are taken as weights. The formula for the
consumer price index is given by $P_{01}^{CP} = \frac{\sum P w}{\sum w}$, where $P = \frac{p_1}{p_0} \times 100$.
Example: From the information given below, construct the consumer price index number
of 1985 by (i) Aggregate Expenditure Method, and (ii) Family Budget Method.
Solution:
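Calculation of Consumer Price Index

| Com. | p0q0 | p1q0 | P = (p1/p0) × 100 | w = p0q0 | Pw |
|---|---|---|---|---|---|
| A | 150 | 250 | 166.67 | 150 | 25000.5 |
| B | 300 | 400 | 133.33 | 300 | 39999.0 |
| C | 120 | 160 | 133.33 | 120 | 15999.6 |
| D | 50 | 75 | 150.00 | 50 | 7500.0 |
| E | 112.5 | 125 | 111.11 | 112.5 | 12499.9 |
| F | 400 | 480 | 120.00 | 400 | 48000.0 |
| G | 25 | 40 | 160.00 | 25 | 4000.0 |
| Total | 1157.5 | 1530 | | 1157.5 | 152999.0 |

1. Index by aggregate expenditure method $= \frac{1530}{1157.5} \times 100 = 132.18$
2. Index by family budget method $= \frac{152999}{1157.5} = 132.18$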
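Both methods give the same figure here because the family budget weights are the base year values $p_0 q_0$, so that $\sum P w / \sum w$ reduces to $\sum p_1 q_0 / \sum p_0 q_0 \times 100$. The sketch below checks this numerically from the two expenditure columns of the calculation table; the variable names are illustrative assumptions.

```python
# Columns from the calculation table: commodities A to G.
base_expenditure = [150, 300, 120, 50, 112.5, 400, 25]   # p0 * q0
current_cost = [250, 400, 160, 75, 125, 480, 40]         # p1 * q0

# (i) Aggregate expenditure method (Laspeyres' formula).
aggregate_index = sum(current_cost) / sum(base_expenditure) * 100

# (ii) Family budget method: price relatives weighted by base year values.
# Since q0 cancels, (p1*q0) / (p0*q0) * 100 equals the price relative p1/p0 * 100.
relatives = [c / b * 100 for c, b in zip(current_cost, base_expenditure)]
family_budget_index = (
    sum(P * w for P, w in zip(relatives, base_expenditure))
    / sum(base_expenditure)
)
print(round(aggregate_index, 2), round(family_budget_index, 2))
```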
1. A consumer price index is used to determine the real wages from money wages and the
purchasing power of money.
2. It is also used to determine the dearness allowance to compensate the workers for the rise
in prices.
Example: A particular series of consumer price index covers five groups of items. Between
1975 and 1980 the index rose from 180 to 225. Over the same period the price index numbers of
various groups changed as follows:
Food from 198 to 252; clothing from 185 to 205; fuel and lighting from 175 to 195; miscellaneous
from 138 to 212; house rent remained unchanged at 150.
Given that the weights of clothing, house rent and fuel and lighting are equal, determine the
weights for individual groups of items.
Solution:
Let $w_1$% be the weight of food, $w_2$% be the weight of miscellaneous group and w% be the weight
of each of the remaining three groups. Therefore we can write $w_1 + w_2 + 3w = 100$ or $w_2 = 100 - w_1 - 3w$.
The given data can be written in the form of table as given below:

| Groups | Weights | Index in 1975 (I1) | Index in 1980 (I2) |
|---|---|---|---|
| Food | w1 | 198 | 252 |
| Clothing | w | 185 | 205 |
| Fuel & Lighting | w | 175 | 195 |
| House Rent | w | 150 | 150 |
| Miscellaneous | 100 - w1 - 3w | 138 | 212 |
| Total | 100 | | |
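On the basis of above, the consumer price index of 1975 is
$$\frac{198 w_1 + (185 + 175 + 150) w + 138 (100 - w_1 - 3w)}{100} = 180 \quad \text{(given)}$$
or $60 w_1 + 96 w = 4200$ .... (1)
Further, the consumer price index of 1980 is
$$\frac{252 w_1 + (205 + 195 + 150) w + 212 (100 - w_1 - 3w)}{100} = 225 \quad \text{(given)}$$
or $40 w_1 - 86 w = 1300$ .... (2)
Solving equations (1) and (2) simultaneously, we get $w_1 = 54$ and $w = 10$, so that $w_2 = 100 - 54 - 30 = 16$.
Thus, the weights are: food 54%, clothing 10%, fuel and lighting 10%, house rent 10% and miscellaneous 16%.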
Self Assessment
11. Formerly, Consumer Price index was also known as the ………………………index.
12. A consumer price index is used to determine the ……………….from money wages.
The following are some general problems that are faced in the construction of any index number:
1. Definition of the purpose: Since index numbers can be constructed for a number of
purposes and one cannot have an all purpose index, it is essential to define
the specific purpose of its construction. For example, if we are interested in the construction
of a price index number, we must have knowledge about the purpose to be served by it,
i.e., what is to be measured by it; like the cost of living of workers or the change in
wholesale prices, etc. In the absence of this information, it may be difficult to carry out
various steps in the construction of an index number. The questions like what are items to
be included, from which of the markets the price quotations are to be obtained, what will
be the weights of different items, etc., cannot be answered unless the purpose of the index
number construction is known. Further, an index number can be of sensitive or general
nature. In case of sensitive index, only those items are included whose variables (like
prices in case of price index) fluctuate very often; while efforts are made to include as
many items as possible when the index is of general nature. It may be pointed out that the
index numbers are specialised tools and as such are more useful and efficient when properly
used. The first step in this direction is a specific definition of the purpose of its construction.
2. Selection of the base period: Every index number is constructed with reference to a base
period. There are two important points that must be kept in mind while selecting the base
period of an index number.
(a) The base period should correspond to a period of relative economic and political
stability, i.e., it should be a normal or representative period in some way. In certain
situations where identification of such a period is not possible, the average of certain
periods can also be taken as base.
(b) The comparison of current period with a remote base doesn’t have much relevance.
In the words of Morris Hamburg, “It is desirable that the base period be not too far
away in time from the present. The further away we move from the base period the
dimmer are our recollections of economic conditions prevailing at that time.
Consequently, comparisons with these remote base periods tend to lose significance
and become rather tenuous in meaning”.
Another problem with a remote base period can be that certain items that were in
use in the base period are no longer in use while certain new items are in use in
current period. In such a situation the two item bundles are no longer homogeneous
and comparable. This problem is less likely to occur when fairly recent period is
chosen as base.
Caution The base period should not be too distant from the current period.
3. Selection of number and type of items: An index number of a particular group of items is
in fact based on a sample of items taken from it. It is neither possible nor necessary to
include all the items of the group in the construction of an index number. The number of
items to be included depends largely upon the purpose of the index number.
There are no hard and fast rules that can be laid down with regard to the selection of the
number of items. However, it must be remembered that the larger the number of items, the
more representative the index number will be and the more cumbersome the task of
computation. Therefore, it is necessary to strike a balance between having a
representative index and the work of computation involved in its construction.
The following points should be kept in mind in selecting the type of items:
(a) The items should be representative of the tastes, habits and customs of the people
for whom the index is to be constructed.
(b) The selected items should be of stable quality. The standardised items should be
given preference.
(c) As far as possible, the non-tangible items like personal services, goodwill, etc.,
should be excluded because it is difficult to ascertain their value.
4. Collection of data: The next important step in the construction of an index number is the
collection of data. For example, for the construction of price index, price quotations are to
be obtained. Since the prices of commodities may vary from one market to another and in
certain cases from one shop to another, it is necessary to select those markets which are
representative in the sense that the group under consideration generally make purchases
from these markets. The next logical step is to select an agency through which price
quotations are to be obtained. The selected agency should be highly reliable and if necessary
the accuracy of price quotations reported by it may also be checked by appointing some
other agency or agencies. Furthermore, care should always be taken to obtain price
quotations for the same quality of items.
Similar considerations are necessary for the collection of data for the construction
of other index numbers such as quantity index, value index, unemployment index, etc.
5. Selection of a suitable average: Since the index numbers are also averages, any of the five
averages, viz. arithmetic mean, median, mode, geometric mean and harmonic mean can
be used in its construction. However, since in most of the situations we have to average
ratios of the values in current period to that in base period, geometric mean is the most
suitable average in the construction of index numbers. The main difficulty of using the
geometric mean is the complexities of its computations and hence, the use of arithmetic
mean is more popular in spite of its being less suitable.
Self Assessment
14. The basic purpose of …………..is to enable each item to have an influence, on the index
number, in proportion to its importance in the group.
Despite the fact that index numbers are very useful for the measurement of relative changes,
these suffer from the following limitations:
1. The computation of an index number is based on the data obtained from a sample, which
may not be a true representative of the universe.
2. The composition of the bundle of commodities may be different for different years. This cannot be
taken into account by the fixed base method. This difficulty can be overcome by
the use of chain base index numbers, but their calculations are quite cumbersome.
3. An index number doesn’t take into account the quality of the items. Since a superior item
generally has a higher price, an increase in the index may be due to an improvement in
the quality of the items and not due to a rise in prices.
4. Index numbers are specialised averages and as such these also suffer from all the limitations
of an average.
5. An index number can be computed by using a number of formulae and different formulae
will give different results. Unless a proper method is used, the results are likely to be
inaccurate and misleading.
6. By the choice of a wrong base period or weighting system, the results of the index number
can be manipulated and, thus, are likely to be misused.
Self Assessment
15. An index number doesn’t take into account the ……………..of the items.
16. Index number computed by using a number of formulae will give …………….results
11.9 Summary
An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations
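\[ P_{01} = \text{Antilog}\left[ \frac{\sum \log\left( \frac{p_1}{p_0} \times 100 \right)}{n} \right] \quad \text{(using G.M.)} \]
Simple Aggregative Index
\[ P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 \]
Weighted Average of Price Relatives Index
\[ P_{01} = \frac{\sum Pw}{\sum w} \quad \text{(using weighted A.M.)} \]
\[ P_{01} = \text{Antilog}\left[ \frac{\sum w \log P}{\sum w} \right] \quad \text{(using weighted G.M.)} \]
Here $P = \frac{p_1}{p_0} \times 100$ and $w$ denotes values (weights).
Dorbish and Bowley's Index
\[ P_{01}^{DB} = \frac{1}{2}\left[ \frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1} \right] \times 100 \]
\[ \text{Real Wage} = \frac{\text{Money Wage}}{\text{C.P.I.}} \times 100 \]
\[ \text{Purchasing Power of Money} = \frac{1}{\text{Price Index}} \times 100 \]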
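The summary formulae can be tried out numerically. The following is only a minimal sketch: the prices, quantities and wage figures are hypothetical values chosen for illustration, and the function names are not taken from the text.

```python
import math

# Hypothetical base-year (0) and current-year (1) prices and quantities
# for three commodities. These figures are illustrative only.
p0 = [10.0, 8.0, 5.0]
p1 = [12.0, 10.0, 6.0]
q0 = [20.0, 15.0, 30.0]
q1 = [18.0, 16.0, 35.0]


def simple_aggregative(p0, p1):
    """Simple aggregative index: (sum of p1 / sum of p0) x 100."""
    return sum(p1) / sum(p0) * 100


def geometric_mean_of_relatives(p0, p1):
    """Antilog of the average log of the price relatives (G.M. form)."""
    logs = [math.log10(b / a * 100) for a, b in zip(p0, p1)]
    return 10 ** (sum(logs) / len(logs))


def dorbish_bowley(p0, p1, q0, q1):
    """Arithmetic mean of the base-weighted and current-weighted ratios, x 100."""
    base_weighted = sum(b * q for b, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))
    current_weighted = sum(b * q for b, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))
    return 0.5 * (base_weighted + current_weighted) * 100


def real_wage(money_wage, cpi):
    """Deflate a money wage by the consumer price index."""
    return money_wage / cpi * 100


print(round(simple_aggregative(p0, p1), 2))
print(round(geometric_mean_of_relatives(p0, p1), 2))
print(round(dorbish_bowley(p0, p1, q0, q1), 2))
print(real_wage(500, 125))  # hypothetical wage of 500 with a C.P.I. of 125
```

Because the three formulae weight the commodities differently, they generally return slightly different index values for the same data, which is the point made in limitation 5 above.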
11.10 Keywords
Base Year: The year from which comparisons are made is called the base year. It is commonly
denoted by writing ‘0’ as a subscript of the variable.
Consumer Price: It is the price at which the ultimate consumer purchases his goods and services
from the retailer.
Current Year: The year under consideration for which the comparisons are to be computed is
called the current year. It is commonly denoted by writing ‘1’ as a subscript of the variable.
Index Number: An index number is a statistical measure used to compare the average level of
magnitude of a group of distinct but related variables in two or more situations.
Quantity Index Number: Index number that measures the change in quantities in current year as
compared with a base year.
2. From the following data, prove that Fisher's Ideal Index satisfies both the time reversal
and the factor reversal tests.
3. Examine various steps and problems involved in the construction of an index number.
4. Distinguish between average type and aggregative type of index numbers. Discuss the
nature of weights used in each case.
5. Given the following data:
(a) What was the real weekly wage for each year?
(b) In which year did the employees have the greatest buying power?
(c) What percentage increase in the average weekly wages for the year 1973 is required
to provide the same buying power that the employees enjoyed in the year in which
they had the highest real wages?
6. Construct Consumer Price Index for the year 1981 with 1971 as the base year.
Items : Food Rent Clothes Fuel Others
Percentage Expenses : 35% 15% 20% 10% 20%
Value Index (1971) : 150 50 100 20 60
Value Index (1981) : 174 60 125 25 90
7. Compute consumer price index number from the following data by the aggregate expenditure method.
Commodity    Quantities consumed in base year    Units in which prices are quoted    Prices in base year    Prices in current year
8. A textile worker in the city of Ahmedabad earns 750 per month. The cost of living index
for January 1986 is given as 160. Using the following data find out the amounts he spends
on (i) Food and (ii) Rent.
9. "In the construction of index numbers the advantages of geometric mean are greater than
those of arithmetic mean". Discuss.
10. Show that Laspeyres's index has an upward bias and Paasche's index has a downward
bias. Under what conditions will the two index numbers be equal?
1. homogeneous 2. weighted
7. weights 8. price
CONTENTS
Objectives
Introduction
12.1 Steps Involved in Hypothesis Testing
12.1.1 Formulate the Hypothesis
12.1.2 Significance Level
12.2 Errors in Hypothesis Testing
12.3 Parametric Tests
12.3.1 One Sample Test
12.3.2 Two Sample Test
12.4 Chi-square Test
12.5 ANOVA
12.5.1 One-way ANOVA
12.5.2 Two-way ANOVA
12.6 Non-parametric Test
12.6.1 One Sample Tests
12.6.2 Two Sample Tests
12.6.3 K Sample Test
12.7 Summary
12.8 Keywords
12.9 Review Questions
12.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
A statistical hypothesis test is a method of making statistical decisions using experimental data.
In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.
The phrase “test of significance” was coined by Ronald Fisher: “Critical tests of this kind may be
called tests of significance, and when such tests are available we may discover whether a second
sample is or is not significantly different from the first.”
1. Formulate the null hypothesis, H0, and the alternate hypothesis, HA. According to the
given problem, H0 represents the value of some parameter of the population.
6. If the calculated value lies within the critical region, then reject H 0.
The normal approach is to set two hypotheses instead of one, in such a way, that if one hypothesis
is true, the other is false. Alternatively, if one hypothesis is false or rejected, then the other is true
or accepted. These two hypotheses are:
1. Null hypothesis
2. Alternate hypothesis
Let us assume that the mean of the population is μ0 and the mean of the sample is x̄. Since we
have assumed that the population has a mean of μ0, this is our null hypothesis. We write this as
H0: μ = μ0, where H0 is the null hypothesis. The alternate hypothesis is HA: μ ≠ μ0. The rejection of the null
hypothesis will show that the mean of the population is not μ0. This implies that the alternate
hypothesis is accepted.
Having formulated the hypothesis, the next step is to test its validity at a certain level of significance.
The confidence with which a null hypothesis is accepted or rejected depends upon the
significance level. A significance level of, say, 5% means that the risk of rejecting a true null
hypothesis is 5%: when the null hypothesis is in fact true, the researcher is likely to reject it
wrongly on 5 out of 100 occasions. A significance level of, say, 1% means that the
researcher runs this risk on only one out of
every 100 occasions. Therefore, a 1% significance level provides greater confidence in the
decision than a 5% significance level.
A hypothesis test may be one-tailed or two-tailed. In a one-tailed test, the values of the test statistic leading to rejection
of the null hypothesis fall in only one tail of the sampling distribution curve.
Figure 12.1
Example:
1. In a right-tailed test, the critical region lies entirely in the right tail of the sampling distribution.
Whether the test is one-sided or two-sided depends on the alternate hypothesis.
2. A tyre company claims that the mean life of its new tyre is 15,000 km. The researcher
formulates the hypothesis that tyre life is = 15,000 km.
A two-tailed test is one in which the values of the test statistic leading to rejection of the null hypothesis fall in
both tails of the sampling distribution curve as shown. A one-tailed test is used when the researcher's
interest is primarily on one side of the issue.
Example: "Is the current advertisement less effective than the proposed new advertisement"?
A two-tailed test is appropriate, when the researcher has no reason to focus on one side of the
issue.
Example:
1. "Are the two markets - Mumbai and Delhi different to test market a product?"
H0: μ1 = μ2
Degrees of freedom tell the researcher the number of elements that can be chosen freely.
If the hypothesis pertains to a large sample (30 or more), the Z-test is used. When the sample is
small (less than 30), the t-test is used.
Compute
Make Decisions
Accepting or rejecting of the null hypothesis depends on whether the computed value falls in the
region of rejection at a given level of significance.
Task Discuss when you would prefer a two-tailed test to a one-tailed test.
Self Assessment
2. The confidence with which a null hypothesis is accepted or rejected depends upon the
.............................
3. The rejection of null hypothesis means that the ............................. hypothesis is accepted.
(1) is called Type 1 error (α), (2) is called Type 2 error (β). When α = 0.10 it means that a true
hypothesis will be accepted on 90 out of 100 occasions. Thus, there is a risk of rejecting a true
hypothesis on 10 out of every 100 occasions. To reduce the risk, use α = 0.01, which implies that we
are prepared to take a 1% risk, i.e., the probability of rejecting a true hypothesis is 1%. It is also
possible that in hypothesis testing we may commit a Type 2 error (β), i.e., accepting a null hypothesis
which is false.
Notes The only way to reduce both Type 1 and Type 2 errors at the same time is by increasing the sample size.
Type 1 and Type 2 error is presented as follows. Suppose a marketing company has 2 distributors
(retailers) with varying capabilities. On the basis of capabilities, the company has grouped them
into two categories (1) Competent retailer (2) Incompetent retailer. Thus R1 is a competent
retailer and R2 is an incompetent retailer. The firm wishes to award a performance bonus (as a
part of trade promotion) to encourage good retailership. Assume that two actions A1 and A2
would represent whether the bonus or trade incentive is given and not given. This is shown as
follows:
When the firm has failed to reward a competent retailer, it has committed a Type 2 error. On the
other hand, when it has rewarded an incompetent retailer, it has committed a Type 1 error.
Self Assessment
If we wish to analyse one variable at a time, this is called univariate analysis. Example:
Effect of pricing on sales. Here, price is an independent variable and sales is a dependent
variable. Change the price and measure the sales.
Bivariate
The relationship of two variables at a time is examined by means of bivariate data analysis.
z Test
Example: You are working as a purchase manager for a company. The following
information has been supplied by two scooter tyre manufacturers.
                      Company A    Company B
Mean life (in km)     13000        12000
S.D. (in km)          340          388
Sample size           100          100
In the above, the sample size is 100, hence a Z-test may be used.
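The text does not carry this example through to a test statistic. As a minimal sketch, assuming the null hypothesis is that the two manufacturers' tyres have the same mean life and that the two samples are independent, the large-sample Z statistic can be computed as follows (the function name is illustrative only):

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Z statistic for H0: the two population means are equal (large samples)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Figures supplied by the two scooter tyre manufacturers.
z = two_sample_z(13000, 340, 100, 12000, 388, 100)
print(round(z, 2))
```

The statistic is then compared with the table value of Z at the chosen level of significance, exactly as in the other Z-test examples of this unit.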
2. Testing the hypothesis about difference between two means: This can be used when two
population means are given and null hypothesis is H0: P1 = P2.
Example: In a city during the year 2000, 20% of households indicated that they read Femina
magazine. Three years later, the publisher had reasons to believe that circulation has gone up. A
survey was conducted to confirm this. A sample of 1,000 respondents were contacted and it was
found 210 respondents confirmed that they subscribe to the periodical 'Femina'. From the above,
can we conclude that there is a significant increase in the circulation of 'Femina'?
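Solution:
\[ Z = \frac{\frac{210}{1000} - 0.20}{\sqrt{\frac{0.20\,(1 - 0.20)}{1000}}} = \frac{0.21 - 0.20}{\sqrt{\frac{0.2 \times 0.8}{1000}}} = \frac{0.01}{\sqrt{\frac{0.16}{1000}}} = \frac{0.01}{\frac{0.4}{31.62}} = \frac{0.01}{0.01265} = 0.79 \]
As the critical value of Z at the 0.05 level (one-tailed) is 1.64 and the calculated value of Z, 0.79, does not fall in the rejection region, we do not reject the null
hypothesis, and therefore we cannot conclude that the circulation of 'Femina' has increased significantly.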
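The arithmetic of this example is easy to get wrong by hand, so it may help to evaluate the statistic directly from the figures given (210 readers out of 1,000, against a claimed proportion of 0.20). This is a minimal sketch; the function name is illustrative and the one-tailed 5% critical value of 1.645 is the usual standard normal table figure.

```python
import math


def one_sample_proportion_z(successes, n, p0):
    """Z statistic for H0: population proportion equals p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


z = one_sample_proportion_z(210, 1000, 0.20)
critical_value = 1.645  # one-tailed 5% point of the standard normal distribution

print(round(z, 2))
print("reject H0" if z > critical_value else "do not reject H0")
```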
T-test is used in the following circumstances: When the sample size n < 30.
Example:
1. A certain pesticide is packed into bags by a machine. A random sample of 10 bags are
drawn and their contents are found as follows: 50, 49, 52, 44, 45, 48, 46, 45, 49, 45. Confirm
whether the average packaging can be taken to be 50 kgs.
In this test, the sample size is less than 30 and the population standard deviation is not known. Using this
test, we can also find out if there is any significant difference between two means, i.e.
whether the two population means are equal.
2. There are two nourishment programmes 'A' and 'B'. Two groups of children are subjected
to this. Their weight is measured after six months. The first group of children subjected to
the programme 'A' weighed 44, 37, 48, 60, 41 kgs. at the end of programme. The second
group of children were subjected to nourishment programme 'B' and their weight was 42,
42, 58, 64, 64, 67, 62 kgs. at the end of the programme. From the above, can we conclude that
nourishment programme 'B' increased the weight of the children significantly, at the 5%
level of significance?
Null Hypothesis: There is no significant difference between Nourishment programme 'A' and
'B'.
Solution:
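x      x - x̄    (x - x̄)²    y      y - ȳ    (y - ȳ)²
44     -2       4           42     -15      225
37     -9       81          42     -15      225
48     2        4           58     1        1
60     14       196         64     7        49
41     -5       25          64     7        49
                            67     10       100
                            62     5        25
230    0        310         399    0        674
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
Here $n_1 = 5$, $n_2 = 7$, $\sum x = 230$, $\sum y = 399$,
\[ \sum (x - \bar{x})^2 = 310, \quad \sum (y - \bar{y})^2 = 674 \]
\[ \bar{x} = \frac{\sum x}{n_1} = \frac{230}{5} = 46, \quad \bar{y} = \frac{\sum y}{n_2} = \frac{399}{7} = 57 \]
\[ s^2 = \frac{1}{n_1 + n_2 - 2}\left\{ \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 \right\} \]
D.F. = (n1 + n2 – 2) = (5 + 7 – 2) = 10
\[ s^2 = \frac{1}{10}\{310 + 674\} = 98.4 \]
\[ t = \frac{46 - 57}{\sqrt{98.4 \times \left(\frac{1}{5} + \frac{1}{7}\right)}} = \frac{-11}{\sqrt{98.4 \times \frac{12}{35}}} = \frac{-11}{\sqrt{33.73}} = -\frac{11}{5.8} = -1.89 \]
t at 10 d.f. at 5% level is 1.81.
Since the absolute value of the calculated t (1.89) is greater than 1.81, it is significant. Hence HA is accepted. Therefore the two
nutrition programmes differ significantly with respect to weight increase.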
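The pooled-variance calculation above can be checked with a short script. This is a minimal sketch using only the weights given in the example; the function and variable names are illustrative.

```python
import math


def pooled_t(sample_x, sample_y):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    n1, n2 = len(sample_x), len(sample_y)
    mean_x = sum(sample_x) / n1
    mean_y = sum(sample_y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in sample_x)  # sum of squared deviations
    ss_y = sum((v - mean_y) ** 2 for v in sample_y)
    df = n1 + n2 - 2
    pooled_var = (ss_x + ss_y) / df
    t = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, pooled_var


programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

t, df, pooled_var = pooled_t(programme_a, programme_b)
print(round(t, 2), df, round(pooled_var, 1))
```

The sign of t only reflects the order in which the two groups are entered; it is the absolute value that is compared with the table value.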
When two samples are related we use the paired t-test for judging the significance of the mean of
the differences of the two related samples. The t-test can also be used for judging the significance of the
coefficients of simple and partial correlations.
\[ t = r_{yx}\sqrt{\frac{n - 2}{1 - r_{yx}^2}} \]
where (n – 2) is the degrees of freedom and $r_{yx}$ is the coefficient of correlation between x and y. The
computed value of t is compared with its table value. If the computed value is less than the table
value, the null hypothesis is accepted; otherwise it is rejected at the given level of significance.
Example: A study of weight of 18 pairs of male and female employees in a company shows
that coefficient of correlation is 0.52. Test the significance of correlation.
Solution:
Applying t test:
\[ t = r\sqrt{\frac{n - 2}{1 - r^2}} \]
r = 0.52, n = 18
\[ t = 0.52\sqrt{\frac{18 - 2}{1 - (0.52)^2}} = \frac{0.52 \times 4}{0.854} = 2.44 \]
ν = (n – 2) = (18 – 2) = 16
For ν = 16, t0.05 = 2.12
The calculated value of t is greater than the table value. The given value of r is therefore significant.
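The same test can be written as a one-line function. A minimal sketch, with an illustrative function name, using r = 0.52 and n = 18 from the example and the table value 2.12 quoted above:

```python
import math


def correlation_t(r, n):
    """t statistic for testing the significance of a correlation coefficient."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = correlation_t(0.52, 18)
table_value = 2.12  # t at 16 d.f., 5% level, as quoted in the example

print(round(t, 2))
print("significant" if abs(t) > table_value else "not significant")
```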
F-Test
Let there be two independent random samples of sizes n1 and n2 from two normal populations
with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Further, let
\[ s_1^2 = \frac{1}{n_1 - 1}\sum (X_{1i} - \bar{X}_1)^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum (X_{2i} - \bar{X}_2)^2 \]
be the variances of the first and the second samples respectively.
Then the F-statistic is defined as the ratio of two $\chi^2$-variates, each divided by its degrees of freedom. Thus, we can write
\[ F = \frac{\chi^2_{n_1 - 1}/(n_1 - 1)}{\chi^2_{n_2 - 1}/(n_2 - 1)} = \frac{\frac{(n_1 - 1)s_1^2}{\sigma_1^2}/(n_1 - 1)}{\frac{(n_2 - 1)s_2^2}{\sigma_2^2}/(n_2 - 1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \]
Features of F- distribution
2. The mean of the F-variate with $\nu_1$ and $\nu_2$ degrees of freedom is $\frac{\nu_2}{\nu_2 - 2}$ and its standard error is
\[ \left(\frac{\nu_2}{\nu_2 - 2}\right)\sqrt{\frac{2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 4)}} \]
We note that the mean will exist if $\nu_2 > 2$ and the standard error will exist if $\nu_2 > 4$. Further, the
mean > 1.
3. The random variate F can take only positive values from 0 to ∞. The curve is positively
skewed.
4. For large values of $\nu_1$ and $\nu_2$, the distribution approaches the normal distribution.
5. If a random variate follows the t-distribution with $\nu$ degrees of freedom, then its square
follows the F-distribution with 1 and $\nu$ d.f., i.e. $t_\nu^2 = F_{1,\nu}$.
6. F and $\chi^2$ are also related as $F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}}{\nu_1}$ as $\nu_2 \to \infty$.
Figure 12.2: Density curves p(F) of the F-distribution, plotted against F, for degrees of freedom (10, 10), (30, 30) and (40, 40).
Self Assessment
A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test in which the
sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is
true, or any in which this is asymptotically true, meaning that the sampling distribution (if the
null hypothesis is true) can be made to approximate a chi-square distribution as closely as
desired by making the sample size large enough.
Caution One case where the distribution of the test statistic is an exact chi-square
distribution is the test that the variance of a normally-distributed population has a given
value based on a sample variance. Such a test is uncommon in practice because values of
variances to test against are seldom known exactly.
1. Sample observations should be independent, i.e. no individual item should be included
twice in a sample.
or
Is there any significant difference between the age group and preference for the car?
Example: A company marketing tea claims that 70% of population in a metro drinks a
particular brand (Wood Smoke) of tea. A competing brand challenged this claim. They took a
random sample of 200 families to gather data. During the study period, it was found that 130
families were using this brand of tea. Will it be correct on the part of the competitor to conclude that
the claim made by the company does not hold good at the 5% level of significance?
Solution:
Hypothesis H0 – People who drink Wood Smoke brand is 70%.
HA – People who drink Wood Smoke brand is not 70%.
If the hypothesis is true then the number of consumers who drink this particular brand is 200 × 0.7
= 140.
Those who do not drink that brand are 200 × 0.3 = 60
Degrees of freedom = 2 – 1 = 1, since there are two groups.
Group                                 Observed (O)    Expected (E)    O - E    (O - E)²    (O - E)²/E
Those who drink branded tea           130             140             -10      100         0.714
Those who do not drink branded tea    70              60              10       100         1.667
Total                                 200             200             0                    2.381
\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 2.381 \]
The table value of χ² at the 0.05 level of significance for 1 d.f. is 3.841. The calculated value, 2.381,
is lower. Therefore, we accept the hypothesis that 70% of the people in that metro drink Wood
Smoke branded tea.
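The chi-square statistic for the tea example can be reproduced in a few lines. A minimal sketch: the observed count of non-drinkers is taken as 200 − 130 = 70, the table value 3.841 is the one quoted in the example, and the names are illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


sample_size = 200
claimed_share = 0.70

observed = [130, sample_size - 130]  # drinkers, non-drinkers
expected = [sample_size * claimed_share, sample_size * (1 - claimed_share)]

chi2 = chi_square_statistic(observed, expected)
table_value = 3.841  # chi-square at 1 d.f., 5% level

print(round(chi2, 3))
print("reject H0" if chi2 > table_value else "accept H0")
```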
10. For applying chi-square test , sample should contain at least ……… observations
12.5 ANOVA
ANOVA is a statistical technique. It is used to test the equality of three or more sample means.
Based on the means, an inference is drawn as to whether the samples belong to the same population or not.
2. Compare the first year earnings of graduates of half a dozen top business schools.
Consider the following pricing experiment. Three prices are considered for a new toffee box
introduced by Nutrine company. Price of three varieties of toffee boxes are 39, 44 and
49. The idea is to determine the influence of price levels on sales. Five supermarkets are
selected to exhibit these toffee boxes. The sales are as follows:
What the manufacturer wants to know is: (1) Whether the difference among the means is
significant? If the difference is not significant, then the sale must be due to chance. (2) Do the
means differ? (3) Can we conclude that the three samples are drawn from the same population
or not?
Example: In a company there are four shop floors. Productivity rate for three methods of
incentives and gain sharing in each shop floor is presented in the following table. Analyze
whether various methods of incentives and gain sharing differ significantly at 5% and 1%
F-limits.
Shop floor    X1    X2    X3
1             5     4     4
2             6     4     3
3             2     2     2
4             7     6     3
Solution:
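Step 1: Calculate the mean of each of the three samples (i.e., x̄1, x̄2 and x̄3, the means for the different methods of
incentive gain sharing).
\[ \bar{X}_1 = \frac{5 + 6 + 2 + 7}{4} = 5, \quad \bar{X}_2 = \frac{4 + 4 + 2 + 6}{4} = 4, \quad \bar{X}_3 = \frac{4 + 3 + 2 + 3}{4} = 3 \]
Step 2: Calculate the mean of sample means, i.e.,
\[ \bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{K} = \frac{5 + 4 + 3}{3} = 4 \]
where K denotes the number of samples.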
Step 3: Calculate sum of squares (s.s.) for variance between and within the samples.
Sum of squares (ss) for variance between samples is obtained by taking the deviations of the
sample means from the mean of sample means ($\bar{\bar{X}}$) and by calculating the squares of such deviations,
which are multiplied by the respective number of items or categories in the samples and then by
obtaining their total. Sum of squares (ss) for variance within samples is obtained by taking
deviations of the values of all sample items from corresponding sample means and by squaring
such deviations and then totalling them. For our illustration then
ss between = 4(5 – 4)² + 4(4 – 4)² + 4(3 – 4)²
= 4 + 0 + 4 = 8
\[ \text{ss within} = \sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2 + \sum (x_{3i} - \bar{x}_3)^2 \]
= {(5 – 5)² + (6 – 5)² + (2 – 5)² + (7 – 5)²} + {(4 – 4)² + (4 – 4)² + (2 – 4)² + (6 – 4)²} + {(4 – 3)² + (3 – 3)² + (2 – 3)² + (3 – 3)²}
= (0 + 1 + 9 + 4) + (0 + 0 + 4 + 4) + (1 + 0 + 1 + 0)
= 14 + 8 + 2
= 24
Step 4: ss of total variance, which is equal to the total of ss between and ss within, is given by the
formula:
\[ \sum (x_{ij} - \bar{\bar{x}})^2 \]
where
i = 1, 2, 3, …
j = 1, 2, 3, …
= {(5 – 4)² + (6 – 4)² + (2 – 4)² + (7 – 4)²} + {(4 – 4)² + (4 – 4)² + (2 – 4)² + (6 – 4)²} + {(4 – 4)² + (3 – 4)² + (2 – 4)² + (3 – 4)²}
= (1 + 4 + 4 + 9) + (0 + 0 + 4 + 4) + (0 + 1 + 4 + 1)
= 18 + 8 + 6 = 32
We will, however, get the same value if we simply total respective values of ss between and ss
within. For our example, ss between is 8 and ss within is 24, thus ss of total variance is 32 (8+24).
Step 5: Ascertain degrees of freedom and mean square (MS) between and within the samples.
Degrees of freedom (df) for between samples and within samples are computed differently as
follows.
For between samples, df is (k-1), where 'k' represents the number of samples (for us it is 3). For
within samples, df is (n-k), where 'n' represents the total number of items in all the samples (for us
it is 12).
Mean squares (MS) between and within samples are computed by dividing the ss between and
ss within by respective degrees of freedom. Thus for our example:
\[ \text{(i) MS between} = \frac{\text{ss between}}{(k - 1)} = \frac{8}{2} = 4 \]
\[ \text{(ii) MS within} = \frac{\text{ss within}}{(n - k)} = \frac{24}{9} = 2.67 \]
Step 6: Now we compute the F ratio by analysing our samples. The formula for computing the
'F' ratio is:
\[ F = \frac{\text{MS between}}{\text{MS within}} \]
\[ \text{Thus for our example, F ratio} = \frac{4.00}{2.67} = 1.5 \]
Step 7: Now we will have to analyze whether various methods of incentives and gain sharing
differ significantly at 5% and 1% 'F' limits. For this, we need to compare observed 'F' ratio with
'F' table values. When observed 'F' value at given degrees of freedom is either equal to or less
than the table value, difference is considered insignificant. In reverse cases, i.e., when calculated
'F' value is higher than table-F value, the difference is considered significant and accordingly we
draw our conclusion.
For example, our observed 'F' ratio at degrees of freedom ν1 = 2 and ν2 = 9 is 1.5. The table
value of F at 5% level with df 2 and 9 (ν1 = 2, ν2 = 9) is 4.26. Since the table value is higher than
the observed value, the difference in the rate of productivity due to various methods of incentives and
gain sharing is considered insignificant. At 1% level with df 2 and 9, we get the table value of F
as 8.02 and we draw the same conclusion.
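Steps 1 to 6 can be collected into a single function. The following is a minimal sketch that follows the steps of this example, using the productivity data of the four shop floors; the function name and the returned dictionary keys are illustrative.

```python
def one_way_anova(samples):
    """Return ss between, ss within and the F ratio for a one-way layout."""
    k = len(samples)                                   # number of samples
    n = sum(len(s) for s in samples)                   # total number of items
    means = [sum(s) / len(s) for s in samples]         # Step 1
    mean_of_means = sum(means) / k                     # Step 2 (equal sample sizes)
    ss_between = sum(len(s) * (m - mean_of_means) ** 2
                     for s, m in zip(samples, means))  # Step 3
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    ms_between = ss_between / (k - 1)                  # Step 5
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,            # Step 4
        "f_ratio": ms_between / ms_within,             # Step 6
    }


x1 = [5, 6, 2, 7]
x2 = [4, 4, 2, 6]
x3 = [4, 3, 2, 3]

result = one_way_anova([x1, x2, x3])
print(result)
```

The F ratio returned is then compared with the table values 4.26 (5%) and 8.02 (1%) as described in Step 7.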
We can now draw an ANOVA table as follows to show our entire observation.
The procedure to be followed to calculate variance is the same as it is for the one-way classification.
The example of two-way classification of ANOVA is as follows:
Suppose, a firm has four types of machines – A, B, C and D. It has put four of its workers on each
machines for a specified period, say one week. At the end of one week, the average output of
each worker on each type of machine was calculated. These data are given below:
Average Production by the Type of Machine
            A     B     C     D
Worker 1    25    26    23    28
Worker 2    23    22    24    27
Worker 3    27    30    26    32
Worker 4    29    34    27    33
Example: Company ‘X’ wants its employees to undergo three different types of training
programme with a view to obtain improved productivity from them. After the completion of
the training programme, 16 new employees are assigned at random to three training methods
and the production performance were recorded.
The training manager's problem is to find out whether there are any differences in the effectiveness of
the training methods. The data recorded is as under:
Daily Output of New Employees
Method 1    15    18    19    22    11
Method 2    22    27    18    21    17
Method 3    18    24    19    16    22    15
3. Calculate the variance between columns using the formula
\[ \hat{\sigma}^2 = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1} \]
Sample variance
\[ s_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n - 1} \]
where n is the number of observations under each method.
7. Calculate the number of degrees of freedom in the numerator of the F ratio using the equation d.f. =
(No. of samples – 1).
8. Calculate the number of degrees of freedom in the denominator of the F ratio using the equation
d.f. = Σ(ni – 1) = Σni – k
9. Refer to the F table and find the value.
10. Draw conclusions.
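Solution:
1. Sample means:
\[ \bar{x}_1 = \frac{85}{5} = 17, \quad \bar{x}_2 = \frac{105}{5} = 21, \quad \bar{x}_3 = \frac{114}{6} = 19 \]
2. Grand mean
\[ \bar{\bar{x}} = \frac{15 + 18 + 19 + 22 + 11 + 22 + 27 + 18 + 21 + 17 + 24 + 19 + 16 + 22 + 15 + 18}{16} = \frac{304}{16} = 19 \]
3. Variance between columns:
n    x̄     Grand mean    x̄ - grand mean    (x̄ - grand mean)²    n(x̄ - grand mean)²
5    17    19            -2                4                    5 × 4 = 20
5    21    19            2                 4                    5 × 4 = 20
6    19    19            0                 0                    6 × 0 = 0
\[ \sum n_i (\bar{x}_i - \bar{\bar{x}})^2 = 40 \]
\[ \hat{\sigma}^2 = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = \frac{40}{3 - 1} = 20 \]
Variance between columns = 20
4. Calculation of sample variances:
Method 1                     Method 2                     Method 3
x - x̄           (x - x̄)²     x - x̄           (x - x̄)²     x - x̄           (x - x̄)²
15 - 17 = -2    4            22 - 21 = 1     1            18 - 19 = -1    1
18 - 17 = 1     1            27 - 21 = 6     36           24 - 19 = 5     25
19 - 17 = 2     4            18 - 21 = -3    9            19 - 19 = 0     0
22 - 17 = 5     25           21 - 21 = 0     0            16 - 19 = -3    9
11 - 17 = -6    36           17 - 21 = -4    16           22 - 19 = 3     9
                                                          15 - 19 = -4    16
Total           70           Total           62           Total           60
\[ s_1^2 = \frac{70}{5 - 1} = 17.5, \quad s_2^2 = \frac{62}{5 - 1} = 15.5, \quad s_3^2 = \frac{60}{6 - 1} = 12 \]
5. Within column variance
\[ \hat{\sigma}^2 = \sum \left( \frac{n_i - 1}{n_T - k} \right) s_i^2 = \frac{4}{13} \times 17.5 + \frac{4}{13} \times 15.5 + \frac{5}{13} \times 12 = \frac{192}{13} = 14.76 \]
where $n_T$ is the total number of observations (16).
6. F ratio = variance between columns / within column variance = 20 / 14.76 = 1.354
7. d.f. of Numerator = (3 - 1) = 2.
8. d.f. of Denominator = Σni – k = (5 - 1) + (5 - 1) + (6 - 1) = 16 - 3 = 13.
9. Refer to the F table using d.f. = 2 and d.f. = 13.
10. The table value is 3.81. This is the upper limit of the acceptance region. Since the calculated value, 1.354,
lies within it, we can accept H0, the null hypothesis.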
Conclusion: There is no significant difference in the effect of the three training methods.
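When the samples are of unequal size, as here, the within-column variance is a weighted average of the sample variances. The sketch below follows that formulation and leans on Python's standard `statistics` module for the sample mean and the (n − 1) sample variance; the function name is illustrative.

```python
from statistics import mean, variance


def f_ratio_from_sample_variances(groups):
    """Between-column variance, within-column variance and their ratio."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variance between columns: sum of n_i * (mean_i - grand mean)^2 over (k - 1).
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-column variance: sample variances weighted by (n_i - 1) / (n_T - k).
    within = sum((len(g) - 1) / (n_total - k) * variance(g) for g in groups)
    return between, within, between / within


method_1 = [15, 18, 19, 22, 11]
method_2 = [22, 27, 18, 21, 17]
method_3 = [18, 24, 19, 16, 22, 15]

between, within, f_ratio = f_ratio_from_sample_variances([method_1, method_2, method_3])
print(round(between, 2), round(within, 2), round(f_ratio, 3))
```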
Example: Let us now frame a problem to study the effects of incentive and gain sharing and
level of technology (independent variables) on productivity rate (dependent variable).
Productivity Rate Data of Workers of M/s. XYZ & Co.
Solution:
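\[ \text{2. Correction factor} = \frac{(T)^2}{n} = \frac{36 \times 36}{12} = 108 \]
3. Total ss = (16 + 9 + 9 + 25 + 9 + 4 + 1 + 1 + 1 + 36 + 25 + 4) – 108
= 140 – 108 = 32
4. ss between columns:
\[ = \left[ \frac{16 \times 16}{4} + \frac{12 \times 12}{4} + \frac{8 \times 8}{4} \right] - 108 = 116 - 108 = 8 \]
5. ss between rows:
\[ = \left[ \frac{10 \times 10}{3} + \frac{10 \times 10}{3} + \frac{3 \times 3}{3} + \frac{13 \times 13}{3} \right] - 108 = 126 - 108 = 18 \]
6. ss residual:
= 32 – (8 + 18) = 6.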
From the ANOVA table, we find that differences related to varieties of incentives and gain
sharing are insignificant at 5% level as the calculated F-ratio, i.e., 4 is less than table value of F,
which is 5.14. However differences are significant for different levels of technology at 5% level
as the observed F ratio is higher than table value of F. At 1% level, however, differences are
insignificant.
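The correction-factor method used in this solution can be sketched in code. The data table for this example is not reproduced above, so the values below are an assumption: they are inferred from the squares listed in step 3 (16, 9, 9, 25, …) read row by row, and they agree with the column totals (16, 12, 8) and row totals (10, 10, 3, 13) used in steps 4 and 5. The function name is illustrative.

```python
def two_way_anova(table):
    """Sums of squares and F ratios for a two-way layout, one value per cell."""
    n_rows, n_cols = len(table), len(table[0])
    n = n_rows * n_cols
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n
    ss_total = sum(v * v for row in table for v in row) - correction
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    ss_cols = sum(t * t for t in col_totals) / n_rows - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    ss_resid = ss_total - ss_cols - ss_rows
    df_cols, df_rows = n_cols - 1, n_rows - 1
    ms_resid = ss_resid / (df_cols * df_rows)
    return {
        "ss_total": ss_total,
        "ss_cols": ss_cols,
        "ss_rows": ss_rows,
        "ss_resid": ss_resid,
        "f_cols": (ss_cols / df_cols) / ms_resid,
        "f_rows": (ss_rows / df_rows) / ms_resid,
    }


# Assumed data (see note above): rows and columns as implied by steps 3 to 5.
productivity = [
    [4, 3, 3],
    [5, 3, 2],
    [1, 1, 1],
    [6, 5, 2],
]

result = two_way_anova(productivity)
print(result)
```

Under this assumption the column F ratio comes out as 4, the value compared with 5.14 in the conclusion above.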
Self Assessment
11. ............................. is used to test the equality of three or more sample means.
Non-parametric tests are used to test the hypothesis with nominal and ordinal data.
2. The hypothesis of non-parametric test is concerned with something other than the value
of a population parameter.
3. Easy to compute. There are certain situations, particularly in marketing research, where
the assumptions of parametric tests are not valid. Example: In a parametric test, we assume
that the data collected follow a normal distribution. When this assumption does not hold, non-parametric tests are
used. Examples of non-parametric tests are the Binomial test, Mann-Whitney U test, Sign test,
etc. A binomial test is used when the population has only two classes such as male, female;
buyers, non-buyers; success, failure, etc. All observations made about the population must
fall into one of the two classes. The binomial test is used when the sample size is small.
Advantages
2. When data are not very accurate, these tests produce fairly good results.
Disadvantage
Non-parametric tests involve a greater risk of accepting a false hypothesis and thus committing
a Type 2 error.
The following are the main examples of one sample non-parametric tests:
This test is used to examine the presence of trends. A set of numbers is said to show an upward trend
if the later numbers in the sequence are greater than the earlier numbers. Similarly, one
can define a downward trend. How do we examine whether a trend is noticeable in a sequence?
Example: Suppose a marketer wants to examine whether its sales are showing a trend or
just fluctuating randomly. Suppose the company has gathered the monthly sales figures for
the past one year:
Month 1 2 3 4 5 6 7 8 9 10 11 12
Sales 200 250 280 300 320 278 349 268 240 318 220 380
Sign-test
Sign-test is used with matched pairs. The test is used to identify the pairs and decide whether the
pair has more or less similar characteristics.
The following are the main examples of two sample non-parametric tests:
This test is used to determine whether two independent samples have been drawn from the
same population. Suppose an experiment has obtained two sets of samples from two populations
and the study wishes to examine whether the two populations are identical.
To find out whether there is any difference in the performance indices of employees of the two
branches.
Kolmogorov-Smirnov Test
This is used for examining the efficacy of fit between observed samples and expected frequency
distribution of data when the variable is in the ordinal scale.
Example: A manufacturer of cosmetics wants to test four different shades of the liquid
foundation compound – very light, light, medium and dark. The company has hired a market
research agency to determine whether any distinct preference exists towards either extreme. If
so, the company will manufacture only the preferred shade, otherwise, the company is planning
to market all shades. Suppose, out of a sample of hundred, 50 preferred the “very light” shade, 30
liked the light shade, 15 the medium shade, and 5 the dark shade. Do you think the results show any
kind of preference?
Since the shade represents ordering (rank), this test can be used to find the preference.
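The text stops short of computing the test statistic. As a minimal sketch, assume that "no preference" means each of the four shades is expected to be chosen by a quarter of the sample, and take the dark-shade count as 5 so that the four counts add up to the stated sample of 100. The Kolmogorov-Smirnov statistic D is then the largest gap between the observed and expected cumulative proportions; the large-sample 5% critical value 1.36/√n is the usual table approximation, not a figure from the text.

```python
import math


def ks_statistic(observed_counts, expected_proportions):
    """Largest absolute gap between observed and expected cumulative proportions."""
    n = sum(observed_counts)
    largest_gap = 0.0
    cumulative_observed = 0
    cumulative_expected = 0.0
    for count, proportion in zip(observed_counts, expected_proportions):
        cumulative_observed += count
        cumulative_expected += proportion
        gap = abs(cumulative_observed / n - cumulative_expected)
        largest_gap = max(largest_gap, gap)
    return largest_gap


# very light, light, medium, dark (dark taken as 5; see the assumption above)
shade_counts = [50, 30, 15, 5]
no_preference = [0.25, 0.25, 0.25, 0.25]

d = ks_statistic(shade_counts, no_preference)
critical_value = 1.36 / math.sqrt(sum(shade_counts))  # approximate 5% value

print(round(d, 2), round(critical_value, 3))
print("preference exists" if d > critical_value else "no evidence of preference")
```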
The Mann-Whitney test is used when two populations are involved; the Kruskal-Wallis test
is used when more than two populations are involved. This test will enable us to know whether
independent samples have been drawn from the same population or from different populations
having the same distribution. This test is an extension of the “Mann Whitney test”.
This is a type of Rank Sum test. This test is used to find out whether two or more independent
samples are drawn from an identical population. This test is also called the H Test. Mann
Whitney test is used when only two populations are involved and Kruskal- Wallis test is used
when more than two populations are involved.
Example: In an assembling unit, three different workers do assembly work in shifts. The
data is tabulated as follows:
Check whether there is any difference in the production quantum of the three workers:
Use H-test and state whether the three populations are same or different.
Solution:
Notes
Item Wage-Painter Wage-Carpenter Wage-Plumber
/day /day /day
Rank Rank Rank
1 64 5 72 7.5 51 1
2 66 6 74 9.5 52 2
3 72 7.5 75 11 54 3
4 74 9.5 78 12 56 4
5 80 13
Total 276 R1 = 28 379 R2 = 53 213 R3 = 10
n1 = 4, n2 = 5, n3 = 4
n = n1 + n2 + n3 = 4 + 5 + 4 = 13
R1 = 28, R2 = 53, R3 = 10
12 é R1 2 ù
H= å ê
n (n + 1) ë n1 û
ú - 3(n + 1)
12 é 282 532 10 2 ù
H= å ê
13 (13 + 1) ë 4
+
5
+
4 û
ú - 3(3 + 1) = 9.61
At the 5% level of significance, for d.f. = (3 - 1) = 2, the table value is 5.991. The computed value,
9.61, is greater.
Conclusion: Reject the null hypothesis that the three populations are the same; the three
populations are different.
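The H-test arithmetic above can be checked with a short script. This is a minimal sketch in Python, not part of the original solution: the function names are illustrative, tied values are given the average of their ranks as in the table, and no tie correction is applied to H.

```python
def average_ranks(values):
    """Map each value to its rank (1..n); tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return {v: sum(p) / len(p) for v, p in positions.items()}


def kruskal_wallis_h(samples):
    """Return the rank sums and H = 12 / (n (n + 1)) * sum(R_i^2 / n_i) - 3 (n + 1)."""
    pooled = [v for sample in samples for v in sample]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_sums = [sum(rank_of[v] for v in sample) for sample in samples]
    weighted = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    return rank_sums, 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Daily wages from the table above.
painter = [64, 66, 72, 74]
carpenter = [72, 74, 75, 78, 80]
plumber = [51, 52, 54, 56]

rank_sums, h = kruskal_wallis_h([painter, carpenter, plumber])
table_value = 5.991  # chi-square table value at the 5% level, d.f. = 2

print("Rank sums:", rank_sums)
print("H =", round(h, 2))
print("Reject null hypothesis:", h > table_value)
```

The script reproduces R1 = 28, R2 = 53, R3 = 10 and compares H with the table value in the same way as the conclusion above.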
Self Assessment
13. ............................. Test is used to determine whether two independent samples have been
drawn from the same population.
14. ............................. Test is used for examining the efficacy of fit between observed samples
and expected frequency distribution.
16. Non-parametric tests are used to test the hypothesis with ............................. and
............................. data.
18. ............................. test involves the greater risk of accepting a false hypothesis.
12.7 Summary
Hypothesis testing is the use of statistics to determine the probability that a given
hypothesis is true.
Identify a test statistic that can be used to assess the truth of the null hypothesis.
Compute the P-value, which is the probability that a test statistic at least as significant as
the one observed would be obtained assuming that the null hypothesis were true.
The smaller the P-value, the stronger the evidence against the null hypothesis.
If P ≤ α, where α is the chosen significance level, the observed effect is statistically
significant, the null hypothesis is ruled out, and the alternative hypothesis is valid.
12.8 Keywords
Alternate Hypothesis: An alternative hypothesis is one that specifies that the null hypothesis is
not true. The alternative hypothesis is false when the null hypothesis is true, and true when the
null hypothesis is false.
ANOVA: It is a statistical technique used to test the equality of three or more sample means.
Degree of Freedom: It is the consideration that tells the researcher the number of elements that
can be chosen freely.
Null Hypothesis: The null hypothesis is a hypothesis which the researcher tries to disprove,
reject or nullify.
Significance Level: Significance level is the criterion used for rejecting the null hypothesis.
1. What hypothesis, test and procedure would you use when an automobile company has
manufacturing facility at two different geographical locations? Each location manufactures
two-wheelers of a different model. The customer wants to know if the mileage given by
both the models is the same or not. Samples of 45 numbers may be taken for this purpose.
2. What hypothesis, test and procedure would you use when a company has 22 sales
executives? They underwent a training programme. The test must evaluate whether the
sales performance is unchanged or improved after the training programme.
3. What hypothesis, test and procedure would you use when a company has three categories of
managers:
4. Each person in a random sample of 50 was asked to state his/her sex and preferred colour.
The resulting frequencies are shown below.
Sex Female 15 6 4
A chi-square test is used to test the null hypothesis that sex and preferred colour are
independent. Will you reject the null hypothesis at the 0.005 level? Why/Why not?
5. Are all employees equally prone to having accidents? To investigate this hypothesis,
Parry (1985) looked at a light manufacturing plant and classified the accidents by type and
by age of the employee.
                  Accident Type
Age            Sprain   Burn   Cut
Under 25       9        17     5
25 or over     61       13     12
A chi-square test gave a test-statistic of 20.78. If we test at α = .05, does the proportion of
sprains, cuts and burns seem to be similar for both age classes? Why/why not?
6. In hypothesis testing, β is the probability of committing an error of Type II. Is the power
of the test, 1 – β, then the probability of rejecting H0 when HA is true or not? Why?
7. In a statistical test of hypothesis, what would happen to the rejection region if α, the level
of significance, is reduced?
8. During the pre-flight check, Pilot Mohan discovers a minor problem - a warning light
indicates that the fuel gauge may be broken. If Mohan decides to check the fuel level by
hand, it will delay the flight by 45 minutes. If he decides to ignore the warning, the aircraft
may run out of fuel before it gets to Mumbai. In this situation, what would be:
9. Can the probability of a Type II error be controlled by the sample size? Why/ why not?
10. A research biologist has carried out an experiment on a random sample of 15 experimental
plots in a field. Following the collection of data, a test of significance was conducted under
appropriate null and alternative hypotheses and the P-value was determined to be
approximately .03. What does this indicate with respect to the hypothesis testing?
11. Two samples were drawn from a recent survey, each containing 500 hamlets. In the first
sample, the mean population per hamlet was found to be 100 with a S.D. of 20, while in the
second sample the mean population was 120 with a S.D. 15. Do you find the averages of the
samples to be statistically significant?
12. A simple random sample of size 100 has a mean of 15, the population variance being 25.
Find an interval estimate of the population mean with a confidence level of (i) 99% and
(ii) 95%.
13. A population consists of five numbers 2, 3, 6, 8, 11. Consider all possible samples of size
two which can be drawn with replacement from this population. Calculate the S.E. of
sample means.
14. A certain drug is claimed to be effective in curing colds; half of them were given sugar
pills. The patients’ reactions to the treatment are recorded in the following table.
Test the hypothesis that the drug is no better than the sugar pills for curing colds. (The 5 %
value of χ² for ν = 2 is 5.991)
15. A random sample of 640 persons from a village provided the following information:
Test whether the new drug was effective in preventing the attack of influenza.
3. alternate 4. Type 1
5. Type 2 6. bivariate
7. ratio 8. 2
9. independent 10. 50
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
R.S. Bhardwaj, Business Statistics, Excel Books, New Delhi, 2008.
S.N. Murthy and U. Bhojanna, Business Research Methods, Excel Books, 2007.
CONTENTS
Objectives
Introduction
13.1 Multivariate Analysis
13.1.1 Multiple Regression
13.2 Discriminant Analysis
13.3 Conjoint Analysis
13.4 Factor Analysis
13.4.1 Principal Component Factor Analysis
13.4.2 Rotation in Factor Analysis
13.5 Cluster Analysis
13.6 Multidimensional Scaling (MDS)
13.7 Summary
13.8 Keywords
13.9 Review Questions
13.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
As the name indicates, multivariate analysis comprises a set of techniques dedicated to the
analysis of data sets with more than one variable. Several of these techniques were developed
recently in part because they require the computational capabilities of modern computers.
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which
involves observation and analysis of more than one statistical variable at a time. In design and
analysis, the technique is used to perform trade studies across multiple dimensions while taking
into account the effects of all variables on the responses of interest. Sometimes, marketers
come across complex situations involving two or more variables. Bivariate analysis deals
with situations involving two variables; Chi-Square is an example of bivariate analysis.
Example: The demand for television sets may depend not only on price, but also on the
income of households, advertising expenditure incurred by TV manufacturer and other similar
factors. To solve this type of problem, multivariate analysis is required.
Classification
1. Multiple regression
2. Discriminant analysis
3. Conjoint analysis
4. Factor analysis
5. Cluster analysis
6. Multidimensional scaling.
In the case of simple linear regression, one variable, say X1, is affected by a linear function
of another variable X2 (we shall use X1 and X2 instead of Y and X used earlier). However, if X1 is
affected by a linear combination of more than one variable, the regression is termed a
multiple linear regression.
Let there be k variables X1, X2, ..., Xk, where one of these, say Xj, is affected by the remaining
k – 1 variables. We write the typical regression equation as
$$X_{jc} = a_{j.12\ldots(j-1)(j+1)\ldots k} + b_{j1.23\ldots(j-1)(j+1)\ldots k}\,X_1 + b_{j2.13\ldots(j-1)(j+1)\ldots k}\,X_2 + \cdots \qquad (j = 1, 2, \ldots, k)$$
Here $a_{j.12\ldots}$, $b_{j1.23\ldots}$, etc. are constants. The constant $a_{j.12\ldots}$ is interpreted as the value of Xj
when X1, X2, ..., Xj–1, Xj+1, ..., Xk are all equal to zero. Further, $b_{j1.23\ldots(j-1)(j+1)\ldots k}$, $b_{j2.13\ldots(j-1)(j+1)\ldots k}$, etc.
are the (k – 1) partial regression coefficients of the regression of Xj on X1, X2, ..., Xj–1, Xj+1, ..., Xk.
For simplicity, we shall consider three variables X1, X2 and X3. The three possible regression
equations can be written as
X1c = a1.23 + b12.3X2 + b13.2X3 .... (1)
X2c = a2.13 + b21.3X1 + b23.1X3 .... (2)
X3c = a3.12 + b31.2X1 + b32.1X2 .... (3)
Given n observations on X1, X2 and X3, we want to find such values of the constants of the
regression equation that $\sum_{i=1}^{n} (X_{ij} - X_{ijc})^2$, j = 1, 2, 3, is minimised.
For convenience, we shall use regression equations expressed in terms of deviations of variables
from their respective means. Equation (1), on taking sum and dividing by n, can be written as
$$\frac{\sum X_{1c}}{n} = a_{1.23} + b_{12.3}\frac{\sum X_2}{n} + b_{13.2}\frac{\sum X_3}{n}$$
or
$$\bar{X}_1 = a_{1.23} + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 \qquad \ldots (4)$$
Subtracting (4) from (1), we get
$$X_{1c} - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3) \quad \text{or} \quad x_{1c} = b_{12.3}x_2 + b_{13.2}x_3 \qquad \ldots (5)$$
Notes The subscripts of the coefficients preceding the dot are termed primary subscripts
while those appearing after it are termed secondary subscripts. The number of secondary
subscripts gives the order of the regression coefficient, e.g., b12.3 is a regression coefficient of
order one, etc.
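Let us first estimate the coefficients of regression equation (5). Given n observations on each of
the three variables X1, X2 and X3, we have to find the values of the constants b12.3 and b13.2 so
that $\sum (x_1 - x_{1c})^2$ is minimised. Using method of least squares, the normal equations can be written as
$$\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3$$
$$\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2$$
Solving these equations simultaneously, we get
$$b_{12.3} = \frac{(\sum x_1x_2)(\sum x_3^2) - (\sum x_1x_3)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \qquad \ldots (10)$$
$$b_{13.2} = \frac{(\sum x_1x_3)(\sum x_2^2) - (\sum x_1x_2)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \qquad \ldots (11)$$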
Notes
1. Various sums of squares and sums of products of deviations, used above, can be
2. The fact that a regression coefficient is independent of change of origin can also be
utilised to further simplify the computational work.
3. The regression coefficients of equations (2) and (3) can be written by symmetry as
given below:
$$b_{21.3} = \frac{(\sum x_2x_1)(\sum x_3^2) - (\sum x_2x_3)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$
$$b_{23.1} = \frac{(\sum x_2x_3)(\sum x_1^2) - (\sum x_2x_1)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$
Further, b31.2 and b32.1 have the same numerators as b13.2 and b23.1 respectively, with the
denominator $(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2$, and the expressions for the constant terms are
$$a_{2.13} = \bar{X}_2 - b_{21.3}\bar{X}_1 - b_{23.1}\bar{X}_3 \quad \text{and} \quad a_{3.12} = \bar{X}_3 - b_{31.2}\bar{X}_1 - b_{32.1}\bar{X}_2$$
respectively.
Example: Fit a linear regression of rice yield (X 1 quintals) on the use of fertiliser
(X2 kgs per acre) and the amount of rain fall (X3 inches), from the following data:
X1 45 50 55 70 75 75 85
X2 25 35 45 55 65 75 85
X3 31 28 32 32 29 27 31
From the above table we compute the following sums of products and sums of squares:
$$\sum x_1x_2 = \sum X_1X_2 - \frac{(\sum X_1)(\sum X_2)}{n} = 26925 - \frac{455 \times 385}{7} = 1900$$
$$\sum x_1x_3 = \sum X_1X_3 - \frac{(\sum X_1)(\sum X_3)}{n} = 13630 - \frac{455 \times 210}{7} = -20$$
$$\sum x_2x_3 = \sum X_2X_3 - \frac{(\sum X_2)(\sum X_3)}{n} = 11500 - \frac{385 \times 210}{7} = -50$$
$$\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 23975 - \frac{385^2}{7} = 2800$$
$$\sum x_3^2 = \sum X_3^2 - \frac{(\sum X_3)^2}{n} = 6324 - \frac{210^2}{7} = 24$$
Substituting these values in equations (10) and (11), we get
(Note: Since ΣUi = 0, Ūi = 0 and ui = Ui – Ūi = Ui, i = 1, 2, 3.)
Hence
$$b_{12.3} = \frac{1900 \times 24 - (-20)(-50)}{2800 \times 24 - (-50)^2} = 0.689$$
$$\bar{X}_1 = \frac{\sum U_1}{n} + 65 = 65, \quad \bar{X}_2 = \frac{\sum U_2}{n} + 55 = 55 \quad \text{and} \quad \bar{X}_3 = \frac{\sum U_3}{n} + 30 = 30$$
Caution The above method should be used when the means of all the variables are integers.
Alternative Method
The coefficients of the regression equation X1c = a1.23 + b12.3X2 + b13.2X3 can also be obtained by
simultaneously solving the following normal equations:
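$$\sum X_1 = n\,a_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$$
$$\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$$
$$\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$$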
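The rice-yield example can be verified numerically. The following is a minimal Python sketch, assuming the deviation formulas (10) and (11) and the relation between the means in equation (4); the helper name dev_product_sum is illustrative and not from the text. The values of b13.2 and a1.23 are computed by the script rather than quoted from the worked solution.

```python
# Data from the example: rice yield, fertiliser and rainfall.
X1 = [45, 50, 55, 70, 75, 75, 85]   # rice yield (quintals)
X2 = [25, 35, 45, 55, 65, 75, 85]   # fertiliser (kg per acre)
X3 = [31, 28, 32, 32, 29, 27, 31]   # rainfall (inches)


def mean(values):
    return sum(values) / len(values)


def dev_product_sum(a, b):
    """Sum of products of deviations from the means: sum(ab) - sum(a) * sum(b) / n."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)


s12 = dev_product_sum(X1, X2)   # sum of x1 * x2
s13 = dev_product_sum(X1, X3)   # sum of x1 * x3
s23 = dev_product_sum(X2, X3)   # sum of x2 * x3
s22 = dev_product_sum(X2, X2)   # sum of x2 squared
s33 = dev_product_sum(X3, X3)   # sum of x3 squared

denominator = s22 * s33 - s23 ** 2
b12_3 = (s12 * s33 - s13 * s23) / denominator            # equation (10)
b13_2 = (s13 * s22 - s12 * s23) / denominator            # equation (11)
a1_23 = mean(X1) - b12_3 * mean(X2) - b13_2 * mean(X3)   # from equation (4)

print("b12.3 =", round(b12_3, 3))
print("b13.2 =", round(b13_2, 3))
print("a1.23 =", round(a1_23, 3))
```

The same coefficients should result from solving the three normal equations of the alternative method, since both approaches minimise the same sum of squares.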
Self Assessment
3. Those who go to Food World to buy and those who buy in a Kirana shop.
Suppose there is a comparison between the groups mentioned as above along with demographic
and socio-economic factors, then discriminant analysis can be used. One way of doing this is to
proceed and calculate the income, age, educational level, so that the profile of each group could
be determined. Comparing the two groups based on one variable alone would be informative
but it would not indicate the relative importance of each variable in distinguishing the groups.
This is because several variables within the group will have some correlation which means that
one variable is not independent of the other.
If we are interested in segmenting the market using income and education, we would be
interested in the total effect of two variables in combinations, and not their effects separately.
Further, we would be interested in determining which of the variables are more important or
had a greater impact. To summarize, we can say that Discriminant Analysis can be used when
we want to consider the variables simultaneously to take into account their interrelationship.
As in regression, the value of the dependent variable is calculated by using the data of the
independent variables.
Z = b1x1 + b2x2 + b3x3 + ..............
where
Z = Discriminant score
bi = Discriminant weight for variable xi
xi = Independent variable
As can be seen from the above, each independent variable is multiplied by its corresponding
weight.
This results in a single composite discriminant score for each individual. By taking the average
of discriminant score of the individuals within a certain group, we create a group mean. This is
known as centroid. If the analysis involves two groups, there are two centroids. This is very
similar to multiple regression, except that different types of variables are involved.
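The discriminant score and the centroid can be illustrated with a small calculation. This is a sketch only: the weights, the two variables and the group data below are hypothetical values chosen for illustration, and in practice the weights would be estimated by the discriminant analysis itself.

```python
def discriminant_score(weights, values):
    """Z = b1*x1 + b2*x2 + ... for one individual."""
    return sum(b * x for b, x in zip(weights, values))


def centroid(weights, group):
    """Mean discriminant score of all individuals in a group."""
    scores = [discriminant_score(weights, person) for person in group]
    return sum(scores) / len(scores)


# Hypothetical weights for (income in thousands, years of education).
weights = [0.5, 2.0]

# Hypothetical individuals in two groups of shoppers.
group_a = [[40, 16], [50, 15], [60, 17]]
group_b = [[20, 10], [30, 12], [25, 11]]

centroid_a = centroid(weights, group_a)
centroid_b = centroid(weights, group_b)

print("Centroid of group A:", centroid_a)
print("Centroid of group B:", centroid_b)
```

With two groups there are two centroids; a new individual could then be compared with each centroid to see which group the score is closer to.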
Application
A company manufacturing FMCG products introduces a sales contest among its marketing
executives to find out “How many distributors can be roped in to handle the company’s product”.
Assume that this contest runs for three months. Each marketing executive is given a target regarding
the number of new distributors and the sales they can generate during the period. This target is fixed
and based on the past sales achieved by them, data about which is available in the company.
It is also announced that marketing executives who add 15 or more distributors will be given a
Maruti Omni-van as prize. Those who generate between 5 and 10 distributors will be given a
two-wheeler as the prize. Those who generate less than 5 distributors will get nothing. Now
assume that 5 marketing executives won a Maruti van and 4 won a two-wheeler.
The company now wants to find out, “Which activities of the marketing executive made the
difference in terms of winning a prize and not winning the prize”. One can proceed in a number
of ways. The company could compare those who won the Maruti van against the others.
Alternatively, the company might compare those who won, one of the two prizes against those
who won nothing. It might compare each group against each of the other two.
Discriminant analysis will highlight the difference in activities performed by each group
members to get the prize. The activity might include:
1. What variable discriminates various groups as above; the number of groups could be two
or more? Dealing with more than two groups is called Multiple Discriminant Analysis
(M.D.A.).
2. Can discriminating variables be chosen to forecast the group to which the brand/person/
place belongs?
3. Dialogue box will appear. Select the GROUPING VARIABLE. This can be done by clicking
on the right arrow to transfer them from the variable list on the left to the grouping
variable box on the right.
4. Define the range of values by clicking on DEFINE RANGE. Enter Minimum and Maximum
value then click CONTINUE.
5. Select all the independent variables for discriminant analysis from the variable list by
clicking on the arrow that transfers them to the box on the right.
6. Click on STATISTICS on the lower part of the main dialogue box. This will open up a smaller
dialogue box.
7. Click on CLASSIFY on the lower part of the main dialogue box and select SUMMARY TABLE
under the heading DISPLAY in the small dialogue box that appears.
Self Assessment
5. If the discriminant analysis involves two groups, there are ....................... centroids.
Conjoint analysis is concerned with the measurement of the joint effect of two or more attributes
that are important from the customers’ point of view. In a situation where the company would
like to know the most desirable attributes or their combination for a new product or service, the
use of conjoint analysis is most appropriate.
Example: An airline would like to know, which is the most desirable combination of
attributes to a frequent traveller: (a) Punctuality (b) Air fare (c) Quality of food served on the
flight and (d) Hospitality and empathy shown.
Conjoint Analysis is a multivariate technique that captures the exact levels of utility that an
individual customer places on various attributes of the product offering. Conjoint Analysis
enables a direct comparison between the utilities of different attribute levels.
Example: A comparison between the utility of a price level of 400 versus 500, a delivery
period of 1 week versus 2 weeks, or an after-sales response of 24 hours versus 48 hours.
Once we know the utility levels for each attribute (and at individual levels as well), we can
combine these to find the best combination of attributes that gives the customer the highest
utility, the second best combination that gives the second highest utility, and so on. This
information is then used to design a product or service offering.
Application
Conjoint Analysis is extremely versatile, and its range of applications covers virtually any
industry. New product or service design, including concepts in the pre-prototyping stage,
can specifically benefit from conjoint applications.
Some examples of other areas where this technique can be used are:
1. Designing an automobile loan or insurance plan in the insurance industry,
2. Designing a complex machine for business customers.
Process
Design attributes for a product are first identified. For a shirt manufacturer, these could be
design (designer shirts versus plain shirts), price (400 versus 800) and distribution (exclusive
distribution versus mass distribution). All possible combinations of these attribute levels
are then listed out. Each design combination will be ranked by customers and used as input data
for Conjoint Analysis. Then the utility of the products relative to price can be measured.
The output is a part-worth or utility for each level of each attribute. For example, the designer
shirt may get a utility level of 5 and the plain shirt 7.5. Similarly, exclusive distribution may
have a part utility of 2, and mass distribution 5.8. We then put together the part utilities and
come up with a total utility for any product combination we want to offer, and compare that with
the maximum utility combination for this customer segment.
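Putting the part utilities together can be sketched as follows. This Python example uses only the part-worths quoted above for design and distribution; no part-worths for price are given in the text, so price is left out, and the simple additive model is an assumption of the sketch.

```python
from itertools import product

# Part-worths quoted in the text for two of the attributes.
part_worths = {
    "design": {"designer": 5.0, "plain": 7.5},
    "distribution": {"exclusive": 2.0, "mass": 5.8},
}


def total_utility(profile):
    """Add up the part-worths of the chosen level of each attribute."""
    return sum(part_worths[attribute][level] for attribute, level in profile.items())


# List every combination of attribute levels and order it by total utility.
attributes = list(part_worths)
profiles = [
    dict(zip(attributes, levels))
    for levels in product(*(part_worths[a] for a in attributes))
]
ranked = sorted(profiles, key=total_utility, reverse=True)

for profile in ranked:
    print(profile, round(total_utility(profile), 1))
```

The first profile printed is the maximum-utility combination for these two attributes; adding a price attribute would only require one more entry in the dictionary.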
This process clarifies for the marketer which attributes of the product or service they
should focus on in the design.
If a retail store finds that the height of a shelf is an important attribute for selling at a particular
level, a well-designed shelf may result from this knowledge. Similarly, a designer of clocks will
benefit from knowing the utility attached by customers to the dial size, background colours, and
price range of the clocks.
Approach
From a discussion with the client, identify the design attributes to be studied and the levels at
which they can be offered. Then build a list of product concepts on offer. These product concepts
are then ranked by customers. Once this data is available, use Conjoint Analysis to derive the
part utilities of each attribute level. This is then used to predict the best product design for the
given customer segment. Use the SPSS Conjoint procedure to analyse the data.
There are three steps in conjoint analysis:
1. Identification of relevant products or service attributes.
2. Collection of data.
3. Estimation of worth for the attribute chosen.
For attributes selection, the market researcher can conduct interview with the customers directly.
1. Weight (3 Kg or 5 Kg)
SPSS commands for Conjoint Analysis: A data file is to be created containing all possible attribute
combinations. This file is named DATA FILE 1.
1. Ask each respondent to rank all the combinations of attributes contained in the file.
All the rankings should be entered in another file called DATA FILE 2.
2. Now two files, namely DATA FILE 1 and DATA FILE 2, have been created.
3. A third file, called the SYNTAX file, is to be opened by using the FILE, OPEN command
followed by SYNTAX.
4. Type the following - conjoint plan = DATA FILE 1 SAV/DATA' DATA FILE 2 SAV/
SCORES=SCORE 1 to Score number of ranking/FACTOR VARI (DISCRETE)/PLOT ALL
(Here 25 is the possible combination of attributes). Score is the term used for rankings. The
number of scores will be equal to the number of rankings. We should use the word RANK in the
syntax instead of SCORES if rankings are contained in the data file.
5. Click RUN from the menu of the syntax file that was created, then click ALL in the menu which
appears on the screen. If the syntax is correct, the output for conjoint will appear.
Combination Rank
3 Kg, 2 hours, Lenovo 4
5 Kg, 4 hours, Dell 5
5 Kg, 2 hours, Lenovo 8
3 Kg, 4 hours, Lenovo 3
3 Kg, 2 hours, Dell 2
5 Kg, 4 hours, Lenovo 7
5 Kg, 2 hours, Dell 6
3 Kg, 4 hours, Dell 1
One combination, 3 kg, 4 hours, Dell, clearly dominates, and 5 kg, 2 hours, Lenovo is least
preferred.
Let us now take the average rank for the 3 kg option = (4 + 3 + 2 + 1)/4 = 2.5. The average rank
for the 5 kg option = (5 + 8 + 7 + 6)/4 = 6.5, a difference of 4.
Looking at the difference in average ranks, the most important characteristic to this
respondent is weight = 4, followed by brand name = 2 and battery life = 1.
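The average-rank comparison can be reproduced for all three attributes at once. This is a minimal Python sketch using the rankings in the table above; the variable and function names are illustrative.

```python
# (weight, battery life, brand, rank) for each combination in the table.
rankings = [
    ("3 Kg", "2 hours", "Lenovo", 4),
    ("5 Kg", "4 hours", "Dell", 5),
    ("5 Kg", "2 hours", "Lenovo", 8),
    ("3 Kg", "4 hours", "Lenovo", 3),
    ("3 Kg", "2 hours", "Dell", 2),
    ("5 Kg", "4 hours", "Lenovo", 7),
    ("5 Kg", "2 hours", "Dell", 6),
    ("3 Kg", "4 hours", "Dell", 1),
]
attribute_names = ["weight", "battery life", "brand name"]


def average_rank_by_level(index):
    """Average rank of the combinations that share each level of one attribute."""
    ranks = {}
    for row in rankings:
        ranks.setdefault(row[index], []).append(row[3])
    return {level: sum(r) / len(r) for level, r in ranks.items()}


importance = {}
for i, name in enumerate(attribute_names):
    averages = average_rank_by_level(i)
    # The larger the gap between levels, the more the attribute matters.
    importance[name] = max(averages.values()) - min(averages.values())
    print(name, averages, "difference =", importance[name])
```

The differences agree with the ordering stated above: weight first, then brand name, then battery life.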
6. ....................... analysis is concerned with the measurement of the joint effect of two or more
attributes.
7. For ....................... selection, the market researcher can conduct interview with the customers
directly.
The main purpose of Factor Analysis is to group a large set of variables into fewer factors.
Each factor will account for one or more components. Each factor is a combination of many variables.
There are two commonly employed factor analysis procedures or methods. They are:
Example: Common factor – Inconvenience inside a car. The components may be:
1. Leg room
2. Seat arrangement
Method: The MR manager prepares a questionnaire to study the customer feedback. The researcher
has identified six variables or factors for this purpose. They are as follows:
3. Comfort (C)
6. Price (F)
The questionnaire may be administered to 5,000 respondents. The opinion of the customer is
gathered. Let us allot points 1 to 10 for the variables A to F, where 1 is the lowest and 10 is the
highest. Let us assume that application of factor analysis has led to grouping the variables as
follows:
F into Factor -2
C into Factor - 3
For future analysis, while conducting a study to obtain customers’ opinion, three factors
mentioned above would be sufficient. One basic purpose of using factor analysis is to reduce the
number of independent variables in the study. By having too many independent variables, the
M.R study will suffer from following disadvantages:
1. Time for data collection is very high due to several independent variables.
The results provide information which is similar in nature to those produced by Factor Analysis
techniques, and they allow one to explore the structure of categorical variables included in the
table. The most common kind of table of this type is the two-way frequency cross-tabulation
table.
Example: Following are the data on the drinking habits of different employees in an
organization:
                                          Drinking Habits
Employee Group                  (1) None   (2) Light   (3) Medium   (4) Heavy   Row Totals
(1) Senior Level Management     5          2           4            3           14
(2) Middle Level Management     4          2           5            9           20
(3) Junior Level Management     15         12          10           5           42
(4) Executives                  25         20          30           15          90
(5) Other Employees             30         5           10           5           50
Column Totals                   79         41          59           37          216
One may think of the 4 column values in each row of the table as coordinates in a 4-dimensional
space, and one could compute the (Euclidean) distances between the 5 row points in the 4-dimensional
space. The distances between the points in the 4-dimensional space summarize all
information about the similarities between the rows in the table above. Now suppose one could
find a lower-dimensional space, in which to position the row points in a manner that retains all,
or almost all, of the information about the differences between the rows. You could then present
all information about the similarities between the rows (types of employees in this case) in a
simple 1, 2, or 3-dimensional graph. While this may not appear to be particularly useful for
small tables like the one shown above, one can easily imagine how the presentation and
interpretation of very large tables (e.g., differential preference for 10 consumer items among
100 groups of respondents in a consumer survey) could greatly benefit from the simplification
that can be achieved via correspondence analysis (e.g., represent the 10 consumer items in a two-
dimensional space).
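The distances between row points described above can be computed directly from the table. The sketch below is in Python and follows the text literally, using plain Euclidean distance on the raw counts; correspondence analysis as usually implemented works with row profiles and chi-square distances, so this only illustrates the idea of treating rows as points.

```python
from math import sqrt

# Rows of the drinking-habits table: counts for None, Light, Medium, Heavy.
rows = {
    "Senior Level Management": [5, 2, 4, 3],
    "Middle Level Management": [4, 2, 5, 9],
    "Junior Level Management": [15, 12, 10, 5],
    "Executives": [25, 20, 30, 15],
    "Other Employees": [30, 5, 10, 5],
}


def euclidean(p, q):
    """Straight-line distance between two points in 4-dimensional space."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


names = list(rows)
distances = {
    (a, b): euclidean(rows[a], rows[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))

closest = min(distances, key=distances.get)
```

The pair with the smallest distance is the pair of employee groups whose rows are most alike in this raw-count sense.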
Rotation is the step in factor analysis that permits you to identify meaningful factor names or
descriptions like these.
To understand rotation, first consider a problem that doesn’t involve factor analysis. Suppose
you want to predict the grades of college students (all in the same college) in many different
courses, from their scores on general “verbal” and “math” skill tests. To build predictive
formulas, you have a body of past data consisting of the grades of several hundred previous
students in these courses, plus the scores of those students on the math and verbal tests. To
predict grades for present and future students, you might use these data from past students to fit
a series of two-variable multiple regressions, each regression predicting the grade in one course
from scores on the two skill tests.
Now suppose a co-worker suggests summing each student’s verbal and math scores to
obtain a composite “academic skill” score, called AS here, and taking the difference between each
student’s verbal and math scores to obtain a second variable, called VMD (verbal-math difference).
The co-worker advises running the same set of regressions to predict grades in individual
courses, except using AS and VMD as predictors in each regression, instead of the original verbal
and math scores. In this instance, you would get exactly the same predictions of course grades
from these two families of regressions: one predicting grades in individual courses from verbal
and math scores, the other predicting the identical grades from AS and VMD scores. In fact, you
would get the same predictions if you formed composites of 3 math + 5 verbal and 5 math + 3
verbal, and ran a series of two-variable multiple regressions predicting grades from these two
composites. These examples are all linear functions of the original verbal and math scores.
The vital point is that if you have m predictor variables, and you replace the m original predictors
by m linear functions of those predictors, you generally neither gain nor lose any information;
you could, if you wish, use the scores on the linear functions to reconstruct the scores on the original
variables. But multiple regression uses whatever information you have in the optimum way (as
measured by the sum of squared errors in the current sample) to predict a new variable (e.g.
grades in a particular course). Since the linear functions contain the same information as the
original variables, you get the same predictions as before.
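The claim that linear functions of the predictors give the same predictions can be checked on a small data set. The Python sketch below fits a two-predictor least-squares regression twice, once on verbal and math scores and once on AS and VMD. All scores and grades are hypothetical, and the small Gaussian-elimination solver is only there to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_and_predict(p1, p2, y):
    """Least-squares fit of y on an intercept and two predictors; returns fitted values."""
    design = [[1.0, u, v] for u, v in zip(p1, p2)]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * t for row, t in zip(design, y)) for i in range(3)]
    coefficients = solve(xtx, xty)
    return [sum(c * v for c, v in zip(coefficients, row)) for row in design]


# Hypothetical test scores and course grades for six past students.
verbal = [52, 61, 70, 45, 66, 58]
maths = [48, 72, 65, 50, 59, 77]
grade = [60, 71, 74, 52, 68, 73]

academic_skill = [v + m for v, m in zip(verbal, maths)]     # AS
verbal_math_diff = [v - m for v, m in zip(verbal, maths)]   # VMD

pred_original = fit_and_predict(verbal, maths, grade)
pred_composite = fit_and_predict(academic_skill, verbal_math_diff, grade)
max_gap = max(abs(p - q) for p, q in zip(pred_original, pred_composite))

print("Largest difference between the two sets of predictions:", max_gap)
```

The largest difference is at the level of rounding error, which is what the argument above predicts; the regression coefficients themselves differ between the two fits even though the predictions do not.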
Given that there are many ways to get exactly the same predictions, is there any advantage
to using one set of linear functions rather than another? Yes, there is; one set might be simpler
than another. One particular pair of linear functions may enable many of the course grades to be
predicted from just one variable (that is, one linear function) rather than from two. If we regard
regressions with fewer predictor variables as simpler, then we can ask this question: Out of all the
possible pairs of predictor variables that would give the same predictions, which is simplest to
use, in the sense of minimizing the number of predictor variables needed in the typical regression?
The pair of predictor variables maximising some measure of simplicity could be said to have
simple structure. In this example involving grades, you might be able to predict grades in some
courses accurately from just a verbal test score, and predict grades in other courses accurately
from just a math score. If so, then you would have achieved a “simpler structure” in your
predictions than if you had used both tests for every prediction.
The points of the preceding section are relevant when the predictor variables are factors. Think
of the m factors F as a set of independent or predictor variables, and think of the p observed
variables X as a set of dependent or criterion variables. Consider a set of p multiple regressions,
each predicting one of the variables from all m factors. The standardized coefficients in this set of
regressions form a p × m matrix called the factor loading matrix. If we replaced the original
factors by a set of linear functions of those factors, we would get exactly the same predictions as
before, but the factor loading matrix would be different. So we can ask which, of the many
possible sets of linear functions we might use, produces the simplest factor loading matrix.
Specifically, we will define simplicity as the number of zero or near-zero entries in the factor
loading matrix: the more zeros, the simpler the structure. Rotation does not alter matrix C or
U at all, but does transform the factor loading matrix.
In the extreme case of simple structure, each X-variable will have only one large entry, so that
all the others can be ignored. But that would be a simpler structure than you would usually
expect to achieve; after all, in the real world each variable isn’t in general affected by only one
other variable. You then name the factors subjectively, based on an examination of their loadings.
In common factor analysis the procedure of rotation is in fact somewhat more abstract than
implied here, since you don’t actually know the individual scores of cases on factors.
However, the statistics for a multiple regression that are most relevant here (the multiple
correlation and the standardized regression slopes) can all be calculated just from the correlations
of the variables and factors involved. So we can base the calculations for rotation to simple
structure on just those correlations, without using any individual scores.
A rotation which requires the factors to remain uncorrelated is an orthogonal rotation, while
others are oblique rotations. Oblique rotations often achieve greater simple structure, though
at the cost that you also have to consider the matrix of factor intercorrelations when interpreting
results. Manuals are usually clear about which is which, but if there is ever any ambiguity, a simple
rule is that if there is any capability to print out a matrix of factor correlations, then the rotation
is oblique, as no such capacity is needed for orthogonal rotations.
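The effect of an orthogonal rotation on a loading matrix can be seen in a small two-factor case. In this Python sketch the loadings are hypothetical and the rotation angle of 45 degrees is chosen by hand rather than by a criterion such as varimax. It shows that the product of the loading matrix with its own transpose (the part of the correlations reproduced by the factors) is unchanged, while the individual loadings move towards a simpler pattern.

```python
from math import cos, sin, radians

# Hypothetical loadings of four observed variables on two factors.
loadings = [
    [0.7, 0.5],
    [0.6, 0.6],
    [0.6, -0.5],
    [0.7, -0.4],
]


def rotate(matrix, degrees):
    """Post-multiply a p x 2 loading matrix by a 2 x 2 orthogonal rotation matrix."""
    t = radians(degrees)
    rotation = [[cos(t), -sin(t)], [sin(t), cos(t)]]
    return [
        [sum(row[k] * rotation[k][j] for k in range(2)) for j in range(2)]
        for row in matrix
    ]


def reproduced(matrix):
    """Loading matrix times its transpose: the correlations explained by the factors."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


rotated = rotate(loadings, 45)
before = reproduced(loadings)
after = reproduced(rotated)
largest_change = max(
    abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)
)

for row in rotated:
    print([round(value, 3) for value in row])
print("Largest change in reproduced correlations:", largest_change)
```

After rotation each variable loads mainly on one factor, which is the kind of simple structure the section describes, and the reproduced correlations differ only by rounding error.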
Self Assessment
9. When the objective is to summarise information from a large set of variables into fewer
factors, ....................... analysis is used.
Process
There are two ways in which Cluster Analysis can be carried out:
The above two are basic approaches used in cluster analysis. This can be used to segment
customer groups for a brand or product category, or to segment retail stores into similar groups
based on selected variables.
Interpretation of Results
Ideally, the variables should be measured on an interval or ratio scale. This is because the
clustering techniques use the distance measure to find the closest objects to group into a cluster.
An example of its use can be clustering of towns similar to each other which will help decide
where to locate new retail stores.
If clusters of customers are found based on their attitudes towards new products and interest in
different kinds of activities, an estimate of the segment size for each segment of the population
can be obtained, by looking at the number of objects in each cluster.
Marketing strategies for each segment are fine-tuned based on the segment characteristics. For
instance, a segment of customers, such as sports car buyers, may get a special promotional offer during a specific
period.
Did u know? Names can also be given to clusters to describe each one. For example, there
can be a cluster called “neo-rich”. Segments are prioritised based on their estimated size.
The example below shows Cluster Analysis based on three dimensions: age, income and family
size. Cluster Analysis is used to segment the car-buying population in a metro. For example, cluster “A”
might represent potential buyers of low-end cars such as the Maruti 800 (for the common man).
These are people who are graduating from the two-wheeler market segment. Cluster “B” may
represent the mid-population segment buying Zen, Santro, Alto, etc. Cluster “C” represents car
buyers who belong to the upper strata of society: buyers of Lancer, Honda City, etc. Cluster “D”
represents the super-rich cluster, i.e., buyers of Benz, BMW, etc.
[Figure: clusters A and B plotted against three axes: Income, Age and Family size]
Example: Suppose there are seven attributes, 1 to 7, on which we are judging two objects A and
B. The existence of an attribute may be indicated by 1 and its absence by 0. In this way, two
objects are viewed as similar if they share common attributes.
Attribute 1 2 3 4 5 6 7
Brand - A 1 0 0 1 0 0 1
Brand - B 0 0 1 1 1 0 0
$$S = \frac{a + d}{a + b + c + d}$$
Where
a = No. of attributes possessed by brands A and B
b = No. of attributes possessed by brand A but not by brand B
c = No. of attributes possessed by brand B but not by brand A
d = No. of attributes possessed by neither brand.
Substituting, we get
$$S = \frac{1 + 2}{1 + 2 + 2 + 2} = \frac{3}{7} = 0.43$$
It is now clear that object A possesses attributes 1, 4 and 7 while object B possesses attributes 3,
4 and 5. A glance at the above table will indicate that objects A and B are similar in respect of attributes 2
(0 & 0), 6 (0 & 0) and 4 (1 & 1). In respect of the other attributes, there is no similarity between A and
B. We arrive at the simple matching measure by (a) counting up the total number of
matches, either 0, 0 or 1, 1, and (b) dividing this number by the total number of attributes.
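The simple matching measure can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original text; the function name simple_matching is an illustrative assumption, while the two attribute profiles are those of Brand A and Brand B in the table above.

```python
def simple_matching(x, y):
    """Return (a, b, c, d, S) for two equal-length 0/1 attribute profiles."""
    if len(x) != len(y):
        raise ValueError("both profiles must cover the same attributes")
    pairs = list(zip(x, y))
    a = sum(1 for p, q in pairs if p == 1 and q == 1)  # possessed by both
    b = sum(1 for p, q in pairs if p == 1 and q == 0)  # A only
    c = sum(1 for p, q in pairs if p == 0 and q == 1)  # B only
    d = sum(1 for p, q in pairs if p == 0 and q == 0)  # possessed by neither
    s = (a + d) / (a + b + c + d)                      # matches / all attributes
    return a, b, c, d, s


# Attribute profiles from the table (attributes 1 to 7).
brand_a = [1, 0, 0, 1, 0, 0, 1]
brand_b = [0, 0, 1, 1, 1, 0, 0]

a, b, c, d, s = simple_matching(brand_a, brand_b)
print(a, b, c, d, round(s, 2))  # 1 2 2 2 0.43
```

Because both joint presences (a) and joint absences (d) count as matches, two brands that lack many of the same attributes will look similar under this measure.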
Stage 1
Enter the input data along with variable and value labels in an SPSS file.
3. Dialogue box will appear select all the variables which are required to be used in cluster
analysis. This can be done by clicking on the right arrow to transfer them from the variable
list on the left.
4. Click on METHOD. The dialogue box will open. Choose "Between Groups Linkage" as the
CLUSTER METHOD.
6. Click STATISTICS on the main dialogue box. Choose "Agglomeration schedule" so that it
will appear in the final output, then click CONTINUE.
7. Choose DENDROGRAM; then, in the box called ICICLE, choose "All Clusters" and "Vertical".
8. Click OK on the main dialogue box to get the output of the hierarchical cluster analysis.
Stage 2
This stage, called K-MEANS CLUSTERING, forms the final clusters once Stage 1 has shown how many
clusters are required.
2. Fill in the desired number of clusters that has been identified from stage 1.
3. Click OPTIONS on the main dialogue box. Select "Initial Cluster Centers". Then click
CONTINUE to return to the main dialogue box.
4. Click OK on the main dialogue box to get the output which has final clusters.
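The logic of the two stages can be illustrated outside SPSS. The following is a rough pure-Python sketch, not the SPSS procedure itself. The (age, income) data are hypothetical. Taking the first k cases as initial cluster centres, and reading the number of clusters off the largest jump in the agglomeration schedule, are simplifying assumptions made only for this illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(points, clusters, a, b):
    """Between-groups linkage: mean distance over all cross-cluster pairs."""
    pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
    return sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)


def agglomeration_schedule(points):
    """Stage 1: repeatedly merge the two closest clusters until one remains."""
    clusters = {i: [i] for i in range(len(points))}
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: average_linkage(points, clusters, *pair))
        schedule.append((a, b, average_linkage(points, clusters, a, b)))
        clusters[a].extend(clusters.pop(b))
    return schedule


def suggest_k(schedule):
    """Stop merging just before the largest jump in merge distance."""
    d = [row[2] for row in schedule]
    jumps = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    merges_kept = jumps.index(max(jumps)) + 1
    return len(schedule) + 1 - merges_kept


def k_means(points, k, iterations=20):
    """Stage 2: assign each point to its nearest centre, then update centres."""
    centres = list(points[:k])  # assumption: first k cases as initial centres
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centres[c]))
            groups[nearest].append(p)
        centres = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centres)]
    return groups


# Hypothetical customers described by (age, income).
customers = [(25, 3), (27, 4), (26, 3), (50, 12), (52, 13), (51, 12)]

schedule = agglomeration_schedule(customers)
k = suggest_k(schedule)
segments = k_means(customers, k)
print(k, [len(s) for s in segments])  # 2 [3, 3]
```

In practice the variables would usually be standardised first, since a variable measured in large units would otherwise dominate the distance.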
Self Assessment
12. ....................... Analysis is a technique used for classifying objects into groups.
13. The ....................... application of cluster analysis is in customer segmentation and estimation
of segment sizes.
In addition to fulfilling the goals of detecting underlying structure and data reduction that it
shares with other methods, multidimensional scaling (MDS) provides the researcher with a
spatial representation of data that can facilitate interpretation and reveal relationships. Therefore,
we can define MDS as “a set of multivariate statistical methods for estimating the parameters in
and assessing the fit of various spatial distance models for proximity data.”
The spatial display of data provided by MDS is why it is also sometimes referred to as perceptual
mapping. MDS has much more flexibility about the types of data that can be used to generate the
solution. Almost any measures of similarity and dissimilarity can be used, depending on what
your statistical computer software will accept.
Types of MDS
1. Metric
2. Non-metric
Metric MDS makes the assumption that the input data is either ratio or interval data, while the
non-metric model requires simply that the data be in the form of ranks. Therefore, the non-metric
model has fewer restrictions than the metric model, but also less rigor. One technique
to use if you are unsure whether your data is ordinal or can be considered interval is to try both
metric and non-metric models. If the results are very close, the metric model may be used.
An advantage of the non-metric models is that they permit the researcher to categorize and
examine preference data, such as the kind obtained in marketing studies or other areas where
comparisons are useful.
Another technique, correspondence analysis, can work with categorical data, i.e., data at the
nominal level of measurement; however, that technique will not be described here.
Example: Let us say that you have a matrix of distances between a number of major cities,
such as you might find on the back of a road map. These distances can be used as the input data
to derive an MDS solution. When the results are mapped in two dimensions, the solution will
reproduce a conventional map, except that the MDS plot might need to be rotated so that the
north-south and east-west dimensions conform to expectations. However, once the rotation
is completed, the configuration of the cities will be spatially correct.
Self Assessment
14. An advantage of the non-metric models is that they permit the researcher to .......................
and ....................... preference data.
15. The spatial display of data provided by MDS is also sometimes referred to as ………………..
13.7 Summary
Some of the multivariate analysis techniques are discriminant analysis, factor analysis, cluster
analysis, conjoint analysis and multidimensional scaling.
In discriminant analysis, it is verified whether the two groups differ from one another.
Factor analysis is used to reduce a large number of variables into fewer factors.
Cluster analysis is used to segment the market or to identify the target group.
Regression is a term used for predicting the value of one variable from the other.
MDS is a set of multivariate statistical methods for estimating the parameters in and
assessing the fit of various spatial distance models for proximity data.
The output of MDS looks very similar to that of factor analysis and the determination of
the optimal number of dimensions is handled in much the same way.
13.8 Keywords
Cluster Analysis: Cluster Analysis is a technique used for classifying objects into groups.
Conjoint Analysis: Conjoint analysis is concerned with the measurement of the joint effect of
two or more attributes that are important from the customers’ point of view.
Discriminant Analysis: In this analysis, two or more groups are compared. In the final analysis,
we need to find out whether the groups differ one from another.
Factor Analysis: Factor Analysis is the analysis whose main purpose is to group a large set of
variables into fewer factors.
Multivariate Analysis: In multivariate analysis, the number of variables to be tackled is
large.
1. Which technique would you use to measure the joint effect of various attributes while
designing an automobile loan and why?
2. Do you think that the conjoint analysis will be useful in any manner for an airline? If yes
how, if no, give an example where you think the technique is of immense help.
4. Which analysis would you use in a situation when the objective is to summarise information
from a large set of variables into fewer factors? What will be the steps you would follow?
5. Which analysis would answer if it is possible to estimate the size of different groups?
6. Which analysis would you use to compare a good, bad and a mediocre doctor and why?
8. Which multivariate analysis would you apply to identify specific customer segment for a
company’s brand and why?
Variable Load
CGPA 0.60 x F
This table tells us that the communication skill score loads highly on the intelligence factor of
management students, followed by problem solving skills and CGPA. These loads or
weights are correlations, i.e., the correlations between communication skills and the factor.
But here we have only three variables and only one factor. In real life we may have many
variables and more factors. Whatever may be the case, the basic ideas remain the same.
Suppose we want to recruit management trainees from the campus and as a selection
process, we need to consider the following variables.
X1 = CGPA
X3 = Communication Skills
X5 = GD Score
12. People have been rated on their suitability for an advanced training course in computer
programming on the basis of six ratings given by their manager (rated 1=low to 20=high):
(a) Intellect
(b) Interest in doing the course
The training department believes that these are really measuring only three things: intellect,
computer programming experience and loyalty, and want you to carry out a factor analysis
to explore that hypothesis. Describe the decisions you would have to make in carrying out
a factor analysis and what the results would be likely to tell you.
13. Six observations on two variables are available, as shown in the following table:
Obs. X1 X2
a 3 2
b 4 1
c 2 5
d 5 2
e 1 6
f 4 2
(a) Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?
(b) Apply the nearest neighbor method and the squared Euclidean distance as a measure
of dissimilarity. Use a dendrogram to arrive at the number of groups and their
membership.
14. Six observations on two variables are available, as shown in the following table:
Obs. X1 X2
a -1 -2
b 0 0
c 2 2
d -2 -2
e 1 -1
f 1 2
(a) Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?
(b) Apply the nearest neighbor method and the Euclidean distance as a measure of
dissimilarity.
CONTENTS
Objectives
Introduction
14.1 Characteristics of Research Report
14.1.1 Substantive Characteristics
14.1.2 Semantic Characteristics
14.2 Significance of Report Writing
14.3 Techniques and Precautions of Interpretation
14.3.1 Basic Analysis of "Quantitative" Information
14.3.2 Basic Analysis of "Qualitative" Information
14.3.3 Interpreting Information
14.3.4 Precautions
14.4 Types of Report
14.4.1 Oral Report
14.4.2 Written Report
14.4.3 Distinguish between Oral and Written Report
14.5 Preparation of Research Report
14.5.1 How to Write a Bibliography?
14.6 Style, Layout and Precautions of the Report writing
14.6.1 Style of Report Writing
14.6.2 Layout of the Report
14.6.3 Precautions in Report Writing
14.7 Summary
14.8 Keywords
14.9 Review Questions
14.10 Further Readings
Objectives
After studying this unit, you will be able to:
Introduction
A report is a very formal document that is written for a variety of purposes, generally in the
sciences, social sciences, engineering and business disciplines. Generally, findings pertaining to
a given or specific task are written up into a report. It should be noted that reports are considered
to be legal documents in the workplace and, thus, they need to be precise, accurate and difficult
to misinterpret.
There are three features that, together, characterize report writing at a very basic level: a predefined
structure, independent sections, and reaching unbiased conclusions.
Predefined structure: Broadly, these headings may indicate sections within a report, such
as an introduction, discussion, and conclusion.
Characteristics are an integral part of the report. There is no hard and fast rule for preparing
a research report. The research report will differ based on the need of the particular managers
using the report. The report also depends on the philosophy of the researcher.
Example: A report prepared for a government agency will be different from the one prepared
for a private organization.
Although a marketing report is influenced by the researcher, there are certain
characteristics which the report should possess, if it is to be effectively communicated. These
characteristics can be classified as:
i. Substantive characteristics
Accuracy
Currency
Sufficiency
Availability
Relevancy
The more that the report possesses the above characteristics, the greater is its practical value in
decision making.
Accuracy: Accuracy refers to the degree to which information reflects reality. Specifically, a research
report must accurately present both the research procedure and the research results. Even if the research
results are not as per the expectation of the management, the researcher has the professional
obligation to present the findings accurately and objectively. A less accurate report does an injustice
to the management.
Currency: Currency refers to the time span between completion of the research project and
presentation of the research report to management. If the management receives the research
report too late, the results are no longer valid due to environmental changes, and then the report
will have no or little value for decision making. Currency is one of the reasons for orally or
informally communicating preliminary research results to management to ensure timely decision
making.
Sufficiency: The research report must have sufficient details, so that important and valid decisions
can be made. Sometimes the sample size or sample representativeness may act as a constraint, so that
sufficient details are not available.
Example: The management may require segment-wise market data, whereas only overall
market data is available.
Notes A research report must document methodology and techniques used so that an
assessment can be made regarding validity, reliability and generalizability. Therefore,
sufficiency refers to whether enough information is present in the research report to
enable the manager to take valid decision.
It should be remembered that sufficiency characteristic does not mean that all possible research
project information must be incorporated in the research report. A researcher should include in
a report only that information, which is necessary to convey complete perspective of the research
project.
Availability: The fourth important characteristic of a research report is that it is available to the
appropriate decision maker when they need it. Availability refers to the communication process
between the researcher and the decision maker. We use the phrase 'appropriate decision maker' to
emphasize the question of "who should or who should not have access to the report". This decision is
made by the management, and it is the duty of the researcher to carry out this decision. Most
reports carry confidential information. Therefore, it is necessary to restrict the report's availability,
both to individuals within the organization and outside it, to prevent competitors from having
access to it.
Relevancy: The research report should be confined to the decision issue researched. Sometimes
the researcher might include some information, which he thinks is interesting, but may not
have any relevance. This type of information should be excluded from the report.
Semantic characteristics are equally important in a report. The report should be grammatically
correct. It should be free from spelling and typing errors. This will ensure that there is no
ambiguity or misunderstanding. Assistance of a proof reader, other than the researcher would
be required to eliminate the above errors.
v. Language of the report must be simple. For example, sentences like "illumination must be
extinguished when premises are not in use" can be expressed in simple words say "switch
off the lights when you leave".
vi. Avoid using 'I' 'we'. The report should be more impersonal.
vii. Sometimes, the current research uses the data of research conducted in the past. In this case
it is better to use past tense than present tense.
The following are the hindrances for clarity of any research report.
Ambiguity
Jargon
Misspelled words
Excessive prediction
Improper punctuation
Unfamiliar words
Clerical error
Some of the illustrations that can cause inaccuracy in report writing are given below:
Addition/subtraction error: Assume that a survey was conducted to ascertain the income
of various strata of population in a city. Suppose, it is found that 15% belong to super rich,
18% belong to rich class, 61% belong to middle class.
By oversight, the categories recorded total 15 + 18 + 61 = 94, which is not equal to a hundred. This error
can be corrected easily by the researcher. If left uncorrected, it leads to confusion because the
reader or decision maker does not know which categories are left out (maybe lower
middle class and lower class).
Confusion between percentage and percentage points: Suppose the report indicates that
raw material cost of a product as a percentage of total cost increased from 8 percent
in 2003 to 10 percent in 2009, and concludes that the raw material cost has increased
by only 2 percent in 6 years. The real increase is 2 percentage points, or
25 percent.
Wrong conclusion: Mr. X's annual income has increased from 20,000 to 40,000 in 8 years.
Therefore, the conclusion is, since income has doubled, the purchasing power also has
doubled. This may not be true because due to inflation in 8 years, purchasing power might
come down or money value could get eroded.
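The three illustrations above can be checked with simple arithmetic. The following is a minimal Python sketch; the function names are illustrative, and the 7 percent annual inflation rate is purely an assumed figure (the text does not state one), used only to show how a doubled income need not mean doubled purchasing power.

```python
def unaccounted_share(shares):
    """Percentage of respondents not covered by the reported categories."""
    return 100 - sum(shares)


def percentage_point_change(old_pct, new_pct):
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct


def percent_change(old, new):
    """Relative change of new over old, in percent."""
    return (new - old) / old * 100


def real_value(nominal, annual_inflation, years):
    """Nominal amount expressed in base-year purchasing power."""
    return nominal / (1 + annual_inflation) ** years


# Addition/subtraction error: super rich 15%, rich 18%, middle class 61%.
print(unaccounted_share([15, 18, 61]))   # 6

# Raw material cost share rising from 8% to 10% of total cost.
print(percentage_point_change(8, 10))    # 2
print(percent_change(8, 10))             # 25.0

# Income of 40,000 after 8 years, deflated at an ASSUMED 7% annual inflation.
real_income = real_value(40_000, 0.07, 8)
print(real_income < 2 * 20_000)          # True
```

Under the assumed inflation rate, the real income is well below double the original 20,000, which is why the conclusion about purchasing power does not follow from the nominal figures alone.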
Self Assessment
1. The research report will differ based on the …………of the particular managers using the
report.
3. Availability refers to the communication process between researcher and the………………..
4. …………….refers to the time span between completion of the research project and
presentation of the research report to management
Preparation and presentation of a research report is the most important part of the research
process. No matter how brilliant the hypothesis and how well designed is the research study,
they are of little value unless communicated effectively to others in the form of a research
report. Moreover, if the report is confusing or poorly written, the time and effort spent on
gathering and analysing data would be wasted. It is therefore, essential to summarise and
communicate the result to the management in the form of an understandable and logical research
report.
Research report is regarded as a major component of the research study for the research task
remains unfinished till the report has been presented and/or written. As a matter of fact even
the most brilliant hypothesis, very well designed and conducted research study, and the most
striking generalizations and findings are of modest value unless they are effectively communicated
to others. The rationale of research is not well served unless the findings are made known to
others. Research results must customarily enter the general store of knowledge. All this explains
the importance of writing research report. There are people who do not consider writing of
report as an essential part of the research process. But the general opinion is in favour of treating
the presentation of research results or the writing of report as part and parcel of the research
project. Writing of report is the final step in a research study and requires a set of skills somewhat
different from those called for in respect of the former stages of research. This task should be
accomplished by the researcher with extreme care; he may seek the assistance and guidance of
experts for this purpose.
Self Assessment
6. Writing of report is the ………..step in a research study and requires a set of skills somewhat
different from those called for in respect of the former stages of research.
Interpretation means bringing out the meaning of data. We can also say that interpretation is
to convert data into information. The essence of any research is to do interpretation about
the study. This requires a high degree of skill. There are two methods of drawing conclusions
(i) induction (ii) deduction.
In the induction method, one starts from observed data and then generalisation is done which
explains the relationship between objects observed.
On the other hand, deductive reasoning starts from some general law and is then applied to a
particular instance i.e., deduction comes from the general to a particular situation.
Example:
Example of Induction: DVD player model 2602 MX, made by Sony, is excellent, and so are the other Sony
products observed. Therefore, all products manufactured by Sony are excellent.
Example of Deduction: All products have to reach the decline stage one day and become obsolete. This
radio is in decline mode. Therefore, it will become obsolete.
During the inductive phase, we reason from observation. During the deductive phase, we reason
towards the observation. Successful interpretation depends on how well the data is analysed. If
data is not properly analysed, the interpretation may go wrong. If the analysis is to be correct,
the data collection must be proper. Similarly, if the data collected is proper but analysed
wrongly, then too the interpretation or conclusion will be wrong. Sometimes, even with the
proper data and proper analysis, the data can still lead to wrong interpretation. Interpretation
depends upon the experience of the researcher and methods used by him for interpretation.
Did u know? Both logic and observation are essential for interpretation.
Example: A detergent manufacturer is trying to decide which of the three sales promotion
methods (discount, contest, buy one get one free) would be most effective in increasing the sales.
Each sales promotion method is run at different times in different cities. The sales obtained by
the different sales promotion methods are as follows.
Sales promotion method    Sales
1                         2,000
2                         3,500
3                         2,510
The results may lead us to the conclusion that the second sales promotion method was the most
effective in developing sales. This may be adopted nationally to promote the product. But one
cannot say that the same method of sales promotion will be effective in each and every city
under study.
(for information other than commentary, e.g., ratings, rankings, yes's, no's, etc.)
Make copies of your data and store the master copy away. Use the copy for making edits,
cutting and pasting, etc.
Tabulate the information, i.e., add up the number of ratings, rankings, yes's, no's for each
question.
For ratings and rankings, consider computing a mean, or average, for each question. For
example, "For question #1, the average ranking was 2.4". This is more meaningful than
indicating, e.g., how many respondents ranked 1, 2, or 3.
Consider conveying the range of answers, e.g., 20 people ranked "1", 30 ranked "2", and 20
people ranked "3".
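Tabulating the responses and computing the mean can be sketched in a few lines of Python. This is only an illustration; the response list reproduces the counts quoted above (20, 30 and 20 respondents ranking 1, 2 and 3), and the function names are assumptions.

```python
from collections import Counter


def tabulate(responses):
    """Count how many respondents gave each rank, ordered by rank."""
    return dict(sorted(Counter(responses).items()))


def mean_rank(table):
    """Weighted average rank from a {rank: count} table."""
    total = sum(table.values())
    return sum(rank * count for rank, count in table.items()) / total


# 20 people ranked "1", 30 ranked "2" and 20 ranked "3".
responses = [1] * 20 + [2] * 30 + [3] * 20

table = tabulate(responses)
print(table)             # {1: 20, 2: 30, 3: 20}
print(mean_rank(table))  # 2.0
```

Reporting the table alongside the mean is useful because very different spreads of answers can produce the same average.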
Attempt to identify patterns, or associations and causal relationships in the themes, e.g.,
all people who attended programs in the evening had similar concerns, most people came
from the same geographic area, most people were in the same salary range, what processes
or events respondents experience during the program, etc.
Keep all commentary for several years after completion in case needed for future reference.
Attempt to put the information in perspective, e.g., compare results to what you expected,
promised results; management or program staff; any common standards for your products
or services; original goals (especially if you're conducting a program evaluation);
indications or measures of accomplishing outcomes or results (especially if you're
conducting an outcomes or performance evaluation); description of the program's
experiences, strengths, weaknesses, etc. (especially if you're conducting a process
evaluation).
14.3.4 Precautions
2. Analysis of data should start from simpler and more fundamental aspects.
Caution: In report writing, do not miss the significance of some answers, because they are found
from very few respondents, such as "don't know" or "can't say".
Self Assessment
9. In the ………………method, one starts from observed data and then generalisation is done
There are two types of reports (1) Oral report (2) Written report.
This type of reporting is required, when the researchers are asked to make an oral presentation.
Making an oral presentation is somewhat difficult compared to the written report. This is
because the reporter has to interact directly with the audience. Any faltering during an oral
presentation can leave a negative impression on the audience. This may also lower the self-
confidence of the presenter. In an oral presentation, communication plays a big role. A lot of
planning and thinking is required to decide 'What to say', 'How to say', 'How much to say'. Also,
the presenter may have to face a barrage of questions from the audience. A lot of preparation is
required; the broad classification of an oral presentation is as follows.
Opening: A brief statement can be made on the nature of discussion that will follow. The opening
statement should explain the nature of the project, how it came about and what was attempted.
Recommendation: Each recommendation must have the support of conclusion. At the end of the
presentation, question-answer session should follow from the audience.
Method of presentation: Visuals, if need to be exhibited, can be made use of. The use of tabular
form for statistical information would help the audience.
(a) What type of presentation to use is a root question: is it read from a manuscript, memorized, or
delivered extempore? Memorization is not recommended, since there could be a slip during the
presentation. Secondly, it produces a speaker-centric approach. Even reading from the manuscript
is not recommended, because it becomes monotonous, dull and lifeless. The best way to deliver
extempore is to make notes of the main points, so that the same can be expanded. Logical sequences
should be followed.
4. Vital data such as figures may be printed and circulated to the audience so that their
ability to comprehend increases, since they can refer to it when the presentation is
going on.
5. The presenter should know his target audience well in advance to prepare tailor-
made presentation.
6. The presenter should know the purpose of report such as "Is it for making a decision",
"Is it for the sake of information", etc.
(1) Daily
(2) Weekly
(3) Monthly
(4) Quarterly
(5) Yearly
1. Short report: Short reports are produced when the problem is very well defined and the
scope is limited, for example, a monthly sales report. It will run into about five pages. It
consists of a report on the progress made with respect to a particular product in a clearly
specified geographical location.
2. Long report: This could be both a technical report as well as non-technical report. This will
present the outcome of the research in detail.
(a) Technical report: This will include the sources of data, research procedure, sample
design, tools used for gathering data, data analysis methods used, appendix,
conclusion and detailed recommendations with respect to specific findings. If any
journal, paper or periodical is referred, such references must be given for the benefit
of reader.
(b) Non-technical report: This report is meant for those who are not technically qualified.
E.g. Chief of the finance department. He may be interested in financial implications
only, such as margins, volumes, etc. He may not be interested in the methodology.
3. Formal report:
Example: The report prepared by the marketing manager to be submitted to the Vice-
President (marketing) on quarterly performance, reports on test marketing.
4. Informal report: The report prepared by the supervisor by way of filling the shift log
book, to be used by his colleagues.
5. Government report: These may be prepared by state governments or the central government
on a given issue.
Example: Programme announced for rural employment strategy as a part of five-year plan.
Did u know? Report on children's education is a kind of government and social welfare
report.
Oral report: Remembering all that is said is difficult if not impossible. This is because the presenter cannot
be interrupted frequently for clarification.
Written report: This can be read a number of times and clarification can be sought whenever the reader
chooses.
Oral report: The audience does not have the choice of picking and choosing from the presentation.
Written report: The reader can pick and choose what he thinks is relevant to him. For instance, the need for
information is different for technical and non-technical persons.
Self Assessment
12. The …………….statement should explain the nature of the project, how it came about and
what was attempted.
Having decided on the type of report, the next step is report preparation. The following is the
format of a research report:
1. Title Page
2. Table of Contents
3. Executive Summary
4. Body
6. Bibliography
7. Appendix
1. Title Page: Title Page should indicate the topic on which the report is prepared. It should
include the name of the person or agency who has prepared the report.
2. Table of Contents: The table of contents will help the reader to know "what the report
contains". The table of contents should indicate the various parts or sections of the report.
It should also indicate the chapter headings along with the page number.
3. Executive Summary: If your report is long and drawn out, the person for whom you have
prepared the report may not have the time to read it in detail. Apart from this, an executive
summary will help in highlighting major points. It is a condensed version of the whole
report. It should be written in one or two pages. Since top executives read only the executive
summary, it should be accurate and well-written. An executive summary should help in
decision-making.
(a) Objectives
(e) Conclusion
(a) Introduction
(b) Methodology
(c) Limitations
Introduction: The introduction must explain clearly the decision problem and research
objective. The background information should be provided on the product and services
provided by the organisation which is under study.
Methodology: How you have collected the data is the key in this section. For example,
Was primary data collected or secondary data used? Was a questionnaire used? What was
the sample size and sampling plan and method of analysis? Was the design exploratory or
conclusive?
Limitations: Every report will have some shortcoming. The limitations may be of time,
geographical area, the methodology adopted, correctness of the responses, etc.
Analysis and interpretations: Collected data will be tabulated. Statistical tools, if any, will
be applied to make analysis and to take decisions.
6. Bibliography: If portions of your report are based on secondary data, use a bibliography
section to list the publications or sources that you have consulted. The bibliography
should include, title of the book, name of the journal in case of article, volume number,
page number, edition, etc.
7. Appendix: The purpose of an appendix is to provide a place for material which is not
absolutely essential to the body of the report. Items placed in this section include copies of
data collection forms (questionnaires), details of the annual report of the company, details
of graphs/charts, photographs, CDs and interviewers' instructions.
Caution The date of submission of the report is to be included on the title page of the
report.
The bibliography, the last section of the report, comes after the appendices. The appendices contain
questionnaires and other relevant material of the study. The bibliography contains the source of
every reference used and any other relevant work that has been consulted. It assures the
reader of the authenticity of the sources of data.
Bibliographies are of different types. A bibliography of works cited contains only the
items referred to in the text. A selected bibliography lists the items which the author thinks are of
primary interest to the reader. An annotated bibliography gives a brief description of each item.
The method of presenting a bibliography is explained below.
Books
Name of the author, title of the book (underlined), publisher's detail, year of publishing, page
number.
Single Volume Works. Dube, S. C. "India's Changing Villages", Routledge and Kegan Paul Ltd.,
1958, p. 76.
Warwick, Donald P., "Comparative Research Methods" in Balmer, Martin and Donald Warwick
(eds), 1983, pp. 315-30.
Dawan Radile (2005), "They Survived Business World" (India), May 98, pp. 29-36.
Newspaper Articles
Kumar Naresh, "Exploring Divestment", The Economic Times (Bangalore), August 7, 1999, p. 14.
Website
www.infocom.in.com
Krishna Murthy, P., "Towards Excellence in Management" (Paper presented at a Seminar in XYZ
College Bangalore, July 2000).
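The book format described above (author, title, publisher's detail, year of publishing, page number) can be sketched as a small formatting routine. This is only an illustration, not a prescribed citation standard: the names BookEntry and format_book are hypothetical, and the punctuation simply mirrors the Dube example given earlier.

```python
from dataclasses import dataclass


@dataclass
class BookEntry:
    """One book in a bibliography (hypothetical structure for illustration)."""
    author: str      # surname first, e.g. "Dube, S. C."
    title: str       # title of the book
    publisher: str   # publisher's detail
    year: int        # year of publishing
    page: str        # page reference, e.g. "p. 76" or "pp. 315-30"


def format_book(entry: BookEntry) -> str:
    """Join the fields in the order: author, title, publisher, year, page."""
    return (
        f'{entry.author}, "{entry.title}", {entry.publisher}, '
        f"{entry.year}, {entry.page}."
    )


dube = BookEntry(
    author="Dube, S. C.",
    title="India's Changing Villages",
    publisher="Routledge and Kegan Paul Ltd.",
    year=1958,
    page="p. 76",
)
print(format_book(dube))
```

Keeping the fields separate and joining them in one place makes it easier to hold every entry to the same order and punctuation, which is the main point of a consistent bibliography style.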
Task List the various abbreviations frequently used in footnotes with their meanings.
Self Assessment
13. The ………………..should indicate the various parts or sections of the report.
14. …………..Page should indicate the topic on which the report is prepared.
15. A selected bibliography lists the items which the author thinks are of ………….interest to
the reader.
Has many other urgent matters demanding his or her interest and attention,
Quantify when you have the data to do so. Avoid vague words such as 'large' or 'small'; instead, say
'50%' or 'one in three'.
Avoid the passive voice, if possible, as it creates vagueness (e.g., 'patients were interviewed'
leaves uncertainty as to who interviewed them) and repeated use makes dull reading.
Aim to be logical and systematic in your presentation.
Caution In report writing, be consistent in the use of tenses (past or present tense).
iii. Give them an idea of how the material has been organised so the reader can make a quick
determination of what he will read first.
i. An attractive layout for the title page and a clear table of contents.
iii. Consistency in headings and subheadings, for example, font size 16 or 18 bold for headings
of chapters; size 14 bold for headings of major sections; size 12 bold for headings of
sub-sections, etc.
iv. Good quality printing and photocopying. Correct drafts carefully with spell check as well
as critical reading for clarity by other team-members, your facilitator and, if possible,
outsiders.
v. Numbering of figures and tables, provision of clear titles for tables, and clear headings for
columns and rows, etc.
Endless description without interpretation is another pitfall. Tables need conclusions, not detailed
presentation of all numbers or percentages in the cells which readers can see for themselves.
Sometimes qualitative data (e.g., open opinion questions) are just coded and counted like
quantitative data, without interpretation, whereas they may provide interesting illustrations
of the reasons for the behaviour of informants or of their attitudes. This is serious maltreatment of
data that needs correction.
Self Assessment
14.7 Summary
A report is a very formal document that is written for a variety of purposes, generally in
the sciences, social sciences, engineering and business disciplines.
The most important aspect to be kept in mind while developing a research report is
communication with the audience.
The report should be able to draw the interest of the readers. Therefore, the report should be
reader-centric.
Other aspects to be considered while writing a report are accuracy and clarity.
The points to be remembered while making an oral presentation are the language used, time
management, use of graphs, purpose of the report, etc. Visuals used must be understandable
to the audience.
The presenter must make sure that the presentation is completed within the time allotted.
Some time should be set apart for questions and answers.
Written reports may be classified as short reports or long
reports. They can also be classified as technical or non-technical reports.
A written report should contain a title page, contents, executive summary, body, conclusions
and appendix. The last part is the bibliography.
There should not be endless description in report writing, and qualitative data are not to be
excluded.
14.8 Keywords
Appendix: The part of the report whose purpose is to provide a place for material which is not
absolutely essential to the body of the report.
Bibliography: The section that lists the publications or sources that you have consulted in the
preparation of the report.
Informal Report: The report prepared by the supervisor by way of filling the shift log book, to
be used by his colleagues
Short Report: A report that is produced when the problem is very well
defined and the scope is limited.
7. What are the various criteria used for classification of written report?
8. What are the essential contents of the following parts of a research report?
(d) Introduction
(e) Conclusion
(f) Appendix
1. need 2. reality
7. Interpretation 8. analysed
17. systematic
Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.
Chase, Stuart, The Proper Study of Mankind: An Inquiry into the Science of Human
Relations, New York: Harper and Row Publishers, 1958.
Statistical Tables
I. Logarithms
Note: To obtain the value of $e^{-1.75}$, we write $e^{-1.75} = e^{-1} \times e^{-0.75} = 0.36788 \times 0.4724 = 0.17379$
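The splitting rule used in the note can be checked numerically. The sketch below uses only Python's standard math module; the variable names are illustrative. It compares the directly computed exponential with the product of the two factors, and with the product of the rounded table entries 0.36788 and 0.4724.

```python
import math

# Direct evaluation of e^(-1.75)
direct = math.exp(-1.75)

# Same value obtained by splitting the exponent: e^(-1) * e^(-0.75)
split = math.exp(-1.0) * math.exp(-0.75)

# Product of the rounded table entries quoted in the note
table_product = 0.36788 * 0.4724

print(f"direct        = {direct:.5f}")
print(f"split         = {split:.5f}")
print(f"table product = {table_product:.5f}")
```

The table product differs slightly from the direct value in the last decimal place because the table entries are themselves rounded.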
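[Figure: standard normal curve showing the ordinates 0.3989 at z = 0 and 0.1182 at z = ±1.56]
[Figure: standard normal curve showing the area P(z) = 0.4945 between z = 0 and z = 2.54]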
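The ordinate and area values quoted for the standard normal curve (0.3989, 0.1182 and 0.4945) can be reproduced without a printed table. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Ordinates (heights of the curve) at z = 0 and z = 1.56
ordinate_at_0 = std_normal.pdf(0.0)
ordinate_at_1_56 = std_normal.pdf(1.56)

# Area under the curve between z = 0 and z = 2.54
area_0_to_2_54 = std_normal.cdf(2.54) - std_normal.cdf(0.0)

# Rounded to four decimals for comparison with the table values
print(round(ordinate_at_0, 4))
print(round(ordinate_at_1_56, 4))
print(round(area_0_to_2_54, 4))
```

Because the curve is symmetric about zero, the ordinate at z = -1.56 equals the ordinate at z = 1.56.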
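[Figure: t distribution with 20 degrees of freedom; two-tailed test $t_{.05,20} = \pm 2.086$; one-tailed test $t_{.05,20} = 1.725$]
[Figure: chi-square distribution with 12 degrees of freedom; critical value $\chi^2_{.05,12} = 21.026$]
IX. Critical Values of F
[Figure: F distribution; critical value $F_{.01,9,7} = 6.72$]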