
Research Methodology Part 3 - Shrivastava - Ibrg

This document provides an overview of index numbers. It begins by defining index numbers as statistical measures used to compare average levels of distinct but related variables in different situations. Examples of variables that can be indexed include price levels, costs of living, and economic outputs. The document then discusses characteristics of index numbers such as them being specialized averages that measure changes not directly measurable. It also notes that index numbers are typically expressed as percentages. Uses of index numbers are then outlined, including measuring and comparing changes over time or between locations, helping form policies, and deflating monetary figures to adjust for price changes. Construction of index numbers is briefly explained using a food price example.

Uploaded by

zinga007
Copyright
© © All Rights Reserved

Unit 11: Index Numbers

CONTENTS
Objectives
Introduction
11.1 Definitions and Characteristics of Index Numbers
11.2 Uses of Index Numbers
11.3 Construction of Index Numbers
11.4 Price Index Numbers
11.4.1 Use of Price Index Numbers in Deflating
11.5 Quantity Index Numbers
11.6 Consumer Price Index Number
11.6.1 Construction of Consumer Price Index
11.6.2 Uses of Consumer Price Index
11.7 Problems in the Construction of Index Numbers
11.8 Limitations of Index Numbers
11.9 Summary
11.10 Keywords
11.11 Review Questions
11.12 Further Readings

Objectives
After studying this unit, you will be able to:

• Define the concept of index numbers

• Discuss the uses of index numbers

• Describe the construction of index numbers

• Explain the concept of the consumer price index number

• Identify the problems in the construction of index numbers

Introduction

An index number is a statistical measure used to compare the average level of magnitude of a
group of distinct but related variables in two or more situations. Suppose that we want to
compare the average price level of different items of food in 1992 with what it was in 1990. Let
the different items of food be wheat, rice, milk, eggs, ghee, sugar, pulses, etc. If the prices of all
these items change in the same ratio and in the same direction, say the prices of all the items
have increased by 10% in 1992 as compared with their prices in 1990, then there will be no
difficulty in finding out the average change in price level for the group as a whole. Obviously,
the average price level of all the items taken as a group will also be 10% higher in 1992 as
compared with prices of 1990. However, in real situations, the prices of all the items change
neither in the same ratio nor in the same direction, i.e., the prices of some commodities may
change to a greater extent than the prices of other commodities. Moreover, the prices of
some commodities may rise while those of others may fall. For such situations, index numbers
are a very useful device for measuring the average change in prices or any other characteristic
like quantity, value, etc. for the group as a whole.

LOVELY PROFESSIONAL UNIVERSITY

11.1 Definitions and Characteristics of Index Numbers

Some important definitions of index numbers are given below:

1. “An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations.”

—Karmel and Polasek

2. “An index number is a special type of average that provides a measurement of relative
changes from time to time or from place to place.”

— Wessel, Willett and Simone

3. “Index number shows by its variation the changes in a magnitude which is not susceptible
either of accurate measurement in itself or of direct valuation in practice.”

— Edgeworth

4. “An index number is a single ratio (usually in percentage) which measures the combined
(i.e., averaged) change of several variables between two different times, places or
situations.”

— Tuttle

On the basis of the above definitions, the following characteristics of index numbers are worth
mentioning:

1. Index numbers are specialised averages: As we know, an average of data is its
representative summary figure. In a similar way, an index number is also an average,
often a weighted average, computed for a group. It is called a specialised average because
the figures that are averaged are not necessarily expressed in homogeneous units.

2. Index numbers measure the changes for a group which are not capable of being directly
measured: The examples of such magnitudes are: Price level of a group of items, level of
business activity in a market, level of industrial or agricultural output in an economy, etc.

3. Index numbers are expressed in terms of percentages: The changes in magnitude of a group
are expressed in terms of percentages which are independent of the units of measurement.
This facilitates the comparison of two or more index numbers in different situations.

Self Assessment

Fill in the blanks:

1. Index numbers are called a specialised average because the figures, that are averaged, are
not necessarily expressed in ………………..units.

2. An index number is often a ………………. average, computed for a group.


11.2 Uses of Index Numbers

The main uses of index numbers are:

1. To measure and compare changes: The basic purpose of the construction of an index number
is to measure the level of activity of phenomena like price level, cost of living, level of
agricultural production, level of business activity, etc. For this reason, index numbers are
sometimes termed barometers of economic activity, by analogy with the barometer, the
instrument used in physics to measure atmospheric pressure.

The level of an activity can be expressed in terms of index numbers at different points of
time or for different places at a particular point of time. These index numbers can be easily
compared to determine the trend of the level of an activity over a period of time or with
reference to different places.

2. To help in providing guidelines for framing suitable policies: Index numbers are
indispensable tools for the management of any government or non-government
organisation.

Example: The increase in cost of living index is helpful in deciding the amount of
additional dearness allowance that should be paid to the workers to compensate them for the
rise in prices. In addition to this, index numbers can be used in planning and formulation of
various government and business policies.

3. Price index numbers are used in deflating: This is a very important use of price index
numbers. These index numbers can be used to adjust monetary figures of various periods
for changes in prices.

Example: The figure of national income of a country is computed on the basis of the
prices of the year in question. Such figures for various years, often known as national income at
current prices, do not reveal the real change in the level of production of goods and services. In
order to know the real change in national income, these figures must be adjusted for price
changes in various years. Such adjustments are possible only by the use of price index numbers,
and the process of adjustment, in a situation of rising prices, is known as deflating.

4. To measure purchasing power of money: We know that there is an inverse relation between
the purchasing power of money and the general price level measured in terms of a price
index number. Thus, the reciprocal of the relevant price index can be taken as a measure of the
purchasing power of money.

Self Assessment

Fill in the Blanks:

3. Index numbers are termed as ……………….of economic activity.

4. The ………………… can be expressed in terms of index numbers at different points of time
or for different places at a particular point of time.

11.3 Construction of Index Numbers

To illustrate the construction of an index number, we reconsider various items of food mentioned
earlier. Let the prices of different items in the two years, 1990 and 1992, be as given below:

Item         Price in 1990    Price in 1992
             (in Rs/unit)     (in Rs/unit)
1. Wheat     300/quintal      360/quintal
2. Rice      12/kg.           15/kg.
3. Milk      7/litre          8/litre
4. Eggs      11/dozen         12/dozen
5. Ghee      80/kg.           88/kg.
6. Sugar     9/kg.            10/kg.
7. Pulses    14/kg.           16/kg.

The comparison of price of an item, say wheat, in 1992 with its price in 1990 can be done in two
ways, explained below:

1. By taking the difference of prices in the two years, i.e., 360 - 300 = 60, one can say that the
price of wheat has gone up by Rs 60/quintal in 1992 as compared with its price in 1990.

2. By taking the ratio of the two prices, i.e., $\frac{360}{300} = 1.20$, one can say that if the price of wheat
in 1990 is taken to be 1, then it has become 1.20 in 1992. A more convenient way of
comparing the two prices is to express the price ratio in terms of percentage, i.e.,
$\frac{360}{300} \times 100 = 120$, known as the Price Relative of the item. In our example, the price relative of
wheat is 120, which can be interpreted as the price of wheat in 1992 when its price in 1990
is taken as 100. Further, the figure 120 indicates that the price of wheat has gone up by 120 - 100
= 20% in 1992 as compared with its price in 1990.

The first way of expressing the price change is inconvenient because the change in price depends
upon the units in which it is quoted. This problem is taken care of in the second method, where
price change is expressed in terms of percentage. An additional advantage of this method is that
various price changes, expressed in percentage, are comparable. Further, it is easier to grasp
a 20% increase in price than an increase expressed as Rs 60/quintal.
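
As a minimal sketch of the second method, the price relatives of all seven items in the food table can be computed as follows. The function name `price_relative` and the dictionary layout are illustrative assumptions, not notation from the text.

```python
# Prices (Rs per unit) from the food table: item -> (price in 1990, price in 1992).
prices = {
    "Wheat": (300, 360),
    "Rice": (12, 15),
    "Milk": (7, 8),
    "Eggs": (11, 12),
    "Ghee": (80, 88),
    "Sugar": (9, 10),
    "Pulses": (14, 16),
}

def price_relative(base_price, current_price):
    """Current price expressed as a percentage of the base price."""
    return current_price / base_price * 100

relatives = {item: price_relative(p0, p1) for item, (p0, p1) in prices.items()}

# Because each relative is a pure percentage, items quoted per quintal,
# per kg, per litre and per dozen become directly comparable.
for item, rel in relatives.items():
    print(f"{item:<7} {rel:7.2f}")
```

The relatives differ from item to item (wheat 120, rice 125, ghee 110, and so on), which is why some form of averaging is needed to describe the group as a whole.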

For the construction of index number, we have to obtain the average price change for the group
in 1992, usually termed as the Current Year, as compared with the price of 1990, usually called
the Base Year. This comparison can be done in two ways:

1. By taking suitable average of price relatives of different items. The methods of index
number construction based on this procedure are termed as Average of Price Relative
Methods.

2. By taking ratio of the averages of the prices of different items in each year. These methods
are popularly known as Aggregative Methods.

Since the average in each of the above methods can be simple or weighted, these can further be
divided as simple or weighted. Various methods of index number construction can be classified
as shown below:

Methods of Index Number Construction

1. Average of Price Relatives Methods
   (a) Simple Average of Price Relatives Methods
   (b) Weighted Average of Price Relatives Methods

2. Aggregative Methods
   (a) Simple Aggregative Methods
   (b) Weighted Aggregative Methods

In addition to this, a particular method would depend upon the type of average used. Although
geometric mean is more suitable for averaging ratios, arithmetic mean is often preferred because
of its simplicity with regard to computations and interpretation.

Notes Before writing various formulae of index numbers, it is necessary to introduce
certain notations and terminology for convenience.

Base Year: The year from which comparisons are made is called the base year. It is commonly
denoted by writing ‘0’ as a subscript of the variable.

Current Year: The year under consideration for which the comparisons are to be computed
is called the current year. It is commonly denoted by writing ‘1’ as a subscript of the
variable.

Let there be n items in a group which are numbered from 1 to n. Let $p_{0i}$ denote the price of
the i-th item in base year and $p_{1i}$ denote its price in current year, where i = 1, 2, ..., n. In a
similar way, $q_{0i}$ and $q_{1i}$ will denote the quantities of the i-th item in base and current years
respectively.

Using these notations, we can write an expression for price relative of the i-th item as
$P_i = \frac{p_{1i}}{p_{0i}} \times 100$ and quantity relative of the i-th item as $Q_i = \frac{q_{1i}}{q_{0i}} \times 100$.

Further, $P_{01}$ will be used to denote the price index number of period ‘1’ as compared with
the prices of period ‘0’. Similarly, $Q_{01}$ and $V_{01}$ would denote the quantity and the value
index numbers respectively of period ‘1’ as compared with period ‘0’.

Self Assessment

Fill in the blanks:

5. The year from which comparisons are made is called the………………….

6. ………………………is commonly denoted by writing ‘1’ as a subscript of the variable

11.4 Price Index Numbers

Simple Average of Price Relatives

1. When arithmetic mean of price relatives is used

The index number formula is given by

$$P_{01} = \frac{\sum P_i}{n} \quad \text{or} \quad P_{01} = \frac{\sum \frac{p_{1i}}{p_{0i}} \times 100}{n}$$

Omitting the subscript i, the above formula can also be written as

$$P_{01} = \frac{\sum \frac{p_1}{p_0} \times 100}{n}$$

2. When geometric mean of price relatives is used

The index number formula is given by

$$P_{01} = \left(P_1 \times P_2 \times \cdots \times P_n\right)^{\frac{1}{n}} = \left(\prod_{i=1}^{n} P_i\right)^{\frac{1}{n}} = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log P_i}{n}\right]$$

($\prod$ is used to denote the product of terms.)

Example: Given below are the prices of 5 items in 1985 and 1990. Compute the simple price
index number of 1990 taking 1985 as base year. Use (a) arithmetic mean and (b) geometric mean.

Item    Price in 1985    Price in 1990
        (Rs/unit)        (Rs/unit)
1       15               20
2       8                7
3       200              300
4       60               110
5       100              130

Solution:
Calculation Table

Price relative: $P_i = \frac{p_{1i}}{p_{0i}} \times 100$

Item     Price in 1985    Price in 1990    Price Relative    log P_i
         (p_0i)           (p_1i)           (P_i)
1        15               20               133.33            2.1249
2        8                7                87.50             1.9420
3        200              300              150.00            2.1761
4        60               110              183.33            2.2632
5        100              130              130.00            2.1139
Total                                      684.16            10.6201

$\therefore$ Index number, using A.M., is $P_{01} = \frac{684.16}{5} = 136.83$ and index number, using G.M., is

$P_{01} = \text{Antilog}\left[\frac{10.6201}{5}\right] = 133.06$
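
The same calculation can be checked with a short script. This is only a sketch of the two simple-average formulas applied to the five items of the example; the variable names are illustrative, and base-10 logarithms stand in for the log tables used in the solution.

```python
import math

base_prices = [15, 8, 200, 60, 100]       # prices in 1985 (p0)
current_prices = [20, 7, 300, 110, 130]   # prices in 1990 (p1)

# Price relative of each item: P = p1 / p0 * 100
relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
n = len(relatives)

# (a) Simple arithmetic mean of price relatives
index_am = sum(relatives) / n

# (b) Simple geometric mean of price relatives, via Antilog(sum(log P) / n)
index_gm = 10 ** (sum(math.log10(P) for P in relatives) / n)

print(f"A.M. index: {index_am:.2f}")
print(f"G.M. index: {index_gm:.2f}")
```

The geometric mean comes out below the arithmetic mean, as it always does when the price relatives are not all equal.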

Weighted Average of Price Relatives

In the method of simple average of price relatives, all the items are assumed to be of equal
importance in the group. However, in most real-life situations, different items of a group
have different degrees of importance. In order to take this into account, weighting of different
items, in proportion to their degree of importance, becomes necessary.

Let $w_i$ be the weight assigned to the i-th item (i = 1, 2, ..., n). Thus, the index number, given by
the weighted arithmetic mean of price relatives, is

$$P_{01} = \frac{\sum P_i w_i}{\sum w_i}$$

Similarly, the index number, given by the weighted geometric mean of price relatives can be
written as follows:

$$P_{01} = \left[P_1^{w_1} \cdot P_2^{w_2} \cdots P_n^{w_n}\right]^{\frac{1}{\sum w_i}} = \left[\prod P_i^{w_i}\right]^{\frac{1}{\sum w_i}} \quad \text{or} \quad P_{01} = \text{Antilog}\left[\frac{\sum w_i \log P_i}{\sum w_i}\right]$$

Nature of Weights

While taking weighted average of price relatives, the values are often taken as weights. These
weights can be the values of base year quantities valued at base year prices, i.e., $p_{0i}q_{0i}$, or the
values of current year quantities valued at current year prices, i.e., $p_{1i}q_{1i}$, or the values of current
year quantities valued at base year prices, i.e., $p_{0i}q_{1i}$, etc., or any other value.

Example: Construct an index number for 1989 taking 1981 as base for the following data, by
using
1. Weighted arithmetic mean of price relatives and

2. Weighted geometric mean of price relatives.

Commodities    Prices in 1981    Prices in 1989    Weights
A              60                100               30
B              20                20                20
C              40                60                24
D              100               120               30
E              120               80                10

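Solution:
Calculation Table

Price relative: $P = \frac{p_1}{p_0} \times 100$

Commodity    Price in      Price in      Weight    Price           log P     Pw         w log P
             1981 (p_0)    1989 (p_1)    (w)       Relative (P)
A            60            100           30        166.67          2.2219    5000.1     66.657
B            20            20            20        100.00          2.0000    2000.0     40.000
C            40            60            24        150.00          2.1761    3600.0     52.226
D            100           120           30        120.00          2.0792    3600.0     62.376
E            120           80            10        66.67           1.8239    666.7      18.239
Total                                    114                                 14866.8    239.498

$\therefore$ Index number using A.M. is $P_{01} = \frac{14866.8}{114} = 130.41$ and index number using G.M. is

$P_{01} = \text{Antilog}\left[\frac{239.498}{114}\right] = 126.15$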
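
The weighted formulas can be sketched in the same way, using the prices and weights of commodities A to E from the example. The variable names are illustrative. Working at full floating-point precision gives a geometric-mean index of about 126.14, marginally below the 126.15 obtained from four-figure logarithms.

```python
import math

base_prices = [60, 20, 40, 100, 120]     # prices in 1981 (p0)
current_prices = [100, 20, 60, 120, 80]  # prices in 1989 (p1)
weights = [30, 20, 24, 30, 10]           # weights (w)

relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
total_weight = sum(weights)

# Weighted arithmetic mean of price relatives: sum(P * w) / sum(w)
index_wam = sum(P * w for P, w in zip(relatives, weights)) / total_weight

# Weighted geometric mean: Antilog(sum(w * log P) / sum(w))
index_wgm = 10 ** (
    sum(w * math.log10(P) for P, w in zip(relatives, weights)) / total_weight
)

print(f"Weighted A.M. index: {index_wam:.2f}")
print(f"Weighted G.M. index: {index_wgm:.2f}")
```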


Task Taking 1983 as base year, calculate an index number of prices for 1990, for the
following data given in appropriate units, using:
1. Weighted arithmetic mean of price relatives by taking weights as the values of
current year quantities at base year prices, and

2. Weighted geometric mean of price relatives by taking weights as the values of base
year quantities at base year prices.

Simple Aggregative Method

In this method, the simple arithmetic means of the prices of all the items of the group for the
current year and for the base year are computed separately. The ratio of current year average to
base year average multiplied by 100 gives the required index number.

Using notations, the arithmetic mean of prices of n items in current year is given by $\frac{\sum p_{1i}}{n}$ and
the arithmetic mean of prices in base year is given by $\frac{\sum p_{0i}}{n}$.

$$\therefore \text{Simple aggregative price index } P_{01} = \frac{\frac{\sum p_{1i}}{n}}{\frac{\sum p_{0i}}{n}} \times 100 = \frac{\sum p_{1i}}{\sum p_{0i}} \times 100$$

Omitting the subscript i, the above index number can also be written as

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$

Example: The following table gives the prices of six items in the years 1980 and 1981. Use
simple aggregative method to find index of 1981 with 1980 as base.

Item    Price in 1980 (Rs)    Price in 1981 (Rs)
A       40                    50
B       60                    60
C       20                    30
D       50                    70
E       80                    90
F       100                   100

Solution:

Let $p_0$ be the price in 1980 and $p_1$ be the price in 1981. Thus, we have

$\sum p_0 = 350$ and $\sum p_1 = 400$

$$\therefore P_{01} = \frac{400}{350} \times 100 = 114.29$$
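
A minimal sketch of the simple aggregative method for the six items of this example follows; the variable names are illustrative only.

```python
base_prices = [40, 60, 20, 50, 80, 100]     # prices in 1980 (p0)
current_prices = [50, 60, 30, 70, 90, 100]  # prices in 1981 (p1)

# The item count n cancels, so the ratio of means equals the ratio of sums.
simple_aggregative_index = sum(current_prices) / sum(base_prices) * 100

print(f"Simple aggregative index (1980 = 100): {simple_aggregative_index:.2f}")
```

Because raw prices are added directly, items with large prices dominate the result, which is one motivation for the weighted aggregative method.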

Weighted Aggregative Method

This index number is defined as the ratio of the weighted arithmetic means of current to base
year prices multiplied by 100.
Using the notations defined earlier, the weighted arithmetic mean of current year prices can be
written as $\frac{\sum p_{1i} w_i}{\sum w_i}$.

Similarly, the weighted arithmetic mean of base year prices $= \frac{\sum p_{0i} w_i}{\sum w_i}$.

$$\therefore \text{Price Index Number, } P_{01} = \frac{\frac{\sum p_{1i} w_i}{\sum w_i}}{\frac{\sum p_{0i} w_i}{\sum w_i}} \times 100 = \frac{\sum p_{1i} w_i}{\sum p_{0i} w_i} \times 100$$

Omitting the subscript, we can also write

$$P_{01} = \frac{\sum p_1 w}{\sum p_0 w} \times 100$$

Nature of Weights

In case of weighted aggregative price index numbers, quantities are often taken as weights.
These quantities can be the quantities purchased in base year or in current year or an average of
base year and current year quantities or any other quantities. Depending upon the choice of
weights, some of the popular formulae for weighted index numbers can be written as follows:

1. Laspeyres' Index: Laspeyres' price index number uses base year quantities as weights.
Thus, we can write

$$P_{01}^{La} = \frac{\sum p_{1i} q_{0i}}{\sum p_{0i} q_{0i}} \times 100 \quad \text{or} \quad P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

2. Paasche's Index: This index number uses current year quantities as weights. Thus, we can
write

$$P_{01}^{Pa} = \frac{\sum p_{1i} q_{1i}}{\sum p_{0i} q_{1i}} \times 100 \quad \text{or} \quad P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

3. Fisher's Ideal Index: As will be discussed later, Laspeyres' index has an upward
bias and Paasche's index has a downward bias. In view of this, Fisher suggested that an
ideal index should be the geometric mean of Laspeyres' and Paasche's indices. Thus,
Fisher's formula can be written as follows:

$$P_{01}^{F} = \sqrt{P_{01}^{La} \times P_{01}^{Pa}} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

If we write $L = \frac{\sum p_1 q_0}{\sum p_0 q_0}$ and $P = \frac{\sum p_1 q_1}{\sum p_0 q_1}$, Fisher's Ideal Index can also be written as

$$P_{01}^{F} = \sqrt{L \times P} \times 100$$

4. Dorbish and Bowley's Index: This index number is constructed by taking the arithmetic
mean of Laspeyres' and Paasche's indices.

$$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 + \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100\right] = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100 = \frac{1}{2}\left[L + P\right] \times 100$$

5. Marshall and Edgeworth's Index: This index number uses arithmetic mean of base and
current year quantities.

$$P_{01}^{ME} = \frac{\sum p_1 \left(\frac{q_0 + q_1}{2}\right)}{\sum p_0 \left(\frac{q_0 + q_1}{2}\right)} \times 100 = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$

6. Walsh's Index: Geometric mean of base and current year quantities is used as weights in
this index number.

$$P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$$

7. Kelly's Fixed Weights Aggregative Index: The weights, in this index number, are quantities
which may not necessarily relate to base or current year. The weights, once decided,
remain fixed for all periods. The main advantage of this index over Laspeyres' index is
that weights do not change with change of base year. Using symbols, Kelly's Index can
be written as

$$P_{01}^{Ke} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$$
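
The weighted aggregative formulas above differ only in the quantities used as weights, which the following sketch tries to make explicit. The function names and the shared `(p0, p1, q0, q1)` signature are illustrative assumptions; the data are the prices and derived quantities of articles A, B and C from the Fisher example in Section 11.5, reused here purely for demonstration.

```python
import math

def aggregate(prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, p1, q0, q1):
    # Base year quantities as weights (q1 is unused, kept for a uniform signature).
    return aggregate(p1, q0) / aggregate(p0, q0) * 100

def paasche(p0, p1, q0, q1):
    # Current year quantities as weights (q0 is unused).
    return aggregate(p1, q1) / aggregate(p0, q1) * 100

def fisher(p0, p1, q0, q1):
    # Geometric mean of the Laspeyres and Paasche indices.
    return math.sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))

def dorbish_bowley(p0, p1, q0, q1):
    # Arithmetic mean of the Laspeyres and Paasche indices.
    return (laspeyres(p0, p1, q0, q1) + paasche(p0, p1, q0, q1)) / 2

def marshall_edgeworth(p0, p1, q0, q1):
    # Weights q0 + q1; the factor 1/2 cancels between numerator and denominator.
    q = [a + b for a, b in zip(q0, q1)]
    return aggregate(p1, q) / aggregate(p0, q) * 100

def walsh(p0, p1, q0, q1):
    # Geometric mean of base and current year quantities as weights.
    q = [math.sqrt(a * b) for a, b in zip(q0, q1)]
    return aggregate(p1, q) / aggregate(p0, q) * 100

def kelly(p0, p1, q):
    # Fixed weights q that need not belong to either year.
    return aggregate(p1, q) / aggregate(p0, q) * 100

p0, q0 = [5, 8, 6], [10, 6, 3]   # 1974 prices and quantities
p1, q1 = [4, 7, 5], [12, 7, 4]   # 1976 prices and quantities

for formula in (laspeyres, paasche, fisher, dorbish_bowley, marshall_edgeworth, walsh):
    print(f"{formula.__name__:<20} {formula(p0, p1, q0, q1):.2f}")
```

For these data all six results lie between the Paasche and Laspeyres values, which are themselves very close because the quantities changed almost proportionally between the two years.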

Example: Calculate the weighted aggregative price index for 1990 from the following data:

Item    Price in 1971    Price in 1990    Weights
A       8                9.5              5
B       12               12.5             1
C       6.5              9                3
D       4                4.5              6
E       6                7                4
F       2                4                3

Solution:
Calculation Table

Item     Price in      Price in      Weights    p_0 w    p_1 w
         1971 (p_0)    1990 (p_1)    (w)
A        8             9.5           5          40.0     47.5
B        12            12.5          1          12.0     12.5
C        6.5           9             3          19.5     27.0
D        4             4.5           6          24.0     27.0
E        6             7             4          24.0     28.0
F        2             4             3          6.0      12.0
Total                                           125.5    154.0

$$\therefore \text{Price Index (1971 = 100) } P_{01} = \frac{154.0}{125.5} \times 100 = 122.71$$

The term within brackets, i.e., 1971 = 100, indicates that the base year is 1971.

11.4.1 Use of Price Index Numbers in Deflating


This is perhaps the most important application of price index numbers. Deflating implies making
adjustments for price changes. A rise of price level implies a fall in the value of money. Therefore,
in a situation of rising prices, the workers who are getting a fixed sum in the form of wages are
in fact getting less real wages. Similarly, in a situation of falling prices, the real wages of the
workers are greater than their money wages. Thus, to determine the real wages, the money
wages of the workers are to be adjusted for price changes by using relevant price index number.
The following formula is used for conversion of money wages into real wages.

$$\text{Real Wage} = \frac{\text{Money Wage}}{\text{Consumer Price Index}} \times 100 \qquad \ldots (1)$$

Another application of the process of deflating is to find the value of output at constant prices so as
to facilitate the comparison of real changes in output. It may be pointed out here that the output
of a given year is often valued at the current year prices. Since prices in various years are often
different, a comparison of output at current year prices does not reveal real changes.
The output at constant prices is obtained using the following formula.

$$\text{Output at Constant Prices} = \frac{\text{Output at Current Prices}}{\text{Price Index}} \times 100 \qquad \ldots (2)$$

Example: The following table gives the average monthly wages of a worker along with
the respective consumer price index numbers for ten years.

Years                         : 1980  1981  1982  1983  1984  1985  1986  1987  1988  1989
Average monthly wages (Rs)    :  500   525   560   600   630   635   700   740   800   900
Consumer Price Index          :  100   110   120   125   135   160   185   200   210   240

Compute his real average monthly wages in various years.

Solution:
Computation of Real Wages

Year    Average Monthly    Consumer Price    Real average monthly wage
        wage               Index
1980    500                100               (500/100) × 100 = 500.00
1981    525                110               (525/110) × 100 = 477.27
1982    560                120               (560/120) × 100 = 466.67
1983    600                125               (600/125) × 100 = 480.00
1984    630                135               (630/135) × 100 = 466.67
1985    635                160               (635/160) × 100 = 396.88
1986    700                185               (700/185) × 100 = 378.38
1987    740                200               (740/200) × 100 = 370.00
1988    800                210               (800/210) × 100 = 380.95
1989    900                240               (900/240) × 100 = 375.00
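
The deflation in formula (1) is a single division per year, as this small sketch with the ten years of data from the example shows. The variable names are illustrative.

```python
years = list(range(1980, 1990))
money_wages = [500, 525, 560, 600, 630, 635, 700, 740, 800, 900]
cpi = [100, 110, 120, 125, 135, 160, 185, 200, 210, 240]

# Real wage = money wage / consumer price index * 100   (formula 1)
real_wages = [wage / index * 100 for wage, index in zip(money_wages, cpi)]

for year, money, real in zip(years, money_wages, real_wages):
    print(f"{year}  money wage {money:4d}  real wage {real:7.2f}")
```

Although the money wage rose from 500 to 900 over the decade, the real wage fell from 500 to 375, because the price index rose faster than the money wage.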

Purchasing Power of Money

When prices in general are rising, the real value of a rupee is declining. If, e.g., the price index in
1992 with base 1990 is 120, the real value of a rupee in 1992 as compared with its value in 1990
is $\frac{100}{120} = 0.83$. This implies that a rupee in 1992 is worth only 83 paise of 1990.
From the above we note that the purchasing power of a rupee in current year is equal to the
reciprocal of the price index multiplied by 100. Thus, we can write

$$\text{Purchasing Power of a Rupee or Constant Rupee} = \frac{\text{Current Rupee} \times 100}{\text{Price Index}} = \frac{100}{\text{Price Index}}$$

Note that the Current Rupee is always equal to unity.

We can also write $\text{Price Index} = \frac{100}{\text{Constant Rupee}}$

Example: Given the following information on the Gross Domestic Product (in Rs crores)
at the constant (1980-81) prices and at current prices for five years, calculate the series of price
index numbers and of quantity index numbers for each of the five years with 1980-81 as base
year.

Year       G.D.P. at constant    G.D.P. at current
           (1980-81) Prices      Prices
1980-81    200                   200
1981-82    150                   240
1982-83    125                   350
1983-84    120                   360
1984-85    160                   400

Solution:
Calculation of Price and Quantity Index Numbers

Year       GDP at      GDP at     Quantity Index            Price Index*
           constant    current    Number Series             Number Series
           Prices      Prices
1980-81    200         200        100                       (200/200) × 100 = 100
1981-82    150         240        (150/200) × 100 = 75      (240/150) × 100 = 160
1982-83    125         350        (125/200) × 100 = 62.5    (350/125) × 100 = 280
1983-84    120         360        (120/200) × 100 = 60      (360/120) × 100 = 300
1984-85    160         400        (160/200) × 100 = 80      (400/160) × 100 = 250

* $\text{Price Index} = \frac{\text{Output at current Prices}}{\text{Output at constant Prices}} \times 100$
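
Both series in this solution follow from formula (2) rearranged, and can be sketched as below. The list names are illustrative; the first entry of each list is the base year 1980-81.

```python
years = ["1980-81", "1981-82", "1982-83", "1983-84", "1984-85"]
gdp_constant = [200, 150, 125, 120, 160]  # GDP at constant (1980-81) prices
gdp_current = [200, 240, 350, 360, 400]   # GDP at current prices

# Quantity index: real output of each year relative to base year real output.
quantity_index = [g / gdp_constant[0] * 100 for g in gdp_constant]

# Price index: output at current prices / output at constant prices * 100.
price_index = [cur / con * 100 for cur, con in zip(gdp_current, gdp_constant)]

for year, q, p in zip(years, quantity_index, price_index):
    print(f"{year}  quantity index {q:6.1f}  price index {p:6.1f}")
```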
Did u know? What is the usage of deflating?
The concept of deflating can be used to determine the purchasing power or real value of a
rupee.

Self Assessment
Fill in the blanks:

7. In case of weighted aggregative price index numbers, quantities are often taken
as………………..

8. Deflating implies making adjustments for ………….changes.

11.5 Quantity Index Numbers


A quantity index number measures the change in quantities in current year as compared with a
base year. The formulae for quantity index numbers can be directly written from price index
numbers simply by interchanging the role of price and quantity. Similar to a price relative, we
can define a quantity relative as

$$Q = \frac{q_1}{q_0} \times 100$$

Various formulae for quantity index numbers are as given below:

1. Simple aggregative index

$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$

2. Simple average of quantity relatives

(a) Taking A.M. $Q_{01} = \frac{\sum \frac{q_1}{q_0} \times 100}{n} = \frac{\sum Q}{n}$

(b) Taking G.M. $Q_{01} = \text{Antilog}\left[\frac{\sum \log Q}{n}\right]$

3. Weighted aggregative index

(a) $Q_{01}^{La} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$ (base year prices are taken as weights)

(b) $Q_{01}^{Pa} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$ (current year prices are taken as weights)

(c) $Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$

Other aggregative formulae can also be written in a similar way.

4. Weighted average of quantity relatives

(a) Taking A.M. $Q_{01} = \frac{\sum Qw}{\sum w}$

(b) Taking G.M. $Q_{01} = \text{Antilog}\left[\frac{\sum w \log Q}{\sum w}\right]$

Like weighted average of price relatives, values are taken as weights.

Example: Using Fisher's formula, compute the quantity index number from the following data:

Article    1974                       1976
           Price (Rs)    Value (Rs)   Price (Rs)    Value (Rs)
A          5             50           4             48
B          8             48           7             49
C          6             18           5             20
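Solution:
Calculation Table

The quantities are obtained from values and prices as $q_0 = \frac{V_0}{p_0}$ and $q_1 = \frac{V_1}{p_1}$.

Article    1974                    1976                    p_0 q_1    p_1 q_0
           p_0    V_0      q_0     p_1    V_1      q_1
A          5      50       10      4      48       12      60         40
B          8      48       6       7      49       7       56         42
C          6      18       3       5      20       4       24         15
Total             116                     117              140        97

Here $\sum p_0 q_0 = \sum V_0 = 116$ and $\sum p_1 q_1 = \sum V_1 = 117$.

$$Q_{01}^{Fi} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{140}{116} \times \frac{117}{97}} \times 100 = 120.65$$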
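
The steps of this solution, deriving quantities from values and then applying Fisher's quantity formula, can be sketched as follows. The helper name `aggregate` and the variable names are illustrative.

```python
import math

p0, v0 = [5, 8, 6], [50, 48, 18]   # 1974 prices and values
p1, v1 = [4, 7, 5], [48, 49, 20]   # 1976 prices and values

# Quantity = value / price for each article.
q0 = [v / p for v, p in zip(v0, p0)]
q1 = [v / p for v, p in zip(v1, p1)]

def aggregate(quantities, prices):
    """Sum of quantity x price over all articles."""
    return sum(q * p for q, p in zip(quantities, prices))

laspeyres_q = aggregate(q1, p0) / aggregate(q0, p0)   # 140 / 116
paasche_q = aggregate(q1, p1) / aggregate(q0, p1)     # 117 / 97

fisher_q = math.sqrt(laspeyres_q * paasche_q) * 100

print(f"Fisher's quantity index: {fisher_q:.2f}")
```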

Self Assessment

Fill in the blanks:

9. ………………………….. measures the change in quantities in current year as compared
with a base year.

10. The formulae for quantity index numbers can be directly written from
…………………….simply by interchanging the role of price and quantity.

11.6 Consumer Price Index Number

The consumer price or the retail price is the price at which the ultimate consumer purchases his
goods and services from the retailer. According to the Labour Bureau, “with the help of Consumer
Price Index Number, it is intended to show over time the average change in prices paid by the consumers
belonging to the population group proposed to be covered by the index for a fixed list of goods and services
consumed by them”.

Formerly, this index was also known as the cost of living index. However, since this index
measures changes in cost of living due to changes in retail prices only and not due to changes in
living standards, etc., the name was changed to consumer price index or retail price index.

11.6.1 Construction of Consumer Price Index

The following steps are involved in the construction of a consumer price index:

1. Scope and Coverage: The scope of consumer price index, proposed to be constructed, must
be very clearly defined. This implies the identification of the class of people for whom the
index will be constructed such as industrial workers, agricultural workers, urban wage
earners, etc. Further, it is also necessary to define the coverage of the class of people, i.e.,
the definition of geographical location of their stay such as a city or two or more villages,
etc. The selected class of people should form a homogeneous group so that weights of
various commodities are same for all the people.

2. Selection of Base Period: A normal period having comparative economic stability should
be selected as a base period so that the consumption pattern used in the construction
of the index remains practically stable over a fairly long period.

3. Conducting Family Budget Enquiry: A family budget gives the details of expenditure
incurred by the family on various items in a given period. In order to estimate the
consumption pattern, a sample survey of family budgets of the group of people, for whom
the index is to be constructed, is conducted and from this an average family budget is
prepared. The goods and services that are to be included in the construction of the index
are selected from this average family budget. Efforts should be made to include as many
commodities as possible. Generally the commodities are divided into five broad groups:
(i) Food, (ii) Clothing, (iii) Fuel and Lighting, (iv) House Rent and (v) Miscellaneous.

If necessary, these groups may further be divided into sub-groups. Percentage expenditure
of a group is taken as its weight.

4. Obtaining Price Quotations: The next step in the construction of consumer price index is
to obtain the retail price quotations of various items that are selected. The price quotations
should be obtained from those markets from which the group of people, for whom the
index number is being constructed, normally make purchases. The quality of various
goods and services used by the group of people should also be kept in mind while obtaining
price quotations.

5. Computation of the Index Number: After the collection of necessary data, the consumer
price index can be computed by using either of the following formulae.
(a) Aggregate Expenditure Method: Base year quantities are taken as weights in the
aggregate expenditure method. The formula for the consumer price index is given
by $P_{01}^{CP} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$, which is Laspeyres' formula.
(b) Family Budget Method: This method is also known as weighted average of price
relatives method and accordingly values are taken as weights. The formula for the
consumer price index is given by $P_{01}^{CP} = \frac{\sum Pw}{\sum w}$, where $P = \frac{p_1}{p_0} \times 100$.
Example: From the information given below, construct the consumer price index number
of 1985 by (i) Aggregate Expenditure Method, and (ii) Family Budget Method.

Commodity   Quantity (q0)   Price in 1980 (p0)   Price in 1985 (p1)
A           2               75                   125
B           25              12                   16
C           10              12                   16
D           5               10                   15
E           25              4.5                  5
F           40              10                   12
G           1               25                   40

Solution:
Calculation of Consumer Price Index

Com.    p0q0      p1q0    P = (p1/p0) × 100    w = p0q0    Pw
A       150       250     166.67               150         25000.5
B       300       400     133.33               300         39999.0
C       120       160     133.33               120         15999.6
D       50        75      150.00               50          7500.0
E       112.5     125     111.11               112.5       12499.9
F       400       480     120.00               400         48000.0
G       25        40      160.00               25          4000.0
Total   1157.5    1530                         1157.5      152999.0

1. Index by agg. exp. method $= \frac{1530}{1157.5} \times 100 = 132.18$

2. Index by F.B. method $= \frac{152999}{1157.5} = 132.18$
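The two computations in this example can be checked with a short script. The following is only a sketch in Python: the function names are illustrative choices, and the data are the seven commodities of the example. It shows that the aggregate expenditure method and the family budget method give the same figure when the weights are the base year values p0q0.

```python
# Data from the worked example:
# (commodity, base-year quantity q0, price in 1980 p0, price in 1985 p1)
data = [
    ("A", 2, 75, 125),
    ("B", 25, 12, 16),
    ("C", 10, 12, 16),
    ("D", 5, 10, 15),
    ("E", 25, 4.5, 5),
    ("F", 40, 10, 12),
    ("G", 1, 25, 40),
]


def aggregate_expenditure_index(rows):
    """Laspeyres form: sum(p1*q0) / sum(p0*q0) * 100."""
    numerator = sum(p1 * q0 for _, q0, p0, p1 in rows)
    denominator = sum(p0 * q0 for _, q0, p0, p1 in rows)
    return numerator / denominator * 100


def family_budget_index(rows):
    """Weighted average of price relatives P with weights w = p0*q0."""
    weights = [p0 * q0 for _, q0, p0, _ in rows]
    relatives = [p1 / p0 * 100 for _, _, p0, p1 in rows]
    return sum(P * w for P, w in zip(relatives, weights)) / sum(weights)


agg = aggregate_expenditure_index(data)
fb = family_budget_index(data)
print(round(agg, 2), round(fb, 2))
```

Both functions return about 132.18. The agreement is not a coincidence: with w = p0q0, each product Pw reduces to 100 × p1q0, so the two formulae are algebraically identical.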

11.6.2 Uses of Consumer Price Index

1. A consumer price index is used to determine the real wages from money wages and the
purchasing power of money.

2. It is also used to determine the dearness allowance to compensate the workers for the rise
in prices.

3. It can be used in the formulation of various economic policies of the government.


4. It may be useful in the analysis of markets of certain goods or services.

Example: A particular series of consumer price index covers five groups of items. Between
1975 and 1980 the index rose from 180 to 225. Over the same period the price index numbers of
various groups changed as follows:
Food from 198 to 252; clothing from 185 to 205; fuel and lighting from 175 to 195; miscellaneous
from 138 to 212; house rent remained unchanged at 150.

Given that the weights of clothing , house rent and fuel and lighting are equal, determine the
weights for individual groups of items.

Solution:

Let w1% be the weight of food, w2% be the weight of miscellaneous group and w% be the weight
of each of the remaining three groups. Therefore we can write w1 + w2 + 3w = 100 or w2 = 100 –
w1 – 3w.

The given data can be written in the form of table as given below:

Groups            Weights            Index in 1975 (I1)   Index in 1980 (I2)
Food              w1                 198                  252
Clothing          w                  185                  205
Fuel & Lighting   w                  175                  195
House Rent        w                  150                  150
Miscellaneous     100 – w1 – 3w      138                  212
Total             100

On the basis of above, the consumer price index of 1975 is

$\frac{198 w_1 + (185 + 175 + 150) w + 138 (100 - w_1 - 3w)}{100} = 180$ (given)

or 60w1 + 96w = 4200 .... (1)

Further, the consumer price index of 1980 is

$\frac{252 w_1 + (205 + 195 + 150) w + 212 (100 - w_1 - 3w)}{100} = 225$ (given)

or 40w1 – 86w = 1300 .... (2)

Solving equations (1) and (2) simultaneously, we get w = 10 and w1 = 54.
Substituting these values in expression for w2, we get
w2 = 100 – w1 – 3w = 100 – 54 – 30 = 16
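The simultaneous solution of equations (1) and (2) can be verified numerically. The sketch below uses Cramer's rule with exact fractions; the helper name `solve_2x2` is an illustrative choice, not part of the original text.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two equations are not independent")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# Equation (1): 60*w1 + 96*w = 4200
# Equation (2): 40*w1 - 86*w = 1300
w1, w = solve_2x2(60, 96, 4200, 40, -86, 1300)
w2 = 100 - w1 - 3 * w

# Rebuild both index values from the weights as a cross-check
index_1975 = (198 * w1 + (185 + 175 + 150) * w + 138 * w2) / 100
index_1980 = (252 * w1 + (205 + 195 + 150) * w + 212 * w2) / 100
print(w1, w, w2, index_1975, index_1980)
```

Substituting the weights back reproduces the given index values of 180 and 225, which confirms the solution.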

Self Assessment

Fill in the blanks:

11. Formerly, Consumer Price index was also known as the ………………………index.

12. A consumer price index is used to determine the ……………….from money wages.

11.7 Problems in the Construction of Index Numbers

The following are some general problems that are faced in the construction of any index number:

1. Definition of the purpose: Since index numbers can be constructed for many purposes and
no single index can serve all of them, it is essential to define
the specific purpose of its construction. For example, if we are interested in the construction
of a price index number, we must have knowledge about the purpose to be served by it,
i.e., what is to be measured by it; like the cost of living of workers or the change in
wholesale prices, etc. In the absence of this information, it may be difficult to carry out
various steps in the construction of an index number. The questions like what are items to
be included, from which of the markets the price quotations are to be obtained, what will
be the weights of different items, etc., cannot be answered unless the purpose of the index
number construction is known. Further, an index number can be of sensitive or general
nature. In case of sensitive index, only those items are included whose variables (like
prices in case of price index) fluctuate very often; while efforts are made to include as
many items as possible when the index is of general nature. It may be pointed out that the
index numbers are specialised tools and as such are more useful and efficient when properly
used. The first step in this direction is a specific definition of the purpose of its construction.

2. Selection of the base period: Every index number is constructed with reference to a base
period. There are two important points that must be kept in mind while selecting the base
period of an index number.

(a) The base period should correspond to a period of relative economic and political
stability, i.e., it should be a normal or representative period in some way. In certain
situations where identification of such a period is not possible, the average of certain
periods can also be taken as base.

(b) The comparison of current period with a remote base doesn’t have much relevance.
In the words of Morris Hamburg, “It is desirable that the base period be not too far
away in time from the present. The further away we move from the base period the
dimmer are our recollections of economic conditions prevailing at that time.
Consequently, comparisons with these remote base periods tend to lose significance
and become rather tenuous in meaning”.
Another problem with a remote base period can be that certain items that were in
use in the base period are no longer in use while certain new items are in use in
current period. In such a situation the two item bundles are no longer homogeneous
and comparable. This problem is less likely to occur when fairly recent period is
chosen as base.

Caution: The base period should not be too distant from the current period.

3. Selection of number and type of items: An index number of a particular group of items is
in fact based on a sample of items taken from it. It is neither possible nor necessary to
include all the items of the group in the construction of an index number. The number of
items to be included depends largely upon the purpose of the index number.

There are no hard and fast rules that can be laid down with regard to the selection of the
number of items. However, it must be remembered that the larger the number of items, the
more representative the index number will be, but also the more cumbersome the task of
computation. Therefore, it is necessary to strike a balance between having a
representative index and the work of computation involved in its construction.

The following points should be kept in mind in selecting the type of items:

(a) The items should be representative of the tastes, habits and customs of the people
for whom the index is to be constructed.

(b) The selected items should be of stable quality. The standardised items should be
given preference.

(c) As far as possible, the non-tangible items like personal services, goodwill, etc.,
should be excluded because it is difficult to ascertain their value.

4. Collection of data: The next important step in the construction of an index number is the
collection of data. For example, for the construction of price index, price quotations are to
be obtained. Since the prices of commodities may vary from one market to another and in
certain cases from one shop to another, it is necessary to select those markets which are
representative in the sense that the group under consideration generally make purchases
from these markets. The next logical step is to select an agency through which price
quotations are to be obtained. The selected agency should be highly reliable and if necessary
the accuracy of price quotations reported by it may also be checked by appointing some
other agency or agencies. Furthermore, care should always be taken to obtain price
quotations for the same quality of items.

Similar type of considerations are necessary for the collection of data for the construction
of index numbers such as quantity index, value index, unemployment index, etc.

5. Selection of a suitable average: Since the index numbers are also averages, any of the five
averages, viz. arithmetic mean, median, mode, geometric mean and harmonic mean can
be used in its construction. However, since in most of the situations we have to average
ratios of the values in current period to that in base period, geometric mean is the most
suitable average in the construction of index numbers. The main difficulty of using the
geometric mean is the complexities of its computations and hence, the use of arithmetic
mean is more popular in spite of its being less suitable.

6. Selection of suitable weights: According to John I. Griffin, “Weighing is designed to give
component series an importance in proper relation to their real significance.” The basic
purpose of weighing is to enable each item to have an influence, on the index number, in
proportion to its importance in the group. It is, therefore, necessary to design a system of
weighing such that true importance of the items is reflected by it. The system of weighing
may be either arbitrary or rational. Arbitrary or chance weighing implies that the
statistician is free to assign weights to different items as he thinks fit or reasonable.
Rational or logical weighing, on the other hand, implies that some criterion has been
fixed for assigning weights. Two types of weights are commonly used in the construction
of a price index number: (i) physical quantities and (ii) money values. These weights can
be quantities (or values) produced or consumed or sold in base or current or in any other
period.

Another problem, to be tackled, with regard to system of weights is whether weights
should be fixed or fluctuating. When the relative importance of various items changes in
different periods, it is desirable to have a fluctuating system of weights to get better results.
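The point made under "Selection of a suitable average" can be illustrated with a small sketch. It assumes unweighted price relatives and reuses the 1980 and 1985 prices of the seven commodities from the consumer price index example; the function names are illustrative. One reason the geometric mean suits ratios is that the index from 1980 to 1985 and the index from 1985 to 1980 are exact reciprocals of each other, which is not the case for the arithmetic mean.

```python
import math

# Prices of the seven commodities in the earlier CPI example
p0 = [75, 12, 12, 10, 4.5, 10, 25]   # 1980
p1 = [125, 16, 16, 15, 5, 12, 40]    # 1985


def arithmetic_mean_index(base, current):
    """Simple average of price relatives using the arithmetic mean."""
    relatives = [c / b for b, c in zip(base, current)]
    return sum(relatives) / len(relatives) * 100


def geometric_mean_index(base, current):
    """Simple average of price relatives using the geometric mean."""
    logs = [math.log(c / b) for b, c in zip(base, current)]
    return math.exp(sum(logs) / len(logs)) * 100


am_forward = arithmetic_mean_index(p0, p1)
gm_forward = geometric_mean_index(p0, p1)
am_backward = arithmetic_mean_index(p1, p0)
gm_backward = geometric_mean_index(p1, p0)

# Forward index times backward index, divided by 100^2:
# equal to 1 for the geometric mean, greater than 1 for the arithmetic mean
gm_product = gm_forward * gm_backward / 10000
am_product = am_forward * am_backward / 10000
print(round(am_forward, 2), round(gm_forward, 2), gm_product, am_product)
```

The geometric mean index is also never larger than the arithmetic mean index computed from the same relatives, so the arithmetic mean tends to overstate a rise in prices relative to it.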

Self Assessment

Fill in the blanks:

13. Every index number is constructed with reference to a …………..period.

14. The basic purpose of …………..is to enable each item to have an influence, on the index
number, in proportion to its importance in the group.

11.8 Limitations of Index Numbers

Despite the fact that index numbers are very useful for the measurement of relative changes,
these suffer from the following limitations:

1. The computation of an index number is based on the data obtained from a sample, which
may not be a true representative of the universe.

2. The composition of the bundle of commodities may be different for different years. This cannot be
taken into account by the fixed base method. Although this difficulty can be overcome by
the use of chain base index numbers, their calculations are quite cumbersome.

3. An index number doesn’t take into account the quality of the items. Since a superior item
generally has a higher price, an increase in the index may be due to an improvement in
the quality of the items and not due to a rise in prices.

4. Index numbers are specialised averages and as such these also suffer from all the limitations
of an average.

5. An index number can be computed by using a number of formulae and different formulae
will give different results. Unless a proper method is used, the results are likely to be
inaccurate and misleading.

6. By the choice of a wrong base period or weighing system, the results of the index number
can be manipulated and, thus, are likely to be misused.

Self Assessment

Fill in the blanks:

15. An index number doesn’t take into account the ……………..of the items.

16. Index number computed by using a number of formulae will give …………….results

11.9 Summary

 An index number is a device for comparing the general level of magnitude of a group of
distinct, but related, variables in two or more situations.

• Simple Average of Price Relatives Index

$P_{01} = \frac{\sum \frac{p_1}{p_0} \times 100}{n}$ (using A.M.)

$P_{01} = \text{Antilog}\left[\frac{\sum \log\left(\frac{p_1}{p_0} \times 100\right)}{n}\right]$ (using G.M.)

• Simple Aggregative Index $P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$

• Weighted Average of Price Relatives Index

$P_{01} = \frac{\sum Pw}{\sum w}$ (using weighted A.M.)

$P_{01} = \text{Antilog}\left[\frac{\sum w \log P}{\sum w}\right]$ (using weighted G.M.)

Here $P = \frac{p_1}{p_0} \times 100$ and w denotes values (weights)

• Weighted Aggregative Index Numbers

(a) Laspeyres's Index $P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

(b) Paasche's Index $P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$

(c) Fisher's Ideal Index $P_{01}^{Fi} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

(d) Drobisch and Bowley's Index $P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100$

(e) Marshall and Edgeworth Index $P_{01}^{ME} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$

(f) Walsh's Index $P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$

(g) Kelly's Index $P_{01}^{Ke} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$

• Real Wage $= \frac{\text{Money Wage}}{\text{C.P.I.}} \times 100$

• Output at Constant Prices $= \frac{\text{Output at Current Prices}}{\text{Price Index}} \times 100$

• Purchasing Power of Money $= \frac{1}{\text{Price Index}} \times 100$
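The weighted aggregative formulae listed above translate directly into code. The sketch below is illustrative only: the prices and quantities are hypothetical figures chosen for easy arithmetic, and the function names are not taken from any library.

```python
import math


def sum_pq(prices, quantities):
    """Aggregate value: sum of price times quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, q0, p1, q1):
    return sum_pq(p1, q0) / sum_pq(p0, q0) * 100


def paasche(p0, q0, p1, q1):
    return sum_pq(p1, q1) / sum_pq(p0, q1) * 100


def fisher(p0, q0, p1, q1):
    # Geometric mean of the Laspeyres and Paasche indices
    return math.sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))


def marshall_edgeworth(p0, q0, p1, q1):
    numerator = sum_pq(p1, q0) + sum_pq(p1, q1)
    denominator = sum_pq(p0, q0) + sum_pq(p0, q1)
    return numerator / denominator * 100


def walsh(p0, q0, p1, q1):
    weights = [math.sqrt(a * b) for a, b in zip(q0, q1)]
    return sum_pq(p1, weights) / sum_pq(p0, weights) * 100


# Hypothetical data for three items (not from the text)
base_prices, base_quantities = [4, 5, 2], [10, 8, 20]
current_prices, current_quantities = [5, 6, 3], [9, 8, 18]

args = (base_prices, base_quantities, current_prices, current_quantities)
L, P, F = laspeyres(*args), paasche(*args), fisher(*args)

# Time reversal: interchange base and current periods
F_reversed = fisher(current_prices, current_quantities, base_prices, base_quantities)
print(round(L, 2), round(P, 2), round(F, 2), round(F * F_reversed / 10000, 6))
```

Fisher's index always lies between the Laspeyres and Paasche values, and the product of the forward and reversed Fisher indices (each divided by 100) equals 1, which is the time reversal property.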

11.10 Keywords

Base Year: The year from which comparisons are made is called the base year. It is commonly
denoted by writing ‘0’ as a subscript of the variable.

Consumer Price: It is the price at which the ultimate consumer purchases his goods and services
from the retailer.

Current Year: The year under consideration for which the comparisons are to be computed is
called the current year. It is commonly denoted by writing ‘1’ as a subscript of the variable.

Index Number: An index number is a statistical measure used to compare the average level of
magnitude of a group of distinct but related variables in two or more situations.

Quantity Index Number: An index number that measures the change in quantities in the current year as
compared with a base year.

11.11 Review Questions


1. Construct Laspeyres's, Paasche's and Fisher's indices from the following data:

        1986                               1987
Item    Price (Rs)   Expenditure (Rs)     Price (Rs)   Expenditure (Rs)
1       10           60                   15           75
2       12           120                  15           150
3       18           90                   27           81
4       8            40                   12           48

2. From the following data, prove that Fisher's Ideal Index satisfies both the time reversal
and the factor reversal tests.

             Base Year             Current Year
Commodity    Price    Quantity     Price    Quantity
A            6        50           10       60
B            2        100          2        120
C            4        60           6        60

3. Examine various steps and problems involved in the construction of an index number.
4. Distinguish between average type and aggregative type of index numbers. Discuss the
nature of weights used in each case.
5. Given the following data:

Year    Average weekly take-home wages    Consumer price index
1968    109.50                            112.8
1969    112.20                            118.2
1970    116.40                            127.4
1971    125.08                            138.2
1972    135.40                            143.5
1973    138.10                            149.8

(a) What was the real weekly wage for each year?

(b) In which year did the employees have the greatest buying power?

(c) What percentage increase in the average weekly wages for the year 1973 is required
to provide the same buying power that the employees enjoyed in the year in which
they had the highest real wages?

6. Construct Consumer Price Index for the year 1981 with 1971 as the base year.
Items : Food Rent Clothes Fuel Others
Percentage Expenses : 35% 15% 20% 10% 20%
Value Index (1971) : 150 50 100 20 60
Value Index (1981) : 174 60 125 25 90

7. Compute consumer price index number from the following data by the aggregate expenditure method.

Commodity     Quantities consumed   Units in which      Prices in    Prices in
              in base year          prices are quoted   base year    current year
Wheat         400 kgs               per quintal         350          400
Rice          2 quintals            per quintal         580          700
Gram          100 kgs               per quintal         740          950
Pulses        2 quintals            per quintal         980          1200
Ghee          50 kgs                per kg              70           85
Sugar         50 kgs                per kg              8            11
Fire wood     5 quintals            per quintal         50           60
House Rent    1 house               per house           1600         1800

8. A textile worker in the city of Ahmedabad earns 750 per month. The cost of living index
for January 1986 is given as 160. Using the following data find out the amounts he spends
on (i) Food and (ii) Rent.

9. "In the construction of index numbers the advantages of geometric mean are greater than
those of arithmetic mean". Discuss.

10. Show that the Laspeyres's index has an upward bias and the Paasche's index has a downward
bias. Under what conditions will the two index numbers be equal?

Answers: Self Assessment

1. homogeneous 2. weighted

3. barometers 4. level of an activity

5. base year 6. Current year

7. weights 8. price

9. Quantity index number 10. price index numbers

11. cost of living 12. real wages

13. base 14. weighing

15. quality 16. different


Unit 12: Hypothesis Testing

CONTENTS
Objectives
Introduction
12.1 Steps Involved in Hypothesis Testing
12.1.1 Formulate the Hypothesis
12.1.2 Significance Level
12.2 Errors in Hypothesis Testing
12.3 Parametric Tests
12.3.1 One Sample Test
12.3.2 Two Sample Test
12.4 Chi-square Test
12.5 ANOVA
12.5.1 One-way ANOVA
12.5.2 Two-way ANOVA
12.6 Non-parametric Test
12.6.1 One Sample Tests
12.6.2 Two Sample Tests
12.6.3 K Sample Test
12.7 Summary
12.8 Keywords
12.9 Review Questions
12.10 Further Readings

Objectives
After studying this unit, you will be able to:

 Identify the Steps involved in Hypothesis Testing

 Resolve the errors in Hypothesis Testing

 Describe the One Sample and Two Sample Parametric Tests

 Explain the Chi-square Test

 Recognize the conception of ANOVA

Introduction

A statistical hypothesis test is a method of making statistical decisions using experimental data.
In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.

The phrase “test of significance” was coined by Ronald Fisher: “Critical tests of this kind may be
called tests of significance, and when such tests are available we may discover whether a second
sample is or is not significantly different from the first.”

Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory
data analysis. In frequency probability, these decisions are almost always made using null-
hypothesis tests; that is, ones that answer the question: assuming that the null hypothesis is
true, what is the probability of observing a value for the test statistic that is at least as extreme
as the value that was actually observed? One use of hypothesis testing is deciding whether
experimental results contain enough information to cast doubt on conventional wisdom.

12.1 Steps Involved in Hypothesis Testing

1. Formulate the null hypothesis H0 and the alternate hypothesis HA. According to the
given problem, H0 represents the value of some parameter of population.

2. Select an appropriate test assuming H0 to be true.

3. Calculate the value of the test statistic.

4. Select the level of significance, either 1% or 5%.

5. Find the critical region.

6. If the calculated value lies within the critical region, then reject H0.

7. State the conclusion in writing.

12.1.1 Formulate the Hypothesis

The normal approach is to set two hypotheses instead of one, in such a way, that if one hypothesis
is true, the other is false. Alternatively, if one hypothesis is false or rejected, then the other is true
or accepted. These two hypotheses are:

1. Null hypothesis

2. Alternate hypothesis

Let us assume that the mean of the population is μ0 and the mean of the sample is x̄. Since we
have assumed that the population has a mean of μ0, this is our null hypothesis. We write this as
H0: μ = μ0, where H0 is the null hypothesis. The alternate hypothesis is HA: μ ≠ μ0. The rejection of the null
hypothesis will show that the mean of the population is not μ0. This implies that the alternate
hypothesis is accepted.

12.1.2 Significance Level

Having formulated the hypothesis, the next step is to test its validity at a certain level of significance.
The confidence with which a null hypothesis is accepted or rejected depends upon the
significance level. A significance level of, say, 5% means that the researcher accepts a 5% risk
of rejecting a null hypothesis that is actually true, i.e., of making this wrong decision on about
5 out of every 100 occasions. A significance level of 1% means that the researcher runs this
risk on only one out of every 100 occasions. Therefore, a 1% significance level provides
greater confidence in a decision to reject than a 5% significance level.

There are two types of tests.

One-tailed and Two-tailed Tests

A hypothesis test may be one-tailed or two-tailed. In a one-tailed test, the values of the test statistic
leading to rejection of the null hypothesis fall in only one tail of the sampling distribution curve.

Figure 12.1

Example:
1. In a right-sided test, the critical region lies entirely in the right tail of the sampling distribution.
Whether the test is one-sided or two-sided depends on the alternate hypothesis.

2. A tyre company claims that the mean life of its new tyre is 15,000 km. Now the researcher
formulates the null hypothesis that the mean tyre life is equal to 15,000 km.

A two-tailed test is one in which the test statistics leading to rejection of null hypothesis falls on
both tails of the sampling distribution curve as shown. One-tailed test is used when the researcher's
interest is primarily on one side of the issue.

Example: "Is the current advertisement less effective than the proposed new advertisement"?
A two-tailed test is appropriate, when the researcher has no reason to focus on one side of the
issue.

Example:
1. "Are the two markets - Mumbai and Delhi different to test market a product?"

2. A product is manufactured by a semi-automatic machine. Now, assume that the same
product is manufactured by the fully automatic machine. This will be a two-sided test,
because the null hypothesis is that "the two methods used for manufacturing the product
do not differ significantly".

∴ H0: μ1 = μ2

Sign of alternate hypothesis    Type of test
≠                               Two-sided
<                               One-sided to left
>                               One-sided to right

Degree of Freedom

It tells the researcher the number of elements that can be chosen freely.

Example: (a + b)/2 = 5. If we fix a = 3, b has to be 7.

Therefore, the degree of freedom is 1.

Select Test Criteria

If the hypothesis pertains to a larger sample (30 or more), the Z-test is used. When the sample is
small (less than 30), the T-test is used.

Compute

Carry out computation.

Make Decisions

Accepting or rejecting of the null hypothesis depends on whether the computed value falls in the
region of rejection at a given level of significance.

Task Discuss when would you prefer two tailed test to one tailed test.

Self Assessment

Fill in the blanks:

1. Hypothesis testing is sometimes called ............................. analysis.

2. The confidence with which a null hypothesis is accepted or rejected depends upon the
.............................

3. The rejection of null hypothesis means that the ............................. hypothesis is accepted.

12.2 Errors in Hypothesis Testing

There are two types of errors:


1. Hypothesis is rejected when it is true.

2. Hypothesis is not rejected when it is false.

(1) is called Type 1 error (α), (2) is called Type 2 error (β). When α = 0.10, it means that a true
hypothesis will be accepted on 90 out of 100 occasions. Thus, there is a risk of rejecting a true
hypothesis on 10 out of every 100 occasions. To reduce the risk, use α = 0.01, which implies that we
are prepared to take a 1% risk, i.e., the probability of rejecting a true hypothesis is 1%. It is also
possible that in hypothesis testing, we may commit Type 2 error (β), i.e., accepting a null hypothesis
which is false.

Note: For a given sample, lowering α raises β. The only way to reduce Type 1 and Type 2 error
together is by increasing the sample size.
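The trade-off between the two errors can be made concrete with a small calculation. The sketch below assumes a right-tailed Z-test with a known population standard deviation; the shift of 2 units and the standard deviation of 10 are hypothetical values chosen only for illustration, and the function name is not from the text.

```python
from statistics import NormalDist


def type2_error(alpha, effect, sigma, n):
    """Beta for a right-tailed Z-test of H0: mu = mu0 when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)   # boundary of the rejection region
    shift = effect * n ** 0.5 / sigma            # mean of Z when H0 is false
    return std_normal.cdf(z_critical - shift)    # P(H0 not rejected | H0 false)


# Hypothetical setting: true shift of 2 units, population S.D. of 10
for alpha in (0.10, 0.05, 0.01):
    for n in (25, 100, 400):
        print(alpha, n, round(type2_error(alpha, 2, 10, n), 3))
```

Reading down the output for a fixed sample size, β rises as α is made smaller; reading across for a fixed α, β falls as the sample size grows.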

Example of Type 1 and Type 2 error

Type 1 and Type 2 errors can be illustrated as follows. Suppose a marketing company has two distributors
(retailers) with varying capabilities. On the basis of capabilities, the company has grouped them
into two categories: (1) Competent retailer and (2) Incompetent retailer. Thus R1 is a competent
retailer and R2 is an incompetent retailer. The firm wishes to award a performance bonus (as a
part of trade promotion) to encourage good retailership. Assume that the two actions A1 and A2
represent whether the bonus or trade incentive is given or not given. This is shown as
follows:

Action                                    (R1) Competent retailer          (R2) Incompetent retailer
A1: performance bonus is awarded          Correct decision                 Incorrect decision, error (α)
A2: performance bonus is not awarded      Incorrect decision, error (β)    Correct decision

When the firm fails to reward a competent retailer, it commits a Type 2 error. On the
other hand, when it rewards an incompetent retailer, it commits a Type 1 error (taking the
null hypothesis to be that the retailer does not merit the bonus).

Self Assessment

Fill in the blanks:

4. Hypothesis is rejected when it is true is called ……………error.

5. Hypothesis is not rejected when it is false is called …………..error

12.3 Parametric Tests


Parametric tests have following advantages:
1. Parametric tests are more powerful. The data in this test is derived from interval and ratio
measurement.
2. In parametric tests, it is assumed that the data follows normal distributions. Examples of
parametric tests are
(a) Z-Test,
(b) T-Test and
(c) F-Test.
3. Observations must be independent, i.e., the selection of any one item should not affect the
chances of any other item being included in the sample.

Did u know? What is univariate/bivariate data analysis?


Univariate

If we wish to analyse one variable at a time, this is called univariate analysis. Example:
Effect of pricing on sales. Here, price is an independent variable and sales is a dependent
variable. Change the price and measure the sales.

Bivariate

The relationship of two variables at a time is examined by means of bivariate data analysis.

If one is interested in detecting whether a parameter has either increased or
decreased, a two-sided test is appropriate.

Parametric tests are of the following types:

12.3.1 One Sample Test

One sample tests can be categorized into 2 categories.

z Test

1. When sample size is > 30


P1 = Proportion in sample 1
P2 = Proportion in sample 2

Example: You are working as a purchase manager for a company. The following
information has been supplied by two scooter tyre manufacturers.

Company A Company B
Mean life (in km) 13000 12000
S.D (in km) 340 388
Sample size 100 100

In the above, the sample size is 100, hence a Z-test may be used.

2. Testing the hypothesis about difference between two means: This can be used when two
population means are given and the null hypothesis is H0: P1 = P2.

Example: In a city during the year 2000, 20% of households indicated that they read Femina
magazine. Three years later, the publisher had reasons to believe that circulation has gone up. A
survey was conducted to confirm this. A sample of 1,000 respondents were contacted and it was
found 210 respondents confirmed that they subscribe to the periodical 'Femina'. From the above,
can we conclude that there is a significant increase in the circulation of 'Femina'?

Solution:

We will set up null hypothesis and alternate hypothesis as follows:

Null Hypothesis is H0: P = 20%

Alternate Hypothesis is HA: P > 20%

This is a one-tailed (right) test.

$$Z = \frac{p - P}{\sqrt{\frac{P(1 - P)}{n}}} = \frac{\frac{210}{1000} - 0.20}{\sqrt{\frac{0.20 (1 - 0.20)}{1000}}} = \frac{0.21 - 0.20}{\sqrt{\frac{0.2 \times 0.8}{1000}}} = \frac{0.01}{\sqrt{\frac{0.16}{1000}}} = \frac{0.01}{0.0126} = 0.79$$

The value of Z at the 0.05 level is 1.64. As the calculated value of Z (0.79) is less than 1.64, it does not
fall in the rejection region. We therefore do not reject the null hypothesis: the sample does not
show a significant increase in the circulation of 'Femina'.
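The test statistic for this example can be checked numerically. The following sketch computes the one-sample Z statistic for a proportion from the figures given in the problem (210 readers out of 1,000 respondents, hypothesised proportion 0.20); the function name is an illustrative choice.

```python
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Z statistic for H0: P = p0, using the standard error under H0."""
    p_hat = successes / n
    standard_error = (p0 * (1 - p0) / n) ** 0.5
    return (p_hat - p0) / standard_error


z = one_sample_proportion_z(210, 1000, 0.20)
critical = NormalDist().inv_cdf(0.95)   # right-tailed test at the 5% level
reject_h0 = z > critical
print(round(z, 2), round(critical, 3), reject_h0)
```

A sample proportion of 21% against a hypothesised 20% is only about 0.8 standard errors above the null value, so the statistic falls well short of the one-tailed critical value at the 5% level.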

T-test (Parametric Test)

T-test is used in the following circumstances: When the sample size n < 30.

Example:
1. A certain pesticide is packed into bags by a machine. A random sample of 10 bags are
drawn and their contents are found as follows: 50, 49, 52, 44, 45, 48, 46, 45, 49, 45. Confirm
whether the average packaging can be taken to be 50 kgs.
In this test, the sample size is less than 30 and the population standard deviation is not known,
so the t-test is used. The t-test can also be used to find out whether there is any significant
difference between two means, i.e. whether the two population means are equal.

2. There are two nourishment programmes 'A' and 'B'. Two groups of children are subjected
to this. Their weight is measured after six months. The first group of children subjected to
the programme 'A' weighed 44, 37, 48, 60, 41 kgs. at the end of programme. The second
group of children were subjected to nourishment programme 'B' and their weight was 42,
42, 58, 64, 64, 67, 62 kgs. at the end of the programme. From the above, can we conclude that
nourishment programme 'B' increased the weight of the children significantly, given a 5%
level of significance?

Null Hypothesis: There is no significant difference between Nourishment programme 'A' and
'B'.

Alternative Hypothesis: Nourishment programme B is better than 'A' or Nourishment


programme 'B' increase the children's weight significantly.

Solution:

Nourishment programme A                        Nourishment programme B
x      x − x̄ (= x − 46)    (x − x̄)²           y      y − ȳ (= y − 57)    (y − ȳ)²
44     -2                   4                  42     -15                  225
37     -9                   81                 42     -15                  225
48     2                    4                  58     1                    1
60     14                   196                64     7                    49
41     -5                   25                 64     7                    49
                                               67     10                   100
                                               62     5                    25
230    0                    310                399    0                    674

$$t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Here n1 = 5, n2 = 7, $\sum x = 230$, $\sum y = 399$,

$\sum (x - \bar{x})^2 = 310$, $\sum (y - \bar{y})^2 = 674$

$\bar{x} = \frac{\sum x}{n_1} = \frac{230}{5} = 46$

$\bar{y} = \frac{\sum y}{n_2} = \frac{399}{7} = 57$

$$s^2 = \frac{1}{n_1 + n_2 - 2} \left\{ \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 \right\}$$

D.F. = (n1 + n2 – 2) = (5 + 7 – 2) = 10

$s^2 = \frac{1}{10} \{310 + 674\} = 98.4$

$$t = \frac{46 - 57}{\sqrt{98.4 \times \left(\frac{1}{5} + \frac{1}{7}\right)}} = \frac{-11}{\sqrt{98.4 \times \frac{12}{35}}} = \frac{-11}{\sqrt{33.73}} = -\frac{11}{5.8} = -1.89$$

t at 10 d.f. at 5% level is 1.81.

Since the absolute value of the calculated t (1.89) is greater than 1.81, it is significant. Hence HA is accepted.
Therefore nourishment programme 'B' increased the weight of the children significantly.
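The pooled-variance calculation above can be reproduced with a few lines of code. This is a sketch using only the two weight samples given in the problem; the function name is illustrative, and the critical value of 1.81 is taken from the t table quoted in the text rather than computed.

```python
def pooled_t(sample_x, sample_y):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(sample_x), len(sample_y)
    mean_x = sum(sample_x) / n1
    mean_y = sum(sample_y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in sample_x)   # sum of squared deviations, x
    ss_y = sum((v - mean_y) ** 2 for v in sample_y)   # sum of squared deviations, y
    df = n1 + n2 - 2
    pooled_variance = (ss_x + ss_y) / df
    t_value = (mean_x - mean_y) / (pooled_variance * (1 / n1 + 1 / n2)) ** 0.5
    return t_value, df, pooled_variance


programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

t_value, df, s2 = pooled_t(programme_a, programme_b)
table_value = 1.81   # one-tailed 5% value for 10 d.f., as quoted in the text
print(round(t_value, 2), df, round(s2, 1), abs(t_value) > table_value)
```

The sign of t is negative only because programme 'A' is listed first; the comparison with the table value uses the absolute value.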

Two Tailed t-Test

When two samples are related we use paired t-test for judging the significance of the mean of
difference of the two related samples. It can also be used for judging the significance of the
coefficients of simple and partial correlations.

The t-test is performed using the following formula:

$$t = r_{yx} \sqrt{\frac{n - 2}{1 - r_{yx}^2}}$$

where (n – 2) is the degrees of freedom and $r_{yx}$ is the coefficient of correlation between x and y. The
computed value of t is compared with its table value. If the computed value is less than the table
value, the null hypothesis is accepted; otherwise it is rejected at the given level of significance.

Example: A study of weight of 18 pairs of male and female employees in a company shows
that coefficient of correlation is 0.52. Test the significance of correlation.
Solution:

Applying the t-test:

$$t = r\sqrt{\frac{n-2}{1 - r^2}}$$

$r = 0.52$, $n = 18$

$$t = 0.52\sqrt{\frac{18 - 2}{1 - (0.52)^2}} = \frac{0.52 \times 4}{0.854} = 2.44$$

$\nu = (n - 2) = (18 - 2) = 16$

For $\nu = 16$, $t_{0.05} = 2.12$

The calculated value of t is greater than the table value. The given value of r is significant.
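The same calculation can be sketched in a few lines of Python. The function name `correlation_t` is an illustrative choice; the table value 2.12 is the one quoted in the example.

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


t_value, df = correlation_t(r=0.52, n=18)
table_value = 2.12  # t at 16 d.f., 5% level, as quoted in the example
is_significant = t_value > table_value
print(f"t = {t_value:.2f} with {df} d.f.; significant: {is_significant}")
```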

12.3.2 Two Sample Test

The two sample test is known as the F test.

F-Test

Let there be two independent random samples of sizes $n_1$ and $n_2$ from two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Further, let

$$s_1^2 = \frac{1}{n_1 - 1}\sum_i (X_{1i} - \bar{X}_1)^2 \quad\text{and}\quad s_2^2 = \frac{1}{n_2 - 1}\sum_i (X_{2i} - \bar{X}_2)^2$$

be the variances of the first and the second sample respectively. Then the F-statistic is defined as the ratio of two $\chi^2$-variates, each divided by its degrees of freedom. Thus, we can write

$$F = \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)} = \frac{\dfrac{(n_1-1)s_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)s_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

Features of F- distribution

1. This distribution has two parameters $\nu_1$ (= $n_1 - 1$) and $\nu_2$ (= $n_2 - 1$).

2. The mean of the F-variate with $\nu_1$ and $\nu_2$ degrees of freedom is $\dfrac{\nu_2}{\nu_2 - 2}$ and the standard error is

$$\left(\frac{\nu_2}{\nu_2 - 2}\right)\sqrt{\frac{2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 4)}}$$

We note that the mean will exist if $\nu_2 > 2$ and the standard error will exist if $\nu_2 > 4$. Further, the mean > 1.

3. The random variate F can take only positive values from 0 to $\infty$. The curve is positively skewed.

4. For large values of $\nu_1$ and $\nu_2$, the distribution approaches the normal distribution.

5. If a random variate follows the t-distribution with $\nu$ degrees of freedom, then its square follows the F-distribution with 1 and $\nu$ d.f., i.e. $t_\nu^2 = F_{1,\nu}$.

6. F and $\chi^2$ are also related as $F_{\nu_1,\nu_2} = \dfrac{\chi^2_{\nu_1}}{\nu_1}$ as $\nu_2 \to \infty$.

Figure 12.2: Curves of the F-distribution, p(F) plotted against F, for (ν1, ν2) = (10, 10), (30, 30) and (40, 40)

Self Assessment

Fill in the blanks:

6. The relationship of two variables at a time is examined by means of ............................. data analysis.

7. The data in parametric test is derived from interval and ………….measurement.

8. One sample tests can be categorized into ……categories

12.4 Chi-square Test

A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.

Caution: One case where the distribution of the test statistic is an exact chi-square distribution is the test that the variance of a normally-distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.

The chi-square test is used under the following conditions:

1. Sample observations should be independent, i.e. no individual item should be included twice in a sample.

2. The sample should contain at least 50 observations, or the total frequency should be greater than 50.

3. There should be a minimum of five observations in any cell. This is called the cell frequency constraint.

For instance:

                      Age group of persons
                      Under 20   20-40   41-50   51 & over   Total
Liked the car           146        78      48       28        300
Disliked the car         54        52      32       62        200
Total                   200       130      80       90        500

Is there any significant difference between the age group and preference for the car?

Example: A company marketing tea claims that 70% of the population in a metro drinks a particular brand (Wood Smoke) of tea. A competing brand challenged this claim and took a random sample of 200 families to gather data. During the study period, it was found that 130 families were using this brand of tea. Will it be correct on the part of the competitor to conclude that the claim made by the company does not hold good at the 5% level of significance?

Solution:

H0: The proportion of people who drink the Wood Smoke brand is 70%.
H1: The proportion of people who drink the Wood Smoke brand is not 70%.

If the null hypothesis is true, the expected number of families that drink this brand is 200 × 0.7 = 140, and the expected number that do not is 200 × 0.3 = 60.

Degrees of freedom = 2 – 1 = 1, since there are two groups.

Group                                  Observed (O)   Expected (E)   O – E   (O – E)²   (O – E)²/E
Those who drink branded tea                130            140         -10      100        0.714
Those who did not drink branded tea         70             60         +10      100        1.667
Total                                      200            200           0                 2.381

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.714 + 1.667 = 2.381$$

At the 5% level of significance, the table value of χ² for 1 d.f. is 3.841. The calculated value, 2.381, is lower. Therefore, we accept the hypothesis that 70% of the people in that metro drink Wood Smoke branded tea.
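The chi-square statistic for this example can be reproduced with a short sketch. The helper name `chi_square_statistic` is illustrative; the critical value 3.841 is the table value quoted in the solution.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


sample_size = 200
claimed_share = 0.70
observed = [130, 70]  # families that drink / do not drink the brand
expected = [sample_size * claimed_share, sample_size * (1 - claimed_share)]

chi2 = chi_square_statistic(observed, expected)
critical_value = 3.841  # chi-square table value, 1 d.f., 5% level
accept_h0 = chi2 < critical_value
print(f"chi-square = {chi2:.3f}; accept H0: {accept_h0}")
```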

Self Assessment

Fill in the blanks:

9. A chi-square test is used when sample observations should be ………………

10. For applying chi-square test , sample should contain at least ……… observations

12.5 ANOVA
ANOVA is a statistical technique used to test the equality of three or more sample means. Based on the means, an inference is drawn as to whether the samples belong to the same population or not.

Notes: Conditions for using ANOVA

1. Data should be quantitative in nature.
2. Data should be normally distributed.
3. Samples drawn from a population should follow random variation.

ANOVA can be discussed in two parts:

1. One-way classification
2. Two and three-way classification.

12.5.1 One-way ANOVA

Following are the steps followed in ANOVA:


1. Calculate the variance between samples.
2. Calculate the variance within samples.
3. Calculate F ratio using the formula.
F = Variance between the samples/Variance within the sample
4. Compare the value of F obtained above in (3) with the critical value of F such as 5% level
of significance for the applicable degree of freedom.
5. When the calculated value of F is less than the table value of F, the difference in sample
means is not significant and a null hypothesis is accepted. On the other hand, when the
calculated value of F is more than the critical value of F, the difference in sample means is
considered as significant and the null hypothesis is rejected.

Example: ANOVA is useful in situations such as the following:

1. To compare the mileage achieved by different brands of automotive fuel.

2. To compare the first year earnings of graduates of half a dozen top business schools.

Application in Market Research

Consider the following pricing experiment. Three prices are considered for a new toffee box introduced by Nutrine company: 39, 44 and 49. The idea is to determine the influence of price levels on sales. Five supermarkets are selected to exhibit these toffee boxes. The sales are as follows:

Price    Supermarket                      Total    Sample mean x̄
          1     2     3     4     5
39        8    12    10     9    11        50         10
44        7    10     6     8     9        40          8
49        4     8     7     9     7        35          7

What the manufacturer wants to know is: (1) Whether the difference among the means is
significant? If the difference is not significant, then the sale must be due to chance. (2) Do the
means differ? (3) Can we conclude that the three samples are drawn from the same population
or not?

Example: In a company there are four shop floors. Productivity rate for three methods of
incentives and gain sharing in each shop floor is presented in the following table. Analyze
whether various methods of incentives and gain sharing differ significantly at 5% and 1%
F-limits.

Shop Floor    Productivity rate data for three methods of incentives and gain sharing
              X1    X2    X3
1              5     4     4
2              6     4     3
3              2     2     2
4              7     6     3

Solution:

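Step 1: Calculate the mean of each of the three samples (i.e., $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$, one for each method of incentives and gain sharing).

$$\bar{X}_1 = \frac{5 + 6 + 2 + 7}{4} = 5$$

$$\bar{X}_2 = \frac{4 + 4 + 2 + 6}{4} = 4$$

$$\bar{X}_3 = \frac{4 + 3 + 2 + 3}{4} = 3$$

Step 2: Calculate the mean of the sample means, i.e.

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{K} = \frac{5 + 4 + 3}{3} = 4$$

where K denotes the number of samples.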

Step 3: Calculate the sum of squares (ss) for variance between and within the samples.

$$\text{ss between} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2$$

$$\text{ss within} = \sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \sum_i (x_{3i} - \bar{x}_3)^2$$

The sum of squares (ss) for variance between samples is obtained by taking the deviations of the sample means from the mean of sample means ($\bar{\bar{x}}$), squaring these deviations, multiplying each square by the number of items in the respective sample and then totalling the results. The sum of squares (ss) for variance within samples is obtained by taking the deviations of the values of all sample items from the corresponding sample means, squaring such deviations and then totalling them. For our illustration, then:

$$\text{ss between} = 4(5 - 4)^2 + 4(4 - 4)^2 + 4(3 - 4)^2 = 4 + 0 + 4 = 8$$

$$\text{ss within} = \{(5-5)^2 + (6-5)^2 + (2-5)^2 + (7-5)^2\} + \{(4-4)^2 + (4-4)^2 + (2-4)^2 + (6-4)^2\} + \{(4-3)^2 + (3-3)^2 + (2-3)^2 + (3-3)^2\}$$

$$= (0 + 1 + 9 + 4) + (0 + 0 + 4 + 4) + (1 + 0 + 1 + 0) = 14 + 8 + 2 = 24$$

Step 4: Calculate the ss of total variance, which is equal to the total of ss between and ss within and is given by the formula:

$$\sum_i \sum_j (x_{ij} - \bar{\bar{x}})^2$$

where the indices i and j (= 1, 2, 3, …) run over the samples and the items within each sample.

For our example, total ss will thus be:

$$\{(5-4)^2 + (6-4)^2 + (2-4)^2 + (7-4)^2\} + \{(4-4)^2 + (4-4)^2 + (2-4)^2 + (6-4)^2\} + \{(4-4)^2 + (3-4)^2 + (2-4)^2 + (3-4)^2\}$$

$$= (1 + 4 + 4 + 9) + (0 + 0 + 4 + 4) + (0 + 1 + 4 + 1) = 18 + 8 + 6 = 32$$

We will, however, get the same value if we simply total the respective values of ss between and ss within. For our example, ss between is 8 and ss within is 24, thus ss of total variance is 32 (8 + 24).

Step 5: Ascertain the degrees of freedom and mean square (MS) between and within the samples. Degrees of freedom (df) for between samples and within samples are computed differently, as follows.

For between samples, df is (k – 1), where 'k' represents the number of samples (for us it is 3). For within samples, df is (n – k), where 'n' represents the total number of items in all the samples (for us it is 12).

Mean squares (MS) between and within samples are computed by dividing ss between and ss within by their respective degrees of freedom. Thus for our example:

(i) $\text{MS between} = \dfrac{\text{ss between}}{k - 1} = \dfrac{8}{2} = 4$, where (k – 1) is the df.

(ii) $\text{MS within} = \dfrac{\text{ss within}}{n - k} = \dfrac{24}{9} = 2.67$, where (n – k) is the df.

Step 6: Now we compute the F ratio for our samples. The formula for computing the 'F' ratio is:

$$F = \frac{\text{MS between}}{\text{MS within}}$$

Thus for our example, F ratio = $\dfrac{4.00}{2.67} = 1.5$

Step 7: Now we will have to analyze whether various methods of incentives and gain sharing
differ significantly at 5% and 1% 'F' limits. For this, we need to compare observed 'F' ratio with
'F' table values. When observed 'F' value at given degrees of freedom is either equal to or less
than the table value, difference is considered insignificant. In reverse cases, i.e., when calculated
'F' value is higher than table-F value, the difference is considered significant and accordingly we
draw our conclusion.

For example, our observed 'F' ratio at degrees of freedom v1 = 2 and v2 = 9 is 1.5. The table value of F at the 5% level with df 2 and 9 (v1 = 2, v2 = 9) is 4.26. Since the table value is higher than the observed value, the difference in rate of productivity due to various methods of incentives and gain sharing is considered insignificant. At the 1% level with df 2 and 9, we get the table value of F as 8.02 and we draw the same conclusion.

We can now draw an ANOVA table as follows to show our entire observation.

Variation         SS    df                      MS                             F-ratio                              Table value of F (5%)     Table value of F (1%)
Between sample     8    (k – 1) = (3 – 1) = 2   ss between/(k – 1) = 8/2 = 4   MS between/MS within = 4/2.67 = 1.5  F(v1, v2) = F(2, 9) = 4.26   F(v1, v2) = F(2, 9) = 8.02
Within sample     24    (n – k) = (12 – 3) = 9  ss within/(n – k) = 24/9 = 2.67
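Steps 1 to 6 above can be sketched as a small function. This is an illustrative implementation using only plain Python; the name `one_way_anova` and the dictionary keys are assumptions, not part of the text. It uses the overall mean of all items as the grand mean, which coincides with the mean of sample means here because the three samples are of equal size.

```python
def one_way_anova(groups):
    """Return sums of squares, mean squares and the F ratio for a list of samples."""
    k = len(groups)                          # number of samples
    n_total = sum(len(g) for g in groups)    # total number of items
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
        "df": (k - 1, n_total - k),
    }


# Productivity rates for the three incentive methods (X1, X2, X3) over four shop floors
x1 = [5, 6, 2, 7]
x2 = [4, 4, 2, 6]
x3 = [4, 3, 2, 3]
result = one_way_anova([x1, x2, x3])
print(result)
```

The returned F ratio is then compared with the table values of F(2, 9) quoted above.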

12.5.2 Two-way ANOVA

The procedure to be followed to calculate variance is the same as it is for the one-way classification.
The example of two-way classification of ANOVA is as follows:

Suppose a firm has four types of machines: A, B, C and D. It has put four of its workers on each machine for a specified period, say one week. At the end of the week, the average output of each worker on each type of machine was calculated. These data are given below:
Average Production by the Type of Machine

A B C D
Worker 1 25 26 23 28
Worker 2 23 22 24 27
Worker 3 27 30 26 32
Worker 4 29 34 27 33

The firm is interested in knowing:

1. Whether the mean productivity of workers is significantly different.

2. Whether there is a significant difference in the mean productivity of different types of machines.

Example: Company 'X' wants its employees to undergo three different types of training programme with a view to obtaining improved productivity from them. Sixteen new employees are assigned at random to the three training methods and, after completion of the training programme, their production performance is recorded.

The training manager's problem is to find out whether there are any differences in the effectiveness of the training methods. The data recorded are as under:
Daily Output of New Employees

Method 1 15 18 19 22 11
Method 2 22 27 18 21 17
Method 3 18 24 19 16 22 15

The following steps are followed:

1. Calculate the sample means, i.e. $\bar{x}_i$.

2. Calculate the general (grand) mean, i.e. $\bar{\bar{x}}$.

3. Calculate the variance between columns using the formula

$$s^2 = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$

where $n_i$ is the number of observations in sample i and k is the number of samples.

4. Calculate the sample variances using the formula

$$s_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i - 1}$$

where $n_i$ is the number of observations under each method.

5. Calculate the variance within columns using the formula

$$s^2 = \sum \left(\frac{n_i - 1}{n_T - k}\right) s_i^2$$

where $n_T = n_1 + n_2 + n_3$ is the total number of observations.

6. Calculate F using the ratio

$$F = \frac{\text{between column variance}}{\text{within column variance}}$$

7. Calculate the number of degrees of freedom in the numerator of the F ratio using the equation d.f. = (No. of samples – 1).

8. Calculate the number of degrees of freedom in the denominator of the F ratio using the equation d.f. = $\sum n_i - k$.

9. Refer to the F table and find the table value.

10. Draw conclusions.

Solution:

         Method 1    Method 2    Method 3
            15          22          24
            18          27          19
            19          18          16
            22          21          22
            11          17          15
                                    18
Total       85         105         114

1. The sample means are calculated as follows:

$$\bar{x}_1 = \frac{85}{5} = 17, \qquad \bar{x}_2 = \frac{105}{5} = 21, \qquad \bar{x}_3 = \frac{114}{6} = 19$$

2. Grand mean:

$$\bar{\bar{x}} = \frac{15 + 18 + 19 + 22 + 11 + 22 + 27 + 18 + 21 + 17 + 24 + 19 + 16 + 22 + 15 + 18}{16} = \frac{304}{16} = 19$$

3. Calculate the variance between columns:

n     x̄     x̿     x̄ – x̿    (x̄ – x̿)²    n(x̄ – x̿)²
5     17    19     -2         4         5 × 4 = 20
5     21    19      2         4         5 × 4 = 20
6     19    19      0         0         6 × 0 = 0

$$\sum n_i(\bar{x}_i - \bar{\bar{x}})^2 = 40$$

$$s^2 = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = \frac{40}{3 - 1} = 20$$

Variance between columns = 20
4. Calculation of the sample variances:

Training method 1           Training method 2           Training method 3
x – x̄      (x – x̄)²         x – x̄      (x – x̄)²         x – x̄      (x – x̄)²
15 – 17    (-2)² = 4        22 – 21    (1)² = 1         18 – 19    (-1)² = 1
18 – 17    (1)² = 1         27 – 21    (6)² = 36        24 – 19    (5)² = 25
19 – 17    (2)² = 4         18 – 21    (-3)² = 9        19 – 19    (0)² = 0
22 – 17    (5)² = 25        21 – 21    (0)² = 0         16 – 19    (-3)² = 9
11 – 17    (-6)² = 36       17 – 21    (-4)² = 16       22 – 19    (3)² = 9
                                                        15 – 19    (-4)² = 16
Σ(x – x̄)² = 70              Σ(x – x̄)² = 62              Σ(x – x̄)² = 60

$$s_1^2 = \frac{70}{5 - 1} = 17.5, \qquad s_2^2 = \frac{62}{5 - 1} = 15.5, \qquad s_3^2 = \frac{60}{6 - 1} = 12$$

5. Within column variance:

$$s^2 = \sum \left(\frac{n_i - 1}{n_T - k}\right) s_i^2 = \left(\frac{5 - 1}{16 - 3}\right) \times 17.5 + \left(\frac{5 - 1}{16 - 3}\right) \times 15.5 + \left(\frac{6 - 1}{16 - 3}\right) \times 12$$

$$= \frac{4}{13} \times 17.5 + \frac{4}{13} \times 15.5 + \frac{5}{13} \times 12 = \frac{192}{13} = 14.77$$

6. $F = \dfrac{\text{Between column variance}}{\text{Within column variance}} = \dfrac{20}{14.77} = 1.354$

7. d.f. of numerator = (3 – 1) = 2.

8. d.f. of denominator = $\sum n_i - k$ = (5 – 1) + (5 – 1) + (6 – 1) = 16 – 3 = 13.

9. Refer to the F table using d.f. = 2 and d.f. = 13.

10. The table value at the 5% level is 3.81. This is the upper limit of the acceptance region. Since the calculated value 1.354 lies within it, we can accept H0, the null hypothesis.

Conclusion: There is no significant difference in the effect of the three training methods.
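The ten steps of this example, which handle unequal sample sizes by weighting each sample variance, can be sketched with Python's standard `statistics` module. The variable names are illustrative; `statistics.variance` computes the sample variance with an n – 1 divisor, which matches step 4.

```python
from statistics import mean, variance

# Daily output of new employees under each training method
methods = {
    "Method 1": [15, 18, 19, 22, 11],
    "Method 2": [22, 27, 18, 21, 17],
    "Method 3": [18, 24, 19, 16, 22, 15],
}

k = len(methods)                                    # number of samples
n_total = sum(len(v) for v in methods.values())     # total observations
grand_mean = sum(sum(v) for v in methods.values()) / n_total

# Step 3: variance between columns
between_variance = sum(
    len(v) * (mean(v) - grand_mean) ** 2 for v in methods.values()
) / (k - 1)

# Steps 4 and 5: weighted sum of the sample variances
within_variance = sum(
    (len(v) - 1) / (n_total - k) * variance(v) for v in methods.values()
)

f_ratio = between_variance / within_variance
df_numerator, df_denominator = k - 1, n_total - k
print(f"F = {f_ratio:.3f} with d.f. ({df_numerator}, {df_denominator})")
```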

Example: Let us now frame a problem to study the effects of incentive and gain sharing and
level of technology (independent variables) on productivity rate (dependent variable).
Productivity Rate Data of Workers of M/s. XYZ & Co.

Level of Technology     Incentive and gain sharing
                        A     B     C
W                       4     3     3
X                       5     3     2
Y                       1     1     1
Z                       6     5     2

Solution:

1. Total of the individual items, T = 36; n = 12.

2. Correction factor:

$$\frac{T^2}{n} = \frac{36 \times 36}{12} = 108$$

3. Total ss = (16 + 9 + 9 + 25 + 9 + 4 + 1 + 1 + 1 + 36 + 25 + 4) – 108 = 140 – 108 = 32

4. ss between columns:

$$\left[\frac{16 \times 16}{4} + \frac{12 \times 12}{4} + \frac{8 \times 8}{4}\right] - 108 = 116 - 108 = 8$$

5. ss between rows:

$$\left[\frac{10 \times 10}{3} + \frac{10 \times 10}{3} + \frac{3 \times 3}{3} + \frac{13 \times 13}{3}\right] - 108 = \left[\frac{100}{3} + \frac{100}{3} + \frac{9}{3} + \frac{169}{3}\right] - 108$$

$$= [33.33 + 33.33 + 3 + 56.33] - 108 = 126 - 108 = 18 \text{ (after adjusting fractions)}$$

6. ss residual = Total ss – (ss between columns + ss between rows) = 32 – (8 + 18) = 6.

Now we need to set up ANOVA table.

Source of variation    SS    d.f.                      M.S.        F ratio     5%                  1%
Between columns         8    (c – 1) = 2               8/2 = 4     4/1 = 4     F(2, 6) = 5.14      F(2, 6) = 10.92
Between rows           18    (r – 1) = 3               18/3 = 6    6/1 = 6     F(3, 6) = 4.76      F(3, 6) = 9.78
Residual                6    (c – 1) × (r – 1) = 6     6/6 = 1

From the ANOVA table, we find that differences related to varieties of incentives and gain
sharing are insignificant at 5% level as the calculated F-ratio, i.e., 4 is less than table value of F,
which is 5.14. However differences are significant for different levels of technology at 5% level
as the observed F ratio is higher than table value of F. At 1% level, however, differences are
insignificant.
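The correction-factor method used in this solution can be sketched as follows. This is a minimal plain-Python illustration for a two-way layout with one observation per cell; all variable names are assumptions made for the example.

```python
# Rows: level of technology (W, X, Y, Z); columns: incentive schemes (A, B, C)
table = {
    "W": [4, 3, 3],
    "X": [5, 3, 2],
    "Y": [1, 1, 1],
    "Z": [6, 5, 2],
}

rows = list(table.values())
r, c = len(rows), len(rows[0])
n = r * c

grand_total = sum(sum(row) for row in rows)
correction_factor = grand_total ** 2 / n

ss_total = sum(x ** 2 for row in rows for x in row) - correction_factor
column_totals = [sum(row[j] for row in rows) for j in range(c)]
ss_columns = sum(t ** 2 for t in column_totals) / r - correction_factor
ss_rows = sum(sum(row) ** 2 for row in rows) / c - correction_factor
ss_residual = ss_total - ss_columns - ss_rows

# Mean squares and F ratios, each tested against the residual mean square
ms_columns = ss_columns / (c - 1)
ms_rows = ss_rows / (r - 1)
ms_residual = ss_residual / ((c - 1) * (r - 1))
f_columns = ms_columns / ms_residual
f_rows = ms_rows / ms_residual
print(f"F (columns) = {f_columns:.2f}, F (rows) = {f_rows:.2f}")
```

The two F ratios are then compared with the table values of F(2, 6) and F(3, 6) respectively.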

Self Assessment

Fill in the blanks:

11. ............................. is used to test the equality of three or more sample means.

12. For using ANOVA Data should be ……………..in nature

12.6 Non-parametric Test

Non-parametric tests are used to test hypotheses with nominal and ordinal data.

1. We do not make assumptions about the shape of the population distribution.

2. The hypothesis of a non-parametric test is concerned with something other than the value of a population parameter.

3. They are easy to compute.

There are certain situations, particularly in marketing research, where the assumptions of parametric tests are not valid. For example, in a parametric test we assume that the data collected follow a normal distribution; when this assumption does not hold, non-parametric tests are used. Examples of non-parametric tests are the Binomial test, Mann-Whitney U test, Sign test, etc.

A binomial test is used when the population has only two classes, such as male and female; buyers and non-buyers; success and failure. All observations made about the population must fall into one of the two classes. The binomial test is used when the sample size is small.

Did u know? Non-parametric tests are distribution-free tests.

Advantages

1. They are quick and easy to use.

2. When data are not very accurate, these tests produce fairly good results.

Disadvantage

Non-parametric tests involve a greater risk of accepting a false hypothesis and thus committing a Type 2 error.

12.6.1 One Sample Tests

The following are the main examples of one sample non-parametric tests:

Cox and Stuart Test

This test is used to examine the presence of trends. A set of numbers is said to show upward trend
if the latter numbers in the sequence are greater than the former numbers. And similarly, one
can define a downward trend. How to examine whether a trend is noticeable in a sequence?

Example: Suppose a marketer wants to examine whether its sales are showing a trend or
just fluctuating randomly. Suppose the company has gathered the monthly sales figures during
the past one year month-wise:

Month 1 2 3 4 5 6 7 8 9 10 11 12
Sales 200 250 280 300 320 278 349 268 240 318 220 380

From the given data, analyse the sales trend.
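The text leaves this analysis as an exercise. As a hedged sketch, the usual form of the Cox and Stuart test (an assumption here, since the unit does not spell out the procedure) pairs each observation in the first half of the series with the corresponding observation in the second half, counts the signs of the differences, and refers the count to a binomial distribution with p = 0.5.

```python
from math import comb

sales = [200, 250, 280, 300, 320, 278, 349, 268, 240, 318, 220, 380]

# Pair month i of the first half with month i of the second half
# (for an odd-length series the middle observation would be left out)
half = len(sales) // 2
differences = [later - earlier for earlier, later in zip(sales[:half], sales[-half:])]
differences = [d for d in differences if d != 0]  # ties carry no sign information

plus = sum(d > 0 for d in differences)    # pairs suggesting an upward trend
minus = len(differences) - plus           # pairs suggesting a downward trend

# Two-sided binomial p-value under H0: plus and minus signs are equally likely
m = len(differences)
extreme = max(plus, minus)
p_value = min(1.0, 2 * sum(comb(m, i) for i in range(extreme, m + 1)) / 2 ** m)
print(f"plus = {plus}, minus = {minus}, two-sided p-value = {p_value:.4f}")
```

A p-value well above the chosen significance level would mean the data do not show a noticeable trend under this test.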

Sign-test

Sign-test is used with matched pairs. The test is used to identify the pairs and decide whether the
pair has more or less similar characteristics.

Example: Suppose an experiment on the effect of brand name on quality perceptions is to be conducted. Ten persons are selected and asked to taste and compare two beverages. One of them is identified as a well-known branded beverage, and the other as a new beverage. In reality, the samples are identical. The respondents are asked to rate the two samples on an ordinal scale. Two hypotheses are set up as follows:

H0 - there is no difference between the perceived qualities of two beverages.

HA - there is a difference in the perceived qualities of two beverages.

12.6.2 Two Sample Tests

The following are the main examples of two sample non-parametric tests:

Mann Whitney "U" Test (Rank Sum Test)

This test is used to determine whether two independent samples have been drawn from the
same population. Suppose an experiment has obtained two sets of samples from two populations
and the study wishes to examine whether the two populations are identical.

Example: A computer company XYZ would like to compare the performance of programmers working in two branches located in different cities. The performance indices of the employees are:

Branch A    Branch B
   84          76
   68          77
   78          64
   49          62
   45          53

The aim is to find out whether there is any difference in the performance indices of employees of the two branches.
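The text does not work this example through. The sketch below uses the standard rank-sum formula U1 = n1·n2 + n1(n1 + 1)/2 – R1, which this unit does not state, so treat it as an assumption; the helper `rank_of` and the other names are illustrative. The resulting U would be compared with a table of critical values for n1 = n2 = 5.

```python
branch_a = [84, 68, 78, 49, 45]
branch_b = [76, 77, 64, 62, 53]

# Rank all observations together, smallest = rank 1
pooled = sorted(branch_a + branch_b)


def rank_of(value):
    """Rank of a value in the pooled sample; tied values share the average rank."""
    first_position = pooled.index(value) + 1
    ties = pooled.count(value)
    return first_position + (ties - 1) / 2


n1, n2 = len(branch_a), len(branch_b)
rank_sum_a = sum(rank_of(x) for x in branch_a)
rank_sum_b = sum(rank_of(x) for x in branch_b)

u_a = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_a
u_b = n1 * n2 - u_a
u_statistic = min(u_a, u_b)
print(f"R_A = {rank_sum_a}, R_B = {rank_sum_b}, U = {u_statistic}")
```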

Kolmogorov-Smirnov Test

This is used for examining the efficacy (goodness) of fit between an observed sample distribution and an expected frequency distribution of data when the variable is on an ordinal scale.

Example: A manufacturer of cosmetics wants to test four different shades of the liquid foundation compound: very light, light, medium and dark. The company has hired a market research agency to determine whether any distinct preference exists towards either extreme. If so, the company will manufacture only the preferred shade; otherwise, the company is planning to market all shades. Suppose, out of a sample of one hundred, 50 preferred the very light shade, 30 liked the light shade, 15 the medium shade and 5 the dark shade. Do you think the results show any kind of preference?

Since the shade represents ordering (rank), this test can be used to find the preference.

12.6.3 K Sample Test

The Mann Whitney test is used when two populations are involved; the Kruskal-Wallis test is used when more than two populations are involved. The Kruskal-Wallis test enables us to know whether independent samples have been drawn from the same population or from different populations having the same distribution. It is an extension of the Mann Whitney test.

It is a type of rank sum test, used to find out whether two or more independent samples are drawn from an identical population. It is also called the H test.

Example: In an assembling unit, three different workers do assembly work in shifts. The
data is tabulated as follows:

Shift No. Worker-1 Worker-2 Worker-3


1 25 28 29
2 31 28 30
3 35 29 27
4 33 28 36
5 35 32 31
6 31 32 34

Check whether there is any difference in the production quantum of the three workers:

Example: (Kruskal-Wallis Test, H-Test)

Let us assume that there are three categories of workers involved in building construction. The wages depend on the skills possessed by them and their availability. The daily wages of the three categories, namely painter, carpenter and plumber, are as follows:

Item    Sample 1: daily wages (painter)    Sample 2: daily wages (carpenter)    Sample 3: daily wages (plumber)
1                   64                                  72                                  51
2                   66                                  74                                  52
3                   72                                  75                                  54
4                   74                                  78                                  56
5                                                       80

Use H-test and state whether the three populations are same or different.

Solution:

H0 - The wages of the three occupations are the same.

H1 - The wages of the three occupations are not the same.

Item     Painter               Carpenter             Plumber
         Wage/day    Rank      Wage/day    Rank      Wage/day    Rank
1           64        5           72        7.5         51        1
2           66        6           74        9.5         52        2
3           72        7.5         75       11           54        3
4           74        9.5         78       12           56        4
5                                 80       13
Total      276      R1 = 28      379      R2 = 53      213      R3 = 10

$n_1 = 4$, $n_2 = 5$, $n_3 = 4$

$n = n_1 + n_2 + n_3 = 4 + 5 + 4 = 13$

$R_1 = 28$, $R_2 = 53$, $R_3 = 10$

$$H = \frac{12}{n(n + 1)} \sum_i \frac{R_i^2}{n_i} - 3(n + 1)$$

$$H = \frac{12}{13(13 + 1)} \left[\frac{28^2}{4} + \frac{53^2}{5} + \frac{10^2}{4}\right] - 3(13 + 1) = 9.61$$

At the 5% level of significance, for d.f. = (3 – 1) = 2, the table value is 5.991. The computed value 9.61 is greater.

Conclusion: Reject the null hypothesis. The wages of the three occupations are not the same.
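The ranking and the H statistic of this example can be reproduced with a short sketch. It applies the formula exactly as used in the solution, without a correction for ties; the helper `rank_of` and the variable names are illustrative.

```python
wages = {
    "painter": [64, 66, 72, 74],
    "carpenter": [72, 74, 75, 78, 80],
    "plumber": [51, 52, 54, 56],
}

# Rank all wages together, smallest = rank 1
pooled = sorted(w for group in wages.values() for w in group)


def rank_of(value):
    """Rank in the pooled sample; tied values share the average rank."""
    first_position = pooled.index(value) + 1
    ties = pooled.count(value)
    return first_position + (ties - 1) / 2


n = len(pooled)
rank_sums = {name: sum(rank_of(w) for w in group) for name, group in wages.items()}

h_statistic = (
    12 / (n * (n + 1))
    * sum(rank_sums[name] ** 2 / len(group) for name, group in wages.items())
    - 3 * (n + 1)
)
table_value = 5.991  # chi-square table value, 2 d.f., 5% level, as quoted above
reject_h0 = h_statistic > table_value
print(f"rank sums = {rank_sums}, H = {h_statistic:.2f}, reject H0: {reject_h0}")
```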

Self Assessment

Fill in the blanks:

13. ............................. Test is used to determine whether two independent samples have been
drawn from the same population.

14. ............................. Test is used for examining the efficacy of fit between observed samples
and expected frequency distribution.

15. Sign-test is used with ............................. pairs.

16. Non-parametric tests are used to test the hypothesis with ............................. and
............................. data.

17. ............................. Test is used to examine the presence of trends.

18. ............................. test involves the greater risk of accepting a false hypothesis.

12.7 Summary

• Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true.

• The usual process of hypothesis testing consists of four steps.

• Formulate the null hypothesis and the alternative hypothesis.

• Identify a test statistic that can be used to assess the truth of the null hypothesis.

• Compute the P-value, which is the probability that a test statistic at least as extreme as the one observed would be obtained assuming that the null hypothesis were true.

• The smaller the P-value, the stronger the evidence against the null hypothesis.

• Compare the P-value to an acceptable significance value α.

• If p ≤ α, the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis is valid.

12.8 Keywords

Alternate Hypothesis: An alternative hypothesis is one that specifies that the null hypothesis is
not true. The alternative hypothesis is false when the null hypothesis is true, and true when the
null hypothesis is false.

ANOVA: It is a statistical technique used to test the equality of three or more sample means.

Degree of Freedom: It is the consideration that tells the researcher the number of elements that
can be chosen freely.

Null Hypothesis: The null hypothesis is a hypothesis which the researcher tries to disprove,
reject or nullify.

Significance Level: Significance level is the criterion used for rejecting the null hypothesis.

12.9 Review Questions

1. What hypothesis, test and procedure would you use when an automobile company has
manufacturing facility at two different geographical locations? Each location manufactures
two-wheelers of a different model. The customer wants to know if the mileage given by
both the models is the same or not. Samples of 45 numbers may be taken for this purpose.

2. What hypothesis, test and procedure would you use when a company has 22 sales
executives? They underwent a training programme. The test must evaluate whether the
sales performance is unchanged or improved after the training programme.

3. What hypothesis, test and procedure would you use when a company has three categories of managers:

(a) With professional qualifications but without work experience.

(b) With professional qualifications accompanied by work experience.

(c) Without professional qualifications but with work experience.

4. Each person in a random sample of 50 was asked to state his/her sex and preferred colour. The resulting frequencies are shown below.

               Preferred colour
Sex          Red    Blue    Green
Male           5     14       6
Female        15      6       4

A chi-square test is used to test the null hypothesis that sex and preferred colour are independent. Will you reject the null hypothesis at the 0.005 level? Why/Why not?

5. Are all employees equally prone to having accidents? To investigate this hypothesis, Parry (1985) looked at a light manufacturing plant and classified the accidents by type and by age of the employee.

                 Accident Type
Age            Sprain    Burn    Cut
Under 25          9       17      5
25 or over       61       13     12

A chi-square test gave a test statistic of 20.78. If we test at α = .05, does the proportion of sprains, cuts and burns seem to be similar for both age classes? Why/why not?

6. In hypothesis testing, β is the probability of committing an error of Type II. Is the power of the test, 1 – β, then the probability of rejecting H0 when HA is true? Why?

7. In a statistical test of hypothesis, what would happen to the rejection region if α, the level of significance, is reduced?

8. During the pre-flight check, Pilot Mohan discovers a minor problem - a warning light
indicates that the fuel gauge may be broken. If Mohan decides to check the fuel level by
hand, it will delay the flight by 45 minutes. If he decides to ignore the warning, the aircraft
may run out of fuel before it gets to Mumbai. In this situation, what would be:

(a) the appropriate null hypothesis? and;

(b) a type I error?

9. Can the probability of a Type II error be controlled by the sample size? Why/ why not?

10. A research biologist has carried out an experiment on a random sample of 15 experimental
plots in a field. Following the collection of data, a test of significance was conducted under
appropriate null and alternative hypotheses and the P-value was determined to be
approximately .03. What does this indicate with respect to the hypothesis testing?

11. Two samples were drawn from a recent survey, each containing 500 hamlets. In the first
sample, the mean population per hamlet was found to be 100 with a S.D. of 20, while in the
second sample the mean population was 120 with a S.D. 15. Do you find the averages of the
samples to be statistically significant?

12. A simple random sample of size 100 has a mean of 15, the population variance being 25.
Find an interval estimate of the population mean with a confidence level of (i) 99% and
(ii) 95%.

13. A population consists of five numbers 2, 3, 6, 8, 11. Consider all possible samples of size
two which can be drawn with replacement from this population. Calculate the S.E. of
sample means.

14. A certain drug is claimed to be effective in curing colds. In an experiment on 160 persons with colds, half of them were given the drug and half of them were given sugar pills. The patients' reactions to the treatment are recorded in the following table.

               Helped    Harmed    No effect
Drug             52        10         18
Sugar pills      44        10         26

Test the hypothesis that the drug is no better than the sugar pills for curing colds. (The 5% value of χ² for ν = 2 is 5.991.)

15. A random sample of 640 persons from a village provided the following information:

Effect of Influenza    New drug administered    New drug not administered    Total
Attacked                       100                        60                  160
Not attacked                   200                       280                  480
Total                          300                       340                  640

Test whether the new drug was effective in preventing the attack of influenza.

Answers: Self Assessment

1. confirmatory data 2. significance level

3. alternate 4. Type 1

5. Type 2 6. bivariate

7. ratio 8. 2

9. independent 10. 50

11. ANOVA 12. quantitative

13. Mann Whitney “U” 14. Kolmogorov-Smirnov

15. matched 16. nominal, ordinal

17. Cox and Stuart 18. Non-parametric

12.10 Further Readings

Books
Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
R.S. Bhardwaj, Business Statistics, Excel Books, New Delhi, 2008.
S.N. Murthy and U. Bhojanna, Business Research Methods, Excel Books, 2007.



Research Methodology

Unit 13: Multivariate Analysis

CONTENTS
Objectives
Introduction
13.1 Multivariate Analysis
13.1.1 Multiple Regression
13.2 Discriminant Analysis
13.3 Conjoint Analysis
13.4 Factor Analysis
13.4.1 Principal Component Factor Analysis
13.4.2 Rotation in Factor Analysis
13.5 Cluster Analysis
13.6 Multidimensional Scaling (MDS)
13.7 Summary
13.8 Keywords
13.9 Review Questions
13.10 Further Readings

Objectives
After studying this unit, you will be able to:

• Explain the concept of multivariate analysis

• Classify the types of multivariate analysis

• Define Discriminant Analysis and Conjoint Analysis

• Discuss Factor Analysis and Cluster Analysis

• Describe Multidimensional Scaling (MDS)

Introduction

As the name indicates, multivariate analysis comprises a set of techniques dedicated to the
analysis of data sets with more than one variable. Several of these techniques were developed
only recently, in part because they require the computational capabilities of modern computers.
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which
involves observation and analysis of more than one statistical variable at a time. In design and
analysis, the technique is used to perform trade studies across multiple dimensions while taking
into account the effects of all variables on the responses of interest. Marketers sometimes come
across complex situations involving two or more variables. When exactly two variables are
involved, bivariate analysis is used; the chi-square test is an example of bivariate analysis.
When more than two variables are involved, multivariate analysis is needed.

13.1 Multivariate Analysis

In multivariate analysis, the number of variables to be handled is large.

Example: The demand for television sets may depend not only on price, but also on the
income of households, advertising expenditure incurred by TV manufacturer and other similar
factors. To solve this type of problem, multivariate analysis is required.

Classification

Multivariate analysis can be classified under the following heads:

1. Multiple regression

2. Discriminant analysis

3. Conjoint analysis

4. Factor analysis

5. Cluster analysis

6. Multidimensional scaling.

13.1.1 Multiple Regression

In the case of simple linear regression, one variable, say $X_1$, is affected by a linear function
of another variable $X_2$ (we shall use $X_1$ and $X_2$ instead of Y and X used earlier). However,
if $X_1$ is affected by a linear combination of more than one variable, the regression is termed a
multiple linear regression.
Let there be k variables $X_1, X_2, \ldots, X_k$, where one of these, say $X_j$, is affected by the
remaining k – 1 variables. We write the typical regression equation as

$$X_{jc} = a_{j.12\ldots(j-1)(j+1)\ldots k} + b_{j1.23\ldots(j-1)(j+1)\ldots k}\,X_1 + b_{j2.13\ldots(j-1)(j+1)\ldots k}\,X_2 + \cdots \qquad (j = 1, 2, \ldots, k)$$

Here $a_{j.12\ldots}$, $b_{j1.23\ldots}$, etc. are constants. The constant $a_{j.12\ldots}$ is interpreted as the
value of $X_j$ when $X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k$ are all equal to zero. Further,
$b_{j1.23\ldots(j-1)(j+1)\ldots k}$, $b_{j2.13\ldots(j-1)(j+1)\ldots k}$, etc. are the (k – 1) partial regression
coefficients of the regression of $X_j$ on $X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k$.
For simplicity, we shall consider three variables $X_1$, $X_2$ and $X_3$. The three possible regression
equations can be written as

$$X_{1c} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3 \qquad (1)$$
$$X_{2c} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3 \qquad (2)$$
$$X_{3c} = a_{3.12} + b_{31.2}X_1 + b_{32.1}X_2 \qquad (3)$$

Given n observations on $X_1$, $X_2$ and $X_3$, we want to find values of the constants of each
regression equation such that $\sum_{i=1}^{n} (X_{ij} - X_{ijc})^2$, j = 1, 2, 3, is minimised.

For convenience, we shall use regression equations expressed in terms of deviations of variables
from their respective means. Equation (1), on taking sums and dividing by n, can be written as

$$\frac{\sum X_{1c}}{n} = a_{1.23} + b_{12.3}\frac{\sum X_2}{n} + b_{13.2}\frac{\sum X_3}{n}$$

or

$$\bar{X}_1 = a_{1.23} + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 \qquad (4)$$

Note: $\bar{X}_1 = \bar{X}_{1c}$.

Subtracting (4) from (1), we have

$$X_{1c} - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3)$$

or

$$x_{1c} = b_{12.3}x_2 + b_{13.2}x_3 \qquad (5)$$

where $x_{1c} = X_{1c} - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$ and $x_3 = X_3 - \bar{X}_3$.

Similarly, we can write equations (2) and (3) as

$$x_{2c} = b_{21.3}x_1 + b_{23.1}x_3 \qquad (6)$$

and

$$x_{3c} = b_{31.2}x_1 + b_{32.1}x_2 \qquad (7)$$

respectively.

Notes The subscripts of a coefficient that precede the dot are termed primary subscripts,
while those appearing after it are termed secondary subscripts. The number of secondary
subscripts gives the order of the regression coefficient, e.g., b12.3 is a regression coefficient
of order one.

Least Square Estimates of Regression Coefficients

Let us first estimate the coefficients of regression equation (5). Given n observations on each of
the three variables $X_1$, $X_2$ and $X_3$, we have to find the values of the constants $b_{12.3}$ and
$b_{13.2}$ so that $\sum (x_1 - x_{1c})^2$ is minimised, where $x_1 = X_1 - \bar{X}_1$. Using the method of
least squares, the normal equations can be written as

$$\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3 \qquad (8)$$

$$\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2 \qquad (9)$$

Solving the above equations simultaneously, we get

$$b_{12.3} = \frac{(\sum x_1x_2)(\sum x_3^2) - (\sum x_1x_3)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \qquad (10)$$

$$b_{13.2} = \frac{(\sum x_1x_3)(\sum x_2^2) - (\sum x_1x_2)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2} \qquad (11)$$

Using equation (4), we can find $a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3$.

Notes
1. Various sums of squares and sums of products of deviations, used above, can be
computed using the formula

$$\sum x_p x_q = \sum X_p X_q - \frac{(\sum X_p)(\sum X_q)}{n}$$

For example, put p = 1 and q = 2 in the formula to obtain $\sum x_1x_2$, and put p = q = 2 to
obtain $\sum x_2^2$, etc.

2. The fact that a regression coefficient is independent of change of origin can also be
utilised to further simplify the computational work.
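3. The regression coefficients of equations (2) and (3) can be written by symmetry as
given below:

$$b_{21.3} = \frac{(\sum x_2x_1)(\sum x_3^2) - (\sum x_2x_3)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$

$$b_{23.1} = \frac{(\sum x_2x_3)(\sum x_1^2) - (\sum x_2x_1)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$$

$$b_{31.2} = \frac{(\sum x_3x_1)(\sum x_2^2) - (\sum x_3x_2)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$$

$$b_{32.1} = \frac{(\sum x_3x_2)(\sum x_1^2) - (\sum x_3x_1)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$$

Note that $b_{31.2}$ has the same numerator as $b_{13.2}$ but a different denominator, so the two
are not equal in general; the same holds for $b_{32.1}$ and $b_{23.1}$. The expressions for the
constant terms are $a_{2.13} = \bar{X}_2 - b_{21.3}\bar{X}_1 - b_{23.1}\bar{X}_3$ and
$a_{3.12} = \bar{X}_3 - b_{31.2}\bar{X}_1 - b_{32.1}\bar{X}_2$ respectively.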

Example: Fit a linear regression of rice yield (X1 quintals) on the use of fertiliser
(X2 kg per acre) and the amount of rainfall (X3 inches), from the following data:

X1   45   50   55   70   75   75   85
X2   25   35   45   55   65   75   85
X3   31   28   32   32   29   27   31

Estimate the yield when X2 = 60 and X3 = 25.


Solution:
Calculation Table

         X1     X2     X3    X1X2    X1X3    X2X3     X1²     X2²     X3²
         45     25     31    1125    1395     775    2025     625     961
         50     35     28    1750    1400     980    2500    1225     784
         55     45     32    2475    1760    1440    3025    2025    1024
         70     55     32    3850    2240    1760    4900    3025    1024
         75     65     29    4875    2175    1885    5625    4225     841
         75     75     27    5625    2025    2025    5625    5625     729
         85     85     31    7225    2635    2635    7225    7225     961
Total   455    385    210   26925   13630   11500   30925   23975    6324

From the above table we compute the following sums of products and sums of squares:

$$\sum x_1x_2 = \sum X_1X_2 - \frac{(\sum X_1)(\sum X_2)}{n} = 26925 - \frac{455 \times 385}{7} = 1900$$

$$\sum x_1x_3 = \sum X_1X_3 - \frac{(\sum X_1)(\sum X_3)}{n} = 13630 - \frac{455 \times 210}{7} = -20$$

$$\sum x_2x_3 = \sum X_2X_3 - \frac{(\sum X_2)(\sum X_3)}{n} = 11500 - \frac{385 \times 210}{7} = -50$$

$$\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 23975 - \frac{385^2}{7} = 2800$$

$$\sum x_3^2 = \sum X_3^2 - \frac{(\sum X_3)^2}{n} = 6324 - \frac{210^2}{7} = 24$$
Substituting these values in equations (10) and (11), we get

$$b_{12.3} = \frac{1900 \times 24 - (-20) \times (-50)}{2800 \times 24 - (-50)^2} = 0.689$$

$$b_{13.2} = \frac{(-20) \times 2800 - 1900 \times (-50)}{2800 \times 24 - (-50)^2} = 0.603$$

Also $\bar{X}_1 = \frac{455}{7} = 65$, $\bar{X}_2 = \frac{385}{7} = 55$ and $\bar{X}_3 = \frac{210}{7} = 30$.

Thus $a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3 = 65 - 0.689 \times 55 - 0.603 \times 30 = 9.015$

Therefore, the fitted regression of X1 on X2 and X3 is X1c = 9.015 + 0.689X2 + 0.603X3.

The estimate of yield (X1c) when X2 = 60 and X3 = 25 is
X1c = 9.015 + 0.689 × 60 + 0.603 × 25 = 65.43 quintals.
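
The arithmetic of this example can be checked with a short Python sketch. It applies the deviation formula for the sums of products and then equations (10) and (11) to the data given above; the helper names `dev_sum` and `predict` are illustrative and not part of the text.

```python
# Data from the rice-yield example above.
X1 = [45, 50, 55, 70, 75, 75, 85]   # yield (quintals)
X2 = [25, 35, 45, 55, 65, 75, 85]   # fertiliser (kg per acre)
X3 = [31, 28, 32, 32, 29, 27, 31]   # rainfall (inches)
n = len(X1)


def dev_sum(p, q):
    """Sum of products of deviations: sum(pq) - sum(p) * sum(q) / n."""
    return sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / n


s12, s13, s23 = dev_sum(X1, X2), dev_sum(X1, X3), dev_sum(X2, X3)
s22, s33 = dev_sum(X2, X2), dev_sum(X3, X3)

denom = s22 * s33 - s23 ** 2
b12_3 = (s12 * s33 - s13 * s23) / denom   # equation (10)
b13_2 = (s13 * s22 - s12 * s23) / denom   # equation (11)
a1_23 = sum(X1) / n - b12_3 * sum(X2) / n - b13_2 * sum(X3) / n


def predict(x2, x3):
    """Estimated yield X1c for given fertiliser and rainfall."""
    return a1_23 + b12_3 * x2 + b13_2 * x3


print(round(b12_3, 3), round(b13_2, 3))
print(round(predict(60, 25), 2))
```

Because the sketch carries the coefficients at full precision, its constant term comes out near 9.003 rather than the 9.015 obtained above from coefficients rounded to three decimals; the estimated yield still rounds to 65.43 quintals.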
Alternatively, to simplify the calculation work, we change the origin of the three variables as
U1 = X1 – 65, U2 = X2 – 55 and U3 = X3 – 30.

         U1     U2     U3    U1U2    U1U3    U2U3     U1²     U2²     U3²
        –20    –30      1     600     –20     –30     400     900       1
        –15    –20     –2     300      30      40     225     400       4
        –10    –10      2     100     –20     –20     100     100       4
          5      0      2       0      10       0      25       0       4
         10     10     –1     100     –10     –10     100     100       1
         10     20     –3     200     –30     –60     100     400       9
         20     30      1     600      20      30     400     900       1
Total     0      0      0    1900     –20     –50    1350    2800      24

Note: Since $\sum U_i = 0$, we have $\bar{U}_i = 0$ and therefore $u_i = U_i - \bar{U}_i = U_i$, i = 1, 2, 3.

Hence

$$b_{12.3} = \frac{1900 \times 24 - (-20)(-50)}{2800 \times 24 - (-50)^2} = 0.689$$

$$b_{13.2} = \frac{(-20) \times 2800 - 1900 \times (-50)}{2800 \times 24 - (-50)^2} = 0.603$$

Further, we have

$$\bar{X}_1 = \frac{\sum U_1}{n} + 65 = 65, \quad \bar{X}_2 = \frac{\sum U_2}{n} + 55 = 55 \quad \text{and} \quad \bar{X}_3 = \frac{\sum U_3}{n} + 30 = 30$$

Caution: The above method should be used when the means of all the variables are integers.

Alternative Method
The coefficients of the regression equation X1c = a1.23 + b12.3X2 + b13.2X3 can also be obtained by
simultaneously solving the following normal equations:

$$\sum X_1 = n\,a_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$$
$$\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$$
$$\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$$
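
As an illustration of the alternative method, the following Python sketch solves the three normal equations for the rice-yield data, using the column totals from the calculation table. The elimination routine `solve` is a generic helper written for this sketch, not a procedure prescribed by the text, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with exact fractions."""
    size = len(y)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(size):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot_row = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Column totals from the calculation table (n = 7 observations).
n = 7
coefficients = [
    [n,   385,   210],    # n,       sum X2,    sum X3
    [385, 23975, 11500],  # sum X2,  sum X2^2,  sum X2X3
    [210, 11500, 6324],   # sum X3,  sum X2X3,  sum X3^2
]
right_hand_side = [455, 26925, 13630]  # sum X1, sum X1X2, sum X1X3

a1_23, b12_3, b13_2 = solve(coefficients, right_hand_side)
print(float(a1_23), float(b12_3), float(b13_2))
```

Both routes give the same regression coefficients, 0.689 and 0.603 to three decimals, which is a useful check on hand calculations.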

Self Assessment

Fill in the blanks:

1. Regression coefficient is independent of change of .......................

2. In the case of ....................... regression, one variable is affected by a linear
combination of another variable.

3. ....................... analysis is based on the statistical principle of multivariate
statistics, which involves observation and analysis of more than one statistical variable
at a time.

13.2 Discriminant Analysis


In this analysis, two or more groups are compared. Ultimately, we need to find out
whether the groups differ from one another.

Example: Where discriminant analysis is used


1. Those who buy our brand and those who buy competitors’ brand.

2. Good, medium and poor salesmen.

3. Those who go to Food World to buy and those who buy in a Kirana shop.

4. Heavy user, medium user and light user of the product.

Suppose the groups mentioned above are to be compared on demographic and socio-economic
factors; discriminant analysis can then be used. One way of doing this is to calculate the
income, age and educational level of each group, so that the profile of each group can be
determined. Comparing the groups on one variable at a time would be informative, but it would
not indicate the relative importance of each variable in distinguishing the groups. This is
because several variables within a group will be correlated, which means that one variable is
not independent of the other.

If we are interested in segmenting the market using income and education, we would be
interested in the total effect of the two variables in combination, and not their effects
separately. Further, we would be interested in determining which of the variables is more
important or has a greater impact. To summarise, Discriminant Analysis can be used when we want
to consider the variables simultaneously, taking into account their interrelationships.

As in regression, the value of the dependent variable is calculated from the data on the
independent variables:
Z = b1x1 + b2x2 + b3x3 + ..............
where
Z = Discriminant score
bi = Discriminant weight for variable i
xi = Independent variable i
As can be seen above, each independent variable is multiplied by its corresponding weight.

This results in a single composite discriminant score for each individual. By taking the average
of the discriminant scores of the individuals within a certain group, we create a group mean.
This is known as the centroid. If the analysis involves two groups, there are two centroids. This
is very similar to multiple regression, except that different types of variables are involved.
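
The scoring step can be sketched in Python as follows. The weights, the two variables and the individuals are hypothetical values chosen only for illustration; in practice the weights are estimated by the discriminant analysis itself. The nearest-centroid rule at the end is one simple way of assigning a new individual to a group and is an assumption of this sketch, not a rule stated in the text.

```python
# Hypothetical discriminant weights b1, b2 for two independent variables.
weights = [0.8, -0.3]

# Hypothetical individuals (x1, x2) in each of two groups.
groups = {
    "buys our brand": [[5.0, 3.0], [6.0, 2.0], [7.0, 4.0]],
    "buys competitor's brand": [[2.0, 5.0], [3.0, 6.0], [1.0, 4.0]],
}


def discriminant_score(x):
    """Z = b1*x1 + b2*x2 + ... for one individual."""
    return sum(b * xi for b, xi in zip(weights, x))


# Centroid of a group = mean discriminant score of its members.
centroids = {
    name: sum(discriminant_score(x) for x in members) / len(members)
    for name, members in groups.items()
}


def nearest_group(x):
    """Assign an individual to the group whose centroid is closest (assumed rule)."""
    z = discriminant_score(x)
    return min(centroids, key=lambda name: abs(z - centroids[name]))


for name, centroid in centroids.items():
    print(name, round(centroid, 2))
print(nearest_group([4.0, 3.0]))
```

With two groups there are two centroids, and the gap between them indicates how well the composite score separates the groups.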

Application

A company manufacturing FMCG products introduces a sales contest among its marketing
executives to find out how many distributors can be roped in to handle the company's products.
Assume that this contest runs for three months. Each marketing executive is given a target for
the number of new distributors and the sales they can generate during the period. This target is
fixed on the basis of the executive's past sales, for which data are available in the company.
It is also announced that marketing executives who add 15 or more distributors will be given a
Maruti Omni van as a prize. Those who generate between 5 and 10 distributors will be given a
two-wheeler as the prize. Those who generate fewer than 5 distributors will get nothing. Now
assume that 5 marketing executives won a Maruti van and 4 won a two-wheeler.

The company now wants to find out which activities of the marketing executives made the
difference between winning a prize and not winning one. One can proceed in a number of ways.
The company could compare those who won the Maruti van against the others. Alternatively, the
company might compare those who won one of the two prizes against those who won nothing. It
might also compare each group against each of the other two.

Discriminant analysis will highlight the difference in activities performed by each group
members to get the prize. The activity might include:

1. More number of calls made to the distributors.

2. More personal visits to the distributors with advance appointments.

3. Use of better convincing skills.

Discriminant analysis answers the following questions:

1. Which variables discriminate between the groups? The number of groups could be two
or more; dealing with more than two groups is called Multiple Discriminant Analysis
(M.D.A.).

2. Can discriminating variables be chosen to forecast the group to which the brand/person/
place belongs?

3. Is it possible to estimate the size of different groups?

SPSS Commands for Discriminant Analysis

Input data has to be typed in an SPSS file.

1. Click on STATISTICS at the SPSS menu bar.

2. Click on CLASSIFY followed by DISCRIMINANT.

3. Dialogue box will appear. Select the GROUPING VARIABLE. This can be done by clicking
on the right arrow to transfer them from the variable list on the left to the grouping
variable box on the right.

4. Define the range of values by clicking on DEFINE RANGE. Enter Minimum and Maximum
value then click CONTINUE.

5. Select all the independent variables for discriminant analysis from the variable list by
clicking on the arrow that transfers them to the box on the right.

6. Click on STATISTICS in the lower part of the main dialogue box. This will open a smaller
dialogue box.

7. Click on CLASSIFY in the lower part of the main dialogue box, then select SUMMARY TABLE
under the heading DISPLAY in the small dialogue box that appears.

8. Click OK to get the discriminant analysis output.

Self Assessment

Fill in the blanks:

4. In discriminant analysis, ....................... groups are compared.

5. If the discriminant analysis involves two groups, there are ....................... centroids.

13.3 Conjoint Analysis

Conjoint analysis is concerned with the measurement of the joint effect of two or more attributes
that are important from the customers’ point of view. In a situation where the company would
like to know the most desirable attributes or their combination for a new product or service, the
use of conjoint analysis is most appropriate.

Example: An airline would like to know, which is the most desirable combination of
attributes to a frequent traveller: (a) Punctuality (b) Air fare (c) Quality of food served on the
flight and (d) Hospitality and empathy shown.

Conjoint Analysis is a multivariate technique that captures the exact levels of utility that an
individual customer places on various attributes of the product offering. It enables a direct
comparison of the utilities of different attribute levels.

Example: A comparison between the utility of a price level of 400 versus 500, a delivery
period of 1 week versus 2 weeks, or an after-sales response of 24 hours versus 48 hours.

Once we know the utility levels for each attribute (and at individual levels as well), we can
combine these to find the best combination of attributes that gives the customer the highest
utility, the second best combination that gives the second highest utility, and so on. This
information is then used to design a product or service offering.

Application

Conjoint Analysis is extremely versatile, and its range of applications spans virtually any
industry. New product or service design, including concepts at the pre-prototyping stage,
can specifically benefit from conjoint applications.
Some examples of other areas where this technique can be used are:
1. Designing an automobile loan or insurance plan in the insurance industry,
2. Designing a complex machine for business customers.

Process

Design attributes for a product are first identified. For a shirt manufacturer, these could be
design (designer shirts versus plain shirts), price (400 versus 800) and distribution (exclusive
versus mass distribution). All possible combinations of these attribute levels are then listed
out. Each combination is ranked by customers, and the rankings are used as input data for
Conjoint Analysis. Then the utility of the products relative to price can be measured.

The output is a part-worth or utility for each level of each attribute. For example, the designer
shirt may get a utility level of 5 and the plain shirt 7.5. Similarly, exclusive distribution may
have a part utility of 2, and mass distribution 5.8. We then put together the part utilities and
come up with a total utility for any product combination we want to offer, and compare that with
the maximum utility combination for this customer segment.
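
Putting the part utilities together can be sketched in Python as below. The sketch uses only the two attributes whose part utilities are quoted above (design and distribution); the price attribute is left out because no part utilities are given for it, and the function name `total_utility` is illustrative.

```python
from itertools import product

# Part utilities quoted in the text for the shirt example.
part_worths = {
    "design": {"designer": 5.0, "plain": 7.5},
    "distribution": {"exclusive": 2.0, "mass": 5.8},
}


def total_utility(combo):
    """Total utility of a product = sum of the part utilities of its levels."""
    return sum(part_worths[attr][level] for attr, level in combo.items())


# List every combination of attribute levels and rank them by total utility.
attributes = list(part_worths)
combos = [
    dict(zip(attributes, levels))
    for levels in product(*(part_worths[a] for a in attributes))
]
ranked = sorted(combos, key=total_utility, reverse=True)

for combo in ranked:
    print(combo, round(total_utility(combo), 1))
```

The first entry of `ranked` is the maximum utility combination against which any proposed offering can be compared.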

This process tells the marketer which attributes of the product or service to focus on in
the design.

If a retail store finds that the height of a shelf is an important attribute for selling at a particular
level, a well-designed shelf may result from this knowledge. Similarly, a designer of clocks will
benefit from knowing the utility attached by customers to the dial size, background colours, and
price range of the clocks.

Approach

From a discussion with the client, identify the design attributes to be studied and the levels at
which they can be offered. Then build a list of product concepts on offer. These product concepts
are then ranked by customers. Once this data is available, use Conjoint Analysis to derive the
part utilities of each attribute level. This is then used to predict the best product design for the
given customer segment. Use the SPSS Conjoint procedure to analyse the data.
There are three steps in conjoint analysis:
1. Identification of relevant products or service attributes.
2. Collection of data.
3. Estimation of worth for the attribute chosen.
For attribute selection, the market researcher can conduct interviews with the customers directly.

Example: Conjoint analysis for a laptop:


For a laptop, consider 3 attributes:

1. Weight (3 Kg or 5 Kg)

2. Battery life (2 hours or 4 hours)

3. Brand name (Lenovo or Dell)

SPSS Command for Conjoint Analysis

A data file is first created containing all possible attribute combinations.

1. Ask each respondent to rank all the combinations of attributes contained in the file.
This file is named DATA FILE 1. All the rankings should be entered in another file
called DATA FILE 2.

2. Two files, namely DATA FILE 1 and DATA FILE 2, have now been created.

3. A third file, called the SYNTAX file, is to be opened by using the FILE, OPEN command
followed by SYNTAX.

4. Type the following: conjoint plan = DATA FILE 1.SAV /DATA = DATA FILE 2.SAV /
SCORES = SCORE 1 to SCORE (number of rankings) /FACTOR VARI (DISCRETE) /PLOT ALL
(for example, the last score is SCORE 25 if 25 is the number of possible combinations of
attributes). Score is the term used for rankings. The number of scores will be equal to the
number of rankings. We should use the word RANK in the syntax instead of SCORES if rankings
are contained in the data file.

5. Click RUN from the menu of the syntax file that was created, then click ALL in the menu
which appears on the screen. If the syntax is correct, the output for conjoint will appear.

Task Rank order the following combinations of these characteristics:

1 = Most preferred, 8 = Least preferred

Combination               Rank
3 Kg, 2 hours, Lenovo       4
5 Kg, 4 hours, Dell         5
5 Kg, 2 hours, Lenovo       8
3 Kg, 4 hours, Lenovo       3
3 Kg, 2 hours, Dell         2
5 Kg, 4 hours, Lenovo       7
5 Kg, 2 hours, Dell         6
3 Kg, 4 hours, Dell         1

One combination (3 Kg, 4 hours, Dell) clearly dominates, and 5 Kg, 2 hours, Lenovo is the least
preferred.

Let us now take the average rank for each level:

3 Kg option:    (4 + 3 + 2 + 1)/4 = 2.5
5 Kg option:    (5 + 8 + 7 + 6)/4 = 6.5
4 hour option:  (5 + 3 + 7 + 1)/4 = 4
2 hour option:  (4 + 8 + 2 + 6)/4 = 5
Dell:           (5 + 6 + 1 + 2)/4 = 3.5
Lenovo:         (4 + 8 + 3 + 7)/4 = 5.5

Looking at the differences in average ranks, the most important characteristic to this
respondent is weight (difference = 4), followed by brand name (difference = 2) and battery
life (difference = 1).
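
The same averaging can be reproduced with a short Python sketch. It takes the eight rankings from the table above and treats the difference between the average ranks of an attribute's two levels as that attribute's importance; the variable names are illustrative.

```python
from collections import defaultdict

attribute_names = ("weight_kg", "battery_hours", "brand")

# (weight, battery life, brand, rank) for each combination in the table.
rankings = [
    (3, 2, "Lenovo", 4),
    (5, 4, "Dell", 5),
    (5, 2, "Lenovo", 8),
    (3, 4, "Lenovo", 3),
    (3, 2, "Dell", 2),
    (5, 4, "Lenovo", 7),
    (5, 2, "Dell", 6),
    (3, 4, "Dell", 1),
]

# Collect the ranks of all combinations that contain each attribute level.
ranks_by_level = defaultdict(list)
for *levels, rank in rankings:
    for attr, level in zip(attribute_names, levels):
        ranks_by_level[(attr, level)].append(rank)

average_rank = {key: sum(r) / len(r) for key, r in ranks_by_level.items()}

# Importance of an attribute = spread between its best and worst average rank.
importance = {}
for attr in attribute_names:
    values = [avg for (a, _), avg in average_rank.items() if a == attr]
    importance[attr] = max(values) - min(values)

for attr in sorted(importance, key=importance.get, reverse=True):
    print(attr, importance[attr])
```

This is only the simple average-rank reasoning used in the task; the SPSS conjoint procedure estimates part utilities by a more formal method.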

Self Assessment

Fill in the blanks:

6. ....................... analysis is concerned with the measurement of the joint effect of two or more
attributes.

7. For ....................... selection, the market researcher can conduct interview with the customers
directly.

8. The ....................... is a part-worth or utility for each level of each attribute.

13.4 Factor Analysis

The main purpose of Factor Analysis is to group a large set of variables into fewer factors.
Each factor accounts for one or more components, and each factor is a combination of many
variables. The two most commonly employed factor analysis procedures are:

1. Principal component analysis

2. Common factor analysis.

When the objective is to summarise information from a large set of variables into fewer factors,
principal component factor analysis is used. On the other hand, if the researcher wants to analyse
the components of the main factor, common factor analysis is used.

Example: Common factor – Inconvenience inside a car. The components may be:
1. Leg room

2. Seat arrangement

3. Entering the rear seat

4. Inadequate dickey space

5. Door locking mechanism.

13.4.1 Principal Component Factor Analysis

Purpose: To study customer feedback about a two-wheeler manufactured by a company.

Method: The MR manager prepares a questionnaire to study the customer feedback. The researcher
has identified six variables for this purpose. They are as follows:

1. Fuel efficiency (A)

2. Durability (Life) (B)

3. Comfort (C)

4. Spare parts availability (D)

5. Breakdown frequency (E)

6. Price (F)

The questionnaire may be administered to 5,000 respondents, and the opinions of the customers
are gathered. Let us allot points from 1 to 10 to each of the variables A to F, where 1 is the
lowest and 10 is the highest. Let us assume that application of factor analysis has led to
grouping the variables as follows:

A, B, D, E into Factor 1

F into Factor 2

C into Factor 3

Factor 1 can be termed the Technical factor;

Factor 2 can be termed the Price factor;

Factor 3 can be termed the Personal factor.

For future analysis, while conducting a study to obtain customers' opinions, the three factors
mentioned above would be sufficient. One basic purpose of using factor analysis is to reduce the
number of independent variables in the study. With too many independent variables, the
M.R. study will suffer from the following disadvantages:

1. Time for data collection is very high due to several independent variables.

2. Expenditure increases due to the time factor.

3. Computation time is more, resulting in delay.


4. There may be redundant independent variables.

Did u know? What is correspondence analysis?

Correspondence analysis is a descriptive/exploratory technique designed to analyze
simple two-way and multi-way tables containing some measure of correspondence between
the rows and columns.

The results provide information which is similar in nature to that produced by Factor Analysis
techniques, and they allow one to explore the structure of categorical variables included in the
table. The most common kind of table of this type is the two-way frequency cross-tabulation
table.

In a typical correspondence analysis, a cross-tabulation table of frequencies is first standardized,
so that the relative frequencies across all cells sum to 1.0. One way to state the goal of a typical
analysis is to represent the entries in the table of relative frequencies in terms of the distances
between individual rows and/or columns in a low-dimensional space.

Example: The following data show the drinking habits of different employees in an
organization:

                                             Drinking Habits
Employee Group                  (1) None   (2) Light   (3) Medium   (4) Heavy   Row Totals
(1) Senior Level Management         5          2            4            3           14
(2) Middle Level Management         4          2            5            9           20
(3) Junior Level Management        15         12           10            5           42
(4) Executives                     25         20           30           15           90
(5) Other Employees                30          5           10            5           50
Column Totals                      79         41           59           37          216

One may think of the 4 column values in each row of the table as coordinates in a 4-dimensional
space, and one could compute the (Euclidean) distances between the 5 row points in the
4-dimensional space. The distances between the points in the 4-dimensional space summarize all
information about the similarities between the rows in the table above. Now suppose one could
find a lower-dimensional space, in which to position the row points in a manner that retains all,
or almost all, of the information about the differences between the rows. You could then present
all information about the similarities between the rows (types of employees in this case) in a
simple 1, 2, or 3-dimensional graph. While this may not appear to be particularly useful for
small tables like the one shown above, one can easily imagine how the presentation and
interpretation of very large tables (e.g., differential preference for 10 consumer items among
100 groups of respondents in a consumer survey) could greatly benefit from the simplification
that can be achieved via correspondence analysis (e.g., represent the 10 consumer items in a
two-dimensional space).
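
The first two steps described above, standardizing the table and measuring distances between row points, can be sketched in Python using the drinking-habits data. The sketch stops at plain Euclidean distances in the 4-dimensional space, as in the description above; it does not carry out the reduction to a lower-dimensional space, and the short group labels are illustrative.

```python
from math import dist  # available from Python 3.8

groups = ["Senior", "Middle", "Junior", "Executives", "Other"]

# Columns: (1) None, (2) Light, (3) Medium, (4) Heavy.
counts = [
    [5, 2, 4, 3],
    [4, 2, 5, 9],
    [15, 12, 10, 5],
    [25, 20, 30, 15],
    [30, 5, 10, 5],
]

# Standardize so that the relative frequencies over all cells sum to 1.0.
grand_total = sum(sum(row) for row in counts)
relative = [[c / grand_total for c in row] for row in counts]

# Euclidean distance between every pair of row points.
distances = {
    (groups[i], groups[j]): dist(relative[i], relative[j])
    for i in range(len(groups))
    for j in range(i + 1, len(groups))
}

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 4))
```

Pairs of employee groups with small distances have similar rows in the table; a full correspondence analysis would then look for a low-dimensional picture that preserves these relationships as far as possible.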

13.4.2 Rotation in Factor Analysis

Rotation is the step in factor analysis that permits you to identify meaningful factor names or
descriptions, such as the Technical, Price and Personal factors named above.

Linear Functions of Predictors

To understand rotation, first consider a problem that does not involve factor analysis. Suppose
you want to predict the grades of college students (all in the same college) in many different
courses, from their scores on general "verbal" and "math" skill tests. To develop predictive
formulas, you have a body of past data consisting of the grades of several hundred previous
students in these courses, plus the scores of those students on the math and verbal tests. To
predict grades for present and future students, you might use these data from past students to fit
a series of two-variable multiple regressions, each regression predicting the grade in one course
from scores on the two skill tests.

Now suppose a co-worker suggests summing each student's verbal and math scores to
obtain a composite "academic skill" score I'll call AS, and taking the difference between each
student's verbal and math scores to obtain a second variable I'll call VMD (verbal-math difference).
The co-worker suggests running the same set of regressions to predict grades in individual
courses, except using AS and VMD as predictors in each regression, instead of the original verbal
and math scores. In this case, you would get exactly the same predictions of course grades
from these two families of regressions: one predicting grades in individual courses from verbal
and math scores, the other predicting the same grades from AS and VMD scores. In fact, you
would get the same predictions if you formed composites of 3 math + 5 verbal and 5 math + 3
verbal, and ran a series of two-variable multiple regressions predicting grades from these two
composites. These examples are all linear functions of the original verbal and math scores.

The key point is that if you have m predictor variables, and you replace the m original predictors
by m linear functions of those predictors, you generally neither gain nor lose any information;
you could, if you wish, use the scores on the linear functions to reconstruct the scores on the
original variables. But multiple regression uses whatever information you have in the optimum way
(as measured by the sum of squared errors in the current sample) to predict a new variable (e.g.
grades in a particular course). Since the linear functions contain the same information as the
original variables, you get the same predictions as before.
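
This equivalence can be checked numerically with a small Python sketch. The scores and grades for six students are hypothetical, and the two-predictor fit reuses the deviation-sum formulas (10) and (11) from the multiple regression section; `fit` and `predictions` are names chosen for this sketch.

```python
def fit(y, p, q):
    """Least-squares fit of y = a + b1*p + b2*q using sums of deviations."""
    n = len(y)

    def s(u, v):
        return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

    d = s(p, p) * s(q, q) - s(p, q) ** 2
    b1 = (s(y, p) * s(q, q) - s(y, q) * s(p, q)) / d
    b2 = (s(y, q) * s(p, p) - s(y, p) * s(p, q)) / d
    a = (sum(y) - b1 * sum(p) - b2 * sum(q)) / n
    return a, b1, b2


def predictions(y, p, q):
    """Fitted values of y from the two predictors p and q."""
    a, b1, b2 = fit(y, p, q)
    return [a + b1 * pi + b2 * qi for pi, qi in zip(p, q)]


# Hypothetical test scores and course grades for six past students.
verbal = [52, 61, 70, 45, 66, 58]
math_score = [60, 48, 72, 50, 55, 67]
grade = [2.9, 2.7, 3.8, 2.3, 3.1, 3.3]

# Linear functions of the original predictors.
AS = [v + m for v, m in zip(verbal, math_score)]    # academic skill
VMD = [v - m for v, m in zip(verbal, math_score)]   # verbal-math difference

original = predictions(grade, verbal, math_score)
recoded = predictions(grade, AS, VMD)
max_gap = max(abs(a - b) for a, b in zip(original, recoded))
print(max_gap)  # differs from zero only by floating-point rounding
```

The regression coefficients differ between the two fits, but the predicted grades agree, which is the point made above.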

Given that there are many ways to get exactly the same predictions, is there any advantage
to using one set of linear functions rather than another? Yes, there is: one set might be simpler
than another. One particular pair of linear functions may enable many of the course grades to be
predicted from just one variable (that is, one linear function) rather than from two. If we regard
regressions with fewer predictor variables as simpler, then we can ask this question: out of all
the possible pairs of predictor variables that would give the same predictions, which is simplest
to use, in the sense of minimizing the number of predictor variables needed in the typical
regression? The pair of predictor variables maximising some measure of simplicity could be said to
have simple structure. In this example involving grades, you might be able to predict grades in
some courses accurately from just a verbal test score, and predict grades in other courses
accurately from just a math score. If so, then you would have achieved a "simpler structure" in
your predictions than if you had used both tests for every prediction.

Simple Structure in Factor Analysis

The points of the preceding section apply when the predictor variables are factors. Think
of the m factors F as a set of independent or predictor variables, and think of the p observed
variables X as a set of dependent or criterion variables. Consider a set of p multiple regressions,
each predicting one of the variables from all m factors. The standardized coefficients in this set
of regressions form a p × m matrix called the factor loading matrix. If we replaced the original
factors by a set of linear functions of those factors, we would get exactly the same predictions
as before, but the factor loading matrix would be different. So we can ask which, of the many
possible sets of linear functions we might use, produces the simplest factor loading matrix.
Specifically, we will define simplicity as the number of zero or near-zero entries in the factor
loading matrix: the more zeros, the simpler the structure. Rotation does not alter matrix C or
U at all, but does transform the factor loading matrix.
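
A small Python sketch can make this concrete for m = 2 uncorrelated factors. The loading matrix, the 45-degree angle and the 0.15 cut-off for "near-zero" are all hypothetical choices made for illustration. The sketch rotates the loadings orthogonally and checks two things: the sums of products of loadings between variables (the matrix obtained by multiplying the loading matrix by its transpose) are unchanged, while the number of near-zero loadings goes up.

```python
from math import cos, sin, radians

# Hypothetical loadings of p = 4 variables on m = 2 uncorrelated factors.
loadings = [[0.7, 0.5], [0.6, 0.6], [0.6, -0.5], [0.7, -0.4]]


def rotate(matrix, degrees):
    """Orthogonal rotation of a two-factor loading matrix by the given angle."""
    t = radians(degrees)
    c, s = cos(t), sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in matrix]


def cross_products(matrix):
    """Loading matrix times its transpose (one entry per pair of variables)."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in matrix] for ri in matrix]


def near_zero_count(matrix, cutoff=0.15):
    """Simplicity measure: number of loadings with absolute value below cutoff."""
    return sum(abs(v) < cutoff for row in matrix for v in row)


rotated = rotate(loadings, 45)

before, after = cross_products(loadings), cross_products(rotated)
largest_change = max(
    abs(x - y) for rb, ra in zip(before, after) for x, y in zip(rb, ra)
)

print(largest_change)  # zero apart from floating-point rounding
print(near_zero_count(loadings), near_zero_count(rotated))
```

In this illustration the rotated matrix is the simpler one, since most variables now load mainly on a single factor; rotation methods used in practice search for such an angle automatically rather than fixing it in advance.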

In the extreme case of simple structure, each X-variable will have only one large entry, so that
all the others can be ignored. But that would be a simpler structure than you would usually
expect to achieve; after all, in the real world each variable isn’t in general affected by only one
other variable. You then name the factors subjectively, based on an examination of their loadings.

In common factor analysis the procedure of rotation is in fact somewhat more abstract than I
have implied here, since you don’t actually know the individual scores of cases on factors.
However, the statistics for a multiple regression that are most relevant here (the multiple
correlation and the standardized regression slopes) can all be calculated just from the correlations
of the variables and factors involved. So we can base the calculations for rotation to simple
structure on just those correlations, without using any individual scores.

A rotation which requires the factors to remain uncorrelated is an orthogonal rotation, while
others are oblique rotations. Oblique rotations often achieve greater simple structure, though
at the cost that you also have to consider the matrix of factor intercorrelations when interpreting
results. Manuals are usually clear which is which, but if there is ever any ambiguity, a simple
rule is that if there is any capability to print out a matrix of factor correlations, then the rotation
is oblique, as no such capacity is needed for orthogonal rotations.
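
The effect of an orthogonal rotation on a loading matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loadings are made-up values for four variables on two factors, the rotation angle of 45 degrees is chosen by hand rather than by a rotation criterion, and the helper names (rotate_loadings, count_near_zero, communalities) are hypothetical. The sketch shows that the rotated matrix has more near-zero entries, while each variable's sum of squared loadings is unchanged.

```python
import math

def rotate_loadings(loadings, angle_deg):
    """Apply an orthogonal (rigid) rotation to a two-factor loading matrix."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Each row (a, b) is multiplied by the rotation matrix [[c, -s], [s, c]].
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def count_near_zero(loadings, tol=0.1):
    """Number of loadings whose absolute value is below tol."""
    return sum(1 for row in loadings for value in row if abs(value) < tol)

def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(value * value for value in row) for row in loadings]

# Hypothetical unrotated loadings: every variable loads on both factors.
unrotated = [(0.6, 0.6), (0.5, 0.5), (0.6, -0.6), (0.5, -0.5)]
rotated = rotate_loadings(unrotated, 45)

print("near-zero entries before:", count_near_zero(unrotated))
print("near-zero entries after: ", count_near_zero(rotated))
print("communalities before:", communalities(unrotated))
print("communalities after: ", communalities(rotated))
```

In practice the angle is not picked by hand; statistical packages search for the rotation that best satisfies a chosen simplicity criterion.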

Self Assessment

Fill in the blanks:

9. When the objective is to summarise information from a large set of variables into fewer
factors, ....................... analysis is used.

10. Correspondence analysis is a ....................... technique.

11. In a typical correspondence analysis, a cross-tabulation table of frequencies is first
.......................

13.5 Cluster Analysis

Cluster Analysis is used:

1. To classify persons or objects into a small number of clusters or groups.
2. To identify specific customer segments for the company’s brand.
Cluster Analysis is a technique used for classifying objects into groups. It can be used to sort
data (a number of people, companies, cities, brands or any other objects) into homogeneous
groups based on their characteristics.
The result of Cluster Analysis is a grouping of the data into groups called clusters. The researcher
can analyse the clusters for their characteristics and give the clusters names based on these.
Where can Cluster Analysis be applied?
The main marketing application of cluster analysis is in customer segmentation and estimation of
segment sizes. Industries where this technique is useful include automobiles, retail stores,
insurance, B-to-B, durables and packaged goods. Some of the well-known frameworks in consumer
behaviour (like VALS) are based on cluster analysis of values.
Cluster Analysis is applicable when:
1. An FMCG company wants to map the profile of its target audience in terms of life-style,
attitude and perceptions.
2. A consumer durable company wants to know the features and services a consumer takes
into account, when purchasing through catalogues.
3. A housing finance corporation wants to identify and cluster the basic characteristics, life-
styles and mindset of persons who would be availing housing loans. Clustering can be
done based on parameters such as interest rates, documentation, processing fee, number
of installments etc.

Process

There are two ways in which Cluster Analysis can be carried out:

1. First, objects/respondents are segmented into a pre-decided number of clusters. In this
case, the non-hierarchical method can be used, which partitions data into the
specified number of clusters.

2. The second method is called the hierarchical method. It builds a hierarchy of nested
clusters, so the number of clusters need not be fixed in advance.

The above two are basic approaches used in cluster analysis. This can be used to segment
customer groups for a brand or product category, or to segment retail stores into similar groups
based on selected variables.
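
The non-hierarchical approach can be sketched as a small k-means routine. This is a minimal illustration, not the procedure of any particular package: the six data points are made up, the first k points are used as starting centres purely to keep the example deterministic, and the function name k_means is hypothetical.

```python
import math
from collections import Counter

def k_means(points, k, n_iter=10):
    """Partition points into k clusters by repeatedly reassigning and re-averaging."""
    # Simplifying assumption: start from the first k points as cluster centres.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centre (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each centre to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical respondents measured on two interval-scaled variables.
respondents = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centers = k_means(respondents, k=2)
segment_sizes = Counter(labels)

print("cluster label per respondent:", labels)
print("cluster centres:", centers)
print("segment sizes:", dict(segment_sizes))
```

The count of objects per cluster at the end is what gives an estimate of segment size.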

Interpretation of Results

Ideally, the variables should be measured on an interval or ratio scale. This is because the
clustering techniques use the distance measure to find the closest objects to group into a cluster.
An example of its use can be clustering of towns similar to each other which will help decide
where to locate new retail stores.

If clusters of customers are found based on their attitudes towards new products and interest in
different kinds of activities, an estimate of the segment size for each segment of the population
can be obtained, by looking at the number of objects in each cluster.

Marketing strategies for each segment are fine-tuned based on the segment characteristics. For
instance, a segment of customers who like sports cars may get a special promotional offer during
a specific period.

Example: In cluster analysis, the following five steps are used:

1. Selection of the sample to be clustered (buyers, products, employees).

2. Definition of the variables on which the measurements are to be made (e.g., product attributes,
buyer characteristics, employees’ qualifications).

3. Computing the similarities among the entities.

4. Arranging the clusters in a hierarchy.

5. Cluster comparison and validation.

Did u know? Names can also be given to clusters to describe each one. For example, there
can be a cluster called “neo-rich”. Segments are prioritised based on their estimated size.

Cluster Analysis on Three Dimensions

The example below shows Cluster Analysis based on three dimensions: age, income and family
size. Cluster Analysis is used to segment the car-buying population in a Metro. For example, Cluster “A”
might represent potential buyers of low-end cars such as the Maruti 800 (for the common man).
These are people who are graduating from the two-wheeler market segment. Cluster “B” may
represent the mid-population segment buying Zen, Santro, Alto etc. Cluster “C” represents car
buyers who belong to the upper strata of society, i.e., buyers of Lancer, Honda City etc. Cluster “D”
represents the super-rich cluster, i.e., buyers of Benz, BMW, etc.

Figure 13.1: Matching Measure (three-dimensional plot with axes Income, Age and Family size, showing clusters A and B)

Example: Suppose there are seven attributes, 1 to 7, on which we are judging two objects, brands A and
B. The existence of an attribute may be indicated by 1 and its absence by 0. In this way, two
objects are viewed as similar if they share common attributes.

Attribute    1  2  3  4  5  6  7
Brand - A    1  0  0  1  0  0  1
Brand - B    0  0  1  1  1  0  0

One measure of simple matching S is given by:

$$S = \frac{a + d}{a + b + c + d}$$

Where
a = No. of attributes possessed by both brands A and B
b = No. of attributes possessed by brand A but not by brand B
c = No. of attributes possessed by brand B but not by brand A
d = No. of attributes possessed by neither brand.

Substituting, we get

$$S = \frac{1 + 2}{1 + 2 + 2 + 2} = \frac{3}{7} \approx 0.43$$

A and B are associated to the extent of 43%.

It is now clear that object A possesses attributes 1, 4, and 7 while object B possesses attributes 3,
4 and 5. A glance at the above table will indicate that objects A and B are similar in respect of 2
(0 & 0), 6 (0 & 0) and 4 (1 & 1). In respect of other attributes, there is no similarity between A and
B. Now we can arrive at a simple matching measure by (a) counting up the total number of
matches, either 0 & 0 or 1 & 1, and (b) dividing this number by the total number of attributes.

Symbolically S_AB = M/N

S_AB = Similarity between A and B

M = Number of attributes on which A and B match (both 0 or both 1)

N = Total number of attributes

S_AB = 3/7 = 0.43

i.e., A & B are similar to the extent of 43%.
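
The same calculation can be checked with a short Python sketch. The function name simple_matching is only illustrative; the two lists reproduce the attribute table for brands A and B.

```python
def simple_matching(x, y):
    """Simple matching coefficient S = (a + d) / (a + b + c + d) for 0/1 profiles."""
    if len(x) != len(y):
        raise ValueError("both objects must be rated on the same attributes")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # possessed by both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # first object only
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # second object only
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # possessed by neither
    return (a + d) / (a + b + c + d)

brand_a = [1, 0, 0, 1, 0, 0, 1]
brand_b = [0, 0, 1, 1, 1, 0, 0]

similarity = simple_matching(brand_a, brand_b)
print(round(similarity, 2))  # 0.43
```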

SPSS Command for Cluster Analysis

Stage 1

Enter the input data along with variable and value labels in an SPSS file.

1. Click on STATISTICS at the SPSS menu bar.

2. Click on CLASSIFY followed by HIERARCHICAL CLUSTER.

3. A dialogue box will appear. Select all the variables which are required to be used in cluster
analysis. This can be done by clicking on the right arrow to transfer them from the variable
list on the left.

4. Click on METHOD. The dialogue box will open. Choose "Between Groups Linkage" as the
CLUSTER METHOD.

5. Click CONTINUE to return to main dialogue box.

6. Click STATISTICS on the main dialogue box. Choose "Agglomeration schedule" so that it
will appear in the final output, then click CONTINUE.

7. Choose DENDROGRAM; then, in the box called ICICLE, choose "All Clusters" and "Vertical".

8. Click OK on the main dialogue box to get the output of the hierarchical cluster analysis.

Stage 2

This stage assigns the objects to the number of clusters identified in Stage 1. This stage is called K-MEANS
CLUSTERING.

1. Click CLASSIFY, followed by K-MEANS CLUSTER.

2. Fill in the desired number of clusters that has been identified from stage 1.

3. Click OPTIONS on the main dialogue box. Select "Initial Cluster Centers". Then click
CONTINUE to return to the main dialogue box.

4. Click OK on the main dialogue box to get the output which has final clusters.

Self Assessment

Fill in the blanks:

12. ....................... Analysis is a technique used for classifying objects into groups.

13. The ....................... application of cluster analysis is in customer segmentation and estimation
of segment sizes.

13.6 Multidimensional Scaling (MDS)

In addition to fulfilling the goals of detecting underlying structure and data reduction that it
shares with other methods, multidimensional scaling (MDS) provides the researcher with a
spatial representation of data that can facilitate interpretation and reveal relationships. Therefore,
we can define MDS as “a set of multivariate statistical methods for estimating the parameters in
and assessing the fit of various spatial distance models for proximity data.”

The spatial display of data provided by MDS is why it is also sometimes referred to as perceptual
mapping. MDS has much more flexibility about the types of data that can be used to generate the
solution. Almost any measures of similarity and dissimilarity can be used, depending on what
your statistical computer software will accept.

Types of MDS

In general, there are two types of MDS:

1. Metric

2. Non-metric

Metric MDS makes the assumption that the input data is either ratio or interval data, while the
non-metric model requires simply that the data be in the form of ranks. Therefore, the
non-metric model has fewer restrictions than the metric model, but also less rigor. One technique
to use if you are unsure whether your data is ordinal or can be considered interval is to try both
metric and non-metric models. If the results are very close, the metric model may be used.

An advantage of the non-metric models is that they permit the researcher to categorize and
examine preference data, such as the kind obtained in marketing studies or other areas where
comparisons are useful.

Another technique, correspondence analysis, can work with categorical data, i.e., data at the
nominal level of measurement; however, that technique will not be described here.

Notes Similarities and Differences between Factor Analysis and MDS


We have already seen that MDS can accept a wider range of measures of similarity and
dissimilarity than factor analysis techniques can. In addition, there are some differences in
terminology. These differences reflect the origin of MDS in the field of psychology. The
measures corresponding to factors are called either dimensions or stimulus
coordinates.
The output of MDS looks very similar to that of factor analysis and the determination of
the optimal number of dimensions is handled in much the same way.

Steps in using MDS

There are four basic steps in MDS:

1. Data collection and formation of the similarity/dissimilarity matrix

2. Extraction of stimulus coordinates


3. Decision about the number of stimulus coordinates that represent the data

4. Rotation and interpretation

Example: Let us say that you have a matrix of distances between a number of major cities,
such as you might find on the back of a road map. These distances can be used as the input data
to derive an MDS solution. When the results are mapped in two dimensions, the solution will
reproduce a conventional map, except that the MDS plot might need to be rotated so that the
north-south and east-west dimensions conform to expectations. However, once the rotation
is completed, the configuration of the cities will be spatially correct.
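
The reason the plot may need rotating is that a distance matrix carries no information about orientation: rotating a configuration leaves every inter-point distance unchanged, so any rotated version fits the input data equally well. The sketch below illustrates this with four made-up "city" positions; the coordinates, the 30-degree angle and the helper names distance_matrix and rotate are assumptions for illustration only, and the code does not itself perform MDS.

```python
import math

def distance_matrix(coords):
    """Form the dissimilarity matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate(coords, angle_deg):
    """Rotate a two-dimensional configuration about the origin."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in coords]

# Hypothetical city positions on a flat map.
cities = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]

original = distance_matrix(cities)
after_rotation = distance_matrix(rotate(cities, 30))

# Every pairwise distance is the same before and after rotation.
unchanged = all(
    math.isclose(original[i][j], after_rotation[i][j], abs_tol=1e-9)
    for i in range(len(cities))
    for j in range(len(cities))
)
print("distances preserved under rotation:", unchanged)
```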

Self Assessment

Fill in the blanks:

14. An advantage of the non-metric models is that they permit the researcher to .......................
and ....................... preference data.

15. The spatial display of data provided by MDS is also sometimes referred to as ………………..

13.7 Summary

• Multivariate analysis is used when there are more than two variables.

• Some of the multivariate techniques are discriminant analysis, factor analysis, cluster
analysis, conjoint analysis, and multidimensional scaling.

• In discriminant analysis, it is verified whether the groups differ from one another.

• Factor analysis is used to reduce a large number of variables into fewer factors. Cluster
analysis is used to segment the market or to identify the target group.

• Regression is a term used for predicting the value of one variable from the other.

• The least squares method is used to fit the regression line.

• MDS is a set of multivariate statistical methods for estimating the parameters in and
assessing the fit of various spatial distance models for proximity data.

• The output of MDS looks very similar to that of factor analysis and the determination of
the optimal number of dimensions is handled in much the same way.

13.8 Keywords

Cluster Analysis: Cluster Analysis is a technique used for classifying objects into groups.

Conjoint Analysis: Conjoint analysis is concerned with the measurement of the joint effect of
two or more attributes that are important from the customers’ point of view.

Discriminant Analysis: In this analysis, two or more groups are compared. In the final analysis,
we need to find out whether the groups differ one from another.

Factor Analysis: Factor Analysis is the analysis whose main purpose is to group a large set of
variables into fewer factors.

Multivariate Analysis: In multivariate analysis, the number of variables to be tackled is
large.

13.9 Review Questions

1. Which technique would you use to measure the joint effect of various attributes while
designing an automobile loan and why?

2. Do you think that the conjoint analysis will be useful in any manner for an airline? If yes
how, if no, give an example where you think the technique is of immense help.

3. In your opinion, what are the main advantages of cluster analysis?

4. Which analysis would you use in a situation when the objective is to summarise information
from a large set of variables into fewer factors? What will be the steps you would follow?

5. Which analysis would answer if it is possible to estimate the size of different groups?

6. Which analysis would you use to compare a good, bad and a mediocre doctor and why?

7. Analyse the weaknesses of principal component factor analysis.

8. Which multivariate analysis would you apply to identify specific customer segment for a
company’s brand and why?

9. Critically evaluate multidimensional scaling.


10. In your opinion what will be the disadvantages of having too many independent variables
in an MR study?

11. Consider the following loadings of intelligence measures of management students on a factor F:

Variable                   Load

CGPA                       0.60 x F

Problem Solving Skills     0.75 x F

Communication Skills       0.85 x F

This table tells us that the communication skills score loads most highly on the intelligence factor of
management students, followed by problem solving skills and CGPA. These loads or
weights are correlations, e.g., the correlation between communication skills and the factor.
But here we have only three variables and only one factor. In real life we may have many
variables and more factors. Whatever may be the case, the basic ideas remain the same.

Suppose we want to recruit management trainees from the campus and as a selection
process, we need to consider the following variables.

X1 = CGPA

X2 = Problem Solving Skills

X3 = Communication Skills

X4 = Knowledge Test Score

X5 = GD Score

X6 = Personal Interview Score

12. People have been rated on their suitability for an advanced training course in computer
programming on the basis of six ratings given by their manager (rated 1=low to 20=high):

(a) Intellect
(b) Interest in doing the course

(c) Experience of computer programming

(d) Likelihood of them staying with the company

(e) Commitment to the company

(f) Loyalty to their team

and two other ratings:

(g) Number of GCSEs

(h) Score on a computer programming aptitude test

The training department believes that these are really measuring only three things: intellect,
computer programming experience and loyalty, and want you to carry out a factor analysis
to explore that hypothesis. Describe the decisions you would have to make in carrying out
a factor analysis and what the results would be likely to tell you.

13. Six observations on two variables are available, as shown in the following table:

Obs. X1 X2
a 3 2
b 4 1
c 2 5
d 5 2
e 1 6
f 4 2

(a) Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?

(b) Apply the nearest neighbor method and the squared Euclidean distance as a measure
of dissimilarity. Use a dendrogram to arrive at the number of groups and their
membership.

14. Six observations on two variables are available, as shown in the following table:

Obs. X1 X2
a -1 -2
b 0 0
c 2 2
d -2 -2
e 1 -1
f 1 2

(a) Plot the observations in a scatter diagram. How many groups would you say there
are, and what are their members?

(b) Apply the nearest neighbor method and the Euclidean distance as a measure of
dissimilarity.

Answers: Self Assessment

1. origin                            2. simple linear
3. Multivariate                      4. two or more
5. two                               6. Conjoint
7. attributes                        8. output
9. principal component factor        10. descriptive/exploratory
11. standardized                     12. Cluster
13. marketing                        14. categorize, examine
15. perceptual mapping.

13.10 Further Readings

Books

A Parasuraman, Dhruv Grewal, Marketing Research, Biztantra.
Cisnal Peter, Marketing Research, MCGE.
Hague & Morgan, Marketing Research in Practice, Kogan page.
Paneerselvam, R, Research Methods, PHI.
Tull and Donalds, Marketing Research, MMIL.

Unit 14: Report Writing

CONTENTS
Objectives
Introduction
14.1 Characteristics of Research Report
14.1.1 Substantive Characteristics
14.1.2 Semantic Characteristics
14.2 Significance of Report Writing
14.3 Techniques and Precautions of Interpretation
14.3.1 Basic Analysis of "Quantitative" Information
14.3.2 Basic Analysis of "Qualitative" Information
14.3.3 Interpreting Information
14.3.4 Precautions
14.4 Types of Report
14.4.1 Oral Report
14.4.2 Written Report
14.4.3 Distinguish between Oral and Written Report
14.5 Preparation of Research Report
14.5.1 How to Write a Bibliography?
14.6 Style, Layout and Precautions of the Report writing
14.6.1 Style of Report Writing
14.6.2 Layout of the Report
14.6.3 Precautions in Report Writing
14.7 Summary
14.8 Keywords
14.9 Review Questions
14.10 Further Readings

Objectives
After studying this unit, you will be able to:

• Explain the meaning and characteristics of a research report

• Recognize the significance of report writing

• Describe the techniques and precautions of interpretation

• Discuss the layout of a report

• Categorize different types of report

Introduction

A report is a very formal document that is written for a variety of purposes, generally in the
sciences, social sciences, engineering and business disciplines. Generally, findings pertaining to
a given or specific task are written up into a report. It should be noted that reports are considered
to be legal documents in the workplace and, thus, they need to be precise, accurate and difficult
to misinterpret.

There are three features that, together, characterize report writing at a very basic level: a predefined
structure, independent sections, and unbiased conclusions.

• Predefined structure: A report is organised under headings that indicate its sections, such
as an introduction, discussion, and conclusion.

• Independent sections: Each section in a report is typically written as a stand-alone piece, so
the reader can selectively identify the report sections they are interested in, rather than
reading the whole report through in one go from start to finish.

• Unbiased conclusions: A third element of report writing is that it is an unbiased and
objective form of writing.

14.1 Characteristics of Research Report

Certain characteristics are an integral part of a good report. There is no hard and fast rule for preparing
a research report. The research report will differ based on the need of the particular managers
using the report. The report also depends on the philosophy of the researcher.

Example: A report prepared for a government agency will be different from the one prepared
for a private organization.
Although a marketing report is influenced by the researcher, there are certain
characteristics which the report should possess if it is to communicate effectively. These
characteristics can be classified as:

i. Substantive characteristics

ii. Semantic characteristics.

14.1.1 Substantive Characteristics

Substantive characteristics are:

• Accuracy

• Currency

• Sufficiency

• Availability

• Relevancy

The more that the report possesses the above characteristics, the greater is its practical value in
decision making.

Accuracy: Accuracy refers to the degree to which information reflects reality. Specifically, a research
report must accurately present both the research procedure and the research results. Even if the research
results are not as per the expectation of the management, the researcher has the professional
obligation to present the findings accurately and objectively. A less accurate report is an injustice
to the management.

Currency: Currency refers to the time span between completion of the research project and
presentation of the research report to management. If the management receives the research
report so late that the results are no longer valid due to environmental changes, the report
will have little or no value for decision making. Currency is one of the reasons for orally or
informally communicating preliminary research results to management to ensure timely decision
making.

Sufficiency: The research report must have sufficient details, so that important and valid decisions
can be made. Sometimes the sample size or sample representativeness may act as a constraint, so that
sufficient details are not available.

Example: The management requires segment-wise market data, whereas only overall
market data is available.

Notes A research report must document methodology and techniques used so that an
assessment can be made regarding validity, reliability and generalizability. Therefore,
sufficiency refers to whether enough information is present in the research report to
enable the manager to take a valid decision.
It should be remembered that sufficiency characteristic does not mean that all possible research
project information must be incorporated in the research report. A researcher should include in
a report only that information, which is necessary to convey complete perspective of the research
project.

Availability: The fourth important characteristic of a research report is that it is available to the
appropriate decision maker when they need it. Availability refers to the communication process
between researcher and the decision maker. We use the phrase 'appropriate decision maker' to
emphasize the question of who should or should not have access to the report. This decision is
made by the management, and it is the duty of the researcher to carry out this decision. Most
reports carry confidential information. Therefore, it is necessary to restrict the report's availability,
both to individuals within the organization and outside it, to prevent competitors from having
access to it.

Relevancy: The research report should be confined to the decision issue researched. Sometimes
the researcher might include some information, which he thinks is interesting, but may not
have any relevance. This type of information should be excluded from the report.

Example: A researcher may be preparing a report on the audience perception of RJs
(Radio Jockeys). This may be done with a view to recruiting them based on that perception. Suppose
a lengthy commentary on the relative audience appeal of each radio station is also included.
This type of data may be readily available from a research agency that sells commercial
data. Therefore, including this type of material may not be necessary.

14.1.2 Semantic Characteristics

Semantic characteristics are equally important in report. The report should be grammatically
correct. It should be free from spelling and typing errors. This will ensure that there is no
ambiguity or misunderstanding. Assistance of a proof reader, other than the researcher would
be required to eliminate the above errors.

i. Creative expressions in the form of superlatives and similes should be avoided.

ii. The report should be concise.

iii. Jargon of any kind should be avoided.

iv. Common words with multiple meanings should be avoided.

v. Language of the report must be simple. For example, sentences like "illumination must be
extinguished when premises are not in use" can be expressed in simple words, say "switch
off the lights when you leave".

vi. Avoid using 'I' or 'we'. The report should be impersonal.

vii. Sometimes, the current research uses the data of research conducted in the past. In this case
it is better to use the past tense than the present tense.

The following are the hindrances for clarity of any research report.

• Ambiguity

• Jargon

• Misspelled words

• Excessive prediction

• Improper punctuation

• Unfamiliar words

• Clerical error

Some of the illustrations that can cause inaccuracy in report writing are given below:

• Addition/subtraction error: Assume that a survey was conducted to ascertain the income
of various strata of population in a city. Suppose it is found that 15% belong to the super rich,
18% belong to the rich class, and 61% belong to the middle class.

By oversight the total is recorded as (15+61+18 = 94), which is not equal to hundred. This error
can be corrected easily by the researcher. This type of error leads to confusion because the
reader or decision maker does not know which categories are left out (maybe the lower
middle class and the lower class).

• Confusion between percentage and percentage points: Suppose the report indicates that
raw material cost of a product as a percentage of total cost increased from 8 percent
in 2003 to 10 percent in 2009. It would be misleading to say that the raw material cost has increased
by only 2 percent in 6 years. The real increase is 2 percentage points, or
25 percent.

• Wrong conclusion: Mr. X's annual income has increased from 20,000 to 40,000 in 8 years.
The tempting conclusion is that, since income has doubled, the purchasing power has also
doubled. This may not be true because, due to inflation over 8 years, purchasing power might
have come down as the value of money eroded.
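
The arithmetic behind these three illustrations can be checked with a short Python sketch. The function names are hypothetical, and the 9 percent annual inflation rate is an assumed figure chosen only to show how a doubled nominal income can leave purchasing power nearly unchanged; the source does not state an inflation rate.

```python
def percentage_point_change(old_share, new_share):
    """Difference between two percentages, expressed in percentage points."""
    return new_share - old_share

def percent_change(old, new):
    """Relative change of new over old, expressed in percent."""
    return (new - old) / old * 100

def real_value(nominal, annual_inflation, years):
    """Nominal amount deflated to base-year prices at a constant inflation rate."""
    return nominal / (1 + annual_inflation) ** years

# Addition check: the recorded categories should account for 100 percent.
shares = [15, 18, 61]
unaccounted = 100 - sum(shares)
print("share left out of the table:", unaccounted, "percent")

# Raw material cost share rising from 8 percent to 10 percent.
print("change in percentage points:", percentage_point_change(8, 10))
print("change in percent:", percent_change(8, 10))

# Income doubling over 8 years under an ASSUMED 9 percent annual inflation.
real_income = real_value(40_000, 0.09, 8)
print("nominal growth in percent:", percent_change(20_000, 40_000))
print("income in base-year prices:", round(real_income))
```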

Self Assessment

Fill in the blanks:

1. The research report will differ based on the …………of the particular managers using the
report.

2. Accuracy refers to the degree to which information reflects……………..

3. Availability refers to the communication process between researcher and the………………..

4. …………….refers to the time span between completion of the research project and
presentation of the research report to management

14.2 Significance of Report Writing

Preparation and presentation of a research report is the most important part of the research
process. No matter how brilliant the hypothesis and how well designed the research study,
they are of little value unless communicated effectively to others in the form of a research
report. Moreover, if the report is confusing or poorly written, the time and effort spent on
gathering and analysing data would be wasted. It is, therefore, essential to summarise and
communicate the results to the management in the form of an understandable and logical research
report.

The research report is regarded as a major component of the research study, for the research task
remains unfinished till the report has been presented and/or written. As a matter of fact, even
the most brilliant hypothesis, a very well designed and conducted research study, and the most
striking generalizations and findings are of little value unless they are effectively communicated
to others. The purpose of research is not well served unless the findings are made known to
others. Research results must customarily enter the general store of knowledge. All this explains
the importance of writing a research report. There are people who do not consider writing of
the report as an essential part of the research process. But the general opinion is in favour of treating
the presentation of research results or the writing of the report as part and parcel of the research
project. Writing of the report is the final step in a research study and requires a set of skills somewhat
different from those called for in the earlier stages of research. This task should be
accomplished by the researcher with extreme care; he may seek the assistance and guidance of
experts for this purpose.

Self Assessment

Fill in the blanks:

5. …………………is regarded as a major component of the research study

6. Writing of report is the ………..step in a research study and requires a set of skills somewhat
different from those called for in respect of the former stages of research.

14.3 Techniques and Precautions of Interpretation

Interpretation means bringing out the meaning of data. We can also say that interpretation is
the conversion of data into information. The essence of any research is the interpretation of
the study. This requires a high degree of skill. There are two methods of drawing conclusions:
(i) induction (ii) deduction.

In the induction method, one starts from observed data and then arrives at a generalisation which
explains the relationship between the objects observed.

On the other hand, deductive reasoning starts from some general law and is then applied to a
particular instance, i.e., deduction moves from the general to a particular situation.

Example:
Example of Induction: Every Sony product examined so far has been excellent. Therefore, all
products manufactured by Sony are excellent.

Example of Deduction: All products manufactured by Sony are excellent. DVD player model 2602
MX is made by Sony. Therefore, it must be excellent. Similarly, all products have to reach the decline
stage one day and become obsolete. This radio is in decline mode. Therefore, it will become obsolete.

During the inductive phase, we reason from observation. During the deductive phase, we reason
towards the observation. Successful interpretation depends on how well the data is analysed. If
data is not properly analysed, the interpretation may go wrong. If the analysis is to be correct,
then data collection must be proper. Similarly, if the data collected is proper but analysed
wrongly, then too the interpretation or conclusion will be wrong. Sometimes, even with the
proper data and proper analysis, the data can still lead to wrong interpretation. Interpretation
depends upon the experience of the researcher and the methods used by him for interpretation.

Did u know? Both logic and observation are essential for interpretation.

Example: A detergent manufacturer is trying to decide which of three sales promotion
methods (discount, contest, buy one get one free) would be most effective in increasing sales.
Each sales promotion method is run at different times in different cities. The sales obtained by
the different sales promotion methods are as follows.

Sales Impact of Different Sales Promotion Methods

Sales Promotion Method    Sales Associated with Sales Promotion
1                         2,000
2                         3,500
3                         2,510

The results may lead us to the conclusion that the second sales promotion method was the most
effective in developing sales, and it may be adopted nationally to promote the product. However,
because each method was run at a different time and in a different city, one cannot say that the
same method of sales promotion will be effective in each and every city under study.

14.3.1 Basic Analysis of "Quantitative" Information

(for information other than commentary, e.g., ratings, rankings, yes's, no's, etc.)

• Make copies of your data and store the master copy away. Use the copy for making edits,
cutting and pasting, etc.

• Tabulate the information, i.e., add up the number of ratings, rankings, yes's, no's for each
question.

• For ratings and rankings, consider computing a mean, or average, for each question. For
example, "For question #1, the average ranking was 2.4". This is more meaningful than
indicating, e.g., how many respondents ranked 1, 2, or 3.

• Consider conveying the range of answers, e.g., 20 people ranked "1", 30 ranked "2", and 20
people ranked "3".
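
The tabulation steps above can be sketched in a few lines of Python. This is only an illustration: the list `responses` is hypothetical data built to match the distribution quoted in the text (20, 30 and 20 respondents), and the variable names are assumptions rather than part of any prescribed method.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to one question, matching the distribution quoted
# in the text: 20 respondents ranked "1", 30 ranked "2" and 20 ranked "3".
responses = [1] * 20 + [2] * 30 + [3] * 20

# Tabulate: count how many respondents gave each ranking.
tally = Counter(responses)

# Summarise: the mean ranking for the question.
average = mean(responses)

# Convey the range: the lowest and highest ranking given.
lowest, highest = min(responses), max(responses)

for rank in sorted(tally):
    print(f"Ranked {rank}: {tally[rank]} respondents")
print(f"Average ranking: {average:.1f} (range {lowest} to {highest})")
```

Reporting the tally alongside the mean keeps both pieces of advice from the list: a single summary figure and the spread of answers behind it.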

14.3.2 Basic Analysis of "Qualitative" Information

(respondents' verbal answers in interviews, focus groups, or written commentary on
questionnaires):



Research Methodology

• Read through all the data.

• Organize comments into similar categories, e.g., concerns, suggestions, strengths,
weaknesses, similar experiences, program inputs, recommendations, outputs, outcome
indicators, etc.

• Label the categories or themes, e.g., concerns, suggestions, etc.

• Attempt to identify patterns, or associations and causal relationships in the themes, e.g.,
all people who attended programs in the evening had similar concerns, most people came
from the same geographic area, most people were in the same salary range, what processes
or events respondents experienced during the program, etc.

• Keep all commentary for several years after completion in case it is needed for future reference.
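
As a rough illustration of organising and labelling comments, the sketch below groups comments that a researcher has already read and coded by hand. The comments, the category labels and the name `coded_comments` are all hypothetical; the code only does the bookkeeping and does not replace the researcher's judgement in assigning categories.

```python
from collections import defaultdict

# Hypothetical coded comments: each comment has already been read and
# assigned a category label by the researcher (the labels are examples).
coded_comments = [
    ("concern", "Evening sessions end too late."),
    ("suggestion", "Offer a weekend session."),
    ("concern", "The venue is hard to reach."),
    ("strength", "The trainers were well prepared."),
]

# Organise the comments into their categories.
by_category = defaultdict(list)
for label, comment in coded_comments:
    by_category[label].append(comment)

# Print each theme with its comments so that patterns are easier to spot.
for label, comments in sorted(by_category.items()):
    print(f"{label} ({len(comments)})")
    for comment in comments:
        print(f"  - {comment}")
```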

14.3.3 Interpreting Information

• Attempt to put the information in perspective, e.g., compare the results with what you expected
or promised; with the expectations of management or program staff; with any common standards for
your products or services; with the original goals (especially if you're conducting a program
evaluation); with indications or measures of accomplishing outcomes or results (especially if
you're conducting an outcomes or performance evaluation); or with a description of the program's
experiences, strengths, weaknesses, etc. (especially if you're conducting a process evaluation).

• Consider recommendations to help employees improve the program, product or service, and
conclusions about program operations or meeting goals, etc.

• Record conclusions and recommendations in a report, and associate interpretations with them to
justify your conclusions or recommendations.

14.3.4 Precautions

1. Keep the main objective of research in mind.

2. Analysis of data should start from simpler and more fundamental aspects.

3. The interpretation should not be confusing.

4. The sample size should be adequate.

5. Take care before generalising from the sample studied.

6. Give due attention to significant questions.

Caution: In report writing, do not overlook the significance of some answers, such as "don't know"
or "can't say", merely because they come from very few respondents.

Self Assessment

Fill in the blanks:

7. ………………means bringing out the meaning of data.

8. Successful interpretation depends on how well the data is……………...

9. In the ………………method, one starts from observed data and then generalisation is done


14.4 Types of Report

There are two types of reports: (1) Oral report and (2) Written report.

14.4.1 Oral Report

This type of reporting is required when the researchers are asked to make an oral presentation.
Making an oral presentation is somewhat more difficult than writing a report, because the
presenter has to interact directly with the audience. Any faltering during an oral presentation
can leave a negative impression on the audience and may also lower the self-confidence of the
presenter. In an oral presentation, communication plays a big role. A lot of planning and
thinking is required to decide 'what to say', 'how to say it' and 'how much to say'. The presenter
may also have to face a barrage of questions from the audience, so a lot of preparation is
required. The broad structure of an oral presentation is as follows.

Nature of an Oral Presentation

Opening: A brief statement can be made on the nature of discussion that will follow. The opening
statement should explain the nature of the project, how it came about and what was attempted.

Finding/Conclusion: Each conclusion should be stated and backed up by findings.

Recommendation: Each recommendation must be supported by a conclusion. The presentation should
end with a question-and-answer session with the audience.

Method of presentation: Visuals can be used where something needs to be exhibited. Presenting
statistical information in tabular form helps the audience.

The type of presentation is a root question: is it read from a manuscript, memorized, or
delivered extempore? Memorization is not recommended, since there could be a slip during the
presentation; it also produces a speaker-centric approach. Reading from the manuscript is not
recommended either, because it becomes monotonous, dull and lifeless. The best way is to deliver
extempore from notes of the main points, which can then be expanded. A logical sequence should be
followed.

Notes Points to remember in oral presentation:

1. Language used must be simple and understandable.

2. The time allotted should be adhered to.

3. Use of charts, graphs, etc., will enhance understanding by the audience.

4. Vital data such as figures may be printed and circulated to the audience so that their
ability to comprehend increases, since they can refer to it while the presentation is
going on.

5. The presenter should know the target audience well in advance in order to prepare a
tailor-made presentation.

6. The presenter should know the purpose of the report, e.g., "Is it for making a decision?" or
"Is it for the sake of information?"


14.4.2 Written Report

The following are the various types of written reports:

(A) Reports can be classified based on the time-interval such as:

(1) Daily

(2) Weekly

(3) Monthly

(4) Quarterly

(5) Yearly

(B) Reports can also be classified by type:

(1) Short report

(2) Long report

(3) Formal report

(4) Informal report

(5) Government report

1. Short report: Short reports are produced when the problem is very well defined and the
scope is limited, for example, a monthly sales report. Such a report runs to about five pages.
It reports the progress made with respect to a particular product in a clearly specified
geographical location.

2. Long report: This could be either a technical or a non-technical report. It presents the
outcome of the research in detail.

(a) Technical report: This will include the sources of data, research procedure, sample
design, tools used for gathering data, data analysis methods used, appendix,
conclusion and detailed recommendations with respect to specific findings. If any
journal, paper or periodical is referred to, such references must be given for the benefit
of the reader.

(b) Non-technical report: This report is meant for those who are not technically qualified,
e.g., the chief of the finance department. He may be interested in the financial implications
only, such as margins, volumes, etc. He may not be interested in the methodology.

3. Formal report: Example: The report prepared by the marketing manager for submission to the
Vice-President (marketing) on quarterly performance; reports on test marketing.

4. Informal report: The report prepared by the supervisor by way of filling in the shift log
book, to be used by his colleagues.

5. Government report: These may be prepared by state governments or the central government
on a given issue.

Example: A programme announced for rural employment strategy as a part of a five-year plan.


Did u know? Report on children's education is a kind of government and social welfare
report.

14.4.3 Distinguish between Oral and Written Report

Oral report | Written report

No rigid standard format. | Standard format can be adopted.

Remembering all that is said is difficult if not impossible. This is because the presenter cannot be interrupted frequently for clarification. | This can be read a number of times and clarification can be sought whenever the reader chooses.

Tone, voice modulation, comprehensibility and several other communication factors play an important role. | Free from presentation problems.

Correcting mistakes, if any, is difficult. | Mistakes, if any, can be pinpointed and corrected.

The audience has no control over the speed of presentation. | Not applicable.

The audience does not have the choice of picking and choosing from the presentation. | The reader can pick and choose what he thinks is relevant to him. For instance, the need for information is different for technical and non-technical persons.

Self Assessment

Fill in the blanks:

10. In an oral presentation, ……………….plays a big role.

11. ………….report presents the outcome of the research in detail.

12. The …………….statement should explain the nature of the project, how it came about and
what was attempted.

14.5 Preparation of Research Report

Having decided on the type of report, the next step is report preparation. The following is the
format of a research report:

1. Title Page

2. Table of Contents

3. Executive Summary

4. Body

5. Conclusions and Recommendations

6. Bibliography

7. Appendix

1. Title Page: Title Page should indicate the topic on which the report is prepared. It should
include the name of the person or agency who has prepared the report.


2. Table of Contents: The table of contents will help the reader to know "what the report
contains". The table of contents should indicate the various parts or sections of the report.
It should also indicate the chapter headings along with the page number.

Chapter no.    Title of the chapter                                   Page no.
               Declaration
               Certificates
               Acknowledgement
               Executive summary
1              Introduction to the project
2              Research design and methodology
3              Theoretical perspective of the study
4              Company and industry profile
5              Data analysis and interpretation
6              Summary of findings, suggestions and conclusions
               Bibliography
               Appendix

3. Executive Summary: If your report is long and drawn out, the person for whom you have
prepared the report may not have the time to read it in detail. An executive summary also
helps in highlighting the major points. It is a condensed version of the whole report and
should be written in one or two pages. Since top executives often read only the executive
summary, it should be accurate and well written. An executive summary should help in
decision-making.

An executive summary should contain:

(a) Objectives

(b) Brief methodology

(c) Important findings

(d) Key results

(e) Conclusion

4. The Body: This section includes:

(a) Introduction

(b) Methodology

(c) Limitations

(d) Analysis and interpretations

Introduction: The introduction must explain clearly the decision problem and research
objective. The background information should be provided on the product and services
provided by the organisation which is under study.

Methodology: How the data was collected is the key point of this section. For example,
was primary data collected or secondary data used? Was a questionnaire used? What were
the sample size, the sampling plan and the method of analysis? Was the design exploratory or
conclusive?

Limitations: Every report will have some shortcomings. The limitations may relate to time,
geographical area, the methodology adopted, correctness of the responses, etc.


Analysis and interpretations: The collected data will be tabulated. Statistical tools, if any,
will be applied to analyse the data and to support decisions.

5. Conclusion and Recommendation:

(a) What was the conclusion drawn from the study?

(b) Based on the study, what recommendation do you make?

6. Bibliography: If portions of your report are based on secondary data, use a bibliography
section to list the publications or sources that you have consulted. The bibliography
should include the title of the book, the name of the journal in the case of an article, the
volume number, page number, edition, etc.

7. Appendix: The purpose of an appendix is to provide a place for material which is not
absolutely essential to the body of the report. The appendix may contain copies of data
collection forms (questionnaires), details of the annual report of the company, details
of graphs/charts, photographs, CDs and interviewers' instructions. The following items
are placed in this section:

(a) Data collection forms

(b) Project-related paper cuttings

(c) Pictures and diagrams related to the project

(d) Any other relevant material.

Caution: The date of submission of the report is to be included on the title page of the
report.

14.5.1 How to Write a Bibliography?

The bibliography is the last section of the report and comes after the appendices. The appendices
contain questionnaires and other relevant material of the study. The bibliography contains the
source of every reference used and any other relevant work that has been consulted. It assures
the reader of the authenticity of the sources of data.

Bibliographies are of different types. A bibliography of works cited contains only the
items referred to in the text. A selected bibliography lists the items which the author thinks are
of primary interest to the reader. An annotated bibliography gives a brief description of each
item. The method of presenting a bibliography is explained below.

Books

Name of the author, title of the book (underlined), publisher's detail, year of publishing, page
number.

Single Volume Works. Dube, S. C. "India's Changing Villages", Routledge and Kegan Paul Ltd.,
1958, p. 76.
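
The book format described above (author, title, publisher's details, year, page number) can be applied mechanically. The sketch below is one possible way to do so; the function name `format_book_entry` and its parameters are hypothetical, and the punctuation simply follows the Dube example given in the text rather than any formal citation standard.

```python
def format_book_entry(author, title, publisher, year, page):
    """Return a book entry in the order: author, title, publisher, year, page."""
    return f'{author} "{title}", {publisher}, {year}, p. {page}.'


# Rebuild the single-volume example from the text.
entry = format_book_entry(
    author="Dube, S. C.",
    title="India's Changing Villages",
    publisher="Routledge and Kegan Paul Ltd.",
    year=1958,
    page=76,
)
print(entry)
```

Generating every entry from one function helps keep quotations and references consistent across the whole bibliography.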

Chapter in an Edited Book

Warwick, Donald P., "Comparative Research Methods" in Balmer, Martin and Donald Warwick
(eds), 1983, pp. 315-30.


Periodicals and Journals

Dawan Radile (2005), "They Survived Business World" (India), May 98, pp. 29-36.

Newspaper Articles

Kumar Naresh, "Exploring Divestment", The Economic Times (Bangalore), August 7, 1999, p. 14.

Website

www.infocom.in.com

Seminar Paper

Krishna Murthy, P., "Towards Excellence in Management" (Paper presented at a Seminar in XYZ
College Bangalore, July 2000).

Task: List the various abbreviations frequently used in footnotes with their meanings.

Self Assessment

Fill in the blanks:

13. The ………………..should indicate the various parts or sections of the report.

14. …………..Page should indicate the topic on which the report is prepared.

15. A selected bibliography lists the items which the author thinks are of ………….interest to
the reader.

14.6 Style, Layout and Precautions of Report Writing

14.6.1 Style of Report Writing

Remember that the reader:

• Is short of time,

• Has many other urgent matters demanding his or her interest and attention,

• Is probably not knowledgeable concerning 'research jargon'.

Therefore, the rules are:

• Simplify. Keep to the essentials.

• Justify. Make no statement that is not based on facts and data.

• Quantify when you have the data to do so. Avoid vague terms such as 'large' or 'small';
instead, say '50%' or 'one in three'.

• Be precise and specific in your phrasing of findings.

• Inform, not impress. Avoid exaggeration.

• Use short sentences.

• Use adverbs and adjectives sparingly.

• Avoid the passive voice, if possible, as it creates vagueness (e.g., 'patients were interviewed'
leaves uncertainty as to who interviewed them) and repeated use makes dull reading.

• Aim to be logical and systematic in your presentation.

Caution: In report writing, be consistent in the use of tenses (past or present tense).

14.6.2 Layout of the Report

A good physical layout is important, as it will help your report:

i. Make a good initial impression,

ii. Encourage the readers, and

iii. Give the readers an idea of how the material has been organised, so that they can quickly
decide what to read first.

Particular attention should be paid to make sure there is:

i. An attractive layout for the title page and a clear table of contents.

ii. Consistency in margins and spacing.

iii. Consistency in headings and subheadings, for example, font size 16 or 18 bold, for headings
of chapters; size 14 bold for headings of major sections; size 12 bold, for headings of sub-
sections, etc.

iv. Good quality printing and photocopying. Correct drafts carefully with spell check as well
as critical reading for clarity by other team-members, your facilitator and, if possible,
outsiders.

v. Numbering of figures and tables, provision of clear titles for tables, and clear headings for
columns and rows, etc.

vi. Accuracy and consistency in quotations and references.

14.6.3 Precautions in Report Writing

Endless description without interpretation is a common pitfall. Tables need conclusions, not a
detailed presentation of all the numbers or percentages in the cells, which readers can see for
themselves.

Notes The discussion, in particular, needs comparison of data, highlighting of
unexpected results, your own or others' opinions on problems discovered, and weighing of the
pros and cons of possible solutions. Yet, too often the discussion is merely a dry summary
of findings.

Neglect of qualitative data is also quite common. Still, quotes from informants that illustrate
your findings and conclusions make your report lively. They also have scientific value in allowing
the reader to draw his/her own conclusions from the data you present (assuming you are not
biased in your presentation!).

Sometimes qualitative data (e.g., open opinion questions) are just coded and counted like
quantitative data, without interpretation, whereas they may be providing interesting illustrations
of reasons for the behavior of informants or of their attitudes. This is serious maltreatment of
data that needs correction.

The following must be avoided while preparing a report:

• The inclusion of careless, inaccurate, or conflicting data.

• The inclusion of outdated or irrelevant data.

• Facts and opinions that are not separated.

• Unsupported conclusions and recommendations.

• Careless presentation and proofreading.

• Too much emphasis on appearance and not enough on content.

Self Assessment

Fill in the blanks:

16. In a report there must be …………….in margins and spacing.

17. Aim must be logical and ……………in the report presentation

14.7 Summary

• A report is a very formal document that is written for a variety of purposes, generally in
the sciences, social sciences, engineering and business disciplines.

• The most important aspect to be kept in mind while developing a research report is
communication with the audience.

• A report should be able to draw the interest of its readers. Therefore, a report should be
reader-centric.

• Other aspects to be considered while writing a report are accuracy and clarity.

• The points to be remembered while making an oral presentation are the language used, time
management, use of graphs, the purpose of the report, etc. Visuals used must be understandable
to the audience.

• The presenter must make sure that the presentation is completed within the time allotted.
Some time should be set apart for questions and answers.

• Written reports may be classified as short or long reports. They can also be classified as
technical or non-technical reports.

• A written report should contain a title page, contents, executive summary, body, conclusions
and appendix. The last part is the bibliography.

• The style of the report should be simple and keep to the essentials.

• There should not be endless description in report writing, and qualitative data should not be
excluded.

14.8 Keywords

Appendix: The part of the report whose purpose is to provide a place for material which is not
absolutely essential to the body of the report.


Bibliography: The section that lists the publications or sources that you have consulted in the
preparation of the report.

Executive Summary: It is a condensed version of the whole report.

Informal Report: The report prepared by the supervisor by way of filling in the shift log book, to
be used by his colleagues.

Short Report: Short reports are the reports that are produced when the problem is very well
defined and if the scope is limited.

14.9 Review Questions

1. What is a research report?

2. What are the characteristics of a report?

3. What is the criterion for an oral report? Explain.

4. What is meant by "consider the audience" when writing a research report?

5. On what criteria is an oral report evaluated? Suggest a suitable format.

6. Why are visual aids used in oral presentations?

7. What are the various criteria used for the classification of written reports?

8. What are the essential contents of the following parts of a research report?

(a) Table of contents

(b) Title page

(c) Executive summary

(d) Introduction

(e) Conclusion

(f) Appendix

9. Oral presentation requires the researcher to be a good public speaker. Explain.

10. Explain the style and layout of report.

Answers: Self Assessment

1. need 2. reality

3. decision maker 4. Currency

5. Research report 6. final

7. Interpretation 8. analysed

9. induction 10. communication

11. Long 12. opening

13. table of contents 14. Title

15. primary 16. Consistency

17. systematic


14.10 Further Readings

Books

Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.

Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.

Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.

Chase, Stuart, The Proper Study of Mankind: An Inquiry into the Science of Human
Relations, New York: Harper and Row Publishers, 1958.

Murthy, S. N. and U. Bhojanna, Business Research Methods, Excel Books.



Statistical Tables


I. Logarithms


II. Antilogarithms

III. Binomial Coefficients

IV. Values of $e^{-m}$

Note: To obtain the value of $e^{-1.75}$, we write $e^{-1.75} = e^{-1} \times e^{-0.75} = 0.36788 \times 0.4724 = 0.17379$
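
The decomposition in the note can be checked numerically. The short sketch below uses Python's standard `math.exp` and splits the exponent in the same way. Computed at full precision, the result is about 0.17377; the note's 0.17379 differs slightly in the last digit because the tabulated factors are rounded before they are multiplied.

```python
import math

# Split the exponent as in the note: -1.75 = -1 + (-0.75).
direct = math.exp(-1.75)
from_parts = math.exp(-1) * math.exp(-0.75)

print(f"e^-1    = {math.exp(-1):.5f}")
print(f"e^-0.75 = {math.exp(-0.75):.4f}")
print(f"product = {from_parts:.5f}")
print(f"direct  = {direct:.5f}")
```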


V. Ordinates of Normal Curve

[Figure: standard normal curve with ordinate 0.3989 at z = 0 and ordinate 0.1182 at z = ±1.56]
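
The ordinates 0.3989 and 0.1182 quoted for z = 0 and z = ±1.56 are heights of the standard normal density. As a hedged illustration, they can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are assumptions made for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# An ordinate is the height of the density curve at a given z.
ordinate_at_zero = standard_normal.pdf(0.0)
ordinate_at_1_56 = standard_normal.pdf(1.56)

print(f"Ordinate at z = 0:    {ordinate_at_zero:.4f}")
print(f"Ordinate at z = 1.56: {ordinate_at_1_56:.4f}")
```

Because the curve is symmetric about zero, the ordinate at z = -1.56 is the same as at z = 1.56.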


VI. Areas under the Normal Curve

[Figure: standard normal curve with the area between z = 0 and z = 2.54 shaded; $P(0 \le z \le 2.54) = 0.4945$]
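
The area of 0.4945 between z = 0 and z = 2.54 is the difference between two values of the cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library follows; the names `lower` and `upper` are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

lower, upper = 0.0, 2.54

# Area under the curve between two z values = CDF(upper) - CDF(lower).
area = standard_normal.cdf(upper) - standard_normal.cdf(lower)

print(f"P({lower} <= z <= {upper}) = {area:.4f}")
```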


VII. Critical Values of t

[Figure: t distributions with the critical values for 20 degrees of freedom marked]

Two-tailed test: $t_{.05,20} = \pm 2.086$; one-tailed test: $t_{.05,20} = 1.725$


VIII. Critical Values of $\chi^2$

[Figure: chi-square distribution with the upper-tail critical value marked; $\chi^2_{.05,12} = 21.026$]


IX. Critical Values of F

[Figure: F distribution with the upper-tail critical value marked; $F_{.01,9,7} = 6.72$]


X. Quality Control Charts
